Multiple measurements of similarity and distance between pairs of binary matrices, as listed in Choi et al. (2010).
similarity(
  M,
  ...,
  statistic,
  normalized = TRUE,
  firstonly = FALSE,
  include_self = FALSE,
  exclude_j = FALSE
)

# S3 method for list
similarity(
  M,
  ...,
  statistic,
  normalized = TRUE,
  firstonly = FALSE,
  include_self = FALSE,
  exclude_j = FALSE
)

# S3 method for matrix
similarity(
  M,
  ...,
  statistic,
  normalized = TRUE,
  firstonly = FALSE,
  include_self = FALSE,
  exclude_j = FALSE
)
| Argument | Description |
|---|---|
| M | Either a list of matrices of size |
| ... | More matrices to be passed to the function. |
| statistic | Character. Name of the similarity index (or indices) to be used. |
| normalized | Logical. When |
| firstonly | Logical. When |
| include_self | Logical. When set to |
| exclude_j | Logical. When |
A matrix with n*(n - 1)/2 rows and 2 + length(statistic) columns, where
columns 1 and 2 hold the ids of i and j, and the remaining columns hold the
corresponding distances/similarities.
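As a rough illustration of this layout, the base R sketch below enumerates the pairs (i, j) with i < j for a hypothetical list of n = 4 matrices. It only reproduces the two id columns; the statistic columns are computed by `similarity` itself. The names `n` and `pairs` are illustrative and not part of the package.

```r
# Hypothetical number of matrices in the input list (assumption)
n <- 4

# All unordered pairs (i, j) with i < j, one pair per row
pairs <- t(combn(n, 2))
colnames(pairs) <- c("i", "j")

# One row per pair, so n * (n - 1) / 2 rows in total
nrow(pairs) == n * (n - 1) / 2
pairs
```

The ordering produced by `combn` (1-2, 1-3, 1-4, 2-3, ...) matches the order of the `i` and `j` columns shown in the examples at the end of this page.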
All of the available statistics are based on a 2x2 contingency matrix counting matches and mismatches between each pair of matrices (R1, R2).
| | R2 = 1 | R2 = 0 |
|---|---|---|
| R1 = 1 | a | b |
| R1 = 0 | c | d |
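The counts a, b, c and d can be sketched in base R as follows. The helper `contingency` is hypothetical and not part of the package; its `include_self` argument only mimics the documented argument of the same name, under the assumption that it controls whether diagonal cells are counted.

```r
# Hypothetical helper (not the package's implementation): builds the
# 2x2 contingency counts for two binary matrices of the same size.
contingency <- function(R1, R2, include_self = FALSE) {
  stopifnot(all(dim(R1) == dim(R2)))

  # Cells to compare; drop the diagonal unless include_self = TRUE
  keep <- matrix(TRUE, nrow(R1), ncol(R1))
  if (!include_self) diag(keep) <- FALSE

  x <- R1[keep]
  y <- R2[keep]

  c(
    a = sum(x == 1 & y == 1),  # present in both
    b = sum(x == 1 & y == 0),  # present in R1 only
    c = sum(x == 0 & y == 1),  # present in R2 only
    d = sum(x == 0 & y == 0)   # absent in both
  )
}

# Two illustrative 3x3 binary matrices (made-up data)
R1 <- matrix(c(0, 1, 0,
               0, 0, 1,
               1, 0, 0), nrow = 3, byrow = TRUE)
R2 <- matrix(c(0, 1, 1,
               0, 0, 0,
               1, 0, 0), nrow = 3, byrow = TRUE)

tab <- contingency(R1, R2)
tab
```

For these two matrices the six off-diagonal cells split into a = 2, b = 1, c = 1, d = 2.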
The complete list of available statistics is given in the Similarity and Distance sections.
distance is just an alias for similarity.
Jaccard (1): "sjaccard" or "jaccard"
Sørensen–Dice coefficient (2), Czekanowski (3), Nei & Li (5): "sdice" or "sczekanowsk" or "sneili"
3w-jaccard (4): "s3wjaccard"
Sokal & Michener (7): "ssokmich" or "sokmich"
Sokal & Sneath II (8): "ssoksne" or "soksne"
Rogers & Tanimoto (9): "roger&tanimoto" or "sroger&tanimoto"
Faith (10): "sfaith" or "faith"
Gower and Legendre (11): "sgl" or "gl"
Russell & Rao (14): "srusrao"
Tarwid (54): "starwid" or "tarwid".
Pearson & Heron 1 (54): "sph1" or "ph1" or "s14". This is also known as S14 in
Gower and Legendre (1986).
In the case of the S14 function, following Krackhardt (1990):
$$ S_{14} = \sqrt{\left(\frac{a}{(a + c)} - \frac{b}{(b + d)}\right)\times\left(\frac{a}{(a + b)} - \frac{c}{(c + d)}\right)} $$
which is a statistic lying between 0 and 1.
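A minimal base R sketch of the expression above, written as a function of the four contingency counts. The function names `s14` and `phi_abs` are illustrative and not the package's internals. The second function is the absolute value of the phi coefficient, which is algebraically equal to the expression as written.

```r
# S14 exactly as written in the formula (illustrative, not package code)
s14 <- function(a, b, c, d) {
  sqrt((a / (a + c) - b / (b + d)) * (a / (a + b) - c / (c + d)))
}

# Absolute phi coefficient, algebraically equivalent to the above
phi_abs <- function(a, b, c, d) {
  abs(a * d - b * c) / sqrt((a + b) * (a + c) * (b + d) * (c + d))
}

# Illustrative counts (assumed): a = 2, b = 1, c = 1, d = 2
val <- s14(2, 1, 1, 2)
val
all.equal(val, phi_abs(2, 1, 1, 2))
```

As written, the expression is never negative. The example output at the end of this page contains negative s14 values, which suggests the package also carries the sign of ad - bc; that is an inference from the output, not something this page states.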
Dennis (44): "sdennis" or "dennis"
Yuleq (61): "syuleq"
Yuleq similarity (63): "syuleqw"
Michael (68): "smichael" or "michael"
$$ S_{Michael} = \frac{4(ad-bc)}{(a+d)^2 + (b+c)^2} $$
Dispersion (66): "sdisp" or "disp"
$$ S_{Dispersion} = \frac{ad - bc}{(a + b + c + d)^2} $$
Hamann (67): "shamann" or "hamann"
$$ S_{Hamann} = \frac{(a + d) - (b + c)}{a + b + c + d} $$
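The Michael, Dispersion and Hamann formulas translate directly into base R. The function names below are illustrative only and the counts are made up; this is a sketch of the formulas, not the package's implementation.

```r
# Direct transcriptions of formulas (68), (66) and (67); names are hypothetical
s_michael <- function(a, b, c, d) {
  4 * (a * d - b * c) / ((a + d)^2 + (b + c)^2)
}

s_disp <- function(a, b, c, d) {
  (a * d - b * c) / (a + b + c + d)^2
}

s_hamann <- function(a, b, c, d) {
  ((a + d) - (b + c)) / (a + b + c + d)
}

# Illustrative counts (assumed): a = 2, b = 1, c = 1, d = 2
res <- c(
  michael = s_michael(2, 1, 1, 2),
  disp    = s_disp(2, 1, 1, 2),
  hamann  = s_hamann(2, 1, 1, 2)
)
res
```

All three share the same building blocks: ad - bc measures association, while (a + d) and (b + c) count matches and mismatches respectively.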
Goodman & Kruskal (69): "sgk" or "gk"
$$ S_{Goodman \& Kruskal} = \frac{\sigma - \sigma'}{2n - \sigma'} $$
where \(\sigma = \max(a,b) + \max(c,d) + \max(a,c) + \max(b,d)\), \(\sigma' = \max(a + c, b + d) + \max(a + b, c + d)\), and \(n = a + b + c + d\)
Anderberg (70): "sanderberg" or "anderberg"
$$ S_{Anderberg} = \frac{\sigma - \sigma'}{2n} $$
where \(\sigma\) and \(\sigma'\) are defined as in (69).
Peirce (73): "speirce" or "peirce"
$$ S_{Peirce} = \frac{ab + bc}{ab + 2bc + cd} $$
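The Goodman & Kruskal, Anderberg and Peirce formulas can be sketched in base R as shown below. The sketch assumes n = a + b + c + d (the total number of cells compared), and all function names are hypothetical rather than taken from the package.

```r
# Shared quantities for (69) and (70); assumes n = a + b + c + d
sigmas <- function(a, b, c, d) {
  list(
    sigma       = max(a, b) + max(c, d) + max(a, c) + max(b, d),
    sigma_prime = max(a + c, b + d) + max(a + b, c + d),
    n           = a + b + c + d
  )
}

s_gk <- function(a, b, c, d) {
  s <- sigmas(a, b, c, d)
  (s$sigma - s$sigma_prime) / (2 * s$n - s$sigma_prime)
}

s_anderberg <- function(a, b, c, d) {
  s <- sigmas(a, b, c, d)
  (s$sigma - s$sigma_prime) / (2 * s$n)
}

# Peirce (73), transcribed as written in the formula
s_peirce <- function(a, b, c, d) {
  (a * b + b * c) / (a * b + 2 * b * c + c * d)
}

# Illustrative counts (assumed): a = 2, b = 1, c = 1, d = 2
gk  <- s_gk(2, 1, 1, 2)
and <- s_anderberg(2, 1, 1, 2)
pei <- s_peirce(2, 1, 1, 2)
c(gk = gk, anderberg = and, peirce = pei)
```

Goodman & Kruskal and Anderberg share the numerator and differ only in the denominator, which is why the helper `sigmas` is factored out.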
In the case of fscore, ask Kyosuke Tanaka.
FScore (00): "fscore" or "sfscore"
Vari (23): "dvari" or "vari"
Sized Difference (24): "dsizedif" or "sizedif"
Shaped Difference (25): "dsphd" or "sphd"
Pattern Difference (26): "dpattdif" or "pattdif"
Hamming (15): "dhamming" or "hamming"
Mean Manhattan (20): "dmeanman" or "meamman"
$$ D_{Mean-manhattan} = \frac{b + c}{a + b + c + d} $$
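For binary matrices, the Mean Manhattan distance is the share of compared cells on which the two matrices disagree. The base R sketch below checks this on made-up data, computing the value once from the counts and once directly from the cells. It assumes the diagonal is excluded from the comparison, and the names used are illustrative only.

```r
# Formula (20) as a function of the contingency counts (hypothetical name)
d_meanman <- function(a, b, c, d) {
  (b + c) / (a + b + c + d)
}

# Two illustrative 3x3 binary matrices (made-up data)
R1 <- matrix(c(0, 1, 0,
               0, 0, 1,
               1, 0, 0), nrow = 3, byrow = TRUE)
R2 <- matrix(c(0, 1, 1,
               0, 0, 0,
               1, 0, 0), nrow = 3, byrow = TRUE)

# Off-diagonal cells only (assumption: diagonal is not compared)
off <- row(R1) != col(R1)

# Mean absolute difference over the compared cells
from_cells <- mean(abs(R1[off] - R2[off]))

# Same quantity from the counts a = 2, b = 1, c = 1, d = 2
from_counts <- d_meanman(2, 1, 1, 2)

all.equal(from_cells, from_counts)
```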
Yuleq distance (62): "dyuleq"
Choi, S. S., Cha, S. H., & Tappert, C. C. (2010). A survey of binary similarity and distance measures. Journal of Systemics, Cybernetics and Informatics, 8(1), 43-48.
Krackhardt, D. (1990). Assessing the political landscape: Structure, cognition, and power in organizations. Administrative Science Quarterly, 342-369.
Gower, J. C., & Legendre, P. (1986). Metric and Euclidean properties of dissimilarity coefficients. Journal of Classification, 3(1), 5-48.
The statistics object contains a list with the available statistics for convenience.
# Getting all networks of size 3
data(powerset03)

# We can compute it over the entire set
head(similarity(powerset03, statistic="s14"))
#>      i j        s14
#> [1,] 1 2  0.6324555
#> [2,] 1 3 -0.2000000
#> [3,] 1 4  0.6324555
#> [4,] 1 5  0.4472136
#> [5,] 1 6 -0.3162278
#> [6,] 1 7 -0.2000000

# Or over a few matrices passed individually (three here, giving three pairs)
head(similarity(powerset03[[1]], powerset03[[2]], powerset03[[3]], statistic="s14"))
#>      i j        s14
#> [1,] 1 2  0.6324555
#> [2,] 1 3 -0.2000000
#> [3,] 2 3  0.6324555

# We can compute multiple distances at the same time
ans <- similarity(powerset03, statistic=c("hamming", "dennis", "jaccard"))
head(ans)
#>      i j   hamming     dennis   jaccard
#> [1,] 1 2 0.1666667  1.6329932 0.5000000
#> [2,] 1 3 0.3333333 -0.5773503 0.0000000
#> [3,] 1 4 0.1666667  1.6329932 0.5000000
#> [4,] 1 5 0.3333333  1.0000000 0.3333333
#> [5,] 1 6 0.5000000 -0.8164966 0.0000000
#> [6,] 1 7 0.3333333 -0.5773503 0.0000000