netCoin creates interactive networked graphs of coincidences within the data. It brings together the data analysis capabilities of R with powerful interactive JavaScript visualization libraries to provide a package to study coincidences.
This vignette briefly describes the statistical methods and provides a few examples of how to use the package.
Coincidence analysis is a set of techniques to detect events, characters, objects, attributes, or characteristics that tend to occur together within certain delimited spaces.
These spaces are called scenarios (\(S\)) and are considered the units of analysis; as such, they are placed in the rows of a matrix or data.frame.
In each scenario \(i\), a series of \(J\) events \(X_j\), represented as dichotomous variables in the columns, may occur (1) or not occur (0). Scenarios and events together constitute an incidence matrix \(\mathbf{I}\): \[\mathbf{I}= \begin{pmatrix} 0&1&0&...&1 \\ 1&0&1&...&0 \\ ...&...&...&...&... \\ 1&1&0&...&0 \end{pmatrix}\]
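As a plain base-R illustration (not netCoin code), an incidence matrix can be assembled from records that list the events observed in each scenario. The scenario names S1 to S3 and the events A, B, C and E below are hypothetical.

```r
# Hypothetical scenarios: each element lists the events observed in it
scenarios <- list(
  S1 = c("B", "E"),
  S2 = c("A", "C"),
  S3 = c("A", "B")
)
events <- sort(unique(unlist(scenarios)))  # column labels X_1, ..., X_J

# Incidence matrix I: one row per scenario, one 0/1 column per event
I <- t(vapply(scenarios, function(s) as.integer(events %in% s),
              integer(length(events))))
colnames(I) <- events
print(I)
```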
From this incidence matrix, a symmetric coincidence matrix (\(\mathbf{C}\)) can be obtained with the `coin` function. In this matrix, the main diagonal contains the frequencies of each \(X_j\), while the off-diagonal elements are the numbers of coincidences between pairs of events.
\[\mathbf{C}= \begin{pmatrix} 2&1&1&...&1 \\ 1&2&0&...&2 \\ 1&0&1&...&0 \\ ...&...&...&...&... \\ 1&2&0&...&2 \end{pmatrix}\]
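For a 0/1 incidence matrix, these counts equal the matrix product \(\mathbf{C} = \mathbf{I}^\top \mathbf{I}\). The base-R sketch below uses a hypothetical 4 × 3 matrix to show this; it is an illustration of the definition, not how `coin` is implemented internally.

```r
# A small 0/1 incidence matrix: 4 scenarios (rows) x 3 events (columns)
I <- matrix(c(1, 0, 1,
              1, 1, 0,
              0, 1, 1,
              1, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("S", 1:4), c("X1", "X2", "X3")))

# C = t(I) %*% I: the diagonal counts how often each event occurs,
# the off-diagonal cells count the scenarios where two events co-occur
C <- crossprod(I)
print(C)
```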
Once there is a `coin` object, a similarity matrix can be obtained. The similarity matrices available in netCoin are:
In addition to the previous ones, other measures can be obtained from a `coin` object:
The `sim` function produces a list of them.
Haberman | odd | even | small | large |
---|---|---|---|---|
odd | 10.000000 | -10.000000 | 4.766506 | -4.766506 |
even | -10.000000 | 10.000000 | -4.766506 | 4.766506 |
small | 4.766506 | -4.766506 | 10.000000 | -10.000000 |
large | -4.766506 | 4.766506 | -10.000000 | 10.000000 |
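The values in this table are consistent with Haberman's adjusted residual, \(r_{jk} = (c_{jk} - e_{jk}) / \sqrt{e_{jk}(1 - f_j/n)(1 - f_k/n)}\) with \(e_{jk} = f_j f_k / n\), where \(f_j\) are the diagonal frequencies of \(\mathbf{C}\) and \(n\) is the number of scenarios. The base-R sketch below recomputes them from the odd/even/small/large counts of the dice example (\(n = 100\)); it is an illustration of the formula, not netCoin's internal code.

```r
# Odd/even/small/large block of the dice coincidence matrix (n = 100 rolls)
ev <- c("odd", "even", "small", "large")
C <- matrix(c(54,  0, 41, 13,
               0, 46, 13, 33,
              41, 13, 54,  0,
              13, 33,  0, 46),
            nrow = 4, byrow = TRUE, dimnames = list(ev, ev))
n <- 100

# Haberman's adjusted residuals for every pair of events
haberman <- function(C, n) {
  f <- diag(C)              # marginal frequency of each event
  E <- outer(f, f) / n      # expected coincidences under independence
  p <- f / n                # marginal proportions
  (C - E) / sqrt(E * outer(1 - p, 1 - p))
}

R <- haberman(C, n)
print(round(R, 6))
```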
The `edgeList` function generates a collection of edges, each described by one or more similarity measures, for the pairs of events that meet a criterion (generally p(Z) < .05).
Source | Target | Haberman | P(z) |
---|---|---|---|
odd | small | 4.766506 | 3.18645e-06 |
even | large | 4.766506 | 3.18645e-06 |
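A rough base-R sketch of this filtering step, assuming a one-sided test that keeps each pair whose residual exceeds the standard-normal critical value at the .05 level. This is a stand-in for illustration: it recovers the same two edges, but it does not reproduce the P(z) column, which `edgeList` computes in its own way.

```r
ev <- c("odd", "even", "small", "large")
C <- matrix(c(54,  0, 41, 13,
               0, 46, 13, 33,
              41, 13, 54,  0,
              13, 33,  0, 46),
            nrow = 4, byrow = TRUE, dimnames = list(ev, ev))
n <- 100

# Haberman's adjusted residuals
f <- diag(C)
E <- outer(f, f) / n
Z <- (C - E) / sqrt(E * outer(1 - f / n, 1 - f / n))

# Keep each unordered pair once (upper triangle) if Z exceeds the cut-off
pairs <- which(upper.tri(Z) & Z > qnorm(0.95), arr.ind = TRUE)
edges <- data.frame(Source   = rownames(Z)[pairs[, "row"]],
                    Target   = colnames(Z)[pairs[, "col"]],
                    Haberman = Z[pairs],
                    stringsAsFactors = FALSE)
print(edges)
```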
In order to make a graph, two data frames are needed: a nodes data frame with names and other node attributes (see `asNodes`) and an edge data frame (see `edgeList`). For more information, see `netCoin`.
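To make the shape of these two inputs concrete, here is a hand-built pair of data frames for the dice example. The `Source` and `Target` columns follow the edge list shown above; the node columns `name` and `frequency` are assumptions for illustration, so check `asNodes` for the exact columns netCoin produces.

```r
# Node table: one row per event (column names are illustrative assumptions)
N <- data.frame(name      = c("odd", "even", "small", "large"),
                frequency = c(54, 46, 54, 46),
                stringsAsFactors = FALSE)

# Edge table: one row per retained pair, as in the edge list above
E <- data.frame(Source   = c("odd", "even"),
                Target   = c("small", "large"),
                Haberman = c(4.766506, 4.766506),
                stringsAsFactors = FALSE)

# Every edge endpoint should refer to an existing node
stopifnot(all(c(E$Source, E$Target) %in% N$name))
```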
To install and load the updated version of the netCoin package simply run the following commands:
Once the netCoin package has been installed and loaded, let’s now load the dice data and have a look at it:
```
## dice 1 2 3 4 5 6 odd even small large
## V1 1 1 0 0 0 0 0 1 0 1 0
## V2 2 0 1 0 0 0 0 0 1 1 0
## V3 5 0 0 0 0 1 0 1 0 0 1
## V4 4 0 0 0 1 0 0 0 1 0 1
## V5 2 0 1 0 0 0 0 0 1 1 0
## V6 5 0 0 0 0 1 0 1 0 0 1
```
It contains the results of rolling a die 100 times. Each roll is a scenario, and the events are the possible results: each of the numbers from 1 to 6, as well as odd or even and small (< 4) or large (> 3). Thus the first column contains the numeric result, and the following six columns indicate with 1's and 0's which of the six possible outcomes occurred. Finally, the last four columns also contain 0's and 1's indicating whether the result is odd or even, small or large.
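The indicator columns can be reproduced from the numeric results alone. A base-R sketch (not how the dataset was actually built), using the first six results shown above:

```r
roll <- c(1, 2, 5, 4, 2, 5)   # first six results shown above

# One 0/1 column per face
faces <- sapply(1:6, function(k) as.integer(roll == k))
colnames(faces) <- as.character(1:6)

# Derived events: parity and size of the result
derived <- cbind(odd   = as.integer(roll %% 2 == 1),
                 even  = as.integer(roll %% 2 == 0),
                 small = as.integer(roll < 4),
                 large = as.integer(roll > 3))

I <- cbind(faces, derived)
rownames(I) <- paste0("V", seq_along(roll))
print(I)
```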
Columns 2 to 11 can be considered the incidence matrix \(\mathbf{I}\):
```
## 1 2 3 4 5 6 odd even small large
## V1 1 0 0 0 0 0 1 0 1 0
## V2 0 1 0 0 0 0 0 1 1 0
## V3 0 0 0 0 1 0 1 0 0 1
## V4 0 0 0 1 0 0 0 1 0 1
## V5 0 1 0 0 0 0 0 1 1 0
## V6 0 0 0 0 1 0 1 0 0 1
```
Using the `coin` function, the coincidence matrix \(\mathbf{C}\) can be obtained:
```
## n= 100
## 1 2 3 4 5 6 odd even small large
## 1 15
## 2 0 13
## 3 0 0 26
## 4 0 0 0 18
## 5 0 0 0 0 13
## 6 0 0 0 0 0 15
## odd 15 0 26 0 13 0 54
## even 0 13 0 18 0 15 0 46
## small 15 13 26 0 0 0 41 13 54
## large 0 0 0 18 13 15 13 33 0 46
```
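Because every roll shows exactly one face, the odd, even, small and large rows of this matrix follow from the face frequencies on the diagonal. A quick arithmetic check in base R, using the values printed above:

```r
# Face frequencies from the diagonal of C
freq <- c(`1` = 15, `2` = 13, `3` = 26, `4` = 18, `5` = 13, `6` = 15)
count <- function(faces) sum(freq[as.character(faces)])

odd   <- c(1, 3, 5)
even  <- c(2, 4, 6)
small <- c(1, 2, 3)
large <- c(4, 5, 6)

checks <- c(n          = sum(freq),                       # 100 rolls
            odd        = count(odd),                      # 15 + 26 + 13 = 54
            odd_small  = count(intersect(odd, small)),    # 15 + 26 = 41
            even_large = count(intersect(even, large)),   # 18 + 15 = 33
            odd_large  = count(intersect(odd, large)),    # 13
            even_small = count(intersect(even, small)))   # 13
print(checks)
```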
The nodes and edges can be calculated from the coincidence matrix \(\mathbf{C}\), and then the network object can be generated:
```r
library(netCoin)
# C is the coincidence matrix obtained above with coin()
N <- asNodes(C)       # node data frame
E <- edgeList(C)      # edge data frame
Net <- netCoin(N, E)  # network object
```
The network to be visualised is created using the following command which generates a folder with an index.html file to open with a browser that will display the interface shown below:
The following example uses data about families of Renaissance Italy from Padgett & Ansell (1993). It consists of a data frame (`families`) with information about Italian families of the Renaissance, and another data frame (`links`) with the marriage and business links between families.
The previous `coin`, `edgeList`, `asNodes` and `netCoin` functions can be executed together with the `allNet` function, where several parameters can be specified:
With the following commands, two networks are generated that represent the marriage and business links between the families.
```r
library(netCoin)
data("families")  # attributes of each family
data("links")     # marriage and business links between families
G <- allNet(incidence = links[links$link == "Marriage", -17],
            nodes = families, layout = "md", percentages = FALSE,
            criteria = "f", minL = 1, size = "f.Marriages", shape = "seat",
            main = "Marriage links between Italian families",
            note = "Data source: Padgett & Ansell (1993)")
H <- allNet(incidence = links[links$link == "Business", -17],
            nodes = families, layout = "md", percentages = FALSE,
            criteria = "f", minL = 1, size = "f.Business", shape = "seat",
            main = "Business links between Italian families",
            note = "Data source: Padgett & Ansell (1993)")
```
Once the two networks are ready, the `multigraphCreate` function generates both graphs in the specified file.
This section uses one of the most renowned data examples in ecology. Charles Darwin compiled data about 13 species of finches and where they could be found on 17 of the Galápagos Islands. Sanderson ….
Here we add a few extra features to our graph:
```r
library(netCoin)
data("Galapagos")
data("finches")
# Prepend the package's extdata path to each entry of the species field,
# so that it points to the bundled image file
finches$species <- system.file("extdata", finches$species,
                               package = "netCoin")
Net <- allNet(Galapagos, nodes = finches, criteria = "hyp", maxL = .05,
              lwidth = "Haberman", lweight = "Haberman",
              size = "frequency", image = "species", layout = "mds",
              main = "Species coincidences in Galapagos Islands",
              note = "Data source: Sanderson (2000)")
plot(Net)
```