Introducing EcoGenetics

EcoGenetics is an R package with infrastructure and flexible methods designed to make easier and faster the data analysis routines of population ecologists and geneticists. These routines usually involve a series of operations over multi-source datasets, that can be simplified in a set of functions. The package defines a new class of object, “ecogen”, a data structure where multi-source information of populations can be stored. A series of methods for multi-source data manipulation are defined for ecogen objects. EcoGenetics also provides functions for exploratory analysis of Spatial Autocorrelation, for single and multiple variables, with an interactive data visualization environment.

Storing and organizing multi-source population data with ecogen objects

There is no movement apart from things; for change is always according to the categories of being
- Aristotle

The class “ecogen” was designed for a more efficient managing of the multi-source information in the different stages of population data analyses, and for a straightforward data pre-processing and storage. An object of this class behaves like an ordered “stack” of information, and possess a set of slots, each one storing data from a different source:

XY : data frame with geographic coordinates
P : phenotypic data frame
G : genotypic data frame
A : matrix of allele counts (only available for codominant markers)
E : environmental data frame
S : data frame with classes assigned to the individuals
C : custom data frame
OUT : list for storage of results

Construction of ecogen objects

First, we load the sample data contained in the package:

library("EcoGenetics")

data(eco.test)

The data frames contained in the workspace represent information from different sources for 225 individuals, as indicated by their names. A new ecogen object is created with the constructor as follows:

eco <- ecogen(XY = coordinates, P = phenotype, G = genotype, E = environment, S = structure, order.G = TRUE)

## Note: ordered genotypes in slot G

eco

## 
##                || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
##  Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)> 
##  See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
## 
##  | slot XY:    | --> 225 x 2        coordinates
##  | slot P:     | --> 225 x 8        phenotypic variables
##  | slot G:     | --> 225 x 10       loci         >> ploidy: 2 || codominant
##  | slot A:     | --> 225 x 40       alleles
##  | slot E:     | --> 225 x 6        environmental variables
##  | slot S:     | --> 225 x 1        structure    >> 1 structure found
##  | slot C:     | --> 0 x 0          variables
##  | slot OUT:   | --> 0              results
## ----------------------------------------------------------------------------

The names of individuals must be included as the row names of the data frames, or they may be configured during the call to the constructor. The number of rows of the non empty data frames must be identical (same individuals in same order). Otherwise, the program will return an error message:

eco_bad <- ecogen(XY = coordinates[1:10, ], P = phenotype, G = genotype, E = environment, S = structure, order.G = TRUE)

## Error in validObject(object): invalid class “ecogen” object: number of rows differ for non empty data frames

Unordered data frames can be ordered using the option order.df = TRUE. Data frames are ordered using as reference the order of the first non-empty slot, in this case "XY":

XY_random <- coordinates[sample(nrow(coordinates)), ]
eco_rand <- ecogen(XY = XYrandom, P = phenotype, G = genotype, E = environment, S = structure, order.G = TRUE, order.df = TRUE)

Names can be set for all the non-empty data frames with the option set.names:

myNames <- paste0("X_", seq_len(nrow(coordinates)))

eco_myNames <- ecogen(XY = coordinates, P = phenotype, G = genotype, E = environment, S = structure, order.G = TRUE, set.names = newNames)

The program can also create a set of valid names (individuals named as I₁, I₂,..., I_n):

eco_autoNames <- ecogen(XY = coordinates, P = phenotype, G = genotype, E = environment, S = structure, order.G = TRUE, valid.names = TRUE)

Use of accessor functions with ecogen objects

EcoGenetics defines accessor functions to get (“get” mode) and set (“set” mode) the slot content of the objects defined in the package (see ?`EcoGenetics accessors`). The following notation is used for the name of the accessor functions: a prefix “ecoslot.”, followed by the name of the respective slot. In the case of ecogen objects, and using the object “eco” of the example as reference, the corresponding accessors are: ecoslot.XY(eco), ecoslot.P(eco), ecoslot.G(eco), ecoslot.A(eco), ecoslot.E(eco), ecoslot.S(eco), ecoslot.C(eco) and ecoslot.OUT(eco). The correct assignation of content to the slot of an existent object is made with accessors; these special functions ensure a basic pre-processing and checking of the data when used in “set” mode. A new ecogen object can be created using the constructor alone or combined with accessors:

# Creation of an ecogen object with only XY and P slots filled
eco.temp <- ecogen(XY = coordinates, P = phenotype)
eco.temp

## 
##                || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
##  Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)> 
##  See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
## 
##  | slot XY:    | --> 225 x 2        coordinates
##  | slot P:     | --> 225 x 8        phenotypic variables
##  | slot G:     | --> 0 x 0          loci         
##  | slot A:     | --> 0 x 0          alleles
##  | slot E:     | --> 0 x 0          environmental variables
##  | slot S:     | --> 0 x 0          structures   
##  | slot C:     | --> 0 x 0          variables
##  | slot OUT:   | --> 0              results
## ----------------------------------------------------------------------------

# ecoslot.G “set” mode
ecoslot.G(eco.temp, order.G = TRUE) <- genotype

## Note: ordered genotypes in slot G

# Note: ordered genotypes in slot G 

# ecoslot.G “get” mode (only showing top of the table here)
head(ecoslot.G(eco.temp))

##     G1 G2 G3 G4 G5 G6 G7 G8 G9 G10
## I.1 11 55 33 22 11 12 11 33 12  25
## I.2 11 22 33 22 11 11 11 33 22  44
## I.3 33 55 23 22 11 22 66 33 22  55
## I.4 14 55 22 11 11 22 44 33 12  22
## I.5 44 22 22 22 44 11 22 33 33  55
## I.6 44 22 22 22 33 12 22 44 11  55

# Fill the oher slots
ecoslot.E(eco.temp) <- environment 
ecoslot.S(eco.temp) <- structure 
eco.temp

## 
##                || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
##  Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)> 
##  See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
## 
##  | slot XY:    | --> 225 x 2        coordinates
##  | slot P:     | --> 225 x 8        phenotypic variables
##  | slot G:     | --> 225 x 10       loci         >> ploidy: 2 || codominant
##  | slot A:     | --> 225 x 40       alleles
##  | slot E:     | --> 225 x 6        environmental variables
##  | slot S:     | --> 225 x 1        structure    >> 1 structure found
##  | slot C:     | --> 0 x 0          variables
##  | slot OUT:   | --> 0              results
## ----------------------------------------------------------------------------

As with the constructor, the number of rows of the non empty data frames must be identical in the output object. Trying to assign data frames with invalid number of rows will trigger an error message:

ecoslot.P(eco) <- phenotype[1:5, ]

## Error in validObject(object): invalid class “ecogen” object: 1: number of rows differ for non empty data frames 
## invalid class “ecogen” object: 2: invalid length in object names

Algebra of ecogen objects

The class “ecogen” has a set of operations for multi-source data manipulation. These operations can be divided into subset, split and combine methods:

1. Subset methods

A. Integer subsetting

eco.sub <- eco[1:50] # New object with first 50 individuals from eco
eco.sub

## 
##                || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
##  Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)> 
##  See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
## 
##  | slot XY:    | --> 50 x 2         coordinates
##  | slot P:     | --> 50 x 8         phenotypic variables
##  | slot G:     | --> 50 x 10        loci         >> ploidy: 2 || codominant
##  | slot A:     | --> 50 x 40        alleles
##  | slot E:     | --> 50 x 6         environmental variables
##  | slot S:     | --> 50 x 1         structure    >> 1 structure found
##  | slot C:     | --> 0 x 0          variables
##  | slot OUT:   | --> 0              results
## ----------------------------------------------------------------------------

B. Logical subsetting

eco[ecoslot.P(eco)[, 3] > 0.5]

## 
##                || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
##  Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)> 
##  See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
## 
##  | slot XY:    | --> 206 x 2        coordinates
##  | slot P:     | --> 206 x 8        phenotypic variables
##  | slot G:     | --> 206 x 10       loci         >> ploidy: 2 || codominant
##  | slot A:     | --> 206 x 40       alleles
##  | slot E:     | --> 206 x 6        environmental variables
##  | slot S:     | --> 206 x 1        structure    >> 1 structure found
##  | slot C:     | --> 0 x 0          variables
##  | slot OUT:   | --> 0              results
## ----------------------------------------------------------------------------

# The length of the logical vector must be identical to the number of rows of non empty slots. If not, an error will be produced:
eco[c(rep(TRUE, 200))]

# For using a logical vector of length < nrow of non empty slots, the logical vector be converted into integer with the function “which”
eco[which(c(TRUE,TRUE,TRUE))] # which(c(TRUE,TRUE,TRUE)) equivalent to c(1,2,3)

## 
##                || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
##  Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)> 
##  See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
## 
##  | slot XY:    | --> 3 x 2          coordinates
##  | slot P:     | --> 3 x 8          phenotypic variables
##  | slot G:     | --> 3 x 10         loci         >> ploidy: 2 || codominant
##  | slot A:     | --> 3 x 18         alleles
##  | slot E:     | --> 3 x 6          environmental variables
##  | slot S:     | --> 3 x 1          structure    >> 1 structure found
##  | slot C:     | --> 0 x 0          variables
##  | slot OUT:   | --> 0              results
## ----------------------------------------------------------------------------

# Special cases
eco[] # returns the same object

## 
##                || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
##  Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)> 
##  See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
## 
##  | slot XY:    | --> 225 x 2        coordinates
##  | slot P:     | --> 225 x 8        phenotypic variables
##  | slot G:     | --> 225 x 10       loci         >> ploidy: 2 || codominant
##  | slot A:     | --> 225 x 40       alleles
##  | slot E:     | --> 225 x 6        environmental variables
##  | slot S:     | --> 225 x 1        structure    >> 1 structure found
##  | slot C:     | --> 0 x 0          variables
##  | slot OUT:   | --> 0              results
## ----------------------------------------------------------------------------

eco[FALSE]; eco[numeric(0)]; eco[logical(0)] # return an empty object

## 
##                || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
##  Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)> 
##  See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
## 
##  | slot XY:    | --> 0 x 0          coordinates
##  | slot P:     | --> 0 x 0          phenotypic variables
##  | slot G:     | --> 0 x 0          loci         
##  | slot A:     | --> 0 x 0          alleles
##  | slot E:     | --> 0 x 0          environmental variables
##  | slot S:     | --> 0 x 0          structures   
##  | slot C:     | --> 0 x 0          variables
##  | slot OUT:   | --> 0              results
## ----------------------------------------------------------------------------

C. Subsetting by group of the slot S

eco.subS <- eco.subset(eco,"pop", 1)
eco.subS

## 
##                || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
##  Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)> 
##  See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
## 
##  | slot XY:    | --> 174 x 2        coordinates
##  | slot P:     | --> 174 x 8        phenotypic variables
##  | slot G:     | --> 174 x 10       loci         >> ploidy: 2 || codominant
##  | slot A:     | --> 174 x 40       alleles
##  | slot E:     | --> 174 x 6        environmental variables
##  | slot S:     | --> 174 x 1        structure    >> 1 structure found
##  | slot C:     | --> 0 x 0          variables
##  | slot OUT:   | --> 0              results
## ----------------------------------------------------------------------------

2. Split methods

# Split as list of ecogen objects
x <- eco.split(eco, "pop", asList = TRUE)

# Split and create objects with prefix "eco" in the workspace
eco.split(eco, hier = "pop", asList = FALSE)

## New object created in workspace: eco.1 
## New object created in workspace: eco.2 
## New object created in workspace: eco.3 
## New object created in workspace: eco.4

# Split and create objects with prefix "newObjects" in the workspace
eco.split(eco, hier = "pop", name = "newObjects", asList = FALSE)

## New object created in workspace: newObjects.1 
## New object created in workspace: newObjects.2 
## New object created in workspace: newObjects.3 
## New object created in workspace: newObjects.4

3. Combine methods

A. Union by row

# Re-bind the previous split sets
rebinded <- eco.rbind(eco.1, eco.2, eco.3)# rebinding objects in workspace
eco.bind <- eco.rbind(x) #rebinding the list of ecogen objects
# Note that different subsets can also be created
S1.3 <- eco.rbind(x[[1]], x[[3]])

B. Union by column

eco1 <- eco
eco.c <- eco.cbind(eco, eco1)

## Note: null or duplicated column names. using generic labels.

eco.c

## 
##                || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
##  Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)> 
##  See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
## 
##  | slot XY:    | --> 225 x 2        coordinates
##  | slot P:     | --> 225 x 16       phenotypic variables
##  | slot G:     | --> 225 x 20       loci         >> ploidy: 2 || codominant
##  | slot A:     | --> 225 x 80       alleles
##  | slot E:     | --> 225 x 12       environmental variables
##  | slot S:     | --> 225 x 2        structures   >> 2 structures found
##  | slot C:     | --> 0 x 0          variables
##  | slot OUT:   | --> 0              results
## ----------------------------------------------------------------------------

C. Intersection

eco1 <- eco[2:20]
merged <- eco.merge(eco, eco1)
merged

## 
##                || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
##  Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)> 
##  See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
## 
##  | slot XY:    | --> 19 x 4         coordinates
##  | slot P:     | --> 19 x 16        phenotypic variables
##  | slot G:     | --> 19 x 20        loci         >> ploidy: 2 || codominant
##  | slot A:     | --> 19 x 74        alleles
##  | slot E:     | --> 19 x 12        environmental variables
##  | slot S:     | --> 19 x 2         structures   >> 2 structures found
##  | slot C:     | --> 0 x 0          variables
##  | slot OUT:   | --> 0              results
## ----------------------------------------------------------------------------

Other operations

nrow(eco) # number of rows in each slot

##  XY   P   G   A   E   S   C 
## 225 225 225 225 225 225   0

ncol(eco) # number of columns in each slot

## XY  P  G  A  E  S  C 
##  2  8 10 40  6  1  0

dim(eco) # dimension of each slot

## $XY
## [1] 225   2
## 
## $P
## [1] 225   8
## 
## $G
## [1] 225  10
## 
## $A
## [1] 225  40
## 
## $E
## [1] 225   6
## 
## $S
## [1] 225   1
## 
## $C
## [1] 0 0

as.list(eco) # conversion to list

Saving/loading ecogen objects

Serialization is a mechanism that allows to convert an object into a storable file, that can be saved on disk and restored in future R sessions.

Serialization provides a clean and compact option to save ecogen objects at the end of a session. This is performed by the function saveRDS of R:

saveRDS(eco, "myObject.rds")
# The object can be loaded in the future with the function readRDS:
eco_restored <- readRDS("myObject.rds")
eco_restored

## 
##                || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
##  Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)> 
##  See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
## 
##  | slot XY:    | --> 225 x 2        coordinates
##  | slot P:     | --> 225 x 8        phenotypic variables
##  | slot G:     | --> 225 x 10       loci         >> ploidy: 2 || codominant
##  | slot A:     | --> 225 x 40       alleles
##  | slot E:     | --> 225 x 6        environmental variables
##  | slot S:     | --> 225 x 1        structure    >> 1 structure found
##  | slot C:     | --> 0 x 0          variables
##  | slot OUT:   | --> 0              results
## ----------------------------------------------------------------------------

identical(eco, eco_restored)

## [1] TRUE

Ecogen conversion from/to other formats

ecogen objects can be imported/exported from/to: Genepop (Rousset 2008), SPAGeDi (Hardy and Vekemans 2002), adegenet (genind objects, Jombart 2008), gstudio (data frame with loci objects, Dyer 2016)
ecogen objects can be exported also to hierfstat ( Goudet and Jombart 2015) and Geneland (Guillot et al. 2005)

Import/export from/to Genepop

# ingpop, file with Genepop format in the folder "/extdata" of the package
ecopath <- paste(path.package("EcoGenetics"), "/extdata/ingpop", sep = "")
ingpop <- genepop2ecogen(ecopath)
ingpop

## 
##                || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
##  Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)> 
##  See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
## 
##  | slot XY:    | --> 0 x 0          coordinates
##  | slot P:     | --> 0 x 0          phenotypic variables
##  | slot G:     | --> 225 x 10       loci         >> ploidy: 2 || codominant
##  | slot A:     | --> 225 x 40       alleles
##  | slot E:     | --> 0 x 0          environmental variables
##  | slot S:     | --> 225 x 1        structure    >> 1 structure found
##  | slot C:     | --> 0 x 0          variables
##  | slot OUT:   | --> 0              results
## ----------------------------------------------------------------------------

ecogen2genepop(ingpop, dir = "", outName = "infile.genepop.txt", grp = "pop")
# dir = "" writes the file to the current working directory

Import/export from/to SPAGeDi

ecogen2spagedi(eco, dir = "", pop = "pop", ndig = 1, int=2, smax = 6, outName="infile.spagedi.txt")

ecosp <- spagedi2ecogen("infile.spagedi.txt", sep = "")

ecosp

## 
##                || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
##  Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)> 
##  See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
## 
##  | slot XY:    | --> 225 x 2        coordinates
##  | slot P:     | --> 0 x 0          phenotypic variables
##  | slot G:     | --> 225 x 10       loci         >> ploidy: 2 || codominant
##  | slot A:     | --> 225 x 40       alleles
##  | slot E:     | --> 0 x 0          environmental variables
##  | slot S:     | --> 225 x 1        structure    >> 1 structure found
##  | slot C:     | --> 0 x 0          variables
##  | slot OUT:   | --> 0              results
## ----------------------------------------------------------------------------

Import/export from/to genind

# ecogen to genind
outGenind <- ecogen2genind(eco)
outGenind

## /// GENIND OBJECT /////////
## 
##  // 225 individuals; 10 loci; 40 alleles; size: 73 Kb
## 
##  // Basic content
##    @tab:  225 x 40 matrix of allele counts
##    @loc.n.all: number of alleles per locus (range: 2-6)
##    @loc.fac: locus factor for the 40 columns of @tab
##    @all.names: list of allele names for each locus
##    @ploidy: ploidy of each individual  (range: 2-2)
##    @type:  codom
##    @call: NULL
## 
##  // Optional content
##    @strata: a data frame with 1 columns ( pop )

# genind to ecogen
outEco <- genind2ecogen(outGenind)
outEco

## 
##                || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
##  Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)> 
##  See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
## 
##  | slot XY:    | --> 0 x 0          coordinates
##  | slot P:     | --> 0 x 0          phenotypic variables
##  | slot G:     | --> 225 x 10       loci         >> ploidy: 2 || codominant
##  | slot A:     | --> 225 x 40       alleles
##  | slot E:     | --> 0 x 0          environmental variables
##  | slot S:     | --> 225 x 1        structure    >> 1 structure found
##  | slot C:     | --> 0 x 0          variables
##  | slot OUT:   | --> 0              results
## ----------------------------------------------------------------------------

Import/export from/to gstudio

togstudio <- ecogen2gstudio(eco)
head(togstudio)

##      ID Latitude Longitude pop  G1  G2  G3  G4  G5  G6  G7  G8  G9 G10
## I.1 I.1        1         1   1 1:1 5:5 3:3 2:2 1:1 1:2 1:1 3:3 1:2 2:5
## I.2 I.2        1         2   1 1:1 2:2 3:3 2:2 1:1 1:1 1:1 3:3 2:2 4:4
## I.3 I.3        1         3   1 3:3 5:5 2:3 2:2 1:1 2:2 6:6 3:3 2:2 5:5
## I.4 I.4        1         4   1 1:4 5:5 2:2 1:1 1:1 2:2 4:4 3:3 1:2 2:2
## I.5 I.5        1         5   1 4:4 2:2 2:2 2:2 4:4 1:1 2:2 3:3 3:3 5:5
## I.6 I.6        1         6   1 4:4 2:2 2:2 2:2 3:3 1:2 2:2 4:4 1:1 5:5

toeco <- gstudio2ecogen(togstudio, lat = "Latitude", lon = "Longitude", ID = "ID", struct = "pop")
toeco

## 
##                || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
##  Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)> 
##  See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
## 
##  | slot XY:    | --> 225 x 2        coordinates
##  | slot P:     | --> 0 x 0          phenotypic variables
##  | slot G:     | --> 225 x 10       loci         >> ploidy: 2 || codominant
##  | slot A:     | --> 225 x 40       alleles
##  | slot E:     | --> 0 x 0          environmental variables
##  | slot S:     | --> 225 x 1        structure    >> 1 structure found
##  | slot C:     | --> 0 x 0          variables
##  | slot OUT:   | --> 0              results
## ----------------------------------------------------------------------------

Export to hierfstat

hiereco <- ecogen2hierfstat(eco, "pop")
require("hierfstat")
basic.stats(hiereco)

## $perloc
##         Ho     Hs     Ht    Dst    Htp   Dstp    Fst   Fstp    Fis   Dest
## G1  0.0926 0.4222 0.4964 0.0742 0.5212 0.0990 0.1496 0.1899 0.7807 0.1713
## G2  0.1319 0.7080 0.7333 0.0252 0.7417 0.0336 0.0344 0.0453 0.8137 0.1152
## G3  0.0454 0.5713 0.6751 0.1038 0.7097 0.1384 0.1538 0.1950 0.9206 0.3229
## G4  0.0750 0.4825 0.5016 0.0191 0.5080 0.0254 0.0380 0.0501 0.8446 0.0492
## G5  0.1160 0.5910 0.6790 0.0880 0.7084 0.1173 0.1296 0.1656 0.8037 0.2869
## G6  0.0717 0.3496 0.6425 0.2929 0.7401 0.3905 0.4559 0.5277 0.7949 0.6005
## G7  0.1013 0.7459 0.8242 0.0782 0.8502 0.1043 0.0949 0.1227 0.8643 0.4105
## G8  0.1153 0.4682 0.7295 0.2612 0.8166 0.3483 0.3581 0.4266 0.7538 0.6550
## G9  0.0796 0.5632 0.6216 0.0585 0.6411 0.0780 0.0941 0.1216 0.8586 0.1785
## G10 0.1290 0.7573 0.7748 0.0175 0.7807 0.0234 0.0226 0.0300 0.8297 0.0964
## 
## $overall
##     Ho     Hs     Ht    Dst    Htp   Dstp    Fst   Fstp    Fis   Dest 
## 0.0958 0.5659 0.6678 0.1019 0.7018 0.1358 0.1525 0.1936 0.8308 0.3129

Export to Geneland

ecogen2geneland(eco, dir = "", ncod = 1)
# dir = "" writes the file to the current working directory

Data manipulation functions

The package defines several functions to manipulate data, many designed for genetic data structures:

eco.convert

The function interconverts genetic data among matrix format (one locus or one allele per column) and list format (one locus or one allele per column)

## Not run: 

data(eco3)

# One allele per column
loc2al <- eco.convert(eco3[["G"]], "matrix", "alleles.matrix", ploidy = 2)
head(loc2al)

##       G1.1 G1.2 G2.1 G2.2 G3.1 G3.2 G4.1 G4.2 G5.1 G5.2 G6.1 G6.2 G7.1
## Ind.1 "3"  "7"  "3"  "6"  "5"  "6"  "2"  "2"  "2"  "4"  "2"  "3"  "2" 
## Ind.2 "1"  "9"  "3"  "8"  "3"  "7"  "2"  "3"  "2"  "1"  "1"  "1"  "4" 
## Ind.3 "2"  "6"  "6"  "1"  "3"  "7"  "2"  "3"  "4"  "1"  "2"  "1"  "4" 
## Ind.4 "7"  "1"  "3"  "7"  "3"  "1"  "3"  "1"  "2"  "3"  "2"  "1"  "7" 
## Ind.5 "6"  "5"  "3"  "2"  "4"  "8"  "3"  "1"  "4"  "3"  "1"  "4"  "5" 
## Ind.6 "5"  "4"  "8"  "5"  NA   NA   "3"  "1"  "2"  "1"  "1"  "4"  "4" 
##       G7.2 G8.1 G8.2
## Ind.1 "7"  "3"  "7" 
## Ind.2 "4"  "1"  "9" 
## Ind.3 "1"  "2"  "6" 
## Ind.4 "5"  "7"  "1" 
## Ind.5 "3"  "6"  "5" 
## Ind.6 "1"  "5"  "4"

# Inverse operation (collapse alleles into locus)
al2loc <- eco.convert(loc2al, "alleles.matrix", "matrix", ploidy = 2)
head(al2loc)

##       G1   G2   G3   G4   G5   G6   G7   G8  
## Ind.1 "37" "36" "56" "22" "24" "23" "27" "37"
## Ind.2 "19" "38" "37" "23" "21" "11" "44" "19"
## Ind.3 "26" "61" "37" "23" "41" "21" "41" "26"
## Ind.4 "71" "37" "31" "31" "23" "21" "75" "71"
## Ind.5 "65" "32" "48" "31" "43" "14" "53" "65"
## Ind.6 "54" "85" NA   "31" "21" "14" "41" "54"

# Separating alleles with a character string
loc2loc <- eco.convert(eco3[["G"]], "matrix", "matrix", ploidy = 2, sep.out = "/")
head(loc2loc)

##       G1    G2    G3    G4    G5    G6    G7    G8   
## Ind.1 "3/7" "3/6" "5/6" "2/2" "2/4" "2/3" "2/7" "3/7"
## Ind.2 "1/9" "3/8" "3/7" "2/3" "2/1" "1/1" "4/4" "1/9"
## Ind.3 "2/6" "6/1" "3/7" "2/3" "4/1" "2/1" "4/1" "2/6"
## Ind.4 "7/1" "3/7" "3/1" "3/1" "2/3" "2/1" "7/5" "7/1"
## Ind.5 "6/5" "3/2" "4/8" "3/1" "4/3" "1/4" "5/3" "6/5"
## Ind.6 "5/4" "8/5" NA    "3/1" "2/1" "1/4" "4/1" "5/4"

# Inverse operation (removing separator)
loc2loc.nosep <- eco.convert(loc2loc, "matrix", "matrix", ploidy = 2, sep.in = "/", sep.out = "")
head(loc2loc.nosep)

##       G1   G2   G3   G4   G5   G6   G7   G8  
## Ind.1 "37" "36" "56" "22" "24" "23" "27" "37"
## Ind.2 "19" "38" "37" "23" "21" "11" "44" "19"
## Ind.3 "26" "61" "37" "23" "41" "21" "41" "26"
## Ind.4 "71" "37" "31" "31" "23" "21" "75" "71"
## Ind.5 "65" "32" "48" "31" "43" "14" "53" "65"
## Ind.6 "54" "85" NA   "31" "21" "14" "41" "54"

# Locus to list
loc2list <- eco.convert(eco3[["G"]], "matrix", "list", ploidy = 2)

# Locus to allele list
al2list <- eco.convert(eco3[["G"]], "matrix",  "alleles.list", ploidy = 2)

# The inverse operations are also defined. All the formats are interconvertible.
# Locus operations have defined a "within" operation (matrix to matrix, list to list), 
# with the purpose of put/remove separators between alleles. The program accepts any ploidy level.

eco.format

This function can format data with different levels of ploidy, allowing to:

add/remove zeros at the beginning/end of each allele
separate alleles with a character
divide alleles into columns
bind alleles from separate columns
transform character data into numeric data

# Adding zeros
example <- as.matrix(genotype[1:10,])
mode(example) <- "character"

example

##      G1   G2   G3   G4   G5   G6   G7   G8   G9   G10 
## I.1  "11" "55" "33" "22" "11" "12" "11" "33" "21" "25"
## I.2  "11" "22" "33" "22" "11" "11" "11" "33" "22" "44"
## I.3  "33" "55" "32" "22" "11" "22" "66" "33" "22" "55"
## I.4  "41" "55" "22" "11" "11" "22" "44" "33" "12" "22"
## I.5  "44" "22" "22" "22" "44" "11" "22" "33" "33" "55"
## I.6  "44" "22" "22" "22" "33" "21" "22" "44" "11" "55"
## I.7  "12" "33" "33" "11" "43" "11" "22" "33" "22" "55"
## I.8  "11" "33" "22" "22" "11" "11" "11" "11" "11" "11"
## I.9  "11" "33" "22" "22" "11" "22" "22" "11" "22" "22"
## I.10 "22" "35" "11" "11" "33" "22" "11" "33" "22" "33"

recoded <- eco.format(example, ncod = 1, ploidy = 2, nout = 3)
head(recoded)

##     G1       G2       G3       G4       G5       G6       G7      
## I.1 "100100" "500500" "300300" "200200" "100100" "100200" "100100"
## I.2 "100100" "200200" "300300" "200200" "100100" "100100" "100100"
## I.3 "300300" "500500" "300200" "200200" "100100" "200200" "600600"
## I.4 "400100" "500500" "200200" "100100" "100100" "200200" "400400"
## I.5 "400400" "200200" "200200" "200200" "400400" "100100" "200200"
## I.6 "400400" "200200" "200200" "200200" "300300" "200100" "200200"
##     G8       G9       G10     
## I.1 "300300" "200100" "200500"
## I.2 "300300" "200200" "400400"
## I.3 "300300" "200200" "500500"
## I.4 "300300" "100200" "200200"
## I.5 "300300" "300300" "500500"
## I.6 "400400" "100100" "500500"

# Tetraploid data, separating alleles with a "/"
tetrap <- as.matrix(example)
# Simulated tetraploid example data
tetrap <- matrix(paste(example,example, sep = ""), ncol = ncol(example)) 
recoded <- eco.format(tetrap, ncod = 1, ploidy = 4, sep.out = "/")

## Note: null or duplicated column names. using generic labels.

head(recoded)

##      [,1]              [,2]              [,3]             
## [1,] "100/100/100/100" "500/500/500/500" "300/300/300/300"
## [2,] "100/100/100/100" "200/200/200/200" "300/300/300/300"
## [3,] "300/300/300/300" "500/500/500/500" "300/200/300/200"
## [4,] "400/100/400/100" "500/500/500/500" "200/200/200/200"
## [5,] "400/400/400/400" "200/200/200/200" "200/200/200/200"
## [6,] "400/400/400/400" "200/200/200/200" "200/200/200/200"
##      [,4]              [,5]              [,6]             
## [1,] "200/200/200/200" "100/100/100/100" "100/200/100/200"
## [2,] "200/200/200/200" "100/100/100/100" "100/100/100/100"
## [3,] "200/200/200/200" "100/100/100/100" "200/200/200/200"
## [4,] "100/100/100/100" "100/100/100/100" "200/200/200/200"
## [5,] "200/200/200/200" "400/400/400/400" "100/100/100/100"
## [6,] "200/200/200/200" "300/300/300/300" "200/100/200/100"
##      [,7]              [,8]              [,9]             
## [1,] "100/100/100/100" "300/300/300/300" "200/100/200/100"
## [2,] "100/100/100/100" "300/300/300/300" "200/200/200/200"
## [3,] "600/600/600/600" "300/300/300/300" "200/200/200/200"
## [4,] "400/400/400/400" "300/300/300/300" "100/200/100/200"
## [5,] "200/200/200/200" "300/300/300/300" "300/300/300/300"
## [6,] "200/200/200/200" "400/400/400/400" "100/100/100/100"
##      [,10]            
## [1,] "200/500/200/500"
## [2,] "400/400/400/400"
## [3,] "500/500/500/500"
## [4,] "200/200/200/200"
## [5,] "500/500/500/500"
## [6,] "500/500/500/500"

aue.rescale

The function scales each column of a data frame or a matrix to [0,1] or [-1,1] range

data(eco.test)
require(adegenet)
pc <- dudi.pca(eco[["P"]], scannf = FALSE, nf = 3)
pc <- pc$li
pc <- aue.rescale(pc)

plot(eco[["XY"]][, 1], eco[["XY"]][, 2], col = rgb(pc), pch = 16, cex = 1.5, xlab ="X", ylab= "Y")

aue.sort

Order the content of cells in a matrix, following a series of rules determined by the user, by means of the parameters “ploidy” (ploidy of genetic data; the term can be used in a wide sense for other classes of data) and “ncod” (number of digits coding each allele, also extensible in a more general sense for other classes of data).

geno <- c(12, 52, 62, 45, 54, 21)
geno <- matrix(geno, 3, 2)

# Ordering the data 
aue.sort(geno, ploidy = 2)

##      [,1] [,2]
## [1,] "12" "45"
## [2,] "25" "45"
## [3,] "26" "12"

# Decreasing sort order
aue.sort(geno, ploidy = 2, decreasing = TRUE)

##      [,1] [,2]
## [1,] "21" "54"
## [2,] "52" "54"
## [3,] "62" "21"

geno2 <- c(456123, 524556, 629359, 459459, 950543, 219405)
geno2 <- matrix(geno2, 3, 2)
geno2

##        [,1]   [,2]
## [1,] 456123 459459
## [2,] 524556 950543
## [3,] 629359 219405

# Ordering the data as diploid
aue.sort(geno2, ploidy = 2)  # the data is ordered using blocks of 3 characters  (ie,  "234561 = 234 - 561")

##      [,1]     [,2]    
## [1,] "123456" "459459"
## [2,] "524556" "543950"
## [3,] "359629" "219405"

# Ordering the data as triploid
aue.sort(geno2, ploidy = 3)  # the data is ordered using blocks of 2 characters (ie,  "234561 = 23 - 45 - 61")

##      [,1]     [,2]    
## [1,] "234561" "455994"
## [2,] "455256" "054395"
## [3,] "596293" "052194"

Interactive data exploration

Everything is related to everything else, but near things are more related than distant things
- Tobler’s first law of geography

Preliminary concepts

Following Sokal (1983) this tutorial uses the terms “univariate” for the analysis of single variables, “multivariable” for the parallel analysis of several variables (i.e., the analysis returns an individual result for each variable), and “multivariate” for the analysis of several variables, but considered jointly (i.e., the analysis returns a single results for all the variables).

Spatial weights

Many of the functions of the package require spatial weights matrices as input. A spatial weights matrix W is a square positive matrix that defines the strength of the spatial relations between observations, assigning the value w_ij to the connection between individuals (i, j). In a binary weighting scheme, W corresponds to an adjacency matrix (“connection network”) indicating if the individual pairs (i, j) are connected (w_ij = 1) or not (w_ij = 0). In other situations, W is a matrix assigning a value to the connection (i, j) using a model for the spatial relations (e.g., following an exponential decay with distance, up to a threshold distance d, where the weights are set to 0). The matrix W is obtained in EcoGenetics using the function eco.weight:

data(eco3)

# "circle" method
con <- eco.weight(eco3[["XY"]], method = "circle", d1 = 0, d2 = 500)
con

## 
##  ################### 
##   spatial weights 
##  ################### 
## 
##   Method  -->  circle 
##   Parameters -->  (d1 = 0)(d2 = 500) 
##   Row-standardization -->  FALSE 
##   Self-included -->  FALSE 
##   Number of individuals -->  173 
##   Non-zero (non-self) links -->  4.8 % 
##   Individuals with non-zero (non-self) links -->  71.1 % 
##   Average (non-self) links per individual -->  8.3 
##   Average distance between connected individuals -->  314.25

class(con)

## [1] "eco.weight"
## attr(,"package")
## [1] "EcoGenetics"

Different definitions of weights are available in the package. In the example above, W contains binary weights. The criterion used to assign the values “0” and “1” is based in radial distances: every individual j is connected to i if j resides in a circle with center in i, where the distance d_ij between i and j is d1 < d _ij ≤ d2.

The function also allows to create a weights object from a custom adjacency matrix. The following example uses the package igraph (Csardi & Nepusz 2006) to construct a weights matrix:

require(igraph)

# Graph generation
tr <- make_tree(40, children = 3, mode = "undirected")

# Plot graph  
plot(tr, vertex.size = 10, vertex.label = NA)

# Convert into adjacency matrix
weights <- as.matrix(as_adj(tr))

# Provide names to the matrix
my_names <- 1:nrow(weights)
rownames(weights) <- colnames(weights) <- my_names

# Extract coordinates and provide names to the matrix
coord <- layout.auto(tr)
rownames(coord) <- my_names

# Conversion into eco.weight
my_weights <- eco.weight(XY = coord, W = weights)

## Custom W matrix detected. Method argument set as 'customW'

my_weights

## 
##  ################### 
##   spatial weights 
##  ################### 
## 
##   Method  -->  customW 
##   Parameters -->  NULL 
##   Row-standardization -->  FALSE 
##   Non-zero (non-self) links -->  5 % 
##   Number of individuals -->  40 
##   Individuals with non-zero (non-self) links -->  100 % 
##   Average (non-self) links per individual -->  2 
##   Average distance between connected individuals -->  2.388

Different plots, interactive and non interactive, are available for weights objects:

# "simple" plot
eco.plotWeight(my_weights,  type = "simple")

# Force network via "igraph"
myColors <- c(rep(1,13), rep(2, 9), rep(3, 9), rep(4, 9))
eco.plotWeight(my_weights, type = "igraph",group = myColors)

# Force network via "networkD3" (Gandrud et al. 2016) 

# You can interact with the graph, click and move the nodes
eco.plotWeight(my_weights, type="network", bounded = FALSE, group = myColors)

# Circle plot with bundled edges via "edgebundleR" (Bostock et al. 2016)
# Use the github version of edgebundleR to solve issues related to the order
# of the individuals in the current version (0.1.14), installing the package
# with the command: #devtools::install_github('garthtarr/edgebundleR')

# You can interact with the graph hovering over nodes
eco.plotWeight(my_weights, type="edgebundle", group = myColors)

Working with connection networks

Conversion from listw to eco.weight with the function eco.listw2ew

require(adegenet)
# Delaunay triangulation
temp <-chooseCN(eco3[["XY"]], type = 1, result.type = "listw", plot.nb = FALSE)

con <- eco.listw2ew(temp)

## Custom W matrix detected. Method argument set as 'customW'

# You can interact with the graph, click and move the nodes
eco.plotWeight(con, "network", bounded = TRUE, group = eco3[["S"]]$structure)

Creation of a list of weights following a set of rules

The function eco.lagweight creates a lagged series of weights matrices, dividing the distance range of individuals in classes. For each class, the program computes a weights matrix. The user can define how to divide the distance range into classes.

# Some examples

## example using Sturges' rule (default)
classlist <- eco.lagweight(eco[["XY"]]) 

## example dividing the distance range into 4 classes, using a cutoff distance of 15
classlist <- eco.lagweight(eco[["XY"]], nclass = 4, smax = 15)

## example using cutoff distances of 3 and 15, and a distance between consecutive classes of 2
classlist <- eco.lagweight(eco[["XY"]], int = 2, smin = 3, smax = 15)

## example including 1000 individuals per class
classlist <- eco.lagweight(eco[["XY"]], size = 1000)

Computing a statistic in a connection network

The function eco.slide.con allows to apply a function defined by the user, over the individuals included in a connection network. For a given variable, the program computes recursively a function for the individuals of the network, using all the individuals connected to each.

data(eco2)
myMatrix <- eco2[["P"]]
con <- eco.weight(XY = eco2[["XY"]], method = "knearest", k = 5)
result <- eco.slide.con(myMatrix, con, function(x) mean(x, na.rm = TRUE))

image(matrix(myMatrix[, 1], 30, 30)) # original image

image(matrix(result[, 1], 30, 30)) # smoothed image

Correlograms

A correlogram is a plot of spatial correlation values against distance classes. Different correlograms are supported:

For univariate and multivariable approaches, correlograms based in Moran's I (Moran 1950), Geary's C (Geary 1954), and bivariate Moran's I_xy (Reich et al. 1994)
For multivariate approaches, correlograms based in Mantel (Mantel 1967) and partial Mantel (Smouse et al. 1986) statistics
For a multivariate approach of genetic data in particular, the correlogram based on Nason’s kinship coefficient F_ij(Loiselle 1995; Kalisz 2001) or a custom kinship measure

Two approaches are available for correlogram construction:

Omnidirectional correlograms: the standard approach, do not discriminate the direction pointed by vectors connecting individuals pairs. The method assumes isotropy (the autocorrelation varies similarly with distance in all directions).
Directional correlograms: based on the construction of directional correlograms by the “bearing correlogram” method (Rosenberg 2000). The method can be used to explore whether the data is likely to hold the isotropic assumption, and if not (anisotropy), to construct correlograms for different directions. The method consists in the ponderation of the weights matrices used for the analysis by a factor that varies between 0 and 1, related with the direction pointed by the vector v connecting each pair of points. Given the individuals (i, j), each w_ij of the weights matrix W is recomputed as w'_ij = w_ij cos ² ( α_ij - θ ), where α_ij is the angle that v forms with the positive x axis (due East) in counterclockwise direction, and θ is the angle used for the directional analysis. When α_ij = θ , w_ij is pondered by 1 and does not change, and when α_ij = θ ± π/2, w_ij is 0. For the function eco.malecot, a directional slope can be computed as in Born (2012).

Correlograms based in I, C and I_xy are computed with the function eco.correlog:

moran <- eco.correlog(Z = eco[["A"]], XY = eco[["XY"]], method = "I", smax = 10, size = 1000)

# You can interact with the graphs
eco.plotCorrelog(moran)

moran2 <- eco.correlog(Z = eco[["P"]], XY = eco[["XY"]], method = "I", smax = 10, size = 1000)

eco.plotCorrelog(moran2)

A bearing correlogram using a Bearing Plot (Falsetti & Sokal, 1993; Rosenberg 2000). Bearing plots are available for eco.correlog and eco.cormantel. Instead of using distance classes as independent variables, bearing plots use the distance classes as fixed and the angles as independent variables. This method allows to see how direction affects autocorrelation within each distance class:

# Directions from 0 to 175 degrees counterclockwise from due E
moran_b <- eco.correlog(Z = eco[["P"]][, 3], XY = eco[["XY"]], method = "I", smax=10, size=1000, angle = seq(0, 175, 5))


# Bearing correlograms are plotted with the function eco.plotcorrelogB	
eco.plotCorrelogB(moran_b)

The correlograms for each fixed angle and distance classes as independent variables can also be constructed in the same way, setting the additional argument "as.deg = FALSE" in the eco.correlog/eco.cormantel function, and plotting with eco.plotCorrelog.

Mantel correlograms are computed with the function eco.cormantel:

# Checking threshold in a Mantel correlogram:
corm <- eco.cormantel(M = dist(eco[["A"]]), XY = eco[["XY"]],  nsim = 99, size = 1000)

eco.plotCorrelog(corm)

Correlograms for genetic data based in a kinship analysis measure (default: Nason’s F_ij) can be performed with the function eco.malecot:

globaltest <- eco.malecot(eco=eco, method = "global", smax = 4, size = 1000)
globaltest

eco.plotCorrelog(globaltest, errorbar = TRUE)

Directional approach:

globaltest_aniso <- eco.malecot(eco=eco, method = "global", smax=4, size=1000, angle = 120)

eco.plotCorrelog(globaltest_aniso)

Global SA analysis

In contrast with correlograms, which take the distance among samples as factor of analysis, global SA analysis implies the obtention of a single SA statistic across all the analyzed area (i.e., “globally”). Correlograms can be conceived as the computation of several global analyses, one for each distance class.

eco.gsa computes global spatial statistics (Moran’s I, Geary’s C, Join-count [Cliff & Ord 1981; Moran 1948], and the bivariate Moran’s I_xy). This program requires a weights matrix. In this example a cut-off distance of 4 (selected in function of the results of the correlograms) is used to compute a weights object and perform a global test for the Moran’s I statistic:

con <- eco.weight(eco[["XY"]], method = "circle", d2 = 4)

global <- eco.gsa(Z = eco[["A"]], con = con, method = "I", nsim = 200)

eco.plotGlobal(global)

The multivariate approach of global analysis is based on the use of Mantel and partial Mantel statistics. The use of the global test (Mantel test) is currently under active debate (see Legendre et al. 2015). In addition to the ordinary and partial Mantel tests, a version with truncated distance matrices can be performed in EcoGenetics. This is an alternative with higher power than the classical Mantel test when there is a specific ecological or genetic dispersion model in mind, i.e., the effect of the distance among sites can only be perceived up to a certain distance where contagion, dispersal of propagules in plants, or migration in animals, no longer creates spatial correlation among the sites (Legendre et al. 2015).

# Correlation is around 0 when distance between points is > 4 as seen with the multivariate correlogram and the correlograms for individual variables 
 
# We use the weights object “con” created above for truncation and computation of a truncated Mantel test
eco.mantel(dist(eco[["P"]]), dist(eco[["XY"]]), con = con, thres = 4, nsim = 999)

## 
##  ############################ 
##    Mantel test 
##  ############################ 
## 
##   > Correlation coefficient used -->  Pearson 
##   > Number of simulations -->  999 
##   > Alternative -->  greater 
##   > P-value -->  0.001 
##   > Observed value -->  0.1841 
##   > Expected value -->  -6e-04

Compute a bearing Mantel test for an angle of 37 degrees N from due E as used by Falsetti & Sokal (1993)

con_b <- eco.bearing(XY = eco[["XY"]], theta = 37)
eco.mantel(dist(eco[["P"]]), dist(eco[["XY"]]), con = con_b, thres = 4, nsim=999)

## 
##  ############################ 
##    Mantel test 
##  ############################ 
## 
##   > Correlation coefficient used -->  Pearson 
##   > Number of simulations -->  999 
##   > Alternative -->  greater 
##   > P-value -->  0.001 
##   > Observed value -->  0.1855 
##   > Expected value -->  -0.0011

Local SA analysis

Local SA analysis is based on the computation of a local SA measure. The function eco.lsa allows to compute local Moran’s I and Geary’s C (Anselin 1995), and Getis & Ord’s G_i and G_i* (Getis and Ord 1992; Ord and Getis 1995). For each individual, a SA statistic is obtained using a weights object specifying the spatial relations among individuals. For the following example, it is used the the weights object of the previous example. A distance of 4 represents, for a given individual, a circle of radius 4 centered on it, and those individuals within the circle are connected to the central one.

# Univariate test
localmoran <- eco.lsa(eco[["P"]][, 1], con, method = "I", nsim = 999)

# "rankplot" graph
eco.plotLocal(localmoran, significant = TRUE)

# Multivariable test
multiLocal <- eco.lsa(eco[["P"]], con, method = "I", nsim = 99)

# A multivariable visualization is obtained via "d3heatmap" (Cheng & Galili 2016). The plot is an interactive heatmap of individuals x variables

# Hover over the image, and select areas with the mouse to zoom; double click to return to the original scale
eco.plotLocal(multiLocal)

An example based in Getis-Ord's G for alleles. This statistics allows to detect areas areas with high and low positive autocorrelation (hot- and cold-spots, respectivelly). Note that during the computation of weights, the connections of the individuals with themselves are also considered (argument self = "TRUE"):

# Set self = TRUE in eco.weight for G*
con2<- eco.weight(eco[["XY"]], method = "knearest",  k = 4, self = TRUE) 
getis.A <- eco.lsa(eco[["A"]], con2, method = "G*", nsim = 99, adjust = "none")
eco.plotLocal(getis.A)

Integration of EcoGenetics in the R ecosystem

Using ecogen objects to analyze the relations among data from different sources

The next example shows how to analyze the relation among data frames of ecogen objects, using functions of other packages. The work with these functions is significantly fast and easy with ecogen objects. The analysis also uses the function eco.formula, which can create complex expressions with ecogen objects, acting as a proxy between the variables stored in ecogen objects and any other function able to use a formula as argument. The next examples illustrate the use of eco.formula with the function rda of the package vegan (Oksanen et al. 2016). The arguments that rda can take are X (a matrix of response variables, e.g., phenotypic traits), Y (predictor variables, e.g., environmental variables or alleles), and Z (conditioning variables, e.g., geographic coordinates or dbMEM’s, the latter can be computed with the package adespatial (Dray et al. 2016). In its simplest version, when Y and Z are missing, the function performs a principal components analysis. If Y and Z are provided, the function performs a redundancy analysis.

#### PCA analysis ####

pc <- rda(ecoslot.P(eco), scale=TRUE)

#### RDA based in dbMEM analysis ####
# First compute a dbMEM for eco in adespatial and store the result in slot C

require(adespatial)
require(vegan)

ecoslot.C(eco, use.object.names = TRUE) <-dbmem(eco_distance, thresh = 1, MEM.autocor = "positive")

## Information: Square regular grid; multiple eigenvalues

# Perform now a RDA with vegan using the first 20 axes of dbMEM, piping the
# EcoGenetics function “eco.formula”
my_formula <- eco.formula(eco, P1 + P2 + P3  ~ E1 + E2 + U(A) + Condition(U(C[, 1:20])))

rda(my_formula)


## Call: rda(formula = eco@P[, 1] + eco@P[, 2] + eco@P[, 3] ~ eco@E[, 1] + eco@E[, 2] +
## eco@A[, 1] + eco@A[, 2] + eco@A[, 3] + eco@A[, 4] + eco@A[, 5] +
## eco@A[, 6] + eco@A[, 7] + eco@A[, 8] + eco@A[, 9] + eco@A[, 10] +
## eco@A[, 11] + eco@A[, 12] + eco@A[, 13] + eco@A[, 14] + eco@A[,
## 15] + eco@A[, 16] + eco@A[, 17] + eco@A[, 18] + eco@A[, 19] +
## eco@A[, 20] + eco@A[, 21] + eco@A[, 22] + eco@A[, 23] + eco@A[,
## 24] + eco@A[, 25] + eco@A[, 26] + eco@A[, 27] + eco@A[, 28] +
## eco@A[, 29] + eco@A[, 30] + eco@A[, 31] + eco@A[, 32] + eco@A[,
## 33] + eco@A[, 34] + eco@A[, 35] + eco@A[, 36] + eco@A[, 37] +
## eco@A[, 38] + eco@A[, 39] + eco@A[, 40])
## + Condition((eco@C[, 1] + eco@C[, 2] + eco@C[, 3] + eco@C[, 4] +
## eco@C[, 5] + eco@C[, 6] + eco@C[, 7] + eco@C[, 8] + eco@C[, 9] +
## eco@C[, 10] + eco@C[, 11] + eco@C[, 12] + eco@C[, 13] + eco@C[,
## 14] + eco@C[, 15] + eco@C[, 16] + eco@C[, 17] + eco@C[, 18] +
## eco@C[, 19] + eco@C[, 20])))
## 
##               Inertia Proportion Rank
## Total         1.4404    1.00000     
## Conditional   0.9573    0.6646   20
## Constrained   0.1575    0.1094    1
## Unconstrained 0.3256    0.2260    1
## Inertia is variance 
## Some constraints were aliased because they were collinear (redundant)
##
## Eigenvalues for constrained axes:
##    RDA1 
## 0.15753
## 
## Eigenvalues for unconstrained axes:
##    PC1 
## 0.3256

Note that eco.formula substitutes variables of the data frames of ecogen slots (for example "P1") with an explicit expression "pointing" to the location of the variable in the data structure (in the present case, P1 is replaced by eco@P[, 1]). The function also substitutes any slot passed within the function U() by their variables, also as explicit expressions (for example, using the object "eco", U(A) is substituted by eco@A[, 1] + eco@A[, 2]...+...+...eco@A[, n])

Interaction with other population genetics packages

The next examples use an optional element, the Forward-Pipe operator (%>%). The operator, defined in the package magrittr (Bache & Wichham 2014) facilitates the execution of pipelines and complex operations with ecogen objects and EcoGenetics methods, allowing to feed functions of other packages after data manipulation or vice versa.

Perform an analysis with ecogen objects in gstudio:

require(gstudio)

data(arapat)
#Use the "arapat" data set of gstudio for the following examples
arpt <- gstudio2ecogen(arapat, struct = colnames(arapat)[1:3]) 

arpt %>% ecogen2gstudio %>% plot_populations(stratum = "ID", color = as.numeric(arapat$Species)) # Map of individuals

arpt %>% ecogen2gstudio %>% Ho # Overall Ho computation with gstudio

##        Locus         Ho
## 1       LTRS 0.23691460
## 2        WNT 0.27840909
## 3         EN 0.20555556
## 4         EF 0.14404432
## 5        ZMP 0.15454545
## 6        AML 0.37058824
## 7       ATPS 0.08539945
## 8       MP20 0.44692737
## 9 Multilocus 0.24029801

#Split arpt using "Species" column of slot S and compute Per-species Ho
arpt %>% eco.split("Species") %>% lapply(ecogen2gstudio) %>% lapply(Ho)

## $Cape
##        Locus         Ho
## 1       LTRS 0.10666667
## 2        WNT 0.05405405
## 3         EN 0.38666667
## 4         EF 0.04000000
## 5        ZMP 0.00000000
## 6        AML 0.09859155
## 7       ATPS 0.04000000
## 8       MP20 0.17333333
## 9 Multilocus 0.11241403
## 
## $Mainland
##        Locus         Ho
## 1       LTRS 0.47222222
## 2        WNT 0.03333333
## 3         EN 0.22857143
## 4         EF 0.32352941
## 5        ZMP 0.00000000
## 6        AML 0.14285714
## 7       ATPS 0.16666667
## 8       MP20 0.66666667
## 9 Multilocus 0.25423086
## 
## $Peninsula
##        Locus         Ho
## 1       LTRS 0.24206349
## 2        WNT 0.37500000
## 3         EN 0.14800000
## 4         EF 0.15079365
## 5        ZMP 0.21250000
## 6        AML 0.46774194
## 7       ATPS 0.08730159
## 8       MP20 0.50000000
## 9 Multilocus 0.27292508

sPCA analysis for ecogen objects with the package adegenet:

 require(adegenet)

 # Analyzing only continental data of arapat data set
 arpt_p <- eco.subset(arpt, "Species", "Peninsula")
 # Select first some cutoff distance
 Note the use of the argument latlon = TRUE to project latitude-longitude coordinates,
 as required for the analysis. Other functions as eco.cormantel or eco.weight also include the argument.

 cor_arpt_p <- eco.correlog(ecoslot.A(arpt_p), ecoslot.XY(arpt_p), latlon = TRUE, nsim = 0)

 eco.plotCorrelog(cor_arpt_p, quiet = TRUE)
 # A distance of 2E5 seems like an appropriate cutoff. Create weights
 arpt_p_W <- eco.weight(ecoslot.XY(arpt_p), method = "circle", d2 = 2E5, latlon = TRUE)
 arpt_p_spca <- arpt_p %>% ecogen2genind %>% spca(xy = ecoslot.XY(arpt_p), matWeight = ecoslot.W(arpt_p_W), scannf = FALSE, nfposi = 3)

 plot(arpt_p_spca, 1); plot(arpt_p_spca, 2); plot(arpt_p_spca, 3)

Minimum spanning network for ecogen objects in poppr:

require(poppr)

mydist <- dist(ecoslot.A(arpt_p))

ecogen2genind(arpt_p) %>% poppr.msn(distmat = mydist, vertex.label = NA)

A second example with simulated data

The object “eco2” included in the package contains point patterns in a grid of 30 x 30 pixels, simulating phenoypic data:

data(eco2)
par(mfrow=c(1,3))
image(matrix(ecoslot.P(eco2)[,1], 30, 30))
image(matrix(ecoslot.P(eco2)[,2], 30, 30))
image(matrix(ecoslot.P(eco2)[,3], 30, 30))

The Figure shows the three first variables of the slot P in the object eco2. The first variable has a local pattern, while the second and third include linear global gradients (the third also has random noise). The effect caused by the global pattern over the variable can be removed with a trend surface analysis (Borcard et al., 2011) using the function eco.detrend, performing a regression for each trait (z) over the polynom of degree n of the spatial coordinates (x, y). This example uses the polynom of degree 1.

To remove the global trend in direction SE-NW over the second of the images shown in the figure shown below, a fist degree polynom of the spatial coordinates (z_i = a₁ + a₂x + a₃y) is used:

det <- eco.detrend(eco2[["P"]][,2], eco2[["XY"]][,1:2], degree = 1)
sim.det <- data.frame(matrix(0, 900,3))
sim.det[,1:2] <- eco2[["XY"]]
sim.det[, 3]<-ecoslot.RES(det)
par(mfrow = c(1,1))
image(matrix(sim.det[,3], 30, 30))

The figure above shows the resulting image after the remotion of the global trend.

The trend was removed and the data is now ready to be analyzed, for example, computing a Moran correlogram:

cordata <- eco.correlog(Z = sim.det[,3], XY = sim.det[,1:2], int = 2, smax = 30)

eco.plotCorrelog(cordata)

The variable shows a decreasing trend from positive to negative values, and then stabilizes. The local patterns can be analyzed with the G_i* statistic. First, a weights object is created for the 20 nearest-neighbors of each pivotal individual, included the self-connection:

weight.con <- eco.weight(sim.det[,1:2], method = "knearest", k = 10, 
self = TRUE)
weight.con

## 
##  ################### 
##   spatial weights 
##  ################### 
## 
##   Method  -->  knearest - min 
##   Parameters -->  (k = 10) 
##   Row-standardization -->  FALSE 
##   Non-zero (non-self) links -->  5.5 % 
##   Number of individuals -->  900 
##   Individuals with non-zero (non-self) links -->  100 % 
##   Average (non-self) links per individual -->  49.5 
##   Average distance between connected individuals -->  2.852

These weights can now be used for the local analysis:

get <- eco.lsa(var = sim.det[,3],con = weight.con, method = "G*", nsim = 999)

eco.plotLocal(get, significant = TRUE)

The result can be stored in the ecogen object. Several results are assigned as list:

ecoslot.OUT(eco2) <- list(get, cordata, weight.con)
ecoslot.OUT(eco2)

##      OBJECTS        CLASSES
## 1    cordata | eco.correlog
## 2        get |      eco.lsa
## 3 weight.con |   eco.weight

The slot can be managed as a list.

The stored objects can be accessed as follow:

ecoslot.OUT(eco2, "cordata")

## $cordata
## 
##  ############################ 
##    Moran's I 
##  ############################ 
## 
##   > Number of simulations -->  99 
##   > Random test -->  permutation 
##   > P-adjust method -->  holm -sequential: TRUE 
## 
##   >  ecoslot.OUT(x): results -->  
## 
## $Z
##         d.mean     obs p.val  size
## d=0-2    1.466  0.6236  0.01  5102
## d=2-4    3.081  0.2856  0.02 14152
## d=4-6    5.028  0.1518  0.03 22886
## d=6-8    6.987  0.0291  0.04 27238
## d=8-10   9.003 -0.1290  0.05 34802
## d=10-12 10.945 -0.0729  0.06 32340
## d=12-14 12.915 -0.0018  0.49 39418
## d=14-16 14.940 -0.0089  0.14 36782
## d=16-18 16.916 -0.0404  0.09 36484
## d=18-20 18.945 -0.0862  0.10 35906
## d=20-22 20.965 -0.1307  0.11 31014
## d=22-24 22.899 -0.0700  0.12 26452
## d=24-26 24.897  0.0350  0.13 23818
## d=26-28 26.907  0.0970  0.14 16898
## d=28-30 28.888  0.1395  0.15 11382
## 
## 
##  Results table(s) in ecoslot.OUT(x) 
## 
##  -------------------------------------------------------------
##   Slot access: <ecoslot.> + <slot name> + <(object x name)> 
##   See help("EcoGenetics accessors")
##  -------------------------------------------------------------

The objects can be removed as follow:

 eco2 <- eco.remove(eco2, cordata) 
 ecoslot.OUT(eco2)

##      OBJECTS      CLASSES
## 1        get |    eco.lsa
## 2 weight.con | eco.weight

The workspace can be cleaned storing only the created object:

eco.clear(eco2)

The object can be saved using serialization:

saveRDS(eco2, "eco2.rds")

Additional utilities

eco.alfreq: distribution for frequencies of alleles, to evaluate mutation-drift equilibrium (Luikart 1998)
aue.phenosimil: phenotypic similarity (Ritland 1996)
eco.variogram: empirical variogram

Image manipulation utilities