EcoGenetics
An R package for landscape genetics
EcoGenetics
An R package for landscape genetics
Introducing EcoGenetics
EcoGenetics is an R package with infrastructure and flexible methods designed to make easier and faster the data analysis routines of population ecologists and geneticists. These routines usually involve a series of operations over multi-source datasets, that can be simplified in a set of functions. The package defines a new class of object, “ecogen”, a data structure where multi-source information of populations can be stored. A series of methods for multi-source data manipulation are defined for ecogen objects. EcoGenetics also provides functions for exploratory analysis of Spatial Autocorrelation, for single and multiple variables, with an interactive data visualization environment.
Storing and organizing multi-source population data with ecogen objects
There is no movement apart from things; for change is always according to the categories of being
- Aristotle
The class “ecogen” was designed for a more efficient managing of the multi-source information in the different stages of population data analyses, and for a straightforward data pre-processing and storage. An object of this class behaves like an ordered “stack” of information, and possess a set of slots, each one storing data from a different source:
- XY : data frame with geographic coordinates
- P : phenotypic data frame
- G : genotypic data frame
- A : matrix of allele counts (only available for codominant markers)
- E : environmental data frame
- S : data frame with classes assigned to the individuals
- C : custom data frame
- OUT : list for storage of results
Construction of ecogen objects
First, we load the sample data contained in the package:
library("EcoGenetics")
data(eco.test)
The data frames contained in the workspace represent information from different sources for 225 individuals, as indicated by their names. A new ecogen object is created with the constructor as follows:
eco <- ecogen(XY = coordinates, P = phenotype, G = genotype, E = environment, S = structure, order.G = TRUE)
## Note: ordered genotypes in slot G
eco
##
## || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
## Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)>
## See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
##
## | slot XY: | --> 225 x 2 coordinates
## | slot P: | --> 225 x 8 phenotypic variables
## | slot G: | --> 225 x 10 loci >> ploidy: 2 || codominant
## | slot A: | --> 225 x 40 alleles
## | slot E: | --> 225 x 6 environmental variables
## | slot S: | --> 225 x 1 structure >> 1 structure found
## | slot C: | --> 0 x 0 variables
## | slot OUT: | --> 0 results
## ----------------------------------------------------------------------------
The names of individuals must be included as the row names of the data frames, or they may be configured during the call to the constructor. The number of rows of the non empty data frames must be identical (same individuals in same order). Otherwise, the program will return an error message:
eco_bad <- ecogen(XY = coordinates[1:10, ], P = phenotype, G = genotype, E = environment, S = structure, order.G = TRUE)
## Error in validObject(object): invalid class “ecogen” object: number of rows differ for non empty data frames
Unordered data frames can be ordered using the option order.df = TRUE. Data frames are ordered using as reference the order of the first non-empty slot, in this case "XY":
XY_random <- coordinates[sample(nrow(coordinates)), ]
eco_rand <- ecogen(XY = XYrandom, P = phenotype, G = genotype, E = environment, S = structure, order.G = TRUE, order.df = TRUE)
Names can be set for all the non-empty data frames with the option set.names:
myNames <- paste0("X_", seq_len(nrow(coordinates)))
eco_myNames <- ecogen(XY = coordinates, P = phenotype, G = genotype, E = environment, S = structure, order.G = TRUE, set.names = newNames)
The program can also create a set of valid names (individuals named as I1, I2,..., In):
eco_autoNames <- ecogen(XY = coordinates, P = phenotype, G = genotype, E = environment, S = structure, order.G = TRUE, valid.names = TRUE)
Use of accessor functions with ecogen objects
EcoGenetics defines accessor functions to get (“get” mode) and set (“set” mode) the slot content of the objects defined in the package (see ?`EcoGenetics accessors`). The following notation is used for the name of the accessor functions: a prefix “ecoslot.”, followed by the name of the respective slot. In the case of ecogen objects, and using the object “eco” of the example as reference, the corresponding accessors are: ecoslot.XY(eco), ecoslot.P(eco), ecoslot.G(eco), ecoslot.A(eco), ecoslot.E(eco), ecoslot.S(eco), ecoslot.C(eco) and ecoslot.OUT(eco). The correct assignation of content to the slot of an existent object is made with accessors; these special functions ensure a basic pre-processing and checking of the data when used in “set” mode. A new ecogen object can be created using the constructor alone or combined with accessors:
# Creation of an ecogen object with only XY and P slots filled
eco.temp <- ecogen(XY = coordinates, P = phenotype)
eco.temp
##
## || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
## Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)>
## See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
##
## | slot XY: | --> 225 x 2 coordinates
## | slot P: | --> 225 x 8 phenotypic variables
## | slot G: | --> 0 x 0 loci
## | slot A: | --> 0 x 0 alleles
## | slot E: | --> 0 x 0 environmental variables
## | slot S: | --> 0 x 0 structures
## | slot C: | --> 0 x 0 variables
## | slot OUT: | --> 0 results
## ----------------------------------------------------------------------------
# ecoslot.G “set” mode
ecoslot.G(eco.temp, order.G = TRUE) <- genotype
## Note: ordered genotypes in slot G
# Note: ordered genotypes in slot G
# ecoslot.G “get” mode (only showing top of the table here)
head(ecoslot.G(eco.temp))
## G1 G2 G3 G4 G5 G6 G7 G8 G9 G10
## I.1 11 55 33 22 11 12 11 33 12 25
## I.2 11 22 33 22 11 11 11 33 22 44
## I.3 33 55 23 22 11 22 66 33 22 55
## I.4 14 55 22 11 11 22 44 33 12 22
## I.5 44 22 22 22 44 11 22 33 33 55
## I.6 44 22 22 22 33 12 22 44 11 55
# Fill the oher slots
ecoslot.E(eco.temp) <- environment
ecoslot.S(eco.temp) <- structure
eco.temp
##
## || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
## Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)>
## See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
##
## | slot XY: | --> 225 x 2 coordinates
## | slot P: | --> 225 x 8 phenotypic variables
## | slot G: | --> 225 x 10 loci >> ploidy: 2 || codominant
## | slot A: | --> 225 x 40 alleles
## | slot E: | --> 225 x 6 environmental variables
## | slot S: | --> 225 x 1 structure >> 1 structure found
## | slot C: | --> 0 x 0 variables
## | slot OUT: | --> 0 results
## ----------------------------------------------------------------------------
As with the constructor, the number of rows of the non empty data frames must be identical in the output object. Trying to assign data frames with invalid number of rows will trigger an error message:
ecoslot.P(eco) <- phenotype[1:5, ]
## Error in validObject(object): invalid class “ecogen” object: 1: number of rows differ for non empty data frames
## invalid class “ecogen” object: 2: invalid length in object names
Algebra of ecogen objects
The class “ecogen” has a set of operations for multi-source data manipulation. These operations can be divided into subset, split and combine methods:
1. Subset methods
A. Integer subsetting
eco.sub <- eco[1:50] # New object with first 50 individuals from eco
eco.sub
##
## || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
## Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)>
## See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
##
## | slot XY: | --> 50 x 2 coordinates
## | slot P: | --> 50 x 8 phenotypic variables
## | slot G: | --> 50 x 10 loci >> ploidy: 2 || codominant
## | slot A: | --> 50 x 40 alleles
## | slot E: | --> 50 x 6 environmental variables
## | slot S: | --> 50 x 1 structure >> 1 structure found
## | slot C: | --> 0 x 0 variables
## | slot OUT: | --> 0 results
## ----------------------------------------------------------------------------
B. Logical subsetting
eco[ecoslot.P(eco)[, 3] > 0.5]
##
## || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
## Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)>
## See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
##
## | slot XY: | --> 206 x 2 coordinates
## | slot P: | --> 206 x 8 phenotypic variables
## | slot G: | --> 206 x 10 loci >> ploidy: 2 || codominant
## | slot A: | --> 206 x 40 alleles
## | slot E: | --> 206 x 6 environmental variables
## | slot S: | --> 206 x 1 structure >> 1 structure found
## | slot C: | --> 0 x 0 variables
## | slot OUT: | --> 0 results
## ----------------------------------------------------------------------------
# The length of the logical vector must be identical to the number of rows of non empty slots. If not, an error will be produced:
eco[c(rep(TRUE, 200))]
# For using a logical vector of length < nrow of non empty slots, the logical vector be converted into integer with the function “which”
eco[which(c(TRUE,TRUE,TRUE))] # which(c(TRUE,TRUE,TRUE)) equivalent to c(1,2,3)
##
## || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
## Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)>
## See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
##
## | slot XY: | --> 3 x 2 coordinates
## | slot P: | --> 3 x 8 phenotypic variables
## | slot G: | --> 3 x 10 loci >> ploidy: 2 || codominant
## | slot A: | --> 3 x 18 alleles
## | slot E: | --> 3 x 6 environmental variables
## | slot S: | --> 3 x 1 structure >> 1 structure found
## | slot C: | --> 0 x 0 variables
## | slot OUT: | --> 0 results
## ----------------------------------------------------------------------------
# Special cases
eco[] # returns the same object
##
## || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
## Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)>
## See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
##
## | slot XY: | --> 225 x 2 coordinates
## | slot P: | --> 225 x 8 phenotypic variables
## | slot G: | --> 225 x 10 loci >> ploidy: 2 || codominant
## | slot A: | --> 225 x 40 alleles
## | slot E: | --> 225 x 6 environmental variables
## | slot S: | --> 225 x 1 structure >> 1 structure found
## | slot C: | --> 0 x 0 variables
## | slot OUT: | --> 0 results
## ----------------------------------------------------------------------------
eco[FALSE]; eco[numeric(0)]; eco[logical(0)] # return an empty object
##
## || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
## Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)>
## See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
##
## | slot XY: | --> 0 x 0 coordinates
## | slot P: | --> 0 x 0 phenotypic variables
## | slot G: | --> 0 x 0 loci
## | slot A: | --> 0 x 0 alleles
## | slot E: | --> 0 x 0 environmental variables
## | slot S: | --> 0 x 0 structures
## | slot C: | --> 0 x 0 variables
## | slot OUT: | --> 0 results
## ----------------------------------------------------------------------------
C. Subsetting by group of the slot S
eco.subS <- eco.subset(eco,"pop", 1)
eco.subS
##
## || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
## Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)>
## See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
##
## | slot XY: | --> 174 x 2 coordinates
## | slot P: | --> 174 x 8 phenotypic variables
## | slot G: | --> 174 x 10 loci >> ploidy: 2 || codominant
## | slot A: | --> 174 x 40 alleles
## | slot E: | --> 174 x 6 environmental variables
## | slot S: | --> 174 x 1 structure >> 1 structure found
## | slot C: | --> 0 x 0 variables
## | slot OUT: | --> 0 results
## ----------------------------------------------------------------------------
2. Split methods
# Split as list of ecogen objects
x <- eco.split(eco, "pop", asList = TRUE)
# Split and create objects with prefix "eco" in the workspace
eco.split(eco, hier = "pop", asList = FALSE)
## New object created in workspace: eco.1
## New object created in workspace: eco.2
## New object created in workspace: eco.3
## New object created in workspace: eco.4
# Split and create objects with prefix "newObjects" in the workspace
eco.split(eco, hier = "pop", name = "newObjects", asList = FALSE)
## New object created in workspace: newObjects.1
## New object created in workspace: newObjects.2
## New object created in workspace: newObjects.3
## New object created in workspace: newObjects.4
3. Combine methods
A. Union by row
# Re-bind the previous split sets
rebinded <- eco.rbind(eco.1, eco.2, eco.3)# rebinding objects in workspace
eco.bind <- eco.rbind(x) #rebinding the list of ecogen objects
# Note that different subsets can also be created
S1.3 <- eco.rbind(x[[1]], x[[3]])
B. Union by column
eco1 <- eco
eco.c <- eco.cbind(eco, eco1)
## Note: null or duplicated column names. using generic labels.
eco.c
##
## || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
## Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)>
## See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
##
## | slot XY: | --> 225 x 2 coordinates
## | slot P: | --> 225 x 16 phenotypic variables
## | slot G: | --> 225 x 20 loci >> ploidy: 2 || codominant
## | slot A: | --> 225 x 80 alleles
## | slot E: | --> 225 x 12 environmental variables
## | slot S: | --> 225 x 2 structures >> 2 structures found
## | slot C: | --> 0 x 0 variables
## | slot OUT: | --> 0 results
## ----------------------------------------------------------------------------
C. Intersection
eco1 <- eco[2:20]
merged <- eco.merge(eco, eco1)
merged
##
## || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
## Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)>
## See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
##
## | slot XY: | --> 19 x 4 coordinates
## | slot P: | --> 19 x 16 phenotypic variables
## | slot G: | --> 19 x 20 loci >> ploidy: 2 || codominant
## | slot A: | --> 19 x 74 alleles
## | slot E: | --> 19 x 12 environmental variables
## | slot S: | --> 19 x 2 structures >> 2 structures found
## | slot C: | --> 0 x 0 variables
## | slot OUT: | --> 0 results
## ----------------------------------------------------------------------------
Other operations
nrow(eco) # number of rows in each slot
## XY P G A E S C
## 225 225 225 225 225 225 0
ncol(eco) # number of columns in each slot
## XY P G A E S C
## 2 8 10 40 6 1 0
dim(eco) # dimension of each slot
## $XY
## [1] 225 2
##
## $P
## [1] 225 8
##
## $G
## [1] 225 10
##
## $A
## [1] 225 40
##
## $E
## [1] 225 6
##
## $S
## [1] 225 1
##
## $C
## [1] 0 0
as.list(eco) # conversion to list
Saving/loading ecogen objects
Serialization is a mechanism that allows to convert an object into a storable file, that can be saved on disk and restored in future R sessions.
Serialization provides a clean and compact option to save ecogen objects at the end of a session. This is performed by the function saveRDS of R:
saveRDS(eco, "myObject.rds")
# The object can be loaded in the future with the function readRDS:
eco_restored <- readRDS("myObject.rds")
eco_restored
##
## || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
## Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)>
## See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
##
## | slot XY: | --> 225 x 2 coordinates
## | slot P: | --> 225 x 8 phenotypic variables
## | slot G: | --> 225 x 10 loci >> ploidy: 2 || codominant
## | slot A: | --> 225 x 40 alleles
## | slot E: | --> 225 x 6 environmental variables
## | slot S: | --> 225 x 1 structure >> 1 structure found
## | slot C: | --> 0 x 0 variables
## | slot OUT: | --> 0 results
## ----------------------------------------------------------------------------
identical(eco, eco_restored)
## [1] TRUE
Ecogen conversion from/to other formats
- ecogen objects can be imported/exported from/to: Genepop (Rousset 2008), SPAGeDi (Hardy and Vekemans 2002), adegenet (genind objects, Jombart 2008), gstudio (data frame with loci objects, Dyer 2016)
ecogen objects can be exported also to hierfstat ( Goudet and Jombart 2015) and Geneland (Guillot et al. 2005)
Import/export from/to Genepop
# ingpop, file with Genepop format in the folder "/extdata" of the package
ecopath <- paste(path.package("EcoGenetics"), "/extdata/ingpop", sep = "")
ingpop <- genepop2ecogen(ecopath)
ingpop
##
## || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
## Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)>
## See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
##
## | slot XY: | --> 0 x 0 coordinates
## | slot P: | --> 0 x 0 phenotypic variables
## | slot G: | --> 225 x 10 loci >> ploidy: 2 || codominant
## | slot A: | --> 225 x 40 alleles
## | slot E: | --> 0 x 0 environmental variables
## | slot S: | --> 225 x 1 structure >> 1 structure found
## | slot C: | --> 0 x 0 variables
## | slot OUT: | --> 0 results
## ----------------------------------------------------------------------------
ecogen2genepop(ingpop, dir = "", outName = "infile.genepop.txt", grp = "pop")
# dir = "" writes the file to the current working directory
Import/export from/to SPAGeDi
ecogen2spagedi(eco, dir = "", pop = "pop", ndig = 1, int=2, smax = 6, outName="infile.spagedi.txt")
ecosp <- spagedi2ecogen("infile.spagedi.txt", sep = "")
ecosp
##
## || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
## Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)>
## See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
##
## | slot XY: | --> 225 x 2 coordinates
## | slot P: | --> 0 x 0 phenotypic variables
## | slot G: | --> 225 x 10 loci >> ploidy: 2 || codominant
## | slot A: | --> 225 x 40 alleles
## | slot E: | --> 0 x 0 environmental variables
## | slot S: | --> 225 x 1 structure >> 1 structure found
## | slot C: | --> 0 x 0 variables
## | slot OUT: | --> 0 results
## ----------------------------------------------------------------------------
Import/export from/to genind
# ecogen to genind
outGenind <- ecogen2genind(eco)
outGenind
## /// GENIND OBJECT /////////
##
## // 225 individuals; 10 loci; 40 alleles; size: 73 Kb
##
## // Basic content
## @tab: 225 x 40 matrix of allele counts
## @loc.n.all: number of alleles per locus (range: 2-6)
## @loc.fac: locus factor for the 40 columns of @tab
## @all.names: list of allele names for each locus
## @ploidy: ploidy of each individual (range: 2-2)
## @type: codom
## @call: NULL
##
## // Optional content
## @strata: a data frame with 1 columns ( pop )
# genind to ecogen
outEco <- genind2ecogen(outGenind)
outEco
##
## || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
## Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)>
## See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
##
## | slot XY: | --> 0 x 0 coordinates
## | slot P: | --> 0 x 0 phenotypic variables
## | slot G: | --> 225 x 10 loci >> ploidy: 2 || codominant
## | slot A: | --> 225 x 40 alleles
## | slot E: | --> 0 x 0 environmental variables
## | slot S: | --> 225 x 1 structure >> 1 structure found
## | slot C: | --> 0 x 0 variables
## | slot OUT: | --> 0 results
## ----------------------------------------------------------------------------
Import/export from/to gstudio
togstudio <- ecogen2gstudio(eco)
head(togstudio)
## ID Latitude Longitude pop G1 G2 G3 G4 G5 G6 G7 G8 G9 G10
## I.1 I.1 1 1 1 1:1 5:5 3:3 2:2 1:1 1:2 1:1 3:3 1:2 2:5
## I.2 I.2 1 2 1 1:1 2:2 3:3 2:2 1:1 1:1 1:1 3:3 2:2 4:4
## I.3 I.3 1 3 1 3:3 5:5 2:3 2:2 1:1 2:2 6:6 3:3 2:2 5:5
## I.4 I.4 1 4 1 1:4 5:5 2:2 1:1 1:1 2:2 4:4 3:3 1:2 2:2
## I.5 I.5 1 5 1 4:4 2:2 2:2 2:2 4:4 1:1 2:2 3:3 3:3 5:5
## I.6 I.6 1 6 1 4:4 2:2 2:2 2:2 3:3 1:2 2:2 4:4 1:1 5:5
toeco <- gstudio2ecogen(togstudio, lat = "Latitude", lon = "Longitude", ID = "ID", struct = "pop")
toeco
##
## || ECOGEN CLASS OBJECT ||
## ----------------------------------------------------------------------------
## Access to slots: <ecoslot.> + <name of the slot> + <(name of the object)>
## See help("EcoGenetics accessors")
## ----------------------------------------------------------------------------
##
## | slot XY: | --> 225 x 2 coordinates
## | slot P: | --> 0 x 0 phenotypic variables
## | slot G: | --> 225 x 10 loci >> ploidy: 2 || codominant
## | slot A: | --> 225 x 40 alleles
## | slot E: | --> 0 x 0 environmental variables
## | slot S: | --> 225 x 1 structure >> 1 structure found
## | slot C: | --> 0 x 0 variables
## | slot OUT: | --> 0 results
## ----------------------------------------------------------------------------
Export to hierfstat
hiereco <- ecogen2hierfstat(eco, "pop")
require("hierfstat")
basic.stats(hiereco)
## $perloc
## Ho Hs Ht Dst Htp Dstp Fst Fstp Fis Dest
## G1 0.0926 0.4222 0.4964 0.0742 0.5212 0.0990 0.1496 0.1899 0.7807 0.1713
## G2 0.1319 0.7080 0.7333 0.0252 0.7417 0.0336 0.0344 0.0453 0.8137 0.1152
## G3 0.0454 0.5713 0.6751 0.1038 0.7097 0.1384 0.1538 0.1950 0.9206 0.3229
## G4 0.0750 0.4825 0.5016 0.0191 0.5080 0.0254 0.0380 0.0501 0.8446 0.0492
## G5 0.1160 0.5910 0.6790 0.0880 0.7084 0.1173 0.1296 0.1656 0.8037 0.2869
## G6 0.0717 0.3496 0.6425 0.2929 0.7401 0.3905 0.4559 0.5277 0.7949 0.6005
## G7 0.1013 0.7459 0.8242 0.0782 0.8502 0.1043 0.0949 0.1227 0.8643 0.4105
## G8 0.1153 0.4682 0.7295 0.2612 0.8166 0.3483 0.3581 0.4266 0.7538 0.6550
## G9 0.0796 0.5632 0.6216 0.0585 0.6411 0.0780 0.0941 0.1216 0.8586 0.1785
## G10 0.1290 0.7573 0.7748 0.0175 0.7807 0.0234 0.0226 0.0300 0.8297 0.0964
##
## $overall
## Ho Hs Ht Dst Htp Dstp Fst Fstp Fis Dest
## 0.0958 0.5659 0.6678 0.1019 0.7018 0.1358 0.1525 0.1936 0.8308 0.3129
Export to Geneland
ecogen2geneland(eco, dir = "", ncod = 1)
# dir = "" writes the file to the current working directory
Data manipulation functions
The package defines several functions to manipulate data, many designed for genetic data structures:
eco.convert
The function interconverts genetic data among matrix format (one locus or one allele per column) and list format (one locus or one allele per column)
## Not run:
data(eco3)
# One allele per column
loc2al <- eco.convert(eco3[["G"]], "matrix", "alleles.matrix", ploidy = 2)
head(loc2al)
## G1.1 G1.2 G2.1 G2.2 G3.1 G3.2 G4.1 G4.2 G5.1 G5.2 G6.1 G6.2 G7.1
## Ind.1 "3" "7" "3" "6" "5" "6" "2" "2" "2" "4" "2" "3" "2"
## Ind.2 "1" "9" "3" "8" "3" "7" "2" "3" "2" "1" "1" "1" "4"
## Ind.3 "2" "6" "6" "1" "3" "7" "2" "3" "4" "1" "2" "1" "4"
## Ind.4 "7" "1" "3" "7" "3" "1" "3" "1" "2" "3" "2" "1" "7"
## Ind.5 "6" "5" "3" "2" "4" "8" "3" "1" "4" "3" "1" "4" "5"
## Ind.6 "5" "4" "8" "5" NA NA "3" "1" "2" "1" "1" "4" "4"
## G7.2 G8.1 G8.2
## Ind.1 "7" "3" "7"
## Ind.2 "4" "1" "9"
## Ind.3 "1" "2" "6"
## Ind.4 "5" "7" "1"
## Ind.5 "3" "6" "5"
## Ind.6 "1" "5" "4"
# Inverse operation (collapse alleles into locus)
al2loc <- eco.convert(loc2al, "alleles.matrix", "matrix", ploidy = 2)
head(al2loc)
## G1 G2 G3 G4 G5 G6 G7 G8
## Ind.1 "37" "36" "56" "22" "24" "23" "27" "37"
## Ind.2 "19" "38" "37" "23" "21" "11" "44" "19"
## Ind.3 "26" "61" "37" "23" "41" "21" "41" "26"
## Ind.4 "71" "37" "31" "31" "23" "21" "75" "71"
## Ind.5 "65" "32" "48" "31" "43" "14" "53" "65"
## Ind.6 "54" "85" NA "31" "21" "14" "41" "54"
# Separating alleles with a character string
loc2loc <- eco.convert(eco3[["G"]], "matrix", "matrix", ploidy = 2, sep.out = "/")
head(loc2loc)
## G1 G2 G3 G4 G5 G6 G7 G8
## Ind.1 "3/7" "3/6" "5/6" "2/2" "2/4" "2/3" "2/7" "3/7"
## Ind.2 "1/9" "3/8" "3/7" "2/3" "2/1" "1/1" "4/4" "1/9"
## Ind.3 "2/6" "6/1" "3/7" "2/3" "4/1" "2/1" "4/1" "2/6"
## Ind.4 "7/1" "3/7" "3/1" "3/1" "2/3" "2/1" "7/5" "7/1"
## Ind.5 "6/5" "3/2" "4/8" "3/1" "4/3" "1/4" "5/3" "6/5"
## Ind.6 "5/4" "8/5" NA "3/1" "2/1" "1/4" "4/1" "5/4"
# Inverse operation (removing separator)
loc2loc.nosep <- eco.convert(loc2loc, "matrix", "matrix", ploidy = 2, sep.in = "/", sep.out = "")
head(loc2loc.nosep)
## G1 G2 G3 G4 G5 G6 G7 G8
## Ind.1 "37" "36" "56" "22" "24" "23" "27" "37"
## Ind.2 "19" "38" "37" "23" "21" "11" "44" "19"
## Ind.3 "26" "61" "37" "23" "41" "21" "41" "26"
## Ind.4 "71" "37" "31" "31" "23" "21" "75" "71"
## Ind.5 "65" "32" "48" "31" "43" "14" "53" "65"
## Ind.6 "54" "85" NA "31" "21" "14" "41" "54"
# Locus to list
loc2list <- eco.convert(eco3[["G"]], "matrix", "list", ploidy = 2)
# Locus to allele list
al2list <- eco.convert(eco3[["G"]], "matrix", "alleles.list", ploidy = 2)
# The inverse operations are also defined. All the formats are interconvertible.
# Locus operations have defined a "within" operation (matrix to matrix, list to list),
# with the purpose of put/remove separators between alleles. The program accepts any ploidy level.
eco.format
This function can format data with different levels of ploidy, allowing to:
add/remove zeros at the beginning/end of each allele
separate alleles with a character
divide alleles into columns
bind alleles from separate columns
transform character data into numeric data
# Adding zeros
example <- as.matrix(genotype[1:10,])
mode(example) <- "character"
example
## G1 G2 G3 G4 G5 G6 G7 G8 G9 G10
## I.1 "11" "55" "33" "22" "11" "12" "11" "33" "21" "25"
## I.2 "11" "22" "33" "22" "11" "11" "11" "33" "22" "44"
## I.3 "33" "55" "32" "22" "11" "22" "66" "33" "22" "55"
## I.4 "41" "55" "22" "11" "11" "22" "44" "33" "12" "22"
## I.5 "44" "22" "22" "22" "44" "11" "22" "33" "33" "55"
## I.6 "44" "22" "22" "22" "33" "21" "22" "44" "11" "55"
## I.7 "12" "33" "33" "11" "43" "11" "22" "33" "22" "55"
## I.8 "11" "33" "22" "22" "11" "11" "11" "11" "11" "11"
## I.9 "11" "33" "22" "22" "11" "22" "22" "11" "22" "22"
## I.10 "22" "35" "11" "11" "33" "22" "11" "33" "22" "33"
recoded <- eco.format(example, ncod = 1, ploidy = 2, nout = 3)
head(recoded)
## G1 G2 G3 G4 G5 G6 G7
## I.1 "100100" "500500" "300300" "200200" "100100" "100200" "100100"
## I.2 "100100" "200200" "300300" "200200" "100100" "100100" "100100"
## I.3 "300300" "500500" "300200" "200200" "100100" "200200" "600600"
## I.4 "400100" "500500" "200200" "100100" "100100" "200200" "400400"
## I.5 "400400" "200200" "200200" "200200" "400400" "100100" "200200"
## I.6 "400400" "200200" "200200" "200200" "300300" "200100" "200200"
## G8 G9 G10
## I.1 "300300" "200100" "200500"
## I.2 "300300" "200200" "400400"
## I.3 "300300" "200200" "500500"
## I.4 "300300" "100200" "200200"
## I.5 "300300" "300300" "500500"
## I.6 "400400" "100100" "500500"
# Tetraploid data, separating alleles with a "/"
tetrap <- as.matrix(example)
# Simulated tetraploid example data
tetrap <- matrix(paste(example,example, sep = ""), ncol = ncol(example))
recoded <- eco.format(tetrap, ncod = 1, ploidy = 4, sep.out = "/")
## Note: null or duplicated column names. using generic labels.
head(recoded)
## [,1] [,2] [,3]
## [1,] "100/100/100/100" "500/500/500/500" "300/300/300/300"
## [2,] "100/100/100/100" "200/200/200/200" "300/300/300/300"
## [3,] "300/300/300/300" "500/500/500/500" "300/200/300/200"
## [4,] "400/100/400/100" "500/500/500/500" "200/200/200/200"
## [5,] "400/400/400/400" "200/200/200/200" "200/200/200/200"
## [6,] "400/400/400/400" "200/200/200/200" "200/200/200/200"
## [,4] [,5] [,6]
## [1,] "200/200/200/200" "100/100/100/100" "100/200/100/200"
## [2,] "200/200/200/200" "100/100/100/100" "100/100/100/100"
## [3,] "200/200/200/200" "100/100/100/100" "200/200/200/200"
## [4,] "100/100/100/100" "100/100/100/100" "200/200/200/200"
## [5,] "200/200/200/200" "400/400/400/400" "100/100/100/100"
## [6,] "200/200/200/200" "300/300/300/300" "200/100/200/100"
## [,7] [,8] [,9]
## [1,] "100/100/100/100" "300/300/300/300" "200/100/200/100"
## [2,] "100/100/100/100" "300/300/300/300" "200/200/200/200"
## [3,] "600/600/600/600" "300/300/300/300" "200/200/200/200"
## [4,] "400/400/400/400" "300/300/300/300" "100/200/100/200"
## [5,] "200/200/200/200" "300/300/300/300" "300/300/300/300"
## [6,] "200/200/200/200" "400/400/400/400" "100/100/100/100"
## [,10]
## [1,] "200/500/200/500"
## [2,] "400/400/400/400"
## [3,] "500/500/500/500"
## [4,] "200/200/200/200"
## [5,] "500/500/500/500"
## [6,] "500/500/500/500"
aue.rescale
The function scales each column of a data frame or a matrix to [0,1] or [-1,1] range
data(eco.test)
require(adegenet)
pc <- dudi.pca(eco[["P"]], scannf = FALSE, nf = 3)
pc <- pc$li
pc <- aue.rescale(pc)
plot(eco[["XY"]][, 1], eco[["XY"]][, 2], col = rgb(pc), pch = 16, cex = 1.5, xlab ="X", ylab= "Y")
aue.sort
Order the content of cells in a matrix, following a series of rules determined by the user, by means of the parameters “ploidy” (ploidy of genetic data; the term can be used in a wide sense for other classes of data) and “ncod” (number of digits coding each allele, also extensible in a more general sense for other classes of data).
geno <- c(12, 52, 62, 45, 54, 21)
geno <- matrix(geno, 3, 2)
# Ordering the data
aue.sort(geno, ploidy = 2)
## [,1] [,2]
## [1,] "12" "45"
## [2,] "25" "45"
## [3,] "26" "12"
# Decreasing sort order
aue.sort(geno, ploidy = 2, decreasing = TRUE)
## [,1] [,2]
## [1,] "21" "54"
## [2,] "52" "54"
## [3,] "62" "21"
geno2 <- c(456123, 524556, 629359, 459459, 950543, 219405)
geno2 <- matrix(geno2, 3, 2)
geno2
## [,1] [,2]
## [1,] 456123 459459
## [2,] 524556 950543
## [3,] 629359 219405
# Ordering the data as diploid
aue.sort(geno2, ploidy = 2) # the data is ordered using blocks of 3 characters (ie, "234561 = 234 - 561")
## [,1] [,2]
## [1,] "123456" "459459"
## [2,] "524556" "543950"
## [3,] "359629" "219405"
# Ordering the data as triploid
aue.sort(geno2, ploidy = 3) # the data is ordered using blocks of 2 characters (ie, "234561 = 23 - 45 - 61")
## [,1] [,2]
## [1,] "234561" "455994"
## [2,] "455256" "054395"
## [3,] "596293" "052194"
Interactive data exploration
Everything is related to everything else, but near things are more related than distant things
- Tobler’s first law of geography
Preliminary concepts
Following Sokal (1983) this tutorial uses the terms “univariate” for the analysis of single variables, “multivariable” for the parallel analysis of several variables (i.e., the analysis returns an individual result for each variable), and “multivariate” for the analysis of several variables, but considered jointly (i.e., the analysis returns a single results for all the variables).
Spatial weights
Many of the functions of the package require spatial weights matrices as input. A spatial weights matrix W is a square positive matrix that defines the strength of the spatial relations between observations, assigning the value wij to the connection between individuals (i, j). In a binary weighting scheme, W corresponds to an adjacency matrix (“connection network”) indicating if the individual pairs (i, j) are connected (wij = 1) or not (wij = 0). In other situations, W is a matrix assigning a value to the connection (i, j) using a model for the spatial relations (e.g., following an exponential decay with distance, up to a threshold distance d, where the weights are set to 0). The matrix W is obtained in EcoGenetics using the function eco.weight:
data(eco3)
# "circle" method
con <- eco.weight(eco3[["XY"]], method = "circle", d1 = 0, d2 = 500)
con
##
## ###################
## spatial weights
## ###################
##
## Method --> circle
## Parameters --> (d1 = 0)(d2 = 500)
## Row-standardization --> FALSE
## Self-included --> FALSE
## Number of individuals --> 173
## Non-zero (non-self) links --> 4.8 %
## Individuals with non-zero (non-self) links --> 71.1 %
## Average (non-self) links per individual --> 8.3
## Average distance between connected individuals --> 314.25
class(con)
## [1] "eco.weight"
## attr(,"package")
## [1] "EcoGenetics"
Different definitions of weights are available in the package. In the example above, W contains binary weights. The criterion used to assign the values “0” and “1” is based in radial distances: every individual j is connected to i if j resides in a circle with center in i, where the distance dij between i and j is d1 < d ij ≤ d2.
The function also allows to create a weights object from a custom adjacency matrix. The following example uses the package igraph (Csardi & Nepusz 2006) to construct a weights matrix:
require(igraph)
# Graph generation
tr <- make_tree(40, children = 3, mode = "undirected")
# Plot graph
plot(tr, vertex.size = 10, vertex.label = NA)
# Convert into adjacency matrix
weights <- as.matrix(as_adj(tr))
# Provide names to the matrix
my_names <- 1:nrow(weights)
rownames(weights) <- colnames(weights) <- my_names
# Extract coordinates and provide names to the matrix
coord <- layout.auto(tr)
rownames(coord) <- my_names
# Conversion into eco.weight
my_weights <- eco.weight(XY = coord, W = weights)
## Custom W matrix detected. Method argument set as 'customW'
my_weights
##
## ###################
## spatial weights
## ###################
##
## Method --> customW
## Parameters --> NULL
## Row-standardization --> FALSE
## Non-zero (non-self) links --> 5 %
## Number of individuals --> 40
## Individuals with non-zero (non-self) links --> 100 %
## Average (non-self) links per individual --> 2
## Average distance between connected individuals --> 2.388
Different plots, interactive and non interactive, are available for weights objects:
# "simple" plot
eco.plotWeight(my_weights, type = "simple")
# Force network via "igraph"
myColors <- c(rep(1,13), rep(2, 9), rep(3, 9), rep(4, 9))
eco.plotWeight(my_weights, type = "igraph",group = myColors)
# Force network via "networkD3" (Gandrud et al. 2016)
# You can interact with the graph, click and move the nodes
eco.plotWeight(my_weights, type="network", bounded = FALSE, group = myColors)
# Circle plot with bundled edges via "edgebundleR" (Bostock et al. 2016)
# Use the github version of edgebundleR to solve issues related to the order
# of the individuals in the current version (0.1.14), installing the package
# with the command: #devtools::install_github('garthtarr/edgebundleR')
# You can interact with the graph hovering over nodes
eco.plotWeight(my_weights, type="edgebundle", group = myColors)
Working with connection networks
Conversion from listw to eco.weight with the function eco.listw2ew
require(adegenet)
# Delaunay triangulation
temp <-chooseCN(eco3[["XY"]], type = 1, result.type = "listw", plot.nb = FALSE)
con <- eco.listw2ew(temp)
## Custom W matrix detected. Method argument set as 'customW'
# You can interact with the graph, click and move the nodes
eco.plotWeight(con, "network", bounded = TRUE, group = eco3[["S"]]$structure)
Creation of a list of weights following a set of rules
The function eco.lagweight creates a lagged series of weights matrices, dividing the distance range of individuals in classes. For each class, the program computes a weights matrix. The user can define how to divide the distance range into classes.
# Some examples
## example using Sturges' rule (default)
classlist <- eco.lagweight(eco[["XY"]])
## example dividing the distance range into 4 classes, using a cutoff distance of 15
classlist <- eco.lagweight(eco[["XY"]], nclass = 4, smax = 15)
## example using cutoff distances of 3 and 15, and a distance between consecutive classes of 2
classlist <- eco.lagweight(eco[["XY"]], int = 2, smin = 3, smax = 15)
## example including 1000 individuals per class
classlist <- eco.lagweight(eco[["XY"]], size = 1000)
Computing a statistic in a connection network
The function eco.slide.con allows to apply a function defined by the user, over the individuals included in a connection network. For a given variable, the program computes recursively a function for the individuals of the network, using all the individuals connected to each.
data(eco2)
myMatrix <- eco2[["P"]]
con <- eco.weight(XY = eco2[["XY"]], method = "knearest", k = 5)
result <- eco.slide.con(myMatrix, con, function(x) mean(x, na.rm = TRUE))
image(matrix(myMatrix[, 1], 30, 30)) # original image
image(matrix(result[, 1], 30, 30)) # smoothed image
Correlograms
A correlogram is a plot of spatial correlation values against distance classes. Different correlograms are supported:
For univariate and multivariable approaches, correlograms based in Moran's I (Moran 1950), Geary's C (Geary 1954), and bivariate Moran's Ixy (Reich et al. 1994)
For multivariate approaches, correlograms based in Mantel (Mantel 1967) and partial Mantel (Smouse et al. 1986) statistics
For a multivariate approach of genetic data in particular, the correlogram based on Nason’s kinship coefficient Fij(Loiselle 1995; Kalisz 2001) or a custom kinship measure
Two approaches are available for correlogram construction:
Omnidirectional correlograms: the standard approach, do not discriminate the direction pointed by vectors connecting individuals pairs. The method assumes isotropy (the autocorrelation varies similarly with distance in all directions).
Directional correlograms: based on the construction of directional correlograms by the “bearing correlogram” method (Rosenberg 2000). The method can be used to explore whether the data is likely to hold the isotropic assumption, and if not (anisotropy), to construct correlograms for different directions. The method consists in the ponderation of the weights matrices used for the analysis by a factor that varies between 0 and 1, related with the direction pointed by the vector v connecting each pair of points. Given the individuals (i, j), each wij of the weights matrix W is recomputed as w'ij = wij cos 2 ( αij - θ ), where αij is the angle that v forms with the positive x axis (due East) in counterclockwise direction, and θ is the angle used for the directional analysis. When αij = θ , wij is pondered by 1 and does not change, and when αij = θ ± π/2, wij is 0. For the function eco.malecot, a directional slope can be computed as in Born (2012).
Correlograms based in I, C and Ixy are computed with the function eco.correlog:
moran <- eco.correlog(Z = eco[["A"]], XY = eco[["XY"]], method = "I", smax = 10, size = 1000)
# You can interact with the graphs
eco.plotCorrelog(moran)
moran2 <- eco.correlog(Z = eco[["P"]], XY = eco[["XY"]], method = "I", smax = 10, size = 1000)
eco.plotCorrelog(moran2)
A bearing correlogram using a Bearing Plot (Falsetti & Sokal, 1993; Rosenberg 2000). Bearing plots are available for eco.correlog and eco.cormantel. Instead of using distance classes as independent variables, bearing plots use the distance classes as fixed and the angles as independent variables. This method allows to see how direction affects autocorrelation within each distance class:
# Directions from 0 to 175 degrees counterclockwise from due E
moran_b <- eco.correlog(Z = eco[["P"]][, 3], XY = eco[["XY"]], method = "I", smax=10, size=1000, angle = seq(0, 175, 5))
# Bearing correlograms are plotted with the function eco.plotcorrelogB
eco.plotCorrelogB(moran_b)
The correlograms for each fixed angle and distance classes as independent variables can also be constructed in the same way, setting the additional argument "as.deg = FALSE" in the eco.correlog/eco.cormantel function, and plotting with eco.plotCorrelog.
Mantel correlograms are computed with the function eco.cormantel:
# Checking threshold in a Mantel correlogram:
corm <- eco.cormantel(M = dist(eco[["A"]]), XY = eco[["XY"]], nsim = 99, size = 1000)
eco.plotCorrelog(corm)
Correlograms for genetic data based in a kinship analysis measure (default: Nason’s Fij) can be performed with the function eco.malecot:
globaltest <- eco.malecot(eco=eco, method = "global", smax = 4, size = 1000)
globaltest
eco.plotCorrelog(globaltest, errorbar = TRUE)
Directional approach:
globaltest_aniso <- eco.malecot(eco=eco, method = "global", smax=4, size=1000, angle = 120)
eco.plotCorrelog(globaltest_aniso)
Global SA analysis
In contrast with correlograms, which take the distance among samples as factor of analysis, global SA analysis implies the obtention of a single SA statistic across all the analyzed area (i.e., “globally”). Correlograms can be conceived as the computation of several global analyses, one for each distance class.
eco.gsa computes global spatial statistics (Moran’s I, Geary’s C, Join-count [Cliff & Ord 1981; Moran 1948], and the bivariate Moran’s Ixy). This program requires a weights matrix. In this example a cut-off distance of 4 (selected in function of the results of the correlograms) is used to compute a weights object and perform a global test for the Moran’s I statistic:
con <- eco.weight(eco[["XY"]], method = "circle", d2 = 4)
global <- eco.gsa(Z = eco[["A"]], con = con, method = "I", nsim = 200)
eco.plotGlobal(global)
The multivariate approach of global analysis is based on the use of Mantel and partial Mantel statistics. The use of the global test (Mantel test) is currently under active debate (see Legendre et al. 2015). In addition to the ordinary and partial Mantel tests, a version with truncated distance matrices can be performed in EcoGenetics. This is an alternative with higher power than the classical Mantel test when there is a specific ecological or genetic dispersion model in mind, i.e., the effect of the distance among sites can only be perceived up to a certain distance where contagion, dispersal of propagules in plants, or migration in animals, no longer creates spatial correlation among the sites (Legendre et al. 2015).
# Correlation is around 0 when distance between points is > 4 as seen with the multivariate correlogram and the correlograms for individual variables
# We use the weights object “con” created above for truncation and computation of a truncated Mantel test
eco.mantel(dist(eco[["P"]]), dist(eco[["XY"]]), con = con, thres = 4, nsim = 999)
##
## ############################
## Mantel test
## ############################
##
## > Correlation coefficient used --> Pearson
## > Number of simulations --> 999
## > Alternative --> greater
## > P-value --> 0.001
## > Observed value --> 0.1841
## > Expected value --> -6e-04
Compute a bearing Mantel test for an angle of 37 degrees N from due E as used by Falsetti & Sokal (1993)
con_b <- eco.bearing(XY = eco[["XY"]], theta = 37)
eco.mantel(dist(eco[["P"]]), dist(eco[["XY"]]), con = con_b, thres = 4, nsim=999)
##
## ############################
## Mantel test
## ############################
##
## > Correlation coefficient used --> Pearson
## > Number of simulations --> 999
## > Alternative --> greater
## > P-value --> 0.001
## > Observed value --> 0.1855
## > Expected value --> -0.0011
Local SA analysis
Local SA analysis is based on the computation of a local SA measure. The function eco.lsa allows to compute local Moran’s I and Geary’s C (Anselin 1995), and Getis & Ord’s Gi and Gi* (Getis and Ord 1992; Ord and Getis 1995). For each individual, a SA statistic is obtained using a weights object specifying the spatial relations among individuals. For the following example, it is used the the weights object of the previous example. A distance of 4 represents, for a given individual, a circle of radius 4 centered on it, and those individuals within the circle are connected to the central one.
# Univariate test
localmoran <- eco.lsa(eco[["P"]][, 1], con, method = "I", nsim = 999)
# "rankplot" graph
eco.plotLocal(localmoran, significant = TRUE)
# Multivariable test
multiLocal <- eco.lsa(eco[["P"]], con, method = "I", nsim = 99)
# A multivariable visualization is obtained via "d3heatmap" (Cheng & Galili 2016). The plot is an interactive heatmap of individuals x variables
# Hover over the image, and select areas with the mouse to zoom; double click to return to the original scale
eco.plotLocal(multiLocal)
An example based in Getis-Ord's G for alleles. This statistics allows to detect areas areas with high and low positive autocorrelation (hot- and cold-spots, respectivelly). Note that during the computation of weights, the connections of the individuals with themselves are also considered (argument self = "TRUE"):
# Set self = TRUE in eco.weight for G*
con2<- eco.weight(eco[["XY"]], method = "knearest", k = 4, self = TRUE)
getis.A <- eco.lsa(eco[["A"]], con2, method = "G*", nsim = 99, adjust = "none")
eco.plotLocal(getis.A)
Integration of EcoGenetics in the R ecosystem
Using ecogen objects to analyze the relations among data from different sources
The next example shows how to analyze the relation among data frames of ecogen objects, using functions of other packages. The work with these functions is significantly fast and easy with ecogen objects. The analysis also uses the function eco.formula, which can create complex expressions with ecogen objects, acting as a proxy between the variables stored in ecogen objects and any other function able to use a formula as argument. The next examples illustrate the use of eco.formula with the function rda of the package vegan (Oksanen et al. 2016). The arguments that rda can take are X (a matrix of response variables, e.g., phenotypic traits), Y (predictor variables, e.g., environmental variables or alleles), and Z (conditioning variables, e.g., geographic coordinates or dbMEM’s, the latter can be computed with the package adespatial (Dray et al. 2016). In its simplest version, when Y and Z are missing, the function performs a principal components analysis. If Y and Z are provided, the function performs a redundancy analysis.
#### PCA analysis ####
pc <- rda(ecoslot.P(eco), scale=TRUE)
#### RDA based in dbMEM analysis ####
# First compute a dbMEM for eco in adespatial and store the result in slot C
require(adespatial)
require(vegan)
ecoslot.C(eco, use.object.names = TRUE) <-dbmem(eco_distance, thresh = 1, MEM.autocor = "positive")
## Information: Square regular grid; multiple eigenvalues
# Perform now a RDA with vegan using the first 20 axes of dbMEM, piping the
# EcoGenetics function “eco.formula”
my_formula <- eco.formula(eco, P1 + P2 + P3 ~ E1 + E2 + U(A) + Condition(U(C[, 1:20])))
rda(my_formula)
## Call: rda(formula = eco@P[, 1] + eco@P[, 2] + eco@P[, 3] ~ eco@E[, 1] + eco@E[, 2] +
## eco@A[, 1] + eco@A[, 2] + eco@A[, 3] + eco@A[, 4] + eco@A[, 5] +
## eco@A[, 6] + eco@A[, 7] + eco@A[, 8] + eco@A[, 9] + eco@A[, 10] +
## eco@A[, 11] + eco@A[, 12] + eco@A[, 13] + eco@A[, 14] + eco@A[,
## 15] + eco@A[, 16] + eco@A[, 17] + eco@A[, 18] + eco@A[, 19] +
## eco@A[, 20] + eco@A[, 21] + eco@A[, 22] + eco@A[, 23] + eco@A[,
## 24] + eco@A[, 25] + eco@A[, 26] + eco@A[, 27] + eco@A[, 28] +
## eco@A[, 29] + eco@A[, 30] + eco@A[, 31] + eco@A[, 32] + eco@A[,
## 33] + eco@A[, 34] + eco@A[, 35] + eco@A[, 36] + eco@A[, 37] +
## eco@A[, 38] + eco@A[, 39] + eco@A[, 40])
## + Condition((eco@C[, 1] + eco@C[, 2] + eco@C[, 3] + eco@C[, 4] +
## eco@C[, 5] + eco@C[, 6] + eco@C[, 7] + eco@C[, 8] + eco@C[, 9] +
## eco@C[, 10] + eco@C[, 11] + eco@C[, 12] + eco@C[, 13] + eco@C[,
## 14] + eco@C[, 15] + eco@C[, 16] + eco@C[, 17] + eco@C[, 18] +
## eco@C[, 19] + eco@C[, 20])))
##
## Inertia Proportion Rank
## Total 1.4404 1.00000
## Conditional 0.9573 0.6646 20
## Constrained 0.1575 0.1094 1
## Unconstrained 0.3256 0.2260 1
## Inertia is variance
## Some constraints were aliased because they were collinear (redundant)
##
## Eigenvalues for constrained axes:
## RDA1
## 0.15753
##
## Eigenvalues for unconstrained axes:
## PC1
## 0.3256
Note that eco.formula substitutes variables of the data frames of ecogen slots (for example "P1") with an explicit expression "pointing" to the location of the variable in the data structure (in the present case, P1 is replaced by eco@P[, 1]). The function also substitutes any slot passed within the function U() by their variables, also as explicit expressions (for example, using the object "eco", U(A) is substituted by eco@A[, 1] + eco@A[, 2]...+...+...eco@A[, n])
Interaction with other population genetics packages
The next examples use an optional element, the Forward-Pipe operator (%>%). The operator, defined in the package magrittr (Bache & Wichham 2014) facilitates the execution of pipelines and complex operations with ecogen objects and EcoGenetics methods, allowing to feed functions of other packages after data manipulation or vice versa.Perform an analysis with ecogen objects in gstudio:
require(gstudio)
data(arapat)
#Use the "arapat" data set of gstudio for the following examples
arpt <- gstudio2ecogen(arapat, struct = colnames(arapat)[1:3])
arpt %>% ecogen2gstudio %>% plot_populations(stratum = "ID", color = as.numeric(arapat$Species)) # Map of individuals
arpt %>% ecogen2gstudio %>% Ho # Overall Ho computation with gstudio
## Locus Ho
## 1 LTRS 0.23691460
## 2 WNT 0.27840909
## 3 EN 0.20555556
## 4 EF 0.14404432
## 5 ZMP 0.15454545
## 6 AML 0.37058824
## 7 ATPS 0.08539945
## 8 MP20 0.44692737
## 9 Multilocus 0.24029801
#Split arpt using "Species" column of slot S and compute Per-species Ho
arpt %>% eco.split("Species") %>% lapply(ecogen2gstudio) %>% lapply(Ho)
## $Cape
## Locus Ho
## 1 LTRS 0.10666667
## 2 WNT 0.05405405
## 3 EN 0.38666667
## 4 EF 0.04000000
## 5 ZMP 0.00000000
## 6 AML 0.09859155
## 7 ATPS 0.04000000
## 8 MP20 0.17333333
## 9 Multilocus 0.11241403
##
## $Mainland
## Locus Ho
## 1 LTRS 0.47222222
## 2 WNT 0.03333333
## 3 EN 0.22857143
## 4 EF 0.32352941
## 5 ZMP 0.00000000
## 6 AML 0.14285714
## 7 ATPS 0.16666667
## 8 MP20 0.66666667
## 9 Multilocus 0.25423086
##
## $Peninsula
## Locus Ho
## 1 LTRS 0.24206349
## 2 WNT 0.37500000
## 3 EN 0.14800000
## 4 EF 0.15079365
## 5 ZMP 0.21250000
## 6 AML 0.46774194
## 7 ATPS 0.08730159
## 8 MP20 0.50000000
## 9 Multilocus 0.27292508
sPCA analysis for ecogen objects with the package adegenet:
require(adegenet)
# Analyzing only continental data of arapat data set
arpt_p <- eco.subset(arpt, "Species", "Peninsula")
# Select first some cutoff distance
Note the use of the argument latlon = TRUE to project latitude-longitude coordinates,
as required for the analysis. Other functions as eco.cormantel or eco.weight also include the argument.
cor_arpt_p <- eco.correlog(ecoslot.A(arpt_p), ecoslot.XY(arpt_p), latlon = TRUE, nsim = 0)
eco.plotCorrelog(cor_arpt_p, quiet = TRUE)
# A distance of 2E5 seems like an appropriate cutoff. Create weights
arpt_p_W <- eco.weight(ecoslot.XY(arpt_p), method = "circle", d2 = 2E5, latlon = TRUE)
arpt_p_spca <- arpt_p %>% ecogen2genind %>% spca(xy = ecoslot.XY(arpt_p), matWeight = ecoslot.W(arpt_p_W), scannf = FALSE, nfposi = 3)
plot(arpt_p_spca, 1); plot(arpt_p_spca, 2); plot(arpt_p_spca, 3)
Minimum spanning network for ecogen objects in poppr:
require(poppr)
mydist <- dist(ecoslot.A(arpt_p))
ecogen2genind(arpt_p) %>% poppr.msn(distmat = mydist, vertex.label = NA)
A second example with simulated data
The object “eco2” included in the package contains point patterns in a grid of 30 x 30 pixels, simulating phenoypic data:
data(eco2)
par(mfrow=c(1,3))
image(matrix(ecoslot.P(eco2)[,1], 30, 30))
image(matrix(ecoslot.P(eco2)[,2], 30, 30))
image(matrix(ecoslot.P(eco2)[,3], 30, 30))
The Figure shows the three first variables of the slot P in the object eco2. The first variable has a local pattern, while the second and third include linear global gradients (the third also has random noise). The effect caused by the global pattern over the variable can be removed with a trend surface analysis (Borcard et al., 2011) using the function eco.detrend, performing a regression for each trait (z) over the polynom of degree n of the spatial coordinates (x, y). This example uses the polynom of degree 1.
To remove the global trend in direction SE-NW over the second of the images shown in the figure shown below, a fist degree polynom of the spatial coordinates (zi = a1 + a2x + a3y) is used:
det <- eco.detrend(eco2[["P"]][,2], eco2[["XY"]][,1:2], degree = 1)
sim.det <- data.frame(matrix(0, 900,3))
sim.det[,1:2] <- eco2[["XY"]]
sim.det[, 3]<-ecoslot.RES(det)
par(mfrow = c(1,1))
image(matrix(sim.det[,3], 30, 30))
The figure above shows the resulting image after the remotion of the global trend.
The trend was removed and the data is now ready to be analyzed, for example, computing a Moran correlogram:
cordata <- eco.correlog(Z = sim.det[,3], XY = sim.det[,1:2], int = 2, smax = 30)
eco.plotCorrelog(cordata)
The variable shows a decreasing trend from positive to negative values, and then stabilizes. The local patterns can be analyzed with the Gi* statistic. First, a weights object is created for the 20 nearest-neighbors of each pivotal individual, included the self-connection:
weight.con <- eco.weight(sim.det[,1:2], method = "knearest", k = 10,
self = TRUE)
weight.con
##
## ###################
## spatial weights
## ###################
##
## Method --> knearest - min
## Parameters --> (k = 10)
## Row-standardization --> FALSE
## Non-zero (non-self) links --> 5.5 %
## Number of individuals --> 900
## Individuals with non-zero (non-self) links --> 100 %
## Average (non-self) links per individual --> 49.5
## Average distance between connected individuals --> 2.852
These weights can now be used for the local analysis:
get <- eco.lsa(var = sim.det[,3],con = weight.con, method = "G*", nsim = 999)
eco.plotLocal(get, significant = TRUE)
The result can be stored in the ecogen object. Several results are assigned as list:
ecoslot.OUT(eco2) <- list(get, cordata, weight.con)
ecoslot.OUT(eco2)
## OBJECTS CLASSES
## 1 cordata | eco.correlog
## 2 get | eco.lsa
## 3 weight.con | eco.weight
The slot can be managed as a list.
The stored objects can be accessed as follow:
ecoslot.OUT(eco2, "cordata")
## $cordata
##
## ############################
## Moran's I
## ############################
##
## > Number of simulations --> 99
## > Random test --> permutation
## > P-adjust method --> holm -sequential: TRUE
##
## > ecoslot.OUT(x): results -->
##
## $Z
## d.mean obs p.val size
## d=0-2 1.466 0.6236 0.01 5102
## d=2-4 3.081 0.2856 0.02 14152
## d=4-6 5.028 0.1518 0.03 22886
## d=6-8 6.987 0.0291 0.04 27238
## d=8-10 9.003 -0.1290 0.05 34802
## d=10-12 10.945 -0.0729 0.06 32340
## d=12-14 12.915 -0.0018 0.49 39418
## d=14-16 14.940 -0.0089 0.14 36782
## d=16-18 16.916 -0.0404 0.09 36484
## d=18-20 18.945 -0.0862 0.10 35906
## d=20-22 20.965 -0.1307 0.11 31014
## d=22-24 22.899 -0.0700 0.12 26452
## d=24-26 24.897 0.0350 0.13 23818
## d=26-28 26.907 0.0970 0.14 16898
## d=28-30 28.888 0.1395 0.15 11382
##
##
## Results table(s) in ecoslot.OUT(x)
##
## -------------------------------------------------------------
## Slot access: <ecoslot.> + <slot name> + <(object x name)>
## See help("EcoGenetics accessors")
## -------------------------------------------------------------
The objects can be removed as follow:
eco2 <- eco.remove(eco2, cordata)
ecoslot.OUT(eco2)
## OBJECTS CLASSES
## 1 get | eco.lsa
## 2 weight.con | eco.weight
The workspace can be cleaned storing only the created object:
eco.clear(eco2)
The object can be saved using serialization:
saveRDS(eco2, "eco2.rds")
Additional utilities
- eco.alfreq: distribution for frequencies of alleles, to evaluate mutation-drift equilibrium (Luikart 1998)
- aue.phenosimil: phenotypic similarity (Ritland 1996)
- eco.variogram: empirical variogram
Image manipulation utilities
- eco.NDVI: NDVI computation for Landsat imaginery
- eco.NDVI.post: postprocessing for NDVI images
- eco.slide.matrix: moving window computations for a matrix
- aue.df2image: conversor, data frame to image
- aue.image2df: conversor, image to data frame
Please refer to the corresponding EcoGenetics help files for examples and usage of the functions.
References
- Anselin L. 1995. Local indicators of spatial association-LISA. Geographical analysis, 27: 93–115
- Bache S., Wickham H. 2014. magrittr: A Forward-Pipe Operator for R. R package version 1.5. https://CRAN.R-project.org/package=magrittr
- Bostock M., Patrick E., Tarr G. 2016. edgebundleR: Circle Plot with Bundled Edges. R package version 0.1.5. https://github.com/garthtarr/edgebundleR
- Borcard D., Gillet F., Legendre P. 2011. Numerical ecology with R. Springer Science & Business Media
- Born C., Le Roux P., Spohr C., McGeoch M., Van Vuuren B. 2012. Plant dispersal in the sub‐Antarctic inferred from anisotropic genetic structure. Molecular ecology, 21: 184–194
- Cliff A., Ord J. 1981. Spatial Processes, Models and Applications. London: Pion
- Csardi G., Nepusz T. 2006. The igraph software package for complex network research. InterJournal, Complex Systems. pp. 1695. http://igraph.org
- Dyer R. 2016. gstudio: Tools Related to the Spatial Analysis of Genetic Marker Data. R package version 1.5.0
- Dray S., Blanchet G., Borcard D., Guenard G., Jombart T., Larocque G., Legendre P., Wagner H. 2016. adespatial: Multivariate Multiscale Spatial Analysis. R package version 0.0-6. https://CRAN.R-project.org/package=adespatial
- Falsetti A., Sokal R. 1993. Genetic structure of human populations in the British Isles. Annals of Human Biology, 20: 215–229.
- Gandrud C., Allaire J., Russell K. 2016. networkD3: D3 JavaScript Network Graphs from R. R package version 0.2.13. https://CRAN.R-project.org/package=networkD3
- Geary R. 1954. The contiguity ratio and statistical mapping. The incorporated statistician, 115–146
- Getis A., Ord J. 1992. The analysis of spatial association by use of distance statistics. Geographical analysis, 24: 189–206
- Goudet J., Jombart T. 2015. hierfstat: Estimation and Tests of Hierarchical F-Statistics. R package version 0.04-22. https://CRAN.R-project.org/package=hierfstat
- Guillot G., Mortier F., Estoup A. 2015. Geneland: A program for landscape genetics. Molecular Ecology Notes, 5: 712–715
- Hardy O., Vekemans X. 2002. SPAGeDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Molecular Ecology Notes, 2: 618–620
- Jombart T. 2008. adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics, 24: 1403–1405
- Kalisz S., Nason J., Handazawa F., Tonsor S. 2001. Spatial population genetic structure in Trillium grandiflorum: the roles of dispersal, mating, history, and selection. Evolution, 55: 1560–1568
- Legendre P., Fortin M., Borcard D. 2015. Should the Mantel test be used in spatial analysis?. Methods in Ecology and Evolution, 6: 1239–1247
- Loiselle B., Sork V., Nason J., Graham C. 1995. Spatial genetic structure of a tropical understory shrub, Psychotria officinalis (Rubiaceae). American Journal of Botany, 82: 1420–1425
- Luikart G., Allendorf F., Cornuet F., Sherwin W. 1998. Distortion of allele frequency distributions provides a test for recent population bottlenecks. Journal of Heredity, 89: 238–247
- Mantel N. 1967. The detection of disease clustering and a generalized regression approach. Cancer Research, 27: 209–220.
- Moran P. 1948. The interpretation of statistical maps. Journal of the Royal Statistical Society, Series B, 10: 243–251
- Moran P. 1950. Notes on continuous stochastic phenomena. Biometrika, 37: 17–23
- Oksanen J., Blanchet F., Kindt R., Legendre P., Minchin P., O'Hara R., Simpson G., Solymos P., Stevens M., Wagner H. 2016. vegan: Community Ecology Package. R package version 2.3-5. Available at: https://CRAN.R-project.org/package=vegan
- Ord J., Getis A. 1995. Local spatial autocorrelation statistics: distributional issues and an application. Geographical analysis, 27: 286–306
- Reich R., Czaplewski R., Bechtold W. 1994. Spatial cross-correlation of undisturbed, natural shortleaf pine stands in northern Georgia. Environmental and Ecological Statistics, 1: 201–217
- Ritland K. 1996. Estimators for pairwise relatedness and individual inbreeding coefficients. Genetical Research, 67: 175–185
- Rosenberg M. 2000. The bearing correlogram: a new method of analyzing directional spatial autocorrelation. Geographical Analysis, 32: 267–278
- Rousset F. 2008. Genepop’007: a complete reimplementation of the Genepop software for Windows and Linux. Molecular Ecology Resources , 8: 103–106
- Smouse P., Long J., Sokal R. 1986. Multiple regression and correlation extensions of the Mantel test of matrix correspondence. Systematic Zoology, 35: 627–632
- Sokal R. 1983. Analyzing character variation in geographic space. In Numerical Taxonomy (pp. 384–403). Springer Berlin Heidelberg