Format tool for genetic data
eco.format(data, ncod = NULL, nout = 3, ploidy = 2, sep.in, sep.out, fill.mode = c("last", "first", "none"), recode = c("none", "all", "column"), show.codes = FALSE)
Argument | Description |
---|---|
data | Genetic data frame. |
ncod | Number of digits coding each allele in the input file. |
nout | Number of digits in the output. |
ploidy | Ploidy of the data. |
sep.in | Character separating alleles in the input data if present. |
sep.out | Character separating alleles in the output data. Default |
fill.mode | Add zeros at the beginning ("first") or the end ("last") of each allele. Default = "last". |
recode | Recode mode: "none" for no recoding (default), "all" to recode the data considering the values of all individuals at once (e.g., protein data), or "column" to recode the values column by column (e.g., microsatellite data). |
show.codes | Should tables with the equivalence between the old and new codes be returned when recode = "all" or recode = "column"? |
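To make the ncod, nout, sep.out and fill.mode arguments more concrete, the following is a minimal base-R sketch of the zero-filling and allele-separating idea. It is not the package's implementation: `pad_allele` and `format_genotype` are hypothetical helpers written only for illustration, and the exact output of eco.format may differ.

```r
# Hypothetical helper (not part of the package): pad one allele code with zeros
# until it is nout characters long.
pad_allele <- function(x, nout = 3, fill.mode = c("last", "first", "none")) {
  fill.mode <- match.arg(fill.mode)
  if (is.na(x) || fill.mode == "none") return(x)
  zeros <- strrep("0", max(0, nout - nchar(x)))
  # "first" puts the zeros before the allele, "last" puts them after it
  if (fill.mode == "first") paste0(zeros, x) else paste0(x, zeros)
}

# Hypothetical helper: split a genotype string into alleles of ncod characters,
# pad each allele, and join the alleles with sep.out. NA is kept as missing.
format_genotype <- function(g, ncod, nout = 3, sep.out = "", fill.mode = "last") {
  if (is.na(g)) return(NA_character_)
  starts <- seq(1, nchar(g), by = ncod)
  alleles <- substring(g, starts, starts + ncod - 1)
  padded <- vapply(alleles, pad_allele, character(1),
                   nout = nout, fill.mode = fill.mode)
  paste(padded, collapse = sep.out)
}

format_genotype("12", ncod = 1, nout = 3, sep.out = "/", fill.mode = "first")
# "001/002"
format_genotype("12", ncod = 1, nout = 3, sep.out = "/", fill.mode = "last")
# "100/200"
format_genotype(NA, ncod = 1)
# NA
```

In this sketch a diploid genotype coded with one digit per allele ("12") is split into two alleles, each allele is padded to three digits, and the alleles are joined with "/".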
The function can format data with different ploidy levels. It can:
- add or remove zeros at the beginning or end of each allele
- separate alleles with a character
- divide alleles into columns
- bind alleles from separate columns
- transform character data into numeric data
"NA" is treated as a special value (missing data).
# NOT RUN {
# eco.format and the eco.test data set are provided by the EcoGenetics package
library(EcoGenetics)
data(eco.test)

# Adding zeros
example <- as.matrix(genotype[1:10, ])
mode(example) <- "character"
# example data
example
recoded <- eco.format(example, ncod = 1, ploidy = 2, nout = 3)
# recoded data
recoded

# Tetraploid data, separating alleles with a "/"
tetrap <- as.matrix(example)
# simulated tetraploid example data
tetrap <- matrix(paste(example, example, sep = ""), ncol = ncol(example))
recoded <- eco.format(tetrap, ncod = 1, ploidy = 4, sep.out = "/")
# recoded data
recoded

# Example with a single character
ex <- c("A", "T", "G", "C")
ex <- sample(ex, 100, replace = TRUE)
ex <- matrix(ex, 10, 10)
colnames(ex) <- letters[1:10]
rownames(ex) <- LETTERS[1:10]
# example data
ex
recoded <- eco.format(ex, ploidy = 1, nout = 1, recode = "all", show.codes = TRUE)
# recoded data
recoded

# Example with two strings per cell and missing values:
ex <- c("Ala", "Asx", "Cys", "Asp", "Glu", "Phe", "Gly", "His", "Ile",
        "Lys", "Leu", "Met", "Asn", "Pro", "Gln", "Arg", "Ser", "Thr",
        "Val", "Trp")
ex1 <- sample(ex, 100, replace = TRUE)
ex2 <- sample(ex, 100, replace = TRUE)
ex3 <- paste(ex1, ex2, sep = "")
missing.ex3 <- sample(1:100, 20)
ex3[missing.ex3] <- NA
ex4 <- matrix(ex3, 10, 10)
colnames(ex4) <- letters[1:10]
rownames(ex4) <- LETTERS[1:10]
# example data
ex4
recoded <- eco.format(ex4, ncod = 3, ploidy = 2, nout = 2, recode = "column")
# recoded data
recoded

# Example with a vector, following the previous example:
ex1 <- as.data.frame(ex1)
# example data
ex1
recoded <- eco.format(ex1, ploidy = 1, recode = "all")
# recoded data
recoded
# }
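The difference between recode = "all" and recode = "column" used in the examples can be illustrated with a small base-R sketch. This is only an assumption about the general idea (one shared code table versus one code table per column); `recode_values` is a hypothetical helper, and the codes that eco.format actually assigns may be ordered or formatted differently.

```r
# Hypothetical helper (not part of the package): map each distinct value
# to an integer code, in sorted order. NA stays NA.
recode_values <- function(x) {
  codes <- sort(unique(x[!is.na(x)]))
  match(x, codes)
}

# Toy data: two loci (columns), four individuals (rows), one missing value
m <- matrix(c("A", "T", "G", NA,
              "C", "C", "T", "T"),
            nrow = 4, dimnames = list(NULL, c("loc1", "loc2")))

# "all"-style: a single code table shared by every column
all_codes <- matrix(recode_values(as.vector(m)),
                    nrow = nrow(m), dimnames = dimnames(m))

# "column"-style: each column gets its own code table
col_codes <- apply(m, 2, recode_values)

# Equivalence table between old and new codes, in the spirit of show.codes
old <- sort(unique(as.vector(m[!is.na(m)])))
code_table <- data.frame(old = old, new = seq_along(old))

all_codes
col_codes
code_table
```

With the shared table, "T" receives the same code in both columns (4), whereas with per-column tables it is coded 3 in loc1 and 2 in loc2, because each column only counts the values it contains.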