| Type: | Package |
| Title: | Importing and Analysing 'SNP' and 'Silicodart' Data Generated by Genome-Wide Restriction Fragment Analysis |
| Version: | 2.9.9.5 |
| Date: | 2025-03-24 |
| Description: | Functions are provided that facilitate the import and analysis of 'SNP' (single nucleotide polymorphism) and 'silicodart' (presence/absence) data. The main focus is on data generated by 'DarT' (Diversity Arrays Technology); however, data from other sequencing platforms can be used once 'SNP' or related fragment presence/absence data from any source is imported. Genetic datasets are stored in a derived 'genlight' format (package 'adegenet') that allows for very compact storage of data and metadata. Functions are available for importing and exporting 'SNP' and 'silicodart' data, and for reporting and filtering on various criteria (e.g. 'CallRate', heterozygosity, reproducibility, maximum allele frequency). Additional functions are available for visualization (e.g. Principal Coordinates Analysis) and for creating a spatial representation using maps. 'dartR' also supports analyses with third-party software packages such as 'newhybrid', 'structure', 'NeEstimator' and 'blast'. Since version 2.0.3, simulation functions are also implemented that allow forward simulation of 'SNP' dynamics under different population and evolutionary dynamics. Comprehensive tutorials and support can be found at our 'github' repository: github.com/green-striped-gecko/dartR/. To cite 'dartR', type citation('dartR') in the console. |
| VignetteBuilder: | knitr |
| Encoding: | UTF-8 |
| Depends: | R (≥ 3.5), adegenet (≥ 2.0.0), ggplot2, dplyr, dartR.data |
| Imports: | ape, crayon, data.table, fields, foreach, gridExtra, MASS, methods, patchwork, plyr, PopGenReport, raster, reshape2, shiny, SNPRelate, sp (≥ 1.6.1), StAMPP, stats, stringr, tidyr, utils, gsubfn, purrr |
| Suggests: | boot, devtools, directlabels, dismo, doParallel, expm, gdistance, ggtern, gganimate, ggrepel, grid, gtable, ggthemes, gplots, HardyWeinberg, hierfstat, igraph, iterpc, knitr, label.switching, lattice, leaflet, leaflet.minicharts, markdown, mmod, networkD3, parallel, pegas, pheatmap, plotly, poppr, proxy, qvalue, RColorBrewer, Rcpp, rgl, rmarkdown, rrBLUP, scales, seqinr, shinyBS, shinyjs, shinythemes, shinyWidgets, SIBER, snpStats, stringi, terra, tibble, vcfR, zoo, viridis, vegan |
| License: | GPL (≥ 3) |
| LazyData: | true |
| RoxygenNote: | 7.3.2 |
| NeedsCompilation: | no |
| Packaged: | 2025-03-25 00:35:23 UTC; s425824 |
| Author: | Bernd Gruber [aut, cre], Arthur Georges [aut], Jose L. Mijangos [aut], Carlo Pacioni [aut], Diana Robledo-Ruiz [aut], Peter J. Unmack [ctb], Oliver Berry [ctb], Lindsay V. Clark [ctb], Floriaan Devloo-Delva [ctb], Eric Archer [ctb] |
| URL: | https://green-striped-gecko.github.io/dartR/, https://github.com/green-striped-gecko/dartR |
| BugReports: | https://groups.google.com/g/dartr?pli=1 |
| Maintainer: | Bernd Gruber <bernd.gruber@canberra.edu.au> |
| Repository: | CRAN |
| Date/Publication: | 2025-03-25 09:50:02 UTC |
indexing dartR objects correctly...
Description
indexing dartR objects correctly...
Usage
## S4 method for signature 'dartR,ANY,ANY,ANY'
x[i, j, ..., pop = NULL, treatOther = TRUE, quiet = TRUE, drop = FALSE]
Arguments
x | dartR object |
i | index for individuals |
j | index for loci |
... | other parameters |
pop | list of populations to be kept |
treatOther | elements in other (and ind.metrics & loci.metrics) are indexed as well. default: TRUE |
quiet | warnings are suppressed. default: TRUE |
drop | reduced to a vector if a single individual/locus is selected. default: FALSE [should never be set to TRUE] |
A genlight object created via the read.dart functions
Description
This is a test data set for checking the validity of functions within dartR. It is based on a DArT SNP data set of simulated bandicoots across Australia and contains 96 individuals and 1000 SNPs.
Usage
bandicoot.gl
Format
genlight object
Author(s)
Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)
adjust cbind for dartR
Description
cbind is a bit lazy and does not take care of the metadata (so data in the other slot is lost). You can get most of the loci metadata back using gl.compliance.check.
Usage
## S3 method for class 'dartR'
cbind(...)
Arguments
... | list of dartR objects |
Value
A genlight object
Examples
t1 <- platypus.gl
class(t1) <- "dartR"
t2 <- cbind(t1[,1:10],t1[,11:20])
Converts a genind object into a genlight object
Description
Converts a genind object into a genlight object
Usage
gi2gl(gi, parallel = FALSE, verbose = NULL)
Arguments
gi | A genind object [required]. |
parallel | Switch for the parallel version; set to FALSE to deactivate it. Running in parallel is often not worthwhile [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2]. |
Details
Be aware that, because it is ambiguous which allele is the reference allele, the combination gi2gl(gl2gi(gl)) does not return an identical object (but in terms of analysis these conversions are equivalent).
Value
A genlight object, with all slots filled.
Author(s)
Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
Estimates expected Heterozygosity
Description
Estimates expected Heterozygosity
Usage
gl.He(gl)
Arguments
gl | A genlight object [required] |
Value
A simple vector with He for each locus
Author(s)
Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
Estimates observed Heterozygosity
Description
Estimates observed Heterozygosity
Usage
gl.Ho(gl)
Arguments
gl | A genlight object [required] |
Value
A simple vector with Ho for each locus
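As a rough illustration of what per-locus heterozygosity estimates look like, the following base-R sketch works on a toy genotype matrix rather than a genlight object. The 0/1/2 genotype coding and the use of 2pq for expected heterozygosity are assumptions made for this example; they are not taken from the source of gl.Ho or gl.He.

```r
# Toy genotype matrix: rows = individuals, columns = loci.
# Coding assumed here: 0 = homozygous reference, 1 = heterozygous,
# 2 = homozygous alternate, NA = missing call.
geno <- matrix(
  c(0, 1, 2,
    1, 1, 0,
    2, 1, NA,
    1, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ind", 1:4), paste0("loc", 1:3))
)

# Observed heterozygosity: proportion of heterozygous calls per locus
Ho <- colMeans(geno == 1, na.rm = TRUE)

# Expected heterozygosity under Hardy-Weinberg proportions: 2pq,
# where q is the frequency of the alternate allele
q <- colMeans(geno, na.rm = TRUE) / 2
He <- 2 * q * (1 - q)

round(cbind(Ho, He), 3)
```

Missing calls are dropped per locus, so each estimate is based only on the individuals that were scored at that locus.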
Author(s)
Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)
Estimates effective population size using the Linkage Disequilibrium method based on NeEstimator (V2)
Description
This is a convenience function that runs the LD Ne estimator of NeEstimator 2 from within R using the provided genlight object. To do so, the software has to be downloaded from its website and the appropriate executable (Ne2-1) has to be copied into the path specified in the function (see the example below).
Usage
gl.LDNe(
  x,
  outfile = "genepopLD.txt",
  outpath = tempdir(),
  neest.path = getwd(),
  critical = 0,
  singleton.rm = TRUE,
  mating = "random",
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors_pop = discrete_palette,
  save2tmp = FALSE,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
outfile | File name of the output file with all results from NeEstimator 2 [default 'genepopLD.txt']. |
outpath | Path where to save the output file. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory [default tempdir(), mandated by CRAN]. |
neest.path | Path to the folder of the Ne2-1 file. Please note there are 3 different executables depending on your OS: Ne2-1.exe (=Windows), Ne2-1M (=Mac), Ne2-1L (=Linux). You only need to point to the folder (the function will recognise which OS you are running) [default getwd()]. |
critical | (vector of) Critical values that are used to remove alleles based on their minor allele frequency. This can be done beforehand using the gl.filter.maf function, therefore the default is set to 0 (no loci are removed). To run for MAF 0 and MAF 0.05 at the same time specify: critical = c(0,0.05) [default 0]. |
singleton.rm | Whether to remove singleton alleles [default TRUE]. |
mating | Mating model: 'random' for random mating or 'monogamy' for monogamy [default 'random']. |
plot.out | Specify if plot is to be produced [default TRUE]. |
plot_theme | User specified theme [default theme_dartR()]. |
plot_colors_pop | A discrete palette for population colors or a list with as many colors as there are populations in the dataset [default discrete_palette]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
Dataframe with the results as table
Author(s)
Custodian: Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
Examples
## Not run: 
# SNP data (use two populations and only the first 100 SNPs)
pops <- possums.gl[1:60,1:100]
nes <- gl.LDNe(pops, outfile="popsLD.txt", outpath=tempdir(),
neest.path = "./path_to Ne-21",
critical=c(0,0.05), singleton.rm=TRUE, mating='random')
nes
## End(Not run)
Calculates allele frequency of the first and second allele for each locus. A very simple function to report allele frequencies
Description
Calculates allele frequency of the first and second allele for each locus. A very simple function to report allele frequencies
Usage
gl.alf(x)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
Value
A simple data.frame with alf1, alf2.
Author(s)
Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)
Examples
#for the first 10 loci only
gl.alf(possums.gl[,1:10])
barplot(t(as.matrix(gl.alf(possums.gl[,1:10]))))
Generates percentage allele frequencies by locus and population
Description
This is a support script to take SNP data or SilicoDArT presence/absence data grouped into populations in a genlight object {adegenet} and generate a table of allele frequencies for each population and locus.
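To make the idea of a population-by-locus frequency table concrete, here is a minimal base-R sketch on a toy genotype matrix. It assumes genotypes coded 0/1/2 as counts of the alternate allele and a population factor supplied separately; it does not reproduce the internals or the output layout of gl.allele.freq.

```r
# Toy data: 5 individuals, 2 loci, genotypes coded as the number of
# alternate alleles (0, 1, 2); NA marks a missing call.
geno <- matrix(
  c(0, 2,
    1, 2,
    2, NA,
    2, 0,
    2, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("ind", 1:5), c("loc1", "loc2"))
)
pop <- factor(c("A", "A", "A", "B", "B"))

# Frequency of the alternate allele for each population and locus:
# mean genotype score divided by 2, ignoring missing calls
alt_freq <- apply(geno, 2, function(g) tapply(g, pop, mean, na.rm = TRUE) / 2)

# Frequency of the reference allele is the complement
ref_freq <- 1 - alt_freq

alt_freq
```

Multiplying the result by 100 gives percentages, which corresponds to the percent = TRUE option described in the arguments.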
Usage
gl.allele.freq(x, percent = FALSE, by = "pop", simple = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP or Tag P/A (SilicoDArT) data [required]. |
percent | If TRUE, percentage allele frequencies are given; if FALSE, allele proportions are given [default FALSE] |
by | If by='popxloc' then breakdown is given by population and locus; if by='pop' then breakdown is given by population with statistics averaged across loci; if by='loc' then breakdown is given by locus with statistics averaged across individuals [default 'pop'] |
simple | A legacy option to return a dataframe with the frequency of the reference allele (alf1) and the frequency of the alternate allele (alf2) by locus [default FALSE] |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity] |
Value
A matrix with allele (SNP data) or presence/absence frequencies (Tag P/A data) broken down by population and locus
Author(s)
Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)
See Also
Other unmatched report: gl.report.heterozygosity()
Examples
gl.allele.freq(testset.gl,percent=FALSE,by='pop')
gl.allele.freq(testset.gl,percent=FALSE,by="loc")
gl.allele.freq(testset.gl,percent=FALSE,by="popxloc")
gl.allele.freq(testset.gl,simple=TRUE)
Performs AMOVA using genlight data
Description
This script performs an AMOVA based on the genetic distance matrix from stamppNeisD() [package StAMPP] using the amova() function from the package PEGAS for exploring within and between population variation. For detailed information use their help pages: ?pegas::amova, ?StAMPP::stamppAmova. Be aware that, because the amova functions of several packages conflict, StAMPP::stamppAmova had to be 'hacked' to avoid a namespace conflict.
Usage
gl.amova(x, distance = NULL, permutations = 100, verbose = NULL)
Arguments
x | Name of the genlight containing the SNP genotypes, with population information [required]. |
distance | Distance matrix between individuals (if not provided, NeisD from StAMPP::stamppNeisD is calculated) [default NULL]. |
permutations | Number of permutations to perform for hypothesis testing [default 100]. Please note that this should be set to 1000 for analysis. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
An object of class 'amova' which is a list with a table of sums of square deviations (SSD), mean square deviations (MSD), and the number of degrees of freedom, and a vector of variance components.
Author(s)
Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)
Examples
#permutations should be higher, here set to 1 because of speed
out <- gl.amova(bandicoot.gl, permutations=1)
Population assignment using grm
Description
This function takes one individual and estimates its probability of coming from each population, based on multilocus genotype frequencies.
Usage
gl.assign.grm(x, unknown, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
unknown | Name of the individual to be assigned to a population [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
This function is a re-implementation of the function multilocus_assignment from package gstudio. A description of the method used in this function can be found at: https://dyerlab.github.io/applied_population_genetics/population-assignment.html
Value
A data.frame consisting of assignment probabilities for each population.
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
Examples
require("dartR.data")
if ((requireNamespace("rrBLUP", quietly = TRUE)) && (requireNamespace("gplots", quietly = TRUE)) ) {
res <- gl.assign.grm(platypus.gl,unknown="T27")
}
Assign an individual of unknown provenance to population based on Mahalanobis Distance
Description
This script assigns an individual of unknown provenance to one or more target populations based on the unknown individual's proximity to population centroids; proximity is estimated using Mahalanobis Distance.
The following process is followed:
An ordination is undertaken on the populations to yield a series of orthogonal (independent) axes.
A workable subset of dimensions is chosen: either the number specified or the number of dimensions with substantive eigenvalues, whichever is smaller.
The Mahalanobis Distance is calculated for the unknown against each population, and the probability of membership of each population is calculated. The assignment probabilities are listed in support of a decision.
Usage
gl.assign.mahalanobis(
  x,
  dim.limit = 2,
  plevel = 0.999,
  plot.out = TRUE,
  unknown,
  verbose = NULL
)
Arguments
x | Name of the input genlight object [required]. |
dim.limit | Maximum number of dimensions to consider for the confidence ellipses [default 2] |
plevel | Probability level for bounding ellipses [default 0.999]. |
plot.out | If TRUE, produces a plot showing the position of the unknown in relation to putative source populations [default TRUE] |
unknown | Identity label of the focal individual whose provenance is unknown [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
There are three considerations to assignment. First, consider only those populations for which the unknown has no private alleles. Private alleles are an indication that the unknown does not belong to a target population (provided that the sample size is adequate, say >=10). This can be evaluated with gl.assign.pa().
A next step is to consider the PCoA plot for populations where no private alleles have been detected. The position of the unknown in relation to the confidence ellipses is plotted as a basis for narrowing down the list of putative source populations. This can be evaluated with gl.assign.pca().
The third step (delivered by this script) is to consider the assignment probabilities based on the squared Generalised Linear Distance (Mahalanobis distance) of the unknown from the centroid for each population, then to consider the probability associated with its quantile using the Chi-square approximation. In effect, this index takes into account the position of the unknown in relation to the confidence envelope in all selected dimensions of the ordination. The larger the assignment probability, the greater the confidence in the assignment.
If dim.limit is set to 2, to correspond with the dimensions used in gl.assign.pca(), then the output provides a ranking of the final set of putative source populations.
If dim.limit is set to be > 2, then this script provides a basis for further narrowing the set of putative populations. If the unknown individual is an extreme outlier, say at less than 0.001 probability of population membership (0.999 confidence envelope), then the associated population can be eliminated from further consideration.
Warning: gl.assign.mahalanobis() treats each specified dimension equally, without regard to the percentage variation explained after ordination. If the unknown is an outlier in a lower dimension with an explanatory variance of, say, 0.1% [...] dimensions from the ordination.
Each of the above approaches provides evidence; none are 100% definitive. They need to be interpreted cautiously.
In deciding the assignment, the script considers an individual to be an outlier with respect to a particular population at alpha = 0.001 by default.
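The third step described above can be sketched with base-R functions only. The ordination scores below are simulated and the object names (scores, unknown) are hypothetical; using the upper tail of a Chi-square distribution with as many degrees of freedom as retained dimensions is one reading of the "Chi-square approximation" mentioned above, not a transcription of the script's code.

```r
set.seed(1)

# Hypothetical ordination scores (two retained axes) for the members
# of one candidate population
scores <- cbind(axis1 = rnorm(30, mean = 0, sd = 1),
                axis2 = rnorm(30, mean = 0, sd = 1))

# Hypothetical position of the unknown individual in the same space
unknown <- c(axis1 = 0.5, axis2 = -0.3)

# Squared Mahalanobis distance of the unknown from the population centroid
d2 <- mahalanobis(unknown, center = colMeans(scores), cov = cov(scores))

# Chi-square approximation with df = number of dimensions: probability
# of a distance at least this large for a true member of the population
p_assign <- pchisq(d2, df = ncol(scores), lower.tail = FALSE)

# Flag the unknown as an outlier for this population at alpha = 0.001
is_outlier <- p_assign < 0.001
```

Repeating the calculation for each candidate population and ranking the populations by p_assign gives the kind of table that the details above describe.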
Value
A data frame with the results of the assignment analysis.
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Examples
## Not run: 
#Test run with a focal individual from the Macleay River (EmmacMaclGeor)
test <- gl.assign.pa(testset.gl, unknown='UC_01044', nmin=10, threshold=1,
verbose=3)
test_2 <- gl.assign.pca(test, unknown='UC_01044', plevel=0.95, verbose=3)
df <- gl.assign.mahalanobis(test_2, unknown='UC_01044', verbose=3)
## End(Not run)
Eliminates populations as possible source populations for an individual of unknown provenance, using private alleles
Description
This script eliminates from consideration as putative source populations those populations for which the individual has too many private alleles. The populations that remain are putative source populations, subject to further consideration.
The algorithm identifies those target populations for which the individual has no private alleles or for which the number of private alleles does not exceed a user specified threshold.
An excessive count of private alleles is an indication that the unknown does not belong to a target population (provided that the sample size is adequate, say >=10).
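The private-allele screen can be illustrated with a small base-R sketch. The helper count_private() is hypothetical, and the 0/1/2 genotype coding and the definition used (a locus counts when the unknown carries an allele that no sampled member of the population carries) are assumptions for this example rather than a copy of gl.assign.pa.

```r
# Count loci at which the unknown carries an allele absent from a population.
# Genotypes are assumed to be coded 0 (hom. reference), 1 (het.),
# 2 (hom. alternate), with NA for missing calls.
count_private <- function(unknown, pop_geno) {
  # Alleles present in the population at each locus
  pop_has_ref <- apply(pop_geno, 2, function(g) any(g < 2, na.rm = TRUE))
  pop_has_alt <- apply(pop_geno, 2, function(g) any(g > 0, na.rm = TRUE))
  # Alleles carried by the unknown
  unk_has_ref <- !is.na(unknown) & unknown < 2
  unk_has_alt <- !is.na(unknown) & unknown > 0
  # A locus counts if the unknown carries an allele the population lacks
  sum((unk_has_ref & !pop_has_ref) | (unk_has_alt & !pop_has_alt))
}

# Toy population of three individuals scored at three loci
pop_geno <- matrix(c(0, 0, 2,
                     0, 1, 2,
                     0, 0, 2), nrow = 3, byrow = TRUE)
unknown <- c(1, 0, 2)

n_private <- count_private(unknown, pop_geno)

# Retain the population only if the count does not exceed the threshold
threshold <- 0
retain_population <- n_private <= threshold
```

Here the unknown is heterozygous at the first locus while the population is fixed for the reference allele, so one locus counts against this population and it is dropped at threshold = 0 but would be retained at threshold = 1.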
Usage
gl.assign.pa(
  x,
  unknown,
  nmin = 10,
  threshold = 0,
  n.best = NULL,
  verbose = NULL
)
Arguments
x | Name of the input genlight object [required]. |
unknown | SpecimenID label (indName) of the focal individual whose provenance is unknown [required]. |
nmin | Minimum sample size for a target population to be included in the analysis [default 10]. |
threshold | Populations to retain for consideration; those for which the focal individual has less than or equal to threshold loci with private alleles [default 0]. |
n.best | If given a value, dictates the best n=n.best populations to retain for consideration (or more if there are ties) based on private alleles [default NULL]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
A genlight object containing the focal individual (assigned to population 'unknown') and populations for which the focal individual is not distinctive (number of loci with private alleles less than or equal to the threshold). If there are no such populations, the genlight object contains only data for the unknown individual.
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Examples
# Test run with a focal individual from the Macleay River (EmmacMaclGeor)
test <- gl.assign.pa(testset.gl, unknown='UC_00146', nmin=10, threshold=1,
verbose=3)
Assign an individual of unknown provenance to population based on PCA
Description
This script assigns an individual of unknown provenance to one or more target populations based on its proximity to each population defined by a confidence ellipse in ordinated space of two dimensions.
The following process is followed:
The space defined by the loci is ordinated to yield a series of orthogonal axes (independent), and the top two dimensions are considered. Populations for which the unknown lies outside the specified confidence limits are no longer removed from the dataset.
Usage
gl.assign.pca(x, unknown, plevel = 0.999, plot.out = TRUE, verbose = NULL)
Arguments
x | Name of the input genlight object [required]. |
unknown | Identity label of the focal individual whose provenance is unknown [required]. |
plevel | Probability level for bounding ellipses in the PCoA plot [default 0.999]. |
plot.out | If TRUE, plot the 2D PCA showing the position of the unknown [default TRUE] |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
There are three considerations to assignment. First, consider only those populations for which the unknown has no private alleles. Private alleles are an indication that the unknown does not belong to a target population (provided that the sample size is adequate, say >=10). This can be evaluated with gl.assign.pa().
A next step is to consider the PCoA plot for populations where no private alleles have been detected and the position of the unknown in relation to the confidence ellipses, as plotted by this script. Note that this plot considers only the top two dimensions of the ordination, so an unknown lying outside the confidence ellipse can be unambiguously interpreted as lying outside the confidence envelope. However, if the unknown lies inside the confidence ellipse in two dimensions, it may still lie outside the confidence envelope in deeper dimensions. This second step is good for eliminating populations from consideration, but does not provide confidence in assignment.
The third step is to consider the assignment probabilities, using the script gl.assign.mahalanobis(). This approach calculates the squared Generalised Linear Distance (Mahalanobis distance) of the unknown from the centroid for each population, and calculates the probability associated with its quantile under the zero truncated normal distribution. This index takes into account the position of the unknown in relation to the confidence envelope in all selected dimensions of the ordination.
Each of these approaches provides evidence; none are 100% definitive. They need to be interpreted cautiously. They are best applied sequentially.
In deciding the assignment, the script considers an individual to be an outlier with respect to a particular population at alpha = 0.001 by default.
Value
A genlight object containing only those populations that are putative source populations for the unknown individual.
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Examples
## Not run: 
#Test run with a focal individual from the Macleay River (EmmacMaclGeor)
test <- gl.assign.pa(testset.gl, unknown='UC_00146', nmin=10, threshold=1,
verbose=3)
test_2 <- gl.assign.pca(test, unknown='UC_00146', plevel=0.95, verbose=3)
## End(Not run)
Calculates basic statistics for each locus (Hs, Ho, Fis etc.)
Description
Based on function basic.stats. Check ?basic.stats for help.
Usage
gl.basic.stats(x, digits = 4, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
digits | Number of digits that should be returned [default 4]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
Several tables and lists with all basic stats. See basic.stats for details.
Author(s)
Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)
Examples
# run the example only if the suggested package hierfstat is installed
if (requireNamespace("hierfstat", quietly = TRUE)) {
out <- gl.basic.stats(possums.gl[1:10,1:100])
}
Aligns nucleotide sequences against those present in a target database using blastn
Description
Basic Local Alignment Search Tool (BLAST; Altschul et al., 1990 & 1997) is a sequence comparison algorithm optimized for speed used to search sequence databases for optimal local alignments to a query. This function creates fasta files, creates databases to run BLAST, runs blastn and filters these results to obtain the best hit per sequence.
This function can be used to run BLAST alignment of short-read (DArTseq data) and long-read sequences (Illumina, PacBio... etc). You can use reference genomes from NCBI, genomes from your private collection, contigs, scaffolds or any other genetic sequence that you would like to use as reference.
Usage
gl.blast(
  x,
  ref_genome,
  task = "megablast",
  Percentage_identity = 70,
  Percentage_overlap = 0.8,
  bitscore = 50,
  number_of_threads = 2,
  verbose = NULL
)
Arguments
x | Either a genlight object containing a column named 'TrimmedSequence' containing the sequence of the SNPs (the sequence tag) trimmed of adapters as provided by DArT; or a path to a fasta file with the query sequences [required]. |
ref_genome | Path to a reference genome in fasta or fna format [required]. |
task | Four different tasks are supported: 1) “megablast”, for very similar sequences (e.g., sequencing errors), 2) “dc-megablast”, typically used for inter-species comparisons, 3) “blastn”, the traditional program used for inter-species comparisons, 4) “blastn-short”, optimized for sequences less than 30 nucleotides [default 'megablast']. |
Percentage_identity | Not a very sensitive or reliable measure of sequence similarity, however it is a reasonable proxy for evolutionary distance. The evolutionary distance associated with a 10 percent change in Percentage_identity is much greater at longer distances. Thus, a change from 80 – 70 percent identity might reflect divergence 200 million years earlier in time, but the change from 30 percent to 20 percent might correspond to a billion year divergence time change [default 70]. |
Percentage_overlap | Calculated as alignment length divided by the query length or subject length (whichever is shortest of the two lengths, i.e. length / min(qlen,slen) ) [default 0.8]. |
bitscore | A rule-of-thumb for inferring homology, a bit score of 50 is almost always significant [default 50]. |
number_of_threads | Number of threads (CPUs) to use in blastn search [default 2]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity] |
Details
Installing BLAST
You can download the BLAST installs from: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
It is important to install BLAST in a path that does not contain spaces for this function to work.
Running BLAST
Four different tasks are supported:
“megablast”, for very similar sequences (e.g., sequencing errors)
“dc-megablast”, typically used for inter-species comparisons
“blastn”, the traditional program used for inter-species comparisons
“blastn-short”, optimized for sequences less than 30 nucleotides
If you are running a BLAST alignment of similar sequences, for example Turtle Genome Vs Turtle Sequences, the recommended parameters are: task = “megablast”, Percentage_identity = 70, Percentage_overlap = 0.8 and bitscore = 50.
If you are running a BLAST alignment of highly dissimilar sequences because you are probably looking for sex linked hits in a distantly related species, and you are aligning for example sequences of Chicken Genome Vs Bassiana, the recommended parameters are: task = “dc-megablast”, Percentage_identity = 50, Percentage_overlap = 0.01 and bitscore = 30.
Be aware that running BLAST might take a long time (i.e. days) depending on the size of your query, the size of your database and the number of threads selected for your computer.
BLAST output
The BLAST output is formatted as a table using output format 6, with columns defined in the following order:
qseqid - Query Seq-id
sacc - Subject accession
stitle - Subject Title
qseq - Aligned part of query sequence
sseq - Aligned part of subject sequence
nident - Number of identical matches
mismatch - Number of mismatches
pident - Percentage of identical matches
length - Alignment length
evalue - Expect value
bitscore - Bit score
qstart - Start of alignment in query
qend - End of alignment in query
sstart - Start of alignment in subject
send - End of alignment in subject
gapopen - Number of gap openings
gaps - Total number of gaps
qlen - Query sequence length
slen - Subject sequence length
PercentageOverlap - length / min(qlen,slen)
Databases containing unfiltered aligned sequences, filtered aligned sequences and one hit per sequence are saved to the temporary directory (tempdir) and can be accessed with the function gl.print.reports and listed with the function gl.list.reports. Note that they can be accessed only in the current R session because tempdir is cleared each time the R session is closed.
BLAST filtering
BLAST output is filtered by ordering the hits of each sequence first by the highest percentage identity, then the highest percentage overlap and then the highest bitscore. Only one hit per sequence is kept based on these selection criteria.
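The ordering rule above can be sketched in base R on a small, made-up hit table that uses the column names listed under "BLAST output". Applying the three thresholds as ">=" cut-offs before the ordering is an assumption for this example; the values in hits are invented for illustration only.

```r
# Made-up BLAST hits for two query sequences
hits <- data.frame(
  qseqid   = c("seq1", "seq1", "seq1", "seq2", "seq2"),
  sacc     = c("chrA", "chrB", "chrC", "chrA", "chrD"),
  pident   = c(98, 98, 95, 88, 91),
  length   = c(60, 66, 69, 50, 65),
  qlen     = c(69, 69, 69, 69, 69),
  slen     = c(5000, 5000, 5000, 5000, 5000),
  bitscore = c(100, 110, 120, 70, 65)
)

# PercentageOverlap as defined above: length / min(qlen, slen)
hits$PercentageOverlap <- hits$length / pmin(hits$qlen, hits$slen)

# Apply the thresholds (default values from the Usage section)
kept <- hits[hits$pident >= 70 &
             hits$PercentageOverlap >= 0.8 &
             hits$bitscore >= 50, ]

# Order by query, then by descending identity, overlap and bit score
kept <- kept[order(kept$qseqid, -kept$pident,
                   -kept$PercentageOverlap, -kept$bitscore), ]

# Keep the first (best) hit per query sequence
best <- kept[!duplicated(kept$qseqid), ]
best[, c("qseqid", "sacc", "pident", "PercentageOverlap", "bitscore")]
```

For seq1 the two hits with 98 percent identity are separated by overlap, so chrB is kept even though chrC has the highest bit score; for seq2 the chrA hit fails the overlap threshold and chrD is kept.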
Value
If the input is a genlight object: returns a genlight object with one hit per sequence merged to the slot $other$loc.metrics. If the input is a fasta file: returns a dataframe with one hit per sequence.
Author(s)
Berenice Talamantes Becerra & Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
References
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of molecular biology, 215(3), 403-410.
Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., & Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic acids research, 25(17), 3389-3402.
Pearson, W. R. (2013). An introduction to sequence similarity (“homology”) searching. Current protocols in bioinformatics, 42(1), 3-1.
See Also
Examples
## Not run: 
res <- gl.blast(x= testset.gl,ref_genome = 'sequence.fasta')
# display of reports saved in the temporary directory
gl.list.reports()
# open the reports saved in the temporary directory
blast_databases <- gl.print.reports(1)
## End(Not run)
Checks the current global verbosity
Description
The verbosity can be set in one of two ways: (a) explicitly by the user by passing a value using the parameter verbose in a function, or (b) by setting the verbosity globally as part of the R environment (gl.set.verbosity).
Usage
gl.check.verbosity(x = NULL)
Arguments
x | User requested level of verbosity [default NULL]. |
Value
The verbosity, in variable verbose
Author(s)
Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
Examples
gl.check.verbosity()
Checks the global working directory
Description
The working directory can be set in one of two ways: (a) explicitly by the user by passing a value using the parameter plot.dir in a function, or (b) by setting the working directory globally as part of the R environment (gl.setwd). In accordance with CRAN policy, the default is tempdir().
Usage
gl.check.wd(wd = NULL, verbose = NULL)
Arguments
wd | path to the working directory [default: tempdir()]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
the working directory
Author(s)
Custodian: Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
Examples
gl.check.wd()
Collapses a distance matrix by amalgamating populations with pairwise fixed difference count less than a threshold
Description
This script takes the output of gl.fixed.diff() and amalgamates populations whose pairwise fixed difference count is less than or equal to a specified threshold. The distance matrix is generated by gl.fixed.diff().
The script then applies the new population assignments to the genlight object and recalculates the distance and associated matrices.
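As a rough sketch of the decision involved, the base-R example below counts fixed differences between two populations from per-locus allele frequencies and compares the count with an amalgamation threshold. The helper count_fixed() and the frequency vectors are hypothetical; the rule that a locus is "fixed" when one population is at or below tloc and the other at or above 1 - tloc follows the description of tloc in the arguments, not the package source.

```r
# Hypothetical alternate-allele frequencies per locus for two populations
freq_A <- c(0.00, 0.03, 1.00, 0.50, 0.98)
freq_B <- c(1.00, 0.97, 0.00, 0.55, 0.96)

count_fixed <- function(p1, p2, tloc = 0) {
  # A locus is a fixed difference when one population is at or below tloc
  # and the other is at or above 1 - tloc
  fixed <- (p1 <= tloc & p2 >= 1 - tloc) | (p1 >= 1 - tloc & p2 <= tloc)
  sum(fixed, na.rm = TRUE)
}

# Strict definition (0 vs 1 only) and a relaxed one (95:5 vs 5:95)
n_strict  <- count_fixed(freq_A, freq_B, tloc = 0)
n_relaxed <- count_fixed(freq_A, freq_B, tloc = 0.05)

# Amalgamate the pair only if the count does not exceed tpop
tpop <- 0
amalgamate <- n_relaxed <= tpop
```

With these values the two populations differ at two loci under the strict definition and three under the relaxed one, so they would be kept separate at tpop = 0.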
Usage
gl.collapse(fd, tpop = 0, tloc = 0, pb = FALSE, verbose = NULL)
Arguments
fd | Name of the list of matrices produced by gl.fixed.diff() [required]. |
tpop | Threshold number of fixed differences above which populations will not be amalgamated [default 0]. |
tloc | Threshold defining a fixed difference (e.g. 0.05 implies 95:5 vs 5:95 is fixed) [default 0]. |
pb | If TRUE, show a progress bar on time-consuming loops [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
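To make the two thresholds concrete, the base-R sketch below counts fixed differences between two populations under different values of tloc and then applies tpop. It is a minimal illustration and not the gl.fixed.diff or gl.collapse implementation: the helper is_fixed_diff and the allele frequencies are made up, and frequencies are assumed to be proportions of one allele at each locus.

```r
# Hypothetical helper: is a locus a fixed difference between two populations,
# given allele frequencies p1 and p2 (proportions) and a tolerance tloc?
is_fixed_diff <- function(p1, p2, tloc = 0) {
  (p1 <= tloc & p2 >= 1 - tloc) | (p1 >= 1 - tloc & p2 <= tloc)
}

# Made-up allele frequencies at four loci in two populations
p_popA <- c(0.00, 0.03, 0.50, 1.00)
p_popB <- c(1.00, 0.97, 0.50, 0.96)

# tloc = 0: only 0 vs 1 counts as fixed
n_fixed_strict <- sum(is_fixed_diff(p_popA, p_popB, tloc = 0))
# tloc = 0.05: 95:5 vs 5:95 also counts as fixed
n_fixed_relaxed <- sum(is_fixed_diff(p_popA, p_popB, tloc = 0.05))

# The pair is a candidate for amalgamation when its count of fixed
# differences is less than or equal to tpop
tpop <- 1
amalgamate_strict <- n_fixed_strict <= tpop
amalgamate_relaxed <- n_fixed_relaxed <= tpop
```

With these numbers, relaxing tloc raises the count of fixed differences, so the same pair can be amalgamated under one setting and kept separate under the other.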
Value
A list containing the gl object x and the following square matrices:
$gl – the new genlight object with populations collapsed;
$fd – raw fixed differences;
$pcfd – percent fixed differences;
$nobs – mean no. of individuals used in each comparison;
$nloc – total number of loci used in each comparison;
$expfpos – NA's, populated by gl.fixed.diff [by simulation]
$sdfpos – NA's, populated by gl.fixed.diff [by simulation]
$prob – NA's, populated by gl.fixed.diff [by simulation]
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Examples
fd <- gl.fixed.diff(testset.gl,tloc=0.05)
fd
fd2 <- gl.collapse(fd,tpop=1)
fd2
fd3 <- gl.collapse(fd2,tpop=1)
fd3
fd <- gl.fixed.diff(testset.gl,tloc=0.05)
fd2 <- gl.collapse(fd)
This is a helper function that supports the creation of color palettes for all plotting functions.
Description
This is a helper function that supports the creation of color palettes for all plotting functions.
Usage
gl.colors(type = 2)
Arguments
type | The type of color or palette. Can be "2" [two colors], "2c" [two colors contrast], "3" [three colors], "4" [four colors] or "pal" [requires the palette type and the number of colors to be specified]. A palette of colors can be specified via "div" [divergent], "dis" [discrete], "con" [convergent] or "vir" [viridis]. A palette also needs the number of colors: the call returns a function, so the number of colors is supplied in a second call to that function. Check the examples to see how this works. |
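The idea of a palette that is returned as a function, and only yields colors once it is called with a number, can be seen with base R's colorRampPalette. This is an analogy under the assumption that the palettes behave like ordinary R palette functions; the colors chosen here are arbitrary and are not those used by gl.colors.

```r
# colorRampPalette() returns a function, not a vector of colors
divergent <- colorRampPalette(c("blue", "white", "red"))

# Calling the returned function with a number produces that many colors
seven_colors <- divergent(7)

# The two steps can be chained, as in the gl.colors("div")(7) pattern
five_colors <- colorRampPalette(c("blue", "white", "red"))(5)
```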
Examples
gl.colors(2)
gl.colors("2")
gl.colors("2c")
#five discrete colors
gl.colors(type="dis")(5)
#seven divergent colors
gl.colors("div")(7)
Checks a genlight object to see if it complies with dartR expectations and amends it to comply if necessary
Description
This function will check to see that the genlight object conforms to expectation in regard to dartR requirements (see details), and if it does not, will rectify it.
Usage
gl.compliance.check(x, verbose = NULL)
Arguments
x | Name of the input genlight object [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
A genlight object used by dartR has a number of requirements that allow functions within the package to operate correctly. The genlight object comprises:
The SNP genotypes or Tag Presence/Absence data (SilicoDArT);
An associated dataframe (gl@other$loc.metrics) containing the locus metrics (e.g. Call Rate, Repeatability, etc);
An associated dataframe (gl@other$ind.metrics) containing the individual/sample metrics (e.g. sex, latitude (=lat), longitude (=lon), etc);
A specimen identity field (indNames(gl)) with the unique labels applied to each individual/sample;
A population assignment (popNames) for each individual/specimen;
Flags that indicate whether or not calculable locus metrics have been updated.
Value
A genlight object that conforms to the expectations of dartR
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
Examples
x <- gl.compliance.check(testset.gl)
x <- gl.compliance.check(testset.gs)
Calculates cost distances for a given landscape (resistance matrix)
Description
Calculates a cost distance matrix, to be used with run.popgensim.
Usage
gl.costdistances(landscape, locs, method, NN, verbose = NULL)
Arguments
landscape | A raster object coding the resistance of the landscape [required]. |
locs | Coordinates of the subpopulations. If a genlight object is provided, coordinates are taken from @other$latlon and centers for each population (pop(gl)) are calculated. In case you want to calculate cost distances between individuals, redefine pop(gl) via: |
method | Defines the type of cost distance; types are 'leastcost', 'rSPDistance' or 'commute' (Circuitscape type) [required]. |
NN | Number of next neighbours; the recommendation is 8 [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
A costdistance matrix between all pairs of locs.
Examples
## Not run: 
data(possums.gl)
library(raster) #needed for that example
landscape.sim <- readRDS(system.file('extdata','landscape.sim.rdata', package='dartR'))
#calculate mean centers of individuals per population
xy <- apply(possums.gl@other$xy, 2, function(x) tapply(x, pop(possums.gl), mean))
cd <- gl.costdistances(landscape.sim, xy, method='leastcost', NN=8)
round(cd,3)
## End(Not run)
Defines a new population in a genlight object for specified individuals
Description
The script reassigns existing individuals to a new population and removes their existing population assignment.
The script returns a genlight object with the new population assignment.
Usage
gl.define.pop(x, ind.list, new, verbose = NULL)
Arguments
x | Name of the genlight object containing SNP genotypes [required]. |
ind.list | A list of individuals to be assigned to the new population [required]. |
new | Name of the new population [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
A genlight object with the redefined population structure.
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Examples
popNames(testset.gl)
gl <- gl.define.pop(testset.gl, ind.list=c('AA019073','AA004859'), new='newguys')
popNames(gl)
indNames(gl)[pop(gl)=='newguys']
Provides descriptive stats and plots to diagnose potential problems with Hardy-Weinberg proportions
Description
Different causes may be responsible for lack of Hardy-Weinberg proportions. This function helps diagnose potential problems.
Usage
gl.diagnostics.hwe(
  x,
  alpha_val = 0.05,
  bins = 20,
  stdErr = TRUE,
  colors_hist = two_colors,
  colors_barplot = two_colors_contrast,
  plot_theme = theme_dartR(),
  save2tmp = FALSE,
  n.cores = "auto",
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
alpha_val | Level of significance for testing [default 0.05]. |
bins | Number of bins to display in histograms [default 20]. |
stdErr | Whether standard errors for Fis and Fst should be computed [default TRUE]. |
colors_hist | List of two color names for the borders and fill of the histogram [default two_colors]. |
colors_barplot | Vector with two color names for the observed and expected number of significant HWE tests [default two_colors_contrast]. |
plot_theme | User specified theme [default theme_dartR()]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
n.cores | The number of cores to use. If "auto", it will use all but one of the available cores [default "auto"]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]. |
Details
This function initially runs gl.report.hwe and reports the ternary plots. The remaining outputs follow the recommendations from Waples (2015) and De Meeûs (2018). These include:
A histogram with the distribution of p-values of the HWE tests. The distribution should be roughly uniform across equal-sized bins.
A bar plot with the observed and expected (null expectation) number of significant HWE tests for the same locus in multiple populations (that is, the x-axis shows whether a locus is significant in 1, 2, ..., n populations, and the y-axis is the count of these occurrences; the zero value on the x-axis shows the number of non-significant tests). If HWE tests are significant by chance alone, the observed and expected numbers of HWE tests should have roughly similar distributions.
A scatter plot with a linear regression between Fst and Fis, averaged across subpopulations. De Meeûs 2018 suggests that in the case of null alleles, a strong positive relationship is expected (together with the Fis standard error much larger than the Fst standard error, see below). Note, this is not the scatter plot that Waples 2015 presents in his paper. In the lower right corner of the plot, the Pearson correlation coefficient is reported.
The Fis and Fst (averaged over loci and subpopulations) standard errors are also printed on screen and reported in the returned list (if stdErr=TRUE). These are computed with the Jackknife method over loci (see De Meeûs 2007 for details on how this is computed) and it may take some time for these computations to complete. De Meeûs 2018 suggests that under a global significant heterozygosity deficit:
- if the correlation between Fis and Fst is strongly positive, and StdErrFis >> StdErrFst, null alleles are likely to be the cause.
- if the correlation between Fis and Fst is ~0 or mildly positive, and StdErrFis > StdErrFst, a Wahlund effect may be the cause.
- if the correlation between Fis and Fst is ~0, and StdErrFis ~ StdErrFst, selfing or sib mating could be the cause.
It is important to realise that these statistics only suggest a pattern (pointers). Their absence is not conclusive evidence of the absence of the problem, just as their presence does not confirm the cause of the problem.
A table where the number of observed and expected significant HWE tests are reported by each population, indicating whether these are due to heterozygosity excess or deficiency. These can be used to get a clue about potential problems (e.g. deficiency might be due to a Wahlund effect, presence of null alleles or non-random sampling; excess might be due to sex linkage or different selection between sexes, demographic changes or small Ne. See Table 1 in Waples 2015). The last two columns of the table generated by this function report chi-square values and their associated p-values. Chi-square is computed following Fisher's procedure for a global test (Fisher 1970). This basically tests whether there is at least one test that is truly significant in the series of tests conducted (De Meeûs et al 2009).
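The global test mentioned for the last two table columns can be illustrated with the textbook form of Fisher's combined probability test: the statistic is minus twice the sum of the log p-values, compared against a chi-square distribution with two degrees of freedom per test. The sketch below uses base R only; the helper name fisher_combined and the p-values are made up, and it is an assumption that gl.diagnostics.hwe uses exactly this form.

```r
# Fisher's combined probability test (textbook form, base R only).
# Hypothetical helper; p is a vector of p-values from independent tests.
fisher_combined <- function(p) {
  chisq <- -2 * sum(log(p))   # test statistic
  df <- 2 * length(p)         # two degrees of freedom per combined test
  c(chisq = chisq,
    df = df,
    p.value = pchisq(chisq, df = df, lower.tail = FALSE))
}

# Made-up p-values from four HWE tests in one population
p_vals <- c(0.04, 0.20, 0.01, 0.65)
res <- fisher_combined(p_vals)
res
```

A small combined p-value suggests that at least one test in the series is truly significant, which matches the interpretation given in the text.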
Value
A list with the table with the summary of the HWE tests and (if stdErr=TRUE) a named vector with the StdErrFis and StdErrFst.
Author(s)
Custodian: Carlo Pacioni – Post to https://groups.google.com/d/forum/dartr
References
De Meeûs, T., McCoy, K.D., Prugnolle, F., Chevillon, C., Durand, P., Hurtrez-Boussès, S., Renaud, F., 2007. Population genetics and molecular epidemiology or how to “débusquer la bête”. Infection, Genetics and Evolution 7, 308-332.
De Meeûs, T., Guégan, J.-F., Teriokhin, A.T., 2009. MultiTest V.1.2, a program to binomially combine independent tests and performance comparison with other related methods on proportional data. BMC Bioinformatics 10, 443-443.
De Meeûs, T., 2018. Revisiting FIS, FST, Wahlund Effects, and Null Alleles. Journal of Heredity 109, 446-456.
Fisher, R., 1970. Statistical methods for research workers. Edinburgh: Oliver and Boyd.
Waples, R. S. (2015). Testing for Hardy–Weinberg proportions: have we lost the plot?. Journal of heredity, 106(1), 1-19.
See Also
Examples
## Not run: 
require("dartR.data")
res <- gl.diagnostics.hwe(x = gl.filter.allna(platypus.gl[,1:50]), stdErr=FALSE, n.cores=1)
## End(Not run)
Comparing simulations against theoretical expectations
Description
Comparing simulations against theoretical expectations
Usage
gl.diagnostics.sim(
  x,
  Ne,
  iteration = 1,
  pop_he = 1,
  pops_fst = c(1, 2),
  plot_theme = theme_dartR(),
  save2tmp = FALSE,
  verbose = NULL
)
Arguments
x | Output from function |
Ne | Effective population size to use as input to compare theoretical expectations [required]. |
iteration | Iteration number to analyse [default 1]. |
pop_he | Population name in which the rate of loss of heterozygosity is going to be compared against theoretical expectations [default 1]. |
pops_fst | Pair of populations in which FST is going to be compared against theoretical expectations [default c(1,2)]. |
plot_theme | User specified theme [default theme_dartR()]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]. |
Details
Two plots are presented comparing the simulations against theoretical expectations:
Expected heterozygosity under neutrality (Crow & Kimura, 1970, p. 329) is calculated as:
Het = He0 (1 - 1/(2 Ne))^t,
where Ne is the effective population size, He0 is heterozygosity at generation 0 and t is the number of generations.
Expected FST under neutrality (Takahata, 1983) is calculated as:
FST = 1/(4 Ne m (n/(n - 1))^2 + 1),
where Ne is the effective population size of each individual subpopulation, m is the dispersal rate and n is the number of subpopulations (always 2).
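Both expectations are simple enough to evaluate directly, which can help when checking the plotted curves by hand. The base-R sketch below transcribes the two formulas as read here (1/(2 Ne) in the heterozygosity term); the function names expected_he and expected_fst and the parameter values are illustrative and are not part of dartR.

```r
# Expected heterozygosity after t generations: Het = He0 * (1 - 1/(2*Ne))^t
expected_he <- function(He0, Ne, t) {
  He0 * (1 - 1 / (2 * Ne))^t
}

# Expected FST: FST = 1 / (4*Ne*m*(n/(n-1))^2 + 1), with n subpopulations
expected_fst <- function(Ne, m, n = 2) {
  1 / (4 * Ne * m * (n / (n - 1))^2 + 1)
}

# Illustrative values: He0 = 0.5, Ne = 50, generations 0 to 100
he_curve <- expected_he(He0 = 0.5, Ne = 50, t = 0:100)

# Illustrative values: Ne = 50 per subpopulation, dispersal rate m = 0.01
fst_value <- expected_fst(Ne = 50, m = 0.01, n = 2)
```

With these inputs the heterozygosity curve starts at 0.5 and declines every generation, and the expected FST equals 1/9 because 4 x 50 x 0.01 x (2/1)^2 + 1 = 9.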
Value
Returns plots comparing simulations against theoretical expectations
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
References
Crow JF, Kimura M. An introduction to population genetics theory. 1970.
Takahata N. Gene identity and genetic differentiation of populations in the finite island model. Genetics. 1983;104(3):497-512.
See Also
Examples
## Not run: 
ref_table <- gl.sim.WF.table(file_var=system.file('extdata', 'ref_variables.csv', package = 'dartR'),interactive_vars = FALSE)
res_sim <- gl.sim.WF.run(file_var = system.file('extdata', 'sim_variables.csv', package ='dartR'),ref_table=ref_table,interactive_vars = FALSE,number_pops_phase2=2,population_size_phase2="50 50")
res <- gl.diagnostics.sim(x=res_sim,Ne=50)
## End(Not run)
Calculates a distance matrix for individuals defined in a genlight object
Description
This script calculates various distances between individuals based on allele frequencies or presence-absence data.
Usage
gl.dist.ind(
  x,
  method = NULL,
  scale = FALSE,
  swap = FALSE,
  output = "dist",
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  save2tmp = FALSE,
  verbose = NULL
)
Arguments
x | Name of the genlight containing the SNP genotypes or presence-absence data [required]. |
method | Specify distance measure [SNP: Euclidean; P/A: Simple]. |
scale | If TRUE, the distances are scaled to fall in the range [0,1] [default FALSE]. |
swap | If TRUE and working with presence-absence data, then presence (no disrupting mutation) is scored as 0 and absence (presence of a disrupting mutation) is scored as 1 [default FALSE]. |
output | Specify the format and class of the object to be returned, 'dist' for an object of class dist, 'matrix' for an object of class matrix [default "dist"]. |
plot.out | If TRUE, display a histogram and a boxplot of the genetic distances [default TRUE]. |
plot_theme | User specified theme [default theme_dartR()]. |
plot_colors | Vector with two color names for the borders and fill [default two_colors]. |
save2tmp | If TRUE, saves any ggplots to the session temporary directory [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
The distance measure for SNP genotypes can be one of:
Euclidean Distance [method = "Euclidean"]
Scaled Euclidean Distance [method="Euclidean", scale=TRUE]
Simple Mismatch Distance [method="Simple"]
Absolute Mismatch Distance [method="Absolute"]
Czekanowski (Manhattan) Distance [method="Manhattan"]
The distance measure for Sequence Tag Presence/Absence data (binary) can be one of:
Euclidean Distance [method = "Euclidean"]
Scaled Euclidean Distance [method="Euclidean", scale=TRUE]
Simple Matching Distance [method="Simple"]
Jaccard Distance [method="Jaccard"]
Bray-Curtis Distance [method="Bray-Curtis"]
Refer to the dartR Technical Note on Distances in Genetics.
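For readers who want to see what some of these measures compute, the sketch below applies base R's dist() to toy data. It assumes SNP genotypes coded 0/1/2 and presence/absence coded 0/1, with individuals in rows. It is not the gl.dist.ind implementation, which may scale distances or treat missing values differently; the toy matrices are made up.

```r
# Toy SNP genotypes: rows are individuals, columns are loci, coded 0/1/2
snp <- matrix(c(0, 1, 2, 0,
                0, 1, 1, 2,
                2, 2, 0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("ind1", "ind2", "ind3"), NULL))

d_euclidean <- dist(snp, method = "euclidean")
d_manhattan <- dist(snp, method = "manhattan")

# Toy presence/absence data: rows are individuals, columns are tags
pa <- matrix(c(1, 1, 0, 0,
               1, 0, 0, 1,
               0, 0, 1, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("ind1", "ind2", "ind3"), NULL))

# Base R's "binary" method is the Jaccard dissimilarity: the proportion of
# mismatches among positions where at least one individual scores 1
d_jaccard <- dist(pa, method = "binary")

# Objects of class dist can be converted to a square matrix
m_euclidean <- as.matrix(d_euclidean)
```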
Value
An object of class 'matrix' or 'dist' giving distances between individuals
Author(s)
Author(s): Arthur Georges. Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Examples
D <- gl.dist.ind(testset.gl[1:20,], method='manhattan')
D <- gl.dist.ind(testset.gs[1:20,], method='Jaccard',swap=TRUE)
D <- gl.dist.ind(testset.gl[1:20,], method='euclidean',scale=TRUE)
Calculates a distance matrix for populations with SNP genotypes in a genlight object
Description
This script calculates various distances between populations based on allele frequencies (SNP genotypes) or frequency of presences in presence-absence data (Euclidean and Fixed-diff distances only).
Usage
gl.dist.pop(
  x,
  method = "euclidean",
  plot.out = TRUE,
  scale = FALSE,
  output = "dist",
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  save2tmp = FALSE,
  verbose = NULL
)
Arguments
x | Name of the genlight containing the SNP genotypes [required]. |
method | Specify distance measure [default euclidean]. |
plot.out | If TRUE, display a histogram of the genetic distances, and a whisker plot [default TRUE]. |
scale | If TRUE and method='Euclidean', the distance will be scaled to fall in the range [0,1] [default FALSE]. |
output | Specify the format and class of the object to be returned, dist for an object of class dist, matrix for an object of class matrix [default "dist"]. |
plot_theme | User specified theme [default theme_dartR()]. |
plot_colors | Vector with two color names for the borders and fill [default two_colors]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
The distance measure can be one of 'euclidean', 'fixed-diff', 'reynolds', 'nei' and 'chord'. Refer to the documentation of functions described in the dartR Distance Analysis tutorial for algorithms and definitions.
Value
An object of class 'dist' giving distances between populations
Author(s)
Author(s): Arthur Georges. Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Examples
## Not run: 
# SNP genotypes
D <- gl.dist.pop(possums.gl[1:90,1:100], method='euclidean')
D <- gl.dist.pop(possums.gl[1:90,1:100], method='euclidean',scale=TRUE)
#D <- gl.dist.pop(possums.gl, method='nei')
#D <- gl.dist.pop(possums.gl, method='reynolds')
#D <- gl.dist.pop(possums.gl, method='chord')
#D <- gl.dist.pop(possums.gl, method='fixed-diff')
#Presence-Absence data [only 10 individuals due to speed]
D <- gl.dist.pop(testset.gs[1:10,], method='euclidean')
## End(Not run)
res <- gl.dist.pop(platypus.gl)
Removes specified individuals from a dartR genlight object
Description
This function deletes individuals and their associated metadata. Monomorphic loci and loci that are scored all NA are optionally deleted (mono.rm=TRUE). The script also optionally recalculates locus metadata statistics to accommodate the deletion of individuals from the dataset (recalc=TRUE).
The script returns a dartR genlight object with the retained individuals and the recalculated locus metadata. The script works with both genlight objects containing SNP genotypes and Tag P/A data (SilicoDArT).
Usage
gl.drop.ind(x, ind.list, recalc = FALSE, mono.rm = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object [required]. |
ind.list | List of individuals to be removed [required]. |
recalc | If TRUE, recalculate the locus metadata statistics [default FALSE]. |
mono.rm | If TRUE, remove monomorphic and all NA loci [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress but not results; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
A reduced dartR genlight object
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
gl.keep.ind to keep rather than drop specified individuals
Other dartR-base: gl.drop.loc(), gl.drop.pop(), gl.edit.recode.ind(), gl.edit.recode.pop(), gl.keep.loc(), gl.make.recode.ind(), gl.read.dart(), gl.recode.ind(), gl.recode.pop(), gl.set.verbosity()
Examples
# SNP data
gl2 <- gl.drop.ind(testset.gl, ind.list=c('AA019073','AA004859'))
# Tag P/A data
gs2 <- gl.drop.ind(testset.gs, ind.list=c('AA020656','AA19077','AA004859'))
gs2 <- gl.drop.ind(testset.gs, ind.list=c('AA020656' ,'AA19077','AA004859'),mono.rm=TRUE, recalc=TRUE)
Removes specified loci from a dartR genlight object
Description
This function deletes loci and their associated metadata.
The script returns a dartR genlight object with the retained loci. The script works with both genlight objects containing SNP genotypes and Tag P/A data (SilicoDArT).
Usage
gl.drop.loc(
  x,
  loc.list = NULL,
  first_tmp = NULL,
  last_tmp = NULL,
  verbose = NULL
)
Arguments
x | Name of the genlight object [required]. |
loc.list | A list of loci to be deleted [required, if loc.range not specified]. |
first_tmp | First of a range of loci to be deleted [required, if loc.list not specified]. |
last_tmp | Last of a range of loci to be deleted [if not specified, last locus in the dataset]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress but not results; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
A reduced dartR genlight object
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
gl.keep.loc to keep rather than drop specified loci
Other dartR-base: gl.drop.ind(), gl.drop.pop(), gl.edit.recode.ind(), gl.edit.recode.pop(), gl.keep.loc(), gl.make.recode.ind(), gl.read.dart(), gl.recode.ind(), gl.recode.pop(), gl.set.verbosity()
Examples
# SNP data
gl2 <- gl.drop.loc(testset.gl, loc.list=c('100051468|42-A/T', '100049816-51-A/G'),verbose=3)
# Tag P/A data
gs2 <- gl.drop.loc(testset.gs, loc.list=c('20134188','19249144'),verbose=3)
Removes specified populations from a dartR genlight object
Description
Individuals are assigned to populations based on associated specimen metadata stored in the dartR genlight object. This function deletes all individuals in the nominated populations (pop.list). Monomorphic loci and loci that are scored all NA are optionally deleted (mono.rm=TRUE). The script also optionally recalculates locus metadata statistics to accommodate the deletion of individuals from the dataset (recalc=TRUE).
The script returns a dartR genlight object with the retained populations and the recalculated locus metadata. The script works with both genlight objects containing SNP genotypes and Tag P/A data (SilicoDArT).
Usage
gl.drop.pop(
  x,
  pop.list,
  as.pop = NULL,
  recalc = FALSE,
  mono.rm = FALSE,
  verbose = NULL
)
Arguments
x | Name of the genlight object [required]. |
pop.list | List of populations to be removed [required]. |
as.pop | Temporarily assign another individual metric (e.g. 'sex') as the population for the purposes of deletions [default NULL]. |
recalc | If TRUE, recalculate the locus metadata statistics [default FALSE]. |
mono.rm | If TRUE, remove monomorphic and all NA loci [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress but not results; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
A reduced dartR genlight object
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
gl.keep.pop to keep rather than drop specified populations
Other dartR-base: gl.drop.ind(), gl.drop.loc(), gl.edit.recode.ind(), gl.edit.recode.pop(), gl.keep.loc(), gl.make.recode.ind(), gl.read.dart(), gl.recode.ind(), gl.recode.pop(), gl.set.verbosity()
Examples
# SNP data
gl2 <- gl.drop.pop(testset.gl, pop.list=c('EmsubRopeMata','EmvicVictJasp'),verbose=3)
gl2 <- gl.drop.pop(testset.gl, pop.list=c('EmsubRopeMata','EmvicVictJasp'), mono.rm=TRUE,recalc=TRUE)
gl2 <- gl.drop.pop(testset.gl,as.pop='sex',pop.list=c('Male','Unknown'),verbose=3)
# Tag P/A data
gs2 <- gl.drop.pop(testset.gs, pop.list=c('EmsubRopeMata','EmvicVictJasp'))
Creates or edits individual (=specimen) names, creates a recode_ind file and applies the changes to a genlight object
Description
A function to edit names of individuals in a dartR genlight object, or to create a reassignment table taking the individual labels from a genlight object, or to edit existing individual labels in an existing recode_ind file. The amended recode table is then applied to the genlight object.
Usage
gl.edit.recode.ind(
  x,
  out.recode.file = NULL,
  outpath = tempdir(),
  recalc = FALSE,
  mono.rm = FALSE,
  verbose = NULL
)
Arguments
x | Name of the genlight object [required]. |
out.recode.file | Name of the file to output the new individual labels [optional]. |
outpath | Path specifying where to save the output file [default tempdir(), mandated by CRAN]. |
recalc | If TRUE, recalculate the locus metadata statistics [default FALSE]. |
mono.rm | If TRUE, remove monomorphic loci [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress but not results; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
Renaming individuals may be required when there have been errors in labeling arising in the passage of samples to sequencing. There may be occasions where renaming individuals is required for preparation of figures.
This function will input an existing recode table for editing and optionally save it as a new table, or if the name of an input table is not supplied, will generate a table using the individual labels in the parent genlight object.
When caution needs to be exercised because of the potential for breaking the 'chain of evidence' associated with the samples, recoding individuals using a recode table (csv) can provide a durable record of the changes.
For SNP genotype data, the function, having deleted individuals, optionally identifies resultant monomorphic loci or loci with all values missing and deletes them. The script also optionally recalculates the locus metadata as appropriate. The optional deletion of monomorphic loci and the optional recalculation of locus statistics are not available for Tag P/A data (SilicoDArT).
Use outpath=getwd() when calling this function to direct output files to your working directory.
The function returns a dartR genlight object with the new individual labels and the recalculated locus metadata.
Value
An object of class ('genlight') with the revised individual labels.
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
gl.recode.ind, gl.drop.ind, gl.keep.ind
Other dartR-base: gl.drop.ind(), gl.drop.loc(), gl.drop.pop(), gl.edit.recode.pop(), gl.keep.loc(), gl.make.recode.ind(), gl.read.dart(), gl.recode.ind(), gl.recode.pop(), gl.set.verbosity()
Examples
## Not run: 
gl <- gl.edit.recode.ind(testset.gl)
gl <- gl.edit.recode.ind(testset.gl, out.recode.file='ind.recode.table.csv')
## End(Not run)
Creates or edits and applies a population re-assignment table
Description
A function to edit population assignments in a dartR genlight object, or to create a reassignment table taking the population assignments from a genlight object, or to edit existing population assignments in a pop.recode.table. The amended recode table is then applied to the genlight object.
Usage
gl.edit.recode.pop(
  x,
  pop.recode = NULL,
  out.recode.file = NULL,
  outpath = tempdir(),
  recalc = FALSE,
  mono.rm = FALSE,
  verbose = NULL
)
Arguments
x | Name of the genlight object [required]. |
pop.recode | Path to recode file [default NULL]. |
out.recode.file | Name of the file to output the new individual labels [default NULL]. |
outpath | Path where to save the output file [default tempdir(), mandated by CRAN]. |
recalc | If TRUE, recalculate the locus metadata statistics [default FALSE]. |
mono.rm | If TRUE, remove monomorphic loci [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress but not results; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
Genlight objects assign specimens to populations based on information in the ind.metadata file provided when the genlight object is first generated. Often one wishes to subset the data by deleting populations or to amalgamate populations. This can be done with a pop.recode table with two columns. The first column is the population assignment in the genlight object, the second column provides the new assignment.
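The two-column lookup described here can be sketched in base R. The example below is illustrative only: the population names, the column names old and new, and the in-memory data frame are assumptions, and the exact file layout that dartR expects for a recode table is not specified in this section.

```r
# A recode table with two columns: current assignment and new assignment.
# Two populations are amalgamated into "north"; one is renamed to "south".
recode <- data.frame(
  old = c("popA", "popB", "popC"),
  new = c("north", "north", "south"),
  stringsAsFactors = FALSE
)

# Current population assignment of four individuals (made-up labels)
current_pop <- c("popA", "popC", "popB", "popA")

# Look up each current assignment in the first column and take the
# corresponding entry from the second column
new_pop <- recode$new[match(current_pop, recode$old)]
new_pop
```

Keeping the table as a csv file alongside the data gives a record of every reassignment, which is the 'chain of evidence' point made in this section.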
This function will input an existing reassignment table for editing and optionally save it as a new table, or if the name of an input table is not supplied, will generate a table using the population assignments in the parent genlight object. It will then apply the recodings to the genlight object.
When caution needs to be exercised because of the potential for breaking the 'chain of evidence' associated with the samples, recoding individuals using a recode table (csv) can provide a durable record of the changes.
For SNP genotype data, the function, having deleted populations, optionally identifies resultant monomorphic loci or loci with all values missing and deletes them. The script also optionally recalculates the locus metadata as appropriate. The optional deletion of monomorphic loci and the optional recalculation of locus statistics are not available for Tag P/A data (SilicoDArT).
Use outpath=getwd() when calling this function to direct output files to your working directory.
The function returns a dartR genlight object with the new population assignments and the recalculated locus metadata.
Value
A genlight object with the revised population assignments
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
gl.recode.pop, gl.drop.pop, gl.keep.pop, gl.merge.pop, gl.reassign.pop
Other dartR-base: gl.drop.ind(), gl.drop.loc(), gl.drop.pop(), gl.edit.recode.ind(), gl.keep.loc(), gl.make.recode.ind(), gl.read.dart(), gl.recode.ind(), gl.recode.pop(), gl.set.verbosity()
Examples
## Not run: 
gl <- gl.edit.recode.pop(testset.gl)
gs <- gl.edit.recode.pop(testset.gs)
## End(Not run)
# See also -------------------
Creates an Evanno plot from a STRUCTURE run object
Description
This function takes a STRUCTURE run object (as produced by gl.run.structure) and creates an Evanno plot based on functions from strataG.
Usage
gl.evanno(sr, plot.out = TRUE)
Arguments
sr | structure run object from |
plot.out | TRUE: all four plots are shown. FALSE: all four plots are returned as a ggplot but not shown [default TRUE]. |
Details
The function is basically a convenient wrapper around the beautiful strataG function evanno (Archer et al. 2016). For a detailed description please refer to this package (see references below).
Value
An Evanno plot is created and a list of all four plots is returned.
Author(s)
Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
References
Pritchard, J.K., Stephens, M., Donnelly, P. (2000) Inference of population structure using multilocus genotype data. Genetics 155, 945-959.
Archer, F. I., Adams, P. E. and Schneiders, B. B. (2016) strataG: An R package for manipulating, summarizing and analysing population genetic data. Mol Ecol Resour. doi:10.1111/1755-0998.12559
Evanno, G., Regnaut, S., and J. Goudet. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology 14:2611-2620.
See Also
gl.run.structure, clumpp
Examples
## Not run: #CLUMPP and STRUCTURE need to be installed to be able to run the example#bc <- bandicoot.gl[,1:100]#sr <- gl.run.structure(bc, k.range = 2:5, num.k.rep = 3, exec = './structure.exe')#ev <- gl.evanno(sr)#ev#qmat <- gl.plot.structure(sr, k=3, CLUMPP='d:/structure/')#head(qmat)#gl.map.structure(qmat, bc, scalex=1, scaley=0.5)## End(Not run)Estimates the rate of false positives in a fixed difference analysis
Description
This function takes two populations and generates allele frequency profiles for them. It then samples an allele frequency for each, at random, and estimates a sampling distribution for those two allele frequencies. Drawing two samples from those sampling distributions, it calculates whether or not they represent a fixed difference. This is applied to all loci, and the number of fixed differences so generated is counted, as an expectation. The script distinguishes between true fixed differences (with a tolerance of delta) and false positives. The simulation is repeated a given number of times (default=1000) to provide an expectation of the number of false positives, given the observed allele frequency profiles and the sample sizes. The probability that the observed count of fixed differences is greater than the expected number of false positives is then calculated.
Usage
gl.fdsim(
  x,
  poppair,
  obs = NULL,
  sympatric = FALSE,
  reps = 1000,
  delta = 0.02,
  verbose = NULL
)
Arguments
x | Name of the genlight containing the SNP genotypes [required]. |
poppair | Labels of two populations for comparison in the form c(popA,popB) [required]. |
obs | Observed number of fixed differences between the two populations [default NULL]. |
sympatric | If TRUE, the two populations are sympatric, if FALSE then allopatric [default FALSE]. |
reps | Number of replications to undertake in the simulation [default 1000]. |
delta | The threshold value for the minor allele frequency to regard the difference between two populations to be fixed [default 0.02]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2]. |
Value
A list containing the following square matrices:
[[1]] observed fixed differences;
[[2]] mean expected number of false positives for each comparison;
[[3]] standard deviation of the no. of false positives for each comparison;
[[4]] probability the observed fixed differences arose by chance for each comparison.
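The simulation idea can be sketched in base R under simplifying assumptions. Here the "true" allele frequencies, the sample sizes and the binomial sampling model are all hypothetical choices made for illustration; the sketch only shows how sampled fixed differences that are not true fixed differences (within delta) can be counted over replicates, and it is not the gl.fdsim code.

```r
set.seed(1)

# Hypothetical true allele frequencies at 4 loci in two populations
p1 <- c(0.00, 0.10, 0.50, 0.02)
p2 <- c(1.00, 0.90, 0.50, 0.97)
n1 <- 5; n2 <- 5   # individuals sampled per population (assumed)
delta <- 0.02      # tolerance for calling a difference truly fixed
reps <- 1000

# True fixed difference: both frequencies within delta of opposite extremes
true_fixed <- (p1 <= delta & p2 >= 1 - delta) | (p1 >= 1 - delta & p2 <= delta)

# Per replicate: draw sample frequencies and count false positive fixed differences
false_pos <- replicate(reps, {
  f1 <- rbinom(length(p1), 2 * n1, p1) / (2 * n1)
  f2 <- rbinom(length(p2), 2 * n2, p2) / (2 * n2)
  sampled_fixed <- (f1 == 0 & f2 == 1) | (f1 == 1 & f2 == 0)
  sum(sampled_fixed & !true_fixed)
})

mean(false_pos)  # expected number of false positives
sd(false_pos)    # its standard deviation across replicates
```

With small samples, loci such as the second and fourth above can easily appear fixed by chance, which is the effect the function is designed to quantify.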
Author(s)
Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)
Examples
fd <- gl.fdsim(testset.gl[,1:100], poppair=c('EmsubRopeMata','EmmacBurnBara'), sympatric=TRUE, verbose=3)
Filters loci that are all NA across individuals and/or populations with all NA across loci
Description
This script deletes loci or individuals with all calls missing (NA) from a genlight object.
A DArT dataset will not have loci for which the calls are scored all as missing (NA) for a particular individual, but such loci can arise rarely when populations or individuals are deleted. Similarly, a DArT dataset will not have individuals for which the calls are scored all as missing (NA) across all loci, but such individuals may sneak into the dataset when loci are deleted. Retaining individuals or loci with all NAs can cause issues for several functions.
Also, on occasion an analysis will require that there are some loci scored in each population. Setting by.pop=TRUE will result in removal of loci when they are all missing in any one population.
Note that loci that are missing for all individuals in a population are not imputed with method 'frequency' or 'HW'. Consider using the function gl.filter.allna with by.pop=TRUE.
Usage
gl.filter.allna(x, by.pop = FALSE, recalc = FALSE, verbose = NULL)
Arguments
x | Name of the input genlight object [required]. |
by.pop | If TRUE, loci that are all missing in any one population are deleted [default FALSE]. |
recalc | Recalculate the locus metadata statistics if any individuals are deleted in the filtering [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
A genlight object having removed individuals that are scored NA across all loci, or loci that are scored NA across all individuals.
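The logic of the filter can be sketched on a plain genotype matrix. The matrix and population labels below are hypothetical, and the sketch uses base R only; it is an illustration of the all-NA checks rather than the function's own code.

```r
# Hypothetical genotype matrix: rows = individuals, columns = loci (NA = missing)
g <- matrix(c( 0, NA,  1, NA,
               2, NA,  0, NA,
              NA, NA, NA, NA,
               1, NA,  2,  0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("ind", 1:4), paste0("loc", 1:4)))
pop <- c("A", "A", "B", "B")

all_na_loc <- colSums(!is.na(g)) == 0   # loci missing in every individual
all_na_ind <- rowSums(!is.na(g)) == 0   # individuals missing at every locus
g_clean <- g[!all_na_ind, !all_na_loc, drop = FALSE]

# by.pop-style check: loci entirely missing in at least one population
na_in_any_pop <- apply(g, 2, function(col) any(tapply(is.na(col), pop, all)))
na_in_any_pop
```

Here loc4 is scored in population B but not in A, so it survives the global check and is only flagged by the per-population check.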
Author(s)
Author(s): Arthur Georges. Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Other filter functions: gl.filter.callrate(), gl.filter.heterozygosity(), gl.filter.hwe(), gl.filter.locmetric(), gl.filter.maf(), gl.filter.monomorphs(), gl.filter.overshoot(), gl.filter.pa(), gl.filter.parent.offspring(), gl.filter.rdepth(), gl.filter.reproducibility(), gl.filter.secondaries(), gl.filter.sexlinked(), gl.filter.taglength()
Examples
# SNP data
result <- gl.filter.allna(testset.gl, verbose=3)
# Tag P/A data
result <- gl.filter.allna(testset.gs, verbose=3)
Filters loci or specimens in a genlight {adegenet} object based on call rate
Description
SNP datasets generated by DArT have missing values primarily arising from failure to call a SNP because of a mutation at one or both of the restriction enzyme recognition sites. The script gl.filter.callrate() will filter out the loci with call rates below a specified threshold.
Tag Presence/Absence datasets (SilicoDArT) have missing values where it is not possible to determine reliably if the sequence tag can be called at a particular locus.
Usage
gl.filter.callrate(
  x,
  method = "loc",
  threshold = 0.95,
  mono.rm = FALSE,
  recalc = FALSE,
  recursive = FALSE,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  bins = 25,
  save2tmp = FALSE,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data, or the genind object containing the SilicoDArT data [required]. |
method | Use method='loc' to specify that loci are to be filtered, 'ind' to specify that specimens are to be filtered, 'pop' to remove loci that fail to meet the specified threshold in any one population [default 'loc']. |
threshold | Threshold value below which loci will be removed [default 0.95]. |
mono.rm | Remove monomorphic loci after analysis is complete [default FALSE]. |
recalc | Recalculate the locus metadata statistics if any individuals are deleted in the filtering [default FALSE]. |
recursive | Repeatedly filter individuals on call rate, each time removing monomorphic loci. Only applies if method='ind' and mono.rm=TRUE [default FALSE]. |
plot.out | Specify if histograms of call rate, before and after, are to be produced [default TRUE]. |
plot_theme | User specified theme for the plot [default theme_dartR()]. |
plot_colors | List of two color names for the borders and fill of the plots [default two_colors]. |
bins | Number of bins to display in histograms [default 25]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
Because this filter operates on call rate, this function recalculates CallRate, if necessary, before filtering. If individuals are removed using method='ind', then the call rate stored in the genlight object is, optionally, recalculated after filtering.
Note that when filtering individuals on call rate, the initial call rate is calculated and compared against the threshold. After filtering, if mono.rm=TRUE, the removal of monomorphic loci will alter the call rates. Some individuals with a call rate initially greater than the nominated threshold, and so retained, may come to have a call rate lower than the threshold. If this is a problem, repeated iterations of this function will resolve the issue. This is done by setting mono.rm=TRUE and recursive=TRUE, or it can be done manually.
Callrate is summarized by locus or by individual to allow sensible decisions on thresholds for filtering taking into consideration consequential loss of data. The summary is in the form of a tabulation and plots.
Plot themes can be obtained from
Resultant ggplot(s) and the tabulation(s) are saved to the session's temporary directory.
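Call rate by locus and by individual can be illustrated on a plain matrix. The genotype matrix and the threshold below are hypothetical; the sketch is base R only and is meant to show what the 'loc' and 'ind' methods compare against the threshold, not how the function is implemented.

```r
# Hypothetical genotype matrix: rows = individuals, columns = loci (NA = missing)
g <- matrix(c( 0,  1, NA,  2,
              NA,  1, NA,  0,
               2,  0, NA,  1,
               1, NA, NA, NA),
            nrow = 4, byrow = TRUE)

callrate_loc <- colMeans(!is.na(g))   # proportion of individuals called per locus
callrate_ind <- rowMeans(!is.na(g))   # proportion of loci called per individual

threshold <- 0.7
g_loc <- g[, callrate_loc >= threshold, drop = FALSE]   # analogue of method = 'loc'
g_ind <- g[callrate_ind >= threshold, , drop = FALSE]   # analogue of method = 'ind'

callrate_loc
callrate_ind
```

Removing the two low call rate individuals would change the per-locus call rates of the remaining data, which is why the call rates may need recalculating after filtering.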
Value
The reduced genlight or genind object, plus a summary
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Other filter functions: gl.filter.allna(), gl.filter.heterozygosity(), gl.filter.hwe(), gl.filter.locmetric(), gl.filter.maf(), gl.filter.monomorphs(), gl.filter.overshoot(), gl.filter.pa(), gl.filter.parent.offspring(), gl.filter.rdepth(), gl.filter.reproducibility(), gl.filter.secondaries(), gl.filter.sexlinked(), gl.filter.taglength()
Examples
# SNP data
result <- gl.filter.callrate(testset.gl[1:10], method='loc', threshold=0.8, verbose=3)
result <- gl.filter.callrate(testset.gl[1:10], method='ind', threshold=0.8, verbose=3)
result <- gl.filter.callrate(testset.gl[1:10], method='pop', threshold=0.8, verbose=3)
# Tag P/A data
result <- gl.filter.callrate(testset.gs[1:10], method='loc', threshold=0.95, verbose=3)
result <- gl.filter.callrate(testset.gs[1:10], method='ind', threshold=0.8, verbose=3)
result <- gl.filter.callrate(testset.gs[1:10], method='pop', threshold=0.8, verbose=3)
res <- gl.filter.callrate(platypus.gl)
Filters loci based on pairwise Hamming distance between sequence tags
Description
Hamming distance is calculated as the number of base differences between two sequences, which can be expressed as a count or a proportion. Typically, it is calculated between two sequences of equal length. In the context of DArT trimmed sequences, which differ in length but which are anchored to the left by the restriction enzyme recognition sequence, it is sensible to compare the two trimmed sequences starting from immediately after the common recognition sequence and terminating at the last base of the shorter sequence.
Usage
gl.filter.hamming(
  x,
  threshold = 0.2,
  rs = 5,
  taglength = 69,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  pb = FALSE,
  save2tmp = FALSE,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
threshold | A threshold Hamming distance for filtering loci [default 0.2]. |
rs | Number of bases in the restriction enzyme recognition sequence [default 5]. |
taglength | Typical length of the sequence tags [default 69]. |
plot.out | Specify if plot is to be produced [default TRUE]. |
plot_theme | Theme for the plot. See Details for options [default theme_dartR()]. |
plot_colors | List of two color names for the borders and fill of the plots [default two_colors]. |
pb | Switch to output progress bar [default FALSE]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
Hamming distance can be computed by exploiting the fact that the dot product of two binary vectors x and (1-y) counts the positions at which x is 1 and y is 0; adding the dot product of (1-x) and y gives all positions at which x and y differ. This approach can also be used for vectors that contain more than two possible values at each position (e.g. A, C, T or G).
If a pair of DNA sequences are of differing length, the longer is truncated.
The algorithm is that of Johann de Jong (https://johanndejong.wordpress.com/2015/10/02/faster-hamming-distance-in-r-2/) as implemented in utils.hamming.
Only one of two loci is retained if their Hamming distance is less than a specified proportion. For example, 5 base differences out of 100 bases is a Hamming distance of 0.05 (5%).
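The dot-product idea can be sketched for a single pair of sequence tags. The function name hamming_prop and the example tags are hypothetical; the sketch one-hot encodes each base, skips the first rs bases (the shared recognition sequence), truncates the longer tag, and is not the utils.hamming code. Characters other than A, C, G and T are ignored in this simplified version.

```r
hamming_prop <- function(s1, s2, rs = 5) {
  # Truncate to the shorter tag and drop the shared recognition sequence
  n <- min(nchar(s1), nchar(s2))
  a <- strsplit(substr(s1, rs + 1, n), "")[[1]]
  b <- strsplit(substr(s2, rs + 1, n), "")[[1]]
  # For each base, x . (1 - y) counts positions where s1 has the base and s2 does not;
  # summed over the four bases, every mismatching position is counted exactly once
  d <- sum(sapply(c("A", "C", "G", "T"), function(base) {
    x <- as.integer(a == base)
    y <- as.integer(b == base)
    sum(x * (1 - y))
  }))
  d / length(a)   # distance as a proportion of compared bases
}

# Two made-up tags sharing the 5-base prefix TGCAG; 2 of the 8 compared bases differ
hamming_prop("TGCAGACGTACGT", "TGCAGACGAACGAAA")
```

For many loci the same calculation is normally done with matrix products over all pairs at once, which is where the speed advantage of the dot-product formulation comes from.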
Value
A genlight object filtered on Hamming distance.
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Examples
# SNP data
test <- platypus.gl
test <- gl.subsample.loci(platypus.gl,n=50)
result <- gl.filter.hamming(test, threshold=0.25, verbose=3)
Filters individuals with average heterozygosity greater than a specified upper threshold or less than a specified lower threshold
Description
Calculates the observed heterozygosity for each individual in a genlight object and filters individuals based on specified threshold values. Use gl.report.heterozygosity to determine the appropriate thresholds.
Usage
gl.filter.heterozygosity(x, t.upper = 0.7, t.lower = 0, verbose = NULL)
Arguments
x | A genlight object containing the SNP genotypes [required]. |
t.upper | Filter individuals > the threshold [default 0.7]. |
t.lower | Filter individuals < the threshold [default 0]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
The filtered genlight object.
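As an illustration, individual observed heterozygosity can be computed on a plain matrix as the share of called loci scored as heterozygous (1). The matrix and thresholds are hypothetical, and treating the thresholds as inclusive bounds is an assumption of this sketch rather than documented behaviour.

```r
# Hypothetical genotype matrix: rows = individuals, columns = loci
g <- matrix(c(0, 1,  1, 2,
              0, 0, NA, 2,
              1, 1,  1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("ind1", "ind2", "ind3"), NULL))

# Observed heterozygosity: proportion of non-missing loci scored 1
ho <- rowMeans(g == 1, na.rm = TRUE)

t.lower <- 0.1; t.upper <- 0.7
keep <- ho >= t.lower & ho <= t.upper   # inclusive bounds assumed
g_kept <- g[keep, , drop = FALSE]
ho
```

An individual with unusually high heterozygosity (ind3 here) may indicate contamination, while a very low value (ind2) may reflect poor quality or inbreeding.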
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
See Also
Other filter functions: gl.filter.allna(), gl.filter.callrate(), gl.filter.hwe(), gl.filter.locmetric(), gl.filter.maf(), gl.filter.monomorphs(), gl.filter.overshoot(), gl.filter.pa(), gl.filter.parent.offspring(), gl.filter.rdepth(), gl.filter.reproducibility(), gl.filter.secondaries(), gl.filter.sexlinked(), gl.filter.taglength()
Examples
result <- gl.filter.heterozygosity(testset.gl,t.upper=0.06,verbose=3)
tmp <- gl.report.heterozygosity(result,method='ind')
Filters loci that show significant departure from Hardy-Weinberg Equilibrium
Description
This function filters out loci showing significant departure from H-W proportions based on observed frequencies of reference homozygotes, heterozygotes and alternate homozygotes.
Loci are filtered out if they show HWE departure either in any one population (n.pop.threshold = 1) or in at least X number of populations (n.pop.threshold > 1).
Usage
gl.filter.hwe(
  x,
  subset = "each",
  n.pop.threshold = 1,
  method_sig = "Exact",
  multi_comp = FALSE,
  multi_comp_method = "BY",
  alpha_val = 0.05,
  pvalue_type = "midp",
  cc_val = 0.5,
  min_sample_size = 5,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
subset | Way to group individuals to perform H-W tests. Either a vector with population names, 'each', 'all' (see details) [default 'each']. |
n.pop.threshold | The minimum number of populations where the same locus has to be out of H-W proportions to be removed [default 1]. |
method_sig | Method for determining statistical significance: 'ChiSquare' or 'Exact' [default 'Exact']. |
multi_comp | Whether to adjust p-values for multiple comparisons [default FALSE]. |
multi_comp_method | Method to adjust p-values for multiple comparisons: 'holm', 'hochberg', 'hommel', 'bonferroni', 'BH', 'BY', 'fdr' (see details) [default 'BY']. |
alpha_val | Level of significance for testing [default 0.05]. |
pvalue_type | Type of p-value to be used in the Exact method. Either 'dost', 'selome', 'midp' (see details) [default 'midp']. |
cc_val | The continuity correction applied to the ChiSquare test [default 0.5]. |
min_sample_size | Minimum number of individuals per population in which to perform H-W tests [default 5]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
There are several factors that can cause deviations from Hardy-Weinberg proportions including: mutation, finite population size, selection, population structure, age structure, assortative mating, sex linkage, nonrandom sampling and genotyping errors. Therefore, testing for Hardy-Weinberg proportions should be a process that involves a careful evaluation of the results; a good place to start is Waples (2015).
Note that tests for H-W proportions are only valid if there is no population substructure (assuming random mating) and have sufficient power only when there is sufficient sample size (n individuals > 15).
Populations can be defined in three ways:
Merging all populations in the dataset using subset = 'all'.
Within each population separately using: subset = 'each'.
Within selected populations using for example: subset = c('pop1','pop2').
Two different statistical methods are available to test for deviations from Hardy-Weinberg proportions:
The classical chi-square test (method_sig='ChiSquare') based on the function HWChisq of the R package HardyWeinberg. By default a continuity correction is applied (cc_val=0.5). The continuity correction can be turned off (by specifying cc_val=0), for example in cases of extreme allele frequencies in which the continuity correction can lead to excessive type 1 error rates.
The exact test (method_sig='Exact') based on the exact calculations contained in the function HWExactStats of the R package HardyWeinberg, and described in Wigginton et al. (2005). The exact test is recommended in most cases (Wigginton et al., 2005).
Three different methods to estimate p-values (pvalue_type) in the Exact test can be used:
'dost' p-value is computed as twice the tail area of a one-sided test.
'selome' p-value is computed as the sum of the probabilities of all samples less or equally likely as the current sample.
'midp' p-value is computed as half the probability of the current sample + the probabilities of all samples that are more extreme.
The standard exact p-value is overly conservative, in particular for small minor allele frequencies. The mid p-value ameliorates this problem by bringing the rejection rate closer to the nominal level, at the price of occasionally exceeding the nominal level (Graffelman & Moreno, 2013).
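To make the role of the continuity correction concrete, the following base-R sketch computes a one-degree-of-freedom chi-square statistic for Hardy-Weinberg proportions from three genotype counts. The function name hw_chisq is hypothetical and the way the correction enters the statistic (subtracting cc from each absolute deviation) is an assumption of this sketch; it is meant as an illustration and may differ in detail from HWChisq in the HardyWeinberg package.

```r
hw_chisq <- function(n_ref, n_het, n_alt, cc = 0.5) {
  n <- n_ref + n_het + n_alt
  p <- (2 * n_ref + n_het) / (2 * n)                  # reference allele frequency
  expected <- n * c(p^2, 2 * p * (1 - p), (1 - p)^2)  # expected counts under H-W
  observed <- c(n_ref, n_het, n_alt)
  stat <- sum((abs(observed - expected) - cc)^2 / expected)
  c(chisq = stat, p.value = pchisq(stat, df = 1, lower.tail = FALSE))
}

hw_chisq(25, 50, 25, cc = 0)   # counts exactly at H-W proportions
hw_chisq(40, 20, 40, cc = 0)   # strong heterozygote deficit
hw_chisq(40, 20, 40)           # same counts with the default correction
```

The sketch does not handle monomorphic loci, where the expected counts of two genotype classes are zero.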
Correction for multiple tests can be applied using the following methods based on the function p.adjust:
'holm' is also known as the sequential Bonferroni technique (Rice, 1989). This method has a greater statistical power than the standard Bonferroni test, however this method becomes very stringent when many tests are performed and many real deviations from the null hypothesis can go undetected (Waples, 2015).
'hochberg' based on Hochberg, 1988.
'hommel' based on Hommel, 1988. This method is more powerful than Hochberg's, but the difference is usually small.
'bonferroni' in which p-values are multiplied by the number of tests. This method is very stringent and therefore has reduced power to detect multiple departures from the null hypothesis.
'BH' based on Benjamini & Hochberg, 1995.
'BY' based on Benjamini & Yekutieli, 2001.
The first four methods are designed to give strong control of the family-wise error rate. The last two methods control the false discovery rate (FDR), the expected proportion of false discoveries among the rejected hypotheses. The false discovery rate is a less stringent condition than the family-wise error rate, so these methods are more powerful than the others, especially when the number of tests is large. The number of tests on which the adjustment for multiple comparisons is based is the number of populations times the number of loci.
From v2.1, gl.filter.hwe takes the argument n.pop.threshold. If n.pop.threshold > 1, loci will be removed only if they are concurrently significantly out of HWE (after adjustment, if applied) in at least n.pop.threshold populations.
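The interplay between the multiple-comparison adjustment and n.pop.threshold can be sketched with stats::p.adjust on a made-up matrix of p-values. The p-values, the threshold settings and the variable names are hypothetical; the sketch adjusts over all loci by populations tests at once, as described above, and then counts the populations in which each locus is significant.

```r
# Hypothetical raw p-values: rows = loci, columns = populations
pvals <- matrix(c(0.0001, 0.0002, 0.40,
                  0.0300, 0.6000, 0.70,
                  0.0001, 0.5000, 0.90),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("loc", 1:3), paste0("pop", 1:3)))

# Adjust across all 9 tests (loci x populations) with the Benjamini-Yekutieli method
adj <- matrix(p.adjust(pvals, method = "BY"), nrow = nrow(pvals),
              dimnames = dimnames(pvals))

alpha_val <- 0.05
n.pop.threshold <- 2
n_sig <- rowSums(adj < alpha_val)                  # populations out of H-W per locus
drop_loci <- names(n_sig)[n_sig >= n.pop.threshold]
drop_loci
```

Note that the raw p-value of 0.03 for loc2 is no longer significant after adjustment, and loc3 is retained because it departs from H-W proportions in only one population.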
Value
A genlight object with the loci departing significantly from H-W proportions removed.
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
References
Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1165–1188.
Graffelman, J. (2015). Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64:1-23.
Graffelman, J. & Morales-Camarena, J. (2008). Graphical tests for Hardy-Weinberg equilibrium based on the ternary plot. Human Heredity 65:77-84.
Graffelman, J., & Moreno, V. (2013). The mid p-value in exact tests for Hardy-Weinberg equilibrium. Statistical applications in genetics and molecular biology, 12(4), 433-448.
Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800–803.
Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika, 75, 383–386.
Rice, W. R. (1989). Analyzing tables of statistical tests. Evolution, 43(1), 223-225.
Waples, R. S. (2015). Testing for Hardy–Weinberg proportions: have we lost the plot?. Journal of heredity, 106(1), 1-19.
Wigginton, J.E., Cutler, D.J., & Abecasis, G.R. (2005). A Note on Exact Tests of Hardy-Weinberg Equilibrium. American Journal of Human Genetics 76:887-893.
See Also
Other filter functions: gl.filter.allna(), gl.filter.callrate(), gl.filter.heterozygosity(), gl.filter.locmetric(), gl.filter.maf(), gl.filter.monomorphs(), gl.filter.overshoot(), gl.filter.pa(), gl.filter.parent.offspring(), gl.filter.rdepth(), gl.filter.reproducibility(), gl.filter.secondaries(), gl.filter.sexlinked(), gl.filter.taglength()
Examples
result <- gl.filter.hwe(x = bandicoot.gl)
Filters loci on the basis of numeric information stored in other$loc.metrics in a genlight {adegenet} object
Description
This script uses any field with numeric values stored in $other$loc.metrics to filter loci. The loci to keep can be within the upper and lower thresholds ('within') or outside of the upper and lower thresholds ('outside').
Usage
gl.filter.locmetric(x, metric, upper, lower, keep = "within", verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
metric | Name of the metric to be used for filtering [required]. |
upper | Filter upper threshold [required]. |
lower | Filter lower threshold [required]. |
keep | Whether to keep loci within the upper and lower thresholds ('within') or outside the upper and lower thresholds ('outside') [default 'within']. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
The fields that are included in dartR, and a short description, are found below. Optionally, the user can also set his/her own filter by adding a vector into $other$loc.metrics as shown in the example.
SnpPosition - position (zero is position 1) in the sequence tag of the defined SNP variant base.
CallRate - proportion of samples for which the genotype call is non-missing (that is, not '-').
OneRatioRef - proportion of samples for which the genotype score is 0.
OneRatioSnp - proportion of samples for which the genotype score is 2.
FreqHomRef - proportion of samples homozygous for the Reference allele.
FreqHomSnp - proportion of samples homozygous for the Alternate (SNP) allele.
FreqHets - proportion of samples which score as heterozygous, that is, scored as 1.
PICRef - polymorphism information content (PIC) for the Reference allele.
PICSnp - polymorphism information content (PIC) for the SNP.
AvgPIC - average of the polymorphism information content (PIC) of the Reference and SNP alleles.
AvgCountRef - sum of the tag read counts for all samples, divided by the number of samples with non-zero tag read counts, for the Reference allele row.
AvgCountSnp - sum of the tag read counts for all samples, divided by the number of samples with non-zero tag read counts, for the Alternate (SNP) allele row.
RepAvg - proportion of technical replicate assay pairs for which the marker score is consistent.
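The 'within' and 'outside' options amount to a range test on one numeric metric per locus, which can be sketched in base R. The metric values and thresholds below are hypothetical, and treating both thresholds as inclusive is an assumption of the sketch.

```r
# Hypothetical locus metric, one value per locus (e.g. one column of loc.metrics)
metric <- c(loc1 = 0.99, loc2 = 0.80, loc3 = 0.95, loc4 = 0.60)
lower <- 0.75; upper <- 0.97

inside <- metric >= lower & metric <= upper   # inclusive bounds assumed
keep_within  <- names(metric)[inside]         # analogue of keep = 'within'
keep_outside <- names(metric)[!inside]        # analogue of keep = 'outside'

keep_within
keep_outside
```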
Value
The reduced genlight dataset.
Author(s)
Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
See Also
Other filter functions: gl.filter.allna(), gl.filter.callrate(), gl.filter.heterozygosity(), gl.filter.hwe(), gl.filter.maf(), gl.filter.monomorphs(), gl.filter.overshoot(), gl.filter.pa(), gl.filter.parent.offspring(), gl.filter.rdepth(), gl.filter.reproducibility(), gl.filter.secondaries(), gl.filter.sexlinked(), gl.filter.taglength()
Examples
# adding dummy data
test <- testset.gl
test$other$loc.metrics$test <- 1:nLoc(test)
result <- gl.filter.locmetric(x=test, metric= 'test', upper=255, lower=200, keep= 'within', verbose=3)
Filters loci on the basis of minor allele frequency (MAF) in a genlight {adegenet} object
Description
This script calculates the minor allele frequency for each locus and updates the locus metadata for FreqHomRef, FreqHomSnp, FreqHets and MAF (if it exists). It then uses the updated metadata for MAF to filter loci.
Usage
gl.filter.maf(
  x,
  threshold = 0.01,
  by.pop = FALSE,
  pop.limit = ceiling(nPop(x)/2),
  ind.limit = 10,
  recalc = FALSE,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors_pop = discrete_palette,
  plot_colors_all = two_colors,
  bins = 25,
  save2tmp = FALSE,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
threshold | Threshold MAF – loci with a MAF less than the threshold will be removed. If a value > 1 is provided it will be interpreted as MAC (i.e. the minimum number of times an allele needs to be observed) [default 0.01]. |
by.pop | Whether MAF should be calculated by population [default FALSE]. |
pop.limit | Minimum number of populations in which MAF should be less than the threshold for a locus to be filtered out. Only used if by.pop=TRUE. The default value is half of the populations [default ceiling(nPop(x)/2)]. |
ind.limit | Minimum number of individuals that a population should contain to calculate MAF. Only used if by.pop=TRUE [default 10]. |
recalc | Recalculate the locus metadata statistics if any individuals are deleted in the filtering [default FALSE]. |
plot.out | Specify if histograms of MAF, before and after, are to be produced [default TRUE]. |
plot_theme | User specified theme for the plot [default theme_dartR()]. |
plot_colors_pop | A color palette for population plots [default discrete_palette]. |
plot_colors_all | List of two color names for the borders and fill of the overall plot [default two_colors]. |
bins | Number of bins to display in histograms [default 25]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
Careful consideration needs to be given to the settings used for this function. When the filter is applied globally (i.e. by.pop=FALSE) but the data include multiple populations, there is a risk of removing markers because their allele frequency is low at the global level, even though the allele frequency for the same markers may be high within some of the populations (especially if the per-population sample size is small). Similarly, it is not always sensible to run this function with by.pop=TRUE, because alleles that are rare in one population may be very common in another, and the (possible) allele frequencies will depend on the sample size within each population. Where the purpose of filtering for MAF is to remove possible spurious alleles (i.e. sequencing errors), it is perhaps better to filter based on the number of times an allele is observed (MAC, Minimum Allele Count), under the assumption that an allele observed more than MAC times is unlikely to be an error.
From v2.1, the threshold can take values > 1. In this case, these are interpreted as a threshold for MAC.
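The difference between a frequency threshold (MAF) and a count threshold (MAC) can be illustrated on a plain genotype matrix. The matrix and the two thresholds are hypothetical, and the sketch computes global values only (the by.pop=FALSE case); it is not the function's own code.

```r
# Hypothetical genotype matrix: rows = individuals, columns = loci
# Scores count copies of the alternate allele (0, 1, 2); NA = missing
g <- matrix(c(0, 0,  1, 2,
              0, 0,  0, 2,
              0, 1, NA, 1,
              0, 0,  1, 0),
            nrow = 4, byrow = TRUE)

alt_count <- colSums(g, na.rm = TRUE)          # copies of the alternate allele
n_alleles <- 2 * colSums(!is.na(g))            # alleles actually scored
alt_freq  <- alt_count / n_alleles
maf <- pmin(alt_freq, 1 - alt_freq)            # minor allele frequency
mac <- pmin(alt_count, n_alleles - alt_count)  # minor allele count

keep_maf <- maf >= 0.2   # frequency-based threshold
keep_mac <- mac >= 2     # count-based threshold
rbind(maf = maf, mac = mac)
```

With missing data the two criteria can diverge, because the same minor allele count corresponds to a higher frequency at loci with fewer scored alleles (locus 3 here).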
Value
The reduced genlight dataset
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
See Also
Other filter functions: gl.filter.allna(), gl.filter.callrate(), gl.filter.heterozygosity(), gl.filter.hwe(), gl.filter.locmetric(), gl.filter.monomorphs(), gl.filter.overshoot(), gl.filter.pa(), gl.filter.parent.offspring(), gl.filter.rdepth(), gl.filter.reproducibility(), gl.filter.secondaries(), gl.filter.sexlinked(), gl.filter.taglength()
Examples
result <- gl.filter.monomorphs(testset.gl)
result <- gl.filter.maf(result, threshold=0.05, verbose=3)
Filters monomorphic loci, including those with all NAs
Description
This script deletes monomorphic loci from a genlight {adegenet} object.
A DArT dataset will not have monomorphic loci, but they can arise, along with loci that are scored all NA, when populations or individuals are deleted.
Retaining monomorphic loci unnecessarily increases the size of the dataset and will affect some calculations.
Note that for SNP data, NAs likely represent null alleles; in tag presence/absence data, NAs represent missing values (presence/absence could not be reliably scored).
Usage
gl.filter.monomorphs(x, verbose = NULL)
Arguments
x | Name of the input genlight object [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
A genlight object with monomorphic (and all NA) loci removed.
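For SNP data, one way to express the monomorphism check on a plain matrix is shown below. The matrix is hypothetical, and the rule used (a locus is monomorphic when all called genotypes are 0, all are 2, or none is called) is an assumption of this sketch; note that a locus scored as all heterozygotes still carries both alleles and is therefore kept.

```r
# Hypothetical SNP matrix: rows = individuals, columns = loci
g <- matrix(c(0, 1,  2, NA, 1,
              0, 0,  2, NA, 1,
              0, 2, NA, NA, 1),
            nrow = 3, byrow = TRUE)

is_mono <- apply(g, 2, function(col) {
  called <- col[!is.na(col)]
  # all(logical(0)) is TRUE, so all-NA loci are flagged as well
  all(called == 0) || all(called == 2)
})
g_poly <- g[, !is_mono, drop = FALSE]
is_mono
```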
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Other filter functions: gl.filter.allna(), gl.filter.callrate(), gl.filter.heterozygosity(), gl.filter.hwe(), gl.filter.locmetric(), gl.filter.maf(), gl.filter.overshoot(), gl.filter.pa(), gl.filter.parent.offspring(), gl.filter.rdepth(), gl.filter.reproducibility(), gl.filter.secondaries(), gl.filter.sexlinked(), gl.filter.taglength()
Examples
# SNP data
result <- gl.filter.monomorphs(testset.gl, verbose=3)
# Tag P/A data
result <- gl.filter.monomorphs(testset.gs, verbose=3)
Filters loci for which the SNP has been trimmed from the sequence tag along with the adaptor
Description
This function checks the position of the SNP within the trimmed sequence tag and identifies those for which the SNP position is outside the trimmed sequence tag. This can happen, rarely, when the sequence containing the SNP resembles the adaptor.
The SNP genotype can still be used in most analyses, but functions like gl2fasta() will present challenges if the SNP has been trimmed from the sequence tag.
Not fatal, but this filter should be applied before gl.filter.secondaries, for obvious reasons.
Usage
gl.filter.overshoot(x, save2tmp = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object [required]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
A new genlight object with the recalcitrant loci deleted
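The check itself is a comparison between the SNP position and the length of the trimmed tag. The sketch below uses a made-up data frame with columns named SnpPosition and TrimmedSequence, and assumes a zero-based SNP position, so that a position greater than or equal to the tag length falls outside the tag; the exact comparison used by the function may differ.

```r
# Hypothetical locus metadata: zero-based SNP position and trimmed tag sequence
loc <- data.frame(
  SnpPosition = c(10, 25, 7),
  TrimmedSequence = c("TGCAGACGTACGTAC", "TGCAGACGTACGTACGTAGC", "TGCAGACGT"),
  stringsAsFactors = FALSE
)

# With zero-based positions, a SNP lies inside the tag only if position < tag length
overshoot <- loc$SnpPosition >= nchar(loc$TrimmedSequence)
loc_kept <- loc[!overshoot, ]
overshoot
```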
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Other filter functions: gl.filter.allna(), gl.filter.callrate(), gl.filter.heterozygosity(), gl.filter.hwe(), gl.filter.locmetric(), gl.filter.maf(), gl.filter.monomorphs(), gl.filter.pa(), gl.filter.parent.offspring(), gl.filter.rdepth(), gl.filter.reproducibility(), gl.filter.secondaries(), gl.filter.sexlinked(), gl.filter.taglength()
Examples
result <- gl.filter.overshoot(testset.gl, verbose=3)
Filters loci that contain private (and fixed) alleles between two populations
Description
This script is meant to be used prior to gl.nhybrids to maximise the information content of the SNPs used to identify hybrids (currently newhybrids allows only 200 SNPs). The idea is to first use all loci that have fixed alleles between the potential source populations and then 'fill up' to 200 loci using loci that have private alleles between those. The function filters for those loci. If invers is set to TRUE, the opposite is returned (all loci that are not fixed and have no private alleles), which may be useful in some cases.
Usage
gl.filter.pa(x, pop1, pop2, invers = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
pop1 | Name of the first parental population (in quotes) [required]. |
pop2 | Name of the second parental population (in quotes) [required]. |
invers | Switch to filter for all loci that have no private alleles and are not fixed [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
The reduced genlight dataset, now containing only loci with fixed or private alleles.
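Fixed and private alleles between two populations can be identified from per-population allele frequencies, as in the base-R sketch below. The genotype matrix and population labels are hypothetical. A locus is treated as fixed when the populations carry opposite alleles at frequency 1, and as private when one allele is present in only one of the two populations; these definitions are stated here as assumptions of the sketch.

```r
# Hypothetical SNP matrix: rows = individuals, columns = loci
g <- matrix(c(0, 0, 1, 1,
              0, 0, 0, 1,
              2, 1, 0, 1,
              2, 2, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, paste0("loc", 1:4)))
pop <- c("pop1", "pop1", "pop2", "pop2")

# Alternate allele frequency per locus in each population
f1 <- colMeans(g[pop == "pop1", , drop = FALSE], na.rm = TRUE) / 2
f2 <- colMeans(g[pop == "pop2", , drop = FALSE], na.rm = TRUE) / 2

fixed <- (f1 == 0 & f2 == 1) | (f1 == 1 & f2 == 0)
# Private: exactly one population lacks the alternate (or the reference) allele
private <- !fixed & (xor(f1 == 0, f2 == 0) | xor(f1 == 1, f2 == 1))

keep <- fixed | private   # loci that would be retained (invers = FALSE)
rbind(fixed = fixed, private = private)
```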
Author(s)
Authors: Bernd Gruber & Ella Kelly (University of Melbourne); Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
See Also
Other filter functions: gl.filter.allna(), gl.filter.callrate(), gl.filter.heterozygosity(), gl.filter.hwe(), gl.filter.locmetric(), gl.filter.maf(), gl.filter.monomorphs(), gl.filter.overshoot(), gl.filter.parent.offspring(), gl.filter.rdepth(), gl.filter.reproducibility(), gl.filter.secondaries(), gl.filter.sexlinked(), gl.filter.taglength()
Examples
result <- gl.filter.pa(testset.gl, pop1=pop(testset.gl)[1], pop2=pop(testset.gl)[2], verbose=3)
Filters putative parent offspring within a population
Description
This script removes individuals suspected of being related as parent-offspring, using the output of the function gl.report.parent.offspring, which examines the frequency of pedigree inconsistent loci, that is, those loci that are homozygous in the parent for the reference allele and homozygous in the offspring for the alternate allele. This condition is not consistent with any pedigree, regardless of the (unknown) genotype of the other parent. The pedigree inconsistent loci are counted as an indication of whether or not it is reasonable to propose the two individuals are in a parent-offspring relationship.
Usage
gl.filter.parent.offspring(x, min.rdepth = 12, min.reproducibility = 1, range = 1.5, method = "best", rm.monomorphs = FALSE, plot.out = TRUE, plot_theme = theme_dartR(), plot_colors = two_colors, save2tmp = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP genotypes [required]. |
min.rdepth | Minimum read depth to include in analysis [default 12]. |
min.reproducibility | Minimum reproducibility to include in analysis [default 1]. |
range | Specifies the range to extend beyond the interquartile range for delimiting outliers [default 1.5 interquartile ranges]. |
method | Method of selecting the individual to retain from each pair of parent offspring relationship, 'best' (based on CallRate) or 'random' [default 'best']. |
rm.monomorphs | If TRUE, remove monomorphic loci after filtering individuals [default FALSE]. |
plot.out | Specify if plot is to be produced [default TRUE]. |
plot_theme | Theme for the plot. See Details for options [default theme_dartR()]. |
plot_colors | List of two color names for the borders and fill of the plots [default two_colors]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
If two individuals are in a parent-offspring relationship, the true number of pedigree inconsistent loci should be zero, but SNP calling is not infallible. Some loci will be miscalled. The problem thus becomes one of determining if the two focal individuals have a count of pedigree inconsistent loci less than would be expected of typical unrelated individuals. There are some quite sophisticated software packages available to formally apply likelihoods to the decision, but we use a simple outlier comparison.
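The counting and outlier comparison can be illustrated with a small base-R simulation. This is a sketch under stated assumptions, not the package's implementation: the data are simulated, genotypes are assumed to be coded 0/1/2 (count of the alternate allele), and the helper `count_inconsistent()` and the use of a lower fence at the first quartile minus 1.5 interquartile ranges are illustrative choices.

```r
set.seed(1)  # simulated data, for illustration only
n_ind <- 20
n_loc <- 500

# Unrelated individuals: genotypes coded 0/1/2 (count of alternate allele)
geno <- matrix(rbinom(n_ind * n_loc, 2, 0.5), nrow = n_ind,
               dimnames = list(paste0("ind", 1:n_ind), NULL))

# Offspring of ind1: one allele transmitted by ind1, one drawn from the population
transmitted <- rbinom(n_loc, 1, geno["ind1", ] / 2)
offspring <- transmitted + rbinom(n_loc, 1, 0.5)
geno <- rbind(geno, offspring = offspring)

# Pedigree-inconsistent loci: opposite homozygotes in the two individuals
count_inconsistent <- function(g1, g2) {
  sum((g1 == 0 & g2 == 2) | (g1 == 2 & g2 == 0), na.rm = TRUE)
}

pairs <- t(combn(rownames(geno), 2))
counts <- apply(pairs, 1, function(p) count_inconsistent(geno[p[1], ], geno[p[2], ]))
names(counts) <- paste(pairs[, 1], pairs[, 2], sep = "-")

# Lower outlier fence: first quartile minus iqr_range interquartile ranges
iqr_range <- 1.5
q <- quantile(counts, c(0.25, 0.75), names = FALSE)
lower_fence <- q[1] - iqr_range * (q[2] - q[1])
putative_po <- names(counts)[counts < lower_fence]
print(putative_po)
```

In this error-free simulation the true parent-offspring pair has a count of exactly zero; with real data, miscalls push that count above zero, which is why a comparison against the distribution of all pairs is used rather than a test for zero.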
To reduce the frequency of miscalls, and so emphasize the difference between true parent-offspring pairs and unrelated pairs, the data can be filtered on read depth. Typically minimum read depth is set to 5x, but you can examine the distribution of read depths with the function gl.report.rdepth and push this up with an acceptable loss of loci. 12x might be a good minimum for this particular analysis. It is sensible also to push the minimum reproducibility up to 1, if that does not result in an unacceptable loss of loci. Reproducibility is stored in the slot @other$loc.metrics$RepAvg and is defined as the proportion of technical replicate assay pairs for which the marker score is consistent. You can examine the distribution of reproducibility with the function gl.report.reproducibility.
Note that the null expectation is not well defined, and the power reduced, if the population from which the putative parent-offspring pairs are drawn contains many sibs. Note also that if an individual has been genotyped twice in the dataset, the replicate pair will be assessed by this script as being in a parent-offspring relationship.
You should run gl.report.parent.offspring before filtering. Use this report to decide min.rdepth and min.reproducibility and assess the impact on your dataset.
Note that if your dataset does not contain RepAvg or rdepth among the locus metrics, the filters for reproducibility and read depth are not used.
Function's output
Plots and tables are saved to the temporary directory (tempdir) and can be accessed with the function gl.print.reports and listed with the function gl.list.reports. Note that they can be accessed only in the current R session because tempdir is cleared each time the R session is closed.
Examples of other themes that can be used can be consulted in
Value
The filtered genlight object without the set of individuals in a parent-offspring relationship. NULL if no parent-offspring relationships were found.
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
gl.list.reports, gl.report.rdepth, gl.print.reports, gl.report.reproducibility, gl.report.parent.offspring
Other filter functions: gl.filter.allna(), gl.filter.callrate(), gl.filter.heterozygosity(), gl.filter.hwe(), gl.filter.locmetric(), gl.filter.maf(), gl.filter.monomorphs(), gl.filter.overshoot(), gl.filter.pa(), gl.filter.rdepth(), gl.filter.reproducibility(), gl.filter.secondaries(), gl.filter.sexlinked(), gl.filter.taglength()
Examples
out <- gl.filter.parent.offspring(testset.gl[1:10,1:50])
Filters loci based on counts of sequence tags scored at a locus (read depth)
Description
SNP datasets generated by DArT report AvgCountRef and AvgCountSnp as counts of sequence tags for the reference and alternate alleles respectively. These can be used to back calculate Read Depth. Fragment presence/absence datasets as provided by DArT (SilicoDArT) provide Average Read Depth and Standard Deviation of Read Depth as standard columns in their report.
Filtering on Read Depth with gl.filter.rdepth can be on the basis of loci with exceptionally low counts, or loci with exceptionally high counts.
Usage
gl.filter.rdepth(x, lower = 5, upper = 50, plot.out = TRUE, plot_theme = theme_dartR(), plot_colors = two_colors, save2tmp = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP or tag presence/absence data [required]. |
lower | Lower threshold value below which loci will be removed [default 5]. |
upper | Upper threshold value above which loci will be removed [default 50]. |
plot.out | Specify if plot is to be produced [default TRUE]. |
plot_theme | Theme for the plot. See Details for options [default theme_dartR()]. |
plot_colors | List of two color names for the borders and fill of the plots [default two_colors]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
For examples of themes, see:
Value
Returns a genlight object retaining loci with a Read Depth in the range specified by the lower and upper threshold.
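The thresholding itself amounts to a range check on a per-locus read depth value. The base-R sketch below uses a made-up locus metrics table; the data frame `loc_metrics` is hypothetical, and treating values exactly equal to a threshold as retained is an assumption based on the wording 'below which' and 'above which'.

```r
# Hypothetical locus metrics with a per-locus read depth column (values made up)
loc_metrics <- data.frame(
  locus  = paste0("loc", 1:6),
  rdepth = c(3.2, 5.0, 12.7, 49.9, 50.0, 81.4)
)

lower <- 5    # loci with read depth below this are removed
upper <- 50   # loci with read depth above this are removed

keep <- loc_metrics$rdepth >= lower & loc_metrics$rdepth <= upper
kept_loci <- as.character(loc_metrics$locus[keep])
print(kept_loci)
```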
Author(s)
Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)
See Also
Other filter functions: gl.filter.allna(), gl.filter.callrate(), gl.filter.heterozygosity(), gl.filter.hwe(), gl.filter.locmetric(), gl.filter.maf(), gl.filter.monomorphs(), gl.filter.overshoot(), gl.filter.pa(), gl.filter.parent.offspring(), gl.filter.reproducibility(), gl.filter.secondaries(), gl.filter.sexlinked(), gl.filter.taglength()
Examples
# SNP data
gl.report.rdepth(testset.gl)
result <- gl.filter.rdepth(testset.gl, lower=8, upper=50, verbose=3)
# Tag P/A data
result <- gl.filter.rdepth(testset.gs, lower=8, upper=50, verbose=3)
res <- gl.filter.rdepth(platypus.gl)
Filters loci in a genlight {adegenet} object based on average repeatability of alleles at a locus
Description
SNP datasets generated by DArT have an index, RepAvg, generated by reproducing the data independently for 30% of loci. RepAvg is the proportion of alleles that give a repeatable result, averaged over both alleles for each locus.
SilicoDArT datasets generated by DArT have a similar index, Reproducibility. For these fragment presence/absence data, repeatability is the percentage of scores that are repeated in the technical replicate dataset.
Usage
gl.filter.reproducibility(x, threshold = 0.99, plot.out = TRUE, plot_theme = theme_dartR(), plot_colors = two_colors, save2tmp = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
threshold | Threshold value below which loci will be removed [default 0.99]. |
plot.out | If TRUE, displays plots of the distribution of reproducibility values before and after filtering [default TRUE]. |
plot_theme | Theme for the plot [default theme_dartR()]. |
plot_colors | List of two color names for the borders and fill of the plots [default two_colors]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
Returns a genlight object retaining loci with repeatability (RepAvg or Reproducibility) greater than the specified threshold.
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Other filter functions: gl.filter.allna(), gl.filter.callrate(), gl.filter.heterozygosity(), gl.filter.hwe(), gl.filter.locmetric(), gl.filter.maf(), gl.filter.monomorphs(), gl.filter.overshoot(), gl.filter.pa(), gl.filter.parent.offspring(), gl.filter.rdepth(), gl.filter.secondaries(), gl.filter.sexlinked(), gl.filter.taglength()
Examples
# SNP data
gl.report.reproducibility(testset.gl)
result <- gl.filter.reproducibility(testset.gl, threshold=0.99, verbose=3)
# Tag P/A data
gl.report.reproducibility(testset.gs)
result <- gl.filter.reproducibility(testset.gs, threshold=0.99)
test <- gl.subsample.loci(platypus.gl, n=100)
res <- gl.filter.reproducibility(test)
Filters loci that represent secondary SNPs in a genlight object
Description
SNP datasets generated by DArT include fragments with more than one SNP and record them separately with the same CloneID (=AlleleID). These multiple SNP loci within a fragment (secondaries) are likely to be linked, and so you may wish to remove secondaries.
This script filters out all but the first sequence tag with the same CloneID after ordering the genlight object based on repeatability and avgPIC, in that order (method='best'), or at random (method='random').
The filter has not been implemented for tag presence/absence data.
Usage
gl.filter.secondaries(x, method = "random", verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
method | Method of selecting SNP locus to retain, 'best' or 'random' [default 'random']. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
The genlight object, with the secondary SNP loci removed.
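The 'order, then keep the first per CloneID' idea can be sketched in base R as follows. The locus metrics table, its column names (`CloneID`, `RepAvg`, `AvgPIC`) and the helper `keep_one_per_tag()` are hypothetical stand-ins used for illustration; the way dartR stores and orders these metrics may differ.

```r
# Hypothetical locus metrics (values made up for illustration)
loc_metrics <- data.frame(
  locus   = c("a1", "a2", "b1", "c1", "c2", "c3"),
  CloneID = c("A", "A", "B", "C", "C", "C"),
  RepAvg  = c(0.98, 1.00, 0.99, 1.00, 1.00, 0.97),
  AvgPIC  = c(0.30, 0.10, 0.25, 0.20, 0.40, 0.45)
)

# Hypothetical helper: retain one SNP per sequence tag
keep_one_per_tag <- function(m, method = c("best", "random")) {
  method <- match.arg(method)
  ord <- if (method == "best") {
    order(-m$RepAvg, -m$AvgPIC)   # highest repeatability first, ties broken by AvgPIC
  } else {
    sample(nrow(m))               # random order
  }
  m <- m[ord, ]
  m <- m[!duplicated(m$CloneID), ]  # the first locus per CloneID survives
  m[order(m$locus), ]
}

best <- keep_one_per_tag(loc_metrics, "best")
set.seed(42)
rand <- keep_one_per_tag(loc_metrics, "random")
print(as.character(best$locus))
```

With method 'best', tag C retains c2 rather than c1 because both have the same repeatability and c2 has the higher AvgPIC.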
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Other filter functions: gl.filter.allna(), gl.filter.callrate(), gl.filter.heterozygosity(), gl.filter.hwe(), gl.filter.locmetric(), gl.filter.maf(), gl.filter.monomorphs(), gl.filter.overshoot(), gl.filter.pa(), gl.filter.parent.offspring(), gl.filter.rdepth(), gl.filter.reproducibility(), gl.filter.sexlinked(), gl.filter.taglength()
Examples
gl.report.secondaries(testset.gl)
result <- gl.filter.secondaries(testset.gl)
Filters loci that are sex linked
Description
Alleles unique to the Y or W chromosome and monomorphic on the X chromosomes will appear in the SNP dataset as genotypes that are heterozygous in all individuals of the heterogametic sex and homozygous in all individuals of the homogametic sex. This function keeps or drops loci with alleles that behave in this way, as putative sex specific SNP markers.
Usage
gl.filter.sexlinked(x, sex = NULL, filter = NULL, read.depth = 0, t.het = 0.1, t.hom = 0.1, t.pres = 0.1, plot.out = TRUE, plot_theme = theme_dartR(), plot_colors = three_colors, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required]. |
sex | Factor that defines the sex of individuals. See explanation in Details [default NULL]. |
filter | Either 'keep' to keep sex linked markers only or 'drop' to drop sex linked markers [required]. |
read.depth | Additional filter option to keep only loci above a certain read depth. The default of 0 means read depth is not taken into account [default 0]. |
t.het | Tolerance in the heterogametic sex, that is t.het=0.05 means that 5% of the heterogametic sex can be homozygous and still be regarded as consistent with a sex specific marker [default 0.1]. |
t.hom | Tolerance in the homogametic sex, that is t.hom=0.05 means that 5% of the homogametic sex can be heterozygous and still be regarded as consistent with a sex specific marker [default 0.1]. |
t.pres | Tolerance in presence, that is t.pres=0.05 means that a silicodart marker can be present in either of the sexes and still be regarded as a sex-linked marker [default 0.1]. |
plot.out | Creates a plot that shows the heterozygosity of males and females at each locus [default TRUE]. |
plot_theme | Theme for the plot. See Details for options [default theme_dartR()]. |
plot_colors | List of three color names for the not sex-linked loci, for the sex-linked loci and for the area in which sex-linked loci appear [default three_colors]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]. |
Details
Sex of the individuals for which sex is known with certainty can be provided via a factor (of length equal to the number of individuals) or held in the variable x@other$ind.metrics$sex. Coding is: M for male, F for female, U or NA for unknown/missing. The script abbreviates the entries to their first character, so coding of 'Female' and 'Male' works as well. Characters are also converted to upper case.
Function's output
This function also creates a plot that shows the heterozygosity of males and females at each locus for SNP data, or the percentage of present/absent scores in the case of SilicoDArT data.
Examples of other themes that can be used can be consulted in
Value
The filtered genlight object (filter='keep': sex linked loci only; filter='drop': everything except sex linked loci).
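How the sex coding and the tolerances t.het and t.hom interact can be sketched in base R for SNP data. The sketch assumes an XY system (males heterogametic), genotypes coded 0/1/2 with 1 denoting a heterozygote, and made-up data; the object names are hypothetical and the package's own implementation may differ.

```r
# Sex labels are reduced to their first character and converted to upper case
sex_raw <- c("Female", "male", "F", "m", "f", "Male")
sex <- toupper(substr(sex_raw, 1, 1))

# Hypothetical genotypes: rows = individuals (same order as sex), columns = loci
geno <- cbind(
  locY = c(0, 1, 0, 1, 0, 1),  # heterozygous in all males, homozygous in all females
  locA = c(1, 1, 0, 2, 1, 0),  # no sex-specific pattern
  locB = c(1, 1, 0, 1, 0, 1)   # all males heterozygous, but so is one female
)

t.het <- 0.1  # tolerated proportion of homozygotes in the heterogametic sex
t.hom <- 0.1  # tolerated proportion of heterozygotes in the homogametic sex

# Proportion of heterozygotes per locus in each sex (assuming males are heterogametic)
het_M <- colMeans(geno[sex == "M", , drop = FALSE] == 1, na.rm = TRUE)
het_F <- colMeans(geno[sex == "F", , drop = FALSE] == 1, na.rm = TRUE)

sex_linked <- het_M >= 1 - t.het & het_F <= t.hom
print(sex_linked)
```

For a ZW system the roles of the two sexes would simply be swapped.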
Author(s)
Arthur Georges, Bernd Gruber & Floriaan Devloo-Delva (Post to https://groups.google.com/d/forum/dartr)
See Also
Other filter functions: gl.filter.allna(), gl.filter.callrate(), gl.filter.heterozygosity(), gl.filter.hwe(), gl.filter.locmetric(), gl.filter.maf(), gl.filter.monomorphs(), gl.filter.overshoot(), gl.filter.pa(), gl.filter.parent.offspring(), gl.filter.rdepth(), gl.filter.reproducibility(), gl.filter.secondaries(), gl.filter.taglength()
Examples
out <- gl.filter.sexlinked(testset.gl, filter='drop')
out <- gl.filter.sexlinked(testset.gs, filter='drop')
Filters loci in a genlight {adegenet} object based on sequence tag length
Description
SNP datasets generated by DArT typically have sequence tag lengths ranging from 20 to 69 base pairs.
Usage
gl.filter.taglength(x, lower = 20, upper = 69, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
lower | Lower threshold value below which loci will be removed [default 20]. |
upper | Upper threshold value above which loci will be removed [default 69]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
Returns a genlight object retaining loci with a sequence tag length in the range specified by the lower and upper threshold.
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Other filter functions: gl.filter.allna(), gl.filter.callrate(), gl.filter.heterozygosity(), gl.filter.hwe(), gl.filter.locmetric(), gl.filter.maf(), gl.filter.monomorphs(), gl.filter.overshoot(), gl.filter.pa(), gl.filter.parent.offspring(), gl.filter.rdepth(), gl.filter.reproducibility(), gl.filter.secondaries(), gl.filter.sexlinked()
Examples
# SNP data
gl.report.taglength(testset.gl)
result <- gl.filter.taglength(testset.gl, lower=60)
gl.report.taglength(result)
# Tag P/A data
gl.report.taglength(testset.gs)
result <- gl.filter.taglength(testset.gs, lower=60)
gl.report.taglength(result)
test <- gl.subsample.loci(platypus.gl, n=100)
res <- gl.report.taglength(test)
Generates a matrix of fixed differences and associated statistics for populations taken pairwise
Description
This script takes SNP data or sequence tag P/A data grouped into populations in a genlight object (DArTSeq) and generates a matrix of fixed differences between populations taken pairwise.
Usage
gl.fixed.diff(x, tloc = 0, test = FALSE, delta = 0.02, alpha = 0.05, reps = 1000, mono.rm = TRUE, pb = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object containing SNP genotypes or tag P/A data (SilicoDArT) or an object of class 'fd' [required]. |
tloc | Threshold defining a fixed difference (e.g. 0.05 implies 95:5 vs 5:95 is fixed) [default 0]. |
test | If TRUE, calculate p values for the observed fixed differences [default FALSE]. |
delta | Threshold value for the true population minor allele frequency (MAF) from which resultant sample fixed differences are considered true positives [default 0.02]. |
alpha | Level of significance used to display non-significant differences between populations as they are compared pairwise [default 0.05]. |
reps | Number of replications to undertake in the simulation to estimate probability of false positives [default 1000]. |
mono.rm | If TRUE, loci that are monomorphic across all individuals are removed before beginning computations [default TRUE]. |
pb | If TRUE, show a progress bar on time consuming loops [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
A fixed difference at a locus occurs when two populations share no alleles or where all members of one population have a sequence tag scored, and all members of the other population have the sequence tag absent. The challenge with this approach is that when sample sizes are finite, fixed differences will occur through sampling error, compounded when many loci are examined. Simulations suggest that sample sizes of n1=5 and n2=5 are adequate to reduce the probability of [experiment-wide] type 1 error to negligible levels [ploidy=2]. A warning is issued if comparison between two populations involves sample sizes less than 5, taking into account allele drop-out.
Optionally, if test=TRUE, the script will test the fixed differences between final OTUs for statistical significance, using simulation, and then further amalgamate populations for which there are no significant fixed differences at a specified level of significance (alpha). To avoid conflation of true fixed differences with false positives in the simulations, it is necessary to decide a threshold value (delta) for extreme true allele frequencies that will be considered fixed for practical purposes. That is, fixed differences in the sample set will be considered to be positives (not false positives) if they arise from true allele frequencies of less than 1-delta in one or both populations. The parameter delta is typically set to be small (e.g. delta = 0.02).
NOTE: The above test will only be calculated if tloc=0, that is, for analyses of absolute fixed differences. The test applies in comparisons of allopatric populations only. For sympatric populations, use gl.pval.sympatry().
An absolute fixed difference is as defined above. However, one might wish to score fixed differences at some lower level of allele frequency difference, say where percent allele frequencies are 95:5 and 5:95 rather than 100:0 and 0:100. This adjustment can be done with the tloc parameter. For example, tloc=0.05 means that SNP allele frequencies of 95:5 and 5:95 percent will be regarded as fixed when comparing two populations at a locus.
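The effect of tloc can be illustrated with a small base-R function that decides whether a single locus counts as a fixed difference, given the frequency of one allele in each population. The helper `is_fixed_diff()` is hypothetical and only mirrors the definition given above; it is not the code used by gl.fixed.diff.

```r
# Hypothetical helper: is the locus a fixed difference between two populations?
# p1, p2: frequency (as a proportion) of the same allele in populations 1 and 2
# tloc:   threshold, e.g. 0.05 treats 95:5 vs 5:95 as fixed
is_fixed_diff <- function(p1, p2, tloc = 0) {
  (p1 <= tloc & p2 >= 1 - tloc) | (p1 >= 1 - tloc & p2 <= tloc)
}

absolute  <- is_fixed_diff(0.00, 1.00)               # absolute fixed difference
near_miss <- is_fixed_diff(0.04, 0.97)               # not fixed when tloc = 0
relaxed   <- is_fixed_diff(0.04, 0.97, tloc = 0.05)  # counted as fixed at tloc = 0.05
print(c(absolute = absolute, near_miss = near_miss, relaxed = relaxed))
```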
Value
A list of Class 'fd' containing the gl object and square matrices, as follows:
$gl – the output genlight object;
$fd – raw fixed differences;
$pcfd – percent fixed differences;
$nobs – mean no. of individuals used in each comparison;
$nloc – total number of loci used in each comparison;
$expfpos – if test=TRUE, the expected count of false positives for each comparison [by simulation];
$sdfpos – if test=TRUE, the standard deviation of the count of false positives for each comparison [by simulation];
$pval – if test=TRUE, the significance of the count of fixed differences [by simulation].
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Examples
fd <- gl.fixed.diff(testset.gl, tloc=0, verbose=3)
fd <- gl.fixed.diff(testset.gl, tloc=0, test=TRUE, delta=0.02, reps=100, verbose=3)
Calculates pairwise Fst values for populations in a genlight object
Description
This script calculates pairwise Fst values based on the implementation in the StAMPP package (?stamppFst). It allows bootstrapping to estimate the probability that Fst values differ from zero. For detailed information please check the help pages (?stamppFst).
Usage
gl.fst.pop(x, nboots = 1, percent = 95, nclusters = 1, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP genotypes [required]. |
nboots | Number of bootstraps to perform across loci to generate confidence intervals and p-values [default 1]. |
percent | Percentile to calculate the confidence interval around [default 95]. |
nclusters | Number of processor threads or cores to use during calculations [default 1]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
A matrix of distances between populations (class dist) if nboots=1, otherwise a list with Fsts (in a matrix), Pvalues (a matrix of p-values) and Bootstraps results (data frame of all runs). Hint: Use as.matrix(as.dist(fsts)) if you want to have a square matrix with symmetric entries returned, instead of a dist object.
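The hint about as.matrix(as.dist(fsts)) can be tried on a small made-up example. The sketch assumes that the Fst values are available as a matrix with only the lower triangle filled, which is an assumption made for illustration; population names and values are invented.

```r
# Hypothetical Fst values for three populations, lower triangle only (values made up)
pop_names <- c("popA", "popB", "popC")
fsts <- matrix(NA_real_, 3, 3, dimnames = list(pop_names, pop_names))
fsts[lower.tri(fsts)] <- c(0.12, 0.30, 0.21)

# as.dist() reads the lower triangle; as.matrix() expands it to a full
# symmetric square matrix with zeros on the diagonal
fst_square <- as.matrix(as.dist(fsts))
print(fst_square)
```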
Author(s)
Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)
Examples
test <- gl.filter.callrate(platypus.gl, threshold = 1)
test <- gl.filter.monomorphs(test)
out <- gl.fst.pop(test, nboots=1)
Performs least-cost path analysis based on a friction matrix
Description
This function calculates the pairwise distances (Euclidean, cost path distances and genetic distances) of populations using a friction matrix and a spatial genind object. The genind object needs to have coordinates in the same projected coordinate system as the friction matrix. The friction matrix can be either a single raster or a stack of several layers. If a stack is provided, the specified cost distance is calculated for each layer in the stack. The output of this function can be used with the functions wassermann or lgrMMRR to test for the significance of a layer on the genetic structure.
Usage
gl.genleastcost(x, fric.raster, gen.distance, NN = NULL, pathtype = "leastcost", plotpath = TRUE, theta = 1, verbose = NULL)
Arguments
x | A spatial genind object. See ?popgenreport how to provide coordinates in genind objects [required]. |
fric.raster | A friction matrix [required]. |
gen.distance | Specification which genetic distance method should be used to calculate pairwise genetic distances between populations ('D', 'Gst.Nei', 'Gst.Hedrick') or individuals ('Smouse', 'Kosman', 'propShared') [required]. |
NN | Number of neighbours used when calculating the cost distance (possible values 4, 8 or 16). As the default is NULL, a value has to be provided if pathtype='leastcost'. NN=8 is most commonly used. Be aware that linear structures may cause artefacts in the least-cost paths, therefore inspect the actual least-cost paths in the provided output [default NULL]. |
pathtype | Type of cost distance to be calculated (based on function in the |
plotpath | Switch if least cost paths should be plotted (works only if pathtype='leastcost'). Be aware this slows down the computation, but it is recommended to do this to check least cost paths visually. |
theta | value needed for rSPDistance function. See |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
Returns a list that consists of four pairwise distance matrices (Euclidean, Cost, length of path and genetic) and the actual paths as spatial line objects.
Author(s)
Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)
References
Cushman, S., Wasserman, T., Landguth, E. and Shirk, A. (2013). Re-Evaluating Causal Modeling with Mantel Tests in Landscape Genetics. Diversity, 5(1), 51-72.
Landguth, E. L., Cushman, S. A., Schwartz, M. K., McKelvey, K. S., Murphy, M. and Luikart, G. (2010). Quantifying the lag time to detect barriers in landscape genetics. Molecular ecology, 4179-4191.
Wasserman, T. N., Cushman, S. A., Schwartz, M. K. and Wallin, D. O. (2010). Spatial scaling and multi-model inference in landscape genetics: Martes americana in northern Idaho. Landscape Ecology, 25(10), 1601-1612.
See Also
landgenreport, popgenreport, wassermann, lgrMMRR
Examples
## Not run:
data(possums.gl)
library(raster) # needed for that example
landscape.sim <- readRDS(system.file('extdata','landscape.sim.rdata', package='dartR'))
glc <- gl.genleastcost(x=possums.gl, fric.raster=landscape.sim, gen.distance = 'D', NN=8, pathtype = 'leastcost', plotpath = TRUE)
library(PopGenReport)
PopGenReport::wassermann(eucl.mat = glc$eucl.mat, cost.mat = glc$cost.mats, gen.mat = glc$gen.mat)
lgrMMRR(gen.mat = glc$gen.mat, cost.mats = glc$cost.mats, eucl.mat = glc$eucl.mat)
## End(Not run)
Calculates an identity by descent matrix
Description
This function calculates the mean probability of identity by state (IBS) across loci that would result from all the possible crosses of the individuals analyzed. IBD is calculated by an additive relationship matrix approach developed by Endelman and Jannink (2012) as implemented in the function A.mat (package rrBLUP).
Usage
gl.grm(x, plotheatmap = TRUE, palette_discrete = discrete_palette, palette_convergent = convergent_palette, legendx = 0, legendy = 0.5, verbose = NULL, ...)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
plotheatmap | A switch if a heatmap should be shown [default TRUE]. |
palette_discrete | A discrete palette for the color of populations or a list with as many colors as there are populations in the dataset [default discrete_palette]. |
palette_convergent | A convergent palette for the IBD values [default convergent_palette]. |
legendx | x coordinates for the legend [default 0]. |
legendy | y coordinates for the legend [default 0.5]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
... | Parameters passed to function A.mat from package rrBLUP. |
Details
Two or more alleles are identical by descent (IBD) if they are identical copies of the same ancestral allele in a base population. The additive relationship matrix is a theoretical framework for estimating a relationship matrix that is consistent with an approach to estimate the probability that the alleles at a random locus are identical in state (IBS).
This function also plots a heatmap, and a dendrogram, of IBD values where each diagonal element has a mean that equals 1+f, where f is the inbreeding coefficient (i.e. the probability that the two alleles at a randomly chosen locus are IBD from the base population). As this probability lies between 0 and 1, the diagonal elements range from 1 to 2. Because the inbreeding coefficients are expressed relative to the current population, the mean of the off-diagonal elements is -(1+f)/n, where n is the number of individuals. Individual names are shown in the margins of the heatmap and colors represent different populations.
Value
An identity by descent matrix
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
References
Endelman, J. B. (2011). Ridge regression and other kernels for genomic selection with r package rrblup. The Plant Genome 4, 250.
Endelman, J. B., Jannink, J.-L. (2012). Shrinkage estimation of the realized relationship matrix. G3: Genes, Genomics, Genetics 2, 1405.
See Also
Other inbreeding functions: gl.grm.network()
Examples
gl.grm(platypus.gl[1:10,1:100])
Represents a genomic relationship matrix (GRM) as a network
Description
This script takes a G matrix generated by gl.grm and represents the relationship among the specimens as a network diagram. In order to use this script, a decision is required on a threshold for relatedness to be represented as a link in the network, and on the layout used to create the diagram.
Usage
gl.grm.network(G, x, method = "fr", node.size = 8, node.label = TRUE, node.label.size = 2, node.label.color = "black", link.color = NULL, link.size = 2, relatedness_factor = 0.125, title = "Network based on a genomic relationship matrix", palette_discrete = NULL, save2tmp = FALSE, verbose = NULL)
Arguments
G | A genomic relationship matrix (GRM) generated by |
x | A genlight object from which the G matrix was generated [required]. |
method | One of 'fr', 'kk', 'gh' or 'mds' [default 'fr']. |
node.size | Size of the symbols for the network nodes [default 8]. |
node.label | TRUE to display node labels [default TRUE]. |
node.label.size | Size of the node labels [default 2]. |
node.label.color | Color of the text of the node labels [default 'black']. |
link.color | Color palette for links [default NULL]. |
link.size | Size of the links [default 2]. |
relatedness_factor | Factor of relatedness [default 0.125]. |
title | Title for the plot [default 'Network based on a genomic relationship matrix']. |
palette_discrete | A discrete palette for the color of populations or a list with as many colors as there are populations in the dataset [default NULL]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
The gl.grm.network function takes a genomic relationship matrix (GRM) generated by the gl.grm function to represent the relationship among individuals in the dataset as a network diagram. To generate the GRM, the function gl.grm uses the function A.mat from package rrBLUP, which implements the approach developed by Endelman and Jannink (2012).
The GRM is an estimate of the proportion of alleles that two individuals have in common. It is generated by estimating the covariance of the genotypes between two individuals, i.e. how much genotypes in the two individuals correspond with each other. This covariance depends on the probability that alleles at a random locus are identical by state (IBS). Two alleles are IBS if they represent the same allele. Two alleles are identical by descent (IBD) if one is a physical copy of the other or if they are both physical copies of the same ancestral allele. Note that IBD is complicated to determine. IBD implies IBS, but not conversely. However, as the number of SNPs in a dataset increases, the mean probability of IBS approaches the mean probability of IBD.
It follows that the off-diagonal elements of the GRM are two times the kinship coefficient, i.e. the probability that two alleles at a random locus drawn from two individuals are IBD. Additionally, the diagonal elements of the GRM are 1+f, where f is the inbreeding coefficient of each individual, i.e. the probability that the two alleles at a random locus are IBD.
Choosing a meaningful threshold to represent the relationship between individuals is tricky because IBD is not an absolute state but is relative to a reference population for which there is generally little information, so we can estimate the kinship of a pair of individuals only relative to some other quantity. To deal with this, we can use the average inbreeding coefficient of the diagonal elements as the reference value. For this, the function subtracts 1 from the mean of the diagonal elements of the GRM. In a second step, the off-diagonal elements are divided by 2, and finally, the mean of the diagonal elements is subtracted from each off-diagonal element after dividing them by 2. This approach is similar to the one used by Goudet et al. (2018).
Below is a table modified from Speed & Balding (2015) showing kinship values,and their confidence intervals (CI), for different relationships that could be used to guide the choosing of the relatedness threshold in the function.
|Relationship |Kinship | 95% CI |
|Identical twins/clones/same individual | 0.5 | - |
|Sibling/Parent-Offspring | 0.25 | (0.204, 0.296)|
|Half-sibling | 0.125 | (0.092, 0.158)|
|First cousin | 0.062 | (0.038, 0.089)|
|Half-cousin | 0.031 | (0.012, 0.055)|
|Second cousin | 0.016 | (0.004, 0.031)|
|Half-second cousin | 0.008 | (0.001, 0.020)|
|Third cousin | 0.004 | (0.000, 0.012)|
|Unrelated | 0 | - |
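The adjustment described in the Details can be sketched in base R. This is an illustration only, not the function's source code: the toy matrix `G` is hypothetical, and the sketch assumes that the reference value subtracted from each halved off-diagonal element is the mean inbreeding coefficient (mean of the diagonal minus 1).

```r
# Hypothetical 3 x 3 GRM: diagonal = 1 + f, off-diagonal = 2 * kinship
G <- matrix(c(1.10, 0.60, 0.10,
              0.60, 1.00, 0.05,
              0.10, 0.05, 1.20),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Step 1: reference value = mean inbreeding coefficient (assumption, see text)
f_mean <- mean(diag(G)) - 1

# Step 2: halve the elements to obtain raw kinship
kin <- G / 2
diag(kin) <- NA  # the diagonal is not a pairwise kinship

# Step 3: express kinship relative to the reference value
kin_adj <- kin - f_mean

# Pairs at or above the half-sibling level (0.125 in the table)
related <- which(kin_adj >= 0.125 & upper.tri(kin_adj), arr.ind = TRUE)
```

Here only the pair a-b exceeds the threshold, so it is the only link that would be drawn with a relatedness threshold of 0.125.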
Four layout options are implemented in this function:
'fr' Fruchterman-Reingold layout, layout_with_fr (package igraph)
'kk' Kamada-Kawai layout, layout_with_kk (package igraph)
'gh' Graphopt layout, layout_with_graphopt (package igraph)
'mds' Multidimensional scaling layout, layout_with_mds (package igraph)
Value
A network plot showing relatedness between individuals
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
References
Endelman, J. B., Jannink, J.-L. (2012). Shrinkage estimation of the realized relationship matrix. G3: Genes, Genomics, Genetics 2, 1405.
Goudet, J., Kay, T., & Weir, B. S. (2018). How to estimate kinship. Molecular Ecology, 27(20), 4121-4135.
Speed, D., & Balding, D. J. (2015). Relatedness in the post-genomic era: is it still useful?. Nature Reviews Genetics, 16(1), 33-44.
See Also
Other inbreeding functions: gl.grm()
Examples
if (requireNamespace("igraph", quietly = TRUE) &&
    requireNamespace("rrBLUP", quietly = TRUE) &&
    requireNamespace("fields", quietly = TRUE)) {
  t1 <- possums.gl
  # filtering on call rate
  t1 <- gl.filter.callrate(t1)
  t1 <- gl.subsample.loci(t1, n = 100)
  # relatedness matrix
  res <- gl.grm(t1, plotheatmap = FALSE)
  # relatedness network
  res2 <- gl.grm.network(res, t1, relatedness_factor = 0.125)
}
Performs Hardy-Weinberg tests over loci and populations
Description
Hardy-Weinberg tests are performed for each locus in each of the populations as defined by the pop slot in a genlight object.
Usage
gl.hwe.pop(x, alpha_val = 0.05, plot.out = TRUE, plot_theme = theme_dartR(), plot_colors = c("gray90", "deeppink"), HWformat = FALSE, verbose = NULL)
Arguments
x | A genlight object with a population defined [pop(x) does not return NULL]. |
alpha_val | Level of significance for testing [default 0.05]. |
plot.out | If TRUE, returns a plot object compatible with ggplot, otherwise returns a dataframe [default TRUE]. |
plot_theme | User specified theme [default theme_dartR()]. |
plot_colors | Vector with two color names for the borders and fill [default c("gray90", "deeppink")]. |
HWformat | Switch if data should be returned in HWformat (counts of genotypes to be used in package |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]. |
Details
This function employs the HardyWeinberg package, which needs to be installed. The function that is used is HWExactStats, but the package implements several other useful functions for HWE testing. Therefore, this function can return the data in the format expected by the HardyWeinberg package, via HWformat=TRUE, so that the output can be used to run other functions of the package.
This function performs an HWE test for every population (rows) and locus (columns) and returns a TRUE/FALSE matrix. TRUE is reported if the p-value of the HWE test for a particular locus and population is below the specified threshold (alpha_val, default=0.05). The thinking behind this approach is that loci that are not in HWE in several populations most likely need to be treated (e.g. filtered, if loci under selection are of interest). If plot.out=TRUE, a barplot of the loci and the sum of deviations over all populations is returned. Loci that deviate in the majority of populations can be identified via colSums on the resulting matrix.
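The use of colSums mentioned above can be illustrated with a small base R sketch. The matrix below is hypothetical and only mimics the layout described (populations in rows, loci in columns); it is not output from gl.hwe.pop.

```r
# Hypothetical TRUE/FALSE matrix:
# rows = populations, columns = loci, TRUE = p-value below alpha_val
hwe <- matrix(c(TRUE,  FALSE, TRUE,
                TRUE,  FALSE, FALSE,
                TRUE,  TRUE,  FALSE,
                FALSE, FALSE, FALSE),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("pop", 1:4), paste0("loc", 1:3)))

# Number of populations in which each locus deviates from HWE
n_dev <- colSums(hwe)

# Loci deviating in more than half of the populations
flagged <- names(n_dev)[n_dev > nrow(hwe) / 2]
```

In this toy case only loc1 deviates in the majority of populations and would be a candidate for filtering.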
Plot themes can be obtained from
Resultant ggplots and the tabulation are saved to the session's temporary directory.
Value
The function returns a list with up to three components:
'HWE' is the matrix over loci and populations
'plot' is a plot (ggplot) which shows the significant results for populations and loci (can be amended further using ggplot syntax)
'HWformat' (returned if HWformat=TRUE) contains the SNP data for each population in 'HardyWeinberg' format, to be used with other functions of the package (e.g. HWPerm or HWExactPrevious).
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Examples
out <- gl.hwe.pop(bandicoot.gl[,1:33], alpha_val=0.05, plot.out=TRUE, HWformat=FALSE)
Performs isolation by distance analysis
Description
This function performs an isolation by distance analysis based on a Mantel test and also produces an isolation by distance plot. If a genlight object with coordinates is provided, then Euclidean and genetic distance matrices are calculated.
Usage
gl.ibd(x = NULL, distance = "Fst", coordinates = "latlon", Dgen = NULL, Dgeo = NULL, Dgeo_trans = "Dgeo", Dgen_trans = "Dgen", permutations = 999, plot.out = TRUE, paircols = NULL, plot_theme = theme_dartR(), save2tmp = FALSE, verbose = NULL)
Arguments
x | Genlight object. If provided, a standard analysis on Fst/(1-Fst) and log(distance) is performed [required]. |
distance | Type of distance that is calculated and used for the analysis. Can be either population based 'Fst' [stamppFst], 'D' [stamppNeisD] or individual based 'propShared' [gl.propShared], 'euclidean' [gl.dist.ind, method='Euclidean'] [default "Fst"]. |
coordinates | Can be either 'latlon', 'xy' or a two column data.frame with column names 'lat', 'lon', 'x', 'y'. Coordinates are provided via |
Dgen | Genetic distance matrix if no genlight object is provided [default NULL]. |
Dgeo | Euclidean distance matrix if no genlight object is provided [default NULL]. |
Dgeo_trans | Transformation to be used on the Euclidean distances. See Dgen_trans [default "Dgeo"]. |
Dgen_trans | You can provide a formula to transform the genetic distance. The transformation can be applied as a formula using Dgen as the variable to be transformed. For example: |
permutations | Number of permutations in the Mantel test [default 999]. |
plot.out | Should an isolation by distance plot be returned [default TRUE]. |
paircols | Should pairwise dots be colored by 'pop'ulation/'ind'ividual pairs [default 'pop']. You can color pairwise individuals by pairwise population colors. |
plot_theme | Theme for the plot. See details for options [default theme_dartR()]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
Currently pairwise Fst and D between populations and 1-propShared and Euclidean distance between individuals are implemented. Coordinates are expected as lat long and converted to Google Earth Mercator projection. If coordinates are already projected, provide them at the x@other$xy slot.
You can also provide your own genetic and Euclidean distance matrices. The function is based on the code provided by the adegenet tutorial (http://adegenet.r-forge.r-project.org/files/tutorial-basics.pdf), using the functions mantel (package vegan), stamppFst, stamppNeisD (package StAMPP) and gl.propShared or gl.dist.ind. For transformation you need to have the dismo package installed. As a new feature you can plot pairwise relationships using double colored points (paircols=TRUE). Pairwise relationships can be visualised via populations or individuals, depending on which distance is calculated. Please note: a problem often arises if an individual based distance is calculated (e.g. propShared) and some individuals have identical coordinates, as this results in distances of zero between those pairs of individuals.
If the standard transformation [log(Dgeo)] is used, this results in an infinite value, because of trying to calculate 'log(0)'. To avoid this, the easiest fix is to change the transformation from log(Dgeo) to log(Dgeo+1), or you could add some "noise" to the coordinates of the individuals (e.g. +/- 1 m, but be aware that if you use lat lon you would rather add +0.00001 degrees or so).
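The log(0) problem is easy to reproduce in base R. The coordinates below are hypothetical (assumed to be in metres), and the sketch only illustrates the two transformations, not the internals of gl.ibd.

```r
# Hypothetical projected coordinates; individuals 1 and 2 share a location
xy <- rbind(c(0, 0), c(0, 0), c(300, 400))
Dgeo <- dist(xy)  # Euclidean distances: 0, 500, 500

# Standard transformation: the zero distance becomes -Inf
log_plain <- as.vector(log(Dgeo))

# Shifted transformation keeps every value finite
log_shift <- as.vector(log(Dgeo + 1))
```

In gl.ibd the shifted version corresponds to passing Dgeo_trans='log(Dgeo+1)'.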
Value
Returns a list of the following components: Dgen (the genetic distance matrix), Dgeo (the Euclidean distance matrix), Mantel (the statistics of the Mantel test).
Author(s)
Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)
References
Rousset, F. (1997). Genetic differentiation and estimation of gene flow from F-statistics under isolation by distance. Genetics, 145(4), 1219-1228.
See Also
Examples
# because of speed only the first 100 loci
ibd <- gl.ibd(bandicoot.gl[,1:100], Dgeo_trans='log(Dgeo)', Dgen_trans='Dgen/(1-Dgen)')
# because of speed only the first 10 individuals
ibd <- gl.ibd(bandicoot.gl[1:10,], distance='euclidean', paircols='pop', Dgeo_trans='Dgeo')
# only first 100 loci
ibd <- gl.ibd(bandicoot.gl[,1:100])
Imputes missing data
Description
This function imputes genotypes on a population-by-population basis, where populations can be considered panmictic, or imputes the state for presence-absence data.
Usage
gl.impute(x, method = "neighbour", fill.residual = TRUE, parallel = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP or presence-absence data [required]. |
method | Imputation method, either "frequency" or "HW" or "neighbour" or "random" [default "neighbour"]. |
fill.residual | Should any residual missing values remaining after imputation be set to 0, 1, 2 at random, taking into account global allele frequencies at the particular locus [default TRUE]. |
parallel | A logical indicating whether multiple cores, if available, should be used for the computations (TRUE), or not (FALSE); requires the package parallel to be installed [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
We recommend that imputation be performed on sampling locations, before any aggregation. Imputation is achieved by replacing missing values using one of four methods:
If "frequency", genotypes scored as missing at a locus in an individual are imputed using the average allele frequencies at that locus in the population from which the individual was drawn.
If "HW", genotypes scored as missing at a locus in an individual are imputed by sampling at random assuming Hardy-Weinberg equilibrium. Applies only to genotype data.
If "neighbour", substitute the missing values for the focal individual with the values taken from the nearest neighbour. Repeat with next nearest and so on until all missing values are replaced.
If "random", missing data are substituted by random values (0, 1 or 2).
The nearest neighbour is the individual with the smallest Euclidean distance to the focal individual across the whole dataset.
The advantage of the nearest-neighbour approach is that it works regardless of how many individuals are in the population to which the focal individual belongs, and the displacement of the individual is haphazard as opposed to:
(a) Drawing the individual toward the population centroid (HW and Frequency).
(b) Drawing the individual toward the global centroid (glPCA).
Note that loci that are missing for all individuals in a population are not imputed with method 'frequency' or 'HW'. Consider using the function gl.filter.allna with by.pop=TRUE to remove them first.
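As a rough illustration of the "HW" idea, the base R sketch below imputes missing genotypes at a single locus in a single population. It is not the package's implementation: the data are hypothetical, and it assumes genotypes are coded as 0, 1 or 2 copies of the second allele.

```r
set.seed(1)  # reproducible draws

# Hypothetical genotypes at one locus in one population
# (0, 1, 2 = copies of the second allele; NA = missing)
g <- c(0, 1, 2, 1, NA, 0, NA, 1)

# Frequency of the second allele among the scored genotypes
p <- mean(g, na.rm = TRUE) / 2

# Hardy-Weinberg genotype probabilities for 0, 1 and 2 copies
hw_prob <- c((1 - p)^2, 2 * p * (1 - p), p^2)

# Replace each missing genotype with a random draw under HWE
missing <- is.na(g)
g_imputed <- g
g_imputed[missing] <- sample(0:2, size = sum(missing),
                             replace = TRUE, prob = hw_prob)
```

If every genotype at the locus were missing, `p` would be NaN and no probabilities could be formed, which is consistent with the note above that such loci cannot be imputed from population frequencies.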
Value
A genlight object with the missing data imputed.
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
Examples
require("dartR.data")
# SNP genotype data
gl <- gl.filter.callrate(platypus.gl, threshold=0.95)
gl <- gl.filter.allna(gl)
gl <- gl.impute(gl, method="neighbour")
# Sequence Tag presence-absence data
gs <- gl.filter.callrate(testset.gs, threshold=0.95)
gs <- gl.filter.allna(gs)
gs <- gl.impute(gs, method="neighbour")
gs <- gl.impute(platypus.gl, method="random")
Installs all required packages for using all functions available in dartR
Description
The function compares the installed packages with the currently available ones on CRAN. Be aware this function only works if a version of dartR is already installed on your system. You can choose if you also want to have a specific version of dartR installed ('CRAN', 'master', 'beta' or 'dev'). 'master', 'beta' and 'dev' are installed from GitHub. Be aware that the dev version from GitHub is not fully tested and most certainly will contain untested functions.
Usage
gl.install.vanilla.dartR(flavour = NULL, verbose = NULL)
Arguments
flavour | The version of dartR you want to install. If NULL, then only packages needed for the current version will be installed. If 'CRAN', the current CRAN version will be installed. 'master' installs the GitHub master branch, 'beta' installs the latest stable version, and 'dev' installs the experimental development branch from GitHub [default NULL]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
Returns a message if the installation was successful/required.
Author(s)
Custodian: Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
Combines two genlight objects
Description
This function combines two genlight objects and their associated metadata. The history associated with the two genlight objects is cleared from the new genlight object. The individuals/samples must be the same in each genlight object.
The function is typically used to combine datasets from the same service where the files have been split because of size limitations. The data is read in from multiple csv files, then the resultant genlight objects are combined.
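Conceptually, joining amounts to placing the loci of two datasets side by side once the individuals are confirmed to match. The base R sketch below shows that idea on plain genotype matrices; the matrices are hypothetical, and gl.join additionally handles the genlight metadata, which is not shown here.

```r
# Hypothetical genotype matrices from two files covering the same individuals
part1 <- matrix(c(0, 1, 2, 0), nrow = 2,
                dimnames = list(c("ind1", "ind2"), c("loc1", "loc2")))
part2 <- matrix(c(2, NA), nrow = 2,
                dimnames = list(c("ind1", "ind2"), "loc3"))

# The individuals must match before the loci can be placed side by side
stopifnot(identical(rownames(part1), rownames(part2)))
joined <- cbind(part1, part2)
```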
Usage
gl.join(x1, x2, verbose = NULL)
Arguments
x1 | Name of the first genlight object [required]. |
x2 | Name of the second genlight object [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
A new genlight object
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Examples
x1 <- testset.gl[,1:100]
x1@other$loc.metrics <- testset.gl@other$loc.metrics[1:100,]
nLoc(x1)
x2 <- testset.gl[,101:150]
x2@other$loc.metrics <- testset.gl@other$loc.metrics[101:150,]
nLoc(x2)
gl <- gl.join(x1, x2, verbose=2)
nLoc(gl)
Removes all but the specified individuals from a dartR genlight object
Description
This script deletes all individuals apart from those listed (ind.list). Monomorphic loci and loci that are scored all NA are optionally deleted (mono.rm=TRUE). The script also optionally recalculates locus metadata statistics to accommodate the deletion of individuals from the dataset (recalc=TRUE).
The script returns a dartR genlight object with the retained individuals and the recalculated locus metadata. The script works with both genlight objects containing SNP genotypes and Tag P/A data (SilicoDArT).
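The reason mono.rm exists is that loci which were polymorphic in the full dataset can become monomorphic, or entirely missing, once individuals are removed. The base R sketch below illustrates this on a hypothetical genotype matrix; it is not the package's code and ignores the locus metadata.

```r
# Hypothetical genotype matrix: rows = individuals, columns = loci
geno <- matrix(c(0, 1, 2, NA,
                 0, 2, 2, NA,
                 1, 0, 2, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("ind1", "ind2", "ind3"),
                               c("locA", "locB", "locC", "locD")))

# Keep only the listed individuals
kept <- geno[c("ind1", "ind2"), , drop = FALSE]

# A locus is dropped if it is all NA, or if the scored genotypes
# are all 0 or all 2 (monomorphic)
is_mono <- apply(kept, 2, function(g) {
  g <- g[!is.na(g)]
  length(g) == 0 || all(g == 0) || all(g == 2)
})
kept_poly <- kept[, !is_mono, drop = FALSE]
```

After dropping ind3, locA becomes monomorphic and locD becomes all NA, so only locB survives.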
Usage
gl.keep.ind(x, ind.list, recalc = FALSE, mono.rm = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object [required]. |
ind.list | A list of individuals to be retained [required]. |
recalc | If TRUE, recalculate the locus metadata statistics [default FALSE]. |
mono.rm | If TRUE, remove monomorphic and all NA loci [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
A reduced dartR genlight object
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
gl.drop.ind to drop rather than keep specified individuals
Examples
# SNP data
gl2 <- gl.keep.ind(testset.gl, ind.list=c('AA019073','AA004859'))
# Tag P/A data
gs2 <- gl.keep.ind(testset.gs, ind.list=c('AA020656','AA19077','AA004859'))
Removes all but the specified loci from a genlight object
Description
This function deletes loci that are not specified to keep, and their associated metadata.
The script returns a dartR genlight object with the retained loci. The script works with both genlight objects containing SNP genotypes and Tag P/A data (SilicoDArT).
Usage
gl.keep.loc(x, loc.list = NULL, first = NULL, last = NULL, verbose = NULL)
Arguments
x | Name of the genlight object [required]. |
loc.list | A list of loci to be kept [required, if first is not specified]. |
first | First of a range of loci to be kept [required, if loc.list not specified]. |
last | Last of a range of loci to be kept [if not specified, last locus in the dataset]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress but not results; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
A genlight object with the reduced data
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
gl.drop.loc to drop rather than keep specified loci
Other dartR-base: gl.drop.ind(), gl.drop.loc(), gl.drop.pop(), gl.edit.recode.ind(), gl.edit.recode.pop(), gl.make.recode.ind(), gl.read.dart(), gl.recode.ind(), gl.recode.pop(), gl.set.verbosity()
Examples
# SNP data
gl2 <- gl.keep.loc(testset.gl, loc.list=c('100051468|42-A/T', '100049816-51-A/G'))
# Tag P/A data
gs2 <- gl.keep.loc(testset.gs, loc.list=c('20134188','19249144'))
Removes all but the specified populations from a dartR genlight object
Description
Individuals are assigned to populations based on associated specimen metadatastored in the dartR genlight object.
This script deletes all individuals apart from those in listed populations (pop.list). Monomorphic loci and loci that are scored all NA are optionally deleted (mono.rm=TRUE). The script also optionally recalculates locus metadata statistics to accommodate the deletion of individuals from the dataset (recalc=TRUE).
The script returns a dartR genlight object with the retained populations and the recalculated locus metadata. The script works with both genlight objects containing SNP genotypes and Tag P/A data (SilicoDArT).
Usage
gl.keep.pop(x, pop.list, as.pop = NULL, recalc = FALSE, mono.rm = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object [required]. |
pop.list | List of populations to be retained [required]. |
as.pop | Temporarily assign another individual metric as the population for the purposes of deletions [default NULL]. |
recalc | If TRUE, recalculate the locus metadata statistics [default FALSE]. |
mono.rm | If TRUE, remove monomorphic and all NA loci [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress but not results; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
A reduced dartR genlight object
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
gl.drop.pop to drop rather than keep specified populations
Examples
# SNP data
gl2 <- gl.keep.pop(testset.gl, pop.list=c('EmsubRopeMata', 'EmvicVictJasp'))
gl2 <- gl.keep.pop(testset.gl, pop.list=c('EmsubRopeMata', 'EmvicVictJasp'), mono.rm=TRUE, recalc=TRUE)
gl2 <- gl.keep.pop(testset.gl, pop.list=c('Female'), as.pop='sex')
# Tag P/A data
gs2 <- gl.keep.pop(testset.gs, pop.list=c('EmsubRopeMata','EmvicVictJasp'))
Plots linkage disequilibrium against distance by population disequilibrium patterns
Description
The function creates a plot showing the pairwise LD measure against distance in number of base pairs pooled over all the chromosomes and a red line representing the threshold (R.squared = 0.2) that is commonly used to imply that two loci are unlinked (Delourme et al., 2013; Li et al., 2014).
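The kind of summary behind such a plot can be sketched in base R: pairwise LD values are grouped into distance bins of a chosen resolution and averaged, and the result is compared with the 0.2 threshold. The data frame and the binning rule below are assumptions made for illustration; they are not taken from the function's source.

```r
# Hypothetical pairwise LD results: distance between loci (bp) and r-squared
ld <- data.frame(
  distance = c(2e4, 8e4, 1.5e5, 1.8e5, 2.6e5, 2.9e5),
  r2       = c(0.60, 0.40, 0.30, 0.20, 0.10, 0.05)
)

resolution <- 1e5  # bin width in base pairs
ld$bin <- ceiling(ld$distance / resolution) * resolution

# Mean LD per distance bin
ld_by_bin <- aggregate(r2 ~ bin, data = ld, FUN = mean)

# First bin in which mean LD falls to the 0.2 threshold or below
first_unlinked <- min(ld_by_bin$bin[ld_by_bin$r2 <= 0.2])
```

With these toy values, mean LD drops below the threshold in the bin ending at 300,000 bp.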
Usage
gl.ld.distance(ld_report, ld_resolution = 1e+05, pop_colors = NULL, plot_theme = NULL, plot.out = TRUE, save2tmp = FALSE, plot_title = " ", verbose = NULL)
Arguments
ld_report | Output from function |
ld_resolution | Resolution at which LD should be reported in number of base pairs [default 1e+05]. |
pop_colors | A color palette for box plots by population or a list with as many colors as there are populations in the dataset [default NULL]. |
plot_theme | User specified theme [default NULL]. |
plot.out | Specify if plot is to be produced [default TRUE]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
plot_title | Title of the plot [default " "]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
A dataframe with information of LD against distance by population.
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
References
Delourme, R., Falentin, C., Fomeju, B. F., Boillot, M., Lassalle, G., André, I., ... Marty, A. (2013). High-density SNP-based genetic map development and linkage disequilibrium assessment in Brassica napus L. BMC genomics, 14(1), 120.
Li, X., Han, Y., Wei, Y., Acharya, A., Farmer, A. D., Ho, J., ... Brummer, E. C. (2014). Development of an alfalfa SNP array and its use to evaluate patterns of population structure and linkage disequilibrium. PLoS One, 9(1), e84329.
See Also
Other ld functions: gl.ld.haplotype()
Examples
if (requireNamespace("snpStats", quietly = TRUE) &&
    requireNamespace("fields", quietly = TRUE)) {
  require("dartR.data")
  x <- platypus.gl
  x <- gl.filter.callrate(x, threshold = 1)
  x <- gl.filter.monomorphs(x)
  x$position <- x$other$loc.metrics$ChromPos_Platypus_Chrom_NCBIv1
  x$chromosome <- as.factor(x$other$loc.metrics$Chrom_Platypus_Chrom_NCBIv1)
  ld_res <- gl.report.ld.map(x, ld_max_pairwise = 10000000)
  ld_res_2 <- gl.ld.distance(ld_res, ld_resolution = 1000000)
}
Visualize patterns of linkage disequilibrium and identification of haplotypes
Description
This function plots a linkage disequilibrium (LD) heatmap, where the colour shading indicates the strength of LD. Chromosome positions (Mbp) are shown on the horizontal axis, and haplotypes appear as triangles delimited by dark yellow vertical lines. Numbers identifying each haplotype are shown in the upper part of the plot.
The heatmap also shows heterozygosity for each SNP.
The function identifies haplotypes based on contiguous SNPs that are in linkage disequilibrium, using ld_threshold_haplo as the threshold, and that contain more than min_snps SNPs.
Usage
gl.ld.haplotype(x, pop_name = NULL, chrom_name = NULL, ld_max_pairwise = 1e+07, maf = 0.05, ld_stat = "R.squared", ind.limit = 10, min_snps = 10, ld_threshold_haplo = 0.5, coordinates = NULL, color_haplo = "viridis", color_het = "deeppink", plot.out = TRUE, save2tmp = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
pop_name | Name of the population to analyse. If NULL all the populations are analysed [default NULL]. |
chrom_name | Name of the chromosome to analyse. If NULL all the chromosomes are analysed [default NULL]. |
ld_max_pairwise | Maximum distance in number of base pairs at which LD should be calculated [default 10000000]. |
maf | Minor allele frequency (by population) threshold to filter out loci. If a value > 1 is provided it will be interpreted as MAC (i.e. the minimum number of times an allele needs to be observed) [default 0.05]. |
ld_stat | The LD measure to be calculated: "LLR", "OR", "Q", "Covar", "D.prime", "R.squared", and "R". See |
ind.limit | Minimum number of individuals that a population should contain to be taken into account when reporting loci in LD [default 10]. |
min_snps | Minimum number of SNPs that a haplotype should contain to be called [default 10]. |
ld_threshold_haplo | Minimum LD between adjacent SNPs to call a haplotype [default 0.5]. |
coordinates | A vector of two elements with the start and end coordinates in base pairs to which the analysis is restricted, e.g. c(1,1000000) [default NULL]. |
color_haplo | Color palette for haplotype plot. See details [default "viridis"]. |
color_het | Color for heterozygosity [default "deeppink"]. |
plot.out | Specify if heatmap plot is to be produced [default TRUE]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
The information for SNP's position should be stored in the genlight accessor "@position" and the SNP's chromosome name in the accessor "@chromosome" (see examples). The function will then calculate LD within each chromosome.
The output of the function includes a table with the haplotypes that were identified and their location.
Colors of the heatmap (color_haplo) are based on the function scale_fill_viridis from package viridis. Other color palette options are "magma", "inferno", "plasma", "viridis", "cividis", "rocket", "mako" and "turbo".
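The haplotype rule given in the Description (contiguous SNPs whose adjacent LD reaches ld_threshold_haplo, in blocks of more than min_snps SNPs) can be sketched with run-length encoding in base R. The LD values are hypothetical and the sketch is one possible reading of the rule, not the function's implementation.

```r
# Hypothetical LD (r-squared) between each SNP and the next one on a chromosome
ld_adjacent <- c(0.9, 0.8, 0.7, 0.2, 0.6, 0.9, 0.1)
ld_threshold_haplo <- 0.5
min_snps <- 3

# Runs of consecutive adjacent pairs at or above the threshold
runs <- rle(ld_adjacent >= ld_threshold_haplo)
run_end <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

# A run of k linked pairs spans k + 1 SNPs
blocks <- data.frame(first_snp = run_start,
                     last_snp = run_end + 1,
                     n_snps = runs$lengths + 1)[runs$values, ]

# Keep blocks containing more than min_snps SNPs
haplotypes <- blocks[blocks$n_snps > min_snps, ]
```

Two linked blocks are found (SNPs 1 to 4 and SNPs 5 to 7), but only the first contains more than three SNPs and is retained.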
Value
A table with the haplotypes that were identified.
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
See Also
Other ld functions: gl.ld.distance()
Examples
require("dartR.data")
x <- platypus.gl
x <- gl.filter.callrate(x, threshold = 1)
x <- gl.keep.pop(x, pop.list = "TENTERFIELD")
x$chromosome <- as.factor(x$other$loc.metrics$Chrom_Platypus_Chrom_NCBIv1)
x$position <- x$other$loc.metrics$ChromPos_Platypus_Chrom_NCBIv1
ld_res <- gl.ld.haplotype(x, chrom_name = "NC_041728.1_chromosome_1", ld_max_pairwise = 10000000)
Prints dartR reports saved in tempdir
Description
Prints dartR reports saved in tempdir
Usage
gl.list.reports()
Value
Prints a table with all reports saved in tempdir. Currently the style cannot be changed.
Author(s)
Bernd Gruber & Luis Mijangos (bugs? Post to https://groups.google.com/d/forum/dartr)
See Also
Examples
## Not run:
gl.report.callrate(testset.gl, save2tmp=TRUE)
gl.list.reports()
## End(Not run)
Loads an object from compressed binary format produced by gl.save()
Description
This is a wrapper for readRDS()
Usage
gl.load(file, verbose = NULL)
Arguments
file | Name of the file containing the binary version of the object [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
The script loads the object from the current workspace and returns the gl object.
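Since the function is described as a wrapper for readRDS(), the round trip can be illustrated with base R alone. The sketch assumes that the file was written with saveRDS() (which is what a compressed binary produced by gl.save() would correspond to); the object and file name are hypothetical.

```r
# Any R object can stand in for a genlight object here
obj <- list(id = "testset", n_loc = 255L)

path <- file.path(tempdir(), "example_object.rds")
saveRDS(obj, path)         # writes a compressed binary file
restored <- readRDS(path)  # the call that gl.load() is described as wrapping
```

The restored object is identical to the one saved, including attributes.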
Value
The loaded object
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Examples
gl.save(testset.gl, file.path(tempdir(),'testset.rds'))
gl <- gl.load(file.path(tempdir(),'testset.rds'))
Creates a proforma recode_ind file for reassigning individual (=specimen) names
Description
Renaming individuals may be required when there have been errors in labeling arising in the process from sample to sequencing files. There may be occasions where renaming individuals is required for preparation of figures.
Usage
gl.make.recode.ind(x, out.recode.file = "default_recode_ind.csv", outpath = tempdir(), verbose = NULL)
Arguments
x | Name of the genlight object [required]. |
out.recode.file | File name of the output file (including extension) [default default_recode_ind.csv]. |
outpath | Path where to save the output file [default tempdir(), mandated by CRAN]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
This function facilitates the construction of a recode table by producing a proforma file with current individual (=specimen) names in two identical columns. Edit the second column to reassign individual names. Use keyword 'Delete' to delete an individual.
When caution needs to be exercised because of the potential for breaking the 'chain of evidence' associated with the samples, recoding individuals using a recode table (csv) can provide a clear record of the changes.
Use outpath=getwd() when calling this function to direct output files to your working directory.
The function works with both genlight objects containing SNP genotypes and Tag P/A data (SilicoDArT).
Apply the recoding using gl.recode.ind().
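The recode-table workflow described in the Details can be mimicked in base R. The names, the file name and the headerless CSV layout below are assumptions for illustration; the actual file is produced by gl.make.recode.ind and applied by gl.recode.ind.

```r
# Current names, as they might appear in a genlight object (hypothetical)
ind_names <- c("AA001", "AA002", "AA003")

# Proforma: two identical columns, written without a header (an assumption)
recode_file <- file.path(tempdir(), "example_recode_ind.csv")
write.table(cbind(ind_names, ind_names), recode_file, sep = ",",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# Simulate the manual edit of the second column
tab <- read.csv(recode_file, header = FALSE, stringsAsFactors = FALSE)
tab[2, 2] <- "AA002_rep"
tab[3, 2] <- "Delete"

# Apply the table: rename, then drop individuals marked 'Delete'
new_names <- tab[match(ind_names, tab[, 1]), 2]
kept_names <- new_names[new_names != "Delete"]
```

The edited CSV remains on disk as the record of which names were changed or removed.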
Value
A vector containing the new individual names.
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Other dartR-base: gl.drop.ind(), gl.drop.loc(), gl.drop.pop(), gl.edit.recode.ind(), gl.edit.recode.pop(), gl.keep.loc(), gl.read.dart(), gl.recode.ind(), gl.recode.pop(), gl.set.verbosity()
Examples
result <- gl.make.recode.ind(testset.gl, out.recode.file='Emmac_recode_ind.csv', outpath=tempdir())
Creates a proforma recode_pop_table file for reassigning population names
Description
Renaming populations may be required when there have been errors in assignment arising in the process from sample to sequence files or when one wishes to amalgamate populations, or delete populations. Recoding populations can also be done with a recode table (csv).
Usage
gl.make.recode.pop(x, out.recode.file = "recode_pop_table.csv", outpath = tempdir(), verbose = NULL)
Arguments
x | Name of the genlight object [required]. |
out.recode.file | File name of the output file (including extension) [default recode_pop_table.csv]. |
outpath | Path where to save the output file [default tempdir(), mandated by CRAN]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
This function facilitates the construction of a recode table by producing a proforma file with current population names in two identical columns. Edit the second column to reassign populations. Use keyword 'Delete' to delete a population.
When caution needs to be exercised because of the potential for breaking the 'chain of evidence' associated with the samples, recoding populations using a recode table (csv) can provide a clear record of the changes.
Use outpath=getwd() when calling this function to direct output files to your working directory.
The function works with both genlight objects containing SNP genotypes and Tag P/A data (SilicoDArT).
Apply the recoding using gl.recode.pop().
Value
A vector containing the new population names.
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Examples
result <- gl.make.recode.pop(testset.gl, out.recode.file='test.csv', outpath=tempdir(), verbose=2)
Creates an interactive map (based on latlon) from a genlight object
Description
Creates an interactive map (based on latlon) from a genlight object
Usage
gl.map.interactive(x, matrix = NULL, standard = TRUE, symmetric = TRUE, pop.labels = TRUE, pop.labels.cex = 12, ind.circles = TRUE, ind.circle.cols = NULL, ind.circle.cex = 10, ind.circle.transparency = 0.8, palette_links = NULL, leg_title = NULL, provider = "Esri.NatGeoWorldMap", verbose = NULL)
Arguments
x | A genlight object (including coordinates within the latlon slot) [required]. |
matrix | A distance matrix between populations or individuals. The matrix is visualised as lines between individuals/populations. If matrix is asymmetric two lines with arrows are plotted [default NULL]. |
standard | If a matrix is provided and this is set to TRUE, line width will be standardised to be between 1 and 10, otherwise it is taken as given [default TRUE]. |
symmetric | If a symmetric matrix is provided only one line is drawn based on the lower triangle of the matrix. If set to FALSE arrows indicating the direction are used instead [default TRUE]. |
pop.labels | Population labels at the center of the individuals of populations [default TRUE]. |
pop.labels.cex | Size of population labels [default 12]. |
ind.circles | Should individuals be plotted as circles [default TRUE]. |
ind.circle.cols | Colors of circles. Colors can be provided as usual by names (e.g. "black") and are recycled. So a color c("blue","red") colors individuals alternately blue and red using the genlight object order of individuals. For transparency see parameter ind.circle.transparency. Defaults to rainbow colors by population if not provided. If you want to have your own colors for each population, check the platypus.gl example below. |
ind.circle.cex | Size of circles in pixels [default 10]. |
ind.circle.transparency | Transparency of circles between 0=invisible and 1=no transparency [default 0.8]. |
palette_links | Color palette for the links in case a matrix is provided [default NULL]. |
leg_title | Legend's title for the links in case a matrix is provided [default NULL]. |
provider | Passed to leaflet [default "Esri.NatGeoWorldMap"]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
A wrapper around the leaflet package. For possible background maps, as specified via the provider argument, check: http://leaflet-extras.github.io/leaflet-providers/preview/index.html
The palette_links argument can be any of the following:
A character vector of RGB or named colors. Examples: palette(), c("#000000", "#0000FF", "#FFFFFF"), topo.colors(10).
The name of an RColorBrewer palette, e.g. "BuPu" or "Greens".
The full name of a viridis palette: "viridis", "magma", "inferno", or "plasma".
A function that receives a single value between 0 and 1 and returns a color. Examples: colorRamp(c("#000000", "#FFFFFF"), interpolate = "spline").
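To make the link options more concrete, the base R sketch below takes the lower triangle of a small matrix, rescales the values to widths between 1 and 10, and maps them to colors with a palette function built on colorRamp(). The example matrix and the linear rescaling formula are assumptions for illustration; the exact standardisation used inside gl.map.interactive() may differ.

```r
# Hypothetical pairwise values, e.g. distances between three populations
m <- matrix(c(0, 2, 6,
              2, 0, 4,
              6, 4, 0), nrow = 3, byrow = TRUE)
vals <- m[lower.tri(m)]   # one value per link

# Rescale the values linearly to line widths between 1 and 10
rng <- range(vals)
scaled <- (vals - rng[1]) / (rng[2] - rng[1])
width <- 1 + 9 * scaled

# A palette function: receives values in [0, 1] and returns colors
ramp <- colorRamp(c("#000000", "#FFFFFF"))
pal <- function(p) rgb(ramp(p), maxColorValue = 255)
link.cols <- pal(scaled)
```

With a symmetric matrix only the lower triangle is needed, which is why one width and one color per pair is enough here.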
Value
plots a map
Author(s)
Bernd Gruber – Post to https://groups.google.com/d/forum/dartr
Examples
require("dartR.data")
gl.map.interactive(bandicoot.gl)
cols <- c("red","blue","yellow")[as.numeric(pop(platypus.gl))]
gl.map.interactive(platypus.gl, ind.circle.cols=cols, ind.circle.cex=10, ind.circle.transparency=0.5)

Maps a STRUCTURE plot using a genlight object
Description
This function takes the output of gl.plot.structure (the q matrix) and maps it using the population centers from the genlight object that was used to run the structure analysis (via gl.run.structure). It plots the typical structure bar plots on a spatial map, providing a barplot for each subpopulation, and therefore requires coordinates from a genlight object. This kind of plot should support the interpretation of the spatial structure of a population, but in principle is no different from gl.plot.structure.
Usage
gl.map.structure(qmat, x, K, provider = "Esri.NatGeoWorldMap", scalex = 1, scaley = 1, movepops = NULL, pop.labels = TRUE, pop.labels.cex = 12)
Arguments
qmat | Q-matrix from a structure run followed by a clumpp run object [from |
x | Name of the genlight object containing the coordinates in the |
K | The number for K to be plotted [required]. |
provider | Provider passed to leaflet. Check providers for a list of possible backgrounds [default "Esri.NatGeoWorldMap"]. |
scalex | Scaling factor to determine the size of the bars in x direction [default 1]. |
scaley | Scaling factor to determine the size of the bars in y direction [default 1]. |
movepops | A two-dimensional data frame that allows the centers of the barplots to be moved manually in case they overlap, which often happens when populations are horizontally close to each other. This needs to be a data.frame of the dimensions [rows = number of populations, columns = 2 (lon/lat)]. For each population you have to specify the x and y (lon and lat) units by which you want to move the center of the plot (see example for details) [default NULL]. |
pop.labels | Switch for population labels below the barplots [default TRUE]. |
pop.labels.cex | Size of population labels [default 12]. |
Details
Creates a mapped version of structure plots. For possible background maps, as specified via the provider argument, check: http://leaflet-extras.github.io/leaflet-providers/preview/index.html. You may need to adjust the scalex and scaley values [default 1], as the size depends on the scale of the map and the position of the populations.
Value
An interactive map that shows the structure plots broken down by population.
Returns the map and a list of the qmat split into sorted matrices per population. This can be used to create your own map.
Author(s)
Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
References
Pritchard, J.K., Stephens, M., Donnelly, P. (2000) Inference of population structure using multilocus genotype data. Genetics 155, 945-959.
Archer, F. I., Adams, P. E. and Schneiders, B. B. (2016) strataG: An R package for manipulating, summarizing and analysing population genetic data. Mol Ecol Resour. doi:10.1111/1755-0998.12559
Evanno, G., Regnaut, S., and J. Goudet. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology 14:2611-2620.
Mattias Jakobsson and Noah A. Rosenberg. 2007. CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23(14):1801-1806. Available at clumpp
See Also
gl.run.structure, clumpp, gl.plot.structure
Examples
## Not run: 
#bc <- bandicoot.gl[,1:100]
#sr <- gl.run.structure(bc, k.range = 2:5, num.k.rep = 3, exec = './structure.exe')
#ev <- gl.evanno(sr)
#ev
#qmat <- gl.plot.structure(sr, k=2:4)
#head(qmat)
#gl.map.structure(qmat, bc, K=3)
#gl.map.structure(qmat, bc, K=4)
#move population 4 (out of 5) 0.5 degrees to the right and populations 1
#0.3 degree to the north of the map.
#mp <- data.frame(lon=c(0,0,0,0.5,0), lat=c(-0.3,0,0,0,0))
#gl.map.structure(qmat, bc, K=4, movepops=mp)
## End(Not run)

Merges two or more populations in a genlight object into one population
Description
Individuals are assigned to populations based on the specimen metadata file (csv) used with gl.read.dart().
This script assigns individuals from two or more nominated populations to a single new population. It can also be used to rename populations.
The script returns a genlight object with the new population assignments.
Usage
gl.merge.pop(x, old = NULL, new = NULL, verbose = NULL)
Arguments
x | Name of the genlight object containing SNP genotypes [required]. |
old | A list of populations to be merged [required]. |
new | Name of the new population [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
A genlight object with the new population assignments.
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Examples
gl <- gl.merge.pop(testset.gl, old=c('EmsubRopeMata','EmvicVictJasp'), new='Outgroup')

Creates an input file for the program NewHybrids and runs it if NewHybrids is installed
Description
This function compares two sets of parental populations to identify loci that exhibit a fixed difference, returns a genlight object with the reduced data, and creates an input file for the program NewHybrids using the top 200 (or user-specified lower loc.limit) loci. In the absence of two identified parental populations, the script will select a random set of up to 200 loci (method='random') or up to the first 200 loci ranked on information content (method='AvgPIC').
A fixed difference occurs when a SNP allele is present in all individuals of one population and absent in the other. There is provision for setting a level of tolerance, e.g. threshold = 0.05, which considers alleles present at greater than 95% in one population (and correspondingly rare in the other) to be a fixed difference. Only up to 200 loci are retained, because of limitations of NewHybrids.
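The definition of a fixed difference with a tolerance can be sketched in base R as follows. The genotype matrices, the helper names alt.freq() and fixed.diff(), and the coding of genotypes as counts of the alternate allele are assumptions for illustration; this is not the code used by gl.nhybrids().

```r
# Hypothetical genotypes: rows are individuals, columns are loci,
# values are counts of the alternate allele (0, 1, 2); NA is missing
g0 <- matrix(c(0, 0, 2,
               0, 1, 2,
               0, 0, 1,
               0, 0, 2), ncol = 3, byrow = TRUE)   # parental population 0
g1 <- matrix(c(2, 1, 0,
               2, 0, 0,
               2, 1, 0,
               2, 2, 0), ncol = 3, byrow = TRUE)   # parental population 1

# Frequency of the alternate allele per locus, ignoring missing calls
alt.freq <- function(g) colSums(g, na.rm = TRUE) / (2 * colSums(!is.na(g)))

# A locus is fixed and different if the allele is (nearly) absent in one
# population and (nearly) universal in the other
fixed.diff <- function(f0, f1, threshold = 0) {
  (f0 <= threshold & f1 >= 1 - threshold) |
    (f0 >= 1 - threshold & f1 <= threshold)
}

f0 <- alt.freq(g0)
f1 <- alt.freq(g1)
strict <- fixed.diff(f0, f1)                     # threshold = 0
relaxed <- fixed.diff(f0, f1, threshold = 0.15)  # tolerant definition
```

In this toy data the first locus is fixed under the strict definition, while the third locus qualifies only once some tolerance is allowed.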
If you specify a directory for the NewHybrids executable file, then the script will create the input file from the SNP data then run NewHybrids. If the directory is set to NULL, the execution will stop once the input file (default='nhyb.txt') has been written to disk. Note: the executable option will not work on a Mac; Mac users should generate the NewHybrids input file and run this on their local installation of NewHybrids.
Refer to the New Hybrids manual for further information on the parameters to set – http://ib.berkeley.edu/labs/slatkin/eriq/software/new_hybs_doc1_1Beta3.pdf
It is important to stringently filter the data on RepAvg and CallRate if using the random option. One might elect to repeat the analysis (method='random') and combine the resultant posterior probabilities should the maximum of 200 loci be considered insufficient.
The F1 individuals should be heterozygous at all loci for which the parental populations are fixed and different, assuming parental populations have been specified. Sampling errors can result in this not being the case, especially where the sample sizes for the parental populations are small. Alternatively, the threshold for posterior probabilities used to determine assignment (pprob) or the definition of a fixed difference (threshold) may be too lax. To assess the error rate in the determination of assignment of F1 individuals, a plot of the frequency of homozygous reference, heterozygotes and homozygous alternate (SNP) can be produced by setting plot=TRUE (the default).
Usage
gl.nhybrids(gl, outpath = tempdir(), p0 = NULL, p1 = NULL, threshold = 0, method = "random", loc.limit = 200, plot = TRUE, plot_theme = theme_dartR(), plot_colors = two_colors, pprob = 0.95, nhyb.directory = NULL, BurnIn = 10000, sweeps = 10000, GtypFile = "TwoGensGtypFreq.txt", AFPriorFile = NULL, PiPrior = "Jeffreys", ThetaPrior = "Jeffreys", verbose = NULL)
Arguments
gl | Name of the genlight object containing the SNP data [required]. |
outpath | Path where to save the output file [default tempdir()]. |
p0 | List of populations to be regarded as parental population 0 [default NULL]. |
p1 | List of populations to be regarded as parental population 1 [default NULL]. |
threshold | Sets the level at which a gene frequency difference is considered to be fixed [default 0]. |
method | Specifies the method (random or AvgPIC) to select 200 loci for NewHybrids [default random]. |
loc.limit | Specifies the number of loci to use in the analysis [default 200]. |
plot | If TRUE, a plot of the frequency of homozygous reference, heterozygotes and homozygous alternate (SNP) is produced for the F1 individuals [default TRUE, applies only if both parental populations are specified]. |
plot_theme | User specified theme [default theme_dartR()]. |
plot_colors | Vector with two color names for the borders and fill [default two_colors]. |
pprob | Threshold level for assignment to likelihood bins [default 0.95, used only if plot=TRUE]. |
nhyb.directory | Directory that holds the NewHybrids executable file, e.g. C:/NewHybsPC [default NULL]. |
BurnIn | Number of sweeps to use in the burn in [default 10000]. |
sweeps | Number of sweeps to use in computing the actual Monte Carlo averages [default 10000]. |
GtypFile | Name of a file containing the genotype frequency classes [default TwoGensGtypFreq.txt]. |
AFPriorFile | Name of the file containing prior allele frequency information [default NULL]. |
PiPrior | Jeffreys-like priors or Uniform priors for the parameter pi [default Jeffreys]. |
ThetaPrior | Jeffreys-like priors or Uniform priors for the parameter theta [default Jeffreys]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
The reduced genlight object, if parentals are provided; output of NewHybrids is saved to the working directory.
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
References
Anderson, E.C. and Thompson, E.A. (2002). A model-based method for identifying species hybrids using multilocus genetic data. Genetics. 160:1217-1229.
Examples
## Not run: 
m <- gl.nhybrids(testset.gl, p0=NULL, p1=NULL,
nhyb.directory='D:/workspace/R/NewHybsPC', # Specify as necessary
outpath="D:/workspace", # Specify as necessary, usually getwd() [= workspace]
BurnIn=100,
sweeps=100,
verbose=3)
## End(Not run)

Identifies loci under selection per population using the outflank method of Whitlock and Lotterhos (2015)
Description
Identifies loci under selection per population using the outflank method of Whitlock and Lotterhos (2015)
Usage
gl.outflank(gi, plot = TRUE, LeftTrimFraction = 0.05, RightTrimFraction = 0.05, Hmin = 0.1, qthreshold = 0.05, ...)
Arguments
gi | A genlight or genind object, with a defined population structure [required]. |
plot | A switch if a barplot is wanted [default TRUE]. |
LeftTrimFraction | The proportion of loci that are trimmed from the lower end of the range of Fst before the likelihood function is applied [default 0.05]. |
RightTrimFraction | The proportion of loci that are trimmed from the upper end of the range of Fst before the likelihood function is applied [default 0.05]. |
Hmin | The minimum heterozygosity required before including calculations from a locus [default 0.1]. |
qthreshold | The desired false discovery rate threshold for calculating q-values [default 0.05]. |
... | Additional parameters (see documentation of outflank on github). |
Details
This function is a wrapper around the outflank function provided by Whitlock and Lotterhos. To be able to run this function, the packages qvalue (from bioconductor) and outflank (from github) need to be installed. To do so, see the example below.
Value
Returns an index of outliers and the full outflank list
References
Whitlock, M.C. and Lotterhos K.J. (2015) Reliable detection of loci responsible for local adaptation: inference of a neutral model through trimming the distribution of Fst. The American Naturalist 186: 24 - 36.
Github repository: Whitlock & Lotterhos: https://github.com/whitlock/OutFLANK (Check the readme.pdf within the repository for an explanation. Be aware that you can now run OutFLANK from a genlight object)
See Also
utils.outflank, utils.outflank.plotter, utils.outflank.MakeDiploidFSTMat
Examples
gl.outflank(bandicoot.gl, plot = TRUE)

Ordination applied to genotypes in a genlight object (PCA), in an fd object, or to a distance matrix (PCoA)
Description
This function takes the genotypes for individuals and undertakes a Pearson Principal Component analysis (PCA) on SNP or Tag P/A (SilicoDArT) data; it undertakes a Gower Principal Coordinate analysis (PCoA) if supplied with a distance matrix. Technically, any distance matrix can be represented in an ordinated space using PCoA.
Usage
gl.pcoa(x, nfactors = 5, correction = NULL, mono.rm = TRUE, parallel = FALSE, n.cores = 16, plot.out = TRUE, plot_theme = theme_dartR(), plot_colors = two_colors, save2tmp = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object or fd object containing the SNP data, or a distance matrix of type dist [required]. |
nfactors | Number of axes to retain in the output of factor scores [default 5]. |
correction | Method applied to correct for negative eigenvalues, either 'lingoes' or 'cailliez' [default NULL]. |
mono.rm | If TRUE, remove monomorphic loci [default TRUE]. |
parallel | TRUE if parallel processing is required (does fail under Windows) [default FALSE]. |
n.cores | Number of cores to use if parallel processing is requested [default 16]. |
plot.out | If TRUE, a diagnostic plot is displayed showing a scree plot for the "informative" axes and a histogram of eigenvalues of the remaining "noise" axes [default TRUE]. |
plot_theme | Theme for the plot. See Details for options [default theme_dartR()]. |
plot_colors | List of two color names for the borders and fill of the plot [default two_colors]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
The function is essentially a wrapper for glPca {adegenet} or pcoa {ape} with default settings apart from those specified as parameters in this function.
Sources of stress in the visual representation
While, technically, any distance matrix can be represented in an ordinated space, the representation will not typically be exact. There are three major sources of stress in a reduced representation of distances or dissimilarities among entities using PCA or PCoA. By far the greatest source comes from the decision to select only the top two or three axes from the ordinated set of axes derived from the PCA or PCoA. The representation of the entities in such a heavily reduced space will not faithfully represent the distances in the input distance matrix simply because of the loss of information in deeper informative dimensions. For this reason, it is not sensible to be too precious about managing the other two sources of stress in the visual representation.
The measure of distance between entities in a PCA is the Pearson Correlation Coefficient, essentially a standardized Euclidean distance. This is both a metric distance and a Euclidean distance. In PCoA, the second source of stress is the choice of distance measure or dissimilarity measure. While any distance or dissimilarity matrix can be represented in an ordinated space, the distances between entities can be faithfully represented in that space (that is, without stress) only if the distances are metric. Furthermore, for distances between entities to be faithfully represented in a rigid Cartesian space, the distance measure needs to be Euclidean. If this is not the case, the distances between the entities in the ordinated visualized space will not exactly represent the distances in the input matrix (stress will be non-zero). This source of stress will be evident as negative eigenvalues in the deeper dimensions.
A third source of stress arises from having a sparse dataset, one with missing values. This affects both PCA and PCoA. If the original data matrix is not fully populated, that is, if there are missing values, then even a Euclidean distance matrix will not necessarily be 'positive definite'. It follows that some of the eigenvalues may be negative, even though the distance metric is Euclidean. This issue is exacerbated when the number of loci greatly exceeds the number of individuals, as is typically the case when working with SNP data. The impact of missing values can be minimized by stringently filtering on Call Rate, albeit with loss of data. An alternative is given in a paper 'Honey, I shrunk the sample covariance matrix' and more recently by Ledoit and Wolf (2018), but their approach has not been implemented here.
The good news is that, unless the sum of the negative eigenvalues, arising from a non-Euclidean distance measure or from missing values, approaches those of the final PCA or PCoA axes to be displayed, the distortion is probably of no practical consequence and certainly not comparable to the stress arising from selecting only two or three final dimensions out of several informative dimensions for the visual representation.
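The link between a non-metric dissimilarity and negative eigenvalues can be seen in a small base R sketch. It applies Gower's double centring by hand to three entities whose dissimilarities break the triangle inequality; the matrix is invented for illustration and the code does not reproduce the internals of gl.pcoa() or pcoa {ape}.

```r
# Dissimilarities among three entities that break the triangle inequality:
# d(1,3) = 3 exceeds d(1,2) + d(2,3) = 2, so the measure is not metric
D <- matrix(c(0, 1, 3,
              1, 0, 1,
              3, 1, 0), nrow = 3, byrow = TRUE)

# Gower's double centring of the squared dissimilarities
n <- nrow(D)
J <- diag(n) - matrix(1 / n, n, n)
B <- -0.5 * J %*% (D^2) %*% J

# Eigenvalues of the centred matrix, largest first;
# negative values signal stress in the representation
eig <- eigen(B, symmetric = TRUE, only.values = TRUE)$values

# Compare the summed negative eigenvalues with the leading axis
neg.sum <- sum(abs(eig[eig < 0]))
top.axis <- eig[1]
```

Here the negative eigenvalue is small relative to the leading axis, which is the situation described above as being of little practical consequence.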
Function's output
Two diagnostic plots are produced. The first is a Scree Plot, showing the percentage variation explained by each of the PCA or PCoA axes, for those axes that explain more than the original variables (loci) on average. That is, only informative axes are displayed. The scree plot informs the number of dimensions to be retained in the visual summaries. As a rule of thumb, axes with more than 10
The second graph shows the distribution of eigenvalues for the remaining uninformative (noise) axes, including those with negative eigenvalues.
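One way to read the split between informative and noise axes is to compare each eigenvalue with the mean eigenvalue, as in the base R sketch below. The simulated genotypes and the use of prcomp() with a mean-eigenvalue cut-off are assumptions for illustration; gl.pcoa() may apply its own criterion.

```r
set.seed(1)
# Hypothetical genotype matrix: 20 individuals by 8 loci, coded 0/1/2
g <- matrix(rbinom(20 * 8, size = 2, prob = 0.3), nrow = 20)

pca <- prcomp(g, center = TRUE, scale. = FALSE)
eig <- pca$sdev^2             # eigenvalues, one per axis, largest first
pct <- 100 * eig / sum(eig)   # percentage variation explained

# Axes that explain more than an average axis are treated as informative
informative <- eig > mean(eig)
scree <- pct[informative]     # values that would appear in the scree plot
noise <- eig[!informative]    # values that would appear in the histogram
```

For an unscaled PCA the mean eigenvalue equals the average variance of the original variables, which is why it serves as a natural reference point.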
Action is recommended (verbose >= 2) if the negative eigenvalues are dominant, their sum approaching in magnitude the eigenvalues for axes selected for the final visual solution.
Output is a glPca object conforming to adegenet::glPca but with only the following retained.
$call - The call that generated the PCA/PCoA
$eig - Eigenvalues – All eigenvalues (positive, null, negative).
$scores - Scores (coefficients) for each individual
$loadings - Loadings of each SNP for each principal component
If save2tmp = TRUE, plots and tables are saved to the session temporary directory (tempdir) and can be accessed with the function gl.print.reports and listed with the function gl.list.reports. Note that they can be accessed only in the current R session because tempdir is cleared each time that the R session is closed.
Examples of other themes that can be used can be consulted in
PCA was developed by Pearson (1901) and Hotelling (1933), whilst the best modern reference is Jolliffe (2002). PCoA was developed by Gower (1966) while the best modern reference is Legendre & Legendre (1998).
Value
An object of class pcoa containing the eigenvalues and factor scores
Author(s)
Author(s): Arthur Georges. Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)
References
Cailliez, F. (1983) The analytical solution of the additive constant problem. Psychometrika, 48, 305-308.
Gower, J. C. (1966) Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53, 325-338.
Hotelling, H., 1933. Analysis of a complex of statistical variables into Principal Components. Journal of Educational Psychology 24:417-441, 498-520.
Jolliffe, I. (2002) Principal Component Analysis. 2nd Edition, Springer, New York.
Ledoit, O. and Wolf, M. (2018). Analytical nonlinear shrinkage of large-dimensional covariance matrices. University of Zurich, Department of Economics, Working Paper No. 264, Revised version. Available at SSRN: https://ssrn.com/abstract=3047302 or http://dx.doi.org/10.2139/ssrn.3047302
Legendre, P. and Legendre, L. (1998). Numerical Ecology, Volume 24, 2nd Edition. Elsevier Science, NY.
Lingoes, J. C. (1971) Some boundary conditions for a monotone analysis of symmetric matrices. Psychometrika, 36, 195-203.
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine. Series 6, vol. 2, no. 11, pp. 559-572.
Examples
## Not run: 
gl <- possums.gl
# PCA (using SNP genlight object)
pca <- gl.pcoa(possums.gl[1:50,], verbose=2)
gl.pcoa.plot(pca, gl)
gs <- testset.gs
levels(pop(gs)) <- c(rep('Coast',5), rep('Cooper',3), rep('Coast',5),
rep('MDB',8), rep('Coast',6), 'Em.subglobosa', 'Em.victoriae')
# PCA (using SilicoDArT genlight object)
pca <- gl.pcoa(gs)
gl.pcoa.plot(pca, gs)
# Collapsing pops to OTUs using Fixed Difference Analysis (using fd object)
fd <- gl.fixed.diff(testset.gl)
fd <- gl.collapse(fd)
pca <- gl.pcoa(fd)
gl.pcoa.plot(pca, fd$gl)
# Using a distance matrix
D <- gl.dist.ind(testset.gs, method='jaccard')
pcoa <- gl.pcoa(D, correction="cailliez")
gl.pcoa.plot(pcoa, gs)
## End(Not run)

Bivariate or trivariate plot of the results of an ordination generated using gl.pcoa()
Description
This script takes output from the ordination generated by gl.pcoa() and plots the individuals classified by population.
Usage
gl.pcoa.plot(glPca, x, scale = FALSE, ellipse = FALSE, plevel = 0.95, pop.labels = "pop", interactive = FALSE, as.pop = NULL, hadjust = 1.5, vadjust = 1, xaxis = 1, yaxis = 2, zaxis = NULL, pt.size = 2, pt.colors = NULL, pt.shapes = NULL, label.size = 1, axis.label.size = 1.5, save2tmp = FALSE, verbose = NULL)
Arguments
glPca | Name of the PCA or PCoA object containing the factor scores and eigenvalues [required]. |
x | Name of the genlight object or fd object containing the SNP genotypes or Tag P/A (SilicoDArT) genotypes or the Distance Matrix used to generate the ordination [required]. |
scale | If TRUE, scale the x and y axes in proportion to % variation explained [default FALSE]. |
ellipse | If TRUE, display ellipses to encapsulate points for each population [default FALSE]. |
plevel | Value of the percentile for the ellipse to encapsulate points for each population [default 0.95]. |
pop.labels | How labels will be added to the plot ['none'|'pop'|'legend', default = 'pop']. |
interactive | If TRUE then the populations are plotted without labels, mouse-over to identify points [default FALSE]. |
as.pop | Assign another metric to represent populations for the plot [default NULL]. |
hadjust | Horizontal adjustment of label position in 2D plots [default 1.5]. |
vadjust | Vertical adjustment of label position in 2D plots [default 1]. |
xaxis | Identify the x axis from those available in the ordination (xaxis <= nfactors) [default 1]. |
yaxis | Identify the y axis from those available in the ordination (yaxis <= nfactors) [default 2]. |
zaxis | Identify the z axis from those available in the ordination for a 3D plot (zaxis <= nfactors) [default NULL]. |
pt.size | Specify the size of the displayed points [default 2]. |
pt.colors | Optionally provide a vector of nPop colors (run gl.select.colors() for color options) [default NULL]. |
pt.shapes | Optionally provide a vector of nPop shapes (run gl.select.shapes() for shape options) [default NULL]. |
label.size | Specify the size of the point labels [default 1]. |
axis.label.size | Specify the size of the displayed axis labels [default 1.5]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
The factor scores are taken from the output of gl.pcoa() and the population assignments are taken from the original data file. In the bivariate plots, the specimens are shown optionally with adjacent labels and enclosing ellipses. Population labels on the plot are shuffled so as not to overlap (using package {directlabels}). This can be a bit clunky, as the labels may be some distance from the points to which they refer, but it provides the opportunity for moving labels around using graphics software (e.g. Adobe Illustrator).
3D plotting is activated by specifying a zaxis.
Any pair or trio of axes can be specified from the ordination, provided they are within the range of the nfactors value provided to gl.pcoa(). In the 2D plots, axes can be scaled to represent the proportion of variation explained. In any case, the proportion of variation explained by each axis is provided in the axis label.
Colors and shapes of the points can be altered by passing a vector of shapes and/or a vector of colors. These vectors can be created with gl.select.shapes() and gl.select.colors() and passed to this script using the pt.shapes and pt.colors parameters.
Points displayed in the ordination can be identified if the option interactive=TRUE is chosen, in which case the resultant plot is ggplotly() friendly. Identification of points is by moving the mouse over them. Refer to the plotly package for further information. The interactive option is automatically enabled for 3D plotting.
Value
returns no value (i.e. NULL)
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Other Exploration/visualisation functions: gl.select.colors(), gl.select.shapes(), gl.smearplot()
Examples
# SET UP DATASET
gl <- testset.gl
levels(pop(gl)) <- c(rep('Coast',5), rep('Cooper',3), rep('Coast',5),
rep('MDB',8), rep('Coast',7), 'Em.subglobosa', 'Em.victoriae')
# RUN PCA
pca <- gl.pcoa(gl, nfactors=5)
# VARIOUS EXAMPLES
gl.pcoa.plot(pca, gl, ellipse=TRUE, plevel=0.95, pop.labels='pop', axis.label.size=1, hadjust=1.5, vadjust=1)
gl.pcoa.plot(pca, gl, ellipse=TRUE, plevel=0.99, pop.labels='legend', axis.label.size=1)
gl.pcoa.plot(pca, gl, ellipse=TRUE, plevel=0.99, pop.labels='legend', axis.label.size=1.5, scale=TRUE)
gl.pcoa.plot(pca, gl, ellipse=TRUE, axis.label.size=1.2, xaxis=1, yaxis=3, scale=TRUE)
gl.pcoa.plot(pca, gl, pop.labels='none', scale=TRUE)
gl.pcoa.plot(pca, gl, axis.label.size=1.2, interactive=TRUE)
gl.pcoa.plot(pca, gl, ellipse=TRUE, plevel=0.99, xaxis=1, yaxis=2, zaxis=3)
# COLOR AND SHAPE ADJUSTMENTS
shp <- gl.select.shapes(select=c(16,17,17,0,2))
col <- gl.select.colors(library='brewer', palette='Spectral', ncolors=11, select=c(1,9,3,11,11))
gl.pcoa.plot(pca, gl, ellipse=TRUE, plevel=0.95, pop.labels='pop', pt.colors=col, pt.shapes=shp, axis.label.size=1, hadjust=1.5, vadjust=1)
gl.pcoa.plot(pca, gl, ellipse=TRUE, plevel=0.99, pop.labels='legend', pt.colors=col, pt.shapes=shp, axis.label.size=1)
test <- gl.pcoa(platypus.gl)
gl.pcoa.plot(glPca = test, x = platypus.gl)

Generates percentage allele frequencies by locus and population
Description
This is a support script, to take SNP data or SilicoDArT presence/absence data grouped into populations in a genlight object {adegenet} and generate a table of allele frequencies for each population and locus.
Usage
gl.percent.freq(x, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP or Tag P/A (SilicoDArT) data [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
A matrix with allele (SNP data) or presence/absence frequencies (Tag P/A data) broken down by population and locus
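The calculation behind such a table can be sketched in base R for SNP data. The genotype matrix, population labels and the coding of genotypes as counts of the alternate allele are assumptions for illustration; the layout of the table returned by gl.percent.freq() may differ.

```r
# Hypothetical genotypes: rows are individuals, columns are loci,
# values are counts of the alternate allele (0, 1, 2); NA is missing
g <- matrix(c(0, 2, 1,
              1, 2, NA,
              2, 0, 1,
              2, 0, 0), ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("loc1", "loc2", "loc3")))
pop <- factor(c("North", "North", "South", "South"))

# Percentage frequency of the alternate allele per population and locus,
# using only the individuals that were scored at each locus
pct.freq <- sapply(split(seq_len(nrow(g)), pop), function(idx) {
  sub <- g[idx, , drop = FALSE]
  100 * colSums(sub, na.rm = TRUE) / (2 * colSums(!is.na(sub)))
})
```

The result has one row per locus and one column per population. For presence/absence data coded 0/1 the denominator would be the number of scored individuals rather than twice that number.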
Author(s)
Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)
Examples
m <- gl.percent.freq(testset.gl)

Replays the history and applies it to a genlight object
Description
Replays the history and applies it to a genlight object
Usage
gl.play.history(x, history = NULL, verbose = 0)
Arguments
x | A genlight object (with a history slot) [optional]. |
history | If no history is provided the complete history of x is used (recreating the identical object x). If history is a vector it indicates which part of the history of x is used [ |
verbose | If set to one then history commands are printed, which may facilitate reading the output [default 0]. |
Details
This function basically allows you to create a 'template history' (= set of filters) and apply it to any other genlight object. Histories can also be saved and loaded (see gl.save.history and gl.load.history).
Value
Returns a genlight object that was created by replaying the provided history applied to the genlight object x. Please note you can 'mix' histories or parts of them and apply them to different genlight objects. If the history does not contain gl.read.dart, the histories of x and history are concatenated.
Author(s)
Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr).
Examples
## Not run: 
dartfile <- system.file('extdata','testset_SNPs_2Row.csv', package='dartR')
metadata <- system.file('extdata','testset_metadata.csv', package='dartR')
gl <- gl.read.dart(dartfile, ind.metafile = metadata, probar=FALSE)
gl2 <- gl.filter.callrate(gl, method='loc', threshold=0.9)
gl3 <- gl.filter.callrate(gl2, method='ind', threshold=0.95)
#Now 'replay' part of the history 'onto' another genlight object
#bc.fil <- gl.play.history(gl.compliance.check(bandicoot.gl),
#history=gl3@other$history[c(2,3)], verbose=1)
#gl.print.history(bc.fil)
## End(Not run)

Plots fastStructure analysis results (Q-matrix)
Description
This function takes a fastStructure run object (output from gl.run.faststructure) and plots the typical structure bar plot that visualizes the q matrix of a fastStructure run.
Usage
gl.plot.faststructure(sr, k.range, met_clumpp = "greedyLargeK", iter_clumpp = 100, clumpak = TRUE, plot_theme = NULL, colors_clusters = NULL, ind_name = TRUE, border_ind = 0.15)
Arguments
sr | fastStructure run object from |
k.range | The number for K of the q matrix that should be plotted. Needs to be within you simulated range of K's in your sr structure run object. If NULL, all the K's are plotted [default NULL]. |
met_clumpp | The algorithm to use to infer the correct permutations.One of 'greedy' or 'greedyLargeK' or 'stephens' [default "greedyLargeK"]. |
iter_clumpp | The number of iterations to use if running either 'greedy''greedyLargeK' [default 100]. |
clumpak | Whether use the Clumpak method (see details) [default TRUE]. |
plot_theme | Theme for the plot. See Details for options[default NULL]. |
colors_clusters | A color palette for clusters (K) or a list withas many colors as there are clusters (K) [default NULL]. |
ind_name | Whether to plot individual names [default TRUE]. |
border_ind | The width of the border line between individuals [default 0.15]. |
Details
The function outputs a barplot which is the typical output of fastStructure.
This function is based on the methods of CLUMPP and Clumpak as implemented in the R package starmie (https://github.com/sa-lee/starmie).
The Clumpak method identifies sets of highly similar runs among all the replicates of the same K. The method then separates the distinct groups of runs representing distinct modes in the space of possible solutions.
The CLUMPP method permutes the clusters output by independent runs of clustering programs such as structure, so that they match up as closely as possible.
This function averages the replicates within each mode identified by the Clumpak method.
Examples of other themes that can be used can be consulted in
Value
List of Q-matrices
Author(s)
Bernd Gruber & Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
References
Raj, A., Stephens, M., & Pritchard, J. K. (2014). fastSTRUCTURE: variational inference of population structure in large SNP data sets. Genetics, 197(2), 573-589.
Pritchard, J.K., Stephens, M., Donnelly, P. (2000) Inference of population structure using multilocus genotype data. Genetics 155, 945-959.
Kopelman, Naama M., et al. "Clumpak: a program for identifying clustering modes and packaging population structure inferences across K." Molecular ecology resources 15.5 (2015): 1179-1191.
Mattias Jakobsson and Noah A. Rosenberg. 2007. CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23(14):1801-1806. Available at clumpp
See Also
gl.run.faststructure
Examples
## Not run:
t1 <- gl.filter.callrate(platypus.gl,threshold = 1)
res <- gl.run.faststructure(t1, exec = "./fastStructure", k.range = 2:3,
  num.k.rep = 2, output = paste0(getwd(),"/res_str"))
qmat <- gl.plot.faststructure(res,k.range=2:3)
gl.map.structure(qmat, K=2, t1, scalex=1, scaley=0.5)
## End(Not run)
Represents a distance matrix as a heatmap
Description
The script plots a heat map to represent the distances in the distance or dissimilarity matrix. This function is a wrapper for heatmap.2 (package gplots).
Usage
gl.plot.heatmap(D, palette.divergent = gl.colors("div"), verbose = NULL, ...)
Arguments
D | Name of the distance matrix or class fd object [required]. |
palette.divergent | A divergent palette for the distance values [default gl.colors("div")]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
... | Parameters passed to function heatmap.2 (package gplots). |
Value
returns no value (i.e. NULL)
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
Examples
## Not run:
gl <- testset.gl[1:10,]
D <- dist(as.matrix(gl),upper=TRUE,diag=TRUE)
gl.plot.heatmap(D)
D2 <- gl.dist.pop(possums.gl)
gl.plot.heatmap(D2)
D3 <- gl.fixed.diff(testset.gl)
gl.plot.heatmap(D3)
## End(Not run)
if ((requireNamespace("gplots", quietly = TRUE))) {
  D2 <- gl.dist.pop(possums.gl)
  gl.plot.heatmap(D2)
}
Represents a distance or dissimilarity matrix as a network
Description
This script takes a distance matrix generated by dist() and represents the relationship among the specimens as a network diagram. In order to use this script, a decision is required on a threshold for relatedness to be represented as a link in the network, and on the layout used to create the diagram.
Usage
gl.plot.network(
  D,
  x = NULL,
  method = "fr",
  node.size = 3,
  node.label = FALSE,
  node.label.size = 0.7,
  node.label.color = "black",
  alpha = 0.005,
  title = "Network based on genetic distance",
  verbose = NULL
)
Arguments
D | A distance or dissimilarity matrix generated by dist() or gl.dist() [required]. |
x | A genlight object from which the D matrix was generated [default NULL]. |
method | One of "fr", "kk" or "drl" [default "fr"]. |
node.size | Size of the symbols for the network nodes [default 3]. |
node.label | TRUE to display node labels [default FALSE]. |
node.label.size | Size of the node labels [default 0.7]. |
node.label.color | Color of the text of the node labels [default 'black']. |
alpha | Upper threshold to determine which links between nodes to display [default 0.005]. |
title | Title for the plot [default "Network based on genetic distance"]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
The threshold for relatedness to be represented as a link in the network is specified as a quantile. Those relatedness measures above the quantile are plotted as links, those below the quantile are not. Often you are looking for relatedness outliers in comparison with the overall relatedness among individuals, so a very conservative quantile is used (e.g. 0.004), but ultimately, this decision is made as a matter of trial and error. One way to approach this trial and error is to try to achieve a sparse set of links between unrelated 'background' individuals so that the stronger links are preferentially shown.
There are several layouts from which to choose. The most popular are given as options in this script.
fr – Fruchterman, T.M.J. and Reingold, E.M. (1991). Graph Drawing by Force-directed Placement. Software – Practice and Experience 21:1129-1164.
kk – Kamada, T. and Kawai, S.: An Algorithm for Drawing General Undirected Graphs. Information Processing Letters 31:7-15, 1989.
drl – Martin, S., Brown, W.M., Klavans, R., Boyack, K.W., DrL: Distributed Recursive (Graph) Layout. SAND Reports 2936:1-10, 2008.
Colors of node symbols are those of the rainbow.
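The quantile-based choice of links described above can be illustrated in base R. This is a minimal sketch and not the code used by gl.plot.network: it assumes that alpha is the fraction of the strongest pairwise values to keep, so a link is retained when its value exceeds the (1 - alpha) quantile. The helper name links_above_quantile and the toy matrix are hypothetical.

```r
# Hypothetical helper: keep the pairs whose value lies above a quantile.
# Assumption: alpha is the upper-tail fraction of pairs to display.
links_above_quantile <- function(D, alpha = 0.005) {
  M <- as.matrix(D)
  if (is.null(rownames(M))) {
    dimnames(M) <- list(seq_len(nrow(M)), seq_len(ncol(M)))
  }
  idx <- which(upper.tri(M), arr.ind = TRUE)  # each pair counted once
  w <- M[upper.tri(M)]                        # same column-major order as idx
  cutoff <- stats::quantile(w, probs = 1 - alpha, names = FALSE)
  keep <- w > cutoff
  data.frame(from = rownames(M)[idx[keep, "row"]],
             to = colnames(M)[idx[keep, "col"]],
             weight = w[keep])
}

# Symmetric toy matrix of pairwise values for ten made-up individuals
set.seed(1)
ids <- paste0("ind", 1:10)
M <- matrix(runif(100), 10, 10, dimnames = list(ids, ids))
M <- (M + t(M)) / 2
links <- links_above_quantile(M, alpha = 0.1)
```

Lowering alpha in the sketch makes the set of links sparser, which mirrors the trial-and-error approach described in the Details.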
Value
returns no value (i.e. NULL)
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Examples
if ((requireNamespace("rrBLUP", quietly = TRUE)) & (requireNamespace("gplots", quietly = TRUE))) {
  test <- gl.subsample.loci(platypus.gl, n = 100)
  test <- gl.keep.ind(test,ind.list = indNames(test)[1:10])
  D <- gl.grm(test, legendx=0.04)
  gl.plot.network(D,test)
}
Plots STRUCTURE analysis results (Q-matrix)
Description
This function takes a structure run object (output from gl.run.structure) and plots the typical structure barplot that visualizes the Q-matrix of a structure run.
Usage
gl.plot.structure(
  sr,
  K = NULL,
  met_clumpp = "greedyLargeK",
  iter_clumpp = 100,
  clumpak = TRUE,
  plot_theme = NULL,
  colors_clusters = NULL,
  ind_name = TRUE,
  border_ind = 0.15,
  plot.out = TRUE,
  save2tmp = FALSE,
  verbose = NULL
)
Arguments
sr | Structure run object from gl.run.structure. |
K | The number for K of the Q-matrix that should be plotted. Needs to be within your simulated range of K's in your sr structure run object. If NULL, all the K's are plotted [default NULL]. |
met_clumpp | The algorithm to use to infer the correct permutations. One of 'greedy' or 'greedyLargeK' or 'stephens' [default "greedyLargeK"]. |
iter_clumpp | The number of iterations to use if running either 'greedy' or 'greedyLargeK' [default 100]. |
clumpak | Whether to use the Clumpak method (see Details) [default TRUE]. |
plot_theme | Theme for the plot. See Details for options [default NULL]. |
colors_clusters | A color palette for clusters (K) or a list with as many colors as there are clusters (K) [default NULL]. |
ind_name | Whether to plot individual names [default TRUE]. |
border_ind | The width of the border line between individuals [default 0.15]. |
plot.out | Specify if plot is to be produced [default TRUE]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]. |
Details
The function outputs a barplot which is the typical output of structure. For an Evanno plot use gl.evanno.
This function is based on the methods of CLUMPP and Clumpak as implemented in the R package starmie (https://github.com/sa-lee/starmie).
The Clumpak method identifies sets of highly similar runs among all the replicates of the same K. The method then separates the distinct groups of runs representing distinct modes in the space of possible solutions.
The CLUMPP method permutes the clusters output by independent runs of clustering programs such as structure, so that they match up as closely as possible.
This function averages the replicates within each mode identified by the Clumpak method.
Plots and tables are saved to the temporary directory (tempdir) and can be accessed with the function gl.print.reports and listed with the function gl.list.reports. Note that they can be accessed only in the current R session because tempdir is cleared each time that the R session is closed.
Examples of other themes that can be used can be consulted in
Value
List of Q-matrices
Author(s)
Bernd Gruber & Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
References
Pritchard, J.K., Stephens, M., Donnelly, P. (2000) Inference of population structure using multilocus genotype data. Genetics 155, 945-959.
Kopelman, Naama M., et al. "Clumpak: a program for identifying clustering modes and packaging population structure inferences across K." Molecular ecology resources 15.5 (2015): 1179-1191.
Mattias Jakobsson and Noah A. Rosenberg. 2007. CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23(14):1801-1806. Available at clumpp
See Also
gl.run.structure, gl.plot.structure
Examples
## Not run:
#bc <- bandicoot.gl[,1:100]
#sr <- gl.run.structure(bc, k.range = 2:5, num.k.rep = 3, exec = './structure')
#ev <- gl.evanno(sr)
#ev
#qmat <- gl.plot.structure(sr, K=3)
#head(qmat)
#gl.map.structure(qmat, K=3, bc, scalex=1, scaley=0.5)
## End(Not run)
Prints history of a genlight object
Description
Prints history of a genlight object
Usage
gl.print.history(x = NULL, history = NULL)
Arguments
x | A genlight object (with history) [optional]. |
history | Either a link to a history slot (gl@other$history), or a vector indicating which part of the history of x is used [c(1,3,4) uses the first, third and fourth entry from x@other$history]. If no history is provided the complete history of x is used (recreating the identical object x) [optional]. |
Value
Prints a table with all history records. Currently the style cannot be changed.
Author(s)
Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)
Examples
dartfile <- system.file('extdata','testset_SNPs_2Row.csv', package='dartR')
metadata <- system.file('extdata','testset_metadata.csv', package='dartR')
gl <- gl.read.dart(dartfile, ind.metafile = metadata, probar=FALSE)
gl2 <- gl.filter.callrate(gl, method='loc', threshold=0.9)
gl3 <- gl.filter.callrate(gl2, method='ind', threshold=0.95)
#Now 'replay' part of the history 'onto' another genlight object
#bc.fil <- gl.play.history(gl.compliance.check(bandicoot.gl),
#history=gl3@other$history[c(2,3)], verbose=1)
#gl.print.history(bc.fil)
Prints dartR reports saved in tempdir
Description
Prints dartR reports saved in tempdir
Usage
gl.print.reports(print_report)
Arguments
print_report | Number of the report, as listed by gl.list.reports. |
Value
Prints reports that were saved in tempdir.
Author(s)
Bernd Gruber & Luis Mijangos (bugs? Post to https://groups.google.com/d/forum/dartr)
See Also
Examples
## Not run:
reports <- gl.print.reports(1)
## End(Not run)
Calculates a similarity (distance) matrix for individuals on the proportion of shared alleles
Description
This script calculates an individual-based distance matrix. It uses a C++ implementation, so the package Rcpp needs to be installed; it is therefore really fast (once the function has been compiled after the first run).
Usage
gl.propShared(x)
Arguments
x | Name of the genlight containing the SNP genotypes [required]. |
Value
A similarity matrix
Author(s)
Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
Examples
#takes some time at the first run of the function...
## Not run:
res <- gl.propShared(bandicoot.gl)
res[1:5,1:7] #show only a small part of the matrix
## End(Not run)
Randomly changes the allocation of 0's and 2's in a genlight object
Description
This function randomly samples half of the SNPs and, in the sampled SNPs, replaces 0's with 2's.
Usage
gl.random.snp(x, plot.out = TRUE, save2tmp = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
plot.out | Specify if a plot is to be produced [default TRUE]. |
save2tmp | If TRUE, saves any ggplots to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]. |
Details
DArT calls the most common allele as the reference allele. In a genlight object, homozygous for the reference allele are coded with a '0' and homozygous for the alternative allele are coded with a '2'. This causes some distortions in visuals from time to time.
If plot.out = TRUE, two smear plots (pre-randomisation and post-randomisation) are presented using a random subset of individuals (10) and loci (100) to provide an overview of the changes.
Resultant ggplots are saved to the session's temporary directory.
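The re-coding idea can be shown on a plain genotype matrix. The following base R sketch is an illustration only, not the code of gl.random.snp: it assumes that the re-coding swaps the two homozygous states (0 becomes 2 and 2 becomes 0) in the sampled loci, and the toy matrix is made up.

```r
# Toy genotype matrix: rows are individuals, columns are loci (0, 1, 2 or NA).
set.seed(42)
geno <- matrix(sample(c(0, 1, 2, NA), 40, replace = TRUE), nrow = 5,
               dimnames = list(paste0("ind", 1:5), paste0("loc", 1:8)))

# Sample half of the loci and swap 0 and 2 in those columns only.
# 2 - x maps 0 -> 2 and 2 -> 0, leaves 1 unchanged and keeps NA as NA.
flip <- sample(ncol(geno), size = ncol(geno) %/% 2)
recoded <- geno
recoded[, flip] <- 2 - geno[, flip]
```

Heterozygotes and missing values are untouched, so only the arbitrary choice of which homozygote is labelled 0 changes.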
Value
Returns a genlight object with half of the loci re-coded.
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
Examples
require("dartR.data")
res <- gl.random.snp(platypus.gl[1:5,1:5],verbose = 5)
Reads SNP data from a csv file into a genlight object
Description
This script takes SNP genotypes from a csv file, combines them with individual and locus metrics and creates a genlight object.
Usage
gl.read.csv(
  filename,
  transpose = FALSE,
  ind.metafile = NULL,
  loc.metafile = NULL,
  verbose = NULL
)
Arguments
filename | Name of the csv file containing the SNP genotypes [required]. |
transpose | If TRUE, rows are loci and columns are individuals [default FALSE]. |
ind.metafile | Name of the csv file containing the metrics for individuals [optional]. |
loc.metafile | Name of the csv file containing the metrics for loci [optional]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
The SNP data need to be in one of two forms. SNPs can be coded 0 for homozygous reference, 2 for homozygous alternate, 1 for heterozygous, and NA for missing values; or the SNP data can be coded A/A, A/C, C/T, G/A etc., and -/- for missing data. In this format, the reference allele is the most frequent allele, as used by DArT. Other formats will throw an error.
The SNP data need to be individuals as rows, labeled, and loci as columns, also labeled. If the orientation is individuals as columns and loci by rows, then set transpose=TRUE.
The individual metrics need to be in a csv file, with headings, with a mandatory id column corresponding exactly to the individual identity labels provided with the SNP data and in the same order.
The locus metadata needs to be in a csv file with headings, with a mandatory column headed AlleleID corresponding exactly to the locus identity labels provided with the SNP data and in the same order.
Note that the locus metadata will be complemented by calculable statistics corresponding to those that would be provided by Diversity Arrays Technology (e.g. CallRate).
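The relationship between the two accepted codings can be sketched in base R. The helper below is hypothetical and is not part of dartR; it assumes, as stated above, that the reference allele is the most frequent allele at the locus (ties are resolved alphabetically in this sketch) and that -/- marks missing data.

```r
# Hypothetical helper: convert one locus of "A/C"-style genotypes to 0/1/2.
# Assumption: the reference allele is the most frequent allele at the locus.
recode_locus <- function(g) {
  g[g == "-/-"] <- NA
  alleles <- strsplit(g[!is.na(g)], "/", fixed = TRUE)
  counts <- sort(table(unlist(alleles)), decreasing = TRUE)
  if (length(counts) > 2) stop("more than two alleles at this locus")
  ref <- names(counts)[1]
  # The code is the number of non-reference alleles in each genotype
  out <- rep(NA_integer_, length(g))
  out[!is.na(g)] <- vapply(alleles, function(a) sum(a != ref), integer(1))
  out
}

recode_locus(c("A/A", "A/C", "C/C", "A/A", "-/-"))
# 0 1 2 0 NA
```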
Value
A genlight object with the SNP data and associated metadata included.
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
Examples
csv_file <- system.file('extdata','platy_test.csv', package='dartR')
ind_metadata <- system.file('extdata','platy_ind.csv', package='dartR')
gl <- gl.read.csv(filename = csv_file, ind.metafile = ind_metadata)
Imports DArT data into dartR and converts it into a dartR genlight object
Description
This function is a wrapper function that allows you to convert your DArT file into a genlight object of class dartR.
Usage
gl.read.dart(
  filename,
  ind.metafile = NULL,
  recalc = TRUE,
  mono.rm = FALSE,
  nas = "-",
  topskip = NULL,
  lastmetric = NULL,
  covfilename = NULL,
  service.row = 1,
  plate.row = 3,
  probar = FALSE,
  verbose = NULL
)
Arguments
filename | File containing the SNP data (csv file) [required]. |
ind.metafile | File that contains additional information on individuals [required]. |
recalc | If TRUE, force the recalculation of locus metrics [default TRUE]. |
mono.rm | If TRUE, force the removal of monomorphic loci (including loci with all NAs) [default FALSE]. |
nas | A character specifying NAs [default '-']. |
topskip | A number specifying the number of initial rows to be skipped [default NULL]. |
lastmetric | Deprecated, specifies the last column of locus metadata. Can be specified as a column number [default NULL]. |
covfilename | Deprecated, see ind.metafile parameter [default NULL]. |
service.row | The row number in which the DArT service is contained [default 1]. |
plate.row | The row number in which the plate well is contained [default 3]. |
probar | Show progress bar [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, or as set by gl.set.verbosity]. |
Details
The function will determine automatically if the data are in Diversity Arrays one-row csv format or two-row csv format.
The first row of data is determined from the number of rows with an * in the first column. This can be alternatively specified with the topskip parameter.
The DArT service code is added to the ind.metrics of the genlight object. The row containing the service code for each individual can be specified with the service.row parameter.
The DArT plate well is added to the ind.metrics of the genlight object. The row containing the plate well for each individual can be specified with the plate.row parameter.
If individuals have been deleted from the input file manually, then the locus metrics supplied by DArT will no longer be correct and some loci may be monomorphic. To accommodate this, set mono.rm and recalc to TRUE.
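How the number of rows to skip can be derived from the * markers is sketched below in base R. This is an illustration of the idea only and not the parsing code of gl.read.dart; the file content is made up and much smaller than a real DArT report.

```r
# Toy DArT-like file: rows starting with "*" precede the header row.
# The content is invented for illustration.
lines <- c("*,*,svc1,svc1",
           "*,*,plateA,plateA",
           "*,*,A1,A2",
           "AlleleID,CallRate,ind1,ind2",
           "loc1,1.0,0,1")
tmp <- tempfile(fileext = ".csv")
writeLines(lines, tmp)

# Count the rows with "*" in the first column and skip exactly those rows
first_col <- sub(",.*$", "", readLines(tmp))
topskip <- sum(first_col == "*")
dat <- read.csv(tmp, skip = topskip, header = TRUE)
```

In this toy file the service code sits in row 1 and the plate well in row 3, matching the defaults of service.row and plate.row.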
Value
A dartR genlight object that contains individual metrics [if data were provided] and locus metrics [from a DArT report].
Author(s)
Custodian: Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
See Also
Other dartR-base: gl.drop.ind(), gl.drop.loc(), gl.drop.pop(), gl.edit.recode.ind(), gl.edit.recode.pop(), gl.keep.loc(), gl.make.recode.ind(), gl.recode.ind(), gl.recode.pop(), gl.set.verbosity()
Examples
dartfile <- system.file('extdata','testset_SNPs_2Row.csv', package='dartR')
metadata <- system.file('extdata','testset_metadata.csv', package='dartR')
gl <- gl.read.dart(dartfile, ind.metafile = metadata, probar=TRUE)
Reads FASTA files and converts them to genlight object
Description
The following IUPAC Ambiguity Codes are taken as heterozygotes:
M is heterozygote for AC and CA
R is heterozygote for AG and GA
W is heterozygote for AT and TA
S is heterozygote for CG and GC
Y is heterozygote for CT and TC
K is heterozygote for GT and TG
The following IUPAC Ambiguity Codes are taken as missing data:
V
H
D
B
N
The function can deal with missing data in individuals, e.g. when FASTA files have different number of individuals due to missing data.
The allele with the highest frequency is taken as the reference allele.
SNPs with more than two alleles are skipped.
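The code tables above can be written down as a small lookup in base R. This sketch is illustrative and is not the implementation of gl.read.fasta; the helper classify_site is hypothetical, and characters outside the listed codes are simply labelled "other" because their treatment is not described here.

```r
# IUPAC two-base ambiguity codes treated as heterozygotes
het_codes <- c(M = "AC", R = "AG", W = "AT", S = "CG", Y = "CT", K = "GT")
# IUPAC codes treated as missing data
missing_codes <- c("V", "H", "D", "B", "N")

# Hypothetical helper: classify the characters found at one alignment site.
classify_site <- function(x) {
  x <- toupper(x)
  ifelse(x %in% names(het_codes), "heterozygote",
         ifelse(x %in% missing_codes, "missing",
                ifelse(x %in% c("A", "C", "G", "T"), "homozygote", "other")))
}

classify_site(c("A", "R", "N", "t", "Y"))
# "homozygote" "heterozygote" "missing" "homozygote" "heterozygote"
```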
Usage
gl.read.fasta(fasta_files, parallel = FALSE, n_cores = NULL, verbose = NULL)
Arguments
fasta_files | Fasta files to read [required]. |
parallel | A logical indicating whether multiple cores (if available) should be used for the computations (TRUE), or not (FALSE); requires the package parallel to be installed [default FALSE]. |
n_cores | If parallel is TRUE, the number of cores to be used in the computations; if NULL, then the maximum number of cores available on the computer is used [default NULL]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
Ambiguity characters are often used to code heterozygotes. However, coding heterozygotes as ambiguity characters may bias many estimates. See more information in the link below: https://evodify.com/heterozygotes-ambiguity-characters/
Value
A genlight object.
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
Examples
# Folder where the fasta files are located.
folder_samples <- system.file('extdata', package ='dartR')
# listing the FASTA files, including their path. Files have an extension
# that contains "fas".
file_names <- list.files(path = folder_samples, pattern = "*.fas",
  full.names = TRUE)
# reading fasta files
obj <- gl.read.fasta(file_names)
Imports presence/absence data from SilicoDArT to genlight {adegenet} format (ploidy=1)
Description
DArT provides the data as a matrix of entities (individual animals) across the top and attributes (P/A of sequenced fragment) down the side in a format that is unique to DArT. This program reads the data into adegenet format for consistency with other programming activity. The script may require modification as DArT modifies its data formats from time to time.
Usage
gl.read.silicodart(
  filename,
  ind.metafile = NULL,
  nas = "-",
  topskip = NULL,
  lastmetric = "Reproducibility",
  probar = TRUE,
  verbose = NULL
)
Arguments
filename | Name of csv file containing the SilicoDArT data [required]. |
ind.metafile | Name of csv file containing metadata assigned to each entity (individual) [default NULL]. |
nas | Missing data character [default '-']. |
topskip | Number of rows to skip before the header row (containing the specimen identities) [optional]. |
lastmetric | Specifies the last non-genetic column. Be sure to check that the default is correct for your file, otherwise the number of individuals will not match. You can also specify the last column by a number [default "Reproducibility"]. |
probar | Show progress bar [default TRUE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, or as set by gl.set.verbosity]. |
Details
gl.read.silicodart() opens the data file (csv comma delimited) and skips the first n=topskip lines. The script assumes that the next line contains the entity labels (specimen ids) followed immediately by the SNP data for the first locus.
It reads the presence/absence data into a matrix of 1s and 0s, and inputs the locus metadata and specimen metadata. The locus metadata comprises a series of columns of values for each locus including the essential columns of CloneID and the desirable variables Reproducibility and PIC. Refer to documentation provided by DArT for an explanation of these columns.
The specimen metadata provides the opportunity to reassign specimens to populations, and to add other data relevant to the specimen. The key variables are id (specimen identity which must be the same and in the same order as the SilicoDArT file, each unique), pop (population assignment), lat (latitude, optional) and lon (longitude, optional). id, pop, lat, lon are the column headers in the csv file. Other optional columns can be added.
The data matrix, locus names (forced to be unique), locus metadata, specimen names, specimen metadata are combined into a genlight object. Refer to the documentation for {adegenet} for further details.
Value
An object of class genlight with ploidy set to 1, containing the presence/absence data, and locus and individual metadata.
Author(s)
Custodian: Bernd Gruber – Post to https://groups.google.com/d/forum/dartr
See Also
Examples
silicodartfile <- system.file('extdata','testset_SilicoDArT.csv', package='dartR')
metadata <- system.file('extdata','testset_metadata_silicodart.csv', package='dartR')
testset.gs <- gl.read.silicodart(filename = silicodartfile, ind.metafile = metadata)
Converts a vcf file into a genlight object
Description
This function needs package vcfR, please install it.
Usage
gl.read.vcf(vcffile, ind.metafile = NULL, verbose = NULL)
Arguments
vcffile | A vcf file (works only for diploid data) [required]. |
ind.metafile | Optional file in csv format with metadata for each individual (see Details for explanation) [default NULL]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
The individual metadata file (ind.metafile) needs to have very specific headings. The first heading must be id, and the ids have to match the ids in the dartR object. The following column headings are optional. pop: specifies the population membership of each individual. lat and lon specify spatial coordinates (in decimal degrees, WGS1984 format). Additional columns with individual metadata can be imported (e.g. age, gender).
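A metadata file with the headings described above could be prepared along the following lines. This is a base R sketch with made-up individuals and coordinates; the object names ind_meta and meta_file are hypothetical, and the resulting path would be passed on as ind.metafile.

```r
# Made-up individuals; only the id column is mandatory.
ind_meta <- data.frame(
  id  = c("ind1", "ind2", "ind3"),
  pop = c("north", "north", "south"),
  lat = c(-35.28, -35.31, -37.81),   # decimal degrees
  lon = c(149.13, 149.10, 144.96),
  sex = c("F", "M", "F")             # additional, optional column
)
meta_file <- tempfile(fileext = ".csv")
write.csv(ind_meta, meta_file, row.names = FALSE)

# Check the headings and the uniqueness of the ids before using the file
check <- read.csv(meta_file)
stopifnot("id" %in% names(check), !anyDuplicated(check$id))
```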
Value
A genlight object.
Author(s)
Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
Examples
## Not run:
obj <- gl.read.vcf(system.file('extdata/test.vcf', package='dartR'))
## End(Not run)
Assigns an individual metric as pop in a genlight {adegenet} object
Description
Individuals are assigned to populations based on the individual/sample/specimen metrics file (csv) used with gl.read.dart().
One might want to define the population structure in accordance with another classification, such as using an individual metric (e.g. sex, male or female). This script discards the current population assignments and replaces them with new population assignments defined by a specified individual metric.
The script returns a genlight object with the new population assignments. Note that the original population assignments are lost.
Usage
gl.reassign.pop(x, as.pop, verbose = NULL)
Arguments
x | Name of the genlight object containing SNP genotypes [required]. |
as.pop | Specify the name of the individual metric to set as the pop variable [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
A genlight object with the reassigned populations.
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Examples
# SNP data
popNames(testset.gl)
gl <- gl.reassign.pop(testset.gl, as.pop='sex',verbose=3)
popNames(gl)
# Tag P/A data
popNames(testset.gs)
gs <- gl.reassign.pop(testset.gs, as.pop='sex',verbose=3)
popNames(gs)
Recalculates locus metrics when individuals or populations are deleted from a genlight {adegenet} object
Description
When individuals, or populations, are deleted from a genlight object, the locus metrics no longer apply. For example, the Call Rate may be different considering the subset of individuals, compared with the full set. This script recalculates those affected locus metrics, namely, avgPIC, CallRate, freqHets, freqHomRef, freqHomSnp, OneRatioRef, OneRatioSnp, PICRef and PICSnp. Metrics that remain unaltered are RepAvg and TrimmedSeq as they are unaffected by the removal of individuals.
Usage
gl.recalc.metrics(x, mono.rm = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object containing SNP genotypes [required]. |
mono.rm | If TRUE, removes monomorphic loci [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
The script optionally removes resultant monomorphic loci or loci with all values missing (using gl.filter.monomorphs).
The script returns a genlight object with the recalculated locus metadata.
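Why a metric such as the call rate has to be recalculated after individuals are removed can be seen on a toy genotype matrix. The base R sketch below is illustrative only; it is not the code of gl.recalc.metrics, and the helper call_rate and the data are made up.

```r
# Toy genotypes: 4 individuals x 3 loci, NA marks a missing call.
geno <- matrix(c(0, 1, NA, 2,     # loc1
                 NA, NA, 1, 0,    # loc2
                 2, 2, 2, 2),     # loc3
               nrow = 4,
               dimnames = list(paste0("ind", 1:4), paste0("loc", 1:3)))

# Call rate per locus: proportion of individuals with a non-missing call
call_rate <- function(g) colMeans(!is.na(g))

cr_full <- call_rate(geno)                   # all four individuals
kept <- geno[c("ind1", "ind2", "ind4"), ]    # ind3 removed
cr_kept <- call_rate(kept)

# Loci that are monomorphic (or entirely missing) among the kept individuals
monomorphic <- apply(kept, 2, function(x) length(unique(x[!is.na(x)])) <= 1)
```

In this toy data the call rate of loc1 rises and that of loc2 falls once ind3 is removed, and loc2 becomes monomorphic, which is the situation mono.rm is meant to handle.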
Value
A genlight object with the recalculated locus metadata.
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
See Also
Examples
gl <- gl.recalc.metrics(testset.gl, verbose=2)
Recodes individual (=specimen = sample) labels in a genlight object
Description
This function recodes individual labels and/or deletes individuals from a DArT genlight SNP file based on a lookup table provided as a csv file.
Usage
gl.recode.ind(x, ind.recode, recalc = FALSE, mono.rm = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object [required]. |
ind.recode | Name of the csv file containing the individual relabelling [required]. |
recalc | If TRUE, recalculate the locus metadata statistics if any individuals are deleted in the filtering [default FALSE]. |
mono.rm | If TRUE, remove monomorphic loci [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
Renaming individuals may be required when there have been errors in labeling arising in the process from sample to sequence files. There may be occasions where renaming individuals is required for preparation of figures. When caution needs to be exercised because of the potential for breaking the 'chain of evidence' associated with the samples, recoding individuals using a recode table (csv) can provide a durable record of the changes.
The function works with genlight objects containing SNP genotypes and Tag P/A data (SilicoDArT).
For SNP genotype data, the function, having deleted individuals, optionally identifies resultant monomorphic loci or loci with all values missing and deletes them. The script also optionally recalculates the locus metadata as appropriate. The optional deletion of monomorphic loci and the optional recalculation of locus statistics is not available for Tag P/A data (SilicoDArT).
The script returns a dartR genlight object with the new individual names and the recalculated locus metadata.
Value
A genlight or genind object with the recoded and reduced data.
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
gl.filter.monomorphs for filtering monomorphs, gl.recalc.metrics for recalculating locus metrics, gl.recode.pop for recoding populations
Other dartR-base: gl.drop.ind(), gl.drop.loc(), gl.drop.pop(), gl.edit.recode.ind(), gl.edit.recode.pop(), gl.keep.loc(), gl.make.recode.ind(), gl.read.dart(), gl.recode.pop(), gl.set.verbosity()
Examples
file <- system.file('extdata','testset_ind_recode.csv', package='dartR')
gl <- gl.recode.ind(testset.gl, ind.recode=file, verbose=3)
Recodes population assignments in a genlight object
Description
This function recodes population assignments and/or deletes populations from a DArT genlight object based on information provided in a csv population recode file.
Usage
gl.recode.pop(x, pop.recode, recalc = FALSE, mono.rm = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object [required]. |
pop.recode | Name of the csv file containing the population reassignments [required]. |
recalc | If TRUE, recalculates the locus metadata statistics if any individuals are deleted in the filtering [default FALSE]. |
mono.rm | If TRUE, removes monomorphic loci [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
Individuals are assigned to populations based on the specimen metadata data file (csv) used with gl.read.dart(). Recoding can be used to amalgamate populations or to selectively delete or retain populations.
When caution needs to be exercised because of the potential for breaking the 'chain of evidence' associated with the samples, recoding individuals using a recode table (csv) can provide a durable record of the changes.
The population recode file contains a list of populations taken from the genlight object as the first column of the csv file, and the new population assignments are located in the second column of the csv file. The keyword 'Delete' used as a new population assignment will result in the associated specimens being dropped from the dataset.
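The recode table described above can be illustrated in base R, independently of the package. This is only a sketch of the lookup logic: the population names and the in-memory table are hypothetical stand-ins for a two-column csv file, and gl.recode.pop itself may handle the details differently.

```r
# Hypothetical population labels for six individuals
pops <- c("North", "North", "South", "East", "East", "West")

# Hypothetical recode table: old name in column 1, new name in column 2
recode <- data.frame(
  old = c("North", "South", "East", "West"),
  new = c("Mainland", "Mainland", "Island", "Delete"),
  stringsAsFactors = FALSE
)

# Look up the new label for each individual
new_pops <- recode$new[match(pops, recode$old)]

# Individuals whose new label is the keyword 'Delete' are dropped
keep <- new_pops != "Delete"
new_pops <- new_pops[keep]

table(new_pops)
```

Here 'North' and 'South' are amalgamated, 'East' is renamed and 'West' is removed, which covers the three uses of recoding mentioned above.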
The function works with genlight objects containing SNP genotypes and Tag P/A data (SilicoDArT).
For SNP genotype data, the function, having deleted populations, optionally identifies resultant monomorphic loci or loci with all values missing and deletes them. The script also optionally recalculates the locus metadata as appropriate. The optional deletion of monomorphic loci and the optional recalculation of locus statistics are not available for Tag P/A data (SilicoDArT).
Value
A genlight object with the recoded and reduced data.
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Other dartR-base: gl.drop.ind(), gl.drop.loc(), gl.drop.pop(), gl.edit.recode.ind(), gl.edit.recode.pop(), gl.keep.loc(), gl.make.recode.ind(), gl.read.dart(), gl.recode.ind(), gl.set.verbosity()
Examples
mfile <- system.file('extdata', 'testset_pop_recode.csv', package='dartR')
nPop(testset.gl)
gl <- gl.recode.pop(testset.gl, pop.recode=mfile, verbose=3)
Renames a population in a genlight object
Description
Individuals are assigned to populations based on the specimen metadata data file (csv) used with gl.read.dart().
This script renames a nominated population.
The script returns a genlight object with the new population name.
Usage
gl.rename.pop(x, old = NULL, new = NULL, verbose = NULL)
Arguments
x | Name of the genlight object containing SNP genotypes [required]. |
old | Name of population to be changed [required]. |
new | New name for the population [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
A genlight object with the new population name.
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Examples
gl <- gl.rename.pop(testset.gl, old='EmsubRopeMata', new='Outgroup')
Reports summary of base pair frequencies
Description
This script calculates the frequencies of the four DNA nucleotide bases: adenine (A), cytosine (C), guanine (G) and thymine (T), and the frequency of transitions (Ts) and transversions (Tv) in a DArT genlight object.
Usage
gl.report.bases(x, plot.out = TRUE, plot_theme = theme_dartR(), plot_colors = two_colors, save2tmp = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required]. |
plot.out | If TRUE, histograms of base composition are produced [default TRUE]. |
plot_theme | Theme for the plot. See Details for options [default theme_dartR()]. |
plot_colors | List of two color names for the borders and fill of the plots [default two_colors]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]. |
Details
The script checks first if trimmed sequences are included in the locus metadata (@other$loc.metrics$TrimmedSequence), and if so, tallies up the numbers of A, T, G and C bases. Only the reference state at the SNP locus is counted. Counts of transitions (Ts) and transversions (Tv) assume that there is no directionality, that is C->T is the same as T->C, because the reference state is arbitrary.
For presence/absence data (SilicoDArT), it is not possible to count transversions or transitions or the transversions/transitions ratio because the SNP data is not available, only a single sequence tag.
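The tallying described above can be sketched in base R. The sequences and SNP states below are hypothetical, and the "ref>alt" notation is an assumption made for the illustration; this is not the code used by gl.report.bases.

```r
# Hypothetical trimmed sequences (reference state), one per locus
seqs <- c("TGCAGAAC", "TGCAGGGT", "TGCAGCTA")

# Tally A, C, G and T over all sequences
bases  <- unlist(strsplit(seqs, ""))
counts <- table(factor(bases, levels = c("A", "C", "G", "T")))
freqs  <- counts / sum(counts)

# Hypothetical SNP states written as "ref>alt"
snps <- c("C>T", "T>C", "A>G", "A>C", "G>T")

# A transition stays within purines (A, G) or within pyrimidines (C, T);
# direction is ignored, so C>T and T>C are treated alike
is_transition <- function(snp) {
  alleles <- strsplit(snp, ">")[[1]]
  sum(alleles %in% c("A", "G")) != 1
}

n_ts <- sum(vapply(snps, is_transition, logical(1)))
n_tv <- length(snps) - n_ts
```

Because a substitution is classified only by whether both alleles fall in the same chemical class, swapping reference and alternate states does not change the Ts and Tv counts.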
Examples of other themes that can be used can be consulted in
Value
The unchanged genlight object
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Other report functions: gl.report.callrate(), gl.report.diversity(), gl.report.hamming(), gl.report.hwe(), gl.report.ld.map(), gl.report.locmetric(), gl.report.maf(), gl.report.monomorphs(), gl.report.overshoot(), gl.report.pa(), gl.report.parent.offspring(), gl.report.rdepth(), gl.report.replicates(), gl.report.reproducibility(), gl.report.secondaries(), gl.report.sexlinked(), gl.report.taglength()
Examples
# SNP data
out <- gl.report.bases(testset.gl)
# Tag P/A data
out <- gl.report.bases(testset.gs)
Reports summary of Call Rate for loci or individuals
Description
SNP datasets generated by DArT have missing values primarily arising from failure to call a SNP because of a mutation at one or both of the restriction enzyme recognition sites. P/A datasets (SilicoDArT) have missing values because it was not possible to call whether a sequence tag was amplified or not. This function tabulates the number of missing values as quantiles.
Usage
gl.report.callrate(x, method = "loc", by_pop = FALSE, plot.out = TRUE, plot_theme = theme_dartR(), plot_colors = two_colors, bins = 50, save2tmp = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required]. |
method | Specify the type of report by locus (method='loc') or individual (method='ind') [default 'loc']. |
by_pop | Whether to report by population [default FALSE]. |
plot.out | Specify if plot is to be produced [default TRUE]. |
plot_theme | User specified theme [default theme_dartR()]. |
plot_colors | Vector with two color names for the borders and fill [default two_colors]. |
bins | Number of bins to display in histograms [default 50]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
This function expects a genlight object, containing either SNP data or SilicoDArT (=presence/absence data).
Callrate is summarized by locus or by individual to allow sensible decisions on thresholds for filtering taking into consideration consequential loss of data. The summary is in the form of a tabulation and plots.
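As a rough illustration of what is being summarised, call rate can be computed in base R from a matrix with missing values. The matrix below is hypothetical (rows are individuals, columns are loci), and the sketch does not reproduce the function's own tabulation or plots.

```r
# Hypothetical genotype matrix: rows are individuals, columns are loci;
# 0, 1, 2 are genotype codes and NA is a missing call
geno <- matrix(
  c(0, 1, NA, 2,
    1, NA, NA, 0,
    2, 1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("ind", 1:3), paste0("loc", 1:4))
)

# Call rate = proportion of non-missing values
callrate_loc <- colMeans(!is.na(geno))   # by locus
callrate_ind <- rowMeans(!is.na(geno))   # by individual

# Tabulate the locus call rates as quantiles
quantile(callrate_loc, probs = c(0, 0.25, 0.5, 0.75, 1))
```

Comparing the two vectors shows the trade-off mentioned above: a strict threshold by locus would discard loc3, while a strict threshold by individual would discard ind2.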
Plot themes can be obtained from:
Resultant ggplots and the tabulation are saved to the session's temporary directory.
Value
Returns unaltered genlight object
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Other report functions: gl.report.bases(), gl.report.diversity(), gl.report.hamming(), gl.report.hwe(), gl.report.ld.map(), gl.report.locmetric(), gl.report.maf(), gl.report.monomorphs(), gl.report.overshoot(), gl.report.pa(), gl.report.parent.offspring(), gl.report.rdepth(), gl.report.replicates(), gl.report.reproducibility(), gl.report.secondaries(), gl.report.sexlinked(), gl.report.taglength()
Examples
# SNP data
test.gl <- testset.gl[1:20,]
gl.report.callrate(test.gl)
gl.report.callrate(test.gl,method='ind')
# Tag P/A data
test.gs <- testset.gs[1:20,]
gl.report.callrate(test.gs)
gl.report.callrate(test.gs,method='ind')
test.gl <- testset.gl[1:20,]
gl.report.callrate(test.gl)
Calculates diversity indexes for SNPs
Description
This script takes a genlight object and calculates alpha and beta diversity for q = 0:2. Formulas are taken from Sherwin et al. 2017. The paper describes nicely the relationship between the different q levels and how they relate to population genetic processes such as dispersal and selection.
Usage
gl.report.diversity(x, plot.out = TRUE, pbar = TRUE, table = "DH", plot_theme = theme_dartR(), plot_colors = discrete_palette, save2tmp = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required]. |
plot.out | Specify if plot is to be produced [default TRUE]. |
pbar | Report on progress. Silent if set to FALSE [default TRUE]. |
table | Prints a tabular output to the console either 'D'=D values, or 'H'=H values or 'DH','HD'=both or 'N'=no table. [default 'DH']. |
plot_theme | Theme for the plot. See Details for options [default theme_dartR()]. |
plot_colors | A color palette or a list with as many colors as there are populations in the dataset [default discrete_palette]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]. |
Details
For all indexes, the entropies (H) and corresponding effective numbers, i.e. Hill numbers (D), which reflect the number of needed entities to get the observed values, are calculated. In a nutshell, the alpha indexes between the different q-values should be similar if there is no deviation from expected allele frequencies and occurrences (e.g. all loci in HWE & equilibrium). If there is a deviation of an index, this links to a process causing it, such as dispersal, selection or strong drift. For a detailed explanation of all the indexes, we recommend resorting to the literature provided below. Confidence intervals are +/- 1 standard deviation.
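To make the relationship between entropies (H) and Hill numbers (D) concrete, the following base R sketch evaluates them for a single locus with hypothetical allele frequencies. It uses the commonly stated single-locus definitions for q = 0, 1 and 2 (allele richness, Shannon entropy and heterozygosity); treat these as an assumption for illustration, as the estimators used inside gl.report.diversity may differ.

```r
# Hypothetical allele frequencies at one biallelic locus
p <- c(0.7, 0.3)

# q = 0: allele richness
D0 <- sum(p > 0)
H0 <- D0 - 1

# q = 1: Shannon entropy and its exponential
H1 <- -sum(p[p > 0] * log(p[p > 0]))
D1 <- exp(H1)

# q = 2: heterozygosity (Gini-Simpson) and its inverse-Simpson transform
H2 <- 1 - sum(p^2)
D2 <- 1 / (1 - H2)

round(c(D0 = D0, D1 = D1, D2 = D2), 3)
```

With equal allele frequencies all three Hill numbers would equal 2; the more uneven the frequencies, the further D1 and D2 fall below D0.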
Function's output
If the function's parameter "table" = "DH" (the default value) is used, the output of the function is 20 tables.
The first two show the number of loci used. The name of each of the rest of the tables starts with three terms separated by underscores.
The first term refers to the q value (0 to 2).
The second term refers to whether it is the diversity measure (H) or its transformation to Hill numbers (D).
The third term refers to whether the diversity is calculated within populations (alpha) or between populations (beta).
In the case of alpha diversity tables, standard deviations have their own table, which finishes with a fourth term: "sd".
In the case of beta diversity tables, standard deviations are in the upper triangle of the matrix and diversity values are in the lower triangle of the matrix.
Plots are saved to the temporary directory (tempdir) and can be accessed with the function gl.print.reports and listed with the function gl.list.reports. Note that they can be accessed only in the current R session because tempdir is cleared each time that the R session is closed.
Examples of other themes that can be used can be consulted in
Value
A list of entropy indexes for each level of q and equivalent numbers for alpha and beta diversity.
Author(s)
Bernd Gruber (Post to https://groups.google.com/d/forum/dartr), Contributors: William B. Sherwin, Alexander Sentinella
References
Sherwin, W.B., Chao, A., Jost, L., Smouse, P.E. (2017). Information Theory Broadens the Spectrum of Molecular Ecology and Evolution. TREE 32(12) 948-963. doi:10.1016/j.tree.2017.09.12
See Also
Other report functions: gl.report.bases(), gl.report.callrate(), gl.report.hamming(), gl.report.hwe(), gl.report.ld.map(), gl.report.locmetric(), gl.report.maf(), gl.report.monomorphs(), gl.report.overshoot(), gl.report.pa(), gl.report.parent.offspring(), gl.report.rdepth(), gl.report.replicates(), gl.report.reproducibility(), gl.report.secondaries(), gl.report.sexlinked(), gl.report.taglength()
Examples
div <- gl.report.diversity(bandicoot.gl[1:10,1:100], table = FALSE, pbar=FALSE)
div$zero_H_alpha
div$two_H_beta
names(div)
Reports various statistics of genetic differentiation between populations with confidence intervals
Description
This function calculates four statistics of genetic differentiation between populations (see the "Details" section for further information).
Fst - Measure of the degree of genetic differentiation of subpopulations (Nei, 1987).
Fstp - Unbiased (i.e. corrected for sampling error, see explanation below) Fst (Nei, 1987).
Dest - Jost’s D (Jost, 2008).
Gst_H - Gst standardized by the maximum level that it can obtain for the observed amount of genetic variation (Hedrick 2005).
Sampling errors arise because allele frequencies in our samples differ from those in the subpopulations from which they were taken (Holsinger, 2012).
Confidence intervals are obtained using bootstrapping.
Usage
gl.report.fstat(x, nboots = 0, conf = 0.95, CI.type = "bca", ncpus = 1, plot.stat = "Fstp", plot.display = TRUE, palette.divergent = gl.colors("div"), font.size = 0.5, plot.dir = NULL, plot.file = NULL, verbose = NULL, ...)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
nboots | Number of bootstrap replicates to obtain confidence intervals [default 0]. |
conf | The confidence level of the required interval [default 0.95]. |
CI.type | Method to estimate confidence intervals. One of "norm", "basic", "perc" or "bca" [default "bca"]. |
ncpus | Number of processes to be used in parallel operation. If ncpus > 1 parallel operation is activated, see "Details" section [default 1]. |
plot.stat | Statistic to plot. One of "Fst", "Fstp", "Dest" or "Gst_H" [default "Fstp"]. |
plot.display | If TRUE, a heatmap of the pairwise statistic chosen is displayed in the plot window [default TRUE]. |
palette.divergent | A color palette function for the heatmap plot [default gl.colors("div")]. |
font.size | Size of font for the labels of horizontal and vertical axes of the heatmap [default 0.5]. |
plot.dir | Directory in which to save files [default working directory]. |
plot.file | Name for the RDS binary file to save (base name only, exclude extension) [default NULL]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]. |
... | Parameters passed to function heatmap.2 (package gplots). |
Details
Even though Fst and its relatives can predict evolutionary processes (Holsinger & Weir, 2009), they are not true measures of genetic differentiation in the sense that they are dependent on the diversity within populations (Meirmans & Hedrick, 2011), the number of populations analysed (Alcala & Rosenberg, 2017) and are not monotonic (Sherwin et al., 2017). Recent approaches have been developed to accommodate these mathematical restrictions (G'ST; "Gst_H"; Hedrick, 2005, and Jost's D; "Dest"; Jost, 2008). More recently, novel approaches based on information theory (Mutual Information; Sherwin et al., 2017) and allele frequencies (Allele Frequency Difference; Berner, 2019) have distinct properties that make them valuable resources to interpret genetic differentiation between populations.
Note that each measure of genetic differentiation has advantages and drawbacks, and the decision of using a particular measure is usually based on the research question.
Statistics calculated
The equations used to calculate the statistics are shown below.
Ho - Unbiased estimate of observed heterozygosity across subpopulations (Nei, 1987, pp. 164, eq. 7.38) is calculated as:
H_o = 1 - \sum_{k=1}^{s} \sum_{i} P_{kii} / s
where P_{kii} represents the proportion of homozygote ii for allele i in individual k and s represents the number of subpopulations.
Hs - Unbiased estimate of the expected heterozygosity under Hardy-Weinberg equilibrium across subpopulations (Nei, 1987, pp. 164, eq. 7.39) is calculated as:
H_s = \frac{\tilde{n}}{\tilde{n} - 1} \left[ 1 - \sum_{i} \overline{p_i^2} - \frac{H_o}{2\tilde{n}} \right]
where \tilde{n} is the harmonic mean of n_k (the number of individuals in each subpopulation), p_{ki} is the proportion (sometimes misleadingly called frequency) of allele i in subpopulation k, and \overline{p_i^2} = \sum_{k} p_{ki}^2 / s.
Ht - Heterozygosity for the total population (Nei, 1987, pp. 164, eq. 7.40) is calculated as:
H_t = 1 - \sum_{i} \bar{p}_i^2 + \frac{H_s}{\tilde{n} s} - \frac{H_o}{2 \tilde{n} s}, with \bar{p}_i = \sum_{k} p_{ki} / s
Dst - The average allele frequency differentiation between populations (Nei, 1987, pp. 163) is calculated as:
D_{st} = H_t - H_s
Htp - Unbiased estimate of heterozygosity for the total population (Nei, 1987, pp. 165) is calculated as:
H'_t = H_s + D'_{st}
Dstp - Unbiased estimate of the average allele frequency differentiation between populations (Nei, 1987, pp. 165)
Fst - Measure of the extent of genetic differentiation of subpopulations (Nei, 1987, pp. 162, eq. 7.34) is calculated as:
F_{st} = D_{st} / H_t
Fstp - Unbiased measure of the extent of genetic differentiation of subpopulations (Nei, 1987, pp. 163, eq. 7.36) is calculated as:
F'_{st} = D'_{st} / H'_t
Dest - Jost’s D (Jost, 2008, eq. 12)
Gst-max - The maximum level that Gst can obtain for the observed amount of genetic variation (Hedrick 2005, eq. 4a) is calculated as:
G_{st(max)} = \frac{(s - 1)(1 - H_s)}{s - 1 + H_s}
Gst-H - Gst standardized by the maximum level that it can obtain for the observed amount of genetic variation (Hedrick 2005, eq. 4b) is calculated as:
G_{st(H)} = G_{st} / G_{st(max)}
Confidence Intervals
The uncertainty of a parameter, in this case the mean of the statistic, can be summarised by a confidence interval (CI) which includes the true parameter value with a specified probability (i.e. confidence level; the parameter "conf" in this function).
In this function, CIs are obtained using the bootstrap, an inference method that resamples the data (i.e. loci) with replacement and recalculates the statistics each time.
This function uses the function boot (package boot) to perform the bootstrap replicates and the function boot.ci (package boot) to perform the calculations for the CI.
Four different types of nonparametric CI can be calculated (parameter "CI.type" in this function):
First order normal approximation interval ("norm").
Basic bootstrap interval ("basic").
Bootstrap percentile interval ("perc").
Adjusted bootstrap percentile interval ("bca").
The studentized bootstrap interval ("stud") was not included in the CI types because it is computationally intensive, it may produce estimates outside the range of plausible values and it has been found to be erratic in practice, see for example the "Studentized (t) Intervals" section in:
www.r-bloggers.com/2019/09/understanding-bootstrap-confidence-interval-output-from-the-r-boot-package
Nice tutorials about the different types of CI can be found at:
https://www.datacamp.com/tutorial/bootstrap-r
Efron and Tibshirani (1993, p. 162) and Davison and Hinkley (1997, p. 194) suggest that the number of bootstrap replicates should be between 1000 and 2000.
It is important to note that unreliable confidence intervals will be obtained if too few bootstrap replicates are used. Therefore, the function boot.ci will throw warnings and errors if bootstrap replicates are too few. Consider increasing the number of bootstrap replicates to at least 200.
The "bca" interval is often cited as the best for theoretical reasons, however it may produce unstable results if the bootstrap distribution is skewed or has extreme values. For example, you might get the warning "extreme order statistics used as endpoints" or the error "estimated adjustment 'a' is NA". In this case, you may want to use more bootstrap replicates or a different method or check your data for outliers.
The error "estimated adjustment 'w' is infinite" means that the estimated adjustment ‘w’ for the "bca" interval is infinite, which can happen when the empirical influence values are zero or very close to zero. This can be caused by various reasons, such as:
The number of bootstrap replicates is too small, the statistic of interest is constant or nearly constant across the bootstrap samples, the data contains outliers or extreme values.
You can try some possible solutions, such as:
Increasing the number of bootstrap replicates, using a different type of bootstrap confidence interval or removing or transforming the outliers or extreme values.
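The idea of resampling loci with replacement can be sketched in base R. The example below computes a simple percentile interval by hand from hypothetical per-locus values; it is a simplified stand-in and not the boot and boot.ci procedure used by the function, whose "perc" and "bca" intervals are computed differently in detail.

```r
set.seed(42)

# Hypothetical per-locus values of a statistic (for example, per-locus Fst)
stat_by_locus <- runif(200, min = 0, max = 0.3)

nboots <- 1000   # number of bootstrap replicates
conf   <- 0.95   # confidence level

# Resample loci with replacement and recompute the mean each time
boot_means <- replicate(
  nboots,
  mean(sample(stat_by_locus, replace = TRUE))
)

# Percentile interval: empirical quantiles of the bootstrap distribution
alpha <- 1 - conf
ci <- quantile(boot_means, probs = c(alpha / 2, 1 - alpha / 2))
ci
```

Rerunning the sketch with a much smaller nboots shows why few replicates give unstable interval limits: the tail quantiles then rest on only a handful of resampled means.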
Plotting
The plot can be customised by including any parameter(s) from the function heatmap.2 (package gplots).
For the color palette you could try for example:
> library(viridis)
> res <- gl.report.fstat(platypus.gl, palette.divergent = viridis)
If a plot.file is given, the plot arising from this function is saved as an "RDS" binary file using the function saveRDS (package base); it can be reloaded with the function readRDS (package base). A file name must be specified for the plot to be saved.
If a plot directory (plot.dir) is specified, the gplot binary is saved to that directory; otherwise to the tempdir().
Your plot might not be shown in full because your 'Plots' pane is too small (in RStudio). Increase the size of the 'Plots' pane before running the function. Alternatively, use the parameter 'plot.file' to save the plot to a file.
Parallelisation
If the parameter ncpus > 1, parallelisation is enabled. In Windows, parallel computing employs a "socket" approach that starts new copies of R on each core. POSIX systems, on the other hand (Mac, Linux, Unix, and BSD), utilise a "forking" approach that replicates the whole current version of R and transfers it to a new core.
Opening and terminating R sessions in each core involves a significant amount of processing time, therefore parallelisation on Windows machines is only quicker than not using parallelisation when nboots > 1000-2000.
Value
Two lists, the first list contains matrices with genetic statistics taken pairwise by population, the second list contains tables with the genetic statistics for each pair of populations. If nboots > 0, the tables report the four statistics with Low Confidence Intervals (LCI) and High Confidence Intervals (HCI).
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
References
Alcala, N., & Rosenberg, N. A. (2017). Mathematical constraints on FST: Biallelic markers in arbitrarily many populations. Genetics (206), 1581-1600.
Berner, D. (2019). Allele frequency difference AFD–an intuitive alternative to FST for quantifying genetic population differentiation. Genes, 10(4), 308.
Davison AC, Hinkley DV (1997). Bootstrap Methods and their Application. Cambridge University Press: Cambridge.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics 7, 1–26.
Efron B, Tibshirani RJ (1993). An Introduction to the Bootstrap. Chapman and Hall: London.
Hedrick, P. W. (2005). A standardized genetic differentiation measure. Evolution, 59(8), 1633-1638.
Holsinger, K. E. (2012). Lecture notes in population genetics.
Holsinger, K. E., & Weir, B. S. (2009). Genetics in geographically structured populations: defining, estimating and interpreting FST. Nature Reviews Genetics, 10(9), 639-650.
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015-4026.
Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: FST and related measures. Molecular Ecology Resources, 11(1), 5-18.
Nei, M. (1987). Molecular evolutionary genetics. Columbia University Press.
Sherwin, W. B., Chao, A., Jost, L., & Smouse, P. E. (2017). Information theory broadens the spectrum of molecular ecology and evolution. Trends in Ecology & Evolution, 32(12), 948-963.
Examples
res <- gl.report.fstat(platypus.gl)
Calculates the pairwise Hamming distance between DArT trimmed DNA sequences
Description
Hamming distance is calculated as the number of base differences between two sequences which can be expressed as a count or a proportion. Typically, it is calculated between two sequences of equal length. In the context of DArT trimmed sequences, which differ in length but which are anchored to the left by the restriction enzyme recognition sequence, it is sensible to compare the two trimmed sequences starting from immediately after the common recognition sequence and terminating at the last base of the shorter sequence.
Usage
gl.report.hamming(x, rs = 5, threshold = 3, taglength = 69, plot.out = TRUE, plot_theme = theme_dartR(), plot_colors = two_colors, probar = FALSE, save2tmp = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
rs | Number of bases in the restriction enzyme recognition sequence [default 5]. |
threshold | Minimum acceptable base pair difference for display on the boxplot and histogram [default 3]. |
taglength | Typical length of the sequence tags [default 69]. |
plot.out | Specify if plot is to be produced [default TRUE]. |
plot_theme | Theme for the plot. See Details for options [default theme_dartR()]. |
plot_colors | List of two color names for the borders and fill of the plots [default two_colors]. |
probar | If TRUE, then a progress bar is displayed on long loops [default FALSE]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
The function gl.filter.hamming will filter out one of two loci if their Hamming distance is less than a specified percentage.
Hamming distance can be computed by exploiting the fact that the dot product of two binary vectors x and (1-y) counts the corresponding elements that are different between x and y. This approach can also be used for vectors that contain more than two possible values at each position (e.g. A, C, T or G).
If a pair of DNA sequences are of differing length, the longer is truncated.
The algorithm is that of Johann de Jong (https://johanndejong.wordpress.com/2015/10/02/faster-hamming-distance-in-r-2/) as implemented in utils.hamming.
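The dot-product idea can be sketched in base R as follows. The helper name hamming_dot and the two sequences are hypothetical, and this is a variant written for illustration rather than the code of utils.hamming; skipping the recognition sequence (parameter rs) is omitted for brevity.

```r
# Hamming distance between the columns of a character matrix,
# using one binary indicator matrix per symbol
hamming_dot <- function(X) {
  symbols <- unique(as.vector(X))
  H <- matrix(0, nrow = ncol(X), ncol = ncol(X))
  for (sym in symbols) {
    B <- (X == sym) * 1          # 1 where the symbol occurs, else 0
    H <- H + t(B) %*% (1 - B)    # positions where x has sym and y does not
  }
  H
}

# Hypothetical sequences of differing length; the longer is truncated
s1  <- "ACGTACGT"
s2  <- "ACGAACCTGG"
len <- min(nchar(s1), nchar(s2))
X <- cbind(
  strsplit(substr(s1, 1, len), "")[[1]],
  strsplit(substr(s2, 1, len), "")[[1]]
)

d <- hamming_dot(X)[1, 2]   # distance as a count of differing bases
d / len                     # the same distance as a proportion
```

Summing over symbols works because every mismatched position contributes exactly once, through the symbol carried by the first sequence at that position.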
Plots and table are saved to the session's temporary directory (tempdir)
Examples of other themes that can be used can be consulted in
Value
Returns unaltered genlight object
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Other report functions: gl.report.bases(), gl.report.callrate(), gl.report.diversity(), gl.report.hwe(), gl.report.ld.map(), gl.report.locmetric(), gl.report.maf(), gl.report.monomorphs(), gl.report.overshoot(), gl.report.pa(), gl.report.parent.offspring(), gl.report.rdepth(), gl.report.replicates(), gl.report.reproducibility(), gl.report.secondaries(), gl.report.sexlinked(), gl.report.taglength()
Examples
gl.report.hamming(testset.gl[,1:100])
gl.report.hamming(testset.gs[,1:100])
# SNP data
test <- platypus.gl
test <- gl.subsample.loci(platypus.gl,n=50)
result <- gl.filter.hamming(test, threshold=0.25, verbose=3)
Reports observed, expected and unbiased heterozygosities and FIS (inbreeding coefficient) by population or by individual from SNP data
Description
Calculates the observed, expected and unbiased expected (i.e. corrected for sample size) heterozygosities and FIS (inbreeding coefficient) for each population or the observed heterozygosity for each individual in a genlight object.
Usage
gl.report.heterozygosity(x, method = "pop", n.invariant = 0, nboots = 0, conf = 0.95, CI.type = "bca", ncpus = 1, plot.display = TRUE, plot.theme = theme_dartR(), plot.colors.pop = gl.colors("dis"), plot.colors.ind = gl.colors(2), error.bar = "SD", save2tmp = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
method | Calculate heterozygosity by population (method='pop') or by individual (method='ind') [default 'pop']. |
n.invariant | An estimate of the number of invariant sequence tags used to adjust the heterozygosity rate [default 0]. |
nboots | Number of bootstrap replicates to obtain confidence intervals [default 0]. |
conf | The confidence level of the required interval [default 0.95]. |
CI.type | Method to estimate confidence intervals. One of "norm", "basic", "perc" or "bca" [default "bca"]. |
ncpus | Number of processes to be used in parallel operation. If ncpus > 1 parallel operation is activated, see "Details" section [default 1]. |
plot.display | Specify if plot is to be produced [default TRUE]. |
plot.theme | Theme for the plot. See Details for options [default theme_dartR()]. |
plot.colors.pop | A color palette for population plots or a list with as many colors as there are populations in the dataset [default gl.colors("dis")]. |
plot.colors.ind | List of two color names for the borders and fill of the plot by individual [default gl.colors(2)]. |
error.bar | Statistic to be plotted as error bar, either "SD" (standard deviation), "SE" (standard error) or "CI" (confidence intervals) [default "SD"]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]. |
Details
Observed heterozygosity for a population takes the proportion of heterozygous loci for each individual then averages over the individuals in that population. The calculations take into account missing values. Expected heterozygosity for a population takes the expected proportion of heterozygotes, that is, expected under Hardy-Weinberg equilibrium, for each locus, then averages this across the loci for an average estimate for the population.
Expected heterozygosity is calculated using the correction for sample size following equation 2 from Nei 1978.
Observed heterozygosity for individuals is calculated as the proportion of loci that are heterozygous for that individual.
Finally, the number of loci that are invariant across all individuals in the dataset (that is, across populations) is typically unknown. This can render estimates of heterozygosity analysis specific, and so it is not valid to compare such estimates across species or even across different analyses. This is a similar problem to that faced by microsatellites. If you have an estimate of the number of invariant sequence tags (loci) in your data, such as provided by gl.report.secondaries, you can specify it with the n.invariant parameter to standardize your estimates of heterozygosity.
NOTE: It is important to realise that estimation of adjusted heterozygosity requires that secondaries not be removed.
Heterozygosities and FIS (inbreeding coefficient) are calculated by locus within each population using the following equations:
Observed heterozygosity (Ho) = number of heterozygotes / n_Ind, where n_Ind is the number of individuals without missing data.
Observed heterozygosity adjusted (Ho.adj) = Ho * n_Loc / (n_Loc + n.invariant), where n_Loc is the number of loci that do not have all missing data and n.invariant is an estimate of the number of invariant loci to adjust heterozygosity.
Expected heterozygosity (He) = 1 - (p^2 + q^2), where p is the frequency of the reference allele and q is the frequency of the alternative allele.
Expected heterozygosity adjusted (He.adj) = He * n_Loc / (n_Loc + n.invariant)
Unbiased expected heterozygosity (uHe) = He * (2 * n_Ind / (2 * n_Ind - 1))
Inbreeding coefficient (FIS) = 1 - (mean(Ho) / mean(uHe))
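The equations can be worked through in base R on a small hypothetical genotype matrix for one population. The sketch assumes genotypes coded as 0 (homozygous reference), 1 (heterozygous) and 2 (homozygous alternate), takes Ho as the proportion of heterozygous individuals at each locus, and uses a made-up value for n.invariant; it illustrates the arithmetic only and is not the function's own code.

```r
# Hypothetical genotypes for one population: rows are individuals,
# columns are loci; NA is a missing call
geno <- matrix(
  c(0, 1, 2,
    1, 1, 0,
    0, NA, 2,
    1, 0, 2),
  nrow = 4, byrow = TRUE
)

n_ind <- colSums(!is.na(geno))                     # individuals scored per locus
Ho    <- colSums(geno == 1, na.rm = TRUE) / n_ind  # observed heterozygosity
q     <- colSums(geno, na.rm = TRUE) / (2 * n_ind) # alternate allele frequency
p     <- 1 - q                                     # reference allele frequency
He    <- 1 - (p^2 + q^2)                           # expected heterozygosity
uHe   <- He * (2 * n_ind / (2 * n_ind - 1))        # corrected for sample size
FIS   <- 1 - mean(Ho) / mean(uHe)

# Adjustment for invariant loci (n_invariant is a hypothetical estimate)
n_loc       <- ncol(geno)
n_invariant <- 3
Ho_adj <- mean(Ho) * n_loc / (n_loc + n_invariant)
```

With as many invariant loci as scored loci, the adjusted observed heterozygosity is half the unadjusted mean, which shows how strongly the choice of n.invariant drives the adjusted values.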
Function's output
Output for method='pop' is an ordered barchart of observed heterozygosity, unbiased expected heterozygosity and FIS (inbreeding coefficient) across populations together with a table of mean observed and expected heterozygosities and FIS by population and their respective standard deviations (SD).
In the output, it is also reported by population: the number of loci used to estimate heterozygosity (n.Loc), the number of polymorphic loci (polyLoc), the number of monomorphic loci (monoLoc) and loci with all missing data (all_NALoc).
Output for method='ind' is a histogram and a boxplot of heterozygosity across individuals.
Plots and table are saved to the session temporary directory (tempdir).
Examples of other themes that can be used can be consulted in
Error bars
The best method for presenting or assessing genetic statistics depends on the type of data you have and the specific questions you're trying to answer. Here's a brief overview of when you might use each method:
1. Confidence Intervals ("CI"):
- Usage: Often used to convey the precision of an estimate.
- Advantage: Confidence intervals give a range in which the true parameter (like a population mean) is likely to fall, given the data and a specified probability (like 95%).
- In Context: For genetic statistics, if you're estimating a parameter, a 95% CI gives a range in which you can be 95% confident that the true parameter lies.
2. Standard Deviation ("SD"):
- Usage: Describes the amount of variation from the average in a set of data.
- Advantage: Allows for an understanding of the spread of individual data points around the mean.
- In Context: If you're looking at the distribution of a quantitative trait (like height) in a population with a particular genotype, the SD can describe how much individual heights vary around the average height.
3. Standard Error ("SE"):
- Usage: Describes the precision of the sample mean as an estimate of the population mean.
- Advantage: Smaller than the SD in large samples; it takes into account both the SD and the sample size.
- In Context: If you want to know how accurately your sample mean represents the population mean, you'd look at the SE.
Recommendation:
- If you're trying to convey the precision of an estimate, confidence intervals are very useful.
- For understanding variability within a sample, standard deviation is key.
- To see how well a sample mean might estimate a population mean, consider the standard error.
In practice, geneticists often use a combination of these methods to analyze and present their data, depending on their research questions and the nature of the data.
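As a small illustration of how the three error-bar options relate to each other, the sketch below computes SD, SE and a t-based 95% interval for a hypothetical vector of per-locus values. The vector x is made up, and the t-based interval is used here only for simplicity; it is not the bootstrap interval that this function computes.

```r
# Hypothetical per-locus heterozygosity values (assumption, for illustration).
x <- c(0.10, 0.20, 0.15, 0.30, 0.25)

n <- length(x)
m <- mean(x)
s <- sd(x)           # standard deviation: spread of the values
se <- s / sqrt(n)    # standard error: precision of the mean

# t-based confidence interval for the mean at the chosen confidence level.
conf <- 0.95
t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)
ci <- c(lower = m - t_crit * se, upper = m + t_crit * se)

print(c(mean = m, SD = s, SE = se, ci))
```

The SE shrinks as n grows while the SD does not, which is why the two kinds of error bars answer different questions.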
Confidence Intervals
The uncertainty of a parameter, in this case the mean of the statistic, can be summarised by a confidence interval (CI), which includes the true parameter value with a specified probability (i.e. confidence level; the parameter "conf" in this function).
In this function, CIs are obtained by bootstrapping, an inference method that resamples the data (i.e. loci) with replacement and recalculates the statistic for each resample.
This function uses the function boot (package boot) to perform the bootstrap replicates and the function boot.ci (package boot) to calculate the CI.
Four different types of nonparametric CI can be calculated (parameter "CI.type" in this function):
First order normal approximation interval ("norm").
Basic bootstrap interval ("basic").
Bootstrap percentile interval ("perc").
Adjusted bootstrap percentile interval ("bca").
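A minimal sketch of the bootstrap procedure described above, using boot and boot.ci from package boot on simulated data. The vector het and the helper mean_stat are hypothetical; this only shows the general pattern of resampling loci and is not the code used inside this function. The "bca" type is left out of the sketch to keep it short.

```r
library(boot)

set.seed(42)
# Simulated per-locus heterozygosity values (assumption, for illustration).
het <- runif(200, min = 0, max = 0.5)

# boot() expects a statistic taking the data and a vector of resampled indices.
mean_stat <- function(data, idx) mean(data[idx])

# Resample loci with replacement 1000 times.
b <- boot(data = het, statistic = mean_stat, R = 1000)

# Three of the nonparametric interval types listed above.
ci <- boot.ci(b, conf = 0.95, type = c("norm", "basic", "perc"))
print(ci)

# Lower and upper bounds of the percentile interval.
perc_bounds <- ci$percent[4:5]
```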
The studentized bootstrap interval ("stud") was not included in the CI types because it is computationally intensive, it may produce estimates outside the range of plausible values and it has been found to be erratic in practice; see for example the "Studentized (t) Intervals" section in:
www.r-bloggers.com/2019/09/understanding-bootstrap-confidence-interval-output-from-the-r-boot-package
Efron and Tibshirani (1993, p. 162) and Davison and Hinkley (1997, p. 194) suggest that the number of bootstrap replicates should be between 1000 and 2000.
It is important to note that unreliable confidence intervals will be obtained if too few bootstrap replicates are used. Therefore, the function boot.ci will throw warnings and errors if bootstrap replicates are too few. Consider increasing the number of bootstrap replicates to at least 200.
The "bca" interval is often cited as the best for theoretical reasons; however, it may produce unstable results if the bootstrap distribution is skewed or has extreme values. For example, you might get the warning "extreme order statistics used as endpoints" or the error "estimated adjustment 'a' is NA". In this case, you may want to use more bootstrap replicates or a different method, or check your data for outliers.
The error "estimated adjustment 'w' is infinite" means that the estimated adjustment 'w' for the "bca" interval is infinite, which can happen when the empirical influence values are zero or very close to zero. This can be caused by various reasons, such as:
The number of bootstrap replicates is too small, the statistic of interest is constant or nearly constant across the bootstrap samples, or the data contains outliers or extreme values.
You can try some possible solutions, such as:
Increasing the number of bootstrap replicates, using a different type of bootstrap confidence interval, or removing or transforming the outliers or extreme values.
Parallelisation
If the parameter ncpus > 1, parallelisation is enabled. In Windows, parallel computing employs a "socket" approach that starts new copies of R on each core. POSIX systems, on the other hand (Mac, Linux, Unix, and BSD), utilise a "forking" approach that replicates the whole current version of R and transfers it to a new core.
Opening and terminating R sessions in each core involves a significant amount of processing time, therefore parallelisation in Windows machines is only quicker than not using parallelisation when nboots > 1000-2000.
Value
A dataframe containing population labels, heterozygosities, FIS, their standard deviations and sample sizes
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
References
Nei, M. (1978). Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics, 89(3), 583-590.
See Also
Other unmatched report: gl.allele.freq()
Examples
require("dartR.data")
df <- gl.report.heterozygosity(platypus.gl)
df <- gl.report.heterozygosity(platypus.gl, method = 'ind')
n.inv <- gl.report.secondaries(platypus.gl)
gl.report.heterozygosity(platypus.gl, n.invariant = n.inv[7, 2])
df <- gl.report.heterozygosity(platypus.gl)
Reports departure from Hardy-Weinberg proportions
Description
Calculates the probabilities of agreement with H-W proportions based on observed frequencies of reference homozygotes, heterozygotes and alternate homozygotes.
Usage
gl.report.hwe(
  x,
  subset = "each",
  method_sig = "Exact",
  multi_comp = FALSE,
  multi_comp_method = "BY",
  alpha_val = 0.05,
  pvalue_type = "midp",
  cc_val = 0.5,
  sig_only = TRUE,
  min_sample_size = 5,
  plot.out = TRUE,
  plot_colors = two_colors_contrast,
  max_plots = 4,
  save2tmp = FALSE,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
subset | Way to group individuals to perform H-W tests. Either a vector with population names, 'each', 'all' (see details) [default 'each']. |
method_sig | Method for determining statistical significance: 'ChiSquare' or 'Exact' [default 'Exact']. |
multi_comp | Whether to adjust p-values for multiple comparisons [default FALSE]. |
multi_comp_method | Method to adjust p-values for multiple comparisons: 'holm', 'hochberg', 'hommel', 'bonferroni', 'BH', 'BY', 'fdr' (see details) [default 'BY']. |
alpha_val | Level of significance for testing [default 0.05]. |
pvalue_type | Type of p-value to be used in the Exact method. Either 'dost', 'selome', 'midp' (see details) [default 'midp']. |
cc_val | The continuity correction applied to the ChiSquare test [default 0.5]. |
sig_only | Whether the returned table should include only loci with a significant departure from Hardy-Weinberg proportions [default TRUE]. |
min_sample_size | Minimum number of individuals per population for which to perform H-W tests [default 5]. |
plot.out | If TRUE, will produce Ternary Plot(s) [default TRUE]. |
plot_colors | Vector with two color names for the significant and not-significant loci [default two_colors_contrast]. |
max_plots | Maximum number of plots to print per page [default 4]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]. |
Details
There are several factors that can cause deviations from Hardy-Weinberg proportions including: mutation, finite population size, selection, population structure, age structure, assortative mating, sex linkage, nonrandom sampling and genotyping errors. Therefore, testing for Hardy-Weinberg proportions should be a process that involves a careful evaluation of the results; a good place to start is Waples (2015).
Note that tests for H-W proportions are only valid if there is no population substructure (assuming random mating) and have sufficient power only when there is sufficient sample size (n individuals > 15).
Populations can be defined in three ways:
Merging all populations in the dataset using subset = 'all'.
Within each population separately using: subset = 'each'.
Within selected populations using for example: subset = c('pop1','pop2').
Two different statistical methods can be used to test for deviations from Hardy-Weinberg proportions:
The classical chi-square test (method_sig='ChiSquare') based on the function HWChisq of the R package HardyWeinberg. By default a continuity correction is applied (cc_val=0.5). The continuity correction can be turned off (by specifying cc_val=0), for example in cases of extreme allele frequencies in which the continuity correction can lead to excessive type 1 error rates.
The exact test (method_sig='Exact') based on the exact calculations contained in the function HWExactStats of the R package HardyWeinberg, and described in Wigginton et al. (2005). The exact test is recommended in most cases (Wigginton et al., 2005).
Three different methods to estimate p-values (pvalue_type) in the Exact test can be used:
'dost' p-value is computed as twice the tail area of a one-sided test.
'selome' p-value is computed as the sum of the probabilities of all samples less or equally likely as the current sample.
'midp' p-value is computed as half the probability of the current sample + the probabilities of all samples that are more extreme.
The standard exact p-value is overly conservative, in particular for small minor allele frequencies. The mid p-value ameliorates this problem by bringing the rejection rate closer to the nominal level, at the price of occasionally exceeding the nominal level (Graffelman & Moreno, 2013).
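To make the chi-square option concrete, the sketch below tests a single locus from its three genotype counts, with an optional continuity correction. The helper hw_chisq is a hypothetical base R illustration of the textbook test; it is not the HWChisq function of package HardyWeinberg, which may treat edge cases differently.

```r
# Hypothetical helper: chi-square test of H-W proportions for one locus.
# n_AA, n_AB, n_BB are counts of reference homozygotes, heterozygotes
# and alternate homozygotes; cc is the continuity correction.
hw_chisq <- function(n_AA, n_AB, n_BB, cc = 0.5) {
  obs <- c(AA = n_AA, AB = n_AB, BB = n_BB)
  n <- sum(obs)
  p <- (2 * n_AA + n_AB) / (2 * n)           # reference allele frequency
  q <- 1 - p
  expd <- n * c(p^2, 2 * p * q, q^2)         # expected counts under H-W
  chisq <- sum((abs(obs - expd) - cc)^2 / expd)
  p.value <- pchisq(chisq, df = 1, lower.tail = FALSE)
  list(chisq = chisq, p.value = p.value)
}

# A locus exactly in H-W proportions, and one with a heterozygote deficit.
in_hw <- hw_chisq(25, 50, 25, cc = 0)
deficit <- hw_chisq(40, 20, 40, cc = 0)
print(in_hw)
print(deficit)
```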
Correction for multiple tests can be applied using the following methods based on the function p.adjust:
'holm' is also known as the sequential Bonferroni technique (Rice, 1989). This method has a greater statistical power than the standard Bonferroni test, however this method becomes very stringent when many tests are performed and many real deviations from the null hypothesis can go undetected (Waples, 2015).
'hochberg' based on Hochberg, 1988.
'hommel' based on Hommel, 1988. This method is more powerful than Hochberg's, but the difference is usually small.
'bonferroni' in which p-values are multiplied by the number of tests. This method is very stringent and therefore has reduced power to detect multiple departures from the null hypothesis.
'BH' based on Benjamini & Hochberg, 1995.
'BY' based on Benjamini & Yekutieli, 2001.
The first four methods are designed to give strong control of the family-wise error rate. The last two methods control the false discovery rate (FDR), the expected proportion of false discoveries among the rejected hypotheses. The false discovery rate is a less stringent condition than the family-wise error rate, so these methods are more powerful than the others, especially when the number of tests is large.
The number of tests on which the adjustment for multiple comparisons is based is the number of populations times the number of loci.
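The effect of the different corrections can be compared directly with p.adjust from base R. The p-values below are made up for illustration; in practice there would be one p-value per locus and population.

```r
# Hypothetical raw p-values from five H-W tests (assumption).
p <- c(0.001, 0.008, 0.02, 0.04, 0.30)

methods <- c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY")

# One column of adjusted p-values per correction method.
adj <- sapply(methods, function(m) p.adjust(p, method = m))
print(round(cbind(raw = p, adj), 4))

# Number of tests still significant at alpha = 0.05 under each method.
print(colSums(adj < 0.05))
```

In p.adjust, 'fdr' is an alias for 'BH', so the two give identical results.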
Ternary plots
Ternary plots can be used to visualise patterns of H-W proportions (plot.out = TRUE). P-values and the statistical (non)significance of a large number of bi-allelic markers can be inferred from their position in a ternary plot. See Graffelman & Morales-Camarena (2008) for further details. Ternary plots are based on the function HWTernaryPlot from the package HardyWeinberg. Each vertex of the ternary plot represents one of the three possible genotypes for SNP data: homozygous for the reference allele (AA), heterozygous (AB) and homozygous for the alternative allele (BB). Loci deviating significantly from Hardy-Weinberg proportions after correction for multiple tests are shown in pink. The blue parabola represents Hardy-Weinberg equilibrium, and the area between green lines represents the acceptance region.
For these plots to work it is necessary to install the package ggtern.
Value
A dataframe containing loci, counts of reference SNP homozygotes, heterozygotes and alternate SNP homozygotes; probability of departure from H-W proportions, per locus significance with and without correction for multiple comparisons and the number of populations in which the same locus is significantly out of HWE.
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
References
Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1165–1188.
Graffelman, J. (2015). Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64:1-23.
Graffelman, J. & Morales-Camarena, J. (2008). Graphical tests for Hardy-Weinberg equilibrium based on the ternary plot. Human Heredity 65:77-84.
Graffelman, J., & Moreno, V. (2013). The mid p-value in exact tests for Hardy-Weinberg equilibrium. Statistical applications in genetics and molecular biology, 12(4), 433-448.
Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800–803.
Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika, 75, 383–386.
Rice, W. R. (1989). Analyzing tables of statistical tests. Evolution, 43(1), 223-225.
Waples, R. S. (2015). Testing for Hardy–Weinberg proportions: have we lost the plot?. Journal of heredity, 106(1), 1-19.
Wigginton, J.E., Cutler, D.J., & Abecasis, G.R. (2005). A Note on Exact Tests of Hardy-Weinberg Equilibrium. American Journal of Human Genetics 76:887-893.
See Also
Other report functions: gl.report.bases(), gl.report.callrate(), gl.report.diversity(), gl.report.hamming(), gl.report.ld.map(), gl.report.locmetric(), gl.report.maf(), gl.report.monomorphs(), gl.report.overshoot(), gl.report.pa(), gl.report.parent.offspring(), gl.report.rdepth(), gl.report.replicates(), gl.report.reproducibility(), gl.report.secondaries(), gl.report.sexlinked(), gl.report.taglength()
Calculates pairwise linkage disequilibrium by population
Description
This function calculates pairwise linkage disequilibrium (LD) by population using the function ld (package snpStats).
If SNPs are not mapped to a reference genome, the parameter ld_max_pairwise should be set as NULL (the default). In this case, the function will assign the same chromosome ("1") to all the SNPs in the dataset and assign a sequence from 1 to n loci as the position of each SNP. The function will then calculate LD for all possible SNP pair combinations.
If SNPs are mapped to a reference genome, the parameter ld_max_pairwise should be filled out (i.e. not NULL). In this case, the information for SNP's position should be stored in the genlight accessor "@position" and the SNP's chromosome name in the accessor "@chromosome" (see examples). The function will then calculate LD within each chromosome and for all possible SNP pair combinations within a distance of ld_max_pairwise.
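The idea of restricting LD calculations to SNP pairs within a maximum distance can be sketched in base R as below. Note the assumptions: the helper pairwise_r2 is hypothetical, all SNPs are taken to lie on one chromosome, and r squared is approximated by the squared correlation of genotype scores, which is simpler than the estimator used by ld in package snpStats.

```r
# Hypothetical helper: approximate r^2 for SNP pairs within max_dist base pairs.
# geno: individuals x loci matrix coded 0/1/2; pos: position of each locus.
pairwise_r2 <- function(geno, pos, max_dist) {
  pairs <- t(combn(ncol(geno), 2))                 # all locus pairs
  dist_bp <- abs(pos[pairs[, 1]] - pos[pairs[, 2]])
  keep <- dist_bp <= max_dist
  pairs <- pairs[keep, , drop = FALSE]
  r2 <- vapply(
    seq_len(nrow(pairs)),
    function(k) {
      cor(geno[, pairs[k, 1]], geno[, pairs[k, 2]],
          use = "pairwise.complete.obs")^2
    },
    numeric(1)
  )
  data.frame(loc1 = pairs[, 1], loc2 = pairs[, 2],
             distance = dist_bp[keep], r2 = r2)
}

# Toy data (assumption): three loci, the third far away from the first two.
geno <- cbind(c(0, 1, 2, 0, 1, 2),
              c(0, 1, 2, 0, 1, 2),
              c(2, 1, 0, 1, 0, 2))
pos <- c(100, 500, 5e6)

ld_toy <- pairwise_r2(geno, pos, max_dist = 1000)
print(ld_toy)
```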
Usage
gl.report.ld.map(
  x,
  ld_max_pairwise = NULL,
  maf = 0.05,
  ld_stat = "R.squared",
  ind.limit = 10,
  stat_keep = "AvgPIC",
  ld_threshold_pops = 0.2,
  plot.out = TRUE,
  plot_theme = NULL,
  histogram_colors = NULL,
  boxplot_colors = NULL,
  bins = 50,
  save2tmp = FALSE,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
ld_max_pairwise | Maximum distance in number of base pairs at which LD should be calculated [default NULL]. |
maf | Minor allele frequency (by population) threshold to filter out loci. If a value > 1 is provided it will be interpreted as MAC (i.e. the minimum number of times an allele needs to be observed) [default 0.05]. |
ld_stat | The LD measure to be calculated: "LLR", "OR", "Q", "Covar","D.prime", "R.squared", and "R". See |
ind.limit | Minimum number of individuals that a population should contain for it to be taken into account when reporting loci in LD [default 10]. |
stat_keep | Name of the column from the slot |
ld_threshold_pops | LD threshold to report in the plot of "Number of populations in which the same SNP pair are in LD" [default 0.2]. |
plot.out | Specify if plot is to be produced [default TRUE]. |
plot_theme | User specified theme [default NULL]. |
histogram_colors | Vector with two color names for the borders and fill [default NULL]. |
boxplot_colors | A color palette for box plots by population or a list with as many colors as there are populations in the dataset [default NULL]. |
bins | Number of bins to display in histograms [default 50]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
This function reports LD between SNP pairs by population. The function gl.filter.ld filters out the SNPs in LD using as input the results of gl.report.ld.map. The actual number of SNPs to be filtered out depends on the parameters set in the function gl.filter.ld.
Boxplots of LD by population and a histogram showing LD frequency are presented.
Value
A dataframe with information for each SNP pair in LD.
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
See Also
gl.filter.ld
Other report functions: gl.report.bases(), gl.report.callrate(), gl.report.diversity(), gl.report.hamming(), gl.report.hwe(), gl.report.locmetric(), gl.report.maf(), gl.report.monomorphs(), gl.report.overshoot(), gl.report.pa(), gl.report.parent.offspring(), gl.report.rdepth(), gl.report.replicates(), gl.report.reproducibility(), gl.report.secondaries(), gl.report.sexlinked(), gl.report.taglength()
Examples
require("dartR.data")
x <- platypus.gl
x <- gl.filter.callrate(x, threshold = 1)
x <- gl.filter.monomorphs(x)
x$position <- x$other$loc.metrics$ChromPos_Platypus_Chrom_NCBIv1
x$chromosome <- as.factor(x$other$loc.metrics$Chrom_Platypus_Chrom_NCBIv1)
ld_res <- gl.report.ld.map(x, ld_max_pairwise = 10000000)
Reports summary of the slot $other$loc.metrics
Description
This script uses any field with numeric values stored in $other$loc.metrics to produce summary statistics (minimum, maximum, mean, quantiles), histograms and boxplots to assist the decision of choosing thresholds for the filter function gl.filter.locmetric.
Usage
gl.report.locmetric(
  x,
  metric,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  save2tmp = FALSE,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required]. |
metric | Name of the metric to be used for filtering [required]. |
plot.out | Specify if plot is to be produced [default TRUE]. |
plot_theme | Theme for the plot. See Details for options [default theme_dartR()]. |
plot_colors | List of two color names for the borders and fill of the plots [default two_colors]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]. |
Details
The function gl.filter.locmetric will filter out the loci with a locmetric value below a specified threshold.
The fields that are included in dartR, and a short description, are found below. Optionally, the user can also set his/her own field by adding a vector into $other$loc.metrics as shown in the example. You can check the names of all available loc.metrics via: names(gl$other$loc.metrics).
SnpPosition - position (zero is position 1) in the sequence tag of the defined SNP variant base.
CallRate - proportion of samples for which the genotype call is non-missing (that is, not '-').
OneRatioRef - proportion of samples for which the genotype score is 0.
OneRatioSnp - proportion of samples for which the genotype score is 2.
FreqHomRef - proportion of samples homozygous for the Reference allele.
FreqHomSnp - proportion of samples homozygous for the Alternate (SNP) allele.
FreqHets - proportion of samples which score as heterozygous, that is, scored as 1.
PICRef - polymorphism information content (PIC) for the Reference allele.
PICSnp - polymorphism information content (PIC) for the SNP.
AvgPIC - average of the polymorphism information content (PIC) of the reference and SNP alleles.
AvgCountRef - sum of the tag read counts for all samples, divided by the number of samples with non-zero tag read counts, for the Reference allele row.
AvgCountSnp - sum of the tag read counts for all samples, divided by the number of samples with non-zero tag read counts, for the Alternate (SNP) allele row.
RepAvg - proportion of technical replicate assay pairs for which the marker score is consistent.
rdepth - read depth.
Function's output
The minimum, maximum and mean of the locmetric values, and a tabulation of their quantiles against threshold values, are provided. Output also includes a boxplot and a histogram.
Quantiles are partitions of a finite set of values into q subsets of (nearly) equal sizes. In this function q = 20. Quantiles are useful measures because they are less susceptible to long-tailed distributions and outliers.
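The tabulation with q = 20 can be reproduced in base R with quantile(), as in the sketch below. The vector callrate is simulated for illustration, and the layout of the table is an assumption; it is not necessarily the layout printed by this function.

```r
set.seed(1)
# Simulated locus metric (assumption): call rate for 500 loci.
callrate <- runif(500, min = 0.5, max = 1)

# q = 20 gives cut points at 0%, 5%, ..., 100%.
probs <- (0:20) / 20
q <- quantile(callrate, probs = probs)

# Proportion of loci retained if each quantile were used as a threshold.
retained <- sapply(q, function(th) mean(callrate >= th))

tab <- data.frame(quantile = names(q), threshold = unname(q),
                  retained = unname(retained))
print(tab)
```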
Plots and table are saved to the temporary directory (tempdir) and can be accessed with the function gl.print.reports and listed with the function gl.list.reports. Note that they can be accessed only in the current R session because tempdir is cleared each time that the R session is closed.
Examples of other themes that can be used can be consulted in:
Value
An unaltered genlight object.
Author(s)
Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
See Also
gl.filter.locmetric, gl.list.reports, gl.print.reports
Other report functions: gl.report.bases(), gl.report.callrate(), gl.report.diversity(), gl.report.hamming(), gl.report.hwe(), gl.report.ld.map(), gl.report.maf(), gl.report.monomorphs(), gl.report.overshoot(), gl.report.pa(), gl.report.parent.offspring(), gl.report.rdepth(), gl.report.replicates(), gl.report.reproducibility(), gl.report.secondaries(), gl.report.sexlinked(), gl.report.taglength()
Examples
# adding dummy data
test <- testset.gl
test$other$loc.metrics$test <- 1:nLoc(test)
# SNP data
out <- gl.report.locmetric(test, metric = 'test')
# adding dummy data
test.gs <- testset.gs
test.gs$other$loc.metrics$test <- 1:nLoc(test.gs)
# Tag P/A data
out <- gl.report.locmetric(test.gs, metric = 'test')
Reports minor allele frequency (MAF) for each locus in a SNP dataset
Description
This script provides summary histograms of MAF for each population in the dataset and an overall histogram to assist the decision of choosing thresholds for the filter function gl.filter.maf.
Usage
gl.report.maf(
  x,
  maf.limit = 0.5,
  ind.limit = 5,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors_pop = discrete_palette,
  plot_colors_all = two_colors,
  bins = 25,
  save2tmp = FALSE,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
maf.limit | Show histograms MAF range <= maf.limit [default 0.5]. |
ind.limit | Show histograms only for populations of size greater than ind.limit [default 5]. |
plot.out | Specify if plot is to be produced [default TRUE]. |
plot_theme | Theme for the plot. See Details for options [default theme_dartR()]. |
plot_colors_pop | A color palette for population plots [default discrete_palette]. |
plot_colors_all | List of two color names for the borders and fill of the overall plot [default two_colors]. |
bins | Number of bins to display in histograms [default 25]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]. |
Details
The function gl.filter.maf will filter out the loci with MAF below a specified threshold.
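For orientation, the sketch below shows one way MAF can be computed per locus from a 0/1/2 genotype matrix in base R, together with a threshold check. The matrix geno and the threshold value are made up, and this is not the code used by gl.report.maf or gl.filter.maf.

```r
# Toy genotype matrix (assumption): rows = individuals, columns = loci.
geno <- matrix(
  c(0,  0, 1,
    0,  1, 2,
    0,  2, 2,
    NA, 1, 2),
  nrow = 4, byrow = TRUE
)

# Frequency of the alternate allele, ignoring missing genotypes.
alt_freq <- colMeans(geno, na.rm = TRUE) / 2

# MAF is the frequency of whichever allele is rarer.
maf <- pmin(alt_freq, 1 - alt_freq)

# Loci that would pass a hypothetical MAF threshold of 0.05.
threshold <- 0.05
keep <- maf >= threshold
print(data.frame(locus = seq_along(maf), maf = maf, keep = keep))
```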
Function's output
The minimum, maximum and mean of MAF, and a tabulation of MAF quantiles against threshold values, are provided. Output also includes a boxplot and a histogram.
This function reports the MAF for each of several quantiles. Quantiles are partitions of a finite set of values into q subsets of (nearly) equal sizes. In this function q = 20. Quantiles are useful measures because they are less susceptible to long-tailed distributions and outliers.
Plots and table are saved to the temporary directory (tempdir) and can be accessed with the function gl.print.reports and listed with the function gl.list.reports. Note that they can be accessed only in the current R session because tempdir is cleared each time that the R session is closed.
Examples of other themes that can be used can be consulted in
Value
An unaltered genlight object
Author(s)
Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)
See Also
gl.filter.maf, gl.list.reports, gl.print.reports
Other report functions: gl.report.bases(), gl.report.callrate(), gl.report.diversity(), gl.report.hamming(), gl.report.hwe(), gl.report.ld.map(), gl.report.locmetric(), gl.report.monomorphs(), gl.report.overshoot(), gl.report.pa(), gl.report.parent.offspring(), gl.report.rdepth(), gl.report.replicates(), gl.report.reproducibility(), gl.report.secondaries(), gl.report.sexlinked(), gl.report.taglength()
Examples
gl <- gl.report.maf(platypus.gl)
Reports monomorphic loci
Description
This script reports the number of monomorphic loci and those with all NAs in a genlight {adegenet} object.
Usage
gl.report.monomorphs(x, verbose = NULL)
Arguments
x | Name of the input genlight object [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]. |
Details
A DArT dataset will not have monomorphic loci, but they can arise, along with loci that are scored all NA, when populations or individuals are deleted. Retaining monomorphic loci unnecessarily increases the size of the dataset and will affect some calculations.
Note that for SNP data, NAs likely represent null alleles; in tag presence/absence data, NAs represent missing values (presence/absence could not be reliably scored).
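A minimal sketch of how loci could be classified from a 0/1/2 SNP genotype matrix in base R. The helper classify_loci is hypothetical and assumes SNP coding, where a locus counts as monomorphic when every scored individual is 0 or every scored individual is 2; it is not the implementation used by gl.report.monomorphs.

```r
# Hypothetical helper: label each locus (column) of a SNP genotype matrix.
classify_loci <- function(geno) {
  apply(geno, 2, function(g) {
    g <- g[!is.na(g)]                       # drop missing genotypes
    if (length(g) == 0) return("all_NA")    # nothing scored at this locus
    if (all(g == 0) || all(g == 2)) return("monomorphic")
    "polymorphic"
  })
}

# Toy data (assumption): three individuals, four loci.
geno <- cbind(c(0, 0, NA),
              c(NA, NA, NA),
              c(0, 1, 2),
              c(2, 2, 2))

status <- classify_loci(geno)
print(table(status))
```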
Value
An unaltered genlight object
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Other report functions: gl.report.bases(), gl.report.callrate(), gl.report.diversity(), gl.report.hamming(), gl.report.hwe(), gl.report.ld.map(), gl.report.locmetric(), gl.report.maf(), gl.report.overshoot(), gl.report.pa(), gl.report.parent.offspring(), gl.report.rdepth(), gl.report.replicates(), gl.report.reproducibility(), gl.report.secondaries(), gl.report.sexlinked(), gl.report.taglength()
Examples
# SNP data
gl.report.monomorphs(testset.gl)
# SilicoDArT data
gl.report.monomorphs(testset.gs)
Reports loci for which the SNP has been trimmed from the sequence tag along with the adaptor
Description
This function checks the position of the SNP within the trimmed sequence tag and identifies those for which the SNP position is outside the trimmed sequence tag. This can happen, rarely, when the sequence containing the SNP resembles the adaptor.
Usage
gl.report.overshoot(x, save2tmp = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object [required]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]. |
Details
The SNP genotype can still be used in most analyses, but functions like gl2fasta() will present challenges if the SNP has been trimmed from the sequence tag.
Resultant ggplot(s) and the tabulation(s) are saved to the session's temporary directory.
Value
An unaltered genlight object
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Other report functions: gl.report.bases(), gl.report.callrate(), gl.report.diversity(), gl.report.hamming(), gl.report.hwe(), gl.report.ld.map(), gl.report.locmetric(), gl.report.maf(), gl.report.monomorphs(), gl.report.pa(), gl.report.parent.offspring(), gl.report.rdepth(), gl.report.replicates(), gl.report.reproducibility(), gl.report.secondaries(), gl.report.sexlinked(), gl.report.taglength()
Examples
gl.report.overshoot(testset.gl)
Reports private alleles (and fixed alleles) per pair of populations
Description
This function reports private alleles in one population compared with a second population, for all populations taken pairwise. It also reports a count of fixed allelic differences and the mean absolute allele frequency differences (AFD) between pairs of populations.
Usage
gl.report.pa(
  x,
  x2 = NULL,
  method = "pairwise",
  loc_names = FALSE,
  plot.out = TRUE,
  font_plot = 14,
  map.interactive = FALSE,
  provider = "Esri.NatGeoWorldMap",
  palette_discrete = discrete_palette,
  save2tmp = FALSE,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP or SilicoDArT data [required]. |
x2 | If two separate genlight objects are to be compared this can be provided here, but they must have the same number of SNPs [default NULL]. |
method | Method to calculate private alleles: 'pairwise' comparison or compare each population against the rest 'one2rest' [default 'pairwise']. |
loc_names | Whether names of loci with private alleles and fixed differences should be reported. If TRUE, loci names are reported using a list [default FALSE]. |
plot.out | Specify if Sankey plot is to be produced [default TRUE]. |
font_plot | Numeric font size in pixels for the node text labels [default 14]. |
map.interactive | Specify whether an interactive map showing private alleles between populations is to be produced [default FALSE]. |
provider | Passed to leaflet [default "Esri.NatGeoWorldMap"]. |
palette_discrete | A discrete palette for the color of populations or a list with as many colors as there are populations in the dataset [default discrete_palette]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent, fatal errors only; 1, flag function begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
Note that the number of private alleles between two populations is not a symmetric dissimilarity measure.
If no x2 is provided, the function uses the pop(gl) hierarchy to determine pairs of populations, otherwise it runs a single comparison between x and x2.
Hint: in case you want to run comparisons between individuals (assuming individual names are unique), you can simply redefine your population names with your individual names, as below:
pop(gl) <- indNames(gl)
Definition of fixed and private alleles
The table below shows the possible cases of allele frequencies between two populations (0 = homozygote for Allele 1, x = both Alleles are present, 1 = homozygote for Allele 2).
p: cases where there is a private allele in pop1 compared to pop2 (but not vice versa)
f: cases where there is a fixed allele in pop1 (and pop2, as those cases are symmetric)
|      |   | pop1 |     |     |
|      |   | 0    | x   | 1   |
|      | 0 | -    | p   | p,f |
| pop2 | x | -    | -   | -   |
|      | 1 | p,f  | p   | -   |
The absolute allele frequency difference (AFD) in this function is a simple differentiation metric displaying intuitive properties which provides a valuable alternative to FST. For details about its properties and how it is calculated see Berner (2019).
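The definitions in the table above can be sketched in base R for a single pair of populations. The helper pa_pair and the toy matrices are hypothetical; the AFD here is the mean absolute difference in alternate allele frequency across biallelic loci, and the sketch is not the implementation used by gl.report.pa.

```r
# Hypothetical helper: private alleles, fixed differences and AFD
# between two populations given 0/1/2 genotype matrices with the same loci.
pa_pair <- function(g1, g2) {
  p1 <- colMeans(g1, na.rm = TRUE) / 2   # alternate allele frequency, pop1
  p2 <- colMeans(g2, na.rm = TRUE) / 2   # alternate allele frequency, pop2
  # An allele is private to a population if it is present there
  # and absent from the other population.
  priv1 <- (p1 > 0 & p2 == 0) | (p1 < 1 & p2 == 1)
  priv2 <- (p2 > 0 & p1 == 0) | (p2 < 1 & p1 == 1)
  # Fixed difference: the populations are fixed for different alleles.
  fixed <- (p1 == 0 & p2 == 1) | (p1 == 1 & p2 == 0)
  c(priv1 = sum(priv1), priv2 = sum(priv2), fixed = sum(fixed),
    AFD = mean(abs(p1 - p2)))
}

# Toy data (assumption): three individuals per population, four loci.
g1 <- rbind(c(0, 0, 0, 2),
            c(0, 1, 1, 2),
            c(0, 0, 2, 2))
g2 <- rbind(c(2, 0, 1, 2),
            c(2, 0, 1, 2),
            c(2, 0, 1, 2))

res_pa <- pa_pair(g1, g2)
print(res_pa)
```

In the toy data priv1 and priv2 differ, which illustrates that private allele counts are not symmetric between the two populations.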
The function also reports an estimation of the lower bound of the number of undetected private alleles using the Good-Turing frequency formula, originally developed for cryptography, which estimates in an ecological context the true frequencies of rare species in a single assemblage based on an incomplete sample of individuals. The approach is described in Chao et al. (2017). For this function, the equation 2c is used. This estimate is reported in the output table as Chao1 and Chao2.
In this function a Sankey Diagram is used to visualize patterns of private alleles between populations. This diagram displays flows (private alleles) between nodes (populations). Their links are represented with arcs that have a width proportional to the importance of the flow (number of private alleles).
If save2tmp = TRUE, resultant plot(s) and the tabulation(s) are saved to the session's temporary directory.
Value
A data.frame. Each row shows, for each pair of populations, the number of individuals in each population, the number of loci with fixed differences (same for both populations) and the number of private alleles in pop1 (compared to pop2) and vice versa, and finally the absolute mean allele frequency difference between loci (AFD). If loc_names = TRUE, loci names with private alleles and fixed differences are reported in a list in addition to the data frame.
Author(s)
Custodian: Bernd Gruber – Post to https://groups.google.com/d/forum/dartr
References
Berner, D. (2019). Allele frequency difference AFD – an intuitive alternative to FST for quantifying genetic population differentiation. Genes, 10(4), 308.
Chao, A., et al. (2017). Deciphering the enigma of undetected species, phylogenetic, and functional diversity based on Good-Turing theory. Ecology, 98(11), 2914-2929.
See Also
gl.list.reports, gl.print.reports
Other report functions: gl.report.bases(), gl.report.callrate(), gl.report.diversity(), gl.report.hamming(), gl.report.hwe(), gl.report.ld.map(), gl.report.locmetric(), gl.report.maf(), gl.report.monomorphs(), gl.report.overshoot(), gl.report.parent.offspring(), gl.report.rdepth(), gl.report.replicates(), gl.report.reproducibility(), gl.report.secondaries(), gl.report.sexlinked(), gl.report.taglength()
Examples
out <- gl.report.pa(platypus.gl)
Identifies putative parent offspring within a population
Description
This script examines the frequency of pedigree inconsistent loci, that is, those loci that are homozygous in the parent for the reference allele, and homozygous in the offspring for the alternate allele. This condition is not consistent with any pedigree, regardless of the (unknown) genotype of the other parent. The pedigree inconsistent loci are counted as an indication of whether or not it is reasonable to propose the two individuals are in a parent-offspring relationship.
Usage
gl.report.parent.offspring(
  x,
  min.rdepth = 12,
  min.reproducibility = 1,
  range = 1.5,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  save2tmp = FALSE,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP genotypes [required]. |
min.rdepth | Minimum read depth to include in analysis [default 12]. |
min.reproducibility | Minimum reproducibility to include in analysis [default 1]. |
range | Specifies the range to extend beyond the interquartile range for delimiting outliers [default 1.5 interquartile ranges]. |
plot.out | Specify if plot is to be produced [default TRUE]. |
plot_theme | Theme for the plot. See Details for options [default theme_dartR()]. |
plot_colors | List of two color names for the borders and fill of the plots [default two_colors]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
If two individuals are in a parent-offspring relationship, the true number of pedigree inconsistent loci should be zero, but SNP calling is not infallible. Some loci will be miscalled. The problem thus becomes one of determining if the two focal individuals have a count of pedigree inconsistent loci less than would be expected of typical unrelated individuals. There are some quite sophisticated software packages available to formally apply likelihoods to the decision, but we use a simple outlier comparison.
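The counting step and the outlier comparison can be sketched in base R as follows. This is an illustration under stated assumptions, not the implementation in gl.report.parent.offspring: genotypes are assumed to be coded 0, 1, 2 with NA for missing, the helper names count_inconsistent and lower_bound are hypothetical, and the outlier rule shown (first quartile minus range times the interquartile range) is one plausible reading of the range argument.

```r
# Count loci where one individual is homozygous reference (0) and the
# other is homozygous alternate (2); loci with missing genotypes are skipped.
count_inconsistent <- function(a, b) {
  sum((a == 0 & b == 2) | (a == 2 & b == 0), na.rm = TRUE)
}

# Toy genotypes for three individuals at six loci.
geno <- rbind(
  parent    = c(0, 0, 2, 1, 2, NA),
  offspring = c(0, 1, 2, 1, 1, 2),
  unrelated = c(2, 2, 0, 1, 0, 0)
)

# Count pedigree inconsistent loci for every pair of individuals.
pairs <- t(combn(rownames(geno), 2))
counts <- apply(pairs, 1, function(p) {
  count_inconsistent(geno[p[1], ], geno[p[2], ])
})
names(counts) <- paste(pairs[, 1], pairs[, 2], sep = "-")

# Lower outlier bound: pairs with counts below it are unusually consistent.
lower_bound <- function(x, range = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  q[1] - range * (q[2] - q[1])
}
```

With a real dataset the bound would be computed from the counts of all pairs, and pairs falling below it would be candidates for a parent-offspring relationship.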
To reduce the frequency of miscalls, and so emphasize the difference between true parent-offspring pairs and unrelated pairs, the data can be filtered on read depth.
Typically minimum read depth is set to 5x, but you can examine the distribution of read depths with the function gl.report.rdepth and push this up with an acceptable loss of loci. 12x might be a good minimum for this particular analysis. It is sensible also to push the minimum reproducibility up to 1, if that does not result in an unacceptable loss of loci. Reproducibility is stored in the slot @other$loc.metrics$RepAvg and is defined as the proportion of technical replicate assay pairs for which the marker score is consistent. You can examine the distribution of reproducibility with the function gl.report.reproducibility.
Note that the null expectation is not well defined, and the power reduced, if the population from which the putative parent-offspring pairs are drawn contains many sibs. Note also that if an individual has been genotyped twice in the dataset, the replicate pair will be assessed by this script as being in a parent-offspring relationship.
The function gl.filter.parent.offspring will filter out those individuals in a parent-offspring relationship.
Note that if your dataset does not contain RepAvg or rdepth among the locus metrics, the filters for reproducibility and read depth are not used.
Function's output
Plots and table are saved to the temporary directory (tempdir) and can be accessed with the function gl.print.reports and listed with the function gl.list.reports. Note that they can be accessed only in the current R session because tempdir is cleared each time that the R session is closed.
Examples of other themes that can be used can be consulted in
Value
A set of individuals in parent-offspring relationship. NULL if no parent-offspring relationships were found.
Author(s)
Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)
See Also
gl.list.reports, gl.report.rdepth, gl.print.reports, gl.report.reproducibility, gl.filter.parent.offspring
Other report functions: gl.report.bases(), gl.report.callrate(), gl.report.diversity(), gl.report.hamming(), gl.report.hwe(), gl.report.ld.map(), gl.report.locmetric(), gl.report.maf(), gl.report.monomorphs(), gl.report.overshoot(), gl.report.pa(), gl.report.rdepth(), gl.report.replicates(), gl.report.reproducibility(), gl.report.secondaries(), gl.report.sexlinked(), gl.report.taglength()
Examples
out <- gl.report.parent.offspring(testset.gl[1:10,1:100])
Reports summary of Read Depth for each locus
Description
SNP datasets generated by DArT report AvgCountRef and AvgCountSnp as counts of sequence tags for the reference and alternate alleles respectively. These can be used to back calculate Read Depth. Fragment presence/absence datasets as provided by DArT (SilicoDArT) provide Average Read Depth and Standard Deviation of Read Depth as standard columns in their report. This function reports the read depth by locus for each of several quantiles.
Usage
gl.report.rdepth(
  x,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  save2tmp = FALSE,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required]. |
plot.out | Specify if plot is to be produced [default TRUE]. |
plot_theme | Theme for the plot. See Details for options [default theme_dartR()]. |
plot_colors | List of two color names for the borders and fill of the plots [default two_colors]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
The function displays a table of minimum, maximum, mean and quantiles for read depth against possible thresholds that might subsequently be specified in gl.filter.rdepth. If plot.out=TRUE, the display also includes a boxplot and a histogram to guide in the selection of a threshold for filtering on read depth.
If save2tmp=TRUE, ggplots and relevant tabulations are saved to the session's temp directory (tempdir).
For examples of themes, see
Value
An unaltered genlight object
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Other report functions: gl.report.bases(), gl.report.callrate(), gl.report.diversity(), gl.report.hamming(), gl.report.hwe(), gl.report.ld.map(), gl.report.locmetric(), gl.report.maf(), gl.report.monomorphs(), gl.report.overshoot(), gl.report.pa(), gl.report.parent.offspring(), gl.report.replicates(), gl.report.reproducibility(), gl.report.secondaries(), gl.report.sexlinked(), gl.report.taglength()
Examples
# SNP data
df <- gl.report.rdepth(testset.gl)
df <- gl.report.rdepth(testset.gs)
Identify replicated individuals
Description
Identify replicated individuals
Usage
gl.report.replicates(
  x,
  loc_threshold = 100,
  perc_geno = 0.99,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  bins = 100,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
loc_threshold | Minimum number of loci required to assess whether two individuals are replicates [default 100]. |
perc_geno | Minimum proportion of genotypes that must be the same in two individuals [default 0.99]. |
plot.out | Specify if plot is to be produced [default TRUE]. |
plot_theme | User specified theme [default theme_dartR()]. |
plot_colors | Vector with two color names for the borders and fill [default two_colors]. |
bins | Number of bins to display in histograms [default 100]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
This function uses a C++ implementation, so the package Rcpp needs to be installed. It is therefore fast once the function has been compiled during the first run.
Ideally, in a large dataset with related and unrelated individuals and several replicated individuals, such as in a capture/mark/recapture study, the first histogram should have four "peaks". The first peak should represent unrelated individuals, the second peak should correspond to second-degree relationships (such as cousins), the third peak should represent first-degree relationships (like parent/offspring and full siblings), and the fourth peak should represent replicated individuals.
In order to ensure that replicated individuals are properly identified, it's important to have a clear separation between the third and fourth peaks in the second histogram. This means that there should be bins with zero counts between these two peaks.
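The quantity behind these histograms is the proportion of identical genotypes between two individuals. The sketch below shows that comparison in plain R on toy data; it is not the package's C++ code, the helper same_geno is hypothetical, and genotypes are assumed to be coded 0, 1, 2 with NA for missing.

```r
# Proportion of identical genotypes between two individuals, using only
# loci scored in both; also returns the number of loci compared.
same_geno <- function(a, b) {
  ok <- !is.na(a) & !is.na(b)
  c(n_loci = sum(ok),
    prop_same = if (any(ok)) mean(a[ok] == b[ok]) else NA_real_)
}

geno <- rbind(
  A1 = c(0, 1, 2, 2, NA, 0),
  A2 = c(0, 1, 2, 2, 1, 0),  # intended as a replicate of A1
  B  = c(2, 1, 0, 2, 1, 1)
)

res_A <- same_geno(geno["A1", ], geno["A2", ])
res_B <- same_geno(geno["A1", ], geno["B", ])
```

A pair would then be treated as replicates when n_loci is at least loc_threshold and prop_same is at least perc_geno; that decision rule is an assumption inferred from the argument descriptions.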
Value
A list with three elements:
table.rep: A dataframe with pairwise results of percentage of same genotypes between two individuals, the number of loci used in the comparison and the missing data for each individual.
ind.list.drop: A vector of replicated individuals to be dropped. Replicated individual with the least missing data is reported.
ind.list.rep: A list of each individual that has replicates in the dataset, the name of the replicates and the percentage of the same genotype.
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
See Also
Other report functions: gl.report.bases(), gl.report.callrate(), gl.report.diversity(), gl.report.hamming(), gl.report.hwe(), gl.report.ld.map(), gl.report.locmetric(), gl.report.maf(), gl.report.monomorphs(), gl.report.overshoot(), gl.report.pa(), gl.report.parent.offspring(), gl.report.rdepth(), gl.report.reproducibility(), gl.report.secondaries(), gl.report.sexlinked(), gl.report.taglength()
Examples
res_rep <- gl.report.replicates(platypus.gl, loc_threshold = 500, perc_geno = 0.85)
Reports summary of RepAvg (repeatability averaged over both alleles for each locus) or reproducibility (repeatability of the scores for fragment presence/absence)
Description
SNP datasets generated by DArT have an index, RepAvg, generated by reproducing the data independently for 30% of loci. RepAvg is the proportion of alleles that give a repeatable result, averaged over both alleles for each locus.
In the case of fragment presence/absence data (SilicoDArT), repeatability is the percentage of scores that are repeated in the technical replicate dataset.
Usage
gl.report.reproducibility(
  x,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  save2tmp = FALSE,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required]. |
plot.out | If TRUE, displays a plot to guide the decision on a filter threshold [default TRUE]. |
plot_theme | Theme for the plot. See Details for options [default theme_dartR()]. |
plot_colors | List of two color names for the borders and fill of the plots [default two_colors]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
The function displays a table of minimum, maximum, mean and quantiles for repeatability against possible thresholds that might subsequently be specified in gl.filter.reproducibility.
If plot.out=TRUE, the display also includes a boxplot and a histogram to guide in the selection of a threshold for filtering on repeatability.
If save2tmp=TRUE, ggplots and relevant tabulations are saved to the session's temp directory (tempdir).
For examples of themes, see:
Value
An unaltered genlight object
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Other report functions: gl.report.bases(), gl.report.callrate(), gl.report.diversity(), gl.report.hamming(), gl.report.hwe(), gl.report.ld.map(), gl.report.locmetric(), gl.report.maf(), gl.report.monomorphs(), gl.report.overshoot(), gl.report.pa(), gl.report.parent.offspring(), gl.report.rdepth(), gl.report.replicates(), gl.report.secondaries(), gl.report.sexlinked(), gl.report.taglength()
Examples
# SNP data
out <- gl.report.reproducibility(testset.gl)
# Tag P/A data
out <- gl.report.reproducibility(testset.gs)
Reports loci containing secondary SNPs in sequence tags and calculates number of invariant sites
Description
SNP datasets generated by DArT include fragments with more than one SNP (that is, with secondaries). They are recorded separately with the same CloneID (=AlleleID). These multiple SNP loci within a fragment are likely to be linked, and so you may wish to remove secondaries.
This function reports statistics associated with secondaries, and the consequences of filtering them out, and provides three plots. The first is a boxplot, the second is a barplot of the frequency of secondaries per sequence tag, and the third is the Poisson expectation for those frequencies including an estimate of the zero class (no. of sequence tags with no SNP scored).
Usage
gl.report.secondaries(
  x,
  nsim = 1000,
  taglength = 69,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  save2tmp = FALSE,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
nsim | The number of simulations to estimate the mean of the Poisson distribution [default 1000]. |
taglength | Typical length of the sequence tags [default 69]. |
plot.out | Specify if plot is to be produced [default TRUE]. |
plot_theme | Theme for the plot. See Details for options [default theme_dartR()]. |
plot_colors | List of two color names for the borders and fill of the plots [default two_colors]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
The function gl.filter.secondaries will filter out the loci with secondaries, retaining only one SNP locus per sequence tag.
Heterozygosity as estimated by the function gl.report.heterozygosity is in a sense relative, because it is calculated against a background of only those loci that are polymorphic somewhere in the dataset. To allow intercompatibility across studies and species, any measure of heterozygosity needs to accommodate loci that are invariant (autosomal heterozygosity; see Schmidt et al. 2021). However, the number of invariant loci is unknown, because SNPs are detected as single point mutational variants, invariant sequences are discarded, and additional filtering is applied before analysis. Modelling the counts of SNPs per sequence tag as a Poisson distribution in this script allows an estimate of the zero class, that is, the number of invariant loci. This is reported, and the veracity of the estimate can be assessed by the correspondence of the observed frequencies against those under Poisson expectation in the associated graphs. The number of invariant loci can then be optionally provided to the function gl.report.heterozygosity via the parameter n.invariant.
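The idea of estimating a zero class from Poisson counts can be sketched as follows. Note that this is a swapped-in technique for illustration: the function described here estimates the Poisson mean by simulation (nsim), whereas the sketch solves the mean equation of a zero-truncated Poisson directly. The helper name estimate_zero_class and the simulated counts are assumptions, not part of dartR.

```r
# Estimate the size of the unobserved zero class from counts >= 1,
# assuming the counts follow a zero-truncated Poisson distribution.
estimate_zero_class <- function(counts) {
  stopifnot(length(counts) > 0, all(counts >= 1))
  m <- mean(counts)
  if (m <= 1) stop("the mean count must exceed 1 to estimate lambda")
  # The mean of a zero-truncated Poisson is lambda / (1 - exp(-lambda));
  # solve that equation for lambda given the observed mean m.
  f <- function(lambda) lambda / -expm1(-lambda) - m
  lambda <- uniroot(f, interval = c(1e-8, m + 1), tol = 1e-10)$root
  p0 <- exp(-lambda)  # Poisson probability of a zero count
  n_zero <- length(counts) * p0 / (1 - p0)
  list(lambda = lambda, n_zero = n_zero)
}

# Simulated check: discard the zeros, then try to recover their number.
set.seed(42)
x <- rpois(20000, lambda = 1.2)
est <- estimate_zero_class(x[x > 0])
c(estimated = est$n_zero, true = sum(x == 0))
```

Here each simulated count plays the role of the number of SNPs on one sequence tag, and the zeros stand in for invariant tags that never appear in a SNP dataset.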
In case the calculations for the Poisson expectation of the number of invariant sequence tags fail to converge, try to rerun the analysis with a larger nsim value.
This function now also calculates the number of invariant sites (i.e. nucleotides) of the sequence tags (if TrimmedSequence is present in x$other$loc.metrics) or estimates these by assuming that the average length of the sequence tags is 69 nucleotides. Based on the Poisson expectation of the number of invariant sequence tags, it also estimates the number of invariant sites for these to eventually provide an estimate of the total number of invariant sites.
Note, previous versions of dartR would only return an estimate of the number of invariant sequence tags (not sites).
Plots are saved to the session temporary directory (tempdir).
Examples of other themes that can be used can be consulted in:
Value
A data.frame with the list of parameter values
n.total.tags Number of sequence tags in total
n.SNPs.secondaries Number of secondary SNP loci that would be removed on filtering
n.invariant.tags Estimated number of invariant sequence tags
n.tags.secondaries Number of sequence tags with secondaries
n.inv.gen Number of invariant sites in sequenced tags
mean.len.tag Mean length of sequence tags
n.invariant Total number of invariant sites (including invariant sequence tags)
k Lambda: mean of the Poisson distribution of number of SNPs in the sequence tags
Author(s)
Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)
References
Schmidt, T.L., Jasper, M.-E., Weeks, A.R., Hoffmann, A.A., 2021. Unbiased population heterozygosity estimates from genome-wide sequence data. Methods in Ecology and Evolution n/a.
See Also
gl.filter.secondaries, gl.report.heterozygosity, utils.n.var.invariant
Other report functions: gl.report.bases(), gl.report.callrate(), gl.report.diversity(), gl.report.hamming(), gl.report.hwe(), gl.report.ld.map(), gl.report.locmetric(), gl.report.maf(), gl.report.monomorphs(), gl.report.overshoot(), gl.report.pa(), gl.report.parent.offspring(), gl.report.rdepth(), gl.report.replicates(), gl.report.reproducibility(), gl.report.sexlinked(), gl.report.taglength()
Examples
require("dartR.data")
test <- gl.filter.callrate(platypus.gl, threshold = 1)
n.inv <- gl.report.secondaries(test)
gl.report.heterozygosity(test, n.invariant = n.inv[7, 2])
Identifies loci that are sex linked
Description
Alleles unique to the Y or W chromosome and monomorphic on the X chromosomes will appear in the SNP dataset as genotypes that are heterozygous in all individuals of the heterogametic sex and homozygous in all individuals of the homogametic sex. This function identifies loci with alleles that behave in this way, as putative sex specific SNP markers.
Usage
gl.report.sexlinked(
  x,
  sex = NULL,
  t.het = 0.1,
  t.hom = 0.1,
  t.pres = 0.1,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = three_colors,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required]. |
sex | Factor that defines the sex of individuals. See explanation in details [default NULL]. |
t.het | Tolerance in the heterogametic sex, that is t.het=0.05 means that 5% of the heterogametic sex can be homozygous and still be regarded as consistent with a sex specific marker [default 0.1]. |
t.hom | Tolerance in the homogametic sex, that is t.hom=0.05 means that 5% of the homogametic sex can be heterozygous and still be regarded as consistent with a sex specific marker [default 0.1]. |
t.pres | Tolerance in presence, that is t.pres=0.05 means that a silicodart marker can be present in either of the sexes and still be regarded as a sex-linked marker [default 0.1]. |
plot.out | Creates a plot that shows the heterozygosity of males and females at each locus and a shaded area in which loci can be regarded as consistent with a sex specific marker [default TRUE]. |
plot_theme | Theme for the plot. See Details for options [default theme_dartR()]. |
plot_colors | List of three color names for the not sex-linked loci, for the sex-linked loci and for the area in which sex-linked loci appear [default three_colors]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]. |
Details
Sex of the individuals for which sex is known with certainty can be provided via a factor (equal in length to the number of individuals) or held in the variable x@other$ind.metrics$sex. Coding is: M for male, F for female, U or NA for unknown/missing. The script abbreviates the entries to their first character, so coding of 'Female' and 'Male' works as well. Characters are also converted to upper case.
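The coding rule and the tolerance comparison can be sketched in base R as below. This is a simplified illustration rather than the function's own code: the helpers normalise_sex and het_prop are hypothetical, genotypes are assumed to be coded 0, 1, 2 (1 = heterozygote), and the exact comparisons applied by gl.report.sexlinked are an assumption based on the descriptions of t.het and t.hom.

```r
# Reduce free-text sex labels to M / F / U using the first character.
normalise_sex <- function(sex) {
  s <- as.character(sex)
  s <- ifelse(is.na(s), "U", toupper(substr(s, 1, 1)))
  s[!s %in% c("M", "F")] <- "U"
  factor(s, levels = c("F", "M", "U"))
}

# Proportion of heterozygous genotypes (coded 1) per locus.
het_prop <- function(g) colMeans(g == 1, na.rm = TRUE)

# Toy data: four individuals (rows) scored at three loci (columns).
geno <- rbind(
  c(1, 0, 1),
  c(1, 2, 0),
  c(0, 2, 1),
  c(0, 0, 1)
)
sex <- normalise_sex(c("Male", "m", "female", "F"))
t.het <- 0.1
t.hom <- 0.1

het_m <- het_prop(geno[sex == "M", , drop = FALSE])
het_f <- het_prop(geno[sex == "F", , drop = FALSE])

# XX/XY pattern: (nearly) all males heterozygous, (nearly) all females homozygous.
xy_candidates <- which(het_m >= 1 - t.het & het_f <= t.hom)
# ZZ/ZW pattern: the same comparison with the sexes swapped.
zw_candidates <- which(het_f >= 1 - t.het & het_m <= t.hom)
```

In the toy data only the first locus fits the XX/XY pattern, and no locus fits the ZZ/ZW pattern.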
Function's output
This function creates a plot that shows the heterozygosity of males and females at each locus for SNP data, or the percentage of presence/absence in the case of SilicoDArT data.
Examples of other themes that can be used can be consulted in
Value
Two lists of sex-linked loci, one for XX/XY and one for ZZ/ZW systems, and a plot.
Author(s)
Arthur Georges, Bernd Gruber & Floriaan Devloo-Delva (Post to https://groups.google.com/d/forum/dartr)
See Also
Other report functions: gl.report.bases(), gl.report.callrate(), gl.report.diversity(), gl.report.hamming(), gl.report.hwe(), gl.report.ld.map(), gl.report.locmetric(), gl.report.maf(), gl.report.monomorphs(), gl.report.overshoot(), gl.report.pa(), gl.report.parent.offspring(), gl.report.rdepth(), gl.report.replicates(), gl.report.reproducibility(), gl.report.secondaries(), gl.report.taglength()
Examples
out <- gl.report.sexlinked(testset.gl)
out <- gl.report.sexlinked(testset.gs)
test <- gl.filter.callrate(platypus.gl)
test <- gl.filter.monomorphs(test)
out <- gl.report.sexlinked(test)
Reports summary of sequence tag length across loci
Description
SNP datasets generated by DArT typically have sequence tag lengths ranging from 20 to 69 base pairs. This function reports summary statistics of the tag lengths.
Usage
gl.report.taglength(
  x,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  save2tmp = FALSE,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
plot.out | If TRUE, displays a plot to guide the decision on a filter threshold [default TRUE]. |
plot_theme | Theme for the plot. See Details for options [default theme_dartR()]. |
plot_colors | List of two color names for the borders and fill of the plots [default two_colors]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
The function gl.filter.taglength will filter out the loci with a tag length below a specified threshold.
Quantiles are partitions of a finite set of values into q subsets of (nearly) equal sizes. In this function q = 20. Quantiles are useful measures because they are less susceptible to long-tailed distributions and outliers.
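With q = 20 the cut points fall at probabilities in steps of 5%. The following base R sketch builds such a table for a vector of toy tag lengths (illustrative values only); the column names and the "retained" count (loci with a tag length at or above each threshold) are assumptions about how such a report could be laid out, not the function's exact output.

```r
# Toy tag lengths in base pairs (illustrative values only).
tag_len <- c(20, 25, 33, 41, 48, 52, 57, 60, 64, 66, 68, 69, 69, 69, 69)

# q = 20 quantiles correspond to probabilities in steps of 5%.
probs <- seq(0, 1, by = 0.05)
qs <- quantile(tag_len, probs = probs)

# For each threshold, count the loci that would be kept (length >= threshold).
retained <- vapply(qs, function(th) sum(tag_len >= th), numeric(1))

report <- data.frame(
  quantile = names(qs),
  threshold = unname(qs),
  retained = unname(retained)
)
head(report)
```

The same construction applies to the other report functions that tabulate a locus metric against candidate filter thresholds, such as read depth or repeatability.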
Function's output
The minimum, maximum, mean and a tabulation of tag length quantiles against thresholds are provided. Output also includes a boxplot and a histogram to guide in the selection of a threshold for filtering on tag length.
Plots and table are saved to the temporary directory (tempdir) and can be accessed with the function gl.print.reports and listed with the function gl.list.reports. Note that they can be accessed only in the current R session because tempdir is cleared each time that the R session is closed.
Examples of other themes that can be used can be consulted in
Value
Returns unaltered genlight object
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
gl.filter.taglength, gl.list.reports, gl.print.reports
Other report functions: gl.report.bases(), gl.report.callrate(), gl.report.diversity(), gl.report.hamming(), gl.report.hwe(), gl.report.ld.map(), gl.report.locmetric(), gl.report.maf(), gl.report.monomorphs(), gl.report.overshoot(), gl.report.pa(), gl.report.parent.offspring(), gl.report.rdepth(), gl.report.replicates(), gl.report.reproducibility(), gl.report.secondaries(), gl.report.sexlinked()
Examples
out <- gl.report.taglength(testset.gl)
Runs a faststructure analysis using a genlight object
Description
This function takes a genlight object and runs a faststructure analysis.
Usage
gl.run.faststructure(
  x,
  k.range,
  num.k.rep,
  exec = "./fastStructure",
  output = getwd(),
  tol = 1e-05,
  prior = "simple",
  cv = 0,
  seed = NULL
)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
k.range | Range of the number of populations [required]. |
num.k.rep | Number of replicates [required]. |
exec | Full path and name+extension where the fastStructure executable is located [default working directory "./fastStructure"]. |
output | Path to output file [default getwd()]. |
tol | Convergence criterion [default 10e-6]. |
prior | Choice of prior: simple or logistic [default "simple"]. |
cv | Number of test sets for cross-validation, 0 implies no CV step [default 0]. |
seed | Seed for random number generator [default NULL]. |
Details
Download faststructure binary for your system from here (only runs on Mac or Linux):
https://github.com/StuntsPT/Structure_threader/tree/master/structure_threader/bins
Move faststructure file to working directory. Make file executable using terminal app.
system(paste0("chmod u+x ",getwd(), "/faststructure"))
Download plink binary for your system from here:
https://www.cog-genomics.org/plink/
Move plink file to working directory. Make file executable using terminal app.
system(paste0("chmod u+x ",getwd(), "/plink"))
To install fastStructure dependencies follow these directions: https://github.com/rajanil/fastStructure
fastStructure performs inference for the simplest, independent-loci, admixture model, with two choices of priors that can be specified using the --prior parameter. Thus, unlike Structure, fastStructure does not require the mainparams and extraparam files. The inference algorithm used by fastStructure is fundamentally different from that of Structure and requires the setting of far fewer options.
To identify the number of populations that best approximates the marginal likelihood of the data, the marginal likelihood is extracted from each run of K, averaged across replications and plotted.
Value
A list in which each list entry is a single faststructure run output (there are k.range * num.k.rep number of runs).
Author(s)
Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
References
Raj, A., Stephens, M., & Pritchard, J. K. (2014). fastSTRUCTURE: variational inference of population structure in large SNP data sets. Genetics, 197(2), 573-589.
Examples
## Not run: 
t1 <- gl.filter.callrate(platypus.gl, threshold = 1)
res <- gl.run.faststructure(t1, exec = "./fastStructure", k.range = 2:3,
  num.k.rep = 2, output = paste0(getwd(), "/res_str"))
qmat <- gl.plot.faststructure(res, k.range = 2:3)
gl.map.structure(qmat, K = 2, t1, scalex = 1, scaley = 0.5)
## End(Not run)
Runs a STRUCTURE analysis using a genlight object
Description
This function takes a genlight object and runs a STRUCTURE analysis based on functions from strataG
Usage
gl.run.structure(
  x,
  ...,
  exec = ".",
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  save2tmp = FALSE,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
... | Parameters to specify the STRUCTURE run (check |
exec | Full path and name+extension where the structure executable is located. E.g. |
plot.out | Create an Evanno plot once finished. Be aware k.range needs to be at least three different k steps [default TRUE]. |
plot_theme | Theme for the plot. See details for options [default theme_dartR()]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Set verbosity for this function (though structure output cannot be switched off currently) [default NULL]. |
Details
The function is basically a convenient wrapper around the beautiful strataG function structureRun (Archer et al. 2016). For a detailed description please refer to this package (see references below). To make use of this function you need to download STRUCTURE for your system (non-GUI version) from the STRUCTURE website.
Format note
For this function to work, make sure that individual and population names have no spaces. To substitute spaces by underscores you could use the R function gsub as below.
popNames(gl) <- gsub(" ","_",popNames(gl))
indNames(gl) <- gsub(" ","_",indNames(gl))
It's also worth noting that Structure truncates individual names at 11 characters. The function will fail if the names of individuals are not unique after truncation. To avoid this possible problem, a number sequence, as shown in the code below, might be used instead of individual names.
indNames(gl) <- as.character(1:length(indNames(gl)))
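Whether renaming is needed can be checked before the run. The sketch below works on a plain character vector of hypothetical individual names, so it does not require a genlight object; with real data the vector would come from indNames(gl).

```r
# Hypothetical individual names; two of them share the first 11 characters.
ind_names <- c("Site A sample 01", "Site A sample 02", "Site B 01")

# Replace spaces with underscores, as recommended above.
clean <- gsub(" ", "_", ind_names)

# STRUCTURE keeps only the first 11 characters of each name.
truncated <- substr(clean, 1, 11)
clash <- anyDuplicated(truncated) > 0

# Fall back to a running number when truncated names are not unique.
safe_names <- if (clash) as.character(seq_along(clean)) else clean
```

Keeping a lookup table of the original names against safe_names makes it straightforward to relabel the results afterwards.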
Value
An sr object (structure.result list output). Each list entry is a single structurerun output (there are k.range * num.k.rep number of runs). For example the summary output of the first run can be accessed via sr[[1]]$summary or the q-matrix of the third run via sr[[3]]$q.mat. To conveniently summarise the outputs across runs (clumpp) you need to run gl.plot.structure on the returned sr object. For Evanno plots run gl.evanno on your sr object.
Author(s)
Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
References
Pritchard, J.K., Stephens, M., Donnelly, P. (2000) Inference of population structure using multilocus genotype data. Genetics 155, 945-959.
Archer, F. I., Adams, P. E. and Schneiders, B. B. (2016) strataG: An R package for manipulating, summarizing and analysing population genetic data. Mol Ecol Resour. doi:10.1111/1755-0998.12559
Examples
## Not run: 
#bc <- bandicoot.gl[,1:100]
#sr <- gl.run.structure(bc, k.range = 2:5, num.k.rep = 3,
# exec = './structure.exe')
#ev <- gl.evanno(sr)
#ev
#qmat <- gl.plot.structure(sr, K=3)
#head(qmat)
#gl.map.structure(qmat, bc, scalex=1, scaley=0.5)
## End(Not run)
Samples individuals from populations
Description
This is a convenience function to prepare a bootstrap approach in dartR. For a bootstrap approach it is often desirable to sample a defined number of individuals from each of the populations in a genlight object and then calculate a certain quantity for that subset (repeated, for example, 1000 times).
Usage
gl.sample(
  x,
  nsample = min(table(pop(x))),
  replace = TRUE,
  onepop = FALSE,
  verbose = NULL
)
Arguments
x | genlight object containing SNP/silicodart genotypes |
nsample | the number of individuals that should be sampled |
replace | a switch to sample with replacement (default). |
onepop | switch to ignore population settings of the genlight object and sample from all individuals disregarding the population definition [default FALSE]. |
verbose | set verbosity |
Details
This is a convenience function to facilitate a bootstrap approach.
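The sampling scheme itself, drawing the same number of individuals from every population, can be sketched in base R as below. It operates on a hypothetical population factor rather than a genlight object and is not the code of gl.sample; the resulting indices could be used to subset any object whose rows are individuals.

```r
set.seed(1)  # for a reproducible illustration

# Hypothetical population assignment for 16 individuals.
pop <- factor(rep(c("popA", "popB", "popC"), times = c(5, 3, 8)))
ind <- seq_along(pop)

# Same default as in the usage above: size of the smallest population.
nsample <- min(table(pop))

# Draw nsample individual indices from each population, with replacement.
# Indexing via sample.int avoids sample()'s special case for length-one input.
draw <- lapply(split(ind, pop), function(idx) {
  idx[sample.int(length(idx), nsample, replace = TRUE)]
})
boot_idx <- unlist(draw, use.names = FALSE)

table(pop[boot_idx])
```

Repeating the draw and recomputing the statistic of interest each time gives the bootstrap distribution.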
Value
Returns a genlight object with nsample individuals sampled from each population.
Author(s)
Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
See Also
Other base dartR: gl.sort()
Examples
## Not run: 
#bootstrap for 2 possums populations to check effect of sample size on fixed alleles
gl.set.verbosity(0)
pp <- possums.gl[1:60,]
nrep <- 1:10
nss <- seq(1,10,2)
res <- expand.grid(nrep=nrep, nss=nss)
for (i in 1:nrow(res)) {
  dummy <- gl.sample(pp, nsample=res$nss[i], replace=TRUE)
  pas <- gl.report.pa(dummy, plot.out = FALSE)
  res$fixed[i] <- pas$fixed[1]
}
boxplot(fixed ~ nss, data=res)
## End(Not run)
Saves an object in compressed binary format for later rapid retrieval
Description
This is a wrapper for saveRDS().
The script saves the object in binary form to the current workspace and returns the input gl object.
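Because the function wraps saveRDS(), the saved file can be read back with base R's readRDS(). The round trip is sketched below with a small list standing in for a genlight object; the file name is arbitrary.

```r
# A small list stands in for a genlight object in this illustration.
obj <- list(id = "example", values = c(1, 2, 3))
path <- file.path(tempdir(), "example.rds")

saveRDS(obj, file = path)   # compressed binary serialisation of one object
restored <- readRDS(path)   # what a later session would do to retrieve it

identical(obj, restored)
```

Unlike save() and load(), this pair stores a single unnamed object, so the restored object can be assigned to any name.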
Usage
gl.save(x, file, verbose = NULL)
Arguments
x | Name of the genlight object containing SNP genotypes [required]. |
file | Name of the file to receive the binary version of the object [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
The input object
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Examples
gl.save(testset.gl,file.path(tempdir(),'testset.rds'))
Selects colors from one of several palettes and outputs as a vector
Description
This script draws upon a number of specified color libraries to extract a vector of colors for plotting, where the script that follows has a color parameter expecting a vector of colors.
Usage
gl.select.colors(x = NULL, library = NULL, palette = NULL, ncolors = NULL, select = NULL, verbose = NULL)
Arguments
x | Optionally, provide a gl object from which to determine the number of populations [default NULL]. |
library | Name of the color library to be used [default scales::hue_pal]. |
palette | Name of the color palette to be pulled from the specified library [default is library specific]. |
ncolors | number of colors to be displayed and returned [default 9]. |
select | select the colors to retain in the output vector [default NULL]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
The available color libraries and their palettes include:
library 'brewer' and the palettes available can be listed by RColorBrewer::display.brewer.all() and RColorBrewer::brewer.pal.info.
library 'gr.palette' and the palettes available can be listed by grDevices::palette.pals()
library 'gr.hcl' and the palettes available can be listed by grDevices::hcl.pals()
library 'baseR' and the palettes available are: 'rainbow', 'heat', 'topo.colors', 'terrain.colors', 'cm.colors'.
If the nominated palette is not specified, all the palettes will be listed and a default palette will then be chosen.
The color palette will be displayed in the graphics window for the requested number of colors (or 9 if not specified), and the vector of colors returned for later use.
The select parameter can be used to select colors from the specified ncolors. For example, select=c(1,1,3) will select color 1, 1 again and 3 to retain in the final vector. This can be useful for fine-tuning color selection, and matching colors and shapes.
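The effect of select can be pictured as plain index selection into the vector of ncolors colors. The following base R sketch assumes exactly that behaviour and uses grDevices::rainbow directly instead of calling gl.select.colors; `pal`, `sel` and `picked` are illustrative names.

```r
# Sketch: select=c(1,1,3) read as positions in the palette vector.
pal <- grDevices::rainbow(12)   # 12 colors, as with ncolors = 12
sel <- c(1, 1, 3)               # color 1, color 1 again, then color 3
picked <- pal[sel]              # repeated indices yield repeated colors
picked
```

Repeating an index is what makes it possible to give several populations the same color while keeping the vector length equal to the number of populations.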
Value
A vector with the required number of colors
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Other Exploration/visualisation functions: gl.pcoa.plot(), gl.select.shapes(), gl.smearplot()
Examples
# SET UP DATASET
gl <- testset.gl
levels(pop(gl))<-c(rep('Coast',5),rep('Cooper',3),rep('Coast',5),
rep('MDB',8),rep('Coast',7),'Em.subglobosa','Em.victoriae')
# EXAMPLES -- SIMPLE
colors <- gl.select.colors()
colors <- gl.select.colors(library='brewer',palette='Spectral',ncolors=6)
colors <- gl.select.colors(library='baseR',palette='terrain.colors',ncolors=6)
colors <- gl.select.colors(library='baseR',palette='rainbow',ncolors=12)
colors <- gl.select.colors(library='gr.hcl',palette='RdBu',ncolors=12)
colors <- gl.select.colors(library='gr.palette',palette='Pastel 1',ncolors=6)
# EXAMPLES -- SELECTING COLORS
colors <- gl.select.colors(library='baseR',palette='rainbow',ncolors=12,select=c(1,1,1,5,8))
# EXAMPLES -- CROSS-CHECKING WITH A GENLIGHT OBJECT
colors <- gl.select.colors(x=gl,library='baseR',palette='rainbow',ncolors=12,select=c(1,1,1,5,8))
Selects shapes from the base R shape palette and outputs as a vector
Description
This script draws upon the standard R shape palette to extract a vector of shapes for plotting, where the script that follows has a shape parameter expecting a vector of shapes.
Usage
gl.select.shapes(x = NULL, select = NULL, verbose = NULL)
Arguments
x | Optionally, provide a gl object from which to determine the number of populations [default NULL]. |
select | Select the shapes to retain in the output vector [default NULL, all shapes shown and returned]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
By default the shape palette will be displayed in full in the graphics window from which shapes can be selected in a subsequent run, and the vector of shapes returned for later use.
The select parameter can be used to select shapes from the specified 26 shapes available (0-25). For example, select=c(1,1,3) will select shape 1, 1 again and 3 to retain in the final vector. This can be useful for fine-tuning shape selection, and matching colors and shapes.
Value
A vector with the required number of shapes
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
See Also
Other Exploration/visualisation functions: gl.pcoa.plot(), gl.select.colors(), gl.smearplot()
Examples
# SET UP DATASET
gl <- testset.gl
levels(pop(gl))<-c(rep('Coast',5),rep('Cooper',3),rep('Coast',5),
rep('MDB',8),rep('Coast',7),'Em.subglobosa','Em.victoriae')
# EXAMPLES
shapes <- gl.select.shapes() # Select and display available shapes
# Select and display a restricted set of shapes
shapes <- gl.select.shapes(select=c(1,1,1,5,8))
# Select set of shapes and check with no. of pops.
shapes <- gl.select.shapes(x=gl,select=c(1,1,1,5,8))
Sets the default verbosity level
Description
dartR functions have a verbosity parameter that sets the level of reporting during the execution of the function. The verbosity level, set by parameter 'verbose', can be one of: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report. The default value for verbosity is stored in the R environment. This script sets the default value.
Usage
gl.set.verbosity(value = 2)
Arguments
value | Set the default verbosity to be this value: 0, silent only fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2] |
Value
verbosity value [set for all functions]
Author(s)
Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)
See Also
Other dartR-base: gl.drop.ind(), gl.drop.loc(), gl.drop.pop(), gl.edit.recode.ind(), gl.edit.recode.pop(), gl.keep.loc(), gl.make.recode.ind(), gl.read.dart(), gl.recode.ind(), gl.recode.pop()
Examples
gl <- gl.set.verbosity(value=2)
Creates a site frequency spectrum based on a dartR or genlight object
Description
Creates a site frequency spectrum based on a dartR or genlight object
Usage
gl.sfs(x, minbinsize = 0, folded = TRUE, singlepop = FALSE, plot.out = TRUE, verbose = NULL)
Arguments
x | dartR/genlight object |
minbinsize | remove bins from the left of the sfs. For example to remove singletons (alleles only occurring once among all individuals) set minbinsize to 2. If set to zero, also monomorphic (d0) loci are returned. |
folded | if set to TRUE (default) a folded sfs (minor allele frequency sfs) is returned. If set to FALSE then an unfolded (derived allele frequency sfs) is returned. It is assumed that 0 is homozygote for the reference and 2 is homozygote for the derived allele. So you need to make sure your coding is correct. |
singlepop | switch to force to create a one-dimensional sfs, even though the genlight/dartR object contains more than one population |
plot.out | Specify if plot is to be produced [default TRUE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
Returns a site frequency spectrum, either a one-dimensional vector (only a single population in the dartR/genlight object or singlepop=TRUE) or an n-dimensional array (n is the number of populations in the genlight/dartR object). If the dartR/genlight object consists of several populations, the multidimensional site frequency spectrum for each population is returned [=a multidimensional site frequency spectrum]. Be aware that the multidimensional spectrum works only for a limited number of populations and individuals [if these are too high, the table command used internally will throw an error because the number of populations and individuals (and therefore dimensions) is too large]. To get a single sfs for a genlight/dartR object with multiple populations, you need to set singlepop to TRUE. The returned sfs can be used to analyse demographics, e.g. using fastsimcoal2.
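The difference between a folded and an unfolded one-dimensional spectrum can be sketched in base R on a small genotype matrix. This is an illustration only, not the code used by gl.sfs: the matrix `geno` is hypothetical, it uses the coding stated above (0 = homozygote reference, 2 = homozygote derived), and it assumes no missing data.

```r
# Hypothetical genotype matrix: rows = individuals, columns = loci,
# coded 0 (hom. reference), 1 (het.), 2 (hom. derived); no missing data.
geno <- matrix(c(0, 1, 2, 0,
                 0, 1, 2, 1,
                 0, 0, 2, 1),
               nrow = 3, byrow = TRUE)

n_chrom <- 2 * nrow(geno)     # allele copies per locus (diploid)
derived <- colSums(geno)      # derived-allele count per locus

# Unfolded SFS: number of loci with 0, 1, ..., n_chrom derived copies
sfs_unfolded <- table(factor(derived, levels = 0:n_chrom))

# Folded SFS: count the minor allele instead, so bins run 0 .. n_chrom / 2
minor <- pmin(derived, n_chrom - derived)
sfs_folded <- table(factor(minor, levels = 0:(n_chrom / 2)))

sfs_unfolded
sfs_folded
```

In this toy data the first bin (count 0) holds the monomorphic loci, which is the bin that a minbinsize greater than zero would remove.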
Author(s)
Custodian: Bernd Gruber & Carlo Pacioni (Post to https://groups.google.com/d/forum/dartr)
References
Excoffier L., Dupanloup I., Huerta-Sánchez E., Sousa V. C. and Foll M. (2013) Robust demographic inference from genomic and SNP data. PLoS Genetics 9(10)
Examples
gl.sfs(bandicoot.gl, singlepop=TRUE)
gl.sfs(possums.gl[c(1:5,31:33),], minbinsize=1)
Runs Wright-Fisher simulations
Description
This function simulates populations made up of diploid organisms that reproduce in non-overlapping generations. Each individual has a pair of homologous chromosomes that contains interspersed selected and neutral loci. For the initial generation, the genotype for each individual’s chromosomes is randomly drawn from distributions at linkage equilibrium and in Hardy-Weinberg equilibrium.
See documentation and tutorial for a complete description of the simulations. These documents can be accessed at http://georges.biomatix.org/dartR
Take into account that the simulations will take a little bit longer the first time you use the function gl.sim.WF.run() because C++ functions must be compiled.
Usage
gl.sim.WF.run(file_var, ref_table, x = NULL, file_dispersal = NULL, number_iterations = 1, every_gen = 10, sample_percent = 50, store_phase1 = FALSE, interactive_vars = TRUE, seed = NULL, verbose = NULL, ...)
Arguments
file_var | Path of the variables file 'sim_variables.csv' (see details) [required if interactive_vars = FALSE]. |
ref_table | Reference table created by the function gl.sim.WF.table. |
x | Name of the genlight object containing the SNP data to extract values for some simulation variables (see details) [default NULL]. |
file_dispersal | Path of the file with the dispersal table created with the function gl.sim.create_dispersal [default NULL]. |
number_iterations | Number of iterations of the simulations [default 1]. |
every_gen | Generation interval at which simulations should be stored in a genlight object [default 10]. |
sample_percent | Percentage of individuals, from the total population, to sample and save in the genlight object every generation [default 50]. |
store_phase1 | Whether to store simulations of phase 1 in genlight objects [default FALSE]. |
interactive_vars | Run a shiny app to input interactively the values of simulations variables [default TRUE]. |
seed | Set the seed for the simulations [default NULL]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
... | Any variable and its value can be supplied separately within the function call; such a value overrides the input value supplied by the csv file. See tutorial. |
Details
Values for simulation variables can be submitted into the function interactively through a shiny app if interactive_vars = TRUE. Optionally, if interactive_vars = FALSE, values for variables can be submitted by using the csv file 'sim_variables.csv' which can be found by typing in the R console: system.file('extdata', 'sim_variables.csv', package ='dartR').
The values of the variables can be modified using the third column (“value”) of this file.
The output of the simulations can be analysed seamlessly with other dartR functions.
If a genlight object is used as input for some of the simulation variables, this function accesses the information stored in the slots x$position and x$chromosome.
To show further information on the variables in interactive mode, it might be necessary to first call 'library(shinyBS)' for the information to be displayed.
The main characteristics of the simulations are:
Simulations can be parameterised with real-life genetic characteristics such as the number, location, allele frequency and the distribution of fitness effects (selection coefficients and dominance) of loci under selection.
Simulations can recreate specific life histories and demographics, such as source populations, dispersal rate, number of generations, founder individuals, effective population size and census population size.
Each allele in each individual is an agent (i.e., each allele is explicitly simulated).
Each locus is customisable regarding its allele frequencies, selection coefficients, and dominance.
The number of loci, individuals, and populations to be simulated is only limited by computing resources.
Recombination is accurately modeled, and it is possible to use real recombination maps as input.
The ratio between effective population size and census population size can be easily controlled.
The outputs of the simulations are genlight objects for each generation or a subset of generations.
Genlight objects can be used as input for some simulation variables.
Value
Returns genlight objects with simulated data.
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
See Also
Other simulation functions: gl.sim.WF.table(), gl.sim.create_dispersal()
Examples
## Not run: 
ref_table <- gl.sim.WF.table(file_var=system.file('extdata',
'ref_variables.csv', package = 'dartR'),interactive_vars = FALSE)
res_sim <- gl.sim.WF.run(file_var = system.file('extdata',
'sim_variables.csv', package ='dartR'),ref_table=ref_table,
interactive_vars = FALSE)
## End(Not run)
Creates the reference table for running gl.sim.WF.run
Description
This function creates a reference table to be used as input for the function gl.sim.WF.run. The created table has eight columns with the following information for each locus to be simulated:
q - initial frequency.
h - dominance coefficient.
s - selection coefficient.
c - recombination rate.
loc_bp - chromosome location in base pairs.
loc_cM - chromosome location in centiMorgans.
chr_name - chromosome name.
type - SNP type.
The reference table can be further modified as required.
See documentation and tutorial for a complete description of the simulations. These documents can be accessed at http://georges.biomatix.org/dartR
Usage
gl.sim.WF.table(file_var, x = NULL, file_targets_sel = NULL, file_r_map = NULL, interactive_vars = TRUE, seed = NULL, verbose = NULL, ...)
Arguments
file_var | Path of the variables file 'ref_variables.csv' (see details) [required if interactive_vars = FALSE]. |
x | Name of the genlight object containing the SNP data to extract values for some simulation variables (see details) [default NULL]. |
file_targets_sel | Path of the file with the targets for selection (see details) [default NULL]. |
file_r_map | Path of the file with the recombination map (see details) [default NULL]. |
interactive_vars | Run a shiny app to input interactively the values of simulation variables [default TRUE]. |
seed | Set the seed for the simulations [default NULL]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
... | Any variable and its value can be supplied separately within the function call; such a value overrides the input value supplied by the csv file. See tutorial. |
Details
Values for the variables to create the reference table can be submitted into the function interactively through a Shiny app if interactive_vars = TRUE. Optionally, if interactive_vars = FALSE, values for variables can be submitted by using the csv file 'ref_variables.csv' which can be found by typing in the R console: system.file('extdata', 'ref_variables.csv', package ='dartR').
The values of the variables can be modified using the third column (“value”) of this file.
If a genlight object is used as input for some of the simulation variables, this function accesses the information stored in the slots x$position and x$chromosome.
Examples of the format required for the recombination map file and the targets for selection file can be found by typing in the R console:
system.file('extdata', 'fly_recom_map.csv', package ='dartR')
system.file('extdata', 'fly_targets_of_selection.csv', package ='dartR')
To show further information on the variables in interactive mode, it might be necessary to first call 'library(shinyBS)' for the information to be displayed.
Value
Returns a list with the reference table used as input for the function gl.sim.WF.run and a table with the values of the variables used to create the reference table.
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
See Also
Other simulation functions: gl.sim.WF.run(), gl.sim.create_dispersal()
Examples
ref_table <- gl.sim.WF.table(file_var=system.file('extdata',
'ref_variables.csv', package = 'dartR'),interactive_vars = FALSE)
## Not run: 
#uncomment to run 
res_sim <- gl.sim.WF.run(file_var = system.file('extdata',
'sim_variables.csv', package ='dartR'),ref_table=ref_table,
interactive_vars = FALSE)
## End(Not run)
Creates a dispersal file as input for the function gl.sim.WF.run
Description
This function writes a csv file called "dispersal_table.csv" which contains the dispersal variables for each pair of populations to be used as input for the function gl.sim.WF.run.
The values of the variables can be modified using the columns "transfer_each_gen" and "number_transfers" of this file.
See documentation and tutorial for a complete description of the simulations. These documents can be accessed by typing in the R console: browseVignettes(package="dartR")
Usage
gl.sim.create_dispersal(number_pops, dispersal_type = "all_connected", number_transfers = 1, transfer_each_gen = 1, outpath = tempdir(), outfile = "dispersal_table.csv", verbose = NULL)
Arguments
number_pops | Number of populations [required]. |
dispersal_type | One of: "all_connected", "circle" or "line" [default "all_connected"]. |
number_transfers | Number of dispersing individuals. This value can be modified by hand after the file has been created [default 1]. |
transfer_each_gen | Interval, in number of generations, at which dispersal occurs. This value can be modified by hand after the file has been created [default 1]. |
outpath | Path where to save the output file. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory [default tempdir(), mandated by CRAN]. |
outfile | File name of the output file [default 'dispersal_table.csv']. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
A csv file containing the dispersal variables for each pair of populations to be used as input for the function gl.sim.WF.run.
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
See Also
Other simulation functions: gl.sim.WF.run(), gl.sim.WF.table()
Examples
gl.sim.create_dispersal(number_pops=10)
Simulates emigration between populations
Description
A function that allows the exchange of individuals between populations within a genlight object (=simulate emigration between populations).
Usage
gl.sim.emigration(x, perc.mig = NULL, emi.m = NULL, emi.table = NULL)
Arguments
x | A genlight or list of genlight objects [required]. |
perc.mig | Percentage of individuals that migrate (emigrates = nInd times perc.mig) [default NULL]. |
emi.m | Probabilistic emigration matrix (emigrate from=column to=row)[default NULL] |
emi.table | If presented emi.m matrix is ignored. Deterministic emigration as specified in the matrix (a square matrix of dimension of the number of populations). e.g. an entry in the 'emi.table[2,1]<- 5' means that five individuals emigrate from population 1 to population 2 (from=columns and to=row) [default NULL]. |
Details
There are two ways to specify emigration. If an emi.table is provided (a square matrix of dimension of the populations that specifies the emigration from column x to row y), then emigration is deterministic in terms of numbers of individuals as specified in the table. If perc.mig and emi.m are provided, then emigration is probabilistic. The number of emigrants is determined by the population size times perc.mig, and the population to migrate to is then drawn according to the relative probabilities in the columns of the emi.m table.
Be aware that if the diagonal is non-zero then migration can occur into the same patch, so most often you want to set the diagonal of the emi.m matrix to zero. Which individuals are moved is random, but populations are processed in order. It is possible that an individual moves twice within an emigration call (as there is no check, an individual moved from population 1 to 2 can move again from population 2 to 3).
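The probabilistic variant described above can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the code of gl.sim.emigration: the population sizes are made up, the number of emigrants is rounded with round() (the rounding rule is an assumption), and the result is a table of moves rather than a modified genlight object.

```r
set.seed(1)
pop_size <- c(p1 = 20, p2 = 30, p3 = 10)   # hypothetical population sizes
perc.mig <- 0.1                            # 10 percent of each population leaves

# Emigration matrix: from = column, to = row; zero diagonal so that
# nobody "migrates" into its own population.
emi.m <- matrix(c(0,   0.5, 0.5,
                  0.5, 0,   0.5,
                  0.5, 0.5, 0), nrow = 3, byrow = TRUE)

n_emigrants <- round(pop_size * perc.mig)  # emigrants per source population

# For each source population, draw a destination for every emigrant using
# the relative probabilities in that population's column of emi.m.
moves <- do.call(rbind, lapply(seq_along(pop_size), function(from) {
  to <- sample(seq_along(pop_size), n_emigrants[from],
               replace = TRUE, prob = emi.m[, from])
  data.frame(from = rep(from, length(to)), to = to)
}))
moves
```

With a non-zero diagonal the same code would produce rows where from equals to, which is the situation the paragraph above warns about.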
Value
A list or a single [depends on the input] genlight object, where emigration between populations has happened
Author(s)
Custodian: Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
Examples
x <- possums.gl
#one individual moves from every population to
#every other population
emi.tab <- matrix(1, nrow=nPop(x), ncol=nPop(x))
diag(emi.tab)<- 0
np <- gl.sim.emigration(x, emi.table=emi.tab)
np
Simulates individuals based on the allele frequencies provided via a genlight object.
Description
This function simulates individuals based on the allele frequencies of a genlight object. The output is a genlight object with the same number of loci as the input genlight object.
Usage
gl.sim.ind(x, n = 50, popname = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
n | Number of individuals that should be simulated [default 50]. |
popname | A population name for the simulated individuals [default NULL]. |
Details
The function can be used to simulate populations for sampling designs or for power analysis. Check the example below where the effect of drift is explored, by simply simulating a genlight object over several generations and feeding in the allele frequencies of the previous generation. The beauty of the function is that it is lightning fast. Be aware that this is a simulation and, to avoid lengthy error checking, the function crashes if there are loci that have only NAs. If such a case can occur during your simulation, those loci need to be removed before the function is called.
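Simulating genotypes from allele frequencies can be sketched in base R. The sketch assumes that each genotype is drawn as two independent allele copies, i.e. a binomial draw of size 2 with the locus allele frequency as probability; whether gl.sim.ind uses exactly this scheme is an assumption, and `parent`, `p` and `sim` are hypothetical names.

```r
set.seed(7)
# Hypothetical parent genotypes (individuals x loci, coded 0/1/2)
parent <- matrix(c(0, 1, 2,
                   0, 2, 2,
                   0, 1, 2,
                   0, 0, 2), nrow = 4, byrow = TRUE)

# Frequency of the alternative allele per locus: mean genotype / 2
p <- colMeans(parent, na.rm = TRUE) / 2

# Draw n new individuals; each genotype is Binomial(2, p) for its locus.
# A locus with only NAs would give p = NaN, matching the warning above.
n <- 10
sim <- sapply(p, function(freq) rbinom(n, size = 2, prob = freq))
dim(sim)   # n individuals x number of loci
```

Loci that are fixed in the parents (frequency 0 or 1) stay fixed in the simulated individuals, which is why repeated application shows the loss of variation through drift.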
Value
A genlight object with n individuals.
Author(s)
Bernd Gruber (bernd.gruber@canberra.edu.au)
Examples
glsim <- gl.sim.ind(testset.gl, n=10, popname='sims')
glsim
###Simulate drift over 10 generation
# assuming a bottleneck of only 10 individuals
# [ignoring effect of mating and mutation]
# Simulate 20 individuals with no structure and 50 SNP loci
founder <- glSim(n.ind = 20, n.snp.nonstruc = 50, ploidy=2)
#number of fixed loci in the first generation
res <- sum(colMeans(as.matrix(founder), na.rm=TRUE) %%2 ==0)
simgl <- founder
#49 generations of only 10 individuals
for (i in 2:50)
{
   simgl <- gl.sim.ind(simgl, n=10, popname='sims')
   res[i]<- sum(colMeans(as.matrix(simgl), na.rm=TRUE) %%2 ==0)
}
plot(1:50, res, type='b', xlab='generation', ylab='# fixed loci')
Simulates mutations within a genlight object
Description
This script is intended to be used within the simulation framework of dartR. It adds the ability to apply a constant mutation rate across all loci. It currently works only for biallelic data sets (SNPs). The mutation rate is applied to every allele position; mutations at loci with missing values are ignored. In principle, 'double mutations' at the same locus can occur, but they should be rare.
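A constant per-allele mutation rate on biallelic 0/1/2 data can be sketched in base R. This is a minimal sketch, not the implementation of gl.sim.mutate: it assumes that each of the two allele copies of a genotype flips state independently with probability mut.rate, it uses a hypothetical matrix `geno` without missing values, and `mutate_geno` is an illustrative helper name.

```r
# Flip each allele copy (0 = reference, 1 = alternative) with prob. mut.rate
mutate_geno <- function(geno, mut.rate) {
  a1 <- (geno >= 1) * 1L                 # first allele copy
  a2 <- (geno == 2) * 1L                 # second allele copy
  flip1 <- rbinom(length(a1), 1, mut.rate)
  flip2 <- rbinom(length(a2), 1, mut.rate)
  # abs(a - flip) toggles the allele wherever flip == 1
  abs(a1 - flip1) + abs(a2 - flip2)
}

set.seed(3)
geno <- matrix(sample(0:2, 40, replace = TRUE), nrow = 4)  # 4 ind x 10 loci
mutated <- mutate_geno(geno, mut.rate = 0.05)  # high rate so changes show up
table(before = geno, after = mutated)
```

Because both copies of a genotype are treated independently, a change from 0 to 2 (a 'double mutation') is possible but has probability mut.rate squared.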
Usage
gl.sim.mutate(x, mut.rate = 1e-06)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
mut.rate | Constant mutation rate over nInd*nLoc*2 possible locations [default 1e-6] |
Value
Returns a genlight object with the applied mutations
Author(s)
Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
Examples
b2 <- gl.sim.mutate(bandicoot.gl,mut.rate=1e-4 )
#check the mutations that have occurred
table(as.matrix(bandicoot.gl), as.matrix(b2))
Simulates a specified number of offspring based on alleles provided by potential father(s) and mother(s)
Description
This takes a population (or a single individual) of fathers (provided as a genlight object) and mother(s) and simulates offspring based on 'random' mating. It can be used to simulate population dynamics and check the effect of those dynamics on allele frequencies and the number of alleles. Another application is to simulate relatedness of siblings and compare it to actual relatedness found in the population to determine kinship.
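How a single offspring genotype arises from two parental genotypes can be sketched in base R under Mendelian inheritance. The sketch assumes biallelic loci coded 0/1/2 and independent segregation at each locus; the vectors `father` and `mother` and the helper `gamete` are hypothetical, and the sketch does not claim to mirror the internals of gl.sim.offspring.

```r
set.seed(11)
# Hypothetical parental genotypes at 5 loci (0/1/2 = copies of the alt allele)
father <- c(0, 1, 2, 1, 2)
mother <- c(0, 0, 2, 1, 1)

# A parent carrying g alt copies passes on the alt allele with probability g / 2
gamete <- function(g) rbinom(length(g), size = 1, prob = g / 2)

# The offspring genotype is the sum of one gamete from each parent
offspring <- gamete(father) + gamete(mother)
offspring
```

Loci where both parents are homozygous for the same allele are deterministic in the offspring, while heterozygous parents contribute either allele with equal probability.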
Usage
gl.sim.offspring(fathers, mothers, noffpermother, sexratio = 0.5)
Arguments
fathers | Genlight object of potential fathers [required]. |
mothers | Genlight object of potential mothers [required]. |
noffpermother | Number of offspring per mother [required]. |
sexratio | The sex ratio of simulated offspring (females / (females + males); 1 equals 100 percent females) [default 0.5]. |
Value
A genlight object with n individuals.
Author(s)
Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
Examples
#Simulate 10 potential fathers
gl.fathers <- glSim(10, 20, ploidy=2)
#Simulate 10 potential mothers
gl.mothers <- glSim(10, 20, ploidy=2)
gl.sim.offspring(gl.fathers, gl.mothers, 2, sexratio=0.5)
Smear plot of SNP or presence/absence (SilicoDArT) data
Description
Each locus is color coded for scores of 0, 1, 2 and NA for SNP data and 0, 1 and NA for presence/absence (SilicoDArT) data. Individual labels can be added and individuals can be grouped by population.
The plot may become cluttered if ind_labels is TRUE. If there are too many individuals, it is best to use ind_labels_size = 0.
Usage
gl.smearplot(x, ind_labels = FALSE, group_pop = FALSE, ind_labels_size = 10, plot_colors = NULL, posi = "bottom", save2tmp = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required]. |
ind_labels | If TRUE, individuals are labelled with indNames(x) [default FALSE]. |
group_pop | If ind_labels is TRUE, group by population [default FALSE]. |
ind_labels_size | Size of the individual labels [default 10]. |
plot_colors | Vector with four color names for homozygotes for the reference allele, heterozygotes, homozygotes for the alternative allele and for missing values (NA), e.g. four_colours [default NULL]. Can be set to "hetonly", which defines colors to only show heterozygotes in the genlight object |
posi | Position of the legend: 'left', 'top', 'right', 'bottom' or 'none' [default 'bottom']. |
save2tmp | If TRUE, saves plot to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL]. |
Value
Returns unaltered genlight object
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
See Also
Other Exploration/visualisation functions: gl.pcoa.plot(), gl.select.colors(), gl.select.shapes()
Examples
gl.smearplot(testset.gl,ind_labels=FALSE)
gl.smearplot(testset.gs[1:10,],ind_labels=TRUE)
Re-sorts genlight objects
Description
Often it is desirable to have the individuals of a genlight object sorted by population name or individual name, for example to obtain a more informative gl.smearplot (showing banding patterns for populations). Sorting by loci can also be informative in some instances. This function provides the ability to sort individuals of a genlight object by providing the order of individuals or populations, and also to sort by a loci metric providing the order of loci. See examples below for specifics.
Usage
gl.sort(x, sort.by = "pop", order.by = NULL, verbose = NULL)
Arguments
x | genlight object containing SNP/silicodart genotypes |
sort.by | either "ind" or "pop" [default "pop"]. |
order.by | Vector that is used to order individuals or loci. Depending on the sort.by parameter, this needs to be a vector of length nPop(genlight) for populations or nInd(genlight) for individuals. If not specified, alphabetical order of populations or individuals is used. For sort.by="ind", order.by can also be a vector specifying the order for each individual (for example another ind.metrics) |
verbose | set verbosity |
Details
This is a convenience function to facilitate sorting of individuals within the genlight object. For example, if you want to visualise the "band" of each population in a gl.smearplot, then the order of individuals is important.
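The two kinds of ordering mentioned here can be sketched in base R on a plain matrix. The sketch assumes that a per-individual order.by vector is used like the argument of order(), and that a per-population vector gives the desired sequence of population names; both readings are assumptions, and `geno`, `pop` and `pop_order` are hypothetical objects.

```r
# Hypothetical genotype matrix with missing values, one row per individual
geno <- matrix(c(0, NA, 2,
                 NA, NA, 1,
                 0, 1, 2), nrow = 3, byrow = TRUE,
               dimnames = list(c("i1", "i2", "i3"), NULL))
pop <- c("QLD", "WA", "NSW")           # population of each individual

# Sort individuals by number of missing values (fewest first)
miss <- rowSums(is.na(geno))
by_missing <- geno[order(miss), , drop = FALSE]
rownames(by_missing)

# Sort individuals by a user-defined population sequence (West to East)
pop_order <- c("WA", "SA", "VIC", "NSW", "QLD")
by_pop <- geno[order(match(pop, pop_order)), , drop = FALSE]
rownames(by_pop)
```

In a genlight object the same permutation would also have to be applied to the individual metrics and coordinates, which is what the Value section describes.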
Value
Returns a reordered genlight object. Sorts also the ind/loc.metrics and coordinates accordingly
Author(s)
Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
See Also
Other base dartR: gl.sample()
Examples
#sort by populations
bc <- gl.sort(bandicoot.gl)
#sort from West to East
bc2 <- gl.sort(bandicoot.gl, sort.by="pop" ,
order.by=c("WA", "SA", "VIC", "NSW", "QLD"))
#sort by missing values
miss <- rowSums(is.na(as.matrix(bandicoot.gl)))
bc3 <- gl.sort(bandicoot.gl, sort.by="ind", order.by=miss)
gl.smearplot(bc3)
Spatial autocorrelation following Smouse and Peakall 1999
Description
Global spatial autocorrelation is a multivariate approach combining all loci into a single analysis. The autocorrelation coefficient "r" is calculated for each pair of individuals in each specified distance class. For more information see Smouse and Peakall 1999, Peakall et al. 2003 and Smouse et al. 2008.
Usage
gl.spatial.autoCorr(
  x = NULL, Dgeo = NULL, Dgen = NULL, coordinates = "latlon",
  Dgen_method = "Euclidean", Dgeo_trans = "Dgeo", Dgen_trans = "Dgen",
  bins = 5, reps = 100, plot.pops.together = FALSE, permutation = TRUE,
  bootstrap = TRUE, plot_theme = NULL, plot_colors_pop = NULL,
  CI_color = "red", plot.out = TRUE, save2tmp = FALSE, verbose = NULL)
Arguments
x | Genlight object [default NULL]. |
Dgeo | Geographic distance matrix if no genlight object is provided. This is typically a Euclidean distance but it can be any meaningful (geographical) distance metric [default NULL]. |
Dgen | Genetic distance matrix if no genlight object is provided [default NULL]. |
coordinates | Can be either 'latlon', 'xy' or a two column data.frame (with column names 'lat','lon', 'x', 'y'). Coordinates are provided via |
Dgen_method | Method to calculate genetic distances. See details [default "Euclidean"]. |
Dgeo_trans | Transformation to be used on the geographic distances. See Dgen_trans [default "Dgeo"]. |
Dgen_trans | You can provide a formula to transform the genetic distance. The transformation can be applied as a formula using Dgen as the variable to be transformed. For example: |
bins | The number of bins for the distance classes (i.e. |
reps | The number to be used for permutation and bootstrap analyses [default 100]. |
plot.pops.together | Plot all the populations in one plot. Confidence intervals from permutations are not shown [default FALSE]. |
permutation | Whether permutation calculations for the null hypothesis of no spatial structure should be carried out [default TRUE]. |
bootstrap | Whether bootstrap calculations to compute the 95% confidence intervals around r should be carried out [default TRUE]. |
plot_theme | Theme for the plot. See details [default NULL]. |
plot_colors_pop | A color palette for populations or a list with as many colors as there are populations in the dataset [default NULL]. |
CI_color | Color for the shade of the 95% confidence intervals around the r estimates [default "red"]. |
plot.out | Specify if plot is to be produced [default TRUE]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]. |
Details
This function executes a modified version of spautocorr from the package PopGenReport. Differently from PopGenReport, this function also computes the 95% confidence intervals around r via bootstraps, the 95% confidence intervals around the null hypothesis of no spatial structure and the one-tail test via permutation, and the correction factor described by Peakall et al. 2003.
The input can be i) a genlight object (which has to have the latlon slot populated), ii) a pair ofDgeo andDgen, which have to beeithermatrix ordist objects, or iii) alist of thematrix ordist objects if the analysis needs to be carried out for multiple populations (in this case, all the elements of thelist have to be of the same class (i.e.matrix ordist) and the population order in the two lists has to be the same.
If the input is a genlight object, the function calculates the linear distanceforDgeo and the relevantDgen matrix (seeDgen_method) for each population. When the method selected is a genetic similarity matrix (e.g. "simple" distance), the matrix is internally transformed with1 - Dgen so that positive values of autocorrelation coefficients indicates more related individuals similarly as implemented in GenAlEx. If the user provide the distance matrices, care must be taken in interpreting the results becausesimilarity matrix will generate negative values for closely related individuals.
Ifmax(Dgeo)>1000 (e.g. the geographic distances are in thousands of metres), values are divided by 1000 (in the example before these would then become km) to facilitate readability of the plots.
Ifbins is of length = 1 it is interpreted as the number of (even)bins to use. In this case the starting point is always the minimum value in the distance matrix, and the last is the maximum. If it is a numeric vector of length>1, it is interpreted as the breaking points. In this case, the first has to be the lowest value, and the last has to be the highest. There are no internal checks for this and it is user responsibility to ensure thatdistance classes are properly set up. If that is not the case, data that falloutside the range provided will be dropped. The number of bins will belength(bins) - 1.
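The two ways of specifying bins can be sketched in base R. This is an illustration only and not the function's internal code: the distances in d and the use of cut() with include.lowest = TRUE are assumptions made for the example.

```r
# Hypothetical pairwise geographic distances (metres), for illustration only
d <- c(120, 450, 980, 1500, 2300, 3100, 4700, 5200)

# Case 1: bins is a single number, giving that many even classes that
# span the minimum to the maximum distance
bins <- 4
breaks_even <- seq(min(d), max(d), length.out = bins + 1)
classes_even <- cut(d, breaks = breaks_even, include.lowest = TRUE)

# Case 2: bins is a vector of break points, giving length(bins) - 1 classes;
# distances outside the range become NA (i.e. they are dropped)
bins_vec <- c(0, 1000, 2000, 4000)
classes_vec <- cut(d, breaks = bins_vec, include.lowest = TRUE)

table(classes_even)
table(classes_vec, useNA = "ifany")
```

In the second case the two distances above 4000 fall outside the supplied range, which is the situation the paragraph above warns about.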
The permutation constructs the 95% confidence intervals around the null hypothesis of no spatial structure (this is a two-tail test). The same data are also used to calculate the probability of the one-tail test (see references below for details).
Bootstrap calculations are skipped and NA is returned when the number of possible combinations given the sample size of any given distance class is < reps.
Methods available to calculate genetic distances for SNP data:
"propShared" using the function gl.propShared.
"grm" using the function gl.grm.
"Euclidean" using the function gl.dist.ind.
"Simple" using the function gl.dist.ind.
"Absolute" using the function gl.dist.ind.
"Manhattan" using the function gl.dist.ind.
Methods available to calculate genetic distances for SilicoDArT data:
"Euclidean" using the function gl.dist.ind.
"Simple" using the function gl.dist.ind.
"Jaccard" using the function gl.dist.ind.
"Bray-Curtis" using the function gl.dist.ind.
Plots and table are saved to the temporary directory (tempdir) and can be accessed with the function gl.print.reports and listed with the function gl.list.reports. Note that they can be accessed only in the current R session because tempdir is cleared each time that the R session is closed.
Examples of other themes that can be used can be consulted in
Value
Returns a data frame with the following columns:
Bin The distance classes
N The number of pairwise comparisons within each distance class
r.uc The uncorrected autocorrelation coefficient
Correction the correction
r The corrected autocorrelation coefficient
L.r The corrected autocorrelation coefficient lower limit (if bootstrap = TRUE)
U.r The corrected autocorrelation coefficient upper limit (if bootstrap = TRUE)
L.r.null.uc The uncorrected lower limit for the null hypothesis of no spatial autocorrelation (if permutation = TRUE)
U.r.null.uc The uncorrected upper limit for the null hypothesis of no spatial autocorrelation (if permutation = TRUE)
L.r.null The corrected lower limit for the null hypothesis of no spatial autocorrelation (if permutation = TRUE)
U.r.null The corrected upper limit for the null hypothesis of no spatial autocorrelation (if permutation = TRUE)
p.one.tail The p value of the one tail statistical test
Author(s)
Carlo Pacioni, Bernd Gruber & Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
References
Smouse PE, Peakall R. 1999. Spatial autocorrelation analysis of individual multiallele and multilocus genetic structure. Heredity 82:561-573.
Double, MC, et al. 2005. Dispersal, philopatry and infidelity: dissecting local genetic structure in superb fairy-wrens (Malurus cyaneus). Evolution 59, 625-635.
Peakall, R, et al. 2003. Spatial autocorrelation analysis offers new insights into gene flow in the Australian bush rat, Rattus fuscipes. Evolution 57, 1182-1195.
Smouse, PE, et al. 2008. A heterogeneity test for fine-scale genetic structure. Molecular Ecology 17, 3389-3400.
Gonzales, E, et al. 2010. The impact of landscape disturbance on spatial genetic structure in the Guanacaste tree, Enterolobium cyclocarpum (Fabaceae). Journal of Heredity 101, 133-143.
Beck, N, et al. 2008. Social constraint and an absence of sex-biased dispersal drive fine-scale genetic structure in white-winged choughs. Molecular Ecology 17, 4346-4358.
Examples
require("dartR.data")
res <- gl.spatial.autoCorr(platypus.gl, bins=seq(0,10000,2000))
# using one population, showing sample size
test <- gl.keep.pop(platypus.gl,pop.list = "TENTERFIELD")
res <- gl.spatial.autoCorr(test, bins=seq(0,10000,2000),CI_color = "green")
Subsamples n loci from a genlight object and return it as a genlight object
Description
This is a support script, to subsample a genlight {adegenet} object based on loci. Two methods are used to subsample, random and based on information content.
Usage
gl.subsample.loci(x, n, method = "random", mono.rm = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required]. |
n | Number of loci to include in the subsample [required]. |
method | Method: 'random', in which case the loci are sampled at random; or 'pic', in which case the top n loci ranked on information content are chosen. Information content is stored in AvgPIC in the case of SNP data and in PIC in the case of presence/absence (SilicoDArT) data [default 'random']. |
mono.rm | Delete monomorphic loci before sampling [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
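The difference between the two values of method can be sketched in base R. The locus names and PIC values below are hypothetical, and the sketch only mirrors the description of the argument; it is not the package's own code.

```r
set.seed(42)  # only to make the random draw repeatable

# Hypothetical locus names and information content (PIC) values
loc_names <- paste0("locus", 1:10)
pic <- c(0.10, 0.45, 0.30, 0.05, 0.50, 0.25, 0.40, 0.15, 0.35, 0.20)
n <- 3

# method = 'random': draw n loci without replacement
keep_random <- sort(sample(seq_along(loc_names), size = n))

# method = 'pic': keep the n loci with the highest information content
keep_pic <- order(pic, decreasing = TRUE)[seq_len(n)]

loc_names[keep_random]
loc_names[keep_pic]
```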
Value
A genlight object with n loci
Author(s)
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
Examples
# SNP data
gl2 <- gl.subsample.loci(testset.gl, n=200, method='pic')
# Tag P/A data
gl2 <- gl.subsample.loci(testset.gl, n=100, method='random')
Tests the difference in heterozygosity between populations taken pairwise
Description
Calculates heterozygosities (expected or observed) for each population in a genlight object, and uses re-randomization to test the statistical significance of differences in heterozygosity between populations taken pairwise.
Usage
gl.test.heterozygosity(x, nreps = 100, alpha1 = 0.05, alpha2 = 0.01, test_het = "He", plot.out = TRUE, max_plots = 6, plot_theme = theme_dartR(), plot_colors = two_colors, save2tmp = FALSE, verbose = NULL)
Arguments
x | A genlight object containing the SNP genotypes [required]. |
nreps | Number of replications of the re-randomization [default 100]. |
alpha1 | First significance level for comparison with diff=0 on plot [default 0.05]. |
alpha2 | Second significance level for comparison with diff=0 on plot [default 0.01]. |
test_het | Whether to test difference using observed heterozygosity ("Ho") or expected heterozygosity ("He") [default "He"]. |
plot.out | If TRUE, plots a sampling distribution of the differences for each comparison [default TRUE]. |
max_plots | Maximum number of plots to print per page [default 6]. |
plot_theme | Theme for the plot. See Details for options [default theme_dartR()]. |
plot_colors | List of two color names for the borders and fill of the plots [default two_colors]. |
save2tmp | If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]. |
Details
Function's output
If plot.out = TRUE, plots are created showing the sampling distribution for the difference between each pair of heterozygosities, marked with the critical limits alpha1 and alpha2, the observed heterozygosity, and the zero value (if in range).
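The idea of a sampling distribution of differences with critical limits can be sketched in base R. This is one plausible reading of the description, not the function's implementation: the per-locus heterozygosities are simulated, and resampling loci with replacement is an assumption made for the example.

```r
set.seed(1)  # only to make the sketch repeatable

# Hypothetical per-locus expected heterozygosity in two populations
n_loc <- 200
he_pop1 <- runif(n_loc, min = 0.10, max = 0.40)
he_pop2 <- runif(n_loc, min = 0.05, max = 0.30)

nreps <- 1000
alpha1 <- 0.05
alpha2 <- 0.01

# Observed difference between the population means
obs_diff <- mean(he_pop1) - mean(he_pop2)

# Re-randomization: resample loci with replacement (an assumption made
# here) and recompute the difference each time
diffs <- replicate(nreps, {
  idx <- sample.int(n_loc, replace = TRUE)
  mean(he_pop1[idx]) - mean(he_pop2[idx])
})

# Critical limits of the sampling distribution at the two levels
lim1 <- quantile(diffs, c(alpha1 / 2, 1 - alpha1 / 2))
lim2 <- quantile(diffs, c(alpha2 / 2, 1 - alpha2 / 2))

# The difference is flagged when zero falls outside the limits
sig1 <- 0 < lim1[[1]] || 0 > lim1[[2]]
sig2 <- 0 < lim2[[1]] || 0 > lim2[[2]]
c(obs_diff = obs_diff, sig_alpha1 = sig1, sig_alpha2 = sig2)
```

A larger number of replications gives more stable limits, which is why running with a very small nreps is only suitable for quick checks.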
Plots and table are saved to the temporary directory (tempdir) and can be accessed with the function gl.print.reports and listed with the function gl.list.reports. Note that they can be accessed only in the current R session because tempdir is cleared each time that the R session is closed.
Examples of other themes that can be used can be consulted in
Value
A dataframe containing population labels, heterozygosities and sample sizes
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
Examples
out <- gl.test.heterozygosity(platypus.gl, nreps=1, verbose=3, plot.out=TRUE)
Outputs an nj tree to summarize genetic similarity among populations
Description
This function is a wrapper for the nj function of package ape applied to Euclidean distances calculated from the genlight object.
Usage
gl.tree.nj(x, d_mat = NULL, type = "phylogram", outgroup = NULL, labelsize = 0.7, treefile = NULL, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
d_mat | Distance matrix [default NULL]. |
type | Type of dendrogram "phylogram"|"cladogram"|"fan"|"unrooted" [default "phylogram"]. |
outgroup | Vector containing the population names that are the outgroups [default NULL]. |
labelsize | Size of the labels as a proportion of the graphics default [default 0.7]. |
treefile | Name of the file for the tree topology using Newick format [default NULL]. |
verbose | Specify the level of verbosity: 0, silent, fatal errors only; 1, flag function begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2]. |
Details
A Euclidean distance matrix is calculated by default [d_mat = NULL]. Optionally the user can use as input for the tree any other distance matrix using this parameter, see for example the function gl.dist.pop.
Value
A tree file of class phylo.
Author(s)
Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)
Examples
# SNP data
gl.tree.nj(testset.gl,type='fan')
# Tag P/A data
gl.tree.nj(testset.gs,type='fan')
res <- gl.tree.nj(platypus.gl)
Writes out data from a genlight object to csv file
Description
This script writes to file the SNP genotypes with specimens as entities (columns) and loci as attributes (rows). Each row has associated locus metadata. Each column, with header of specimen id, has population in the first row.
The data coding differs from the DArT 1row format in that 0 = reference homozygous, 2 = alternate homozygous, 1 = heterozygous, and NA = missing SNP assignment.
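The layout described above can be sketched in base R. This is a simplified illustration with a hypothetical genotype matrix; it omits the locus metadata columns and is not the function's own code.

```r
# Hypothetical genotype matrix: individuals in rows, loci in columns,
# coded 0 = reference homozygous, 1 = heterozygous,
# 2 = alternate homozygous, NA = missing
geno <- matrix(c(0, 1, 2,
                 2, NA, 0),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("ind1", "ind2"),
                               c("locA", "locB", "locC")))
pop <- c(ind1 = "popX", ind2 = "popY")

# Transpose so loci become rows, and put the population in the first row
out <- rbind(pop = pop[rownames(geno)], t(geno))

outfile <- file.path(tempdir(), "sketch_1row.csv")
write.csv(out, outfile, quote = FALSE)
read.csv(outfile, row.names = 1)
```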
Usage
gl.write.csv(x, outfile = "outfile.csv", outpath = tempdir(), verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
outfile | File name of the output file (including extension) [default "outfile.csv"]. |
outpath | Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
Saves a genlight object to csv, returns NULL.
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Examples
# SNP data
gl.write.csv(testset.gl, outfile='SNP_1row.csv')
# Tag P/A data
gl.write.csv(testset.gs, outfile='PA_1row.csv')
Converts a genlight object into a format suitable for input to Bayescan
Description
The output text file contains the SNP data and relevant Bayescan command lines to guide input.
Usage
gl2bayescan(x, outfile = "bayescan.txt", outpath = tempdir(), verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
outfile | File name of the output file (including extension) [default bayescan.txt]. |
outpath | Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
returns no value (i.e. NULL)
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
References
Foll M and OE Gaggiotti (2008) A genome scan method to identify selected loci appropriate for both dominant and codominant markers: A Bayesian perspective. Genetics 180: 977-993.
Examples
out <- gl2bayescan(testset.gl)
Converts a genlight object into a format suitable for input to the BPP program
Description
This function generates the sequence alignment file and the Imap file. The control file should be produced by the user.
Usage
gl2bpp(x, method = 1, outfile = "output_bpp.txt", imap = "Imap.txt", outpath = tempdir(), verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
method | One of 1 | 2, see details [default = 1]. |
outfile | Name of the sequence alignment file ["output_bpp.txt"]. |
imap | Name of the Imap file ["Imap.txt"]. |
outpath | Path where to save the output file (set to tempdir by default) |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
If method = 1, heterozygous positions are replaced by standard ambiguity codes.
If method = 2, the heterozygous state is resolved by randomly assigning one or the other SNP variant to the individual.
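The two ways of resolving a heterozygous position can be sketched in base R. The helper resolve_het is hypothetical and written only for this illustration; the lookup table holds the standard IUPAC codes for two-base ambiguities.

```r
# IUPAC ambiguity codes for the six possible two-base SNPs
iupac <- c(AC = "M", AG = "R", AT = "W", CG = "S", CT = "Y", GT = "K")

# Resolve one heterozygous genotype given its alleles, e.g. "G/A".
# method 1 returns the ambiguity code, method 2 a randomly chosen allele.
# (resolve_het is a hypothetical helper, not a dartR function.)
resolve_het <- function(alleles, method = 1) {
  bases <- strsplit(alleles, "/", fixed = TRUE)[[1]]
  if (method == 1) {
    key <- paste(sort(bases), collapse = "")
    unname(iupac[key])
  } else {
    sample(bases, size = 1)
  }
}

resolve_het("G/A", method = 1)  # "R"
set.seed(7)
resolve_het("G/A", method = 2)  # either "G" or "A"
```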
Trimmed sequences for which the SNP has been trimmed out, rarely, by adapter mis-identity are deleted.
This function requires 'TrimmedSequence' to be among the locus metrics (@other$loc.metrics) and information of the type of alleles (slot loc.all e.g. 'G/A') and the position of the SNP in slot position of the 'genlight' object (see testset.gl@position and testset.gl@loc.all for how to format these slots).
It's important to keep in mind that analyses based on coalescent theory, like those done by the programme BPP, are meant to be used with sequence data. In this type of data, large chunks of DNA are sequenced, so when we find polymorphic sites along the sequence, we know they are all on the same chromosome. This kind of data, in which we know which chromosome each allele comes from, is called "phased data." Most data from reduced representation genome-sequencing methods, like DArTseq, is unphased, which means that we don't know which chromosome each allele comes from. So, if we apply coalescence theory to data that is not phased, we will get biased results. As in Ellegren et al. (2012), one way to deal with this is to "haplodize" each genotype by randomly choosing one allele from heterozygous genotypes, using method = 2.
Be mindful that there is little information in the literature on the validity of this method.
Value
returns no value (i.e. NULL)
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
References
Ellegren, Hans, et al. "The genomic landscape of species divergence in Ficedula flycatchers." Nature 491.7426 (2012): 756-760.
Flouri T., Jiao X., Rannala B., Yang Z. (2018) Species Tree Inference with BPP using Genomic Sequences and the Multispecies Coalescent. Molecular Biology and Evolution, 35(10):2585-2593. doi:10.1093/molbev/msy147
Examples
require(dartR.data)
test <- platypus.gl
test <- gl.filter.callrate(test,threshold = 1)
test <- gl.filter.monomorphs(test)
test <- gl.subsample.loci(test,n=25)
gl2bpp(x = test)
Convert a genlight object to a dartR object
Description
This function converts a 'genlight' object into a 'dartR' object by changing its class attribute. It is used to convert legacy data sets to the new dartR format.
Usage
gl2dartR(x, filename = NULL, file.path = tempdir())
Arguments
x | An object of class 'genlight' to be converted. |
filename | A character string specifying the name of the file to save the converted object. [default is gl.rds] |
file.path | A character string specifying the path to save the file. |
Value
The input object with class changed to '"dartR"' and its package attribute set to '"dartR.base"'.
Examples
simgl <- glSim(10, 100, ploidy = 2, indnames=1:10, locnames=1:100) # Simulating a genlight object
simgl <- gl2dartR(simgl)
pop(simgl)<- rep("A",10)
indNames(simgl) <- paste0("ind",1:10)
gl.smearplot(simgl, verbose=0)
Creates a dataframe suitable for input to package {Demerelate} from a genlight {adegenet} object
Description
Creates a dataframe suitable for input to package {Demerelate} from a genlight {adegenet} object
Usage
gl2demerelate(x, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
A dataframe suitable as input to package {Demerelate}
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
Examples
df <- gl2demerelate(testset.gl)
Converts a genlight object into eigenstrat format
Description
This function outputs three files:
genotype file: contains genotype data for each individual at each SNP with an extension 'eigenstratgeno.'
snp file: contains information about each SNP with an extension 'snp.'
indiv file: contains information about each individual with an extension 'ind.'
Usage
gl2eigenstrat(x, outfile = "gl_eigenstrat", outpath = tempdir(), snp_pos = 1, snp_chr = 1, pos_cM = 0, sex_code = "unknown", phen_value = "Case", verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
outfile | File name of the output file [default 'gl_eigenstrat']. |
outpath | Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory. |
snp_pos | Field name from the slot loc.metrics where the SNP position is stored [default 1]. |
snp_chr | Field name from the slot loc.metrics where the chromosome of each is stored [default 1]. |
pos_cM | A vector, with as many elements as there are loci, containing the SNP position in morgans or centimorgans [default 0]. |
sex_code | A vector, with as many elements as there are individuals, containing the sex code ('male', 'female', 'unknown') [default 'unknown']. |
phen_value | A vector, with as many elements as there are individuals, containing the phenotype value ('Case', 'Control') [default 'Case']. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
Eigenstrat only accepts chromosomes coded as numeric values, as follows: X chromosome is encoded as 23, Y is encoded as 24, mtDNA is encoded as 90, and XY is encoded as 91. SNPs with illegal chromosome values, such as 0, will be removed.
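The coding rule can be sketched in base R. The chromosome labels below are hypothetical (in particular, the label "MT" for mitochondrial DNA is an assumption), and the sketch only illustrates the mapping described above rather than the function's own code.

```r
# Hypothetical chromosome labels as they might appear in loc.metrics
chr <- c("1", "2", "X", "Y", "MT", "XY", "0", "scaffold_12")

# Special chromosomes and their numeric codes, as listed above;
# the label "MT" for mitochondrial DNA is an assumption
special <- c(X = 23, Y = 24, MT = 90, XY = 91)

chr_num <- suppressWarnings(as.integer(chr))  # labels that are already numeric
is_special <- chr %in% names(special)
chr_num[is_special] <- special[chr[is_special]]

# Loci without a legal chromosome value (0 or not recognised) are removed
keep <- !is.na(chr_num) & chr_num > 0
data.frame(chr = chr[keep], code = chr_num[keep])
```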
Value
returns no value (i.e. NULL)
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
References
Patterson, N., Price, A. L., & Reich, D. (2006). Population structure and eigenanalysis. PLoS genetics, 2(12), e190.
Price, A. L., Patterson, N. J., Plenge, R. M., Weinblatt, M. E., Shadick, N. A., & Reich, D. (2006). Principal components analysis corrects for stratification in genome-wide association studies. Nature genetics, 38(8), 904-909.
Examples
require("dartR.data")
gl2eigenstrat(platypus.gl,snp_pos='ChromPos_Platypus_Chrom_NCBIv1',snp_chr = 'Chrom_Platypus_Chrom_NCBIv1')
Concatenates DArT trimmed sequences and outputs a FASTA file
Description
Concatenated sequence tags are useful for phylogenetic methods where information on base frequencies and transition and transversion ratios are required (for example, Maximum Likelihood methods). Where relevant, heterozygous loci are resolved before concatenation by either assigning ambiguity codes or by random allele assignment.
Usage
gl2fasta(x, method = 1, trimmed.sequence = TRUE, outfile = "output.fasta", outpath = tempdir(), probar = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
method | One of 1 | 2 | 3 | 4. Type method=0 for a list of options [method=1]. |
trimmed.sequence | Include Trimmedsequence. If FALSE, only method 3 and 4 are available [default = TRUE]. |
outfile | Name of the output file (fasta format) ["output.fasta"]. |
outpath | Path where to save the output file (set to tempdir by default) |
probar | If TRUE, a progress bar will be displayed for long loops [default = FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
Four methods are employed:
Method 1 – heterozygous positions are replaced by the standard ambiguity codes. The resultant sequence fragments are concatenated across loci to generate a single combined sequence to be used in subsequent ML phylogenetic analyses.
Method 2 – the heterozygous state is resolved by randomly assigning one or the other SNP variant to the individual. The resultant sequence fragments are concatenated across loci to generate a single composite haplotype to be used in subsequent ML phylogenetic analyses.
Method 3 – heterozygous positions are replaced by the standard ambiguity codes. The resultant SNP bases are concatenated across loci to generate a single combined sequence to be used in subsequent MP phylogenetic analyses.
Method 4 – the heterozygous state is resolved by randomly assigning one or the other SNP variant to the individual. The resultant SNP bases are concatenated across loci to generate a single composite haplotype to be used in subsequent MP phylogenetic analyses.
Trimmed sequences for which the SNP has been trimmed out, rarely, by adapter mis-identity are deleted.
The script writes out the composite haplotypes for each individual as a fastA file. Requires 'TrimmedSequence' to be among the locus metrics (@other$loc.metrics) and information of the type of alleles (slot loc.all e.g. 'G/A') and the position of the SNP in slot position of the 'genlight' object (see testset.gl@position and testset.gl@loc.all for how to format these slots).
When trimmed.sequence = FALSE, loci that are not SNPs are removed.
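The concatenation step shared by the four methods can be sketched in base R. The matrix of already-resolved bases is hypothetical, and the sketch follows the SNP-bases-only case (as in methods 3 and 4); it is an illustration rather than the function's own code.

```r
# Hypothetical resolved bases: individuals in rows, loci in columns
bases <- matrix(c("A", "R", "T",
                  "G", "C", "T"),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("ind1", "ind2"), NULL))

# Concatenate across loci to obtain one sequence per individual
seqs <- apply(bases, 1, paste, collapse = "")

# Write FASTA records: a header line starting with ">" then the sequence
fasta_lines <- as.vector(rbind(paste0(">", names(seqs)), seqs))
outfile <- file.path(tempdir(), "sketch.fasta")
writeLines(fasta_lines, outfile)
readLines(outfile)
```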
Value
A new gl object with all loci rendered homozygous.
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
Examples
gl <- gl.filter.reproducibility(testset.gl,t=1)
gl <- gl.filter.overshoot(gl,verbose=3)
gl <- gl.filter.callrate(testset.gl,t=.98)
gl <- gl.filter.monomorphs(gl)
gl2fasta(gl, method=1, outfile='test.fasta',verbose=3)
test <- gl.subsample.loci(platypus.gl,n=100)
gl2fasta(test)
Converts a genlight object into faststructure format (to run faststructure elsewhere)
Description
Recodes in the quite specific faststructure format (e.g. first six columns need to be there, but are ignored...check faststructure documentation (if you find any :-( )))
Usage
gl2faststructure(x, outfile = "gl.str", outpath = tempdir(), probar = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
outfile | File name of the output file (including extension) [default "gl.str"]. |
outpath | Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory. |
probar | Switch to show/hide progress bar [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
The script writes out a file in faststructure format.
Value
returns no value (i.e. NULL)
Author(s)
Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
Converts a genlight object into gds format
Description
Package SNPRelate relies on a bit-level representation of a SNP dataset that competes with {adegenet} genlight objects and associated files. This function converts a genlight object to a gds format file.
Usage
gl2gds(x, outfile = "gl_gds.gds", outpath = tempdir(), snp_pos = "0", snp_chr = "0", chr_format = "character", verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
outfile | File name of the output file (including extension) [default 'gl_gds.gds']. |
outpath | Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory. |
snp_pos | Field name from the slot loc.metrics where the SNP position is stored [default '0']. |
snp_chr | Field name from the slot loc.metrics where the chromosome of each is stored [default '0']. |
chr_format | Whether chromosome information is stored as 'numeric' or as 'character', see details [default 'character']. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
This function orders the SNPs by chromosome and by position before converting to SNPRelate format, as required by this package.
The chromosome of each SNP can be a character or numeric, as described in the vignette of SNPRelate: 'snp.chromosome, an integer or character mapping for each chromosome. Integer: numeric values 1-26, mapped in order from 1-22, 23=X, 24=XY (the pseudoautosomal region), 25=Y, 26=M (the mitochondrial probes), and 0 for probes with unknown positions; it does not allow NA. Character: “X”, “XY”, “Y” and “M” can be used here, and a blank string indicating unknown position.'
When using some functions from package SNPRelate with datasets other than humans it might be necessary to use the option autosome.only=FALSE to avoid detecting chromosome coding. So, it is important to read the documentation of the function before using it.
The chromosome information for unmapped SNPs is coded as 0, as required by SNPRelate.
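The ordering step can be sketched in base R. The locus table is hypothetical and the sketch is not the function's own code; it only shows sorting by chromosome and then by position, with "0" standing for unmapped SNPs.

```r
# Hypothetical locus table; chromosome "0" marks unmapped SNPs
loci <- data.frame(
  locus = paste0("snp", 1:6),
  chr   = c("2", "1", "0", "1", "X", "2"),
  pos   = c(500, 900, 0, 100, 42, 50),
  stringsAsFactors = FALSE
)

# Order by chromosome first, then by position within chromosome.
# Note: character chromosomes sort as text, so "10" would come before "2";
# numeric chromosome codes would sort numerically instead.
ord <- order(loci$chr, loci$pos)
loci_sorted <- loci[ord, ]
loci_sorted
```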
Remember to close the GDS file before working in a different GDS object with the function snpgdsClose (package SNPRelate).
Value
returns no value (i.e. NULL)
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
Examples
require("dartR.data")
gl2gds(platypus.gl,snp_pos='ChromPos_Platypus_Chrom_NCBIv1',snp_chr = 'Chrom_Platypus_Chrom_NCBIv1')
Converts a genlight object into a format suitable for input to genalex
Description
The output csv file contains the snp data and other relevant lines suitable for genalex. This script is a wrapper for genind2genalex (package poppr).
Usage
gl2genalex(x, outfile = "genalex.csv", outpath = tempdir(), overwrite = FALSE, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
outfile | File name of the output file (including extension) [default 'genalex.csv']. |
outpath | Path where to save the output file [default tempdir()]. |
overwrite | If FALSE and filename exists, then the file will not be overwritten. Set this option to TRUE to overwrite the file [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
returns no value (i.e. NULL)
Author(s)
Custodian: Luis Mijangos, Author: Katrin Hohwieler, wrapper Arthur Georges (Post to https://groups.google.com/d/forum/dartr)
References
Peakall, R. and Smouse P.E. (2012) GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research-an update. Bioinformatics 28, 2537-2539. http://bioinformatics.oxfordjournals.org/content/28/19/2537
Examples
gl2genalex(testset.gl, outfile='testset.csv')
Converts a genlight object into genepop format (and file)
Description
The genepop format is used by several external applications (for example Neestimator2). So the main idea is to create the genepop file and then run the other software externally. As a feature, the genepop file is also returned as an invisible data.frame by the function.
Usage
gl2genepop(x, outfile = "genepop.gen", outpath = tempdir(), pop_order = "alphabetic", output_format = "2_digits", verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
outfile | File name of the output file [default 'genepop.gen']. |
outpath | Path where to save the output file. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory [default tempdir(), mandated by CRAN]. |
pop_order | Order of the output populations either "alphabetic" or a vector of population names in the order required by the user (see examples) [default "alphabetic"]. |
output_format | Whether to use a 2-digit format ("2_digits") or 3-digit format ("3_digits") [default "2_digits"]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
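The difference between the two values of output_format can be sketched in base R. The helper encode_genepop is hypothetical, and the allele labels (1 for reference, 2 for alternate, 0 for missing) are assumptions made for the illustration; the exact labels written by the function may differ.

```r
# Genotypes coded as counts of the alternate allele; NA = missing
geno <- c(0, 1, 2, NA)

# Encode each genotype as two alleles of `digits` digits each.
# Allele labels 1 (reference) and 2 (alternate), with 0 for missing,
# are assumptions for this sketch (encode_genepop is hypothetical).
encode_genepop <- function(g, digits = 2) {
  fmt <- paste0("%0", digits, "d")
  a1 <- ifelse(is.na(g), 0L, ifelse(g == 2, 2L, 1L))
  a2 <- ifelse(is.na(g), 0L, ifelse(g == 0, 1L, 2L))
  paste0(sprintf(fmt, a1), sprintf(fmt, a2))
}

encode_genepop(geno, digits = 2)  # "0101" "0102" "0202" "0000"
encode_genepop(geno, digits = 3)  # "001001" "001002" "002002" "000000"
```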
Value
Invisible data frame in genepop format
Author(s)
Custodian: Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
Examples
## Not run:
require("dartR.data")
# SNP data
geno <- gl2genepop(testset.gl[1:3,1:9])
head(geno)
test <- gl.filter.callrate(platypus.gl,threshold = 1)
popNames(test)
gl2genepop(test, pop_order = c("TENTERFIELD","SEVERN_ABOVE","SEVERN_BELOW"), output_format="3_digits")
## End(Not run)
Converts a genlight object to geno format from package LEA
Description
The function converts a genlight object (SNP or presence/absence i.e. SilicoDArT data) into a file in the 'geno' and the 'lfmm' formats from package LEA.
Usage
gl2geno(x, outfile = "gl_geno", outpath = tempdir(), verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required]. |
outfile | File name of the output file [default 'gl_geno']. |
outpath | Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
returns no value (i.e. NULL)
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
Examples
# SNP data
gl2geno(testset.gl)
# Tag P/A data
gl2geno(testset.gs)
Converts a genlight object to genind object
Description
Converts a genlight object to genind object
Usage
gl2gi(x, probar = FALSE, verbose = NULL)
Arguments
x | A genlight object [required]. |
probar | If TRUE, a progress bar will be displayed for long loops [default FALSE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
This function uses a faster version of df2genind (from the adegenet package).
Value
A genind object, with all slots filled.
Author(s)
Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
Converts a genlight object into hiphop format
Description
This function exports genlight objects to the format used by the parentageassignment R package hiphop. Hiphop can be used for paternity and maternityassignment and outperforms conventional methods where closely relatedindividuals occur in the pool of possible parents. The method compares thegenotypes of offspring with any combination of potentials parents and scoresthe number of mismatches of these individuals at bi-allelic genetic markers(e.g. Single Nucleotide Polymorphisms).
Usage
gl2hiphop(x, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
Dataframe containing all the genotyped individuals (offspring and potential parents) and their genotypes scored using bi-allelic markers.
Author(s)
Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
References
Cockburn, A., Penalba, J.V., Jaccoud, D., Kilian, A., Brouwer, L., Double, M.C., Margraf, N., Osmond, H.L., van de Pol, M. and Kruuk, L.E.B. (in revision). HIPHOP: improved paternity assignment among close relatives using a simple exclusion method for bi-allelic markers. Molecular Ecology Resources, DOI to be added upon acceptance.
Examples
result <- gl2hiphop(testset.gl)
Creates a Phylip input distance matrix from a genlight (SNP) {adegenet} object
Description
This function calculates and returns a matrix of Euclidean distances between populations and produces an input file for the phylogenetic program Phylip (Joe Felsenstein).
Usage
gl2phylip(
  x,
  outfile = "phyinput.txt",
  outpath = tempdir(),
  bstrap = 1,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required]. |
outfile | Name of the file to become the input file for phylip [default "phyinput.txt"]. |
outpath | Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory. |
bstrap | Number of bootstrap replicates [default 1]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
Matrix of Euclidean distances between populations.
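The distance calculation described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the toy genotype matrix, the population labels, and the choice of alternate-allele frequencies on a 0 to 1 scale as input to dist() are all hypothetical.

```r
# Toy data (hypothetical): 6 individuals x 4 SNP loci, genotypes coded 0/1/2
geno <- matrix(c(0, 0, 2, 1,
                 0, 1, 2, 1,
                 2, 2, 0, 0,
                 2, 1, 0, 0,
                 1, 1, 1, 2,
                 1, 0, 1, 2),
               nrow = 6, byrow = TRUE,
               dimnames = list(paste0("ind", 1:6), paste0("loc", 1:4)))
pop <- factor(c("A", "A", "B", "B", "C", "C"))

# Alternate-allele frequency per population (rows) and locus (columns);
# for diploids this is the mean genotype divided by 2
freq <- apply(geno, 2, function(g) tapply(g, pop, mean, na.rm = TRUE) / 2)

# Euclidean distances between populations
d <- as.matrix(dist(freq, method = "euclidean"))
print(round(d, 3))
```

The resulting square matrix has one row and one column per population, which is the kind of object a Phylip distance input file is built from.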
Author(s)
Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)
Examples
result <- gl2phylip(testset.gl, outfile='test.txt', bstrap=10)
Converts a genlight object into PLINK format
Description
This function exports a genlight object into PLINK format and saves it to a file. It produces the following PLINK files: bed, bim, fam, ped and map.
Usage
gl2plink(
  x,
  plink_path = getwd(),
  bed_file = FALSE,
  outfile = "gl_plink",
  outpath = tempdir(),
  chr_format = "character",
  pos_cM = "0",
  ID_dad = "0",
  ID_mom = "0",
  sex_code = "unknown",
  phen_value = "0",
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
plink_path | Path of PLINK binary file [default getwd()]. |
bed_file | Whether to create PLINK files .bed, .bim and .fam [default FALSE]. |
outfile | File name of the output file [default 'gl_plink']. |
outpath | Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory. |
chr_format | Whether chromosome information is stored as 'numeric' or as 'character', see details [default 'character']. |
pos_cM | A vector, with as many elements as there are loci, containing the SNP position in morgans or centimorgans [default '0']. |
ID_dad | A vector, with as many elements as there are individuals, containing the ID of the father, '0' if father isn't in dataset [default '0']. |
ID_mom | A vector, with as many elements as there are individuals, containing the ID of the mother, '0' if mother isn't in dataset [default '0']. |
sex_code | A vector, with as many elements as there are individuals, containing the sex code ('male', 'female', 'unknown'). Sex information only needs to start with an "F" or "f" for females, with an "M" or "m" for males, and with a "U" or "u", or be empty, if the sex is unknown [default 'unknown']. |
phen_value | A vector, with as many elements as there are individuals, containing the phenotype value. '1' = control, '2' = case, '0' = unknown [default '0']. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
To create PLINK files .bed, .bim and .fam (bed_file = TRUE), it is necessary to download the binary file of PLINK 1.9 and provide its path (plink_path). The binary file can be downloaded from: https://www.cog-genomics.org/plink/
After downloading, unzip the file, access the unzipped folder and move the binary file ("plink") to your working directory.
If you are using a Mac, you might need to open the binary first to grant access to the binary.
The chromosome of each SNP can be a character or numeric. The chromosome information for unmapped SNPs is coded as 0. Family ID is taken from x$pop. Within-family ID (cannot be '0') is taken from indNames(x). Variant identifier is taken from locNames(x). SNP position is taken from the accessor x$position. Chromosome name is taken from the accessor x$chromosome. Note that if names of populations or individuals contain spaces, they are replaced by an underscore "_".
If you would like to use chromosome information when converting to PLINK format and your chromosome names are not from human, you need to change the chromosome names to 'contig1', 'contig2', etc., as described in the section "Nonstandard chromosome IDs" at the following link: https://www.cog-genomics.org/plink/1.9/input
Note that the function might not work if there are spaces in the path to the plink executable.
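The first-letter rule for sex_code can be illustrated with a small base R helper. This is a sketch, not the code used inside gl2plink: the function name recode_sex is hypothetical, and the numeric codes follow the PLINK .fam convention (1 = male, 2 = female, 0 = unknown).

```r
# Hypothetical helper: map free-text sex labels to PLINK .fam codes
# (1 = male, 2 = female, 0 = unknown), looking only at the first letter.
recode_sex <- function(sex) {
  sex <- as.character(sex)
  sex[is.na(sex)] <- ""                  # treat missing values as unknown
  first <- toupper(substr(sex, 1, 1))
  code <- rep("0", length(sex))          # default: unknown ("U", "u" or empty)
  code[first == "M"] <- "1"
  code[first == "F"] <- "2"
  code
}

sex_codes <- recode_sex(c("male", "Female", "f", "unknown", "", NA))
print(sex_codes)
```

Treating NA as unknown is an assumption made for this sketch; the documentation above only mentions empty values.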
Value
returns no value (i.e. NULL)
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
References
Purcell, Shaun, et al. 'PLINK: a tool set for whole-genome association and population-based linkage analyses.' The American journal of human genetics 81.3 (2007): 559-575.
Examples
require("dartR.data")
test <- platypus.gl
# assigning SNP position
test$position <- test$other$loc.metrics$ChromPos_Platypus_Chrom_NCBIv1
# assigning a dummy name for chromosomes
test$chromosome <- as.factor("1")
gl2plink(test)
Converts a genlight object to a format suitable to be run with Coancestry
Description
The output txt file contains the SNP data and an additional column with the names of the individuals. The file can then be loaded into coancestry or, if installed, run with the related package. Be aware that the related package crashed in previous versions, but in general it uses the same code as coancestry and therefore should give identical results. Also, running coancestry with thousands of SNPs via the GUI seems to be unreliable; therefore, for comparisons between coancestry and related, we suggest using the command line version of coancestry.
Usage
gl2related(
  x,
  outfile = "related.txt",
  outpath = tempdir(),
  save = TRUE,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
outfile | File name of the output file (including extension) [default 'related.txt']. |
outpath | Path where to save the output file [default tempdir()]. |
save | A switch if you want to save the file or not. This might be useful for someone who wants to use the coancestry function to calculate relatedness and not export to coancestry. See the example below [default TRUE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
A data.frame that can be used to run with the related package
Author(s)
Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)
References
Jack Pew, Jinliang Wang, Paul Muir and Tim Frasier (2014). related: an R package for analyzing pairwise relatedness data based on codominant molecular markers. R package version 0.8/r2. https://R-Forge.R-project.org/projects/related/
Examples
gtd <- gl2related(bandicoot.gl[1:10,1:20], save=FALSE)
## Not run: 
##running with the related package
#install.packages('related', repos='http://R-Forge.R-project.org')
library(related)
coan <- coancestry(gtd, wang=1)
head(coan$relatedness)
##check ?coancestry for information how to use the function.
## End(Not run)
Converts genlight objects to the format used in the SNPassoc package
Description
This function exports a genlight object into a SNPassoc object. See package SNPassoc for details. This function needs package SNPassoc. At the time of writing (August 2020) the package was no longer available from CRAN. To install the package, check its GitHub repository, https://github.com/isglobal-brge/SNPassoc, and/or use install_github('isglobal-brge/SNPassoc') to install the package and uncomment the function code.
Usage
gl2sa(x, verbose = NULL, installed = FALSE)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
installed | Switch to run the function once SNPassoc package is installed [default FALSE]. |
Value
Returns an object of class 'snp' to be used with SNPassoc.
Author(s)
Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
References
Gonzalez, J.R., Armengol, L., Solé, X., Guinó, E., Mercader, J.M., Estivill, X. and Moreno, V. (2007). SNPassoc: an R package to perform whole genome association studies. Bioinformatics 23:654-655.
Converts a genlight object into a sfs input file
Description
The output of this function is suitable for analysis in fastsimcoal2 or dadi.
Usage
gl2sfs(
  x,
  n.invariant.tags = 0,
  outfile_root = "gl2sfs",
  outpath = tempdir(),
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
n.invariant.tags | Number of invariant sites [default 0]. |
outfile_root | The root of the name of the output file [default "gl2sfs"]. |
outpath | Path where to save the output file [default tempdir()]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
It saves a derived sfs, assuming that the reference allele is the ancestral allele, and a MAF sfs.
At this stage this function caters only for diploid organisms, for samples from one population only, and for genotypes without missing data. Note that sfs uses frequencies considered independent, i.e. data are assumed to be from independent (not linked) loci. This means that only one site per tag should be considered (i.e. secondaries should be removed). If no estimate of monomorphic sites is provided (with n.invariant.tags), the sfs will only include the number of monomorphic sites in the data, but this will be a biased estimate as it doesn't take into account the invariant tags that have not been included. This will affect parameter estimates in the analyses. Note that the number of invariant tags can be estimated with gl.report.secondaries. In a limited number of cases, ascertainment bias can be explicitly modelled in fastsimcoal2. See the fastsimcoal2 manual for details.
It expects a dartR formatted genlight object, but it should also work with other genlight objects.
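The two spectra mentioned above can be illustrated in base R for a single population of diploids without missing data. This is a sketch under stated assumptions rather than the function's actual code: the toy genotype matrix is hypothetical, and genotypes are assumed to be coded as the number of alternate alleles (0, 1 or 2), with the alternate allele treated as derived.

```r
# Toy data (hypothetical): 4 diploid individuals x 6 loci, no missing data,
# genotypes coded as the number of alternate alleles (0, 1 or 2)
geno <- matrix(c(0, 1, 2, 0, 1, 2,
                 0, 0, 2, 1, 1, 2,
                 0, 1, 1, 0, 2, 2,
                 0, 0, 2, 0, 1, 1),
               nrow = 4, byrow = TRUE)
n_chrom <- 2 * nrow(geno)                 # number of sampled chromosomes

# Derived sfs: the alternate allele is treated as derived
# (reference allele assumed ancestral)
alt_count <- colSums(geno)
derived_sfs <- tabulate(alt_count + 1, nbins = n_chrom + 1)
names(derived_sfs) <- 0:n_chrom

# MAF (folded) sfs: count the rarer of the two alleles at each locus
minor_count <- pmin(alt_count, n_chrom - alt_count)
maf_sfs <- tabulate(minor_count + 1, nbins = n_chrom / 2 + 1)
names(maf_sfs) <- 0:(n_chrom / 2)

print(derived_sfs)
print(maf_sfs)
```

In this sketch the first class of each spectrum holds the monomorphic sites found in the data, which is the class that an estimate of invariant sites would add to.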
Value
Deprecated. Please use gl.sfs instead.
Author(s)
Custodian: Carlo Pacioni (Post to https://groups.google.com/d/forum/dartr)
References
Excoffier L., Dupanloup I., Huerta-Sánchez E., Sousa V. C. and Foll M. (2013) Robust demographic inference from genomic and SNP data. PLoS genetics 9(10)
See Also
gl.report.heterozygosity, gl.report.secondaries, utils.n.var.invariant
Converts a genlight object to ESRI shapefiles or kml files
Description
This function exports coordinates in a genlight object to a point shape file (including also individual meta data if available). Coordinates are provided under x@other$latlon and assumed to be in WGS84 coordinates if no proj4 string is provided.
Usage
gl2shp(
  x,
  type = "shp",
  proj4 = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs",
  outfile = "gl",
  outpath = tempdir(),
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data and location data, lat longs [required]. |
type | Type of output 'kml' or 'shp' [default 'shp']. |
proj4 | Proj4string of data set (see spatialreference.org for projections) [default WGS84]. |
outfile | Name (path) of the output shape file [default 'gl']. The shp extension is added automatically. |
outpath | Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
returns a SpatVector file
Author(s)
Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
Examples
out <- gl2shp(testset.gl)
Converts a genlight object to nexus format suitable for phylogenetic analysis by SNAPP (via BEAUti)
Description
The output nexus file contains the SNP data and relevant PAUP command lines suitable for BEAUti.
Usage
gl2snapp(x, outfile = "snapp.nex", outpath = tempdir(), verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
outfile | File name of the output file (including extension) [default "snapp.nex"]. |
outpath | Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
returns no value (i.e. NULL)
Author(s)
Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)
References
Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N.A. and RoyChoudhury, A. (2012). Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Molecular Biology and Evolution 29:1917-1932.
Examples
gl2snapp(testset.gl)
Converts a genlight object to STRUCTURE formatted files
Description
This function exports genlight objects to STRUCTURE formatted files (be aware there is a gl2faststructure version as well). It is based on the code provided by Lindsay Clark (see https://github.com/lvclark/R_genetics_conv) and this function is basically a wrapper around her numeric2structure function. See also: Lindsay Clark. (2017, August 22). lvclark/R_genetics_conv: R_genetics_conv 1.1 (Version v1.1). Zenodo: doi.org/10.5281/zenodo.846816.
Usage
gl2structure(
  x,
  indNames = NULL,
  addcolumns = NULL,
  ploidy = 2,
  exportMarkerNames = TRUE,
  outfile = "gl.str",
  outpath = tempdir(),
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data and location data, lat longs [required]. |
indNames | Specify individuals names to be added [if NULL, defaults to indNames(x)]. |
addcolumns | Additional columns to be added before genotypes [default NULL]. |
ploidy | Set the ploidy [default 2]. |
exportMarkerNames | If TRUE, locus names locNames(x) will be included [default TRUE]. |
outfile | File name of the output file (including extension) [default "gl.str"]. |
outpath | Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
returns no value (i.e. NULL)
Author(s)
Bernd Gruber (wrapper) and Lindsay V. Clark [lvclark@illinois.edu]
Examples
#not run here
#gl2structure(testset.gl)
Converts a genlight object to nexus format PAUP SVDquartets
Description
The output nexus file contains the SNP data in one of two forms, depending upon what you regard as most appropriate. One form, that used by Chifman and Kubatko, has two lines per individual, one providing the reference SNP, the second providing the alternate SNP (method=1).
Usage
gl2svdquartets(
  x,
  outfile = "svd.nex",
  outpath = tempdir(),
  method = 2,
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data or tag P/A data [required]. |
outfile | File name of the output file (including extension) [default 'svd.nex']. |
outpath | Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() when calling this function or set.tempdir <- getwd() elsewhere in your script to direct output files to your working directory. |
method | Method = 1, nexus file with two lines per individual; method = 2, nexus file with one line per individual, ambiguity codes for SNP genotypes, 0 or 1 for presence/absence data [default 2]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
A second form, recommended by Dave Swofford, has a single line per individual, resolving heterozygous SNPs by replacing them with standard ambiguity codes (method=2).
If the data are tag presence/absence, then method=2 is assumed.
Note that the genlight object must contain at least two populations for this function to work.
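The single-line encoding (method=2) can be illustrated with the standard IUPAC ambiguity codes. The following base R sketch is not the function's implementation: the helper encode_snp, its arguments, and the use of "?" for missing genotypes are assumptions made for illustration.

```r
# IUPAC ambiguity codes for the six possible biallelic heterozygotes
iupac <- c(AC = "M", AG = "R", AT = "W", CG = "S", CT = "Y", GT = "K")

# Hypothetical helper: genotype 0 = homozygous reference, 1 = heterozygous,
# 2 = homozygous alternate, NA = missing (written here as "?")
encode_snp <- function(genotype, ref, alt) {
  if (is.na(genotype)) return("?")
  if (genotype == 0) return(ref)
  if (genotype == 2) return(alt)
  # heterozygote: look up the code for the alphabetically sorted base pair
  unname(iupac[paste(sort(c(ref, alt)), collapse = "")])
}

# One individual scored at five loci
calls <- mapply(encode_snp,
                genotype = c(0, 1, 2, 1, NA),
                ref = c("A", "A", "C", "T", "G"),
                alt = c("G", "G", "T", "C", "A"))
sequence_line <- paste(calls, collapse = "")
print(sequence_line)
```

Each individual then contributes one character per locus, so a single line per individual is enough to carry both alleles.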
Value
returns no value (i.e. NULL)
Author(s)
Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)
References
Chifman, J. and L. Kubatko. 2014. Quartet inference from SNP data under the coalescent. Bioinformatics 30: 3317-3324
Examples
gg <- testset.gl[1:20,1:100]
gg@other$loc.metrics <- gg@other$loc.metrics[1:100,]
gl2svdquartets(gg)
Converts a genlight object to a treemix input file
Description
The output file contains the SNP data in the format expected by treemix (see the treemix manual). The file will be gzipped in order to be recognised by treemix. Plotting functions provided with treemix will need to be sourced from the treemix download page.
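For orientation, the treemix input layout (a header line of population names followed by one line per SNP holding a pair of allele counts for each population) can be sketched in base R. This is an illustration only, not the code of gl2treemix: the toy data, the file name treemix_sketch.gz, and the "reference,alternate" order within each pair are assumptions.

```r
# Toy data (hypothetical): 4 diploid individuals x 3 loci, coded 0/1/2
geno <- matrix(c(0, 1, 2,
                 1, 1, 2,
                 2, 0, 0,
                 2, 1, 0),
               nrow = 4, byrow = TRUE)
pop <- factor(c("popA", "popA", "popB", "popB"))

# Allele counts per population (rows) and locus (columns)
alt_counts <- rowsum(geno, pop)        # alternate alleles
ref_counts <- rowsum(2 - geno, pop)    # reference alleles

# One line per SNP, one "ref,alt" field per population
fields <- matrix(paste(ref_counts, alt_counts, sep = ","), nrow = nlevels(pop))
snp_lines <- apply(fields, 2, paste, collapse = " ")
lines_out <- c(paste(levels(pop), collapse = " "), snp_lines)

# Write the lines as gzipped text
outfile <- file.path(tempdir(), "treemix_sketch.gz")
con <- gzfile(outfile, "w")
writeLines(lines_out, con)
close(con)

print(readLines(outfile))
```

The sketch ignores missing genotypes, which would need to be excluded from the counts in real data.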
Usage
gl2treemix(
  x,
  outfile = "treemix_input.gz",
  outpath = tempdir(),
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
outfile | File name of the output file (including gz extension) [default 'treemix_input.gz']. |
outpath | Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() when calling this function or set.tempdir <- getwd() elsewhere in your script to direct output files to your working directory. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Value
returns no value (i.e. NULL)
Author(s)
Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)
References
Pickrell and Pritchard (2012). Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genetics https://doi.org/10.1371/journal.pgen.1002967
Examples
gl2treemix(testset.gl, outpath=tempdir())
Converts a genlight object into vcf format
Description
This function exports a genlight object into VCF format and saves it to a file.
Usage
gl2vcf(
  x,
  plink_path = getwd(),
  outfile = "gl_vcf",
  outpath = tempdir(),
  snp_pos = "0",
  snp_chr = "0",
  chr_format = "character",
  pos_cM = "0",
  ID_dad = "0",
  ID_mom = "0",
  sex_code = "unknown",
  phen_value = "0",
  verbose = NULL
)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
plink_path | Path of PLINK binary file [default getwd()]. |
outfile | File name of the output file [default 'gl_vcf']. |
outpath | Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory. |
snp_pos | Field name from the slot loc.metrics where the SNP position is stored [default '0']. |
snp_chr | Field name from the slot loc.metrics where the chromosome of each SNP is stored [default '0']. |
chr_format | Whether chromosome information is stored as 'numeric' or as 'character', see details [default 'character']. |
pos_cM | A vector, with as many elements as there are loci, containing the SNP position in morgans or centimorgans [default '0']. |
ID_dad | A vector, with as many elements as there are individuals, containing the ID of the father, '0' if father isn't in dataset [default '0']. |
ID_mom | A vector, with as many elements as there are individuals, containing the ID of the mother, '0' if mother isn't in dataset [default '0']. |
sex_code | A vector, with as many elements as there are individuals, containing the sex code ('male', 'female', 'unknown') [default 'unknown']. |
phen_value | A vector, with as many elements as there are individuals, containing the phenotype value. '1' = control, '2' = case, '0' = unknown [default '0']. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
This function requires downloading the binary file of PLINK 1.9 and providing its path (plink_path). The binary file can be downloaded from: https://www.cog-genomics.org/plink/
The chromosome information for unmapped SNPs is coded as 0. Family ID is taken from x$pop. Within-family ID (cannot be '0') is taken from indNames(x). Variant identifier is taken from locNames(x).
Note that if names of populations or individuals contain spaces, they are replaced by an underscore "_".
If you would like to use chromosome information when converting to PLINK format and your chromosome names are not from human, you need to change the chromosome names to 'contig1', 'contig2', etc., as described in the section "Nonstandard chromosome IDs" at the following link: https://www.cog-genomics.org/plink/1.9/input
Note that the function might not work if there are spaces in the path to the plink executable.
Value
returns no value (i.e. NULL)
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
References
Danecek, P., Auton, A., Abecasis, G., Albers, C. A., Banks, E., DePristo, M. A., ... & 1000 Genomes Project Analysis Group. (2011). The variant call format and VCFtools. Bioinformatics, 27(15), 2156-2158.
Examples
## Not run: 
require("dartR.data")
gl2vcf(platypus.gl,snp_pos='ChromPos_Platypus_Chrom_NCBIv1', snp_chr = 'Chrom_Platypus_Chrom_NCBIv1')
## End(Not run)
Shiny app for the input of the reference table for the simulations
Description
Shiny app for the input of the reference table for the simulations
Usage
interactive_reference()
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
Shiny app for the input of the simulations variables
Description
Shiny app for the input of the simulations variables
Usage
interactive_sim_run()
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
Tests if two populations are fixed at a given locus
Description
This script compares two percent allele frequencies and reports TRUE if they represent a fixed difference, FALSE otherwise.
Usage
is.fixed(s1, s2, tloc = 0)
Arguments
s1 | Percentage SNP allele or sequence tag frequency for the first population [required]. |
s2 | Percentage SNP allele or sequence tag frequency for the second population [required]. |
tloc | Threshold value for tolerance when a difference is regarded as fixed [default 0]. |
Details
A fixed difference at a locus occurs when two populations share no alleles, noting that SNPs are biallelic (ploidy=2). Tolerance in the definition of a fixed difference is provided by the tloc parameter. For example, tloc=0.05 means that SNP allele frequencies of 95,5 and 5,95 percent will be reported as fixed (TRUE).
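The tolerance rule described above can be restated as a short base R sketch. This is an illustration of the rule as worded here, not necessarily the package's exact code: the name is_fixed_sketch is hypothetical, frequencies are assumed to be percentages (0 to 100), and tloc is assumed to be a proportion (0 to 1).

```r
# Hypothetical re-statement of the rule; s1 and s2 are percentages (0-100),
# tloc is a proportion (0-1)
is_fixed_sketch <- function(s1, s2, tloc = 0) {
  if (is.na(s1) || is.na(s2)) return(NA)
  low  <- tloc * 100          # upper bound for "allele effectively absent"
  high <- 100 - tloc * 100    # lower bound for "allele effectively fixed"
  (s1 <= low && s2 >= high) || (s1 >= high && s2 <= low)
}

r1 <- is_fixed_sketch(100, 0)               # opposite alleles fixed
r2 <- is_fixed_sketch(96, 4, tloc = 0.05)   # within the 5 percent tolerance
r3 <- is_fixed_sketch(80, 20, tloc = 0.05)  # alleles shared
r4 <- is_fixed_sketch(NA, 50)               # missing frequency
print(c(r1, r2, r3, r4))
```

With tloc = 0 the rule reduces to requiring frequencies of exactly 0 and 100 percent in the two populations.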
Value
TRUE (fixed difference) or FALSE (alleles shared) or NA (one or both s1 or s2 missing)
Author(s)
Custodian: Arthur Georges (bugs? Post to https://groups.google.com/d/forum/dartr)
Examples
is.fixed(s1=100, s2=0, tloc=0)
is.fixed(96, 4, tloc=0.05)
Example data set as text file to be imported into a genlight object
Description
Check ?read.genetable in package PopGenReport for details on the format.
Format
csv
Author(s)
Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)
Examples
library(PopGenReport)
read.csv( paste(.libPaths()[1],'/dartR/extdata/platy.csv',sep='' ))
platy <- read.genetable( paste(.libPaths()[1],'/dartR/extdata/platy.csv',sep='' ), ind=1, pop=2, lat=3, long=4, other.min=5, other.max=6, oneColPerAll=FALSE, sep='/')
platy.gl <- gi2gl(platy, parallel=FALSE)
df.loc <- data.frame(RepAvg = runif(nLoc(platy.gl)), CallRate = 1)
platy.gl@other$loc.metrics <- df.loc
gl.report.reproducibility(platy.gl)
A simulated genlight object created to run a landscape genetic example
Description
This is a test data set to run a landscape genetics example. It contains 10 populations of 30 individuals each and each individual has 300 loci. There are no covariates for individuals or loci.
Usage
possums.gl
Format
genlight object
Author(s)
Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)
adjust rbind for dartR
Description
rbind is a bit lazy and does not take care of the metadata (so data in the other slot is lost). You can get most of the loci metadata back using gl.compliance.check.
Usage
## S3 method for class 'dartR'
rbind(...)
Arguments
... | list of dartR objects |
Value
A genlight object
Examples
t1 <- platypus.gl
class(t1) <- "dartR"
t2 <- rbind(t1[1:5,],t1[6:10,])
A genlight object created via the gl.read.dart function
Description
This is a test data set on turtles. 250 individuals, 255 loci in >30 populations.
Usage
testset.gl
Format
genlight object
Author(s)
Custodian: Arthur Georges (bugs? Post to https://groups.google.com/d/forum/dartr)
A genlight object created via the gl.read.silicodart function
Description
This is a test data set on turtles. 218 individuals, 255 loci in >30 populations.
Usage
testset.gs
Format
genlight object
Author(s)
Custodian: Arthur Georges (bugs? Post to https://groups.google.com/d/forum/dartr)
Testfile in DArT format (as provided by DArT)
Description
This test data set is provided to show a typical DArT file format. Can be used to create a genlight object using the read.dart function.
Format
csv
Author(s)
Custodian: Arthur Georges (bugs? Post to https://groups.google.com/d/forum/dartr)
Metadata file. Can be integrated via the dart2genlight function.
Description
Metadata file. Can be integrated via the dart2genlight function.
Format
csv
Author(s)
Custodian: Arthur Georges (bugs? Post to https://groups.google.com/d/forum/dartr)
Recode file to be used with the function.
Description
This test data set is provided to show a typical recode file format.
Format
csv
Author(s)
Custodian: Arthur Georges (bugs? Post to https://groups.google.com/d/forum/dartr)
dartR theme
Description
This is the theme used as default for dartR plots. This function controls all non-data display elements in the plots.
Usage
theme_dartR(
  base_size = 11,
  base_family = "",
  base_line_size = base_size/22,
  base_rect_size = base_size/22
)
Arguments
base_size | base font size, given in pts. |
base_family | base font family |
base_line_size | base size for line elements |
base_rect_size | base size for rect elements |
Examples
#ggplot(data.frame(dummy=rnorm(1000)),aes(dummy)) +
#geom_histogram(binwidth=0.1) + theme_dartR()
Population assignment probabilities
Description
This function takes one individual and estimates their probability of coming from individual populations from multilocus genotype frequencies.
Usage
utils.assignment(x, unknown, verbose = 2)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
unknown | Name of the individual to be assigned to a population [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
This function is a re-implementation of the function multilocus_assignment from package gstudio. Description of the method used in this function can be found at: https://dyerlab.github.io/applied_population_genetics/population-assignment.html
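The general idea of multilocus assignment can be sketched in base R. This is a simplified illustration under stated assumptions, not the code of utils.assignment or gstudio: the allele frequencies and genotype are toy values, loci are assumed independent, genotype probabilities are assumed to follow Hardy-Weinberg proportions, and the normalisation across populations is an assumption about how the probabilities are reported.

```r
# Toy data (hypothetical): reference-allele frequency at 3 loci in 2 populations
p <- rbind(popA = c(0.9, 0.8, 0.1),
           popB = c(0.2, 0.5, 0.6))

# Genotype of the unknown individual: number of reference alleles (0, 1 or 2)
unknown <- c(2, 1, 0)

# Hardy-Weinberg probability of a genotype given allele frequency p
geno_prob <- function(g, p) {
  ifelse(g == 2, p^2, ifelse(g == 1, 2 * p * (1 - p), (1 - p)^2))
}

# Multilocus probability per population: product over loci
lik <- apply(p, 1, function(freq) prod(geno_prob(unknown, freq)))

# Normalise so that the values sum to one across populations
res <- data.frame(Population = names(lik),
                  Probability = lik,
                  Posterior = lik / sum(lik),
                  row.names = NULL)
res <- res[order(-res$Posterior), ]
print(res)
```

In this toy case the individual is assigned to popA, whose allele frequencies make the observed genotype far more likely than those of popB.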
Value
A data.frame consisting of assignment probabilities for each population.
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
Examples
require("dartR.data")
res <- utils.assignment(platypus.gl,unknown="T27")
Population assignment probabilities
Description
This function takes one individual and estimates their probability of coming from individual populations from multilocus genotype frequencies.
Usage
utils.assignment_2(x, unknown, verbose = 2)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
unknown | Name of the individual to be assigned to a population [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
This function is a re-implementation of the function multilocus_assignment from package gstudio. Description of the method used in this function can be found at: https://dyerlab.github.io/applied_population_genetics/population-assignment.html
Value
A data.frame consisting of assignment probabilities for each population.
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
Examples
require("dartR.data")
res <- utils.assignment_2(platypus.gl,unknown="T27")
Population assignment probabilities
Description
This function takes one individual and estimates their probability of coming from individual populations from multilocus genotype frequencies.
Usage
utils.assignment_3(x, unknown, verbose = 2)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
unknown | Name of the individual to be assigned to a population [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
This function is a re-implementation of the function multilocus_assignment from package gstudio. Description of the method used in this function can be found at: https://dyerlab.github.io/applied_population_genetics/population-assignment.html
Value
A data.frame consisting of assignment probabilities for each population.
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
Examples
require("dartR.data")
res <- utils.assignment_3(platypus.gl,unknown="T27")
Population assignment probabilities
Description
This function takes one individual and estimates their probability of coming from individual populations from multilocus genotype frequencies.
Usage
utils.assignment_4(x, unknown, verbose = 2)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
unknown | Name of the individual to be assigned to a population [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Details
This function is a re-implementation of the function multilocus_assignment from package gstudio. Description of the method used in this function can be found at: https://dyerlab.github.io/applied_population_genetics/population-assignment.html
Value
A data.frame consisting of assignment probabilities for each population.
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
Examples
require("dartR.data")
res <- utils.assignment_4(platypus.gl,unknown="T27")
Calculates mean observed heterozygosity, mean expected heterozygosity and Fis per locus, per population and various population differentiation measures
Description
This is a re-implementation of hierfstat::basic.stats specifically for genlight objects. The formulas (and hence the results) match exactly those of the original hierfstat::basic.stats, but this version is much faster.
Usage
utils.basic.stats(x, digits = 4)
Arguments
x | A genlight object containing the SNP genotypes [required]. |
digits | Number of decimals to report [default 4] |
Value
A list with the statistics for each population.
Author(s)
Luis Mijangos and Carlo Pacioni (bugs? Post to https://groups.google.com/d/forum/dartr)
Examples
require("dartR.data")
out <- utils.basic.stats(platypus.gl)
Utility function to check the class of an object passed to a function
Description
Most functions require access to a genlight object, dist matrix, data matrix or fixed difference list (fd), and this function checks that a genlight object or one of the above has been passed, whether the genlight object is a SNP dataset or a SilicoDArT object, and reports back if verbosity is >=2.
Usage
utils.check.datatype(
  x,
  accept = c("genlight", "SNP", "SilicoDArT"),
  verbose = NULL
)
Arguments
x | Name of the genlight object, dist matrix, data matrix, glPCA, or fixed difference list (fd) [required]. |
accept | Vector containing the classes of objects that are to be accepted [default c('genlight','SNP','SilicoDArT')]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]. |
Details
This function checks the class of the passed object and sets the datatype to 'SNP', 'SilicoDArT', 'dist', 'mat', or class(x)[1] as appropriate.
Note also that this function checks whether there are individuals or loci scored as all missing (NA) and, if so, issues a warning to the user.
Note: One and only one of gl.check, fd.check, dist.check or mat.check can be TRUE.
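The class dispatch and the all-missing check described above can be sketched in base R. This is only an illustration on plain matrices: `datatype_sketch` is a hypothetical stand-in, and the genlight-specific branches ('SNP', 'SilicoDArT') are left out because they need adegenet objects.

```r
# Hypothetical stand-in for the class dispatch (base R objects only)
datatype_sketch <- function(x) {
  if (inherits(x, "dist")) {
    "dist"
  } else if (is.matrix(x)) {
    "mat"
  } else {
    class(x)[1]   # fall back to the first class of the object
  }
}

clean <- matrix(c(0, 1, 2, 1, 0, 2), nrow = 2)
type_dist <- datatype_sketch(dist(clean))
type_mat  <- datatype_sketch(clean)
type_df   <- datatype_sketch(data.frame(a = 1))

# Detect individuals (rows) or loci (columns) scored entirely as NA
geno <- matrix(c(0, NA, 2, NA, 1, NA), nrow = 2)
all_na_rows <- which(rowSums(!is.na(geno)) == 0)
all_na_cols <- which(colSums(!is.na(geno)) == 0)
print(c(type_dist, type_mat, type_df))
print(all_na_rows)
```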
Value
datatype, 'SNP' for SNP data, 'SilicoDArT' for P/A data, 'dist' for a distance matrix, 'mat' for a data matrix, 'glPCA' for an ordination file, or class(x)[1].
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Examples
datatype <- utils.check.datatype(testset.gl)
datatype <- utils.check.datatype(as.matrix(testset.gl),accept='matrix')
fd <- gl.fixed.diff(testset.gl)
datatype <- utils.check.datatype(fd,accept='fd')
datatype <- utils.check.datatype(testset.gl)
Functions from package starmie for merging Q matrices from Structure runs using the CLUMPP algorithms.
Description
Functions from package starmie for merging Q matrices from Structure runs using the CLUMPP algorithms.
Usage
utils.clumpp(Q_list, method, iter)
Arguments
Q_list | A list of Q matrices. |
method | The algorithm to use to infer the correct permutations. One of 'greedy' or 'greedyLargeK' or 'stephens' |
iter | The number of iterations to use if running either 'greedy' or 'greedyLargeK' |
Converts DArT to genlight.
Description
Converts a DArT file (read via read.dart) into an adegenet genlight object.
Usage
utils.dart2genlight(
  dart,
  ind.metafile = NULL,
  covfilename = NULL,
  probar = TRUE,
  verbose = NULL
)
Arguments
dart | A dart object created via read.dart [required]. |
ind.metafile | Optional file in csv format with metadata for each individual (see details for explanation) [default NULL]. |
covfilename | Deprecated, use parameter ind.metafile. |
probar | Show progress bar [default TRUE]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL]. |
Details
The ind.metadata file needs to have very specific headings. First a heading called id. Here the ids have to match the ids in the dartR object. The following column headings are optional. pop: specifies the population membership of each individual. lat and lon specify spatial coordinates (in decimal degrees WGS1984 format). Additional columns with individual metadata can be imported (e.g. age, gender).
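A metadata file with the headings described above could be produced as follows. This is a sketch only: the individual ids, populations, coordinates and the file name `ind_metrics.csv` are invented for illustration, and in practice the ids must match those in the DArT data.

```r
# Invented example metadata; only the column headings follow the text above
meta <- data.frame(
  id  = c("ind1", "ind2", "ind3"),     # must match the ids in the DArT file
  pop = c("north", "north", "south"),  # optional: population membership
  lat = c(-35.28, -35.30, -37.81),     # optional: decimal degrees, WGS1984
  lon = c(149.13, 149.10, 144.96),     # optional: decimal degrees, WGS1984
  age = c(3, 5, 2)                     # any additional individual metadata
)

# Write the csv to a temporary directory and read the headings back
metafile <- file.path(tempdir(), "ind_metrics.csv")
write.csv(meta, metafile, row.names = FALSE)
headings <- names(read.csv(metafile))
print(headings)
```

The resulting path is the kind of value that would be passed as ind.metafile.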
Value
A genlight object in which all available slots are filled: loc.names, ind.names, pop, lat, lon (if provided via the ind.metadata file).
Author(s)
Custodian: Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
Calculates a distance matrix for individuals defined in a dartR genlight object using binary P/A data (SilicoDArT)
Description
This script calculates various distances between individuals based on sequence tag Presence/Absence data.
Usage
utils.dist.binary(
  x,
  method = "simple",
  scale = FALSE,
  swap = FALSE,
  output = "dist",
  verbose = NULL
)
Arguments
x | Name of the genlight containing the genotypes [required]. |
method | Specify distance measure [default simple]. |
scale | If TRUE and method='euclidean', the distance will be scaled to fall in the range [0,1] [default FALSE]. |
swap | If TRUE and working with presence-absence data, then presence (no disrupting mutation) is scored as 0 and absence (presence of a disrupting mutation) is scored as 1 [default FALSE]. |
output | Specify the format and class of the object to be returned, dist for an object of class dist, matrix for an object of class matrix [default "dist"]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2]. |
Details
The distance measure can be one of:
Euclidean – Euclidean Distance applied to Cartesian coordinates defined by the loci, scored as 0 or 1. Presence and absence equally weighted.
simple – simple matching, both 1 or both 0 = 0; one 1 and the other 0 = 1. Presence and absence equally weighted.
Jaccard – ignores matching 0, both 1 = 0; one 1 and the other 0 = 1. Absences could be for different reasons.
Bray-Curtis – both 0 = 0; both 1 = 2; one 1 and the other 0 = 1. Absences could be for different reasons. Sometimes called the Dice or Sorensen distance.
One might choose to disregard or downweight absences in comparison with presences because the homology of absences is less clear (mutation at one or the other, or both restriction sites). Your call.
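The four measures listed above can be sketched for a single pair of individuals in base R. `binary_dist_sketch` is a hypothetical helper written for this illustration, not the dartR implementation; dropping loci that are missing in either individual is an assumption made here.

```r
# Hypothetical helper: distance between two presence/absence vectors
binary_dist_sketch <- function(x, y, method = "simple") {
  ok <- !is.na(x) & !is.na(y)      # assumed: use loci scored in both
  x <- x[ok]
  y <- y[ok]
  a  <- sum(x == 1 & y == 1)       # tag present in both individuals
  d  <- sum(x == 0 & y == 0)       # tag absent in both individuals
  bc <- sum(x != y)                # tag present in only one individual
  switch(method,
    simple     = bc / (a + bc + d),    # joint presence and absence count
    jaccard    = bc / (a + bc),        # joint absences ignored
    braycurtis = bc / (2 * a + bc),    # joint presences weighted twice
    euclidean  = sqrt(sum((x - y)^2)),
    stop("unknown method"))
}

x <- c(1, 1, 0, 0, 1, NA)
y <- c(1, 0, 0, 1, 1, 1)
d_simple  <- binary_dist_sketch(x, y, "simple")
d_jaccard <- binary_dist_sketch(x, y, "jaccard")
d_bray    <- binary_dist_sketch(x, y, "braycurtis")
d_euclid  <- binary_dist_sketch(x, y, "euclidean")
print(c(d_simple, d_jaccard, d_bray, d_euclid))
```

For the same pair the Jaccard distance is never smaller than the Bray-Curtis distance, because the latter gives joint presences double weight in the denominator.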
Value
An object of class 'dist' or 'matrix' giving distances between individuals
Author(s)
Author: Arthur Georges. Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Examples
D <- utils.dist.binary(testset.gs, method='Jaccard')
D <- utils.dist.binary(testset.gs, method='Euclidean',scale=TRUE)
D <- utils.dist.binary(testset.gs, method='Simple')
Calculates a distance matrix for individuals defined in a dartR genlight object using SNP data (DArTseq)
Description
This script calculates various distances between individuals based on SNP genotypes.
Usage
utils.dist.ind.snp(
  x,
  method = "Euclidean",
  scale = FALSE,
  output = "dist",
  verbose = NULL
)
Arguments
x | Name of the genlight containing the genotypes [required]. |
method | Specify distance measure [default Euclidean]. |
scale | If TRUE and method='Euclidean', the distance will be scaled to fall in the range [0,1] [default FALSE]. |
output | Specify the format and class of the object to be returned, dist for an object of class dist, matrix for an object of class matrix [default "dist"]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2]. |
Details
The distance measure can be one of:
Euclidean – Euclidean Distance applied to Cartesian coordinates defined by the loci, scored as 0, 1 or 2.
Simple – simple mismatch, 0 where no alleles are shared, 1 where one allele is shared, 2 where both alleles are shared.
Absolute – absolute mismatch, 0 where no alleles are shared, 1 where one or both alleles are shared.
Czekanowski (or Manhattan) – calculates the city block metric distance by summing the scores on each axis (locus).
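The Euclidean and Manhattan measures can be illustrated with base R's dist() on a small genotype matrix. The matrix below is invented and contains no missing data; how the package scales distances or treats missing genotypes is not shown here.

```r
# Toy genotypes: individuals in rows, loci in columns, scored 0, 1 or 2
g <- rbind(ind1 = c(0, 1, 2, 0),
           ind2 = c(2, 1, 0, 0),
           ind3 = c(0, 1, 2, 1))

# Euclidean distance in the space defined by the loci
d_euc <- dist(g, method = "euclidean")

# City block (Manhattan) distance: sum of absolute differences per locus
d_man <- dist(g, method = "manhattan")

print(d_euc)
print(as.matrix(d_man))
```

Converting with as.matrix() gives the matrix form, as an alternative to an object of class dist.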
Value
An object of class 'dist' or 'matrix' giving distances between individuals
Author(s)
Author(s): Arthur Georges. Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Examples
D <- utils.dist.ind.snp(testset.gl, method='Manhattan')
D <- utils.dist.ind.snp(testset.gl, method='Euclidean',scale=TRUE)
D <- utils.dist.ind.snp(testset.gl, method='Simple')
A utility script to flag the start of a script
Description
A utility script to flag the start of a script
Usage
utils.flag.start(func = NULL, build = NULL, verbose = NULL)
Arguments
func | Name of the function that is starting [required]. |
build | Name of the build [default NULL]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2]. |
Value
The name of the calling function.
Author(s)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Calculates the Hamming distance between two DArT trimmed DNA sequences
Description
Hamming distance is calculated as the number of base differences between two sequences which can be expressed as a count or a proportion. Typically, it is calculated between two sequences of equal length. In the context of DArT trimmed sequences, which differ in length but which are anchored to the left by the restriction enzyme recognition sequence, it is sensible to compare the two trimmed sequences starting from immediately after the common recognition sequence and terminating at the last base of the shorter sequence.
Usage
utils.hamming(str1, str2, r = 4)
Arguments
str1 | String containing the first sequence [required]. |
str2 | String containing the second sequence [required]. |
r | Number of bases in the restriction enzyme recognition sequence [default 4]. |
Details
The Hamming distance between the rows of a matrix can be computed quickly by exploiting the fact that the dot product of two binary vectors x and (1-y) counts the corresponding elements that are different between x and y. This matrix multiplication can also be used for matrices with more than two possible values, and different types of elements, such as DNA sequences.
The function calculates the Hamming distance between all columns of a matrix X, or two matrices X and Y. Again matrix multiplication is used, this time for counting, between two columns x and y, the number of cases in which corresponding elements have the same value (e.g. A, C, G or T). This counting is done for each of the possible values individually, while iteratively adding the results. The end result of the iterative adding is the sum of all corresponding elements that are the same, i.e. the inverse of the Hamming distance. Therefore, the last step is to subtract this end result H from the maximum possible distance, which is the number of rows of matrix X.
If the two DNA sequences are of differing length, the longer is truncated. The initial common restriction enzyme recognition sequence is ignored.
The algorithm is that of Johann de Jong: https://johanndejong.wordpress.com/2015/10/02/faster-hamming-distance-in-r-2/
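The truncation and the skipping of the recognition sequence can be sketched for two strings in base R, followed by the dot-product idea for binary vectors. `hamming_sketch` is a hypothetical helper that returns a count of mismatches; it uses a simple character comparison rather than the matrix algorithm credited above.

```r
# Hypothetical helper: mismatch count between two left-anchored sequences
hamming_sketch <- function(str1, str2, r = 4) {
  n <- min(nchar(str1), nchar(str2))   # truncate to the shorter sequence
  if (n <= r) return(0)                # nothing left after the first r bases
  # Compare base by base, starting after the recognition sequence
  s1 <- strsplit(substr(str1, r + 1, n), "")[[1]]
  s2 <- strsplit(substr(str2, r + 1, n), "")[[1]]
  sum(s1 != s2)
}

h <- hamming_sketch("TGCAAAGT", "TGCAAACTGG")

# Dot-product idea for binary vectors: count mismatches in both directions
x <- c(1, 0, 1, 1)
y <- c(1, 1, 0, 1)
h_bin <- drop(x %*% (1 - y) + (1 - x) %*% y)
print(c(h, h_bin))
```

In the string example only bases 5 to 8 are compared ("AAGT" against "AACT"), because the first four bases are treated as the recognition sequence and the longer sequence is truncated.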
Value
Hamming distance between the two strings
Author(s)
Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)
Calculates mean expected heterozygosity per population
Description
Calculates mean expected heterozygosity per population
Usage
utils.het.pop(x, t_het)
Arguments
x | A genlight object containing the SNP genotypes [required]. |
t_het | A string specifying the type of heterozygosity to be calculated. Options are "He" for expected heterozygosity and "Ho" for observed heterozygosity. |
Value
A vector with the mean expected heterozygosity for each population
Author(s)
Bernd Gruber & Luis Mijangos (bugs? Post to https://groups.google.com/d/forum/dartr)
Examples
out <- utils.het.pop(testset.gl,t_het="He")
Conducts jackknife resampling using a genlight object
Description
Jackknife resampling is a statistical procedure where for a dataset of sample size n, subsamples of size n-1 are used to compute a statistic. The collection of the values obtained can be used to evaluate the variability around the point estimate. This function can take the loci, the individuals or the populations as units over which to conduct resampling.
Note that when n is very small, jackknife resampling is not recommended.
Parallel computation is implemented. The argument n.cores indicates the number of cores to use. If "auto" [default], it will use all but one of the available cores. If the number of units is small (e.g. a few populations), there is no real advantage in using parallel computation. On the other hand, if the number of units is large (e.g. thousands of loci), this function can be very slow even with parallel computation.
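The leave-one-out logic can be sketched serially in base R. `jackknife_sketch` is a hypothetical helper that works on a plain vector of units, not on a genlight object; the standard error shown is the usual jackknife estimator and is included only as an example of how the collected values might be used.

```r
# Hypothetical helper: apply FUN to every subsample of size n - 1
jackknife_sketch <- function(units, FUN, ...) {
  lapply(seq_along(units), function(i) FUN(units[-i], ...))
}

# Toy per-unit values, e.g. one statistic per population
het <- c(0.10, 0.25, 0.30, 0.15)
jk <- jackknife_sketch(het, mean)

# Usual jackknife standard error from the leave-one-out estimates
theta <- unlist(jk)
n <- length(theta)
jk_se <- sqrt((n - 1) / n * sum((theta - mean(theta))^2))
print(theta)
print(jk_se)
```

The result is a list with one element per dropped unit, matching the list of length n described under Value.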
Usage
utils.jackknife(
  x,
  FUN,
  unit = "loc",
  recalc = FALSE,
  mono.rm = FALSE,
  n.cores = "auto",
  verbose = NULL,
  ...
)
Arguments
x | Name of the genlight object [required]. |
FUN | the name of the function to be used to calculate the statistic |
unit | The unit to use for resampling. One of c("loc", "ind", "pop"): loci, individuals or populations |
recalc | If TRUE, recalculate the locus metadata statistics [default FALSE]. |
mono.rm | If TRUE, remove monomorphic and all NA loci [default FALSE]. |
n.cores | The number of cores to use. If "auto" [default], it will use all but one of the available cores. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress but not results; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
... | Any additional arguments to be passed to FUN |
Value
A list of length n where each element is the output of FUN
Author(s)
Custodian: Carlo Pacioni – Post to https://groups.google.com/d/forum/dartr
Examples
require("dartR.data")
platMod.gl <- gl.filter.allna(platypus.gl)
chk.pop <- utils.jackknife(x=platMod.gl, FUN="gl.alf", unit="pop", recalc = FALSE, mono.rm = FALSE, n.cores = 1, verbose=0)
A utility script to calculate the number of variant and invariant sites by locus
Description
Calculate the number of variant and invariant sites by locus and add them as columns in loc.metrics. This can be useful to conduct further filtering, for example where only loci with secondaries are wanted for phylogenetic analyses.
Usage
utils.n.var.invariant(x, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL]. |
Details
Invariant sites are the sites (nucleotides) that are not polymorphic. When the locus metadata supplied by DArT includes the sequence of the allele (TrimmedSequence), it is used by this function to estimate the number of sites that were sequenced in each tag (read). This script then subtracts the number of polymorphic sites. The length of the trimmed sequence (lenTrimSeq), the number of variant (n.variant) and invariant (n.invariant) sites are then added to the table in gl@other$loc.metrics.
NOTE: It is important to realise that this function correctly estimates the number of variant and invariant sites only when it is executed on genlight objects before secondaries are removed.
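The arithmetic can be sketched on a toy locus-metrics table in base R. The table contents are invented, and counting the variant sites of a tag as the number of SNP loci that share its CloneID is an assumption made for this sketch; it is also the reason the count would be too low once secondaries have been filtered out.

```r
# Invented locus metrics: two SNPs on tag "100", one SNP on tag "101"
loc.metrics <- data.frame(
  CloneID = c("100", "100", "101"),
  TrimmedSequence = c("TGCAGAAT", "TGCAGAAT", "TGCAGTTCCA"),
  stringsAsFactors = FALSE
)

# Number of sites sequenced in each tag
loc.metrics$lenTrimSeq <- nchar(loc.metrics$TrimmedSequence)

# Assumed: variant sites per tag = number of SNP loci sharing the CloneID
loc.metrics$n.variant <- ave(seq_along(loc.metrics$CloneID),
                             loc.metrics$CloneID, FUN = length)

# Invariant sites = sequenced sites minus polymorphic sites
loc.metrics$n.invariant <- loc.metrics$lenTrimSeq - loc.metrics$n.variant
print(loc.metrics)
```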
Value
The modified genlight object.
Author(s)
Carlo Pacioni (Post to https://groups.google.com/d/forum/dartr)
See Also
gl.filter.secondaries, gl.report.heterozygosity
Examples
require("dartR.data")
out <- utils.n.var.invariant(platypus.gl)
OutFLANK: An Fst outlier approach by Mike Whitlock and Katie Lotterhos, University of British Columbia.
Description
This function is the original implementation of OutFLANK by Whitlock and Lotterhos. dartR simply provides a convenient wrapper around their functions and an easier install as part of an R package (for information please refer to their github repository).
Usage
utils.outflank(
  FstDataFrame,
  LeftTrimFraction = 0.05,
  RightTrimFraction = 0.05,
  Hmin = 0.1,
  NumberOfSamples,
  qthreshold = 0.05
)
Arguments
FstDataFrame | A data frame that includes a row for each locus, with columns as follows: |
LeftTrimFraction | The proportion of loci that are trimmed from the lower end of the range of Fst before the likelihood function is applied [default 0.05]. |
RightTrimFraction | The proportion of loci that are trimmed from the upper end of the range of Fst before the likelihood function is applied [default 0.05]. |
Hmin | The minimum heterozygosity required before including calculations from a locus [default 0.1]. |
NumberOfSamples | The number of spatial locations included in the data set. |
qthreshold | The desired false discovery rate threshold for calculating q-values [default 0.05]. |
Details
This method looks for Fst outliers from a list of Fst's for different loci. It assumes that each locus has been genotyped in all populations with approximately equal coverage.
OutFLANK estimates the distribution of Fst based on a trimmed sample of Fst's. It assumes that the majority of loci in the center of the distribution are neutral and infers the shape of the distribution of neutral Fst using a trimmed set of loci. Loci with the highest and lowest Fst's are trimmed from the data set before this inference, and the distribution of Fst df/(mean Fst) is assumed to follow a chi-square distribution. Based on this inferred distribution, each locus is given a q-value based on its quantile in the inferred null distribution.
The main procedure is called OutFLANK – see comments in that function immediately below for input and output formats. The other functions here are necessary and must be uploaded, but are not necessarily needed by the user directly.
Steps:
Value
The function returns a list with seven elements:
FSTbar: the mean FST inferred from loci not marked as outliers
FSTNoCorrbar: the mean FST (not corrected for sample size; gives an upwardly biased estimate of FST)
dfInferred: the inferred number of degrees of freedom for the chi-square distribution of neutral FST
numberLowFstOutliers: Number of loci flagged as having a significantly low FST (not reliable)
numberHighFstOutliers: Number of loci identified as having significantly high FST
results: a data frame with a row for each locus. This data frame includes all the original columns in the data set, and six new ones:
$indexOrder (the original order of the input data set),
$GoodH (Boolean variable which is TRUE if the expected heterozygosity is greater than the Hmin set by input),
$OutlierFlag (TRUE if the method identifies the locus as an outlier, FALSE otherwise), and
$q (the q-value for the test of neutrality for the locus)
$pvalues (the p-value for the test of neutrality for the locus)
$pvaluesRightTail (the one-sided, right tail, p-value for a locus)
Author(s)
Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr); original implementation of Whitlock & Lotterhos
Creates OutFLANK input file from individual genotype info.
Description
Creates OutFLANK input file from individual genotype info.
Usage
utils.outflank.MakeDiploidFSTMat(SNPmat, locusNames, popNames)
Arguments
SNPmat | This is an array of genotypes with a row for each individual. There should be a column for each SNP, with the number of copies of the focal allele (0, 1, or 2) for that individual. If that individual is missing data for that SNP, there should be a 9, instead. |
locusNames | A list of names for each SNP locus. There should be the same number of locus names as there are columns in SNPmat. |
popNames | A list of population names to give location for each individual. Typically multiple individuals will have the same popName. The list popNames should have the same length as the number of rows in SNPmat. |
Value
Returns a data frame in the form needed for the main OutFLANK function.
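Preparing the three inputs in the format described above can be sketched in base R. The genotype matrix and names are invented; the only step taken from the argument descriptions is that missing genotypes are coded as 9 and that the lengths of locusNames and popNames match the matrix dimensions.

```r
# Toy genotypes: individuals in rows, SNPs in columns, NA = missing
geno <- matrix(c(0,  1, NA,
                 2, NA,  1,
                 1,  0,  2),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("i1", "i2", "i3"), c("L1", "L2", "L3")))

# Recode missing data as 9, as required for SNPmat
SNPmat <- geno
SNPmat[is.na(SNPmat)] <- 9

locusNames <- colnames(geno)      # one name per column of SNPmat
popNames   <- c("A", "A", "B")    # one population label per row of SNPmat

# Basic consistency checks on the three inputs
stopifnot(length(locusNames) == ncol(SNPmat),
          length(popNames) == nrow(SNPmat))
print(SNPmat)
```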
Plotting functions for Fst distributions after OutFLANK
Description
This function takes the output of OutFLANK as input with the OFoutput parameter. It plots a histogram of the FST (by default, the uncorrected FSTs used by OutFLANK) of loci and overlays the inferred null histogram.
Usage
utils.outflank.plotter(
  OFoutput,
  withOutliers = TRUE,
  NoCorr = TRUE,
  Hmin = 0.1,
  binwidth = 0.005,
  Zoom = FALSE,
  RightZoomFraction = 0.05,
  titletext = NULL
)
Arguments
OFoutput | The output of the function OutFLANK() |
withOutliers | Determines whether the loci marked as outliers (with $OutlierFlag) are included in the histogram. |
NoCorr | Plots the distribution of FSTNoCorr when TRUE. Recommended, because this is the data used by OutFLANK to infer the distribution. |
Hmin | The minimum heterozygosity required before including a locus in the plot. |
binwidth | The width of bins in the histogram. |
Zoom | If Zoom is set to TRUE, then the graph will zoom in on the right tail of the distribution (based on argument RightZoomFraction) |
RightZoomFraction | Used when Zoom = TRUE. Defines the proportion of the distribution to plot. |
titletext | Allows a text string to be printed as a title on the graph |
Value
Produces a histogram of the FST.
An internal function to save a ggplot object to disk in RDS binary format
Description
WARNING: UTILITY SCRIPTS ARE FOR INTERNAL USE ONLY AND SHOULD NOT BE USED BY END USERS AS THEIR USE OUT OF CONTEXT COULD LEAD TO UNPREDICTABLE OUTCOMES.
Usage
utils.plot.save(x, dir = NULL, file = NULL, verbose = NULL, ...)
Arguments
x | Name of the ggplot object. |
dir | Name of the directory to save the file. |
file | Name of the file to save the plot to (omit file extension) |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity] |
... | Parameters passed to function ggsave, such as width and height, when the ggplot is to be saved. |
Details
An internal function to save a ggplot object to disk in RDS binary format. Uses saveRDS() to save the file with an .RDS extension; can be reloaded with gl.load().
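The underlying save and reload round trip can be sketched with base R alone. A plain list stands in for the ggplot object so that no plotting package is needed, the file name `my_plot` is invented, and readRDS() is used here for reloading instead of gl.load().

```r
# A plain list stands in for a ggplot object in this sketch
plot_obj <- list(title = "PCoA", points = data.frame(x = 1:3, y = c(2, 4, 6)))

# Build the target path: directory + file name + .RDS extension
out_dir  <- tempdir()
out_file <- file.path(out_dir, paste0("my_plot", ".RDS"))

# Save in RDS binary format and read the object back
saveRDS(plot_obj, out_file)
restored <- readRDS(out_file)
print(identical(plot_obj, restored))
```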
Author(s)
Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)
Utility function to read in DArT data
Description
Utility to import DArT data into R. Internal function called by gl.read.dart().
Usage
utils.read.dart(
  filename,
  nas = "-",
  topskip = NULL,
  lastmetric = "RepAvg",
  service.row = 1,
  plate.row = 3,
  verbose = NULL
)
Arguments
filename | Path to file (csv file only currently) [required]. |
nas | A character specifying NAs [default '-']. |
topskip | A number specifying the number of rows to be skipped. If not provided, the number of rows to be skipped is 'guessed' from the number of rows with '*' at the beginning [default NULL]. |
lastmetric | Specifies the last non-genetic column [default 'RepAvg']. Be sure to check that this is true, otherwise the number of individuals will not match. You can also specify the last column by a number. |
service.row | The row number in which the information of the DArT service is contained [default 1]. |
plate.row | The row number in which the information of the plate location is contained [default 3]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL]. |
Value
A list of length 5: DArT format (one or two rows), individuals, SNPs, non-genetic metrics, and genetic data (still in two-line format; rows = SNPs, columns = individuals).
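How topskip might be guessed and how lastmetric separates metric columns from individuals can be sketched on a few lines of text. The file layout below is invented for illustration and is much simpler than a real DArT report; only the rule of counting rows that begin with '*' and the role of the 'RepAvg' column come from the argument descriptions.

```r
# Invented, heavily simplified csv content
lines <- c("*,*,*,ServiceA,ServiceA",
           "*,*,*,1,1",
           "*,*,*,A1,A2",
           "AlleleID,CallRate,RepAvg,ind1,ind2",
           "100|F|0,1,0.99,1,0")

# Guess the number of rows to skip: rows that begin with '*'
topskip <- sum(startsWith(lines, "*"))

# The header row follows the skipped rows
header <- strsplit(lines[topskip + 1], ",")[[1]]

# Columns after the last non-genetic metric hold the individuals
last_metric_col <- match("RepAvg", header)
ind_names <- header[(last_metric_col + 1):length(header)]
print(topskip)
print(ind_names)
```

If lastmetric pointed at the wrong column, ind_names would include metric columns or lose individuals, which is why the argument description asks for it to be checked.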
Author(s)
Custodian: Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
A utility script to recalculate the OneRatioRef, OneRatioSnp, PICRef, PICSnp, and AvgPIC by locus after some individuals or populations have been deleted.
Description
The locus metadata supplied by DArT has OneRatioRef, OneRatioSnp, PICRef, PICSnp, and AvgPIC included, but the allelic composition will change when some individuals, or populations, are removed from the dataset and so the initial statistics will no longer apply. This script recalculates these statistics and places the recalculated values in the appropriate place in the genlight object.
Usage
utils.recalc.avgpic(x, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2]. |
Details
If the locus metadata OneRatioRef|Snp, PICRef|Snp and/or AvgPIC do not exist, the script creates and populates them.
Value
The modified genlight object.
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
See Also
utils.recalc.metrics for recalculating all metrics, utils.recalc.callrate for recalculating CallRate, utils.recalc.freqhomref for recalculating frequency of homozygous reference, utils.recalc.freqhomsnp for recalculating frequency of homozygous alternate, utils.recalc.freqhets for recalculating frequency of heterozygotes, gl.recalc.maf for recalculating minor allele frequency, gl.recalc.rdepth for recalculating average read depth
Examples
#out <- utils.recalc.avgpic(testset.gl)
A utility script to recalculate the callrate by locus after some populations have been deleted
Description
SNP datasets generated by DArT have missing values primarily arising from failure to call a SNP because of a mutation at one or both of the restriction enzyme recognition sites. The locus metadata supplied by DArT has callrate included, but the call rate will change when some individuals are removed from the dataset. This script recalculates the callrate and places these recalculated values in the appropriate place in the genlight object. It sets the Call Rate flag to TRUE.
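The recalculation itself amounts to the proportion of non-missing genotypes per locus. A minimal base R sketch on an invented genotype matrix (individuals in rows, loci in columns) follows; it does not touch a genlight object or its flags.

```r
# Toy genotypes: individuals in rows, loci in columns, NA = not called
geno <- matrix(c( 0, 1, NA, 2,
                 NA, 1, NA, 0,
                  2, 2,  1, 0),
               nrow = 3, byrow = TRUE)

# Call rate per locus: proportion of individuals with a called genotype
CallRate <- colMeans(!is.na(geno))
print(CallRate)

# After removing the second individual the call rates change
CallRate_sub <- colMeans(!is.na(geno[-2, , drop = FALSE]))
print(CallRate_sub)
```

The second calculation shows why the stored metric becomes stale once individuals are dropped.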
Usage
utils.recalc.callrate(x, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2]. |
Value
The modified genlight object
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
See Also
utils.recalc.metrics for recalculating all metrics, utils.recalc.avgpic for recalculating avgPIC, utils.recalc.freqhomref for recalculating frequency of homozygous reference, utils.recalc.freqhomsnp for recalculating frequency of homozygous alternate, utils.recalc.freqhets for recalculating frequency of heterozygotes, gl.recalc.maf for recalculating minor allele frequency, gl.recalc.rdepth for recalculating average read depth
Examples
#out <- utils.recalc.callrate(testset.gl)
A utility script to recalculate the frequency of the heterozygous SNPs by locus after some populations have been deleted
Description
The locus metadata supplied by DArT has FreqHets included, but the frequency of the heterozygotes will change when some individuals are removed from the dataset.
Usage
utils.recalc.freqhets(x, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2]. |
Details
This script recalculates the FreqHets and places these recalculated values in the appropriate place in the genlight object.
Note that the frequency of the heterozygous SNPs is calculated from the individuals that could be scored.
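Restricting the calculation to the individuals that could be scored can be sketched in base R. The matrix is invented, and the coding 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate is an assumption stated here; the two homozygote frequencies are included only to show that the three values sum to one per locus.

```r
# Toy genotypes (assumed coding: 0 = hom. reference, 1 = het, 2 = hom. alternate)
geno <- matrix(c( 0, 1, NA, 2,
                 NA, 1, NA, 0,
                  2, 2,  1, 0),
               nrow = 3, byrow = TRUE)

# na.rm = TRUE restricts each frequency to the scored individuals
FreqHets   <- colMeans(geno == 1, na.rm = TRUE)
FreqHomRef <- colMeans(geno == 0, na.rm = TRUE)
FreqHomSnp <- colMeans(geno == 2, na.rm = TRUE)

print(rbind(FreqHets, FreqHomRef, FreqHomSnp))
```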
Value
The modified genlight object.
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
See Also
utils.recalc.metrics for recalculating all metrics, utils.recalc.callrate for recalculating CallRate, utils.recalc.freqhomref for recalculating frequency of homozygous reference, utils.recalc.freqhomsnp for recalculating frequency of homozygous alternate, utils.recalc.avgpic for recalculating AvgPIC, gl.recalc.maf for recalculating minor allele frequency, gl.recalc.rdepth for recalculating average read depth
Examples
#out <- utils.recalc.freqhets(testset.gl)
A utility script to recalculate the frequency of the homozygous reference SNP by locus after some populations have been deleted
Description
The locus metadata supplied by DArT has FreqHomRef included, but the frequency of the homozygous reference will change when some individuals are removed from the dataset.
Usage
utils.recalc.freqhomref(x, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2]. |
Details
This script recalculates the FreqHomRef and places these recalculated values in the appropriate place in the genlight object.
Note that the frequency of the homozygote reference SNPs is calculated from the individuals that could be scored.
Value
The modified genlight object
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
See Also
utils.recalc.metrics for recalculating all metrics, utils.recalc.callrate for recalculating CallRate, utils.recalc.avgpic for recalculating AvgPIC, utils.recalc.freqhomsnp for recalculating frequency of homozygous alternate, utils.recalc.freqhets for recalculating frequency of heterozygotes, gl.recalc.maf for recalculating minor allele frequency, gl.recalc.rdepth for recalculating average read depth
Examples
#result <- utils.recalc.freqhomref(testset.gl)
A utility script to recalculate the frequency of the homozygous alternate SNP by locus after some populations have been deleted
Description
The locus metadata supplied by DArT has FreqHomSnp included, but the frequency of the homozygous alternate will change when some individuals are removed from the dataset.
Usage
utils.recalc.freqhomsnp(x, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2]. |
Details
This script recalculates the FreqHomSnp and places these recalculated values in the appropriate place in the genlight object.
Note that the frequency of the homozygote alternate SNPs is calculated from the individuals that could be scored.
Value
The modified genlight object.
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
See Also
utils.recalc.metrics for recalculating all metrics, utils.recalc.callrate for recalculating CallRate, utils.recalc.freqhomref for recalculating frequency of homozygous reference, utils.recalc.avgpic for recalculating AvgPIC, utils.recalc.freqhets for recalculating frequency of heterozygotes, gl.recalc.maf for recalculating minor allele frequency, gl.recalc.rdepth for recalculating average read depth
Examples
#out <- utils.recalc.freqhomsnp(testset.gl)
A utility script to recalculate the minor allele frequency by locus, typically after some populations have been deleted
Description
The locus metadata supplied by DArT does not have MAF included, so it is calculated and added to the locus metadata by this script. The minor allele frequency will change when some individuals are removed from the dataset. This script recalculates the MAF and places these recalculated values in the appropriate place in the genlight object.
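The minor allele frequency of a diploid SNP locus can be sketched from a 0/1/2 genotype matrix in base R. The matrix is invented and the coding (count of alternate alleles per individual) is an assumption stated here; only scored individuals contribute.

```r
# Toy genotypes: counts of the alternate allele (0, 1 or 2), NA = missing
geno <- matrix(c( 0, 1, NA, 2,
                 NA, 1, NA, 0,
                  2, 2,  1, 0),
               nrow = 3, byrow = TRUE)

# Frequency of the alternate allele among scored individuals (diploid)
p_alt <- colMeans(geno, na.rm = TRUE) / 2

# The minor allele is whichever allele is rarer at the locus
maf <- pmin(p_alt, 1 - p_alt)
print(maf)
```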
Usage
utils.recalc.maf(x, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data [required]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2]. |
Value
The modified genlight dataset.
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
See Also
utils.recalc.metrics for recalculating all metrics, utils.recalc.callrate for recalculating CallRate, utils.recalc.freqhomref for recalculating frequency of homozygous reference, utils.recalc.freqhomsnp for recalculating frequency of homozygous alternate, utils.recalc.freqhets for recalculating frequency of heterozygotes, gl.recalc.avgpic for recalculating AvgPIC, gl.recalc.rdepth for recalculating average read depth
Examples
#f <- dartR::utils.recalc.maf(testset.gl)
A utility script to reset to FALSE (or TRUE) the locus metric flags after some individuals or populations have been deleted.
Description
The locus metadata supplied by DArT has OneRatioRef, OneRatioSnp, PICRef,PICSnp, and AvgPIC included, but the allelic composition will change whensome individuals are removed from the dataset and so the initial statisticswill no longer apply. This applies also to some variable calculated by dartR(e.g. maf). This script resets the locus metrics flags to FALSE to indicatethat these statistics in the genlight object are no longer current. Theverbosity default is also set, and in the case of SilcoDArT, the flags PICand OneRatio are also set.
Usage
utils.reset.flags(x, set = FALSE, value = 2, verbose = NULL)
Arguments
x | Name of the genlight object containing the SNP data or tag presence/absence data (SilicoDArT) [required]. |
set | Set the flags to TRUE or FALSE [default FALSE]. |
value | Set the default verbosity for all functions, where verbosity is not specified [default 2]. |
verbose | Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL]. |
Details
If the locus metrics do not exist, then they are added to the genlight object but not populated. If the locus metrics flags do not exist, then they are added to the genlight object and set to FALSE (or TRUE).
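The idea of marking stored metrics as stale can be sketched in base R as below. This uses a plain list as a stand-in and is not a real genlight object; the field names, flag names, and the helper `reset_flags` are all hypothetical and only illustrate the pattern, not how dartR stores its flags.

```r
# Stand-in for a genotype object: a plain list, NOT a real genlight object.
# All field and flag names here are hypothetical.
obj <- list(
  geno  = matrix(c(0, 1, 2, 2, 0, 1), nrow = 3),
  flags = list(maf = TRUE, callrate = TRUE)
)

# Hypothetical helper: set every locus metric flag to the same value.
reset_flags <- function(x, set = FALSE) {
  x$flags <- lapply(x$flags, function(f) set)
  x
}

# Removing an individual invalidates the stored metrics, so flag them as stale.
obj$geno <- obj$geno[-1, , drop = FALSE]
obj <- reset_flags(obj, set = FALSE)
```

A recalculation step could then check each flag and recompute only the metrics marked FALSE.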
Value
The modified genlight object
Author(s)
Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)
See Also
utils.recalc.metrics for recalculating all metrics, utils.recalc.callrate for recalculating CallRate, utils.recalc.freqhomref for recalculating frequency of homozygous reference, utils.recalc.freqhomsnp for recalculating frequency of homozygous alternate, utils.recalc.freqhet for recalculating frequency of heterozygotes, gl.recalc.maf for recalculating minor allele frequency, gl.recalc.rdepth for recalculating average read depth
Examples
#result <- utils.reset.flags(testset.gl)
Spatial autocorrelation coefficient calculations
Description
Carries out calculations of the spatial autocorrelation coefficient starting from a genetic and a geographic distance matrix.
Usage
utils.spautocor(GD, GGD, permutation = FALSE, bootstrap = FALSE, bins = 10, reps)
Arguments
GD | Genetic distance matrix. |
GGD | Geographic distance matrix. |
permutation | Whether permutation calculations for the null hypothesis of no spatial structure should be carried out [default FALSE]. |
bootstrap | Whether bootstrap calculations to compute the 95% confidence intervals around r should be carried out [default FALSE]. |
bins | The number of bins for the distance classes(i.e. |
reps | The number to be used for permutation and bootstrap analyses [default 100]. |
Details
The code of this function is based on spautocorr from the package PopGenReport, which has been modified to fix a few bugs (as of PopGenReport v 3.0.4) and to allow calculation of bootstrap estimates.
See details from gl.spatial.autoCorr for a detailed explanation.
Value
Returns a data frame with the following columns:
Bin The distance classes
N The number of pairwise comparisons within each distance class
r.uc The uncorrected autocorrelation coefficient
if both bootstrap and permutation are FALSE; otherwise only r estimates are returned
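To make the Bin and N columns concrete, the base R sketch below groups pairwise geographic distances into distance classes and counts the comparisons in each class. It assumes equal-width classes spanning the observed range, which may differ from how utils.spautocor builds its bins; the coordinates are invented toy data, and the autocorrelation coefficient itself is not computed here.

```r
# Toy coordinates for five individuals (hypothetical data).
xy <- cbind(x = c(0, 1, 2, 5, 9), y = c(0, 0, 0, 0, 0))
GGD <- as.matrix(dist(xy))  # geographic (Euclidean) distance matrix

# Use each pair once: take the lower triangle of the symmetric matrix.
pair_dist <- GGD[lower.tri(GGD)]

# Assumption: equal-width distance classes from 0 to the largest distance.
bins <- 3
breaks <- seq(0, max(pair_dist), length.out = bins + 1)
bin_id <- cut(pair_dist, breaks = breaks, include.lowest = TRUE)

# N: number of pairwise comparisons within each distance class.
n_per_bin <- as.vector(table(bin_id))
```

With five individuals there are ten pairs in total, so the counts in `n_per_bin` sum to ten.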
Author(s)
Carlo Pacioni & Bernd Gruber
References
Smouse, PE, Peakall, R. 1999. Spatial autocorrelation analysis of individual multiallele and multilocus genetic structure. Heredity 82, 561-573.
Double, MC, et al. 2005. Dispersal, philopatry and infidelity: dissecting local genetic structure in superb fairy-wrens (Malurus cyaneus). Evolution 59, 625-635.
Peakall, R, et al. 2003. Spatial autocorrelation analysis offers new insights into gene flow in the Australian bush rat, Rattus fuscipes. Evolution 57, 1182-1195.
Smouse, PE, et al. 2008. A heterogeneity test for fine-scale genetic structure. Molecular Ecology 17, 3389-3400.
Gonzales, E, et al. 2010. The impact of landscape disturbance on spatial genetic structure in the Guanacaste tree, Enterolobium cyclocarpum (Fabaceae). Journal of Heredity 101, 133-143.
Beck, N, et al. 2008. Social constraint and an absence of sex-biased dispersal drive fine-scale genetic structure in white-winged choughs. Molecular Ecology 17, 4346-4358.
See Also
Examples
# See gl.spatial.autoCorr
Util function for evanno plots
Description
These functions were copied from package strataG, which is no longer on CRAN (maintained by Eric Archer)
Usage
utils.structure.evanno(sr, plot = TRUE)
Arguments
sr | structure run object |
plot | should the plots be returned |
Author(s)
Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr); original implementation of Eric Archer https://github.com/EricArcher/strataG
structure util functions
Description
These functions were copied from package strataG, which is no longer on CRAN (maintained by Eric Archer)
Usage
utils.structure.genind2gtypes(x)
Arguments
x | a genind object |
Value
a gtypes object
Author(s)
Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr); original implementation of Eric Archer https://github.com/EricArcher/strataG
Utility function to run Structure
Description
These functions were copied from package strataG, which is no longer on CRAN (maintained by Eric Archer)
Usage
utils.structure.run(g, k.range = NULL, num.k.rep = 1, label = NULL, delete.files = TRUE, exec = "structure", ...)
Arguments
g | a gtypes object [see |
k.range | vector of values to for |
num.k.rep | number of replicates for each value in |
label | label to use for input and output files |
delete.files | logical. Delete all files when STRUCTURE is finished? |
exec | name of executable for STRUCTURE. Defaults to "structure". |
... | arguments to be passed to |
Value
structureRun a list where each element is a list with results from structureRead and a vector of the filenames used
structureWrite a vector of the filenames used by STRUCTURE
structureRead a list containing:
summary new locus name, which is a combination of loci in group
q.mat data.frame of assignment probabilities for each id
prior.anc list of prior ancestry estimates for each individual where population priors were used
files vector of input and output files used by STRUCTURE
label label for the run
Author(s)
Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr); original implementation of Eric Archer https://github.com/EricArcher/strataG
Setting up the package
Description
Setting theme, colors and verbosity
Usage
zzz
Format
An object of class NULL of length 0.