Type:

Package

Title:

Importing and Analysing 'SNP' and 'Silicodart' Data Generated by Genome-Wide Restriction Fragment Analysis

Version:

2.9.9.5

Date:

2025-03-24

Description:

Functions are provided that facilitate the import and analysis of 'SNP' (single nucleotide polymorphism) and 'silicodart' (presence/absence) data. The main focus is on data generated by 'DarT' (Diversity Arrays Technology); however, data from other sequencing platforms can be used once 'SNP' or related fragment presence/absence data from any source is imported. Genetic datasets are stored in a derived 'genlight' format (package 'adegenet'), which allows for very compact storage of data and metadata. Functions are available for importing and exporting 'SNP' and 'silicodart' data, and for reporting and filtering on various criteria (e.g. 'CallRate', heterozygosity, reproducibility, maximum allele frequency). Additional functions are available for visualization (e.g. Principal Coordinates Analysis) and for creating a spatial representation using maps. 'dartR' also supports analysis with third-party software packages such as 'newhybrid', 'structure', 'NeEstimator' and 'blast'. Since version 2.0.3, simulation functions are also implemented that forward-simulate 'SNP' dynamics under different population and evolutionary dynamics. Comprehensive tutorials and support can be found at our 'github' repository: github.com/green-striped-gecko/dartR/. To cite 'dartR', type citation('dartR') in the console.

VignetteBuilder:

knitr

Encoding:

UTF-8

Depends:

R (≥ 3.5), adegenet (≥ 2.0.0), ggplot2, dplyr, dartR.data

Imports:

ape, crayon, data.table, fields, foreach, gridExtra, MASS, methods, patchwork, plyr, PopGenReport, raster, reshape2, shiny, SNPRelate, sp (≥ 1.6.1), StAMPP, stats, stringr, tidyr, utils, gsubfn, purrr

Suggests:

boot, devtools, directlabels, dismo, doParallel, expm, gdistance, ggtern, gganimate, ggrepel, grid, gtable, ggthemes, gplots, HardyWeinberg, hierfstat, igraph, iterpc, knitr, label.switching, lattice, leaflet, leaflet.minicharts, markdown, mmod, networkD3, parallel, pegas, pheatmap, plotly, poppr, proxy, qvalue, RColorBrewer, Rcpp, rgl, rmarkdown, rrBLUP, scales, seqinr, shinyBS, shinyjs, shinythemes, shinyWidgets, SIBER, snpStats, stringi, terra, tibble, vcfR, zoo, viridis, vegan

License:

GPL (≥ 3)

LazyData:

true

RoxygenNote:

7.3.2

NeedsCompilation:

Packaged:

2025-03-25 00:35:23 UTC; s425824

Author:

Bernd Gruber [aut, cre], Arthur Georges [aut], Jose L. Mijangos [aut], Carlo Pacioni [aut], Diana Robledo-Ruiz [aut], Peter J. Unmack [ctb], Oliver Berry [ctb], Lindsay V. Clark [ctb], Floriaan Devloo-Delva [ctb], Eric Archer [ctb]

URL:

https://green-striped-gecko.github.io/dartR/, https://github.com/green-striped-gecko/dartR

BugReports:

https://groups.google.com/g/dartr?pli=1

Maintainer:

Bernd Gruber <bernd.gruber@canberra.edu.au>

Repository:

CRAN

Date/Publication:

2025-03-25 09:50:02 UTC

indexing dartR objects correctly...

Description

indexing dartR objects correctly...

Usage

## S4 method for signature 'dartR,ANY,ANY,ANY'
x[i, j, ..., pop = NULL, treatOther = TRUE, quiet = TRUE, drop = FALSE]

Arguments

x

dartR object

i

index for individuals

j

index for loci

...

other parameters

pop

list of populations to be kept

treatOther

elements in other (and ind.metrics & loci.metrics) are indexed as well. default: TRUE

quiet

warnings are suppressed. default: TRUE

drop

reduced to a vector if a single individual/locus is selected. default: FALSE [should never be set to TRUE]

A genlight object created via the read.dart functions

Description

This is a test data set to test the validity of functions within dartR and is based on a DArT SNP data set of simulated bandicoots across Australia. It contains 96 individuals and 1000 SNPs.

Usage

bandicoot.gl

Format

genlight object

Author(s)

Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)

adjust cbind for dartR

Description

cbind is a bit lazy and does not take care of the metadata (so data in the other slot is lost). You can get most of the loci metadata back using gl.compliance.check.

Usage

## S3 method for class 'dartR'
cbind(...)

Arguments

...

list of dartR objects

Value

A genlight object

Examples

t1 <- platypus.gl
class(t1) <- "dartR"
t2 <- cbind(t1[,1:10],t1[,11:20])

Converts a genind object into a genlight object

Description

Converts a genind object into a genlight object

Usage

gi2gl(gi, parallel = FALSE, verbose = NULL)

Arguments

gi

A genind object [required].

parallel

Switch to deactivate the parallel version. It might not be worth running it in parallel most of the time [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2].

Details

Be aware that, due to ambiguity about which allele is the reference allele, a combination of gi2gl(gl2gi(gl)) does not return an identical object (but in terms of analysis these conversions are equivalent).
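The reference-allele ambiguity described above can be illustrated with a minimal base-R sketch. It assumes genotypes are coded as counts (0, 1, 2) of one of the two alleles; the helper name swap_reference is hypothetical and not part of dartR, and the sketch does not call gi2gl or gl2gi.

```r
# Toy genotype matrix: rows = individuals, columns = loci,
# entries = number of copies of the (arbitrarily chosen) second allele.
geno <- matrix(c(0, 1, 2,
                 2, 1, 0,
                 1, 1, 2),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("ind", 1:3), paste0("loc", 1:3)))

# Hypothetical helper: relabel which allele is treated as the reference.
swap_reference <- function(g) 2 - g

swapped <- swap_reference(geno)

# The two codings are not identical objects ...
same_coding <- identical(geno, swapped)

# ... but heterozygote calls (coded 1) are untouched, so a statistic such as
# the per-locus proportion of heterozygotes is the same under both codings.
het_original <- colMeans(geno == 1)
het_swapped  <- colMeans(swapped == 1)
same_het <- isTRUE(all.equal(het_original, het_swapped))
```

Statistics that depend only on genotype classes, not on which allele is labelled as the reference, are therefore unaffected by the round trip.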

Value

A genlight object, with all slots filled.

Author(s)

Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

Estimates expected Heterozygosity

Description

Estimates expected Heterozygosity

Usage

gl.He(gl)

Arguments

gl

A genlight object [required]

Value

A simple vector with He for each locus

Author(s)

Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
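As a rough illustration of what a per-locus expected heterozygosity looks like, here is a minimal base-R sketch. It assumes biallelic genotypes coded 0/1/2 (copies of the alternate allele) and the textbook definition He = 2pq; the function name expected_het is hypothetical, and gl.He may differ in detail (for example in how it treats missing data or sample size).

```r
# Toy data: 4 individuals x 3 loci, NA = missing call.
geno <- matrix(c(0, 1, 2, NA,   # loc1
                 0, 0, 1, 1,    # loc2
                 0, 0, 0, 0),   # loc3 (monomorphic)
               nrow = 4,
               dimnames = list(paste0("ind", 1:4), paste0("loc", 1:3)))

# Hypothetical helper: expected heterozygosity per locus under He = 2pq.
expected_het <- function(g) {
  q <- colMeans(g, na.rm = TRUE) / 2  # alternate allele frequency
  p <- 1 - q                          # reference allele frequency
  2 * p * q
}

he <- expected_het(geno)
```

A monomorphic locus such as loc3 yields an expected heterozygosity of zero, which is a quick sanity check for any implementation.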

Estimates observed Heterozygosity

Description

Estimates observed Heterozygosity

Usage

gl.Ho(gl)

Arguments

gl

A genlight object [required]

Value

A simple vector with Ho for each locus

Author(s)

Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)
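Observed heterozygosity per locus can be sketched in base R as the proportion of called genotypes that are heterozygous. This assumes genotypes coded 0/1/2 with 1 denoting a heterozygote; the function name observed_het is hypothetical, and the sketch is not taken from the gl.Ho source.

```r
# Toy data: 4 individuals x 3 loci, NA = missing call.
geno <- matrix(c(0, 1, 2, NA,   # loc1
                 0, 0, 1, 1,    # loc2
                 0, 0, 0, 0),   # loc3
               nrow = 4,
               dimnames = list(paste0("ind", 1:4), paste0("loc", 1:3)))

# Hypothetical helper: share of non-missing genotypes that are heterozygous.
observed_het <- function(g) colMeans(g == 1, na.rm = TRUE)

ho <- observed_het(geno)
```

Missing calls are excluded from both the numerator and the denominator, so loc1 is evaluated on three individuals only.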

Estimates effective population size using the Linkage Disequilibrium method based on NeEstimator (V2)

Description

This function is basically a convenience function that runs the LD Ne estimator using NeEstimator 2 within R on the provided genlight object. To be able to do so, the software has to be downloaded from its website and the appropriate executable Ne2-1 has to be copied into the path specified in the function (see the example below).

Usage

gl.LDNe(
  x,
  outfile = "genepopLD.txt",
  outpath = tempdir(),
  neest.path = getwd(),
  critical = 0,
  singleton.rm = TRUE,
  mating = "random",
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors_pop = discrete_palette,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP data [required].

outfile

File name of the output file with all results from NeEstimator 2 [default 'genepopLD.txt'].

outpath

Path where to save the output file. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory [default tempdir(), mandated by CRAN].

neest.path

Path to the folder of the NE2-1 file. Please note there are 3 different executables depending on your OS: Ne2-1.exe (=Windows), Ne2-1M (=Mac), Ne2-1L (=Linux). You only need to point to the folder (the function will recognise which OS you are running) [default getwd()].

critical

(vector of) Critical values that are used to remove alleles based on their minor allele frequency. This can be done before using the gl.filter.maf function, therefore the default is set to 0 (no loci are removed). To run for MAF 0 and MAF 0.05 at the same time specify: critical = c(0,0.05) [default 0].

singleton.rm

Whether to remove singleton alleles [default TRUE].

mating

Mating model used in the formula: 'random' for random mating or 'monogamy' for monogamy [default 'random'].

plot.out

Specify if plot is to be produced [default TRUE].

plot_theme

User specified theme [default theme_dartR()].

plot_colors_pop

A discrete palette for population colors or a list with as many colors as there are populations in the dataset [default discrete_palette].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

Dataframe with the results as table

Author(s)

Custodian: Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

Examples

## Not run: 
# SNP data (use two populations and only the first 100 SNPs)
pops <- possums.gl[1:60,1:100]
nes <- gl.LDNe(pops, outfile="popsLD.txt", outpath=tempdir(),
neest.path = "./path_to Ne-21",
critical=c(0,0.05), singleton.rm=TRUE, mating='random')
nes
## End(Not run)

Calculates allele frequency of the first and second allele for each locus. A very simple function to report allele frequencies.

Description

Calculates allele frequency of the first and second allele for each locus. A very simple function to report allele frequencies.

Usage

gl.alf(x)

Arguments

x

Name of the genlight object containing the SNP data [required].

Value

A simple data.frame with alf1, alf2.

Author(s)

Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)

Examples

#for the first 10 loci only
gl.alf(possums.gl[,1:10])
barplot(t(as.matrix(gl.alf(possums.gl[,1:10]))))

Generates percentage allele frequencies by locus and population

Description

This is a support script, to take SNP data or SilicoDArT presence/absence data grouped into populations in a genlight object {adegenet} and generate a table of allele frequencies for each population and locus.

Usage

gl.allele.freq(x, percent = FALSE, by = "pop", simple = FALSE, verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP or Tag P/A (SilicoDArT) data [required].

percent

If TRUE, percentage allele frequencies are given, if FALSE allele proportions are given [default FALSE]

by

If by='popxloc' then breakdown is given by population and locus; if by='pop' then breakdown is given by population with statistics averaged across loci; if by='loc' then breakdown is given by locus with statistics averaged across individuals [default 'pop']

simple

A legacy option to return a dataframe with the frequency of the reference allele (alf1) and the frequency of the alternate allele (alf2) by locus [default FALSE]

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]

Value

A matrix with allele (SNP data) or presence/absence frequencies (Tag P/A data) broken down by population and locus

Author(s)

Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

Examples

gl.allele.freq(testset.gl,percent=FALSE,by='pop')
gl.allele.freq(testset.gl,percent=FALSE,by="loc")
gl.allele.freq(testset.gl,percent=FALSE,by="popxloc")
gl.allele.freq(testset.gl,simple=TRUE)
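To make the population-by-locus breakdown concrete, the following base-R sketch computes alternate-allele frequencies per population from a plain genotype matrix. It assumes SNP genotypes coded 0/1/2 and a population factor supplied separately; the function name pop_allele_freq is hypothetical, and the sketch does not reproduce the output layout of gl.allele.freq.

```r
# Toy data: 4 individuals x 2 loci, two populations, NA = missing call.
geno <- matrix(c(0, 1, 2, 2,     # loc1
                 0, 0, 1, NA),   # loc2
               nrow = 4,
               dimnames = list(paste0("ind", 1:4), c("loc1", "loc2")))
pop <- factor(c("A", "A", "B", "B"))

# Hypothetical helper: alternate-allele frequency by population and locus.
pop_allele_freq <- function(g, pop, percent = FALSE) {
  g0 <- g
  g0[is.na(g0)] <- 0                      # missing calls contribute no alleles
  allele_sum <- rowsum(g0, pop)           # alternate allele copies per population
  called     <- rowsum((!is.na(g)) * 1, pop)  # non-missing genotypes per population
  freq <- allele_sum / (2 * called)       # two allele copies per called genotype
  if (percent) freq * 100 else freq
}

freq     <- pop_allele_freq(geno, pop)
freq_pct <- pop_allele_freq(geno, pop, percent = TRUE)
```

The percent switch only rescales the proportions, mirroring the distinction between percentage allele frequencies and allele proportions described in the arguments.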

Performs AMOVA using genlight data

Description

This script performs an AMOVA based on the genetic distance matrix from stamppNeisD() [package StAMPP] using the amova() function from the package PEGAS for exploring within and between population variation. For detailed information use their help pages: ?pegas::amova, ?StAMPP::stamppAmova. Be aware due to a conflict of the amova functions from various packages I had to 'hack' StAMPP::stamppAmova to avoid a namespace conflict.

Usage

gl.amova(x, distance = NULL, permutations = 100, verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP genotypes, with population information [required].

distance

Distance matrix between individuals (if not provided, NeisD from StAMPP::stamppNeisD is calculated) [default NULL].

permutations

Number of permutations to perform for hypothesis testing [default 100]. Please note that this should be set to 1000 for analysis.

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

An object of class 'amova' which is a list with a table of sums of square deviations (SSD), mean square deviations (MSD), and the number of degrees of freedom, and a vector of variance components.

Author(s)

Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)

Examples

#permutations should be higher, here set to 1 because of speed
out <- gl.amova(bandicoot.gl, permutations=1)

Population assignment using grm

Description

This function takes one individual and estimates their probability of coming from individual populations from multilocus genotype frequencies.

Usage

gl.assign.grm(x, unknown, verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

unknown

Name of the individual to be assigned to a population [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

This function is a re-implementation of the function multilocus_assignment from package gstudio. Description of the method used in this function can be found at: https://dyerlab.github.io/applied_population_genetics/population-assignment.html

Value

A data.frame consisting of assignment probabilities for each population.

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Examples

require("dartR.data")
if ((requireNamespace("rrBLUP", quietly = TRUE)) & (requireNamespace("gplots", quietly = TRUE)) ) {
res <- gl.assign.grm(platypus.gl,unknown="T27")
}

Assign an individual of unknown provenance to population based on Mahalanobis Distance

Description

This script assigns an individual of unknown provenance to one or more target populations based on the unknown individual's proximity to population centroids; proximity is estimated using Mahalanobis Distance.

The following process is followed:

An ordination is undertaken on the populations to again yield a series of orthogonal (independent) axes.
A workable subset of dimensions is chosen: either that specified, or equal to the number of dimensions with substantive eigenvalues, whichever is the smaller.
The Mahalanobis Distance is calculated for the unknown against each population and the probability of membership of each population is calculated. The assignment probabilities are listed in support of a decision.

Usage

gl.assign.mahalanobis(
  x,
  dim.limit = 2,
  plevel = 0.999,
  plot.out = TRUE,
  unknown,
  verbose = NULL
)

Arguments

x

Name of the input genlight object [required].

dim.limit

Maximum number of dimensions to consider for the confidence ellipses [default 2]

plevel

Probability level for bounding ellipses [default 0.999].

plot.out

If TRUE, produces a plot showing the position of the unknown in relation to putative source populations [default TRUE]

unknown

Identity label of the focal individual whose provenance is unknown [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

There are three considerations to assignment. First, consider only those populations for which the unknown has no private alleles. Private alleles are an indication that the unknown does not belong to a target population (provided that the sample size is adequate, say >=10). This can be evaluated with gl.assign.pa().

A next step is to consider the PCoA plot for populations where no private alleles have been detected. The position of the unknown in relation to the confidence ellipses is plotted by this script as a basis for narrowing down the list of putative source populations. This can be evaluated with gl.assign.pca().

The third step (delivered by this script) is to consider the assignment probabilities based on the squared Generalised Linear Distance (Mahalanobis distance) of the unknown from the centroid for each population, then to consider the probability associated with its quantile using the Chi-square approximation. In effect, this index takes into account the position of the unknown in relation to the confidence envelope in all selected dimensions of the ordination. The larger the assignment probability, the greater the confidence in the assignment.
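The calculation in this third step can be sketched with base R alone. The example below is an illustration under stated assumptions, not the code used by gl.assign.mahalanobis: the ordination scores are simulated, the variable names are hypothetical, and the assignment probability is taken as the upper tail of a Chi-square distribution with as many degrees of freedom as retained dimensions.

```r
set.seed(1)

# Hypothetical ordination scores for one population (30 individuals, 2 dimensions).
scores <- cbind(PC1 = rnorm(30, mean = 0, sd = 1),
                PC2 = rnorm(30, mean = 0, sd = 0.5))

# Hypothetical positions of two unknowns in the same ordinated space.
unknown_near <- c(PC1 = 0.2, PC2 = -0.1)
unknown_far  <- c(PC1 = 10,  PC2 = 10)

centroid <- colMeans(scores)
covmat   <- cov(scores)

# stats::mahalanobis() returns the SQUARED Mahalanobis distance.
d2_near <- mahalanobis(unknown_near, center = centroid, cov = covmat)
d2_far  <- mahalanobis(unknown_far,  center = centroid, cov = covmat)

# Chi-square approximation: probability of a distance at least this large.
p_near <- pchisq(d2_near, df = ncol(scores), lower.tail = FALSE)
p_far  <- pchisq(d2_far,  df = ncol(scores), lower.tail = FALSE)

# Outlier rule at alpha = 0.001, i.e. outside the 0.999 confidence envelope.
alpha <- 0.001
outlier_near <- p_near < alpha
outlier_far  <- p_far  < alpha
```

With more than two retained dimensions the same calls apply unchanged; only the number of columns in the score matrix, and hence the degrees of freedom, increases.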

If dim.limit is set to 2, to correspond with the dimensions used in gl.assign.pca(), then the output provides a ranking of the final set of putative source populations.

If dim.limit is set to be > 2, then this script provides a basis for further narrowing the set of putative populations. If the unknown individual is an extreme outlier, say at less than 0.001 probability of population membership (0.999 confidence envelope), then the associated population can be eliminated from further consideration.

Warning: gl.assign.mahalanobis() treats each specified dimension equally, without regard to the percentage variation explained after ordination. If the unknown is an outlier in a lower dimension with an explanatory variance of, say, 0.1% [...] dimensions from the ordination.

Each of the above approaches provides evidence; none is 100% definitive. They need to be interpreted cautiously.

In deciding the assignment, the script considers an individual to be an outlier with respect to a particular population at alpha = 0.001 as default.

Value

A data frame with the results of the assignment analysis.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

## Not run: 
#Test run with a focal individual from the Macleay River (EmmacMaclGeor)
test <- gl.assign.pa(testset.gl, unknown='UC_01044', nmin=10, threshold=1,
verbose=3)
test_2 <- gl.assign.pca(test, unknown='UC_01044', plevel=0.95, verbose=3)
df <- gl.assign.mahalanobis(test_2, unknown='UC_01044', verbose=3)
## End(Not run)

Eliminates populations as possible source populations for an individual of unknown provenance, using private alleles

Description

This script eliminates from consideration as putative source populations, those populations for which the individual has too many private alleles. The populations that remain are putative source populations, subject to further consideration.

The algorithm identifies those target populations for which the individual has no private alleles or for which the number of private alleles does not exceed a user specified threshold.

An excessive count of private alleles is an indication that the unknown does not belong to a target population (provided that the sample size is adequate, say >=10).
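The idea of counting loci with private alleles can be sketched in base R as follows. This is an illustration only: it assumes biallelic genotypes coded 0/1/2, treats an allele as private when the unknown carries it and no individual in the target population does, and uses the hypothetical helper name count_private_loci. The counting rules inside gl.assign.pa may differ.

```r
# Target population: 3 individuals x 4 loci (copies of the alternate allele).
pop_geno <- matrix(c(0, 0, 1,    # loc1: both alleles present
                     2, 2, 2,    # loc2: only the alternate allele present
                     0, 0, 0,    # loc3: only the reference allele present
                     1, 2, 0),   # loc4: both alleles present
                   nrow = 3,
                   dimnames = list(paste0("ind", 1:3), paste0("loc", 1:4)))

# Genotype of the individual of unknown provenance.
unknown <- c(loc1 = 0, loc2 = 1, loc3 = 2, loc4 = 1)

# Hypothetical helper: number of loci at which the unknown carries an allele
# that is absent from the target population.
count_private_loci <- function(unknown, pop_geno) {
  pop_has_ref <- apply(pop_geno, 2, function(v) any(v < 2, na.rm = TRUE))
  pop_has_alt <- apply(pop_geno, 2, function(v) any(v > 0, na.rm = TRUE))
  unk_has_ref <- !is.na(unknown) & unknown < 2
  unk_has_alt <- !is.na(unknown) & unknown > 0
  private <- (unk_has_ref & !pop_has_ref) | (unk_has_alt & !pop_has_alt)
  sum(private)
}

n_private <- count_private_loci(unknown, pop_geno)

# Retain the population only if the count does not exceed the threshold.
retained_strict  <- n_private <= 0
retained_lenient <- n_private <= 2
```

Here the unknown has private alleles at loc2 and loc3, so the population is eliminated under a threshold of 0 but retained under a threshold of 2.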

Usage

gl.assign.pa(
  x,
  unknown,
  nmin = 10,
  threshold = 0,
  n.best = NULL,
  verbose = NULL
)

Arguments

x

Name of the input genlight object [required].

unknown

SpecimenID label (indName) of the focal individual whose provenance is unknown [required].

nmin

Minimum sample size for a target population to be included in the analysis [default 10].

threshold

Populations to retain for consideration; those for which the focal individual has less than or equal to threshold loci with private alleles [default 0].

n.best

If given a value, dictates the best n=n.best populations to retain for consideration (or more if there are ties) based on private alleles [default NULL].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

A genlight object containing the focal individual (assigned to population 'unknown') and populations for which the focal individual is not distinctive (number of loci with private alleles less than or equal to the threshold). If there are no such populations, the genlight object contains only data for the unknown individual.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# Test run with a focal individual from the Macleay River (EmmacMaclGeor)
test <- gl.assign.pa(testset.gl, unknown='UC_00146', nmin=10, threshold=1,
verbose=3)

Assign an individual of unknown provenance to population based on PCA

Description

This script assigns an individual of unknown provenance to one or more target populations based on its proximity to each population defined by a confidence ellipse in ordinated space of two dimensions.

The following process is followed:

The space defined by the loci is ordinated to yield a series of orthogonal axes (independent), and the top two dimensions are considered.
Populations for which the unknown lies outside the specified confidence limits are no longer removed from the dataset.

Usage

gl.assign.pca(x, unknown, plevel = 0.999, plot.out = TRUE, verbose = NULL)

Arguments

x

Name of the input genlight object [required].

unknown

Identity label of the focal individual whose provenance is unknown [required].

plevel

Probability level for bounding ellipses in the PCoA plot [default 0.999].

plot.out

If TRUE, plot the 2D PCA showing the position of the unknown [default TRUE]

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

A next step is to consider the PCoA plot for populations where no private alleles have been detected and the position of the unknown in relation to the confidence ellipses as is plotted by this script. Note, this plot is considering only the top two dimensions of the ordination, and so an unknown lying outside the confidence ellipse can be unambiguously interpreted as it lying outside the confidence envelope. However, if the unknown lies inside the confidence ellipse in two dimensions, then it may still lie outside the confidence envelope in deeper dimensions. This second step is good for eliminating populations from consideration, but does not provide confidence in assignment.

The third step is to consider the assignment probabilities, using the script gl.assign.mahalanobis(). This approach calculates the squared Generalised Linear Distance (Mahalanobis distance) of the unknown from the centroid for each population, and calculates the probability associated with its quantile under the zero truncated normal distribution. This index takes into account position of the unknown in relation to the confidence envelope in all selected dimensions of the ordination.

Each of these approaches provides evidence; none is 100% definitive. They need to be interpreted cautiously. They are best applied sequentially.

In deciding the assignment, the script considers an individual to be an outlier with respect to a particular population at alpha = 0.001 as default.

Value

A genlight object containing only those populations that are putative source populations for the unknown individual.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

## Not run: 
#Test run with a focal individual from the Macleay River (EmmacMaclGeor)
test <- gl.assign.pa(testset.gl, unknown='UC_00146', nmin=10, threshold=1,
verbose=3)
test_2 <- gl.assign.pca(test, unknown='UC_00146', plevel=0.95, verbose=3)
## End(Not run)

Calculates basic statistics for each locus (Hs, Ho, Fis etc.)

Description

Based on function basic.stats. Check ?basic.stats for help.

Usage

gl.basic.stats(x, digits = 4, verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

digits

Number of digits that should be returned [default 4].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

Several tables and lists with all basic stats. See basic.stats for details.

Author(s)

Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)

Examples

# run only if the suggested package hierfstat is installed
if (requireNamespace("hierfstat", quietly = TRUE)) {
out <- gl.basic.stats(possums.gl[1:10,1:100])
}

Aligns nucleotides sequences against those present in a target database using blastn

Description

Basic Local Alignment Search Tool (BLAST; Altschul et al., 1990 & 1997) is a sequence comparison algorithm optimized for speed used to search sequence databases for optimal local alignments to a query. This function creates fasta files, creates databases to run BLAST, runs blastn and filters these results to obtain the best hit per sequence.

This function can be used to run BLAST alignment of short-read (DArTseq data) and long-read sequences (Illumina, PacBio... etc). You can use reference genomes from NCBI, genomes from your private collection, contigs, scaffolds or any other genetic sequence that you would like to use as reference.

Usage

gl.blast(
  x,
  ref_genome,
  task = "megablast",
  Percentage_identity = 70,
  Percentage_overlap = 0.8,
  bitscore = 50,
  number_of_threads = 2,
  verbose = NULL
)

Arguments

x

Either a genlight object containing a column named 'TrimmedSequence' containing the sequence of the SNPs (the sequence tag) trimmed of adapters as provided by DArT; or a path to a fasta file with the query sequences [required].

ref_genome

Path to a reference genome in fasta or fna format [required].

task

Four different tasks are supported: 1) “megablast”, for very similar sequences (e.g., sequencing errors), 2) “dc-megablast”, typically used for inter-species comparisons, 3) “blastn”, the traditional program used for inter-species comparisons, 4) “blastn-short”, optimized for sequences less than 30 nucleotides [default 'megablast'].

Percentage_identity

Not a very sensitive or reliable measure of sequence similarity, however it is a reasonable proxy for evolutionary distance. The evolutionary distance associated with a 10 percent change in Percentage_identity is much greater at longer distances. Thus, a change from 80 to 70 percent identity might reflect divergence 200 million years earlier in time, but the change from 30 percent to 20 percent might correspond to a billion year divergence time change [default 70].

Percentage_overlap

Calculated as alignment length divided by the query length or subject length (whichever is shortest of the two lengths, i.e. length / min(qlen,slen) ) [default 0.8].

bitscore

A rule-of-thumb for inferring homology, a bit score of 50 is almost always significant [default 50].

number_of_threads

Number of threads (CPUs) to use in blastn search [default 2].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]

Details

Installing BLAST

You can download the BLAST installs from: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

It is important to install BLAST in a path that does not contain spaces for this function to work.

Running BLAST

Four different tasks are supported:

“megablast”, for very similar sequences (e.g., sequencing errors)
“dc-megablast”, typically used for inter-species comparisons
“blastn”, the traditional program used for inter-species comparisons
“blastn-short”, optimized for sequences less than 30 nucleotides

If you are running a BLAST alignment of similar sequences, for example Turtle Genome Vs Turtle Sequences, the recommended parameters are: task = “megablast”, Percentage_identity = 70, Percentage_overlap = 0.8 and bitscore = 50.

If you are running a BLAST alignment of highly dissimilar sequences because you are probably looking for sex linked hits in a distantly related species, and you are aligning for example sequences of Chicken Genome Vs Bassiana, the recommended parameters are: task = “dc-megablast”, Percentage_identity = 50, Percentage_overlap = 0.01 and bitscore = 30.

Be aware that running BLAST might take a long time (i.e. days) depending on the size of your query, the size of your database and the number of threads selected for your computer.

BLAST output

The BLAST output is formatted as a table using output format 6, with columns defined in the following order:

qseqid - Query Seq-id
sacc - Subject accession
stitle - Subject Title
qseq - Aligned part of query sequence
sseq - Aligned part of subject sequence
nident - Number of identical matches
mismatch - Number of mismatches
pident - Percentage of identical matches
length - Alignment length
evalue - Expect value
bitscore - Bit score
qstart - Start of alignment in query
qend - End of alignment in query
sstart - Start of alignment in subject
send - End of alignment in subject
gapopen - Number of gap openings
gaps - Total number of gaps
qlen - Query sequence length
slen - Subject sequence length
PercentageOverlap - length / min(qlen,slen)

Databases containing unfiltered aligned sequences, filtered aligned sequences and one hit per sequence are saved to the temporary directory (tempdir) and can be accessed with the function gl.print.reports and listed with the function gl.list.reports. Note that they can be accessed only in the current R session because tempdir is cleared each time that the R session is closed.

BLAST filtering

BLAST output is filtered by ordering the hits of each sequence first by the highest percentage identity, then the highest percentage overlap and then the highest bitscore. Only one hit per sequence is kept based on these selection criteria.
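The ordering rule described above can be sketched in base R on a small table of hits. The data are made up, the column names follow the output table listed earlier, and applying the three thresholds with >= before ranking is an assumption made for illustration; this is not the code used inside gl.blast.

```r
# Hypothetical BLAST hits for two query sequences.
hits <- data.frame(
  qseqid   = c("q1", "q1", "q1", "q2", "q2"),
  sacc     = c("s1", "s2", "s3", "s1", "s4"),
  pident   = c(98, 98, 95, 90, 90),
  length   = c(60, 66, 69, 40, 40),
  qlen     = c(69, 69, 69, 50, 50),
  slen     = c(1000, 1000, 1000, 800, 800),
  bitscore = c(100, 110, 120, 60, 70),
  stringsAsFactors = FALSE
)

# Overlap = alignment length / shorter of query and subject length.
hits$PercentageOverlap <- hits$length / pmin(hits$qlen, hits$slen)

# Assumed threshold step, using the documented defaults.
keep <- hits$pident >= 70 &
  hits$PercentageOverlap >= 0.8 &
  hits$bitscore >= 50
filtered <- hits[keep, ]

# Rank within each query: identity, then overlap, then bitscore (all descending).
ord <- order(filtered$qseqid,
             -filtered$pident,
             -filtered$PercentageOverlap,
             -filtered$bitscore)
filtered <- filtered[ord, ]

# Keep the top-ranked hit per query sequence.
best <- filtered[!duplicated(filtered$qseqid), ]
```

In this toy table, q1 resolves on overlap (two hits tie on identity) and q2 resolves on bitscore (its hits tie on both identity and overlap), which exercises each level of the ranking.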

Value

If the input is a genlight object: returns a genlight object with one hit per sequence merged to the slot $other$loc.metrics. If the input is a fasta file: returns a dataframe with one hit per sequence.

Author(s)

Berenice Talamantes Becerra & Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

References

Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of molecular biology, 215(3), 403-410.
Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., & Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic acids research, 25(17), 3389-3402.
Pearson, W. R. (2013). An introduction to sequence similarity (“homology”) searching. Current protocols in bioinformatics, 42(1), 3-1.

Examples

## Not run: 
res <- gl.blast(x= testset.gl,ref_genome = 'sequence.fasta')
# display of reports saved in the temporary directory
gl.list.reports()
# open the reports saved in the temporary directory
blast_databases <- gl.print.reports(1)
## End(Not run)

Checks the current global verbosity

Description

The verbosity can be set in one of two ways – (a) explicitly by the user by passing a value using the parameter verbose in a function, or (b) by setting the verbosity globally as part of the R environment (gl.set.verbosity).
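The precedence between an explicit argument and a global setting can be sketched as below. This is an assumption-laden illustration: the option name sketch.verbosity and the helper resolve_verbosity are hypothetical, how dartR stores its global verbosity is not described here, and the fallback of 2 is taken from the default stated in the verbose argument descriptions.

```r
# Hypothetical helper: an explicit value wins, otherwise the global setting,
# otherwise the documented default of 2.
resolve_verbosity <- function(x = NULL) {
  if (!is.null(x)) {
    return(x)
  }
  getOption("sketch.verbosity", default = 2)
}

options(sketch.verbosity = NULL)   # make sure no global value is set
v_default  <- resolve_verbosity()  # falls back to 2

options(sketch.verbosity = 3)      # stand-in for a global setting
v_global   <- resolve_verbosity()  # picks up the global value
v_explicit <- resolve_verbosity(5) # explicit argument overrides it

options(sketch.verbosity = NULL)   # clean up
```

Resolving the value once at the top of a function keeps the rest of the function free of repeated NULL checks.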

Usage

gl.check.verbosity(x = NULL)

Arguments

x

User requested level of verbosity [default NULL].

Value

The verbosity, in variable verbose

Author(s)

Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

Examples

gl.check.verbosity()

Checks the global working directory

Description

The working directory can be set in one of two ways – (a) explicitly by the user by passing a value using the parameter plot.dir in a function, or (b) by setting the working directory globally as part of the R environment (gl.setwd). In accordance with CRAN policy, the default is set to tempdir().

Usage

gl.check.wd(wd = NULL, verbose = NULL)

Arguments

wd

path to the working directory [default: tempdir()].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

the working directory

Author(s)

Custodian: Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

Examples

gl.check.wd()

Collapses a distance matrix by amalgamating populations with pairwise fixed difference count less than a threshold

Description

This script takes a file generated by gl.fixed.diff and amalgamates populations with distance less than or equal to a specified threshold. The distance matrix is generated by gl.fixed.diff().

The script then applies the new population assignments to the genlight object and recalculates the distance and associated matrices.
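The amalgamation step can be pictured with a small base-R sketch that groups populations linked by a fixed-difference count at or below the threshold. The matrix is made up, the helper name collapse_groups is hypothetical, and merging linked populations transitively is an assumption made for illustration; gl.collapse may resolve groups differently and also recalculates the associated matrices, which this sketch does not attempt.

```r
# Hypothetical matrix of pairwise fixed-difference counts.
fd <- matrix(c(0, 0, 5,
               0, 0, 6,
               5, 6, 0),
             nrow = 3,
             dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Hypothetical helper: group populations whose pairwise count is <= tpop.
collapse_groups <- function(fd, tpop = 0) {
  pops  <- rownames(fd)
  group <- seq_along(pops)  # every population starts in its own group
  linked <- which(fd <= tpop & upper.tri(fd), arr.ind = TRUE)
  for (k in seq_len(nrow(linked))) {
    a <- group[linked[k, "row"]]
    b <- group[linked[k, "col"]]
    group[group == b] <- a  # merge the two groups
  }
  unname(split(pops, group))
}

groups_strict <- collapse_groups(fd, tpop = 0)  # A and B merge, C stays apart
groups_loose  <- collapse_groups(fd, tpop = 6)  # everything merges
```

Raising tpop makes the grouping more permissive, which is why the threshold is usually increased stepwise and the result inspected after each pass.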

Usage

gl.collapse(fd, tpop = 0, tloc = 0, pb = FALSE, verbose = NULL)

Arguments

fd

Name of the list of matrices produced by gl.fixed.diff() [required].

tpop

Threshold number of fixed differences above which populationswill not be amalgamated [default 0].

tloc

Threshold defining a fixed difference (e.g. 0.05 implies 95:5 vs 5:95 is fixed) [default 0].

pb

If TRUE, show a progress bar on time consuming loops [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

A list containing the gl object x and the following square matrices:

$gl – the new genlight object with populations collapsed;
$fd – raw fixed differences;
$pcfd – percent fixed differences;
$nobs – mean no. of individuals used in each comparison;
$nloc – total number of loci used in each comparison;
$expfpos – NA's, populated by gl.fixed.diff [by simulation]
$prob – NA's, populated by gl.fixed.diff [by simulation]

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

fd <- gl.fixed.diff(testset.gl, tloc=0.05)
fd
fd2 <- gl.collapse(fd, tpop=1)
fd2
fd3 <- gl.collapse(fd2, tpop=1)
fd3
fd <- gl.fixed.diff(testset.gl, tloc=0.05)
fd2 <- gl.collapse(fd)

This is a helper function that supports the creation of color palettes for all plotting functions.

Description

This is a helper function that supports the creation of color palettes for all plotting functions.

Usage

gl.colors(type = 2)

Arguments

type

The type of color or palette. Can be "2" [two colors], "2c" [two contrasting colors], "3" [three colors], "4" [four colors] or "pal" [the palette type and the number of colors need to be specified]. A palette of colors can be specified via "div" [divergent], "dis" [discrete], "con" [convergent] or "vir" [viridis]. A palette also needs the number of colors to be specified: for a palette type, a function is returned, so the number of colors has to be supplied as part of the function call. Check the examples to see how this works.

Examples

gl.colors(2)
gl.colors("2")
gl.colors("2c")
#five discrete colors
gl.colors(type="dis")(5)
#seven divergent colors
gl.colors("div")(7)
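The idea of a palette that is returned as a function, with the number of colors supplied in a second call, can be illustrated with base R alone. The sketch below uses grDevices::colorRampPalette; it is not claimed to be what gl.colors uses internally, and make_palette is a hypothetical name.

```r
# A palette *function*: calling it with n returns n colors.
# make_palette is a hypothetical name, not part of dartR.
make_palette <- grDevices::colorRampPalette(c("blue", "white", "red"))

# The number of colors is supplied in a second call,
# in the same way as gl.colors("div")(7)
seven_cols <- make_palette(7)

is.function(make_palette)
length(seven_cols)
```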

Checks a genlight object to see if it complies with dartR expectations and amends it to comply if necessary

Description

This function will check to see that the genlight object conforms to expectation in regard to dartR requirements (see details), and if it does not, will rectify it.

Usage

gl.compliance.check(x, verbose = NULL)

Arguments

x

Name of the input genlight object [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

A genlight object used by dartR has a number of requirements that allow functions within the package to operate correctly. The genlight object comprises:

The SNP genotypes or Tag Presence/Absence data (SilicoDArT);
An associated dataframe (gl@other$loc.metrics) containing the locus metrics (e.g. Call Rate, Repeatability, etc);
An associated dataframe (gl@other$ind.metrics) containing the individual/sample metrics (e.g. sex, latitude (=lat), longitude (=lon), etc);
A specimen identity field (indNames(gl)) with the unique labels applied to each individual/sample;
A population assignment (popNames) for each individual/specimen;
Flags that indicate whether or not calculable locus metrics have been updated.

Value

A genlight object that conforms to the expectations of dartR

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Examples

x <- gl.compliance.check(testset.gl)
x <- gl.compliance.check(testset.gs)

Calculates cost distances for a given landscape (resistance matrix)

Description

Calculates a cost distance matrix, to be used with run.popgensim.

Usage

gl.costdistances(landscape, locs, method, NN, verbose = NULL)

Arguments

landscape

A raster object coding the resistance of the landscape [required].

locs

Coordinates of the subpopulations. If a genlight object is provided, coordinates are taken from @other$latlon and centers for each population (pop(gl)) are calculated. To calculate cost distances between individuals, redefine pop(gl) via: pop(gl) <- indNames(gl) [required].

method

Defines the type of cost distance; types are 'leastcost', 'rSPDistance' or 'commute' (Circuitscape type) [required].

NN

Number of nearest neighbours; 8 is recommended [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

A costdistance matrix between all pairs of locs.

Examples

## Not run: 
data(possums.gl)
library(raster)  #needed for that example
landscape.sim <- readRDS(system.file('extdata','landscape.sim.rdata', package='dartR'))
#calculate mean centers of individuals per population
xy <- apply(possums.gl@other$xy, 2, function(x) tapply(x, pop(possums.gl), mean))
cd <- gl.costdistances(landscape.sim, xy, method='leastcost', NN=8)
round(cd,3)
## End(Not run)

Defines a new population in a genlight object for specified individuals

Description

The script reassigns existing individuals to a new population and removes their existing population assignment.

The script returns a genlight object with the new population assignment.

Usage

gl.define.pop(x, ind.list, new, verbose = NULL)

Arguments

x

Name of the genlight object containing SNP genotypes [required].

ind.list

A list of individuals to be assigned to the new population [required].

new

Name of the new population [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

A genlight object with the redefined population structure.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

popNames(testset.gl)
gl <- gl.define.pop(testset.gl, ind.list=c('AA019073','AA004859'), new='newguys')
popNames(gl)
indNames(gl)[pop(gl)=='newguys']

Provides descriptive stats and plots to diagnose potential problems with Hardy-Weinberg proportions

Description

Different causes may be responsible for lack of Hardy-Weinberg proportions. This function helps diagnose potential problems.

Usage

gl.diagnostics.hwe(
  x,
  alpha_val = 0.05,
  bins = 20,
  stdErr = TRUE,
  colors_hist = two_colors,
  colors_barplot = two_colors_contrast,
  plot_theme = theme_dartR(),
  save2tmp = FALSE,
  n.cores = "auto",
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP data [required].

alpha_val

Level of significance for testing [default 0.05].

bins

Number of bins to display in histograms [default 20].

stdErr

Whether standard errors for Fis and Fst should be computed [default TRUE].

colors_hist

List of two color names for the borders and fill of the histogram [default two_colors].

colors_barplot

Vector with two color names for the observed and expected number of significant HWE tests [default two_colors_contrast].

plot_theme

User specified theme [default theme_dartR()].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

n.cores

The number of cores to use. If "auto", all available cores but one are used [default "auto"].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity].

Details

This function initially runs gl.report.hwe and reports the ternary plots. The remaining outputs follow the recommendations of Waples (2015) and De Meeûs (2018). These include:

A histogram with the distribution of p-values of the HWE tests. The distribution should be roughly uniform across equal-sized bins.
A bar plot with the observed and expected (null expectation) number of significant HWE tests for the same locus in multiple populations (that is, the x-axis shows whether a locus is significant in 1, 2, ..., n populations, and the y-axis is the count of these occurrences; the zero value on the x-axis shows the number of non-significant tests). If HWE tests are significant by chance alone, the observed and expected numbers of HWE tests should have roughly similar distributions.
A scatter plot with a linear regression between Fst and Fis, averaged across subpopulations. De Meeûs 2018 suggests that in the case of null alleles, a strong positive relationship is expected (together with the Fis standard error being much larger than the Fst standard error, see below). Note, this is not the scatter plot that Waples 2015 presents in his paper. In the lower right corner of the plot, the Pearson correlation coefficient is reported.
The Fis and Fst (averaged over loci and subpopulations) standard errors are also printed on screen and reported in the returned list (if stdErr=TRUE). These are computed with the jackknife method over loci (see De Meeûs 2007 for details on how this is computed) and it may take some time for these computations to complete. De Meeûs 2018 suggests that under a global significant heterozygosity deficit:
- if the correlation between Fis and Fst is strongly positive, and StdErrFis >> StdErrFst, null alleles are likely to be the cause.
- if the correlation between Fis and Fst is ~0 or mildly positive, and StdErrFis > StdErrFst, a Wahlund effect may be the cause.
- if the correlation between Fis and Fst is ~0, and StdErrFis ~ StdErrFst, selfing or sib mating could be the cause.
It is important to realise that these statistics only suggest a pattern (pointers). Their absence is not conclusive evidence of the absence of the problem, just as their presence does not confirm the cause of the problem.
A table where the numbers of observed and expected significant HWE tests are reported for each population, indicating whether these are due to heterozygosity excess or deficiency. These can be used as a clue to potential problems (e.g. deficiency might be due to a Wahlund effect, the presence of null alleles or non-random sampling; excess might be due to sex linkage or different selection between sexes, demographic changes or small Ne; see Table 1 in Waples 2015). The last two columns of the table generated by this function report chi-square values and their associated p-values. Chi-square is computed following Fisher's procedure for a global test (Fisher 1970). This basically tests whether there is at least one test that is truly significant in the series of tests conducted (De Meeûs et al. 2009).
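Fisher's procedure for a global test, mentioned in the last item, combines k independent p-values into a single chi-square statistic, -2 * sum(log(p)), with 2k degrees of freedom. The following base R sketch shows the textbook calculation; fisher_global_test is a hypothetical helper and the p-values are made up, so this is an illustration of the method rather than the code used by gl.diagnostics.hwe.

```r
# Fisher's procedure for combining k independent p-values (global test).
# fisher_global_test is a hypothetical helper, not a dartR function.
fisher_global_test <- function(p) {
  p <- p[!is.na(p)]
  stopifnot(length(p) > 0, all(p > 0 & p <= 1))
  chisq <- -2 * sum(log(p))   # combined test statistic
  df <- 2 * length(p)         # two degrees of freedom per test
  p_global <- stats::pchisq(chisq, df = df, lower.tail = FALSE)
  c(chisq = chisq, df = df, p.value = p_global)
}

# Made-up p-values standing in for the HWE tests of one population
res <- fisher_global_test(c(0.04, 0.20, 0.01, 0.65))
res
```

A small global p-value indicates that at least one test in the series is unlikely to be significant by chance alone.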

Value

A list with the table with the summary of the HWE tests and (if stdErr=TRUE) a named vector with the StdErrFis and StdErrFst.

Author(s)

Custodian: Carlo Pacioni – Post to https://groups.google.com/d/forum/dartr

References

De Meeûs, T., McCoy, K.D., Prugnolle, F., Chevillon, C., Durand, P., Hurtrez-Boussès, S., Renaud, F., 2007. Population genetics and molecular epidemiology or how to “débusquer la bête”. Infection, Genetics and Evolution 7, 308-332.
De Meeûs, T., Guégan, J.-F., Teriokhin, A.T., 2009. MultiTest V.1.2, a program to binomially combine independent tests and performance comparison with other related methods on proportional data. BMC Bioinformatics 10, 443-443.
De Meeûs, T., 2018. Revisiting FIS, FST, Wahlund Effects, and Null Alleles. Journal of Heredity 109, 446-456.
Fisher, R., 1970. Statistical methods for research workers. Edinburgh: Oliver and Boyd.
Waples, R.S., 2015. Testing for Hardy–Weinberg proportions: have we lost the plot? Journal of Heredity 106(1), 1-19.

Examples

## Not run: 
require("dartR.data")
res <- gl.diagnostics.hwe(x = gl.filter.allna(platypus.gl[,1:50]), stdErr=FALSE, n.cores=1)
## End(Not run)

Comparing simulations against theoretical expectations

Description

Comparing simulations against theoretical expectations

Usage

gl.diagnostics.sim(
  x,
  Ne,
  iteration = 1,
  pop_he = 1,
  pops_fst = c(1, 2),
  plot_theme = theme_dartR(),
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

Output from function gl.sim.WF.run [required].

Ne

Effective population size to use as input to compare theoretical expectations [required].

iteration

Iteration number to analyse [default 1].

pop_he

Population name in which the rate of loss of heterozygosity is going to be compared against theoretical expectations [default 1].

pops_fst

Pair of populations in which FST is going to be compared against theoretical expectations [default c(1,2)].

plot_theme

User specified theme [default theme_dartR()].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity].

Details

Two plots are presented comparing the simulations against theoretical expectations:

Expected heterozygosity under neutrality (Crow & Kimura, 1970, p. 329) is calculated as:
Het = He0 * (1 - 1/(2 * Ne))^t,
where Ne is the effective population size, He0 is the heterozygosity at generation 0 and t is the number of generations.
Expected FST under neutrality (Takahata, 1983) is calculated as:
FST = 1 / (4 * Ne * m * (n/(n - 1))^2 + 1),
where Ne is the effective population size of each individual subpopulation, m is the dispersal rate and n is the number of subpopulations (always 2).
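The two expectations can be evaluated directly in base R. The sketch below is a literal transcription of the formulas; expected_he and expected_fst are hypothetical helper names and the parameter values are illustrative, so this is not the code that gl.diagnostics.sim runs.

```r
# Hypothetical helpers (not dartR functions) for the two expectations.

# Expected heterozygosity after t generations of drift
expected_he <- function(he0, Ne, t) {
  he0 * (1 - 1 / (2 * Ne))^t
}

# Expected FST at equilibrium for n subpopulations with dispersal rate m
expected_fst <- function(Ne, m, n = 2) {
  1 / (4 * Ne * m * (n / (n - 1))^2 + 1)
}

# Illustrative values only
he_curve <- expected_he(he0 = 0.5, Ne = 50, t = 0:100)
fst_eq <- expected_fst(Ne = 50, m = 0.01)
```

With Ne = 50, m = 0.01 and n = 2, the denominator is 4 * 50 * 0.01 * 4 + 1 = 9, so the expected FST is 1/9.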

Value

Returns plots comparing simulations against theoretical expectations

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

References

Crow JF, Kimura M. An introduction to population genetics theory. 1970.
Takahata N. Gene identity and genetic differentiation of populations in the finite island model. Genetics. 1983;104(3):497-512.

Examples

## Not run: 
ref_table <- gl.sim.WF.table(file_var=system.file('extdata', 'ref_variables.csv', package = 'dartR'), interactive_vars = FALSE)
res_sim <- gl.sim.WF.run(file_var = system.file('extdata', 'sim_variables.csv', package ='dartR'), ref_table=ref_table, interactive_vars = FALSE, number_pops_phase2=2, population_size_phase2="50 50")
res <- gl.diagnostics.sim(x=res_sim, Ne=50)
## End(Not run)

Calculates a distance matrix for individuals defined in a genlight object

Description

This script calculates various distances between individuals based on allele frequencies or presence-absence data.

Usage

gl.dist.ind(
  x,
  method = NULL,
  scale = FALSE,
  swap = FALSE,
  output = "dist",
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight containing the SNP genotypes or presence-absence data [required].

method

Specify distance measure [default for SNP data: Euclidean; for P/A data: Simple].

scale

If TRUE, the distances are scaled to fall in the range [0,1] [default FALSE].

swap

If TRUE and working with presence-absence data, then presence (no disrupting mutation) is scored as 0 and absence (presence of a disrupting mutation) is scored as 1 [default FALSE].

output

Specify the format and class of the object to be returned, 'dist' for an object of class dist, 'matrix' for an object of class matrix [default "dist"].

plot.out

If TRUE, display a histogram and a boxplot of the genetic distances [default TRUE].

plot_theme

User specified theme [default theme_dartR()].

plot_colors

Vector with two color names for the borders and fill [default two_colors].

save2tmp

If TRUE, saves any ggplots to the session temporary directory [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

The distance measure for SNP genotypes can be one of:

Euclidean Distance [method = "Euclidean"]
Scaled Euclidean Distance [method="Euclidean", scale=TRUE]
Simple Mismatch Distance [method="Simple"]
Absolute Mismatch Distance [method="Absolute"]
Czekanowski (Manhattan) Distance [method="Manhattan"]

The distance measure for Sequence Tag Presence/Absence data (binary) can be one of:

Euclidean Distance [method = "Euclidean"]
Scaled Euclidean Distance [method="Euclidean", scale=TRUE]
Simple Matching Distance [method="Simple"]
Jaccard Distance [method="Jaccard"]
Bray-Curtis Distance [method="Bray-Curtis"]

Refer to the dartR Technical Note on Distances in Genetics.
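For orientation, some of the listed measures have direct counterparts in base R's stats::dist. The sketch below applies them to made-up toy data; it shows the unscaled textbook definitions only, and gl.dist.ind may scale the values or treat missing data differently.

```r
# Toy SNP genotypes (0, 1, 2 = copies of the alternate allele);
# rows are individuals, columns are loci
snp <- rbind(ind1 = c(0, 1, 2, 0),
             ind2 = c(0, 2, 2, 1),
             ind3 = c(2, 0, 0, 1))
d_euc <- stats::dist(snp, method = "euclidean")
d_man <- stats::dist(snp, method = "manhattan")

# Toy presence/absence scores (1 = present, 0 = absent)
pa <- rbind(ind1 = c(1, 1, 0, 0),
            ind2 = c(1, 0, 1, 0),
            ind3 = c(0, 0, 1, 1))
# method = "binary" is the Jaccard distance: mismatches divided by
# the number of loci present in at least one of the two individuals
d_jac <- stats::dist(pa, method = "binary")

as.matrix(d_jac)
```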

Value

An object of class 'matrix' or 'dist' giving distances between individuals

Author(s)

Author(s): Arthur Georges. Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

D <- gl.dist.ind(testset.gl[1:20,], method='manhattan')
D <- gl.dist.ind(testset.gs[1:20,], method='Jaccard',swap=TRUE)
D <- gl.dist.ind(testset.gl[1:20,], method='euclidean',scale=TRUE)

Calculates a distance matrix for populations with SNP genotypes in a genlight object

Description

This script calculates various distances between populations based on allele frequencies (SNP genotypes) or frequency of presences in presence-absence data (Euclidean and Fixed-diff distances only).

Usage

gl.dist.pop(
  x,
  method = "euclidean",
  plot.out = TRUE,
  scale = FALSE,
  output = "dist",
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight containing the SNP genotypes [required].

method

Specify distance measure [default euclidean].

plot.out

If TRUE, display a histogram of the genetic distances, and a whisker plot [default TRUE].

scale

If TRUE and method='Euclidean', the distance will be scaled to fall in the range [0,1] [default FALSE].

output

Specify the format and class of the object to be returned, 'dist' for an object of class dist, 'matrix' for an object of class matrix [default "dist"].

plot_theme

User specified theme [default theme_dartR()].

plot_colors

Vector with two color names for the borders and fill [default two_colors].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

The distance measure can be one of 'euclidean', 'fixed-diff', 'reynolds', 'nei' and 'chord'. Refer to the documentation of the functions described in the dartR Distance Analysis tutorial for algorithms and definitions.

Value

An object of class 'dist' giving distances between populations

Author(s)

Author(s): Arthur Georges. Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

## Not run: 
# SNP genotypes
D <- gl.dist.pop(possums.gl[1:90,1:100], method='euclidean')
D <- gl.dist.pop(possums.gl[1:90,1:100], method='euclidean',scale=TRUE)
#D <- gl.dist.pop(possums.gl, method='nei')
#D <- gl.dist.pop(possums.gl, method='reynolds')
#D <- gl.dist.pop(possums.gl, method='chord')
#D <- gl.dist.pop(possums.gl, method='fixed-diff')
#Presence-Absence data [only 10 individuals due to speed]
D <- gl.dist.pop(testset.gs[1:10,], method='euclidean')
## End(Not run)
res <- gl.dist.pop(platypus.gl)

Removes specified individuals from a dartR genlight object

Description

This function deletes individuals and their associated metadata. Monomorphic loci and loci that are scored all NA are optionally deleted (mono.rm=TRUE). The script also optionally recalculates locus metadata statistics to accommodate the deletion of individuals from the dataset (recalc=TRUE).

The script returns a dartR genlight object with the retained individuals and the recalculated locus metadata. The script works with both genlight objects containing SNP genotypes and Tag P/A data (SilicoDArT).

Usage

gl.drop.ind(x, ind.list, recalc = FALSE, mono.rm = FALSE, verbose = NULL)

Arguments

x

Name of the genlight object [required].

ind.list

List of individuals to be removed [required].

recalc

If TRUE, recalculate the locus metadata statistics [default FALSE].

mono.rm

If TRUE, remove monomorphic and all NA loci [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress but not results; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

A reduced dartR genlight object

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# SNP data
gl2 <- gl.drop.ind(testset.gl, ind.list=c('AA019073','AA004859'))
# Tag P/A data
gs2 <- gl.drop.ind(testset.gs, ind.list=c('AA020656','AA19077','AA004859'))
gs2 <- gl.drop.ind(testset.gs, ind.list=c('AA020656','AA19077','AA004859'), mono.rm=TRUE, recalc=TRUE)

Removes specified loci from a dartR genlight object

Description

This function deletes loci and their associated metadata.

The script returns a dartR genlight object with the retained loci. The script works with both genlight objects containing SNP genotypes and Tag P/A data (SilicoDArT).

Usage

gl.drop.loc(
  x,
  loc.list = NULL,
  first_tmp = NULL,
  last_tmp = NULL,
  verbose = NULL
)

Arguments

x

Name of the genlight object [required].

loc.list

A list of loci to be deleted [required, if a range of loci (first_tmp, last_tmp) is not specified].

first_tmp

First of a range of loci to be deleted [required, if loc.list not specified].

last_tmp

Last of a range of loci to be deleted [if not specified, the last locus in the dataset].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress but not results; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

A reduced dartR genlight object

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# SNP data
gl2 <- gl.drop.loc(testset.gl, loc.list=c('100051468|42-A/T', '100049816-51-A/G'), verbose=3)
# Tag P/A data
gs2 <- gl.drop.loc(testset.gs, loc.list=c('20134188','19249144'), verbose=3)

Removes specified populations from a dartR genlight object

Description

Individuals are assigned to populations based on associated specimen metadata stored in the dartR genlight object. This function deletes all individuals in the nominated populations (pop.list). Monomorphic loci and loci that are scored all NA are optionally deleted (mono.rm=TRUE). The script also optionally recalculates locus metadata statistics to accommodate the deletion of individuals from the dataset (recalc=TRUE).

The script returns a dartR genlight object with the retained populations and the recalculated locus metadata. The script works with both genlight objects containing SNP genotypes and Tag P/A data (SilicoDArT).

Usage

gl.drop.pop(
  x,
  pop.list,
  as.pop = NULL,
  recalc = FALSE,
  mono.rm = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object [required].

pop.list

List of populations to be removed [required].

as.pop

Temporarily assign another individual metric (e.g. 'sex') as the population for the purposes of deletions [default NULL].

recalc

If TRUE, recalculate the locus metadata statistics [default FALSE].

mono.rm

If TRUE, remove monomorphic and all NA loci [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress but not results; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

A reduced dartR genlight object

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# SNP data
gl2 <- gl.drop.pop(testset.gl, pop.list=c('EmsubRopeMata','EmvicVictJasp'), verbose=3)
gl2 <- gl.drop.pop(testset.gl, pop.list=c('EmsubRopeMata','EmvicVictJasp'), mono.rm=TRUE, recalc=TRUE)
gl2 <- gl.drop.pop(testset.gl, as.pop='sex', pop.list=c('Male','Unknown'), verbose=3)
# Tag P/A data
gs2 <- gl.drop.pop(testset.gs, pop.list=c('EmsubRopeMata','EmvicVictJasp'))

Creates or edits individual (=specimen) names, creates a recode_ind file and applies the changes to a genlight object

Description

A function to edit the names of individuals in a dartR genlight object, or to create a reassignment table taking the individual labels from a genlight object, or to edit existing individual labels in an existing recode_ind file. The amended recode table is then applied to the genlight object.

Usage

gl.edit.recode.ind(
  x,
  out.recode.file = NULL,
  outpath = tempdir(),
  recalc = FALSE,
  mono.rm = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object [required].

out.recode.file

Name of the file to output the new individual labels [optional].

outpath

Path specifying where to save the output file [default tempdir(), mandated by CRAN].

recalc

If TRUE, recalculate the locus metadata statistics [default FALSE].

mono.rm

If TRUE, remove monomorphic loci [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress but not results; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

Renaming individuals may be required when there have been errors in labeling arising in the passage of samples to sequencing. There may be occasions where renaming individuals is required for preparation of figures.

This function will input an existing recode table for editing and optionally save it as a new table, or if the name of an input table is not supplied, will generate a table using the individual labels in the parent genlight object.

When caution needs to be exercised because of the potential for breaking the 'chain of evidence' associated with the samples, recoding individuals using a recode table (csv) can provide a durable record of the changes.

For SNP genotype data, the function, having deleted individuals, optionally identifies resultant monomorphic loci or loci with all values missing and deletes them. The script also optionally recalculates the locus metadata as appropriate. The optional deletion of monomorphic loci and the optional recalculation of locus statistics is not available for Tag P/A data (SilicoDArT).

Use outpath=getwd() when calling this function to direct output files to your working directory.

The function returns a dartR genlight object with the new individual labels and the recalculated locus metadata.

Value

An object of class ('genlight') with the revised individual labels.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

## Not run: 
gl <- gl.edit.recode.ind(testset.gl)
gl <- gl.edit.recode.ind(testset.gl, out.recode.file='ind.recode.table.csv')
## End(Not run)

Creates or edits and applies a population re-assignment table

Description

A function to edit population assignments in a dartR genlight object, or to create a reassignment table taking the population assignments from a genlight object, or to edit existing population assignments in a pop.recode.table. The amended recode table is then applied to the genlight object.

Usage

gl.edit.recode.pop(
  x,
  pop.recode = NULL,
  out.recode.file = NULL,
  outpath = tempdir(),
  recalc = FALSE,
  mono.rm = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object [required].

pop.recode

Path to recode file [default NULL].

out.recode.file

Name of the file to output the new population assignments [default NULL].

outpath

Path where to save the output file [default tempdir(), mandated by CRAN].

recalc

If TRUE, recalculate the locus metadata statistics [default FALSE].

mono.rm

If TRUE, remove monomorphic loci [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress but not results; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

Genlight objects assign specimens to populations based on information in the ind.metadata file provided when the genlight object is first generated. Often one wishes to subset the data by deleting populations or to amalgamate populations. This can be done with a pop.recode table with two columns. The first column is the population assignment in the genlight object, the second column provides the new assignment.
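The two-column recode logic can be sketched in base R without a genlight object. The population labels and the column names old and new below are made up for illustration, and the lookup with match() is an assumption about one reasonable way to apply such a table, not a description of the package internals.

```r
# Current population assignment of six individuals (made-up labels)
pops <- factor(c("popA", "popA", "popB", "popC", "popC", "popD"))

# Two-column recode table: old assignment, new assignment
recode <- data.frame(old = c("popA", "popB", "popC", "popD"),
                     new = c("north", "north", "south", "south"),
                     stringsAsFactors = FALSE)

# Look up each individual's old label and substitute the new one
new_pops <- factor(recode$new[match(as.character(pops), recode$old)])
table(new_pops)
```

Here popA and popB are amalgamated into 'north', and popC and popD into 'south'.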

This function will input an existing reassignment table for editing and optionally save it as a new table, or if the name of an input table is not supplied, will generate a table using the population assignments in the parent genlight object. It will then apply the recodings to the genlight object.

For SNP genotype data, the function, having deleted populations, optionally identifies resultant monomorphic loci or loci with all values missing and deletes them. The script also optionally recalculates the locus metadata as appropriate. The optional deletion of monomorphic loci and the optional recalculation of locus statistics is not available for Tag P/A data (SilicoDArT).

Use outpath=getwd() when calling this function to direct output files to your working directory.

The function returns a dartR genlight object with the new population assignments and the recalculated locus metadata.

Value

A genlight object with the revised population assignments

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

## Not run: 
gl <- gl.edit.recode.pop(testset.gl)
gs <- gl.edit.recode.pop(testset.gs)
## End(Not run)

Creates an Evanno plot from a STRUCTURE run object

Description

This function takes a STRUCTURE run object (the output of gl.run.structure) and creates an Evanno plot based on functions from strataG.

Usage

gl.evanno(sr, plot.out = TRUE)

Arguments

sr

Structure run object from gl.run.structure [required].

plot.out

TRUE: all four plots are shown. FALSE: all four plots are returned as a ggplot but not shown [default TRUE].

Details

The function is basically a convenient wrapper around the strataG function evanno (Archer et al. 2016). For a detailed description please refer to that package (see references below).

Value

An Evanno plot is created and a list of all four plots is returned.

Author(s)

Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

References

Pritchard, J.K., Stephens, M., Donnelly, P. (2000) Inference of population structure using multilocus genotype data. Genetics 155, 945-959.
Archer, F. I., Adams, P. E. and Schneiders, B. B. (2016) strataG: An R package for manipulating, summarizing and analysing population genetic data. Mol Ecol Resour. doi:10.1111/1755-0998.12559
Evanno, G., Regnaut, S., and J. Goudet. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology 14:2611-2620.

Examples

## Not run: 
#CLUMPP and STRUCTURE need to be installed to be able to run the example
#bc <- bandicoot.gl[,1:100]
#sr <- gl.run.structure(bc, k.range = 2:5, num.k.rep = 3, exec = './structure.exe')
#ev <- gl.evanno(sr)
#ev
#qmat <- gl.plot.structure(sr, k=3, CLUMPP='d:/structure/')
#head(qmat)
#gl.map.structure(qmat, bc, scalex=1, scaley=0.5)
## End(Not run)

Estimates the rate of false positives in a fixed difference analysis

Description

This function takes two populations and generates allele frequency profiles for them. It then samples an allele frequency for each, at random, and estimates a sampling distribution for those two allele frequencies. Drawing two samples from those sampling distributions, it calculates whether or not they represent a fixed difference. This is applied to all loci, and the number of fixed differences so generated is counted, as an expectation. The script distinguishes between true fixed differences (with a tolerance of delta) and false positives. The simulation is repeated a given number of times (default=1000) to provide an expectation of the number of false positives, given the observed allele frequency profiles and the sample sizes. The probability that the observed count of fixed differences is greater than the expected number of false positives is then calculated.

Usage

gl.fdsim(
  x,
  poppair,
  obs = NULL,
  sympatric = FALSE,
  reps = 1000,
  delta = 0.02,
  verbose = NULL
)

Arguments

x

Name of the genlight containing the SNP genotypes [required].

poppair

Labels of two populations for comparison in the form c(popA,popB) [required].

obs

Observed number of fixed differences between the two populations [default NULL].

sympatric

If TRUE, the two populations are sympatric; if FALSE, they are allopatric [default FALSE].

reps

Number of replications to undertake in the simulation [default 1000].

delta

The threshold value for the minor allele frequency to regard the difference between two populations to be fixed [default 0.02].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2].

Value

A list containing the following square matrices:
[[1]] observed fixed differences;
[[2]] mean expected number of false positives for each comparison;
[[3]] standard deviation of the number of false positives for each comparison;
[[4]] probability the observed fixed differences arose by chance for each comparison.

Author(s)

Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

Examples

fd <- gl.fdsim(testset.gl[,1:100],poppair=c('EmsubRopeMata','EmmacBurnBara'),sympatric=TRUE,verbose=3)

Filters loci that are all NA across individuals and/or populations with all NA across loci

Description

This script deletes loci or individuals with all calls missing (NA) from a genlight object.

A DArT dataset will not have loci for which the calls are scored as missing (NA) across all individuals, but such loci can arise, rarely, when populations or individuals are deleted. Similarly, a DArT dataset will not have individuals for which the calls are scored as missing (NA) across all loci, but such individuals may sneak into the dataset when loci are deleted. Retaining individuals or loci with all NAs can cause issues for several functions.

Also, on occasion an analysis will require that there are some loci scored in each population. Setting by.pop=TRUE will result in removal of loci when they are all missing in any one population.

Note that loci that are missing for all individuals in a population are not imputed with method 'frequency' or 'HW'. Consider using the function gl.filter.allna with by.pop=TRUE.
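The difference between the global filter and by.pop=TRUE can be illustrated on a plain genotype matrix. The following is a base R sketch on toy data, not the function's implementation; the matrix geno and the vector pop are invented for the example.

```r
# Genotype matrix: rows = individuals, columns = loci (0/1/2 or NA).
geno <- matrix(c(0,  1, NA, 2,
                 1,  1, NA, NA,
                 NA, 2, NA, 0,
                 NA, 0, NA, 1),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("ind", 1:4), paste0("loc", 1:4)))
pop <- c("A", "A", "B", "B")

# Loci that are NA in every individual, and individuals NA at every locus.
all_na_loc <- colSums(!is.na(geno)) == 0
all_na_ind <- rowSums(!is.na(geno)) == 0

# Loci that are entirely NA within at least one population (by.pop = TRUE).
na_in_some_pop <- sapply(seq_len(ncol(geno)), function(j) {
  any(tapply(is.na(geno[, j]), pop, all))
})

kept_global <- geno[!all_na_ind, !all_na_loc, drop = FALSE]
kept_by_pop <- geno[!all_na_ind, !na_in_some_pop, drop = FALSE]
```

Here loc3 is missing everywhere and is dropped in both cases, while loc1 is scored in population A but entirely missing in population B, so it is dropped only by the per-population rule.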

Usage

gl.filter.allna(x, by.pop = FALSE, recalc = FALSE, verbose = NULL)

Arguments

x

Name of the input genlight object [required].

by.pop

If TRUE, loci that are all missing in any one population are deleted [default FALSE].

recalc

Recalculate the locus metadata statistics if any individuals are deleted in the filtering [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

A genlight object from which individuals scored NA across all loci, or loci scored NA across all individuals, have been removed.

Author(s)

Author(s): Arthur Georges. Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# SNP data
result <- gl.filter.allna(testset.gl, verbose=3)
# Tag P/A data
result <- gl.filter.allna(testset.gs, verbose=3)

Filters loci or specimens in a genlight {adegenet} object based on call rate

Description

Tag Presence/Absence datasets (SilicoDArT) have missing values where it is not possible to determine reliably if the sequence tag can be called at a particular locus.

Usage

gl.filter.callrate(
  x,
  method = "loc",
  threshold = 0.95,
  mono.rm = FALSE,
  recalc = FALSE,
  recursive = FALSE,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  bins = 25,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP data, or the genind object containing the SilicoDArT data [required].

method

Use method='loc' to specify that loci are to be filtered, 'ind' to specify that specimens are to be filtered, or 'pop' to remove loci that fail to meet the specified threshold in any one population [default 'loc'].

threshold

Threshold value below which loci will be removed [default 0.95].

mono.rm

Remove monomorphic loci after analysis is complete [default FALSE].

recalc

Recalculate the locus metadata statistics if any individuals are deleted in the filtering [default FALSE].

recursive

Repeatedly filter individuals on call rate, each time removing monomorphic loci. Only applies if method='ind' and mono.rm=TRUE [default FALSE].

plot.out

Specify if histograms of call rate, before and after, are to be produced [default TRUE].

plot_theme

User specified theme for the plot [default theme_dartR()].

plot_colors

List of two color names for the borders and fill of the plots [default two_colors].

bins

Number of bins to display in histograms [default 25].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

Because this filter operates on call rate, this function recalculates CallRate, if necessary, before filtering. If individuals are removed using method='ind', then the call rate stored in the genlight object is, optionally, recalculated after filtering.

Note that when filtering individuals on call rate, the initial call rate is calculated and compared against the threshold. After filtering, if mono.rm=TRUE, the removal of monomorphic loci will alter the call rates. Some individuals with a call rate initially greater than the nominated threshold, and so retained, may come to have a call rate lower than the threshold. If this is a problem, repeated iterations of this function will resolve the issue. This is done by setting mono.rm=TRUE and recursive=TRUE, or it can be done manually.
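The interaction between individual call rate and the removal of monomorphic loci can be seen on a small matrix. This is a base R sketch of the manual iteration, on invented data and with hypothetical helper names (call_rate_ind, is_mono); it is not how gl.filter.callrate is implemented.

```r
# Toy genotype matrix: rows = individuals, columns = loci (0/1/2 or NA).
geno <- matrix(c(0, 0, 1,  2,
                 0, 0, 1,  NA,
                 0, 0, NA, NA,
                 0, 0, 2,  0),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("ind", 1:4), paste0("loc", 1:4)))
threshold <- 0.7

# Call rate of an individual = proportion of its calls that are non-missing.
call_rate_ind <- function(g) rowMeans(!is.na(g))
# A locus is treated as monomorphic if all scored genotypes are 0, all are 2,
# or nothing is scored at all.
is_mono <- function(g) apply(g, 2, function(x) {
  x <- x[!is.na(x)]
  length(x) == 0 || all(x == 0) || all(x == 2)
})

repeat {
  before <- dim(geno)
  geno <- geno[call_rate_ind(geno) >= threshold, , drop = FALSE]  # individuals
  geno <- geno[, !is_mono(geno), drop = FALSE]                    # monomorphs
  if (identical(dim(geno), before)) break                         # stable
}
```

In the first pass ind2 has a call rate of 0.75 and is retained; once the two monomorphic loci are gone its call rate falls to 0.5, so it is removed in the second pass.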

Call rate is summarized by locus or by individual to allow sensible decisions on thresholds for filtering, taking into consideration the consequential loss of data. The summary is in the form of a tabulation and plots.

Plot themes can be obtained from

Resultant ggplot(s) and the tabulation(s) are saved to the session's temporary directory.

Value

The reduced genlight or genind object, plus a summary

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# SNP data
result <- gl.filter.callrate(testset.gl[1:10], method='loc', threshold=0.8, verbose=3)
result <- gl.filter.callrate(testset.gl[1:10], method='ind', threshold=0.8, verbose=3)
result <- gl.filter.callrate(testset.gl[1:10], method='pop', threshold=0.8, verbose=3)
# Tag P/A data
result <- gl.filter.callrate(testset.gs[1:10], method='loc', threshold=0.95, verbose=3)
result <- gl.filter.callrate(testset.gs[1:10], method='ind', threshold=0.8, verbose=3)
result <- gl.filter.callrate(testset.gs[1:10], method='pop', threshold=0.8, verbose=3)
res <- gl.filter.callrate(platypus.gl)

Filters loci based on pairwise Hamming distance between sequence tags

Description

Hamming distance is calculated as the number of base differences between two sequences, which can be expressed as a count or a proportion. Typically, it is calculated between two sequences of equal length. In the context of DArT trimmed sequences, which differ in length but which are anchored to the left by the restriction enzyme recognition sequence, it is sensible to compare the two trimmed sequences starting from immediately after the common recognition sequence and terminating at the last base of the shorter sequence.

Usage

gl.filter.hamming(
  x,
  threshold = 0.2,
  rs = 5,
  taglength = 69,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  pb = FALSE,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP data [required].

threshold

A threshold Hamming distance for filtering loci [default 0.2].

rs

Number of bases in the restriction enzyme recognition sequence [default 5].

taglength

Typical length of the sequence tags [default 69].

plot.out

Specify if plot is to be produced [default TRUE].

plot_theme

Theme for the plot. See Details for options [default theme_dartR()].

plot_colors

List of two color names for the borders and fill of the plots [default two_colors].

pb

Switch to output progress bar [default FALSE].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

Hamming distance can be computed by exploiting the fact that the dot product of two binary vectors x and (1-y) counts the corresponding elements that are different between x and y. This approach can also be used for vectors that contain more than two possible values at each position (e.g. A, C, T or G).

If a pair of DNA sequences are of differing length, the longer is truncated.

The algorithm is that of Johann de Jong (https://johanndejong.wordpress.com/2015/10/02/faster-hamming-distance-in-r-2/) as implemented in utils.hamming.

Only one of two loci is retained if their Hamming distance is less than a specified proportion. For example, 5 base differences out of 100 bases is a Hamming distance of 0.05 (5%).
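The dot-product idea described above can be sketched in base R as follows. This is an illustration on three invented tags, not the code of utils.hamming; as a simplification it truncates all tags to one common length, whereas the text describes truncation per pair.

```r
# Three made-up sequence tags sharing a 5-base recognition sequence.
tags <- c("TGCAGAACGT", "TGCAGAACGA", "TGCAGTTCGTAC")
rs <- 5  # bases in the recognition sequence, skipped before comparing

# Drop the recognition sequence and truncate to the shortest remaining tag.
trimmed <- substring(tags, rs + 1)
len <- min(nchar(trimmed))
X <- do.call(rbind, strsplit(substr(trimmed, 1, len), ""))

# For each base b, is_b %*% t(1 - is_b) counts the positions where sequence i
# carries b and sequence j does not; summed over b, each mismatch counts once.
D <- matrix(0, nrow(X), nrow(X))
for (b in unique(as.vector(X))) {
  is_b <- 1 * (X == b)
  D <- D + is_b %*% t(1 - is_b)
}
prop <- D / len  # distances as a proportion of the bases compared
```

A locus pair would then be flagged when the corresponding entry of prop falls below the chosen threshold.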

Value

A genlight object filtered on Hamming distance.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# SNP data
test <- platypus.gl
test <- gl.subsample.loci(platypus.gl,n=50)
result <- gl.filter.hamming(test, threshold=0.25, verbose=3)

Filters individuals with average heterozygosity greater than a specified upper threshold or less than a specified lower threshold

Description

Calculates the observed heterozygosity for each individual in a genlight object and filters individuals based on specified threshold values. Use gl.report.heterozygosity to determine the appropriate thresholds.
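As a minimal illustration of the quantity being filtered on, individual observed heterozygosity can be computed from a plain 0/1/2 matrix in base R. The data are invented, and treating the thresholds as inclusive bounds for retention follows the argument descriptions (individuals > t.upper or < t.lower are filtered).

```r
# Toy genotype matrix: rows = individuals, columns = loci (0/1/2 or NA).
geno <- matrix(c(0, 1, 1, 2,
                 1, 1, 1, NA,
                 0, 0, 2, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("ind1", "ind2", "ind3"), NULL))
t.upper <- 0.7; t.lower <- 0

# Observed heterozygosity: share of non-missing calls scored as 1.
ho <- rowMeans(geno == 1, na.rm = TRUE)
keep <- ho >= t.lower & ho <= t.upper
filtered <- geno[keep, , drop = FALSE]
```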

Usage

gl.filter.heterozygosity(x, t.upper = 0.7, t.lower = 0, verbose = NULL)

Arguments

x

A genlight object containing the SNP genotypes [required].

t.upper

Filter individuals > the threshold [default 0.7].

t.lower

Filter individuals < the threshold [default 0].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

The filtered genlight object.

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Examples

result <- gl.filter.heterozygosity(testset.gl,t.upper=0.06,verbose=3)
tmp <- gl.report.heterozygosity(result,method='ind')

Filters loci that show significant departure from Hardy-Weinberg Equilibrium

Description

This function filters out loci showing significant departure from H-W proportions based on observed frequencies of reference homozygotes, heterozygotes and alternate homozygotes.

Loci are filtered out if they show HWE departure either in any one population (n.pop.threshold =1) or in at least X number of populations (n.pop.threshold > 1).

Usage

gl.filter.hwe(
  x,
  subset = "each",
  n.pop.threshold = 1,
  method_sig = "Exact",
  multi_comp = FALSE,
  multi_comp_method = "BY",
  alpha_val = 0.05,
  pvalue_type = "midp",
  cc_val = 0.5,
  min_sample_size = 5,
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP data [required].

subset

Way to group individuals to perform H-W tests. Either a vector with population names, 'each' or 'all' (see details) [default 'each'].

n.pop.threshold

The minimum number of populations where the same locus has to be out of H-W proportions to be removed [default 1].

method_sig

Method for determining statistical significance: 'ChiSquare' or 'Exact' [default 'Exact'].

multi_comp

Whether to adjust p-values for multiple comparisons [default FALSE].

multi_comp_method

Method to adjust p-values for multiple comparisons: 'holm', 'hochberg', 'hommel', 'bonferroni', 'BH', 'BY', 'fdr' (see details) [default 'BY'].

alpha_val

Level of significance for testing [default 0.05].

pvalue_type

Type of p-value to be used in the Exact method. Either 'dost', 'selome' or 'midp' (see details) [default 'midp'].

cc_val

The continuity correction applied to the ChiSquare test [default 0.5].

min_sample_size

Minimum number of individuals per population for which H-W tests are performed [default 5].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

There are several factors that can cause deviations from Hardy-Weinberg proportions including: mutation, finite population size, selection, population structure, age structure, assortative mating, sex linkage, nonrandom sampling and genotyping errors. Therefore, testing for Hardy-Weinberg proportions should be a process that involves a careful evaluation of the results; a good place to start is Waples (2015).

Note that tests for H-W proportions are only valid if there is no population substructure (assuming random mating) and have sufficient power only when there is sufficient sample size (n individuals > 15).

Populations can be defined in three ways:

Merging all populations in the dataset using subset = 'all'.
Within each population separately using: subset = 'each'.
Within selected populations using for example: subset = c('pop1','pop2').

Two different statistical methods are available to test for deviations from Hardy-Weinberg proportions:

The classical chi-square test (method_sig='ChiSquare') based on the function HWChisq of the R package HardyWeinberg. By default a continuity correction is applied (cc_val=0.5). The continuity correction can be turned off (by specifying cc_val=0), for example in cases of extreme allele frequencies in which the continuity correction can lead to excessive type 1 error rates.
The exact test (method_sig='Exact') based on the exact calculations contained in the function HWExactStats of the R package HardyWeinberg, and described in Wigginton et al. (2005). The exact test is recommended in most cases (Wigginton et al., 2005). Three different methods to estimate p-values (pvalue_type) in the Exact test can be used:
- 'dost' p-value is computed as twice the tail area of a one-sided test.
- 'selome' p-value is computed as the sum of the probabilities of all samples less or equally likely as the current sample.
- 'midp' p-value is computed as half the probability of the current sample + the probabilities of all samples that are more extreme.
The standard exact p-value is overly conservative, in particular for small minor allele frequencies. The mid p-value ameliorates this problem by bringing the rejection rate closer to the nominal level, at the price of occasionally exceeding the nominal level (Graffelman & Moreno, 2013).
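To make the role of the continuity correction concrete, the chi-square test for a single locus can be written out by hand in base R. This is a hand-rolled sketch for illustration, not the HWChisq code; the function name hw_chisq and the clamping of the corrected deviation at zero are choices made for this example.

```r
# Chi-square test for Hardy-Weinberg proportions at one locus, with an
# optional continuity correction cc.
hw_chisq <- function(n_ref_hom, n_het, n_alt_hom, cc = 0.5) {
  n <- n_ref_hom + n_het + n_alt_hom
  p <- (2 * n_ref_hom + n_het) / (2 * n)   # reference allele frequency
  q <- 1 - p
  observed <- c(n_ref_hom, n_het, n_alt_hom)
  expected <- n * c(p^2, 2 * p * q, q^2)
  # Shrink each |O - E| by cc, never below zero.
  dev <- pmax(abs(observed - expected) - cc, 0)
  chisq <- sum(dev^2 / expected)
  c(chisq = chisq, pval = pchisq(chisq, df = 1, lower.tail = FALSE))
}

in_hwe <- hw_chisq(25, 50, 25, cc = 0)         # counts exactly at HW proportions
het_deficit <- hw_chisq(40, 20, 40, cc = 0.5)  # strong heterozygote deficit
```

The first call returns a statistic of zero, while the second shows a clear departure; setting cc = 0 in the second call would give a slightly larger statistic.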

Correction for multiple tests can be applied using the following methods based on the function p.adjust:

'holm' is also known as the sequential Bonferroni technique (Rice, 1989). This method has a greater statistical power than the standard Bonferroni test, however this method becomes very stringent when many tests are performed and many real deviations from the null hypothesis can go undetected (Waples, 2015).
'hochberg' based on Hochberg, 1988.
'hommel' based on Hommel, 1988. This method is more powerful than Hochberg's, but the difference is usually small.
'bonferroni' in which p-values are multiplied by the number of tests. This method is very stringent and therefore has reduced power to detect multiple departures from the null hypothesis.
'BH' based on Benjamini & Hochberg, 1995.
'BY' based on Benjamini & Yekutieli, 2001.

The first four methods are designed to give strong control of the family-wise error rate. The last two methods control the false discovery rate (FDR), the expected proportion of false discoveries among the rejected hypotheses. The false discovery rate is a less stringent condition than the family-wise error rate, so these methods are more powerful than the others, especially when the number of tests is large. The number of tests on which the adjustment for multiple comparisons is based is the number of populations times the number of loci.

From v2.1, gl.filter.hwe takes the argument n.pop.threshold. If n.pop.threshold > 1, loci will be removed only if they are concurrently significant (after adjustment, if applied) out of HWE in >= n.pop.threshold populations.
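The combination of multiple-comparison adjustment and n.pop.threshold can be sketched with base R's p.adjust. The p-values below are invented, and the layout (loci in rows, populations in columns) is an assumption of the example rather than the internal representation used by gl.filter.hwe.

```r
# Toy p-values from H-W tests: rows = loci, columns = populations.
pvals <- matrix(c(0.001,  0.004,  0.300,
                  0.020,  0.600,  0.700,
                  0.0005, 0.0008, 0.0002,
                  0.400,  0.500,  0.900),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("loc", 1:4), c("popA", "popB", "popC")))
alpha_val <- 0.05
n.pop.threshold <- 2

# Adjust across all loci x populations tests at once (12 tests here).
adj <- matrix(p.adjust(pvals, method = "BY"), nrow = nrow(pvals),
              dimnames = dimnames(pvals))

# Count the populations in which each locus is significant, then flag loci
# that reach the threshold.
n_sig <- rowSums(adj < alpha_val)
removed <- names(n_sig)[n_sig >= n.pop.threshold]
```

Note that loc2 has a raw p-value of 0.02 in popA, below alpha_val, but it is no longer significant after the 'BY' adjustment and so does not count towards the threshold.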

Value

A genlight object with the loci departing significantly from H-W proportions removed.

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

References

Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1165–1188.
Graffelman, J. (2015). Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64:1-23.
Graffelman, J. & Morales-Camarena, J. (2008). Graphical tests for Hardy-Weinberg equilibrium based on the ternary plot. Human Heredity 65:77-84.
Graffelman, J., & Moreno, V. (2013). The mid p-value in exact tests for Hardy-Weinberg equilibrium. Statistical Applications in Genetics and Molecular Biology, 12(4), 433-448.
Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800–803.
Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika, 75, 383–386.
Rice, W. R. (1989). Analyzing tables of statistical tests. Evolution, 43(1), 223-225.
Waples, R. S. (2015). Testing for Hardy–Weinberg proportions: have we lost the plot? Journal of Heredity, 106(1), 1-19.
Wigginton, J.E., Cutler, D.J., & Abecasis, G.R. (2005). A Note on Exact Tests of Hardy-Weinberg Equilibrium. American Journal of Human Genetics 76:887-893.

Examples

result <- gl.filter.hwe(x = bandicoot.gl)

Filters loci on the basis of numeric information stored in other$loc.metrics in a genlight {adegenet} object

Description

This script uses any field with numeric values stored in $other$loc.metrics to filter loci. The loci to keep can be within the upper and lower thresholds ('within') or outside of the upper and lower thresholds ('outside').

Usage

gl.filter.locmetric(x, metric, upper, lower, keep = "within", verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

metric

Name of the metric to be used for filtering [required].

upper

Filter upper threshold [required].

lower

Filter lower threshold [required].

keep

Whether to keep loci within the upper and lower thresholds ('within') or outside of them ('outside') [default 'within'].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

The fields that are included in dartR, and a short description, are found below. Optionally, the user can also set their own filter by adding a vector into $other$loc.metrics as shown in the example.

SnpPosition - position (zero is position 1) in the sequence tag of the defined SNP variant base.
CallRate - proportion of samples for which the genotype call is non-missing (that is, not '-').
OneRatioRef - proportion of samples for which the genotype score is 0.
OneRatioSnp - proportion of samples for which the genotype score is 2.
FreqHomRef - proportion of samples homozygous for the Reference allele.
FreqHomSnp - proportion of samples homozygous for the Alternate (SNP) allele.
FreqHets - proportion of samples which score as heterozygous, that is, scored as 1.
PICRef - polymorphism information content (PIC) for the Reference allele.
PICSnp - polymorphism information content (PIC) for the SNP.
AvgPIC - average of the polymorphism information content (PIC) of the Reference and SNP alleles.
AvgCountRef - sum of the tag read counts for all samples, divided by the number of samples with non-zero tag read counts, for the Reference allele row.
AvgCountSnp - sum of the tag read counts for all samples, divided by the number of samples with non-zero tag read counts, for the Alternate (SNP) allele row.
RepAvg - proportion of technical replicate assay pairs for which the marker score is consistent.
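The 'within' and 'outside' options amount to a simple range test on the chosen metric. The sketch below uses a stand-alone data frame in place of $other$loc.metrics, with invented values; whether the thresholds themselves are included is an assumption of this example and should be checked against the function's behaviour.

```r
# Hypothetical locus metrics table, standing in for $other$loc.metrics.
loc_metrics <- data.frame(locus = paste0("loc", 1:6),
                          RepAvg = c(0.90, 0.95, 0.99, 1.00, 0.97, 0.80),
                          stringsAsFactors = FALSE)
lower <- 0.95; upper <- 0.99

# Inclusive bounds are an assumption of this sketch.
within <- loc_metrics$RepAvg >= lower & loc_metrics$RepAvg <= upper
keep_within <- loc_metrics$locus[within]    # keep = 'within'
keep_outside <- loc_metrics$locus[!within]  # keep = 'outside'
```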

Value

The reduced genlight dataset.

Author(s)

Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Examples

# adding dummy data
test <- testset.gl
test$other$loc.metrics$test <- 1:nLoc(test)
result <- gl.filter.locmetric(x=test, metric= 'test', upper=255, lower=200, keep= 'within', verbose=3)

Filters loci on the basis of minor allele frequency (MAF) in a genlight {adegenet} object

Description

This script calculates the minor allele frequency for each locus and updates the locus metadata for FreqHomRef, FreqHomSnp, FreqHets and MAF (if it exists). It then uses the updated metadata for MAF to filter loci.

Usage

gl.filter.maf(
  x,
  threshold = 0.01,
  by.pop = FALSE,
  pop.limit = ceiling(nPop(x)/2),
  ind.limit = 10,
  recalc = FALSE,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors_pop = discrete_palette,
  plot_colors_all = two_colors,
  bins = 25,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP data [required].

threshold

Threshold MAF – loci with a MAF less than the threshold will be removed. If a value > 1 is provided it will be interpreted as MAC (i.e. the minimum number of times an allele needs to be observed) [default 0.01].

by.pop

Whether MAF should be calculated by population [default FALSE].

pop.limit

Minimum number of populations in which MAF should be less than the threshold for a locus to be filtered out. Only used if by.pop=TRUE. The default value is half of the populations [default ceiling(nPop(x)/2)].

ind.limit

Minimum number of individuals that a population should contain to calculate MAF. Only used if by.pop=TRUE [default 10].

recalc

Recalculate the locus metadata statistics if any individuals are deleted in the filtering [default FALSE].

plot.out

Specify if histograms of call rate, before and after, are to be produced [default TRUE].

plot_theme

User specified theme for the plot [default theme_dartR()].

plot_colors_pop

A color palette for population plots [default discrete_palette].

plot_colors_all

List of two color names for the borders and fill of the overall plot [default two_colors].

bins

Number of bins to display in histograms [default 25].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

Careful consideration needs to be given to the settings used for this function. When the filter is applied globally (i.e. by.pop=FALSE) but the data include multiple populations, there is a risk of removing markers because their allele frequency is low at the global level, even though the allele frequencies for the same markers may be high within some of the populations (especially if the per-population sample size is small). Similarly, it is not always sensible to run this function using by.pop=TRUE, because an allele that is rare in one population may be very common in another, and the possible allele frequencies depend on the sample size within each population. Where the purpose of filtering on MAF is to remove possibly spurious alleles (i.e. sequencing errors), it is perhaps better to filter on the number of times an allele is observed (MAC, Minimum Allele Count), under the assumption that an allele observed more than MAC times is unlikely to be an error.

From v2.1 the threshold can take values > 1. In this case, these are interpreted as a threshold for MAC.
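The difference between a frequency threshold and a count threshold is easiest to see on a small sample. The following base R sketch uses an invented genotype matrix and computes MAF and MAC directly; it is not the gl.filter.maf implementation.

```r
# Toy genotype matrix: rows = individuals, columns = loci.
# Genotypes count copies of the alternate allele (0, 1 or 2); NA = missing.
geno <- matrix(c(0, 0, 0,  1,
                 0, 1, 0,  2,
                 0, 0, NA, 1,
                 0, 2, 0,  1,
                 1, 2, 0,  0),
               nrow = 5, byrow = TRUE,
               dimnames = list(NULL, paste0("loc", 1:4)))

alt_count <- colSums(geno, na.rm = TRUE)       # copies of the alternate allele
n_alleles <- 2 * colSums(!is.na(geno))         # alleles actually scored
mac <- pmin(alt_count, n_alleles - alt_count)  # minor allele count
maf <- mac / n_alleles                         # minor allele frequency

keep_maf <- colnames(geno)[maf >= 0.05]  # frequency threshold
keep_mac <- colnames(geno)[mac >= 2]     # count threshold
```

With only five individuals, the allele seen once at loc1 has a MAF of 0.1 and passes a 5% frequency threshold, but it fails a count threshold of 2, which is the situation the MAC option is intended for.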

Value

The reduced genlight dataset

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Examples

result <- gl.filter.monomorphs(testset.gl)
result <- gl.filter.maf(result, threshold=0.05, verbose=3)

Filters monomorphic loci, including those with all NAs

Description

This script deletes monomorphic loci from a genlight {adegenet} object.

A DArT dataset will not have monomorphic loci, but they can arise, along with loci that are scored all NA, when populations or individuals are deleted.

Retaining monomorphic loci unnecessarily increases the size of the dataset and will affect some calculations.

Note that for SNP data, NAs likely represent null alleles; in tag presence/absence data, NAs represent missing values (presence/absence could not be reliably scored).

Usage

gl.filter.monomorphs(x, verbose = NULL)

Arguments

x

Name of the input genlight object [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

A genlight object with monomorphic (and all NA) loci removed.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# SNP data
result <- gl.filter.monomorphs(testset.gl, verbose=3)
# Tag P/A data
result <- gl.filter.monomorphs(testset.gs, verbose=3)

Filters loci for which the SNP has been trimmed from the sequence tag along with the adaptor

Description

This function checks the position of the SNP within the trimmed sequence tag and identifies those for which the SNP position is outside the trimmed sequence tag. This can happen, rarely, when the sequence containing the SNP resembles the adaptor.

The SNP genotype can still be used in most analyses, but functions like gl2fasta() will present challenges if the SNP has been trimmed from the sequence tag.

This is not fatal, but the filter should be applied before gl.filter.secondaries, for obvious reasons.

Usage

gl.filter.overshoot(x, save2tmp = FALSE, verbose = NULL)

Arguments

x

Name of the genlight object [required].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

A new genlight object with the recalcitrant loci deleted

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

result <- gl.filter.overshoot(testset.gl, verbose=3)

Filters loci that contain private (and fixed alleles) between two populations

Description

This script is meant to be used prior to gl.nhybrids to maximise the information content of the SNPs used to identify hybrids (currently newhybrids allows only 200 SNPs). The idea is to first use all loci that have fixed alleles between the potential source populations and then 'fill up' to 200 loci using loci that have private alleles between those. The function filters for those loci. If invers is set to TRUE, the opposite is returned: all loci that are not fixed and have no private alleles (not sure why yet, but maybe useful).
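The selection logic can be illustrated from per-population allele frequencies. This base R sketch uses invented frequencies and a deliberately small limit in place of 200; the definitions of 'fixed' and 'private' below are the usual biallelic ones and are an assumption about what the function applies.

```r
# Alternate-allele frequencies per locus in the two parental populations.
p1 <- c(loc1 = 0.0, loc2 = 1.0, loc3 = 0.0, loc4 = 0.4, loc5 = 1.0)
p2 <- c(loc1 = 1.0, loc2 = 0.0, loc3 = 0.3, loc4 = 0.6, loc5 = 0.8)

# Fixed difference: the two populations share no allele at the locus.
fixed <- (p1 == 0 & p2 == 1) | (p1 == 1 & p2 == 0)
# Private allele: one population is fixed for an allele while the other
# still carries both, so one allele occurs in only one population.
private <- !fixed & p1 != p2 & (p1 %in% c(0, 1) | p2 %in% c(0, 1))

max_loci <- 3  # stands in for the 200-locus limit, kept small for toy data
ranked <- c(names(p1)[fixed], names(p1)[private])  # fixed loci first
selected <- head(ranked, max_loci)
```

Loci with fixed differences are taken first and the remaining slots are filled with loci carrying private alleles; loc4, which is polymorphic in both populations, is never selected.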

Usage

gl.filter.pa(x, pop1, pop2, invers = FALSE, verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

pop1

Name of the first parental population (in quotes) [required].

pop2

Name of the second parental population (in quotes) [required].

invers

Switch to filter for all loci that have no private alleles and are not fixed [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

The reduced genlight dataset, now containing only loci with fixed and private alleles.

Author(s)

Authors: Bernd Gruber & Ella Kelly (University of Melbourne); Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Examples

result <- gl.filter.pa(testset.gl, pop1=pop(testset.gl)[1], pop2=pop(testset.gl)[2],verbose=3)

Filters putative parent offspring within a population

Description

This script removes individuals suspected of being related as parent-offspring, using the output of the function gl.report.parent.offspring, which examines the frequency of pedigree inconsistent loci, that is, those loci that are homozygous in the parent for the reference allele and homozygous in the offspring for the alternate allele. This condition is not consistent with any pedigree, regardless of the (unknown) genotype of the other parent. The pedigree inconsistent loci are counted as an indication of whether or not it is reasonable to propose the two individuals are in a parent-offspring relationship.

Usage

gl.filter.parent.offspring(
  x,
  min.rdepth = 12,
  min.reproducibility = 1,
  range = 1.5,
  method = "best",
  rm.monomorphs = FALSE,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP genotypes [required].

min.rdepth

Minimum read depth to include in analysis [default 12].

min.reproducibility

Minimum reproducibility to include in analysis [default 1].

range

Specifies the range to extend beyond the interquartile range for delimiting outliers [default 1.5 interquartile ranges].

method

Method of selecting the individual to retain from each pair in a parent-offspring relationship, 'best' (based on CallRate) or 'random' [default 'best'].

rm.monomorphs

If TRUE, remove monomorphic loci after filtering individuals [default FALSE].

plot.out

Specify if plot is to be produced [default TRUE].

plot_theme

Theme for the plot. See Details for options [default theme_dartR()].

plot_colors

List of two color names for the borders and fill of the plots [default two_colors].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

If two individuals are in a parent-offspring relationship, the true number of pedigree inconsistent loci should be zero, but SNP calling is not infallible. Some loci will be miscalled. The problem thus becomes one of determining if the two focal individuals have a count of pedigree inconsistent loci lower than would be expected of typical unrelated individuals. There are some quite sophisticated software packages available to formally apply likelihoods to the decision, but we use a simple outlier comparison.
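A simple outlier comparison of this kind can be sketched in base R. The data are simulated, one pair is constructed so that it never shows opposite homozygotes, and the cutoff used (lower quartile minus 1.5 interquartile ranges) is one plausible reading of the range argument, not necessarily the exact rule applied by the function.

```r
# Simulated genotypes: rows = individuals, columns = loci (0/1/2).
set.seed(42)
n_loc <- 500
geno <- matrix(sample(0:2, 5 * n_loc, replace = TRUE), nrow = 5,
               dimnames = list(c("parent", "ind2", "ind3", "ind4", "ind5"), NULL))
# Simulated offspring: differs from the parent by at most one allele copy per
# locus, so the pair is never scored as opposite homozygotes.
offspring <- pmin(pmax(geno["parent", ] + sample(-1:1, n_loc, replace = TRUE), 0), 2)
geno <- rbind(geno, offspring = offspring)

# Count pedigree inconsistent loci (0 versus 2) for every pair of individuals.
pairs <- t(combn(rownames(geno), 2))
incons <- apply(pairs, 1, function(pr) {
  sum(abs(geno[pr[1], ] - geno[pr[2], ]) == 2)
})

# Flag pairs whose count is an outlier on the low side.
q <- quantile(incons, c(0.25, 0.75))
cutoff <- q[[1]] - 1.5 * (q[[2]] - q[[1]])
flagged <- pairs[incons < cutoff, , drop = FALSE]
```

Unrelated pairs in this simulation are inconsistent at roughly two loci in nine, so the constructed parent-offspring pair, with a count of zero, falls well below the cutoff.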

To reduce the frequency of miscalls, and so emphasize the difference between true parent-offspring pairs and unrelated pairs, the data can be filtered on read depth. Typically minimum read depth is set to 5x, but you can examine the distribution of read depths with the function gl.report.rdepth and push this up with an acceptable loss of loci. 12x might be a good minimum for this particular analysis. It is sensible also to push the minimum reproducibility up to 1, if that does not result in an unacceptable loss of loci. Reproducibility is stored in the slot @other$loc.metrics$RepAvg and is defined as the proportion of technical replicate assay pairs for which the marker score is consistent. You can examine the distribution of reproducibility with the function gl.report.reproducibility.

Note that the null expectation is not well defined, and the power reduced, if the population from which the putative parent-offspring pairs are drawn contains many sibs. Note also that if an individual has been genotyped twice in the dataset, the replicate pair will be assessed by this script as being in a parent-offspring relationship.

You should run gl.report.parent.offspring before filtering. Use this report to decide min.rdepth and min.reproducibility and assess impact on your dataset.

Note that if your dataset does not contain RepAvg or rdepth among the locus metrics, the filters for reproducibility and read depth are not used.

Function's output

Plots and tables are saved to the temporary directory (tempdir) and can be accessed with the function gl.print.reports and listed with the function gl.list.reports. Note that they can be accessed only in the current R session because tempdir is cleared each time that the R session is closed.

Examples of other themes that can be used can be consulted in

Value

The filtered genlight object, without the set of individuals in a parent-offspring relationship. NULL if no parent-offspring relationships were found.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

out <- gl.filter.parent.offspring(testset.gl[1:10,1:50])

Filters loci based on counts of sequence tags scored at a locus (read depth)

Description

SNP datasets generated by DArT report AvgCountRef and AvgCountSnp as counts of sequence tags for the reference and alternate alleles respectively. These can be used to back calculate Read Depth. Fragment presence/absence datasets as provided by DArT (SilicoDArT) provide Average Read Depth and Standard Deviation of Read Depth as standard columns in their report.

Filtering on Read Depth using the companion script gl.filter.rdepth can be on the basis of loci with exceptionally low counts, or loci with exceptionally high counts.
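The range filter amounts to keeping loci whose read depth lies between a lower and an upper threshold. A minimal base R sketch follows, using a hypothetical vector of per-locus read depths and assuming that loci exactly on a threshold are retained; it illustrates the idea only and does not reproduce the function itself.

```r
# Hypothetical per-locus read depths (in a genlight object from DArT these
# would be held among the locus metrics).
rdepth <- c(3.2, 7.5, 12.0, 48.9, 75.4)

lower <- 5   # loci with read depth below this are removed
upper <- 50  # loci with read depth above this are removed

# Logical index of the loci to retain (thresholds assumed inclusive)
keep <- rdepth >= lower & rdepth <= upper
rdepth[keep]
```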

Usage

gl.filter.rdepth(  x,  lower = 5,  upper = 50,  plot.out = TRUE,  plot_theme = theme_dartR(),  plot_colors = two_colors,  save2tmp = FALSE,  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP or tag presence/absence data [required].

lower

Lower threshold value below which loci will be removed [default 5].

upper

Upper threshold value above which loci will be removed [default 50].

plot.out

Specify if plot is to be produced [default TRUE].

plot_theme

Theme for the plot. See Details for options [default theme_dartR()].

plot_colors

List of two color names for the borders and fill of the plots [default two_colors].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

For examples of themes, see:

Value

Returns a genlight object retaining loci with a Read Depth in the range specified by the lower and upper threshold.

Author(s)

Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

Examples

# SNP data
gl.report.rdepth(testset.gl)
result <- gl.filter.rdepth(testset.gl, lower=8, upper=50, verbose=3)
# Tag P/A data
result <- gl.filter.rdepth(testset.gs, lower=8, upper=50, verbose=3)
res <- gl.filter.rdepth(platypus.gl)

Filters loci in a genlight {adegenet} object based on average repeatability of alleles at a locus

Description

SNP datasets generated by DArT have an index, RepAvg, generated by reproducing the data independently for 30% of loci. RepAvg is the proportion of alleles that give a repeatable result, averaged over both alleles for each locus.

SilicoDArT datasets generated by DArT have a similar index, Reproducibility. For these fragment presence/absence data, repeatability is the percentage of scores that are repeated in the technical replicate dataset.

Usage

gl.filter.reproducibility(  x,  threshold = 0.99,  plot.out = TRUE,  plot_theme = theme_dartR(),  plot_colors = two_colors,  save2tmp = FALSE,  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

threshold

Threshold value below which loci will be removed [default 0.99].

plot.out

If TRUE, displays plots of the distribution of reproducibility values before and after filtering [default TRUE].

plot_theme

Theme for the plot [default theme_dartR()].

plot_colors

List of two color names for the borders and fill of the plots [default two_colors].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

Returns a genlight object retaining loci with repeatability (RepAvg or Reproducibility) greater than the specified threshold.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# SNP data
gl.report.reproducibility(testset.gl)
result <- gl.filter.reproducibility(testset.gl, threshold=0.99, verbose=3)
# Tag P/A data
gl.report.reproducibility(testset.gs)
result <- gl.filter.reproducibility(testset.gs, threshold=0.99)
test <- gl.subsample.loci(platypus.gl,n=100)
res <- gl.filter.reproducibility(test)

Filters loci that represent secondary SNPs in a genlight object

Description

SNP datasets generated by DArT include fragments with more than one SNP and record them separately with the same CloneID (=AlleleID). These multiple SNP loci within a fragment (secondaries) are likely to be linked, and so you may wish to remove secondaries.

This script filters out all but the first sequence tag with the same CloneID after ordering the genlight object based on repeatability and avgPIC, in that order (method='best'), or at random (method='random').

The filter has not been implemented for tag presence/absence data.
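The selection of one locus per CloneID can be sketched in base R as below. The locus metrics table is hypothetical, and the sketch assumes that higher repeatability and higher avgPIC are preferred under method='best'; it is an illustration of the idea rather than the function's own code.

```r
# Hypothetical locus metrics: fragments A and C each carry secondaries.
loc <- data.frame(
  CloneID = c("A", "A", "B", "C", "C", "C"),
  RepAvg  = c(0.98, 1.00, 0.99, 1.00, 1.00, 0.97),
  AvgPIC  = c(0.30, 0.20, 0.40, 0.10, 0.35, 0.45),
  stringsAsFactors = FALSE
)

# method = 'best': order by repeatability, then avgPIC (both descending,
# an assumption), and keep the first locus seen for each CloneID.
ord  <- order(-loc$RepAvg, -loc$AvgPIC)
best <- loc[ord, ]
best <- best[!duplicated(best$CloneID), ]

# method = 'random': shuffle the loci, then keep the first per CloneID.
set.seed(1)
rand <- loc[sample(nrow(loc)), ]
rand <- rand[!duplicated(rand$CloneID), ]

best
rand
```

Either way, exactly one locus per fragment remains, so the retained loci are less likely to be physically linked.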

Usage

gl.filter.secondaries(x, method = "random", verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

method

Method of selecting SNP locus to retain, 'best' or 'random' [default 'random'].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

The genlight object, with the secondary SNP loci removed.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

gl.report.secondaries(testset.gl)
result <- gl.filter.secondaries(testset.gl)

Filters loci that are sex linked

Description

Alleles unique to the Y or W chromosome and monomorphic on the X chromosomes will appear in the SNP dataset as genotypes that are heterozygous in all individuals of the heterogametic sex and homozygous in all individuals of the homogametic sex. This function keeps or drops loci with alleles that behave in this way, as putative sex-specific SNP markers.
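The pattern described above can be checked for a single SNP locus with a few lines of base R. This is a sketch under stated assumptions: genotypes are coded 0, 1, 2 with 1 as heterozygous, the heterogametic sex is passed explicitly, and tolerances are applied with "less than or equal to". The helper is_sex_linked is hypothetical and the package's own logic may differ.

```r
# TRUE if a locus is (almost) always heterozygous in the heterogametic sex
# and (almost) never heterozygous in the homogametic sex.
# Hypothetical helper for illustration only; not part of dartR.
is_sex_linked <- function(geno, sex, heterogametic = "M",
                          t.het = 0.1, t.hom = 0.1) {
  het <- geno == 1
  # Proportion of the heterogametic sex that is NOT heterozygous
  p_hom_in_het_sex <- mean(!het[sex == heterogametic], na.rm = TRUE)
  # Proportion of the homogametic sex that IS heterozygous
  p_het_in_hom_sex <- mean(het[sex != heterogametic], na.rm = TRUE)
  p_hom_in_het_sex <= t.het && p_het_in_hom_sex <= t.hom
}

sex <- rep(c("M", "F"), each = 20)
geno_a <- c(rep(1, 20), rep(0, 20))            # perfect pattern
geno_b <- c(rep(1, 19), 0, rep(0, 19), 1)      # one exception per sex (5%)
geno_c <- c(rep(1, 10), rep(0, 10), rep(0, 20)) # half the males homozygous

c(a = is_sex_linked(geno_a, sex),
  b = is_sex_linked(geno_b, sex),
  c = is_sex_linked(geno_c, sex))
```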

Usage

gl.filter.sexlinked(  x,  sex = NULL,  filter = NULL,  read.depth = 0,  t.het = 0.1,  t.hom = 0.1,  t.pres = 0.1,  plot.out = TRUE,  plot_theme = theme_dartR(),  plot_colors = three_colors,  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required].

sex

Factor that defines the sex of individuals. See explanation in details [default NULL].

filter

Either 'keep' to keep sex linked markers only or 'drop' to drop sex linked markers [required].

read.depth

Additional filter option to keep only loci above a certain read.depth. The default of 0 means read.depth is not taken into account [default 0].

t.het

Tolerance in the heterogametic sex, that is t.het=0.05 means that 5% of the heterogametic sex can be homozygous and still be regarded as consistent with a sex specific marker [default 0.1].

t.hom

Tolerance in the homogametic sex, that is t.hom=0.05 means that 5% of the homogametic sex can be heterozygous and still be regarded as consistent with a sex specific marker [default 0.1].

t.pres

Tolerance in presence, that is t.pres=0.05 means that a silicodart marker can be present in either of the sexes and still be regarded as a sex-linked marker [default 0.1].

plot.out

Creates a plot that shows the heterozygosity of males and females at each locus [default TRUE].

plot_theme

Theme for the plot. See Details for options [default theme_dartR()].

plot_colors

List of three color names for the not sex-linked loci, for the sex-linked loci and for the area in which sex-linked loci appear [default three_colors].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity].

Details

Sex of the individuals for which sex is known with certainty can be provided via a factor (of length equal to the number of individuals) or be held in the variable x@other$ind.metrics$sex. Coding is: M for male, F for female, U or NA for unknown/missing. The script abbreviates the entries here to the first character. So, coding of 'Female' and 'Male' works as well. Characters are also converted to upper case.
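The abbreviation described above can be reproduced in base R as follows. The input vector is hypothetical, and this is a sketch of the described behaviour rather than the function's own code.

```r
# Hypothetical sex labels in mixed styles
sex_raw <- c("Female", "male", "F", "m", "unknown", NA)

# Keep the first character only and convert it to upper case
sex <- toupper(substr(sex_raw, 1, 1))
sex
```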

Function's output

This function also creates a plot that shows the heterozygosity of males and females at each locus for SNP data, or the percentage of present/absent in the case of SilicoDArT data.

Examples of other themes that can be used can be consulted in

Value

The filtered genlight object (filter='keep': sex linked loci only; filter='drop': everything except sex linked loci).

Author(s)

Arthur Georges, Bernd Gruber & Floriaan Devloo-Delva (Post to https://groups.google.com/d/forum/dartr)

Examples

out <- gl.filter.sexlinked(testset.gl, filter='drop')
out <- gl.filter.sexlinked(testset.gs, filter='drop')

Filters loci in a genlight {adegenet} object based on sequence tag length

Description

SNP datasets generated by DArT typically have sequence tag lengths ranging from 20 to 69 base pairs.

Usage

gl.filter.taglength(x, lower = 20, upper = 69, verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

lower

Lower threshold value below which loci will be removed [default 20].

upper

Upper threshold value above which loci will be removed [default 69].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

Returns a genlight object retaining loci with a sequence tag length in the range specified by the lower and upper threshold.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# SNP data
gl.report.taglength(testset.gl)
result <- gl.filter.taglength(testset.gl,lower=60)
gl.report.taglength(result)
# Tag P/A data
gl.report.taglength(testset.gs)
result <- gl.filter.taglength(testset.gs,lower=60)
gl.report.taglength(result)
test <- gl.subsample.loci(platypus.gl, n =100)
res <- gl.report.taglength(test)

Generates a matrix of fixed differences and associated statistics for populations taken pairwise

Description

This script takes SNP data or sequence tag P/A data grouped into populations in a genlight object (DArTSeq) and generates a matrix of fixed differences between populations taken pairwise.

Usage

gl.fixed.diff(  x,  tloc = 0,  test = FALSE,  delta = 0.02,  alpha = 0.05,  reps = 1000,  mono.rm = TRUE,  pb = FALSE,  verbose = NULL)

Arguments

x

Name of the genlight object containing SNP genotypes or tag P/A data (SilicoDArT) or an object of class 'fd' [required].

tloc

Threshold defining a fixed difference (e.g. 0.05 implies 95:5 vs 5:95 is fixed) [default 0].

test

If TRUE, calculate p values for the observed fixed differences [default FALSE].

delta

Threshold value for the true population minor allele frequency (MAF) from which resultant sample fixed differences are considered true positives [default 0.02].

alpha

Level of significance used to display non-significant differences between populations as they are compared pairwise [default 0.05].

reps

Number of replications to undertake in the simulation to estimate probability of false positives [default 1000].

mono.rm

If TRUE, loci that are monomorphic across all individuals are removed before beginning computations [default TRUE].

pb

If TRUE, show a progress bar on time consuming loops [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

A fixed difference at a locus occurs when two populations share no alleles or where all members of one population have a sequence tag scored, and all members of the other population have the sequence tag absent. The challenge with this approach is that when sample sizes are finite, fixed differences will occur through sampling error, compounded when many loci are examined. Simulations suggest that sample sizes of n1=5 and n2=5 are adequate to reduce the probability of [experiment-wide] type 1 error to negligible levels [ploidy=2]. A warning is issued if comparison between two populations involves sample sizes less than 5, taking into account allele drop-out.

Optionally, if test=TRUE, the script will test the fixed differences between final OTUs for statistical significance, using simulation, and then further amalgamate populations for which there are no significant fixed differences at a specified level of significance (alpha). To avoid conflation of true fixed differences with false positives in the simulations, it is necessary to decide a threshold value (delta) for extreme true allele frequencies that will be considered fixed for practical purposes. That is, fixed differences in the sample set will be considered to be positives (not false positives) if they arise from true allele frequencies of less than 1-delta in one or both populations. The parameter delta is typically set to be small (e.g. delta = 0.02).

NOTE: The above test will only be calculated if tloc=0, that is, for analyses of absolute fixed differences. The test applies in comparisons of allopatric populations only. For sympatric populations, use gl.pval.sympatry().

An absolute fixed difference is as defined above. However, one might wish to score fixed differences at some lower level of allele frequency difference, say where percent allele frequencies are 95:5 and 5:95 rather than 100:0 and 0:100. This adjustment can be done with the tloc parameter. For example, tloc=0.05 means that SNP allele frequencies of 95:5 and 5:95 percent will be regarded as fixed when comparing two populations at a locus.
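The effect of tloc can be illustrated with a short base R sketch that scores each locus from the allele frequencies of two populations. The helper is_fixed and the frequencies are hypothetical; the sketch shows the definition only and is not the package's implementation.

```r
# p1, p2: frequency of the alternate allele at each locus in two populations.
# A locus is scored as fixed when one population is at or below tloc and the
# other is at or above 1 - tloc.
# Hypothetical helper for illustration only; not part of dartR.
is_fixed <- function(p1, p2, tloc = 0) {
  (p1 <= tloc & p2 >= 1 - tloc) | (p1 >= 1 - tloc & p2 <= tloc)
}

p1 <- c(0.00, 0.04, 0.50, 1.00, 0.97)
p2 <- c(1.00, 0.96, 0.50, 0.00, 0.10)

abs_fd <- is_fixed(p1, p2, tloc = 0)     # absolute fixed differences only
rel_fd <- is_fixed(p1, p2, tloc = 0.05)  # 95:5 vs 5:95 also counts as fixed

sum(abs_fd)
sum(rel_fd)
```

Relaxing tloc can only add loci to the set scored as fixed, never remove them.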

Value

A list of Class 'fd' containing the gl object and square matrices, as follows:

$gl – the output genlight object;
$fd – raw fixed differences;
$pcfd – percent fixed differences;
$nobs – mean no. of individuals used in each comparison;
$nloc – total number of loci used in each comparison;
$expfpos – if test=TRUE, the expected count of false positives for each comparison [by simulation];
$sdfpos – if test=TRUE, the standard deviation of the count of false positives for each comparison [by simulation];
$pval – if test=TRUE, the significance of the count of fixed differences [by simulation]

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

fd <- gl.fixed.diff(testset.gl, tloc=0, verbose=3 )
fd <- gl.fixed.diff(testset.gl, tloc=0, test=TRUE, delta=0.02, reps=100, verbose=3 )

Calculates pairwise Fst values for populations in a genlight object

Description

This script calculates pairwise Fst values based on the implementation in the StAMPP package (?stamppFst). It allows bootstrapping to estimate the probability that Fst values differ from zero. For detailed information please check the help pages (?stamppFst).

Usage

gl.fst.pop(x, nboots = 1, percent = 95, nclusters = 1, verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP genotypes [required].

nboots

Number of bootstraps to perform across loci to generate confidence intervals and p-values [default 1].

percent

Percentile to calculate the confidence interval around [default 95].

nclusters

Number of processor threads or cores to use during calculations [default 1].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

A matrix of distances between populations (class dist), if nboots=1, otherwise a list with Fsts (in a matrix), Pvalues (a matrix of p-values), Bootstraps results (data frame of all runs). Hint: Use as.matrix(as.dist(fsts)) if you want to have a square matrix with symmetric entries returned, instead of a dist object.
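The hint above relies only on base R. A small sketch with hypothetical Fst values held in the lower triangle of a matrix shows what the conversion does:

```r
# Hypothetical pairwise Fst values stored in the lower triangle only
pops <- c("popA", "popB", "popC")
fsts <- matrix(NA_real_, 3, 3, dimnames = list(pops, pops))
fsts[lower.tri(fsts)] <- c(0.12, 0.30, 0.25)

# as.dist() reads the lower triangle; as.matrix() mirrors it into a full,
# symmetric square matrix with zeros on the diagonal.
fst_square <- as.matrix(as.dist(fsts))
fst_square
```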

Author(s)

Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)

Examples

test <- gl.filter.callrate(platypus.gl,threshold = 1)
test <- gl.filter.monomorphs(test)
out <- gl.fst.pop(test, nboots=1)

Performs least-cost path analysis based on a friction matrix

Description

This function calculates the pairwise distances (Euclidean, cost path distances and genetic distances) of populations using a friction matrix and a spatial genind object. The genind object needs to have coordinates in the same projected coordinate system as the friction matrix. The friction matrix can be either a single raster or a stack of several layers. If a stack is provided the specified cost distance is calculated for each layer in the stack. The output of this function can be used with the functions wassermann or lgrMMRR to test for the significance of a layer on the genetic structure.

Usage

gl.genleastcost(  x,  fric.raster,  gen.distance,  NN = NULL,  pathtype = "leastcost",  plotpath = TRUE,  theta = 1,  verbose = NULL)

Arguments

x

A spatial genind object. See ?popgenreport how to provide coordinates in genind objects [required].

fric.raster

A friction matrix [required].

gen.distance

Specification which genetic distance method should be used to calculate pairwise genetic distances between populations ('D', 'Gst.Nei', 'Gst.Hedrick') or individuals ('Smouse', 'Kosman', 'propShared') [required].

NN

Number of neighbours used when calculating the cost distance (possible values 4, 8 or 16). As the default is NULL a value has to be provided if pathtype='leastcost'. NN=8 is most commonly used. Be aware that linear structures may cause artefacts in the least-cost paths, therefore inspect the actual least-cost paths in the provided output [default NULL].

pathtype

Type of cost distance to be calculated (based on functions in the gdistance package). Available distances are 'leastcost', 'commute' or 'rSPDistance'. See functions in the gdistance package for further explanations. If the path type is set to 'leastcost' then paths and also path lengths are returned [default 'leastcost'].

plotpath

Switch if least cost paths should be plotted (works only if pathtype='leastcost'). Be aware this slows down the computation, but it is recommended to do this to check least cost paths visually [default TRUE].

theta

Value needed for the rSPDistance function. See rSPDistance in package gdistance [default 1].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

Returns a list that consists of four pairwise distance matrices (Euclidean, Cost, length of path and genetic) and the actual paths as spatial line objects.

Author(s)

Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)

References

Cushman, S., Wasserman, T., Landguth, E. and Shirk, A. (2013). Re-Evaluating Causal Modeling with Mantel Tests in Landscape Genetics. Diversity, 5(1), 51-72.
Landguth, E. L., Cushman, S. A., Schwartz, M. K., McKelvey, K. S., Murphy, M. and Luikart, G. (2010). Quantifying the lag time to detect barriers in landscape genetics. Molecular Ecology, 4179-4191.
Wasserman, T. N., Cushman, S. A., Schwartz, M. K. and Wallin, D. O. (2010). Spatial scaling and multi-model inference in landscape genetics: Martes americana in northern Idaho. Landscape Ecology, 25(10), 1601-1612.

Examples

## Not run: 
data(possums.gl)
library(raster)  # needed for this example
landscape.sim <- readRDS(system.file('extdata','landscape.sim.rdata', package='dartR'))
glc <- gl.genleastcost(x=possums.gl, fric.raster=landscape.sim, gen.distance = 'D', NN=8, pathtype = 'leastcost', plotpath = TRUE)
library(PopGenReport)
PopGenReport::wassermann(eucl.mat = glc$eucl.mat, cost.mat = glc$cost.mats, gen.mat = glc$gen.mat)
lgrMMRR(gen.mat = glc$gen.mat, cost.mats = glc$cost.mats, eucl.mat = glc$eucl.mat)
## End(Not run)

Calculates an identity by descent matrix

Description

This function calculates the mean probability of identity by state (IBS) across loci that would result from all the possible crosses of the individuals analyzed. IBD is calculated by an additive relationship matrix approach developed by Endelman and Jannink (2012) as implemented in the function A.mat (package rrBLUP).

Usage

gl.grm(  x,  plotheatmap = TRUE,  palette_discrete = discrete_palette,  palette_convergent = convergent_palette,  legendx = 0,  legendy = 0.5,  verbose = NULL,  ...)

Arguments

x

Name of the genlight object containing the SNP data [required].

plotheatmap

A switch if a heatmap should be shown [default TRUE].

palette_discrete

A discrete palette for the color of populations or a list with as many colors as there are populations in the dataset [default discrete_palette].

palette_convergent

A convergent palette for the IBD values [default convergent_palette].

legendx

x coordinates for the legend [default 0].

legendy

y coordinates for the legend [default 0.5].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

...

Parameters passed to function A.mat from package rrBLUP.

Details

Two or more alleles are identical by descent (IBD) if they are identical copies of the same ancestral allele in a base population. The additive relationship matrix is a theoretical framework for estimating a relationship matrix that is consistent with an approach to estimate the probability that the alleles at a random locus are identical in state (IBS).

This function also plots a heatmap, and a dendrogram, of IBD values where each diagonal element has a mean that equals 1+f, where f is the inbreeding coefficient (i.e. the probability that the two alleles at a randomly chosen locus are IBD from the base population). As this probability lies between 0 and 1, the diagonal elements range from 1 to 2. Because the inbreeding coefficients are expressed relative to the current population, the mean of the off-diagonal elements is -(1+f)/n, where n is the number of individuals. Individual names are shown in the margins of the heatmap and colors represent different populations.

Value

An identity by descent matrix

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

References

Endelman, J. B. (2011). Ridge regression and other kernels for genomic selection with R package rrBLUP. The Plant Genome 4, 250.
Endelman, J. B., Jannink, J.-L. (2012). Shrinkage estimation of the realized relationship matrix. G3: Genes, Genomes, Genetics 2, 1405.

Examples

gl.grm(platypus.gl[1:10,1:100])

Represents a genomic relationship matrix (GRM) as a network

Description

This script takes a G matrix generated by gl.grm and represents the relationship among the specimens as a network diagram. In order to use this script, a decision is required on a threshold for relatedness to be represented as a link in the network, and on the layout used to create the diagram.

Usage

gl.grm.network(  G,  x,  method = "fr",  node.size = 8,  node.label = TRUE,  node.label.size = 2,  node.label.color = "black",  link.color = NULL,  link.size = 2,  relatedness_factor = 0.125,  title = "Network based on a genomic relationship matrix",  palette_discrete = NULL,  save2tmp = FALSE,  verbose = NULL)

Arguments

G

A genomic relationship matrix (GRM) generated by gl.grm [required].

x

A genlight object from which the G matrix was generated [required].

method

One of 'fr', 'kk', 'gh' or 'mds' [default 'fr'].

node.size

Size of the symbols for the network nodes [default 8].

node.label

TRUE to display node labels [default TRUE].

node.label.size

Size of the node labels [default 2].

node.label.color

Color of the text of the node labels [default 'black'].

link.color

Color palette for links [default NULL].

link.size

Size of the links [default 2].

relatedness_factor

Factor of relatedness [default 0.125].

title

Title for the plot [default 'Network based on a genomic relationship matrix'].

palette_discrete

A discrete palette for the color of populations or a list with as many colors as there are populations in the dataset [default NULL].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

The gl.grm.network function takes a genomic relationship matrix (GRM) generated by the gl.grm function to represent the relationship among individuals in the dataset as a network diagram. To generate the GRM, the function gl.grm uses the function A.mat from package rrBLUP, which implements the approach developed by Endelman and Jannink (2012).

The GRM is an estimate of the proportion of alleles that two individuals have in common. It is generated by estimating the covariance of the genotypes between two individuals, i.e. how much genotypes in the two individuals correspond with each other. This covariance depends on the probability that alleles at a random locus are identical by state (IBS). Two alleles are IBS if they represent the same allele. Two alleles are identical by descent (IBD) if one is a physical copy of the other or if they are both physical copies of the same ancestral allele. Note that IBD is complicated to determine. IBD implies IBS, but not conversely. However, as the number of SNPs in a dataset increases, the mean probability of IBS approaches the mean probability of IBD.

It follows that the off-diagonal elements of the GRM are two times the kinship coefficient, i.e. the probability that two alleles at a random locus drawn from two individuals are IBD. Additionally, the diagonal elements of the GRM are 1+f, where f is the inbreeding coefficient of each individual, i.e. the probability that the two alleles at a random locus are IBD.

Choosing a meaningful threshold to represent the relationship between individuals is tricky because IBD is not an absolute state but is relative to a reference population for which there is generally little information so that we can estimate the kinship of a pair of individuals only relative to some other quantity. To deal with this, we can use the average inbreeding coefficient of the diagonal elements as the reference value. For this, the function subtracts 1 from the mean of the diagonal elements of the GRM. In a second step, the off-diagonal elements are divided by 2, and finally, the mean of the diagonal elements is subtracted from each off-diagonal element after dividing them by 2. This approach is similar to the one used by Goudet et al. (2018).
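The steps in the paragraph above can be traced on a toy matrix in base R. This sketch follows one reading of the text, namely that the reference value subtracted in the final step is the mean of the diagonal elements minus 1 (the average inbreeding coefficient). The matrix values are hypothetical and the code is not taken from the package.

```r
# Hypothetical 3 x 3 genomic relationship matrix
G <- matrix(c(1.10, 0.52, 0.04,
              0.52, 1.06, 0.10,
              0.04, 0.10, 1.14), nrow = 3, byrow = TRUE)

# Step 1: reference value = mean of the diagonal minus 1
f_mean <- mean(diag(G)) - 1

# Step 2: halve the elements to move to the kinship scale
kin <- G / 2

# Step 3: express kinship relative to the reference value
kin_rel <- kin - f_mean
diag(kin_rel) <- NA  # only the off-diagonal elements are of interest

# Pairs at or above a chosen relatedness threshold become links in the network
relatedness_factor <- 0.125
links <- which(kin_rel >= relatedness_factor & upper.tri(kin_rel),
               arr.ind = TRUE)
links
```

With these values only individuals 1 and 2 are linked; the other pairs fall below the threshold once the reference value is removed.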

Below is a table modified from Speed & Balding (2015) showing kinship values, and their confidence intervals (CI), for different relationships that could be used to guide the choosing of the relatedness threshold in the function.

| Relationship | Kinship | 95% CI |
| Identical twins/clones/same individual | 0.5 | - |
| Sibling/Parent-Offspring | 0.25 | (0.204, 0.296) |
| Half-sibling | 0.125 | (0.092, 0.158) |
| First cousin | 0.062 | (0.038, 0.089) |
| Half-cousin | 0.031 | (0.012, 0.055) |
| Second cousin | 0.016 | (0.004, 0.031) |
| Half-second cousin | 0.008 | (0.001, 0.020) |
| Third cousin | 0.004 | (0.000, 0.012) |
| Unrelated | 0 | - |
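As a guide to reading the table, the following base R sketch lists the relationships whose interval contains an observed kinship value. The intervals are copied from the table; the helper compatible is hypothetical and only illustrates how much neighbouring categories overlap.

```r
# Intervals copied from the table above (rows without an interval omitted)
ci <- data.frame(
  relationship = c("Sibling/Parent-Offspring", "Half-sibling", "First cousin",
                   "Half-cousin", "Second cousin", "Half-second cousin",
                   "Third cousin"),
  lower = c(0.204, 0.092, 0.038, 0.012, 0.004, 0.001, 0.000),
  upper = c(0.296, 0.158, 0.089, 0.055, 0.031, 0.020, 0.012),
  stringsAsFactors = FALSE
)

# Relationships whose interval contains the kinship value k.
# Hypothetical helper for illustration only; not part of dartR.
compatible <- function(k) ci$relationship[k >= ci$lower & k <= ci$upper]

compatible(0.11)
compatible(0.015)
```

Values in the range of distant relationships match several categories at once, which is one reason to choose the relatedness threshold conservatively.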

Four layout options are implemented in this function:

'fr' Fruchterman-Reingold layout layout_with_fr (package igraph)
'kk' Kamada-Kawai layout layout_with_kk (package igraph)
'gh' Graphopt layout layout_with_graphopt (package igraph)
'mds' Multidimensional scaling layout layout_with_mds (package igraph)

Value

A network plot showing relatedness between individuals

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

References

Endelman, J. B., Jannink, J.-L. (2012). Shrinkage estimation of the realized relationship matrix. G3: Genes, Genomes, Genetics 2, 1405.
Goudet, J., Kay, T., & Weir, B. S. (2018). How to estimate kinship. Molecular Ecology, 27(20), 4121-4135.
Speed, D., & Balding, D. J. (2015). Relatedness in the post-genomic era: is it still useful? Nature Reviews Genetics, 16(1), 33-44.

Examples

if (requireNamespace("igraph", quietly = TRUE) && requireNamespace("rrBLUP", quietly = TRUE) && requireNamespace("fields", quietly = TRUE)) {
  t1 <- possums.gl
  # filtering on call rate
  t1 <- gl.filter.callrate(t1)
  t1 <- gl.subsample.loci(t1, n = 100)
  # relatedness matrix
  res <- gl.grm(t1, plotheatmap = FALSE)
  # relatedness network
  res2 <- gl.grm.network(res, t1, relatedness_factor = 0.125)
}

Performs Hardy-Weinberg tests over loci and populations

Description

Hardy-Weinberg tests are performed for each locus in each of the populations as defined by the pop slot in a genlight object.

Usage

gl.hwe.pop(  x,  alpha_val = 0.05,  plot.out = TRUE,  plot_theme = theme_dartR(),  plot_colors = c("gray90", "deeppink"),  HWformat = FALSE,  verbose = NULL)

Arguments

x

A genlight object with a population defined [pop(x) does not return NULL].

alpha_val

Level of significance for testing [default 0.05].

plot.out

If TRUE, returns a plot object compatible with ggplot, otherwise returns a dataframe [default TRUE].

plot_theme

User specified theme [default theme_dartR()].

plot_colors

Vector with two color names for the borders and fill [default c("gray90", "deeppink")].

HWformat

Switch if data should be returned in HWformat (counts of genotypes to be used in package HardyWeinberg) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity].

Details

This function employs the HardyWeinberg package, which needs to be installed. The function that is used is HWExactStats, but there are several other great functions implemented in the package regarding HWE. Therefore, this function can return the data in the format expected by the HardyWeinberg package, via HWformat=TRUE, which can then be used to run other functions of the package.

This function performs a HWE test for every population (rows) and locus (columns) and returns a true/false matrix. True is reported if the p-value of an HWE test for a particular locus and population was below the specified threshold (alpha_val, default=0.05). The thinking behind this approach is that loci that are not in HWE in several populations most likely have to be treated (e.g. filtered if loci under selection are of interest). If plot.out=TRUE, a barplot of the loci and the sum of deviations over all populations is returned. Loci that deviate in the majority of populations can be identified via colSums on the resulting matrix.
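The colSums step can be sketched in base R on a hypothetical true/false matrix with populations in rows and loci in columns, where TRUE marks a significant deviation from HWE:

```r
# Hypothetical result matrix: TRUE = p-value below alpha_val
hwe <- matrix(c(TRUE, FALSE, TRUE,
                TRUE, FALSE, FALSE,
                TRUE, TRUE,  FALSE),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("pop", 1:3), paste0("loc", 1:3)))

# Number of populations in which each locus deviates from HWE
n_dev <- colSums(hwe)

# Loci deviating in the majority of populations
majority <- names(n_dev)[n_dev > nrow(hwe) / 2]
majority
```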

Plot themes can be obtained from

Resultant ggplots and the tabulation are saved to the session's temporary directory.

Value

The function returns a list with up to three components:

'HWE' is the matrix over loci and populations
'plot' is a plot (ggplot) which shows the significant results for population and loci (can be amended further using ggplot syntax)
'HWformat' (if HWformat=TRUE) contains SNP data for each population in 'HardyWeinberg' format, to be used with other functions of the package (e.g. HWPerm or HWExactPrevious).

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

out <- gl.hwe.pop(bandicoot.gl[,1:33], alpha_val=0.05, plot.out=TRUE, HWformat=FALSE)

Performs isolation by distance analysis

Description

This function performs an isolation by distance analysis based on a Mantel test and also produces an isolation by distance plot. If a genlight object with coordinates is provided, then Euclidean and genetic distance matrices are calculated.

Usage

gl.ibd(  x = NULL,  distance = "Fst",  coordinates = "latlon",  Dgen = NULL,  Dgeo = NULL,  Dgeo_trans = "Dgeo",  Dgen_trans = "Dgen",  permutations = 999,  plot.out = TRUE,  paircols = NULL,  plot_theme = theme_dartR(),  save2tmp = FALSE,  verbose = NULL)

Arguments

x

Genlight object. If provided, a standard analysis on Fst/(1-Fst) and log(distance) is performed [required].

distance

Type of distance that is calculated and used for the analysis. Can be either population based 'Fst' [stamppFst], 'D' [stamppNeisD] or individual based 'propShared' [gl.propShared], 'euclidean' [gl.dist.ind, method='Euclidean'] [default "Fst"].

coordinates

Can be either 'latlon', 'xy' or a two column data.frame with column names 'lat', 'lon' or 'x', 'y'. Coordinates are provided via gl@other$latlon ['latlon'] or via gl@other$xy ['xy']. If 'latlon', data will be projected to meters using the Mercator system [google maps]; if 'xy', distance is calculated directly on the coordinates.

Dgen

Genetic distance matrix if no genlight object is provided [default NULL].

Dgeo

Euclidean distance matrix if no genlight object is provided [default NULL].

Dgeo_trans

Transformation to be used on the Euclidean distances. See Dgen_trans [default "Dgeo"].

Dgen_trans

You can provide a formula to transform the genetic distance. The transformation can be applied as a formula using Dgen as the variable to be transformed. For example: Dgen_trans = 'Dgen/(1-Dgen)'. Any valid R expression can be used here [default 'Dgen', which is the identity function].

permutations

Number of permutations in the Mantel test [default 999].

plot.out

Should an isolation by distance plot be returned [default TRUE].

paircols

Should pairwise dots be colored by 'pop'ulation/'ind'ividual pairs [default 'pop']. You can color pairwise individuals by pairwise population colors.

plot_theme

Theme for the plot. See details for options [default theme_dartR()].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

Currently pairwise Fst and D between populations and 1-propShared and Euclidean distance between individuals are implemented. Coordinates are expected as lat/lon and are converted to the Google Earth Mercator projection. If coordinates are already projected, provide them in the x@other$xy slot.

You can also provide your own genetic and Euclidean distance matrices. The function is based on the code provided by the adegenet tutorial (http://adegenet.r-forge.r-project.org/files/tutorial-basics.pdf), using the functions mantel (package vegan), stamppFst, stamppNeisD (package StAMPP) and gl.propShared or gl.dist.ind. For transformation you need to have the dismo package installed. As a new feature you can plot pairwise relationships using double colored points (paircols=TRUE). Pairwise relationships can be visualised via populations or individuals, depending on which distance is calculated. Please note: a problem often arises if an individual based distance is calculated (e.g. propShared) and some individuals have identical coordinates, as this results in distances of zero between those pairs of individuals.

If the standard transformation [log(Dgeo)] is used, this results in an infinite value, because of trying to calculate 'log(0)'. To avoid this, the easiest fix is to change the transformation from log(Dgeo) to log(Dgeo+1), or you could add some "noise" to the coordinates of the individuals (e.g. +- 1 m, but be aware that if you use lat/lon you would rather add +0.00001 degrees or so).
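The log(0) problem and the log(Dgeo+1) workaround can be reproduced in base R. This is a minimal sketch with hypothetical projected coordinates; the helper apply_trans is illustrative only, and evaluating the transformation string with eval(parse()) is an assumption about how such a string could be applied, not a description of the package internals.

```r
# Hypothetical projected coordinates (metres); individuals 1 and 2 share a site
xy <- data.frame(x = c(0, 0, 300, 1000), y = c(0, 0, 400, 0))
Dgeo <- dist(xy)   # pairwise Euclidean distances

# Illustrative helper: evaluate a transformation given as a string,
# with Dgeo as the variable to be transformed
apply_trans <- function(Dgeo, trans) eval(parse(text = trans))

log_plain <- apply_trans(Dgeo, "log(Dgeo)")      # -Inf for the co-located pair
log_plus1 <- apply_trans(Dgeo, "log(Dgeo + 1)")  # finite for every pair

any(is.infinite(log_plain))
all(is.finite(log_plus1))
```

In this sketch the first check is TRUE because of the zero distance between the two co-located individuals, while the shifted transformation stays finite for all pairs.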

Value

Returns a list of the following components: Dgen (the genetic distance matrix), Dgeo (the Euclidean distance matrix), Mantel (the statistics of the Mantel test).

Author(s)

Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)

References

Rousset, F. (1997). Genetic differentiation and estimation of gene flow from F-statistics under isolation by distance. Genetics, 145(4), 1219-1228.

Examples

# because of speed only the first 100 loci
ibd <- gl.ibd(bandicoot.gl[,1:100], Dgeo_trans='log(Dgeo)', Dgen_trans='Dgen/(1-Dgen)')
# because of speed only the first 10 individuals
ibd <- gl.ibd(bandicoot.gl[1:10,], distance='euclidean', paircols='pop', Dgeo_trans='Dgeo')
# only first 100 loci
ibd <- gl.ibd(bandicoot.gl[,1:100])

Imputes missing data

Description

This function imputes genotypes on a population-by-population basis, where populations can be considered panmictic, or imputes the state for presence-absence data.

Usage

gl.impute(  x,  method = "neighbour",  fill.residual = TRUE,  parallel = FALSE,  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP or presence-absence data [required].

method

Imputation method, either "frequency" or "HW" or "neighbour" or "random" [default "neighbour"].

fill.residual

Should any residual missing values remaining after imputation be set to 0, 1, 2 at random, taking into account global allele frequencies at the particular locus [default TRUE].

parallel

A logical indicating whether multiple cores (if available) should be used for the computations (TRUE), or not (FALSE); requires the package parallel to be installed [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

We recommend that imputation be performed on sampling locations, before any aggregation. Imputation is achieved by replacing missing values using one of four methods:

If "frequency", genotypes scored as missing at a locus in an individual are imputed using the average allele frequencies at that locus in the population from which the individual was drawn.
If "HW", genotypes scored as missing at a locus in an individual are imputed by sampling at random assuming Hardy-Weinberg equilibrium. Applies only to genotype data.
If "neighbour", substitute the missing values for the focal individual with the values taken from the nearest neighbour. Repeat with the next nearest and so on until all missing values are replaced.
If "random", missing data are substituted by random values (0, 1 or 2).

The nearest neighbour is the individual with the smallest Euclidean distance to the focal individual across the whole dataset.

The advantage of the nearest neighbour approach is that it works regardless of how many individuals are in the population to which the focal individual belongs, and the displacement of the individual is haphazard as opposed to:

(a) Drawing the individual toward the population centroid (HW and Frequency).

(b) Drawing the individual toward the global centroid (glPCA).

Note that loci that are missing for all individuals in a population are not imputed with method 'frequency' or 'HW'. Consider using the function gl.filter.allna with by.pop=TRUE to remove them first.
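The nearest neighbour idea described above can be illustrated on a plain genotype matrix. This is a base R sketch under stated assumptions, not the package implementation: the function impute_neighbour and the toy data are hypothetical, and distances come from dist(), which excludes missing values pairwise and rescales the sum.

```r
# Toy genotype matrix: rows = individuals, columns = loci (0, 1, 2; NA = missing)
geno <- matrix(c(0,  1, 2, NA,
                 0,  1, 2, 1,
                 2,  2, 0, 0,
                 NA, 2, 0, 0),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("ind", 1:4), paste0("loc", 1:4)))

impute_neighbour <- function(g) {
  d <- as.matrix(dist(g))   # Euclidean distance between individuals
  out <- g
  for (i in seq_len(nrow(g))) {
    # all other individuals, nearest first
    nb <- setdiff(order(d[i, ]), i)
    for (j in nb) {
      miss <- is.na(out[i, ])
      if (!any(miss)) break
      # copy the neighbour's values; loci the neighbour is also missing stay NA
      out[i, miss] <- g[j, miss]
    }
  }
  out
}

imputed <- impute_neighbour(geno)
imputed
```

Here ind1 takes its missing loc4 value from ind2 and ind4 takes its missing loc1 value from ind3, because those are their closest neighbours on the loci they share.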

Value

A genlight object with the missing data imputed.

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

Examples

require("dartR.data")
# SNP genotype data
gl <- gl.filter.callrate(platypus.gl, threshold=0.95)
gl <- gl.filter.allna(gl)
gl <- gl.impute(gl, method="neighbour")
# Sequence Tag presence-absence data
gs <- gl.filter.callrate(testset.gs, threshold=0.95)
gs <- gl.filter.allna(gs)
gs <- gl.impute(gs, method="neighbour")
gs <- gl.impute(platypus.gl, method="random")

Installs all required packages for using all functions available in dartR

Description

The function compares the installed packages with the currently available ones on CRAN. Be aware this function only works if a version of dartR is already installed on your system. You can choose if you also want to have a specific version of dartR installed ('CRAN', 'master', 'beta' or 'dev'). 'master', 'beta' and 'dev' are installed from GitHub. Be aware that the dev version from GitHub is not fully tested and most certainly will contain untested functions.

Usage

gl.install.vanilla.dartR(flavour = NULL, verbose = NULL)

Arguments

flavour

The version of dartR you want to install. If NULL then only packages needed for the current version will be installed. If 'CRAN', the current CRAN version will be installed. 'master' installs the GitHub master branch, 'beta' installs the latest stable version, and 'dev' installs the experimental development branch from GitHub [default NULL].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

Returns a message if the installation was successful/required.

Author(s)

Custodian: Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

Combines two genlight objects

Description

This function combines two genlight objects and their associated metadata. The history associated with the two genlight objects is cleared from the new genlight object. The individuals/samples must be the same in each genlight object.

The function is typically used to combine datasets from the same service where the files have been split because of size limitations. The data is read in from multiple csv files, then the resultant genlight objects are combined.
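The requirement that both objects hold the same individuals can be illustrated with plain matrices. The sketch below is a base R analogy, not the package code: join_blocks and the toy matrices are hypothetical, and requiring identical names in identical order is an assumption of this sketch.

```r
# Two hypothetical genotype blocks scored on the same individuals
part1 <- matrix(c(0, 1, 2, 2, 1, 0), nrow = 3,
                dimnames = list(c("indA", "indB", "indC"), c("loc1", "loc2")))
part2 <- matrix(c(1, 1, 0), nrow = 3,
                dimnames = list(c("indA", "indB", "indC"), "loc3"))

join_blocks <- function(a, b) {
  # the individuals must match before loci are appended
  stopifnot(identical(rownames(a), rownames(b)))
  cbind(a, b)
}

joined <- join_blocks(part1, part2)
dim(joined)
```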

Usage

gl.join(x1, x2, verbose = NULL)

Arguments

x1

Name of the first genlight object [required].

x2

Name of the second genlight object [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

A new genlight object

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

x1 <- testset.gl[,1:100]
x1@other$loc.metrics <- testset.gl@other$loc.metrics[1:100,]
nLoc(x1)
x2 <- testset.gl[,101:150]
x2@other$loc.metrics <- testset.gl@other$loc.metrics[101:150,]
nLoc(x2)
gl <- gl.join(x1, x2, verbose=2)
nLoc(gl)

Removes all but the specified individuals from a dartR genlight object

Description

This script deletes all individuals apart from those listed (ind.list). Monomorphic loci and loci that are scored all NA are optionally deleted (mono.rm=TRUE). The script also optionally recalculates locus metadata statistics to accommodate the deletion of individuals from the dataset (recalc=TRUE).

Usage

gl.keep.ind(x, ind.list, recalc = FALSE, mono.rm = FALSE, verbose = NULL)

Arguments

x

Name of the genlight object [required].

ind.list

A list of individuals to be retained [required].

recalc

If TRUE, recalculate the locus metadata statistics [default FALSE].

mono.rm

If TRUE, remove monomorphic and all NA loci [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

A reduced dartR genlight object

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# SNP data
gl2 <- gl.keep.ind(testset.gl, ind.list=c('AA019073','AA004859'))
# Tag P/A data
gs2 <- gl.keep.ind(testset.gs, ind.list=c('AA020656','AA19077','AA004859'))

Removes all but the specified loci from a genlight object

Description

This function deletes loci that are not specified to keep, and their associated metadata.

The script returns a dartR genlight object with the retained loci. The script works with both genlight objects containing SNP genotypes and Tag P/A data (SilicoDArT).

Usage

gl.keep.loc(x, loc.list = NULL, first = NULL, last = NULL, verbose = NULL)

Arguments

x

Name of the genlight object [required].

loc.list

A list of loci to be kept [required, if first and last are not specified].

first

First of a range of loci to be kept [required, if loc.list not specified].

last

Last of a range of loci to be kept [if not specified, last locus in the dataset].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress but not results; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

A genlight object with the reduced data

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# SNP data
gl2 <- gl.keep.loc(testset.gl, loc.list=c('100051468|42-A/T', '100049816-51-A/G'))
# Tag P/A data
gs2 <- gl.keep.loc(testset.gs, loc.list=c('20134188','19249144'))

Removes all but the specified populations from a dartR genlight object

Description

Individuals are assigned to populations based on associated specimen metadata stored in the dartR genlight object.

This script deletes all individuals apart from those in listed populations (pop.list). Monomorphic loci and loci that are scored all NA are optionally deleted (mono.rm=TRUE). The script also optionally recalculates locus metadata statistics to accommodate the deletion of individuals from the dataset (recalc=TRUE).
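The interplay of keeping populations and removing monomorphic or all-NA loci can be sketched on a plain matrix. This is a base R illustration with hypothetical data; keep_pop is not a package function, and treating a locus as monomorphic when all scored genotypes are 0 or all are 2 is an assumption of this sketch.

```r
# Toy genotypes: rows = individuals, columns = loci (0, 1, 2; NA = missing)
geno <- matrix(c(0, 2, 1, NA,
                 0, 2, 2, NA,
                 0, 0, 1, NA,
                 0, 1, 0, 2),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("ind", 1:4), paste0("loc", 1:4)))
pop <- factor(c("A", "A", "B", "C"))

keep_pop <- function(g, pop, pop.list, mono.rm = FALSE) {
  # retain only individuals from the listed populations
  g <- g[pop %in% pop.list, , drop = FALSE]
  if (mono.rm) {
    is_mono <- apply(g, 2, function(v) {
      v <- v[!is.na(v)]
      # all NA, or a single allele present in every scored individual
      length(v) == 0 || all(v == 0) || all(v == 2)
    })
    g <- g[, !is_mono, drop = FALSE]
  }
  g
}

reduced <- keep_pop(geno, pop, pop.list = c("A", "B"), mono.rm = TRUE)
reduced
```

After dropping population C, loc1 becomes monomorphic and loc4 is scored all NA, so only loc2 and loc3 remain. This is why loci can need removing after a deletion of individuals even if they were informative in the full dataset.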

Usage

gl.keep.pop(  x,  pop.list,  as.pop = NULL,  recalc = FALSE,  mono.rm = FALSE,  verbose = NULL)

Arguments

x

Name of the genlight object [required].

pop.list

List of populations to be retained [required].

as.pop

Temporarily assign another individual metric as the population for the purposes of deletions [default NULL].

recalc

If TRUE, recalculate the locus metadata statistics [default FALSE].

mono.rm

If TRUE, remove monomorphic and all NA loci [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress but not results; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

A reduced dartR genlight object

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# SNP data
gl2 <- gl.keep.pop(testset.gl, pop.list=c('EmsubRopeMata', 'EmvicVictJasp'))
gl2 <- gl.keep.pop(testset.gl, pop.list=c('EmsubRopeMata', 'EmvicVictJasp'), mono.rm=TRUE, recalc=TRUE)
gl2 <- gl.keep.pop(testset.gl, pop.list=c('Female'), as.pop='sex')
# Tag P/A data
gs2 <- gl.keep.pop(testset.gs, pop.list=c('EmsubRopeMata','EmvicVictJasp'))

Plots linkage disequilibrium against distance by population

Description

The function creates a plot showing the pairwise LD measure against distance in number of base pairs pooled over all the chromosomes and a red line representing the threshold (R.squared = 0.2) that is commonly used to imply that two loci are unlinked (Delourme et al., 2013; Li et al., 2014).
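The idea of summarising pairwise LD against distance can be sketched in base R. The data below are simulated and hypothetical, and the squared genotype correlation is used here as a simple stand-in for R.squared; the package relies on its own LD report, so this is an illustration of the binning logic only.

```r
set.seed(1)
# Hypothetical data: 50 individuals, 6 SNPs on one chromosome (positions in bp)
pos  <- c(1000, 20000, 150000, 160000, 400000, 900000)
geno <- matrix(rbinom(50 * 6, size = 2, prob = 0.4), nrow = 50)

r2 <- cor(geno)^2             # pairwise squared genotype correlation
bp <- as.matrix(dist(pos))    # pairwise distance in base pairs
pairs <- data.frame(r2 = r2[lower.tri(r2)], bp = bp[lower.tri(bp)])

# pool pairs into distance bins of width ld_resolution and average LD per bin
ld_resolution <- 1e5
pairs$bin <- floor(pairs$bp / ld_resolution) * ld_resolution
ld_by_bin <- aggregate(r2 ~ bin, data = pairs, FUN = mean)

# flag bins whose mean LD falls below the 0.2 threshold mentioned above
ld_by_bin$below_threshold <- ld_by_bin$r2 < 0.2
ld_by_bin
```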

Usage

gl.ld.distance(  ld_report,  ld_resolution = 1e+05,  pop_colors = NULL,  plot_theme = NULL,  plot.out = TRUE,  save2tmp = FALSE,  plot_title = " ",  verbose = NULL)

Arguments

ld_report

Output from function gl.report.ld.map [required].

ld_resolution

Resolution at which LD should be reported in number of base pairs [default 100000].

pop_colors

A color palette for box plots by population or a list with as many colors as there are populations in the dataset [default NULL].

plot_theme

User specified theme [default NULL].

plot.out

Specify if plot is to be produced [default TRUE].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

plot_title

Title of the plot [default " "].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

A dataframe with information of LD against distance by population.

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

References

Delourme, R., Falentin, C., Fomeju, B. F., Boillot, M., Lassalle, G., André, I., . . . Marty, A. (2013). High-density SNP-based genetic map development and linkage disequilibrium assessment in Brassica napus L. BMC genomics, 14(1), 120.
Li, X., Han, Y., Wei, Y., Acharya, A., Farmer, A. D., Ho, J., . . . Brummer, E. C. (2014). Development of an alfalfa SNP array and its use to evaluate patterns of population structure and linkage disequilibrium. PLoS One, 9(1), e84329.

Examples

if ((requireNamespace("snpStats", quietly = TRUE)) & (requireNamespace("fields", quietly = TRUE))) {
  require("dartR.data")
  x <- platypus.gl
  x <- gl.filter.callrate(x, threshold = 1)
  x <- gl.filter.monomorphs(x)
  x$position <- x$other$loc.metrics$ChromPos_Platypus_Chrom_NCBIv1
  x$chromosome <- as.factor(x$other$loc.metrics$Chrom_Platypus_Chrom_NCBIv1)
  ld_res <- gl.report.ld.map(x, ld_max_pairwise = 10000000)
  ld_res_2 <- gl.ld.distance(ld_res, ld_resolution = 1000000)
}

Visualize patterns of linkage disequilibrium and identification of haplotypes

Description

This function plots a linkage disequilibrium (LD) heatmap, where the colour shading indicates the strength of LD. Chromosome positions (Mbp) are shown on the horizontal axis, and haplotypes appear as triangles delimited by dark yellow vertical lines. Numbers identifying each haplotype are shown in the upper part of the plot.

The heatmap also shows heterozygosity for each SNP.

The function identifies haplotypes based on contiguous SNPs that are in linkage disequilibrium, using ld_threshold_haplo as the threshold, and containing more than min_snps SNPs.
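The haplotype rule described above (runs of contiguous SNPs whose adjacent LD meets the threshold, kept only if the run contains more than min_snps SNPs) can be sketched with rle() in base R. The LD values and the helper call_blocks are hypothetical, and this is an illustration of the rule rather than the package algorithm.

```r
# Hypothetical LD (e.g. R.squared) between each pair of adjacent SNPs
# along a chromosome: 10 values describe 11 SNPs
adj_ld <- c(0.9, 0.8, 0.7, 0.1, 0.6, 0.2, 0.9, 0.95, 0.8, 0.85)

call_blocks <- function(adj_ld, threshold, min_snps) {
  runs   <- rle(adj_ld >= threshold)
  ends   <- cumsum(runs$lengths)        # last adjacent pair of each run
  starts <- ends - runs$lengths + 1     # first adjacent pair of each run
  # a run of k linked adjacent pairs spans k + 1 SNPs
  blocks <- data.frame(first_snp = starts,
                       last_snp  = ends + 1,
                       n_snps    = runs$lengths + 1)[runs$values, ]
  # keep blocks containing more than min_snps SNPs
  blocks[blocks$n_snps > min_snps, ]
}

blocks <- call_blocks(adj_ld, threshold = 0.5, min_snps = 3)
blocks
```

With these toy values two blocks are called, SNPs 1 to 4 and SNPs 7 to 11; the short run formed by SNPs 5 and 6 is dropped because it does not contain more than min_snps SNPs.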

Usage

gl.ld.haplotype(  x,  pop_name = NULL,  chrom_name = NULL,  ld_max_pairwise = 1e+07,  maf = 0.05,  ld_stat = "R.squared",  ind.limit = 10,  min_snps = 10,  ld_threshold_haplo = 0.5,  coordinates = NULL,  color_haplo = "viridis",  color_het = "deeppink",  plot.out = TRUE,  save2tmp = FALSE,  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

pop_name

Name of the population to analyse. If NULL all the populations are analysed [default NULL].

chrom_name

Name of the chromosome to analyse. If NULL all the chromosomes are analysed [default NULL].

ld_max_pairwise

Maximum distance in number of base pairs at which LD should be calculated [default 10000000].

maf

Minor allele frequency (by population) threshold to filter out loci. If a value > 1 is provided it will be interpreted as MAC (i.e. the minimum number of times an allele needs to be observed) [default 0.05].

ld_stat

The LD measure to be calculated: "LLR", "OR", "Q", "Covar", "D.prime", "R.squared", and "R". See ld (package snpStats) for details [default "R.squared"].

ind.limit

Minimum number of individuals that a population should contain for it to be taken into account when reporting loci in LD [default 10].

min_snps

Minimum number of SNPs that a haplotype must contain to be called [default 10].

ld_threshold_haplo

Minimum LD between adjacent SNPs to call a haplotype [default 0.5].

coordinates

A vector of two elements with the start and end coordinates in base pairs to which the analysis is restricted, e.g. c(1,1000000) [default NULL].

color_haplo

Color palette for haplotype plot. See details [default "viridis"].

color_het

Color for heterozygosity [default "deeppink"].

plot.out

Specify if heatmap plot is to be produced [default TRUE].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

The information for each SNP's position should be stored in the genlight accessor "@position" and the SNP's chromosome name in the accessor "@chromosome" (see examples). The function will then calculate LD within each chromosome.

The output of the function includes a table with the haplotypes that were identified and their location.

Colors of the heatmap (color_haplo) are based on the function scale_fill_viridis from package viridis. Other color palette options are "magma", "inferno", "plasma", "viridis", "cividis", "rocket", "mako" and "turbo".

Value

A table with the haplotypes that were identified.

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Examples

require("dartR.data")
x <- platypus.gl
x <- gl.filter.callrate(x, threshold = 1)
x <- gl.keep.pop(x, pop.list = "TENTERFIELD")
x$chromosome <- as.factor(x$other$loc.metrics$Chrom_Platypus_Chrom_NCBIv1)
x$position <- x$other$loc.metrics$ChromPos_Platypus_Chrom_NCBIv1
ld_res <- gl.ld.haplotype(x, chrom_name = "NC_041728.1_chromosome_1", ld_max_pairwise = 10000000)

Prints dartR reports saved in tempdir

Description

Prints dartR reports saved in tempdir

Usage

gl.list.reports()

Value

Prints a table with all reports saved in tempdir. Currently the style cannot be changed.

Author(s)

Bernd Gruber & Luis Mijangos (bugs? Post to https://groups.google.com/d/forum/dartr)

Examples

## Not run: 
gl.report.callrate(testset.gl, save2tmp=TRUE)
gl.list.reports()
## End(Not run)

Loads an object from compressed binary format produced by gl.save()

Description

This is a wrapper for readRDS()

Usage

gl.load(file, verbose = NULL)

Arguments

file

Name of the file from which the binary version of the object is read [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

The script loads the object from the file and returns the gl object.
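Because the function is described as a wrapper for readRDS(), the underlying round trip can be shown in base R alone. The object and file name below are hypothetical; the sketch uses saveRDS() and readRDS() directly rather than the package functions.

```r
# Hypothetical object standing in for a genlight object
obj <- list(genotypes = matrix(c(0, 1, 2, NA), nrow = 2),
            pop = factor(c("A", "B")))

path <- file.path(tempdir(), "toy_object.rds")
saveRDS(obj, path)           # write a compressed binary file
restored <- readRDS(path)    # read it back into a new object

identical(obj, restored)
```

Unlike load(), readRDS() returns the object instead of restoring it under its original name, which is why the result is assigned to a variable.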

Value

The loaded object

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

gl.save(testset.gl, file.path(tempdir(),'testset.rds'))
gl <- gl.load(file.path(tempdir(),'testset.rds'))

Creates a proforma recode_ind file for reassigning individual(=specimen) names

Description

Renaming individuals may be required when there have been errors in labeling arising in the process from sample to sequencing files. There may be occasions where renaming individuals is required for preparation of figures.

Usage

gl.make.recode.ind(  x,  out.recode.file = "default_recode_ind.csv",  outpath = tempdir(),  verbose = NULL)

Arguments

x

Name of the genlight object [required].

out.recode.file

File name of the output file (including extension) [default default_recode_ind.csv].

outpath

Path where to save the output file [default tempdir(), mandated by CRAN].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

This function facilitates the construction of a recode table by producing a proforma file with current individual (=specimen) names in two identical columns. Edit the second column to reassign individual names. Use keyword 'Delete' to delete an individual.

Use outpath=getwd() when calling this function to direct output files to your working directory.

The function works with both genlight objects containing SNP genotypes and Tag P/A data (SilicoDArT).

Apply the recoding using gl.recode.ind().
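The proforma workflow described above (two identical columns, edit the second, 'Delete' removes an entry) can be sketched in base R. The individual names, file name and headerless csv layout are assumptions of this sketch, and the editing step is simulated in code rather than done by hand.

```r
# Hypothetical individual names
ind_names <- c("AA019073", "AA004859", "AA020656")

# proforma table: current names in two identical columns
proforma <- data.frame(old = ind_names, new = ind_names)
path <- file.path(tempdir(), "toy_recode_ind.csv")
# written without a header row; this layout is an assumption of the sketch
write.table(proforma, path, sep = ",", row.names = FALSE, col.names = FALSE)

# read the table back and simulate editing the second column
recode <- read.csv(path, header = FALSE, col.names = c("old", "new"),
                   stringsAsFactors = FALSE)
recode$new[recode$old == "AA004859"] <- "Delete"
recode$new[recode$old == "AA020656"] <- "AA020656_rerun"

# entries marked 'Delete' are dropped, the rest take their new names
new_names <- recode$new[recode$new != "Delete"]
new_names
```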

Value

A vector containing the new individual names.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

result <- gl.make.recode.ind(testset.gl, out.recode.file ='Emmac_recode_ind.csv',outpath=tempdir())

Creates a proforma recode_pop_table file for reassigning population names

Description

Renaming populations may be required when there have been errors in assignment arising in the process from sample to sequence files or when one wishes to amalgamate populations, or delete populations. Recoding populations can also be done with a recode table (csv).

Usage

gl.make.recode.pop(  x,  out.recode.file = "recode_pop_table.csv",  outpath = tempdir(),  verbose = NULL)

Arguments

x

Name of the genlight object [required].

out.recode.file

File name of the output file (including extension) [default recode_pop_table.csv].

outpath

Path where to save the output file [default tempdir(), mandated by CRAN].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

This function facilitates the construction of a recode table by producing a proforma file with current population names in two identical columns. Edit the second column to reassign populations. Use keyword 'Delete' to delete a population.

Use outpath=getwd() when calling this function to direct output files to your working directory.

The function works with both genlight objects containing SNP genotypes and Tag P/A data (SilicoDArT).

Apply the recoding using gl.recode.pop().

Value

A vector containing the new population names.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

result <- gl.make.recode.pop(testset.gl,out.recode.file='test.csv',outpath=tempdir(),verbose=2)

Creates an interactive map (based on latlon) from a genlight object

Description

Creates an interactive map (based on latlon) from a genlight object

Usage

gl.map.interactive(  x,  matrix = NULL,  standard = TRUE,  symmetric = TRUE,  pop.labels = TRUE,  pop.labels.cex = 12,  ind.circles = TRUE,  ind.circle.cols = NULL,  ind.circle.cex = 10,  ind.circle.transparency = 0.8,  palette_links = NULL,  leg_title = NULL,  provider = "Esri.NatGeoWorldMap",  verbose = NULL)

Arguments

x

A genlight object (including coordinates within the latlon slot) [required].

matrix

A distance matrix between populations or individuals. The matrix is visualised as lines between individuals/populations. If the matrix is asymmetric, two lines with arrows are plotted [default NULL].

standard

If a matrix is provided, line width will be standardised to be between 1 and 10 if set to TRUE, otherwise taken as given [default TRUE].

symmetric

If a symmetric matrix is provided, only one line is drawn based on the lower triangle of the matrix. If set to FALSE, arrows indicating the direction are used instead [default TRUE].

pop.labels

Population labels at the center of the individuals of populations [default TRUE].

pop.labels.cex

Size of population labels [default 12].

ind.circles

Should individuals be plotted as circles [default TRUE].

ind.circle.cols

Colors of circles. Colors can be provided as usual by names (e.g. "black") and are recycled. So a color c("blue","red") colors individuals alternately between blue and red using the genlight object order of individuals. For transparency see parameter ind.circle.transparency. Defaults to rainbow colors by population if not provided. If you want to have your own colors for each population, check the platypus.gl example below.

ind.circle.cex

Size of circles in pixels [default 10].

ind.circle.transparency

Transparency of circles between 0=invisible and 1=no transparency. Defaults to 0.8.

palette_links

Color palette for the links in case a matrix is provided [default NULL].

leg_title

Legend's title for the links in case a matrix is provided [default NULL].

provider

Passed to leaflet [default "Esri.NatGeoWorldMap"].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

A wrapper around the leaflet package. For possible background maps, as specified via the provider, check: http://leaflet-extras.github.io/leaflet-providers/preview/index.html

The palette_links argument can be any of the following:

A character vector of RGB or named colors. Examples: palette(), c("#000000", "#0000FF", "#FFFFFF"), topo.colors(10)

The name of an RColorBrewer palette, e.g. "BuPu" or "Greens".

The full name of a viridis palette: "viridis", "magma", "inferno", or "plasma".

A function that receives a single value between 0 and 1 and returns a color. Examples: colorRamp(c("#000000", "#FFFFFF"), interpolate = "spline").
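How a link matrix might be turned into line widths and colors can be sketched in base R. The matrix is hypothetical, and the min-max rescaling to the range 1 to 10 is an assumption about what standardising means here, not a statement of the formula used internally; the palette uses colorRamp() with its default linear interpolation.

```r
# Hypothetical symmetric distance matrix between three populations
m <- matrix(c(0,    0.05, 0.20,
              0.05, 0,    0.10,
              0.20, 0.10, 0), nrow = 3,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# for a symmetric matrix one value per link is taken from the lower triangle
vals <- m[lower.tri(m)]

# min-max rescaling of link values to widths between 1 and 10
rescale <- function(v, lo = 1, hi = 10) {
  lo + (v - min(v)) / (max(v) - min(v)) * (hi - lo)
}
widths <- rescale(vals)

# a palette given as a function mapping values in [0, 1] to colors
pal <- colorRamp(c("#000000", "#FFFFFF"))
unit <- (vals - min(vals)) / diff(range(vals))
link_cols <- rgb(pal(unit), maxColorValue = 255)

data.frame(value = vals, width = widths, color = link_cols)
```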

Value

Plots a map.

Author(s)

Bernd Gruber – Post to https://groups.google.com/d/forum/dartr

Examples

require("dartR.data")
gl.map.interactive(bandicoot.gl)
cols <- c("red","blue","yellow")[as.numeric(pop(platypus.gl))]
gl.map.interactive(platypus.gl, ind.circle.cols=cols, ind.circle.cex=10, ind.circle.transparency=0.5)

Maps a STRUCTURE plot using a genlight object

Description

This function takes the output of plotstructure (the Q-matrix) and maps it using the population centers from the genlight object that was used to run the structure analysis (via gl.run.structure). It plots the typical structure bar plots on a spatial map, providing a barplot for each subpopulation. Therefore it requires coordinates from a genlight object. This kind of plot should support the interpretation of the spatial structure of a population, but in principle is not different from gl.plot.structure.

Usage

gl.map.structure(  qmat,  x,  K,  provider = "Esri.NatGeoWorldMap",  scalex = 1,  scaley = 1,  movepops = NULL,  pop.labels = TRUE,  pop.labels.cex = 12)

Arguments

qmat

Q-matrix from a structure run followed by a clumpp run object [from gl.run.structure and gl.plot.structure] [required].

x

Name of the genlight object containing the coordinates in the @other$latlon slot to calculate the population centers [required].

K

The number for K to be plotted [required].

provider

Provider passed to leaflet. Check providers for a list of possible backgrounds [default "Esri.NatGeoWorldMap"].

scalex

Scaling factor to determine the size of the bars in x direction [default 1].

scaley

Scaling factor to determine the size of the bars in y direction [default 1].

movepops

A two-dimensional data frame that allows the centers of the barplots to be moved manually in case they overlap, which often happens if populations are horizontally close to each other. This needs to be a data.frame of the dimensions [rows = number of populations, columns = 2 (lon/lat)]. For each population you have to specify the x and y (lon and lat) units by which you want to move the center of the plot (see example for details) [default NULL].

pop.labels

Switch for population labels below the barplots [default TRUE].

pop.labels.cex

Size of population labels [default 12].

Details

Creates a mapped version of structure plots. For possible background maps, as specified via the provider, check: http://leaflet-extras.github.io/leaflet-providers/preview/index.html. You may need to adjust scalex and scaley values [default 1], as the size depends on the scale of the map and the position of the populations.
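The population centers and the movepops offsets can be illustrated in base R. The coordinates and population labels are hypothetical, taking the mean latitude and longitude as the center is an assumption of this sketch, and the offsets are simply added to the centers in population order.

```r
# Hypothetical coordinates and population labels for six individuals
latlon <- data.frame(lat = c(-35.0, -35.2, -34.0, -34.4, -33.0, -33.2),
                     lon = c(149.0, 149.2, 150.0, 150.2, 151.0, 151.4))
pop <- factor(c("P1", "P1", "P2", "P2", "P3", "P3"))

# population centers: mean latitude and longitude per population
centres <- aggregate(latlon, by = list(pop = pop), FUN = mean)

# one row of offsets per population, in the same order as the centers
movepops <- data.frame(lon = c(0, 0.5, 0), lat = c(-0.3, 0, 0))

# shifted plotting positions for the barplots
moved <- centres
moved$lon <- centres$lon + movepops$lon
moved$lat <- centres$lat + movepops$lat
moved
```

Rows of zeros leave a population where it is, so only the populations whose barplots overlap need non-zero offsets.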

Value

An interactive map that shows the structure plots broken down by population.

Returns the map and a list of the qmat split into sorted matrices per population. This can be used to create your own map.

Author(s)

Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

References

Pritchard, J.K., Stephens, M., Donnelly, P. (2000) Inference of population structure using multilocus genotype data. Genetics 155, 945-959.
Archer, F. I., Adams, P. E. and Schneiders, B. B. (2016) strataG: An R package for manipulating, summarizing and analysing population genetic data. Mol Ecol Resour. doi:10.1111/1755-0998.12559
Evanno, G., Regnaut, S., and J. Goudet. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology 14:2611-2620.
Mattias Jakobsson and Noah A. Rosenberg. 2007. CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23(14):1801-1806. Available at clumpp

Examples

## Not run: 
#bc <- bandicoot.gl[,1:100]
#sr <- gl.run.structure(bc, k.range = 2:5, num.k.rep = 3, exec = './structure.exe')
#ev <- gl.evanno(sr)
#ev
#qmat <- gl.plot.structure(sr, k=2:4)
#head(qmat)
#gl.map.structure(qmat, bc,K=3)
#gl.map.structure(qmat, bc,K=4)
#move population 4 (out of 5) 0.5 degrees to the right and populations 1
#0.3 degree to the north of the map.
#mp <- data.frame(lon=c(0,0,0,0.5,0), lat=c(-0.3,0,0,0,0))
#gl.map.structure(qmat, bc,K=4, movepops=mp)
## End(Not run)

Merges two or more populations in a genlight object into one population

Description

Individuals are assigned to populations based on the specimen metadata file (csv) used with gl.read.dart().

This script assigns individuals from two or more nominated populations into a new single population. It can also be used to rename populations.

The script returns a genlight object with the new population assignments.
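Merging populations amounts to relabelling levels of the population factor. The sketch below shows this in base R on a hypothetical vector of population labels ('EmmacBurnBara' is an illustrative name); it is an analogy for the operation, not the package code.

```r
# Hypothetical population assignments for four individuals
pop <- factor(c("EmsubRopeMata", "EmvicVictJasp", "EmmacBurnBara", "EmvicVictJasp"))
old <- c("EmsubRopeMata", "EmvicVictJasp")
new <- "Outgroup"

merged <- pop
# assigning the same name to several levels merges them into one level
levels(merged)[levels(merged) %in% old] <- new

table(merged)
```

Passing a single name in old renames that population, which is the renaming use mentioned above.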

Usage

gl.merge.pop(x, old = NULL, new = NULL, verbose = NULL)

Arguments

x

Name of the genlight object containing SNP genotypes [required].

old

A list of populations to be merged [required].

new

Name of the new population [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

A genlight object with the new population assignments.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

   gl <- gl.merge.pop(testset.gl, old=c('EmsubRopeMata','EmvicVictJasp'), new='Outgroup')

Creates an input file for the program NewHybrids and runs it if NewHybrids is installed

Description

This function compares two sets of parental populations to identify loci that exhibit a fixed difference, returns a genlight object with the reduced data, and creates an input file for the program NewHybrids using the top 200 (or user-specified lower loc.limit) loci. In the absence of two identified parental populations, the script will select a random set of up to 200 loci only (method='random') or up to the first 200 loci ranked on information content (method='AvgPIC').

A fixed difference occurs when a SNP allele is present in all individuals of one population and absent in the other. There is provision for setting a level of tolerance, e.g. threshold = 0.05, which considers alleles present at greater than 95% in one population and less than 5% in the other to be a fixed difference. Only up to 200 loci are retained, because of limitations of NewHybrids.
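The definition of a fixed difference with a tolerance can be sketched in base R. This is a minimal illustration only, assuming genotypes are coded as counts of the alternate allele (0, 1, 2) with individuals in rows and loci in columns; the toy data and the helper names alt_freq and is_fixed are hypothetical and are not part of dartR.

```r
# Toy genotypes: 6 individuals x 4 loci, coded as alternate-allele counts
geno <- matrix(c(0, 0, 2, 1,
                 0, 0, 2, 0,
                 0, 1, 2, 1,
                 2, 2, 0, 1,
                 2, 2, 0, 2,
                 2, 2, 0, 1),
               nrow = 6, byrow = TRUE,
               dimnames = list(NULL, paste0("loc", 1:4)))
pop <- c("p0", "p0", "p0", "p1", "p1", "p1")

# Alternate-allele frequency per locus within one population
alt_freq <- function(g) colMeans(g, na.rm = TRUE) / 2

f0 <- alt_freq(geno[pop == "p0", , drop = FALSE])
f1 <- alt_freq(geno[pop == "p1", , drop = FALSE])

# A locus is 'fixed' if the allele is (nearly) absent in one population
# and (nearly) universal in the other, within the given tolerance
is_fixed <- function(f0, f1, threshold = 0) {
  (f0 <= threshold & f1 >= 1 - threshold) |
    (f0 >= 1 - threshold & f1 <= threshold)
}

fixed_strict  <- is_fixed(f0, f1, threshold = 0)
fixed_lenient <- is_fixed(f0, f1, threshold = 0.2)
print(rbind(f0, f1, fixed_strict, fixed_lenient))
```

In this toy data, loc2 is not fixed under threshold = 0 because one p0 individual carries the alternate allele, but it qualifies once the tolerance is relaxed to 0.2.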

If you specify a directory for the NewHybrids executable file, then the script will create the input file from the SNP data then run NewHybrids. If the directory is set to NULL, the execution will stop once the input file (default='nhyb.txt') has been written to disk. Note: the executable option will not work on a Mac; Mac users should generate the NewHybrids input file and run this on their local installation of NewHybrids.

Refer to the New Hybrids manual for further information on the parameters to set: http://ib.berkeley.edu/labs/slatkin/eriq/software/new_hybs_doc1_1Beta3.pdf

It is important to stringently filter the data on RepAvg and CallRate if using the random option. One might elect to repeat the analysis (method='random') and combine the resultant posterior probabilities should the maximum of 200 loci be considered insufficient.

The F1 individuals should be heterozygous at all loci for which the parental populations are fixed and different, assuming parental populations have been specified. Sampling errors can result in this not being the case, especially where the sample sizes for the parental populations are small. Alternatively, the threshold for posterior probabilities used to determine assignment (pprob) or the definition of a fixed difference (threshold) may be too lax. To assess the error rate in the determination of assignment of F1 individuals, a plot of the frequency of homozygous reference, heterozygotes and homozygous alternate (SNP) can be produced by setting plot=TRUE (the default).

Usage

gl.nhybrids(
  gl,
  outpath = tempdir(),
  p0 = NULL,
  p1 = NULL,
  threshold = 0,
  method = "random",
  loc.limit = 200,
  plot = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  pprob = 0.95,
  nhyb.directory = NULL,
  BurnIn = 10000,
  sweeps = 10000,
  GtypFile = "TwoGensGtypFreq.txt",
  AFPriorFile = NULL,
  PiPrior = "Jeffreys",
  ThetaPrior = "Jeffreys",
  verbose = NULL
)

Arguments

gl

Name of the genlight object containing the SNP data [required].

outpath

Path where to save the output file [default tempdir()].

p0

List of populations to be regarded as parental population 0 [default NULL].

p1

List of populations to be regarded as parental population 1 [default NULL].

threshold

Sets the level at which a gene frequency difference is considered to be fixed [default 0].

method

Specifies the method (random or AvgPIC) to select 200 loci for NewHybrids [default random].

loc.limit

Specifies the number of loci to use in the analysis [default 200].

plot

If TRUE, a plot of the frequency of homozygous reference, heterozygotes and homozygous alternate (SNP) is produced for the F1 individuals [default TRUE, applies only if both parental populations are specified].

plot_theme

User specified theme [default theme_dartR()].

plot_colors

Vector with two color names for the borders and fill [default two_colors].

pprob

Threshold level for assignment to likelihood bins [default 0.95, used only if plot=TRUE].

nhyb.directory

Directory that holds the NewHybrids executable file, e.g. C:/NewHybsPC [default NULL].

BurnIn

Number of sweeps to use in the burn in [default 10000].

sweeps

Number of sweeps to use in computing the actual Monte Carlo averages [default 10000].

GtypFile

Name of a file containing the genotype frequency classes [default TwoGensGtypFreq.txt].

AFPriorFile

Name of the file containing prior allele frequency information [default NULL].

PiPrior

Jeffreys-like priors or Uniform priors for the parameter pi [default Jeffreys].

ThetaPrior

Jeffreys-like priors or Uniform priors for the parameter theta [default Jeffreys].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

The reduced genlight object, if parentals are provided; output of NewHybrids is saved to the working directory.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

References

Anderson, E.C. and Thompson, E.A. (2002). A model-based method for identifying species hybrids using multilocus genetic data. Genetics. 160:1217-1229.

Examples

## Not run: 
m <- gl.nhybrids(testset.gl,
p0=NULL, p1=NULL,
nhyb.directory='D:/workspace/R/NewHybsPC', # Specify as necessary
outpath="D:/workspace",  # Specify as necessary, usually getwd() [= workspace]
BurnIn=100,
sweeps=100,
verbose=3)
## End(Not run)

Identifies loci under selection per population using the outflank method of Whitlock and Lotterhos (2015)

Description

Identifies loci under selection per population using the outflank method of Whitlock and Lotterhos (2015)

Usage

gl.outflank(
  gi,
  plot = TRUE,
  LeftTrimFraction = 0.05,
  RightTrimFraction = 0.05,
  Hmin = 0.1,
  qthreshold = 0.05,
  ...
)

Arguments

gi

A genlight or genind object, with a defined population structure [required].

plot

A switch if a barplot is wanted [default TRUE].

LeftTrimFraction

The proportion of loci that are trimmed from the lower end of the range of Fst before the likelihood function is applied [default 0.05].

RightTrimFraction

The proportion of loci that are trimmed from the upper end of the range of Fst before the likelihood function is applied [default 0.05].

Hmin

The minimum heterozygosity required before including calculations from a locus [default 0.1].

qthreshold

The desired false discovery rate threshold for calculating q-values [default 0.05].

...

Additional parameters (see documentation of outflank on github).

Details

This function is a wrapper around the outflank function provided by Whitlock and Lotterhos. To be able to run this function, the packages qvalue (from bioconductor) and outflank (from github) need to be installed. To do so see example below.

Value

Returns an index of outliers and the full outflank list

References

Whitlock, M.C. and Lotterhos K.J. (2015) Reliable detection of loci responsible for local adaptation: inference of a neutral model through trimming the distribution of Fst. The American Naturalist 186: 24 - 36.

Github repository: Whitlock & Lotterhos: https://github.com/whitlock/OutFLANK (Check the readme.pdf within the repository for an explanation. Be aware you now can run OutFLANK from a genlight object)

Examples

gl.outflank(bandicoot.gl, plot = TRUE)

Ordination applied to genotypes in a genlight object (PCA), in an fd object, or to a distance matrix (PCoA)

Description

This function takes the genotypes for individuals and undertakes a Pearson Principal Component analysis (PCA) on SNP or Tag P/A (SilicoDArT) data; it undertakes a Gower Principal Coordinate analysis (PCoA) if supplied with a distance matrix. Technically, any distance matrix can be represented in an ordinated space using PCoA.

Usage

gl.pcoa(
  x,
  nfactors = 5,
  correction = NULL,
  mono.rm = TRUE,
  parallel = FALSE,
  n.cores = 16,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object or fd object containing the SNP data, or a distance matrix of type dist [required].

nfactors

Number of axes to retain in the output of factor scores [default 5].

correction

Method applied to correct for negative eigenvalues, either 'lingoes' or 'cailliez' [default NULL].

mono.rm

If TRUE, remove monomorphic loci [default TRUE].

parallel

TRUE if parallel processing is required (does fail under Windows) [default FALSE].

n.cores

Number of cores to use if parallel processing is requested [default 16].

plot.out

If TRUE, a diagnostic plot is displayed showing a scree plot for the "informative" axes and a histogram of eigenvalues of the remaining "noise" axes [default TRUE].

plot_theme

Theme for the plot. See Details for options [default theme_dartR()].

plot_colors

List of two color names for the borders and fill of the plot [default two_colors].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

The function is essentially a wrapper for glPca {adegenet} or pcoa {ape} with default settings apart from those specified as parameters in this function.

Sources of stress in the visual representation

While, technically, any distance matrix can be represented in an ordinated space, the representation will not typically be exact. There are three major sources of stress in a reduced-representation of distances or dissimilarities among entities using PCA or PCoA. By far the greatest source comes from the decision to select only the top two or three axes from the ordinated set of axes derived from the PCA or PCoA. The representation of the entities in such a heavily reduced space will not faithfully represent the distances in the input distance matrix simply because of the loss of information in deeper informative dimensions. For this reason, it is not sensible to be too precious about managing the other two sources of stress in the visual representation.

The measure of distance between entities in a PCA is the Pearson Correlation Coefficient, essentially a standardized Euclidean distance. This is both a metric distance and a Euclidean distance. In PCoA, the second source of stress is the choice of distance measure or dissimilarity measure. While any distance or dissimilarity matrix can be represented in an ordinated space, the distances between entities can be faithfully represented in that space (that is, without stress) only if the distances are metric. Furthermore, for distances between entities to be faithfully represented in a rigid Cartesian space, the distance measure needs to be Euclidean. If this is not the case, the distances between the entities in the ordinated visualized space will not exactly represent the distances in the input matrix (stress will be non-zero). This source of stress will be evident as negative eigenvalues in the deeper dimensions.

A third source of stress arises from having a sparse dataset, one with missing values. This affects both PCA and PCoA. If the original data matrix is not fully populated, that is, if there are missing values, then even a Euclidean distance matrix will not necessarily be 'positive definite'. It follows that some of the eigenvalues may be negative, even though the distance metric is Euclidean. This issue is exacerbated when the number of loci greatly exceeds the number of individuals, as is typically the case when working with SNP data. The impact of missing values can be minimized by stringently filtering on Call Rate, albeit with loss of data. An alternative is given in a paper 'Honey, I shrunk the sample covariance matrix' and more recently by Ledoit and Wolf (2018), but their approach has not been implemented here.

The good news is that, unless the sum of the negative eigenvalues, arising from a non-Euclidean distance measure or from missing values, approaches those of the final PCA or PCoA axes to be displayed, the distortion is probably of no practical consequence and certainly not comparable to the stress arising from selecting only two or three final dimensions out of several informative dimensions for the visual representation.
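The effect of a non-Euclidean distance on the eigenvalues can be reproduced with a small base-R sketch. It uses stats::cmdscale as a stand-in for the PCoA step (gl.pcoa itself wraps pcoa {ape}); the four-point distance matrix is a constructed toy example, and the variable names are illustrative. Three points sit 2 units apart, and a fourth is placed 1.1 units from each of them, which satisfies the triangle inequality but cannot be embedded in Euclidean space.

```r
# Toy distance matrix: metric, but not Euclidean
m <- matrix(2, nrow = 4, ncol = 4)
m[4, 1:3] <- 1.1
m[1:3, 4] <- 1.1
diag(m) <- 0
D <- as.dist(m)

# Classical PCoA; eig = TRUE returns all eigenvalues, including negative ones
fit <- cmdscale(D, k = 2, eig = TRUE)
eig <- fit$eig
print(round(eig, 4))

# Compare the summed magnitude of negative eigenvalues with the smallest
# eigenvalue among the two axes that would be displayed
neg_ratio <- sum(abs(eig[eig < 0])) / min(eig[1:2])
print(neg_ratio)

# add = TRUE asks cmdscale to add a constant to the off-diagonal distances
# so that they become Euclidean; eigenvalues should then be non-negative
# up to rounding error
fit_corrected <- cmdscale(D, k = 2, eig = TRUE, add = TRUE)
print(round(fit_corrected$eig, 4))
```

Here the negative eigenvalue is small relative to the two displayed axes, which is the situation described above as having no practical consequence.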

Function's output

Two diagnostic plots are produced. The first is a Scree Plot, showing the percentage variation explained by each of the PCA or PCoA axes, for those axes that explain more than the original variables (loci) on average. That is, only informative axes are displayed. The scree plot informs the number of dimensions to be retained in the visual summaries. As a rule of thumb, axes with more than 10% of variation explained should be included.

The second graph shows the distribution of eigenvalues for the remaining uninformative (noise) axes, including those with negative eigenvalues.
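The split into informative and noise axes can be sketched in base R. This is one reading of the criterion above, under the assumption that an axis is informative when its eigenvalue exceeds the average eigenvalue; the simulated genotypes and the variable names are illustrative, and stats::prcomp stands in for the PCA performed by gl.pcoa.

```r
set.seed(1)

# Simulated genotypes: 20 individuals x 50 loci, alternate-allele counts
geno <- matrix(rbinom(20 * 50, size = 2, prob = 0.3), nrow = 20)

pca <- prcomp(geno)
eig <- pca$sdev^2                 # eigenvalues (variance per axis)
pct <- 100 * eig / sum(eig)       # percentage of variation explained

# Assumed criterion: axes explaining more than the average axis
informative <- eig > mean(eig)

# Percentages for the informative axes (what a scree plot would show)
print(round(pct[informative], 1))

# Axes passing the 10% rule of thumb
print(which(pct > 10))
```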

Action is recommended (verbose >= 2) if the negative eigenvalues are dominant, their sum approaching in magnitude the eigenvalues for axes selected for the final visual solution.

Output is a glPca object conforming to adegenet::glPca but with only the following retained.

$call - The call that generated the PCA/PCoA
$eig - Eigenvalues – All eigenvalues (positive, null, negative).
$scores - Scores (coefficients) for each individual
$loadings - Loadings of each SNP for each principal component

If save2tmp = TRUE, plots and tables are saved to the temporary directory (tempdir) and can be accessed with the function gl.print.reports and listed with the function gl.list.reports. Note that they can be accessed only in the current R session because tempdir is cleared each time that the R session is closed.

Examples of other themes that can be used can be consulted in

PCA was developed by Pearson (1901) and Hotelling (1933), whilst the best modern reference is Jolliffe (2002). PCoA was developed by Gower (1966) while the best modern reference is Legendre & Legendre (1998).

Value

An object of class pcoa containing the eigenvalues and factor scores

Author(s)

Author(s): Arthur Georges. Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

References

Cailliez, F. (1983) The analytical solution of the additive constant problem. Psychometrika, 48, 305-308.
Gower, J. C. (1966) Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53, 325-338.
Hotelling, H., 1933. Analysis of a complex of statistical variables into Principal Components. Journal of Educational Psychology 24:417-441, 498-520.
Jolliffe, I. (2002) Principal Component Analysis. 2nd Edition, Springer, New York.
Ledoit, O. and Wolf, M. (2018). Analytical nonlinear shrinkage of large-dimensional covariance matrices. University of Zurich, Department of Economics, Working Paper No. 264, Revised version. Available at SSRN: https://ssrn.com/abstract=3047302 or http://dx.doi.org/10.2139/ssrn.3047302
Legendre, P. and Legendre, L. (1998). Numerical Ecology, Volume 24, 2nd Edition. Elsevier Science, NY.
Lingoes, J. C. (1971) Some boundary conditions for a monotone analysis of symmetric matrices. Psychometrika, 36, 195-203.
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine. Series 6, vol. 2, no. 11, pp. 559-572.

Examples

## Not run: 
gl <- possums.gl
# PCA (using SNP genlight object)
pca <- gl.pcoa(possums.gl[1:50,],verbose=2)
gl.pcoa.plot(pca,gl)
gs <- testset.gs
levels(pop(gs))<-c(rep('Coast',5),rep('Cooper',3),rep('Coast',5),
rep('MDB',8),rep('Coast',6),'Em.subglobosa','Em.victoriae')
# PCA (using SilicoDArT genlight object)
pca <- gl.pcoa(gs)
gl.pcoa.plot(pca,gs)
# Collapsing pops to OTUs using Fixed Difference Analysis (using fd object)
fd <- gl.fixed.diff(testset.gl)
fd <- gl.collapse(fd)
pca <- gl.pcoa(fd)
gl.pcoa.plot(pca,fd$gl)
# Using a distance matrix
D <- gl.dist.ind(testset.gs, method='jaccard')
pcoa <- gl.pcoa(D,correction="cailliez")
gl.pcoa.plot(pcoa,gs)
## End(Not run)

Bivariate or trivariate plot of the results of an ordination generated using gl.pcoa()

Description

This script takes output from the ordination generated by gl.pcoa() and plots the individuals classified by population.

Usage

gl.pcoa.plot(
  glPca,
  x,
  scale = FALSE,
  ellipse = FALSE,
  plevel = 0.95,
  pop.labels = "pop",
  interactive = FALSE,
  as.pop = NULL,
  hadjust = 1.5,
  vadjust = 1,
  xaxis = 1,
  yaxis = 2,
  zaxis = NULL,
  pt.size = 2,
  pt.colors = NULL,
  pt.shapes = NULL,
  label.size = 1,
  axis.label.size = 1.5,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

glPca

Name of the PCA or PCoA object containing the factor scores and eigenvalues [required].

x

Name of the genlight object or fd object containing the SNP genotypes or Tag P/A (SilicoDArT) genotypes or the Distance Matrix used to generate the ordination [required].

scale

If TRUE, scale the x and y axes in proportion to % variation explained [default FALSE].

ellipse

If TRUE, display ellipses to encapsulate points for each population [default FALSE].

plevel

Value of the percentile for the ellipse to encapsulate points for each population [default 0.95].

pop.labels

How labels will be added to the plot ['none'|'pop'|'legend', default = 'pop'].

interactive

If TRUE then the populations are plotted without labels, mouse-over to identify points [default FALSE].

as.pop

Assign another metric to represent populations for the plot [default NULL].

hadjust

Horizontal adjustment of label position in 2D plots [default 1.5].

vadjust

Vertical adjustment of label position in 2D plots [default 1].

xaxis

Identify the x axis from those available in the ordination (xaxis <= nfactors) [default 1].

yaxis

Identify the y axis from those available in the ordination (yaxis <= nfactors) [default 2].

zaxis

Identify the z axis from those available in the ordination for a 3D plot (zaxis <= nfactors) [default NULL].

pt.size

Specify the size of the displayed points [default 2].

pt.colors

Optionally provide a vector of nPop colors (run gl.select.colors() for color options) [default NULL].

pt.shapes

Optionally provide a vector of nPop shapes (run gl.select.shapes() for shape options) [default NULL].

label.size

Specify the size of the point labels [default 1].

axis.label.size

Specify the size of the displayed axis labels [default 1.5].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

The factor scores are taken from the output of gl.pcoa() and the population assignments are taken from the original data file. In the bivariate plots, the specimens are shown optionally with adjacent labels and enclosing ellipses. Population labels on the plot are shuffled so as not to overlap (using package {directlabels}). This can be a bit clunky, as the labels may be some distance from the points to which they refer, but it provides the opportunity for moving labels around using graphics software (e.g. Adobe Illustrator).

3D plotting is activated by specifying a zaxis.

Any pair or trio of axes can be specified from the ordination, provided they are within the range of the nfactors value provided to gl.pcoa(). In the 2D plots, axes can be scaled to represent the proportion of variation explained. In any case, the proportion of variation explained by each axis is provided in the axis label.

Colors and shapes of the points can be altered by passing a vector of shapes and/or a vector of colors. These vectors can be created with gl.select.shapes() and gl.select.colors() and passed to this script using the pt.shapes and pt.colors parameters.

Points displayed in the ordination can be identified if the option interactive=TRUE is chosen, in which case the resultant plot is ggplotly() friendly. Identification of points is by moving the mouse over them. Refer to the plotly package for further information. The interactive option is automatically enabled for 3D plotting.

Value

returns no value (i.e. NULL)

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# SET UP DATASET
gl <- testset.gl
levels(pop(gl))<-c(rep('Coast',5),rep('Cooper',3),rep('Coast',5),
rep('MDB',8),rep('Coast',7),'Em.subglobosa','Em.victoriae')
# RUN PCA
pca<-gl.pcoa(gl,nfactors=5)
# VARIOUS EXAMPLES
gl.pcoa.plot(pca, gl, ellipse=TRUE, plevel=0.95, pop.labels='pop',
axis.label.size=1, hadjust=1.5,vadjust=1)
gl.pcoa.plot(pca, gl, ellipse=TRUE, plevel=0.99, pop.labels='legend',
axis.label.size=1)
gl.pcoa.plot(pca, gl, ellipse=TRUE, plevel=0.99, pop.labels='legend',
axis.label.size=1.5,scale=TRUE)
gl.pcoa.plot(pca, gl, ellipse=TRUE, axis.label.size=1.2, xaxis=1, yaxis=3,
scale=TRUE)
gl.pcoa.plot(pca, gl, pop.labels='none',scale=TRUE)
gl.pcoa.plot(pca, gl, axis.label.size=1.2, interactive=TRUE)
gl.pcoa.plot(pca, gl, ellipse=TRUE, plevel=0.99, xaxis=1, yaxis=2, zaxis=3)
# COLOR AND SHAPE ADJUSTMENTS
shp <- gl.select.shapes(select=c(16,17,17,0,2))
col <- gl.select.colors(library='brewer',palette='Spectral',ncolors=11,
select=c(1,9,3,11,11))
gl.pcoa.plot(pca, gl, ellipse=TRUE, plevel=0.95, pop.labels='pop',
pt.colors=col, pt.shapes=shp, axis.label.size=1, hadjust=1.5,vadjust=1)
gl.pcoa.plot(pca, gl, ellipse=TRUE, plevel=0.99, pop.labels='legend',
pt.colors=col, pt.shapes=shp, axis.label.size=1)

test <- gl.pcoa(platypus.gl)
gl.pcoa.plot(glPca = test, x = platypus.gl)

Generates percentage allele frequencies by locus and population

Description

Usage

gl.percent.freq(x, verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP or Tag P/A (SilicoDArT) data [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

A matrix with allele (SNP data) or presence/absence frequencies (Tag P/A data) broken down by population and locus
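The calculation behind a percentage allele frequency table can be sketched in base R. This is an illustration only, assuming SNP genotypes are coded as counts of the alternate allele (0, 1, 2) with NA for missing calls; the toy data and variable names are hypothetical, and the layout of the result (populations in rows, loci in columns) is a choice made for this sketch rather than a description of the object returned by gl.percent.freq.

```r
# Toy genotypes: 4 individuals x 2 loci, with one missing call
geno <- matrix(c(0,  2,
                 1, NA,
                 2,  0,
                 2,  1),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("ind", 1:4), c("loc1", "loc2")))
pop <- c("A", "A", "B", "B")

# Percentage frequency of the alternate allele per population and locus;
# missing genotypes are ignored within each population
freq_pct <- t(sapply(split(as.data.frame(geno), pop),
                     function(g) 100 * colMeans(g, na.rm = TRUE) / 2))
print(freq_pct)
```

For presence/absence (Tag P/A) data coded 0/1, the same approach would apply without the division by 2.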

Author(s)

Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

Examples

m <-  gl.percent.freq(testset.gl)

Replays the history and applies it to a genlight object

Description

Replays the history and applies it to a genlight object

Usage

gl.play.history(x, history = NULL, verbose = 0)

Arguments

x

A genlight object (with a history slot) [optional].

history

If no history is provided, the complete history of x is used (recreating the identical object x). If history is a vector, it indicates which part of the history of x is used [c(1,3,4) uses the first, third and fourth entry from x@other$history]. It can also be a simple link to a history slot of another genlight object (e.g. x2@other$history[c(1,4,5)]) [optional].

verbose

If set to one then history commands are printed, which may facilitate reading the output [default 0].

Details

This function allows you to create a 'template history' (= set of filters) and apply it to any other genlight object. Histories can also be saved and loaded (see gl.save.history and gl.load.history).

Value

Returns a genlight object that was created by replaying the provided history applied to the genlight object x. Please note you can 'mix' histories or parts of them and apply them to different genlight objects. If the history does not contain gl.read.dart, histories of x and history are concatenated.

Author(s)

Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr).

Examples

## Not run: 
dartfile <- system.file('extdata','testset_SNPs_2Row.csv', package='dartR')
metadata <- system.file('extdata','testset_metadata.csv', package='dartR')
gl <- gl.read.dart(dartfile, ind.metafile = metadata, probar=FALSE)
gl2 <- gl.filter.callrate(gl, method='loc', threshold=0.9)
gl3 <- gl.filter.callrate(gl2, method='ind', threshold=0.95)
#Now 'replay' part of the history 'onto' another genlight object
#bc.fil <- gl.play.history(gl.compliance.check(bandicoot.gl),
#history=gl3@other$history[c(2,3)], verbose=1)
#gl.print.history(bc.fil)
## End(Not run)

Plots fastStructure analysis results (Q-matrix)

Description

This function takes a fastStructure run object (output from gl.run.faststructure) and plots the typical structure barplot that visualizes the q matrix of a fastStructure run.

Usage

gl.plot.faststructure(
  sr,
  k.range,
  met_clumpp = "greedyLargeK",
  iter_clumpp = 100,
  clumpak = TRUE,
  plot_theme = NULL,
  colors_clusters = NULL,
  ind_name = TRUE,
  border_ind = 0.15
)

Arguments

sr

fastStructure run object from gl.run.faststructure [required].

k.range

The number for K of the q matrix that should be plotted. Needs to be within your simulated range of K's in your sr structure run object. If NULL, all the K's are plotted [default NULL].

met_clumpp

The algorithm to use to infer the correct permutations. One of 'greedy' or 'greedyLargeK' or 'stephens' [default "greedyLargeK"].

iter_clumpp

The number of iterations to use if running either 'greedy' or 'greedyLargeK' [default 100].

clumpak

Whether to use the Clumpak method (see details) [default TRUE].

plot_theme

Theme for the plot. See Details for options [default NULL].

colors_clusters

A color palette for clusters (K) or a list with as many colors as there are clusters (K) [default NULL].

ind_name

Whether to plot individual names [default TRUE].

border_ind

The width of the border line between individuals [default 0.15].

Details

The function outputs a barplot which is the typical output of fastStructure.

This function is based on the methods of CLUMPP and Clumpak as implemented in the R package starmie (https://github.com/sa-lee/starmie).

The Clumpak method identifies sets of highly similar runs among all the replicates of the same K. The method then separates the distinct groups of runs representing distinct modes in the space of possible solutions.

The CLUMPP method permutes the clusters output by independent runs of clustering programs such as structure, so that they match up as closely as possible.

This function averages the replicates within each mode identified by the Clumpak method.

Examples of other themes that can be used can be consulted in

Value

List of Q-matrices

Author(s)

Bernd Gruber & Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

References

Raj, A., Stephens, M., & Pritchard, J. K. (2014). fastSTRUCTURE: variational inference of population structure in large SNP data sets. Genetics, 197(2), 573-589.
Pritchard, J.K., Stephens, M., Donnelly, P. (2000) Inference of population structure using multilocus genotype data. Genetics 155, 945-959.
Kopelman, Naama M., et al. "Clumpak: a program for identifying clustering modes and packaging population structure inferences across K." Molecular ecology resources 15.5 (2015): 1179-1191.
Mattias Jakobsson and Noah A. Rosenberg. 2007. CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23(14):1801-1806. Available at clumpp

Examples

## Not run: 
t1 <- gl.filter.callrate(platypus.gl,threshold = 1)
res <- gl.run.faststructure(t1, exec = "./fastStructure",k.range = 2:3,
                           num.k.rep = 2,output = paste0(getwd(),"/res_str"))
qmat <- gl.plot.faststructure(res,k.range=2:3)
gl.map.structure(qmat, K=2, t1, scalex=1, scaley=0.5)
## End(Not run)

Represents a distance matrix as a heatmap

Description

The script plots a heat map to represent the distances in the distance or dissimilarity matrix. This function is a wrapper for heatmap.2 (package gplots).

Usage

gl.plot.heatmap(D, palette.divergent = gl.colors("div"), verbose = NULL, ...)

Arguments

D

Name of the distance matrix or class fd object [required].

palette.divergent

A divergent palette for the distance values [default gl.colors("div")].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

...

Parameters passed to function heatmap.2 (package gplots)

Value

returns no value (i.e. NULL)

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Examples

## Not run: 
   gl <- testset.gl[1:10,]
   D <- dist(as.matrix(gl),upper=TRUE,diag=TRUE)
   gl.plot.heatmap(D)
   D2 <- gl.dist.pop(possums.gl)
   gl.plot.heatmap(D2)
   D3 <- gl.fixed.diff(testset.gl)
   gl.plot.heatmap(D3)
## End(Not run)
if ((requireNamespace("gplots", quietly = TRUE))) {
   D2 <- gl.dist.pop(possums.gl)
   gl.plot.heatmap(D2)
}

Represents a distance or dissimilarity matrix as a network

Description

This script takes a distance matrix generated by dist() and represents the relationship among the specimens as a network diagram. In order to use this script, a decision is required on a threshold for relatedness to be represented as link in the network, and on the layout used to create the diagram.

Usage

gl.plot.network(
  D,
  x = NULL,
  method = "fr",
  node.size = 3,
  node.label = FALSE,
  node.label.size = 0.7,
  node.label.color = "black",
  alpha = 0.005,
  title = "Network based on genetic distance",
  verbose = NULL
)

Arguments

D

A distance or dissimilarity matrix generated by dist() or gl.dist() [required].

x

A genlight object from which the D matrix was generated [default NULL].

method

One of "fr", "kk" or "drl" [default "fr"].

node.size

Size of the symbols for the network nodes [default 3].

node.label

TRUE to display node labels [default FALSE].

node.label.size

Size of the node labels [default 0.7].

node.label.color

Color of the text of the node labels [default 'black'].

alpha

Upper threshold to determine which links between nodes to display [default 0.005].

title

Title for the plot [default "Network based on genetic distance"].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

The threshold for relatedness to be represented as a link in the network is specified as a quantile. Those relatedness measures above the quantile are plotted as links, those below the quantile are not. Often you are looking for relatedness outliers in comparison with the overall relatedness among individuals, so a very conservative quantile is used (e.g. 0.004), but ultimately, this decision is made as a matter of trial and error. One way to approach this trial and error is to try to achieve a sparse set of links between unrelated 'background' individuals so that the stronger links are preferentially shown.
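Selecting links by a quantile cut-off can be sketched in base R. This is a minimal illustration, assuming a symmetric relatedness matrix and that a link is drawn when a pairwise value exceeds the chosen quantile of all pairwise values; the toy values, the 0.8 quantile and the variable names are illustrative, and the exact way gl.plot.network maps alpha to a quantile is not reproduced here.

```r
# Toy symmetric relatedness matrix for 5 individuals
vals <- c(0.02, 0.01, 0.03, 0.05, 0.04, 0.50, 0.02, 0.03, 0.01, 0.45)
rel <- matrix(0, nrow = 5, ncol = 5)
rel[upper.tri(rel)] <- vals          # filled column by column
rel <- rel + t(rel)

# Cut-off taken as a quantile of the pairwise values
pairwise <- rel[upper.tri(rel)]
cutoff <- quantile(pairwise, probs = 0.8)

# Keep only pairs above the cut-off; each row of 'links' is one edge
links <- which(rel > cutoff & upper.tri(rel), arr.ind = TRUE)
print(cutoff)
print(links)
```

Raising the quantile leaves fewer, stronger links, which is the trial-and-error adjustment described above.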

There are several layouts from which to choose. The most popular are given as options in this script.

fr – Fruchterman, T.M.J. and Reingold, E.M. (1991). Graph Drawing by Force-directed Placement. Software – Practice and Experience 21:1129-1164.
kk – Kamada, T. and Kawai, S.: An Algorithm for Drawing General Undirected Graphs. Information Processing Letters 31:7-15, 1989.
drl – Martin, S., Brown, W.M., Klavans, R., Boyack, K.W., DrL: Distributed Recursive (Graph) Layout. SAND Reports 2936:1-10, 2008.

Colors of node symbols are those of the rainbow.

Value

returns no value (i.e. NULL)

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

if ((requireNamespace("rrBLUP", quietly = TRUE)) && (requireNamespace("gplots", quietly = TRUE))) {
test <- gl.subsample.loci(platypus.gl, n = 100)
test <- gl.keep.ind(test,ind.list = indNames(test)[1:10])
D <- gl.grm(test, legendx=0.04)
gl.plot.network(D,test)
}

Plots STRUCTURE analysis results (Q-matrix)

Description

This function takes a structure run object (output from gl.run.structure) and plots the typical structure barplot that visualizes the q matrix of a structure run.

Usage

gl.plot.structure(
  sr,
  K = NULL,
  met_clumpp = "greedyLargeK",
  iter_clumpp = 100,
  clumpak = TRUE,
  plot_theme = NULL,
  colors_clusters = NULL,
  ind_name = TRUE,
  border_ind = 0.15,
  plot.out = TRUE,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

sr

Structure run object from gl.run.structure [required].

K

The value of K for the Q-matrix that should be plotted. Needs to be within your simulated range of K values in your sr structure run object. If NULL, all values of K are plotted [default NULL].

met_clumpp

The algorithm to use to infer the correct permutations. One of 'greedy', 'greedyLargeK' or 'stephens' [default "greedyLargeK"].

iter_clumpp

The number of iterations to use if running either 'greedy' or 'greedyLargeK' [default 100].

clumpak

Whether to use the Clumpak method (see Details) [default TRUE].

plot_theme

Theme for the plot. See Details for options [default NULL].

colors_clusters

A color palette for clusters (K) or a list with as many colors as there are clusters (K) [default NULL].

ind_name

Whether to plot individual names [default TRUE].

border_ind

The width of the border line between individuals [default 0.15].

plot.out

Specify if plot is to be produced [default TRUE].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity].

Details

The function outputs a barplot which is the typical output of structure. For an Evanno plot use gl.evanno.

This function is based on the methods of CLUMPP and Clumpak as implemented in the R package starmie (https://github.com/sa-lee/starmie).

The CLUMPP method permutes the clusters output by independent runs of clustering programs such as structure, so that they match up as closely as possible.

This function averages the replicates within each mode identified by the Clumpak method.
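To make the label-matching idea concrete, here is a minimal base R sketch. It is not the CLUMPP, Clumpak or starmie implementation: it swaps in a brute-force search over all column permutations (feasible only for small K) in place of the 'greedy', 'greedyLargeK' or 'stephens' algorithms, and the two Q-matrices are made up.

```r
# Two hypothetical Q-matrices (4 individuals x 3 clusters) from independent runs.
# Run 2 carries the same signal as run 1, but with its cluster labels shuffled.
q_run1 <- matrix(c(
  0.90, 0.05, 0.05,
  0.80, 0.10, 0.10,
  0.10, 0.85, 0.05,
  0.05, 0.05, 0.90
), ncol = 3, byrow = TRUE)
q_run2 <- q_run1[, c(3, 1, 2)]

# Enumerate all permutations of a vector (fine for small K only).
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in all_perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Pick the column order of run 2 that minimises the squared distance to run 1.
perms <- all_perms(seq_len(ncol(q_run1)))
cost <- vapply(perms, function(p) sum((q_run1 - q_run2[, p])^2), numeric(1))
best <- perms[[which.min(cost)]]
q_run2_aligned <- q_run2[, best]

# After alignment the replicates can be averaged cluster by cluster.
q_mean <- (q_run1 + q_run2_aligned) / 2
```

Without the alignment step, averaging the two runs would mix unrelated clusters and flatten the membership coefficients.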

Examples of other themes that can be used can be consulted in

Value

List of Q-matrices

Author(s)

Bernd Gruber & Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

References

Pritchard, J.K., Stephens, M., Donnelly, P. (2000) Inference of population structure using multilocus genotype data. Genetics 155, 945-959.
Kopelman, Naama M., et al. "Clumpak: a program for identifying clustering modes and packaging population structure inferences across K." Molecular ecology resources 15.5 (2015): 1179-1191.
Mattias Jakobsson and Noah A. Rosenberg. 2007. CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23(14):1801-1806. Available at clumpp

Examples

## Not run: 
#bc <- bandicoot.gl[,1:100]
#sr <- gl.run.structure(bc, k.range = 2:5, num.k.rep = 3, exec = './structure')
#ev <- gl.evanno(sr)
#ev
#qmat <- gl.plot.structure(sr, K=3)
#head(qmat)
#gl.map.structure(qmat, K=3, bc, scalex=1, scaley=0.5)
## End(Not run)

Prints history of a genlight object

Description

Prints history of a genlight object

Usage

gl.print.history(x = NULL, history = NULL)

Arguments

x

A genlight object (with history) [optional].

history

Either a link to a history slot (gl@other$history), or a vector indicating which part of the history of x is used [c(1,3,4) uses the first, third and fourth entry from x@other$history]. If no history is provided the complete history of x is used (recreating the identical object x) [optional].

Value

Prints a table with all history records. Currently the style cannot be changed.

Author(s)

Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)

Examples

dartfile <- system.file('extdata','testset_SNPs_2Row.csv', package='dartR')
metadata <- system.file('extdata','testset_metadata.csv', package='dartR')
gl <- gl.read.dart(dartfile, ind.metafile = metadata, probar=FALSE)
gl2 <- gl.filter.callrate(gl, method='loc', threshold=0.9)
gl3 <- gl.filter.callrate(gl2, method='ind', threshold=0.95)
#Now 'replay' part of the history 'onto' another genlight object
#bc.fil <- gl.play.history(gl.compliance.check(bandicoot.gl),
#history=gl3@other$history[c(2,3)], verbose=1)
#gl.print.history(bc.fil)

Prints dartR reports saved in tempdir

Description

Prints dartR reports saved in tempdir

Usage

gl.print.reports(print_report)

Arguments

print_report

Number of the report from gl.list.reports that is to be printed.

Value

Prints reports that were saved in tempdir.

Author(s)

Bernd Gruber & Luis Mijangos (bugs? Post to https://groups.google.com/d/forum/dartr)

Examples

## Not run: 
reports <- gl.print.reports(1)
## End(Not run)

Calculates a similarity (distance) matrix for individuals on the proportion of shared alleles

Description

This script calculates an individual-based distance matrix. It uses a C++ implementation, so the package Rcpp needs to be installed; it is therefore really fast (once the function has been compiled during the first run).

Usage

gl.propShared(x)

Arguments

x

Name of the genlight containing the SNP genotypes [required].

Value

A similarity matrix

Author(s)

Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

Examples

#takes some time at the first run of the function...
## Not run: 
res <- gl.propShared(bandicoot.gl)
res[1:5,1:7] #show only a small part of the matrix
## End(Not run)

Randomly changes the allocation of 0's and 2's in a genlight object

Description

This function randomly samples half of the SNPs and, in the sampled SNPs, re-codes 0's as 2's.

Usage

gl.random.snp(x, plot.out = TRUE, save2tmp = FALSE, verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

plot.out

Specify if a plot is to be produced [default TRUE].

save2tmp

If TRUE, saves any ggplots to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity].

Details

DArT calls the most common allele as the reference allele. In a genlight object, homozygous for the reference allele are coded with a '0' and homozygous for the alternative allele are coded with a '2'. This causes some distortions in visuals from time to time.

If plot.out = TRUE, two smear plots (pre-randomisation and post-randomisation) are presented using a random subset of individuals (10) and loci (100) to provide an overview of the changes.
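The recoding can be pictured on a plain genotype matrix. The sketch below is not the gl.random.snp code; it assumes that recoding swaps the two homozygous codes in the sampled loci (0 becomes 2 and 2 becomes 0), and the matrix is made up.

```r
set.seed(42)  # only to make the sketch reproducible

# Hypothetical genotype matrix: 4 individuals (rows) x 6 loci (columns),
# coded 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate.
geno <- matrix(c(
  0, 0, 1, 2, 0, NA,
  0, 1, 0, 0, 2, 0,
  2, 0, 0, 1, 0, 0,
  0, 0, 2, 0, 1, 0
), nrow = 4, byrow = TRUE)

# Sample half of the loci at random.
flip <- sort(sample(ncol(geno), size = ncol(geno) / 2))

# Swapping 0 and 2 is the same as computing 2 - g; 1 and NA are unchanged.
recoded <- geno
recoded[, flip] <- 2 - geno[, flip]
```

Heterozygotes and missing values are untouched, so only the choice of which homozygote is labelled '0' changes.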

Resultant ggplots are saved to the session's temporary directory.

Value

Returns a genlight object with half of the loci re-coded.

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Examples

require("dartR.data")
res <- gl.random.snp(platypus.gl[1:5,1:5],verbose = 5)

Reads SNP data from a csv file into a genlight object

Description

This script takes SNP genotypes from a csv file, combines them with individual and locus metrics and creates a genlight object.

Usage

gl.read.csv(
  filename,
  transpose = FALSE,
  ind.metafile = NULL,
  loc.metafile = NULL,
  verbose = NULL
)

Arguments

filename

Name of the csv file containing the SNP genotypes [required].

transpose

If TRUE, rows are loci and columns are individuals [default FALSE].

ind.metafile

Name of the csv file containing the metrics for individuals [optional].

loc.metafile

Name of the csv file containing the metrics for loci [optional].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

The SNP data need to be in one of two forms. SNPs can be coded 0 for homozygous reference, 2 for homozygous alternate, 1 for heterozygous, and NA for missing values; or the SNP data can be coded A/A, A/C, C/T, G/A etc., and -/- for missing data. In this format, the reference allele is the most frequent allele, as used by DArT. Other formats will throw an error.
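The relationship between the two codings can be sketched in base R for a single locus. This is an illustration of the rule stated above (most frequent allele as reference), not the internal code of gl.read.csv; the genotype calls are made up.

```r
# Hypothetical genotype calls at one locus for six individuals.
calls <- c("A/A", "A/C", "C/C", "A/A", "-/-", "A/C")

# Split each call into its two alleles; "-/-" marks missing data.
alleles <- strsplit(calls, "/", fixed = TRUE)
missing <- calls == "-/-"

# The most frequent allele across non-missing calls is taken as the reference.
counts <- table(unlist(alleles[!missing]))
ref <- names(counts)[which.max(counts)]

# Count non-reference alleles per individual: 0, 1 or 2, with NA for missing.
coded <- vapply(alleles, function(a) sum(a != ref), numeric(1))
coded[missing] <- NA
print(coded)
```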

The SNP data need to be individuals as rows, labeled, and loci as columns, also labeled. If the orientation is individuals as columns and loci by rows, then set transpose=TRUE.

The individual metrics need to be in a csv file, with headings, with a mandatory id column corresponding exactly to the individual identity labels provided with the SNP data and in the same order.

The locus metadata needs to be in a csv file with headings, with a mandatory column headed AlleleID corresponding exactly to the locus identity labels provided with the SNP data and in the same order.

Note that the locus metadata will be complemented by calculable statistics corresponding to those that would be provided by Diversity Arrays Technology (e.g. CallRate).

Value

A genlight object with the SNP data and associated metadata included.

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Examples

csv_file <- system.file('extdata','platy_test.csv', package='dartR')
ind_metadata <- system.file('extdata','platy_ind.csv', package='dartR')
gl <- gl.read.csv(filename = csv_file, ind.metafile = ind_metadata)

Imports DArT data into dartR and converts it into a dartR genlight object

Description

This function is a wrapper function that allows you to convert your DArT file into a genlight object of class dartR.

Usage

gl.read.dart(
  filename,
  ind.metafile = NULL,
  recalc = TRUE,
  mono.rm = FALSE,
  nas = "-",
  topskip = NULL,
  lastmetric = NULL,
  covfilename = NULL,
  service.row = 1,
  plate.row = 3,
  probar = FALSE,
  verbose = NULL
)

Arguments

filename

File containing the SNP data (csv file) [required].

ind.metafile

File that contains additional information on individuals [default NULL].

recalc

If TRUE, force the recalculation of locus metrics [default TRUE].

mono.rm

If TRUE, force the removal of monomorphic loci (including loci with all NAs) [default FALSE].

nas

A character specifying NAs [default '-'].

topskip

A number specifying the number of initial rows to be skipped. [default NULL].

lastmetric

Deprecated, specifies the last column of locus metadata. Can be specified as a column number [default NULL].

covfilename

Deprecated, see the ind.metafile parameter [default NULL].

service.row

The row number in which the DArT service is contained [default 1].

plate.row

The row number in which the plate well is contained [default 3].

probar

Show progress bar [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, or as set by gl.set.verbosity].

Details

The function will determine automatically if the data are in Diversity Arrays one-row csv format or two-row csv format.

The first row of data is determined from the number of rows with an * in the first column. This can be alternatively specified with the topskip parameter.
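As an illustration of the rule above, the number of rows to skip can be found by counting the rows that start with an asterisk. The sketch is not the gl.read.dart code, and the file content below is a made-up, heavily shortened stand-in for a DArT report.

```r
# Hypothetical first lines of a DArT csv file: rows starting with "*" carry
# service and plate information, then comes the header row and the data.
lines <- c(
  "*,*,*,DS-0001,DS-0001",
  "*,*,*,1,1",
  "*,*,*,A1,A2",
  "AlleleID,CallRate,RepAvg,ind1,ind2",
  "100001|F|0-12:A>G,1,0.99,0,1"
)

# Read the text as csv without a header so the first column can be inspected.
raw <- read.csv(text = lines, header = FALSE, stringsAsFactors = FALSE)

# The number of rows with "*" in the first column is the number to skip.
topskip <- sum(raw[[1]] == "*")

# Re-read, skipping those rows, so the next row becomes the header.
dat <- read.csv(text = lines, skip = topskip, header = TRUE,
                stringsAsFactors = FALSE, check.names = FALSE)
```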

The DArT service code is added to the ind.metrics of the genlight object. The row containing the service code for each individual can be specified with the service.row parameter.

The DArT plate well is added to the ind.metrics of the genlight object. The row containing the plate well for each individual can be specified with the plate.row parameter.

If individuals have been deleted from the input file manually, then the locus metrics supplied by DArT will no longer be correct and some loci may be monomorphic. To accommodate this, set mono.rm and recalc to TRUE.

Value

A dartR genlight object that contains individual metrics [if data were provided] and locus metrics [from a DArT report].

Author(s)

Custodian: Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

Examples

dartfile <- system.file('extdata','testset_SNPs_2Row.csv', package='dartR')
metadata <- system.file('extdata','testset_metadata.csv', package='dartR')
gl <- gl.read.dart(dartfile, ind.metafile = metadata, probar=TRUE)

Reads FASTA files and converts them to genlight object

Description

The following IUPAC Ambiguity Codes are taken as heterozygotes:

M is heterozygote for AC and CA
R is heterozygote for AG and GA
W is heterozygote for AT and TA
S is heterozygote for CG and GC
Y is heterozygote for CT and TC
K is heterozygote for GT and TG

The following IUPAC Ambiguity Codes are taken as missing data:

The function can deal with missing data in individuals, e.g. when FASTA files have a different number of individuals due to missing data.

The allele with the highest frequency is taken as the reference allele.

SNPs with more than two alleles are skipped.
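The rules above can be sketched for a single site in base R. This is not the gl.read.fasta code: the calls are made up, and only the six heterozygote codes listed above are handled.

```r
# Heterozygote codes listed above, mapped to their two alleles.
iupac_het <- list(
  M = c("A", "C"), R = c("A", "G"), W = c("A", "T"),
  S = c("C", "G"), Y = c("C", "T"), K = c("G", "T")
)

# Hypothetical calls at one site for six individuals.
site <- c("A", "R", "G", "A", "A", "R")

# Expand each call into two alleles (homozygotes repeat the base).
alleles <- lapply(site, function(b) {
  if (b %in% names(iupac_het)) iupac_het[[b]] else c(b, b)
})

# Reference allele = most frequent allele; code = number of non-reference alleles.
counts <- table(unlist(alleles))
ref <- names(counts)[which.max(counts)]
coded <- vapply(alleles, function(a) sum(a != ref), numeric(1))

# Sites with more than two alleles would be skipped.
is_biallelic <- length(counts) <= 2
```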

Usage

gl.read.fasta(fasta_files, parallel = FALSE, n_cores = NULL, verbose = NULL)

Arguments

fasta_files

Fasta files to read [required].

parallel

A logical indicating whether multiple cores, if available, should be used for the computations (TRUE), or not (FALSE); requires the package parallel to be installed [default FALSE].

n_cores

If parallel is TRUE, the number of cores to be used in the computations; if NULL, then the maximum number of cores available on the computer is used [default NULL].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

Ambiguity characters are often used to code heterozygotes. However, coding heterozygotes as ambiguity characters may bias many estimates. See more information in the link below: https://evodify.com/heterozygotes-ambiguity-characters/

Value

A genlight object.

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Examples

# Folder where the fasta files are located.
folder_samples <- system.file('extdata', package ='dartR')
# listing the FASTA files, including their path. Files have an extension
# that contains "fas".
file_names <- list.files(path = folder_samples, pattern = "*.fas",
                         full.names = TRUE)
# reading fasta files
obj <- gl.read.fasta(file_names)

Imports presence/absence data from SilicoDArT to genlight {adegenet} format (ploidy=1)

Description

DArT provides the data as a matrix of entities (individual animals) across the top and attributes (P/A of sequenced fragment) down the side in a format that is unique to DArT. This program reads the data into adegenet format for consistency with other programming activity. The script may require modification as DArT modifies its data formats from time to time.

Usage

gl.read.silicodart(
  filename,
  ind.metafile = NULL,
  nas = "-",
  topskip = NULL,
  lastmetric = "Reproducibility",
  probar = TRUE,
  verbose = NULL
)

Arguments

filename

Name of csv file containing the SilicoDArT data [required].

ind.metafile

Name of csv file containing metadata assigned to each entity (individual) [default NULL].

nas

Missing data character [default '-'].

topskip

Number of rows to skip before the header row (containing the specimen identities) [optional].

lastmetric

Specifies the last non-genetic column. Be sure to check that this is correct, otherwise the number of individuals will not match. You can also specify the last column by a number [default "Reproducibility"].

probar

Show progress bar [default TRUE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, or as set by gl.set.verbosity].

Details

gl.read.silicodart() opens the data file (csv comma delimited) and skips the first n=topskip lines. The script assumes that the next line contains the entity labels (specimen ids) followed immediately by the SNP data for the first locus.

It reads the presence/absence data into a matrix of 1s and 0s, and inputs the locus metadata and specimen metadata. The locus metadata comprises a series of columns of values for each locus including the essential columns of CloneID and the desirable variables Reproducibility and PIC. Refer to documentation provided by DArT for an explanation of these columns.

The specimen metadata provides the opportunity to reassign specimens to populations, and to add other data relevant to the specimen. The key variables are id (specimen identity which must be the same and in the same order as the SilicoDArT file, each unique), pop (population assignment), lat (latitude, optional) and lon (longitude, optional). id, pop, lat, lon are the column headers in the csv file. Other optional columns can be added.

The data matrix, locus names (forced to be unique), locus metadata, specimen names, specimen metadata are combined into a genlight object. Refer to the documentation for {adegenet} for further details.

Value

An object of class genlight with ploidy set to 1, containing the presence/absence data, and locus and individual metadata.

Author(s)

Custodian: Bernd Gruber – Post to https://groups.google.com/d/forum/dartr

Examples

silicodartfile <- system.file('extdata','testset_SilicoDArT.csv', package='dartR')
metadata <- system.file('extdata','testset_metadata_silicodart.csv', package='dartR')
testset.gs <- gl.read.silicodart(filename = silicodartfile, ind.metafile = metadata)

Converts a vcf file into a genlight object

Description

This function needs package vcfR, please install it.

Usage

gl.read.vcf(vcffile, ind.metafile = NULL, verbose = NULL)

Arguments

vcffile

A vcf file (works only for diploid data) [required].

ind.metafile

Optional file in csv format with metadata for each individual (see Details for explanation) [default NULL].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

The ind.metafile file needs to have very specific headings. First, a heading called id. Here the ids have to match the ids in the dartR object. The following column headings are optional. pop: specifies the population membership of each individual. lat and lon specify spatial coordinates (in decimal degrees WGS1984 format). Additional columns with individual metadata can be imported (e.g. age, gender).

Value

A genlight object.

Author(s)

Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

Examples

## Not run: 
obj <- gl.read.vcf(system.file('extdata/test.vcf', package='dartR'))
## End(Not run)

Assigns an individual metric as pop in a genlight {adegenet} object

Description

Individuals are assigned to populations based on the individual/sample/specimen metrics file (csv) used with gl.read.dart().

One might want to define the population structure in accordance with another classification, such as using an individual metric (e.g. sex, male or female). This script discards the current population assignments and replaces them with new population assignments defined by a specified individual metric.

The script returns a genlight object with the new population assignments. Note that the original population assignments are lost.
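The effect of the reassignment can be pictured with a plain data frame. This sketch does not touch a genlight object and is not the gl.reassign.pop code; the metrics table and its column names are made up.

```r
# Hypothetical individual metrics, as they might be read from a metadata csv.
ind_metrics <- data.frame(
  id = c("ind1", "ind2", "ind3", "ind4"),
  pop = c("north", "north", "south", "south"),
  sex = c("male", "female", "female", "male"),
  stringsAsFactors = FALSE
)

# Current population assignment, stored as a factor.
pop_old <- factor(ind_metrics$pop)

# Replace it with the chosen individual metric; the old assignment is discarded.
as_pop <- "sex"
pop_new <- factor(ind_metrics[[as_pop]])

levels(pop_old)
levels(pop_new)
```

Keeping a copy of the original object is the only way to get the earlier assignment back.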

Usage

gl.reassign.pop(x, as.pop, verbose = NULL)

Arguments

x

Name of the genlight object containing SNP genotypes [required].

as.pop

Specify the name of the individual metric to set as the pop variable [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

A genlight object with the reassigned populations.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# SNP data
popNames(testset.gl)
gl <- gl.reassign.pop(testset.gl, as.pop='sex',verbose=3)
popNames(gl)
# Tag P/A data
popNames(testset.gs)
gs <- gl.reassign.pop(testset.gs, as.pop='sex',verbose=3)
popNames(gs)

Recalculates locus metrics when individuals or populations are deleted from a genlight {adegenet} object

Description

When individuals, or populations, are deleted from a genlight object, the locus metrics no longer apply. For example, the Call Rate may be different considering the subset of individuals, compared with the full set. This script recalculates those affected locus metrics, namely, avgPIC, CallRate, freqHets, freqHomRef, freqHomSnp, OneRatioRef, OneRatioSnp, PICRef and PICSnp. Metrics that remain unaltered are RepAvg and TrimmedSeq as they are unaffected by the removal of individuals.
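Why the metrics go stale is easy to see on a small genotype matrix. The sketch below recomputes only the call rate and a monomorphism flag in base R; it is not the gl.recalc.metrics code and the data are made up.

```r
# Hypothetical genotypes: 4 individuals (rows) x 3 loci (columns), NA = missing.
geno <- matrix(c(
  0,  1, NA,
  0, NA,  2,
  0,  0,  1,
  NA, 2,  0
), nrow = 4, byrow = TRUE,
dimnames = list(paste0("ind", 1:4), paste0("loc", 1:3)))

# Call rate per locus = proportion of individuals with a non-missing genotype.
call_rate <- function(g) colMeans(!is.na(g))

before <- call_rate(geno)

# Drop one individual; the stored metric no longer matches and must be recomputed.
subset_geno <- geno[rownames(geno) != "ind4", , drop = FALSE]
after <- call_rate(subset_geno)

# A locus is flagged if only one allele remains (all 0 or all 2)
# or if every value is missing.
monomorphic <- apply(subset_geno, 2, function(g) {
  g <- g[!is.na(g)]
  length(g) == 0 || all(g == 0) || all(g == 2)
})
```

In this example the first locus becomes monomorphic once the fourth individual is removed, which is the situation mono.rm is meant for.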

Usage

gl.recalc.metrics(x, mono.rm = FALSE, verbose = NULL)

Arguments

x

Name of the genlight object containing SNP genotypes [required].

mono.rm

If TRUE, removes monomorphic loci [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

The script optionally removes resultant monomorphic loci or loci with all values missing (using gl.filter.monomorphs.r).

The script returns a genlight object with the recalculated locus metadata.

Value

A genlight object with the recalculated locus metadata.

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

Examples

  gl <- gl.recalc.metrics(testset.gl, verbose=2)

Recodes individual (=specimen = sample) labels in a genlight object

Description

This function recodes individual labels and/or deletes individuals from a DArT genlight SNP file based on a lookup table provided as a csv file.

Usage

gl.recode.ind(x, ind.recode, recalc = FALSE, mono.rm = FALSE, verbose = NULL)

Arguments

x

Name of the genlight object [required].

ind.recode

Name of the csv file containing the individual relabelling [required].

recalc

If TRUE, recalculate the locus metadata statistics if any individuals are deleted in the filtering [default FALSE].

mono.rm

If TRUE, remove monomorphic loci [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

Renaming individuals may be required when there have been errors in labeling arising in the process from sample to sequence files. There may be occasions where renaming individuals is required for preparation of figures. When caution needs to be exercised because of the potential for breaking the 'chain of evidence' associated with the samples, recoding individuals using a recode table (csv) can provide a durable record of the changes.

The function works with genlight objects containing SNP genotypes and Tag P/A data (SilicoDArT).

The script returns a dartR genlight object with the new individual names and the recalculated locus metadata.

Value

A genlight or genind object with the recoded and reduced data.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

file <- system.file('extdata','testset_ind_recode.csv', package='dartR')
gl <- gl.recode.ind(testset.gl, ind.recode=file, verbose=3)

Recodes population assignments in a genlight object

Description

This function recodes population assignments and/or deletes populations from a DArT genlight object based on information provided in a csv population recode file.

Usage

gl.recode.pop(x, pop.recode, recalc = FALSE, mono.rm = FALSE, verbose = NULL)

Arguments

x

Name of the genlight object [required].

pop.recode

Name of the csv file containing the population reassignments [required].

recalc

If TRUE, recalculates the locus metadata statistics if any individuals are deleted in the filtering [default FALSE].

mono.rm

If TRUE, removes monomorphic loci [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

Individuals are assigned to populations based on the specimen metadata data file (csv) used with gl.read.dart(). Recoding can be used to amalgamate populations or to selectively delete or retain populations.

The population recode file contains a list of populations taken from the genlight object as the first column of the csv file, and the new population assignments are located in the second column of the csv file. The keyword 'Delete' used as a new population assignment will result in the associated specimen being dropped from the dataset.
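The two-column lookup described above can be sketched in base R. This is not the gl.recode.pop code; the recode table is built in memory instead of being read from a csv file, and all population names are made up.

```r
# Hypothetical recode table: first column = current name, second = new name.
recode <- data.frame(
  old = c("north", "south", "east"),
  new = c("mainland", "mainland", "Delete"),
  stringsAsFactors = FALSE
)

# Hypothetical current population assignment for six individuals.
pop <- c("north", "south", "east", "north", "east", "south")

# Look up the new name for every individual.
new_pop <- recode$new[match(pop, recode$old)]

# Individuals whose new population is 'Delete' are dropped.
keep <- new_pop != "Delete"
pop_recoded <- factor(new_pop[keep])
```

Here 'north' and 'south' are amalgamated into one population and the two 'east' individuals are removed.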

The function works with genlight objects containing SNP genotypes and Tag P/A data (SilicoDArT).

Value

A genlight object with the recoded and reduced data.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

mfile <- system.file('extdata', 'testset_pop_recode.csv', package='dartR')
nPop(testset.gl)
gl <- gl.recode.pop(testset.gl, pop.recode=mfile, verbose=3)

Renames a population in a genlight object

Description

Individuals are assigned to populations based on the specimen metadata data file (csv) used with gl.read.dart().

This script renames a nominated population.

The script returns a genlight object with the new population name.

Usage

gl.rename.pop(x, old = NULL, new = NULL, verbose = NULL)

Arguments

x

Name of the genlight object containing SNP genotypes [required].

old

Name of population to be changed [required].

new

New name for the population [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

A genlight object with the new population name.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

   gl <- gl.rename.pop(testset.gl, old='EmsubRopeMata', new='Outgroup')

Reports summary of base pair frequencies

Description

This script calculates the frequencies of the four DNA nucleotide bases: adenine (A), cytosine (C), guanine (G) and thymine (T), and the frequency of transitions (Ts) and transversions (Tv) in a DArT genlight object.

Usage

gl.report.bases(
  x,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required].

plot.out

If TRUE, histograms of base composition are produced [default TRUE].

plot_theme

Theme for the plot. See Details for options [default theme_dartR()].

plot_colors

List of two color names for the borders and fill of the plots [default two_colors].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity].

Details

The script checks first if trimmed sequences are included in the locus metadata (@other$loc.metrics$TrimmedSequence), and if so, tallies up the numbers of A, T, G and C bases. Only the reference state at the SNP locus is counted. Counts of transitions (Ts) and transversions (Tv) assume that there is no directionality, that is C->T is the same as T->C, because the reference state is arbitrary.
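The direction-free counting of transitions and transversions can be sketched in base R. This is not the gl.report.bases code; the SNP states are made up and written as reference>alternate purely for illustration.

```r
# Hypothetical SNP states written as reference>alternate.
snps <- c("C>T", "T>C", "A>G", "A>C", "G>T", "G>A")

# Sorting the two bases removes directionality, so C>T and T>C are the same class.
pair <- vapply(strsplit(snps, ">", fixed = TRUE),
               function(b) paste(sort(b), collapse = ""), character(1))

# Transitions are purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T);
# every other pair of bases is a transversion.
is_ts <- pair %in% c("AG", "CT")
ts <- sum(is_ts)
tv <- sum(!is_ts)
ts_tv_ratio <- ts / tv
```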

For presence/absence data (SilicoDArT), it is not possible to count transitions or transversions, or to calculate their ratio, because the SNP data are not available, only a single sequence tag.

Examples of other themes that can be used can be consulted in

Value

The unchanged genlight object

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# SNP data
out <- gl.report.bases(testset.gl)
# Tag P/A data
out <- gl.report.bases(testset.gs)

Reports summary of Call Rate for loci or individuals

Description

SNP datasets generated by DArT have missing values primarily arising from failure to call a SNP because of a mutation at one or both of the restriction enzyme recognition sites. P/A datasets (SilicoDArT) have missing values because it was not possible to call whether a sequence tag was amplified or not. This function tabulates the number of missing values as quantiles.
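The difference between the by-locus and by-individual views can be sketched on a small matrix in base R. This is not the gl.report.callrate code; the genotypes and the choice of quantile probabilities are made up.

```r
# Hypothetical genotypes: 5 individuals (rows) x 4 loci (columns), NA = missing.
geno <- matrix(c(
  0,  1, NA, 2,
  0, NA, NA, 1,
  1,  0,  2, 0,
  NA, 0,  1, 0,
  0,  2, NA, 1
), nrow = 5, byrow = TRUE)

# method = 'loc': proportion of individuals scored at each locus.
cr_loc <- colMeans(!is.na(geno))
# method = 'ind': proportion of loci scored in each individual.
cr_ind <- rowMeans(!is.na(geno))

# Summarise each distribution as quantiles.
probs <- c(0, 0.25, 0.5, 0.75, 1)
q_loc <- quantile(cr_loc, probs = probs)
q_ind <- quantile(cr_ind, probs = probs)
```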

Usage

gl.report.callrate(
  x,
  method = "loc",
  by_pop = FALSE,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  bins = 50,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required].

method

Specify the type of report by locus (method='loc') or individual (method='ind') [default 'loc'].

by_pop

Whether to report by population [default FALSE].

plot.out

Specify if plot is to be produced [default TRUE].

plot_theme

User specified theme [default theme_dartR()].

plot_colors

Vector with two color names for the borders and fill [default two_colors].

bins

Number of bins to display in histograms [default 50].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

This function expects a genlight object, containing either SNP data or SilicoDArT (=presence/absence data).

Plot themes can be obtained from:

Resultant ggplots and the tabulation are saved to the session's temporary directory.

Value

Returns unaltered genlight object

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# SNP data
test.gl <- testset.gl[1:20,]
gl.report.callrate(test.gl)
gl.report.callrate(test.gl,method='ind')
# Tag P/A data
test.gs <- testset.gs[1:20,]
gl.report.callrate(test.gs)
gl.report.callrate(test.gs,method='ind')

Calculates diversity indexes for SNPs

Description

This script takes a genlight object and calculates alpha and beta diversity for q = 0:2. Formulas are taken from Sherwin et al. 2017. The paper describes the relationship between the different q levels and how they relate to population genetic processes such as dispersal and selection.

Usage

gl.report.diversity(
  x,
  plot.out = TRUE,
  pbar = TRUE,
  table = "DH",
  plot_theme = theme_dartR(),
  plot_colors = discrete_palette,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required].

plot.out

Specify if plot is to be produced [default TRUE].

pbar

Report on progress. Silent if set to FALSE [default TRUE].

table

Prints a tabular output to the console: either 'D'=D values, or 'H'=H values, or 'DH','HD'=both, or 'N'=no table [default 'DH'].

plot_theme

Theme for the plot. See Details for options [default theme_dartR()].

plot_colors

A color palette or a list with as many colors as there are populations in the dataset [default discrete_palette].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity].

Details

For all indexes, the entropies (H) and corresponding effective numbers, i.e. Hill numbers (D), which reflect the number of needed entities to get the observed values, are calculated. In a nutshell, the alpha indexes between the different q-values should be similar if there is no deviation from expected allele frequencies and occurrences (e.g. all loci in HWE & equilibrium). If there is a deviation of an index, this links to a process causing it, such as dispersal, selection or strong drift. For a detailed explanation of all the indexes, we recommend resorting to the literature provided below. Confidence intervals are +/- 1 standard deviation.
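The link between an entropy (H) and its Hill number (D) can be sketched for a single locus. The formulas below are the commonly used definitions for q = 0, 1 and 2; it is an assumption that they match, term for term, what gl.report.diversity computes, and the allele frequencies are made up.

```r
# Hypothetical allele frequencies at one biallelic locus in one population.
p <- c(0.7, 0.3)

# q = 0: allelic richness; the entropy is the number of alleles minus one.
H0 <- length(p[p > 0]) - 1
D0 <- H0 + 1

# q = 1: Shannon entropy; the Hill number is its exponential.
H1 <- -sum(p[p > 0] * log(p[p > 0]))
D1 <- exp(H1)

# q = 2: heterozygosity (Gini-Simpson); the Hill number is 1 / (1 - H2).
H2 <- 1 - sum(p^2)
D2 <- 1 / (1 - H2)

round(c(D0 = D0, D1 = D1, D2 = D2), 3)
```

With equal allele frequencies all three Hill numbers coincide (here they would all equal 2); the more uneven the frequencies, the further D1 and D2 fall below D0.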

Function's output

If the function's parameter "table" = "DH" (the default value) is used, the output of the function is 20 tables.

The first two show the number of loci used. The name of each of the rest of the tables starts with three terms separated by underscores.

The first term refers to the q value (0 to 2).

The second term refers to whether it is the diversity measure (H) or its transformation to Hill numbers (D).

The third term refers to whether the diversity is calculated within populations (alpha) or between populations (beta).

In the case of alpha diversity tables, standard deviations have their own table, which finishes with a fourth term: "sd".

In the case of beta diversity tables, standard deviations are in the upper triangle of the matrix and diversity values are in the lower triangle of the matrix.
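The layout of the beta diversity tables can be unpacked with base R. The sketch below uses an invented 3 x 3 table (the population names and numbers are made up) and splits it into one row per population pair, with the diversity value taken from the lower triangle and its standard deviation from the upper triangle.

```r
# Toy beta-diversity table laid out as described above:
# lower triangle = diversity values, upper triangle = standard deviations.
# All numbers are invented for illustration.
pops <- c("popA", "popB", "popC")
beta <- matrix(
  c(NA,   0.01, 0.02,
    0.10, NA,   0.03,
    0.20, 0.30, NA),
  nrow = 3, byrow = TRUE, dimnames = list(pops, pops)
)

# Row/column indices of the lower triangle, one row per population pair
idx <- which(lower.tri(beta), arr.ind = TRUE)

pairs <- data.frame(
  pop1  = rownames(beta)[idx[, "row"]],
  pop2  = colnames(beta)[idx[, "col"]],
  value = beta[idx],        # lower triangle: diversity values
  sd    = t(beta)[idx]      # mirrored cell in the upper triangle: SD
)
print(pairs)
```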

Plots are saved to the temporary directory (tempdir) and can be accessed with the function gl.print.reports and listed with the function gl.list.reports. Note that they can be accessed only in the current R session because tempdir is cleared each time that the R session is closed.

Examples of other themes that can be used can be consulted in

Value

A list of entropy indexes for each level of q and equivalent numbers for alpha and beta diversity.

Author(s)

Bernd Gruber (Post to https://groups.google.com/d/forum/dartr), Contributors: William B. Sherwin, Alexander Sentinella

References

Sherwin, W.B., Chao, A., Jost, L., Smouse, P.E. (2017). Information Theory Broadens the Spectrum of Molecular Ecology and Evolution. TREE 32(12) 948-963. doi:10.1016/j.tree.2017.09.12

Examples

div <- gl.report.diversity(bandicoot.gl[1:10,1:100], table = FALSE, pbar=FALSE)
div$zero_H_alpha
div$two_H_beta
names(div)

Reports various statistics of genetic differentiation between populations with confidence intervals

Description

This function calculates four statistics of genetic differentiation between populations (see the "Details" section for further information).

Fst - Measure of the degree of genetic differentiation of subpopulations (Nei, 1987).
Fstp - Unbiased (i.e. corrected for sampling error, see explanation below) Fst (Nei, 1987).
Dest - Jost’s D (Jost, 2008).
Gst_H - Gst standardized by the maximum level that it can obtain for the observed amount of genetic variation (Hedrick 2005).

Sampling errors arise because allele frequencies in our samples differ from those in the subpopulations from which they were taken (Holsinger, 2012).

Confidence intervals are obtained using bootstrapping.

Usage

gl.report.fstat(
  x,
  nboots = 0,
  conf = 0.95,
  CI.type = "bca",
  ncpus = 1,
  plot.stat = "Fstp",
  plot.display = TRUE,
  palette.divergent = gl.colors("div"),
  font.size = 0.5,
  plot.dir = NULL,
  plot.file = NULL,
  verbose = NULL,
  ...
)

Arguments

x

Name of the genlight object containing the SNP data [required].

nboots

Number of bootstrap replicates to obtain confidence intervals [default 0].

conf

The confidence level of the required interval [default 0.95].

CI.type

Method to estimate confidence intervals. One of "norm", "basic", "perc" or "bca" [default "bca"].

ncpus

Number of processes to be used in parallel operation. If ncpus > 1, parallel operation is activated; see "Details" section [default 1].

plot.stat

Statistic to plot. One of "Fst", "Fstp", "Dest" or "Gst_H" [default "Fstp"].

plot.display

If TRUE, a heatmap of the chosen pairwise statistic is displayed in the plot window [default TRUE].

palette.divergent

A color palette function for the heatmap plot [default gl.colors("div")].

font.size

Size of font for the labels of horizontal and vertical axes of the heatmap [default 0.5].

plot.dir

Directory in which to save files [default working directory].

plot.file

Name for the RDS binary file to save (base name only, exclude extension) [default NULL].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity].

...

Parameters passed to function heatmap.2 (package gplots).

Details

Even though Fst and its relatives can predict evolutionary processes (Holsinger & Weir, 2009), they are not true measures of genetic differentiation in the sense that they are dependent on the diversity within populations (Meirmans & Hedrick, 2011), the number of populations analysed (Alcala & Rosenberg, 2017) and are not monotonic (Sherwin et al., 2017). Recent approaches have been developed to accommodate these mathematical restrictions (G'ST; "Gst_H"; Hedrick, 2005, and Jost's D; "Dest"; Jost, 2008). More recently, novel approaches based on information theory (Mutual Information; Sherwin et al., 2017) and allele frequencies (Allele Frequency Difference; Berner, 2019) have distinct properties that make them valuable resources to interpret genetic differentiation between populations.

Note that each measure of genetic differentiation has advantages and drawbacks, and the decision of using a particular measure is usually based on the research question.

Statistics calculated

The equations used to calculate the statistics are shown below.

Ho - Unbiased estimate of observed heterozygosity across subpopulations (Nei, 1987, pp. 164, eq. 7.38) is calculated as:
where Pkii represents the proportion of homozygote ii for allele i in subpopulation k and s represents the number of subpopulations.
Hs - Unbiased estimate of the expected heterozygosity under Hardy-Weinberg equilibrium across subpopulations (Nei, 1987, pp. 164, eq. 7.39) is calculated as:
where ñ is the harmonic mean of nk (the number of individuals in each subpopulation), and pki is the proportion (sometimes misleadingly called frequency) of allele i in subpopulation k.
Ht - Heterozygosity for the total population (Nei, 1987, pp. 164, eq. 7.40) is calculated as:
Dst - The average allele frequency differentiation between populations (Nei, 1987, pp. 163) is calculated as:
Htp - Unbiased estimate of heterozygosity for the total population (Nei, 1987, pp. 165) is calculated as:
Dstp - Unbiased estimate of the average allele frequency differentiation between populations (Nei, 1987, pp. 165)
Fst - Measure of the extent of genetic differentiation of subpopulations (Nei, 1987, pp. 162, eq. 7.34) is calculated as:
Fstp - Unbiased measure of the extent of genetic differentiation of subpopulations (Nei, 1987, pp. 163, eq. 7.36) is calculated as:
Dest - Jost’s D (Jost, 2008, eq. 12)
Gst-max - The maximum level that Gst can obtain for the observed amount of genetic variation (Hedrick 2005, eq. 4a) is calculated as:
Gst-H - Gst standardized by the maximum level that it can obtain for the observed amount of genetic variation (Hedrick 2005, eq. 4b) is calculated as:
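The statistics listed above can be sketched in base R for a single biallelic locus. The code below is an illustration only: it uses the standard textbook forms of Nei's (1987) estimators and Hedrick's (2005) maximum, written with the symbols defined above (s, nk, ñ, pki), and it is not taken from the dartR source. The helper name nei_one_locus is hypothetical, treating Gst as the Fst defined above is an assumption, and Dest is left out because its exact estimator is not spelled out here.

```r
# Hypothetical helper, not dartR code: Nei's (1987) statistics for ONE
# biallelic locus. `g` holds genotypes coded 0, 1, 2 (copies of the
# alternative allele); `pop` gives the subpopulation of each individual.
nei_one_locus <- function(g, pop) {
  keep <- !is.na(g)
  g    <- g[keep]
  pop  <- factor(pop[keep])
  s    <- nlevels(pop)                      # number of subpopulations
  n_k  <- as.numeric(table(pop))            # individuals per subpopulation
  n_h  <- s / sum(1 / n_k)                  # harmonic mean of n_k
  q_k  <- tapply(g, pop, mean) / 2          # alternative allele proportion
  p_ki <- rbind(ref = 1 - q_k, alt = q_k)   # alleles x subpopulations

  Ho <- 1 - sum(tapply(g != 1, pop, mean)) / s            # eq. 7.38
  Hs <- n_h / (n_h - 1) *
    (1 - sum(rowMeans(p_ki^2)) - Ho / (2 * n_h))          # eq. 7.39
  Ht <- 1 - sum(rowMeans(p_ki)^2) +
    Hs / (n_h * s) - Ho / (2 * n_h * s)                   # eq. 7.40

  Dst     <- Ht - Hs
  Dstp    <- s / (s - 1) * Dst
  Htp     <- Hs + Dstp
  Gst_max <- (s - 1) * (1 - Hs) / (s - 1 + Hs)            # Hedrick 2005, eq. 4a

  c(Ho = Ho, Hs = Hs, Ht = Ht, Dst = Dst, Dstp = Dstp, Htp = Htp,
    Fst = Dst / Ht, Fstp = Dstp / Htp,
    Gst_max = Gst_max,
    Gst_H = (Dst / Ht) / Gst_max)   # assumption: Gst taken as Fst above
}

# Two invented subpopulations of four individuals each
g   <- c(0, 0, 1, 2,  2, 2, 1, 2)
pop <- rep(c("A", "B"), each = 4)
res <- nei_one_locus(g, pop)
print(round(res, 4))
```

In a real analysis the components would be combined across many loci before the ratios are taken; that step is not shown.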

Confidence Intervals

The uncertainty of a parameter, in this case the mean of the statistic, can be summarised by a confidence interval (CI) which includes the true parameter value with a specified probability (i.e. confidence level; the parameter "conf" in this function).

In this function, CI are obtained using the bootstrap, an inference method that resamples the data (i.e. loci) with replacement and recalculates the statistics each time.

This function uses the function boot (package boot) to perform the bootstrap replicates and the function boot.ci (package boot) to perform the calculations for the CI.

Four different types of nonparametric CI can be calculated (parameter "CI.type" in this function):

First order normal approximation interval ("norm").
Basic bootstrap interval ("basic").
Bootstrap percentile interval ("perc").
Adjusted bootstrap percentile interval ("bca").
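The four interval types can be tried directly with the boot package. The sketch below resamples an invented vector of per-locus values (standing in for a statistic such as Fst) and takes the mean over loci in each replicate; the data and the helper name mean_over_loci are assumptions made for illustration, not the internals of this function.

```r
library(boot)  # recommended package distributed with R

set.seed(42)
# Invented per-locus values standing in for a statistic such as Fst
locus_stat <- rbeta(200, shape1 = 2, shape2 = 20)

# boot() calls the statistic with the data and a vector of resampled indices;
# each bootstrap replicate is the mean over a resampled set of loci
mean_over_loci <- function(values, idx) mean(values[idx])

bs <- boot(data = locus_stat, statistic = mean_over_loci, R = 1000)
ci <- boot.ci(bs, conf = 0.95, type = c("norm", "basic", "perc", "bca"))
print(ci)

# For "basic", "perc" and "bca" the limits are columns 4 and 5 of the
# corresponding component; for "norm" they are columns 2 and 3
bca_limits  <- ci$bca[1, 4:5]
norm_limits <- ci$normal[1, 2:3]
```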

The studentized bootstrap interval ("stud") was not included in the CI types because it is computationally intensive, it may produce estimates outside the range of plausible values and it has been found to be erratic in practice; see for example the "Studentized (t) Intervals" section in:

www.r-bloggers.com/2019/09/understanding-bootstrap-confidence-interval-output-from-the-r-boot-package

Nice tutorials about the different types of CI can be found at:

https://www.datacamp.com/tutorial/bootstrap-r

Efron and Tibshirani (1993, p. 162) and Davison and Hinkley (1997, p. 194) suggest that the number of bootstrap replicates should be between 1000 and 2000.

It is important to note that unreliable confidence intervals will be obtained if too few bootstrap replicates are used. Therefore, the function boot.ci will throw warnings and errors if bootstrap replicates are too few. Consider increasing the number of bootstrap replicates to at least 200.

The "bca" interval is often cited as the best for theoretical reasons; however, it may produce unstable results if the bootstrap distribution is skewed or has extreme values. For example, you might get the warning "extreme order statistics used as endpoints" or the error "estimated adjustment 'a' is NA". In this case, you may want to use more bootstrap replicates or a different method or check your data for outliers.

The error "estimated adjustment 'w' is infinite" means that the estimated adjustment ‘w’ for the "bca" interval is infinite, which can happen when the empirical influence values are zero or very close to zero. This can be caused by various reasons, such as:

The number of bootstrap replicates is too small, the statistic of interest is constant or nearly constant across the bootstrap samples, the data contains outliers or extreme values.

You can try some possible solutions, such as:

Increasing the number of bootstrap replicates, using a different type of bootstrap confidence interval or removing or transforming the outliers or extreme values.

Plotting

The plot can be customised by including any parameter(s) from the function heatmap.2 (package gplots).

For the color palette you could try for example:

> library(viridis)

> res <- gl.report.fstat(platypus.gl, palette.divergent = viridis)

If a plot.file is given, the plot arising from this function is saved as an "RDS" binary file using the function saveRDS (package base); it can be reloaded with the function readRDS (package base). A file name must be specified for the plot to be saved.

If a plot directory (plot.dir) is specified, the binary file is saved to that directory; otherwise to the tempdir().
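The save and reload round trip can be sketched with base R alone. The object, the base name "fstat_heatmap" and the ".RDS" extension below are placeholders chosen for illustration; how the function itself names the file is not shown here.

```r
# A plot object would be handled the same way; a plain list is used here as
# a stand-in so that the sketch needs no extra packages
plot_object <- list(stat = "Fstp", values = matrix(1:4, 2, 2))

plot_file <- "fstat_heatmap"   # hypothetical base name, no extension
rds_path  <- file.path(tempdir(), paste0(plot_file, ".RDS"))

saveRDS(plot_object, rds_path)   # write the binary file
reloaded <- readRDS(rds_path)    # read it back

identical(plot_object, reloaded)
```

Files written to tempdir() are removed when the R session ends, so a permanent directory is the safer choice for anything that needs to be kept.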

Your plot might not be shown in full because your 'Plots' pane is too small (in RStudio). Increase the size of the 'Plots' pane before running the function. Alternatively, use the parameter 'plot.file' to save the plot to a file.

Parallelisation

If the parameter ncpus > 1, parallelisation is enabled. In Windows, parallel computing employs a "socket" approach that starts new copies of R on each core. POSIX systems, on the other hand (Mac, Linux, Unix, and BSD), utilise a "forking" approach that replicates the whole current version of R and transfers it to a new core.

Opening and terminating R sessions in each core involves a significant amount of processing time; therefore, parallelisation in Windows machines is only quicker than not using parallelisation when nboots > 1000-2000.
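The two approaches correspond to the "snow" (socket) and "multicore" (fork) options of the function boot. The sketch below picks one according to the operating system; the data are invented, and whether this function forwards ncpus to boot in exactly this way is an assumption.

```r
library(boot)

set.seed(1)
locus_stat <- runif(100)   # invented per-locus values
mean_over_loci <- function(values, idx) mean(values[idx])

# "snow" starts fresh R processes (sockets); "multicore" forks the session
backend <- if (.Platform$OS.type == "windows") "snow" else "multicore"

bs <- boot(locus_stat, mean_over_loci, R = 2000,
           parallel = backend, ncpus = 2)
length(bs$t)  # one value per bootstrap replicate
```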

Value

Two lists: the first contains matrices with the genetic statistics taken pairwise by population; the second contains tables with the genetic statistics for each pair of populations. If nboots > 0, the tables also report the four statistics with the lower (LCI) and upper (HCI) limits of their confidence intervals.

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

References

Alcala, N., & Rosenberg, N. A. (2017). Mathematical constraints on FST: Biallelic markers in arbitrarily many populations. Genetics (206), 1581-1600.
Berner, D. (2019). Allele frequency difference AFD–an intuitive alternative to FST for quantifying genetic population differentiation. Genes, 10(4), 308.
Davison AC, Hinkley DV (1997). Bootstrap Methods and their Application. Cambridge University Press: Cambridge.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics 7, 1–26.
Efron B, Tibshirani RJ (1993). An Introduction to the Bootstrap. Chapman and Hall: London.
Hedrick, P. W. (2005). A standardized genetic differentiation measure. Evolution, 59(8), 1633-1638.
Holsinger, K. E. (2012). Lecture notes in population genetics.
Holsinger, K. E., & Weir, B. S. (2009). Genetics in geographically structured populations: defining, estimating and interpreting FST. Nature Reviews Genetics, 10(9), 639-650.
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015-4026.
Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: FST and related measures. Molecular Ecology Resources, 11(1), 5-18.
Nei, M. (1987). Molecular evolutionary genetics: Columbia University Press.
Sherwin, W. B., Chao, A., Jost, L., & Smouse, P. E. (2017). Information theory broadens the spectrum of molecular ecology and evolution. Trends in Ecology & Evolution, 32(12), 948-963.

Examples

res <- gl.report.fstat(platypus.gl)

Calculates the pairwise Hamming distance between DArT trimmed DNA sequences

Description

Hamming distance is calculated as the number of base differences between two sequences, which can be expressed as a count or a proportion. Typically, it is calculated between two sequences of equal length. In the context of DArT trimmed sequences, which differ in length but which are anchored to the left by the restriction enzyme recognition sequence, it is sensible to compare the two trimmed sequences starting from immediately after the common recognition sequence and terminating at the last base of the shorter sequence.

Usage

gl.report.hamming(
  x,
  rs = 5,
  threshold = 3,
  taglength = 69,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  probar = FALSE,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP data [required].

rs

Number of bases in the restriction enzyme recognition sequence [default 5].

threshold

Minimum acceptable base pair difference for display on the boxplot and histogram [default 3].

taglength

Typical length of the sequence tags [default 69].

plot.out

Specify if plot is to be produced [default TRUE].

plot_theme

Theme for the plot. See Details for options [default theme_dartR()].

plot_colors

List of two color names for the borders and fill of the plots [default two_colors].

probar

If TRUE, then a progress bar is displayed on long loops [default FALSE].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

The function gl.filter.hamming will filter out one of two loci if their Hamming distance is less than a specified percentage.

Hamming distance can be computed by exploiting the fact that the dot product of two binary vectors x and (1-y) counts the positions at which x is 1 and y is 0; adding the dot product of (1-x) and y gives the number of elements that differ between x and y. This approach can also be used for vectors that contain more than two possible values at each position (e.g. A, C, T or G).

If a pair of DNA sequences are of differing length, the longer is truncated.
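The comparison rule and the dot-product idea can both be sketched in base R. The helpers below are hypothetical illustrations, not utils.hamming: hamming_matrix skips the first rs bases and truncates each pair to the shorter sequence, while hamming_dot shows the indicator-matrix form for sequences of equal length. The sequences are invented.

```r
# Hypothetical helper: pairwise Hamming distances between trimmed sequences,
# compared after the first `rs` bases and over the shorter length of each pair
hamming_matrix <- function(seqs, rs = 5) {
  n <- length(seqs)
  d <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      last <- min(nchar(seqs[i]), nchar(seqs[j]))   # end of shorter sequence
      a <- strsplit(substr(seqs[i], rs + 1, last), "")[[1]]
      b <- strsplit(substr(seqs[j], rs + 1, last), "")[[1]]
      d[i, j] <- d[j, i] <- sum(a != b)
    }
  }
  d
}

# Dot-product form for sequences of equal length: one 0/1 indicator matrix
# per base; X %*% t(1 - X) counts positions where sequence i carries the base
# and sequence j does not, and summing over bases gives the Hamming distance
hamming_dot <- function(seqs) {
  chars <- do.call(rbind, strsplit(seqs, ""))
  d <- matrix(0, nrow(chars), nrow(chars))
  for (base in unique(as.vector(chars))) {
    X <- (chars == base) * 1
    d <- d + X %*% t(1 - X)
  }
  unname(d)
}

seqs <- c("TGCAGAAAAC", "TGCAGAATAC", "TGCAGCATACGG")  # invented tags
d_trimmed <- hamming_matrix(seqs, rs = 5)
equal_len <- substr(seqs, 1, 10)
d_dot     <- hamming_dot(equal_len)
print(d_trimmed)
```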

The algorithm is that of Johann de Jong (https://johanndejong.wordpress.com/2015/10/02/faster-hamming-distance-in-r-2/) as implemented in utils.hamming.

Plots and table are saved to the session's temporary directory (tempdir).

Examples of other themes that can be used can be consulted in

Value

Returns unaltered genlight object

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

gl.report.hamming(testset.gl[,1:100])
gl.report.hamming(testset.gs[,1:100])
# SNP data
test <- platypus.gl
test <- gl.subsample.loci(platypus.gl,n=50)
result <- gl.filter.hamming(test, threshold=0.25, verbose=3)

Reports observed, expected and unbiased heterozygosities and FIS (inbreeding coefficient) by population or by individual from SNP data

Description

Calculates the observed, expected and unbiased expected (i.e. corrected for sample size) heterozygosities and FIS (inbreeding coefficient) for each population or the observed heterozygosity for each individual in a genlight object.

Usage

gl.report.heterozygosity(
  x,
  method = "pop",
  n.invariant = 0,
  nboots = 0,
  conf = 0.95,
  CI.type = "bca",
  ncpus = 1,
  plot.display = TRUE,
  plot.theme = theme_dartR(),
  plot.colors.pop = gl.colors("dis"),
  plot.colors.ind = gl.colors(2),
  error.bar = "SD",
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP data [required].

method

Calculate heterozygosity by population (method='pop') or by individual (method='ind') [default 'pop'].

n.invariant

An estimate of the number of invariant sequence tags used to adjust the heterozygosity rate [default 0].

nboots

Number of bootstrap replicates to obtain confidence intervals [default 0].

conf

The confidence level of the required interval [default 0.95].

CI.type

Method to estimate confidence intervals. One of "norm", "basic", "perc" or "bca" [default "bca"].

ncpus

Number of processes to be used in parallel operation. If ncpus > 1, parallel operation is activated; see "Details" section [default 1].

plot.display

Specify if plot is to be produced [default TRUE].

plot.theme

Theme for the plot. See Details for options [default theme_dartR()].

plot.colors.pop

A color palette for population plots or a list with as many colors as there are populations in the dataset [default gl.colors("dis")].

plot.colors.ind

List of two color names for the borders and fill of the plot by individual [default gl.colors(2)].

error.bar

Statistic to be plotted as error bar: either "SD" (standard deviation), "SE" (standard error) or "CI" (confidence intervals) [default "SD"].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity].

Details

Observed heterozygosity for a population takes the proportion of heterozygous loci for each individual then averages over the individuals in that population. The calculations take into account missing values. Expected heterozygosity for a population takes the expected proportion of heterozygotes, that is, expected under Hardy-Weinberg equilibrium, for each locus, then averages this across the loci for an average estimate for the population.

Expected heterozygosity is calculated using the correction for sample size following equation 2 from Nei 1978.

Observed heterozygosity for individuals is calculated as the proportion of loci that are heterozygous for that individual.

Finally, the number of loci that are invariant across all individuals in the dataset (that is, across populations) is typically unknown. This can render estimates of heterozygosity analysis-specific, and so it is not valid to compare such estimates across species or even across different analyses. This is a similar problem to that faced by microsatellites. If you have an estimate of the number of invariant sequence tags (loci) in your data, such as provided by gl.report.secondaries, you can specify it with the n.invariant parameter to standardize your estimates of heterozygosity.

NOTE: It is important to realise that estimation of adjusted heterozygosity requires that secondaries not be removed.

Heterozygosities and FIS (inbreeding coefficient) are calculated by locus within each population using the following equations:

Observed heterozygosity (Ho) = number of heterozygotes / n_Ind, where n_Ind is the number of individuals without missing data.
Observed heterozygosity adjusted (Ho.adj) = Ho * n_Loc / (n_Loc + n.invariant), where n_Loc is the number of loci that do not have all missing data and n.invariant is an estimate of the number of invariant loci to adjust heterozygosity.
Expected heterozygosity (He) = 1 - (p^2 + q^2), where p is the frequency of the reference allele and q is the frequency of the alternative allele.
Expected heterozygosity adjusted (He.adj) = He * n_Loc / (n_Loc + n.invariant)
Unbiased expected heterozygosity (uHe) = He * (2 * n_Ind / (2 * n_Ind - 1))
Inbreeding coefficient (FIS) = 1 - (mean(Ho) / mean(uHe))
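The equations above can be followed step by step in base R. The sketch below uses an invented genotype matrix for a single population and an invented value of n.invariant; applying the adjustment to the means across loci is an assumption made to keep the example short, and this is not the function's own code.

```r
# Toy genotypes for one population: rows are individuals, columns are loci,
# coded 0 (reference homozygote), 1 (heterozygote), 2 (alternative homozygote)
g <- matrix(
  c(0, 1, 2,
    1, 1, 2,
    0, 2, NA,
    1, 0, 2),
  ncol = 3, byrow = TRUE
)
n.invariant <- 3    # invented estimate of the number of invariant loci

n_Ind <- colSums(!is.na(g))                      # individuals scored per locus
n_Loc <- sum(n_Ind > 0)                          # loci that are not all missing
Ho    <- colSums(g == 1, na.rm = TRUE) / n_Ind   # heterozygotes / n_Ind
q     <- colSums(g, na.rm = TRUE) / (2 * n_Ind)  # alternative allele frequency
p     <- 1 - q                                   # reference allele frequency
He    <- 1 - (p^2 + q^2)
uHe   <- He * (2 * n_Ind / (2 * n_Ind - 1))
FIS   <- 1 - mean(Ho) / mean(uHe)

# Adjustment for invariant loci, applied here to the means across loci
adj    <- n_Loc / (n_Loc + n.invariant)
Ho.adj <- mean(Ho) * adj
He.adj <- mean(He) * adj

round(c(Ho = mean(Ho), He = mean(He), uHe = mean(uHe), FIS = FIS,
        Ho.adj = Ho.adj, He.adj = He.adj), 4)
```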

Function's output

Output for method='pop' is an ordered barchart of observed heterozygosity, unbiased expected heterozygosity and FIS (inbreeding coefficient) across populations together with a table of mean observed and expected heterozygosities and FIS by population and their respective standard deviations (SD).

In the output, it is also reported by population: the number of loci used to estimate heterozygosity (n.Loc), the number of polymorphic loci (polyLoc), the number of monomorphic loci (monoLoc) and loci with all missing data (all_NALoc).

Output for method='ind' is a histogram and a boxplot of heterozygosity across individuals.

Plots and table are saved to the session temporary directory (tempdir).

Examples of other themes that can be used can be consulted in

Error bars

The best method for presenting or assessing genetic statistics depends on the type of data you have and the specific questions you're trying to answer. Here's a brief overview of when you might use each method:

1. Confidence Intervals ("CI"):

- Usage: Often used to convey the precision of an estimate.

- Advantage: Confidence intervals give a range in which the true parameter (like a population mean) is likely to fall, given the data and a specified probability (like 95%).

- In Context: For genetic statistics, if you're estimating a parameter, a 95% confidence interval gives a range in which the true parameter likely lies.

2. Standard Deviation ("SD"):

- Usage: Describes the amount of variation from the average in a set of data.

- Advantage: Allows for an understanding of the spread of individual datapoints around the mean.

- In Context: If you're looking at the distribution of a quantitative trait (like height) in a population with a particular genotype, the SD can describe how much individual heights vary around the average height.

3. Standard Error ("SE"):

- Usage: Describes the precision of the sample mean as an estimate of the population mean.

- Advantage: Smaller than the SD in large samples; it takes into account both the SD and the sample size.

- In Context: If you want to know how accurately your sample mean represents the population mean, you'd look at the SE.

Recommendation:

- If you're trying to convey the precision of an estimate, confidence intervals are very useful.

- For understanding variability within a sample, standard deviation is key.

- To see how well a sample mean might estimate a population mean, consider the standard error.

In practice, geneticists often use a combination of these methods to analyze and present their data, depending on their research questions and the nature of the data.
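The three quantities can be compared on a small invented sample. In the sketch below the per-locus values are made up, and the interval is a simple t-based one shown only for comparison, since this function obtains its CI by bootstrapping.

```r
# Invented per-locus observed heterozygosities for one population
Ho <- c(0.12, 0.30, 0.25, 0.18, 0.22, 0.35, 0.10, 0.28)

n  <- length(Ho)
SD <- sd(Ho)            # spread of the per-locus values around their mean
SE <- SD / sqrt(n)      # precision of the mean as an estimate
# t-based 95% interval for the mean (comparison only, not a bootstrap CI)
CI <- mean(Ho) + c(-1, 1) * qt(0.975, df = n - 1) * SE

round(c(mean = mean(Ho), SD = SD, SE = SE, lower = CI[1], upper = CI[2]), 4)
```

SE shrinks as more loci are added while SD does not, which is why error bars based on SE or CI are usually much narrower than those based on SD.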

Confidence Intervals

In this function, CI are obtained using the bootstrap, an inference method that resamples the data (i.e. loci) with replacement and recalculates the statistics each time.

This function uses the function boot (package boot) to perform the bootstrap replicates and the function boot.ci (package boot) to perform the calculations for the CI.

Four different types of nonparametric CI can be calculated (parameter "CI.type" in this function):

First order normal approximation interval ("norm").
Basic bootstrap interval ("basic").
Bootstrap percentile interval ("perc").
Adjusted bootstrap percentile interval ("bca").

For background on the output of boot.ci, see:

www.r-bloggers.com/2019/09/understanding-bootstrap-confidence-interval-output-from-the-r-boot-package

Efron and Tibshirani (1993, p. 162) and Davison and Hinkley (1997, p. 194) suggest that the number of bootstrap replicates should be between 1000 and 2000.

Warnings and errors from the function boot.ci can be caused by various reasons, such as:

The number of bootstrap replicates is too small, the statistic of interest is constant or nearly constant across the bootstrap samples, the data contains outliers or extreme values.

You can try some possible solutions, such as:

Increasing the number of bootstrap replicates, using a different type ofbootstrap confidence interval or removing or transforming the outliers orextreme values.

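Parallelisation

If the parameter ncpus > 1, parallelisation is enabled. In Windows, parallel computing employs a "socket" approach that starts new copies of R on each core. POSIX systems, on the other hand (Mac, Linux, Unix, and BSD), utilise a "forking" approach that replicates the whole current version of R and transfers it to a new core.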

Value

A dataframe containing population labels, heterozygosities, FIS, their standard deviations and sample sizes

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

References

Nei, M. (1978). Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics, 89(3), 583-590.

Examples

require("dartR.data")
df <- gl.report.heterozygosity(platypus.gl)
df <- gl.report.heterozygosity(platypus.gl,method='ind')
n.inv <- gl.report.secondaries(platypus.gl)
gl.report.heterozygosity(platypus.gl, n.invariant = n.inv[7, 2])
df <- gl.report.heterozygosity(platypus.gl)

Reports departure from Hardy-Weinberg proportions

Description

Calculates the probabilities of agreement with H-W proportions based on observed frequencies of reference homozygotes, heterozygotes and alternate homozygotes.

Usage

gl.report.hwe(
  x,
  subset = "each",
  method_sig = "Exact",
  multi_comp = FALSE,
  multi_comp_method = "BY",
  alpha_val = 0.05,
  pvalue_type = "midp",
  cc_val = 0.5,
  sig_only = TRUE,
  min_sample_size = 5,
  plot.out = TRUE,
  plot_colors = two_colors_contrast,
  max_plots = 4,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP data [required].

subset

Way to group individuals to perform H-W tests. Either a vector with population names, 'each' or 'all' (see Details) [default 'each'].

method_sig

Method for determining statistical significance: 'ChiSquare' or 'Exact' [default 'Exact'].

multi_comp

Whether to adjust p-values for multiple comparisons [default FALSE].

multi_comp_method

Method to adjust p-values for multiple comparisons: 'holm', 'hochberg', 'hommel', 'bonferroni', 'BH', 'BY', 'fdr' (see Details) [default 'BY'].

alpha_val

Level of significance for testing [default 0.05].

pvalue_type

Type of p-value to be used in the Exact method. Either 'dost', 'selome' or 'midp' (see Details) [default 'midp'].

cc_val

The continuity correction applied to the ChiSquare test [default 0.5].

sig_only

Whether the returned table should include only loci with a significant departure from Hardy-Weinberg proportions [default TRUE].

min_sample_size

Minimum number of individuals per population in which to perform H-W tests [default 5].

plot.out

If TRUE, will produce Ternary Plot(s) [default TRUE].

plot_colors

Vector with two color names for the significant and not-significant loci [default two_colors_contrast].

max_plots

Maximum number of plots to print per page [default 4].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity].

Details

Populations can be defined in three ways:

Merging all populations in the dataset using subset = 'all'.
Within each population separately using: subset = 'each'.
Within selected populations using for example: subset = c('pop1','pop2').

Two different statistical methods are available to test for deviations from Hardy-Weinberg proportions:

The classical chi-square test (method_sig='ChiSquare') based on the function HWChisq of the R package HardyWeinberg. By default a continuity correction is applied (cc_val=0.5). The continuity correction can be turned off (by specifying cc_val=0), for example in cases of extreme allele frequencies in which the continuity correction can lead to excessive type 1 error rates.
The exact test (method_sig='Exact') based on the exact calculations contained in the function HWExactStats of the R package HardyWeinberg, and described in Wigginton et al. (2005). The exact test is recommended in most cases (Wigginton et al., 2005). Three different methods to estimate p-values (pvalue_type) in the Exact test can be used:
- 'dost' p-value is computed as twice the tail area of a one-sided test.
- 'selome' p-value is computed as the sum of the probabilities of all samples less or equally likely as the current sample.
- 'midp' p-value is computed as half the probability of the current sample + the probabilities of all samples that are more extreme.
The standard exact p-value is overly conservative, in particular for small minor allele frequencies. The mid p-value ameliorates this problem by bringing the rejection rate closer to the nominal level, at the price of occasionally exceeding the nominal level (Graffelman & Moreno, 2013).
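The effect of the continuity correction in the chi-square test can be seen with a few lines of base R. The helper hw_chisq below is a simplified, hypothetical stand-in written for this illustration; it is not HWChisq, and the genotype counts are invented.

```r
# Hypothetical helper, not HWChisq itself: chi-square test of Hardy-Weinberg
# proportions for one biallelic locus, with an optional continuity correction
hw_chisq <- function(nAA, nAB, nBB, cc = 0.5) {
  n <- nAA + nAB + nBB
  p <- (2 * nAA + nAB) / (2 * n)            # frequency of allele A
  q <- 1 - p
  observed <- c(AA = nAA, AB = nAB, BB = nBB)
  expected <- n * c(AA = p^2, AB = 2 * p * q, BB = q^2)
  chisq <- sum((abs(observed - expected) - cc)^2 / expected)
  c(chisq = chisq, pval = pchisq(chisq, df = 1, lower.tail = FALSE))
}

in_hwe     <- hw_chisq(25, 50, 25, cc = 0)  # counts in exact H-W proportions
deficit    <- hw_chisq(40, 20, 40, cc = 0)  # strong deficit of heterozygotes
deficit_cc <- hw_chisq(40, 20, 40)          # same counts, default correction
rbind(in_hwe, deficit, deficit_cc)
```

The correction lowers the test statistic for the same counts, which makes the test slightly more conservative.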

Correction for multiple tests can be applied using the following methods based on the function p.adjust:

'holm' is also known as the sequential Bonferroni technique (Rice, 1989). This method has a greater statistical power than the standard Bonferroni test; however, this method becomes very stringent when many tests are performed and many real deviations from the null hypothesis can go undetected (Waples, 2015).
'hochberg' based on Hochberg, 1988.
'hommel' based on Hommel, 1988. This method is more powerful than Hochberg's, but the difference is usually small.
'bonferroni' in which p-values are multiplied by the number of tests. This method is very stringent and therefore has reduced power to detect multiple departures from the null hypothesis.
'BH' (alias 'fdr') based on Benjamini & Hochberg, 1995.
'BY' based on Benjamini & Yekutieli, 2001.
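The methods can be compared side by side by calling p.adjust on the same set of p-values. The p-values below are invented and simply stand in for per-locus H-W test results.

```r
# Invented per-locus p-values from H-W tests
pvals   <- c(0.001, 0.008, 0.020, 0.041, 0.300)
methods <- c("bonferroni", "holm", "hochberg", "hommel", "BH", "BY")

# One column of adjusted p-values per correction method
adjusted <- sapply(methods, function(m) p.adjust(pvals, method = m))
rownames(adjusted) <- paste0("locus", seq_along(pvals))
round(adjusted, 3)

# Number of loci still significant at alpha_val = 0.05 under each method
n_sig <- colSums(adjusted < 0.05)
n_sig
```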

Ternary plots

Ternary plots can be used to visualise patterns of H-W proportions (plot.out = TRUE). P-values and the statistical (non)significance of a large number of bi-allelic markers can be inferred from their position in a ternary plot. See Graffelman & Morales-Camarena (2008) for further details. Ternary plots are based on the function HWTernaryPlot from the package HardyWeinberg. Each vertex of the Ternary plot represents one of the three possible genotypes for SNP data: homozygous for the reference allele (AA), heterozygous (AB) and homozygous for the alternative allele (BB). Loci deviating significantly from Hardy-Weinberg proportions after correction for multiple tests are shown in pink. The blue parabola represents Hardy-Weinberg equilibrium, and the area between green lines represents the acceptance region.

For these plots to work it is necessary to install the package ggtern.

Value

A dataframe containing loci, counts of reference SNP homozygotes, heterozygotes and alternate SNP homozygotes; probability of departure from H-W proportions, per locus significance with and without correction for multiple comparisons and the number of populations where the same locus is significantly out of HWE.

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

References

Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1165–1188.
Graffelman, J. (2015). Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64:1-23.
Graffelman, J. & Morales-Camarena, J. (2008). Graphical tests for Hardy-Weinberg equilibrium based on the ternary plot. Human Heredity 65:77-84.
Graffelman, J., & Moreno, V. (2013). The mid p-value in exact tests for Hardy-Weinberg equilibrium. Statistical applications in genetics and molecular biology, 12(4), 433-448.
Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800–803.
Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika, 75, 383–386.
Rice, W. R. (1989). Analyzing tables of statistical tests. Evolution, 43(1), 223-225.
Waples, R. S. (2015). Testing for Hardy–Weinberg proportions: have we lost the plot?. Journal of heredity, 106(1), 1-19.
Wigginton, J.E., Cutler, D.J., & Abecasis, G.R. (2005). A Note on Exact Tests of Hardy-Weinberg Equilibrium. American Journal of Human Genetics 76:887-893.

Calculates pairwise linkage disequilibrium by population

Description

This function calculates pairwise linkage disequilibrium (LD) by population using the function ld (package snpStats).

If SNPs are not mapped to a reference genome, the parameter ld_max_pairwise should be set as NULL (the default). In this case, the function will assign the same chromosome ("1") to all the SNPs in the dataset and assign a sequence from 1 to n loci as the position of each SNP. The function will then calculate LD for all possible SNP pair combinations.

If SNPs are mapped to a reference genome, the parameter ld_max_pairwise should be filled out (i.e. not NULL). In this case, the information for SNP's position should be stored in the genlight accessor "@position" and the SNP's chromosome name in the accessor "@chromosome" (see examples). The function will then calculate LD within each chromosome and for all possible SNP pair combinations within a distance of ld_max_pairwise.
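Which SNP pairs are compared in the mapped case can be illustrated with base R. The map below is invented, and treating the distance limit as inclusive is an assumption; the sketch only enumerates the pairs and does not compute LD itself.

```r
# Toy map: chromosome and position of six SNPs (invented values)
snp_map <- data.frame(
  snp        = paste0("snp", 1:6),
  chromosome = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr2"),
  position   = c(100, 5000, 90000, 200, 1200, 2500)
)
ld_max_pairwise <- 10000   # maximum distance in base pairs

# All SNP pairs, then keep those on the same chromosome and close enough
idx      <- t(combn(nrow(snp_map), 2))
same_chr <- snp_map$chromosome[idx[, 1]] == snp_map$chromosome[idx[, 2]]
distance <- abs(snp_map$position[idx[, 1]] - snp_map$position[idx[, 2]])
keep     <- same_chr & distance <= ld_max_pairwise

pairs <- data.frame(
  snp1     = snp_map$snp[idx[keep, 1]],
  snp2     = snp_map$snp[idx[keep, 2]],
  distance = distance[keep]
)
print(pairs)

# Unmapped case: every SNP on one chromosome, so all pairs are compared
n_all_pairs <- choose(nrow(snp_map), 2)
```

Restricting the comparison to nearby pairs keeps the number of LD calculations manageable, since the number of all possible pairs grows with the square of the number of SNPs.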

Usage

gl.report.ld.map(  x,  ld_max_pairwise = NULL,  maf = 0.05,  ld_stat = "R.squared",  ind.limit = 10,  stat_keep = "AvgPIC",  ld_threshold_pops = 0.2,  plot.out = TRUE,  plot_theme = NULL,  histogram_colors = NULL,  boxplot_colors = NULL,  bins = 50,  save2tmp = FALSE,  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

ld_max_pairwise

Maximum distance in number of base pairs at which LD should be calculated [default NULL].

maf

Minor allele frequency (by population) threshold to filter out loci [default 0.05].

ld_stat

The LD measure to be calculated: "LLR", "OR", "Q", "Covar", "D.prime", "R.squared", and "R". See ld (package snpStats) for details [default "R.squared"].

ind.limit

Minimum number of individuals that a population should contain for it to be taken into account when reporting loci in LD [default 10].

stat_keep

Name of the column from the slot loc.metrics to be used to choose the SNP to be kept [default "AvgPIC"].

ld_threshold_pops

LD threshold to report in the plot of "Number of populations in which the same SNP pair are in LD" [default 0.2].

plot.out

Specify if plot is to be produced [default TRUE].

plot_theme

User specified theme [default NULL].

histogram_colors

Vector with two color names for the borders and fill [default NULL].

boxplot_colors

A color palette for box plots by population or a list with as many colors as there are populations in the dataset [default NULL].

bins

Number of bins to display in histograms [default 50].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

This function reports LD between SNP pairs by population. The function gl.filter.ld filters out the SNPs in LD using as input the results of gl.report.ld.map. The actual number of SNPs to be filtered out depends on the parameters set in the function gl.filter.ld.

Boxplots of LD by population and a histogram showing LD frequency are presented.

Value

A dataframe with information for each SNP pair in LD.

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Examples

require("dartR.data")
x <- platypus.gl
x <- gl.filter.callrate(x, threshold = 1)
x <- gl.filter.monomorphs(x)
x$position <- x$other$loc.metrics$ChromPos_Platypus_Chrom_NCBIv1
x$chromosome <- as.factor(x$other$loc.metrics$Chrom_Platypus_Chrom_NCBIv1)
ld_res <- gl.report.ld.map(x, ld_max_pairwise = 10000000)

Reports summary of the slot $other$loc.metrics

Description

This script uses any field with numeric values stored in $other$loc.metrics to produce summary statistics (minimum, maximum, mean, quantiles), histograms and boxplots to assist the decision of choosing thresholds for the filter function gl.filter.locmetric.

Usage

gl.report.locmetric(  x,  metric,  plot.out = TRUE,  plot_theme = theme_dartR(),  plot_colors = two_colors,  save2tmp = FALSE,  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required].

metric

Name of the metric to be used for filtering [required].

plot.out

Specify if plot is to be produced [default TRUE].

plot_theme

Theme for the plot. See Details for options [default theme_dartR()].

plot_colors

List of two color names for the borders and fill of the plots [default two_colors].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity].

Details

The function gl.filter.locmetric will filter out the loci with a locmetric value below a specified threshold.

The fields that are included in dartR, and a short description, are found below. Optionally, users can also set their own field by adding a vector into $other$loc.metrics as shown in the example. You can check the names of all available loc.metrics via: names(gl$other$loc.metrics).

SnpPosition - position (zero is position 1) in the sequence tag of the defined SNP variant base.
CallRate - proportion of samples for which the genotype call is non-missing (that is, not '-').
OneRatioRef - proportion of samples for which the genotype score is 0.
OneRatioSnp - proportion of samples for which the genotype score is 2.
FreqHomRef - proportion of samples homozygous for the Reference allele.
FreqHomSnp - proportion of samples homozygous for the Alternate (SNP) allele.
FreqHets - proportion of samples which score as heterozygous, that is, scored as 1.
PICRef - polymorphism information content (PIC) for the Reference allele.
PICSnp - polymorphism information content (PIC) for the SNP.
AvgPIC - average of the polymorphism information content (PIC) of the reference and SNP alleles.
AvgCountRef - sum of the tag read counts for all samples, divided by the number of samples with non-zero tag read counts, for the Reference allele row.
AvgCountSnp - sum of the tag read counts for all samples, divided by the number of samples with non-zero tag read counts, for the Alternate (SNP) allele row.
RepAvg - proportion of technical replicate assay pairs for which the marker score is consistent.
rdepth - read depth.

Function's output

The minimum, maximum, mean and a tabulation of quantiles of the locmetric values against thresholds rate are provided. Output also includes a boxplot and a histogram.

Quantiles are partitions of a finite set of values into q subsets of (nearly) equal sizes. In this function q = 20. Quantiles are useful measures because they are less susceptible to long-tailed distributions and outliers.
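As an illustration of what q = 20 means in practice, the following base R sketch tabulates 5% quantiles together with the minimum, maximum and mean. The vector metric_values is made up and simply stands in for one numeric column of $other$loc.metrics; the package's own tabulation may be laid out differently.

```r
# Made-up metric values standing in for one column of $other$loc.metrics.
metric_values <- c(0.52, 0.61, 0.70, 0.75, 0.80, 0.84,
                   0.88, 0.91, 0.95, 0.99, 1.00)

# q = 20 subsets corresponds to cut points every 5%.
probs <- seq(0, 1, by = 0.05)
q20 <- quantile(metric_values, probs = probs)

# Summary statistics reported alongside the quantiles.
summary_stats <- c(min  = min(metric_values),
                   max  = max(metric_values),
                   mean = mean(metric_values))

print(q20)
print(summary_stats)
```

Reading down the quantile table shows roughly what share of loci would be lost at each candidate threshold.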

Examples of other themes that can be used can be consulted in:

Value

An unaltered genlight object.

Author(s)

Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

Examples

# adding dummy data
test <- testset.gl
test$other$loc.metrics$test <- 1:nLoc(test)
# SNP data
out <- gl.report.locmetric(test, metric = 'test')
# adding dummy data
test.gs <- testset.gs
test.gs$other$loc.metrics$test <- 1:nLoc(test.gs)
# Tag P/A data
out <- gl.report.locmetric(test.gs, metric = 'test')

Reports minor allele frequency (MAF) for each locus in a SNP dataset

Description

This script provides summary histograms of MAF for each population in the dataset and an overall histogram to assist the decision of choosing thresholds for the filter function gl.filter.maf.

Usage

gl.report.maf(  x,  maf.limit = 0.5,  ind.limit = 5,  plot.out = TRUE,  plot_theme = theme_dartR(),  plot_colors_pop = discrete_palette,  plot_colors_all = two_colors,  bins = 25,  save2tmp = FALSE,  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

maf.limit

Show histograms for the MAF range <= maf.limit [default 0.5].

ind.limit

Show histograms only for populations of size greater than ind.limit [default 5].

plot.out

Specify if plot is to be produced [default TRUE].

plot_theme

Theme for the plot. See Details for options [default theme_dartR()].

plot_colors_pop

A color palette for population plots [default discrete_palette].

plot_colors_all

List of two color names for the borders and fill of the overall plot [default two_colors].

bins

Number of bins to display in histograms [default 25].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity].

Details

The function gl.filter.maf will filter out the loci with MAF below a specified threshold.

Function's output

The minimum, maximum, mean and a tabulation of MAF quantiles against thresholds rate are provided. Output also includes a boxplot and a histogram.

This function reports the MAF for each of several quantiles. Quantiles are partitions of a finite set of values into q subsets of (nearly) equal sizes. In this function q = 20. Quantiles are useful measures because they are less susceptible to long-tailed distributions and outliers.
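For readers unfamiliar with how MAF is derived from SNP genotypes, the sketch below computes it in base R from a small made-up genotype matrix coded 0/1/2. It assumes a plain matrix rather than a genlight object and is not the package's implementation.

```r
# Toy genotype matrix: rows are individuals, columns are loci. Entries count
# copies of the alternate allele (0, 1, 2); NA marks a missing call.
geno <- matrix(c(0, 0, 1, 2,
                 0, 1, 1, 2,
                 0, 2, NA, 2,
                 1, 2, 0, 2),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("ind", 1:4), paste0("loc", 1:4)))

# Frequency of the alternate allele per locus, ignoring missing calls.
alt_freq <- colMeans(geno, na.rm = TRUE) / 2

# The minor allele is whichever allele is rarer, so MAF never exceeds 0.5.
maf <- pmin(alt_freq, 1 - alt_freq)
print(round(maf, 3))
```

The last locus is fixed for the alternate allele, so its MAF is 0; such loci are monomorphic.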

Examples of other themes that can be used can be consulted in

Value

An unaltered genlight object

Author(s)

Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

Examples

gl <- gl.report.maf(platypus.gl)

Reports monomorphic loci

Description

This script reports the number of monomorphic loci and those with all NAs in a genlight {adegenet} object.

Usage

gl.report.monomorphs(x, verbose = NULL)

Arguments

x

Name of the input genlight object [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity].

Details

A DArT dataset will not have monomorphic loci, but they can arise, along with loci that are scored all NA, when populations or individuals are deleted. Retaining monomorphic loci unnecessarily increases the size of the dataset and will affect some calculations.

Note that for SNP data, NAs likely represent null alleles; in tag presence/absence data, NAs represent missing values (presence/absence could not be reliably scored).
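The two categories reported here can be illustrated with a small base R sketch on a made-up SNP genotype matrix (0/1/2 coding assumed). It shows the logic only and does not reproduce the package's code.

```r
# Toy SNP genotype matrix after some individuals have been removed.
geno <- matrix(c(0, 1,  2, NA,
                 0, 2,  2, NA,
                 0, NA, 2, NA),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("ind", 1:3), paste0("loc", 1:4)))

# Loci for which every genotype is missing.
all_na <- colSums(!is.na(geno)) == 0

# A locus is monomorphic when every non-missing genotype is the same
# homozygote (all 0 or all 2).
monomorphic <- apply(geno, 2, function(g) {
  g <- g[!is.na(g)]
  length(g) > 0 && (all(g == 0) || all(g == 2))
})

cat("Monomorphic loci:", sum(monomorphic), "\n")
cat("Loci scored all NA:", sum(all_na), "\n")
```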

Value

An unaltered genlight object

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# SNP data
gl.report.monomorphs(testset.gl)
# SilicoDArT data
gl.report.monomorphs(testset.gs)

Reports loci for which the SNP has been trimmed from the sequence tag along with the adaptor

Description

Usage

gl.report.overshoot(x, save2tmp = FALSE, verbose = NULL)

Arguments

x

Name of the genlight object [required].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity].

Details

The SNP genotype can still be used in most analyses, but functions like gl2fasta() will present challenges if the SNP has been trimmed from the sequence tag.

Resultant ggplot(s) and the tabulation(s) are saved to the session's temporary directory.

Value

An unaltered genlight object

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

gl.report.overshoot(testset.gl)

Reports private alleles (and fixed alleles) per pair of populations

Description

This function reports private alleles in one population compared with a second population, for all populations taken pairwise. It also reports a count of fixed allelic differences and the mean absolute allele frequency differences (AFD) between pairs of populations.

Usage

gl.report.pa(  x,  x2 = NULL,  method = "pairwise",  loc_names = FALSE,  plot.out = TRUE,  font_plot = 14,  map.interactive = FALSE,  provider = "Esri.NatGeoWorldMap",  palette_discrete = discrete_palette,  save2tmp = FALSE,  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP or SilicoDArT data [required].

x2

If two separate genlight objects are to be compared this can be provided here, but they must have the same number of SNPs [default NULL].

method

Method to calculate private alleles: 'pairwise' comparison or compare each population against the rest 'one2rest' [default 'pairwise'].

loc_names

Whether names of loci with private alleles and fixed differences should be reported. If TRUE, loci names are reported using a list [default FALSE].

plot.out

Specify if Sankey plot is to be produced [default TRUE].

font_plot

Numeric font size in pixels for the node text labels [default 14].

map.interactive

Specify whether an interactive map showing private alleles between populations is to be produced [default FALSE].

provider

Passed to leaflet [default "Esri.NatGeoWorldMap"].

palette_discrete

A discrete palette for the color of populations or a list with as many colors as there are populations in the dataset [default discrete_palette].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent, fatal errors only; 1, flag function begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

Note that the number of private alleles between two populations is not a symmetric dissimilarity measure.

If no x2 is provided, the function uses the pop(gl) hierarchy to determine pairs of populations, otherwise it runs a single comparison between x and x2.

Hint: in case you want to run comparisons between individuals (assuming individual names are unique), you can simply redefine your population names with your individual names, as below:

pop(gl) <- indNames(gl)

Definition of fixed and private alleles

The table below shows the possible cases of allele frequencies between two populations (0 = homozygote for Allele 1, x = both Alleles are present, 1 = homozygote for Allele 2).

p: cases where there is a private allele in pop1 compared to pop2 (but not vice versa)
f: cases where there is a fixed allele in pop1 (and pop2, as those cases are symmetric)

			pop1
		0	x	1
	0	-	p	p,f
pop2	x	-	-	-
	1	p,f	p	-

The absolute allele frequency difference (AFD) in this function is a simple differentiation metric displaying intuitive properties which provides a valuable alternative to FST. For details about its properties and how it is calculated see Berner (2019).
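The definitions in the table, and the AFD, can be sketched in base R from per-population allele frequencies. The frequencies below are made up, and the code is an illustration of the definitions rather than the function's own implementation. For biallelic loci the per-locus AFD is taken here as the absolute difference in the frequency of Allele 2, averaged over loci.

```r
# Frequency of Allele 2 at six loci in two populations (made-up values):
# 0 = only Allele 1 present, 1 = only Allele 2 present, otherwise both.
q_pop1 <- c(0.0, 0.4, 1.0, 0.0, 0.3, 0.5)
q_pop2 <- c(0.0, 0.0, 0.0, 0.6, 0.7, 0.0)

# An allele is private to one population when that population carries it
# and the other population lacks it.
private_in <- function(q_a, q_b) (q_a > 0 & q_b == 0) | (q_a < 1 & q_b == 1)
private_pop1 <- private_in(q_pop1, q_pop2)
private_pop2 <- private_in(q_pop2, q_pop1)

# A fixed difference means the two populations share no allele at the locus.
fixed <- (q_pop1 == 0 & q_pop2 == 1) | (q_pop1 == 1 & q_pop2 == 0)

# Mean absolute allele frequency difference across loci.
afd <- mean(abs(q_pop1 - q_pop2))

cat("Private in pop1:", sum(private_pop1),
    "| private in pop2:", sum(private_pop2),
    "| fixed:", sum(fixed),
    "| AFD:", round(afd, 3), "\n")
```

The private allele counts differ between the two directions of the comparison, whereas the fixed difference count and the AFD are the same either way.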

The function also reports an estimate of the lower bound of the number of undetected private alleles using the Good-Turing frequency formula, originally developed for cryptography, which in an ecological context estimates the true frequencies of rare species in a single assemblage based on an incomplete sample of individuals. The approach is described in Chao et al. (2017). For this function, equation 2c is used. This estimate is reported in the output table as Chao1 and Chao2.

In this function a Sankey diagram is used to visualize patterns of private alleles between populations. This diagram displays flows (private alleles) between nodes (populations). Their links are represented with arcs that have a width proportional to the importance of the flow (number of private alleles).

If save2tmp=TRUE, resultant plot(s) and the tabulation(s) are saved to the session's temporary directory.

Value

A data.frame. Each row shows, for a pair of populations, the number of individuals in each population, the number of loci with fixed differences (same for both populations) in pop1 (compared to pop2) and vice versa, the same for private alleles, and finally the absolute mean allele frequency difference between loci (AFD). If loc_names = TRUE, the names of loci with private alleles and fixed differences are reported in a list in addition to the data frame.

Author(s)

Custodian: Bernd Gruber – Post to https://groups.google.com/d/forum/dartr

References

Berner, D. (2019). Allele frequency difference AFD – an intuitive alternative to FST for quantifying genetic population differentiation. Genes, 10(4), 308.
Chao, A., et al. (2017). Deciphering the enigma of undetected species, phylogenetic, and functional diversity based on Good-Turing theory. Ecology, 98(11), 2914-2929.

Examples

out <- gl.report.pa(platypus.gl)

Identifies putative parent offspring within a population

Description

This script examines the frequency of pedigree inconsistent loci, that is, those loci that are homozygotes in the parent for the reference allele, and homozygous in the offspring for the alternate allele. This condition is not consistent with any pedigree, regardless of the (unknown) genotype of the other parent. The pedigree inconsistent loci are counted as an indication of whether or not it is reasonable to propose the two individuals are in a parent-offspring relationship.
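A minimal base R sketch of the counting step is given below, assuming genotypes coded 0/1/2. The helper count_inconsistent() and the genotype vectors are hypothetical. Because it is not known which of the two individuals is the parent, the sketch counts opposite homozygotes in either direction, which is an assumption about how the count is applied to a pair.

```r
# Toy genotypes: 0 = homozygous reference, 1 = heterozygous,
# 2 = homozygous alternate, NA = missing.
ind_a <- c(0, 0, 1, 2, 2, NA, 0)
ind_b <- c(2, 0, 1, 0, 1, 2,  1)

# Hypothetical helper: count loci where the two individuals are opposite
# homozygotes. Loci with a missing call in the comparison are skipped.
count_inconsistent <- function(g1, g2) {
  sum((g1 == 0 & g2 == 2) | (g1 == 2 & g2 == 0), na.rm = TRUE)
}

n_inconsistent <- count_inconsistent(ind_a, ind_b)
cat("Pedigree inconsistent loci:", n_inconsistent, "\n")
```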

Usage

gl.report.parent.offspring(  x,  min.rdepth = 12,  min.reproducibility = 1,  range = 1.5,  plot.out = TRUE,  plot_theme = theme_dartR(),  plot_colors = two_colors,  save2tmp = FALSE,  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP genotypes [required].

min.rdepth

Minimum read depth to include in analysis [default 12].

min.reproducibility

Minimum reproducibility to include in analysis [default 1].

range

Specifies the range to extend beyond the interquartile range for delimiting outliers [default 1.5 interquartile ranges].

plot.out

Specify if plot is to be produced [default TRUE].

plot_theme

Theme for the plot. See Details for options [default theme_dartR()].

plot_colors

List of two color names for the borders and fill of the plots [default two_colors].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

If two individuals are in a parent offspring relationship, the true number of pedigree inconsistent loci should be zero, but SNP calling is not infallible. Some loci will be miscalled. The problem thus becomes one of determining if the two focal individuals have a count of pedigree inconsistent loci less than would be expected of typical unrelated individuals. There are some quite sophisticated software packages available to formally apply likelihoods to the decision, but we use a simple outlier comparison.
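One way to picture the outlier comparison is the boxplot-style rule sketched below in base R. This is an assumption about the form of the rule, based on the description of the range argument; the counts are made up and the package's exact criterion may differ.

```r
# Made-up counts of pedigree inconsistent loci for ten pairs of individuals;
# the last pair has a conspicuously low count.
counts <- c(48, 52, 55, 50, 47, 53, 49, 51, 54, 3)

# Lower fence: first quartile minus `range` interquartile ranges.
range_iqr <- 1.5
q <- quantile(counts, c(0.25, 0.75))
lower_fence <- q[[1]] - range_iqr * (q[[2]] - q[[1]])

# Pairs below the fence are candidates for a parent-offspring relationship.
putative_po <- which(counts < lower_fence)
cat("Lower fence:", lower_fence, "| candidate pairs:", putative_po, "\n")
```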

To reduce the frequency of miscalls, and so emphasize the difference between true parent-offspring pairs and unrelated pairs, the data can be filtered on read depth.

Typically minimum read depth is set to 5x, but you can examine the distribution of read depths with the function gl.report.rdepth and push this up with an acceptable loss of loci. 12x might be a good minimum for this particular analysis. It is sensible also to push the minimum reproducibility up to 1, if that does not result in an unacceptable loss of loci. Reproducibility is stored in the slot @other$loc.metrics$RepAvg and is defined as the proportion of technical replicate assay pairs for which the marker score is consistent. You can examine the distribution of reproducibility with the function gl.report.reproducibility.

The function gl.filter.parent.offspring will filter out those individuals in a parent offspring relationship.

Note that if your dataset does not contain RepAvg or rdepth among the locus metrics, the filters for reproducibility and read depth are not used.

Function's output

Examples of other themes that can be used can be consulted in

Value

A set of individuals in parent-offspring relationship. NULL if no parent-offspring relationships were found.

Author(s)

Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

Examples

out <- gl.report.parent.offspring(testset.gl[1:10,1:100])

Reports summary of Read Depth for each locus

Description

SNP datasets generated by DArT report AvgCountRef and AvgCountSnp as counts of sequence tags for the reference and alternate alleles respectively. These can be used to back calculate Read Depth. Fragment presence/absence datasets as provided by DArT (SilicoDArT) provide Average Read Depth and Standard Deviation of Read Depth as standard columns in their report. This function reports the read depth by locus for each of several quantiles.

Usage

gl.report.rdepth(  x,  plot.out = TRUE,  plot_theme = theme_dartR(),  plot_colors = two_colors,  save2tmp = FALSE,  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required].

plot.out

Specify if plot is to be produced [default TRUE].

plot_theme

Theme for the plot. See Details for options [default theme_dartR()].

plot_colors

List of two color names for the borders and fill of the plots [default two_colors].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

The function displays a table of minimum, maximum, mean and quantiles for read depth against possible thresholds that might subsequently be specified in gl.filter.rdepth. If plot.out=TRUE, display also includes a boxplot and a histogram to guide in the selection of a threshold for filtering on read depth.

If save2tmp=TRUE, ggplots and relevant tabulations are saved to the session's temp directory (tempdir).

For examples of themes, see

Value

An unaltered genlight object

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# SNP data
df <- gl.report.rdepth(testset.gl)
df <- gl.report.rdepth(testset.gs)

Identify replicated individuals

Description

Identify replicated individuals

Usage

gl.report.replicates(  x,  loc_threshold = 100,  perc_geno = 0.99,  plot.out = TRUE,  plot_theme = theme_dartR(),  plot_colors = two_colors,  bins = 100,  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

loc_threshold

Minimum number of loci required to assess that two individuals are replicates [default 100].

perc_geno

Minimum percentage of genotypes in which two individuals should be the same [default 0.99].

plot.out

Specify if plot is to be produced [default TRUE].

plot_theme

User specified theme [default theme_dartR()].

plot_colors

Vector with two color names for the borders and fill [default two_colors].

bins

Number of bins to display in histograms [default 100].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

This function uses a C++ implementation, so the package Rcpp needs to be installed. It is therefore fast once the function has been compiled during the first run.

Ideally, in a large dataset with related and unrelated individuals and several replicated individuals, such as in a capture/mark/recapture study, the first histogram should have four "peaks". The first peak should represent unrelated individuals, the second peak should correspond to second-degree relationships (such as cousins), the third peak should represent first-degree relationships (like parent/offspring and full siblings), and the fourth peak should represent replicated individuals.

In order to ensure that replicated individuals are properly identified, it's important to have a clear separation between the third and fourth peaks in the second histogram. This means that there should be bins with zero counts between these two peaks.
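The quantity behind these histograms, the share of identical genotypes between two individuals, can be sketched in base R as follows. The helper prop_same() and the genotypes are hypothetical, and the package itself uses a compiled C++ routine rather than this code.

```r
# Toy genotypes for an individual, its replicate and an unrelated individual.
geno <- rbind(
  indA     = c(0, 1, 2, 0, 1, 2, 0, NA),
  indA_rep = c(0, 1, 2, 0, 1, 2, 0, 1),
  indB     = c(2, 1, 0, 0, 2, 1, 0, 1)
)

# Hypothetical helper: proportion of identical genotypes over the loci that
# are called in both individuals. Returns NA when fewer than loc_threshold
# loci can be compared.
prop_same <- function(g1, g2, loc_threshold = 1) {
  shared <- !is.na(g1) & !is.na(g2)
  if (sum(shared) < loc_threshold) return(NA_real_)
  mean(g1[shared] == g2[shared])
}

same_rep   <- prop_same(geno["indA", ], geno["indA_rep", ], loc_threshold = 5)
same_unrel <- prop_same(geno["indA", ], geno["indB", ], loc_threshold = 5)
cat("Replicate pair:", same_rep, "| unrelated pair:", round(same_unrel, 3), "\n")
```

A pair would be flagged as replicates when this proportion reaches perc_geno.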

Value

A list with three elements:

table.rep: A dataframe with pairwise results of percentage of same genotypes between two individuals, the number of loci used in the comparison and the missing data for each individual.
ind.list.drop: A vector of replicated individuals to be dropped. The replicated individual with the least missing data is reported.
ind.list.rep: A list of each individual that has replicates in the dataset, the name of the replicates and the percentage of the same genotype.

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Examples

res_rep <- gl.report.replicates(platypus.gl, loc_threshold = 500, perc_geno = 0.85)

Reports summary of RepAvg (repeatability averaged over both alleles for each locus) or reproducibility (repeatability of the scores for fragment presence/absence)

Description

SNP datasets generated by DArT have an index, RepAvg, generated by reproducing the data independently for 30% of loci. RepAvg is the proportion of alleles that give a repeatable result, averaged over both alleles for each locus.

In the case of fragment presence/absence data (SilicoDArT), repeatability is the percentage of scores that are repeated in the technical replicate dataset.

Usage

gl.report.reproducibility(  x,  plot.out = TRUE,  plot_theme = theme_dartR(),  plot_colors = two_colors,  save2tmp = FALSE,  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required].

plot.out

If TRUE, displays a plot to guide the decision on a filter threshold [default TRUE].

plot_theme

Theme for the plot. See Details for options [default theme_dartR()].

plot_colors

List of two color names for the borders and fill of the plots [default two_colors].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

The function displays a table of minimum, maximum, mean and quantiles for repeatability against possible thresholds that might subsequently be specified in gl.filter.reproducibility.

If plot.out=TRUE, display also includes a boxplot and a histogram to guide in the selection of a threshold for filtering on repeatability.

If save2tmp=TRUE, ggplots and relevant tabulations are saved to the session's temp directory (tempdir).

For examples of themes, see:

Value

An unaltered genlight object

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# SNP data
out <- gl.report.reproducibility(testset.gl)
# Tag P/A data
out <- gl.report.reproducibility(testset.gs)

Reports loci containing secondary SNPs in sequence tags and calculates number of invariant sites

Description

SNP datasets generated by DArT include fragments with more than one SNP (that is, with secondaries). They are recorded separately with the same CloneID (=AlleleID). These multiple SNP loci within a fragment are likely to be linked, and so you may wish to remove secondaries.

This function reports statistics associated with secondaries, and the consequences of filtering them out, and provides three plots. The first is a boxplot, the second is a barplot of the frequency of secondaries per sequence tag, and the third is the Poisson expectation for those frequencies including an estimate of the zero class (no. of sequence tags with no SNP scored).

Usage

gl.report.secondaries(  x,  nsim = 1000,  taglength = 69,  plot.out = TRUE,  plot_theme = theme_dartR(),  plot_colors = two_colors,  save2tmp = FALSE,  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

nsim

The number of simulations to estimate the mean of the Poisson distribution [default 1000].

taglength

Typical length of the sequence tags [default 69].

plot.out

Specify if plot is to be produced [default TRUE].

plot_theme

Theme for the plot. See Details for options [default theme_dartR()].

plot_colors

List of two color names for the borders and fill of the plots [default two_colors].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

The function gl.filter.secondaries will filter out the loci with secondaries retaining only one sequence tag.

Heterozygosity as estimated by the function gl.report.heterozygosity is in a sense relative, because it is calculated against a background of only those loci that are polymorphic somewhere in the dataset. To allow intercompatibility across studies and species, any measure of heterozygosity needs to accommodate loci that are invariant (autosomal heterozygosity; see Schmidt et al. 2021). However, the number of invariant loci is unknown, given that the SNPs are detected as single point mutational variants and invariant sequences are discarded, and because of the particular additional filtering pre-analysis. Modelling the counts of SNPs per sequence tag as a Poisson distribution in this script allows estimation of the zero class, that is, the number of invariant loci. This is reported, and the veracity of the estimate can be assessed by the correspondence of the observed frequencies against those under Poisson expectation in the associated graphs. The number of invariant loci can then be optionally provided to the function gl.report.heterozygosity via the parameter n.invariant.
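The idea of estimating the zero class can be sketched in base R as follows. Note the swap in technique: the function is described as using nsim simulations to estimate the Poisson mean, whereas this sketch solves the zero-truncated Poisson mean equation directly with uniroot(). The tally of tags is made up, and the result is only meant to show the principle.

```r
# Made-up tally: number of sequence tags carrying 1, 2, 3 and 4 scored SNPs.
snps_per_tag <- c(1, 2, 3, 4)
n_tags       <- c(600, 250, 100, 50)

n_observed    <- sum(n_tags)
mean_observed <- sum(snps_per_tag * n_tags) / n_observed

# Tags with zero SNPs are never observed, so the observed mean follows a
# zero-truncated Poisson: E[X | X > 0] = lambda / (1 - exp(-lambda)).
# Solve that equation for lambda numerically.
lambda_hat <- uniroot(
  function(lambda) lambda / (1 - exp(-lambda)) - mean_observed,
  interval = c(1e-6, 50),
  tol = 1e-10
)$root

# Expected size of the unobserved zero class (tags with no SNP).
p_zero <- exp(-lambda_hat)
n_invariant_tags <- n_observed * p_zero / (1 - p_zero)

cat("lambda:", round(lambda_hat, 3),
    "| estimated invariant tags:", round(n_invariant_tags), "\n")
```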

In case the calculations for the Poisson expectation of the number of invariant sequence tags fail to converge, try to rerun the analysis with a larger nsim value.

This function now also calculates the number of invariant sites (i.e. nucleotides) of the sequence tags (if TrimmedSequence is present in x$other$loc.metrics) or estimates these by assuming that the average length of the sequence tags is 69 nucleotides. Based on the Poisson expectation of the number of invariant sequence tags, it also estimates the number of invariant sites for these to eventually provide an estimate of the total number of invariant sites.

Note, previous versions of dartR would only return an estimate of the number of invariant sequence tags (not sites).

Plots are saved to the session temporary directory (tempdir).

Examples of other themes that can be used can be consulted in:

Value

A data.frame with the list of parameter values

n.total.tags Number of sequence tags in total
n.SNPs.secondaries Number of secondary SNP loci that would be removed on filtering
n.invariant.tags Estimated number of invariant sequence tags
n.tags.secondaries Number of sequence tags with secondaries
n.inv.gen Number of invariant sites in sequenced tags
mean.len.tag Mean length of sequence tags
n.invariant Total number of invariant sites (including invariant sequence tags)
k Lambda: mean of the Poisson distribution of number of SNPs in the sequence tags

Author(s)

Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

References

Schmidt, T.L., Jasper, M.-E., Weeks, A.R., Hoffmann, A.A., 2021. Unbiased population heterozygosity estimates from genome-wide sequence data. Methods in Ecology and Evolution n/a.

Examples

require("dartR.data")
test <- gl.filter.callrate(platypus.gl, threshold = 1)
n.inv <- gl.report.secondaries(test)
gl.report.heterozygosity(test, n.invariant = n.inv[7, 2])

Identifies loci that are sex linked

Description

Alleles unique to the Y or W chromosome and monomorphic on the X chromosomes will appear in the SNP dataset as genotypes that are heterozygous in all individuals of the heterogametic sex and homozygous in all individuals of the homogametic sex. This function identifies loci with alleles that behave in this way, as putative sex specific SNP markers.
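The pattern described above can be illustrated with a small base R sketch on a made-up SNP genotype matrix. The thresholds mirror the t.het and t.hom arguments, but the code is an illustration under those assumptions and not the package's implementation.

```r
# Toy genotypes (0/1/2) for three males and three females at three loci.
geno <- matrix(c(1, 0, 1,
                 1, 1, 0,
                 1, 2, 1,
                 0, 0, 1,
                 2, 1, 1,
                 0, 0, 2),
               nrow = 6, byrow = TRUE,
               dimnames = list(paste0("ind", 1:6), paste0("loc", 1:3)))
sex <- factor(c("M", "M", "M", "F", "F", "F"))

# Proportion of heterozygotes per locus within each sex.
is_het <- geno == 1
het_m <- colMeans(is_het[sex == "M", , drop = FALSE], na.rm = TRUE)
het_f <- colMeans(is_het[sex == "F", , drop = FALSE], na.rm = TRUE)

t.het <- 0.1  # share of the heterogametic sex allowed to be homozygous
t.hom <- 0.1  # share of the homogametic sex allowed to be heterozygous

# XX/XY pattern: males (nearly) all heterozygous, females (nearly) all homozygous.
xy_linked <- het_m >= 1 - t.het & het_f <= t.hom
# ZZ/ZW pattern: the same test with the sexes swapped.
zw_linked <- het_f >= 1 - t.het & het_m <= t.hom

print(names(which(xy_linked)))
print(names(which(zw_linked)))
```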

Usage

gl.report.sexlinked(  x,  sex = NULL,  t.het = 0.1,  t.hom = 0.1,  t.pres = 0.1,  plot.out = TRUE,  plot_theme = theme_dartR(),  plot_colors = three_colors,  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required].

sex

Factor that defines the sex of individuals. See explanation in details [default NULL].

t.het

Tolerance in the heterogametic sex, that is t.het=0.05 means that 5% of the heterogametic sex can be homozygous and still be regarded as consistent with a sex specific marker [default 0.1].

t.hom

Tolerance in the homogametic sex, that is t.hom=0.05 means that 5% of the homogametic sex can be heterozygous and still be regarded as consistent with a sex specific marker [default 0.1].

t.pres

Tolerance in presence, that is t.pres=0.05 means that a silicodart marker can be present in either of the sexes and still be regarded as a sex-linked marker [default 0.1].

plot.out

Creates a plot that shows the heterozygosity of males and females at each locus and the shaded area in which loci can be regarded as consistent with a sex specific marker [default TRUE].

plot_theme

Theme for the plot. See Details for options [default theme_dartR()].

plot_colors

List of three color names for the not sex-linked loci, for the sex-linked loci and for the area in which sex-linked loci appear [default three_colors].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity].

Details

Function's output

This function creates a plot that shows the heterozygosity of males and females at each locus for SNP data, or the percentage of presence/absence in the case of SilicoDArT data.
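
The criterion for SNP data can be sketched in base R for an XX/XY system. This is only an illustration under stated assumptions, not the code used by gl.report.sexlinked: the genotype matrix geno, the sex vector and the helper flag_sexlinked are hypothetical, and the thresholds follow the definitions of t.het and t.hom given in the Arguments.

```r
# Hypothetical helper: flag loci consistent with an XX/XY sex-linked marker.
# geno: individuals x loci matrix coded 0/1/2 (1 = heterozygous), NA allowed.
# sex:  character vector, "M" (heterogametic here) or "F" (homogametic).
flag_sexlinked <- function(geno, sex, t.het = 0.1, t.hom = 0.1) {
  is_het <- geno == 1
  het_m <- colMeans(is_het[sex == "M", , drop = FALSE], na.rm = TRUE)
  het_f <- colMeans(is_het[sex == "F", , drop = FALSE], na.rm = TRUE)
  # Males (nearly) all heterozygous, females (nearly) all homozygous
  which(het_m >= 1 - t.het & het_f <= t.hom)
}

# Made-up genotypes: three males followed by three females
geno <- rbind(
  c(1, 0, 1),
  c(1, 2, 0),
  c(1, 1, 2),
  c(0, 1, 2),
  c(0, 0, 1),
  c(2, 2, 0)
)
colnames(geno) <- c("loc1", "loc2", "loc3")
sex <- c("M", "M", "M", "F", "F", "F")

sexlinked <- flag_sexlinked(geno, sex)
names(sexlinked)
```

For a ZZ/ZW system the roles of the two sexes would be swapped, so that females are expected to be heterozygous and males homozygous.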

Examples of other themes that can be used can be consulted in

Value

Two lists of sex-linked loci, one for XX/XY and one for ZZ/ZW systems, and a plot.

Author(s)

Arthur Georges, Bernd Gruber & Floriaan Devloo-Delva (Post to https://groups.google.com/d/forum/dartr)

Examples

out <- gl.report.sexlinked(testset.gl)
out <- gl.report.sexlinked(testset.gs)
test <- gl.filter.callrate(platypus.gl)
test <- gl.filter.monomorphs(test)
out <- gl.report.sexlinked(test)

Reports summary of sequence tag length across loci

Description

SNP datasets generated by DArT typically have sequence tag lengths ranging from 20 to 69 base pairs. This function reports summary statistics of the tag lengths.

Usage

gl.report.taglength(
  x,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP data [required].

plot.out

If TRUE, displays a plot to guide the decision on a filter threshold [default TRUE].

plot_theme

Theme for the plot. See Details for options [default theme_dartR()].

plot_colors

List of two color names for the borders and fill of the plots [default two_colors].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

The function gl.filter.taglength will filter out the loci with a tag length below a specified threshold.

Function's output

The minimum, maximum, mean and a tabulation of tag length quantiles against thresholds are provided. Output also includes a boxplot and a histogram to guide in the selection of a threshold for filtering on tag length.
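
The kind of summary described above can be sketched in base R. The vector tags below is a made-up stand-in for the sequence tags of a dataset, and the tabulation is an illustration of the idea rather than the exact output of gl.report.taglength.

```r
# Hypothetical trimmed sequence tags, one per locus
tags <- c("TGCAGGTACCA", "TGCAGAATTCGGATCCAAGT", "TGCAGCC", "TGCAGTTAGGCATCGA")
tag_len <- nchar(tags)

# Minimum, maximum and mean tag length
c(min = min(tag_len), max = max(tag_len), mean = mean(tag_len))

# Number of loci that a filter with a given lower threshold would retain
thresholds <- c(5, 10, 15, 20)
retained <- sapply(thresholds, function(th) sum(tag_len >= th))
data.frame(threshold = thresholds, retained = retained)
```

A table of this kind shows how many loci would be lost at each candidate threshold before any filtering is applied.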

Examples of other themes that can be used can be consulted in

Value

Returns unaltered genlight object

Author(s)

Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

Examples

out <- gl.report.taglength(testset.gl)

Runs a faststructure analysis using a genlight object

Description

This function takes a genlight object and runs a faststructure analysis.

Usage

gl.run.faststructure(
  x,
  k.range,
  num.k.rep,
  exec = "./fastStructure",
  output = getwd(),
  tol = 1e-05,
  prior = "simple",
  cv = 0,
  seed = NULL
)

Arguments

x

Name of the genlight object containing the SNP data [required].

k.range

Range of the number of populations [required].

num.k.rep

Number of replicates [required].

exec

Full path and name+extension where the fastStructure executable is located [default working directory "./fastStructure"].

output

Path to output file [default getwd()].

tol

Convergence criterion [default 1e-05].

prior

Choice of prior: simple or logistic [default "simple"].

cv

Number of test sets for cross-validation, 0 implies no CV step [default 0].

seed

Seed for random number generator [default NULL].

Details

Download faststructure binary for your system from here (only runs on Mac or Linux):

https://github.com/StuntsPT/Structure_threader/tree/master/structure_threader/bins

Move faststructure file to working directory. Make file executable using terminal app.

system(paste0("chmod u+x ",getwd(), "/faststructure"))

Download plink binary for your system from here:

https://www.cog-genomics.org/plink/

Move plink file to working directory. Make file executable using terminal app.

system(paste0("chmod u+x ",getwd(), "/plink"))

To install fastStructure dependencies follow these directions: https://github.com/rajanil/fastStructure

fastStructure performs inference for the simplest, independent-loci, admixture model, with two choices of priors that can be specified using the --prior parameter. Thus, unlike Structure, fastStructure does not require the mainparams and extraparams files. The inference algorithm used by fastStructure is fundamentally different from that of Structure and requires the setting of far fewer options.

To identify the number of populations that best approximates the marginal likelihood of the data, the marginal likelihood is extracted from each run of K, averaged across replications and plotted.
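
The averaging step described above can be sketched in base R. The data frame runs and all of its values are made up for illustration; it is an assumption that one marginal likelihood per run has already been collected in this shape, and this is not the code used inside gl.run.faststructure.

```r
# Hypothetical marginal likelihoods, one row per run (values are made up)
runs <- data.frame(
  k = c(2, 2, 3, 3, 4, 4),
  rep = c(1, 2, 1, 2, 1, 2),
  marginal_lik = c(-0.82, -0.80, -0.75, -0.77, -0.79, -0.81)
)

# Average across replicates for each K
mean_lik <- aggregate(marginal_lik ~ k, data = runs, FUN = mean)

# K with the highest mean marginal likelihood
best_k <- mean_lik$k[which.max(mean_lik$marginal_lik)]

# Plot the mean marginal likelihood against K
plot(mean_lik$k, mean_lik$marginal_lik, type = "b",
     xlab = "K", ylab = "Mean marginal likelihood")
```

With these made-up values the curve peaks at K = 3. With real data the plot is usually inspected by eye rather than relying on the maximum alone.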

Value

A list in which each list entry is a single faststructure run output (the number of runs is the number of K values in k.range times num.k.rep).

Author(s)

Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

References

Raj, A., Stephens, M., & Pritchard, J. K. (2014). fastSTRUCTURE: variational inference of population structure in large SNP data sets. Genetics, 197(2), 573-589.

Examples

## Not run: 
t1 <- gl.filter.callrate(platypus.gl, threshold = 1)
res <- gl.run.faststructure(t1, exec = "./fastStructure", k.range = 2:3,
                            num.k.rep = 2, output = paste0(getwd(), "/res_str"))
qmat <- gl.plot.faststructure(res, k.range = 2:3)
gl.map.structure(qmat, K = 2, t1, scalex = 1, scaley = 0.5)
## End(Not run)

Runs a STRUCTURE analysis using a genlight object

Description

This function takes a genlight object and runs a STRUCTURE analysis based on functions from strataG.

Usage

gl.run.structure(
  x,
  ...,
  exec = ".",
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP data [required].

...

Parameters to specify the STRUCTURE run (check structureRun within strataG for more details). Parameters are passed to the structureRun function. For example, you need to set the k.range and the type of model you would like to run (noadmix, locprior) etc. If those parameter names do not tell you anything, please make sure you familiarize yourself with the STRUCTURE program (Pritchard et al. 2000).

exec

Full path and name+extension where the structure executable is located. E.g. 'c:/structure/structure.exe' under Windows. For Mac and Linux it might be something like './structure/structure' if the executable is in a subfolder 'structure' in your home directory [default working directory "."].

plot.out

Create an Evanno plot once finished. Be aware that k.range needs to contain at least three different k steps [default TRUE].

plot_theme

Theme for the plot. See Details for options [default theme_dartR()].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Set verbosity for this function (though structure output cannot be switched off currently) [default NULL].

Details

The function is basically a convenient wrapper around the beautiful strataG function structureRun (Archer et al. 2016). For a detailed description please refer to that package (see References below). To make use of this function you need to download STRUCTURE for your system (non-GUI version).

Format note

For this function to work, make sure that individual and population names have no spaces. To substitute spaces by underscores you could use the R function gsub as below.

popNames(gl) <- gsub(" ","_",popNames(gl))

indNames(gl) <- gsub(" ","_",indNames(gl))

It is also worth noting that STRUCTURE truncates individual names at 11 characters. The function will fail if the names of individuals are not unique after truncation. To avoid this possible problem, a number sequence, as shown in the code below, might be used instead of individual names.

indNames(gl) <- as.character(1:length(indNames(gl)))
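
The two name checks described in this note can be tried out on a plain character vector before touching the genlight object. This is a minimal sketch: ind_names is a made-up stand-in for indNames(gl), and the 11-character limit is taken from the note above.

```r
# Hypothetical individual names; two of them share their first 11 characters
ind_names <- c("Site A sample 01", "Site A sample 02", "Site B 01")

# Replace spaces with underscores
clean <- gsub(" ", "_", ind_names)

# Truncate to 11 characters and check whether the names are still unique
truncated <- substr(clean, 1, 11)
unique_after_truncation <- anyDuplicated(truncated) == 0

# Fall back to a number sequence if truncation creates duplicates
safe_names <- if (unique_after_truncation) clean else as.character(seq_along(clean))
safe_names
```

Here the first two names both truncate to "Site_A_samp", so the number sequence is used instead.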

Value

An sr object (structure.result list output). Each list entry is a single structureRun output (the number of runs is the number of K values in k.range times num.k.rep). For example, the summary output of the first run can be accessed via sr[[1]]$summary or the q-matrix of the third run via sr[[3]]$q.mat. To conveniently summarise the outputs across runs (clumpp) you need to run gl.plot.structure on the returned sr object. For Evanno plots run gl.evanno on your sr object.

Author(s)

Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

References

Pritchard, J.K., Stephens, M., Donnelly, P. (2000) Inference of population structure using multilocus genotype data. Genetics 155, 945-959.
Archer, F. I., Adams, P. E. and Schneiders, B. B. (2016) strataG: An R package for manipulating, summarizing and analysing population genetic data. Mol Ecol Resour. doi:10.1111/1755-0998.12559

Examples

## Not run: 
#bc <- bandicoot.gl[,1:100]
#sr <- gl.run.structure(bc, k.range = 2:5, num.k.rep = 3, 
# exec = './structure.exe')
#ev <- gl.evanno(sr)
#ev
#qmat <- gl.plot.structure(sr, K=3)
#head(qmat)
#gl.map.structure(qmat, bc, scalex=1, scaley=0.5)
## End(Not run)

Samples individuals from populations

Description

This is a convenience function to prepare a bootstrap approach in dartR. For a bootstrap approach it is often desirable to sample a defined number of individuals for each of the populations in a genlight object and then calculate a certain quantity for that subset (repeated, for example, 1000 times).

Usage

gl.sample(
  x,
  nsample = min(table(pop(x))),
  replace = TRUE,
  onepop = FALSE,
  verbose = NULL
)

Arguments

x

Genlight object containing SNP/SilicoDArT genotypes [required].

nsample

Number of individuals that should be sampled [default min(table(pop(x))), the size of the smallest population].

replace

Switch to sample with replacement [default TRUE].

onepop

Switch to ignore the population settings of the genlight object and sample from all individuals, disregarding the population definition [default FALSE].

verbose

Set verbosity [default NULL].

Details

This is a convenience function to facilitate a bootstrap approach.
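
The per-population resampling idea can be sketched in base R on a vector of population labels. This is an assumption-laden illustration, not the internals of gl.sample: pops is a made-up stand-in for pop(x), and the result is a vector of row indices.

```r
set.seed(42)  # for a reproducible illustration

# Hypothetical population assignment of 10 individuals
pops <- factor(c(rep("A", 6), rep("B", 4)))
nsample <- min(table(pops))  # mirrors the default of nsample

# Draw nsample row indices per population, with replacement
idx <- unlist(lapply(split(seq_along(pops), pops), function(i) {
  i[sample.int(length(i), nsample, replace = TRUE)]
}))

table(pops[idx])  # nsample individuals from each population
```

With a genlight object, indices of this kind could then be used to subset rows in the same way as possums.gl[1:60,] in the Examples.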

Value

Returns a genlight object with nsample samples from each population.

Author(s)

Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

Examples

## Not run: 
#bootstrap for 2 possums populations to check effect of sample size on fixed alleles
gl.set.verbosity(0)
pp <- possums.gl[1:60,]
nrep <- 1:10
nss <- seq(1,10,2)
res <- expand.grid(nrep=nrep, nss=nss)
for (i in 1:nrow(res)) {
  dummy <- gl.sample(pp, nsample=res$nss[i], replace=TRUE)
  pas <- gl.report.pa(dummy, plot.out = FALSE)
  res$fixed[i] <- pas$fixed[1]
}
boxplot(fixed ~ nss, data=res)
## End(Not run)

Saves an object in compressed binary format for later rapid retrieval

Description

This is a wrapper for saveRDS().

The script saves the object in binary form to the file given in the file argument and returns the input gl object.

Usage

gl.save(x, file, verbose = NULL)

Arguments

x

Name of the genlight object containing SNP genotypes [required].

file

Name of the file to receive the binary version of the object [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

The input object

Author(s)

Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

Examples

gl.save(testset.gl,file.path(tempdir(),'testset.rds'))
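
Because the Description states that gl.save is a wrapper for saveRDS(), the round trip can be illustrated with base R alone. The object obj and the file name demo.rds are made up for this sketch; a genlight object would be handled in the same way.

```r
# Any R object works for the illustration
obj <- list(id = "demo", values = 1:5)
path <- file.path(tempdir(), "demo.rds")

saveRDS(obj, path)          # write the compressed binary file
restored <- readRDS(path)   # read the object back, e.g. in a later session

identical(obj, restored)
```

Writing to tempdir() keeps the example from leaving files in the working directory.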

Selects colors from one of several palettes and output as a vector

Description

This script draws upon a number of specified color libraries to extract a vector of colors for plotting, where the script that follows has a color parameter expecting a vector of colors.

Usage

gl.select.colors(
  x = NULL,
  library = NULL,
  palette = NULL,
  ncolors = NULL,
  select = NULL,
  verbose = NULL
)

Arguments

x

Optionally, provide a gl object from which to determine the number of populations [default NULL].

library

Name of the color library to be used [default scales::hue_pal].

palette

Name of the color palette to be pulled from the specified library [default is library specific].

ncolors

Number of colors to be displayed and returned [default 9].

select

Select the colors to retain in the output vector [default NULL].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

The available color libraries and their palettes include:

library 'brewer' and the palettes available can be listed by RColorBrewer::display.brewer.all() and RColorBrewer::brewer.pal.info.
library 'gr.palette' and the palettes available can be listed by grDevices::palette.pals()
library 'gr.hcl' and the palettes available can be listed by grDevices::hcl.pals()
library 'baseR' and the palettes available are: 'rainbow', 'heat', 'topo.colors', 'terrain.colors', 'cm.colors'.

If the palette is not specified, all the palettes will be listed and a default palette will then be chosen.

The color palette will be displayed in the graphics window for the requested number of colors (or 9 if not specified), and the vector of colors returned for later use.

The select parameter can be used to select colors from the specified ncolors. For example, select=c(1,1,3) will select color 1, 1 again and 3 to retain in the final vector. This can be useful for fine-tuning color selection, and matching colors and shapes.
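
The effect of the select parameter can be mimicked with plain vector indexing on a base R palette. This sketch uses grDevices::rainbow directly and only illustrates the indexing idea; it does not call gl.select.colors.

```r
ncolors <- 12
palette_colors <- grDevices::rainbow(ncolors)  # one of the 'baseR' palettes

# Same idea as select = c(1, 1, 1, 5, 8): index into the palette, repeats allowed
select <- c(1, 1, 1, 5, 8)
colors <- palette_colors[select]
colors
```

Repeating an index is what allows several populations to share one color while others keep their own.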

Value

A vector with the required number of colors

Author(s)

Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

Examples

# SET UP DATASET
gl <- testset.gl
levels(pop(gl)) <- c(rep('Coast',5), rep('Cooper',3), rep('Coast',5),
  rep('MDB',8), rep('Coast',7), 'Em.subglobosa', 'Em.victoriae')
# EXAMPLES -- SIMPLE
colors <- gl.select.colors()
colors <- gl.select.colors(library='brewer',palette='Spectral',ncolors=6)
colors <- gl.select.colors(library='baseR',palette='terrain.colors',ncolors=6)
colors <- gl.select.colors(library='baseR',palette='rainbow',ncolors=12)
colors <- gl.select.colors(library='gr.hcl',palette='RdBu',ncolors=12)
colors <- gl.select.colors(library='gr.palette',palette='Pastel 1',ncolors=6)
# EXAMPLES -- SELECTING COLORS
colors <- gl.select.colors(library='baseR',palette='rainbow',ncolors=12,select=c(1,1,1,5,8))
# EXAMPLES -- CROSS-CHECKING WITH A GENLIGHT OBJECT
colors <- gl.select.colors(x=gl,library='baseR',palette='rainbow',ncolors=12,select=c(1,1,1,5,8))

Selects shapes from the base R shape palette and outputs as a vector

Description

This script draws upon the standard R shape palette to extract a vector of shapes for plotting, where the script that follows has a shape parameter expecting a vector of shapes.

Usage

gl.select.shapes(x = NULL, select = NULL, verbose = NULL)

Arguments

x

Optionally, provide a gl object from which to determine the number of populations [default NULL].

select

Select the shapes to retain in the output vector [default NULL, all shapes shown and returned].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

By default the shape palette will be displayed in full in the graphics window, from which shapes can be selected in a subsequent run, and the vector of shapes returned for later use.

The select parameter can be used to select shapes from the 26 shapes available (0-25). For example, select=c(1,1,3) will select shape 1, 1 again and 3 to retain in the final vector. This can be useful for fine-tuning shape selection, and matching colors and shapes.

Value

A vector with the required number of shapes

Author(s)

Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

Examples

# SET UP DATASET
gl <- testset.gl
levels(pop(gl)) <- c(rep('Coast',5), rep('Cooper',3), rep('Coast',5),
  rep('MDB',8), rep('Coast',7), 'Em.subglobosa', 'Em.victoriae')
# EXAMPLES
shapes <- gl.select.shapes() # Select and display available shapes
# Select and display a restricted set of shapes
shapes <- gl.select.shapes(select=c(1,1,1,5,8))
# Select set of shapes and check with no. of pops.
shapes <- gl.select.shapes(x=gl,select=c(1,1,1,5,8))

Sets the default verbosity level

Description

dartR functions have a verbosity parameter that sets the level of reporting during the execution of the function. The verbosity level, set by the parameter 'verbose', can be one of: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report. The default value for verbosity is stored in the R environment. This script sets the default value.

Usage

gl.set.verbosity(value = 2)

Arguments

value

Set the default verbosity to be this value: 0, silent, only fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2].

Value

Verbosity value [set for all functions].

Author(s)

Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

Examples

gl <- gl.set.verbosity(value=2)

Creates a site frequency spectrum based on a dartR or genlight object

Description

Creates a site frequency spectrum based on a dartR or genlight object

Usage

gl.sfs(
  x,
  minbinsize = 0,
  folded = TRUE,
  singlepop = FALSE,
  plot.out = TRUE,
  verbose = NULL
)

Arguments

x

dartR/genlight object

minbinsize

Remove bins from the left of the sfs. For example, to remove singletons (alleles only occurring once among all individuals) set minbinsize to 2. If set to zero, monomorphic (d0) loci are also returned [default 0].

folded

If set to TRUE, a folded sfs (minor allele frequency sfs) is returned. If set to FALSE, an unfolded (derived allele frequency) sfs is returned. It is assumed that 0 is homozygous for the reference allele and 2 is homozygous for the derived allele, so you need to make sure your coding is correct [default TRUE].

singlepop

Switch to force the creation of a one-dimensional sfs, even though the genlight/dartR object contains more than one population [default FALSE].

plot.out

Specify if plot is to be produced [default TRUE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

Returns a site frequency spectrum, either a one-dimensional vector (only a single population in the dartR/genlight object or singlepop=TRUE) or an n-dimensional array (n is the number of populations in the genlight/dartR object). If the dartR/genlight object consists of several populations, a multidimensional site frequency spectrum, with one dimension per population, is returned. Be aware that the multidimensional spectrum works only for a limited number of populations and individuals [if these are too high, the table command used internally will throw an error because the number of dimensions becomes too large]. To get a single sfs for a genlight/dartR object with multiple populations, you need to set singlepop to TRUE. The returned sfs can be used to analyse demographics, e.g. using fastsimcoal2.
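
The difference between a folded and an unfolded one-dimensional spectrum can be sketched in base R. This is a simplified illustration that assumes a complete genotype matrix (no missing data) with the 0/1/2 coding described for the folded argument; the matrix geno is made up and the code is not the implementation used by gl.sfs.

```r
# Hypothetical genotypes: 4 diploid individuals x 6 loci, no missing data,
# coded 0 = homozygous reference, 1 = heterozygous, 2 = homozygous derived
geno <- rbind(
  c(0, 1, 2, 0, 2, 0),
  c(0, 0, 2, 1, 2, 0),
  c(0, 0, 1, 1, 2, 0),
  c(0, 0, 2, 0, 2, 1)
)
n_chrom <- 2 * nrow(geno)

# Unfolded: count of derived alleles per locus, bins 0..2n
derived <- colSums(geno)
sfs_unfolded <- table(factor(derived, levels = 0:n_chrom))

# Folded: count of the minor allele per locus, bins 0..n
minor <- pmin(derived, n_chrom - derived)
sfs_folded <- table(factor(minor, levels = 0:(n_chrom / 2)))

sfs_unfolded
sfs_folded
```

The first bin of either spectrum holds the monomorphic (d0) loci, which is the bin that a minbinsize of 1 would be expected to drop.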

Author(s)

Custodian: Bernd Gruber & Carlo Pacioni (Post to https://groups.google.com/d/forum/dartr)

References

Excoffier L., Dupanloup I., Huerta-Sánchez E., Sousa V. C. and Foll M. (2013) Robust demographic inference from genomic and SNP data. PLoS Genetics 9(10)

Examples

gl.sfs(bandicoot.gl, singlepop=TRUE)
gl.sfs(possums.gl[c(1:5,31:33),], minbinsize=1)

Runs Wright-Fisher simulations

Description

This function simulates populations made up of diploid organisms that reproduce in non-overlapping generations. Each individual has a pair of homologous chromosomes that contains interspersed selected and neutral loci. For the initial generation, the genotype for each individual’s chromosomes is randomly drawn from distributions at linkage equilibrium and in Hardy-Weinberg equilibrium.

See documentation and tutorial for a complete description of the simulations. These documents can be accessed at http://georges.biomatix.org/dartR

Note that the simulations will take a little longer the first time you use the function gl.sim.WF.run() because C++ functions must be compiled.

Usage

gl.sim.WF.run(
  file_var,
  ref_table,
  x = NULL,
  file_dispersal = NULL,
  number_iterations = 1,
  every_gen = 10,
  sample_percent = 50,
  store_phase1 = FALSE,
  interactive_vars = TRUE,
  seed = NULL,
  verbose = NULL,
  ...
)

Arguments

file_var

Path of the variables file 'sim_variables.csv' (see details) [required if interactive_vars = FALSE].

ref_table

Reference table created by the function gl.sim.WF.table [required].

x

Name of the genlight object containing the SNP data to extract values for some simulation variables (see details) [default NULL].

file_dispersal

Path of the file with the dispersal table created with the function gl.sim.create_dispersal [default NULL].

number_iterations

Number of iterations of the simulations [default 1].

every_gen

Generation interval at which simulations should be stored in a genlight object [default 10].

sample_percent

Percentage of individuals, from the total population, to sample and save in the genlight object every generation [default 50].

store_phase1

Whether to store simulations of phase 1 in genlight objects [default FALSE].

interactive_vars

Run a shiny app to input interactively the values of simulation variables [default TRUE].

seed

Set the seed for the simulations [default NULL].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

...

Any variable and its value can be added separately within the function call; such values override the ones supplied by the csv file. See tutorial.

Details

Values for simulation variables can be submitted into the function interactively through a shiny app if interactive_vars = TRUE. Optionally, if interactive_vars = FALSE, values for variables can be submitted by using the csv file 'sim_variables.csv', which can be found by typing in the R console: system.file('extdata', 'sim_variables.csv', package ='dartR').

The values of the variables can be modified using the third column (“value”) of this file.
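
Editing the third column can also be done from R instead of a spreadsheet. The sketch below uses a small stand-in table so that it runs without the package; the variable names, descriptions, values and the names of the first two columns are all hypothetical, and only the position and name of the "value" column follow the text above.

```r
# Stand-in for the variables file; names and values are hypothetical
vars <- data.frame(
  variable = c("variable_a", "variable_b"),
  description = c("first hypothetical variable", "second hypothetical variable"),
  value = c("10", "0.5"),
  stringsAsFactors = FALSE
)

# Edit the third column ("value") for the variable of interest
vars[vars$variable == "variable_a", 3] <- "50"

# Save the edited copy and keep its path for later use
my_file_var <- file.path(tempdir(), "my_sim_variables.csv")
write.csv(vars, my_file_var, row.names = FALSE)

# Read it back to confirm the change
edited <- read.csv(my_file_var, stringsAsFactors = FALSE)
edited
```

With the real file, the starting point would be the path returned by system.file('extdata', 'sim_variables.csv', package ='dartR'), and the path of the edited copy would be passed as file_var together with interactive_vars = FALSE.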

The output of the simulations can be analysed seamlessly with other dartR functions.

If a genlight object is used as input for some of the simulation variables, this function accesses the information stored in the slots x$position and x$chromosome.

To show further information about the variables in interactive mode, it might be necessary to first call 'library(shinyBS)' for the information to be displayed.

The main characteristics of the simulations are:

Simulations can be parameterised with real-life genetic characteristics such as the number, location, allele frequency and the distribution of fitness effects (selection coefficients and dominance) of loci under selection.
Simulations can recreate specific life histories and demographics, such as source populations, dispersal rate, number of generations, founder individuals, effective population size and census population size.
Each allele in each individual is an agent (i.e., each allele is explicitly simulated).
Each locus is customisable regarding its allele frequencies, selection coefficients, and dominance.
The number of loci, individuals, and populations to be simulated is only limited by computing resources.
Recombination is accurately modeled, and it is possible to use real recombination maps as input.
The ratio between effective population size and census population size can be easily controlled.
The output of the simulations are genlight objects for each generation or a subset of generations.
Genlight objects can be used as input for some simulation variables.

Value

Returns genlight objects with simulated data.

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

Examples

## Not run: 
ref_table <- gl.sim.WF.table(file_var=system.file('extdata', 
'ref_variables.csv', package = 'dartR'),interactive_vars = FALSE)
res_sim <- gl.sim.WF.run(file_var = system.file('extdata', 
'sim_variables.csv', package ='dartR'),ref_table=ref_table,
interactive_vars = FALSE)
## End(Not run)

Creates the reference table for running gl.sim.WF.run

Description

This function creates a reference table to be used as input for the function gl.sim.WF.run. The created table has eight columns with the following information for each locus to be simulated:

q - initial frequency.
h - dominance coefficient.
s - selection coefficient.
c - recombination rate.
loc_bp - chromosome location in base pairs.
loc_cM - chromosome location in centiMorgans.
chr_name - chromosome name.
type - SNP type.

The reference table can be further modified as required.

See documentation and tutorial for a complete description of the simulations. These documents can be accessed at http://georges.biomatix.org/dartR

Usage

gl.sim.WF.table(
  file_var,
  x = NULL,
  file_targets_sel = NULL,
  file_r_map = NULL,
  interactive_vars = TRUE,
  seed = NULL,
  verbose = NULL,
  ...
)

Arguments

file_var

Path of the variables file 'ref_variables.csv' (see details) [required if interactive_vars = FALSE].

x

Name of the genlight object containing the SNP data to extract values for some simulation variables (see details) [default NULL].

file_targets_sel

Path of the file with the targets for selection (see details) [default NULL].

file_r_map

Path of the file with the recombination map (see details) [default NULL].

interactive_vars

Run a shiny app to input interactively the values of simulation variables [default TRUE].

seed

Set the seed for the simulations [default NULL].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

...

Any variable and its value can be added separately within the function call; such values override the ones supplied by the csv file. See tutorial.

Details

Values for the variables to create the reference table can be submitted into the function interactively through a Shiny app if interactive_vars = TRUE. Optionally, if interactive_vars = FALSE, values for variables can be submitted by using the csv file 'ref_variables.csv', which can be found by typing in the R console: system.file('extdata', 'ref_variables.csv', package ='dartR').

The values of the variables can be modified using the third column (“value”) of this file.

If a genlight object is used as input for some of the simulation variables, this function accesses the information stored in the slots x$position and x$chromosome.

Examples of the format required for the recombination map file and the targets for selection file can be found by typing in the R console:

system.file('extdata', 'fly_recom_map.csv', package ='dartR')
system.file('extdata', 'fly_targets_of_selection.csv', package ='dartR')

To show further information about the variables in interactive mode, it might be necessary to first call 'library(shinyBS)' for the information to be displayed.

Value

Returns a list with the reference table used as input for the function gl.sim.WF.run and a table with the values of the variables used to create the reference table.

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

Examples

ref_table <- gl.sim.WF.table(file_var=system.file('extdata', 
'ref_variables.csv', package = 'dartR'),interactive_vars = FALSE)
## Not run: 
#uncomment to run 
res_sim <- gl.sim.WF.run(file_var = system.file('extdata', 
'sim_variables.csv', package ='dartR'),ref_table=ref_table,
interactive_vars = FALSE)
## End(Not run)

Creates a dispersal file as input for the function gl.sim.WF.run

Description

This function writes a csv file called "dispersal_table.csv" which contains the dispersal variables for each pair of populations to be used as input for the function gl.sim.WF.run.

The values of the variables can be modified using the columns "transfer_each_gen" and "number_transfers" of this file.

See documentation and tutorial for a complete description of the simulations. These documents can be accessed by typing in the R console: browseVignettes(package="dartR")

Usage

gl.sim.create_dispersal(
  number_pops,
  dispersal_type = "all_connected",
  number_transfers = 1,
  transfer_each_gen = 1,
  outpath = tempdir(),
  outfile = "dispersal_table.csv",
  verbose = NULL
)

Arguments

number_pops

Number of populations [required].

dispersal_type

One of: "all_connected", "circle" or "line" [default "all_connected"].

number_transfers

Number of dispersing individuals. This value can be modified by hand after the file has been created [default 1].

transfer_each_gen

Interval, in number of generations, at which dispersal occurs. This value can be modified by hand after the file has been created [default 1].

outpath

Path where to save the output file. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory [default tempdir(), mandated by CRAN].

outfile

File name of the output file [default 'dispersal_table.csv'].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

A csv file containing the dispersal variables for each pair of populations to be used as input for the function gl.sim.WF.run.

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

Examples

gl.sim.create_dispersal(number_pops=10)

Simulates emigration between populations

Description

A function that exchanges individuals between populations within a genlight object (i.e. simulates emigration between populations).

Usage

gl.sim.emigration(x, perc.mig = NULL, emi.m = NULL, emi.table = NULL)

Arguments

x

A genlight or list of genlight objects [required].

perc.mig

Percentage of individuals that migrate (emigrants = nInd times perc.mig) [default NULL].

emi.m

Probabilistic emigration matrix (emigrate from=column to=row) [default NULL].

emi.table

If presented, the emi.m matrix is ignored. Deterministic emigration as specified in the matrix (a square matrix with dimension equal to the number of populations); e.g. an entry 'emi.table[2,1] <- 5' means that five individuals emigrate from population 1 to population 2 (from=column and to=row) [default NULL].

Details

There are two ways to specify emigration. If an emi.table is provided (a square matrix with dimension equal to the number of populations that specifies the emigration from column x to row y), then emigration is deterministic in terms of numbers of individuals as specified in the table. If perc.mig and emi.m are provided, then emigration is probabilistic. The number of emigrants is determined by the population size times perc.mig, and the population to migrate to is then drawn using the relative probabilities in the columns of the emi.m table.

Be aware that if the diagonal is non-zero then migration can occur into the same patch, so most often you want to set the diagonal of the emi.m matrix to zero. Which individuals are moved is random, but populations are processed in order. It is possible that an individual moves twice within an emigration call (as there is no check, an individual moved from population 1 to 2 can move again from population 2 to 3).
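
The probabilistic mode can be sketched in base R using only population sizes. This is an illustration of the logic described above, not the code of gl.sim.emigration: the population sizes are made up, and rounding the number of emigrants to a whole number is an assumption of this sketch.

```r
set.seed(1)  # for a reproducible illustration

pop_size <- c(20, 30, 50)  # hypothetical sizes of three populations
perc.mig <- 0.1

# Columns = source population, rows = destination; the diagonal is zero so
# that nobody "migrates" into its own population
emi.m <- matrix(c(0,   0.5, 0.5,
                  0.5, 0,   0.5,
                  0.5, 0.5, 0), nrow = 3, byrow = TRUE)

# Number of emigrants leaving each population (rounding is an assumption)
n_emigrants <- round(pop_size * perc.mig)

# Destination of each emigrant, drawn from the column of its source population
moves <- do.call(rbind, lapply(seq_along(pop_size), function(from) {
  to <- sample(seq_along(pop_size), n_emigrants[from],
               replace = TRUE, prob = emi.m[, from])
  data.frame(from = from, to = to)
}))

# Realised counts in emi.table layout: entry [to, from] counts the movers
realised <- table(factor(moves$to, levels = 1:3),
                  factor(moves$from, levels = 1:3))
realised
```

A table such as realised has the same layout as emi.table, which is how the deterministic mode expects the number of movers from each column population to each row population to be given.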

Value

A list of genlight objects or a single genlight object [depending on the input], in which emigration between populations has happened.

Author(s)

Custodian: Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

Examples

x <- possums.gl
#one individual moves from every population to
#every other population
emi.tab <- matrix(1, nrow=nPop(x), ncol=nPop(x))
diag(emi.tab) <- 0
np <- gl.sim.emigration(x, emi.table=emi.tab)
np

Simulates individuals based on the allele frequencies provided via a genlight object.

Description

This function simulates individuals based on the allele frequencies of a genlight object. The output is a genlight object with the same number of loci as the input genlight object.

Usage

gl.sim.ind(x, n = 50, popname = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

n

Number of individuals that should be simulated [default 50].

popname

A population name for the simulated individuals [default NULL].

Details

The function can be used to simulate populations for sampling designs or for power analysis. Check the example below, where the effect of drift is explored by simulating a genlight object over several generations, each time using the allele frequencies of the previous generation. The beauty of the function is that it is lightning fast. Be aware that this is a simulation and, to avoid lengthy error checking, the function crashes if there are loci that have only NAs. If such a case can occur during your simulation, those loci need to be removed before the function is called.
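The idea of drawing new individuals from allele frequencies can be sketched on a plain 0/1/2 genotype matrix. This is a minimal sketch and not the code of gl.sim.ind: the helper sketch_sim_ind and the assumption of two independent allele draws per locus (binomial sampling) are introduced here for illustration.

```r
# Hypothetical helper: simulate n diploid individuals from the allele
# frequencies of a 0/1/2 genotype matrix (rows = individuals, cols = loci).
sketch_sim_ind <- function(geno, n) {
  # frequency of the alternate allele per locus, ignoring missing values
  p <- colMeans(geno, na.rm = TRUE) / 2
  # loci with only NAs give NaN frequencies and cannot be simulated
  if (anyNA(p)) stop("loci with only NAs must be removed first")
  # two independent allele draws per individual and locus (assumed)
  sim <- vapply(p, function(pl) rbinom(n, size = 2, prob = pl), integer(n))
  matrix(sim, nrow = n, dimnames = list(NULL, colnames(geno)))
}

set.seed(42)
parent <- matrix(sample(0:2, 20 * 5, replace = TRUE), nrow = 20,
                 dimnames = list(NULL, paste0("loc", 1:5)))
parent[, "loc5"] <- 0            # a fixed locus stays fixed
sim <- sketch_sim_ind(parent, n = 10)
dim(sim)
```

The explicit check for all-NA loci mirrors the warning above: the frequency of such a locus is undefined, so it has to be removed beforehand.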

Value

A genlight object with n individuals.

Author(s)

Bernd Gruber (bernd.gruber@canberra.edu.au)

Examples

glsim <- gl.sim.ind(testset.gl, n=10, popname='sims')
glsim
### Simulate drift over 50 generations
# assuming a bottleneck of only 10 individuals
# [ignoring effect of mating and mutation]
# Simulate 20 individuals with no structure and 50 SNP loci
founder <- glSim(n.ind = 20, n.snp.nonstruc = 50, ploidy=2)
# number of fixed loci in the first generation
res <- sum(colMeans(as.matrix(founder), na.rm=TRUE) %%2 ==0)
simgl <- founder
# 49 generations of only 10 individuals
for (i in 2:50){
   simgl <- gl.sim.ind(simgl, n=10, popname='sims')
   res[i] <- sum(colMeans(as.matrix(simgl), na.rm=TRUE) %%2 ==0)
}
plot(1:50, res, type='b', xlab='generation', ylab='# fixed loci')

Simulates mutations within a genlight object

Description

This script is intended to be used within the simulation framework of dartR. It adds the ability to apply a constant mutation rate across all loci. It currently works only for biallelic data sets (SNPs). The mutation rate is applied to every allele position. Mutations at loci with missing values are ignored. In principle, 'double mutations' at the same locus can occur, but they should be rare.

Usage

gl.sim.mutate(x, mut.rate = 1e-06)

Arguments

x

Name of the genlight object containing the SNP data [required].

mut.rate

Constant mutation rate over nInd*nLoc*2 possible locations [default 1e-6].
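To make the meaning of "nInd*nLoc*2 possible locations" concrete, the following base R sketch applies a constant rate to every allele position of a 0/1/2 matrix. It is an assumption-laden illustration, not the implementation of gl.sim.mutate: the helper sketch_mutate and the rule that a mutated heterozygote becomes either homozygote with equal probability are hypothetical.

```r
# Hypothetical helper: mutate allele positions of a 0/1/2 genotype matrix.
sketch_mutate <- function(geno, mut.rate) {
  n.pos <- length(geno) * 2               # nInd * nLoc * 2 allele positions
  n.mut <- rbinom(1, size = n.pos, prob = mut.rate)
  hit <- sample.int(n.pos, n.mut)         # allele positions that mutate
  cell <- (hit - 1) %/% 2 + 1             # genotype cell holding each position
  for (k in cell) {
    g <- geno[k]
    if (is.na(g)) next                    # missing genotypes are skipped
    # flipping one allele: homozygotes become heterozygous; a heterozygote
    # becomes either homozygote with equal probability (assumed)
    geno[k] <- if (g == 1) sample(c(0, 2), 1) else 1
  }
  geno
}

set.seed(7)
g <- matrix(sample(c(0, 1, 2, NA), 100 * 200, replace = TRUE), nrow = 100)
g2 <- sketch_mutate(g, mut.rate = 1e-3)
# expected number of mutated positions: 100 * 200 * 2 * 1e-3 = 40
table(before = g, after = g2, useNA = "ifany")
```

Because both allele positions of one genotype can be hit in the same call, the sketch also allows the rare 'double mutations' mentioned in the description.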

Value

Returns a genlight object with the applied mutations

Author(s)

Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

Examples

b2 <- gl.sim.mutate(bandicoot.gl, mut.rate=1e-4)
# check the mutations that have occurred
table(as.matrix(bandicoot.gl), as.matrix(b2))

Simulates a specified number of offspring based on alleles provided by potential father(s) and mother(s)

Description

This takes a population (or a single individual) of fathers (provided as a genlight object) and mother(s) and simulates offspring based on 'random' mating. It can be used to simulate population dynamics and check the effect of those dynamics on allele frequencies and the number of alleles. Another application is to simulate relatedness of siblings and compare it to actual relatedness found in the population to determine kinship.
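A minimal sketch of 'random' mating on 0/1/2 genotype matrices is given here. It is not the code of gl.sim.offspring: the helpers gamete and sketch_offspring are hypothetical, missing genotypes are assumed to be absent, and the sex ratio is left out.

```r
# A parent with genotype g (count of alternate alleles) passes on the
# alternate allele with probability g / 2 (Mendelian transmission).
gamete <- function(g) rbinom(length(g), size = 1, prob = g / 2)

# Hypothetical helper: each mother has noffpermother offspring, each sired
# by a father drawn at random.
sketch_offspring <- function(fathers, mothers, noffpermother) {
  mum <- rep(seq_len(nrow(mothers)), each = noffpermother)
  dad <- sample.int(nrow(fathers), length(mum), replace = TRUE)
  off <- matrix(NA_integer_, nrow = length(mum), ncol = ncol(mothers))
  for (i in seq_along(mum)) {
    off[i, ] <- gamete(mothers[mum[i], ]) + gamete(fathers[dad[i], ])
  }
  off
}

set.seed(3)
fathers <- matrix(2, nrow = 4, ncol = 6)   # all homozygous alternate
mothers <- matrix(0, nrow = 3, ncol = 6)   # all homozygous reference
kids <- sketch_offspring(fathers, mothers, noffpermother = 2)
kids   # every offspring is heterozygous (1) at every locus
```

Crossing two opposite homozygous groups is a convenient sanity check, because the outcome does not depend on the random draws.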

Usage

gl.sim.offspring(fathers, mothers, noffpermother, sexratio = 0.5)

Arguments

fathers

Genlight object of potential fathers [required].

mothers

Genlight object of potential mothers [required].

noffpermother

Number of offspring per mother [required].

sexratio

The sex ratio of simulated offspring (females / (females + males); 1 equals 100 percent females) [default 0.5].

Value

A genlight object with n individuals.

Author(s)

Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

Examples

# Simulate 10 potential fathers
gl.fathers <- glSim(10, 20, ploidy=2)
# Simulate 10 potential mothers
gl.mothers <- glSim(10, 20, ploidy=2)
gl.sim.offspring(gl.fathers, gl.mothers, 2, sexratio=0.5)

Smear plot of SNP or presence/absence (SilicoDArT) data

Description

Each locus is color coded for scores of 0, 1, 2 and NA for SNP data and 0, 1 and NA for presence/absence (SilicoDArT) data. Individual labels can be added and individuals can be grouped by population.

The plot may become cluttered if ind_labels = TRUE and there are many individuals. If there are too many individuals, it is best to use ind_labels_size = 0.

Usage

gl.smearplot(
  x,
  ind_labels = FALSE,
  group_pop = FALSE,
  ind_labels_size = 10,
  plot_colors = NULL,
  posi = "bottom",
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required].

ind_labels

If TRUE, individuals are labelled with indNames(x) [default FALSE].

group_pop

If ind_labels is TRUE, group by population [default FALSE].

ind_labels_size

Size of the individual labels [default 10].

plot_colors

Vector with four color names for homozygotes for the reference allele, heterozygotes, homozygotes for the alternative allele and for missing values (NA), e.g. four_colours [default NULL]. Can be set to "hetonly", which defines colors to only show heterozygotes in the genlight object.

posi

Position of the legend: 'left', 'top', 'right', 'bottom' or 'none' [default 'bottom'].

save2tmp

If TRUE, saves plot to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL].

Value

Returns unaltered genlight object

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Examples

gl.smearplot(testset.gl, ind_labels=FALSE)
gl.smearplot(testset.gs[1:10,], ind_labels=TRUE)

Re-sorts genlight objects

Description

Often it is desirable to have the individuals of a genlight object sorted by population name or individual name, for example to obtain a more informative gl.smearplot (showing banding patterns for populations). Sorting by loci can also be informative in some instances. This function sorts the individuals of a genlight object according to a supplied order of individuals or populations, and can also sort loci according to a locus metric. See the examples below for specifics.

Usage

gl.sort(x, sort.by = "pop", order.by = NULL, verbose = NULL)

Arguments

x

genlight object containing SNP/silicodart genotypes

sort.by

Either "ind" or "pop" [default "pop"].

order.by

Vector that is used to order individuals or loci. Depending on the sort.by parameter, this needs to be a vector of length nPop(genlight) for populations or nInd(genlight) for individuals. If not specified, alphabetical order of populations or individuals is used. For sort.by="ind", order.by can also be a vector specifying the order for each individual (for example another ind.metrics).

verbose

set verbosity

Details

This is a convenience function to facilitate sorting of individuals within the genlight object. For example, if you want to visualise the "band" of each population in a gl.smearplot, then the order of individuals is important.
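The two kinds of ordering can be illustrated with base R on plain vectors. This is a sketch of the idea only, not of gl.sort itself; the population labels, individual names and missing-value counts are made up.

```r
pop <- c("QLD", "WA", "NSW", "WA", "SA", "QLD")   # population per individual
ind <- paste0("ind", seq_along(pop))

# ordering by population: one entry per population, here west to east
pop.order <- c("WA", "SA", "NSW", "QLD")
idx <- order(factor(pop, levels = pop.order))
ind[idx]
pop[idx]

# ordering by individual: one value per individual, e.g. missing-value counts
miss <- c(3, 0, 5, 1, 2, 4)
ind[order(miss)]
```

Turning the population labels into a factor with the desired level order is what lets a vector of length nPop drive the order of all nInd individuals.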

Value

Returns a reordered genlight object. The ind/loc.metrics and coordinates are sorted accordingly.

Author(s)

Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

Examples

# sort by populations
bc <- gl.sort(bandicoot.gl)
# sort from West to East
bc2 <- gl.sort(bandicoot.gl, sort.by="pop",
  order.by=c("WA", "SA", "VIC", "NSW", "QLD"))
# sort by missing values
miss <- rowSums(is.na(as.matrix(bandicoot.gl)))
bc3 <- gl.sort(bandicoot.gl, sort.by="ind", order.by=miss)
gl.smearplot(bc3)

Spatial autocorrelation following Smouse and Peakall 1999

Description

Global spatial autocorrelation is a multivariate approach combining all loci into a single analysis. The autocorrelation coefficient "r" is calculated for each pair of individuals in each specified distance class. For more information see Smouse and Peakall 1999, Peakall et al. 2003 and Smouse et al. 2008.

Usage

gl.spatial.autoCorr(
  x = NULL,
  Dgeo = NULL,
  Dgen = NULL,
  coordinates = "latlon",
  Dgen_method = "Euclidean",
  Dgeo_trans = "Dgeo",
  Dgen_trans = "Dgen",
  bins = 5,
  reps = 100,
  plot.pops.together = FALSE,
  permutation = TRUE,
  bootstrap = TRUE,
  plot_theme = NULL,
  plot_colors_pop = NULL,
  CI_color = "red",
  plot.out = TRUE,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

Genlight object [default NULL].

Dgeo

Geographic distance matrix if no genlight object is provided. This is typically a Euclidean distance but it can be any meaningful (geographical) distance metric [default NULL].

Dgen

Genetic distance matrix if no genlight object is provided [default NULL].

coordinates

Can be either 'latlon', 'xy' or a two column data.frame with column names ('lat', 'lon') or ('x', 'y'). Coordinates are provided via gl@other$latlon ['latlon'] or via gl@other$xy ['xy']. If 'latlon', the data will be projected to meters using the Mercator system [google maps]; if 'xy', the distance is calculated directly on the coordinates [default "latlon"].

Dgen_method

Method to calculate genetic distances. See details [default "Euclidean"].

Dgeo_trans

Transformation to be used on the geographic distances. See Dgen_trans [default "Dgeo"].

Dgen_trans

bins

The number of bins for the distance classes (i.e. length(bins) == 1) or a vector with the break points. See details [default 5].

reps

The number of replicates to be used for permutation and bootstrap analyses [default 100].

plot.pops.together

Plot all the populations in one plot. Confidence intervals from permutations are not shown [default FALSE].

permutation

Whether permutation calculations for the null hypothesis of no spatial structure should be carried out [default TRUE].

bootstrap

Whether bootstrap calculations to compute the 95% confidence intervals around r should be carried out [default TRUE].

plot_theme

Theme for the plot. See details [default NULL].

plot_colors_pop

A color palette for populations or a list with as many colors as there are populations in the dataset [default NULL].

CI_color

Color for the shade of the 95% confidence intervals around the r estimates [default "red"].

plot.out

Specify if plot is to be produced [default TRUE].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity].

Details

This function executes a modified version of spautocorr from the package PopGenReport. Unlike PopGenReport, this function also computes the 95% confidence intervals around r via bootstraps, the 95% confidence intervals around the null hypothesis of no spatial structure and the one-tail test via permutation, and the correction factor described by Peakall et al. 2003.

The input can be i) a genlight object (which has to have the latlon slot populated), ii) a pair of Dgeo and Dgen, which have to be either matrix or dist objects, or iii) a list of the matrix or dist objects if the analysis needs to be carried out for multiple populations (in this case, all the elements of the list have to be of the same class, i.e. matrix or dist, and the population order in the two lists has to be the same).

If the input is a genlight object, the function calculates the linear distance for Dgeo and the relevant Dgen matrix (see Dgen_method) for each population. When the method selected is a genetic similarity matrix (e.g. "simple" distance), the matrix is internally transformed with 1 - Dgen so that positive values of the autocorrelation coefficient indicate more closely related individuals, as implemented in GenAlEx. If the user provides the distance matrices, care must be taken in interpreting the results because a similarity matrix will generate negative values for closely related individuals.

If max(Dgeo) > 1000 (e.g. the geographic distances are in thousands of metres), values are divided by 1000 (in this example they would become km) to facilitate readability of the plots.

If bins is of length 1, it is interpreted as the number of (even) bins to use. In this case the starting point is always the minimum value in the distance matrix, and the last is the maximum. If it is a numeric vector of length > 1, it is interpreted as the break points. In this case, the first has to be the lowest value, and the last has to be the highest. There are no internal checks for this and it is the user's responsibility to ensure that distance classes are properly set up. If that is not the case, data that fall outside the range provided will be dropped. The number of bins will be length(bins) - 1.
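The two ways of specifying bins can be illustrated with base R's cut(). This is a sketch of the binning logic as described above, not the function's internal code; the distances are hypothetical and the choice of right-closed intervals is an assumption.

```r
d <- c(0, 150, 900, 2500, 4000, 9999, 12000)   # hypothetical distances (m)

# a single number: that many even classes from min(d) to max(d)
breaks.even <- seq(min(d), max(d), length.out = 5 + 1)

# a vector: used directly as break points, giving length(bins) - 1 classes
bins <- seq(0, 10000, 2000)
classes <- cut(d, breaks = bins, include.lowest = TRUE)

# 12000 lies outside the supplied range, is assigned NA and would be dropped
table(classes, useNA = "ifany")
```

With bins = seq(0, 10000, 2000) there are six break points and therefore five distance classes, which is why the lowest and highest break points need to enclose all the distances that should be analysed.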

The permutation constructs the 95% confidence intervals around the null hypothesis of no spatial structure (this is a two-tail test). The same data are also used to calculate the probability of the one-tail test (see references below for details).

Bootstrap calculations are skipped and NA is returned when the number of possible combinations given the sample size of any given distance class is < reps.

Methods available to calculate genetic distances for SNP data:

"propShared" using the function gl.propShared.
"grm" using the function gl.grm.
"Euclidean" using the function gl.dist.ind.
"Simple" using the function gl.dist.ind.
"Absolute" using the function gl.dist.ind.
"Manhattan" using the function gl.dist.ind.

Methods available to calculate genetic distances for SilicoDArT data:

"Euclidean" using the function gl.dist.ind.
"Simple" using the function gl.dist.ind.
"Jaccard" using the function gl.dist.ind.
"Bray-Curtis" using the function gl.dist.ind.

Examples of other themes that can be used can be consulted in

Value

Returns a data frame with the following columns:

Bin The distance classes
N The number of pairwise comparisons within each distance class
r.uc The uncorrected autocorrelation coefficient
Correction The correction factor
r The corrected autocorrelation coefficient
L.r The corrected autocorrelation coefficient lower limit (if bootstrap = TRUE)
U.r The corrected autocorrelation coefficient upper limit (if bootstrap = TRUE)
L.r.null.uc The uncorrected lower limit for the null hypothesis of no spatial autocorrelation (if permutation = TRUE)
U.r.null.uc The uncorrected upper limit for the null hypothesis of no spatial autocorrelation (if permutation = TRUE)
L.r.null The corrected lower limit for the null hypothesis of no spatial autocorrelation (if permutation = TRUE)
U.r.null The corrected upper limit for the null hypothesis of no spatial autocorrelation (if permutation = TRUE)
p.one.tail The p value of the one tail statistical test

Author(s)

Carlo Pacioni, Bernd Gruber & Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

References

Smouse, PE, Peakall, R. 1999. Spatial autocorrelation analysis of individual multiallele and multilocus genetic structure. Heredity 82, 561-573.
Double, MC, et al. 2005. Dispersal, philopatry and infidelity: dissecting local genetic structure in superb fairy-wrens (Malurus cyaneus). Evolution 59, 625-635.
Peakall, R, et al. 2003. Spatial autocorrelation analysis offers new insights into gene flow in the Australian bush rat, Rattus fuscipes. Evolution 57, 1182-1195.
Smouse, PE, et al. 2008. A heterogeneity test for fine-scale genetic structure. Molecular Ecology 17, 3389-3400.
Gonzales, E, et al. 2010. The impact of landscape disturbance on spatial genetic structure in the Guanacaste tree, Enterolobium cyclocarpum (Fabaceae). Journal of Heredity 101, 133-143.
Beck, N, et al. 2008. Social constraint and an absence of sex-biased dispersal drive fine-scale genetic structure in white-winged choughs. Molecular Ecology 17, 4346-4358.

Examples

require("dartR.data")
res <- gl.spatial.autoCorr(platypus.gl, bins=seq(0,10000,2000))
# using one population, showing sample size
test <- gl.keep.pop(platypus.gl, pop.list = "TENTERFIELD")
res <- gl.spatial.autoCorr(test, bins=seq(0,10000,2000), CI_color = "green")

Subsamples n loci from a genlight object and return it as a genlight object

Description

This is a support script to subsample a genlight {adegenet} object based on loci. Two methods are available for subsampling: random, and based on information content.

Usage

gl.subsample.loci(x, n, method = "random", mono.rm = FALSE, verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required].

n

Number of loci to include in the subsample [required].

method

Method: 'random', in which case the loci are sampled at random; or 'pic', in which case the top n loci ranked on information content are chosen. Information content is stored in AvgPIC in the case of SNP data and in PIC in the case of presence/absence (SilicoDArT) data [default 'random'].

mono.rm

Delete monomorphic loci before sampling [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].
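The difference between the two values of method can be shown on a named vector of information content. This is a base R sketch with simulated values, not the code of gl.subsample.loci; the vector pic stands in for the AvgPIC or PIC locus metric.

```r
set.seed(11)
pic <- runif(20)                      # hypothetical information content
names(pic) <- paste0("loc", 1:20)
n <- 5

# method = 'random': n loci sampled at random
keep.random <- sample(names(pic), n)

# method = 'pic': the top n loci ranked on information content
keep.pic <- names(sort(pic, decreasing = TRUE))[seq_len(n)]

keep.random
keep.pic
```

Ranking on information content is deterministic for a given dataset, whereas random sampling changes from call to call unless a seed is set.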

Value

A genlight object with n loci

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Examples

# SNP data
gl2 <- gl.subsample.loci(testset.gl, n=200, method='pic')
# Tag P/A data
gl2 <- gl.subsample.loci(testset.gs, n=100, method='random')

Tests the difference in heterozygosity between populations taken pairwise

Description

Calculates heterozygosities (expected or observed) for each population in a genlight object, and uses re-randomization to test the statistical significance of differences in heterozygosity between populations taken pairwise.

Usage

gl.test.heterozygosity(
  x,
  nreps = 100,
  alpha1 = 0.05,
  alpha2 = 0.01,
  test_het = "He",
  plot.out = TRUE,
  max_plots = 6,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

A genlight object containing the SNP genotypes [required].

nreps

Number of replications of the re-randomization [default 100].

alpha1

First significance level for comparison with diff=0 on plot [default 0.05].

alpha2

Second significance level for comparison with diff=0 on plot [default 0.01].

test_het

Whether to test difference using observed heterozygosity ("Ho") or expected heterozygosity ("He") [default "He"].

plot.out

If TRUE, plots a sampling distribution of the differences for each comparison [default TRUE].

max_plots

Maximum number of plots to print per page [default 6].

plot_theme

Theme for the plot. See Details for options [default theme_dartR()].

plot_colors

List of two color names for the borders and fill of the plots [default two_colors].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity].

Details

Function's output

If plot.out = TRUE, plots are created showing the sampling distribution for the difference between each pair of heterozygosities, marked with the critical limits alpha1 and alpha2, the observed heterozygosity, and the zero value (if in range).
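One way to build such a sampling distribution is to resample loci with replacement and recompute the difference in expected heterozygosity each time. The sketch below does this in base R on simulated 0/1/2 matrices. It is an assumption about how a re-randomization could look, not the procedure implemented in gl.test.heterozygosity; the helpers he_loc and sketch_het_diff are hypothetical.

```r
# per-locus expected heterozygosity (2pq) of a 0/1/2 genotype matrix
he_loc <- function(geno) {
  p <- colMeans(geno, na.rm = TRUE) / 2
  2 * p * (1 - p)
}

# Hypothetical helper: distribution of the difference in mean He between two
# populations, obtained by resampling loci with replacement (assumed scheme).
sketch_het_diff <- function(g1, g2, nreps = 100, alpha = 0.05) {
  h1 <- he_loc(g1)
  h2 <- he_loc(g2)
  obs <- mean(h1) - mean(h2)
  boot <- replicate(nreps, {
    idx <- sample.int(length(h1), replace = TRUE)   # same loci in both
    mean(h1[idx]) - mean(h2[idx])
  })
  limits <- quantile(boot, c(alpha / 2, 1 - alpha / 2), names = FALSE)
  list(diff = obs, limits = limits,
       zero.in.range = limits[1] <= 0 && 0 <= limits[2])
}

set.seed(5)
nloc <- 200
# population A is highly polymorphic, population B is close to fixation
A <- sapply(rep(0.5, nloc), function(p) rbinom(30, 2, p))
B <- sapply(rep(0.05, nloc), function(p) rbinom(30, 2, p))
out <- sketch_het_diff(A, B, nreps = 200)
out$diff
out$limits
out$zero.in.range
```

If zero falls outside the critical limits, the difference between the two heterozygosities is taken as significant at the chosen level, which corresponds to the zero value being out of range on the plot.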

Plots and table are saved to the temporary directory (tempdir) and can be accessed with the function gl.print.reports and listed with the function gl.list.reports. Note that they can be accessed only in the current R session because tempdir is cleared each time that the R session is closed.

Examples of other themes that can be used can be consulted in

Value

A dataframe containing population labels, heterozygosities and sample sizes

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

Examples

out <- gl.test.heterozygosity(platypus.gl, nreps=1, verbose=3, plot.out=TRUE)

Outputs an nj tree to summarize genetic similarity among populations

Description

This function is a wrapper for the nj function of package ape applied to Euclidean distances calculated from the genlight object.

Usage

gl.tree.nj(
  x,
  d_mat = NULL,
  type = "phylogram",
  outgroup = NULL,
  labelsize = 0.7,
  treefile = NULL,
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP data [required].

d_mat

Distance matrix [default NULL].

type

Type of dendrogram "phylogram"|"cladogram"|"fan"|"unrooted" [default "phylogram"].

outgroup

Vector containing the population names that are the outgroups [default NULL].

labelsize

Size of the labels as a proportion of the graphics default [default 0.7].

treefile

Name of the file for the tree topology using Newick format [default NULL].

verbose

Specify the level of verbosity: 0, silent, fatal errors only; 1, flag function begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2].

Details

A Euclidean distance matrix is calculated by default [d_mat = NULL]. Optionally, the user can supply any other distance matrix as input for the tree via the d_mat parameter; see for example the function gl.dist.pop.
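The two steps, a Euclidean distance matrix followed by neighbour joining, can be sketched outside dartR as follows. The allele frequencies are simulated, computing the distances on population allele frequencies is an assumption about the default, and the tree is only built if package ape is installed.

```r
# hypothetical allele frequencies: rows = populations, columns = loci
set.seed(9)
freq <- matrix(runif(4 * 30), nrow = 4,
               dimnames = list(c("popA", "popB", "popC", "popD"), NULL))

# Euclidean distances between populations; 4 populations give 6 pairs
d_mat <- dist(freq, method = "euclidean")
d_mat

if (requireNamespace("ape", quietly = TRUE)) {
  tree <- ape::nj(d_mat)                      # object of class 'phylo'
  plot(tree, type = "phylogram", cex = 0.7)   # cex plays the role of labelsize
}
```

Any other object of class dist with one entry per population pair could take the place of d_mat here, which is the role of the d_mat parameter.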

Value

A tree file of class phylo.

Author(s)

Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

Examples

# SNP data
gl.tree.nj(testset.gl, type='fan')
# Tag P/A data
gl.tree.nj(testset.gs, type='fan')
res <- gl.tree.nj(platypus.gl)

Writes out data from a genlight object to csv file

Description

This script writes to file the SNP genotypes with specimens as entities (columns) and loci as attributes (rows). Each row has associated locus metadata. Each column, with header of specimen id, has population in the first row.

The data coding differs from the DArT 1row format in that 0 = reference homozygous, 2 = alternate homozygous, 1 = heterozygous, and NA = missing SNP assignment.
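The layout described above (loci as rows, specimens as columns, population in the first row) can be reproduced with a small matrix in base R. This is a sketch of the file layout only, not the code of gl.write.csv; the locus metadata columns are left out and all names are made up.

```r
# genotypes coded 0 / 1 / 2 / NA, rows = specimens, columns = loci
geno <- matrix(c(0, 1, 2,
                 NA, 2, 0), nrow = 2, byrow = TRUE,
               dimnames = list(c("ind1", "ind2"), c("locA", "locB", "locC")))
pop <- c(ind1 = "north", ind2 = "south")

# transpose so that loci become rows, then put the population on top
out <- rbind(pop = pop[rownames(geno)], t(geno))

f <- file.path(tempdir(), "sketch_1row.csv")
write.csv(out, f, quote = FALSE)
readLines(f)
```

The file is written to tempdir(), in line with the default outpath of the function.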

Usage

gl.write.csv(x, outfile = "outfile.csv", outpath = tempdir(), verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

outfile

File name of the output file (including extension) [default "outfile.csv"].

outpath

Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory.

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

Saves a genlight object to csv, returns NULL.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

# SNP data
gl.write.csv(testset.gl, outfile='SNP_1row.csv')
# Tag P/A data
gl.write.csv(testset.gs, outfile='PA_1row.csv')

Converts a genlight object into a format suitable for input to Bayescan

Description

The output text file contains the SNP data and relevant BayeScan command lines to guide input.

Usage

gl2bayescan(x, outfile = "bayescan.txt", outpath = tempdir(), verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

outfile

File name of the output file (including extension) [default "bayescan.txt"].

outpath

Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory.

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

returns no value (i.e. NULL)

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

References

Foll M and OE Gaggiotti (2008) A genome scan method to identify selected loci appropriate for both dominant and codominant markers: A Bayesian perspective. Genetics 180: 977-993.

Examples

out <- gl2bayescan(testset.gl)

Converts a genlight object into a format suitable for input to the BPP program

Description

This function generates the sequence alignment file and the Imap file. The control file should be produced by the user.

Usage

gl2bpp(
  x,
  method = 1,
  outfile = "output_bpp.txt",
  imap = "Imap.txt",
  outpath = tempdir(),
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP data [required].

method

One of 1 | 2, see details [default = 1].

outfile

Name of the sequence alignment file ["output_bpp.txt"].

imap

Name of the Imap file ["Imap.txt"].

outpath

Path where to save the output file (set to tempdir by default)

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

If method = 1, heterozygous positions are replaced by standard ambiguity codes.

If method = 2, the heterozygous state is resolved by randomly assigning one or the other SNP variant to the individual.
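The random resolution used by method = 2 can be sketched for a single individual in base R. This is an illustration and not the code of gl2bpp: the helper sketch_haploidize is hypothetical, the alleles are assumed to be given as 'reference/alternate' (as in slot loc.all, e.g. 'G/A'), and coding missing genotypes as 'N' is an assumption.

```r
# Hypothetical helper: one base per locus, heterozygotes resolved at random.
# geno    : 0/1/2/NA genotypes of one individual
# loc.all : alleles per locus as 'ref/alt'
sketch_haploidize <- function(geno, loc.all) {
  alleles <- strsplit(loc.all, "/", fixed = TRUE)
  ref <- vapply(alleles, `[`, character(1), 1)
  alt <- vapply(alleles, `[`, character(1), 2)
  coin <- sample(c(TRUE, FALSE), length(geno), replace = TRUE)
  base <- ifelse(geno == 0, ref,
          ifelse(geno == 2, alt,
          ifelse(coin, ref, alt)))   # heterozygote: either allele, p = 0.5
  base[is.na(geno)] <- "N"           # missing genotype (coding assumed)
  base
}

set.seed(2)
hap <- sketch_haploidize(c(0, 2, 1, NA), c("G/A", "C/T", "A/T", "G/C"))
hap
```

Only the third locus is heterozygous, so only its base depends on the random draw; the homozygous loci always return the reference or the alternate allele.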

Trimmed sequences for which the SNP has been trimmed out, rarely, by adapter mis-identity are deleted.

This function requires 'TrimmedSequence' to be among the locus metrics (@other$loc.metrics) and information of the type of alleles (slot loc.all e.g. 'G/A') and the position of the SNP in slot position of the 'genlight' object (see testset.gl@position and testset.gl@loc.all for how to format these slots).

It's important to keep in mind that analyses based on coalescent theory, like those done by the programme BPP, are meant to be used with sequence data. In this type of data, large chunks of DNA are sequenced, so when we find polymorphic sites along the sequence, we know they are all on the same chromosome. This kind of data, in which we know which chromosome each allele comes from, is called "phased data". Most data from reduced representation genome-sequencing methods, like DArTseq, are unphased, which means that we don't know which chromosome each allele comes from. So, if we apply coalescent theory to data that are not phased, we will get biased results. As in Ellegren et al. (2012), one way to deal with this is to "haplodize" each genotype by randomly choosing one allele from heterozygous genotypes, using method = 2.

Be mindful that there is little information in the literature on the validity of this method.

Value

returns no value (i.e. NULL)

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

References

Ellegren, Hans, et al. "The genomic landscape of species divergence in Ficedula flycatchers." Nature 491.7426 (2012): 756-760.
Flouri T., Jiao X., Rannala B., Yang Z. (2018) Species Tree Inference with BPP using Genomic Sequences and the Multispecies Coalescent. Molecular Biology and Evolution, 35(10):2585-2593. doi:10.1093/molbev/msy147

Examples

require(dartR.data)
test <- platypus.gl
test <- gl.filter.callrate(test, threshold = 1)
test <- gl.filter.monomorphs(test)
test <- gl.subsample.loci(test, n=25)
gl2bpp(x = test)

Convert a genlight object to a dartR object

Description

This function converts a 'genlight' object into a 'dartR' object by changing its class attribute. It is used to convert legacy data sets to the new dartR format.

Usage

gl2dartR(x, filename = NULL, file.path = tempdir())

Arguments

x

An object of class 'genlight' to be converted.

filename

A character string specifying the name of the file to save the converted object. [default is gl.rds]

file.path

A character string specifying the path to save the file.

Value

The input object with class changed to '"dartR"' and its package attribute set to '"dartR.base"'.

Examples

# Simulating a genlight object
simgl <- glSim(10, 100, ploidy = 2, indnames=1:10, locnames=1:100)
simgl <- gl2dartR(simgl)
pop(simgl) <- rep("A",10)
indNames(simgl) <- paste0("ind",1:10)
gl.smearplot(simgl, verbose=0)

Creates a dataframe suitable for input to package {Demerelate} from a genlight {adegenet} object

Description

Creates a dataframe suitable for input to package {Demerelate} from a genlight {adegenet} object

Usage

gl2demerelate(x, verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

A dataframe suitable as input to package {Demerelate}

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

Examples

df <- gl2demerelate(testset.gl)

Converts a genlight object into eigenstrat format

Description

The output of this function are three files:

genotype file: contains genotype data for each individual at each SNP with an extension 'eigenstratgeno.'
snp file: contains information about each SNP with an extension 'snp.'
indiv file: contains information about each individual with an extension 'ind.'

Usage

gl2eigenstrat(
  x,
  outfile = "gl_eigenstrat",
  outpath = tempdir(),
  snp_pos = 1,
  snp_chr = 1,
  pos_cM = 0,
  sex_code = "unknown",
  phen_value = "Case",
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP data [required].

outfile

File name of the output file [default 'gl_eigenstrat'].

outpath

Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory.

snp_pos

Field name from the slot loc.metrics where the SNP position is stored [default 1].

snp_chr

Field name from the slot loc.metrics where the chromosome of each SNP is stored [default 1].

pos_cM

A vector, with as many elements as there are loci, containing the SNP position in morgans or centimorgans [default 0].

sex_code

A vector, with as many elements as there are individuals, containing the sex code ('male', 'female', 'unknown') [default 'unknown'].

phen_value

A vector, with as many elements as there are individuals, containing the phenotype value ('Case', 'Control') [default 'Case'].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

Eigenstrat only accepts chromosomes coded as numeric values, as follows: X chromosome is encoded as 23, Y is encoded as 24, mtDNA is encoded as 90, and XY is encoded as 91. SNPs with illegal chromosome values, such as 0, will be removed.
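The chromosome coding can be prepared before calling the function with a small recoding step. The sketch below is base R and not part of gl2eigenstrat: the helper sketch_chr_code is hypothetical, the label 'MT' for mtDNA is an assumption about how the chromosome is named in the locus metrics, and treating non-numeric scaffold names as illegal is also an assumption.

```r
# Hypothetical helper: recode chromosome labels to Eigenstrat numeric codes.
sketch_chr_code <- function(chr) {
  special <- c(X = 23, Y = 24, MT = 90, XY = 91)
  # labels that are not numbers become NA here
  code <- suppressWarnings(as.numeric(chr))
  is.special <- chr %in% names(special)
  code[is.special] <- special[chr[is.special]]
  code
}

chr <- c("1", "7", "X", "Y", "MT", "XY", "0", "scaffold_12")
code <- sketch_chr_code(chr)
keep <- !is.na(code) & code > 0   # illegal values such as 0 are dropped
data.frame(chr, code, keep)
```

Loci flagged with keep = FALSE correspond to the SNPs that would be removed because of an illegal chromosome value.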

Value

returns no value (i.e. NULL)

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

References

Patterson, N., Price, A. L., & Reich, D. (2006). Population structure and eigenanalysis. PLoS genetics, 2(12), e190.
Price, A. L., Patterson, N. J., Plenge, R. M., Weinblatt, M. E., Shadick, N. A., & Reich, D. (2006). Principal components analysis corrects for stratification in genome-wide association studies. Nature genetics, 38(8), 904-909.

Examples

require("dartR.data")
gl2eigenstrat(platypus.gl, snp_pos='ChromPos_Platypus_Chrom_NCBIv1',
  snp_chr = 'Chrom_Platypus_Chrom_NCBIv1')

Concatenates DArT trimmed sequences and outputs a FASTA file

Description

Concatenated sequence tags are useful for phylogenetic methods where information on base frequencies and transition and transversion ratios are required (for example, Maximum Likelihood methods). Where relevant, heterozygous loci are resolved before concatenation by either assigning ambiguity codes or by random allele assignment.

Usage

gl2fasta(
  x,
  method = 1,
  trimmed.sequence = TRUE,
  outfile = "output.fasta",
  outpath = tempdir(),
  probar = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP data [required].

method

One of 1 | 2 | 3 | 4. Type method=0 for a list of options [default 1].

trimmed.sequence

Include Trimmedsequence. If FALSE, only method 3 and 4 are available [default = TRUE].

outfile

Name of the output file (fasta format) ["output.fasta"].

outpath

Path where to save the output file (set to tempdir by default)

probar

If TRUE, a progress bar will be displayed for long loops [default = FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

Four methods are employed:

Method 1 – heterozygous positions are replaced by the standard ambiguity codes. The resultant sequence fragments are concatenated across loci to generate a single combined sequence to be used in subsequent ML phylogenetic analyses.

Method 2 – the heterozygous state is resolved by randomly assigning one or the other SNP variant to the individual. The resultant sequence fragments are concatenated across loci to generate a single composite haplotype to be used in subsequent ML phylogenetic analyses.

Method 3 – heterozygous positions are replaced by the standard ambiguity codes. The resultant SNP bases are concatenated across loci to generate a single combined sequence to be used in subsequent MP phylogenetic analyses.

Method 4 – the heterozygous state is resolved by randomly assigning one or the other SNP variant to the individual. The resultant SNP bases are concatenated across loci to generate a single composite haplotype to be used in subsequent MP phylogenetic analyses.

Trimmed sequences for which the SNP has been trimmed out, rarely, by adaptermis-identity are deleted.

The script writes out the composite haplotypes for each individual as afastA file. Requires 'TrimmedSequence' to be among the locus metrics(@other$loc.metrics) and information of the type of alleles (slotloc.all e.g. 'G/A') and the position of the SNP in slot position of the“'genlight“' object (see testset.gl@position and testset.gl@loc.all forhow to format these slots.)

When trimmed.sequence = FALSE, loci that are not SNPs are removed.
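
The difference between the ambiguity-code methods (1 and 3) and the random-assignment methods (2 and 4) can be illustrated with a minimal base R sketch. This is not the gl2fasta source code: the helper resolve_snp, the 0/1/2 genotype coding and the use of "N" for missing genotypes are assumptions made for illustration; only the 'G/A' style of allele string comes from the text above.

```r
# Hypothetical helper, not part of dartR: resolve one SNP genotype to a base.
# genotype: 0 = homozygous reference, 1 = heterozygous,
#           2 = homozygous alternate, NA = missing (assumed coding)
# alleles:  string in the loc.all style, e.g. "G/A" (reference/alternate)

# Standard IUPAC ambiguity codes for two-base combinations
iupac <- c(AC = "M", AG = "R", AT = "W", CG = "S", CT = "Y", GT = "K")

resolve_snp <- function(genotype, alleles, random = FALSE) {
  bases <- strsplit(alleles, "/", fixed = TRUE)[[1]]
  if (is.na(genotype)) return("N")              # assumption: missing -> N
  if (genotype == 0) return(bases[1])
  if (genotype == 2) return(bases[2])
  if (random) return(sample(bases, 1))          # as in methods 2 and 4
  iupac[[paste(sort(bases), collapse = "")]]    # as in methods 1 and 3
}

genotypes <- c(0, 1, 2, NA)
alleles   <- c("G/A", "C/T", "A/T", "G/C")

# Concatenate the resolved SNP bases across loci for one individual
snp_string <- paste(mapply(resolve_snp, genotypes, alleles), collapse = "")
snp_string
```

With ambiguity codes the heterozygous C/T locus becomes Y, so the concatenated string here is "GYTN"; with random = TRUE the same locus would be written as either C or T.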

Value

A new gl object with all loci rendered homozygous.

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

Examples

gl <- gl.filter.reproducibility(testset.gl,t=1)
gl <- gl.filter.overshoot(gl,verbose=3)
gl <- gl.filter.callrate(testset.gl,t=.98)
gl <- gl.filter.monomorphs(gl)
gl2fasta(gl, method=1, outfile='test.fasta',verbose=3)
test <- gl.subsample.loci(platypus.gl,n=100)
gl2fasta(test)

Converts a genlight object into faststructure format (to run faststructure elsewhere)

Description

Recodes the data into the quite specific faststructure format (e.g. the first six columns need to be there, but are ignored; check the faststructure documentation, if you can find any).

Usage

gl2faststructure(  x,  outfile = "gl.str",  outpath = tempdir(),  probar = FALSE,  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

outfile

File name of the output file (including extension) [default "gl.str"].

outpath

Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory.

probar

Switch to show/hide progress bar [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

The script writes out a file in faststructure format.

Value

returns no value (i.e. NULL)

Author(s)

Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

Converts a genlight object into gds format

Description

Package SNPRelate relies on a bit-level representation of a SNP dataset that competes with {adegenet} genlight objects and associated files. This function converts a genlight object to a gds format file.

Usage

gl2gds(  x,  outfile = "gl_gds.gds",  outpath = tempdir(),  snp_pos = "0",  snp_chr = "0",  chr_format = "character",  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

outfile

File name of the output file (including extension) [default 'gl_gds.gds'].

outpath

Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory.

snp_pos

Field name from the slot loc.metrics where the SNP position is stored [default '0'].

snp_chr

Field name from the slot loc.metrics where the chromosome of each SNP is stored [default '0'].

chr_format

Whether chromosome information is stored as 'numeric' or as 'character', see details [default 'character'].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

This function orders the SNPs by chromosome and by position before converting to SNPRelate format, as required by that package.

The chromosome of each SNP can be a character or numeric, as described in the vignette of SNPRelate: 'snp.chromosome, an integer or character mapping for each chromosome. Integer: numeric values 1-26, mapped in order from 1-22, 23=X, 24=XY (the pseudoautosomal region), 25=Y, 26=M (the mitochondrial probes), and 0 for probes with unknown positions; it does not allow NA. Character: “X”, “XY”, “Y” and “M” can be used here, and a blank string indicating unknown position.'

When using some functions from package SNPRelate with datasets from organisms other than humans, it might be necessary to use the option autosome.only=FALSE so that SNPs are not excluded on the basis of their chromosome coding. It is therefore important to read the documentation of each function before using it.

The chromosome information for unmapped SNPs is coded as 0, as required by SNPRelate.

Remember to close the GDS file with the function snpgdsClose (package SNPRelate) before working with a different GDS object.
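
The integer chromosome coding quoted from the SNPRelate vignette can be sketched in base R as follows. The helper chr_to_code is hypothetical and belongs to neither dartR nor SNPRelate; it only illustrates the mapping (1-22, 23=X, 24=XY, 25=Y, 26=M, 0 for unknown) and the rule that NA is not allowed.

```r
# Hypothetical helper (not part of dartR or SNPRelate): map chromosome labels
# to the integer codes described in the SNPRelate vignette text quoted above.
chr_to_code <- function(chr) {
  special <- c(X = 23L, XY = 24L, Y = 25L, M = 26L)
  chr <- as.character(chr)
  # Numeric labels such as "1".."22" convert directly; everything else gives NA
  code <- suppressWarnings(as.integer(chr))
  is_special <- chr %in% names(special)
  code[is_special] <- special[chr[is_special]]
  # Unknown or unmapped chromosomes are coded as 0, never NA
  code[is.na(code)] <- 0L
  unname(code)
}

chr_to_code(c("1", "22", "X", "XY", "Y", "M", "", NA))
```

For non-human datasets such codes have no biological meaning, which is why the autosome.only option mentioned above matters.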

Value

returns no value (i.e. NULL)

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

Examples

require("dartR.data")
gl2gds(platypus.gl, snp_pos = 'ChromPos_Platypus_Chrom_NCBIv1',
       snp_chr = 'Chrom_Platypus_Chrom_NCBIv1')

Converts a genlight object into a format suitable for input to genalex

Description

The output csv file contains the SNP data and other relevant lines suitable for genalex. This script is a wrapper for genind2genalex (package poppr).

Usage

gl2genalex(  x,  outfile = "genalex.csv",  outpath = tempdir(),  overwrite = FALSE,  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

outfile

File name of the output file (including extension) [default 'genalex.csv'].

outpath

Path where to save the output file [default tempdir()].

overwrite

If FALSE and filename exists, then the file will not be overwritten. Set this option to TRUE to overwrite the file [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

returns no value (i.e. NULL)

Author(s)

Custodian: Luis Mijangos, Author: Katrin Hohwieler, wrapper Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

References

Peakall, R. and Smouse P.E. (2012) GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research-an update. Bioinformatics 28, 2537-2539. http://bioinformatics.oxfordjournals.org/content/28/19/2537

Examples

gl2genalex(testset.gl, outfile='testset.csv')

Converts a genlight object into genepop format (and file)

Description

The genepop format is used by several external applications (for example NeEstimator2). So the main idea is to create the genepop file and then run the other software externally. As a feature, the genepop file is also returned as an invisible data.frame by the function.

Usage

gl2genepop(  x,  outfile = "genepop.gen",  outpath = tempdir(),  pop_order = "alphabetic",  output_format = "2_digits",  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

outfile

File name of the output file [default 'genepop.gen'].

outpath

Path where to save the output file. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory [default tempdir(), mandated by CRAN].

pop_order

Order of the output populations, either "alphabetic" or a vector of population names in the order required by the user (see examples) [default "alphabetic"].

output_format

Whether to use a 2-digit format ("2_digits") or a 3-digit format ("3_digits") [default "2_digits"].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

Invisible data frame in genepop format
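
To make the difference between the "2_digits" and "3_digits" output formats concrete, the following base R sketch encodes SNP genotypes as genepop genotype strings. The helper to_genepop is hypothetical and is not the gl2genepop implementation; it assumes that the reference allele is written as allele 1, the alternate allele as allele 2 and missing genotypes as all zeros, which may differ from the coding gl2genepop actually uses.

```r
# Hypothetical helper, not the dartR implementation: encode SNP genotypes
# (0, 1, 2 or NA) as genepop genotype strings.
# Assumptions: reference allele = allele 1, alternate allele = allele 2,
# missing genotype = all zeros.
to_genepop <- function(genotype, digits = 2) {
  fmt <- paste0("%0", digits, "d")          # "%02d" or "%03d"
  a1 <- sprintf(fmt, 1L)
  a2 <- sprintf(fmt, 2L)
  missing <- strrep("0", 2 * digits)
  codes <- c(paste0(a1, a1), paste0(a1, a2), paste0(a2, a2))
  out <- codes[genotype + 1]                # NA genotypes stay NA here
  out[is.na(out)] <- missing
  out
}

to_genepop(c(0, 1, 2, NA))              # 2-digit format
to_genepop(c(0, 1, 2, NA), digits = 3)  # 3-digit format
```

Under these assumptions a heterozygote is written as 0102 in the 2-digit format and as 001002 in the 3-digit format.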

Author(s)

Custodian: Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

Examples

## Not run: 
require("dartR.data")
# SNP data
geno <- gl2genepop(testset.gl[1:3,1:9])
head(geno)
test <- gl.filter.callrate(platypus.gl,threshold = 1)
popNames(test)
gl2genepop(test, pop_order = c("TENTERFIELD","SEVERN_ABOVE","SEVERN_BELOW"),
           output_format="3_digits")
## End(Not run)

Converts a genlight object to geno format from package LEA

Description

The function converts a genlight object (SNP or presence/absence, i.e. SilicoDArT, data) into a file in the 'geno' and the 'lfmm' formats from package LEA.

Usage

gl2geno(x, outfile = "gl_geno", outpath = tempdir(), verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required].

outfile

File name of the output file [default 'gl_geno'].

outpath

Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory.

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

returns no value (i.e. NULL)

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

Examples

# SNP data
gl2geno(testset.gl)
# Tag P/A data
gl2geno(testset.gs)

Converts a genlight object to a genind object

Description

Converts a genlight object to a genind object

Usage

gl2gi(x, probar = FALSE, verbose = NULL)

Arguments

x

A genlight object [required].

probar

If TRUE, a progress bar will be displayed for long loops [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

This function uses a faster version of df2genind (from the adegenet package).

Value

A genind object, with all slots filled.

Author(s)

Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

Converts a genlight object into hiphop format

Description

This function exports genlight objects to the format used by the parentage assignment R package hiphop. Hiphop can be used for paternity and maternity assignment and outperforms conventional methods where closely related individuals occur in the pool of possible parents. The method compares the genotypes of offspring with any combination of potential parents and scores the number of mismatches of these individuals at bi-allelic genetic markers (e.g. Single Nucleotide Polymorphisms).

Usage

gl2hiphop(x, verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

Dataframe containing all the genotyped individuals (offspring and potential parents) and their genotypes scored using bi-allelic markers.

Author(s)

Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

References

Cockburn, A., Penalba, J.V., Jaccoud, D., Kilian, A., Brouwer, L., Double, M.C., Margraf, N., Osmond, H.L., van de Pol, M. and Kruuk, L.E.B. (in revision). HIPHOP: improved paternity assignment among close relatives using a simple exclusion method for bi-allelic markers. Molecular Ecology Resources, DOI to be added upon acceptance.

Examples

result <- gl2hiphop(testset.gl)

Creates a Phylip input distance matrix from a genlight (SNP) {adegenet} object

Description

This function calculates and returns a matrix of Euclidean distances between populations and produces an input file for the phylogenetic program Phylip (Joe Felsenstein).

Usage

gl2phylip(  x,  outfile = "phyinput.txt",  outpath = tempdir(),  bstrap = 1,  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP or presence/absence (SilicoDArT) data [required].

outfile

Name of the file to become the input file for phylip [default "phyinput.txt"].

outpath

Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory.

bstrap

Number of bootstrap replicates [default 1].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

Matrix of Euclidean distances between populations.

Author(s)

Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

Examples

result <- gl2phylip(testset.gl, outfile='test.txt', bstrap=10)

Converts a genlight object into PLINK format

Description

This function exports a genlight object into PLINK format and saves it to a file. It produces the following PLINK files: bed, bim, fam, ped and map.

Usage

gl2plink(  x,  plink_path = getwd(),  bed_file = FALSE,  outfile = "gl_plink",  outpath = tempdir(),  chr_format = "character",  pos_cM = "0",  ID_dad = "0",  ID_mom = "0",  sex_code = "unknown",  phen_value = "0",  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

plink_path

Path of PLINK binary file [default getwd()].

bed_file

Whether to create the PLINK files .bed, .bim and .fam [default FALSE].

outfile

File name of the output file [default 'gl_plink'].

outpath

Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory.

chr_format

Whether chromosome information is stored as 'numeric' or as 'character', see details [default 'character'].

pos_cM

A vector, with as many elements as there are loci, containing the SNP position in morgans or centimorgans [default '0'].

ID_dad

A vector, with as many elements as there are individuals, containing the ID of the father, '0' if father isn't in dataset [default '0'].

ID_mom

A vector, with as many elements as there are individuals, containing the ID of the mother, '0' if mother isn't in dataset [default '0'].

sex_code

A vector, with as many elements as there are individuals, containing the sex code ('male', 'female', 'unknown'). Sex information only needs to start with an "F" or "f" for females, with an "M" or "m" for males, and with a "U" or "u", or be empty, if the sex is unknown [default 'unknown'].

phen_value

A vector, with as many elements as there are individuals, containing the phenotype value. '1' = control, '2' = case, '0' = unknown [default '0'].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

To create the PLINK files .bed, .bim and .fam (bed_file = TRUE), it is necessary to download the binary file of PLINK 1.9 and provide its path (plink_path). The binary file can be downloaded from: https://www.cog-genomics.org/plink/

After downloading, unzip the file, access the unzipped folder and move the binary file ("plink") to your working directory.

If you are using a Mac, you might need to open the binary once first to grant it permission to run.

The chromosome of each SNP can be a character or numeric. The chromosome information for unmapped SNPs is coded as 0. Family ID is taken from x$pop. Within-family ID (cannot be '0') is taken from indNames(x). Variant identifier is taken from locNames(x). SNP position is taken from the accessor x$position. Chromosome name is taken from the accessor x$chromosome. Note that if names of populations or individuals contain spaces, they are replaced by an underscore "_".

If you would like to use chromosome information when converting to plink format and your chromosome names are not from human, you need to change the chromosome names to 'contig1', 'contig2', etc., as described in the section "Nonstandard chromosome IDs" at the following link: https://www.cog-genomics.org/plink/1.9/input

Note that the function might not work if there are spaces in the path to the plink executable.
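
The rules above for sex labels and for names containing spaces can be sketched in base R. The helpers sex_to_plink and clean_id are hypothetical and are not the gl2plink source code; the numeric codes follow the PLINK .fam convention (1 = male, 2 = female, 0 = unknown), and whether gl2plink performs the conversion in exactly this way is an assumption.

```r
# Hypothetical helpers, not the dartR implementation.
# Assumption: sex is written with the PLINK .fam convention
# (1 = male, 2 = female, 0 = unknown).
sex_to_plink <- function(sex) {
  # Only the first letter matters, in upper or lower case
  first <- toupper(substr(as.character(sex), 1, 1))
  code <- rep(0L, length(first))   # default: unknown ("U", "u", empty, NA)
  code[first %in% "M"] <- 1L
  code[first %in% "F"] <- 2L
  code
}

# Spaces in population or individual names are replaced by underscores
clean_id <- function(id) gsub(" ", "_", id, fixed = TRUE)

sex_to_plink(c("male", "Female", "f", "unknown", "", NA))
clean_id(c("Severn Above", "ind 01"))
```

Cleaning names in the genlight object before export, as clean_id does here, keeps the identifiers in the PLINK files consistent with those used elsewhere in an analysis.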

Value

returns no value (i.e. NULL)

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

References

Purcell, Shaun, et al. 'PLINK: a tool set for whole-genome association and population-based linkage analyses.' The American Journal of Human Genetics 81.3 (2007): 559-575.

Examples

require("dartR.data")
test <- platypus.gl
# assigning SNP position
test$position <- test$other$loc.metrics$ChromPos_Platypus_Chrom_NCBIv1
# assigning a dummy name for chromosomes
test$chromosome <- as.factor("1")
gl2plink(test)

Converts a genlight object to format suitable to be run with Coancestry

Description

The output txt file contains the SNP data and an additional column with the names of the individuals. The file can then be loaded into coancestry or, if installed, run with the related package. Be aware that the related package was crashing in previous versions, but in general it uses the same code as coancestry and therefore should give identical results. Also, running coancestry with thousands of SNPs via the GUI seems to be unreliable, so for comparisons between coancestry and related we suggest using the command line version of coancestry.

Usage

gl2related(  x,  outfile = "related.txt",  outpath = tempdir(),  save = TRUE,  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

outfile

File name of the output file (including extension) [default 'related.txt'].

outpath

Path where to save the output file [default tempdir()].

save

A switch if you want to save the file or not. This might be useful for someone who wants to use the coancestry function to calculate relatedness and not export to coancestry. See the example below [default TRUE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Value

A data.frame that can be used to run with the related package

Author(s)

Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)

References

Jack Pew, Jinliang Wang, Paul Muir and Tim Frasier (2014). related: an R package for analyzing pairwise relatedness data based on codominant molecular markers. R package version 0.8/r2. https://R-Forge.R-project.org/projects/related/

Examples

gtd <- gl2related(bandicoot.gl[1:10,1:20], save=FALSE)
## Not run: 
## running with the related package
# install.packages('related', repos='http://R-Forge.R-project.org')
library(related)
coan <- coancestry(gtd, wang=1)
head(coan$relatedness)
## check ?coancestry for information how to use the function.
## End(Not run)

Converts genlight objects to the format used in the SNPassoc package

Description

This function exports a genlight object into a SNPassoc object. See package SNPassoc for details. This function needs package SNPassoc. At the time of writing (August 2020) the package was no longer available from CRAN. To install the package, check its GitHub repository (https://github.com/isglobal-brge/SNPassoc) and/or use install_github('isglobal-brge/SNPassoc') to install the package, then uncomment the function code.

Usage

gl2sa(x, verbose = NULL, installed = FALSE)

Arguments

x

Name of the genlight object containing the SNP data [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

installed

Switch to run the function once SNPassoc package is installed [default FALSE].

Value

Returns an object of class 'snp' to be used with SNPassoc.

Author(s)

Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

References

Gonzalez, J.R., Armengol, L., Solé, X., Guinó, E., Mercader, J.M., Estivill, X. and Moreno, V. (2007). SNPassoc: an R package to perform whole genome association studies. Bioinformatics 23:654-655.

Converts a genlight object into a sfs input file

Description

The output of this function is suitable for analysis in fastsimcoal2 or dadi.

Usage

gl2sfs(  x,  n.invariant.tags = 0,  outfile_root = "gl2sfs",  outpath = tempdir(),  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

n.invariant.tags

Number of invariant sites [default 0].

outfile_root

The root of the name of the output file [default "gl2sfs"].

outpath

Path where to save the output file [default tempdir()].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

It saves a derived sfs, assuming that the reference allele is the ancestral one, and a MAF sfs.

At this stage this function caters only for diploid organisms, for samples from one population only, and for genotypes without missing data. Note that the sfs treats frequencies as independent: data are assumed to be from independent (i.e. not linked) loci. This means that only one site per tag should be considered (i.e. secondaries should be removed). If no estimate of monomorphic sites is provided (with n.invariant.tags), the sfs will only include the number of monomorphic sites in the data (but this will be a biased estimate, as it doesn't take into account the invariant tags that have not been included, and this will affect parameter estimates in the analyses). Note that the number of invariant tags can be estimated with gl.report.secondaries. In a limited number of cases, ascertainment bias can be explicitly modelled in fastsimcoal2. See the fastsimcoal2 manual for details.

It expects a dartR formatted genlight object, but it should also work with other genlight objects.
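
As an illustration of the two spectra described above, the following base R sketch computes a derived sfs and a MAF (folded) sfs from a small genotype matrix. The helper sfs_from_genotypes is hypothetical and is not the gl2sfs implementation; it assumes diploid genotypes coded as counts of the alternate allele (0, 1, 2), a single population, no missing data, and that the reference allele is ancestral.

```r
# Hypothetical helper, not the dartR implementation.
# geno: individuals x loci matrix of alternate-allele counts (0, 1, 2),
#       one population, no missing data.
sfs_from_genotypes <- function(geno) {
  stopifnot(!anyNA(geno))
  n_chrom <- 2 * nrow(geno)       # diploid: two allele copies per individual
  alt_count <- colSums(geno)      # copies of the alternate (assumed derived) allele

  # Derived sfs: number of loci with 0, 1, ..., n_chrom derived copies
  derived <- tabulate(alt_count + 1, nbins = n_chrom + 1)
  names(derived) <- 0:n_chrom

  # MAF (folded) sfs: number of loci with 0, 1, ..., n_chrom/2 minor copies
  minor_count <- pmin(alt_count, n_chrom - alt_count)
  folded <- tabulate(minor_count + 1, nbins = n_chrom / 2 + 1)
  names(folded) <- 0:(n_chrom / 2)

  list(derived = derived, folded = folded)
}

# Three individuals (rows) and four loci (columns, filled column-wise)
geno <- matrix(c(0, 0, 1,   # locus 1: 1 alternate copy
                 2, 2, 1,   # locus 2: 5 alternate copies
                 1, 1, 1,   # locus 3: 3 alternate copies
                 0, 0, 0),  # locus 4: monomorphic
               nrow = 3)
sfs <- sfs_from_genotypes(geno)
sfs$derived
sfs$folded
```

In this sketch the first entry of each spectrum counts only the monomorphic sites present in the data; an estimate of invariant sites such as n.invariant.tags would presumably be added to that entry.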

Value

Deprecated. Please use gl.sfs instead.

Author(s)

Custodian: Carlo Pacioni (Post to https://groups.google.com/d/forum/dartr)

References

Excoffier L., Dupanloup I., Huerta-Sánchez E., Sousa V. C. and Foll M. (2013) Robust demographic inference from genomic and SNP data. PLoS Genetics 9(10)

Converts a genlight object to ESRI shapefiles or kml files

Description

This function exports coordinates in a genlight object to a point shape file (including also individual meta data if available). Coordinates are provided under x@other$latlon and are assumed to be in WGS84 coordinates unless a proj4 string is provided.

Usage

gl2shp(  x,  type = "shp",  proj4 = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs",  outfile = "gl",  outpath = tempdir(),  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data and location data, lat longs [required].

type

Type of output 'kml' or 'shp' [default 'shp'].

proj4

Proj4string of data set (see spatialreference.org for projections) [default WGS84].

outfile

Name (path) of the output shape file [default 'gl']. The shp extension is added automatically.

outpath

Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory.

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

returns a SpatVector file

Author(s)

Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

Examples

out <- gl2shp(testset.gl)

Converts a genlight object to nexus format suitable for phylogenetic analysis by SNAPP (via BEAUti)

Description

The output nexus file contains the SNP data and relevant PAUP command lines suitable for BEAUti.

Usage

gl2snapp(x, outfile = "snapp.nex", outpath = tempdir(), verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

outfile

File name of the output file (including extension) [default "snapp.nex"].

outpath

Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory.

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

returns no value (i.e. NULL)

Author(s)

Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

References

Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N.A. and RoyChoudhury, A. (2012). Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Molecular Biology and Evolution 29:1917-1932.

Examples

gl2snapp(testset.gl)

Converts a genlight object to STRUCTURE formatted files

Description

This function exports genlight objects to STRUCTURE formatted files (be aware there is a gl2faststructure version as well). It is based on the code provided by Lindsay Clark (see https://github.com/lvclark/R_genetics_conv) and this function is basically a wrapper around her numeric2structure function. See also: Lindsay Clark. (2017, August 22). lvclark/R_genetics_conv: R_genetics_conv 1.1 (Version v1.1). Zenodo: doi.org/10.5281/zenodo.846816.

Usage

gl2structure(  x,  indNames = NULL,  addcolumns = NULL,  ploidy = 2,  exportMarkerNames = TRUE,  outfile = "gl.str",  outpath = tempdir(),  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data and location data, lat longs [required].

indNames

Specify individuals names to be added [if NULL, defaults to indNames(x)].

addcolumns

Additional columns to be added before genotypes [default NULL].

ploidy

Set the ploidy [default 2].

exportMarkerNames

If TRUE, locus names locNames(x) will be included [default TRUE].

outfile

File name of the output file (including extension) [default "gl.str"].

outpath

Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working directory.

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

returns no value (i.e. NULL)

Author(s)

Bernd Gruber (wrapper) and Lindsay V. Clark [lvclark@illinois.edu]

Examples

# not run here
# gl2structure(testset.gl)

Converts a genlight object to nexus format for PAUP SVDquartets

Description

The output nexus file contains the SNP data in one of two forms, depending upon what you regard as most appropriate. One form, that used by Chifman and Kubatko, has two lines per individual, one providing the reference SNP and the second providing the alternate SNP (method=1).

Usage

gl2svdquartets(  x,  outfile = "svd.nex",  outpath = tempdir(),  method = 2,  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data or tag P/A data [required].

outfile

File name of the output file (including extension) [default 'svd.nex'].

outpath

Path where to save the output file [default tempdir(), mandated by CRAN]. Use outpath=getwd() when calling this function or set.tempdir <- getwd() elsewhere in your script to direct output files to your working directory.

method

Method = 1, nexus file with two lines per individual; method = 2, nexus file with one line per individual, ambiguity codes for SNP genotypes, 0 or 1 for presence/absence data [default 2].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Details

A second form, recommended by Dave Swofford, has a single line per individual, resolving heterozygous SNPs by replacing them with standard ambiguity codes (method=2).

If the data are tag presence/absence, then method=2 is assumed.

Note that the genlight object must contain at least two populations for this function to work.

Value

returns no value (i.e. NULL)

Author(s)

Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

References

Chifman, J. and L. Kubatko. 2014. Quartet inference from SNP data under the coalescent. Bioinformatics 30: 3317-3324

Examples

gg <- testset.gl[1:20,1:100]
gg@other$loc.metrics <- gg@other$loc.metrics[1:100,]
gl2svdquartets(gg)

Converts a genlight object to a treemix input file

Description

The output file contains the SNP data in the format expected by treemix (see the treemix manual). The file is gzipped so that it can be recognised by treemix. Plotting functions provided with treemix will need to be sourced from the treemix download page.

Usage

gl2treemix(  x,  outfile = "treemix_input.gz",  outpath = tempdir(),  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

outfile

File name of the output file (including gz extension) [default 'treemix_input.gz'].

outpath

Path where to save the output file [default tempdir()].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

returns no value (i.e. NULL)

Author(s)

Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

References

Pickrell and Pritchard (2012). Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genetics https://doi.org/10.1371/journal.pgen.1002967

Examples

gl2treemix(testset.gl, outpath=tempdir())

Converts a genlight object into vcf format

Description

This function exports a genlight object into VCF format and saves it to a file.

Usage

gl2vcf(  x,  plink_path = getwd(),  outfile = "gl_vcf",  outpath = tempdir(),  snp_pos = "0",  snp_chr = "0",  chr_format = "character",  pos_cM = "0",  ID_dad = "0",  ID_mom = "0",  sex_code = "unknown",  phen_value = "0",  verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

plink_path

Path of PLINK binary file [default getwd())].

outfile

File name of the output file [default 'gl_vcf'].

outpath

Path where to save the output file[default tempdir(), mandated by CRAN]. Use outpath=getwd() or outpath='.'when calling this function to direct output files to your working directory.

snp_pos

Field name from the slot loc.metrics where the SNP position isstored [default '0'].

snp_chr

Field name from the slot loc.metrics where the chromosome ofeach is stored [default '0'].

chr_format

Whether chromosome information is stored as 'numeric' or as'character', see details [default 'character'].

pos_cM

A vector, with as many elements as there are loci, containingthe SNP position in morgans or centimorgans [default '0'].

ID_dad

A vector, with as many elements as there are individuals,containing the ID of the father, '0' if father isn't in dataset [default '0'].

ID_mom

A vector, with as many elements as there are individuals,containing the ID of the mother, '0' if mother isn't in dataset [default '0'].

sex_code

A vector, with as many elements as there are individuals,containing the sex code ('male', 'female', 'unknown') [default 'unknown'].

phen_value

A vector, with as many elements as there are individuals,containing the phenotype value. '1' = control, '2' = case, '0' = unknown[default '0'].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2,progress log; 3, progress and results summary; 5, full report[default 2 or as specified using gl.set.verbosity].

Details

This function requires the PLINK 1.9 binary file to be downloaded and its path provided (plink_path). The binary file can be downloaded from: https://www.cog-genomics.org/plink/

The chromosome information for unmapped SNPs is coded as 0. Family ID is taken from x$pop. Within-family ID (cannot be '0') is taken from indNames(x). Variant identifier is taken from locNames(x).

Note that if names of populations or individuals contain spaces, they are replaced by an underscore "_".

Note that the function might not work if there are spaces in the path to the plink executable.
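The two notes above can be checked before calling the function. The following is a minimal base R sketch, not part of dartR: the helper names has_space and underscore_names and the example path are hypothetical and serve only to illustrate the checks.

```r
# Hypothetical helpers (not dartR functions).
# TRUE if a path contains at least one space.
has_space <- function(path) grepl(" ", path, fixed = TRUE)

# Replace spaces by underscores, as described for population and
# individual names.
underscore_names <- function(x) gsub(" ", "_", x, fixed = TRUE)

plink_dir <- "C:/Program Files/plink"  # illustrative path only
if (has_space(plink_dir)) {
  message("The PLINK path contains spaces; consider moving the binary.")
}

underscore_names(c("Tenterfield Creek", "Severn_River"))
```

Running such a check first makes a failure caused by the path easier to tell apart from a failure caused by the data.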

Value

returns no value (i.e. NULL)

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

References

Danecek, P., Auton, A., Abecasis, G., Albers, C. A., Banks, E., DePristo, M. A., ... & 1000 Genomes Project Analysis Group. (2011). The variant call format and VCFtools. Bioinformatics, 27(15), 2156-2158.

Examples

## Not run: 
require("dartR.data")
gl2vcf(platypus.gl, snp_pos = 'ChromPos_Platypus_Chrom_NCBIv1',
  snp_chr = 'Chrom_Platypus_Chrom_NCBIv1')
## End(Not run)

Shiny app for the input of the reference table for the simulations

Description

Shiny app for the input of the reference table for the simulations

Usage

interactive_reference()

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Shiny app for the input of the simulations variables

Description

Shiny app for the input of the simulations variables

Usage

interactive_sim_run()

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Tests if two populations are fixed at a given locus

Description

This script compares two percent allele frequencies and reports TRUE if they represent a fixed difference, FALSE otherwise.

Usage

is.fixed(s1, s2, tloc = 0)

Arguments

s1

Percentage SNP allele or sequence tag frequency for the first population [required].

s2

Percentage SNP allele or sequence tag frequency for the second population [required].

tloc

Tolerance threshold for regarding a difference as fixed [default 0].

Details

A fixed difference at a locus occurs when two populations share no alleles, noting that SNPs are biallelic (ploidy=2). Tolerance in the definition of a fixed difference is provided by the tloc parameter. For example, tloc=0.05 means that SNP allele frequencies of 95,5 and 5,95 percent will be reported as fixed (TRUE).
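The tolerance rule described above can be sketched in base R. This is an illustration only and not the package's implementation: the function name fixed_sketch is hypothetical, and it assumes that frequencies are percentages (0 to 100) while the tolerance is a proportion (0 to 1).

```r
# Hypothetical sketch of the fixed-difference rule (not dartR code).
# s1, s2: allele frequencies in percent; tloc: tolerance as a proportion.
fixed_sketch <- function(s1, s2, tloc = 0) {
  if (is.na(s1) || is.na(s2)) return(NA)   # missing frequency
  lower <- tloc * 100                      # e.g. 5 percent
  upper <- 100 - tloc * 100                # e.g. 95 percent
  (s1 <= lower && s2 >= upper) || (s2 <= lower && s1 >= upper)
}

fixed_sketch(100, 0)               # no alleles shared
fixed_sketch(96, 4, tloc = 0.05)   # within tolerance
fixed_sketch(90, 10, tloc = 0.05)  # outside tolerance
fixed_sketch(NA, 10)               # missing value
```

With tloc = 0 only the strict case (100 versus 0 percent) counts as fixed; raising tloc relaxes both ends of the comparison symmetrically.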

Value

TRUE (fixed difference) or FALSE (alleles shared) or NA (one or both s1 or s2 missing)

Author(s)

Custodian: Arthur Georges (bugs? Post to https://groups.google.com/d/forum/dartr)

Examples

is.fixed(s1=100, s2=0, tloc=0)
is.fixed(96, 4, tloc=0.05)

Example data set as text file to be imported into a genlight object

Description

Check ?read.genetable in package PopGenReport for details on the format.

Format

csv

Author(s)

Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)

Examples

library(PopGenReport)
read.csv( paste(.libPaths()[1],'/dartR/extdata/platy.csv',sep='' ))
platy <- read.genetable( paste(.libPaths()[1],'/dartR/extdata/platy.csv',
  sep='' ), ind=1, pop=2, lat=3, long=4, other.min=5, other.max=6,
  oneColPerAll=FALSE, sep='/')
platy.gl <- gi2gl(platy, parallel=FALSE)
df.loc <- data.frame(RepAvg = runif(nLoc(platy.gl)), CallRate = 1)
platy.gl@other$loc.metrics <- df.loc
gl.report.reproducibility(platy.gl)

A simulated genlight object created to run a landscape genetic example

Description

This is a test data set to run a landscape genetics example. It contains 10 populations of 30 individuals each and each individual has 300 loci. There are no covariates for individuals or loci.

Usage

possums.gl

Format

genlight object

Author(s)

Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr)

adjust rbind for dartR

Description

rbind is a bit lazy and does not take care of the metadata (so data in the other slot is lost). You can get most of the loci metadata back using gl.compliance.check.

Usage

## S3 method for class 'dartR'
rbind(...)

Arguments

...

list of dartR objects

Value

A genlight object

Examples

t1 <- platypus.gl
class(t1) <- "dartR"
t2 <- rbind(t1[1:5,],t1[6:10,])

A genlight object created via the gl.read.dart function

Description

This is a test data set on turtles. 250 individuals, 255 loci in >30 populations.

Usage

testset.gl

Format

genlight object

Author(s)

Custodian: Arthur Georges (bugs? Post to https://groups.google.com/d/forum/dartr)

A genlight object created via the gl.read.silicodart function

Description

This is a test data set on turtles. 218 individuals, 255 loci in >30 populations.

Usage

testset.gs

Format

genlight object

Author(s)

Custodian: Arthur Georges (bugs? Post to https://groups.google.com/d/forum/dartr)

Testfile in DArT format (as provided by DArT)

Description

This test data set is provided to show a typical DArT file format. Can be used to create a genlight object using the read.dart function.

Format

csv

Author(s)

Custodian: Arthur Georges (bugs? Post to https://groups.google.com/d/forum/dartr)

Metadata file. Can be integrated via the dart2genlight function.

Description

Metadata file. Can be integrated via the dart2genlight function.

Format

csv

Author(s)

Custodian: Arthur Georges (bugs? Post to https://groups.google.com/d/forum/dartr)

Recode file to be used with the function.

Description

This test data set is provided to show a typical recode file format.

Format

csv

Author(s)

Custodian: Arthur Georges (bugs? Post to https://groups.google.com/d/forum/dartr)

dartR theme

Description

This is the theme used as default for dartR plots. This function controls all non-data display elements in the plots.

Usage

theme_dartR(
  base_size = 11,
  base_family = "",
  base_line_size = base_size/22,
  base_rect_size = base_size/22
)

Arguments

base_size

base font size, given in pts.

base_family

base font family

base_line_size

base size for line elements

base_rect_size

base size for rect elements

Examples

#ggplot(data.frame(dummy=rnorm(1000)),aes(dummy)) +
#geom_histogram(binwidth=0.1) + theme_dartR()

Population assignment probabilities

Description

This function takes one individual and estimates their probability of coming from individual populations from multilocus genotype frequencies.

Usage

utils.assignment(x, unknown, verbose = 2)

Arguments

x

Name of the genlight object containing the SNP data [required].

unknown

Name of the individual to be assigned to a population [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

Value

A data.frame consisting of assignment probabilities for each population.

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Examples

require("dartR.data")
res <- utils.assignment(platypus.gl,unknown="T27")

Population assignment probabilities

Description

This function takes one individual and estimates their probability of coming from individual populations from multilocus genotype frequencies.

Usage

utils.assignment_2(x, unknown, verbose = 2)

Arguments

x

Name of the genlight object containing the SNP data [required].

unknown

Name of the individual to be assigned to a population [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

Value

A data.frame consisting of assignment probabilities for each population.

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Examples

require("dartR.data")
res <- utils.assignment_2(platypus.gl,unknown="T27")

Population assignment probabilities

Description

This function takes one individual and estimates their probability of coming from individual populations from multilocus genotype frequencies.

Usage

utils.assignment_3(x, unknown, verbose = 2)

Arguments

x

Name of the genlight object containing the SNP data [required].

unknown

Name of the individual to be assigned to a population [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

Value

A data.frame consisting of assignment probabilities for each population.

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Examples

require("dartR.data")
res <- utils.assignment_3(platypus.gl,unknown="T27")

Population assignment probabilities

Description

This function takes one individual and estimates their probability of coming from individual populations from multilocus genotype frequencies.

Usage

utils.assignment_4(x, unknown, verbose = 2)

Arguments

x

Name of the genlight object containing the SNP data [required].

unknown

Name of the individual to be assigned to a population [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

Value

A data.frame consisting of assignment probabilities for each population.

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Examples

require("dartR.data")
res <- utils.assignment_4(platypus.gl,unknown="T27")

Calculates mean observed heterozygosity, mean expected heterozygosity and Fis per locus, per population and various population differentiation measures

Description

This is a re-implementation of hierfstat::basic.stats specifically for genlight objects. The formulas (and hence the results) match the original hierfstat::basic.stats exactly, but this version is much faster.
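As a rough illustration of the quantities named in the title, the sketch below computes per-locus observed heterozygosity, expected heterozygosity and Fis for a single population in base R. It is a simplified assumption-laden example, not the package's code: it uses the uncorrected expected heterozygosity 2pq, whereas hierfstat applies sample-size corrections, so the numbers will generally differ from those returned by the package.

```r
# Toy genotypes for one population: rows = individuals, columns = loci,
# coded as the number of copies of the alternate allele (0, 1 or 2).
g <- matrix(c(0, 0, 2, 2,
              1, 1, 1, 1,
              0, 1, 1, 2),
            nrow = 4,
            dimnames = list(paste0("ind", 1:4), paste0("loc", 1:3)))

Ho  <- colMeans(g == 1, na.rm = TRUE)  # proportion of heterozygotes
p   <- colMeans(g, na.rm = TRUE) / 2   # alternate allele frequency
He  <- 2 * p * (1 - p)                 # uncorrected expected heterozygosity
Fis <- 1 - Ho / He                     # inbreeding coefficient per locus

round(data.frame(Ho, He, Fis), 4)
```

The three toy loci were chosen to span the range of Fis: no heterozygotes, all heterozygotes, and Hardy-Weinberg proportions.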

Usage

utils.basic.stats(x, digits = 4)

Arguments

x

A genlight object containing the SNP genotypes [required].

digits

Number of decimals to report [default 4]

Value

A list with the statistics for each population

Author(s)

Luis Mijangos and Carlo Pacioni (bugs? Post to https://groups.google.com/d/forum/dartr)

Examples

require("dartR.data")
out <- utils.basic.stats(platypus.gl)

Utility function to check the class of an object passed to a function

Description

Most functions require access to a genlight object, dist matrix, data matrix or fixed difference list (fd), and this function checks that a genlight object or one of the above has been passed, whether the genlight object is a SNP dataset or a SilicoDArT object, and reports back if verbosity is >=2.

Usage

utils.check.datatype(
  x,
  accept = c("genlight", "SNP", "SilicoDArT"),
  verbose = NULL
)

Arguments

x

Name of the genlight object, dist matrix, data matrix, glPCA, or fixed difference list (fd) [required].

accept

Vector containing the classes of objects that are to be accepted [default c('genlight','SNP','SilicoDArT')].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity].

Details

This function checks the class of passed object and sets the datatype to 'SNP', 'SilicoDArT', 'dist', 'mat', or class(x)[1] as appropriate.

Note also that this function checks to see if there are individuals or loci scored as all missing (NA) and if so, issues the user with a warning.

Note: One and only one of gl.check, fd.check, dist.check or mat.check can be TRUE.

Value

datatype, 'SNP' for SNP data, 'SilicoDArT' for P/A data, 'dist' for a distance matrix, 'mat' for a data matrix, 'glPCA' for an ordination file, or class(x)[1].

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

datatype <- utils.check.datatype(testset.gl)
datatype <- utils.check.datatype(as.matrix(testset.gl),accept='matrix')
fd <- gl.fixed.diff(testset.gl)
datatype <- utils.check.datatype(fd,accept='fd')
datatype <- utils.check.datatype(testset.gl)

Functions from package starmie for merging Q matrices from Structure runs using the CLUMPP algorithms.

Description

Functions from package starmie for merging Q matrices from Structure runs using the CLUMPP algorithms.

Usage

utils.clumpp(Q_list, method, iter)

Arguments

Q_list

A list of Q matrices.

method

The algorithm to use to infer the correct permutations. One of 'greedy' or 'greedyLargeK' or 'stephens'

iter

The number of iterations to use if running either 'greedy' or 'greedyLargeK'

Converts DArT to genlight.

Description

Converts a DArT file (read via read.dart) into an adegenet genlight object.

Usage

utils.dart2genlight(
  dart,
  ind.metafile = NULL,
  covfilename = NULL,
  probar = TRUE,
  verbose = NULL
)

Arguments

dart

A dart object created via read.dart [required].

ind.metafile

Optional file in csv format with metadata for each individual (see details for explanation) [default NULL].

covfilename

Deprecated, use parameter ind.metafile.

probar

Show progress bar [default TRUE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL].

Details

Value

A genlight object with all available slots filled: loc.names, ind.names, pop, lat, lon (if provided via the ind.metafile).

Author(s)

Custodian: Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

Calculates a distance matrix for individuals defined in a dartR genlight object using binary P/A data (SilicoDArT)

Description

This script calculates various distances between individuals based on sequence tag Presence/Absence data.

Usage

utils.dist.binary(
  x,
  method = "simple",
  scale = FALSE,
  swap = FALSE,
  output = "dist",
  verbose = NULL
)

Arguments

x

Name of the genlight containing the genotypes [required].

method

Specify distance measure [default simple].

scale

If TRUE and method='euclidean', the distance will be scaled to fall in the range [0,1] [default FALSE].

swap

If TRUE and working with presence-absence data, then presence (no disrupting mutation) is scored as 0 and absence (presence of a disrupting mutation) is scored as 1 [default FALSE].

output

Specify the format and class of the object to be returned, dist for an object of class dist, matrix for an object of class matrix [default "dist"].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2].

Details

The distance measure can be one of:

Euclidean – Euclidean Distance applied to Cartesian coordinates defined by the loci, scored as 0 or 1. Presence and absence equally weighted.
simple – simple matching, both 1 or both 0 = 0; one 1 and the other 0 = 1. Presence and absence equally weighted.
Jaccard – ignores matching 0, both 1 = 0; one 1 and the other 0 = 1. Absences could be for different reasons.
Bray-Curtis – both 0 = 0; both 1 = 2; one 1 and the other 0 = 1. Absences could be for different reasons. Sometimes called the Dice or Sorensen distance.

One might choose to disregard or downweight absences in comparison with presences because the homology of absences is less clear (mutation at one or the other, or both restriction sites). Your call.
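To make the difference between the equally weighted and the absence-ignoring measures concrete, here is a small base R sketch using the standard textbook definitions of the simple matching and Jaccard distances. The helper names are hypothetical, and the package may treat missing values and scaling differently.

```r
# Toy presence/absence scores: rows = individuals, columns = sequence tags.
pa <- matrix(c(1, 1, 0, 0,
               1, 0, 0, 1,
               0, 0, 0, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("indA", "indB", "indC"), paste0("tag", 1:4)))

# Simple matching: proportion of tags scored differently
# (shared absences count as agreement).
simple_mismatch <- function(a, b) mean(a != b)

# Jaccard: shared absences are ignored. Returns NaN if both individuals
# lack every tag.
jaccard_dist <- function(a, b) {
  both   <- sum(a == 1 & b == 1)
  either <- sum(a == 1 | b == 1)
  1 - both / either
}

simple_mismatch(pa["indA", ], pa["indB", ])
jaccard_dist(pa["indA", ], pa["indB", ])
```

For indA and indB the shared absence at tag3 lowers the simple matching distance but has no effect on the Jaccard distance, which is the downweighting of absences discussed above.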

Value

An object of class 'dist' or 'matrix' giving distances between individuals

Author(s)

Author: Arthur Georges. Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

D <- utils.dist.binary(testset.gs, method='Jaccard')
D <- utils.dist.binary(testset.gs, method='Euclidean',scale=TRUE)
D <- utils.dist.binary(testset.gs, method='Simple')

Calculates a distance matrix for individuals defined in a dartR genlight object using SNP data (DArTseq)

Description

This script calculates various distances between individuals based on SNP genotypes.

Usage

utils.dist.ind.snp(
  x,
  method = "Euclidean",
  scale = FALSE,
  output = "dist",
  verbose = NULL
)

Arguments

x

Name of the genlight containing the genotypes [required].

method

Specify distance measure [default Euclidean].

scale

If TRUE and method='Euclidean', the distance will be scaled to fall in the range [0,1] [default FALSE].

output

Specify the format and class of the object to be returned, dist for an object of class dist, matrix for an object of class matrix [default "dist"].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2].

Details

The distance measure can be one of:

Euclidean – Euclidean Distance applied to Cartesian coordinates defined by the loci, scored as 0, 1 or 2.
Simple – simple mismatch, 0 where no alleles are shared, 1 where one allele is shared, 2 where both alleles are shared.
Absolute – absolute mismatch, 0 where no alleles are shared, 1 where one or both alleles are shared.
Czekanowski (or Manhattan) calculates the city block metric distance by summing the scores on each axis (locus).
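The Euclidean and Manhattan measures listed above can be illustrated with stats::dist on a toy 0/1/2 genotype matrix. This is only a sketch of the underlying arithmetic; it does not reproduce the package's scaling option or its handling of missing genotypes.

```r
# Toy SNP genotypes: rows = individuals, columns = loci, scored 0, 1 or 2.
snp <- matrix(c(0, 1, 2,
                2, 1, 0,
                0, 0, 2),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("ind", 1:3), paste0("loc", 1:3)))

d_euc <- dist(snp, method = "euclidean")  # straight-line distance
d_man <- dist(snp, method = "manhattan")  # city block distance

as.matrix(d_euc)
as.matrix(d_man)
```

Both calls return objects of class dist; as.matrix() converts them to a full symmetric matrix, matching the two output formats described for the output argument.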

Value

An object of class 'dist' or 'matrix' giving distances between individuals

Author(s)

Author(s): Arthur Georges. Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

D <- utils.dist.ind.snp(testset.gl, method='Manhattan')
D <- utils.dist.ind.snp(testset.gl, method='Euclidean',scale=TRUE)
D <- utils.dist.ind.snp(testset.gl, method='Simple')

A utility script to flag the start of a script

Description

A utility script to flag the start of a script

Usage

utils.flag.start(func = NULL, build = NULL, verbose = NULL)

Arguments

func

Name of the function that is starting [required].

build

Name of the build [default NULL].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2].

Value

calling function name

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Calculates the Hamming distance between two DArT trimmed DNA sequences

Description

Usage

utils.hamming(str1, str2, r = 4)

Arguments

str1

String containing the first sequence [required].

str2

String containing the second sequence [required].

r

Number of bases in the restriction enzyme recognition sequence [default 4].

Details

The Hamming distance between the rows of a matrix can be computed quickly by exploiting the fact that the dot product of two binary vectors x and (1-y) counts the corresponding elements that are different between x and y. This matrix multiplication can also be used for matrices with more than two possible values, and different types of elements, such as DNA sequences.

The function calculates the Hamming distance between all columns of a matrix X, or two matrices X and Y. Again matrix multiplication is used, this time for counting, between two columns x and y, the number of cases in which corresponding elements have the same value (e.g. A, C, G or T). This counting is done for each of the possible values individually, while iteratively adding the results. The end result of the iterative adding is the sum of all corresponding elements that are the same, i.e. the inverse of the Hamming distance. Therefore, the last step is to subtract this end result H from the maximum possible distance, which is the number of rows of matrix X.

If the two DNA sequences are of differing length, the longer is truncated. The initial common restriction enzyme recognition sequence is ignored.

The algorithm is that of Johann de Jong https://johanndejong.wordpress.com/2015/10/02/faster-hamming-distance-in-r-2/
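For two single sequences, the truncation and recognition-site rules described above can be sketched directly in base R. The function hamming_sketch below is hypothetical and uses a plain character comparison rather than the matrix-multiplication approach; it returns a raw count of mismatched bases, which is an assumption, since this page does not state whether the package reports a count or a proportion.

```r
# Hypothetical sketch (not the dartR implementation).
# str1, str2: DNA sequences; r: length of the shared recognition sequence.
hamming_sketch <- function(str1, str2, r = 4) {
  n <- min(nchar(str1), nchar(str2))  # truncate the longer sequence
  if (n <= r) return(0L)              # nothing left to compare
  a <- strsplit(substr(str1, r + 1, n), "")[[1]]
  b <- strsplit(substr(str2, r + 1, n), "")[[1]]
  sum(a != b)                         # number of mismatched bases
}

hamming_sketch("TGCAGGATTACA", "TGCAGCATTAGATT")
```

In this example the first four bases are skipped, the second sequence is cut to 12 bases, and the remaining eight positions are compared one by one.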

Value

Hamming distance between the two strings

Author(s)

Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

Calculates mean expected heterozygosity per population

Description

Calculates mean expected heterozygosity per population

Usage

utils.het.pop(x, t_het)

Arguments

x

A genlight object containing the SNP genotypes [required].

t_het

A string specifying the type of heterozygosity to be calculated. Options are "He" for expected heterozygosity and "Ho" for observed heterozygosity.

Value

A vector with the mean expected heterozygosity for each population

Author(s)

Bernd Gruber & Luis Mijangos (bugs? Post to https://groups.google.com/d/forum/dartr)

Examples

out <- utils.het.pop(testset.gl,t_het="He")

Conducts jackknife resampling using a genlight object

Description

Jackknife resampling is a statistical procedure where for a dataset of sample size n, subsamples of size n-1 are used to compute a statistic. The collection of the values obtained can be used to evaluate the variability around the point estimate. This function can take the loci, the individuals or the populations as units over which to conduct resampling.

Note that when n is very small, jackknife resampling is not recommended.

Parallel computation is implemented. The argument n.cores indicates the number of cores to use. If "auto" [default], it will use all but one of the available cores. If the number of units is small (e.g. a few populations), there is no real advantage in using parallel computation. On the other hand, if the number of units is large (e.g. thousands of loci), even with parallel computation, this function can be very slow.
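The leave-one-out idea described above can be sketched in a few lines of base R. The function jackknife_sketch is hypothetical and works on a plain vector of units; it runs serially and does not reproduce the genlight handling or the parallel computation of the package.

```r
# Hypothetical sketch of jackknife resampling (not dartR code).
# units: a vector of n units; FUN: statistic computed on each subsample.
jackknife_sketch <- function(units, FUN, ...) {
  lapply(seq_along(units), function(i) FUN(units[-i], ...))
}

x  <- c(2, 4, 6, 8)
jk <- jackknife_sketch(x, mean)  # list of n leave-one-out means

unlist(jk)      # one estimate per omitted unit
sd(unlist(jk))  # spread of the leave-one-out estimates
```

As with the package function, the result is a list of length n in which each element is the output of FUN for one subsample of size n-1.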

Usage

utils.jackknife(
  x,
  FUN,
  unit = "loc",
  recalc = FALSE,
  mono.rm = FALSE,
  n.cores = "auto",
  verbose = NULL,
  ...
)

Arguments

x

Name of the genlight object [required].

FUN

the name of the function to be used to calculate the statistic

unit

The unit to use for resampling. One of c("loc", "ind", "pop"): loci, individuals or populations

recalc

If TRUE, recalculate the locus metadata statistics [default FALSE].

mono.rm

If TRUE, remove monomorphic and all NA loci [default FALSE].

n.cores

The number of cores to use. If "auto" [default], it will use all but one available cores.

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress but not results; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

...

any additional arguments to be passed to FUN

Value

A list of length n where each element is the output of FUN

Author(s)

Custodian: Carlo Pacioni – Post to https://groups.google.com/d/forum/dartr

Examples

require("dartR.data")
platMod.gl <- gl.filter.allna(platypus.gl)
chk.pop <- utils.jackknife(x=platMod.gl, FUN="gl.alf", unit="pop",
  recalc = FALSE, mono.rm = FALSE, n.cores = 1, verbose=0)

A utility script to calculate the number of variant and invariant sites by locus

Description

Calculate the number of variant and invariant sites by locus and add them as columns in loc.metrics. This can be useful to conduct further filtering, for example where only loci with secondaries are wanted for phylogenetic analyses.

Usage

utils.n.var.invariant(x, verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL].

Details

Invariant sites are the sites (nucleotide) that are not polymorphic. When the locus metadata supplied by DArT includes the sequence of the allele (TrimmedSequence), it is used by this function to estimate the number of sites that were sequenced in each tag (read). This script then subtracts the number of polymorphic sites. The length of the trimmed sequence (lenTrimSeq), the number of variant (n.variant) and invariant (n.invariant) sites are then added to the table in gl@other$loc.metrics.

NOTE: It is important to realise that this function correctly estimates the number of variant and invariant sites only when it is executed on genlight objects before secondaries are removed.
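The subtraction described above can be illustrated on a small stand-in for the locus metrics table. This is a sketch under assumptions: the column CloneID is assumed to identify the sequence tag shared by secondaries, and the number of variant sites is taken as the number of SNPs called on the same tag. Neither detail is confirmed by this page.

```r
# Toy locus metrics: two SNPs on tag "100" (secondaries) and one on "101".
# CloneID is an assumed column name; TrimmedSequence is named in the text.
loc_metrics <- data.frame(
  CloneID = c("100", "100", "101"),
  TrimmedSequence = c("TGCAGAATTC", "TGCAGAATTC", "TGCAGTT"),
  stringsAsFactors = FALSE
)

# Length of the sequenced tag.
loc_metrics$lenTrimSeq <- nchar(loc_metrics$TrimmedSequence)

# Number of SNPs sharing the same tag (assumption, see lead-in).
loc_metrics$n.variant <- as.integer(
  ave(seq_along(loc_metrics$CloneID), loc_metrics$CloneID, FUN = length)
)

# Invariant sites = sequenced sites minus polymorphic sites.
loc_metrics$n.invariant <- loc_metrics$lenTrimSeq - loc_metrics$n.variant

loc_metrics
```

The example also shows why the NOTE matters: if one of the two SNPs on tag "100" had been removed beforehand, n.variant for that tag would be undercounted.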

Value

The modified genlight object.

Author(s)

Carlo Pacioni (Post to https://groups.google.com/d/forum/dartr)

Examples

require("dartR.data")
out <- utils.n.var.invariant(platypus.gl)

OutFLANK: An Fst outlier approach by Mike Whitlock and Katie Lotterhos, University of British Columbia.

Description

This function is the original implementation of OutFLANK by Whitlock and Lotterhos. dartR simply provides a convenient wrapper around their functions and an easier install being an R package (for information please refer to their github repository)

Usage

utils.outflank(
  FstDataFrame,
  LeftTrimFraction = 0.05,
  RightTrimFraction = 0.05,
  Hmin = 0.1,
  NumberOfSamples,
  qthreshold = 0.05
)

Arguments

FstDataFrame

A data frame that includes a row for each locus, with columns as follows:

$LocusName: a character string that uniquely names each locus.
$FST: Fst calculated for this locus. (Kept here to report the unbiased Fst of the results)
$T1: The numerator of the estimator for Fst (necessary, with $T2, to calculate mean Fst)
$T2: The denominator of the estimator of Fst
$FSTNoCorr: Fst calculated for this locus without sample size correction. (Used to find outliers)
$T1NoCorr: The numerator of the estimator for Fst without sample size correction (necessary, with $T2, to calculate mean Fst)
$T2NoCorr: The denominator of the estimator of Fst without sample size correction
$He: The heterozygosity of the locus (used to screen out low heterozygosity loci that have a different distribution)

LeftTrimFraction

The proportion of loci that are trimmed from the lower end of the range of Fst before the likelihood function is applied [default 0.05].

RightTrimFraction

The proportion of loci that are trimmed from the upper end of the range of Fst before the likelihood function is applied [default 0.05].

Hmin

The minimum heterozygosity required before including calculations from a locus [default 0.1].

NumberOfSamples

The number of spatial locations included in the data set.

qthreshold

The desired false discovery rate threshold for calculating q-values [default 0.05].

Details

This method looks for Fst outliers from a list of Fst's for different loci. It assumes that each locus has been genotyped in all populations with approximately equal coverage.

OutFLANK estimates the distribution of Fst based on a trimmed sample of Fst's. It assumes that the majority of loci in the center of the distribution are neutral and infers the shape of the distribution of neutral Fst using a trimmed set of loci. Loci with the highest and lowest Fst's are trimmed from the data set before this inference, and the distribution of Fst df/(mean Fst) is assumed to follow a chi-square distribution. Based on this inferred distribution, each locus is given a q-value based on its quantile in the inferred null distribution.
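The chi-square step described above can be sketched with simulated values. This is not the OutFLANK procedure itself: here the degrees of freedom and the mean Fst are assumed to be known, whereas OutFLANK infers them by likelihood from the trimmed distribution, and the Benjamini-Hochberg adjustment from stats::p.adjust is used only as a stand-in for the q-values.

```r
set.seed(1)

df      <- 5     # assumed degrees of freedom (inferred by OutFLANK in practice)
fst_bar <- 0.05  # assumed mean neutral Fst

# Simulate neutral Fst values so that df * Fst / fst_bar is chi-square(df).
fst <- fst_bar * rchisq(200, df = df) / df

# Right-tail p-value of each locus under the assumed null distribution.
p_right <- pchisq(df * fst / fst_bar, df = df, lower.tail = FALSE)

# Stand-in for q-values: Benjamini-Hochberg adjusted p-values.
q_bh <- p.adjust(p_right, method = "BH")

# Loci that would be flagged at a threshold of 0.05.
sum(q_bh < 0.05)
```

Because every simulated locus is neutral, few or no loci should be flagged; a locus under strong divergent selection would sit far in the right tail and receive a small p-value.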

The main procedure is called OutFLANK – see comments in that function immediately below for input and output formats. The other functions here are necessary and must be uploaded, but are not necessarily needed by the user directly.

Steps:

Value

The function returns a list with seven elements:

FSTbar: the mean FST inferred from loci not marked as outliers
FSTNoCorrbar: the mean FST (not corrected for sample size - gives an upwardly biased estimate of FST)
dfInferred: the inferred number of degrees of freedom for the chi-square distribution of neutral FST
numberLowFstOutliers: Number of loci flagged as having a significantly low FST (not reliable)
numberHighFstOutliers: Number of loci identified as having significantly high FST
results: a data frame with a row for each locus. This data frame includes all the original columns in the data set, and six new ones:
- $indexOrder (the original order of the input data set),
- $GoodH (Boolean variable which is TRUE if the expected heterozygosity is greater than the Hmin set by input),
- $OutlierFlag (TRUE if the method identifies the locus as an outlier, FALSE otherwise), and
- $q (the q-value for the test of neutrality for the locus)
- $pvalues (the p-value for the test of neutrality for the locus)
- $pvaluesRightTail the one-sided (right tail) p-value for a locus

Author(s)

Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr); original implementation of Whitlock & Lotterhos

Creates OutFLANK input file from individual genotype info.

Description

Creates OutFLANK input file from individual genotype info.

Usage

utils.outflank.MakeDiploidFSTMat(SNPmat, locusNames, popNames)

Arguments

SNPmat

This is an array of genotypes with a row for each individual. There should be a column for each SNP, with the number of copies of the focal allele (0, 1, or 2) for that individual. If that individual is missing data for that SNP, there should be a 9, instead.

locusNames

A list of names for each SNP locus. There should be the same number of locus names as there are columns in SNPmat.

popNames

A list of population names to give location for each individual. Typically multiple individuals will have the same popName. The list popNames should have the same length as the number of rows in SNPmat.
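The three inputs described above can be prepared from an ordinary genotype matrix as in the following base R sketch. The toy data and object names are illustrative; the final call is left commented out because it requires dartR to be installed.

```r
# Toy genotypes: rows = individuals, columns = SNPs, NA = missing.
g <- matrix(c(0, 1, NA,
              2, 2, 1),
            nrow = 3,
            dimnames = list(c("i1", "i2", "i3"), c("L1", "L2")))

# Missing data must be coded as 9.
SNPmat <- g
SNPmat[is.na(SNPmat)] <- 9

locusNames <- colnames(g)               # one name per column of SNPmat
popNames   <- c("popA", "popA", "popB") # one entry per row of SNPmat

# Check the dimension requirements stated in the argument descriptions.
stopifnot(ncol(SNPmat) == length(locusNames),
          nrow(SNPmat) == length(popNames))

# fst_mat <- utils.outflank.MakeDiploidFSTMat(SNPmat, locusNames, popNames)
```

Checking the dimensions up front catches the most common input error, a mismatch between the population vector and the rows of the genotype matrix.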

Value

Returns a data frame in the form needed for the main OutFLANK function.

Plotting functions for Fst distributions after OutFLANK

Description

This function takes the output of OutFLANK as input with the OFoutput parameter. It plots a histogram of the FST (by default, the uncorrected FSTs used by OutFLANK) of loci and overlays the inferred null histogram.

Usage

utils.outflank.plotter(
  OFoutput,
  withOutliers = TRUE,
  NoCorr = TRUE,
  Hmin = 0.1,
  binwidth = 0.005,
  Zoom = FALSE,
  RightZoomFraction = 0.05,
  titletext = NULL
)

Arguments

OFoutput

The output of the function OutFLANK()

withOutliers

Determines whether the loci marked as outliers (with $OutlierFlag) are included in the histogram.

NoCorr

Plots the distribution of FSTNoCorr when TRUE. Recommended, because this is the data used by OutFLANK to infer the distribution.

Hmin

The minimum heterozygosity required before including a locus in the plot.

binwidth

The width of bins in the histogram.

Zoom

If Zoom is set to TRUE, then the graph will zoom in on the right tail of the distribution (based on argument RightZoomFraction)

RightZoomFraction

Used when Zoom = TRUE. Defines the proportion of the distribution to plot.

titletext

Allows a text string to be printed as a title on the graph

Value

produces a histogram of the FST

An internal function to save a ggplot object to disk in RDS binary format

Description

WARNING: UTILITY SCRIPTS ARE FOR INTERNAL USE ONLY AND SHOULD NOT BE USED BY END USERS AS THEIR USE OUT OF CONTEXT COULD LEAD TO UNPREDICTABLE OUTCOMES.

Usage

utils.plot.save(x, dir = NULL, file = NULL, verbose = NULL, ...)

Arguments

x

Name of the ggplot object.

dir

Name of the directory to save the file.

file

Name of the file to save the plot to (omit file extension)

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]

...

Parameters passed to function ggsave, such as width and height, when the ggplot is to be saved.

Details

An internal function to save a ggplot object to disk in RDS binary format. Uses saveRDS() to save the file with an .RDS extension; can be reloaded with gl.load().
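The RDS round trip mentioned above can be tried with base R alone. In this sketch a plain list stands in for a ggplot object, the file name is illustrative, and readRDS() is used for reloading in place of gl.load().

```r
# A stand-in object; in practice this would be a ggplot object.
obj <- list(title = "example plot object", values = 1:3)

# Build a path in the session's temporary directory with an .RDS extension.
path <- file.path(tempdir(), "my_plot.RDS")

saveRDS(obj, path)         # write the object in RDS binary format
restored <- readRDS(path)  # read it back

identical(obj, restored)   # the restored object matches the original
```

Because RDS stores a single R object with its class and attributes, a saved plot can be modified or re-rendered after reloading, unlike an exported image file.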

Author(s)

Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

Utility function to read in DArT data

Description

Utility to import DArT data to R. Internal function called by gl.read.dart().

Usage

utils.read.dart(
  filename,
  nas = "-",
  topskip = NULL,
  lastmetric = "RepAvg",
  service.row = 1,
  plate.row = 3,
  verbose = NULL
)

Arguments

filename

Path to file (csv file only currently) [required].

nas

A character specifying NAs [default '-'].

topskip

A number specifying the number of rows to be skipped. If not provided the number of rows to be skipped are 'guessed' by the number of rows with '*' at the beginning [default NULL].

lastmetric

Specifies the last non genetic column [default 'RepAvg']. Be sure to check if that is true, otherwise the number of individuals will not match. You can also specify the last column by a number.

service.row

The row number in which the information of the DArT service is contained [default 1].

plate.row

The row number in which the information of the plate location is contained [default 3].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL].

Value

A list of length 5: dart format (one or two rows), number of individuals, number of SNPs, non genetic metrics, genetic data (still two line format, rows=snps, columns=individuals).

Author(s)

Custodian: Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)

A utility script to recalculate the OneRatioRef, OneRatioSnp, PICRef, PICSnp, and AvgPIC by locus after some individuals or populations have been deleted.

Description

The locus metadata supplied by DArT has OneRatioRef, OneRatioSnp, PICRef, PICSnp, and AvgPIC included, but the allelic composition will change when some individuals, or populations, are removed from the dataset and so the initial statistics will no longer apply. This script recalculates these statistics and places the recalculated values in the appropriate place in the genlight object.

Usage

utils.recalc.avgpic(x, verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2].

Details

If the locus metadata OneRatioRef|Snp, PICRef|Snp and/or AvgPIC do not exist, the script creates and populates them.

Value

The modified genlight object.

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

Examples

#out <- utils.recalc.avgpic(testset.gl)
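The recalculation can be sketched on a plain genotype matrix in base R. This is not the package code. It assumes the genlight coding (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate, NA = missing), and it assumes that OneRatioRef and OneRatioSnp are the proportions of scored individuals carrying the reference and the alternate allele, with PIC = 1 - (f^2 + (1 - f)^2). The names geno, pic and recalc_pic are hypothetical.

```r
# Toy genotypes: rows = individuals, columns = loci
geno <- matrix(c(0, 1, 2, NA,
                 0, 0, 1, 1,
                 2, 2, 2, NA),
               nrow = 4,
               dimnames = list(paste0("ind", 1:4), paste0("loc", 1:3)))

# Assumed PIC definition for a two-state marker with frequency f
pic <- function(f) 1 - (f^2 + (1 - f)^2)

recalc_pic <- function(g) {
  scored <- colSums(!is.na(g))
  # proportion of scored individuals carrying each allele (assumption)
  one_ratio_ref <- colSums(g == 0 | g == 1, na.rm = TRUE) / scored
  one_ratio_snp <- colSums(g == 1 | g == 2, na.rm = TRUE) / scored
  data.frame(OneRatioRef = one_ratio_ref,
             OneRatioSnp = one_ratio_snp,
             PICRef = pic(one_ratio_ref),
             PICSnp = pic(one_ratio_snp),
             AvgPIC = (pic(one_ratio_ref) + pic(one_ratio_snp)) / 2,
             row.names = colnames(g))
}

metrics_all <- recalc_pic(geno)            # all four individuals
metrics_subset <- recalc_pic(geno[1:3, ])  # after dropping ind4
```

Comparing metrics_all with metrics_subset shows why the values supplied with the original data stop applying once individuals are removed.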

A utility script to recalculate the callrate by locus after some populations have been deleted

Description

SNP datasets generated by DArT have missing values primarily arising from failure to call a SNP because of a mutation at one or both of the restriction enzyme recognition sites. The locus metadata supplied by DArT has callrate included, but the call rate will change when some individuals are removed from the dataset. This script recalculates the callrate and places these recalculated values in the appropriate place in the genlight object. It sets the Call Rate flag to TRUE.

Usage

utils.recalc.callrate(x, verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2].

Value

The modified genlight object

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

Examples

#out <- utils.recalc.callrate(testset.gl)
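A base R sketch of the idea, assuming the call rate of a locus is the proportion of individuals with a non-missing genotype. It operates on a plain matrix rather than a genlight object, and the names geno and call_rate are hypothetical.

```r
# Toy genotypes: rows = individuals, columns = loci, NA = not called
geno <- matrix(c(0, 1, NA, 2,
                 NA, NA, 1, 0,
                 2, 2, 2, 2),
               nrow = 4,
               dimnames = list(paste0("ind", 1:4), paste0("loc", 1:3)))

# Call rate per locus: share of individuals with a scored genotype
call_rate <- function(g) colMeans(!is.na(g))

cr_before <- call_rate(geno)         # all individuals
cr_after <- call_rate(geno[3:4, ])   # after removing ind1 and ind2
```

Here loc2 has a lower call rate than loc1 before the deletion and a higher one afterwards, so a filter on the stale values could keep or drop the wrong loci.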

A utility script to recalculate the frequency of the heterozygous SNPs by locus after some populations have been deleted

Description

The locus metadata supplied by DArT has FreqHets included, but the frequency of the heterozygotes will change when some individuals are removed from the dataset.

Usage

utils.recalc.freqhets(x, verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2].

Details

This script recalculates the FreqHets and places these recalculated values in the appropriate place in the genlight object.

Note that the frequency of the heterozygous SNPs is calculated from the individuals that could be scored.

Value

The modified genlight object.

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

Examples

#out <- utils.recalc.freqhets(testset.gl)

A utility script to recalculate the frequency of the homozygous reference SNP by locus after some populations have been deleted

Description

The locus metadata supplied by DArT has FreqHomRef included, but the frequency of the homozygous reference will change when some individuals are removed from the dataset.

Usage

utils.recalc.freqhomref(x, verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2].

Details

This script recalculates the FreqHomRef and places these recalculated values in the appropriate place in the genlight object.

Note that the frequency of the homozygous reference SNPs is calculated from the individuals that could be scored.

Value

The modified genlight object

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

Examples

#result <- utils.recalc.freqhomref(testset.gl)

A utility script to recalculate the frequency of the homozygous alternate SNP by locus after some populations have been deleted

Description

The locus metadata supplied by DArT has FreqHomSnp included, but the frequency of the homozygous alternate will change when some individuals are removed from the dataset.

Usage

utils.recalc.freqhomsnp(x, verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2].

Details

This script recalculates the FreqHomSnp and places these recalculated values in the appropriate place in the genlight object.

Note that the frequency of the homozygous alternate SNPs is calculated from the individuals that could be scored.

Value

The modified genlight object.

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

Examples

#out <- utils.recalc.freqhomsnp(testset.gl)
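The three genotype-class frequencies handled by utils.recalc.freqhets, utils.recalc.freqhomref and utils.recalc.freqhomsnp can be sketched together in base R. This is an illustration on a plain matrix, not the package code, assuming the genlight coding (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate, NA = missing). The names geno and genotype_freqs are hypothetical.

```r
# Toy genotypes: rows = individuals, columns = loci
geno <- matrix(c(0, 1, 2, NA,
                 0, 0, 1, 1,
                 2, 2, 2, NA),
               nrow = 4,
               dimnames = list(paste0("ind", 1:4), paste0("loc", 1:3)))

# na.rm = TRUE restricts each frequency to the individuals that were scored
genotype_freqs <- function(g) {
  data.frame(FreqHomRef = colMeans(g == 0, na.rm = TRUE),
             FreqHets   = colMeans(g == 1, na.rm = TRUE),
             FreqHomSnp = colMeans(g == 2, na.rm = TRUE),
             row.names = colnames(g))
}

freqs <- genotype_freqs(geno)
```

Because missing genotypes are excluded from the denominator, the three frequencies sum to 1 for every locus with at least one scored individual.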

A utility script to recalculate the minor allele frequency by locus, typically after some populations have been deleted

Description

The locus metadata supplied by DArT does not have MAF included, so it is calculated and added to the locus.metadata by this script. The minor allele frequency will change when some individuals are removed from the dataset. This script recalculates the MAF and places these recalculated values in the appropriate place in the genlight object.

Usage

utils.recalc.maf(x, verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data [required].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2].

Value

The modified genlight dataset.

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

Examples

#f <- dartR::utils.recalc.maf(testset.gl)
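A base R sketch of a minor allele frequency calculation, assuming diploid genotypes coded as counts of the alternate allele (0, 1, 2) with NA for missing. It is an illustration on a plain matrix, not the package code, and the names geno, alt_freq and maf are hypothetical.

```r
# Toy genotypes: rows = individuals, columns = loci
geno <- matrix(c(0, 1, 2, NA,
                 0, 0, 1, 1,
                 2, 2, 2, NA),
               nrow = 4,
               dimnames = list(paste0("ind", 1:4), paste0("loc", 1:3)))

# Alternate allele frequency: mean allele count over scored individuals / 2
alt_freq <- colMeans(geno, na.rm = TRUE) / 2

# The minor allele is whichever of the two alleles is rarer at that locus
maf <- pmin(alt_freq, 1 - alt_freq)
```

A locus fixed for either allele, such as loc3 here, has a MAF of 0.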

A utility script to reset to FALSE (or TRUE) the locus metric flags after some individuals or populations have been deleted.

Description

The locus metadata supplied by DArT has OneRatioRef, OneRatioSnp, PICRef, PICSnp, and AvgPIC included, but the allelic composition will change when some individuals are removed from the dataset and so the initial statistics will no longer apply. This also applies to some variables calculated by dartR (e.g. maf). This script resets the locus metrics flags to FALSE to indicate that these statistics in the genlight object are no longer current. The verbosity default is also set, and in the case of SilicoDArT, the flags PIC and OneRatio are also set.

Usage

utils.reset.flags(x, set = FALSE, value = 2, verbose = NULL)

Arguments

x

Name of the genlight object containing the SNP data or tag presence/absence data (SilicoDArT) [required].

set

Set the flags to TRUE or FALSE [default FALSE].

value

Set the default verbosity for all functions, where verbosity is not specified [default 2].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL].

Details

If the locus metrics do not exist then they are added to the genlight object but not populated. If the locus metrics flags do not exist, then they are added to the genlight object and set to FALSE (or TRUE).

Value

The modified genlight object

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

Examples

#result <- utils.reset.flags(testset.gl)
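The flag mechanism can be sketched with a named logical vector in base R. This is a conceptual illustration only: the storage layout, the function reset_flags and the selection of flag names (taken from the metrics named in the Description) are assumptions and may differ from how the genlight object holds its flags.

```r
# Metrics named in the Description; the real set of flags may differ
metric_names <- c("OneRatioRef", "OneRatioSnp", "PICRef",
                  "PICSnp", "AvgPIC", "maf")

# Create the flags if they do not exist, then set all of them to `set`
reset_flags <- function(flags = NULL, set = FALSE) {
  if (is.null(flags)) {
    flags <- setNames(logical(length(metric_names)), metric_names)
  }
  flags[] <- set   # assign to every element, keeping the names
  flags
}

flags <- reset_flags()        # no flags yet: created and set to FALSE
flags["maf"] <- TRUE          # e.g. after MAF has been recalculated
flags <- reset_flags(flags)   # individuals deleted: everything is stale again
```

A downstream function can then test a flag and recalculate a metric only when its flag is FALSE.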

Spatial autocorrelation coefficient calculations

Description

Carries out calculation for spatial autocorrelation coefficient starting from a genetic and geographic distance matrix.

Usage

utils.spautocor(GD, GGD, permutation = FALSE, bootstrap = FALSE, bins = 10, reps)

Arguments

GD

Genetic distance matrix.

GGD

Geographic distance matrix.

permutation

Whether permutation calculations for the null hypothesis of no spatial structure should be carried out [default FALSE].

bootstrap

Whether bootstrap calculations to compute the 95% confidence intervals around r should be carried out [default FALSE].

bins

The number of bins for the distance classes (i.e. length(bins) == 1) or a vector with the break points. See details [default 10].

reps

The number to be used for permutation and bootstrap analyses [default 100].

Details

The code of this function is based on spautocorr from the package PopGenReport, which has been modified to fix a few bugs (as of PopGenReport v 3.0.4) and to allow calculation of bootstrap estimates.

See details from gl.spatial.autoCorr for a detailed explanation.

Value

Returns a data frame with the following columns:

Bin: The distance classes
N: The number of pairwise comparisons within each distance class
r.uc: The uncorrected autocorrelation coefficient

These columns are returned if both bootstrap and permutation are FALSE; otherwise only r estimates are returned.

Author(s)

Carlo Pacioni & Bernd Gruber

References

Smouse PE, Peakall R. 1999. Spatial autocorrelation analysis of individual multiallele and multilocus genetic structure. Heredity 82: 561-573.
Double MC, et al. 2005. Dispersal, philopatry and infidelity: dissecting local genetic structure in superb fairy-wrens (Malurus cyaneus). Evolution 59: 625-635.
Peakall R, et al. 2003. Spatial autocorrelation analysis offers new insights into gene flow in the Australian bush rat, Rattus fuscipes. Evolution 57: 1182-1195.
Smouse PE, et al. 2008. A heterogeneity test for fine-scale genetic structure. Molecular Ecology 17: 3389-3400.
Gonzales E, et al. 2010. The impact of landscape disturbance on spatial genetic structure in the Guanacaste tree, Enterolobium cyclocarpum (Fabaceae). Journal of Heredity 101: 133-143.
Beck N, et al. 2008. Social constraint and an absence of sex-biased dispersal drive fine-scale genetic structure in white-winged choughs. Molecular Ecology 17: 4346-4358.

Examples

# See gl.spatial.autoCorr
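The per-class coefficient can be sketched in base R along the lines of Smouse and Peakall (1999). This is a simplified illustration, not the code of utils.spautocor or of PopGenReport: it assumes GD holds squared genetic distances, uses equal-width distance classes only, and omits permutation and bootstrap. The function name spautocor_sketch and the simulated inputs are hypothetical.

```r
spautocor_sketch <- function(GD, GGD, bins = 5) {
  n <- nrow(GD)
  # Double-centre the squared distances to obtain a covariance matrix
  row_m <- matrix(rowMeans(GD), n, n)                # [i, j] = mean of row i
  col_m <- matrix(colMeans(GD), n, n, byrow = TRUE)  # [i, j] = mean of col j
  C <- 0.5 * (-GD + row_m + col_m - mean(GD))
  # Use each pair of individuals once (lower triangle)
  idx <- which(lower.tri(GGD), arr.ind = TRUE)
  d <- GGD[idx]
  breaks <- seq(0, max(d), length.out = bins + 1)
  cls <- cut(d, breaks = breaks, include.lowest = TRUE)
  res <- lapply(levels(cls), function(lv) {
    sel <- idx[cls == lv, , drop = FALSE]
    cij <- sum(C[sel])                    # covariances of pairs in the class
    cii <- sum(diag(C)[sel[, 1]])         # variances of the first members
    cjj <- sum(diag(C)[sel[, 2]])         # variances of the second members
    r <- if (nrow(sel) > 0) 2 * cij / (cii + cjj) else NA_real_
    data.frame(Bin = lv, N = nrow(sel), r.uc = r)
  })
  do.call(rbind, res)
}

# Simulated inputs for illustration only
set.seed(1)
coords <- matrix(runif(40), ncol = 2)                 # 20 individuals
geno <- matrix(rbinom(20 * 50, 2, 0.3), nrow = 20)    # 50 loci
GGD <- as.matrix(dist(coords))
GD <- as.matrix(dist(geno))^2
out <- spautocor_sketch(GD, GGD, bins = 5)
```

The result has one row per distance class with the columns Bin, N and r.uc. With random genotypes, as here, no spatial structure is expected.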

Util function for evanno plots

Description

These functions were copied from package strataG, which is no longer on CRAN (maintained by Eric Archer)

Usage

utils.structure.evanno(sr, plot = TRUE)

Arguments

sr

structure run object

plot

should the plots be returned

Author(s)

Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr); original implementation of Eric Archer https://github.com/EricArcher/strataG

structure util functions

Description

These functions were copied from package strataG, which is no longer on CRAN (maintained by Eric Archer)

Usage

utils.structure.genind2gtypes(x)

Arguments

x

a genind object

Value

a gtypes object

Author(s)

Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr); original implementation of Eric Archer https://github.com/EricArcher/strataG

Utility function to run Structure

Description

These functions were copied from package strataG, which is no longer on CRAN (maintained by Eric Archer)

Usage

utils.structure.run(g, k.range = NULL, num.k.rep = 1, label = NULL, delete.files = TRUE, exec = "structure", ...)

Arguments

g

a gtypes object [see strataG].

k.range

vector of values for maxpop in multiple runs. If set to NULL, a single STRUCTURE run is conducted with maxpops groups. If specified, do not also specify maxpops.

num.k.rep

number of replicates for each value in k.range.

label

label to use for input and output files

delete.files

logical. Delete all files when STRUCTURE is finished?

exec

name of executable for STRUCTURE. Defaults to "structure".

...

arguments to be passed to structureWrite.

Value

structureRun

a list where each element is a list with results from structureRead and a vector of the filenames used

structureWrite

a vector of the filenames used by STRUCTURE

structureRead

a list containing:

summary: new locus name, which is a combination of loci in group
q.mat: data.frame of assignment probabilities for each id
prior.anc: list of prior ancestry estimates for each individual where population priors were used
files: vector of input and output files used by STRUCTURE
label: label for the run

Author(s)

Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr); original implementation of Eric Archer https://github.com/EricArcher/strataG

Setting up the package

Description

Setting theme, colors and verbosity

Usage

zzz

Format

An object of class NULL of length 0.
