Type: Package
Title: Relative Simulator
Version: 1.0.0
Date: 2023-12-04
Author: James M. Curran
Maintainer: James M. Curran <j.curran@auckland.ac.nz>
Description: A set of tools to explore the behaviour of statistics used for forensic DNA interpretation when close relatives are involved. The package also offers some useful tools for exploring other forensic DNA situations.
Encoding: UTF-8
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
Imports: stats, graphics, xtable, multicool, utils, rvest, stringr, xml2, methods
Depends: R (≥ 2.10)
LinkingTo: Rcpp
LazyData: true
RoxygenNote: 7.2.3
NeedsCompilation: yes
Packaged: 2023-12-03 23:48:12 UTC; jcur002
Repository: CRAN
Date/Publication: 2023-12-04 00:10:02 UTC

relSim: Relative Simulator

Description

A set of tools to explore the behaviour of statistics used for forensic DNA interpretation when close relatives are involved. The package also offers some useful tools for exploring other forensic DNA situations.


Identity by state

Description

Calculates the total number of alleles that are shared by two profiles. If the two profiles in question are indeed relatives then the matching alleles may be identical by descent, or by random chance alone, hence identity by state.

Usage

IBS(prof1, prof2, nLoci = length(prof1)/2, bPrint = FALSE)

Arguments

prof1

A matrix consisting of 2 columns and nLoci rows. Each entry in the matrix is the (coded) allele held by the individual.

prof2

See prof1

nLoci

The number of loci in the profiles. Specifying this value speeds up computation enormously.

bPrint

If true then the result is printed locus by locus. This feature exists primarily for debugging purposes.

Value

An integer between 0 and 2*nLoci representing the total number of alleles that match in the two profiles.

Author(s)

James M. Curran

Examples

data(fbiCaucs)
P1 = randomProfile(fbiCaucs)
C1 = randomChild(P1, fbiCaucs)
IBS(P1, C1)
IBS(P1, C1, bPrint = TRUE)
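
The counting rule can be sketched in base R without the package. This is only an illustration, not the package's compiled implementation: the helper name ibs_total is hypothetical, and it assumes each profile is a vector of 2*nLoci coded alleles with the two alleles of a locus stored next to each other.

```r
# Hypothetical helper: total number of alleles shared by two profiles.
# Assumes each profile is a vector of 2 * nLoci coded alleles, locus by locus.
ibs_total <- function(prof1, prof2) {
  g1 <- matrix(prof1, ncol = 2, byrow = TRUE)  # one row per locus
  g2 <- matrix(prof2, ncol = 2, byrow = TRUE)
  per_locus <- vapply(seq_len(nrow(g1)), function(l) {
    a <- sort(g1[l, ])
    b <- sort(g2[l, ])
    if (all(a == b)) {
      2L                       # identical genotypes share both alleles
    } else if (any(a %in% b)) {
      1L                       # one allele in common
    } else {
      0L                       # nothing shared
    }
  }, integer(1))
  sum(per_locus)
}

p1 <- c(1, 2,  3, 3,  4, 5)   # three loci
p2 <- c(1, 2,  3, 6,  7, 8)
ibs_total(p1, p2)
```

Comparing a profile with itself gives the upper bound of 2*nLoci mentioned under Value.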

Use importance sampling to determine the probability of m peaks from n contributors to a mixture

Description

WARNING: This function is experimental.

Usage

IS(
  Freqs,
  numContributors = 4,
  maxPeaks = NULL,
  numIterations = 100,
  bTail = FALSE
)

Arguments

Freqs

a set of allele frequencies. The format can be found in readFreqs

numContributors

the number of contributors to each mixture. Must be >= 1.

maxPeaks

either the number of peaks observed in the mixture, or a value such that 1 <= maxPeaks <= min(2 * numContributors, numAlleles). That is, if used in this way maxPeaks must be between 1 and the smaller of twice the number of contributors or the number of possible alleles, because you cannot see more peaks than there are possible alleles.

numIterations

the number of iterations to use in the importance sampling scheme.

bTail

if TRUE then the tail probability is calculated.

Value

a list with as many elements as loci. If tail probabilities are selected then each locus element will be a vector of probabilities.

Examples

## Not run:
data(USCaucs)
IS(USCaucs, numContributors = 4, maxPeaks = 3, numIterations = 1e4)
## End(Not run)

CODIS STR Loci allele frequency data

Description

This data structure

Format

This data set is a list which has two sub-lists. The lists are named loci and freqs. loci is a vector of the 13 CODIS STR locus names. freqs is a list of 13 vectors; each vector contains the allele frequencies published for US Caucasians in Budowle and Moretti (1999). The raw data is available from https://www.fbi.gov/about-us/lab/forensic-science-communications/fsc/july1999/dnaloci.txt

Author(s)

James M. Curran

References

Budowle, B. and Moretti, T.R. (1999), Genotype Profiles for Six Population Groups at the 13 CODIS Short Tandem Repeat Core Loci and Other PCR Based Loci, Forensic Science Communications 1(2).

See Also

fbiCaucs

Examples

data(USCaucs)
names(USCaucs)
USCaucs$loci
names(USCaucs$freqs)
USCaucs$freqs[[1]]
names(USCaucs$freqs[[1]])
USCaucs$freqs[[1]][1]

Compute the likelihood ratio for all pairs of profiles in a database

Description

This function takes every pair of profiles in a database of profiles and computes the likelihood ratio (LR) for a specific relationship given by nCode. That means there will be N(N-1)/2 LRs computed for N profiles.

Usage

allPairsLR(Profiles, listFreqs, nCode)

Arguments

Profiles

an integer vector of stacked profiles representing the database. This vector has 2NL entries, where N is the number of profiles and L is the number of loci.

listFreqs

a set of allele frequencies representing a particular multiplex. The function assumes that the loci in the profiles are in the same order as the loci in this list. The data structure is a List of NumericVectors.

nCode

if 1 then compute the LR for siblings, otherwise compute the LR for parent/child.

Value

a NumericVector containing the LRs. They are stored in sequential order, so if, for example, there were three profiles, then there are 3 possible LRs, and the result vector would contain the LRs for the profile pairs (1, 2), (1, 3), and (2, 3).

Author(s)

James Curran

Examples

data("USCaucs")
N = 600
profs = relSim:::.randomProfiles(USCaucs$freqs, N)
system.time({lr = relSim:::allPairsLR(profs, USCaucs$freqs, 1)})
plot(density(log10(lr)))
mean(lr > 1) ## estimate the probability that the LR is incorrectly above 1
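
The pair ordering described under Value can be reproduced in base R. The sketch below only enumerates index pairs; it assumes the same ordering carries over to more than three profiles, which the text states explicitly only for N = 3.

```r
# Enumerate profile pairs in the order described for the result vector.
n_profiles <- 3
pairs <- t(utils::combn(n_profiles, 2))   # rows: (1,2), (1,3), (2,3)
colnames(pairs) <- c("i", "j")
pairs

# The number of pairs matches N(N-1)/2
n_pairs <- nrow(pairs)
n_pairs == n_profiles * (n_profiles - 1) / 2
```

With such a table, the k-th LR in the result vector could be matched to the pair in row k, under the ordering assumption above.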

Perform relatives simulations using large memory blocks in C

Description

Generate N pairs with a given relationship, calculate the LR for sibs, parent-child and the number of matching alleles and count the number of pairs that meet the threshold criteria.

Usage

blockSim(
  N,
  Freqs,
  rel = "UN",
  ibsthresh = NULL,
  kithresh = NULL,
  code = 1,
  falseNeg = TRUE,
  BlockSize = N/10,
  showProgress = FALSE
)

Arguments

N

The number of iterations to carry out

Freqs

A list containing two lists labelled loci and freqs. The second list is a list of vectors containing the allele frequencies of each allele at each locus in the multiplex.

rel

generate unrelated (rel = 'UN'), full-sibs (rel = 'FS'), or parent child (rel = 'PC') pairs

ibsthresh

A vector of one or more IBS thresholds

kithresh

A vector of one or more KI/LR thresholds

code

A code from 1 to 6 which dictates the events that will be counted.

  1. the LR for siblings will be compared to the values in kithresh and the count incremented if the LR is greater than the threshold

  2. the LR for parent/child will be compared to the values in kithresh and the count incremented if the LR is greater than the threshold

  3. the number of matching alleles (IBS) will be compared to the values in ibsthresh and the count incremented if the IBS is greater than the threshold

  4. the LR for siblings and the number of matching alleles will be compared to the values in kithresh and ibsthresh and the count incremented if both the LR and IBS are greater than the thresholds. ibsthresh and kithresh must be of equal length for this option to work

  5. the LR for parent/child and the number of matching alleles will be compared to the values in kithresh and ibsthresh and the count incremented if both the LR and IBS are greater than the thresholds. ibsthresh and kithresh must be of equal length for this option to work

  6. this option is equivalent to performing code 4 and 5 simultaneously. It is not currently implemented

falseNeg

if TRUE then the number of results that DO NOT satisfy the conditions are counted, otherwise the number of results that DO satisfy the conditions are counted

BlockSize

Sets the number of random profiles to be generated in each iteration. By default the block size is set to 10 percent of the total sample size. It is unclear whether the procedure is more efficient if a bigger percentage of the total is used. Users must take care to make sure that the block size evenly divides N otherwise the procedure will exit. Users must also make sure that they have enough memory.

showProgress

If TRUE then a progress bar will be displayed in the console showing the progress of the simulation.

Details

This function is used for fast accurate estimation of false positive and false negative rates. It achieves part of its speed by block execution in C, and part by not saving the LR or IBS results. It can do 1 billion iterations in about an hour.

Value

A vector containing the number of profile pairs that satisfied the threshold conditions

Author(s)

James M. Curran

See Also

sim

Examples

## not run
## this counts the number of unrelated pairs that are falsely identified
## as siblings using the policy that there are 16 or more matching
## alleles, and the LR/KI is greater than 100,000
## this is a very rare event for the FBI Caucasians with a frequency of
## about 4-5 times in 10 million pairs
## Not run:
data(fbiCaucs)
N = 1e8
ki = 1e5
ibs = 16
code = 5
BlockSize = 1e6
blockSim(N, fbiCaucs, rel = "UN", ibsthresh = ibs, kithresh = ki,
         code = code, falseNeg = FALSE, BlockSize = BlockSize)
## End(Not run)
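
Because the procedure exits when BlockSize does not evenly divide N, it can be worth checking the two values before starting a long run. The helper below is a hypothetical base-R sketch and not part of the package; the default mirrors the N/10 default in the Usage.

```r
# Hypothetical helper: confirm that BlockSize evenly divides N and
# report how many blocks a simulation would use.
n_blocks <- function(N, BlockSize = N / 10) {
  if (N %% BlockSize != 0) {
    stop("BlockSize must evenly divide N")
  }
  N / BlockSize
}

blocks <- n_blocks(1e8, 1e6)   # values from the example above
blocks
```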

Breed a population with an approximate level of \theta (F_{ST})

Description

This function simulates a population with an approximate level of population substructure. This is achieved by subdividing a population into equal sized subpopulations and allowing them to breed within themselves for

t = \left\lceil \frac{\log_e(1-\theta)}{\log_e\left(1-\frac{1}{2N_s}\right)} \right\rceil

generations, where N_s is the number of individuals in each subpopulation. This will produce a population with an estimated coancestry coefficient approximately equal to \theta.

Usage

breedFst(Freqs, theta = 0.01, N = 10000, ns = 10, DNAtools = FALSE)

Arguments

Freqs

A list with an element, freqs, which contains a list of vectors, where each vector is a set of allele frequencies for a locus

theta

A desired level of inbreeding, where 0 < \theta < 0.5

N

Total population size

ns

The number of subpopulations. N/n_s needs to be greater than 100

DNAtools

If TRUE then the profiles in the return population will be formatted as a data frame with an id column and two columns per locus.

Value

An object of class 'population' which is a list with the following elements

Author(s)

James M. Curran

Examples

data(USCaucs)
pop = breedFst(USCaucs)
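
The number of breeding generations given in the Description can be evaluated directly. This base-R sketch only evaluates that formula; the function name generations is hypothetical, and the defaults are copied from the Usage of breedFst.

```r
# Number of generations needed to reach an approximate target theta,
# following the formula in the Description (hypothetical helper).
generations <- function(theta = 0.01, N = 10000, ns = 10) {
  Ns <- N / ns                      # individuals per subpopulation
  ceiling(log(1 - theta) / log(1 - 1 / (2 * Ns)))
}

t_default <- generations()          # defaults used in the Usage
t_default
generations(theta = 0.03)
```

Larger targets for \theta, or larger subpopulations, require more generations of within-subpopulation breeding.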

Calculate locus-wise and population \theta = F_{ST}, F = F_{IT}, and f = F_{IS} values

Description

This procedure uses the method of Weir and Cockerham to estimate \theta = F_{ST}, F = F_{IT}, and f = F_{IS} for a population with known substructure

Usage

calcFStats(Pop, subPopIdx = NULL)

Arguments

Pop

An object of type 'population'

subPopIdx

If this vector is not null, then it must consist of N elements with values from 1 to n_s representing which subpopulation each member of Pop$profiles belongs to. If it is null then it is assumed that the population consists of n_s subpopulations of equal size N_s so that n_s \times N_s = N

Value

A vector of length n_{loci}+1 with locus-wise \theta values and an overall \theta value for the population

Author(s)

James M. Curran

References

Weir, B.S., Genetic Data Analysis II, (1996) p.173–179, Sinauer, Sunderland, MA.

See Also

breedFst

Examples

data(USCaucs)
set.seed(123)
p = breedFst(USCaucs)
fstats = calcFStats(p)
fstats
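
When subPopIdx is supplied explicitly, it needs one entry per member of the population. The following base-R sketch builds such a vector for equal-sized subpopulations; it assumes that members are stored subpopulation by subpopulation, which is not stated in the text above.

```r
# Build a subpopulation index for ns equal-sized subpopulations.
# Assumption: members are stored subpopulation by subpopulation.
N  <- 10000
ns <- 10
Ns <- N / ns
subPopIdx <- rep(seq_len(ns), each = Ns)

length(subPopIdx) == N   # one entry per member
table(subPopIdx)         # Ns members in each subpopulation
```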

Calculate locus-wise and population F_{ST} values

Description

This procedure uses the method of Weir and Cockerham to estimate \theta (F_{ST}) for a population with substructure

Usage

calcFst(Pop, subPopIdx = NULL)

Arguments

Pop

An object of type 'population'

subPopIdx

If this vector is not null, then it must consist of N elements with values from 1 to n_s representing which subpopulation each member of Pop$profiles belongs to. If it is null then it is assumed that the population consists of n_s subpopulations of equal size N_s so that n_s \times N_s = N

Value

A vector of length n_{loci}+1 with locus-wise \theta values and an overall \theta value for the population

Author(s)

James M. Curran

References

Weir, B.S., Genetic Data Analysis II, (1996) p.173–179, Sinauer, Sunderland, MA.

See Also

breedFst

Examples

data(USCaucs)
p = breedFst(USCaucs)
fst = calcFst(p)
fst

Make sure that the frequencies are such

Description

Checks whether a list of frequencies at a series of genetic loci both sum to one and lie between 0 and 1.

Usage

checkFreqs(Freqs)

Arguments

Freqs

A list containing elements loci and freqs. freqs is a list of vectors containing the frequencies at the given loci.

Details

If a locus fails to sum to one, or there are alleles which fall below zero or above one, then a warning message will be returned for each item in error.

Author(s)

James M. Curran

See Also

normalizeFreqs

Examples

data(fbiCaucs)
checkFreqs(fbiCaucs)
## induce an error
fbiCaucs$freqs[[1]] = runif(10)
checkFreqs(fbiCaucs)
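
The two conditions being checked can be written out in a few lines of base R. This is a sketch of the idea rather than the package's code: check_freqs and the tolerance tol are hypothetical, and the sketch returns a logical per locus instead of issuing warnings.

```r
# Hypothetical base-R check: do the frequencies at each locus lie in [0, 1]
# and sum to one (within a numerical tolerance)?
check_freqs <- function(freqs, tol = 1e-8) {
  vapply(freqs, function(p) {
    all(p >= 0 & p <= 1) && abs(sum(p) - 1) < tol
  }, logical(1))
}

freqs <- list(
  locusA = c(0.2, 0.3, 0.5),   # valid
  locusB = c(0.6, 0.6),        # sums to 1.2
  locusC = c(-0.1, 1.1)        # sums to one but lies outside [0, 1]
)
ok <- check_freqs(freqs)
ok
```

A tolerance is used because frequencies read from published tables rarely sum to exactly one in floating point.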

Returns the false positive or false negative rates for a set of IBS and/or KI thresholds

Description

This function is used to calculate the various tables in the work of Ge et al. and Balding et al. Specifically it can be used to calculate the false positive rate for unrelated pairs being identified as full-sibs or parent-child pairs under differing levels of IBS or KI (or both) thresholds. It can also be used to calculate the false negative rates for full-sib, or parent-child, pairs being identified as unrelated, again with differing levels of IBS, KI or both.

Usage

errorRate(
  simResults,
  bIBS = TRUE,
  bKI = FALSE,
  rel = "UN",
  IBSthresh = 14:17,
  KIthresh = c(1000, 10000, 1e+05, 1e+06),
  nLoci = 13
)

Arguments

simResults

A data.frame with three columns labelled sib, pc and ibs. This will usually be obtained from a call to sim or readResults.

bIBS

If TRUE then IBS thresholds are used to generate the error rates. If both bIBS and bKI are TRUE then both criteria are used.

bKI

If TRUE then KI thresholds are used to generate the error rates. If both bIBS and bKI are TRUE then both criteria are used.

rel

The relationship used in the simulation. Must be one of 'UN', 'FS' or 'PC'.

IBSthresh

A vector of IBS values that can be used to classify the results as being related (or not).

KIthresh

A vector of KI threshold values that can be used to classify the results as being related (or not).

nLoci

The number of loci being used in the multiplex. This dictates the upper bound on the IBS values.

Value

A vector (or a two-column matrix) of false negative or false positive rates. If the relationship is 'UN' then false positive rates are returned for parent-child and full-sibs, with parent-child being in column 1 and full-sibs in column 2. If the relationship is 'PC' then the false negative rate is returned for parent-child pairs, and if it is 'FS' then the false negative rate for full-sibs.

Author(s)

James M. Curran

See Also

sim, readResults

Examples

## not run
## Not run:
data(fbiCaucs)
unrel = sim(10000)
errorRate(unrel)
## End(Not run)

Calculate the exclusion power of a multiplex by locus

Description

Calculates the exclusion power

1-2\left(\sum_{i=1}^{n_l}p_i^2\right)^2-4\sum_{i=1}^{n_l}p_i^4

at each locus for a set of allele frequencies.

Usage

exclusionPower(Freqs)

Arguments

Freqs

A list containing two vectors and a list, called loci, counts, and freqs. The elements of loci are the loci present in the multiplex. The elements of freqs are vectors of allele frequencies for each locus. The elements of counts are irrelevant here.

Value

The exclusion power for each locus.

Author(s)

James M. Curran

References

NRC II, Evaluation of Forensic Evidence, (1996), p.96, National Academy Press.

Examples

data(USCaucs)
ep(USCaucs)
## get the multiplex wide exclusion power
1 - prod(1-ep(USCaucs))

Search a database for siblings or children

Description

This function searches a database of profiles for either a sibling or a child

Usage

famSearch(profiles, siblings, children, listFreqs, step)

Arguments

profiles

an integer vector of stacked profiles representing the database. This vector has 2NL entries, where N is the number of profiles and L is the number of loci.

siblings

an integer vector of stacked profiles representing the siblings of the profiles in the database. The first entry is a sibling of the first entry in profiles and so on. This vector has 2NL entries, where N is the number of profiles and L is the number of loci.

children

an integer vector of stacked profiles representing the children of the profiles in the database. The first entry is a child of the first entry in profiles and so on. This vector has 2NL entries, where N is the number of profiles and L is the number of loci.

listFreqs

a set of allele frequencies representing a particular multiplex. The function assumes that the loci in the profiles are in the same order as the loci in this list. The data structure is a List of NumericVectors.

step

A step size for progress reporting, i.e. print out progress every step iterations. If step = -1, then there is no printing.

Value

a List containing two dataframes, one called sibs and one called children. Each dataframe has results from searching for either the sibling or the child in the database. For each entry there is a record of which profile gave the highest LR (and its value), and the position of the actual sibling or parent/child in the database (and its respective LR).
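
A stacked vector of 2NL entries can be indexed by profile with simple arithmetic. The sketch below assumes that each profile occupies 2L consecutive entries, one profile after another; that layout is an assumption made for illustration, and get_profile is a hypothetical helper rather than a package function.

```r
# Extract profile i from a stacked vector of N profiles over L loci.
# Assumption: each profile occupies 2 * L consecutive entries.
get_profile <- function(stacked, i, L) {
  start <- (i - 1) * 2 * L + 1
  stacked[start:(start + 2 * L - 1)]
}

L <- 2
stacked <- c(1, 2, 3, 3,   # profile 1
             4, 5, 6, 7,   # profile 2
             1, 1, 2, 8)   # profile 3
length(stacked) == 2 * 3 * L
second <- get_profile(stacked, 2, L)
second
```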

Author(s)

James Curran


CODIS STR Loci allele frequency data

Description

This data structure

Format

This data set is a list which has two sub-lists. The lists are named loci and freqs. loci is a vector of the 13 CODIS STR locus names. freqs is a list of 13 vectors; each vector contains the allele frequencies published for US Caucasians in Budowle et al. (2001).

Author(s)

James M. Curran

References

Budowle B, Shea B, Niezgoda S, Chakraborty R. (2001), CODIS STR loci data from 41 sample populations, J. Forensic Sci. 46:453-89.

See Also

USCaucs

Examples

data(fbiCaucs)
names(fbiCaucs)
fbiCaucs$loci
names(fbiCaucs$freqs)
fbiCaucs$freqs[[1]]
names(fbiCaucs$freqs[[1]])
fbiCaucs$freqs[[1]][1]

Retrieve data from Budowle and Moretti (1999) from the web

Description

Retrieves the Budowle and Moretti (1999) data and compiles the allele frequency tables needed for the other parts of this package such as sim.

Usage

fetchBMdata(url = NULL, id = NULL)

Arguments

url

the location of the webpage this data is stored on. If NULL then hardcoded values in the function are used. The argument allows the user to get the function to work given that these values may change. However, it is unlikely that it will help.

id

the id of the HTML element where the data is stored on the webpage. If NULL then a hardcoded value in the function is used. The argument allows the user to get the function to work given that this value may change. However, it is unlikely that it will help. NOTE: this is super flaky because of how the FBI web authors have created it. This may change, in which case the function will more likely change than the id.

Details

The first three populations have data on 20 loci, the second three on 13 loci. The missing values (0's in the raw data) have been dropped and are not used in calculating the frequencies. This function will not work if you are not connected to the internet, or access to the internet is blocked.

Value

A list consisting of six elements corresponding to the six populations detailed in the data set. Each of the list elements is a list in itself with three further elements named loci, profiles and freqs. loci is a vector of the 13-20 STR locus names. freqs is a list of 13-20 vectors; each vector contains the allele frequencies. profiles contains the raw profiles that the allele frequency tables were constructed from.

Author(s)

James M. Curran

References

Budowle, B. and Moretti, T.R. (1999), Genotype Profiles for Six Population Groups at the 13 CODIS Short Tandem Repeat Core Loci and Other PCR Based Loci, Forensic Science Communications 1(2).

See Also

fbiCaucs, USCaucs

Examples

## not run
## Not run:
db = fetchBMdata()
names(db)
f = db[["TRINIDADIAN"]]$freqs
dbExpect(f, k = "UN", collapse = TRUE)
## End(Not run)

Identity by state at a locus

Description

Calculates the number of alleles that are shared by two profiles at a single locus. If the two profiles in question are indeed relatives then the matching alleles may be identical by descent, or by random chance alone, hence identity by state.

Usage

locusIBS(profMat)

Arguments

profMat

A matrix consisting of 4 columns and N rows. Each row in the matrix consists of the genotypes of two individuals.

Value

A vector of length N containing values 0, 1, or 2 depending on how many alleles each pair of profiles share at a locus.

Author(s)

James M. Curran

Examples

data(fbiCaucs)
G = randomSample(1, fbiCaucs, rel = 'FS', N = 1000)
ibs = locusIBS(G)
barplot(tabulate(ibs+1, nbins = 3))

Calculate locuswise likelihood ratios for two person victim/suspect mixtures

Description

Calculates the likelihood ratio for pairs of profiles under the propositions H_p: V+S and H_d: V+U, where V, S and U are the victim, the suspect and someone unrelated to the suspect respectively. The calculation does not employ \theta so there are no assumptions about the subpopulations of the contributors.

Usage

lrMix(profiles, Freqs)

Arguments

profiles

A vector of profile lists, from randomProfilePairs. randomPCPairs and randomSibPairs also work but should not really be used as the calculations do not take account of the relationship between the two individuals.

Freqs

A list containing elements freqs, loci and counts. The element freqs is a list of vectors of allele frequencies at the loci listed in loci. These frequencies are used to evaluate the LR.

Value

A matrix of LRs calculated at each locus for every pair of profiles. Note this is the set of N profile pairs supplied in profiles, not a pairwise comparison.

Author(s)

James M. Curran

Examples

data(USCaucs)
p = randomProfilePairs(USCaucs, 10000)
log.lrs = log10(lrMix(p, USCaucs))
boxplot(log.lrs, las = 2)

Likelihood Ratio for Parent-Child / Paternity Index

Description

Calculates the likelihood ratio comparing the probability of two profiles if they are indeed parent-child compared to unrelated. This is the paternity index or PI.

Usage

lrPC(parent, child, Freqs = NULL, nLoci = length(parent)/2, f = NULL, n = NULL)

Arguments

parent

A matrix consisting of 2 columns and nLoci rows. Each entry in the matrix is the (coded) allele held by the individual. This represents the alleged parent. The relationship is reflexive so it does not matter which profile is labelled parent and child.

child

See parent

Freqs

A list containing two lists labelled loci and freqs. The second list is a list of vectors containing the allele frequencies of each allele at each locus in the multiplex. This argument or both f and n must be specified

nLoci

The number of loci in the profiles

f

A concatenated vector of allele frequencies. Specifying this speeds up computation enormously

n

A vector of length nLoci giving the number of alleles at each locus. Specifying this in advance enormously speeds up computation

Value

A value between 0 and infinity representing support (or lack of support if the value is less than 1) for the hypothesis that the two profiles are parent and child. There is no mutation built into this calculation. This means that the LR will be zero if the profiles do not share at least one allele in common at each locus in the multiplex.

Author(s)

James M. Curran

References

Buckleton, J, Triggs, C.M., and Walsh, S.J. (2005) Forensic DNA Evidence Interpretation, CRC Press, Boca Raton, FL. p.410

See Also

lrSib, IBS

Examples

data(fbiCaucs)
P1 = randomProfile(fbiCaucs)
C1 = randomChild(P1, fbiCaucs)
lrPC(P1, C1, fbiCaucs)
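
For a single locus, the ratio described above can be worked out from first principles. The base-R sketch below is an illustration under stated assumptions, not the package's implementation: pi_locus is a hypothetical helper, alleles are coded as integer indices into the frequency vector p, there is no mutation, and genotype probabilities for unrelated people follow Hardy-Weinberg proportions.

```r
# Hypothetical single-locus paternity index: P(child | parent) / P(child).
# parent, child: length-2 vectors of coded alleles; p: allele frequencies.
pi_locus <- function(parent, child, p) {
  # probability of the child's genotype given that allele t was transmitted
  given <- function(t) {
    if (child[1] == child[2]) {
      (t == child[1]) * p[child[1]]
    } else {
      (t == child[1]) * p[child[2]] + (t == child[2]) * p[child[1]]
    }
  }
  # each parental allele is transmitted with probability one half
  num <- 0.5 * (given(parent[1]) + given(parent[2]))
  # probability of the child's genotype for an unrelated person
  den <- if (child[1] == child[2]) p[child[1]]^2 else 2 * p[child[1]] * p[child[2]]
  num / den
}

p <- c(0.1, 0.2, 0.7)
pi_locus(c(1, 1), c(1, 1), p)   # both homozygous for allele 1
pi_locus(c(1, 2), c(1, 3), p)   # one shared allele
pi_locus(c(1, 2), c(3, 3), p)   # no shared allele
```

Under these assumptions the multi-locus value would be the product of the per-locus values, so a single locus with no shared allele makes the whole LR zero, as noted under Value.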

Likelihood Ratio / Kinship Index for full-siblings

Description

Calculates the likelihood ratio comparing the probability of two profiles if they are indeed full-sibs compared to unrelated. This is sometimes called the kinship index (KI) for full-sibs.

Usage

lrSib(sib1, sib2, Freqs = NULL, nLoci = length(sib1)/2, f = NULL, n = NULL)

Arguments

sib1

A matrix consisting of 2 columns and nLoci rows. Each entry in the matrix is the (coded) allele held by the individual. This represents the alleged sibling. The relationship is reflexive so it does not matter which profile is labelled sib1 and sib2.

sib2

See sib1

Freqs

A list containing two lists labelled loci and freqs. The second list is a list of vectors containing the allele frequencies of each allele at each locus in the multiplex. This argument or both f and n must be specified

nLoci

The number of loci in the profiles

f

A concatenated vector of allele frequencies. Specifying this speeds up computation enormously

n

A vector of length nLoci giving the number of alleles at each locus. Specifying this in advance enormously speeds up computation

Value

A value between 0 and infinity representing support (or lack of support if the value is less than 1) for the hypothesis that the two profiles are full-siblings. There is no mutation built into this calculation.

Author(s)

James M. Curran

References

Buckleton, J, Triggs, C.M., and Walsh, S.J. (2005) Forensic DNA Evidence Interpretation, CRC Press, Boca Raton, FL. p.411

See Also

lrSibDebug, lrPC, IBS

Examples

data(fbiCaucs)
P1 = randomProfile(fbiCaucs)
S1 = randomSib(P1, fbiCaucs)
P2 = randomProfile(fbiCaucs)
lrSib(P1, S1, fbiCaucs)
lrSib(P1, P2, fbiCaucs)

Likelihood Ratio / Kinship Index for full-siblings

Description

Calculates the likelihood ratio comparing the probability of two profiles if they are indeed full-sibs compared to unrelated. This is sometimes called the kinship index (KI) for full-sibs. This function is identical to lrSib except that the calculation is performed in R, and provides full calculation detail at each locus. It exists primarily to check that the correct formula and logic are being applied in the LR calculation so that the result can be manually verified.

Usage

lrSibDebug(sib1, sib2, Freqs)

Arguments

sib1

A matrix consisting of 2 columns and nLoci rows. Each entry in the matrix is the (coded) allele held by the individual. This represents the alleged sibling. The relationship is reflexive so it does not matter which profile is labelled sib1 and sib2.

sib2

See sib1

Freqs

A list containing two lists labelled loci and freqs. The second list is a list of vectors containing the allele frequencies of each allele at each locus in the multiplex.

Value

A list containing three elements Lines, lr, and Cases. Lines is a list of strings containing the calculation at each locus so that the result can be written to file for example. Cases is a numeric code listing which logical case (1-11) the locus falls into for the profiles in question. lr is the KI for full-sibs for the two profiles.

Author(s)

James M. Curran

References

Buckleton, J, Triggs, C.M., and Walsh, S.J. (2005) Forensic DNA Evidence Interpretation, CRC Press, Boca Raton, FL. p.411

See Also

lrSib, lrPC, IBS

Examples

data(fbiCaucs)
P1 = randomProfile(fbiCaucs)
S1 = randomSib(P1, fbiCaucs)
P2 = randomProfile(fbiCaucs)
cat(paste(lrSibDebug(P1, S1, fbiCaucs)$Lines))
cat(paste(lrSibDebug(P1, P2, fbiCaucs)$Lines))

Normalize frequencies to 1

Description

Normalizes a list of frequencies at a series of genetic loci so that the frequencies at each locus sum to one. Note that this does not deal with the problem of values larger than one or smaller than zero.

Usage

normalizeFreqs(Freqs)

Arguments

Freqs

A list containing elements loci and freqs. freqs is a list of vectors containing the frequencies at the given loci.

Details

Divides each vector in Freqs$freqs by its sum.

Value

A list containing elements loci and freqs. freqs is a list of vectors containing the frequencies at the given loci.

Author(s)

James M. Curran

See Also

checkFreqs

Examples

data(fbiCaucs)
## induce an error
fbiCaucs$freqs[[1]] = rgamma(10,1,1)
checkFreqs(fbiCaucs)
fbiCaucs = normalizeFreqs(fbiCaucs)
checkFreqs(fbiCaucs)
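
The operation described under Details amounts to one division per locus. A minimal base-R sketch follows; normalize_freqs is a hypothetical stand-in that works on a bare list of frequency vectors rather than the full Freqs structure.

```r
# Hypothetical stand-in: divide each frequency vector by its own sum.
normalize_freqs <- function(freqs) {
  lapply(freqs, function(p) p / sum(p))
}

freqs <- list(locusA = c(2, 3, 5), locusB = c(0.6, 0.6))
norm <- normalize_freqs(freqs)
norm$locusA
sapply(norm, sum)   # every locus now sums to one
```

As the Description notes, rescaling does not repair negative entries: a vector such as c(-1, 2) still sums to one after division but is not a valid set of frequencies.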

Print summary details of a substructured population

Description

Nicely prints summary information about a substructured population created using breedFst

Usage

## S3 method for class 'population'
print(x, ...)

Arguments

x

The population object to be printed

...

Ignored - really should be passed to print, but given cat is actually called they are ignored

Author(s)

James M. Curran

See Also

breedFst

Examples

data(fbiCaucs)
p = breedFst(fbiCaucs)
print(p)

Print a DNA profile

Description

Nicely prints a profile object out in genotype pairs

Usage

## S3 method for class 'profile'
print(x, horizontal = FALSE, ...)

Arguments

x

The profile object to be printed

horizontal

if TRUE then the profile will print on a single line instead of multiple lines. Useful for comparing two profiles

...

Ignored - really should be passed to print, but given cat is actually called they are ignored

Author(s)

James M. Curran

Examples

data(fbiCaucs)
P1 = randomProfile(fbiCaucs)
P2 = randomProfile(fbiCaucs)
P1
print(P1, horizontal = TRUE)
print(P2, horizontal = TRUE)

Generate a random child from a given DNA profile and a given set of allele frequencies

Description

Generates a random child (or parent) from a given DNA profile from a given set of allele frequencies. At each locus, the child inherits the first allele of the given profile with probability one half, or the second allele with probability one half. The second allele is chosen at random with probability proportional to the allele frequencies.

Usage

randomChild(profile, Freqs)

Arguments

profile

A vector of length 2*nLoci. Each entry in the vector is the (coded) allele held by the individual. This represents the parent. The relationship is reflexive so it does not matter if the profile is a parent or a child.

Freqs

A list containing two lists labelled loci and freqs. The second list is a list of vectors containing the allele frequencies of each allele at each locus in the multiplex.

Details

The alleles are simply integers rather than the STR repeat numbers. This speeds up computation immensely when calculating any of the LRs or IBS.

Value

A vector with 2*nLoci elements. Each pair of elements represents the genotype of the random individual at that locus. The genotype alleles are always ordered so that allele1 <= allele2.

Author(s)

James M. Curran

See Also

randomChild, randomSample, randomSib

Examples

data(fbiCaucs)
P1 = randomProfile(fbiCaucs)
C1 = randomChild(P1,fbiCaucs)
P1
C1
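
The sampling scheme in the Description can be sketched in base R. This is an illustration only: random_child is a hypothetical helper, it takes a bare list of frequency vectors rather than the full Freqs structure, and alleles are coded as integer indices into those vectors.

```r
# Hypothetical sketch of the sampling scheme: at each locus one parental
# allele is passed on with probability one half, and the other allele is
# drawn in proportion to the allele frequencies.
random_child <- function(profile, freqs) {
  n_loci <- length(freqs)
  child <- integer(2 * n_loci)
  for (l in seq_len(n_loci)) {
    idx <- c(2 * l - 1, 2 * l)                    # positions of locus l
    inherited <- profile[idx[sample.int(2, 1)]]   # one parental allele
    other <- sample.int(length(freqs[[l]]), 1, prob = freqs[[l]])
    child[idx] <- sort(c(inherited, other))       # allele1 <= allele2
  }
  child
}

set.seed(42)
freqs  <- list(c(0.5, 0.3, 0.2), c(0.1, 0.2, 0.3, 0.4))
parent <- c(1L, 2L, 3L, 3L)
child  <- random_child(parent, freqs)
child
```

By construction the child shares at least one allele with the parent at every locus.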

Generate one or more random parent/child pairs from a given set of allele frequencies

Description

Generates one or more random parent/child pairs from a given set of allele frequencies.

Usage

randomPCPairs(Freqs, BlockSize = 1)

Arguments

Freqs

A list containing two lists labelled loci and freqs. The second list is a list of vectors containing the allele frequencies of each allele at each locus in the multiplex.

BlockSize

The number of pairs of profiles to generate

Details

The alleles are simply integers rather than the STR repeat numbers. This speeds up computation immensely when calculating any of the LRs or IBS.

Value

A list of length BlockSize. Each element of the list has a sublist containing two profiles called parent and child

Author(s)

James M. Curran

See Also

randomSibPairs, randomProfilePairs

Examples

data(fbiCaucs)
P = randomPCPairs(fbiCaucs)
P$parent
P$child

Generate a random DNA profile from a given set of allele frequencies

Description

Generates a random DNA profile from a given set of allele frequencies.

Usage

randomProfile(Freqs)

Arguments

Freqs

A list containing two lists labelled loci and freqs. The second list is a list of vectors containing the allele frequencies of each allele at each locus in the multiplex.

Details

The alleles are simply integers rather than the STR repeat numbers. This speeds up computation immensely when calculating any of the LRs or IBS.

Value

A vector with 2*nLoci elements. Each pair of elements represents the genotype of the random individual at that locus. The genotype alleles are always ordered so that allele1 <= allele2.

Author(s)

James M. Curran

See Also

randomChild, randomSample, randomSib

Examples

data(fbiCaucs)
P1 = randomProfile(fbiCaucs)
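
A profile in the integer coding described under Details can be simulated in a few lines of base R. The sketch assumes that the two alleles at a locus are drawn independently in proportion to the allele frequencies, which is an assumption made for illustration; random_profile is a hypothetical helper that takes a bare list of frequency vectors.

```r
# Hypothetical sketch: draw two alleles per locus in proportion to the
# allele frequencies and store them in ascending order.
random_profile <- function(freqs) {
  unlist(lapply(freqs, function(p) {
    sort(sample.int(length(p), 2, replace = TRUE, prob = p))
  }), use.names = FALSE)
}

set.seed(1)
freqs <- list(c(0.5, 0.3, 0.2), c(0.1, 0.2, 0.3, 0.4))
prof <- random_profile(freqs)
prof   # alleles are indices into each frequency vector, not repeat numbers
```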

Generate one or more random DNA profile pairs from a given set of allele frequencies

Description

Generates one or more random DNA profile pairs from a given set of allele frequencies.

Usage

randomProfilePairs(Freqs, BlockSize = 1)

Arguments

Freqs

A list containing two lists labelled loci and freqs. The secondlist is a list of vectors containing the allele frequencies of each alleleat each locus in the multiplex.

BlockSize

The number of pairs of profiles to generate

Details

The alleles are simply integers rather than the STR repeat numbers. This speeds up computation immensely when calculating any of the LRs or IBS.

Value

A list of length BlockSize. Each element of the list has a sublist containing two profiles called prof1 and prof2.

Author(s)

James M. Curran

See Also

randomPCPairs, randomSibPairs

Examples

data(fbiCaucs)
P = randomProfilePairs(fbiCaucs)
P$prof1
P$prof2

Generate a random sample of related (or unrelated) pairs of people

Description

Generate a random sample of unrelated, full-sib, or parent/child pairs of profiles at a single locus.

Usage

randomSample(nLoc, Freqs, rel = "UN", N = 10000)

Arguments

nLoc

The locus number to sample from

Freqs

A list containing elements loci and freqs. freqs is a list of vectors containing the frequencies at the given loci.

rel

One of 'UN', 'FS', or 'PC' for unrelated, full-sib, or parent/child pairs respectively.

N

The sample size

Value

An N by 4 matrix of random profiles. The first two columns represent the genotype of person one and the second two columns represent the genotype of person two. Note that the random profiles do not use the original allele designations.

Author(s)

James M. Curran

See Also

randomProfile, randomSib, randomChild

Examples

data(fbiCaucs)
G = randomSample(1, fbiCaucs, "FS", 100)

Generate a random sibling from a given DNA profile and a given set of allele frequencies

Description

Generates a random sibling from a given DNA profile and a given set of allele frequencies. At each locus, the sibling inherits the first allele of the given profile with probability one quarter, the second allele with probability one quarter, both alleles with probability one quarter, or neither with probability one quarter. If the sibling inherits zero or one identical alleles, the missing alleles are chosen at random with probability proportional to the allele frequencies.

Usage

randomSib(profile, Freqs)

Arguments

profile

A vector consisting of 2*nLoci elements. Each element in the vector is the (coded) allele held by the individual. This represents the sibling.

Freqs

A list containing two lists labelled loci and freqs. The second list is a list of vectors containing the allele frequencies of each allele at each locus in the multiplex.

Details

The alleles are simply integers rather than the STR repeat numbers. This speeds up computation immensely when calculating any of the LRs or IBS.

Value

A vector with 2*nLoci elements. Each pair of elements represents the genotype of the random individual at that locus. The genotype alleles are always ordered so that allele1 <= allele2.

Author(s)

James M. Curran

See Also

randomChild, randomSample

Examples

data(fbiCaucs)
P1 = randomProfile(fbiCaucs)
S1 = randomSib(P1, fbiCaucs)
P1
S1
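The four equally likely inheritance cases in the Description of randomSib can be sketched for a single locus in base R. This is an illustration only, not the package's code: sketchSibLocus is a hypothetical name, the frequency vector f is made up, and alleles are coded as integer indices into f.

```r
# Hypothetical single-locus sketch (not part of relSim)
sketchSibLocus <- function(genotype, f) {
  # Draw n allele indices with probability proportional to f
  randomAllele <- function(n) {
    sample(seq_along(f), size = n, replace = TRUE, prob = f)
  }
  # Pick one of the four cases, each with probability 1/4
  case <- sample(4, size = 1)
  sib <- switch(case,
    c(genotype[1], randomAllele(1)),  # first allele shared, other drawn at random
    c(genotype[2], randomAllele(1)),  # second allele shared, other drawn at random
    genotype,                         # both alleles shared
    randomAllele(2)                   # neither shared, both drawn at random
  )
  # Keep the ordering convention allele1 <= allele2
  sort(sib)
}

set.seed(42)
f <- c(0.1, 0.2, 0.3, 0.4)   # made-up frequencies for four alleles
given <- c(2, 4)             # genotype of the given individual
sibGenotype <- sketchSibLocus(given, f)
```

Note that an allele drawn at random can coincide with one of the given alleles by chance, which is why identity by state can exceed identity by descent.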

Generate one or more pairs of random siblings from a given set of allele frequencies

Description

Generates one or more pairs of random siblings from a given set of allele frequencies.

Usage

randomSibPairs(Freqs, BlockSize = 1)

Arguments

Freqs

A list containing two lists labelled loci and freqs. The second list is a list of vectors containing the allele frequencies of each allele at each locus in the multiplex.

BlockSize

The number of pairs of profiles to generate

Details

The alleles are simply integers rather than the STR repeat numbers. This speeds up computation immensely when calculating any of the LRs or IBS.

Value

A list of length BlockSize. Each element of the list has a sublist containing two profiles called sib1 and sib2.

Author(s)

James M. Curran

See Also

randomPCPairs, randomProfilePairs

Examples

data(fbiCaucs)
P = randomSibPairs(fbiCaucs)
P$sib1
P$sib2

Read in a file of allele frequencies

Description

Reads in a file of alleles in a particular format.

Usage

readFreqs(strPath, FSIGenFormat = TRUE, delim = ",")

Arguments

strPath

The file from which to read the frequencies

FSIGenFormat

Tells the function whether the file is in FSI Genetics format (see below) or in 'Curran' format.

delim

This argument is used when FSIGenFormat is TRUE, and is the regular expression used to delimit columns of the table. It is set to a single comma by default, and consecutive delimiters are treated as separate empty fields. There probably should be an additional argument which specifies the missing or empty cell symbol, but I won't programme this unless somebody asks for it.

Details

This function reads frequencies in the rectangular allele frequency table format used by FSI Genetics and other journals. This file format assumes a comma separated value file (CSV), although the column delimiter can be specified. The first column should be labelled 'Allele' and contain the STR allele designations that are used in the data set. The remaining columns will have the locus name as a header, and frequencies that are either blank, zero, or non-zero. Blanks or zeros are used to specify that the allele is not observed (and not used) at the locus. The final row of the file should start with 'N' or 'n' in the first column and give the number of individuals typed (or the number of alleles recorded) in assessing the frequency of the alleles.

The second format is a very particular 'Curran' text format. The first line contains the number of loci in the multiplex. The next line will contain the name of the first locus and the number of alleles, nA, at the locus, separated by a comma. The next nA lines contain the allele number (from 1 to nA), the STR designation of the allele, and the frequency, separated by commas. This pattern is repeated for each locus. In the future this function will read the rectangular allele frequency table used by FSI Genetics and other journals.
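To make the 'Curran' layout concrete, here is a small base R sketch that parses it from a character vector of lines. It is not the code used by readFreqs: sketchReadCurran is a hypothetical name, the toy input is made up, and the sketch assumes a well-formed file with no blank lines.

```r
# Hypothetical parser for the 'Curran' layout described above
# (not relSim's readFreqs)
sketchReadCurran <- function(lines) {
  nLoci <- as.integer(lines[1])        # line 1: number of loci
  loci <- character(nLoci)
  freqs <- vector("list", nLoci)
  pos <- 2                             # index of the current locus header line
  for (i in seq_len(nLoci)) {
    header <- strsplit(lines[pos], ",")[[1]]
    loci[i] <- trimws(header[1])       # locus name
    nA <- as.integer(header[2])        # number of alleles at this locus
    # The next nA lines hold: allele number, STR designation, frequency
    rows <- strsplit(lines[pos + seq_len(nA)], ",")
    f <- as.numeric(vapply(rows, `[`, character(1), 3))
    names(f) <- trimws(vapply(rows, `[`, character(1), 2))
    freqs[[i]] <- f
    pos <- pos + nA + 1                # move to the next locus header
  }
  names(freqs) <- loci
  list(loci = loci, freqs = freqs)
}

# Made-up two-locus example
toy <- c("2",
         "D3,2",
         "1,14,0.4",
         "2,15,0.6",
         "TH01,3",
         "1,6,0.2",
         "2,7,0.3",
         "3,9.3,0.5")
res <- sketchReadCurran(toy)
```

As in the Value section, the names of each frequency vector are the STR allele designations, so res$freqs$TH01 is indexed by "6", "7", and "9.3".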

Value

A list containing two vectors and a list: loci, counts, and freqs. The vector loci is a vector of the locus names in the frequency file. The vector counts is a vector of the number of individuals (or sometimes alleles) typed at each locus. This will be NULL if the 'Curran' format is used. The list freqs is a list of vectors, with each vector containing the frequencies of the alleles at the locus. The names of the elements of the vectors are the STR allele designations.

Author(s)

James M. Curran


Read a set of profiles from a file

Description

Reads a set of profiles from a file

Usage

readProfiles(
  fileName,
  freqs = NULL,
  sep = "\t",
  header = FALSE,
  id = 1,
  discardMissing = TRUE
)

Arguments

fileName

a path to the profile file.

freqs

A list containing two lists labelled loci and freqs. The second list is a list of vectors containing the allele frequencies of each allele at each locus in the multiplex. If this is left NULL, then it is calculated from the profile file.

sep

a character that delimits the fields in the profile file.

header

a boolean which is TRUE if the profile file has a column header line.

id

a column number indicating the column in which the profile IDs are stored. If id == -1, then there is no ID information.

discardMissing

if TRUE, then all profiles which have alleles that cannot be matched to the frequency file are omitted and returned in the return list.

Details

The alleles are recorded as integers rather than the STR repeat numbers. This speeds up computation immensely when calculating any of the LRs or IBS.

Value

a list containing a data.frame of profiles where the alleles have been recoded to the allele index number, rather than the allele itself, and a set of frequencies in the same format as you would get from readFreqs. If freqs have been supplied, then this will just be the same set of frequencies; if they have not, then this will be calculated from the profiles. Given that the profiles generally do not have any locus name information, the loci will just be labelled Locus1, Locus2, .... If there are missing values then the raw missing profiles are returned.

Author(s)

James M. Curran


Read a simulation result set from file

Description

This function will read the output from sim that has been saved to disk.

Usage

readResults(
  N = 0,
  rel = "UN",
  gzip = TRUE,
  strPath = "",
  strVer = "",
  fileName = NULL
)

Arguments

N

The number of iterations in the simulation

rel

'UN' = unrelated, 'FS' = full-sib, 'PC' = parent-child

gzip

If TRUE then it is assumed that the file is compressed

strPath

Optional location of files. Must terminate with / otherwise it will not work

strVer

A version string, useful if more than one simulation has been run

fileName

This argument allows the user to override the default file naming conventions of the result file

Details

The arguments to this function are used to generate the input file name. The format is very rigid, being 'results-sim-rel-N(-strVer).csv(.gz)'. That is, if strVer is something other than an empty string then it is included after the number of iterations. Similarly, if gzip == TRUE then the file name is assumed to end with '.gz'.
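The naming rule can be written out as a short base R function. This is a sketch of the convention as described here, not the package's internal code: sketchResultsFileName is a hypothetical name, and prefixing strPath directly onto the file name is an assumption based on the requirement that strPath end with /.

```r
# Hypothetical reconstruction of the documented naming convention
# 'results-sim-rel-N(-strVer).csv(.gz)'; not relSim's own code
sketchResultsFileName <- function(N, rel = "UN", gzip = TRUE,
                                  strPath = "", strVer = "") {
  name <- sprintf("%sresults-sim-%s-%d", strPath, rel, as.integer(N))
  # The version string is appended only when it is non-empty
  if (nchar(strVer) > 0) {
    name <- paste0(name, "-", strVer)
  }
  name <- paste0(name, ".csv")
  # Compressed files carry an extra .gz extension
  if (gzip) {
    name <- paste0(name, ".gz")
  }
  name
}

f1 <- sketchResultsFileName(100)
f2 <- sketchResultsFileName(100, rel = "FS", strVer = "01", gzip = FALSE)
```

Under these assumptions f1 is "results-sim-UN-100.csv.gz", the name quoted in the Examples for this function, and f2 is "results-sim-FS-100-01.csv".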

Value

a data frame with three columns labelled sib, pc, and ibs. These represent the LRs for sibs and parent-child calculated on each simulated profile pair, and the number of matching alleles (IBS).

Author(s)

James M. Curran

See Also

sim

Examples

data(fbiCaucs)
## not run
## write the results of 100 unrelated profile pairs to
## results-sim-UN-100.csv.gz
## and read it back in
## Not run:
sim(100, fbiCaucs, save = TRUE)
unrel = readResults(100)
sim(100, fbiCaucs, rel = "FS", strVer = "01", save = TRUE)
sibs = readResults(100, rel = "FS", strVer = "01")
## End(Not run)

Perform the relatives simulation

Description

Generate N pairs with a given relationship and calculate the LR for sibs, parent-child and the number of matching alleles.

Usage

sim(
  N,
  Freqs,
  rel = "UN",
  save = FALSE,
  strPath = "",
  strVer = "",
  BlockSize = N/100,
  fileName = NULL
)

Arguments

N

The number of iterations to carry out

Freqs

A list containing two lists labelled loci and freqs. The second list is a list of vectors containing the allele frequencies of each allele at each locus in the multiplex.

rel

generate unrelated (rel = 'UN'), full-sib (rel = 'FS'), or parent-child (rel = 'PC') pairs

save

Write the results to disk if TRUE

strPath

Optional prefix to add to the results file path so that the output location can be specified

strVer

Optional suffix for the results file. This is useful when running multiple instances of R

BlockSize

Sets the number of random profiles to be generated in each iteration. By default the block size is set to 1 percent of the total sample size. It is unclear whether the procedure is more efficient if a bigger percentage of the total is used. Users must take care to make sure that the block size evenly divides N, otherwise the procedure will exit

fileName

This argument lets the user override the default result filenaming scheme

Details

This is the function that generates all the data for the results in the paper. WARNING: this function is not especially fast. To achieve the 100 million iterations used in the paper, 30 instances of R were launched on a multicore server. Each instance represented one relationship with 10 million iterations. The compute time for this arrangement was approximately 1 hour, meaning a full serial run would have taken over 30 hours to achieve the same result.

Value

a data frame with three columns: sib, pc, ibs containing the LRs for full-siblings, parent-child, and the number of matching alleles for each generated pair of profiles.

Author(s)

James M. Curran

See Also

readResults, errorRate

Examples

## not run
## this replicates Ge et al.'s experiment and takes about 45 minutes
## to run (I think)
## Not run:
data(fbiCaucs)
N = 1000000
sim(N, fbiCaucs, save = TRUE)
sim(N, fbiCaucs, 'FS', save = TRUE)
sim(N, fbiCaucs, 'PC', save = TRUE)
## End(Not run)
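Because sim requires BlockSize to divide N evenly, it can be worth checking a planned run before launching it. The following base R sketch shows one way to do that; sketchBlockCount is a hypothetical helper, and rejecting block sizes below 1 is an extra assumption of this sketch rather than documented behaviour of sim.

```r
# Hypothetical pre-flight check (not part of relSim): returns the number
# of blocks a run would need, or stops if BlockSize does not divide N
sketchBlockCount <- function(N, BlockSize = N / 100) {
  if (BlockSize < 1 || N %% BlockSize != 0) {
    stop("BlockSize must be at least 1 and must divide N evenly")
  }
  N / BlockSize
}

# With the default, the block size is 1 percent of N
nBlocks <- sketchBlockCount(1e6)
```

With the default block size the run is always split into 100 blocks, provided N is a multiple of 100.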

Simulate and count unique alleles in N person mixtures

Description

This function simulates N-person mixtures using the supplied frequencies and records the number of times they share 1, 2, ..., 2N alleles locus by locus.

Usage

simNpersonMixture(freqs, numContributors, numIterations = 10000)

Arguments

freqs

a set of allele frequencies. The format can be found in readFreqs

numContributors

the number of contributors to each mixture. Must be >= 2.

numIterations

the number of N person mixtures to simulate in total.

Value

an object of class npmresult
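The quantity being tabulated can be illustrated for a single locus in base R. This sketch is not the package's implementation and does not produce an npmresult object: sketchMixtureCounts is a hypothetical name, the frequencies are made up, and each contributor's two alleles are assumed to be drawn independently.

```r
# Hypothetical single-locus sketch (not part of relSim): tabulate how
# often a mixture of numContributors people shows 1, 2, ..., 2N
# distinct alleles
sketchMixtureCounts <- function(f, numContributors, numIterations = 10000) {
  nAlleles <- 2 * numContributors
  counts <- integer(nAlleles)
  for (i in seq_len(numIterations)) {
    # Draw all 2N alleles in the mixture from the allele frequencies
    alleles <- sample(seq_along(f), size = nAlleles, replace = TRUE, prob = f)
    # Count the distinct alleles that would be seen in the mixture
    k <- length(unique(alleles))
    counts[k] <- counts[k] + 1L
  }
  names(counts) <- seq_len(nAlleles)
  counts
}

set.seed(7)
# Made-up locus with three alleles, two contributors
counts <- sketchMixtureCounts(c(0.5, 0.3, 0.2), numContributors = 2,
                              numIterations = 500)
```

With only three alleles at the locus, a two-person mixture can never show four distinct alleles, so the last entry of counts is zero.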


Export a population with substructure to a Nexus file

Description

Exports a population with population substructure to a Nexus formatted file so that GDA can be used to check the Fst calculations.

Usage

toNexus(Pop, fileName = "output.nex")

Arguments

Pop

An object of type 'population' - see breedFst for a description of the object

fileName

The name of the output file

Author(s)

James M. Curran

References

Maddison DR, Swofford DL, Maddison WP (1997), NEXUS: An extensible file format for systematic information, Systematic Biology 46(4): 590–621.

Zaykin, D. and Lewis, P., GDA - software to accompany Genetic Data Analysis II, <http://phylogeny.uconn.edu/software/>.

See Also

breedFst

Examples

## Don't run
## Not run:
data(USCaucs)
p = breedFst(USCaucs)
toNexus(p)
## End(Not run)

Saves/writes population frequencies to disk

Description

Writes a population of profiles to disk using the original allele designations rather than the internal integer representations that are used for the other functions.

Usage

writeCSV(fileName, pop, n = 100, delim = ",")

Arguments

fileName

the name and path where the population profiles are to be saved to.

pop

a list containing elements loci and freqs. loci is a vector with the loci in the data set. freqs is a list of vectors with elements named after the elements in loci. Each locus in freqs is a vector of allele frequencies with the allele names given by the named elements.

n

the number of people in the database. This is arbitrarily set to 100 by default.

delim

The allele delimiter.

Value

a matrix which contains the table written to file.

Note

Rare alleles are recoded to 108.1. This is unlikely to do the right thing when you have things like <5 or >20 in your allele names. Given it is impossible to predict what a user would like to do, I suggest you recode them yourself before using this function.

Author(s)

James M. Curran

See Also

breedFst, USCaucs

Examples

data(USCaucs)
## Not run:
writeCSV("USCaucs.csv", USCaucs)
## End(Not run)

Saves/writes population profiles to disk

Description

Writes a population of profiles to disk using the original allele designations rather than the internal integer representations that are used for the other functions.

Usage

writePop(fileName, pop, addAmelo = FALSE, delim = ",", dupLoci = TRUE)

Arguments

fileName

the name and path where the population profiles are to be saved to.

pop

an object of class population, most likely produced by breedFst

addAmelo

The simulated populations do not have Amelogenin. If TRUE then an Amelogenin marker is added to the population, and all the profiles are set to male XY, although this is coded to 1,2 to keep the allele designations numeric.

delim

The allele delimiter.

dupLoci

If TRUE the locus names are written twice in the header, otherwise just once.

Note

Rare alleles are recoded to 108.1.

Author(s)

James M. Curran

See Also

breedFst

Examples

data(USCaucs)
pop = breedFst(USCaucs)
## Not run:
writePop("USCaucs.csv", pop)
## End(Not run)
