bnpsd\[\newcommand{\Fst}{F_{\text{ST}}}\newcommand{\ft}[1][j]{f_{#1}^T}\]
Thebnpsd package simulates the genotypes of an admixed population. In the Pritchard-Stephens-Donnelly (PSD) model(Pritchard, Stephens, and Donnelly 2000), admixed individuals draw their alleles with individual-specific probabilities (admixture proportions) from\(K\) intermediate subpopulations. We impose the Balding-Nichols (BN) model(Balding and Nichols 1995) to the intermediate subpopulation allele frequency, which thus evolve independently with subpopulation-specific inbreeding coefficients (\(\Fst\) values) from a common ancestral population\(T\). The kinship coefficients and generalized\(\Fst\) of the admixed individuals were derived in recent work(Ochoa and Storey 2016). A simulated admixed population generated using this package was used to benchmark kinship and\(\Fst\) estimators in the accompanying paper(Ochoa and Storey 2021). Here we briefly summarize the notation and intuition behind the key parameters (see(Ochoa and Storey 2016) for precise definitions).
There are conceptually two parts of the BN-PSD model, one that defines relatedness between individuals (under the admixture framework), while the other simulates allele frequencies and genotypes conditional on the specified parameters of relatedness/admixture.
The population structure determines how individuals are related to each other. The key parameters are the coancestry coefficients of the intermediate subpopulations and the admixture proportions of each individual for each subpopulation (\(q_{ju}\)), which are treated as fixed variables.
Thebnpsd functions that model population structure (not drawing allele frequencies or genotypes) admit arbitrary coancestry matrices for the intermediate subpopulations. However, the code that draws allele frequencies and genotypes requires these subpopulations to beindependent, which means that the coancestry matrix must have zero values off-diagonal. In this setting—the only case we’ll considered in this vignette—the diagonal values of the coancestry matrix of the intermediate subpopulations are inbreeding coefficients (\(\ft[S_u]\) below), which correspond to subpopulation-specific\(\Fst\) values.
Each intermediate subpopulation\(S_u\) (\(u \in \{1, ..., K\}\)) evolved independently from a shared ancestral population\(T\) with an inbreeding coefficient denoted by\(\ft[S_u]\). Note that\(T\) is a superscript, and does not stand for exponentiation or matrix transposition. Each admixed individual\(j \in \{1, ..., n\}\) draws each allele from\(S_u\) with probability given by the admixture proportion\(q_{ju}\) (\(\sum_{u=1}^K q_{ju} = 1 \forall j\)). In this case the coancestry coefficients\(\theta^T_{jk}\) between individuals\(j,k\) (including\(j=k\) case) and the\(\Fst\) of the admixed individuals are given by:\[\theta^T_{jk} = \sum_{u=1}^K q_{ju} q_{ku} \ft[S_u],\quad \quad\Fst = \sum_{j=1}^n \sum_{u=1}^K w_j q_{ju}^2 \ft[S_u],\] where\(0 < w_j < 1, \sum_{j=1}^n w_j = 1\) are user-defined weights for individuals (default\(w_j=\frac{1}{n} \forall j\)). Note\(\theta^T_{jk}\) equals the kinship coefficient for\(j \ne k\) and the inbreeding coefficient for\(j=k\).
The bias coefficient\(s\) is defined by\[s = \frac{\bar{\theta}^T}{\Fst}\] where\(\bar{\theta}^T = \sum_{j=1}^n \sum_{k=1}^n w_j w_k \theta_{jk}^T\). This\(0 < s \le 1\) approximates the proportional bias of\(\Fst\) estimators that assume independent subpopulations, and onebnpsd function below fits its parameters to yield a desired\(s\).
This section details the distributions of the allele frequencies and genotypes of the various populations or individuals of the BN-PSD model.
Every biallelic locus\(i\) in the ancestral population\(T\) has an ancestral reference allele frequency denoted by\(p_i^T\). By default thebnpsd code draws\[p_i^T \sim \text{Uniform}(a, b)\] with\(a=0.01, b=0.5\), but the code accepts\(p_i^T\) from arbitrary distributions (see below).
The distribution of the allele frequency at locus\(i\) in subpopulation\(S_u\), denoted by\(p_i^{S_u}\), is the BN distribution:\[p_i^{S_u} | T \sim \text{Beta} \left( \nu_s p_i^T, \nu_s \left( 1-p_i^T \right) \right),\] where\(\nu_s = \frac{1}{\ft[S_u]}-1\). Allele frequencies for different loci and different subpopulations (\(S_u,S_v, u \ne v\)) are drawn independently.
Each admixed individual\(j\) at each locus\(i\) draws alleles from a mixture of Bernoulli distributions from each intermediate subpopulation, which is mathematically equivalent to assigning what we call individual-specific allele frequencies\(\pi_{ij}\) constructed as:\[\pi_{ij} = \sum_{u=1}^K p_i^{S_u} q_{ju}.\] The unphased genotype\(x_{ij}\) (encoded to count the number of reference alleles) is drawn as:\[x_{ij}|\pi_{ij} \sim \text{Binomial}(2, \pi_{ij}).\]
Let’s generate the same population structure used in the simulation of(Ochoa and Storey 2021).
# this packagelibrary(bnpsd)# for nice colorslibrary(RColorBrewer)# for visualizing coancestry matrix with plot_popkin, and other handy functionslibrary(popkin)# dimensions of data/model# number of individuals (NOTE this is 10x less than in publication!)n_ind <-100# number of intermediate subpopsk_subpops <-10# define population structure# subpopulation FST vector, up to a scalarinbr_subpops <-1:k_subpops# desired bias coefficientbias_coeff <-0.5# desired FST for the admixed individualsFst <-0.1# admixture proportions from 1D geographyobj <-admix_prop_1d_linear(n_ind = n_ind,k_subpops = k_subpops,bias_coeff = bias_coeff,coanc_subpops = inbr_subpops,fst = Fst)admix_proportions <-obj$admix_proportions# rescaled inbreeding vector for intermediate subpopulationsinbr_subpops <-obj$coanc_subpops# get pop structure parameters of the admixed individualscoancestry <-coanc_admix(admix_proportions, inbr_subpops)# verify that we got the desired FST!fst_admix(admix_proportions, inbr_subpops)#> [1] 0.1# the mean of the diagonal of the coancestry matrix also equals FSTinbr <-diag(coancestry)fst(inbr)# `fst` is a function in the popkin package#> [1] 0.1# verify that we got the desired `bias_coeff` too!mean(coancestry)/Fst#> [1] 0.5# visualize the per-subpopulation inbreeding coefficients (FSTs)# shrink default marginspar(mar =c(4,4,0,0)+0.2)# colors for independent subpopulationscol_subpops <-brewer.pal(k_subpops,"Paired")barplot(inbr_subpops,col = col_subpops,names.arg =1:k_subpops,ylim =c(0,1),xlab ='Subpopulation',ylab ='Inbreeding coeff.')# visualize the admixture proportions# shrink default marginspar(mar =c(1,4,0,0)+0.2)barplot(t(admix_proportions),col = col_subpops,border =NA,space =0,ylab ='Admixture prop.')mtext('Individuals',1)# Visualize the coancestry matrix using "popkin"!plot_popkin( coancestry,# zero inner margin (plus padding) because we have no labelsmar =0,leg_title ='Coancestry')Now let’s draw all the random allele frequencies and genotypes from the population structure. The easiest way is to usedraw_all_admix:
# number of loci in simulation (NOTE this is 30x less than in publication!)m_loci <-10000# draw all random Allele Freqs (AFs) and genotypes# reuse the previous inbr_subpops, admix_proportionsout <-draw_all_admix(admix_proportions = admix_proportions,inbr_subpops = inbr_subpops,m_loci = m_loci,#NOTE by default p_subpops and p_ind are not returned, but here we will ask for themwant_p_subpops =TRUE,#NOTE: asking for `p_ind` increases memory usage substantially,# so don't ask for it unless you're sure you want it!want_p_ind =TRUE)# genotypesX <-out$X# ancestral AFsp_anc <-out$p_anc# intermediate independent subpopulation AFsp_subpops <-out$p_subpops# individual-specific AFsp_ind <-out$p_ind# inspect distribution of ancestral AFs (~ Uniform(0.01, 0.5))# shrink default margins for these figurespar(mar =c(4,4,0,0)+0.2)hist(p_anc,xlab ='Ancestral AF',main ='',xlim =c(0,1))# distribution of intermediate population AFs# (all subpopulations combined)# (will be more dispersed toward 0 and 1 than ancestral AFs)hist(p_subpops,xlab ='Intermediate Subpopulation AF',main ='',xlim =c(0,1))# distribution of individual-specific AFs (admixed individuals)# (admixture reduces differentiation, so these resemble ancestral AFs a bit more)hist(p_ind,xlab ='Individual-specific AF',main ='',xlim =c(0,1))# genotype distribution of admixed individualsbarplot(table(X),xlab ='Genotypes',ylab ='Frequency',col ='white')Let’s verify that the correlation structure of the genotypes matches the theoretical coancestry matrix we constructed earlier. For this we use thepopkin function of the package with the same name.
# for best estimates, group individuals into subpopulations using the geography# this averages more individuals in estimating the minimum kinshipsubpops <-ceiling( (1:n_ind )/n_ind*k_subpops )table(subpops)# got k_subpops = 10 with 100 individuals each#> subpops#> 1 2 3 4 5 6 7 8 9 10#> 10 10 10 10 10 10 10 10 10 10# now estimate kinship using popkinkinship_estimate <-popkin(X, subpops)# replace diagonal with inbreeding coeffs. to match coancestry matrixcoancestry_estimate <-inbr_diag(kinship_estimate)# Visualize the coancestry matrix using "popkin"!plot_popkin(list( coancestry, coancestry_estimate ),titles =c('Truth','Estimate'),# second value is top margin, for panel titlesmar =c(0,2.5),leg_title ='Coancestry')We can also look at the structure of the two allele frequency matrices,p_subpops andp_ind, usingpopkin_af, which estimates coancestry from allele frequency matrices (in contrast,popkin estimates kinship from genotype matrices, and is preferred when genotypes are available).
coancestry_subpops_estimate <-popkin_af( p_subpops )coancestry_ind_estimate <-popkin_af( p_ind )The first coancestry matrix is small, and matches the expectation of independent subpopulations (zero values off-diagonal) with the subpopulation inbreeding coefficients as the diagonal coancestry values:
plot_popkin(list(diag( inbr_subpops ), coancestry_subpops_estimate ),titles =c('Truth','Estimate'),mar =c(0,2.5),leg_title ='Coancestry') The second coancestry matrix, estimated from individual-specific allele frequencies, is the same in expectation to the true coancestry matrix of the model:
plot_popkin(list( coancestry, coancestry_ind_estimate ),titles =c('Truth','Estimate'),mar =c(0,2.5),leg_title ='Coancestry') The last coancestry matrix also equals in expectation the transformed kinship matrix estimated earlier, but this estimate from allele frequencies is less noisy. However, in real data such allele frequencies are not observed, and their estimates can vary in quality and result in biases, so the above is more a validation of the simulation than a practical estimation approach.
The random variables generated bydraw_all_admix above can also be generated separately using the following functions (where\(p\) is the usual variable symbol for allele frequencies):
draw_p_anc (drawn ancestral allele frequencies)draw_p_subpops (draw independent subpopulation allele frequencies)make_p_ind_admix (construct individual-specific allele frequencies under the admixture model)draw_genotypes_admix (draw genotypes under the admixture model)These functions are provided for greater flexibility (examples follow further below). However, the joint functiondraw_all_admix has the advantage of removing fixed loci (loci that were randomly set to 0 or 2 for all individuals, despite non-zero allele frequencies), which are re-drawn from scratch (starting from the ancestral allele frequencies).
Here is the step-by-step procedure for drawing AFs and genotypes in the defaultdraw_all_admix (except for the re-drawing of fixed loci):
# reuse the previous m_loci, inbr_subpops, admix_proportions# draw ancestral AFsp_anc <-draw_p_anc(m_loci)# draw intermediate independent subpopulations AFsp_subpops <-draw_p_subpops(p_anc, inbr_subpops)# draw individual-specific AFsp_ind <-make_p_ind_admix(p_subpops, admix_proportions)# draw genotypesX <-draw_genotypes_admix(p_ind)We provide functions for these separate steps to allow for more flexibility. It is worth noting thatdraw_p_anc can be used to draw from a Symmetric Beta distribution if thebeta parameter is set. In the example below, a small shape value ofbeta\(< 1\) is used to draw a distribution with more rare minor variants.
# this increases the proportion of rare allelesp_anc_beta <-draw_p_anc(m_loci,beta =0.1)# shrink default margins for these figurespar(mar =c(4,4,0,0)+0.2)hist(p_anc_beta,xlab ='Ancestral AF',main ='',xlim =c(0,1)) However, if you want any other distribution, that can be implemented at this step. For a ridiculous example, the ancestral allele frequencies could be drawn from a (non-symmetric) Beta instead of using
draw_p_anc:
# this increases the proportion of rare allelesp_anc_beta <-rbeta(m_loci,shape1 =0.1,shape2 =1)# shrink default margins for these figurespar(mar =c(4,4,0,0)+0.2)hist(p_anc_beta,xlab ='Ancestral AF',main ='',xlim =c(0,1))You could also draw genotypes from the ancestral population or the intermediate populations:
# draw genotypes for one individual from the ancestral population# use "cbind" to turn the vector p_anc into a column matrix# ("draw_genotypes_admix" expects a matrix)X_anc <-draw_genotypes_admix(cbind( p_anc ) )# returns a column matrix:dim(X_anc)#> [1] 10000 1# draw genotypes from intermediate populations# draws one individual per intermediate populationX_subpops <-draw_genotypes_admix(p_subpops)Here we show examples for functions that create admixture matrices for various simple population structures. The admixture scenarios implemented inbnpsd are generated by these functions:
admix_prop_1d_linear: Linear 1D geographyadmix_prop_1d_circular: Circular 1D geographyadmix_prop_indep_subpops: Independent SubpopulationsThe main example (above) was foradmix_prop_1d_linear, the rest follow.
This is a twist on the earlier 1D geography where subpopulations and individuals are placed on a circumference, so random walks wrap around and the appropriate density is the Von Misses distribution.
Let’s generate an analogous population structure to the original “linear” example.
# data dimensions# number of individualsn_ind <-100# number of intermediate subpopsk_subpops <-10# define population structure# subpopulation FST vector, up to a scalarinbr_subpops <-1:k_subpops# desired bias coefficientbias_coeff <-0.5# desired FST for the admixed individualsFst <-0.1# admixture proportions from *circular* 1D geographyobj <-admix_prop_1d_circular(n_ind = n_ind,k_subpops = k_subpops,bias_coeff = bias_coeff,coanc_subpops = inbr_subpops,fst = Fst)admix_proportions <-obj$admix_proportionsinbr_subpops <-obj$coanc_subpops# get pop structure parameters of the admixed individualscoancestry <-coanc_admix(admix_proportions, inbr_subpops)# verify that we got the desired FST!fst_admix(admix_proportions, inbr_subpops)#> [1] 0.1# verify that we got the desired bias_coeff too!mean(coancestry)/Fst#> [1] 0.5# visualize the per-subpopulation inbreeding coefficients (FSTs)# tweak margins/etcpar(mar =c(2.5,2.5,0.3,0)+0.2,lab =c(2,1,7),mgp =c(1.5,0.5,0))# colors for independent subpopulationscol_subpops <-brewer.pal(k_subpops,"Paired")barplot(inbr_subpops,col = col_subpops,names.arg =colnames(admix_proportions),xlab ='Subpopulation',ylab ='Inbr')# visualize the admixture proportions# tweak margins/etcpar(mar =c(1,4,0.4,0)+0.2,lab =c(2,2,7))barplot(t(admix_proportions),col = col_subpops,border =NA,space =0,ylab ='Admix prop')mtext('Individuals',1)# Visualize the coancestry matrix using "popkin"!plot_popkin( coancestry,leg_n =3,mar =c(0,0.4),leg_title ='Coancestry')The independent subpopulations model, where individuals are actually unadmixed, is the most trivial form of the BN-PSD admixture model.
# define population structure# we'll have k_subpops = 3, each subpopulation with these sizes:n1 <-100n2 <-50n3 <-20# here's the labels (for simplicity, list all individuals of S1 first, then S2, then S3)labs <-c(rep.int('S1', n1),rep.int('S2', n2),rep.int('S3', n3))# data dimensions infered from labs:# number of individuals "n_ind"length(labs)#> [1] 170# number of subpopulations "k_subpops"k_subpops <-length(unique(labs))k_subpops#> [1] 3# desired admixture matrixadmix_proportions <-admix_prop_indep_subpops(labs)# got a numeric matrix with a single 1 value per row# (denoting the sole subpopulation from which each individual draws its ancestry)head(admix_proportions,2)#> S1 S2 S3#> [1,] 1 0 0#> [2,] 1 0 0# construct the intermediate subpopulation FST vector# the desired final FSTFst <-0.2# subpopulation FST vector, unnormalized so farinbr_subpops <-1:k_subpops# normalized to have the desired FST#NOTE fst is a function in the `popkin` packageinbr_subpops <-inbr_subpops/fst(inbr_subpops)*Fst# verify FST for the intermediate subpopulationsfst(inbr_subpops)#> [1] 0.2# get coancestry of the admixed individualscoancestry <-coanc_admix(admix_proportions, inbr_subpops)# before getting FST for individuals, weigh then inversely proportional to subpop sizesweights <-weights_subpops(labs)# function from `popkin` package# verify FST for individuals (same as for intermediate subpops for this pop structure)fst_admix(admix_proportions, inbr_subpops, weights)#> [1] 0.2# visualize the per-subpopulation inbreeding coefficients (FSTs)# tweak margins/etcpar(mar =c(2.5,2.5,0,0)+0.2,lab =c(2,1,7),mgp =c(1.5,0.5,0))# colors for independent subpopulationscol_subpops <-brewer.pal(k_subpops,"Paired")barplot(inbr_subpops,col = col_subpops,names.arg =colnames(admix_proportions),xlab ='Subpopulation',ylab ='Inbr')# visualize the admixture proportions# tweak margins/etcpar(mar =c(1,4,0.4,0)+0.2,lab =c(2,2,7))barplot(t(admix_proportions),col = col_subpops,border =NA,space =0,ylab ='Admix prop')mtext('Individuals',1)# Visualize the coancestry matrix using "popkin"!plot_popkin( coancestry,leg_n =3,mar =c(0,0.4),leg_title ='Coancestry')In all of the previous examples we simulated fromindependent subpopulations first, then admixed individuals in various ways (1D linear and 1D circular scenarios, as well as “independent subpopulations” which meant there was no admixture). Here we use an extension of the approach to simulate the intermediate subpopulations related by atree, which allows forcorrelated subpopulations.
First let’s construct the tree of interest. An easy-to-understand example is a serial founder effect, where there are consecutive splits on one branch, which somewhat resembles the human tree. For simplicity we model only 5 “tip” subpopulations and shall assume that there is equal inbreeding on every branch, equal to 0.1. This tree, in the standard (Newick) character format, is:
tree_str <- '(S1:0.1,(S2:0.1,(S3:0.1,(S4:0.1,S5:0.1)N3:0.1)N2:0.1)N1:0.1)T;'The tips (leaf nodes) are labeled as S1 to S5, the root node is labeled as T, and the (internal) nodes are N1 to N3. Before proceeding, we parse and process this tree using theread.tree function from the packageape, which creates the desired tree object of classphylo that the our functions require.
# base package for phylogeneticslibrary(ape)# parses tree, creates object of class "phylo"tree_subpops <-read.tree(text = tree_str )# plot treepar(mar =c(3,0,0,0)+0.2 )plot( tree_subpops,show.node.label =TRUE )# add axis and labelaxisPhylo(backward =FALSE )mtext('Coancestry',side =1,line =2 ) The values of the edges in this tree are taken to be inbreeding values, and we draw allele frequencies for each subpopulation sequentially from its parent population using the Balding-Nichols distribution, which is achieved with the function
draw_p_subpops_tree:
p_subpops_tree <-draw_p_subpops_tree(p_anc = p_anc,tree_subpops = tree_subpops)Let’s validate the correlation structure of these allele frequencies:
# estimate coancestrycoancestry_subpops_tree_estimate <-popkin_af( p_subpops_tree )# expected coancestry according to modelcoancestry_subpops_tree <-coanc_tree( tree_subpops )# plotplot_popkin(list( coancestry_subpops_tree, coancestry_subpops_tree_estimate ),titles =c('Truth','Estimate'),mar =c(2,2.5),leg_title ='Coancestry',names =TRUE) Note that the coancestry values in the tree are not additive in the final coancestry matrix. While the tree edges go up to 0.4 (0.1 added 4 times), the maximum coancestry (equal to the total self-coancestry for node
S5 relative toT) is actually equal to:
1-(1-0.1)^4#> [1] 0.3439The discrepancy is that the coancestry values in the matrix are all relative to the root nodeT, while the edges in the plot of the tree are each relative to their parent node (the edgeN3-S5 is relative toN3). The coancestry between a pair of different subpopulations is given by the total coancestry of the most recent common ancestor node of the two subpopulations (so the coancestry ofS4 andS5 is the self-coancestry ofN3).
Now we repeat our previous simulation of the admixture of individuals, but this time replacing the independent subpopulation structure with our previous tree structure for the intermediate subpopulations. In our previous code, it suffices to replace the previousinbr_subpops withcoancestry_subpops_tree which we estimated fromtree_subpops in the last section.
# number of admixed individualsn_ind <-100# number of subpopulations, extracted from true coancestryk_subpops <-nrow( coancestry_subpops_tree )# define population structure# for simplicity here we set spread parameter `sigma` directly# admixture proportions from 1D geographyadmix_proportions <-admix_prop_1d_linear(n_ind = n_ind,k_subpops = k_subpops,sigma =0.5)# get pop structure parameters of the admixed individualscoancestry <-coanc_admix(admix_proportions, coancestry_subpops_tree)# calculate true FST of simulationfst_admix(admix_proportions, coancestry_subpops_tree)#> [1] 0.2078772These admixture proportions resemble our previous simulation, except we reduced the number of intermediate subpopulations and the spread parametersigma was picked manually just above:
# visualize the admixture proportions# colors for these independent subpopulationscol_subpops <-brewer.pal(k_subpops,"Paired")# shrink default marginspar(mar =c(1,4,0,0)+0.2)barplot(t(admix_proportions),col = col_subpops,border =NA,space =0,ylab ='Admixture prop.')mtext('Individuals',1)The coancestry matrix looks a bit more blocky because there are fewer intermediate subpopulations and there is less spread, and also resembles the coancestry of the tree subpopulations we visualized earlier:
# Visualize the coancestry matrix using "popkin"!plot_popkin( coancestry,# zero inner margin (plus padding) because we have no labelsmar =0,leg_title ='Coancestry')The functiondraw_all_admix acceptstree_subpops in place ofinbr_subpops to perform the entire simulation of genotypes and intermediate quantities for intermediate subpopulations related by a tree. However, unlike functions for the true structure (likecoanc_admix andfst_admix earlier) which accept arbitrary intermediate coancestry matrices, the functions for drawing intermediate subpopulation allele frequencies requires trees or vectors (independent subpopulations) and do not support arbitrary coancestry matrices.
For simplicity we ask for genotypes only and skip analyzing the other allele frequency matrices (p_subpops here is simulated fromdraw_p_subpops_tree, which we analyzed earlier).
# number of loci in simulation (NOTE this is 30x less than in publication!)m_loci <-10000# draw all random Allele Freqs (AFs) and genotypes# reuse the previous inbr_subpops, admix_proportionsout <-draw_all_admix(admix_proportions = admix_proportions,tree_subpops = tree_subpops,m_loci = m_loci)# genotypesX <-out$XLet’s verify that the correlation structure of the genotypes matches the theoretical coancestry matrix we constructed earlier.
# estimate kinship using popkin# here we didn't use subpopulation labels for simplicitykinship_estimate <-popkin(X)# replace diagonal with inbreeding coeffs. to match coancestry matrixcoancestry_estimate <-inbr_diag(kinship_estimate)# Visualize the coancestry matrix using "popkin"!plot_popkin(list( coancestry, coancestry_estimate ),titles =c('Truth','Estimate'),# second value is top margin, for panel titlesmar =c(0,2.5),leg_title ='Coancestry')Balding, D. J., and R. A. Nichols. 1995. “A Method for Quantifying Differentiation Between Populations at Multi-Allelic Loci and Its Implications for Investigating Identity and Paternity.”Genetica 96 (1-2): 3–12.
Ochoa, Alejandro, and John D. Storey. 2016. “FST and Kinship for Arbitrary Population Structures I: Generalized Definitions.”bioRxiv doi:10.1101/083915.https://doi.org/10.1101/083915.
———. 2021. “Estimating FST and Kinship for Arbitrary Population Structures.”PLoS Genet 17 (1): e1009241.https://doi.org/10.1371/journal.pgen.1009241.
Pritchard, J. K., M. Stephens, and P. Donnelly. 2000. “Inference of Population Structure Using Multilocus Genotype Data.”Genetics 155 (2): 945–59.