Disclosure of Invention
In one aspect, embodiments of the present invention provide a method of assessing fetal DNA concentration from paternal and maternal DNA, the method comprising the following steps:
S101, sequencing binary sites of a paternal DNA sample and a maternal cell-free DNA sample to obtain DNA data F and S respectively, wherein the maternal cell-free DNA sample is isolated from maternal peripheral blood;
S102, obtaining from S the site set X' that satisfies the following conditions,
$X' = \{x_i \mid n_{a,i}(S)/n_i(S) \le 0.2 \,\cap\, n_{a,i}(F)/n_i(F) \ge 0.9\} \cup \{x_i \mid n_{A,i}(S)/n_i(S) \le 0.2 \,\cap\, n_{A,i}(F)/n_i(F) \ge 0.9\}$,
wherein $n_A$ and $n_a$ denote the observed counts of alleles A and a at a binary site, $n = n_A + n_a$, and $k = n_a$;
S103, calculating the probability P of each site in the site set X' according to Formula I,
wherein 0 ≤ p ≤ 0.5 and Pm = 0.4;
S104, obtaining, by the maximum likelihood method, the value pmax at which the cumulative probability h over the site set X' is maximal;
S105, fetal DNA concentration N = 2pmax.
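The factor of 2 in step S105 can be motivated by a short worked equation. This is a sketch under assumptions the text does not state explicitly: at a site in X' the mother is taken to be homozygous for one allele (say A), the father homozygous for the other (a), so the fetus is heterozygous Aa, and sequencing error is ignored. With fetal concentration N in the maternal cell-free sample S, the expected fraction of a-allele reads is then:

```latex
\[
  \mathbb{E}\!\left[\frac{n_a(S)}{n(S)}\right]
  = \underbrace{(1-N)\cdot 0}_{\text{maternal } AA}
  + \underbrace{N\cdot\tfrac{1}{2}}_{\text{fetal } Aa}
  = \frac{N}{2} = p
  \quad\Longrightarrow\quad N = 2p .
\]
```

Under these assumptions the bound 0 ≤ p ≤ 0.5 corresponds to 0 ≤ N ≤ 1.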
In step S101, the sequencing method comprises:
S1011, constructing a probe;
S1012, extracting DNA from the sample;
S1013, constructing a library;
S1014, performing hybridization capture and sequencing of the target region of the library using the probe from step S1011;
S1015, splitting the raw sequencing data by sample and filtering by quality value to obtain the sequencing data.
The binary site is selected from SNP sites, INDEL sites and/or STR sites, and the population frequency of the binary site is 0.05-0.95.
Specifically, for an SNP site, A denotes the wild-type allele and a denotes the mutant allele.
Preferably, the number of binary sites is greater than 1000.
In step S104, the cumulative probability h of all sites in the site set X' is calculated according to Formula II, wherein p is stepped through values at a predetermined interval, h is calculated for each value, and the value of p at which h is maximal is pmax;
preferably, the predetermined interval is 0.0001.
In another aspect, embodiments of the present invention provide an application of the above method of assessing fetal DNA concentration from paternal and maternal DNA to paternity testing: when N ≥ 0.4, the white blood cells of the pregnant woman need to be sequenced for the paternity test; when 0.004 < N < 0.4, the paternity test is carried out according to a second-generation DNA paternity testing method; when N ≤ 0.004, the tested man is judged to be the biological father if all sites match the paternal sample; if sites do not match and the fetus is male, the judgment is made according to Y-chromosome matching; and if sites do not match and the fetus is female, paternity cannot be determined and the maternal cell-free DNA sample must be collected again.
The assessment method provided by the invention estimates the fetal DNA concentration. In paternity testing, the estimate indicates whether the white blood cells of the pregnant woman need to be tested, and within the empirical range (0.004 < N < 0.4) the test can be carried out by a conventional second-generation DNA paternity testing method, thereby improving the accuracy of paternity testing.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings, for the purpose of making the objects, technical solutions and advantages of the present invention more apparent.
Example 1
Example 1 provides a method of assessing fetal DNA concentration from paternal and maternal DNA, comprising the following steps:
S101, sequencing binary sites of a paternal DNA sample and a maternal cell-free DNA sample to obtain DNA data F and S respectively, wherein the maternal cell-free DNA sample is isolated from maternal peripheral blood (and contains fetal DNA). In this example, the DNA data S and F are obtained by second-generation sequencing.
S102, obtaining from S the site set X' that satisfies the following conditions,
$X' = \{x_i \mid n_{a,i}(S)/n_i(S) \le 0.2 \,\cap\, n_{a,i}(F)/n_i(F) \ge 0.9\} \cup \{x_i \mid n_{A,i}(S)/n_i(S) \le 0.2 \,\cap\, n_{A,i}(F)/n_i(F) \ge 0.9\}$,
wherein $n_A$ and $n_a$ denote the observed counts of alleles A and a at a binary site, $n = n_A + n_a$, and $k = n_a$; the condition on F means that the same site in F must satisfy the stated threshold in addition to the condition on S.
S103, calculating the probability P of each site in the site set X' according to Formula I,
wherein 0 ≤ p ≤ 0.5 and Pm = 0.4; specifically, p takes discrete values at a predetermined interval starting from 0.
S104, obtaining, by the maximum likelihood method, the value pmax at which the cumulative probability h over the site set X' is maximal.
S105, fetal DNA concentration N = 2pmax.
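The site selection of step S102 can be sketched as follows. This is an illustrative sketch rather than the patented implementation: the `SiteCounts` container, the function name `select_sites` and the dictionary layout are hypothetical, and taking k as the count of whichever allele is rare in S (n_a in the first branch, n_A in the second) is an assumption, since the text only states k for one case.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Read counts of alleles A and a at one binary site (hypothetical container)."""
    n_A: int
    n_a: int

    @property
    def n(self) -> int:
        return self.n_A + self.n_a


def select_sites(S, F, low=0.2, high=0.9):
    """Return {site_id: (n, k)} for the sites that belong to X'.

    S and F map site ids to SiteCounts for the cell-free sample and the
    parental sample F. n is the depth in S; k is the count in S of the
    allele that is rare in S and near-fixed in F (assumption, see lead-in).
    """
    selected = {}
    for site_id, s in S.items():
        f = F.get(site_id)
        if f is None or s.n == 0 or f.n == 0:
            continue  # a site needs coverage in both samples
        if s.n_a / s.n <= low and f.n_a / f.n >= high:
            selected[site_id] = (s.n, s.n_a)   # first set of X'
        elif s.n_A / s.n <= low and f.n_A / f.n >= high:
            selected[site_id] = (s.n, s.n_A)   # second set of X'
    return selected


if __name__ == "__main__":
    S = {"x1": SiteCounts(95, 5), "x2": SiteCounts(50, 50)}
    F = {"x1": SiteCounts(0, 100), "x2": SiteCounts(0, 100)}
    print(select_sites(S, F))
```

Only `x1` passes in this toy input: `x2` has an even allele split in S, so neither branch of the condition holds.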
Step S101 uses conventional second-generation sequencing, and the sequencing method comprises:
S1011, constructing a probe, wherein the probe is designed as required.
S1012, extracting DNA from the sample.
S1013, constructing a library.
S1014, performing hybridization capture and sequencing of the target region of the library using the probe from step S1011.
S1015, splitting the raw sequencing data by sample and filtering by quality value to obtain the sequencing data.
The binary site is selected from SNP sites, INDEL sites and/or STR sites, and the population frequency of the binary site is 0.05-0.95.
For an SNP site, A denotes the wild-type allele and a denotes the mutant allele. Specifically, the site is aligned against the human reference genome: the allele identical to the reference is the wild type, denoted A, and the allele differing from it is the mutant, denoted a.
Preferably, to ensure accuracy, the number of binary sites is greater than 1000, for example 2693.
In step S104, the cumulative probability h of all sites in the site set X' is calculated according to Formula II: p is stepped through values from 0 to 0.5 at a predetermined interval, h is calculated for each value, and the value of p at which h is maximal is pmax.
Preferably, the predetermined interval is 0.0001 to ensure accuracy, although other values, such as 0.001, may be used as required.
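The grid search of steps S103-S105 can be sketched as follows. Formula I and Formula II are not reproduced in this text, so the sketch substitutes its own assumptions: a plain binomial probability of k reads out of n for each site, and the sum of log-probabilities as the cumulative value h. The parameter Pm = 0.4 is not used here because its role in Formula I cannot be recovered from the text. The function names are hypothetical.

```python
import math


def log_binomial(n, k, p):
    """Log binomial probability of k reads out of n (assumed stand-in for Formula I)."""
    if p <= 0.0:
        return 0.0 if k == 0 else float("-inf")
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log1p(-p)


def estimate_fetal_fraction(sites, step=0.0001, p_upper=0.5):
    """Grid-search p over [0, p_upper]; return (pmax, N) with N = 2 * pmax.

    sites is an iterable of (n, k) pairs for the sites in X'.
    h is the summed log-probability (assumed stand-in for Formula II).
    """
    sites = list(sites)
    best_p, best_h = 0.0, float("-inf")
    n_steps = int(round(p_upper / step))
    for i in range(n_steps + 1):
        p = i * step
        h = sum(log_binomial(n, k, p) for n, k in sites)
        if h > best_h:
            best_p, best_h = p, h
    return best_p, 2.0 * best_p


if __name__ == "__main__":
    # Five toy sites, each with 100 of 1000 reads carrying the rare allele.
    pmax, N = estimate_fetal_fraction([(1000, 100)] * 5)
    print(pmax, N)
```

With the default interval of 0.0001 the loop evaluates 5001 candidate values of p. A coarser interval such as 0.001 trades resolution for speed.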
Example 2
This embodiment provides an application of the method of assessing fetal DNA concentration from paternal and maternal DNA disclosed in Example 1 to paternity testing. When N ≥ 0.4, the white blood cells of the pregnant woman must be sequenced to obtain her SNP genotypes before the paternity test is carried out. When 0.004 < N < 0.4, the paternity test is carried out according to a conventional second-generation DNA paternity testing method. When N ≤ 0.004, the test proceeds as follows: if all sites match the paternal sample, the tested man is judged to be the biological father; if sites do not match and the fetus is male, the judgment is made according to Y-chromosome matching, and if there is a mismatch he is judged not to be the biological father; if sites do not match and the fetus is female, paternity cannot be determined and a new sample must be submitted. In this application, the paternal sample may come from the biological father or from an unrelated non-father.
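The thresholds above can be written as a small decision function. This is a sketch only: the `Action` labels and the function name `next_action` are hypothetical, and the Y-chromosome comparison itself is left as a follow-up step because the text does not specify it in detail.

```python
from enum import Enum


class Action(Enum):
    SEQUENCE_MATERNAL_WBC = "sequence maternal white blood cells for SNP genotypes"
    CONVENTIONAL_TEST = "conventional second-generation DNA paternity test"
    BIOLOGICAL_FATHER = "all sites match: judged biological father"
    CHECK_Y_CHROMOSOME = "male fetus with mismatches: judge by Y-chromosome matching"
    RESAMPLE = "female fetus with mismatches: undetermined, collect a new sample"


def next_action(N, all_sites_match=False, fetus_is_male=False):
    """Map the estimated fetal DNA concentration N to the next step."""
    if N >= 0.4:
        return Action.SEQUENCE_MATERNAL_WBC
    if N > 0.004:
        return Action.CONVENTIONAL_TEST
    # N <= 0.004: the low-concentration rules apply
    if all_sites_match:
        return Action.BIOLOGICAL_FATHER
    if fetus_is_male:
        return Action.CHECK_Y_CHROMOSOME
    return Action.RESAMPLE


if __name__ == "__main__":
    for n in (0.5, 0.1, 0.002):
        print(n, next_action(n).value)
```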
In addition, applying the same algorithm (steps S101-S104 unchanged, with N = 2pmax in step S105) to a biological father and to a non-father yields different fetal DNA concentrations. Comparing these values with the fetal DNA concentration obtained by other algorithms makes it possible to judge whether the tested man is the biological father.
Example 3
The paternal DNA sample F and the maternal DNA sample M are randomly generated according to allele frequencies in the Chinese population, and the offspring Z is generated according to Mendelian inheritance. The offspring Z and the maternal sample M are mixed at ratios from 0 to 0.4 in steps of 0.01 to obtain simulated maternal cell-free DNA samples S; 10 samples are generated for each ratio, so the S sample set contains 400 samples. The binary site types in the F and S sample sets comprise SNPs and INDELs.
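The simulation described above can be sketched as follows. Several details are assumptions made for illustration and are not given in the text: genotypes of the parental samples F and M are drawn under Hardy-Weinberg proportions, reads are drawn independently at a fixed depth, and the allele frequencies in the usage example are placeholders rather than real population data. All function names are hypothetical.

```python
import random


def random_genotype(rng, freq_a):
    """Number of 'a' alleles (0, 1 or 2), drawn under Hardy-Weinberg proportions."""
    return (rng.random() < freq_a) + (rng.random() < freq_a)


def child_genotype(rng, g_father, g_mother):
    """Mendelian inheritance: each parent transmits one of its two alleles at random."""
    from_father = rng.random() < g_father / 2
    from_mother = rng.random() < g_mother / 2
    return from_father + from_mother


def simulate_reads(rng, frac_a, depth):
    """Draw `depth` reads; return (n_A, n_a) given the expected 'a' fraction."""
    n_a = sum(rng.random() < frac_a for _ in range(depth))
    return depth - n_a, n_a


def simulate_trio(seed, freqs, fetal_fraction, depth=200):
    """Return per-site read counts (F, S) for one simulated family."""
    rng = random.Random(seed)
    F, S = [], []
    for q in freqs:
        g_f = random_genotype(rng, q)
        g_m = random_genotype(rng, q)
        g_z = child_genotype(rng, g_f, g_m)
        # S is a mixture of offspring Z and mother M at the given ratio
        mix = fetal_fraction * g_z / 2 + (1 - fetal_fraction) * g_m / 2
        F.append(simulate_reads(rng, g_f / 2, depth))
        S.append(simulate_reads(rng, mix, depth))
    return F, S


if __name__ == "__main__":
    freqs = [0.3] * 1000  # placeholder allele frequencies
    F, S = simulate_trio(seed=1, freqs=freqs, fetal_fraction=0.2)
    print(F[:3], S[:3])
```

Fixing the seed makes each simulated family reproducible, which helps when the same mixture ratio is generated several times with different seeds.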
A subset of the polymorphic sites of the Chinese population is taken as the detection site set X. This example uses 2692 binary SNP sites with population frequency 0.05-0.95 and 1 INDEL site, 2693 sites in total. The polymorphism distribution at each site xi of the detection site set X is obtained for samples F and S.
The fetal concentration in sample S was calculated from the detection-site data of samples F and S according to the method of Example 1. Taking a simulated concentration of 0.2 as an example, the relationship between the cumulative probability h and p is shown in Fig. 3: h reaches its maximum at p = 0.1, which is exactly 1/2 of the simulated concentration. Plotting the simulated concentration n on the X axis against the calculated concentration N (p × 2) on the Y axis gives a linear relationship, N = 1.0099126 × n with R² = 0.9997, shown in Fig. 4, where the lower line is the actual simulated concentration and the upper line is the value calculated by the present method. The figure shows that the difference between the calculated and simulated concentrations is small, i.e. the assessment method is highly accurate.
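The comparison of calculated and simulated concentrations can be reproduced with an ordinary least-squares fit. The sketch below assumes a straight-line fit with intercept and the usual coefficient of determination. The text does not say how its fit was computed, and the data points in the usage example are placeholders, not the patent's results.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


if __name__ == "__main__":
    simulated = [0.05, 0.10, 0.20, 0.30, 0.40]    # placeholder values
    calculated = [0.052, 0.101, 0.203, 0.302, 0.405]
    print(linear_fit(simulated, calculated))
```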
Example 4
Polymorphic sites of the paternal DNA sample F, the maternal DNA sample M and the offspring DNA sample Z were obtained by experimental sequencing analysis. Simulated maternal cell-free DNA samples S were obtained by mixing the Z and M samples at known ratios, as in Example 3. The fetal concentration in sample S was calculated from the detection-site data of samples F and S according to the method of Example 1. Plotting the simulated concentration n on the X axis against the calculated concentration N (p × 2) on the Y axis gives a linear relationship, N = n + 0.0044 with R² = 0.9939483, shown in Fig. 5, where the lower line is the actual simulated concentration and the upper line is the value calculated by the present method. The figure shows that the difference between the calculated and simulated concentrations is small, i.e. the assessment method is highly accurate.
The foregoing describes only preferred embodiments of the invention and is not intended to limit it; any modifications, equivalents and improvements made within the spirit and principles of the invention fall within its scope of protection.