Disclosure of Invention
In one aspect, embodiments of the invention provide a method for assessing fetal DNA concentration using DNA from a non-biological father and the mother, the method comprising the steps of:
S101: sequencing the polymorphic sites of a DNA sample from a non-biological father and of a cell-free DNA sample from the pregnant woman to obtain DNA data F' and S, respectively, where the cell-free DNA sample is isolated from the pregnant woman's peripheral blood;
S102: obtaining, from S, the site set X' satisfying the following condition:
$$X' = \{x_i \mid n_{a,i}(S)/n_i(S) \le 0.2 \,\wedge\, n_{a,i}(F')/n_i(F') \ge 0.9\} \cup \{x_i \mid n_{A,i}(S)/n_i(S) \le 0.2 \,\wedge\, n_{A,i}(F')/n_i(F') \ge 0.9\}$$
where n_A and n_a denote the observed read counts of alleles A and a at a biallelic site, n = n_A + n_a, and k = n_a;
S103: calculating the probability P of each site in the site set X' according to Formula I;
S104: obtaining p_max, the value of p that maximizes the cumulative probability h over the site set X', by the maximum likelihood method;
S105: calculating the fetal DNA concentration as N = 4p_max.
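As an illustration of step S102, the site-set condition can be written as a filter over per-site read counts. This is a minimal sketch, assuming the counts of alleles A and a are already available for each site of S and F'; `SiteCounts` and `select_site_set` are hypothetical names introduced here, and the thresholds 0.2 and 0.9 are those of the condition above.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Read counts at one biallelic site: A = wild-type allele, a = mutant allele."""
    n_A: int
    n_a: int

    @property
    def n(self) -> int:
        return self.n_A + self.n_a


def select_site_set(S, F_prime, s_max=0.2, f_min=0.9):
    """Return {site_id: allele} for the sites of X' (step S102).

    S and F_prime map a site identifier to SiteCounts. The recorded allele is
    the one that is rare in S (fraction <= s_max) and dominant in F'
    (fraction >= f_min).
    """
    selected = {}
    for site, s in S.items():
        f = F_prime.get(site)
        # The same site must be observed in both S and F'.
        if f is None or s.n == 0 or f.n == 0:
            continue
        if s.n_a / s.n <= s_max and f.n_a / f.n >= f_min:
            selected[site] = "a"
        elif s.n_A / s.n <= s_max and f.n_A / f.n >= f_min:
            # The two subsets cannot overlap: n_a/n and n_A/n sum to 1.
            selected[site] = "A"
    return selected
```

Each selected site records which allele plays the role of the allele that is rare in S and dominant in F', so the two halves of the union are handled by one pass.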
In step S101, the sequencing method comprises:
S1011: constructing a probe;
S1012: extracting DNA from the sample;
S1013: constructing a library;
S1014: performing hybridization capture on the target region of the library using the probe of step S1011, and sequencing it;
S1015: demultiplexing the sequencing data and filtering it by quality value to obtain the final sequencing data.
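Step S1015 is not specified further. Purely as an illustration, demultiplexing and quality filtering could look like the sketch below. It assumes, hypothetically, that each read starts with an inline sample barcode and that quality strings are Phred+33 encoded; the mean-quality cutoff of 20 and the function names are illustrative choices, not values from the text.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def split_and_filter(reads, barcodes, min_mean_q=20.0):
    """Assign reads to samples by barcode prefix and drop low-quality reads.

    reads: iterable of (sequence, quality) tuples.
    barcodes: {sample_name: barcode_sequence} (hypothetical inline barcodes).
    Returns {sample_name: [sequence after the barcode, ...]}.
    """
    out = {name: [] for name in barcodes}
    for seq, qual in reads:
        # Quality filter: discard reads whose mean base quality is too low.
        if not qual or mean_phred(qual) < min_mean_q:
            continue
        for name, bc in barcodes.items():
            if seq.startswith(bc):
                out[name].append(seq[len(bc):])
                break  # reads matching no barcode are discarded
    return out
```

Real pipelines usually demultiplex on separate index reads rather than inline prefixes; the sketch only shows the split-then-filter order of the step.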
The polymorphic sites are selected from SNP sites, INDEL sites and/or STR sites, with a population frequency of 0.05-0.95.
Specifically, for SNP sites, A denotes the wild-type allele and a denotes the mutant allele.
Preferably, the number of biallelic sites is greater than 1000.
In step S104, the cumulative probability h over all sites in the site set X' is calculated according to Formula II for values of p taken at a predetermined interval, and the value of p at which h is maximal is taken as p_max.
Preferably, the predetermined interval is 0.0001.
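The grid search of step S104 and the conversion of step S105 can be sketched as follows. Formula I and Formula II are not reproduced in this text, so the sketch makes two labeled assumptions: the per-site probability is a plain binomial in k and n (a hypothetical stand-in for Formula I), and h is the product of the per-site probabilities, evaluated as a sum of logarithms. The interval 0.0001 and the range 0 ≤ p ≤ 0.5 follow the text.

```python
import math


def binomial_log_likelihood(k: int, n: int, p: float) -> float:
    """HYPOTHETICAL stand-in for Formula I: log of C(n, k) p^k (1-p)^(n-k)."""
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def estimate_p_max(sites, step=0.0001, p_hi=0.5, loglik=binomial_log_likelihood):
    """Grid search for p_max over [0, p_hi] at a fixed interval (step S104).

    sites: list of (k, n) pairs for the sites in X'.
    log h(p) is the sum of per-site log-probabilities, i.e. h is assumed to be
    their product (an assumption about Formula II).
    """
    n_steps = int(round(p_hi / step))
    best_p, best_log_h = 0.0, -math.inf
    for i in range(n_steps + 1):
        p = i * step
        log_h = sum(loglik(k, n, p) for k, n in sites)
        if log_h > best_log_h:
            best_p, best_log_h = p, log_h
    return best_p


def fetal_fraction_non_biological(p_max: float) -> float:
    """Step S105: N = 4 * p_max."""
    return 4.0 * p_max
```

For the binomial stand-in the optimum is simply Σk/Σn, so the grid search only matters once the actual Formula I is passed in through the `loglik` parameter. With the default interval, 5001 candidate values of p are evaluated for every site, which is slow in pure Python for thousands of sites.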
In another aspect, embodiments of the invention also provide an application of the method for assessing fetal DNA concentration using DNA from a non-biological father and the mother in paternity testing: when N ≥ 0.4, the pregnant woman's leukocytes must be sequenced for the paternity test; when 0.004 < N < 0.4, the paternity test is performed by a second-generation DNA paternity test method; when N ≤ 0.004, the paternity test is performed and, if all loci match the alleged father, he is judged to be the biological father; if some loci do not match and the fetus is male, the judgment is based on Y-chromosome matching, and the alleged father is judged to be a non-father if a mismatch exists; if some loci do not match and the fetus is female, paternity cannot be determined and the pregnant woman's cell-free DNA sample must be collected again.
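The concentration thresholds of this application can be expressed as a small decision function. The sketch below only encodes the rules stated in the text; the function names and returned labels are hypothetical, and the case in which all Y-chromosome loci match is left open because the text does not specify it.

```python
def paternity_workflow(N: float) -> str:
    """Choose the paternity-testing route from the estimated fetal DNA concentration N."""
    if N >= 0.4:
        return "sequence maternal leukocytes, then run the paternity test"
    if N > 0.004:
        return "standard second-generation (NGS) paternity test"
    return "low fetal fraction: run the test, then apply low_fraction_decision"


def low_fraction_decision(all_loci_match: bool, fetus_is_male: bool,
                          y_mismatch: bool = False) -> str:
    """Verdict for the N <= 0.004 branch, following the rules in the text."""
    if all_loci_match:
        return "biological father"
    if not fetus_is_male:
        # Female fetus with mismatching loci: paternity cannot be judged.
        return "undetermined: recollect maternal cell-free DNA"
    if y_mismatch:
        return "non-father"
    # The text does not state the verdict when all Y-chromosome loci match.
    return "decide from Y-chromosome loci"
```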
The assessment method provided by the invention estimates the fetal DNA concentration. During paternity testing, this estimate can be used to decide whether the pregnant woman's leukocytes need to be tested and whether the sample needs to be retested, and, within the empirical range 0.004 < N < 0.4, to confirm that a conventional second-generation DNA paternity test method can be used, thereby improving the accuracy of paternity testing.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings.
Example 1
Example 1 provides a method of assessing fetal DNA concentration using DNA from a non-biological father and the mother, the method comprising the following steps:
S101: sequencing the polymorphic sites of a non-biological father DNA sample (a randomly chosen male, for which either real test data or simulated data may be used) and of a cell-free DNA sample from the pregnant woman to obtain DNA data F' and S, respectively, where the cell-free DNA sample is isolated from the pregnant woman's peripheral blood and contains fetal DNA. In this example, the DNA data S and F' were obtained by second-generation sequencing.
S102: obtaining, from S, the site set X' satisfying the following condition:
$$X' = \{x_i \mid n_{a,i}(S)/n_i(S) \le 0.2 \,\wedge\, n_{a,i}(F')/n_i(F') \ge 0.9\} \cup \{x_i \mid n_{A,i}(S)/n_i(S) \le 0.2 \,\wedge\, n_{A,i}(F')/n_i(F') \ge 0.9\}$$
where n_A and n_a denote the observed read counts of alleles A and a at a biallelic site, n = n_A + n_a, and k = n_a; the conditions on F' mean that the same site must satisfy the stated requirements in both data S and data F'.
S103: calculating the probability P of each site in the site set X' according to Formula I,
wherein
0 ≤ p ≤ 0.5 and Pm = 0.4; specifically, p takes discrete values at a predetermined interval starting from 0.
S104: obtaining p_max, the value of p that maximizes the cumulative probability h over the site set X', by the maximum likelihood method.
S105: calculating the fetal DNA concentration as N = 4p_max.
In step S101, conventional second-generation sequencing is used, and the sequencing method comprises:
S1011: constructing a probe, designed as required.
S1012: extracting DNA from the sample.
S1013: constructing a library.
S1014: performing hybridization capture on the target region of the library using the probe of step S1011, and sequencing it.
S1015: demultiplexing the sequencing data and filtering it by quality value to obtain the final sequencing data.
The polymorphic sites are selected from SNP sites, INDEL sites, STR sites and the like, with a population frequency of 0.05-0.95.
Among the SNP sites, A denotes the wild-type allele and a the mutant allele. Specifically, each site is aligned to the human reference genome: an allele identical to the reference genome is called wild type, and an allele differing from the reference genome is called mutant.
Preferably, to ensure accuracy, the number of biallelic sites is greater than 1000, e.g., 2693.
In step S104, the cumulative probability h over all sites in the site set X' is calculated according to Formula II for values of p taken at a predetermined interval (from 0 to 0.5), and the value of p at which h is maximal is taken as p_max.
Preferably, to ensure accuracy, the predetermined interval is 0.0001; other values, such as 0.001, may of course be used as needed.
Example 2
This embodiment provides an application of the method of Example 1 for assessing fetal DNA concentration using DNA from a non-biological father and the mother. In paternity testing, when N ≥ 0.4, the pregnant woman's leukocytes must be sequenced to obtain her SNP genotypes. When 0.004 < N < 0.4, the paternity test is performed by a conventional second-generation DNA paternity test method. When N ≤ 0.004, the paternity test is performed and, if all loci match the alleged father, he is judged to be the biological father; if some loci do not match and the fetus is male, the judgment is based on Y-chromosome matching, and the alleged father is judged to be a non-father if a mismatch exists; if some loci do not match and the fetus is female, paternity cannot be determined and the sample must be submitted again. When the method is applied, it does not matter whether the alleged father is the biological father or not.
In addition, when the same algorithm is applied for a biological father and for a non-biological father (steps S101 to S104 are identical, but N = 2p_max in step S105 for a biological father), the resulting fetal DNA concentrations differ; by comparing them with fetal DNA concentrations obtained by other algorithms, one can judge whether the alleged father is the biological father.
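The comparison described above can be sketched as follows, assuming an independent estimate of the fetal DNA concentration is available from another algorithm. Choosing the hypothesis whose predicted N (2p_max for a biological father, 4p_max for a non-biological one) lies closer to the independent estimate is a hypothetical criterion introduced for illustration; the text does not specify the comparison rule.

```python
def compare_hypotheses(p_max: float, n_independent: float) -> str:
    """HYPOTHETICAL rule: pick the paternity hypothesis whose implied fetal
    DNA concentration is closer to an independent estimate."""
    n_bio = 2.0 * p_max       # biological-father hypothesis: N = 2 p_max
    n_non_bio = 4.0 * p_max   # non-biological-father hypothesis: N = 4 p_max
    if abs(n_bio - n_independent) <= abs(n_non_bio - n_independent):
        return "consistent with biological father"
    return "consistent with non-biological father"
```

For example, p_max = 0.05 together with an independent estimate of 0.2 points to a non-biological father, because 4 × 0.05 matches 0.2 exactly while 2 × 0.05 does not.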
Example 3
A father DNA sample F and a mother DNA sample M were generated randomly from Chinese population allele frequencies, and a child Z was generated according to Mendel's laws of inheritance. The child Z and the mother M were mixed at fetal proportions from 0 to 0.4 in steps of 0.01 to obtain simulated pregnant-woman cell-free DNA samples S, with 10 samples per proportion, giving a set of 400 S samples. The F and S sample sets contained biallelic sites, including SNPs and INDELs.
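The simulation design of this example can be re-created in miniature as below. This is a hedged sketch under simplifying assumptions: allele frequencies drawn uniformly from [0.05, 0.95] stand in for the Chinese population frequencies, sequencing depth is fixed, there are no sequencing errors, and Formula I is replaced by a plain binomial likelihood whose maximum over p is the pooled fraction Σk/Σn, so the grid search is skipped. `simulate_sites` and `estimate_fetal_fraction` are hypothetical names.

```python
import random


def _draw_allele(rng, q):
    """Draw allele 'a' with population frequency q, otherwise 'A'."""
    return "a" if rng.random() < q else "A"


def simulate_sites(n_sites, fetal_fraction, depth=100, seed=0):
    """Simulate (k_S, k_F) read counts of allele a at n_sites biallelic sites."""
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        q = rng.uniform(0.05, 0.95)  # stand-in for population allele frequency
        mother = [_draw_allele(rng, q) for _ in range(2)]
        father = [_draw_allele(rng, q) for _ in range(2)]     # biological father F
        other_man = [_draw_allele(rng, q) for _ in range(2)]  # unrelated man used as F'
        child = [rng.choice(mother), rng.choice(father)]      # Mendelian transmission
        # Expected fraction of allele a in the mixed cell-free DNA sample S.
        frac_S = ((1 - fetal_fraction) * mother.count("a")
                  + fetal_fraction * child.count("a")) / 2
        frac_F = other_man.count("a") / 2
        k_S = sum(rng.random() < frac_S for _ in range(depth))
        k_F = sum(rng.random() < frac_F for _ in range(depth))
        sites.append((k_S, k_F))
    return sites


def estimate_fetal_fraction(sites, depth=100):
    """Select X' and return N = 4 * p_max using the pooled binomial MLE."""
    k_total = n_total = 0
    for k_S, k_F in sites:
        if k_S / depth <= 0.2 and k_F / depth >= 0.9:
            k_total += k_S              # allele a is rare in S, dominant in F'
        elif (depth - k_S) / depth <= 0.2 and (depth - k_F) / depth >= 0.9:
            k_total += depth - k_S      # allele A is rare in S, dominant in F'
        else:
            continue
        n_total += depth
    if n_total == 0:
        raise ValueError("no sites satisfy the X' conditions")
    p_max = k_total / n_total
    return 4 * p_max
```

In this simulation, a selected site carries the fetal-specific allele with probability 1/2 on average (the two halves of X' contribute q and 1 - q with equal weight), so the pooled fraction is about N/4, in line with N = 4p_max. To mirror the full design, loop over fetal fractions 0, 0.01, ..., 0.4 with 10 seeds each.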
A subset of the polymorphic sites of the Chinese population was taken as the detection site set X, comprising more than 1000 biallelic SNP sites with population frequencies of 0.05-0.95, and the allele distribution at each site x_i of X was obtained for samples F' and S.
The fetal concentration of each S sample was calculated from the detection site sets of samples F' and S by the method of Example 1. Taking a simulated concentration of 0.2 as an example, the relationship between the cumulative probability h and p is shown in Fig. 3: h reaches its maximum at p = 0.05, which was verified to be exactly 1/4 of the simulated concentration. Plotting the simulated concentration N on the x-axis against the calculated value p2 = 2p_max (i.e., N/2) on the y-axis gives a linear relationship, p2 = 0.493824 × N with R² = 0.9968, as shown in Fig. 4. The upper line is the actual simulated concentration and the lower line is the value calculated by the method of this patent. The figure shows that the concentration calculated by the method of the invention differs very little from the simulated concentration, i.e., the assessment method is highly accurate.
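A linear fit such as the one reported for Fig. 4 can be computed from (N, p2) pairs with a least-squares line through the origin. Because the reported equation p2 = 0.493824 × N has no intercept, the sketch assumes a fit without intercept; the R² convention (1 - SS_res/SS_tot with a centred SS_tot) is likewise an assumption, and `fit_through_origin` is a hypothetical name.

```python
def fit_through_origin(x, y):
    """Least-squares slope b of y = b * x (no intercept) and
    R^2 = 1 - SS_res / SS_tot, with SS_tot centred on the mean of y."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    b = sxy / sxx
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return b, 1.0 - ss_res / ss_tot
```

With ideal data, where every p2 equals exactly N/2, the slope is 0.5 and R² is 1; the reported slope of 0.493824 therefore sits slightly below the ideal value.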
Example 4
The polymorphic sites of a non-biological father DNA sample F', a mother DNA sample M and a child DNA sample Z were obtained by experimental sequencing analysis. Simulated maternal cell-free DNA samples S were obtained by mixing Z and M in known proportions, as in Example 3. The fetal concentration of each S sample was calculated from the detection site sets of samples F' and S by the method of Example 1. Plotting the simulated concentration N on the x-axis against the calculated value p2 = 2p_max (i.e., N/2) on the y-axis gives a linear relationship, p2 = 0.497622 × N with R² = 0.9961, as shown in Fig. 5. The upper line is the actual simulated concentration and the lower line is the value calculated by the method of this patent. The figure shows that the concentration calculated by the method of the invention differs very little from the simulated concentration, i.e., the assessment method is highly accurate.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall fall within its scope of protection.