Disclosure of Invention
In view of the technical problems in the prior art, the invention provides the use of a gene marker combination in preparing a probe combination and/or a kit and/or a system for detecting solid tumors in children. The gene marker combination supports detection of multiple biomarker types, including SNV, INDEL, CNV and gene fusion, and captures disease-related genes with greater sensitivity and specificity, so that genetic variations that guide diagnosis and treatment are detected more efficiently.
To achieve this purpose, the invention adopts the following technical solutions:
In a first aspect of the embodiments of the present invention, there is provided a use of a gene marker combination for preparing a probe combination and/or a kit for detecting a solid tumor in a child, the gene marker combination comprising:
a first gene marker combination for detecting single nucleotide variations SNV, insertions and deletions INDEL and copy number variations CNV; the first gene marker combination comprises: MTOR 36 1, IDH1, IDH2, TP53, ATRX, TERT, FGFR1, BRAF, H3F3A, HIST1H3A, NF2, CTNNB1, SMARCA4, DDX3X, ARID 1X, ARID 1X, NF X, DRAMER X, DROSHA, DGCR X, PPM 1X, SMO, SIX X, SIX X, MYCN, LT X, BCOR, NONO, WT X, TPM ACTB, COL6A X, MAP3K X, ASXL X, BRD X, NOTCH X, RLIM, CREBBP, MET, AFAG X, YA, GNAQ, GNA X, NRAS X, KRAS, PHPHK 3, PHAK X, NOCK X, PACCH X, PAC X, PHAK X, PHSACK X, PHAK X, PSN X, PAX, VGLL, TEAD, SRF, CIC 3H7, SS, SSX, SSX, SS18L, DNAJB, TACC, MAML, PCDHGA, NTRK, RELA, MAMLD, FAM118, NUTM2, ALK, KLF, CREB3L, CREB, ATF, FLI, ETV, ETV, FEV, ERG, PBX, SMARCA, USP, PDGB, PDGFB, MTCAMA, CITED, NCOA, MAML, SSX, PRKACA, ACVR, PDGFRA, PIK3R, KMT2, KBTBD, ZYM, PTEN, GSE, FBXW, CTDNEP, GFI1, SYNE, PRKAR1, NEB, MUC, STAG, SCN, RYRR, RYRYRYRYRYR, FU, SCN4, DISM 3, CHCSF, DNAH, CIC, SAND, SABR, SARGD, SDBR, SARD, SDBR, FAT, SDBR, FAK, SABR, SDBR, FAK, SDBR, SABR, SDBR, FAB, SARBF, SDBR, SABR, PSK, PSR, FORD, FO;
and a second gene marker combination for detecting a gene marker combination of the fused genetic variation FUSION; the second gene marker combination comprises: BRAF, TACC1, QKI, MAML2, PCDHGA1, NTRK2, RELA, MAMLD1, TFE3, NUTM2B, NTRK3, MET, NTRK1, TPM3, TFG, PRCC, ASPSCR1, ALK, KLF15, CREB3L1, CREB1, ATF1, FLI1, ETV1, ETV4, FEV, ERG, WT1, PBX1, SMARCA5, NR4A3, EWSR1, USP6, PDGFRB, FUS, CREB3L2, PDGFB 1, CITED 1, NCAF3672, 1L 1, YA 1, NOYAOR, SSX1, SSX1, SSX1, SSKACA 1, PRKAFB 1, FOMTA 1, FO 1, PAC 1, PSN 1, PSCANFET 1, PSCANFC 1, PSN 1, PSC 1, PSCANFET 1, PSC 1, PSCANFR 1, PSN 1, PSC 1, PSCANFB 1, PSN 1, PSC 1, PSN 1, PSC 1, PSN 1, PSC 1, PSN 1, PSC 1, PSN.
In the above technical solution, the solid tumor of the child comprises one of: brain tumor, neuroblastoma, kidney tumor, soft tissue tumor, Langerhans cell histiocytosis, osteogenic sarcoma, germ cell tumor, Ewing tumor, and retinoblastoma.
Mutations can be broadly classified into single-base-pair variations (SNVs/SNPs), small insertions or deletions (InDels, ≤50 bp), structural variations (SVs, >50 bp) and fusion gene mutations.
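The size-based part of this classification can be illustrated with a minimal sketch. It assumes variants are given as VCF-style REF/ALT allele strings; the function name `classify_variant` and the `MNV` label for equal-length multi-base substitutions are illustrative assumptions, and fusion gene mutations are not covered because they cannot be recognized from allele lengths alone.

```python
def classify_variant(ref: str, alt: str, indel_max: int = 50) -> str:
    """Classify a variant by the lengths of its REF and ALT alleles.

    indel_max is the 50 bp boundary between small InDels and
    structural variations described in the text.
    """
    if len(ref) == 1 and len(alt) == 1:
        # Single-base change (or no change if both alleles are identical)
        return "SNV" if ref.upper() != alt.upper() else "no_change"
    size = abs(len(ref) - len(alt))
    if size == 0:
        # Equal-length multi-base substitution; label is an assumption
        return "MNV"
    return "InDel" if size <= indel_max else "SV"


if __name__ == "__main__":
    print(classify_variant("A", "C"))             # single-base substitution
    print(classify_variant("A", "ATG"))           # 2 bp insertion
    print(classify_variant("A" + "T" * 60, "A"))  # 60 bp deletion
```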
Single-base-pair variations (SNVs/SNPs) are both single-nucleotide changes, but they differ on closer inspection. SNPs are generally defined at the population level and are present in a certain proportion of the population (well characterized), whereas SNVs are generally defined at the individual level and occur at very low frequency (not well characterized). Single nucleotide polymorphisms (SNPs) are single-base changes, including transitions, transversions, deletions and insertions; they are the most common type of genetic variation in humans and give rise to polymorphism in nucleic acid sequences. Because they are genetically stable, numerous and widely distributed, SNPs are widely used in population genetics, mapping of disease-related genes and similar research. Single nucleotide variations (SNVs) are variations of a single nucleotide in a DNA sequence. Three patterns are common: single-site nucleotide substitution, deletion and insertion. In substitution, a single nucleotide at a site in the genome is changed into another nucleotide; in deletion, a single nucleotide at a site is lost; and in insertion, a single nucleotide is added at a site.
Insertion-deletion (InDel) refers to the insertion or deletion of a small sequence fragment at a position in the genome, usually less than 50 bp in length. Unlike an SNP, it is not a single-base change but the insertion or deletion of DNA fragments of varying size. Its frequency in the genome is second only to that of SNPs, and many InDels occur at important positions within genes, such as exon regions and promoter regions. Such variation often causes a major change in gene function, so InDels are also a very important class of genomic structural variation.
Structural variation (SV) covers a broader range of variant types of 50 bp or more. According to type, it can be further divided into long-fragment insertion, deletion, inversion, intra-chromosomal translocation, inter-chromosomal translocation, copy number variation (CNV), and some more complex forms of variation.
A fusion gene is a chimeric gene formed when the coding regions of two or more genes are joined end to end and placed under the control of the same set of regulatory sequences (including promoters, enhancers, ribosome binding sequences, terminators and the like). The expression product of a fusion gene is a fusion protein.
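As an illustration of this definition, the following minimal sketch flags a fusion candidate when the two sides of a breakpoint map to two different genes of the second gene marker combination. The function name `fusion_candidate`, the idea of annotated 5' and 3' partner genes, and the small `FUSION_PANEL` subset are assumptions made for illustration only; real fusion callers work from split reads and discordant read pairs.

```python
from typing import Optional, Set, Tuple

# Illustrative subset of the second gene marker combination listed above
FUSION_PANEL: Set[str] = {"EWSR1", "FLI1", "ERG", "ALK", "NTRK1", "TPM3", "BRAF"}


def fusion_candidate(
    gene_5p: str, gene_3p: str, panel: Set[str] = FUSION_PANEL
) -> Optional[Tuple[str, str]]:
    """Return (5' partner, 3' partner) if both breakpoint sides map to
    two different panel genes, otherwise None."""
    if gene_5p == gene_3p:
        return None  # both sides in the same gene: not a gene fusion
    if gene_5p in panel and gene_3p in panel:
        return (gene_5p, gene_3p)
    return None


if __name__ == "__main__":
    print(fusion_candidate("EWSR1", "FLI1"))
    print(fusion_candidate("EWSR1", "GAPDH"))
```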
In a second aspect of the embodiments of the present invention, there is provided a probe combination for detecting a solid tumor of a child, the probe combination comprising:
a targeting DNA probe: for capturing the target regions of said first gene marker combination;
a fusion DNA probe: for capturing gene fusions, the gene fusions including DNA fragments resulting from the fusion of any two genes in the second gene marker combination.
Further, the targeting DNA probe and the fusion DNA probe are oligonucleotides that are complementary to the target DNA and to the fusion DNA, respectively.
Further, the probe size of the targeting DNA probe and the fusion DNA probe is 75-200 nucleotides.
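The probe size constraint can be sketched as follows. This is a minimal illustration, not the probe design used in the invention: the default probe length of 120 nt, the 60 nt step, and the function names `reverse_complement` and `tile_probes` are assumptions, and only the 75-200 nucleotide range comes from the text.

```python
from typing import List


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    table = str.maketrans("ACGT", "TGCA")
    return seq.upper().translate(table)[::-1]


def tile_probes(target: str, probe_len: int = 120, step: int = 60) -> List[str]:
    """Tile probes complementary to the target region.

    probe_len must lie in the 75-200 nt range stated in the text;
    step controls the overlap between neighbouring probes.
    """
    if not 75 <= probe_len <= 200:
        raise ValueError("probe length must be 75-200 nucleotides")
    if step <= 0:
        raise ValueError("step must be positive")
    if len(target) < probe_len:
        raise ValueError("target is shorter than one probe")
    last_start = len(target) - probe_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the 3' end is covered
    return [reverse_complement(target[s:s + probe_len]) for s in starts]


if __name__ == "__main__":
    region = "ACGT" * 75  # hypothetical 300 nt target region
    probes = tile_probes(region)
    print(len(probes), {len(p) for p in probes})
```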
In a third aspect of the embodiments of the present invention, there is provided a kit for detecting a solid tumor in a child, the kit comprising the above probe combination for detecting a solid tumor of a child.
In a fourth aspect of embodiments of the present invention, there is provided a system for detecting a solid tumor in a child, comprising:
a sequencing module for obtaining a sequencing result of the gene marker combination;
a comparison module for performing bioinformatics processing on the sequencing result to obtain processed data, and comparing the processed data with data from a normal sample to obtain gene mutation information, wherein the gene mutation information comprises the results for single nucleotide variation (SNV), insertion and deletion (INDEL), copy number variation (CNV) and fusion gene variation (FUSION).
Further, the bioinformatics processing specifically includes the following steps:
the sequencing result is a fastq file, and the fastq file is subjected to quality control, genome alignment, somatic and germline mutation analysis, and annotation.
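The comparison with a normal sample can be pictured with a deliberately simplified sketch: variants called in the tumor sample that are absent from the normal sample are treated as somatic candidates, and shared variants as germline candidates. The `Variant` tuple and the function `split_by_normal` are illustrative assumptions; production somatic callers use statistical models rather than a plain set comparison.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def split_by_normal(
    tumor: Iterable[Variant], normal: Iterable[Variant]
) -> Tuple[List[Variant], List[Variant]]:
    """Split tumor variants into (somatic candidates, shared with normal)."""
    normal_set = set(normal)
    somatic: List[Variant] = []
    shared: List[Variant] = []
    for v in tumor:
        (shared if v in normal_set else somatic).append(v)
    return somatic, shared


if __name__ == "__main__":
    # Hypothetical calls, for illustration only
    tumor_calls = [Variant("chr1", 100, "A", "G"), Variant("chr2", 200, "C", "T")]
    normal_calls = [Variant("chr2", 200, "C", "T")]
    print(split_by_normal(tumor_calls, normal_calls))
```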
In a fifth aspect of embodiments of the present invention, there is provided a system for detecting a solid tumor in a child, the system comprising:
a processor and a memory coupled to the processor, the memory storing instructions that, when executed by the processor, implement the following steps:
performing bioinformatics processing on the sequencing result to obtain processed data; and comparing the processed data with data from a normal sample to obtain gene mutation information, wherein the gene mutation information comprises the results for single nucleotide variation (SNV), insertion and deletion (INDEL), copy number variation (CNV) and fusion gene variation (FUSION).
In a sixth aspect of embodiments of the present invention, there is provided a computer-readable storage medium having a computer program stored thereon, wherein the computer program implements the steps described above when executed by a processor.
One or more technical solutions in the embodiments of the present invention have at least the following technical effects or advantages:
The gene marker combination provided by the embodiments of the invention is used in preparing a probe combination and/or a kit and/or a system for detecting solid tumors in children. It covers the hotspot variant types of the tumor types currently common among pediatric solid tumors, and detects gene fusion, gene mutation and gene copy number variation simultaneously. Compared with performing WES, RNA-seq and CNV testing separately, the invention detects variant sites comprehensively in a single assay, supports a comprehensive assessment of the disease, and provides a basis for auxiliary diagnosis, prognosis evaluation and targeted medication of pediatric solid tumors. In addition, few gene-detection panels for pediatric solid tumors are currently available in China, and the invention provides a molecular-level genetic basis for disease diagnosis, prognosis evaluation and targeted medication.
Detailed Description
The present invention is further described in detail below with reference to specific examples so that those skilled in the art can more clearly understand the present invention.
The following examples are provided only for illustrating the present invention and are not intended to limit the scope of the present invention. All other embodiments obtained by a person skilled in the art based on the specific embodiments of the present invention without any inventive step are within the scope of the present invention.
In the examples of the present invention, all the raw material components are commercially available products well known to those skilled in the art, unless otherwise specified; in the examples of the present invention, unless otherwise specified, all technical means used are conventional means well known to those skilled in the art.
In order to solve the technical problems, the general idea of the embodiment of the application is as follows:
The invention identifies a gene marker combination for detecting solid tumors by screening the TARGET database, the St. Jude database, the GEO database, the COSMIC database, SRA data, and genes frequently reported in the literature as related to targeted therapy of major cancers, wherein the gene marker combination comprises:
a first gene marker combination for detecting single nucleotide variations SNV, insertions and deletions INDEL and copy number variations CNV; the first gene marker combination comprises: MTOR 36 1, IDH1, IDH2, TP53, ATRX, TERT, FGFR1, BRAF, H3F3A, HIST1H3A, NF2, CTNNB1, SMARCA4, DDX3X, ARID1A, ARID1A, NF A, DRAMER A, DROSHA, DGCR A, PPM 1A, SMO, SIX A, SIX A, MYCN, LT A, BCOR, NONO, WT A, TPM ACTB, COL6A A, MAP3K A, ASXL A, BRD A, NOTCH A, RLIM, CREBBP, MET, AFAG A, YA, GNAQ, GNA A, NRAS A, KRAS, PHPHK 3, PHK A, NOCK A, PACCH A, PAC A, PHAK A, PHN A, PSN, PAX 36 3, PAX7, VGLL2, TEAD1, SRF, CIC, ZC3H7B, SS18, SSX2, SSX1, SS18L1, DNAJB1, TACC1, MAML 1, PCDHGA1, NTRK1, RELA, MAMLD1, FAM118 1, SSTM 21, ALK, KLF1, CREB3L1, CREB1, ATF1, FLI1, ETV1, ETV1, FEV, ERG, PBX1, SMARCA 1, USP 1, PDGFRB, PDG 1, CAMTA1, CITED 1, NCOA 1, 1, MAML 1, SSX1, PRKACA 1, ACKANFCA 1, PDGF 1, PSNFR 1, PSOB 1, PSE 1, PSOB 1, PSE 1, PSOB 1, PSE 1, PSOB 1, PSE 1, PSOB 1, PSE 1, PSOB 1, PSE 1, PSOB 1, PSE;
and a second gene marker combination for detecting a gene marker combination of the fused genetic variation FUSION; the second gene marker combination comprises: BRAF, TACC1, QKI, MAML2, PCDHGA1, NTRK2, RELA, MAMLD1, TFE3, NUTM2B, NTRK3, MET, NTRK1, TPM3, TFG, PRCC, ASPSCR1, ALK, KLF15, CREB3L1, CREB1, ATF1, FLI1, ETV1, ETV4, FEV, ERG, WT1, PBX1, SMARCA5, NR4A3, EWSR1, USP6, PDGFRB, FUS, CREB3L2, PDGFB 1, CITED 1, NCAF3672, 1L 1, YA 1, NOYAOR, SSX1, SSX1, SSX1, SSKACA 1, PRKAFB 1, FOMTA 1, FO 1, PAC 1, PSN 1, PSCANFET 1, PSCANFC 1, PSN 1, PSC 1, PSCANFET 1, PSC 1, PSCANFR 1, PSN 1, PSC 1, PSCANFB 1, PSN 1, PSC 1, PSN 1, PSC 1, PSN 1, PSC 1, PSN 1, PSC 1, PSN.
Based on next-generation sequencing, DNA probes for the gene marker combination are used for targeted enrichment of the important exon regions and some intron regions of 251 genes, so that disease-related genes are captured with greater sensitivity and specificity and genetic variations that guide diagnosis and treatment are detected more efficiently. The invention can detect multiple biomarker types, including SNV, INDEL, CNV and gene fusion.
The gene probe used in the present invention comprises a hybridizing nucleotide sequence complementary to a target position of a target nucleic acid. The term "complementary" means that the primer or probe is sufficiently complementary to hybridize selectively to a target nucleic acid sequence under hybridization conditions; it covers both substantial complementarity and perfect complementarity, with perfect complementarity preferred. The term "substantially complementary sequence" as used herein includes not only completely identical sequences but also sequences that differ in part from the target sequence being compared yet can still function as primers for that specific target sequence. Accordingly, the gene probe and primer of the present invention need not be completely complementary to the corresponding part of the template sequence; it is enough that they are sufficiently complementary to hybridize with the template and perform their intended function. The design of the gene probe is well within the ability of those skilled in the art and can be performed, for example, with a primer design program (e.g., the Primer3 program). The probes used in the examples of the present invention were purchased from a commercial probe manufacturer, preferably Integrated DNA Technologies (IDT), Inc., United States.
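The distinction between perfect and substantial complementarity can be made concrete with a small sketch that reports the fraction of probe bases able to pair with the target site in antiparallel orientation. The function name `complementarity` is an assumption, and no numeric threshold for "substantial" is implied, since the text does not specify one; actual hybridization also depends on temperature, salt and sequence context.

```python
def complementarity(probe: str, target_site: str) -> float:
    """Fraction of probe positions that form Watson-Crick pairs with the
    target site when the two strands are aligned antiparallel.

    Both sequences are written 5'->3' and must have equal length.
    """
    if not probe or len(probe) != len(target_site):
        raise ValueError("probe and target site must be non-empty and equal in length")
    pairs = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}
    target_reversed = target_site.upper()[::-1]  # read the target 3'->5'
    matches = sum((p, t) in pairs for p, t in zip(probe.upper(), target_reversed))
    return matches / len(probe)


if __name__ == "__main__":
    print(complementarity("ACGTT", "AACGT"))  # perfectly complementary probe
    print(complementarity("ACGTA", "AACGT"))  # one mismatched position
```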
Unless otherwise specifically stated, various raw materials, reagents, instruments, equipment and the like used in the examples of the present invention are commercially available or can be prepared by an existing method.
The following will describe in detail the application of a gene marker combination in the present application in the preparation of probe combinations and/or kits and/or systems for detecting solid tumors in children, with reference to examples, comparative examples and experimental data.
Example 1 Detection of solid tumors in children
Step 1, DNA extraction
DNA was extracted from paraffin-embedded tumor tissue samples of three patients (Sample01, Sample02, Sample03). The Qiagen commercial extraction kit GeneRead DNA FFPE Kit (cat. no. 180134) is recommended.
Step 2, NGS library preparation
500 ng of each patient DNA sample was fragmented by sonication or enzymatic digestion to a fragment size of 300-400 bp. HD780 is a commercial multi-concentration-gradient cfDNA reference standard comprising wild-type, 1% and 5% standards; 50-100 ng was taken for each concentration, and each concentration was run in duplicate, labeled wild type (Wt1a, Wt1b), 1% (MUT1a, MUT1b) and 5% (MUT2a, MUT2b). The NGS libraries were prepared using the IDT universal library construction kit, the Hyper Prep Kit, or a TruSeq series DNA library preparation kit.
The quality of each NGS library was assessed by measuring its concentration and fragment size with a Qubit 3.0 fluorometer and an Agilent 2100 system. Step 3 was carried out once the NGS library passed quality control.
Step 3, Target capture using the pediatric solid tumor gene panel of the invention
Following the protocol of the IDT hybridization and wash kit (xGen Hybridization and Wash Kit), 400-600 ng of each of the three patient NGS libraries (Sample01, Sample02, Sample03) and the 6 standard NGS libraries (Wt1a, Wt1b, MUT1a, MUT1b, MUT2a, MUT2b) was used for target capture.
After the libraries for target capture were pooled, they were hybridized with the panel probes of the invention at 65 ℃ for 4-16 h; overnight hybridization for 16 h is recommended.
The target DNA sequences captured by the probes were separated using streptavidin magnetic beads (Dynabeads™ M-270 Streptavidin), and the target DNA library was obtained by purification after a heated wash and a room-temperature wash. Finally, the target DNA library was amplified by PCR and then purified.
The quality of the captured library was assessed by measuring its concentration and fragment size with a Qubit 3.0 fluorometer and an Agilent 2100 system (Fig. 1). Sequencing was performed once the library passed quality control.
Step 4, Sequencing and data analysis
The library was loaded onto an Illumina NGS sequencer (e.g. NovaSeq 6000) for sequencing to obtain the raw sequence data files.
Bioinformatics analysis was performed on the raw data files to obtain a sequence QC report and a genetic variation file for each sample.
Positive variant sites were determined by combining clinical features with the variant annotations from databases such as HGMD, OMIM and ClinVar. Table 1 shows the SNV detection results for one sample; the gene panel detects variants at a variant frequency of 1%, reaching the minimum detection limit for tumor detection.
Table 2 shows the clinically relevant variants of the 3 samples and their clinical significance: the gene fusion in Sample01 indicates sensitivity to targeted drugs as well as tumor typing and prognosis; the gene copy number variation in Sample02 indicates prognosis, the patient being assessed as high-risk; and the gene fusion in Sample03 clearly indicates the tumor type.
Table 1 SNV (single nucleotide variation) detection in one sample
| Gene | Chromosome position a | Genetic variation | Frequency of variation |
| SMARCA4 | chr19:11172458 | NM_003072:intron34:c.4912-2A>C | 10.5% |
| CLCN6 | chr1:11888205 | NM_001286:exon11:c.883C>T:p.R295C | 51.6% |
| PBX1 | chr1:164529166 | NM_002585:exon1:c.107A>G:p.E36G | 6.7% |
| OBSCN | chr1:228456386 | NM_052843:exon17:c.5017C>T:p.R1673C | 1.1% |
| OBSCN | chr1:228506884 | NM_052843:exon54:c.14431G>A:p.A4811T | 49.8% |
| NEB | chr2:152534437 | NM_004543:exon33:c.3520G>A:p.A1174T | 51.7% |
| FGFR3 | chr4:1807297 | NM_000142:exon12:c.1546G>C:p.D516H | 48.3% |
| PDGFRB | chr5:149505089 | NM_002609:exon12:c.1726G>T:p.G576C | 47.2% |
| THBS2 | chr6:169634880 | NM_003247:exon11:c.1600G>A:p.V534M | 47.2% |
| NUTM2B | chr10:81470422 | NM_001278495:exon5:c.1636A>G:p.K546E | 1.6% |
| GNAS | chr20:57429939 | NM_080425:exon1:c.1619C>T:p.P540L | 1.3% |
| RLIM | chrX:73811792 | NM_016120:exon4:c.1358G>A:p.S453N | 2.9% |
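The "Genetic variation" column of Table 1 uses colon-separated annotation strings (transcript, exon or intron, cDNA change, and an optional protein change). A minimal sketch for reading such entries is given below; the names `Annotation`, `parse_annotation` and `parse_frequency` are illustrative assumptions, and the example rows are copied from Table 1.

```python
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    transcript: str
    region: str
    cdna: str
    protein: Optional[str]


def parse_annotation(text: str) -> Annotation:
    """Split 'transcript:region:c.change[:p.change]' into its fields."""
    parts = text.strip().split(":")
    if len(parts) not in (3, 4) or not parts[2].startswith("c."):
        raise ValueError(f"unexpected annotation format: {text!r}")
    protein = parts[3] if len(parts) == 4 else None
    return Annotation(parts[0], parts[1], parts[2], protein)


def parse_frequency(text: str) -> float:
    """Convert a percentage string such as '10.5%' into a fraction."""
    return float(text.strip().rstrip("%")) / 100


if __name__ == "__main__":
    rows = [  # (gene, annotation, frequency) taken from Table 1
        ("SMARCA4", "NM_003072:intron34:c.4912-2A>C", "10.5%"),
        ("OBSCN", "NM_052843:exon17:c.5017C>T:p.R1673C", "1.1%"),
        ("RLIM", "NM_016120:exon4:c.1358G>A:p.S453N", "2.9%"),
    ]
    for gene, annotation, freq in rows:
        print(gene, parse_annotation(annotation), parse_frequency(freq))
```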
Table 2 Positive variants in the 3 samples and their clinical significance
In conclusion, the pediatric solid tumor panel of the invention integrates the current hotspot genes of pediatric solid tumors and covers hotspot gene mutations, gene fusions and gene copy number variations. It captures target DNA efficiently and specifically for sequencing analysis, can effectively replace running WES (gene mutation detection) and RNA-seq (gene fusion detection) in parallel, and is highly cost-effective. It also indicates tumor classification, prognosis and treatment response, and targeted medication options, providing clinicians with comprehensive gene-level evidence during diagnosis and treatment.
The sensitivity of the panel of the invention was then verified using three levels of the commercial reference standard HD780 (Horizon): wild type, 1% and 5%. Each level was run in duplicate, and the results are shown in Table 3.
Table 3 Detection results for standard HD780 wild type (Wt1a, Wt1b), 1% (MUT1a, MUT1b) and 5% (MUT2a, MUT2b)
As is clear from Table 3, the gene panel of the present invention can detect variants at a variant frequency of 1%, reaching the minimum detection limit for tumor detection.
Example 2 Detection system for solid tumors in children
An embodiment of the present invention provides a system for detecting a solid tumor of a child, including:
a sequencing module for obtaining a sequencing result of the gene marker combination;
a comparison module for performing bioinformatics processing on the sequencing result to obtain processed data, and comparing the processed data with data from a normal sample to obtain gene mutation information, wherein the gene mutation information comprises the results for single nucleotide variation (SNV), insertion and deletion (INDEL), copy number variation (CNV) and fusion gene variation (FUSION).
The bioinformatics processing includes performing bioinformatics analysis on the sequencing result to obtain the mutation results for the sample. The sequencing result is a fastq file, which is analyzed with the variant-calling workflow published by the Broad Institute to obtain and annotate the gene mutation results. The workflow mainly comprises quality control of the fastq file, genome alignment, somatic and germline mutation analysis, and annotation. The software used comprises: a quality-control tool for the fastq file; bwa for genome alignment; GATK Mutect2 for somatic mutation calling; the HaplotypeCaller method for germline mutation calling; and ANNOVAR for annotation.
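The order of these steps can be sketched as a list of command lines. This is an assumption-laden outline rather than the exact workflow of the invention: the file names, thread count, read-group string, `hg19` build and `refGene` annotation database are hypothetical choices, `samtools` is added for sorting and indexing although the text does not name it, the quality-control step is omitted because its tool is not identified, and steps such as duplicate marking and `FilterMutectCalls` are left out for brevity. The function only builds the commands; it does not run them.

```python
from typing import List


def build_commands(
    sample: str,
    fq1: str,
    fq2: str,
    ref: str,
    normal_bam: str,
    normal_name: str,
    threads: int = 8,
) -> List[List[str]]:
    """Return argv lists for alignment, somatic and germline calling, and annotation."""
    # bwa expects a literal backslash-t in the read-group string
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    somatic_vcf = f"{sample}.somatic.vcf"
    germline_vcf = f"{sample}.germline.vcf"
    return [
        # Genome alignment with bwa
        ["bwa", "mem", "-t", str(threads), "-R", read_group, "-o", sam, ref, fq1, fq2],
        # Coordinate sorting and indexing (samtools is an assumption)
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        ["samtools", "index", bam],
        # Somatic calling against the matched normal sample
        ["gatk", "Mutect2", "-R", ref, "-I", bam, "-I", normal_bam,
         "-normal", normal_name, "-O", somatic_vcf],
        # Germline calling (shown here on the same BAM, as an assumption)
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", germline_vcf],
        # Annotation with ANNOVAR
        ["table_annovar.pl", somatic_vcf, "humandb/", "-buildver", "hg19",
         "-out", sample, "-remove", "-protocol", "refGene", "-operation", "g",
         "-nastring", ".", "-vcfinput"],
    ]


if __name__ == "__main__":
    for cmd in build_commands("Sample01", "S01_R1.fastq.gz", "S01_R2.fastq.gz",
                              "ref.fa", "normal.sorted.bam", "Normal01"):
        print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` in sequence once the tools and reference files are installed.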
The mutation information is then combined with clinical information to reach a final diagnosis.
Example 3 Detection system for solid tumors in children
The embodiment of the invention provides a system for detecting a solid tumor of a child, which comprises:
a processor and a memory coupled to the processor, the memory storing instructions,
the instructions, when executed by the processor, implementing the following steps:
performing bioinformatics processing on the sequencing result to obtain processed data; and comparing the processed data with data from a normal sample to obtain gene mutation information, wherein the gene mutation information comprises the results for single nucleotide variation (SNV), insertion and deletion (INDEL), copy number variation (CNV) and fusion gene variation (FUSION).
Example 4 Computer-readable storage medium
Embodiments of the present invention provide a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the method of Example 2 and/or the method of Example 3.
Of course, the storage medium provided by the embodiment of the present invention contains computer-executable instructions, and the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the method provided by any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes instructions for enabling an electronic device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the foregoing embodiment, each included unit and each included module are only divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
Finally, it should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the embodiments of the present invention and their equivalents, the embodiments of the present invention are also intended to encompass such modifications and variations.