CN114300054B - A method for searching the Alpha satellite DNA sequence in the centromere region of human chromosomes - Google Patents

A method for searching the Alpha satellite DNA sequence in the centromere region of human chromosomes

Info

Publication number
CN114300054B
Authority
CN
China
Prior art keywords
chromosome
sequence
repeated
centromere
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111667386.8A
Other languages
Chinese (zh)
Other versions
CN114300054A (en)
Inventor
李贵喜
王少辉
胡娟
霍清园
李三华
齐华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan Celnovtebio Biotechnology Inc
Original Assignee
Henan Celnovtebio Biotechnology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan Celnovtebio Biotechnology Inc
Priority to CN202111667386.8A
Publication of CN114300054A
Application granted
Publication of CN114300054B
Active
Anticipated expiration

Abstract

The invention relates to a method for searching for Alpha satellite DNA sequences in the centromere regions of human chromosomes, and belongs to the technical field of molecular bioinformatics. An existing tool is first used to obtain a matching file that records how each repeated sequence matches in every chromosome. A script is then designed, linked to the matching file, and run in the background to count automatically how many times each repeated sequence in the matching file occurs in each chromosome; whether a repeated sequence is specific to the target chromosome is judged from these per-chromosome counts. The matching file does not need to be opened for manual, item-by-item statistical analysis: running the designed script against the matching file finds the specific repeated sequences automatically.

Description

Method for searching Alpha satellite DNA sequence of human chromosome centromere region
Technical Field
The invention relates to a method for searching for Alpha satellite DNA sequences in the centromere regions of human chromosomes, and belongs to the technical field of molecular bioinformatics.
Background
Alpha satellite DNA is tandemly repeated DNA: the centromere of each human chromosome contains differing copies of a sequence of approximately 170 bp. Chromosome-specific Alpha satellite DNA sequences are often used to prepare reference probes in fluorescence in situ hybridization (FISH) probe combinations.
Because human chromosome centromeres range in size from 0.2 Mb to 6.2 Mb, screening such a large amount of data for Alpha satellite DNA sequences with good specificity is a very heavy workload when it relies on conventional biological software and sequence alignment methods. At present there is no simple method for finding Alpha satellite DNA sequences specific to the centromere region of a human chromosome, so a search method for specific Alpha satellite DNA sequences needs to be studied.
Disclosure of Invention
The invention aims to provide a method for searching for Alpha satellite DNA sequences in the centromere regions of human chromosomes, so as to solve the current problem of low efficiency in finding Alpha satellite DNA sequences specific to a human chromosome centromere region.
To solve the above technical problem, the invention provides a method for searching for Alpha satellite DNA sequences in the centromere regions of human chromosomes, comprising the following steps:
1) Obtaining the DNA sequence of the centromere region of the target chromosome and the DNA sequences of the centromere regions of the 24 human chromosomes;
2) Extracting repeated sequences from the obtained DNA sequence of the target chromosome centromere region;
3) Matching the obtained repeated sequences against the DNA sequences of the 24 human chromosome centromere regions to obtain a file, referred to as the matching file, that records how each repeated sequence matches in each of the 24 centromere regions;
4) Designing a script, linking the script to the matching file, and counting the number of occurrences of each repeated sequence in the matching file in each of the 24 human chromosome centromere regions; when a repeated sequence occurs far more often in the DNA sequence of the target chromosome centromere region than in those of the other chromosomes, it is taken to be an Alpha satellite DNA repeated sequence specific to the target chromosome centromere.
The invention first uses an existing tool to obtain a matching file that records how each repeated sequence matches in every chromosome. A script is then designed, linked to the matching file, and run in the background to count automatically how many times each repeated sequence in the matching file occurs in each chromosome; whether a repeated sequence is specific to the target chromosome is judged from these per-chromosome counts. The matching file does not need to be opened for manual, item-by-item statistical analysis: running the designed script against the matching file finds the specific repeated sequences automatically.
Further, the script is written in the Perl language.
Because Perl is used as the scripting language, no compiler or linker is needed to run the code; the script can be run directly against the matching file to count the occurrences of the repeated sequences, which further improves search efficiency.
Further, when the occurrences are counted in step 4), each match whose degree of matching with the repeated sequence is greater than a set threshold is recorded as one repeat.
By setting a threshold and counting only sequences whose degree of matching exceeds it, the sequences that meet the condition can be picked out of the matching file quickly, which further improves counting efficiency.
Further, the set threshold is 85%.
The threshold is set to 85% so that the statistical result meets the specificity requirement.
Further, before the obtained repeated sequences are compared with the DNA sequences of the 24 human chromosome centromere regions, those 24 sequences are merged, and the repeated sequences are compared with the merged sequence.
To allow uniform matching and avoid opening each chromosome sequence separately, the chromosome sequences are merged into a single database, which improves matching efficiency.
Further, in step 2), the DNA sequence of the target chromosome centromere region is divided into several short sequences before repeated-sequence extraction.
Because the online website used for repeated-sequence extraction limits the length of the submitted data, the target chromosome sequence is divided into several short sequences to meet the website's requirement.
Further, the method also comprises converting the statistical result of step 4) into a table so that it can be displayed visually.
To display the search result intuitively, the statistical result produced by the script is copied into a table, where staff can inspect it conveniently.
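As a minimal sketch of this tabulation step, written in Python rather than the Perl named by the invention, per-chromosome counts could be written to a CSV file that opens directly in a spreadsheet. The function name `write_count_table`, the nested-dictionary layout, the choice of rows for chromosomes and columns for repeat names, and the sample counts are all assumptions made for illustration; none of them comes from the patent.

```python
import csv
import io

# Chromosome labels 1-22, X, Y, in display order.
CHROMOSOMES = [str(i) for i in range(1, 23)] + ["X", "Y"]

def write_count_table(counts, handle):
    """Write counts[repeat_name][chromosome] as a comma-separated table.

    Rows are chromosomes and columns are repeat names (an assumed layout);
    combinations with no recorded hits are written as 0.
    """
    repeat_names = sorted(counts)
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["chromosome"] + repeat_names)
    for chrom in CHROMOSOMES:
        row = [counts[name].get(chrom, 0) for name in repeat_names]
        writer.writerow([chrom] + row)

# Hypothetical counts, for illustration only.
example = {"repeat_A": {"6": 12, "1": 1}, "repeat_B": {"X": 4}}
buffer = io.StringIO()
write_count_table(example, buffer)
print(buffer.getvalue())
```

Writing the table from code removes the manual copy step, although copying the script output into a spreadsheet, as the text describes, gives the same table.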
Drawings
FIG. 1 is a flow chart of a method of searching Alpha satellite DNA sequences of the chromosome centromere region of the present invention;
FIG. 2 is a diagram illustrating a UCSC website downloading process according to an embodiment of the present invention;
FIG. 3-a is a schematic diagram of an interface of an online website employed in an embodiment of the present invention;
FIG. 3-b is a schematic representation of a repeating sequence extracted in an embodiment of the present invention;
FIG. 4 shows the interface of the TBtools tool used in an embodiment of the present invention;
FIG. 5 is a diagram showing a matching file obtained in an embodiment of the present invention;
FIG. 6-a shows the hybridization result of the fluorescently labeled Alpha satellite DNA probe for the centromere region of chromosome 4 together with a gene probe;
FIG. 6-b shows the hybridization result of the fluorescently labeled Alpha satellite DNA probe for the centromere region of chromosome 6 together with a gene probe;
FIG. 6-c shows the hybridization result of the fluorescently labeled Alpha satellite DNA probe for the centromere region of chromosome 20 together with a gene probe.
Detailed Description
The following describes the embodiments of the present invention further with reference to the drawings.
The invention addresses the following problem: the matching file produced by conventional software is large, typically several gigabytes, and contains a great deal of information, so it takes some time to open; after opening, the occurrences of each repeated sequence in each chromosome have to be determined by manual counting, which is inefficient and error-prone. A method for searching for Alpha satellite DNA sequences in the centromere regions of human chromosomes is therefore provided, belonging to the technical field of molecular bioinformatics. An existing tool is first used to obtain a matching file that records how each repeated sequence matches in every chromosome; a script is then designed, linked to the matching file, and run in the background to count automatically how many times each repeated sequence occurs in each chromosome, and whether a repeated sequence is specific to the target chromosome is judged from these counts. The implementation flow of the method is shown in FIG. 1, and the specific process is as follows.
1. Obtaining the DNA sequence of the target chromosome centromere region and the DNA sequences of the 24 human chromosome centromere regions.
This embodiment takes chromosome 6 as the target chromosome. The DNA sequence of the chromosome 6 centromere region, 4.1 Mb in total, can be obtained from the UCSC website, and the DNA sequences of the 24 chromosome centromere regions are downloaded from the same website. The UCSC website interface is shown in FIG. 2.
2. Merging the obtained DNA sequences of the 24 human chromosome centromere regions.
To facilitate subsequent searching and comparison, the obtained DNA sequences of the 24 human chromosome centromere regions are merged. In this embodiment, the TBtools tool is downloaded and installed (https://download.csdn.net/download/dingbp/10700820) and used to merge the 24 chromosome centromere DNA sequences into a new database named "1-Y".
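The embodiment performs this merge in the TBtools interface. Purely as an illustration of what merging into one database amounts to, the following Python sketch concatenates per-chromosome FASTA files and tags each header with its chromosome so that later hits can be traced back. Only the database name "1-Y" comes from the text; the function `merge_fasta`, the `chrN|` header prefix, the file names and the toy sequences are assumptions.

```python
import os
import tempfile

def merge_fasta(inputs, output_path):
    """Concatenate FASTA files, prefixing each header with a chromosome label.

    `inputs` maps a chromosome label (for example "6") to a FASTA file path.
    """
    with open(output_path, "w") as out:
        for label, path in inputs.items():
            with open(path) as src:
                for line in src:
                    if line.startswith(">"):
                        # Tag the header so each hit can be traced to its chromosome.
                        out.write(f">chr{label}|{line[1:].strip()}\n")
                    elif line.strip():
                        out.write(line.rstrip("\n") + "\n")

# Demo with two tiny made-up sequences (not real centromere data).
with tempfile.TemporaryDirectory() as tmp:
    paths = {}
    for label, seq in [("1", "ACGT"), ("X", "TTGG")]:
        paths[label] = os.path.join(tmp, f"cen{label}.fa")
        with open(paths[label], "w") as handle:
            handle.write(f">centromere\n{seq}\n")
    merged = os.path.join(tmp, "1-Y.fa")
    merge_fasta(paths, merged)
    with open(merged) as handle:
        merged_text = handle.read()
print(merged_text)
```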
3. Obtaining the repeated sequences.
Repeated-sequence analysis is performed on the obtained DNA sequence of the target chromosome centromere region to obtain the corresponding repeated sequences (Masked Regions). The analysis is carried out on an online website (https://www.girinst.org/censor/index.php), shown in FIG. 3-a. Before upload, the target sequence is split into several short sequences according to the website's limit on the maximum amount of submitted data, and the short sequences are uploaded separately. The repeated sequences obtained from the website for the target chromosome (chromosome 6) are shown in FIG. 3-b.
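The splitting step can be sketched in a few lines of Python. The text does not state the website's size limit, so `max_len` is a hypothetical parameter, and `split_sequence` is an illustrative name rather than anything used in the embodiment.

```python
def split_sequence(sequence, max_len):
    """Split a sequence into consecutive pieces of at most max_len bases."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [sequence[i:i + max_len] for i in range(0, len(sequence), max_len)]

pieces = split_sequence("ACGTACGTAC", 4)
print(pieces)  # ['ACGT', 'ACGT', 'AC']
```

A simple cut like this can divide a repeat that spans a boundary between two pieces; whether the embodiment overlaps adjacent pieces to avoid that is not stated in the text.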
4. Comparing the obtained repeated sequences with the DNA sequences of the 24 human chromosome centromere regions to obtain the corresponding matching file.
In this embodiment, the TBtools tool is used to align the repeated sequences (Masked Regions) obtained in step 3 against the new database "1-Y" obtained in step 2; the TBtools interface is shown in FIG. 4, and the output file is named "123". The file generated by TBtools is generally large, up to several gigabytes, and records how the repeated sequences align to the DNA sequences of the 24 human chromosome centromere regions, as shown in FIG. 5.
5. Using a script to organize and filter the file obtained in step 4 so as to find the Alpha satellite DNA sequences of the human chromosome centromere region.
Because the matching file is large, opening it directly takes some time. To improve efficiency, the invention designs a script that runs under Perl; a command in the script names the file to be processed, and the Perl program analyzes and counts the contents of that file. The script is linked directly to the matching file, so running it processes the matching file in the background without the file being opened, and the statistics are produced automatically.
The script counts the occurrences of each repeated sequence in the matching file in each of the 24 human chromosome centromere regions, in order to judge whether the repeated sequence is an Alpha satellite DNA repeated sequence specific to the target chromosome centromere. When counting, each match whose degree of matching with the repeated sequence is greater than a set threshold is recorded as one repeat. The threshold can be chosen according to the actual situation; in this embodiment it is 85%.
The script constructed in this example is shown below, where the input file name is "123", the output file name is "123result.txt", and the value 85 denotes 85% sequence matching (experiments verified that sequences with a degree of matching above 85% hybridize effectively).
To avoid a compile-and-link step, the invention uses Perl as the scripting language; the specific implementation code is as follows:
The script yields the final statistical result. Its output file is in txt format; to display the statistics intuitively, the result is copied into an Excel table, as shown in Table 1.
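The Perl listing itself does not appear in this text. The Python sketch below is not that script; it only illustrates the counting rule described in step 5, under the assumption that the matching file is tab-separated with the repeat name, the chromosome name and the percent identity in its first three columns (the column layout of BLAST tabular output). The function names and the demo rows are hypothetical.

```python
from collections import defaultdict

def count_hits(lines, threshold=85.0):
    """Count hits per (repeat, chromosome) whose identity exceeds the threshold.

    Each line is assumed to be tab-separated, with the repeat name, the
    chromosome name and the percent identity in the first three columns.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip rows that do not have the assumed columns
        repeat, chrom, identity = fields[0], fields[1], float(fields[2])
        if identity > threshold:
            counts[repeat][chrom] += 1
    return counts

def count_file(in_path, out_path, threshold=85.0):
    """Stream the matching file line by line and write one count per line."""
    with open(in_path) as handle:
        counts = count_hits(handle, threshold)
    with open(out_path, "w") as out:
        for repeat in sorted(counts):
            for chrom in sorted(counts[repeat]):
                out.write(f"{repeat}\t{chrom}\t{counts[repeat][chrom]}\n")

# Made-up rows: only identities strictly above 85 are counted.
demo = ["r1\tchr6\t97.5", "r1\tchr6\t85.0", "r1\tchr1\t91.2", "r1\tchr3\t70.0"]
result = count_hits(demo)
print(dict(result["r1"]))  # {'chr6': 1, 'chr1': 1}
```

With the file names given in the text, the call would be `count_file("123", "123result.txt")`. Reading the file as a stream keeps memory use low even when the matching file is several gigabytes.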
TABLE 1
In Table 1, the column headings are the repeated sequence names, the rows are the chromosome names (1 to Y), and each value is the number of times the sequence matches that chromosome with a degree of matching greater than 85%.
The sequence alignment shows that the sequence hg38_ DNA FRAGMENT 428655- >428822 occurs 1578 times on chromosome 6 and at most 3 times on each of chromosomes 1, 3, 12 and 20, so the sequence can in theory be identified as an Alpha satellite DNA repeated sequence specific to the chromosome 6 centromere.
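The text describes the decision only as the target count being "far more" than the counts elsewhere and gives no numeric cutoff. As a hedged illustration, the Python sketch below applies a hypothetical ratio cutoff (`min_ratio`, an assumption) to the figures quoted above, using 3 as the stated upper bound for chromosomes 1, 3, 12 and 20.

```python
def is_specific(counts, target, min_ratio=100.0):
    """Return True if the target count is at least min_ratio times the largest other count."""
    target_count = counts.get(target, 0)
    others = [n for chrom, n in counts.items() if chrom != target]
    highest_other = max(others, default=0)
    if target_count == 0:
        return False
    if highest_other == 0:
        return True  # the repeat was found on the target chromosome only
    return target_count / highest_other >= min_ratio

# 1578 is from the text; 3 is the stated upper bound for the other chromosomes.
counts = {"6": 1578, "1": 3, "3": 3, "12": 3, "20": 3}
print(is_specific(counts, "6"))  # True
print(1578 / 3)  # 526.0
```

Even at the upper bound, chromosome 6 carries at least 526 times as many matches as any other listed chromosome, which is the sense in which the sequence is treated as chromosome 6 specific.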
Through the above process, the invention obtained the specific repeated sequences of chromosomes 6, 4 and 20, which are as follows:
Chromosome 6 Alpha satellite DNA sequence:
TGTAGTATTTCCAAGCGGATATTTGGAACGCCTTGAAGCGTATGGTAGAAAAGGAAATATCTTTCCATAAAACCTAGACAGAACCCATCTCAGAAACGACTTTGTGATGTCTGCATTGAACTCACAGAGTTGAACATTTCTCTTGATAGAGCAGTTTTGAAACCCTCT
Chromosome 4 Alpha satellite DNA sequence:
CTGCACTACCAGGAAGTGGACATTTCGAGCGCTTTGAGGCTTATGGTGAAAAAGGAAATATCTTCTCATAAAAACCAGAAAGAAGCGTTCTCAGAAACTTCTTTGTGTTGTGTGTACTCATGTAACAGTGTTGAACCATCCTTTTGACAGAGCAGTTTTGAAACAATCTTT
Chromosome 20 Alpha satellite DNA sequence:
GGATGTTTCGATTGAAGTCCCAGTGTTGAACATTCCCTTTTATAGAGCAGGTTGGAAACACTCTTTCTGCATTCCCTGGAAGTGGACATTTAGAGCGCTTTCAGGACGACGGTGAAAATGGAAATATCTTCCAAGAAAATCTAGATAGAA
To verify the effectiveness of the search method, the chromosome centromere-specific Alpha satellite DNA repeated sequences that were found were fluorescently labeled and used in in situ hybridization experiments, taking the specific repeated sequences found for chromosomes 4, 6 and 20 as examples. The experimental procedure was as follows:
I. Fluorescent labeling:
① The Alpha satellite DNA sequence found by the search is sent to a gene synthesis company to be cloned into the pUC57 vector;
② An upstream primer and a downstream primer are synthesized; the upstream primer sequence is CCTTATCCGGTAACTATCGTCT and the downstream primer sequence is TGTTCTTCTAGTGTAGCCGTA. (Note: these primers are pUC57 vector sequences used to amplify the inserted Alpha satellite DNA repeated sequence.)
③ PCR labeling is carried out with the cloned plasmid as template; the labeling system is as follows:
After the PCR reaction is complete, the Alpha satellite DNA probe solution is obtained.
II. Preparation of probe working solution
Take 1 μL of Alpha satellite DNA probe solution, 1 μL of gene probe solution, 1 μL of Cot-1 DNA and 7 μL of hybridization buffer, mix well and centrifuge to prepare the probe working solution.
Note: the selected gene probe is located on the same chromosome as the corresponding Alpha satellite DNA probe and serves as the control probe; Cot-1 DNA is used to block nonspecific hybridization.
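The listed volumes add up to 10 μL, which matches the 10 μL of probe applied to the hybridization area in the hybridization step. A small Python sketch of the arithmetic, with a hypothetical helper `scale_mix` for preparing several reactions at once (the helper and the idea of batching are assumptions, not part of the protocol):

```python
# Per-reaction volumes in microlitres, as listed in the text.
PER_REACTION_UL = {
    "Alpha satellite DNA probe solution": 1,
    "gene probe solution": 1,
    "Cot-1 DNA": 1,
    "hybridization buffer": 7,
}

def scale_mix(n_reactions):
    """Return the volume of each component needed for n_reactions reactions."""
    return {name: vol * n_reactions for name, vol in PER_REACTION_UL.items()}

total = sum(PER_REACTION_UL.values())
print(total)  # 10
print(scale_mix(4)["hybridization buffer"])  # 28
```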
III. Fluorescence in situ hybridization (FISH)
1. Take a clean glass slide;
2. After resuspending the cells, take 5 μL of the suspension, drop it onto the slide, and air-dry at room temperature;
3. Observe the cell density under a phase contrast microscope with a 10× objective; the cells should not overlap, and 100-200 cells per field is preferred:
3.1 If the cell density and number are suitable, continue to step 4;
3.2 If cells overlap, dilute the cell suspension with an appropriate amount of fresh fixative;
3.3 If the cell density is low, centrifuge at 2000 rpm for 5 minutes, carefully remove an appropriate amount of supernatant, mix well, then take 3 μL of the suspension, prepare a new slide, air-dry and observe;
4. If too much cell debris is seen under the phase contrast microscope, pretreatment is required and a suitable hybridization area should be selected.
Pretreatment of glass slides:
The slide bearing the dropped cells is baked at 60 °C for 20-30 minutes.
① After baking, immerse the slide in 2× SSC at room temperature for 10 minutes;
② Dehydrate the slide by soaking it successively in 70%, 85% and 100% ethanol for 2 minutes;
③ Take out the slide and air-dry at room temperature.
Sample denaturation and hybridization with the probe (protect from light):
① Take out the probe, mix and centrifuge it, add 10 μL of probe to the hybridization area, place a cover glass over it, and seal along the edge of the cover glass with sealing glue; the seal must be complete so that the hybridization solution does not evaporate.
② Place the slide on the hybridization instrument and run denaturation and hybridization according to the set program (recommended: 82 °C for 6 minutes, then hybridization at 40 °C overnight).
Post-hybridization wash (protect from light):
1. Thirty minutes before washing, place the prepared washing solution I in a water bath and preheat it to 72 ± 1 °C.
2. Remove the slide from the hybridization instrument, remove the sealing glue, and immerse the slide in washing solution II at room temperature for 10 minutes so that the cover glass comes off (do not pull the cover glass off directly with tweezers).
3. Place the slide in washing solution I at 72 ± 1 °C for 2 minutes.
4. Take out the slide and place it in washing solution II at room temperature for 2 minutes.
5. Take out the slide, soak it successively in 70% and 85% ethanol for 2 minutes, and air-dry at room temperature.
6. DAPI staining and mounting: drop 10 μL of DAPI counterstain onto the sample area, avoiding bubbles, cover with a cover glass, seal the edge of the cover glass with nail polish, and incubate at -20 °C for 10-20 minutes.
7. Microscopy: keep the slide in the dark and count under a fluorescence microscope. For long-term storage, keep the slide at -20 °C.
The experimental results are shown in FIGS. 6-a, 6-b and 6-c, which show the hybridization results of the fluorescently labeled Alpha satellite DNA probes for the centromere regions of chromosomes 4, 6 and 20, respectively, each together with a gene probe. The results indicate that the specific repeated sequences found by the method meet the biological requirement: the invention finds specific repeated sequences both quickly and accurately.

Claims (5)

CN202111667386.8A | 2021-12-31 | 2021-12-31 | A method for searching the Alpha satellite DNA sequence in the centromere region of human chromosomes | Active | CN114300054B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111667386.8A (CN114300054B (en)) | 2021-12-31 | 2021-12-31 | A method for searching the Alpha satellite DNA sequence in the centromere region of human chromosomes

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111667386.8A (CN114300054B (en)) | 2021-12-31 | 2021-12-31 | A method for searching the Alpha satellite DNA sequence in the centromere region of human chromosomes

Publications (2)

Publication Number | Publication Date
CN114300054A (en) | 2022-04-08
CN114300054B (en) | 2025-02-07

Family

ID=80973530

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202111667386.8A (Active, CN114300054B (en)) | A method for searching the Alpha satellite DNA sequence in the centromere region of human chromosomes | 2021-12-31 | 2021-12-31

Country Status (1)

Country | Link
CN (1) | CN114300054B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN103984880A (en) * | 2014-06-05 | 2014-08-13 | 江苏省农业科学院 | Overall evaluation and mining method for biological genome tandem repeat sequences
CN106566876A (en) * | 2016-10-13 | 2017-04-19 | 四川农业大学 | Oligonucleotide probe and acquisition method thereof

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
JPH0686697A (en) * | 1991-04-09 | 1994-03-29 | Univ California | Repeat sequence chromosome-specific nucleic acid probe
EP2944647A1 (en) * | 2007-07-26 | 2015-11-18 | Cellay, Inc. | Highly visible chromosome-specific probes and related methods
EP3548637A1 (en) * | 2016-11-29 | 2019-10-09 | Genomic Vision | Method for designing a set of polynucleotide sequences for analysis of specific events in a genetic region of interest
CN110232952B (en) * | 2018-12-30 | 2022-11-18 | 中国农业科学院棉花研究所 | A Bioinformatics Method for Batch Analysis of Microsatellite Data
KR102320966B1 (en) * | 2021-02-15 | 2021-11-04 | 대한민국(식품의약품안전처장) | Approach for microsatellite marker detection using NGS
CN113488105B (en) * | 2021-09-08 | 2022-01-18 | 臻和(北京)生物科技有限公司 | Microsatellite locus based on amplicon next-generation sequencing MSI detection, screening method and application thereof


Also Published As

Publication number | Publication date
CN114300054A (en) | 2022-04-08


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
