Sequence analysis in molecular biology involves identifying the sequence of nucleotides in a nucleic acid, or amino acids in a peptide or protein. Once a sample has been obtained, DNA sequences may be produced automatically by machine and the result displayed on computer. Interpreting those results is still a task for humans.
Information from sequence analysis is used in many fields of biology. It gives information on the relationship between individual organisms, or between groups of organisms. It shows how closely related they are.
A DNA sequence is the sequence of nucleotides in a DNA molecule. It is written as a succession of letters representing the primary structure of a DNA molecule or strand. If functional, such a sequence carries information for the sequence of amino acids in a protein molecule. The possible letters are A, C, G, and T, representing the four nucleotide bases of a DNA strand: adenine, cytosine, guanine, and thymine. The sequences are printed next to one another, without gaps, as in the sequence AAAGTCTGAC.
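As a small illustration of this letter notation, the sketch below checks that a string uses only the four letters A, C, G and T, and counts how often each base occurs. The function names `is_valid_dna` and `base_counts` are made up for this example and do not come from any particular library.

```python
from collections import Counter

# The four letters allowed in a DNA sequence:
# adenine, cytosine, guanine, thymine.
DNA_ALPHABET = set("ACGT")


def is_valid_dna(sequence: str) -> bool:
    """Return True if the string is non-empty and uses only A, C, G, T."""
    return bool(sequence) and set(sequence.upper()) <= DNA_ALPHABET


def base_counts(sequence: str) -> dict:
    """Count how often each of the four bases occurs in the sequence."""
    if not is_valid_dna(sequence):
        raise ValueError(f"not a DNA sequence: {sequence!r}")
    counts = Counter(sequence.upper())
    # Report all four bases, including any that do not occur.
    return {base: counts[base] for base in "ACGT"}


example = "AAAGTCTGAC"  # the example sequence from the text
print(is_valid_dna(example))
print(base_counts(example))
```

For the example sequence AAAGTCTGAC, this reports four A's and two each of C, G and T.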
The study of RNA and proteins is more complex. The overall structure of DNA is simple and predictable (double helix). The study of RNA and proteins must include a study of their 3-dimensional structure, which is varied, and influences how they work. To some extent this can be assisted by computer, but has to be verified in each case.
Information on sequences is kept in databases. Since fast methods for producing gene and protein sequences were developed during the 1990s, new sequences have been added to the databases at an ever-increasing rate.
Complete genome analysis is done by a machine, the DNA sequencer, which analyses light signals from fluorochromes attached to the nucleotides. This type of work is gradually becoming less expensive.
As of December 2012, whole genome analysis had been completed on about 800 to 900 living species and strains. The numbers are approximate and changing.[3]
The human genome is stored on 23 chromosome pairs in the cell nucleus and in the small mitochondrial DNA. A great deal is now known about the sequences of DNA which are on our chromosomes. What the DNA actually does is now partly known. Applying this knowledge in practice has only just begun.
The Human Genome Project (HGP) produced a reference sequence which is used worldwide in biology and medicine. Nature published the publicly funded project's report,[4] and Science published Celera's paper.[5] These papers described how the draft sequence was produced, and gave an analysis of the sequence. Improved drafts were announced in 2003 and 2005, filling in to ≈92% of the sequence.[6]
The latest project, ENCODE, studies the way the genes are controlled.[7][8]
It is not necessary to have whole genome sequences for forensic work, such as identifying a criminal from traces of DNA left at a crime scene, or for paternity cases. At present whole genome sequencing is still very expensive, but fortunately, simpler and cheaper methods are available.
The basic idea is to look at certain loci (places) in the genome which are highly variable between people. About 10 to 15 of these loci are needed for a match, and the legal details differ between countries. A match between a sample and a suspect individual makes it extremely likely that the individual was the source of the sample. This evidence would then be the basis of the prosecution case for a crime. A similar analysis would show that a man was very likely the father of a child. This is really a modern way to do what was done with blood groups before DNA details could be analysed. The methods have been developed mainly by the work of Alec Jeffreys.
Each person’s DNA contains two alleles of a particular gene or 'marker': one from the father and one from the mother. 'Markers' are genes chosen for having a number of different alleles occurring frequently in the population. The following table is from a commercial DNA paternity testing experiment. It shows how relatedness between parents and child is demonstrated with five markers:
DNA Marker | Mother | Child | Alleged father |
---|---|---|---|
D21S11 | 28, 30 | 28, 31 | 29, 31 |
D7S820 | 9, 10 | 10, 11 | 11, 12 |
TH01 | 14, 15 | 14, 16 | 15, 16 |
D13S317 | 7, 8 | 7, 9 | 8, 9 |
D19S433 | 14, 16.2 | 14, 15 | 15, 17 |
The results show that the child and the alleged father’s DNA match for these five markers: at each marker, the child shares one allele with the mother and the other allele with the alleged father. The complete test results showed this correlation on 16 markers between the child and the tested man. If a case is tested in court, a forensic scientist would give evidence on the likelihood of getting that result by chance.
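The comparison in the table above can be sketched in a few lines of code. The example below is only an illustration, not the testing company's method: it checks whether, at each marker, one of the child's alleles could have come from the mother and the other from the alleged father. The names `PROFILES` and `consistent_with_parentage` are invented for this sketch, and the allele values are copied from the table. A real test would also estimate how likely such a match is by chance, which this sketch does not do.

```python
# Allele pairs copied from the table above:
# marker -> (mother, child, alleged father)
PROFILES = {
    "D21S11": ((28, 30), (28, 31), (29, 31)),
    "D7S820": ((9, 10), (10, 11), (11, 12)),
    "TH01": ((14, 15), (14, 16), (15, 16)),
    "D13S317": ((7, 8), (7, 9), (8, 9)),
    "D19S433": ((14, 16.2), (14, 15), (15, 17)),
}


def consistent_with_parentage(mother, child, father) -> bool:
    """True if one child allele can come from the mother and the other from the father."""
    first, second = child
    return (first in mother and second in father) or (
        second in mother and first in father
    )


def check_all(profiles) -> dict:
    """Apply the allele-sharing check to every marker."""
    return {
        marker: consistent_with_parentage(mother, child, father)
        for marker, (mother, child, father) in profiles.items()
    }


results = check_all(PROFILES)
for marker, ok in results.items():
    print(marker, "consistent" if ok else "not consistent")
```

With the values from the table, all five markers come out as consistent. A marker where the child carries an allele found in neither the mother nor the tested man would come out as not consistent.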
There are state laws on DNA profiling in all 50 states of the United States.[9] Detailed information on database laws in each state can be found at the National Conference of State Legislatures website.[10]
Ancient DNA has been recovered from some sources. The record for survival of DNA suitable for sequence analysis is 700,000 years. A horse skeleton buried in permafrost has provided bones with some DNA surviving.[11] The sequence was only 70% complete, but it was enough for researchers to say "It would not look like a horse as we know it… but we would expect it to be a one-toed horse". For comparison, researchers had access to DNA sequences of modern horses, donkeys and Przewalski's horse.