For a more accessible and less technical introduction to this topic, see Introduction to genetics.
Genomic information
Schematic representation of the human diploid karyotype, showing the organization of the genome into chromosomes, as well as annotated bands and sub-bands as seen on G banding. This drawing shows both the female (XX) and male (XY) versions of the 23rd chromosome pair. Chromosomal changes during the cell cycle are displayed at top center. The mitochondrial genome is shown to scale at bottom left.
The human genome is a complete set of DNA sequences for each of the 22 autosomes and the two distinct sex chromosomes (X and Y). A small DNA molecule is found within individual mitochondria. These are usually treated separately as the nuclear genome and the mitochondrial genome.[1]
In 2000, scientists reported the sequencing of 88% of the human genome,[2] but as of 2020, at least 8% was still missing.[3] In 2021, scientists reported sequencing a complete female genome (i.e., without the Y chromosome).[4][3] The human Y chromosome, consisting of 62,460,029 base pairs from a different cell line and found in all males, was sequenced completely in January 2022.[5]
The current version of the standard reference genome is called GRCh38.p14 (July 2023). It consists of 22 autosomes plus one copy of the X chromosome and one copy of the Y chromosome. It contains approximately 3.1 billion base pairs.[6] This represents the size of a composite genome based on data from multiple individuals but it is a good indication of the typical amount of DNA in a haploid set of chromosomes because the Y chromosome is quite small.[7] Most human cells are diploid so they contain twice as much DNA (~6.2 billion base pairs).
In 2023, a draft human pangenome reference was published.[8] It is based on 47 genomes from persons of varied ethnicity.[8] Plans are underway for an improved reference capturing still more biodiversity from a still wider sample.[8]
While there are significant differences among the genomes of human individuals (on the order of 0.1% due to single-nucleotide variants[9] and 0.6% when considering indels),[10] these are considerably smaller than the differences between humans and their closest living relatives, the bonobos and chimpanzees (~1.1% fixed single-nucleotide variants[11] and 4% when including indels).[12]
The total length of the human reference genome does not represent the sequence of any specific individual, nor does it represent the sequence of all of the DNA found within a cell. The human reference genome only includes one copy of each of the paired, homologous autosomes plus one copy of each of the two sex chromosomes (X and Y). The total amount of DNA in this reference genome is 3.1 billion base pairs.[13]
Protein-coding sequences represent the most widely studied and best understood component of the human genome. These sequences ultimately lead to the production of all human proteins, although several biological processes (e.g. DNA rearrangements and alternative pre-mRNA splicing) can lead to the production of many more unique proteins than the number of protein-coding genes.
The human reference genome contains somewhere between 19,000 and 20,000 protein-coding genes.[14][15] These genes contain an average of 10 introns and the average size of an intron is about 6 kb (6,000 bp).[16] This means that the average size of a protein-coding gene is about 62 kb and these genes take up about 40% of the genome.[17]
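The arithmetic behind these figures can be checked with a minimal sketch. The intron count, intron size, gene count and genome size are the values quoted in this article; `EXON_TOTAL_BP` is an assumed value, chosen only so that the total matches the quoted ~62 kb average gene size.

```python
# Values quoted in the text, except EXON_TOTAL_BP, which is an assumption
# (about 2 kb of exon sequence per gene) made so the sum reaches ~62 kb.
INTRONS_PER_GENE = 10
INTRON_SIZE_BP = 6_000
EXON_TOTAL_BP = 2_000            # assumed, not stated in the text
PROTEIN_CODING_GENES = 20_000    # upper end of the 19,000-20,000 range
GENOME_SIZE_BP = 3_100_000_000   # reference genome, ~3.1 billion bp


def average_gene_size_bp() -> int:
    """Average protein-coding gene length: introns plus (assumed) exons."""
    return INTRONS_PER_GENE * INTRON_SIZE_BP + EXON_TOTAL_BP


def genome_fraction_in_genes() -> float:
    """Fraction of the reference genome spanned by protein-coding genes."""
    return PROTEIN_CODING_GENES * average_gene_size_bp() / GENOME_SIZE_BP


if __name__ == "__main__":
    print(average_gene_size_bp())
    print(f"{genome_fraction_in_genes():.0%}")
```

With the lower bound of 19,000 genes the fraction drops slightly, so "about 40%" is best read as a rounded estimate.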
Exon sequences consist of coding DNA and untranslated regions (UTRs) at either end of the mature mRNA. The total amount of coding DNA is about 1-2% of the genome.[18][16]
Many people divide the genome into coding and non-coding DNA based on the idea that coding DNA is the most important functional component of the genome. About 98–99% of the human genome is non-coding DNA.
Noncoding RNA molecules play many essential roles in cells, especially in the many reactions of protein synthesis and RNA processing. Noncoding genes include those for tRNAs, ribosomal RNAs, microRNAs, snRNAs and long non-coding RNAs (lncRNAs).[19][20][21][22] The number of reported non-coding genes continues to rise slowly but the exact number in the human genome is yet to be determined. Many RNAs are thought to be non-functional.[23]
Many ncRNAs are critical elements in gene regulation and expression. Noncoding RNA also contributes to epigenetics, transcription, RNA splicing, and the translational machinery. The role of RNA in genetic regulation and disease offers a new potential level of unexplored genomic complexity.[24]
Pseudogenes are inactive copies of protein-coding genes, often generated by gene duplication, that have become nonfunctional through the accumulation of inactivating mutations. The number of pseudogenes in the human genome is on the order of 13,000,[25] and in some chromosomes is nearly the same as the number of functional protein-coding genes. Gene duplication is a major mechanism through which new genetic material is generated during molecular evolution.
For example, the olfactory receptor gene family is one of the best-documented examples of pseudogenes in the human genome. More than 60 percent of the genes in this family are non-functional pseudogenes in humans. By comparison, only 20 percent of genes in the mouse olfactory receptor gene family are pseudogenes. Research suggests that this is a species-specific characteristic, as the most closely related primates all have proportionally fewer pseudogenes. This genetic discovery helps to explain the less acute sense of smell in humans relative to other mammals.[26]
The human genome has many different regulatory sequences which are crucial to controlling gene expression. Some scientists believe that these sequences make up 8% of the genome,[27] but other scientists predict that 20% or more of the human genome might be devoted to regulatory sequences.[28][29]
A value of 8% would correspond to approximately 10,000 bp of regulatory DNA per gene and a value of 20% corresponds to 25,000 bp of regulatory DNA per gene. Many scientists think that these estimates are unreasonably high and conflict with the view that only 10% of the genome is functional and 90% is junk DNA.[30]
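The per-gene figures follow from simple division, sketched below. The genome size is the ~3.1 billion bp quoted in this article. The gene count of 25,000 is an assumption back-calculated from the quoted per-gene values; the text does not state which gene count was used.

```python
GENOME_SIZE_BP = 3_100_000_000   # reference genome size quoted in the article
ASSUMED_GENE_COUNT = 25_000      # assumption inferred from the quoted figures


def regulatory_bp_per_gene(percent_regulatory: float,
                           gene_count: int = ASSUMED_GENE_COUNT) -> float:
    """Average amount of regulatory DNA per gene, in base pairs."""
    regulatory_bp = GENOME_SIZE_BP * percent_regulatory / 100
    return regulatory_bp / gene_count


if __name__ == "__main__":
    for percent in (8, 20):
        print(percent, round(regulatory_bp_per_gene(percent)))
```

Under these assumptions the results land close to the rounded 10,000 bp and 25,000 bp values given above.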
Regulatory sequences have been known since the late 1960s.[31] The first identification of regulatory sequences in the human genome relied on recombinant DNA technology.[32] Later, with the advent of genomic sequencing, the identification of these sequences could be inferred by evolutionary conservation. The evolutionary branch between the primates and mouse, for example, occurred 70–90 million years ago.[33] So computer comparisons of gene sequences that identify conserved sequences will be an indication of their importance in functions such as gene regulation.[34]
The results indicate that about 10% of the human genome is conserved.[35][36] Several hundred thousand human genomes have been sequenced and there is a considerable amount of variation between individuals. Only 10% of the genome seems to be protected from mutations by purifying selection.[37]
As of 2012, the efforts have shifted toward finding interactions between DNA and regulatory proteins by the technique ChIP-Seq, or gaps where the DNA is not packaged by histones (DNase hypersensitive sites), both of which tell where there are potential regulatory sequences in the investigated cell type.[27]
About 8% of the human genome consists of tandem DNA arrays or tandem repeats, low complexity repeat sequences that have multiple adjacent copies (e.g. "CAGCAGCAG...").[39] The tandem sequences may be of variable lengths, from two nucleotides to tens of nucleotides. These sequences are highly variable, even among closely related individuals, and so are used for genealogical DNA testing and forensic DNA analysis.[40]
Repeated sequences of fewer than ten nucleotides (e.g. the dinucleotide repeat (AC)n) are termed microsatellite sequences. Among the microsatellite sequences, trinucleotide repeats are of particular importance, as they sometimes occur within coding regions of genes for proteins and may lead to genetic disorders. For example, Huntington's disease results from an expansion of the trinucleotide repeat (CAG)n within the Huntingtin gene on human chromosome 4. Telomeres (the ends of linear chromosomes) end with a microsatellite hexanucleotide repeat of the sequence (TTAGGG)n.[citation needed]
Tandem repeats of longer sequences (arrays of repeated sequences 10–60 nucleotides long) are termed minisatellites.[41]
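As an illustration of these definitions, the following minimal sketch scans a DNA string for perfect tandem repeats and labels each by unit length (fewer than ten nucleotides as a microsatellite, 10–60 as a minisatellite). The function name and the default of three copies are hypothetical choices made for this example. Real repeat-finding tools also tolerate mismatches and indels between copies, which this sketch does not.

```python
import re


def find_tandem_repeats(seq, min_unit=2, max_unit=60, min_copies=3):
    """Return (start, unit, copies, kind) for each perfect tandem repeat.

    Only exact, adjacent copies are detected. The lazy quantifier makes the
    regex prefer the shortest repeating unit at each position.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        kind = "microsatellite" if len(unit) < 10 else "minisatellite"
        hits.append((match.start(), unit, copies, kind))
    return hits


if __name__ == "__main__":
    # A short (CAG)n run flanked by unrelated bases.
    print(find_tandem_repeats("TTCAGCAGCAGCAGTT"))
```

Because the pattern is anchored on exact copies, a single base change inside a repeat array splits it into two shorter hits or hides it entirely.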
Transposable genetic elements, DNA sequences that can replicate and insert copies of themselves at other locations within a host genome, are an abundant component in the human genome. The most abundant transposon lineage, Alu, has about 50,000 active copies,[42] and can be inserted into intragenic and intergenic regions.[43] One other lineage, LINE-1, has about 100 active copies per genome (the number varies between people).[44] Together with non-functional relics of old transposons, they account for over half of total human DNA.[45] Sometimes called "jumping genes", transposons have played a major role in sculpting the human genome. Some of these sequences represent endogenous retroviruses, DNA copies of viral sequences that have become permanently integrated into the genome and are now passed on to succeeding generations. There are also a significant number of retroviruses in human DNA, at least three of which have been proven to possess an important function (i.e., HIV-like functional HERV-K; envelope genes of non-functional viruses HERV-W and HERV-FRD play a role in placenta formation by inducing cell-cell fusion).
There is no consensus on what constitutes a "functional" element in the genome since geneticists, evolutionary biologists, and molecular biologists employ different definitions and methods.[46][47] Due to the ambiguity in the terminology, different schools of thought have emerged.[48] In evolutionary definitions, "functional" DNA, whether it is coding or non-coding, contributes to the fitness of the organism, and therefore is maintained by negative evolutionary pressure, whereas "non-functional" DNA has no benefit to the organism and therefore is under neutral selective pressure. This type of DNA has been described as junk DNA.[49][50] In genetic definitions, "functional" DNA is related to how DNA segments manifest by phenotype and "nonfunctional" is related to loss-of-function effects on the organism.[46] In biochemical definitions, "functional" DNA relates to DNA sequences that specify molecular products (e.g. noncoding RNAs) and biochemical activities with mechanistic roles in gene or genome regulation (i.e. DNA sequences that impact cellular level activity such as cell type, condition, and molecular processes).[51][46] There is no consensus in the literature on the amount of functional DNA: depending on how "function" is understood, estimates range from up to 90% of the human genome being nonfunctional (junk DNA)[52] to up to 80% of the genome being functional.[53] It is also possible that junk DNA may acquire a function in the future and therefore may play a role in evolution,[54] but this is likely to occur only very rarely.[49] Finally, DNA that is deleterious to the organism and is under negative selective pressure is called garbage DNA.[50]
The first human genome sequences were published in nearly complete draft form in February 2001 by the Human Genome Project[55] and Celera Corporation.[56] Completion of the Human Genome Project's sequencing effort was announced in 2004 with the publication of a draft genome sequence, leaving just 341 gaps in the sequence, representing highly repetitive and other DNA that could not be sequenced with the technology available at the time.[57] The human genome was the first of all vertebrates to be sequenced to such near-completion, and as of 2018, the diploid genomes of over a million individual humans had been determined using next-generation sequencing.[58]
These data are used worldwide in biomedical science, anthropology, forensics and other branches of science. Such genomic studies have led to advances in the diagnosis and treatment of diseases, and to new insights in many fields of biology, including human evolution.[citation needed]
By 2018, the total number of genes had been raised to at least 46,831,[59] plus another 2,300 micro-RNA genes.[60] A 2018 population survey found another 300 million bases of human genome that were not in the reference sequence.[61] Prior to the acquisition of the full genome sequence, estimates of the number of human genes ranged from 50,000 to 140,000 (with occasional vagueness about whether these estimates included non-protein coding genes).[62] As genome sequence quality and the methods for identifying protein-coding genes improved,[57] the count of recognized protein-coding genes dropped to 19,000–20,000.[63]
In 2022, the Telomere-to-Telomere (T2T) consortium reported the complete sequence of a human female genome,[3] filling all the gaps in the X chromosome (2020) and the 22 autosomes (May 2021).[3][64] The previously unsequenced parts contain immune response genes that help to adapt to and survive infections, as well as genes that are important for predicting drug response.[65] The completed human genome sequence will also provide better understanding of human formation as an individual organism and how humans vary both between each other and other species.[65]
Although the 'completion' of the human genome project was announced in 2001,[2] there remained hundreds of gaps, with about 5–10% of the total sequence remaining undetermined. The missing genetic information was mostly in repetitive heterochromatic regions and near the centromeres and telomeres, but also some gene-encoding euchromatic regions.[66] There remained 160 euchromatic gaps in 2015 when the sequences spanning another 50 formerly unsequenced regions were determined.[67] Only in 2020 was the first truly complete telomere-to-telomere sequence of a human chromosome determined, namely of the X chromosome.[68] The first complete telomere-to-telomere sequence of a human autosomal chromosome, chromosome 8, followed a year later.[69] The complete human genome without the Y chromosome was published in 2021, and the Y chromosome sequence followed in January 2022.[3][4][70]
With the exception of identical twins, all humans show significant variation in genomic DNA sequences. The human reference genome (HRG) is used as a standard sequence reference.
There are several important points concerning the human reference genome:
The HRG is a haploid sequence. Each chromosome is represented once.
The HRG is a composite sequence, and does not correspond to any actual human individual.
The HRG is periodically updated to correct errors, ambiguities, and unknown "gaps".
The HRG in no way represents an "ideal" or "perfect" human individual. It is simply a standardized representation or model that is used for comparative purposes.
Most studies of human genetic variation have focused on single-nucleotide polymorphisms (SNPs), which are substitutions in individual bases along a chromosome. Most analyses estimate that SNPs occur once in every 1,000 base pairs, on average, in the euchromatic human genome, although they do not occur at a uniform density. Thus follows the popular statement that "we are all, regardless of race, genetically 99.9% the same",[72] although this would be somewhat qualified by most geneticists. For example, a much larger fraction of the genome is now thought to be involved in copy number variation.[73] A large-scale collaborative effort to catalog SNP variations in the human genome is being undertaken by the International HapMap Project.[citation needed]
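The link between the SNP density and the "99.9%" figure can be sketched as follows. The spacing of one SNP per 1,000 bp comes from the paragraph above. Applying it to the full ~3.1 billion bp reference is a simplifying assumption, since the quoted density refers to the euchromatic portion only and is not uniform.

```python
GENOME_SIZE_BP = 3_100_000_000  # whole reference genome (simplifying assumption)
SNP_SPACING_BP = 1_000          # one SNP per 1,000 bp on average


def expected_snp_count(genome_size_bp: int = GENOME_SIZE_BP,
                       spacing_bp: int = SNP_SPACING_BP) -> int:
    """Rough number of single-base differences between two haploid genomes."""
    return genome_size_bp // spacing_bp


def percent_identity(spacing_bp: int = SNP_SPACING_BP) -> float:
    """Percent of positions that match when one base in spacing_bp differs."""
    return 100.0 * (1 - 1 / spacing_bp)


if __name__ == "__main__":
    print(expected_snp_count())
    print(f"{percent_identity():.1f}%")
```

The estimate counts only single-base substitutions, so it leaves out the indels and copy number variation mentioned above.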
The genomic loci and length of certain types of small repetitive sequences are highly variable from person to person, which is the basis of DNA fingerprinting and DNA paternity testing technologies. The heterochromatic portions of the human genome, which total several hundred million base pairs, are also thought to be quite variable within the human population (they are so repetitive and so long that they cannot be accurately sequenced with current technology). These regions contain few genes, and it is unclear whether any significant phenotypic effect results from typical variation in repeats or heterochromatin.
Most gross genomic mutations in gamete germ cells probably result in inviable embryos; however, a number of human diseases are related to large-scale genomic abnormalities. Down syndrome, Turner syndrome, and a number of other diseases result from nondisjunction of entire chromosomes. Cancer cells frequently have aneuploidy of chromosomes and chromosome arms, although a cause and effect relationship between aneuploidy and cancer has not been established.
Whereas a genome sequence lists the order of every DNA base in a genome, a genome map identifies the landmarks. A genome map is less detailed than a genome sequence and aids in navigating around the genome.[74][75]
An example of a variation map is the HapMap being developed by the International HapMap Project. The HapMap is a haplotype map of the human genome, "which will describe the common patterns of human DNA sequence variation."[76] It catalogs the patterns of small-scale variations in the genome that involve single DNA letters, or bases.
Researchers published the first sequence-based map of large-scale structural variation across the human genome in the journal Nature in May 2008.[77][78] Large-scale structural variations are differences in the genome among people that range from a few thousand to a few million DNA bases; some are gains or losses of stretches of genome sequence and others appear as re-arrangements of stretches of sequence. These variations include differences in the number of copies individuals have of a particular gene, deletions, translocations and inversions.
Structural variation refers to genetic variants that affect larger segments of the human genome, as opposed to point mutations. Often, structural variants (SVs) are defined as variants of 50 base pairs (bp) or greater, such as deletions, duplications, insertions, inversions and other rearrangements. About 90% of structural variants are noncoding deletions, and most individuals have more than a thousand such deletions; the size of deletions ranges from dozens of base pairs to tens of thousands of bp.[79] On average, individuals carry ~3 rare structural variants that alter coding regions, e.g. delete exons. About 2% of individuals carry ultra-rare megabase-scale structural variants, especially rearrangements. That is, millions of base pairs may be inverted within a chromosome; ultra-rare means that they are only found in individuals or their family members and thus have arisen very recently.[79]
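The size thresholds in this paragraph can be expressed as a small classifier. This is only a sketch: the 50 bp cutoff and the megabase scale come from the text, while the function name and the "small variant" label for the 2–49 bp range are hypothetical choices made for illustration.

```python
SV_MIN_BP = 50         # common lower bound for structural variants (from the text)
MEGABASE_BP = 1_000_000


def classify_by_size(size_bp: int) -> str:
    """Label a variant by the number of base pairs it affects."""
    if size_bp < 1:
        raise ValueError("size_bp must be a positive number of base pairs")
    if size_bp == 1:
        return "point mutation"
    if size_bp < SV_MIN_BP:
        return "small variant"  # hypothetical label for the 2-49 bp range
    if size_bp >= MEGABASE_BP:
        return "megabase-scale structural variant"
    return "structural variant"


if __name__ == "__main__":
    for size in (1, 12, 50, 30_000, 2_500_000):
        print(size, classify_by_size(size))
```

Size alone does not say whether a variant is a deletion, duplication, insertion or inversion; that requires comparing the variant against the reference sequence.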
Single-nucleotide polymorphisms (SNPs) do not occur homogeneously across the human genome. In fact, there is enormous diversity in SNP frequency between genes, reflecting different selective pressures on each gene as well as different mutation and recombination rates across the genome. However, because studies on SNPs are biased towards coding regions, the data generated from them are unlikely to reflect the overall distribution of SNPs throughout the genome. Therefore, the SNP Consortium protocol was designed to identify SNPs with no bias towards coding regions and the Consortium's 100,000 SNPs generally reflect sequence diversity across the human chromosomes. The SNP Consortium aims to expand the number of SNPs identified across the genome to 300,000 by the end of the first quarter of 2001.[80]
TSC SNP distribution along the long arm of chromosome 22 (from https://web.archive.org/web/20130903043223/http://snp.cshl.org/). Each column represents a 1 Mb interval; the approximate cytogenetic position is given on the x-axis. Clear peaks and troughs of SNP density can be seen, possibly reflecting different rates of mutation, recombination and selection.
Changes in non-coding sequence and synonymous changes in coding sequence are generally more common than non-synonymous changes, reflecting greater selective pressure reducing diversity at positions dictating amino acid identity. Transitional changes are more common than transversions, with CpG dinucleotides showing the highest mutation rate, presumably due to deamination.[citation needed]
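The distinction between transitions and transversions can be made concrete with a short sketch. It relies on the standard definitions, which are not spelled out in the text above: a transition swaps one purine for the other (A and G) or one pyrimidine for the other (C and T), while a transversion swaps a purine for a pyrimidine or vice versa. The function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError("ref and alt must be two different bases from A, C, G, T")
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
        print(ref, alt, classify_substitution(ref, alt))
```

Each base has one possible transition and two possible transversions, so the observation that transitions are more common runs against what the raw counts of possible changes would suggest.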
A personal genome sequence is a (nearly) complete sequence of the chemical base pairs that make up the DNA of a single person. Because medical treatments have different effects on different people due to genetic variations such as single-nucleotide polymorphisms (SNPs), the analysis of personal genomes may lead to personalized medical treatment based on individual genotypes.[81]
The first personal genome sequence to be determined was that of Craig Venter in 2007. Personal genomes had not been sequenced in the public Human Genome Project to protect the identity of volunteers who provided DNA samples. That sequence was derived from the DNA of several volunteers from a diverse population.[82] However, early in the Venter-led Celera Genomics genome sequencing effort the decision was made to switch from sequencing a composite sample to using DNA from a single individual, later revealed to have been Venter himself. Thus the Celera human genome sequence released in 2000 was largely that of one man. Subsequent replacement of the early composite-derived data and determination of the diploid sequence, representing both sets of chromosomes, rather than a haploid sequence originally reported, allowed the release of the first personal genome.[83] In April 2008, that of James Watson was also completed. In 2009, Stephen Quake published his own genome sequence derived from a sequencer of his own design, the Heliscope.[84] A Stanford team led by Euan Ashley published a framework for the medical interpretation of human genomes implemented on Quake's genome and made whole genome-informed medical decisions for the first time.[85] That team further extended the approach to the West family, the first family sequenced as part of Illumina's Personal Genome Sequencing program.[86] Since then hundreds of personal genome sequences have been released,[87] including those of Desmond Tutu[88][89] and of a Paleo-Eskimo.[90] In 2012, the whole genome sequences of two family trios among 1092 genomes were made public.[9] In November 2013, a Spanish family made four personal exome datasets (about 1% of the genome) publicly available under a Creative Commons public domain license.[91][92] The Personal Genome Project (started in 2005) is among the few to make both genome sequences and corresponding medical phenotypes publicly available.[93][94]
The sequencing of individual genomes further unveiled levels of genetic complexity that had not been appreciated before. Personal genomics helped reveal the significant level of diversity in the human genome attributed not only to SNPs but structural variations as well. However, the application of such knowledge to the treatment of disease and in the medical field is only in its very beginnings.[95] Exome sequencing has become increasingly popular as a tool to aid in diagnosis of genetic disease because the exome contributes only 1% of the genomic sequence but accounts for roughly 85% of mutations that contribute significantly to disease.[96]
In humans, gene knockouts naturally occur as heterozygous or homozygous loss-of-function gene knockouts. These knockouts are often difficult to distinguish, especially within heterogeneous genetic backgrounds. They are also difficult to find as they occur in low frequencies.
Populations with a high level of parental-relatedness result in a larger number of homozygous gene knockouts as compared to outbred populations.[97]
Populations with high rates of consanguinity, such as countries with high rates of first-cousin marriages, display the highest frequencies of homozygous gene knockouts. Such populations include Pakistan, Iceland, and Amish populations. These populations with a high level of parental-relatedness have been subjects of human knockout research, which has helped to determine the function of specific genes in humans. By distinguishing specific knockouts, researchers are able to use phenotypic analyses of these individuals to help characterize the gene that has been knocked out.
A pedigree displaying a first-cousin mating (both parents carry a heterozygous knockout; the mating is marked by a double line) leading to offspring possessing a homozygous gene knockout
Knockouts in specific genes can cause genetic diseases, potentially have beneficial effects, or even result in no phenotypic effect at all. However, determining a knockout's phenotypic effect in humans can be challenging. Challenges to characterizing and clinically interpreting knockouts include difficulty calling DNA variants, determining disruption of protein function (annotation), and considering the amount of influence mosaicism has on the phenotype.[97]
One major study that investigated human knockouts is the Pakistan Risk of Myocardial Infarction study. It was found that individuals possessing a heterozygous loss-of-function gene knockout for the APOC3 gene had lower triglycerides in the blood after consuming a high fat meal as compared to individuals without the mutation. However, individuals possessing homozygous loss-of-function gene knockouts of the APOC3 gene displayed the lowest level of triglycerides in the blood after the fat load test, as they produce no functional APOC3 protein.[98]
In each cell of the human body, the human genome experiences, on average, tens of thousands of DNA damages per day.[99] These damages can block genome replication or genome transcription, and if they are not repaired or are repaired incorrectly, they may lead to mutations, or other genome alterations in the human genome that threaten cell viability.[99]
Most aspects of human biology involve both genetic (inherited) and non-genetic (environmental) factors. Some inherited variation influences aspects of our biology that are not medical in nature (height, eye color, ability to taste or smell certain compounds, etc.). Moreover, some genetic disorders only cause disease in combination with the appropriate environmental factors (such as diet). With these caveats, genetic disorders may be described as clinically defined diseases caused by genomic DNA sequence variation. In the most straightforward cases, the disorder can be associated with variation in a single gene. For example, cystic fibrosis is caused by mutations in the CFTR gene and is the most common recessive disorder in Caucasian populations, with over 1,300 different mutations known.[100]
Disease-causing mutations in specific genes are usually severe in terms of gene function and are rare, thus genetic disorders are similarly individually rare. However, since there are many genes that can vary to cause genetic disorders, in aggregate they constitute a significant component of known medical conditions, especially in pediatric medicine. Molecularly characterized genetic disorders are those for which the underlying causal gene has been identified. Currently there are approximately 2,200 such disorders annotated in the OMIM database.[100]
Studies of genetic disorders are often performed by means of family-based studies. In some instances, population-based approaches are employed, particularly in the case of so-called founder populations such as those in Finland, French-Canada, Utah, Sardinia, etc. Diagnosis and treatment of genetic disorders are usually performed by a geneticist-physician trained in clinical/medical genetics. The results of the Human Genome Project are likely to provide increased availability of genetic testing for gene-related disorders, and eventually improved treatment. Parents can be screened for hereditary conditions and counselled on the consequences, the probability of inheritance, and how to avoid or ameliorate it in their offspring.
There are many different kinds of DNA sequence variation, ranging from complete extra or missing chromosomes down to single nucleotide changes. It is generally presumed that much naturally occurring genetic variation in human populations is phenotypically neutral, i.e., has little or no detectable effect on the physiology of the individual (although there may be fractional differences in fitness defined over evolutionary time frames). Genetic disorders can be caused by any or all known types of sequence variation. To molecularly characterize a new genetic disorder, it is necessary to establish a causal link between a particular genomic sequence variant and the clinical disease under investigation. Such studies constitute the realm of human molecular genetics.
With the advent of the Human Genome and International HapMap Project, it has become feasible to explore subtle genetic influences on many common disease conditions such as diabetes, asthma, migraine, schizophrenia, etc. Although some causal links have been made between genomic sequence variants in particular genes and some of these diseases, often with much publicity in the general media, these are usually not considered to be genetic disorders per se as their causes are complex, involving many different genetic and environmental factors. Thus there may be disagreement in particular cases whether a specific medical condition should be termed a genetic disorder.
Additional genetic disorders worth mentioning are Kallmann syndrome and Pfeiffer syndrome (gene FGFR1), Fuchs corneal dystrophy (gene TCF4), Hirschsprung's disease (genes RET and FECH), Bardet-Biedl syndrome 1 (genes CCDC28B and BBS1), Bardet-Biedl syndrome 10 (gene BBS10), and facioscapulohumeral muscular dystrophy type 2 (genes D4Z4 and SMCHD1).[101]
Genome sequencing is now able to narrow the genome down to specific locations to more accurately find mutations that will result in a genetic disorder. Copy number variants (CNVs) and single nucleotide variants (SNVs) can also be detected at the same time as genome sequencing with newer sequencing procedures, called next-generation sequencing (NGS).[102] This only analyzes a small portion of the genome, around 1–2%. The results of this sequencing can be used for clinical diagnosis of a genetic condition, including Usher syndrome, retinal disease, hearing impairments, diabetes, epilepsy, Leigh disease, hereditary cancers, neuromuscular diseases, primary immunodeficiencies, severe combined immunodeficiency (SCID), and diseases of the mitochondria.[103] NGS can also be used to identify carriers of diseases before conception. The diseases that can be detected in this sequencing include Tay-Sachs disease, Bloom syndrome, Gaucher disease, Canavan disease, familial dysautonomia, cystic fibrosis, spinal muscular atrophy, and fragile-X syndrome. Next-generation sequencing can be narrowed down to specifically look for diseases more prevalent in certain ethnic populations.[104]
Prevalence and associated gene/chromosome for some human genetic disorders
Comparative genomics studies of mammalian genomes suggest that approximately 5% of the human genome has been conserved by evolution since the divergence of extant lineages approximately 200 million years ago, containing the vast majority of genes.[105][106] The published chimpanzee genome differs from that of the human genome by 1.23% in direct sequence comparisons.[107] Around 20% of this figure is accounted for by variation within each species, leaving only ~1.06% consistent sequence divergence between humans and chimps at shared genes.[108] This nucleotide by nucleotide difference is dwarfed, however, by the portion of each genome that is not shared, including around 6% of functional genes that are unique to either humans or chimps.[109]
In other words, the considerable observable differences between humans and chimps may be due as much or more to genome-level variation in the number, function and expression of genes as to DNA sequence changes in shared genes. Indeed, even within humans, there has been found to be a previously unappreciated amount of copy number variation (CNV) which can make up as much as 5–15% of the human genome. In other words, between humans, there could be +/- 500,000,000 base pairs of DNA, some being active genes, others inactivated, or active at different levels. The full significance of this finding remains to be seen. On average, a typical human protein-coding gene differs from its chimpanzee ortholog by only two amino acid substitutions; nearly one third of human genes have exactly the same protein translation as their chimpanzee orthologs. A major difference between the two genomes is human chromosome 2, which is equivalent to a fusion product of chimpanzee chromosomes 12 and 13 (later renamed chromosomes 2A and 2B, respectively).[110]
Humans have undergone an extraordinary loss of olfactory receptor genes during our recent evolution, which explains our relatively crude sense of smell compared to most other mammals. Evolutionary evidence suggests that the emergence of color vision in humans and several other primate species has diminished the need for the sense of smell.[111]
The human mitochondrial DNA is of tremendous interest to geneticists, since it undoubtedly plays a role in mitochondrial disease. It also sheds light on human evolution; for example, analysis of variation in the human mitochondrial genome has led to the postulation of a recent common ancestor for all humans on the maternal line of descent (see Mitochondrial Eve).
Because of damage induced by exposure to reactive oxygen species, mitochondrial DNA (mtDNA) has a more rapid rate of variation than nuclear DNA. This 20-fold higher mutation rate allows mtDNA to be used for more accurate tracing of maternal ancestry.[citation needed] Studies of mtDNA in populations have allowed ancient migration paths to be traced, such as the migration of Native Americans from Siberia[113] or Polynesians from southeastern Asia.[citation needed] It has also been used to show that there is no trace of Neanderthal DNA in the European gene mixture inherited through purely maternal lineage.[114] Because mtDNA is inherited in a restrictive all-or-none manner, this result (no trace of Neanderthal mtDNA) would be likely unless there were a large percentage of Neanderthal ancestry, or there was strong positive selection for that mtDNA. For example, going back 5 generations, only 1 of a person's 32 ancestors contributed to that person's mtDNA, so if one of these 32 was pure Neanderthal an expected ~3% of that person's autosomal DNA would be of Neanderthal origin, yet they would have a ~97% chance of having no trace of Neanderthal mtDNA.[citation needed]
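The five-generation example can be reproduced with a short calculation. This is a minimal sketch under simplifying assumptions that mirror the text: all ancestors in a generation are distinct, each contributes an equal expected share of autosomal DNA, and the single Neanderthal ancestor is equally likely to occupy any of the ancestral positions. The function name is illustrative only.

```python
from fractions import Fraction

def maternal_line_odds(generations: int):
    """Return (ancestor count, expected autosomal share of one ancestor,
    probability that a given ancestor is NOT the mtDNA source)."""
    # Each generation back doubles the number of ancestors.
    ancestors = 2 ** generations
    # Assumption: every ancestor contributes an equal expected autosomal share.
    autosomal_share = Fraction(1, ancestors)
    # Exactly one ancestor per generation lies on the purely maternal line.
    p_not_mtdna_source = 1 - Fraction(1, ancestors)
    return ancestors, autosomal_share, p_not_mtdna_source

ancestors, share, p_none = maternal_line_odds(5)
print(f"ancestors: {ancestors}")
print(f"expected autosomal share: {float(share):.1%}")
print(f"chance of no Neanderthal mtDNA: {float(p_none):.1%}")
```

With five generations this gives 32 ancestors, an expected autosomal share of 1/32 (about 3%), and a 31/32 (about 97%) chance that the Neanderthal ancestor is not on the maternal line, matching the figures above.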
Epigenetics describes a variety of features of the human genome that transcend its primary DNA sequence, such as chromatin packaging, histone modifications and DNA methylation, and which are important in regulating gene expression, genome replication and other cellular processes. Epigenetic markers strengthen and weaken transcription of certain genes but do not affect the actual sequence of DNA nucleotides. DNA methylation is a major form of epigenetic control over gene expression and one of the most highly studied topics in epigenetics. During development, the human DNA methylation profile experiences dramatic changes. In early germ line cells, the genome has very low methylation levels. These low levels generally describe active genes. As development progresses, parental imprinting tags lead to increased methylation activity.[115][116]
Epigenetic patterns can be identified between tissues within an individual as well as between individuals themselves. Identical genes that have differences only in their epigenetic state are called epialleles. Epialleles can be placed into three categories: those directly determined by an individual's genotype, those influenced by genotype, and those entirely independent of genotype. The epigenome is also influenced significantly by environmental factors. Diet, toxins, and hormones impact the epigenetic state. Studies in dietary manipulation have demonstrated that methyl-deficient diets are associated with hypomethylation of the epigenome. Such studies establish epigenetics as an important interface between the environment and the genome.[117]
Miller JH, Ippen K, Scaife JG, Beckwith JR (1968). "The promoter-operator region of the lac operon of Escherichia coli". J. Mol. Biol. 38 (3): 413–420. doi:10.1016/0022-2836(68)90395-1. PMID 4887877.
Wright S, Rosenthal A, Flavell R, Grosveld F (1984). "DNA sequences required for regulated expression of beta-globin genes in murine erythroleukemia cells". Cell. 38 (1): 265–273. doi:10.1016/0092-8674(84)90548-8. PMID 6088069. S2CID 34587386.
Corpas M, Cariaso M, Coletta A, Weiss D, Harrison AP, Moran F, et al. (12 November 2013). "A Complete Public Domain Family Genomics Dataset". bioRxiv 10.1101/000216.
Wong JC (2017). "Overview of the Clinical Utility of Next Generation Sequencing in Molecular Diagnoses of Human Genetic Disorders". In Wong LJ (ed.). Next Generation Sequencing Based Clinical Molecular Diagnosis of Human Genetic Disorders. Cham: Springer International Publishing. pp. 1–11. doi:10.1007/978-3-319-56418-0_1. ISBN 978-3-319-56416-6.
Fedick A, Zhang J (2017). "Next Generation of Carrier Screening". In Wong LJ (ed.). Next Generation Sequencing Based Clinical Molecular Diagnosis of Human Genetic Disorders. Cham: Springer International Publishing. pp. 339–354. doi:10.1007/978-3-319-56418-0_16. ISBN 978-3-319-56416-6.
Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, et al. (December 2002). "Initial sequencing and comparative analysis of the mouse genome". Nature. 420 (6915): 520–562. Bibcode:2002Natur.420..520W. doi:10.1038/nature01262. PMID 12466850. "the proportion of small (50–100 bp) segments in the mammalian genome that is under (purifying) selection can be estimated to be about 5%. This proportion is much higher than can be explained by protein-coding sequences alone, implying that the genome contains many additional features (such as untranslated regions, regulatory elements, non-protein-coding genes, and chromosomal structural elements) under selection for biological function."
Scheen AJ, Junien C (May–June 2012). "[Epigenetics, interface between environment and genes: role in complex diseases]". Revue Médicale de Liège. 67 (5–6): 250–257. PMID 22891475.