1000 Genomes Project

From Wikipedia, the free encyclopedia

International research effort on genetic variation

The 1000 Genomes Project (1KGP), which ran from January 2008 to 2015, was an international research effort to establish the most detailed catalogue of human genetic variation at the time. Scientists planned to sequence the genomes of at least one thousand anonymous healthy participants from a number of different ethnic groups within the following three years, using newly developed technologies. In 2010, the project finished its pilot phase, which was described in detail in a publication in the journal Nature.^[1] In 2012, the sequencing of 1,092 genomes was announced in a Nature publication.^[2] In 2015, two papers in Nature reported the results and completion of the project, along with opportunities for future research.^[3]^[4]

Many rare variations, restricted to closely related groups, were identified, and eight structural-variation classes were analyzed.^[5]

The project united multidisciplinary research teams from institutes around the world, including China, Italy, Japan, Kenya, Nigeria, Peru, the United Kingdom, and the United States. These teams contributed to the sequence dataset and to a refined human genome map that is freely accessible through public databases to the scientific community and the general public alike.^[2]

The International Genome Sample Resource was created to host and expand on the data set after the project's end.^[6]

Changes in the number and order of genes (A-D) create genetic diversity within and between populations.

Background

Since the completion of the Human Genome Project, advances in human population genetics and comparative genomics have enabled further insight into genetic diversity.^[7] The understanding of structural variations (insertions/deletions (indels), copy number variations (CNV), retroelements), single-nucleotide polymorphisms (SNPs), and natural selection was being solidified.^[8]^[9]^[10]^[11]

The diversity of human genetic variation, such as indels, was being uncovered through the investigation of human genomic variations.^{[citation needed]}

Natural selection

The project also aimed to provide evidence that can be used to explore the impact of natural selection on population differences. Patterns of DNA polymorphisms can be used to reliably detect signatures of selection and may help to identify genes that might underlie variation in disease resistance or drug metabolism.^[12]^[13] Such insights could improve understanding of phenotypic variations, genetic disorders and Mendelian inheritance, and of their effects on the survival and reproduction of different human populations.

Project description

Goals

The 1000 Genomes Project was designed to bridge the gap in knowledge between rare genetic variants that have a severe effect, predominantly on simple traits (e.g. cystic fibrosis, Huntington disease), and common genetic variants that have a mild effect and are implicated in complex traits (e.g. cognition, diabetes, heart disease).^[14]

The primary goal of this project was to create a complete and detailed catalogue of human genetic variations, which can be used for association studies relating genetic variation to disease. The consortium aimed to discover more than 95% of the variants (e.g. SNPs, CNVs, indels) with minor allele frequencies as low as 1% across the genome and 0.1–0.5% in gene regions, as well as to estimate the population frequencies, haplotype backgrounds and linkage disequilibrium patterns of variant alleles.^[15]
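The frequency thresholds in this goal can be made concrete with a short Python sketch. It computes a minor allele frequency from genotype counts at a biallelic site, and the chance that an allele of a given population frequency is present at least once in a sample. The function names and the simple independent-sampling model are illustrative assumptions, not the consortium's method; actual discovery power also depends on sequencing coverage and error rates.

```python
def minor_allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """Minor allele frequency at a biallelic site from diploid genotype counts."""
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    # Each heterozygote carries one alternate allele, each alt homozygote two.
    alt_freq = (n_het + 2 * n_hom_alt) / total_alleles
    return min(alt_freq, 1.0 - alt_freq)


def prob_allele_sampled(frequency, n_individuals):
    """Probability that an allele with the given population frequency appears
    at least once among 2 * n_individuals independently sampled chromosomes."""
    return 1.0 - (1.0 - frequency) ** (2 * n_individuals)


if __name__ == "__main__":
    # 98 reference homozygotes and 2 heterozygotes: 2 alternate alleles out of 200.
    print(minor_allele_frequency(98, 2, 0))
    # Chance that a 1% allele is carried by at least one of 1,000 sampled people.
    print(prob_allele_sampled(0.01, 1000))
```

Under this idealised model, an allele at 1% frequency is almost certain to be carried by someone in a sample of 1,000 people, which is one way to read the choice of sample size; whether it is then detected depends on the sequencing depth.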

Secondary goals included the support of better SNP and probe selection for genotyping platforms in future studies and the improvement of the human reference sequence. The completed database was expected to be a useful tool for studying regions under selection and variation in multiple populations, and for understanding the underlying processes of mutation and recombination.^[15]

Outline

The human genome consists of approximately 3 billion DNA base pairs and is estimated to carry around 20,000 protein-coding genes. In designing the study, the consortium needed to address several critical issues regarding the project metrics, such as technology challenges, data quality standards and sequence coverage.^[15]

Over the course of the next three years,^{[clarification needed]} scientists at the Sanger Institute, BGI Shenzhen and the National Human Genome Research Institute’s Large-Scale Sequencing Network planned to sequence a minimum of 1,000 human genomes. Because of the large amount of sequence data that was required, the project continued to recruit additional participants.^[14]

Almost 10 billion bases were to be sequenced per day over the two-year production phase, equating to more than two human genomes every 24 hours. The intended sequence dataset was to comprise 6 trillion DNA bases, 60-fold more sequence data than had been published in DNA databases at the time.^[14]
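The throughput figures in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the article (a genome of about 3 billion base pairs, about 10 billion bases per day, and a 6 trillion base target); the variable names are illustrative, and the results are raw sequence volumes rather than finished genomes.

```python
GENOME_SIZE_BP = 3_000_000_000          # approximate human genome size
BASES_PER_DAY = 10_000_000_000          # planned daily sequencing output
TARGET_DATASET_BP = 6_000_000_000_000   # intended dataset size

# Raw sequence produced per day, expressed in genome-sized units.
genomes_per_day = BASES_PER_DAY / GENOME_SIZE_BP
# Days needed to reach the target at the planned daily rate.
days_to_target = TARGET_DATASET_BP / BASES_PER_DAY
# Size of the full dataset in genome-sized units.
genome_equivalents = TARGET_DATASET_BP / GENOME_SIZE_BP

print(f"{genomes_per_day:.1f} genome-equivalents per day")    # 3.3
print(f"{days_to_target:.0f} days to reach the target")        # 600
print(f"{genome_equivalents:.0f} genome-equivalents in total") # 2000
```

At the planned rate the target would be reached in about 600 days, which fits within the two-year production phase, and the daily output of roughly 3.3 genome-equivalents is consistent with "more than two human genomes every 24 hours".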

To determine the final design of the full project, three pilot studies were to be carried out within the first year of the project. The first pilot was intended to genotype 180 people from 3 major geographic groups at low coverage (2×). For the second pilot study, the genomes of two nuclear families (both parents and an adult child) were to be sequenced with deep coverage (20× per genome). The third pilot study involved sequencing the coding regions (exons) of 1,000 genes in 1,000 people with deep coverage (20×).^[14]^[15]
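The difference between 2× and 20× coverage can be illustrated with a minimal Python sketch. It assumes an idealised model in which reads fall uniformly at random, so that the depth at any base follows a Poisson distribution with mean equal to the coverage. This model and the function names are assumptions made for illustration; they are not taken from the project's design documents, and real sequencing data deviate from uniform coverage.

```python
import math


def mean_coverage(total_sequenced_bases, target_size_bases):
    """Average depth: total sequenced bases divided by the size of the target."""
    return total_sequenced_bases / target_size_bases


def fraction_covered(coverage, min_depth=1):
    """Expected fraction of bases seen at least min_depth times,
    assuming Poisson-distributed depth with mean equal to coverage."""
    p_below = sum(
        math.exp(-coverage) * coverage ** k / math.factorial(k)
        for k in range(min_depth)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    for depth in (2, 20):
        print(f"{depth}x: {fraction_covered(depth):.4f} of bases covered at least once")
```

Under these assumptions, roughly 86% of bases in a single genome are read at least once at 2×, which helps explain why low-coverage designs rely on combining data across many individuals, while at 20× nearly every base is read several times.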

It was estimated that the project would likely cost more than $500 million if standard DNA sequencing technologies were used. Several newer technologies (e.g. Solexa, 454, SOLiD) were to be applied, lowering the expected costs to between $30 million and $50 million. The major support was provided by the Wellcome Trust Sanger Institute in Hinxton, England; the Beijing Genomics Institute, Shenzhen (BGI Shenzhen), China; and the NHGRI, part of the National Institutes of Health (NIH).^[14]

In keeping with the Fort Lauderdale principles,^[16] all genome sequence data (including variant calls) was made freely available as the project progressed and can be downloaded via FTP from the 1000 Genomes Project webpage.^[17]

Human genome samples

Locations of population samples of the 1000 Genomes Project.^[18] Each circle represents the number of sequences in the final release.

Based on the overall goals for the project, the samples were to be chosen to provide power in populations where association studies for common diseases were being carried out. Furthermore, the samples did not need to have medical or phenotype information, since the proposed catalogue was to be a basic resource on human variation.^[15]

For the pilot studies, human genome samples from the HapMap collection were to be sequenced. It was considered useful to focus on samples that had additional data available (such as ENCODE sequence, genome-wide genotypes, fosmid-end sequence, structural variation assays, and gene expression) so that the results could be compared with those from other projects.^[15]

Complying with extensive ethical procedures, the 1000 Genomes Project then planned to use samples from volunteer donors. The following populations were to be included in the study: Yoruba in Ibadan (YRI), Nigeria; Japanese in Tokyo (JPT); Chinese in Beijing (CHB); Utah residents with ancestry from northern and western Europe (CEU); Luhya in Webuye, Kenya (LWK); Maasai in Kinyawa, Kenya (MKK); Toscani in Italy (TSI); Peruvians in Lima, Peru (PEL); Gujarati Indians in Houston (GIH); Chinese in metropolitan Denver (CHD); people of Mexican ancestry in Los Angeles (MXL); and people of African ancestry in the southwestern United States (ASW).^[14]

ID	Place	Population	Detail
ASW	*	African Ancestry in Southwestern US	[1]
ACB	*	African Caribbean in Barbados	[2]
BEB		Bengali in Bangladesh	[3]
GBR		British from England and Scotland	[4]
CDX		Chinese Dai in Xishuangbanna, China	[5]
CLM		Colombian in Medellín, Colombia	[6]
ESN		Esan in Nigeria	[7]
FIN		Finnish in Finland	[8]
GWD		Gambian in Western Division – Mandinka	[9]
GIH	*	Gujarati Indians in Houston, Texas, United States	[10]
CHB		Han Chinese in Beijing, China	[11]
CHS		Han Chinese South China	[12]
IBS		Iberian populations in Spain	[13]
ITU	*	Indian Telugu in the United Kingdom	[14]
JPT		Japanese in Tokyo, Japan	[15]
KHV		Kinh in Ho Chi Minh City, Vietnam	[16]
LWK		Luhya in Webuye, Kenya	[17]
MSL		Mende in Sierra Leone	[18]
MXL	*	Mexican Ancestry in Los Angeles, California, United States	[19]
PEL		Peruvian in Lima, Peru	[20]
PUR		Puerto Rican in Puerto Rico	[21]
PJL		Punjabi in Lahore, Pakistan	[22]
STU	*	Sri Lankan Tamil in the United Kingdom	[23]
TSI		Toscani in Italy	[24]
YRI		Yoruba in Ibadan, Nigeria	[25]
CEU	*	Utah residents with Northern and Western European ancestry from the CEPH collection	[26]

* Population that was collected in diaspora

Community meeting

Data generated by the 1000 Genomes Project is widely used by the genetics community, making the first 1000 Genomes Project paper one of the most cited papers in biology.^[19] To support this user community, the project held a community analysis meeting in July 2012 that included talks highlighting key project discoveries, their impact on population genetics and human disease studies, and summaries of other large-scale sequencing studies.^[20]

Project findings

Pilot phase

The pilot phase consisted of three projects:

low-coverage whole-genome sequencing of 179 individuals from 4 populations
high-coverage sequencing of 2 trios (mother-father-child)
exon-targeted sequencing of 697 individuals from 7 populations

It was found that on average, each person carries around 250–300 loss-of-function variants in annotated genes and 50-100 variants previously implicated in inherited disorders. Based on the two trios, it is estimated that the rate of de novo germline mutation is approximately 10⁻⁸ per base per generation.^[1]
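As an order-of-magnitude illustration of what this mutation rate implies, the following Python sketch multiplies the rate by the approximate genome size stated earlier in the article (about 3 billion base pairs). The function name and the choice to count two inherited genome copies are assumptions for illustration; this is not a figure reported by the project.

```python
def expected_de_novo_mutations(rate_per_base, genome_size_bp, copies=2):
    """Expected number of new germline mutations in one child, given a
    per-base, per-generation rate and the number of inherited genome copies."""
    return rate_per_base * genome_size_bp * copies


if __name__ == "__main__":
    per_copy = expected_de_novo_mutations(1e-8, 3_000_000_000, copies=1)
    per_child = expected_de_novo_mutations(1e-8, 3_000_000_000, copies=2)
    print(f"about {per_copy:.0f} new mutations per inherited genome copy")
    print(f"about {per_child:.0f} new mutations per child")
```

With these inputs the expectation is on the order of tens of new mutations per child, which shows why deep sequencing of parent-child trios is needed to distinguish such rare events from sequencing errors.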

References

1. Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, et al. (October 2010). "A map of human genome variation from population-scale sequencing". Nature. 467 (7319): 1061–73. Bibcode:2010Natur.467.1061T. doi:10.1038/nature09534. PMC 3042601. PMID 20981092.
2. Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, et al. (November 2012). "An integrated map of genetic variation from 1,092 human genomes". Nature. 491 (7422): 56–65. Bibcode:2012Natur.491...56T. doi:10.1038/nature11632. PMC 3498066. PMID 23128226.
3. Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, et al. (October 2015). "A global reference for human genetic variation". Nature. 526 (7571): 68–74. Bibcode:2015Natur.526...68T. doi:10.1038/nature15393. PMC 4750478. PMID 26432245.
4. Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, et al. (October 2015). "An integrated map of structural variation in 2,504 human genomes". Nature. 526 (7571): 75–81. Bibcode:2015Natur.526...75.. doi:10.1038/nature15394. PMC 4617611. PMID 26432246.
5. "Variety of life". Nature News & Comment. 2015-09-30. Retrieved 2015-10-15.
6. "1000 Genomes Project | Scientific Computing and Data". Mount Sinai School of Medicine. 2020-07-07. Retrieved 2023-10-01.
7. Nielsen R (October 2010). "Genomics: In search of rare human variants". Nature. 467 (7319): 1050–1. Bibcode:2010Natur.467.1050N. doi:10.1038/4671050a. PMID 20981085.
8. JC Long, Human Genetic Variation: The mechanisms and results of microevolution, American Anthropological Association (2004)
9. Anzai T, Shiina T, Kimura N, Yanagiya K, Kohara S, Shigenari A, et al. (June 2003). "Comparative sequencing of human and chimpanzee MHC class I regions unveils insertions/deletions as the major path to genomic divergence". Proceedings of the National Academy of Sciences of the United States of America. 100 (13): 7708–13. Bibcode:2003PNAS..100.7708A. doi:10.1073/pnas.1230533100. PMC 164652. PMID 12799463.
10. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, et al. (November 2006). "Global variation in copy number in the human genome". Nature. 444 (7118): 444–54. Bibcode:2006Natur.444..444R. doi:10.1038/nature05329. PMC 2669898. PMID 17122850.
11. Barreiro LB, Laval G, Quach H, Patin E, Quintana-Murci L (March 2008). "Natural selection has driven population differentiation in modern humans". Nature Genetics. 40 (3): 340–5. doi:10.1038/ng.78. PMID 18246066. S2CID 205357396.
12. EE Harris et al., The molecular signature of selection underlying human adaptations, Yearbook of Physical Anthropology 49: 89-130 (2006)
13. Bamshad M, Wooding SP (February 2003). "Signatures of natural selection in the human genome". Nature Reviews. Genetics. 4 (2): 99–111. doi:10.1038/nrg999. PMID 12560807. S2CID 13722452.
14. G Spencer, International Consortium Announces the 1000 Genomes Project, EMBARGOED (2008) http://www.nih.gov/news/health/jan2008/nhgri-22.htm
15. Meeting Report: A Workshop to Plan a Deep Catalog of Human Genetic Variation, (2007) http://www.1000genomes.org/sites/1000genomes.org/files/docs/1000Genomes-MeetingReport.pdf
16. https://web.archive.org/web/20131228183230/http://www.genome.gov/pages/research/wellcomereport0303.pdf Archived 2013-12-28 at the Wayback Machine
17. 1000 genomes project data webpage
18. Oleksyk TK, Brukhin V, O'Brien SJ (2015). "The Genome Russia project: closing the largest remaining omission on the world Genome map". GigaScience. 4: 53. doi:10.1186/s13742-015-0095-0. PMC 4644275. PMID 26568821.
19. C. King (2012) The Hottest Research of 2011. Science Watch http://archive.sciencewatch.com/newsletter/2012/201203/hottest_research_2012/
20. 1000 Genomes Project Community Analysis Meeting http://1000gconference.sph.umich.edu/

External links

1000 Genomes - A Deep Catalog of Human Genetic Variation - official web page
International HapMap Project Archived 2014-04-16 at the Wayback Machine - official web page
Human Genome Project Information
