
The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine
Peter D Stenson
Matthew Mort
Edward V Ball
Katy Shaw
Andrew D Phillips
David N Cooper
Corresponding author.
Received 2013 Jul 29; Accepted 2013 Sep 3; Issue date 2014.
Open Access. This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
Abstract
The Human Gene Mutation Database (HGMD®) is a comprehensive collection of germline mutations in nuclear genes that underlie, or are associated with, human inherited disease. By June 2013, the database contained over 141,000 different lesions detected in over 5,700 different genes, with new mutation entries currently accumulating at a rate exceeding 10,000 per annum. HGMD was originally established in 1996 for the scientific study of mutational mechanisms in human genes. However, it has since acquired a much broader utility as a central unified disease-oriented mutation repository utilized by human molecular geneticists, genome scientists, molecular biologists, clinicians and genetic counsellors as well as by those specializing in biopharmaceuticals, bioinformatics and personalized genomics. The public version of HGMD (http://www.hgmd.org) is freely available to registered users from academic institutions/non-profit organizations whilst the subscription version (HGMD Professional) is available to academic, clinical and commercial users under license via BIOBASE GmbH.
Introduction
The Human Gene Mutation Database (HGMD®) represents an attempt to collate all known gene lesions responsible for causing human inherited disease together with disease-associated/functional polymorphisms that have been published in the peer-reviewed literature. These data comprise single base-pair substitutions in coding, regulatory and splicing-relevant (both intronic and exonic) regions of human nuclear genes, as well as micro-deletions and micro-insertions, combined micro-insertions/micro-deletions (indels) of 20 bp or less, repeat variations, gross lesions (deletions, insertions and duplications of greater than 20 bp, up to and including a single characterized gene or group of contiguous genes that are directly involved in the aetiology of the disease/phenotype) and complex rearrangements (including inversions, translocations and complex indels). Mutation data are summarized in Table 1.
Table 1.
Numbers of different mutations by mutation type present in HGMD Professional 2013.2 and the publicly available version of the database (June 28th 2013)
| Mutation type | Total in HGMD Professional | Total with chromosomal coordinates | Total publicly available |
|---|---|---|---|
| Missense substitutions | 62,368 | 61,845 | 44,933 |
| Nonsense substitutions | 15,781 | 15,574 | 11,306 |
| Splicing substitutions | 13,030 | 12,538 | 9,467 |
| Regulatory substitutions | 2,751 | 2,713 | 1,753 |
| Micro-deletions ≤20 bp | 21,681 | 21,134 | 15,796 |
| Micro-insertions ≤20 bp | 8,994 | 8,721 | 6,494 |
| Micro-indels ≤20 bp | 2,083 | 2,004 | 1,459 |
| Gross deletions >20 bp | 10,267 | 0 | 6,156 |
| Gross insertions/duplications >20 bp | 2,376 | 0 | 1,253 |
| Complex rearrangements | 1,409 | 0 | 946 |
| Repeat variations | 421 | 0 | 305 |
| Totals | 141,161 | 124,529 | 99,868 |
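The arithmetic behind Table 1 can be checked with a short script. The sketch below transcribes the counts from the table, recomputes the column totals and derives the proportion of entries with chromosomal coordinates for two groups of mutation types. The grouping into "substitutions" and "micro-lesions" and all function and variable names are assumptions made for illustration; they are not part of HGMD.

```python
# Counts transcribed from Table 1:
# (HGMD Professional, with chromosomal coordinates, publicly available)
TABLE_1 = {
    "Missense substitutions": (62368, 61845, 44933),
    "Nonsense substitutions": (15781, 15574, 11306),
    "Splicing substitutions": (13030, 12538, 9467),
    "Regulatory substitutions": (2751, 2713, 1753),
    "Micro-deletions": (21681, 21134, 15796),
    "Micro-insertions": (8994, 8721, 6494),
    "Micro-indels": (2083, 2004, 1459),
    "Gross deletions": (10267, 0, 6156),
    "Gross insertions/duplications": (2376, 0, 1253),
    "Complex rearrangements": (1409, 0, 946),
    "Repeat variations": (421, 0, 305),
}

# Illustrative groupings (an assumption, chosen to mirror the table rows).
SUBSTITUTIONS = (
    "Missense substitutions",
    "Nonsense substitutions",
    "Splicing substitutions",
    "Regulatory substitutions",
)
MICRO_LESIONS = ("Micro-deletions", "Micro-insertions", "Micro-indels")


def column_totals(table):
    """Sum each of the three count columns over all mutation types."""
    return tuple(sum(row[i] for row in table.values()) for i in range(3))


def coordinate_coverage(table, categories):
    """Percentage of Professional entries that carry chromosomal coordinates."""
    total = sum(table[c][0] for c in categories)
    mapped = sum(table[c][1] for c in categories)
    return 100.0 * mapped / total


print(column_totals(TABLE_1))
print(round(coordinate_coverage(TABLE_1, SUBSTITUTIONS), 1))
print(round(coordinate_coverage(TABLE_1, MICRO_LESIONS), 1))
```

Under these groupings, the recomputed totals agree with the "Totals" row, and the two coverage values agree with the percentages quoted for HGMD Professional later in the text.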
HGMD does not include either somatic or mitochondrial mutations, which are well covered by COSMIC (Forbes et al. 2011) and MitoMap (Ruiz-Pesini et al. 2007), respectively. HGMD also does not attempt to provide comprehensive coverage of pharmacological variants (except for those variants where evidence supporting a functional impairment has been provided); such variants are covered by PharmGKB (Thorn et al. 2010). Finally, HGMD is not a general genetic variation database; users interested in this type of variant should visit dbSNP (Sherry et al. 2001) or the Exome Variant Server (http://evs.gs.washington.edu/EVS/).
HGMD was originally established for the scientific study of mutational mechanisms in human genes causing inherited disease (Cooper et al. 2010), but has since acquired a much broader utility as a central unified repository for germline disease-related functional variation. It is now routinely accessed and utilized by next generation sequencing (NGS) project researchers, human molecular geneticists, molecular biologists, clinicians and genetic counsellors as well as by those specializing in biopharmaceuticals, bioinformatics and personalized genomics.
HGMD is available in two versions: one public, one obtainable by subscription. The public version of HGMD (http://www.hgmd.org) is freely available to registered users from academic institutions/non-profit organizations. This version is, however, maintained in a basic form that is only updated twice per annum, is permanently 3 years out of date, and does not contain any of the additional annotations or extra features present in HGMD Professional (such as GRCh37/hg19 genomic chromosomal coordinates, HGVS nomenclature and additional literature references, see Table 2). The Professional version is available to both commercial and academic/non-profit users via subscription from BIOBASE GmbH (http://www.biobase-international.com).
Table 2.
Differences between HGMD Professional and HGMD Public
| Feature | HGMD Professional | HGMD Public |
|---|---|---|
| Up-to-date mutation data | ✓ | ✗ |
| Curator comments | ✓ | ✓ |
| Quarterly updates+ | ✓ | ✗ |
| Gene-oriented search | ✓ | ✓ |
| Mutation-oriented search | ✓ | ✗ |
| Reference-oriented search | ✓ | ✗ |
| Batch search mode | ✓ | ✗ |
| Chromosomal coordinates | ✓ | ✗ |
| HGVS nomenclature | ✓ | ✗ |
| Additional literature references | ✓ | ✗ |
| Tracked variant history | ✓ | ✗ |
| dbSNP identifiers | ✓ | ✗ |
| Enhanced search options | ✓ | ✗ |
| Advanced search features | ✓ | ✗ |
| Disease ontology terms* | ✓ | ✗ |
| Data in VCF* | ✓ | ✗ |
| Downloadable version | ✓ | ✗ |
+ HGMD Public is updated on a 6-monthly basis
* Download customers only
Acquisition of mutation data
All HGMD mutation data are manually curated from the scientific literature. Identification of relevant literature reports is carried out via a combination of manual journal screening and automated procedures. The database currently contains mutation entries obtained from over 41,000 primary and 15,000 additional (supplementary) literature reports published in more than 1,950 different journals. Of >10,000 identified articles screened for mutation data during 2012, 35 % contained novel mutation data, 29 % contained additional useful information (e.g. in vitro functional data or further clinical or phenotypic information) and were therefore cited as additional references, whilst the remaining 36 % of articles contained no novel mutation data or supporting information to warrant their inclusion as either primary or supplementary references in HGMD. The number of articles screened by HGMD is increasing on a yearly basis; however, we impose no prior limit upon the number of articles we include as supplementary references for a given mutation.
For ~4 % of all the missense/nonsense mutations reported in the literature during 2012, it was necessary for the HGMD curators to contact the original authors to obtain correction and/or clarification of the nature or precise location of the mutations in question. However, only half of the mutations that required author contact were satisfactorily resolved by these means, leading to their inclusion in HGMD; the ~2 % of unresolved missense/nonsense mutations will not be entered into HGMD unless or until the nature or precise location of the mutation(s) in question is determined to the satisfaction of the HGMD curators. Such data (currently 366 entries) are, however, retained indefinitely by HGMD as part of a “Bad Bank” of inadequately described mutations.
Classes of variant listed in HGMD
There are six different classes of variant present in HGMD (Figs. 1, 2). Disease-causing mutations (DM) are entered into HGMD where the authors of the corresponding report(s) have demonstrated that the reported mutation(s) are involved in conferring the associated clinical phenotype upon the individuals concerned. The DM classification may, however, also appear with a question mark (DM?), denoting a probable/possible pathological mutation, reported to be pathogenic in the corresponding report, but where (1) the author has indicated that there may be some degree of uncertainty; (2) the HGMD curators believe greater interpretational caution is warranted; or (3) subsequent evidence has appeared in the literature which has called the putatively deleterious nature of the variant into question.
Fig. 1.
5,734 genes are listed in HGMD Professional 2013.2, subdivided here by variant class
Fig. 2.
HGMD annual mutation totals subdivided by variant class. *2013 figures to June 28th
Disease-associated polymorphisms (DP) are entered into HGMD where there is evidence for a significant association with a disease/clinical phenotype along with additional evidence that the polymorphism is itself likely to be of functional relevance (e.g. as a consequence of genic/genomic location, evolutionary conservation, transcription factor binding potential, etc.), although there may be no direct evidence (e.g. from an expression study) of a functional effect. Functional polymorphisms (FP) are included in HGMD where the reporting authors have shown that the polymorphism in question exerts a direct functional effect (e.g. by means of an in vitro reporter gene assay or alternatively by protein structure, function or expression studies), but with no disease association reported as yet. Disease-associated polymorphisms with supporting functional evidence (DFP) must meet both of the above criteria in that the polymorphism should not only be reported to be significantly associated with disease but should also display evidence of being of direct functional relevance.
Copy number variations (CNVs) represent an important subset of potentially functional disease-associated variation. While HGMD does not wish to replicate the excellent curatorial work of other resources (e.g. the Database of Genomic Variants, http://dgv.tcag.ca/dgv/app/home; DECIPHER, http://decipher.sanger.ac.uk/; and Copy Number Variation in Disease, http://202.97.205.78/CNVD/), we are nevertheless interested in including such variants (as gross deletions or duplications) if they meet certain criteria. Therefore, HGMD will include such variations if they have been shown to be both of functional significance and associated with disease, and involve a single characterized gene that has itself been directly implicated in the disease association. Such variants would be entered under one of the above-mentioned polymorphism categories, depending upon the supporting evidence provided by the authors of the original reporting article.
In the opinion of the HGMD curators, the polymorphism data present in HGMD should be viewed with a considerable degree of caution owing to (1) the possibility that the observed disease association may be simply due to a linkage disequilibrium effect rather than a bona fide underlying functional mechanism and (2) the fact that in vitro studies are not invariably accurate indicators of in vivo functionality (Cirulli and Goldstein 2007; Dimas et al. 2009).
Finally, frameshift or truncating variants (FTV) are polymorphic or rare variants reported in the literature that are predicted to truncate or otherwise alter the length of the gene product (i.e. a stop-gain, stop-loss or frameshift variant) but with no disease association reported as yet. Most known FTVs have been identified during the course of large-scale genome/exome screening studies (involving either patient panels or apparently healthy individuals from the general population). They may be considered to represent either latent protein deficiencies or, potentially, heterozygous carrier states for recessive disorders. Coverage of FTVs is far from being comprehensive at this juncture, and it remains unclear what proportion will turn out to be clinically significant.
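The six variant classes can be summarized as a simple decision rule. The following is a minimal sketch only: HGMD classification is a manual curatorial judgement, and the evidence flags, their names and the order in which they are tested are assumptions introduced here to make the stated criteria concrete.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class VariantClass(Enum):
    DM = "DM"            # disease-causing mutation
    DM_QUERY = "DM?"     # probable/possible pathological mutation
    DP = "DP"            # disease-associated polymorphism
    FP = "FP"            # functional polymorphism
    DFP = "DFP"          # disease-associated polymorphism with functional evidence
    FTV = "FTV"          # frameshift or truncating variant


@dataclass(frozen=True)
class Evidence:
    """Hypothetical summary of what a literature report shows for one variant."""
    reported_disease_causing: bool = False
    pathogenicity_in_doubt: bool = False       # author, curator or later doubt
    disease_association: bool = False          # significant association reported
    likely_functional: bool = False            # indirect support, e.g. conservation
    direct_functional_evidence: bool = False   # e.g. in vitro reporter assay
    alters_product_length: bool = False        # stop-gain, stop-loss or frameshift


def classify(e: Evidence) -> Optional[VariantClass]:
    """Map evidence flags to a variant class; None means no class applies."""
    if e.reported_disease_causing:
        return VariantClass.DM_QUERY if e.pathogenicity_in_doubt else VariantClass.DM
    if e.disease_association and e.direct_functional_evidence:
        return VariantClass.DFP
    if e.disease_association and e.likely_functional:
        return VariantClass.DP
    if e.direct_functional_evidence:
        return VariantClass.FP   # functional effect, no association reported
    if e.alters_product_length and not e.disease_association:
        return VariantClass.FTV
    return None


print(classify(Evidence(disease_association=True, likely_functional=True)).value)
```

The ordering reflects one reading of the definitions above: a report of pathogenicity takes precedence, DFP requires both criteria, and FP and FTV apply only when no disease association has been reported.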
The HGMD curators have adopted a policy of continual reassessment of the curated content within the database. If and when additional and important new information pertaining to a specific mutation entry becomes available (e.g. questionable pathogenicity, confirmed pathogenicity, additional clinical or laboratory phenotypes, population frequency data, supporting functional studies, etc.), then the mutation entry may be revised or even re-categorized. Alternatively, a comment or additional reference may be added in order to communicate this new information to users. Where new information becomes available which suggests that a given disease-causing mutation (DM) is likely to be of questionable pathological relevance or possibly a neutral polymorphism (on the basis of additional case reports, genome/population screening studies, presence in dbSNP with reliable population frequency data, etc.), it may be flagged with a question mark (DM?) or even removed from the database entirely if it turns out to have been erroneously included ab initio. In a recent re-curation exercise, a total of 539 mutations were re-examined due to their presence in the 1000 Genomes Project dataset at a frequency of >1 %. Of the total re-examined, 33 mutations were removed from HGMD, 109 were re-categorized and 220 had additional comments or references added to further justify their inclusion in HGMD (Xue et al. 2012). One reason why some HGMD-listed mutations are often to be found among 1000 Genomes Project data is that many pathogenic lesions are found quite frequently in the population at large (Nishiguchi and Rivolta 2012; Andreasen et al. 2013; Lazarin et al. 2013; Cooper et al. 2013). In addition to internal curation, users of HGMD Professional may utilize a feedback function in order to inform the HGMD curators of relevant new or missing information, to request corrections or to ask for the reclassification or removal of a listed variant.
Most of the clinical phenotypes attributed to DMs in HGMD represent individually rare conditions that are generally regarded as monogenic diseases. However, it is important to note that HGMD also considers a few silent protein deficiencies or biochemical phenotypes (e.g. butyrylcholinesterase deficiency, reduced oxygen affinity haemoglobin, etc.) to be worthy of inclusion since they are potentially disease-relevant (even if they are relatively common in the general population); such variants may well be assigned to the DM category.
For individual mutations in HGMD, the provision of zygosity information (heterozygous, homozygous or compound heterozygous) has not been attempted. Reasons for this include (1) the fact that this information is not always unambiguously provided in the corresponding article; (2) the possibility that a given mutation may be pathogenic irrespective of the zygosity in which it is found; (3) the clinical consequences of zygosity may often be modified by other genetic variants either in cis or in trans and (4) the general phenomenon of variable or reduced penetrance which ensures that the genotype is not invariably predictive of the phenotype (Cooper et al. 2013). Thus, information pertaining to zygosity would not always be helpful or informative with regard to ascertaining or predicting the clinical phenotype, and indeed might even prove inaccurate or misleading.
HGMD users should not assume that, just because a mutation is labelled “DM”, it is known or believed to be pathogenic in all individuals harbouring it (i.e. that the mutation exhibits 100 % penetrance). This is not invariably going to be the case and many “disease-causing mutations” will display reduced or variable penetrance for a variety of different reasons (reviewed by Cooper et al. 2013). Indeed, next generation sequencing programmes (such as the 1000 Genomes Project) are now identifying considerable numbers of “DM” mutations in apparently healthy individuals (MacArthur et al. 2012; Xue et al. 2012). Such lesions should not automatically be regarded as being clinically irrelevant because it is quite possible that they represent low-penetrance, mild or late onset, or more complex disease susceptibility alleles, as opposed to neutral variants (Cooper et al. 2013).
It has always been HGMD policy to enter a variant into the database even if its pathological relevance may be questionable (while indicating this fact wherever feasible to our users), rather than run the risk of inadvertently excluding a variant that may be directly (or indirectly) relevant to disease. We have taken several steps to highlight such equivocation in HGMD, viz. the recent introduction of the DM? variant class, a dbSNP 1000 Genomes frequency flag (to highlight HGMD variants that are also present in dbSNP, with allele frequency information included; see below) and the provision of additional literature citations where the pathogenicity of the variant may have been subsequently either questioned or confirmed. This latter point is particularly pertinent in the clinical setting, where a greater burden of proof may be required for use in diagnostic and predictive medicine, and when considering the return of incidental findings to patients after testing (Green et al. 2012, 2013; Ng et al. 2013; Gonsalves et al. 2013).
HGMD Professional
HGMD Professional has been developed to serve as the subscription version of HGMD, and is available to both commercial and academic customers under license from BIOBASE GmbH. HGMD Professional allows access to up-to-date mutation data with a quarterly release cycle; this version is therefore essential for checking the novelty of newly found mutations. HGMD Professional contains many features not available in the free public version (Table 2). More powerful search tools in the form of an expanded search engine with full text Boolean searching are provided. A batch search mode has recently been developed to allow users to search HGMD using gene (e.g. OMIM IDs) and variant (e.g. dbSNP IDs) oriented lists. Users can employ these tools to perform additional searches for gene-specific (e.g. chromosomal locations, gene names/aliases and gene ontology), mutation-specific (e.g. chromosomal coordinates, HGVS nomenclature, dbSNP ID) or citation-specific (e.g. first author, publication year, PubMed ID) information. The provision of chromosomal coordinates (hg19) for the vast majority of our nucleotide substitutions (98.7 % coverage) and other micro-lesions (97.3 % coverage) has made HGMD an invaluable tool for the large-scale analysis of NGS datasets such as the 1000 Genomes Project (1000 Genomes Project Consortium 2010, 2012). Additional information is also provided on a mutation-specific basis including curatorial comments pertaining to particular mutations (for example, if the mutation data presented required in-house correction in relation to the data presented in the original publication [5–10 % of entries], or if the clinical phenotype is associated with a more complex, i.e. a digenic or SNP in-cis inheritance pattern), additional reports comprising functional characterisation, further phenotypic information, comparative biochemical parameters, evolutionary conservation and SIFT (Sim et al. 2012) and MutPred (Li et al. 2009) predictions. These additional annotations are updated on a regular basis.
Recently, HGMD clinical phenotypes have been annotated against the Unified Medical Language System (UMLS) using a combination of manual curation and natural language processing. The UMLS is a comprehensive collection of biomedical concepts and the relationships between them (http://www.nlm.nih.gov/research/umls/). These UMLS mappings provide users with a more accurate and expanded phenotype search. Thus, searches using alternative disease names will return the same result-set, e.g. a search for “breast cancer” would yield identical results to a search for “malignant breast tumour”. In addition, utilizing the UMLS allows for powerful semantic searching (e.g. searches for all mutations linked to blood disorders or all immune disorders).
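The idea behind a concept-based phenotype search can be illustrated with a toy example. In the sketch below, disease names are first normalized to a concept identifier, so that synonyms retrieve the same entries, and a broader/narrower relation between concepts supports category-level queries. All identifiers, entries and function names are placeholders invented for illustration; they are not real UMLS concept identifiers, HGMD accessions or HGMD search functions.

```python
from typing import Dict, List, Optional

# Placeholder concept identifiers (NOT real UMLS identifiers).
NAME_TO_CONCEPT: Dict[str, str] = {
    "breast cancer": "X001",
    "malignant breast tumour": "X001",
    "haemophilia a": "X002",
    "factor viii deficiency": "X002",
}

# Narrower concept -> broader concept (placeholder hierarchy).
BROADER: Dict[str, str] = {
    "X001": "X100",  # stands in for a "neoplasm" category
    "X002": "X200",  # stands in for a "blood disorder" category
}

# Placeholder mutation entries annotated with a concept.
ENTRY_CONCEPT: Dict[str, str] = {
    "entry-1": "X001",
    "entry-2": "X002",
    "entry-3": "X002",
}


def concept_for(name: str) -> Optional[str]:
    """Normalize case and whitespace, then look up the concept identifier."""
    return NAME_TO_CONCEPT.get(" ".join(name.lower().split()))


def search_by_name(name: str) -> List[str]:
    """Return entries annotated with the concept that the name maps to."""
    concept = concept_for(name)
    if concept is None:
        return []
    return sorted(e for e, c in ENTRY_CONCEPT.items() if c == concept)


def is_within(concept: str, ancestor: str) -> bool:
    """Walk up the broader-concept chain looking for the ancestor."""
    current: Optional[str] = concept
    while current is not None:
        if current == ancestor:
            return True
        current = BROADER.get(current)
    return False


def search_by_category(ancestor: str) -> List[str]:
    """Return entries whose concept falls under the given broader concept."""
    return sorted(e for e, c in ENTRY_CONCEPT.items() if is_within(c, ancestor))


print(search_by_name("breast cancer") == search_by_name("Malignant breast tumour"))
print(search_by_category("X200"))
```

Because both names resolve to the same concept before any entry is consulted, the two searches necessarily return the same result-set, which is the behaviour described above.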
Another new feature involves the highlighting of HGMD entries where the pathogenicity of the variant may have been cast into doubt by virtue of its allele frequency. HGMD Professional now displays a frequency flag when a listed variant is also found in dbSNP, and population frequency data from the 1000 Genomes Project are also provided. HGMD data have also recently been made available in Variant Call Format or VCF (Danecek et al. 2011), which will facilitate the comparison of HGMD with large NGS datasets. In addition to searching and viewing mutation data in a variety of ways, users of HGMD Professional may utilize a new feedback facility to submit corrections to the database curators or to request additional features.
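To show how a VCF representation facilitates such comparisons, the sketch below reads variant records from VCF-formatted lines and looks each one up in a catalogue keyed by chromosome, position, reference allele and alternate allele. The catalogue contents, coordinates and function names are invented for illustration and do not correspond to real HGMD entries; a real comparison would also require both datasets to use the same genome build, chromosome naming and allele normalization.

```python
from typing import Dict, Iterable, Iterator, Tuple

# (chromosome, 1-based position, reference allele, alternate allele)
VariantKey = Tuple[str, int, str, str]


def iter_vcf_keys(lines: Iterable[str]) -> Iterator[VariantKey]:
    """Yield one key per alternate allele from tab-delimited VCF data lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header row
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # multi-allelic sites list several ALTs
            yield (chrom, int(pos), ref, allele)


def annotate(lines: Iterable[str],
             catalogue: Dict[VariantKey, str]) -> Dict[VariantKey, str]:
    """Return the subset of sample variants present in the catalogue."""
    return {k: catalogue[k] for k in iter_vcf_keys(lines) if k in catalogue}


# Invented sample data and catalogue, for illustration only.
SAMPLE_VCF = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tG\tA\t.\t.\t.",
    "2\t2000\t.\tC\tT,G\t.\t.\t.",
    "3\t3000\t.\tAT\tA\t.\t.\t.",
]
CATALOGUE: Dict[VariantKey, str] = {
    ("1", 1000, "G", "A"): "DM",
    ("2", 2000, "C", "G"): "DM?",
    ("4", 4000, "T", "C"): "DP",
}

for key, variant_class in sorted(annotate(SAMPLE_VCF, CATALOGUE).items()):
    print(key, variant_class)
```

Keying on the allele pair as well as the position matters here: a sample variant at a catalogued position but with a different alternate allele is not reported as a match.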
HGMD Professional also contains an Advanced Search suite which has been designed to enhance mutation searching, viewing and retrieval. Two of the main types of mutation in HGMD (single-nucleotide substitutions and micro-lesions) can be interrogated with this toolset. Datasets for more than one mutation type may be combined (for example, micro-deletions, micro-insertions and indels) to enable more powerful searching across comparable types of mutation. When using the Advanced Search, users can tailor their queries with more specific criteria, including functional profile (e.g. in vitro and in silico characterized transcription factor binding sites, post-translational modifications, microRNA binding sites, upstream ORFs, and catalytic residues, see Fig. 3); amino-acid change; nucleotide substitution; size and/or sequence composition of micro-deletions, micro-insertions or indels; pre- or user-defined sequence motifs (both those created and those abolished by the mutation); dbSNP number; keywords found in the article title or abstract. Results returned by the Advanced Search can be downloaded as tab-delimited text or a genome browser track, ready to be used in different applications. The Advanced Search also includes a batch mode called “Mutation Mart” to query HGMD via multiple identifiers including dbSNP, Entrez gene (http://www.ncbi.nlm.nih.gov/gene) and PubMed. HGMD Professional is available to subscribers either as an online only package or in downloadable form enabling users to incorporate HGMD data into their local variant analysis pipelines (http://www.biobase-international.com).
Fig. 3.
Advanced nucleotide substitutions search in HGMD Professional
Other variant databases
Several other databases are available that attempt to record disease-causing or disease-associated (i.e. pathogenic) variation. These include the Online Mendelian Inheritance in Man, OMIM (http://www.omim.org/; Amberger et al. 2009), ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/), dbSNP (http://www.ncbi.nlm.nih.gov/SNP/; Sherry et al. 2001) and an assorted collection of locus-specific mutation databases (LSDBs) (http://www.hgvs.org/dblist/glsdb.html). OMIM does not provide statistics for allelic variants on its website; however, 22,901 germline OMIM variants appear to have been added to ClinVar, which itself contains a total of 25,375 pathogenic and probable pathogenic germline variants, while dbSNP contains 23,973 pathogenic or probable pathogenic germline variants (all databases were accessed July 10th 2013). Owing to the highly dispersed nature of the LSDBs and the potential for duplication between databases, accurate like-for-like statistics on bona fide germline disease-causing (not merely neutral) variation are difficult to obtain. Since OMIM only records a limited number of variants per gene, and ClinVar is still in its infancy, HGMD is the only database of human pathological mutations that approaches comprehensive coverage of the peer-reviewed literature (Peterson et al. 2013). Since ClinVar and the LSDBs contain unpublished (non-peer reviewed) mutation data, the question has arisen as to whether HGMD should also include these data (Patrinos et al. 2012). However, several obstacles have been encountered by the LSDBs, including serious problems pertaining to data quality as well as issues of data provenance and consent. HGMD has therefore taken the decision not to include such data at this time.
How HGMD is utilized
Registered users of the public HGMD website currently number in excess of 60,000. Users may not download HGMD data in their entirety. However, mutation data may be made available at the discretion of the curators for non-commercial research purposes. Potential collaborators who wish to access HGMD data in full are required to sign a confidentiality agreement.
HGMD data have been used to perform an extensive series of meta-analyses on different types of gene mutation causing human inherited disease. These studies have helped to improve our understanding of mutational spectra and the molecular mechanisms underlying human inherited disease (Cooper et al. 2011). They have served to demonstrate not only that human gene mutation is an inherently non-random process but also that the nature, location and frequency of different types of mutation are shaped in large part by the local DNA sequence environment (Cooper et al. 2011). HGMD data have been used extensively in several international collaborative research projects including the 1000 Genomes Project (1000 Genomes Project Consortium 2010, 2012), where a surprising number of HGMD variants were found in apparently healthy individuals. They have also been used in the comparative analysis of several orthologous genomes including gorilla (Scally et al. 2012), cynomolgus and Chinese macaque (Yan et al. 2011), Rhesus macaque (Rhesus Macaque Genome Sequencing and Analysis Consortium 2007) and rat (Rat Genome Sequencing Project Consortium 2004), in which many apparently disease-causing mutations in human were found as wild type (‘compensated mutations’).
In a clinical setting, HGMD is widely utilized by many groups in ongoing NGS diagnostic (Johnston et al. 2012; Calvo et al. 2012; Bell et al. 2011) and human genome sequencing (Tong et al. 2010; Kim et al. 2009) programmes. HGMD has also been used by a number of different groups to aid the development of post-NGS variant interpretation algorithms including MutPred (Li et al. 2009), PROVEAN (Choi et al. 2012), CAROL (Lopes et al. 2012), CRAVAT (Douville et al. 2013), VEST (Carter et al. 2013) and FATHMM (Shihab et al. 2013). Finally, HGMD has been used as a resource for structural biologists in the reconstruction of protein interaction networks (Wang et al. 2012; Guo et al. 2013). A more complete list of articles which have utilized HGMD data or expertise in their production can be found on the HGMD website (http://www.hgmd.cf.ac.uk/docs/articles.html).
Data sharing
A limited HGMD data set, containing both chromosomal coordinates and HGMD identifiers, has been made available via academic data exchange programmes to the Gen2Phen project (Webb et al. 2011), the European Bioinformatics Institute (EBI)/Ensembl (Flicek et al. 2013) and the University of California, Santa Cruz (UCSC) (Meyer et al. 2013) and may be viewed in these projects’ respective genome browsers. Data from HGMD Professional have additionally been made available to HGMD subscribers via Genome Trax™ (BIOBASE GmbH) and Alamut (Interactive Biosoftware), but are also accessible as part of the HGMD Professional stand-alone package (BIOBASE GmbH). Allowing free access to the bulk of the mutation data present in HGMD, while generating sufficient income from its commercial distribution to support its maintenance and expansion, represents a business model that should maximize the availability of HGMD at the same time as ensuring its long-term sustainability. Although we are necessarily obliged to be prudent with regard to data sharing with public data repositories, we have always taken the view that making as much data as possible publicly available is generally beneficial to both HGMD and its users worldwide.
Future plans
The provision of chromosomal coordinates for the vast majority of coding region micro-lesions in HGMD is now complete. Expanding this provision to include micro-lesions in non-coding regions and the gross and complex lesion dataset (where feasible) is a high priority, as is expanding the provision of genomic coordinates to include popularly utilized NGS formats such as General Feature Format (GFF) (http://www.sanger.ac.uk/resources/software/gff/) and BED format, to complement the recently added HGMD Variant Call Format (VCF) (Danecek et al. 2011). Mutations will also be mapped to the new genome build (GRCh38) in due course. A listing of removed variants will be implemented as time allows. Provision of genomic reference sequences based on the NCBI RefSeqGene project (Pruitt et al. 2009), links to available protein structures and homology models, and mapping HGMD phenotypes to the Human Phenotype Ontology (HPO) are also regarded as priorities.
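Offering the same coordinates in several formats requires care with coordinate conventions: VCF positions are 1-based, whereas BED intervals use a 0-based start and an exclusive end. The sketch below converts a VCF-style position and reference allele into a BED interval covering the reference bases. The function names and the example variant are hypothetical and are not drawn from any HGMD export.

```python
from typing import Tuple


def vcf_to_bed_interval(chrom: str, pos: int, ref: str) -> Tuple[str, int, int]:
    """Convert a 1-based VCF position to a 0-based, half-open BED interval.

    The interval spans the reference allele, so a single-base substitution
    maps to an interval of length 1.
    """
    if pos < 1:
        raise ValueError("VCF positions are 1-based and must be >= 1")
    if not ref:
        raise ValueError("reference allele must not be empty")
    start = pos - 1           # shift to 0-based
    end = start + len(ref)    # exclusive end
    return chrom, start, end


def bed_line(chrom: str, pos: int, ref: str, name: str) -> str:
    """Format a four-column, tab-delimited BED record."""
    c, start, end = vcf_to_bed_interval(chrom, pos, ref)
    return f"{c}\t{start}\t{end}\t{name}"


# Hypothetical single-base substitution at 1-based position 100.
print(bed_line("chr1", 100, "A", "example_variant"))
```

An off-by-one error in this conversion shifts every variant by a base, which is why the shift is isolated in a single, separately testable function.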
In its current state of development, HGMD provides the user with a unique resource that can be utilized not only to obtain evidence to support the pathological authenticity and/or novelty of detected gene lesions and to acquire an overview of the mutational spectra for specific genes, but also as a knowledgebase for use in the bioinformatics and whole genome screening projects that underpin personalized genomics.
Conflict of interest
The authors wish to declare an interest in so far as HGMD is financially supported by BIOBASE GmbH through a license agreement with Cardiff University.
Contributor Information
Peter D. Stenson, Phone: +44-29-20744062, FAX: +44-29-20746551, Email: StensonPD@Cardiff.ac.uk
David N. Cooper, Phone: +44-29-20744062, FAX: +44-29-20746551, Email: cooperDN@cardiff.ac.uk
References
- 1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A,Brooks LD, Durbin RM, Gibbs RA, Hurles ME, McVean GA (2010) A map of human genomevariation from population-scale sequencing. Nature 467:1061–1073 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD,DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA (2012) Anintegrated map of genetic variation from 1,092 human genomes. Nature491:56–65 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Amberger J, Bocchini CA, Scott AF, Hamosh A (2009) McKusick’sOnline mendelian inheritance in man (OMIM). Nucleic Acids Res37:D793–D796 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Andreasen C, Refsgaard L, Nielsen JB, Sajadieh A, Winkel BG,Tfelt-Hansen J, Haunsø S, Holst AG, Svendsen JH, Olesen MS (2013) Mutations ingenes encoding cardiac ion channels previously associated with sudden infant deathsyndrome (SIDS) are present with high frequency in new exome data. Can J Cardiol29(9):1104–1109 [DOI] [PubMed] [Google Scholar]
- Bell CJ, Dinwiddie DL, Miller NA, Hateley SL, Ganusova EE, Mudge J,Langley RJ, Zhang L, Lee CC, Schilkey FD, Sheth V, Woodward JE, Peckham HE,Schroth GP, Kim RW, Kingsmore SF (2011) Carrier testing for severe childhoodrecessive diseases by next-generation sequencing. Sci Transl Med3:65ra4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Calvo SE, Compton AG, Hershman SG, Lim SC, Lieber DS, Tucker EJ,Laskowski A, Garone C, Liu S, Jaffe DB, Christodoulou J, Fletcher JM, Bruno DL,Goldblatt J, Dimauro S, Thorburn DR, Mootha VK (2012) Molecular diagnosis ofinfantile mitochondrial disease with targeted next-generation sequencing. SciTransl Med 4:118ra10 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Carter H, Douville C, Stenson PD, Cooper DN, Karchin R (2013)Identifying Mendelian disease genes with the Variant Effect Scoring Tool. BMCGenomics 14(Suppl 3):S3 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Choi Y, Sims GE, Murphy S, Miller JR, Chan AP (2012) Predicting thefunctional effect of amino acid substitutions and indels. PLoS ONE2012:e46688 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cirulli ET, Goldstein DB (2007) In vitro assays fail to predict in vivo effects of regulatory polymorphisms. Hum Mol Genet 16:1931–1939
- Cooper DN, Chen JM, Ball EV, Howells K, Mort M, Phillips AD, Chuzhanova N, Krawczak M, Kehrer-Sawatzki H, Stenson PD (2010) Genes, mutations, and human inherited disease at the dawn of the age of personalized genomics. Hum Mutat 31:631–655
- Cooper DN, Bacolla A, Férec C, Vasquez KM, Kehrer-Sawatzki H, Chen JM (2011) On the sequence-directed nature of human gene mutation: the role of genomic architecture and the local DNA sequence environment in mediating gene mutations underlying human inherited disease. Hum Mutat 32:1075–1099
- Cooper DN, Krawczak M, Polychronakos C, Tyler-Smith C, Kehrer-Sawatzki H (2013) Where genotype is not predictive of phenotype: towards an understanding of the molecular basis of reduced penetrance in human inherited disease. Hum Genet 132:1077–1130
- Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group (2011) The variant call format and VCFtools. Bioinformatics 27:2156–2158
- Dimas AS, Deutsch S, Stranger BE, Montgomery SB, Borel C, Attar-Cohen H, Ingle C, Beazley C, Gutierrez Arcelus M, Sekowska M, Gagnebin M, Nisbett J, Deloukas P, Dermitzakis ET, Antonarakis SE (2009) Common regulatory variation impacts gene expression in a cell type-dependent manner. Science 325:1246–1250
- Douville C, Carter H, Kim R, Niknafs N, Diekhans M, Stenson PD, Cooper DN, Ryan M, Karchin R (2013) CRAVAT: cancer-related analysis of VAriants Toolkit. Bioinformatics 29:647–648
- Flicek P, Ahmed I, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S, Gil L, García-Girón C, Gordon L, Hourlier T, Hunt S, Juettemann T, Kähäri AK, Keenan S, Komorowska M, Kulesha E, Longden I, Maurel T, McLaren WM, Muffato M, Nag R, Overduin B, Pignatelli M, Pritchard B, Pritchard E, Riat HS, Ritchie GR, Ruffier M, Schuster M, Sheppard D, Sobral D, Taylor K, Thormann A, Trevanion S, White S, Wilder SP, Aken BL, Birney E, Cunningham F, Dunham I, Harrow J, Herrero J, Hubbard TJ, Johnson N, Kinsella R, Parker A, Spudich G, Yates A, Zadissa A, Searle SM (2013) ENSEMBL 2013. Nucleic Acids Res 41:D48–D55
- Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, Jia M, Shepherd R, Leung K, Menzies A, Teague JW, Campbell PJ, Stratton MR, Futreal PA (2011) COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res 39(Database issue):D945–D950
- Gonsalves SG, Ng D, Johnston JJ, Teer JK, NISC Comparative Sequencing Program, Stenson PD, Cooper DN, Mullikin JC, Biesecker LG (2013) Using exome data for opportunistic screening of malignant hyperthermia susceptibility. Anesthesiology 6(4):337–346
- Green RC, Berg JS, Berry GT, Biesecker LG, Dimmock DP, Evans JP, Grody WW, Hegde MR, Kalia S, Korf BR, Krantz I, McGuire AL, Miller DT, Murray MF, Nussbaum RL, Plon SE, Rehm HL, Jacob HJ (2012) Exploring concordance and discordance for return of incidental findings from clinical sequencing. Genet Med 14:405–410
- Green RC, Berg JS, Grody WW, Kalia SS, Korf BR, Martin CL, McGuire AL, Nussbaum RL, O’Daniel JM, Ormond KE, Rehm HL, Watson MS, Williams MS, Biesecker LG (2013) ACMG recommendations for reporting of incidental findings in clinical exome and genome sequencing. Genet Med 15:565–574
- Guo Y, Wei X, Das J, Grimson A, Lipkin SM, Clark AG, Yu H (2013) Dissecting disease inheritance modes in a three-dimensional network challenges the “guilt-by-association” principle. Am J Hum Genet 93:78–89
- Johnston JJ, Rubinstein WS, Facio FM, Ng D, Singh LN, Teer JK, Mullikin JC, Biesecker LG (2012) Secondary variants in individuals undergoing exome sequencing: screening of 572 individuals identifies high-penetrance mutations in cancer-susceptibility genes. Am J Hum Genet 91:97–108
- Kim JI, Ju YS, Park H, Kim S, Lee S, Yi JH, Mudge J, Miller NA, Hong D, Bell CJ, Kim HS, Chung IS, Lee WC, Lee JS, Seo SH, Yun JY, Woo HN, Lee H, Suh D, Lee S, Kim HJ, Yavartanoo M, Kwak M, Zheng Y, Lee MK, Park H, Kim JY, Gokcumen O, Mills RE, Zaranek AW, Thakuria J, Wu X, Kim RW, Huntley JJ, Luo S, Schroth GP, Wu TD, Kim H, Yang KS, Park WY, Kim H, Church GM, Lee C, Kingsmore SF, Seo JS (2009) A highly annotated whole-genome sequence of a Korean individual. Nature 460:1011–1015
- Lazarin GA, Haque IS, Nazareth S, Iori K, Patterson AS, Jacobson JL, Marshall JR, Seltzer WK, Patrizio P, Evans EA, Srinivasan BS (2013) An empirical estimate of carrier frequencies for 400+ causal Mendelian variants: results from an ethnically diverse clinical sample of 23,453 individuals. Genet Med 15:178–186
- Li B, Krishnan VG, Mort ME, Xin F, Kamati KK, Cooper DN, Mooney SD, Radivojac P (2009) Automated inference of molecular mechanisms of disease from amino acid substitutions. Bioinformatics 25:2744–2750
- Lopes MC, Joyce C, Ritchie GR, John SL, Cunningham F, Asimit J, Zeggini E (2012) A combined functional annotation score for non-synonymous variants. Hum Hered 73:47–51
- MacArthur DG, Balasubramanian S, Frankish A, Huang N, Morris J, Walter K, Jostins L, Habegger L, Pickrell JK, Montgomery SB, Albers CA, Zhang ZD, Conrad DF, Lunter G, Zheng H, Ayub Q, DePristo MA, Banks E, Hu M, Handsaker RE, Rosenfeld JA, Fromer M, Jin M, Mu XJ, Khurana E, Ye K, Kay M, Saunders GI, Suner MM, Hunt T, Barnes IH, Amid C, Carvalho-Silva DR, Bignell AH, Snow C, Yngvadottir B, Bumpstead S, Cooper DN, Xue Y, Romero IG, 1000 Genomes Project Consortium, Wang J, Li Y, Gibbs RA, McCarroll SA, Dermitzakis ET, Pritchard JK, Barrett JC, Harrow J, Hurles ME, Gerstein MB, Tyler-Smith C (2012) A systematic survey of loss-of-function variants in human protein-coding genes. Science 335:823–828
- Meyer LR, Zweig AS, Hinrichs AS, Karolchik D, Kuhn RM, Wong M, Sloan CA, Rosenbloom KR, Roe G, Rhead B, Raney BJ, Pohl A, Malladi VS, Li CH, Lee BT, Learned K, Kirkup V, Hsu F, Heitner S, Harte RA, Haeussler M, Guruvadoo L, Goldman M, Giardine BM, Fujita PA, Dreszer TR, Diekhans M, Cline MS, Clawson H, Barber GP, Haussler D, Kent WJ (2013) The UCSC Genome Browser database: extensions and updates. Nucleic Acids Res 41(1):D64–D69
- Ng D, Johnston JJ, Teer JK, Singh LN, Peller LC, Wynter JS, Lewis KL, Cooper DN, Stenson PD, Mullikin JC, Biesecker LG (2013) Interpreting secondary cardiac disease variants in an exome cohort. Circ Cardiovasc Genet 6:337–346
- Nishiguchi KM, Rivolta C (2012) Genes associated with retinitis pigmentosa and allied diseases are frequently mutated in the general population. PLoS ONE 7:e41902
- Patrinos GP, Cooper DN, van Mulligen E, Gkantouna V, Tzimas G, Tatum Z, Schultes E, Roos M, Mons B (2012) Microattribution and nanopublication as means to incentivize the placement of human genome variation data into the public domain. Hum Mutat 33:1503–1512
- Peterson TA, Doughty E, Kann MG (2013) Towards precision medicine: advances in computational approaches for analysis of human variants. J Mol Biol. doi:10.1016/j.jmb.2013.08.008
- Pruitt KD, Tatusova T, Klimke W, Maglott DR (2009) NCBI reference sequences: current status, policy and new initiatives. Nucleic Acids Res 37:D32–D36
- Rat Genome Sequencing Project Consortium (2004) Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428:493–521
- Rhesus Macaque Genome Sequencing and Analysis Consortium (2007) Evolutionary and biomedical insights from the rhesus macaque genome. Science 316:222–234
- Ruiz-Pesini E, Lott MT, Procaccio V, Poole JC, Brandon MC, Mishmar D, Yi C, Kreuziger J, Baldi P, Wallace DC (2007) An enhanced MITOMAP with a global mtDNA mutational phylogeny. Nucleic Acids Res 35:D823–D828
- Scally A, Dutheil JY, Hillier LW, Jordan GE, Goodhead I, Herrero J, Hobolth A, Lappalainen T, Mailund T, Marques-Bonet T, McCarthy S, Montgomery SH, Schwalie PC, Tang YA, Ward MC, Xue Y, Yngvadottir B, Alkan C, Andersen LN, Ayub Q, Ball EV, Beal K, Bradley BJ, Chen Y, Clee CM, Fitzgerald S, Graves TA, Gu Y, Heath P, Heger A, Karakoc E, Kolb-Kokocinski A, Laird GK, Lunter G, Meader S, Mort M, Mullikin JC, Munch K, O’Connor TD, Phillips AD, Prado-Martinez J, Rogers AS, Sajjadian S, Schmidt D, Shaw K, Simpson JT, Stenson PD, Turner DJ, Vigilant L, Vilella AJ, Whitener W, Zhu B, Cooper DN, de Jong P, Dermitzakis ET, Eichler EE, Flicek P, Goldman N, Mundy NI, Ning Z, Odom DT, Ponting CP, Quail MA, Ryder OA, Searle SM, Warren WC, Wilson RK, Schierup MH, Rogers J, Tyler-Smith C, Durbin R (2012) Insights into hominid evolution from the gorilla genome sequence. Nature 483:169–175
- Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29:308–311
- Shihab HA, Gough J, Cooper DN, Stenson PD, Barker GL, Edwards KJ, Day IN, Gaunt TR (2013) Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models. Hum Mutat 34:57–65
- Sim NL, Kumar P, Hu J, Henikoff S, Schneider G, Ng PC (2012) SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res 40:W452–W457
- Thorn CF, Klein TE, Altman RB (2010) Pharmacogenomics and bioinformatics: PharmGKB. Pharmacogenomics 11:501–505
- Tong P, Prendergast JG, Lohan AJ, Farrington SM, Cronin S, Friel N, Bradley DG, Hardiman O, Evans A, Wilson JF, Loftus B (2010) Sequencing and analysis of an Irish human genome. Genome Biol 11:R91
- Wang X, Wei X, Thijssen B, Das J, Lipkin SM, Yu H (2012) Three-dimensional reconstruction of protein networks provides insight into human genetic disease. Nat Biotechnol 30:159–164
- Webb AJ, Thorisson GA, Brookes AJ, GEN2PHEN Consortium (2011) An informatics project and online “Knowledge Centre” supporting modern genotype-to-phenotype research. Hum Mutat 32:543–550
- Xue Y, Chen Y, Ayub Q, Huang N, Ball EV, Mort M, Phillips AD, Shaw K, Stenson PD, Cooper DN, Tyler-Smith C, the 1000 Genomes Project Consortium (2012) Deleterious- and disease-allele prevalence in healthy individuals: insights from current predictions, mutation databases, and population-scale resequencing. Am J Hum Genet 91:1022–1032
- Yan G, Zhang G, Fang X, Zhang Y, Li C, Ling F, Cooper DN, Li Q, Li Y, van Gool AJ, Du H, Chen J, Chen R, Zhang P, Huang Z, Thompson JR, Meng Y, Bai Y, Wang J, Zhuo M, Wang T, Huang Y, Wei L, Li J, Wang Z, Hu H, Yang P, Le L, Stenson PD, Li B, Liu X, Ball EV, An N, Huang Q, Zhang Y, Fan W, Zhang X, Li Y, Wang W, Katze MG, Su B, Nielsen R, Yang H, Wang J, Wang X, Wang J (2011) Genome sequencing and comparison of two nonhuman primate animal models, the cynomolgus and Chinese rhesus macaques. Nat Biotechnol 29:1019–1023

