Genome Med. 2020 Mar 17;12(1):28.
doi: 10.1186/s13073-020-00725-6.

Gene family information facilitates variant interpretation and identification of disease-associated genes in neurodevelopmental disorders

Dennis Lal et al. Genome Med.

Abstract

Background: Classifying the pathogenicity of missense variants is a major challenge in clinical practice when diagnosing rare and genetically heterogeneous neurodevelopmental disorders (NDDs). While orthologous gene conservation is commonly used in variant annotation, approximately 80% of known disease-associated genes belong to gene families. The use of gene family information for disease gene discovery and variant interpretation has not yet been investigated on a genome-wide scale. We empirically evaluate whether paralog-conserved or non-conserved sites in human gene families are important in NDDs.

Methods: Gene family information was collected from Ensembl. Paralog-conserved sites were defined based on paralog sequence alignments; 10,068 NDD patients and 2,078 controls were statistically evaluated for de novo variant burden in gene families.
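
The abstract does not spell out how a site is scored as paralog-conserved. The sketch below shows one plausible scheme, assuming a per-column score equal to the fraction of family members that carry the most common residue, z-transformed across the alignment so that positive values mark above-average paralog conservation (matching the para_zscore > 0 cutoff used in the figures). The identity-based score and the names column_identity, para_zscores, and residue_scores are illustrative assumptions, not the authors' published pipeline.

```python
from statistics import mean, pstdev


def column_identity(alignment, col):
    """Fraction of all aligned sequences carrying the most common residue at `col`.

    Gap characters ('-') never count as a residue, so gapped columns score lower.
    """
    residues = [seq[col] for seq in alignment if seq[col] != "-"]
    if not residues:
        return 0.0
    most_common = max(set(residues), key=residues.count)
    return residues.count(most_common) / len(alignment)


def para_zscores(alignment):
    """Hypothetical para_zscore: z-score of column identity across the whole alignment."""
    length = len(alignment[0])
    scores = [column_identity(alignment, c) for c in range(length)]
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:  # every column equally conserved, so no column stands out
        return [0.0] * length
    return [(s - mu) / sigma for s in scores]


def residue_scores(aligned_seq, column_scores):
    """Map column scores onto one gene's residues by skipping that gene's gap columns."""
    return [s for ch, s in zip(aligned_seq, column_scores) if ch != "-"]


# Toy alignment of three hypothetical paralogs (equal length, '-' = gap).
family = ["MKVL", "MKIL", "MRVL"]
z = para_zscores(family)
print([round(v, 2) for v in z])  # → [1.0, -1.0, -1.0, 1.0]
print([i + 1 for i, v in enumerate(z) if v > 0])  # paralog-conserved columns → [1, 4]
```

Columns 1 and 4 are identical across all three toy paralogs and therefore receive the highest score, which mirrors the reading of the maximum para_zscore given in the Fig. 4 caption.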

Results: We demonstrate that disease-associated missense variants are enriched at paralog-conserved sites across all disease groups and inheritance models tested. We developed a gene family de novo enrichment framework that identified 43 gene families with exome-wide significant enrichment, comprising 98 genes that carry de novo variants in NDD patients; 28 of these represent novel NDD candidate genes that are brain expressed and under evolutionary constraint.
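
A gene family de novo enrichment test can be illustrated as a comparison of the observed de novo count in a family against the count expected from mutation rates. The Fig. 2 caption states that DNM burden was computed with the mutational framework of Samocha et al.; the sketch below assumes that framework's usual form, a one-sided Poisson test with an expected count of 2 × trios × summed per-gene mutation rate. The function names and the mutation rates in the example are hypothetical; only the cohort size comes from the abstract.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Adequate for the moderate p-values discussed here; precision degrades
    below roughly 1e-15 because of the subtraction from 1.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) obtained from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def family_dnm_burden(observed, gene_mutation_rates, n_trios):
    """Hypothetical gene-family burden test.

    Assumes per-gene, per-haploid-genome rates for the variant class of interest,
    so the expected count is 2 * n_trios * (sum of rates over family members).
    """
    expected = 2 * n_trios * sum(gene_mutation_rates)
    return expected, poisson_sf(observed, expected)


# Toy example: made-up rates for a three-gene family, cohort size from the abstract.
expected, p = family_dnm_burden(8, [1e-5, 1.5e-5, 0.5e-5], n_trios=10_068)
print(f"expected={expected:.3f}, p={p:.2e}")
```

Pooling genes into a family raises the expected count, which gives more power than testing each small gene alone; that pooling is the intuition behind a family-level test, though the authors' exact model may differ.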

Conclusion: This study presents the first method to incorporate gene family information into a statistical framework for interpreting variant data in NDDs and for discovering new NDD-associated genes.

Keywords: Conservation; Gene family; Missense variants; Neurodevelopmental disorders; Paralogs.


Conflict of interest statement

ST, SW, and KLH were full-time employees of Ambry Genetics during the study. The remaining authors declare that they have no competing interests.

Figures

Fig. 1
Vertical (ortholog) vs. horizontal (paralog) conservation. Top: protein sequence alignments of voltage-gated sodium channels. Top left: alignment of Homo sapiens (NP_001159435.1), Bos taurus (NP_001180147.1), and Mus musculus (NP_001300926.1) SCN1A protein sequences. High sequence similarity is depicted by violet amino acid coloring and by yellow conservation bars below the alignment, generated with Jalview. Top right: Jalview protein alignment of all members of the human voltage-gated sodium channel gene family (SCN1A, SCN2A, SCN3A, SCN4A, SCN5A, SCN7A, SCN8A, SCN9A, SCN10A, SCN11A). This paralog alignment shows less conservation than the alignment of SCN1A to its cross-species (vertical) orthologs on the left. Bottom left: GERP score analysis over all genes within gene families (homolog conservation is measured as the percentage of nucleotides per gene with GERP score > 2). Bottom right: distribution of the percentage of nucleotides per gene within gene families having para_zscore > 0. Conservation between close homologs is generally much more uniform than conservation between paralogs.
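
Both bottom panels reduce each gene to a single percentage: the share of its positions whose conservation score exceeds a cutoff (GERP > 2 for the ortholog view, para_zscore > 0 for the paralog view). A minimal sketch of that per-gene summary follows; the function name percent_above and the example scores are hypothetical.

```python
def percent_above(scores, threshold):
    """Percentage of positions in one gene whose score is strictly above `threshold`."""
    if not scores:
        return 0.0
    return 100.0 * sum(1 for s in scores if s > threshold) / len(scores)


# Hypothetical per-nucleotide scores for one gene.
gerp = [3.1, 1.5, 2.0, 4.2]
para_z = [0.5, -1.0, 0.0, 1.2]
print(percent_above(gerp, 2.0))    # ortholog metric (GERP > 2) → 50.0
print(percent_above(para_z, 0.0))  # paralog metric (para_zscore > 0) → 50.0
```

Both cutoffs are strict inequalities, as in the caption, so a position scoring exactly 2.0 (GERP) or 0.0 (para_zscore) is not counted.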
Fig. 2
Assessment of paralog conservation. (a) Gene family enrichment of paralog-conserved de novo missense variants in NDD patients. NDD-associated missense variants are enriched at paralog-conserved sites. y-axis: missense variant enrichment considering only paralog non-conserved sites across the genes of each gene family (para_zscore ≤ 0, p_missense_not_conserved). x-axis: missense variant enrichment considering only paralog-conserved sites (para_zscore > 0, p_missense_conserved). None of the gene families shows exome-wide significant enrichment at paralog non-conserved sites. Twenty-six gene families (depicted by circles) show exome-wide significant de novo missense variant burden at paralog-conserved sites. The significance threshold, obtained by Bonferroni correction for testing 5 × 2871 gene families (P = 3.48 × 10⁻⁶), is shown as the blue dotted line. (b) Enrichment of missense variants at paralog-conserved sites in genes with significant DNM burden in this study. Density plots show the distribution of para_zscores for NDD patient missense, nonsense, and synonymous variants in all genes without significant enrichment (top panel) and in genes significantly enriched for DNM missense variants (bottom panel). DNM burden was calculated using the mutational framework described by Samocha et al. (for details, see the "Methods" section). Genes were categorized into two groups: those with a significant burden and those without. In disease-associated genes (those with DNM burden), missense variants were enriched at paralog-conserved sites relative to missense variants in non-significantly enriched genes (P < 2.2 × 10⁻¹⁶, top vs. bottom panel). Missense variants in genes without DNM burden were not enriched at paralog-conserved sites compared with synonymous variants (P = 0.1157, top panel). In genes with DNM burden (bottom panel), missense variants were significantly enriched at paralog-conserved sites compared with synonymous variants (P = 3.01 × 10⁻⁴). The same test for nonsense vs. synonymous variants showed no significant difference in paralog conservation (P = 0.3913). P values were calculated using a Wilcoxon test.
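
The exome-wide threshold in panel (a) follows from a Bonferroni correction, which divides the family-wise error rate by the number of tests. The sketch below reproduces the caption's value, assuming a family-wise alpha of 0.05 (the caption does not state alpha, but 0.05 yields the reported 3.48 × 10⁻⁶). The helper exome_wide_significant and the family names in the example are hypothetical.

```python
# Bonferroni correction: divide the family-wise alpha by the number of tests.
alpha = 0.05        # assumed family-wise error rate (not stated in the caption)
n_tests = 5 * 2871  # 5 x 2871 gene-family tests, as stated in the caption
threshold = alpha / n_tests
print(n_tests, f"{threshold:.2e}")  # → 14355 3.48e-06


def exome_wide_significant(p_values, threshold):
    """Return the names of gene families whose p-value falls below the threshold."""
    return [name for name, p in p_values.items() if p < threshold]


# Hypothetical p-values for two gene families.
print(exome_wide_significant({"family_A": 1e-7, "family_B": 1e-4}, threshold))  # → ['family_A']
```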
Fig. 3
Established NDD disease genes are brain expressed and under evolutionary constraint. Each dot represents a gene from the 43 DNM-enriched gene families. The colors of the box and font indicate the number of DNMs (N.DNM) identified in the gene in 10,668 NDD trios. y-axis: brain gene expression level in RPKM derived from the GTEx expression dataset; x-axis: gene constraint scores (left: pLI, indicating gene LoF intolerance; right: missense z-score, indicating gene missense intolerance). Disease-associated DNMs are likely to affect brain-expressed and evolutionarily constrained genes (defined as brain expression RPKM > 1, pLI ≥ 0.9, and missense z-score > 3.09; green boxes). In support of this hypothesis, we observe that all previously known and frequently mutated genes are brain expressed and under evolutionary constraint.
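
The green-box criterion combines a brain-expression cutoff with a constraint cutoff. Because the left panel plots pLI and the right panel plots the missense z-score, the sketch below evaluates each constraint metric together with the RPKM cutoff separately, one check per panel; that per-panel reading is an assumption. The GeneRecord type, in_green_box, and the example genes and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    symbol: str
    brain_rpkm: float  # brain expression (GTEx, RPKM)
    pli: float         # LoF intolerance
    missense_z: float  # missense intolerance


def in_green_box(gene, metric):
    """True if the gene is brain expressed (RPKM > 1) and constrained on `metric`."""
    expressed = gene.brain_rpkm > 1
    if metric == "pLI":
        return expressed and gene.pli >= 0.9
    if metric == "missense_z":
        return expressed and gene.missense_z > 3.09
    raise ValueError(f"unknown metric: {metric}")


# Hypothetical genes with made-up values.
genes = [
    GeneRecord("GENE_A", brain_rpkm=12.0, pli=0.99, missense_z=4.1),
    GeneRecord("GENE_B", brain_rpkm=0.4, pli=1.0, missense_z=5.0),
    GeneRecord("GENE_C", brain_rpkm=3.0, pli=0.2, missense_z=3.5),
]
for g in genes:
    print(g.symbol, in_green_box(g, "pLI"), in_green_box(g, "missense_z"))
```

GENE_B shows why expression is checked first: it is highly constrained but falls outside both green boxes because it is not brain expressed.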
Fig. 4
Visualization of para_zscores for KCNQ2, STXBP1, CACNA1A, and GRIN2B. The protein sequence is plotted from left to right; each bar and dot represents one amino acid. Amino acids affected by a missense mutation in the NDD cohort are colored blue, patient PTVs are shown in pink, and synonymous variants in orange. Amino acid residues without mutations are colored gray. y-axis: para_zscore. Positive values indicate paralog conservation, and the highest score indicates that the amino acid is identical across all gene family members. The red dotted lines indicate the mean paralog conservation of each protein sequence; bars below the mean mark regions of low paralog conservation, that is, higher sequence variability across the members of the gene family.
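
The red mean line in each panel splits a protein into stretches above and below its average paralog conservation. A small sketch of how such low-conservation regions could be extracted from a per-residue para_zscore profile is shown below; the function name low_conservation_regions, the min_length parameter, and the example profile are hypothetical.

```python
from statistics import mean


def low_conservation_regions(scores, min_length=1):
    """Return 1-based (start, end) residue ranges whose para_zscore is below the protein mean.

    Runs shorter than `min_length` residues are dropped.
    """
    cutoff = mean(scores)
    regions, start = [], None
    for pos, s in enumerate(scores, start=1):
        if s < cutoff:
            if start is None:
                start = pos  # open a new below-mean run
        elif start is not None:
            if pos - start >= min_length:
                regions.append((start, pos - 1))
            start = None  # close the run at the first residue back above the mean
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))  # run extends to the C-terminus
    return regions


# Hypothetical per-residue para_zscores for a six-residue protein.
profile = [1.0, 1.2, -0.5, -0.8, 0.9, -1.0]
print(low_conservation_regions(profile))                # → [(3, 4), (6, 6)]
print(low_conservation_regions(profile, min_length=2))  # → [(3, 4)]
```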