Transcriptomics

HPA RNA-seq data overview

In total, 1206 cell lines, 40 human tissues, 193 samples from micro-dissected areas and regions of the human brain, and 18 immune cell types as well as total peripheral blood mononuclear cells (PBMC) have been analyzed by RNA-seq to estimate the transcript abundance of each protein-coding gene. Additionally, 19 mouse tissue samples and 32 pig tissue samples collected from the brain and retina of the animals were sampled and analyzed by RNA-seq.

Normal tissue specimens were collected with consent from patients and all samples were anonymized in accordance with approval from the local ethics committee (ref #2011/473) and Swedish rules and legislation. All tissues were collected from the Uppsala Biobank and RNA samples were extracted from frozen tissue sections.

mRNA sequencing was performed on a total of 186 normal tissue samples using Illumina HiSeq2000 and 2500 machines (Illumina, San Diego, CA, USA) and the standard Illumina RNA-seq protocol with a read length of 2x100 bases.

Normalization of transcriptomics data

For both the HPA and GTEx transcriptomics datasets, the average TPM value of all individual samples for each human tissue or human cell type was used to estimate the gene expression level. To be able to combine the datasets into consensus transcript expression levels, a pipeline was set up to normalize the data for all samples. In brief, all TPM values per sample were scaled to a sum of 1 million TPM (denoted pTPM) to compensate for the non-coding transcripts that had been previously removed. Next, all TPM values of all samples within each data source (HPA + GTEx human tissues, HPA immune cell types, HPA cell lines) were normalized separately using Trimmed mean of M values (TMM) to allow for between-sample comparisons. The resulting normalized transcript expression values, denoted nTPM, were calculated for each gene in every sample. nTPM values below 0.1 are not visualized on the Atlas sections.
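The pTPM rescaling step described above can be sketched as follows. This is a minimal illustration, not the HPA pipeline itself: the function name `scale_to_ptpm` and the gene names are hypothetical, and the subsequent TMM normalization is not shown.

```python
def scale_to_ptpm(tpm_by_gene):
    """Rescale one sample's TPM values so that they sum to 1,000,000 (pTPM)."""
    total = sum(tpm_by_gene.values())
    if total <= 0:
        raise ValueError("sample has no expression signal to rescale")
    factor = 1_000_000 / total
    return {gene: value * factor for gene, value in tpm_by_gene.items()}


# Hypothetical sample: the TPM values that remain after non-coding
# transcripts were removed no longer sum to one million.
sample = {"GENE_A": 300_000.0, "GENE_B": 100_000.0, "GENE_C": 400_000.0}
ptpm = scale_to_ptpm(sample)
print(ptpm)
```

Rescaling per sample keeps the relative proportions between genes unchanged and only restores the one-million total.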

For the brain dataset, an additional normalization step used linear regression to correct for inter-individual variation, applying removeBatchEffect from the R package limma with subject as the batch parameter. To reduce the technical variation between the MGI and Illumina platforms, 19 reference samples were included and run on both platforms. Intensity normalization based on these reference samples was conducted to minimize technical variation between the two platforms.

Consensus transcript expression levels for each gene were summarized in 51 human tissues based on transcriptomics data from the two sources HPA and GTEx. The consensus nTPM value for each gene and tissue type represents the maximum nTPM value based on HPA and GTEx. For tissues with multiple sub-tissues (brain regions, immune cells, lymphoid tissues and intestine), the maximum of all sub-tissues is used for the tissue type. The total number of tissue types in the human tissue consensus set is 37.
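As a rough sketch of the maximum rule for one gene, assuming nTPM values are held in plain dictionaries keyed by sub-tissue name (the function name, the data layout and the example values are all hypothetical):

```python
def consensus_ntpm(hpa, gtex, tissue_groups):
    """Return the consensus nTPM per tissue type for a single gene.

    hpa, gtex: dict mapping sub-tissue name -> nTPM (a sub-tissue may be
               missing from either source)
    tissue_groups: dict mapping tissue type -> list of its sub-tissue names
    """
    consensus = {}
    for tissue_type, sub_tissues in tissue_groups.items():
        # Collect every available value from both sources and all sub-tissues.
        values = [source[name]
                  for source in (hpa, gtex)
                  for name in sub_tissues
                  if name in source]
        if values:
            consensus[tissue_type] = max(values)
    return consensus


# Hypothetical values for one gene.
hpa = {"colon": 12.0, "small intestine": 30.5, "liver": 2.0}
gtex = {"colon": 15.2, "liver": 1.1}
groups = {"intestine": ["colon", "small intestine"], "liver": ["liver"]}
print(consensus_ntpm(hpa, gtex, groups))
```

Taking one maximum over both sources and all sub-tissues gives the same result as taking the per-source maximum first and the sub-tissue maximum second.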

The FANTOM5 dataset was normalized separately on the sample level using TMM. The normalized Tags Per Million for each gene were calculated based on the average of all individual samples for each human tissue.

Mouse and pig transcriptomic data, generated by the HPA in collaboration with BGI, were normalized separately according to the same procedure used for human tissues and cell types; no limma adjustment was performed on the mouse and pig data. Consensus transcript expression levels are summarized into 13 brain regions for mouse brain and 15 regions for pig brain, where sub-regional samples were combined and the maximum of the sub-regions was used for the brain region.

Single cell type clusters were normalized separately from other transcriptomics datasets using TMM. To generate expression values per cell type, clusters were aggregated per cell type by first calculating the weighted mean nTPM in all cells with the same cluster annotation within a dataset. The values for the same cell types in different datasets were then averaged into a single aggregated value. Only clusters with medium and high reliability were included and clusters containing mixed cell types, Neutrophils and Platelets were excluded.
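The two-stage aggregation can be illustrated with a short sketch. The text does not state what the weights are; the example below assumes each cluster is weighted by its number of cells, and the function name and input layout are hypothetical.

```python
def aggregate_cell_type(datasets):
    """Aggregate nTPM for one gene and one cell type across datasets.

    datasets: list with one entry per dataset; each entry is a list of
              (ntpm, n_cells) tuples, one per cluster annotated as the
              cell type of interest.
    """
    per_dataset = []
    for clusters in datasets:
        total_cells = sum(n_cells for _, n_cells in clusters)
        if total_cells == 0:
            continue  # dataset has no cells of this type
        # Stage 1: mean within the dataset, weighted by cell count (assumption).
        weighted = sum(ntpm * n_cells for ntpm, n_cells in clusters) / total_cells
        per_dataset.append(weighted)
    if not per_dataset:
        raise ValueError("no clusters available for this cell type")
    # Stage 2: unweighted mean across datasets.
    return sum(per_dataset) / len(per_dataset)


# Hypothetical input: two clusters in the first dataset, one in the second.
print(aggregate_cell_type([[(10.0, 100), (20.0, 300)], [(5.0, 50)]]))
```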

Classification of transcriptomics data

The consensus transcriptomics data was used to classify all genes according to their tissue-specific, single cell type-specific, brain region-specific, immune cell-specific or cell line-specific expression into two different schemas: specificity category and distribution category. These are defined based on the total set of all nTPM values in 40 tissues, 154 single cell types, 13 main regions of each mammalian brain, 18 immune cell types or 1132 cell lines grouped into 28 cancer types and using a cutoff value of 1 nTPM as a limit for detection across all tissues or cell types.

Explanation of the specificity category

Category | Description
Enriched | nTPM in a particular tissue/region/cell type at least four times any other tissue/region/cell type
Group enriched | nTPM in a group (of 2-5 tissues, brain regions, single cell types or cell lines, or 2-10 immune cell types) at least four times any other tissue/region/cell line/immune cell type/cell type
Enhanced | nTPM in one or several tissues, brain regions, cell lines, immune cell types or single cell types that is at least four times the mean of all tissues/regions/cell types
Low specificity | nTPM ≥ 1 in at least one tissue/region/cell type but not elevated in any tissue/region/cell type
Not detected | nTPM < 1 in all tissues/regions/cell types
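One possible reading of these rules is sketched below for a single gene. Several details are assumptions rather than statements from the text: the group rule is interpreted as "every member of the top-k group is at least four times the highest value outside the group", elevated tissues are required to reach the 1 nTPM detection cutoff, and the function and parameter names are hypothetical. For immune cell types, `max_group` would be set to 10.

```python
def specificity_category(ntpm, detection_cutoff=1.0, fold=4.0, max_group=5):
    """Classify one gene from a dict mapping tissue name -> nTPM."""
    values = sorted(ntpm.values(), reverse=True)
    if len(values) < 2:
        raise ValueError("need at least two tissues to assess specificity")
    if values[0] < detection_cutoff:
        return "Not detected"
    # Enriched: the top tissue is at least `fold` times every other tissue.
    if values[0] >= fold * values[1]:
        return "Enriched"
    # Group enriched (assumed reading): the weakest member of the top-k group
    # is detected and at least `fold` times the strongest tissue outside it.
    for k in range(2, min(max_group, len(values) - 1) + 1):
        weakest_in_group, strongest_outside = values[k - 1], values[k]
        if (weakest_in_group >= detection_cutoff
                and weakest_in_group >= fold * strongest_outside):
            return "Group enriched"
    # Enhanced: at least one tissue is `fold` times the mean of all tissues.
    if values[0] >= fold * (sum(values) / len(values)):
        return "Enhanced"
    return "Low specificity"


print(specificity_category({"liver": 100, "kidney": 5, "lung": 2, "brain": 1}))
print(specificity_category({"liver": 80, "kidney": 60, "lung": 5, "brain": 2}))
```

The checks run from the strictest category to the loosest, so a gene that satisfies several rules receives the most specific label.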


An additional category "elevated", containing all genes in the first three categories (tissue/cell line/cell type enriched, group enriched and tissue/cell line/cell type enhanced), has been used for some parts of the analysis. TS/CS-score (Tissue Specificity/Cell Specificity score) is calculated for “elevated” tissues/cell lines. TS/CS-score is calculated as the fold change from the tissue/cell line with highest RNA to the tissue/cell line with second highest RNA.
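The fold-change definition of the TS/CS-score can be written as a short sketch. The function name is hypothetical, and returning infinity when the second-highest value is zero is an assumption; the text does not say how that case is handled.

```python
def ts_score(ntpm):
    """Fold change between the highest and second-highest nTPM of one gene."""
    if len(ntpm) < 2:
        raise ValueError("need at least two tissues or cell lines")
    top, second = sorted(ntpm.values(), reverse=True)[:2]
    if second == 0:
        return float("inf")  # assumption: no pseudocount is applied
    return top / second


print(ts_score({"liver": 100.0, "kidney": 5.0, "lung": 2.0}))
```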

Explanation of the distribution category

Category | Description
Detected in single | Detected in a single tissue/region/cell type
Detected in some | Detected in more than one but less than one third of tissues/regions/cell types
Detected in many | Detected in at least a third but not all tissues/regions/cell types
Detected in all | Detected in all tissues/regions/cell types
Not detected | nTPM < 1 in all tissues/regions/cell types
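The distribution categories translate directly into a count of detected tissues. The sketch below assumes "detected" means nTPM ≥ 1, mirroring the "Not detected" row; the function name is hypothetical.

```python
def distribution_category(ntpm, detection_cutoff=1.0):
    """Classify one gene from a dict mapping tissue name -> nTPM."""
    n_total = len(ntpm)
    n_detected = sum(1 for value in ntpm.values() if value >= detection_cutoff)
    if n_detected == 0:
        return "Not detected"
    if n_detected == n_total:
        return "Detected in all"
    if n_detected == 1:
        return "Detected in single"
    # "Less than one third", written with integers to avoid rounding issues.
    if 3 * n_detected < n_total:
        return "Detected in some"
    return "Detected in many"


# Hypothetical gene measured in nine tissues, detected in two of them.
gene = {"t%d" % i: v for i, v in enumerate([5, 2, 0.3, 0, 0, 0.9, 0, 0, 0])}
print(distribution_category(gene))
```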

Gene clustering of transcriptomics data

The RNA expression data has been used to classify protein-coding genes into expression clusters for tissues, single cell types, immune cells, and cell lines.

Clustering | Number of tissues, cell types or cell lines | Sample aggregation level
Tissue | 78 | Averaged nTPM expression per tissue type (40 HPA and 38 GTEx tissue types)
Single cell type | 1175 | Averaged nCPM expression per cell type cluster
Cell lines | 1206 | nTPM expression of individual cell lines
Immune cells | 103 | Averaged nTPM expression per immune cell
Brain | 193 | Averaged nTPM expression per brain region


Data preprocessing

For each dataset, genes with an expression level > 1 in at least one of the samples were selected. The data was scaled gene-wise to Z-scores to account for differences in dynamic range between genes across samples. After scaling, the expression data was projected into a lower-dimensional space using Principal Component Analysis (PCA), where the number of components was selected to satisfy Kaiser's rule (eigenvalue ≥ 1) and to explain at least 80% of the total variance. Gene-to-gene distances were calculated as the Spearman correlation of gene expression across samples and transformed to Spearman distance (1 - Spearman correlation).
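Two of these steps, gene-wise Z-scoring and the Spearman distance, can be sketched with the standard library alone. The PCA projection is deliberately omitted, the use of the sample standard deviation is an assumption, and all function names are hypothetical.

```python
from statistics import mean, stdev


def zscore(values):
    """Scale one gene's expression across samples to mean 0 and unit SD."""
    mu, sd = mean(values), stdev(values)  # sample SD is an assumption
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman_distance(x, y):
    """1 - Spearman correlation, i.e. Pearson correlation of the ranks."""
    return 1.0 - pearson(average_ranks(x), average_ranks(y))


gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [40.0, 30.0, 20.0, 10.0]
print(zscore(gene_a), spearman_distance(gene_a, gene_b))
```

The distance ranges from 0 for perfectly co-varying genes to 2 for perfectly opposed ones.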


Gene clustering

Based on the distances, a k-nearest neighbors (kNN) graph was computed from the 20 nearest neighbors of each gene, which was subsequently used to find clusters of similarly expressed genes via Louvain clustering. To account for the stochasticity in the Louvain algorithm, the clustering was performed 100 times. The results were then collapsed into a single consensus clustering. Confidence of the gene-to-cluster assignment was calculated as the fraction of times that the gene was assigned to the cluster.
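The neighbor graph and the confidence value can be illustrated with a simplified sketch. It does not run Louvain clustering, and it assumes that cluster labels have already been matched across the repeated runs, with the most frequent label standing in for the consensus cluster; the source does not describe how the consensus is actually derived. All names are hypothetical.

```python
from collections import Counter


def knn_graph(distances, k=20):
    """Map each gene index to the indices of its k nearest neighbours.

    distances: square list of lists holding gene-to-gene distances.
    """
    graph = {}
    for i, row in enumerate(distances):
        others = sorted((j for j in range(len(row)) if j != i),
                        key=row.__getitem__)
        graph[i] = others[:k]
    return graph


def assignment_confidence(run_labels):
    """Return (majority label, fraction of runs that gave that label).

    run_labels: the cluster label one gene received in each repeated run,
                with labels already aligned between runs (assumption).
    """
    label, count = Counter(run_labels).most_common(1)[0]
    return label, count / len(run_labels)


# Hypothetical three-gene distance matrix and four clustering runs.
toy_distances = [[0.0, 1.0, 5.0], [1.0, 0.0, 2.0], [5.0, 2.0, 0.0]]
print(knn_graph(toy_distances, k=1))
print(assignment_confidence(["c1", "c1", "c2", "c1"]))
```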


Cluster annotation

The clustering generated for each of the datasets is manually annotated to assign a specificity and function to each cluster. The annotation is based on overrepresentation analysis towards biological databases, including Gene Ontology, Reactome, PanglaoDB, TRRUST, and KEGG, as well as HPA classifications including subcellular location, protein class, secretion location and classification, and specificity toward tissues, single cell types, immune cells, brain regions, and cell lines. A reliability score is manually set for each cluster indicating the confidence of specificity and function assignment.

Clustering visualization

The clustering results are visualized in a UMAP. Colored polygons were generated to represent the main contiguous masses of genes corresponding to the same cluster. First, for each cluster, the two-dimensional density was estimated in the UMAP, and an area enveloping 95% of the total density was determined. The areas were moderated to include contiguous areas corresponding to at least 5% of the total area in the UMAP space. Finally, contiguous areas were converted to two-dimensional polygons for each cluster.


GTEx RNA-seq data

The Genotype-Tissue Expression (GTEx) project collects and analyzes multiple human post mortem tissues. RNA-seq data from 36 of their tissue types was mapped based on RSEM v1.3.0 (v8) and the resulting TPM values have been included in the Human Protein Atlas for all corresponding genes that could be mapped from Gencode v26 to Ensembl version 109. The GTEx retina data are based on EyeGEx data from Ratnapriya et al., Nature Genetics 2019 and transcript abundance estimation was performed using Kallisto v0.48.0 using Ensembl version 109 as reference genome.

Tissue | GTEx tissue | Number of samples
Adipose tissue | Adipose - Subcutaneous | 714
Adipose tissue | Adipose - Visceral (Omentum) | 587
Adrenal gland | Adrenal Gland | 295
Amygdala | Brain - Amygdala | 181
Blood vessel | Artery - Aorta | 472
Blood vessel | Artery - Coronary | 268
Blood vessel | Artery - Tibial | 691
Breast | Breast - Mammary Tissue | 514
Caudate | Brain - Caudate (basal ganglia) | 300
Cerebellum | Brain - Cerebellar Hemisphere | 277
Cerebellum | Brain - Cerebellum | 266
Cerebral cortex | Brain - Anterior cingulate cortex (BA24) | 233
Cerebral cortex | Brain - Cortex | 270
Cerebral cortex | Brain - Frontal Cortex (BA9) | 269
Cervix | Cervix - Ectocervix | 24
Cervix | Cervix - Endocervix | 23
Colon | Colon - Sigmoid | 419
Colon | Colon - Transverse | 479
Endometrium | Uterus - Endometrium | 27
Esophagus | Esophagus - Mucosa | 614
Fallopian tube | Fallopian Tube | 29
Heart muscle | Heart - Atrial Appendage | 461
Heart muscle | Heart - Left Ventricle | 452
Hippocampus | Brain - Hippocampus | 255
Hypothalamus | Brain - Hypothalamus | 257
Kidney | Kidney - Cortex | 104
Kidney | Kidney - Medulla | 11
Liver | Liver | 262
Lung | Lung | 604
Nucleus accumbens | Brain - Nucleus accumbens (basal ganglia) | 285
Ovary | Ovary | 193
Pancreas | Pancreas | 362
Pituitary gland | Pituitary | 313
Prostate | Prostate | 282
Putamen | Brain - Putamen (basal ganglia) | 254
Retina | Retina | 105
Salivary gland | Minor Salivary Gland | 181
Skeletal muscle | Muscle - Skeletal | 818
Skin | Skin - Not Sun Exposed (Suprapubic) | 651
Skin | Skin - Sun Exposed (Lower leg) | 754
Small intestine | Small Intestine - Terminal Ileum | 207
Spinal cord | Brain - Spinal cord (cervical c-1) | 204
Spleen | Spleen | 277
Stomach | Stomach | 407
Substantia nigra | Brain - Substantia nigra | 183
Testis | Testis | 414
Thyroid gland | Thyroid | 684
Urinary bladder | Bladder | 77
Vagina | Vagina | 170

FANTOM5 CAGE data

The Functional Annotation of Mammalian Genomes 5 (FANTOM5) project provides comprehensive expression profiles and functional annotation of mammalian cell-type specific transcriptomes using Cap Analysis of Gene Expression (CAGE) (Takahashi H et al. (2012)), which is based on a series of full-length cDNA technologies developed in RIKEN. CAGE data for 60 of their tissues was obtained from the FANTOM5 repository and mapped to Ensembl version 109.

Tissue | FANTOM5 tissue | Sample description | FANTOM5 sample id
Adipose tissue | Adipose tissue | 65,65,76 years, mixed | FF:10010-101C1
Amygdala | Amygdala | 76 years, female | FF:10151-102I7
Appendix | Appendix | 29 years, male | FF:10189-103D9
Breast | Breast | 77 years, female | FF:10080-102A8
Caudate | Caudate nucleus | 76 years, female | FF:10164-103B2
Cerebellum | Cerebellum | 22-68 years, mixed | FF:10083-102B2
Cerebellum | Cerebellum | 76 years, female | FF:10166-103B4
Cervix | Cervix | 40,46,57,65 years, female | FF:10013-101C4
Colon | Colon | 62,83,84 years, mixed | FF:10014-101C5
Corpus callosum | Corpus callosum | 24-68 years, mixed | FF:10042-101F6
Ductus deferens | Ductus deferens | 24 years, male | FF:10196-103E7
Endometrium | Uterus | 23-63 years, female | FF:10100-102D1
Epididymis | Epididymis | 24 years, male | FF:10197-103E8
Esophagus | Esophagus | 68,74,75 years, mixed | FF:10015-101C6
Frontal lobe | Frontal lobe | 32-61 years, mixed | FF:10040-101F4
Gallbladder | Gall bladder | 57 years, male | FF:10198-103E9
Globus pallidus | Globus pallidus | 76 years, female | FF:10161-103A8
Globus pallidus | Globus pallidus | 60 years, female | FF:10175-103C4
Heart muscle | Heart | 70,73,74 years, mixed | FF:10016-101C7
Heart muscle | Left ventricle | 73 years, female | FF:10078-102A6
Heart muscle | Left atrium | 40 years, male | FF:10079-102A7
Hippocampus | Hippocampus | 76 years, female | FF:10153-102I9
Hippocampus | Hippocampus | 60 years, female | FF:10169-103B7
Insular cortex | Insula | 20-68 years, mixed | FF:10039-101F3
Kidney | Kidney | 60,62,63 years, female | FF:10017-101C8
Liver | Liver | 64,69,70 years, mixed | FF:10018-101C9
Locus coeruleus | Locus coeruleus | 76 years, female | FF:10165-103B3
Locus coeruleus | Locus coeruleus | 60 years, female | FF:10182-103D2
Lung | Lung | 46,65,94 years, mixed | FF:10019-101D1
Lung | Lung - right lower lobe | 29 years, male | FF:10075-102A3
Lymph node | Lymph node | 30 years, male | FF:10077-102A5
Medial frontal gyrus | Medial frontal gyrus | 76 years, female | FF:10150-102I6
Medial temporal gyrus | Medial temporal gyrus | 76 years, female | FF:10156-103A3
Medial temporal gyrus | Medial temporal gyrus | 60 years, female | FF:10183-103D3
Medulla oblongata | Medulla oblongata | 18-64 years, mixed | FF:10038-101F2
Medulla oblongata | Medulla oblongata | 76 years, female | FF:10155-103A2
Medulla oblongata | Medulla oblongata | 60 years, female | FF:10174-103C3
Nucleus accumbens | Nucleus accumbens | 23-56 years, mixed | FF:10037-101F1
Occipital cortex | Occipital cortex | 76 years, female | FF:10163-103B1
Occipital lobe | Occipital lobe | 27 years, male | FF:10076-102A4
Occipital pole | Occipital pole | 22-68 years, mixed | FF:10036-101E9
Olfactory bulb | Olfactory region | 87 years, female | FF:10195-103E6
Ovary | Ovary | 47,75,84 years, female | FF:10020-101D2
Pancreas | Pancreas | 52 years, male | FF:10049-101G4
Paracentral gyrus | Paracentral gyrus | 22-69 years, mixed | FF:10035-101E8
Parietal lobe | Parietal lobe | 35-89 years, mixed | FF:10034-101E7
Parietal lobe | Parietal lobe | 76 years, female | FF:10157-103A4
Parietal lobe | Parietal lobe | 60 years, female | FF:10171-103B9
Pituitary gland | Pituitary gland | 76 years, female | FF:10162-103A9
Placenta | Placenta | female | FF:10021-101D3
Pons | Pons | 18-54 years, mixed | FF:10033-101E6
Postcentral gyrus | Postcentral gyrus | 44-52 years, mixed | FF:10032-101E5
Prostate | Prostate | 73,79,93 years, male | FF:10022-101D4
Putamen | Putamen | 60 years, female | FF:10176-103C5
Retina | Retina | 24-65 years, mixed | FF:10030-101E3
Salivary gland | Salivary gland | 16-60 years, mixed | FF:10093-102C3
Salivary gland | Parotid gland | 23 years, male | FF:10199-103F1
Salivary gland | Submaxillary gland | 24 years, male | FF:10202-103F4
Seminal vesicle | Seminal vesicle | 24 years, male | FF:10201-103F3
Skeletal muscle | Skeletal muscle | 55,79,79 years, mixed | FF:10023-101D5
Skeletal muscle | Skeletal muscle - soleus muscle | male | FF:10282-104F3
Small intestine | Small intestine | 15,40,85 years, mixed | FF:10024-101D6
Smooth muscle | Smooth muscle | 20-68 years, male | FF:10048-101G3
Spinal cord | Spinal cord | 76 years, female | FF:10159-103A6
Spinal cord | Spinal cord | 60 years, female | FF:10181-103D1
Spleen | Spleen | 39,50,70 years, male | FF:10025-101D7
Substantia nigra | Substantia nigra | 76 years, female | FF:10158-103A5
Temporal cortex | Temporal lobe | 32-61 years, mixed | FF:10031-101E4
Testis | Testis | 34,53,86 years, male | FF:10026-101D8
Testis | Testis | 14-64 years, male | FF:10096-102C6
Thalamus | Thalamus | 76 years, female | FF:10154-103A1
Thymus | Thymus | 0.5,0.5,0.83 years old infant years, male | FF:10027-101D9
Thyroid gland | Thyroid | 67,68,78 years, mixed | FF:10028-101E1
Tongue | Tongue | 28 years, male | FF:10203-103F5
Tonsil | Tonsil | 22-61 years, mixed | FF:10047-101G2
Urinary bladder | Bladder | 55,58,79 years, mixed | FF:10011-101C2
Vagina | Vagina | 68 years, female | FF:10204-103F6

The Human Protein Atlas project is funded by the Knut & Alice Wallenberg Foundation.