Motivation: Genes are often characterized dichotomously as either housekeeping or single-tissue specific. We conjectured that crucial functional information resides in genes with midrange profiles of expression.
Results: To obtain such novel information genome-wide, we have determined the mRNA expression levels for one of the largest sets analyzed to date, 62 839 probesets, in 12 representative normal human tissues. Indeed, when using a newly defined graded tissue specificity index τ, valued between 0 for housekeeping genes and 1 for tissue-specific genes, we found that genes with midrange profiles (0.15 < τ < 0.85) constitute >50% of all expression patterns. We developed a binary classification, indicating for every gene the I_B tissues in which it is overexpressed and the 12 − I_B tissues in which it shows low expression. The 85 dominant midrange patterns with I_B = 2–11 were found to be bimodally distributed and to contribute most significantly to the definition of tissue specification dendrograms. Our analyses provide a novel route to infer expression profiles for presumed ancestral nodes in the tissue dendrogram. This inference has uncovered an unsuspected correlation, whereby de novo enhancement and diminution of gene expression go hand in hand. These findings highlight the importance of gene suppression events, with implications for the course of tissue specification in ontogeny and phylogeny.
Availability: All data and analyses are publicly available at the GeneNote website, http://genecards.weizmann.ac.il/genenote/, and under GEO accession GSE803.
Supplementary information: Four tables available at the above site.
The ontogeny of complex multicellular organisms is enabled by the differential expression of genes across cell types. Expression profiling with DNA arrays offers the opportunity to identify such patterns systematically (Halfon and Michelson, 2002; Slonim, 2002). Housekeeping genes are expressed in all cell types, whereas other genes are expressed in a more restricted selection of tissues. Previous research on the tissue specificity of genes has focused mainly on the extremes: one-tissue-specific genes (Hsiao et al., 2001, http://www.humangenes.org; Su et al., 2002) and housekeeping genes (Eisenberg and Levanon, 2003; Lercher et al., 2002; Warrington et al., 2000). However, many genes may show midrange patterns of expression, that is, they are expressed at a high level in a subset of tissues and at a much lower level, or not at all, in the other tissues. The term ‘midrange’ thus refers to the cross-tissue ‘breadth’ of gene expression, not to high or low overall expression intensity. Here, we investigate the occurrence and potential significance of midrange patterns of expression, noting that important information about a given tissue may reside not only in tissue-specific enhancement of expression but also in tissue-specific suppression.
Several recent high-throughput DNA array studies of gene expression have aimed to characterize transcription patterns in healthy tissues. One of these examined the transcription profiles of 28 normal human tissues and 45 mouse tissues using 12 000 oligonucleotide probesets (Su et al., 2002). cDNA arrays have also been used to examine the expression of genes across normal human tissues (Saito-Hisaminato et al., 2002). These, as well as other surveys of normal tissues (Haverty et al., 2002; Hsiao et al., 2001), were limited to the better-characterized genes and did not afford a genome-wide view. Studies with a more complete gene set focused on comparisons between diseased and non-diseased states (Bakay et al., 2002; Iacobuzio-Donahue et al., 2002; Mariani et al., 2002). In a recent report (Shmueli et al., 2003), as well as in the current work, we queried 12 normal human tissues with a complete gamut of 62 839 probesets, representing 23 271 identifiable human genes. This is one of the largest sets employed to date, and it includes nearly 12 000 genes whose tissue expression had not been examined in earlier studies. Most recently, Su et al. (2004) extended their expression atlas to 79 human and 61 mouse tissues.
The resulting genome-wide view of gene expression patterns is used here to reveal relationships among healthy human tissues and to generate new genome annotation tools. Specifically, our data shed new light on genes with midrange profiles of expression, with implications for the fine balance of gene expression and suppression that underlies tissue specification.
The expression intensity of mRNA was assayed with five microarrays (Affymetrix GeneChips U95A–E), containing a total of 62 839 probesets, each in duplicate. Poly(A)+ RNA samples from the human tissues were purchased from Clontech (Palo Alto, CA; details in Table S1 in the Supplementary Material). This collection of major human tissues includes bone marrow, brain, heart, kidney, liver, lung, pancreas, prostate, skeletal muscle, spinal cord, spleen and thymus. These RNA samples have relatively coarse tissue delineations, but in this respect they are comparable to those used in other studies of transcription patterns in normal human tissues (Su et al., 2002, 2004; Saito-Hisaminato et al., 2002). Each RNA sample was typically pooled from 10–25 individuals. Although such commercial pooled samples from anonymous donors are demographically ill-defined, they have the advantage of enabling others to reproduce the experiments.
Replicate experiments were performed independently, mostly with RNA of identical lot numbers; the exceptions were kidney, pancreas and prostate. Aliquots of each sample (12 μg cRNA in 200 μl hybridization mix) were hybridized to a GeneChip Human Genome U95A–E array set (Affymetrix, Santa Clara, CA). Preparation and hybridization of cRNA were done according to the manufacturer's instructions (Affymetrix, 2001, http://.com/support/technical/manuals.affx).
The expression value for each gene was determined using MicroArray Suite version 5.0 (MAS 5.0) software (Hubbell et al., 2002; Liu et al., 2003) with default parameters, but without the MAS 5.0 scaling and normalization procedures. The quantilization procedure used here (see below) encapsulates some features of a preprocessing method, RMA normalization (Irizarry et al., 2003). MAS 5.0 signal values were normalized by taking the log10 of all values (substituting −1 for zero intensities), subtracting the mean of the particular array and adding the overall experimental mean (Shmueli et al., 2003). Finally, intensities below log10(30) were set to log10(30) to eliminate perturbation by the noise present at low intensities; varying this threshold did not change the results significantly. The MAS 5.0 intensities, ranging on a decimal logarithmic scale from log10(30) to roughly 4, were then converted into a quantile scale. The expression data, averaged over the two replicates, were divided into 11 bins: 10 equal-density quantiles spanned the values above log10(30), and an eleventh ‘zero bin’ held the remaining low-intensity values. The quantiled profiles were used in all subsequent analyses.
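The normalization and quantilization steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab code: the ‘total experimental mean’ is taken to be the mean of all log values pooled over all arrays, quantile boundaries are computed over whatever set of averaged values is passed in (for example, all probesets × tissues pooled), and ties are broken by input order. The function names are hypothetical.

```python
import math
from statistics import fmean

LOG_FLOOR = math.log10(30)  # intensity floor from the text: log10(30)


def normalize_arrays(arrays):
    """Log10-transform raw MAS 5.0 signals and re-center each array.

    Zero intensities become -1 on the log scale. Each array is shifted by
    (grand mean - array mean); the grand mean is pooled over all arrays
    (an assumption). Values below log10(30) are raised to log10(30).
    """
    logs = [[math.log10(s) if s > 0 else -1.0 for s in arr] for arr in arrays]
    grand_mean = fmean(v for arr in logs for v in arr)
    normalized = []
    for arr in logs:
        shift = grand_mean - fmean(arr)
        normalized.append([max(v + shift, LOG_FLOOR) for v in arr])
    return normalized


def average_replicates(rep1, rep2):
    """Element-wise mean of two replicate arrays."""
    return [(a + b) / 2 for a, b in zip(rep1, rep2)]


def quantilize(values, n_bins=10):
    """Map values to 0 (the floor, i.e. the 'zero bin') or to 1..n_bins (equal-count quantiles)."""
    above = sorted((i for i, v in enumerate(values) if v > LOG_FLOOR),
                   key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(above):
        bins[i] = 1 + (rank * n_bins) // len(above)  # ties broken by input order
    return bins
```

With 10 distinct values above the floor, each lands in its own quantile (1 to 10), and values at the floor fall into bin 0.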
Single-classification ANOVA with equal sample sizes was applied to the preprocessed 24-element expression vector composed of 12 tissues in two replicates. For each profile, the sum of squared differences between the replicates was compared with the sum of squared differences between the tissue averages. To account for the multiple-comparison problem inherent in calculating P-values for all 62 839 probesets, we controlled the false discovery rate (Benjamini and Hochberg, 1995). We chose a 1% error rate, which gave a P-value cutoff of 0.0036. This resulted in 22 936 profiles defined as ‘differentially expressed’. The remaining profiles were further divided into not expressed profiles, with all 12 values in the zero bin, and housekeeping profiles, whose expression is non-zero in all tissues and of similar value (SD smaller than 1 quantile unit). All other profiles were defined as uncategorized. The algorithms described below were applied to the 22 936 differentially expressed profiles.
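The filtering step can be sketched as below, assuming a plain one-way ANOVA F statistic for k tissues × 2 replicates (11 and 12 degrees of freedom for 12 tissues), the standard Benjamini–Hochberg step-up rule, and the population SD for the housekeeping criterion. In practice the P-value of F would come from an F distribution, for example scipy.stats.f.sf(F, 11, 12); that call is left out so the sketch runs with the standard library only. All function names are hypothetical.

```python
import math
from statistics import pstdev


def anova_f(rep1, rep2):
    """One-way ANOVA F for k tissues measured in two replicates each.

    Degrees of freedom are (k - 1, k); for 12 tissues, (11, 12).
    """
    k = len(rep1)
    means = [(a + b) / 2 for a, b in zip(rep1, rep2)]
    grand = sum(means) / k
    ss_between = 2 * sum((m - grand) ** 2 for m in means)
    ss_within = sum((a - m) ** 2 + (b - m) ** 2
                    for a, b, m in zip(rep1, rep2, means))
    if ss_within == 0:
        # perfectly reproducible replicates: F is unbounded unless the profile is flat
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / k)


def bh_cutoff(pvalues, q=0.01):
    """Benjamini-Hochberg step-up rule: the largest p_(k) with p_(k) <= k*q/m.

    Profiles with P <= cutoff are called significant; None means no discoveries.
    """
    m = len(pvalues)
    cutoff = None
    for k, p in enumerate(sorted(pvalues), start=1):
        if p <= k * q / m:
            cutoff = p
    return cutoff


def categorize(quantiles, is_de):
    """Assign one of the four profile categories described in the text."""
    if is_de:
        return "differentially expressed"
    if all(v == 0 for v in quantiles):
        return "not expressed"
    if all(v > 0 for v in quantiles) and pstdev(quantiles) < 1:
        return "housekeeping"
    return "uncategorized"
```

Applied to all 62 839 P-values with q = 0.01, `bh_cutoff` plays the role of the 0.0036 cutoff reported above.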
The association of probesets to genes was performed using the GeneAnnot algorithm (Chalifa-Caspi et al., 2003, 2004). GeneAnnot comprehensively identifies relationships between oligonucleotide array probesets and annotated genes in GeneCards (Safran et al., 2002) by performing pairwise alignments between the probe sequences and gene transcripts and assigning sensitivity and specificity scores to them. In a further annotation step, GeneTide (Shklar et al., 2004) assigned annotation based on the transcript from which each probeset was derived. This was done by integrating transcript annotation data from several resources, such as UniGene (Wheeler et al., 2003) and AceView (http://www.humangenes.org). Furthermore, these target sequences were aligned against the human genome using BLAT (Kent, 2002) and assigned to a gene according to their genomic location using GeneLoc (Rosen et al., 2003).
We first defined the ‘gap’ of each expression profile as the maximal difference between two neighboring values in the sorted quantile vector. When the same maximal gap occurred more than once in a profile, the first one, i.e. the one between the smallest neighboring values, was taken. The gap was used to convert expression profiles into binary form. For the 8224 differentially expressed profiles with a gap of at least 3, values above the gap were interpreted as overexpressed (1) and the rest as underexpressed (0). These 8224 probeset profiles form our ‘mingap’ set. The remaining 14 712 differentially expressed profiles were assigned to the best-matching binary patterns detected by the gap method, as follows. The Euclidean distance was calculated between each of the 14 712 profiles and the mean expression profile of each binary pattern, and the pattern at the smallest distance was selected as the matching binary pattern for the profile. The binary index, I_B, of each binary pattern is defined as the number of 1s in the pattern.
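The gap rule and the nearest-mean assignment can be sketched as below. This is an illustration, assuming profiles are lists of integer quantiles (0–10) and that ‘above the gap’ means strictly greater than the lower value of the chosen gap pair; the function names are hypothetical.

```python
import math
from collections import defaultdict


def gap_binarize(profile, min_gap=3):
    """Return (gap, pattern); pattern is None if the gap is below min_gap."""
    s = sorted(profile)
    diffs = [s[i + 1] - s[i] for i in range(len(s) - 1)]
    gap = max(diffs)
    lower = s[diffs.index(gap)]  # first (lowest) occurrence of the maximal gap
    if gap < min_gap:
        return gap, None
    return gap, tuple(1 if v > lower else 0 for v in profile)


def pattern_means(profiles, min_gap=3):
    """Mean quantiled profile of each binary pattern found in the mingap set."""
    groups = defaultdict(list)
    for prof in profiles:
        _, pattern = gap_binarize(prof, min_gap)
        if pattern is not None:
            groups[pattern].append(prof)
    return {pat: [sum(col) / len(members) for col in zip(*members)]
            for pat, members in groups.items()}


def assign_pattern(profile, means):
    """Binary pattern whose mean profile is closest in Euclidean distance."""
    return min(means, key=lambda pat: math.dist(profile, means[pat]))


def binary_index(pattern):
    """I_B: the number of tissues with high (1) expression."""
    return sum(pattern)
```

For example, a profile [8, 0, 4] has two equal gaps of 4; the first (between 0 and 4) is used, giving the pattern (1, 0, 1) with I_B = 2.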
The superparamagnetic clustering (SPC) algorithm (Blatt et al., 1996) was applied to the same set of profiles used in the binary pattern analysis. Before clustering, each profile was centered to zero mean and scaled to unit norm [as described by Kannan et al. (2001)]. The SPC parameters used are detailed in Table S2 (Supplementary Material).
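The centering and normalization applied before SPC amount to the short function below; SPC itself is not reproduced here. Returning a flat profile (zero norm) unscaled is an assumption of this sketch.

```python
import math


def center_and_normalize(profile):
    """Shift a profile to zero mean, then scale it to unit Euclidean norm."""
    mean = sum(profile) / len(profile)
    centered = [v - mean for v in profile]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0:
        return centered  # flat profile: left unscaled (assumption of this sketch)
    return [c / norm for c in centered]
```

After this step, profiles differing only in overall level or amplitude become identical, so clustering responds to the shape of a profile across tissues.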
Given two binary tissue expression profiles, an ancestor profile was inferred by first assuming that instances of agreement (both 1s or both 0s) are unaltered in the ancestor. In cases of disagreement (1 and 0, or 0 and 1), maximum parsimony is applied, using a majority call of expression in the remaining tissues. To infer the profiles of all nodes in a dendrogram, including deep internal nodes, we followed the linkages of the hierarchically clustered tree and inferred each node successively.
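A minimal sketch of this inference, under assumptions that the text does not settle: a tissue profile is a 0/1 list with one entry per gene, the ‘remaining tissues’ for a node are the leaf tissues outside that node's subtree, and ties in the majority vote (including the root, which has no outgroup) resolve to 0. The tree is given as nested 2-tuples of tissue names; all names here are hypothetical.

```python
def infer_ancestor(a, b, outgroups):
    """Infer an ancestral binary profile from two sibling profiles.

    a, b: binary profiles (one 0/1 entry per gene) of the two sibling nodes.
    outgroups: binary profiles of the remaining tissues.
    """
    ancestor = []
    for g, (x, y) in enumerate(zip(a, b)):
        if x == y:
            ancestor.append(x)  # agreement is kept unchanged
        else:
            votes = sum(t[g] for t in outgroups)
            # majority call among remaining tissues; ties (and no outgroups) -> 0
            ancestor.append(1 if 2 * votes > len(outgroups) else 0)
    return ancestor


def leaves(tree):
    """Tissue names under a node of a nested-tuple tree."""
    return [tree] if isinstance(tree, str) else [t for sub in tree for t in leaves(sub)]


def infer_node(tree, profiles):
    """Recursively infer the profile of a (possibly deep) internal node."""
    if isinstance(tree, str):
        return profiles[tree]
    left, right = tree
    inside = set(leaves(tree))
    outgroups = [p for name, p in profiles.items() if name not in inside]
    return infer_ancestor(infer_node(left, profiles),
                          infer_node(right, profiles), outgroups)
```

For example, with profiles A = [1, 1, 0], B = [1, 0, 0] and C = [0, 1, 1], the node (A, B) is inferred as [1, 1, 0]: the disagreement at the second gene is resolved by C, which expresses it.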
All analyses were implemented in Matlab (www.mathworks.com). Scripts and intermediate data are all available upon request.
Expression profiles were generated for a set of 12 representative normal human tissues (Fig. 1), using a total of 62 839 oligonucleotide probesets. Of these, nearly 75% corresponded to annotated human genes, encompassing 23 271 GeneCards entries (Safran et al., 2002), and the rest could not be associated with currently known gene-related sequences (Table 1). The 50 214 probesets on the four less commonly used arrays, U95B–E, provided novel expression information on 11 418 GeneCards genes. This genome-wide view of human tissue expression patterns is available in the GeneNote database (Shmueli et al., 2003, http://genecards.weizmann.ac.il/genenote/). The expression profiles were classified into four categories: differentially expressed, housekeeping, not expressed and uncategorized (Fig. 1 and Table 1). As expected, a majority (∼90%) of the probesets in the first two categories are related to known genes, while most of the unannotated probesets fall into the last two categories.
To examine the complete diversity of expression patterns, we developed a tissue specificity index, τ, a quantitative, graded scalar measure of the specificity of an expression profile. τ values span the entire range between 0 for housekeeping genes and 1 for strictly one-tissue-specific genes. The τ distribution (Fig. 2A) is U-shaped: values near 0 and 1 are more frequent than intermediate values. Nevertheless, as many as 57% of all profiles have intermediate specificities (0.15 ≤ τ ≤ 0.85), forming the largest group, larger than the housekeeping and one-tissue-specific sets combined.
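This section does not spell out the formula for τ. The sketch below uses the definition widely cited for this index, τ = Σ_i (1 − x_i / max_j x_j) / (n − 1) over n tissues, applied here (by assumption) to a quantiled profile; it should be checked against the full method description before being relied on. The function names are hypothetical.

```python
def tau(profile):
    """Tissue specificity index: sum(1 - x_i / max(x)) / (n - 1).

    0 for uniform expression, 1 for expression in a single tissue.
    """
    n = len(profile)
    peak = max(profile)
    if peak <= 0:
        raise ValueError("tau is undefined for a profile with no expression")
    return sum(1 - x / peak for x in profile) / (n - 1)


def is_midrange(t, low=0.15, high=0.85):
    """Midrange specificity band used in the text: 0.15 <= tau <= 0.85."""
    return low <= t <= high
```

A profile expressed equally in all 12 tissues gives τ = 0, one expressed in a single tissue gives τ = 1, and a profile with two high and two absent tissues out of four gives τ = 2/3, which falls in the midrange band.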
To evaluate the robustness of the shape of the τ distribution to additional tissues and organisms, we calculated the same distributions for a previously published set (Su et al., 2002), in which 27 human and 45 mouse normal tissues were analyzed with replicates for one-fifth of the gene representations examined here. The shape of the τ distributions was largely similar in all three datasets, and nearly identical percentages of profiles with intermediate specificity (0.15 ≤ τ ≤ 0.85) were detected: 56% for mouse and 57% for human.
Do tissue-specificity (τ) estimates based on 12 tissues hold up when a much larger number of tissues is examined? A recently published study (Su et al., 2004) provides human gene expression profiles across 74 non-cancerous human tissues. We found a high correlation (R = 0.85) between the τ indices of differentially expressed genes in the two datasets (Fig. 3). Two clusters of genes deviate markedly: one with low τ in GeneNote and high τ in the new study, and one with the opposite behavior. The former corresponds to genes specific to tissues not present in GeneNote, and the latter to genes specific to spleen, which is not present in the more recent study. The congruence between tissue specificity values based on 12 and 74 tissues supports the robustness of the newly defined tissue specificity index and indicates that our choice of tissues is fairly representative of the complete tissue-set transcriptome. The distribution of τ values for the new dataset (Fig. 2A, dotted line) shows a relatively high preponderance (>60%) of intermediate τ values, likely stemming from the use of subtissues such as different brain regions.
The one-dimensional tissue specificity index is limited in its capacity to identify and categorize specific classes of expression patterns. To overcome this, we developed a procedure that converts an arbitrary expression profile into a binary pattern. The quantiled expression profiles are thereby mapped from a very large set of cardinality 11^12 (more than 3.1 × 10^12) to a reduced set of 2^12 = 4096 possible patterns. This analysis was initially performed on a subset of 8224 probeset expression profiles that fulfilled a specific intensity gap criterion (the mingap set; see the Algorithms section).
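The counts quoted above follow directly from 12 tissues and 11 quantile levels, as the short sketch below checks. It also shows one way to turn a binary pattern into the integer ‘binary value’ used to order patterns; reading the first tissue as the most significant bit is an assumption, since the bit order is not stated, and `binary_value` is a hypothetical name.

```python
N_TISSUES = 12
N_LEVELS = 11  # 'zero bin' plus 10 quantiles

n_quantiled = N_LEVELS ** N_TISSUES  # about 3.14e12 distinct quantiled profiles
n_binary = 2 ** N_TISSUES            # 4096 binary patterns
n_nontrivial = n_binary - 2          # excluding all-0 and all-1: 4094


def binary_value(pattern):
    """Read a 0/1 pattern as an integer, first tissue as the most significant bit."""
    value = 0
    for bit in pattern:
        value = (value << 1) | bit
    return value
```

Under this encoding the null pattern maps to 0 and the housekeeping (all-1) pattern to 4095.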
Of the 4094 possible binary patterns (excluding the all-0 and all-1 patterns), 859 were actually observed in this set. The probesets of the first microarray (U95A) detected only 498 of these patterns, while the remaining 42% were found only on the four additional arrays (U95B–E). Furthermore, the four additional arrays reinforced 127 patterns that were populated by only one profile on the first array. Subsequently, the differentially expressed profiles not included in the mingap set were binarized by matching each one to its closest binary counterpart.
The results of the binarization are shown in Figure 4. Each panel 4.i (i = 1–12) contains profiles (drawn from the 8224 differentially expressed gene representations of the mingap set and the 4216 housekeeping genes) with high expression in i tissues and underexpression in the remaining 12 − i tissues. Panel 4.12 contains the strictly defined housekeeping genes. In panel 4.1 (single-tissue specificity), brain, bone marrow, pancreas, skeletal muscle and liver are highly represented, while spinal cord, kidney, heart and spleen have relatively few profiles. In panel 4.2, the prevalent two-tissue-specific patterns are brain and spinal cord, heart and skeletal muscle, bone marrow and thymus, and kidney and liver. Bone marrow, spleen and thymus define a major three-tissue pattern in panel 4.3. Panels 4.9–4.11 depict profiles with expression in all but 3, 2 or 1 tissue(s), respectively. Notably, the same five tissues with the most single-tissue-specific profiles (brain, bone marrow, pancreas, skeletal muscle and liver) also have the greatest number of single-tissue-suppressed profiles.
Of the individual profiles appearing in Figure 4, 5220 are not well annotated to any known gene and are therefore of particular interest: for these implied novel genes, function may be tentatively inferred from the expression profile. Table S3 (Supplementary Material) lists these expression profiles along with the identifier of the sequence from which each probeset was derived.
We subsequently defined the 99 most populated (>25 probeset profiles) binary patterns among the 22 936 differentially expressed genes, including the housekeeping (all 1s) and null (all 0s) patterns (Fig. 2B). The number of populated binary patterns per binary index, I_B, shows a clear bimodal distribution (Fig. 2C), with peaks at I_B = 2 and 10. This mirrors the bimodal trend seen for τ values in Figure 2A. Whereas all 12 one-tissue-specific patterns are included, only about one-third of the two-tissue expressed patterns (I_B = 2) and about one-quarter of the two-tissue repressed patterns (I_B = 10) are included in this set, suggesting biases toward specific oligo-tissue combinations. The peak at high I_B values in Figure 2C corresponds to profiles with low expression in 1–3 tissues and high expression in the others. We use the term suppression to describe instances in which genes are expressed at lower levels in a few tissues. This does not necessarily imply an active process in which the expression of a gene is specifically turned off; it could equally be due to a loss of activation or to a dilution of mRNA levels in one tissue relative to another, owing to a different cellular composition.
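The ‘random binomial model’ of Fig. 2C is not specified further in the text. One simple reading, used in the hypothetical sketch below, is that each tissue is independently ‘high’ with probability p (0.5 by default), so the expected number of patterns with I_B = k among n patterns is n·C(12, k)·p^k·(1 − p)^(12 − k). Under p = 0.5 this curve is unimodal with its peak at I_B = 6, in contrast to the observed peaks at 2 and 10. The sketch also gives the number of possible two-tissue patterns, C(12, 2) = 66, the base for the ‘one-third’ and ‘one-quarter’ figures above.

```python
from math import comb

N_TISSUES = 12
N_TWO_TISSUE = comb(N_TISSUES, 2)  # 66 possible patterns with I_B = 2 (and with I_B = 10)


def expected_ib_counts(n_patterns, n_tissues=N_TISSUES, p=0.5):
    """Expected number of patterns at each I_B = 0..n_tissues under a binomial model."""
    return [n_patterns * comb(n_tissues, k) * p ** k * (1 - p) ** (n_tissues - k)
            for k in range(n_tissues + 1)]
```

If the model in Fig. 2C used an empirically estimated p instead, passing that value as `p` shifts the peak accordingly.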
To test the validity of the supervised binary clustering, we also applied unsupervised SPC (Blatt et al., 1996; Getz et al., 2000) to our data (Fig. 5A). SPC is well suited to clustering gene expression profiles because of its robustness to noise and its inherent measure of cluster stability (Getz et al., 2000). The 70 identified SPC clusters showed a strong correlation with the 97 binary clusters (Fig. 5B). Some binary patterns are represented by multiple SPC clusters, which further refine the corresponding binary patterns (Fig. 5C). The high overall agreement between the two clustering methods lends additional credence to the binary classification proposed here.
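The agreement between the two clusterings can be scored pairwise. The sketch below assumes the score in Fig. 5B is the number of profiles shared by an SPC cluster and a binary cluster divided by the size of the smaller cluster; clusters are given as collections of probeset identifiers, and the function names are hypothetical.

```python
def overlap_score(cluster_a, cluster_b):
    """Shared profiles divided by the size of the smaller cluster (0..1)."""
    a, b = set(cluster_a), set(cluster_b)
    return len(a & b) / min(len(a), len(b))


def overlap_matrix(spc_clusters, binary_clusters):
    """Rows: SPC clusters; columns: binary-pattern clusters."""
    return [[overlap_score(s, b) for b in binary_clusters] for s in spc_clusters]
```

Normalizing by the smaller cluster means a small SPC cluster lying entirely inside a large binary cluster scores 1, which matches the observation that several SPC clusters can refine a single binary pattern.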
Inter-tissue distances were calculated between pairs of tissue vectors, each containing the 22 936 expression values of the differentially expressed profiles. The resulting tissue dendrogram (Fig. 6A) shows a specific set of groupings reflecting different degrees of inter-tissue similarity, consistent with previous knowledge (Hsiao et al., 2001). The immunological tissues (bone marrow, thymus and spleen) cluster together with lung. Pairs of related tissues coupled in the dendrogram are heart and skeletal muscle, kidney and liver, brain and spinal cord, and prostate and pancreas.
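The text does not specify the inter-tissue distance or the linkage rule. As a hypothetical illustration only, the sketch below builds a dendrogram with Euclidean distances and average linkage in pure Python, returning the tree as nested 2-tuples of tissue names; for real data a library routine such as scipy.cluster.hierarchy.linkage would be the usual choice.

```python
import math
from itertools import combinations


def average_linkage_tree(vectors):
    """Agglomerative clustering of tissues with Euclidean distance and average linkage.

    vectors: dict mapping tissue name -> expression vector (same length for all).
    Returns the dendrogram as nested 2-tuples of tissue names.
    """
    clusters = {name: [name] for name in vectors}  # node -> member tissues

    def linkage(m1, m2):
        pair_dists = [math.dist(vectors[a], vectors[b]) for a in m1 for b in m2]
        return sum(pair_dists) / len(pair_dists)

    while len(clusters) > 1:
        n1, n2 = min(combinations(clusters, 2),
                     key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        members = clusters.pop(n1) + clusters.pop(n2)
        clusters[(n1, n2)] = members
    return next(iter(clusters))
```

Each tissue vector here would hold one value per differentially expressed profile. Ties between equally distant pairs are broken by dictionary order, so the topology of near-tied branches should not be over-interpreted.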
To isolate the profiles that specify the underlying relationships among tissues, we split the differentially expressed profiles into two groups: those with I_B = 1 and those with I_B = 2–11. The tree based on the second group, with midrange profiles (Fig. 6B), recovers the most important features of the dendrogram based on the entire set: a united nervous system, juxtaposed muscle tissues and a mutually coherent immune system. In contrast, the dendrogram based on the I_B = 1 group (Fig. 6C) is very different and far removed from known tissue relationships; for example, spinal cord is closest to heart and very distant from brain. One might expect the visible non-zero off-diagonal values in the I_B = 1 patterns (panel 4.1 of Fig. 4) to contribute enough information to generate a more biologically realistic tissue dendrogram, but this is not the case.
The availability of genome-wide expression profiles for each tissue provides a unique opportunity to obtain additional information on the relationships among tissues. Specifically, it is possible to derive from the tissue dendrogram an inferred gene expression profile for each of the ‘ancestral’ tissues represented by the internal nodes (Fig. 7A). For example (Fig. 7B), in most cases of expression in brain but not in spinal cord, underexpression is inferred for the ancestral tissue, reflecting de novo specificities for brain. Conversely, most genes highly expressed in spinal cord but not in brain are also positive in the inferred ancestral tissue, suggesting that the difference corresponds to brain-specific suppression. More generally, we found that for most ancestral tissues, including all but one of the most closely related doublets, the tissue with more genes showing novel expression also exhibits more genes with novel suppression (Fig. 7C). This phenomenon can also be seen by visual inspection of panels 1 and 11 of Figure 4, as described above.
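Counting novel expression and novel suppression relative to an inferred ancestor is straightforward once profiles are binary. In the hypothetical sketch below, a gain is a gene that is 1 in the derived tissue and 0 in its ancestor, a loss is the reverse, and `same_trend` checks, for a pair of sibling tissues, whether the tissue with more gains also has more losses (ties count as not following the trend). This mirrors the comparison summarized in Fig. 7C but is not the authors' code.

```python
def novel_changes(child, ancestor):
    """Return (gains, losses): de novo expression and de novo suppression counts."""
    gains = sum(1 for c, a in zip(child, ancestor) if c == 1 and a == 0)
    losses = sum(1 for c, a in zip(child, ancestor) if c == 0 and a == 1)
    return gains, losses


def same_trend(child1, child2, ancestor):
    """True if the sibling with more novel expression also has more novel suppression."""
    g1, l1 = novel_changes(child1, ancestor)
    g2, l2 = novel_changes(child2, ancestor)
    return (g1 - g2) * (l1 - l2) > 0
```

For instance, with ancestor [0, 1, 1, 0, 1], a child [1, 1, 0, 1, 0] has 2 gains and 2 losses, while a sibling [0, 1, 1, 0, 0] has 0 gains and 1 loss, so the pair follows the trend.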
This paper proposes a set of novel genome-wide annotation tools. First, each of the 23 271 genes targeted by 46 185 probesets (Table 1) has one or more tissue expression profiles, documented in GeneNote. A set of tools has been developed to generate a consensus expression pattern for each of these genes, excluding outliers. Second, every gene is assigned a specific value of τ, placing it within a graded tissue specificity range between extreme tissue specificity and a complete absence of such specificity. Third, each gene with a differentially expressed profile is related to a binary pattern, indicating the combination of tissues in which it is more highly expressed and those in which it is suppressed. We believe that these binary patterns are more amenable to intuitive scientific interpretation than classifications based on standard clustering algorithms; it is reassuring, though, that the two systems are highly correlated. All of the above information provides tools for assigning potential functions to novel and hitherto unannotated genes. As the annotation tools presented here are easily generalized, we believe they can be fruitfully applied to a wide spectrum of datasets, for example, to sets with tumor and non-tumor samples.
The binary pattern analysis is particularly useful in revealing expression profiles that constitute unusual tissue combinations. For example, pattern 36 in Figure 2B denotes high expression in bone marrow, pancreas and liver, and pattern 47 denotes high expression in heart, prostate and spinal cord. Among the 98 binary patterns in Figure 2B that show expression in at least one tissue, 1 corresponds to the housekeeping expression profile and 12 denote single-tissue-specific profiles. The remaining 85 patterns are defined here as denoting midrange profiles of expression. Of these, at most 33 may be considered consistent with tissue clustering as defined by the dendrogram of Figure 6A, in that they correspond to the groups of tissues defined by the terminal and internal nodes of the dendrogram. Thus, a majority of the dominant binary patterns corresponding to midrange profiles may be viewed as unexpected. Such patterns are difficult to explain in terms of tissue similarities, including the sharing of common cell types among disparate tissues. Alternatively, they may reflect as yet undiscovered transcription control mechanisms that future research could discern. Some of these unexpected expression patterns may also reflect a neutral mode of expression (Khaitovich et al., 2004; Yanai et al., 2004).
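One reading of the consistency criterion above is that a pattern is consistent with the dendrogram when its set of high-expression tissues is exactly the tissue set of some terminal or internal node. The hypothetical sketch below implements that reading for a tree given as nested 2-tuples of tissue names; whether the count of 33 also admits complementary (suppression) patterns is not stated, so the sketch checks only the high-expression set.

```python
def leaves(tree):
    """Tissue names under a node of a nested-tuple dendrogram."""
    return [tree] if isinstance(tree, str) else [t for sub in tree for t in leaves(sub)]


def clades(tree):
    """Tissue sets of all terminal and internal nodes (including the root)."""
    own = {frozenset(leaves(tree))}
    if isinstance(tree, str):
        return own
    return own.union(*(clades(sub) for sub in tree))


def consistent_with_tree(pattern, tissues, tree):
    """True if the tissues marked 1 in the pattern form exactly one node of the tree."""
    expressed = frozenset(t for t, bit in zip(tissues, pattern) if bit)
    return expressed in clades(tree)
```

With the toy tree ((BRN, SPC), (HRT, MSL)), the brain plus spinal cord pattern is consistent, whereas brain plus heart is not.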
The approach explored here focuses on midrange profiles of transcription, with elevated expression or suppression in specific tissue combinations and intermediate values of the tissue specificity index τ. Our analysis has revealed that midrange profiles constitute a majority of tissue specificity expression patterns. Despite its ubiquity, this category has received remarkably little attention relative to its housekeeping and tissue-specific counterparts. Of the nearly 100 most populated binary patterns, more than 80% are midrange patterns. A recent expression study in maize has similarly shown that a relatively small portion of genes tend to be organ specific, while the remainder show diverse expression (Cho et al., 2002).
Most focused arrays used by various authors, carrying specific subsets of genes, contain mostly tissue-specific genes whose expression is elevated in a single tissue. Such arrays may be considered ‘too focused’. Our results, which point to the importance of genes with midrange expression profiles, could therefore have a considerable impact on array design and experimentation.
A dominant property of midrange profiles is the surprising preponderance of patterns with tissue-specific gene suppression (I_B = 9–11), which are almost as populated as oligo-expression patterns (I_B = 2–4). The most underrepresented set of profiles is the midrange profiles with I_B = 5–7. Our results also indicate that in the evolution of a tissue, de novo expression and de novo suppression go hand in hand.
It thus appears that gene suppression plays a major role in tissue evolution and is tightly coupled with novel expression in the origin of distinct tissues. Such tissue-specific gene suppression may be mediated by specific pathways of transcription control (Hsia and McGinnis, 2003), as well as by other cellular mechanisms, including RNA interference (Cerutti, 2003). One practical conclusion for tissue-specific arrays is that they should preferably contain not only single-tissue-specific genes but also genes that manifest more complex patterns of expression–suppression.
Understanding the signaling and control pathways that govern organ development during ontogeny is a fundamental problem of developmental biology (Burgess et al., 2002). Studies in model organisms such as Drosophila have demonstrated the complex interplay of signaling molecules underlying developmental events associated with the embryonic maturation of tissues and cell types (St Johnston, 2002). The exact spatial and temporal expression of genes, and the interactions of their protein products, elicit a developmental code of organ commitment and early patterning. This code likely manifests itself in the pattern of gene expression in each tissue. Furthermore, when new tissues arise in ontogeny or phylogeny, their ancestral precursors should have their own expression patterns, related in complex ways to those of the more highly differentiated derived tissues. Validating this concept will require direct experimental testing of expression patterns at early stages of embryogenesis. Our analysis of ancestral tissue expression, which points to a correlation between novel tissue expression and suppression, together with a tissue dendrogram based on the full gamut of genes in the human genome, can serve as a valuable tool for such studies.
Gene expression profiles across 12 normal human tissues. Classification of the 62 839 expression profiles (horizontal lines) into four groups indicated by the right bar: HK, housekeeping; DE, differentially expressed; UC, uncategorized; and NE, not expressed (Table 1). The left bar indicates the origin from Array A (peach) and Arrays B–E (brown). The profiles within each category were sorted in ascending order according to τ, the tissue specificity index. Expression intensities are color coded by quantile values on the bottom bar. Tissue abbreviations: BRN, brain; SPC, spinal cord; BMR, bone marrow; SPL, spleen; TMS, thymus; LNG, lung; PNC, pancreas; PST, prostate; HRT, heart; MSL, skeletal muscle; KDN, kidney; and LVR, liver.
| | Differentially expressed | Housekeeping | Not expressed | Uncategorized | Total |
|---|---|---|---|---|---|
| GeneCards | 20 589 | 3181 | 9859 | 12 726 | 46 185 |
| Not annotated | 2347 | 1035 | 6502 | 6600 | 16 654 |
| Total | 22 936 | 4216 | 16 361 | 19 326 | 62 839 |
Probeset-to-GeneCards associations were made using the GeneAnnot algorithm (Chalifa-Caspi et al., 2004) and annotation from the original transcripts from which the probesets were derived (see Systems and methods section).
Tissue specificity index and expression pattern repertoire. (A) Distribution of τ values for 27 152 profiles, comprising the 22 936 differentially expressed and 4216 housekeeping profiles (bars). The τ distributions are also shown for 12 626 profiles (array HG-U133A) across 74 human tissues (dotted curve) from a recent study (Su et al., 2004), as well as for profiles across 27 human tissues (light gray curve) and 12 654 profiles across 45 mouse tissues (dark gray curve) from another set (Su et al., 2002). (B) A summary representation of the most populated binary patterns (columns), where black circles indicate high expression. The patterns, enumerated on the abscissa, are sorted according to binary value. Tissue abbreviations as in Figure 1. (C) The frequency distribution of I_B values of the binary patterns shown in (B). The superimposed curve indicates the expected distribution under a random binomial model.
Comparison of tissue specificity indices (τ) in different datasets. The τ indices of the current work (GeneNote) were compared with those of a recently published set with 74 human tissues (Su et al., 2004). The two sets were matched using the U95-to-U133 probeset mappings released by Affymetrix; when more than one U133 probeset matched a given U95 probeset, only one was taken, resulting in 13 124 probeset pairs. The expression intensities of the two sets were normalized by quantile normalization, and the mean of each expression profile in each set was then scaled to the total mean of the profile. Replicates were averaged and signal quantilization was carried out as described in the Systems and methods section. Shown are the τ pairs for the 9450 differentially expressed genes of our set.
Profile clustering in 12 binary pattern sets. Each panel contains all mingap set expression profiles whose binary pattern has the I_B value indicated on top, sorted by the pattern's binary value. Housekeeping profiles are taken as I_B = 12. The profiles of each pattern were further sorted by the mean of the expression values corresponding to the non-zero elements of the binary pattern. The color code for expression intensity is as in Figure 1. Table S4 (Supplementary Material) specifies the accession identifiers for each expression profile shown.
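The binary patterns, their binary index I_B and their binary value can be sketched as follows. The binarization rule is an assumption. It calls a tissue "high" when its value lies above the largest gap in the sorted profile, which is our reading of the "gap-computed" patterns mentioned in these captions. `min_gap` is a hypothetical stand-in for the mingap criterion. The tissue order that defines the binary value, with the first tissue as the most significant bit, is likewise assumed.

```python
def gap_binarize(profile, min_gap=0.0):
    """Binary pattern from the largest gap in the sorted values (assumed rule).

    Tissues whose values lie above the largest gap are 'high' (1), the others
    'low' (0). Returns None when the largest gap is not positive or is smaller
    than min_gap, a hypothetical threshold standing in for the mingap criterion.
    """
    if len(profile) < 2:
        raise ValueError("need at least two tissues")
    values = sorted(profile)
    # Position k of the largest gap, which lies between values[k] and values[k + 1].
    k = max(range(len(values) - 1), key=lambda i: values[i + 1] - values[i])
    gap = values[k + 1] - values[k]
    if gap <= 0 or gap < min_gap:
        return None
    threshold = values[k]
    return [1 if x > threshold else 0 for x in profile]


def binary_index(pattern):
    """I_B: the number of tissues flagged as highly expressed."""
    return sum(pattern)


def binary_value(pattern):
    """Read the pattern as a binary number, first tissue as the most significant bit."""
    return int("".join(str(bit) for bit in pattern), 2)
```

A panel's profiles can then be ordered with `sorted(patterns, key=binary_value)`. Housekeeping profiles would be assigned I_B = 12 separately rather than through this rule.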
Superparamagnetic clustering (SPC). (A) A summary representation of the SPC results. Each column (enumerated on the abscissa and sorted by gap-computed binary value) represents the average pattern of all profiles in a cluster. The predefined SPC minimal cluster size was set to 10 profiles. Color coding is as in Figure 1, representing average quantiles. (B) Each element in the matrix compares the profiles of an SPC cluster with those of a binary cluster. Binary clusters were obtained for the 8224 profiles of the mingap set, and only patterns with at least 10 profiles are shown. The color-coded score (right bar) is the number of profiles shared by the two clusters, divided by the size of the smaller of the two. (C) Enlarged view of 11 individual SPC clusters. Centered and normalized quantiled expression profiles of the cluster members are shown. Clusters 2, 23 and 65 [corresponding to their rows in B] are expressed in both brain and spinal cord: 2 is equally expressed in both tissues, 23 is higher in brain and 65 is higher in spinal cord. Similar relationships are seen in the cluster triplets 12, 42 and 59, expressed in both heart and skeletal muscle, and 15, 36 and 46, expressed in both liver and kidney. Table S2 (Supplementary Material) specifies the accession identifiers for all profiles clustered by SPC. Color represents normalized, centered expression intensities.
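We read the score in panel (B) as the number of profiles common to an SPC cluster and a binary cluster, divided by the size of the smaller one. This reading, the overlap coefficient, is our interpretation. The sketch below computes that score and the comparison matrix. Profile identifiers are hypothetical strings, and `min_size=10` mirrors the "at least 10 profiles" filter on binary clusters.

```python
def overlap_score(cluster_a, cluster_b):
    """Profiles shared by two clusters divided by the size of the smaller cluster."""
    a, b = set(cluster_a), set(cluster_b)
    if not a or not b:
        raise ValueError("clusters must be non-empty")
    return len(a & b) / min(len(a), len(b))


def comparison_matrix(spc_clusters, binary_clusters, min_size=10):
    """Rows: SPC clusters; columns: binary clusters with at least min_size profiles."""
    kept = [b for b in binary_clusters if len(set(b)) >= min_size]
    return [[overlap_score(s, b) for b in kept] for s in spc_clusters]
```

Dividing by the smaller cluster means a small SPC cluster nested entirely inside a large binary cluster scores 1. This fits the aim of checking whether SPC clusters fall within single binary patterns.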
Midrange specificity profiles and the tissue dendrograms. Tissue dendrograms were obtained by hierarchical clustering with average linkage: (A) based on all differentially expressed (DE) profiles; (B) based on the 4921 mingap profiles with I_B = 2–11; (C) based on the 3303 profiles with I_B = 1. A systematic analysis of this trend (data not shown) suggests that profiles with high binary index values (I_B = 8–11) contribute more to the dendrogram than the symmetrically disposed patterns with low binary index values (I_B = 2–5).
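Average-linkage agglomerative clustering of tissues (UPGMA-style) can be sketched in plain Python. The captions specify only the linkage. The distance used here, 1 minus the Pearson correlation between tissue vectors over the same set of profiles, is therefore an assumption, and the toy vectors in the usage note are invented for illustration.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_linkage(tissue_vectors):
    """Cluster tissues with average linkage; distance = 1 - Pearson (assumed).

    tissue_vectors maps a tissue name to its expression values over the same
    profiles. Returns the dendrogram as nested tuples of tissue names.
    """
    names = list(tissue_vectors)
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dist[frozenset((a, b))] = 1 - pearson(tissue_vectors[a], tissue_vectors[b])
    # Each cluster is (subtree, list of leaf names).
    clusters = [(name, [name]) for name in names]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                la, lb = clusters[i][1], clusters[j][1]
                # Average linkage: mean of all leaf-to-leaf distances between the clusters.
                d = sum(dist[frozenset((x, y))] for x in la for y in lb) / (len(la) * len(lb))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]
```

For example, four invented tissue vectors in which brain and spinal cord covary, as do liver and kidney, yield a tree whose two top-level branches are {brain, spinal cord} and {liver, kidney}. Restricting the input profiles to a subset, such as those with I_B = 2–11, is how the different panels would be produced under this sketch.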
Ancestral tissue patterns. (A) The tissue dendrogram, based on the differentially expressed genes, is shown with superimposed tissue vectors and inferred ancestral tissue vectors based on the binary profiles of the mingap set. (B) Two instances of ancestral tissue (ANC) inference: brain and spinal cord (left) and kidney and liver (right). (C) Correlation between novel tissue expression and novel tissue suppression of profiles relative to their inferred ancestor. For each internal ancestral node, a ratio was calculated between the numbers of novel expressions in the two derived tissue vectors relative to the ancestral tissue vector (abscissa). Corresponding ratios were calculated for the novel suppressions (ordinate). Points represent all internal nodes, with the order of each tissue pair chosen so that the abscissa value is higher than 1. Only two of the internal nodes [labeled in light and dark gray, and similarly marked in (A)] had an ordinate value lower than 1, indicating a lack of correlation at those nodes. The node corresponding to the brain and spinal cord ancestor is marked with an empty circle.
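The ratios in panel (C) can be sketched from binary vectors. The sketch takes the inferred ancestral vector as given, since the inference procedure itself is not described in this caption. It counts novel expressions (0 in the ancestor, 1 in the derived tissue) and novel suppressions (1 in the ancestor, 0 in the derived tissue). It then orders the tissue pair so that the expression ratio is at least 1. The function names and toy vectors are illustrative.

```python
def novel_counts(child, ancestor):
    """(novel expressions, novel suppressions) of a derived tissue vector
    relative to its ancestral vector; both are binary over the same profiles."""
    if len(child) != len(ancestor):
        raise ValueError("vectors must have the same length")
    gains = sum(1 for c, a in zip(child, ancestor) if c == 1 and a == 0)
    losses = sum(1 for c, a in zip(child, ancestor) if c == 0 and a == 1)
    return gains, losses


def node_ratios(child_a, child_b, ancestor):
    """(expression ratio, suppression ratio) for one internal node, with the
    two derived tissues ordered so that the expression ratio is >= 1."""
    ga, la = novel_counts(child_a, ancestor)
    gb, lb = novel_counts(child_b, ancestor)
    if ga < gb:
        ga, la, gb, lb = gb, lb, ga, la
    if gb == 0 or lb == 0:
        raise ValueError("ratio undefined: a derived tissue has no novel events")
    return ga / gb, la / lb
```

A node where the tissue that gained more expression also lost more, such as the ratios (3.0, 2.0), falls above 1 on both axes. This is the pattern the caption reports for all but two internal nodes.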
I.Y. is a Koshland Scholar, D.L. is the incumbent of the Ralph and Lois Silver Chair in Human Genomics, and E.D. is the incumbent of the Henry J. Leir Professorial Chair. This work was made possible by the generosity of the Abraham and Judith Goldwasser Foundation. It was further supported by the Crown Human Genome Center and by the Koshland Center for Basic Research.
Bakay, M., Zhao, P., Chen, J. and Hoffman, E.P.
Benjamini, Y. and Hochberg, Y.
Blatt, M., Wiseman, S. and Domany, E.
Burgess, R., Lunyak, V. and Rosenfeld, M.
Chalifa-Caspi, V., Shmueli, O., Benjamin-Rodrig, H., Rosen, N., Shmoish, M., Yanai, I., Ophir, R., Kats, P., Safran, M. and Lancet, D.
Chalifa-Caspi, V., Yanai, I., Ophir, R., Rosen, N., Shmoish, M., Benjamin-Rodrig, H., Shklar, M., Stein, T.I., Shmueli, O., Safran, M., et al.
Cho, Y., Fernandes, J., Kim, S.H. and Walbot, V.
Getz, G., Levine, E. and Domany, E.
Halfon, M.S. and Michelson, A.M.
Haverty, P.M., Weng, Z., Best, N.L., Auerbach, K.R., Hsiao, L.L., Jensen, R.V. and Gullans, S.R.
Hsia, C.C. and McGinnis, W.
Hsiao, L.L., Dangond, F., Yoshida, T., Hong, R., Jensen, R.V., Misra, J., Dillon, W., Lee, K.F., Clark, K.E., Haverty, P., et al.
Hubbell, E., Liu, W.M. and Mei, R.
Iacobuzio-Donahue, C.A., Maitra, A., Shen-Ong, G.L., van Heek, T., Ashfaq, R., Meyer, R., Walter, K., Berg, K., Hollingsworth, M.A., Cameron, J.L., et al.
Irizarry, R., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U. and Speed, T.P.
Kannan, K., Kaminski, N., Rechavi, G., Jakob-Hirsch, J., Amariglio, N. and Givol, D.
Khaitovich, P., Weiss, G., Lachmann, M., Hellmann, I., Enard, W., Muetzel, B., Wirkner, U., Ansorge, W. and Pääbo, S.
Lercher, M.J., Urrutia, A.O. and Hurst, L.D.
Liu, G., Loraine, A.E., Shigeta, R., Cline, M., Cheng, J., Valmeekam, V., Sun, S., Kulp, D. and Siani-Rose, M.A.
Mariani, T.J., Budhraja, V., Mecham, B.H., Gu, C.C., Watson, M.A. and Sadovsky, Y.
Rosen, N., Chalifa-Caspi, V., Shmueli, O., Adato, A., Lapidot, M., Stampnitzky, J., Safran, M. and Lancet, D.
Safran, M., Solomon, I., Shmueli, O., Lapidot, M., Shen-Orr, S., Adato, A., Ben-Dor, U., Esterman, N., Rosen, N., Peter, I., et al.
Saito-Hisaminato, A., Katagiri, T., Kakiuchi, S., Nakamura, T., Tsunoda, T. and Nakamura, Y.
Shklar, M., et al.
Shmueli, O., Horn-Saban, S., Chalifa-Caspi, V., Shmoish, M., Ophir, R., Benjamin-Rodrig, R., Safran, M., Domany, E. and Lancet, D.
Slonim, D.K.
St Johnston, D.
Su, A.I., Cooke, M.P., Ching, K.A., Hakak, Y., Walker, J.R., Wiltshire, T., Orth, A.P., Vega, R.G., Sapinoso, L.M., Moqrich, A., et al.
Su, A.I., Wiltshire, T., Batalov, S., Lapp, H., Ching, K.A., Block, D., Zhang, J., Soden, R., Hayakawa, M., Kreiman, G., et al.
Warrington, J.A., Nair, A., Mahadevappa, M. and Tsyganskaya, M.
Wheeler, D.L., Church, D.M., Federhen, S., Lash, A.E., Madden, T.L., Pontius, J.U., Schuler, G.D., Schriml, L.M., Sequeira, E., Tatusova, T.A., et al.