Motivation: Genes are often characterized dichotomously as either housekeeping or single-tissue specific. We conjectured that crucial functional information resides in genes with midrange profiles of expression.

Results: To obtain such novel information genome-wide, we have determined the mRNA expression levels for one of the largest hitherto analyzed set of 62 839 probesets in 12 representative normal human tissues. Indeed, when using a newly defined graded tissue specificity index τ, valued between 0 for housekeeping genes and 1 for tissue-specific genes, genes with midrange profiles having 0.15 < τ < 0.85 were found to constitute >50% of all expression patterns. We developed a binary classification, indicating for every gene theI_B tissues in which it is overly expressed, and the 12 −I_B tissues in which it shows low expression. The 85 dominant midrange patterns withI_B = 2–11 were found to be bimodally distributed, and to contribute most significantly to the definition of tissue specification dendrograms. Our analyses provide a novel route to infer expression profiles for presumed ancestral nodes in the tissue dendrogram. Such definition has uncovered an unsuspected correlation, wherebyde novo enhancement and diminution of gene expression go hand in hand. These findings highlight the importance of gene suppression events, with implications to the course of tissue specification in ontogeny and phylogeny.

Availability: All data and analyses are publically available at the GeneNote website,http://genecards.weizmann.ac.il/genenote/ and, GEO accession GSE803.

Contact: [email protected]

Supplementary information: Four tables available at the above site.

INTRODUCTION

The ontogeny of complex multicellular organisms is enabled by the differential expression of genes across various cell types. Expression profiling with DNA arrays offers the opportunity to systematically identify such patterns (Halfon and Michelson, 2002;Slonim, 2002). Housekeeping genes are expressed in all cell types, whereas other genes are expressed in a more restricted selection of tissues. In the previous research on the tissue specificity of genes, emphasis has mainly been on the extremes of one-tissue specific (Hsiaoet al., 2001,http://www.humangenes.org;Suet al., 2002) and housekeeping genes (Eisenberg and Levanon, 2003;Lercheret al., 2002;Warringtonet al., 2000). However, many genes may show midrange patterns of expression, that is, expressed at a high level in a subset of the tissues, and at a much lower level or not at all in other tissues. This term is related to the cross-tissue ‘breadth’ of gene expression, rather than high or low overall expression intensities. Here, we investigate the occurrence and potential significance of midrange patterns of expression, noting that important information about a given tissue may be harbored not only in tissue-specific enhancement of expression, but also in tissue-specific suppression.

Some recent high-throughput DNA arrays studies of gene expression have been aimed at characterizing healthy tissue transcription patterns. One of these examined the transcription profiles in 28 normal human tissues and 45 mouse tissues, utilizing 12 000 oligonucleotide probesets (Suet al., 2002). cDNA arrays have also been used to examine expression of genes across normal human tissues (Saito-Hisaminatoet al., 2002). These, as well as other surveys on normal tissues (Havertyet al., 2002;Hsiaoet al., 2001) were limited only to the more well-characterized genes, and did not afford a total genome-wide view. Studies on a more complete gene set focused on a comparison between diseased and non-diseased states (Bakayet al., 2002;Iacobuzio-Donahueet al., 2002;Marianiet al., 2002). In a recent report (Shmueliet al., 2003), as well as in the current work, we queried 12 normal human tissues with a complete gamut of 62 839 probesets, representing 23 271 identifiable human genes. This is one of the largest sets employed to date, and includes nearly 12 000 genes whose tissue expression has not been examined by the earlier studies. Most recently,Suet al. (2004) have extended their expression atlas to encompass 79 human and 61 mouse tissues.

The resulting genome-wide view of gene expression patterns is used here to reveal relationships among healthy human tissues, as well as to generate new genome annotation tools. Specifically, our data shed new light on genes with midrange profiles of expression, with implications to the fine balance of gene expression and suppression that underlie tissue specification.

SYSTEMS AND METHODS

Expression data preprocessing

The expression intensity of mRNA was assayed across five microarrays (Affymetrix GeneChips U95A–E), containing a total of 62 839 probesets, each in duplicate. Poly(A)⁺RNA samples from the human tissues were purchased from Clontech (Palo Alto, CA, details in Table S1 in the Supplementary Material). This collection of major human tissues, includes bone marrow, brain, heart, kidney, liver, lung, pancreas, prostate, skeletal muscle, spinal cord, spleen and thymus. These RNA samples have relatively coarsely defined tissue delineations but are compatible in this respect to those used in other studies of transcription patterns in a group of normal human tissues (Suet al., 2002,2004;Saito-Hisaminatoet al., 2002). Each RNA sample was typically composed of a pool of 10–25 individuals. While such commercial pooled samples from anonymous donors are demographically ill-defined, they are advantageous in enabling others to reproduce the experiments.

Replicate experiments were done independently, mostly from RNA of identical lot numbers. Exceptions are kidney, pancreas and prostate. Aliquots of each sample (12 μg cRNA in 200 μl hybridization mix) were hybridized to a GeneChip Human Genome U95A–E array set (Affymetrix, Santa Clara, CA). Preparation and hybridization of cRNA were done according to the manufacturer's instructions (Affymetrix, 2001,http://.com/support/technical/manuals.affx).

The expression value for each gene was determined using the MicrroArray Suite version 5.0 (MAS 5.0) software (Hubbellet al., 2002;Liuet al., 2003) with default parameters, without using the MAS 5.0 scaling and normalizing procedures. The quantilization procedure used here (see below) encapsulates some features of a preprocessing method, RMA normalization (Irizarryet al., 2003). Affymetrix MAS 5.0 signal values were normalized by taking the log₁₀ of all values (substituting −1 for zero intensities) and then subtracting the mean for the particular array and adding the total experimental mean (Shmueliet al., 2003). Finally, intensities less than log₁₀ 30 were set to log₁₀ 30 to eliminate the perturbation by the noise present in the low intensities. Variations in this threshold resulted in no significant changes. The MAS 5.0 intensities, ranging on a decimal logarithmic scale from log₁₀ 30 to roughly 4, were converted into a quantile scale. The expression data, averaged over the two replicates, were divided into 11 bins, whereby 10 equal density quantiles spanned the values above log₁₀ 30, and an eleventh ‘zero bin’ included the remaining low-intensity values. Henceforth, the quantiled profiles were used in the analysis.

Statistical analysis of differential expression

Single-classification ANOVA with equal sample sizes was employed on the preprocessed 24 element expression vector composed of 12 tissues in two replicates. For each tissue profile, the sum of the squares of the differences between the replicates was compared with the sum of the squares of the differences between the averages of the tissue expressions. To account for the multiple comparison problem inherent in calculating theP-values for all 62 839 probesets, we calculated the false discovery rate (Benjamini and Hochberg, 1995). We chose a 1% error rate, which gave aP-value cutoff of 0.0036. This resulted in 22 936 profiles that were defined as ‘differentially expressed’. The remaining profiles were further divided into not expressed profiles, defined as having all 12 values in the zero bin, and housekeeping profiles, whose expression is non-zero in all tissues and all intensities are of a similar value (SD smaller than 1 quantile unit). The remaining profiles were defined as uncategorized. The algorithms described below were deployed on the 22 936 differentially expressed profiles.

Probesets to genes analysis

The association of probesets to genes was performed using the GeneAnnot algorithm (Chalifa-Caspiet al., 2003,2004). GeneAnnot comprehensively identifies relationships between oligonucleotide array probesets and annotated genes in GeneCards (Safranet al., 2002) by performing pairwise alignments between the probe sequences and gene transcripts, and assigning sensitivity and specificity scores to them. A further step of probeset annotation, conducted by GeneTide (Shklaret al., 2004), was to assign annotation based upon the transcript from which the probeset was derived. This was carried out by an integration of transcript annotation data from several resources such as UniGene (Wheeleret al., 2003) and AceView (http://www.humangenes.org). Furthermore, these target sequences were aligned against the human genome using BLAT (Kent, 2002), and assigned a gene according to their genomic location using GeneLoc (Rosenet al., 2003).

ALGORITHMS

Tissue specificity index

The index τ is defined as:

\[\tau =\frac{{\sum }_{i=1}^{N}(1-{x}_{i})}{N-1},\]

whereN is the number of tissues andx_i is the expression profile component normalized by the maximal component value. For example, expression profile ‘0 8 0 0 0 2 0 2 0 0 0 0’ is defined to have τ = 0.95. Other definitions, for example, based on entropy or geometric considerations, were pursued but found less robust in terms of sensitivity to extreme profile component values.

Binary patterns

We first defined the ‘gap’ index for each expression profile as the maximum difference between the two neighboring values in the sorted quantile vector. When the same ‘gap’ was found more than once in a profile, the first gap, between the smaller neighboring values with that gap was taken. The ‘gap’ was used to convert expression profiles into binary form. For those 8224 differentially expressed profiles with a ‘gap’ of at least 3, expression above the ‘gap’ was interpreted as overexpressed (1) and the rest as underexpressed (0). This set of 8224 probeset profiles form our ‘mingap’ set. The remaining 14 712 differentially expressed profiles were classified to the best matching binary patterns detected by ‘gap’ as follows. The Euclidean distance was calculated between each of the 14 712 profiles and the mean expression profile of each of the binary patterns. The pattern to which this distance was smallest was selected as the matching binary pattern for the profile. The binary index,I_B, corresponding to each binary pattern is defined as the number of 1s in the pattern.

Unsupervised clustering

The Superparamagnetic clustering (SPC) algorithm (Blattet al., 1996) was applied to the same set of profiles used in the binary pattern analysis. Before clustering, each profile was centered and normalized such that its mean was centered to zero, and its norm became one [as described byKannanet al. (2001)]. The SPC parameters used are detailed in Table S2 (Supplementary Material).

Ancestral tissue reconstructions

Given two binary tissue expression profiles, an ancestor profile was inferred by first assuming that instances of agreement (both 1s or both 0s) are unaltered in the ancestor. In the disagreement cases (1 and 0, or 0 and 1), maximum parsimony is applied, with a majority call of expression in the remaining tissues. Our method for inferring the ancestors of each node in a dendrogram including the deep internal nodes involved following the linkages of the hierarchically clustered tree and successively inferring each node.

Availability

All analyses were implemented in Matlab (www.mathworks.com). Scripts and intermediate data are all available upon request.

IMPLEMENTATION

Expression profile categorization

Expression profiles were generated for a set of 12 representative normal human tissues (Fig. 1). This was done with a total of 62 839 oligonucleotide probesets, of which nearly 75% corresponded to annotated human genes, encompassing 23 271 GeneCards entries (Safranet al., 2002), and the rest could not be associated with currently known gene-related sequences (Table 1). The 50 214 probesets included in the four less commonly used arrays U95B–E provided novel expression information on 11 418 GeneCards genes. This genome-wide view of human tissue expression patterns is available in the GeneNote database (Shmueliet al., 2003,http://genecards.weizmann.ac.il/genenote/). The expression profiles were classified into four categories: differentially expressed, housekeeping, not expressed and uncategorized (Fig. 1 andTable 1). It is seen that a majority (∼90%) of the probesets in the first two categories are related to known genes, while most of the unannotated probesets are included in the last two categories, as expected.

Distribution of tissue specificities

To examine the complete expression pattern diversity, we developed a tissue specificity index, τ, a quantitative, graded scalar measure of the specificity of an expression profile. τ values interpolate the entire range between 0 for housekeeping genes and 1 for strictly one-tissue-specific genes. It is seen (Fig. 2A) that τ values near 0 and 1 tend to be more probable than the intermediate values, generating a U-shaped distribution. However, as many as 57% of all profiles have intermediate specificities: 0.15 ≤ τ ≤ 0.85, constituting the largest group, greater than the housekeeping and one-tissue-specific sets combined.

To evaluate the robustness of the shape of the τ distribution to additional tissues and organisms, we calculated the same distributions for a previously published set (Suet al., 2002) where 27 human and 45 mouse normal tissues with replicates were analyzed for one-fifth of the gene representations examined here. We found that the shape of the τ distributions was largely similar in all three datasets. Indeed, nearly identical percentages of profiles with intermediate specificity (0.15 ≤ τ ≤ 0.85) are detected: 56% for mouse and 57% for human.

Do our tissue-specificity (τ) estimates from 12 tissues scale up when a more comprehensive number of tissues are examined? A recently published study (Suet al., 2004) provides human gene expression profiles across 74 non-cancerous human tissues. We found a high correlation (R = 0.85) between the τ indices of genes across the two datasets for differentially expressed genes (Fig. 3). Two clusters of τ values differ markedly: low τ in GeneNote, high τ in the new study and vice versa. The former correspond to genes specific to tissues not present in GeneNote, and the latter to spleen, not present in the more recent study. Congruence between the tissue specificity values based upon 12 and 74 tissues demonstrates the power of our newly defined tissue specificity index, and shows that our choice of tissues is fairly representative of the complete tissue-set transcriptome. An analysis of the distribution of τ values for the new dataset (Fig. 2A, dotted line) shows a relatively high preponderance (>60%) of intermediate τ values, likely stemming from the use of subtissues such as different brain regions.

Binary expression patterns

The one-dimensional tissue specificity index is limited in its capacity to identify and categorize specific classes of expression patterns. To overcome this, we developed a procedure that converts an arbitrary expression profile into a binary pattern. The quantiled expression profiles are mapped from a very large set of the cardinality 11¹² (more than 3.1 billion) to a reduced set of 2¹² = 4096 possible patterns. This analysis was initially performed on a subset of 8224 probeset expression profiles that fulfilled a specific intensity gap criterion (the mingap set, see the Algorithms section).

Of the possible 4094 binary patterns (excluding the all-0 and all-1 patterns), 859 were actually observed in this set. The probesets of the first microarray (U95A) detected only 498 of these patterns, while the remaining 42% of the patterns were found only on the four additional arrays (U95B–E). Further, the four additional arrays strengthen 127 patterns that were populated by only one profile in the first array. Subsequently, the differentially expressed profiles not included in the mingap set were binarized by matching each one to its closest binary counterpart.

The results of the binarization are shown inFigure 4. The different panels 4.i (i = 1–12) have profiles (parsed from among the 8224 gene representations of the differentially expressed and 4216 housekeeping genes) with high expression ini tissues and underexpression in 12 minusi tissues. Panel 4.12 contains the strictly defined housekeeping genes. In panel 4.1 (single-tissue specificity), brain, bone marrow, pancreas, skeletal muscle and liver are more highly represented, while spinal cord, kidney, heart and spleen have relatively few profiles. In panel 4.2, prevalent two-tissue specific patterns are brain and spinal cord, heart and skeletal muscle, bone marrow and thymus, and kidney and liver. Bone marrow, spleen and thymus tissues define a major three-tissue pattern in panel 4.3. Panels 4.9–4.11 depict profiles with expression in all but 3, 2 or 1 tissue(s), respectively. Notably, the same five tissues with the most single-tissue specific profiles (brain, bone marrow, pancreas, muscle and liver) also have the greatest number of single-tissue suppressed profiles.

Of the individual profiles appearing inFigure 4, 5220 are not well annotated to any known gene and should therefore be considered interesting. For the implied novel genes, function can be preliminarily ascertained based upon their expression profile. Table S3 (Supplementary Materials) shows the expression profile along with the identifier of the sequence from which the probeset was derived.

We subsequently defined the 99 most populated (>25 probeset profiles) binary patterns among the 22 936 differentially expressed genes including the housekeeping (all 1s) and null (all 0s) patterns (Fig. 2B). The number of populated binary patterns in each binary index,I_B, shows a clear bimodal distribution (Fig. 2C), with peaks at binary index values of 2 and 10. This behavior reflects the same bimodal trend seen for τ values inFigure 2A. Whereas all 12 one-tissue-specific patterns are included, only about one-third of the two-tissue expressed patterns (I_B = 2) and about one-quarter of the two-tissue repressed patterns (I_B = 10) are included in this set, suggesting biases toward specific oligo-tissue combinations. The peak at highI_B values inFigure 2C corresponds to profiles with low expression in 1–3 tissues and high expression in the others. We use the term suppression to describe instances where genes are expressed at lower levels in a few tissues. This does not necessarily imply an active process where the expression of a gene is specifically turned off. It could equally be due to a loss of activation in expression or a dilution of mRNA levels in one tissue relative to another, due to a different cellular composition.

To test the validity of the supervised binary clustering, we also applied unsupervised SPC (Blattet al., 1996;Getzet al., 2000) to our data (Fig. 5A). SPC is suitable for the clustering of gene expression profiles due to its stability against noise and the inherent measure of cluster stability (Getzet al., 2000). The identified 70 SPC clusters showed a strong correlation with the 97 binary clusters (Fig. 5B). Some binary patterns are represented by multiple SPC clusters, thus serving to further refine the relevant binary patterns (Fig. 5C). The high level of overall agreement between the two clustering methods lends additional credence to the binary classification proposed here.

Tissue relationships based upon the expression repertoire

Inter-tissue distances were calculated between pairs of tissue vectors, each containing the 22 936 expression values of the set of differentially expressed profiles. The resulting tissue dendrogram (Fig. 6A) shows a specific set of groupings relating to different degrees of inter-tissue similarities. The dendrogram reveals a set of tissue relationships that is consistent with previous knowledge (Hsiaoet al., 2001). The immunological tissues, bone marrow, thymus and spleen, along with the lung, cluster together. Pairs of related tissues coupled in the dendrogram are: heart and skeletal muscle, kidney and liver, brain and spinal cord, and prostate and pancreas.

To isolate those profiles that specify the underlying relationships among the tissues, we split the differentially expressed profiles into two groups: those withI_B = 1 and those withI_B = 2–11. We found that the tree based upon the second group, with midrange profiles (Fig. 6B) recovers the most important features of the dendrogram based on the entire set: a united nervous system, muscle tissues juxtaposed and immune system mutually coherent. In contrast, the dendrogram based upon theI_B = 1 group (Fig. 6C) is very different and appears much more removed from known tissue relationships. For example, the spinal cord is closest to heart and very distant from brain. One could argue that the visible non-zero off-diagonal values in theI_B = 1 patterns (panel 4.1 ofFig. 4) would contribute sufficient information, so as to generate a more biologically realistic tissue dendrogram. But this is not the case.

Inferring ancestral tissue profiles

The availability of genome-wide expression profiles for each of the tissues provides a unique opportunity to obtain additional information regarding mutual relationships among different tissues. Specifically, it is possible to derive from the tissue dendrogram an inferred gene expression profile for each of the ‘ancestral’ tissues represented by the internal nodes (Fig. 7A). As an example, it is seen (Fig. 7B) that in most cases of expression in brain but not in spinal cord, underexpression is inferred for the ancestral tissue, reflectingde novo specificities for brain. On the other hand, most of the high expressions found in spinal cord but not in brain are also positive in the inferred ancestral tissue, suggesting that the difference corresponds to brain-specific suppressions. More generally, we found that for most ancestral tissues, including all but one of the most closely related doublets, the tissue with more genes showing novel expression also exhibits more genes with novel suppression (Fig. 7C). This phenomenon is also gleaned by visual inspection of panels 1 and 11 ofFigure 4, as described above.

DISCUSSION

This paper proposes a set of novel genome-wide-specific annotation tools. First, each of the 23 271 genes targeted by 46 185 probesets (Table 1) has one or more tissue expression profiles, documented in GeneNote. A set of tools has been developed to allow one to generate a consensus expression pattern for each of these genes, with the exclusion of outliers. Second, every gene is marked with a specific value of τ, identifying it as belonging to a particular range on a graded tissue specificity measure between extreme tissue specificity and a complete absence of such specificity. Third, a gene with a differentially expressed profile is related to a binary pattern, indicating the combination of tissues in which it is more highly expressed and suppressed. We believe that these binary patterns are more amenable to intuitive scientific interpretation than classification based on standard clustering algorithms. It is reassuring, though, that a high degree of correlation is demonstrated between the two systems. All the above information provides tools for assigning potential function to novel and hitherto un-annotated genes. As the annotation tools presented here are easily generalized, we believe they can be fruitfully applied to a wide spectrum of datasets, for example, to sets with tumor and non-tumor samples.

The binary pattern analysis is particularly useful in revealing expression profiles that constitute unusual tissue combinations. For example, the pattern number 36 inFigure 2B denotes high expression in bone marrow, pancreas and liver; pattern number 47 denotes high expression in heart, prostate and spinal cord. In general, among the 98 binary patterns ofFigure 2B that show expression in at least one tissue, 1 pattern corresponds to the housekeeping expression profile and another 12 denote single-tissue-specific profiles. The remaining 85 patterns are defined here as denoting midrange profiles of expression. Of these, a maximum of 33 patterns may be considered as consistent with tissue clustering as defined by the dendrogram ofFigure 6A, as they correspond to the groups of tissues defined by the terminal and internal nodes of the dendrogram. Thus, a majority of the dominant binary patterns corresponding to midrange profiles may be viewed as unexpected. Such patterns are difficult to explain in terms of tissue similarities, including the sharing of common cell types among disparate tissues. Alternatively, there may be yet undiscovered underlying transcription control mechanisms that could be discerned by future research. Some such unexpected expression patterns may be a neutral mode of expression (Khaitovichet al., 2004;Yanaiet al., 2004).

The approach explored here focuses on midrange profiles of transcription, with elevated expression/suppression in specific tissue combinations, and intermediate values of the tissue specificity index τ. Our analysis has revealed that midrange profiles constitute a majority of the tissue specificity expression patterns. Despite its ubiquity, this category has received remarkably little attention relative to its housekeeping and tissue-specific counterparts. Of the nearly 100 most populated binary patterns, more than 80% are midrange patterns. A recent expression study in maize has also shown that a relatively small portion of genes tend to be organ specific while the remaining show diverse expression (Choet al., 2002).

Most focused arrays with specific subsets of genes used by various authors contain mostly tissue-specific genes whose level is elevated in a single tissue. Such arrays may be considered ‘too focused’. Our results and analyses, suggesting the importance of genes with midrange expression profiles, could have serious impact in terms of array design and experimentation.

A dominant property of midrange profiles is the surprising preponderance of patterns with tissue-specific gene suppression (I_B = 9–11), which are almost as populated as oligo-expression patterns (I_B = 2–4). The most underrepresented set of profiles are the midrange profiles withI_B = 5–7. Our results also indicate that in the evolution of a tissue,de novo expression andde novo suppression go hand in hand.

It thus appears that gene suppression plays a major role in tissue evolution and is tightly coupled with novel expression in the origin of distinct tissues. Such tissue-specific gene suppression may be mediated by specific pathways of transcription control (Hsia and McGinnis, 2003), as well as by other cellular mechanisms, including those mediated by RNA interference (Cerutti, 2003). One practical conclusion related to tissue-specific arrays is that these should preferably contain, in addition to single-tissue-specific genes, also genes that manifest more complex patterns of expression–suppression.

CONCLUSION

Understanding the signaling and control pathways that govern organ development during ontogeny constitutes a fundamental problem of developmental biology (Burgesset al., 2002). Studies in model organisms such asDrosophila have demonstrated the complex interplay of signaling molecules that underlie developmental events associated with the embryonic maturation of tissues and cell types (St Johnston, 2002). The exact spatial and temporal expression of genes, and the interaction of their protein products elicit a developmental code of organ commitment and early patterning. This code likely manifests itself in the pattern of gene expression in each of the tissues. Furthermore, when new tissues are formed in ontogeny or phylogeny, their ancestral precursors should have their own expression patterns, complexly related to those of the more highly differentiated derived tissues. To validate this concept in the future, direct experimental testing of expression patterns at early stages in embryogenesis will be required. Our analysis of ancestral tissue expression, which points to a correlation between novel tissue expression and suppression, and the availability of a tissue dendrogram relating to the full gamut of genes of the human genome, can serve as a valuable tool for such studies.

Fig. 1

Gene expression profiles across 12 normal human tissues. Classification of the 62 839 expression profiles (horizontal lines) into four groups indicated by the right bar: HK, housekeeping; DE, differentially expressed; UC, uncategorized; and NE, not expressed (Table 1). The left bar indicates the origin from Array A (peach) and Arrays B–E (brown). The profiles within each category were sorted in ascending order according to τ, the tissue specificity index. Expression intensities are color coded by quantile values on the bottom bar. Tissue abbreviations: BRN, brain; SPC, spinal cord; BMR, bone marrow; SPL, spleen; TMS, thymus; LNG, lung; PNC, pancreas; PST, prostate; HRT, heart; MSL, skeletal muscle; KDN, kidney; and LVR, liver.

Open in new tab Download slide

Table 1

Open in new tab

Partition of U95A–E probesets into four expression categories (Fig. 1)

	Differentially expressed	Housekeeping	Not expressed	Uncategorized	Total
GeneCards	20 589	3181	9859	12 726	46 185
Not annotated	2347	1035	6502	6600	16 654
Total	22 936	4216	16 361	19 326	62 839

	Differentially expressed	Housekeeping	Not expressed	Uncategorized	Total
GeneCards	20 589	3181	9859	12 726	46 185
Not annotated	2347	1035	6502	6600	16 654
Total	22 936	4216	16 361	19 326	62 839

The probesets to GeneCards associations were done using the GeneAnnot algorithm (Chalifa-Caspiet al., 2004), and annotation from the original transcripts from which the probesets were derived (see Systems and methods section).

Table 1

Open in new tab

Partition of U95A–E probesets into four expression categories (Fig. 1)

	Differentially expressed	Housekeeping	Not expressed	Uncategorized	Total
GeneCards	20 589	3181	9859	12 726	46 185
Not annotated	2347	1035	6502	6600	16 654
Total	22 936	4216	16 361	19 326	62 839

	Differentially expressed	Housekeeping	Not expressed	Uncategorized	Total
GeneCards	20 589	3181	9859	12 726	46 185
Not annotated	2347	1035	6502	6600	16 654
Total	22 936	4216	16 361	19 326	62 839

Fig. 2

Tissue specificity index and expression pattern repertoire.(A) Distribution of τ values for 27 152 profiles that include the 22 936 differentially expressed and 4216 housekeeping profiles (bars). The τ distributions are also shown for the 12 626 profiles (array HE-U133A) across 74 human tissues (dotted line curve) from a recent study (Suet al., 2004), as well as 27 human tissues (light gray curve), and 12 654 across 45 mouse tissues (dark gray curve) from another set (Suet al., 2002).(B) A summary representation of the most populated binary patterns (columns), where black circles indicate high expression. The patterns, enumerated on the abscissa, are sorted according to binary value. Tissue abbreviations as inFigure 1.(C) The frequency distribution ofI_B values of the binary patterns shown in (B). The superimposed curve indicates the expected distribution following a random binomial model.

Open in new tab Download slide

Fig. 3

Comparison of tissue specificity indices (τ) in different data sets. The τ indices of the current work (GeneNote) were compared with those of a recently published set with 74 human tissues (Suet al., 2004). The two sets were compared based upon the probeset mappings (U95 and U133) released by Affymetrix, where if more than one U133 probeset could be matched for a given U95 probeset, only one was taken. This restriction resulted in 13 124 probeset pairs. The expression intensities of the two sets were normalized by quantile normalization, and subsequently the mean of each expression profile for each set was scaled to the total mean of the profile. Replicates were averaged and signal quantilization was carried out as described in the systems and methods section. Shown are the τ pairs for the 9450 differentially expressed genes of our set.

Open in new tab Download slide

Fig. 4

Profile clustering in 12 binary pattern sets. Each panel contains all mingap set expression profiles with binary pattern having theI_B value indicated on top, sorted by the pattern's binary value. Housekeeping profiles are taken asI_B=12. The profiles of each pattern were further sorted according to the mean of expression values corresponding to the non-zero elements in the corresponding binary pattern. The color code for expression intensity is as inFigure 1. Table S4 (Supplementary Material) specifies the accession identifiers for each expression profile shown.

Open in new tab Download slide

$Superparamagnetic clustering (SPC). (A) A summary representation of the results of SPC. Each column (enumerated on the abscissa and sorted by gap-computed binary values) represents an average pattern of all profiles in a cluster. The predefined SPC minimal cluster size was set to 10 profiles. Colour coding is as in Fig. 1, representing average quantiles. (B) Each element in the matrix is a comparison between the profiles of an SPC cluster and those of a binary cluster. Binary clusters were obtained for the 8224 profiles of the mingap set, and only the patterns with at least 10 profiles are shown. The color-coded score (right bar) is calculated as the number of fraction profiles between the two cluster patterns, divided by the size of the smaller of the two clusters. (C) Augmented view of 11 individual SPC clusters. Centered and normalized quantiled expression profiles of the cluster's members are shown. The clusters 2, 23 and 65 [corresponding to their rows in B] manifest expression in both brain and spinal cord; 2 is equally expressed in both tissues, 23 is higher in the brain and 65 is higher in spinal cord. Similar relationships are seen in the cluster triplets 12, 42 and 59, expressed in both heart and skeletal muscle, and 15, 36 and 46, expressed in both liver and kidney. Table S2 (Supplementary Material) specifies the accession identifiers for all profiles clustered by SPC. Colour represents normalized centered expression intensities.$

Fig. 5

Superparamagnetic clustering (SPC).(A) A summary representation of the results of SPC. Each column (enumerated on the abscissa and sorted by gap-computed binary values) represents an average pattern of all profiles in a cluster. The predefined SPC minimal cluster size was set to 10 profiles. Colour coding is as inFig. 1, representing average quantiles.(B) Each element in the matrix is a comparison between the profiles of an SPC cluster and those of a binary cluster. Binary clusters were obtained for the 8224 profiles of the mingap set, and only the patterns with at least 10 profiles are shown. The color-coded score (right bar) is calculated as the number of fraction profiles between the two cluster patterns, divided by the size of the smaller of the two clusters.(C) Augmented view of 11 individual SPC clusters. Centered and normalized quantiled expression profiles of the cluster's members are shown. The clusters 2, 23 and 65 [corresponding to their rows in B] manifest expression in both brain and spinal cord; 2 is equally expressed in both tissues, 23 is higher in the brain and 65 is higher in spinal cord. Similar relationships are seen in the cluster triplets 12, 42 and 59, expressed in both heart and skeletal muscle, and 15, 36 and 46, expressed in both liver and kidney. Table S2 (Supplementary Material) specifies the accession identifiers for all profiles clustered by SPC. Colour represents normalized centered expression intensities.

Open in new tab Download slide

Fig. 6

Midrange specificity profiles and the tissue dendrograms. Tissue dendrograms, based upon a hierarchical clustering using average linkage of the differentially expressed genes.(A) based on all differentially expressed (DE) profiles;(B) based upon the 4921 mingap profiles withI_B = 2–11;(C) based upon 3303 profiles withI_B = 1. A systematic analysis of this trend (data not shown) suggests that the contribution to the dendrogram from profiles with high binary index values (I_B = 8–11) is greater than that from the symmetrically disposed patterns with low binary index (I_B = 2–5).

Open in new tab Download slide

Fig. 7

Ancestral tissue patterns.(A) The tissue dendrogram, based upon the differentially expressed genes, is shown with superimposed tissue vectors and inferred ancestral tissue vectors based upon the binary profiles of the mingap set.(B) Two instances of ancestral tissue (ANC) inferences for brain and spinal cord (left) and for kidney and liver (right).(C) Correlation of novel tissue expression and novel tissue suppression of profiles with reference to their inferred ancestor. For each internal ancestral node, a ratio was calculated between the novel expressions in the two derived tissue vectors with respect to the ancestral tissue vector (abscissa). Similar ratios were also calculated for the novel suppressions (ordinate). Points represent all internal nodes, with tissue pair order selected to have an abscissa value higher than 1. Only two of the internal nodes [labeled in light and dark gray, and similarly marked in (A)] had an ordinate value lower than 1, indicating lack of correlation. The node corresponding to the brain and spinal cord ancestor is marked with an empty circle.

Open in new tab Download slide

I.Y. is a Koshland Scholar, D.L. is the incumbent of the Ralph and Lois Silver Chair in Human Genomics, and E.D. is the incumbent of the Henry J. Leir Professorial Chair. This work was made possible by the generosity of the Abraham and Judith Goldwasser Foundation. It was further supported by the Crown Human Genome Center and by the Koshland Center for Basic Research.

REFERENCES

Affymetrix.

2001

Microarray Suite User Guide, Version 5

Bakay, M., Zhao, P., Chen, J., Hoffman, E.P.

2002

A web-accessible complete transcriptome of normal human and DMD muscle.

Neuromuscul. Disord.

(Suppl. 1),

S125

–S141

Benjamini, Y. and Hochberg, Y.

1995

Controlling the false discovery rate: a practical and powerful approach to multiple testing.

J. R. Stat. Soc. B

289

–300

Blatt, M., Wiseman, S., Domany, E.

1996

Superparamagnetic clustering of data.

Phys. Rev. Lett.

3251

–3254

Burgess, R., Lunyak, V., Rosenfeld, M.

2002

Signaling and transcriptional control of pituitary development.

Curr. Opin. Genet. Dev.

534

–539

Cerutti, H.

2003

RNA interference: traveling in the cell and gaining functions?.

Trends Genet.

–46

Chalifa-Caspi, V., Shmueli, O., Benjamin-Rodrig, H., Rosen, N., Shmoish, M., Yanai, I., Ophir, R., Kats, P., Safran, M., Lancet, D.

2003

GeneAnnot: interfacing GeneCards with high throughput gene expression compendia.

Brief. Bioinformatics

349

–360

Chalifa-Caspi, V., Yanai, I., Ophir, R., Rosen, N., Shmoish, M., Benjamin-Rodrig, H., Shklar, M., Stein, T.I., Shmueli, O., Safran, M., et al.

2004

GeneAnnot: comprehensive two-way linking between oligonucleotide array probesets and GeneCards genes.

Bioinformatics

1457

–1458

Cho, Y., Fernandes, J., Kim, S.H., Walbot, V.

2002

Gene-expression profile comparisons distinguish seven organs of maize.

Genome Biol.

research0045

Eisenberg, E. and Levanon, E.Y.

2003

Human housekeeping genes are compact.

Trends Genet.

362

–365

Getz, G., Levine, E., Domany, E.

2000

Coupled two-way clustering analysis of gene microarray data.

Proc. Natl Acad. Sci. USA

12079

–12084

Halfon, M.S. and Michelson, A.M.

2002

Exploring genetic regulatory networks in metazoan development: methods and models.

Physiol. Genomics

131

–143

Haverty, P.M., Weng, Z., Best, N.L., Auerbach, K.R., Hsiao, L.L., Jensen, R.V., Gullans, S.R.

2002

HugeIndex: a database with visualization tools for high-density oligonucleotide array data from normal human tissues.

Nucleic Acids Res.

214

–217

Hsia, C.C. and McGinnis, W.

2003

Evolution of transcription factor function.

Curr. Opin. Genet. Dev.

199

–206

Hsiao, L.L., Dangond, F., Yoshida, T., Hong, R., Jensen, R.V., Misra, J., Dillon, W., Lee, K.F., Clark, K.E., Haverty, P., et al.

2001

A compendium of gene expression in normal human tissues.

Physiol. Genomics

–104

Hubbell, E., Liu, W.M., Mei, R.

2002

Robust estimators for expression analysis.

Bioinformatics

1585

–1592

Iacobuzio-Donahue, C.A., Maitra, A., Shen-Ong, G.L., van Heek, T., Ashfaq, R., Meyer, R., Walter, K., Berg, K., Hollingsworth, M.A., Cameron, J.L., et al.

2002

Discovery of novel tumor markers of pancreatic cancer using global gene expression technology.

Am. J. Pathol.

160

1239

–1249

Irizarry, R., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U., Speed, T.P.

2003

Exploration, normalization, and summaries of high density oligonucleotide array probe level data.

Biostatistics

249

–264

Kannan, K., Kaminski, N., Rechavi, G., Jakob-Hirsch, J., Amariglio, N., Givol, D.

2001

DNA microarray analysis of genes involved in p53 mediated apoptosis: activation of Apaf-1.

Oncogene

3449

–3455

Kent, W.J.

2002

BLAT—the BLAST-like alignment tool.

Genome Res.

656

–664

Khaitovich, P., Weiss, G., Lachmann, M., Hellmann, I., Enard, W., Muetzel, B., Wirkner, U., Ansorge, W., Paabo, S.

2004

A neutral model of transcriptome evolution.

PLOS Biol.

E132

Lercher, M.J., Urrutia, A.O., Hurst, L.D.

2002

Clustering of housekeeping genes provides a unified model of gene order in the human genome.

Nat. Genet.

180

–183

Liu, G., Loraine, A.E., Shigeta, R., Cline, M., Cheng, J., Valmeekam, V., Sun, S., Kulp, D., Siani-Rose, M.A.

2003

NetAffx: affymetrix probesets and annotations.

Nucleic Acids Res.

–86

Mariani, T.J., Budhraja, V., Mecham, B.H., Gu, C.C., Watson, M.A., Sadovsky, Y.

2002

A variable fold-change threshold determines significance for expression microarrays.

FASEB J.

321

–323

Rosen, N., Chalifa-Caspi, V., Shmueli, O., Adato, A., Lapidot, M., Stampnitzky, J., Safran, M., Lancet, D.

2003

GeneLoc: exon-based integration of human genome maps.

Bioinformatics

(Suppl. 1),

I222

–I224

Safran, M., Solomon, I., Shmueli, O., Lapidot, M., Shen-Orr, S., Adato, A., Ben-Dor, U., Esterman, N., Rosen, N., Peter, I., et al.

2002

GeneCards (2002): towards a complete, object-oriented, human gene compendium.

Bioinformatics

1542

–1543

Saito-Hisaminato, A., Katagiri, T., Kakiuchi, S., Nakamura, T., Tsunoda, T., Nakamura, Y.

2002

Genome-wide profiling of gene expression in 29 normal human tissues with a cDNA microarray.

DNA Res.

–45

Shklar, M., et al.

2004

GeneTide: Terra Incognita Discovery Endeavor Mining ESTs and Expression Data to Elucidate Known and De-Noro GeneCards Genes.

478

–479 Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference, CSB2004

Shmueli, O., Horn-Saban, S., Chalifa-Caspi, V., Shmoish, M., Ophir, R., Benjamin-Rodrig, R., Safran, M., Domany, E., Lancet, D.

2003

GeneNote: whole genome expression profiles in normal human tissues.

C.R. Biologies

326

1067

–1072

Slonim, D.K.

2002

From patterns to pathways: gene expression data analysis comes of age.

Nat Genet.

(Suppl.),

502

–508

St Johnston, D.

2002

The art and design of genetic screens:Drosophila melanogaster .

Nat. Rev. Genet.

176

–188

Su, A.I., Cooke, M.P., Ching, K.A., Hakak, Y., Walker, J.R., Wiltshire, T., Orth, A.P., Vega, R.G., Sapinoso, L.M., Moqrich, A., et al.

2002

Large-scale analysis of the human and mouse transcriptomes.

Proc. Natl Acad. Sci. USA

4465

–4470

Su, A.I., Wiltshire, T., Batalov, S., Lapp, H., Ching, K.A., Block, D., Zhang, J., Soden, R., Hayakawa, M., Kreiman, G., et al.

2004

A gene atlas of the mouse and human protein-encoding transcriptomes.

Proc. Natl Acad. Sci. USA

101

6062

–6067

Warrington, J.A., Nair, A., Mahadevappa, M., Tsyganskaya, M.

2000

Comparison of human adult and fetal expression and identification of 535 housekeeping/maintenance genes.

Physiol. Genomics

143

–147

Wheeler, D.L., Church, D.M., Federhen, S., Lash, A.E., Madden, T.L., Pontius, J.U., Schuler, G.D., Schriml, L.M., Sequeira, E., Tatusova, T.A., et al.

2003

Database resources of the National Center for Biotechnology.

Nucleic Acids Res.

–33

Yanai, I., Graur, D., Ophir, R.

2004

Incongruent expression profiles between human and mouse orthologous genes suggest widespread neutral evolution of transcription control.

OMICS

–24

Issue Section:

Original Papers>Gene expression

Download all slides

Citations

Views

20,769

Altmetric

More metrics information

Metrics

Total Views20,769

16,549Pageviews

4,220PDF Downloads

Since 11/1/2016

Month:	Total Views:
November 2016	5
December 2016	14
January 2017	39
February 2017	58
March 2017	93
April 2017	43
May 2017	90
June 2017	39
July 2017	52
August 2017	70
September 2017	60
October 2017	65
November 2017	218
December 2017	238
January 2018	279
February 2018	366
March 2018	347
April 2018	354
May 2018	206
June 2018	383
July 2018	342
August 2018	220
September 2018	300
October 2018	363
November 2018	363
December 2018	230
January 2019	194
February 2019	446
March 2019	431
April 2019	486
May 2019	269
June 2019	187
July 2019	176
August 2019	180
September 2019	202
October 2019	189
November 2019	122
December 2019	137
January 2020	109
February 2020	94
March 2020	122
April 2020	146
May 2020	111
June 2020	152
July 2020	127
August 2020	108
September 2020	182
October 2020	168
November 2020	180
December 2020	190
January 2021	164
February 2021	157
March 2021	235
April 2021	243
May 2021	279
June 2021	222
July 2021	167
August 2021	154
September 2021	204
October 2021	190
November 2021	227
December 2021	160
January 2022	174
February 2022	164
March 2022	182
April 2022	212
May 2022	214
June 2022	193
July 2022	210
August 2022	187
September 2022	266
October 2022	227
November 2022	203
December 2022	179
January 2023	198
February 2023	205
March 2023	224
April 2023	214
May 2023	240
June 2023	219
July 2023	160
August 2023	223
September 2023	198
October 2023	192
November 2023	176
December 2023	245
January 2024	271
February 2024	263
March 2024	410
April 2024	298
May 2024	292
June 2024	227
July 2024	204
August 2024	190
September 2024	217
October 2024	243
November 2024	206
December 2024	231
January 2025	246
February 2025	262
March 2025	287
April 2025	70

Citations

814Web of Science

Altmetrics

Email alerts

New journal issues

New journal articles

Citing articles via

Web of Science (814)

Google Scholar

Finding Antibodies in Cryo-EM maps with CrAI

Approximating edit distances between complex tandem repeats efficiently

Sawfish: Improving long-read structural variant discovery and genotyping with local haplotype modeling

Gradient matching accelerates mixed-effects inference for biochemical networks

Predicting Explainable Dementia Types with LLM-aided Feature Engineering

Looking for your next opportunity?

Having trouble contacting the network. Please try again in a moment or two.

Online ISSN 1367-4811

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide

Cookie policy
Privacy policy
Legal notice

Movatterモバイル変換

Article Contents

Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification

Abstract

INTRODUCTION

SYSTEMS AND METHODS

Expression data preprocessing

Statistical analysis of differential expression

Probesets to genes analysis

ALGORITHMS

Tissue specificity index

Binary patterns

Unsupervised clustering

Ancestral tissue reconstructions

Availability

IMPLEMENTATION

Expression profile categorization

Distribution of tissue specificities

Binary expression patterns

Tissue relationships based upon the expression repertoire

Inferring ancestral tissue profiles

DISCUSSION

CONCLUSION

REFERENCES

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

Looking for your next opportunity?

This Feature Is Available To Subscribers Only