Oxford Academic
Bioinformatics
International Society for Computational Biology
Journal Article

Constructing biological networks through combined literature mining and microarray analysis: a LMMA approach

Shao Li*, Lijiang Wu and Zhongqi Zhang
Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing 100084, China
*To whom correspondence should be addressed.

Associate Editor: Alfonso Valencia

Bioinformatics, Volume 22, Issue 17, September 2006, Pages 2143–2150, https://doi.org/10.1093/bioinformatics/btl363
Published: 04 July 2006
Article history: Received 03 February 2006; revision received 16 May 2006; accepted 29 June 2006.

Abstract

Motivation: Reconstructing networks of biological entities is very important for understanding biological processes and the organizational principles of biological systems. This work integrates the literature with microarray gene-expression data: we develop a combined literature mining and microarray analysis (LMMA) approach to construct gene networks of a specific biological system.

Results: In the LMMA approach, a global network is first constructed using the literature-based co-occurrence method and then refined with microarray data through a multivariate selection procedure. We present an application of LMMA to angiogenesis. Our results show that, when evaluated at multiple levels (KEGG Genes, KEGG Orthology and KEGG pathways), the LMMA-based network is more reliable than the co-occurrence-based network.

Availability: The LMMA program is available upon request.

Contact:  [email protected]

Supplementary Information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

Reconstructing networks of biological entities such as genes, transcription factors, proteins, compounds and other regulatory molecules is very important for understanding biological processes and the organizational principles of biological systems (Barabasi and Oltvai, 2004). Rapid progress in the biomedical domain has produced an enormous body of literature, and literature mining (LM) has become a promising direction for knowledge discovery. Various techniques have been developed to reveal putative biological networks hidden in the huge collection of individual publications (Shatkay and Feldman, 2003). Among them, the co-occurrence (co-citation) approach (Stapley and Benoit, 2000; Jenssen et al., 2001) is the simplest and most comprehensive to implement. It can also be easily adapted to find associations between biological entities, such as gene–gene relations (Jenssen et al., 2001) and chemical compound–gene relations (Zhu et al., 2005). In our earlier study (Zhang and Li, 2004), we developed a subject-oriented literature mining technique that extracts subject-specific knowledge by incorporating prior knowledge from biologists. Such an approach can retrieve information contained in the abundant literature regardless of individual experimental conditions. However, the literature-derived network is relatively crude and redundant. Because the literature reservoir is a collection of results from diverse investigations, the co-occurrence approach cannot realistically distinguish among the various types of relations. Moreover, networks constructed from the literature are usually not specific to a particular biological process and unavoidably include overlapping relations, resulting in large, densely connected networks that lack significant biological meaning.

Network reconstruction from high-throughput microarray data has been another active area over the past decade (van Someren et al., 2002; de Jong, 2002). Microarray technology, which records large-scale gene expression profiles, makes it possible to characterize the states of a specific biological system and provides a powerful platform for assessing global gene regulation and gene function. A number of methods are available for reconstructing gene networks from microarray data, such as deterministic Boolean networks (Liang et al., 1998) and ordinary differential equations (Zak et al., 2003). However, it is difficult to build a reliable network from a small number of array samples, owing to the non-uniform distribution of expression levels among thousands of genes. Such techniques are also insufficient for detailed biological investigation when prior knowledge is absent (Le Phillip et al., 2004).

Literature-based and microarray-based approaches share the common goal of identifying hidden networks of biological entities. Integrating experimental data and literature knowledge in an iterative fashion appears to be an effective strategy for biological network modeling (Le Phillip et al., 2004). Various approaches have been developed to identify gene clusters and accompanying literature topics (Küffner et al., 2005) and to model biological processes such as neuro-endocrine-immune interactions (Wu and Li, 2005). In this article, we propose a novel approach that reconstructs gene networks by combining literature mining and microarray analysis (LMMA): a global network is first derived using the literature-based co-occurrence method and then refined using microarray data. We apply LMMA to build an angiogenesis network, and we evaluate the network and its biological meaning at multiple levels: KEGG Genes, KEGG Orthology and KEGG pathways. The results show that the LMMA-based network is more reliable and manageable, and carries more significant biological content, than the LM-based network.

2 METHODS

2.1 Co-occurrence-based PubMed literature mining

The first step of the LMMA approach is to derive a co-occurrence dataset through literature mining. Finding co-citations requires a pool of articles and a dictionary containing gene symbols and their synonyms. In LMMA, the literature is obtained mainly from the National Library of Medicine's PubMed database (Author Webpage). In a general LM approach, subject-specific interactions cannot be highlighted, because all interactions tend to have similar co-citation numbers (Zhang and Li, 2004). We therefore incorporate prior knowledge at two stages when preparing the candidate articles and the dictionary of biological entities. First, using a keyword that refers to the subject of interest, we select only the PubMed articles that contain that term. Next, an authoritative, standard or specific glossary provides the context for building topic-related gene networks. In the present work, we use the HUGO (Human Genome Organisation, Author Webpage) glossary, which contains ∼20 000 non-redundant gene symbols, for literature mining.

Like many existing LM systems, we assume that when two genes are co-cited in the same text unit, there is a potential biological relationship between them (Stapley and Benoit, 2000; Jenssen et al., 2001). The sentence has been found to be an effective text unit, giving a good trade-off between precision and recall (Ding et al., 2002). Accordingly, as in our previous work (Zhang and Li, 2004), we regard two HUGO gene symbols as co-related if they are co-cited in the same sentence. In the HUGO glossary, each gene corresponds to a unique symbol (a one-to-one short mnemonic representation of the gene name) with several aliases. We store every <alias, symbol> pair as a <key, value> entry in a hash table, so that co-occurrences of aliases are mapped to co-occurrences of the corresponding symbols. Matching is case-sensitive, and only complete words are considered. Finally, we count the co-occurrences of all symbol pairs to form an LM-based network for the chosen subject.
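The sentence-level co-citation counting described above can be sketched in Python as follows. This is a minimal illustration, not the authors' program: the tiny `ALIAS_TO_SYMBOL` table is a hypothetical stand-in for the HUGO glossary, and the regular-expression sentence splitter and tokenizer are simplifying assumptions.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical alias -> official symbol table standing in for the HUGO glossary.
# Every official symbol also maps to itself.
ALIAS_TO_SYMBOL = {
    "VEGF": "VEGF", "VEGFA": "VEGF", "VPF": "VEGF",
    "KDR": "KDR", "FLK1": "KDR", "VEGFR2": "KDR",
    "TNF": "TNF", "TNFA": "TNF",
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def symbols_in(sentence):
    """Official symbols mentioned in a sentence (case-sensitive, whole words only)."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    return {ALIAS_TO_SYMBOL[t] for t in tokens if t in ALIAS_TO_SYMBOL}


def count_cooccurrences(abstracts):
    """Count sentence-level co-citations of symbol pairs across all abstracts."""
    counts = Counter()
    for text in abstracts:
        for sentence in split_sentences(text):
            # Each unordered symbol pair is counted at most once per sentence.
            for pair in combinations(sorted(symbols_in(sentence)), 2):
                counts[pair] += 1
    return counts


abstracts = [
    "VPF induces angiogenesis via FLK1. TNF was not measured.",
    "VEGF and KDR were up-regulated together with TNF.",
]
network = count_cooccurrences(abstracts)
```

The keys of the resulting `Counter` are the edges of the LM-based network and the values are co-occurrence numbers; the aliases VPF and FLK1 in the first abstract, for example, are counted as a VEGF/KDR co-citation.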

2.2 Microarray datasets

Microarray datasets related to a biological process are collected from experiments or from public repositories such as SMD (Stanford Microarray Database, Author Webpage), which stores a large volume of raw and normalized data from public microarray studies (Sherlock et al., 2001). The downloaded microarray data are pre-processed following SMD procedures; under Gene Filtering Options, we selected ‘center data for each array by mean’. A K-nearest neighbors (KNN) method (Troyanskaya et al., 2001) is used to estimate missing values in the microarray datasets. Briefly, for a missing value $x_{ij}$ of gene $x_i$ in observation $j$ (i.e. microarray experiment $j$), Pearson correlation analysis is used to find the $K$ other genes whose expression profiles are most similar to that of $x_i$. The missing value is then estimated as the weighted mean of the corresponding values of these $K$ genes.
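The missing-value estimation can be illustrated with the following pure-Python sketch of KNN imputation with Pearson similarity. Details the text leaves open are assumptions here: correlations are computed over the experiments observed in both genes, only positively correlated genes serve as donors, and the weights are the correlation coefficients themselves.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 0.0
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)


def knn_impute(data, k=2):
    """Fill None entries of data[gene][experiment] from the k most correlated genes.

    Assumed (not from the source): correlation over jointly observed experiments,
    positively correlated donors only, weights equal to the correlations.
    """
    filled = [row[:] for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for g, other in enumerate(data):
                if g == i or other[j] is None:
                    continue
                shared = [t for t in range(len(row))
                          if t != j and row[t] is not None and other[t] is not None]
                if len(shared) < 2:
                    continue
                r = pearson([row[t] for t in shared], [other[t] for t in shared])
                if r > 0:
                    candidates.append((r, other[j]))
            top = sorted(candidates, reverse=True)[:k]  # k most similar donors
            if top:
                total = sum(r for r, _ in top)
                filled[i][j] = sum(r * x for r, x in top) / total  # weighted mean
    return filled
```

Entries for which no positively correlated donor exists are left as `None`; a real pipeline would need a fallback such as the row mean.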

2.3 LMMA-based network construction

The LMMA approach employs a statistical multivariate selection module for gene interaction analysis. It is based on the hypothesis that if a co-cited gene pair is positively or negatively co-expressed, the two genes indeed interact (D'haeseleer et al., 2000; Ge et al., 2001). Treating the expression values of $n$ genes as variables $x_1, x_2, \ldots, x_n$, a dataset with $m$ observations (i.e. $m$ microarray experiments) and $n$ variables is denoted by
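$$[x_1, x_2, \ldots, x_n] = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} \tag{1}$$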
Regression approaches are known to be helpful for analyzing microarray data and for handling the complex correlations between the expression levels of different genes across samples (West et al., 2001; Segal et al., 2003). Assuming that the relations among variables in gene expression data follow a linear model (D'haeseleer et al., 1999), an LM-based network can be refined through multiple-variable selection, yielding what we call the LMMA network. We define each node together with its neighboring nodes as a ‘unit-network’. The linear approximation of this sub-model is $x_k = \beta_{k0} + \beta_{k1}x_{k1} + \beta_{k2}x_{k2} + \cdots + \beta_{kl}x_{kl} + e$, where the variables $x_{k1}, x_{k2}, \ldots, x_{kl}$ denote the neighbors of $x_k$ in the LM-based network and $e$ is the random error. Stepwise multiple-variable selection is then used to add new variables and eliminate insignificant ones. The significance of a variable is measured by a P-value determined from an F-test,
$$F_j = \frac{\mathrm{SSR}_j}{\mathrm{SSE}/(m-l-1)} \tag{2}$$
where SSE is the residual sum of squares of the full model, SSR_j is the sum of squares attributable to variable $j$ given the other variables in the full model, $m$ is the number of observations and $l$ is the number of variables (including variable $j$) in the full model.
A P-value cutoff, Thp, determines whether a variable is added or deleted. Starting from a null model, we evaluate one candidate variable at a time for addition. After a new variable is added, previously selected variables whose P-values exceed the threshold are deleted one at a time. The statistical significance of the whole model is also verified with an F-test:
$$F_{\mathrm{Mod}} = \frac{\mathrm{SSR}/l}{\mathrm{SSE}/(m-l-1)} \tag{3}$$
where SSR is the regression sum of squares of all $l$ variables in the full model. The P-value is $P = P(F_{l,\,m-l-1} > F_{\mathrm{Mod}})$, from which the significance of the LMMA unit-network of each node can be evaluated.
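The stepwise procedure of Eqs. (2) and (3) could be sketched as below. This is a pure-Python illustration under assumptions the text does not state: ordinary least squares with an intercept, SSR_j taken as the increase in residual sum of squares when variable j is dropped, the same cutoff Thp for entry and removal, and a `max_steps` guard against add/delete cycles. `f_sf` evaluates the F-distribution upper tail through the standard continued-fraction expansion of the regularized incomplete beta function; with SciPy available, `scipy.stats.f.sf` could be used instead.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def f_sf(f, d1, d2):
    """Upper-tail probability P(F(d1, d2) > f) via the regularized incomplete beta."""
    if f <= 0.0:
        return 1.0
    a, b, x = d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f)
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def sse_of_fit(y, columns):
    """Residual sum of squares of y on an intercept plus columns (Gram-Schmidt)."""
    basis = []
    for v in [[1.0] * len(y)] + [list(col) for col in columns]:
        for q in basis:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-10:  # skip (near-)collinear columns
            basis.append([a / norm for a in v])
    r = list(y)
    for q in basis:
        dot = sum(a * b for a, b in zip(r, q))
        r = [a - dot * b for a, b in zip(r, q)]
    return sum(a * a for a in r)


def partial_p(y, cols, k):
    """P-value of the partial F-test (cf. Eq. 2) for column k in the model y ~ cols."""
    m, l = len(y), len(cols)
    sse_full = sse_of_fit(y, cols)
    if sse_full <= 0.0:
        return 0.0
    ssr_k = sse_of_fit(y, cols[:k] + cols[k + 1:]) - sse_full
    return f_sf(ssr_k / (sse_full / (m - l - 1)), 1, m - l - 1)


def stepwise_select(y, candidates, th_p=0.15, max_steps=50):
    """Forward-backward selection of LM neighbours for one unit-network.

    y: centre-gene profile over m experiments; candidates: neighbour -> profile.
    """
    selected = []
    for _ in range(max_steps):
        changed = False
        best = None  # forward step: smallest P-value below th_p
        for name in candidates:
            trial = selected + [name]
            if name in selected or len(y) - len(trial) - 1 < 1:
                continue
            p = partial_p(y, [candidates[n] for n in trial], len(trial) - 1)
            if p < th_p and (best is None or p < best[0]):
                best = (p, name)
        if best is not None:
            selected.append(best[1])
            changed = True
        if selected:  # backward step: drop the least significant variable
            cols = [candidates[n] for n in selected]
            worst_p, worst_k = max((partial_p(y, cols, k), k) for k in range(len(cols)))
            if worst_p > th_p:
                selected.pop(worst_k)
                changed = True
        if not changed:
            break
    return selected
```

The whole-model test of Eq. (3) follows the same pattern: compute F_Mod from the intercept-only and full-model residual sums of squares and call `f_sf(F_Mod, l, m - l - 1)`.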

An LMMA-based network is constructed by recombining all the refined sub-networks after multivariate selection. A given interaction between $x_i$ and $x_j$ has two regression coefficients, one from regressing $x_i$ on $x_j$ and the other from regressing $x_j$ on $x_i$; LMMA uses the one with the smaller P-value. The directionality of the LMMA network is currently not considered.
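The recombination step can be sketched as a dictionary keyed by undirected gene pairs. The text specifies only that the smaller of the two P-values is used; keeping an edge that was selected in just one of the two regressions is our assumption, and the P-values below are toy values.

```python
def merge_unit_networks(unit_pvalues):
    """Recombine refined unit-networks into one undirected network.

    unit_pvalues maps each centre gene to {selected neighbour: P-value}.
    An edge kept in either regression is retained (an assumption); when both
    regressions keep it, the smaller P-value is stored.
    """
    edges = {}
    for centre, neighbours in unit_pvalues.items():
        for neighbour, p in neighbours.items():
            key = tuple(sorted((centre, neighbour)))  # undirected edge
            edges[key] = min(p, edges.get(key, p))
    return edges


# Toy P-values, for illustration only.
units = {"VEGF": {"KDR": 0.01, "TNF": 0.12}, "KDR": {"VEGF": 0.001}, "TNF": {}}
lmma_edges = merge_unit_networks(units)
```

Sorting each pair makes (VEGF, KDR) and (KDR, VEGF) the same key, which is what discarding directionality amounts to.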

2.4 Network evaluation

First, in a network, a node represents a gene, and a connection between two nodes indicates that the two genes are biologically related. The number of connections of a node is called its degree (Jeong et al., 2000; Han et al., 2004; Song et al., 2005), which indicates how many genes a given gene is related to. We analyze the degree distribution to characterize the topology of the reconstructed network. The connectivity of a network is described by the shortest paths between nodes: each pair of nodes $(x_i, x_j)$ has a shortest path length $l_{i,j}$, and the average path length of the network is defined as
$$\bar{l} = \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j<i} l_{i,j}$$
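Degree and average path length can be computed with breadth-first search, as in this sketch on an adjacency dict of sets (a representation assumed for illustration). Averaging over reachable pairs only keeps the function usable on disconnected networks; on a connected network it equals the formula above, since every unordered pair is counted once from each end.

```python
from collections import deque


def degrees(adj):
    """Degree of every node in an undirected adjacency dict of sets."""
    return {node: len(nbs) for node, nbs in adj.items()}


def average_path_length(adj):
    """Mean shortest-path length over reachable node pairs (unweighted BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0
```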
Second, a permutation test is performed to examine the stability and integrity of the LMMA network. Keeping the total number of connections equal to that of the LMMA network, we randomly eliminate connections from the LM sub-network whose nodes (genes) overlap with the microarray dataset, yielding the so-called LM-random filtering networks. The cluster sizes of the LM-random filtering and LMMA networks are compared with the Kolmogorov–Smirnov test. Here, a cluster is a group of connected genes separated from all other genes, and the cluster size is the number of genes in the cluster. Next, the average path length of the largest cluster in both the LMMA and the LM-random filtering networks is normalized (divided by the number of nodes) and compared with a t-test.
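A sketch of the permutation-test ingredients follows: drawing one LM-random filtering network, measuring cluster sizes with union-find, and computing the two-sample Kolmogorov–Smirnov statistic. Assumptions: isolated genes count as clusters of size 1, and only the KS statistic D is computed; the P-value, like the t-test on normalized path lengths, would normally come from a statistics library (for example `scipy.stats.ks_2samp` and `scipy.stats.ttest_ind`).

```python
import random


def cluster_sizes(nodes, edges):
    """Sizes of connected components (clusters), largest first, via union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    sizes = {}
    for n in nodes:
        root = find(n)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


def random_filtering(edges, n_keep, seed=None):
    """One LM-random filtering network: keep n_keep edges chosen uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(list(edges), n_keep)


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    d = 0.0
    for x in set(a) | set(b):
        fa = sum(1 for v in a if v <= x) / len(a)
        fb = sum(1 for v in b if v <= x) / len(b)
        d = max(d, abs(fa - fb))
    return d
```

Repeating `random_filtering` many times with `n_keep` equal to the number of LMMA connections gives the null distribution of cluster sizes against which the LMMA network is compared.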

Third, we use leave-one-out cross-validation (LOOCV) (Lachenbruch and Mickey, 1968) to evaluate the goodness of fit of both the LM and LMMA networks. In LOOCV, when observation (i.e. microarray experiment) $j$ is omitted for gene $i$ and its neighbors, gene 1(i), gene 2(i), …, gene l(i), a new linear model is constructed from the remaining observations $x_{-j}(i)$ and $x^{1}_{-j}(i), x^{2}_{-j}(i), \ldots, x^{l}_{-j}(i)$. The omitted value $x_j(i)$ is then recovered as $\hat{x}_j(i)$ from the corresponding observations of the neighboring genes, $x^{1}_{j}(i), x^{2}_{j}(i), \ldots, x^{l}_{j}(i)$.

To evaluate the robustness of the network, the mean square error (MSE) for gene $i$ and the error sum of squares (SSE) for the whole network are calculated as follows:
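$$\mathrm{MSE}(i) = \frac{1}{m} \sum_{j=1}^{m} \bigl(x_j(i) - \hat{x}_j(i)\bigr)^2 \tag{4}$$

$$\mathrm{SSE} = \sum_{i=1}^{w} \sum_{j=1}^{m} \bigl(x_j(i) - \hat{x}_j(i)\bigr)^2 \tag{5}$$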
where $m$ is the number of experiments (and hence the number of LOOCV iterations), $x_j(i)$ is the true value, $\hat{x}_j(i)$ is the re-estimated value and $w$ is the number of genes in the network. A lower MSE indicates a better fit.
A standard score of the MSE, SSmse, is expressed as

$$\mathrm{SS}_{\mathrm{mse}} = \frac{\mathrm{SSE}/(mw)}{\mathrm{STD}\{x_j(i)\}} \tag{6}$$
where $\mathrm{STD}\{x_j(i)\}$ is the standard deviation of the gene expression values $x_j(i)$. The standard score expresses the SSE relative to the variation in gene expression; a good model has a small SSmse value.
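The LOOCV errors behind Eqs. (4) and (5) can be sketched as below: for each experiment j, a least-squares model of gene i on its neighbors is refitted without j and used to predict x_j(i). The normal-equation solver and the list-of-lists data layout are assumptions for illustration, and SSmse is omitted because it additionally depends on how STD{x_j(i)} is computed.

```python
def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    b = [sum(x[i] * t for x, t in zip(X, y)) for i in range(p)]
    for c in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv], b[c], b[piv] = A[piv], A[c], b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * ac for a, ac in zip(A[r], A[c])]
            b[r] -= f * b[c]
    beta = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        beta[i] = (b[i] - sum(A[i][k] * beta[k] for k in range(i + 1, p))) / A[i][i]
    return beta


def loocv_errors(target, neighbours):
    """Squared LOOCV errors (x_j(i) - x_hat_j(i))^2 for one unit-network.

    target: expression of gene i over m experiments;
    neighbours: list of expression lists of its l neighbour genes.
    """
    m = len(target)
    errors = []
    for j in range(m):
        keep = [t for t in range(m) if t != j]
        rows = [[nb[t] for nb in neighbours] for t in keep]
        beta = fit_ols(rows, [target[t] for t in keep])
        x_hat = beta[0] + sum(bk * nb[j] for bk, nb in zip(beta[1:], neighbours))
        errors.append((target[j] - x_hat) ** 2)
    return errors


def mse(target, neighbours):
    """Eq. (4): mean squared LOOCV error for one gene."""
    errs = loocv_errors(target, neighbours)
    return sum(errs) / len(errs)


def network_sse(unit_networks):
    """Eq. (5): summed squared LOOCV errors over (target, neighbours) pairs."""
    return sum(sum(loocv_errors(t, nbs)) for t, nbs in unit_networks)
```

A unit-network with many neighbors relative to the number of experiments makes the normal equations ill-conditioned, which is one practical reason to prune neighbors before fitting.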

2.5 Network validation and pathway extraction

Pathway information is essential for successful quantitative modeling of biological systems (Cary et al., 2005). KEGG (Kyoto Encyclopedia of Genes and Genomes, Author Webpage) (Kanehisa and Goto, 2000) is a well-known database providing information on metabolic, regulatory and disease pathways. The relationships recorded in KEGG are organized around the concept of KEGG Orthology (KO, Author Webpage), a classification of orthologous genes that links directly to the pathways defined by KEGG. The KO dataset is a single, complex flat file containing entries for all KO functional terms (the leaf nodes at the fourth level of the KO hierarchy). For more details about KO, see Mao et al. (2005).

To gain further insight into the biological meaning of our networks, we map the LM- and LMMA-based networks to the KEGG pathway database. First, we extract the KO hierarchy and the known associations between genes and their KO functional terms from the KO dataset. Second, we extract all annotated genes from the KEGG Genes (KG) dataset. Both the KO and the KG hierarchical relations serve as benchmarks for validating the interactions in the networks. A true positive (TP) is an entry that is identified in our network and also occurs in the dataset; a false positive (FP) is identified in our network but does not occur in the dataset; a true negative (TN) is neither identified in our network nor present in the dataset; and a false negative (FN) is not identified in our network but occurs in the dataset. Only KO/KG connections are counted as entries in these definitions. The precision $p$ and recall $r$ of a network are $p = \mathrm{TP}/(\mathrm{TP}+\mathrm{FP})$ and $r = \mathrm{TP}/(\mathrm{TP}+\mathrm{FN})$. To assess the effect of LMMA on the precision of the predicted relations, Fisher's exact test (online software, Author Webpage) is used to calculate the exact P-value for the difference in the proportions of TP (FP) between the LMMA and LM networks.
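The precision/recall definitions and the Fisher exact comparison can be sketched as follows, assuming edges are stored as sorted gene-pair tuples so that the network and the KEGG reference share one representation. The two-sided P-value sums all tables that are at most as probable as the observed one (the common convention, e.g. in R's `fisher.test`); the text does not state which sidedness or tool settings produced Table 3.

```python
from math import comb


def precision_recall(predicted, reference):
    """Edge-level precision and recall against a reference edge set (e.g. KEGG links)."""
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if reference else 0.0
    return precision, recall


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so ties with the observed table are included.
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))
```

For instance, the KG counts at Thp = 0.150 in Table 3 would be passed as `fisher_exact_two_sided(98, 349, 237, 1048)` (LMMA-EC TP/FP versus LM TP/FP); we have not checked that this reproduces the reported P-value.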

Moreover, we group the nodes and connections of the LMMA network according to KEGG pathway definitions, using the Fisher's exact test for KEGG pathway identification described in DAVID (the Database for Annotation, Visualization and Integrated Discovery, Author Webpage) (Dennis et al., 2003). We extract KEGG pathways for the LMMA network by statistically evaluating gene enrichment in the network relative to random chance: Fisher's exact test determines whether the proportion of LMMA-network genes in a KEGG pathway is significantly higher than that of the human genomic background genes.
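Pathway enrichment as described above amounts to a one-sided Fisher exact (hypergeometric upper-tail) test. A minimal sketch with argument names of our choosing; DAVID's own implementation may differ in details such as the choice of background set or score adjustments.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided Fisher exact (hypergeometric upper-tail) P-value.

    k: network genes in the pathway, n: network genes,
    K: background genes in the pathway, N: background genes.
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer sum, single final division
```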

3 CONSTRUCTING ANGIOGENESIS NETWORK: AN APPLICATION

Angiogenesis, the process of generating new capillary blood vessels, is a key issue in various disorders, especially solid tumors and vascular and rheumatoid diseases (Folkman, 1995). Few other processes have as significant an impact on so many people worldwide. The underlying biological rules of angiogenesis remain unclear, so it is critical to understand its molecular basis and biological pathways (Carmeliet, 2003).

3.1 Reconstruction of LM- and LMMA-based angiogenesis networks

We reconstructed angiogenesis-oriented networks using both the LM and LMMA approaches. First, we collected all angiogenesis-related PubMed abstracts (up to 24 July 2005) using ‘angiogenesis’ as the keyword; a total of 23 497 abstracts were indexed automatically. By matching the HUGO glossary against this abstract pool, we obtained 1929 angiogenesis-related genes. A total of 9514 co-citations among these genes were extracted to construct the co-occurrence-based angiogenesis network. We construct the LM-based network with a co-occurrence threshold of at least 1, which yields the network with the maximum number of gene interactions.

Next, we selected the gene expression profiles of endothelial cells (EC) and solid tumors (ST) from SMD, because ECs are believed to be responsible for generating blood vessels and solid tumors account for the majority of angiogenesis-dependent diseases (Carmeliet, 2003; Folkman, 1995). The EC microarray dataset contains 44 639 genes and 53 experiments, while the ST microarray dataset contains 39 726 genes and 119 experiments. The largest connected gene network in LM whose genes are identified in the EC microarray dataset is called the LM-EC network (1257 genes and 6761 connections). Similarly, the largest connected gene network in LM whose genes are identified in the ST microarray dataset is called the LM-ST network (1258 genes and 6884 connections). Accordingly, two LMMA-based angiogenesis networks, LMMA-EC and LMMA-ST, are built (Table 1). Using the common genes as the baseline, we compare the LM-EC and LM-ST networks with the corresponding LMMA-EC and LMMA-ST networks.

Table 1. LM-based and LMMA-based angiogenesis network structures (Thp = 0.150)

| | LM-EC | LMMA-EC | LM-ST | LMMA-ST |
|---|---|---|---|---|
| Common nodesᵃ | 1257 | 1031 | 1258 | 1162 |
| Connectionsᵃ | 6761 | 2848 | 6884 | 3935 |
| Average path lengthᵃ | 2.9810 | 3.6101 | 2.9741 | 3.3487 |
| Average degreeᵇ | 5.3738 | 2.2777 | 5.4722 | 3.1375 |
| SSEᶜ | 522.3206 | 380.1941 | 520.2295 | 479.0745 |
| SSmseᶜ | 0.0669 | 0.057 | 0.0614 | 0.0589 |
| Microarray size (genes × experiments) | 1257 × 53 | 1257 × 53 | 1258 × 119 | 1258 × 119 |

ᵃ In the largest connected sub-network.
ᵇ In the whole network.
ᶜ All nodes except for the isolated ones.


Table 1 lists the network parameters of the LM- and LMMA-based angiogenesis networks. It shows that redundant connections are eliminated by multivariate selection: the LMMA-EC and LMMA-ST networks have far fewer connections than the largest connected sub-networks of LM-EC and LM-ST, respectively. This elimination of connections dramatically decreases the average degree of genes, slightly reduces the number of nodes and increases the average path length. Moreover, as shown in Figure 1a and b, compared with the LM-random filtering networks from the permutation test, the LMMA network has significantly larger cluster sizes (P < 0.0001, Kolmogorov–Smirnov test) and a smaller normalized path length of the largest cluster (P < 0.001, t-test). These results indicate that the LMMA network is more stable and integrated than the LM-random filtering networks. Similar results are observed for the LMMA-ST network (Supplementary Fig. S1). Thus, LMMA appears to maintain the backbone of the LM-based angiogenesis network.

Fig. 1. (a) Comparison of the cluster sizes between the LMMA-EC network and the LM-EC random filtering networks (P < 0.0001, Kolmogorov–Smirnov test). All other clusters have <10 nodes (data not shown). (b) Comparison of the normalized average path length of the largest cluster between the LMMA-EC and the LM-EC random filtering networks (P < 0.001, t-test). (c) Relationship between the number of nodes and node degree in the whole LM angiogenesis network, the LMMA-EC network and the LMMA-ST network (Thp = 0.150). The degree distributions of all three networks follow a power law, indicating scale-free topology.

Figure 1c shows the relationship between the number of nodes and node degree in both the LM- and LMMA-based angiogenesis networks. The profiles follow a power-law distribution, indicating that both networks have scale-free topology (Jeong et al., 2000; Song et al., 2005). Recent studies (Han et al., 2004; Ozier et al., 2003) show that centrally located, highly connected hub nodes dominate the operation of scale-free networks.
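One rough way to check a power-law claim like this is a least-squares fit of log N(k) against log k, where N(k) is the number of nodes with degree k. This sketch is illustrative rather than the authors' procedure; log-log regression is a crude estimator, and maximum-likelihood fitting is generally more reliable.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Map degree k -> number of nodes with that degree (isolated nodes excluded)."""
    return Counter(d for d in degrees if d > 0)


def powerlaw_exponent(degrees):
    """Least-squares slope of log N(k) versus log k; returns gamma in N(k) ~ k^-gamma."""
    dist = degree_distribution(degrees)
    xs = [math.log(k) for k in dist]
    ys = [math.log(c) for c in dist.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope
```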

3.2 Comparison of LM- and LMMA-based angiogenesis networks

The top 15 hub genes in both the LM-based and LMMA-based angiogenesis networks are listed in Table 2. Vascular endothelial growth factor (VEGF) is identified in both the LM and LMMA networks as the hub gene with the highest degree. VEGF is known to be a multi-functional cytokine that plays an important role in vasculogenesis (Mukhopadhyay and Datta, 2004), and the activation of endothelial cells by VEGF sets in motion a series of steps toward the creation of new blood vessels (Folkman, 1995).

Table 2. The top 15 hub genes identified in LM-based and LMMA-based angiogenesis networks (Thp = 0.150)

| Gene | Degree (LM-EC; LM-ST)ᵃ | Degree (LMMA-EC) | Degree (LMMA-ST) | P-value (EC)ᵇ | P-value (ST)ᵇ |
|---|---|---|---|---|---|
| VEGF | 554 | 51 | 117 | 0 | 0 |
| NUDT6 | 211 | 51 | 25 | 0 | 0 |
| KDR | 182 | 51 | 117 | 0 | 3.59e−06 |
| SIAT7B | 156 | 51 | 44 | 1.19e−07 | 0 |
| TNF | 149 | 51 | 46 | 0 | 0 |
| IL8 | 148 | 26 | 27 | 0 | 0 |
| MVD | 126 | 19 | 28 | 0 | 0 |
| CD34 | 111 | 51 | 22 | 1.19e−07 | 0 |
| EGF | 104 | 32 | 40 | 1.35e−13 | 0 |
| IL6 | 97 | 31 | 24 | 0 | 0 |
| CDH17 | 96 | 30 | 27 | 0 | 0 |
| HIF1A | 93 | 21 | 38 | 1.65e−12 | 0 |
| SOS1 | 87 | 14 | 25 | 1.54e−11 | 0 |
| CCM1 | 83 | 51 | 14 | 6.92e−06 | 0 |
| PSME3 | 78 | 18 | 34 | 0 | 0 |

ᵃ The degrees of these hub genes are the same in the LM-EC and LM-ST networks.
ᵇ P-values are calculated from the F-test for the unit-network of each gene (Equation 3).


Table 2 lists the P-values from the F-test for the unit-networks of the 15 hub genes. We calculated the P-values for the different networks, and the results show that the LMMA-based angiogenesis network is more reliable than the LM-based one. Figure 2 illustrates the LOOCV gene expression values for the unit-networks of VEGF, EGF, TNF and IL6. The MSE values of the LMMA unit-networks are smaller than those of the LM unit-networks, indicating that the LMMA network fits the angiogenesis microarray data better. Likewise, Table 1 lists the SSE and SSmse scores of the LM- and LMMA-based networks; the reduced errors of LMMA again indicate an improvement over the LM-based networks.

Fig. 2. Gene expression values derived from leave-one-out cross-validation for four hub genes (VEGF, EGF, TNF and IL6) in both the LM-EC and LMMA-EC networks. A total of 53 experiments in the EC microarray dataset are tested.

Figure 3a and b show the precision and recall of the LM- and LMMA-based angiogenesis networks at different thresholds Thp. The LMMA-based networks exhibit higher precision and lower recall than the LM-based network, and the recall of the LMMA-based networks increases gradually as the threshold increases. We select Thp = 0.150 as a suitable threshold for the LMMA-based EC and ST networks.

Fig. 3. Comparison of (a) precision and (b) recall in the LM, LMMA-EC and LMMA-ST angiogenesis networks at different thresholds. Here LM represents both LM-EC and LM-ST, since the genes in LM-EC and LM-ST are identical when mapped to KEGG. The X axis denotes the P-value thresholds of the F-test used in the statistical multivariate selection step. Both precision and recall are calculated against KEGG.

Both the LM-EC and LM-ST networks contain the same 474 genes, corresponding to 355 KO entities covered by the KEGG database. When the LM-based network is refined by LMMA, the proportion of TPs increases significantly while the proportion of FPs decreases. Table 3 shows the statistical comparison between the LM- and LMMA-based angiogenesis networks (P < 0.05 at most thresholds), indicating that LMMA substantially eliminates false-positive relations.

Table 3. The true positives (TP), false positives (FP) and P-values of the TP/FP ratio (Fisher's exact test) for LM versus LMMA networksᵃ

| KEGG | Network | Measure | Thp = 0.025 | 0.050 | 0.075 | 0.100 | 0.125 | 0.150 | 0.175 | 0.200 |
|---|---|---|---|---|---|---|---|---|---|---|
| KG | LM | TP | 237 | 237 | 237 | 237 | 237 | 237 | 237 | 237 |
| KG | LM | FP | 1048 | 1048 | 1048 | 1048 | 1048 | 1048 | 1048 | 1048 |
| KG | LMMA-EC | TP | 39 | 49 | 56 | 76 | 83 | 98 | 108 | 111 |
| KG | LMMA-EC | FP | 121 | 175 | 241 | 267 | 303 | 349 | 417 | 458 |
| KG | TP/FP (LMMA-EC versus LM) | P-value | 0.017004 | 0.034679 | 0.064785 | 0.01837 | 0.02372 | 0.01526 | 0.03012 | 0.04408 |
| KG | LMMA-ST | TP | 71 | 83 | 93 | 101 | 111 | 130 | 137 | 135 |
| KG | LMMA-ST | FP | 223 | 300 | 350 | 392 | 436 | 471 | 499 | 513 |
| KG | TP/FP (LMMA-ST versus LM) | P-value | 0.0056928 | 0.021672 | 0.02762 | 0.032833 | 0.033552 | 0.013196 | 0.013274 | 0.021953 |
| KO | LM | TP | 139 | 139 | 139 | 139 | 139 | 139 | 139 | 139 |
| KO | LM | FP | 170 | 170 | 170 | 170 | 170 | 170 | 170 | 170 |
| KO | LMMA-EC | TP | 29 | 33 | 37 | 52 | 57 | 70 | 74 | 77 |
| KO | LMMA-EC | FP | 19 | 34 | 40 | 44 | 46 | 55 | 60 | 63 |
| KO | TP/FP (LMMA-EC versus LM) | P-value | 0.017279 | 0.087676 | 0.090294 | 0.027146 | 0.017372 | 0.0097982 | 0.011667 | 0.011803 |
| KO | LMMA-ST | TP | 43 | 54 | 62 | 67 | 73 | 86 | 90 | 87 |
| KO | LMMA-ST | FP | 38 | 46 | 52 | 64 | 69 | 80 | 82 | 87 |
| KO | TP/FP (LMMA-ST versus LM) | P-value | 0.042857 | 0.026912 | 0.020102 | 0.041336 | 0.036214 | 0.028095 | 0.023083 | 0.04316 |

ᵃ Here LM represents both LM-EC and LM-ST, since the genes in LM-EC and LM-ST are identical when mapped to the KEGG database. KG = KEGG Genes; KO = KEGG Orthology.


3.3 Pathway extraction from networks

The statistical significance of the pathways in the LMMA-based angiogenesis networks is assessed with the Fisher Exact Test. The results are shown in Table 4 and illustrated by an example, the EGF (epidermal growth factor) unit-network, in Figure 4 (see Discussion). Although many co-occurrence relations are eliminated from the LM-based network, the main pathway information, such as the focal adhesion pathway and the TGF-beta, MAPK, calcium and Wnt signaling pathways, is retained in the LMMA-based network with significant P-values. Thus, these pathways are significantly enriched in the LMMA-based network.
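The enrichment test can be sketched as a one-sided Fisher exact test on a 2×2 table that crosses network membership with pathway membership. This is a minimal sketch, not the authors' code: the choice of gene universe, the one-sided (over-representation) alternative and the helper names `fisher_one_sided` and `pathway_enrichment_p` are assumptions, and the toy gene sets are hypothetical.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact test P(X >= a) for the 2x2 table [[a, b], [c, d]].

    With both margins fixed, the top-left cell follows a hypergeometric law.
    """
    row1 = a + b        # genes in the network
    col1 = a + c        # genes in the pathway
    n = a + b + c + d   # size of the gene universe
    upper = min(row1, col1)
    hits = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return hits / comb(n, col1)


def pathway_enrichment_p(network_genes, pathway_genes, universe):
    """Over-representation P-value of a pathway among network genes."""
    net = set(network_genes) & set(universe)
    path = set(pathway_genes) & set(universe)
    a = len(net & path)                    # network genes in the pathway
    b = len(net - path)                    # network genes outside the pathway
    c = len(path - net)                    # pathway genes outside the network
    d = len(set(universe) - net - path)    # genes in neither set
    return fisher_one_sided(a, b, c, d)


# Hypothetical toy data: 20 genes, a 5-gene network, a 4-gene pathway.
universe = {f"G{i}" for i in range(20)}
network = {"G0", "G1", "G2", "G3", "G4"}
pathway = {"G0", "G1", "G2", "G10"}
p = pathway_enrichment_p(network, pathway, universe)
print(round(p, 4))  # prints 0.032
```

In practice the universe would be the set of genes that can be mapped to KEGG; the result depends on that choice, which the text does not specify.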

Fig. 4

EGF (epidermal growth factor) unit-networks derived from the co-occurrence literature mining (LM) and LMMA approaches, respectively. A total of 21 genes co-cited with EGF in LM are removed by LMMA. Manual review of the PubMed records shows that these 21 genes have false relations with EGF, arising from homonym mismatches and confused lexical order (blue pane), unknown relations (purple pane) or isolated relations (yellow pane). The network is visualized with the Neato program of the Graphviz software (AT&T).

Table 4

KEGG pathways with significant P-values in LMMA-based angiogenesis networks (Thp = 0.150)a

Pathway | LMMA-EC (KG) | LMMA-EC (KO) | LMMA-ST (KG) | LMMA-ST (KO)
Focal adhesion pathway | 0.00087 | 1.09e-07 | 0.00084 | 2.84e-08
MAPK signaling pathway | 0.02825 | 0.013338 | 0.015779 | 0.00910
Adherens junction | 2.14e-22 | 1.31e-13 | 3.08e-24 | 2.19e-14
TGF-beta signaling pathway | 0.00010 | 0.00540 | 8.76e-06 | 0.00585
Insulin signaling pathway | 1.27e-06 | 0.00264 | 1.33e-07 | 0.00225
Calcium signaling pathway | 0.00011 | 0.00373 | 1.86e-08 | 6.66e-05
Wnt signaling pathway | 0.03010 | 0.00548
Regulation of actin cytoskeleton | 0.01100 | 0.00020
Cytokine-cytokine receptor interaction | 5.13e-09 | 9.52e-16
Apoptosis | 0.00127 | 0.03001
Cell cycle | 0.04594 | 0.02220

a P-values are calculated with the Fisher Exact Test. KG = KEGG Gene; KO = KEGG Orthology.

4 DISCUSSION AND CONCLUSION

A high false positive rate is a well-known problem in most high-throughput methods for detecting molecular interactions (von Mering et al., 2002). In this work, we developed the LMMA approach to construct networks based on both existing knowledge (literature) and experimental information (microarray). The approach uses multivariate analysis of subject-oriented gene expression profiles to refine the literature-derived global network. Two considerations make it necessary to construct the LM-based network before analyzing the network hidden in the microarray datasets. First, it is not advisable to construct the network directly from thousands of candidate variables when no prior knowledge about the network is available. Second, the number of variables should not exceed the number of observations (i.e. microarray experiments); otherwise the results will be falsely optimized. Thus, LMMA requires a sufficient number of arrays for multivariate selection.
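The second consideration can be illustrated numerically. The sketch below is a generic least-squares illustration, not the LMMA multivariate selection procedure itself; it assumes NumPy, and the helper `training_r2` and the array counts are hypothetical. Regressing a response that is pure noise on as many candidate genes as there are arrays yields a perfect in-sample fit, which is the kind of falsely optimized result the text warns about.

```python
import numpy as np


def training_r2(n_obs: int, n_vars: int, seed: int = 0) -> float:
    """In-sample R^2 of an ordinary least-squares fit to pure noise.

    The response is drawn independently of the predictors, so any apparent
    fit is spurious.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_obs, n_vars))        # rows = arrays, columns = genes
    y = rng.standard_normal(n_obs)                  # response unrelated to X
    design = np.column_stack([np.ones(n_obs), X])   # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    ss_res = float(residual @ residual)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


# 20 hypothetical arrays: a few candidate genes vs. as many genes as arrays.
print(round(training_r2(20, 3), 3))   # modest, driven by chance alone
print(round(training_r2(20, 19), 3))  # 1.0: a "perfect" but meaningless fit
```

Once the intercept plus the candidate genes reach the number of arrays, the design matrix can interpolate any response exactly, so goodness of fit stops carrying information; restricting candidates to LM-derived neighbours keeps the variable count below that limit.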

As an application, PubMed literature and microarray datasets from the EC and the ST, respectively, are selected to reconstruct LMMA networks for angiogenesis. Compared with LM-random filtering, the LMMA approach yields a larger cluster size and a smaller average path length, while preserving topological properties similar to those of the LM-based network. This indicates that LMMA can eliminate redundant relations while maintaining the backbone of the LM-based network.
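The two topological measures in this comparison can be computed with breadth-first search. This sketch assumes that "cluster size" means the number of genes in the largest connected component and that the average path length is taken over connected gene pairs; the helper names and the uniform random edge sampling used as an LM-random baseline are illustrative assumptions, not the authors' exact definitions.

```python
import random
from collections import defaultdict, deque


def adjacency(edges):
    """Build an undirected adjacency map from (gene, gene) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def largest_cluster_size(edges):
    """Number of genes in the largest connected component."""
    adj = adjacency(edges)
    seen, best = set(), 0
    for node in list(adj):
        if node not in seen:
            component = bfs_distances(adj, node)
            seen.update(component)
            best = max(best, len(component))
    return best


def average_path_length(edges):
    """Mean shortest-path length over all connected ordered gene pairs."""
    adj = adjacency(edges)
    total = pairs = 0
    for node in list(adj):
        for other, d in bfs_distances(adj, node).items():
            if other != node:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


def random_filter(edges, keep, seed=0):
    """Keep `keep` edges chosen uniformly at random (a random-filtering baseline)."""
    return random.Random(seed).sample(list(edges), keep)


lm_edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G4", "G5")]
print(largest_cluster_size(lm_edges), average_path_length(lm_edges))  # 5 2.0
```

To mirror the comparison in the text, one would repeatedly filter the LM edge list down to the LMMA edge count with `random_filter` and compare the resulting distribution of both measures with the LMMA values.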

The accuracy of the angiogenesis networks constructed by LM and LMMA is tested against sets of high-confidence interactions. Both precision and recall are calculated against KEGG, a commonly used benchmark. We show that LMMA significantly improves precision compared with LM alone. However, as Bork et al. (2004) reported, the choice of benchmark set remains a difficult problem, because agreement among different benchmark sets is surprisingly poor; for example, less than half of all pairs in the KEGG benchmark set are present in the Gene Ontology biological process benchmark set (Bork et al., 2004). Moreover, co-occurrence in the literature often reflects more general relationships between genes. Some of these may be implicit, or so novel that they have not yet reached the status of common knowledge or accepted fact that is often required for inclusion in databases such as KEGG. These two factors may explain why both the LM and LMMA approaches show a low recall rate against KEGG (Fig. 3). Even so, we show that integration with microarray data significantly increases the reliability of gene co-occurrence networks extracted from the literature.
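The precision and recall calculation can be sketched as set operations on undirected gene pairs. This is a minimal sketch under stated assumptions: a predicted pair is scored only if both genes are annotated in the benchmark (otherwise it counts as neither TP nor FP), recall is taken over all benchmark pairs, and the helper `evaluate` and the toy gene names are hypothetical rather than KEGG data.

```python
def undirected(pairs):
    """Normalize gene pairs so that (A, B) and (B, A) count as one relation."""
    return {frozenset(p) for p in pairs if len(set(p)) == 2}


def evaluate(predicted, benchmark, genes_in_benchmark):
    """Return TP, FP, precision and recall of predicted relations vs. a benchmark.

    Only predicted pairs whose two genes are both annotated in the benchmark
    are scored; pairs involving unmapped genes are ignored (an assumption).
    """
    pred = {e for e in undirected(predicted) if e <= genes_in_benchmark}
    bench = undirected(benchmark)
    tp = len(pred & bench)
    fp = len(pred - bench)
    precision = tp / (tp + fp) if pred else 0.0
    recall = tp / len(bench) if bench else 0.0
    return tp, fp, precision, recall


# Hypothetical toy benchmark and prediction; "X" is not annotated.
benchmark = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G4", "G5")]
genes = {"G1", "G2", "G3", "G4", "G5"}
predicted = [("G2", "G1"), ("G1", "G3"), ("G3", "G4"), ("G1", "X")]
tp, fp, precision, recall = evaluate(predicted, benchmark, genes)
print(tp, fp, round(precision, 3), recall)  # 2 1 0.667 0.5
```

Because recall is divided by the full benchmark, a network that covers only part of a pathway database will show low recall even when its precision is high, consistent with the pattern described above.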

To demonstrate how LMMA reduces the false positive rate and improves precision, we take EGF (epidermal growth factor), a key player in angiogenesis, as an example. As shown in Figure 4, LMMA removes a total of 21 falsely related genes from the LM-based EGF unit-network. First, LMMA deletes five mismatched genes in LM: SC, SF, AA and PC are abbreviations of stem cells, scatter factor, arachidonic acid or anaplastic astrocytoma, and prostate cancer, respectively; and IL8RA (interleukin 8 receptor, alpha) is wrongly matched from the phrase "IL8 and EGF receptor" because of its lexical order. Second, LMMA removes eight genes with unknown relations to EGF in LM (i.e. few co-citations): CCR6, FGF16 and MAP3K8 are co-cited with EGF in only one PubMed sentence, which reports a gene expression experiment (Gerritsen et al., 2003), and the same holds for IL11, IL10, IL3, IL4 and CCR2. Third, LMMA removes eight genes that seldom co-occur with EGF even when their aliases are used: NRG2, Scube1, NPY6R, ZNF78L2, IFI44, RNU106, AXPC1 and ANGPTL6. Thus, our results indicate that common errors leading to false relations in LM can be effectively removed by the LMMA approach.
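Two of these error types, homonymic abbreviations and relations supported by a single sentence, already arise at the co-occurrence step. The sketch below is a simplified sentence-level co-occurrence counter, not the authors' LM pipeline; the lexicon, the tokenization rule and the `min_support` cut-off are hypothetical. In LMMA such weak links are removed by the microarray-based multivariate selection rather than by a support threshold; the threshold here only makes the "few co-citations" category concrete.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences, lexicon):
    """Count, for each gene pair, the number of sentences mentioning both.

    `lexicon` maps a surface form (symbol or alias) to a canonical symbol.
    Whole-token, case-sensitive matching cannot tell an abbreviation such as
    'PC' (prostate cancer) from a gene symbol: the homonym problem above.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence))
        genes = sorted({lexicon[t] for t in tokens if t in lexicon})
        for pair in combinations(genes, 2):
            counts[pair] += 1
    return counts


def weak_links(counts, gene, min_support=2):
    """Partners of `gene` supported by fewer than `min_support` sentences."""
    return sorted(
        (b if a == gene else a)
        for (a, b), n in counts.items()
        if gene in (a, b) and n < min_support
    )


# Hypothetical toy corpus and lexicon.
sentences = [
    "EGF binds EGFR on endothelial cells.",
    "EGFR activation by EGF promotes migration.",
    "CCR6 and EGF were both induced in the array.",
]
lexicon = {"EGF": "EGF", "EGFR": "EGFR", "CCR6": "CCR6"}
counts = cooccurrence_counts(sentences, lexicon)
print(weak_links(counts, "EGF"))  # ['CCR6']
```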

Moreover, Table 4 lists the 11 most statistically significant KEGG pathways in the LMMA-based angiogenesis networks, with the P-value of each pathway calculated by the Fisher Exact Test. Among them, the focal adhesion pathway, the adherens junction pathway and the regulation of actin cytoskeleton pathway contribute to complex processes such as endothelial cell migration, morphogenesis and angiogenesis (Bix et al., 2004). TGF-beta regulates angiogenesis by affecting the proliferation, differentiation and migration of endothelial cells (Lomnytska et al., 2004). The insulin signaling pathway is implicated in cellular mitogenesis, angiogenesis, tumor cell survival and tumorigenesis (Cohen et al., 2005). Many Wnt proteins act through a canonical beta-catenin signaling pathway (Masckauchan et al., 2005) and control diverse biological processes, such as cell differentiation and proliferation (Masckauchan et al., 2005), as well as processes in the vasculature (Goodwin and D'Amore, 2002). Among the intracellular kinases implicated in angiogenesis, p38 MAPK has been shown to transduce signals critical for vascular remodeling and maturation (Zhu et al., 2003). Ca(2+) signaling is involved in virtually all cellular processes (Munaron et al., 2004). In addition, a variety of stimulatory cytokines, such as tumor necrosis factor (TNF)-alpha, interleukin (IL)-1, IL-6 and interferon (IFN)-gamma, and growth factors can promote functional and structural vascular changes (Kofler et al., 2005). Therefore, the pathway information in the LMMA-based angiogenesis network suggests that interactions among multiple pathways boost the activity of either EC or ST, which is in accordance with recent reports (Mukhopadhyay and Datta, 2004; McCarty, 2004). Since multiple pathways are dysfunctional in angiogenesis-related disorders such as cancers, a multifocal signal modulation therapy has recently been proposed (McCarty, 2004). The LMMA network will be helpful for analyzing the interactions of multiple pathways in such complex biological processes.

Regarding usability, LMMA can be applied flexibly to any biological topic for which related literature and microarray data are available. Note that, to construct an LMMA network, the number of candidate variables (genes) should be kept to a proper size; within a certain range, the accuracy of the LMMA approach increases with the number of candidate variables. The LMMA-based angiogenesis network summarizes large amounts of angiogenesis-related literature and high-throughput microarray data. The LMMA approach enables researchers not only to keep up to date with the relevant literature on specialized biological topics, but also to make sense of the relevant large-scale microarray datasets. The approach also serves as a useful tool for constructing specific biological networks and for experimental design. Thus, the LMMA network acts as a valuable computational representation of the known angiogenesis-related pathways, as well as of the interactions among multiple pathways. Such a representation enables a systemic understanding of angiogenesis in the context of complex gene interactions, and is also helpful for studying the regulation of various complex biological, physiological and pathological systems. In the ‘omics’ field, the LMMA approach can be further explored to study protein–protein and other interactions.

The authors thank B. Li (Boston University, USA), and X. G. Zhang and C. Zhang of their lab, for helpful discussions and comments. The authors acknowledge financial support from FANEDD (No. 200366), the Key Project of Chinese MOE (No. 104009) and the Basic Research Foundation of TNList.

Conflict of Interest: none declared.

REFERENCES

Barabasi, A.L. and Oltvai, Z.N. (2004) Network biology: understanding the cell's functional organization. Nat. Rev. Genet., 5, 101–113.
Bix, G. et al. (2004) Endorepellin causes endothelial cell disassembly of actin cytoskeleton and focal adhesions through alpha2beta1 integrin. J. Cell Biol., 166, 97–109.
Bork, P. et al. (2004) Protein interaction networks from yeast to human. Curr. Opin. Struct. Biol., 14, 292–299.
Carmeliet, P. (2003) Angiogenesis in health and disease. Nat. Med., 9, 653–660.
Cary, M.P. et al. (2005) Pathway information for systems biology. FEBS Lett., 579, 1815–1820.
Cohen, B.D. et al. (2005) Combination therapy enhances the inhibition of tumor growth with the fully human anti-type 1 insulin-like growth factor receptor monoclonal antibody CP-751,871. Clin. Cancer Res., 11, 2063–2073.
Dennis, G., Jr et al. (2003) DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol., 4, R60.
D'haeseleer, P. et al. (1999) Linear modeling of mRNA expression levels during CNS development and injury. Pac. Symp. Biocomput., 41–52.
D'haeseleer, P. et al. (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics, 16, 707–726.
de Jong, H. (2002) Modeling and simulation of genetic regulatory systems: a literature review. J. Comput. Biol., 9, 67–103.
Ding, J. et al. (2002) Mining Medline: abstracts, sentences, or phrases? Pac. Symp. Biocomput., 7, 326–337.
Folkman, J. (1995) Angiogenesis in cancer, vascular, rheumatoid and other diseases. Nat. Med., 1, 27–31.
Ge, H. et al. (2001) Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nat. Genet., 29, 482–486.
Gerritsen, M.E. et al. (2003) Using gene expression profiling to identify the molecular basis of the synergistic actions of hepatocyte growth factor and vascular endothelial growth factor in human endothelial cells. Br. J. Pharmacol., 140, 595–610.
Goodwin, A.M. and D'Amore, P.A. (2002) Wnt signaling in the vasculature. Angiogenesis, 5, 1–9.
Han, J.D. et al. (2004) Evidence for dynamically organized modularity in the yeast protein–protein interaction network. Nature, 430, 88–93.
Jenssen, T.K. et al. (2001) A literature network of human genes for high-throughput analysis of gene expression. Nat. Genet., 28, 21–28.
Jeong, H. et al. (2000) The large-scale organization of metabolic networks. Nature, 407, 651–654.
Kanehisa, M. and Goto, S. (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res., 28, 27–30.
Kofler, S. et al. (2005) Role of cytokines in cardiovascular diseases: a focus on endothelial responses to inflammation. Clin. Sci. (Lond.), 108, 205–213.
Küffner, R. et al. (2005) Expert knowledge without the expert: integrated analysis of gene expression and literature to derive active functional contexts. Bioinformatics, 21, ii259–ii267.
Lachenbruch, P.A. and Mickey, M.R. (1968) Estimation of error rates in discriminant analysis. Technometrics, 10, 1–11.
Le Phillip, P. et al. (2004) Using prior knowledge to improve genetic network reconstruction from microarray data. In Silico Biol., 4, 335–353.
Liang, S. et al. (1998) Reveal, a general reverse engineering algorithm for inference of genetic network architectures. Pac. Symp. Biocomput., 3, 18–29.
Lomnytska, M. et al. (2004) Transforming growth factor-beta1-regulated proteins in human endothelial cells identified by two-dimensional gel electrophoresis and mass spectrometry. Proteomics, 4, 995–1006.
Mao, X. et al. (2005) Automated genome annotation and pathway identification using the KEGG Orthology (KO) as a controlled vocabulary. Bioinformatics, 21, 3787–3793.
Masckauchan, T.N. et al. (2005) Wnt/beta-catenin signaling induces proliferation, survival and interleukin-8 in human endothelial cells. Angiogenesis, 8, 43–51.
McCarty, M.F. (2004) Targeting multiple signaling pathways as a strategy for managing prostate cancer: multifocal signal modulation therapy. Integr. Cancer Ther., 3, 349–380.
Mukhopadhyay, D. and Datta, K. (2004) Multiple regulatory pathways of vascular permeability factor/vascular endothelial growth factor (VPF/VEGF) expression in tumors. Semin. Cancer Biol., 14, 123–130.
Munaron, L. et al. (2004) Blocking Ca2+ entry: a way to control cell proliferation. Curr. Med. Chem., 11, 1533–1543.
Ozier, O. et al. (2003) Global architecture of genetic interactions on the protein network. Nat. Biotechnol., 21, 490–491.
Segal, M.R. et al. (2003) Regression approaches for microarray data analysis. J. Comput. Biol., 10, 961–980.
Shatkay, H. and Feldman, R. (2003) Mining the biomedical literature in the genomic era: an overview. J. Comput. Biol., 10, 821–855.
Sherlock, G. et al. (2001) The Stanford Microarray Database. Nucleic Acids Res., 29, 152–155.
Song, C. et al. (2005) Self-similarity of complex networks. Nature, 433, 392–395.
Stapley, B.J. and Benoit, G. (2000) Information retrieval and visualization from co-occurrences of gene names in Medline abstracts. Pac. Symp. Biocomput., 529–540.
Troyanskaya, O. et al. (2001) Missing value estimation methods for DNA microarrays. Bioinformatics, 17, 520–525.
van Someren, E.P. et al. (2002) Genetic network modeling. Pharmacogenomics, 3, 507–525.
von Mering, C. et al. (2002) Comparative assessment of large scale data sets of protein–protein interactions. Nature, 417, 399–403.
West, M. et al. (2001) Predicting the clinical status of human breast cancer using gene expression profiles. Proc. Natl Acad. Sci. USA, 98, 11462–11467.
Wu, L.J. and Li, S. (2005) Combined literature mining and gene expression analysis for modeling neuro-endocrine-immune interactions. Lect. Notes Comput. Sci., 3645, 31–40.
Zhang, C. and Li, S. (2004) Modeling of neuro-endocrine-immune network via subject oriented literature mining. Proc. BGRS, 2, 167–170.
Zhu, S. et al. (2005) A probabilistic model for mining implicit ‘chemical compound-gene’ relations from literature. Bioinformatics, 21, ii245–ii251.
Zhu, W.H. et al. (2003) Requisite role of p38 MAPK in mural cell recruitment during angiogenesis in the rat aorta model. J. Vasc. Res., 40, 140–148.

© The Author 2006. Published by Oxford University Press. All rights reserved.