| Type: | Package |
| Title: | Permutation Testing Network Module Preservation Across Datasets |
| Version: | 1.2.9 |
| BugReports: | https://github.com/sritchie73/NetRep/issues |
| Description: | Functions for assessing the replication/preservation of a network module's topology across datasets through permutation testing; Ritchie et al. (2016) <doi:10.1016/j.cels.2016.06.012>. |
| License: | GPL-2 |
| Depends: | R (≥ 3.6), methods |
| Imports: | foreach, Rcpp (≥ 0.11), statmod, RhpcBLASctl, abind, RColorBrewer, utils, stats, graphics, grDevices |
| Suggests: | bigmemory, testthat, knitr, rmarkdown |
| LinkingTo: | Rcpp, BH, RcppArmadillo (≥ 0.4) |
| RoxygenNote: | 7.3.3 |
| VignetteBuilder: | knitr |
| Encoding: | UTF-8 |
| NeedsCompilation: | yes |
| Packaged: | 2025-10-23 14:58:56 UTC; sr827 |
| Author: | Scott Ritchie [aut, cre] (0000-0002-8454-9548) |
| Maintainer: | Scott Ritchie <sritchie73@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-10-23 15:20:08 UTC |
Fast permutation procedure for testing network module replication
Description
Functions for assessing the replication/preservation of a network module's topology across datasets through permutation testing. This is suitable for networks that can be meaningfully inferred from multiple datasets. These include gene coexpression networks, protein-protein interaction networks, and microbial interaction networks. Modules within these networks consist of groups of nodes that are particularly interesting: for example a group of tightly connected genes associated with a disease, groups of genes annotated with the same term in the Gene Ontology database, or groups of interacting microbial species, i.e. communities. Application of this method can answer questions such as: (1) do the relationships between genes in a module replicate in an independent cohort? (2) are these gene coexpression modules preserved across tissues or tissue specific? (3) are these modules conserved across species? (4) are microbial communities preserved across multiple spatial locations?
Details
The main function for this package is modulePreservation. Several functions for downstream analysis are also provided: networkProperties for calculating the topological properties of a module, and plotModule for visualising a module.
Author(s)
Maintainer: Scott Ritchie <sritchie73@gmail.com> (0000-0002-8454-9548)
See Also
Useful links:
Report bugs at https://github.com/sritchie73/NetRep/issues
Combine results of multiple permutation procedures
Description
This function takes the output from multiple runs of modulePreservation, combines their results, and returns a new set of permutation test P-values. This is useful for parallelising calculations across multiple machines.
Usage
combineAnalyses(pres1, pres2)
Arguments
pres1, pres2 | lists returned by modulePreservation. |
Details
The calls to 'modulePreservation' must have been identical for both input lists, with the exception of the number of threads used and the number of permutations calculated.
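As a rough illustration of why pooling runs is valid, the sketch below combines two sets of permutation nulls for a single statistic and recomputes an upper-tail P-value. It is not the package's implementation: `pool_p_value` is a hypothetical helper, the nulls are random stand-ins, and the simple (b + 1) / (m + 1) estimator is an assumption made for brevity.

```r
# Hypothetical helper (not part of NetRep): pool two sets of permutation
# nulls for one statistic and recompute an upper-tail permutation p-value.
pool_p_value <- function(observed, nulls1, nulls2) {
  nulls <- c(nulls1, nulls2)          # pooled null distribution
  b <- sum(nulls >= observed)         # permutations at least as extreme
  (b + 1) / (length(nulls) + 1)       # adding one keeps the p-value above zero
}

set.seed(1)
nulls_a <- rnorm(1000)   # stand-in for the nulls from the first run
nulls_b <- rnorm(1000)   # stand-in for the nulls from the second run
p <- pool_p_value(observed = 2.5, nulls_a, nulls_b)
```

Pooling only makes sense when both runs drew permutations under the same null model and inputs, which is why the two calls must otherwise be identical.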
Value
A nested list containing the same elements as modulePreservation.
Examples
data("NetRep")
# Set up input lists for each input matrix type across datasets. The list
# elements can have any names, so long as they are consistent between the
# inputs.
network_list <- list(discovery=discovery_network, test=test_network)
data_list <- list(discovery=discovery_data, test=test_data)
correlation_list <- list(discovery=discovery_correlation, test=test_correlation)
labels_list <- list(discovery=module_labels)
pres1 <- modulePreservation(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, nPerm=1000, discovery="discovery",
  test="test", nThreads=2
)
pres2 <- modulePreservation(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, nPerm=1000, discovery="discovery",
  test="test", nThreads=2
)
combined <- combineAnalyses(pres1, pres2)
Template parameters
Description
Template parameters to be imported into other function documentation. This is not intended to be a stand-alone help file.
Arguments
network | a list of interaction networks, one for each dataset. Each entry of the list should be a |
data | a list of matrices, one for each dataset. Each entry of the list should be the data used to infer the interaction |
correlation | a list of matrices, one for each dataset. Each entry of the list should be a |
moduleAssignments | a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset. |
modules | a list of vectors, one for each |
backgroundLabel | a single label given to nodes that do not belong to any module in the |
discovery | a vector of names or indices denoting the discovery dataset(s) in the |
test | a list of vectors, one for each |
verbose | logical; should progress be reported? Default is |
The 'disk.matrix' class
Description
A 'disk.matrix' contains a file path to a matrix stored on disk, along with meta data for how to read that file. This allows NetRep to load datasets into RAM only when required, i.e. one at a time. This significantly reduces the memory usage of R when analysing large datasets. 'disk.matrix' objects may be supplied instead of 'matrix' objects in the input list arguments 'network', 'data', and 'correlation', which are common to most of NetRep's functions.
Usage
attach.disk.matrix(file, serialized = TRUE, ...)
serialize.table(file, ...)
is.disk.matrix(x)
as.disk.matrix(x, file, serialize = TRUE)
## S4 method for signature 'disk.matrix'
as.disk.matrix(x, file, serialize = TRUE)
## S4 method for signature 'matrix'
as.disk.matrix(x, file, serialize = TRUE)
## S4 method for signature 'ANY'
as.disk.matrix(x, file, serialize = TRUE)
## S4 method for signature 'disk.matrix'
as.matrix(x)
## S4 method for signature 'disk.matrix'
show(object)
Arguments
file | for |
serialized | determines how the matrix will be loaded from disk into Rby |
... | arguments to be used by |
x | for |
serialize | determines how the matrix is saved to disk by |
object | a |
Details
Matrices may either be stored as regular table files that can be read by read.table, or as serialized R objects that can be read by readRDS. Serialized objects are much faster to load, but cannot be read by other programs.
The attach.disk.matrix function creates a disk.matrix object from a file path. The as.matrix function will load the data from disk into the R session as a regular matrix object.
The as.disk.matrix function converts a matrix into a disk.matrix by saving its contents to the specified file. The serialize argument determines whether the data is stored as a serialized R object or as a tab-separated file (i.e. sep="\t"). We recommend storing the matrix as a serialized R object unless disk space is a concern. More control over the storage format can be obtained by using saveRDS or write.table directly.
The serialize.table function converts a file in table format to a serialized R object with the same file name, but with the ".rds" extension.
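The two storage formats can be compared using only base R. The sketch below is an illustration under stated assumptions rather than a description of what as.disk.matrix does internally: the matrix `m` and the temporary file names are invented for the example, and only saveRDS, readRDS, write.table, and read.table are used.

```r
set.seed(1)
# A small named matrix standing in for a network, data, or correlation matrix.
m <- matrix(rnorm(12), nrow = 3,
            dimnames = list(paste0("sample", 1:3), paste0("gene", 1:4)))

rds_file <- tempfile(fileext = ".rds")
tsv_file <- tempfile(fileext = ".tsv")

# Serialized R object: fast to load, but only readable from R.
saveRDS(m, rds_file)
from_rds <- readRDS(rds_file)

# Tab-separated table: readable by other programs, slower to parse.
# The header line has one fewer field than the data lines, so read.table
# uses the first column as row names.
write.table(m, tsv_file, sep = "\t", quote = FALSE)
from_tsv <- as.matrix(read.table(tsv_file, sep = "\t", header = TRUE,
                                 check.names = FALSE))
```

The serialized copy is restored exactly, whereas the text copy is restored only to the number of significant digits written by write.table.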
Value
A disk.matrix object (attach.disk.matrix, as.disk.matrix), a matrix (as.matrix), the file path to a serialized matrix (serialize.table), or a TRUE or FALSE indicating whether an object is a disk.matrix (is.disk.matrix).
Slots
file: the name of the file where the matrix is saved.
read.func: either "read.table" or "readRDS".
func.args: a list of arguments to be supplied to the 'read.func'.
Warning
attach.disk.matrix does not check whether the specified file can be read into R. as.matrix will fail and throw an error if this is the case.
Example data
Description
Example gene coexpression networks inferred from two independent datasets to demonstrate the usage of package functions.
Usage
data("NetRep")
Format
- "discovery_network": a matrix with 150 columns and 150 rows containing the network edge weights encoding the interaction strength between each pair of genes in the discovery dataset.
- "discovery_data": a matrix with 150 columns (genes) and 30 rows (samples) whose entries correspond to the expression level of each gene in each sample in the discovery dataset.
- "discovery_correlation": a matrix with 150 columns and 150 rows containing the correlation-coefficients between each pair of genes calculated from the "discovery_data" matrix.
- "module_labels": a named vector with 150 entries containing the module assignment for each gene as identified in the discovery dataset.
- "test_network": a matrix with 150 columns and 150 rows containing the network edge weights encoding the interaction strength between each pair of genes in the test dataset.
- "test_data": a matrix with 150 columns (genes) and 30 rows (samples) whose entries correspond to the expression level of each gene in each sample in the test dataset.
- "test_correlation": a matrix with 150 columns and 150 rows containing the correlation-coefficients between each pair of genes calculated from the "test_data" matrix.
An object of class matrix (inherits from array) with 150 rows and 150 columns.
An object of class matrix (inherits from array) with 30 rows and 150 columns.
An object of class matrix (inherits from array) with 150 rows and 150 columns.
An object of class numeric of length 150.
An object of class matrix (inherits from array) with 150 rows and 150 columns.
An object of class matrix (inherits from array) with 30 rows and 150 columns.
An object of class matrix (inherits from array) with 150 rows and 150 columns.
Details
The preservation of network modules in a second dataset is quantified by measuring the preservation of topological properties between the discovery and test datasets. These properties are calculated not only from the interaction networks inferred in each dataset, but also from the data used to infer those networks (e.g. gene expression data) as well as the correlation structure between variables/nodes. Thus, all functions in the NetRep package have the following arguments:
network: a list of interaction networks, one for each dataset.
data: a list of data matrices used to infer those networks, one for each dataset.
correlation: a list of matrices containing the pairwise correlation coefficients between variables/nodes in each dataset.
moduleAssignments: a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.
modules: a list of vectors, one vector for each discovery dataset, containing the names of the modules from that dataset to analyse.
discovery: a vector indicating the names or indices of the previous arguments' lists to use as the discovery dataset(s) for the analyses.
test: a list of vectors, one vector for each discovery dataset, containing the names or indices of the network, data, and correlation argument lists to use as the test dataset(s) for the analysis of each discovery dataset.
This data is used to provide concrete examples of the usage of these arguments in each package function.
Simulation details
The discovery gene expression dataset ("discovery_data") containing 30 samples and 150 genes was simulated to contain four distinct modules of sizes 20, 25, 30, and 35 genes. Data for each module were simulated as:
G^{(w)}_{simulated} = E^{(w)} r_i + \sqrt{1 - r^2_i} \epsilon
Where E^{(w)} is the simulated module's summary vector, r is the simulated module's node contributions for each gene, and \epsilon is the error term drawn from a standard normal distribution. E^{(w)} and r were simulated by bootstrapping (sampling with replacement) samples and genes from the corresponding vectors in modules 63, 51, 57, and 50 discovered in the liver tissue gene expression data from a publicly available mouse dataset (see reference (1) for details on the dataset and network discovery). The remaining 40 genes that were not part of any module were simulated by randomly selecting 40 liver genes, bootstrapping 30 samples, and adding the noise term, \epsilon. A vector of module assignments was created ("module_labels") in which each gene was labelled with a number 1-4 corresponding to the module they were simulated to be coexpressed with, or a label of 0 for the 40 "background" genes not participating in any module. The correlation structure ("discovery_correlation") was calculated as the Pearson's correlation coefficient between genes (cor(discovery_data)). Edge weights in the interaction network ("discovery_network") were calculated as the absolute value of the correlation coefficient exponentiated to the power 5 (abs(discovery_correlation)^5).
An independent test dataset ("test_data") containing the same 150 genes as the discovery dataset but 30 different samples was simulated as above. Modules 1 and 4 (containing 20 and 35 genes respectively) were simulated to be preserved using the same equation above, where the summary vector E^{(w)} was bootstrapped from the same liver modules (modules 63 and 50) as in the discovery dataset and with identical node contributions r as in the discovery dataset. Genes in modules 2 and 3 were simulated as "background" genes, i.e. not preserved, as described above. The correlation structure between genes in the test dataset ("test_correlation") and the interaction network ("test_network") were calculated the same way as in the discovery dataset.
The random seed used for the simulations was 37.
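The simulation equation can be sketched in a few lines of base R. This is an illustration only: the summary vector `E` and node contributions `r` below are random stand-ins (an assumption), not bootstrapped from the mouse liver modules, so the output will not reproduce the packaged example data even though the same seed value is used.

```r
set.seed(37)  # seed value taken from the text; draws differ from the package data

n_samples <- 30
n_genes   <- 20

# Hypothetical stand-ins for the bootstrapped quantities:
E <- as.numeric(scale(rnorm(n_samples)))    # module summary vector E^(w)
r <- runif(n_genes, min = 0.5, max = 0.9)   # node contributions r_i

# G_i = E * r_i + sqrt(1 - r_i^2) * epsilon, one column per gene
epsilon <- matrix(rnorm(n_samples * n_genes), nrow = n_samples)
G <- outer(E, r) + sweep(epsilon, 2, sqrt(1 - r^2), `*`)

# Correlation structure and network edge weights, as described in the text
correlation <- cor(G)
network     <- abs(correlation)^5
```

Raising the absolute correlation to the power 5 pushes weak correlations towards zero while leaving strong ones close to one.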
References
1. Ritchie, S.C., et al., A scalable permutation approach reveals replication and preservation patterns of network modules in large datasets. Cell Systems. 3, 71-82 (2016).
See Also
modulePreservation, plotModule, and networkProperties.
Load a 'bigMatrix' (deprecated)
Description
The 'bigMatrix' class is no longer implemented in the NetRep package: the shared memory approach was incompatible with high performance compute clusters, so the parallel permutation procedure has been translated into C++ code (which is also much faster). The disk.matrix class should now be used instead when analysing large datasets.
Usage
load.bigMatrix(backingfile)
Arguments
backingfile | path to the backingfile for the |
Details
This function will convert 'bigMatrix' data saved by previous versions of NetRep to a serialized R matrix saved in the same location and return a disk.matrix object with the associated file path. If this conversion has taken place already the function will throw a warning.
This function will also convert the 'bigMatrix' descriptor file to a big.matrix descriptor file to preserve compatibility with functions in the bigmemory package. If this functionality is not required, the files with the extensions ".bin" and ".desc" may be removed.
A note for users using multi-node high performance clusters: 'big.matrix' objects are not suitable for general usage. Access to file-backed shared memory segments on multi-node systems is very slow due to consistency checks performed by the operating system. This becomes exponentially worse the more R sessions there are simultaneously accessing the shared memory segment, e.g. through parallel foreach loops.
Replication and preservation of network modules across datasets
Description
Quantify the preservation of network modules (sub-graphs) in an independent dataset through permutation testing on module topology. Seven network statistics (see details) are calculated for each module and then tested by comparing to distributions generated from their calculation on random subsets in the test dataset.
Usage
modulePreservation(
  network,
  data,
  correlation,
  moduleAssignments,
  modules = NULL,
  backgroundLabel = "0",
  discovery = 1,
  test = 2,
  selfPreservation = FALSE,
  nThreads = NULL,
  nPerm = NULL,
  null = "overlap",
  alternative = "greater",
  simplify = TRUE,
  verbose = TRUE
)
Arguments
network | a list of interaction networks, one for each dataset. Each entry of the list should be a |
data | a list of matrices, one for each dataset. Each entry of the list should be the data used to infer the interaction |
correlation | a list of matrices, one for each dataset. Each entry of the list should be a |
moduleAssignments | a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset. |
modules | a list of vectors, one for each |
backgroundLabel | a single label given to nodes that do not belong to any module in the |
discovery | a vector of names or indices denoting the discovery dataset(s) in the |
test | a list of vectors, one for each |
selfPreservation | logical; if |
nThreads | number of threads to parallelise the calculation of network properties over. Automatically determined as the number of cores - 1 if not specified. |
nPerm | number of permutations to use. If not specified, the number of permutations will be automatically determined (see details). When set to 0 the permutation procedure will be skipped and the observed module preservation will be returned without p-values. |
null | variables to include when generating the null distributions. Must be either "overlap" or "all" (see details). |
alternative | The type of module preservation test to perform. Must be one of "greater" (default), "less" or "two.sided" (see details). |
simplify | logical; if |
verbose | logical; should progress be reported? Default is |
Details
Input data structures:
The preservation of network modules in a second dataset is quantified by measuring the preservation of topological properties between the discovery and test datasets. These properties are calculated not only from the interaction networks inferred in each dataset, but also from the data used to infer those networks (e.g. gene expression data) as well as the correlation structure between variables/nodes. Thus, all functions in the NetRep package have the following arguments:
network: a list of interaction networks, one for each dataset.
data: a list of data matrices used to infer those networks, one for each dataset.
correlation: a list of matrices containing the pairwise correlation coefficients between variables/nodes in each dataset.
moduleAssignments: a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.
modules: a list of vectors, one for each discovery dataset, containing the names of the modules from that dataset to analyse.
discovery: a vector indicating the names or indices of the previous arguments' lists to use as the discovery dataset(s) for the analyses.
test: a list of vectors, one vector for each discovery dataset, containing the names or indices of the network, data, and correlation argument lists to use as the test dataset(s) for the analysis of each discovery dataset.
The formatting of these arguments is not strict: each function will attempt to make sense of the user input. For example, if there is only one discovery dataset, then input to the moduleAssignments and test arguments may be vectors, rather than lists.
Analysing large datasets:
Matrices in the network, data, and correlation lists can be supplied as disk.matrix objects. This class allows matrix data to be kept on disk and loaded as required by NetRep. This dramatically decreases memory usage: the matrices for only one dataset will be kept in RAM at any point in time.
Additional memory usage of the permutation procedure is directly proportional to the sum of module sizes squared multiplied by the number of threads. Very large modules may result in significant additional memory usage per core due to extraction of the correlation coefficient sub-matrix at each permutation.
Module Preservation Statistics:
Module preservation is assessed through seven module preservation statistics, each of which captures a different aspect of a module's topology; i.e. the structure of the relationships between its nodes (1,2). Below is a description of each statistic, what they seek to measure, and where their interpretation may be inappropriate.
The module coherence ('coherence'), average node contribution ('avg.contrib'), and concordance of node contribution ('cor.contrib') are all calculated from the data used to infer the network (provided in the 'data' argument). They are calculated from the module's summary profile. This is the eigenvector of the 1st principal component across all observations for every node composing the module. For gene coexpression modules this can be interpreted as a "summary expression profile". It is typically referred to as the "module eigengene" in the weighted gene coexpression network analysis literature (4).
The module coherence ('coherence') quantifies the proportion of module variance explained by the module's "summary profile". The higher this value, the more "coherent" the data is, i.e. the more similar the observations are across nodes for each sample. With the default alternate hypothesis, a small permutation P-value indicates that the module is more coherent than expected by chance.
The average node contribution ('avg.contrib') and concordance of node contribution ('cor.contrib') are calculated from the node contribution, which quantifies how similar each node is to the module's summary profile. It is calculated as the Pearson correlation coefficient between each node and the module summary profile. In the weighted gene coexpression network literature it is typically called the "module membership" (2).
The average node contribution ('avg.contrib') quantifies how similar nodes are to the module summary profile in the test dataset. Nodes detract from this score where the sign of their node contribution flips between the discovery and test datasets, e.g. in the case of differential gene expression across conditions. A high average node contribution with a small permutation P-value indicates that the module remains coherent in the test dataset, and that the nodes are acting together in a similar way.
The concordance of node contribution ('cor.contrib') measures whether the relative rank of nodes (in terms of their node contribution) is preserved across datasets. If a module is coherent enough that all nodes contribute strongly, then this statistic will not be meaningful as its value will be heavily influenced by tiny variations in node rank. This can be assessed through visualisation of the module topology (see plotContribution). Similarly, a strong 'cor.contrib' is unlikely to be meaningful if the 'avg.contrib' is not significant.
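The three data-derived statistics can be sketched in base R as follows. This is a simplified reading of the descriptions above, not the package's C++ implementation: `module_profile` is a hypothetical helper, the data are simulated stand-ins, the summary profile is taken as the first left singular vector of the centred and scaled data, and Pearson correlation is assumed for 'cor.contrib'.

```r
# Hypothetical helper (not part of NetRep): summary profile, node
# contributions, and coherence for one module. `x` is a samples-by-nodes
# matrix restricted to the module's nodes.
module_profile <- function(x) {
  xs <- scale(x)                             # centre and scale each node
  s  <- svd(xs)
  profile <- s$u[, 1]                        # first left singular vector
  # Orient the profile so it correlates positively with the average node.
  if (cor(profile, rowMeans(xs)) < 0) profile <- -profile
  contrib <- as.numeric(cor(xs, profile))    # node contributions
  list(profile = profile,
       contrib = contrib,
       coherence = s$d[1]^2 / sum(s$d^2))    # variance explained by the profile
}

set.seed(1)
loadings <- seq(0.5, 1.5, length.out = 10)
sim <- function() {
  outer(rnorm(30), loadings) + matrix(rnorm(300, sd = 0.5), nrow = 30)
}
disc_x <- sim()   # module data in the discovery dataset
test_x <- sim()   # the same 10 nodes measured in the test dataset

d   <- module_profile(disc_x)
tst <- module_profile(test_x)

coherence   <- tst$coherence
avg.contrib <- mean(sign(d$contrib) * tst$contrib)   # penalises sign flips
cor.contrib <- cor(d$contrib, tst$contrib)
```

With scaled data the coherence computed this way equals the mean squared node contribution, which links the first statistic to the other two.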
The concordance of correlation structure ('cor.cor') and density of correlation structure ('avg.cor') are calculated from the user-provided correlation structure between nodes (provided in the 'correlation' argument). This is referred to as "coexpression" when calculated on gene expression data.
The 'avg.cor' measures how strongly nodes within a module are correlated on average in the test dataset. This average depends on the correlation coefficients in the discovery dataset: the score is penalised where correlation coefficients change in sign between datasets. A high 'avg.cor' with a small permutation P-value indicates that the module is (a) more strongly correlated than expected by chance for a module of the same size, and (b) more consistently correlated with respect to the discovery dataset than expected by chance.
The 'cor.cor' measures how similar the correlation coefficients are across the two datasets. A high 'cor.cor' with a small permutation P-value indicates that the correlation structure within a module is more similar across datasets than expected by chance. If all nodes within a module are very similarly correlated then this statistic will not be meaningful, as its value will be heavily influenced by tiny, non-meaningful, variations in correlation strength. This can be assessed through visualisation of the module topology (see plotCorrelation). Similarly, a strong 'cor.cor' is unlikely to be meaningful if the 'avg.cor' is not significant.
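A minimal sketch of the two correlation-based statistics, under the assumption that each unordered pair of nodes is counted once and that Pearson correlation is used for 'cor.cor'. The helper `upper` and the simulated matrices are invented for the example and are not part of the package.

```r
# Hypothetical helper: entries above the diagonal of a square matrix,
# so each pair of nodes is counted once.
upper <- function(m) m[upper.tri(m)]

set.seed(1)
shared_disc <- rnorm(30)
shared_test <- rnorm(30)
disc_data <- sapply(1:8, function(i) shared_disc + rnorm(30))
test_data <- sapply(1:8, function(i) shared_test + rnorm(30))

disc_cor <- cor(disc_data)   # module correlation structure, discovery dataset
test_cor <- cor(test_data)   # the same nodes in the test dataset

# Average test correlation, penalised where the sign differs from discovery.
avg.cor <- mean(sign(upper(disc_cor)) * upper(test_cor))

# Similarity of the pairwise correlation coefficients across datasets.
cor.cor <- cor(upper(disc_cor), upper(test_cor))
```

When every coefficient keeps its sign, 'avg.cor' reduces to the mean absolute correlation in the test dataset.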
The average edge weight ('avg.weight') and concordance of weighted degree ('cor.degree') are both calculated from the interaction network (provided as adjacency matrices to the 'network' argument).
The 'avg.weight' measures the average connection strength between nodes in the test dataset. In the weighted gene coexpression network literature this is typically called the "module density" (2). A high 'avg.weight' with a small permutation P-value indicates that the module is more strongly connected in the test dataset than expected by chance.
The 'cor.degree' calculates whether the relative rank of each node's weighted degree is similar across datasets. The weighted degree is calculated as the sum of a node's edge weights to all other nodes in the module. In the weighted gene coexpression network literature this is typically called the "intramodular connectivity" (2). This statistic will not be meaningful where all nodes are connected to each other with similar strength, as its value will be heavily influenced by tiny, non-meaningful, variations in weighted degree. This can be assessed through visualisation of the module topology (see plotDegree).
Both the 'avg.weight' and 'cor.degree' assume edges are weighted, and that the network is densely connected. Note that for sparse networks, edges with zero weight are included when calculating both statistics. Only the magnitude of the weights, not their sign, contribute to the score. If the network is unweighted, i.e. edges indicate presence or absence of a relationship, then the 'avg.weight' will be the proportion of the number of edges to the total number of possible edges while the weighted degree simply becomes the degree. A high 'avg.weight' in this case measures how interconnected a module is in the test dataset. A high degree indicates that a node is connected to many other nodes. The interpretation of the 'cor.degree' remains unchanged between weighted and unweighted networks. If the network is directed the interpretation of the 'avg.weight' remains unchanged, while the cor.degree will measure the concordance of the node in-degree in the test network. To measure the out-degree instead, the adjacency matrices provided to the 'network' argument should be transposed.
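For an undirected, weighted module the two network-based statistics can be sketched as below. The 4-node adjacency matrices are invented for the example, `weighted_degree` is a hypothetical helper, and Pearson correlation is assumed for 'cor.degree'; the package's own calculation may differ in detail.

```r
# Hypothetical helper: weighted degree within a module, ignoring self-edges
# and using only the magnitude of each edge weight.
weighted_degree <- function(net) {
  w <- abs(net)
  diag(w) <- 0
  rowSums(w)
}

# Toy symmetric adjacency matrices for the same four nodes.
disc_net <- matrix(c(1.0, 0.8, 0.6, 0.1,
                     0.8, 1.0, 0.5, 0.2,
                     0.6, 0.5, 1.0, 0.3,
                     0.1, 0.2, 0.3, 1.0), nrow = 4)
test_net <- matrix(c(1.0, 0.7, 0.5, 0.2,
                     0.7, 1.0, 0.6, 0.1,
                     0.5, 0.6, 1.0, 0.3,
                     0.2, 0.1, 0.3, 1.0), nrow = 4)

# Average connection strength between distinct nodes in the test network.
avg.weight <- mean(abs(test_net[upper.tri(test_net)]))

# Concordance of weighted degree across the two datasets.
cor.degree <- cor(weighted_degree(disc_net), weighted_degree(test_net))
```

For a directed network the row and column sums differ, which is why the text suggests transposing the adjacency matrix to switch between in-degree and out-degree.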
Sparse data:
Caution should be used when running NetRep on sparse data (i.e. where there are many zero values in the data used to infer the network). For this data, the average node contribution ('avg.contrib'), concordance of node contribution ('cor.contrib'), and module coherence ('coherence') will all be systematically underestimated due to their reliance on the Pearson correlation coefficient to calculate the node contribution.
Care should also be taken to use appropriate methods for inferring the correlation structure when the data is sparse for the same reason.
Proportional data:
Caution should be used when running NetRep on proportional data (i.e. where observations across samples all sum to the same value, e.g. 1). For this data, the average node contribution ('avg.contrib'), concordance of node contribution ('cor.contrib'), and module coherence ('coherence') will all be systematically overestimated due to their reliance on the Pearson correlation coefficient to calculate the node contribution.
Care should also be taken to use appropriate methods for inferring the correlation structure from proportional data for the same reason.
Hypothesis testing:
Three alternative hypotheses are available. "greater", the default, tests whether each module preservation statistic is larger than expected by chance. "less" tests whether each module preservation statistic is smaller than expected by chance, which may be useful for identifying modules that are extremely different in the test dataset. "two.sided" can be used to test both alternate hypotheses.
To determine whether a module preservation statistic deviates from chance, a permutation procedure is employed. Each statistic is calculated between the module in the discovery dataset and nPerm random subsets of the same size in the test dataset in order to assess the distribution of each statistic under the null hypothesis.
Two models for the null hypothesis are available. Under "overlap", the default, only nodes that are present in both the discovery and test networks are used when generating null distributions. This is appropriate under an assumption that nodes that are present in the test dataset, but not present in the discovery dataset, are unobserved: that is, they may fall in the module(s) of interest in the discovery dataset if they were to be measured there. Alternatively, "all" will use all nodes in the test network when generating the null distributions.
The number of permutations required for any given significance threshold is approximately 1 / the desired significance for one sided tests, and double that for two-sided tests. This can be calculated with requiredPerms. When nPerm is not specified, the number of permutations is automatically calculated as the number required for a Bonferroni corrected significance threshold adjusting for the total number of tests for each statistic, i.e. the total number of modules to be analysed multiplied by the number of test datasets each module is tested in. Although assessing the replication of a small number of modules calls for very few permutations, we recommend using no fewer than 1,000 as fewer permutations are unlikely to generate representative null distributions. Note: the assumption used by requiredPerms to determine the correct number of permutations breaks down when assessing the preservation of modules in a very small dataset (e.g. gene sets in a dataset with fewer than 100 genes total). However, the reported p-values will still be accurate (see permutationTest) (3).
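The permutation logic can be sketched for a single statistic in base R. The code below is an illustration under stated assumptions and does not reproduce requiredPerms or permutationTest: `approx_perms` and `avg_weight` are hypothetical helpers, the test network is a random stand-in, and the P-value uses the simple (b + 1) / (m + 1) estimator rather than the package's calculation.

```r
# Hypothetical helper: approximate number of permutations for a Bonferroni
# corrected threshold (1 / threshold, doubled for two-sided tests).
approx_perms <- function(alpha, n_tests, two_sided = FALSE) {
  threshold <- alpha / n_tests
  ceiling((if (two_sided) 2 else 1) / threshold)
}

# Hypothetical helper: average edge weight among the nodes in `idx`.
avg_weight <- function(net, idx) {
  sub <- net[idx, idx]
  mean(sub[upper.tri(sub)])
}

set.seed(1)
n_nodes  <- 100
x        <- matrix(rnorm(30 * n_nodes), nrow = 30)
test_net <- abs(cor(x))^5      # stand-in for a test network
module   <- 1:10               # nodes belonging to the module of interest

observed <- avg_weight(test_net, module)

# Null distribution: the same statistic on random node sets of the same size.
n_perm <- 1000
nulls  <- replicate(n_perm,
                    avg_weight(test_net, sample(n_nodes, length(module))))

# Upper-tail estimate for alternative = "greater"; adding one to both the
# numerator and denominator keeps the p-value above zero.
p_greater <- (sum(nulls >= observed) + 1) / (n_perm + 1)
```

Because the smallest attainable value here is 1 / (n_perm + 1), the number of permutations bounds how small a P-value can be reported, which motivates the 1 / threshold rule of thumb.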
Value
A nested list structure. At the top level, the list has one element per 'discovery' dataset. Each of these elements is a list that has one element per 'test' dataset analysed for that 'discovery' dataset. Each of these elements is also a list, containing the following objects:
observed: A matrix of the observed values for the module preservation statistics. Rows correspond to modules, and columns to the module preservation statistics.
nulls: A three dimensional array containing the values of the module preservation statistics evaluated on random permutation of module assignment in the test network. Rows correspond to modules, columns to the module preservation statistics, and the third dimension to the permutations.
p.values: A matrix of p-values for the observed module preservation statistics as evaluated through a permutation test using the corresponding values in nulls.
nVarsPresent: A vector containing the number of variables that are present in the test dataset for each module.
propVarsPresent: A vector containing the proportion of variables present in the test dataset for each module. Modules where this is less than 1 should be investigated further before making judgements about preservation to ensure that the missing variables are not the most connected ones.
contingency: If moduleAssignments are present for both the discovery and test datasets, then a contingency table showing the overlap between modules across datasets is returned. Rows correspond to modules in the discovery dataset, columns to modules in the test dataset.
When simplify = TRUE then the simplest possible structure will be returned. E.g. if module preservation is tested in only one dataset, then the returned list will have only the above elements.
When simplify = FALSE then a nested list of datasets will always be returned, i.e. each element at the top level and second level correspond to a dataset, e.g. results[["Dataset1"]][["Dataset2"]] indicates an analysis where modules discovered in "Dataset1" are assessed for preservation in "Dataset2". Dataset comparisons which have not been assessed will contain NULL.
References
1. Ritchie, S.C., et al., A scalable permutation approach reveals replication and preservation patterns of network modules in large datasets. Cell Systems. 3, 71-82 (2016).
2. Langfelder, P., Luo, R., Oldham, M. C. & Horvath, S. Is my network module preserved and reproducible? PLoS Comput. Biol. 7, e1001057 (2011).
3. Phipson, B. & Smyth, G. K. Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn. Stat. Appl. Genet. Mol. Biol. 9, Article39 (2010).
4. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).
See Also
Functions for: visualising network modules, calculating module topology, calculating permutation test P-values, and splitting computation over multiple machines.
Examples
# load in example data, correlation, and network matrices for a discovery and test dataset:
data("NetRep")

# Set up input lists for each input matrix type across datasets. The list
# elements can have any names, so long as they are consistent between the
# inputs.
network_list <- list(discovery=discovery_network, test=test_network)
data_list <- list(discovery=discovery_data, test=test_data)
correlation_list <- list(discovery=discovery_correlation, test=test_correlation)
labels_list <- list(discovery=module_labels)

# Assess module preservation: you should run at least 10,000 permutations
preservation <- modulePreservation(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, nPerm=1000, discovery="discovery",
  test="test", nThreads=2
)
Calculate the topological properties for a network module
Description
Calculates the network properties used to assess module preservation for one or more modules in a user specified dataset.
Usage
networkProperties(
  network, data, correlation, moduleAssignments = NULL, modules = NULL,
  backgroundLabel = "0", discovery = NULL, test = NULL, simplify = TRUE,
  verbose = TRUE
)
Arguments
network | a list of interaction networks, one for each dataset. Each entry of the list should be a |
data | a list of matrices, one for each dataset. Each entry of the list should be the data used to infer the interaction |
correlation | a list of matrices, one for each dataset. Each entry of the list should be a |
moduleAssignments | a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset. |
modules | a list of vectors, one for each |
backgroundLabel | a single label given to nodes that do not belong to any module in the |
discovery | a vector of names or indices denoting the discovery dataset(s) in the |
test | a list of vectors, one for each |
simplify | logical; if |
verbose | logical; should progress be reported? Default is |
Details
Input data structures:
The preservation of network modules in a second dataset is quantified by measuring the preservation of topological properties between the discovery and test datasets. These properties are calculated not only from the interaction networks inferred in each dataset, but also from the data used to infer those networks (e.g. gene expression data) as well as the correlation structure between variables/nodes. Thus, all functions in the NetRep package have the following arguments:
network: a list of interaction networks, one for each dataset.
data: a list of data matrices used to infer those networks, one for each dataset.
correlation: a list of matrices containing the pairwise correlation coefficients between variables/nodes in each dataset.
moduleAssignments: a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.
modules: a list of vectors, one for each discovery dataset, containing the names of the modules from that dataset to analyse.
discovery: a vector indicating the names or indices of the previous arguments' lists to use as the discovery dataset(s) for the analyses.
test: a list of vectors, one vector for each discovery dataset, containing the names or indices of the network, data, and correlation argument lists to use as the test dataset(s) for the analysis of each discovery dataset.
The formatting of these arguments is not strict: each function will attempt to make sense of the user input. For example, if there is only one discovery dataset, then input to the moduleAssignments and test arguments may be vectors, rather than lists. If the network properties are being calculated within the discovery or test datasets, then the discovery and test arguments do not need to be specified, and the input matrices for the network, data, and correlation arguments do not need to be wrapped in a list.
Analysing large datasets:
Matrices in the network, data, and correlation lists can be supplied as disk.matrix objects. This class allows matrix data to be kept on disk and loaded as required by NetRep. This dramatically decreases memory usage: the matrices for only one dataset will be kept in RAM at any point in time.
Value
A nested list structure. At the top level, the list has one element per 'discovery' dataset. Each of these elements is a list that has one element per 'test' dataset analysed for that 'discovery' dataset. Each of these elements is a list that has one element per 'modules' specified. Each of these is a list containing the following objects:
'degree': The weighted within-module degree: the sum of edge weights for each node in the module.
'avgWeight': The average edge weight within the module.
If the 'data' used to infer the 'test' network is provided then the following are also returned:
'summary': A vector summarising the module across each sample. This is calculated as the first eigenvector of the module from a principal component analysis.
'contribution': The node contribution: the similarity between each node and the module summary profile ('summary').
'coherence': The proportion of module variance explained by the 'summary' vector.
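The five properties listed above can be approximated in base R for a single module. This is a rough sketch under stated assumptions rather than NetRep's implementation: the data are simulated, the edge weights are taken to be absolute correlations, self-edges are excluded from the degree, and "similarity" is taken to mean Pearson correlation.

```r
set.seed(1)

# Hypothetical module: 20 samples (rows) by 5 nodes (columns).
n_samples <- 20
shared_signal <- rnorm(n_samples)
module_data <- sapply(1:5, function(i) shared_signal + rnorm(n_samples, sd = 0.5))
colnames(module_data) <- paste0("node", 1:5)

# Hypothetical network: absolute correlation used as the edge weight.
module_net <- abs(cor(module_data))

# 'degree': sum of edge weights for each node, excluding the self-edge.
diag(module_net) <- 0
degree <- rowSums(module_net)

# 'avgWeight': mean of the off-diagonal edge weights.
n_nodes <- ncol(module_net)
avg_weight <- sum(module_net) / (n_nodes * (n_nodes - 1))

# 'summary': first left singular vector of the column-scaled data, which
# matches the first principal component up to scale and sign.
scaled <- scale(module_data)
decomp <- svd(scaled)
summary_profile <- decomp$u[, 1]

# Orient the summary so it correlates positively with the average node
# profile (an assumption made here for readability).
if (cor(summary_profile, rowMeans(scaled)) < 0) {
  summary_profile <- -summary_profile
}

# 'contribution': correlation between each node and the summary profile.
contribution <- as.vector(cor(scaled, summary_profile))
names(contribution) <- colnames(module_data)

# 'coherence': proportion of variance explained by the summary profile.
coherence <- decomp$d[1]^2 / sum(decomp$d^2)
```

With column-scaled data the coherence computed this way equals the mean of the squared contributions, which gives a quick consistency check between the two quantities.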
When simplify = TRUE then the simplest possible structure will be returned. E.g. if the network properties are requested for only one module in only one dataset, then the returned list will have only the above elements.
When simplify = FALSE then a nested list of datasets will always be returned, i.e. each element at the top level and second level corresponds to a dataset, and each element at the third level will correspond to modules discovered in the dataset specified at the top level if module labels are provided in the corresponding moduleAssignments list element. E.g. results[["Dataset1"]][["Dataset2"]][["module1"]] will contain the properties of "module1" as calculated in "Dataset2", where "module1" was identified in "Dataset1". Modules and datasets for which calculation of the network properties has not been requested will contain NULL.
See Also
Getting nodes ordered by degree, and ordering samples by module summary.
Examples
# load in example data, correlation, and network matrices for a discovery and test dataset:
data("NetRep")

# Set up input lists for each input matrix type across datasets. The list
# elements can have any names, so long as they are consistent between the
# inputs.
network_list <- list(discovery=discovery_network, test=test_network)
data_list <- list(discovery=discovery_data, test=test_data)
correlation_list <- list(discovery=discovery_correlation, test=test_correlation)
labels_list <- list(discovery=module_labels)

# Calculate the topological properties of all network modules in the discovery dataset
props <- networkProperties(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list
)

# Calculate the topological properties in the test dataset for the same modules
test_props <- networkProperties(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, discovery="discovery", test="test"
)
Order nodes in descending order of weighted degree and order modules by the similarity of their summary vectors.
Description
Order nodes in descending order of weighted degree and order modules by the similarity of their summary vectors.
Usage
nodeOrder(
  network, data, correlation, moduleAssignments = NULL, modules = NULL,
  backgroundLabel = "0", discovery = NULL, test = NULL, na.rm = FALSE,
  orderModules = TRUE, mean = FALSE, simplify = TRUE, verbose = TRUE
)
Arguments
network | a list of interaction networks, one for each dataset. Each entry of the list should be a |
data | a list of matrices, one for each dataset. Each entry of the list should be the data used to infer the interaction |
correlation | a list of matrices, one for each dataset. Each entry of the list should be a |
moduleAssignments | a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset. |
modules | a list of vectors, one for each |
backgroundLabel | a single label given to nodes that do not belong to any module in the |
discovery | a vector of names or indices denoting the discovery dataset(s) in the |
test | a list of vectors, one for each |
na.rm | logical; If |
orderModules | logical; if |
mean | logical; if |
simplify | logical; if |
verbose | logical; should progress be reported? Default is |
Details
Input data structures:
The preservation of network modules in a second dataset is quantified by measuring the preservation of topological properties between the discovery and test datasets. These properties are calculated not only from the interaction networks inferred in each dataset, but also from the data used to infer those networks (e.g. gene expression data) as well as the correlation structure between variables/nodes. Thus, all functions in the NetRep package have the following arguments:
network: a list of interaction networks, one for each dataset.
data: a list of data matrices used to infer those networks, one for each dataset.
correlation: a list of matrices containing the pairwise correlation coefficients between variables/nodes in each dataset.
moduleAssignments: a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.
modules: a list of vectors, one for each discovery dataset, containing the names of the modules from that dataset to analyse.
discovery: a vector indicating the names or indices of the previous arguments' lists to use as the discovery dataset(s) for the analyses.
test: a list of vectors, one vector for each discovery dataset, containing the names or indices of the network, data, and correlation argument lists to use as the test dataset(s) for the analysis of each discovery dataset.
The formatting of these arguments is not strict: each function will attempt to make sense of the user input. For example, if there is only one discovery dataset, then input to the moduleAssignments and test arguments may be vectors, rather than lists. If the node order is being calculated within the discovery or test datasets, then the discovery and test arguments do not need to be specified, and the input matrices for the network, data, and correlation arguments do not need to be wrapped in a list.
Analysing large datasets:
Matrices in the network, data, and correlation lists can be supplied as disk.matrix objects. This class allows matrix data to be kept on disk and loaded as required by NetRep. This dramatically decreases memory usage: the matrices for only one dataset will be kept in RAM at any point in time.
Mean weighted degree:
When multiple 'test' datasets are specified and 'mean' is TRUE, then the order of nodes will be determined by the average of each node's weighted degree across datasets. The weighted degree in each dataset is scaled to the node with the maximum weighted degree in that module in that dataset: this prevents differences in average edge weight across datasets from influencing the outcome (otherwise the mean would be weighted by the overall density of connections in the module). Thus, the mean weighted degree is a robust measure of a node's relative importance to a module across datasets. The mean is calculated with 'na.rm=TRUE': where a node is missing it does not contribute to the mean.
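The averaging described above can be sketched in a few lines of base R. The dataset names and degree values are invented, and the code illustrates the idea rather than reproducing nodeOrder itself.

```r
# Hypothetical weighted degrees for one module in two test datasets.
# Node "d" is missing from the second dataset.
degree_by_dataset <- list(
  cohort1 = c(a = 4.0, b = 2.0, c = 1.0, d = 2.0),
  cohort2 = c(a = 0.4, b = 0.5, c = 0.1, d = NA)
)

# Scale each dataset to its own maximum so that overall edge weight
# density does not dominate the average.
scaled <- sapply(degree_by_dataset, function(deg) deg / max(deg, na.rm = TRUE))

# Average across datasets; missing nodes do not contribute to the mean.
mean_degree <- rowMeans(scaled, na.rm = TRUE)

# Nodes in descending order of mean scaled weighted degree.
node_order <- names(sort(mean_degree, decreasing = TRUE))
print(node_order)
```

Without the per-dataset scaling, cohort1 would dominate the average simply because its edge weights are roughly ten times larger.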
Value
A nested list structure. At the top level, the list has one element per 'discovery' dataset. Each of these elements is a list that has one element per 'test' dataset analysed for that 'discovery' dataset. Each of these elements is a list that has one element per 'modules' specified, containing a vector of node names for the requested module. When simplify = TRUE then the simplest possible structure will be returned. E.g. if the node ordering is requested for module(s) in only one dataset, then a single vector of node labels will be returned.
When simplify = FALSE then a nested list of datasets will always be returned, i.e. each element at the top level and second level corresponds to a dataset, and each element at the third level will correspond to modules discovered in the dataset specified at the top level if module labels are provided in the corresponding moduleAssignments list element. E.g. results[["Dataset1"]][["Dataset2"]][["module1"]] will contain the order of nodes calculated in "Dataset2", where "module1" was identified in "Dataset1". Modules and datasets for which calculation of the node order has not been requested will contain NULL.
References
Langfelder, P., Mischel, P. S. & Horvath, S. When is hub gene selection better than standard meta-analysis? PLoS One 8, e61505 (2013).
See Also
Examples
# load in example data, correlation, and network matrices for a discovery
# and test dataset:
data("NetRep")

# Set up input lists for each input matrix type across datasets. The list
# elements can have any names, so long as they are consistent between the
# inputs.
network_list <- list(discovery=discovery_network, test=test_network)
data_list <- list(discovery=discovery_data, test=test_data)
correlation_list <- list(discovery=discovery_correlation, test=test_correlation)
labels_list <- list(discovery=module_labels)

# Sort modules by similarity and nodes within each module by their weighted
# degree
nodes <- nodeOrder(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list
)
Template parameters
Description
Template parameters to be imported into other function documentation. This is not intended to be a stand-alone help file.
Arguments
orderModules | logical; if |
Permutation test P-values for module preservation statistics
Description
Evaluates the statistical significance of each module preservation test statistic for one or more modules.
Usage
permutationTest(
  nulls, observed, nVarsPresent, totalSize, alternative = "greater"
)
Arguments
nulls | a 3-dimensional matrix where the columns correspond to module preservation statistics, rows correspond to modules, and the third dimension to null distribution observations drawn from the permutation procedure in |
observed | a matrix of observed values for each module preservation statistic (columns) for each module (rows) returned from |
nVarsPresent | a vector containing the number of variables/nodes in each module that were present in the test dataset. Returned as a list element of the same name by |
totalSize | the size of the test network used to perform the test. Returned as a list element of the same name by |
alternative | a character string specifying the alternative hypothesis, must be one of "greater" (default), "less", or "two.sided". You can specify just the initial letter. |
Details
Calculates exact p-values for permutation tests when permutations are randomly drawn with replacement using the permp function in the statmod package.
This function may be useful for re-calculating permutation test P-values, for example when there are missing values due to sparse data. In this case the user may decide that these missing values should be assigned 0 so that P-values aren't significant purely due to many incalculable statistics leading to low power.
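To make the missing-value scenario concrete, the following base R sketch fills incalculable null statistics with 0 and then computes a one-sided P-value. It uses the simple (b + 1) / (m + 1) estimator discussed by Phipson & Smyth as a stand-in, not the exact calculation performed by permp, and the helper name and simulated values are assumptions made for illustration.

```r
set.seed(42)

# Hypothetical null distribution for one statistic in one module, with a
# few permutations where the statistic could not be calculated.
null_values <- rnorm(1000)
null_values[c(5, 50, 500)] <- NA
observed_value <- 2.5

# Option described above: treat incalculable null statistics as 0.
null_filled <- ifelse(is.na(null_values), 0, null_values)

# One-sided ("greater") P-value: b is the number of null values at least
# as large as the observed value, m is the number of permutations.
# Adding 1 to both means the result can never be zero.
perm_p_greater <- function(observed, nulls) {
  b <- sum(nulls >= observed)
  (b + 1) / (length(nulls) + 1)
}

p_value <- perm_p_greater(observed_value, null_filled)
```

Dropping the NA values instead of filling them would shrink m, which is the loss of power the text warns about.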
References
Phipson, B. & Smyth, G. K. Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn. Stat. Appl. Genet. Mol. Biol. 9, Article39 (2010).
Examples
data("NetRep")

# Set up input lists for each input matrix type across datasets. The list
# elements can have any names, so long as they are consistent between the
# inputs.
network_list <- list(discovery=discovery_network, test=test_network)
data_list <- list(discovery=discovery_data, test=test_data)
correlation_list <- list(discovery=discovery_correlation, test=test_correlation)
labels_list <- list(discovery=module_labels)

# Note that we recommend running at least 10,000 permutations to make sure
# that the null distributions are representative.
preservation <- modulePreservation(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, nPerm=1000, discovery="discovery",
  test="test"
)

# Re-calculate the permutation test P-values
p.values <- permutationTest(
  preservation$nulls, preservation$observed, preservation$nVarsPresent,
  preservation$totalSize, preservation$alternative
)
Plot the topology of a network module
Description
Plot the correlation structure, network edges, scaled weighted degree, node contribution, module data, and module summary vectors of one or more network modules.
Individual components of the module plot can be plotted using plotCorrelation, plotNetwork, plotDegree, plotContribution, plotData, and plotSummary.
Usage
plotModule(
  network, data, correlation, moduleAssignments = NULL, modules = NULL,
  backgroundLabel = "0", discovery = NULL, test = NULL, verbose = TRUE,
  orderSamplesBy = NULL, orderNodesBy = NULL, orderModules = TRUE,
  plotNodeNames = TRUE, plotSampleNames = TRUE, plotModuleNames = NULL,
  main = "Module Topology", main.line = 1, drawBorders = FALSE, lwd = 1,
  naxt.line = -0.5, saxt.line = -0.5, maxt.line = NULL, xaxt.line = -0.5,
  xaxt.tck = -0.025, xlab.line = 2.5, yaxt.line = 0, yaxt.tck = -0.15,
  ylab.line = 2.5, laxt.line = 2.5, laxt.tck = 0.04, cex.axis = 0.8,
  legend.main.line = 1.5, cex.lab = 1.2, cex.main = 2, dataCols = NULL,
  dataRange = NULL, corCols = correlation.palette(), corRange = c(-1, 1),
  netCols = network.palette(), netRange = c(0, 1), degreeCol = "#feb24c",
  contribCols = c("#A50026", "#313695"),
  summaryCols = c("#1B7837", "#762A83"), naCol = "#bdbdbd", dryRun = FALSE
)
Arguments
network | a list of interaction networks, one for each dataset. Each entry of the list should be a |
data | a list of matrices, one for each dataset. Each entry of the list should be the data used to infer the interaction |
correlation | a list of matrices, one for each dataset. Each entry of the list should be a |
moduleAssignments | a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset. |
modules | a list of vectors, one for each |
backgroundLabel | a single label given to nodes that do not belong to any module in the |
discovery | a vector of names or indices denoting the discovery dataset(s) in the |
test | a list of vectors, one for each |
verbose | logical; should progress be reported? Default is |
orderSamplesBy |
|
orderNodesBy |
|
orderModules | logical; if |
plotNodeNames | logical; controls whether the node names are drawn on the bottom axis. |
plotSampleNames | logical; controls whether the sample names are drawn on the left axis. |
plotModuleNames | logical; controls whether module names are drawn. The default is for module names to be drawn when multiple |
main | title for the plot. |
main.line | the number of lines into the top margin at which the plot title will be drawn. |
drawBorders | logical; if |
lwd | line width for borders and axes. |
naxt.line | the number of lines into the bottom margin at which the node names will be drawn. |
saxt.line | the number of lines into the left margin at which the sample names will be drawn. |
maxt.line | the number of lines into the bottom margin at which the module names will be drawn. |
xaxt.line | the number of lines into the bottom margin at which the x-axis tick labels will be drawn on the module summary bar plot. |
xaxt.tck | the size of the x-axis ticks for the module summary bar plot. |
xlab.line | the number of lines into the bottom margin at which the x axis label on the module summary bar plot(s) will be drawn. |
yaxt.line | the number of lines into the left margin at which the y-axis tick labels will be drawn on the weighted degree and node contribution bar plots. |
yaxt.tck | the size of the y-axis ticks for the weighted degree and node contribution bar plots. |
ylab.line | the number of lines into the left margin at which the y axis labels on the weighted degree and node contribution bar plots will be drawn. |
laxt.line | the distance from the legend to draw the legend axis labels, as multiple of |
laxt.tck | size of the ticks on each axis legend relative to the size of the correlation, edge weights, and data matrix heatmaps. |
cex.axis | relative size of the node and sample names. |
legend.main.line | the distance from the legend to draw the legend title. |
cex.lab | relative size of the module names and legend titles. |
cex.main | relative size of the plot titles. |
dataCols | a character vector of colors to create a gradient from for the data heatmap (see details). Automatically determined if |
dataRange | the range of values to map to the |
corCols | a character vector of colors to create a gradient from for the correlation structure heatmap (see details). |
corRange | the range of values to map to the |
netCols | a character vector of colors to create a gradient from for the network edge weight heatmap (see details). |
netRange | the range of values to map to the |
degreeCol | color to use for the weighted degree bar plot. |
contribCols | color(s) to use for the node contribution bar plot (see details). |
summaryCols | color(s) to use for the module summary bar plot (see details). |
naCol | color to use for missing nodes and samples on the data, correlation structure, and network edge weight heat maps. |
dryRun | logical; if |
Details
Input data structures:
The preservation of network modules in a second dataset is quantified by measuring the preservation of topological properties between the discovery and test datasets. These properties are calculated not only from the interaction networks inferred in each dataset, but also from the data used to infer those networks (e.g. gene expression data) as well as the correlation structure between variables/nodes. Thus, all functions in the NetRep package have the following arguments:
network: a list of interaction networks, one for each dataset.
data: a list of data matrices used to infer those networks, one for each dataset.
correlation: a list of matrices containing the pairwise correlation coefficients between variables/nodes in each dataset.
moduleAssignments: a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.
modules: a list of vectors, one for each discovery dataset, containing the names of the modules from that dataset to analyse.
discovery: a vector indicating the names or indices of the previous arguments' lists to use as the discovery dataset(s) for the analyses.
test: a list of vectors, one vector for each discovery dataset, containing the names or indices of the network, data, and correlation argument lists to use as the test dataset(s) for the analysis of each discovery dataset.
The formatting of these arguments is not strict: each function will attempt to make sense of the user input. For example, if there is only one discovery dataset, then input to the moduleAssignments and test arguments may be vectors, rather than lists. If the node and sample ordering is being calculated within the same dataset being visualised, then the discovery and test arguments do not need to be specified, and the input matrices for the network, data, and correlation arguments do not need to be wrapped in a list.
Analysing large datasets:
Matrices in the network, data, and correlation lists can be supplied as disk.matrix objects. This class allows matrix data to be kept on disk and loaded as required by NetRep. This dramatically decreases memory usage: the matrices for only one dataset will be kept in RAM at any point in time.
Node, sample, and module ordering:
By default, nodes are ordered in decreasing order of weighted degree in the discovery dataset (see nodeOrder). Missing nodes are colored in grey. This facilitates the visual comparison of modules across datasets, as the node ordering will be preserved.
Alternatively, a vector containing the names or indices of one or more datasets can be provided to the orderNodesBy argument.
If a single dataset is provided, then nodes will be ordered in decreasing order of weighted degree in that dataset. Only nodes that are present in this dataset will be drawn when ordering nodes by a dataset that is not the discovery dataset for the requested module(s).
If multiple datasets are provided then the weighted degree will be averaged across these datasets (see nodeOrder for more details). This is useful for obtaining a robust ordering of nodes by relative importance, assuming the modules displayed are preserved in those datasets.
Ordering of nodes by weighted degree can be suppressed by setting orderNodesBy to NA, in which case nodes will be ordered as in the matrices provided in the network, data, and correlation arguments.
When multiple modules are drawn, modules are ordered by the similarity of their summary vectors in the dataset(s) specified in the orderNodesBy argument. If multiple datasets are provided to the orderNodesBy argument then the module summary vectors are concatenated across datasets.
By default, samples in the data heatmap and accompanying module summary bar plot are ordered in descending order of module summary in the drawn dataset (specified by the test argument). If multiple modules are drawn, samples are ordered as per the left-most module on the plot.
Alternatively, a vector containing the name or index of another dataset may be provided to the orderSamplesBy argument. In this case, samples will be ordered in descending order of module summary in the specified dataset. This is useful when comparing different measurements across samples, for example, gene expression data obtained from multiple tissue samples across the same individuals. If the dataset specified is the discovery dataset, then missing samples will be displayed as horizontal grey bars. If the dataset specified is one of the other datasets, samples present in both the specified dataset and the test dataset will be displayed first in order of the specified dataset, then samples present in only the test dataset will be displayed underneath a horizontal black line ordered by their module summary vector in the test dataset.
Ordering of samples by module summary can be suppressed by setting orderSamplesBy to NA, in which case samples will be ordered as in the matrix provided to the data argument for the drawn dataset.
Weighted degree scaling:
When drawn on a plot, the weighted degree of each node is scaled to the maximum weighted degree within its module. The scaled weighted degree is a measure of the relative importance of each node to its module. This makes visualisation of multiple modules with different sizes and densities possible. However, the scaled weighted degree should only be interpreted for groups of nodes that have an apparent module structure.
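The per-module scaling can be illustrated with base R's ave(). The degrees and labels below are invented, and the snippet is a sketch of the idea rather than the plotting code itself.

```r
# Hypothetical weighted degrees and module labels for six nodes.
weighted_degree <- c(a = 10, b = 5, c = 2, d = 0.8, e = 0.4, f = 0.2)
node_modules    <- c(a = "1", b = "1", c = "1", d = "2", e = "2", f = "2")

# Divide each node's weighted degree by the maximum within its own module.
module_max <- ave(weighted_degree, node_modules, FUN = max)
scaled_degree <- weighted_degree / module_max
```

After scaling, the most connected node in each module has a value of 1, so a dense module and a sparse module can share the same bar plot axis.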
Plot layout and device size
For optimal results we recommend viewing single modules on a PNG device with a width of 1500, a height of 2700 and a nominal resolution of 300 (png(filename, width=5*300, height=9*300, res=300)).
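A minimal sketch of opening a device with the recommended size follows. The temporary file and the placeholder plot are assumptions made so the snippet runs on its own; in practice the plot call would be plotModule.

```r
# Write to a temporary file; replace with a real path when saving a figure.
out_file <- tempfile(fileext = ".png")

# 5 x 9 inches at 300 dpi gives 1500 x 2700 pixels.
png(out_file, width = 5 * 300, height = 9 * 300, res = 300)

# Placeholder plot standing in for a call to plotModule().
plot(1:10, main = "Placeholder")

# Close the device so the file is written to disk.
dev.off()
```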
Warning: PDF and other vectorized devices should not be used when plotting more than a hundred nodes. Large files will be generated which may cause image editing programs such as Inkscape or Illustrator to crash when polishing figures for publication.
When dryRun is TRUE only the axes, legends, labels, and title will be drawn, allowing for quick iteration of customisable parameters to get the plot layout correct.
If axis labels or legends are drawn off screen then the margins of the plot should be adjusted prior to plotting using the par command to increase the margin size (see the "mar" option in the par help page).
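Adjusting the margins before plotting might look like the sketch below. The margin values are arbitrary, and the null PDF device is used only so the snippet runs without creating a file.

```r
# Open a null PDF device so the sketch runs without writing a file.
pdf(NULL)

# Enlarge the margins (bottom, left, top, right), keeping the old settings.
old_par <- par(mar = c(6, 6, 4, 4))
enlarged <- par("mar")

# A call such as plotModule(..., dryRun = TRUE) would go here to check
# that labels and legends now fit.

# Restore the previous margins and close the device.
par(old_par)
restored <- par("mar")
dev.off()
```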
The size of text labels can be modified by increasing or decreasing the cex.main, cex.lab, and cex.axis arguments:
cex.main: controls the size of the plot title (specified in the main argument).
cex.lab: controls the size of the axis labels on the weighted degree, node contribution, and module summary bar plots as well as the size of the module labels and the heatmap legend titles.
cex.axis: controls the size of the axis tick labels, including the node and sample labels.
The position of these labels can be changed through the following arguments:
xaxt.line: controls the distance from the plot the x-axis tick labels are drawn on the module summary bar plot.
xlab.line: controls the distance from the plot the x-axis label is drawn on the module summary bar plot.
yaxt.line: controls the distance from the plot the y-axis tick labels are drawn on the weighted degree and node contribution bar plots.
ylab.line: controls the distance from the plot the y-axis label is drawn on the weighted degree and node contribution bar plots.
main.line: controls the distance from the plot the title is drawn.
naxt.line: controls the distance from the plot the node labels are drawn.
saxt.line: controls the distance from the plot the sample labels are drawn.
maxt.line: controls the distance from the plot the module labels are drawn.
laxt.line: controls the distance from the heatmap legends that the gradient legend labels are drawn.
legend.main.line: controls the distance from the heatmap legends that the legend title is drawn.
The rendering of node, sample, and module names can be disabled by setting plotNodeNames, plotSampleNames, and plotModuleNames to FALSE.
The size of the axis ticks can be changed by increasing or decreasing the following arguments:
xaxt.tck: size of the x-axis ticks as a multiple of the height of the module summary bar plot.
yaxt.tck: size of the y-axis ticks as a multiple of the width of the weighted degree or node contribution bar plots.
laxt.tck: size of the heatmap legend axis ticks as a multiple of the width of the data, correlation structure, or network edge weight heatmaps.
The drawBorders argument controls whether borders are drawn around the weighted degree, node contribution, or module summary bar plots. The lwd argument controls the thickness of these borders, as well as the thickness of axes and axis ticks.
Modifying the color palettes:
The dataCols and dataRange arguments control the appearance of the data heatmap (see plotData). The gradient of colors used on the heatmap can be changed by specifying a vector of colors to interpolate between in dataCols, and dataRange specifies the range of values that maps to this gradient. Values outside of the specified dataRange will be rendered with the colors used at either extreme of the gradient. The default gradient is determined based on the data shown on the plot. If all values in the data matrix are positive, then the gradient is interpolated between white and green, where white is used for the smallest value and green for the largest. If all values are negative, then the gradient is interpolated between purple and white, where purple is used for the smallest value and white for the value closest to zero. If the data contains both positive and negative values, then the gradient is interpolated between purple, white, and green, where white is used for values of zero. In this case the range shown is always centered at zero, with the values at either extreme determined by the value in the rendered data with the strongest magnitude (the maximum of the absolute value).
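The default choice of gradient described above can be sketched as a small helper. The function name is hypothetical, and the specific green and purple hex codes are assumptions borrowed from the summaryCols defaults rather than the colors the package necessarily uses for the data heatmap.

```r
# Hypothetical helper that picks a gradient and range from the data.
default_data_gradient <- function(values, green = "#1B7837", purple = "#762A83") {
  values <- values[!is.na(values)]
  if (all(values >= 0)) {
    # All positive: white (smallest) to green (largest).
    list(cols = c("white", green), range = range(values))
  } else if (all(values <= 0)) {
    # All negative: purple (smallest) to white (closest to zero).
    list(cols = c(purple, "white"), range = range(values))
  } else {
    # Mixed signs: symmetric around zero so that white means zero.
    limit <- max(abs(values))
    list(cols = c(purple, "white", green), range = c(-limit, limit))
  }
}

# Mixed-sign example: the range is centered at zero.
mixed <- default_data_gradient(c(-0.5, 0.2, 2))

# Interpolate the chosen colors into a 255-step gradient.
gradient <- colorRampPalette(mixed$cols)(255)
```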
The corCols and corRange arguments control the appearance of the correlation structure heatmap (see plotCorrelation). The gradient of colors used on the heatmap can be changed by specifying a vector of colors to interpolate between in corCols. By default, strong negative correlations are shown in blue, strong positive correlations in red, and weak correlations in white. corRange controls the range of values that this gradient maps to, by default, -1 to 1. Changing this may be useful for showing differences where the range of correlation coefficients is small.
The netCols and netRange arguments control the appearance of the network edge weight heatmap (see plotNetwork). The gradient of colors used on the heatmap can be changed by specifying a vector of colors to interpolate between in netCols. By default, weak or non-edges are shown in white, while strong edges are shown in red. The netRange controls the range of values this gradient maps to, by default, 0 to 1. If netRange is set to NA, then the gradient will be mapped to values between 0 and the maximum edge weight of the shown network.
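How a range maps values onto a gradient, including the clamping of out-of-range values and the NA behaviour described for netRange, can be sketched as follows. The helper name and the white-to-red palette are illustrative assumptions, not the package's own code.

```r
# Map values onto a gradient; values outside value_range take the colour
# at the nearest extreme of the gradient.
map_to_gradient <- function(values, cols, value_range = NA, n_colours = 255) {
  if (all(is.na(value_range))) {
    # Mirror netRange = NA: span 0 to the largest value shown.
    value_range <- c(0, max(values, na.rm = TRUE))
  }
  gradient <- colorRampPalette(cols)(n_colours)
  clamped <- pmin(pmax(values, value_range[1]), value_range[2])
  position <- (clamped - value_range[1]) / diff(value_range)
  gradient[1 + round(position * (n_colours - 1))]
}

edge_weights <- c(0, 0.1, 0.25, 0.5)

# Fixed 0 to 1 range: the strongest edge sits mid-gradient.
fixed_range <- map_to_gradient(edge_weights, c("white", "red"), c(0, 1))

# NA range: the strongest edge takes the colour at the top of the gradient.
auto_range <- map_to_gradient(edge_weights, c("white", "red"))
```

Narrowing the range in this way is what makes small differences visible when all edge weights or correlations fall in a small interval.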
The degreeCol argument controls the color of the weighted degree bar plot (see plotDegree).
The contribCols argument controls the color of the node contribution bar plot (see plotContribution). This can be specified as a single value to be used for all nodes, or as two colors: one to use for nodes with positive contributions and one to use for nodes with negative contributions.
The summaryCols argument controls the color of the module summary bar plot (see plotSummary). This can be specified as a single value to be used for all samples, or as two colors: one to use for samples with a positive module summary value and one for samples with a negative module summary value.
ThenaCol argument controls the color of missing nodes and sampleson the data, correlaton structure, and network edge weight heatmaps.
Embedding in Rmarkdown documents
The chunk option fig.keep="last" should be set to avoid an empty plot being embedded above the plot generated by plotModule. This empty plot is generated so that an error will be thrown as early as possible if the margins are too small to be displayed. Normally, these are drawn over with the actual plot components when drawing the plot on other graphical devices.
See Also
plotCorrelation, plotNetwork, plotDegree, plotContribution, plotData, and plotSummary.
Examples
# load in example data, correlation, and network matrices for a discovery
# and test dataset:
data("NetRep")

# Set up input lists for each input matrix type across datasets. The list
# elements can have any names, so long as they are consistent between the
# inputs.
network_list <- list(discovery=discovery_network, test=test_network)
data_list <- list(discovery=discovery_data, test=test_data)
correlation_list <- list(discovery=discovery_correlation, test=test_correlation)
labels_list <- list(discovery=module_labels)

# Plot module 1, 2 and 4 in the discovery dataset
plotModule(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 2, 4)
)

# Now plot them in the test dataset (module 2 does not replicate)
plotModule(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 2, 4), discovery="discovery",
  test="test"
)

# Plot modules 1 and 4, which replicate, in the test dataset ordering nodes
# by weighted degree averaged across the two datasets
plotModule(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 4), discovery="discovery",
  test="test", orderNodesBy=c("discovery", "test")
)
Plot a topological feature of network module
Description
Functions for plotting the topology of a network module.
Usage
plotData(
  network, data, correlation, moduleAssignments = NULL, modules = NULL,
  backgroundLabel = "0", discovery = NULL, test = NULL, verbose = TRUE,
  orderSamplesBy = NULL, orderNodesBy = NULL, orderModules = TRUE,
  plotNodeNames = TRUE, plotSampleNames = TRUE, plotModuleNames = NULL,
  main = "", main.line = 1, lwd = 1, plotLegend = TRUE,
  legend.main = "Data", legend.main.line = 1, naxt.line = -0.5,
  saxt.line = -0.5, maxt.line = 3, legend.position = 0.15, laxt.tck = 0.03,
  laxt.line = 3, cex.axis = 0.8, cex.lab = 1.2, cex.main = 2,
  dataCols = NULL, dataRange = NULL, naCol = "#bdbdbd", dryRun = FALSE
)

plotCorrelation(
  network, data, correlation, moduleAssignments = NULL, modules = NULL,
  backgroundLabel = "0", discovery = NULL, test = NULL, verbose = TRUE,
  orderNodesBy = NULL, symmetric = FALSE, orderModules = TRUE,
  plotNodeNames = TRUE, plotModuleNames = NULL, main = "", main.line = 1,
  lwd = 1, plotLegend = TRUE, legend.main = "Correlation",
  legend.main.line = 1, naxt.line = -0.5, maxt.line = 3,
  legend.position = NULL, laxt.tck = NULL, laxt.line = NULL,
  cex.axis = 0.8, cex.lab = 1.2, cex.main = 2,
  corCols = correlation.palette(), corRange = c(-1, 1),
  naCol = "#bdbdbd", dryRun = FALSE
)

plotNetwork(
  network, data, correlation, moduleAssignments = NULL, modules = NULL,
  backgroundLabel = "0", discovery = NULL, test = NULL, verbose = TRUE,
  orderNodesBy = NULL, symmetric = FALSE, orderModules = TRUE,
  plotNodeNames = TRUE, plotModuleNames = NULL, main = "", main.line = 1,
  lwd = 1, plotLegend = TRUE, legend.main = "Edge weight",
  legend.main.line = 1, naxt.line = -0.5, maxt.line = 3,
  legend.position = NULL, laxt.tck = NULL, laxt.line = NULL,
  cex.axis = 0.8, cex.lab = 1.2, cex.main = 2,
  netCols = network.palette(), netRange = c(0, 1),
  naCol = "#bdbdbd", dryRun = FALSE
)

plotContribution(
  network, data, correlation, moduleAssignments = NULL, modules = NULL,
  backgroundLabel = "0", discovery = NULL, test = NULL, verbose = TRUE,
  orderNodesBy = NULL, orderModules = TRUE, plotNodeNames = TRUE,
  plotModuleNames = NULL, main = "", main.line = 1, ylab.line = 2.5,
  lwd = 1, drawBorders = FALSE, naxt.line = -0.5, maxt.line = 3,
  yaxt.line = 0, yaxt.tck = -0.035, cex.axis = 0.8, cex.lab = 1.2,
  cex.main = 2, contribCols = c("#A50026", "#313695"),
  naCol = "#bdbdbd", dryRun = FALSE
)

plotDegree(
  network, data, correlation, moduleAssignments = NULL, modules = NULL,
  backgroundLabel = "0", discovery = NULL, test = NULL, verbose = TRUE,
  orderNodesBy = NULL, orderModules = TRUE, plotNodeNames = TRUE,
  plotModuleNames = NULL, main = "", main.line = 1, lwd = 1,
  drawBorders = FALSE, naxt.line = -0.5, maxt.line = 3, yaxt.line = 0,
  yaxt.tck = -0.035, ylab.line = 2.5, cex.axis = 0.8, cex.lab = 1.2,
  cex.main = 2, degreeCol = "#feb24c", naCol = "#bdbdbd", dryRun = FALSE
)

plotSummary(
  network, data, correlation, moduleAssignments = NULL, modules = NULL,
  backgroundLabel = "0", discovery = NULL, test = NULL, verbose = TRUE,
  orderSamplesBy = NULL, orderNodesBy = NULL, orderModules = TRUE,
  plotSampleNames = TRUE, plotModuleNames = NULL, main = "", main.line = 1,
  xlab.line = 2.5, lwd = 1, drawBorders = FALSE, saxt.line = -0.5,
  maxt.line = 0, xaxt.line = 0, xaxt.tck = -0.025, cex.axis = 0.8,
  cex.lab = 1.2, cex.main = 2, summaryCols = c("#1B7837", "#762A83"),
  naCol = "#bdbdbd", dryRun = FALSE
)
Arguments
network | a list of interaction networks, one for each dataset. Each entry of the list should be a |
data | a list of matrices, one for each dataset. Each entry of the list should be the data used to infer the interaction |
correlation | a list of matrices, one for each dataset. Each entry of the list should be a |
moduleAssignments | a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset. |
modules | a list of vectors, one for each |
backgroundLabel | a single label given to nodes that do not belong to any module in the |
discovery | a vector of names or indices denoting the discovery dataset(s) in the |
test | a list of vectors, one for each |
verbose | logical; should progress be reported? Default is |
orderSamplesBy |
|
orderNodesBy |
|
orderModules | logical; if |
plotNodeNames | logical; controls whether the node names are drawn on the bottom axis. |
plotSampleNames | logical; controls whether the sample names are drawn on the left axis. |
plotModuleNames | logical; controls whether module names are drawn. The default is for module names to be drawn when multiple |
main | title for the plot. |
main.line | the number of lines into the top margin at which the plot title will be drawn. |
lwd | line width for borders and axes. |
plotLegend | logical; controls whether a legend is drawn when using |
legend.main | title for the legend. |
legend.main.line | the distance from the legend to draw the legend title. |
naxt.line | the number of lines into the bottom margin at which the node names will be drawn. |
saxt.line | the number of lines into the left margin at which the sample names will be drawn. |
maxt.line | the number of lines into the bottom margin at which the module names will be drawn. |
legend.position | the distance from the plot to start the legend, as a proportion of the plot width. |
laxt.tck | size of the ticks on each axis legend relative to the size of the correlation, edge weights, and data matrix heatmaps. |
laxt.line | the distance from the legend to draw the legend axis labels, as a multiple of |
cex.axis | relative size of the node and sample names. |
cex.lab | relative size of the module names and legend titles. |
cex.main | relative size of the plot titles. |
dataCols | a character vector of colors to create a gradient from for the data heatmap (see details). Automatically determined if |
dataRange | the range of values to map to the |
naCol | color to use for missing nodes and samples on the data, correlation structure, and network edge weight heat maps. |
dryRun | logical; if |
symmetric | logical; controls whether the correlation and network heatmaps are drawn as symmetric (square) heatmaps or asymmetric triangle heatmaps. If symmetric, then the node and module names will also be rendered on the left axis. |
corCols | a character vector of colors to create a gradient from for the correlation structure heatmap (see details). |
corRange | the range of values to map to the |
netCols | a character vector of colors to create a gradient from for the network edge weight heatmap (see details). |
netRange | the range of values to map to the |
ylab.line | the number of lines into the left margin at which the y axis labels on the weighted degree and node contribution bar plots will be drawn. |
drawBorders | logical; if |
yaxt.line | the number of lines into the left margin at which the y-axis tick labels will be drawn on the weighted degree and node contribution bar plots. |
yaxt.tck | the size of the y-axis ticks for the weighted degree and node contribution bar plots. |
contribCols | color(s) to use for the node contribution bar plot (see details). |
degreeCol | color to use for the weighted degree bar plot. |
xlab.line | the number of lines into the bottom margin at which the x axis label on the module summary bar plot(s) will be drawn. |
xaxt.line | the number of lines into the bottom margin at which the x-axis tick labels will be drawn on the module summary bar plot. |
xaxt.tck | the size of the x-axis ticks for the module summary bar plot. |
summaryCols | color(s) to use for the module summary bar plot (see details). |
Details
Input data structures:
The preservation of network modules in a second dataset is quantified by measuring the preservation of topological properties between the discovery and test datasets. These properties are calculated not only from the interaction networks inferred in each dataset, but also from the data used to infer those networks (e.g. gene expression data) as well as the correlation structure between variables/nodes. Thus, all functions in the NetRep package have the following arguments:
network: a list of interaction networks, one for each dataset.
data: a list of data matrices used to infer those networks, one for each dataset.
correlation: a list of matrices containing the pairwise correlation coefficients between variables/nodes in each dataset.
moduleAssignments: a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.
modules: a list of vectors, one for each discovery dataset, containing the names of the modules from that dataset to analyse.
discovery: a vector indicating the names or indices of the previous arguments' lists to use as the discovery dataset(s) for the analyses.
test: a list of vectors, one vector for each discovery dataset, containing the names or indices of the network, data, and correlation argument lists to use as the test dataset(s) for the analysis of each discovery dataset.
The formatting of these arguments is not strict: each function will attempt to make sense of the user input. For example, if there is only one discovery dataset, then input to the moduleAssignments and test arguments may be vectors, rather than lists. If the node and sample ordering is being calculated within the same dataset being visualised, then the discovery and test arguments do not need to be specified, and the input matrices for the network, data, and correlation arguments do not need to be wrapped in a list.
Analysing large datasets:
Matrices in the network, data, and correlation lists can be supplied as disk.matrix objects. This class allows matrix data to be kept on disk and loaded as required by NetRep. This dramatically decreases memory usage: the matrices for only one dataset will be kept in RAM at any point in time.
Node, sample, and module ordering:
By default, nodes are ordered in decreasing order of weighted degree in the discovery dataset (see nodeOrder). Missing nodes are colored in grey. This facilitates the visual comparison of modules across datasets, as the node ordering will be preserved.
Alternatively, a vector containing the names or indices of one or more datasets can be provided to the orderNodesBy argument.
If a single dataset is provided, then nodes will be ordered in decreasing order of weighted degree in that dataset. Only nodes that are present in this dataset will be drawn when ordering nodes by a dataset that is not the discovery dataset for the requested module(s).
If multiple datasets are provided then the weighted degree will be averaged across these datasets (see nodeOrder for more details). This is useful for obtaining a robust ordering of nodes by relative importance, assuming the modules displayed are preserved in those datasets.
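The idea of averaging weighted degree across datasets can be sketched in base R without calling NetRep. The two toy networks below are hypothetical, and the sketch assumes that the weighted degree is scaled to its maximum within each dataset before averaging; the exact calculation used by the package is documented in nodeOrder and may differ.

```r
nodes <- c("a", "b", "c")

# Toy symmetric edge-weight matrices for one module in two datasets
# (diagonals set to 0 so that self-edges do not contribute)
disc_net <- matrix(c(0, 0.9, 0.5,
                     0.9, 0, 0.2,
                     0.5, 0.2, 0), nrow = 3, dimnames = list(nodes, nodes))
test_net <- matrix(c(0, 0.3, 0.6,
                     0.3, 0, 0.7,
                     0.6, 0.7, 0), nrow = 3, dimnames = list(nodes, nodes))

# Weighted degree scaled to the maximum within the module (sketch assumption)
scaled_degree <- function(net) {
  wd <- rowSums(net)
  wd / max(wd)
}

# Average the scaled weighted degree across datasets, then order nodes
avg_degree <- rowMeans(cbind(scaled_degree(disc_net), scaled_degree(test_net)))
avg_order <- names(sort(avg_degree, decreasing = TRUE))
avg_order
```

In this toy example node "c" has the highest weighted degree in the second network but the lowest in the first, so the averaged ordering differs from the ordering either dataset would give alone.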
Ordering of nodes by weighted degree can be suppressed by setting orderNodesBy to NA, in which case nodes will be ordered as in the matrices provided in the network, data, and correlation arguments.
When multiple modules are drawn, modules are ordered by the similarity of their summary vectors in the dataset(s) specified in the orderNodesBy argument. If multiple datasets are provided to the orderNodesBy argument then the module summary vectors are concatenated across datasets.
By default, samples in the data heatmap and accompanying module summary bar plot are ordered in descending order of module summary in the drawn dataset (specified by the test argument). If multiple modules are drawn, samples are ordered as per the left-most module on the plot.
Alternatively, a vector containing the name or index of another dataset may be provided to the orderSamplesBy argument. In this case, samples will be ordered in descending order of module summary in the specified dataset. This is useful when comparing different measurements across samples, for example, gene expression data obtained from multiple tissue samples across the same individuals. If the dataset specified is the discovery dataset, then missing samples will be displayed as horizontal grey bars. If the dataset specified is one of the other datasets, samples present in both the specified dataset and the test dataset will be displayed first in order of the specified dataset, then samples present in only the test dataset will be displayed underneath a horizontal black line ordered by their module summary vector in the test dataset.
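The ordering rule for samples when orderSamplesBy names a dataset other than the discovery dataset can be illustrated with a small base R sketch. The summary values and sample names here are made up, and the code only mirrors the rule described above; it is not the package's implementation.

```r
# Hypothetical module summary values, named by sample
summary_other <- c(s1 = 0.4, s2 = -0.2, s3 = 0.9)            # dataset given to orderSamplesBy
summary_test  <- c(s2 = 0.1, s3 = 0.5, s4 = -0.3, s5 = 0.7)  # dataset being drawn

# Samples present in both datasets come first, in descending order of the
# module summary in the specified dataset
other_order <- names(sort(summary_other, decreasing = TRUE))
shared <- other_order[other_order %in% names(summary_test)]

# Samples found only in the drawn dataset follow, ordered by their own
# module summary
test_only <- setdiff(names(summary_test), names(summary_other))
test_only <- names(sort(summary_test[test_only], decreasing = TRUE))

sample_order <- c(shared, test_only)
sample_order
```

Sample "s1" is dropped because it does not exist in the drawn dataset, which corresponds to the case where only samples with data can be rendered.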
Ordering of samples by module summary can be suppressed by setting orderSamplesBy to NA, in which case samples will be ordered as in the matrix provided to the data argument for the drawn dataset.
Weighted degree scaling:
When drawn on a plot, the weighted degree of each node is scaled to the maximum weighted degree within its module. The scaled weighted degree is a measure of the relative importance of each node to its module. This makes visualisation of multiple modules with different sizes and densities possible. However, the scaled weighted degree should only be interpreted for groups of nodes that have an apparent module structure.
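As a rough illustration of this scaling, the base R sketch below computes a weighted degree for a toy four-node module and divides by the module maximum. The matrix is hypothetical, and the sketch assumes the weighted degree is the sum of a node's edge weights to the other nodes in the module, excluding the diagonal.

```r
node_names <- paste0("n", 1:4)

# Toy symmetric edge-weight matrix for a single module
net <- matrix(c(1.0, 0.8, 0.6, 0.1,
                0.8, 1.0, 0.4, 0.2,
                0.6, 0.4, 1.0, 0.3,
                0.1, 0.2, 0.3, 1.0),
              nrow = 4, dimnames = list(node_names, node_names))

# Exclude self-edges, then sum each node's edge weights (sketch assumption)
diag(net) <- 0
weighted_degree <- rowSums(net)

# Scale to the maximum within the module: the most connected node becomes 1
scaled <- weighted_degree / max(weighted_degree)
round(scaled, 2)
```

Because every module is rescaled to a maximum of 1, bar heights are comparable in shape across modules but no longer reflect absolute connectivity.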
Plot layout and device size
Although reasonable default values for most parameters have been provided, the rendering of axes and titles may need adjusting depending on the size of the plot window. The dryRun argument is useful for quickly determining whether the plot will render correctly. When dryRun is TRUE only the axes, legends, labels, and title will be drawn, allowing for quick iteration of customisable parameters to get the plot layout correct.
Warning: PDF and other vectorized devices should not be used when plotting the heatmaps with more than a hundred nodes. Large files will be generated which may cause image editing programs such as Inkscape or Illustrator to crash when polishing figures for publication.
If axis labels or legends are drawn off screen then the margins of the plot should be adjusted prior to plotting using the par command to increase the margin size (see the "mar" option in the par help page).
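A minimal sketch of adjusting the margins with par is shown below. The margin values are illustrative only, the NetRep plotting call is left as a comment, and a null PDF device is opened purely so the snippet runs without a display.

```r
# Open a null device so that par() can be called non-interactively
grDevices::pdf(NULL)

# mar is given in lines of text: bottom, left, top, right.
# par() returns the previous settings so they can be restored afterwards.
old_par <- graphics::par(mar = c(7, 7, 4, 5))
current_mar <- graphics::par("mar")

# ... call plotData(), plotDegree(), etc. here, ideally with dryRun = TRUE
# first to check the layout ...

graphics::par(old_par)            # restore the previous margins
invisible(grDevices::dev.off())   # close the null device
```

Restoring the saved settings keeps the larger margins from leaking into later plots in the same session.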
The size of text labels can be modified by increasing or decreasing the cex.main, cex.lab, and cex.axis arguments:
cex.main: controls the size of the plot title (specified in the main argument).
cex.lab: controls the size of the axis labels on the weighted degree, node contribution, and module summary bar plots as well as the size of the module labels and the heatmap legend titles.
cex.axis: controls the size of the axis tick labels, including the node and sample labels.
The position of these labels can be changed through the following arguments:
xaxt.line: controls the distance from the plot the x-axis tick labels are drawn on the module summary bar plot.
xlab.line: controls the distance from the plot the x-axis label is drawn on the module summary bar plot.
yaxt.line: controls the distance from the plot the y-axis tick labels are drawn on the weighted degree and node contribution bar plots.
ylab.line: controls the distance from the plot the y-axis label is drawn on the weighted degree and node contribution bar plots.
main.line: controls the distance from the plot the title is drawn.
naxt.line: controls the distance from the plot the node labels are drawn.
saxt.line: controls the distance from the plot the sample labels are drawn.
maxt.line: controls the distance from the plot the module labels are drawn.
laxt.line: controls the distance from the heatmap legends that the gradient legend labels are drawn.
legend.main.line: controls the distance from the heatmap legends that the legend title is drawn.
The rendering of node, sample, and module names can be disabled by setting plotNodeNames, plotSampleNames, and plotModuleNames to FALSE.
The size of the axis ticks can be changed by increasing or decreasing the following arguments:
xaxt.tck: size of the x-axis ticks as a multiple of the height of the module summary bar plot.
yaxt.tck: size of the y-axis ticks as a multiple of the width of the weighted degree or node contribution bar plots.
laxt.tck: size of the heatmap legend axis ticks as a multiple of the width of the data, correlation structure, or network edge weight heatmaps.
The placement of heatmap legends is controlled by the following arguments:
plotLegend: if FALSE, the legend will not be drawn.
legend.position: a multiple of the plot width, controls the horizontal distance from the plot the legend is drawn.
The drawBorders argument controls whether borders are drawn around the weighted degree, node contribution, or module summary bar plots. The lwd argument controls the thickness of these borders, as well as the thickness of axes and axis ticks.
Modifying the color palettes:
The dataCols and dataRange arguments control the appearance of the data heatmap (see plotData). The gradient of colors used on the heatmap can be changed by specifying a vector of colors to interpolate between in dataCols, and dataRange specifies the range of values that maps to this gradient. Values outside of the specified dataRange will be rendered with the colors used at either extreme of the gradient. The default gradient is determined based on the data shown on the plot. If all values in the data matrix are positive, then the gradient is interpolated between white and green, where white is used for the smallest value and green for the largest. If all values are negative, then the gradient is interpolated between purple and white, where purple is used for the smallest value and white for the value closest to zero. If the data contains both positive and negative values, then the gradient is interpolated between purple, white, and green, where white is used for values of zero. In this case the range shown is always centered at zero, with the values at either extreme determined by the value in the rendered data with the strongest magnitude (the maximum of the absolute value).
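The default rules for the data heatmap can be summarised in a short base R sketch. The helper default_data_scale is hypothetical and the named colors are placeholders for whatever shades the package actually uses; the treatment of zeros in otherwise all-positive or all-negative data is also an assumption of the sketch.

```r
# Hypothetical helper (not part of NetRep): pick a gradient and range for a
# data matrix following the rules described above.
default_data_scale <- function(x) {
  x <- x[!is.na(x)]
  if (all(x >= 0)) {
    # all positive: white (smallest) to green (largest)
    list(cols = c("white", "green"), range = range(x))
  } else if (all(x <= 0)) {
    # all negative: purple (smallest) to white (closest to zero)
    list(cols = c("purple", "white"), range = range(x))
  } else {
    # mixed signs: symmetric range so that white sits exactly at zero
    limit <- max(abs(x))
    list(cols = c("purple", "white", "green"), range = c(-limit, limit))
  }
}

positive_scale <- default_data_scale(c(2, 5))
mixed_scale <- default_data_scale(c(-1, 0.5, 3))
mixed_scale$range
```

Centering the mixed-sign range at zero means a single extreme value widens the range on both sides, which is one reason to set dataRange explicitly when outliers wash out the rest of the heatmap.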
The corCols and corRange arguments control the appearance of the correlation structure heatmap (see plotCorrelation). The gradient of colors used on the heatmap can be changed by specifying a vector of colors to interpolate between in corCols. By default, strong negative correlations are shown in blue, strong positive correlations in red, and weak correlations in white. corRange controls the range of values that this gradient maps to, by default -1 to 1. Changing this may be useful for showing differences where the range of correlation coefficients is small.
The netCols and netRange arguments control the appearance of the network edge weight heatmap (see plotNetwork). The gradient of colors used on the heatmap can be changed by specifying a vector of colors to interpolate between in netCols. By default, weak or non-edges are shown in white, while strong edges are shown in red. The netRange argument controls the range of values this gradient maps to, by default 0 to 1. If netRange is set to NA, then the gradient will be mapped to values between 0 and the maximum edge weight of the shown network.
The degreeCol argument controls the color of the weighted degree bar plot (see plotDegree).
The contribCols argument controls the color of the node contribution bar plot (see plotContribution). This can be specified as a single value to be used for all nodes, or as two colors: one to use for nodes with positive contributions and one to use for nodes with negative contributions.
The summaryCols argument controls the color of the module summary bar plot (see plotSummary). This can be specified as a single value to be used for all samples, or as two colors: one to use for samples with a positive module summary value and one for samples with a negative module summary value.
The naCol argument controls the color of missing nodes and samples on the data, correlation structure, and network edge weight heatmaps.
See Also
plotModule for a combined plot showing all topological properties for a network module.
Examples
# load in example data, correlation, and network matrices for a discovery
# and test dataset:
data("NetRep")

# Set up input lists for each input matrix type across datasets. The list
# elements can have any names, so long as they are consistent between the
# inputs.
network_list <- list(discovery=discovery_network, test=test_network)
data_list <- list(discovery=discovery_data, test=test_data)
correlation_list <- list(discovery=discovery_correlation, test=test_correlation)
labels_list <- list(discovery=module_labels)

# Plot the data for module 1, 2 and 4 in the discovery dataset
plotData(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 2, 4)
)

# Symmetric = TRUE gives a traditional heatmap for the correlation structure
# and weighted network
plotCorrelation(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 2, 4), symmetric=TRUE
)

# While the default is to render only one half of the (symmetric) matrix
plotNetwork(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 2, 4)
)

# Plot the degree of nodes in each module in the test dataset, but show them
# in the same order as the discovery dataset to compare how node degree
# changes
plotDegree(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 2, 4), discovery="discovery",
  test="test"
)

# Alternatively nodes can be ordered on the plot by degree in the test dataset
plotDegree(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 2, 4), discovery="discovery",
  test="test", orderNodesBy="test"
)

# Or by averaging the degree across datasets for a more robust ordering
plotDegree(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 2, 4), discovery="discovery",
  test="test", orderNodesBy=c("discovery", "test")
)

# Arbitrary subsets can be plotted:
plotContribution(
  network=network_list[[1]][1:10, 1:10], data=data_list[[1]][, 1:10],
  correlation=correlation_list[[1]][1:10, 1:10], orderNodesBy=NA
)

# Plot the module summary vectors for multiple modules:
plotSummary(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 2, 4), discovery="discovery",
  test="test", orderSamplesBy="test"
)
Template parameters
Description
Template parameters to be imported into other function documentation. This is not intended to be a stand-alone help file.
Arguments
orderNodesBy |
|
orderSamplesBy |
|
plotNodeNames | logical; controls whether the node names are drawn on the bottom axis. |
plotSampleNames | logical; controls whether the sample names are drawn on the left axis. |
plotModuleNames | logical; controls whether module names are drawn. The default is for module names to be drawn when multiple |
main | title for the plot. |
main.line | the number of lines into the top margin at which the plot title will be drawn. |
drawBorders | logical; if |
lwd | line width for borders and axes. |
naxt.line | the number of lines into the bottom margin at which the node names will be drawn. |
saxt.line | the number of lines into the left margin at which the sample names will be drawn. |
maxt.line | the number of lines into the bottom margin at which the module names will be drawn. |
xaxt.line | the number of lines into the bottom margin at which the x-axis tick labels will be drawn on the module summary bar plot. |
xaxt.tck | the size of the x-axis ticks for the module summary bar plot. |
xlab.line | the number of lines into the bottom margin at which the x axis label on the module summary bar plot(s) will be drawn. |
yaxt.line | the number of lines into the left margin at which the y-axis tick labels will be drawn on the weighted degree and node contribution bar plots. |
ylab.line | the number of lines into the left margin at which the y axis labels on the weighted degree and node contribution bar plots will be drawn. |
yaxt.tck | the size of the y-axis ticks for the weighted degree and node contribution bar plots. |
laxt.line | the distance from the legend to draw the legend axis labels, as a multiple of |
laxt.tck | size of the ticks on each axis legend relative to the size of the correlation, edge weights, and data matrix heatmaps. |
legend.main.line | the distance from the legend to draw the legend title. |
cex.axis | relative size of the node and sample names. |
cex.lab | relative size of the module names and legend titles. |
cex.main | relative size of the plot titles. |
dataCols | a character vector of colors to create a gradient from for the data heatmap (see details). Automatically determined if |
dataRange | the range of values to map to the |
corCols | a character vector of colors to create a gradient from for the correlation structure heatmap (see details). |
corRange | the range of values to map to the |
netCols | a character vector of colors to create a gradient from for the network edge weight heatmap (see details). |
netRange | the range of values to map to the |
degreeCol | color to use for the weighted degree bar plot. |
contribCols | color(s) to use for the node contribution bar plot (see details). |
summaryCols | color(s) to use for the module summary bar plot (see details). |
naCol | color to use for missing nodes and samples on the data, correlation structure, and network edge weight heat maps. |
dryRun | logical; if |
How many permutations do I need to test at my desired significance level?
Description
How many permutations do I need to test at my desired significance level?
Usage
requiredPerms(alpha, alternative = "greater")
Arguments
alpha | desired significance threshold. |
alternative | a character string specifying the alternative hypothesis, must be one of "greater" (default), "less", or "two.sided". You can specify just the initial letter. |
Value
The minimum number of permutations required to detect any significant associations at the provided alpha. The minimum p-value will always be smaller than alpha.
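The relationship between alpha and the number of permutations can be sketched in base R. This is not the package's implementation of requiredPerms: the helpers below are hypothetical, cover only a one-sided test, and assume the common permutation p-value estimator (b + 1) / (nPerm + 1), where b is the number of permuted statistics at least as extreme as the observed one.

```r
# Hypothetical helper: a number of permutations that is sufficient for the
# smallest attainable p-value to fall below alpha (one-sided test).
sufficient_perms <- function(alpha) {
  stopifnot(is.numeric(alpha), length(alpha) == 1, alpha > 0, alpha < 1)
  ceiling(1 / alpha)
}

# Smallest p-value attainable under the assumed estimator, reached when no
# permuted statistic is as extreme as the observed one (b = 0)
min_p_value <- function(n_perm) 1 / (n_perm + 1)

# Bonferroni-adjusted threshold for 4 modules
alpha <- 0.05 / 4
n_perm <- sufficient_perms(alpha)
min_p_value(n_perm) < alpha
```

With fewer permutations than this, even a module whose observed statistic exceeds every permuted value could not reach significance, which is why the threshold should be chosen before running the permutation procedure.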
Examples
data("NetRep")

# Set up input lists for each input matrix type across datasets. The list
# elements can have any names, so long as they are consistent between the
# inputs.
network_list <- list(discovery=discovery_network, test=test_network)
data_list <- list(discovery=discovery_data, test=test_data)
correlation_list <- list(discovery=discovery_correlation, test=test_correlation)
labels_list <- list(discovery=module_labels)

# How many permutations are required to Bonferroni adjust for the 4 modules
# in the example data?
nPerm <- requiredPerms(0.05/4)

# Note that we recommend running at least 10,000 permutations to make sure
# that the null distributions are representative.
preservation <- modulePreservation(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, nPerm=nPerm, discovery="discovery",
  test="test"
)
Order samples within a network.
Description
Get the order of samples within a module based on the module summary vector.
Usage
sampleOrder(
  network, data, correlation, moduleAssignments = NULL, modules = NULL,
  backgroundLabel = "0", discovery = NULL, test = NULL, na.rm = FALSE,
  simplify = TRUE, verbose = TRUE
)
Arguments
network | a list of interaction networks, one for each dataset. Each entry of the list should be a |
data | a list of matrices, one for each dataset. Each entry of the list should be the data used to infer the interaction |
correlation | a list of matrices, one for each dataset. Each entry of the list should be a |
moduleAssignments | a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset. |
modules | a list of vectors, one for each |
backgroundLabel | a single label given to nodes that do not belong to any module in the |
discovery | a vector of names or indices denoting the discovery dataset(s) in the |
test | a list of vectors, one for each |
na.rm | logical; If |
simplify | logical; if |
verbose | logical; should progress be reported? Default is |
Details
Input data structures:
The preservation of network modules in a second dataset is quantified by measuring the preservation of topological properties between the discovery and test datasets. These properties are calculated not only from the interaction networks inferred in each dataset, but also from the data used to infer those networks (e.g. gene expression data) as well as the correlation structure between variables/nodes. Thus, all functions in the NetRep package have the following arguments:
network: a list of interaction networks, one for each dataset.
data: a list of data matrices used to infer those networks, one for each dataset.
correlation: a list of matrices containing the pairwise correlation coefficients between variables/nodes in each dataset.
moduleAssignments: a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.
modules: a list of vectors, one for each discovery dataset, containing the names of the modules from that dataset to analyse.
discovery: a vector indicating the names or indices of the previous arguments' lists to use as the discovery dataset(s) for the analyses.
test: a list of vectors, one vector for each discovery dataset, containing the names or indices of the network, data, and correlation argument lists to use as the test dataset(s) for the analysis of each discovery dataset.
The formatting of these arguments is not strict: each function will attempt to make sense of the user input. For example, if there is only one discovery dataset, then input to the moduleAssignments and test arguments may be vectors, rather than lists. If the sampleOrder is being calculated within the discovery or test datasets, then the discovery and test arguments do not need to be specified, and the input matrices for the network, data, and correlation arguments do not need to be wrapped in a list.
Analysing large datasets:
Matrices in the network, data, and correlation lists can be supplied as disk.matrix objects. This class allows matrix data to be kept on disk and loaded as required by NetRep. This dramatically decreases memory usage: the matrices for only one dataset will be kept in RAM at any point in time.
Value
A nested list structure. At the top level, the list has one element per 'discovery' dataset. Each of these elements is a list that has one element per 'test' dataset analysed for that 'discovery' dataset. Each of these elements is a list that has one element per 'modules' specified, containing a vector of sample names for the requested module. When simplify = TRUE then the simplest possible structure will be returned. E.g. if the sample ordering is requested in only one dataset, then a single vector of sample labels will be returned.
When simplify = FALSE then a nested list of datasets will always be returned, i.e. each element at the top level and second level corresponds to a dataset, and each element at the third level will correspond to modules discovered in the dataset specified at the top level if module labels are provided in the corresponding moduleAssignments list element. E.g. results[["Dataset1"]][["Dataset2"]][["module1"]] will contain the order of samples calculated in "Dataset2", where "module1" was identified in "Dataset1". Modules and datasets for which calculation of the sample order has not been requested will contain NULL.
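To illustrate the nested structure, the sketch below builds a hypothetical stand-in for the output with simplify = FALSE and indexes into it. The dataset, module, and sample names are invented for illustration and are not produced by the package.

```r
# Hypothetical stand-in for the return value with simplify = FALSE:
# discovery dataset -> test dataset -> module -> ordered sample names.
results <- list(
  Dataset1 = list(
    Dataset1 = NULL,  # sample order not requested within Dataset1
    Dataset2 = list(
      module1 = c("S3", "S1", "S2"),  # invented sample names
      module2 = NULL                  # module not requested
    )
  ),
  Dataset2 = NULL  # no modules discovered in Dataset2 were analysed
)

# Order of samples in "Dataset2" for "module1" identified in "Dataset1".
ordered <- results[["Dataset1"]][["Dataset2"]][["module1"]]
print(ordered)

# Keep only the modules for which a sample order was calculated.
requested <- Filter(Negate(is.null), results[["Dataset1"]][["Dataset2"]])
print(names(requested))
```

Checking for NULL before use, as in the last step, avoids errors when looping over modules or datasets that were not part of the request.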
See Also
Examples
# load in example data, correlation, and network matrices for a discovery
# and test dataset:
data("NetRep")

# Set up input lists for each input matrix type across datasets. The list
# elements can have any names, so long as they are consistent between the
# inputs.
network_list <- list(discovery=discovery_network, test=test_network)
data_list <- list(discovery=discovery_data, test=test_data)
correlation_list <- list(discovery=discovery_correlation, test=test_correlation)
labels_list <- list(discovery=module_labels)

# Sort samples within module 1 in descending order by module summary
samples <- sampleOrder(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules="1"
)

Template parameters
Description
Template parameters to be imported into other function documentation. This is not intended to be a stand-alone help file.
Arguments
simplify | logical; if |