Type: Package
Title: Permutation Testing Network Module Preservation Across Datasets
Version: 1.2.9
BugReports: https://github.com/sritchie73/NetRep/issues
Description: Functions for assessing the replication/preservation of a network module's topology across datasets through permutation testing; Ritchie et al. (2016) <doi:10.1016/j.cels.2016.06.012>.
License: GPL-2
Depends: R (≥ 3.6), methods
Imports: foreach, Rcpp (≥ 0.11), statmod, RhpcBLASctl, abind, RColorBrewer, utils, stats, graphics, grDevices
Suggests: bigmemory, testthat, knitr, rmarkdown
LinkingTo: Rcpp, BH, RcppArmadillo (≥ 0.4)
RoxygenNote: 7.3.3
VignetteBuilder: knitr
Encoding: UTF-8
NeedsCompilation: yes
Packaged: 2025-10-23 14:58:56 UTC; sr827
Author: Scott Ritchie [aut, cre] (0000-0002-8454-9548)
Maintainer: Scott Ritchie <sritchie73@gmail.com>
Repository: CRAN
Date/Publication: 2025-10-23 15:20:08 UTC

Fast permutation procedure for testing network module replication

Description

Functions for assessing the replication/preservation of a network module's topology across datasets through permutation testing. This is suitable for networks that can be meaningfully inferred from multiple datasets. These include gene coexpression networks, protein-protein interaction networks, and microbial interaction networks. Modules within these networks consist of groups of nodes that are particularly interesting: for example a group of tightly connected genes associated with a disease, groups of genes annotated with the same term in the Gene Ontology database, or groups of interacting microbial species, i.e. communities. Application of this method can answer questions such as: (1) do the relationships between genes in a module replicate in an independent cohort? (2) are these gene coexpression modules preserved across tissues or tissue specific? (3) are these modules conserved across species? (4) are microbial communities preserved across multiple spatial locations?

Details

The main function for this package is modulePreservation. Several functions for downstream analysis are also provided: networkProperties for calculating the topological properties of a module, and plotModule for visualising a module.

Author(s)

Maintainer: Scott Ritchie <sritchie73@gmail.com> (0000-0002-8454-9548)

See Also

Useful links: report bugs at https://github.com/sritchie73/NetRep/issues


Combine results of multiple permutation procedures

Description

This function takes the output from multiple runs of modulePreservation, combines their results, and returns a new set of permutation test P-values. This is useful for parallelising calculations across multiple machines.

Usage

combineAnalyses(pres1, pres2)

Arguments

pres1, pres2

lists returned by modulePreservation.

Details

The calls to 'modulePreservation' must have been identical for both input lists, with the exception of the number of threads used and the number of permutations calculated.

Value

A nested list containing the same elements as modulePreservation.

Examples

data("NetRep")

# Set up input lists for each input matrix type across datasets. The list
# elements can have any names, so long as they are consistent between the
# inputs.
network_list <- list(discovery=discovery_network, test=test_network)
data_list <- list(discovery=discovery_data, test=test_data)
correlation_list <- list(discovery=discovery_correlation, test=test_correlation)
labels_list <- list(discovery=module_labels)

pres1 <- modulePreservation(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, nPerm=1000, discovery="discovery",
  test="test", nThreads=2
)

pres2 <- modulePreservation(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, nPerm=1000, discovery="discovery",
  test="test", nThreads=2
)

combined <- combineAnalyses(pres1, pres2)
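The pooling idea behind combining runs can be illustrated without NetRep itself. The following is a minimal base R sketch, not the package's implementation: the helper pool_p_value, the observed value, and the simulated null values are all hypothetical, and the simple (b + 1) / (n + 1) permutation estimator is an assumption made for illustration.

```r
# Hypothetical illustration: pool null distributions from two runs and
# recompute a one-sided ("greater") permutation p-value.
set.seed(1)
observed <- 2.5               # made-up observed test statistic
nulls_run1 <- rnorm(1000)     # stand-in null values from the first run
nulls_run2 <- rnorm(1000)     # stand-in null values from the second run

pool_p_value <- function(observed, ...) {
  nulls <- c(...)                       # concatenate all null values
  exceed <- sum(nulls >= observed)      # permutations at least as extreme
  (exceed + 1) / (length(nulls) + 1)    # simple permutation estimator
}

p_single <- pool_p_value(observed, nulls_run1)
p_pooled <- pool_p_value(observed, nulls_run1, nulls_run2)
```

Pooling doubles the number of null values, which halves the smallest p-value the test is able to report.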

Template parameters

Description

Template parameters to be imported into other function documentation. This is not intended to be a stand-alone help file.

Arguments

network

a list of interaction networks, one for each dataset. Each entry of the list should be an n * n matrix where each element contains the edge weight between nodes i and j in the inferred network for that dataset.

data

a list of matrices, one for each dataset. Each entry of the list should be the data used to infer the interaction network for that dataset. The columns should correspond to variables in the data (nodes in the network) and rows to samples in that dataset.

correlation

a list of matrices, one for each dataset. Each entry of the list should be an n * n matrix where each element contains the correlation coefficient between nodes i and j in the data used to infer the interaction network for that dataset.

moduleAssignments

a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.

modules

a list of vectors, one for each discovery dataset, of modules to perform the analysis on. If unspecified, all modules in each discovery dataset will be analysed, with the exception of those specified in the backgroundLabel argument.

backgroundLabel

a single label given to nodes that do not belong to any module in the moduleAssignments argument. Defaults to "0". Set to NULL if you do not want to skip the network background module.

discovery

a vector of names or indices denoting the discovery dataset(s) in the data, correlation, network, moduleAssignments, modules, and test lists.

test

a list of vectors, one for each discovery dataset, of names or indices denoting the test dataset(s) in the data, correlation, and network lists.

verbose

logical; should progress be reported? Default is TRUE.


The 'disk.matrix' class

Description

A 'disk.matrix' contains a file path to a matrix stored on disk, along with meta data for how to read that file. This allows NetRep to load datasets into RAM only when required, i.e. one at a time. This significantly reduces the memory usage of R when analysing large datasets. 'disk.matrix' objects may be supplied instead of 'matrix' objects in the input list arguments 'network', 'data', and 'correlation', which are common to most of NetRep's functions.

Usage

attach.disk.matrix(file, serialized = TRUE, ...)

serialize.table(file, ...)

is.disk.matrix(x)

as.disk.matrix(x, file, serialize = TRUE)

## S4 method for signature 'disk.matrix'
as.disk.matrix(x, file, serialize = TRUE)

## S4 method for signature 'matrix'
as.disk.matrix(x, file, serialize = TRUE)

## S4 method for signature 'ANY'
as.disk.matrix(x, file, serialize = TRUE)

## S4 method for signature 'disk.matrix'
as.matrix(x)

## S4 method for signature 'disk.matrix'
show(object)

Arguments

file

for attach.disk.matrix the file name of a matrix on disk. For as.disk.matrix the file name to save the matrix to. For serialize.table the file name of a matrix in table format on disk.

serialized

determines how the matrix will be loaded from disk into R by as.matrix. If TRUE, the readRDS function will be used. If FALSE, the read.table function will be used.

...

arguments to be used by read.table when reading in matrix data from a file in table format.

x

for as.matrix a disk.matrix object to load into R. For as.disk.matrix an object to convert to a disk.matrix. For is.disk.matrix an object to check if it is a disk.matrix.

serialize

determines how the matrix is saved to disk by as.disk.matrix. If TRUE it will be stored as a serialized R object using saveRDS. If FALSE it will be stored as a tab-separated file using write.table.

object

a 'disk.matrix' object.

Details

Matrices may either be stored as regular table files that can be read by read.table, or as serialized R objects that can be read by readRDS. Serialized objects are much faster to load, but cannot be read by other programs.

The attach.disk.matrix function creates a disk.matrix object from a file path. The as.matrix function will load the data from disk into the R session as a regular matrix object.

The as.disk.matrix function converts a matrix into a disk.matrix by saving its contents to the specified file. The serialize argument determines whether the data is stored as a serialized R object or as a tab-separated file (i.e. sep="\t"). We recommend storing the matrix as a serialized R object unless disk space is a concern. More control over the storage format can be obtained by using saveRDS or write.table directly.

The serialize.table function converts a file in table format to a serialized R object with the same file name, but with the ".rds" extension.

Value

A disk.matrix object (attach.disk.matrix, as.disk.matrix), a matrix (as.matrix), the file path to a serialized matrix (serialize.table), or a TRUE or FALSE indicating whether an object is a disk.matrix (is.disk.matrix).

Slots

file

the name of the file where the matrix is saved.

read.func

either "read.table" or "readRDS".

func.args

a list of arguments to be supplied to the 'read.func'.

Warning

attach.disk.matrix does not check whether the specified file can be read into R. as.matrix will fail and throw an error if this is the case.
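The two storage formats described above can be tried out with base R alone. This is a minimal sketch that does not call any NetRep function: the toy matrix and temporary file names are made up, and it only shows the saveRDS/readRDS and write.table/read.table round trips that the serialize and serialized arguments choose between.

```r
# Round trip a small matrix through both storage formats using base R only.
m <- matrix(c(1, 0.5, 0.5, 1), nrow = 2,
            dimnames = list(c("a", "b"), c("a", "b")))

rds_file <- tempfile(fileext = ".rds")
tsv_file <- tempfile(fileext = ".tsv")

saveRDS(m, rds_file)                                  # serialized R object
write.table(m, tsv_file, sep = "\t", quote = FALSE)   # tab-separated table

from_rds <- readRDS(rds_file)
# The header row has one fewer field than the data rows, so read.table
# uses the first column as row names.
from_tsv <- as.matrix(read.table(tsv_file, sep = "\t", header = TRUE))
```

Both routes recover the same matrix. The table file stays readable by other programs, while the serialized file is specific to R.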


Example data

Description

Example gene coexpression networks inferred from two independent datasets to demonstrate the usage of package functions.

Usage

data("NetRep")

Format

"discovery_network"

a matrix with 150 columns and 150 rows containing the network edge weights encoding the interaction strength between each pair of genes in the discovery dataset.

"discovery_data"

a matrix with 150 columns (genes) and 30 rows (samples) whose entries correspond to the expression level of each gene in each sample in the discovery dataset.

"discovery_correlation"

a matrix with 150 columns and 150 rows containing the correlation-coefficients between each pair of genes calculated from the "discovery_data" matrix.

"module_labels"

a named vector with 150 entries containing the module assignment for each gene as identified in the discovery dataset.

"test_network"

a matrix with 150 columns and 150 rows containing the network edge weights encoding the interaction strength between each pair of genes in the test dataset.

"test_data"

a matrix with 150 columns (genes) and 30 rows (samples) whose entries correspond to the expression level of each gene in each sample in the test dataset.

"test_correlation"

a matrix with 150 columns and 150 rows containing the correlation-coefficients between each pair of genes calculated from the "test_data" matrix.

An object of class matrix (inherits from array) with 150 rows and 150 columns.

An object of class matrix (inherits from array) with 30 rows and 150 columns.

An object of class matrix (inherits from array) with 150 rows and 150 columns.

An object of class numeric of length 150.

An object of class matrix (inherits from array) with 150 rows and 150 columns.

An object of class matrix (inherits from array) with 30 rows and 150 columns.

An object of class matrix (inherits from array) with 150 rows and 150 columns.

Details

The preservation of network modules in a second dataset is quantified by measuring the preservation of topological properties between the discovery and test datasets. These properties are calculated not only from the interaction networks inferred in each dataset, but also from the data used to infer those networks (e.g. gene expression data) as well as the correlation structure between variables/nodes. Thus, all functions in the NetRep package have the following arguments:

network:

a list of interaction networks, one for each dataset.

data:

a list of data matrices used to infer those networks, one for each dataset.

correlation:

a list of matrices containing the pairwise correlation coefficients between variables/nodes in each dataset.

moduleAssignments:

a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.

modules:

a list of vectors, one vector for each discovery dataset, containing the names of the modules from that dataset to analyse.

discovery:

a vector indicating the names or indices of the previous arguments' lists to use as the discovery dataset(s) for the analyses.

test:

a list of vectors, one vector for each discovery dataset, containing the names or indices of the network, data, and correlation argument lists to use as the test dataset(s) for the analysis of each discovery dataset.

This data is used to provide concrete examples of the usage of these arguments in each package function.

Simulation details

The discovery gene expression dataset ("discovery_data") containing 30 samples and 150 genes was simulated to contain four distinct modules of sizes 20, 25, 30, and 35 genes. Data for each module were simulated as:

G^{(w)}_{simulated} = E^{(w)} r_i + \sqrt{1 - r^2_i} \epsilon

Where E^{(w)} is the simulated module's summary vector, r is the simulated module's node contributions for each gene, and \epsilon is the error term drawn from a standard normal distribution. E^{(w)} and r were simulated by bootstrapping (sampling with replacement) samples and genes from the corresponding vectors in modules 63, 51, 57, and 50 discovered in the liver tissue gene expression data from a publicly available mouse dataset (see reference (1) for details on the dataset and network discovery). The remaining 40 genes that were not part of any module were simulated by randomly selecting 40 liver genes and bootstrapping 30 samples and adding the noise term, \epsilon. A vector of module assignments was created ("module_labels") in which each gene was labelled with a number 1-4 corresponding to the module they were simulated to be coexpressed with, or a label of 0 for the 40 "background" genes not participating in any module. The correlation structure ("discovery_correlation") was calculated as the Pearson's correlation coefficient between genes (cor(discovery_data)). Edge weights in the interaction network ("discovery_network") were calculated as the absolute value of the correlation coefficient exponentiated to the power 5 (abs(discovery_correlation)^5).

An independent test dataset ("test_data") containing the same 150 genes as the discovery dataset but 30 different samples was simulated as above. Modules 1 and 4 (containing 20 and 35 genes respectively) were simulated to be preserved using the same equation above, where the summary vector E^{(w)} was bootstrapped from the same liver modules (modules 63 and 50) as in the discovery and with identical node contributions r as in the discovery dataset. Genes in modules 2 and 3 were simulated as "background" genes, i.e. not preserved as described above. The correlation structure between genes in the test dataset ("test_correlation") and the interaction network ("test_network") were calculated the same way as in the discovery dataset.

The random seed used for the simulations was 37.
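The simulation equation can be sketched in base R for a single module. This does not reproduce the packaged data: the summary vector E and node contributions r are drawn at random here rather than bootstrapped from the mouse liver modules, and the seed, the variable names, and the range chosen for r are all assumptions for illustration.

```r
# Hypothetical sketch of the simulation equation for one module.
set.seed(1)
n_samples <- 30
n_genes <- 20

E <- as.numeric(scale(rnorm(n_samples)))     # stand-in summary vector
r <- runif(n_genes, min = 0.5, max = 0.9)    # stand-in node contributions

# Each gene (column) is E * r_i + sqrt(1 - r_i^2) * epsilon.
sim <- sapply(r, function(r_i) {
  epsilon <- rnorm(n_samples)                # standard normal error term
  E * r_i + sqrt(1 - r_i^2) * epsilon
})

sim_correlation <- cor(sim)                  # Pearson correlation structure
sim_network <- abs(sim_correlation)^5        # edge weights, as in the text
```

Raising the absolute correlation to the power 5 shrinks weak correlations towards zero much faster than strong ones, so only strongly correlated gene pairs keep a sizeable edge weight.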

References

  1. Ritchie, S.C., et al., A scalable permutation approach reveals replication and preservation patterns of network modules in large datasets. Cell Systems. 3, 71-82 (2016).

See Also

modulePreservation, plotModule, and networkProperties.


Load a 'bigMatrix' (deprecated)

Description

The 'bigMatrix' class is no longer implemented in the NetRep package: the shared memory approach was incompatible with high performance compute clusters, so the parallel permutation procedure has been translated into C++ code (which is also much faster). The disk.matrix class should now be used instead when analysing large datasets.

Usage

load.bigMatrix(backingfile)

Arguments

backingfile

path to the backingfile for the 'bigMatrix'. The file extension must be omitted.

Details

This function will convert 'bigMatrix' data saved by previous versions of NetRep to a serialized R matrix saved in the same location and return a disk.matrix object with the associated file path. If this conversion has taken place already the function will throw a warning.

This function will also convert the 'bigMatrix' descriptor file to a big.matrix descriptor file to preserve compatibility with functions in the bigmemory package. If this functionality is not required, the files with the extensions ".bin" and ".desc" may be removed.

A note for users using multi-node high performance clusters: 'big.matrix' objects are not suitable for general usage. Access to file-backed shared memory segments on multi-node systems is very slow due to consistency checks performed by the operating system. This becomes exponentially worse the more R sessions there are simultaneously accessing the shared memory segment, e.g. through parallel foreach loops.


Replication and preservation of network modules across datasets

Description

Quantify the preservation of network modules (sub-graphs) in an independent dataset through permutation testing on module topology. Seven network statistics (see details) are calculated for each module and then tested by comparing to distributions generated from their calculation on random subsets in the test dataset.

Usage

modulePreservation(
  network,
  data,
  correlation,
  moduleAssignments,
  modules = NULL,
  backgroundLabel = "0",
  discovery = 1,
  test = 2,
  selfPreservation = FALSE,
  nThreads = NULL,
  nPerm = NULL,
  null = "overlap",
  alternative = "greater",
  simplify = TRUE,
  verbose = TRUE
)

Arguments

network

a list of interaction networks, one for each dataset. Each entry of the list should be an n * n matrix where each element contains the edge weight between nodes i and j in the inferred network for that dataset.

data

a list of matrices, one for each dataset. Each entry of the list should be the data used to infer the interaction network for that dataset. The columns should correspond to variables in the data (nodes in the network) and rows to samples in that dataset.

correlation

a list of matrices, one for each dataset. Each entry of the list should be an n * n matrix where each element contains the correlation coefficient between nodes i and j in the data used to infer the interaction network for that dataset.

moduleAssignments

a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.

modules

a list of vectors, one for each discovery dataset, of modules to perform the analysis on. If unspecified, all modules in each discovery dataset will be analysed, with the exception of those specified in the backgroundLabel argument.

backgroundLabel

a single label given to nodes that do not belong to any module in the moduleAssignments argument. Defaults to "0". Set to NULL if you do not want to skip the network background module.

discovery

a vector of names or indices denoting the discovery dataset(s) in the data, correlation, network, moduleAssignments, modules, and test lists.

test

a list of vectors, one for each discovery dataset, of names or indices denoting the test dataset(s) in the data, correlation, and network lists.

selfPreservation

logical; if FALSE (default) then module preservation analysis will not be performed within a dataset (i.e. where the discovery and test datasets are the same).

nThreads

number of threads to parallelise the calculation of network properties over. Automatically determined as the number of cores - 1 if not specified.

nPerm

number of permutations to use. If not specified, the number of permutations will be automatically determined (see details). When set to 0 the permutation procedure will be skipped and the observed module preservation will be returned without p-values.

null

variables to include when generating the null distributions. Must be either "overlap" or "all" (see details).

alternative

The type of module preservation test to perform. Must be one of "greater" (default), "less" or "two.sided" (see details).

simplify

logical; if TRUE, simplify the structure of the output list if possible (see Return Value).

verbose

logical; should progress be reported? Default is TRUE.

Details

Input data structures:

The preservation of network modules in a second dataset is quantified by measuring the preservation of topological properties between the discovery and test datasets. These properties are calculated not only from the interaction networks inferred in each dataset, but also from the data used to infer those networks (e.g. gene expression data) as well as the correlation structure between variables/nodes. Thus, all functions in the NetRep package have the following arguments:

network:

a list of interaction networks, one for each dataset.

data:

a list of data matrices used to infer those networks, one for each dataset.

correlation:

a list of matrices containing the pairwise correlation coefficients between variables/nodes in each dataset.

moduleAssignments:

a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.

modules:

a list of vectors, one for each discovery dataset, containing the names of the modules from that dataset to analyse.

discovery:

a vector indicating the names or indices of the previous arguments' lists to use as the discovery dataset(s) for the analyses.

test:

a list of vectors, one vector for each discovery dataset, containing the names or indices of the network, data, and correlation argument lists to use as the test dataset(s) for the analysis of each discovery dataset.

The formatting of these arguments is not strict: each function will attempt to make sense of the user input. For example, if there is only one discovery dataset, then input to the moduleAssignments and test arguments may be vectors, rather than lists.

Analysing large datasets:

Matrices in the network, data, and correlation lists can be supplied as disk.matrix objects. This class allows matrix data to be kept on disk and loaded as required by NetRep. This dramatically decreases memory usage: the matrices for only one dataset will be kept in RAM at any point in time.

Additional memory usage of the permutation procedure is directly proportional to the sum of module sizes squared multiplied by the number of threads. Very large modules may result in significant additional memory usage per core due to extraction of the correlation coefficient sub-matrix at each permutation.

Module Preservation Statistics:

Module preservation is assessed through seven module preservation statistics, each of which captures a different aspect of a module's topology; i.e. the structure of the relationships between its nodes (1,2). Below is a description of each statistic, what they seek to measure, and where their interpretation may be inappropriate.

The module coherence ('coherence'), average node contribution ('avg.contrib'), and concordance of node contribution ('cor.contrib') are all calculated from the data used to infer the network (provided in the 'data' argument). They are calculated from the module's summary profile. This is the eigenvector of the 1st principal component across all observations for every node composing the module. For gene coexpression modules this can be interpreted as a "summary expression profile". It is typically referred to as the "module eigengene" in the weighted gene coexpression network analysis literature (4).

The module coherence ('coherence') quantifies the proportion of module variance explained by the module's "summary profile". The higher this value, the more "coherent" the data is, i.e. the more similar the observations are across nodes for each sample. With the default alternate hypothesis, a small permutation P-value indicates that the module is more coherent than expected by chance.

The average node contribution ('avg.contrib') and concordance of node contribution ('cor.contrib') are calculated from the node contribution, which quantifies how similar each node is to the module's summary profile. It is calculated as the Pearson correlation coefficient between each node and the module summary profile. In the weighted gene coexpression network literature it is typically called the "module membership" (2).

The average node contribution ('avg.contrib') quantifies how similar nodes are to the module summary profile in the test dataset. Nodes detract from this score where the sign of their node contribution flips between the discovery and test datasets, e.g. in the case of differential gene expression across conditions. A high average node contribution with a small permutation P-value indicates that the module remains coherent in the test dataset, and that the nodes are acting together in a similar way.

The concordance of node contribution ('cor.contrib') measures whether the relative rank of nodes (in terms of their node contribution) is preserved across datasets. If a module is coherent enough that all nodes contribute strongly, then this statistic will not be meaningful as its value will be heavily influenced by tiny variations in node rank. This can be assessed through visualisation of the module topology (see plotContribution). Similarly, a strong 'cor.contrib' is unlikely to be meaningful if the 'avg.contrib' is not significant.
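The three data-based statistics can be sketched in base R as follows. This is an illustration under stated assumptions rather than NetRep's C++ implementation: the helper module_summary, the simulated data, the scaling of the data, the sign convention for the summary profile, and the sign-aware mean used for the average node contribution are all assumptions.

```r
# Hypothetical helper: summary profile, coherence, and node contributions
# for one module, from a samples-by-nodes data matrix.
module_summary <- function(data) {
  scaled <- scale(data)                   # centre and scale each node
  decomp <- svd(scaled)
  profile <- decomp$u[, 1]                # scores on the 1st component
  contribution <- as.numeric(cor(profile, scaled))
  if (mean(contribution) < 0) {           # orient the profile consistently
    profile <- -profile
    contribution <- -contribution
  }
  list(profile = profile,
       coherence = decomp$d[1]^2 / sum(decomp$d^2),  # variance explained
       contribution = contribution)
}

# Simulate the same 10-node module in two datasets of 30 samples each.
set.seed(1)
simulate <- function() {
  shared <- rnorm(30)
  sapply(seq(0.3, 1.2, length.out = 10),
         function(s) shared + rnorm(30, sd = s))
}
disc <- module_summary(simulate())
test <- module_summary(simulate())

coherence <- test$coherence
avg_contrib <- mean(sign(disc$contribution) * test$contribution)
cor_contrib <- cor(disc$contribution, test$contribution)
```

Multiplying by the sign of the discovery contribution means that a node whose contribution flips sign in the test dataset lowers the average, which matches the penalty described above.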

The concordance of correlation structure ('cor.cor') and density of correlation structure ('avg.cor') are calculated from the user-provided correlation structure between nodes (provided in the 'correlation' argument). This is referred to as "coexpression" when calculated on gene expression data.

The 'avg.cor' measures how strongly nodes within a module are correlated on average in the test dataset. This average depends on the correlation coefficients in the discovery dataset: the score is penalised where correlation coefficients change in sign between datasets. A high 'avg.cor' with a small permutation P-value indicates that the module is (a) more strongly correlated than expected by chance for a module of the same size, and (b) more consistently correlated with respect to the discovery dataset than expected by chance.

The 'cor.cor' measures how similar the correlation coefficients are across the two datasets. A high 'cor.cor' with a small permutation P-value indicates that the correlation structure within a module is more similar across datasets than expected by chance. If all nodes within a module are very similarly correlated then this statistic will not be meaningful, as its value will be heavily influenced by tiny, non-meaningful, variations in correlation strength. This can be assessed through visualisation of the module topology (see plotCorrelation). Similarly, a strong 'cor.cor' is unlikely to be meaningful if the 'avg.cor' is not significant.
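A base R sketch of the two correlation-based statistics is given below. The formulas are assumptions chosen to match the description (a sign-aware mean for 'avg.cor', and a Pearson correlation of off-diagonal entries for 'cor.cor'), and the simulated matrices and variable names are hypothetical.

```r
# Hypothetical sketch: correlation-based statistics for one 8-node module.
set.seed(1)
make_cor <- function() {
  # Adding a length-30 vector to a 30 x 8 matrix adds the same value to
  # every node within a sample, giving the nodes a shared signal.
  cor(matrix(rnorm(30 * 8), nrow = 30) + rnorm(30))
}
disc_cor <- make_cor()
test_cor <- make_cor()

off_diag <- upper.tri(disc_cor)     # each node pair once, no diagonal
disc_vals <- disc_cor[off_diag]
test_vals <- test_cor[off_diag]

# Penalise pairs whose correlation changes sign between datasets.
avg_cor <- mean(sign(disc_vals) * test_vals)
# Similarity of the correlation coefficients across datasets.
cor_cor <- cor(disc_vals, test_vals)
```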

The average edge weight ('avg.weight') and concordance of weighted degree ('cor.degree') are both calculated from the interaction network (provided as adjacency matrices to the 'network' argument).

The 'avg.weight' measures the average connection strength between nodes in the test dataset. In the weighted gene coexpression network literature this is typically called the "module density" (2). A high 'avg.weight' with a small permutation P-value indicates that the module is more strongly connected in the test dataset than expected by chance.

The 'cor.degree' calculates whether the relative rank of each node's weighted degree is similar across datasets. The weighted degree is calculated as the sum of a node's edge weights to all other nodes in the module. In the weighted gene coexpression network literature this is typically called the "intramodular connectivity" (2). This statistic will not be meaningful where all nodes are connected to each other with similar strength, as its value will be heavily influenced by tiny, non-meaningful, variations in weighted degree. This can be assessed through visualisation of the module topology (see plotDegree).

Both the 'avg.weight' and 'cor.degree' assume edges are weighted, and that the network is densely connected. Note that for sparse networks, edges with zero weight are included when calculating both statistics. Only the magnitude of the weights, not their sign, contributes to the score. If the network is unweighted, i.e. edges indicate presence or absence of a relationship, then the 'avg.weight' will be the proportion of the number of edges to the total number of possible edges while the weighted degree simply becomes the degree. A high 'avg.weight' in this case measures how interconnected a module is in the test dataset. A high degree indicates that a node is connected to many other nodes. The interpretation of the 'cor.degree' remains unchanged between weighted and unweighted networks. If the network is directed the interpretation of the 'avg.weight' remains unchanged, while the cor.degree will measure the concordance of the node in-degree in the test network. To measure the out-degree instead, the adjacency matrices provided to the 'network' argument should be transposed.
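The network-based statistics can be sketched in base R for an undirected module. As before this is an assumption-laden illustration, not the package's code: the helper weighted_degree, the simulated networks, and the toy adjacency matrix are hypothetical.

```r
# Hypothetical helper: sum of edge weight magnitudes to all other nodes.
weighted_degree <- function(net) {
  diag(net) <- 0          # ignore self-connections
  rowSums(abs(net))
}

# Weighted case: two simulated 8-node networks built from correlations.
set.seed(1)
make_net <- function() {
  abs(cor(matrix(rnorm(30 * 8), nrow = 30) + rnorm(30)))^5
}
disc_net <- make_net()
test_net <- make_net()

avg_weight <- mean(abs(test_net[upper.tri(test_net)]))
cor_degree <- cor(weighted_degree(disc_net), weighted_degree(test_net))

# Unweighted case: nodes 1-3 form a triangle and node 4 is isolated, so
# 3 of the 6 possible edges are present.
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 0,
                1, 1, 0, 0,
                0, 0, 0, 0), nrow = 4)
edge_proportion <- mean(adj[upper.tri(adj)])
node_degree <- weighted_degree(adj)
```

In the unweighted case the average edge weight reduces to the proportion of possible edges that are present, and the weighted degree reduces to the ordinary degree.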

Sparse data:

Caution should be used when running NetRep on sparse data (i.e. where there are many zero values in the data used to infer the network). For this data, the average node contribution ('avg.contrib'), concordance of node contribution ('cor.contrib'), and module coherence ('coherence') will all be systematically underestimated due to their reliance on the Pearson correlation coefficient to calculate the node contribution.

Care should also be taken to use appropriate methods for inferring the correlation structure when the data is sparse for the same reason.

Proportional data:

Caution should be used when running NetRep on proportional data (i.e. where observations across samples all sum to the same value, e.g. 1). For this data, the average node contribution ('avg.contrib'), concordance of node contribution ('cor.contrib'), and module coherence ('coherence') will all be systematically overestimated due to their reliance on the Pearson correlation coefficient to calculate the node contribution.

Care should also be taken to use appropriate methods for inferring the correlation structure from proportional data for the same reason.

Hypothesis testing:

Three alternative hypotheses are available. "greater", the default, tests whether each module preservation statistic is larger than expected by chance. "less" tests whether each module preservation statistic is smaller than expected by chance, which may be useful for identifying modules that are extremely different in the test dataset. "two.sided" can be used to test both alternate hypotheses.

To determine whether a module preservation statistic deviates from chance, a permutation procedure is employed. Each statistic is calculated between the module in the discovery dataset and nPerm random subsets of the same size in the test dataset in order to assess the distribution of each statistic under the null hypothesis.

Two models for the null hypothesis are available. With "overlap", the default, only nodes that are present in both the discovery and test networks are used when generating null distributions. This is appropriate under an assumption that nodes that are present in the test dataset, but not present in the discovery dataset, are unobserved: that is, they may fall in the module(s) of interest in the discovery dataset if they were to be measured there. Alternatively, "all" will use all nodes in the test network when generating the null distributions.
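The difference between the two null models comes down to which nodes a random subset may be drawn from. A minimal base R sketch, with made-up node names and without any NetRep calls:

```r
# Hypothetical node sets for a discovery and a test network.
disc_nodes <- c("g1", "g2", "g3", "g4", "g5")
test_nodes <- c("g2", "g3", "g4", "g5", "g6", "g7")
module_nodes <- c("g2", "g3", "g4")   # module nodes present in the test data

# "overlap": only nodes measured in both datasets can be sampled.
pool_overlap <- intersect(test_nodes, disc_nodes)
# "all": every node in the test network can be sampled.
pool_all <- test_nodes

# One permutation: a random subset of the same size as the module.
set.seed(1)
random_subset <- sample(pool_overlap, size = length(module_nodes))
```

Repeating the last step many times, and recomputing each statistic on every random subset, gives the null distribution against which the observed value is compared.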

The number of permutations required for any given significance threshold is approximately 1 / the desired significance for one sided tests, and double that for two-sided tests. This can be calculated with requiredPerms. When nPerm is not specified, the number of permutations is automatically calculated as the number required for a Bonferroni corrected significance threshold adjusting for the total number of tests for each statistic, i.e. the total number of modules to be analysed multiplied by the number of test datasets each module is tested in. Although assessing the replication of a small number of modules calls for very few permutations, we recommend using no fewer than 1,000 as fewer permutations are unlikely to generate representative null distributions. Note: the assumption used by requiredPerms to determine the correct number of permutations breaks down when assessing the preservation of modules in a very small dataset (e.g. gene sets in a dataset with fewer than 100 genes total). However, the reported p-values will still be accurate (see permutationTest)(3).
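The rule of thumb for the number of permutations can be worked through in base R. The helper perms_needed below is hypothetical and is not the package's requiredPerms function, whose signature is not shown here; the family-wise threshold of 0.05 and the module counts are assumptions for the example.

```r
# Hypothetical helper mirroring the rule of thumb in the text.
perms_needed <- function(alpha,
                         alternative = c("greater", "less", "two.sided")) {
  alternative <- match.arg(alternative)
  multiplier <- if (alternative == "two.sided") 2 else 1
  # Round first to guard against floating-point noise in the division.
  ceiling(round(multiplier / alpha, 6))
}

n_modules <- 4
n_test_datasets <- 1
alpha <- 0.05 / (n_modules * n_test_datasets)   # Bonferroni threshold

n_one_sided <- perms_needed(alpha)                # 1 / 0.0125
n_two_sided <- perms_needed(alpha, "two.sided")   # 2 / 0.0125
n_used <- max(n_one_sided, 1000)                  # recommended floor
```

With four modules and one test dataset the threshold only calls for 80 permutations for a one-sided test, which is why the recommended floor of 1,000 matters in small analyses.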

Value

A nested list structure. At the top level, the list has one element per 'discovery' dataset. Each of these elements is a list that has one element per 'test' dataset analysed for that 'discovery' dataset. Each of these elements is also a list, containing the following objects:

observed:

A matrix of the observed values for the module preservation statistics. Rows correspond to modules, and columns to the module preservation statistics.

nulls:

A three dimensional array containing the values of the module preservation statistics evaluated on random permutations of module assignment in the test network. Rows correspond to modules, columns to the module preservation statistics, and the third dimension to the permutations.

p.values:

A matrix of p-values for the observed module preservation statistics as evaluated through a permutation test using the corresponding values in nulls.

nVarsPresent:

A vector containing the number of variables that are present in the test dataset for each module.

propVarsPresent:

A vector containing the proportion of variables present in the test dataset for each module. Modules where this is less than 1 should be investigated further before making judgements about preservation to ensure that the missing variables are not the most connected ones.

contingency:

If moduleAssignments are present for both the discovery and test datasets, then a contingency table showing the overlap between modules across datasets is returned. Rows correspond to modules in the discovery dataset, columns to modules in the test dataset.

When simplify = TRUE then the simplest possible structure will be returned. E.g. if module preservation is tested in only one dataset, then the returned list will have only the above elements.

When simplify = FALSE then a nested list of datasets will always be returned, i.e. each element at the top level and second level corresponds to a dataset, e.g. results[["Dataset1"]][["Dataset2"]] indicates an analysis where modules discovered in "Dataset1" are assessed for preservation in "Dataset2". Dataset comparisons which have not been assessed will contain NULL.
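To make the nesting concrete, the following sketch builds a mock object with the documented shape and navigates it. All values, module labels, and statistic names are invented for illustration, and taking the largest p-value per module is just one possible summary assumed here, not a documented recommendation.

```r
# Mock object mimicking the documented structure (all values are invented).
modules <- c("1", "2")
stats <- c("stat_a", "stat_b")   # placeholder statistic names
mock <- list(
  observed = matrix(c(0.9, 0.2, 0.8, 0.1), nrow = 2,
                    dimnames = list(modules, stats)),
  p.values = matrix(c(0.001, 0.40, 0.002, 0.35), nrow = 2,
                    dimnames = list(modules, stats)),
  nVarsPresent = c("1" = 50, "2" = 12),
  propVarsPresent = c("1" = 1.0, "2" = 0.6)
)

# simplify = FALSE style nesting: results[[discovery]][[test]].
# The comparison that was not assessed is NULL.
results <- list(Dataset1 = list(Dataset1 = NULL, Dataset2 = mock))

res <- results[["Dataset1"]][["Dataset2"]]
# One possible summary: the largest p-value across statistics for each module.
max_p <- apply(res$p.values, 1, max)
# Flag modules with missing variables for closer inspection.
incomplete <- names(which(res$propVarsPresent < 1))
```

With simplify = TRUE and a single comparison, the elements of res would instead sit at the top level of the returned list.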

References

  1. Ritchie, S.C., et al., A scalable permutation approach reveals replication and preservation patterns of network modules in large datasets. Cell Systems. 3, 71-82 (2016).

  2. Langfelder, P., Luo, R., Oldham, M. C. & Horvath, S. Is my network module preserved and reproducible? PLoS Comput. Biol. 7, e1001057 (2011).

  3. Phipson, B. & Smyth, G. K. Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn. Stat. Appl. Genet. Mol. Biol. 9, Article 39 (2010).

  4. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).

See Also

Functions for: visualising network modules, calculating module topology, calculating permutation test P-values, and splitting computation over multiple machines.

Examples

# Load the example data, correlation, and network matrices for a discovery
# and test dataset:
data("NetRep")

# Set up input lists for each input matrix type across datasets. The list
# elements can have any names, so long as they are consistent between the
# inputs.
network_list <- list(discovery=discovery_network, test=test_network)
data_list <- list(discovery=discovery_data, test=test_data)
correlation_list <- list(discovery=discovery_correlation, test=test_correlation)
labels_list <- list(discovery=module_labels)

# Assess module preservation: you should run at least 10,000 permutations
preservation <- modulePreservation(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, nPerm=1000, discovery="discovery",
  test="test", nThreads=2
)

Calculate the topological properties for a network module

Description

Calculates the network properties used to assess module preservation for one or more modules in a user specified dataset.

Usage

networkProperties(
  network,
  data,
  correlation,
  moduleAssignments = NULL,
  modules = NULL,
  backgroundLabel = "0",
  discovery = NULL,
  test = NULL,
  simplify = TRUE,
  verbose = TRUE
)

Arguments

network

a list of interaction networks, one for each dataset. Each entry of the list should be an n * n matrix where each element contains the edge weight between nodes i and j in the inferred network for that dataset.

data

a list of matrices, one for each dataset. Each entry of the list should be the data used to infer the interaction network for that dataset. The columns should correspond to variables in the data (nodes in the network) and rows to samples in that dataset.

correlation

a list of matrices, one for each dataset. Each entry of the list should be an n * n matrix where each element contains the correlation coefficient between nodes i and j in the data used to infer the interaction network for that dataset.

moduleAssignments

a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.

modules

a list of vectors, one for each discovery dataset, of modules to perform the analysis on. If unspecified, all modules in each discovery dataset will be analysed, with the exception of those specified in the backgroundLabel argument.

backgroundLabel

a single label given to nodes that do not belong to any module in the moduleAssignments argument. Defaults to "0". Set to NULL if you do not want to skip the network background module.

discovery

a vector of names or indices denoting the discovery dataset(s) in the data, correlation, network, moduleAssignments, modules, and test lists.

test

a list of vectors, one for each discovery dataset, of names or indices denoting the test dataset(s) in the data, correlation, and network lists.

simplify

logical; if TRUE, simplify the structure of the output list if possible (see Value).

verbose

logical; should progress be reported? Default is TRUE.

Details

Input data structures:

The preservation of network modules in a second dataset is quantified by measuring the preservation of topological properties between the discovery and test datasets. These properties are calculated not only from the interaction networks inferred in each dataset, but also from the data used to infer those networks (e.g. gene expression data) as well as the correlation structure between variables/nodes. Thus, all functions in the NetRep package have the following arguments:

network:

a list of interaction networks, one for each dataset.

data:

a list of data matrices used to infer those networks, one for each dataset.

correlation:

a list of matrices containing the pairwise correlation coefficients between variables/nodes in each dataset.

moduleAssignments:

a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.

modules:

a list of vectors, one for each discovery dataset, containing the names of the modules from that dataset to analyse.

discovery:

a vector indicating the names or indices of the previous arguments' lists to use as the discovery dataset(s) for the analyses.

test:

a list of vectors, one vector for each discovery dataset, containing the names or indices of the network, data, and correlation argument lists to use as the test dataset(s) for the analysis of each discovery dataset.

The formatting of these arguments is not strict: each function will attempt to make sense of the user input. For example, if there is only one discovery dataset, then input to the moduleAssignments and test arguments may be vectors, rather than lists. If the networkProperties are being calculated within the discovery or test datasets, then the discovery and test arguments do not need to be specified, and the input matrices for the network, data, and correlation arguments do not need to be wrapped in a list.

Analysing large datasets:

Matrices in the network, data, and correlation lists can be supplied as disk.matrix objects. This class allows matrix data to be kept on disk and loaded as required by NetRep. This dramatically decreases memory usage: the matrices for only one dataset will be kept in RAM at any point in time.

Value

A nested list structure. At the top level, the list has one element per 'discovery' dataset. Each of these elements is a list that has one element per 'test' dataset analysed for that 'discovery' dataset. Each of these elements is a list that has one element per 'modules' specified. Each of these is a list containing the following objects:

'degree':

The weighted within-module degree: the sum of edge weights for each node in the module.

'avgWeight':

The average edge weight within the module.

If the 'data' used to infer the 'test' network is provided then the following are also returned:

'summary':

A vector summarising the module across each sample. This is calculated as the first eigenvector of the module from a principal component analysis.

'contribution':

The node contribution: the similarity between each node and the module summary profile ('summary').

'coherence':

The proportion of module variance explained by the 'summary' vector.

When simplify = TRUE then the simplest possible structure will be returned. E.g. if the network properties are requested for only one module in only one dataset, then the returned list will have only the above elements.

When simplify = FALSE then a nested list of datasets will always be returned, i.e. each element at the top level and second level corresponds to a dataset, and each element at the third level will correspond to modules discovered in the dataset specified at the top level if module labels are provided in the corresponding moduleAssignments list element. E.g. results[["Dataset1"]][["Dataset2"]][["module1"]] will contain the properties of "module1" as calculated in "Dataset2", where "module1" was identified in "Dataset1". Modules and datasets for which calculation of the network properties has not been requested will contain NULL.
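The quantities listed above can be illustrated on a toy module in base R. This is a sketch under stated assumptions rather than NetRep's implementation: self-edges on the diagonal are assumed to be excluded from the degree, the node contribution is assumed to be a Pearson correlation with the summary profile, and all matrices are invented.

```r
# Toy symmetric network for one module with three nodes (invented values).
nodes <- c("a", "b", "c")
net <- matrix(c(1.0, 0.8, 0.6,
                0.8, 1.0, 0.4,
                0.6, 0.4, 1.0),
              nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes))

# Weighted within-module degree, assuming self-edges are excluded.
degree <- rowSums(net) - diag(net)
# Average edge weight over distinct node pairs.
avg_weight <- mean(net[upper.tri(net)])

# Toy data: 20 samples (rows) by 3 nodes (columns) sharing a common signal.
set.seed(42)
shared <- rnorm(20)
dat <- cbind(a = shared + rnorm(20, sd = 0.3),
             b = shared + rnorm(20, sd = 0.3),
             c = shared + rnorm(20, sd = 0.3))

# First principal component of the standardised data via SVD.
# Its sign is arbitrary, so contributions may all come out negative.
decomp <- svd(scale(dat))
summary_profile <- decomp$u[, 1]
# Assumed similarity measure: Pearson correlation with the summary profile.
contribution <- cor(dat, summary_profile)[, 1]
# Proportion of total variance explained by the first component.
coherence <- decomp$d[1]^2 / sum(decomp$d^2)
```

Here degree, avg_weight, summary_profile, contribution, and coherence play the roles of the 'degree', 'avgWeight', 'summary', 'contribution', and 'coherence' elements respectively.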

See Also

Getting nodes ordered by degree, and ordering samples by module summary.

Examples

# Load the example data, correlation, and network matrices for a discovery
# and test dataset:
data("NetRep")

# Set up input lists for each input matrix type across datasets. The list
# elements can have any names, so long as they are consistent between the
# inputs.
network_list <- list(discovery=discovery_network, test=test_network)
data_list <- list(discovery=discovery_data, test=test_data)
correlation_list <- list(discovery=discovery_correlation, test=test_correlation)
labels_list <- list(discovery=module_labels)

# Calculate the topological properties of all network modules in the
# discovery dataset
props <- networkProperties(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list
)

# Calculate the topological properties in the test dataset for the same
# modules
test_props <- networkProperties(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, discovery="discovery", test="test"
)

Order nodes in descending order of weighted degree and order modules by the similarity of their summary vectors.

Description

Order nodes in descending order of weighted degree and order modules by the similarity of their summary vectors.

Usage

nodeOrder(
  network,
  data,
  correlation,
  moduleAssignments = NULL,
  modules = NULL,
  backgroundLabel = "0",
  discovery = NULL,
  test = NULL,
  na.rm = FALSE,
  orderModules = TRUE,
  mean = FALSE,
  simplify = TRUE,
  verbose = TRUE
)

Arguments

network

a list of interaction networks, one for each dataset. Each entry of the list should be an n * n matrix where each element contains the edge weight between nodes i and j in the inferred network for that dataset.

data

a list of matrices, one for each dataset. Each entry of the list should be the data used to infer the interaction network for that dataset. The columns should correspond to variables in the data (nodes in the network) and rows to samples in that dataset.

correlation

a list of matrices, one for each dataset. Each entry of the list should be an n * n matrix where each element contains the correlation coefficient between nodes i and j in the data used to infer the interaction network for that dataset.

moduleAssignments

a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.

modules

a list of vectors, one for each discovery dataset, of modules to perform the analysis on. If unspecified, all modules in each discovery dataset will be analysed, with the exception of those specified in the backgroundLabel argument.

backgroundLabel

a single label given to nodes that do not belong to any module in the moduleAssignments argument. Defaults to "0". Set to NULL if you do not want to skip the network background module.

discovery

a vector of names or indices denoting the discovery dataset(s) in the data, correlation, network, moduleAssignments, modules, and test lists.

test

a list of vectors, one for each discovery dataset, of names or indices denoting the test dataset(s) in the data, correlation, and network lists.

na.rm

logical; if TRUE, nodes and modules present in the discovery dataset but missing from the test dataset are excluded. If FALSE, missing nodes and modules are put last in the ordering.

orderModules

logical; if TRUE, modules are ordered by clustering their summary vectors. If FALSE, modules are returned in the order provided.

mean

logical; if TRUE, node order will be calculated for each discovery dataset by averaging the weighted degree and pooling module summary vectors across the specified test datasets. If FALSE, the node order is calculated separately in each test dataset.

simplify

logical; if TRUE, simplify the structure of the output list if possible (see Value).

verbose

logical; should progress be reported? Default is TRUE.

Details

Input data structures:

The preservation of network modules in a second dataset is quantified by measuring the preservation of topological properties between the discovery and test datasets. These properties are calculated not only from the interaction networks inferred in each dataset, but also from the data used to infer those networks (e.g. gene expression data) as well as the correlation structure between variables/nodes. Thus, all functions in the NetRep package have the following arguments:

network:

a list of interaction networks, one for each dataset.

data:

a list of data matrices used to infer those networks, one for each dataset.

correlation:

a list of matrices containing the pairwise correlation coefficients between variables/nodes in each dataset.

moduleAssignments:

a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.

modules:

a list of vectors, one for each discovery dataset, containing the names of the modules from that dataset to analyse.

discovery:

a vector indicating the names or indices of the previous arguments' lists to use as the discovery dataset(s) for the analyses.

test:

a list of vectors, one vector for each discovery dataset, containing the names or indices of the network, data, and correlation argument lists to use as the test dataset(s) for the analysis of each discovery dataset.

The formatting of these arguments is not strict: each function will attempt to make sense of the user input. For example, if there is only one discovery dataset, then input to the moduleAssignments and test arguments may be vectors, rather than lists. If the nodeOrder is being calculated within the discovery or test datasets, then the discovery and test arguments do not need to be specified, and the input matrices for the network, data, and correlation arguments do not need to be wrapped in a list.

Analysing large datasets:

Matrices in the network, data, and correlation lists can be supplied as disk.matrix objects. This class allows matrix data to be kept on disk and loaded as required by NetRep. This dramatically decreases memory usage: the matrices for only one dataset will be kept in RAM at any point in time.

Mean weighted degree:

When multiple 'test' datasets are specified and 'mean' is TRUE, then the order of nodes will be determined by the average of each node's weighted degree across datasets. The weighted degree in each dataset is scaled to the node with the maximum weighted degree in that module in that dataset: this prevents differences in average edge weight across datasets from influencing the outcome (otherwise the mean would be weighted by the overall density of connections in the module). Thus, the mean weighted degree is a robust measure of a node's relative importance to a module across datasets. The mean is calculated with 'na.rm=TRUE': where a node is missing it does not contribute to the mean.
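The scaling and averaging described above can be sketched in base R. The degree values are invented and scale_to_max is a hypothetical helper; this illustrates the described calculation rather than reproducing nodeOrder itself.

```r
# Invented weighted degrees for one module in two test datasets.
deg_test1 <- c(a = 2, b = 4, c = 1)
deg_test2 <- c(a = 3, b = NA, c = 6)   # node b is missing from the second dataset

# Scale each dataset to its maximum weighted degree within the module.
scale_to_max <- function(d) d / max(d, na.rm = TRUE)
scaled <- cbind(test1 = scale_to_max(deg_test1),
                test2 = scale_to_max(deg_test2))

# Average across datasets; missing nodes do not contribute to the mean.
mean_degree <- rowMeans(scaled, na.rm = TRUE)

# Nodes in descending order of mean scaled weighted degree.
node_order <- names(sort(mean_degree, decreasing = TRUE))
```

Without the scaling step, the second dataset's larger raw edge weights would dominate the average even though both datasets should count equally.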

Value

A nested list structure. At the top level, the list has one element per 'discovery' dataset. Each of these elements is a list that has one element per 'test' dataset analysed for that 'discovery' dataset. Each of these elements is a list that has one element per 'modules' specified, containing a vector of node names for the requested module. When simplify = TRUE then the simplest possible structure will be returned. E.g. if the node ordering is requested for module(s) in only one dataset, then a single vector of node labels will be returned.

When simplify = FALSE then a nested list of datasets will always be returned, i.e. each element at the top level and second level corresponds to a dataset, and each element at the third level will correspond to modules discovered in the dataset specified at the top level if module labels are provided in the corresponding moduleAssignments list element. E.g. results[["Dataset1"]][["Dataset2"]][["module1"]] will contain the order of nodes calculated in "Dataset2", where "module1" was identified in "Dataset1". Modules and datasets for which calculation of the node order has not been requested will contain NULL.

References

  1. Langfelder, P., Mischel, P. S. & Horvath, S. When is hub gene selection better than standard meta-analysis? PLoS One 8, e61505 (2013).

See Also

networkProperties

Examples

# Load the example data, correlation, and network matrices for a discovery
# and test dataset:
data("NetRep")

# Set up input lists for each input matrix type across datasets. The list
# elements can have any names, so long as they are consistent between the
# inputs.
network_list <- list(discovery=discovery_network, test=test_network)
data_list <- list(discovery=discovery_data, test=test_data)
correlation_list <- list(discovery=discovery_correlation, test=test_correlation)
labels_list <- list(discovery=module_labels)

# Sort modules by similarity and nodes within each module by their weighted
# degree
nodes <- nodeOrder(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list
)

Template parameters

Description

Template parameters to be imported into other function documentation. This is not intended to be a stand-alone help file.

Arguments

orderModules

logical; if TRUE, modules are ordered by clustering their summary vectors. If FALSE, modules are returned in the order provided.


Permutation test P-values for module preservation statistics

Description

Evaluates the statistical significance of each module preservation test statistic for one or more modules.

Usage

permutationTest(
  nulls,
  observed,
  nVarsPresent,
  totalSize,
  alternative = "greater"
)

Arguments

nulls

a three-dimensional array where the columns correspond to module preservation statistics, rows correspond to modules, and the third dimension to null distribution observations drawn from the permutation procedure in modulePreservation.

observed

a matrix of observed values for each module preservation statistic (columns) for each module (rows) returned from modulePreservation.

nVarsPresent

a vector containing the number of variables/nodes in each module that were present in the test dataset. Returned as a list element of the same name by modulePreservation.

totalSize

the size of the test network used to perform the test. Returned as a list element of the same name by modulePreservation.

alternative

a character string specifying the alternative hypothesis, must be one of "greater" (default), "less", or "two.sided". You can specify just the initial letter.

Details

Calculates exact p-values for permutation tests when permutations are randomly drawn with replacement using the permp function in the statmod package.

This function may be useful for re-calculating permutation test P-values, for example when there are missing values due to sparse data. In this case the user may decide that these missing values should be assigned 0 so that P-values aren't significant purely due to many incalculable statistics leading to low power.
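For intuition, the idea that a permutation P-value should never be zero can be sketched with the simple (b + 1) / (m + 1) estimator discussed by Phipson & Smyth. This is an assumption-laden illustration: perm_p_greater is a hypothetical helper, and it is not the exact calculation that permp performs.

```r
# Hypothetical helper: one-sided ("greater") permutation P-value using the
# simple (b + 1) / (m + 1) estimator, where b is the number of null values at
# least as large as the observed statistic and m is the number of permutations.
perm_p_greater <- function(null_values, observed) {
  b <- sum(null_values >= observed)
  (b + 1) / (length(null_values) + 1)
}

null_values <- c(0.1, 0.2, 0.3, 0.4)   # invented null distribution
p_mid <- perm_p_greater(null_values, observed = 0.35)
p_extreme <- perm_p_greater(null_values, observed = 0.9)
```

Even when the observed value exceeds every null observation, the estimate is bounded below by 1 / (m + 1) rather than being reported as zero.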

References

  1. Phipson, B. & Smyth, G. K. Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn. Stat. Appl. Genet. Mol. Biol. 9, Article 39 (2010).

Examples

data("NetRep")

# Set up input lists for each input matrix type across datasets. The list
# elements can have any names, so long as they are consistent between the
# inputs.
network_list <- list(discovery=discovery_network, test=test_network)
data_list <- list(discovery=discovery_data, test=test_data)
correlation_list <- list(discovery=discovery_correlation, test=test_correlation)
labels_list <- list(discovery=module_labels)

# Note that we recommend running at least 10,000 permutations to make sure
# that the null distributions are representative.
preservation <- modulePreservation(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, nPerm=1000, discovery="discovery",
  test="test"
)

# Re-calculate the permutation test P-values
p.values <- permutationTest(
  preservation$nulls, preservation$observed, preservation$nVarsPresent,
  preservation$totalSize, preservation$alternative
)

Plot the topology of a network module

Description

Plot the correlation structure, network edges, scaled weighted degree, node contribution, module data, and module summary vectors of one or more network modules.

Individual components of the module plot can be plotted using plotCorrelation, plotNetwork, plotDegree, plotContribution, plotData, and plotSummary.

Usage

plotModule(
  network,
  data,
  correlation,
  moduleAssignments = NULL,
  modules = NULL,
  backgroundLabel = "0",
  discovery = NULL,
  test = NULL,
  verbose = TRUE,
  orderSamplesBy = NULL,
  orderNodesBy = NULL,
  orderModules = TRUE,
  plotNodeNames = TRUE,
  plotSampleNames = TRUE,
  plotModuleNames = NULL,
  main = "Module Topology",
  main.line = 1,
  drawBorders = FALSE,
  lwd = 1,
  naxt.line = -0.5,
  saxt.line = -0.5,
  maxt.line = NULL,
  xaxt.line = -0.5,
  xaxt.tck = -0.025,
  xlab.line = 2.5,
  yaxt.line = 0,
  yaxt.tck = -0.15,
  ylab.line = 2.5,
  laxt.line = 2.5,
  laxt.tck = 0.04,
  cex.axis = 0.8,
  legend.main.line = 1.5,
  cex.lab = 1.2,
  cex.main = 2,
  dataCols = NULL,
  dataRange = NULL,
  corCols = correlation.palette(),
  corRange = c(-1, 1),
  netCols = network.palette(),
  netRange = c(0, 1),
  degreeCol = "#feb24c",
  contribCols = c("#A50026", "#313695"),
  summaryCols = c("#1B7837", "#762A83"),
  naCol = "#bdbdbd",
  dryRun = FALSE
)

Arguments

network

a list of interaction networks, one for each dataset. Each entry of the list should be an n * n matrix where each element contains the edge weight between nodes i and j in the inferred network for that dataset.

data

a list of matrices, one for each dataset. Each entry of the list should be the data used to infer the interaction network for that dataset. The columns should correspond to variables in the data (nodes in the network) and rows to samples in that dataset.

correlation

a list of matrices, one for each dataset. Each entry of the list should be an n * n matrix where each element contains the correlation coefficient between nodes i and j in the data used to infer the interaction network for that dataset.

moduleAssignments

a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.

modules

a list of vectors, one for each discovery dataset, of modules to perform the analysis on. If unspecified, all modules in each discovery dataset will be analysed, with the exception of those specified in the backgroundLabel argument.

backgroundLabel

a single label given to nodes that do not belong to any module in the moduleAssignments argument. Defaults to "0". Set to NULL if you do not want to skip the network background module.

discovery

a vector of names or indices denoting the discovery dataset(s) in the data, correlation, network, moduleAssignments, modules, and test lists.

test

a list of vectors, one for each discovery dataset, of names or indices denoting the test dataset(s) in the data, correlation, and network lists.

verbose

logical; should progress be reported? Default is TRUE.

orderSamplesBy

NULL (default), NA, or a vector containing a single dataset name or index. Controls how samples are ordered on the plot (see details).

orderNodesBy

NULL (default), NA, or a vector of dataset names or indices. Controls how nodes are ordered on the plot (see details).

orderModules

logical; if TRUE, modules are ordered by clustering their summary vectors. If FALSE, modules are returned in the order provided.

plotNodeNames

logical; controls whether the node names are drawn on the bottom axis.

plotSampleNames

logical; controls whether the sample names are drawn on the left axis.

plotModuleNames

logical; controls whether module names are drawn. The default is for module names to be drawn when multiple modules are drawn.

main

title for the plot.

main.line

the number of lines into the top margin at which the plot title will be drawn.

drawBorders

logical; if TRUE, borders are drawn around the weighted degree, node contribution, and module summary bar plots.

lwd

line width for borders and axes.

naxt.line

the number of lines into the bottom margin at which the node names will be drawn.

saxt.line

the number of lines into the left margin at which the sample names will be drawn.

maxt.line

the number of lines into the bottom margin at which the module names will be drawn.

xaxt.line

the number of lines into the bottom margin at which the x-axis tick labels will be drawn on the module summary bar plot.

xaxt.tck

the size of the x-axis ticks for the module summary bar plot.

xlab.line

the number of lines into the bottom margin at which the x axis label on the module summary bar plot(s) will be drawn.

yaxt.line

the number of lines into the left margin at which the y-axis tick labels will be drawn on the weighted degree and node contribution bar plots.

yaxt.tck

the size of the y-axis ticks for the weighted degree and node contribution bar plots.

ylab.line

the number of lines into the left margin at which the y axis labels on the weighted degree and node contribution bar plots will be drawn.

laxt.line

the distance from the legend to draw the legend axis labels, as a multiple of laxt.tck.

laxt.tck

size of the ticks on each axis legend relative to the size of the correlation, edge weights, and data matrix heatmaps.

cex.axis

relative size of the node and sample names.

legend.main.line

the distance from the legend to draw the legend title.

cex.lab

relative size of the module names and legend titles.

cex.main

relative size of the plot titles.

dataCols

a character vector of colors to create a gradient from for the data heatmap (see details). Automatically determined if NA or NULL.

dataRange

the range of values to map to the dataCols gradient (see details). Automatically determined if NA or NULL.

corCols

a character vector of colors to create a gradient from for the correlation structure heatmap (see details).

corRange

the range of values to map to the corCols gradient (see details).

netCols

a character vector of colors to create a gradient from for the network edge weight heatmap (see details).

netRange

the range of values to map to the netCols gradient (see details). Automatically determined if NA or NULL.

degreeCol

color to use for the weighted degree bar plot.

contribCols

color(s) to use for the node contribution bar plot (see details).

summaryCols

color(s) to use for the module summary bar plot (see details).

naCol

color to use for missing nodes and samples on the data, correlation structure, and network edge weight heat maps.

dryRun

logical; if TRUE, only the axes and labels will be drawn.

Details

Input data structures:

The preservation of network modules in a second dataset is quantified by measuring the preservation of topological properties between the discovery and test datasets. These properties are calculated not only from the interaction networks inferred in each dataset, but also from the data used to infer those networks (e.g. gene expression data) as well as the correlation structure between variables/nodes. Thus, all functions in the NetRep package have the following arguments:

network:

a list of interaction networks, one for each dataset.

data:

a list of data matrices used to infer those networks, one for each dataset.

correlation:

a list of matrices containing the pairwise correlation coefficients between variables/nodes in each dataset.

moduleAssignments:

a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.

modules:

a list of vectors, one for each discovery dataset, containing the names of the modules from that dataset to analyse.

discovery:

a vector indicating the names or indices of the previous arguments' lists to use as the discovery dataset(s) for the analyses.

test:

a list of vectors, one vector for each discovery dataset, containing the names or indices of the network, data, and correlation argument lists to use as the test dataset(s) for the analysis of each discovery dataset.

The formatting of these arguments is not strict: each function will attempt to make sense of the user input. For example, if there is only one discovery dataset, then input to the moduleAssignments and test arguments may be vectors, rather than lists. If the node and sample ordering is being calculated within the same dataset being visualised, then the discovery and test arguments do not need to be specified, and the input matrices for the network, data, and correlation arguments do not need to be wrapped in a list.

Analysing large datasets:

Matrices in the network, data, and correlation lists can be supplied as disk.matrix objects. This class allows matrix data to be kept on disk and loaded as required by NetRep. This dramatically decreases memory usage: the matrices for only one dataset will be kept in RAM at any point in time.

Node, sample, and module ordering:

By default, nodes are ordered in decreasing order ofweighted degreein thediscovery dataset (seenodeOrder). Missing nodes are colored in grey. This facilitates the visual comparison of modules across datasets, as the node ordering will be preserved.

Alternatively, a vector containing the names or indices of one or moredatasets can be provided to theorderNodesBy argument.

If a single dataset is provided, then nodes will be ordered in decreasing order ofweighted degree in that dataset. Only nodes that are present in this dataset will be drawn when ordering nodes by a dataset that is not thediscovery dataset for the requested modules(s).

If multiple datasets are provided then theweighted degree will beaveraged across these datasets (seenodeOrder for more details). This is useful for obtaining a robust ordering of nodes by relative importance, assuming the modules displayed are preserved in those datasets.

Ordering of nodes byweighted degree can be suppressed by settingorderNodesBy toNA, in which case nodes will be ordered as in the matrices provided in thenetwork,data, andcorrelation arguments.

When multiple modules are drawn, modules are ordered by the similarityof their summary vectors in the dataset(s) specified inorderNodesByargument. If multiple datasets are provided to theorderNodesByargument then the module summary vectors are concatenated across datasets.

By default, samples in the data heatmap and accompanying module summary barplot are ordered in descending order ofmodule summary in the drawn dataset (specified by thetest argument). If multiple modules are drawn, samples are ordered as per the left-most module on the plot.

Alternatively, a vector containing the name or index of another dataset may be provided to the orderSamplesBy argument. In this case, samples will be ordered in descending order of module summary in the specified dataset. This is useful when comparing different measurements across samples, for example, gene expression data obtained from multiple tissue samples across the same individuals. If the dataset specified is the discovery dataset, then missing samples will be displayed as horizontal grey bars. If the dataset specified is one of the other datasets, samples present in both the specified dataset and the test dataset will be displayed first in order of the specified dataset, then samples present in only the test dataset will be displayed underneath a horizontal black line, ordered by their module summary vector in the test dataset.
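The last case above (ordering by a dataset that is neither the discovery nor the drawn dataset) can be sketched in base R. This is only an illustration of the described rule, not NetRep's implementation: the sample names and module summary values are made up, and it assumes the test-only samples are also sorted in descending order.

```r
# Hypothetical module summary values. `summary_other` belongs to the dataset
# passed to orderSamplesBy, `summary_test` to the drawn (test) dataset.
summary_other <- c(S1 = 0.9, S2 = -0.3, S3 = 0.4)
summary_test  <- c(S2 = 0.1, S3 = -0.5, S4 = 0.8, S5 = -0.2)

# Samples present in both datasets come first, in the order given by the
# specified dataset (descending module summary).
other_order <- names(sort(summary_other, decreasing = TRUE))
shared <- intersect(other_order, names(summary_test))

# Samples found only in the test dataset follow, ordered by their own
# module summary in the test dataset (descending order is an assumption).
test_only <- setdiff(names(summary_test), names(summary_other))
test_only <- test_only[order(summary_test[test_only], decreasing = TRUE)]

sample_order <- c(shared, test_only)
```

Here S1 is dropped because it is absent from the drawn dataset, and S4 and S5 would appear below the horizontal black line.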

Ordering of samples by module summary can be suppressed by setting orderSamplesBy to NA, in which case samples will be ordered as in the matrix provided to the data argument for the drawn dataset.

Weighted degree scaling:

When drawn on a plot, the weighted degree of each node is scaled to the maximum weighted degree within its module. The scaled weighted degree is a measure of the relative importance of each node to its module. This makes visualisation of multiple modules with different sizes and densities possible. However, the scaled weighted degree should only be interpreted for groups of nodes that have an apparent module structure.
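As a rough illustration of this scaling, the base R sketch below computes a weighted degree for each node within its module and divides it by the module maximum. It assumes the weighted degree is the sum of a node's edge weights to the other nodes in the same module. The toy network, the module labels, and the helper name scaled_degree are hypothetical and not part of NetRep.

```r
# Toy symmetric edge-weight matrix for five nodes (hypothetical values).
net <- matrix(
  c(0.0, 0.8, 0.6, 0.1, 0.0,
    0.8, 0.0, 0.4, 0.0, 0.1,
    0.6, 0.4, 0.0, 0.1, 0.0,
    0.1, 0.0, 0.1, 0.0, 0.2,
    0.0, 0.1, 0.0, 0.2, 0.0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("N", 1:5), paste0("N", 1:5))
)
modules <- c(N1 = "1", N2 = "1", N3 = "1", N4 = "2", N5 = "2")

# Hypothetical helper: weighted degree within a module, scaled to the
# maximum weighted degree observed in that module.
scaled_degree <- function(net, modules) {
  out <- numeric(length(modules))
  names(out) <- names(modules)
  for (m in unique(modules)) {
    nodes <- names(modules)[modules == m]
    sub <- net[nodes, nodes, drop = FALSE]
    diag(sub) <- 0            # ignore self-edges
    wd <- rowSums(sub)        # assumed definition of weighted degree
    out[nodes] <- wd / max(wd)
  }
  out
}

scaled <- scaled_degree(net, modules)
```

The most connected node of every module receives a value of 1, so bar heights are comparable across modules even when their sizes and densities differ.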

Plot layout and device size

For optimal results we recommend viewing single modules on a PNG device with a width of 1500, a height of 2700 and a nominal resolution of 300 (png(filename, width=5*300, height=9*300, res=300)).

Warning: PDF and other vectorized devices should not be used when plotting more than a hundred nodes. Large files will be generated which may cause image editing programs such as Inkscape or Illustrator to crash when polishing figures for publication.

When dryRun is TRUE only the axes, legends, labels, and title will be drawn, allowing for quick iteration of customisable parameters to get the plot layout correct.

If axis labels or legends are drawn off screen then the margins of the plot should be adjusted prior to plotting using the par command to increase the margin size (see the "mar" option in the par help page).
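A minimal base R sketch of this margin adjustment follows. The margin values are arbitrary, and the null PDF device is used only so that the snippet runs without writing a file; in practice the plotting call would go where the placeholder comment is.

```r
# Open a null PDF device so the sketch runs without creating a file.
pdf(NULL)

# par() returns the previous settings, which makes them easy to restore.
# Margins are given in lines: bottom, left, top, right.
old_par <- par(mar = c(6, 6, 4, 6))
new_mar <- par("mar")

# ... call plotModule() or one of the other plotting functions here ...

# Restore the original margins and close the device.
par(old_par)
restored_mar <- par("mar")
invisible(dev.off())
```

Saving the return value of par() and passing it back afterwards keeps the larger margins from leaking into later plots on the same device.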

The size of text labels can be modified by increasing or decreasing the cex.main, cex.lab, and cex.axis arguments:

cex.main:

controls the size of the plot title (specified in the main argument).

cex.lab:

controls the size of the axis labels on the weighted degree, node contribution, and module summary bar plots as well as the size of the module labels and the heatmap legend titles.

cex.axis:

controls the size of the axis tick labels, including the node and sample labels.

The position of these labels can be changed through the following arguments:

xaxt.line:

controls the distance from the plot the x-axis tick labels are drawn on the module summary bar plot.

xlab.line:

controls the distance from the plot the x-axis label is drawn on the module summary bar plot.

yaxt.line:

controls the distance from the plot the y-axis tick labels are drawn on the weighted degree and node contribution bar plots.

ylab.line:

controls the distance from the plot the y-axis label is drawn on the weighted degree and node contribution bar plots.

main.line:

controls the distance from the plot the title is drawn.

naxt.line:

controls the distance from the plot the node labels are drawn.

saxt.line:

controls the distance from the plot the sample labels are drawn.

maxt.line:

controls the distance from the plot the module labels are drawn.

laxt.line:

controls the distance from the heatmap legends that the gradient legend labels are drawn.

legend.main.line:

controls the distance from the heatmap legends that the legend title is drawn.

The rendering of node, sample, and module names can be disabled by setting plotNodeNames, plotSampleNames, and plotModuleNames to FALSE.

The size of the axis ticks can be changed by increasing or decreasing the following arguments:

xaxt.tck:

size of the x-axis ticks as a multiple of the height of the module summary bar plot.

yaxt.tck:

size of the y-axis ticks as a multiple of the width of the weighted degree or node contribution bar plots.

laxt.tck:

size of the heatmap legend axis ticks as a multiple of the width of the data, correlation structure, or network edge weight heatmaps.

The drawBorders argument controls whether borders are drawn around the weighted degree, node contribution, or module summary bar plots. The lwd argument controls the thickness of these borders, as well as the thickness of axes and axis ticks.

Modifying the color palettes:

The dataCols and dataRange arguments control the appearance of the data heatmap (see plotData). The gradient of colors used on the heatmap can be changed by specifying a vector of colors to interpolate between in dataCols, and dataRange specifies the range of values that maps to this gradient. Values outside of the specified dataRange will be rendered with the colors used at either extreme of the gradient. The default gradient is determined based on the data shown on the plot. If all values in the data matrix are positive, then the gradient is interpolated between white and green, where white is used for the smallest value and green for the largest. If all values are negative, then the gradient is interpolated between purple and white, where purple is used for the smallest value and white for the value closest to zero. If the data contains both positive and negative values, then the gradient is interpolated between purple, white, and green, where white is used for values of zero. In this case the range shown is always centered at zero, with the values at either extreme determined by the value in the rendered data with the strongest magnitude (the maximum of the absolute value).
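The default rule described above can be written out as a short base R sketch. The helper default_data_gradient is hypothetical and is not a NetRep function; the color names are placeholders for whichever shades the package actually uses, and treating zeros as belonging to either the all-positive or all-negative case is an assumption.

```r
# Hypothetical helper mirroring the default rule for the data heatmap:
# pick a gradient and a value range from the sign of the data.
default_data_gradient <- function(x) {
  x <- x[!is.na(x)]
  if (all(x >= 0)) {
    # All positive: white for the smallest value, green for the largest.
    list(cols = c("white", "green"), range = range(x))
  } else if (all(x <= 0)) {
    # All negative: purple for the smallest value, white for the value
    # closest to zero.
    list(cols = c("purple", "white"), range = range(x))
  } else {
    # Mixed signs: centre the range at zero using the largest magnitude.
    lim <- max(abs(x))
    list(cols = c("purple", "white", "green"), range = c(-lim, lim))
  }
}

mixed    <- default_data_gradient(c(-2, 0.5, 1))
positive <- default_data_gradient(c(1, 3))
```

For the mixed-sign input the range is symmetric about zero, so white always corresponds to a value of zero regardless of which tail is longer.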

The corCols and corRange arguments control the appearance of the correlation structure heatmap (see plotCorrelation). The gradient of colors used on the heatmap can be changed by specifying a vector of colors to interpolate between in corCols. By default, strong negative correlations are shown in blue, strong positive correlations in red, and weak correlations in white. corRange controls the range of values that this gradient maps to, by default -1 to 1. Changing this may be useful for showing differences where the range of correlation coefficients is small.

The netCols and netRange arguments control the appearance of the network edge weight heatmap (see plotNetwork). The gradient of colors used on the heatmap can be changed by specifying a vector of colors to interpolate between in netCols. By default, weak or non-edges are shown in white, while strong edges are shown in red. The netRange controls the range of values this gradient maps to, by default 0 to 1. If netRange is set to NA, then the gradient will be mapped to values between 0 and the maximum edge weight of the shown network.

The degreeCol argument controls the color of the weighted degree bar plot (see plotDegree).

The contribCols argument controls the color of the node contribution bar plot (see plotContribution). This can be specified as a single value to be used for all nodes, or as two colors: one to use for nodes with positive contributions and one to use for nodes with negative contributions.

The summaryCols argument controls the color of the module summary bar plot (see plotSummary). This can be specified as a single value to be used for all samples, or as two colors: one to use for samples with a positive module summary value and one for samples with a negative module summary value.

The naCol argument controls the color of missing nodes and samples on the data, correlation structure, and network edge weight heatmaps.

Embedding in Rmarkdown documents

The chunk option fig.keep="last" should be set to avoid an empty plot being embedded above the plot generated by plotModule. This empty plot is generated so that an error will be thrown as early as possible if the margins are too small to be displayed. Normally, it is drawn over with the actual plot components when drawing the plot on other graphical devices.

See Also

plotCorrelation, plotNetwork, plotDegree, plotContribution, plotData, and plotSummary.

Examples

# load in example data, correlation, and network matrices for a discovery
# and test dataset:
data("NetRep")

# Set up input lists for each input matrix type across datasets. The list
# elements can have any names, so long as they are consistent between the
# inputs.
network_list <- list(discovery=discovery_network, test=test_network)
data_list <- list(discovery=discovery_data, test=test_data)
correlation_list <- list(discovery=discovery_correlation, test=test_correlation)
labels_list <- list(discovery=module_labels)

# Plot module 1, 2 and 4 in the discovery dataset
plotModule(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 2, 4)
)

# Now plot them in the test dataset (module 2 does not replicate)
plotModule(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 2, 4), discovery="discovery",
  test="test"
)

# Plot modules 1 and 4, which replicate, in the test dataset ordering nodes
# by weighted degree averaged across the two datasets
plotModule(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 4), discovery="discovery",
  test="test", orderNodesBy=c("discovery", "test")
)

Plot a topological feature of a network module

Description

Functions for plotting the topology of a network module.

Usage

plotData(
  network,
  data,
  correlation,
  moduleAssignments = NULL,
  modules = NULL,
  backgroundLabel = "0",
  discovery = NULL,
  test = NULL,
  verbose = TRUE,
  orderSamplesBy = NULL,
  orderNodesBy = NULL,
  orderModules = TRUE,
  plotNodeNames = TRUE,
  plotSampleNames = TRUE,
  plotModuleNames = NULL,
  main = "",
  main.line = 1,
  lwd = 1,
  plotLegend = TRUE,
  legend.main = "Data",
  legend.main.line = 1,
  naxt.line = -0.5,
  saxt.line = -0.5,
  maxt.line = 3,
  legend.position = 0.15,
  laxt.tck = 0.03,
  laxt.line = 3,
  cex.axis = 0.8,
  cex.lab = 1.2,
  cex.main = 2,
  dataCols = NULL,
  dataRange = NULL,
  naCol = "#bdbdbd",
  dryRun = FALSE
)

plotCorrelation(
  network,
  data,
  correlation,
  moduleAssignments = NULL,
  modules = NULL,
  backgroundLabel = "0",
  discovery = NULL,
  test = NULL,
  verbose = TRUE,
  orderNodesBy = NULL,
  symmetric = FALSE,
  orderModules = TRUE,
  plotNodeNames = TRUE,
  plotModuleNames = NULL,
  main = "",
  main.line = 1,
  lwd = 1,
  plotLegend = TRUE,
  legend.main = "Correlation",
  legend.main.line = 1,
  naxt.line = -0.5,
  maxt.line = 3,
  legend.position = NULL,
  laxt.tck = NULL,
  laxt.line = NULL,
  cex.axis = 0.8,
  cex.lab = 1.2,
  cex.main = 2,
  corCols = correlation.palette(),
  corRange = c(-1, 1),
  naCol = "#bdbdbd",
  dryRun = FALSE
)

plotNetwork(
  network,
  data,
  correlation,
  moduleAssignments = NULL,
  modules = NULL,
  backgroundLabel = "0",
  discovery = NULL,
  test = NULL,
  verbose = TRUE,
  orderNodesBy = NULL,
  symmetric = FALSE,
  orderModules = TRUE,
  plotNodeNames = TRUE,
  plotModuleNames = NULL,
  main = "",
  main.line = 1,
  lwd = 1,
  plotLegend = TRUE,
  legend.main = "Edge weight",
  legend.main.line = 1,
  naxt.line = -0.5,
  maxt.line = 3,
  legend.position = NULL,
  laxt.tck = NULL,
  laxt.line = NULL,
  cex.axis = 0.8,
  cex.lab = 1.2,
  cex.main = 2,
  netCols = network.palette(),
  netRange = c(0, 1),
  naCol = "#bdbdbd",
  dryRun = FALSE
)

plotContribution(
  network,
  data,
  correlation,
  moduleAssignments = NULL,
  modules = NULL,
  backgroundLabel = "0",
  discovery = NULL,
  test = NULL,
  verbose = TRUE,
  orderNodesBy = NULL,
  orderModules = TRUE,
  plotNodeNames = TRUE,
  plotModuleNames = NULL,
  main = "",
  main.line = 1,
  ylab.line = 2.5,
  lwd = 1,
  drawBorders = FALSE,
  naxt.line = -0.5,
  maxt.line = 3,
  yaxt.line = 0,
  yaxt.tck = -0.035,
  cex.axis = 0.8,
  cex.lab = 1.2,
  cex.main = 2,
  contribCols = c("#A50026", "#313695"),
  naCol = "#bdbdbd",
  dryRun = FALSE
)

plotDegree(
  network,
  data,
  correlation,
  moduleAssignments = NULL,
  modules = NULL,
  backgroundLabel = "0",
  discovery = NULL,
  test = NULL,
  verbose = TRUE,
  orderNodesBy = NULL,
  orderModules = TRUE,
  plotNodeNames = TRUE,
  plotModuleNames = NULL,
  main = "",
  main.line = 1,
  lwd = 1,
  drawBorders = FALSE,
  naxt.line = -0.5,
  maxt.line = 3,
  yaxt.line = 0,
  yaxt.tck = -0.035,
  ylab.line = 2.5,
  cex.axis = 0.8,
  cex.lab = 1.2,
  cex.main = 2,
  degreeCol = "#feb24c",
  naCol = "#bdbdbd",
  dryRun = FALSE
)

plotSummary(
  network,
  data,
  correlation,
  moduleAssignments = NULL,
  modules = NULL,
  backgroundLabel = "0",
  discovery = NULL,
  test = NULL,
  verbose = TRUE,
  orderSamplesBy = NULL,
  orderNodesBy = NULL,
  orderModules = TRUE,
  plotSampleNames = TRUE,
  plotModuleNames = NULL,
  main = "",
  main.line = 1,
  xlab.line = 2.5,
  lwd = 1,
  drawBorders = FALSE,
  saxt.line = -0.5,
  maxt.line = 0,
  xaxt.line = 0,
  xaxt.tck = -0.025,
  cex.axis = 0.8,
  cex.lab = 1.2,
  cex.main = 2,
  summaryCols = c("#1B7837", "#762A83"),
  naCol = "#bdbdbd",
  dryRun = FALSE
)

Arguments

network

a list of interaction networks, one for each dataset. Each entry of the list should be an n * n matrix where each element contains the edge weight between nodes i and j in the inferred network for that dataset.

data

a list of matrices, one for each dataset. Each entry of the list should be the data used to infer the interaction network for that dataset. The columns should correspond to variables in the data (nodes in the network) and rows to samples in that dataset.

correlation

a list of matrices, one for each dataset. Each entry of the list should be an n * n matrix where each element contains the correlation coefficient between nodes i and j in the data used to infer the interaction network for that dataset.

moduleAssignments

a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.

modules

a list of vectors, one for each discovery dataset, of modules to perform the analysis on. If unspecified, all modules in each discovery dataset will be analysed, with the exception of those specified in the backgroundLabel argument.

backgroundLabel

a single label given to nodes that do not belong to any module in the moduleAssignments argument. Defaults to "0". Set to NULL if you do not want to skip the network background module.

discovery

a vector of names or indices denoting the discovery dataset(s) in the data, correlation, network, moduleAssignments, modules, and test lists.

test

a list of vectors, one for each discovery dataset, of names or indices denoting the test dataset(s) in the data, correlation, and network lists.

verbose

logical; should progress be reported? Default is TRUE.

orderSamplesBy

NULL (default), NA, or a vector containing a single dataset name or index. Controls how samples are ordered on the plot (see details).

orderNodesBy

NULL (default), NA, or a vector of dataset names or indices. Controls how nodes are ordered on the plot (see details).

orderModules

logical; if TRUE, modules are ordered by clustering their summary vectors. If FALSE, modules are returned in the order provided.

plotNodeNames

logical; controls whether the node names are drawn on the bottom axis.

plotSampleNames

logical; controls whether the sample names are drawn on the left axis.

plotModuleNames

logical; controls whether module names are drawn. The default is for module names to be drawn when multiple modules are drawn.

main

title for the plot.

main.line

the number of lines into the top margin at which the plot title will be drawn.

lwd

line width for borders and axes.

plotLegend

logical; controls whether a legend is drawn when using plotCorrelation, plotNetwork, or plotData.

legend.main

title for the legend.

legend.main.line

the distance from the legend to draw the legend title.

naxt.line

the number of lines into the bottom margin at which the node names will be drawn.

saxt.line

the number of lines into the left margin at which the sample names will be drawn.

maxt.line

the number of lines into the bottom margin at which the module names will be drawn.

legend.position

the distance from the plot to start the legend, as a proportion of the plot width.

laxt.tck

size of the ticks on each axis legend relative to the size of the correlation, edge weights, and data matrix heatmaps.

laxt.line

the distance from the legend to draw the legend axis labels, as a multiple of laxt.tck.

cex.axis

relative size of the node and sample names.

cex.lab

relative size of the module names and legend titles.

cex.main

relative size of the plot titles.

dataCols

a character vector of colors to create a gradient from for the data heatmap (see details). Automatically determined if NA or NULL.

dataRange

the range of values to map to the dataCols gradient (see details). Automatically determined if NA or NULL.

naCol

color to use for missing nodes and samples on the data, correlation structure, and network edge weight heat maps.

dryRun

logical; if TRUE, only the axes and labels will be drawn.

symmetric

logical; controls whether the correlation and network heatmaps are drawn as symmetric (square) heatmaps or asymmetric triangle heatmaps. If symmetric, then the node and module names will also be rendered on the left axis.

corCols

a character vector of colors to create a gradient from for the correlation structure heatmap (see details).

corRange

the range of values to map to the corCols gradient (see details).

netCols

a character vector of colors to create a gradient from for the network edge weight heatmap (see details).

netRange

the range of values to map to the netCols gradient (see details). Automatically determined if NA or NULL.

ylab.line

the number of lines into the left margin at which the y axis labels on the weighted degree and node contribution bar plots will be drawn.

drawBorders

logical; if TRUE, borders are drawn around the weighted degree, node contribution, and module summary bar plots.

yaxt.line

the number of lines into the left margin at which the y-axis tick labels will be drawn on the weighted degree and node contribution bar plots.

yaxt.tck

the size of the y-axis ticks for the weighted degree and node contribution bar plots.

contribCols

color(s) to use for the node contribution bar plot (see details).

degreeCol

color to use for the weighted degree bar plot.

xlab.line

the number of lines into the bottom margin at which the x axis label on the module summary bar plot(s) will be drawn.

xaxt.line

the number of lines into the bottom margin at which the x-axis tick labels will be drawn on the module summary bar plot.

xaxt.tck

the size of the x-axis ticks for the module summary bar plot.

summaryCols

color(s) to use for the module summary bar plot (see details).

Details

Input data structures:

The preservation of network modules in a second dataset is quantified by measuring the preservation of topological properties between the discovery and test datasets. These properties are calculated not only from the interaction networks inferred in each dataset, but also from the data used to infer those networks (e.g. gene expression data) as well as the correlation structure between variables/nodes. Thus, all functions in the NetRep package have the following arguments:

network:

a list of interaction networks, one for each dataset.

data:

a list of data matrices used to infer those networks, one for each dataset.

correlation:

a list of matrices containing the pairwise correlation coefficients between variables/nodes in each dataset.

moduleAssignments:

a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.

modules:

a list of vectors, one for each discovery dataset, containing the names of the modules from that dataset to analyse.

discovery:

a vector indicating the names or indices of the previous arguments' lists to use as the discovery dataset(s) for the analyses.

test:

a list of vectors, one vector for each discovery dataset, containing the names or indices of the network, data, and correlation argument lists to use as the test dataset(s) for the analysis of each discovery dataset.

The formatting of these arguments is not strict: each function will attempt to make sense of the user input. For example, if there is only one discovery dataset, then input to the moduleAssignments and test arguments may be vectors, rather than lists. If the node and sample ordering is being calculated within the same dataset being visualised, then the discovery and test arguments do not need to be specified, and the input matrices for the network, data, and correlation arguments do not need to be wrapped in a list.

Analysing large datasets:

Matrices in the network, data, and correlation lists can be supplied as disk.matrix objects. This class allows matrix data to be kept on disk and loaded as required by NetRep. This dramatically decreases memory usage: the matrices for only one dataset will be kept in RAM at any point in time.

Node, sample, and module ordering:

By default, nodes are ordered in decreasing order of weighted degree in the discovery dataset (see nodeOrder). Missing nodes are colored in grey. This facilitates the visual comparison of modules across datasets, as the node ordering will be preserved.

Alternatively, a vector containing the names or indices of one or more datasets can be provided to the orderNodesBy argument.

If a single dataset is provided, then nodes will be ordered in decreasing order of weighted degree in that dataset. Only nodes that are present in this dataset will be drawn when ordering nodes by a dataset that is not the discovery dataset for the requested module(s).

If multiple datasets are provided then the weighted degree will be averaged across these datasets (see nodeOrder for more details). This is useful for obtaining a robust ordering of nodes by relative importance, assuming the modules displayed are preserved in those datasets.
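The effect of averaging can be seen in a small base R sketch. The weighted degree values are invented for illustration, and the plain mean used here is an assumption: nodeOrder may scale the degrees or treat missing nodes differently.

```r
# Hypothetical weighted degrees of the same four nodes in two datasets.
degree_discovery <- c(A = 2.0, B = 1.5, C = 0.5, D = 1.0)
degree_test      <- c(A = 1.0, B = 1.9, C = 0.7, D = 0.4)

# Ordering by a single dataset: decreasing weighted degree.
order_discovery <- names(sort(degree_discovery, decreasing = TRUE))

# Ordering by several datasets: average the weighted degree first.
avg_degree <- rowMeans(cbind(degree_discovery, degree_test))
order_averaged <- names(sort(avg_degree, decreasing = TRUE))
```

Node B overtakes node A once the test dataset is taken into account, which is the kind of difference passing several datasets to orderNodesBy is meant to smooth over.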

Ordering of nodes by weighted degree can be suppressed by setting orderNodesBy to NA, in which case nodes will be ordered as in the matrices provided in the network, data, and correlation arguments.

When multiple modules are drawn, modules are ordered by the similarity of their summary vectors in the dataset(s) specified in the orderNodesBy argument. If multiple datasets are provided to the orderNodesBy argument then the module summary vectors are concatenated across datasets.

By default, samples in the data heatmap and accompanying module summary bar plot are ordered in descending order of module summary in the drawn dataset (specified by the test argument). If multiple modules are drawn, samples are ordered as per the left-most module on the plot.

Alternatively, a vector containing the name or index of another dataset may be provided to the orderSamplesBy argument. In this case, samples will be ordered in descending order of module summary in the specified dataset. This is useful when comparing different measurements across samples, for example, gene expression data obtained from multiple tissue samples across the same individuals. If the dataset specified is the discovery dataset, then missing samples will be displayed as horizontal grey bars. If the dataset specified is one of the other datasets, samples present in both the specified dataset and the test dataset will be displayed first in order of the specified dataset, then samples present in only the test dataset will be displayed underneath a horizontal black line, ordered by their module summary vector in the test dataset.

Ordering of samples by module summary can be suppressed by setting orderSamplesBy to NA, in which case samples will be ordered as in the matrix provided to the data argument for the drawn dataset.

Weighted degree scaling:

When drawn on a plot, the weighted degree of each node is scaled to the maximum weighted degree within its module. The scaled weighted degree is a measure of the relative importance of each node to its module. This makes visualisation of multiple modules with different sizes and densities possible. However, the scaled weighted degree should only be interpreted for groups of nodes that have an apparent module structure.

Plot layout and device size

Although reasonable default values for most parameters have been provided, the rendering of axes and titles may need adjusting depending on the size of the plot window. The dryRun argument is useful for quickly determining whether the plot will render correctly. When dryRun is TRUE only the axes, legends, labels, and title will be drawn, allowing for quick iteration of customisable parameters to get the plot layout correct.

Warning: PDF and other vectorized devices should not be used when plotting the heatmaps with more than a hundred nodes. Large files will be generated which may cause image editing programs such as Inkscape or Illustrator to crash when polishing figures for publication.

If axis labels or legends are drawn off screen then the margins of the plot should be adjusted prior to plotting using the par command to increase the margin size (see the "mar" option in the par help page).

The size of text labels can be modified by increasing or decreasing the cex.main, cex.lab, and cex.axis arguments:

cex.main:

controls the size of the plot title (specified in the main argument).

cex.lab:

controls the size of the axis labels on the weighted degree, node contribution, and module summary bar plots as well as the size of the module labels and the heatmap legend titles.

cex.axis:

controls the size of the axis tick labels, including the node and sample labels.

The position of these labels can be changed through the following arguments:

xaxt.line:

controls the distance from the plot the x-axis tick labels are drawn on the module summary bar plot.

xlab.line:

controls the distance from the plot the x-axis label is drawn on the module summary bar plot.

yaxt.line:

controls the distance from the plot the y-axis tick labels are drawn on the weighted degree and node contribution bar plots.

ylab.line:

controls the distance from the plot the y-axis label is drawn on the weighted degree and node contribution bar plots.

main.line:

controls the distance from the plot the title is drawn.

naxt.line:

controls the distance from the plot the node labels are drawn.

saxt.line:

controls the distance from the plot the sample labels are drawn.

maxt.line:

controls the distance from the plot the module labels are drawn.

laxt.line:

controls the distance from the heatmap legends that the gradient legend labels are drawn.

legend.main.line:

controls the distance from the heatmap legends that the legend title is drawn.

The rendering of node, sample, and module names can be disabled by setting plotNodeNames, plotSampleNames, and plotModuleNames to FALSE.

The size of the axis ticks can be changed by increasing or decreasing the following arguments:

xaxt.tck:

size of the x-axis ticks as a multiple of the height of the module summary bar plot.

yaxt.tck:

size of the y-axis ticks as a multiple of the width of the weighted degree or node contribution bar plots.

laxt.tck:

size of the heatmap legend axis ticks as a multiple of the width of the data, correlation structure, or network edge weight heatmaps.

The placement of heatmap legends is controlled by the following arguments:

plotLegend:

if FALSE, the legend will not be drawn.

legend.position:

a multiple of the plot width, controls the horizontal distance from the plot the legend is drawn.

The drawBorders argument controls whether borders are drawn around the weighted degree, node contribution, or module summary bar plots. The lwd argument controls the thickness of these borders, as well as the thickness of axes and axis ticks.

Modifying the color palettes:

The dataCols and dataRange arguments control the appearance of the data heatmap (see plotData). The gradient of colors used on the heatmap can be changed by specifying a vector of colors to interpolate between in dataCols, and dataRange specifies the range of values that maps to this gradient. Values outside of the specified dataRange will be rendered with the colors used at either extreme of the gradient. The default gradient is determined based on the data shown on the plot. If all values in the data matrix are positive, then the gradient is interpolated between white and green, where white is used for the smallest value and green for the largest. If all values are negative, then the gradient is interpolated between purple and white, where purple is used for the smallest value and white for the value closest to zero. If the data contains both positive and negative values, then the gradient is interpolated between purple, white, and green, where white is used for values of zero. In this case the range shown is always centered at zero, with the values at either extreme determined by the value in the rendered data with the strongest magnitude (the maximum of the absolute value).
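How values outside the specified range end up with the colors at either extreme can be sketched with base R's colorRampPalette. The helper map_to_gradient is hypothetical and only illustrates the clamping behaviour described above; it is not how NetRep renders its heatmaps, and the color names and palette resolution are arbitrary.

```r
# Hypothetical helper: map values to colours on a gradient, clamping
# anything outside `value_range` to the colours at either extreme.
map_to_gradient <- function(values, cols, value_range, n = 255) {
  palette <- colorRampPalette(cols)(n)
  clamped <- pmin(pmax(values, value_range[1]), value_range[2])
  # Rescale to [0, 1], then convert to a palette index between 1 and n.
  scaled <- (clamped - value_range[1]) / diff(value_range)
  palette[1 + round(scaled * (n - 1))]
}

cols <- c("purple", "white", "green")
mapped <- map_to_gradient(c(-5, -1, 0, 1, 5), cols, value_range = c(-1, 1))
```

With a range of -1 to 1, the values -5 and -1 share the color at the low end of the gradient, and 1 and 5 share the color at the high end. Narrowing the range therefore increases contrast among small values at the cost of saturating large ones.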

The corCols and corRange arguments control the appearance of the correlation structure heatmap (see plotCorrelation). The gradient of colors used on the heatmap can be changed by specifying a vector of colors to interpolate between in corCols. By default, strong negative correlations are shown in blue, strong positive correlations in red, and weak correlations in white. corRange controls the range of values that this gradient maps to, by default -1 to 1. Changing this may be useful for showing differences where the range of correlation coefficients is small.

The netCols and netRange arguments control the appearance of the network edge weight heatmap (see plotNetwork). The gradient of colors used on the heatmap can be changed by specifying a vector of colors to interpolate between in netCols. By default, weak or non-edges are shown in white, while strong edges are shown in red. The netRange controls the range of values this gradient maps to, by default 0 to 1. If netRange is set to NA, then the gradient will be mapped to values between 0 and the maximum edge weight of the shown network.

The degreeCol argument controls the color of the weighted degree bar plot (see plotDegree).

The contribCols argument controls the color of the node contribution bar plot (see plotContribution). This can be specified as a single value to be used for all nodes, or as two colors: one to use for nodes with positive contributions and one to use for nodes with negative contributions.

The summaryCols argument controls the color of the module summary bar plot (see plotSummary). This can be specified as a single value to be used for all samples, or as two colors: one to use for samples with a positive module summary value and one for samples with a negative module summary value.

The naCol argument controls the color of missing nodes and samples on the data, correlation structure, and network edge weight heatmaps.

See Also

plotModule for a combined plot showing all topological properties for a network module.

Examples

# load in example data, correlation, and network matrices for a discovery
# and test dataset:
data("NetRep")

# Set up input lists for each input matrix type across datasets. The list
# elements can have any names, so long as they are consistent between the
# inputs.
network_list <- list(discovery=discovery_network, test=test_network)
data_list <- list(discovery=discovery_data, test=test_data)
correlation_list <- list(discovery=discovery_correlation, test=test_correlation)
labels_list <- list(discovery=module_labels)

# Plot the data for module 1, 2 and 4 in the discovery dataset
plotData(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 2, 4)
)

# Symmetric = TRUE gives a traditional heatmap for the correlation structure
# and weighted network
plotCorrelation(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 2, 4), symmetric=TRUE
)

# While the default is to render only one half of the (symmetric) matrix
plotNetwork(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 2, 4)
)

# Plot the degree of nodes in each module in the test dataset, but show them
# in the same order as the discovery dataset to compare how node degree
# changes
plotDegree(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 2, 4), discovery="discovery",
  test="test"
)

# Alternatively nodes can be ordered on the plot by degree in the test dataset
plotDegree(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 2, 4), discovery="discovery",
  test="test", orderNodesBy="test"
)

# Or by averaging the degree across datasets for a more robust ordering
plotDegree(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 2, 4), discovery="discovery",
  test="test", orderNodesBy=c("discovery", "test")
)

# Arbitrary subsets can be plotted:
plotContribution(
  network=network_list[[1]][1:10, 1:10], data=data_list[[1]][, 1:10],
  correlation=correlation_list[[1]][1:10, 1:10], orderNodesBy=NA
)

# Plot the module summary vectors for multiple modules:
plotSummary(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 2, 4), discovery="discovery",
  test="test", orderSamplesBy="test"
)

Template parameters

Description

Template parameters to be imported into other function documentation. This is not intended to be a stand-alone help file.

Arguments

orderNodesBy

NULL (default), NA, or a vector of dataset names or indices. Controls how nodes are ordered on the plot (see details).

orderSamplesBy

NULL (default), NA, or a vector containing a single dataset name or index. Controls how samples are ordered on the plot (see details).

plotNodeNames

logical; controls whether the node names are drawn on the bottom axis.

plotSampleNames

logical; controls whether the sample names are drawn on the left axis.

plotModuleNames

logical; controls whether module names are drawn. The default is for module names to be drawn when multiple modules are drawn.

main

title for the plot.

main.line

the number of lines into the top margin at which the plot title will be drawn.

drawBorders

logical; if TRUE, borders are drawn around the weighted degree, node contribution, and module summary bar plots.

lwd

line width for borders and axes.

naxt.line

the number of lines into the bottom margin at which the node names will be drawn.

saxt.line

the number of lines into the left margin at which the sample names will be drawn.

maxt.line

the number of lines into the bottom margin at which the module names will be drawn.

xaxt.line

the number of lines into the bottom margin at which the x-axis tick labels will be drawn on the module summary bar plot.

xaxt.tck

the size of the x-axis ticks for the module summary bar plot.

xlab.line

the number of lines into the bottom margin at which the x-axis label on the module summary bar plot(s) will be drawn.

yaxt.line

the number of lines into the left margin at which the y-axis tick labels will be drawn on the weighted degree and node contribution bar plots.

ylab.line

the number of lines into the left margin at which the y-axis labels on the weighted degree and node contribution bar plots will be drawn.

yaxt.tck

the size of the y-axis ticks for the weighted degree and node contribution bar plots.

laxt.line

the distance from the legend to draw the legend axis labels, as a multiple of laxt.tck.

laxt.tck

size of the ticks on each axis legend relative to the size of the correlation, edge weights, and data matrix heatmaps.

legend.main.line

the distance from the legend to draw the legend title.

cex.axis

relative size of the node and sample names.

cex.lab

relative size of the module names and legend titles.

cex.main

relative size of the plot titles.

dataCols

a character vector of colors to create a gradient from for the data heatmap (see details). Automatically determined if NA or NULL.

dataRange

the range of values to map to the dataCols gradient (see details). Automatically determined if NA or NULL.

corCols

a character vector of colors to create a gradient from for the correlation structure heatmap (see details).

corRange

the range of values to map to the corCols gradient (see details).

netCols

a character vector of colors to create a gradient from for the network edge weight heatmap (see details).

netRange

the range of values to map to the netCols gradient (see details). Automatically determined if NA or NULL.

degreeCol

color to use for the weighted degree bar plot.

contribCols

color(s) to use for the node contribution bar plot (see details).

summaryCols

color(s) to use for the module summary bar plot (see details).

naCol

color to use for missing nodes and samples on the data, correlation structure, and network edge weight heat maps.

dryRun

logical; if TRUE, only the axes and labels will be drawn.
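The color and range arguments above work as pairs: the colors define a gradient, and the range fixes which values map to its two ends. The base R sketch below illustrates that idea with grDevices::colorRampPalette. The helper name map_to_gradient, the 255-step resolution, and the clamping of out-of-range values are assumptions made for illustration; this is not NetRep's internal implementation.

```r
# Hypothetical helper (not part of NetRep): map numeric values onto a
# color gradient that spans a fixed range.
map_to_gradient <- function(values, cols, range, n_steps = 255) {
  palette <- grDevices::colorRampPalette(cols)(n_steps)
  # Clamp values to the range so out-of-range entries take the end colors
  clamped <- pmin(pmax(values, range[1]), range[2])
  # Rescale to 1..n_steps and look up the matching palette entry
  idx <- 1 + round((clamped - range[1]) / (range[2] - range[1]) * (n_steps - 1))
  palette[idx]
}

# Correlation-like values on a blue-white-red gradient over [-1, 1]
cor_values <- c(-1, -0.5, 0, 0.5, 1)
cor_colors <- map_to_gradient(cor_values, c("blue", "white", "red"), c(-1, 1))
```

Fixing the range explicitly, rather than deriving it from the values being plotted, keeps colors comparable between heatmaps drawn for different datasets.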


How many permutations do I need to test at my desired significance level?

Description

How many permutations do I need to test at my desired significance level?

Usage

requiredPerms(alpha, alternative = "greater")

Arguments

alpha

desired significance threshold.

alternative

a character string specifying the alternative hypothesis, must be one of "greater" (default), "less", or "two.sided". You can specify just the initial letter.

Value

The minimum number of permutations required to detect any significant associations at the provided alpha. The minimum p-value will always be smaller than alpha.
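The relationship between alpha and the number of permutations can be sketched in base R. The sketch assumes the common permutation p-value estimator (b + 1) / (nPerm + 1), whose smallest attainable value is 1 / (nPerm + 1), and assumes a two-sided p-value is twice the smaller tail. The helpers min_perms and min_pvalue are hypothetical names and may not match what requiredPerms does internally.

```r
# Hypothetical helper (not NetRep's requiredPerms): smallest number of
# permutations whose minimum attainable p-value falls below alpha, assuming
# the estimator p = (b + 1) / (nPerm + 1) with b = 0 in the best case.
min_perms <- function(alpha, alternative = c("greater", "less", "two.sided")) {
  # match.arg allows partial matching, e.g. "t" for "two.sided"
  alternative <- match.arg(alternative)
  # Assumption: a two-sided test doubles the smaller tail probability
  tails <- if (alternative == "two.sided") 2 else 1
  ceiling(tails / alpha)
}

# Smallest attainable p-value for a given number of permutations
min_pvalue <- function(n_perm, tails = 1) tails / (n_perm + 1)

alpha <- 0.05 / 4                 # Bonferroni adjustment for 4 modules
n_perm <- min_perms(alpha)
min_pvalue(n_perm) < alpha        # TRUE
```

This is only a lower bound on the number of permutations: it guarantees that a significant result is attainable, not that the null distribution is well estimated.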

Examples

data("NetRep")

# Set up input lists for each input matrix type across datasets. The list
# elements can have any names, so long as they are consistent between the
# inputs.
network_list <- list(discovery=discovery_network, test=test_network)
data_list <- list(discovery=discovery_data, test=test_data)
correlation_list <- list(discovery=discovery_correlation, test=test_correlation)
labels_list <- list(discovery=module_labels)

# How many permutations are required to Bonferroni adjust for the 4 modules
# in the example data?
nPerm <- requiredPerms(0.05/4)

# Note that we recommend running at least 10,000 permutations to make sure
# that the null distributions are representative.
preservation <- modulePreservation(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, nPerm=nPerm, discovery="discovery",
  test="test"
)

Order samples within a network.

Description

Get the order of samples within a module based on the module summary vector.

Usage

sampleOrder(
  network,
  data,
  correlation,
  moduleAssignments = NULL,
  modules = NULL,
  backgroundLabel = "0",
  discovery = NULL,
  test = NULL,
  na.rm = FALSE,
  simplify = TRUE,
  verbose = TRUE
)

Arguments

network

a list of interaction networks, one for each dataset. Each entry of the list should be an n * n matrix where each element contains the edge weight between nodes i and j in the inferred network for that dataset.

data

a list of matrices, one for each dataset. Each entry of the list should be the data used to infer the interaction network for that dataset. The columns should correspond to variables in the data (nodes in the network) and rows to samples in that dataset.

correlation

a list of matrices, one for each dataset. Each entry of the list should be an n * n matrix where each element contains the correlation coefficient between nodes i and j in the data used to infer the interaction network for that dataset.

moduleAssignments

a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.

modules

a list of vectors, one for each discovery dataset, of modules to perform the analysis on. If unspecified, all modules in each discovery dataset will be analysed, with the exception of those specified in the backgroundLabel argument.

backgroundLabel

a single label given to nodes that do not belong to any module in the moduleAssignments argument. Defaults to "0". Set to NULL if you do not want to skip the network background module.

discovery

a vector of names or indices denoting the discovery dataset(s) in the data, correlation, network, moduleAssignments, modules, and test lists.

test

a list of vectors, one for each discovery dataset, of names or indices denoting the test dataset(s) in the data, correlation, and network lists.

na.rm

logical; if TRUE, variables present in the discovery dataset but missing from the test dataset are excluded. If FALSE, missing variables are put last in the ordering.

simplify

logical; if TRUE, simplify the structure of the output list if possible (see Return Value).

verbose

logical; should progress be reported? Default is TRUE.

Details

Input data structures:

The preservation of network modules in a second dataset is quantified by measuring the preservation of topological properties between the discovery and test datasets. These properties are calculated not only from the interaction networks inferred in each dataset, but also from the data used to infer those networks (e.g. gene expression data) as well as the correlation structure between variables/nodes. Thus, all functions in the NetRep package have the following arguments:

network:

a list of interaction networks, one for each dataset.

data:

a list of data matrices used to infer those networks, one for each dataset.

correlation:

a list of matrices containing the pairwise correlation coefficients between variables/nodes in each dataset.

moduleAssignments:

a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.

modules:

a list of vectors, one for each discovery dataset, containing the names of the modules from that dataset to analyse.

discovery:

a vector indicating the names or indices of the previous arguments' lists to use as the discovery dataset(s) for the analyses.

test:

a list of vectors, one vector for each discovery dataset, containing the names or indices of the network, data, and correlation argument lists to use as the test dataset(s) for the analysis of each discovery dataset.

The formatting of these arguments is not strict: each function will attempt to make sense of the user input. For example, if there is only one discovery dataset, then input to the moduleAssignments and test arguments may be vectors, rather than lists. If the sample order is being calculated within the discovery or test datasets, then the discovery and test arguments do not need to be specified, and the input matrices for the network, data, and correlation arguments do not need to be wrapped in a list.
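The argument layout described above can be made concrete with simulated data. The base R sketch below builds matching data, correlation, and network lists plus a module assignment vector. The dataset names, node names, and the choice of squared absolute correlation as the edge weight are assumptions made for illustration only; the source does not prescribe how the network should be inferred.

```r
set.seed(1)

# Simulate two toy datasets: rows are samples, columns are nodes (variables)
nodes <- paste0("node", 1:6)
make_data <- function(n_samples) {
  x <- matrix(rnorm(n_samples * length(nodes)), nrow = n_samples)
  dimnames(x) <- list(paste0("sample", seq_len(n_samples)), nodes)
  x
}
toy_data <- list(cohort1 = make_data(20), cohort2 = make_data(15))

# Pairwise correlation between nodes in each dataset (n * n matrices)
toy_correlation <- lapply(toy_data, cor)

# Assumed network definition for illustration: edge weight = |correlation|^2
toy_network <- lapply(toy_correlation, function(r) abs(r)^2)

# Module labels for each node in the discovery dataset; "0" marks background
toy_labels <- list(cohort1 = setNames(c("1", "1", "1", "2", "2", "0"), nodes))
```

The list names ("cohort1", "cohort2") are the same across all three matrix lists, which is what lets the discovery and test arguments refer to datasets by name.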

Analysing large datasets:

Matrices in the network, data, and correlation lists can be supplied as disk.matrix objects. This class allows matrix data to be kept on disk and loaded as required by NetRep. This dramatically decreases memory usage: the matrices for only one dataset will be kept in RAM at any point in time.

Value

A nested list structure. At the top level, the list has one element per 'discovery' dataset. Each of these elements is a list that has one element per 'test' dataset analysed for that 'discovery' dataset. Each of these elements is a list that has one element per 'modules' specified, containing a vector of sample names for the requested module. When simplify = TRUE then the simplest possible structure will be returned. E.g. if the sample ordering is requested in only one dataset, then a single vector of sample names will be returned.

When simplify = FALSE then a nested list of datasets will always be returned, i.e. each element at the top level and second level corresponds to a dataset, and each element at the third level will correspond to modules discovered in the dataset specified at the top level if module labels are provided in the corresponding moduleAssignments list element. E.g. results[["Dataset1"]][["Dataset2"]][["module1"]] will contain the order of samples calculated in "Dataset2", where "module1" was identified in "Dataset1". Modules and datasets for which calculation of the sample order has not been requested will contain NULL.
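The nested structure can be pictured with a hand-built mock list. Everything in this sketch is invented for illustration: the dataset, module, and sample names, and the exact positions of the NULL entries. It is not output from sampleOrder.

```r
# Hand-built mock of the unsimplified structure (not real sampleOrder output)
results <- list(
  Dataset1 = list(
    Dataset1 = list(module1 = NULL),
    Dataset2 = list(module1 = c("sample3", "sample1", "sample2"))
  ),
  Dataset2 = list(Dataset1 = NULL, Dataset2 = NULL)
)

# Sample order calculated in "Dataset2" for "module1" identified in "Dataset1"
module1_order <- results[["Dataset1"]][["Dataset2"]][["module1"]]

# Entries that were not requested hold NULL
not_requested <- is.null(results[["Dataset2"]][["Dataset1"]])
```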

See Also

networkProperties

Examples

# load in example data, correlation, and network matrices for a discovery
# and test dataset:
data("NetRep")

# Set up input lists for each input matrix type across datasets. The list
# elements can have any names, so long as they are consistent between the
# inputs.
network_list <- list(discovery=discovery_network, test=test_network)
data_list <- list(discovery=discovery_data, test=test_data)
correlation_list <- list(discovery=discovery_correlation, test=test_correlation)
labels_list <- list(discovery=module_labels)

# Sort samples within module 1 in descending order by module summary
samples <- sampleOrder(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules="1"
)
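To show what ordering samples by a module summary vector means, here is a base R sketch on simulated data. It assumes the summary vector is the first left singular vector of the centred and scaled module data (an eigengene-style summary). This help page does not spell out the definition, so treat that as an assumption rather than NetRep's exact computation. All object names are hypothetical.

```r
set.seed(42)

# Toy module data: 8 samples (rows) by 5 nodes (columns) sharing one signal
signal <- rnorm(8)
module_data <- sapply(1:5, function(i) signal + rnorm(8, sd = 0.3))
rownames(module_data) <- paste0("sample", 1:8)

# Assumed summary: first left singular vector of the centred, scaled data
summary_vec <- svd(scale(module_data))$u[, 1]
names(summary_vec) <- rownames(module_data)

# Orient the summary so it correlates positively with the average node profile
if (cor(summary_vec, rowMeans(module_data)) < 0) summary_vec <- -summary_vec

# Sample names sorted in descending order of the summary vector
sample_order <- names(sort(summary_vec, decreasing = TRUE))
```

The sign of a singular vector is arbitrary, which is why the sketch fixes its orientation before sorting.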

Template parameters

Description

Template parameters to be imported into other function documentation. This is not intended to be a stand-alone help file.

Arguments

simplify

logical; if TRUE, simplify the structure of the output list if possible (see Return Value).

