| Version: | 2.2.1 |
| Date: | 2025-08-26 |
| Type: | Package |
| Title: | Linked Inference of Genomic Experimental Relationships |
| Description: | Uses an extension of nonnegative matrix factorization to identify shared and dataset-specific factors. See Welch J, Kozareva V, et al (2019) <doi:10.1016/j.cell.2019.05.006>, and Liu J, Gao C, Sodicoff J, et al (2020) <doi:10.1038/s41596-020-0391-8> for more details. |
| Author: | Joshua Welch [aut], Yichen Wang [aut, cre], Chao Gao [aut], Jialin Liu [aut], Joshua Sodicoff [aut, ctb], Velina Kozareva [aut, ctb], Evan Macosko [aut, ctb], Paul Hoffman [ctb], Ilya Korsunsky [ctb], Robert Lee [ctb], Andrew Robbins [ctb] |
| Maintainer: | Yichen Wang <wayichen@umich.edu> |
| BugReports: | https://github.com/welch-lab/liger/issues |
| URL: | https://welch-lab.github.io/liger/ |
| License: | GPL-3 |
| LazyData: | true |
| RoxygenNote: | 7.3.2 |
| VignetteBuilder: | knitr |
| Encoding: | UTF-8 |
| LinkingTo: | Rcpp, RcppArmadillo, RcppProgress |
| Depends: | methods, stats, utils, R (≥ 3.5) |
| Imports: | cli, DelayedArray, dplyr, ggplot2, grid, HDF5Array, hdf5r, leidenAlg (≥ 1.1.1), lifecycle, magrittr, Matrix, RANN, Rcpp, RcppPlanc (≥ 2.0.0), rlang, S4Vectors, scales, uwot |
| Suggests: | AnnotationDbi, circlize, ComplexHeatmap, cowplot, DESeq2, EnhancedVolcano, fgsea, GenomicRanges, ggrepel, gprofiler2, IRanges, knitr, org.Hs.eg.db, plotly, psych, reactome.db, rmarkdown, Rtsne, sankey, scattermore (≥ 0.7), Seurat, SeuratObject, SingleCellExperiment, SummarizedExperiment, testthat, viridis |
| NeedsCompilation: | yes |
| Packaged: | 2025-08-26 17:18:06 UTC; wangych |
| Repository: | CRAN |
| Date/Publication: | 2025-08-26 17:50:02 UTC |
rliger: Linked Inference of Genomic Experimental Relationships
Description
Uses an extension of nonnegative matrix factorization to identify shared and dataset-specific factors. See Welch J, Kozareva V, et al (2019) <doi:10.1016/j.cell.2019.05.006>, and Liu J, Gao C, Sodicoff J, et al (2020) <doi:10.1038/s41596-020-0391-8> for more details.
Author(s)
Maintainer: Yichen Wang <wayichen@umich.edu>
Authors:
Joshua Welch <welchjd@umich.edu>
Chao Gao <gchao@umich.edu>
Jialin Liu <alanliu@umich.edu>
Joshua Sodicoff <sodicoff@umich.edu> [contributor]
Velina Kozareva [contributor]
Evan Macosko [contributor]
Other contributors:
Paul Hoffman [contributor]
Ilya Korsunsky [contributor]
Robert Lee [contributor]
Andrew Robbins <robbiand@med.umich.edu> [contributor]
See Also
Useful links:
https://welch-lab.github.io/liger/
Report bugs at https://github.com/welch-lab/liger/issues
Generate dot plot from input matrix with ComplexHeatmap
Description
Generate dot plot from input matrix with ComplexHeatmap
Usage
.complexHeatmapDotPlot(
  colorMat, sizeMat, featureAnnDF = NULL, cellSplitVar = NULL,
  cellLabels = NULL, maxDotSize = 4, clusterFeature = FALSE,
  clusterCell = FALSE, legendColorTitle = "Matrix Value",
  legendSizeTitle = "Fraction Value", transpose = FALSE, baseSize = 8,
  cellTextSize = NULL, featureTextSize = NULL, cellTitleSize = NULL,
  featureTitleSize = NULL, legendTextSize = NULL, legendTitleSize = NULL,
  featureGrpRot = 0, viridisOption = "C", viridisDirection = -1, ...
)
Arguments
colorMat,sizeMat | Matrix of the same size. Values in |
featureAnnDF | Data frame of features containing feature names and grouping labels. |
cellSplitVar | Split the cell orientation (default columns) by this variable. |
cellLabels | Label to be shown on cell orientation. |
maxDotSize | The maximum dot size. Default 4. |
clusterFeature,clusterCell | Whether the feature/cell orientation (default rows/columns, respectively) should be clustered. Default FALSE. |
legendColorTitle,legendSizeTitle | The title for color bar and dot size legends, respectively. Default "Matrix Value" and "Fraction Value". |
transpose | Logical, whether to rotate the dot plot orientation, i.e. rows as cell aggregation and columns as features. Default FALSE. |
baseSize | One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this. Default 8. |
cellTextSize,featureTextSize,legendTextSize | Size of cell labels, feature labels and legend text. Default NULL. |
cellTitleSize,featureTitleSize,legendTitleSize | Size of titles on cell and feature orientation and legend title. Default NULL. |
featureGrpRot | Number of degrees to rotate the feature grouping label. Default 0. |
viridisOption,viridisDirection | See argument |
... | Additional arguments passed to |
Value
A HeatmapList object.
Produce single violin plot with data frame passed from upstream
Description
Produce single violin plot with data frame passed from upstream
Usage
.ggCellViolin(
  plotDF, y, groupBy = NULL, colorBy = NULL, violin = TRUE,
  violinAlpha = 0.8, violinWidth = 0.9, box = FALSE, boxAlpha = 0.6,
  boxWidth = 0.4, dot = FALSE, dotColor = "black",
  dotSize = getOption("ligerDotSize"), xlabAngle = 45, raster = NULL,
  seed = 1, ...
)
Arguments
plotDF | Data frame like object (fortifiable) that contains all necessary information to make the plot. |
y,groupBy,colorBy | See |
violin,box,dot | Logical, whether to add violin plot, box plot or dot (scatter) plot, respectively. Layers are added in the order of dot, violin, and box on the top surface. By default, only violin plot is generated. |
violinAlpha,boxAlpha | Numeric, controls the transparency of layers. Default 0.8 and 0.6, respectively. |
violinWidth,boxWidth | Numeric, controls the width of violin/box bounding box. Default 0.9 and 0.4, respectively. |
dotColor,dotSize | Globally controls the appearance of all dots. Default "black" and getOption("ligerDotSize"), respectively. |
xlabAngle | Numeric, counter-clockwise rotation angle of X axis label text. Default 45. |
raster | Logical, whether to rasterize the dot plot. Default NULL. |
seed | Random seed for reproducibility. Default 1. |
... | More theme setting arguments passed to |
Value
ggplot object by default. When plotly = TRUE, returns plotly (htmlwidget) object.
Produce single scatter plot with data frame passed from upstream
Description
Produce single scatter plot with data frame passed from upstream
Usage
.ggScatter(
  plotDF, x, y, colorBy = NULL, shapeBy = NULL,
  dotOrder = c("shuffle", "ascending", "descending"),
  dotSize = getOption("ligerDotSize"), dotAlpha = 0.9, trimHigh = NULL,
  trimLow = NULL, zeroAsNA = TRUE, raster = NULL, labelBy = colorBy,
  labelText = TRUE, labelTextSize = 4, ggrepelLabelTick = FALSE, seed = 1, ...
)
Arguments
plotDF | Data frame like object (fortifiable) that contains all necessary information to make the plot. |
x,y | Available variable name in |
colorBy,shapeBy | See |
dotOrder | Controls the order that each dot is added to the plot. Choose from "shuffle", "ascending" or "descending". Default "shuffle". |
dotSize,dotAlpha | Numeric, controls the size or transparency of all dots. Default getOption("ligerDotSize") and 0.9, respectively. |
trimHigh,trimLow | Numeric, limit the largest or smallest value of continuous |
zeroAsNA | Logical, whether to set zero values in continuous |
raster | Logical, whether to rasterize the plot. Default NULL. |
labelBy | A variable name available in |
labelText | Logical, whether to show text label at the median position of each categorical group specified by |
labelTextSize | Numeric, controls the size of label text when |
ggrepelLabelTick | Logical, whether to force showing the tick between label texts and the position they point to. Useful when a lot of text labels are required. Default FALSE. |
seed | Random seed for reproducibility. Default 1. |
... | More theme setting arguments passed to |
Details
Having package "ggrepel" installed can help add tidier text labels on the scatter plot.
Value
ggplot object by default. When plotly = TRUE, returns plotly (htmlwidget) object.
Generic ggplot theme setting for rliger package
Description
Controls content and size of all peripheral texts.
Usage
.ggplotLigerTheme(
  plot, title = NULL, subtitle = NULL, xlab = TRUE, ylab = TRUE,
  xlabAngle = 0, legendColorTitle = NULL, legendFillTitle = NULL,
  legendShapeTitle = NULL, legendSizeTitle = NULL, showLegend = TRUE,
  legendPosition = "right", baseSize = getOption("ligerBaseSize"),
  titleSize = NULL, subtitleSize = NULL, xTextSize = NULL, xFacetSize = NULL,
  xTitleSize = NULL, yTextSize = NULL, yFacetSize = NULL, yTitleSize = NULL,
  legendTextSize = NULL, legendTitleSize = NULL, legendDotSize = 4,
  panelBorder = FALSE, legendNRow = NULL, legendNCol = NULL,
  colorLabels = NULL, colorValues = NULL, colorPalette = "magma",
  colorDirection = -1, naColor = "#DEDEDE", colorLow = NULL, colorMid = NULL,
  colorHigh = NULL, colorMidPoint = NULL, plotly = FALSE
)
Arguments
plot | ggplot object passed from wrapper plotting functions. |
title,subtitle,xlab,ylab | Main title, subtitle or X/Y axis title text. By default, no main title or subtitle will be set, and X/Y axis title will be the names of variables used for plotting. Use |
xlabAngle | Numeric, counter-clockwise rotation angle of X axis label text. Default 0. |
legendColorTitle | Legend title text for color aesthetics, often used for categorical or continuous coloring of dots. Default NULL. |
legendFillTitle | Legend title text for fill aesthetics, often used for violin, box, bar plots. Default NULL. |
legendShapeTitle | Legend title text for shape aesthetics, often used for shaping dots by categorical variable. Default NULL. |
legendSizeTitle | Legend title text for size aesthetics, often used for sizing dots by continuous variable. Default NULL. |
showLegend | Whether to show the legend. Default TRUE. |
legendPosition | Text indicating where to place the legend. Choose from |
baseSize | One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this. |
titleSize,xTitleSize,yTitleSize,legendTitleSize | Size of main title, axis titles and legend title. Default NULL. |
subtitleSize,xTextSize,yTextSize,legendTextSize | Size of subtitle text, axis texts and legend text. Default NULL. |
xFacetSize | Size of facet strip label text on x-axis. Default NULL. |
yFacetSize | Size of facet strip label text on y-axis. Default NULL. |
legendDotSize | Allow dots in legend region to be large enough to see the colors/shapes clearly. Default 4. |
panelBorder | Whether to show rectangle border of the panel instead of using ggplot classic bottom and left axis lines. Default FALSE. |
legendNRow,legendNCol | Integer, when there are too many categories in one variable, arranges the legend into this number of rows or columns. Default NULL. |
colorLabels | Character vector for modifying category names in a color legend. Passed to |
colorValues | Character vector of colors for modifying category colors in a color legend. Passed to |
colorPalette | For continuous coloring, an index or a palette name to select from available options from ggplot |
colorDirection | Choose |
naColor | The color code for |
colorLow,colorMid,colorHigh,colorMidPoint | All four of these must be specified to customize palette with |
plotly | Whether to use plotly to enable web based interactive browsing for the plot. Requires installation of package "plotly". Default FALSE. |
Value
Updated ggplot object by default. When plotly = TRUE, returns plotly (htmlwidget) object.
General heatmap plotting with prepared matrix and data.frames
Description
This is not an exported function. This documentation just serves as a manual of extra arguments that users can use when generating heatmaps with plotGeneHeatmap or plotFactorHeatmap.
Note that the following arguments are pre-occupied by upstream wrappers, so users should not include them in a function call: dataMatrix, dataName, cellDF, featureDF, cellSplitVar, featureSplitVar.
The following arguments of Heatmap are occupied by this function, so users should not include them in a function call either: matrix, name, col, heatmap_legend_param, top_annotation, column_title_gp, column_names_gp, show_column_names, column_split, column_gap, left_annotation, row_title_gp, row_names_gp, show_row_names, row_split, row_gap.
Usage
.plotHeatmap(
  dataMatrix, dataName = "Value", cellDF = NULL, featureDF = NULL,
  transpose = FALSE, cellSplitVar = NULL, featureSplitVar = NULL,
  dataScaleFunc = NULL, showCellLabel = FALSE, showCellLegend = TRUE,
  showFeatureLabel = TRUE, showFeatureLegend = TRUE, cellAnnColList = NULL,
  featureAnnColList = NULL, scale = FALSE, trim = c(-2, 2), baseSize = 8,
  cellTextSize = NULL, featureTextSize = NULL, cellTitleSize = NULL,
  featureTitleSize = NULL, legendTextSize = NULL, legendTitleSize = NULL,
  viridisOption = "A", viridisDirection = -1, RColorBrewerOption = "RdBu", ...
)
Arguments
dataMatrix | Matrix object with features/factors as rows and cells as columns. |
dataName | Text for heatmap color bar title. Default "Value". |
cellDF | data.frame object. Number of rows must match with number of columns of |
featureDF | data.frame object. Number of columns must match with number of rows of |
transpose | Logical, whether to "rotate" the heatmap by 90 degrees so that cell information is displayed by row. Default FALSE. |
cellSplitVar,featureSplitVar | Subset columns of |
dataScaleFunc | A function object, applied to |
showCellLabel,showFeatureLabel | Logical, whether to show cell barcodes, gene symbols or factor names. Default FALSE for cells and TRUE for features. |
showCellLegend,showFeatureLegend | Logical, whether to show cell or feature legends. Default TRUE. |
cellAnnColList,featureAnnColList | List object, with each element a named vector of R-interpretable color codes. The names of the list elements are used for matching the annotation variable names. The names of the colors in the vectors are used for matching the levels of a variable (factor object, categorical). Default NULL. |
scale | Logical, whether to take z-score to scale and center gene expression. Applied after |
trim | Numeric vector of two values. Limit the z-score value into this range when |
baseSize | One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this. |
cellTextSize,featureTextSize,legendTextSize | Size of cell barcode labels, gene/factor labels, or legend values. Default NULL. |
cellTitleSize,featureTitleSize,legendTitleSize | Size of titles of the cell slices, gene/factor slices, or the legends. Default NULL. |
viridisOption,viridisDirection | See argument |
RColorBrewerOption | When |
... | Additional arguments to be passed to |
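The scale and trim arguments describe a z-score transform followed by clamping. A minimal base-R sketch of that idea is given here; the helper name scaleAndTrim is hypothetical, the per-feature (row-wise) scaling is an assumption, and this is not the code .plotHeatmap runs internally.

```r
# Hypothetical helper: row-wise z-score, then clamp into the `trim` range.
# Assumes features are rows and cells are columns, as in `dataMatrix`.
scaleAndTrim <- function(mat, trim = c(-2, 2)) {
  # scale() works on columns, so transpose, scale, and transpose back
  z <- t(scale(t(mat)))
  # Clamp values below trim[1] and above trim[2]; z[] keeps the matrix shape
  z[] <- pmin(pmax(z, trim[1]), trim[2])
  z
}

mat <- rbind(g1 = c(1, 2, 3, 4, 100), g2 = c(5, 5, 6, 6, 7))
zTrimmed <- scaleAndTrim(mat, trim = c(-1, 1))
range(zTrimmed)
```

Clamping keeps a single extreme cell (such as the value 100 in g1) from dominating the color scale.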
Value
HeatmapList-class object
Apply function to chunks of H5 data in ligerDataset object
Description
H5 calculation wrapper that runs a specified calculation on an on-disk matrix in chunks.
Usage
H5Apply(
  object, FUN, init = NULL, useData = c("rawData", "normData"),
  chunkSize = 1000, verbose = getOption("ligerVerbose"), ...
)
Arguments
object | A ligerDataset object. |
FUN | A function that is applied to each chunk. See Details for restrictions. |
init | Initialized result if it needs to be updated iteratively. Default NULL. |
useData | The slot name of the data to be processed. Choose from "rawData" or "normData". Default "rawData". |
chunkSize | Number of columns to be included in each chunk. Default 1000. |
verbose | Logical. Whether to show information of the progress. Default getOption("ligerVerbose"). |
... | Other arguments to be passed to |
Details
The FUN function has to have the first four arguments ordered by:
1. chunk data: A sparse matrix (dgCMatrix-class) containing maximum chunkSize columns.
2. x-vector index: The index that subscribes the vector of x slot of a dgCMatrix, which points to the values in each chunk. Mostly used when a new sparse matrix needs to be written to an H5 file.
3. cell index: The column index of each chunk out of the whole original matrix.
4. Initialized result: A customized object. The value passed to the init argument of H5Apply will be passed here in the first iteration, and the returned value of FUN will be iteratively passed here in the next chunk iterations. So it is important to keep the object structure of the returned value consistent with init.
No default value should be pre-defined for these four arguments because H5Apply will automatically generate the input.
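The four-argument contract can be illustrated without any H5 file. The sketch below is an assumption-laden stand-in: mockChunkApply is a hypothetical driver over a plain in-memory matrix, not rliger's H5Apply, and sumGenes is a hypothetical FUN that accumulates per-feature sums across chunks.

```r
# Hypothetical stand-in for the chunk loop; NOT rliger's H5Apply().
# It walks a dense matrix column-wise in chunks and calls FUN with the four
# leading arguments: chunk data, x-vector index, cell index, running result.
mockChunkApply <- function(mat, FUN, init = NULL, chunkSize = 1000, ...) {
  result <- init
  nCells <- ncol(mat)
  xStart <- 1  # running position in the vector of non-zero values
  for (start in seq(1, nCells, by = chunkSize)) {
    cellIdx <- start:min(start + chunkSize - 1, nCells)
    chunk <- mat[, cellIdx, drop = FALSE]
    nVal <- sum(chunk != 0)
    xIdx <- xStart + seq_len(nVal) - 1
    xStart <- xStart + nVal
    # The value returned by FUN becomes the 4th argument of the next call
    result <- FUN(chunk, xIdx, cellIdx, result, ...)
  }
  result
}

# A FUN with no defaults on its first four arguments; it returns an object
# with the same structure as `init` (a numeric vector, one entry per feature)
sumGenes <- function(chunk, xIdx, cellIdx, values) {
  values + rowSums(chunk)
}

mat <- matrix(c(1, 0, 2, 0, 3, 4, 5, 0, 0, 6), nrow = 2)
total <- mockChunkApply(mat, sumGenes, init = numeric(nrow(mat)), chunkSize = 2)
total
```

Because each call only sees one chunk, any whole-matrix statistic has to be expressed as an update of the running result, which is why the returned value must keep the structure of init.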
Align factor loadings to get final integration
Description
This function is a wrapper to switch between alternative factor loading alignment methods that LIGER provides, which is a required step for producing the final integrated result. Two methods are provided (click on options for more details):
method = "quantileNorm": Previously published quantile normalization method. (default)
method = "centroidAlign": Newly developed centroid alignment method.
Usage
alignFactors(object, method = c("quantileNorm", "centroidAlign"), ...)
## S3 method for class 'liger'
alignFactors(object, method = c("quantileNorm", "centroidAlign"), ...)
## S3 method for class 'Seurat'
alignFactors(object, method = c("quantileNorm", "centroidAlign"), ...)
Arguments
object | A liger or Seurat object with valid factorization result available (i.e. |
method | Character, method to align factors. Default "quantileNorm". |
... | Additional arguments passed to selected methods. |
See Also
Converting other classes of data to a liger object
Description
This function converts data stored in SingleCellExperiment (SCE), Seurat object or a merged sparse matrix (dgCMatrix) into a liger object. This is designed for a container object or matrix that already contains multiple datasets to be integrated with LIGER. For individual datasets, please use createLiger instead.
Usage
## S3 method for class 'dgCMatrix'
as.liger(object, datasetVar = NULL, modal = NULL, ...)
## S3 method for class 'SingleCellExperiment'
as.liger(object, datasetVar = NULL, modal = NULL, ...)
## S3 method for class 'Seurat'
as.liger(object, datasetVar = NULL, modal = NULL, assay = NULL, ...)
seuratToLiger(object, datasetVar = NULL, modal = NULL, assay = NULL, ...)
as.liger(object, ...)
Arguments
object | Object. |
datasetVar | Specify which dataset each cell belongs to by: 1. Select a variable from existing metadata in the object (e.g. colData column); 2. Specify a vector/factor that assigns the dataset membership; 3. Give a single character string which means that all data is from one dataset (must not be a metadata variable, otherwise it is understood as 1.). Default NULL. |
modal | Modality setting for each dataset. See |
... | Additional arguments passed to |
assay | Name of assay to use. Default NULL. |
Details
For Seurat V5 structure, it is highly recommended that users make use of its split layer feature, where things like "counts", "data", and "scale.data" can be held for each dataset in the same Seurat object, e.g. with "count.ctrl", "count.stim", not merged. If a Seurat object with split layers is given, datasetVar will be ignored and the layers will be directly used.
Value
a liger object.
Examples
# dgCMatrix (common sparse matrix class), usually obtained from other
# container object, and contains multiple samples merged in one.
matList <- rawData(pbmc)
multiSampleMatrix <- mergeSparseAll(matList)
# The `datasetVar` argument expects the variable assigning the sample source
pbmc2 <- as.liger(multiSampleMatrix, datasetVar = pbmc$dataset)
pbmc2
if (requireNamespace("SingleCellExperiment", quietly = TRUE)) {
  sce <- SingleCellExperiment::SingleCellExperiment(
    assays = list(counts = multiSampleMatrix)
  )
  sce$sample <- pbmc$dataset
  pbmc3 <- as.liger(sce, datasetVar = "sample")
  pbmc3
}
if (requireNamespace("Seurat", quietly = TRUE)) {
  seu <- SeuratObject::CreateSeuratObject(multiSampleMatrix)
  # Seurat creates variable "orig.ident" by identifying the cell barcode
  # prefixes, which is indeed what we need in this case. Users might need
  # to be careful and have it confirmed first.
  pbmc4 <- as.liger(seu, datasetVar = "orig.ident")
  pbmc4
  # As per Seurat V5 updates with layered data, specifically helpful under the
  # scenario of dataset integration. "counts" etc. for each dataset can be
  # split into layers.
  seu5 <- seu
  seu5[["RNA"]] <- split(seu5[["RNA"]], pbmc$dataset)
  print(SeuratObject::Layers(seu5))
  pbmc5 <- as.liger(seu5)
  pbmc5
}
Converting other classes of data to a ligerDataset object
Description
Works for converting a matrix or container object to a single ligerDataset, and can also convert the modality preset of a ligerDataset. When used with a dense matrix object, it automatically converts the matrix to sparse form (dgCMatrix-class). When used with container objects such as Seurat or SingleCellExperiment, it is highly recommended that the object contains only one dataset/sample which is going to be integrated with LIGER. For multi-sample objects, please use as.liger with dataset source variable specified.
Usage
## S3 method for class 'ligerDataset'
as.ligerDataset(
  object, modal = c("default", "rna", "atac", "spatial", "meth"), ...
)
## Default S3 method:
as.ligerDataset(
  object, modal = c("default", "rna", "atac", "spatial", "meth"), ...
)
## S3 method for class 'matrix'
as.ligerDataset(
  object, modal = c("default", "rna", "atac", "spatial", "meth"), ...
)
## S3 method for class 'Seurat'
as.ligerDataset(
  object, modal = c("default", "rna", "atac", "spatial", "meth"),
  assay = NULL, ...
)
## S3 method for class 'SingleCellExperiment'
as.ligerDataset(
  object, modal = c("default", "rna", "atac", "spatial", "meth"), ...
)
as.ligerDataset(object, ...)
Arguments
object | Object. |
modal | Modality setting for each dataset. Choose from "default", "rna", "atac", "spatial" or "meth". Default "default". |
... | Additional arguments passed to |
assay | Name of assay to use. Default NULL. |
Value
a ligerDataset object.
Examples
ctrl <- dataset(pbmc, "ctrl")
ctrl
# Convert the modality preset
as.ligerDataset(ctrl, modal = "atac")
rawCounts <- rawData(ctrl)
class(rawCounts)
as.ligerDataset(rawCounts)
liger object of bone marrow subsample data with RNA and ATAC modality
Description
liger object of bone marrow subsample data with RNA and ATAC modality
Usage
bmmc
Format
liger object with two datasets named "rna" and "atac"
Source
https://www.nature.com/articles/s41587-019-0332-7
References
Jeffrey M. Granja et al., Nature Biotechnology, 2019
Calculate adjusted Rand index (ARI) by comparing two cluster labeling variables
Description
This function calculates the adjusted Rand index (ARI) between the clustering result obtained with LIGER and an external clustering (existing "true" annotation). ARI is at most 1, which indicates perfect agreement. A score around 0 indicates agreement no better than chance, and slightly negative values are possible when agreement is worse than chance.
The true clustering annotation must be specified as the baseline. We suggest setting it to the object cellMeta so that it can be easily used for many other visualization and evaluation functions.
The ARI can be calculated for only specified datasets, since true annotation might not be available for all datasets. Evaluation for only one or a few datasets can be done by specifying useDatasets. If useDatasets is specified, the argument checking for trueCluster and useCluster will be enforced to match the cells in the specified datasets.
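For readers who want to see what the index measures, here is a minimal base-R sketch of the standard Hubert and Arabie formulation computed from a contingency table. The function name adjustedRandIndex is hypothetical and the code is an illustration under that textbook definition, not the implementation behind calcARI.

```r
# Hypothetical helper: adjusted Rand index from two label vectors.
adjustedRandIndex <- function(x, y) {
  tab <- table(x, y)
  n <- sum(tab)
  # Number of cell pairs placed together in both labelings
  sumComb <- sum(choose(tab, 2))
  # Number of pairs placed together within each labeling alone
  rowComb <- sum(choose(rowSums(tab), 2))
  colComb <- sum(choose(colSums(tab), 2))
  expected <- rowComb * colComb / choose(n, 2)
  maxIndex <- (rowComb + colComb) / 2
  # Both labelings put every cell in one group: treat as perfect agreement
  if (maxIndex == expected) return(1)
  (sumComb - expected) / (maxIndex - expected)
}

# Same partition under different label names gives 1
adjustedRandIndex(c(1, 1, 2, 2), c("a", "a", "b", "b"))
# A partition that splits every true pair gives -0.5, i.e. below chance
adjustedRandIndex(c(1, 1, 2, 2), c(1, 2, 1, 2))
```

The second call shows why the lower end of the range is not exactly 0: the adjustment subtracts the agreement expected by chance.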
Usage
calcARI(
  object, trueCluster, useCluster = NULL, useDatasets = NULL,
  verbose = getOption("ligerVerbose", TRUE), classes.compare = trueCluster
)
Arguments
object | A liger object, with the clustering result present in cellMeta. |
trueCluster | Either the name of one variable in |
useCluster | The name of one variable in |
useDatasets | A character vector of the names, a numeric or logical vector of the index of the datasets to be considered for the ARI calculation. Default NULL. |
verbose | Logical. Whether to show information of the progress. Default getOption("ligerVerbose"), or TRUE if that option is unset. |
classes.compare |
Value
A numeric scalar, the ARI of the clustering result indicated by useCluster compared to trueCluster.
References
L. Hubert and P. Arabie (1985) Comparing Partitions, Journal of Classification, 2, pp. 193-218.
Examples
# Assume the true cluster in `pbmcPlot` is "leiden_cluster"
# generate fake new labeling
fake <- sample(1:7, ncol(pbmcPlot), replace = TRUE)
# Insert into cellMeta
pbmcPlot$new <- factor(fake)
calcARI(pbmcPlot, trueCluster = "leiden_cluster", useCluster = "new")
# Now assume we got existing base line annotation only for "stim" dataset
nStim <- ncol(dataset(pbmcPlot, "stim"))
stimTrueLabel <- factor(fake[1:nStim])
# Insert into cellMeta
cellMeta(pbmcPlot, "stim_true_label", useDatasets = "stim") <- stimTrueLabel
# Assume "leiden_cluster" is the clustering result we got and need to be
# evaluated
calcARI(pbmcPlot, trueCluster = "stim_true_label",
        useCluster = "leiden_cluster", useDatasets = "stim")
# Comparison of the same labeling should always yield 1.
calcARI(pbmcPlot, trueCluster = "leiden_cluster", useCluster = "leiden_cluster")
Calculate agreement metric after integration
Description
This metric quantifies how much the factorization and alignment distort the geometry of the original datasets. The greater the agreement, the less distortion of geometry there is. This is calculated by performing dimensionality reduction on the original and integrated (factorized, or factorized plus aligned) datasets, and measuring similarity between the k nearest neighbors for each cell in the original and integrated datasets. The Jaccard index is used to quantify similarity, and the final metric is the average across all cells.
Note that for most datasets, the greater the chosen nNeighbors, the greater the agreement in general. Although agreement can theoretically approach 1, in practice it is usually no higher than 0.2-0.3.
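The neighbor-set comparison described above can be sketched in base R as follows. This is an illustration only: knnIndex and knnJaccard are hypothetical helpers, the embeddings are taken as given rather than produced by the dimensionality reduction step, and an exact all-pairs distance search stands in for whatever neighbor search calcAgreement actually uses.

```r
# Hypothetical helper: indices of the k nearest neighbors of every cell.
# `emb` is a cells x dimensions matrix.
knnIndex <- function(emb, k) {
  d <- as.matrix(dist(emb))
  diag(d) <- Inf  # a cell is not its own neighbor
  nn <- apply(d, 1, function(row) order(row)[seq_len(k)])
  matrix(nn, nrow = k)  # k x nCells, also when k = 1
}

# Hypothetical helper: mean Jaccard index between the neighbor sets of each
# cell in two embeddings of the same cells
knnJaccard <- function(embOriginal, embIntegrated, k = 15) {
  nnA <- knnIndex(embOriginal, k)
  nnB <- knnIndex(embIntegrated, k)
  perCell <- vapply(seq_len(ncol(nnA)), function(i) {
    length(intersect(nnA[, i], nnB[, i])) / length(union(nnA[, i], nnB[, i]))
  }, numeric(1))
  mean(perCell)
}

set.seed(1)
emb <- matrix(rnorm(40), ncol = 2)                    # 20 toy cells
noisy <- emb + matrix(rnorm(40, sd = 0.5), ncol = 2)  # distorted copy
scaledAgreement <- knnJaccard(emb, 2 * emb, k = 5)    # geometry preserved
noisyAgreement <- knnJaccard(emb, noisy, k = 5)       # geometry distorted
```

Uniform scaling leaves every neighbor set unchanged, so scaledAgreement is 1, while the noisy copy gives a lower value.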
Usage
calcAgreement(
  object, ndims = 40, nNeighbors = 15, useRaw = FALSE, byDataset = FALSE,
  seed = 1, dr.method = NULL, k = nNeighbors, use.aligned = NULL,
  rand.seed = seed, by.dataset = byDataset
)
Arguments
object |
|
ndims | Number of factors to produce in NMF. Default 40. |
nNeighbors | Number of nearest neighbors to use in calculating Jaccard index. Default 15. |
useRaw | Whether to evaluate just factorized |
byDataset | Whether to return agreement calculated for each dataset instead of the average for all datasets. Default FALSE. |
seed | Random seed to allow reproducible results. Default 1. |
dr.method | |
k,rand.seed,by.dataset | |
use.aligned |
Value
A numeric vector of agreement metric. A single value if byDataset = FALSE, or one value per dataset otherwise.
Examples
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
  pbmc <- pbmc %>%
    normalize %>%
    selectGenes %>%
    scaleNotCenter %>%
    runINMF %>%
    alignFactors
  calcAgreement(pbmc)
}
Calculate alignment metric after integration
Description
This metric quantifies how well-aligned two or more datasets are. We randomly downsample all datasets to have as many cells as the smallest one. We construct a nearest-neighbor graph and calculate for each cell how many of its neighbors are from the same dataset. We average across all cells and compare to the expected value for perfectly mixed datasets, and scale the value from 0 to 1. Note that in practice, alignment can be greater than 1 occasionally.
Usage
calcAlignment(
  object, clustersUse = NULL, clusterVar = NULL, nNeighbors = NULL,
  cellIdx = NULL, cellComp = NULL, resultBy = c("all", "dataset", "cell"),
  seed = 1, k = nNeighbors, rand.seed = seed, cells.use = cellIdx,
  cells.comp = cellComp, clusters.use = clustersUse, by.cell = NULL,
  by.dataset = NULL
)
Arguments
object | A liger object, with |
clustersUse | The clusters to consider for calculating the alignment. Should be a vector of existing levels in |
clusterVar | The name of one variable in |
nNeighbors | Number of neighbors to use in calculating alignment. Default NULL. |
cellIdx,cellComp | Character, logical or numeric index that can subscribe cells. Default NULL. |
resultBy | Select from "all", "dataset" or "cell". Default "all". |
seed | Random seed to allow reproducible results. Default 1. |
k,rand.seed,cells.use,cells.comp,clusters.use | |
by.cell,by.dataset |
Details
\bar{x} is the average number of neighbors belonging to any cell's same dataset, N is the number of datasets, k is the number of neighbors in the KNN graph.
1 - \frac{\bar{x} - \frac{k}{N}}{k - \frac{k}{N}}
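The formula can be evaluated directly. In the sketch below, alignmentScore and meanSameDataset are hypothetical helper names, and the neighbor index matrix is assumed to be a cells x k matrix supplied by the caller; it is a worked illustration of the expression above rather than the body of calcAlignment.

```r
# Hypothetical helper: the alignment expression for given x-bar, k and N.
alignmentScore <- function(xBar, k, nDatasets) {
  1 - (xBar - k / nDatasets) / (k - k / nDatasets)
}

# Hypothetical helper: x-bar from a cells x k matrix of neighbor indices and
# a vector of dataset labels, one per cell
meanSameDataset <- function(nnIdx, dataset) {
  dataset <- as.character(dataset)
  # Label of every neighbor, reshaped to the same cells x k layout
  nnLabel <- matrix(dataset[nnIdx], nrow = nrow(nnIdx))
  # Row i is compared against the label of cell i
  mean(rowSums(nnLabel == dataset))
}

alignmentScore(xBar = 10, k = 20, nDatasets = 2)  # perfectly mixed: 1
alignmentScore(xBar = 20, k = 20, nDatasets = 2)  # fully separated: 0
alignmentScore(xBar = 4, k = 20, nDatasets = 2)   # over-mixed: 1.6

# Four cells whose single nearest neighbor is always from their own dataset
dataset <- c("a", "a", "b", "b")
nnIdx <- matrix(c(2, 1, 4, 3), ncol = 1)
xBar <- meanSameDataset(nnIdx, dataset)
alignmentScore(xBar, k = 1, nDatasets = 2)        # 0
```

The third call shows how a value above 1 arises: cells have fewer same-dataset neighbors than expected under perfect mixing.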
The selection of cells to be measured can be done in various ways and represents different scenarios:
1. By default, all cells are considered and the alignment across all datasets will be calculated.
2. Select clustersUse from clusterVar to use cells from the clusters of interest. This measures the alignment across all covered datasets within the specified clusters.
3. Only specify cellIdx for flexible selection. This measures the alignment across all covered datasets within the specified cells. A non-NULL cellIdx takes precedence over clustersUse.
4. Specify cellIdx and cellComp at the same time, so that the original dataset source will be ignored and the cells specified by each argument will be regarded as one dataset each. This measures the alignment between cells specified by the two arguments. cellComp can contain cells already specified in cellIdx.
Value
The alignment metric.
Examples
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
  pbmc <- pbmc %>%
    normalize %>%
    selectGenes %>%
    scaleNotCenter %>%
    runINMF %>%
    alignFactors
  calcAlignment(pbmc)
}
Calculate a dataset-specificity score for each factor
Description
This score represents the relative magnitude of the dataset-specific components of each factor's gene loadings compared to the shared components for two datasets. First, for each dataset we calculate the norm of the sum of each factor's shared loadings (W) and dataset-specific loadings (V). We then determine the ratio of these two values and subtract from 1... TODO: finish description.
Usage
calcDatasetSpecificity(
  object, dataset1, dataset2, doPlot = FALSE, do.plot = doPlot
)
Arguments
object | liger object with factorization results. |
dataset1 | Name of first dataset. Required. |
dataset2 | Name of second dataset. Required. |
doPlot | Logical. Whether to display a barplot of dataset specificity scores (by factor). Default FALSE. |
do.plot | Deprecated. Use doPlot instead. |
Value
List containing three elements.
pct1 | Vector of the norm of each metagene factor for dataset1. |
pct2 | Vector of the norm of each metagene factor for dataset2. |
pctSpec | Vector of dataset specificity scores. |
Calculate Normalized Mutual Information (NMI) by comparing two cluster labeling variables
Description
This function calculates the Normalized Mutual Information between the clustering result obtained with LIGER and an external clustering (existing "true" annotation). NMI ranges from 0 to 1, with a score of 0 indicating no agreement between clusterings and 1 indicating perfect agreement. The mathematical definition of NMI is as follows:
H(X) = -\sum_{x \in X}P(X=x)\log_2 P(X=x)
H(X|Y) = -\sum_{y \in Y}P(Y=y)\sum_{x \in X}P(X=x|Y=y)\log_2 P(X=x|Y=y)
I(X;Y) = H(X) - H(X|Y)
NMI(X;Y) = \frac{I(X;Y)}{\sqrt{H(X)H(Y)}}
Where X is the cluster variable to be evaluated and Y is the true cluster variable. x and y are the cluster labels in X and Y respectively. H is the entropy and I is the mutual information.
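The four definitions above translate almost line by line into base R. The sketch below is only an illustration of those equations: entropyBits and normalizedMI are hypothetical names, probabilities are estimated as simple label frequencies, and the result is undefined (NaN) when either labeling has a single group because the denominator is then 0.

```r
# Hypothetical helper: entropy H in bits, ignoring zero-probability labels.
entropyBits <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Hypothetical helper: NMI(X;Y) = I(X;Y) / sqrt(H(X) H(Y))
normalizedMI <- function(x, y) {
  joint <- table(x, y) / length(x)   # P(X = x, Y = y)
  px <- rowSums(joint)               # P(X = x)
  py <- colSums(joint)               # P(Y = y)
  hx <- entropyBits(px)
  hy <- entropyBits(py)
  # H(X|Y) = -sum over x, y of P(x, y) * log2 P(x | y)
  cond <- sweep(joint, 2, py, "/")   # P(X = x | Y = y)
  keep <- joint > 0
  hxGivenY <- -sum(joint[keep] * log2(cond[keep]))
  mi <- hx - hxGivenY                # I(X;Y) = H(X) - H(X|Y)
  mi / sqrt(hx * hy)
}

# Same partition under different label names gives 1
normalizedMI(c(1, 1, 2, 2), c("a", "a", "b", "b"))
# Statistically independent labelings give 0
normalizedMI(c(1, 1, 2, 2), c(1, 2, 1, 2))
```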
The true clustering annotation must be specified as the baseline. We suggest setting it to the object cellMeta so that it can be easily used for many other visualization and evaluation functions.
The NMI can be calculated for only specified datasets, since true annotation might not be available for all datasets. Evaluation for only one or a few datasets can be done by specifying useDatasets. If useDatasets is specified, the argument checking for trueCluster and useCluster will be enforced to match the cells in the specified datasets.
Usage
calcNMI(
  object, trueCluster, useCluster = NULL, useDatasets = NULL,
  verbose = getOption("ligerVerbose", TRUE)
)
Arguments
object | A liger object, with the clustering result present in cellMeta. |
trueCluster | Either the name of one variable in |
useCluster | The name of one variable in |
useDatasets | A character vector of the names, a numeric or logical vector of the index of the datasets to be considered for the NMI calculation. Default NULL. |
verbose | Logical. Whether to show information of the progress. Default getOption("ligerVerbose"), or TRUE if that option is unset. |
Value
A numeric scalar of the NMI value
Examples
# Assume the true cluster in `pbmcPlot` is "leiden_cluster"
# generate fake new labeling
fake <- sample(1:7, ncol(pbmcPlot), replace = TRUE)
# Insert into cellMeta
pbmcPlot$new <- factor(fake)
calcNMI(pbmcPlot, trueCluster = "leiden_cluster", useCluster = "new")
# Now assume we got existing base line annotation only for "stim" dataset
nStim <- ncol(dataset(pbmcPlot, "stim"))
stimTrueLabel <- factor(fake[1:nStim])
# Insert into cellMeta
cellMeta(pbmcPlot, "stim_true_label", useDatasets = "stim") <- stimTrueLabel
# Assume "leiden_cluster" is the clustering result we got and need to be
# evaluated
calcNMI(pbmcPlot, trueCluster = "stim_true_label",
        useCluster = "leiden_cluster", useDatasets = "stim")
# Comparison of the same labeling should always yield 1.
calcNMI(pbmcPlot, trueCluster = "leiden_cluster", useCluster = "leiden_cluster")
Calculate purity by comparing two cluster labeling variables
Description
This function aims at calculating the purity for the clustering result obtained with LIGER and the external clustering (existing "true" annotation). Purity can sometimes be a more useful metric when the clustering to be tested contains more subgroups or clusters than the true clusters. Purity ranges from 0 to 1, with a score of 1 representing a pure, accurate clustering.
The true clustering annotation must be specified as the baseline. We suggest setting it to the object cellMeta so that it can be easily used for many other visualization and evaluation functions.
The purity can be calculated for only specified datasets, since true annotation might not be available for all datasets. Evaluation for only one or a few datasets can be done by specifying useDatasets. If useDatasets is specified, the argument checking for trueCluster and useCluster will be enforced to match the cells in the specified datasets.
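The following base R sketch shows one common definition of purity: each tested cluster is credited with its best-matching true class, and the credited cells are divided by the total number of cells. The helper `purity` is a hypothetical name used for illustration and is not the rliger implementation; `calcPurity()` may differ in details.

```r
# Hypothetical helper for illustration only; not part of rliger.
purity <- function(cluster, truth) {
  tab <- table(cluster, truth)            # rows: tested clusters, cols: true classes
  sum(apply(tab, 1, max)) / length(cluster)
}

truth <- c("T", "T", "T", "B", "B", "B")

# Over-clustered result: the true "T" group is split in two,
# but every cluster contains a single true class.
overClustered <- c(1, 1, 2, 3, 3, 3)
purity(overClustered, truth)

# Under-clustered result: everything merged into one cluster.
merged <- rep(1, 6)
purity(merged, truth)
```

The over-clustered labeling is not penalized, which matches the remark above that purity is useful when the tested clustering has more subgroups than the true annotation.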
Usage
calcPurity(
  object,
  trueCluster,
  useCluster = NULL,
  useDatasets = NULL,
  verbose = getOption("ligerVerbose", TRUE),
  classes.compare = trueCluster
)
Arguments
object | A liger object, with the clustering result present in cellMeta. |
trueCluster | Either the name of one variable in |
useCluster | The name of one variable in |
useDatasets | A character vector of the names, a numeric or logical vector of the index of the datasets to be considered for the purity calculation. Default |
verbose | Logical. Whether to show information of the progress. Default |
classes.compare |
Value
A numeric scalar, the purity of the clustering result indicated by useCluster compared to trueCluster.
Examples
# Assume the true cluster in `pbmcPlot` is "leiden_cluster"
# generate fake new labeling
fake <- sample(1:7, ncol(pbmcPlot), replace = TRUE)
# Insert into cellMeta
pbmcPlot$new <- factor(fake)
calcPurity(pbmcPlot, trueCluster = "leiden_cluster", useCluster = "new")
# Now assume we got existing base line annotation only for "stim" dataset
nStim <- ncol(dataset(pbmcPlot, "stim"))
stimTrueLabel <- factor(fake[1:nStim])
# Insert into cellMeta
cellMeta(pbmcPlot, "stim_true_label", useDatasets = "stim") <- stimTrueLabel
# Assume "leiden_cluster" is the clustering result we got and need to be
# evaluated
calcPurity(pbmcPlot, trueCluster = "stim_true_label",
           useCluster = "leiden_cluster", useDatasets = "stim")
Cell cycle gene set for human
Description
Copied from Seurat::cc.genes
Usage
ccGeneHuman
Format
A list of two character vectors:
- s.genes
Genes associated with S-phase
- g2m.genes
Genes associated with G2M-phase
Source
https://www.science.org/doi/abs/10.1126/science.aad0501
Align factor loading by centroid alignment (beta)
Description
This process treats the factor loading of each dataset as the low dimensional embedding as well as the cluster assignment probability, i.e. the soft clustering result. Then the method aligns the embedding by linearly moving the centroids of the same cluster but within each dataset towards each other.
ATTENTION: This method is still under development, although it has shown encouraging results in benchmarking tests. The arguments and their default values reflect the best scored parameters in the tests and some of them may be subject to change in the future.
Usage
centroidAlign(object, ...)
## S3 method for class 'liger'
centroidAlign(
  object,
  lambda = 1,
  useDims = NULL,
  scaleEmb = TRUE,
  centerEmb = TRUE,
  scaleCluster = FALSE,
  centerCluster = FALSE,
  shift = FALSE,
  diagnosis = FALSE,
  ...
)
## S3 method for class 'Seurat'
centroidAlign(
  object,
  reduction = "inmf",
  lambda = 1,
  useDims = NULL,
  scaleEmb = TRUE,
  centerEmb = TRUE,
  scaleCluster = FALSE,
  centerCluster = FALSE,
  shift = FALSE,
  diagnosis = FALSE,
  ...
)
Arguments
object | A liger or Seurat object with valid factorization result available (i.e. |
... | Arguments passed to other S3 methods of this function. |
lambda | Ridge regression penalty applied to each dataset. Can be one number that applies to all datasets, or a numeric vector with length equal to the number of datasets. Default |
useDims | Indices of factors to be considered for the alignment. Default |
scaleEmb | Logical, whether to scale the factor loading being considered as the embedding. Default |
centerEmb | Logical, whether to center the factor loading being considered as the embedding before scaling it. Default |
scaleCluster | Logical, whether to scale the factor loading being considered as the cluster assignment probability. Default |
centerCluster | Logical, whether to center the factor loading being considered as the cluster assignment probability before scaling it. Default |
shift | Logical, whether to shift the factor loading being considered as the cluster assignment probability after centered scaling. Default |
diagnosis | Logical, whether to return cell metadata variables with diagnostic information. See Details. Default |
reduction | Name of the reduction where LIGER integration result is stored. Default |
Details
Diagnostic information includes:
object$raw_which.max: The index of the factor with the maximum value in the raw factor loading.
object$R_which.max: The index of the factor with the maximum value in the soft clustering probability matrix used for correction.
object$Z_which.max: The index of the factor with the maximum value in the aligned factor loading.
Value
Returns the updated input object
liger method
- Update the H.norm slot for the aligned cell factor loading, ready for running graph based community detection clustering or dimensionality reduction for visualization.
- Update the cellMeta slot with diagnostic information if diagnosis = TRUE.
Seurat method
- Update the reductions slot with a new DimReduc object containing the aligned cell factor loading.
- Update the metadata with diagnostic information if diagnosis = TRUE.
Examples
pbmc <- centroidAlign(pbmcPlot)
Close all links (to HDF5 files) of a liger object
Description
When the data embedded in HDF5 files needs to be accessed outside of the current R session, the HDF5 files have to be closed in order to be available to other processes.
Usage
closeAllH5(object)
## S3 method for class 'liger'
closeAllH5(object)
## S3 method for class 'ligerDataset'
closeAllH5(object)
Arguments
object | liger object. |
Value
Nothing is returned.
Check difference of two liger commands
Description
Check difference of two liger commands
Usage
commandDiff(object, cmd1, cmd2)
Arguments
object | liger object |
cmd1,cmd2 | Exact string of command labels. Available options can be viewed by running |
Value
If any difference is found, a character vector summarizing all differences.
Examples
pbmc <- normalize(pbmc)
pbmc <- normalize(pbmc, log = TRUE, scaleFactor = 1e4)
cmds <- commands(pbmc)
commandDiff(pbmc, cmds[1], cmds[2])
Convert old liger object to latest version
Description
Convert old liger object to latest version
Usage
convertOldLiger(
  object,
  dimredName,
  clusterName = "clusters",
  h5FilePath = NULL
)
Arguments
object |
|
dimredName | The name of variable in |
clusterName | The name of variable in |
h5FilePath | Named list, to specify the path to the H5 file of each dataset if location has been changed. Default |
Examples
## Not run:
# Suppose you have a liger object of old version (<1.99.0)
newLig <- convertOldLiger(oldLig)
## End(Not run)
Access ligerSpatialDataset coordinate data
Description
Similar to how default ligerDataset data is accessed.
Usage
coordinate(x, dataset)
coordinate(x, dataset, check = TRUE) <- value
## S4 method for signature 'liger,character'
coordinate(x, dataset)
## S4 replacement method for signature 'liger,character'
coordinate(x, dataset, check = TRUE) <- value
## S4 method for signature 'ligerSpatialDataset,missing'
coordinate(x, dataset = NULL)
## S4 replacement method for signature 'ligerSpatialDataset,missing'
coordinate(x, dataset = NULL, check = TRUE) <- value
Arguments
x | ligerSpatialDataset object or a liger object. |
dataset | Name or numeric index of a spatial dataset. |
check | Logical, whether to perform object validity check on setting new value. |
value |
Value
The retrieved coordinate matrix or the updated x object.
Create on-disk ligerDataset Object
Description
For convenience, the default formatType = "10x" directly fits the structure of cellranger output. formatType = "anndata" works for current AnnData H5AD file specification (see Details). If a customized H5 file structure is presented, any of the rawData, indicesName, indptrName, genesName, barcodesName should be specified accordingly to override the formatType preset.
DO make a copy of the H5AD files because rliger functions write to the files and they will not be able to be read back to Python. This will be fixed in the future.
Usage
createH5LigerDataset(
  h5file,
  formatType = "10x",
  rawData = NULL,
  normData = NULL,
  scaleData = NULL,
  barcodesName = NULL,
  genesName = NULL,
  indicesName = NULL,
  indptrName = NULL,
  anndataX = "X",
  modal = c("default", "rna", "atac", "spatial", "meth"),
  featureMeta = NULL,
  ...
)
Arguments
h5file | Filename of an H5 file |
formatType | Select preset of H5 file structure. Default |
rawData,indicesName,indptrName | The path in a H5 file for the raw sparse matrix data. These three types of data stand for the |
normData | The path in a H5 file for the "x" vector of the normalized sparse matrix. Default |
scaleData | The path in a H5 file for the Group that contains the sparse matrix constructing information for the scaled data. Default |
genesName,barcodesName | The path in a H5 file for the gene names and cell barcodes. Default |
anndataX | The HDF5 path to the raw count data in an H5AD file. See Details. Default |
modal | Name of modality for this dataset. Currently options of |
featureMeta | Data frame for feature metadata. Default |
... | Additional slot data. See ligerDataset for detail. Given values will be directly placed at corresponding slots. |
Details
For H5AD file written from an AnnData object, we allow using formatType = "anndata" for the function to infer the proper structure. However, a typical AnnData-based analysis tends to update the adata.X attribute in place, and there is no standard or enforced convention for where the raw count data, as needed by LIGER, is stored. Therefore, we expose the argument anndataX for specifying this information. The default value "X" looks for adata.X. If the raw data is stored in a layer, e.g. adata.layers['count'], then anndataX = "layers/count". If it is stored in adata.raw.X, then anndataX = "raw/X". If your AnnData object does not have the raw count retained, you will have to go back to the Python workflow to have it inserted at the desired object space and re-write the H5AD file, or just go from the upstream source files with which the AnnData was originally created.
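A minimal sketch of the layer case described above, assuming an H5AD file whose raw counts were saved in adata.layers['count']. The path `path/to/data.h5ad` is a hypothetical placeholder, and the call only runs when rliger is installed and the file exists. The file is copied first, following the warning that rliger writes to the H5AD file.

```r
# Hypothetical location of an H5AD file; replace with a real path.
h5adPath <- "path/to/data.h5ad"
# Assumption: raw counts were stored in adata.layers['count'] on the Python side.
rawCountLocation <- "layers/count"

if (requireNamespace("rliger", quietly = TRUE) && file.exists(h5adPath)) {
  # Work on a copy so the original file stays readable from Python.
  workPath <- tempfile(fileext = ".h5ad")
  file.copy(from = h5adPath, to = workPath)
  ld <- rliger::createH5LigerDataset(
    workPath,
    formatType = "anndata",
    anndataX = rawCountLocation
  )
}
```

For counts kept in adata.raw.X, the same call would use anndataX = "raw/X" instead.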
Value
H5-based ligerDataset object
Examples
h5Path <- system.file("extdata/ctrl.h5", package = "rliger")
tempPath <- tempfile(fileext = ".h5")
file.copy(from = h5Path, to = tempPath)
ld <- createH5LigerDataset(tempPath)
Create liger object
Description
This function allows creating a liger object from multiple datasets of various forms (see rawData).
DO make a copy of the H5AD files because rliger functions write to the files and they will not be able to be read back to Python. This will be fixed in the future.
Usage
createLiger(
  rawData,
  modal = NULL,
  organism = "human",
  cellMeta = NULL,
  removeMissing = TRUE,
  addPrefix = "auto",
  formatType = "10X",
  anndataX = "X",
  dataName = NULL,
  indicesName = NULL,
  indptrName = NULL,
  genesName = NULL,
  barcodesName = NULL,
  newH5 = TRUE,
  verbose = getOption("ligerVerbose", TRUE),
  ...,
  raw.data = rawData,
  take.gene.union = NULL,
  remove.missing = removeMissing,
  format.type = formatType,
  data.name = dataName,
  indices.name = indicesName,
  indptr.name = indptrName,
  genes.name = genesName,
  barcodes.name = barcodesName
)
Arguments
rawData | Named list of datasets. Required. Elements allowed include a matrix, a |
modal | Character vector for modality setting. Use one string for all datasets, or the same number of strings as the number of datasets. Currently options of |
organism | Character vector for setting organism for identifying mito, ribo and hemo genes for expression percentage calculation. Use one string for all datasets, or the same number of strings as the number of datasets. Currently options of |
cellMeta | data.frame of metadata at single-cell level. Default |
removeMissing | Logical. Whether to remove cells that do not have any counts from each dataset. Default |
addPrefix | Logical. Whether to add "datasetName_" as a prefix of cell identifiers (e.g. barcodes) to avoid duplicates in multiple libraries (common with 10X data). Default |
formatType | Select preset of H5 file structure. Current available options are |
anndataX | The HDF5 path to the raw count data in an H5AD file. See |
dataName,indicesName,indptrName | The path in a H5 file for the raw sparse matrix data. These three types of data stand for the |
genesName,barcodesName | The path in a H5 file for the gene names and cell barcodes. Default |
newH5 | When using HDF5 based data and subsets created after removing missing cells/features, whether to create new HDF5 files for the subset. Default |
verbose | Logical. Whether to show information of the progress. Default |
... | Additional slot values that should be directly placed in object. |
raw.data,remove.missing,format.type,data.name,indices.name,indptr.name,genes.name,barcodes.name | |
take.gene.union |
See Also
createLigerDataset, createH5LigerDataset
Examples
# Create from raw count matrices
ctrl.raw <- rawData(pbmc, "ctrl")
stim.raw <- rawData(pbmc, "stim")
pbmc1 <- createLiger(list(ctrl = ctrl.raw, stim = stim.raw))
# Create from H5 files
h5Path <- system.file("extdata/ctrl.h5", package = "rliger")
tempPath <- tempfile(fileext = ".h5")
file.copy(from = h5Path, to = tempPath)
lig <- createLiger(list(ctrl = tempPath))
# Create from other container object
if (requireNamespace("SeuratObject", quietly = TRUE)) {
  ctrl.seu <- SeuratObject::CreateSeuratObject(ctrl.raw)
  stim.seu <- SeuratObject::CreateSeuratObject(stim.raw)
  pbmc2 <- createLiger(list(ctrl = ctrl.seu, stim = stim.seu))
}
Create in-memory ligerDataset object
Description
Create in-memory ligerDataset object
Usage
createLigerDataset(
  rawData = NULL,
  modal = c("default", "rna", "atac", "spatial", "meth"),
  normData = NULL,
  scaleData = NULL,
  featureMeta = NULL,
  ...
)
Arguments
rawData,normData,scaleData | A |
modal | Name of modality for this dataset. Currently options of |
featureMeta | Data frame of feature metadata. Default |
... | Additional slot data. See ligerDataset for detail. Given values will be directly placed at corresponding slots. |
See Also
ligerDataset, ligerATACDataset, ligerSpatialDataset, ligerMethDataset
Examples
ctrl.raw <- rawData(pbmc, "ctrl")
ctrl.ld <- createLigerDataset(ctrl.raw)
Data frame for example marker DEG test result
Description
The data frame is the direct output of marker detection DEG test applied on example dataset which can be loaded with data("pbmc"). The DEG test was done with:
defaultCluster(pbmc) <- pbmcPlot$leiden_cluster
deg.marker <- runMarkerDEG(
  pbmc,
  minCellPerRep = 5
)
The result is for the marker detection test for 8 clusters in the dataset by comparing each cluster against all other clusters.
Usage
deg.marker
Format
data.frame object of 1992 rows with columns:
feature: gene names, 249 unique genes repeated 8 times for the tests done for 8 clusters.
group: cluster names, 8 unique cluster names, dividing the tests.
logFC: log fold change of the gene expression between the cluster of interest against all other clusters.
pval: p-value of the DEG test.
padj: adjusted p-value of the DEG test.
pct_in: percentage of cells in the cluster of interest expressing the gene.
pct_out: percentage of cells in all other clusters expressing the gene.
See Also
Data frame for example pairwise DEG test result
Description
The data frame is the direct output of pairwise DEG test applied on example dataset which can be loaded with importPBMC(). Cell type annotation was obtained from SeuratData package, "ifnb" dataset, since they are the same. Use the following command to reproduce the same result:
library(rliger)
library(Seurat)
library(SeuratData)
lig <- importPBMC()
ifnb <- LoadData("ifnb")
lig$cell_type <- ifnb$seurat_annotations
lig$condition_cell_type <- interaction(lig$dataset, lig$cell_type, drop = FALSE)
deg.pw <- runPairwiseDEG(
  object = lig,
  groupTest = 'stim.CD14 Mono',
  groupCtrl = 'ctrl.CD14 Mono',
  variable1 = 'condition_cell_type'
)
deg.pw <- deg.pw[order(deg.pw$padj)[1:1000],]
The result represents the statistics of DEG test between stim dataset against ctrl dataset, within the CD14 monocytes. As the last command shows, the result is truncated to the 1000 entries with the smallest adjusted p-values for a minimal demonstration.
Usage
deg.pw
Format
data.frame object of 1000 rows with columns:
feature: gene names.
group: class name within the variable being used for the test condition.
logFC: log fold change of the gene expression between the condition of interest against the control condition.
pval: p-value of the DEG test.
padj: adjusted p-value of the DEG test.
pct_in: percentage of cells in the condition of interest expressing the gene.
pct_out: percentage of cells in the control condition expressing the gene.
See Also
Downsample datasets
Description
This function mainly aims at downsampling datasets to a size suitable for plotting or expensive in-memory calculation.
Users can balance the sample size of categories of interest with balance. Multi-variable specification to balance is supported, so that at most maxCells cells will be sampled from each combination of categories from the variables. For example, when two datasets are presented and three clusters labeled across them, there would then be at most 2 × 3 × maxCells cells being selected. Note that "dataset" will automatically be added as one variable when balancing the downsampling. However, if users want to balance the downsampling solely based on dataset origin, users have to explicitly set balance = "dataset".
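The balancing rule above can be sketched in base R as sampling at most maxCells indices from every observed combination of categories. The helper `balancedSample` and the toy `dataset` and `cluster` vectors are hypothetical and only mimic the idea; they do not reproduce the random draws or the internals of `downsample()`.

```r
# Hypothetical helper for illustration only; not part of rliger.
balancedSample <- function(groups, maxCells, seed = 1) {
  set.seed(seed)
  idxByGroup <- split(seq_along(groups), groups, drop = TRUE)
  picked <- lapply(idxByGroup, function(idx) {
    # Sample positions within idx so that single-cell groups are handled safely
    idx[sample.int(length(idx), min(length(idx), maxCells))]
  })
  sort(unlist(picked, use.names = FALSE))
}

# Toy metadata: two datasets, three clusters labeled across them
dataset <- rep(c("ctrl", "stim"), each = 60)
cluster <- rep(c("c1", "c2", "c3"), times = 40)
groups <- interaction(dataset, cluster, drop = TRUE)

idx <- balancedSample(groups, maxCells = 10)
length(idx)          # bounded by 2 * 3 * 10
table(groups[idx])   # no combination exceeds maxCells
```

The returned index plays the same role as the result of returnIndex = TRUE: it subsets the original cells rather than creating a new object.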
Usage
downsample(
  object,
  balance = NULL,
  maxCells = 1000,
  useDatasets = NULL,
  seed = 1,
  returnIndex = FALSE,
  ...
)
Arguments
object | liger object |
balance | Character vector of categorical variable names in |
maxCells | Max number of cells to sample from the grouping based on |
useDatasets | Index selection of datasets to include. Default |
seed | Random seed for reproducibility. Default |
returnIndex | Logical, whether to only return the numeric index that can subset the original object instead of a subset object. Default |
... | Arguments passed to |
Value
By default, a subset of the liger object. Alternatively, when returnIndex = TRUE, a numeric vector to be used with the original object.
Examples
# Subsetting an object
pbmc <- downsample(pbmc)
# Creating a subsetting index
sampleIdx <- downsample(pbmcPlot, balance = "leiden_cluster",
                        maxCells = 10, returnIndex = TRUE)
plotClusterDimRed(pbmcPlot, cellIdx = sampleIdx)
Export predicted gene-pair interaction
Description
Export the predicted gene-pair interactions calculated by the upstream function linkGenesAndPeaks into an Interact Track file which is compatible with the UCSC Genome Browser.
Usage
exportInteractTrack(
  corrMat,
  pathToCoords,
  useGenes = NULL,
  outputPath = getwd()
)
Arguments
corrMat | A sparse matrix of correlation with peak names as rows and gene names as columns. |
pathToCoords | Path to the gene coordinates file. |
useGenes | Character vector of gene names to be exported. Default |
outputPath | Path of filename where the output file will be stored. If a folder, a file named |
Value
No return value. A file located at outputPath will be created.
Examples
bmmc <- normalize(bmmc)
bmmc <- selectGenes(bmmc)
bmmc <- scaleNotCenter(bmmc)
if (requireNamespace("RcppPlanc", quietly = TRUE) &&
    requireNamespace("GenomicRanges", quietly = TRUE) &&
    requireNamespace("IRanges", quietly = TRUE) &&
    requireNamespace("psych", quietly = TRUE)) {
  bmmc <- runINMF(bmmc)
  bmmc <- alignFactors(bmmc)
  bmmc <- normalizePeak(bmmc)
  bmmc <- imputeKNN(bmmc, reference = "atac", queries = "rna")
  corr <- linkGenesAndPeaks(
    bmmc, useDataset = "rna",
    pathToCoords = system.file("extdata/hg19_genes.bed", package = "rliger")
  )
  resultPath <- tempfile()
  exportInteractTrack(
    corrMat = corr,
    pathToCoords = system.file("extdata/hg19_genes.bed", package = "rliger"),
    outputPath = resultPath
  )
  head(read.table(resultPath, skip = 1))
}
Test all factors for enrichment in a gene set
Description
This function takes the factorized W matrix, with gene loading in factors, to get the ranked gene list for each factor. Then it runs a simply implemented GSEA against the given gene sets. So if genes in the given gene set are top loaded in a factor, this function will return a high positive enrichment score (ES) as well as a significant p-value.
For the returned result object, use print() or summary() to show concise results, and use plot() to visualize the GSEA statistics.
This function can be useful in various scenarios:
For example, when clusters with strong cell cycle activity are detected, users can apply this function with cell cycle gene sets to identify if any factor is enriched with such genes. Then in the downstream when aligning the iNMF factor loadings, users can simply opt to exclude these factors so the variation in cell cycle is regressed out. Objects cc.gene.human and cc.gene.mouse are delivered in the package for convenience.
In other cases, this function can also be used to understand the biological meaning of each cluster. Since the downstream clustering result is largely determined by the top loaded factor in each cell, understanding what genes are loaded in the top factor helps understand the identity and activity of the cell. This will require users to have their own gene sets prepared.
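To make the idea of ranking genes by loading and scoring a gene set concrete, here is a base R sketch of the classic unweighted running-sum enrichment score with a permutation p-value. It is an assumption-laden illustration: the helper `enrichmentScore`, the simulated `loading` vector, and the unweighted scoring are all introduced here and may differ from what `factorGSEA()` actually computes.

```r
# Hypothetical helper: unweighted running-sum enrichment score.
# Walk down the ranked list, step up on gene-set hits and down on misses,
# and report the largest deviation from zero.
enrichmentScore <- function(rankedGenes, geneSet) {
  hit <- rankedGenes %in% geneSet
  nHit <- sum(hit)
  nMiss <- length(rankedGenes) - nHit
  running <- cumsum(ifelse(hit, 1 / nHit, -1 / nMiss))
  running[which.max(abs(running))]
}

set.seed(1)
genes <- paste0("gene", 1:100)
# Stand-in for one column of W, already sorted by decreasing loading
loading <- sort(runif(100), decreasing = TRUE)
names(loading) <- genes
geneSet <- genes[1:10]            # a set made of the top-loaded genes

es <- enrichmentScore(names(loading), geneSet)

# Permutation p-value: shuffle the ranking and recompute the score
nPerm <- 1000
permES <- replicate(nPerm, enrichmentScore(sample(genes), geneSet))
pval <- (sum(permES >= es) + 1) / (nPerm + 1)
```

Because the gene set consists of the top-loaded genes, the score is strongly positive and few or no permutations reach it. The sketch does not handle gene sets with no overlap with the ranked list.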
Usage
factorGSEA(
  object,
  geneSet,
  nPerm = 1000,
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE)
)
Arguments
object | A liger object with factorized |
geneSet | A character vector for a single gene set, or a list of character vectors for multiple gene sets. |
nPerm | Integer number for number of permutations to estimate p-value.Default |
seed | Integer number for random seed. Default |
verbose | Logical, whether to print progress bar. Default |
Value
If geneSet is a single character vector, returns a data frame with enrichment score (ES), normalized enrichment score (NES), and p-value for the test in each factor. If geneSet is a list, returns a list of such data frames.
Examples
pbmc <- pbmc %>%
  selectBatchHVG() %>%
  scaleNotCenter() %>%
  runINMF()
factorGSEAres <- factorGSEA(pbmc, ccGeneHuman)
# Print summary of significant results
print(factorGSEAres)
summary(factorGSEAres)
# Make GSEA plot for certain gene set and factor
plot(factorGSEAres, geneSetName = 'g2m.genes', useFactor = 'Factor_1')
Find shared and dataset-specific markers
Description
Applies various filters to genes on the shared (W) and dataset-specific (V) components of the factorization, before selecting those which load most significantly on each factor (in a shared or dataset-specific way).
Usage
getFactorMarkers(
  object,
  dataset1,
  dataset2,
  factorShareThresh = 10,
  datasetSpecificity = NULL,
  logFCThresh = 1,
  pvalThresh = 0.05,
  nGenes = 30,
  printGenes = FALSE,
  verbose = getOption("ligerVerbose", TRUE),
  factor.share.thresh = factorShareThresh,
  dataset.specificity = datasetSpecificity,
  log.fc.thresh = logFCThresh,
  pval.thresh = pvalThresh,
  num.genes = nGenes,
  print.genes = printGenes
)
Arguments
object | liger object with factorization results. |
dataset1 | Name of first dataset. Required. |
dataset2 | Name of second dataset. Required. |
factorShareThresh | Numeric. Only factors with a dataset specificity less than or equal to this threshold will be used. Default |
datasetSpecificity | Numeric vector. Pre-calculated dataset specificity if available. Length should match number of all factors available. Default |
logFCThresh | Numeric. Lower log-fold change threshold for differential expression in markers. Default |
pvalThresh | Numeric. Upper p-value threshold for Wilcoxon rank test for gene expression. Default |
nGenes | Integer. Max number of genes to report for each dataset. Default |
printGenes | Logical. Whether to print ordered markers passing logFC, UMI and frac thresholds, when |
verbose | Logical. Whether to show information of the progress. Default |
factor.share.thresh,dataset.specificity,log.fc.thresh,pval.thresh,num.genes,print.genes | Deprecated. See Usage section for replacement. |
Value
A list object consisting of the following entries:
value of dataset1 | data.frame of dataset1-specific markers |
shared | data.frame of shared markers |
value of dataset2 | data.frame of dataset2-specific markers |
num_factors_V1 | A frequency table indicating the number of factors each marker appears in, in dataset1 |
num_factors_V2 | A frequency table indicating the number of factors each marker appears in, in dataset2 |
Examples
library(dplyr)
result <- getFactorMarkers(pbmcPlot, dataset1 = "ctrl", dataset2 = "stim")
print(class(result))
print(names(result))
result$shared %>% group_by(factor_num) %>% top_n(2, logFC)
Calculate proportion mitochondrial contribution
Description
Calculates proportion of mitochondrial contribution based on raw or normalized data.
Usage
getProportionMito(object, use.norm = FALSE, pattern = "^mt-")
Arguments
object |
|
use.norm | Deprecated. Whether to use cell normalized data in calculating contribution. Default |
pattern | Regex pattern for identifying mitochondrial genes. Default |
Value
Named vector containing proportion of mitochondrial contribution for each cell.
Note
getProportionMito will be deprecated because runGeneralQC generally covers and expands its use case.
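The quantity described in this entry can be approximated in base R as the per-cell sum of counts over genes matching the pattern, divided by the per-cell total. The helper `proportionMito` and the toy `counts` matrix are hypothetical and are not the package implementation.

```r
# Hypothetical helper for illustration only; not part of rliger.
proportionMito <- function(counts, pattern = "^mt-") {
  isMito <- grepl(pattern, rownames(counts))
  colSums(counts[isMito, , drop = FALSE]) / colSums(counts)
}

# Toy gene-by-cell count matrix with mouse-style gene symbols
counts <- matrix(
  c(5, 0, 3, 2,    # cell1
    1, 4, 0, 5),   # cell2
  nrow = 4,
  dimnames = list(c("mt-Nd1", "Actb", "mt-Co1", "Gapdh"), c("cell1", "cell2"))
)
mitoProp <- proportionMito(counts)
mitoProp
```

The pattern is matched case-sensitively, so "^mt-" fits lower-case mouse-style symbols. Human mitochondrial gene symbols are conventionally written with an upper-case "MT-" prefix, in which case a pattern such as "^MT-" would be needed.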
Examples
# Example dataset does not contain MT genes, expected to see a message
pbmc$mito <- getProportionMito(pbmc)
Import prepared dataset publicly available
Description
These are functions to download example datasets that are subset from public data.
PBMC - Downsampled from GSE96583, Kang et al, Nature Biotechnology, 2018. Contains two scRNAseq datasets.
BMMC - Downsampled from GSE139369, Granja et al, Nature Biotechnology, 2019. Contains two scRNAseq datasets and one scATAC data.
CGE - Downsampled from GSE97179, Luo et al, Science, 2017. Contains one scRNAseq dataset and one DNA methylation data.
Usage
importPBMC(
  dir = getwd(),
  overwrite = FALSE,
  method = "libcurl",
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
importBMMC(
  dir = getwd(),
  overwrite = FALSE,
  method = "libcurl",
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
importCGE(
  dir = getwd(),
  overwrite = FALSE,
  method = "libcurl",
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
Arguments
dir | Path to download datasets. Default current working directory |
overwrite | Logical, if a file exists at corresponding download location, whether to re-download or directly use this file. Default |
method |
|
verbose | Logical. Whether to show information of the progress. Default |
... | Additional arguments passed to |
Value
Constructed liger object with QC performed and missing data removed.
Examples
pbmc <- importPBMC()
bmmc <- importBMMC()
cge <- importCGE()
Impute the peak counts from gene expression data referring to an ATAC dataset after integration
Description
This function is designed for creating peak data for a dataset with only gene expression. This function uses aligned cell factor loading to find nearest neighbors between cells from the queried dataset (without peak) and cells from reference dataset (with peak). It then imputes the peak for the former based on the weight. Therefore, the reference dataset selected must be of "atac" modality setting.
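The nearest-neighbor idea can be sketched in base R as follows. This is a simplified illustration under stated assumptions: it uses Euclidean distance in the factor space and a plain (unweighted) average over the k nearest reference cells, and the helper `imputeFromReference` plus the toy matrices are hypothetical. The package function has additional options (such as weighting and normalization) that are not modeled here.

```r
# Hypothetical helper for illustration only; not part of rliger.
# hQuery:  query cells x factors (aligned factor loading)
# hRef:    reference cells x factors
# peakRef: peaks x reference cells
imputeFromReference <- function(hQuery, hRef, peakRef, k = 2) {
  perCell <- lapply(seq_len(nrow(hQuery)), function(i) {
    # Euclidean distance from query cell i to every reference cell
    d <- sqrt(colSums((t(hRef) - hQuery[i, ])^2))
    nn <- order(d)[seq_len(k)]
    # Unweighted mean of the neighbors' peak counts
    rowMeans(peakRef[, nn, drop = FALSE])
  })
  imputed <- do.call(cbind, perCell)
  dimnames(imputed) <- list(rownames(peakRef), rownames(hQuery))
  imputed
}

hRef <- rbind(r1 = c(0, 0), r2 = c(0, 1), r3 = c(10, 10))
hQuery <- rbind(q1 = c(0, 0.4), q2 = c(9, 10))
peakRef <- matrix(
  c(2, 0, 4,    # r1
    4, 2, 0,    # r2
    0, 8, 6),   # r3
  nrow = 3,
  dimnames = list(c("p1", "p2", "p3"), c("r1", "r2", "r3"))
)
imputed <- imputeFromReference(hQuery, hRef, peakRef, k = 2)
imputed
```

In the toy data, q1 sits between r1 and r2, so its imputed peaks are their average, while q2 is closest to r3 and then r2.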
Usage
imputeKNN(
  object,
  reference,
  queries = NULL,
  nNeighbors = 20,
  weight = TRUE,
  norm = TRUE,
  scale = FALSE,
  verbose = getOption("ligerVerbose", TRUE),
  ...,
  knn_k = nNeighbors
)
Arguments
object | liger object with aligned factor loading computed in advance. |
reference | Name of a dataset containing peak data to impute into query dataset(s). |
queries | Names of datasets to be augmented by imputation. Should not include |
nNeighbors | The maximum number of nearest neighbors to search. Default |
weight | Logical. Whether to use KNN distances as weight matrix. Default |
norm | Logical. Whether to normalize the imputed data. Default |
scale | Logical. Whether to scale but not center the imputed data. Default |
verbose | Logical. Whether to show information of the progress. Default |
... | Optional arguments to be passed to |
knn_k | Deprecated. See Usage section for replacement. |
Value
The input object where queried ligerDataset objects in the datasets slot are replaced. These datasets will all be converted to ligerATACDataset class with an additional slot rawPeak to store the imputed peak counts, and normPeak for normalized imputed peak counts if norm = TRUE.
Examples
bmmc <- normalize(bmmc)
bmmc <- selectGenes(bmmc, datasets.use = "rna")
bmmc <- scaleNotCenter(bmmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
  bmmc <- runINMF(bmmc, k = 20)
  bmmc <- alignFactors(bmmc)
  bmmc <- normalizePeak(bmmc)
  bmmc <- imputeKNN(bmmc, reference = "atac", queries = "rna")
}
Check if given liger object is under new implementation
Description
Check if given liger object is under new implementation
Usage
is.newLiger(object)
Arguments
object | A liger object |
Value
TRUE if the version of object is later than or equal to 1.99.0. Otherwise FALSE. It raises an error if input object is not of liger class.
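The version rule above can be illustrated with base R's `package_version()`, which compares version components numerically rather than as strings. The helper `isAtLeast` is a hypothetical name; how `is.newLiger()` reads the version from the object is not shown here.

```r
# Hypothetical helper for illustration only; not part of rliger.
isAtLeast <- function(version, minimum = "1.99.0") {
  package_version(version) >= package_version(minimum)
}

newer  <- isAtLeast("2.2.1")
older  <- isAtLeast("1.0.1")
equal  <- isAtLeast("1.99.0")
# Component-wise comparison: 100 > 99, unlike a plain string comparison
tricky <- isAtLeast("1.100.0")
c(newer = newer, older = older, equal = equal, tricky = tricky)
```

Comparing the raw strings would order "1.100.0" before "1.99.0", which is why a version-aware comparison is used in this sketch.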
Examples
is.newLiger(pbmc) # TRUE
Check if a liger or ligerDataset object is made of HDF5 file
Description
Check if a liger or ligerDataset object is made of HDF5 file
Usage
isH5Liger(object, dataset = NULL)
Arguments
object | A liger or ligerDataset object. |
dataset | If |
Value
TRUE or FALSE for the specified check.
Examples
isH5Liger(pbmc)
isH5Liger(pbmc, "ctrl")
ctrl <- dataset(pbmc, "ctrl")
isH5Liger(ctrl)
liger class
Description
liger object is the main data container for LIGER analysis in R. The slot datasets is a list where each element should be a ligerDataset object containing dataset specific information, such as the expression matrices. The other parts of liger object store information that can be shared across the analysis, such as the cell metadata.
This manual provides explanation to the liger object structure as well as usage of class-specific methods. Please see detail sections for more information.
For liger objects created with older versions of rliger package, please try updating the objects individually with convertOldLiger.
Usage
datasets(x, check = NULL)
datasets(x, check = TRUE) <- value
dataset(x, dataset = NULL)
dataset(x, dataset, type = NULL, qc = TRUE) <- value
cellMeta(x, columns = NULL, useDatasets = NULL, cellIdx = NULL, as.data.frame = FALSE, ...)
cellMeta(x, columns = NULL, useDatasets = NULL, cellIdx = NULL, inplace = FALSE, check = FALSE) <- value
defaultCluster(x, useDatasets = NULL, ...)
defaultCluster(x, name = NULL, useDatasets = NULL, ...) <- value
dimReds(x)
dimReds(x) <- value
dimRed(x, name = NULL, useDatasets = NULL, cellIdx = NULL, ...)
dimRed(x, name = NULL, useDatasets = NULL, cellIdx = NULL, ...) <- value
defaultDimRed(x, useDatasets = NULL, cellIdx = NULL)
defaultDimRed(x) <- value
varFeatures(x)
varFeatures(x, check = TRUE) <- value
varUnsharedFeatures(x, dataset = NULL)
varUnsharedFeatures(x, dataset, check = TRUE) <- value
commands(x, funcName = NULL, arg = NULL)
## S4 method for signature 'liger'
show(object)
## S4 method for signature 'liger'
dim(x)
## S4 method for signature 'liger'
dimnames(x)
## S4 replacement method for signature 'liger,list'
dimnames(x) <- value
## S4 method for signature 'liger'
datasets(x, check = NULL)
## S4 replacement method for signature 'liger,logical'
datasets(x, check = TRUE) <- value
## S4 replacement method for signature 'liger,missing'
datasets(x, check = TRUE) <- value
## S4 method for signature 'liger,character_OR_NULL'
dataset(x, dataset = NULL)
## S4 method for signature 'liger,missing'
dataset(x, dataset = NULL)
## S4 method for signature 'liger,numeric'
dataset(x, dataset = NULL)
## S4 replacement method for signature 'liger,character,missing,ANY,ligerDataset'
dataset(x, dataset, type = NULL, qc = TRUE) <- value
## S4 replacement method for signature 'liger,character,ANY,ANY,matrixLike'
dataset(x, dataset, type = c("rawData", "normData"), qc = FALSE) <- value
## S4 replacement method for signature 'liger,character,missing,ANY,NULL'
dataset(x, dataset, type = NULL, qc = TRUE) <- value
## S3 method for class 'liger'
names(x)
## S3 replacement method for class 'liger'
names(x) <- value
## S3 method for class 'liger'
length(x)
## S3 method for class 'liger'
lengths(x, use.names = TRUE)
## S4 method for signature 'liger,NULL'
cellMeta(x, columns = NULL, useDatasets = NULL, cellIdx = NULL, as.data.frame = FALSE, ...)
## S4 method for signature 'liger,character'
cellMeta(x, columns = NULL, useDatasets = NULL, cellIdx = NULL, as.data.frame = FALSE, ...)
## S4 method for signature 'liger,missing'
cellMeta(x, columns = NULL, useDatasets = NULL, cellIdx = NULL, as.data.frame = FALSE, ...)
## S4 replacement method for signature 'liger,missing'
cellMeta(x, columns = NULL, useDatasets = NULL, cellIdx = NULL, check = FALSE) <- value
## S4 replacement method for signature 'liger,character'
cellMeta(x, columns = NULL, useDatasets = NULL, cellIdx = NULL, inplace = TRUE, check = FALSE) <- value
## S4 method for signature 'liger'
rawData(x, dataset = NULL)
## S4 replacement method for signature 'liger,ANY,ANY,matrixLike_OR_NULL'
rawData(x, dataset = NULL, check = TRUE) <- value
## S4 replacement method for signature 'liger,ANY,ANY,H5D'
rawData(x, dataset = NULL, check = TRUE) <- value
## S4 method for signature 'liger'
normData(x, dataset = NULL)
## S4 replacement method for signature 'liger,ANY,ANY,matrixLike_OR_NULL'
normData(x, dataset = NULL, check = TRUE) <- value
## S4 replacement method for signature 'liger,ANY,ANY,H5D'
normData(x, dataset = NULL, check = TRUE) <- value
## S4 method for signature 'liger,ANY'
scaleData(x, dataset = NULL)
## S4 replacement method for signature 'liger,ANY,ANY,matrixLike_OR_NULL'
scaleData(x, dataset = NULL, check = TRUE) <- value
## S4 replacement method for signature 'liger,ANY,ANY,H5D'
scaleData(x, dataset = NULL, check = TRUE) <- value
## S4 replacement method for signature 'liger,ANY,ANY,H5Group'
scaleData(x, dataset = NULL, check = TRUE) <- value
## S4 method for signature 'liger,character'
scaleUnsharedData(x, dataset = NULL)
## S4 method for signature 'liger,numeric'
scaleUnsharedData(x, dataset = NULL)
## S4 replacement method for signature 'liger,ANY,ANY,matrixLike_OR_NULL'
scaleUnsharedData(x, dataset = NULL, check = TRUE) <- value
## S4 replacement method for signature 'liger,ANY,ANY,H5D'
scaleUnsharedData(x, dataset = NULL, check = TRUE) <- value
## S4 replacement method for signature 'liger,ANY,ANY,H5Group'
scaleUnsharedData(x, dataset = NULL, check = TRUE) <- value
## S4 method for signature 'liger,ANY,ANY,ANY'
getMatrix(
  x,
  slot = c("rawData", "normData", "scaleData", "scaleUnsharedData", "H", "V", "U",
    "A", "B", "W", "H.norm", "rawPeak", "normPeak"),
  dataset = NULL,
  returnList = FALSE
)
## S4 method for signature 'liger,ANY'
getH5File(x, dataset = NULL)
## S3 replacement method for class 'liger'
x[[i]] <- value
## S3 method for class 'liger'
x$name
## S3 replacement method for class 'liger'
x$name <- value
## S4 method for signature 'liger'
defaultCluster(x, useDatasets = NULL, droplevels = FALSE, ...)
## S4 replacement method for signature 'liger,ANY,ANY,character'
defaultCluster(x, name = NULL, useDatasets = NULL, ...) <- value
## S4 replacement method for signature 'liger,ANY,ANY,factor'
defaultCluster(x, name = NULL, useDatasets = NULL, droplevels = TRUE, ...) <- value
## S4 replacement method for signature 'liger,ANY,ANY,NULL'
defaultCluster(x, name = NULL, useDatasets = NULL, ...) <- value
## S4 method for signature 'liger'
dimReds(x)
## S4 replacement method for signature 'liger,list'
dimReds(x) <- value
## S4 method for signature 'liger,missing_OR_NULL'
dimRed(x, name = NULL, useDatasets = NULL, cellIdx = NULL, ...)
## S4 method for signature 'liger,index'
dimRed(x, name = NULL, useDatasets = NULL, cellIdx = NULL, ...)
## S4 replacement method for signature 'liger,index,ANY,ANY,NULL'
dimRed(x, name = NULL, useDatasets = NULL, cellIdx = NULL, ...) <- value
## S4 replacement method for signature 'liger,character,ANY,ANY,matrixLike'
dimRed(x, name = NULL, useDatasets = NULL, cellIdx = NULL, asDefault = NULL, inplace = FALSE, ...) <- value
## S4 method for signature 'liger'
defaultDimRed(x, useDatasets = NULL, cellIdx = NULL)
## S4 replacement method for signature 'liger,character'
defaultDimRed(x) <- value
## S4 method for signature 'liger'
varFeatures(x)
## S4 replacement method for signature 'liger,ANY,character'
varFeatures(x, check = TRUE) <- value
## S4 method for signature 'liger,ANY'
varUnsharedFeatures(x, dataset = NULL)
## S4 replacement method for signature 'liger,ANY,ANY,character'
varUnsharedFeatures(x, dataset, check = TRUE) <- value
## S3 method for class 'liger'
fortify(model, data, ...)
## S3 method for class 'liger'
c(...)
## S4 method for signature 'liger'
commands(x, funcName = NULL, arg = NULL)
## S4 method for signature 'ligerDataset,missing'
varUnsharedFeatures(x, dataset = NULL)
## S4 replacement method for signature 'ligerDataset,missing,ANY,character'
varUnsharedFeatures(x, dataset = NULL, check = TRUE) <- value
Arguments
x,object,model | A liger object |
check | Logical, whether to perform object validity check on setting new value. Users are not supposed to set |
value | Metadata value to be inserted |
dataset | Name or numeric index of a dataset |
type | When using |
qc | Logical, whether to perform general qc on added new dataset. |
columns | The names of available variables in |
useDatasets | Setter or getter method should only apply to cells in specified datasets. Any valid character, numeric or logical subscript is acceptable. Default |
cellIdx | Valid cell subscription to subset retrieved variables. Default |
as.data.frame | Logical, whether to apply |
... | See detailed sections for explanation. |
inplace | For |
name | The name of available variables in |
funcName,arg | See Command records section. |
use.names | Whether returned vector should be named with dataset names. |
slot | Name of slot to retrieve matrix from. Options shown in Usage. |
returnList | Logical, whether to force return a list even when only one dataset-specific matrix (i.e. expression matrices, H, V or U) is requested. Default |
i | Name or numeric index of cell meta variable to be replaced |
droplevels | Whether to remove unused cluster levels from the factor object fetched by |
asDefault | Whether to set the inserted dimension reduction matrix as default for visualization methods. Default |
data | fortify method required argument. Not used. |
Value
See detailed sections for explanation.
Input liger object updated with replaced/new variable in cellMeta(x).
Slots
datasets | list of ligerDataset objects. Use generic dataset, dataset<-, datasets or datasets<- to interact with. See detailed section accordingly. |
cellMeta | DFrame object for cell metadata. Pre-existing metadata, QC metrics, cluster labeling, etc. are all stored here. Use generic cellMeta, cellMeta<-, $, [[]] or [[]]<- to interact with. See detailed section accordingly. |
varFeatures | Character vector of names of variable features. Use generic varFeatures or varFeatures<- to interact with. See detailed section accordingly. |
W | iNMF output matrix of shared gene loadings for each factor. See runIntegration. |
H.norm | Matrix of aligned factor loading for each cell. See alignFactors and runIntegration. |
commands | List of ligerCommand objects. Record of analysis. Use commands to retrieve information. See detailed section accordingly. |
uns | List for unstructured meta-info of analyses or presets. |
version | Record of version of rliger package |
Dataset access
datasets() method only accesses the datasets slot, the list of ligerDataset objects. dataset() method accesses a single dataset, and adding or modifying a dataset through it also triggers the associated cell metadata updates and checks. Therefore, when users want to modify something inside a ligerDataset while no cell metadata change should happen, it is recommended to use datasets(x)[[name]] <- ligerD for efficiency, though the result would be the same as dataset(x, name) <- ligerD.
length() and names() methods are implemented to access the number and names of datasets. names<- method is supported for modifying dataset names, and it also takes care of the "dataset" variable in cell metadata.
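The two routes described above can be sketched as follows. This is a minimal illustration that assumes the example object pbmc bundled with rliger (with datasets named "ctrl" and "stim", as used elsewhere in this manual); obj1 and obj2 are hypothetical copies made only for the comparison.

```r
library(rliger)

# Take one ligerDataset out of the example object
ctrl <- dataset(pbmc, "ctrl")

# Route 1: assign through the list of datasets; cell metadata is not touched
obj1 <- pbmc
datasets(obj1)[["ctrl"]] <- ctrl

# Route 2: single-dataset setter; cell metadata updates and checks are run too
obj2 <- pbmc
dataset(obj2, "ctrl") <- ctrl

# Number and names of datasets
length(obj1)
names(obj1)
names(obj2)
```

Route 1 is the cheaper choice when only the content of a ligerDataset changes and the set of cells stays the same.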
Matrix access
For a liger object, rawData(), normData(), scaleData() and scaleUnsharedData() methods are exported for users to access the corresponding feature expression matrix with specification of one dataset. For retrieving a type of matrix from multiple datasets, please use the getMatrix() method.
When only one matrix is expected to be retrieved by getMatrix(), the matrix itself will be returned. A list will be returned if multiple matrices are requested (by querying multiple datasets) or returnList is set to TRUE.
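A short sketch of the return-type rule, assuming the bundled example object pbmc with a dataset named "ctrl"; the variable names are illustrative only.

```r
library(rliger)

# One dataset requested: the matrix itself is returned
m <- getMatrix(pbmc, "rawData", dataset = "ctrl")
class(m)

# Same request, forced into a list of length one
asList <- getMatrix(pbmc, "rawData", dataset = "ctrl", returnList = TRUE)
names(asList)

# No dataset specified: one matrix per dataset, returned as a list
perDataset <- getMatrix(pbmc, "rawData")
names(perDataset)

# The single-dataset accessor retrieves the same matrix
identical(m, rawData(pbmc, "ctrl"))
```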
Cell metadata access
Three approaches are provided for access of cell metadata. A generic function cellMeta is implemented with plenty of options and multi-variable accessibility. Besides, users can use double-bracket (e.g. ligerObj[[varName]]) or dollar-sign (e.g. ligerObj$nUMI) to access or modify single variables.
For users' convenience of generating a customized ggplot with available cell metadata, the S3 method fortify.liger is implemented. With this under the hood, users can create simple ggplots by directly starting with ggplot(ligerObj, aes(...)) where cell metadata variables can be directly passed to aes().
Special partial metadata insertion is implemented specifically for mapping categorical annotation from a sub-population (subset object) back to the original experiment (full-size object). For example, when sub-clustering and annotation is done for a specific cell type (stored in subobj) subset from an experiment (stored as obj), users can do cellMeta(obj, "sub_ann", cellIdx = colnames(subobj)) <- subobj$sub_ann to map the value back, leaving other cells non-annotated with NAs. Plotting with this variable will then also show NA cells with the default grey color. Furthermore, sub-clustering labels for other cell types can also be mapped to the same variable. For example, cellMeta(obj, "sub_ann", cellIdx = colnames(subobj2)) <- subobj2$sub_ann. As long as the labeling variables are stored as factor class (categorical), the levels (category names) will be properly handled and merged. Other situations follow the R default behavior (e.g. categories might be converted to integer numbers if mapped to a numerical variable in the original object). Note that this feature is only available when using the generic function cellMeta but not with the `[[` or `$` accessing methods due to syntax reasons.
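The partial insertion described above can be tried without building a real subset object. The sketch below assumes the bundled example object pbmc and uses made-up labels ("a", "b", "c") on two arbitrary groups of cells that stand in for two annotated sub-populations; the variable name "sub_ann" follows the text.

```r
library(rliger)

# Two arbitrary groups of cell barcodes standing in for two subset objects
cells1 <- colnames(pbmc)[1:50]
cells2 <- colnames(pbmc)[51:80]

# Made-up categorical annotations for each group
ann1 <- factor(rep(c("a", "b"), times = 25))
ann2 <- factor(rep("c", times = 30))

# Map the first annotation back; all other cells are left as NA
cellMeta(pbmc, "sub_ann", cellIdx = cells1) <- ann1

# Map the second annotation into the same variable; factor levels are merged
cellMeta(pbmc, "sub_ann", cellIdx = cells2) <- ann2

table(pbmc$sub_ann, useNA = "ifany")
```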
The generic defaultCluster works as both getter and setter. As a setter, users can do defaultCluster(obj) <- "existingVariableName" to set a categorical variable as the default cluster used for visualization or downstream analysis. Users can also do defaultCluster(obj, "newVarName") <- factorOfLabels to push new labeling into the object and set it as default. As a getter, the function returns a factor object of the default cluster labeling. Argument useDatasets can be used for requiring that given or retrieved labeling should match with cells in specified datasets. We generally don't recommend setting "dataset" as a default cluster because it is a preserved (always existing) field in metadata and can lead to meaningless results when running analysis that utilizes both clustering information and the dataset source information.
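Both setter forms and the getter can be sketched as below, assuming the bundled example object pbmc. The variable name "toy_cluster" and its alternating labels are invented for illustration and carry no biological meaning.

```r
library(rliger)

# A made-up labeling with one value per cell
toyLabels <- factor(rep(c("A", "B"), length.out = ncol(pbmc)))

# Push the new labeling into cell metadata and set it as default in one step
defaultCluster(pbmc, "toy_cluster") <- toyLabels

# Getter: returns a factor with one entry per cell
cl <- defaultCluster(pbmc)
table(cl)

# An existing categorical variable can also be set as default by name
defaultCluster(pbmc) <- "toy_cluster"
```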
Dimension reduction access
Currently, low-dimensional representations of cells, presented as dense matrices, are all stored in the dimReds slot, and can be accessed with generics dimRed and dimRed<-. Adding a dimRed to the object is as simple as dimRed(obj, "name") <- matrixLike. It can be retrieved back with dimRed(obj, "name"). Similar to having a default cluster labeling, we also constructed the feature of default dimRed. It can be set with defaultDimRed(obj) <- "existingMatLikeVar" and the matrix can be retrieved with defaultDimRed(obj).
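A minimal sketch of storing and retrieving an embedding, assuming the bundled example object pbmc. The matrix is random noise and the name "toy" is hypothetical; a real embedding would come from a dimension reduction method.

```r
library(rliger)

# A made-up 2D embedding with one row per cell
set.seed(1)
emb <- matrix(
  rnorm(ncol(pbmc) * 2),
  ncol = 2,
  dimnames = list(colnames(pbmc), NULL)
)

# Store it under a name, then fetch it back
dimRed(pbmc, "toy") <- emb
head(dimRed(pbmc, "toy"))

# Mark it as the default embedding and retrieve the default
defaultDimRed(pbmc) <- "toy"
dim(defaultDimRed(pbmc))
```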
Variable feature access
The varFeatures slot allows for character vectors of gene names. varFeatures(x) returns this vector, and value for the varFeatures<- method has to be a character vector or NULL. The replacement method, when check = TRUE, checks gene name consistency across the scaleData, H, V slots of inner ligerDataset objects as well as the W and H.norm slots of the input liger object.
Command records
rliger functions that perform calculation and update the liger object will be recorded in a ligerCommand object and stored in the commands slot, a list, of the liger object. Method commands() is implemented to retrieve or show the log history. Running with funcName = NULL (default) returns all command labels. Specifying funcName allows partial matching to all command labels and returns a subset list (of ligerCommand objects) of matches (or the ligerCommand object if only one match is found). If arg is further specified, a subset list of parameters from the matches will be returned. For example, requesting a list of resolution values used in all louvain cluster attempts: commands(ligerObj, "louvainCluster", "resolution")
Dimensionality
For a liger object, the column orientation is assigned for cells. Due to the data structure, it is hard to define a row index for the liger object, which might contain datasets that vary in number of genes.
Therefore, for liger objects, dim and dimnames return NA/NULL for rows and total cell counts/barcodes for the columns.
For direct call of the dimnames<- method, value should be a list with NULL as the first element and valid cell identifiers as the second element. For the colnames<- method, value should be the character vector of cell identifiers. The rownames<- method is not applicable.
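The row/column convention can be checked directly. This sketch assumes the bundled example object pbmc; nothing in the object is modified.

```r
library(rliger)

# Rows are undefined for a liger object; columns are all cells across datasets
d <- dim(pbmc)
d

# Row names are NULL, column names are the cell barcodes
dn <- dimnames(pbmc)
is.null(dn[[1]])
head(dn[[2]])

# Total cell count equals the sum of cells over the inner ligerDataset objects
cellsPerDataset <- sapply(datasets(pbmc), ncol)
sum(cellsPerDataset) == ncol(pbmc)
```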
Subsetting
For more detail on subsetting a liger object or a ligerDataset object, please check out subsetLiger and subsetLigerDataset. Here, we set the S4 method "single-bracket" [ as a quick wrapper to subset a liger object. Note that j serves as the cell subscript, which can be any valid index referring to the collection of all cells (i.e. rownames(cellMeta(obj))), while i, the feature subscript, can only be a character vector because the features for each dataset can vary. ... arguments are passed to subsetLiger so that advanced options are allowed.
Combining multiple liger object
The list in the datasets slot, the rows of the cellMeta slot and the list in the commands slot will be simply concatenated. The union of the variable features in the varFeatures slot will be taken. The W and H.norm matrices are not taken into account for now.
Examples
# Methods for base generics
pbmcPlot
print(pbmcPlot)
dim(pbmcPlot)
ncol(pbmcPlot)
colnames(pbmcPlot)[1:5]
pbmcPlot[varFeatures(pbmcPlot)[1:10], 1:10]
names(pbmcPlot)
length(pbmcPlot)
# rliger generics
## Retrieving dataset(s), replacement methods available
datasets(pbmcPlot)
dataset(pbmcPlot, "ctrl")
dataset(pbmcPlot, 2)
## Retrieving cell metadata, replacement methods available
cellMeta(pbmcPlot)
head(pbmcPlot[["nUMI"]])
## Retrieving dimension reduction matrix
head(dimRed(pbmcPlot, "UMAP"))
## Retrieving variable features, replacement methods available
varFeatures(pbmcPlot)
## Command record/history
pbmcPlot <- scaleNotCenter(pbmcPlot)
commands(pbmcPlot)
commands(pbmcPlot, funcName = "scaleNotCenter")
# S3 methods
pbmcPlot2 <- pbmcPlot
names(pbmcPlot2) <- paste0(names(pbmcPlot), 2)
c(pbmcPlot, pbmcPlot2)
library(ggplot2)
ggplot(pbmcPlot, aes(x = UMAP_1, y = UMAP_2)) + geom_point()
cellMeta(pbmc)
# Add new variable
pbmc[["newVar"]] <- 1
cellMeta(pbmc)
# Change existing variable
pbmc[["newVar"]][1:3] <- 1:3
cellMeta(pbmc)
Subclass of ligerDataset for ATAC modality
Description
Inherits from ligerDataset class. Contained slots can be referred with the link.
Slots
rawPeak | sparse matrix |
normPeak | sparse matrix |
ligerCommand object: Record the input and time of a LIGER function call
Description
ligerCommand object: Record the input and time of a LIGER function call
Usage
## S4 method for signature 'ligerCommand'
show(object)
Arguments
object | A |
Slots
funcName | Name of the function |
time | A time stamp object |
call | A character string converted from system call |
parameters | List of all arguments except the liger object. Large objects are summarized to short strings. |
objSummary | List of attributes of the liger object as a snapshot when the command is operated. |
ligerVersion | Character string converted from packageVersion("rliger"). |
dependencyVersion | Named character vector of version numbers, if any dependency library has a chance to be included by the function. A dependency might only be invoked under certain conditions, such as using an alternative algorithm, which a call does not actually reach, but it would still be included for this call. |
Examples
pbmc <- normalize(pbmc)
cmd <- commands(pbmc, "normalize")
cmd
ligerDataset class
Description
Object for storing dataset-specific information. Will be embedded within a higher level liger object.
Usage
rawData(x, dataset = NULL)
rawData(x, dataset = NULL, check = TRUE) <- value
normData(x, dataset = NULL)
normData(x, dataset = NULL, check = TRUE) <- value
scaleData(x, dataset = NULL)
scaleData(x, dataset = NULL, check = TRUE) <- value
scaleUnsharedData(x, dataset = NULL)
scaleUnsharedData(x, dataset = NULL, check = TRUE) <- value
getMatrix(x, slot = "rawData", dataset = NULL, returnList = FALSE)
h5fileInfo(x, info = NULL)
h5fileInfo(x, info = NULL, check = TRUE) <- value
getH5File(x, dataset = NULL)
## S4 method for signature 'ligerDataset,missing'
getH5File(x, dataset = NULL)
featureMeta(x, check = NULL)
featureMeta(x, check = TRUE) <- value
## S4 method for signature 'ligerDataset'
show(object)
## S4 method for signature 'ligerDataset'
dim(x)
## S4 method for signature 'ligerDataset'
dimnames(x)
## S4 replacement method for signature 'ligerDataset,list'
dimnames(x) <- value
## S4 method for signature 'ligerDataset'
rawData(x, dataset = NULL)
## S4 replacement method for signature 'ligerDataset,ANY,ANY,matrixLike_OR_NULL'
rawData(x, dataset = NULL, check = TRUE) <- value
## S4 replacement method for signature 'ligerDataset,ANY,ANY,H5D'
rawData(x, dataset = NULL, check = TRUE) <- value
## S4 method for signature 'ligerDataset'
normData(x, dataset = NULL)
## S4 replacement method for signature 'ligerDataset,ANY,ANY,matrixLike_OR_NULL'
normData(x, dataset = NULL, check = TRUE) <- value
## S4 replacement method for signature 'ligerDataset,ANY,ANY,H5D'
normData(x, dataset = NULL, check = TRUE) <- value
## S4 method for signature 'ligerDataset,missing'
scaleData(x, dataset = NULL)
## S4 replacement method for signature 'ligerDataset,ANY,ANY,matrixLike_OR_NULL'
scaleData(x, dataset = NULL, check = TRUE) <- value
## S4 replacement method for signature 'ligerDataset,ANY,ANY,H5D'
scaleData(x, dataset = NULL, check = TRUE) <- value
## S4 replacement method for signature 'ligerDataset,ANY,ANY,H5Group'
scaleData(x, dataset = NULL, check = TRUE) <- value
## S4 method for signature 'ligerDataset,missing'
scaleUnsharedData(x, dataset = NULL)
## S4 replacement method for signature 'ligerDataset,missing,ANY,matrixLike_OR_NULL'
scaleUnsharedData(x, check = TRUE) <- value
## S4 replacement method for signature 'ligerDataset,missing,ANY,H5D'
scaleUnsharedData(x, check = TRUE) <- value
## S4 replacement method for signature 'ligerDataset,missing,ANY,H5Group'
scaleUnsharedData(x, check = TRUE) <- value
## S4 method for signature 'ligerDataset,ANY,missing,missing'
getMatrix(
  x,
  slot = c("rawData", "normData", "scaleData", "scaleUnsharedData", "H", "V", "U",
    "A", "B"),
  dataset = NULL
)
## S4 method for signature 'ligerDataset'
h5fileInfo(x, info = NULL)
## S4 replacement method for signature 'ligerDataset'
h5fileInfo(x, info = NULL, check = TRUE) <- value
## S4 method for signature 'ligerDataset'
featureMeta(x, check = NULL)
## S4 replacement method for signature 'ligerDataset'
featureMeta(x, check = TRUE) <- value
## S3 method for class 'ligerDataset'
cbind(x, ..., deparse.level = 1)
## S4 method for signature 'ligerATACDataset,ANY,missing,missing'
getMatrix(
  x,
  slot = c("rawData", "normData", "scaleData", "scaleUnsharedData", "H", "V", "U",
    "A", "B", "rawPeak", "normPeak"),
  dataset = NULL
)
Arguments
x,object | A |
dataset | Not applicable for |
check | Whether to perform object validity check on setting new value. |
value | See detail sections for requirements |
slot | The slot name when using |
returnList | Not applicable for |
info | Name of the entry in |
... | See detailed sections for explanation. |
deparse.level | Not used here. |
Slots
rawData | Raw data. Feature by cell matrix. Most of the time, sparse matrix of integer numbers for RNA and ATAC data. |
normData | Normalized data. Feature by cell matrix. Sparse if the rawData it is normalized from is sparse. |
scaleData | Scaled data, usually with subset shared variable features, by cells. Most of the time sparse matrix of float numbers. This is the data used for iNMF factorization. |
scaleUnsharedData | Scaled data of variable features not shared with other datasets. This is the data used for UINMF factorization. |
varUnsharedFeatures | Variable features not shared with other datasets. |
V | iNMF output matrix holding the dataset specific gene loading of each factor. Feature by factor matrix. |
A | Online iNMF intermediate product matrix. |
B | Online iNMF intermediate product matrix. |
H | iNMF output matrix holding the factor loading of each cell. Factor by cell matrix. |
U | UINMF output matrix holding the unshared variable gene loading of each factor. Feature by factor matrix. |
h5fileInfo | list of meta information of HDF5 file used for constructing the object. |
featureMeta | Feature metadata, DataFrame object. |
colnames | Character vector of unique cell identifiers. |
rownames | Character vector of unique feature names. |
Matrix access
For a ligerDataset object, rawData(), normData(), scaleData() and scaleUnsharedData() methods are exported for users to access the corresponding feature expression matrix. Replacement methods are also available to modify the slots.
For other matrices, such as the H and V, which are dataset specific, please use the getMatrix() method and specify the slot name. Directly accessing a slot with @ is generally not recommended.
H5 file and information access
A ligerDataset object has a slot called h5fileInfo, which is a list object. The first element is called $H5File, which is an H5File class object and is the connection to the input file. The second element is $filename, which stores the absolute path of the H5 file on the current machine. The third element, $formatType, stores the name of the preset being used, if applicable. The remaining keys pair with paths in the H5 file that point to specific data for constructing a feature expression matrix.
h5fileInfo() method accesses the list described above and simply retrieves the corresponding value. When info = NULL, it returns the whole list. When length(info) == 1, it returns the requested list value. When more than one entry is requested, it returns a subset list.
The replacement method modifies the list elements and the corresponding slot value (if applicable) at the same time. For example, running h5fileInfo(obj, "rawData") <- newPath not only updates the list, but also updates the rawData slot with the H5D class data at "newPath" in the H5File object.
getH5File() is a wrapper and is equivalent to h5fileInfo(obj, "H5File").
Feature metadata access
A slot featureMeta is included for each ligerDataset object. This slot requires a DataFrame-class object, which is the same class as the cellMeta slot of a liger object. However, the associated S4 methods only include access to the whole table for now. Access to individual variables works the same way as a data.frame operation. For example, featureMeta(ligerD)$nCell or featureMeta(ligerD)[varFeatures(ligerObj), "gene_var"].
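Since only whole-table access is provided, individual columns are read and written with ordinary data.frame-style syntax on the returned table. The sketch below assumes the bundled example object pbmc with a dataset named "ctrl"; the column "nameLength" is made up for illustration.

```r
library(rliger)

ctrl <- dataset(pbmc, "ctrl")

# Whole-table access; one row per feature of the dataset
fm <- featureMeta(ctrl)
class(fm)
nrow(fm) == nrow(ctrl)

# Add a made-up column through the replacement method
featureMeta(ctrl)$nameLength <- nchar(rownames(ctrl))

# Read it back, for all features or for a few selected by name
head(featureMeta(ctrl)$nameLength)
featureMeta(ctrl)[rownames(ctrl)[1:3], "nameLength"]
```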
Dimensionality
For a ligerDataset object, the column orientation is assigned for cells and rows are for features. Therefore, for ligerDataset objects, dim() returns a numeric vector of two numbers, which are the number of features and the number of cells. dimnames() returns a list of two character vectors, which are the feature names and the cell barcodes.
For direct call of the dimnames<- method, value should be a list with a character vector of feature names as the first element and cell identifiers as the second element. For the colnames<- method, value should be the character vector of cell identifiers. For the rownames<- method, value should be the character vector of feature names.
Subsetting
For more detail on subsetting a liger object or a ligerDataset object, please check out subsetLiger and subsetLigerDataset. Here, we set the S3 method "single-bracket" [ as a quick wrapper to subset a ligerDataset object. i and j serve as the feature and cell subscripts, respectively, which can be any valid index referring to the available features and cells in a dataset. ... arguments are passed to subsetLigerDataset so that advanced options are allowed.
Concatenate ligerDataset
cbind() method is implemented for concatenating ligerDataset objects by cells. When applied, all feature expression matrices will be merged, taking the union of all features for the rows.
Examples
ctrl <- dataset(pbmc, "ctrl")
# Methods for base generics
ctrl
print(ctrl)
dim(ctrl)
ncol(ctrl)
nrow(ctrl)
colnames(ctrl)[1:5]
rownames(ctrl)[1:5]
ctrl[1:5, 1:5]
# rliger generics
## raw data
m <- rawData(ctrl)
class(m)
dim(m)
## normalized data
pbmc <- normalize(pbmc)
ctrl <- dataset(pbmc, "ctrl")
m <- normData(ctrl)
class(m)
dim(m)
## scaled data
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
ctrl <- dataset(pbmc, "ctrl")
m <- scaleData(ctrl)
class(m)
dim(m)
n <- scaleData(pbmc, "ctrl")
identical(m, n)
## Any other matrices
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- runOnlineINMF(pbmc, k = 20, minibatchSize = 100)
    ctrl <- dataset(pbmc, "ctrl")
    V <- getMatrix(ctrl, "V")
    V[1:5, 1:5]
    Vs <- getMatrix(pbmc, "V")
    length(Vs)
    names(Vs)
    identical(Vs$ctrl, V)
}
Subclass of ligerDataset for Methylation modality
Description
Inherits from ligerDataset class. Contained slots can be referred with the link. scaleNotCenter applied on datasets of this class will automatically reverse the normalized data instead of scaling the variable features.
Subclass of ligerDataset for RNA modality
Description
Inherits from ligerDataset class. Contained slots can be referred with the link. This subclass does not differ from the default ligerDataset class except in the class name.
Subclass of ligerDataset for Spatial modality
Description
Inherits from ligerDataset class. Contained slots can be referred with the link.
Slots
coordinate | dense matrix |
Convert between liger and Seurat object
Description
For converting a liger object to a Seurat object, the rawData, normData, and scaleData from each dataset, the cellMeta, H.norm and varFeatures slot will be included. Compatible with V4 and V5. It is not recommended to use this conversion if your liger object contains datasets from various modalities.
Usage
ligerToSeurat(
  object,
  assay = NULL,
  identByDataset = FALSE,
  merge = FALSE,
  nms = NULL,
  renormalize = NULL,
  use.liger.genes = NULL,
  by.dataset = identByDataset
)
Arguments
object | A liger object to be converted |
assay | Name of assay to store the data. Default |
identByDataset | Logical, whether to combine dataset variable and default cluster labeling to set the Idents. Default |
merge | Logical, whether to merge layers of different datasets into one.Not recommended. Default |
nms |
|
renormalize |
|
use.liger.genes |
|
by.dataset |
Value
Always returns Seurat object(s) of the latest version. By default a Seurat object with split layers, e.g. with layers like "counts.ctrl" and "counts.stim". If merge = TRUE, return a single Seurat object with layers for all datasets merged.
Examples
if (requireNamespace("SeuratObject", quietly = TRUE) &&
    requireNamespace("Seurat", quietly = TRUE)) {
    seu <- ligerToSeurat(pbmc)
}
Linking genes to putative regulatory elements
Description
Evaluate the relationships between pairs of genes and peaks based on specified distance metric. Usually used for inferring the correlation between gene expression and imputed peak counts for datasets without the modality originally (i.e. applied to imputeKNN result).
Usage
linkGenesAndPeaks(
  object,
  useDataset,
  pathToCoords,
  useGenes = NULL,
  method = c("spearman", "pearson", "kendall"),
  alpha = 0.05,
  verbose = getOption("ligerVerbose", TRUE),
  path_to_coords = pathToCoords,
  genes.list = useGenes,
  dist = method
)
Arguments
object | A liger object, with datasets that is of ligerATACDataset class in the |
useDataset | Name of one dataset, with both normalized gene expression and normalized peak counts available. |
pathToCoords | Path to the gene coordinates file, usually a BED file. |
useGenes | Character vector of gene names to be tested. Default |
method | Choose the type of correlation to calculate, from |
alpha | Numeric, significance threshold for correlation p-value. Peak-gene correlations with p-values below this threshold are considered significant. Default |
verbose | Logical. Whether to show information of the progress. Default |
path_to_coords,genes.list,dist | Deprecated. See Usage section for replacement. |
Value
A sparse matrix with peak names as rows and gene names as columns, with each element indicating the correlation between peak i and gene j, 0 if the gene and peak are not significantly linked.
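Because non-significant pairs are stored as zeros, the linked pairs are exactly the non-zero entries of the returned matrix. The sketch below shows one way to list them, using a small hand-made matrix as a stand-in for real output; the peak names, gene names and correlation values are invented, and only the Matrix package is needed.

```r
library(Matrix)

# Toy stand-in for the returned matrix: 3 peaks (rows) by 2 genes (columns)
corr <- sparseMatrix(
  i = c(1, 3, 3),
  j = c(1, 1, 2),
  x = c(0.42, -0.31, 0.55),
  dims = c(3, 2),
  dimnames = list(
    c("chr1:100-200", "chr1:500-600", "chr2:50-150"),
    c("GENE_A", "GENE_B")
  )
)

# Triplet view (row index, column index, value) of the non-zero entries
nz <- summary(corr)

# Long-format table of significantly linked peak-gene pairs
links <- data.frame(
  peak = rownames(corr)[nz$i],
  gene = colnames(corr)[nz$j],
  correlation = nz$x
)

# Strongest links first, regardless of sign
links <- links[order(-abs(links$correlation)), ]
links
```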
See Also
Examples
if (requireNamespace("RcppPlanc", quietly = TRUE) &&
    requireNamespace("GenomicRanges", quietly = TRUE) &&
    requireNamespace("IRanges", quietly = TRUE) &&
    requireNamespace("psych", quietly = TRUE)) {
    bmmc <- normalize(bmmc)
    bmmc <- selectGenes(bmmc)
    bmmc <- scaleNotCenter(bmmc)
    bmmc <- runINMF(bmmc, miniBatchSize = 100)
    bmmc <- alignFactors(bmmc)
    bmmc <- normalizePeak(bmmc)
    bmmc <- imputeKNN(bmmc, reference = "atac", queries = "rna")
    corr <- linkGenesAndPeaks(
        bmmc, useDataset = "rna",
        pathToCoords = system.file("extdata/hg19_genes.bed", package = "rliger")
    )
}
Louvain algorithm for community detection
Description
After quantile normalization, users can additionally run the Louvain algorithm for community detection, which is widely used in single-cell analysis and excels at merging small clusters into broad cell classes.
Arguments
object |
|
k | The maximum number of nearest neighbours to compute. (default 20) |
resolution | Value of the resolution parameter, use a value above (below) 1.0 if you want to obtain a larger (smaller) number of communities. (default 1.0) |
prune | Sets the cutoff for acceptable Jaccard index when computing the neighborhood overlap for the SNN construction. Any edges with values less than or equal to this will be set to 0 and removed from the SNN graph. Essentially sets the stringency of pruning (0: no pruning, 1: prune everything). (default 1/15) |
eps | The error bound of the nearest neighbor search. (default 0.1) |
nRandomStarts | Number of random starts. (default 10) |
nIterations | Maximal number of iterations per random start. (default 100) |
random.seed | Seed of the random number generator. (default 1) |
verbose | Print messages (TRUE by default) |
dims.use | Indices of factors to use for clustering. Default |
Value
object with refined cluster assignment updated in "louvain_cluster" variable in cellMeta slot. Can be fetched with object$louvain_cluster
See Also
Fast calculation of feature count matrix
Description
Fast calculation of feature count matrix
Usage
makeFeatureMatrix(bedmat, barcodes)
Arguments
bedmat | A feature count list generated by bedmap |
barcodes | A list of barcodes |
Value
A feature count matrix with features as rows and barcodes as columns
Examples
## Not run:
gene.counts <- makeFeatureMatrix(genes.bc, barcodes)
promoter.counts <- makeFeatureMatrix(promoters.bc, barcodes)
sample <- gene.counts + promoter.counts
## End(Not run)
Deprecated functions in package rliger.
Description
The functions listed below are deprecated and will be defunct in the near future. When possible, alternative functions with similar functionality or a replacement are also mentioned. Help pages for deprecated functions are available at help("<function>-deprecated").
Usage
makeInteractTrack( corr.mat, path_to_coords, genes.list = NULL, output_path = getwd())louvainCluster( object, resolution = 1, k = 20, prune = 1/15, eps = 0.1, nRandomStarts = 10, nIterations = 100, random.seed = 1, verbose = getOption("ligerVerbose", TRUE), dims.use = NULL)optimizeALS( object, k, lambda = 5, thresh = NULL, max.iters = 30, nrep = 1, H.init = NULL, W.init = NULL, V.init = NULL, use.unshared = FALSE, rand.seed = 1, print.obj = NULL, verbose = TRUE, ...)online_iNMF( object, X_new = NULL, projection = FALSE, W.init = NULL, V.init = NULL, H.init = NULL, A.init = NULL, B.init = NULL, k = 20, lambda = 5, max.epochs = 5, miniBatch_max_iters = 1, miniBatch_size = 5000, h5_chunk_size = 1000, seed = 123, verbose = TRUE)quantile_norm( object, quantiles = 50, ref_dataset = NULL, min_cells = 20, knn_k = 20, dims.use = NULL, do.center = FALSE, max_sample = 1000, eps = 0.9, refine.knn = TRUE, clusterName = "H.norm_cluster", rand.seed = 1, verbose = getOption("ligerVerbose", TRUE))makeRiverplot( object, cluster1, cluster2, cluster_consensus = NULL, min.frac = 0.05, min.cells = 10, river.yscale = 1, river.lty = 0, river.node_margin = 0.1, label.cex = 1, label.col = "black", lab.srt = 0, river.usr = NULL, node.order = "auto")makeInteractTrack
For makeInteractTrack, use exportInteractTrack.
louvainCluster
For louvainCluster, use runCluster(method = "louvain") as the replacement, although runCluster with the default method = "leiden" is recommended.
optimizeALS
For optimizeALS, use runIntegration or runINMF. For the case of optimizeALS(use.unshared = TRUE), use runIntegration with method = "UINMF" or runUINMF instead.
online_iNMF
For online_iNMF, use runIntegration with method = "online" or runOnlineINMF.
quantile_norm
For quantile_norm, use quantileNorm.
makeRiverplot
For makeRiverplot, use plotSankey as the replacement.
Export predicted gene-pair interaction
Description
Export the predicted gene-pair interactions calculated by upstream function linkGenesAndPeaks into an Interact Track file which is compatible with the UCSC Genome Browser.
Arguments
corr.mat | A sparse matrix of correlation with peak names as rows and gene names as columns. |
path_to_coords | Path to the gene coordinates file. |
genes.list | Character vector of gene names to be exported. Default |
output_path | Path of filename where the output file will be stored. If a folder, a file named |
Value
No return value. A file located at outputPath will be created.
See Also
rliger-deprecated, exportInteractTrack
Generate a river (Sankey) plot
Description
Creates a riverplot to show how separate cluster assignments from two datasets map onto a joint clustering. The joint clustering is by default the object clustering, but an external one can also be passed in. Uses the riverplot package to construct riverplot object and then plot.
Arguments
object |
|
cluster1 | Cluster assignments for dataset 1. Note that cluster names should be distinct across datasets. |
cluster2 | Cluster assignments for dataset 2. Note that cluster names should be distinct across datasets. |
cluster_consensus | Optional external consensus clustering (to use instead of object clusters) |
min.frac | Minimum fraction of cluster for edge to be shown (default 0.05). |
min.cells | Minimum number of cells for edge to be shown (default 10). |
river.yscale | y-scale to pass to riverplot – scales the edge with values by this factor, can be used to squeeze vertically (default 1). |
river.lty | Line style to pass to riverplot (default 0). |
river.node_margin | Node_margin to pass to riverplot – how much vertical space to keep between the nodes (default 0.1). |
label.cex | Size of text labels (default 1). |
label.col | Color of text labels (default "black"). |
lab.srt | Angle of text labels (default 0). |
river.usr | Coordinates at which to draw the plot in form (x0, x1, y0, y1). |
node.order | Order of clusters in each set (list with three vectors of ordinal numbers). By default will try to automatically order them appropriately. |
Value
object with refined cluster assignment updated in "louvain_cluster" variable in cellMeta slot. Can be fetched with object$louvain_cluster
See Also
Create new variable from categories in cellMeta
Description
Designed for fast variable creation when a new variable is going to be created from an existing variable. For example, multiple samples can be mapped to the same study design condition, or clusters can be mapped to cell types.
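The mapping idea can be sketched in base R. This is only an illustration of remapping categories of a factor, not the rliger implementation: the helper name mapCategories is hypothetical, and keeping unmapped categories unchanged is an assumption of this sketch.

```r
# Hypothetical helper (not part of rliger): remap categories of a factor.
# Argument names in `...` are existing categories, values are the new ones.
mapCategories <- function(x, ...) {
  mapping <- c(...)                    # named character vector: old = "new"
  x <- as.character(x)
  hit <- x %in% names(mapping)
  x[hit] <- unname(mapping[x[hit]])    # assumption: unmapped categories stay as-is
  factor(x)
}

dataset <- factor(c("ctrl", "stim", "ctrl", "stim"))
modal <- mapCategories(dataset, ctrl = "rna", stim = "rna")
levels(modal)
```

Both original categories collapse into a single new level here, which mirrors the many-to-one use case described above.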
Usage
mapCellMeta(object, from, newTo = NULL, ...)
Arguments
object | A liger object. |
from | The name of the original variable to be mapped from. |
newTo | The name of the new variable to store the mapped result. Default |
... | Mapping criteria, argument names are original existing categories in the |
Value
When newTo = NULL, a factor object of the new variable. Otherwise, the input object with variable newTo updated in cellMeta(object).
Examples
pbmc <- mapCellMeta(pbmc, from = "dataset", newTo = "modal",
                    ctrl = "rna", stim = "rna")
Merge hdf5 files
Description
This function merges hdf5 files generated from different libraries (cell ranger by default) before they are preprocessed through the Liger pipeline.
Usage
mergeH5(
  file.list,
  library.names,
  new.filename,
  format.type = "10X",
  data.name = NULL,
  indices.name = NULL,
  indptr.name = NULL,
  genes.name = NULL,
  barcodes.name = NULL
)
Arguments
file.list | List of paths to hdf5 files. |
library.names | Vector of library names (corresponding to file.list) |
new.filename | String of new hdf5 file name after merging (default new.h5). |
format.type | String of HDF5 format (10X CellRanger by default). |
data.name | Path to the data values stored in HDF5 file. |
indices.name | Path to the indices of data points stored in HDF5 file. |
indptr.name | Path to the pointers stored in HDF5 file. |
genes.name | Path to the gene names stored in HDF5 file. |
barcodes.name | Path to the barcodes stored in HDF5 file. |
Value
Directly generates newly merged hdf5 file.
Examples
## Not run:
# For instance, we want to merge two datasets saved in HDF5 files (10X
# CellRanger)
# paths to datasets: "library1.h5", "library2.h5"
# dataset names: "lib1", "lib2"
# name for output HDF5 file: "merged.h5"
mergeH5(list("library1.h5", "library2.h5"), c("lib1", "lib2"), "merged.h5")
## End(Not run)
Merge matrices while keeping the union of rows
Description
mergeSparseAll takes in a list of DGEs, with genes as rows and cells as columns, and merges them into a single DGE. Also adds libraryNames to colnames from each DGE if they are expected to overlap (common with 10X barcodes). Values in the rawData or normData slot of a ligerDataset object can be processed with this.
For a list of dense matrices, usually the values in the scaleData slot of a ligerDataset object, please use mergeDenseAll, which works in the same way.
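A minimal base-R sketch of a union-of-rows merge may help picture the result. It uses small dense matrices for brevity, whereas mergeSparseAll operates on dgCMatrix input. The helper name mergeUnionRows and the "_" separator between library name and barcode are assumptions of this sketch, not the package's behavior.

```r
# Hypothetical helper (not part of rliger): merge gene-by-cell matrices on the
# union of their row names, filling genes missing from a matrix with zeros.
mergeUnionRows <- function(datalist, libraryNames = NULL) {
  allGenes <- unique(unlist(lapply(datalist, rownames)))
  padded <- lapply(seq_along(datalist), function(i) {
    m <- datalist[[i]]
    out <- matrix(0, nrow = length(allGenes), ncol = ncol(m),
                  dimnames = list(allGenes, colnames(m)))
    out[rownames(m), ] <- m            # genes absent from this matrix stay 0
    if (!is.null(libraryNames)) {
      # assumption: prefix and barcode are joined with "_"
      colnames(out) <- paste0(libraryNames[i], "_", colnames(out))
    }
    out
  })
  do.call(cbind, padded)
}

a <- matrix(1:4, nrow = 2, dimnames = list(c("g1", "g2"), c("c1", "c2")))
b <- matrix(5:8, nrow = 2, dimnames = list(c("g2", "g3"), c("c1", "c2")))
merged <- mergeUnionRows(list(a, b), libraryNames = c("lib1", "lib2"))
dim(merged)
```

The two inputs share the barcodes "c1" and "c2", so the library prefix is what keeps the merged column names unique.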
Usage
mergeSparseAll(
  datalist,
  libraryNames = NULL,
  mode = c("union", "intersection")
)

mergeDenseAll(datalist, libraryNames = NULL)
Arguments
datalist | List of dgCMatrix for |
libraryNames | Character vector to be added as the prefix for the barcodes in each matrix in |
mode | Whether to take the |
Value
dgCMatrix or matrix with all barcodes in datalist as columns and the union of genes in datalist as rows.
Examples
rawDataList <- getMatrix(pbmc, "rawData")
merged <- mergeSparseAll(rawDataList, libraryNames = names(pbmc))
Return preset modality of a ligerDataset object or that of all datasets in a liger object
Description
Return preset modality of a ligerDataset object or that of all datasets in a liger object
Usage
modalOf(object)
Arguments
object | a ligerDataset object or a liger object |
Value
A single character of modality setting value for ligerDataset object, or a named vector for liger object, where the names are dataset names.
Examples
modalOf(pbmc)
ctrl <- dataset(pbmc, "ctrl")
modalOf(ctrl)
ctrl.atac <- as.ligerDataset(ctrl, modal = "atac")
modalOf(ctrl.atac)
Normalize raw counts data
Description
Perform library size normalization on raw counts input. As for the preprocessing step of iNMF integration, by default we don't multiply the normalized values with a scale factor, nor do we take the log transformation. Applicable S3 methods can be found in Usage section.
normalizePeak is designed for datasets of "atac" modality, i.e. stored in ligerATACDataset. S3 method for various container object is not supported yet due to difference in architecture design.
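As a rough illustration of library size normalization, the base-R sketch below divides each cell (column) by its total count. It is not the package's implementation: the helper name normalizeCounts is hypothetical, and using log1p for the optional log step is an assumption, since the exact transformation is not stated here.

```r
# Hypothetical helper (not part of rliger): library-size normalization of a
# gene-by-cell count matrix.
normalizeCounts <- function(counts, log = FALSE, scaleFactor = NULL) {
  libSize <- colSums(counts)                    # total counts per cell
  normed <- sweep(counts, 2, libSize, "/")      # each column now sums to 1
  if (!is.null(scaleFactor)) normed <- normed * scaleFactor
  if (log) normed <- log1p(normed)              # assumption: log1p transform
  normed
}

counts <- matrix(c(1, 3, 0,
                   5, 5, 10),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("cellA", "cellB")))
normed <- normalizeCounts(counts)
colSums(normed)
```

With the defaults (no scale factor, no log), every column of the result sums to 1. Note that a cell with zero total counts would produce NaN values in this sketch.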
Usage
normalize(object, ...)

## S3 method for class 'matrix'
normalize(object, log = FALSE, scaleFactor = NULL, ...)

## S3 method for class 'dgCMatrix'
normalize(object, log = FALSE, scaleFactor = NULL, ...)

## S3 method for class 'DelayedArray'
normalize(
  object,
  log = FALSE,
  scaleFactor = NULL,
  chunk = getOption("ligerChunkSize", 20000),
  overwrite = FALSE,
  returnStats = FALSE,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'ligerDataset'
normalize(
  object,
  chunk = getOption("ligerChunkSize", 20000),
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'liger'
normalize(
  object,
  useDatasets = NULL,
  verbose = getOption("ligerVerbose", TRUE),
  format.type = NULL,
  remove.missing = NULL,
  ...
)

## S3 method for class 'Seurat'
normalize(object, assay = NULL, layer = "counts", save = "ligerNormData", ...)

normalizePeak(
  object,
  useDatasets = NULL,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
Arguments
object | liger object |
... | Arguments to be passed to S3 methods. The "liger" method calls the "ligerDataset" method, which then calls "dgCMatrix" method. |
log | Logical. Whether to do a |
scaleFactor | Numeric. Scale the normalized expression value by this factor before transformation. |
chunk | Integer. Maximum number of cells in each chunk when working on HDF5 file based ligerDataset. Default |
overwrite | Logical. When writing newly computed HDF5 array to a separate HDF5 file, whether to overwrite the existing file. Default |
returnStats | Logical. Used in LIGER internal workflow to allow capturing precalculated statistics for downstream use. Default |
verbose | Logical. Whether to show information of the progress. Default |
useDatasets | A character vector of the names, a numeric or logical vector of the index of the datasets to be normalized. Should specify ATACseq datasets when using |
format.type,remove.missing | Deprecated. The functionality of these is covered through other parts of the whole workflow and is no longer needed. Will be ignored if specified. |
assay | Name of assay to use. Default |
layer | Where the input raw counts should be from. Default |
save | For Seurat>=4.9.9, the name of layer to store normalized data. Default |
Value
Updated object.
dgCMatrix method - Returns processed dgCMatrix object
ligerDataset method - Updates the normData slot of the object
liger method - Updates the normData slot of chosen datasets
Seurat method - Adds a named layer in chosen assay (V5), or update the data slot of the chosen assay (<=V4)
normalizePeak - Updates the normPeak slot of chosen datasets.
Examples
pbmc <- normalize(pbmc)
Perform online iNMF on scaled datasets
Description
Please turn to runOnlineINMF or runIntegration.
Perform online integrative non-negative matrix factorization to represent multiple single-cell datasets in terms of H, W, and V matrices. It optimizes the iNMF objective function using online learning (non-negative least squares for H matrix, hierarchical alternating least squares for W and V matrices), where the number of factors is set by k. The function allows online learning in 3 scenarios: (1) fully observed datasets; (2) iterative refinement using continually arriving datasets; and (3) projection of new datasets without updating the existing factorization. All three scenarios require fixed memory independent of the number of cells.
For each dataset, this factorization produces an H matrix (cells by k), a V matrix (k by genes), and a shared W matrix (k by genes). The H matrices represent the cell factor loadings. W is identical among all datasets, as it represents the shared components of the metagenes across datasets. The V matrices represent the dataset-specific components of the metagenes.
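The shapes described above can be checked with a toy base-R sketch. Random values stand in for a fitted factorization, the dataset names and sizes are made up, and the reconstruction H %*% (W + V) follows the iNMF formulation of the cited Welch et al. (2019) paper rather than anything computed by the package here.

```r
# Toy shapes only; random non-negative values replace a real factorization.
set.seed(1)
k <- 3
nGenes <- 5
nCells <- c(ctrl = 4, stim = 6)              # hypothetical dataset sizes

W <- matrix(runif(k * nGenes), k, nGenes)    # shared metagenes, k by genes
H <- lapply(nCells, function(n) matrix(runif(n * k), n, k))           # cells by k
V <- lapply(nCells, function(n) matrix(runif(k * nGenes), k, nGenes)) # k by genes

# Each dataset is approximated by H_i %*% (W + V_i), a cells-by-genes matrix.
recon <- lapply(names(nCells), function(d) H[[d]] %*% (W + V[[d]]))
names(recon) <- names(nCells)
sapply(recon, dim)
```

Only W is shared between the two list elements, while each dataset carries its own H and V.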
Arguments
object |
|
X_new | List of new datasets for scenario 2 or scenario 3. Each list element should be the name of an HDF5 file. |
projection | Perform data integration by shared metagene (W) projection (scenario 3). (default FALSE) |
W.init | Optional initialization for W. (default NULL) |
V.init | Optional initialization for V (default NULL) |
H.init | Optional initialization for H (default NULL) |
A.init | Optional initialization for A (default NULL) |
B.init | Optional initialization for B (default NULL) |
k | Inner dimension of factorization (number of metagenes; default 20). A value in the range 20-50 works well for most analyses. |
lambda | Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). We recommend always using the default value except possibly for analyses with relatively small differences (biological replicates, male/female comparisons, etc.), in which case a lower value such as 1.0 may improve reconstruction quality (default 5.0). |
max.epochs | Maximum number of epochs (complete passes through the data). (default 5) |
miniBatch_max_iters | Maximum number of block coordinate descent (HALS algorithm) iterations to perform for each update of W and V (default 1). Changing this parameter is not recommended. |
miniBatch_size | Total number of cells in each minibatch (default 5000). This is a reasonable default, but a smaller value such as 1000 may be necessary for analyzing very small datasets. In general, minibatch size should be no larger than the number of cells in the smallest dataset. |
h5_chunk_size | Chunk size of input hdf5 files (default 1000). The chunk size should be no larger than the batch size. |
seed | Random seed to allow reproducible results (default 123). |
verbose | Print progress bar/messages (TRUE by default) |
Value
liger object with H, W, V, A and B slots set.
Perform iNMF on scaled datasets
Description
Please turn to runINMF or runIntegration.
Perform integrative non-negative matrix factorization to return factorized H, W, and V matrices. It optimizes the iNMF objective function using block coordinate descent (alternating non-negative least squares), where the number of factors is set by k.
For each dataset, this factorization produces an H matrix (cells by k), a V matrix (k by genes), and a shared W matrix (k by genes). The H matrices represent the cell factor loadings. W is held consistent among all datasets, as it represents the shared components of the metagenes across datasets. The V matrices represent the dataset-specific components of the metagenes.
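For reference, the iNMF objective as formulated in the cited Welch et al. (2019) paper can be written with the matrix orientation used above. Here X_i (cells by genes) denotes the scaled data of dataset i and d the number of datasets; both symbols are introduced for this note.

```latex
\min_{W \ge 0,\; V_i \ge 0,\; H_i \ge 0}\;
\sum_{i=1}^{d} \left\lVert X_i - H_i \left( W + V_i \right) \right\rVert_F^2
+ \lambda \sum_{i=1}^{d} \left\lVert H_i V_i \right\rVert_F^2
```

The second term is the penalty on dataset-specific effects, which is why larger values of lambda push the factorization toward the shared component W.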
Arguments
object |
|
k | Inner dimension of factorization (number of factors). Run suggestK to determine appropriate value; a general rule of thumb is that a higher k will be needed for datasets with more sub-structure. |
lambda | Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). Run suggestLambda to determine most appropriate value for balancing dataset alignment and agreement (default 5.0). |
thresh | Convergence threshold. Convergence occurs when |obj0-obj|/(mean(obj0,obj)) < thresh. (default 1e-6) |
max.iters | Maximum number of block coordinate descent iterations to perform (default 30). |
nrep | Number of restarts to perform (iNMF objective function is non-convex, so taking the best objective from multiple successive initializations is recommended). For easier reproducibility, this increments the random seed by 1 for each consecutive restart, so future factorizations of the same dataset can be run with one rep if necessary. (default 1) |
H.init | Initial values to use for H matrices. (default NULL) |
W.init | Initial values to use for W matrix (default NULL) |
V.init | Initial values to use for V matrices (default NULL) |
rand.seed | Random seed to allow reproducible results (default 1). |
print.obj | Print objective function values after convergence (default FALSE). |
verbose | Print progress bar/messages (TRUE by default) |
... | Arguments passed to other methods |
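The convergence rule given for thresh can be sketched as a small base-R check. The helper name hasConverged is hypothetical and this is not the package's code; it only spells out the stated criterion for two consecutive objective values.

```r
# Hypothetical helper: relative change of the objective between iterations.
hasConverged <- function(obj0, obj, thresh = 1e-6) {
  # In R, mean(obj0, obj) would treat `obj` as the `trim` argument of mean(),
  # so the two values are combined with c() before averaging.
  abs(obj0 - obj) / mean(c(obj0, obj)) < thresh
}

hasConverged(100, 99)         # relative change of about 1e-2
hasConverged(100, 99.99999)   # relative change of about 1e-7
```

Dividing by the mean of the two values makes the criterion independent of the overall scale of the objective.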
Value
liger object with H, W, and V slots set.
See Also
Perform factorization for new data
Description
Uses an efficient strategy for updating that takes advantage of the information in the existing factorization. Assumes that variable features are present in the new datasets. Two modes are supported (controlled by merge):
Append new data to existing datasets specified by useDatasets. Here the existing V matrices for the target datasets will directly be used as initialization, and new H matrices for the merged matrices will be initialized accordingly.
Set new data as new datasets. Initial V matrices for them will be copied from datasets specified by useDatasets, and new H matrices will be initialized accordingly.
Usage
optimizeNewData(
  object,
  dataNew,
  useDatasets,
  merge = TRUE,
  lambda = NULL,
  nIteration = 30,
  seed = 1,
  verbose = getOption("ligerVerbose"),
  new.data = dataNew,
  which.datasets = useDatasets,
  add.to.existing = merge,
  max.iters = nIteration,
  thresh = NULL
)
Arguments
object | A liger object. Should have integrative factorization performed e.g. ( |
dataNew | Named list of raw count matrices, genes by cells. |
useDatasets | Selection of datasets to append new data to if |
merge | Logical, whether to add the new data to existingdatasets or treat as totally new datasets (i.e. calculate new |
lambda | Numeric regularization parameter. By default |
nIteration | Number of block coordinate descent iterations to perform.Default |
seed | Random seed to allow reproducible results. Default |
verbose | Logical. Whether to show information of the progress. Default |
new.data,which.datasets,add.to.existing,max.iters | These arguments are now replaced by others and will be removed in the future. Please see usage for replacement. |
thresh | Deprecated. New implementation of iNMF does not require a threshold for convergence detection. Setting a large enough |
Value
object with W slot updated with the new W matrix, and the H and V slots of each ligerDataset object in the datasets slot updated with the new dataset specific H and V matrix, respectively.
See Also
runINMF, optimizeNewK, optimizeNewLambda
Examples
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
# Only running a few iterations for fast examples
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
  pbmc <- runINMF(pbmc, k = 20, nIteration = 2)
  # Create fake new data by increasing all non-zero count in "ctrl" by 1,
  # and make unique cell identifiers
  ctrl2 <- rawData(dataset(pbmc, "ctrl"))
  ctrl2@x <- ctrl2@x + 1
  colnames(ctrl2) <- paste0(colnames(ctrl2), 2)
  pbmcNew <- optimizeNewData(pbmc, dataNew = list(ctrl2 = ctrl2),
                             useDatasets = "ctrl", nIteration = 2)
}
Perform factorization for new value of k
Description
This uses an efficient strategy for updating that takes advantage of the information in the existing factorization. It is recommended mainly for values of kNew smaller than the current value (k, which is set when running runINMF), where it is more likely to speed up the factorization.
Usage
optimizeNewK(
  object,
  kNew,
  lambda = NULL,
  nIteration = 30,
  seed = 1,
  verbose = getOption("ligerVerbose"),
  k.new = kNew,
  max.iters = nIteration,
  rand.seed = seed,
  thresh = NULL
)
Arguments
object | A liger object. Should have integrative factorization performed e.g. ( |
kNew | Number of factors of factorization. |
lambda | Numeric regularization parameter. By default |
nIteration | Number of block coordinate descent iterations to perform. Default |
seed | Random seed to allow reproducible results. Default |
verbose | Logical. Whether to show information of the progress. Default |
k.new,max.iters,rand.seed | These arguments are now replaced by others and will be removed in the future. Please see usage for replacement. |
thresh | Deprecated. New implementation of iNMF does not require a threshold for convergence detection. Setting a large enough |
Value
object with W slot updated with the new W matrix, and the H and V slots of each ligerDataset object in the datasets slot updated with the new dataset specific H and V matrix, respectively.
See Also
runINMF, optimizeNewLambda, optimizeNewData
Examples
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
# Only running a few iterations for fast examples
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
  pbmc <- runINMF(pbmc, k = 20, nIteration = 2)
  pbmc <- optimizeNewK(pbmc, kNew = 25, nIteration = 2)
}
Perform factorization for new lambda value
Description
Uses an efficient strategy for updating that takes advantage of the information in the existing factorization; always uses previous k. Recommended mainly when re-optimizing for higher lambda and when new lambda value is significantly different; otherwise may not return optimal results.
Usage
optimizeNewLambda(
  object,
  lambdaNew,
  nIteration = 30,
  seed = 1,
  verbose = getOption("ligerVerbose"),
  new.lambda = lambdaNew,
  max.iters = nIteration,
  rand.seed = seed,
  thresh = NULL
)
Arguments
object | liger object. Should have integrative factorization (e.g. |
lambdaNew | Numeric regularization parameter. Larger values penalize dataset-specific effects more strongly. |
nIteration | Number of block coordinate descent iterations to perform. Default |
seed | Random seed to allow reproducible results. Default |
verbose | Logical. Whether to show information of the progress. Default |
new.lambda,max.iters,rand.seed | These arguments are now replaced by others and will be removed in the future. Please see usage for replacement. |
thresh | Deprecated. New implementation of iNMF does not require a threshold for convergence detection. Setting a large enough |
Value
Input object with optimized factorization values updated, including the W matrix in liger object, and H and V matrices in each ligerDataset object in the datasets slot.
See Also
runINMF, optimizeNewK, optimizeNewData
Examples
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
  # Only running a few iterations for fast examples
  pbmc <- runINMF(pbmc, k = 20, nIteration = 2)
  # pbmc <- optimizeNewLambda(pbmc, lambdaNew = 5.5, nIteration = 2)
}
Perform factorization for subset of data
Description
Uses an efficient strategy for updating that takes advantage of the information in the existing factorization.
Usage
optimizeSubset(
  object,
  clusterVar = NULL,
  useClusters = NULL,
  lambda = NULL,
  nIteration = 30,
  cellIdx = NULL,
  scaleDatasets = NULL,
  seed = 1,
  verbose = getOption("ligerVerbose"),
  cell.subset = cellIdx,
  cluster.subset = useClusters,
  max.iters = nIteration,
  datasets.scale = scaleDatasets,
  thresh = NULL
)
Arguments
object | liger object. Should have integrative factorization (e.g. |
clusterVar,useClusters | Together select the clusters to subset theobject conveniently. |
lambda | Numeric regularization parameter. By default |
nIteration | Maximum number of block coordinate descent iterations to perform. Default |
cellIdx | Valid index vector that applies to the whole object. See |
scaleDatasets | Names of datasets to re-scale after subsetting. Default |
seed | Random seed to allow reproducible results. Default |
verbose | Logical. Whether to show information of the progress. Default |
cell.subset,cluster.subset,max.iters,datasets.scale | These arguments are now replaced by others and will be removed in the future. Please see usage for replacement. |
thresh | Deprecated. New implementation of iNMF does not require a threshold for convergence detection. Setting a large enough |
Value
Subset object with factorization matrices optimized, including the W matrix in liger object, and H and V matrices in each ligerDataset object in the datasets slot. scaleData in the ligerDataset objects of datasets specified by scaleDatasets will also be updated to reflect the subset.
Examples
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
  # Only running a few iterations for fast examples
  pbmc <- runINMF(pbmc, k = 20, nIteration = 2)
  pbmc <- optimizeSubset(pbmc, cellIdx = sort(sample(ncol(pbmc), 200)),
                         nIteration = 2)
}
liger object of PBMC subsample data with Control and Stimulated datasets
Description
liger object of PBMC subsample data with Control and Stimulated datasets
Usage
pbmc
Format
liger object with two datasets named by "ctrl" and "stim".
Source
https://www.nature.com/articles/nbt.4042
References
Hyun Min Kang et al., Nature Biotechnology, 2018
liger object of PBMC subsample data with plotting information available
Description
This data was generated from data "pbmc" with default parameter integration pipeline: normalize, selectGenes, scaleNotCenter, runINMF, runCluster, runUMAP. To minimize the object size distributed with the package, rawData and scaleData were removed. Genes are downsampled to the top 50 variable genes, for smaller normData and W matrix.
Usage
pbmcPlot
Format
liger object with two datasets named by "ctrl" and "stim".
Source
https://www.nature.com/articles/nbt.4042
References
Hyun Min Kang et al., Nature Biotechnology, 2018
GSEA plot for specific gene set and factor using factorGSEA results
Description
GSEA plot for specific gene set and factor using factorGSEA results
Usage
## S3 method for class 'factorGSEA'
plot(
  x,
  y,
  geneSetName,
  useFactor,
  xTitleSize = 10,
  xTextSize = 8,
  yTitleSize = 10,
  yTextSize = 8,
  titleSize = 12,
  captionTextSize = 8,
  ESLineColor = "green",
  ESLinewidth = 1,
  hitsLineColor = "black",
  hitsLinewidth = 0.5,
  loadingBarColor = "grey",
  ...
)
Arguments
x | A |
y | Not used, for S3 method convention. |
geneSetName | A character string for the gene set name to plot. |
useFactor | A character string (e.g. 'Factor_1') or just numeric index for the factor name to plot. |
xTitleSize,yTitleSize | Numeric, size for x or y axis titles, respectively. Default |
xTextSize,yTextSize | Numeric, size for x or y axis text, respectively. Default |
titleSize | Numeric, size for the main plot title. Default |
captionTextSize | Numeric, size for the caption text. Default |
ESLineColor | Color for the enrichment score line. Default |
ESLinewidth | Numeric, line width for the enrichment score line. Default |
hitsLineColor | Color for the hits line. Default |
hitsLinewidth | Numeric, line width for the hits line. Default |
loadingBarColor | Color for the loading bar. Default |
... | Not used. |
Value
ggplot object
Create barcode-rank plot for each dataset
Description
This function ranks the total count of each cell within each dataset and makes a line plot. This function is simply for examining the input raw count data and does not infer any recommended cutoff for removing non-cell barcodes.
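The quantity being plotted can be sketched in base R for a single dataset. The toy count matrix and the variable names below are made up for illustration, and the plotting call is left as a comment because this sketch is only about the ranking step, not about how plotBarcodeRank draws its figure.

```r
# Toy gene-by-cell counts for one dataset (simulated, not real data).
set.seed(42)
counts <- matrix(rpois(5 * 8, lambda = 3), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:8)))

totals <- colSums(counts)                     # total count per cell
ranked <- sort(totals, decreasing = TRUE)     # rank 1 = largest total
rankDF <- data.frame(rank = seq_along(ranked), nUMI = ranked)
rankDF

# A barcode-rank curve plots nUMI against rank, e.g.
# plot(rankDF$rank, rankDF$nUMI, type = "l")
```

Repeating the same steps per dataset gives one curve for each, matching the list of plots returned by the function.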
Usage
plotBarcodeRank(object, ...)
Arguments
object | A liger object. |
... | Arguments passed on to
|
Value
A list object of ggplot for each dataset
Examples
plotBarcodeRank(pbmc)
Generate violin/box plot(s) using liger object
Description
This function allows for using available cell metadata, feature expression or factor loading to generate violin plots, and grouping the data with available categorical cell metadata. Available categorical cell metadata can be used to form the color annotation. When it is different from the grouping, it forms a nested grouping. Multiple y-axis variables are allowed from the same specification of slot, and this returns a list of violin plots, one for each. Users can further split the plot(s) by grouping on cells (e.g. datasets).
Usage
plotCellViolin(
  object,
  y,
  groupBy = NULL,
  slot = c("cellMeta", "rawData", "normData", "scaleData", "H.norm", "H"),
  yFunc = NULL,
  cellIdx = NULL,
  colorBy = NULL,
  splitBy = NULL,
  titles = NULL,
  ...
)
Arguments
object | liger object |
y | Available variable name in |
groupBy,colorBy | Available variable name in |
slot | Choose the slot to find the |
yFunc | A function object that expects a vector/factor/data.frame retrieved by |
cellIdx | Character, logical or numeric index that can subscript cells. Missing or |
splitBy | Character vector of categorical variable names in |
titles | Title text. A character scalar or a character vector with as many elements as multiple plots are supposed to be generated. Default |
... | Arguments passed on to
|
Details
Available options for slot include: "cellMeta", "rawData", "normData", "scaleData", "H.norm" and "H". When "rawData", "normData" or "scaleData", y has to be a character vector of feature names. When "H.norm" or "H", colorBy can be any valid index to select one factor of interest. Note that a character index follows the "Factor_[k]" format, replacing [k] with an integer.
When "cellMeta", y has to be an available column name in the table. Note that, for y as well as groupBy, colorBy and splitBy, since a matrix object is feasible in the cellMeta table, using a column (e.g. named "column1") in a certain matrix (e.g. named "matrixVar") should follow the syntax of "matrixVar.column1". When the matrix does not have a "colname" attribute, the subscription goes with "matrixVar.V1", "matrixVar.V2", etc. These are based on the nature of the as.data.frame method on a DataFrame object.
groupBy is basically sent to ggplot2::aes(x), while colorBy is for the "colour" aesthetics. Specifying colorBy without groupBy visually creates grouping but there will not be varying values on the x-axis, so boxWidth will be forced to the same value as violinWidth under this situation.
Value
A ggplot object when a single plot is intended. A list of ggplot objects, when multiple y variables and/or splitBy are set. When plotly = TRUE, all ggplot objects become plotly (htmlwidget) objects.
Examples
plotCellViolin(pbmcPlot, y = "nUMI", groupBy = "dataset", slot = "cellMeta")
plotCellViolin(pbmcPlot, y = "nUMI", groupBy = "leiden_cluster",
               slot = "cellMeta", splitBy = "dataset",
               colorBy = "leiden_cluster",
               box = TRUE, dot = TRUE,
               ylab = "Total counts per cell")
plotCellViolin(pbmcPlot, y = "S100A8", slot = "normData",
               yFunc = function(x) log2(10000*x + 1),
               groupBy = "dataset", colorBy = "leiden_cluster",
               box = TRUE, ylab = "S100A8 Expression")
Make dot plot of factor loading in cell groups
Description
This function produces dot plots. Each column represents a group of cells specified by groupBy, each row is a factor specified by useDims. The color of dots reflects the mean factor loading of specified factors in each cell group and the size reflects the percentage of cells that have loadings of a factor in a group. We utilize ComplexHeatmap for simplified management of adding annotation and slicing subplots. This was inspired by the implementation in scCustomize.
Usage
plotClusterFactorDot(
  object,
  groupBy = NULL,
  useDims = NULL,
  useRaw = FALSE,
  splitBy = NULL,
  factorScaleFunc = NULL,
  cellIdx = NULL,
  legendColorTitle = "Mean Factor\nLoading",
  legendSizeTitle = "Percent\nLoaded",
  viridisOption = "viridis",
  verbose = FALSE,
  ...
)
Arguments
object | A liger object |
groupBy | The names of the columns in |
useDims | A Numeric vector to specify exact factors of interest. Default |
useRaw | Whether to use un-aligned cell factor loadings ( |
splitBy | The names of the columns in |
factorScaleFunc | A function object applied to factor loading matrix for scaling the value for better visualization. Default |
cellIdx | Valid cell subscription. See |
legendColorTitle | Title for colorbar legend. Default |
legendSizeTitle | Title for size legend. Default |
viridisOption | Name of available viridis palette. See |
verbose | Logical. Whether to show progress information. Mainly when subsetting data. Default |
... | Additional theme setting arguments passed to |
Details
For ..., please notice that arguments colorMat, sizeMat, featureAnnDF, cellSplitVar, cellLabels and viridisOption from .complexHeatmapDotPlot are already occupied by this function internally. A lot of arguments from Heatmap have also been occupied: matrix, name, heatmap_legend_param, rect_gp, col, layer_fun, km, border, border_gp, column_gap, row_gap, cluster_row_slices, cluster_rows, row_title_gp, row_names_gp, row_split, row_labels, cluster_column_slices, cluster_columns, column_split, column_title_gp, column_title, column_labels, column_names_gp, top_annotation.
Value
HeatmapList object.
Examples
plotClusterFactorDot(pbmcPlot)
Make dot plot of gene expression in cell groups
Description
This function produces dot plots. Each column represents a group of cells specified by groupBy, each row is a gene specified by features. The color of dots reflects the mean normalized expression of specified genes in each cell group and the size reflects the percentage of cells expressing each gene in a group. We utilize ComplexHeatmap for simplified management of adding annotation and slicing subplots. This was inspired by the implementation in scCustomize.
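The two statistics behind each dot can be sketched in base R on a toy matrix. The data and variable names are made up, and counting a cell as "expressing" a gene when its value is greater than zero is an assumption of this sketch rather than a documented rule.

```r
# Toy gene-by-cell expression matrix and a cluster label per cell.
expr <- matrix(c(0, 2, 4, 0,
                 1, 1, 0, 0),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("geneA", "geneB"), paste0("cell", 1:4)))
group <- factor(c("c1", "c1", "c2", "c2"))

# Rows are genes, columns are cell groups, as in the dot plot layout.
meanExpr <- t(apply(expr, 1, function(x) tapply(x, group, mean)))            # dot color
pctExpr <- t(apply(expr, 1, function(x) tapply(x > 0, group, mean))) * 100   # dot size
meanExpr
pctExpr
```

In the function itself, the mean would additionally pass through featureScaleFunc before being mapped to color.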
Usage
plotClusterGeneDot( object, features, groupBy = NULL, splitBy = NULL, featureScaleFunc = function(x) log2(10000 * x + 1), cellIdx = NULL, legendColorTitle = "Mean\nExpression", legendSizeTitle = "Percent\nExpressed", viridisOption = "magma", verbose = FALSE, ...)Arguments
object | Aliger object |
features | Use a character vector of gene names to make plain dot plotlike a heatmap. Use a data.frame where the first column is gene names andsecond column is a grouping variable (e.g. subset |
groupBy | The names of the columns in |
splitBy | The names of the columns in |
featureScaleFunc | A function object applied to normalized data forscaling the value for better visualization. Default |
cellIdx | Valid cell subscription. See |
legendColorTitle | Title for colorbar legend. Default |
legendSizeTitle | Title for size legend. Default |
viridisOption | Name of available viridis palette. See |
verbose | Logical. Whether to show progress information. Mainly when subsetting data. Default |
... | Additional theme setting arguments passed to |
Details
For ..., please note that the arguments colorMat, sizeMat, featureAnnDF, cellSplitVar, cellLabels and viridisOption of .complexHeatmapDotPlot are already occupied by this function internally. Many arguments of Heatmap are also occupied: matrix, name, heatmap_legend_param, rect_gp, col, layer_fun, km, border, border_gp, column_gap, row_gap, cluster_row_slices, cluster_rows, row_title_gp, row_names_gp, row_split, row_labels, cluster_column_slices, cluster_columns, column_split, column_title_gp, column_title, column_labels, column_names_gp, top_annotation.
Value
HeatmapList object.
Examples
# Use character vector of genes
features <- varFeatures(pbmcPlot)[1:10]
plotClusterGeneDot(pbmcPlot, features = features)
# Use data.frame with grouping information, with more tweaks on the plot
features <- data.frame(features, rep(letters[1:5], 2))
plotClusterGeneDot(pbmcPlot, features = features, clusterFeature = TRUE, clusterCell = TRUE, maxDotSize = 6)
Create violin plot for multiple genes grouped by clusters
Description
Make violin plots for each given gene, grouped by cluster variable and stacked along the y axis.
Usage
plotClusterGeneViolin(object, gene, groupBy = NULL, colorBy = NULL, box = FALSE, boxAlpha = 0.1, yFunc = function(x) log1p(x * 10000), showLegend = !is.null(colorBy), xlabAngle = 40, ...)
Arguments
object | A liger object. |
gene | Character vector of gene names. |
groupBy | The name of an available categorical variable in |
colorBy | The name of another categorical variable in |
box | Logical, whether to add boxplot. Default |
boxAlpha | Numeric, transparency of boxplot. Default |
yFunc | Function to transform the y-axis. Default is |
showLegend | Whether to show the legend. Default |
xlabAngle | Numeric, counter-clockwise rotation angle in degrees of X axis label text. Default |
... | Arguments passed on to
|
Details
If xlab needs to be set, set xlabAngle at the same time. This is because R's argument matching partially matches xlab to the main function argument xlabAngle before anything is passed on to the ... arguments.
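The partial matching behaviour can be reproduced in base R without the package. In the sketch below, `demo` is a hypothetical stand-in function that only mimics the relevant part of the signature (a named `xlabAngle` argument placed before `...`); it is not part of rliger.

```r
# Hypothetical stand-in: `xlabAngle` is a formal argument that precedes `...`.
demo <- function(object, xlabAngle = 40, ...) {
  list(xlabAngle = xlabAngle, dots = list(...))
}

# Only `xlab` supplied: R partially matches it to `xlabAngle`,
# so nothing reaches `...`.
onlyXlab <- demo(1, xlab = "Cluster")

# Both supplied: `xlabAngle` is matched exactly first, which leaves
# `xlab` free to fall through into `...`.
both <- demo(1, xlab = "Cluster", xlabAngle = 40)
```

In the first call the label text silently ends up as the rotation angle, which is why both arguments should be given together.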
Value
A ggplot object.
Examples
plotClusterGeneViolin(pbmcPlot, varFeatures(pbmcPlot)[1:10])
Create density plot based on specified coordinates
Description
This function shows the cell density in 2D dimensionality reduction coordinates. Density is shown with coloring and contour lines. A scatter plot of the dimensionality reduction is added as well. The density plot can be split by categorical variables (e.g. "dataset"), while the scatter plot will always be shown for all cells in subplots as a reference of the global structure.
Usage
plotDensityDimRed(object, useDimRed = NULL, splitBy = NULL, combinePlot = TRUE, minDensity = 8, contour = TRUE, contourLineWidth = 0.3, contourBins = 5, dot = TRUE, dotColor = "grey", dotSize = 0.6, dotAlpha = 0.3, dotRaster = NULL, title = NULL, legendFillTitle = "Density", colorPalette = "magma", colorDirection = -1, ...)
Arguments
object | A liger object |
useDimRed | Name of the variable storing dimensionality reduction result in the |
splitBy | Character vector of categorical variable names in |
combinePlot | Logical, whether to utilize |
minDensity | A positive number to filter out low density region colored on plot. Default |
contour | Logical, whether to draw the contour line. Default |
contourLineWidth | Numeric, the width of the contour line. Default |
contourBins | Number of contour bins. Higher value generates more contour lines. Default |
dot | Logical, whether to add scatter plot of all cells, even when density plot is split with |
dotColor,dotSize,dotAlpha | Numeric, controls the appearance of all dots. Default |
dotRaster | Logical, whether to rasterize the scatter plot. Default |
title | Text of main title of the plots. Default |
legendFillTitle | Text of legend title. Default |
colorPalette | Name of the option for |
colorDirection | Color gradient direction for |
... | Arguments passed on to
|
Value
A ggplot object when only one plot is generated. A ggplot object combined with plot_grid when multiple plots and combinePlot = TRUE. A list of ggplot objects when multiple plots and combinePlot = FALSE.
Examples
# Example dataset has small number of cells, thus cutoff adjusted.
plotDensityDimRed(pbmcPlot, minDensity = 1)
Generate scatter plot(s) using liger object
Description
This function allows for using available cell metadata to build the x-/y-axis. Available per-cell data can be used to form the color/shape annotation, including cell metadata, raw or processed gene expression, and unnormalized or aligned factor loading. Multiple coloring variables are allowed from the same specification of slot, and this returns a list of plots with different coloring values. Users can further split the plot(s) by grouping on cells (e.g. datasets).
Usage
plotDimRed(object, colorBy = NULL, useDimRed = NULL, slot = c("cellMeta", "rawData", "normData", "scaleData", "H.norm", "H", "normPeak", "rawPeak"), colorByFunc = NULL, cellIdx = NULL, splitBy = NULL, shapeBy = NULL, titles = NULL, ...)
plotClusterDimRed(object, useCluster = NULL, useDimRed = NULL, ...)
plotDatasetDimRed(object, useDimRed = NULL, ...)
plotByDatasetAndCluster(object, useDimRed = NULL, useCluster = NULL, combinePlot = TRUE, ...)
plotGeneDimRed(object, features, useDimRed = NULL, log = TRUE, scaleFactor = 10000, zeroAsNA = TRUE, colorPalette = "C", ...)
plotPeakDimRed(object, features, useDimRed = NULL, log = TRUE, scaleFactor = 10000, zeroAsNA = TRUE, colorPalette = "C", ...)
plotFactorDimRed(object, factors, useDimRed = NULL, trimHigh = 0.03, zeroAsNA = TRUE, colorPalette = "D", ...)
Arguments
object | A liger object. |
colorBy | Available variable name in specified |
useDimRed | Name of the variable storing dimensionality reduction result in the |
slot | Choose the slot to find the |
colorByFunc | Default |
cellIdx | Character, logical or numeric index that can subscribe cells. Missing or |
splitBy | Character vector of categorical variable names in |
shapeBy | Available variable name in |
titles | Title text. A character scalar or a character vector with as many elements as there are plots to be generated. Default |
... | Arguments passed on to
|
useCluster | Name of variable in |
combinePlot | Logical, whether to utilize |
features,factors | Name of genes or index of factors that need to be visualized. |
log | Logical. Whether to log transform the normalized expression of genes. Default |
scaleFactor | Number to be multiplied with the normalized expression of genes before log transformation. Default |
zeroAsNA | Logical, whether to swap all zero values to |
colorPalette | Name of viridis palette. See |
trimHigh | Number for highest cut-off to limit the outliers. Factor loading above this value will all be trimmed to this value. Default |
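As an illustration of what trimHigh and zeroAsNA describe, the following base-R sketch caps a toy vector of factor loadings at a cut-off and then swaps zeros to NA. The vector `loading` is made up, and the order of the two steps is an assumption rather than the package's documented behaviour.

```r
# Toy factor loadings for five cells (made-up values).
loading <- c(0, 0.01, 0.02, 0.05, 0.2)

# Cap every value above the cut-off at the cut-off itself.
trimHigh <- 0.03
trimmed <- pmin(loading, trimHigh)

# Swap exact zeros to NA, which is what `zeroAsNA = TRUE` describes.
trimmed[trimmed == 0] <- NA
```

After trimming, the two outlying cells share the same maximum value, so they no longer stretch the color scale.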
Details
Available options for slot include: "cellMeta", "rawData", "normData", "scaleData", "H.norm" and "H". When "rawData", "normData" or "scaleData", colorBy has to be a character vector of feature names. When "H.norm" or "H", colorBy can be any valid index to select one factor of interest. Note that a character index follows the "Factor_[k]" format, replacing [k] with an integer.
When "cellMeta", colorBy has to be an available column name in the table. Note that, for colorBy as well as x, y, shapeBy and splitBy, since a matrix object is feasible in the cellMeta table, using a column (e.g. named "column1") in a certain matrix (e.g. named "matrixVar") should follow the syntax of "matrixVar.column1". When the matrix does not have a "colname" attribute, the subscription goes with "matrixVar.V1", "matrixVar.V2", etc. Use "UMAP.1", "UMAP.2", "TSNE.1" or "TSNE.2" for the 2D embeddings generated with the rliger package. These are based on the nature of the as.data.frame method on a DataFrame object.
Value
A ggplot object when a single plot is intended. A list of ggplot objects when multiple colorBy variables and/or splitBy are set. When plotly = TRUE, all ggplot objects become plotly (htmlwidget) objects.
ggplot object when only one feature (e.g. cluster variable, gene,factor) is set. List object when multiple of those are specified.
Examples
plotDimRed(pbmcPlot, colorBy = "dataset", slot = "cellMeta", labelText = FALSE)
plotDimRed(pbmcPlot, colorBy = "S100A8", slot = "normData", dotOrder = "ascending", dotSize = 2)
plotDimRed(pbmcPlot, colorBy = 2, slot = "H.norm", dotOrder = "ascending", dotSize = 2, colorPalette = "viridis")
plotClusterDimRed(pbmcPlot)
plotDatasetDimRed(pbmcPlot)
plotByDatasetAndCluster(pbmcPlot)
plotGeneDimRed(pbmcPlot, varFeatures(pbmcPlot)[1])
plotFactorDimRed(pbmcPlot, 2)
Create volcano plot with EnhancedVolcano
Description
Create volcano plot with EnhancedVolcano
Usage
plotEnhancedVolcano(result, group, ...)
Arguments
result | Data frame table returned by |
group | Selection of one group available from |
... | Arguments passed to EnhancedVolcano::EnhancedVolcano(), except that |
Value
ggplot
Examples
if (requireNamespace("EnhancedVolcano", quietly = TRUE)) {
    defaultCluster(pbmc) <- pbmcPlot$leiden_cluster
    # Test the DEG between "stim" and "ctrl", within each cluster
    result <- runPairwiseDEG(
        pbmc,
        groupTest = "stim",
        groupCtrl = "ctrl",
        variable1 = "dataset",
        splitBy = "defaultCluster"
    )
    plotEnhancedVolcano(result, "0.stim")
}
Visualize GO enrichment test result in dot plot
Description
Visualize GO enrichment test result in dot plot
Usage
plotGODot(result, group = NULL, query = NULL, pvalThresh = 0.05, n = 20, minDotSize = 3, maxDotSize = 7, termIDMatch = "^GO", colorPalette = "E", colorDirection = -1, ...)
Arguments
result | Returned list object from |
group | A single group name to be visualized, must be available in |
query | A single string selecting from which query to show the result. Choose from |
pvalThresh | Numeric scalar, cutoff for p-value where smaller values are considered as significant. Default |
n | Number of top terms to be shown, ranked by p-value. Default |
minDotSize | The size of the dot representing the minimum gene count. Default |
maxDotSize | The size of the dot representing the maximum gene count. |
termIDMatch | Regular expression pattern to match the term ID. Default |
colorPalette,colorDirection | Viridis palette options. Default |
... | Arguments passed on to
|
Value
A ggplot object.
Examples
if (requireNamespace("gprofiler2", quietly = TRUE)) {
    go <- runGOEnrich(deg.pw)
    plotGODot(go)
}
Plot Heatmap of Gene Expression or Factor Loading
Description
Plot Heatmap of Gene Expression or Factor Loading
Usage
plotGeneHeatmap(object, features, cellIdx = NULL, slot = c("normData", "rawData", "scaleData", "scaleUnsharedData"), useCellMeta = NULL, cellAnnotation = NULL, featureAnnotation = NULL, cellSplitBy = NULL, featureSplitBy = NULL, viridisOption = "C", ...)
plotFactorHeatmap(object, factors = NULL, cellIdx = NULL, slot = c("H.norm", "H"), useCellMeta = NULL, cellAnnotation = NULL, factorAnnotation = NULL, cellSplitBy = NULL, factorSplitBy = NULL, trim = c(0, 0.03), viridisOption = "D", ...)
Arguments
object | A liger object, with data to be plotted available. |
features,factors | Character vector of genes of interest or numeric index of factors to be involved. |
cellIdx | Valid index to subscribe cells to be included. See |
slot | Use the chosen matrix for heatmap. For |
useCellMeta | Character vector of available variable names in |
cellAnnotation | data.frame object for using external annotation, with each column a variable and each row a cell. Row names of this data.frame will be used for matching cells involved in heatmap. For cells not found in this data.frame, |
featureAnnotation,factorAnnotation | Similar as |
cellSplitBy | Character vector of variable names available in annotation given by |
featureSplitBy,factorSplitBy | Similar as |
viridisOption | See |
... | Arguments passed on to
|
trim | Numeric vector of two numbers. Higher value limits the maximum value and lower value limits the minimum value. Default |
Value
HeatmapList-class object
Examples
plotGeneHeatmap(pbmcPlot, varFeatures(pbmcPlot))
plotGeneHeatmap(pbmcPlot, varFeatures(pbmcPlot), useCellMeta = c("leiden_cluster", "dataset"), cellSplitBy = "leiden_cluster")
plotFactorHeatmap(pbmcPlot)
plotFactorHeatmap(pbmcPlot, cellIdx = pbmcPlot$leiden_cluster %in% 1:3, useCellMeta = c("leiden_cluster", "dataset"), cellSplitBy = "leiden_cluster")
Visualize factor expression and gene loading
Description
Visualize factor expression and gene loading
Usage
plotGeneLoadings(object, markerTable, useFactor, useDimRed = NULL, nLabel = 15, nPlot = 30, ...)
plotGeneLoadingRank(object, markerTable, useFactor, nLabel = 15, nPlot = 30, ...)
Arguments
object | A liger object with valid factorization result. |
markerTable | Returned result of |
useFactor | Integer index for which factor to visualize. |
useDimRed | Name of the variable storing dimensionality reduction result in the |
nLabel | Integer, number of top genes to be shown with text labels. Default |
nPlot | Integer, number of top genes to be shown in the loading rank plot. Default |
... | Arguments passed on to
|
Examples
result <- getFactorMarkers(pbmcPlot, "ctrl", "stim")
plotGeneLoadings(pbmcPlot, result, useFactor = 2)
Visualize gene expression or cell metadata with violin plot
Description
Visualize gene expression or cell metadata with violin plot
Usage
plotGeneViolin(object, gene, byDataset = TRUE, groupBy = NULL, ...)
plotTotalCountViolin(object, groupBy = "dataset", ...)
plotGeneDetectedViolin(object, groupBy = "dataset", ...)
Arguments
object | A liger object. |
gene | Character gene names. |
byDataset | Logical, whether the violin plot should be split by dataset. Default |
groupBy | Names of available categorical variable in |
... | Arguments passed on to
|
Value
ggplot if using a single gene and not splitting by dataset. Otherwise, list of ggplot.
Examples
plotGeneViolin(pbmcPlot, varFeatures(pbmcPlot)[1], groupBy = "leiden_cluster")
plotTotalCountViolin(pbmc)
plotGeneDetectedViolin(pbmc, dot = TRUE, box = TRUE, colorBy = "dataset")
Comprehensive group-split cluster plot on dimension reduction with proportion
Description
This function produces a combined plot on group level (e.g. dataset, or another metadata variable such as biological condition). A scatter plot of the dimension reduction with clusters labeled is generated per group. Furthermore, a stacked barplot of cluster proportion within each group is combined with the subplot of each group.
Usage
plotGroupClusterDimRed(object, useGroup = "dataset", useCluster = NULL, useDimRed = NULL, combinePlot = TRUE, droplevels = TRUE, relHeightMainLegend = c(5, 1), relHeightDRBar = c(10, 1), mainNRow = NULL, mainNCol = NULL, legendNRow = 1, ...)
Arguments
object | A liger object with dimension reduction, grouping variable and cluster assignment in |
useGroup | Variable name of the group division in metadata. Default |
useCluster | Name of variable in |
useDimRed | Name of the variable storing dimensionality reduction result in |
combinePlot | Whether to return combined plot. Default |
droplevels | Logical, whether to perform |
relHeightMainLegend | Relative heights of the main combination panel and the legend at the bottom. Must be a numeric vector of 2 numbers. Default |
relHeightDRBar | Relative heights of the scatter plot and the barplot within each subpanel. Must be a numeric vector of 2 numbers. Default |
mainNRow,mainNCol | Arrangement of the main plotting region, for number of rows and columns. Default |
legendNRow | Arrangement of the legend, number of rows. Default |
... | Arguments passed on to
|
Value
ggplot object when only one feature (e.g. cluster variable, gene,factor) is set. List object when multiple of those are specified.
Examples
plotGroupClusterDimRed(pbmcPlot)
Create heatmap for showing top marker expression in conditions
Description
Create heatmap for showing top marker expression in conditions
Usage
plotMarkerHeatmap(object, result, topN = 5, lfcThresh = 1, padjThresh = 0.05, pctInThresh = 50, pctOutThresh = 50, dedupBy = c("logFC", "padj"), groupBy = NULL, groupSize = 50, column_title = NULL, ...)
Arguments
object | A liger object, with normalized data and metadata to annotate available. |
result | The data.frame returned by |
topN | Number of top features to be plotted for each group. Default |
lfcThresh | Hard threshold on logFC value. Default |
padjThresh | Hard threshold on adjusted P-value. Default |
pctInThresh,pctOutThresh | Threshold on expression percentage. These mean that a feature will only pass the filter if it is expressed in more than |
dedupBy | When ranking by padj and logFC and a feature is ranked as top for multiple clusters, assign this feature as the marker of a cluster when it has the largest |
groupBy | Cell metadata variable names for cell grouping. Downsample balancing will also be aware of this. Default |
groupSize | Maximum number of cells in each group to be downsampled for plotting. Default |
column_title | Title on the column. Default |
... | Arguments passed on to
|
Value
A HeatmapList-class object.
Examples
defaultCluster(pbmc) <- pbmcPlot$leiden_cluster
pbmc <- normalize(pbmc)
plotMarkerHeatmap(pbmc, deg.marker)
Create heatmap for pairwise DEG analysis result
Description
Create heatmap for pairwise DEG analysis result
Usage
plotPairwiseDEGHeatmap(object, result, group = NULL, topN = 20, absLFCThresh = 1, padjThresh = 0.05, pctInThresh = 50, pctOutThresh = 50, downsampleSize = 200, useCellMeta = NULL, column_title = NULL, seed = 1, ...)
Arguments
object | A liger object, with normalized data and metadata to annotate available. |
result | The data.frame returned by |
group | The test group name among the result to be shown. Must specify only one if multiple tests are available (i.e. split test). Default |
topN | Maximum number of top significant features to be plotted for up- and down-regulated genes. Default |
absLFCThresh | Hard threshold on absolute logFC value. Default |
padjThresh | Hard threshold on adjusted P-value. Default |
pctInThresh,pctOutThresh | Threshold on expression percentage. These mean that a feature will only pass the filter if it is expressed in more than |
downsampleSize | Maximum number of downsampled cells to be shown in the heatmap. The downsampling is balanced on the cells involved in the test specified. Default |
useCellMeta | Cell metadata variable names for cell grouping. Default |
column_title | Title on the column. Default |
seed | Random seed for reproducibility. Default |
... | Arguments passed on to
|
Value
A HeatmapList-class object.
Examples
defaultCluster(pbmc) <- pbmcPlot$leiden_cluster
pbmc$condition_cluster <- interaction(pbmc$dataset, pbmc$defaultCluster)
deg <- runPairwiseDEG(pbmc, 'stim.0', 'stim.1', 'condition_cluster')
pbmc <- normalize(pbmc)
plotPairwiseDEGHeatmap(pbmc, deg, 'stim.0')
Visualize proportion across two categorical variables
Description
plotProportionBar creates bar plots comparing the cross-category proportion. plotProportionDot creates dot plots. plotClusterProportions has variable pre-specified and calls the dot plot. plotProportion produces a combination of both bar plots and dot plot.
Having package "ggrepel" installed can help add tidier percentage annotation on the pie chart. Run options(ggrepel.max.overlaps = n) before plotting to set allowed label overlaps.
Usage
plotProportion(object, class1 = NULL, class2 = "dataset", method = c("stack", "group", "pie"), ...)
plotProportionDot(object, class1 = NULL, class2 = "dataset", showLegend = FALSE, panelBorder = TRUE, ...)
plotProportionBar(object, class1 = NULL, class2 = "dataset", method = c("stack", "group"), inclRev = FALSE, panelBorder = TRUE, combinePlot = TRUE, ...)
plotClusterProportions(object, useCluster = NULL, return.plot = FALSE, ...)
plotProportionPie(object, class1 = NULL, class2 = "dataset", labelSize = 4, labelColor = "black", circleColors = NULL, ...)
Arguments
object | A liger object. |
class1,class2 | Each should be a single name of a categorical variable available in |
method | For bar plot, choose whether to draw |
... | Arguments passed on to
|
showLegend | Whether to show the legend. Default |
panelBorder | Whether to show rectangle border of the panel instead of using ggplot classic bottom and left axis lines. Default |
inclRev | Logical, for barplot, whether to reverse the specification for |
combinePlot | Logical, whether to combine the two plots with |
useCluster | For |
return.plot | |
labelSize,labelColor | Settings on pie chart percentage label. Default |
circleColors | Character vector of colors. |
Value
ggplot or list of ggplot
Examples
plotProportion(pbmcPlot)
plotProportionBar(pbmcPlot, method = "group")
plotProportionPie(pbmcPlot)
Box plot of cluster proportion in each dataset, grouped by condition
Description
This function calculates the proportion of each category (e.g. cluster, cell type) within each dataset, and then makes a box plot grouped by condition. The proportions of all categories within one dataset sum up to 1. The condition variable must be a variable of dataset, i.e. each dataset must belong to only one condition.
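The per-dataset proportion described above can be sketched in base R with a contingency table. The metadata frame `meta` and its values are made up for illustration, and this is only an approximation of the quantity being plotted, not the package's own code.

```r
# Made-up per-cell metadata: which dataset and cluster each cell belongs to.
meta <- data.frame(
  dataset = c("d1", "d1", "d1", "d1", "d2", "d2", "d2", "d2", "d2"),
  cluster = c("0", "0", "1", "2", "0", "1", "1", "1", "2"),
  stringsAsFactors = FALSE
)

# Count cells per dataset (rows) and cluster (columns).
counts <- table(meta$dataset, meta$cluster)

# Normalize within each dataset so that every row sums to 1.
prop <- prop.table(counts, margin = 1)
```

Each row of `prop` is one dataset, so a box for a given cluster and condition would be drawn from the entries of that cluster's column belonging to the datasets of that condition.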
Usage
plotProportionBox(object, useCluster = NULL, conditionBy = NULL, sampleBy = "dataset", splitByCluster = FALSE, dot = FALSE, dotSize = getOption("ligerDotSize", 1), dotJitter = FALSE, ...)
Arguments
object | A liger object. |
useCluster | Name of variable in |
conditionBy | Name of the variable in |
sampleBy | Name of the variable in |
splitByCluster | Logical, whether to split the wide grouped box plot by cluster, into a list of boxplots for each cluster. Default |
dot | Logical, whether to add dot plot on top of the box plot. Default |
dotSize | Size of the dot. Default uses user option "ligerDotSize", or |
dotJitter | Logical, whether to jitter the dot to avoid overlapping within a box when many dots are presented. Default |
... | Arguments passed on to
|
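The dotSize default in the usage above is written as getOption("ligerDotSize", 1), which is plain base-R option lookup with a fallback. A short sketch of how that lookup behaves (the value 2 is an arbitrary example):

```r
# Clear the option first so the fallback is visible; keep the old value.
old <- options(ligerDotSize = NULL)

# With the option unset, the second argument is returned.
sizeDefault <- getOption("ligerDotSize", 1)

# After setting the option, the stored value wins over the fallback.
options(ligerDotSize = 2)
sizeCustom <- getOption("ligerDotSize", 1)

# Restore whatever was set before this sketch ran.
options(old)
```

Setting the option once per session is therefore a way to change the dot size without passing dotSize to every call.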
Value
A ggplot object or a list of ggplot objects if splitByCluster = TRUE.
Examples
# "boxes" are expected to appear as horizontal lines, because there's no
# "condition" variable that groups the datasets in the example object, and
# thus only one value exists for each "box".
plotProportionBox(pbmcPlot, conditionBy = "dataset")
Make Riverplot/Sankey diagram that shows label mapping across datasets
Description
Creates a riverplot/Sankey diagram to show how independent cluster assignments from two datasets map onto a joint clustering. Prior knowledge of cell annotation for the given datasets is required to make sense of the visualization. Dataset original annotation can be added with the syntax shown in example code in this manual. The joint clustering could be generated with runCluster or set by any other metadata annotation.
Dataset original annotation can be inserted before running this function using cellMeta<- method. Please see example below.
This function depends on the CRAN package "sankey", which has to be installed for this function to work.
Usage
plotSankey(object, cluster1, cluster2, clusterConsensus = NULL, minFrac = 0.01, minCell = 10, titles = NULL, prefixes = NULL, labelCex = 1, titleCex = 1.1, colorValues = scPalette, mar = c(2, 2, 4, 2))
Arguments
object | A liger object with all three clustering variables available. |
cluster1,cluster2 | Name of the variables in |
clusterConsensus | Name of the joint cluster variable to use. Default uses the default clustering of the object. Can select a variable name in |
minFrac | Numeric. Minimum fraction of cluster for an edge to be shown. Default |
minCell | Numeric. Minimum number of cells for an edge to be shown. Default |
titles | Character vector of three. Customizes the column title text shown. Default uses the variable names |
prefixes | Character vector of three. Cluster names have to be unique across all three variables, so this is provided to deduplicate the clusters by adding |
labelCex | Numeric. Amount by which node label text should be magnified relative to the default. Default |
titleCex | Numeric. Amount by which title text should be magnified relative to the default. Default |
colorValues | Character vector of color codes to set color for each level in the consensus clustering. Default |
mar | Numeric vector of the form |
Value
No returned value. The sankey diagram will be displayed instead.
Note
This function works as a replacement of the function makeRiverplot in rliger <1.99. We decided to make a new function because the dependency adopted by the older version is archived on CRAN and will no longer be available.
Examples
# Make fake dataset specific labels from joint clustering result
cellMeta(pbmcPlot, "ctrl_cluster", "ctrl") <- cellMeta(pbmcPlot, "leiden_cluster", "ctrl")
cellMeta(pbmcPlot, "stim_cluster", "stim") <- cellMeta(pbmcPlot, "leiden_cluster", "stim")
if (requireNamespace("sankey", quietly = TRUE)) {
    plotSankey(pbmcPlot, "ctrl_cluster", "stim_cluster", titles = c("control", "LIGER", "stim"), prefixes = c("c", NA, "s"))
}
Visualize a spatial dataset
Description
Simple visualization of spatial coordinates. See example code for how to have information preset in the object. Arguments to the liger object method are passed down to the ligerDataset method.
Usage
plotSpatial2D(object, ...)
## S3 method for class 'liger'
plotSpatial2D(object, dataset, useCluster = NULL, legendColorTitle = NULL, ...)
## S3 method for class 'ligerSpatialDataset'
plotSpatial2D(object, useCluster = NULL, legendColorTitle = NULL, useDims = c(1, 2), xlab = NULL, ylab = NULL, labelText = FALSE, panelBorder = TRUE, ...)
Arguments
object | Either a liger object containing a spatial dataset or a ligerSpatialDataset object. |
... | Arguments passed on to
|
dataset | Name of one spatial dataset. |
useCluster | Either the name of one variable in |
legendColorTitle | Alternative title text in the legend. Default |
useDims | Numeric vector of two, choosing the coordinates to be drawn on 2D space. (STARmap data could have 3 dimensions.) Default |
xlab,ylab | Text label on x-/y-axis. Default |
labelText | Logical, whether to label annotation onto the scatter plot. Default |
panelBorder | Whether to show rectangle border of the panel instead of using ggplot classic bottom and left axis lines. Default |
Value
A ggplot object
Examples
ctrl.fake.spatial <- as.ligerDataset(dataset(pbmc, "ctrl"), modal = "spatial")
fake.coords <- matrix(rnorm(2 * ncol(ctrl.fake.spatial)), ncol = 2)
coordinate(ctrl.fake.spatial) <- fake.coords
dataset(pbmc, "ctrl") <- ctrl.fake.spatial
defaultCluster(pbmc) <- pbmcPlot$leiden_cluster
plotSpatial2D(pbmc, dataset = "ctrl")
Plot the variance vs mean of feature expression
Description
For each dataset where the feature variability is calculated, a plot of log10 feature expression variance against log10 mean will be produced. Features that are considered variable are highlighted in red.
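As a rough illustration of the two plotted quantities, the base-R sketch below computes the per-feature mean and variance on a toy matrix and takes log10 of both. The matrix `expr` is made up, and which data the package actually uses for these statistics, as well as how variable features are chosen, is not shown here.

```r
# Toy expression matrix: 3 features (rows) x 4 cells (columns).
expr <- rbind(
  geneA = c(1, 1, 2, 1),
  geneB = c(0, 5, 10, 1),
  geneC = c(3, 3, 4, 3)
)

# Per-feature mean and sample variance across cells.
featMean <- rowMeans(expr)
featVar <- apply(expr, 1, var)

# Coordinates of each feature on the log10-log10 scatter plot.
coords <- data.frame(
  log10Mean = log10(featMean),
  log10Var = log10(featVar),
  row.names = rownames(expr)
)
```

In this toy example geneB sits well above the other two features on the variance axis, which is the kind of feature a mean-variance plot is meant to expose.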
Usage
plotVarFeatures(object, combinePlot = TRUE, dotSize = 1, ...)
Arguments
object | liger object. |
combinePlot | Logical. If |
dotSize | Controls the size of dots in the main plot. Default |
... | More theme setting parameters passed to |
Value
ggplot object when combinePlot = TRUE, a list of ggplot objects when combinePlot = FALSE
Examples
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
plotVarFeatures(pbmc)
Create volcano plot for Wilcoxon test result
Description
plotVolcano is a simple implementation and shares most of its arguments with other rliger plotting functions. plotEnhancedVolcano is a wrapper function of EnhancedVolcano::EnhancedVolcano(), which provides a substantial number of arguments for graphical control. However, that requires the installation of package "EnhancedVolcano".
highlight and labelTopN both control the feature name labeling, with highlight considered first. If both are left as default (NULL), all significant features will be labeled.
Usage
plotVolcano(result, group = NULL, logFCThresh = 1, padjThresh = 0.01, highlight = NULL, labelTopN = NULL, dotSize = 2, dotAlpha = 0.8, legendPosition = "top", labelSize = 4, ...)
Arguments
result | Data frame table returned by |
group | Selection of one group available from |
logFCThresh | Number for the threshold on the absolute value of the log2 fold change statistics. Default |
padjThresh | Number for the threshold on the adjusted p-value statistics. Default |
highlight | A character vector of feature names to be highlighted. Default |
labelTopN | Number of top differentially expressed features to be labeled on the top of the dots. Ranked by adjusted p-value first and absolute value of logFC next. Default |
dotSize,dotAlpha | Numbers for universal aesthetics control of dots. Default |
legendPosition | Text indicating where to place the legend. Choose from |
labelSize | Size of labeled top features and line annotations. Default |
... | Arguments passed on to
|
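The ranking rule described for labelTopN (adjusted p-value first, absolute logFC next) can be sketched with base-R ordering. The data frame `deg` and its column names feature, logFC and padj are hypothetical stand-ins for a DEG result table, not the actual structure returned by the package.

```r
# Made-up DEG statistics; column names are assumptions for illustration.
deg <- data.frame(
  feature = c("g1", "g2", "g3", "g4"),
  logFC = c(2.0, -3.0, 0.5, -1.5),
  padj = c(0.001, 0.001, 0.2, 0.0001),
  stringsAsFactors = FALSE
)

# Sort by adjusted p-value (ascending); break ties by |logFC| (descending).
ranked <- deg[order(deg$padj, -abs(deg$logFC)), ]

# Features that would be labeled with a hypothetical labelTopN of 2.
topLabeled <- head(ranked$feature, 2)
```

Here g1 and g2 tie on adjusted p-value, so the larger absolute fold change of g2 places it ahead of g1.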
Value
ggplot
Examples
plotVolcano(deg.pw, "stim.CD14 Mono")
Show information about factorGSEA object
Description
Show information about factorGSEA object
Usage
## S3 method for class 'factorGSEA'
print(x, ...)
Arguments
x | A |
... | S3 method convention, not used for now. |
Quantile align (normalize) factor loadings
Description
This is a deprecated function. Call quantileNorm instead.
Usage
quantileAlignSNF(object, knn_k = 20, k2 = 500, prune.thresh = 0.2, ref_dataset = NULL, min_cells = 20, quantiles = 50, nstart = 10, resolution = 1, dims.use = 1:ncol(x = object@H[[1]]), dist.use = "CR", center = FALSE, small.clust.thresh = 0, id.number = NULL, print.mod = FALSE, print.align.summary = FALSE)
Arguments
object |
|
knn_k | Number of nearest neighbors for within-dataset knn graph (default 20). |
k2 | Horizon parameter for shared nearest factor graph. Distances to all but the k2 nearest neighbors are set to 0 (cuts down on memory usage for very large graphs). (default 500) |
prune.thresh | Minimum allowed edge weight. Any edges below this are removed (given weight 0) (default 0.2) |
ref_dataset | Name of dataset to use as a "reference" for normalization. By default, the dataset with the largest number of cells is used. |
min_cells | Minimum number of cells to consider a cluster shared across datasets (default 20) |
quantiles | Number of quantiles to use for quantile normalization (default 50). |
nstart | Number of times to perform Louvain community detection with different random starts (default 10). |
resolution | Controls the number of communities detected. Higher resolution -> more communities. (default 1) |
dims.use | Indices of factors to use for shared nearest factor determination (default |
dist.use | Distance metric to use in calculating nearest neighbors (default "CR"). |
center | Centers the data when scaling factors (useful for less sparse modalities like methylation data). (default FALSE) |
small.clust.thresh | Extracts small clusters loading highly on single factor with fewer cells than this before regular alignment (default 0 – no small cluster extraction). |
id.number | Number to use for identifying edge file (when running in parallel) (generates random value by default). |
print.mod | Print modularity output from clustering algorithm (default FALSE). |
print.align.summary | Print summary of clusters which did not align normally (default FALSE). |
Details
This process builds a shared factor neighborhood graph to jointly cluster cells, then quantile normalizes corresponding clusters.
The first step, building the shared factor neighborhood graph, is performed in SNF(), and produces a graph representation where edge weights between cells (across all datasets) correspond to their similarity in the shared factor neighborhood space. An important parameter here is knn_k, the number of neighbors used to build the shared factor space (see SNF()). Afterwards, modularity-based community detection is performed on this graph (Louvain clustering) in order to identify shared clusters across datasets. The method was first developed by Waltman and van Eck (2013) and source code is available at http://www.ludowaltman.nl/slm/. The most important parameter here is resolution, which controls the number of communities detected.
Next we perform quantile alignment for each dataset, factor, and cluster (by stretching/compressing datasets' quantiles to better match those of the reference dataset). These aligned factor loadings are combined into a single matrix and returned as H.norm.
Value
liger object with H.norm and cluster slots set.
Examples
## Not run: 
# liger object, factorization complete
ligerex
# do basic quantile alignment
ligerex <- quantileAlignSNF(ligerex)
# higher resolution for more clusters (note that SNF is conserved)
ligerex <- quantileAlignSNF(ligerex, resolution = 1.2)
# change knn_k for more fine-grained local clustering
ligerex <- quantileAlignSNF(ligerex, knn_k = 15, resolution = 1.2)
## End(Not run)
Quantile Align (Normalize) Factor Loadings
Description
This process builds a shared factor neighborhood graph to jointly cluster cells, then quantile normalizes corresponding clusters.
The first step, building the shared factor neighborhood graph, is performed in SNF(), and produces a graph representation where edge weights between cells (across all datasets) correspond to their similarity in the shared factor neighborhood space. An important parameter here is nNeighbors, the number of neighbors used to build the shared factor space.
Next we perform quantile alignment for each dataset, factor, and cluster (by stretching/compressing datasets' quantiles to better match those of the reference dataset).
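The "stretching/compressing" step can be pictured with a small base-R sketch. This is an illustration only, not the package's implementation: the vectors ref and query are made-up factor loadings for one factor within one shared cluster, and the piecewise-linear mapping via approxfun() is an assumption about how such a quantile match may be realized.

```r
set.seed(1)
# Hypothetical loadings of one factor within one shared cluster
ref   <- rexp(200, rate = 1)    # reference dataset
query <- rexp(150, rate = 0.5)  # dataset to be aligned, on a different scale

nQuantiles <- 50                # plays the role of the `quantiles` argument
probs  <- seq(0, 1, length.out = nQuantiles + 1)
qRef   <- quantile(ref, probs, names = FALSE)
qQuery <- quantile(query, probs, names = FALSE)

# Piecewise-linear map that sends each query quantile to the matching
# reference quantile; values in between are interpolated
warp    <- approxfun(qQuery, qRef, rule = 2, ties = "ordered")
aligned <- warp(query)

range(ref)
range(aligned)
```

After the mapping, the aligned values span the same range as the reference while the ordering of cells within the query dataset is preserved.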
Usage
quantileNorm(object, ...)

## S3 method for class 'liger'
quantileNorm(
  object,
  quantiles = 50,
  reference = NULL,
  minCells = 20,
  nNeighbors = 20,
  useDims = NULL,
  center = FALSE,
  maxSample = 1000,
  eps = 0.9,
  refineKNN = TRUE,
  clusterName = "quantileNorm_cluster",
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'Seurat'
quantileNorm(
  object,
  reduction = "inmf",
  quantiles = 50,
  reference = NULL,
  minCells = 20,
  nNeighbors = 20,
  useDims = NULL,
  center = FALSE,
  maxSample = 1000,
  eps = 0.9,
  refineKNN = TRUE,
  clusterName = "quantileNorm_cluster",
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
Arguments
object | A liger or Seurat object with valid factorization result available (i.e. |
... | Arguments passed to other S3 methods of this function. |
quantiles | Number of quantiles to use for quantile normalization. Default |
reference | Character, numeric or logical selection of one dataset, out of all available datasets in |
minCells | Minimum number of cells to consider a cluster shared across datasets. Default |
nNeighbors | Number of nearest neighbors for within-dataset knn graph. Default |
useDims | Indices of factors to use for shared nearest factor determination. Default |
center | Whether to center the data when scaling factors. Could be useful for less sparse modalities like methylation data. Default |
maxSample | Maximum number of cells used for quantile normalization of each cluster and factor. Default |
eps | The error bound of the nearest neighbor search. Lower values give more accurate nearest neighbor graphs but take much longer to compute. Default |
refineKNN | Whether to increase robustness of cluster assignments using KNN graph. Default |
clusterName | Variable name that will store the clustering result in metadata of a liger object or a |
seed | Random seed to allow reproducible results. Default |
verbose | Logical. Whether to show information of the progress. Default |
reduction | Name of the reduction where LIGER integration result is stored. Default |
Value
Updated input object
liger method
Update the H.norm slot with the aligned cell factor loading, ready for running graph based community detection clustering or dimensionality reduction for visualization.
Update the cellMeta slot with a cluster assignment based on cell factor loading.
Seurat method
Update the reductions slot with a new DimReduc object containing the aligned cell factor loading.
Update the metadata with a cluster assignment based on cell factor loading.
Examples
pbmc <- quantileNorm(pbmcPlot)
Quantile align (normalize) factor loading
Description
Please turn to quantileNorm.
This process builds a shared factor neighborhood graph to jointly cluster cells, then quantile normalizes corresponding clusters.
The first step, building the shared factor neighborhood graph, is performed in SNF(), and produces a graph representation where edge weights between cells (across all datasets) correspond to their similarity in the shared factor neighborhood space. An important parameter here is knn_k, the number of neighbors used to build the shared factor space.
Next we perform quantile alignment for each dataset, factor, and cluster (by stretching/compressing datasets' quantiles to better match those of the reference dataset). These aligned factor loadings are combined into a single matrix and returned as H.norm.
Arguments
object |
|
knn_k | Number of nearest neighbors for within-dataset knn graph (default 20). |
ref_dataset | Name of dataset to use as a "reference" for normalization. By default, the dataset with the largest number of cells is used. |
min_cells | Minimum number of cells to consider a cluster shared across datasets (default 20) |
quantiles | Number of quantiles to use for quantile normalization (default 50). |
eps | The error bound of the nearest neighbor search (default 0.9). Lower values give more accurate nearest neighbor graphs but take much longer to compute. |
dims.use | Indices of factors to use for shared nearest factor determination (default |
do.center | Centers the data when scaling factors (useful for less sparse modalities like methylation data). (default FALSE) |
max_sample | Maximum number of cells used for quantile normalization of each cluster and factor. (default 1000) |
refine.knn | Whether to increase robustness of cluster assignments using KNN graph. (default TRUE) |
rand.seed | Random seed to allow reproducible results (default 1) |
Value
liger object with 'H.norm' and 'clusters' slot set.
See Also
Access ligerATACDataset peak data
Description
Similar to how default ligerDataset data is accessed.
Usage
rawPeak(x, dataset)

rawPeak(x, dataset, check = TRUE) <- value

normPeak(x, dataset)

normPeak(x, dataset, check = TRUE) <- value

## S4 method for signature 'liger,character'
rawPeak(x, dataset)

## S4 replacement method for signature 'liger,character'
rawPeak(x, dataset, check = TRUE) <- value

## S4 method for signature 'liger,character'
normPeak(x, dataset)

## S4 replacement method for signature 'liger,character'
normPeak(x, dataset, check = TRUE) <- value

## S4 method for signature 'ligerATACDataset,missing'
rawPeak(x, dataset = NULL)

## S4 replacement method for signature 'ligerATACDataset,missing'
rawPeak(x, dataset = NULL, check = TRUE) <- value

## S4 method for signature 'ligerATACDataset,missing'
normPeak(x, dataset = NULL)

## S4 replacement method for signature 'ligerATACDataset,missing'
normPeak(x, dataset = NULL, check = TRUE) <- value
Arguments
x | ligerATACDataset object or a liger object. |
dataset | Name or numeric index of an ATAC dataset. |
check | Logical, whether to perform object validity check on setting new value. |
value |
|
Value
The retrieved peak count matrix or the updated x object.
Load in data from 10X
Description
Enables easy loading of sparse data matrices provided by 10X genomics.
read10X works generally for 10X cellranger pipelines including: CellRanger < 3.0 & >= 3.0 and CellRanger-ARC.
read10XRNA invokes read10X and takes the "Gene Expression" out, so that the result can directly be used to construct a liger object. See Examples for demonstration.
read10XATAC works for both cellRanger-ARC and cellRanger-ATAC pipelines but needs user arguments for correct recognition. Similarly, the returned value can directly be used for constructing a liger object.
Usage
read10X(
  path,
  sampleNames = NULL,
  addPrefix = FALSE,
  useFiltered = NULL,
  reference = NULL,
  geneCol = 2,
  cellCol = 1,
  returnList = FALSE,
  verbose = getOption("ligerVerbose", TRUE),
  sample.dirs = path,
  sample.names = sampleNames,
  use.filtered = useFiltered,
  data.type = NULL,
  merge = NULL,
  num.cells = NULL,
  min.umis = NULL
)

read10XRNA(
  path,
  sampleNames = NULL,
  addPrefix = FALSE,
  useFiltered = NULL,
  reference = NULL,
  returnList = FALSE,
  ...
)

read10XATAC(
  path,
  sampleNames = NULL,
  addPrefix = FALSE,
  useFiltered = NULL,
  pipeline = c("atac", "arc"),
  arcFeatureType = "Peaks",
  returnList = FALSE,
  geneCol = 2,
  cellCol = 1,
  verbose = getOption("ligerVerbose", TRUE)
)
Arguments
path | (A.) A directory containing the matrix.mtx, genes.tsv (or features.tsv), and barcodes.tsv files provided by 10X. A vector, a named vector, a list or a named list can be given in order to load several data directories. (B.) The 10X root directory where subdirectories of per-sample output folders can be found. Sample names will by default take the name of the vector, list or subfolders. |
sampleNames | A vector of names to override the detected or set sample names for what is given to |
addPrefix | Logical, whether to add sample names as a prefix to the barcodes. Default |
useFiltered | Logical, if |
reference | In case of specifying a CellRanger<3 root folder to |
geneCol | Specify which column of genes.tsv or features.tsv to use for gene names. Default |
cellCol | Specify which column of barcodes.tsv to use for cell names. Default |
returnList | Logical, whether to still return a structured list instead of a single matrix object, in the case where only one sample and only one feature type can be found. Otherwise will always return a list. Default |
verbose | Logical. Whether to show information of the progress. Default |
sample.dirs,sample.names,use.filtered | These arguments are renamed and will be deprecated in the future. Please see usage for corresponding arguments. |
data.type,merge,num.cells,min.umis | These arguments are defunct because the functionality can/should be fulfilled with other functions. |
... | Arguments passed to |
pipeline | Which cellRanger pipeline type to find the ATAC data. Choose |
arcFeatureType | When |
Value
When only one sample is given or detected, and only one feature type is detected or using CellRanger < 3.0, and returnList = FALSE, a sparse matrix object (dgCMatrix class) will be returned.
When using read10XRNA or read10XATAC, which are modality specific, returns a list named by samples, and each element is the corresponding sparse matrix object (dgCMatrix class).
read10X generally returns a list named by samples. Each sample element will be another list named by feature types even if only one feature type is detected (or using CellRanger < 3.0) for data structure consistency. The feature type "Gene Expression" always comes as the first type if available.
Examples
## Not run: 
# For output from CellRanger < 3.0
dir <- 'path/to/data/directory'
list.files(dir) # Should show barcodes.tsv, genes.tsv, and matrix.mtx
mat <- read10X(dir)
class(mat) # Should show dgCMatrix

# For root directory from CellRanger < 3.0
dir <- 'path/to/root'
list.dirs(dir) # Should show sample names
matList <- read10X(dir)
names(matList) # Should show the sample names
class(matList[[1]][["Gene Expression"]]) # Should show dgCMatrix

# For output from CellRanger >= 3.0 with multiple data types
dir <- 'path/to/data/directory'
list.files(dir) # Should show barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz
matList <- read10X(dir, sampleNames = "tissue1")
names(matList) # Should show "tissue1"
names(matList$tissue1) # Should show feature types, e.g. "Gene Expression" etc.

# For root directory from CellRanger >= 3.0 with multiple data types
dir <- 'path/to/root'
list.dirs(dir) # Should show sample names, e.g. "rep1", "rep2", "rep3"
matList <- read10X(dir)
names(matList) # Should show the sample names: "rep1", "rep2", "rep3"
names(matList$rep1) # Should show the available feature types for rep1

## End(Not run)
## Not run: 
# For creating LIGER object from root directory of CellRanger >= 3.0
dir <- 'path/to/root'
list.dirs(dir) # Should show sample names, e.g. "rep1", "rep2", "rep3"
matList <- read10XRNA(dir)
names(matList) # Should show the sample names: "rep1", "rep2", "rep3"
sapply(matList, class) # Should show matrix class all are "dgCMatrix"
lig <- createLigerObject(matList)

## End(Not run)
Read 10X cellranger files (matrix, barcodes and features) into R session
Description
This function loads a single sample given the paths to the matrix.mtx, barcodes.tsv, and features.tsv files. It is used internally by the read10X functions for loading individual samples from a cellranger output directory, and it is also convenient when the files do not follow the standard layout (e.g. data downloaded from GEO).
Usage
read10XFiles(
  matrixPath,
  barcodesPath,
  featuresPath,
  sampleName = NULL,
  geneCol = 2,
  cellCol = 1,
  isATAC = FALSE,
  returnList = FALSE
)
Arguments
matrixPath | Character string, path to the matrix MTX file. Can be gzipped. |
barcodesPath | Character string, path to the barcodes TSV file. Can be gzipped. |
featuresPath | Character string, path to the features TSV file. Can be gzipped. |
sampleName | Character string attached as a prefix to the cell barcodes loaded from the barcodes file. Default |
geneCol | An integer indicating which column in the features file to extract as the feature identifiers. Default |
cellCol | An integer indicating which column in the barcodes file to extract as the cell identifiers. Default |
isATAC | Logical, whether the data is for ATAC-seq. Default |
returnList | Logical, used internally by wrapper functions. Whether to force putting the loaded matrix in a list even if there's only one matrix. Default |
Value
For a single-modal sample, a dgCMatrix object, or a list of one dgCMatrix when returnList = TRUE. A list of multiple dgCMatrix objects when multiple feature types are detected.
Examples
## Not run: 
matrix <- read10XFiles(
  matrixPath = "path/to/matrix.mtx.gz",
  barcodesPath = "path/to/barcodes.tsv.gz",
  featuresPath = "path/to/features.tsv.gz"
)

## End(Not run)
Read 10X HDF5 file
Description
Read count matrix from 10X CellRanger HDF5 file. By default, read10XH5 loads scRNA, scATAC or multimodal data into memory (inMemory = TRUE). To use LIGER in delayed mode for handling large datasets, set inMemory = FALSE to load the data as a DelayedArray object. The delayed mode only supports scRNA data for now.
Usage
read10XH5(filename, inMemory = TRUE, useNames = TRUE, featureMakeUniq = TRUE)

read10XH5Mem(filename, useNames = TRUE, featureMakeUniq = TRUE)

read10XH5Delay(filename, useNames = TRUE, featureMakeUniq = TRUE)
Arguments
filename | Character string, path to the HDF5 file. |
inMemory | Logical, whether to load the data into memory. Default |
useNames | Logical, whether to use gene names as row names. Default |
featureMakeUniq | Logical, whether to make gene names unique. Default |
Value
A sparse matrix when using an older CellRanger output HDF5 file or when only one genome and one modality is detected. When multiple genomes are available, a list with one element per genome will be returned. When using multimodal data, each genome will be a list of matrices for each modality. The matrix will be of dgCMatrix class when in memory, or a TENxMatrix object when in delayed mode.
Examples
matrix <- read10XH5(
  filename = system.file("extdata/ctrl.h5", package = "rliger"),
  inMemory = TRUE
)
class(matrix) # Should show dgCMatrix
if (requireNamespace("HDF5Array", quietly = TRUE)) {
  matrix <- read10XH5(
    filename = system.file("extdata/ctrl.h5", package = "rliger"),
    inMemory = FALSE
  )
  print(class(matrix)) # Should show TENxMatrix
}
Read matrix from H5AD file
Description
Read raw count matrix from H5AD file. By default, readH5AD loads the specified layer into memory (inMemory = TRUE). To use LIGER in delayed mode for handling large datasets, set inMemory = FALSE to load the data as a DelayedArray object. Note that only CSR format is supported for the matrix.
Usage
readH5AD(filename, layer, inMemory = TRUE, obs = FALSE)

readH5ADMem(filename, layer, obs = FALSE)

readH5ADDelay(filename, layer, obs = FALSE)
Arguments
filename | Character string, path to the H5AD file. |
layer | Character string specifying the H5 path of raw count data to be loaded. Use |
inMemory | Logical, whether to load the data into memory. Default |
obs | Logical, whether to also load the cell metadata from |
Details
Currently, the only supported H5AD AnnData encoding versions are as follows:
adata.X, adata.raw.X, or adata.layers['layer'] - csr_matrix 0.1.0
adata.obs and adata.var - dataframe 0.2.0
Categoricals in a data frame - categorical 0.2.0
If users possess H5AD files encoded with older specification, please either open an issue on GitHub or use R package 'anndata' to manually extract information.
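For readers unfamiliar with the CSR layout mentioned above, the following base-R sketch decodes the three arrays that make up a CSR matrix (values, 0-based column indices, and row pointers) into a dense matrix. The arrays are made-up toy values, and the orientation (AnnData cells by genes, transposed to the genes by cells layout used elsewhere in this manual) is stated here as an assumption for illustration. This is not how readH5AD itself is implemented.

```r
# Hypothetical CSR arrays for a 3-cell x 4-gene count matrix
data    <- c(5, 1, 2, 3, 4)            # non-zero values, row by row
indices <- c(0L, 2L, 1L, 0L, 3L)       # 0-based column (gene) index per value
indptr  <- c(0L, 2L, 3L, 5L)           # row i owns values indptr[i]+1 .. indptr[i+1]
nGenes  <- 4L

nCells <- length(indptr) - 1L
# Row (cell) index of every stored value, derived from the row pointers
rowOfValue <- rep(seq_len(nCells), times = diff(indptr))

dense <- matrix(0, nrow = nCells, ncol = nGenes)
dense[cbind(rowOfValue, indices + 1L)] <- data

# Transpose to genes x cells
genesByCells <- t(dense)
genesByCells
```

A practical consequence of this layout is that a CSR cells-by-genes matrix stores its values in the same order as a column-compressed genes-by-cells matrix, so no reshuffling of values is needed when switching orientation.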
Value
When loaded in memory, a sparse matrix of class dgCMatrix will be returned. When loaded in delayed mode, a TENxMatrix object will be returned. If obs = TRUE, a list containing the matrix and the cell metadata will be returned.
Examples
tempH5AD <- tempfile(fileext = '.h5ad')
writeH5AD(pbmc, tempH5AD, overwrite = TRUE)
mat <- readH5AD(tempH5AD, layer = 'X')
delayMat <- readH5AD(tempH5AD, layer = 'X', inMemory = FALSE)
Read liger object from RDS file
Description
This function reads a liger object stored in an RDS file. The following kinds of objects are supported:
A liger object with in-memory data created with package version 1.99 or later.
A liger object with on-disk H5 data associated, where the link to H5 files will be automatically restored.
A liger object created with an older package version, which is updated to the latest data structure by default.
Usage
readLiger(
  filename,
  dimredName,
  clusterName = "clusters",
  h5FilePath = NULL,
  update = TRUE
)
Arguments
filename | Path to an RDS file of a |
dimredName | The name of variable in |
clusterName | The name of variable in |
h5FilePath | Named character vector for all H5 file paths. Not required for object run with in-memory analysis. For object containing H5-based analysis (e.g. online iNMF), this must be supplied if the H5 file location is different from that at creation time. |
update | Logical, whether to update an old (<=1.99.0) |
Value
New version of liger object
Examples
# Save and read regular current-version liger object
tempPath <- tempfile(fileext = ".rds")
saveRDS(pbmc, tempPath)
pbmc <- readLiger(tempPath, dimredName = NULL)

# Save and read H5-based liger object
h5Path <- system.file("extdata/ctrl.h5", package = "rliger")
h5tempPath <- tempfile(fileext = ".h5")
file.copy(from = h5Path, to = h5tempPath)
lig <- createLiger(list(ctrl = h5tempPath))
tempPath <- tempfile(fileext = ".rds")
saveRDS(lig, tempPath)
lig <- readLiger(tempPath, h5FilePath = list(ctrl = h5tempPath))

## Not run: 
# Read an old liger object <= 1.0.1
# Assume the dimensionality reduction method applied was UMAP
# Assume the clustering was derived with Louvain method
lig <- readLiger(
  filename = "path/to/oldLiger.rds",
  dimredName = "UMAP",
  clusterName = "louvain"
)

## End(Not run)
See downsample
Description
This function mainly aims at downsampling datasets to a size suitable for plotting.
Usage
readSubset(
  object,
  slot.use = "normData",
  balance = NULL,
  max.cells = 1000,
  chunk = 1000,
  datasets.use = NULL,
  genes.use = NULL,
  rand.seed = 1,
  verbose = getOption("ligerVerbose", TRUE)
)
Arguments
object | liger object |
slot.use | Only create subset from one or more of |
balance |
|
max.cells | Max number of cells to sample from the grouping based on |
chunk | Integer. Maximum number of cells in each chunk. Default |
datasets.use | Index selection of datasets to consider. Default |
genes.use | Character vector. Subset features to this specified range. Default |
rand.seed | Random seed for reproducibility. Default |
verbose | Logical. Whether to show information of the progress. Default |
Value
Subset of liger object.
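The idea of capping the number of sampled cells per group can be sketched in base R as follows. This is a conceptual illustration under stated assumptions, not the package's code: the grouping factor group is made up, maxCells stands in for max.cells, and the per-group capping rule is an assumption about what sampling "from the grouping" means.

```r
set.seed(1)  # plays the role of rand.seed

# Hypothetical grouping of cells, with very unequal group sizes
group <- factor(rep(c("a", "b", "c"), times = c(500, 40, 2000)))
maxCells <- 100  # stands in for max.cells

# Sample at most maxCells cell indices from each group
idxByGroup <- split(seq_along(group), group)
keep <- unlist(lapply(idxByGroup, function(idx) {
  if (length(idx) <= maxCells) idx else sort(sample(idx, maxCells))
}), use.names = FALSE)

table(group[keep])
```

Groups smaller than the cap are kept whole, so rare populations remain visible in a downsampled plot.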
See Also
downsample, subsetLiger, subsetLigerDataset
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
Remove missing cells or features from liger object
Description
Remove missing cells or features from liger object
Usage
removeMissing(
  object,
  orient = c("both", "feature", "cell"),
  minCells = NULL,
  minFeatures = NULL,
  useDatasets = NULL,
  newH5 = TRUE,
  filenameSuffix = "removeMissing",
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

removeMissingObs(
  object,
  slot.use = NULL,
  use.cols = TRUE,
  verbose = getOption("ligerVerbose", TRUE)
)
Arguments
object | liger object |
orient | Choose to remove non-expressing features ( |
minCells | Keep features that are expressed in at least this number of cells, calculated on a per-dataset basis. A single value for all datasets or a vector for each dataset. Default |
minFeatures | Keep cells that express at least this number of features, calculated on a per-dataset basis. A single value for all datasets or a vector for each dataset. Default |
useDatasets | A character vector of the names, a numeric or logical vector of the index of the datasets to be processed. Default |
newH5 | Logical, whether to create a new H5 file on disk for each H5-based dataset on subset. Default |
filenameSuffix | When subsetting H5-based datasets to new H5 files, this suffix will be added to all the filenames. Default |
verbose | Logical. Whether to show information of the progress. Default |
... | Arguments passed to |
slot.use | Deprecated. Always look at |
use.cols | Deprecated. Previously means "treating each column as a cell" when |
Value
Updated (subset) object.
Note
removeMissingObs will be deprecated. removeMissing covers and expands the use case and should be easier to understand.
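The filtering criteria described for minCells and minFeatures can be illustrated on a plain matrix. The sketch below is only a conceptual example in base R: the toy counts matrix is made up, and the thresholds of 1 are chosen for illustration rather than taken from the function's defaults.

```r
# Hypothetical genes x cells count matrix for one dataset
counts <- matrix(
  c(0, 0, 0, 0,
    1, 0, 2, 0,
    0, 0, 3, 0,
    4, 0, 5, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:4))
)

minCells <- 1     # keep features expressed in at least this many cells
minFeatures <- 1  # keep cells expressing at least this many features

keepFeature <- rowSums(counts > 0) >= minCells
keepCell <- colSums(counts > 0) >= minFeatures

filtered <- counts[keepFeature, keepCell, drop = FALSE]
filtered
```

Here gene1 is never expressed and cell2 and cell4 express nothing, so they are the entries that get dropped.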
Examples
# The example dataset does not contain non-expressing genes or empty barcodes
pbmc <- removeMissing(pbmc)
Restore links (to HDF5 files) for reloaded liger/ligerDataset object
Description
When a saved liger object with HDF5 data is loaded in a new R session, the links to the HDF5 files are closed. This function restores those links so that new analyses can be carried out.
Usage
restoreH5Liger(object, filePath = NULL)

restoreOnlineLiger(object, file.path = NULL)
Arguments
object | liger or ligerDataset object. |
filePath | Paths to HDF5 files. A single character path for ligerDataset input or a list of paths named by the datasets for liger object input. Default |
file.path | Will be deprecated with |
Value
object with restored links.
Note
restoreOnlineLiger will be deprecated for clarifying the terms used for data structure.
Examples
h5Path <- system.file("extdata/ctrl.h5", package = "rliger")
tempPath <- tempfile(fileext = ".h5")
file.copy(from = h5Path, to = tempPath)
lig <- createLiger(list(ctrl = tempPath))
# Now it is actually an invalid object! which is equivalent to what users
# will get with `saveRDS(lig, "object.rds"); lig <- readRDS("object.rds")`
closeAllH5(lig)
lig <- restoreH5Liger(lig)
Retrieve a single matrix of cells from a slot
Description
Only retrieve data from a specific slot, to reduce the memory used compared with subsetting a whole liger object. Useful for plotting. Internally used by plotDimRed and plotCellViolin.
Usage
retrieveCellFeature(
  object,
  feature,
  slot = c("rawData", "normData", "scaleData", "H", "H.norm", "cellMeta", "rawPeak", "normPeak"),
  cellIdx = NULL,
  ...
)
Arguments
object | liger object |
feature | Gene names, factor index or cell metadata variable names.Should be available in specified |
slot | Exactly choose from |
cellIdx | Any valid type of index that subset from all cells. Default |
... | Additional arguments passed to |
Value
A matrix object where rows are cells and columns are specified features.
Examples
S100A8Exp <- retrieveCellFeature(pbmc, "S100A8")
qcMetrics <- retrieveCellFeature(pbmc, c("nUMI", "nGene", "mito"), slot = "cellMeta")
Create "scaled data" for DNA methylation datasets
Description
Because gene body mCH proportions are negatively correlated with gene expression level in neurons, we need to reverse the direction of the methylation data. We do this by simply subtracting all values from the maximum methylation value. The resulting values are positively correlated with gene expression. This is only applied to the variable genes detected beforehand.
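The reversal itself is a one-line transformation. The sketch below uses made-up numbers and takes the maximum over the whole toy matrix, which is an assumption for illustration; it shows only the arithmetic, not the function's handling of datasets or variable genes.

```r
# Hypothetical mCH proportions (genes x cells) and matching expression levels
meth <- matrix(c(0.9, 0.1, 0.5,
                 0.2, 0.8, 0.4), nrow = 2, byrow = TRUE)
expr <- matrix(c(1, 9, 5,
                 8, 2, 6), nrow = 2, byrow = TRUE)

# Subtract every value from the maximum methylation value
reversed <- max(meth) - meth

cor(as.vector(meth), as.vector(expr))      # negative before reversal
cor(as.vector(reversed), as.vector(expr))  # positive after reversal
```

Because the transformation is a reflection, the reversed values stay non-negative, which matters for non-negative matrix factorization.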
Usage
reverseMethData(object, useDatasets, verbose = getOption("ligerVerbose", TRUE))
Arguments
object | A liger object, with variable genes identified. |
useDatasets | Required. A character vector of the names, a numeric or logical vector of the index of the datasets that should be identified as methylation data where the reversed data will be created. |
verbose | Logical. Whether to show information of the progress. Default |
Value
The input liger object, where the scaleData slot of the specified datasets will be updated with value as described in Description.
Examples
# Assuming the second dataset in example data "pbmc" is methylation data
pbmc <- normalize(pbmc, useDatasets = 1)
pbmc <- selectGenes(pbmc, datasets.use = 1)
pbmc <- scaleNotCenter(pbmc, useDatasets = 1)
pbmc <- reverseMethData(pbmc, useDatasets = 2)
Perform consensus iNMF on scaled datasets
Description
This is an experimental function and is subject to change.
Performs consensus integrative non-negative matrix factorization (c-iNMF) to return factorized H, W, and V matrices. In order to address the non-convex nature of NMF, we built on the cNMF method proposed by D. Kotliar et al., 2019. We run the regular iNMF multiple times with different random starts, cluster the pool of all the factors in W and Vs, and take the consensus of the clusters of the largest population. The cell factor loading H matrices are eventually solved with the consensus W and V matrices.
Please see runINMF for detailed introduction to the regular iNMF algorithm which is run multiple times in this function.
The consensus iNMF algorithm is developed based on the consensus NMF (cNMF) method (D. Kotliar et al., 2019).
Usage
runCINMF(object, k = 20, lambda = 5, rho = 0.3, ...)

## S3 method for class 'liger'
runCINMF(
  object,
  k = 20,
  lambda = 5,
  rho = 0.3,
  nIteration = 30,
  nRandomStarts = 10,
  HInit = NULL,
  WInit = NULL,
  VInit = NULL,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'Seurat'
runCINMF(
  object,
  k = 20,
  lambda = 5,
  rho = 0.3,
  datasetVar = "orig.ident",
  layer = "ligerScaleData",
  assay = NULL,
  reduction = "cinmf",
  nIteration = 30,
  nRandomStarts = 10,
  HInit = NULL,
  WInit = NULL,
  VInit = NULL,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
Arguments
object | A liger object or a Seurat object with non-negative scaled data of variable features (Done with |
k | Inner dimension of factorization (number of factors). Generally, a higher |
lambda | Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as |
rho | Numeric number between 0 and 1. Fraction for determining the number of nearest neighbors to look at for consensus (by |
... | Arguments passed to methods. |
nIteration | Total number of block coordinate descent iterations to perform. Default |
nRandomStarts | Number of replicate runs for creating the pool of factorization results. Default |
HInit | Initial values to use for |
WInit | Initial values to use for |
VInit | Initial values to use for |
seed | Random seed to allow reproducible results. Default |
nCores | The number of parallel tasks to speed up the computation. Default |
verbose | Logical. Whether to show information of the progress. Default |
datasetVar | Metadata variable name that stores the dataset source annotation. Default |
layer | For Seurat>=4.9.9, the name of layer to retrieve input non-negative scaled data. Default |
assay | Name of assay to use. Default |
reduction | Name of the reduction to store result. Also used as the feature key. Default |
Value
liger method - Returns updated input liger object
A list of all H matrices can be accessed with getMatrix(object, "H")
A list of all V matrices can be accessed with getMatrix(object, "V")
The W matrix can be accessed with getMatrix(object, "W")
Seurat method - Returns updated input Seurat object
H matrices for all datasets will be concatenated and transposed (all cells by k), and form a DimReduc object in the reductions slot named by argument reduction.
W matrix will be presented as feature.loadings in the same DimReduc object.
V matrices, an objective error value and the dataset variable used for the factorization are currently stored in the misc slot of the same DimReduc object.
References
Joshua D. Welch et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, 2019
Dylan Kotliar et al., Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-Seq, eLife, 2019
Examples
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
  pbmc <- runCINMF(pbmc)
}
SNN Graph Based Community Detection
Description
After aligning cell factor loadings, users can additionally run the Leiden or Louvain algorithm for community detection, which is widely used in single-cell analysis and excels at merging small clusters into broad cell classes.
While using aligned factor loadings (result from alignFactors) is recommended, this function looks for unaligned factor loadings (raw result from runIntegration) when the former is not available.
Usage
runCluster(
  object,
  resolution = 1,
  nNeighbors = 20,
  prune = 1/15,
  eps = 0.1,
  nRandomStarts = 10,
  nIterations = 5,
  method = c("leiden", "louvain"),
  useRaw = NULL,
  useDims = NULL,
  groupSingletons = TRUE,
  saveSNN = FALSE,
  clusterName = paste0(method, "_cluster"),
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE)
)
Arguments
object | A liger object. Should have valid factorization result available. |
resolution | Numeric, value of the resolution parameter, a larger value results in a larger number of communities with smaller sizes. Default |
nNeighbors | Integer, the maximum number of nearest neighbors to compute. Default |
prune | Numeric. Sets the cutoff for acceptable Jaccard index when computing the neighborhood overlap for the SNN construction. Any edges with values less than or equal to this will be set to 0 and removed from the SNN graph. Essentially sets the stringency of pruning. |
eps | Numeric, the error bound of the nearest neighbor search. Default |
nRandomStarts | Integer number of random starts. Will pick the membership with highest quality to return. Default |
nIterations | Integer, maximal number of iterations per random start. Default |
method | Community detection algorithm to use. Choose from |
useRaw | Whether to use un-aligned cell factor loadings ( |
useDims | Indices of factors to use for clustering. Default |
groupSingletons | Whether to group single cells that make up their own cluster in with the cluster they are most connected to. Default |
saveSNN | Logical, whether to store the SNN graph, as a dgCMatrix object, in the object. Default |
clusterName | Name of the variable that will store the clustering result in |
seed | Seed of the random number generator. Default |
verbose | Logical. Whether to show information of the progress. Default |
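To make the role of nNeighbors and prune more concrete, the following base-R sketch builds a small shared nearest neighbor (SNN) graph with Jaccard edge weights and applies the pruning rule described for prune. It is a simplified illustration, not the package's implementation: the loading matrix H is random toy data, exact distances from dist() are used instead of an approximate nearest neighbor search, and each cell is counted as its own neighbor.

```r
set.seed(1)
# Hypothetical cell factor loadings: 60 cells x 5 factors
H <- matrix(runif(60 * 5), nrow = 60)
k <- 10          # stands in for nNeighbors
prune <- 1 / 15

# Exact k nearest neighbours of every cell (the cell itself comes first)
d <- as.matrix(dist(H))
knn <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))

# Binary membership matrix: A[i, j] = 1 if j is among the k neighbours of i
n <- nrow(H)
A <- matrix(0, n, n)
A[cbind(rep(seq_len(n), k), as.vector(knn))] <- 1

# Jaccard index between the neighbour sets of every pair of cells:
# |intersection| / |union|, where |union| = 2k - |intersection|
shared <- A %*% t(A)
snn <- shared / (2 * k - shared)

# Edges with weight less than or equal to `prune` are removed
snn[snn <= prune] <- 0
```

With k = 10, a pair of cells sharing a single neighbor has a Jaccard index of 1/19, which falls below 1/15 and is pruned, whereas two shared neighbors (2/18) are enough to keep the edge.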
Value
object with cluster assignment updated in clusterName variable in cellMeta slot. Can be fetched with object[[clusterName]]. If saveSNN = TRUE, the SNN graph will be stored at object@uns$snn.
Examples
pbmcPlot <- runCluster(pbmcPlot, nRandomStarts = 1)
head(pbmcPlot$leiden_cluster)
pbmcPlot <- runCluster(pbmcPlot, method = "louvain")
head(pbmcPlot$louvain_cluster)
Run Gene Ontology enrichment analysis on differentially expressed genes.
Description
This function forms gene sets based on the differential expression result, and calls the gene ontology (GO) analysis method provided by gprofiler2.
Usage
runGOEnrich(
  result,
  group = NULL,
  useBg = TRUE,
  orderBy = NULL,
  logFCThresh = 1,
  padjThresh = 0.05,
  splitReg = FALSE,
  ...
)
Arguments
result | Data frame of unfiltered output from |
group | Selection of one group available from |
useBg | Logical, whether to set all genes involved in DE analysis (before threshold filtering) as a domain background of GO analysis. Default |
orderBy | Name of DE statistics metric to order the gene list for each group. Choose from |
logFCThresh | The absolute valued log2FC threshold above which the genes will be used. Default |
padjThresh | The adjusted p-value threshold less than which the genes will be used. Default |
splitReg | Whether to have queries of both up-regulated and down-regulated genes for each group. Default |
... | Additional arguments passed to
Arguments |
Details
GO term enrichment tests usually run in one of two modes: two-list mode and ranked mode.
Two-list mode comes with a query gene set and a background gene set. A query gene set contains the filtered DEGs in this analysis. A background can be all the genes involved in the DEG test (default, useBg = TRUE), or all annotated genes in the gprofiler2 database (useBg = FALSE).
Ranked mode comes with only one query gene set, which is sorted. It should contain the whole domain background genes, with significant genes coming first. Set orderBy to one of the DE statistics metrics to enable this mode. useBg will be ignored in this mode.
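The two modes differ mainly in how the query is assembled from the DE table. The following is a minimal base R sketch of that idea, not the package's implementation: the helper buildGOQuery and the toy table are hypothetical, and only the column names (feature, logFC, padj) and the threshold arguments follow what is documented on this page. It also assumes that smaller values of the ordering metric mean stronger evidence, which holds for padj but not for every metric.

```r
# Hypothetical helper illustrating the two query modes; not part of rliger.
buildGOQuery <- function(deg, logFCThresh = 1, padjThresh = 0.05,
                         orderBy = NULL) {
    if (!is.null(orderBy)) {
        # Ranked mode: keep every tested gene, most significant first.
        # Assumption: smaller metric value = stronger evidence (e.g. padj).
        ranked <- deg[order(deg[[orderBy]]), ]
        return(list(query = ranked$feature, background = NULL, ordered = TRUE))
    }
    # Two-list mode: filtered DEGs as query, all tested genes as background
    keep <- abs(deg$logFC) > logFCThresh & deg$padj < padjThresh
    list(query = deg$feature[keep], background = deg$feature, ordered = FALSE)
}

# Toy DE result with made-up values
deg <- data.frame(
    feature = c("g1", "g2", "g3", "g4"),
    logFC = c(2.5, -1.8, 0.3, 1.2),
    padj = c(0.001, 0.01, 0.5, 0.2),
    stringsAsFactors = FALSE
)
twoList <- buildGOQuery(deg)
ranked <- buildGOQuery(deg, orderBy = "padj")
```

In two-list mode only g1 and g2 pass both thresholds, while ranked mode keeps all four genes in sorted order.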
Value
A list object where each element is a result list for a group. Each result list contains two elements:
result | data.frame of main GO analysis result. |
meta | Meta information for the query. |
See gprofiler2::gost() for detailed explanation.
References
Kolberg, L. et al, 2020 and Raudvere, U. et al, 2019
Examples
if (requireNamespace("gprofiler2", quietly = TRUE)) {
    go <- runGOEnrich(deg.pw)
}
Analyze biological interpretations of metagene
Description
Identify the biological pathways (gene sets from Reactome) that each metagene (factor) might belong to.
Usage
runGSEA(
  object,
  genesets = NULL,
  useW = TRUE,
  useV = NULL,
  customGenesets = NULL,
  gene_sets = genesets,
  mat_w = useW,
  mat_v = useV,
  custom_gene_sets = customGenesets
)
Arguments
object | A liger object with valid factorization result. |
genesets | Character vector of the Reactome gene set names to be tested. Default |
useW | Logical, whether to use the shared factor loadings ( |
useV | A character vector of the names, a numeric or logical vector of the index of the datasets where the |
customGenesets | A named list of character vectors of entrez gene ids. Default |
gene_sets,mat_w,mat_v,custom_gene_sets | Deprecated. See Usage section for replacement. |
Value
A list of matrices with GSEA analysis for each factor
Examples
if (requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
    requireNamespace("reactome.db", quietly = TRUE) &&
    requireNamespace("fgsea", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
    runGSEA(pbmcPlot)
}
General QC for liger object
Description
Calculate number of UMIs, number of detected features and percentage of feature subset (e.g. mito, ribo and hemo) expression per cell.
Usage
runGeneralQC(
  object,
  organism,
  features = NULL,
  pattern = NULL,
  overwrite = FALSE,
  useDatasets = NULL,
  chunkSize = getOption("ligerChunkSize", 20000),
  verbose = getOption("ligerVerbose", TRUE),
  mito = NULL,
  ribo = NULL,
  hemo = NULL
)
Arguments
object | liger object with |
organism | Specify the organism of the dataset to identify the mitochondrial, ribosomal and hemoglobin genes. Available options are |
features | Feature names matching the feature subsets that users want to calculate the expression percentage with. A vector for a single subset, or a named list for multiple subsets. Default |
pattern | Regex patterns for matching the feature subsets that users want to calculate the expression percentage with. A vector for a single subset, or a named list for multiple subsets. Default |
overwrite | Whether to overwrite existing QC metric variables. Default |
useDatasets | A character vector of the names, a numeric or logical vector of the index of the datasets to be included for QC. Default |
chunkSize | Integer number of cells to include in a chunk when working on HDF5 based dataset. Default |
verbose | Logical. Whether to show information of the progress. Default |
mito,ribo,hemo |
|
Details
This function by default calculates:
nUMI - The column sum of the raw data matrix per cell. Represents the total number of UMIs per cell if given raw counts.
nGene - Number of detected features per cell
mito - Percentage of mitochondrial gene expression per cell
ribo - Percentage of ribosomal gene expression per cell
hemo - Percentage of hemoglobin gene expression per cell
Users can also specify their own feature subsets with argument features, or regular expression patterns that match genes of interest with argument pattern, to calculate the expression percentage. If a character vector is given to features, a QC metric variable named "featureSubset_name" will be computed. If a named list of multiple subsets is given, the names will be used as the variable names. If a single pattern is given to pattern, a QC metric variable named "featureSubset_pattern" will be computed. If a named list of multiple patterns is given, the names will be used as the variable names. Duplicated QC metric names between these two arguments and the default five listed above should be avoided.
This function is run automatically when each liger object is created, to capture the raw status. Argument overwrite is set to FALSE by default to avoid mistakenly updating existing metrics after filtering the object. Users can still opt to update all newly calculated metrics (including the default five) by setting overwrite = TRUE, or only some of the newly calculated ones by providing a character vector of the names of the metrics to update. Intended overwriting only happens to datasets selected with useDatasets.
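For intuition, the default metrics can be reproduced by hand on a small dense matrix. This is only a base R sketch under stated assumptions: the toy counts are made up, and the "^MT-" pattern for human mitochondrial genes is an illustrative choice rather than a statement of what runGeneralQC uses internally.

```r
# Toy raw count matrix: genes in rows, cells in columns (made-up values)
counts <- matrix(
    c(5, 0, 3,
      0, 2, 1,
      5, 0, 0,
      0, 8, 6),
    nrow = 4, byrow = TRUE,
    dimnames = list(c("MT-CO1", "RPS3", "HBB", "CD3E"),
                    c("cell1", "cell2", "cell3"))
)

# nUMI: column sum of the raw counts per cell
nUMI <- colSums(counts)
# nGene: number of features with non-zero counts per cell
nGene <- colSums(counts > 0)
# mito: percentage of counts coming from the matched feature subset.
# Assumption for this sketch: human mitochondrial genes start with "MT-".
isMito <- grepl("^MT-", rownames(counts))
mito <- 100 * colSums(counts[isMito, , drop = FALSE]) / nUMI
```

The same percentage calculation applies to any other subset, whether it is selected by explicit feature names or by a pattern.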
Value
Updated object with the cellMeta(object) updated as intended by users. See Details for more information.
Examples
pbmc <- runGeneralQC(pbmc, "human", overwrite = TRUE)
Perform iNMF on scaled datasets
Description
Performs integrative non-negative matrix factorization (iNMF) (J.D. Welch, 2019) using block coordinate descent (alternating non-negative least squares, ANLS) to return factorized H, W, and V matrices. The objective function is stated as
\arg\min_{H\ge0,W\ge0,V\ge0}\sum_{i=1}^{d}||E_i-(W+V_i)H_i||^2_F+\lambda\sum_{i=1}^{d}||V_iH_i||_F^2
where E_i is the input non-negative matrix of the i-th dataset and d is the total number of datasets. E_i is of size m \times n_i for m variable genes and n_i cells, H_i is of size k \times n_i, V_i is of size m \times k, and W is of size m \times k.
The factorization produces a shared W matrix (genes by k), and for each dataset, an H matrix (k by cells) and a V matrix (genes by k). The H matrices represent the cell factor loadings. W is held consistent among all datasets, as it represents the shared components of the metagenes across datasets. The V matrices represent the dataset-specific components of the metagenes.
This function adopts a highly optimized, fast and memory-efficient implementation extended from Planc (Kannan, 2016). Pre-installation of the extension package RcppPlanc is required. The underlying algorithm adopts the same ANLS strategy as optimizeALS in the old version of LIGER.
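To make the objective concrete, it can be evaluated directly in base R for given matrices. The sketch below is illustrative only and is not how RcppPlanc computes anything: the function name inmfObjective and the toy inputs are hypothetical, and H_i is taken as k by cells so that (W + V_i) H_i matches the genes-by-cells shape of E_i.

```r
# Hypothetical helper: evaluate the iNMF objective for lists of E, V and H
# (one element per dataset) and a shared W. Not part of rliger.
inmfObjective <- function(E, W, V, H, lambda) {
    total <- 0
    for (i in seq_along(E)) {
        # Reconstruction error of dataset i (squared Frobenius norm)
        recon <- (W + V[[i]]) %*% H[[i]]
        total <- total + sum((E[[i]] - recon)^2)
        # Penalty on the dataset-specific component
        total <- total + lambda * sum((V[[i]] %*% H[[i]])^2)
    }
    total
}

set.seed(1)
m <- 6; k <- 2; n <- c(4, 5)          # genes, factors, cells per dataset
W <- matrix(runif(m * k), m, k)
V <- lapply(n, function(ni) matrix(0, m, k))
H <- lapply(n, function(ni) matrix(runif(k * ni), k, ni))
# Build data that W and H reconstruct exactly, with no dataset-specific part
E <- lapply(seq_along(n), function(i) W %*% H[[i]])
objExact <- inmfObjective(E, W, V, H, lambda = 5)
```

With V set to zero and E built from W and H, both terms vanish; any non-zero V contributes to the reconstruction and is additionally penalized by lambda.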
Usage
runINMF(object, k = 20, lambda = 5, ...)
## S3 method for class 'liger'
runINMF(
  object,
  k = 20,
  lambda = 5,
  nIteration = 30,
  nRandomStarts = 1,
  HInit = NULL,
  WInit = NULL,
  VInit = NULL,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
## S3 method for class 'Seurat'
runINMF(
  object,
  k = 20,
  lambda = 5,
  datasetVar = "orig.ident",
  layer = "ligerScaleData",
  assay = NULL,
  reduction = "inmf",
  nIteration = 30,
  nRandomStarts = 1,
  HInit = NULL,
  WInit = NULL,
  VInit = NULL,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
Arguments
object | A liger object or a Seurat object with non-negative scaled data of variable features (Done with |
k | Inner dimension of factorization (number of factors). Generally, a higher |
lambda | Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as |
... | Arguments passed to methods. |
nIteration | Total number of block coordinate descent iterations to perform. Default 30 |
nRandomStarts | Number of restarts to perform (the iNMF objective function is non-convex, so taking the best objective from multiple successive initializations is recommended). For easier reproducibility, this increments the random seed by 1 for each consecutive restart, so future factorization of the same dataset can be run with one rep if necessary. Default 1 |
HInit | Initial values to use for |
WInit | Initial values to use for |
VInit | Initial values to use for |
seed | Random seed to allow reproducible results. Default 1 |
nCores | The number of parallel tasks to speed up the computation. Default 2L |
verbose | Logical. Whether to show information of the progress. Default |
datasetVar | Metadata variable name that stores the dataset source annotation. Default "orig.ident" |
layer | For Seurat>=4.9.9, the name of layer to retrieve input non-negative scaled data. Default "ligerScaleData" |
assay | Name of assay to use. Default |
reduction | Name of the reduction to store result. Also used as the feature key. Default "inmf" |
Value
liger method - Returns updated input liger object
A list of all H matrices can be accessed with getMatrix(object, "H")
A list of all V matrices can be accessed with getMatrix(object, "V")
The W matrix can be accessed with getMatrix(object, "W")
Seurat method - Returns updated input Seurat object
H matrices for all datasets will be concatenated and transposed (all cells by k), and form a DimReduc object in the reductions slot named by argument reduction.
W matrix will be presented as feature.loadings in the same DimReduc object.
V matrices, an objective error value and the dataset variable used for the factorization are currently stored in the misc slot of the same DimReduc object.
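The "concatenated and transposed" layout described for the Seurat method can be pictured with plain matrices. This is a base R sketch with made-up values and hypothetical cell names; it does not construct a DimReduc object.

```r
k <- 3
# Toy per-dataset H matrices, each k by cells (made-up values)
H <- list(
    ctrl = matrix(1, nrow = k, ncol = 4,
                  dimnames = list(NULL, paste0("ctrl_", 1:4))),
    stim = matrix(2, nrow = k, ncol = 2,
                  dimnames = list(NULL, paste0("stim_", 1:2)))
)
# Concatenate along cells (columns), then transpose to cells by k
cellEmbeddings <- t(do.call(cbind, unname(H)))
dim(cellEmbeddings)
```

Each row of the result is one cell and each column one factor, which is the orientation a cell embedding is expected to have.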
Difference from optimizeALS()
In the old implementation, the objective error was computed at the end of each iteration and compared against an argument thresh to decide whether the algorithm had converged. Since computing the objective error is expensive, this feature has been removed, and the function now directly runs a default of 30 (nIteration) iterations, which empirically leads to convergence most of the time. Given that the new version is highly optimized, running this many iterations should be acceptable.
References
Joshua D. Welch et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, 2019
Examples
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- runINMF(pbmc)
}
Integrate scaled datasets with iNMF or variant methods
Description
LIGER provides dataset integration methods based on iNMF (integrative Non-negative Matrix Factorization [1]) and its variants (online iNMF [2] and UINMF [3]). This function wraps runINMF, runOnlineINMF and runUINMF, whose help pages have more detailed descriptions.
Usage
runIntegration(
  object,
  k = 20,
  lambda = 5,
  method = c("iNMF", "onlineINMF", "UINMF"),
  ...
)
## S3 method for class 'liger'
runIntegration(
  object,
  k = 20,
  lambda = 5,
  method = c("iNMF", "onlineINMF", "UINMF"),
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
## S3 method for class 'Seurat'
runIntegration(
  object,
  k = 20,
  lambda = 5,
  method = c("iNMF", "onlineINMF"),
  datasetVar = "orig.ident",
  useLayer = "ligerScaleData",
  assay = NULL,
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
Arguments
object | A liger object or a Seurat object with non-negative scaled data of variable features (Done with |
k | Inner dimension of factorization (number of factors). Generally, a higher |
lambda | Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as |
method | iNMF variant algorithm to use for integration. Choose from |
... | Arguments passed to other methods and wrapped functions. |
seed | Random seed to allow reproducible results. Default 1 |
verbose | Logical. Whether to show information of the progress. Default |
datasetVar | Metadata variable name that stores the dataset source annotation. Default "orig.ident" |
useLayer | For Seurat>=4.9.9, the name of layer to retrieve input non-negative scaled data. Default "ligerScaleData" |
assay | Name of assay to use. Default |
Value
Updated input object. For details, please refer to the corresponding method linked in Description.
References
1. Joshua D. Welch et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, 2019
2. Chao Gao et al., Iterative single-cell multi-omic integration using online learning, Nat Biotechnol., 2021
3. April R. Kriebel and Joshua D. Welch, UINMF performs mosaic integration of single-cell multi-omic datasets using nonnegative matrix factorization, Nat. Comm., 2022
Examples
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- runIntegration(pbmc)
}
Perform online iNMF on scaled datasets
Description
Perform online integrative non-negative matrix factorization to represent multiple single-cell datasets in terms of H, W, and V matrices. It optimizes the iNMF objective function (see runINMF) using online learning (non-negative least squares for H matrices, and hierarchical alternating least squares (HALS) for V matrices and W), where the number of factors is set by k. The function allows online learning in 3 scenarios:
1. Fully observed datasets;
2. Iterative refinement using continually arriving datasets;
3. Projection of new datasets without updating the existing factorization.
All three scenarios require fixed memory independent of the number of cells.
For each dataset, this factorization produces an H matrix (k by cell), a V matrix (genes by k), and a shared W matrix (genes by k). The H matrices represent the cell factor loadings. W is identical among all datasets, as it represents the shared components of the metagenes across datasets. The V matrices represent the dataset-specific components of the metagenes.
Usage
runOnlineINMF(object, k = 20, lambda = 5, ...)
## S3 method for class 'liger'
runOnlineINMF(
  object,
  k = 20,
  lambda = 5,
  newDatasets = NULL,
  projection = FALSE,
  maxEpochs = 5,
  HALSiter = 1,
  minibatchSize = 5000,
  HInit = NULL,
  WInit = NULL,
  VInit = NULL,
  AInit = NULL,
  BInit = NULL,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
## S3 method for class 'Seurat'
runOnlineINMF(
  object,
  k = 20,
  lambda = 5,
  datasetVar = "orig.ident",
  layer = "ligerScaleData",
  assay = NULL,
  reduction = "onlineINMF",
  maxEpochs = 5,
  HALSiter = 1,
  minibatchSize = 5000,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
Arguments
object | liger object. Scaled data required. |
k | Inner dimension of factorization, i.e. the number of metagenes. A value in the range 20-50 works well for most analyses. Default 20 |
lambda | Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). We recommend always using the default value except possibly for analyses with relatively small differences (biological replicates, male/female comparisons, etc.), in which case a lower value such as 1.0 may improve reconstruction quality. Default 5 |
... | Arguments passed to other S3 methods of this function. |
newDatasets | Named list of dgCMatrix-class objects. New datasets for scenario 2 or scenario 3. Default |
projection | Whether to perform data integration with scenario 3 when |
maxEpochs | The number of epochs to iterate through. See Details. Default 5 |
HALSiter | Maximum number of block coordinate descent (HALS algorithm) iterations to perform for each update of |
minibatchSize | Total number of cells in each minibatch. See Details. Default 5000 |
HInit,WInit,VInit,AInit,BInit | Optional initialization for |
seed | Random seed to allow reproducible results. Default 1 |
nCores | The number of parallel tasks to speed up the computation. Default 2L |
verbose | Logical. Whether to show information of the progress. Default |
datasetVar | Metadata variable name that stores the dataset source annotation. Default "orig.ident" |
layer | For Seurat>=4.9.9, the name of layer to retrieve input non-negative scaled data. Default "ligerScaleData" |
assay | Name of assay to use. Default |
reduction | Name of the reduction to store result. Also used as the feature key. Default "onlineINMF" |
Details
To perform scenario 2 or 3, a complete factorization result from a run of scenario 1 is required. Given the structure of a liger object, all of the required information can be retrieved automatically. For cases where users need customized information for an existing factorization, arguments WInit, VInit, AInit and BInit are exposed. The requirements for these arguments are as follows:
HInit - A list object of matrices each of size k \times n_i. Number of matrices should match with newDatasets.
WInit - A matrix object of size m \times k. (see runINMF for notation)
VInit - A list object of matrices each of size m \times k. Number of matrices should match with newDatasets.
AInit - A list object of matrices each of size k \times k. Number of matrices should match with newDatasets.
BInit - A list object of matrices each of size m \times k. Number of matrices should match with newDatasets.
Minibatch iterations are performed on small subsets of cells. The exact minibatch size applied to each dataset is minibatchSize multiplied by the proportion of cells in this dataset out of all cells. In general, minibatchSize should be no larger than the number of cells in the smallest dataset (considering both object and newDatasets). Therefore, a smaller value may be necessary for analyzing very small datasets.
An epoch is one completion of calculation on all cells after a number of iterations of minibatches. Therefore, the total number of iterations is determined by the setting of maxEpochs, the total number of cells, and minibatchSize.
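The arithmetic behind these two paragraphs can be worked through in a few lines of base R. The cell counts below are made up, and the iteration count is an approximation: treating one epoch as total cells divided by minibatchSize is an assumption of this sketch, since the exact rounding used internally is not stated here.

```r
# Made-up dataset sizes
nCells <- c(ctrl = 3000, stim = 2000)
minibatchSize <- 500
maxEpochs <- 5

# Per-dataset minibatch size: minibatchSize times the dataset's share of cells
perDataset <- minibatchSize * nCells / sum(nCells)

# Assumption: one epoch takes about (total cells / minibatchSize) iterations
itersPerEpoch <- sum(nCells) / minibatchSize
totalIters <- maxEpochs * itersPerEpoch
```

Here each minibatch draws 300 cells from "ctrl" and 200 from "stim", so ten minibatches cover all 5000 cells once.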
Currently, the Seurat S3 method does not support Scenarios 2 and 3, because there is no simple solution for organizing a number of miscellaneous matrices with a single Seurat object. We strongly recommend that users create a liger object, which has the specific structure.
Value
liger method - Returns updated input liger object.
A list of all H matrices can be accessed with getMatrix(object, "H")
A list of all V matrices can be accessed with getMatrix(object, "V")
The W matrix can be accessed with getMatrix(object, "W")
Meanwhile, intermediate matrices A and B produced in the HALS update can also be accessed similarly.
Seurat method - Returns updated input Seurat object.
H matrices for all datasets will be concatenated and transposed (all cells by k), and form a DimReduc object in the reductions slot named by argument reduction.
W matrix will be presented as feature.loadings in the same DimReduc object.
V matrices, A matrices, B matrices, an objective error value and the dataset variable used for the factorization are currently stored in the misc slot of the same DimReduc object.
References
Chao Gao et al., Iterative single-cell multi-omic integration using online learning, Nat Biotechnol., 2021
Examples
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    # Scenario 1
    pbmc <- runOnlineINMF(pbmc, minibatchSize = 200)
    # Scenario 2
    # Fake new dataset by increasing all non-zero value in "ctrl" by 1
    ctrl2 <- rawData(dataset(pbmc, "ctrl"))
    ctrl2@x <- ctrl2@x + 1
    colnames(ctrl2) <- paste0(colnames(ctrl2), 2)
    pbmc2 <- runOnlineINMF(pbmc, k = 20, newDatasets = list(ctrl2 = ctrl2),
                           minibatchSize = 100)
    # Scenario 3
    pbmc3 <- runOnlineINMF(pbmc, k = 20, newDatasets = list(ctrl2 = ctrl2),
                           projection = TRUE)
}
Find DEG between groups
Description
Two methods are supported: "pseudoBulk" and "wilcoxon". The pseudo-bulk method aggregates cells based on biological replicates and calls a bulk RNA-seq DE method, the DESeq2 Wald test, while the Wilcoxon rank-sum test is performed at the single-cell level. runPairwiseDEG() is generally used for flexibly comparing two specific groups of cells, while runMarkerDEG() is used for a one-vs-rest marker test strategy.
When using the pseudo-bulk method, it is generally recommended that you have these variables available in your object:
The cell type or cluster labeling. This can be obtained from a prior study or computed with runCluster
The biological replicate labeling, most of the time the "dataset" variable automatically generated when the liger object is created. Users may use other variables if a "dataset" is merged from multiple replicates.
The condition labeling that reflects the study design, such as the treatment or disease status for each sample/dataset.
Please see below for detailed scenarios.
Usage
runPairwiseDEG(
  object,
  groupTest,
  groupCtrl,
  variable1 = NULL,
  variable2 = NULL,
  splitBy = NULL,
  method = c("pseudoBulk", "wilcoxon"),
  usePeak = FALSE,
  useReplicate = "dataset",
  nPsdRep = NULL,
  minCellPerRep = 3,
  printDiagnostic = FALSE,
  chunk = NULL,
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE)
)
runMarkerDEG(
  object,
  conditionBy = NULL,
  splitBy = NULL,
  method = c("pseudoBulk", "wilcoxon"),
  useDatasets = NULL,
  usePeak = FALSE,
  useReplicate = "dataset",
  nPsdRep = NULL,
  minCellPerRep = 3,
  printDiagnostic = FALSE,
  chunk = NULL,
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE)
)
runWilcoxon(
  object,
  data.use = NULL,
  compare.method = c("clusters", "datasets")
)
Arguments
object | A liger object, with normalized data available |
groupTest,groupCtrl,variable1,variable2 | Condition specification. See |
splitBy | Name(s) of the variable(s) in |
method | DEG test method to use. Choose from |
usePeak | Logical. Whether to use peak count instead of gene count. Only supported when ATAC datasets are involved. Default FALSE |
useReplicate |
|
nPsdRep | Number of pseudo-replicates to create. Only used when |
minCellPerRep | Numeric, will not make pseudo-bulk for a replicate with fewer than this number of cells. Default 3 |
printDiagnostic | Logical. Whether to show more detail when |
chunk | Number of features to process at a time during Wilcoxon test. Useful when memory is limited. Default |
seed | Random seed to use for pseudo-replicate generation. Default 1 |
verbose | Logical. Whether to show information of the progress. Default |
conditionBy |
|
useDatasets | Datasets to perform marker detection within. Default |
data.use | Same as |
compare.method | Choose from |
Value
A data.frame of DEG information with all or some of the following fields:
feature | Gene names |
group | Test group name. Multiple tests might be present for each function call. This is the main variable to distinguish the tests. For a pairwise test, a row with a certain group name represents the test result of this group against the other control group. When split by a variable, it is presented in "split.group" format, meaning the stats come from comparing the group in the split level against the control group in the same split level. When running marker detection without splitting, a row with group "a" represents the stats of the gene in group "a" against all other cells. When running split marker detection, the group name is in "split.group" format, meaning the stats come from comparing the group in the split level against all other cells in the same split level. |
logFC | Log fold change |
pval | P-value |
padj | Adjusted p-value |
avgExpr | Mean expression in the test group indicated by the "group" field. Only available for wilcoxon tests. |
statistic | Wilcoxon rank-sum test statistic. Only available for wilcoxon tests. |
auc | Area under the ROC curve. Only available for wilcoxon tests. |
pct_in | Percentage of cells in the test group, indicated by the "group" field, that express the feature. Only available for wilcoxon tests. |
pct_out | Percentage of cells in the control group or other cells, as explained for the "group" field, that express the feature. Only available for wilcoxon tests. |
Using Wilcoxon rank-sum test
The Wilcoxon rank-sum test works on each gene and is based on the rank of the expression in each cell. LIGER provides dataset integration but does not "correct" the expression values. Projects with strong batch effects, or that integrate drastically different modalities, should use this method with caution.
Comparing difference between/across cell types
Most of the time, users want to know which cell type each cluster corresponds to after clustering. This can be done with a marker detection method that tests each cluster against all the other cells, using a command like runMarkerDEG(object, conditionBy = "cluster_var"). When using the default pseudo-bulk method, users should additionally determine the pseudo-bulk setup parameters. If the real biological replicate variable is available, it should be supplied to argument useReplicate; otherwise, pseudo-replicates should be created. See the "Pseudo-Replicate" section for more.
Compare between conditions
It is frequently necessary to identify the difference between conditions. Users can simply set conditionBy = "condition_var". However, most of the time, such comparisons should ideally be done in a per-cluster manner. This can be done by setting splitBy = "cluster_var". This will run a loop over each cluster and, within that group of cells, compare each condition against all other cells in the cluster.
In the scenario where users only need to compare two conditions for each cluster, running runPairwiseDEG(object, groupTest = "condition1", groupCtrl = "condition2", variable1 = "condition_var", splitBy = "cluster_var") would address the need.
For both use cases, if the pseudo-bulk (default) method is used, users should determine the pseudo-bulk setup parameters as mentioned in the previous section.
Detailed runMarkerDEG usage
Marker detection is performed in a one vs. rest manner. The grouping of such conditions is specified by conditionBy, which should be a column name in cellMeta. When splitBy is specified as another variable name in cellMeta, the marker detection will be done iteratively within each level of the splitBy variable.
For example, when conditionBy = "celltype" and splitBy = NULL, marker detection will be performed by comparing all cells of "celltype_i" against all other cells, and so on. This is analogous to running runWilcoxon(compare.method = "clusters") in the old version.
When conditionBy = "gender" and splitBy = "leiden_cluster", marker detection will be performed by comparing "gender_i" cells from "cluster_j" against other cells from "cluster_j", and so on. This is analogous to running runWilcoxon(compare.method = "datasets") in the old version.
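The one-vs-rest grouping, with and without splitting, can be enumerated explicitly. The helper below is a hypothetical base R sketch of the bookkeeping only (which cells form each test and rest group, and how the "split.group" names arise); it is not taken from the package and runs no statistical test.

```r
# Hypothetical helper: list the test/rest cell indices for each comparison.
enumerateMarkerTests <- function(condition, split = NULL) {
    condition <- as.character(condition)
    tests <- list()
    if (is.null(split)) {
        # No splitting: each level against all other cells
        for (g in unique(condition)) {
            tests[[g]] <- list(test = which(condition == g),
                               rest = which(condition != g))
        }
        return(tests)
    }
    split <- as.character(split)
    for (s in unique(split)) {
        # Within one split level: each condition against the rest of that level
        for (g in unique(condition[split == s])) {
            tests[[paste(s, g, sep = ".")]] <- list(
                test = which(split == s & condition == g),
                rest = which(split == s & condition != g)
            )
        }
    }
    tests
}

# Made-up metadata for six cells
gender <- c("F", "M", "F", "M", "F", "M")
cluster <- c("0", "0", "0", "1", "1", "1")
tests <- enumerateMarkerTests(gender, cluster)
names(tests)
```

Each element name follows the "split.group" pattern, and the rest group never leaves the split level of its test group.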
Detailed runPairwiseDEG usage
Users can select classes of cells from a variable in cellMeta. variable1 and variable2 are used to specify a column in cellMeta, and groupTest and groupCtrl are used to specify existing classes from variable1 and variable2, respectively. When variable2 is missing, groupCtrl will be considered from variable1.
For example, when variable1 = "celltype" and variable2 = NULL, groupTest and groupCtrl should be valid cell types in object$celltype.
When variable1 is "celltype" and variable2 is "gender", groupTest should be a valid cell type from object$celltype and groupCtrl should be a valid class from object$gender.
When both variable1 and variable2 are missing, groupTest and groupCtrl should be valid indices of cells in object.
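The three ways of specifying groups can be summarized as a small lookup. This base R sketch uses a hypothetical helper, selectGroups, and a made-up metadata table; it only mirrors the rules stated above and makes no claim about how the package resolves cells that end up in both groups, or the case where only variable2 is given.

```r
# Hypothetical helper mirroring the selection rules; not part of rliger.
selectGroups <- function(cellMeta, groupTest, groupCtrl,
                         variable1 = NULL, variable2 = NULL) {
    if (is.null(variable1) && is.null(variable2)) {
        # Both missing: groups are taken as cell indices directly
        return(list(test = groupTest, ctrl = groupCtrl))
    }
    # variable2 missing: control classes are looked up in variable1
    if (is.null(variable2)) variable2 <- variable1
    list(
        test = which(cellMeta[[variable1]] %in% groupTest),
        ctrl = which(cellMeta[[variable2]] %in% groupCtrl)
    )
}

# Made-up metadata for five cells
cellMeta <- data.frame(
    celltype = c("T", "B", "T", "NK", "B"),
    gender = c("F", "M", "M", "F", "F"),
    stringsAsFactors = FALSE
)
sameVar <- selectGroups(cellMeta, "T", "B", variable1 = "celltype")
twoVar <- selectGroups(cellMeta, "T", "M",
                       variable1 = "celltype", variable2 = "gender")
byIndex <- selectGroups(cellMeta, 1:2, 3:5)
```

Note that in the two-variable case the third cell is both a "T" cell and an "M" cell, so the two selections can overlap.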
Pseudo-Replicate
Pseudo-replicate assignment is a technique to compensate for the lack of real biological replicates when using pseudo-bulk DE methods. LIGER's pseudo-bulk method generally requires that each comparison group has at least 3 replicates, each composed of at least 3 cells, in order to ensure statistical power. When fewer than 3 real replicates are found for a comparison, the default setting (nPsdRep = NULL) splits each into 3 pseudo-replicates; otherwise no pseudo-replicates are automatically generated. When nPsdRep is given a number, LIGER will always go through each comparison group and split each real replicate into the given number of pseudo-replicates.
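Splitting one real replicate into pseudo-replicates and aggregating counts can be sketched in base R as follows. The helper makePseudoReps and the toy matrix are hypothetical, and the random near-equal assignment is an assumption of this sketch rather than a description of LIGER's exact procedure.

```r
# Hypothetical helper: split the cells of one replicate into nPsdRep random
# groups and sum their counts into pseudo-bulk columns.
makePseudoReps <- function(counts, nPsdRep = 3, seed = 1) {
    set.seed(seed)
    nCells <- ncol(counts)
    # Randomly assign each cell a pseudo-replicate label, near-equal sizes
    repLabel <- sample(rep_len(seq_len(nPsdRep), nCells))
    # Sum counts of cells sharing a label: genes by pseudo-replicates
    pseudoBulk <- t(rowsum(t(counts), group = repLabel))
    colnames(pseudoBulk) <- paste0("rep", colnames(pseudoBulk))
    pseudoBulk
}

# Toy counts: 2 genes by 12 cells (made-up values)
counts <- matrix(1:24, nrow = 2,
                 dimnames = list(c("g1", "g2"), paste0("c", 1:12)))
pb <- makePseudoReps(counts, nPsdRep = 3)
dim(pb)
```

Aggregation preserves the per-gene totals, so the pseudo-bulk columns only redistribute the counts of the original replicate.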
Examples
pbmc$leiden_cluster <- pbmcPlot$leiden_cluster
# Identify cluster markers
degStats1 <- runMarkerDEG(pbmc, conditionBy = "leiden_cluster")
# Compare "stim" data against "ctrl" data within each cluster
degStats3 <- runPairwiseDEG(pbmc, groupTest = "stim", groupCtrl = "ctrl",
                            variable1 = "dataset",
                            splitBy = "leiden_cluster",
                            minCellPerRep = 4)
Perform t-SNE dimensionality reduction
Description
Runs t-SNE on the aligned cell factors (result from alignFactors), or unaligned cell factors (result from runIntegration) to generate a 2D embedding for visualization. By default the Rtsne (Barnes-Hut implementation of t-SNE) method is invoked, while the alternative "fftRtsne" method (FFT-accelerated Interpolation-based t-SNE, using the Kluger Lab implementation) is also supported. For very large datasets, it is recommended to use method = "fftRtsne" due to its efficiency and scalability.
Extra external installation steps are required for using the "fftRtsne" method. Please consult the detailed guide.
Usage
runTSNE(
  object,
  useRaw = NULL,
  useDims = NULL,
  nDims = 2,
  usePCA = FALSE,
  perplexity = 30,
  theta = 0.5,
  method = c("Rtsne", "fftRtsne"),
  dimredName = "TSNE",
  asDefault = NULL,
  fitsnePath = NULL,
  seed = 42,
  verbose = getOption("ligerVerbose", TRUE),
  k = nDims,
  use.raw = useRaw,
  dims.use = useDims,
  use.pca = usePCA,
  fitsne.path = fitsnePath,
  rand.seed = seed
)
Arguments
object | liger object with factorization results. |
useRaw | Whether to use un-aligned cell factor loadings ( |
useDims | Index of factors to use for computing the embedding. Default |
nDims | Number of dimensions to reduce to. Default 2 |
usePCA | Whether to perform initial PCA step for Rtsne. Default FALSE |
perplexity | Numeric parameter to pass to Rtsne (expected number of neighbors). Default 30 |
theta | Speed/accuracy trade-off (increase for less accuracy), set to |
method | Choose from |
dimredName | Name of the variable in |
asDefault | Logical, whether to set the resulting dimRed as default for visualization. Default |
fitsnePath | Path to the cloned FIt-SNE directory (i.e. |
seed | Random seed for reproducibility. Default |
verbose | Logical. Whether to show information of the progress. Default |
use.raw,dims.use,k,use.pca,fitsne.path,rand.seed | Deprecated. See Usage section for replacement. |
Value
The object where a "TSNE" variable is updated in the cellMeta slot with the whole 2D embedding matrix.
See Also
Examples
pbmc <- runTSNE(pbmcPlot)
Perform Mosaic iNMF (UINMF) on scaled datasets with unshared features
Description
Performs mosaic integrative non-negative matrix factorization (UINMF) (A.R. Kriebel, 2022) using block coordinate descent (alternating non-negative least squares, ANLS) to return factorized H, W, V and U matrices. The objective function is stated as
\arg\min_{H\ge0,W\ge0,V\ge0,U\ge0}\sum_{i=1}^{d}||\begin{bmatrix}E_i \\ P_i \end{bmatrix} -(\begin{bmatrix}W \\ 0 \end{bmatrix}+\begin{bmatrix}V_i \\ U_i \end{bmatrix})H_i||^2_F+\sum_{i=1}^{d}\lambda_i||\begin{bmatrix}V_i \\ U_i \end{bmatrix}H_i||_F^2
where E_i is the input non-negative matrix of the i-th dataset, P_i is the input non-negative matrix for the unshared features, and d is the total number of datasets. E_i is of size m \times n_i for m shared features and n_i cells, P_i is of size u_i \times n_i for u_i unshared features, H_i is of size k \times n_i, V_i is of size m \times k, W is of size m \times k and U_i is of size u_i \times k.
The factorization produces a shared W matrix (genes by k) and, for each dataset, an H matrix (k by cells), a V matrix (genes by k) and a U matrix (unshared genes by k). The H matrices represent the cell factor loadings. W is held consistent among all datasets, as it represents the shared components of the metagenes across datasets. The V matrices represent the dataset-specific components of the metagenes. The U matrices are similar to the V matrices but represent the loading contributed by unshared features.
This function adopts a highly optimized, fast and memory-efficient implementation extended from Planc (Kannan, 2016). Pre-installation of the extension package RcppPlanc is required. The underlying algorithm adopts the same ANLS strategy as optimizeALS(unshared = TRUE) in the old version of LIGER.
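The block structure of the UINMF objective becomes clearer when the stacked matrices are built explicitly. The base R sketch below is illustrative only: the function uinmfObjective and the toy inputs are hypothetical, a scalar lambda is recycled across datasets as an assumption, and nothing here reflects how RcppPlanc performs the optimization.

```r
# Hypothetical helper: evaluate the UINMF objective for lists of E, P, V, U
# and H (one element per dataset) and a shared W. Not part of rliger.
uinmfObjective <- function(E, P, W, V, U, H, lambda) {
    lambda <- rep_len(lambda, length(E))   # assumption: recycle a scalar
    total <- 0
    for (i in seq_along(E)) {
        X <- rbind(E[[i]], P[[i]])                         # shared + unshared
        Wpad <- rbind(W, matrix(0, nrow(U[[i]]), ncol(W))) # W padded with 0
        VU <- rbind(V[[i]], U[[i]])                        # dataset-specific
        total <- total +
            sum((X - (Wpad + VU) %*% H[[i]])^2) +
            lambda[i] * sum((VU %*% H[[i]])^2)
    }
    total
}

# Smallest possible case: 1 shared gene, 1 unshared gene, 1 factor, 1 cell
one <- matrix(1, 1, 1)
zero <- matrix(0, 1, 1)
obj <- uinmfObjective(E = list(one), P = list(one), W = one,
                      V = list(zero), U = list(one), H = list(one),
                      lambda = 2)
```

In this toy case the stacked input is reconstructed exactly, so the value comes entirely from the penalty on the unshared loading U.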
Usage
runUINMF(object, k = 20, lambda = 5, ...)
## S3 method for class 'liger'
runUINMF(
  object,
  k = 20,
  lambda = 5,
  nIteration = 30,
  nRandomStarts = 1,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
Arguments
object | liger object. Should run |
k | Inner dimension of factorization (number of factors). Generally, a higher |
lambda | Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as |
... | Arguments passed to other methods and wrapped functions. |
nIteration | Total number of block coordinate descent iterations to perform. Default 30 |
nRandomStarts | Number of restarts to perform (the iNMF objective function is non-convex, so taking the best objective from multiple successive initializations is recommended). For easier reproducibility, this increments the random seed by 1 for each consecutive restart, so future factorization of the same dataset can be run with one rep if necessary. Default 1 |
seed | Random seed to allow reproducible results. Default 1 |
nCores | The number of parallel tasks to speed up the computation. Default 2L |
verbose | Logical. Whether to show information of the progress. Default |
Value
liger method - Returns updated input liger object.
A list of all H matrices can be accessed with getMatrix(object, "H")
A list of all V matrices can be accessed with getMatrix(object, "V")
The W matrix can be accessed with getMatrix(object, "W")
A list of all U matrices can be accessed with getMatrix(object, "U")
Note
Currently, the Seurat S3 method is not supported for UINMF because there is no simple solution for organizing a number of miscellaneous matrices with a single Seurat object. We strongly recommend that users create a liger object, which has the specific structure.
References
April R. Kriebel and Joshua D. Welch, UINMF performs mosaic integration of single-cell multi-omic datasets using nonnegative matrix factorization, Nat. Comm., 2022
Examples
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc, useUnsharedDatasets = c("ctrl", "stim"))
pbmc <- scaleNotCenter(pbmc)
if (!is.null(getMatrix(pbmc, "scaleUnsharedData", "ctrl")) &&
    !is.null(getMatrix(pbmc, "scaleUnsharedData", "stim"))) {
    # TODO: unshared variable features cannot be detected from this example
    pbmc <- runUINMF(pbmc)
}
Perform UMAP Dimensionality Reduction
Description
Run UMAP on the aligned cell factors (result from alignFactors), or unaligned cell factors (raw result from runIntegration), to generate a 2D embedding for visualization (or general dimensionality reduction). Has an option to run on a subset of factors. It is generally recommended to use this method for dimensionality reduction with extremely large datasets. The underlying UMAP calculation uses umap from the uwot package.
Usage
runUMAP(
  object, useRaw = NULL, useDims = NULL, nDims = 2,
  distance = c("cosine", "euclidean", "manhattan", "hamming"),
  nNeighbors = 20, minDist = 0.1, dimredName = "UMAP", asDefault = NULL,
  seed = 42, verbose = getOption("ligerVerbose", TRUE),
  k = nDims, use.raw = useRaw, dims.use = useDims, n_neighbors = nNeighbors,
  min_dist = minDist, rand.seed = seed, ...
)
Arguments
object | liger object with factorization results. |
useRaw | Whether to use un-aligned cell factor loadings ( |
useDims | Index of factors to use for computing the embedding. Default NULL. |
nDims | Number of dimensions to reduce to. Default 2. |
distance | Character. Metric used to measure distance in the input space. Default "cosine". |
nNeighbors | Number of neighboring points used in local approximations of manifold structure. Default 20. |
minDist | Numeric. Controls how tightly the embedding is allowed to compress points together. Default 0.1. |
dimredName | Name of the variable in |
asDefault | Logical, whether to set the resulting dimRed as default for visualization. Default NULL. |
seed | Random seed for reproducibility. Default 42. |
verbose | Logical. Whether to show information of the progress. Default getOption("ligerVerbose", TRUE). |
k,use.raw,dims.use,n_neighbors,min_dist,rand.seed | Deprecated. See Usage section for replacement. |
... | Additional argument passed to |
Details
For nNeighbors, larger values will result in more global structure being preserved at the loss of detailed local structure. In general this parameter should often be in the range 5 to 50, with a choice of 10 to 15 being a sensible default.
For minDist, larger values ensure embedded points are more evenly distributed, while smaller values allow the algorithm to optimize more accurately with regard to local structure. Sensible values are in the range 0.001 to 0.5, with 0.1 being a reasonable default.
Value
The object where a "UMAP" variable is updated in the cellMeta slot with the whole 2D embedding matrix.
See Also
Examples
pbmc <- runUMAP(pbmcPlot)
Scale genes by root-mean-square across cells
Description
This function scales normalized gene expression data after variable genes have been selected. We do not mean-center the data before scaling in order to address the non-negativity constraint of NMF. The computation applied to each normalized dataset matrix can be written as the following equation:
S_{i,j}=\frac{N_{i,j}}{\sqrt{\sum_{p}^{n}\frac{N_{i,p}^2}{n-1}}}
where N denotes the normalized matrix for an individual dataset, S is the output scaled matrix for this dataset, and n is the number of cells in this dataset. i, j denote the specific gene and cell index, and p is the cell iterator.
Please see the Methylation dataset section below for an explanation of how methylation data is handled.
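The scaling equation above can be sketched in a few lines of base R on a small dense matrix. This is a minimal illustration, not the package implementation: scaleRowsNoCenter is a hypothetical helper name, and setting all-zero genes to zero is an assumption made here to avoid division by zero.

```r
# Hypothetical helper: divide each gene (row) by its root-mean-square across
# cells, using n - 1 in the denominator as in the equation, without centering.
scaleRowsNoCenter <- function(N) {
  n <- ncol(N)
  rms <- sqrt(rowSums(N^2) / (n - 1))
  S <- N / rms              # recycling divides row i by rms[i]
  S[rms == 0, ] <- 0        # assumption: all-zero genes stay at zero
  S
}

# Toy normalized matrix, genes in rows and cells in columns
N <- matrix(c(1, 0, 2, 0,
              0, 3, 0, 4),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("geneA", "geneB"), paste0("cell", 1:4)))
S <- scaleRowsNoCenter(N)
```

Because no centering is applied, zeros stay zero and all values remain non-negative. For a dense matrix the result agrees with t(scale(t(N), center = FALSE)) from base R, which uses the same root-mean-square definition.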
Usage
scaleNotCenter(object, ...)
## S3 method for class 'dgCMatrix'
scaleNotCenter(object, features, scaleFactor = NULL, ...)
## S3 method for class 'DelayedArray'
scaleNotCenter(
  object, features, scaleFactor = NULL, geneRootMeanSq = NULL,
  overwrite = FALSE, chunk = getOption("ligerChunkSize", 20000),
  verbose = getOption("ligerVerbose", TRUE), ...
)
## S3 method for class 'ligerDataset'
scaleNotCenter(
  object, features = NULL, scaleFactor = NULL,
  chunk = getOption("ligerChunkSize", 20000),
  verbose = getOption("ligerVerbose", TRUE), ...
)
## S3 method for class 'ligerMethDataset'
scaleNotCenter(
  object, features = NULL, verbose = getOption("ligerVerbose", TRUE), ...
)
## S3 method for class 'liger'
scaleNotCenter(
  object, useDatasets = NULL, features = varFeatures(object),
  verbose = getOption("ligerVerbose", TRUE), remove.missing = NULL, ...
)
## S3 method for class 'Seurat'
scaleNotCenter(
  object, assay = NULL, layer = "ligerNormData", save = "ligerScaleData",
  datasetVar = "orig.ident", features = NULL, ...
)
Arguments
object | liger object, ligerDataset object, dgCMatrix-class object, or a Seurat object. |
... | Arguments passed to other methods. The order goes by: "liger" method calls "ligerDataset" method, which then calls "dgCMatrix" method. "Seurat" method directly calls "dgCMatrix" method. |
features | Character, numeric or logical index that chooses the variable features to be scaled. "liger" method by default uses varFeatures(object). |
scaleFactor | Numeric vector of scaling factors to normalize the raw counts to unit sum. This is pre-calculated at liger object creation (stored as |
geneRootMeanSq | Numeric vector of root-mean-square of unit-normalized expression for each gene. This is pre-calculated at the call of |
overwrite | Logical. When writing a newly computed HDF5 array to a separate HDF5 file, whether to overwrite the existing file. Default FALSE. |
chunk | Integer. Maximum number of cells in each chunk when scaling is applied to any HDF5-based dataset. Default getOption("ligerChunkSize", 20000). |
verbose | Logical. Whether to show information of the progress. Default getOption("ligerVerbose", TRUE). |
useDatasets | A character vector of the names, or a numeric or logical vector of the index, of the datasets to be scaled but not centered. Default NULL. |
remove.missing | Deprecated. The functionality of this is covered through other parts of the whole workflow and is no longer needed. Will be ignored if specified. |
assay | Name of assay to use. Default NULL. |
layer | For Seurat>=4.9.9, the name of the layer to retrieve normalized data from. Default "ligerNormData". |
save | For Seurat>=4.9.9, the name of the layer to store scaled data. Default "ligerScaleData". |
datasetVar | Metadata variable name that stores the dataset source annotation. Default "orig.ident". |
Value
Updated object
dgCMatrix method - Returns scaled dgCMatrix object
ligerDataset method - Updates the scaleData and scaledUnsharedData (if unshared variable feature available) slot of the object
liger method - Updates the scaleData and scaledUnsharedData (if unshared variable feature available) slot of chosen datasets
Seurat method - Adds a named layer in chosen assay (V5), or updates the scale.data slot of the chosen assay (<=V4)
Methylation dataset
Because gene body mCH proportions are negatively correlated with gene expression level in neurons, we need to reverse the direction of the methylation data before performing the integration. We do this by simply subtracting all values from the maximum methylation value. The resulting values are positively correlated with gene expression. This will only be applied to variable genes detected beforehand. Please make sure that argument modal is set accordingly when running createLiger. In this way, this function can automatically detect it and take proper action. If it is not set, users can still manually have the equivalent processing done by running scaleNotCenter(lig, useDataset = c("other", "datasets")), and then reverseMethData(lig, useDataset = c("meth", "datasets")).
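The reversal described above amounts to a single subtraction. The base R sketch below illustrates it on a toy matrix; reverseDirection is a hypothetical helper, and taking the maximum over the whole matrix (rather than per gene) is an assumption, since the text only says "the maximum methylation value".

```r
# Hypothetical helper: flip the direction of methylation values by
# subtracting every value from the overall maximum (assumed to be global).
reverseDirection <- function(x) max(x) - x

# Toy methylation proportions, genes in rows and cells in columns
meth <- matrix(c(0.9, 0.2, 0.5, 0.7, 0.1, 0.4), nrow = 2,
               dimnames = list(c("geneA", "geneB"), paste0("cell", 1:3)))
reversed <- reverseDirection(meth)
```

The reversed values stay non-negative, which keeps them compatible with NMF, and their ordering is exactly opposite to the input, so a negative correlation with expression becomes a positive one.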
Note
Since the scaling on genes is applied on a per-dataset basis, other scaling methods that apply to a whole concatenated matrix of multiple datasets might not be considered as equivalent alternatives, even if options like center are set to FALSE. Hence we implemented an efficient solution that works under such circumstances, provided with the Seurat S3 method.
Examples
pbmc <- selectBatchHVG(pbmc, n = 10)
pbmc <- scaleNotCenter(pbmc)
Batch-aware highly variable gene selection
Description
Method to select HVGs based on the mean dispersions of genes that are highly variable in all batches, using the top target genes per batch ranked by average normalized dispersion. If the target number of genes has not been reached, HVGs found in all but one batch are used to fill up. This is continued until HVGs in a single batch are considered.
This is an rliger implementation of the method originally published in SCIB. We found that it has the potential to improve integration under some circumstances, and we are currently testing it.
This function currently only works for shared features across all datasets. For selection from only part of the datasets and selection for dataset-specific unshared features, please use selectGenes().
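The fill-up strategy described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package code: selectBatchAware is a hypothetical helper, its input is assumed to be a precomputed genes-by-batches matrix of normalized dispersions, and candidates are ranked by the plain mean of dispersion across all batches.

```r
# Hypothetical helper: batch-aware HVG selection from a genes x batches
# matrix of normalized dispersions (rownames are gene names).
selectBatchAware <- function(disp, nGenes) {
  nBatch <- ncol(disp)
  # Flag the top nGenes per batch as highly variable in that batch
  isHVG <- apply(disp, 2, function(d) rank(-d, ties.method = "first") <= nGenes)
  nBatchHVG <- rowSums(isHVG)      # number of batches where each gene is an HVG
  meanDisp <- rowMeans(disp)       # assumption: rank by mean over all batches
  selected <- integer(0)
  # Start with genes that are HVGs in all batches, then relax one batch at a time
  for (b in seq(nBatch, 1)) {
    need <- nGenes - length(selected)
    if (need <= 0) break
    cand <- which(nBatchHVG == b)
    cand <- cand[order(meanDisp[cand], decreasing = TRUE)]
    selected <- c(selected, head(cand, need))
  }
  rownames(disp)[selected]
}

# Toy dispersions for five genes in two batches
disp <- matrix(c(5.0, 4.0,
                 4.0, 1.0,
                 1.0, 5.0,
                 3.0, 3.0,
                 0.5, 0.2),
               ncol = 2, byrow = TRUE,
               dimnames = list(paste0("g", 1:5), c("batch1", "batch2")))
sel2 <- selectBatchAware(disp, nGenes = 2)
sel3 <- selectBatchAware(disp, nGenes = 3)
```

In this toy input only g1 is among the top two genes of both batches, so the second slot for nGenes = 2 is filled from genes that are highly variable in a single batch, ordered by mean dispersion.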
Usage
selectBatchHVG(object, ...)
## S3 method for class 'liger'
selectBatchHVG(
  object, nGenes = 2000, verbose = getOption("ligerVerbose", TRUE), ...
)
## S3 method for class 'ligerDataset'
selectBatchHVG(
  object, nGenes = 2000, features = NULL, scaleFactor = NULL,
  verbose = getOption("ligerVerbose", TRUE), ...
)
## S3 method for class 'dgCMatrix'
selectBatchHVG(
  object, nGenes = 2000, returnStats = FALSE, scaleFactor = NULL,
  verbose = getOption("ligerVerbose", TRUE), ...
)
## S3 method for class 'DelayedArray'
selectBatchHVG(
  object, nGenes = 2000, means = NULL, scaleFactor = NULL,
  returnStats = FALSE, chunk = getOption("ligerChunkSize", 20000),
  verbose = getOption("ligerVerbose", TRUE), ...
)
Arguments
object | A |
... | Arguments passed to S3 methods. |
nGenes | Integer number of target genes to select. Default 2000. |
verbose | Logical. Whether to show a progress bar. Default getOption("ligerVerbose", TRUE). |
features | For ligerDataset method, the feature subset to limit the selection to, mainly for limiting the selection to happen within the shared genes of all datasets. Default NULL. |
scaleFactor | Numeric vector of scaling factors to normalize the raw counts to unit sum. This is pre-calculated at liger object creation (stored as |
returnStats | Logical, for dgCMatrix-method, whether to return a data frame of statistics for all features, or by default |
means | Numeric vector of pre-calculated means per gene, derived from log1p CPM normalized expression. |
chunk | Integer. Maximum number of cells in each chunk when working on HDF5Array. Default getOption("ligerChunkSize", 20000). |
Value
liger-method: Returns the input liger object with the selected genes updated in the varFeatures slot, which can be accessed with varFeatures(object). Additionally, the statistics are updated in the featureMeta slot of each ligerDataset object within the datasets slot of the object.
ligerDataset-method: Returns the input ligerDataset object with the statistics updated in the featureMeta slot.
dgCMatrix-method: By default returns a character vector of selected variable features. If returnStats = TRUE, returns a data.frame of the statistics.
References
Luecken, M.D., Büttner, M., Chaichoompu, K. et al. (2022), Benchmarking atlas-level data integration in single-cell genomics. Nat Methods, 19, 41–50. https://doi.org/10.1038/s41592-021-01336-8.
See Also
Examples
pbmc <- selectBatchHVG(pbmc, nGenes = 10)
varFeatures(pbmc)
Select a subset of informative genes
Description
This function identifies highly variable genes from each dataset and combines these gene sets (either by union or intersection) for use in downstream analysis. Assuming that gene expression approximately follows a Poisson distribution, this function identifies genes with gene expression variance above a given variance threshold (relative to mean gene expression). Alternatively, we allow selecting a desired number of genes for each dataset by ranking the relative variance, and then taking the combination.
Usage
selectGenes(object, thresh = 0.1, nGenes = NULL, alpha = 0.99, ...)
## S3 method for class 'liger'
selectGenes(
  object, thresh = 0.1, nGenes = NULL, alpha = 0.99, useDatasets = NULL,
  useUnsharedDatasets = NULL, unsharedThresh = 0.1,
  combine = c("union", "intersection"),
  chunk = getOption("ligerChunkSize", 20000),
  verbose = getOption("ligerVerbose", TRUE),
  var.thresh = thresh, alpha.thresh = alpha, num.genes = nGenes,
  datasets.use = useDatasets, unshared.datasets = useUnsharedDatasets,
  unshared.thresh = unsharedThresh, tol = NULL, do.plot = NULL,
  cex.use = NULL, unshared = NULL, ...
)
## S3 method for class 'Seurat'
selectGenes(
  object, thresh = 0.1, nGenes = NULL, alpha = 0.99, useDatasets = NULL,
  layer = "ligerNormData", assay = NULL, datasetVar = "orig.ident",
  combine = c("union", "intersection"),
  verbose = getOption("ligerVerbose", TRUE), ...
)
Arguments
object | A liger, ligerDataset or |
thresh | Variance threshold used to identify variable genes. Higher threshold results in fewer selected genes. Liger and Seurat S3 methods accept a single value or a vector with specific threshold for each dataset in |
nGenes | Number of genes to find for each dataset. By setting this, we optimize the threshold used for each dataset so that we get |
alpha | Alpha threshold. Controls upper bound for expected mean gene expression. Lower threshold means higher upper bound. Default 0.99. |
... | Arguments passed to other methods. |
useDatasets | A character vector of the names, or a numeric or logical vector of the index, of the datasets to use for shared variable feature selection. Default NULL. |
useUnsharedDatasets | A character vector of the names, or a numeric or logical vector of the index, of the datasets to use for finding unshared variable features. Default NULL. |
unsharedThresh | The same thing as |
combine | How to combine variable genes selected from all datasets. Choose from "union" or "intersection". Default "union". |
chunk | Integer. Maximum number of cells in each chunk when gene selection is applied to any HDF5-based dataset. Default getOption("ligerChunkSize", 20000). |
verbose | Logical. Whether to show information of the progress. Default getOption("ligerVerbose", TRUE). |
var.thresh,alpha.thresh,num.genes,datasets.use,unshared.datasets,unshared.thresh | Deprecated. These arguments are renamed and will be removed in the future. Please see function usage for replacement. |
tol,do.plot,cex.use,unshared | Deprecated. Gene variability metric is now visualized with separated function |
layer | Where the input normalized counts should be from. Default "ligerNormData". |
assay | Name of assay to use. Default NULL. |
datasetVar | Metadata variable name that stores the dataset source annotation. Default "orig.ident". |
Value
Updated object
liger method - Each involved dataset stored in ligerDataset is updated with its featureMeta slot and varUnsharedFeatures slot (if requested with useUnsharedDatasets), while varFeatures(object) will be updated with the final combined gene set.
Seurat method - Final selection will be updated at Seurat::VariableFeatures(object). Per-dataset information is stored in the meta.features slot of the chosen Assay.
Examples
pbmc <- normalize(pbmc)
# Select based on thresholding the relative variance
pbmc <- selectGenes(pbmc, thresh = .1)
# Select specified number for each dataset
pbmc <- selectGenes(pbmc, nGenes = c(60, 60))
Select variable genes from one dataset with Seurat VST method
Description
Seurat FindVariableFeatures VST method. This allows the selection of a fixed number of variable features, but only applies to one dataset. No normalization is needed in advance.
Usage
selectGenesVST(
  object, useDataset, n = 2000, loessSpan = 0.3, clipMax = "auto",
  useShared = TRUE, verbose = getOption("ligerVerbose", TRUE)
)
Arguments
object | A liger object. |
useDataset | The name, or a numeric or logical index, of the dataset to be considered for selection. |
n | Number of variable features needed. Default 2000. |
loessSpan | Loess span parameter used when fitting the variance-mean relationship. Default 0.3. |
clipMax | After standardization values larger than |
useShared | Logical. Whether to only select from genes shared by all datasets. Default TRUE. |
verbose | Logical. Whether to show information of the progress. Default getOption("ligerVerbose", TRUE). |
References
Seurat::FindVariableFeatures.default(selection.method = "vst")
Examples
pbmc <- selectGenesVST(pbmc, "ctrl", n = 50)
Subset liger with brackets
Description
Subset liger with brackets
Usage
## S3 method for class 'liger'
x[i, j, ...]
Arguments
x | A liger object |
i | Feature subscriptor, passed to |
j | Cell subscriptor, passed to |
... | Additional arguments passed to |
Value
Subset of x with specified features and cells.
See Also
Examples
pbmcPlot[varFeatures(pbmcPlot)[1:10], 1:10]
Subset ligerDataset object
Description
Subset ligerDataset object
Usage
## S3 method for class 'ligerDataset'
x[i, j, ...]
Arguments
x | A ligerDataset object |
i | Numeric, logical index or character vector of feature names to subscribe. Leave missing for all features. |
j | Numeric, logical index or character vector of cell IDs to subscribe. Leave missing for all cells. |
... | Additional arguments passed to |
Value
Subset of x with specified features and cells.
Examples
ctrl <- dataset(pbmc, "ctrl")
ctrl[1:5, 1:5]
Get cell metadata variable
Description
Get cell metadata variable
Usage
## S3 method for class 'liger'
x[[i, ...]]
Arguments
x | A liger object |
i | Name or numeric index of cell meta data to fetch |
... | Anything that |
Value
If i is given, the selected metadata will be returned; if it is missing, the whole cell metadata table in S4Vectors::DataFrame class will be returned.
Examples
# Retrieve whole cellMeta
pbmc[[]]
# Retrieve a variable
pbmc[["dataset"]]
Subset liger object
Description
This function subsets a liger object with a character feature index and any valid cell index. For datasets based on HDF5, the filenames of subset H5 files can only be automatically generated for now. Feature subsetting is based on the intersection of available features from datasets involved by cellIdx, while featureIdx = NULL does not take the intersection (i.e. nothing is done on the feature axis).
A ligerDataset object is also allowed for now, in which case setting filename is supported.
Usage
subsetLiger(
  object, featureIdx = NULL, cellIdx = NULL, useSlot = NULL,
  chunkSize = 1000, verbose = getOption("ligerVerbose", TRUE),
  newH5 = TRUE, returnObject = TRUE, ...
)
Arguments
object | A liger or ligerDataset object. |
featureIdx | Character vector. Missing or |
cellIdx | Character, logical or numeric index that can subscribe cells. Missing or |
useSlot | The slot(s) to only consider. Choose one or more from |
chunkSize | Integer. Maximum number of cells in each chunk. Default 1000. |
verbose | Logical. Whether to show information of the progress. Default getOption("ligerVerbose", TRUE). |
newH5 | Whether to create new H5 files on disk for the subset datasets if involved datasets in the |
returnObject | Logical, whether to return a liger object for result. Default TRUE. |
... | Arguments passed to |
Value
Subset object
See Also
Examples
pbmc.small <- subsetLiger(pbmc, cellIdx = pbmc$nUMI > 200)
pbmc.small <- pbmc[, pbmc$nGene > 50]
Subset ligerDataset object
Description
This function subsets a ligerDataset object with valid feature and cell indices. For HDF5-based objects, options are available for subsetting data into memory or a new on-disk H5 file. Feature and cell subscription is always based on the size of rawData. Therefore, the feature subsetting on scaled data, which usually already contains a subset of features, will select the intersection between the wanted features and the set available from scaled data.
Usage
subsetLigerDataset(
  object, featureIdx = NULL, cellIdx = NULL, useSlot = NULL, newH5 = TRUE,
  filename = NULL, filenameSuffix = NULL, chunkSize = 1000,
  verbose = getOption("ligerVerbose", TRUE), returnObject = TRUE, ...
)
subsetH5LigerDataset(
  object, featureIdx = NULL, cellIdx = NULL, useSlot = NULL, newH5 = TRUE,
  filename = NULL, filenameSuffix = NULL, chunkSize = 1000,
  verbose = getOption("ligerVerbose", TRUE), returnObject = TRUE
)
subsetMemLigerDataset(
  object, featureIdx = NULL, cellIdx = NULL, useSlot = NULL,
  returnObject = TRUE
)
Arguments
object | ligerDataset object. HDF5 based object if using |
featureIdx | Character, logical or numeric index that can subscribe features. Missing or |
cellIdx | Character, logical or numeric index that can subscribe cells. Missing or |
useSlot | The slot(s) to only consider. Choose one or more from |
newH5 | Whether to create a new H5 file on disk for the subset dataset if |
filename | Filename of the new H5 file if being created. Default NULL. |
filenameSuffix | Instead of specifying the exact filename, set a suffix for the new files so the new filename looks like |
chunkSize | Integer. Maximum number of cells in each chunk. Default 1000. |
verbose | Logical. Whether to show information of the progress. Default getOption("ligerVerbose", TRUE). |
returnObject | Logical, whether to return a ligerDataset object for result. Default TRUE. |
... | Arguments passed to |
Value
Subset object
Examples
ctrl <- dataset(pbmc, "ctrl")
ctrl.small <- subsetLigerDataset(ctrl, cellIdx = 1:5)
ctrl.tiny <- ctrl[1:5, 1:5]
Suggest optimal K value for the factorization
Description
This function sweeps through a series of k values (number of ranks the datasets are factorized into). For each k value, it repeats the factorization for a number of random starts and obtains the objective errors from each run. The optimal k value is recommended to be the one with the lowest variance of the objective errors.
We are currently actively testing the methodology and the function is subject to change. Please report any issues you encounter.
Currently we have identified that a wider step of k values (e.g. 5, 10, 15, ...) shows a more stable variance than a narrower step (e.g. 5, 6, 7, ...).
Note that this function is expected to take a long time when a larger number of random starts is requested (e.g. 50) for a robust suggestion. It is safe to interrupt the progress (e.g. Ctrl+C) and the function will still return the recorded objective errors already completed.
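The selection rule described above (lowest variance of objective errors across random starts) can be illustrated with base R on a made-up table. The column names k and objErr and all numbers below are assumptions for illustration only; the actual column names of the returned stats data frame are not specified here.

```r
# Made-up objective errors from four random starts at each of three k values
# (hypothetical column names; the numbers are not real results).
stats <- data.frame(
  k      = rep(c(5, 10, 15), each = 4),
  objErr = c(100, 104, 96, 100,
              80,  81, 79,  80,
              70,  75, 65,  70)
)

# Variance of the objective error across random starts, per k
varByK <- tapply(stats$objErr, stats$k, var)

# Suggested k: the value whose runs agree most closely
bestK <- as.numeric(names(varByK)[which.min(varByK)])
```

Here bestK is 10: the runs at k = 15 reach a lower error on average, but they vary much more between random starts than the runs at k = 10.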
Usage
suggestK(
  object, kTest = seq(5, 50, 5), nRandomStart = 10, lambda = 5,
  nIteration = 30, nCores = 1L, verbose = getOption("ligerVerbose", TRUE)
)
Arguments
object | A liger object. |
kTest | A numeric vector of k values to be tested. Default 5, 10, 15, ..., 50. |
nRandomStart | Number of random starts for each k value. Default 10. |
lambda | Regularization parameter. Default 5. |
nIteration | Number of iterations for each run. Default 30. |
nCores | Number of cores to use for each run. Default 1L. |
verbose | Whether to print progress messages. Default getOption("ligerVerbose", TRUE). |
Value
A list containing:
stats | A data frame containing the k values, objective errors, and random starts. |
figure | A ggplot2 object showing the objective errors and variance for each k value. The left y-axis corresponds to the dots and bands, the right second y-axis maps to the blue line that stands for the variance. |
Examples
pbmcPlot <- scaleNotCenter(pbmcPlot)
# Minimum test example, not for demonstrative recommendation
suggests <- suggestK(
  object = pbmcPlot, kTest = c(2, 3), nRandomStart = 2, nIteration = 2
)
suggests$figure
Show significant results from factorGSEA
Description
Show significant results from factorGSEA
Usage
## S3 method for class 'factorGSEA'
summary(object, ...)
Arguments
object | A |
... | S3 method convention, not used for now. |
Value
A data frame of significant tests with gene set names, factor names and other GSEA statistics.
Update old liger object to up-to-date structure
Description
Due to massive updates since rliger 2.0, old liger object structures are no longer compatible with the current package. This function will update the object to the latest structure.
Usage
updateLigerObject(
  object, dimredName, clusterName = "clusters", h5FilePath = NULL
)
Arguments
object | An object of any version of rliger |
dimredName | Name of the dimension reduction embedding to be stored. Please see Details section. |
clusterName | Name of the clustering assignment variable to be stored. Please see Details section. |
h5FilePath | Named character vector for all H5 file paths. Not required for objects run with in-memory analysis. For objects containing H5-based analysis (e.g. online iNMF), this must be supplied if the H5 file location is different from that at creation time. |
Details
Old liger object (<1.99.0) stores only one embedding at slot tsne.coords. dimredName must be specified as a single character. Pre-release version (1.99.0) stores multiple embeddings in cellMeta. dimredName must be exact existing variable names in the cellMeta slot.
Old liger object stores clustering assignment in slot clusters. clusterName must be specified as a single character. Pre-release version does not require this.
Value
Updated liger object.
Examples
## Not run: 
# Suppose you have a liger object of old version (<1.99.0)
newLig <- updateLigerObject(oldLig, dimredName = "UMAP", clusterName = "louvain")
## End(Not run)
Write in-memory data into H5 file
Description
This function writes in-memory data into an H5 file, by default in 10x cellranger HDF5 output format. The main goal of this function is to allow users to integrate a large H5-based dataset, which cannot be fully loaded into memory, with other data already loaded in memory using runOnlineINMF. In this case, users can write the smaller in-memory data to an H5 file instead of loading a subset of the large H5-based dataset into memory, where information might be lost.
Based on the goal of the whole workflow, the data will always be written in a CSC matrix format and colnames/rownames are always required.
The default method coerces the input to a dgCMatrix-class object. Methods for other container classes try to extract proper data and call the default method.
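For readers unfamiliar with the CSC layout, the sketch below builds a small dgCMatrix with the Matrix package and inspects its slots. Relating the slots i, p, x and Dim to the indices, indptr, data and shape paths follows the usual CSC convention and is an assumption here, not a statement about the internals of writeH5.

```r
library(Matrix)

# A 3 x 3 sparse matrix with three non-zero entries: (1,1)=5, (3,1)=2, (2,3)=7
m <- sparseMatrix(
  i = c(1, 3, 2), j = c(1, 1, 3), x = c(5, 2, 7),
  dims = c(3, 3),
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3))
)

rowIdx <- m@i    # 0-based row index of each non-zero value ("indices")
colPtr <- m@p    # 0-based start of each column in rowIdx/vals ("indptr")
vals   <- m@x    # the non-zero values in column-major order ("data")
shape  <- m@Dim  # number of rows and columns ("shape")
```

Column 2 has no entries, so its start and end pointers in colPtr are equal. The rownames and colnames of the matrix correspond to the features and barcodes that the function requires.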
Usage
writeH5(x, file, ...)
## Default S3 method:
writeH5(x, file, ...)
## S3 method for class 'dgCMatrix'
writeH5(
  x, file, overwrite = FALSE,
  indicesPath = "matrix/indices", indptrPath = "matrix/indptr",
  dataPath = "matrix/data", shapePath = "matrix/shape",
  barcodesPath = "matrix/barcodes", featuresPath = "matrix/features/name",
  ...
)
## S3 method for class 'ligerDataset'
writeH5(x, file, ...)
## S3 method for class 'liger'
writeH5(x, file, useDatasets, ...)
Arguments
x | An object with in-memory data to be written into H5 file. |
file | A character string of the file path to be written. |
... | Arguments passed to other S3 methods. |
overwrite | Logical, whether to overwrite the file if it already exists. Default FALSE. |
indicesPath,indptrPath,dataPath | The paths inside the H5 file where the dgCMatrix-class constructor |
shapePath | The path inside the H5 file where the shape of the matrix will be written to. Default "matrix/shape". |
barcodesPath | The path inside the H5 file where the barcodes/colnames will be written to. Default "matrix/barcodes". |
featuresPath | The path inside the H5 file where the features/rownames will be written to. Default "matrix/features/name". |
useDatasets | For liger method. Names or indices of datasets to be written to H5 files. Required. |
Value
Nothing is returned. H5 file will be created on disk.
See Also
10X cellranger H5 matrix detail
Examples
raw <- rawData(pbmc, "ctrl")
writeH5(raw, tempfile(pattern = "ctrl_", fileext = ".h5"))
Write liger object to H5AD files
Description
Create an H5AD file from a liger object. This function writes only raw counts to adata.X, while normalized and scaled expression data will not be written, because LIGER uses a different normalization and scaling strategy than most of the other tools utilizing the H5AD format.
Support for single sparse matrices or internal ligerDataset objects is also provided if there is a need to convert single datasets.
Usage
writeH5AD(object, ...)
## S3 method for class 'dgCMatrix'
writeH5AD(
  object, filename, obs = NULL, var = NULL, overwrite = FALSE,
  verbose = getOption("ligerVerbose", TRUE), ...
)
## S3 method for class 'ligerDataset'
writeH5AD(
  object, filename, obs = NULL, overwrite = FALSE,
  verbose = getOption("ligerVerbose", TRUE), ...
)
## S3 method for class 'liger'
writeH5AD(
  object, filename, overwrite = FALSE,
  verbose = getOption("ligerVerbose", TRUE), ...
)
Arguments
object | One of liger, ligerDataset or dgCMatrix-class object. |
... | Arguments passed down to S3 methods |
filename | A character string, the path to the H5AD file to be written |
obs | External data.frame that contains metadata of the cells but is not embedded inside the object. Rownames must be identical to the colnames of object. |
var | External data.frame that contains metadata of the features but is not embedded inside the object. Rownames must be identical to the rownames of object. |
overwrite | Logical, whether to overwrite the file if it exists. |
verbose | Logical. Whether to show information of the progress. Default getOption("ligerVerbose", TRUE). |
Value
No return value, an H5AD file is written to disk with the following specification, assuming the file is loaded to adata in Python:
adata.X - Raw count CSR matrix, outer joined with all datasets
adata.obs - Cell metadata, with exactly the same content as cellMeta(object)
adata.var - Feature metadata containing only the feature names as the index of pd.DataFrame.
adata.obsm['X_inmf_aligned'] - The integrated embedding, aligned cell factor loading matrix, the primary output of LIGER, if available.
adata.obsm['X_inmf'] - The raw cell factor loading matrix, if available.
adata.obsm['<dimRedName>'] - The dimensional reduction matrix, such as UMAP or TSNE, if available.
adata.uns['inmf']['W'] - The shared factor feature loading matrix, if available.
adata.uns['inmf']['V']['<datasetName>'] - The dataset-specific factor feature loading matrix, if available.
adata.uns['inmf']['features'] - The variable features being used for factorization, supposed to match the second shape of W and V, if available.
adata.uns['inmf']['lambda'] - The hyperparameter lambda used, the regularization parameter for the factorization, if available.
adata.uns['inmf']['k'] - The number of factors used for the factorization, if available.
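As an illustration of what "outer joined" means for adata.X, the base R sketch below combines two small count matrices with partly different features. It is a conceptual sketch only: outerJoinCounts is a hypothetical helper, dense matrices are used for brevity, and filling features missing from a dataset with zeros is an assumption about how the join is completed.

```r
# Hypothetical helper: combine genes x cells matrices over the union of their
# features, filling features absent from a dataset with zeros (assumption).
outerJoinCounts <- function(mats) {
  genes <- Reduce(union, lapply(mats, rownames))
  padded <- lapply(mats, function(x) {
    out <- matrix(0, nrow = length(genes), ncol = ncol(x),
                  dimnames = list(genes, colnames(x)))
    out[rownames(x), ] <- x
    out
  })
  do.call(cbind, padded)
}

a <- matrix(1:4, nrow = 2, dimnames = list(c("g1", "g2"), c("c1", "c2")))
b <- matrix(5:8, nrow = 2, dimnames = list(c("g2", "g3"), c("c3", "c4")))
joined <- outerJoinCounts(list(a, b))
```

The result here is genes by cells; AnnData stores observations (cells) in rows, so the cells-by-features orientation of adata.X corresponds to t(joined).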
Examples
print("The example below works, but causes PDF manual rendering issue for some reason")
## Not run: 
writeH5AD(pbmc, filename = tempfile(fileext = ".h5ad"))
## End(Not run)