
Version:

2.2.1

Date:

2025-08-26

Type:

Package

Title:

Linked Inference of Genomic Experimental Relationships

Description:

Uses an extension of nonnegative matrix factorization to identify shared and dataset-specific factors. See Welch J, Kozareva V, et al (2019) <doi:10.1016/j.cell.2019.05.006>, and Liu J, Gao C, Sodicoff J, et al (2020) <doi:10.1038/s41596-020-0391-8> for more details.

Author:

Joshua Welch [aut], Yichen Wang [aut, cre], Chao Gao [aut], Jialin Liu [aut], Joshua Sodicoff [aut, ctb], Velina Kozareva [aut, ctb], Evan Macosko [aut, ctb], Paul Hoffman [ctb], Ilya Korsunsky [ctb], Robert Lee [ctb], Andrew Robbins [ctb]

Maintainer:

Yichen Wang <wayichen@umich.edu>

BugReports:

https://github.com/welch-lab/liger/issues

URL:

https://welch-lab.github.io/liger/

License:

GPL-3

LazyData:

true

RoxygenNote:

7.3.2

VignetteBuilder:

knitr

Encoding:

UTF-8

LinkingTo:

Rcpp, RcppArmadillo, RcppProgress

Depends:

methods, stats, utils, R (≥ 3.5)

Imports:

cli, DelayedArray, dplyr, ggplot2, grid, HDF5Array, hdf5r, leidenAlg (≥ 1.1.1), lifecycle, magrittr, Matrix, RANN, Rcpp, RcppPlanc (≥ 2.0.0), rlang, S4Vectors, scales, uwot

Suggests:

AnnotationDbi, circlize, ComplexHeatmap, cowplot, DESeq2, EnhancedVolcano, fgsea, GenomicRanges, ggrepel, gprofiler2, IRanges, knitr, org.Hs.eg.db, plotly, psych, reactome.db, rmarkdown, Rtsne, sankey, scattermore (≥ 0.7), Seurat, SeuratObject, SingleCellExperiment, SummarizedExperiment, testthat, viridis

NeedsCompilation:

yes

Packaged:

2025-08-26 17:18:06 UTC; wangych

Repository:

CRAN

Date/Publication:

2025-08-26 17:50:02 UTC

rliger: Linked Inference of Genomic Experimental Relationships

Description

Uses an extension of nonnegative matrix factorization to identify shared and dataset-specific factors. See Welch J, Kozareva V, et al (2019) <doi:10.1016/j.cell.2019.05.006>, and Liu J, Gao C, Sodicoff J, et al (2020) <doi:10.1038/s41596-020-0391-8> for more details.

Author(s)

Maintainer: Yichen Wang <wayichen@umich.edu>

Authors:

Joshua Welch <welchjd@umich.edu>
Chao Gao <gchao@umich.edu>
Jialin Liu <alanliu@umich.edu>
Joshua Sodicoff <sodicoff@umich.edu> [contributor]
Velina Kozareva [contributor]
Evan Macosko [contributor]

Other contributors:

Paul Hoffman [contributor]
Ilya Korsunsky [contributor]
Robert Lee [contributor]
Andrew Robbins <robbiand@med.umich.edu> [contributor]

Generate dot plot from input matrix with ComplexHeatmap

Description

Generate dot plot from input matrix with ComplexHeatmap

Usage

.complexHeatmapDotPlot(
  colorMat,
  sizeMat,
  featureAnnDF = NULL,
  cellSplitVar = NULL,
  cellLabels = NULL,
  maxDotSize = 4,
  clusterFeature = FALSE,
  clusterCell = FALSE,
  legendColorTitle = "Matrix Value",
  legendSizeTitle = "Fraction Value",
  transpose = FALSE,
  baseSize = 8,
  cellTextSize = NULL,
  featureTextSize = NULL,
  cellTitleSize = NULL,
  featureTitleSize = NULL,
  legendTextSize = NULL,
  legendTitleSize = NULL,
  featureGrpRot = 0,
  viridisOption = "C",
  viridisDirection = -1,
  ...
)

Arguments

colorMat, sizeMat

Matrices of the same size. Values in colorMat will be visualized with color, while values in sizeMat will be reflected by dot size.
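As a minimal sketch, the two matrices can be prepared from a features x cells matrix and a cell grouping. This sketch fills colorMat with group means and sizeMat with the fraction of cells that have a nonzero value. That choice is an assumption: it follows a common dot-plot convention that matches the default "Fraction Value" legend title. dotPlotMatrices() is a hypothetical helper, not an rliger function.

```r
# Hypothetical helper: summarize a features x cells matrix by cell group.
# Returns two matrices of identical shape (features x groups):
#   colorMat: mean value per group
#   sizeMat:  fraction of cells in the group with a value > 0
dotPlotMatrices <- function(expr, group) {
  group <- factor(group)
  lv <- levels(group)
  colorMat <- matrix(NA_real_, nrow(expr), length(lv),
                     dimnames = list(rownames(expr), lv))
  sizeMat <- colorMat
  for (g in lv) {
    sub <- expr[, group == g, drop = FALSE]
    colorMat[, g] <- rowMeans(sub)
    sizeMat[, g] <- rowMeans(sub > 0)
  }
  list(colorMat = colorMat, sizeMat = sizeMat)
}

expr <- matrix(c(0, 2, 0, 4,
                 1, 1, 0, 0,
                 3, 0, 3, 0), nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:4)))
mats <- dotPlotMatrices(expr, group = c("g1", "g1", "g2", "g2"))
mats$colorMat
mats$sizeMat
```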

featureAnnDF

Data frame of features containing feature names and grouping labels.

cellSplitVar

Split the cell orientation (default columns) by this variable.

cellLabels

Label to be shown on cell orientation.

maxDotSize

The maximum dot size. Default 4.

clusterFeature, clusterCell

Whether the feature/cell orientation (default rows/columns, respectively) should be clustered. Default FALSE.

legendColorTitle, legendSizeTitle

The titles for the color bar and dot size legends, respectively. Default "Matrix Value" and "Fraction Value".

transpose

Logical, whether to rotate the dot plot orientation, i.e. rows as cell aggregation and columns as features. Default FALSE.

baseSize

One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this. Default 8.

cellTextSize, featureTextSize, legendTextSize

Size of cell labels, feature labels and legend text. Default NULL controls by baseSize.

cellTitleSize, featureTitleSize, legendTitleSize

Size of titles on cell and feature orientation and legend title. Default NULL controls by baseSize + 2.

featureGrpRot

Number of degrees to rotate the feature grouping label. Default 0.

viridisOption, viridisDirection

See arguments option and direction of viridis. Default "C" and -1.

...

Additional arguments passed to Heatmap.

Value

A HeatmapList object.

Produce single violin plot with data frame passed from upstream

Description

Produce single violin plot with data frame passed from upstream

Usage

.ggCellViolin(
  plotDF,
  y,
  groupBy = NULL,
  colorBy = NULL,
  violin = TRUE,
  violinAlpha = 0.8,
  violinWidth = 0.9,
  box = FALSE,
  boxAlpha = 0.6,
  boxWidth = 0.4,
  dot = FALSE,
  dotColor = "black",
  dotSize = getOption("ligerDotSize"),
  xlabAngle = 45,
  raster = NULL,
  seed = 1,
  ...
)

Arguments

plotDF

Data frame like object (fortifiable) that contains all necessary information to make the plot.

y, groupBy, colorBy

See plotCellViolin.

violin, box, dot

Logical, whether to add a violin plot, box plot or dot (scatter) plot, respectively. Layers are added in the order of dot, violin, and box, so the box plot is on the top surface. By default, only the violin plot is generated.

violinAlpha, boxAlpha

Numeric, controls the transparency of layers. Default 0.8 and 0.6, respectively.

violinWidth, boxWidth

Numeric, controls the width of the violin/box bounding box. Default 0.9 and 0.4.

dotColor, dotSize

A color and a numeric size, respectively, globally controlling the appearance of all dots. Default "black" and getOption("ligerDotSize") (1).

xlabAngle

Numeric, counter-clockwise rotation angle of X axis label text. Default 45.

raster

Logical, whether to rasterize the dot plot. Default NULL automatically rasterizes the dot plot when the number of total cells to be plotted exceeds 100,000.

seed

Random seed for reproducibility. Default 1.

...

More theme setting arguments passed to .ggplotLigerTheme.

Value

A ggplot object by default. When plotly = TRUE, returns a plotly (htmlwidget) object.

Produce single scatter plot with data frame passed from upstream

Description

Produce single scatter plot with data frame passed from upstream

Usage

.ggScatter(
  plotDF,
  x,
  y,
  colorBy = NULL,
  shapeBy = NULL,
  dotOrder = c("shuffle", "ascending", "descending"),
  dotSize = getOption("ligerDotSize"),
  dotAlpha = 0.9,
  trimHigh = NULL,
  trimLow = NULL,
  zeroAsNA = TRUE,
  raster = NULL,
  labelBy = colorBy,
  labelText = TRUE,
  labelTextSize = 4,
  ggrepelLabelTick = FALSE,
  seed = 1,
  ...
)

Arguments

plotDF

Data frame like object (fortifiable) that contains all necessary information to make the plot.

x, y

Available variable name in cellMeta slot to look for the dot coordinates. See Details.

colorBy, shapeBy

See plotDimRed.

dotOrder

Controls the order in which each dot is added to the plot. Choose from "shuffle", "ascending", or "descending". Default "shuffle" is useful when coloring by categories that overlap (e.g. "dataset"). "ascending" can be useful when coloring by a continuous variable (e.g. gene expression) where high values need more highlight. NULL uses the default order.
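ggplot2 draws points in the row order of its data, so later rows are painted on top of earlier ones. The sketch below shows one way to realize the three dotOrder choices on a plain data frame. orderDots() is a hypothetical helper, not rliger's internal code. Putting NA values at the bottom is also an assumption.

```r
# Hypothetical helper: reorder rows so that the points meant to stay visible
# are drawn last (ggplot2 paints later rows on top of earlier ones).
orderDots <- function(plotDF, colorBy,
                      dotOrder = c("shuffle", "ascending", "descending"),
                      seed = 1) {
  dotOrder <- match.arg(dotOrder)
  idx <- switch(dotOrder,
    shuffle = {
      set.seed(seed)              # note: resets the global RNG state
      sample(nrow(plotDF))
    },
    # na.last = FALSE draws NA values first, i.e. underneath everything else
    ascending = order(plotDF[[colorBy]], na.last = FALSE),
    descending = order(plotDF[[colorBy]], na.last = FALSE, decreasing = TRUE)
  )
  plotDF[idx, , drop = FALSE]
}

df <- data.frame(x = 1:5, y = 1:5, expr = c(3, 0, 5, 1, 2))
orderDots(df, "expr", "ascending")$expr   # 0 1 2 3 5
```

With "ascending", the highest expression values end up in the last rows, which is why they stay visible where dots overlap.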

dotSize, dotAlpha

Numeric, controls the size or transparency of all dots. Default getOption("ligerDotSize") (1) and 0.9.

trimHigh, trimLow

Numeric, limit the largest or smallest value of continuous colorBy variable. Default NULL.

zeroAsNA

Logical, whether to set zero values in continuous colorBy variable to NA, so that these values are drawn with the NA color (see naColor in .ggplotLigerTheme). Default TRUE.
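A minimal sketch of how trimHigh, trimLow and zeroAsNA could act on a continuous coloring variable before plotting. prepareColor() is hypothetical. Converting zeros before trimming is an assumption of this sketch: it keeps zeros as NA even when trimLow is positive.

```r
# Hypothetical preprocessing of a continuous colorBy variable. Zeros are
# turned into NA first (so they keep the NA color even when trimLow > 0),
# then values are capped/floored; this ordering is an assumption.
prepareColor <- function(values, trimHigh = NULL, trimLow = NULL,
                         zeroAsNA = TRUE) {
  if (isTRUE(zeroAsNA)) values[values == 0] <- NA
  if (!is.null(trimHigh)) values <- pmin(values, trimHigh)   # NA stays NA
  if (!is.null(trimLow)) values <- pmax(values, trimLow)
  values
}

prepareColor(c(0, 0.5, 3, 10), trimHigh = 5)   # NA 0.5 3.0 5.0
```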

raster

Logical, whether to rasterize the plot. Default NULL automatically rasterizes the plot when the number of total dots to be plotted exceeds 100,000.

labelBy

A variable name available in plotDF. If the variable is categorical (a factor), the label position will be the median coordinates of all dots within the same group. Unique labeling in a character vector for each dot is also acceptable. Default colorBy.

labelText

Logical, whether to show a text label at the median position of each categorical group specified by colorBy. Default TRUE. Does not work when continuous coloring is specified.
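The median label placement described for labelBy and labelText can be sketched in base R. labelPositions() is a hypothetical helper and the column names UMAP_1 and UMAP_2 are illustrative. rliger's own label computation may differ in detail.

```r
# Hypothetical helper: one label per group, placed at the median x/y of its dots
labelPositions <- function(plotDF, x, y, labelBy) {
  stats::aggregate(plotDF[c(x, y)], by = list(label = plotDF[[labelBy]]),
                   FUN = stats::median)
}

df <- data.frame(UMAP_1 = c(0, 1, 2, 10, 11, 12),
                 UMAP_2 = c(0, 0, 3, 5, 5, 8),
                 cluster = factor(c("a", "a", "a", "b", "b", "b")))
labelPositions(df, "UMAP_1", "UMAP_2", "cluster")
```

The resulting data frame could then be drawn with ggplot2::geom_text() or, when ggrepel is installed, with ggrepel::geom_text_repel().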

labelTextSize

Numeric, controls the size of label text when labelText = TRUE. Default 4.

ggrepelLabelTick

Logical, whether to force showing the tick between label texts and the positions they point to. Useful when a lot of text labels are required. Default FALSE. Run options(ggrepel.max.overlaps = n) before plotting to set the number of allowed label overlaps.

seed

Random seed for reproducibility. Default 1.

...

More theme setting arguments passed to .ggplotLigerTheme.

Details

Having package "ggrepel" installed can help add tidier text labels to the scatter plot.

Value

A ggplot object by default. When plotly = TRUE, returns a plotly (htmlwidget) object.

Generic ggplot theme setting for rliger package

Description

Controls content and size of all peripheral texts.

Usage

.ggplotLigerTheme(
  plot,
  title = NULL,
  subtitle = NULL,
  xlab = TRUE,
  ylab = TRUE,
  xlabAngle = 0,
  legendColorTitle = NULL,
  legendFillTitle = NULL,
  legendShapeTitle = NULL,
  legendSizeTitle = NULL,
  showLegend = TRUE,
  legendPosition = "right",
  baseSize = getOption("ligerBaseSize"),
  titleSize = NULL,
  subtitleSize = NULL,
  xTextSize = NULL,
  xFacetSize = NULL,
  xTitleSize = NULL,
  yTextSize = NULL,
  yFacetSize = NULL,
  yTitleSize = NULL,
  legendTextSize = NULL,
  legendTitleSize = NULL,
  legendDotSize = 4,
  panelBorder = FALSE,
  legendNRow = NULL,
  legendNCol = NULL,
  colorLabels = NULL,
  colorValues = NULL,
  colorPalette = "magma",
  colorDirection = -1,
  naColor = "#DEDEDE",
  colorLow = NULL,
  colorMid = NULL,
  colorHigh = NULL,
  colorMidPoint = NULL,
  plotly = FALSE
)

Arguments

plot

A ggplot object passed from wrapper plotting functions.

title, subtitle, xlab, ylab

Main title, subtitle or X/Y axis title text. By default, no main title or subtitle will be set, and X/Y axis titles will be the names of variables used for plotting. Use NULL to hide elements. TRUE for xlab or ylab shows default values.

xlabAngle

Numeric, counter-clockwise rotation angle of X axis label text. Default 0 shows horizontal text.

legendColorTitle

Legend title text for color aesthetics, often used for categorical or continuous coloring of dots. Default NULL shows the original variable name.

legendFillTitle

Legend title text for fill aesthetics, often used for violin, box and bar plots. Default NULL shows the original variable name.

legendShapeTitle

Legend title text for shape aesthetics, often used for shaping dots by a categorical variable. Default NULL shows the original variable name.

legendSizeTitle

Legend title text for size aesthetics, often used for sizing dots by a continuous variable. Default NULL shows the original variable name.

showLegend

Whether to show the legend. Default TRUE.

legendPosition

Text indicating where to place the legend. Choose from "top", "bottom", "left" or "right". Default "right".

baseSize

One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this.

titleSize, xTitleSize, yTitleSize, legendTitleSize

Size of main title, axis titles and legend title. Default NULL controls by baseSize + 2.

subtitleSize, xTextSize, yTextSize, legendTextSize

Size of subtitle text, axis texts and legend text. Default NULL controls by baseSize.

xFacetSize

Size of facet strip label text on x-axis. Default NULL controls by baseSize - 2.

yFacetSize

Size of facet strip label text on y-axis. Default NULL controls by baseSize - 2.
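The size rules above can be summarized in a short sketch: an explicitly supplied size always wins, and otherwise titles get baseSize + 2, plain text gets baseSize, and facet strips get baseSize - 2. resolveTextSizes() is a hypothetical helper that covers only a few representative arguments. It is not how .ggplotLigerTheme is implemented internally.

```r
# Hypothetical helper: the text size each element gets when only baseSize is
# set, following the documented rules (titles +2, text +0, facet strips -2).
# An explicitly supplied size always wins.
resolveTextSizes <- function(baseSize, titleSize = NULL, xTextSize = NULL,
                             xFacetSize = NULL) {
  pick <- function(value, default) if (is.null(value)) default else value
  list(
    title = pick(titleSize, baseSize + 2),
    xText = pick(xTextSize, baseSize),
    xFacet = pick(xFacetSize, baseSize - 2)
  )
}

str(resolveTextSizes(10))
```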

legendDotSize

Allows dots in the legend region to be large enough to see the colors/shapes clearly. Default 4.

panelBorder

Whether to show a rectangle border around the panel instead of using ggplot's classic bottom and left axis lines. Default FALSE.

legendNRow, legendNCol

Integer. When one variable has too many categories, arranges the legend into this number of rows or columns. Default NULL automatically splits the legend into ceiling(nlevels(variable)/15) columns.

colorLabels

Character vector for modifying category names in a color legend. Passed to ggplot2::scale_color_manual(labels). Default NULL uses the original levels of the factor.

colorValues

Character vector of colors for modifying category colors in a color legend. Passed to ggplot2::scale_color_manual(values). Default NULL uses an internally selected palette when 26 or fewer categories are present, and ggplot hues otherwise.

colorPalette

For continuous coloring, an index or a palette name selected from the options available in ggplot scale_brewer or viridis. Default "magma".

colorDirection

Choose 1 or -1. Applied when colorPalette is one of the viridis options. Default -1 uses a darker color for higher values, while 1 reverses this direction.

naColor

The color code for NA values. Default "#DEDEDE".

colorLow, colorMid, colorHigh, colorMidPoint

All four of these must be specified to customize the palette with scale_colour_gradient2. Default NULL.

plotly

Whether to use plotly to enable web-based interactive browsing of the plot. Requires installation of package "plotly". Default FALSE.

Value

An updated ggplot object by default. When plotly = TRUE, returns a plotly (htmlwidget) object.

General heatmap plotting with prepared matrix and data.frames

Description

This is not an exported function. This documentation serves as a manual of extra arguments that users can use when generating heatmaps with plotGeneHeatmap or plotFactorHeatmap.

Note that the following arguments are pre-occupied by upstream wrappers, so users should not include them in a function call: dataMatrix, dataName, cellDF, featureDF, cellSplitVar, featureSplitVar.

The following arguments of Heatmap are occupied by this function, so users should not include them in a function call either: matrix, name, col, heatmap_legend_param, top_annotation, column_title_gp, column_names_gp, show_column_names, column_split, column_gap, left_annotation, row_title_gp, row_names_gp, show_row_names, row_split, row_gap.

Usage

.plotHeatmap(
  dataMatrix,
  dataName = "Value",
  cellDF = NULL,
  featureDF = NULL,
  transpose = FALSE,
  cellSplitVar = NULL,
  featureSplitVar = NULL,
  dataScaleFunc = NULL,
  showCellLabel = FALSE,
  showCellLegend = TRUE,
  showFeatureLabel = TRUE,
  showFeatureLegend = TRUE,
  cellAnnColList = NULL,
  featureAnnColList = NULL,
  scale = FALSE,
  trim = c(-2, 2),
  baseSize = 8,
  cellTextSize = NULL,
  featureTextSize = NULL,
  cellTitleSize = NULL,
  featureTitleSize = NULL,
  legendTextSize = NULL,
  legendTitleSize = NULL,
  viridisOption = "A",
  viridisDirection = -1,
  RColorBrewerOption = "RdBu",
  ...
)

Arguments

dataMatrix

Matrix object with features/factors as rows and cells as columns.

dataName

Text for the heatmap color bar title. Default "Value".

cellDF

data.frame object. Its number of rows must match the number of columns of dataMatrix.

featureDF

data.frame object. Its number of rows must match the number of rows of dataMatrix.

transpose

Logical, whether to "rotate" the heatmap by 90 degrees so that cell information is displayed by row. Default FALSE.

cellSplitVar, featureSplitVar

Selected columns of cellDF or featureDF, respectively, used to split cells or features into slices.

dataScaleFunc

A function object, applied to dataMatrix.

showCellLabel, showFeatureLabel

Logical, whether to show cell barcodes, gene symbols or factor names. Default TRUE for genes/factors but FALSE for cells.

showCellLegend, showFeatureLegend

Logical, whether to show cell or feature legends. Default TRUE. Can be a scalar for overall control or a vector matching each given annotation variable.

cellAnnColList, featureAnnColList

List object, with each element a named vector of R-interpretable color codes. The names of the list elements are used for matching the annotation variable names. The names of the colors in the vectors are used for matching the levels of a variable (factor object, categorical). Default NULL generates ggplot-flavor categorical colors.
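To make the expected structure concrete, here is an illustrative cellAnnColList for a cellDF that is assumed to have the columns dataset and leiden_cluster. The variable names, levels and colors are examples only. The level check at the end is a hypothetical convenience, not an rliger function.

```r
# Named list: element names match annotation variables (columns of cellDF);
# color names match the levels of each variable.
cellAnnColList <- list(
  dataset = c(ctrl = "#F8766D", stim = "#00BFC4"),
  leiden_cluster = c("0" = "grey40", "1" = "orange")
)

# Hypothetical sanity check that every level has a color assigned
cellDF <- data.frame(
  dataset = factor(c("ctrl", "stim", "ctrl")),
  leiden_cluster = factor(c("0", "1", "1"))
)
coversAllLevels <- vapply(names(cellAnnColList), function(v)
  all(levels(cellDF[[v]]) %in% names(cellAnnColList[[v]])), logical(1))
coversAllLevels
```

According to the description above, such a list could be supplied as an extra argument to plotGeneHeatmap or plotFactorHeatmap.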

scale

Logical, whether to take z-scores to scale and center gene expression. Applied after dataScaleFunc. Default FALSE.

trim

Numeric vector of two values. Limits the z-score values to this range when scale = TRUE. Default c(-2, 2).
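A minimal sketch of scale = TRUE followed by trim. It assumes each feature (row) is z-scored across cells and then clipped to the trim range. scaleAndTrim() is hypothetical, and the exact scaling used inside .plotHeatmap is not verified here.

```r
# Sketch of scale = TRUE followed by trim, assuming each feature (row) is
# z-scored across cells before the values are clipped to the trim range.
scaleAndTrim <- function(dataMatrix, trim = c(-2, 2)) {
  z <- t(scale(t(dataMatrix)))          # center and scale every row
  pmin(pmax(z, trim[1]), trim[2])       # clip; keeps the matrix dimensions
}

m <- rbind(spike = c(rep(0, 9), 10), ramp = 1:10)
z <- scaleAndTrim(m)
z["spike", 10]   # about 2.846 before trimming, clipped to 2
```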

baseSize

One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this.

cellTextSize, featureTextSize, legendTextSize

Size of cell barcode labels, gene/factor labels, or legend values. Default NULL.

cellTitleSize, featureTitleSize, legendTitleSize

Size of titles of the cell slices, gene/factor slices, or the legends. Default NULL.

viridisOption, viridisDirection

See arguments option and direction of viridis. Default "A" and -1.

RColorBrewerOption

When scale = TRUE, heatmap color will be mapped with brewer.pal. This is passed to its name argument. Default "RdBu".

...

Additional arguments to be passed to Heatmap.

Value

A HeatmapList-class object.

Apply function to chunks of H5 data in ligerDataset object

Description

HDF5 calculation wrapper that runs the specified calculation on an on-disk matrix in chunks.

Usage

H5Apply(
  object,
  FUN,
  init = NULL,
  useData = c("rawData", "normData"),
  chunkSize = 1000,
  verbose = getOption("ligerVerbose"),
  ...
)

Arguments

object

A ligerDataset object.

FUN

A function that is applied to each chunk. See Details for restrictions.

init

Initialized result, if it needs to be updated iteratively. Default NULL.

useData

The slot name of the data to be processed. Choose from "rawData", "normData", "scaleData". Default "rawData".

chunkSize

Number of columns to be included in each chunk. Default 1000.

verbose

Logical. Whether to show information of the progress. Default getOption("ligerVerbose"), which is TRUE if users have not set it.

...

Other arguments to be passed to FUN.

Details

The FUN function has to have its first four arguments ordered as follows:

chunk data: A sparse matrix (dgCMatrix-class) containing at most chunkSize columns.
x-vector index: The index that subscribes the vector in the x slot of a dgCMatrix, which points to the values in each chunk. Mostly used when a new sparse matrix needs to be written to the H5 file.
cell index: The column index of each chunk out of the whole original matrix.
Initialized result: A customized object. The value passed to the H5Apply(init) argument is passed here in the first iteration, and the returned value of FUN is passed here in each following chunk iteration. It is therefore important to keep the structure of the returned value consistent with init.

No default values should be pre-defined for these four arguments, because H5Apply automatically generates the input.
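The FUN contract above can be tried without an H5 file. The sketch below uses applyInChunks(), a hypothetical in-memory stand-in for H5Apply that walks the columns of a dgCMatrix in chunks and passes the four documented arguments. rowTotalFUN() accumulates per-feature totals into init. Whether H5Apply builds its arguments exactly this way is an assumption. With rliger, the same FUN would be passed as H5Apply(object, FUN = rowTotalFUN, init = ...).

```r
library(Matrix)

# Hypothetical in-memory stand-in for H5Apply(): walks the columns of a
# dgCMatrix in chunks and calls FUN with the four documented arguments.
applyInChunks <- function(mat, FUN, init = NULL, chunkSize = 1000, ...) {
  value <- init
  n <- ncol(mat)
  for (start in seq(1, n, by = chunkSize)) {
    end <- min(start + chunkSize - 1, n)
    cellIdx <- start:end
    # mat@p holds 0-based column offsets into mat@x
    first <- mat@p[start] + 1
    last <- mat@p[end + 1]
    sparseXIdx <- if (last >= first) first:last else integer(0)
    chunk <- mat[, cellIdx, drop = FALSE]
    value <- FUN(chunk, sparseXIdx, cellIdx, value, ...)
  }
  value
}

# A FUN that accumulates per-feature (row) totals. Its return value keeps the
# same structure as `init`, as required for iterative updates.
rowTotalFUN <- function(chunk, sparseXIdx, cellIdx, values) {
  values + Matrix::rowSums(chunk)
}

set.seed(1)
m <- Matrix::rsparsematrix(50, 2500, density = 0.1)
totals <- applyInChunks(m, rowTotalFUN, init = numeric(nrow(m)), chunkSize = 1000)
```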

Align factor loadings to get final integration

Description

This function is a wrapper that switches between the alternative factor loading alignment methods LIGER provides. Alignment is a required step for producing the final integrated result. Two methods are provided (see the individual options for more details):

method = "quantileNorm": Previously published quantile normalization method. (default)
method = "centroidAlign": Newly developed centroid alignment method.

Usage

alignFactors(object, method = c("quantileNorm", "centroidAlign"), ...)

## S3 method for class 'liger'
alignFactors(object, method = c("quantileNorm", "centroidAlign"), ...)

## S3 method for class 'Seurat'
alignFactors(object, method = c("quantileNorm", "centroidAlign"), ...)

Arguments

object

A liger or Seurat object with a valid factorization result available (i.e. runIntegration performed in advance).

method

Character, method to align factors. Default "quantileNorm". Optionally "centroidAlign".

...

Additional arguments passed to the selected method. For "quantileNorm":

quantiles: Number of quantiles to use for quantile normalization. Default 50.
reference: Character, numeric or logical selection of one dataset, out of all available datasets in object, to use as a "reference" for quantile normalization. Default NULL tries to find the RNA dataset with the largest number of cells; if no RNA dataset is available, the globally largest dataset is used.
minCells: Minimum number of cells to consider a cluster shared across datasets. Default 20.
nNeighbors: Number of nearest neighbors for the within-dataset KNN graph. Default 20.
useDims: Indices of factors to use for shared nearest factor determination. Default NULL uses all factors.
center: Whether to center the data when scaling factors. Could be useful for less sparse modalities like methylation data. Default FALSE.
maxSample: Maximum number of cells used for quantile normalization of each cluster and factor. Default 1000.
eps: The error bound of the nearest neighbor search. Lower values give more accurate nearest neighbor graphs but take much longer to compute. Default 0.9.
refineKNN: Whether to increase robustness of cluster assignments using the KNN graph. Default TRUE.
clusterName: Variable name that will store the clustering result in the metadata of a liger object or a Seurat object. Default "quantileNorm_cluster".
seed: Random seed to allow reproducible results. Default 1.
verbose: Logical. Whether to show information of the progress. Default getOption("ligerVerbose") or TRUE if users have not set it.

For "centroidAlign":

lambda: Ridge regression penalty applied to each dataset. Can be one number that applies to all datasets, or a numeric vector with length equal to the number of datasets. Default 1.
useDims: Indices of factors to be considered for the alignment. Default NULL uses all factors.
scaleEmb: Logical, whether to scale the factor loading being considered as the embedding. Default TRUE.
centerEmb: Logical, whether to center the factor loading being considered as the embedding before scaling it. Default TRUE.
scaleCluster: Logical, whether to scale the factor loading being considered as the cluster assignment probability. Default FALSE.
centerCluster: Logical, whether to center the factor loading being considered as the cluster assignment probability before scaling it. Default FALSE.
shift: Logical, whether to shift the factor loading being considered as the cluster assignment probability after centered scaling. Default FALSE.
diagnosis: Logical, whether to return cell metadata variables with diagnostic information. Default FALSE.
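The quantiles argument controls how finely one dataset's loading distribution is warped onto the reference's. The sketch below illustrates that idea for a single factor. quantileMatch() is hypothetical and is not rliger's implementation. As the argument list notes, rliger works per cluster and per factor and subsamples up to maxSample cells, and this sketch omits all of that.

```r
# Simplified quantile matching for one factor: warp `x` so that its quantiles
# line up with those of `ref`, using `quantiles` equally spaced levels.
quantileMatch <- function(x, ref, quantiles = 50) {
  probs <- seq(0, 1, length.out = quantiles + 1)
  qx <- stats::quantile(x, probs, names = FALSE)
  qref <- stats::quantile(ref, probs, names = FALSE)
  # Piecewise-linear, monotone map from x's quantiles to ref's quantiles;
  # tied quantiles in x (e.g. many zeros) are averaged via `ties = mean`.
  stats::approx(qx, qref, xout = x, rule = 2, ties = mean)$y
}

set.seed(1)
x <- rexp(500, rate = 2)        # loadings of one factor in a query dataset
ref <- rnorm(800, mean = 5)     # the same factor in the reference dataset
warped <- quantileMatch(x, ref)
summary(warped)
```

Because the map is monotone, the ranking of cells within the query dataset is preserved. Only the scale and shape of their loadings change to match the reference.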

Converting other classes of data to a liger object

Description

This function converts data stored in a SingleCellExperiment (SCE) object, a Seurat object or a merged sparse matrix (dgCMatrix) into a liger object. It is designed for a container object or matrix that already contains multiple datasets to be integrated with LIGER. For individual datasets, please use createLiger instead.

Usage

## S3 method for class 'dgCMatrix'
as.liger(object, datasetVar = NULL, modal = NULL, ...)

## S3 method for class 'SingleCellExperiment'
as.liger(object, datasetVar = NULL, modal = NULL, ...)

## S3 method for class 'Seurat'
as.liger(object, datasetVar = NULL, modal = NULL, assay = NULL, ...)

seuratToLiger(object, datasetVar = NULL, modal = NULL, assay = NULL, ...)

as.liger(object, ...)

Arguments

object

Object.

datasetVar

Specifies which dataset each cell belongs to, in one of three ways:

1. Select a variable from the existing metadata in the object (e.g. a colData column).
2. Give a vector/factor that assigns each cell to a dataset.
3. Give a single character string, meaning that all data come from one dataset. The string must not be a metadata variable name, otherwise it is interpreted as option 1.

Default NULL gathers everything into one dataset named "sample" for a dgCMatrix, and attempts to find the variable "sample" in an SCE object or "orig.ident" in a Seurat object.

modal

Modality setting for each dataset. See createLiger.

...

Additional arguments passed to createLiger.

assay

Name of assay to use. Default NULL uses the current active assay.

Details

For the Seurat V5 structure, it is highly recommended that users make use of its split layer feature. With split layers, things like "counts", "data", and "scale.data" can be held separately for each dataset in the same Seurat object (e.g. as "counts.ctrl" and "counts.stim") rather than merged. If a Seurat object with split layers is given, datasetVar is ignored and the layers are used directly.

Value

A liger object.

Examples

# dgCMatrix (common sparse matrix class), usually obtained from another
# container object, containing multiple samples merged into one.
matList <- rawData(pbmc)
multiSampleMatrix <- mergeSparseAll(matList)
# The `datasetVar` argument expects the variable assigning the sample source
pbmc2 <- as.liger(multiSampleMatrix, datasetVar = pbmc$dataset)
pbmc2
if (requireNamespace("SingleCellExperiment", quietly = TRUE)) {
    sce <- SingleCellExperiment::SingleCellExperiment(
        assays = list(counts = multiSampleMatrix)
    )
    sce$sample <- pbmc$dataset
    pbmc3 <- as.liger(sce, datasetVar = "sample")
    pbmc3
}
if (requireNamespace("Seurat", quietly = TRUE)) {
    seu <- SeuratObject::CreateSeuratObject(multiSampleMatrix)
    # Seurat creates variable "orig.ident" by identifying the cell barcode
    # prefixes, which is indeed what we need in this case. Users might need
    # to be careful and have it confirmed first.
    pbmc4 <- as.liger(seu, datasetVar = "orig.ident")
    pbmc4
    # With the Seurat V5 layered data structure, which is especially helpful
    # for dataset integration, "counts" etc. for each dataset can be split
    # into layers.
    seu5 <- seu
    seu5[["RNA"]] <- split(seu5[["RNA"]], pbmc$dataset)
    print(SeuratObject::Layers(seu5))
    pbmc5 <- as.liger(seu5)
    pbmc5
}

Converting other classes of data to a ligerDataset object

Description

Converts a matrix or container object to a single ligerDataset, and can also convert the modality preset of a ligerDataset. When used with a dense matrix object, it automatically converts the matrix to sparse form (dgCMatrix-class). When used with container objects such as Seurat or SingleCellExperiment, it is highly recommended that the object contains only one dataset/sample, which is going to be integrated with LIGER. For multi-sample objects, please use as.liger with the dataset source variable specified.

Usage

## S3 method for class 'ligerDataset'
as.ligerDataset(
  object,
  modal = c("default", "rna", "atac", "spatial", "meth"),
  ...
)

## Default S3 method:
as.ligerDataset(
  object,
  modal = c("default", "rna", "atac", "spatial", "meth"),
  ...
)

## S3 method for class 'matrix'
as.ligerDataset(
  object,
  modal = c("default", "rna", "atac", "spatial", "meth"),
  ...
)

## S3 method for class 'Seurat'
as.ligerDataset(
  object,
  modal = c("default", "rna", "atac", "spatial", "meth"),
  assay = NULL,
  ...
)

## S3 method for class 'SingleCellExperiment'
as.ligerDataset(
  object,
  modal = c("default", "rna", "atac", "spatial", "meth"),
  ...
)

as.ligerDataset(object, ...)

Arguments

object

Object.

modal

Modality setting for each dataset. Choose from "default", "rna", "atac", "spatial", "meth".

...

Additional arguments passed to createLigerDataset.

assay

Name of assay to use. Default NULL uses the current active assay.

Value

A ligerDataset object.

Examples

ctrl <- dataset(pbmc, "ctrl")
ctrl
# Convert the modality preset
as.ligerDataset(ctrl, modal = "atac")
rawCounts <- rawData(ctrl)
class(rawCounts)
as.ligerDataset(rawCounts)

liger object of bone marrow subsample data with RNA and ATAC modality

Description

liger object of bone marrow subsample data with RNA and ATAC modality

Usage

bmmc

Format

A liger object with two datasets named "rna" and "atac".

Source

https://www.nature.com/articles/s41587-019-0332-7

References

Jeffrey M. Granja et al., Nature Biotechnology, 2019

Calculate adjusted Rand index (ARI) by comparing two cluster labeling variables

Description

This function calculates the adjusted Rand index (ARI) between the clustering result obtained with LIGER and an external clustering (an existing "true" annotation). An ARI of 1 indicates perfect agreement between the clusterings, while a value near 0 indicates agreement no better than chance. Most values fall between 0 and 1, although the ARI can be slightly negative.

The true clustering annotation must be specified as the baseline. We suggest storing it in the object's cellMeta so that it can easily be used by many other visualization and evaluation functions.

The ARI can be calculated for specified datasets only, since true annotation might not be available for all datasets. To evaluate only one or a few datasets, specify useDatasets. If useDatasets is specified, trueCluster and useCluster are checked to ensure they match the cells in the specified datasets.
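For reference, the ARI can be computed from the contingency table of two labelings with the pair-counting formula of Hubert and Arabie (1985). The base R sketch below is a standalone illustration and not the code calcARI runs. adjustedRandIndex() is a hypothetical name.

```r
# Adjusted Rand index from two labelings of the same cells, using the
# pair-counting form of Hubert & Arabie (1985)
adjustedRandIndex <- function(truth, pred) {
  stopifnot(length(truth) == length(pred))
  tab <- table(truth, pred)                       # contingency counts n_ij
  sumCells <- sum(choose(tab, 2))                 # pairs together in both
  sumRows <- sum(choose(rowSums(tab), 2))         # pairs together in truth
  sumCols <- sum(choose(colSums(tab), 2))         # pairs together in pred
  expected <- sumRows * sumCols / choose(length(truth), 2)
  maximum <- (sumRows + sumCols) / 2
  (sumCells - expected) / (maximum - expected)
}

adjustedRandIndex(c(1, 1, 2, 2), c("a", "a", "b", "b"))   # 1
adjustedRandIndex(c(0, 0, 1, 1), c(0, 0, 1, 2))           # 4/7, about 0.571
```

Only the grouping matters, not the label names. This is why comparing a labeling with itself, or with a renamed copy of itself, always yields 1.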

Usage

calcARI(
  object,
  trueCluster,
  useCluster = NULL,
  useDatasets = NULL,
  verbose = getOption("ligerVerbose", TRUE),
  classes.compare = trueCluster
)

Arguments

object

A liger object, with the clustering result present in cellMeta.

trueCluster

Either the name of one variable in cellMeta(object) or a factor object with annotation that matches all cells being considered.

useCluster

The name of one variable in cellMeta(object). Default NULL uses default clusters.

useDatasets

A character vector of the names, or a numeric or logical vector of the indices, of the datasets to be considered for the ARI calculation. Default NULL uses all datasets.

verbose

Logical. Whether to show information of the progress. Default getOption("ligerVerbose") or TRUE if users have not set it.

classes.compare

Deprecated. Use trueCluster instead.

Value

A numeric scalar, the ARI of the clustering result indicated by useCluster compared to trueCluster.


References

L. Hubert and P. Arabie (1985) Comparing Partitions, Journal of Classification, 2, pp. 193-218.

Examples

# Assume the true cluster in `pbmcPlot` is "leiden_cluster"
# generate fake new labeling
fake <- sample(1:7, ncol(pbmcPlot), replace = TRUE)
# Insert into cellMeta
pbmcPlot$new <- factor(fake)
calcARI(pbmcPlot, trueCluster = "leiden_cluster", useCluster = "new")
# Now assume we got existing baseline annotation only for "stim" dataset
nStim <- ncol(dataset(pbmcPlot, "stim"))
stimTrueLabel <- factor(fake[1:nStim])
# Insert into cellMeta
cellMeta(pbmcPlot, "stim_true_label", useDatasets = "stim") <- stimTrueLabel
# Assume "leiden_cluster" is the clustering result we got and need to be
# evaluated
calcARI(pbmcPlot, trueCluster = "stim_true_label",
        useCluster = "leiden_cluster", useDatasets = "stim")
# Comparison of the same labeling should always yield 1.
calcARI(pbmcPlot, trueCluster = "leiden_cluster", useCluster = "leiden_cluster")

Calculate agreement metric after integration

Description

This metric quantifies how much the factorization and alignment distort the geometry of the original datasets. The greater the agreement, the less distortion of geometry there is. It is calculated by performing dimensionality reduction on the original and integrated (factorized, or factorized and aligned) datasets. The k nearest neighbors of each cell in the original data are then compared with that cell's k nearest neighbors in the integrated data. The Jaccard index is used to quantify this similarity, and the final metric is its average across all cells.

Note that for most datasets, the greater the chosen nNeighbors, the greater the agreement in general. Although agreement can theoretically approach 1, in practice it is usually no higher than 0.2-0.3.
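The neighbor-overlap idea can be sketched in base R for two embeddings of the same cells. In this sketch, `original` stands in for the dimensionality-reduced original data and `integrated` for the integrated result. Exact Euclidean search with dist() replaces rliger's own neighbor search, and knnIndex() and knnJaccardAgreement() are hypothetical helpers.

```r
# Indices of the k nearest neighbors of each row of an embedding
# (exact Euclidean search; the cell itself is excluded)
knnIndex <- function(emb, k) {
  d <- as.matrix(stats::dist(emb))
  diag(d) <- Inf
  do.call(rbind, lapply(seq_len(nrow(d)), function(i) order(d[i, ])[seq_len(k)]))
}

# Mean Jaccard index between each cell's neighbor sets in two embeddings
knnJaccardAgreement <- function(embA, embB, k = 15) {
  stopifnot(nrow(embA) == nrow(embB))
  nnA <- knnIndex(embA, k)
  nnB <- knnIndex(embB, k)
  jaccard <- vapply(seq_len(nrow(nnA)), function(i) {
    shared <- length(intersect(nnA[i, ], nnB[i, ]))
    shared / (2 * k - shared)   # intersection / union for two sets of size k
  }, numeric(1))
  mean(jaccard)
}

set.seed(1)
original <- matrix(rnorm(200 * 10), ncol = 10)
integrated <- original[, 1:5] + matrix(rnorm(200 * 5, sd = 0.5), ncol = 5)
knnJaccardAgreement(original, integrated, k = 15)
```

Because each neighbor set has exactly k members, the union size is 2k minus the overlap. The score is 1 only when every cell keeps exactly the same neighbors.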

Usage

calcAgreement(
  object,
  ndims = 40,
  nNeighbors = 15,
  useRaw = FALSE,
  byDataset = FALSE,
  seed = 1,
  dr.method = NULL,
  k = nNeighbors,
  use.aligned = NULL,
  rand.seed = seed,
  by.dataset = byDataset
)

Arguments

object

A liger object. alignFactors should be called beforehand.

ndims

Number of factors to produce in NMF. Default 40.

nNeighbors

Number of nearest neighbors to use in calculating the Jaccard index. Default 15.

useRaw

Whether to evaluate just the factorized H matrices instead of using the aligned H.norm matrix. Default FALSE uses the aligned matrix.

byDataset

Whether to return agreement calculated for each dataset instead of the average for all datasets. Default FALSE.

seed

Random seed to allow reproducible results. Default 1.

dr.method

Deprecated. Methods other than NMF are no longer supported.

k, rand.seed, by.dataset

Deprecated. See Usage for replacement.

use.aligned

Deprecated. Use useRaw instead.

Value

A numeric vector of the agreement metric: a single value if byDataset = FALSE, otherwise one value per dataset.

Examples

if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- pbmc %>%
        normalize %>%
        selectGenes %>%
        scaleNotCenter %>%
        runINMF %>%
        alignFactors
    calcAgreement(pbmc)
}

Calculate alignment metric after integration

Description

This metric quantifies how well-aligned two or more datasets are. We randomly downsample all datasets to have as many cells as the smallest one. We construct a nearest-neighbor graph and calculate for each cell how many of its neighbors are from the same dataset. We average across all cells, compare to the expected value for perfectly mixed datasets, and scale the value from 0 to 1. Note that in practice, alignment can occasionally be greater than 1.

Usage

calcAlignment(
  object,
  clustersUse = NULL,
  clusterVar = NULL,
  nNeighbors = NULL,
  cellIdx = NULL,
  cellComp = NULL,
  resultBy = c("all", "dataset", "cell"),
  seed = 1,
  k = nNeighbors,
  rand.seed = seed,
  cells.use = cellIdx,
  cells.comp = cellComp,
  clusters.use = clustersUse,
  by.cell = NULL,
  by.dataset = NULL
)

Arguments

object

A liger object, with alignFactors already run.

clustersUse

The clusters to consider for calculating the alignment. Should be a vector of existing levels in clusterVar. Default NULL. See Details.

clusterVar

The name of one variable in cellMeta(object). Default NULL uses default clusters.

nNeighbors

Number of neighbors to use in calculating alignment. Default NULL uses floor(0.01*ncol(object)), with a lower bound of 10 in all cases except where the total number of sampled cells is less than 10.

cellIdx,cellComp

Character, logical or numeric index that can subset cells. Default NULL. See Details.

resultBy

Select from "all", "dataset" or "cell": the level at which the mean alignment is calculated. Default "all".

seed

Random seed to allow reproducible results. Default 1.

k,rand.seed,cells.use,cells.comp,clusters.use

Please see Usage for replacement.

by.cell,by.dataset

Use resultBy instead.

Details

In the formula below, \bar{x} is the average number of each cell's neighbors that belong to the same dataset as that cell, N is the number of datasets, and k is the number of neighbors in the KNN graph.

1 - \frac{\bar{x} - \frac{k}{N}}{k - \frac{k}{N}}

The selection of cells to be measured can be done in various ways, representing different scenarios:

By default, all cells are considered and the alignment across all datasets will be calculated.
Select clustersUse from clusterVar to use cells from the clusters of interest. This measures the alignment across all covered datasets within the specified clusters.
Only specify cellIdx for flexible selection. This measures the alignment across all covered datasets within the specified cells. A non-NULL cellIdx takes priority over clustersUse.
Specify cellIdx and cellComp at the same time, so that the original dataset source will be ignored and the cells specified by each argument will be treated as one dataset each. This measures the alignment between cells specified by the two arguments. cellComp can contain cells already specified in cellIdx.
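The default scenario (all cells considered) and the formula above can be sketched in base R as follows. This is a minimal illustration under stated assumptions: emb is a cell-by-dimension matrix (for example the aligned H.norm), datasetLabel gives each cell's dataset, neighbors come from brute-force Euclidean distances, and the default k only approximates the documented rule. alignmentSketch() is a hypothetical helper and is not expected to reproduce rliger's exact numbers.

```r
# Hypothetical helper, not the rliger implementation.
# emb: cells x dimensions matrix; datasetLabel: dataset of origin per cell.
alignmentSketch <- function(emb, datasetLabel, k = NULL, seed = 1) {
  set.seed(seed)
  datasetLabel <- factor(datasetLabel)
  # Default k approximates the documented rule: 1% of all cells, at least 10
  if (is.null(k)) k <- max(floor(0.01 * length(datasetLabel)), 10)
  # Downsample every dataset to the size of the smallest one
  nMin <- min(table(datasetLabel))
  keep <- unlist(lapply(split(seq_along(datasetLabel), datasetLabel),
                        function(idx) idx[sample.int(length(idx), nMin)]))
  emb <- emb[keep, , drop = FALSE]
  lab <- droplevels(datasetLabel[keep])
  k <- min(k, nrow(emb) - 1)          # cannot exceed the number of other cells
  d <- as.matrix(dist(emb))
  diag(d) <- Inf                      # a cell is not its own neighbor
  # For each cell, count how many of its k nearest neighbors share its dataset
  sameCount <- vapply(seq_len(nrow(emb)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]
    sum(lab[nn] == lab[i])
  }, numeric(1))
  xBar <- mean(sameCount)
  N <- nlevels(lab)
  1 - (xBar - k / N) / (k - k / N)    # the formula given above
}
```

Fully separated datasets give xBar = k and therefore a score of 0, while well-mixed datasets give xBar close to k/N and a score near 1 (slightly above 1 is possible, as noted in the description).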

Value

The alignment metric.

Examples

if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- pbmc %>%
        normalize %>%
        selectGenes %>%
        scaleNotCenter %>%
        runINMF %>%
        alignFactors
    calcAlignment(pbmc)
}

Calculate a dataset-specificity score for each factor

Description

This score represents the relative magnitude of the dataset-specific components of each factor's gene loadings compared to the shared components for two datasets. First, for each dataset we calculate the norm of the sum of each factor's shared loadings (W) and dataset-specific loadings (V). We then determine the ratio of these two values and subtract from 1...

Usage

calcDatasetSpecificity(
  object,
  dataset1,
  dataset2,
  doPlot = FALSE,
  do.plot = doPlot
)

Arguments

object

liger object with factorization results.

dataset1

Name of first dataset. Required.

dataset2

Name of second dataset. Required.

doPlot

Logical. Whether to display a barplot of dataset specificity scores (by factor). Default FALSE.

do.plot

Deprecated. Use doPlot instead.

Value

List containing three elements.

pct1

Vector of the norm of each metagene factor for dataset1.

pct2

Vector of the norm of each metagene factor for dataset2.

pctSpec

Vector of dataset specificity scores.

Calculate Normalized Mutual Information (NMI) by comparing two cluster labeling variables

Description

This function aims at calculating the Normalized Mutual Information for the clustering result obtained with LIGER and the external clustering (existing "true" annotation). NMI ranges from 0 to 1, with a score of 0 indicating no agreement between clusterings and 1 indicating perfect agreement. The mathematical definition of NMI is as follows:

H(X) = -\sum_{x \in X}P(X=x)\log_2 P(X=x)

H(X|Y) = -\sum_{y \in Y}P(Y=y)\sum_{x \in X}P(X=x|Y=y)\log_2 P(X=x|Y=y)

I(X;Y) = H(X) - H(X|Y)

NMI(X;Y) = \frac{I(X;Y)}{\sqrt{H(X)H(Y)}}

Where X is the cluster variable to be evaluated and Y is the true cluster variable. x and y are the cluster labels in X and Y respectively. H is the entropy and I is the mutual information.

The true clustering annotation must be specified as the baseline. We suggest storing it in the object cellMeta so that it can be easily used for many other visualization and evaluation functions.

The NMI can be calculated for only specified datasets, since true annotation might not be available for all datasets. Evaluation for only one or a few datasets can be done by specifying useDatasets. If useDatasets is specified, the argument checking for trueCluster and useCluster will be enforced to match the cells in the specified datasets.
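The definitions of H(X), H(X|Y), I(X;Y) and NMI(X;Y) given above translate directly into a few lines of base R. This is a minimal sketch for intuition only; nmi() is a hypothetical helper, not the rliger implementation, and it assumes both inputs are plain label vectors of equal length with at least two distinct labels each (otherwise an entropy is 0 and the ratio is undefined).

```r
# Hypothetical helper, not rliger API. x: labels to evaluate, y: true labels.
nmi <- function(x, y) {
  stopifnot(length(x) == length(y))
  entropy <- function(p) { p <- p[p > 0]; -sum(p * log2(p)) }
  # Joint distribution P(X = x, Y = y); as.character() drops unused levels
  pXY <- table(as.character(x), as.character(y)) / length(x)
  pX <- rowSums(pXY)
  pY <- colSums(pXY)
  hX <- entropy(pX)
  hY <- entropy(pY)
  # H(X|Y) = sum over y of P(Y = y) * H(X | Y = y)
  hXgivenY <- sum(vapply(seq_along(pY), function(j) {
    pY[j] * entropy(pXY[, j] / pY[j])
  }, numeric(1)))
  (hX - hXgivenY) / sqrt(hX * hY)   # I(X;Y) / sqrt(H(X) H(Y))
}

nmi(c(1, 1, 2, 2), c(1, 2, 1, 2))   # independent labelings
```

Identical partitions give H(X|Y) = 0 and hence NMI = 1, while independent labelings give I(X;Y) = 0 and NMI = 0.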

Usage

calcNMI(
  object,
  trueCluster,
  useCluster = NULL,
  useDatasets = NULL,
  verbose = getOption("ligerVerbose", TRUE)
)

Arguments

object

A liger object, with the clustering result present in cellMeta.

trueCluster

Either the name of one variable in cellMeta(object) or a factor object with annotation that matches with all cells being considered.

useCluster

The name of one variable in cellMeta(object). Default NULL uses default clusters.

useDatasets

A character vector of the names, a numeric or logical vector of the index of the datasets to be considered for the NMI calculation. Default NULL uses all datasets.

verbose

Logical. Whether to show information of the progress. Default getOption("ligerVerbose") or TRUE if users have not set.

Value

A numeric scalar of the NMI value

Examples

# Assume the true cluster in `pbmcPlot` is "leiden_cluster"
# generate fake new labeling
fake <- sample(1:7, ncol(pbmcPlot), replace = TRUE)
# Insert into cellMeta
pbmcPlot$new <- factor(fake)
calcNMI(pbmcPlot, trueCluster = "leiden_cluster", useCluster = "new")
# Now assume we got existing base line annotation only for "stim" dataset
nStim <- ncol(dataset(pbmcPlot, "stim"))
stimTrueLabel <- factor(fake[1:nStim])
# Insert into cellMeta
cellMeta(pbmcPlot, "stim_true_label", useDatasets = "stim") <- stimTrueLabel
# Assume "leiden_cluster" is the clustering result we got and need to be
# evaluated
calcNMI(pbmcPlot, trueCluster = "stim_true_label",
        useCluster = "leiden_cluster", useDatasets = "stim")
# Comparison of the same labeling should always yield 1.
calcNMI(pbmcPlot, trueCluster = "leiden_cluster", useCluster = "leiden_cluster")

Calculate purity by comparing two cluster labeling variables

Description

This function aims at calculating the purity for the clustering result obtained with LIGER and the external clustering (existing "true" annotation). Purity can sometimes be a more useful metric when the clustering to be tested contains more subgroups or clusters than the true clustering. Purity ranges from 0 to 1, with a score of 1 representing a pure, accurate clustering.

The true clustering annotation must be specified as the baseline. We suggest storing it in the object cellMeta so that it can be easily used for many other visualization and evaluation functions.

The purity can be calculated for only specified datasets, since true annotation might not be available for all datasets. Evaluation for only one or a few datasets can be done by specifying useDatasets. If useDatasets is specified, the argument checking for trueCluster and useCluster will be enforced to match the cells in the specified datasets.
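The description above does not spell out the formula. A common definition, assumed here, credits each cluster being evaluated with its most frequent true label and divides the total by the number of cells. purity() is a hypothetical base R helper and is not guaranteed to match calcPurity() exactly.

```r
# Hypothetical helper, not rliger API. Standard purity definition:
# (1 / n) * sum over evaluated clusters of the count of their majority true label.
purity <- function(useCluster, trueCluster) {
  stopifnot(length(useCluster) == length(trueCluster))
  tab <- table(useCluster, trueCluster)   # rows: clusters being evaluated
  sum(apply(tab, 1, max)) / length(useCluster)
}

# Over-clustering: true cluster "b" split into two pure subclusters
purity(c(1, 1, 2, 2, 3, 3), c("a", "a", "b", "b", "b", "b"))
```

Splitting a true cluster into several pure subclusters does not lower this score, which is why purity suits over-clustered results, as noted above.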

Usage

calcPurity(
  object,
  trueCluster,
  useCluster = NULL,
  useDatasets = NULL,
  verbose = getOption("ligerVerbose", TRUE),
  classes.compare = trueCluster
)

Arguments

object

A liger object, with the clustering result present in cellMeta.

trueCluster

Either the name of one variable in cellMeta(object) or a factor object with annotation that matches with all cells being considered.

useCluster

The name of one variable in cellMeta(object). Default NULL uses default clusters.

useDatasets

A character vector of the names, a numeric or logical vector of the index of the datasets to be considered for the purity calculation. Default NULL uses all datasets.

verbose

Logical. Whether to show information of the progress. Default getOption("ligerVerbose") or TRUE if users have not set.

classes.compare

Use trueCluster instead.

Value

A numeric scalar, the purity of the clustering result indicated by useCluster compared to trueCluster.

Examples

# Assume the true cluster in `pbmcPlot` is "leiden_cluster"
# generate fake new labeling
fake <- sample(1:7, ncol(pbmcPlot), replace = TRUE)
# Insert into cellMeta
pbmcPlot$new <- factor(fake)
calcPurity(pbmcPlot, trueCluster = "leiden_cluster", useCluster = "new")
# Now assume we got existing base line annotation only for "stim" dataset
nStim <- ncol(dataset(pbmcPlot, "stim"))
stimTrueLabel <- factor(fake[1:nStim])
# Insert into cellMeta
cellMeta(pbmcPlot, "stim_true_label", useDatasets = "stim") <- stimTrueLabel
# Assume "leiden_cluster" is the clustering result we got and need to be
# evaluated
calcPurity(pbmcPlot, trueCluster = "stim_true_label",
           useCluster = "leiden_cluster", useDatasets = "stim")

Cell cycle gene set for human

Description

Copied from Seurat::cc.genes

Usage

ccGeneHuman

Format

A list of two character vectors:

s.genes: Genes associated with S-phase
g2m.genes: Genes associated with G2M-phase

Source

https://www.science.org/doi/abs/10.1126/science.aad0501

Align factor loading by centroid alignment (beta)

Description

This process treats the factor loading of each dataset as the low dimensional embedding as well as the cluster assignment probability, i.e. the soft clustering result. Then the method aligns the embedding by linearly moving the centroids of each cluster, computed separately within each dataset, towards each other.

ATTENTION: This method is still under development, although it has shown encouraging results in benchmarking tests. The arguments and their default values reflect the best-scoring parameters in the tests, and some of them may be subject to change in the future.

Usage

centroidAlign(object, ...)

## S3 method for class 'liger'
centroidAlign(
  object,
  lambda = 1,
  useDims = NULL,
  scaleEmb = TRUE,
  centerEmb = TRUE,
  scaleCluster = FALSE,
  centerCluster = FALSE,
  shift = FALSE,
  diagnosis = FALSE,
  ...
)

## S3 method for class 'Seurat'
centroidAlign(
  object,
  reduction = "inmf",
  lambda = 1,
  useDims = NULL,
  scaleEmb = TRUE,
  centerEmb = TRUE,
  scaleCluster = FALSE,
  centerCluster = FALSE,
  shift = FALSE,
  diagnosis = FALSE,
  ...
)

Arguments

object

A liger or Seurat object with valid factorization result available (i.e. runIntegration performed in advance).

...

Arguments passed to other S3 methods of this function.

lambda

Ridge regression penalty applied to each dataset. Can be one number that applies to all datasets, or a numeric vector with length equal to the number of datasets. Default 1.

useDims

Indices of factors to be considered for the alignment. Default NULL uses all factors.

scaleEmb

Logical, whether to scale the factor loading being considered as the embedding. Default TRUE.

centerEmb

Logical, whether to center the factor loading being considered as the embedding before scaling it. Default TRUE.

scaleCluster

Logical, whether to scale the factor loading being considered as the cluster assignment probability. Default FALSE.

centerCluster

Logical, whether to center the factor loading being considered as the cluster assignment probability before scaling it. Default FALSE.

shift

Logical, whether to shift the factor loading being considered as the cluster assignment probability after centered scaling. Default FALSE.

diagnosis

Logical, whether to return cell metadata variables with diagnostic information. See Details. Default FALSE.

reduction

Name of the reduction where LIGER integration result is stored. Default "inmf".

Details

Diagnostic information includes:

object$raw_which.max: The index of the factor with the maximum value in the raw factor loading.
object$R_which.max: The index of the factor with the maximum value in the soft clustering probability matrix used for correction.
object$Z_which.max: The index of the factor with the maximum value in the aligned factor loading.

Value

Returns the updated input object

liger method
- Update the H.norm slot for the aligned cell factor loading, ready for running graph based community detection clustering or dimensionality reduction for visualization.
- Update the cellMeta slot with diagnostic information if diagnosis = TRUE.
Seurat method
- Update the reductions slot with a new DimReduc object containing the aligned cell factor loading.
- Update the metadata with diagnostic information if diagnosis = TRUE.

Examples

pbmc <- centroidAlign(pbmcPlot)

Close all links (to HDF5 files) of a liger object

Description

When users need to interact with the data embedded in HDF5 files outside of the current R session, the HDF5 files have to be closed in order to be available to other processes.

Usage

closeAllH5(object)

## S3 method for class 'liger'
closeAllH5(object)

## S3 method for class 'ligerDataset'
closeAllH5(object)

Arguments

object

liger object.

Value

Nothing is returned.

Check difference of two liger commands

Description

Check difference of two liger commands

Usage

commandDiff(object, cmd1, cmd2)

Arguments

object

liger object

cmd1,cmd2

Exact string of command labels. Available options can be viewed by running commands(object).

Value

If any difference is found, a character vector summarizing all differences.

Examples

pbmc <- normalize(pbmc)
pbmc <- normalize(pbmc, log = TRUE, scaleFactor = 1e4)
cmds <- commands(pbmc)
commandDiff(pbmc, cmds[1], cmds[2])

Convert old liger object to latest version

Description

Convert old liger object to latest version

Usage

convertOldLiger(
  object,
  dimredName,
  clusterName = "clusters",
  h5FilePath = NULL
)

Arguments

object

liger object from rliger version <1.99.0

dimredName

The name of variable in cellMeta slot to store the dimensionality reduction matrix, which was originally located in the tsne.coords slot. Default "tsne.coords".

clusterName

The name of variable in cellMeta slot to store the clustering assignment, which was originally located in the clusters slot. Default "clusters".

h5FilePath

Named list, to specify the path to the H5 file of each dataset if location has been changed. Default NULL looks at the file paths stored in object.

Examples

## Not run: 
# Suppose you have a liger object of old version (<1.99.0)
newLig <- convertOldLiger(oldLig)
## End(Not run)

Access ligerSpatialDataset coordinate data

Description

Similar to how default ligerDataset data is accessed.

Usage

coordinate(x, dataset)

coordinate(x, dataset, check = TRUE) <- value

## S4 method for signature 'liger,character'
coordinate(x, dataset)

## S4 replacement method for signature 'liger,character'
coordinate(x, dataset, check = TRUE) <- value

## S4 method for signature 'ligerSpatialDataset,missing'
coordinate(x, dataset = NULL)

## S4 replacement method for signature 'ligerSpatialDataset,missing'
coordinate(x, dataset = NULL, check = TRUE) <- value

Arguments

x

ligerSpatialDataset object or a liger object.

dataset

Name or numeric index of a spatial dataset.

check

Logical, whether to perform object validity check on setting new value.

value

matrix.

Value

The retrieved coordinate matrix or the updated x object.

Create on-disk ligerDataset Object

Description

For convenience, the default formatType = "10x" directly fits the structure of cellranger output. formatType = "anndata" works for current AnnData H5AD file specification (see Details). If a customized H5 file structure is presented, any of the rawData, indicesName, indptrName, genesName, barcodesName should be specified accordingly to override the formatType preset.

DO make a copy of the H5AD files because rliger functions write to the files and they will not be able to be read back to Python. This will be fixed in the future.

Usage

createH5LigerDataset(
  h5file,
  formatType = "10x",
  rawData = NULL,
  normData = NULL,
  scaleData = NULL,
  barcodesName = NULL,
  genesName = NULL,
  indicesName = NULL,
  indptrName = NULL,
  anndataX = "X",
  modal = c("default", "rna", "atac", "spatial", "meth"),
  featureMeta = NULL,
  ...
)

Arguments

h5file

Filename of an H5 file

formatType

Select preset of H5 file structure. Default "10X". Alternatively, we also support "anndata" for H5AD files.

rawData,indicesName,indptrName

The path in a H5 file for the raw sparse matrix data. These three types of data stand for the x, i, and p slots of a dgCMatrix-class object. Default NULL uses formatType preset.

normData

The path in a H5 file for the "x" vector of the normalized sparse matrix. Default NULL.

scaleData

The path in a H5 file for the Group that contains the sparse matrix constructing information for the scaled data. Default NULL.

genesName,barcodesName

The path in a H5 file for the gene names and cell barcodes. Default NULL uses formatType preset.

anndataX

The HDF5 path to the raw count data in an H5AD file. See Details. Default "X".

modal

Name of modality for this dataset. Currently options of "default", "rna", "atac", "spatial" and "meth" are supported. Default "default".

featureMeta

Data frame for feature metadata. Default NULL.

...

Additional slot data. See ligerDataset for detail. Given values will be directly placed at corresponding slots.

Details

For H5AD files written from an AnnData object, we allow using formatType = "anndata" for the function to infer the proper structure. However, a typical AnnData-based analysis tends to update the adata.X attribute in place, and there is no standard convention for where the raw count data needed by LIGER is stored. Therefore, we expose the argument anndataX for specifying this information. The default value "X" looks for adata.X. If the raw data is stored in a layer, e.g. adata.layers['count'], then anndataX = "layers/count". If it is stored in adata.raw.X, then anndataX = "raw/X". If your AnnData object does not retain the raw counts, you will have to go back to the Python workflow to insert them at the desired location and rewrite the H5AD file, or start from the upstream source files from which the AnnData object was originally created.

Value

H5-based ligerDataset object

Examples

h5Path <- system.file("extdata/ctrl.h5", package = "rliger")
tempPath <- tempfile(fileext = ".h5")
file.copy(from = h5Path, to = tempPath)
ld <- createH5LigerDataset(tempPath)

Create liger object

Description

This function allows creating a liger object from multiple datasets of various forms (See rawData).

DO make a copy of the H5AD files because rliger functions write to the files and they will not be able to be read back to Python. This will be fixed in the future.

Usage

createLiger(
  rawData,
  modal = NULL,
  organism = "human",
  cellMeta = NULL,
  removeMissing = TRUE,
  addPrefix = "auto",
  formatType = "10X",
  anndataX = "X",
  dataName = NULL,
  indicesName = NULL,
  indptrName = NULL,
  genesName = NULL,
  barcodesName = NULL,
  newH5 = TRUE,
  verbose = getOption("ligerVerbose", TRUE),
  ...,
  raw.data = rawData,
  take.gene.union = NULL,
  remove.missing = removeMissing,
  format.type = formatType,
  data.name = dataName,
  indices.name = indicesName,
  indptr.name = indptrName,
  genes.name = genesName,
  barcodes.name = barcodesName
)

Arguments

rawData

Named list of datasets. Required. Elements allowed include a matrix, a Seurat object, a SingleCellExperiment object, an AnnData object, a ligerDataset object or a filename to an HDF5 file. See detail for HDF5 reading.

modal

Character vector for modality setting. Use one string for all datasets, or the same number of strings as the number of datasets. Currently options of "default", "rna", "atac", "spatial" and "meth" are supported.

organism

Character vector for setting organism for identifying mito, ribo and hemo genes for expression percentage calculation. Use one string for all datasets, or the same number of strings as the number of datasets. Currently options of "mouse", "human", "zebrafish", "rat", and "drosophila" are supported.

cellMeta

data.frame of metadata at single-cell level. Default NULL.

removeMissing

Logical. Whether to remove cells that do not have any counts from each dataset. Default TRUE.

addPrefix

Logical. Whether to add "datasetName_" as a prefix of cell identifiers (e.g. barcodes) to avoid duplicates in multiple libraries (common with 10X data). Default "auto" detects whether matrix columns already have the exact prefix. A logical value forces the action.

formatType

Select preset of H5 file structure. Currently available options are "10x" and "anndata". Can be either a single specification for all datasets or a character vector that matches each dataset.

anndataX

The HDF5 path to the raw count data in an H5AD file. See createH5LigerDataset Details. Default "X".

dataName,indicesName,indptrName

The path in a H5 file for the raw sparse matrix data. These three types of data stand for the x, i, and p slots of a dgCMatrix-class object. Default NULL uses formatType preset.

genesName,barcodesName

The path in a H5 file for the gene names and cell barcodes. Default NULL uses formatType preset.

newH5

When using HDF5 based data and subsets created after removing missing cells/features, whether to create new HDF5 files for the subset. Default TRUE. If FALSE, data will be subset into memory, which can be dangerous for large scale analysis.

verbose

Logical. Whether to show information of the progress. Default getOption("ligerVerbose") or TRUE if users have not set.

...

Additional slot values that should be directly placed in object.

raw.data,remove.missing,format.type,data.name,indices.name,indptr.name,genes.name,barcodes.name

See Usage section for replacement.

take.gene.union

Will be ignored.
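The addPrefix = "auto" behavior described above can be pictured with the following sketch. The rule shown (add "datasetName_" only when not every barcode already starts with it) is an assumption for illustration; addPrefixSketch() is a hypothetical helper, and rliger's actual detection may differ.

```r
# Hypothetical helper, not rliger API: assumed "auto" prefix rule.
addPrefixSketch <- function(barcodes, datasetName) {
  prefix <- paste0(datasetName, "_")
  # Leave barcodes untouched if all of them already carry the exact prefix
  if (all(startsWith(barcodes, prefix))) barcodes else paste0(prefix, barcodes)
}

addPrefixSketch(c("AAAC-1", "AAAG-1"), "ctrl")
```

Under this rule, applying the function twice gives the same result as applying it once, which is the point of detecting an existing prefix.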

Examples

# Create from raw count matrices
ctrl.raw <- rawData(pbmc, "ctrl")
stim.raw <- rawData(pbmc, "stim")
pbmc1 <- createLiger(list(ctrl = ctrl.raw, stim = stim.raw))
# Create from H5 files
h5Path <- system.file("extdata/ctrl.h5", package = "rliger")
tempPath <- tempfile(fileext = ".h5")
file.copy(from = h5Path, to = tempPath)
lig <- createLiger(list(ctrl = tempPath))
# Create from other container object
if (requireNamespace("SeuratObject", quietly = TRUE)) {
    ctrl.seu <- SeuratObject::CreateSeuratObject(ctrl.raw)
    stim.seu <- SeuratObject::CreateSeuratObject(stim.raw)
    pbmc2 <- createLiger(list(ctrl = ctrl.seu, stim = stim.seu))
}

Create in-memory ligerDataset object

Description

Create in-memory ligerDataset object

Usage

createLigerDataset(
  rawData = NULL,
  modal = c("default", "rna", "atac", "spatial", "meth"),
  normData = NULL,
  scaleData = NULL,
  featureMeta = NULL,
  ...
)

Arguments

rawData,normData,scaleData

A dgCMatrix-class object for the raw or normalized expression count or a dense matrix of scaled variable gene expression, respectively. Default NULL for all three but at least one has to be specified.

modal

Name of modality for this dataset. Currently options of "default", "rna", "atac", "spatial" and "meth" are supported. Default "default".

featureMeta

Data frame of feature metadata. Default NULL.

...

Additional slot data. See ligerDataset for detail. Given values will be directly placed at corresponding slots.

Examples

ctrl.raw <- rawData(pbmc, "ctrl")
ctrl.ld <- createLigerDataset(ctrl.raw)

Data frame for example marker DEG test result

Description

The data frame is the direct output of marker detection DEG test applied on the example dataset which can be loaded with data("pbmc"). The DEG test was done with:

defaultCluster(pbmc) <- pbmcPlot$leiden_cluster
deg.marker <- runMarkerDEG(
    pbmc,
    minCellPerRep = 5
)

The result is for the marker detection test for 8 clusters in the dataset bycomparing each cluster against all other clusters.

Usage

deg.marker

Format

data.frame object of 1992 rows with columns:

feature: gene names, 249 unique genes repeated 8 times for the tests done for 8 clusters.
group: cluster names, 8 unique cluster names, dividing the tests.
logFC: log fold change of the gene expression between the cluster of interest against all other clusters.
pval: p-value of the DEG test.
padj: adjusted p-value of the DEG test.
pct_in: percentage of cells in the cluster of interest expressing the gene.
pct_out: percentage of cells in all other clusters expressing the gene.

Data frame for example pairwise DEG test result

Description

The data frame is the direct output of pairwise DEG test applied on the example dataset which can be loaded with importPBMC(). Cell type annotation was obtained from the SeuratData package, "ifnb" dataset, since they are the same. Use the following command to reproduce the same result:

library(rliger)
library(Seurat)
library(SeuratData)
lig <- importPBMC()
ifnb <- LoadData("ifnb")
lig$cell_type <- ifnb$seurat_annotations
lig$condition_cell_type <- interaction(lig$dataset, lig$cell_type, drop = FALSE)
deg.pw <- runPairwiseDEG(
    object = lig,
    groupTest = 'stim.CD14 Mono',
    groupCtrl = 'ctrl.CD14 Mono',
    variable1 = 'condition_cell_type'
)
deg.pw <- deg.pw[order(deg.pw$padj)[1:1000],]

The result represents the statistics of the DEG test between the stim dataset and the ctrl dataset, within the CD14 monocytes. The result is reduced to the 1000 entries with the smallest adjusted p-values for minimum demonstration.

Usage

deg.pw

Format

data.frame object of 1000 rows with columns:

feature: gene names.
group: class name within the variable being used for the test condition.
logFC: log fold change of the gene expression between the condition of interest against the control condition.
pval: p-value of the DEG test.
padj: adjusted p-value of the DEG test.
pct_in: percentage of cells in the condition of interest expressing the gene.
pct_out: percentage of cells in the control condition expressing the gene.

Downsample datasets

Description

This function mainly aims at downsampling datasets to a size suitable for plotting or expensive in-memory calculation.

Users can balance the sample size of categories of interest with balance. Multi-variable specification to balance is supported, so that at most maxCells cells will be sampled from each combination of categories from the variables. For example, when two datasets are presented and three clusters are labeled across them, there would then be at most 2 \times 3 \times maxCells cells being selected. Note that "dataset" will automatically be added as one variable when balancing the downsampling. However, if users want to balance the downsampling solely based on dataset origin, users have to explicitly set balance = "dataset".
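The balancing rule above can be sketched in base R as sampling at most maxCells row indices from every observed combination of the balance variables. balancedSampleIdx() is a hypothetical helper working on a plain data frame; unlike downsample(), it does not add "dataset" automatically, so the dataset column must be listed in balance explicitly.

```r
# Hypothetical helper, not rliger's downsample(). meta: data.frame with one
# row per cell; balance: column names whose category combinations are balanced.
balancedSampleIdx <- function(meta, balance, maxCells = 1000, seed = 1) {
  set.seed(seed)
  # One group per observed combination of categories
  group <- interaction(meta[, balance, drop = FALSE], drop = TRUE)
  idxByGroup <- split(seq_len(nrow(meta)), group)
  picked <- lapply(idxByGroup, function(idx) {
    # Keep small groups whole; otherwise sample maxCells without replacement
    if (length(idx) <= maxCells) idx else idx[sample.int(length(idx), maxCells)]
  })
  sort(unlist(picked, use.names = FALSE))
}

meta <- data.frame(dataset = rep(c("ctrl", "stim"), each = 60),
                   cluster = rep(rep(c("a", "b", "c"), each = 20), 2))
idx <- balancedSampleIdx(meta, c("dataset", "cluster"), maxCells = 5)
```

With two datasets, three clusters and 20 cells per combination, maxCells = 5 yields 2 \times 3 \times 5 = 30 indices, matching the upper bound stated above.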

Usage

downsample(
  object,
  balance = NULL,
  maxCells = 1000,
  useDatasets = NULL,
  seed = 1,
  returnIndex = FALSE,
  ...
)

Arguments

object

liger object

balance

Character vector of categorical variable names in cellMeta slot, to subsample maxCells cells from each combination of all specified variables. Default NULL samples maxCells cells from the whole object.

maxCells

Max number of cells to sample from the grouping based on balance.

useDatasets

Index selection of datasets to include. Default NULL for using all datasets.

seed

Random seed for reproducibility. Default 1.

returnIndex

Logical, whether to only return the numeric index that can subset the original object instead of a subset object. Default FALSE.

...

Arguments passed to subsetLiger, where cellIdx is occupied by internal implementation.

Value

By default, a subset of liger object. Alternatively when returnIndex = TRUE, a numeric vector to be used with the original object.

Examples

# Subsetting an object
pbmc <- downsample(pbmc)
# Creating a subsetting index
sampleIdx <- downsample(pbmcPlot, balance = "leiden_cluster",
                        maxCells = 10, returnIndex = TRUE)
plotClusterDimRed(pbmcPlot, cellIdx = sampleIdx)

Export predicted gene-pair interaction

Description

Export the predicted gene-pair interactions calculated by upstream function linkGenesAndPeaks into an Interact Track file which is compatible with UCSC Genome Browser.

Usage

exportInteractTrack(
  corrMat,
  pathToCoords,
  useGenes = NULL,
  outputPath = getwd()
)

Arguments

corrMat

A sparse matrix of correlation with peak names as rows and gene names as columns.

pathToCoords

Path to the gene coordinates file.

useGenes

Character vector of gene names to be exported. Default NULL uses all genes available in corrMat.

outputPath

Path of filename where the output file will be stored. If a folder, a file named "Interact_Track.bed" will be created. Default current working directory.

Value

No return value. A file located at outputPath will be created.

Examples

bmmc <- normalize(bmmc)
bmmc <- selectGenes(bmmc)
bmmc <- scaleNotCenter(bmmc)
if (requireNamespace("RcppPlanc", quietly = TRUE) &&
    requireNamespace("GenomicRanges", quietly = TRUE) &&
    requireNamespace("IRanges", quietly = TRUE) &&
    requireNamespace("psych", quietly = TRUE)) {
    bmmc <- runINMF(bmmc)
    bmmc <- alignFactors(bmmc)
    bmmc <- normalizePeak(bmmc)
    bmmc <- imputeKNN(bmmc, reference = "atac", queries = "rna")
    corr <- linkGenesAndPeaks(
        bmmc, useDataset = "rna",
        pathToCoords = system.file("extdata/hg19_genes.bed", package = "rliger")
    )
    resultPath <- tempfile()
    exportInteractTrack(
        corrMat = corr,
        pathToCoords = system.file("extdata/hg19_genes.bed", package = "rliger"),
        outputPath = resultPath
    )
    head(read.table(resultPath, skip = 1))
}

Test all factors for enrichment in a gene set

Description

This function takes the factorized W matrix, with gene loading in factors, to get the ranked gene list for each factor. Then it runs a simple GSEA implementation against given gene sets. So if genes in the given gene set are top loaded in a factor, this function will return a high positive enrichment score (ES) as well as a significant p-value.

For the returned result object, use print() or summary() to show concise results, and use plot() to visualize the GSEA statistics.

This function can be useful in various scenarios:

For example, when clusters with strong cell cycle activity are detected, users can apply this function with cell cycle gene sets to identify if any factor is enriched with such genes. Then in the downstream when aligning the iNMF factor loadings, users can simply opt to exclude these factors so the variation in cell cycle is regressed out. Cell cycle gene sets such as ccGeneHuman are delivered in the package for convenience.

In other cases, this function can also be used to understand the biological meaning of each cluster. Since the downstream clustering result is largely determined by the top loaded factor in each cell, understanding what genes are loaded in the top factor helps understand the identity and activity of the cell. This will require users to have their own gene sets prepared.
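To make the enrichment score concrete, the following base R sketch computes a classic unweighted running-sum enrichment score for one factor's gene loadings, plus a permutation p-value from random gene sets of the same size. Both helpers are hypothetical; rliger's factorGSEA() may weight genes, normalize scores (NES) and compute p-values differently.

```r
# Hypothetical helpers, not rliger API.
# loadings: named numeric vector of gene loadings (one factor, i.e. one column
# of W); geneSet: character vector of gene names.
enrichmentScore <- function(loadings, geneSet) {
  ranked <- names(sort(loadings, decreasing = TRUE))
  hit <- ranked %in% geneSet
  nHit <- sum(hit)
  nMiss <- length(ranked) - nHit
  stopifnot(nHit > 0, nMiss > 0)
  # Step up 1/nHit at each gene-set member, down 1/nMiss otherwise
  running <- cumsum(ifelse(hit, 1 / nHit, -1 / nMiss))
  running[which.max(abs(running))]   # signed maximum deviation from zero
}

# One-sided permutation p-value against random gene sets of equal size
permPvalue <- function(loadings, geneSet, nPerm = 1000, seed = 1) {
  set.seed(seed)
  es <- enrichmentScore(loadings, geneSet)
  k <- sum(names(loadings) %in% geneSet)
  null <- replicate(nPerm, enrichmentScore(loadings, sample(names(loadings), k)))
  (sum(null >= es) + 1) / (nPerm + 1)  # add-one correction avoids p = 0
}

loadings <- setNames(10:1, paste0("g", 1:10))
enrichmentScore(loadings, c("g1", "g2", "g3"))  # set genes are top loaded
```

A gene set concentrated at the top of a factor's ranking pushes the running sum to its maximum early, giving a high positive ES and a small permutation p-value, which is the behavior described above.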

Usage

factorGSEA(
  object,
  geneSet,
  nPerm = 1000,
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE)
)

Arguments

object

A liger object with factorized W matrix available.

geneSet

A character vector for a single gene set, or a list of character vectors for multiple gene sets.

nPerm

Integer number for number of permutations to estimate p-value. Default 1000.

seed

Integer number for random seed. Default 1. Set to NULL to not set seed.

verbose

Logical, whether to print progress bar. Default getOption('ligerVerbose') otherwise TRUE.

Value

If geneSet is a single character vector, returns a data frame with enrichment score (ES), normalized enrichment score (NES), and p-value for the test in each factor. If geneSet is a list, returns a list of such data frames.

Examples

pbmc <- pbmc %>%
    selectBatchHVG() %>%
    scaleNotCenter() %>%
    runINMF()
factorGSEAres <- factorGSEA(pbmc, ccGeneHuman)
# Print summary of significant results
print(factorGSEAres)
summary(factorGSEAres)
# Make GSEA plot for certain gene set and factor
plot(factorGSEAres, geneSetName = 'g2m.genes', useFactor = 'Factor_1')

Find shared and dataset-specific markers

Description

Applies various filters to genes on the shared (W) and dataset-specific (V) components of the factorization, before selecting those which load most significantly on each factor (in a shared or dataset-specific way).

Usage

getFactorMarkers(
  object,
  dataset1,
  dataset2,
  factorShareThresh = 10,
  datasetSpecificity = NULL,
  logFCThresh = 1,
  pvalThresh = 0.05,
  nGenes = 30,
  printGenes = FALSE,
  verbose = getOption("ligerVerbose", TRUE),
  factor.share.thresh = factorShareThresh,
  dataset.specificity = datasetSpecificity,
  log.fc.thresh = logFCThresh,
  pval.thresh = pvalThresh,
  num.genes = nGenes,
  print.genes = printGenes
)

Arguments

object

liger object with factorization results.

dataset1

Name of first dataset. Required.

dataset2

Name of second dataset. Required.

factorShareThresh

Numeric. Only factors with a dataset specificity less than or equal to this threshold will be used. Default 10.

datasetSpecificity

Numeric vector. Pre-calculated dataset specificity, if available. Length should match the number of all available factors. Default NULL automatically calculates it with calcDatasetSpecificity.

logFCThresh

Numeric. Lower log-fold change threshold for differential expression in markers. Default 1.

pvalThresh

Numeric. Upper p-value threshold for the Wilcoxon rank test for gene expression. Default 0.05.

nGenes

Integer. Maximum number of genes to report for each dataset. Default 30.

printGenes

Logical. Whether to print ordered markers passing logFC, UMI and frac thresholds when verbose = TRUE. Default FALSE.

verbose

Logical. Whether to show information of the progress. Default getOption("ligerVerbose") or TRUE if users have not set it.

factor.share.thresh, dataset.specificity, log.fc.thresh, pval.thresh, num.genes, print.genes

Deprecated. See Usage section for replacement.
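The logFCThresh, pvalThresh and nGenes arguments act as successive filters on candidate marker genes. The base-R sketch below applies the same kind of filter to two toy expression matrices so that the effect of each threshold is visible. It is not getFactorMarkers itself. The helper name filterMarkersSketch, the log2(mean + 1) fold-change formula and the use of stats::wilcox.test on raw values are hypothetical choices, and the factor-based selection on W and V is omitted entirely.

```r
# Hypothetical helper, not part of rliger. expr1/expr2: gene x cell matrices
# sharing row names, one per dataset.
filterMarkersSketch <- function(expr1, expr2, logFCThresh = 1,
                                pvalThresh = 0.05, nGenes = 30) {
  genes <- intersect(rownames(expr1), rownames(expr2))
  scores <- vapply(genes, function(g) {
    x <- expr1[g, ]
    y <- expr2[g, ]
    # Assumed fold-change definition: difference of log2(mean + 1)
    logFC <- log2(mean(x) + 1) - log2(mean(y) + 1)
    # Ties make the exact test unavailable; silence that warning
    pval <- suppressWarnings(wilcox.test(x, y)$p.value)
    c(logFC = logFC, pval = pval)
  }, numeric(2))
  res <- data.frame(gene = genes, logFC = scores["logFC", ],
                    pval = scores["pval", ], row.names = NULL)
  # Keep genes passing both thresholds, then report the top nGenes by logFC
  res <- res[which(res$logFC >= logFCThresh & res$pval <= pvalThresh), ]
  head(res[order(res$logFC, decreasing = TRUE), ], nGenes)
}

expr1 <- rbind(A = 11:20, B = 1:10, C = 2:11)
expr2 <- rbind(A = 1:10,  B = 1:10, C = 1:10)
filterMarkersSketch(expr1, expr2)  # only gene A passes both thresholds
```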

Value

A list object consisting of the following entries:

value of dataset1

data.frame of dataset1-specific markers

shared

data.frame of shared markers

value of dataset2

data.frame of dataset2-specific markers

num_factors_V1

A frequency table indicating the number of factors in which each marker appears, in dataset1.

num_factors_V2

A frequency table indicating the number of factors in which each marker appears, in dataset2.

Examples

library(dplyr)
result <- getFactorMarkers(pbmcPlot, dataset1 = "ctrl", dataset2 = "stim")
print(class(result))
print(names(result))
result$shared %>% group_by(factor_num) %>% top_n(2, logFC)

Calculate proportion mitochondrial contribution

Description

Calculates proportion of mitochondrial contribution based on raw or normalized data.

Usage

getProportionMito(object, use.norm = FALSE, pattern = "^mt-")

Arguments

object

liger object.

use.norm

Deprecated. Whether to use cell-normalized data in calculating the contribution. Default FALSE.

pattern

Regex pattern for identifying mitochondrial genes. Default "^mt-" for mouse.

Value

Named vector containing the proportion of mitochondrial contribution for each cell.
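The proportion is the share of each cell's counts that falls on genes whose names match pattern. Below is a minimal base-R sketch on a small dense matrix. The helper name proportionMitoSketch is hypothetical. The real function works on the liger object's own (possibly sparse) matrices. For human data, mitochondrial gene symbols are conventionally prefixed "MT-", so a pattern such as "^MT-" would replace the mouse default.

```r
# Hypothetical helper, not part of rliger.
# counts: gene x cell matrix with gene names as row names.
proportionMitoSketch <- function(counts, pattern = "^mt-") {
  isMito <- grepl(pattern, rownames(counts))
  # Mitochondrial counts per cell divided by total counts per cell
  colSums(counts[isMito, , drop = FALSE]) / colSums(counts)
}

counts <- matrix(c(1, 3, 6,
                   0, 5, 5),
                 nrow = 3,
                 dimnames = list(c("mt-Co1", "Actb", "Gapdh"),
                                 c("cell1", "cell2")))
proportionMitoSketch(counts)  # returns c(cell1 = 0.1, cell2 = 0)
```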

Note

getProportionMito will be deprecated because runGeneralQC generally covers and expands its use case.

Examples

# Example dataset does not contain MT genes, expected to see a message
pbmc$mito <- getProportionMito(pbmc)

Import prepared publicly available datasets

Description

These are functions to download example datasets that are subset from public data.

PBMC - Downsampled from GSE96583, Kang et al, Nature Biotechnology, 2018. Contains two scRNAseq datasets.
BMMC - Downsampled from GSE139369, Granja et al, Nature Biotechnology, 2019. Contains two scRNAseq datasets and one scATAC dataset.
CGE - Downsampled from GSE97179, Luo et al, Science, 2017. Contains one scRNAseq dataset and one DNA methylation dataset.

Usage

importPBMC(
  dir = getwd(),
  overwrite = FALSE,
  method = "libcurl",
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
importBMMC(
  dir = getwd(),
  overwrite = FALSE,
  method = "libcurl",
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
importCGE(
  dir = getwd(),
  overwrite = FALSE,
  method = "libcurl",
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

Arguments

dir

Path to download datasets. Default current working directory getwd().

overwrite

Logical. If a file exists at the corresponding download location, whether to re-download it or directly use this file. Default FALSE.

method

method argument directly passed to download.file. Default "libcurl"; other options might not work depending on the platform.

verbose

Logical. Whether to show information of the progress. Default getOption("ligerVerbose") or TRUE if users have not set it.

...

Additional arguments passed to download.file.

Value

Constructed liger object with QC performed and missing data removed.

Examples

pbmc <- importPBMC()
bmmc <- importBMMC()
cge <- importCGE()

Impute the peak counts from gene expression data referring to an ATAC dataset after integration

Description

This function is designed for creating peak data for a dataset with only gene expression. It uses the aligned cell factor loadings to find nearest neighbors between cells from the queried dataset (without peak data) and cells from the reference dataset (with peak data), and then imputes the peaks for the former based on the neighbor weights. Therefore, the selected reference dataset must be of the "atac" modality setting.
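The nearest-neighbor imputation idea can be sketched in base R: for each query cell, find its nearest reference cells in the aligned factor space and average their peak profiles. This is a hypothetical illustration, not imputeKNN's implementation. The helper name imputeKNNSketch, the cell-by-factor orientation of the inputs, the brute-force Euclidean search and the inverse-distance weights used when weight = TRUE are all assumptions. The normalization step (norm = TRUE) is omitted.

```r
# Hypothetical helper, not part of rliger.
# Hquery: query cells x factors; Href: reference cells x factors (aligned);
# peakRef: peaks x reference cells.
imputeKNNSketch <- function(Hquery, Href, peakRef, nNeighbors = 20,
                            weight = TRUE) {
  k <- min(nNeighbors, nrow(Href))
  out <- matrix(0, nrow = nrow(peakRef), ncol = nrow(Hquery),
                dimnames = list(rownames(peakRef), rownames(Hquery)))
  for (j in seq_len(nrow(Hquery))) {
    # Euclidean distance from query cell j to every reference cell
    d <- sqrt(rowSums(sweep(Href, 2, Hquery[j, ])^2))
    nn <- order(d)[seq_len(k)]
    # Assumed weighting: inverse distance (small offset avoids 1/0)
    w <- if (weight) 1 / (d[nn] + 1e-8) else rep(1, k)
    # Weighted average of the neighbors' peak profiles
    out[, j] <- peakRef[, nn, drop = FALSE] %*% (w / sum(w))
  }
  out
}

Href <- rbind(r1 = c(0, 0), r2 = c(0, 0.1), r3 = c(10, 10), r4 = c(10, 10.1))
peakRef <- rbind(peak1 = c(1, 1, 0, 0), peak2 = c(0, 0, 5, 5))
Hquery <- rbind(q1 = c(0, 0.05), q2 = c(10, 10.05))
# q1 gets about (peak1 = 1, peak2 = 0); q2 gets about (peak1 = 0, peak2 = 5)
imputeKNNSketch(Hquery, Href, peakRef, nNeighbors = 2)
```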

Usage

imputeKNN(
  object,
  reference,
  queries = NULL,
  nNeighbors = 20,
  weight = TRUE,
  norm = TRUE,
  scale = FALSE,
  verbose = getOption("ligerVerbose", TRUE),
  ...,
  knn_k = nNeighbors
)

Arguments

object

liger object with aligned factor loadings computed in advance.

reference

Name of a dataset containing peak data to impute into the query dataset(s).

queries

Names of datasets to be augmented by imputation. Should not include reference. Default NULL uses all datasets except the reference.

nNeighbors

The maximum number of nearest neighbors to search. Default 20.

weight

Logical. Whether to use KNN distances as the weight matrix. Default TRUE.

norm

Logical. Whether to normalize the imputed data. Default TRUE.

scale

Logical. Whether to scale but not center the imputed data. Default FALSE.

verbose

Logical. Whether to show information of the progress. Default getOption("ligerVerbose") or TRUE if users have not set it.

...

Optional arguments to be passed to normalize when norm = TRUE.

knn_k

Deprecated. See Usage section for replacement.

Value

The input object where the queried ligerDataset objects in the datasets slot are replaced. These datasets will all be converted to the ligerATACDataset class with an additional slot rawPeak to store the imputed peak counts, and normPeak for the normalized imputed peak counts if norm = TRUE.

Examples

bmmc <- normalize(bmmc)
bmmc <- selectGenes(bmmc, datasets.use = "rna")
bmmc <- scaleNotCenter(bmmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    bmmc <- runINMF(bmmc, k = 20)
    bmmc <- alignFactors(bmmc)
    bmmc <- normalizePeak(bmmc)
    bmmc <- imputeKNN(bmmc, reference = "atac", queries = "rna")
}

Check if given liger object is under new implementation

Description

Check if given liger object is under new implementation

Usage

is.newLiger(object)

Arguments

object

A liger object

Value

TRUE if the version of object is later than or equal to 1.99.0, otherwise FALSE. It raises an error if the input object is not of liger class.
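The ">= 1.99.0" rule is a version comparison, not a string comparison. Base R's package_version class compares versions component by component, which avoids pitfalls such as "1.100.0" sorting before "1.99.0" as plain text. The helper below is hypothetical and takes a version string rather than a liger object.

```r
# Hypothetical helper, not part of rliger: applies the ">= 1.99.0" rule
# to a version string.
isNewVersionSketch <- function(version) {
  as.package_version(version) >= "1.99.0"
}

isNewVersionSketch("2.2.1")    # TRUE
isNewVersionSketch("1.0.1")    # FALSE
isNewVersionSketch("1.100.0")  # TRUE, although it sorts first as a plain string
```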

Examples

is.newLiger(pbmc) # TRUE

Check if a liger or ligerDataset object is made of HDF5 files

Description

Check if a liger or ligerDataset object is made of HDF5 files

Usage

isH5Liger(object, dataset = NULL)

Arguments

object

A liger or ligerDataset object.

dataset

If object is of liger class, check a specific dataset. If NULL, check if all datasets are made of HDF5 files. Default NULL.

Value

TRUE or FALSE for the specified check.

Examples

isH5Liger(pbmc)
isH5Liger(pbmc, "ctrl")
ctrl <- dataset(pbmc, "ctrl")
isH5Liger(ctrl)

liger class

Description

The liger object is the main data container for LIGER analysis in R. The slot datasets is a list where each element should be a ligerDataset object containing dataset-specific information, such as the expression matrices. The other parts of the liger object store information that can be shared across the analysis, such as the cell metadata.

This manual explains the liger object structure as well as the usage of class-specific methods. Please see the detailed sections for more information.

For liger objects created with older versions of the rliger package, please try updating the objects individually with convertOldLiger.

Usage

datasets(x, check = NULL)
datasets(x, check = TRUE) <- value
dataset(x, dataset = NULL)
dataset(x, dataset, type = NULL, qc = TRUE) <- value
cellMeta(
  x,
  columns = NULL,
  useDatasets = NULL,
  cellIdx = NULL,
  as.data.frame = FALSE,
  ...
)
cellMeta(
  x,
  columns = NULL,
  useDatasets = NULL,
  cellIdx = NULL,
  inplace = FALSE,
  check = FALSE
) <- value
defaultCluster(x, useDatasets = NULL, ...)
defaultCluster(x, name = NULL, useDatasets = NULL, ...) <- value
dimReds(x)
dimReds(x) <- value
dimRed(x, name = NULL, useDatasets = NULL, cellIdx = NULL, ...)
dimRed(x, name = NULL, useDatasets = NULL, cellIdx = NULL, ...) <- value
defaultDimRed(x, useDatasets = NULL, cellIdx = NULL)
defaultDimRed(x) <- value
varFeatures(x)
varFeatures(x, check = TRUE) <- value
varUnsharedFeatures(x, dataset = NULL)
varUnsharedFeatures(x, dataset, check = TRUE) <- value
commands(x, funcName = NULL, arg = NULL)
## S4 method for signature 'liger'
show(object)
## S4 method for signature 'liger'
dim(x)
## S4 method for signature 'liger'
dimnames(x)
## S4 replacement method for signature 'liger,list'
dimnames(x) <- value
## S4 method for signature 'liger'
datasets(x, check = NULL)
## S4 replacement method for signature 'liger,logical'
datasets(x, check = TRUE) <- value
## S4 replacement method for signature 'liger,missing'
datasets(x, check = TRUE) <- value
## S4 method for signature 'liger,character_OR_NULL'
dataset(x, dataset = NULL)
## S4 method for signature 'liger,missing'
dataset(x, dataset = NULL)
## S4 method for signature 'liger,numeric'
dataset(x, dataset = NULL)
## S4 replacement method for signature 'liger,character,missing,ANY,ligerDataset'
dataset(x, dataset, type = NULL, qc = TRUE) <- value
## S4 replacement method for signature 'liger,character,ANY,ANY,matrixLike'
dataset(x, dataset, type = c("rawData", "normData"), qc = FALSE) <- value
## S4 replacement method for signature 'liger,character,missing,ANY,NULL'
dataset(x, dataset, type = NULL, qc = TRUE) <- value
## S3 method for class 'liger'
names(x)
## S3 replacement method for class 'liger'
names(x) <- value
## S3 method for class 'liger'
length(x)
## S3 method for class 'liger'
lengths(x, use.names = TRUE)
## S4 method for signature 'liger,NULL'
cellMeta(
  x,
  columns = NULL,
  useDatasets = NULL,
  cellIdx = NULL,
  as.data.frame = FALSE,
  ...
)
## S4 method for signature 'liger,character'
cellMeta(
  x,
  columns = NULL,
  useDatasets = NULL,
  cellIdx = NULL,
  as.data.frame = FALSE,
  ...
)
## S4 method for signature 'liger,missing'
cellMeta(
  x,
  columns = NULL,
  useDatasets = NULL,
  cellIdx = NULL,
  as.data.frame = FALSE,
  ...
)
## S4 replacement method for signature 'liger,missing'
cellMeta(x, columns = NULL, useDatasets = NULL, cellIdx = NULL, check = FALSE) <- value
## S4 replacement method for signature 'liger,character'
cellMeta(
  x,
  columns = NULL,
  useDatasets = NULL,
  cellIdx = NULL,
  inplace = TRUE,
  check = FALSE
) <- value
## S4 method for signature 'liger'
rawData(x, dataset = NULL)
## S4 replacement method for signature 'liger,ANY,ANY,matrixLike_OR_NULL'
rawData(x, dataset = NULL, check = TRUE) <- value
## S4 replacement method for signature 'liger,ANY,ANY,H5D'
rawData(x, dataset = NULL, check = TRUE) <- value
## S4 method for signature 'liger'
normData(x, dataset = NULL)
## S4 replacement method for signature 'liger,ANY,ANY,matrixLike_OR_NULL'
normData(x, dataset = NULL, check = TRUE) <- value
## S4 replacement method for signature 'liger,ANY,ANY,H5D'
normData(x, dataset = NULL, check = TRUE) <- value
## S4 method for signature 'liger,ANY'
scaleData(x, dataset = NULL)
## S4 replacement method for signature 'liger,ANY,ANY,matrixLike_OR_NULL'
scaleData(x, dataset = NULL, check = TRUE) <- value
## S4 replacement method for signature 'liger,ANY,ANY,H5D'
scaleData(x, dataset = NULL, check = TRUE) <- value
## S4 replacement method for signature 'liger,ANY,ANY,H5Group'
scaleData(x, dataset = NULL, check = TRUE) <- value
## S4 method for signature 'liger,character'
scaleUnsharedData(x, dataset = NULL)
## S4 method for signature 'liger,numeric'
scaleUnsharedData(x, dataset = NULL)
## S4 replacement method for signature 'liger,ANY,ANY,matrixLike_OR_NULL'
scaleUnsharedData(x, dataset = NULL, check = TRUE) <- value
## S4 replacement method for signature 'liger,ANY,ANY,H5D'
scaleUnsharedData(x, dataset = NULL, check = TRUE) <- value
## S4 replacement method for signature 'liger,ANY,ANY,H5Group'
scaleUnsharedData(x, dataset = NULL, check = TRUE) <- value
## S4 method for signature 'liger,ANY,ANY,ANY'
getMatrix(
  x,
  slot = c("rawData", "normData", "scaleData", "scaleUnsharedData", "H", "V", "U", "A",
    "B", "W", "H.norm", "rawPeak", "normPeak"),
  dataset = NULL,
  returnList = FALSE
)
## S4 method for signature 'liger,ANY'
getH5File(x, dataset = NULL)
## S3 replacement method for class 'liger'
x[[i]] <- value
## S3 method for class 'liger'
x$name
## S3 replacement method for class 'liger'
x$name <- value
## S4 method for signature 'liger'
defaultCluster(x, useDatasets = NULL, droplevels = FALSE, ...)
## S4 replacement method for signature 'liger,ANY,ANY,character'
defaultCluster(x, name = NULL, useDatasets = NULL, ...) <- value
## S4 replacement method for signature 'liger,ANY,ANY,factor'
defaultCluster(x, name = NULL, useDatasets = NULL, droplevels = TRUE, ...) <- value
## S4 replacement method for signature 'liger,ANY,ANY,NULL'
defaultCluster(x, name = NULL, useDatasets = NULL, ...) <- value
## S4 method for signature 'liger'
dimReds(x)
## S4 replacement method for signature 'liger,list'
dimReds(x) <- value
## S4 method for signature 'liger,missing_OR_NULL'
dimRed(x, name = NULL, useDatasets = NULL, cellIdx = NULL, ...)
## S4 method for signature 'liger,index'
dimRed(x, name = NULL, useDatasets = NULL, cellIdx = NULL, ...)
## S4 replacement method for signature 'liger,index,ANY,ANY,NULL'
dimRed(x, name = NULL, useDatasets = NULL, cellIdx = NULL, ...) <- value
## S4 replacement method for signature 'liger,character,ANY,ANY,matrixLike'
dimRed(
  x,
  name = NULL,
  useDatasets = NULL,
  cellIdx = NULL,
  asDefault = NULL,
  inplace = FALSE,
  ...
) <- value
## S4 method for signature 'liger'
defaultDimRed(x, useDatasets = NULL, cellIdx = NULL)
## S4 replacement method for signature 'liger,character'
defaultDimRed(x) <- value
## S4 method for signature 'liger'
varFeatures(x)
## S4 replacement method for signature 'liger,ANY,character'
varFeatures(x, check = TRUE) <- value
## S4 method for signature 'liger,ANY'
varUnsharedFeatures(x, dataset = NULL)
## S4 replacement method for signature 'liger,ANY,ANY,character'
varUnsharedFeatures(x, dataset, check = TRUE) <- value
## S3 method for class 'liger'
fortify(model, data, ...)
## S3 method for class 'liger'
c(...)
## S4 method for signature 'liger'
commands(x, funcName = NULL, arg = NULL)
## S4 method for signature 'ligerDataset,missing'
varUnsharedFeatures(x, dataset = NULL)
## S4 replacement method for signature 'ligerDataset,missing,ANY,character'
varUnsharedFeatures(x, dataset = NULL, check = TRUE) <- value

Arguments

x,object,model

A liger object

check

Logical, whether to perform an object validity check on setting a new value. Users are not supposed to set FALSE here.

value

Metadata value to be inserted

dataset

Name or numeric index of a dataset

type

When using dataset<- with a matrix-like value, specify what type the matrix is. Choose from "rawData", "normData" or "scaleData".

qc

Logical, whether to perform general QC on the newly added dataset.

columns

The names of available variables in the cellMeta slot. When as.data.frame = TRUE, please use variable names after coercion.

useDatasets

The setter or getter method should only apply to cells in the specified datasets. Any valid character, numeric or logical subscript is acceptable. Default NULL works with all datasets.

cellIdx

Valid cell subscript to subset retrieved variables. Default NULL uses all cells.

as.data.frame

Logical, whether to apply as.data.frame to the subset result. Default FALSE.

...

See detailed sections for explanation.

inplace

For the cellMeta<- method, when columns refers to an existing variable and useDatasets or cellIdx indicate a partial insertion into the object, whether to insert value in place into the variable for the selected cells (TRUE, by default) or to replace the whole variable, leaving the non-selected part as NA.

name

The name of an available variable in the cellMeta slot or the name of a new variable to store.

funcName,arg

See Command records section.

use.names

Whether returned vector should be named with dataset names.

slot

Name of slot to retrieve matrix from. Options shown in Usage.

returnList

Logical, whether to force returning a list even when only one dataset-specific matrix (i.e. expression matrices, H, V or U) is requested. Default FALSE.

i

Name or numeric index of cell meta variable to be replaced

droplevels

Whether to remove unused cluster levels from the factor object fetched by defaultCluster(). Default FALSE.

asDefault

Whether to set the inserted dimension reduction matrix as the default for visualization methods. Default NULL sets it when no default has been set yet, and otherwise does not change the current default.

data

Argument required by the fortify method. Not used.

Value

See detailed sections for explanation.

Input liger object updated with the replaced or new variable in cellMeta(x).

Slots

datasets: List of ligerDataset objects. Use the generics dataset, dataset<-, datasets or datasets<- to interact with it. See the detailed section accordingly.
cellMeta: DFrame object for cell metadata. Pre-existing metadata, QC metrics, cluster labeling, etc. are all stored here. Use the generics cellMeta, cellMeta<-, $, [[ or [[<- to interact with it. See the detailed section accordingly.
varFeatures: Character vector of names of variable features. Use the generics varFeatures or varFeatures<- to interact with it. See the detailed section accordingly.
W: iNMF output matrix of shared gene loadings for each factor. See runIntegration.
H.norm: Matrix of aligned factor loadings for each cell. See alignFactors and runIntegration.
commands: List of ligerCommand objects. Record of analysis. Use commands to retrieve information. See the detailed section accordingly.
uns: List for unstructured meta-info of analyses or presets.
version: Record of the version of the rliger package.

Dataset access

The datasets() method only accesses the datasets slot, the list of ligerDataset objects. The dataset() method accesses a single dataset, with subsequent cell metadata updates and checks bound to adding or modifying a dataset. Therefore, when users want to modify something inside a ligerDataset while no cell metadata change should happen, it is recommended to use datasets(x)[[name]] <- ligerD for efficiency, though the result would be the same as dataset(x, name) <- ligerD.

The length() and names() methods are implemented to access the number and names of datasets. The names<- method is supported for modifying dataset names, and it also takes care of the "dataset" variable in the cell metadata.

Matrix access

For a liger object, the rawData(), normData(), scaleData() and scaleUnsharedData() methods are exported for users to access the corresponding feature expression matrix of one specified dataset. For retrieving one type of matrix from multiple datasets, please use the getMatrix() method.

When only one matrix is expected to be retrieved by getMatrix(), the matrix itself will be returned. A list will be returned if multiple matrices are requested (by querying multiple datasets) or returnList is set to TRUE.
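The single-versus-list return rule of getMatrix() can be mimicked on a plain named list of matrices. The helper getMatrixSketch below is hypothetical and reproduces only that return rule, not the slot lookup inside liger objects.

```r
# Hypothetical helper, not part of rliger: `matList` is a named list of
# matrices, one per dataset.
getMatrixSketch <- function(matList, dataset = NULL, returnList = FALSE) {
  if (is.null(dataset)) dataset <- names(matList)
  out <- matList[dataset]
  # A single matrix is unwrapped unless a list is explicitly requested
  if (length(out) == 1 && !isTRUE(returnList)) out[[1]] else out
}

mats <- list(ctrl = diag(2), stim = diag(3))
class(getMatrixSketch(mats, "ctrl"))                     # a matrix
class(getMatrixSketch(mats, "ctrl", returnList = TRUE))  # "list"
length(getMatrixSketch(mats))                            # 2
```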

Cell metadata access

Three approaches are provided for accessing cell metadata. A generic function cellMeta is implemented with plenty of options and multi-variable accessibility. Besides, users can use the double-bracket (e.g. ligerObj[[varName]]) or dollar-sign (e.g. ligerObj$nUMI) operators to access or modify single variables.

For users' convenience in generating a customized ggplot with available cell metadata, the S3 method fortify.liger is implemented. With this under the hood, users can create simple ggplots by directly starting with ggplot(ligerObj, aes(...)), where cell metadata variables can be passed directly into aes().

Special partial metadata insertion is implemented specifically for mapping categorical annotation from a sub-population (subset object) back to the original experiment (full-size object). For example, when sub-clustering and annotation are done for a specific cell type (stored in subobj) subset from an experiment (stored as obj), users can do cellMeta(obj, "sub_ann", cellIdx = colnames(subobj)) <- subobj$sub_ann to map the values back, leaving the other cells non-annotated with NAs. Plotting with this variable will then also show NA cells with the default grey color. Furthermore, sub-clustering labels for other cell types can also be mapped to the same variable, for example, cellMeta(obj, "sub_ann", cellIdx = colnames(subobj2)) <- subobj2$sub_ann. As long as the labeling variables are stored as factor class (categorical), the levels (category names) will be properly handled and merged. Other situations follow the R default behavior (e.g. categories might be converted to integer numbers if mapped to a numerical variable in the original object). Note that this feature is only available when using the generic function cellMeta, not with the `[[` or `$` accessing methods, due to syntax reasons.
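The level-merging behavior for factor labels can be illustrated with plain base-R factors. The sketch below uses a hypothetical helper, mapSubLabels, that works on a named factor instead of a liger object. It inserts labels for a subset of cells, keeps the other cells as NA and takes the union of category levels, roughly mirroring the two cellMeta<- calls above.

```r
# Hypothetical helper, not part of rliger.
# full: factor named by cell barcode; labels: factor for the cells in cellIdx.
mapSubLabels <- function(full, cellIdx, labels) {
  # Union of existing and incoming levels so no category is lost
  lv <- union(levels(full), levels(labels))
  out <- factor(as.character(full), levels = lv)
  names(out) <- names(full)
  # Insert the new labels for the selected cells only
  out[cellIdx] <- as.character(labels)
  out
}

cells <- paste0("cell", 1:5)
ann <- setNames(factor(rep(NA, 5)), cells)
ann <- mapSubLabels(ann, c("cell2", "cell3"), factor(c("T_naive", "T_mem")))
ann <- mapSubLabels(ann, "cell5", factor("B_mem"))
ann  # cell1 and cell4 stay NA
```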

The generic defaultCluster works as both getter and setter. As a setter, users can do defaultCluster(obj) <- "existingVariableName" to set a categorical variable as the default cluster used for visualization or downstream analysis. Users can also do defaultCluster(obj, "newVarName") <- factorOfLabels to push new labeling into the object and set it as default. As a getter, the function returns a factor object of the default cluster labeling. Argument useDatasets can be used to require that the given or retrieved labeling match the cells in the specified datasets. We generally don't recommend setting "dataset" as a default cluster because it is a reserved (always existing) field in the metadata and can lead to meaningless results when running analyses that utilize both clustering information and dataset source information.

Dimension reduction access

Currently, low-dimensional representations of cells, presented as dense matrices, are all stored in the dimReds slot and can be accessed with the generics dimRed and dimRed<-. Adding a dimRed to the object is as simple as dimRed(obj, "name") <- matrixLike. It can be retrieved with dimRed(obj, "name"). Similar to having a default cluster labeling, a default dimRed is also supported. It can be set with defaultDimRed(obj) <- "existingMatLikeVar" and the matrix can be retrieved with defaultDimRed(obj).

Variable feature access

The varFeatures slot allows for character vectors of gene names. varFeatures(x) returns this vector, and value for the varFeatures<- method has to be a character vector or NULL. When check = TRUE, the replacement method checks gene name consistency across the scaleData, H and V slots of the inner ligerDataset objects as well as the W and H.norm slots of the input liger object.

Command records

rliger functions that perform calculations and update the liger object will be recorded in a ligerCommand object and stored in the commands slot, a list, of the liger object. The method commands() is implemented to retrieve or show the log history. Running with funcName = NULL (default) returns all command labels. Specifying funcName allows partial matching to all command labels and returns a subset list (of ligerCommand objects) of matches (or the ligerCommand object itself if only one match is found). If arg is further specified, a subset list of parameters from the matches will be returned. For example, requesting a list of resolution values used in all louvain cluster attempts: commands(ligerObj, "louvainCluster", "resolution").
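The lookup rules above (all labels, partial matching, unwrapping a single match, and extracting arg) can be sketched over a plain named list. The helper commandsSketch, the label format "louvainCluster_1" and the parameters layout are hypothetical. Real records are ligerCommand objects, and their labels may be formatted differently.

```r
# Hypothetical helper, not part of rliger.
commandsSketch <- function(cmdLog, funcName = NULL, arg = NULL) {
  # No funcName: list every command label
  if (is.null(funcName)) return(names(cmdLog))
  # Partial (substring) match against the labels
  hits <- cmdLog[grepl(funcName, names(cmdLog), fixed = TRUE)]
  # Optionally keep only the requested parameters of each match
  if (!is.null(arg)) hits <- lapply(hits, function(cmd) cmd$parameters[arg])
  # A single match is returned unwrapped
  if (length(hits) == 1) hits[[1]] else hits
}

cmdLog <- list(
  normalize_1 = list(parameters = list(verbose = TRUE)),
  louvainCluster_1 = list(parameters = list(resolution = 0.5, nNeighbors = 20)),
  louvainCluster_2 = list(parameters = list(resolution = 1, nNeighbors = 20))
)
commandsSketch(cmdLog)
commandsSketch(cmdLog, "louvain", "resolution")
```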

Dimensionality

For a liger object, the column orientation is assigned to cells. Due to the data structure, it is hard to define a row index for the liger object, which might contain datasets that vary in the number of genes.

Therefore, for liger objects, dim and dimnames return NA/NULL for rows and total cell counts/barcodes for the columns.

For a direct call of the dimnames<- method, value should be a list with NULL as the first element and valid cell identifiers as the second element. For the colnames<- method, value should be the character vector of cell identifiers. The rownames<- method is not applicable.

Subsetting

For more detail on subsetting a liger object or a ligerDataset object, please check out subsetLiger and subsetLigerDataset. Here, the S4 "single-bracket" method [ is set as a quick wrapper to subset a liger object. Note that j serves as the cell subscript, which can be any valid index referring to the collection of all cells (i.e. rownames(cellMeta(obj))), while i, the feature subscript, can only be a character vector because the features of each dataset can vary. ... arguments are passed to subsetLiger so that advanced options are allowed.

Combining multiple liger objects

The lists in the datasets slot, the rows of the cellMeta slot and the lists in the commands slot will simply be concatenated. Variable features in the varFeatures slot will be combined by taking their union. The W and H.norm matrices are not taken into account for now.

Examples

# Methods for base generics
pbmcPlot
print(pbmcPlot)
dim(pbmcPlot)
ncol(pbmcPlot)
colnames(pbmcPlot)[1:5]
pbmcPlot[varFeatures(pbmcPlot)[1:10], 1:10]
names(pbmcPlot)
length(pbmcPlot)
# rliger generics
## Retrieving dataset(s), replacement methods available
datasets(pbmcPlot)
dataset(pbmcPlot, "ctrl")
dataset(pbmcPlot, 2)
## Retrieving cell metadata, replacement methods available
cellMeta(pbmcPlot)
head(pbmcPlot[["nUMI"]])
## Retrieving dimension reduction matrix
head(dimRed(pbmcPlot, "UMAP"))
## Retrieving variable features, replacement methods available
varFeatures(pbmcPlot)
## Command record/history
pbmcPlot <- scaleNotCenter(pbmcPlot)
commands(pbmcPlot)
commands(pbmcPlot, funcName = "scaleNotCenter")
# S3 methods
pbmcPlot2 <- pbmcPlot
names(pbmcPlot2) <- paste0(names(pbmcPlot), 2)
c(pbmcPlot, pbmcPlot2)
library(ggplot2)
ggplot(pbmcPlot, aes(x = UMAP_1, y = UMAP_2)) + geom_point()
cellMeta(pbmc)
# Add new variable
pbmc[["newVar"]] <- 1
cellMeta(pbmc)
# Change existing variable
pbmc[["newVar"]][1:3] <- 1:3
cellMeta(pbmc)

Subclass of ligerDataset for ATAC modality

Description

Inherits from the ligerDataset class. Inherited slots are documented with the ligerDataset class.

Slots

rawPeak: sparse matrix
normPeak: sparse matrix

ligerCommand object: Record the input and time of a LIGER function call

Description

ligerCommand object: Record the input and time of a LIGER function call

Usage

## S4 method for signature 'ligerCommand'
show(object)

Arguments

object

A ligerCommand object

Slots

funcName: Name of the function
time: A time stamp object
call: A character string converted from system call
parameters: List of all arguments except the liger object. Large objects are summarized as short strings.
objSummary: List of attributes of the liger object, as a snapshot taken when the command is run.
ligerVersion: Character string converted from packageVersion("rliger").
dependencyVersion: Named character vector of version numbers of any dependency library that the function may invoke. A dependency might only be invoked under certain conditions, such as when an alternative algorithm is used; it is still recorded for a call even if that call does not actually reach it.

Examples

pbmc <- normalize(pbmc)
cmd <- commands(pbmc, "normalize")
cmd

ligerDataset class

Description

Object for storing dataset-specific information. Will be embedded within a higher-level liger object.

Usage

rawData(x, dataset = NULL)
rawData(x, dataset = NULL, check = TRUE) <- value
normData(x, dataset = NULL)
normData(x, dataset = NULL, check = TRUE) <- value
scaleData(x, dataset = NULL)
scaleData(x, dataset = NULL, check = TRUE) <- value
scaleUnsharedData(x, dataset = NULL)
scaleUnsharedData(x, dataset = NULL, check = TRUE) <- value
getMatrix(x, slot = "rawData", dataset = NULL, returnList = FALSE)
h5fileInfo(x, info = NULL)
h5fileInfo(x, info = NULL, check = TRUE) <- value
getH5File(x, dataset = NULL)
## S4 method for signature 'ligerDataset,missing'
getH5File(x, dataset = NULL)
featureMeta(x, check = NULL)
featureMeta(x, check = TRUE) <- value
## S4 method for signature 'ligerDataset'
show(object)
## S4 method for signature 'ligerDataset'
dim(x)
## S4 method for signature 'ligerDataset'
dimnames(x)
## S4 replacement method for signature 'ligerDataset,list'
dimnames(x) <- value
## S4 method for signature 'ligerDataset'
rawData(x, dataset = NULL)
## S4 replacement method for signature 'ligerDataset,ANY,ANY,matrixLike_OR_NULL'
rawData(x, dataset = NULL, check = TRUE) <- value
## S4 replacement method for signature 'ligerDataset,ANY,ANY,H5D'
rawData(x, dataset = NULL, check = TRUE) <- value
## S4 method for signature 'ligerDataset'
normData(x, dataset = NULL)
## S4 replacement method for signature 'ligerDataset,ANY,ANY,matrixLike_OR_NULL'
normData(x, dataset = NULL, check = TRUE) <- value
## S4 replacement method for signature 'ligerDataset,ANY,ANY,H5D'
normData(x, dataset = NULL, check = TRUE) <- value
## S4 method for signature 'ligerDataset,missing'
scaleData(x, dataset = NULL)
## S4 replacement method for signature 'ligerDataset,ANY,ANY,matrixLike_OR_NULL'
scaleData(x, dataset = NULL, check = TRUE) <- value
## S4 replacement method for signature 'ligerDataset,ANY,ANY,H5D'
scaleData(x, dataset = NULL, check = TRUE) <- value
## S4 replacement method for signature 'ligerDataset,ANY,ANY,H5Group'
scaleData(x, dataset = NULL, check = TRUE) <- value
## S4 method for signature 'ligerDataset,missing'
scaleUnsharedData(x, dataset = NULL)
## S4 replacement method for signature 'ligerDataset,missing,ANY,matrixLike_OR_NULL'
scaleUnsharedData(x, check = TRUE) <- value
## S4 replacement method for signature 'ligerDataset,missing,ANY,H5D'
scaleUnsharedData(x, check = TRUE) <- value
## S4 replacement method for signature 'ligerDataset,missing,ANY,H5Group'
scaleUnsharedData(x, check = TRUE) <- value
## S4 method for signature 'ligerDataset,ANY,missing,missing'
getMatrix(
  x,
  slot = c("rawData", "normData", "scaleData", "scaleUnsharedData", "H", "V", "U", "A",
    "B"),
  dataset = NULL
)
## S4 method for signature 'ligerDataset'
h5fileInfo(x, info = NULL)
## S4 replacement method for signature 'ligerDataset'
h5fileInfo(x, info = NULL, check = TRUE) <- value
## S4 method for signature 'ligerDataset'
featureMeta(x, check = NULL)
## S4 replacement method for signature 'ligerDataset'
featureMeta(x, check = TRUE) <- value
## S3 method for class 'ligerDataset'
cbind(x, ..., deparse.level = 1)
## S4 method for signature 'ligerATACDataset,ANY,missing,missing'
getMatrix(
  x,
  slot = c("rawData", "normData", "scaleData", "scaleUnsharedData", "H", "V", "U", "A",
    "B", "rawPeak", "normPeak"),
  dataset = NULL
)

Arguments

x,object

A ligerDataset object.

dataset

Not applicable for ligerDataset methods.

check

Whether to perform object validity check on setting new value.

value

See detailed sections for requirements.

slot

The slot name when using getMatrix.

returnList

Not applicable for ligerDataset methods.

info

Name of the entry in the h5fileInfo slot.

...

See detailed sections for explanation.

deparse.level

Not used here.

Slots

rawData: Raw data. Feature by cell matrix. Most of the time, a sparse matrix of integer numbers for RNA and ATAC data.
normData: Normalized data. Feature by cell matrix. Sparse if the rawData it is normalized from is sparse.
scaleData: Scaled data, usually a subset of shared variable features by cells. Most of the time a sparse matrix of float numbers. This is the data used for iNMF factorization.
scaleUnsharedData: Scaled data of variable features not shared with other datasets. This is the data used for UINMF factorization.
varUnsharedFeatures: Variable features not shared with other datasets.
V: iNMF output matrix holding the dataset-specific gene loading of each factor. Feature by factor matrix.
A: Online iNMF intermediate product matrix.
B: Online iNMF intermediate product matrix.
H: iNMF output matrix holding the factor loading of each cell. Factor by cell matrix.
U: UINMF output matrix holding the unshared variable gene loading of each factor. Feature by factor matrix.
h5fileInfo: List of meta information of the HDF5 file used for constructing the object.
featureMeta: Feature metadata, DataFrame object.
colnames: Character vector of unique cell identifiers.
rownames: Character vector of unique feature names.

Matrix access

For a ligerDataset object, the rawData(), normData(), scaleData() and scaleUnsharedData() methods are exported for users to access the corresponding feature expression matrix. Replacement methods are also available to modify the slots.

For other matrices, such as H and V, which are dataset specific, please use the getMatrix() method and specify the slot name. Directly accessing a slot with @ is generally not recommended.

H5 file and information access

A ligerDataset object has a slot called h5fileInfo, which is a list object. The first element is called $H5File, which is an H5File class object and is the connection to the input file. The second element is $filename, which stores the absolute path of the H5 file on the current machine. The third element, $formatType, stores the name of the preset being used, if applicable. The remaining keys pair with paths in the H5 file that point to specific data for constructing a feature expression matrix.

The h5fileInfo() method accesses the list described above and simply retrieves the corresponding value. When info = NULL, it returns the whole list. When length(info) == 1, it returns the requested list value. When more entries are requested, it returns a subset list.
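The three retrieval cases above can be illustrated with a plain R list. The sketch below is a minimal, hypothetical stand-in (getInfoSketch is not an rliger function, and it never opens an HDF5 file); the entries and paths in the example list are made up.

```r
# Hypothetical helper: mimics the documented h5fileInfo() retrieval rules
# on a plain list. It does not touch any HDF5 file or ligerDataset object.
getInfoSketch <- function(infoList, info = NULL) {
  if (is.null(info)) {
    return(infoList)              # info = NULL: the whole list
  }
  if (length(info) == 1) {
    return(infoList[[info]])      # one key: the value itself
  }
  infoList[info]                  # several keys: a subset list
}

# Made-up entries; a real object stores an H5File connection in $H5File.
exampleInfo <- list(
  H5File = NULL,
  filename = "/path/to/sample.h5",
  formatType = "10X",
  rawData = "matrix/data"
)

getInfoSketch(exampleInfo, "filename")
getInfoSketch(exampleInfo, c("filename", "formatType"))
```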

The replacement method modifies the list elements and the corresponding slot value (if applicable) at the same time. For example, running h5fileInfo(obj, "rawData") <- newPath not only updates the list, but also updates the rawData slot with the H5D class data at "newPath" in the H5File object.

getH5File() is a wrapper and is equivalent to h5fileInfo(obj, "H5File").

Feature metadata access

A featureMeta slot is included in each ligerDataset object. This slot requires a DataFrame-class object, which is the same as the cellMeta slot of a liger object. However, the associated S4 methods only provide access to the whole table for now. Internal information can be accessed in the same way as with data.frame operations, for example, featureMeta(ligerD)$nCell or featureMeta(ligerD)[varFeatures(ligerObj), "gene_var"].

Dimensionality

For a ligerDataset object, columns correspond to cells and rows to features. Therefore, for ligerDataset objects, dim() returns a numeric vector of two numbers, which are the number of features and the number of cells. dimnames() returns a list of two character vectors, which are the feature names and the cell barcodes.

For a direct call of the dimnames<- method, value should be a list with a character vector of feature names as the first element and cell identifiers as the second element. For the colnames<- method, value is the character vector of cell identifiers. For the rownames<- method, it is the character vector of feature names.

Subsetting

For more details on subsetting a liger object or a ligerDataset object, please check out subsetLiger and subsetLigerDataset. Here, we set the S3 method "single-bracket" [ as a quick wrapper to subset a ligerDataset object. i and j serve as feature and cell subscriptors, respectively, which can be any valid index referring to the available features and cells in a dataset. ... arguments are passed to subsetLigerDataset so that advanced options are allowed.

Concatenate ligerDataset

The cbind() method is implemented for concatenating ligerDataset objects by cells. When applied, all feature expression matrices are merged, taking the union of all features for the rows.

Examples

ctrl <- dataset(pbmc, "ctrl")
# Methods for base generics
ctrl
print(ctrl)
dim(ctrl)
ncol(ctrl)
nrow(ctrl)
colnames(ctrl)[1:5]
rownames(ctrl)[1:5]
ctrl[1:5, 1:5]
# rliger generics
## raw data
m <- rawData(ctrl)
class(m)
dim(m)
## normalized data
pbmc <- normalize(pbmc)
ctrl <- dataset(pbmc, "ctrl")
m <- normData(ctrl)
class(m)
dim(m)
## scaled data
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
ctrl <- dataset(pbmc, "ctrl")
m <- scaleData(ctrl)
class(m)
dim(m)
n <- scaleData(pbmc, "ctrl")
identical(m, n)
## Any other matrices
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- runOnlineINMF(pbmc, k = 20, minibatchSize = 100)
    ctrl <- dataset(pbmc, "ctrl")
    V <- getMatrix(ctrl, "V")
    V[1:5, 1:5]
    Vs <- getMatrix(pbmc, "V")
    length(Vs)
    names(Vs)
    identical(Vs$ctrl, V)
}

Subclass of ligerDataset for Methylation modality

Description

Inherits from the ligerDataset class. Contained slots can be referred to via the link. When scaleNotCenter is applied to datasets of this class, the scaled data will automatically be taken by reversing the normalized data instead of scaling the variable features.

Subclass of ligerDataset for RNA modality

Description

Inherits from the ligerDataset class. Contained slots can be referred to via the link. This subclass does not differ from the default ligerDataset class except for the class name.

Subclass of ligerDataset for Spatial modality

Description

Inherits from the ligerDataset class. Contained slots can be referred to via the link.

Slots

coordinate: dense matrix

Convert between liger and Seurat object

Description

For converting a liger object to a Seurat object, the rawData, normData, and scaleData from each dataset, as well as the cellMeta, H.norm and varFeatures slots, will be included. Compatible with V4 and V5. It is not recommended to use this conversion if your liger object contains datasets from various modalities.

Usage

ligerToSeurat(
  object,
  assay = NULL,
  identByDataset = FALSE,
  merge = FALSE,
  nms = NULL,
  renormalize = NULL,
  use.liger.genes = NULL,
  by.dataset = identByDataset
)

Arguments

object

A liger object to be converted.

assay

Name of assay to store the data. Default NULL detects by dataset modality. If the object contains various modalities, defaults to "LIGER". The default dataset modality setting is understood as "RNA".

identByDataset

Logical, whether to combine the dataset variable and default cluster labeling to set the Idents. Default FALSE.

merge

Logical, whether to merge layers of different datasets into one. Not recommended. Default FALSE.

nms

Will be ignored because the new object structure does not have the related problem.

renormalize

Will be ignored because, since Seurat V5, layers of data can exist at the same time and it is better to leave renormalization to users.

use.liger.genes

Will be ignored; LIGER variable features will always be used.

by.dataset

Deprecated. Use identByDataset instead.

Value

Always returns Seurat object(s) of the latest version. By default, a Seurat object with split layers, e.g. with layers like "counts.ctrl" and "counts.stim". If merge = TRUE, returns a single Seurat object with layers for all datasets merged.

Examples

if (requireNamespace("SeuratObject", quietly = TRUE) &&
    requireNamespace("Seurat", quietly = TRUE)) {
    seu <- ligerToSeurat(pbmc)
}

Linking genes to putative regulatory elements

Description

Evaluate the relationships between pairs of genes and peaks based on the specified correlation method. Usually used for inferring the correlation between gene expression and imputed peak counts for datasets that originally lack the peak modality (i.e. applied to the imputeKNN result).

Usage

linkGenesAndPeaks(
  object,
  useDataset,
  pathToCoords,
  useGenes = NULL,
  method = c("spearman", "pearson", "kendall"),
  alpha = 0.05,
  verbose = getOption("ligerVerbose", TRUE),
  path_to_coords = pathToCoords,
  genes.list = useGenes,
  dist = method
)

Arguments

object

A liger object, with datasets of the ligerATACDataset class in the datasets slot.

useDataset

Name of one dataset, with both normalized gene expression and normalized peak counts available.

pathToCoords

Path to the gene coordinates file, usually a BED file.

useGenes

Character vector of gene names to be tested. Default NULL uses all genes available in useDataset.

method

Choose the type of correlation to calculate, from "spearman", "pearson" and "kendall". Default "spearman".

alpha

Numeric, significance threshold for the correlation p-value. Peak-gene correlations with p-values below this threshold are considered significant. Default 0.05.

verbose

Logical. Whether to show information of the progress. Default getOption("ligerVerbose") or TRUE if users have not set it.

path_to_coords,genes.list,dist

Deprecated. See the Usage section for replacements.

Value

A sparse matrix with peak names as rows and gene names as columns, with each element indicating the correlation between peak i and gene j, or 0 if the gene and peak are not significantly linked.
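To make the returned matrix concrete, the base-R sketch below builds a peak-by-gene matrix that keeps a correlation only when its cor.test() p-value is below alpha and stores 0 otherwise. It is an illustration under stated assumptions, not the rliger implementation: linkSketch is a hypothetical name, the inputs are simulated with cells as rows, the result is a dense base matrix rather than a sparse one, gene coordinates (pathToCoords) are not used at all, and no multiple-testing adjustment is applied.

```r
# Hypothetical sketch: keep peak-gene correlations whose p-value < alpha.
# geneExpr: cells x genes; peakCounts: cells x peaks.
linkSketch <- function(geneExpr, peakCounts,
                       method = c("spearman", "pearson", "kendall"),
                       alpha = 0.05) {
  method <- match.arg(method)
  out <- matrix(0, nrow = ncol(peakCounts), ncol = ncol(geneExpr),
                dimnames = list(colnames(peakCounts), colnames(geneExpr)))
  for (p in colnames(peakCounts)) {
    for (g in colnames(geneExpr)) {
      test <- cor.test(peakCounts[, p], geneExpr[, g], method = method)
      # Store the correlation only if significant; otherwise leave 0
      if (test$p.value < alpha) {
        out[p, g] <- unname(test$estimate)
      }
    }
  }
  out
}

# Simulated data: peak1 is built to track GeneA, peak2 is pure noise.
set.seed(1)
nCells <- 50
geneExpr <- matrix(rnorm(nCells * 2), nrow = nCells,
                   dimnames = list(NULL, c("GeneA", "GeneB")))
peakCounts <- cbind(
  peak1 = geneExpr[, "GeneA"] + rnorm(nCells, sd = 0.3),
  peak2 = rnorm(nCells)
)

corr <- linkSketch(geneExpr, peakCounts)
corr
```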

Examples

if (requireNamespace("RcppPlanc", quietly = TRUE) &&
    requireNamespace("GenomicRanges", quietly = TRUE) &&
    requireNamespace("IRanges", quietly = TRUE) &&
    requireNamespace("psych", quietly = TRUE)) {
    bmmc <- normalize(bmmc)
    bmmc <- selectGenes(bmmc)
    bmmc <- scaleNotCenter(bmmc)
    bmmc <- runINMF(bmmc, miniBatchSize = 100)
    bmmc <- alignFactors(bmmc)
    bmmc <- normalizePeak(bmmc)
    bmmc <- imputeKNN(bmmc, reference = "atac", queries = "rna")
    corr <- linkGenesAndPeaks(
        bmmc, useDataset = "rna",
        pathToCoords = system.file("extdata/hg19_genes.bed", package = "rliger")
    )
}

Louvain algorithm for community detection

Description

After quantile normalization, users can additionally run the Louvain algorithm for community detection, which is widely used in single-cell analysis and excels at merging small clusters into broad cell classes.

Arguments

object

liger object. Should run quantile_norm before calling.

k

The maximum number of nearest neighbours to compute. (default 20)

resolution

Value of the resolution parameter. Use a value above (below) 1.0 if you want to obtain a larger (smaller) number of communities. (default 1.0)

prune

Sets the cutoff for an acceptable Jaccard index when computing the neighborhood overlap for the SNN construction. Any edges with values less than or equal to this will be set to 0 and removed from the SNN graph. Essentially sets the stringency of pruning (0 means no pruning, 1 means prune everything). (default 1/15)

eps

The error bound of the nearest neighbor search. (default 0.1)

nRandomStarts

Number of random starts. (default 10)

nIterations

Maximal number of iterations per random start. (default 100)

random.seed

Seed of the random number generator. (default 1)

verbose

Print messages (TRUE by default)

dims.use

Indices of factors to use for clustering. Default NULL uses all available factors.

Value

object with refined cluster assignment updated in the "louvain_cluster" variable in the cellMeta slot. Can be fetched with object$louvain_cluster.
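The prune argument can be pictured with a small base-R sketch: each cell's nearest-neighbor set is compared with every other cell's, the Jaccard index (shared neighbors divided by the union of neighbors) becomes the edge weight, and weights at or below prune are set to 0. snnSketch and the toy neighbor table are hypothetical; the package's actual graph construction and the Louvain optimization itself are not reproduced here.

```r
# Hypothetical sketch of Jaccard-weighted SNN construction with pruning.
# knn: matrix where row i holds the indices of cell i's k nearest
# neighbours (including cell i itself).
snnSketch <- function(knn, prune = 1/15) {
  n <- nrow(knn)
  snn <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      nShared <- length(intersect(knn[i, ], knn[j, ]))
      nUnion <- length(union(knn[i, ], knn[j, ]))
      snn[i, j] <- nShared / nUnion        # Jaccard index
    }
  }
  snn[snn <= prune] <- 0                   # drop edges at or below the cutoff
  snn
}

# Toy neighbour table for 5 cells with k = 3
knn <- rbind(
  c(1, 2, 3),
  c(2, 1, 3),
  c(3, 1, 2),
  c(4, 5, 1),
  c(5, 4, 2)
)

snnSketch(knn)                # default prune keeps every edge here
snnSketch(knn, prune = 0.25)  # stricter pruning removes the 0.2 edges
```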

Fast calculation of feature count matrix

Description

Fast calculation of feature count matrix

Usage

makeFeatureMatrix(bedmat, barcodes)

Arguments

bedmat

A feature count list generated by bedmap

barcodes

A list of barcodes

Value

A feature count matrix with features as rows and barcodes as columns.

Examples

## Not run: 
gene.counts <- makeFeatureMatrix(genes.bc, barcodes)
promoter.counts <- makeFeatureMatrix(promoters.bc, barcodes)
sample <- gene.counts + promoter.counts
## End(Not run)

Deprecated functions in package rliger.

Description

The functions listed below are deprecated and will be defunct in the near future. When possible, alternative functions with similar functionality or a replacement are also mentioned. Help pages for deprecated functions are available at help("<function>-deprecated").

Usage

makeInteractTrack(
  corr.mat,
  path_to_coords,
  genes.list = NULL,
  output_path = getwd()
)

louvainCluster(
  object,
  resolution = 1,
  k = 20,
  prune = 1/15,
  eps = 0.1,
  nRandomStarts = 10,
  nIterations = 100,
  random.seed = 1,
  verbose = getOption("ligerVerbose", TRUE),
  dims.use = NULL
)

optimizeALS(
  object,
  k,
  lambda = 5,
  thresh = NULL,
  max.iters = 30,
  nrep = 1,
  H.init = NULL,
  W.init = NULL,
  V.init = NULL,
  use.unshared = FALSE,
  rand.seed = 1,
  print.obj = NULL,
  verbose = TRUE,
  ...
)

online_iNMF(
  object,
  X_new = NULL,
  projection = FALSE,
  W.init = NULL,
  V.init = NULL,
  H.init = NULL,
  A.init = NULL,
  B.init = NULL,
  k = 20,
  lambda = 5,
  max.epochs = 5,
  miniBatch_max_iters = 1,
  miniBatch_size = 5000,
  h5_chunk_size = 1000,
  seed = 123,
  verbose = TRUE
)

quantile_norm(
  object,
  quantiles = 50,
  ref_dataset = NULL,
  min_cells = 20,
  knn_k = 20,
  dims.use = NULL,
  do.center = FALSE,
  max_sample = 1000,
  eps = 0.9,
  refine.knn = TRUE,
  clusterName = "H.norm_cluster",
  rand.seed = 1,
  verbose = getOption("ligerVerbose", TRUE)
)

makeRiverplot(
  object,
  cluster1,
  cluster2,
  cluster_consensus = NULL,
  min.frac = 0.05,
  min.cells = 10,
  river.yscale = 1,
  river.lty = 0,
  river.node_margin = 0.1,
  label.cex = 1,
  label.col = "black",
  lab.srt = 0,
  river.usr = NULL,
  node.order = "auto"
)

`makeInteractTrack`

For makeInteractTrack, use exportInteractTrack.

`louvainCluster`

For louvainCluster, use runCluster(method = "louvain") as the replacement, while runCluster with the default method = "leiden" is more recommended.

`optimizeALS`

For optimizeALS, use runIntegration or runINMF. For the case of optimizeALS(use.unshared = TRUE), use runIntegration with method = "UINMF" or runUINMF instead.

`online_iNMF`

For online_iNMF, use runIntegration with method = "online" or runOnlineINMF.

`quantile_norm`

For quantile_norm, use quantileNorm.

`makeRiverplot`

For makeRiverplot, use plotSankey as the replacement.

Export predicted gene-pair interaction

Description

Export the predicted gene-pair interactions calculated by the upstream function linkGenesAndPeaks into an Interact Track file, which is compatible with the UCSC Genome Browser.

Arguments

corr.mat

A sparse matrix of correlation with peak names as rows and gene names as columns.

path_to_coords

Path to the gene coordinates file.

genes.list

Character vector of gene names to be exported. Default NULL uses all genes available in corr.mat.

output_path

Path of the file where the output will be stored. If a folder, a file named "Interact_Track.bed" will be created. Default current working directory.

Value

No return value. A file located at output_path will be created.

Generate a river (Sankey) plot

Description

Creates a riverplot to show how separate cluster assignments from two datasets map onto a joint clustering. The joint clustering is by default the object clustering, but an external one can also be passed in. Uses the riverplot package to construct a riverplot object and then plot it.

Arguments

object

liger object. Should run quantileAlignSNF before calling.

cluster1

Cluster assignments for dataset 1. Note that cluster names should be distinct across datasets.

cluster2

Cluster assignments for dataset 2. Note that cluster names should be distinct across datasets.

cluster_consensus

Optional external consensus clustering (to use instead of object clusters).

min.frac

Minimum fraction of cluster for edge to be shown (default 0.05).

min.cells

Minimum number of cells for edge to be shown (default 10).

river.yscale

y-scale to pass to riverplot; scales the edge values by this factor and can be used to squeeze the plot vertically (default 1).

river.lty

Line style to pass to riverplot (default 0).

river.node_margin

Node margin to pass to riverplot; how much vertical space to keep between the nodes (default 0.1).

label.cex

Size of text labels (default 1).

label.col

Color of text labels (default "black").

lab.srt

Angle of text labels (default 0).

river.usr

Coordinates at which to draw the plot in the form (x0, x1, y0, y1).

node.order

Order of clusters in each set (list with three vectors of ordinal numbers). By default, will try to automatically order them appropriately.

Value

object with refined cluster assignment updated in the "louvain_cluster" variable in the cellMeta slot. Can be fetched with object$louvain_cluster.

Create new variable from categories in cellMeta

Description

Designed for fast variable creation when a new variable is going to be created from an existing variable. For example, multiple samples can be mapped to the same study design condition, or clusters can be mapped to cell types.

Usage

mapCellMeta(object, from, newTo = NULL, ...)

Arguments

object

A liger object.

from

The name of the original variable to be mapped from.

newTo

The name of the new variable to store the mapped result. Default NULL returns the new variable (factor class).

...

Mapping criteria; argument names are original existing categories in the from variable and values are new categories in the new variable.

Value

When newTo = NULL, a factor object of the new variable. Otherwise, the input object with variable newTo updated in cellMeta(object).
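The mapping itself is ordinary factor relabeling. The sketch below is a hypothetical stand-in (mapSketch works on a plain vector, not on a liger object) that takes name = value pairs through ... in the same shape as the mapping criteria described above. Keeping categories that are not listed unchanged is an assumption of this sketch, not documented rliger behavior.

```r
# Hypothetical sketch: relabel categories of a vector using name = value pairs.
mapSketch <- function(x, ...) {
  mapping <- c(...)                          # named: old category -> new one
  newValues <- as.character(x)
  hit <- newValues %in% names(mapping)
  newValues[hit] <- mapping[newValues[hit]]  # replace only mapped categories
  factor(newValues)
}

datasetVar <- factor(c("ctrl", "stim", "ctrl", "stim"))
mapSketch(datasetVar, ctrl = "rna", stim = "rna")
mapSketch(datasetVar, ctrl = "control")      # "stim" is left unchanged here
```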

Examples

pbmc <- mapCellMeta(pbmc, from = "dataset", newTo = "modal",                    ctrl = "rna", stim = "rna")

Merge hdf5 files

Description

This function merges HDF5 files generated from different libraries (Cell Ranger by default) before they are preprocessed through the LIGER pipeline.

Usage

mergeH5(
  file.list,
  library.names,
  new.filename,
  format.type = "10X",
  data.name = NULL,
  indices.name = NULL,
  indptr.name = NULL,
  genes.name = NULL,
  barcodes.name = NULL
)

Arguments

file.list

List of paths to HDF5 files.

library.names

Vector of library names (corresponding to file.list).

new.filename

String of the new HDF5 file name after merging (default new.h5).

format.type

String of HDF5 format (10X CellRanger by default).

data.name

Path to the data values stored in HDF5 file.

indices.name

Path to the indices of data points stored in HDF5 file.

indptr.name

Path to the pointers stored in HDF5 file.

genes.name

Path to the gene names stored in HDF5 file.

barcodes.name

Path to the barcodes stored in HDF5 file.

Value

Directly generates a newly merged HDF5 file.

Examples

## Not run: 
# For instance, we want to merge two datasets saved in HDF5 files (10X
# CellRanger)
# paths to datasets: "library1.h5","library2.h5"
# dataset names: "lib1", "lib2"
# name for output HDF5 file: "merged.h5"
mergeH5(list("library1.h5","library2.h5"), c("lib1","lib2"), "merged.h5")
## End(Not run)

Merge matrices while keeping the union of rows

Description

mergeSparseAll takes in a list of DGEs, with genes as rows and cells as columns, and merges them into a single DGE. It also adds libraryNames to the colnames from each DGE if they are expected to overlap (common with 10X barcodes). Values in the rawData or normData slot of a ligerDataset object can be processed with this.

For a list of dense matrices, usually the values in the scaleData slot of a ligerDataset object, please use mergeDenseAll, which works in the same way.

Usage

mergeSparseAll(
  datalist,
  libraryNames = NULL,
  mode = c("union", "intersection")
)

mergeDenseAll(datalist, libraryNames = NULL)

Arguments

datalist

List of dgCMatrix for mergeSparseAll or a list of matrix for mergeDenseAll.

libraryNames

Character vector to be added as the prefix for the barcodes in each matrix in datalist. Length should match the number of matrices. Default NULL does not modify the barcodes.

mode

Whether to take the "union" or "intersection" of features when merging. Default "union".

Value

dgCMatrix or matrix with all barcodes in datalist as columns and the union of genes in datalist as rows (the intersection when mergeSparseAll is called with mode = "intersection").
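A dense, base-R sketch of the merge described above: rows are aligned to the union (or intersection) of feature names, features missing from a matrix are filled with 0, and an optional library prefix keeps barcodes unique. mergeSketch is hypothetical and does not handle dgCMatrix input; the "_" separator between library name and barcode is an assumption, not necessarily what mergeSparseAll uses.

```r
# Hypothetical sketch: merge feature x cell matrices by feature name.
mergeSketch <- function(datalist, libraryNames = NULL,
                        mode = c("union", "intersection")) {
  mode <- match.arg(mode)
  rowSets <- lapply(datalist, rownames)
  features <- if (mode == "union") {
    Reduce(union, rowSets)
  } else {
    Reduce(intersect, rowSets)
  }
  if (!is.null(libraryNames)) {
    stopifnot(length(libraryNames) == length(datalist))
    # Prefix barcodes so identical barcodes from different libraries stay unique
    datalist <- Map(function(m, lib) {
      colnames(m) <- paste0(lib, "_", colnames(m))
      m
    }, datalist, libraryNames)
  }
  aligned <- lapply(datalist, function(m) {
    out <- matrix(0, nrow = length(features), ncol = ncol(m),
                  dimnames = list(features, colnames(m)))
    keep <- intersect(features, rownames(m))
    out[keep, ] <- m[keep, , drop = FALSE]   # absent features stay 0
    out
  })
  do.call(cbind, unname(aligned))
}

a <- matrix(1:4, nrow = 2, dimnames = list(c("g1", "g2"), c("AAA", "CCC")))
b <- matrix(c(5, 6), nrow = 2, dimnames = list(c("g2", "g3"), "AAA"))

mergeSketch(list(a, b), libraryNames = c("lib1", "lib2"))
mergeSketch(list(a, b), mode = "intersection")
```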

Examples

rawDataList <- getMatrix(pbmc, "rawData")
merged <- mergeSparseAll(rawDataList, libraryNames = names(pbmc))

Return preset modality of a ligerDataset object or that of all datasets in a liger object

Description

Return preset modality of a ligerDataset object or that of all datasets in a liger object

Usage

modalOf(object)

Arguments

object

A ligerDataset object or a liger object.

Value

A single character of modality setting value for a ligerDataset object, or a named vector for a liger object, where the names are dataset names.

Examples

modalOf(pbmc)
ctrl <- dataset(pbmc, "ctrl")
modalOf(ctrl)
ctrl.atac <- as.ligerDataset(ctrl, modal = "atac")
modalOf(ctrl.atac)

Normalize raw counts data

Description

Perform library size normalization on raw counts input. As for the preprocessing step of iNMF integration, by default we don't multiply the normalized values by a scale factor, nor do we take the log transformation. Applicable S3 methods can be found in the Usage section.

normalizePeak is designed for datasets of "atac" modality, i.e. stored in ligerATACDataset. S3 methods for various container objects are not supported yet due to differences in architecture design.
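With the defaults described above, library size normalization amounts to dividing each cell's counts by that cell's total, so every column sums to 1; scaleFactor and log, when set, are applied afterwards. The sketch below is a base-R illustration on a small dense matrix (normalizeSketch is hypothetical and is not the package's S3 method). A cell with zero total counts would produce NaN in this sketch; how rliger treats such cells is not covered here.

```r
# Hypothetical sketch of library size normalization (features x cells input).
normalizeSketch <- function(counts, log = FALSE, scaleFactor = NULL) {
  libSize <- colSums(counts)
  norm <- sweep(counts, 2, libSize, "/")   # each column now sums to 1
  if (!is.null(scaleFactor)) {
    norm <- norm * scaleFactor             # optional scaling
  }
  if (log) {
    norm <- log1p(norm)                    # optional log(x + 1)
  }
  norm
}

counts <- matrix(c(1, 3, 0, 2, 2, 6), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("cell1", "cell2")))
normalizeSketch(counts)                                 # defaults: no scaling, no log
normalizeSketch(counts, log = TRUE, scaleFactor = 1e4)  # an alternative setting
```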

Usage

normalize(object, ...)

## S3 method for class 'matrix'
normalize(object, log = FALSE, scaleFactor = NULL, ...)

## S3 method for class 'dgCMatrix'
normalize(object, log = FALSE, scaleFactor = NULL, ...)

## S3 method for class 'DelayedArray'
normalize(
  object,
  log = FALSE,
  scaleFactor = NULL,
  chunk = getOption("ligerChunkSize", 20000),
  overwrite = FALSE,
  returnStats = FALSE,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'ligerDataset'
normalize(
  object,
  chunk = getOption("ligerChunkSize", 20000),
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'liger'
normalize(
  object,
  useDatasets = NULL,
  verbose = getOption("ligerVerbose", TRUE),
  format.type = NULL,
  remove.missing = NULL,
  ...
)

## S3 method for class 'Seurat'
normalize(object, assay = NULL, layer = "counts", save = "ligerNormData", ...)

normalizePeak(
  object,
  useDatasets = NULL,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

Arguments

object

liger object

...

Arguments to be passed to S3 methods. The "liger" method calls the "ligerDataset" method, which then calls the "dgCMatrix" method. normalizePeak directly calls normalize.dgCMatrix.

log

Logical. Whether to do a log(x + 1) transform on the normalized data. Default FALSE.

scaleFactor

Numeric. Scale the normalized expression value by this factor before transformation. NULL for not scaling. Default NULL.

chunk

Integer. Maximum number of cells in each chunk when working on an HDF5 file-based ligerDataset. Default 20000.

overwrite

Logical. When writing a newly computed HDF5 array to a separate HDF5 file, whether to overwrite the existing file. Default FALSE raises an error when the file already exists.

returnStats

Logical. Used in the LIGER internal workflow to allow capturing precalculated statistics for downstream use. Default FALSE only returns the normalized data for the DelayedArray method.

verbose

Logical. Whether to show information of the progress. Default getOption("ligerVerbose") or TRUE if users have not set it.

useDatasets

A character vector of the names, or a numeric or logical vector of the indices, of the datasets to be normalized. Should specify ATAC-seq datasets when using normalizePeak. Default NULL normalizes all valid datasets.

format.type,remove.missing

Deprecated. The functionality of these is covered through other parts of the whole workflow and is no longer needed. Will be ignored if specified.

assay

Name of assay to use. Default NULL uses the current active assay.

layer

Where the input raw counts should be taken from. Default "counts". For older Seurat, always retrieved from the counts slot.

save

For Seurat >= 4.9.9, the name of the layer to store normalized data. Default "ligerNormData". For older Seurat, stored in the data slot.

Value

Updated object.

dgCMatrix method - Returns a processed dgCMatrix object
ligerDataset method - Updates the normData slot of the object
liger method - Updates the normData slot of chosen datasets
Seurat method - Adds a named layer in the chosen assay (V5), or updates the data slot of the chosen assay (<=V4)
normalizePeak - Updates the normPeak slot of chosen datasets.

Examples

pbmc <- normalize(pbmc)

Perform online iNMF on scaled datasets

Description

Please turn to runOnlineINMF or runIntegration.

Perform online integrative non-negative matrix factorization to represent multiple single-cell datasets in terms of H, W, and V matrices. It optimizes the iNMF objective function using online learning (non-negative least squares for H matrix, hierarchical alternating least squares for W and V matrices), where the number of factors is set by k. The function allows online learning in 3 scenarios: (1) fully observed datasets; (2) iterative refinement using continually arriving datasets; and (3) projection of new datasets without updating the existing factorization. All three scenarios require fixed memory independent of the number of cells.

For each dataset, this factorization produces an H matrix (cells by k), a V matrix (k by genes), and a shared W matrix (k by genes). The H matrices represent the cell factor loadings. W is identical among all datasets, as it represents the shared components of the metagenes across datasets. The V matrices represent the dataset-specific components of the metagenes.

Arguments

object

liger object with data stored in HDF5 files. Should normalize, select genes, and scale before calling.

X_new

List of new datasets for scenario 2 or scenario 3. Each list element should be the name of an HDF5 file.

projection

Perform data integration by shared metagene (W) projection (scenario 3). (default FALSE)

W.init

Optional initialization for W. (default NULL)

V.init

Optional initialization for V (default NULL)

H.init

Optional initialization for H (default NULL)

A.init

Optional initialization for A (default NULL)

B.init

Optional initialization for B (default NULL)

k

Inner dimension of factorization, i.e. the number of metagenes (default 20). A value in the range 20-50 works well for most analyses.

lambda

Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). We recommend always using the default value except possibly for analyses with relatively small differences (biological replicates, male/female comparisons, etc.), in which case a lower value such as 1.0 may improve reconstruction quality. (default 5.0)

max.epochs

Maximum number of epochs (complete passes through the data). (default 5)

miniBatch_max_iters

Maximum number of block coordinate descent (HALS algorithm) iterations to perform for each update of W and V (default 1). Changing this parameter is not recommended.

miniBatch_size

Total number of cells in each minibatch (default 5000). This is a reasonable default, but a smaller value such as 1000 may be necessary for analyzing very small datasets. In general, minibatch size should be no larger than the number of cells in the smallest dataset.

h5_chunk_size

Chunk size of input HDF5 files (default 1000). The chunk size should be no larger than the batch size.

seed

Random seed to allow reproducible results (default 123).

verbose

Print progress bar/messages (TRUE by default)

Value

liger object with H, W, V, A and B slots set.

Perform iNMF on scaled datasets

Description

Please turn to runINMF or runIntegration.

Perform integrative non-negative matrix factorization to return factorized H, W, and V matrices. It optimizes the iNMF objective function using block coordinate descent (alternating non-negative least squares), where the number of factors is set by k.

For each dataset, this factorization produces an H matrix (cells by k), a V matrix (k by genes), and a shared W matrix (k by genes). The H matrices represent the cell factor loadings. W is held consistent among all datasets, as it represents the shared components of the metagenes across datasets. The V matrices represent the dataset-specific components of the metagenes.
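For reference, the iNMF objective as published by Welch et al. (2019), written in the orientation used above: E_i is the cells-by-genes scaled data matrix of dataset i (n_i by m), H_i is n_i by k, W and each V_i are k by m, there are d datasets, and lambda is the regularization parameter. This restates the published formulation; any implementation-specific scaling of the terms in a particular rliger version is not checked here.

```latex
\min_{W \ge 0,\; H_i \ge 0,\; V_i \ge 0}\;
\sum_{i=1}^{d} \left\lVert E_i - H_i \left( W + V_i \right) \right\rVert_F^2
\;+\; \lambda \sum_{i=1}^{d} \left\lVert H_i V_i \right\rVert_F^2
```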

Arguments

object

liger object. Should normalize, select genes, and scale before calling.

k

Inner dimension of factorization (number of factors). Run suggestK to determine an appropriate value; a general rule of thumb is that a higher k will be needed for datasets with more sub-structure.

lambda

Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). Run suggestLambda to determine the most appropriate value for balancing dataset alignment and agreement (default 5.0).

thresh

Convergence threshold. Convergence occurs when |obj0 - obj| / mean(c(obj0, obj)) < thresh. (default 1e-6)

max.iters

Maximum number of block coordinate descent iterations to perform (default 30).

nrep

Number of restarts to perform (the iNMF objective function is non-convex, so taking the best objective from multiple successive initializations is recommended). For easier reproducibility, this increments the random seed by 1 for each consecutive restart, so future factorizations of the same dataset can be run with one rep if necessary. (default 1)

H.init

Initial values to use for H matrices. (default NULL)

W.init

Initial values to use for W matrix (default NULL)

V.init

Initial values to use for V matrices (default NULL)

rand.seed

Random seed to allow reproducible results (default 1).

print.obj

Print objective function values after convergence (default FALSE).

verbose

Print progress bar/messages (TRUE by default)

...

Arguments passed to other methods

Value

liger object with H, W, and V slots set.
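The convergence rule given for thresh compares successive objective values relative to their average. A minimal base-R sketch follows; hasConverged and the objective trace are made up for illustration. In R the mean of two numbers must be written mean(c(obj0, obj)): mean(obj0, obj) would pass obj as the trim argument and, for a single obj0, simply return obj0.

```r
# Relative change between successive objective values, as described for thresh.
hasConverged <- function(obj0, obj, thresh = 1e-6) {
  abs(obj0 - obj) / mean(c(obj0, obj)) < thresh
}

# Made-up objective trace from successive iterations
objs <- c(120, 105, 101, 100.5, 100.49999)
converged <- vapply(2:length(objs),
                    function(i) hasConverged(objs[i - 1], objs[i]),
                    logical(1))
firstConverged <- which(converged)[1] + 1  # iteration where the rule first holds
firstConverged
```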

Perform factorization for new data

Description

Uses an efficient strategy for updating that takes advantage of the information in the existing factorization. Assumes that variable features are present in the new datasets. Two modes are supported (controlled by merge):

Append new data to existing datasets specified by useDatasets. Here the existing V matrices for the target datasets will directly be used as initialization, and new H matrices for the merged matrices will be initialized accordingly.
Set new data as new datasets. Initial V matrices for them will be copied from datasets specified by useDatasets, and new H matrices will be initialized accordingly.

Usage

optimizeNewData(
  object,
  dataNew,
  useDatasets,
  merge = TRUE,
  lambda = NULL,
  nIteration = 30,
  seed = 1,
  verbose = getOption("ligerVerbose"),
  new.data = dataNew,
  which.datasets = useDatasets,
  add.to.existing = merge,
  max.iters = nIteration,
  thresh = NULL
)

Arguments

object

A liger object. Should have integrative factorization (e.g. runINMF) performed in advance.

dataNew

Named list of raw count matrices, genes by cells.

useDatasets

Selection of datasets to append new data to if merge = TRUE, or the datasets to inherit V matrices from and initialize the optimization when merge = FALSE. Should match the length and order of dataNew.

merge

Logical, whether to add the new data to existing datasets or treat them as totally new datasets (i.e. calculate new V matrices). Default TRUE.

lambda

Numeric regularization parameter. By default NULL, this will use the lambda value used in the latest factorization.

nIteration

Number of block coordinate descent iterations to perform. Default 30.

seed

Random seed to allow reproducible results. Default 1. Used by runINMF factorization.

verbose

Logical. Whether to show information of the progress. Default getOption("ligerVerbose"), which is TRUE if users have not set it.

new.data,which.datasets,add.to.existing,max.iters

These arguments are now replaced by others and will be removed in the future. Please see Usage for replacements.

thresh

Deprecated. The new implementation of iNMF does not require a threshold for convergence detection. Setting a large enough nIteration will bring it to convergence.

Value

object with the W slot updated with the new W matrix, and the H and V slots of each ligerDataset object in the datasets slot updated with the new dataset-specific H and V matrices, respectively.

Examples

pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
# Only running a few iterations for fast examples
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- runINMF(pbmc, k = 20, nIteration = 2)
    # Create fake new data by increasing all non-zero counts in "ctrl" by 1,
    # and make unique cell identifiers
    ctrl2 <- rawData(dataset(pbmc, "ctrl"))
    ctrl2@x <- ctrl2@x + 1
    colnames(ctrl2) <- paste0(colnames(ctrl2), 2)
    pbmcNew <- optimizeNewData(pbmc, dataNew = list(ctrl2 = ctrl2),
                               useDatasets = "ctrl", nIteration = 2)
}

Perform factorization for new value of k

Description

This uses an efficient strategy for updating that takes advantage of the information in the existing factorization. It is most recommended for values of kNew smaller than the current value (k, which is set when running runINMF), where it is more likely to speed up the factorization.

Usage

optimizeNewK(
  object,
  kNew,
  lambda = NULL,
  nIteration = 30,
  seed = 1,
  verbose = getOption("ligerVerbose"),
  k.new = kNew,
  max.iters = nIteration,
  rand.seed = seed,
  thresh = NULL
)

Arguments

object

A liger object. Should have integrative factorization (e.g. runINMF) performed in advance.

kNew

Number of factors in the new factorization.

lambda

Numeric regularization parameter. Default NULL uses the lambda value from the latest factorization.

nIteration

Number of block coordinate descent iterations to perform. Default 30.

seed

Random seed to allow reproducible results. Default 1. Used by runINMF factorization and by initialization, only when kNew is greater than k.

verbose

Logical. Whether to show progress information. Default getOption("ligerVerbose"), which is TRUE if the user has not set it.

k.new, max.iters, rand.seed

These arguments are now replaced by others and will be removed in the future. Please see Usage for replacements.

thresh

Deprecated. The new implementation of iNMF does not require a threshold for convergence detection. Setting a large enough nIteration will bring it to convergence.

Value

The input object, with the W slot updated with the new W matrix, and the H and V slots of each ligerDataset object in the datasets slot updated with the new dataset-specific H and V matrices, respectively.

Examples

pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
# Only running a few iterations for fast examples
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- runINMF(pbmc, k = 20, nIteration = 2)
    pbmc <- optimizeNewK(pbmc, kNew = 25, nIteration = 2)
}

Perform factorization for new lambda value

Description

Uses an efficient strategy for updating that takes advantage of the information in the existing factorization; always uses the previous k. Recommended mainly when re-optimizing for a higher lambda and when the new lambda value is significantly different; otherwise it may not return optimal results.
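The role of lambda can be read off the iNMF objective of Welch et al. (2019). Written in a genes x cells orientation (an assumption about layout: E_i is dataset i, W the shared gene loadings, V_i the dataset-specific loadings, H_i the cell factor loadings), it is the sum over datasets of ||E_i - (W + V_i) H_i||_F^2 + lambda ||V_i H_i||_F^2, so a larger lambda makes dataset-specific signal more costly. The function below is a hypothetical helper that only evaluates this objective for given matrices; it is not rliger's solver:

```r
# Hypothetical helper: evaluate the iNMF objective for lists of matrices
# E[[i]]: genes x cells, W and V[[i]]: genes x k, H[[i]]: k x cells
inmf_objective <- function(E, W, V, H, lambda) {
  total <- 0
  for (i in seq_along(E)) {
    recon <- (W + V[[i]]) %*% H[[i]]      # reconstruction of dataset i
    specific <- V[[i]] %*% H[[i]]         # dataset-specific part
    total <- total + sum((E[[i]] - recon)^2) + lambda * sum(specific^2)
  }
  total
}

set.seed(1)
W <- matrix(runif(10), 5, 2)
V <- list(matrix(runif(10), 5, 2), matrix(runif(10), 5, 2))
H <- list(matrix(runif(8), 2, 4), matrix(runif(6), 2, 3))
# Build toy data that these factors reproduce exactly
E <- lapply(1:2, function(i) (W + V[[i]]) %*% H[[i]])

inmf_objective(E, W, V, H, lambda = 0)   # reconstruction error only
inmf_objective(E, W, V, H, lambda = 5)   # adds the lambda-weighted penalty
```

For fixed factors the penalty is the only lambda-dependent term, so the objective grows linearly in lambda; during optimization this pushes the fit to shrink V_i H_i and move signal into the shared W.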

Usage

optimizeNewLambda(
  object,
  lambdaNew,
  nIteration = 30,
  seed = 1,
  verbose = getOption("ligerVerbose"),
  new.lambda = lambdaNew,
  max.iters = nIteration,
  rand.seed = seed,
  thresh = NULL
)

Arguments

object

A liger object. Should have integrative factorization (e.g. runINMF) performed in advance.

lambdaNew

Numeric regularization parameter. Larger values penalize dataset-specific effects more strongly.

nIteration

Number of block coordinate descent iterations to perform. Default 30.

seed

Random seed to allow reproducible results. Default 1. Used by runINMF factorization.

verbose

Logical. Whether to show progress information. Default getOption("ligerVerbose"), which is TRUE if the user has not set it.

new.lambda, max.iters, rand.seed

These arguments are now replaced by others and will be removed in the future. Please see Usage for replacements.

thresh

Deprecated. The new implementation of iNMF does not require a threshold for convergence detection. Setting a large enough nIteration will bring it to convergence.

Value

The input object with optimized factorization values updated, including the W matrix in the liger object, and the H and V matrices in each ligerDataset object in the datasets slot.

Examples

pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    # Only running a few iterations for fast examples
    pbmc <- runINMF(pbmc, k = 20, nIteration = 2)
    # pbmc <- optimizeNewLambda(pbmc, lambdaNew = 5.5, nIteration = 2)
}

Perform factorization for subset of data

Description

Uses an efficient strategy for updating that takes advantage of the information in the existing factorization.

Usage

optimizeSubset(
  object,
  clusterVar = NULL,
  useClusters = NULL,
  lambda = NULL,
  nIteration = 30,
  cellIdx = NULL,
  scaleDatasets = NULL,
  seed = 1,
  verbose = getOption("ligerVerbose"),
  cell.subset = cellIdx,
  cluster.subset = useClusters,
  max.iters = nIteration,
  datasets.scale = scaleDatasets,
  thresh = NULL
)

Arguments

object

A liger object. Should have integrative factorization (e.g. runINMF) performed in advance.

clusterVar, useClusters

Together, these select the clusters used to subset the object conveniently. clusterVar is the name of a variable in cellMeta(object), and useClusters should be a vector of names of clusters in that variable. clusterVar defaults to the default cluster variable (see runCluster, or defaultCluster in "Cell metadata access"). Users can otherwise select cells explicitly with cellIdx for complex conditions. useClusters overrides cellIdx.

lambda

Numeric regularization parameter. Default NULL uses the lambda value from the latest factorization.

nIteration

Maximum number of block coordinate descent iterations to perform. Default 30.

cellIdx

A valid index vector that applies to the whole object. See subsetLiger for requirements. Default NULL.

scaleDatasets

Names of datasets to re-scale after subsetting. Default NULL does not re-scale.

seed

Random seed to allow reproducible results. Default 1. Used by runINMF factorization.

verbose

Logical. Whether to show progress information. Default getOption("ligerVerbose"), which is TRUE if the user has not set it.

cell.subset, cluster.subset, max.iters, datasets.scale

These arguments are now replaced by others and will be removed in the future. Please see Usage for replacements.

thresh

Deprecated. The new implementation of iNMF does not require a threshold for convergence detection. Setting a large enough nIteration will bring it to convergence.

Value

The subset object with factorization matrices optimized, including the W matrix in the liger object, and the H and V matrices in each ligerDataset object in the datasets slot. scaleData in the ligerDataset objects of the datasets specified by scaleDatasets will also be updated to reflect the subset.

Examples

pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    # Only running a few iterations for fast examples
    pbmc <- runINMF(pbmc, k = 20, nIteration = 2)
    pbmc <- optimizeSubset(pbmc, cellIdx = sort(sample(ncol(pbmc), 200)),
                           nIteration = 2)
}
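The documented precedence of useClusters over cellIdx can be pictured with a small base R helper. select_cells() below is hypothetical, not rliger's internal code; it assumes cell metadata in a data.frame with cell names as row names, and resolves the logical, character and numeric index types that cellIdx may take:

```r
# Hypothetical helper: resolve which cells to keep before re-optimizing
select_cells <- function(meta, clusterVar, useClusters = NULL, cellIdx = NULL) {
  if (!is.null(useClusters)) {
    # useClusters takes precedence over cellIdx
    return(which(meta[[clusterVar]] %in% useClusters))
  }
  if (is.null(cellIdx)) return(seq_len(nrow(meta)))
  if (is.logical(cellIdx)) return(which(cellIdx))
  if (is.character(cellIdx)) return(match(cellIdx, rownames(meta)))
  as.integer(cellIdx)
}

meta <- data.frame(leiden_cluster = factor(c("0", "1", "0", "2", "1")),
                   row.names = paste0("cell", 1:5))
select_cells(meta, "leiden_cluster", useClusters = c("0", "2"))          # 1 3 4
select_cells(meta, "leiden_cluster", useClusters = "1", cellIdx = 1:2)   # 2 5
select_cells(meta, "leiden_cluster", cellIdx = c("cell2", "cell4"))      # 2 4
```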

liger object of PBMC subsample data with Control and Stimulated datasets

Description

liger object of PBMC subsample data with Control and Stimulated datasets

Usage

pbmc

Format

A liger object with two datasets named "ctrl" and "stim".

Source

https://www.nature.com/articles/nbt.4042

References

Hyun Min Kang et al., Nature Biotechnology, 2018

liger object of PBMC subsample data with plotting information available

Description

This data was generated from the data "pbmc" with the default-parameter integration pipeline: normalize, selectGenes, scaleNotCenter, runINMF, runCluster, runUMAP. To minimize the size of the object distributed with the package, rawData and scaleData were removed. Genes are downsampled to the top 50 variable genes, for smaller normData and W matrices.

Usage

pbmcPlot

Format

A liger object with two datasets named "ctrl" and "stim".

Source

https://www.nature.com/articles/nbt.4042

References

Hyun Min Kang et al., Nature Biotechnology, 2018

GSEA plot for specific gene set and factor using factorGSEA results

Description

GSEA plot for specific gene set and factor using factorGSEA results
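The enrichment score line (ESLineColor) and the hits lines (hitsLineColor) in this kind of plot come from the standard weighted running-sum statistic of GSEA. A minimal base R sketch of that statistic, assuming the ranking statistic is a named numeric vector (for example one factor's gene loadings) and a weight exponent of 1; the helper is hypothetical and is not the factorGSEA implementation:

```r
# Hypothetical helper: GSEA running enrichment score for one gene set
running_es <- function(stats, geneSet, p = 1) {
  s <- stats[order(stats, decreasing = TRUE)]   # rank genes, highest first
  hits <- names(s) %in% geneSet
  w <- abs(s)^p
  nMiss <- sum(!hits)
  # Hits step up in proportion to their weight; misses step down evenly
  step <- ifelse(hits, w / sum(w[hits]), -1 / nMiss)
  curve <- cumsum(step)
  list(curve = curve,
       hits = which(hits),                      # positions drawn as hit ticks
       ES = curve[which.max(abs(curve))])       # maximum deviation from zero
}

loadings <- c(a = 3, b = 2, c = 1, d = -1)
res <- running_es(loadings, geneSet = c("a", "b"))
res$curve   # 0.6 1.0 0.5 0.0
res$ES      # 1
```

The curve always returns to zero at the end of the ranked list, so its peak (or trough) measures how strongly the gene set is concentrated at the top (or bottom) of the ranking.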

Usage

## S3 method for class 'factorGSEA'
plot(
  x,
  y,
  geneSetName,
  useFactor,
  xTitleSize = 10,
  xTextSize = 8,
  yTitleSize = 10,
  yTextSize = 8,
  titleSize = 12,
  captionTextSize = 8,
  ESLineColor = "green",
  ESLinewidth = 1,
  hitsLineColor = "black",
  hitsLinewidth = 0.5,
  loadingBarColor = "grey",
  ...
)

Arguments

x

A factorGSEA object.

y

Not used, for S3 method convention.

geneSetName

A character string for the gene set name to plot.

useFactor

A character string (e.g. 'Factor_1') or just a numeric index for the factor to plot.

xTitleSize, yTitleSize

Numeric, size for x or y axis titles, respectively. Default 10.

xTextSize, yTextSize

Numeric, size for x or y axis text, respectively. Default 8.

titleSize

Numeric, size for the main plot title. Default 12.

captionTextSize

Numeric, size for the caption text. Default 8.

ESLineColor

Color for the enrichment score line. Default 'green'.

ESLinewidth

Numeric, line width for the enrichment score line. Default 1.

hitsLineColor

Color for the hits line. Default 'black'.

hitsLinewidth

Numeric, line width for the hits line. Default 0.5.

loadingBarColor

Color for the loading bar. Default 'grey'.

...

Not used.

Value

A ggplot object.

Create barcode-rank plot for each dataset

Description

This function ranks the total count of each cell within each dataset and makes a line plot. It is intended only for examining the input raw count data and does not infer any recommended cutoff for removing non-cell barcodes.
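The ranking behind the plot can be sketched in base R for a single dataset: sum each cell's counts, sort in decreasing order and pair each total with its rank. The helper and the simulated counts are hypothetical; the log-log axes follow the usual barcode-rank convention and are not a statement about plotBarcodeRank's own scales:

```r
# Hypothetical helper: rank cell barcodes by total count for one dataset
barcode_rank <- function(counts) {
  total <- colSums(counts)                     # total count per cell
  ord <- order(total, decreasing = TRUE)
  data.frame(barcode = names(total)[ord],
             rank = seq_along(ord),
             total = unname(total[ord]))
}

set.seed(1)
counts <- matrix(rpois(200 * 50, lambda = 2), nrow = 200,
                 dimnames = list(NULL, paste0("cell", 1:50)))
br <- barcode_rank(counts)
head(br)

if (interactive()) {
  plot(br$rank, br$total, type = "l", log = "xy",
       xlab = "Barcode rank", ylab = "Total count")
}
```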

Usage

plotBarcodeRank(object, ...)

Arguments

object

A liger object.

...

Arguments passed on to .ggScatter, .ggplotLigerTheme

dotSize, dotAlpha: Numeric, controls the size or transparency of all dots. Default getOption("ligerDotSize") (1) and 0.9.
raster: Logical, whether to rasterize the plot. Default NULL automatically rasterizes the plot when the total number of dots to be plotted exceeds 100,000.
title, subtitle, xlab, ylab: Main title, subtitle or X/Y axis title text. By default, no main title or subtitle will be set, and the X/Y axis titles will be the names of the variables used for plotting. Use NULL to hide elements. TRUE for xlab or ylab shows default values.
baseSize: One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when controlled by this.
panelBorder: Whether to show a rectangular border around the panel instead of using ggplot's classic bottom and left axis lines. Default FALSE.
plotly: Whether to use plotly to enable web-based interactive browsing of the plot. Requires installation of package "plotly". Default FALSE.

Value

A list of ggplot objects, one for each dataset.

Examples

plotBarcodeRank(pbmc)

Generate violin/box plot(s) using liger object

Description

This function allows using available cell metadata, feature expression or factor loadings to generate violin plots, grouping the data by available categorical cell metadata. Available categorical cell metadata can also be used to form the color annotation. When it differs from the grouping, it forms a nested grouping. Multiple y-axis variables are allowed from the same specification of slot, in which case a list of violin plots, one for each variable, is returned. Users can further split the plot(s) by grouping cells (e.g. by datasets).

Usage

plotCellViolin(
  object,
  y,
  groupBy = NULL,
  slot = c("cellMeta", "rawData", "normData", "scaleData", "H.norm", "H"),
  yFunc = NULL,
  cellIdx = NULL,
  colorBy = NULL,
  splitBy = NULL,
  titles = NULL,
  ...
)

Arguments

object

A liger object.

y

Available variable name in slot to look for the value to visualize.

groupBy, colorBy

Available variable name in the cellMeta slot to look for categorical grouping. See Details. Default NULL produces no grouping and all-black graphic elements.

slot

Choose the slot to find the y variable. See Details. Default "cellMeta".

yFunc

A function object that expects a vector/factor/data.frame retrieved by y as the only input, and returns an object of the same size, so that the y-axis is replaced by this output. Useful when, for example, users need to scale the gene expression shown on the plot.

cellIdx

Character, logical or numeric index that can subscribe cells. Missing or NULL for all cells.

splitBy

Character vector of categorical variable names in the cellMeta slot. Split all cells by groupings on this/these variable(s) to produce a violin plot containing only the cells in each group. Default NULL.

titles

Title text. A character scalar or a character vector with as many elements as the number of plots to be generated. Default NULL.

...

Arguments passed on to .ggCellViolin, .ggplotLigerTheme

violin, box, dot: Logical, whether to add a violin plot, box plot or dot (scatter) plot, respectively. Layers are added in the order of dot, violin, and box on the top surface. By default, only the violin plot is generated.
violinAlpha, boxAlpha: Numeric, controls the transparency of layers. Default 0.8 and 0.6, respectively.
violinWidth, boxWidth: Numeric, controls the width of the violin/box bounding box. Default 0.9 and 0.4.
dotColor, dotSize: Globally control the color and size of all dots. Default "black" and getOption("ligerDotSize") (1).
xlabAngle: Numeric, counter-clockwise rotation angle of X axis label text. Default 45.
raster: Logical, whether to rasterize the dot plot. Default NULL automatically rasterizes the dot plot when the total number of cells to be plotted exceeds 100,000.
seed: Random seed for reproducibility. Default 1.
title, subtitle, xlab, ylab: Main title, subtitle or X/Y axis title text. By default, no main title or subtitle will be set, and the X/Y axis titles will be the names of the variables used for plotting. Use NULL to hide elements. TRUE for xlab or ylab shows default values.
legendFillTitle: Legend title text for fill aesthetics, often used for violin, box and bar plots. Default NULL shows the original variable name.
showLegend: Whether to show the legend. Default TRUE.
legendPosition: Text indicating where to place the legend. Choose from "top", "bottom", "left" or "right". Default "right".
baseSize: One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when controlled by this.
titleSize, xTitleSize, yTitleSize, legendTitleSize: Size of main title, axis titles and legend title. Default NULL controls by baseSize + 2.
subtitleSize, xTextSize, yTextSize, legendTextSize: Size of subtitle text, axis texts and legend text. Default NULL controls by baseSize.
panelBorder: Whether to show a rectangular border around the panel instead of using ggplot's classic bottom and left axis lines. Default FALSE.
colorLabels: Character vector for modifying category names in a color legend. Passed to ggplot2::scale_color_manual(labels). Default NULL uses the original levels of the factor.
colorValues: Character vector of colors for modifying category colors in a color legend. Passed to ggplot2::scale_color_manual(values). Default NULL uses an internally selected palette when <= 26 categories are present, otherwise ggplot hues.
legendNRow, legendNCol: Integer, when there are too many categories in one variable, arranges the number of rows or columns. Default NULL automatically splits into ceiling(levels(variable)/15) columns.
plotly: Whether to use plotly to enable web-based interactive browsing of the plot. Requires installation of package "plotly". Default FALSE.

Details

Available options for slot include: "cellMeta", "rawData", "normData", "scaleData", "H.norm" and "H". When "rawData", "normData" or "scaleData", y has to be a character vector of feature names. When "H.norm" or "H", y can be any valid index to select one factor of interest. Note that a character index follows the "Factor_[k]" format, replacing [k] with an integer.

When "cellMeta", y has to be an available column name in the table. Note that for y, as well as groupBy, colorBy and splitBy, since a matrix object is allowed in the cellMeta table, using a column (e.g. named "column1") of a certain matrix (e.g. named "matrixVar") should follow the syntax "matrixVar.column1". When the matrix does not have a "colnames" attribute, the subscription goes with "matrixVar.V1", "matrixVar.V2", etc. These names follow the behavior of the as.data.frame method for a DataFrame object.

groupBy is essentially sent to ggplot2::aes(x), while colorBy is for the "colour" aesthetics. Specifying colorBy without groupBy visually creates grouping, but there will not be varying values on the x-axis, so boxWidth will be forced to the same value as violinWidth in this situation.

Value

A ggplot object when a single plot is intended. A list of ggplot objects when multiple y variables and/or splitBy are set. When plotly = TRUE, all ggplot objects become plotly (htmlwidget) objects.

Examples

plotCellViolin(pbmcPlot, y = "nUMI", groupBy = "dataset", slot = "cellMeta")
plotCellViolin(pbmcPlot, y = "nUMI", groupBy = "leiden_cluster",
               slot = "cellMeta", splitBy = "dataset",
               colorBy = "leiden_cluster",
               box = TRUE, dot = TRUE,
               ylab = "Total counts per cell")
plotCellViolin(pbmcPlot, y = "S100A8", slot = "normData",
               yFunc = function(x) log2(10000*x + 1),
               groupBy = "dataset", colorBy = "leiden_cluster",
               box = TRUE, ylab = "S100A8 Expression")

Make dot plot of factor loading in cell groups

Description

This function produces dot plots. Each column represents a group of cells specified by groupBy, and each row is a factor specified by useDims. The color of each dot reflects the mean loading of the specified factor in each cell group, and its size reflects the percentage of cells in the group that have loadings of that factor. We use ComplexHeatmap for simplified management of adding annotations and slicing subplots. This was inspired by the implementation in scCustomize.
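The two quantities encoded by each dot can be computed per group in base R. The sketch below assumes a factor x cell loading matrix and a grouping vector, treats a loading above zero as "loaded", and applies an optional scaling function before averaging; dot_summary() is hypothetical, and the order of scaling and averaging inside rliger is not asserted here:

```r
# Hypothetical helper: per-group mean value and percent of cells above zero
dot_summary <- function(mat, group, scaleFunc = NULL) {
  group <- factor(group)
  # Percent of cells in each group with a positive value (dot size)
  pct <- sapply(levels(group), function(g)
    100 * rowMeans(mat[, group == g, drop = FALSE] > 0))
  if (!is.null(scaleFunc)) mat <- scaleFunc(mat)
  # Mean (optionally scaled) value in each group (dot color)
  avg <- sapply(levels(group), function(g)
    rowMeans(mat[, group == g, drop = FALSE]))
  list(mean = avg, percent = pct)
}

H <- rbind(Factor_1 = c(0, 2, 4, 0),
           Factor_2 = c(1, 2, 0, 3))
res <- dot_summary(H, group = c("A", "A", "B", "B"))
res$mean      # rows are factors, columns are groups
res$percent
```

Factor_2 has the same mean in both groups but is loaded in every cell of A and only half of B, which is exactly the distinction the dot size adds on top of the color.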

Usage

plotClusterFactorDot(
  object,
  groupBy = NULL,
  useDims = NULL,
  useRaw = FALSE,
  splitBy = NULL,
  factorScaleFunc = NULL,
  cellIdx = NULL,
  legendColorTitle = "Mean Factor\nLoading",
  legendSizeTitle = "Percent\nLoaded",
  viridisOption = "viridis",
  verbose = FALSE,
  ...
)

Arguments

object

A liger object.

groupBy

The names of the columns in the cellMeta slot storing categorical variables. Loading data will be aggregated based on these, together with splitBy. Default uses the default clusters.

useDims

A numeric vector to specify the exact factors of interest. Default NULL uses all available factors.

useRaw

Whether to use un-aligned cell factor loadings (H matrices). Default FALSE.

splitBy

The names of the columns in the cellMeta slot storing categorical variables. Dot plot panel splitting will be based on these. Default NULL.

factorScaleFunc

A function object applied to the factor loading matrix for scaling the values for better visualization. Default NULL.

cellIdx

Valid cell subscription. See subsetLiger. Default NULL for using all cells.

legendColorTitle

Title for the colorbar legend. Default "Mean Factor\nLoading".

legendSizeTitle

Title for the size legend. Default "Percent\nLoaded".

viridisOption

Name of an available viridis palette. See viridis. Default "viridis".

verbose

Logical. Whether to show progress information, mainly when subsetting data. Default FALSE.

...

Additional theme setting arguments passed to .complexHeatmapDotPlot and heatmap setting arguments passed to Heatmap. See Details.

Details

For ..., please note that the arguments colorMat, sizeMat, featureAnnDF, cellSplitVar, cellLabels and viridisOption of .complexHeatmapDotPlot are already occupied by this function internally. Many arguments of Heatmap are also occupied: matrix, name, heatmap_legend_param, rect_gp, col, layer_fun, km, border, border_gp, column_gap, row_gap, cluster_row_slices, cluster_rows, row_title_gp, row_names_gp, row_split, row_labels, cluster_column_slices, cluster_columns, column_split, column_title_gp, column_title, column_labels, column_names_gp, top_annotation.

Value

HeatmapList object.

Examples

plotClusterFactorDot(pbmcPlot)

Make dot plot of gene expression in cell groups

Description

This function produces dot plots. Each column represents a group of cells specified by groupBy, and each row is a gene specified by features. The color of each dot reflects the mean normalized expression of the specified gene in each cell group, and its size reflects the percentage of cells in the group expressing that gene. We use ComplexHeatmap for simplified management of adding annotations and slicing subplots. This was inspired by the implementation in scCustomize.

Usage

plotClusterGeneDot(
  object,
  features,
  groupBy = NULL,
  splitBy = NULL,
  featureScaleFunc = function(x) log2(10000 * x + 1),
  cellIdx = NULL,
  legendColorTitle = "Mean\nExpression",
  legendSizeTitle = "Percent\nExpressed",
  viridisOption = "magma",
  verbose = FALSE,
  ...
)

Arguments

object

A liger object.

features

Use a character vector of gene names to make a plain dot plot like a heatmap. Use a data.frame where the first column is gene names and the second column is a grouping variable (e.g. a subset of runMarkerDEG output).

groupBy

The names of the columns in the cellMeta slot storing categorical variables. Expression data will be aggregated based on these, together with splitBy. Default uses the default clusters.

splitBy

The names of the columns in the cellMeta slot storing categorical variables. Dot plot panel splitting will be based on these. Default NULL.

featureScaleFunc

A function object applied to normalized data for scaling the values for better visualization. Default function(x) log2(10000*x + 1).

cellIdx

Valid cell subscription. See subsetLiger. Default NULL for using all cells.

legendColorTitle

Title for the colorbar legend. Default "Mean\nExpression".

legendSizeTitle

Title for the size legend. Default "Percent\nExpressed".

viridisOption

Name of an available viridis palette. See viridis. Default "magma".

verbose

Logical. Whether to show progress information, mainly when subsetting data. Default FALSE.

...

Additional theme setting arguments passed to .complexHeatmapDotPlot and heatmap setting arguments passed to Heatmap. See Details.

Value

HeatmapList object.

Examples

# Use a character vector of genes
features <- varFeatures(pbmcPlot)[1:10]
plotClusterGeneDot(pbmcPlot, features = features)
# Use a data.frame with grouping information, with more tweaks on the plot
features <- data.frame(features, rep(letters[1:5], 2))
plotClusterGeneDot(pbmcPlot, features = features,
                   clusterFeature = TRUE, clusterCell = TRUE, maxDotSize = 6)
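Assuming normData holds per-cell proportions (library-size normalized values without a log), the default featureScaleFunc, log2(10000*x + 1), turns each value into a count per 10,000 before log-transforming. It differs from the default yFunc of plotClusterGeneViolin, log1p(x * 10000), only by the constant factor log(2), because log2(1 + z) = log(1 + z) / log(2). A quick base R check with illustrative values:

```r
featureScale <- function(x) log2(10000 * x + 1)   # plotClusterGeneDot default
violinScale <- function(x) log1p(x * 10000)       # plotClusterGeneViolin default

x <- c(0, 1e-4, 3e-4)   # illustrative normalized values
featureScale(x)          # 0 1 2
violinScale(x)           # 0, log(2), log(4)
all.equal(featureScale(x), violinScale(x) / log(2))   # TRUE
```

Because the two defaults are proportional, they rank cells and groups identically; only the numeric range of the color or y-axis scale differs.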

Create violin plot for multiple genes grouped by clusters

Description

Make violin plots for each given gene, grouped by a cluster variable, and stack them along the y axis.

Usage

plotClusterGeneViolin(
  object,
  gene,
  groupBy = NULL,
  colorBy = NULL,
  box = FALSE,
  boxAlpha = 0.1,
  yFunc = function(x) log1p(x * 10000),
  showLegend = !is.null(colorBy),
  xlabAngle = 40,
  ...
)

Arguments

object

A liger object.

gene

Character vector of gene names.

groupBy

The name of an available categorical variable in the cellMeta slot. This forms the main x-axis columns. Use FALSE for no grouping. Default NULL looks for a clustering result but will not group if no clustering is found.

colorBy

The name of another categorical variable in the cellMeta slot. This splits the main grouping columns and colors the violins. Default NULL will not split or color the violins.

box

Logical, whether to add a boxplot. Default FALSE.

boxAlpha

Numeric, transparency of the boxplot. Default 0.1.

yFunc

Function to transform the y-axis. Default is log1p(x*1e4). Set to NULL for no transformation.

showLegend

Whether to show the legend. Default !is.null(colorBy), i.e. the legend is shown only when colorBy is set.

xlabAngle

Numeric, counter-clockwise rotation angle in degrees of X axis label text. Default 40.

...

Arguments passed on to .ggplotLigerTheme

title, subtitle, xlab, ylab: Main title, subtitle or X/Y axis title text. By default, no main title or subtitle will be set, and the X/Y axis titles will be the names of the variables used for plotting. Use NULL to hide elements. TRUE for xlab or ylab shows default values.
legendFillTitle: Legend title text for fill aesthetics, often used for violin, box and bar plots. Default NULL shows the original variable name.
legendPosition: Text indicating where to place the legend. Choose from "top", "bottom", "left" or "right". Default "right".
baseSize: One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when controlled by this.
titleSize, xTitleSize, yTitleSize, legendTitleSize: Size of main title, axis titles and legend title. Default NULL controls by baseSize + 2.
subtitleSize, xTextSize, yTextSize, legendTextSize: Size of subtitle text, axis texts and legend text. Default NULL controls by baseSize.
yFacetSize: Size of facet strip label text on the y-axis. Default NULL controls by baseSize - 2.
panelBorder: Whether to show a rectangular border around the panel instead of using ggplot's classic bottom and left axis lines. Default FALSE.
colorLabels: Character vector for modifying category names in a color legend. Passed to ggplot2::scale_color_manual(labels). Default NULL uses the original levels of the factor.
colorValues: Character vector of colors for modifying category colors in a color legend. Passed to ggplot2::scale_color_manual(values). Default NULL uses an internally selected palette when <= 26 categories are present, otherwise ggplot hues.
legendNRow, legendNCol: Integer, when there are too many categories in one variable, arranges the number of rows or columns. Default NULL automatically splits into ceiling(levels(variable)/15) columns.
plotly: Whether to use plotly to enable web-based interactive browsing of the plot. Requires installation of package "plotly". Default FALSE.

Details

If xlab needs to be set, set xlabAngle at the same time. This is because R's argument matching would otherwise partially match xlab to the main function argument xlabAngle before matching the ... arguments.
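The partial-matching pitfall can be reproduced with a toy function that, like plotClusterGeneViolin, declares xlabAngle before .... The function f below is hypothetical; R matches a supplied name to a formal argument by unique prefix when that formal precedes ... and has not already been matched exactly:

```r
# Toy function with the same argument layout as plotClusterGeneViolin
f <- function(object, xlabAngle = 40, ...) {
  list(xlabAngle = xlabAngle, dots = list(...))
}

r1 <- f(1, xlab = "Cluster")                  # xlab partially matches xlabAngle
r1$xlabAngle                                  # "Cluster"
length(r1$dots)                               # 0, nothing reached ...

r2 <- f(1, xlabAngle = 40, xlab = "Cluster")  # xlabAngle is matched exactly first
r2$xlabAngle                                  # 40
r2$dots$xlab                                  # "Cluster", passed through ...
```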

Value

A ggplot object.

Examples

plotClusterGeneViolin(pbmcPlot, varFeatures(pbmcPlot)[1:10])

Create density plot basing on specified coordinates

Description

This function shows the cell density in 2D dimensionality reduction coordinates. Density is shown with coloring and contour lines. A scatter plot of the dimensionality reduction is added as well. The density plot can be split by categorical variables (e.g. "dataset"), while the scatter plot always shows all cells in each subplot as a reference for the global structure.
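The idea of coloring only sufficiently dense regions can be sketched with a plain 2D Gaussian kernel density estimate on a grid, masking grid cells below a cutoff. This is a generic estimator under stated assumptions (product Gaussian kernel, a bw.nrd bandwidth per axis, a cutoff relative to the peak); it is not necessarily what plotDensityDimRed computes, and its cutoff is not on the same scale as minDensity. The helper kde_grid() and the simulated coordinates are hypothetical:

```r
# Hypothetical helper: 2D Gaussian KDE evaluated on an n x n grid
kde_grid <- function(x, y, n = 40, h = c(bw.nrd(x), bw.nrd(y))) {
  gx <- seq(min(x), max(x), length.out = n)
  gy <- seq(min(y), max(y), length.out = n)
  densityAt <- function(px, py) mean(dnorm(px, x, h[1]) * dnorm(py, y, h[2]))
  list(x = gx, y = gy, z = outer(gx, gy, Vectorize(densityAt)))
}

set.seed(1)
# A dense cluster near (0, 0) and a sparse one near (5, 5), standing in for UMAP coordinates
umap1 <- c(rnorm(200), rnorm(20, mean = 5, sd = 0.5))
umap2 <- c(rnorm(200), rnorm(20, mean = 5, sd = 0.5))
d <- kde_grid(umap1, umap2)

# Mask low-density grid cells so they are left uncolored
cutoff <- 0.05 * max(d$z)
zMasked <- d$z
zMasked[zMasked < cutoff] <- NA

if (interactive()) {
  image(d$x, d$y, zMasked, col = rev(heat.colors(20)))
  contour(d$x, d$y, d$z, nlevels = 5, add = TRUE)
  points(umap1, umap2, pch = 16, cex = 0.3, col = "grey")
}
```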

Usage

plotDensityDimRed(
  object,
  useDimRed = NULL,
  splitBy = NULL,
  combinePlot = TRUE,
  minDensity = 8,
  contour = TRUE,
  contourLineWidth = 0.3,
  contourBins = 5,
  dot = TRUE,
  dotColor = "grey",
  dotSize = 0.6,
  dotAlpha = 0.3,
  dotRaster = NULL,
  title = NULL,
  legendFillTitle = "Density",
  colorPalette = "magma",
  colorDirection = -1,
  ...
)

Arguments

object

A liger object.

useDimRed

Name of the variable storing the dimensionality reduction result in the cellMeta slot. Default uses the default dimension reduction.

splitBy

Character vector of categorical variable names in the cellMeta slot. Split all cells by groupings on this/these variable(s) to produce a density plot containing only the cells in each group. Default NULL.

combinePlot

Logical, whether to use plot_grid to combine multiple plots into one. Default TRUE returns a combined ggplot. FALSE returns a list of ggplots, or a single ggplot when only one plot is requested.

minDensity

A positive number to filter out low-density regions colored on the plot. Default 8. Setting zero will show density on the whole panel.

contour

Logical, whether to draw the contour lines. Default TRUE.

contourLineWidth

Numeric, the width of the contour lines. Default 0.3.

contourBins

Number of contour bins. A higher value generates more contour lines. Default 5.

dot

Logical, whether to add a scatter plot of all cells, even when the density plot is split with splitBy. Default TRUE.

dotColor, dotSize, dotAlpha

Control the color, size and transparency of all dots. Default "grey", 0.6 and 0.3, respectively.

dotRaster

Logical, whether to rasterize the scatter plot. Default NULL automatically rasterizes the dots when the total number of cells to be plotted exceeds 100,000.

title

Text of the main title of the plots. Default NULL. The length of a character vector input should match the number of plots generated.

legendFillTitle

Text of the legend title. Default "Density".

colorPalette

Name of the option for scale_fill_viridis_c. Default "magma".

colorDirection

Color gradient direction for scale_fill_viridis_c. Default -1.

...

Arguments passed on to .ggplotLigerTheme

title, subtitle, xlab, ylab: Main title, subtitle or X/Y axis title text. By default, no main title or subtitle will be set, and the X/Y axis titles will be the names of the variables used for plotting. Use NULL to hide elements. TRUE for xlab or ylab shows default values.
showLegend: Whether to show the legend. Default TRUE.
legendPosition: Text indicating where to place the legend. Choose from "top", "bottom", "left" or "right". Default "right".
baseSize: One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when controlled by this.
titleSize, xTitleSize, yTitleSize, legendTitleSize: Size of main title, axis titles and legend title. Default NULL controls by baseSize + 2.
subtitleSize, xTextSize, yTextSize, legendTextSize: Size of subtitle text, axis texts and legend text. Default NULL controls by baseSize.
panelBorder: Whether to show a rectangular border around the panel instead of using ggplot's classic bottom and left axis lines. Default FALSE.
plotly: Whether to use plotly to enable web-based interactive browsing of the plot. Requires installation of package "plotly". Default FALSE.

Value

A ggplot object when only one plot is generated. A ggplot object combined with plot_grid when multiple plots are generated and combinePlot = TRUE. A list of ggplots when multiple plots are generated and combinePlot = FALSE.

Examples

# The example dataset has a small number of cells, thus the cutoff is adjusted.
plotDensityDimRed(pbmcPlot, minDensity = 1)

Generate scatter plot(s) using liger object

Description

This function allows using available cell metadata to build the x-/y-axes. Available per-cell data can be used to form the color/shape annotation, including cell metadata, raw or processed gene expression, and unnormalized or aligned factor loadings. Multiple coloring variables are allowed from the same specification of slot, in which case a list of plots with different coloring values is returned. Users can further split the plot(s) by grouping cells (e.g. by datasets).

Usage

plotDimRed(
  object,
  colorBy = NULL,
  useDimRed = NULL,
  slot = c("cellMeta", "rawData", "normData", "scaleData", "H.norm", "H", "normPeak",
    "rawPeak"),
  colorByFunc = NULL,
  cellIdx = NULL,
  splitBy = NULL,
  shapeBy = NULL,
  titles = NULL,
  ...
)

plotClusterDimRed(object, useCluster = NULL, useDimRed = NULL, ...)

plotDatasetDimRed(object, useDimRed = NULL, ...)

plotByDatasetAndCluster(
  object,
  useDimRed = NULL,
  useCluster = NULL,
  combinePlot = TRUE,
  ...
)

plotGeneDimRed(
  object,
  features,
  useDimRed = NULL,
  log = TRUE,
  scaleFactor = 10000,
  zeroAsNA = TRUE,
  colorPalette = "C",
  ...
)

plotPeakDimRed(
  object,
  features,
  useDimRed = NULL,
  log = TRUE,
  scaleFactor = 10000,
  zeroAsNA = TRUE,
  colorPalette = "C",
  ...
)

plotFactorDimRed(
  object,
  factors,
  useDimRed = NULL,
  trimHigh = 0.03,
  zeroAsNA = TRUE,
  colorPalette = "D",
  ...
)

Arguments

object

Aliger object.

colorBy

Available variable name in specifiedslot to look forcolor annotation information. See details. DefaultNULL generatesall-black dots.

useDimRed

Name of the variable storing dimensionality reduction resultin thecellMeta(object). DefaultNULL use default dimRed.

slot

Choose the slot to find thecolorBy variable. See details.Default"cellMeta".

colorByFunc

DefaultNULL. A function object that expects avector/factor/data.frame retrieved bycolorBy as the only input, andreturns an object of the same size, so that the all color "aes" are replacedby this output. Useful when, for example, users need to scale the geneexpression shown on plot.

cellIdx

Character, logical or numeric index that can subscribe cells.Missing orNULL for all cells.

splitBy

Character vector of categorical variable names incellMeta slot. Split all cells by groupings on this/these variable(s)to produce a scatter plot containing only the cells in each group. DefaultNULL.

shapeBy

Available variable name incellMeta slot to look forcategorical annotation to be reflected by dot shapes. DefaultNULL.

titles

Title text. A character scalar or a character vector with asmany elements as multiple plots are supposed to be generated. DefaultNULL.

...

Arguments passed on to .ggScatter, .ggplotLigerTheme

dotOrder: Controls the order in which dots are added to the plot. Choose from "shuffle", "ascending", or "descending". Default "shuffle", which is useful when coloring by overlapping categories (e.g. "dataset"); "ascending" can be useful when coloring by a continuous variable (e.g. gene expression) where high values need more emphasis. NULL uses the default order.
dotSize, dotAlpha: Numeric, controls the size or transparency of all dots. Default getOption("ligerDotSize") (1) and 0.9.
trimHigh, trimLow: Numeric, limit the largest or smallest value of a continuous colorBy variable. Default NULL.
raster: Logical, whether to rasterize the plot. Default NULL automatically rasterizes the plot when the total number of dots to be plotted exceeds 100,000.
labelBy: A variable name available in plotDF. If the variable is categorical (a factor), the label position will be the median coordinates of all dots within the same group. A character vector with a unique label for each dot is also acceptable. Default colorBy.
labelText: Logical, whether to show a text label at the median position of each categorical group specified by colorBy. Default TRUE. Does not work when continuous coloring is specified.
labelTextSize: Numeric, controls the size of the labels when labelText = TRUE. Default 4.
seed: Random seed for reproducibility. Default 1.
title, subtitle, xlab, ylab: Main title, subtitle or X/Y axis title text. By default, no main title or subtitle will be set, and the X/Y axis titles will be the names of the variables used for plotting. Use NULL to hide elements. TRUE for xlab or ylab shows default values.
legendColorTitle: Legend title text for color aesthetics, often used for categorical or continuous coloring of dots. Default NULL shows the original variable name.
legendShapeTitle: Legend title text for shape aesthetics, often used for shaping dots by a categorical variable. Default NULL shows the original variable name.
showLegend: Whether to show the legend. Default TRUE.
legendPosition: Text indicating where to place the legend. Choose from "top", "bottom", "left" or "right". Default "right".
baseSize: One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this.
titleSize, xTitleSize, yTitleSize, legendTitleSize: Size of main title, axis titles and legend title. Default NULL is controlled by baseSize + 2.
subtitleSize, xTextSize, yTextSize, legendTextSize: Size of subtitle text, axis texts and legend text. Default NULL is controlled by baseSize.
legendDotSize: Allows dots in the legend region to be large enough to see the colors/shapes clearly. Default 4.
panelBorder: Whether to show a rectangular border around the panel instead of using the ggplot classic bottom and left axis lines. Default FALSE.
colorLabels: Character vector for modifying category names in a color legend. Passed to ggplot2::scale_color_manual(labels). Default NULL uses the original levels of the factor.
colorValues: Character vector of colors for modifying category colors in a color legend. Passed to ggplot2::scale_color_manual(values). Default NULL uses an internally selected palette when <= 26 categories are present, otherwise ggplot hues.
legendNRow, legendNCol: Integer, arranges the number of legend rows or columns when one variable has too many categories. Default NULL automatically splits into ceiling(nlevels(variable) / 15) columns.
colorDirection: Choose 1 or -1. Applied when colorPalette is from Viridis options. Default -1 uses a darker color for higher values, while 1 reverses this direction.
colorLow, colorMid, colorHigh, colorMidPoint: All four of these must be specified to customize the palette with scale_colour_gradient2. Default NULL.
naColor: The color code for NA values. Default "#DEDEDE".
plotly: Whether to use plotly to enable web-based interactive browsing of the plot. Requires installation of package "plotly". Default FALSE.
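Three of the automatic defaults listed above (dotOrder, raster = NULL, and the legend column count) can be sketched in base R as follows. The helpers orderDots, useRaster and defaultLegendNCol are hypothetical and only mirror the documented behavior; they are not rliger's internal code, and the package's exact shuffling may differ.

```r
# Hypothetical helpers mirroring documented defaults; not rliger internals.

# Return the drawing order of dots. Dots drawn last appear on top, so
# "ascending" puts high values (e.g. high expression) on top.
orderDots <- function(values,
                      dotOrder = c("shuffle", "ascending", "descending"),
                      seed = 1) {
  dotOrder <- match.arg(dotOrder)
  switch(dotOrder,
    shuffle = {
      set.seed(seed)
      sample.int(length(values))
    },
    ascending = order(values),
    descending = order(values, decreasing = TRUE)
  )
}

# raster = NULL means: rasterize only when more than 100,000 dots are plotted.
useRaster <- function(nDots, raster = NULL) {
  if (is.null(raster)) nDots > 1e5 else raster
}

# Default number of legend columns: one column per 15 categories.
defaultLegendNCol <- function(variable) {
  ceiling(nlevels(factor(variable)) / 15)
}

expr <- c(3.2, 0, 1.5, 7.8)
orderDots(expr, "ascending")   # the cell with 7.8 is drawn last
useRaster(250000)
defaultLegendNCol(letters)
```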

useCluster

Name of the variable in cellMeta(object). Default NULL uses the default cluster.

combinePlot

Logical, whether to use plot_grid to combine multiple plots into one. Default TRUE returns a combined ggplot. FALSE returns a list of ggplot objects.

features, factors

Names of genes or indices of factors to be visualized.

log

Logical, whether to log-transform the normalized expression of genes. Default TRUE.

scaleFactor

Number to be multiplied with the normalized expression of genes before log transformation. Default 1e4. NULL for no scaling.

zeroAsNA

Logical, whether to swap all zero values to NA so that naColor is used to represent non-expressing features. Default TRUE.
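The log, scaleFactor and zeroAsNA arguments together describe a value transformation applied before coloring. Below is a minimal base-R sketch, assuming the transform is log1p(x * scaleFactor); the exact function rliger applies internally is not stated here, and transformExpression is a hypothetical name.

```r
# Sketch of the documented expression transform; the use of log1p() is an assumption.
transformExpression <- function(x, log = TRUE, scaleFactor = 1e4,
                                zeroAsNA = TRUE) {
  if (!is.null(scaleFactor)) x <- x * scaleFactor  # NULL means no scaling
  if (log) x <- log1p(x)                            # log(1 + x) keeps 0 at 0
  if (zeroAsNA) x[x == 0] <- NA                     # later drawn with naColor
  x
}

normExpr <- c(0, 0.001, 0.0002)  # toy normalized values for three cells
transformExpression(normExpr)
```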

colorPalette

Name of the viridis palette. See viridis for options. Default "C" ("plasma") for gene expression and "D" ("viridis") for factor loading.

trimHigh

Highest cut-off used to limit outliers. Factor loadings above this value will all be trimmed to this value. Default 0.03.
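Trimming caps outliers rather than removing them, so every cell is still drawn. A minimal sketch of the documented behavior; the helper name trimValues is hypothetical and not part of rliger.

```r
# Cap values above trimHigh (and optionally below trimLow); no cells are dropped.
trimValues <- function(x, trimHigh = 0.03, trimLow = NULL) {
  if (!is.null(trimHigh)) x[x > trimHigh] <- trimHigh
  if (!is.null(trimLow)) x[x < trimLow] <- trimLow
  x
}

loading <- c(0, 0.012, 0.03, 0.25)  # toy loadings with one outlier at 0.25
trimValues(loading)
```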

Details

Available options for slot include: "cellMeta", "rawData", "normData", "scaleData", "H.norm" and "H". When "rawData", "normData" or "scaleData", colorBy has to be a character vector of feature names. When "H.norm" or "H", colorBy can be any valid index to select one factor of interest. Note that a character index follows the "Factor_[k]" format, with [k] replaced by an integer.

When "cellMeta", colorBy has to be an available column name in the table. Note that, for colorBy as well as x, y, shapeBy and splitBy, since a matrix can be stored as a single variable in the cellMeta table, a column (e.g. named "column1") of such a matrix (e.g. named "matrixVar") should be referred to with the syntax "matrixVar.column1". When the matrix does not have column names, its columns are addressed as "matrixVar.V1", "matrixVar.V2", etc. Use "UMAP.1", "UMAP.2", "TSNE.1" or "TSNE.2" for the 2D embeddings generated with the rliger package. These rules follow from the behavior of the as.data.frame method on a DataFrame object.
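The "matrixVar.column1" naming comes from flattening a matrix-valued variable into ordinary columns. Base R's data.frame() applies the same rule to a matrix that has column names, as sketched below. rliger itself works with an S4Vectors DataFrame; for a matrix without column names, base R falls back to "matrixVar.1", which differs from the "matrixVar.V1" form described above.

```r
# A matrix-valued metadata variable, standing in for e.g. a stored embedding.
matrixVar <- matrix(c(0.1, 0.5, -1.2, 2.0), nrow = 2,
                    dimnames = list(c("cell1", "cell2"),
                                    c("column1", "column2")))
flat <- data.frame(matrixVar = matrixVar)
names(flat)   # "matrixVar.column1" "matrixVar.column2"
```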

Value

A ggplot object when a single plot is intended. A list of ggplot objects when multiple colorBy variables and/or splitBy are set. When plotly = TRUE, all ggplot objects become plotly (htmlwidget) objects.

A ggplot object when only one feature (e.g. cluster variable, gene, factor) is set. A list object when multiple of those are specified.

Examples

plotDimRed(pbmcPlot, colorBy = "dataset", slot = "cellMeta",
           labelText = FALSE)
plotDimRed(pbmcPlot, colorBy = "S100A8", slot = "normData",
           dotOrder = "ascending", dotSize = 2)
plotDimRed(pbmcPlot, colorBy = 2, slot = "H.norm",
           dotOrder = "ascending", dotSize = 2, colorPalette = "viridis")
plotClusterDimRed(pbmcPlot)
plotDatasetDimRed(pbmcPlot)
plotByDatasetAndCluster(pbmcPlot)
plotGeneDimRed(pbmcPlot, varFeatures(pbmcPlot)[1])
plotFactorDimRed(pbmcPlot, 2)

Create volcano plot with EnhancedVolcano

Description

Create volcano plot with EnhancedVolcano

Usage

plotEnhancedVolcano(result, group, ...)

Arguments

result

Data frame returned by runMarkerDEG or runPairwiseDEG.

group

Selection of one group available from result$group. If only one group is available from result, the default NULL uses it.

...

Arguments passed to EnhancedVolcano::EnhancedVolcano(), except that toptable, lab, x and y are prefilled by this wrapper.

Value

ggplot

Examples

if (requireNamespace("EnhancedVolcano", quietly = TRUE)) {
    defaultCluster(pbmc) <- pbmcPlot$leiden_cluster
    # Test the DEG between "stim" and "ctrl", within each cluster
    result <- runPairwiseDEG(
        pbmc,
        groupTest = "stim",
        groupCtrl = "ctrl",
        variable1 = "dataset",
        splitBy = "defaultCluster"
    )
    plotEnhancedVolcano(result, "0.stim")
}

Visualize GO enrichment test result in dot plot

Description

Visualize GO enrichment test result in dot plot

Usage

plotGODot(
  result,
  group = NULL,
  query = NULL,
  pvalThresh = 0.05,
  n = 20,
  minDotSize = 3,
  maxDotSize = 7,
  termIDMatch = "^GO",
  colorPalette = "E",
  colorDirection = -1,
  ...
)

Arguments

result

List object returned by runGOEnrich.

group

A single group name to be visualized; must be available in names(result). Default NULL makes a plot for the first group.

query

A single string selecting which query's result to show. Choose from "Up" for results using up-regulated genes or "Down" for down-regulated genes. Default NULL uses the first available.

pvalThresh

Numeric scalar, cutoff for p-value where smaller values are considered significant. Default 0.05.

n

Number of top terms to be shown, ranked by p-value. Default 20.

minDotSize

The size of the dot representing the minimum gene count. Default 3.

maxDotSize

The size of the dot representing the maximum gene count. Default 7.

termIDMatch

Regular expression pattern to match the term ID. Default "^GO" for using only GO terms from the returned results.
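The pvalThresh, n and termIDMatch arguments describe a filter-then-rank selection of terms. A minimal sketch on a toy table, assuming columns named term_id and p_value (the names used by gprofiler2::gost() results; the exact layout of the runGOEnrich output is not shown here). The helper selectGOTerms is hypothetical.

```r
# Hypothetical helper; columns term_id and p_value are assumed.
selectGOTerms <- function(tab, pvalThresh = 0.05, n = 20, termIDMatch = "^GO") {
  # Keep only terms whose ID matches the pattern and whose p-value passes
  keep <- grepl(termIDMatch, tab$term_id) & tab$p_value < pvalThresh
  tab <- tab[keep, , drop = FALSE]
  # Rank by p-value (smallest first) and keep at most n terms
  tab <- tab[order(tab$p_value), , drop = FALSE]
  head(tab, n)
}

tab <- data.frame(term_id = c("GO:0001", "KEGG:0002", "GO:0003", "GO:0004"),
                  p_value = c(0.01, 0.001, 0.2, 0.0005),
                  stringsAsFactors = FALSE)
selectGOTerms(tab, n = 2)$term_id
```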

colorPalette, colorDirection

Viridis palette options. Default "E" and -1.

...

Arguments passed on to .ggplotLigerTheme

legendColorTitle: Legend title text for color aesthetics, often used for categorical or continuous coloring of dots. Default NULL shows the original variable name.
legendSizeTitle: Legend title text for size aesthetics, often used for sizing dots by a continuous variable. Default NULL shows the original variable name.
showLegend: Whether to show the legend. Default TRUE.
legendPosition: Text indicating where to place the legend. Choose from "top", "bottom", "left" or "right". Default "right".
baseSize: One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this.
titleSize, xTitleSize, yTitleSize, legendTitleSize: Size of main title, axis titles and legend title. Default NULL is controlled by baseSize + 2.
subtitleSize, xTextSize, yTextSize, legendTextSize: Size of subtitle text, axis texts and legend text. Default NULL is controlled by baseSize.
plotly: Whether to use plotly to enable web-based interactive browsing of the plot. Requires installation of package "plotly". Default FALSE.

Value

A ggplot object.

Examples

if (requireNamespace("gprofiler2", quietly = TRUE)) {
   go <- runGOEnrich(deg.pw)
   plotGODot(go)
}

Plot Heatmap of Gene Expression or Factor Loading

Description

Plot Heatmap of Gene Expression or Factor Loading

Usage

plotGeneHeatmap(
  object,
  features,
  cellIdx = NULL,
  slot = c("normData", "rawData", "scaleData", "scaleUnsharedData"),
  useCellMeta = NULL,
  cellAnnotation = NULL,
  featureAnnotation = NULL,
  cellSplitBy = NULL,
  featureSplitBy = NULL,
  viridisOption = "C",
  ...
)

plotFactorHeatmap(
  object,
  factors = NULL,
  cellIdx = NULL,
  slot = c("H.norm", "H"),
  useCellMeta = NULL,
  cellAnnotation = NULL,
  factorAnnotation = NULL,
  cellSplitBy = NULL,
  factorSplitBy = NULL,
  trim = c(0, 0.03),
  viridisOption = "D",
  ...
)

Arguments

object

A liger object, with the data to be plotted available.

features, factors

Character vector of genes of interest or numeric index of factors to be involved. features is required, while factors defaults to all factors (read from the k value recorded in the uns slot of the object).

cellIdx

Valid index to subset cells to be included. See subsetLiger. Default NULL uses all cells.

slot

Use the chosen matrix for the heatmap. For plotGeneHeatmap, default "normData", alternatively "rawData", "scaleData" or "scaleUnsharedData". For plotFactorHeatmap, default "H.norm", alternatively "H".

useCellMeta

Character vector of available variable names in cellMeta; these variables will be added as annotations to the heatmap. Default NULL.

cellAnnotation

data.frame object for using external annotation, with each column a variable and each row a cell. Row names of this data.frame will be used for matching the cells involved in the heatmap. For cells not found in this data.frame, NAs will be added with a warning. Default NULL.
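Matching external annotation to cells by row name, with NA fill and a warning for missing cells, can be sketched in base R as follows. The helper alignCellAnnotation is hypothetical and only illustrates the documented behavior; it is not rliger's implementation.

```r
# Hypothetical helper illustrating row-name matching of external annotation.
alignCellAnnotation <- function(ann, cells) {
  idx <- match(cells, rownames(ann))  # NA where a cell has no annotation row
  if (anyNA(idx)) {
    warning(sum(is.na(idx)), " cell(s) not found in annotation; filled with NA")
  }
  out <- ann[idx, , drop = FALSE]     # an NA index yields an all-NA row
  rownames(out) <- cells
  out
}

ann <- data.frame(type = c("T", "B"), row.names = c("c1", "c2"),
                  stringsAsFactors = FALSE)
alignCellAnnotation(ann, c("c2", "c1", "c3"))  # warns about "c3"
```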

featureAnnotation, factorAnnotation

Similar to cellAnnotation, while each row would be a gene or factor, respectively. Default NULL.

cellSplitBy

Character vector of variable names available in the annotation given by useCellMeta and cellAnnotation. This slices the heatmap by the specified variables. Default NULL.

featureSplitBy, factorSplitBy

Similar to cellSplitBy. Default NULL.

viridisOption

See the option argument of viridis. Default "C" (plasma) for plotGeneHeatmap and "D" (viridis) for plotFactorHeatmap.

...

Arguments passed on to .plotHeatmap

transpose: Logical, whether to "rotate" the heatmap by 90 degrees so that cell information is displayed by row. Default FALSE.
showCellLabel, showFeatureLabel: Logical, whether to show cell barcodes, gene symbols or factor names. Default TRUE for genes/factors but FALSE for cells.
showCellLegend, showFeatureLegend: Logical, whether to show cell or feature legends. Default TRUE. Can be a scalar for overall control or a vector matching each given annotation variable.
cellAnnColList, featureAnnColList: List object, with each element a named vector of R-interpretable color codes. The names of the list elements are used for matching the annotation variable names. The names of the colors in the vectors are used for matching the levels of a variable (factor object, categorical). Default NULL generates ggplot-flavor categorical colors.
scale: Logical, whether to take the z-score to scale and center gene expression. Applied after dataScaleFunc. Default FALSE.
baseSize: One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this.
cellTextSize, featureTextSize, legendTextSize: Size of cell barcode labels, gene/factor labels, or legend values. Default NULL.
cellTitleSize, featureTitleSize, legendTitleSize: Size of titles of the cell slices, gene/factor slices, or the legends. Default NULL.
RColorBrewerOption: When scale = TRUE, heatmap colors will be mapped with brewer.pal. This is passed to its name argument. Default "RdBu".

trim

Numeric vector of two numbers. The higher value limits the maximum value and the lower value limits the minimum value. Default c(0, 0.03).

Value

A HeatmapList-class object.

Examples

plotGeneHeatmap(pbmcPlot, varFeatures(pbmcPlot))
plotGeneHeatmap(pbmcPlot, varFeatures(pbmcPlot),
                useCellMeta = c("leiden_cluster", "dataset"),
                cellSplitBy = "leiden_cluster")
plotFactorHeatmap(pbmcPlot)
plotFactorHeatmap(pbmcPlot, cellIdx = pbmcPlot$leiden_cluster %in% 1:3,
                  useCellMeta = c("leiden_cluster", "dataset"),
                  cellSplitBy = "leiden_cluster")

Visualize factor expression and gene loading

Description

Visualize factor expression and gene loading

Usage

plotGeneLoadings(
  object,
  markerTable,
  useFactor,
  useDimRed = NULL,
  nLabel = 15,
  nPlot = 30,
  ...
)

plotGeneLoadingRank(
  object,
  markerTable,
  useFactor,
  nLabel = 15,
  nPlot = 30,
  ...
)

Arguments

object

A liger object with a valid factorization result.

markerTable

Returned result of getFactorMarkers.

useFactor

Integer index for which factor to visualize.

useDimRed

Name of the variable storing the dimensionality reduction result in the cellMeta slot. Default NULL uses the default dimRed.

nLabel

Integer, number of top genes to be shown with text labels. Default 15.

nPlot

Integer, number of top genes to be shown in the loading rank plot. Default 30.
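nPlot and nLabel select nested, top-ranked subsets of genes: the labeled genes are the first nLabel of the nPlot genes shown. A sketch, assuming the loadings of one factor are available as a named numeric vector; this is a hypothetical input, since the structure returned by getFactorMarkers is not reproduced here, and topLoadingGenes is not an rliger function.

```r
# Hypothetical helper; assumes one factor's gene loadings as a named vector.
topLoadingGenes <- function(loading, nLabel = 15, nPlot = 30) {
  ranked <- sort(loading, decreasing = TRUE)
  list(plotted = names(head(ranked, nPlot)),   # genes shown in the rank plot
       labeled = names(head(ranked, nLabel)))  # subset that gets text labels
}

loading <- c(gene1 = 0.05, gene2 = 0.9, gene3 = 0.2, gene4 = 0.7, gene5 = 0.01)
topLoadingGenes(loading, nLabel = 2, nPlot = 4)
```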

...

Arguments passed on to plotDimRed, .ggScatter, .ggplotLigerTheme

colorByFunc: Default NULL. A function object that expects a vector/factor/data.frame retrieved by colorBy as the only input and returns an object of the same size, so that all color "aes" are replaced by this output. Useful when, for example, users need to scale the gene expression shown on the plot.
cellIdx: Character, logical or numeric index that can subset cells. Missing or NULL for all cells.
shapeBy: Available variable name in the cellMeta slot to look for categorical annotation to be reflected by dot shapes. Default NULL.
titles: Title text. A character scalar or a character vector with as many elements as the number of plots to be generated. Default NULL.
dotSize, dotAlpha: Numeric, controls the size or transparency of all dots. Default getOption("ligerDotSize") (1) and 0.9.
trimHigh, trimLow: Numeric, limit the largest or smallest value of a continuous colorBy variable. Default NULL.
raster: Logical, whether to rasterize the plot. Default NULL automatically rasterizes the plot when the total number of dots to be plotted exceeds 100,000.
legendColorTitle: Legend title text for color aesthetics, often used for categorical or continuous coloring of dots. Default NULL shows the original variable name.
legendShapeTitle: Legend title text for shape aesthetics, often used for shaping dots by a categorical variable. Default NULL shows the original variable name.
showLegend: Whether to show the legend. Default TRUE.
legendPosition: Text indicating where to place the legend. Choose from "top", "bottom", "left" or "right". Default "right".
baseSize: One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this.
titleSize, xTitleSize, yTitleSize, legendTitleSize: Size of main title, axis titles and legend title. Default NULL is controlled by baseSize + 2.
subtitleSize, xTextSize, yTextSize, legendTextSize: Size of subtitle text, axis texts and legend text. Default NULL is controlled by baseSize.
panelBorder: Whether to show a rectangular border around the panel instead of using the ggplot classic bottom and left axis lines. Default FALSE.
colorPalette: For continuous coloring, an index or a palette name to select from the available options of ggplot scale_brewer or viridis. Default "magma".
colorDirection: Choose 1 or -1. Applied when colorPalette is from Viridis options. Default -1 uses a darker color for higher values, while 1 reverses this direction.
naColor: The color code for NA values. Default "#DEDEDE".

Examples

result <- getFactorMarkers(pbmcPlot, "ctrl", "stim")
plotGeneLoadings(pbmcPlot, result, useFactor = 2)

Visualize gene expression or cell metadata with violin plot

Description

Visualize gene expression or cell metadata with violin plot

Usage

plotGeneViolin(object, gene, byDataset = TRUE, groupBy = NULL, ...)

plotTotalCountViolin(object, groupBy = "dataset", ...)

plotGeneDetectedViolin(object, groupBy = "dataset", ...)

Arguments

object

A liger object.

gene

Character gene names.

byDataset

Logical, whether the violin plot should be split by dataset. Default TRUE.

groupBy

Names of available categorical variables in the cellMeta slot. Use FALSE for no grouping. Default NULL looks for the clustering result but will not group if no clustering is found.

...

Arguments passed on to plotCellViolin, .ggCellViolin, .ggplotLigerTheme

slot: Choose the slot in which to find the y variable. See Details. Default "cellMeta".
yFunc: A function object that expects a vector/factor/data.frame retrieved by y as the only input and returns an object of the same size, so that the y-axis is replaced by this output. Useful when, for example, users need to scale the gene expression shown on the plot.
cellIdx: Character, logical or numeric index that can subset cells. Missing or NULL for all cells.
titles: Title text. A character scalar or a character vector with as many elements as the number of plots to be generated. Default NULL.
violin, box, dot: Logical, whether to add a violin plot, box plot or dot (scatter) plot, respectively. Layers are added in the order dot, violin, box, so the box plot is drawn on the top surface. By default, only the violin plot is generated.
violinAlpha, boxAlpha: Numeric, controls the transparency of layers. Default 0.8 and 0.6, respectively.
violinWidth, boxWidth: Numeric, controls the width of the violin/box bounding box. Default 0.9 and 0.4.
dotColor, dotSize: Globally control the appearance of all dots (a color and a numeric size). Default "black" and getOption("ligerDotSize") (1).
xlabAngle: Numeric, counter-clockwise rotation angle of the X axis label text. Default 45.
raster: Logical, whether to rasterize the dot plot. Default NULL automatically rasterizes the dot plot when the total number of cells to be plotted exceeds 100,000.
seed: Random seed for reproducibility. Default 1.
legendFillTitle: Legend title text for fill aesthetics, often used for violin, box and bar plots. Default NULL shows the original variable name.
showLegend: Whether to show the legend. Default TRUE.
legendPosition: Text indicating where to place the legend. Choose from "top", "bottom", "left" or "right". Default "right".
baseSize: One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this.
titleSize, xTitleSize, yTitleSize, legendTitleSize: Size of main title, axis titles and legend title. Default NULL is controlled by baseSize + 2.
subtitleSize, xTextSize, yTextSize, legendTextSize: Size of subtitle text, axis texts and legend text. Default NULL is controlled by baseSize.
panelBorder: Whether to show a rectangular border around the panel instead of using the ggplot classic bottom and left axis lines. Default FALSE.
colorLabels: Character vector for modifying category names in a color legend. Passed to ggplot2::scale_color_manual(labels). Default NULL uses the original levels of the factor.
colorValues: Character vector of colors for modifying category colors in a color legend. Passed to ggplot2::scale_color_manual(values). Default NULL uses an internally selected palette when <= 26 categories are present, otherwise ggplot hues.
legendNRow, legendNCol: Integer, arranges the number of legend rows or columns when one variable has too many categories. Default NULL automatically splits into ceiling(nlevels(variable) / 15) columns.
plotly: Whether to use plotly to enable web-based interactive browsing of the plot. Requires installation of package "plotly". Default FALSE.

Value

A ggplot object if using a single gene and not splitting by dataset. Otherwise, a list of ggplot objects.

Examples

plotGeneViolin(pbmcPlot, varFeatures(pbmcPlot)[1],
               groupBy = "leiden_cluster")
plotTotalCountViolin(pbmc)
plotGeneDetectedViolin(pbmc, dot = TRUE, box = TRUE, colorBy = "dataset")

Comprehensive group-split cluster plot on dimension reduction with proportion

Description

This function produces a combined plot at the group level (e.g. dataset, or another metadata variable such as biological condition). A scatter plot of the dimension reduction with clusters labeled is generated per group. Furthermore, a stacked barplot of cluster proportions within each group is combined with the subplot of each group.

Usage

plotGroupClusterDimRed(
  object,
  useGroup = "dataset",
  useCluster = NULL,
  useDimRed = NULL,
  combinePlot = TRUE,
  droplevels = TRUE,
  relHeightMainLegend = c(5, 1),
  relHeightDRBar = c(10, 1),
  mainNRow = NULL,
  mainNCol = NULL,
  legendNRow = 1,
  ...
)

Arguments

object

A liger object with dimension reduction, grouping variable and cluster assignment in cellMeta(object).

useGroup

Variable name of the group division in metadata. Default "dataset".

useCluster

Name of the variable in cellMeta(object). Default NULL uses the default cluster.

useDimRed

Name of the variable storing the dimensionality reduction result in cellMeta(object). Default NULL uses the default dimRed.

combinePlot

Whether to return a combined plot. Default TRUE. If FALSE, returns a list containing only the scatter plots.

droplevels

Logical, whether to perform droplevels() on the selected grouping variable. Default TRUE will not show groups that are listed as categories but do not actually contain any cells.

relHeightMainLegend

Relative heights of the main combination panel and the legend at the bottom. Must be a numeric vector of 2 numbers. Default c(5, 1).

relHeightDRBar

Relative heights of the scatter plot and the barplot within each subpanel. Must be a numeric vector of 2 numbers. Default c(10, 1).

mainNRow, mainNCol

Arrangement of the main plotting region, as numbers of rows and columns. Default NULL is handled automatically by plot_grid.

legendNRow

Arrangement of the legend, number of rows. Default 1.

...

Arguments passed on to .ggScatter, .ggplotLigerTheme

dotOrder: Controls the order in which dots are added to the plot. Choose from "shuffle", "ascending", or "descending". Default "shuffle", which is useful when coloring by overlapping categories (e.g. "dataset"); "ascending" can be useful when coloring by a continuous variable (e.g. gene expression) where high values need more emphasis. NULL uses the default order.
dotSize, dotAlpha: Numeric, controls the size or transparency of all dots. Default getOption("ligerDotSize") (1) and 0.9.
raster: Logical, whether to rasterize the plot. Default NULL automatically rasterizes the plot when the total number of dots to be plotted exceeds 100,000.
labelText: Logical, whether to show a text label at the median position of each categorical group specified by colorBy. Default TRUE. Does not work when continuous coloring is specified.
labelTextSize: Numeric, controls the size of the labels when labelText = TRUE. Default 4.
seed: Random seed for reproducibility. Default 1.
baseSize: One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this.
panelBorder: Whether to show a rectangular border around the panel instead of using the ggplot classic bottom and left axis lines. Default FALSE.
colorValues: Character vector of colors for modifying category colors in a color legend. Passed to ggplot2::scale_color_manual(values). Default NULL uses an internally selected palette when <= 26 categories are present, otherwise ggplot hues.
naColor: The color code for NA values. Default "#DEDEDE".
plotly: Whether to use plotly to enable web-based interactive browsing of the plot. Requires installation of package "plotly". Default FALSE.

Value

A ggplot object when only one feature (e.g. cluster variable, gene, factor) is set. A list object when multiple of those are specified.

Examples

plotGroupClusterDimRed(pbmcPlot)

Create heatmap for showing top marker expression in conditions

Description

Create heatmap for showing top marker expression in conditions

Usage

plotMarkerHeatmap(
  object,
  result,
  topN = 5,
  lfcThresh = 1,
  padjThresh = 0.05,
  pctInThresh = 50,
  pctOutThresh = 50,
  dedupBy = c("logFC", "padj"),
  groupBy = NULL,
  groupSize = 50,
  column_title = NULL,
  ...
)

Arguments

object

A liger object, with normalized data and metadata to annotate available.

result

The data.frame returned by runMarkerDEG.

topN

Number of top features to be plotted for each group. Default 5.

lfcThresh

Hard threshold on logFC value. Default 1.

padjThresh

Hard threshold on adjusted P-value. Default 0.05.

pctInThresh, pctOutThresh

Thresholds on expression percentage. A feature will only pass the filter if it is expressed in more than pctInThresh percent of cells in the corresponding cluster; pctOutThresh is applied similarly. Only applied when these metrics are available. Default 50 percent for both.

dedupBy

When ranking by padj and logFC, if a feature is ranked as top for multiple clusters, assign it as the marker of the cluster where it has the largest "logFC" or the lowest "padj". Default "logFC".
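The threshold and deduplication rules above can be sketched on a toy result table. This is a simplification under stated assumptions: columns named feature, group, logFC and padj are assumed, ranking within a group uses logFC alone, and the pctInThresh/pctOutThresh filters are omitted. The helper selectMarkers is hypothetical, not rliger's code.

```r
# Hypothetical helper; columns feature, group, logFC and padj are assumed.
selectMarkers <- function(res, topN = 5, lfcThresh = 1, padjThresh = 0.05) {
  # Hard thresholds on effect size and significance
  res <- res[res$logFC > lfcThresh & res$padj < padjThresh, , drop = FALSE]
  # dedupBy = "logFC": a feature passing in several groups stays only in the
  # group where its logFC is largest
  res <- res[order(res$feature, -res$logFC), , drop = FALSE]
  res <- res[!duplicated(res$feature), , drop = FALSE]
  # Keep the topN features per group, ranked by logFC
  res <- res[order(res$group, -res$logFC), , drop = FALSE]
  do.call(rbind, lapply(split(res, res$group), head, n = topN))
}

res <- data.frame(feature = c("A", "B", "C", "A", "D"),
                  group   = c("g1", "g1", "g1", "g2", "g2"),
                  logFC   = c(2, 1.5, 0.5, 3, 2.5),
                  padj    = c(0.01, 0.01, 0.01, 0.001, 0.2),
                  stringsAsFactors = FALSE)
selectMarkers(res, topN = 1)  # "A" is assigned to g2, where its logFC is larger
```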

groupBy

Cell metadata variable names for cell grouping. Downsample balancing will also take these into account. Default NULL uses "dataset" and the default cluster.

groupSize

Maximum number of cells in each group to be downsampled for plotting. Default 50.

column_title

Title on the column. Default NULL.

...

Arguments passed on to plotGeneHeatmap, .plotHeatmap

cellAnnotation: data.frame object for using external annotation, with each column a variable and each row a cell. Row names of this data.frame will be used for matching the cells involved in the heatmap. For cells not found in this data.frame, NAs will be added with a warning. Default NULL.
transpose: Logical, whether to "rotate" the heatmap by 90 degrees so that cell information is displayed by row. Default FALSE.
showCellLabel, showFeatureLabel: Logical, whether to show cell barcodes, gene symbols or factor names. Default TRUE for genes/factors but FALSE for cells.
cellAnnColList, featureAnnColList: List object, with each element a named vector of R-interpretable color codes. The names of the list elements are used for matching the annotation variable names. The names of the colors in the vectors are used for matching the levels of a variable (factor object, categorical). Default NULL generates ggplot-flavor categorical colors.
scale: Logical, whether to take the z-score to scale and center gene expression. Applied after dataScaleFunc. Default FALSE.
trim: Numeric vector of two values. Limits the z-score values to this range when scale = TRUE. Default c(-2, 2).
baseSize: One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this.
cellTextSize, featureTextSize, legendTextSize: Size of cell barcode labels, gene/factor labels, or legend values. Default NULL.
cellTitleSize, featureTitleSize, legendTitleSize: Size of titles of the cell slices, gene/factor slices, or the legends. Default NULL.
viridisOption, viridisDirection: See the option and direction arguments of viridis. Default "A" and -1.
RColorBrewerOption: When scale = TRUE, heatmap colors will be mapped with brewer.pal. This is passed to its name argument. Default "RdBu".
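With scale = TRUE, each feature is z-scored across cells and then clamped into the trim range. A base-R sketch of that two-step transform; the helper name scaleAndTrim is hypothetical, rliger's handling of zero-variance features is not described here, and base scale() returns NaN for such rows.

```r
# Hypothetical helper: z-score each feature (row) across cells (columns),
# then clamp into the trim range.
scaleAndTrim <- function(mat, trim = c(-2, 2)) {
  z <- t(scale(t(mat)))       # scale() works column-wise, so transpose twice
  z[z < trim[1]] <- trim[1]
  z[z > trim[2]] <- trim[2]
  z
}

expr <- rbind(gene1 = c(1, 2, 3, 100),   # toy values with one extreme cell
              gene2 = c(5, 5, 6, 6))
scaleAndTrim(expr, trim = c(-1, 1))
```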

Value

A HeatmapList-class object.

Examples

defaultCluster(pbmc) <- pbmcPlot$leiden_cluster
pbmc <- normalize(pbmc)
plotMarkerHeatmap(pbmc, deg.marker)

Create heatmap for pairwise DEG analysis result

Description

Create heatmap for pairwise DEG analysis result

Usage

plotPairwiseDEGHeatmap(
  object,
  result,
  group = NULL,
  topN = 20,
  absLFCThresh = 1,
  padjThresh = 0.05,
  pctInThresh = 50,
  pctOutThresh = 50,
  downsampleSize = 200,
  useCellMeta = NULL,
  column_title = NULL,
  seed = 1,
  ...
)

Arguments

object

A liger object, with normalized data and metadata to annotate available.

result

The data.frame returned by runPairwiseDEG.

group

The test group name among the results to be shown. Must specify only one if multiple tests are available (i.e. split test). Default NULL works with a single-test result and raises an error with a split-test result.

topN

Maximum number of top significant features to be plotted for up- and down-regulated genes. Default 20.

absLFCThresh

Hard threshold on absolute logFC value. Default 1.

padjThresh

Hard threshold on adjusted P-value. Default 0.05.
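A sketch of how topN, absLFCThresh and padjThresh could select the up- and down-regulated features separately. It assumes columns named feature, logFC and padj and ranks by padj; both are assumptions, since the exact ranking rule is not specified above. The helper selectPairwiseDEG is hypothetical.

```r
# Hypothetical helper; columns feature, logFC and padj are assumed, and
# ranking by padj is an assumption.
selectPairwiseDEG <- function(res, topN = 20, absLFCThresh = 1,
                              padjThresh = 0.05) {
  sig <- res[abs(res$logFC) > absLFCThresh & res$padj < padjThresh, ,
             drop = FALSE]
  up <- sig[sig$logFC > 0, , drop = FALSE]
  down <- sig[sig$logFC < 0, , drop = FALSE]
  # At most topN features per direction, most significant first
  list(up = head(up$feature[order(up$padj)], topN),
       down = head(down$feature[order(down$padj)], topN))
}

res <- data.frame(feature = c("a", "b", "c", "d", "e"),
                  logFC = c(2, 3, -1.5, -4, 0.5),
                  padj = c(0.01, 0.001, 0.02, 0.5, 0.001),
                  stringsAsFactors = FALSE)
selectPairwiseDEG(res, topN = 1)
```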

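pctInThresh, pctOutThresh

Thresholds on expression percentage, analogous to the same arguments of plotMarkerHeatmap. Default 50 percent for both.
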
downsampleSize

Maximum number of downsampled cells to be shown in the heatmap. The downsampling is balanced across the cells involved in the specified test. Default 200.
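Balanced downsampling can be sketched by giving each group an equal share of the budget and sampling within it. The helper balancedDownsample is hypothetical, and it is a simplification: capacity left unused by small groups is not redistributed, which rliger may handle differently.

```r
# Hypothetical helper: cap each group at an equal share of the total budget.
balancedDownsample <- function(groups, size = 200, seed = 1) {
  set.seed(seed)
  groups <- factor(groups)
  perGroup <- floor(size / nlevels(groups))
  byGroup <- split(seq_along(groups), groups)
  picked <- lapply(byGroup, function(i) {
    # sample.int avoids sample()'s surprising behavior on length-1 input
    if (length(i) > perGroup) i[sample.int(length(i), perGroup)] else i
  })
  sort(unlist(picked, use.names = FALSE))
}

groups <- rep(c("ctrl", "stim"), times = c(500, 30))
idx <- balancedDownsample(groups, size = 200)
table(groups[idx])
```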

useCellMeta

Cell metadata variable names for cell grouping. Default NULL includes the dataset source and the default cluster.

column_title

Title on the column. Default NULL.

seed

Random seed for reproducibility. Default 1.

...

Arguments passed on to .plotHeatmap

transpose: Logical, whether to "rotate" the heatmap by 90 degrees so that cell information is displayed by row. Default FALSE.
showCellLabel, showFeatureLabel: Logical, whether to show cell barcodes, gene symbols or factor names. Default TRUE for genes/factors but FALSE for cells.
cellAnnColList, featureAnnColList: List object, with each element a named vector of R-interpretable color codes. The names of the list elements are used for matching the annotation variable names. The names of the colors in the vectors are used for matching the levels of a variable (factor object, categorical). Default NULL generates ggplot-flavor categorical colors.
scale: Logical, whether to take the z-score to scale and center gene expression. Applied after dataScaleFunc. Default FALSE.
trim: Numeric vector of two values. Limits the z-score values to this range when scale = TRUE. Default c(-2, 2).
baseSize: One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this.
cellTextSize, featureTextSize, legendTextSize: Size of cell barcode labels, gene/factor labels, or legend values. Default NULL.
cellTitleSize, featureTitleSize, legendTitleSize: Size of titles of the cell slices, gene/factor slices, or the legends. Default NULL.
viridisOption, viridisDirection: See the option and direction arguments of viridis. Default "A" and -1.
RColorBrewerOption: When scale = TRUE, heatmap colors will be mapped with brewer.pal. This is passed to its name argument. Default "RdBu".

Value

A HeatmapList-class object.

Examples

defaultCluster(pbmc) <- pbmcPlot$leiden_cluster
pbmc$condition_cluster <- interaction(pbmc$dataset, pbmc$defaultCluster)
deg <- runPairwiseDEG(pbmc, 'stim.0', 'stim.1', 'condition_cluster')
pbmc <- normalize(pbmc)
plotPairwiseDEGHeatmap(pbmc, deg, 'stim.0')

Visualize proportion across two categorical variables

Description

plotProportionBar creates bar plots comparing the cross-category proportion. plotProportionDot creates dot plots. plotProportionPie creates pie charts. plotClusterProportions has the variables pre-specified and calls the dot plot. plotProportion produces a combination of both bar plots and the dot plot.

Having the package "ggrepel" installed helps add tidier percentage annotations on the pie chart. Run options(ggrepel.max.overlaps = n) before plotting to set the number of allowed label overlaps.

Usage

plotProportion(
  object,
  class1 = NULL,
  class2 = "dataset",
  method = c("stack", "group", "pie"),
  ...
)
plotProportionDot(
  object,
  class1 = NULL,
  class2 = "dataset",
  showLegend = FALSE,
  panelBorder = TRUE,
  ...
)
plotProportionBar(
  object,
  class1 = NULL,
  class2 = "dataset",
  method = c("stack", "group"),
  inclRev = FALSE,
  panelBorder = TRUE,
  combinePlot = TRUE,
  ...
)
plotClusterProportions(object, useCluster = NULL, return.plot = FALSE, ...)
plotProportionPie(
  object,
  class1 = NULL,
  class2 = "dataset",
  labelSize = 4,
  labelColor = "black",
  circleColors = NULL,
  ...
)

Arguments

object

A liger object.

class1,class2

Each should be a single name of a categorical variable available in the cellMeta slot. The number of cells in each category of class2 serves as the denominator when calculating proportions. By default, class1 = NULL uses the default clusters and class2 = "dataset".

method

For bar plots, choose whether to draw a "stack" or "group" bar plot. plotProportion additionally accepts "pie". Default "stack".

...

Arguments passed on to .ggplotLigerTheme

title, subtitle, xlab, ylab: Main title, subtitle or X/Y axis title text. By default, no main title or subtitle will be set, and X/Y axis titles will be the names of variables used for plotting. Use NULL to hide elements. TRUE for xlab or ylab shows default values.
legendFillTitle: Legend title text for fill aesthetics, often used for violin, box and bar plots. Default NULL shows the original variable name.
legendPosition: Text indicating where to place the legend. Choose from "top", "bottom", "left" or "right". Default "right".
baseSize: One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this.
titleSize, xTitleSize, yTitleSize, legendTitleSize: Size of main title, axis titles and legend title. Default NULL controls by baseSize + 2.
subtitleSize, xTextSize, yTextSize, legendTextSize: Size of subtitle text, axis texts and legend text. Default NULL controls by baseSize.
colorLabels: Character vector for modifying category names in a color legend. Passed to ggplot2::scale_color_manual(labels). Default NULL uses original levels of the factor.
colorValues: Character vector of colors for modifying category colors in a color legend. Passed to ggplot2::scale_color_manual(values). Default NULL uses an internally selected palette when <= 26 categories are present, otherwise ggplot hues.
legendNRow, legendNCol: Integer; when there are too many categories in one variable, arranges the number of rows or columns. Default NULL, automatically split to ceiling(nlevels(variable)/15) columns.
colorPalette: For continuous coloring, an index or a palette name to select from available options from ggplot scale_brewer or viridis. Default "magma".
colorDirection: Choose 1 or -1. Applied when colorPalette is from viridis options. Default -1 uses darker color for higher values, while 1 reverses this direction.
colorLow, colorMid, colorHigh, colorMidPoint: All four of these must be specified to customize the palette with scale_colour_gradient2. Default NULL.
naColor: The color code for NA values. Default "#DEDEDE".
plotly: Whether to use plotly to enable web-based interactive browsing for the plot. Requires installation of package "plotly". Default FALSE.

showLegend

Whether to show the legend. Default FALSE.

panelBorder

Whether to show a rectangular border around the panel instead of using ggplot classic bottom and left axis lines. Default TRUE.

inclRev

Logical, for bar plots, whether to reverse the roles of class1 and class2 and produce two plots. Default FALSE.

combinePlot

Logical, whether to combine the two plots with plot_grid when two plots are created. Default TRUE.

useCluster

For plotClusterProportions. Same as class1, while class2 is hardcoded as "dataset".

return.plot

labelSize,labelColor

Settings for the pie chart percentage labels. Default 4 and "black".

circleColors

Character vector of colors. plotProportionPie parameter for setting the colors of circles, i.e. the categorical variable controlled by class2. Default NULL uses ggplot default hues.

Value

A ggplot object or a list of ggplot objects.

Examples

plotProportion(pbmcPlot)
plotProportionBar(pbmcPlot, method = "group")
plotProportionPie(pbmcPlot)
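The class1/class2 description above means proportions are normalized within each class2 category. A small base-R sketch of that arithmetic on a toy metadata table follows. The column names cluster and dataset and all values are made up for illustration, and this is not the rliger implementation.

```r
# Toy cell metadata (hypothetical values): class1 = cluster, class2 = dataset
meta <- data.frame(
  cluster = c("0", "0", "1", "1", "1", "2"),
  dataset = c("ctrl", "ctrl", "ctrl", "stim", "stim", "stim"),
  stringsAsFactors = FALSE
)
# Cross-tabulate: rows are class1 categories, columns are class2 categories
counts <- table(meta$cluster, meta$dataset)
# margin = 2 divides each column by its total, so the number of cells in each
# class2 category is the denominator and every column sums to 1
props <- prop.table(counts, margin = 2)
round(props, 3)
```

Here cluster "1" makes up 1/3 of the ctrl cells but 2/3 of the stim cells, which is the kind of shift the bar and dot plots are meant to expose.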

Box plot of cluster proportion in each dataset, grouped by condition

Description

This function calculates the proportion of each category (e.g. cluster, cell type) within each dataset, and then makes a box plot grouped by condition. The proportions of all categories within one dataset sum to 1. The condition variable must be a dataset-level variable, i.e. each dataset must belong to only one condition.

Usage

plotProportionBox(
  object,
  useCluster = NULL,
  conditionBy = NULL,
  sampleBy = "dataset",
  splitByCluster = FALSE,
  dot = FALSE,
  dotSize = getOption("ligerDotSize", 1),
  dotJitter = FALSE,
  ...
)

Arguments

object

A liger object.

useCluster

Name of a variable in cellMeta(object). Default NULL uses the default cluster.

conditionBy

Name of the variable in cellMeta(object) that represents the condition. Must be a higher-level variable of the sampleBy variable, i.e. each sample must belong to only one condition. Default NULL does not group samples by condition.

sampleBy

Name of the variable in cellMeta(object) that represents individual samples. Default "dataset".

splitByCluster

Logical, whether to split the wide grouped box plot by cluster into a list of box plots, one for each cluster. Default FALSE.

dot

Logical, whether to add a dot plot on top of the box plot. Default FALSE.

dotSize

Size of the dots. Default uses the user option "ligerDotSize", or 1 if not set.

dotJitter

Logical, whether to jitter the dots to avoid overlapping within a box when many dots are present. Default FALSE.

...

Arguments passed on to .ggplotLigerTheme

title, subtitle, xlab, ylab: Main title, subtitle or X/Y axis title text. By default, no main title or subtitle will be set, and X/Y axis titles will be the names of variables used for plotting. Use NULL to hide elements. TRUE for xlab or ylab shows default values.
legendFillTitle: Legend title text for fill aesthetics, often used for violin, box and bar plots. Default NULL shows the original variable name.
showLegend: Whether to show the legend. Default TRUE.
legendPosition: Text indicating where to place the legend. Choose from "top", "bottom", "left" or "right". Default "right".
baseSize: One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this.
titleSize, xTitleSize, yTitleSize, legendTitleSize: Size of main title, axis titles and legend title. Default NULL controls by baseSize + 2.
subtitleSize, xTextSize, yTextSize, legendTextSize: Size of subtitle text, axis texts and legend text. Default NULL controls by baseSize.
panelBorder: Whether to show a rectangular border around the panel instead of using ggplot classic bottom and left axis lines. Default FALSE.
colorLabels: Character vector for modifying category names in a color legend. Passed to ggplot2::scale_color_manual(labels). Default NULL uses original levels of the factor.
colorValues: Character vector of colors for modifying category colors in a color legend. Passed to ggplot2::scale_color_manual(values). Default NULL uses an internally selected palette when <= 26 categories are present, otherwise ggplot hues.
legendNRow, legendNCol: Integer; when there are too many categories in one variable, arranges the number of rows or columns. Default NULL, automatically split to ceiling(nlevels(variable)/15) columns.
colorPalette: For continuous coloring, an index or a palette name to select from available options from ggplot scale_brewer or viridis. Default "magma".
colorDirection: Choose 1 or -1. Applied when colorPalette is from viridis options. Default -1 uses darker color for higher values, while 1 reverses this direction.
colorLow, colorMid, colorHigh, colorMidPoint: All four of these must be specified to customize the palette with scale_colour_gradient2. Default NULL.
naColor: The color code for NA values. Default "#DEDEDE".
plotly: Whether to use plotly to enable web-based interactive browsing for the plot. Requires installation of package "plotly". Default FALSE.

Value

A ggplot object or a list of ggplot objects if splitByCluster = TRUE.

Examples

# "boxes" are expected to appear as horizontal lines, because there's no
# "condition" variable that groups the datasets in the example object, and
# thus only one value exists for each "box".
plotProportionBox(pbmcPlot, conditionBy = "dataset")
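The description above says proportions are computed within each sample and that the condition must be a higher-level grouping of samples. A base-R sketch of that bookkeeping follows, assuming a plain data frame of cell metadata. sampleClusterProportions and its column arguments are hypothetical names, not rliger API.

```r
# Hypothetical helper (not part of rliger)
sampleClusterProportions <- function(meta, cluster, sample, condition) {
  # Each sample must belong to exactly one condition
  nCond <- tapply(meta[[condition]], meta[[sample]],
                  function(x) length(unique(x)))
  if (any(nCond > 1)) stop("Each sample must belong to only one condition")
  # margin = 1: cluster proportions within each sample sum to 1
  props <- prop.table(table(meta[[sample]], meta[[cluster]]), margin = 1)
  long <- data.frame(
    sample = rownames(props)[as.vector(row(props))],
    cluster = colnames(props)[as.vector(col(props))],
    proportion = as.vector(props),
    stringsAsFactors = FALSE
  )
  # Attach the (unique) condition of each sample
  sampleCond <- unique(data.frame(sample = meta[[sample]],
                                  condition = meta[[condition]],
                                  stringsAsFactors = FALSE))
  merge(long, sampleCond, by = "sample")
}

meta <- data.frame(
  cluster   = c("A", "A", "B", "A", "B", "B", "B", "A"),
  sample    = c("s1", "s1", "s1", "s2", "s2", "s3", "s3", "s4"),
  condition = c("ctrl", "ctrl", "ctrl", "ctrl", "ctrl", "stim", "stim", "stim"),
  stringsAsFactors = FALSE
)
long <- sampleClusterProportions(meta, "cluster", "sample", "condition")
# One box per cluster and condition, one point per sample, e.g.:
# boxplot(proportion ~ condition + cluster, data = long)
```

Each sample contributes a single proportion per cluster, which is why the example above, where every dataset is its own condition, collapses each "box" to a line.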

Make Riverplot/Sankey diagram that shows label mapping across datasets

Description

Creates a riverplot/Sankey diagram to show how independent cluster assignments from two datasets map onto a joint clustering. Prior knowledge of cell annotation for the given datasets is required to make sense of the visualization. The original dataset annotation can be added with the syntax shown in the example code in this manual. The joint clustering could be generated with runCluster or set by any other metadata annotation.

The original dataset annotation can be inserted before running this function using the cellMeta<- method. Please see the example below.

This function depends on the CRAN package "sankey", which must be installed for this function to work.

Usage

plotSankey(
  object,
  cluster1,
  cluster2,
  clusterConsensus = NULL,
  minFrac = 0.01,
  minCell = 10,
  titles = NULL,
  prefixes = NULL,
  labelCex = 1,
  titleCex = 1.1,
  colorValues = scPalette,
  mar = c(2, 2, 4, 2)
)

Arguments

object

A liger object with all three clustering variables available.

cluster1,cluster2

Names of the variables in cellMeta(object) for the cluster assignments of dataset 1 and 2, respectively.

clusterConsensus

Name of the joint cluster variable to use. Default NULL uses the default clustering of the object. Can select a variable name in cellMeta(object).

minFrac

Numeric. Minimum fraction of a cluster for an edge to be shown. Default 0.01.

minCell

Numeric. Minimum number of cells for an edge to be shown. Default 10.

titles

Character vector of three. Customizes the column title text shown. Default uses the variable names cluster1, clusterConsensus and cluster2.

prefixes

Character vector of three. Cluster names have to be unique across all three variables, so this is provided to deduplicate the clusters by adding "prefixes[i]-" before the actual label. This will not be applied when no duplicate is found. Default NULL uses variable names. An NA value or an empty string (i.e. "") does not add the prefix to the corresponding variable.

labelCex

Numeric. Amount by which node label text should be magnified relative to the default. Default 1.

titleCex

Numeric. Amount by which column title text should be magnified relative to the default. Default 1.1.

colorValues

Character vector of color codes to set the color for each level in the consensus clustering. Default scPalette.

mar

Numeric vector of the form c(bottom, left, top, right) which gives the number of lines of margin to be specified on the four sides of the plot. Increasing the 2nd and 4th values can be helpful when cluster labels are long and extend outside of the plotting region. Default c(2, 2, 4, 2).

Value

No return value. The Sankey diagram is displayed instead.

Note

This function works as a replacement for the function makeRiverplot in rliger < 1.99. We decided to make a new function because the dependency adopted by the older version was archived on CRAN and is no longer available.

Examples

# Make fake dataset specific labels from joint clustering result
cellMeta(pbmcPlot, "ctrl_cluster", "ctrl") <-
    cellMeta(pbmcPlot, "leiden_cluster", "ctrl")
cellMeta(pbmcPlot, "stim_cluster", "stim") <-
    cellMeta(pbmcPlot, "leiden_cluster", "stim")
if (requireNamespace("sankey", quietly = TRUE)) {
    plotSankey(pbmcPlot, "ctrl_cluster", "stim_cluster",
               titles = c("control", "LIGER", "stim"),
               prefixes = c("c", NA, "s"))
}
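minFrac and minCell prune weak edges before the diagram is drawn. The sketch below counts the cells shared by each pair of labels and keeps the edges that pass both thresholds. It assumes the fraction is taken relative to the size of the source cluster, which this manual does not spell out. filterSankeyEdges is a hypothetical helper, not the plotSankey code.

```r
# Hypothetical helper (not part of rliger)
filterSankeyEdges <- function(from, to, minFrac = 0.01, minCell = 10) {
  # One row per (from, to) label pair with its cell count
  edges <- as.data.frame(table(from = from, to = to), stringsAsFactors = FALSE)
  edges <- edges[edges$Freq > 0, ]
  # Assumption: fraction relative to the size of the source cluster
  srcSize <- table(from)
  edges$frac <- edges$Freq / as.vector(srcSize[edges$from])
  edges[edges$frac >= minFrac & edges$Freq >= minCell, ]
}

ctrlLabel  <- c(rep("c1", 50), rep("c2", 30))
jointLabel <- c(rep("J1", 45), rep("J2", 5), rep("J2", 30))
filterSankeyEdges(ctrlLabel, jointLabel)
```

With the default thresholds, the 5-cell edge from c1 to J2 is dropped by minCell even though it passes minFrac.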

Visualize a spatial dataset

Description

Simple visualization of spatial coordinates. See the example code for how to have the information preset in the object. Arguments to the liger object method are passed down to the ligerSpatialDataset method.

Usage

plotSpatial2D(object, ...)
## S3 method for class 'liger'
plotSpatial2D(object, dataset, useCluster = NULL, legendColorTitle = NULL, ...)
## S3 method for class 'ligerSpatialDataset'
plotSpatial2D(
  object,
  useCluster = NULL,
  legendColorTitle = NULL,
  useDims = c(1, 2),
  xlab = NULL,
  ylab = NULL,
  labelText = FALSE,
  panelBorder = TRUE,
  ...
)

Arguments

object

Either a liger object containing a spatial dataset or a ligerSpatialDataset object.

...

Arguments passed on to .ggScatter, .ggplotLigerTheme

dotOrder: Controls the order in which each dot is added to the plot. Choose from "shuffle", "ascending", or "descending". Default "shuffle", useful when coloring by categories that overlap (e.g. "dataset"); "ascending" can be useful when coloring by a continuous variable (e.g. gene expression) where high values need more emphasis. NULL uses the default order.
dotSize, dotAlpha: Numeric, controls the size or transparency of all dots. Default getOption("ligerDotSize") (1) and 0.9.
raster: Logical, whether to rasterize the plot. Default NULL automatically rasterizes the plot when the total number of dots to be plotted exceeds 100,000.
labelTextSize: Numeric, controls the size of labels when labelText = TRUE. Default 4.
seed: Random seed for reproducibility. Default 1.
showLegend: Whether to show the legend. Default TRUE.
legendPosition: Text indicating where to place the legend. Choose from "top", "bottom", "left" or "right". Default "right".
baseSize: One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this.
titleSize, xTitleSize, yTitleSize, legendTitleSize: Size of main title, axis titles and legend title. Default NULL controls by baseSize + 2.
subtitleSize, xTextSize, yTextSize, legendTextSize: Size of subtitle text, axis texts and legend text. Default NULL controls by baseSize.
legendDotSize: Allows dots in the legend region to be large enough to see the colors/shapes clearly. Default 4.
colorLabels: Character vector for modifying category names in a color legend. Passed to ggplot2::scale_color_manual(labels). Default NULL uses original levels of the factor.
colorValues: Character vector of colors for modifying category colors in a color legend. Passed to ggplot2::scale_color_manual(values). Default NULL uses an internally selected palette when <= 26 categories are present, otherwise ggplot hues.
legendNRow, legendNCol: Integer; when there are too many categories in one variable, arranges the number of rows or columns. Default NULL, automatically split to ceiling(nlevels(variable)/15) columns.
naColor: The color code for NA values. Default "#DEDEDE".

dataset

Name of one spatial dataset.

useCluster

Either the name of one variable in cellMeta(object) or a factor object with annotation that matches all cells in the specified dataset. Default NULL uses default clusters.

legendColorTitle

Alternative title text in the legend. Default NULL uses the variable name set by useCluster, or "Annotation" if useCluster is a customized factor object.

useDims

Numeric vector of two, choosing the coordinates to be drawn in 2D space. (STARmap data could have 3 dimensions.) Default c(1, 2).

xlab,ylab

Text label on the x-/y-axis. Default NULL does not show it.

labelText

Logical, whether to label the annotation onto the scatter plot. Default FALSE.

panelBorder

Whether to show a rectangular border around the panel instead of using ggplot classic bottom and left axis lines. Default TRUE.

Value

A ggplot object

Examples

ctrl.fake.spatial <- as.ligerDataset(dataset(pbmc, "ctrl"), modal = "spatial")
fake.coords <- matrix(rnorm(2 * ncol(ctrl.fake.spatial)), ncol = 2)
coordinate(ctrl.fake.spatial) <- fake.coords
dataset(pbmc, "ctrl") <- ctrl.fake.spatial
defaultCluster(pbmc) <- pbmcPlot$leiden_cluster
plotSpatial2D(pbmc, dataset = "ctrl")

Plot the variance vs mean of feature expression

Description

For each dataset where feature variability has been calculated, a plot of log10 feature expression variance against log10 mean is produced. Features considered variable are highlighted in red.

Usage

plotVarFeatures(object, combinePlot = TRUE, dotSize = 1, ...)

Arguments

object

A liger object. selectGenes needs to be run in advance.

combinePlot

Logical. If TRUE, sub-figures for all datasets will be combined into one plot. If FALSE, a list of plots will be returned. Default TRUE.

dotSize

Controls the size of dots in the main plot. Default 1.

...

More theme setting parameters passed to .ggplotLigerTheme.

Value

A ggplot object when combinePlot = TRUE, a list of ggplot objects when combinePlot = FALSE.

Examples

pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
plotVarFeatures(pbmc)
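The plot places each feature by its log10 mean and log10 variance. A base-R sketch of those two coordinates follows, assuming a normalized genes-by-cells matrix and a vector of already-selected variable genes. meanVarTable is a hypothetical helper, and how selectGenes actually decides which genes are variable is not reproduced here.

```r
# Hypothetical helper (not part of rliger)
meanVarTable <- function(norm, varGenes = character(0)) {
  mu <- rowMeans(norm)
  v <- apply(norm, 1, stats::var)
  # log10 is -Inf for zero mean or zero variance, so drop those genes
  keep <- mu > 0 & v > 0
  data.frame(
    gene = rownames(norm)[keep],
    logMean = unname(log10(mu[keep])),
    logVar = unname(log10(v[keep])),
    variable = rownames(norm)[keep] %in% varGenes,
    stringsAsFactors = FALSE
  )
}

norm <- rbind(g1 = c(1, 2, 3, 4), g2 = c(10, 10, 10, 10),
              g3 = c(0, 0, 0, 0), g4 = c(0, 10, 0, 10))
tbl <- meanVarTable(norm, varGenes = "g4")
# plot(tbl$logMean, tbl$logVar, col = ifelse(tbl$variable, "red", "black"),
#      xlab = "log10 mean", ylab = "log10 variance")
```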

Create volcano plot for Wilcoxon test result

Description

plotVolcano is a simple implementation and shares most of its arguments with other rliger plotting functions. plotEnhancedVolcano is a wrapper function of EnhancedVolcano::EnhancedVolcano(), which provides a substantial number of arguments for graphical control. However, it requires the installation of package "EnhancedVolcano".

highlight and labelTopN both control feature name labeling, and highlight takes precedence. If both are left as default (NULL), all significant features will be labeled.

Usage

plotVolcano(
  result,
  group = NULL,
  logFCThresh = 1,
  padjThresh = 0.01,
  highlight = NULL,
  labelTopN = NULL,
  dotSize = 2,
  dotAlpha = 0.8,
  legendPosition = "top",
  labelSize = 4,
  ...
)

Arguments

result

Data frame table returned by runMarkerDEG or runPairwiseDEG.

group

Selection of one group available from result$group. If only one group is available from result, default NULL uses it.

logFCThresh

Number for the threshold on the absolute value of the log2 fold change statistics. Default 1.

padjThresh

Number for the threshold on the adjusted p-value statistics. Default 0.01.

highlight

A character vector of feature names to be highlighted. Default NULL.

labelTopN

Number of top differentially expressed features to be labeled on top of the dots. Ranked by adjusted p-value first and absolute value of logFC next. Default NULL.

dotSize,dotAlpha

Numbers for universal aesthetics control of dots. Default 2 and 0.8.

legendPosition

Text indicating where to place the legend. Choose from "top", "bottom", "left" or "right". Default "top".

labelSize

Size of labeled top features and line annotations. Default 4.

...

Arguments passed on to .ggScatter, .ggplotLigerTheme

dotOrder: Controls the order in which each dot is added to the plot. Choose from "shuffle", "ascending", or "descending". Default "shuffle", useful when coloring by categories that overlap (e.g. "dataset"); "ascending" can be useful when coloring by a continuous variable (e.g. gene expression) where high values need more emphasis. NULL uses the default order.
raster: Logical, whether to rasterize the plot. Default NULL automatically rasterizes the plot when the total number of dots to be plotted exceeds 100,000.
labelText: Logical, whether to show a text label at the median position of each categorical group specified by colorBy. Default TRUE. Does not work when continuous coloring is specified.
labelTextSize: Numeric, controls the size of labels when labelText = TRUE. Default 4.
seed: Random seed for reproducibility. Default 1.
legendColorTitle: Legend title text for color aesthetics, often used for categorical or continuous coloring of dots. Default NULL shows the original variable name.
showLegend: Whether to show the legend. Default TRUE.
baseSize: One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this.
titleSize, xTitleSize, yTitleSize, legendTitleSize: Size of main title, axis titles and legend title. Default NULL controls by baseSize + 2.
subtitleSize, xTextSize, yTextSize, legendTextSize: Size of subtitle text, axis texts and legend text. Default NULL controls by baseSize.
panelBorder: Whether to show a rectangular border around the panel instead of using ggplot classic bottom and left axis lines. Default FALSE.

Value

A ggplot object.

Examples

plotVolcano(deg.pw, "stim.CD14 Mono")
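logFCThresh, padjThresh and labelTopN amount to a significance filter plus a two-key sort. The sketch below works on a toy result table and assumes columns named feature, logFC and padj, strict inequalities at the thresholds, and ranking only among significant features; all of these are assumptions, not documented plotVolcano behavior. classifyDEG is a hypothetical helper.

```r
# Hypothetical helper (not part of rliger)
classifyDEG <- function(res, logFCThresh = 1, padjThresh = 0.01,
                        labelTopN = NULL) {
  res$significant <- abs(res$logFC) > logFCThresh & res$padj < padjThresh
  sig <- res[res$significant, ]
  # Rank by adjusted p-value first, then by absolute logFC (larger first)
  sig <- sig[order(sig$padj, -abs(sig$logFC)), ]
  labels <- if (is.null(labelTopN)) sig$feature else head(sig$feature, labelTopN)
  list(table = res, labels = labels)
}

toy <- data.frame(
  feature = c("a", "b", "c", "d"),
  logFC   = c(2, -3, 0.5, 1.5),
  padj    = c(0.001, 0.001, 0.0001, 0.05),
  stringsAsFactors = FALSE
)
classifyDEG(toy)$labels
classifyDEG(toy, labelTopN = 1)$labels
```

Features "a" and "b" tie on adjusted p-value, so the larger absolute logFC of "b" puts it first; "c" fails the fold-change cut and "d" fails the p-value cut.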

Show information about factorGSEA object

Description

Show information about factorGSEA object

Usage

## S3 method for class 'factorGSEA'
print(x, ...)

Arguments

x

A factorGSEA object.

...

S3 method convention; currently unused.

Quantile align (normalize) factor loadings

Description

This is a deprecated function. Please call quantileNorm instead.

Usage

quantileAlignSNF(
  object,
  knn_k = 20,
  k2 = 500,
  prune.thresh = 0.2,
  ref_dataset = NULL,
  min_cells = 20,
  quantiles = 50,
  nstart = 10,
  resolution = 1,
  dims.use = 1:ncol(x = object@H[[1]]),
  dist.use = "CR",
  center = FALSE,
  small.clust.thresh = 0,
  id.number = NULL,
  print.mod = FALSE,
  print.align.summary = FALSE
)

Arguments

object

A liger object. optimizeALS should be run before calling this function.

knn_k

Number of nearest neighbors for within-dataset knn graph (default 20).

k2

Horizon parameter for the shared nearest factor graph. Distances to all but the k2 nearest neighbors are set to 0 (cuts down on memory usage for very large graphs). (default 500)

prune.thresh

Minimum allowed edge weight. Any edges below this are removed (given weight 0) (default 0.2)

ref_dataset

Name of dataset to use as a "reference" for normalization. By default, the dataset with the largest number of cells is used.

min_cells

Minimum number of cells to consider a cluster shared across datasets (default 20)

quantiles

Number of quantiles to use for quantile normalization (default 50).

nstart

Number of times to perform Louvain community detection with different random starts (default 10).

resolution

Controls the number of communities detected. Higher resolution -> more communities. (default 1)

dims.use

Indices of factors to use for shared nearest factor determination (default 1:ncol(H[[1]])).

dist.use

Distance metric to use in calculating nearest neighbors (default "CR").

center

Centers the data when scaling factors (useful for less sparse modalities like methylation data). (default FALSE)

small.clust.thresh

Extracts small clusters loading highly on a single factor, with fewer cells than this, before regular alignment (default 0, no small cluster extraction).

id.number

Number to use for identifying the edge file (when running in parallel) (generates a random value by default).

print.mod

Print modularity output from clustering algorithm (default FALSE).

print.align.summary

Print summary of clusters which did not align normally (default FALSE).

Details

This process builds a shared factor neighborhood graph to jointly cluster cells, then quantile normalizes corresponding clusters.

The first step, building the shared factor neighborhood graph, is performed in SNF(), and produces a graph representation where edge weights between cells (across all datasets) correspond to their similarity in the shared factor neighborhood space. An important parameter here is knn_k, the number of neighbors used to build the shared factor space (see SNF()). Afterwards, modularity-based community detection is performed on this graph (Louvain clustering) in order to identify shared clusters across datasets. The method was first developed by Waltman and van Eck (2013) and source code is available at http://www.ludowaltman.nl/slm/. The most important parameter here is resolution, which controls the number of communities detected.

Next we perform quantile alignment for each dataset, factor, and cluster (by stretching/compressing datasets' quantiles to better match those of the reference dataset). These aligned factor loadings are combined into a single matrix and returned as H.norm.
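The quantile alignment step can be pictured as a piecewise-linear map that sends a dataset's quantiles onto the reference dataset's quantiles, for one factor within one cluster. Below is a minimal sketch of that map, assuming `quantiles` equally spaced breakpoints. quantileAlign is a hypothetical helper, and returning zeros when the quantiles are degenerate is a choice of this sketch, not documented behavior.

```r
# Hypothetical helper (not part of rliger): align the values `x` of one
# factor in one cluster of a dataset to the same factor/cluster in `ref`.
quantileAlign <- function(x, ref, quantiles = 50) {
  probs <- seq(0, 1, length.out = quantiles + 1)
  qx   <- stats::quantile(x, probs, names = FALSE)
  qref <- stats::quantile(ref, probs, names = FALSE)
  # approx() needs at least two distinct breakpoints; fall back to zeros
  if (length(unique(qx)) < 2 || length(unique(qref)) < 2) {
    return(rep(0, length(x)))
  }
  # Piecewise-linear warp from x's quantiles to ref's quantiles;
  # tied breakpoints in qx are collapsed by averaging
  stats::approx(qx, qref, xout = x, rule = 2, ties = mean)$y
}

set.seed(1)
x   <- rexp(200, rate = 2)
ref <- rexp(300, rate = 0.5)
aligned <- quantileAlign(x, ref)
```

After alignment, the smallest and largest values of x land on the minimum and maximum of ref, and intermediate quantiles are matched breakpoint by breakpoint.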

Value

A liger object with the H.norm and cluster slots set.

Examples

## Not run: 
# liger object, factorization complete
ligerex
# do basic quantile alignment
ligerex <- quantileAlignSNF(ligerex)
# higher resolution for more clusters (note that SNF is conserved)
ligerex <- quantileAlignSNF(ligerex, resolution = 1.2)
# change knn_k for more fine-grained local clustering
ligerex <- quantileAlignSNF(ligerex, knn_k = 15, resolution = 1.2)
## End(Not run)

Quantile Align (Normalize) Factor Loadings

Description

This process builds a shared factor neighborhood graph to jointly cluster cells, then quantile normalizes corresponding clusters.

Next we perform quantile alignment for each dataset, factor, and cluster (by stretching/compressing datasets' quantiles to better match those of the reference dataset).

Usage

quantileNorm(object, ...)
## S3 method for class 'liger'
quantileNorm(
  object,
  quantiles = 50,
  reference = NULL,
  minCells = 20,
  nNeighbors = 20,
  useDims = NULL,
  center = FALSE,
  maxSample = 1000,
  eps = 0.9,
  refineKNN = TRUE,
  clusterName = "quantileNorm_cluster",
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
## S3 method for class 'Seurat'
quantileNorm(
  object,
  reduction = "inmf",
  quantiles = 50,
  reference = NULL,
  minCells = 20,
  nNeighbors = 20,
  useDims = NULL,
  center = FALSE,
  maxSample = 1000,
  eps = 0.9,
  refineKNN = TRUE,
  clusterName = "quantileNorm_cluster",
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

Arguments

object

A liger or Seurat object with a valid factorization result available (i.e. runIntegration performed in advance).

...

Arguments passed to other S3 methods of this function.

quantiles

Number of quantiles to use for quantile normalization. Default 50.

reference

Character, numeric or logical selection of one dataset, out of all available datasets in object, to use as a "reference" for quantile normalization. Default NULL tries to find the RNA dataset with the largest number of cells; if no RNA dataset is available, it uses the largest dataset overall.

minCells

Minimum number of cells to consider a cluster shared across datasets. Default 20.

nNeighbors

Number of nearest neighbors for the within-dataset knn graph. Default 20.

useDims

Indices of factors to use for shared nearest factor determination. Default NULL uses all factors.

center

Whether to center the data when scaling factors. Could be useful for less sparse modalities like methylation data. Default FALSE.

maxSample

Maximum number of cells used for quantile normalization of each cluster and factor. Default 1000.

eps

The error bound of the nearest neighbor search. Lower values give more accurate nearest neighbor graphs but take much longer to compute. Default 0.9.

refineKNN

Whether to increase the robustness of cluster assignments using a KNN graph. Default TRUE.

clusterName

Variable name that will store the clustering result in the metadata of a liger object or a Seurat object. Default "quantileNorm_cluster".

seed

Random seed to allow reproducible results. Default 1.

verbose

Logical. Whether to show progress information. Default getOption("ligerVerbose") or TRUE if users have not set it.

reduction

Name of the reduction where the LIGER integration result is stored. Default "inmf".

Value

Updated input object

liger method
- Update the H.norm slot for the aligned cell factor loading, ready for running graph-based community detection clustering or dimensionality reduction for visualization.
- Update the cellMeta slot with a cluster assignment based on cell factor loading
Seurat method
- Update the reductions slot with a new DimReduc object containing the aligned cell factor loading.
- Update the metadata with a cluster assignment based on cell factor loading

Examples

pbmc <- quantileNorm(pbmcPlot)
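The Value section states that quantileNorm also stores a cluster assignment based on cell factor loadings, and refineKNN/nNeighbors control a kNN-based refinement. The hedged base-R sketch below shows one way to do this: assign each cell to its largest scaled factor, then relabel each cell by majority vote among its k nearest neighbors. Both helpers are hypothetical and use an exact distance matrix in place of the approximate search controlled by eps; they illustrate the idea and are not rliger's code.

```r
# Hypothetical helpers (not part of rliger)
assignMaxFactor <- function(H, center = FALSE) {
  # Scale each factor (column); with center = FALSE, scale() divides by the
  # column's root mean square instead of its standard deviation
  Hs <- scale(H, center = center, scale = TRUE)
  max.col(Hs, ties.method = "first")
}

refineByKNN <- function(H, clusters, k = 20) {
  d <- as.matrix(stats::dist(H))
  diag(d) <- Inf  # a cell does not vote for itself
  k <- min(k, nrow(H) - 1)
  vapply(seq_len(nrow(H)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]
    votes <- table(clusters[nn])
    # which.max breaks ties toward the first (smallest) label
    as.integer(names(votes)[which.max(votes)])
  }, integer(1))
}

H <- rbind(c(5, 1), c(4, 1), c(3, 1), c(1, 4), c(1, 5), c(1, 4.5))
assignMaxFactor(H)
refineByKNN(H, c(1L, 1L, 2L, 2L, 2L, 2L), k = 2)
```

In the refinement call, cell 3 starts with a wrong label (2) but both of its nearest neighbors carry label 1, so the vote moves it back to cluster 1.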

Quantile align (normalize) factor loading

Description

Please turn to quantileNorm.

This process builds a shared factor neighborhood graph to jointly cluster cells, then quantile normalizes corresponding clusters.

Next we perform quantile alignment for each dataset, factor, and cluster (by stretching/compressing datasets' quantiles to better match those of the reference dataset). These aligned factor loadings are combined into a single matrix and returned as H.norm.

Arguments

object

A liger object. optimizeALS should be run before calling this function.

knn_k

Number of nearest neighbors for the within-dataset knn graph (default 20).

ref_dataset

Name of dataset to use as a "reference" for normalization. By default, the dataset with the largest number of cells is used.

min_cells

Minimum number of cells to consider a cluster shared across datasets (default 20)

quantiles

Number of quantiles to use for quantile normalization (default 50).

eps

The error bound of the nearest neighbor search (default 0.9). Lower values give more accurate nearest neighbor graphs but take much longer to compute.

dims.use

Indices of factors to use for shared nearest factor determination (default 1:ncol(H[[1]])).

do.center

Centers the data when scaling factors (useful for less sparse modalities like methylation data). (default FALSE)

max_sample

Maximum number of cells used for quantile normalization of each cluster and factor. (default 1000)

refine.knn

Whether to increase the robustness of cluster assignments using a KNN graph. (default TRUE)

rand.seed

Random seed to allow reproducible results (default 1)

Value

A liger object with the 'H.norm' and 'clusters' slots set.

Access ligerATACDataset peak data

Description

Accessed in the same way as the data of a default ligerDataset.

Usage

rawPeak(x, dataset)
rawPeak(x, dataset, check = TRUE) <- value
normPeak(x, dataset)
normPeak(x, dataset, check = TRUE) <- value
## S4 method for signature 'liger,character'
rawPeak(x, dataset)
## S4 replacement method for signature 'liger,character'
rawPeak(x, dataset, check = TRUE) <- value
## S4 method for signature 'liger,character'
normPeak(x, dataset)
## S4 replacement method for signature 'liger,character'
normPeak(x, dataset, check = TRUE) <- value
## S4 method for signature 'ligerATACDataset,missing'
rawPeak(x, dataset = NULL)
## S4 replacement method for signature 'ligerATACDataset,missing'
rawPeak(x, dataset = NULL, check = TRUE) <- value
## S4 method for signature 'ligerATACDataset,missing'
normPeak(x, dataset = NULL)
## S4 replacement method for signature 'ligerATACDataset,missing'
normPeak(x, dataset = NULL, check = TRUE) <- value

Arguments

x

A ligerATACDataset object or a liger object.

dataset

Name or numeric index of an ATAC dataset.

check

Logical, whether to perform an object validity check when setting the new value.

value

dgCMatrix-class matrix.

Value

The retrieved peak count matrix or the updated x object.

Load in data from 10X

Description

Enables easy loading of sparse data matrices provided by 10X Genomics.

read10X works generally for 10X CellRanger pipelines, including CellRanger < 3.0, CellRanger >= 3.0 and CellRanger-ARC.

read10XRNA invokes read10X and extracts the "Gene Expression" feature type, so that the result can be used directly to construct a liger object. See Examples for a demonstration.

read10XATAC works for both CellRanger-ARC and CellRanger-ATAC pipelines but needs user arguments for correct recognition. Similarly, the returned value can be used directly to construct a liger object.
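In case (A.) of the path argument described below, one directory holds a Matrix Market file plus feature and barcode tables, and geneCol/cellCol pick the name columns. Below is a minimal sketch of reading one such directory with the Matrix package. readOne10X is a hypothetical helper; it ignores the multi-sample folders, feature-type splitting and CellRanger < 3 reference subfolders that read10X handles.

```r
library(Matrix)

# Hypothetical helper (not part of rliger): read one 10X output directory
readOne10X <- function(dir, geneCol = 2, cellCol = 1) {
  # Return the first candidate file that exists in `dir`
  pick <- function(candidates) {
    paths <- file.path(dir, candidates)
    hit <- paths[file.exists(paths)]
    if (length(hit) == 0) {
      stop("none of ", paste(candidates, collapse = ", "), " found in ", dir)
    }
    hit[1]
  }
  mat <- readMM(pick(c("matrix.mtx", "matrix.mtx.gz")))
  features <- utils::read.delim(
    pick(c("features.tsv", "features.tsv.gz", "genes.tsv")),
    header = FALSE, stringsAsFactors = FALSE
  )
  barcodes <- utils::read.delim(
    pick(c("barcodes.tsv", "barcodes.tsv.gz")),
    header = FALSE, stringsAsFactors = FALSE
  )
  # readMM returns a triplet matrix; convert to compressed column (dgCMatrix)
  mat <- as(mat, "CsparseMatrix")
  dimnames(mat) <- list(features[[geneCol]], barcodes[[cellCol]])
  mat
}
```

geneCol = 2 selects gene symbols rather than Ensembl IDs in a standard features.tsv. Duplicate symbols are left as-is here, so callers may want make.unique() on the row names.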

Usage

read10X(
  path,
  sampleNames = NULL,
  addPrefix = FALSE,
  useFiltered = NULL,
  reference = NULL,
  geneCol = 2,
  cellCol = 1,
  returnList = FALSE,
  verbose = getOption("ligerVerbose", TRUE),
  sample.dirs = path,
  sample.names = sampleNames,
  use.filtered = useFiltered,
  data.type = NULL,
  merge = NULL,
  num.cells = NULL,
  min.umis = NULL
)
read10XRNA(
  path,
  sampleNames = NULL,
  addPrefix = FALSE,
  useFiltered = NULL,
  reference = NULL,
  returnList = FALSE,
  ...
)
read10XATAC(
  path,
  sampleNames = NULL,
  addPrefix = FALSE,
  useFiltered = NULL,
  pipeline = c("atac", "arc"),
  arcFeatureType = "Peaks",
  returnList = FALSE,
  geneCol = 2,
  cellCol = 1,
  verbose = getOption("ligerVerbose", TRUE)
)

Arguments

path

(A.) A directory containing the matrix.mtx, genes.tsv (or features.tsv), and barcodes.tsv files provided by 10X. A vector, a named vector, a list or a named list can be given in order to load several data directories. (B.) The 10X root directory where subdirectories of per-sample output folders can be found. Sample names will by default take the names of the vector or list, or the names of the subfolders.

sampleNames

A vector of names to override the detected or set sample names for what is given to path. Default NULL. If no name is detected at all and multiple samples are given, the samples will be named by numbers.

addPrefix

Logical, whether to add sample names as a prefix to the barcodes. Default FALSE.

useFiltered

Logical, if path is given as case B, whether to use the filtered feature barcode matrix instead of the raw (unfiltered) one. Default TRUE.

reference

When a CellRanger < 3.0 root folder is given to path, the reference whose output matrix should be imported. Only needed when multiple references are present. Default NULL.

geneCol

Specify which column of genes.tsv or features.tsv to use for gene names. Default 2.

cellCol

Specify which column of barcodes.tsv to use for cell names. Default 1.

returnList

Logical, whether to still return a structured list instead of a single matrix object when only one sample and only one feature type can be found. In all other cases a list is always returned. Default FALSE.

verbose

Logical. Whether to show information of the progress. Default getOption("ligerVerbose") or TRUE if users have not set it.

sample.dirs,sample.names,use.filtered

These arguments are renamed and will be deprecated in the future. Please see Usage for the corresponding arguments.

data.type,merge,num.cells,min.umis

These arguments are defunct because the functionality can/should be fulfilled with other functions.

...

Arguments passed to read10X.

pipeline

Which cellRanger pipeline produced the ATAC data. Choose "atac" to read the peak matrix from cellranger-atac pipeline output folder(s), or "arc" to split the ATAC feature subset out from the multiomic cellranger-arc pipeline output folder(s). Default "atac".

arcFeatureType

When pipeline = "arc", which feature type holds the ATAC data of interest. Default "Peaks". Another possible feature type is "Chromatin Accessibility". If the specified type cannot be found, the error message shows the available options.

Value

When only one sample is given or detected, only one feature type is detected (or CellRanger < 3.0 is used), and returnList = FALSE, a sparse matrix object (dgCMatrix class) will be returned.
read10XRNA and read10XATAC, which are modality specific, return a list named by samples, where each element is the corresponding sparse matrix object (dgCMatrix class).
read10X generally returns a list named by samples. Each sample element will be another list named by feature types, even if only one feature type is detected (or CellRanger < 3.0 is used), for data structure consistency. The feature type "Gene Expression" always comes as the first type if available.
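To make the nested return structure concrete, the following base-R sketch builds a hypothetical object shaped like the read10X result described above (a list named by sample, each element a list named by feature type) and pulls out the "Gene Expression" matrix of every sample, which is the reshaping read10XRNA is described as performing. Plain matrices stand in for dgCMatrix objects, and all sample, gene and barcode names are made up; this is not the package's own code.

```r
# Hypothetical stand-in for a read10X() result: sample -> feature type -> matrix.
# Plain matrices replace dgCMatrix so the sketch needs no extra packages.
matList <- list(
  rep1 = list(
    "Gene Expression" = matrix(1:6, nrow = 3,
                               dimnames = list(c("g1", "g2", "g3"), c("c1", "c2"))),
    "Peaks" = matrix(0:3, nrow = 2,
                     dimnames = list(c("chr1:1-100", "chr1:200-300"), c("c1", "c2")))
  ),
  rep2 = list(
    "Gene Expression" = matrix(7:12, nrow = 3,
                               dimnames = list(c("g1", "g2", "g3"), c("c3", "c4")))
  )
)

# Keep only the "Gene Expression" matrix of each sample, giving a flat list
# named by sample: the shape described for read10XRNA().
rnaList <- lapply(matList, function(sample) sample[["Gene Expression"]])
str(rnaList)
```

A flat, sample-named list like rnaList is the shape that can be handed to the object constructor, as in the Examples below.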

Examples

## Not run: 
# For output from CellRanger < 3.0
dir <- 'path/to/data/directory'
list.files(dir) # Should show barcodes.tsv, genes.tsv, and matrix.mtx
mat <- read10X(dir)
class(mat) # Should show dgCMatrix
# For root directory from CellRanger < 3.0
dir <- 'path/to/root'
list.dirs(dir, recursive = FALSE) # Should show sample names
matList <- read10X(dir)
names(matList) # Should show the sample names
class(matList[[1]][["Gene Expression"]]) # Should show dgCMatrix
# For output from CellRanger >= 3.0 with multiple data types
dir <- 'path/to/data/directory'
list.files(dir) # Should show barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz
matList <- read10X(dir, sampleNames = "tissue1")
names(matList) # Should show "tissue1"
names(matList$tissue1) # Should show feature types, e.g. "Gene Expression", etc.
# For root directory from CellRanger >= 3.0 with multiple data types
dir <- 'path/to/root'
list.dirs(dir, recursive = FALSE) # Should show sample names, e.g. "rep1", "rep2", "rep3"
matList <- read10X(dir)
names(matList) # Should show the sample names: "rep1", "rep2", "rep3"
names(matList$rep1) # Should show the available feature types for rep1
## End(Not run)
## Not run: 
# For creating LIGER object from root directory of CellRanger >= 3.0
dir <- 'path/to/root'
list.dirs(dir, recursive = FALSE) # Should show sample names, e.g. "rep1", "rep2", "rep3"
matList <- read10XRNA(dir)
names(matList) # Should show the sample names: "rep1", "rep2", "rep3"
sapply(matList, class) # Should show "dgCMatrix" for every sample
lig <- createLiger(matList)
## End(Not run)

Read 10X cellranger files (matrix, barcodes and features) into R session

Description

This function loads a single sample given the paths to the matrix.mtx, barcodes.tsv, and features.tsv files. It is used internally by the read10X functions for loading individual samples from a cellranger output directory, and can also be convenient when non-standard files are provided (e.g. data downloaded from GEO).
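As a rough illustration of what reading such a file triple involves, the sketch below writes a tiny, made-up 10X-style directory to a temporary folder and assembles a feature-by-barcode dgCMatrix with Matrix::readMM() and read.delim(), using geneCol and cellCol in the same sense as the arguments below. It only assumes the standard MatrixMarket/TSV layout of 10X output; read10XFiles itself may handle details such as gzip input, multiple feature types or ATAC identifiers differently.

```r
library(Matrix)

# Write a tiny, hypothetical 10X-style triple to a temporary directory.
dir <- tempfile("tenx_")
dir.create(dir)
writeLines(c(
  "%%MatrixMarket matrix coordinate integer general",
  "3 2 3",  # 3 features, 2 barcodes, 3 non-zero entries
  "1 1 5",
  "3 1 1",
  "2 2 7"
), file.path(dir, "matrix.mtx"))
writeLines(c("ENSG01\tGeneA\tGene Expression",
             "ENSG02\tGeneB\tGene Expression",
             "ENSG03\tGeneC\tGene Expression"),
           file.path(dir, "features.tsv"))
writeLines(c("AAACCTG-1", "AAACGGG-1"), file.path(dir, "barcodes.tsv"))

geneCol <- 2  # column of features.tsv used for row names (gene symbols)
cellCol <- 1  # column of barcodes.tsv used for column names

# readMM() returns a triplet matrix; convert it to compressed column form.
mat <- as(readMM(file.path(dir, "matrix.mtx")), "CsparseMatrix")
features <- read.delim(file.path(dir, "features.tsv"), header = FALSE,
                       stringsAsFactors = FALSE)
barcodes <- read.delim(file.path(dir, "barcodes.tsv"), header = FALSE,
                       stringsAsFactors = FALSE)
rownames(mat) <- features[[geneCol]]
colnames(mat) <- barcodes[[cellCol]]
mat
```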

Usage

read10XFiles(
  matrixPath,
  barcodesPath,
  featuresPath,
  sampleName = NULL,
  geneCol = 2,
  cellCol = 1,
  isATAC = FALSE,
  returnList = FALSE
)

Arguments

matrixPath

Character string, path to the matrix MTX file. Can be gzipped.

barcodesPath

Character string, path to the barcodes TSV file. Can be gzipped.

featuresPath

Character string, path to the features TSV file. Can be gzipped.

sampleName

Character string attached as a prefix to the cell barcodes loaded from the barcodes file. Default NULL does not add any prefix. Useful when users plan to merge multiple samples into one matrix and need to avoid duplicated cell barcodes from different batches.

geneCol

An integer indicating which column in the features file to extract as the feature identifiers. Default 2.

cellCol

An integer indicating which column in the barcodes file to extract as the cell identifiers. Default 1.

isATAC

Logical, whether the data is for ATAC-seq. Default FALSE. If TRUE, feature identifiers will be generated by combining the first three columns of the features file in the format of "chr:start-end".

returnList

Logical, used internally by wrapper functions. Whether to force putting the loaded matrix in a list even if there is only one matrix. Default FALSE.
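The isATAC behaviour amounts to pasting the first three columns of the features table together. A minimal sketch with a hypothetical three-peak table; integer coordinates are used so that paste0() never switches to scientific notation (e.g. "1e+05").

```r
# Hypothetical peak features table: chromosome, start, end (no header).
peaks <- data.frame(
  V1 = c("chr1", "chr1", "chr2"),
  V2 = c(10000L, 20500L, 3000L),
  V3 = c(10500L, 21000L, 3800L),
  stringsAsFactors = FALSE
)

# Combine the first three columns into "chr:start-end" identifiers.
peakIds <- paste0(peaks[[1]], ":", peaks[[2]], "-", peaks[[3]])
peakIds
```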

Value

For a single-modal sample, a dgCMatrix object, or a list of one dgCMatrix when returnList = TRUE. A list of multiple dgCMatrix objects when multiple feature types are detected.

Examples

## Not run: 
matrix <- read10XFiles(
    matrixPath = "path/to/matrix.mtx.gz",
    barcodesPath = "path/to/barcodes.tsv.gz",
    featuresPath = "path/to/features.tsv.gz"
)
## End(Not run)

Read 10X HDF5 file

Description

Read the count matrix from a 10X CellRanger HDF5 file. By default, read10XH5 loads scRNA, scATAC or multimodal data into memory (inMemory = TRUE). To use LIGER in delayed mode for handling large datasets, set inMemory = FALSE to load the data as a DelayedArray object. The delayed mode only supports scRNA data for now.

Usage

read10XH5(filename, inMemory = TRUE, useNames = TRUE, featureMakeUniq = TRUE)
read10XH5Mem(filename, useNames = TRUE, featureMakeUniq = TRUE)
read10XH5Delay(filename, useNames = TRUE, featureMakeUniq = TRUE)

Arguments

filename

Character string, path to the HDF5 file.

inMemory

Logical, whether to load the data into memory. Default TRUE. FALSE loads the data as a DelayedArray object.

useNames

Logical, whether to use gene names as row names. Default TRUE. FALSE uses gene IDs instead.

featureMakeUniq

Logical, whether to make gene names unique. Default TRUE.

Value

A sparse matrix when an older CellRanger output HDF5 file is used, or when only one genome and one modality are detected. When multiple genomes are available, a list with one element per genome is returned. With multimodal data, each genome element is a list of matrices, one per modality. The matrix will be of dgCMatrix class when in memory, or a TENxMatrix object when in delayed mode.

Examples

matrix <- read10XH5(
    filename = system.file("extdata/ctrl.h5", package = "rliger"),
    inMemory = TRUE
)
class(matrix) # Should show dgCMatrix
if (requireNamespace("HDF5Array", quietly = TRUE)) {
   matrix <- read10XH5(
      filename = system.file("extdata/ctrl.h5", package = "rliger"),
      inMemory = FALSE
   )
   print(class(matrix)) # Should show TENxMatrix
}

Read matrix from H5AD file

Description

Read the raw count matrix from an H5AD file. By default, readH5AD loads the specified layer into memory (inMemory = TRUE). To use LIGER in delayed mode for handling large datasets, set inMemory = FALSE to load the data as a DelayedArray object. Note that only CSR format is supported for the matrix.

Usage

readH5AD(filename, layer, inMemory = TRUE, obs = FALSE)
readH5ADMem(filename, layer, obs = FALSE)
readH5ADDelay(filename, layer, obs = FALSE)

Arguments

filename

Character string, path to the H5AD file.

layer

Character string specifying the H5 path of the raw count data to be loaded. Use 'X' for adata.X, 'raw/X' for adata.raw.X, or 'layers/layer_name' for adata.layers['layer_name'].

inMemory

Logical, whether to load the data into memory. Default TRUE. FALSE loads the data as a DelayedArray object.

obs

Logical, whether to also load the cell metadata from adata.obs. Default FALSE.

Details

Currently, the only supported H5AD AnnData encoding versions are as follows:

adata.X, adata.raw.X, or adata.layers['layer'] - csr_matrix 0.1.0
adata.obs and adata.var - dataframe 0.2.0
Categoricals in a data frame - categorical 0.2.0

If users possess H5AD files encoded with an older specification, please either open an issue on GitHub or use the R package 'anndata' to extract the information manually.

Value

When loaded in memory, a sparse matrix of class dgCMatrix will be returned. When loaded in delayed mode, a TENxMatrix object will be returned. If obs = TRUE, a list containing the matrix and the cell metadata will be returned.

Examples

tempH5AD <- tempfile(fileext = '.h5ad')
writeH5AD(pbmc, tempH5AD, overwrite = TRUE)
mat <- readH5AD(tempH5AD, layer = 'X')
delayMat <- readH5AD(tempH5AD, layer = 'X', inMemory = FALSE)

Read liger object from RDS file

Description

This function reads a liger object stored in an RDS file, covering all of the following cases:

A liger object with in-memory data created with package version 1.99 or later.
A liger object with on-disk H5 data associated, where the links to the H5 files will be automatically restored.
A liger object created with an older package version, which by default is updated to the latest data structure.

Usage

readLiger(
  filename,
  dimredName,
  clusterName = "clusters",
  h5FilePath = NULL,
  update = TRUE
)

Arguments

filename

Path to an RDS file of a liger object, of either the current or an older version.

dimredName

The name of the variable in the cellMeta slot to store the dimensionality reduction matrix, which was originally located in the tsne.coords slot. Default "tsne.coords".

clusterName

The name of the variable in the cellMeta slot to store the clustering assignment, which was originally located in the clusters slot. Default "clusters".

h5FilePath

Named character vector for all H5 file paths. Not required for objects run with in-memory analysis. For objects containing H5-based analysis (e.g. online iNMF), this must be supplied if the H5 file location is different from that at creation time.

update

Logical, whether to update an old (<=1.99.0) liger object to the current version of the structure. Default TRUE.

Value

New version of liger object.

Examples

# Save and read regular current-version liger object
tempPath <- tempfile(fileext = ".rds")
saveRDS(pbmc, tempPath)
pbmc <- readLiger(tempPath, dimredName = NULL)
# Save and read H5-based liger object
h5Path <- system.file("extdata/ctrl.h5", package = "rliger")
h5tempPath <- tempfile(fileext = ".h5")
file.copy(from = h5Path, to = h5tempPath)
lig <- createLiger(list(ctrl = h5tempPath))
tempPath <- tempfile(fileext = ".rds")
saveRDS(lig, tempPath)
lig <- readLiger(tempPath, h5FilePath = list(ctrl = h5tempPath))
## Not run: 
# Read an old liger object <= 1.0.1
# Assume the dimensionality reduction method applied was UMAP
# Assume the clustering was derived with Louvain method
lig <- readLiger(
    filename = "path/to/oldLiger.rds",
    dimredName = "UMAP",
    clusterName = "louvain"
)
## End(Not run)

See `downsample`

Description

This function mainly aims at downsampling datasets to a size suitable for plotting.

Usage

readSubset(
  object,
  slot.use = "normData",
  balance = NULL,
  max.cells = 1000,
  chunk = 1000,
  datasets.use = NULL,
  genes.use = NULL,
  rand.seed = 1,
  verbose = getOption("ligerVerbose", TRUE)
)

Arguments

object

liger object

slot.use

Only create a subset from one or more of "rawData", "normData" and "scaleData". Default NULL subsets the whole object including downstream results.

balance

"all" for sampling maxCells cells from all datasets specified by useDatasets. "cluster" for sampling maxCells cells per cluster per dataset. "dataset" for sampling maxCells cells per dataset.

max.cells

Max number of cells to sample from the grouping based on balance.

chunk

Integer. Maximum number of cells in each chunk. Default 1000.

datasets.use

Index selection of datasets to consider. Default NULL for using all datasets.

genes.use

Character vector. Subset features to this specified range. Default NULL does not subset features.

rand.seed

Random seed for reproducibility. Default1.

verbose

Logical. Whether to show information of the progress. Default getOption("ligerVerbose") or TRUE if users have not set it.
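The three balance modes boil down to grouping cells and sampling at most max.cells from each group. The base-R sketch below uses a hypothetical metadata table and a hypothetical helper, sampleBalanced(); it mirrors the description of balance above but is not readSubset's implementation, which also reads the selected slot in chunks.

```r
set.seed(1)

# Hypothetical per-cell metadata: 40 "ctrl" cells and 20 "stim" cells,
# each labelled with one of three clusters.
meta <- data.frame(
  cell = paste0("cell", 1:60),
  dataset = rep(c("ctrl", "stim"), times = c(40, 20)),
  cluster = rep(c("A", "B", "C"), length.out = 60),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep at most `maxCells` cells from every group;
# groups that are already small enough are kept whole.
sampleBalanced <- function(cells, groups, maxCells) {
  picked <- lapply(split(cells, groups), function(x) {
    if (length(x) <= maxCells) x else x[sample.int(length(x), maxCells)]
  })
  unlist(picked, use.names = FALSE)
}

# balance = "dataset": up to maxCells cells per dataset
perDataset <- sampleBalanced(meta$cell, meta$dataset, maxCells = 15)
# balance = "cluster": up to maxCells cells per cluster within each dataset
perCluster <- sampleBalanced(meta$cell, paste(meta$dataset, meta$cluster), maxCells = 5)
# balance = "all": up to maxCells cells from all selected datasets pooled
pooled <- sampleBalanced(meta$cell, rep("all", nrow(meta)), maxCells = 25)
```

Sampling per group rather than from the pooled cells keeps small datasets or clusters visible in plots instead of letting the largest group dominate the subset.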

Value

Subset of liger object.

Objects exported from other packages

Description

These objects are imported from other packages. Follow the links below to see their documentation.

magrittr: %<>%, %>%

Remove missing cells or features from liger object

Description

Remove missing cells or features from liger object

Usage

removeMissing(
  object,
  orient = c("both", "feature", "cell"),
  minCells = NULL,
  minFeatures = NULL,
  useDatasets = NULL,
  newH5 = TRUE,
  filenameSuffix = "removeMissing",
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
removeMissingObs(
  object,
  slot.use = NULL,
  use.cols = TRUE,
  verbose = getOption("ligerVerbose", TRUE)
)

Arguments

object

liger object

orient

Choose to remove non-expressing features ("feature"), empty barcodes ("cell"), or both of them ("both"). Default "both".

minCells

Keep features that are expressed in at least this number of cells, calculated on a per-dataset basis. A single value for all datasets or a vector for each dataset. Default NULL only removes non-expressing features.

minFeatures

Keep cells that express at least this number of features, calculated on a per-dataset basis. A single value for all datasets or a vector for each dataset. Default NULL only removes non-expressing cells.

useDatasets

A character vector of the names, a numeric or logical vector of the index of the datasets to be processed. Default NULL removes empty entries from all datasets.

newH5

Logical, whether to create a new H5 file on disk for each H5-based dataset on subset. Default TRUE.

filenameSuffix

When subsetting H5-based datasets to new H5 files, this suffix will be added to all the filenames. Default "removeMissing".

verbose

Logical. Whether to show information of the progress. Default getOption("ligerVerbose") or TRUE if users have not set it.

...

Arguments passed to subsetLigerDataset.

slot.use

Deprecated. Always looks at the rawData slot of the inner ligerDataset objects.

use.cols

Deprecated. Previously meant "treating each column as a cell" when TRUE; now means orient = "cell".
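For intuition on minCells and minFeatures, here is a base-R sketch on a single made-up count matrix. The helper filterMissing() is hypothetical; it assumes that both thresholds are evaluated on the original matrix and that NULL corresponds to a threshold of 1 (only all-zero rows or columns are dropped), which matches the documented defaults but may differ from removeMissing in the order of operations.

```r
# Hypothetical raw counts for one dataset (genes x cells).
counts <- matrix(c(
  0, 0, 0, 0,   # g1: never expressed
  1, 0, 2, 0,   # g2: expressed in 2 cells
  3, 1, 0, 4    # g3: expressed in 3 cells
), nrow = 3, byrow = TRUE,
dimnames = list(c("g1", "g2", "g3"), paste0("c", 1:4)))

# Hypothetical helper. NULL thresholds fall back to 1, i.e. only
# all-zero features or cells are removed.
filterMissing <- function(m, minCells = NULL, minFeatures = NULL) {
  if (is.null(minCells)) minCells <- 1
  if (is.null(minFeatures)) minFeatures <- 1
  keepFeatures <- rowSums(m > 0) >= minCells   # cells expressing each feature
  keepCells <- colSums(m > 0) >= minFeatures   # features detected in each cell
  m[keepFeatures, keepCells, drop = FALSE]
}

r1 <- filterMissing(counts)                                # drops g1 only
r2 <- filterMissing(counts, minCells = 3, minFeatures = 2) # stricter cutoffs
```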

Value

Updated (subset) object.

Note

removeMissingObs will be deprecated. removeMissing covers and expands the use case and should be easier to understand.

Examples

# The example dataset does not contain non-expressing genes or empty barcodes
pbmc <- removeMissing(pbmc)

Restore links (to HDF5 files) for reloaded liger/ligerDataset object

Description

When loading the saved liger object with HDF5 data in a new R session, the links to HDF5 files would be closed. This function enables the restoration of those links so that new analyses can be carried out.

Usage

restoreH5Liger(object, filePath = NULL)
restoreOnlineLiger(object, file.path = NULL)

Arguments

object

liger or ligerDataset object.

filePath

Paths to HDF5 files. A single character path for ligerDataset input, or a list of paths named by the datasets for liger object input. Default NULL looks for the path(s) of the last valid loading.

file.path

Will be deprecated with restoreOnlineLiger. The same as filePath.

Value

object with restored links.

Note

restoreOnlineLiger will be deprecated to clarify the terms used for data structure.

Examples

h5Path <- system.file("extdata/ctrl.h5", package = "rliger")
tempPath <- tempfile(fileext = ".h5")
file.copy(from = h5Path, to = tempPath)
lig <- createLiger(list(ctrl = tempPath))
# Closing the H5 links makes the object invalid, which is equivalent to what
# users get with `saveRDS(lig, "object.rds"); lig <- readRDS("object.rds")`
closeAllH5(lig)
lig <- restoreH5Liger(lig)

Retrieve a single matrix of cells from a slot

Description

Only retrieve data from a specific slot, to avoid the memory cost of subsetting a whole liger object. Useful for plotting. Used internally by plotDimRed and plotCellViolin.

Usage

retrieveCellFeature(
  object,
  feature,
  slot = c("rawData", "normData", "scaleData", "H", "H.norm", "cellMeta", "rawPeak",
    "normPeak"),
  cellIdx = NULL,
  ...
)

Arguments

object

liger object

feature

Gene names, factor index or cell metadata variable names. Should be available in the specified slot.

slot

Exactly choose from "rawData", "normData", "scaleData", "H", "H.norm", "cellMeta", "rawPeak" or "normPeak".

cellIdx

Any valid type of index that subsets from all cells. Default NULL uses all cells.

...

Additional arguments passed to subsetLiger when slot is one of "rawData", "normData" or "scaleData".

Value

A matrix object where rows are cells and columns are the specified features.

Examples

S100A8Exp <- retrieveCellFeature(pbmc, "S100A8")
qcMetrics <- retrieveCellFeature(pbmc, c("nUMI", "nGene", "mito"),
                                 slot = "cellMeta")

Create "scaled data" for DNA methylation datasets

Description

Because gene body mCH proportions are negatively correlated with gene expression level in neurons, we need to reverse the direction of the methylation data. We do this by simply subtracting all values from the maximum methylation value. The resulting values are positively correlated with gene expression. This is applied only to the variable genes identified beforehand.
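The reversal is a single subtraction. A minimal sketch on a made-up mCH proportion matrix, assuming, as the wording above suggests, one global maximum taken over the whole matrix rather than a per-gene maximum:

```r
# Hypothetical mCH proportions for 3 variable genes (rows) x 3 cells.
mch <- matrix(c(0.10, 0.40, 0.25,
                0.05, 0.30, 0.60,
                0.20, 0.15, 0.35),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("g1", "g2", "g3"), c("c1", "c2", "c3")))

# Subtract every value from the global maximum: the most methylated entry
# becomes 0 and the ordering of all values is flipped.
reversed <- max(mch) - mch
reversed
```

The result stays non-negative, which is what the downstream non-negative factorization requires of the scaled data.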

Usage

reverseMethData(object, useDatasets, verbose = getOption("ligerVerbose", TRUE))

Arguments

object

A liger object, with variable genes identified.

useDatasets

Required. A character vector of the names, a numeric or logical vector of the index of the datasets that should be identified as methylation data, for which the reversed data will be created.

verbose

Logical. Whether to show information of the progress. Default getOption("ligerVerbose") or TRUE if users have not set it.

Value

The input liger object, where the scaleData slot of the specified datasets will be updated with values as described in Description.

Examples

# Assuming the second dataset in example data "pbmc" is methylation data
pbmc <- normalize(pbmc, useDatasets = 1)
pbmc <- selectGenes(pbmc, datasets.use = 1)
pbmc <- scaleNotCenter(pbmc, useDatasets = 1)
pbmc <- reverseMethData(pbmc, useDatasets = 2)

Perform consensus iNMF on scaled datasets

Description

This is an experimental function and is subject to change.

Performs consensus integrative non-negative matrix factorization (c-iNMF) to return factorized H, W, and V matrices. In order to address the non-convex nature of NMF, we built on the cNMF method proposed by D. Kotliar, 2019. We run the regular iNMF multiple times with different random starts, cluster the pool of all the factors in W and the Vs, and take the consensus of the clusters of the largest population. The cell factor loading H matrices are eventually solved with the consensus W and V matrices.

Please see runINMF for a detailed introduction to the regular iNMF algorithm, which is run multiple times in this function.

The consensus iNMF algorithm is developed based on the consensus NMF (cNMF) method (D. Kotliar et al., 2019).
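To make the consensus idea concrete, the sketch below runs a cNMF-style consensus step in base R on a simulated pool of W factors. Following the general cNMF recipe, it L2-normalizes every factor, scores each by its mean distance to its rho * nRandomStarts nearest neighbours, drops outliers, clusters the rest into k groups with k-means and takes per-cluster medians. The simulated data, the 90th-percentile outlier cutoff and the restriction to W only are assumptions for illustration; runCINMF's actual implementation, which also pools the V matrices, is not reproduced here.

```r
set.seed(42)
nGenes <- 50
k <- 3              # number of factors per run
nRandomStarts <- 10 # number of replicate runs
rho <- 0.3          # fraction of runs used as the neighbourhood size

# Simulated "true" factors (genes x k) and a pool of noisy copies, as if each
# random start had recovered them in a shuffled order. Purely hypothetical data.
truth <- matrix(rexp(nGenes * k), nrow = nGenes, ncol = k)
pool <- do.call(cbind, lapply(seq_len(nRandomStarts), function(i) {
  noise <- matrix(abs(rnorm(nGenes * k, sd = 0.05)), nrow = nGenes, ncol = k)
  (truth + noise)[, sample.int(k)]
}))

# 1. L2-normalise every factor so distances compare shapes, not magnitudes.
poolNorm <- sweep(pool, 2, sqrt(colSums(pool^2)), "/")

# 2. Mean distance of each factor to its rho * nRandomStarts nearest neighbours
#    (sorted distances start with 0, the distance of a factor to itself).
nNeighbors <- max(1L, as.integer(round(rho * nRandomStarts)))
d <- as.matrix(dist(t(poolNorm)))
knnDist <- apply(d, 1, function(row) mean(sort(row)[2:(nNeighbors + 1)]))

# 3. Drop outlying factors; the 90th-percentile cutoff is an arbitrary choice.
kept <- poolNorm[, knnDist <= quantile(knnDist, 0.9), drop = FALSE]

# 4. Cluster the retained factors into k groups, 5. take per-cluster medians.
km <- kmeans(t(kept), centers = k, nstart = 10)
consensus <- sapply(seq_len(k), function(j) {
  apply(kept[, km$cluster == j, drop = FALSE], 1, median)
})
```

Taking medians over many random starts is what makes the result less sensitive to any single poor local optimum of the non-convex objective.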

Usage

runCINMF(object, k = 20, lambda = 5, rho = 0.3, ...)
## S3 method for class 'liger'
runCINMF(
  object,
  k = 20,
  lambda = 5,
  rho = 0.3,
  nIteration = 30,
  nRandomStarts = 10,
  HInit = NULL,
  WInit = NULL,
  VInit = NULL,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
## S3 method for class 'Seurat'
runCINMF(
  object,
  k = 20,
  lambda = 5,
  rho = 0.3,
  datasetVar = "orig.ident",
  layer = "ligerScaleData",
  assay = NULL,
  reduction = "cinmf",
  nIteration = 30,
  nRandomStarts = 10,
  HInit = NULL,
  WInit = NULL,
  VInit = NULL,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

Arguments

object

A liger object or a Seurat object with non-negative scaled data of variable features (done with scaleNotCenter).

k

Inner dimension of factorization (number of factors). Generally, a higher k will be needed for datasets with more sub-structure. Default 20.

lambda

Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). Default 5.

rho

Numeric value between 0 and 1. Fraction for determining the number of nearest neighbors to look at for consensus (by rho * nRandomStarts). Default 0.3.

...

Arguments passed to methods.

nIteration

Total number of block coordinate descent iterations to perform. Default 30.

nRandomStarts

Number of replicate runs for creating the pool of factorization results. Default 10.

HInit

Initial values to use for H matrices. A list object where each element is the initial H matrix of each dataset. Default NULL.

WInit

Initial values to use for W matrix. A matrix object. Default NULL.

VInit

Initial values to use for V matrices. A list object where each element is the initial V matrix of each dataset. Default NULL.

seed

Random seed to allow reproducible results. Default 1.

nCores

The number of parallel tasks to speed up the computation. Default 2L. Only supported on platforms with OpenMP support.

verbose

Logical. Whether to show information of the progress. Default getOption("ligerVerbose") or TRUE if users have not set it.

datasetVar

Metadata variable name that stores the dataset source annotation. Default "orig.ident".

layer

For Seurat>=4.9.9, the name of the layer to retrieve the input non-negative scaled data from. Default "ligerScaleData". For older Seurat, always retrieve from the scale.data slot.

assay

Name of assay to use. Default NULL uses the current active assay.

reduction

Name of the reduction to store the result. Also used as the feature key. Default "cinmf".

Value

liger method - Returns updated input liger object
- A list of all H matrices can be accessed with getMatrix(object, "H")
- A list of all V matrices can be accessed with getMatrix(object, "V")
- The W matrix can be accessed with getMatrix(object, "W")
Seurat method - Returns updated input Seurat object
- H matrices for all datasets will be concatenated and transposed (all cells by k), and form a DimReduc object in the reductions slot named by argument reduction.
- W matrix will be presented as feature.loadings in the same DimReduc object.
- V matrices, an objective error value and the dataset variable used for the factorization are currently stored in the misc slot of the same DimReduc object.

References

Joshua D. Welch et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, 2019

Dylan Kotliar et al., Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-Seq, eLife, 2019

Examples

pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- runCINMF(pbmc)
}

SNN Graph Based Community Detection

Description

After aligning cell factor loadings, users can additionally run the Leiden or Louvain algorithm for community detection, which is widely used in single-cell analysis and excels at merging small clusters into broad cell classes.

While using aligned factor loadings (result from alignFactors) is recommended, this function looks for unaligned factor loadings (raw result from runIntegration) when the former is not available.
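The SNN graph used for community detection weights each pair of cells by the Jaccard index of their nearest-neighbour sets, and the prune argument below zeroes out weak edges. A minimal base-R sketch with a hand-written, hypothetical neighbour table (each row lists a cell's neighbours, including the cell itself by assumption); runCluster's own neighbour search and graph construction are not reproduced here.

```r
# Hypothetical kNN table: row i lists the neighbours of cell i (itself included).
knn <- rbind(
  c(1, 2, 3),
  c(2, 1, 3),
  c(3, 1, 2),
  c(4, 5, 1),
  c(5, 4, 1)
)

# Jaccard-weighted SNN graph; edges at or below `prune` are set to 0.
buildSNN <- function(knn, prune = 1/15) {
  n <- nrow(knn)
  snn <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      shared <- length(intersect(knn[i, ], knn[j, ]))
      total <- length(union(knn[i, ], knn[j, ]))
      snn[i, j] <- shared / total
    }
  }
  snn[snn <= prune] <- 0
  snn
}

snn <- buildSNN(knn)                   # cross-group edges (0.2) survive 1/15
snnStrict <- buildSNN(knn, prune = 0.25) # cross-group edges are pruned
```

With the stricter cutoff, cells 1 to 3 and cells 4 to 5 become two disconnected components, which is how a higher prune value sharpens community boundaries.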

Usage

runCluster(
  object,
  resolution = 1,
  nNeighbors = 20,
  prune = 1/15,
  eps = 0.1,
  nRandomStarts = 10,
  nIterations = 5,
  method = c("leiden", "louvain"),
  useRaw = NULL,
  useDims = NULL,
  groupSingletons = TRUE,
  saveSNN = FALSE,
  clusterName = paste0(method, "_cluster"),
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE)
)

Arguments

object

A liger object. Should have a valid factorization result available.

resolution

Numeric, value of the resolution parameter; a larger value results in a larger number of communities with smaller sizes. Default 1.0.

nNeighbors

Integer, the maximum number of nearest neighbors to compute. Default 20.

prune

Numeric. Sets the cutoff for an acceptable Jaccard index when computing the neighborhood overlap for the SNN construction. Any edges with values less than or equal to this will be set to 0 and removed from the SNN graph. Essentially sets the stringency of pruning: 0 for no pruning, while 1 prunes everything. Default 1/15.

eps

Numeric, the error bound of the nearest neighbor search. Default 0.1.

nRandomStarts

Integer number of random starts. The membership with the highest quality will be returned. Default 10.

nIterations

Integer, maximal number of iterations per random start. Default 5.

method

Community detection algorithm to use. Choose from "leiden" or "louvain". Default "leiden".

useRaw

Whether to use un-aligned cell factor loadings (H matrices). Default NULL searches for quantile-normalized loadings first and for un-aligned loadings next.

useDims

Indices of factors to use for clustering. Default NULL uses all available factors.

groupSingletons

Whether to group single cells that make up their own cluster in with the cluster they are most connected to. Default TRUE; if FALSE, assign all singletons to a "singleton" group.

saveSNN

Logical, whether to store the SNN graph, as a dgCMatrix object, in the object. Default FALSE.

clusterName

Name of the variable that will store the clustering result in the cellMeta slot of object. Default "leiden_cluster" or "louvain_cluster", depending on method.

seed

Seed of the random number generator. Default 1.

verbose

Logical. Whether to show information of the progress. Default getOption("ligerVerbose") or TRUE if users have not set it.

Value

object with cluster assignment updated in the clusterName variable in the cellMeta slot. Can be fetched with object[[clusterName]]. If saveSNN = TRUE, the SNN graph will be stored at object@uns$snn.

Examples

pbmcPlot <- runCluster(pbmcPlot, nRandomStarts = 1)
head(pbmcPlot$leiden_cluster)
pbmcPlot <- runCluster(pbmcPlot, method = "louvain")
head(pbmcPlot$louvain_cluster)

Run Gene Ontology enrichment analysis on differentially expressed genes.

Description

This function forms gene sets based on the differential expression result, and calls the gene ontology (GO) analysis method provided by gprofiler2.

Usage

runGOEnrich(
  result,
  group = NULL,
  useBg = TRUE,
  orderBy = NULL,
  logFCThresh = 1,
  padjThresh = 0.05,
  splitReg = FALSE,
  ...
)

Arguments

result

Data frame of unfiltered output from runMarkerDEG or runPairwiseDEG.

group

Selection of one group available from result$group. Default NULL uses all groups involved in the DE result table.

useBg

Logical, whether to set all genes involved in the DE analysis (before threshold filtering) as the domain background of the GO analysis. Default TRUE. Otherwise use all annotated genes from the gprofiler2 database.

orderBy

Name of the DE statistics metric to order the gene list for each group. Choose from "logFC", "pval" or "padj" to enable ranked mode. Default NULL uses two-list mode.

logFCThresh

The absolute log2FC threshold above which the genes will be used. Default 1.

padjThresh

The adjusted p-value threshold below which the genes will be used. Default 0.05.

splitReg

Whether to query both up-regulated and down-regulated genes for each group. Default FALSE only queries up-regulated genes and should be preferred when result comes from a marker detection test. When result comes from a group-to-group DE test, it is recommended to set splitReg = TRUE.

...

Additional arguments passed to gprofiler2::gost(). Useful ones are:

organism: The organism to be used for the analysis. "hsapiens" for human, "mmusculus" for mouse.
evcodes: Whether to include overlapping genes for each term. Default FALSE.
significant: Whether to filter out non-significant terms. Default TRUE.

Arguments query, custom_bg, domain_scope, and ordered_query are pre-specified by this wrapper function.

Details

GO term enrichment tests often come in two modes: two-list mode and ranked mode.

Two-list mode comes with a query gene set and a background gene set. The query gene set contains the filtered DEGs in this analysis. The background can be all the genes involved in the DEG test (default, useBg = TRUE), or all annotated genes in the gprofiler2 database (useBg = FALSE).

Ranked mode comes with only one query gene set, which is sorted. It should contain all the domain background genes, with significant genes coming first. Set orderBy to one of the DE statistics metrics to enable this mode. useBg will be ignored in this mode.
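The two modes differ only in how the query is built from the DE table. A base-R sketch on a made-up table; the column names feature, group, logFC and padj are assumptions based on the orderBy choices above, and the actual call to gprofiler2::gost() is omitted.

```r
# Hypothetical DE table for one group; column names are assumed.
de <- data.frame(
  feature = c("G1", "G2", "G3", "G4", "G5"),
  group = "tcell",
  logFC = c(2.5, -1.8, 0.4, 1.2, -0.2),
  padj = c(1e-7, 1e-4, 0.3, 0.01, 0.9),
  stringsAsFactors = FALSE
)
logFCThresh <- 1
padjThresh <- 0.05

# Two-list mode: the query holds the filtered DEGs; with useBg = TRUE the
# background is every gene that entered the DE test.
up <- de$feature[de$logFC > logFCThresh & de$padj < padjThresh]
down <- de$feature[de$logFC < -logFCThresh & de$padj < padjThresh]  # splitReg = TRUE
background <- unique(de$feature)

# Ranked mode (orderBy = "padj"): every gene, most significant first.
ranked <- de$feature[order(de$padj)]
```

In two-list mode the background matters because enrichment is judged relative to it; in ranked mode the ordering itself carries that information, which is why useBg is ignored there.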

Value

A list object where each element is a result list for a group. Each result list contains two elements:

result

data.frame of main GO analysis result.

meta

Meta information for the query.

See gprofiler2::gost() for a detailed explanation.

References

Kolberg, L. et al., 2020 and Raudvere, U. et al., 2019

Examples

if (requireNamespace("gprofiler2", quietly = TRUE)) {
    go <- runGOEnrich(deg.pw)
}

Analyze biological interpretations of metagene

Description

Identify the biological pathways (gene sets from Reactome) that each metagene (factor) might belong to.

Usage

runGSEA(
  object,
  genesets = NULL,
  useW = TRUE,
  useV = NULL,
  customGenesets = NULL,
  gene_sets = genesets,
  mat_w = useW,
  mat_v = useV,
  custom_gene_sets = customGenesets
)

Arguments

object

A liger object with a valid factorization result.

genesets

Character vector of the Reactome gene set names to be tested. Default NULL uses all the gene sets from Reactome.

useW

Logical, whether to use the shared factor loadings (W). Default TRUE.

useV

A character vector of the names, a numeric or logical vector of the index of the datasets where the V matrices will be included for analysis. Default NULL uses all datasets.

customGenesets

A named list of character vectors of Entrez gene IDs. Default NULL uses all the gene symbols from the input matrix.

gene_sets,mat_w,mat_v,custom_gene_sets

Deprecated. See the Usage section for replacements.

Value

A list of matrices with GSEA analysis for each factor

Examples

if (requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
    requireNamespace("reactome.db", quietly = TRUE) &&
    requireNamespace("fgsea", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
    runGSEA(pbmcPlot)
}

General QC for liger object

Description

Calculate the number of UMIs, the number of detected features and the percentage of feature subset (e.g. mito, ribo and hemo) expression per cell.

Usage

runGeneralQC(
  object,
  organism,
  features = NULL,
  pattern = NULL,
  overwrite = FALSE,
  useDatasets = NULL,
  chunkSize = getOption("ligerChunkSize", 20000),
  verbose = getOption("ligerVerbose", TRUE),
  mito = NULL,
  ribo = NULL,
  hemo = NULL
)

Arguments

object

liger object with rawData available in each embedded ligerDataset.

organism

Specify the organism of the dataset to identify the mitochondrial, ribosomal and hemoglobin genes. Available options are "mouse", "human", "zebrafish", "rat" and "drosophila". Set NULL to disable mito, ribo and hemo calculation.

features

Feature names matching the feature subsets that users want to calculate the expression percentage with. A vector for a single subset, or a named list for multiple subsets. Default NULL.

pattern

Regex patterns for matching the feature subsets that users want to calculate the expression percentage with. A vector for a single subset, or a named list for multiple subsets. Default NULL.

overwrite

Whether to overwrite existing QC metric variables. Default FALSE does not update existing results. Use TRUE to update all. Use a character vector to specify which to update. See Details.

useDatasets

A character vector of the names, or a numeric or logical vector of the indices, of the datasets to be included for QC. Default NULL performs QC on all datasets.

chunkSize

Integer number of cells to include in a chunk when working on HDF5-based datasets. Default 20000.

verbose

Logical. Whether to show progress information. Default getOption("ligerVerbose"), or TRUE if users have not set it.

mito,ribo,hemo

The percentages of mitochondrial, ribosomal and hemoglobin gene counts are now always computed, so these arguments will be ignored.

Details

This function by default calculates:

nUMI - The column sum of the raw data matrix per cell. Represents the total number of UMIs per cell if given raw counts.
nGene - Number of detected features per cell
mito - Percentage of mitochondrial gene expression per cell
ribo - Percentage of ribosomal gene expression per cell
hemo - Percentage of hemoglobin gene expression per cell

Users can also specify their own feature subsets with argument features, or regular expression patterns that match genes of interest with argument pattern, to calculate the expression percentage. If a character vector is given to features, a QC metric variable named "featureSubset_name" will be computed. If a named list of multiple subsets is given, the names will be used as the variable names. If a single pattern is given to pattern, a QC metric variable named "featureSubset_pattern" will be computed. If a named list of multiple patterns is given, the names will be used as the variable names. QC metric names that duplicate each other between these two arguments, or that duplicate the five defaults listed above, should be avoided.

This function is automatically run when each liger object is created, to capture the raw status. Argument overwrite is set to FALSE by default to avoid mistakenly updating existing metrics after filtering the object. Users can still opt to update all newly calculated metrics (including the default five) by setting overwrite = TRUE, or only some of them by providing a character vector of the names of the metrics to update. Overwriting only happens to datasets selected with useDatasets.
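The default metrics can be sketched in base R on a toy count matrix. This is not rliger's code. The gene-name patterns "^MT-", "^RP[SL]" and "^HB[AB]" are assumptions based on common human gene naming, and rliger's own organism-specific gene lists may differ.

```r
# Toy raw count matrix (hypothetical): genes in rows, cells in columns.
counts <- matrix(
  c(5, 0, 3,
    2, 1, 0,
    0, 4, 6,
    1, 0, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-CO1", "RPL3", "GAPDH", "HBB"),
                  c("cell1", "cell2", "cell3"))
)

# Percentage of each cell's total counts coming from features matching a regex.
subsetPercent <- function(counts, pattern) {
  hit <- grepl(pattern, rownames(counts))
  100 * colSums(counts[hit, , drop = FALSE]) / colSums(counts)
}

qc <- data.frame(
  nUMI  = colSums(counts),                  # total counts per cell
  nGene = colSums(counts > 0),              # detected features per cell
  mito  = subsetPercent(counts, "^MT-"),    # assumed mitochondrial pattern
  ribo  = subsetPercent(counts, "^RP[SL]"), # assumed ribosomal pattern
  hemo  = subsetPercent(counts, "^HB[AB]")  # assumed hemoglobin pattern
)
print(qc)
```

A user-defined subset passed through pattern works the same way. Only the regex and the resulting column name change.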

Value

Updated object with cellMeta(object) updated as intended by users. See Details for more information.

Examples

pbmc <- runGeneralQC(pbmc, "human", overwrite = TRUE)

Perform iNMF on scaled datasets

Description

Performs integrative non-negative matrix factorization (iNMF) (J.D. Welch, 2019) using block coordinate descent (alternating non-negative least squares, ANLS) to return factorized H, W, and V matrices. The objective function is stated as

\arg\min_{H_i\ge 0,W\ge 0,V_i\ge 0}\sum_{i=1}^{d}\left\|E_i-(W+V_i)H_i\right\|_F^2+\lambda\sum_{i=1}^{d}\left\|V_iH_i\right\|_F^2

where E_i is the input non-negative matrix of the i'th dataset and d is the total number of datasets. E_i is of size m \times n_i for m variable genes and n_i cells, H_i is of size k \times n_i, V_i is of size m \times k, and W is of size m \times k.

The factorization produces a shared W matrix (genes by k), and for each dataset, an H matrix (k by cells) and a V matrix (genes by k). The H matrices represent the cell factor loadings. W is held consistent among all datasets, as it represents the shared components of the metagenes across datasets. The V matrices represent the dataset-specific components of the metagenes.
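The following base R sketch evaluates the objective for given factor matrices, which makes its two terms concrete. It is an illustration only. The function name inmfObjective is hypothetical, and the code is not the optimized solver used by runINMF. It sums the squared Frobenius reconstruction error and the lambda-weighted penalty over datasets.

```r
# Evaluate the iNMF objective (a checking aid, not a solver).
# E: list of m x n_i matrices; W: m x k; V: list of m x k; H: list of k x n_i.
inmfObjective <- function(E, W, V, H, lambda) {
  total <- 0
  for (i in seq_along(E)) {
    recon    <- (W + V[[i]]) %*% H[[i]]  # (W + V_i) H_i, size m x n_i
    specific <- V[[i]] %*% H[[i]]        # dataset-specific part V_i H_i
    total <- total + sum((E[[i]] - recon)^2) + lambda * sum(specific^2)
  }
  total
}

set.seed(1)
m <- 10; k <- 3; n <- c(8, 12)  # 10 genes, 3 factors, two datasets
W <- matrix(runif(m * k), m, k)
V <- lapply(seq_along(n), function(i) matrix(runif(m * k), m, k))
H <- lapply(n, function(ni) matrix(runif(k * ni), k, ni))
# Data that the factors reproduce exactly, so only the penalty term remains.
E <- Map(function(Vi, Hi) (W + Vi) %*% Hi, V, H)
obj <- inmfObjective(E, W, V, H, lambda = 5)
```

E is built from the factors themselves, so the reconstruction term is zero and obj equals the penalty alone. A larger lambda makes large V_i H_i more costly. This is how the penalty pushes dataset-specific effects down.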

This function adopts a highly optimized, fast and memory-efficient implementation extended from Planc (Kannan, 2016). Pre-installation of the extension package RcppPlanc is required. The underlying algorithm adopts the same ANLS strategy as optimizeALS in the old version of LIGER.

Usage

runINMF(object, k = 20, lambda = 5, ...)

## S3 method for class 'liger'
runINMF(
  object,
  k = 20,
  lambda = 5,
  nIteration = 30,
  nRandomStarts = 1,
  HInit = NULL,
  WInit = NULL,
  VInit = NULL,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'Seurat'
runINMF(
  object,
  k = 20,
  lambda = 5,
  datasetVar = "orig.ident",
  layer = "ligerScaleData",
  assay = NULL,
  reduction = "inmf",
  nIteration = 30,
  nRandomStarts = 1,
  HInit = NULL,
  WInit = NULL,
  VInit = NULL,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

Arguments

object

A liger object or a Seurat object with non-negative scaled data of variable features (done with scaleNotCenter).

k

Inner dimension of factorization (number of factors). Generally, a higher k will be needed for datasets with more sub-structure. Default 20.

lambda

Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). Default 5.

...

Arguments passed to methods.

nIteration

Total number of block coordinate descent iterations to perform. Default 30.

nRandomStarts

Number of restarts to perform (the iNMF objective function is non-convex, so taking the best objective from multiple successive initializations is recommended). For easier reproducibility, the random seed is incremented by 1 for each consecutive restart, so future factorization of the same dataset can be run with one rep if necessary. Default 1.

HInit

Initial values to use for H matrices. A list object where each element is the initial H matrix of each dataset. Default NULL.

WInit

Initial values to use for W matrix. A matrix object. Default NULL.

VInit

Initial values to use for V matrices. A list object where each element is the initial V matrix of each dataset. Default NULL.

seed

Random seed to allow reproducible results. Default 1.

nCores

The number of parallel tasks to speed up the computation. Default 2L. Only supported on platforms with OpenMP support.

verbose

Logical. Whether to show progress information. Default getOption("ligerVerbose"), or TRUE if users have not set it.

datasetVar

Metadata variable name that stores the dataset source annotation. Default "orig.ident".

layer

For Seurat>=4.9.9, the name of the layer from which to retrieve the input non-negative scaled data. Default "ligerScaleData". For older Seurat, the data is always retrieved from the scale.data slot.

assay

Name of assay to use. Default NULL uses the current active assay.

reduction

Name of the reduction to store the result. Also used as the feature key. Default "inmf".

Value

liger method - Returns updated input liger object
- A list of all H matrices can be accessed with getMatrix(object, "H")
- A list of all V matrices can be accessed with getMatrix(object, "V")
- The W matrix can be accessed with getMatrix(object, "W")
Seurat method - Returns updated input Seurat object
- H matrices for all datasets will be concatenated and transposed (all cells by k), and form a DimReduc object in the reductions slot named by argument reduction.
- W matrix will be presented as feature.loadings in the same DimReduc object.
- V matrices, an objective error value and the dataset variable used for the factorization are currently stored in the misc slot of the same DimReduc object.

Difference from optimizeALS()

In the old implementation, the objective error was computed at the end of each iteration and used, with argument thresh, to check whether the algorithm had converged. Since computing the objective error is expensive, this check has been removed. The function now directly runs a default of 30 (nIteration) iterations, which empirically leads to convergence most of the time. Given that the new version is highly optimized, running this many iterations should be acceptable.

References

Joshua D. Welch et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, 2019

Examples

pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- runINMF(pbmc)
}

Integrate scaled datasets with iNMF or variant methods

Description

LIGER provides dataset integration methods based on iNMF (integrative Non-negative Matrix Factorization [1]) and its variants (online iNMF [2] and UINMF [3]). This function wraps runINMF, runOnlineINMF and runUINMF, whose help pages give more detailed descriptions.

Usage

runIntegration(
  object,
  k = 20,
  lambda = 5,
  method = c("iNMF", "onlineINMF", "UINMF"),
  ...
)

## S3 method for class 'liger'
runIntegration(
  object,
  k = 20,
  lambda = 5,
  method = c("iNMF", "onlineINMF", "UINMF"),
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'Seurat'
runIntegration(
  object,
  k = 20,
  lambda = 5,
  method = c("iNMF", "onlineINMF"),
  datasetVar = "orig.ident",
  useLayer = "ligerScaleData",
  assay = NULL,
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

Arguments

object

A liger object or a Seurat object with non-negative scaled data of variable features (done with scaleNotCenter).

k

Inner dimension of factorization (number of factors). Generally, a higher k will be needed for datasets with more sub-structure. Default 20.

lambda

Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). Default 5.

method

iNMF variant algorithm to use for integration. Choose from "iNMF", "onlineINMF" or "UINMF". Default "iNMF".

...

Arguments passed to other methods and wrapped functions.

seed

Random seed to allow reproducible results. Default 1.

verbose

Logical. Whether to show progress information. Default getOption("ligerVerbose"), or TRUE if users have not set it.

datasetVar

Metadata variable name that stores the dataset source annotation. Default "orig.ident".

useLayer

For Seurat>=4.9.9, the name of the layer from which to retrieve the input non-negative scaled data. Default "ligerScaleData". For older Seurat, the data is always retrieved from the scale.data slot.

assay

Name of assay to use. Default NULL uses the current active assay.

Value

Updated input object. For details, please refer to the method linked in Description.

References

[1] Joshua D. Welch et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, 2019
[2] Chao Gao et al., Iterative single-cell multi-omic integration using online learning, Nat Biotechnol., 2021
[3] April R. Kriebel and Joshua D. Welch, UINMF performs mosaic integration of single-cell multi-omic datasets using nonnegative matrix factorization, Nat. Comm., 2022

Examples

pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- runIntegration(pbmc)
}

Perform online iNMF on scaled datasets

Description

Performs online integrative non-negative matrix factorization to represent multiple single-cell datasets in terms of H, W, and V matrices. It optimizes the iNMF objective function (see runINMF) using online learning: non-negative least squares for the H matrices, and hierarchical alternating least squares (HALS) for the V matrices and W. The number of factors is set by k. The function allows online learning in 3 scenarios:

Fully observed datasets;
Iterative refinement using continually arriving datasets;
Projection of new datasets without updating the existing factorization

All three scenarios require fixed memory independent of the number of cells.

For each dataset, this factorization produces an H matrix (k by cells), a V matrix (genes by k), and a shared W matrix (genes by k). The H matrices represent the cell factor loadings. W is identical among all datasets, as it represents the shared components of the metagenes across datasets. The V matrices represent the dataset-specific components of the metagenes.

Usage

runOnlineINMF(object, k = 20, lambda = 5, ...)

## S3 method for class 'liger'
runOnlineINMF(
  object,
  k = 20,
  lambda = 5,
  newDatasets = NULL,
  projection = FALSE,
  maxEpochs = 5,
  HALSiter = 1,
  minibatchSize = 5000,
  HInit = NULL,
  WInit = NULL,
  VInit = NULL,
  AInit = NULL,
  BInit = NULL,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'Seurat'
runOnlineINMF(
  object,
  k = 20,
  lambda = 5,
  datasetVar = "orig.ident",
  layer = "ligerScaleData",
  assay = NULL,
  reduction = "onlineINMF",
  maxEpochs = 5,
  HALSiter = 1,
  minibatchSize = 5000,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

Arguments

object

liger object. Scaled data required.

k

Inner dimension of factorization (number of metagenes). A value in the range 20-50 works well for most analyses. Default 20.

lambda

Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). We recommend always using the default value, except possibly for analyses with relatively small differences (biological replicates, male/female comparisons, etc.). In those cases a lower value such as 1.0 may improve reconstruction quality. Default 5.0.

...

Arguments passed to other S3 methods of this function.

newDatasets

Named list of dgCMatrix-class objects. New datasets for scenario 2 or scenario 3. Default NULL triggers scenario 1.

projection

Whether to perform data integration with scenario 3 when newDatasets is specified. See Description. Default FALSE.

maxEpochs

The number of epochs to iterate through. See Details. Default 5.

HALSiter

Maximum number of block coordinate descent (HALS algorithm) iterations to perform for each update of W and V. Default 1. Changing this parameter is not recommended.

minibatchSize

Total number of cells in each minibatch. See Details. Default 5000.

HInit,WInit,VInit,AInit,BInit

Optional initialization for H, W, V, A, and B matrices, respectively. Must be provided all together. See Details. Default NULL.

seed

Random seed to allow reproducible results. Default 1.

nCores

The number of parallel tasks to speed up the computation. Default 2L. Only supported on platforms with OpenMP support.

verbose

Logical. Whether to show progress information. Default getOption("ligerVerbose"), or TRUE if users have not set it.

datasetVar

Metadata variable name that stores the dataset source annotation. Default "orig.ident".

layer

For Seurat>=4.9.9, the name of the layer from which to retrieve the input non-negative scaled data. Default "ligerScaleData". For older Seurat, the data is always retrieved from the scale.data slot.

assay

Name of assay to use. Default NULL uses the current active assay.

reduction

Name of the reduction to store the result. Also used as the feature key. Default "onlineINMF".

Details

Scenarios 2 and 3 require a complete set of factorization results from a run of scenario 1. Given the structure of a liger object, all of the required information can be retrieved automatically. For cases where users need customized information for the existing factorization, arguments HInit, WInit, VInit, AInit and BInit are exposed. The requirements for these arguments are as follows:

HInit - A list object of matrices, each of size k \times n_i. The number of matrices should match newDatasets.
WInit - A matrix object of size m \times k (see runINMF for notation).
VInit - A list object of matrices, each of size m \times k. The number of matrices should match newDatasets.
AInit - A list object of matrices, each of size k \times k. The number of matrices should match newDatasets.
BInit - A list object of matrices, each of size m \times k. The number of matrices should match newDatasets.
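The size requirements listed above can be checked with a small validation helper before calling runOnlineINMF. checkOnlineInit and hasDim are hypothetical helpers and not part of rliger. They only encode the documented sizes, with m shared variable features, k factors and one entry of nCells per new dataset.

```r
# Hypothetical helpers (not part of rliger) encoding the documented sizes.
hasDim <- function(x, d) isTRUE(all.equal(dim(x), d))

checkOnlineInit <- function(HInit, WInit, VInit, AInit, BInit, m, k, nCells) {
  stopifnot(hasDim(WInit, c(m, k)))                 # W: m x k
  for (lst in list(HInit, VInit, AInit, BInit)) {
    stopifnot(is.list(lst), length(lst) == length(nCells))
  }
  for (i in seq_along(nCells)) {
    stopifnot(hasDim(HInit[[i]], c(k, nCells[[i]])), # H_i: k x n_i
              hasDim(VInit[[i]], c(m, k)),           # V_i: m x k
              hasDim(AInit[[i]], c(k, k)),           # A_i: k x k
              hasDim(BInit[[i]], c(m, k)))           # B_i: m x k
  }
  invisible(TRUE)
}

# Usage with one hypothetical new dataset of 30 cells.
m <- 50; k <- 5
ok <- checkOnlineInit(
  HInit = list(matrix(0, k, 30)), WInit = matrix(0, m, k),
  VInit = list(matrix(0, m, k)), AInit = list(matrix(0, k, k)),
  BInit = list(matrix(0, m, k)), m = m, k = k, nCells = 30
)
```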

Minibatch iterations are performed on small subsets of cells. The exact minibatch size applied to each dataset is minibatchSize multiplied by the proportion of cells in that dataset out of all cells. In general, minibatchSize should be no larger than the number of cells in the smallest dataset (considering both object and newDatasets). Therefore, a smaller value may be necessary for analyzing very small datasets.

An epoch is one complete pass over all cells, made up of a number of minibatch iterations. Therefore, the total number of iterations is determined by maxEpochs, the total number of cells, and minibatchSize.
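The minibatch arithmetic can be worked through with hypothetical dataset sizes. The per-dataset share follows the rule stated above. The ceiling() rounding for iterations per epoch is an assumption, and rliger's internal rounding may differ.

```r
# Hypothetical dataset sizes and settings, for illustration only.
nCells <- c(ctrl = 6000, stim = 14000)
minibatchSize <- 5000
maxEpochs <- 5

# Guideline above: minibatchSize should not exceed the smallest dataset.
if (minibatchSize > min(nCells)) {
  warning("minibatchSize is larger than the smallest dataset")
}

# Each dataset contributes cells in proportion to its share of all cells.
perDataset <- minibatchSize * nCells / sum(nCells)

# Approximate iterations per epoch and in total (ceiling() is an assumption).
iterPerEpoch <- ceiling(sum(nCells) / minibatchSize)
totalIter <- maxEpochs * iterPerEpoch
```

With these numbers, each minibatch draws 1500 ctrl cells and 3500 stim cells. One epoch takes about 4 minibatch iterations, so 5 epochs take roughly 20.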

Currently, the Seurat S3 method does not support scenarios 2 and 3, because there is no simple solution for organizing a number of miscellaneous matrices within a single Seurat object. We strongly recommend that users create a liger object, which has the specific structure.

Value

liger method - Returns updated input liger object.
- A list of all H matrices can be accessed with getMatrix(object, "H")
- A list of all V matrices can be accessed with getMatrix(object, "V")
- The W matrix can be accessed with getMatrix(object, "W")
- Meanwhile, the intermediate matrices A and B produced in the HALS update can also be accessed similarly.
Seurat method - Returns updated input Seurat object.
- H matrices for all datasets will be concatenated and transposed (all cells by k), and form a DimReduc object in the reductions slot named by argument reduction.
- W matrix will be presented as feature.loadings in the same DimReduc object.
- V matrices, A matrices, B matrices, an objective error value and the dataset variable used for the factorization are currently stored in the misc slot of the same DimReduc object.

References

Chao Gao et al., Iterative single-cell multi-omic integration using online learning, Nat Biotechnol., 2021

Examples

pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    # Scenario 1
    pbmc <- runOnlineINMF(pbmc, minibatchSize = 200)
    # Scenario 2
    # Fake new dataset by increasing all non-zero value in "ctrl" by 1
    ctrl2 <- rawData(dataset(pbmc, "ctrl"))
    ctrl2@x <- ctrl2@x + 1
    colnames(ctrl2) <- paste0(colnames(ctrl2), 2)
    pbmc2 <- runOnlineINMF(pbmc, k = 20, newDatasets = list(ctrl2 = ctrl2),
                           minibatchSize = 100)
    # Scenario 3
    pbmc3 <- runOnlineINMF(pbmc, k = 20, newDatasets = list(ctrl2 = ctrl2),
                           projection = TRUE)
}

Find DEG between groups

Description

Two methods are supported: "pseudoBulk" and "wilcoxon". The pseudo-bulk method aggregates cells based on biological replicates and calls a bulk RNA-seq DE method (the DESeq2 Wald test). The Wilcoxon rank-sum test is performed at the single-cell level. runPairwiseDEG() is generally used for flexibly comparing two specific groups of cells, while runMarkerDEG() is used for a one-vs-rest marker test strategy.

When using the pseudo-bulk method, it is generally recommended that you have these variables available in your object:

The cell type or cluster labeling. This can be obtained from a prior study or computed with runCluster.
The biological replicate labeling, most of the time the "dataset" variable automatically generated when the liger object is created. Users may use other variables if a "dataset" is merged from multiple replicates.
The condition labeling that reflects the study design, such as the treatment or disease status for each sample/dataset.

Please see below for detailed scenarios.

Usage

runPairwiseDEG(
  object,
  groupTest,
  groupCtrl,
  variable1 = NULL,
  variable2 = NULL,
  splitBy = NULL,
  method = c("pseudoBulk", "wilcoxon"),
  usePeak = FALSE,
  useReplicate = "dataset",
  nPsdRep = NULL,
  minCellPerRep = 3,
  printDiagnostic = FALSE,
  chunk = NULL,
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE)
)

runMarkerDEG(
  object,
  conditionBy = NULL,
  splitBy = NULL,
  method = c("pseudoBulk", "wilcoxon"),
  useDatasets = NULL,
  usePeak = FALSE,
  useReplicate = "dataset",
  nPsdRep = NULL,
  minCellPerRep = 3,
  printDiagnostic = FALSE,
  chunk = NULL,
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE)
)

runWilcoxon(
  object,
  data.use = NULL,
  compare.method = c("clusters", "datasets")
)

Arguments

object

A liger object, with normalized data available.

groupTest,groupCtrl,variable1,variable2

Condition specification. See the section "Detailed runPairwiseDEG usage" for details.

splitBy

Name(s) of the variable(s) in cellMeta to split the comparison. See Details. Default NULL.

method

DEG test method to use. Choose from "pseudoBulk" or "wilcoxon". Default "pseudoBulk".

usePeak

Logical. Whether to use peak counts instead of gene counts. Only supported when ATAC datasets are involved. Default FALSE.

useReplicate

cellMeta variable of biological replicate annotation. Only used with method = "pseudoBulk". Default "dataset".

nPsdRep

Number of pseudo-replicates to create. Only used when method = "pseudoBulk". Default NULL. See section Pseudo-Replicate.

minCellPerRep

Numeric. A pseudo-bulk will not be made for replicates with fewer than this number of cells. Default 3.

printDiagnostic

Logical. Whether to show more detail when verbose = TRUE. Default FALSE.

chunk

Number of features to process at a time during the Wilcoxon test. Useful when memory is limited. Default NULL processes all features at once.

seed

Random seed to use for pseudo-replicate generation. Default 1.

verbose

Logical. Whether to show progress information. Default getOption("ligerVerbose"), or TRUE if users have not set it.

conditionBy

cellMeta variable(s). Marker detection will be performed for each level of this variable. Multiple variables will be combined. Default NULL uses the default cluster.

useDatasets

Datasets to perform marker detection within. Default NULL uses all datasets.

data.use

Same as useDatasets.

compare.method

Choose from "clusters" (default) or "datasets". "clusters" compares each cluster against all other cells, while "datasets" runs within each cluster and compares each dataset against all other datasets.

Value

A data.frame with DEG information, containing all or some of the following fields:

feature

Gene names

group

Test group name. Multiple tests might be present for each function call, and this is the main variable for distinguishing them. For a pairwise test, a row with a certain group name represents the test result between this group and the control group. When split by a variable, the group is presented in "split.group" format, meaning the stats come from comparing the group in that split level against the control group in the same split level. When running marker detection without splitting, a row with group "a" represents the stats of the gene in group "a" against all other cells. When running split marker detection, the group name is in "split.group" format, meaning the stats come from comparing the group in the split level against all other cells in the same split level.

logFC

Log fold change

pval

P-value

padj

Adjusted p-value

avgExpr

Mean expression in the test group indicated by the "group" field. Only available for Wilcoxon tests.

statistic

Wilcoxon rank-sum test statistic. Only available for Wilcoxon tests.

auc

Area under the ROC curve. Only available for Wilcoxon tests.

pct_in

Percentage of cells in the test group, indicated by the "group" field, that express the feature. Only available for Wilcoxon tests.

pct_out

Percentage of cells in the control group or other cells, as explained for the "group" field, that express the feature. Only available for Wilcoxon tests.

Using Wilcoxon rank-sum test

The Wilcoxon rank-sum test works on each gene separately and is based on the ranks of its expression values across cells. LIGER provides dataset integration but does not "correct" the expression values. Projects with strong batch effects, or projects that integrate drastically different modalities, should use this method with caution.
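The per-gene test and the auc, pct_in and pct_out fields listed under Value can be illustrated with stats::wilcox.test run one gene at a time on a toy matrix. This sketch is not rliger's optimized implementation. The function name wilcoxOneVsRest and the toy data are hypothetical. The AUC is taken as the Mann-Whitney U statistic divided by the product of the two group sizes.

```r
# Minimal one-vs-rest Wilcoxon sketch (not rliger's implementation).
# expr: genes x cells matrix with row names; inGroup: logical vector over cells.
wilcoxOneVsRest <- function(expr, inGroup) {
  res <- lapply(rownames(expr), function(g) {
    x <- expr[g, inGroup]
    y <- expr[g, !inGroup]
    tst <- wilcox.test(x, y, exact = FALSE)  # normal approximation handles ties
    U <- unname(tst$statistic)               # Mann-Whitney U for the test group
    data.frame(
      feature   = g,
      statistic = U,
      auc       = U / (length(x) * length(y)),
      pval      = tst$p.value,
      pct_in    = 100 * mean(x > 0),         # % of test cells expressing g
      pct_out   = 100 * mean(y > 0)          # % of other cells expressing g
    )
  })
  out <- do.call(rbind, res)
  out$padj <- p.adjust(out$pval, method = "BH")
  out
}

expr <- rbind(geneA = c(5, 6, 7, 0, 1, 2),
              geneB = c(0, 0, 1, 1, 2, 3))
inGroup <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
res <- wilcoxOneVsRest(expr, inGroup)
```

geneA is higher in every test cell than in every other cell, so its AUC is 1. geneB is mostly lower in the test group, so its AUC is close to 0.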

Comparing difference between/across cell types

After clustering, users usually want to know which cell type each cluster represents. This can be done with a marker detection method that tests each cluster against all other cells, using a command like runMarkerDEG(object, conditionBy = "cluster_var"). When using the default pseudo-bulk method, users should additionally determine the pseudo-bulk setup parameters. If a real biological replicate variable is available, it should be supplied to argument useReplicate. Otherwise, pseudo-replicates should be created. See the "Pseudo-Replicate" section for more.

Compare between conditions

It is frequently necessary to identify differences between conditions. Users can simply set conditionBy = "condition_var". However, such comparisons should ideally be done per cluster in most cases. This can be done by setting splitBy = "cluster_var", which loops over the clusters and, within each cluster, compares each condition against all other cells in that cluster.

When users only need to compare two conditions within each cluster, running runPairwiseDEG(object, groupTest = "condition1", groupCtrl = "condition2", variable1 = "condition_var", splitBy = "cluster_var") addresses the need.

For both use cases, if the pseudo-bulk (default) method is used, users should determine the pseudo-bulk setup parameters as mentioned in the previous section.

Detailed `runMarkerDEG` usage

Marker detection is performed in a one vs. rest manner. The grouping is specified by conditionBy, which should be a column name in cellMeta. When splitBy is specified as another variable name in cellMeta, marker detection is iteratively done within each level of the splitBy variable.

For example, when conditionBy = "celltype" and splitBy = NULL, marker detection is performed by comparing all cells of "celltype_i" against all other cells, and so on. This is analogous to running runWilcoxon(compare.method = "clusters") in the old version.

When conditionBy = "gender" and splitBy = "leiden_cluster", marker detection is performed by comparing "gender_i" cells from "cluster_j" against other cells from "cluster_j", and so on. This is analogous to running runWilcoxon(compare.method = "datasets") in the old version.
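The layout of one-vs-rest comparisons with and without splitBy can be sketched as follows. markerComparisons is a hypothetical helper, not an rliger function. It only builds the test and control cell indices for each comparison and leaves out the DE test itself. Split comparisons are named in the "split.group" format described under Value.

```r
# Hypothetical sketch: build one-vs-rest cell index sets per comparison.
markerComparisons <- function(meta, conditionBy, splitBy = NULL) {
  splits <- if (is.null(splitBy)) rep("all", nrow(meta)) else as.character(meta[[splitBy]])
  groups <- as.character(meta[[conditionBy]])
  comps <- list()
  for (s in unique(splits)) {
    inSplit <- splits == s
    for (grp in unique(groups[inSplit])) {
      test <- inSplit & groups == grp
      name <- if (is.null(splitBy)) grp else paste(s, grp, sep = ".")
      # One group against all other cells in the same split level.
      comps[[name]] <- list(test = which(test), ctrl = which(inSplit & !test))
    }
  }
  comps
}

meta <- data.frame(
  gender = c("F", "M", "F", "M", "F", "M"),
  leiden_cluster = c("0", "0", "0", "1", "1", "1")
)
split <- markerComparisons(meta, "gender", splitBy = "leiden_cluster")
noSplit <- markerComparisons(meta, "leiden_cluster")
names(split)
```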

Detailed `runPairwiseDEG` usage

Users can select classes of cells from variables in cellMeta. variable1 and variable2 specify columns in cellMeta, and groupTest and groupCtrl specify existing classes from variable1 and variable2, respectively. When variable2 is missing, groupCtrl is taken from variable1.

For example, when variable1 = "celltype" and variable2 = NULL, groupTest and groupCtrl should be valid cell types in object$celltype.

When variable1 is "celltype" and variable2 is "gender", groupTest should be a valid cell type from object$celltype and groupCtrl should be a valid class from object$gender.

When both variable1 and variable2 are missing, groupTest and groupCtrl should be valid indices of cells in object.
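The three cases above can be sketched as a small cell-selection helper. selectPairwiseCells is hypothetical and not part of rliger. It does not check that the requested classes exist. It also does not handle cells that fall into both groups, which is possible when variable1 and variable2 differ.

```r
# Hypothetical sketch of how groupTest/groupCtrl map to cell indices.
selectPairwiseCells <- function(meta, groupTest, groupCtrl,
                                variable1 = NULL, variable2 = NULL) {
  if (is.null(variable1) && is.null(variable2)) {
    # Both variables missing: the groups are already cell indices.
    return(list(test = groupTest, ctrl = groupCtrl))
  }
  # Missing variable2: the control class is also looked up in variable1.
  if (is.null(variable2)) variable2 <- variable1
  list(test = which(meta[[variable1]] %in% groupTest),
       ctrl = which(meta[[variable2]] %in% groupCtrl))
}

meta <- data.frame(celltype = c("T", "B", "T", "NK"),
                   gender   = c("F", "M", "M", "F"))
sel1 <- selectPairwiseCells(meta, "T", "B", variable1 = "celltype")
sel2 <- selectPairwiseCells(meta, "T", "F", variable1 = "celltype",
                            variable2 = "gender")
```

In sel2, cell 1 is both a "T" cell and "F", so it lands in both groups. A real implementation has to decide how to treat such overlaps.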

Pseudo-Replicate

Pseudo-replicate assignment is a technique to compensate for the lack of real biological replicates when using pseudo-bulk DE methods. LIGER's pseudo-bulk method generally requires that each comparison group has at least 3 replicates, each composed of at least 3 cells, to ensure statistical power. When fewer than 3 real replicates are found for a comparison, the default setting (nPsdRep = NULL) splits each into 3 pseudo-replicates. Otherwise, no pseudo-replicates are automatically generated. When nPsdRep is given a number, LIGER always goes through each comparison group and splits each real replicate into the given number of pseudo-replicates.
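A minimal base R sketch of the default rule follows, with counts then summed per pseudo-replicate. makePseudoReplicates and pseudoBulk are hypothetical names. The random, near-equal split is an assumption about how cells are divided. The bulk DE step that would follow (e.g. DESeq2) is not shown.

```r
# Hypothetical sketch: split each real replicate into pseudo-replicates.
makePseudoReplicates <- function(replicates, nPsdRep = NULL, seed = 1) {
  replicates <- as.character(replicates)
  reps <- unique(replicates)
  # Default rule: split only when fewer than 3 real replicates exist.
  if (is.null(nPsdRep)) nPsdRep <- if (length(reps) < 3) 3 else 1
  if (nPsdRep == 1) return(replicates)
  set.seed(seed)  # note: resets the global RNG state
  out <- replicates
  for (r in reps) {
    idx <- which(replicates == r)
    # Near-equal random split: recycle labels 1..nPsdRep, then shuffle them.
    part <- rep_len(seq_len(nPsdRep), length(idx))[sample.int(length(idx))]
    out[idx] <- paste0(r, ".", part)
  }
  out
}

# Sum raw counts per (pseudo-)replicate, dropping groups with too few cells.
pseudoBulk <- function(counts, groups, minCellPerRep = 3) {
  sizes <- table(groups)
  keep <- names(sizes)[sizes >= minCellPerRep]
  sapply(keep, function(g) rowSums(counts[, groups == g, drop = FALSE]))
}

repLabel <- rep(c("ctrl", "stim"), each = 6)  # only 2 real replicates
psd <- makePseudoReplicates(repLabel)         # 3 pseudo-replicates each
counts <- matrix(1, nrow = 2, ncol = 12)      # toy counts: 2 genes x 12 cells
pb <- pseudoBulk(counts, psd, minCellPerRep = 2)
```

Each real replicate here has only 6 cells, so each pseudo-replicate holds 2 cells. The default minCellPerRep = 3 would drop them all, so the toy example uses minCellPerRep = 2 only to keep the result non-empty.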

Examples

pbmc$leiden_cluster <- pbmcPlot$leiden_cluster
# Identify cluster markers
degStats1 <- runMarkerDEG(pbmc, conditionBy = "leiden_cluster")
# Compare "stim" data against "ctrl" data within each cluster
degStats3 <- runPairwiseDEG(pbmc, groupTest = "stim", groupCtrl = "ctrl",
                            variable1 = "dataset",
                            splitBy = "leiden_cluster",
                            minCellPerRep = 4)

Perform t-SNE dimensionality reduction

Description

Runs t-SNE on the aligned cell factors (result from alignFactors) or the unaligned cell factors (result from runIntegration) to generate a 2D embedding for visualization. By default, the Rtsne (Barnes-Hut implementation of t-SNE) method is invoked. The alternative "fftRtsne" method (FFT-accelerated Interpolation-based t-SNE, using the Kluger Lab implementation) is also supported. For very large datasets, method = "fftRtsne" is recommended due to its efficiency and scalability.

Extra external installation steps are required for using the "fftRtsne" method. Please consult the detailed guide.

Usage

runTSNE(
  object,
  useRaw = NULL,
  useDims = NULL,
  nDims = 2,
  usePCA = FALSE,
  perplexity = 30,
  theta = 0.5,
  method = c("Rtsne", "fftRtsne"),
  dimredName = "TSNE",
  asDefault = NULL,
  fitsnePath = NULL,
  seed = 42,
  verbose = getOption("ligerVerbose", TRUE),
  k = nDims,
  use.raw = useRaw,
  dims.use = useDims,
  use.pca = usePCA,
  fitsne.path = fitsnePath,
  rand.seed = seed
)

Arguments

object

liger object with factorization results.

useRaw

Whether to use un-aligned cell factor loadings (H matrices). Default NULL searches for aligned factor loadings first, then un-aligned loadings.

useDims

Index of factors to use for computing the embedding. Default NULL uses all factors.

nDims

Number of dimensions to reduce to. Default 2.

usePCA

Whether to perform the initial PCA step for Rtsne. Default FALSE.

perplexity

Numeric parameter to pass to Rtsne (expected number of neighbors). Default 30.

theta

Speed/accuracy trade-off (increase for less accuracy); set to 0.0 for exact t-SNE. Default 0.5.

method

Choose from "Rtsne" or "fftRtsne". See Description. Default "Rtsne".

dimredName

Name of the variable in the cellMeta slot to store the result matrix. Default "TSNE".

asDefault

Logical, whether to set the resulting dimRed as default for visualization. Default NULL sets it when no default is set.

fitsnePath

Path to the cloned FIt-SNE directory (e.g. '/path/to/dir/FIt-SNE'). Required only the first time runTSNE is used with method = "fftRtsne". Default NULL.

seed

Random seed for reproducibility. Default 42.

verbose

Logical. Whether to show progress information. Default getOption("ligerVerbose"), or TRUE if users have not set it.

use.raw,dims.use,k,use.pca,fitsne.path,rand.seed

Deprecated. See Usage section for replacement.

Value

The object where a "TSNE" variable is updated in the cellMeta slot with the whole 2D embedding matrix.

Examples

pbmc <- runTSNE(pbmcPlot)

Perform Mosaic iNMF (UINMF) on scaled datasets with unshared features

Description

Performs mosaic integrative non-negative matrix factorization (UINMF) (A.R. Kriebel, 2022) using block coordinate descent (alternating non-negative least squares, ANLS) to return factorized H, W, V and U matrices. The objective function is stated as

\arg\min_{H_i\ge 0,W\ge 0,V_i\ge 0,U_i\ge 0}\sum_{i=1}^{d}\left\|\begin{bmatrix}E_i \\ P_i\end{bmatrix}-\left(\begin{bmatrix}W \\ 0\end{bmatrix}+\begin{bmatrix}V_i \\ U_i\end{bmatrix}\right)H_i\right\|_F^2+\sum_{i=1}^{d}\lambda_i\left\|\begin{bmatrix}V_i \\ U_i\end{bmatrix}H_i\right\|_F^2

where E_i is the input non-negative matrix of the i'th dataset, P_i is the input non-negative matrix for the unshared features, and d is the total number of datasets. E_i is of size m \times n_i for m shared features and n_i cells, P_i is of size u_i \times n_i for u_i unshared features, H_i is of size k \times n_i, V_i is of size m \times k, W is of size m \times k and U_i is of size u_i \times k.

The factorization produces a shared W matrix (genes by k). For each dataset, it also produces an H matrix (k by cells), a V matrix (genes by k) and a U matrix (unshared genes by k). The H matrices represent the cell factor loadings. W is held consistent among all datasets, as it represents the shared components of the metagenes across datasets. The V matrices represent the dataset-specific components of the metagenes. The U matrices are similar to V but represent the loadings contributed by unshared features.
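The stacked-matrix form of the UINMF objective can be made concrete with a base R sketch that evaluates it for given factors. The function name uinmfObjective is hypothetical and the code is not the solver used by runUINMF. The zero block under W means shared metagenes carry no loading on the unshared features.

```r
# Evaluate the UINMF objective (a checking aid, not a solver).
# E: list of m x n_i; P: list of u_i x n_i; W: m x k; V: list of m x k;
# U: list of u_i x k; H: list of k x n_i; lambda: one weight per dataset.
uinmfObjective <- function(E, P, W, V, U, H, lambda) {
  lambda <- rep_len(lambda, length(E))  # allow a single shared lambda
  total <- 0
  for (i in seq_along(E)) {
    X  <- rbind(E[[i]], P[[i]])                       # stacked [E_i; P_i]
    Wz <- rbind(W, matrix(0, nrow(U[[i]]), ncol(W)))  # [W; 0]
    VU <- rbind(V[[i]], U[[i]])                       # [V_i; U_i]
    total <- total + sum((X - (Wz + VU) %*% H[[i]])^2) +
      lambda[i] * sum((VU %*% H[[i]])^2)
  }
  total
}

set.seed(2)
m <- 6; k <- 2; u <- c(3, 4); n <- c(5, 7)  # toy sizes for two datasets
W <- matrix(runif(m * k), m, k)
V <- lapply(seq_along(n), function(i) matrix(runif(m * k), m, k))
U <- lapply(u, function(ui) matrix(runif(ui * k), ui, k))
H <- lapply(n, function(ni) matrix(runif(k * ni), k, ni))
# Data reproduced exactly by the factors, so only the penalty remains.
E <- Map(function(Vi, Hi) (W + Vi) %*% Hi, V, H)
P <- Map(function(Ui, Hi) Ui %*% Hi, U, H)
obj <- uinmfObjective(E, P, W, V, U, H, lambda = c(1, 2))
```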

Usage

runUINMF(object, k = 20, lambda = 5, ...)

## S3 method for class 'liger'
runUINMF(
  object,
  k = 20,
  lambda = 5,
  nIteration = 30,
  nRandomStarts = 1,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

Arguments

object

liger object. Should run selectGenes with unshared = TRUE and then run scaleNotCenter in advance.

k

Inner dimension of factorization (number of factors). Generally, a higher k will be needed for datasets with more sub-structure. Default 20.

lambda

Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). Default 5.

...

Arguments passed to other methods and wrapped functions.

nIteration

Total number of block coordinate descent iterations to perform. Default 30.

nRandomStarts

Number of replicate runs for creating optimal factorization. Default 1.

seed

Random seed to allow reproducible results. Default1.

nCores

The number of parallel tasks to speed up the computation. Default 2L. Only supported on platforms with OpenMP support.

verbose

Logical. Whether to show progress information. Default getOption("ligerVerbose"), or TRUE if the option has not been set.

Value

liger method - Returns the updated input liger object.
- A list of all H matrices can be accessed with getMatrix(object, "H")
- A list of all V matrices can be accessed with getMatrix(object, "V")
- The W matrix can be accessed with getMatrix(object, "W")
- A list of all U matrices can be accessed with getMatrix(object, "U")

Note

Currently, the Seurat S3 method is not supported for UINMF because there is no simple solution for organizing a number of miscellaneous matrices within a single Seurat object. We strongly recommend that users create a liger object, which has the specific structure.

References

April R. Kriebel and Joshua D. Welch, UINMF performs mosaic integration of single-cell multi-omic datasets using nonnegative matrix factorization, Nat. Commun., 2022

Examples

pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc, useUnsharedDatasets = c("ctrl", "stim"))
pbmc <- scaleNotCenter(pbmc)
if (!is.null(getMatrix(pbmc, "scaleUnsharedData", "ctrl")) &&
    !is.null(getMatrix(pbmc, "scaleUnsharedData", "stim"))) {
    # TODO: unshared variable features cannot be detected from this example
    pbmc <- runUINMF(pbmc)
}

Perform UMAP Dimensionality Reduction

Description

Run UMAP on the aligned cell factors (result from alignFactors) or unaligned cell factors (raw result from runIntegration) to generate a 2D embedding for visualization (or general dimensionality reduction). Has an option to run on a subset of factors. It is generally recommended to use this method for dimensionality reduction with extremely large datasets. The underlying UMAP calculation uses uwot::umap.

Usage

runUMAP(
  object,
  useRaw = NULL,
  useDims = NULL,
  nDims = 2,
  distance = c("cosine", "euclidean", "manhattan", "hamming"),
  nNeighbors = 20,
  minDist = 0.1,
  dimredName = "UMAP",
  asDefault = NULL,
  seed = 42,
  verbose = getOption("ligerVerbose", TRUE),
  k = nDims,
  use.raw = useRaw,
  dims.use = useDims,
  n_neighbors = nNeighbors,
  min_dist = minDist,
  rand.seed = seed,
  ...
)

Arguments

object

liger object with factorization results.

useRaw

Whether to use un-aligned cell factor loadings (H matrices). Default NULL searches for aligned factor loadings first and falls back to un-aligned loadings.

useDims

Index of factors to use for computing the embedding. Default NULL uses all factors.

nDims

Number of dimensions to reduce to. Default 2.

distance

Character. Metric used to measure distance in the input space. Default "cosine"; alternative options include "euclidean", "manhattan" and "hamming".

nNeighbors

Number of neighboring points used in local approximations of manifold structure. Default 20.

minDist

Numeric. Controls how tightly the embedding is allowed to compress points together. Default 0.1.

dimredName

Name of the variable in the cellMeta slot to store the result matrix. Default "UMAP".

asDefault

Logical, whether to set the resulting dimRed as default for visualization. Default NULL sets it when no default has been set.

seed

Random seed for reproducibility. Default 42.

verbose

Logical. Whether to show progress information. Default getOption("ligerVerbose"), or TRUE if the option has not been set.

k, use.raw, dims.use, n_neighbors, min_dist, rand.seed

Deprecated. See Usage section for replacement.

...

Additional arguments passed to uwot::umap().

Details

For nNeighbors, larger values will result in more global structure being preserved at the loss of detailed local structure. In general, this parameter should be in the range 5 to 50, with a choice of 10 to 15 being a sensible default.

For minDist, larger values ensure embedded points are more evenly distributed, while smaller values allow the algorithm to optimize more accurately with regard to local structure. Sensible values are in the range 0.001 to 0.5, with 0.1 being a reasonable default.

Value

The object where a variable named by dimredName (default "UMAP") is updated in the cellMeta slot with the whole embedding matrix.

Examples

pbmc <- runUMAP(pbmcPlot)

Scale genes by root-mean-square across cells

Description

This function scales normalized gene expression data after variable genes have been selected. We do not mean-center the data before scaling in order to address the non-negativity constraint of NMF. The computation applied to each normalized dataset matrix follows the equation:

S_{i,j}=\frac{N_{i,j}}{\sqrt{\sum_{p}^{n}\frac{N_{i,p}^2}{n-1}}}

where N denotes the normalized matrix for an individual dataset, S is the output scaled matrix for this dataset, and n is the number of cells in this dataset. i, j denote the specific gene and cell index, and p is the cell iterator.
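The equation above can be sketched in base R for a dense matrix with genes in rows and cells in columns. scaleNotCenterSketch is a hypothetical name, not the package implementation (which also handles sparse and HDF5-backed data), and leaving all-zero genes at zero is an assumption of this sketch.

```r
# Hypothetical sketch of the per-gene scaling above (dense matrix,
# genes in rows, cells in columns); not rliger's implementation.
scaleNotCenterSketch <- function(N) {
  n <- ncol(N)
  rms <- sqrt(rowSums(N^2) / (n - 1))  # one denominator per gene i
  rms[rms == 0] <- 1                   # assumption: all-zero genes stay zero
  N / rms                              # divides row i by rms[i]; no centering
}

# 4 genes x 5 cells; each line below is one cell (column)
N <- matrix(c(1, 0, 2, 3,
              0, 4, 1, 0,
              2, 2, 0, 1,
              5, 0, 0, 3,
              1, 1, 1, 1), nrow = 4)
S <- scaleNotCenterSketch(N)
```

Only division by a positive per-gene constant is applied, so zeros stay zero and all values remain non-negative, as NMF requires.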

Please see the Methylation dataset section below for an explanation of how methylation datasets are handled.

Usage

scaleNotCenter(object, ...)

## S3 method for class 'dgCMatrix'
scaleNotCenter(object, features, scaleFactor = NULL, ...)

## S3 method for class 'DelayedArray'
scaleNotCenter(
  object,
  features,
  scaleFactor = NULL,
  geneRootMeanSq = NULL,
  overwrite = FALSE,
  chunk = getOption("ligerChunkSize", 20000),
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'ligerDataset'
scaleNotCenter(
  object,
  features = NULL,
  scaleFactor = NULL,
  chunk = getOption("ligerChunkSize", 20000),
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'ligerMethDataset'
scaleNotCenter(
  object,
  features = NULL,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'liger'
scaleNotCenter(
  object,
  useDatasets = NULL,
  features = varFeatures(object),
  verbose = getOption("ligerVerbose", TRUE),
  remove.missing = NULL,
  ...
)

## S3 method for class 'Seurat'
scaleNotCenter(
  object,
  assay = NULL,
  layer = "ligerNormData",
  save = "ligerScaleData",
  datasetVar = "orig.ident",
  features = NULL,
  ...
)

Arguments

object

liger object, ligerDataset object, dgCMatrix-class object, or a Seurat object.

...

Arguments passed to other methods. The order goes: the "liger" method calls the "ligerDataset" method, which then calls the "dgCMatrix" method. The "Seurat" method directly calls the "dgCMatrix" method.

features

Character, numeric or logical index that chooses the variable features to be scaled. The "liger" method by default uses varFeatures(object). The "ligerDataset" method by default uses all features. The "Seurat" method by default uses Seurat::VariableFeatures(object).

scaleFactor

Numeric vector of scaling factors to normalize the raw counts to unit sum. This is pre-calculated at liger object creation (stored as object$nUMI) and internally specified in S3 method chains, and thus is generally not needed to be specified by users.

geneRootMeanSq

Numeric vector of root-mean-square of unit-normalized expression for each gene. This is pre-calculated at the call of selectBatchHVG (stored at featureMeta(dataset(object, "datasetName"))$rootMeanSq) and internally specified in S3 method chains, and thus is generally not needed to be specified by users.

overwrite

Logical. When writing a newly computed HDF5 array to a separate HDF5 file, whether to overwrite the existing file. Default FALSE raises an error when the file already exists.

chunk

Integer. Maximum number of cells in each chunk when scaling is applied to any HDF5 based dataset. Default 20000.

verbose

Logical. Whether to show progress information. Default getOption("ligerVerbose"), or TRUE if the option has not been set.

useDatasets

A character vector of the names, or a numeric or logical vector of the index, of the datasets to be scaled but not centered. Default NULL applies to all datasets.

remove.missing

Deprecated. The functionality of this is covered through other parts of the whole workflow and is no longer needed. Will be ignored if specified.

assay

Name of assay to use. Default NULL uses the current active assay.

layer

For Seurat>=4.9.9, the name of the layer to retrieve normalized data from. Default "ligerNormData". For older Seurat, always retrieves from the data slot.

save

For Seurat>=4.9.9, the name of the layer to store the scaled data. Default "ligerScaleData". For older Seurat, stored to the scale.data slot.

datasetVar

Metadata variable name that stores the dataset source annotation. Default "orig.ident".

Value

Updated object

dgCMatrix method - Returns a scaled dgCMatrix object
ligerDataset method - Updates the scaleData and scaledUnsharedData (if unshared variable features are available) slots of the object
liger method - Updates the scaleData and scaledUnsharedData (if unshared variable features are available) slots of chosen datasets
Seurat method - Adds a named layer in the chosen assay (V5), or updates the scale.data slot of the chosen assay (<=V4)

Methylation dataset

Because gene body mCH proportions are negatively correlated with gene expression level in neurons, we need to reverse the direction of the methylation data before performing the integration. We do this by simply subtracting all values from the maximum methylation value. The resulting values are positively correlated with gene expression. This will only be applied to variable genes detected beforehand. Please make sure that the argument modal is set accordingly when running createLiger. In this way, this function can automatically detect it and take proper action. If it is not set, users can still manually perform the equivalent processing by running scaleNotCenter(lig, useDatasets = c("other", "datasets")), and then reverseMethData(lig, useDatasets = c("meth", "datasets")).
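A minimal sketch of the reversal on a plain matrix of proportions. reverseMethSketch is hypothetical, and it assumes the maximum is taken over the whole dataset matrix before restricting to the variable genes; reverseMethData remains the reference for the actual behavior.

```r
# Hypothetical sketch: reverse methylation so larger values track higher
# expression. Assumes the maximum is taken over the whole dataset matrix.
reverseMethSketch <- function(meth, features = seq_len(nrow(meth))) {
  max(meth) - meth[features, , drop = FALSE]
}

# Toy gene-body mCH proportions: 3 genes x 2 cells
meth <- matrix(c(0.1, 0.8, 0.5,
                 0.3, 0.9, 0.2), nrow = 3,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               c("cell1", "cell2")))
# Keep only the (hypothetical) variable genes geneA and geneC
reversed <- reverseMethSketch(meth, features = c("geneA", "geneC"))
```

The result is non-negative and perfectly anti-correlated with the input values, which is the property the integration relies on.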

Note

Since the scaling on genes is applied on a per-dataset basis, other scaling methods that apply to a whole concatenated matrix of multiple datasets might not be considered equivalent alternatives, even if options like center are set to FALSE. Hence we implemented an efficient solution that works under such circumstances, provided with the Seurat S3 method.

Examples

pbmc <- selectBatchHVG(pbmc, nGenes = 10)
pbmc <- scaleNotCenter(pbmc)

Batch-aware highly variable gene selection

Description

Method to select HVGs based on the mean dispersions of genes that are highly variable in all batches, using the top target genes per batch ranked by average normalized dispersion. If the target number of genes still has not been reached, HVGs found in all but one batch are used to fill up the selection. This continues until HVGs found in a single batch are considered.

This is an rliger implementation of the method originally published in scIB. We found that it can potentially improve integration under some circumstances, and we are currently testing it.

This function currently only works for shared features across all datasets. For selection from only part of the datasets and selection for dataset-specific unshared features, please use selectGenes().
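The tiered fill-up described above can be sketched in base R. The inputs are assumptions standing in for the package's internal statistics: hvgPerBatch holds each batch's top genes, meanDisp holds the average normalized dispersion per gene, and selectBatchHVGSketch is a hypothetical name. Genes found in every batch are taken first, then those found in all but one batch, and so on, with each tier ranked by dispersion.

```r
# Hypothetical sketch of the tiered fill-up. hvgPerBatch: list of character
# vectors (top genes per batch); meanDisp: named average normalized dispersion.
selectBatchHVGSketch <- function(hvgPerBatch, meanDisp, nGenes) {
  counts <- table(unlist(hvgPerBatch))  # in how many batches each gene is an HVG
  selected <- character(0)
  for (b in rev(seq_along(hvgPerBatch))) {
    need <- nGenes - length(selected)
    if (need <= 0) break
    tier <- names(counts)[counts == b]  # genes that are HVGs in exactly b batches
    tier <- tier[order(meanDisp[tier], decreasing = TRUE)]
    selected <- c(selected, head(tier, need))
  }
  selected
}

hvg <- list(b1 = c("g1", "g2", "g3"),
            b2 = c("g1", "g2", "g4"),
            b3 = c("g1", "g3", "g5"))
disp <- c(g1 = 2.0, g2 = 1.5, g3 = 1.8, g4 = 0.9, g5 = 1.2)
res3 <- selectBatchHVGSketch(hvg, disp, nGenes = 3)
res4 <- selectBatchHVGSketch(hvg, disp, nGenes = 4)
```

Here res3 is g1 (an HVG in all three batches) followed by g3 and g2 (two batches each, ranked by dispersion); asking for four genes adds g5 from the single-batch tier.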

Usage

selectBatchHVG(object, ...)

## S3 method for class 'liger'
selectBatchHVG(
  object,
  nGenes = 2000,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'ligerDataset'
selectBatchHVG(
  object,
  nGenes = 2000,
  features = NULL,
  scaleFactor = NULL,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'dgCMatrix'
selectBatchHVG(
  object,
  nGenes = 2000,
  returnStats = FALSE,
  scaleFactor = NULL,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'DelayedArray'
selectBatchHVG(
  object,
  nGenes = 2000,
  means = NULL,
  scaleFactor = NULL,
  returnStats = FALSE,
  chunk = getOption("ligerChunkSize", 20000),
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

Arguments

object

A liger object, ligerDataset object or a sparse/dense matrix. The liger objects must have raw counts available. A direct matrix input should preferably be log1p-transformed from CPM-normalized counts, with cells in columns.

...

Arguments passed to S3 methods.

nGenes

Integer number of target genes to select. Default 2000.

verbose

Logical. Whether to show a progress bar. Default getOption("ligerVerbose"), or TRUE if the option has not been set.

features

For the ligerDataset method, the feature subset to limit the selection to, mainly for limiting the selection to happen within the shared genes of all datasets. Default NULL selects from all features in the ligerDataset object.

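scaleFactor

Numeric vector of scaling factors to normalize the raw counts to unit sum. Internally specified in S3 method chains, and thus generally not needed to be specified by users.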

returnStats

Logical, for the dgCMatrix method, whether to return a data frame of statistics for all features, or by default FALSE just return a character vector of selected features.

means

Numeric vector of pre-calculated means per gene, derived from log1p CPM normalized expression.

chunk

Integer. Maximum number of cells in each chunk when working on HDF5Array. Default 20000.

Value

liger-method: Returns the input liger object with the selected genes updated in the varFeatures slot, which can be accessed with varFeatures(object). Additionally, the statistics are updated in the featureMeta slot of each ligerDataset object within the datasets slot of the object.
ligerDataset-method: Returns the input ligerDataset object with the statistics updated in the featureMeta slot.
dgCMatrix-method: By default returns a character vector of selected variable features. If returnStats = TRUE, returns a data.frame of the statistics.

References

Luecken, M.D., Büttner, M., Chaichoompu, K. et al. (2022), Benchmarking atlas-level data integration in single-cell genomics. Nat Methods, 19, 41–50. https://doi.org/10.1038/s41592-021-01336-8.

Examples

pbmc <- selectBatchHVG(pbmc, nGenes = 10)
varFeatures(pbmc)

Select a subset of informative genes

Description

This function identifies highly variable genes from each dataset and combines these gene sets (either by union or intersection) for use in downstream analysis. Assuming that gene expression approximately follows a Poisson distribution, this function identifies genes with gene expression variance above a given variance threshold (relative to mean gene expression). Alternatively, it allows selecting a desired number of genes for each dataset by ranking the relative variance, and then takes the combination.
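The Poisson reasoning can be illustrated with a simplified base-R sketch. If a gene's counts are Poisson with a rate proportional to library size, the variance of unit-sum normalized expression is roughly its mean times mean(1 / libSize), so genes well above that line are flagged as variable. selectGenesSketch is hypothetical, treats thresh as a log10 margin over the Poisson expectation, and omits the alpha upper bound; it is not rliger's exact test.

```r
# Simplified, hypothetical sketch (not rliger's exact test). raw is a dense
# count matrix with genes in rows and cells in columns.
selectGenesSketch <- function(raw, thresh = 0.1) {
  libSize <- colSums(raw)
  norm <- sweep(raw, 2, libSize, "/")  # unit-sum normalization per cell
  geneMean <- rowMeans(norm)
  geneVar <- apply(norm, 1, var)
  # Poisson expectation: Var(normalized) is about mean * mean(1 / libSize)
  poissonFactor <- mean(1 / libSize)
  keep <- geneMean > 0 &
    log10(geneVar) > log10(geneMean) + log10(poissonFactor) + thresh
  rownames(raw)[keep]
}

set.seed(1)
nCells <- 200
raw <- matrix(rpois(50 * nCells, lambda = 5), nrow = 50)  # Poisson-like genes
raw <- rbind(raw, rep(c(0, 60), length.out = nCells))     # one bimodal gene
rownames(raw) <- paste0("g", seq_len(nrow(raw)))
sel <- selectGenesSketch(raw, thresh = 0.5)
```

With thresh = 0.5 only the bimodal gene g51 clears the margin, while the Poisson-like genes stay close to their expected variance.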

Usage

selectGenes(object, thresh = 0.1, nGenes = NULL, alpha = 0.99, ...)

## S3 method for class 'liger'
selectGenes(
  object,
  thresh = 0.1,
  nGenes = NULL,
  alpha = 0.99,
  useDatasets = NULL,
  useUnsharedDatasets = NULL,
  unsharedThresh = 0.1,
  combine = c("union", "intersection"),
  chunk = getOption("ligerChunkSize", 20000),
  verbose = getOption("ligerVerbose", TRUE),
  var.thresh = thresh,
  alpha.thresh = alpha,
  num.genes = nGenes,
  datasets.use = useDatasets,
  unshared.datasets = useUnsharedDatasets,
  unshared.thresh = unsharedThresh,
  tol = NULL,
  do.plot = NULL,
  cex.use = NULL,
  unshared = NULL,
  ...
)

## S3 method for class 'Seurat'
selectGenes(
  object,
  thresh = 0.1,
  nGenes = NULL,
  alpha = 0.99,
  useDatasets = NULL,
  layer = "ligerNormData",
  assay = NULL,
  datasetVar = "orig.ident",
  combine = c("union", "intersection"),
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

Arguments

object

A liger, ligerDataset or Seurat object, with normalized data available (neither multiplied by a scale factor nor log transformed).

thresh

Variance threshold used to identify variable genes. A higher threshold results in fewer selected genes. The liger and Seurat S3 methods accept a single value or a vector with a specific threshold for each dataset in useDatasets. Default 0.1.

nGenes

Number of genes to find for each dataset. By setting this, the threshold used for each dataset is optimized so that nGenes features are selected for each dataset. Accepts a single value or a vector for dataset-specific settings matching useDatasets. Default NULL does not optimize.

alpha

Alpha threshold. Controls the upper bound for expected mean gene expression. A lower threshold means a higher upper bound. Default 0.99.

...

Arguments passed to other methods.

useDatasets

A character vector of the names, or a numeric or logical vector of the index, of the datasets to use for shared variable feature selection. Default NULL uses all datasets.

useUnsharedDatasets

A character vector of the names, or a numeric or logical vector of the index, of the datasets to use for finding unshared variable features. Default NULL does not attempt to find unshared features.

unsharedThresh

Same as thresh, but applied to test unshared features. A single value for all datasets in useUnsharedDatasets or a vector for dataset-specific settings. Default 0.1.

combine

How to combine variable genes selected from all datasets. Choose from "union" or "intersection". Default "union".

chunk

Integer. Maximum number of cells in each chunk when gene selection is applied to any HDF5 based dataset. Default 20000.

verbose

Logical. Whether to show progress information. Default getOption("ligerVerbose"), or TRUE if the option has not been set.

var.thresh, alpha.thresh, num.genes, datasets.use, unshared.datasets, unshared.thresh

Deprecated. These arguments are renamed and will be removed in the future. Please see function usage for replacement.

tol, do.plot, cex.use, unshared

Deprecated. The gene variability metric is now visualized with the separate function plotVarFeatures. Users can now set a non-NULL useUnsharedDatasets to select unshared genes, instead of having to switch unshared on.

layer

Where the input normalized counts should be retrieved from. Default "ligerNormData". For older Seurat, always retrieves from the data slot.

assay

Name of assay to use. Default NULL uses the current active assay.

datasetVar

Metadata variable name that stores the dataset source annotation. Default "orig.ident".

Value

Updated object

liger method - Each involved dataset stored in ligerDataset is updated with its featureMeta slot and varUnsharedFeatures slot (if requested with useUnsharedDatasets), while varFeatures(object) will be updated with the final combined gene set.
Seurat method - The final selection will be updated at Seurat::VariableFeatures(object). Per-dataset information is stored in the meta.features slot of the chosen Assay.

Examples

pbmc <- normalize(pbmc)
# Select by thresholding the relative variance
pbmc <- selectGenes(pbmc, thresh = .1)
# Select a specified number for each dataset
pbmc <- selectGenes(pbmc, nGenes = c(60, 60))

Select variable genes from one dataset with Seurat VST method

Description

Seurat FindVariableFeatures VST method. This allows the selection of a fixed number of variable features, but only applies to one dataset. No normalization is needed in advance.
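The VST procedure can be sketched in base R on a dense count matrix: fit a loess curve of log10 variance against log10 mean, standardize each gene with the fitted (expected) standard deviation, clip at clipMax, and rank genes by the variance of the clipped values. vstSketch is a hypothetical name; it follows our reading of Seurat's vst method (only values above clipMax are clipped) and is not rliger's code.

```r
# Hypothetical sketch of VST-style selection on a dense raw count matrix
# (genes in rows, cells in columns); not rliger's or Seurat's actual code.
vstSketch <- function(counts, n = 2000, loessSpan = 0.3,
                      clipMax = sqrt(ncol(counts))) {
  geneMean <- rowMeans(counts)
  geneVar <- apply(counts, 1, var)
  okIdx <- which(geneVar > 0)              # constant genes cannot be log-fitted
  logMean <- log10(geneMean[okIdx])
  logVar <- log10(geneVar[okIdx])
  fit <- loess(logVar ~ logMean, span = loessSpan)
  expectedSd <- sqrt(10^fitted(fit))       # sd implied by the fitted trend
  stdVar <- setNames(numeric(nrow(counts)), rownames(counts))
  for (j in seq_along(okIdx)) {
    g <- okIdx[j]
    z <- (counts[g, ] - geneMean[g]) / expectedSd[j]
    z <- pmin(z, clipMax)                  # clip large standardized values
    stdVar[g] <- sum(z^2) / (ncol(counts) - 1)
  }
  names(sort(stdVar, decreasing = TRUE))[seq_len(min(n, nrow(counts)))]
}

set.seed(7)
geneMeans <- runif(100, 1, 20)
counts <- matrix(rpois(100 * 50, lambda = rep(geneMeans, times = 50)),
                 nrow = 100,
                 dimnames = list(paste0("gene", 1:100), paste0("cell", 1:50)))
counts[1, ] <- rep(c(0, 20), times = 25)   # overdispersed gene with mean 10
top <- vstSketch(counts, n = 5)
```

The default clipMax here mirrors the "auto" setting described under Arguments (square root of the number of cells).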

Usage

selectGenesVST(
  object,
  useDataset,
  n = 2000,
  loessSpan = 0.3,
  clipMax = "auto",
  useShared = TRUE,
  verbose = getOption("ligerVerbose", TRUE)
)

Arguments

object

A liger object.

useDataset

The name, or a numeric or logical index, of the dataset to be considered for selection.

n

Number of variable features needed. Default 2000.

loessSpan

Loess span parameter used when fitting the variance-mean relationship. Default 0.3.

clipMax

After standardization, values larger than clipMax will be set to clipMax. Default "auto" sets this value to the square root of the number of cells.

useShared

Logical. Whether to only select from genes shared by all datasets. Default TRUE.

verbose

Logical. Whether to show progress information. Default getOption("ligerVerbose"), or TRUE if the option has not been set.

References

Seurat::FindVariableFeatures.default(selection.method = "vst")

Examples

pbmc <- selectGenesVST(pbmc, "ctrl", n = 50)

Subset liger with brackets

Description

Subset liger with brackets

Usage

## S3 method for class 'liger'
x[i, j, ...]

Arguments

x

A liger object

i

Feature subscript, passed to featureIdx of subsetLiger.

j

Cell subscript, passed to cellIdx of subsetLiger.

...

Additional arguments passed to subsetLiger.

Value

Subset of x with specified features and cells.

Examples

pbmcPlot[varFeatures(pbmcPlot)[1:10], 1:10]

Subset ligerDataset object

Description

Subset ligerDataset object

Usage

## S3 method for class 'ligerDataset'
x[i, j, ...]

Arguments

x

A ligerDataset object

i

Numeric, logical index or character vector of feature names to subset. Leave missing for all features.

j

Numeric, logical index or character vector of cell IDs to subset. Leave missing for all cells.

...

Additional arguments passed to subsetLigerDataset.

Value

Subset of x with specified features and cells.

Examples

ctrl <- dataset(pbmc, "ctrl")
ctrl[1:5, 1:5]

Get cell metadata variable

Description

Get cell metadata variable

Usage

## S3 method for class 'liger'
x[[i, ...]]

Arguments

x

A liger object

i

Name or numeric index of cell metadata to fetch

...

Anything that the S4Vectors::DataFrame method allows.

Value

If i is given, the selected metadata will be returned; if it is missing, the whole cell metadata table in S4Vectors::DataFrame class will be returned.

Examples

# Retrieve whole cellMeta
pbmc[[]]
# Retrieve a variable
pbmc[["dataset"]]

Subset liger object

Description

This function subsets a liger object with a character feature index and any valid cell index. For datasets based on HDF5, the filenames of subset H5 files can only be automatically generated for now. Feature subsetting is based on the intersection of available features from datasets involved by cellIdx, while featureIdx = NULL does not take the intersection (i.e. nothing is done on the feature axis).

A ligerDataset object is also allowed for now, and in that case setting filename is supported.

Usage

subsetLiger(
  object,
  featureIdx = NULL,
  cellIdx = NULL,
  useSlot = NULL,
  chunkSize = 1000,
  verbose = getOption("ligerVerbose", TRUE),
  newH5 = TRUE,
  returnObject = TRUE,
  ...
)

Arguments

object

A liger or ligerDataset object.

featureIdx

Character vector. Missing or NULL for all features.

cellIdx

Character, logical or numeric index that can subset cells. Missing or NULL for all cells.

useSlot

The slot(s) to only consider. Choose one or more from "rawData", "normData" and "scaleData". Default NULL subsets the whole object including analysis result matrices.

chunkSize

Integer. Maximum number of cells in each chunk. Default 1000.

verbose

Logical. Whether to show progress information. Default getOption("ligerVerbose"), or TRUE if the option has not been set.

newH5

Whether to create new H5 files on disk for the subset datasets if the involved datasets in the object are HDF5 based. TRUE writes new ones, FALSE returns in-memory data.

returnObject

Logical, whether to return a liger object for the result. Default TRUE. FALSE returns a list containing the requested values.

...

Arguments passed to subsetLigerDataset

Value

Subset object

Examples

pbmc.small <- subsetLiger(pbmc, cellIdx = pbmc$nUMI > 200)
pbmc.small <- pbmc[, pbmc$nGene > 50]

Subset ligerDataset object

Description

This function subsets a ligerDataset object with valid feature and cell indices. For HDF5 based objects, options are available for subsetting data into memory or into a new on-disk H5 file. Feature and cell subsetting is always based on the size of rawData. Therefore, feature subsetting on scaled data, which usually contains only a subset of features, will select the intersection between the wanted features and the set available in scaled data.
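To illustrate the intersection rule with plain matrices (no rliger objects are involved, and all names here are hypothetical), feature indices are validated against the raw-data features, while the scaled-data subset keeps only the requested features that the scaled matrix actually holds.

```r
# Hypothetical illustration with plain matrices, not rliger objects.
rawFeatures <- paste0("g", 1:6)                 # features present in rawData
scaleData <- matrix(1:6, nrow = 3,
                    dimnames = list(c("g2", "g4", "g5"), c("c1", "c2")))
wanted <- c("g1", "g2", "g5")
stopifnot(all(wanted %in% rawFeatures))         # subscript checked on rawData
keep <- intersect(wanted, rownames(scaleData))  # only g2 and g5 are scaled
scaleSubset <- scaleData[keep, , drop = FALSE]
```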

Usage

subsetLigerDataset(
  object,
  featureIdx = NULL,
  cellIdx = NULL,
  useSlot = NULL,
  newH5 = TRUE,
  filename = NULL,
  filenameSuffix = NULL,
  chunkSize = 1000,
  verbose = getOption("ligerVerbose", TRUE),
  returnObject = TRUE,
  ...
)

subsetH5LigerDataset(
  object,
  featureIdx = NULL,
  cellIdx = NULL,
  useSlot = NULL,
  newH5 = TRUE,
  filename = NULL,
  filenameSuffix = NULL,
  chunkSize = 1000,
  verbose = getOption("ligerVerbose", TRUE),
  returnObject = TRUE
)

subsetMemLigerDataset(
  object,
  featureIdx = NULL,
  cellIdx = NULL,
  useSlot = NULL,
  returnObject = TRUE
)

Arguments

object

ligerDataset object. An HDF5 based object if using subsetH5LigerDataset, in-memory data for subsetMemLigerDataset.

featureIdx

Character, logical or numeric index that can subset features. Missing or NULL for all features.

cellIdx

Character, logical or numeric index that can subset cells. Missing or NULL for all cells.

useSlot

The slot(s) to only consider. Choose one or more from "rawData", "normData" and "scaleData". Default NULL subsets the whole object including analysis result matrices.

newH5

Whether to create a new H5 file on disk for the subset dataset if object is HDF5 based. TRUE writes a new one, FALSE returns in-memory data.

filename

Filename of the new H5 file if being created. Default NULL adds the suffix ".subset_{yymmdd_HHMMSS}.h5" to the original name.

filenameSuffix

Instead of specifying the exact filename, set a suffix for the new files so the new filename looks like original.h5.[suffix].h5. Default NULL.

chunkSize

Integer. Maximum number of cells in each chunk. Default 1000.

verbose

Logical. Whether to show progress information. Default getOption("ligerVerbose"), or TRUE if the option has not been set.

returnObject

Logical, whether to return a ligerDataset object for the result. Default TRUE. FALSE returns a list containing the requested values.

...

Arguments passed to subsetH5LigerDataset

Value

Subset object

Examples

ctrl <- dataset(pbmc, "ctrl")
ctrl.small <- subsetLigerDataset(ctrl, cellIdx = 1:5)
ctrl.tiny <- ctrl[1:5, 1:5]

Suggest optimal K value for the factorization

Description

This function sweeps through a series of k values (number of ranks the datasets are factorized into). For each k value, it repeats the factorization for a number of random starts and obtains the objective errors from each run. The optimal k value is recommended to be the one with the lowest variance.

We are currently actively testing the methodology and the function is subject to change. Please report any issues you encounter.

Currently we have identified that a wider step of k values (e.g. 5, 10, 15, ...) shows a more stable variance than a narrower step (e.g. 5, 6, 7, ...).

Note that this function is expected to take a long time when a larger number of random starts is requested (e.g. 50) for a robust suggestion. It is safe to interrupt the progress (e.g. Ctrl+C), and the function will still return the recorded objective errors already completed.
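The selection rule can be sketched with base R on a table of objective errors. The data frame below and its column names (k, randomStart, objErr) are synthetic, hypothetical placeholders rather than real suggestK output; the sketch just computes the variance of the objective error across random starts for each k and picks the smallest.

```r
# Synthetic, hypothetical objective errors: 4 random starts for each k
stats <- data.frame(
  k = rep(c(5, 10, 15), each = 4),
  randomStart = rep(1:4, times = 3),
  objErr = c(104.2, 96.1, 101.7, 98.9,
             80.3, 79.6, 80.1, 79.8,
             71.5, 68.2, 70.9, 69.4)
)
# Variance of the objective error across random starts, per k
varByK <- tapply(stats$objErr, stats$k, var)
bestK <- as.numeric(names(varByK)[which.min(varByK)])
varByK
bestK
```

In practice more random starts per k (the nRandomStart argument) make these variance estimates more reliable.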

Usage

suggestK(
  object,
  kTest = seq(5, 50, 5),
  nRandomStart = 10,
  lambda = 5,
  nIteration = 30,
  nCores = 1L,
  verbose = getOption("ligerVerbose", TRUE)
)

Arguments

object

A liger object.

kTest

A numeric vector of k values to be tested. Default 5, 10, 15,..., 50.

nRandomStart

Number of random starts for each k value. Default 10.

lambda

Regularization parameter. Default 5.

nIteration

Number of iterations for each run. Default 30.

nCores

Number of cores to use for each run. Default 1L.

verbose

Whether to print progress messages. Default getOption("ligerVerbose"), or TRUE if the option has not been set.

Value

A list containing:

stats

A data frame containing the k values, objective errors, and random starts.

figure

A ggplot2 object showing the objective errors and variance for each k value. The left y-axis corresponds to the dots and bands, and the secondary y-axis on the right maps to the blue line that stands for the variance.

Examples

pbmcPlot <- scaleNotCenter(pbmcPlot)
# Minimum test example, not for demonstrative recommendation
suggests <- suggestK(
    object = pbmcPlot,
    kTest = c(2, 3),
    nRandomStart = 2,
    nIteration = 2
)
suggests$figure

Show significant results from factorGSEA

Description

Show significant results from factorGSEA

Usage

## S3 method for class 'factorGSEA'
summary(object, ...)

Arguments

object

A factorGSEA object.

...

S3 method convention, not used for now.

Value

A data frame of significant tests with gene set names, factor names and other GSEA statistics.

Update old liger object to up-to-date structure

Description

Due to massive updates since rliger 2.0, old liger object structures are no longer compatible with the current package. This function updates the object to the latest structure.

Usage

updateLigerObject(
  object,
  dimredName,
  clusterName = "clusters",
  h5FilePath = NULL
)

Arguments

object

An object of any version of rliger

dimredName

Name of the dimension reduction embedding to be stored. Please see the Details section.

clusterName

Name of the clustering assignment variable to be stored. Please see the Details section.

h5FilePath

Details

Old liger objects (<1.99.0) store only one embedding at slot tsne.coords; dimredName must be specified as a single character. The pre-release version (1.99.0) stores multiple embeddings in cellMeta; dimredName must be exact existing variable names in the cellMeta slot.

Old liger objects store the clustering assignment in slot clusters; clusterName must be specified as a single character. The pre-release version does not require this.

Value

Updated liger object.

Examples

## Not run:
# Suppose you have a liger object of old version (<1.99.0)
newLig <- updateLigerObject(oldLig,
                            dimredName = "UMAP",
                            clusterName = "louvain")
## End(Not run)

Write in-memory data into H5 file

Description

This function writes in-memory data into an H5 file, by default in the 10x cellranger HDF5 output format. The main goal of this function is to allow users to integrate a large H5-based dataset, which cannot be fully loaded into memory, with other data already loaded in memory using runOnlineINMF. In this case, users can write the smaller in-memory data to an H5 file instead of loading a subset of the large H5-based dataset into memory, where information might be lost.

Given the goal of the whole workflow, the data will always be written in CSC matrix format, and colnames/rownames are always required.

The default method coerces the input to a dgCMatrix-class object. Methods for other container classes try to extract the proper data and call the default method.
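For orientation, the sketch below (using the Matrix package, which rliger imports) shows which parts of a small dgCMatrix correspond to the default H5 paths of the dgCMatrix method. It only builds an R list and does not write a file; h5Layout is a hypothetical name, and storing dim(m) at the shape path is an assumption based on the cellranger layout.

```r
library(Matrix)

# Small dgCMatrix: 3 genes (rows) x 3 cells (columns)
m <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 3), x = c(5, 2, 7),
                  dims = c(3, 3),
                  dimnames = list(c("geneA", "geneB", "geneC"),
                                  c("cell1", "cell2", "cell3")))

# Hypothetical mapping of dgCMatrix parts to the default H5 paths
h5Layout <- list(
  "matrix/indices" = m@i,               # 0-based row index of each non-zero
  "matrix/indptr" = m@p,                # column pointers, length ncol + 1
  "matrix/data" = m@x,                  # non-zero values, column by column
  "matrix/shape" = dim(m),              # assumed (features, barcodes)
  "matrix/barcodes" = colnames(m),
  "matrix/features/name" = rownames(m)
)
str(h5Layout)
```

Because the slots are already in compressed sparse column order, no transposition is needed before writing, which is consistent with the CSC-only design described above.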

Usage

writeH5(x, file, ...)

## Default S3 method:
writeH5(x, file, ...)

## S3 method for class 'dgCMatrix'
writeH5(
  x,
  file,
  overwrite = FALSE,
  indicesPath = "matrix/indices",
  indptrPath = "matrix/indptr",
  dataPath = "matrix/data",
  shapePath = "matrix/shape",
  barcodesPath = "matrix/barcodes",
  featuresPath = "matrix/features/name",
  ...
)

## S3 method for class 'ligerDataset'
writeH5(x, file, ...)

## S3 method for class 'liger'
writeH5(x, file, useDatasets, ...)

Arguments

x

An object with in-memory data to be written into H5 file.

file

A character string of the file path to be written.

...

Arguments passed to other S3 methods.

overwrite

Logical, whether to overwrite the file if it already exists. Default FALSE.

indicesPath,indptrPath,dataPath

The paths inside the H5 file where the dgCMatrix-class components i, p, and x will be written to, respectively. Defaults follow the cellranger convention: "matrix/indices", "matrix/indptr", and "matrix/data".

shapePath

The path inside the H5 file where the shape of the matrix will be written to. Default "matrix/shape".

barcodesPath

The path inside the H5 file where the barcodes/colnames will be written to. Default "matrix/barcodes". Skipped if the object does not have colnames.

featuresPath

The path inside the H5 file where the features/rownames will be written to. Default "matrix/features/name". Skipped if the object does not have rownames.

useDatasets

For the liger method. Names or indices of datasets to be written to H5 files. Required.

Value

Nothing is returned. H5 file will be created on disk.

Examples

raw <- rawData(pbmc, "ctrl")
writeH5(raw, tempfile(pattern = "ctrl_", fileext = ".h5"))

Write liger object to H5AD files

Description

Create an H5AD file from a liger object. This function writes only raw counts to adata.X, while normalized and scaled expression data will not be written, because LIGER uses a different normalization and scaling strategy than most of the other tools utilizing the H5AD format.

Support for single sparse matrices or internal ligerDataset objects is also provided if there is a need to convert single datasets.

Usage

writeH5AD(object, ...)

## S3 method for class 'dgCMatrix'
writeH5AD(
  object,
  filename,
  obs = NULL,
  var = NULL,
  overwrite = FALSE,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'ligerDataset'
writeH5AD(
  object,
  filename,
  obs = NULL,
  overwrite = FALSE,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'liger'
writeH5AD(
  object,
  filename,
  overwrite = FALSE,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

Arguments

object

One of a liger, ligerDataset, or dgCMatrix-class object.

...

Arguments passed down to S3 methods.

filename

A character string, the path to the H5AD file to be written.

obs

An external data.frame containing cell metadata that is not embedded in the object. Rownames must be identical to the colnames of object.

var

An external data.frame containing feature metadata that is not embedded in the object. Rownames must be identical to the rownames of object.

overwrite

Logical, whether to overwrite the file if it exists.

verbose

Logical, whether to show progress information. Default getOption("ligerVerbose"), which is TRUE if the user has not set it.
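Since obs must have rownames identical to the colnames of object, an external table usually needs reordering first. The sketch below is one hedged way to do that in base R plus Matrix; alignMeta() and the toy data are hypothetical, not part of rliger. It also shows that the verbose default can be changed once per session through the ligerVerbose option read by getOption("ligerVerbose", TRUE).

```r
# Sketch only: `alignMeta()` is a hypothetical helper, not part of rliger.
library(Matrix)

counts <- Matrix::sparseMatrix(
  i = c(1, 2, 2), j = c(1, 2, 3), x = c(3, 4, 1), dims = c(2, 3),
  dimnames = list(c("GeneA", "GeneB"), c("cell1", "cell2", "cell3"))
)

# External cell metadata, deliberately stored in a different row order
obsExternal <- data.frame(
  group = c("b", "a", "a"),
  row.names = c("cell3", "cell1", "cell2"),
  stringsAsFactors = FALSE
)

# Reorder metadata rows to follow `ids`, failing loudly if any id is absent
alignMeta <- function(meta, ids) {
  absent <- setdiff(ids, rownames(meta))
  if (length(absent) > 0) {
    stop("Metadata has no rows for: ", paste(absent, collapse = ", "))
  }
  meta[ids, , drop = FALSE]
}

obs <- alignMeta(obsExternal, colnames(counts))
stopifnot(identical(rownames(obs), colnames(counts)))

# `verbose` defaults to getOption("ligerVerbose", TRUE); setting the option
# once changes that default for the rest of the session
options(ligerVerbose = FALSE)

# With rliger installed, the aligned table could then be passed as `obs`:
# writeH5AD(counts, filename = tempfile(fileext = ".h5ad"), obs = obs)
```

The same alignment applies to var, using rownames(object) as the ids.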

Value

No return value. An H5AD file is written to disk with the following specification, assuming the file is loaded into adata in Python:

adata.X - Raw count CSR matrix, outer-joined across all datasets.
adata.obs - Cell metadata, with exactly the same content as cellMeta(object).
adata.var - Feature metadata containing only the feature names, as the index of a pd.DataFrame.
adata.obsm['X_inmf_aligned'] - The integrated embedding (the aligned cell factor loading matrix), the primary output of LIGER, if available.
adata.obsm['X_inmf'] - The raw cell factor loading matrix, if available.
adata.obsm['<dimRedName>'] - The dimensionality reduction matrix, such as UMAP or t-SNE, if available.
adata.uns['inmf']['W'] - The shared factor feature loading matrix, if available.
adata.uns['inmf']['V']['<datasetName>'] - The dataset-specific factor feature loading matrix, if available.
adata.uns['inmf']['features'] - The variable features used for factorization, expected to match the second dimension (shape[1]) of W and V, if available.
adata.uns['inmf']['lambda'] - The hyperparameter lambda, the regularization parameter used for the factorization, if available.
adata.uns['inmf']['k'] - The number of factors used for the factorization, if available.
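The outer join described for adata.X can be pictured with the sketch below, a hedged illustration rather than rliger's own writer. Two small raw count matrices with partly overlapping features are padded with zeros to the union of features and bound by columns. The final check is a general sparse-format fact: the CSC arrays of a features x cells dgCMatrix are exactly the CSR arrays of its cells x features transpose, which is the orientation of adata.X in Python. padToFeatures() and the toy data are hypothetical.

```r
# Sketch only: `padToFeatures()` is a hypothetical helper, not rliger code.
library(Matrix)

# Two small genes x cells raw count matrices with partly different features
ctrl <- Matrix::sparseMatrix(
  i = c(1, 2), j = c(1, 2), x = c(4, 2), dims = c(2, 2),
  dimnames = list(c("GeneA", "GeneB"), c("ctrl_1", "ctrl_2"))
)
stim <- Matrix::sparseMatrix(
  i = c(1, 2), j = c(1, 1), x = c(3, 6), dims = c(2, 1),
  dimnames = list(c("GeneB", "GeneC"), "stim_1")
)

# Place each row of `mat` at its position in `features`; missing rows stay 0
padToFeatures <- function(mat, features) {
  selector <- Matrix::sparseMatrix(
    i = match(rownames(mat), features),
    j = seq_len(nrow(mat)),
    x = 1,
    dims = c(length(features), nrow(mat))
  )
  out <- selector %*% mat
  dimnames(out) <- list(features, colnames(mat))
  out
}

# Outer join: keep the union of features, then bind the cells column-wise
allFeatures <- union(rownames(ctrl), rownames(stim))
merged <- cbind(padToFeatures(ctrl, allFeatures),
                padToFeatures(stim, allFeatures))

# The CSR form of the cells x features transpose holds the same three arrays
# as the CSC (dgCMatrix) form of the features x cells matrix
cellsByFeatures <- as(t(merged), "RsparseMatrix")
stopifnot(
  identical(cellsByFeatures@j, merged@i),
  identical(cellsByFeatures@p, merged@p),
  identical(cellsByFeatures@x, merged@x)
)
```

Padding through a 0/1 selector product, rather than sub-assigning into an empty matrix, keeps every intermediate a plain dgCMatrix.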

Examples

print("The example below works, but causes PDF manual rendering issue for some reason")
## Not run: 
writeH5AD(pbmc, filename = tempfile(fileext = ".h5ad"))
## End(Not run)


rliger: Linked Inference of Genomic Experimental Relationships

Description

Author(s)

See Also

Generate dot plot from input matrix with ComplexHeatmap

Description

Usage

Arguments

Value

Produce single violin plot with data frame passed from upstream

Description

Usage

Arguments

Value

Produce single scatter plot with data frame passed from upstream

Description

Usage

Arguments

Details

Value

Generic ggplot theme setting for rliger package

Description

Usage

Arguments

Value

General heatmap plotting with prepared matrix and data.frames

Description

Usage

Arguments

Value

Apply function to chunks of H5 data in ligerDataset object

Description

Usage

Arguments

Details

Align factor loadings to get final integration

Description

Usage

Arguments

See Also

Converting other classes of data to a liger object

Description

Usage

Arguments

Details

Value

Examples

Converting other classes of data to a ligerDataset object

Description

Usage

Arguments

Value

Examples

liger object of bone marrow subsample data with RNA and ATAC modality

Description

Usage

Format

Source

References

Calculate adjusted Rand index (ARI) by comparing two cluster labeling variables

Description

Usage

Arguments

Value

References

Examples

Calculate agreement metric after integration

Description

Usage

Arguments

Value

Examples

Calculate alignment metric after integration

Description

Usage

Arguments

Details

Value

Examples