
Type:

Package

Title:

Cluster Independent Algorithm for Rare Cell Types Identification

Version:

0.1.0

Author:

Gabriele Lubatti

Maintainer:

Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>

Description:

Identification of markers of rare cell types by looking at genes whose expression is confined to small regions of the expression space <https://github.com/ScialdoneLab>.

License:

Artistic-2.0

Depends:

R (≥ 4.0)

Imports:

Biobase, ggplot2, ggraph, magrittr

Suggests:

circlize, clustree, ComplexHeatmap, plotly, Seurat (≥ 4.0), testthat, knitr, rmarkdown

biocViews:

software

Config/testthat/edition:

Encoding:

UTF-8

RoxygenNote:

7.1.1

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2022-02-22 11:07:19 UTC; gabriele.lubatti

Repository:

CRAN

Date/Publication:

2022-02-22 20:00:02 UTC

CIARA

Description

It selects highly localized genes, as specified in CIARA_gene, starting from the genes in background.

Usage

CIARA(norm_matrix, knn_matrix, background, cores_number = 1, p_value = 0.001, odds_ratio = 2, local_region = 1, approximation = FALSE)

Arguments

norm_matrix

Norm count matrix (n_genes X n_cells).

knn_matrix

K-nearest neighbors matrix (n_cells X n_cells).

background

Vector of genes for which the function CIARA_gene is run.

cores_number

Integer. Number of cores to use.

p_value

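Threshold on the p value returned by the function fisher.test with parameter alternative = "g". A local region is considered enriched only if its p value is below this threshold.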

odds_ratio

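Threshold on the odds ratio returned by the function fisher.test with parameter alternative = "g". A local region is considered enriched only if its odds ratio is above this threshold.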

local_region

Integer. Minimum number of local regions (a cell together with its k nearest neighbours) in which the binarized gene expression must be enriched in 1.

approximation

Logical. If TRUE, for a given gene the Fisher test is run only in the local regions of the cells where the binarized gene expression is 1.

Value

Data frame with a number of rows equal to the length of background. Each row is the output from CIARA_gene.

Author(s)

Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>

CIARA_gene

Description

The gene expression is binarized (1/0) depending on whether the value in a given cell is above/below the median. Each cell, together with its first K nearest neighbours, defines a local region. If at least local_region local regions are enriched in 1 according to fisher.test, then the gene is defined as highly localized and a final p value is assigned to it. The final p value is the minimum of the p values from all the enriched local regions. If there are no enriched local regions, the p value is set to 1 by default.

Usage

CIARA_gene(norm_matrix, knn_matrix, gene_expression, p_value = 0.001, odds_ratio = 2, local_region = 1, approximation = FALSE)

Arguments

norm_matrix

Norm count matrix (n_genes X n_cells).

knn_matrix

K-nearest neighbors matrix (n_cells X n_cells).

gene_expression

Numeric vector with the gene expression (length equal to n_cells). The gene expression is binarized (equal to 0/1 in the cells where the value is below/above the median).

p_value

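Threshold on the p value returned by the function fisher.test with parameter alternative = "g". A local region is considered enriched only if its p value is below this threshold.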

odds_ratio

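Threshold on the odds ratio returned by the function fisher.test with parameter alternative = "g". A local region is considered enriched only if its odds ratio is above this threshold.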

local_region

Integer. Minimum number of local regions (a cell together with its k nearest neighbours) in which the binarized gene expression must be enriched in 1.

approximation

Logical. If TRUE, for a given gene the Fisher test is run only in the local regions of the cells where the binarized gene expression is 1.

Value

List with one element corresponding to the p value of the gene.
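The per-gene procedure in the Description can be sketched in base R as follows. This is a sketch under stated assumptions, not the package source. The name ciara_gene_sketch is hypothetical. Non-zero entries in row i of knn_matrix are assumed to mark the neighbours of cell i. Each local region is assumed to be tested with a 2 x 2 table of cells inside versus outside the region, crossed with binarized expression 1 versus 0. A region is assumed to count as enriched when its p value is below p_value and its odds ratio is above odds_ratio.

```r
# Hypothetical sketch of the CIARA_gene logic (not the package implementation).
# Assumption: non-zero entries in row i of knn_matrix mark the neighbours of cell i.
ciara_gene_sketch <- function(gene_expression, knn_matrix,
                              p_value = 0.001, odds_ratio = 2,
                              local_region = 1, approximation = FALSE) {
  # Binarize: 1 if above the median across cells, 0 otherwise
  binary <- as.integer(gene_expression > median(gene_expression))
  n_cells <- length(binary)
  # With approximation = TRUE, only regions centred on cells with binary 1 are tested
  centres <- if (approximation) which(binary == 1) else seq_len(n_cells)
  enriched_p <- numeric(0)
  for (i in centres) {
    # Local region: cell i together with its nearest neighbours
    inside <- seq_len(n_cells) %in% c(i, which(knn_matrix[i, ] != 0))
    # 2 x 2 table: rows = inside / outside the region, columns = binary 1 / 0
    tab <- matrix(c(sum(binary[inside] == 1), sum(binary[inside] == 0),
                    sum(binary[!inside] == 1), sum(binary[!inside] == 0)),
                  nrow = 2, byrow = TRUE)
    test <- fisher.test(tab, alternative = "greater")
    if (test$p.value < p_value && test$estimate > odds_ratio) {
      enriched_p <- c(enriched_p, test$p.value)
    }
  }
  # Localized only if enough regions are enriched; otherwise the p value defaults to 1
  if (length(enriched_p) >= local_region) min(enriched_p) else 1
}

# Toy example: 20 cells in four blocks of 5 mutual neighbours;
# the gene is expressed only in the first block.
expr <- c(rep(10, 5), rep(0, 15))
knn <- kronecker(diag(4), matrix(1, 5, 5))
ciara_gene_sketch(expr, knn)  # 1 / choose(20, 5), about 6.45e-05
```

Raising local_region above the number of enriched regions (here 5) makes the sketch return the default p value of 1.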

Author(s)

Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>

cluster_analysis_integrate_rare

Description

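Runs a Seurat clustering workflow on raw_counts: normalization, PCA (RunPCA), nearest-neighbour graph construction (FindNeighbors), clustering (FindClusters) and UMAP.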

Usage

cluster_analysis_integrate_rare(raw_counts, project_name, resolution, neighbors, max_dimension, feature_genes = NULL)

Arguments

raw_counts

Raw count matrix (n_genes X n_cells).

project_name

Character name of the Seurat project.

resolution

Numeric value specifying the parameter resolution used in the Seurat function FindClusters.

neighbors

Numeric value specifying the parameter k.param in the Seurat function FindNeighbors.

max_dimension

Numeric value specifying the maximum number of PCA dimensions used in the parameter dims of the Seurat function FindNeighbors.

feature_genes

Vector of features specifying the argument features in the Seurat function RunPCA.

Value

Seurat object including raw and normalized counts matrices, UMAP coordinates and cluster result.

Author(s)

Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>

cluster_analysis_sub

Description

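Runs a Seurat sub-clustering (FindNeighbors and FindClusters) of the cells in raw_counts, which belong to the original cluster name_cluster.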

Usage

cluster_analysis_sub(raw_counts, resolution, neighbors, max_dimension, name_cluster)

Arguments

raw_counts

Raw count matrix (n_genes X n_cells).

resolution

Numeric value specifying the parameter resolution used in the Seurat function FindClusters.

neighbors

Numeric value specifying the parameter k.param in the Seurat function FindNeighbors.

max_dimension

Numeric value specifying the maximum number of PCA dimensions used in the parameter dims of the Seurat function FindNeighbors.

name_cluster

Character. Name of the original cluster for which the sub-clustering is done.

Value

Seurat object including raw and normalized counts matrices and cluster result.

Author(s)

Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>

find_resolution

Description

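Runs the Seurat function FindClusters for each value in resolution_vector and shows, with clustree, how the clusters relate across resolutions.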

Usage

find_resolution(seurat_object, resolution_vector)

Arguments

seurat_object

Seurat object as returned by cluster_analysis_integrate_rare.

resolution_vector

Vector with all the resolution values for which the Seurat function FindClusters is run.

Value

Clustree object showing the connections between the clusters obtained at different levels of resolution, as specified in resolution_vector.

Author(s)

Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>

get_background_full

Description

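Selects the background genes: genes expressed above threshold in a number of cells between n_cells_low and n_cells_high.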

Usage

get_background_full(norm_matrix, threshold = 1, n_cells_low = 3, n_cells_high = 20)

Arguments

norm_matrix

Norm count matrix (n_genes X n_cells).

threshold

Expression threshold for a given gene.

n_cells_low

Minimum number of cells in which a gene must be expressed at a level above threshold.

n_cells_high

Maximum number of cells in which a gene can be expressed at a level above threshold.

Value

Character vector with all genes expressed at a level higher than threshold in a number of cells between n_cells_low and n_cells_high.
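The filter described in Value can be sketched in base R as below. The function name background_sketch is hypothetical. "Above threshold" is taken as strictly greater than threshold, and both cell-count bounds are assumed to be inclusive. The package may treat the boundaries differently.

```r
# Hypothetical sketch of the get_background_full filter (not the package source).
# Assumptions: expression must be strictly > threshold; both bounds are inclusive.
background_sketch <- function(norm_matrix, threshold = 1,
                              n_cells_low = 3, n_cells_high = 20) {
  # Number of cells in which each gene exceeds the expression threshold
  n_expressing <- rowSums(norm_matrix > threshold)
  # Keep genes whose count falls within [n_cells_low, n_cells_high]
  keep <- n_expressing >= n_cells_low & n_expressing <= n_cells_high
  names(n_expressing)[keep]
}

# Toy matrix (30 cells): geneA passes, geneB is too widespread, geneC too rare
m <- rbind(geneA = c(rep(5, 4), rep(0, 26)),
           geneB = rep(5, 30),
           geneC = c(5, 5, rep(0, 28)))
background_sketch(m)  # "geneA"
```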

Author(s)

Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>

markers_cluster_seurat

Description

The Seurat function FindMarkers is used to identify general markers for each cluster (the given cluster versus all other clusters). This list of markers is then filtered, keeping only the genes that appear as markers of a single cluster.

Usage

markers_cluster_seurat(seurat_object, cluster, cell_names, number_top)

Arguments

seurat_object

Seurat object as returned by cluster_analysis_sub or by cluster_analysis_integrate_rare.

cluster

Vector of length equal to the number of cells, with the cluster assignment.

cell_names

Vector of length equal to the number of cells, with the cell names.

number_top

Integer. Number of top marker genes to keep for each cluster.

Value

List of three elements. The first is a vector with number_top marker genes for each cluster. The second is a vector with number_top marker genes and the corresponding cluster. The third element is a vector with all marker genes for each cluster.
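The filtering step from the Description (keep only markers that belong to a single cluster) can be sketched in base R. The sketch does not call FindMarkers. It assumes the per-cluster markers are already available as a named list, with each vector ordered from strongest to weakest marker. The function name unique_markers_sketch is hypothetical, and it returns only the per-cluster top markers, not the full three-element list.

```r
# Hypothetical sketch of the "unique cluster" marker filter (not the package source).
# markers_per_cluster: named list of character vectors, one per cluster,
# assumed to be sorted from strongest to weakest marker.
unique_markers_sketch <- function(markers_per_cluster, number_top) {
  all_markers <- unlist(markers_per_cluster, use.names = FALSE)
  # Genes listed as markers of more than one cluster are discarded
  shared <- unique(all_markers[duplicated(all_markers)])
  unique_only <- lapply(markers_per_cluster, function(g) setdiff(g, shared))
  # Keep at most number_top of the remaining markers per cluster
  lapply(unique_only, function(g) head(g, number_top))
}

mk <- list(c1 = c("A", "B", "C"), c2 = c("B", "D"), c3 = c("E", "F", "G"))
unique_markers_sketch(mk, number_top = 2)
# c1: "A" "C"; c2: "D"; c3: "E" "F"  ("B" is dropped because it marks c1 and c2)
```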

Author(s)

Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>

merge_cluster

Description

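Integrates the clusters in new_cluster into the cluster assignment old_cluster, optionally only those with fewer than max_number cells.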

Usage

merge_cluster(old_cluster, new_cluster, max_number = NULL)

Arguments

old_cluster

Original cluster assignment that needs to be updated.

new_cluster

New cluster assignment that needs to be integrated with old_cluster.

max_number

Size threshold for the clusters in new_cluster. Only clusters with fewer than max_number cells are integrated into old_cluster. If max_number is NULL, all the clusters in new_cluster are integrated into old_cluster.

Value

Numeric vector of length equal to old_cluster, with the merged cluster assignment of old_cluster and new_cluster.
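The size rule for max_number can be illustrated with a minimal base R sketch. The function name merge_cluster_sketch is hypothetical. Both vectors are assumed to be named by cell, and new_cluster is assumed to cover only a subset of the cells in old_cluster (for example, a sub-clustering). For simplicity the sketch returns character labels. The documented function returns a numeric vector, and its relabelling scheme is not described here.

```r
# Hypothetical sketch of merging a sub-clustering into an existing assignment.
# Assumptions: both vectors are named by cell; labels are kept as character.
merge_cluster_sketch <- function(old_cluster, new_cluster, max_number = NULL) {
  if (!is.null(max_number)) {
    sizes <- table(new_cluster)
    small <- names(sizes)[sizes < max_number]
    # Only clusters with fewer than max_number cells are integrated
    new_cluster <- new_cluster[as.character(new_cluster) %in% small]
  }
  merged <- as.character(old_cluster)  # as.character drops names, so restore them
  names(merged) <- names(old_cluster)
  # Overwrite the labels of the cells covered by the (filtered) new assignment
  merged[names(new_cluster)] <- as.character(new_cluster)
  merged
}

old <- setNames(c(1, 1, 1, 2, 2), paste0("cell", 1:5))
new <- setNames(c("1_a", "1_a", "1_b"), paste0("cell", 1:3))
merge_cluster_sketch(old, new, max_number = 2)  # only the 1-cell cluster "1_b" is merged
```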

Author(s)

Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>

plot_balloon_marker

Description

Balloon plot of the expression of the top marker genes of each cluster.

Usage

plot_balloon_marker(norm_counts, cluster, marker_complete, max_number, max_size = 5, text_size = 7)

Arguments

norm_counts

Norm count matrix (genes X cells).

cluster

Vector of length equal to the number of cells, with the cluster assignment.

marker_complete

Third element of the output list as returned by the function markers_cluster_seurat.

max_number

Integer. Maximum number of markers per cluster whose expression is plotted.

max_size

Integer. Size of the dots to be plotted.

text_size

Size of the text in the balloon plot.

Value

ggplot2 object showing the balloon plot.

Author(s)

Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>

plot_gene

Description

Cells are coloured according to the expression of gene_id and plotted according to coordinate_umap.

Usage

plot_gene(norm_counts, coordinate_umap, gene_id, title_name)

Arguments

norm_counts

Norm count matrix (genes X cells).

coordinate_umap

Data frame with dimensionality reduction coordinates. The number of rows must be equal to the number of cells.

gene_id

Character name of the gene.

title_name

Character. Title of the plot.

Value

ggplot2 object.

Author(s)

Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>

plot_genes_sum

Description

The expression of each gene in genes_relevant is first normalized so that its sum across all cells equals 1. Then, for each cell, the sum of the normalized expression of these genes is computed and shown in the output plot.

Usage

plot_genes_sum(coordinate_umap, norm_counts, genes_relevant, name_title)

Arguments

coordinate_umap

Data frame with dimensionality reduction coordinates. The number of rows must be equal to the number of cells.

norm_counts

Norm count matrix (genes X cells).

genes_relevant

Vector with the names of the genes whose summed expression is shown in each cell.

name_title

Character. Title of the plot.

Value

ggplot2 object.
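The per-cell score described above can be sketched in base R. The sketch covers the computation only and produces no plot. The function name genes_sum_sketch is hypothetical. It assumes every selected gene has a non-zero total expression; otherwise the division produces NaN.

```r
# Hypothetical sketch of the score shown by plot_genes_sum (computation only).
# Assumption: every gene in genes_relevant has non-zero total expression.
genes_sum_sketch <- function(norm_counts, genes_relevant) {
  sub <- norm_counts[genes_relevant, , drop = FALSE]
  # Scale each gene (row) so that its expression sums to 1 across all cells
  scaled <- sub / rowSums(sub)
  # Per-cell score: sum of the scaled expression of the selected genes
  colSums(scaled)
}

m <- rbind(g1 = c(1, 1, 2), g2 = c(0, 3, 1), g3 = c(5, 5, 5))
genes_sum_sketch(m, c("g1", "g2"))  # 0.25 1.00 0.75
```

Because each gene contributes a total of 1, the scores across all cells always sum to the number of selected genes. This prevents a single highly expressed gene from dominating the plot.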

Author(s)

Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>

plot_heatmap_marker

Description

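Heatmap of the expression of the marker genes returned by markers_cluster_seurat, using the cluster and condition of each cell.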

Usage

plot_heatmap_marker(marker_top, marker_all_cluster, cluster, condition, norm_counts, text_size)

Arguments

marker_top

First element returned by markers_cluster_seurat.

marker_all_cluster

Second element returned by markers_cluster_seurat.

cluster

Vector of length equal to the number of cells, with the cluster assignment.

condition

Vector of length equal to the number of cells, specifying the condition of the cells (e.g. batch, dataset of origin).

norm_counts

Norm count matrix (genes X cells).

text_size

Size of the text in the heatmap plot.

Value

Heatmap class object.

Author(s)

Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>

plot_interactive

Description

Shows, in an interactive plot, the highly localized genes in each cell. It is based on the plotly library.

Usage

plot_interactive(coordinate_umap, color, text, min_x = NULL, max_x = NULL, min_y = NULL, max_y = NULL)

Arguments

coordinate_umap

Data frame with dimensionality reduction coordinates. The number of rows must be equal to the number of cells.

color

Vector of length equal to the number of rows in coordinate_umap. Each cell is coloured along a gradient according to the corresponding value of this vector.

text

Character vector specifying the highly localized genes in each cell. It is the output from selection_localized_genes.

min_x

Set the min limit on the x axis.

max_x

Set the max limit on the x axis.

min_y

Set the min limit on the y axis.

max_y

Set the max limit on the y axis.

Value

plotly object created by the plot_ly function (from the plotly library).

Author(s)

Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>

plot_umap

Description

Plots the cells at the coordinates in coordinate_umap, coloured by cluster.

Usage

plot_umap(coordinate_umap, cluster)

Arguments

coordinate_umap

Data frame with dimensionality reduction coordinates. The number of rows must be equal to the number of cells.

cluster

Vector of length equal to the number of cells, with the cluster assignment.

Value

ggplot2 object.

Author(s)

Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>

Objects exported from other packages

Description

These objects are imported from other packages. See the documentation of the original package for details.

ggraph: guide_edge_colourbar

selection_localized_genes

Description

For each cell, selects the highly localized genes expressed in it, to be displayed with plot_interactive.

Usage

selection_localized_genes(norm_counts, localized_genes, min_number_cells = 4, max_number_genes = 10)

Arguments

norm_counts

Norm count matrix (genes X cells).

localized_genes

Vector of highly localized genes, as provided by the last element of the list returned by CIARA_mixing_final.

min_number_cells

Minimum number of cells in which a gene must be expressed (> 0).

max_number_genes

Maximum number of genes to show for each cell in the interactive plot from plot_interactive.

Value

Character vector where each entry contains the names of the top max_number_genes genes for the corresponding cell.
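A possible per-cell selection is sketched below in base R. The ranking criterion is not stated above, so the sketch rests on several assumptions. Genes are ranked within each cell by their normalized expression. Only genes with expression > 0 in that cell are reported. Gene names are joined with ", ". The function name select_localized_sketch is hypothetical.

```r
# Hypothetical sketch of selection_localized_genes (not the package source).
# Assumptions: rank by expression within each cell, report only genes > 0,
# join the selected names with ", ".
select_localized_sketch <- function(norm_counts, localized_genes,
                                    min_number_cells = 4, max_number_genes = 10) {
  sub <- norm_counts[localized_genes, , drop = FALSE]
  # Keep genes expressed (> 0) in at least min_number_cells cells
  sub <- sub[rowSums(sub > 0) >= min_number_cells, , drop = FALSE]
  # For each cell (column), list its most highly expressed localized genes
  apply(sub, 2, function(expr) {
    expressed <- expr[expr > 0]
    top <- head(names(sort(expressed, decreasing = TRUE)), max_number_genes)
    paste(top, collapse = ", ")
  })
}

m <- rbind(g1 = c(3, 1, 2, 1, 0),
           g2 = c(1, 2, 1, 3, 2),
           g3 = c(0, 0, 5, 0, 0))  # g3 is expressed in only 1 cell and is dropped
select_localized_sketch(m, c("g1", "g2", "g3"), max_number_genes = 2)
```

The resulting strings can be passed as the text argument of plot_interactive, which shows them as hover labels.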

Author(s)

Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>

test_hvg

Description

For each cluster in cluster, highly variable genes (HVGs) are defined with the Seurat function FindVariableFeatures. A Fisher test is performed to assess whether there is a statistically significant enrichment of localized_genes among the top number_hvg HVGs.

Usage

test_hvg(raw_counts, cluster, localized_genes, background, number_hvg, min_p_value)

Arguments

raw_counts

Raw count matrix (n_genes X n_cells).

cluster

Vector of length equal to the number of cells, with the cluster assignment.

localized_genes

Character vector with localized genes detected by CIARA.

background

Character vector with all the gene names to use as background for the Fisher test.

number_hvg

Integer value. Number of top HVGs provided by the Seurat function FindVariableFeatures.

min_p_value

Threshold on the p values provided by the Fisher test.

Value

A list with two elements.

first element

The first one is a list with length equal to the number of clusters. Each entry is a list of three elements: the first two contain the p value and the odds ratio given by the Fisher test, and the third is a vector with the names of the genes present both in localized_genes and in the top number_hvg HVGs.

second element

A character vector with the names of the clusters that have a p value smaller than min_p_value.
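For one cluster, the enrichment test can be sketched in base R, given the list of top HVGs. In the package the HVGs come from FindVariableFeatures; here they are passed in directly. The sketch also rests on two assumptions. The 2 x 2 table is built over the background genes (HVG yes/no versus localized yes/no). The test is assumed to be one-sided (alternative = "greater"). The function name hvg_enrichment_sketch is hypothetical.

```r
# Hypothetical sketch of the per-cluster Fisher test in test_hvg.
# Assumptions: HVGs are given; one-sided test; table built over background genes.
hvg_enrichment_sketch <- function(hvg, localized_genes, background) {
  hvg <- intersect(hvg, background)
  localized_genes <- intersect(localized_genes, background)
  in_hvg <- background %in% hvg
  in_loc <- background %in% localized_genes
  # 2 x 2 contingency table: HVG (TRUE/FALSE) by localized (TRUE/FALSE)
  tab <- table(factor(in_hvg, levels = c(TRUE, FALSE)),
               factor(in_loc, levels = c(TRUE, FALSE)))
  test <- fisher.test(tab, alternative = "greater")
  # Mirrors the three elements of each per-cluster entry described in Value
  list(p_value = test$p.value,
       odds_ratio = unname(test$estimate),
       common_genes = intersect(hvg, localized_genes))
}

bg <- paste0("gene", 1:100)
loc <- paste0("gene", 1:10)
hvg_enrichment_sketch(paste0("gene", 1:8), loc, bg)  # all 8 HVGs are localized
```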

Author(s)

Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>

white_black_markers

Description

A white-black marker is a gene whose median expression across the cells belonging to single_cluster is greater than threshold, and equal to zero in all the other clusters.

Usage

white_black_markers(cluster, single_cluster, norm_counts, marker_list, threshold = 0)

Arguments

cluster

Vector of length equal to the number of cells, with the cluster assignment.

single_cluster

Character. Label of one specific cluster.

norm_counts

Norm count matrix (genes X cells).

marker_list

Third element of the output list as returned by the function markers_cluster_seurat.

threshold

Numeric. The median expression of a gene across the cells belonging to single_cluster must be greater than threshold for the gene to be considered a white-black marker for single_cluster.

Value

Logical vector of length equal to marker_list, with TRUE/FALSE if the gene is/is not a white-black marker for single_cluster.
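The white-black criterion from the Description can be sketched in base R as below. The function name white_black_sketch is hypothetical. It takes marker_list as a plain character vector of gene names, which is an assumption about the structure of the third element of markers_cluster_seurat.

```r
# Hypothetical sketch of the white-black marker criterion (not the package source).
# Assumption: marker_list is a character vector of gene names (rows of norm_counts).
white_black_sketch <- function(cluster, single_cluster, norm_counts,
                               marker_list, threshold = 0) {
  in_cluster <- cluster == single_cluster
  other_clusters <- setdiff(unique(cluster), single_cluster)
  vapply(marker_list, function(gene) {
    expr <- norm_counts[gene, ]
    # Median inside single_cluster must exceed threshold...
    inside_ok <- median(expr[in_cluster]) > threshold
    # ...and the median must be exactly zero in every other cluster
    outside_ok <- all(vapply(other_clusters,
                             function(cl) median(expr[cluster == cl]) == 0,
                             logical(1)))
    inside_ok && outside_ok
  }, logical(1))
}

m <- rbind(A = c(3, 4, 0, 0, 0, 0),   # white-black marker for "x"
           B = c(3, 4, 2, 2, 0, 0),   # also expressed in "y"
           C = c(0, 0, 1, 1, 0, 0))   # not expressed in "x"
white_black_sketch(c("x", "x", "y", "y", "z", "z"), "x", m, c("A", "B", "C"))
```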

Author(s)

Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>
