| Type: | Package |
| Title: | Cluster Independent Algorithm for Rare Cell Types Identification |
| Version: | 0.1.0 |
| Author: | Gabriele Lubatti |
| Maintainer: | Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de> |
| Description: | Identification of markers of rare cell types by looking at genes whose expression is confined in small regions of the expression space <https://github.com/ScialdoneLab>. |
| License: | Artistic-2.0 |
| Depends: | R (≥ 4.0) |
| Imports: | Biobase, ggplot2, ggraph, magrittr |
| Suggests: | circlize, clustree, ComplexHeatmap, plotly, Seurat (≥ 4.0), testthat, knitr, rmarkdown |
| biocViews: | software |
| Config/testthat/edition: | 3 |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.1.1 |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2022-02-22 11:07:19 UTC; gabriele.lubatti |
| Repository: | CRAN |
| Date/Publication: | 2022-02-22 20:00:02 UTC |
CIARA
Description
It selects highly localized genes, as specified in CIARA_gene, starting from the genes in background.
Usage
CIARA(norm_matrix, knn_matrix, background, cores_number = 1, p_value = 0.001, odds_ratio = 2, local_region = 1, approximation = FALSE)
Arguments
norm_matrix | Normalized count matrix (n_genes X n_cells). |
knn_matrix | K-nearest neighbors matrix (n_cells X n_cells). |
background | Vector of genes for which the function CIARA_gene is run. |
cores_number | Integer. Number of cores to use. |
p_value | Threshold on the p value returned by the function fisher.test with parameter alternative = "g". |
odds_ratio | Threshold on the odds ratio returned by the function fisher.test with parameter alternative = "g". |
local_region | Integer. Minimum number of local regions (a cell with its knn neighbours) where the binarized gene expression is enriched in 1. |
approximation | Logical. If TRUE, for a given gene the Fisher test is run only in the local regions of the cells where the binarized gene expression is 1. |
Value
Data frame with n_rows equal to the length of background. Each row is the output from CIARA_gene.
Author(s)
Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>
CIARA_gene
Description
The gene expression is binarized (1/0) according to whether the value in a given cell is above/below the median. Each cell, together with its first K nearest neighbours, defines a local region. If at least local_region local regions are enriched in 1 according to fisher.test, the gene is defined as highly localized and a final p value is assigned to it. The final p value is the minimum of the p values from all the enriched local regions. Otherwise, the p value is set to 1 by default.
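The procedure above can be sketched in base R as follows. This is a hypothetical illustration, not CIARA's own code: `ciara_gene_sketch` is a made-up name, it assumes `knn_matrix` is a 0/1 matrix whose row i marks the K nearest neighbours of cell i, and it ignores `approximation` (every cell's local region is tested).

```r
# Hypothetical sketch (not the package's implementation) of the local-region test.
# Assumption: knn_matrix is a 0/1 n_cells x n_cells matrix; row i flags the
# K nearest neighbours of cell i.
ciara_gene_sketch <- function(knn_matrix, gene_expression, p_value = 0.001,
                              odds_ratio = 2, local_region = 1) {
  # Binarize: 1 where the expression is above the median, 0 otherwise
  binary <- as.integer(gene_expression > stats::median(gene_expression))
  n_cells <- length(binary)
  p_vals <- numeric(n_cells)
  odds <- numeric(n_cells)
  for (i in seq_len(n_cells)) {
    # Local region: cell i together with its K nearest neighbours
    in_region <- knn_matrix[i, ] == 1
    in_region[i] <- TRUE
    # 2 x 2 table: rows = inside/outside the region, columns = binarized 1/0
    tab <- matrix(c(sum(binary[in_region] == 1), sum(binary[!in_region] == 1),
                    sum(binary[in_region] == 0), sum(binary[!in_region] == 0)),
                  nrow = 2)
    ft <- stats::fisher.test(tab, alternative = "greater")
    p_vals[i] <- ft$p.value
    odds[i] <- ft$estimate
  }
  # A region counts as enriched in 1 if it passes both thresholds
  enriched <- p_vals < p_value & odds > odds_ratio
  # Highly localized gene: at least local_region enriched regions
  if (sum(enriched) >= local_region) min(p_vals[enriched]) else 1
}
```

For instance, in a 20-cell toy graph with K = 4 where cells 1 to 5 are mutual neighbours, a gene expressed only in those five cells gets a p value of 1/choose(20, 5), about 6.5e-5.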
Usage
CIARA_gene(norm_matrix, knn_matrix, gene_expression, p_value = 0.001, odds_ratio = 2, local_region = 1, approximation = FALSE)
Arguments
norm_matrix | Normalized count matrix (n_genes X n_cells). |
knn_matrix | K-nearest neighbors matrix (n_cells X n_cells). |
gene_expression | Numeric vector with the gene expression (length equal to n_cells). The gene expression is binarized (equal to 0/1 in the cells where the value is below/above the median). |
p_value | Threshold on the p value returned by the function fisher.test with parameter alternative = "g". |
odds_ratio | Threshold on the odds ratio returned by the function fisher.test with parameter alternative = "g". |
local_region | Integer. Minimum number of local regions (a cell with its knn neighbours) where the binarized gene expression is enriched in 1. |
approximation | Logical. If TRUE, for a given gene the Fisher test is run only in the local regions of the cells where the binarized gene expression is 1. |
Value
List with one element corresponding to the p value of the gene.
Author(s)
Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>
See Also
https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/fisher.test
cluster_analysis_integrate_rare
Description
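Seurat clustering analysis of raw_counts: the counts are normalized, PCA is run with RunPCA, the neighbour graph is built with FindNeighbors, the cells are clustered with FindClusters and UMAP coordinates are computed.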
Usage
cluster_analysis_integrate_rare(raw_counts, project_name, resolution, neighbors, max_dimension, feature_genes = NULL)
Arguments
raw_counts | Raw count matrix (n_genes X n_cells). |
project_name | Character name of the Seurat project. |
resolution | Numeric value specifying the parameter resolution used in the Seurat function FindClusters. |
neighbors | Numeric value specifying the parameter k.param in the Seurat function FindNeighbors. |
max_dimension | Numeric value specifying the maximum number of PCA dimensions used in the parameter dims of the Seurat function FindNeighbors. |
feature_genes | Vector of features specifying the argument features in the Seurat function RunPCA. |
Value
Seurat object including raw and normalized counts matrices, UMAP coordinates and cluster result.
Author(s)
Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>
See Also
https://www.rdocumentation.org/packages/Seurat/versions/4.0.1/topics/FindClusters
https://www.rdocumentation.org/packages/Seurat/versions/4.0.1/topics/FindNeighbors
https://www.rdocumentation.org/packages/Seurat/versions/4.0.1/topics/RunPCA
cluster_analysis_sub
Description
Seurat clustering analysis of raw_counts, used to sub-cluster the cells of the original cluster name_cluster.
Usage
cluster_analysis_sub(raw_counts, resolution, neighbors, max_dimension, name_cluster)
Arguments
raw_counts | Raw count matrix (n_genes X n_cells). |
resolution | Numeric value specifying the parameter resolution used in the Seurat function FindClusters. |
neighbors | Numeric value specifying the parameter k.param in the Seurat function FindNeighbors. |
max_dimension | Numeric value specifying the maximum number of PCA dimensions used in the parameter dims of the Seurat function FindNeighbors. |
name_cluster | Character. Name of the original cluster for which the subclustering is done. |
Value
Seurat object including raw and normalized counts matrices and cluster result.
Author(s)
Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>
See Also
https://www.rdocumentation.org/packages/Seurat/versions/4.0.1/topics/RunPCA
https://www.rdocumentation.org/packages/Seurat/versions/4.0.1/topics/FindVariableFeatures
find_resolution
Description
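Runs the Seurat function FindClusters on seurat_object for each value in resolution_vector and shows, with clustree, how the clusters obtained at the different resolutions are connected.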
Usage
find_resolution(seurat_object, resolution_vector)
Arguments
seurat_object | Seurat object as returned by cluster_analysis_integrate_rare. |
resolution_vector | Vector with all the values of resolution for which the Seurat function FindClusters is run. |
Value
Clustree object showing the connections between the clusters obtained at the different levels of resolution specified in resolution_vector.
Author(s)
Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>
See Also
https://CRAN.R-project.org/package=clustree
get_background_full
Description
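Selects the genes expressed above threshold in a number of cells between n_cells_low and n_cells_high. These genes can be used as background for CIARA.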
Usage
get_background_full(norm_matrix, threshold = 1, n_cells_low = 3, n_cells_high = 20)
Arguments
norm_matrix | Normalized count matrix (n_genes X n_cells). |
threshold | Expression threshold for a given gene. |
n_cells_low | Minimum number of cells where a gene is expressed at a level above threshold. |
n_cells_high | Maximum number of cells where a gene is expressed at a level above threshold. |
Value
Character vector with all the genes expressed at a level higher than threshold in a number of cells between n_cells_low and n_cells_high.
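The selection above amounts to counting, for each gene, the cells whose expression exceeds threshold. The sketch below is hypothetical (`get_background_sketch` is not a CIARA function) and assumes that both bounds are inclusive, which the documentation does not state.

```r
# Hypothetical sketch of the background selection (not the package's code).
# Assumption: n_cells_low and n_cells_high are inclusive bounds.
get_background_sketch <- function(norm_matrix, threshold = 1,
                                  n_cells_low = 3, n_cells_high = 20) {
  # Number of cells in which each gene is expressed above threshold
  n_expressed <- rowSums(norm_matrix > threshold)
  keep <- n_expressed >= n_cells_low & n_expressed <= n_cells_high
  rownames(norm_matrix)[keep]
}
```

Genes seen in too few cells (likely noise) or in too many cells (unlikely to mark a rare population) are both excluded.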
Author(s)
Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>
markers_cluster_seurat
Description
The Seurat function FindMarkers is used to identify general markers for each cluster (one cluster vs all the other clusters). This list of markers is then filtered, keeping only the genes that are markers of a single cluster.
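The filtering step can be illustrated in base R. This hypothetical helper (`keep_unique_markers`, not a CIARA function) does not call FindMarkers: it assumes the markers are already available as a named list with one character vector of genes per cluster, and drops every gene that is listed for more than one cluster.

```r
# Hypothetical sketch of the "unique markers" filter only (no Seurat call).
# Assumption: markers is a named list, one character vector per cluster.
keep_unique_markers <- function(markers) {
  # Pool the genes of all clusters, counting each gene once per cluster
  all_genes <- unlist(lapply(markers, unique), use.names = FALSE)
  # Genes that are markers of more than one cluster
  shared <- unique(all_genes[duplicated(all_genes)])
  lapply(markers, function(genes) setdiff(genes, shared))
}
```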
Usage
markers_cluster_seurat(seurat_object, cluster, cell_names, number_top)
Arguments
seurat_object | Seurat object as returned by cluster_analysis_sub or by cluster_analysis_integrate_rare. |
cluster | Vector of length equal to the number of cells, with the cluster assignment. |
cell_names | Vector of length equal to the number of cells, with the cell names. |
number_top | Integer. Number of top marker genes to keep for each cluster. |
Value
List of three elements. The first is a vector with the number_top marker genes of each cluster. The second is a vector with the number_top marker genes and the corresponding cluster. The third is a vector with all the marker genes of each cluster.
Author(s)
Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>
See Also
https://www.rdocumentation.org/packages/Seurat/versions/4.0.1/topics/FindMarkers
merge_cluster
Description
Merges the cluster assignment new_cluster into old_cluster.
Usage
merge_cluster(old_cluster, new_cluster, max_number = NULL)
Arguments
old_cluster | Original cluster assignment that needs to be updated. |
new_cluster | New cluster assignment that needs to be integrated into old_cluster. |
max_number | Size threshold for the clusters in new_cluster. Only clusters with fewer cells than max_number are integrated into old_cluster. If max_number is NULL, all the clusters in new_cluster are integrated into old_cluster. |
Value
Numeric vector of length equal to old_cluster, with the cluster assignment obtained by merging old_cluster and new_cluster.
Author(s)
Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>
plot_balloon_marker
Description
Balloon plot of the expression of the marker genes (at most max_number per cluster) in each cluster.
Usage
plot_balloon_marker(norm_counts, cluster, marker_complete, max_number, max_size = 5, text_size = 7)
Arguments
norm_counts | Normalized count matrix (genes X cells). |
cluster | Vector of length equal to the number of cells, with the cluster assignment. |
marker_complete | Third element of the output list as returned by the function markers_cluster_seurat. |
max_number | Integer. Maximum number of markers for each cluster for which the expression is plotted. |
max_size | Integer. Size of the dots to be plotted. |
text_size | Size of the text in the plot. |
Value
ggplot2 object showing the balloon plot.
Author(s)
Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>
plot_gene
Description
Cells are coloured according to the expression of gene_id and plotted according to coordinate_umap.
Usage
plot_gene(norm_counts, coordinate_umap, gene_id, title_name)
Arguments
norm_counts | Normalized count matrix (genes X cells). |
coordinate_umap | Data frame with dimensionality reduction coordinates. The number of rows must be equal to the number of cells. |
gene_id | Character name of the gene. |
title_name | Character. Title of the plot. |
Value
ggplot2 object.
Author(s)
Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>
See Also
https://CRAN.R-project.org/package=ggplot2
plot_genes_sum
Description
The expression of each gene in genes_relevant is first normalized so that its sum across all cells is 1. Then, for each cell, the sum of the normalized expression of these genes is computed and shown in the output plot.
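The score shown in the plot can be computed as below. `genes_sum_score` is a hypothetical helper that skips the ggplot2 step; it assumes gene names are the row names of norm_counts and that every gene in genes_relevant has non-zero total expression (otherwise the normalization divides by zero).

```r
# Hypothetical sketch of the per-cell score (plotting omitted).
# Assumption: every gene in genes_relevant has non-zero total expression.
genes_sum_score <- function(norm_counts, genes_relevant) {
  sub <- norm_counts[genes_relevant, , drop = FALSE]
  # Normalize each gene so that its expression sums to 1 across all cells
  sub <- sub / rowSums(sub)
  # Per-cell sum of the normalized expression of the selected genes
  colSums(sub)
}
```

Because each gene contributes a total of 1 across all cells, the scores always sum to length(genes_relevant), so a single very highly expressed gene cannot dominate the signal.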
Usage
plot_genes_sum(coordinate_umap, norm_counts, genes_relevant, name_title)
Arguments
coordinate_umap | Data frame with dimensionality reduction coordinates. The number of rows must be equal to the number of cells. |
norm_counts | Normalized count matrix (genes X cells). |
genes_relevant | Vector with the names of the genes whose summed expression is visualized in each cell. |
name_title | Character. Title of the plot. |
Value
ggplot2 object.
Author(s)
Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>
See Also
https://CRAN.R-project.org/package=ggplot2
plot_heatmap_marker
Description
Heatmap of the expression of the top marker genes of each cluster, as returned by markers_cluster_seurat.
Usage
plot_heatmap_marker(marker_top, marker_all_cluster, cluster, condition, norm_counts, text_size)
Arguments
marker_top | First element returned by markers_cluster_seurat. |
marker_all_cluster | Second element returned by markers_cluster_seurat. |
cluster | Vector of length equal to the number of cells, with the cluster assignment. |
condition | Vector of length equal to the number of cells, specifying the condition of the cells (e.g. batch, dataset of origin). |
norm_counts | Normalized count matrix (genes X cells). |
text_size | Size of the text in the heatmap plot. |
Value
Heatmap class object.
Author(s)
Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>
See Also
https://www.rdocumentation.org/packages/ComplexHeatmap/versions/1.10.2/topics/Heatmap
plot_interactive
Description
Interactive plot showing the highly localized genes in each cell. It is based on the plotly library.
Usage
plot_interactive(coordinate_umap, color, text, min_x = NULL, max_x = NULL, min_y = NULL, max_y = NULL)
Arguments
coordinate_umap | Data frame with dimensionality reduction coordinates. The number of rows must be equal to the number of cells. |
color | Vector of length equal to the number of rows in coordinate_umap. Each cell is coloured following a gradient according to the corresponding value of this vector. |
text | Character vector specifying the highly localized genes in each cell. It is the output from selection_localized_genes. |
min_x | Set the min limit on the x axis. |
max_x | Set the max limit on the x axis. |
min_y | Set the min limit on the y axis. |
max_y | Set the max limit on the y axis. |
Value
plotly object given by the plot_ly function (from the plotly library).
Author(s)
Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>
plot_umap
Description
Plots the cells in the coordinates given by coordinate_umap, coloured according to cluster.
Usage
plot_umap(coordinate_umap, cluster)
Arguments
coordinate_umap | Data frame with dimensionality reduction coordinates. The number of rows must be equal to the number of cells. |
cluster | Vector of length equal to the number of cells, with the cluster assignment. |
Value
ggplot2 object.
Author(s)
Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>
See Also
https://CRAN.R-project.org/package=ggplot2
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- ggraph
selection_localized_genes
Description
For each cell, selects the highly localized genes to be shown in the interactive plot from plot_interactive.
Usage
selection_localized_genes(norm_counts, localized_genes, min_number_cells = 4, max_number_genes = 10)
Arguments
norm_counts | Normalized count matrix (genes X cells). |
localized_genes | Vector of highly localized genes as provided by the last element of the list given as output from CIARA_mixing_final. |
min_number_cells | Minimum number of cells where a gene must be expressed (> 0). |
max_number_genes | Maximum number of genes to show for each cell in the interactive plot from plot_interactive. |
Value
Character vector where each entry contains the names of the top max_number_genes genes for the corresponding cell.
Author(s)
Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>
test_hvg
Description
For each cluster in cluster, highly variable genes (HVGs) are defined with the Seurat function FindVariableFeatures. A Fisher test is then performed to assess whether localized_genes are significantly enriched among the top number_hvg HVGs.
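For a single cluster, the test can be sketched as a 2 x 2 Fisher test over the background genes. `hvg_overlap_test` is hypothetical: it takes the top HVGs as an input instead of calling FindVariableFeatures, and the one-sided alternative = "greater" is an assumption. The returned list mirrors the three entries described in Value (p value, odds ratio, shared genes).

```r
# Hypothetical sketch of the HVG / localized-gene overlap test for one cluster.
# Assumptions: hvg and localized_genes are subsets of background; the test is
# one-sided (alternative = "greater").
hvg_overlap_test <- function(hvg, localized_genes, background) {
  in_hvg <- background %in% hvg
  in_loc <- background %in% localized_genes
  # 2 x 2 table: rows = HVG yes/no, columns = localized yes/no
  tab <- matrix(c(sum(in_hvg & in_loc), sum(!in_hvg & in_loc),
                  sum(in_hvg & !in_loc), sum(!in_hvg & !in_loc)), nrow = 2)
  ft <- stats::fisher.test(tab, alternative = "greater")
  list(p_value = ft$p.value,
       odds_ratio = unname(ft$estimate),
       common_genes = intersect(hvg, localized_genes))
}
```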
Usage
test_hvg(raw_counts, cluster, localized_genes, background, number_hvg, min_p_value)
Arguments
raw_counts | Raw count matrix (n_genes X n_cells). |
cluster | Vector of length equal to the number of cells, with the cluster assignment. |
localized_genes | Character vector with localized genes detected by CIARA. |
background | Character vector with all the gene names to use as background for the Fisher test. |
number_hvg | Integer value. Number of top HVGs provided by the Seurat function FindVariableFeatures. |
min_p_value | Threshold on p values provided by Fisher test. |
Value
A list with two elements.
first element | A list with length equal to the number of clusters. Each entry is a list of three elements. The first two elements contain the p value and the odds ratio given by the Fisher test. The third is a vector with the names of the genes that are present both in localized_genes and in the top number_hvg HVGs. |
second element | A character vector with the names of the clusters that have a p value smaller than min_p_value. |
Author(s)
Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>
See Also
https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/fisher.test
white_black_markers
Description
A white-black marker is a gene whose median expression across the cells belonging to single_cluster is greater than threshold and whose median expression in all the other clusters is equal to zero.
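The rule can be written directly with per-cluster medians. `white_black_sketch` is a hypothetical re-implementation of the definition above, not the package's code; it assumes gene names are the row names of norm_counts and that its columns are in the same order as cluster.

```r
# Hypothetical sketch of the white-black marker rule (not the package's code).
# Assumption: rownames(norm_counts) are gene names; columns follow `cluster`.
white_black_sketch <- function(cluster, single_cluster, norm_counts,
                               marker_list, threshold = 0) {
  vapply(marker_list, function(gene) {
    expr <- norm_counts[gene, ]
    # Median expression of the gene within each cluster
    med <- tapply(expr, cluster, stats::median)
    others <- med[names(med) != single_cluster]
    # Above threshold in single_cluster, zero median everywhere else
    med[[single_cluster]] > threshold && all(others == 0)
  }, logical(1))
}
```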
Usage
white_black_markers(cluster, single_cluster, norm_counts, marker_list, threshold = 0)
Arguments
cluster | Vector of length equal to the number of cells, with the cluster assignment. |
single_cluster | Character. Label of one specific cluster. |
norm_counts | Normalized count matrix (genes X cells). |
marker_list | Third element of the output list as returned by the function markers_cluster_seurat. |
threshold | Numeric. The median expression of a gene across the cells belonging to single_cluster has to be greater than threshold for the gene to be considered a white-black marker for single_cluster. |
Value
Logical vector of length equal to marker_list, with TRUE/FALSE if the gene is/is not a white-black marker for single_cluster.
Author(s)
Gabriele Lubatti <gabriele.lubatti@helmholtz-muenchen.de>