| Title: | A Bootstrap-Based Power Estimation Tool for Spatial Transcriptomics |
| Version: | 0.1.2 |
| Imports: | scam, dplyr, resample, xgboost, magrittr, ggplot2 |
| Suggests: | patchwork, boot, fields, rayrender, tidyr, plotly, rayshader, Seurat, knitr, rmarkdown |
| Author: | Lan Shui |
| Maintainer: | Lan Shui <Lan.Shui@uth.tmc.edu> |
| Description: | Power estimation and sample size calculation for 10X Visium Spatial Transcriptomics data, based on bootstrap resampling, to detect differentially expressed genes between two conditions. See Shui et al. (2025) <doi:10.1371/journal.pcbi.1013293> for method details. |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| VignetteBuilder: | knitr |
| License: | MIT + file LICENSE |
| Depends: | R (≥ 2.10) |
| LazyData: | true |
| NeedsCompilation: | no |
| Packaged: | 2025-12-07 03:12:22 UTC; shuilan |
| Repository: | CRAN |
| Date/Publication: | 2025-12-09 12:00:07 UTC |
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs | A value or the magrittr placeholder. |
rhs | A function call using the magrittr semantics. |
Value
The result of calling rhs(lhs).
Bootstrap resampling and power calculation upon ST data
Description
This function performs bootstrap resampling on a Seurat object under each condition to resemble the real dataset, which allows the exact power calculation, and then performs DE analysis. Users can specify the test for the DE analysis in '...'. The '...' arguments should not contain min.pct, logfc.threshold, or any other parameter that pre-filters genes, because the function sets min.pct and logfc.threshold to 0 so that power is calculated for every available gene. As a result, a run may take overnight when the ST data contain thousands of genes. To speed this up, try 'PoweREST_subset', which includes gene pre-filtering.
Usage
PoweREST(Seurat_obj,cond,replicates=1,spots_num,iteration=100,random_seed=1,pvalue=0.05,...)
Arguments
Seurat_obj | A Seurat object. |
cond | The name of the variable that indicates the conditions. It is stored in the meta.data of the Seurat_obj and should be of character type. |
replicates | The number of sample replicates per group. |
spots_num | The number of spots per replicate. |
iteration | The number of iterations of the resampling. |
random_seed | To set a random seed. |
pvalue | The pvalue that will be considered significant. |
... | DE test to use other than the default Wilcoxon test. |
Value
A list containing the power, the average log2FC and the percentage of spots detecting each gene among the resampled data, the number of replicates and the number of spots per slice specified by the user, and the corresponding gene names.
Author(s)
Lan Shui lshui@mdanderson.org
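The resampling idea described for 'PoweREST' can be illustrated for a single gene in base R. This is a minimal sketch on simulated data, not the package's implementation: the helper `bootstrap_power`, the simulated vectors, and the choice to pool `replicates * spots_num` spots per condition are all assumptions made for illustration.

```r
# Simulated expression of one gene in two conditions
# (hypothetical data; not produced by PoweREST).
set.seed(1)
expr_a <- rnorm(500, mean = 1.0, sd = 1)  # spots from condition A
expr_b <- rnorm(500, mean = 1.3, sd = 1)  # spots from condition B

# Estimate power as the share of bootstrap iterations in which the
# Wilcoxon rank-sum test rejects at level `pvalue`.
# Assumption: replicates are pooled, so each condition contributes
# replicates * spots_num resampled spots per iteration.
bootstrap_power <- function(a, b, replicates = 1, spots_num = 80,
                            iteration = 100, pvalue = 0.05) {
  n <- replicates * spots_num
  rejected <- vapply(seq_len(iteration), function(i) {
    boot_a <- sample(a, n, replace = TRUE)
    boot_b <- sample(b, n, replace = TRUE)
    wilcox.test(boot_a, boot_b)$p.value < pvalue
  }, logical(1))
  mean(rejected)
}

power_1 <- bootstrap_power(expr_a, expr_b, replicates = 1)
power_5 <- bootstrap_power(expr_a, expr_b, replicates = 5)
```

Comparing `power_1` with `power_5` shows how the estimated power responds to the number of replicates under these assumptions.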
Bootstrap resampling and power estimation for one single gene
Description
This function performs bootstrap resampling on a Seurat object under each condition to resemble the real dataset, which allows the exact power calculation, and then performs DE analysis on one gene specified by the user. Users can specify the test for the DE analysis in '...'. Note that the results are not corrected for multiple testing and should therefore be interpreted carefully.
Usage
PoweREST_gene(Seurat_obj,cond,replicates=1,spots_num,gene_name,iteration=100,random_seed=1,pvalue=0.05,...)
Arguments
Seurat_obj | A Seurat object. |
cond | The name of the variable that indicates the conditions. It is stored in the meta.data of the Seurat_obj and should be of character type. |
replicates | The number of sample replicates per group. |
spots_num | The number of spots per replicate. |
gene_name | Specify the name of gene for power calculation. |
iteration | The number of iterations of the resampling. |
random_seed | To set a random seed. |
pvalue | The pvalue that will be considered significant. |
... | DE test to use other than the default Wilcoxon test. |
Value
A list containing the power, the average log2FC and the percentage of spots detecting the gene among the resampled data, the number of replicates and the number of spots per slice specified by the user, and the corresponding gene's name.
Author(s)
Lan Shui lshui@mdanderson.org
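Because single-gene results are not corrected for multiple testing, a user who runs 'PoweREST_gene' over many genes may want to see how much a correction changes the picture. The sketch below uses base R's `p.adjust` on hypothetical p-values; the vector `p_raw` is simulated and the Benjamini-Hochberg method is only one possible choice.

```r
# Hypothetical raw p-values for 100 genes: 95 null genes and 5 with strong signal.
set.seed(1)
p_raw <- c(runif(95), runif(5, min = 0, max = 0.001))

# Benjamini-Hochberg adjustment controls the false discovery rate.
p_bh <- p.adjust(p_raw, method = "BH")

n_raw <- sum(p_raw < 0.05)  # genes called significant before correction
n_bh  <- sum(p_bh < 0.05)   # genes called significant after correction
```

Adjusted p-values are never smaller than the raw ones, so `n_bh` cannot exceed `n_raw`.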
Bootstrap resampling and power calculation for a subset of genes
Description
This function performs bootstrap resampling on a Seurat object under each condition to resemble the real dataset, which allows the exact power calculation, and then performs DE analysis. As with 'PoweREST', users can specify the test for the DE analysis in '...' (see Seurat for more test options). Unlike 'PoweREST', users can set 'min.pct' and 'logfc.threshold' to pre-filter genes by their minimum detection rate ('min.pct') and by requiring at least an X-fold difference on the log scale ('logfc.threshold') between the two groups. This kind of filtering can miss weaker signals.
Usage
PoweREST_subset(Seurat_obj,cond,replicates=1,spots_num,iteration=100,random_seed=1,pvalue=0.05,logfc.threshold = 0.1,min.pct = 0.01,...)
Arguments
Seurat_obj | A Seurat object. |
cond | The name of the variable that indicates the conditions. It is stored in the meta.data of the Seurat_obj and should be of character type. |
replicates | The number of sample replicates per group. |
spots_num | The number of spots per replicate. |
iteration | The number of iterations of the resampling. |
random_seed | To set a random seed. |
pvalue | The pvalue that will be considered significant. |
logfc.threshold | For every resampling, limit testing to genes that show, on average, at least an X-fold difference (log scale) between the two groups. Default is 0.1. Increasing logfc.threshold speeds up the function but can miss weaker signals. |
min.pct | For every resampling, only test genes that are detected in at least a fraction min.pct of spots in either of the two populations. Meant to speed up the function by not testing genes that are very infrequently expressed. Default is 0.01. |
... | DE test to use other than the default Wilcoxon test. |
Value
A list of values containing the power, average log2FC and percentage of spots detecting the gene among the resampling data, the replicate value and the spots number per slice specified by the user and the filtered.
Author(s)
Lan Shui lshui@mdanderson.org
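The effect of 'min.pct' and 'logfc.threshold' can be sketched on a simulated count matrix. Everything here is an assumption for illustration: the matrices `counts_a` and `counts_b` are simulated, and the log2 fold change uses a simple pseudocount formula that may differ from the one Seurat applies.

```r
set.seed(1)
n_genes <- 200
n_spots <- 100
min.pct <- 0.01
logfc.threshold <- 0.1

# Hypothetical counts: rows are genes, columns are spots.
# The lambda vectors are recycled down the columns, so row i uses rate i.
rate_a <- rexp(n_genes, rate = 2)
rate_b <- rate_a * sample(c(1, 1.5), n_genes, replace = TRUE)
counts_a <- matrix(rpois(n_genes * n_spots, lambda = rate_a), nrow = n_genes)
counts_b <- matrix(rpois(n_genes * n_spots, lambda = rate_b), nrow = n_genes)

# Detection rate per gene in each group.
pct_a <- rowMeans(counts_a > 0)
pct_b <- rowMeans(counts_b > 0)

# Simplified log2 fold change with a pseudocount of 1 (an assumption).
log2fc <- log2((rowMeans(counts_b) + 1) / (rowMeans(counts_a) + 1))

# Keep genes detected often enough in either group and with a large enough change.
keep <- pmax(pct_a, pct_b) >= min.pct & abs(log2fc) >= logfc.threshold
n_kept <- sum(keep)
```

Raising either threshold shrinks `n_kept`, which is why filtering speeds up testing at the cost of dropping genes with weaker signals.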
Fit with XGBoost
Description
This function estimates the power values with XGBoost under 3-dimensional monotone constraints on avg_log2FC, avg_PCT and replicates. It is recommended when the power surfaces fitted by 'fit_powerest' cross each other, and it is used for estimating local power values.
Usage
fit_XGBoost(power,avg_log2FC,avg_PCT,replicates,filter_zero=TRUE,max_depth=6,learning_rate=0.3,nrounds=100)
Arguments
power | The raw power values. |
avg_log2FC | The corresponding log2FC values. |
avg_PCT | The corresponding PCT values. |
replicates | The corresponding number of replicates. |
filter_zero | Whether to remove power values equal to 0. Default=TRUE. |
max_depth | Maximum depth of a tree. Default=6. |
learning_rate | Control the learning rate: scale the contribution of each tree by a factor of 0 < learning_rate < 1 when it is added to the current approximation. Used to prevent overfitting by making the boosting process more conservative. Default=0.3. |
nrounds | Max number of boosting iterations. |
Value
An object of class 'xgb.Booster'. More information about the content of an 'xgb.Booster' object can be found in the documentation of the R package xgboost.
Author(s)
Lan Shui lshui@mdanderson.org
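To show what a monotone constraint on all three predictors means in practice, the sketch below trains a small model directly with the xgboost package on simulated data. It is not the body of 'fit_XGBoost': the simulated matrix `X`, the response formula, and the parameter list are assumptions, and the code is skipped when xgboost is not installed.

```r
if (requireNamespace("xgboost", quietly = TRUE)) {
  set.seed(1)
  n <- 500
  # Hypothetical predictors named after the arguments of fit_XGBoost.
  X <- cbind(avg_log2FC = runif(n, 0, 2),
             avg_PCT    = runif(n, 0, 1),
             replicates = sample(1:5, n, replace = TRUE))
  # Hypothetical noisy power values, clipped to [0, 1].
  power <- 0.3 * X[, 1] + 0.3 * X[, 2] + 0.08 * X[, 3] + rnorm(n, sd = 0.05)
  power <- pmin(1, pmax(0, power))

  dtrain <- xgboost::xgb.DMatrix(data = X, label = power)
  params <- list(objective = "reg:squarederror",
                 max_depth = 6,
                 eta = 0.3,
                 # "(1,1,1)": predictions must not decrease in any of the 3 features.
                 monotone_constraints = "(1,1,1)")
  bst <- xgboost::xgb.train(params = params, data = dtrain, nrounds = 100)

  # Predict along avg_log2FC with the other two features held fixed.
  grid <- cbind(avg_log2FC = seq(0, 2, length.out = 30),
                avg_PCT    = 0.5,
                replicates = 3)
  pred <- predict(bst, xgboost::xgb.DMatrix(grid))
  monotone_ok <- all(diff(pred) >= -1e-6)
}
```

With the constraint in place, predictions along any one feature cannot decrease while the others are fixed, which is what prevents fitted power surfaces for different replicate numbers from crossing.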
Examples
data(power_example)
# Fit the local power surface of avg_log2FC_abs between 1 and 2
avg_log2FC_abs_1_2<-dplyr::filter(power_example,avg_log2FC_abs>1 & avg_log2FC_abs<2)
# Fit the model
bst<-fit_XGBoost(power_example$power,avg_log2FC=power_example$avg_log2FC_abs,avg_PCT=power_example$mean_pct,replicates=power_example$sample_size)
Fit the power surface
Description
This function takes the power values with the corresponding avg_log2FC and avg_PCT derived from bootstrap sampling and uses the scam package to fit two-dimensional smoothing splines under two monotone constraints: (1) a positive relationship between power and avg_log2FC; (2) a positive relationship between power and avg_PCT. The values of avg_log2FC and avg_PCT can be either the averages over the bootstrap samples or the values from the original spatial transcriptomics data.
Usage
fit_powerest(power,avg_log2FC,avg_PCT,filter_zero=TRUE)
Arguments
power | The raw power values. |
avg_log2FC | The corresponding log2FC values. |
avg_PCT | The corresponding PCT values. |
filter_zero | Whether to remove power values equal to 0, default=TRUE. |
Value
A 'scam' object, the result of the scam function. More information about the content of a 'scam' object can be found in the documentation of the R package scam.
Author(s)
Lan Shui lshui@mdanderson.org
Examples
data(result_example)
b<-fit_powerest(result_example$power,result_example$avg_logFC,result_example$avg_PCT)
3D interactive visualization
Description
This function creates a 3D interactive plot of the power against the other parameters, based on 'plot_ly'.
Usage
plotly_powerest(pred,opacity=0.8,colors='BrBG',fig_title=NULL)
Arguments
pred | The result from 'pred_powerest'. |
opacity | The opacity of the graph, default=0.8. |
colors | The color for the graph, default='BrBG'. |
fig_title | The title of the graph, default=NULL. |
Value
A 3d interactive plot of the power surface. Users can also plot multiple surfaces together to compare them.
Author(s)
Lan Shui lshui@mdanderson.org
Examples
data(result_example)
b<-fit_powerest(result_example$power,result_example$avg_logFC,result_example$avg_PCT)
pred <- pred_powerest(b,xlim= c(0,6),ylim=c(0,1))
plotly_powerest(pred,fig_title='Power estimation result')
An example of power results with multiple replicates number
Description
A subset of power results from PoweREST with multiple replicate numbers.
Usage
power_example
Format
power_example
A data frame with 844 rows and 5 columns:
- avg_logFC
average log2FC
- mean_PCT
percentage of spots detecting the gene
- sample_size
number of replicates
- power
power values
- avg_log2FC_abs
the absolute value of average log2FC
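A table with 'sample_size' and 'power' columns lends itself to a simple sample size lookup: the smallest number of replicates that reaches a target power. The sketch below uses a hypothetical data frame `power_tbl` with made-up values for one gene profile; it does not read 'power_example'.

```r
# Hypothetical power values for one gene profile at increasing replicate numbers.
power_tbl <- data.frame(
  sample_size = 1:6,
  power       = c(0.35, 0.55, 0.72, 0.81, 0.88, 0.93)
)

target <- 0.8

# Smallest number of replicates whose estimated power meets the target.
min_n <- min(power_tbl$sample_size[power_tbl$power >= target])
```

For these made-up values the lookup returns 4 replicates.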
Prediction results from XGBoost
Description
This function takes the result from 'fit_XGBoost' and makes predictions.
Usage
pred_XGBoost(x,n.grid=30,xlim,ylim,replicates)
Arguments
x | An object of class 'xgb.Booster'. |
n.grid | The number of grid nodes within 'xlim' and 'ylim', default=30. |
xlim | The range of the absolute value of avg_log2FC used for prediction. |
ylim | The range of the avg_pct used for prediction. |
replicates | The number of replicates. |
Value
The power estimations from XGBoost.
Author(s)
Lan Shui lshui@mdanderson.org
Examples
data(power_example)
# Fit the local power surface of avg_log2FC_abs between 1 and 2
avg_log2FC_abs_1_2<-dplyr::filter(power_example,avg_log2FC_abs>1 & avg_log2FC_abs<2)
# Fit the model
bst<-fit_XGBoost(power_example$power,avg_log2FC=power_example$avg_log2FC_abs,avg_PCT=power_example$mean_pct,replicates=power_example$sample_size)
pred<-pred_XGBoost(bst,n.grid=30,xlim=c(0,1.5),ylim=c(0,0.1),replicates=3)
Power value prediction
Description
This function provides predictions from the fitted object returned by 'fit_powerest'. The predictions can be used for visualization by 'plotly_powerest' and 'vis_powerest', or as the power result for your proposal or research. It is a modified version of the scam library code predict.scam.
Usage
pred_powerest(x,n.grid=30,xlim=NULL,ylim=NULL)
Arguments
x | The fitted 'scam' object returned by 'fit_powerest'. |
n.grid | The number of grid nodes within 'xlim' and 'ylim', default=30. |
xlim | The range of the absolute value of log2FC used for prediction, default=NULL which means the original range. |
ylim | The range of the avg_pct used for prediction, default=NULL which means the original range. |
Value
The prediction values of the power.
Author(s)
Lan Shui lshui@mdanderson.org, based partly on 'scam' by Natalya Pya
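The roles of 'n.grid', 'xlim' and 'ylim' can be pictured as building a regular grid over which the fitted surface is evaluated. The sketch below is an assumption about that general idea, not the function's source: the helper `make_grid` and its column names are hypothetical.

```r
# Build an n.grid x n.grid lattice of (avg_log2FC, avg_PCT) points.
# make_grid and its column names are hypothetical, for illustration only.
make_grid <- function(xlim, ylim, n.grid = 30) {
  expand.grid(
    avg_log2FC = seq(xlim[1], xlim[2], length.out = n.grid),
    avg_PCT    = seq(ylim[1], ylim[2], length.out = n.grid)
  )
}

grid <- make_grid(xlim = c(0, 6), ylim = c(0, 1), n.grid = 30)
```

With the default of 30 nodes per axis, the grid has 900 rows, one per combination of the two predictors.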
Examples
data(result_example)
b<-fit_powerest(result_example$power,result_example$avg_logFC,result_example$avg_PCT)
pred <- pred_powerest(b,xlim= c(0,6),ylim=c(0,1))
An example of power results from PoweREST
Description
A subset of power results from PoweREST, obtained by running PoweREST(Peri,cond='Condition',replicates=5,spots_num=80,iteration=2)
Usage
result_example
Format
result_example
A data frame with ~20,000 rows and 3 columns:
- power
power values
- avg_logFC
average log2FC
- avg_PCT
percentage of spots detecting the gene
Visualization of the power estimations from XGBoost
Description
This function takes the result from 'pred_XGBoost' and plots 2D/3D views of it.
Usage
vis_XGBoost(x,view='2D',legend_name='Power',xlab='avg_log2FC_abs',ylab='mean_pct')
Arguments
x | The result data frame from 'pred_XGBoost'. |
view | Whether to plot the 2D or 3D view, default='2D'. |
legend_name | The legend title, default='Power'. |
xlab | The x-axis label, default='avg_log2FC_abs'. |
ylab | The y-axis label, default='mean_pct'. |
Value
A 2D/3D plot of the power results from XGBoost.
Author(s)
Lan Shui lshui@mdanderson.org
Examples
data(power_example)
# Fit the local power surface of avg_log2FC_abs between 1 and 2
avg_log2FC_abs_1_2<-dplyr::filter(power_example,avg_log2FC_abs>1 & avg_log2FC_abs<2)
# Fit the model
bst<-fit_XGBoost(power_example$power,avg_log2FC=power_example$avg_log2FC_abs,avg_PCT=power_example$mean_pct,replicates=power_example$sample_size)
pred<-pred_XGBoost(bst,n.grid=30,xlim=c(0,1.5),ylim=c(0,0.1),replicates=3)
vis_XGBoost(pred,view='2D',legend_name='Power',xlab='avg_log2FC_abs',ylab='mean_pct')
Visualization of the power surface
Description
This function takes the result from 'pred_powerest' and plots 2D views of it. Supply ticktype="detailed" to get proper axis annotation. It is a modified version of the 'scam' library code 'vis.scam'.
Usage
vis_powerest(x,color="heat",contour.col=NULL,se=-1,zlim=NULL,n.grid=30,col=NA,plot.type="persp",nCol=50,...)
Arguments
x | A scam object. |
color | The color scheme of the plot, one of "heat", "topo", "cm", "terrain", "gray" or "bw". |
contour.col | The color of the contour plot when using plot.type="contour". |
se | If less than or equal to zero, only the predicted surface is plotted. If greater than zero, 3 surfaces are plotted: one at the predicted values minus se standard errors, one at the predicted values, and one at the predicted values plus se standard errors. |
zlim | The range of power values to show. |
n.grid | The number of grid nodes in each direction used for calculating the plotted surface. |
col | The colors for the facets of the plot. If this is NA and se>0, the facets are transparent; if it is NA and se is not greater than zero, the color scheme specified in color is used. If col is not NA, it is used as the facet color. |
plot.type | One of "contour" or "persp". |
nCol | The number of colors to use in color schemes. |
... | Other arguments. |
Value
A 2d plot of the power surface. More details can be seen at scam.
Author(s)
Lan Shui lshui@mdanderson.org, based partly on 'scam' by Natalya Pya
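The viewing arguments that appear in the example for this function (theta, phi, ticktype, nticks) are arguments of base R's `persp`. The sketch below calls `persp` directly on a hypothetical monotone surface to show what they control; the surface formula and axis labels are assumptions, and the plot is sent to a null device so the code runs without a display.

```r
# Draw to a null PDF device so the sketch runs non-interactively.
grDevices::pdf(NULL)

x <- seq(0, 6, length.out = 30)  # stands in for avg_log2FC
y <- seq(0, 1, length.out = 30)  # stands in for avg_PCT

# Hypothetical power surface, increasing in both x and y.
z <- outer(x, y, function(fc, pct) 1 - exp(-fc * pct))

# theta and phi set the viewing angles; ticktype = "detailed" draws numeric axes.
vt <- persp(x, y, z,
            theta = -30, phi = 30,
            zlim = c(0, 1),
            ticktype = "detailed", nticks = 5,
            xlab = "avg_log2FC", ylab = "avg_PCT", zlab = "power",
            col = "lightblue")

invisible(grDevices::dev.off())
```

`persp` returns the viewing transformation matrix invisibly, which is stored in `vt` here.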
Examples
data(result_example)
b<-fit_powerest(result_example$power,result_example$avg_logFC,result_example$avg_PCT)
pred <- pred_powerest(b,xlim= c(0,6),ylim=c(0,1))
vis_powerest(pred,theta=-30,phi=30,color='heat',ticktype = "detailed",xlim=c(0,6),nticks=5)