This tutorial demonstrates the R packagecpam for theanalysis of time series omics data. It serves as a basic introduction tothe package. There are also two detailed case studies using real worlddata:
These case studies and several simulation studies are presented inthe accompanyingmanuscript byYates et al. (2024). See also, the packagewebsite.
The data for the following examples have been simulated based onempirical RNA-seq data. These data are gene-level counts from acase-only design with 6 time points and 5 replicates per time point.Code to reproduce the data is available in thisrepository.
An example experimental design is included in thecpampackage. Since it is case-only design, there are no experimentalconditions beyond time.
The example count data for are provided as a matrix. Let’s take alook at the first few rows.
cpamTo fit the models, we first prepare thecpam object,then compute p-values, estimate changepoints, and select the shape foreach gene. In this simple example, simulated data are gene-level, thatis we do not have isoform-level counts. As such, we leave```the transcript-to-gene mapping (t2g) and we setgene_level= T`.
cpo<-prepare_cpam(exp_design = exp_design_example,count_matrix = count_matrix_example,model_type ="case-only",t2g =NULL,gene_level = T,num_cores =1)# just for the example cpo<-compute_p_values(cpo)# 6 seconds cpo<-estimate_changepoint(cpo)# 4 seconds cpo<-select_shape(cpo)# 5 secondsWe can look at a summary of the fitted cpam object
If you run the code on your own computer, you can launch the Shinyapp to visualise the results interactively usingvisualise(cpo).
The results of the analysis are summarised using theresults function.
The generated results can be filtered by specifying minimum counts,minimum log-fold changes, and maximum\(p\)-values. For example, to return only thetranscripts with a log-fold change greater than 1, at least 10 counts,and a\(p\)-value less than 0.01, wecan run
A single gene can be plotted using theplot_cpamfunction. Here we plot the gene g063
The subtitle shows
(1,cv) indicating a changepoint at firsttime point (i.e., the gene responds immediately) and a convex('cv') shaped trend.
Let’s look for a gene that has a more complex trend. Unconstrainedshapes incpam are denoted by'tp'.1 We can filter theresults for genes with unconstrained shapes and plot one of them.
We plot the first gene in the list.
This selection of
'tp' suggests that the trend for thisgene does not conform to one of the simpler shape types thatcpam uses. We can exclude'tp' as an optionand forcecpam to choose among the simpler forms by settingshape_type = "shape2" in theplot_cpamfunction ("shape1", the default, allows the'tp'). This choice can be useful for analysis such asclustering. For example:
Here a monotonic increasing concave shape (‘micv’) is chosen, and we cansee this trend deviates from the data substantially more that theunconstrained shape. See themanuscript for moredetails on the shape types available in
cpam.
Next we look for a gene with a changepoint by filtering for geneswith changepoints at the third time point.
Again, we plot the first gene in the list.
The results function can be used to generate clusters according toselected filters. Theplot_cluster function can then beused to visualise the clusters. With such a small simulated data set, wedon’t have many genes in each cluster, but we can try a few differentclustering options to get an idea of how the function works.
There are 19 genes with a concave shape and a changepoint at the firsttime point.
More than one shape or changepoint can be provided. For example:
There are just four genes with decreasing linear or monotonic decreasingconvex shapes which have a changepoint at the second time point.
Clustering can be further refined based on, for example, the rate atwhich the above transcripts attain their maximum values. We illustrateadvanced refinements such as thiscasestudy.
This is just a simple example to get you started. The package hasmany more features and options. Check out the two case studies to seehowcpam can be used to analyse real-world data:
sessionInfo()#> R version 4.4.3 (2025-02-28)#> Platform: x86_64-pc-linux-gnu#> Running under: Pop!_OS 22.04 LTS#>#> Matrix products: default#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0#>#> locale:#> [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C LC_TIME=en_AU.UTF-8 LC_COLLATE=C#> [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8 LC_PAPER=en_AU.UTF-8 LC_NAME=C#> [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C#>#> time zone: Australia/Hobart#> tzcode source: system (glibc)#>#> attached base packages:#> [1] stats graphics grDevices utils datasets methods base#>#> other attached packages:#> [1] ggplot2_3.5.1 stringr_1.5.1 tidyr_1.3.1 dplyr_1.1.4 cpam_0.1.3#>#> loaded via a namespace (and not attached):#> [1] sass_0.4.9 generics_0.1.3 stringi_1.8.4 lattice_0.22-6 digest_0.6.37 magrittr_2.0.3#> [7] RColorBrewer_1.1-3 evaluate_1.0.3 grid_4.4.3 fastmap_1.2.0 jsonlite_1.8.9 Matrix_1.7-2#> [13] limma_3.61.8 mgcv_1.9-1 purrr_1.0.4 scales_1.3.0 codetools_0.2-20 jquerylib_0.1.4#> [19] cli_3.6.4 rlang_1.1.5 pbmcapply_1.5.1 munsell_0.5.1 splines_4.4.3 scam_1.2-18#> [25] withr_3.0.2 cachem_1.1.0 yaml_2.3.10 tools_4.4.3 parallel_4.4.3 colorspace_2.1-1#> [31] locfit_1.5-9.10 vctrs_0.6.5 R6_2.6.1 matrixStats_1.5.0 lifecycle_1.0.4 edgeR_4.3.7#> [37] pkgconfig_2.0.3 pillar_1.10.1 bslib_0.9.0 gtable_0.3.6 glue_1.8.0 Rcpp_1.0.14#> [43] statmod_1.5.0 xfun_0.51 tibble_3.2.1 tidyselect_1.2.1 rstudioapi_0.17.0 knitr_1.49#> [49] farver_2.1.2 htmltools_0.5.8.1 nlme_3.1-167 labeling_0.4.3 rmarkdown_2.27 compiler_4.4.3#> [55] mvnfast_0.2.81tp stands forthinplate which is thetype of spline used for the ‘unconstrained’ curves as defined in themgcv package. The curves are still penalised to be smooth,but the shape type is not fixed.