This vignette explores copy number signatures with a recently developed approach, described in "The repertoire of copy number alteration signatures in human cancer".
For a more general introduction, please read "Extract, Analyze and Visualize Mutational Signatures with Sigminer".
library(sigminer)
#> Registered S3 method overwritten by 'sigminer':
#>   method      from
#>   print.bytes Rcpp
#> sigminer version 2.3.1
#> - Star me at https://github.com/ShixiangWang/sigminer
#> - Run hello() to see usage and citation.

For this analysis, data with six columns are required.
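As a rough illustration of that input shape, the sketch below builds a tiny table in base R. It assumes the six columns are the four segment columns passed as `seg_cols`, plus the `sample` and `minor_cn` columns that appear in the printed data further down; the object name `toy_seg` and all values are hypothetical.

```r
# Hypothetical toy input: column names mirror those shown in this document,
# the values are made up for illustration only.
toy_seg <- data.frame(
  chromosome = c("chr1", "chr1", "chr2"),
  start      = c(1, 5000001, 1),
  end        = c(5000000, 9000000, 8000000),
  segVal     = c(2L, 4L, 1L),   # total copy number of the segment
  sample     = c("S1", "S1", "S1"),
  minor_cn   = c(1L, 0L, 0L),   # minor-allele copy number
  stringsAsFactors = FALSE
)

# Check that all assumed columns are present before handing the table over
required <- c("chromosome", "start", "end", "segVal", "sample", "minor_cn")
missing_cols <- setdiff(required, colnames(toy_seg))
stopifnot(length(missing_cols) == 0)

# Sanity check: the minor copy number cannot exceed the total copy number
stopifnot(all(toy_seg$minor_cn <= toy_seg$segVal))
```

A check like this, run before reading the data, can catch a missing or misnamed column early.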
load(system.file("extdata", "toy_segTab.RData", package = "sigminer", mustWork = TRUE))
set.seed(1234)
segTabs$minor_cn <- sample(c(0, 1), size = nrow(segTabs), replace = TRUE)
cn <- read_copynumber(segTabs,
  seg_cols = c("chromosome", "start", "end", "segVal"),
  genome_measure = "wg", complement = TRUE, add_loh = TRUE
)
#> ℹ [2024-05-11 12:07:22.722145]: Started.
#> ℹ [2024-05-11 12:07:22.739754]: Genome build : hg19.
#> ℹ [2024-05-11 12:07:22.741917]: Genome measure: wg.
#> ℹ [2024-05-11 12:07:22.743996]: When add_loh is TRUE, use_all is forced to TRUE.
#> Please drop columns you don't want to keep before reading.
#> ✔ [2024-05-11 12:07:22.76519]: Chromosome size database for build obtained.
#> ℹ [2024-05-11 12:07:22.767823]: Reading input.
#> ✔ [2024-05-11 12:07:22.770067]: A data frame as input detected.
#> ✔ [2024-05-11 12:07:22.772699]: Column names checked.
#> ✔ [2024-05-11 12:07:22.777028]: Column order set.
#> ✔ [2024-05-11 12:07:22.796121]: Chromosomes unified.
#> ✔ [2024-05-11 12:07:22.824277]: Value 2 (normal copy) filled to uncalled chromosomes.
#> ✔ [2024-05-11 12:07:22.833025]: Data imported.
#> ℹ [2024-05-11 12:07:22.835415]: Segments info:
#> ℹ [2024-05-11 12:07:22.8376]: Keep - 477
#> ℹ [2024-05-11 12:07:22.839727]: Drop - 0
#> ✔ [2024-05-11 12:07:22.842663]: Segments sorted.
#> ℹ [2024-05-11 12:07:22.844732]: Adding LOH labels...
#> ℹ [2024-05-11 12:07:22.848268]: Joining adjacent segments with same copy number value. Be patient...
#> ✔ [2024-05-11 12:07:23.026732]: 410 segments left after joining.
#> ✔ [2024-05-11 12:07:23.029618]: Segmental table cleaned.
#> ℹ [2024-05-11 12:07:23.031965]: Annotating.
#> ✔ [2024-05-11 12:07:23.056107]: Annotation done.
#> ℹ [2024-05-11 12:07:23.058526]: Summarizing per sample.
#> ✔ [2024-05-11 12:07:23.094252]: Summarized.
#> ℹ [2024-05-11 12:07:23.09669]: Generating CopyNumber object.
#> ✔ [2024-05-11 12:07:23.099685]: Generated.
#> ℹ [2024-05-11 12:07:23.101863]: Validating object.
#> ✔ [2024-05-11 12:07:23.104097]: Done.
#> ℹ [2024-05-11 12:07:23.106758]: 0.385 secs elapsed.

cn
#> An object of class CopyNumber
#> =============================
#>                           sample n_of_seg n_of_cnv n_of_amp n_of_del n_of_vchr
#>                           <char>    <int>    <int>    <int>    <int>     <int>
#>  1: TCGA-DF-A2KN-01A-11D-A17U-01       34        6        5        1         4
#>  2: TCGA-19-2621-01B-01D-0911-01       34        8        5        3         5
#>  3: TCGA-B6-A0X5-01A-21D-A107-01       29        8        4        4         2
#>  4: TCGA-A8-A07S-01A-11D-A036-01       39       11        2        9         4
#>  5: TCGA-26-6174-01A-21D-1842-01       44       13        8        5         8
#>  6: TCGA-CV-7432-01A-11D-2128-01       41       16        7        9         9
#>  7: TCGA-06-0644-01A-02D-0310-01       47       19        5       14         8
#>  8: TCGA-A5-A0G2-01A-11D-A042-01       40       21        5       16        10
#>  9: TCGA-99-7458-01A-11D-2035-01       49       26       10       16        13
#> 10: TCGA-05-4417-01A-22D-1854-01       53       37       33        4        17
#>     n_loh cna_burden
#>     <int>      <num>
#>  1:    15      0.000
#>  2:    20      0.095
#>  3:    18      0.083
#>  4:    21      0.106
#>  5:    24      0.113
#>  6:    24      0.188
#>  7:    33      0.158
#>  8:    23      0.375
#>  9:    33      0.304
#> 10:    29      0.617

cn@data
#>      chromosome     start       end segVal                       sample
#>          <char>     <num>     <num>  <int>                       <char>
#>   1:       chr1   3218923 116319008      2 TCGA-05-4417-01A-22D-1854-01
#>   2:       chr1 116324707 120523902      1 TCGA-05-4417-01A-22D-1854-01
#>   3:       chr1 149879545 247812431      4 TCGA-05-4417-01A-22D-1854-01
#>   4:      chr10    423671 135224372      3 TCGA-05-4417-01A-22D-1854-01
#>   5:      chr11    458784  19461653      3 TCGA-05-4417-01A-22D-1854-01
#>  ---
#> 406:       chr6   1016984 170898549      2 TCGA-DF-A2KN-01A-11D-A17U-01
#> 407:       chr7    746917 158385118      2 TCGA-DF-A2KN-01A-11D-A17U-01
#> 408:       chr8    617885 145225107      2 TCGA-DF-A2KN-01A-11D-A17U-01
#> 409:       chr9    790234 140938075      2 TCGA-DF-A2KN-01A-11D-A17U-01
#> 410:       chrX         1 155270560      2 TCGA-DF-A2KN-01A-11D-A17U-01
#>       minor_cn    loh .loh_frac
#>          <num> <lgcl>     <num>
#>   1: 1.0000000  FALSE        NA
#>   2: 0.0000000   TRUE        NA
#>   3: 0.5000000   TRUE 0.1175943
#>   4: 1.0000000  FALSE        NA
#>   5: 1.0000000  FALSE        NA
#>  ---
#> 406: 0.3333333   TRUE 0.9979494
#> 407: 1.0000000  FALSE        NA
#> 408: 1.0000000  FALSE        NA
#> 409: 0.5000000   TRUE 0.8328715
#> 410:        NA  FALSE        NA

If you want to try other types of copy number signatures, change the method argument.
tally_s <- sig_tally(cn, method = "S")
#> ℹ [2024-05-11 12:07:23.165562]: Started.
#> ℹ [2024-05-11 12:07:23.171528]: When you use method 'S', please make sure you have set 'join_adj_seg' to FALSE and 'add_loh' to TRUE in 'read_copynumber() in the previous step!
#> ✔ [2024-05-11 12:07:23.197549]: Matrix generated.
#> ℹ [2024-05-11 12:07:23.200068]: 0.034 secs elapsed.

str(tally_s$all_matrices, max.level = 1)
#> List of 2
#>  $ CN_40: int [1:10, 1:40] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ CN_48: int [1:10, 1:48] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..- attr(*, "dimnames")=List of 2

sig_denovo = sig_auto_extract(tally_s$all_matrices$CN_48)
#> Select Run 3, which K = 2 as best solution.

head(sig_denovo$Signature)
#>                         Sig1          Sig2
#> 0:homdel:0-100Kb    0.000000  0.000000e+00
#> 0:homdel:100Kb-1Mb  0.000000  0.000000e+00
#> 0:homdel:>1Mb       0.000000  0.000000e+00
#> 1:LOH:0-100Kb       3.609460 3.819129e-242
#> 1:LOH:100Kb-1Mb     6.316554 2.814800e-127
#> 1:LOH:1Mb-10Mb     13.535473 2.784288e-190

This directly calculates the contribution of 19 reference signatures.
act_refit = sig_fit(t(tally_s$all_matrices$CN_48), sig_index = "ALL", sig_db = "CNS_TCGA")
#> ℹ [2024-05-11 12:07:24.377693]: Started.
#> ✔ [2024-05-11 12:07:24.379994]: Signature index detected.
#> ℹ [2024-05-11 12:07:24.382046]: Checking signature database in package.
#> ℹ [2024-05-11 12:07:24.386141]: Checking signature index.
#> ℹ [2024-05-11 12:07:24.388193]: Valid index for db 'CNS_TCGA':
#> CN1 CN2 CN3 CN4 CN5 CN6 CN7 CN8 CN9 CN10 CN11 CN12 CN13 CN14 CN15 CN16 CN17 CN18 CN19
#> ✔ [2024-05-11 12:07:24.390339]: Database and index checked.
#> ✔ [2024-05-11 12:07:24.392602]: Signature normalized.
#> ℹ [2024-05-11 12:07:24.394607]: Checking row number for catalog matrix and signature matrix.
#> ✔ [2024-05-11 12:07:24.396599]: Checked.
#> ℹ [2024-05-11 12:07:24.398572]: Checking rownames for catalog matrix and signature matrix.
#> ✔ [2024-05-11 12:07:24.400536]: Checked.
#> ✔ [2024-05-11 12:07:24.402494]: Method 'QP' detected.
#> ✔ [2024-05-11 12:07:24.414918]: Corresponding function generated.
#> ℹ [2024-05-11 12:07:24.417316]: Calling function.
#> ℹ [2024-05-11 12:07:24.419917]: Fitting sample: TCGA-05-4417-01A-22D-1854-01
#> ℹ [2024-05-11 12:07:24.42325]: Fitting sample: TCGA-06-0644-01A-02D-0310-01
#> ℹ [2024-05-11 12:07:24.425641]: Fitting sample: TCGA-19-2621-01B-01D-0911-01
#> ℹ [2024-05-11 12:07:24.427867]: Fitting sample: TCGA-26-6174-01A-21D-1842-01
#> ℹ [2024-05-11 12:07:24.430052]: Fitting sample: TCGA-99-7458-01A-11D-2035-01
#> ℹ [2024-05-11 12:07:24.432224]: Fitting sample: TCGA-A5-A0G2-01A-11D-A042-01
#> ℹ [2024-05-11 12:07:24.434394]: Fitting sample: TCGA-A8-A07S-01A-11D-A036-01
#> ℹ [2024-05-11 12:07:24.436542]: Fitting sample: TCGA-B6-A0X5-01A-21D-A107-01
#> ℹ [2024-05-11 12:07:24.438711]: Fitting sample: TCGA-CV-7432-01A-11D-2128-01
#> ℹ [2024-05-11 12:07:24.440867]: Fitting sample: TCGA-DF-A2KN-01A-11D-A17U-01
#> ✔ [2024-05-11 12:07:24.443045]: Done.
#> ℹ [2024-05-11 12:07:24.445076]: Generating output signature exposures.
#> ✔ [2024-05-11 12:07:24.44793]: Done.
#> ℹ [2024-05-11 12:07:24.450022]: 0.072 secs elapsed.

We can apply a threshold to keep only the signatures that really contribute.
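One possible way to apply such a threshold is sketched below in base R on a toy exposure matrix. It assumes, as the later use of `rownames()` on the refit result suggests, that rows are signatures and columns are samples. The matrix values, the names `expo` and `expo_kept`, and the cutoff of 0.1 are hypothetical choices for illustration, not values taken from this analysis.

```r
# Toy exposure matrix (hypothetical values): rows are signatures, columns are samples
expo <- matrix(
  c(10, 0, 3,
     0, 0, 0.05,
     5, 8, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CN1", "CN2", "CN9"), c("S1", "S2", "S3"))
)

threshold <- 0.1  # arbitrary cutoff, chosen only for this illustration

# Keep signatures whose total exposure across all samples exceeds the cutoff
keep <- rowSums(expo) > threshold
expo_kept <- expo[keep, , drop = FALSE]

rownames(expo_kept)
#> [1] "CN1" "CN9"
```

Using `drop = FALSE` keeps the result a matrix even when only one signature passes the cutoff, so its row names can still be used to select reference signatures.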
For de novo signatures:
Show the activity/exposure.
For reference signatures, you can just select what you want:
show_sig_profile(
  get_sig_db("CNS_TCGA")$db[, rownames(act_refit2)],
  style = "cosmic", mode = "copynumber", method = "S", check_sig_names = FALSE
)

Similarly for showing activity.
NOTE that in this case the different approaches give noticeably different results, so you need to pick one based on your data size and quality, and double-check the results. In general, for small data sets, the refitting approach is recommended.
To assign the de novo signatures to reference signatures, we use cosine similarity.
get_sig_similarity(sig_denovo, sig_db = "CNS_TCGA")
#> -Comparing against COSMIC signatures
#> ------------------------------------
#> --Found Sig1 most similar to CN1
#>    Aetiology: See https://cancer.sanger.ac.uk/signatures/cn/ [similarity: 0.706]
#> --Found Sig2 most similar to CN2
#>    Aetiology: See https://cancer.sanger.ac.uk/signatures/cn/ [similarity: 0.771]
#> ------------------------------------
#> Return result invisiblely.
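To make the matching step concrete, here is a minimal base R sketch of cosine similarity between signature profiles. It is not the package's internal implementation; the helper `cosine_sim()` and the toy profiles `denovo` and `ref` are hypothetical and only illustrate how each de novo signature could be paired with its most similar reference.

```r
# Cosine similarity between two numeric vectors of equal length
cosine_sim <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Hypothetical profiles defined over the same four components
denovo <- cbind(Sig1 = c(4, 6, 0, 1), Sig2 = c(0, 1, 5, 5))
ref    <- cbind(RefA = c(3, 7, 0, 0), RefB = c(0, 0, 6, 4))

# Similarity matrix: rows are de novo signatures, columns are references
sim <- sapply(colnames(ref), function(r) {
  sapply(colnames(denovo), function(d) cosine_sim(denovo[, d], ref[, r]))
})

# For each de novo signature, pick the reference with the highest similarity
best <- colnames(sim)[apply(sim, 1, which.max)]
names(best) <- rownames(sim)
best
#>   Sig1   Sig2
#> "RefA" "RefB"
```

Cosine similarity depends only on the shape of a profile and not on its scale, which is why unnormalized de novo signatures can be compared against normalized reference signatures.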