Analyze Copy Number Signatures with sigminer

Shixiang Wang (wangsx1@sysucc.org.cn)

2024-05-11

Exploring copy number signatures with a recently developed approach is described in "The repertoire of copy number alteration signatures in human cancer".

For a more general introduction, please read "Extract, Analyze and Visualize Mutational Signatures with Sigminer".

library(sigminer)
#> Registered S3 method overwritten by 'sigminer':
#>   method      from
#>   print.bytes Rcpp
#> sigminer version 2.3.1
#> - Star me at https://github.com/ShixiangWang/sigminer
#> - Run hello() to see usage and citation.

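For this analysis, data with six columns are required. In the example below these are chromosome, start, end, segVal (the total copy number), sample and minor_cn.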
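As a rough illustration of the expected shape, the sketch below builds a small table with the six columns that appear in this vignette's example data. All values and the sample ID are made up, and the sanity checks are only a suggestion, not something sigminer requires in this form.

```r
# Toy segment table with made-up values; column names follow the example
# data used in this vignette (segVal is the total copy number).
toy_seg <- data.frame(
  chromosome = c("chr1", "chr1", "chr2"),
  start      = c(1, 5000001, 1),
  end        = c(5000000, 9000000, 8000000),
  segVal     = c(2L, 1L, 4L),
  sample     = "toy_sample",   # hypothetical sample ID
  minor_cn   = c(1L, 0L, 2L),
  stringsAsFactors = FALSE
)

# Suggested sanity checks before passing such a table to read_copynumber():
# segments have positive length and the minor copy number never exceeds
# the total copy number.
stopifnot(
  all(toy_seg$start < toy_seg$end),
  all(toy_seg$minor_cn <= toy_seg$segVal)
)
str(toy_seg)
```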

Generate allele-specific copy number profile

load(system.file("extdata", "toy_segTab.RData", package = "sigminer", mustWork = TRUE))
set.seed(1234)
segTabs$minor_cn <- sample(c(0, 1), size = nrow(segTabs), replace = TRUE)
cn <- read_copynumber(segTabs,
  seg_cols = c("chromosome", "start", "end", "segVal"),
  genome_measure = "wg", complement = TRUE, add_loh = TRUE
)
#> ℹ [2024-05-11 12:07:22.722145]: Started.
#> ℹ [2024-05-11 12:07:22.739754]: Genome build  : hg19.
#> ℹ [2024-05-11 12:07:22.741917]: Genome measure: wg.
#> ℹ [2024-05-11 12:07:22.743996]: When add_loh is TRUE, use_all is forced to TRUE.
#> Please drop columns you don't want to keep before reading.
#> ✔ [2024-05-11 12:07:22.76519]: Chromosome size database for build obtained.
#> ℹ [2024-05-11 12:07:22.767823]: Reading input.
#> ✔ [2024-05-11 12:07:22.770067]: A data frame as input detected.
#> ✔ [2024-05-11 12:07:22.772699]: Column names checked.
#> ✔ [2024-05-11 12:07:22.777028]: Column order set.
#> ✔ [2024-05-11 12:07:22.796121]: Chromosomes unified.
#> ✔ [2024-05-11 12:07:22.824277]: Value 2 (normal copy) filled to uncalled chromosomes.
#> ✔ [2024-05-11 12:07:22.833025]: Data imported.
#> ℹ [2024-05-11 12:07:22.835415]: Segments info:
#> ℹ [2024-05-11 12:07:22.8376]:     Keep - 477
#> ℹ [2024-05-11 12:07:22.839727]:     Drop - 0
#> ✔ [2024-05-11 12:07:22.842663]: Segments sorted.
#> ℹ [2024-05-11 12:07:22.844732]: Adding LOH labels...
#> ℹ [2024-05-11 12:07:22.848268]: Joining adjacent segments with same copy number value. Be patient...
#> ✔ [2024-05-11 12:07:23.026732]: 410 segments left after joining.
#> ✔ [2024-05-11 12:07:23.029618]: Segmental table cleaned.
#> ℹ [2024-05-11 12:07:23.031965]: Annotating.
#> ✔ [2024-05-11 12:07:23.056107]: Annotation done.
#> ℹ [2024-05-11 12:07:23.058526]: Summarizing per sample.
#> ✔ [2024-05-11 12:07:23.094252]: Summarized.
#> ℹ [2024-05-11 12:07:23.09669]: Generating CopyNumber object.
#> ✔ [2024-05-11 12:07:23.099685]: Generated.
#> ℹ [2024-05-11 12:07:23.101863]: Validating object.
#> ✔ [2024-05-11 12:07:23.104097]: Done.
#> ℹ [2024-05-11 12:07:23.106758]: 0.385 secs elapsed.

cn
#> An object of class CopyNumber
#> =============================
#>                           sample n_of_seg n_of_cnv n_of_amp n_of_del n_of_vchr
#>                           <char>    <int>    <int>    <int>    <int>     <int>
#>  1: TCGA-DF-A2KN-01A-11D-A17U-01       34        6        5        1         4
#>  2: TCGA-19-2621-01B-01D-0911-01       34        8        5        3         5
#>  3: TCGA-B6-A0X5-01A-21D-A107-01       29        8        4        4         2
#>  4: TCGA-A8-A07S-01A-11D-A036-01       39       11        2        9         4
#>  5: TCGA-26-6174-01A-21D-1842-01       44       13        8        5         8
#>  6: TCGA-CV-7432-01A-11D-2128-01       41       16        7        9         9
#>  7: TCGA-06-0644-01A-02D-0310-01       47       19        5       14         8
#>  8: TCGA-A5-A0G2-01A-11D-A042-01       40       21        5       16        10
#>  9: TCGA-99-7458-01A-11D-2035-01       49       26       10       16        13
#> 10: TCGA-05-4417-01A-22D-1854-01       53       37       33        4        17
#>     n_loh cna_burden
#>     <int>      <num>
#>  1:    15      0.000
#>  2:    20      0.095
#>  3:    18      0.083
#>  4:    21      0.106
#>  5:    24      0.113
#>  6:    24      0.188
#>  7:    33      0.158
#>  8:    23      0.375
#>  9:    33      0.304
#> 10:    29      0.617

cn@data
#>      chromosome     start       end segVal                       sample
#>          <char>     <num>     <num>  <int>                       <char>
#>   1:       chr1   3218923 116319008      2 TCGA-05-4417-01A-22D-1854-01
#>   2:       chr1 116324707 120523902      1 TCGA-05-4417-01A-22D-1854-01
#>   3:       chr1 149879545 247812431      4 TCGA-05-4417-01A-22D-1854-01
#>   4:      chr10    423671 135224372      3 TCGA-05-4417-01A-22D-1854-01
#>   5:      chr11    458784  19461653      3 TCGA-05-4417-01A-22D-1854-01
#>  ---
#> 406:       chr6   1016984 170898549      2 TCGA-DF-A2KN-01A-11D-A17U-01
#> 407:       chr7    746917 158385118      2 TCGA-DF-A2KN-01A-11D-A17U-01
#> 408:       chr8    617885 145225107      2 TCGA-DF-A2KN-01A-11D-A17U-01
#> 409:       chr9    790234 140938075      2 TCGA-DF-A2KN-01A-11D-A17U-01
#> 410:       chrX         1 155270560      2 TCGA-DF-A2KN-01A-11D-A17U-01
#>       minor_cn    loh .loh_frac
#>          <num> <lgcl>     <num>
#>   1: 1.0000000  FALSE        NA
#>   2: 0.0000000   TRUE        NA
#>   3: 0.5000000   TRUE 0.1175943
#>   4: 1.0000000  FALSE        NA
#>   5: 1.0000000  FALSE        NA
#>  ---
#> 406: 0.3333333   TRUE 0.9979494
#> 407: 1.0000000  FALSE        NA
#> 408: 1.0000000  FALSE        NA
#> 409: 0.5000000   TRUE 0.8328715
#> 410:        NA  FALSE        NA

Classify the segments with the Steele et al. method

If you want to try other types of copy number signatures, change the method argument.

tally_s <- sig_tally(cn, method = "S")
#> ℹ [2024-05-11 12:07:23.165562]: Started.
#> ℹ [2024-05-11 12:07:23.171528]: When you use method 'S', please make sure you have set 'join_adj_seg' to FALSE and 'add_loh' to TRUE in 'read_copynumber() in the previous step!
#> ✔ [2024-05-11 12:07:23.197549]: Matrix generated.
#> ℹ [2024-05-11 12:07:23.200068]: 0.034 secs elapsed.

str(tally_s$all_matrices, max.level = 1)
#> List of 2
#>  $ CN_40: int [1:10, 1:40] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ CN_48: int [1:10, 1:48] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..- attr(*, "dimnames")=List of 2

Find de novo signatures

sig_denovo = sig_auto_extract(tally_s$all_matrices$CN_48)
#> Select Run 3, which K = 2 as best solution.

head(sig_denovo$Signature)
#>                         Sig1          Sig2
#> 0:homdel:0-100Kb    0.000000  0.000000e+00
#> 0:homdel:100Kb-1Mb  0.000000  0.000000e+00
#> 0:homdel:>1Mb       0.000000  0.000000e+00
#> 1:LOH:0-100Kb       3.609460 3.819129e-242
#> 1:LOH:100Kb-1Mb     6.316554 2.814800e-127
#> 1:LOH:1Mb-10Mb     13.535473 2.784288e-190

Refit (19) reference signatures

This directly calculates the contribution of the 19 reference signatures to each sample.

act_refit = sig_fit(t(tally_s$all_matrices$CN_48), sig_index = "ALL", sig_db = "CNS_TCGA")
#> ℹ [2024-05-11 12:07:24.377693]: Started.
#> ✔ [2024-05-11 12:07:24.379994]: Signature index detected.
#> ℹ [2024-05-11 12:07:24.382046]: Checking signature database in package.
#> ℹ [2024-05-11 12:07:24.386141]: Checking signature index.
#> ℹ [2024-05-11 12:07:24.388193]: Valid index for db 'CNS_TCGA':
#> CN1 CN2 CN3 CN4 CN5 CN6 CN7 CN8 CN9 CN10 CN11 CN12 CN13 CN14 CN15 CN16 CN17 CN18 CN19
#> ✔ [2024-05-11 12:07:24.390339]: Database and index checked.
#> ✔ [2024-05-11 12:07:24.392602]: Signature normalized.
#> ℹ [2024-05-11 12:07:24.394607]: Checking row number for catalog matrix and signature matrix.
#> ✔ [2024-05-11 12:07:24.396599]: Checked.
#> ℹ [2024-05-11 12:07:24.398572]: Checking rownames for catalog matrix and signature matrix.
#> ✔ [2024-05-11 12:07:24.400536]: Checked.
#> ✔ [2024-05-11 12:07:24.402494]: Method 'QP' detected.
#> ✔ [2024-05-11 12:07:24.414918]: Corresponding function generated.
#> ℹ [2024-05-11 12:07:24.417316]: Calling function.
#> ℹ [2024-05-11 12:07:24.419917]: Fitting sample: TCGA-05-4417-01A-22D-1854-01
#> ℹ [2024-05-11 12:07:24.42325]: Fitting sample: TCGA-06-0644-01A-02D-0310-01
#> ℹ [2024-05-11 12:07:24.425641]: Fitting sample: TCGA-19-2621-01B-01D-0911-01
#> ℹ [2024-05-11 12:07:24.427867]: Fitting sample: TCGA-26-6174-01A-21D-1842-01
#> ℹ [2024-05-11 12:07:24.430052]: Fitting sample: TCGA-99-7458-01A-11D-2035-01
#> ℹ [2024-05-11 12:07:24.432224]: Fitting sample: TCGA-A5-A0G2-01A-11D-A042-01
#> ℹ [2024-05-11 12:07:24.434394]: Fitting sample: TCGA-A8-A07S-01A-11D-A036-01
#> ℹ [2024-05-11 12:07:24.436542]: Fitting sample: TCGA-B6-A0X5-01A-21D-A107-01
#> ℹ [2024-05-11 12:07:24.438711]: Fitting sample: TCGA-CV-7432-01A-11D-2128-01
#> ℹ [2024-05-11 12:07:24.440867]: Fitting sample: TCGA-DF-A2KN-01A-11D-A17U-01
#> ✔ [2024-05-11 12:07:24.443045]: Done.
#> ℹ [2024-05-11 12:07:24.445076]: Generating output signature exposures.
#> ✔ [2024-05-11 12:07:24.44793]: Done.
#> ℹ [2024-05-11 12:07:24.450022]: 0.072 secs elapsed.
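To give a feel for what refitting means, here is a minimal base R sketch on made-up data. It estimates non-negative exposures for one sample by multiplicative updates for non-negative least squares. This is a stand-in technique chosen for brevity, not the 'QP' method that sig_fit() reports using above, and the signature values, names (sig_ref, fit_expo) and exposures are all hypothetical.

```r
# Toy reference signatures: 4 components (rows) x 2 signatures (columns),
# each column sums to 1. All values are made up.
sig_ref <- cbind(
  SigA = c(0.7, 0.2, 0.1, 0.0),
  SigB = c(0.0, 0.1, 0.3, 0.6)
)

# A toy sample catalog built from known exposures, so the fit can be checked
true_expo <- c(SigA = 10, SigB = 5)
catalog <- as.vector(sig_ref %*% true_expo)

# Multiplicative updates for non-negative least squares: exposures stay
# non-negative because every factor in the update is non-negative.
fit_expo <- function(catalog, sig, n_iter = 500) {
  expo <- rep(sum(catalog) / ncol(sig), ncol(sig))  # flat starting point
  wtv <- as.vector(crossprod(sig, catalog))         # t(sig) %*% catalog
  wtw <- crossprod(sig)                             # t(sig) %*% sig
  for (i in seq_len(n_iter)) {
    # small constant guards against division by zero
    expo <- expo * wtv / (as.vector(wtw %*% expo) + 1e-12)
  }
  setNames(expo, colnames(sig))
}

est_expo <- fit_expo(catalog, sig_ref)
round(est_expo, 3)
```

Because the toy catalog is an exact mixture of the two signatures, the estimated exposures should recover the values used to build it. Real catalogs are noisy, so the fit is only approximate there.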

We can apply a threshold to keep only the signatures that really contribute; here, those whose exposure summed across all samples exceeds 0.1.

act_refit2 = act_refit[apply(act_refit, 1, function(x) sum(x) > 0.1), ]
rownames(act_refit2)
#>  [1] "CN1"  "CN2"  "CN3"  "CN4"  "CN9"  "CN11" "CN12" "CN13" "CN14" "CN19"
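The choice of threshold is a judgment call. The sketch below, on a made-up exposure matrix, contrasts the total-exposure rule used above with a hypothetical relative rule (at least 5% of a sample's exposure in at least one sample). The 5% cutoff is an arbitrary illustration, not a sigminer default.

```r
# Toy exposure matrix: signatures in rows, samples in columns (made-up values)
expo <- rbind(
  SigA = c(s1 = 12.0, s2 = 0.0, s3 = 3.5),
  SigB = c(s1 = 0.0,  s2 = 0.3, s3 = 0.0),
  SigC = c(s1 = 0.0,  s2 = 7.0, s3 = 0.0)
)

# Same rule as above: keep signatures whose total exposure exceeds 0.1
keep_total <- rowSums(expo) > 0.1
rownames(expo)[keep_total]

# Hypothetical stricter rule: a signature must account for at least 5% of
# the exposure in at least one sample
rel_expo <- sweep(expo, 2, colSums(expo), "/")
keep_rel <- apply(rel_expo, 1, function(x) any(x >= 0.05))
rownames(expo)[keep_rel]
```

In this toy matrix SigB passes the total-exposure rule but fails the relative one, which shows how the two rules can disagree on weakly active signatures.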

Plot signatures

For de novo signatures:

show_sig_profile(sig_denovo, mode = "copynumber", method = "S", style = "cosmic")

Show the activity/exposure.

show_sig_exposure(sig_denovo)

For reference signatures, you can just select what you want:

show_sig_profile(get_sig_db("CNS_TCGA")$db[, rownames(act_refit2)],
  style = "cosmic", mode = "copynumber", method = "S", check_sig_names = FALSE
)

Similarly for showing activity.

show_sig_exposure(act_refit2)

NOTE that in this case the different approaches give noticeably different results, so choose one based on the size and quality of your data and double-check the results. In general, the refitting approach is recommended for small data sets.

Signature assignment

To assign the de novo signatures to reference signatures, we use cosine similarity.

get_sig_similarity(sig_denovo, sig_db = "CNS_TCGA")
#> -Comparing against COSMIC signatures
#> ------------------------------------
#> --Found Sig1 most similar to CN1
#>    Aetiology: See https://cancer.sanger.ac.uk/signatures/cn/ [similarity: 0.706]
#> --Found Sig2 most similar to CN2
#>    Aetiology: See https://cancer.sanger.ac.uk/signatures/cn/ [similarity: 0.771]
#> ------------------------------------
#> Return result invisiblely.
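For readers unfamiliar with the measure, cosine similarity can be sketched in a few lines of base R. The profiles below are made up and the helper cosine_sim() is hypothetical. It illustrates the idea of matching each de novo signature to its closest reference, not how get_sig_similarity() is implemented internally.

```r
# Cosine similarity between two non-negative profiles
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Toy profiles over 4 components (made-up values)
denovo <- cbind(Sig1 = c(5, 3, 1, 0), Sig2 = c(0, 1, 2, 6))
reference <- cbind(
  RefA = c(0.6, 0.3, 0.1, 0.0),
  RefB = c(0.0, 0.1, 0.2, 0.7),
  RefC = c(0.25, 0.25, 0.25, 0.25)
)

# Similarity matrix: de novo signatures in rows, references in columns
sim <- apply(reference, 2, function(r) apply(denovo, 2, cosine_sim, b = r))
round(sim, 3)

# Best-matching reference for each de novo signature
best <- colnames(sim)[apply(sim, 1, which.max)]
setNames(best, rownames(sim))
```

Cosine similarity ignores overall scale, so a de novo signature expressed in counts can be compared directly with a reference expressed in proportions.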
