Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Quality Surrogate Variable Analysis for Degradation Correction

NotificationsYou must be signed in to change notification settings

LieberInstitute/qsvaR

Repository files navigation

Lifecycle: stableBioc release statusBioc devel statusBioc downloads rankBioc supportBioc historyBioc last commitBioc dependenciesCodecov test coverageR build statusGitHub issuesGitHub pullsDOI

Differential expressions analysis requires the ability to normalizecomplex datasets. In the case of postmortem brain tissue we are taskedwith removing the effects of bench degradation. TheqsvaR packagecombines an established method for removing the effects of degradationfrom RNA-seq data with easy to use functions. It is the second iterationof the qSVA framework (Jaffe et al, PNAS,2017).

The first step in theqsvaR workflow is to create anRangedSummarizedExperimentobject with the transcripts identified in our qSVA experiment. If youalready have aRangedSummarizedExperimentof transcripts we can do this with thegetDegTx() function as shownbelow.If not this can be generated with theSPEAQeasy (a RNA-seqpipeline maintained by our lab) pipeline using the--qsva flag. If youalready have aRangedSummarizedExperimentobject with transcripts then you do not need to runSPEAQeasy. This flagrequires a full path to a text file, containing one Ensembl transcriptID per line for each transcript desired in the final transcripts Routput object (calledrse_tx). Thesig_transcripts argument in thispackage should contain the same Ensembl transcript IDs as the text filefor the--qsva flag.The goal ofqsvaR is to provide software thatcan remove the effects of bench degradation from RNA-seq data.

Installation Instructions

Get the latest stable R release from CRAN. Then installqsvaR usingfrom Bioconductor the following code:

if (!requireNamespace("BiocManager",quietly=TRUE)) {    install.packages("BiocManager")}BiocManager::install("qsvaR")

And the development version from GitHub with:

BiocManager::install("LieberInstitute/qsvaR")

Example

This is a basic example which shows how to obtain the quality surrogatevariables (qSVs) for the brainseqphase IIdataset. qSVs are essentiallyprincipal components from an rna-seq experiment designed to model benchdegradation. For more on principal components you can read andintroductory articlehere.At the start of this script we will have anRangedSummarizedExperimentand a list of all the transcripts found in our degradation study. At theend we will have a table with differential expression results that isadjusted for qSVs.

## R packages we'll uselibrary("qsvaR")library("limma")
library("qsvaR")## We'll download example data from the BrainSeq Phase II project## described at http://eqtl.brainseq.org/phase2/.#### We'll use BiocFileCache to cache these files so you don't have to download## them again for other examples.bfc<-BiocFileCache::BiocFileCache()rse_file<-BiocFileCache::bfcrpath("https://s3.us-east-2.amazonaws.com/libd-brainseq2/rse_tx_unfiltered.Rdata",x=bfc)#> adding rname 'https://s3.us-east-2.amazonaws.com/libd-brainseq2/rse_tx_unfiltered.Rdata'## Now that we have the data in our computer, we can load it.load(rse_file,verbose=TRUE)#> Loading objects:#>   rse_tx

In this next step, we subset to the transcripts associated withdegradation.qsvaR provides significant transcripts determined in fourdifferent linear models of transcript expression against degradationtime, brain region, and potentially cell-type proportions:

  1. exp ~ DegradationTime + Region
  2. exp ~ DegradationTime * Region
  3. exp ~ DegradationTime + Region + CellTypeProp
  4. exp ~ DegradationTime * Region + CellTypeProp

select_transcripts() returns degradation-associated transcripts andsupports two parameters. First,top_n controls how many significanttranscripts to extract from each model. Whencell_component = TRUE,all four models are used; otherwise, just the first two are used. Theunion of significant transcripts from all used models is returned.

As an example, we’ll subset ourRangedSummarizedExperiment to theunion of the top 1000 significant transcripts derived from each of thefour models.

#   Subset 'rse_tx' to the top 1000 significant transcripts from the four#   degradation modelsDegTx<- getDegTx(rse_tx,sig_transcripts= select_transcripts(top_n=1000,cell_component=TRUE))#> Using 2496 degradation-associated transcripts.## Now we can compute the Principal Components (PCs) of the degraded## transcriptspcTx<- getPCs(DegTx,"tpm")

Next we use thek_qsvs() function to calculate how many PCs we willneed to account for the variation. A model matrix accounting forrelevant variables should be used. Common variables such as Age, Sex,Race and Region are often included in the model. Again we are using ourRangedSummarizedExperimentDegTx as therse_tx option. Next wespecify themod with ourmodel.matrix().model.matrix() creates adesign (or model) matrix, e.g., by expanding factors to a set of dummyvariables (depending on the contrasts) and expanding interactionssimilarly. For more information on creating a design matrix for yourexperiment see the documentationhere.Again we use theassayname option to specify the we are using thetpm assay, where TPM stands fortranscripts per million.

## Using a simple statistical model we determine the number of PCs needed (k)mod<- model.matrix(~Dx+Age+Sex+Race+Region,data= colData(rse_tx))k<- k_qsvs(DegTx,mod,"tpm")print(k)#> [1] 20

Now that we have our PCs and the number we need we can generate ourqSVs.

## Obtain the k qSVsqsvs<- get_qsvs(pcTx,k)dim(qsvs)#> [1] 900  20

This can be done in one step with our wrapper functionqSVA which justcombinds all the previous mentioned functions.

## Example use of the wrapper function qSVA()qsvs_wrapper<- qSVA(rse_tx=rse_tx,sig_transcripts= select_transcripts(top_n=1000,cell_component=TRUE),mod=mod,assayname="tpm")#> Using 2496 degradation-associated transcripts.dim(qsvs_wrapper)#> [1] 900  20

Differential Expression

Next we can use a standardlimma package approach to do differentialexpression on the data. The key here is that we add our qSVs to thestatistical model we use throughmodel.matrix(). Here we input ourRanged SummarizedExperimentobject and ourmodel.matrix with qSVs. Note here that theRanged SummarizedExperiment object is the original object loaded withthe full list of transcripts, not the the one we subsetted for qSVs.This is because while PCs can be generated from a subset of genes,differential expression is best done on the full dataset. The expectedoutput is asigTx object that shows the results of differentialexpression.

library("limma")## Add the qSVs to our statistical modelmod_qSVA<- cbind(mod,qsvs)## Extract the transcript expression values and put them in the## log2(TPM + 1) scaletxExprs<- log2(assays(rse_tx)$tpm+1)## Run the standard linear model for differential expressionfitTx<- lmFit(txExprs,mod_qSVA)eBTx<- eBayes(fitTx)## Extract the differential expression resultssigTx<- topTable(eBTx,coef=2,p.value=1,number= nrow(rse_tx))## Explore the top resultshead(sigTx)#>                         logFC  AveExpr         t      P.Value    adj.P.Val#> ENST00000484223.1 -0.17439018 1.144051 -6.685583 4.099898e-11 8.121610e-06#> ENST00000344423.9  0.09212678 1.837102  6.449533 1.855943e-10 1.838246e-05#> ENST00000399808.4  0.28974369 4.246788  6.320041 4.165237e-10 2.233477e-05#> ENST00000467370.5  0.06313938 0.301711  6.307179 4.509956e-10 2.233477e-05#> ENST00000264657.9  0.09913353 2.450684  5.933186 4.280565e-09 1.375288e-04#> ENST00000415912.6  0.09028757 1.736581  5.918230 4.671963e-09 1.375288e-04#>                           B#> ENST00000484223.1 14.338379#> ENST00000344423.9 12.865110#> ENST00000399808.4 12.077344#> ENST00000467370.5 11.999896#> ENST00000264657.9  9.811110#> ENST00000415912.6  9.726142

Finally, you can compare the resulting t-statistics from yourdifferential expression model against the degradation time t-statisticsadjusting for the six different brain regions. This type of plot iscalledDEqual plot and was shown in the initial qSVA framework paper(Jaffe et al, PNAS, 2017). Weare really looking for two patterns exemplified here in Figure 1(cartoon shown earlier). A direct positive correlation with degradationshown in Figure 1 on the right tells us that there is signal in the dataassociated with qSVs. An example of nonconfounded data or data that hasbeen modeled can be seen in Figure 1 on the right with its lack ofrelationship between the x and y variables.

Cartoon showing patterns in DEqual plots

Cartoon showing patterns in DEqual plots

## Generate a DEqual() plot using the model results with qSVsDEqual(sigTx)
Result of Differential Expression with qSVA normalization.

Result of Differential Expression with qSVA normalization.

For comparison, here is theDEqual() plot for the model without qSVs.

## Generate a DEqual() plot using the model results without qSVsDEqual(topTable(eBayes(lmFit(txExprs,mod)),coef=2,p.value=1,number= nrow(rse_tx)))
Result of Differential Expression without qSVA normalization.

Result of Differential Expression without qSVA normalization.

In these two DEqual plots we can see that the first is much better. Witha correlation of -0.014 we can effectively conclude that we have removedthe effects of degradation from the data. In the second plot aftermodeling for several common variables we still have a correlation of 0.5with the degradation experiment. This high correlation shows we stillhave a large amount of signal from degradation in our data potentiallyconfounding our case-control (SCZD vs neurotypical controls)differential expression results.

About

Quality Surrogate Variable Analysis for Degradation Correction

Topics

Resources

Code of conduct

Contributing

Stars

Watchers

Forks

Packages

No packages published

Contributors12


[8]ページ先頭

©2009-2025 Movatter.jp