WT215/bayNorm_papercode

bayNorm: relevant code for producing figures in the paper

code for producing figures in bayNorm

Purpose of this repository

The main purpose of this repository is to provide the analysis procedure used in the paper.

Source code of bayNorm

The source code of bayNorm can be found here.

Real datasets used in this paper

This paper involves the following 8 studies:

Klein study (https://www.cell.com/cell/abstract/S0092-8674%2815%2900500-0)
Grün study (https://www.nature.com/articles/nmeth.2930)
Torre study (https://www.cell.com/cell-systems/abstract/S2405-4712(18)30051-6)
Bacher study (https://www.nature.com/articles/nmeth.4263)
Islam study (https://www.ncbi.nlm.nih.gov/pubmed/21543516)
Soumillon study (https://www.biorxiv.org/content/early/2014/03/05/003236)
Tung study (https://www.nature.com/articles/srep39921)
Patel study (http://science.sciencemag.org/content/344/6190/1396)

Simulated datasets used in this paper:

There are 4 simulated datasets with differentially expressed (DE) genes, stored in \Simulations\SIM_DE. Each of them consists of two groups of cells, with 100 cells in each group. 2000 out of 10000 genes were simulated to be DE genes in the first group, and half of these 2000 genes were upregulated.

SIM DE I: mean capture efficiency $\langle\beta\rangle = 10\%$ for both groups.
SIM DE II: mean capture efficiency $\langle\beta\rangle = 5\%$ and $10\%$ for the two groups, respectively.
SIM DE III: mean capture efficiency $\langle\beta\rangle = 10\%$ and $5\%$ for the two groups, respectively.
SIM DE IV: mean capture efficiency $\langle\beta\rangle = 5\%$ and $5\%$ for the two groups, respectively.
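The design of the four DE simulations can be restated as a small R table, which is convenient when looping over the settings. This is only a sketch: the object and column names (`sim_de`, `mean_beta_group1`, `gene_status`, and so on) are illustrative and are not taken from the repository scripts, and which gene indices are labelled as DE here is an arbitrary choice.

```r
# Settings restated from the README text; all names below are illustrative.
sim_de <- data.frame(
  dataset          = c("SIM DE I", "SIM DE II", "SIM DE III", "SIM DE IV"),
  mean_beta_group1 = c(0.10, 0.05, 0.10, 0.05),  # mean capture efficiency, group 1
  mean_beta_group2 = c(0.10, 0.10, 0.05, 0.05),  # mean capture efficiency, group 2
  stringsAsFactors = FALSE
)

n_genes         <- 10000     # genes per simulated dataset
n_de            <- 2000      # genes simulated as DE in the first group
n_up            <- n_de / 2  # half of the DE genes are upregulated
cells_per_group <- 100       # cells in each of the two groups

# Hypothetical labelling: the position of the DE genes is an arbitrary choice.
gene_status <- rep(c("DE_up", "DE_other", "nonDE"),
                   times = c(n_up, n_de - n_up, n_genes - n_de))
table(gene_status)
```

The actual simulated counts and DE labels are produced by the scripts in \Simulations\SIM_DE; the table above only summarizes the settings.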

There are another 2 simulated datasets without DE genes, stored in \Simulations\SIM_noDE. The mean capture efficiency is $\langle\beta\rangle = 10\%$ and $5\%$ for the two groups, respectively. These two simulations were inspired by the Bacher study. Their purpose is to assess how well a normalization method corrects for different sequencing depths.

SIM Bacher I: Parameters were estimated from Klein study.
SIM Bacher II: Parameters were estimated from H1_P24 cells from Bacher study.
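To see why different mean capture efficiencies translate into different sequencing depths, the following toy example thins identical "true" counts with two capture efficiencies. It is a minimal sketch under stated assumptions: capture is modelled as independent binomial sampling of each transcript, the true counts are drawn from a Poisson distribution with an arbitrary mean, and every cell in a group shares the same capture efficiency. The scripts in \Simulations\SIM_noDE may simulate the data differently.

```r
set.seed(1)

n_genes <- 1000  # toy size, smaller than the simulations in the paper
n_cells <- 100   # cells per group

# Hypothetical true counts: same distribution in both groups, so no DE genes.
true_counts <- matrix(rpois(n_genes * n_cells * 2, lambda = 20), nrow = n_genes)

# Capture efficiency per cell: 10% in group 1, 5% in group 2.
beta <- rep(c(0.10, 0.05), each = n_cells)

# Binomial thinning: each transcript in cell j is captured with probability beta[j].
observed <- sapply(seq_len(ncol(true_counts)), function(j) {
  rbinom(n_genes, size = true_counts[, j], prob = beta[j])
})

# Total counts per cell; group 1 should be roughly twice as deep as group 2.
depth <- colSums(observed)
depth_ratio <- mean(depth[1:n_cells]) / mean(depth[(n_cells + 1):(2 * n_cells)])
depth_ratio  # expected to be close to 0.10 / 0.05
```

Because no gene differs between the groups before thinning, any apparent group difference left after normalization reflects the depth difference rather than biology.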

Datasets and the corresponding figures

Real datasets

Klein study: Fig 1b-e, Fig 3a-b; Fig S2, S8a-b.
Grün study: Fig 2a, c, e and g; Fig S11a-b, S12-S13.
Torre study: Fig 2b, d, f and h; Fig S6, S8e-f, S10a, S11c, S14.
Bacher study: Fig S7, S9a-d, S10e, S16, S19a, S23a.
Islam study: Fig 3c; Fig S9e-f, S23b.
Soumillon study: Fig 3d; Fig S21.
Tung study: Fig 4; Fig S3-S5, S8c-d, S10b-d, S25-S26.
Patel study: Fig S10f.

Simulated datasets with DE genes

SIM DE I: Fig S15a, e, i; S20c-d, S22a, S24a, S27-S29.
SIM DE II: Fig S15b, f, j; S20c-d, S22b, S24b, S27-S29.
SIM DE III: Fig S15c, g, k; S20c-d, S22c, S24c, S27-S29.
SIM DE IV: Fig S15d, h, l; S20c-d, S22d, S24d, S27-S29.

Simulated datasets without DE genes

SIM Bacher I: Fig S17, S19b, S20a-b.
SIM Bacher II: Fig S18, S19c.

Some notes before running the code

The code cannot all be run at once. The paths in each R file need to be modified to match your local setup.
Normalization and DE detection can take a long time, depending on the size of the raw data. Run the code step by step to avoid errors.
Useful functions are stored in the folder \Functions; some of them need to be loaded in advance.
The normalization method DCA is developed in Python. The Jupyter notebooks for running DCA are stored in the folder \DCA. Run the DCA normalization and the corresponding DE detection first, and then feed the DCA-normalized data into the other R files.
Some R files need several .RData files as input and also output .RData files that are used elsewhere. Make sure the first step is completed so that the necessary .RData files exist before you continue.
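Since the paths have to be edited by hand and later scripts depend on .RData files from earlier ones, a small check before each run can save time. The sketch below is not part of the repository: `base_dir` and `missing_inputs()` are hypothetical names, and the directory in which a given .RData file ends up depends on how you edited the paths in the scripts.

```r
# Hypothetical location of your local clone; set it once and reuse it.
base_dir <- file.path("path", "to", "bayNorm_papercode")

# Return the subset of `files` that does not exist under `dir`.
missing_inputs <- function(dir, files) {
  files[!file.exists(file.path(dir, files))]
}

# Example: Klein_bayNorm.RData is produced during the first step.
todo <- missing_inputs(base_dir, "Klein_bayNorm.RData")
if (length(todo) > 0) {
  message("Still to be produced: ", paste(todo, collapse = ", "))
}
```

Calling `missing_inputs()` with the inputs of a script before sourcing it gives an early, readable message instead of a failing `load()` halfway through a long run.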

The first step

Preparing for the real datasets

Klein study: first, run \RealData\Klein_study\Klein_bayNorm.R (output Klein_bayNorm.RData).
Grün study: run LOAD_Grun_smFISH.R (output smFISH_norm_load.RData), LOAD_Grun_2i.R (output Grun_2014_RAW.RData) and LOAD_Grun_serum.R (output Grun_2014_RAW_serum.RData). Then run Grun_2i_norms.R (output Grun_2i_norms.RData) and Grun_serum_norms.R (output Grun_serum_norms.RData) to normalize the data. Note that the other method, DCA, needs to be run separately.
Torre study: run Load_Torre.R (output Load_Torre.RData). Then run Torre_many_normalizations.R (output Torre_many_normalizations.RData) to normalize the data.
Bacher study: run LOAD_Bacher.R (output RAW_INITIATE.RData) to load the H1 and H9 datasets. Then run H1_many_normalizations.R (output H1_many_normalizations.RData) and H9_many_normalizations.R (output H9_many_normalizations.RData), respectively.
Islam study: run Load_Islam.R (output Load_Islam.RData). Then run Islam_many_normalizations.R (output Islam_many_normalizations.RData).
Soumillon study: run LOAD_Soumillon.R (output Soumillon_2014.RData). Then run Soumillon_norms.R (output Soumillon_analysis.RData).
Tung study: run Load_Tung.R (output Load_Tung.RData). Then run Tung_many_normalizations.R (output Tung_norms.RData).
Patel study: run Load_Patel.R (output Patel2014_bay_out.RData).
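Because normalization can be slow, it may help to run each preparation script only when its output is still missing. The helper below is a sketch, not code from the repository: `run_if_missing()` and `grun_dir` are hypothetical, and the helper assumes that a script saves its .RData output into the same folder as the script itself, which may not match your edited paths. The script and output names are those listed for the Grün study above.

```r
# Source `script` (located in `dir`) only if `output` is not yet present in `dir`.
# Assumption: the script writes its .RData output into `dir`.
run_if_missing <- function(script, output, dir = ".") {
  if (file.exists(file.path(dir, output))) {
    message("Skipping ", script, ": ", output, " already exists")
    return(invisible(FALSE))
  }
  source(file.path(dir, script))
  invisible(TRUE)
}

# Script/output pairs for the Grün study, in the order given in this README.
grun_steps <- data.frame(
  script = c("LOAD_Grun_smFISH.R", "LOAD_Grun_2i.R", "LOAD_Grun_serum.R",
             "Grun_2i_norms.R", "Grun_serum_norms.R"),
  output = c("smFISH_norm_load.RData", "Grun_2014_RAW.RData",
             "Grun_2014_RAW_serum.RData", "Grun_2i_norms.RData",
             "Grun_serum_norms.RData"),
  stringsAsFactors = FALSE
)

# Uncomment after pointing `grun_dir` at the folder that holds these scripts.
# grun_dir <- "path/to/Grun_study"
# for (i in seq_len(nrow(grun_steps))) {
#   run_if_missing(grun_steps$script[i], grun_steps$output[i], dir = grun_dir)
# }
```

The same pattern applies to the other studies by swapping in their script and output names.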

Notes before running simulations

First, parameters need to be estimated from the real data. The relevant code is stored in \bayNorm_papercode\Figure1.

For the Klein dataset, if you have completed the first step above, Klein_bayNorm.RData stores the parameters you need. Klein_bayNorm.RData is required for SIM DE I-IV and SIM Bacher I.
For the Bacher dataset (H1_P24), run the section named REAL DATA 6: Bacher study (H1_P24 cells) in the file Simulations_realdata.R, which outputs H1p24_bay_sim_allgene.RData, used in SIM Bacher II.
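The parameter files required by each simulation can be written down as a lookup, together with a helper that lists what an .RData file contains without loading it into the global workspace. This is a sketch: `peek_rdata()` and `sim_prereq` are hypothetical names, and the names of the objects stored inside the files are not documented here, so inspect them before use.

```r
# List the objects stored in an .RData file, loading them into a
# temporary environment so the global workspace is left untouched.
peek_rdata <- function(path) {
  env <- new.env()
  load(path, envir = env)
  sort(ls(env))
}

# Parameter file needed by each simulation, as stated in this README.
sim_prereq <- c(
  "SIM DE I"      = "Klein_bayNorm.RData",
  "SIM DE II"     = "Klein_bayNorm.RData",
  "SIM DE III"    = "Klein_bayNorm.RData",
  "SIM DE IV"     = "Klein_bayNorm.RData",
  "SIM Bacher I"  = "Klein_bayNorm.RData",
  "SIM Bacher II" = "H1p24_bay_sim_allgene.RData"
)

# Hypothetical usage; the folder depends on where your scripts saved the file.
# peek_rdata(file.path("path", "to", sim_prereq[["SIM DE I"]]))
```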

Preparing for the simulated datasets (with DE genes)

The code is stored in \Simulations\SIM_DE.

SIM DE I: run DE_sim_01_01.R (output SIM_1.RData and GG_SIM_1.RData).
SIM DE II: run SIM_005_01.R (output SIM_005_01.RData and GG_SIM_005_01.RData).
SIM DE III: run SIM_01_005.R (output SIM_01_005.RData and GG_SIM_01_005.RData).
SIM DE IV: run SIM_005_005.r (output SIM_005_005.RData and GG_SIM_005_005.RData).

Preparing for the simulated datasets (without DE genes)

The code is stored in \Simulations\SIM_noDE.

SIM Bacher I: run SIM_noDE_01_005.R (output SIM_noDE_01_005.RData).
SIM Bacher II: run SIM_noDE_01_005_H1.R (output SIM_noDE_01_005_H1.RData).

After completing the above steps, you can run the other R files, which contain the code for analysing the data.
