biomedicalinformaticsgroup/sfariexpression

Code repository for SFARI Genes and where to find them; classification modelling to identify genes associated with Autism Spectrum Disorder from RNA-seq data

www.nature.com/articles/s41598-022-14077-1

Folders and files

InputData
IntermediateData
R
Results
supportingDatasets
.gitignore
README.md
make.R
run.R

SFARI Genes and where to find them; classification modelling to identify genes associated with Autism Spectrum Disorder from RNA-seq data

Code repository for SFARI Genes and where to find them; classification modelling to identify genes associated with Autism Spectrum Disorder from RNA-seq data

Notes about this repository

All code is in R. The drake package is used to manage the workflow of the project, but the code can also be executed as a regular R script:

The script run.R runs the project as a regular R script and saves the output in the Results folder
The script make.R runs the project using drake and saves the output in the .drake folder, which can be accessed by name using drake's loadd() function
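
As a minimal sketch of reading a result back from the drake cache: the helper read_target() below is hypothetical (it is not part of this repository), and it assumes that make.R has already been run from the current working directory and that the drake targets carry the names listed in the Output section.

```r
# Hypothetical helper: fetch one target from the drake cache by name.
# Assumes the working directory is the project root, where make.R
# created the hidden .drake folder.
read_target <- function(target_name) {
  if (!requireNamespace("drake", quietly = TRUE) || !dir.exists(".drake")) {
    message("No drake cache available here; run make.R first.")
    return(invisible(NULL))
  }
  # readd() returns the stored value; character_only = TRUE lets us pass
  # the target name as a string.
  drake::readd(target_name, character_only = TRUE)
}

# Target name taken from the Output section (assumed to match the plan).
modules <- read_target("modules_dataset")

# Interactively, drake::loadd(modules_dataset) would instead attach the
# target to the session under its own name.
```

readd() returns the value so it can be assigned to any name, whereas loadd() attaches the target to the session under its own name, which is convenient for interactive exploration.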

Note on using drake: drake provides many useful features, but it has two drawbacks in this project:

Running the project using run.R is much faster than with make.R on computers with multiple cores, because some packages use the parallel package underneath, which doesn't work well with clustermq, the package drake uses to distribute the work
The Enrichment Analysis of the top modules is only available when running the code using run.R, because of compatibility issues between the package clusterProfiler and drake

Running the code

Clone this repository
Download InputData from doi.org/10.7488/ds/2980
Execute run.R to run the workflow as a regular R script, or make.R to run it with drake
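
The steps above can be sketched as a small pre-flight check run from the repository root. The helper missing_inputs() is hypothetical, and the list of expected files is only a subset taken from the InputData section below; adjust both to your local setup.

```r
# A subset of the input files named in the InputData section.
expected_inputs <- c(
  "RNAseq_ASD_datExpr.csv",
  "RNAseq_ASD_datMeta.csv",
  "SFARI_genes_01-03-2020.csv",
  "SFARI_genes_08-29-2019.csv"
)

# Hypothetical helper: return the expected files that are not present.
missing_inputs <- function(input_dir = "InputData", files = expected_inputs) {
  files[!file.exists(file.path(input_dir, files))]
}

missing <- missing_inputs()
if (length(missing) == 0 && file.exists("run.R")) {
  source("run.R")  # regular R script; use source("make.R") for the drake workflow
} else {
  message("Not running: missing inputs or run.R not found. Missing: ",
          paste(missing, collapse = ", "))
}
```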

InputData

genes_GO_annotations: Gene Ontology annotations for each gene
krishnan_probability_score.xlsx: Krishnan's ASD probability score downloaded from asd.princeton.edu
NCBI_gene2ensembl_20_02_07gz: NCBI's mapping between gene symbols and Ensembl IDs
NCBI_gene_info_20_02_07_.gz: Functional annotations of the genes
RNAseq_ASD_datExpr.csv: Gene expression matrix. Downloaded from mgandal's github repository
RNAseq_ASD_datMeta.csv: Metadata of the samples from the gene expression matrix. Downloaded from mgandal's github repository
sanders_TADA_score.xlsx: Sanders TADA score downloaded from He et al., 2013 (https://doi.org/10.1371/journal.pgen.1003671)
SFARI_genes_01-03-2020.csv: SFARI Gene scores using new scoring system
SFARI_genes_08-29-2019.csv: SFARI Gene scores using old scoring system
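
For a quick look at the CSV inputs before running the workflow, something like the following may help. It is a sketch only: read_input_csv() is a hypothetical helper, the files are assumed to sit in an InputData folder under the working directory, and the README does not say how rows and columns are laid out, so the code only reports dimensions.

```r
# Hypothetical helper: read one CSV from the InputData folder, or return
# NULL with a message if the file has not been downloaded yet.
read_input_csv <- function(file_name, input_dir = "InputData") {
  path <- file.path(input_dir, file_name)
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(NULL)
  }
  # check.names = FALSE keeps column names (e.g. sample IDs) unmodified.
  read.csv(path, stringsAsFactors = FALSE, check.names = FALSE)
}

datExpr <- read_input_csv("RNAseq_ASD_datExpr.csv")
datMeta <- read_input_csv("RNAseq_ASD_datMeta.csv")

if (!is.null(datExpr)) print(dim(datExpr))
if (!is.null(datMeta)) print(dim(datMeta))
```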

Output

Preprocessed Input Data

new_SFARI_dataset: Dataframe with information about SFARI genes with the new annotation criteria (scores 1 to 3)
old_SFARI_dataset: Dataframe with information about SFARI genes with the original annotation criteria (scores 1 to 6)
NCBI_dataset: Dataframe with gene biotype annotation obtained from NCBI
GO_neuronal_dataset: Dataframe with gene annotations indicating whether each gene has a neuronal-related function in the Gene Ontology
Gandal_dataset: RData object containing the preprocessed and normalised gene expression data

WGCNA

modules_dataset: Dataframe indicating the module each gene belongs to
top_modules_by_Diagnosis: Dataframe indicating the modules most strongly correlated with Diagnosis, as well as their correlation values
top_modules_by_SFARI: Dataframe indicating the modules with the highest enrichment in SFARI Genes, as well as their enrichment and adjusted p-value
top_modules_enrichment: (not included in the drake workflow) Named list with the Enrichment results for all the modules with a strong correlation to Diagnosis or enriched in SFARI Genes
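
To illustrate what "enrichment in SFARI Genes" with an adjusted p-value can look like, here is a toy sketch using a hypergeometric over-representation test with Benjamini-Hochberg correction. This README does not state which test the repository uses, so the method, the function module_enrichment(), the column names and the toy data are all assumptions made for illustration.

```r
# Toy data (hypothetical): one row per gene, with its module and whether
# it is a SFARI gene.
genes <- data.frame(
  gene     = paste0("g", 1:12),
  module   = rep(c("A", "B", "C"), each = 4),
  is_sfari = c(TRUE, TRUE, TRUE, FALSE,
               FALSE, FALSE, TRUE, FALSE,
               FALSE, FALSE, FALSE, FALSE)
)

# Hypothetical helper: over-representation of SFARI genes in each module.
module_enrichment <- function(genes) {
  total_sfari <- sum(genes$is_sfari)
  total_other <- nrow(genes) - total_sfari
  res <- do.call(rbind, lapply(split(genes, genes$module), function(m) {
    k <- sum(m$is_sfari)
    data.frame(
      module  = m$module[1],
      size    = nrow(m),
      n_sfari = k,
      # P(X >= k) when drawing nrow(m) genes at random from all genes
      p_value = phyper(k - 1, total_sfari, total_other, nrow(m),
                       lower.tail = FALSE)
    )
  }))
  # Correct for testing several modules at once.
  res$adj_p_value <- p.adjust(res$p_value, method = "BH")
  res[order(res$adj_p_value), ]
}

print(module_enrichment(genes))
```

The result is sorted so that the modules with the strongest evidence of enrichment come first, which mirrors the idea of a "top modules" table.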

Classification Model

classification_dataset: Dataframe with the input data used for the classification models
biased_classification_model: Named list with the information from the biased classification model, including the predictions for each gene and the coefficients and performance metrics of the model
unbiased_classification_model: Named list with the information from the unbiased classification model. The list includes the same elements as biased_classification_model
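
Because both model outputs are named lists, a quick way to see what they contain is to list each element with its class. The sketch below uses a toy list whose element names and placeholder values are hypothetical; the actual names in biased_classification_model and unbiased_classification_model may differ.

```r
# Hypothetical helper: one row per list element, with its class.
describe_elements <- function(x) {
  data.frame(
    element = names(x),
    class   = vapply(x, function(e) class(e)[1], character(1)),
    stringsAsFactors = FALSE,
    row.names = NULL
  )
}

# Toy stand-in for a model output (placeholder names and values only).
toy_model <- list(
  predictions         = data.frame(gene = c("g1", "g2"), prob = c(0.8, 0.1)),
  coefficients        = c(feature_a = 0.5),
  performance_metrics = list(metric_a = 0.7)
)

print(describe_elements(toy_model))
```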
