Code repository for SFARI Genes and where to find them; classification modelling to identify genes associated with Autism Spectrum Disorder from RNA-seq data
biomedicalinformaticsgroup/sfariexpression
SFARI Genes and where to find them; classification modelling to identify genes associated with Autism Spectrum Disorder from RNA-seq data
All code is in R. The `drake` package is used to manage the workflow of the project, but the code can also be executed as a regular R script:
- The script `run.R` runs the project as a regular R script and saves the output in the `Results` folder.
- The script `make.R` runs the project using `drake` and saves the output in the `.drake` folder, where each output can be accessed by name using drake's `loadd()` function.
Note on using `drake`: drake provides many useful features, but it has two drawbacks in this project:
- Running the project with `run.R` is much faster than with `make.R` on computers with multiple cores. This is because some packages use the `parallel` package underneath, which doesn't work well with `clustermq`, the package `drake` uses to distribute the work.
- The Enrichment Analysis of the top modules is only available when running the code with `run.R`, because of compatibility issues between the `clusterProfiler` package and `drake`.
To run the project:
1. Clone this repository.
2. Download InputData (doi.org/10.7488/ds/2980).
3. Execute `run.R` or `make.R`, depending on whether or not you want the workflow to be run with `drake`.
Contents of InputData:
- genes_GO_annotations: Gene Ontology annotations for each gene
- krishnan_probability_score.xlsx: Krishnan's ASD probability score, downloaded from asd.princeton.edu
- NCBI_gene2ensembl_20_02_07gz: NCBI's mapping between gene symbols and Ensembl IDs
- NCBI_gene_info_20_02_07_.gz: Functional annotations of the genes
- RNAseq_ASD_datExpr.csv: Gene expression matrix, downloaded from mgandal's GitHub repository
- RNAseq_ASD_datMeta.csv: Metadata of the samples in the gene expression matrix, downloaded from mgandal's GitHub repository
- sanders_TADA_score.xlsx: Sanders TADA score, downloaded from [He et al., 2013](https://doi.org/10.1371/journal.pgen.1003671)
- SFARI_genes_01-03-2020.csv: SFARI Gene scores using the new scoring system
- SFARI_genes_08-29-2019.csv: SFARI Gene scores using the old scoring system
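Before running `run.R` or `make.R`, you may want to confirm that the download is complete. The snippet below is a minimal sketch and is not part of the repository. `check_input_data()` is a hypothetical helper, and it assumes the data were extracted to a folder named `InputData` in the repository root. It matches file names by prefix, because the exact names and extensions on disk may differ slightly from the listing.

```r
# Hypothetical helper (not part of the repository): reports which of the
# input files listed in the README are missing from a local folder.
# Assumption: the downloaded data live in "InputData" at the repository root.
check_input_data <- function(input_dir = "InputData") {
  # Prefixes of the expected files; extensions are not checked
  expected_prefixes <- c(
    "genes_GO_annotations",
    "krishnan_probability_score",
    "NCBI_gene2ensembl",
    "NCBI_gene_info",
    "RNAseq_ASD_datExpr",
    "RNAseq_ASD_datMeta",
    "sanders_TADA_score",
    "SFARI_genes_01-03-2020",
    "SFARI_genes_08-29-2019"
  )
  if (!dir.exists(input_dir)) {
    stop("Folder not found: ", input_dir)
  }
  present <- list.files(input_dir)
  # TRUE for each prefix matched by at least one file name in the folder
  found <- vapply(expected_prefixes, function(p) any(startsWith(present, p)),
                  logical(1))
  absent <- expected_prefixes[!found]
  if (length(absent) > 0) {
    warning("Missing input files: ", paste(absent, collapse = ", "))
  }
  # Return the missing prefixes invisibly (character(0) if none are missing)
  invisible(absent)
}
```

For example, `absent <- check_input_data("InputData")` returns an empty character vector when every expected file is present. Otherwise it warns and returns the prefixes that were not found.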
Outputs generated by the workflow:
- new_SFARI_dataset: Dataframe with information about SFARI genes using the new annotation criteria (scores 1 to 3)
- old_SFARI_dataset: Dataframe with information about SFARI genes using the original annotation criteria (scores 1 to 6)
- NCBI_dataset: Dataframe with gene biotype annotations obtained from NCBI
- GO_neuronal_dataset: Dataframe indicating whether each gene has a neuron-related function in the Gene Ontology
- Gandal_dataset: RData object containing the preprocessed and normalised gene expression data
- modules_dataset: Dataframe indicating the module each gene belongs to
- top_modules_by_Diagnosis: Dataframe indicating the modules with the highest relation to Diagnosis, together with their correlation values
- top_modules_by_SFARI: Dataframe indicating the modules with the highest enrichment in SFARI Genes, together with their enrichment and adjusted p-values
- top_modules_enrichment (not included in the `drake` workflow): Named list with the enrichment results for all the modules with a strong correlation to Diagnosis or enriched in SFARI Genes
- classification_dataset: Dataframe with the input data used for the classification models
- biased_classification_model: Named list with the information from the biased classification model, including the predictions for each gene and the coefficients and performance metrics of the model
- unbiased_classification_model: Named list with the information from the unbiased classification model; it contains the same elements as biased_classification_model