Code repository for SFARI Genes and where to find them; classification modelling to identify genes associated with Autism Spectrum Disorder from RNA-seq data


biomedicalinformaticsgroup/sfariexpression



Notes about this repository

All code is in R. The drake package is used to manage the workflow of the project, but the code can also be executed as a regular R script:

  • The script run.R runs the project as a regular R script and saves the output in the Results folder

  • The script make.R runs the project using drake and saves the output in the .drake folder, which can be accessed by name using drake's loadd() function
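
As a rough illustration of the second option, the targets stored in the .drake folder could be inspected from an R session started in the repository root once make.R has finished. The target names are the ones listed in the Output section of this README; the guard around the calls and the variable name top_by_sfari are conveniences of this sketch, not part of the repository's code.

```r
# Sketch only: assumes make.R has already been run from the current directory,
# so that the drake cache (the .drake folder) exists here.
if (requireNamespace("drake", quietly = TRUE) && dir.exists(".drake")) {
  library(drake)

  # loadd() places the target in the session under its own name
  loadd(modules_dataset)
  print(head(modules_dataset))

  # readd() returns the target's value so it can be assigned to any name
  top_by_sfari <- readd(top_modules_by_SFARI)

  # cached() lists the names of the targets stored in the cache
  print(cached())
} else {
  message("No drake cache found here; run make.R first or use the Results folder.")
}
```

loadd() is convenient for interactive work, while readd() avoids overwriting an object that already has the same name as a target.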


Note on using drake: drake provides many useful features, but it has two drawbacks in this project:

  • Running the project using run.R is much faster than with make.R on computers with multiple cores, because some packages use the parallel package underneath, which doesn't work well with clustermq, the package drake uses to distribute the work

  • The Enrichment Analysis of the top modules is only available when running the code using run.R, because of compatibility issues between the package clusterProfiler and drake


Running the code

  1. Clone this repository

  2. Download InputData from doi.org/10.7488/ds/2980

  3. Execute run.R or make.R, depending on whether you want the workflow to be run with drake or not
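
Assuming the repository has been cloned and InputData downloaded, step 3 could be launched from an R session in the repository root roughly as follows. The use_drake flag is a hypothetical name introduced for this sketch only; the two script names are the ones given in this README.

```r
# Hypothetical switch for this sketch: TRUE runs the drake workflow,
# FALSE runs the project as a regular R script
use_drake <- FALSE

# Both script names come from this README
script <- if (use_drake) "make.R" else "run.R"

if (file.exists(script)) {
  source(script)  # executes the chosen entry point
} else {
  message("Could not find ", script,
          "; set the working directory to the repository root.")
}
```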


InputData


  • genes_GO_annotations: Gene Ontology annotations for each gene

  • krishnan_probability_score.xlsx: Krishnan's ASD probability score downloaded from asd.princeton.edu

  • NCBI_gene2ensembl_20_02_07gz: NCBI's mapping between gene symbols and Ensembl IDs

  • NCBI_gene_info_20_02_07_.gz: Functional annotations of the genes

  • RNAseq_ASD_datExpr.csv: Gene expression matrix. Downloaded from mgandal's GitHub repository

  • RNAseq_ASD_datMeta.csv: Metadata of the samples from the gene expression matrix. Downloaded from mgandal's GitHub repository

  • sanders_TADA_score.xlsx: Sanders TADA score downloaded from He et al., 2013 (https://doi.org/10.1371/journal.pgen.1003671)

  • SFARI_genes_01-03-2020.csv: SFARI Gene scores using new scoring system

  • SFARI_genes_08-29-2019.csv: SFARI Gene scores using old scoring system
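
Before running the project, it may help to confirm that the download is complete. The sketch below assumes the files sit in a folder called InputData in the current working directory (the README does not state the exact location) and checks only the files whose names are listed above with a .csv or .xlsx extension.

```r
# Assumption: the downloaded files are in ./InputData
input_dir <- "InputData"

# Subset of the files listed in this README (those ending in .csv or .xlsx)
expected_files <- c(
  "krishnan_probability_score.xlsx",
  "RNAseq_ASD_datExpr.csv",
  "RNAseq_ASD_datMeta.csv",
  "sanders_TADA_score.xlsx",
  "SFARI_genes_01-03-2020.csv",
  "SFARI_genes_08-29-2019.csv"
)

# TRUE/FALSE for each expected file, named by file
present <- file.exists(file.path(input_dir, expected_files))
names(present) <- expected_files
missing_files <- expected_files[!present]

if (length(missing_files) > 0) {
  message("Missing from ", input_dir, ": ",
          paste(missing_files, collapse = ", "))
} else {
  message("All listed files found in ", input_dir)
}
```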


Output


Preprocessed Input Data

  • new_SFARI_dataset: Dataframe with information about SFARI genes with the new annotation criteria (scores 1 to 3)

  • old_SFARI_dataset: Dataframe with information about SFARI genes with the original annotation criteria (scores 1 to 6)

  • NCBI_dataset: Dataframe with gene biotype annotation obtained from NCBI

  • GO_neuronal_dataset: Dataframe with gene annotations indicating whether each gene has a neuronal-related function in the Gene Ontology

  • Gandal_dataset: RData object containing the preprocessed and normalised gene expression data


WGCNA

  • modules_dataset: Dataframe indicating the module each gene belongs to

  • top_modules_by_Diagnosis: Dataframe indicating the modules with the strongest correlation to Diagnosis, as well as their correlation values

  • top_modules_by_SFARI: Dataframe indicating the modules with the highest enrichment in SFARI Genes as well as their enrichment and adjusted p-value

  • top_modules_enrichment: (not included in the drake workflow) Named list with the Enrichment results for all the modules with a strong correlation to Diagnosis or enriched in SFARI Genes


Classification Model

  • classification_dataset: Dataframe with the input data used for the classification models

  • biased_classification_model: Named list with the information from the biased classification model, including the predictions for each gene and the coefficients and performance metrics of the model

  • unbiased_classification_model: Named list with the information from the unbiased classification model. The list includes the same elements as biased_classification_model
