Evan Staton edited this page Jan 21, 2022 · 16 revisions

This brief guide shows how to get GO term associations for your genes using HMMER2GO, and how to analyze the GO associations for a study set of genes using Ontologizer.

HMMER2GO

The script below will fetch the latest *Arabidopsis thaliana* coding sequences and run the full analysis. In order of processing, the script will:

  1. find the longest ORF for each sequence and translate them,
  2. search the translated ORFs for domain matches,
  3. download the protein domain to GO term mappings, and finally,
  4. tabulate the type and number of GO terms for each sequence.

Please note the comment below about the run time. You may want to try a smaller subset to test the commands, or simply read through the steps to understand what each command is doing. Once this process completes you should be able to explore the results with Ontologizer, as explained in the next section.
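If you want to test the commands on a smaller subset first, one option is to keep only the first few FASTA records of the compressed input with `zcat` and `awk`. A minimal sketch on toy data (the file names and record count here are made up for illustration; point the same pipeline at the real CDS file):

```shell
# Demo input: a tiny gzipped FASTA standing in for the real CDS file
printf '>g1\nATG\n>g2\nGGC\n>g3\nTTT\n' | gzip > demo_cds.fa.gz

# Keep only the first 2 FASTA records; awk counts headers and
# exits as soon as it sees the 3rd one
zcat demo_cds.fa.gz | awk '/^>/{n++} n>2{exit} {print}' > subset.fa

grep -c '^>' subset.fa   # number of records kept
```

The subset file can then be substituted for `$Athal_cdscmp` in the script below to exercise each step quickly.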

If you are using Docker, assume that the input data is under the current working directory in a directory called `work` and that the Pfam DB is in a directory called `/db`. The following Docker command mounts both of these directories:

```
docker run -it --name hmmer2go-con -v $(pwd)/work:/work:Z -v /db:/db:Z sestaton/hmmer2go
```

Ideally, the following script would be in the `work` directory. Inside the Docker container, change to the `/work` directory and run the script.

```bash
#!/bin/bash

wd=$(pwd)
Athal_cdscmp=Arabidopsis_thaliana.TAIR10.cds.all.fa.gz
Athal_base=$(echo ${Athal_cdscmp} | sed 's/\.fa.*//')
Athal_orfs=${Athal_base}_orfs.faa
pfam_out=${Athal_base}_orfs_Pfam-A.tblout
pfam_GO=${Athal_base}_orfs_Pfam-A_GO.tsv
pfam_GAF=${Athal_base}_orfs_Pfam-A_GO_mapping.gaf
pfam_db=`pwd`/db/pfam/latest/Pfam-A.hmm # Edit to the location of the models you want to use

## Set up Pfam-A DB
pfam_dir=$(dirname $pfam_db)
pfam_file=$(basename $pfam_db)
pfam_dl=${pfam_file}.gz
mkdir -p $pfam_dir && cd $pfam_dir
wget -q "http://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam35.0/Pfam-A.hmm.gz"
gunzip $pfam_dl
hmmpress $pfam_file
cd $wd

## Fetch latest A. thaliana CDS models
wget -q -O $Athal_cdscmp \
  "http://ftp.gramene.org/CURRENT_RELEASE/fasta/arabidopsis_thaliana/cds/Arabidopsis_thaliana.TAIR10.cds.all.fa.gz"

## Get longest ORF for each cDNA
time hmmer2go getorf --choose -i $Athal_cdscmp -o $Athal_orfs

## Find domain matches
time hmmer2go run -i $Athal_orfs -d $pfam_db -o $pfam_out -t 4

## Find Pfam -> GO term mapping
time hmmer2go mapterms -i $pfam_out -o $pfam_GO --map

## Create GAF file
time hmmer2go map2gaf -i $pfam_GO -o $pfam_GAF -s 'Arabidopsis thaliana'
```

This script will take about 9 hours to run on a small cloud instance (you can speed this up using multiple processors), so use `nohup` or your queuing system (e.g., `nohup bash hmmer2go.sh > hmmer2go.out 2>&1 &`). Note that this script fetches the latest Pfam models and creates the required database to run HMMER2GO. If you have an existing set of models you want to use, perhaps created with the `hmmer2go pfamsearch` command, edit the `pfam_db` line in the script above to point to the Pfam HMM models you want to use (these need to be formatted with `hmmpress`, which is handled by `hmmer2go pfamsearch --createdb`). The expected output should be:

```
32579 query sequences with 32579 GO terms mapped in file Arabidopsis_thaliana_GOterm_mapping.tsv.
```
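Totals like this can be double-checked directly from the mapping file. A minimal sketch with toy data, assuming only that the file is tab-separated with the query ID in the second column (the demo file name, IDs, and GO terms below are made up):

```shell
# Toy stand-in for the GO term mapping file: tab-separated,
# query ID in column 2, one row per GO term assignment
printf 'x\tAT1G01010\tGO:0003700\nx\tAT1G01020\tGO:0016021\nx\tAT1G01020\tGO:0005515\n' > demo_mapping.tsv

# Count distinct query sequences that received at least one GO term
cut -f2 demo_mapping.tsv | sort -u | wc -l
```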

Ontologizer

First, you will have to download Ontologizer and have Java installed. The following is a very simple example; in a real use case, the study set would be derived from some experimental result.

Generate the population set:

```
egrep -v "^\!" Arabidopsis_thaliana_GOterm_mapping.tsv | cut -f2 | sort > athal_popset.txt
```

For the sake of example, we will create a study set of IDs based on the description field (replace `<term description>` with any term of interest, for example, "ATP"):

```
egrep -v "^\!" Arabidopsis_thaliana_GOterm_mapping.tsv | grep <term description> | cut -f2 | sort > athal_studyset.txt
```
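Since an enrichment test compares the study set against the population, every study-set ID should also appear in the population set; `comm` on the sorted files gives a quick check. A small sketch with made-up IDs and file names:

```shell
# Toy population and study sets (sorted, one ID per line)
printf 'AT1G01010\nAT1G01020\nAT1G01030\n' > athal_popset_demo.txt
printf 'AT1G01020\n' > athal_studyset_demo.txt

# comm -23 prints lines unique to the first file; empty output means
# the study set is fully contained in the population set
comm -23 athal_studyset_demo.txt athal_popset_demo.txt
```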

With the population and study sets prepared, we can now run Ontologizer:

```bash
#!/bin/bash

# Download the same GO ontology used by HMMER2GO
wget http://purl.obolibrary.org/obo/go.obo

gaf=Arabidopsis_thaliana_GOterm_mapping.gaf

java -jar Ontologizer.jar \
     -s athal_studyset.txt \
     -a $gaf \
     -g go.obo \
     -p athal_popset.txt \
     -o athal_ontologizer
```