hivdb/codfreq

FASTQ-to-CodFreq pipeline for HIV-1 and SARS-CoV-2

Codon Frequency Table Format

The HIVDB Sequence Reads Interpretation Program accepts a codon frequency table stored in the CodFreq format. The CodFreq format consists of five columns:

  1. gene (PR, RT, or IN);
  2. position;
  3. total number of reads at this position;
  4. codon nucleotide triplet; and
  5. number of reads with this codon.
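To make the five-column layout concrete, here is a minimal Python sketch that reads CodFreq rows and computes each codon's share of the reads at its position. It assumes comma-separated values with an optional header row; the delimiter, the header handling, and the names CodFreqRow and parse_codfreq are illustrative assumptions, not part of the documented format.

```python
import csv
from typing import Iterable, NamedTuple


class CodFreqRow(NamedTuple):
    gene: str       # e.g. "PR", "RT" or "IN"
    position: int   # codon position within the gene
    total: int      # total reads covering this position
    codon: str      # nucleotide triplet, e.g. "CCT"
    count: int      # reads carrying this codon

    @property
    def fraction(self) -> float:
        """Share of this position's reads that carry this codon."""
        return self.count / self.total if self.total else 0.0


def parse_codfreq(lines: Iterable[str]) -> list[CodFreqRow]:
    """Parse CodFreq lines, ASSUMING comma-separated columns.

    A row whose position field is not an integer (such as a header line)
    is skipped; this is also an assumption about the file layout.
    """
    rows: list[CodFreqRow] = []
    for fields in csv.reader(lines):
        if not fields:
            continue  # blank line
        gene, position, total, codon, count = fields[:5]
        if not position.strip().isdigit():
            continue  # header or malformed row
        rows.append(CodFreqRow(gene, int(position), int(total), codon, int(count)))
    return rows
```

For example, the made-up rows PR,1,100,CCT,90 and PR,1,100,CCC,10 yield fractions of 0.9 and 0.1 for the two codons at PR position 1.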

Examples

This repository contains CodFreq files generated from publicly available SRA sequences. We have also included three selected files from studies that used Illumina sequencing. To analyze these files, first download one or more CodFreq example files, then submit them to the HIVDB Interpretation Program for analysis.

Create .codfreq files from .fastq/.fastq.gz files

  1. Install Docker CE (https://docs.docker.com/install/).

  2. Download script:

    sudo curl -sL https://raw.githubusercontent.com/hivdb/codfreq/main/bin-wrapper/align-all-docker -o /usr/local/bin/fastq2codfreq
    sudo chmod +x /usr/local/bin/fastq2codfreq
  3. Download alignment profiles:

    mkdir profiles
    curl -sL https://raw.githubusercontent.com/hivdb/codfreq/main/profiles/HIV1.json -o profiles/HIV1.json
    curl -sL https://raw.githubusercontent.com/hivdb/codfreq/main/profiles/SARS2.json -o profiles/SARS2.json
  4. Use the following command to process FASTQ files and generate CodFreq files.

    fastq2codfreq -r profiles/HIV1.json -d path/to/fastq/folders

    The script automatically finds every file with a .fastq (or .fastq.gz) extension, aligns its reads to a .sam file, and then extracts the codon frequency table into a .codfreq file.

    The above command is adequate in most cases, for both paired and unpaired FASTQ files generated by Illumina whose filenames follow patterns like *_L001_R1_001.fastq.gz and *_L001_R2_001.fastq.gz. However, if your FASTQ files follow a different naming convention, please read Advanced usages § Manually pairing FASTQ files.
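The file discovery step can be approximated in a few lines of Python. This sketch only illustrates the idea and is not fastq2codfreq's implementation; both the recursive search below the -d folder and the helper name find_fastq_files are assumptions.

```python
from pathlib import Path

# Assumption: matching files may sit anywhere below the folder passed to -d,
# so the search is recursive. Illustration only, not fastq2codfreq's code.
FASTQ_SUFFIXES = (".fastq", ".fastq.gz")


def find_fastq_files(root: str) -> list[Path]:
    """Return every .fastq / .fastq.gz file under root, sorted by path."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.name.endswith(FASTQ_SUFFIXES)
    )
```

str.endswith accepts a tuple of suffixes, so compressed and uncompressed files are matched in a single test.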

Note: the fastq2codfreq script can only be executed on a Unix-like system. If you are using Microsoft Windows 10, you need to install the Windows Subsystem for Linux to use this script.

Offline usage

The fastq2codfreq command can be used offline, although the usage differs slightly from the description above. The differences are:

  • Docker's installation package, the fastq2codfreq script, and the alignment profiles can be transferred to the offline server using a portable drive.
  • The Docker image used by fastq2codfreq can be saved into a binary file and transferred to the offline server using a portable drive:
    # Run this command on a computer with Internet access
    docker save hivdb/codfreq-runner:latest | gzip > codfreq-runner.tar.gz
    # Run this command on the offline server
    docker load < codfreq-runner.tar.gz
  • The auto-update option of fastq2codfreq should also be disabled with the -s argument:
    fastq2codfreq -s -r profiles/HIV1.json -d path/to/fastq/folders

Advanced usages

Disable auto-pairing FASTQ files

The flag argument -m can be added to the fastq2codfreq command to disable auto-pairing of FASTQ files.

fastq2codfreq -m -r profiles/HIV1.json -d path/to/fastq/folders

Manually pairing FASTQ files

For paired FASTQ files, the process generates a single CodFreq file per pair. The program tries to match FASTQ files with similar names as pairs. To change this behavior, a pairinfo.json file can be supplied in the same folder that contains the FASTQ files. An example file is provided at examples/pairinfo.json.
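The sketch below illustrates one way name-based pairing can work for Illumina-style file names, where mates differ only in R1/R2. It is a hypothetical helper (pair_fastq_names and its regular expression are assumptions), not the matching rule the pipeline actually uses; when automatic matching fails, supply pairinfo.json following examples/pairinfo.json rather than relying on this logic.

```python
import re
from collections import defaultdict

# Hypothetical pattern for names like S1_L001_R1_001.fastq.gz, where mates
# differ only in the read number. Not the pipeline's actual matching rule.
ILLUMINA_READ = re.compile(
    r"^(?P<prefix>.+)_R(?P<read>[12])(?P<suffix>_\d{3}\.fastq(?:\.gz)?)$"
)


def pair_fastq_names(names: list[str]) -> list[tuple[str, ...]]:
    """Group file names into (R1, R2) pairs; others become 1-tuples."""
    groups = defaultdict(dict)  # (prefix, suffix) -> {"1": name, "2": name}
    unmatched = []
    for name in names:
        match = ILLUMINA_READ.match(name)
        if match is None:
            unmatched.append((name,))  # treated as unpaired
            continue
        groups[(match["prefix"], match["suffix"])][match["read"]] = name
    paired = [
        tuple(mates[read] for read in sorted(mates))  # R1 before R2
        for _, mates in sorted(groups.items())
    ]
    return paired + unmatched
```

A lone R1 file with no R2 mate comes back as a 1-tuple, i.e. it is treated as unpaired.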

Customize fastp options

By default, the program fastp is used to trim adapters and to filter out low-quality regions and reads that are too short. examples/fastp-config.json lists all fastp options supported by this pipeline. Please refer to fastp's documentation for the usage and explanation of these options.

To apply your customized settings, create a fastp-config.json file and save it in the same folder that contains the FASTQ files. You can also disable adapter trimming, low Phred quality filtering, or length filtering by setting the corresponding disabling flags to true.
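A small Python helper can produce such a file. It is a sketch under a stated assumption: the disabling-flag keys are guessed from fastp's command-line options --disable_adapter_trimming, --disable_quality_filtering and --disable_length_filtering, and write_fastp_config is a hypothetical name. Compare the keys against examples/fastp-config.json before relying on them.

```python
import json
from pathlib import Path
from typing import Iterable

# ASSUMPTION: these keys mirror fastp's long command-line options. Check
# examples/fastp-config.json for the exact keys this pipeline accepts.
DISABLE_FLAGS = (
    "disable_adapter_trimming",
    "disable_quality_filtering",
    "disable_length_filtering",
)


def write_fastp_config(fastq_dir: str, base: dict, disable: Iterable[str] = ()) -> Path:
    """Copy base options, set the requested disabling flags to true, and save
    the result as fastp-config.json in the folder holding the FASTQ files."""
    config = dict(base)
    for flag in disable:
        if flag not in DISABLE_FLAGS:
            raise ValueError(f"unknown disabling flag: {flag}")
        config[flag] = True
    path = Path(fastq_dir) / "fastp-config.json"
    path.write_text(json.dumps(config, indent=2) + "\n")
    return path
```

Unknown flag names are rejected before anything is written, so a typo cannot silently produce a config that fastp ignores.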

Primer trimming - FASTA

The CodFreq pipeline supports trimming FASTA-format primer sequences using cutadapt. examples/cutadapt-config.json lists all cutadapt options supported by this pipeline. Please refer to cutadapt's reference guide for the usage and explanation of these options.

Three types of optional FASTA primer files can be supplied in the same folder that contains the FASTQ files: primers3.fa, primers5.fa, and primers53.fa, which correspond to the “3’ adapters”, “5’ adapters”, and “5’ or 3’ adapters” described in cutadapt's user guide.

To enable primer trimming (FASTA), you must create a valid cutadapt-config.json file in the same folder that contains the FASTQ files.
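Before running the pipeline, a quick check of the FASTQ folder can confirm that the files described above are in place. In this hypothetical helper (fasta_primer_setup is not part of the pipeline), a config counts as valid only if it parses as a JSON object; the pipeline's own validation may be stricter.

```python
import json
from pathlib import Path

# Primer file names and the cutadapt adapter types they correspond to.
PRIMER_FASTA = {
    "primers3.fa": "3' adapters",
    "primers5.fa": "5' adapters",
    "primers53.fa": "5' or 3' adapters",
}


def _is_json_object(path: Path) -> bool:
    """Loose validity check (an assumption): file exists and holds a JSON object."""
    try:
        return isinstance(json.loads(path.read_text()), dict)
    except (OSError, ValueError):  # missing file or malformed JSON
        return False


def fasta_primer_setup(fastq_dir: str) -> dict:
    """Summarize the FASTA primer-trimming inputs found in fastq_dir."""
    folder = Path(fastq_dir)
    primer_files = {
        name: kind for name, kind in PRIMER_FASTA.items() if (folder / name).is_file()
    }
    return {
        "primer_files": primer_files,
        # Trimming is only enabled when cutadapt-config.json is present and valid.
        "config_ok": _is_json_object(folder / "cutadapt-config.json"),
    }
```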

Primer trimming - BED

The CodFreq pipeline supports trimming BED-format primer locations using ivar. examples/ivar-trim-config.json lists all ivar trim options supported by this pipeline. Please refer to ivar's manual for the usage and explanation of these options.

A BED primer file named primers.bed (example: examples/primers.bed) can be supplied in the same folder that contains the FASTQ files. ivar requires the BED6 format: a tab-delimited file with the following six columns (no header): reference, start, end, name, score, and strand. We have reviewed the ivar 4.1 source code and confirmed that ivar uses only four of these columns: start, end, name, and strand. The other two (reference and score) can be given any values, as long as they are present to complete the BED6 format.

To enable primer trimming (BED), you must create a valid ivar-trim-config.json file in the same folder that contains the FASTQ files.
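The BED6 layout described in this section can be generated from a list of primer coordinates. This is a minimal sketch: the Primer type, the to_bed6 name and the placeholder reference name "ref" are assumptions, and coordinates follow the usual BED convention (0-based start, exclusive end). Since, according to the ivar review noted above, only start, end, name and strand are used, the reference and score columns are filled with placeholders.

```python
from typing import Iterable, NamedTuple


class Primer(NamedTuple):
    start: int   # 0-based start (BED convention)
    end: int     # exclusive end
    name: str
    strand: str  # "+" for forward primers, "-" for reverse primers


def to_bed6(primers: Iterable[Primer], reference: str = "ref", score: int = 0) -> str:
    """Render primers as headerless, tab-delimited BED6 text.

    reference and score are placeholders (hypothetical defaults); they are
    written only so that every line has the six columns BED6 requires.
    """
    lines = []
    for p in primers:
        if p.strand not in ("+", "-"):
            raise ValueError(f"invalid strand for {p.name}: {p.strand!r}")
        if not 0 <= p.start < p.end:
            raise ValueError(f"invalid interval for {p.name}: {p.start}-{p.end}")
        fields = (reference, p.start, p.end, p.name, score, p.strand)
        lines.append("\t".join(str(field) for field in fields) + "\n")
    return "".join(lines)
```

Writing the returned string to primers.bed in the FASTQ folder produces the file this section describes.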

Other tools

Consolidate codon frequency table to amino acid frequency table

A script that uses only the Python standard library is provided to consolidate a codon frequency table (.codfreq or .codfreq.gz file) into an amino acid frequency table (.aafreq.csv file). The script merges the rows of codons that translate to the same amino acid.

This script requires Python 3.9 or higher. The required Python runtime is included in the latest versions of macOS and most Linux releases. To install the latest Python version, please follow the instructions on the official website.

To use this script:

  1. Download the script:

    sudo curl -sL https://raw.githubusercontent.com/hivdb/codfreq/main/scripts/codfreq2aafreq.py -o /usr/local/bin/codfreq2aafreq
    sudo chmod +x /usr/local/bin/codfreq2aafreq
  2. Run the script:

    codfreq2aafreq dir/to/read/codfreqs dir/to/write/aafreqs
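The merge performed by codfreq2aafreq can be illustrated with a short Python sketch based on the standard genetic code. This is not the script's source: consolidate is a hypothetical name, rows are taken as already-parsed (gene, position, total, codon, count) tuples, and mapping any codon outside the 64 standard triplets (for example, one containing a deletion or an ambiguous base) to X is an assumption about how such rows might be handled.

```python
from collections import defaultdict
from typing import Iterable

# Standard genetic code (NCBI translation table 1): the 64 codons enumerated
# in T, C, A, G order at each position, paired with their amino acids.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)

Row = tuple[str, int, int, str, int]  # (gene, position, total, codon, count)


def consolidate(rows: Iterable[Row]) -> list[Row]:
    """Sum codon counts that translate to the same amino acid at each position.

    Output tuples carry an amino acid letter in place of the codon.
    """
    counts = defaultdict(lambda: defaultdict(int))  # (gene, pos) -> aa -> reads
    totals = {}
    for gene, position, total, codon, count in rows:
        key = (gene, position)
        aa = CODON_TABLE.get(codon.upper(), "X")  # assumption: non-standard -> "X"
        counts[key][aa] += count
        totals[key] = total  # same total on every row of a position
    return [
        (gene, position, totals[(gene, position)], aa, n)
        for (gene, position), by_aa in sorted(counts.items())
        for aa, n in sorted(by_aa.items())
    ]
```

For instance, CCT and CCC both encode proline, so their read counts at the same position end up in a single P row.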
