A Python package for studying Acidobacteria

sap218/acidoseq

Studying Acidobacteria reads from a Nanopore metagenomic dataset | Python v3.5 | PyPI

Author: Samantha C Pendleton, Data Science MSc, Aberystwyth University (Twitter | GitHub)

Follow the Twitter bot I created, acido_bot, which dispenses daily facts about Acidobacteria!

The GC content of Acidobacteria genomes is consistent with their placements: species in the same subdivision have similar GC content (above 60% for group V fragments and roughly 10% lower for group III fragments), which displays the diversity within the phylum [1]. How the abundance of a subdivision relates to pH depends on the subdivision: subdivisions 1, 2, 3, 12 and 13 become less abundant as pH increases, whilst subdivisions 4, 6, 7, 10, 11, 16, 17, 18, 22 and 25 are sparse at low pH and become more abundant as pH increases [2].
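
GC content is the proportion of G and C bases in a sequence, so the percentages above can be computed per read. A minimal sketch, assuming only unambiguous A/C/G/T bases are counted (ambiguous bases such as N are skipped); acidoseq's own calculation may differ:

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G+C among unambiguous bases (A, C, G, T).

    Ambiguous bases (e.g. N) are skipped; this is an assumption of the sketch.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return gc / total


# A read with 4 G/C bases out of 6 has a GC content of about 66.7%.
print(f"{gc_content('GGCCAT'):.1%}")  # 66.7%
```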

This package takes a collection of reads and gathers those that a Kaiju output assigns to Acidobacteria. It reports various statistics and produces GC plots. Furthermore, the unclassified Acidobacteria reads are grouped into subdivisions based on the pH of the soil sample, and visualised.
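
The pH relationships described above can be encoded as a simple lookup that picks which subdivisions to expect for a given soil sample. This is a sketch only: the subdivision lists come from the text, but the `cutoff` separating "low" from "high" pH is a hypothetical value chosen for illustration, not a rule taken from acidoseq.

```python
# Subdivisions whose abundance falls or rises as soil pH increases [2].
DECREASE_WITH_PH = (1, 2, 3, 12, 13)
INCREASE_WITH_PH = (4, 6, 7, 10, 11, 16, 17, 18, 22, 25)


def expected_subdivisions(ph: float, cutoff: float = 5.5) -> tuple:
    """Return the subdivisions expected to be more abundant at this soil pH.

    `cutoff` is a HYPOTHETICAL boundary between low and high pH used for
    illustration; acidoseq may use a different rule.
    """
    if not 0 <= ph <= 14:
        raise ValueError(f"pH must be between 0 and 14, got {ph}")
    return DECREASE_WITH_PH if ph < cutoff else INCREASE_WITH_PH


print(expected_subdivisions(4.92))  # (1, 2, 3, 12, 13)
print(expected_subdivisions(7.14))  # (4, 6, 7, 10, 11, 16, 17, 18, 22, 25)
```

The two calls use the pH values from the usage examples. Since `--ph` is declared as TEXT in the CLI, a value read from the command line would need converting with `float()` first.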

Introduction

Kaiju output provides the taxon ID of each sequence; my package outputs the Acidobacteria species alongside annotation, plots, and information on the unclassified reads.
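
Selecting the Acidobacteria reads amounts to filtering the two-column Kaiju CSV by taxon ID. A minimal sketch, assuming the caller already has a set of NCBI taxon IDs counted as Acidobacteria (for example gathered from the NCBI taxonomy dump); the function name and the placeholder IDs in the example are hypothetical:

```python
import csv
import io


def acidobacteria_read_ids(csv_lines, acido_taxids):
    """Yield sequence IDs whose Kaiju taxon ID is in `acido_taxids`.

    csv_lines: "sequence_id,taxon_id" rows (the edited Kaiju CSV).
    acido_taxids: set of NCBI taxon IDs (as strings) treated as Acidobacteria.
    """
    for row in csv.reader(csv_lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        seq_id, taxid = row[0], row[1].strip()
        if taxid in acido_taxids:
            yield seq_id


# Placeholder taxon IDs, for illustration only.
example = io.StringIO("read1,111\nread2,999\nread3,222\n")
print(list(acidobacteria_read_ids(example, {"111", "222"})))  # ['read1', 'read3']
```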

Prerequisites
  • FASTA file of all the reads.
  • Kaiju output reduced to two columns: sequence ID and NCBI taxon ID.
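
acidoseq imports pysam, which has its own FASTA readers. For a quick look at the reads without extra dependencies, a minimal standard-library parser could look like this (a hypothetical helper, assuming the read ID is the first word of each header line):

```python
import io


def read_fasta(handle):
    """Yield (sequence_id, sequence) pairs from a FASTA text stream.

    The ID is the first whitespace-separated word of the header line;
    sequence lines are joined so multi-line records are supported.
    """
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].split()
            seq_id, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


example = io.StringIO(">read1 len=8\nACGT\nGGCC\n>read2\nATAT\n")
print(dict(read_fasta(example)))  # {'read1': 'ACGTGGCC', 'read2': 'ATAT'}
```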
Dependencies
import os
import csv
import pysam
import collections
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import random
from termcolor import colored
from colorama import init
import click

$ pip3 install matplotlib
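
Only matplotlib is installed explicitly above, but pysam, termcolor, colorama and click are imported as well and may also need installing if pip does not pull them in. A small check like this (a convenience sketch, not part of acidoseq) lists whichever are missing:

```python
import importlib.util

# Third-party top-level modules from the import list above
# (os, csv, collections and random are part of the standard library).
THIRD_PARTY = ["pysam", "matplotlib", "termcolor", "colorama", "click"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(THIRD_PARTY)
print("Missing:", ", ".join(missing) if missing else "none")
```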

Installation

Git clone

$ git clone https://github.com/sap218/acidoseq.git

pip

$ pip install acidoseq

Kaiju

I used columns 2 and 3 of the Kaiju output, which contain the sequence IDs and the NCBI taxon IDs.

  1. Filter the output to keep only classified reads:
     $ awk '$1 == "C"' kaiju.out > kaijuC.out
  2. Cut columns 2 and 3 (sequence ID and taxon ID):
     $ cut -f2,3 kaijuC.out > results.txt
  3. Convert the TXT to CSV (comma-delimited):
     $ sed 's/\s\+/,/g' results.txt > result_seqid_taxon.csv
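
The three shell steps can also be done in one pass in Python. A sketch, assuming Kaiju's tab-separated output with the classification status (C or U) in column 1, the read name in column 2 and the NCBI taxon ID in column 3; the function names are hypothetical, and unlike the `sed` step it splits on tabs only, so read names containing spaces stay intact:

```python
import csv


def classified_rows(kaiju_lines):
    """Yield (sequence_id, taxon_id) for reads Kaiju marked as classified ("C")."""
    for line in kaiju_lines:
        fields = line.rstrip("\n").split("\t")
        # Same filter as: awk '$1 == "C"'
        if len(fields) >= 3 and fields[0] == "C":
            # Same columns as: cut -f2,3
            yield fields[1], fields[2]


def kaiju_to_csv(kaiju_path, csv_path):
    """Write the classified rows as comma-separated "sequence_id,taxon_id" lines."""
    with open(kaiju_path) as src, open(csv_path, "w", newline="") as dst:
        csv.writer(dst, lineterminator="\n").writerows(classified_rows(src))
```

For example, `kaiju_to_csv("kaiju.out", "result_seqid_taxon.csv")` would produce the same file name as the shell steps.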

Map

If you are unsure of the pH of your soil samples, you may want to run the map script first; the default city is Aberystwyth.

Please note: because the Earth is spherical and maps are two-dimensional, plotted locations will be somewhat distorted.

$ acidomap --city Birmingham

Usage

The CLI needs the Kaiju and FASTA files; all other options have defaults (e.g. pH = 5).

If no plot style is provided, or an invalid one is entered, a random style is chosen.
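
One way to implement the random fallback is to check the requested name against the available styles. A sketch, assuming the list of styles is passed in (with matplotlib it would be `plt.style.available`) so the function itself only needs the standard library; `choose_style` is a hypothetical name:

```python
import random


def choose_style(requested, available):
    """Return `requested` if it is a known style, else a random known style.

    `available` would typically be matplotlib's `plt.style.available`;
    it is passed in so this sketch does not depend on matplotlib.
    """
    styles = list(available)
    if not styles:
        raise ValueError("no plot styles available")
    if requested in styles:
        return requested
    # Missing or misspelled style: fall back to a random one.
    return random.choice(styles)
```

It could then be applied with `plt.style.use(choose_style(style, plt.style.available))`. Newer matplotlib releases renamed the `seaborn-*` styles to `seaborn-v0_8-*`, so some names in the `--style` list of the help output may fall through to a random choice there.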

Run as follows on Linux (instructions for running on other operating systems are linked from the repository):

$ acidoseq --help
Usage: acidoseq [OPTIONS]

Options:
  --taxdumptype TEXT  Study "ALL" or only unclassified "U"?
  --kaijufile TEXT    Place edited Kaiju (csv) in directory for ease.
  --fastapath TEXT    Place FASTA in directory for ease.
  --style TEXT        ['seaborn-bright', 'seaborn-poster', 'seaborn-white',
                      'bmh', 'seaborn-darkgrid', 'seaborn-pastel',
                      'grayscale', '_classic_test', 'ggplot',
                      'seaborn-whitegrid', 'seaborn-dark', 'seaborn-muted',
                      'seaborn-colorblind', 'seaborn-ticks', 'Solarize_Light2',
                      'seaborn-notebook', 'dark_background', 'fast',
                      'seaborn', 'fivethirtyeight', 'seaborn-paper',
                      'seaborn-dark-palette', 'seaborn-talk', 'classic',
                      'seaborn-deep']
  --plottype TEXT     "span" range of GC means OR "line" average mean GC
  --ph TEXT           pH of soil, use map script for assistance.
  --help              Show this message and exit.
Examples

$ acidoseq --kaijufile result_seqid_taxon.csv --fastapath all.fa

$ acidoseq --taxdumptype ALL --kaijufile result_seqid_taxon.csv --fastapath all.fa --style ggplot --plottype span --ph 4.92

$ acidoseq --taxdumptype U --kaijufile result_seqid_taxon.csv --fastapath all.fa --style seaborn --plottype line --ph 7.14

Output

  • FASTA file: a collection of reads which were identified as Acidobacteria
  • Plot comparing the AT and GC ratios, with means
  • In-depth plot of GC ratio with subdivisions labelled (regions with 'span' and means with 'line')
  • Separate FASTA files of the unclassified reads assigned to subdivisions based on the pH, e.g. a file of the sequences that fall within the subdivision 1 GC span when the pH is low
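
The last output can be pictured as binning reads by GC span. The sketch below assumes each subdivision has a (low, high) GC range and that only the subdivisions favoured at the sample's pH are considered; the spans in the example are HYPOTHETICAL placeholders, not acidoseq's values, and the package may assign reads differently:

```python
from collections import defaultdict


def gc_fraction(seq):
    """Fraction of G/C among A/C/G/T bases (0.0 if there are none)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0


def assign_by_gc(reads, spans, allowed):
    """Group read IDs by the first allowed subdivision whose GC span contains them.

    reads: iterable of (sequence_id, sequence) pairs.
    spans: {subdivision: (low, high)} GC fractions.
    allowed: subdivisions considered at this pH, in priority order.
    Reads matching no allowed span are collected under None.
    """
    groups = defaultdict(list)
    for seq_id, seq in reads:
        gc = gc_fraction(seq)
        match = next((sub for sub in allowed
                      if sub in spans and spans[sub][0] <= gc <= spans[sub][1]),
                     None)
        groups[match].append(seq_id)
    return dict(groups)


# HYPOTHETICAL spans for illustration only.
example_spans = {1: (0.55, 0.65), 3: (0.45, 0.55)}
reads = [("r1", "GGGCCCATAT"), ("r2", "GGCCATATAT"), ("r3", "GGCCCATATA")]
print(assign_by_gc(reads, example_spans, (1, 2, 3, 12, 13)))
# {1: ['r1'], None: ['r2'], 3: ['r3']}
```

Each group could then be written out as its own FASTA file, one per subdivision.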

Acknowledgements

Thank you! 🌱

Don't hesitate to create an issue or make a suggestion!

Todo List
  • Make available
  • Improve descriptions and comments
  • Look into command line interface
  • Fix code to output unclassified subdivisions based on pH
  • Alter code so the input file can be the original Kaiju output
  • Make available on Conda
References

[1] Quaiser, A., Ochsenreiter, T., Lanz, C., Schuster, S. C., Treusch, A. H., Eck, J., & Schleper, C. (2003). Acidobacteria form a coherent but highly diverse group within the bacterial domain: evidence from environmental genomics. Molecular Microbiology, 50(2), 563-575.

[2] Eichorst, S. A., Breznak, J. A., & Schmidt, T. M. (2007). Isolation and characterization of soil bacteria that define Terriglobus gen. nov., in the phylum Acidobacteria. Applied and Environmental Microbiology, 73(8), 2708-2717.
