Easy Manipulation of Multiple Sequence Alignments (Concatenation and Format Conversion)


kyungtaekLIM/seqlim


Concatenate and Convert Multiple Sequence Alignments

Description

seqlim includes a Python library and an executable for manipulating biological sequences. It concatenates multiple sequence alignments (MSAs) horizontally or vertically, and converts MSAs into various formats (FASTA, PHYLIP, NEXUS, MSF, TSV, and CSV). Horizontal concatenation of MSAs is often used for multi-locus (multi-gene) phylogenetic analysis and phylogenomics.
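To make the two concatenation modes concrete, here is a minimal sketch in plain Python. It does not use the seqlim library; the function names `cat_horizontal` and `cat_vertical` and the toy sequences are made up for illustration. It assumes that horizontal concatenation pairs sequences by their position in each file and keeps the names from the first file, which is what the executable examples in this README suggest.

```python
def cat_horizontal(alignments):
    """Join alignments column-wise (hypothetical helper, not seqlim's API).

    Each alignment is a list of (name, sequence) tuples. The i-th sequence
    of every alignment is appended to the i-th row, and the names of the
    first alignment are kept.
    """
    names = [name for name, _ in alignments[0]]
    rows = [seq for _, seq in alignments[0]]
    for aln in alignments[1:]:
        if len(aln) != len(rows):
            raise ValueError("all alignments must contain the same number of sequences")
        rows = [row + seq for row, (_, seq) in zip(rows, aln)]
    return list(zip(names, rows))


def cat_vertical(alignments):
    """Stack alignments row-wise: all records, one file after another."""
    return [record for aln in alignments for record in aln]


# Toy data (assumption): two loci with two taxa each.
locus1 = [("taxonA", "ACGU"), ("taxonB", "AC-U")]
locus2 = [("taxonA", "GG"), ("taxonB", "GA")]

for name, seq in cat_horizontal([locus1, locus2]):
    print(">" + name)
    print(seq)
```

Horizontal concatenation makes each row longer (more columns per taxon), while vertical concatenation makes the alignment taller (more records).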

Installation

  • Install Python 2.7 or higher. Python installers are available at https://www.python.org/.
  • Clone or download this repo and install using setup.py.
$ python setup.py install
  • Confirm that the seqlim executable is installed.
$ seqlim -h
  • Confirm that the seqlim library is installed.
$ python
>>> from seqlim import MSeq
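The two checks above can also be scripted. The following is a small sketch, assuming Python 3 (`shutil.which` is not available in Python 2.7); the function name `check_seqlim_install` is hypothetical. It only looks for the `seqlim` executable on the PATH and for an importable `seqlim` module, without running or importing either.

```python
import importlib.util
import shutil


def check_seqlim_install():
    """Return (executable_found, library_found) as two booleans.

    Hypothetical helper: it checks the PATH and the import system only,
    and does not verify that seqlim actually works.
    """
    executable_found = shutil.which("seqlim") is not None
    library_found = importlib.util.find_spec("seqlim") is not None
    return executable_found, library_found


if __name__ == "__main__":
    exe_ok, lib_ok = check_seqlim_install()
    print("seqlim executable on PATH:", exe_ok)
    print("seqlim library importable:", lib_ok)
```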

Executable examples

  • Suppose there are two sequence files in FASTA format in ./test/fasta.
`Locus1.fasta`
>Escheri1
CCUGGCGGCCGUAGCGCGGUGGUCCCACCUGACCCCAUGCCGAACUCAGAAGUGAAAC
>Enteroc1
UGUGGUGGCGAUAGCGAGAAGGAUACACCUGUUCCCAUGCCGAACACAGAAGUUAAGC
`Locus2.fasta`
>Escheri2
UAGCGCCGAUGGUAGUGUGGGGUCUCCCCAUGCGAGAGUAGGGAACU--GCCAGGC
>Enteroc2
UAGCGCCGAUUGUAGUGAAGGGUUUCCCUUUGUGAGAGUAGG--ACGUCGCCACGC
  • Concatenate these files horizontally.
$ seqlim cath ./test/fasta
>Escheri1
CCUGGCGGCCGUAGCGCGGUGGUCCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACUAGCGCCGAUGGUAGUGUGGGGUCUCCCCAUGCGAGAGUAGGGAACU--GCCAGGC
>Enteroc1
UGUGGUGGCGAUAGCGAGAAGGAUACACCUGUUCCCAUGCCGAACACAGAAGUUAAGCUAGCGCCGAUUGUAGUGAAGGGUUUCCCUUUGUGAGAGUAGG--ACGUCGCCACGC
  • Concatenate the files vertically.
$ seqlim catv ./test/fasta
>Escheri1
CCUGGCGGCCGUAGCGCGGUGGUCCCACCUGACCCCAUGCCGAACUCAGAAGUGAAAC
>Enteroc1
UGUGGUGGCGAUAGCGAGAAGGAUACACCUGUUCCCAUGCCGAACACAGAAGUUAAGC
>Escheri2
UAGCGCCGAUGGUAGUGUGGGGUCUCCCCAUGCGAGAGUAGGGAACU--GCCAGGC
>Enteroc2
UAGCGCCGAUUGUAGUGAAGGGUUUCCCUUUGUGAGAGUAGG--ACGUCGCCACGC
  • Set the input sequence format with -infmt. seqlim accepts 'fasta', 'fas', 'mfa', 'fna', 'fsa' or 'fa' for FASTA format, 'phylip' or 'phy' for PHYLIP format, and 'msf' for MSF format.
$ seqlim -infmt phylip cath ./test/phylip
>Escheri1
CCUGGCGGCCGUAGCGCGGUGGUCCCACCUGACCCCAUGCCGAACUCAGAAGUGAAAC
>Enteroc1
UGUGGUGGCGAUAGCGAGAAGGAUACACCUGUUCCCAUGCCGAACACAGAAGUUAAGC
>Escheri2
UAGCGCCGAUGGUAGUGUGGGGUCUCCCCAUGCGAGAGUAGGGAACU--GCCAGGC
>Enteroc2
UAGCGCCGAUUGUAGUGAAGGGUUUCCCUUUGUGAGAGUAGG--ACGUCGCCACGC
  • Set the output sequence format with -outfmt. seqlim accepts 'fasta', 'fas', 'mfa', 'fna', 'fsa' or 'fa' for FASTA format, 'phylip' or 'phy' for PHYLIP format, 'nexus', 'nex' or 'nxs' for NEXUS format, 'msf' for MSF format, 'csv' for CSV format, and 'tsv' for TSV format.
$ seqlim -outfmt phylip cath ./test/fasta
 2 114
Escheri1     CCUGGCGGCC GUAGCGCGGU GGUCCCACCU GACCCCAUGC CGAACUCAGA AGUGAAACUA
Enteroc1     UGUGGUGGCG AUAGCGAGAA GGAUACACCU GUUCCCAUGC CGAACACAGA AGUUAAGCUA
             GCGCCGAUGG UAGUGUGGGG UCUCCCCAUG CGAGAGUAGG GAACU--GCC AGGC
             GCGCCGAUUG UAGUGAAGGG UUUCCCUUUG UGAGAGUAGG --ACGUCGCC ACGC
  • The line and block lengths of sequences can be adjusted using -line_length and -block_length, respectively.
$ seqlim -outfmt phylip -line_length 50 -block_length 5 cath ./test/fasta
 2 114
Escheri1     CCUGG CGGCC GUAGC GCGGU GGUCC CACCU GACCC CAUGC CGAAC UCAGA
Enteroc1     UGUGG UGGCG AUAGC GAGAA GGAUA CACCU GUUCC CAUGC CGAAC ACAGA
             AGUGA AACUA GCGCC GAUGG UAGUG UGGGG UCUCC CCAUG CGAGA GUAGG
             AGUUA AGCUA GCGCC GAUUG UAGUG AAGGG UUUCC CUUUG UGAGA GUAGG
             GAACU --GCC AGGC
             --ACG UCGCC ACGC
  • Save the output to a file with -o.
$ seqlim -o ./test/temp/concatenated.fasta cath ./test/fasta
  • Convert the format of a single file without concatenation.
$ seqlim -outfmt phylip -o ./test/temp/converted.phylip cnvt ./test/fasta/locus1.fasta
  • Convert all sequence files in ./test/fasta to another format (phylip) and save them in ./test/phylip.
$ seqlim -o ./test/phylip -outfmt phylip cnvt ./test/fasta
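The effect of -line_length and -block_length on PHYLIP output can be illustrated with a short sketch in plain Python. This is not seqlim's implementation; the function name `format_phylip`, the 13-character name column, and the leading space in the header line are assumptions inferred from the example output above. Here `line_length` is the number of residues per line and `block_length` is the number of residues between spaces.

```python
def format_phylip(records, line_length=60, block_length=10, name_width=13):
    """Lay out (name, sequence) records as interleaved PHYLIP text.

    Hypothetical helper for illustration only. Names appear on the first
    line of each sequence; later lines are indented by name_width spaces.
    Names longer than name_width are not truncated.
    """
    n_seqs = len(records)
    n_cols = len(records[0][1])
    lines = [" %d %d" % (n_seqs, n_cols)]
    for start in range(0, n_cols, line_length):
        for name, seq in records:
            # Take one line's worth of residues, then split it into blocks.
            chunk = seq[start:start + line_length]
            blocks = [chunk[i:i + block_length]
                      for i in range(0, len(chunk), block_length)]
            label = name if start == 0 else ""
            lines.append(label.ljust(name_width) + " ".join(blocks))
    return "\n".join(lines)


# Toy data (assumption): two short aligned sequences.
records = [("a", "ACGUACGUAC"), ("b", "UGCAUGCAUG")]
print(format_phylip(records, line_length=4, block_length=2))
```

Choosing a `line_length` that is a multiple of `block_length` keeps every full line split into equally sized blocks, as in the examples above (60 and 10, then 50 and 5).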
