Usage and Examples
Table of Contents
- Usage and Examples
- Before use
- taxonkit
- list
- lineage
- reformat
- reformat2
- name2taxid
- filter
- lca
- taxid-changelog
- profile2cami
- cami-filter
- create-taxdump
- genautocomplete
Before use
- Download and uncompress taxdump.tar.gz: ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
- Copy names.dmp, nodes.dmp, delnodes.dmp and merged.dmp to the data directory $HOME/.taxonkit, e.g., /home/shenwei/.taxonkit.
- Optionally, copy them to another directory instead, and later point TaxonKit to it with the flag --data-dir or the environment variable TAXONKIT_DB.
All-in-one command:

    wget -c ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
    tar -zxvf taxdump.tar.gz

    mkdir -p $HOME/.taxonkit
    cp names.dmp nodes.dmp delnodes.dmp merged.dmp $HOME/.taxonkit
Update dataset: simply re-download the taxdump files, uncompress them, and overwrite the old ones.
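After downloading or updating, a quick sanity check can confirm that the four required files are in place before running any command. This is a minimal sketch assuming a POSIX shell; check_taxdump is a hypothetical helper, not part of TaxonKit, and it only checks that each file exists and is non-empty, not that the dump is internally consistent.

```shell
# check_taxdump [DIR]
# Hypothetical helper (not part of TaxonKit): report any of the four taxdump
# files that are missing or empty in DIR (default: $HOME/.taxonkit).
# Returns 1 if at least one file is missing or empty, 0 otherwise.
check_taxdump() {
    dir=${1:-"$HOME/.taxonkit"}
    status=0
    for f in names.dmp nodes.dmp delnodes.dmp merged.dmp; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, check_taxdump "$HOME/.taxonkit" && echo "taxdump files look complete".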
taxonkit
    TaxonKit - A Practical and Efficient NCBI Taxonomy Toolkit

    Version: 0.20.0

    Author: Wei Shen <shenwei356@gmail.com>

    Source code: https://github.com/shenwei356/taxonkit
    Documents  : https://bioinf.shenwei.me/taxonkit
    Citation   : https://www.sciencedirect.com/science/article/pii/S1673852721000837

    Dataset:

        Please download and uncompress "taxdump.tar.gz":
        http://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz

        and copy "names.dmp", "nodes.dmp", "delnodes.dmp" and "merged.dmp" to data directory:
        "/home/shenwei/.taxonkit"

        or some other directory, and later you can refer to using flag --data-dir,
        or environment variable TAXONKIT_DB.

        When environment variable TAXONKIT_DB is set, explicitly setting --data-dir will
        overide the value of TAXONKIT_DB.

    Usage:
      taxonkit [command]

    Available Commands:
      cami-filter     Remove taxa of given TaxIds and their descendants in CAMI metagenomic profile
      create-taxdump  Create NCBI-style taxdump files for custom taxonomy, e.g., GTDB and ICTV
      filter          Filter TaxIds by taxonomic rank range
      genautocomplete generate shell autocompletion script (bash|zsh|fish|powershell)
      lca             Compute lowest common ancestor (LCA) for TaxIds
      lineage         Query taxonomic lineage of given TaxIds
      list            List taxonomic subtrees of given TaxIds
      name2taxid      Convert taxon names to TaxIds
      profile2cami    Convert metagenomic profile table to CAMI format
      reformat        Reformat lineage in canonical ranks
      reformat2       Reformat lineage in chosen ranks, allowing more ranks than 'reformat'
      taxid-changelog Create TaxId changelog from dump archives
      version         print version information and check for update

    Flags:
          --data-dir string   directory containing nodes.dmp and names.dmp (default "/home/shenwei/.taxonkit")
      -h, --help              help for taxonkit
          --line-buffered     use line buffering on output, i.e., immediately writing to stdin/file for every line of output
      -o, --out-file string   out file ("-" for stdout, suffix .gz for gzipped out) (default "-")
      -j, --threads int       number of CPUs. 4 is enough (default 4)
          --verbose           print verbose information
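The help text above states how the data directory is chosen: an explicitly set --data-dir wins over TAXONKIT_DB, and $HOME/.taxonkit is the default. A minimal sketch of that rule for wrapper scripts; resolve_taxdump_dir is a hypothetical helper that mirrors only the documented precedence, not TaxonKit's own code.

```shell
# resolve_taxdump_dir [DATA_DIR]
# Hypothetical helper mirroring the documented precedence:
#   1. a value that would be passed via --data-dir
#   2. the TAXONKIT_DB environment variable
#   3. the default $HOME/.taxonkit
resolve_taxdump_dir() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -n "$TAXONKIT_DB" ]; then
        printf '%s\n' "$TAXONKIT_DB"
    else
        printf '%s\n' "$HOME/.taxonkit"
    fi
}
```

In a wrapper, the result can be passed on explicitly, e.g. taxonkit lineage --data-dir "$(resolve_taxdump_dir "$1")" taxids.txt.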
list
Usage
    List taxonomic subtrees of given TaxIds

    Attention:
      1. When multiple taxids are given, the output may contain duplicated records
         if some taxids are descendants of others.

    Examples:

        $ taxonkit list --ids 9606 -n -r --indent "    "
        9606 [species] Homo sapiens
            63221 [subspecies] Homo sapiens neanderthalensis
            741158 [subspecies] Homo sapiens subsp. 'Denisova'

        $ taxonkit list --ids 9606 --indent ""
        9606
        63221
        741158

        # from stdin
        echo 9606 | taxonkit list

        # from file
        taxonkit list <(echo 9606)

    Usage:
      taxonkit list [flags]

    Flags:
      -h, --help            help for list
      -i, --ids string      TaxId(s), multiple values should be separated by comma
      -I, --indent string   indent (default "  ")
      -J, --json            output in JSON format. you can save the result in file with suffix ".json" and open with modern text editor
      -n, --show-name       output scientific name
      -r, --show-rank       output rank
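As the help text warns, passing several TaxIds where one is a descendant of another can produce duplicated records. For flat output (--indent ""), a minimal sketch that keeps only the first occurrence of each TaxId while preserving order; dedup_taxids is a hypothetical helper, and it is not suitable for indented tree output, where dropping lines would break the hierarchy.

```shell
# dedup_taxids: read one TaxId per line, skip blank lines, and print each
# TaxId only the first time it appears (input order is preserved).
dedup_taxids() {
    awk 'NF > 0 && !seen[$1]++'
}
```

For example: taxonkit list --ids 9605,9606 --indent "" | dedup_taxids.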
Examples
Default usage.
    $ taxonkit list --ids 9605,239934
    9605
      9606
        63221
        741158
      1425170
      2665952
        2665953

    239934
      239935
        349741
      512293
        512294
        1131822
        1262691
        1263034
      1679444
      2608915
        1131336
    ...
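Conceptually, list walks the taxonomy downward from each given TaxId. The sketch below illustrates that traversal over a nodes.dmp-style file, assuming the NCBI layout in which fields are separated by a tab, a pipe, and a tab, and the first two fields are the TaxId and its parent. list_subtree is hypothetical and is only an illustration, not TaxonKit's implementation; children are printed in file order.

```shell
# list_subtree NODES_DMP TAXID
# Hypothetical sketch: print TAXID and all of its descendants from a
# nodes.dmp-style file, indenting each level by two spaces.
list_subtree() {
    awk -v root="$2" '
        BEGIN { FS = "\t[|]\t" }   # .dmp fields are separated by TAB|TAB

        # field 1 = TaxId, field 2 = parent TaxId; skip the root self-loop
        $1 != $2 { children[$2] = children[$2] " " $1 }

        # depth-first walk; the extra parameters act as local variables
        function walk(id, depth,    pad, kids, n, i) {
            pad = ""
            for (i = 0; i < depth; i++) pad = pad "  "
            print pad id
            n = split(children[id], kids, " ")
            for (i = 1; i <= n; i++) walk(kids[i], depth + 1)
        }

        END { walk(root, 0) }
    ' "$1"
}
```

For example, list_subtree "$HOME/.taxonkit/nodes.dmp" 9605 prints the same TaxIds as the command above, although the loop over the full file is much slower than TaxonKit.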
Removing indent. The list could be used to extract sequences from a BLAST database with blastdbcmd (see tutorial).

    $ taxonkit list --ids 9605,239934 --indent ""
    9605
    9606
    63221
    741158
    1425170
    2665952
    2665953

    239934
    239935
    349741
    512293
    512294
    1131822
    1262691
    1263034
    1679444
    ...
Performance: Time and memory usage for whole taxon tree:
    $ # emptying the buffers cache
    $ su -c "free && sync && echo 3 > /proc/sys/vm/drop_caches && free"

    $ memusg -t taxonkit list --ids 1 --indent "" --verbose > t0.txt
    21:05:01.782 [INFO] parsing merged file: /home/shenwei/.taxonkit/names.dmp
    21:05:01.782 [INFO] parsing names file: /home/shenwei/.taxonkit/names.dmp
    21:05:01.782 [INFO] parsing delnodes file: /home/shenwei/.taxonkit/names.dmp
    21:05:01.816 [INFO] 61023 merged nodes parsed
    21:05:01.889 [INFO] 437929 delnodes parsed
    21:05:03.178 [INFO] 2303979 names parsed

    elapsed time: 3.290s
    peak rss: 742.77 MB
Adding names
    $ taxonkit list --show-rank --show-name --indent "    " --ids 9605,239934
    9605 [genus] Homo
        9606 [species] Homo sapiens
            63221 [subspecies] Homo sapiens neanderthalensis
            741158 [subspecies] Homo sapiens subsp. 'Denisova'
        1425170 [species] Homo heidelbergensis
        2665952 [no rank] environmental samples
            2665953 [species] Homo sapiens environmental sample

    239934 [genus] Akkermansia
        239935 [species] Akkermansia muciniphila
            349741 [strain] Akkermansia muciniphila ATCC BAA-835
        512293 [no rank] environmental samples
            512294 [species] uncultured Akkermansia sp.
            1131822 [species] uncultured Akkermansia sp. SMG25
            1262691 [species] Akkermansia sp. CAG:344
            1263034 [species] Akkermansia muciniphila CAG:154
        1679444 [species] Akkermansia glycaniphila
        2608915 [no rank] unclassified Akkermansia
            1131336 [species] Akkermansia sp. KLE1605
            1574264 [species] Akkermansia sp. KLE1797
    ...
Performance: Time and memory usage for whole taxonomy tree:
    $ # emptying the buffers cache
    $ su -c "free && sync && echo 3 > /proc/sys/vm/drop_caches && free"

    $ memusg -t taxonkit list --show-rank --show-name --ids 1 > t1.txt
    elapsed time: 5.341s
    peak rss: 1.04 GB
Output in JSON format, so that you can easily collapse and expand the taxonomy tree in a modern text editor.
    $ taxonkit list --show-rank --show-name --indent "    " --ids 9605,239934 --json
    {
        "9605 [genus] Homo": {
            "9606 [species] Homo sapiens": {
                "63221 [subspecies] Homo sapiens neanderthalensis": {},
                "741158 [subspecies] Homo sapiens subsp. 'Denisova'": {}
            },
            "1425170 [species] Homo heidelbergensis": {}
        },
        "239934 [genus] Akkermansia": {
            "239935 [species] Akkermansia muciniphila": {
                "349741 [no rank] Akkermansia muciniphila ATCC BAA-835": {}
            },
            "512293 [no rank] environmental samples": {
                "512294 [species] uncultured Akkermansia sp.": {},
                "1131822 [species] uncultured Akkermansia sp. SMG25": {},
                "1262691 [species] Akkermansia sp. CAG:344": {},
                "1263034 [species] Akkermansia muciniphila CAG:154": {}
            },
            "1679444 [species] Akkermansia glycaniphila": {},
            "2608915 [no rank] unclassified Akkermansia": {
                "1131336 [species] Akkermansia sp. KLE1605": {},
                "1574264 [species] Akkermansia sp. KLE1797": {},
                "1574265 [species] Akkermansia sp. KLE1798": {},
                "1638783 [species] Akkermansia sp. UNK.MGS-1": {},
                "1755639 [species] Akkermansia sp. MC_55": {}
            }
        }
    }
Snapshot of taxonomy (taxid 1) in kate:
lineage
Usage
    Query taxonomic lineage of given TaxIds

    Input:
      - List of TaxIds, one TaxId per line.
      - Or tab-delimited format, please specify TaxId field
        with flag -i/--taxid-field (default 1).
      - Supporting (gzipped) file or STDIN.

    Output:
      1. Input line data.
      2. (Optional) Status code (-c/--show-status-code), values:
         - "-1" for queries not found in whole database.
         - "0" for deleted TaxIds, provided by "delnodes.dmp".
         - New TaxIds for merged TaxIds, provided by "merged.dmp".
         - Taxids for these found in "nodes.dmp".
      3. Lineage, delimiter can be changed with flag -d/--delimiter.
      4. (Optional) TaxIds taxons in the lineage (-t/--show-lineage-taxids)
      5. (Optional) Name (-n/--show-name)
      6. (Optional) Rank (-r/--show-rank)

    Filter out invalid and deleted taxids, and replace merged
    taxids with new ones:

        # input is one-column-taxid
        $ taxonkit lineage -c taxids.txt \
            | awk '$2>0' \
            | cut -f 2-

        # taxids are in 3rd field in a 4-columns tab-delimited file,
        # for $5, where 5 = 4 + 1.
        $ cat input.txt \
            | taxonkit lineage -c -i 3 \
            | csvtk filter2 -H -t -f '$5>0' \
            | csvtk -H -t cut -f -3

    Usage:
      taxonkit lineage [flags]

    Flags:
      -d, --delimiter string      field delimiter in lineage (default ";")
      -h, --help                  help for lineage
      -L, --no-lineage            do not show lineage, when user just want names or/and ranks
      -R, --show-lineage-ranks    appending ranks of all levels
      -t, --show-lineage-taxids   appending lineage consisting of taxids
      -n, --show-name             appending scientific name
      -r, --show-rank             appending rank of taxids
      -c, --show-status-code      show status code before lineage
      -i, --taxid-field int       field index of taxid. input data should be tab-separated (default 1)
Examples
Full lineage:
    # note that 123124124 is a fake taxid, 3 was deleted, 92489, 1458427 were merged
    $ cat taxids.txt
    9606
    9913
    376619
    349741
    239935
    314101
    11932
    1327037
    123124124
    3
    92489
    1458427

    $ taxonkit lineage taxids.txt | tee lineage.txt
    19:22:13.077 [WARN] taxid 92489 was merged into 796334
    19:22:13.077 [WARN] taxid 1458427 was merged into 1458425
    19:22:13.077 [WARN] taxid 123124124 not found
    19:22:13.077 [WARN] taxid 3 was deleted
    9606	cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Homo;Homo sapiens
    9913	cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Artiodactyla;Ruminantia;Pecora;Bovidae;Bovinae;Bos;Bos taurus
    376619	cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Thiotrichales;Francisellaceae;Francisella;Francisella tularensis;Francisella tularensis subsp. holarctica;Francisella tularensis subsp. holarctica LVS
    349741	cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835
    239935	cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
    314101	cellular organisms;Bacteria;environmental samples;uncultured murine large bowel bacterium BAC 54B
    11932	Viruses;Riboviria;Pararnavirae;Artverviricota;Revtraviricetes;Ortervirales;Retroviridae;unclassified Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle
    1327037	Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;Caudovirales;Siphoviridae;unclassified Siphoviridae;Croceibacter phage P2559Y
    123124124
    3
    92489	cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Erwiniaceae;Erwinia;Erwinia oleae
    1458427	cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae;Serpentinomonas;Serpentinomonas raichei

    # wrapped table with csvtk pretty (>v0.26.0)
    $ taxonkit lineage taxids.txt | csvtk pretty -Ht -x ';' -W 70 -S bold
    ┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
    ┃ 9606    ┃ cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria; ┃
    ┃         ┃ Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;   ┃
    ┃         ┃ Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;     ┃
    ┃         ┃ Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;      ┃
    ┃         ┃ Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;     ┃
    ┃         ┃ Homo;Homo sapiens                                                      ┃
    ┣━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
    ┃ 9913    ┃ cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria; ┃
    ┃         ┃ Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;   ┃
    ┃         ┃ Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;     ┃
    ┃         ┃ Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Artiodactyla;    ┃
    ┃         ┃ Ruminantia;Pecora;Bovidae;Bovinae;Bos;Bos taurus                       ┃
    ┣━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
    ┃ 376619  ┃ cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;        ┃
    ┃         ┃ Thiotrichales;Francisellaceae;Francisella;Francisella tularensis;      ┃
    ┃         ┃ Francisella tularensis subsp. holarctica;                              ┃
    ┃         ┃ Francisella tularensis subsp. holarctica LVS                           ┃
    ┣━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
    ┃ 349741  ┃ cellular organisms;Bacteria;PVC group;Verrucomicrobia;                 ┃
    ┃         ┃ Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;       ┃
    ┃         ┃ Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835           ┃
    ┣━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
    ┃ 239935  ┃ cellular organisms;Bacteria;PVC group;Verrucomicrobia;                 ┃
    ┃         ┃ Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;       ┃
    ┃         ┃ Akkermansia muciniphila                                                ┃
    ┣━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
    ┃ 314101  ┃ cellular organisms;Bacteria;environmental samples;                     ┃
    ┃         ┃ uncultured murine large bowel bacterium BAC 54B                        ┃
    ┣━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
    ┃ 11932   ┃ Viruses;Riboviria;Pararnavirae;Artverviricota;Revtraviricetes;         ┃
    ┃         ┃ Ortervirales;Retroviridae;unclassified Retroviridae;                   ┃
    ┃         ┃ Intracisternal A-particles;Mouse Intracisternal A-particle             ┃
    ┣━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
    ┃ 1327037 ┃ Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;       ┃
    ┃         ┃ Caudovirales;Siphoviridae;unclassified Siphoviridae;                   ┃
    ┃         ┃ Croceibacter phage P2559Y                                              ┃
    ┣━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
    ┃ 92489   ┃ cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;        ┃
    ┃         ┃ Enterobacterales;Erwiniaceae;Erwinia;Erwinia oleae                     ┃
    ┣━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
    ┃ 1458427 ┃ cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;         ┃
    ┃         ┃ Burkholderiales;Comamonadaceae;Serpentinomonas;                        ┃
    ┃         ┃ Serpentinomonas raichei                                                ┃
    ┗━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
Speed.
    $ time echo 9606 | taxonkit lineage
    9606	cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Homo;Homo sapiens

    real	0m1.190s
    user	0m2.365s
    sys	0m0.170s

    # all TaxIds
    $ time taxonkit list --ids 1 --indent "" | taxonkit lineage > t

    real	0m4.249s
    user	0m16.418s
    sys	0m1.221s
Checking deleted or merged taxids
    $ taxonkit lineage --show-status-code taxids.txt | tee lineage.withcode.txt

    # valid
    $ cat lineage.withcode.txt | awk '$2 > 0' | cut -f 1,2
    9606	9606
    9913	9913
    376619	376619
    349741	349741
    239935	239935
    314101	314101
    11932	11932
    1327037	1327037
    92489	796334
    1458427	1458425

    # merged
    $ cat lineage.withcode.txt | awk '$2 > 0 && $2 != $1' | cut -f 1,2
    92489	796334
    1458427	1458425

    # deleted
    $ cat lineage.withcode.txt | awk '$2 == 0' | cut -f 1
    3

    # invalid
    $ cat lineage.withcode.txt | awk '$2 < 0' | cut -f 1
    123124124
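The four awk filters above can also be combined into a single pass that labels every record. This is a minimal sketch assuming the input is the output of taxonkit lineage -c (query TaxId in column 1, status code in column 2) and following the status-code meanings listed in the usage text; classify_status is a hypothetical helper.

```shell
# classify_status: label each record of `taxonkit lineage -c` output.
#   code < 0          -> invalid (not found in the database)
#   code == 0         -> deleted
#   code != query id  -> merged (the code is the new TaxId)
#   otherwise         -> valid
classify_status() {
    awk -F '\t' '{
        if ($2 < 0)        label = "invalid"
        else if ($2 == 0)  label = "deleted"
        else if ($2 != $1) label = "merged"
        else               label = "valid"
        print $1 "\t" $2 "\t" label
    }'
}
```

For example: taxonkit lineage -c taxids.txt | classify_status | cut -f 1,3.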
Filter out invalid and deleted taxids, and replace merged taxids with new ones (the second command below needs csvtk, which you may have to install):
    # input is one-column-taxid
    $ taxonkit lineage -c taxids.txt \
        | awk '$2>0' \
        | cut -f 2-

    # taxids are in 3rd field in a 4-columns tab-delimited file,
    # for $5, where 5 = 4 + 1.
    $ cat input.txt \
        | taxonkit lineage -c -i 3 \
        | csvtk filter2 -H -t -f '$5>0' \
        | csvtk -H -t cut -f -3
Only show name and rank.
    $ taxonkit lineage -r -n -L taxids.txt \
        | csvtk pretty -H -t
    9606        Homo sapiens                                      species
    9913        Bos taurus                                        species
    376619      Francisella tularensis subsp. holarctica LVS      strain
    349741      Akkermansia muciniphila ATCC BAA-835              strain
    239935      Akkermansia muciniphila                           species
    314101      uncultured murine large bowel bacterium BAC 54B   species
    11932       Mouse Intracisternal A-particle                   species
    1327037     Croceibacter phage P2559Y                         species
    123124124
    3
    92489       Erwinia oleae                                     species
    1458427     Serpentinomonas raichei                           species
Show lineage consisting of taxids:
    $ taxonkit lineage -t taxids.txt
    9606	cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Homo;Homo sapiens	131567;2759;33154;33208;6072;33213;33511;7711;89593;7742;7776;117570;117571;8287;1338369;32523;32524;40674;32525;9347;1437010;314146;9443;376913;314293;9526;314295;9604;207598;9605;9606
    9913	cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Artiodactyla;Ruminantia;Pecora;Bovidae;Bovinae;Bos;Bos taurus	131567;2759;33154;33208;6072;33213;33511;7711;89593;7742;7776;117570;117571;8287;1338369;32523;32524;40674;32525;9347;1437010;314145;91561;9845;35500;9895;27592;9903;9913
    376619	cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Thiotrichales;Francisellaceae;Francisella;Francisella tularensis;Francisella tularensis subsp. holarctica;Francisella tularensis subsp. holarctica LVS	131567;2;1224;1236;72273;34064;262;263;119857;376619
    349741	cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835	131567;2;1783257;74201;203494;48461;1647988;239934;239935;349741
    239935	cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila	131567;2;1783257;74201;203494;48461;1647988;239934;239935
    314101	cellular organisms;Bacteria;environmental samples;uncultured murine large bowel bacterium BAC 54B	131567;2;48479;314101
    11932	Viruses;Riboviria;Pararnavirae;Artverviricota;Revtraviricetes;Ortervirales;Retroviridae;unclassified Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle	10239;2559587;2732397;2732409;2732514;2169561;11632;35276;11749;11932
    1327037	Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;Caudovirales;Siphoviridae;unclassified Siphoviridae;Croceibacter phage P2559Y	10239;2731341;2731360;2731618;2731619;28883;10699;196894;1327037
    123124124
    3
    92489	cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Erwiniaceae;Erwinia;Erwinia oleae	131567;2;1224;1236;91347;1903409;551;796334
    1458427	cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae;Serpentinomonas;Serpentinomonas raichei	131567;2;1224;28216;80840;80864;2490452;1458425
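The TaxId lineage in the third column is the chain of parents from the top of the tree down to the query. A minimal sketch of how such a chain can be derived from a nodes.dmp-style file by following parent pointers, assuming the NCBI field layout (tab, pipe, tab separators; TaxId then parent TaxId). Like the output above, it leaves out the root node (TaxId 1). taxid_lineage is hypothetical, and unlike taxonkit lineage it does not resolve merged or deleted TaxIds.

```shell
# taxid_lineage NODES_DMP TAXID
# Hypothetical sketch: print TAXID and its chain of ancestor TaxIds
# (root node 1 excluded), e.g. "1224<TAB>131567;2;1224".
taxid_lineage() {
    awk -v q="$2" '
        BEGIN { FS = "\t[|]\t" }
        { parent[$1] = $2 }                 # TaxId -> parent TaxId
        END {
            if (!(q in parent)) { print q "\t"; exit }
            path = q
            id = q
            # climb towards the root, whose parent is itself
            while (parent[id] != id) {
                id = parent[id]
                if (id == "1") break        # leave out the root, like the output above
                path = id ";" path
            }
            print q "\t" path
        }
    ' "$1"
}
```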
Or read TaxIds from STDIN:

    $ cat taxids.txt | taxonkit lineage
Show the ranks of all nodes as well:

    $ echo 2697049 \
        | taxonkit lineage -t -R \
        | csvtk transpose -Ht
    2697049
    Viruses;Riboviria;Orthornavirae;Pisuviricota;Pisoniviricetes;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2
    10239;2559587;2732396;2732408;2732506;76804;2499399;11118;2501931;694002;2509511;694009;2697049
    superkingdom;clade;kingdom;phylum;class;order;suborder;family;subfamily;genus;subgenus;species;no rank
Another way to show lineage detail of a TaxId:

    $ echo 2697049 \
        | taxonkit lineage -t \
        | csvtk cut -Ht -f 3 \
        | csvtk unfold -Ht -f 1 -s ";" \
        | taxonkit lineage -r -n -L \
        | csvtk cut -Ht -f 1,3,2 \
        | csvtk pretty -H -t
    10239     superkingdom   Viruses
    2559587   clade          Riboviria
    2732396   kingdom        Orthornavirae
    2732408   phylum         Pisuviricota
    2732506   class          Pisoniviricetes
    76804     order          Nidovirales
    2499399   suborder       Cornidovirineae
    11118     family         Coronaviridae
    2501931   subfamily      Orthocoronavirinae
    694002    genus          Betacoronavirus
    2509511   subgenus       Sarbecovirus
    694009    species        Severe acute respiratory syndrome-related coronavirus
    2697049   no rank        Severe acute respiratory syndrome coronavirus 2
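Without csvtk, a similar one-row-per-level view can be produced directly from taxonkit lineage -t -R, whose columns (as in the transposed example above) are the query TaxId, the name lineage, the TaxId lineage, and the rank lineage. A minimal sketch; explode_lineage is a hypothetical helper, and it assumes the default ";" delimiter and that the three lineages have the same number of levels.

```shell
# explode_lineage: turn each `taxonkit lineage -t -R` record into one
# line per level: TaxId<TAB>rank<TAB>name (same column order as above).
explode_lineage() {
    awk -F '\t' '{
        n = split($2, names, ";")
        split($3, ids, ";")
        split($4, ranks, ";")
        for (i = 1; i <= n; i++) print ids[i] "\t" ranks[i] "\t" names[i]
    }'
}
```

For example: echo 2697049 | taxonkit lineage -t -R | explode_lineage.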
reformat
Usage
    Reformat lineage in canonical ranks

    Warning:
      - 'taxonkit reformat2' is recommended since Match 2025 when NCBI made big changes to ranks.
        See more: https://ncbiinsights.ncbi.nlm.nih.gov/2025/02/27/new-ranks-ncbi-taxonomy/

    Input:
      - List of TaxIds or lineages, one record per line.
        The lineage can be a complete lineage or only one taxonomy name.
      - Or tab-delimited format.
        Plese specify the lineage field with flag -i/--lineage-field (default 2).
        Or specify the TaxId field with flag -I/--taxid-field (default 0),
        which overrides -i/--lineage-field.
      - Supporting (gzipped) file or STDIN.

    Output:
      1. Input line data.
      2. Reformated lineage.
      3. (Optional) TaxIds taxons in the lineage (-t/--show-lineage-taxids)

    Ambiguous names:
      - Some TaxIds have the same complete lineage, empty result is returned
        by default. You can use the flag -a/--output-ambiguous-result to
        return one possible result

    Output format can be formated by flag --format, available placeholders:

        {C}: cellular root
        {a}: acellular root
        {r}: realm
        {d}: domain
        {k}: superkingdom
        {K}: kingdom
        {p}: phylum
        {c}: class
        {o}: order
        {f}: family
        {g}: genus
        {s}: species
        {t}: subspecies/strain
        {S}: subspecies
        {T}: strain

    When these're no nodes of rank "subspecies" nor "strain",
    you can switch on -S/--pseudo-strain to use the node with lowest rank
    as subspecies/strain name, if which rank is lower than "species".
    This flag affects {t}, {S}, {T}.

    Output format can contains some escape charactors like "\t".

    Usage:
      taxonkit reformat [flags]

    Flags:
      -P, --add-prefix                     add prefixes for all ranks, single prefix for a rank is defined by flag --prefix-X
      -d, --delimiter string               field delimiter in input lineage (default ";")
      -F, --fill-miss-rank                 fill missing rank with lineage information of the next higher rank
      -f, --format string                  output format, placeholders of rank are needed (default "{k};{p};{c};{o};{f};{g};{s}")
      -h, --help                           help for reformat
      -i, --lineage-field int              field index of lineage. data should be tab-separated (default 2)
      -r, --miss-rank-repl string          replacement string for missing rank
      -p, --miss-rank-repl-prefix string   prefix for estimated taxon names (default "unclassified ")
      -s, --miss-rank-repl-suffix string   suffix for estimated taxon names. "rank" for rank name, "" for no suffix (default "rank")
      -R, --miss-taxid-repl string         replacement string for missing taxid
      -a, --output-ambiguous-result        output one of the ambigous result
          --prefix-C string                prefix for cellular root, used along with flag -P/--add-prefix (default "d__")
          --prefix-K string                prefix for kingdom, used along with flag -P/--add-prefix (default "K__")
          --prefix-S string                prefix for subspecies, used along with flag -P/--add-prefix (default "S__")
          --prefix-T string                prefix for strain, used along with flag -P/--add-prefix (default "T__")
          --prefix-a string                prefix for acellular root, used along with flag -P/--add-prefix (default "d__")
          --prefix-c string                prefix for class, used along with flag -P/--add-prefix (default "c__")
          --prefix-d string                prefix for domain, used along with flag -P/--add-prefix (default "d__")
          --prefix-f string                prefix for family, used along with flag -P/--add-prefix (default "f__")
          --prefix-g string                prefix for genus, used along with flag -P/--add-prefix (default "g__")
          --prefix-k string                prefix for superkingdom, used along with flag -P/--add-prefix (default "k__")
          --prefix-o string                prefix for order, used along with flag -P/--add-prefix (default "o__")
          --prefix-p string                prefix for phylum, used along with flag -P/--add-prefix (default "p__")
          --prefix-r string                prefix for realm, used along with flag -P/--add-prefix (default "r__")
          --prefix-s string                prefix for species, used along with flag -P/--add-prefix (default "s__")
          --prefix-t string                prefix for subspecies/strain, used along with flag -P/--add-prefix (default "t__")
      -S, --pseudo-strain                  use the node with lowest rank as strain name, only if which rank is lower than "species" and not "subpecies" nor "strain". It affects {t}, {S}, {T}. This flag needs flag -F
      -t, --show-lineage-taxids            show corresponding taxids of reformated lineage
      -I, --taxid-field int                field index of taxid. input data should be tab-separated. it overrides -i/--lineage-field
      -T, --trim                           do not fill or add prefix for missing rank lower than current rank
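The core idea of reformat is to pick, from the full lineage, the names sitting at a fixed list of ranks and to leave a slot empty when a rank is missing. A simplified sketch of just that step, assuming input columns of query TaxId, name lineage, and rank lineage (assumed here to be what taxonkit lineage -R produces). pick_ranks is hypothetical and ignores filling (-F), prefixes, pseudo-strains, and ambiguous names; because the placeholder list above includes realm and domain alongside superkingdom, the rank list is a parameter rather than hard-coded.

```shell
# pick_ranks [RANKS]
# Hypothetical sketch: input lines are TaxId<TAB>name lineage<TAB>rank lineage.
# For each rank in RANKS (";"-separated), print the name at that rank,
# or an empty slot if the lineage has no node of that rank.
pick_ranks() {
    awk -F '\t' -v want="${1:-superkingdom;phylum;class;order;family;genus;species}" '
        BEGIN { nw = split(want, wanted, ";") }
        {
            split("", byrank)              # portable way to clear an array
            n = split($2, names, ";")
            split($3, ranks, ";")
            for (i = 1; i <= n; i++) byrank[ranks[i]] = names[i]
            out = ""
            for (j = 1; j <= nw; j++) out = out (j > 1 ? ";" : "") byrank[wanted[j]]
            print $1 "\t" out
        }'
}
```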
Examples:
For versions > 0.8.0, reformat accepts input of TaxIds via the flag -I/--taxid-field.

    $ echo 239935 | taxonkit reformat -I 1
    239935	Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila

    $ echo 349741 | taxonkit reformat -I 1 -f "{k}|{p}|{c}|{o}|{f}|{g}|{s}|{t}" -F -t
    349741	Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae|Akkermansia|Akkermansia muciniphila|Akkermansia muciniphila ATCC BAA-835	2|74201|203494|48461|1647988|239934|239935|349741
Example lineage (produced by: taxonkit lineage taxids.txt | awk '$2!=""' > lineage.txt).

    $ cat lineage.txt
    9606	cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Homo;Homo sapiens
    9913	cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Artiodactyla;Ruminantia;Pecora;Bovidae;Bovinae;Bos;Bos taurus
    376619	cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Thiotrichales;Francisellaceae;Francisella;Francisella tularensis;Francisella tularensis subsp. holarctica;Francisella tularensis subsp. holarctica LVS
    349741	cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835
    239935	cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
    314101	cellular organisms;Bacteria;environmental samples;uncultured murine large bowel bacterium BAC 54B
    11932	Viruses;Riboviria;Pararnavirae;Artverviricota;Revtraviricetes;Ortervirales;Retroviridae;unclassified Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle
    1327037	Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;Caudovirales;Siphoviridae;unclassified Siphoviridae;Croceibacter phage P2559Y
    92489	cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Erwiniaceae;Erwinia;Erwinia oleae
    1458427	cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae;Serpentinomonas;Serpentinomonas raichei
Default output format ("{k};{p};{c};{o};{f};{g};{s}").

    # reformatted lineages are appended to the input data
    $ taxonkit reformat lineage.txt
    ...
    239935	cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila	Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
    ...

    $ taxonkit reformat lineage.txt | tee lineage.txt.reformat

    $ cut -f 1,3 lineage.txt.reformat
    9606	Eukaryota;Chordata;Mammalia;Primates;Hominidae;Homo;Homo sapiens
    9913	Eukaryota;Chordata;Mammalia;Artiodactyla;Bovidae;Bos;Bos taurus
    376619	Bacteria;Proteobacteria;Gammaproteobacteria;Thiotrichales;Francisellaceae;Francisella;Francisella tularensis
    349741	Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
    239935	Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
    314101	Bacteria;;;;;;uncultured murine large bowel bacterium BAC 54B
    11932	Viruses;Artverviricota;Revtraviricetes;Ortervirales;Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle
    1327037	Viruses;Uroviricota;Caudoviricetes;Caudovirales;Siphoviridae;;Croceibacter phage P2559Y
    92489	Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Erwiniaceae;Erwinia;Erwinia oleae
    1458427	Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae;Serpentinomonas;Serpentinomonas raichei

    # aligned
    $ cat lineage.txt \
        | taxonkit reformat \
        | csvtk -H -t cut -f 1,3 \
        | csvtk -H -t sep -f 2 -s ';' -R \
        | csvtk add-header -t -n taxid,kingdom,phylum,class,order,family,genus,species \
        | csvtk pretty -t
    taxid     kingdom     phylum            class                 order                family            genus                        species
    -------   ---------   ---------------   -------------------   ------------------   ---------------   --------------------------   -----------------------------------------------
    9606      Eukaryota   Chordata          Mammalia              Primates             Hominidae         Homo                         Homo sapiens
    9913      Eukaryota   Chordata          Mammalia              Artiodactyla         Bovidae           Bos                          Bos taurus
    376619    Bacteria    Proteobacteria    Gammaproteobacteria   Thiotrichales        Francisellaceae   Francisella                  Francisella tularensis
    349741    Bacteria    Verrucomicrobia   Verrucomicrobiae      Verrucomicrobiales   Akkermansiaceae   Akkermansia                  Akkermansia muciniphila
    239935    Bacteria    Verrucomicrobia   Verrucomicrobiae      Verrucomicrobiales   Akkermansiaceae   Akkermansia                  Akkermansia muciniphila
    314101    Bacteria                                                                                                                uncultured murine large bowel bacterium BAC 54B
    11932     Viruses     Artverviricota    Revtraviricetes       Ortervirales         Retroviridae      Intracisternal A-particles   Mouse Intracisternal A-particle
    1327037   Viruses     Uroviricota       Caudoviricetes        Caudovirales         Siphoviridae                                   Croceibacter phage P2559Y
    92489     Bacteria    Proteobacteria    Gammaproteobacteria   Enterobacterales     Erwiniaceae       Erwinia                      Erwinia oleae
    1458427   Bacteria    Proteobacteria    Betaproteobacteria    Burkholderiales      Comamonadaceae    Serpentinomonas              Serpentinomonas raichei
And subspecies/strain ({t}), subspecies ({S}), and strain ({T}) are also available.

    # default operation
    $ echo -ne "239935\n83333\n1408252\n2697049\n2605619\n" \
        | taxonkit lineage -n -r \
        | taxonkit reformat -f '{t};{S};{T}' \
        | csvtk -H -t cut -f 1,4,3,5 \
        | csvtk -H -t sep -f 4 -s ';' -R \
        | csvtk -H -t add-header -n "taxid,rank,name,subspecies/strain,subspecies,strain" \
        | csvtk pretty -t
    taxid     rank         name                                              subspecies/strain       subspecies              strain
    -------   ----------   -----------------------------------------------   ---------------------   ---------------------   ---------------------
    239935    species      Akkermansia muciniphila
    83333     strain       Escherichia coli K-12                             Escherichia coli K-12                           Escherichia coli K-12
    1408252   subspecies   Escherichia coli R178                             Escherichia coli R178   Escherichia coli R178
    2697049   no rank      Severe acute respiratory syndrome coronavirus 2
    2605619   no rank      Escherichia coli O16:H48

    # fill missing ranks
    # see example below for -F/--fill-miss-rank
    #
    $ echo -ne "239935\n83333\n1408252\n2697049\n2605619\n" \
        | taxonkit lineage -n -r \
        | taxonkit reformat -f '{t};{S};{T}' --fill-miss-rank \
        | csvtk -H -t cut -f 1,4,3,5 \
        | csvtk -H -t sep -f 4 -s ';' -R \
        | csvtk -H -t add-header -n "taxid,rank,name,subspecies/strain,subspecies,strain" \
        | csvtk pretty -t
    taxid     rank         name                                              subspecies/strain                                                                      subspecies                                                                      strain
    -------   ----------   -----------------------------------------------   ------------------------------------------------------------------------------------   -----------------------------------------------------------------------------   -------------------------------------------------------------------------
    239935    species      Akkermansia muciniphila                           unclassified Akkermansia muciniphila subspecies/strain                                 unclassified Akkermansia muciniphila subspecies                                 unclassified Akkermansia muciniphila strain
    83333     strain       Escherichia coli K-12                             Escherichia coli K-12                                                                  unclassified Escherichia coli subspecies                                        Escherichia coli K-12
    1408252   subspecies   Escherichia coli R178                             Escherichia coli R178                                                                  Escherichia coli R178                                                           unclassified Escherichia coli R178 strain
    2697049   no rank      Severe acute respiratory syndrome coronavirus 2   unclassified Severe acute respiratory syndrome-related coronavirus subspecies/strain   unclassified Severe acute respiratory syndrome-related coronavirus subspecies   unclassified Severe acute respiratory syndrome-related coronavirus strain
    2605619   no rank      Escherichia coli O16:H48                          unclassified Escherichia coli subspecies/strain                                        unclassified Escherichia coli subspecies                                        unclassified Escherichia coli strain
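The filled values above follow a simple pattern: a missing rank gets the default prefix "unclassified ", then the name of the closest higher rank that is present, then the rank name as suffix (the defaults of -p and -s). A minimal sketch of that rule, applied to a lineage that already has one slot per rank from highest to lowest; fill_missing is hypothetical and only approximates --fill-miss-rank (for example, it does not handle prefixes or -T/--trim).

```shell
# fill_missing [RANKS]
# Hypothetical sketch: input is TaxId<TAB>lineage, where the lineage has one
# ";"-separated slot per rank in RANKS (highest first), empty if missing.
# Empty slots become "unclassified <closest higher name> <rank>".
fill_missing() {
    awk -F '\t' -v ranks="${1:-superkingdom;phylum;class;order;family;genus;species}" '
        BEGIN { nr = split(ranks, rank, ";") }
        {
            n = split($2, name, ";")
            last = ""                      # closest higher rank that is present
            out = ""
            for (i = 1; i <= nr; i++) {
                v = (i <= n) ? name[i] : ""
                if (v != "") last = v
                else if (last != "") v = "unclassified " last " " rank[i]
                out = out (i > 1 ? ";" : "") v
            }
            print $1 "\t" out
        }'
}
```

With RANKS set to species;subspecies;strain, the input line "83333<TAB>Escherichia coli;;Escherichia coli K-12" yields the same subspecies value as the table above.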
When there is no node of rank "subspecies" or "strain", you can switch on -S/--pseudo-strain to use the node with the lowest rank as the subspecies/strain name, provided that its rank is lower than "species". We recommend using v0.14.1 or later.

    $ echo -ne "239935\n83333\n1408252\n2697049\n2605619\n" \
        | taxonkit lineage -n -r \
        | taxonkit reformat -f '{t};{S};{T}' --pseudo-strain \
        | csvtk -H -t cut -f 1,4,3,5 \
        | csvtk -H -t sep -f 4 -s ';' -R \
        | csvtk -H -t add-header -n "taxid,rank,name,subspecies/strain,subspecies,strain" \
        | csvtk pretty -t
    taxid     rank         name                                              subspecies/strain                                 subspecies                                        strain
    -------   ----------   -----------------------------------------------   -----------------------------------------------   -----------------------------------------------   -----------------------------------------------
    239935    species      Akkermansia muciniphila
    83333     strain       Escherichia coli K-12                             Escherichia coli K-12                                                                               Escherichia coli K-12
    1408252   subspecies   Escherichia coli R178                             Escherichia coli R178                             Escherichia coli R178
    2697049   no rank      Severe acute respiratory syndrome coronavirus 2   Severe acute respiratory syndrome coronavirus 2   Severe acute respiratory syndrome coronavirus 2   Severe acute respiratory syndrome coronavirus 2
    2605619   no rank      Escherichia coli O16:H48                          Escherichia coli O16:H48                          Escherichia coli O16:H48                          Escherichia coli O16:H48
Add prefix (-P/--add-prefix).

```sh
$ cat lineage.txt \
    | taxonkit reformat -P \
    | csvtk -H -t cut -f 1,3
9606	k__Eukaryota;p__Chordata;c__Mammalia;o__Primates;f__Hominidae;g__Homo;s__Homo sapiens
9913	k__Eukaryota;p__Chordata;c__Mammalia;o__Artiodactyla;f__Bovidae;g__Bos;s__Bos taurus
376619	k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Thiotrichales;f__Francisellaceae;g__Francisella;s__Francisella tularensis
349741	k__Bacteria;p__Verrucomicrobia;c__Verrucomicrobiae;o__Verrucomicrobiales;f__Akkermansiaceae;g__Akkermansia;s__Akkermansia muciniphila
239935	k__Bacteria;p__Verrucomicrobia;c__Verrucomicrobiae;o__Verrucomicrobiales;f__Akkermansiaceae;g__Akkermansia;s__Akkermansia muciniphila
314101	k__Bacteria;p__;c__;o__;f__;g__;s__uncultured murine large bowel bacterium BAC 54B
11932	k__Viruses;p__Artverviricota;c__Revtraviricetes;o__Ortervirales;f__Retroviridae;g__Intracisternal A-particles;s__Mouse Intracisternal A-particle
1327037	k__Viruses;p__Uroviricota;c__Caudoviricetes;o__Caudovirales;f__Siphoviridae;g__;s__Croceibacter phage P2559Y
92489	k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Erwiniaceae;g__Erwinia;s__Erwinia oleae
1458427	k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales;f__Comamonadaceae;g__Serpentinomonas;s__Serpentinomonas raichei
```
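The -P prefixes simply mark the seven positions of the default format. The awk sketch below illustrates that pattern only and is not TaxonKit's implementation: the helper add_rank_prefixes is hypothetical, and it assumes an already reformatted lineage in the default {k};{p};{c};{o};{f};{g};{s} layout on stdin.

```sh
# add_rank_prefixes (hypothetical helper): prepend k__, p__, ..., s__ to the
# seven ";"-separated fields of each lineage read from stdin.
# Assumes the default seven-rank layout of `taxonkit reformat`.
add_rank_prefixes() {
    awk -F';' -v OFS=';' '
        BEGIN { split("k p c o f g s", p, " ") }
        {
            # assigning to a field rebuilds $0 using OFS (";")
            for (i = 1; i <= NF && i <= 7; i++) $i = p[i] "__" $i
            print
        }'
}

# lineage of TaxId 314101 in the default format, as in the example above
echo "Bacteria;;;;;;uncultured murine large bowel bacterium BAC 54B" | add_rank_prefixes
```

Empty ranks still receive their prefix (p__, c__, ...), which matches the row for 314101 above.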
Show the corresponding TaxIds of the reformatted lineage (flag -t/--show-lineage-taxids).

```sh
$ cat lineage.txt \
    | taxonkit reformat -t \
    | csvtk -H -t cut -f 1,4 \
    | csvtk -H -t sep -f 2 -s ';' -R \
    | csvtk add-header -t -n taxid,kingdom,phylum,class,order,family,genus,species \
    | csvtk pretty -t
taxid     kingdom   phylum    class     order     family    genus     species
-------   -------   -------   -------   -------   -------   -------   -------
9606      2759      7711      40674     9443      9604      9605      9606
9913      2759      7711      40674     91561     9895      9903      9913
376619    2         1224      1236      72273     34064     262       263
349741    2         74201     203494    48461     1647988   239934    239935
239935    2         74201     203494    48461     1647988   239934    239935
314101    2                                                           314101
11932     10239     2732409   2732514   2169561   11632     11749     11932
1327037   10239     2731618   2731619   28883     10699               1327037
92489     2         1224      1236      91347     1903409   551       796334
1458427   2         1224      28216     80840     80864     2490452   1458425

# both node names and taxids
$ echo 562 \
    | taxonkit reformat -I 1 -t \
    | csvtk -H -t sep -f 2 -s ';' -R \
    | csvtk -H -t sep -f 2 -s ';' -R \
    | csvtk add-header -t -n "taxid,kingdom,phylum,class,order,family,genus,species,kingdom_taxid,phylum_taxid,class_taxid,order_taxid,family_taxid,genus_taxid,species_taxid" \
    | csvtk pretty -t
taxid   kingdom    phylum           class                 order              family               genus         species            kingdom_taxid   phylum_taxid   class_taxid   order_taxid   family_taxid   genus_taxid   species_taxid
-----   --------   --------------   -------------------   ----------------   ------------------   -----------   ----------------   -------------   ------------   -----------   -----------   ------------   -----------   -------------
562     Bacteria   Pseudomonadota   Gammaproteobacteria   Enterobacterales   Enterobacteriaceae   Escherichia   Escherichia coli   2               1224           1236          91347         543            561           562
```
Use custom symbols for unclassified ranks (-r/--miss-rank-repl).

```sh
$ taxonkit reformat lineage.txt -r "__" | cut -f 3
Eukaryota;Chordata;Mammalia;Primates;Hominidae;Homo;Homo sapiens
Eukaryota;Chordata;Mammalia;Artiodactyla;Bovidae;Bos;Bos taurus
Bacteria;Proteobacteria;Gammaproteobacteria;Thiotrichales;Francisellaceae;Francisella;Francisella tularensis
Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
Bacteria;__;__;__;__;__;uncultured murine large bowel bacterium BAC 54B
Viruses;Artverviricota;Revtraviricetes;Ortervirales;Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle
Viruses;Uroviricota;Caudoviricetes;Caudovirales;Siphoviridae;__;Croceibacter phage P2559Y
Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Erwiniaceae;Erwinia;Erwinia oleae
Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae;Serpentinomonas;Serpentinomonas raichei

$ taxonkit reformat lineage.txt -r Unassigned | cut -f 3
Eukaryota;Chordata;Mammalia;Primates;Hominidae;Homo;Homo sapiens
Eukaryota;Chordata;Mammalia;Artiodactyla;Bovidae;Bos;Bos taurus
Bacteria;Proteobacteria;Gammaproteobacteria;Thiotrichales;Francisellaceae;Francisella;Francisella tularensis
Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
Bacteria;Unassigned;Unassigned;Unassigned;Unassigned;Unassigned;uncultured murine large bowel bacterium BAC 54B
Viruses;Artverviricota;Revtraviricetes;Ortervirales;Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle
Viruses;Uroviricota;Caudoviricetes;Caudovirales;Siphoviridae;Unassigned;Croceibacter phage P2559Y
Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Erwiniaceae;Erwinia;Erwinia oleae
Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae;Serpentinomonas;Serpentinomonas raichei
```
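On the output above, -r/--miss-rank-repl amounts to substituting every empty rank field. As a hedged sketch (the helper name replace_missing_ranks is hypothetical, and TaxonKit does this internally rather than as a post-processing step), the same result can be obtained from a lineage whose missing ranks were left empty:

```sh
# replace_missing_ranks REPL (hypothetical helper): replace every empty
# ";"-separated field of the lineages on stdin with REPL.
replace_missing_ranks() {
    awk -F';' -v OFS=';' -v repl="$1" '
        {
            for (i = 1; i <= NF; i++) if ($i == "") $i = repl
            print
        }'
}

echo "Bacteria;;;;;;uncultured murine large bowel bacterium BAC 54B" | replace_missing_ranks Unassigned
```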
Estimate and fill missing ranks with the original lineage information (-F, --fill-miss-rank; very useful for formatting input data for LEfSe). You can change the prefix "unclassified" using the flag -p/--miss-rank-repl-prefix.

```sh
$ cat lineage.txt \
    | taxonkit reformat -F \
    | csvtk -H -t cut -f 1,3 \
    | csvtk -H -t sep -f 2 -s ';' -R \
    | csvtk add-header -t -n taxid,kingdom,phylum,class,order,family,genus,species \
    | csvtk pretty -t
taxid     kingdom     phylum                         class                         order                         family                         genus                             species
-------   ---------   ----------------------------   ---------------------------   ---------------------------   ----------------------------   -------------------------------   -----------------------------------------------
9606      Eukaryota   Chordata                       Mammalia                      Primates                      Hominidae                      Homo                              Homo sapiens
9913      Eukaryota   Chordata                       Mammalia                      Artiodactyla                  Bovidae                        Bos                               Bos taurus
376619    Bacteria    Proteobacteria                 Gammaproteobacteria           Thiotrichales                 Francisellaceae                Francisella                       Francisella tularensis
349741    Bacteria    Verrucomicrobia                Verrucomicrobiae              Verrucomicrobiales            Akkermansiaceae                Akkermansia                       Akkermansia muciniphila
239935    Bacteria    Verrucomicrobia                Verrucomicrobiae              Verrucomicrobiales            Akkermansiaceae                Akkermansia                       Akkermansia muciniphila
314101    Bacteria    unclassified Bacteria phylum   unclassified Bacteria class   unclassified Bacteria order   unclassified Bacteria family   unclassified Bacteria genus       uncultured murine large bowel bacterium BAC 54B
11932     Viruses     Artverviricota                 Revtraviricetes               Ortervirales                  Retroviridae                   Intracisternal A-particles        Mouse Intracisternal A-particle
1327037   Viruses     Uroviricota                    Caudoviricetes                Caudovirales                  Siphoviridae                   unclassified Siphoviridae genus   Croceibacter phage P2559Y
92489     Bacteria    Proteobacteria                 Gammaproteobacteria           Enterobacterales              Erwiniaceae                    Erwinia                           Erwinia oleae
1458427   Bacteria    Proteobacteria                 Betaproteobacteria            Burkholderiales               Comamonadaceae                 Serpentinomonas                   Serpentinomonas raichei
```
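Judging from the output above, each filled value follows the pattern "unclassified <closest higher named taxon> <rank>". The sketch below reproduces that pattern with awk for the default seven ranks. It is an approximation, not TaxonKit's code: the helper fill_missing_ranks is hypothetical, it only looks at the reformatted lineage (TaxonKit also consults the original full lineage), and it ignores -p/--miss-rank-repl-prefix.

```sh
# fill_missing_ranks (hypothetical helper): fill each empty field of a
# seven-rank lineage on stdin with "unclassified <known> <rank>", where
# <known> is the closest higher field that had a real name.
fill_missing_ranks() {
    awk -F';' -v OFS=';' '
        BEGIN { split("kingdom phylum class order family genus species", rank, " ") }
        {
            known = ""                            # closest higher named taxon
            for (i = 1; i <= NF && i <= 7; i++) {
                if ($i != "") { known = $i; continue }
                if (known != "") $i = "unclassified " known " " rank[i]
            }
            print
        }'
}

echo "Bacteria;;;;;;uncultured murine large bowel bacterium BAC 54B" | fill_missing_ranks
```

Filled values are never reused as the "known" taxon, so all missing ranks of 314101 refer back to Bacteria, as in the table above.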
Do not add prefix or suffix for estimated nodes:
```sh
$ echo 314101 | taxonkit reformat -I 1
314101	Bacteria;;;;;;uncultured murine large bowel bacterium BAC 54B

$ echo 314101 | taxonkit reformat -I 1 -F -p "" -s ""
314101	Bacteria;Bacteria;Bacteria;Bacteria;Bacteria;Bacteria;uncultured murine large bowel bacterium BAC 54B
```
Only some ranks.
```sh
$ cat lineage.txt \
    | taxonkit reformat -F -f "{s};{p}" \
    | csvtk -H -t cut -f 1,3 \
    | csvtk -H -t sep -f 2 -s ';' -R \
    | csvtk add-header -t -n taxid,species,phylum \
    | csvtk pretty -t
taxid     species                                           phylum
-------   -----------------------------------------------   ----------------------------
9606      Homo sapiens                                      Chordata
9913      Bos taurus                                        Chordata
376619    Francisella tularensis                            Proteobacteria
349741    Akkermansia muciniphila                           Verrucomicrobia
239935    Akkermansia muciniphila                           Verrucomicrobia
314101    uncultured murine large bowel bacterium BAC 54B   unclassified Bacteria phylum
11932     Mouse Intracisternal A-particle                   Artverviricota
1327037   Croceibacter phage P2559Y                         Uroviricota
92489     Erwinia oleae                                     Proteobacteria
1458427   Serpentinomonas raichei                           Proteobacteria
```
For TaxIds whose rank is higher than the lowest rank in -f/--format, use -T/--trim to avoid filling missing ranks lower than the current rank.

```sh
$ echo -ne "2\n239934\n239935\n" \
    | taxonkit lineage \
    | taxonkit reformat -F \
    | sed -r "s/;+$//" \
    | csvtk -H -t cut -f 1,3
2	Bacteria;unclassified Bacteria phylum;unclassified Bacteria class;unclassified Bacteria order;unclassified Bacteria family;unclassified Bacteria genus;unclassified Bacteria species
239934	Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;unclassified Akkermansia species
239935	Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila

$ echo -ne "2\n239934\n239935\n" \
    | taxonkit lineage \
    | taxonkit reformat -F -T \
    | sed -r "s/;+$//" \
    | csvtk -H -t cut -f 1,3
2	Bacteria
239934	Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia
239935	Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
```
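Comparing the two outputs above, -T/--trim followed by sed -r "s/;+$//" keeps the ranks down to the node's own rank. A hypothetical helper, trim_to_rank, can emulate that on -F output when the node's rank letter (one of k, p, c, o, f, g, s) is already known. This is only an illustrative sketch: here the caller must supply the rank, whereas TaxonKit looks it up itself.

```sh
# trim_to_rank RANK (hypothetical helper): keep the ";"-separated fields of
# each seven-rank lineage on stdin down to RANK (k p c o f g s).
trim_to_rank() {
    awk -F';' -v OFS=';' -v r="$1" '
        BEGIN { n = (length(r) == 1) ? index("kpcofgs", r) : 0 }
        {
            if (n == 0) { print; next }           # unknown rank: leave the line as is
            out = $1
            for (i = 2; i <= n && i <= NF; i++) out = out OFS $i
            print out
        }'
}

# TaxId 239934 is a genus, so everything below {g} is dropped
echo "Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;unclassified Akkermansia species" \
    | trim_to_rank g
```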
Support tab in format string
```sh
$ echo 9606 \
    | taxonkit lineage \
    | taxonkit reformat -f "{k}\t{p}\t{c}\t{o}\t{f}\t{g}\t{s}\t{S}" \
    | csvtk cut -t -f -2
9606	Eukaryota	Chordata	Mammalia	Primates	Hominidae	Homo	Homo sapiens	
```
List seven-level lineage for all TaxIds.
```sh
# replace empty taxon with "Unassigned"
$ taxonkit list --ids 1 \
    | taxonkit lineage \
    | taxonkit reformat -r Unassigned | gzip -c > all.lineage.tsv.gz

# tab-delimited seven levels
$ taxonkit list --ids 1 \
    | taxonkit lineage \
    | taxonkit reformat -r Unassigned -f "{k}\t{p}\t{c}\t{o}\t{f}\t{g}\t{s}" \
    | csvtk cut -H -t -f -2 \
    | head -n 5 \
    | csvtk pretty -H -t

# 8 levels
$ taxonkit list --ids 1 \
    | taxonkit lineage \
    | taxonkit reformat -r Unassigned -f "{k}\t{p}\t{c}\t{o}\t{f}\t{g}\t{s}\t{t}" \
    | csvtk cut -H -t -f -2 \
    | head -n 5 \
    | csvtk pretty -H -t

# fill and trim
$ memusg -t -s 'taxonkit list --ids 1 \
    | taxonkit lineage \
    | taxonkit reformat -F -T \
    | sed -r "s/;+$//" \
    | gzip -c > all.lineage.tsv.gz'
elapsed time: 19.930s
peak rss: 6.25 GB
```
From TaxIds to seven-rank lineages:

```sh
$ cat taxids.txt | taxonkit lineage | taxonkit reformat

# for taxonkit v0.8.0 or later versions
$ cat taxids.txt | taxonkit reformat -I 1
```
Some TaxIds have the same complete lineage, so an empty result is returned by default. You can use the flag -a/--output-ambiguous-result to return one possible result. See #42.

```sh
$ echo -ne "2507530\n2516889\n" | taxonkit lineage --data-dir . | taxonkit reformat --data-dir . -t
19:18:29.770 [WARN] we can't distinguish the TaxIds (2507530, 2516889) for lineage: cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019. But you can use -a/--output-ambiguous-result to return one possible result
19:18:29.770 [WARN] we can't distinguish the TaxIds (2507530, 2516889) for lineage: cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019. But you can use -a/--output-ambiguous-result to return one possible result
2507530	cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019
2516889	cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019

$ echo -ne "2507530\n2516889\n" | taxonkit lineage --data-dir . | taxonkit reformat --data-dir . -t -a
2507530	cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019	Eukaryota;Basidiomycota;Agaricomycetes;Russulales;Russulaceae;Russula;Russula sp. 8 KA-2019	2759;5204;155619;452342;5401;5402;2507530
2516889	cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019	Eukaryota;Basidiomycota;Agaricomycetes;Russulales;Russulaceae;Russula;Russula sp. 8 KA-2019	2759;5204;155619;452342;5401;5402;2507530
```
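To see in advance which input TaxIds will hit this ambiguity, you can group the output of taxonkit lineage by lineage string. The awk sketch below does that; find_shared_lineages is a hypothetical helper, and it assumes the two-column taxid<TAB>lineage output of taxonkit lineage.

```sh
# find_shared_lineages (hypothetical helper): read "taxid<TAB>lineage" lines
# and print "taxid1,taxid2,...<TAB>lineage" for lineages shared by >1 TaxId.
find_shared_lineages() {
    awk -F'\t' '
        $2 != "" {
            n[$2]++
            ids[$2] = (n[$2] > 1) ? ids[$2] "," $1 : $1
        }
        END { for (l in n) if (n[l] > 1) print ids[l] "\t" l }'
}

lin='cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019'
printf '%s\t%s\n' 2507530 "$lin" 2516889 "$lin" | find_shared_lineages
```

The order of the reported groups is unspecified, since awk's for-in loop does not guarantee any ordering.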
reformat2
Usage
```text
Reformat lineage in chosen ranks, allowing more ranks than 'reformat'

Input:
  - List of TaxIds, one record per line.
  - Or tab-delimited format.
    Please specify the TaxId field with flag -I/--taxid-field (default 1)
  - Supporting (gzipped) file or STDIN.

Output:
  1. Input line data.
  2. Reformated lineage.
  3. (Optional) TaxIds taxons in the lineage (-t/--show-lineage-taxids)

Output format:
  1. It can contain some escape characters like "\t".
  2. You can use "|" to set multiple ranks, and the first valid one will be outputted.
     This is useful for a rank with different rank names, especially since NCBI made big changes
     to some ranks in March 2025:
       - "Domain" replaces "superkingdom" for Archaea, Bacteria, and Eukaryota
       - "Acellular root" replaces "superkingdom" for Viruses
       - Six viral groups are designated with the new rank "realm", the equivalent of "domain"
     So, we can use "{domain|acellular root|superkingdom}" to handle all these cases
     and keep compatible with old taxonomy data.

       $ echo -ne "Eukaryota\nBacteria\nViruses\n" \
           | taxonkit name2taxid -s -r \
           | taxonkit reformat2 -I 2 -f "{domain|acellular root|superkingdom}" \
           | csvtk add-header -Ht -n name,taxid,rank,kingdom/domain \
           | csvtk pretty -t
       name        taxid   rank             kingdom/domain
       ---------   -----   --------------   --------------
       Eukaryota   2759    domain           Eukaryota
       Bacteria    2       domain           Bacteria
       Viruses     10239   acellular root   Viruses

     Another example is for subspecies nodes, the rank might be "subpecies", "strain", or "no rank".
     For example,

       $ echo -ne "562\n83333\n2697049\n" \
           | taxonkit lineage -L -r \
           | taxonkit reformat2 -f "{species};{strain|subspecies|no rank}"
       562       species   Escherichia coli;
       83333     strain    Escherichia coli;Escherichia coli K-12
       2697049   no rank   Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2

Differences from 'taxonkit reformat':
  - [input] only accept TaxIDs
  - [format] accept more rank place holders, not just the seven canonical ones.
  - [format] use the full name of ranks, such as "{species}", rather than "{s}"
  - [format] support multiple ranks in one place holder, such as "{subspecies|strain}"
  - do not automatically add prefixes, but you can simply set them in the format

Usage:
  taxonkit reformat2 [flags]

Flags:
  -f, --format string            output format, placeholders of rank are needed (default "{domain|acellular root|superkingdom};{phylum};{class};{order};{family};{genus};{species}")
  -h, --help                     help for reformat2
  -r, --miss-rank-repl string    replacement string for missing rank
  -R, --miss-taxid-repl string   replacement string for missing taxid
  -B, --no-ranks strings         rank names of no-rank. A lineage might have many "no rank" ranks, we only keep the last one below known ranks (default [no rank,clade])
  -t, --show-lineage-taxids      show corresponding taxids of reformated lineage
  -I, --taxid-field int          field index of taxid. input data should be tab-separated. it overrides -i/--lineage-field (default 1)
  -T, --trim                     do not replace missing ranks lower than the rank of the current node
```
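The "first valid rank wins" rule for "|" placeholders can be pictured as a lookup over the ranks present in a single lineage. In the sketch below, first_valid_rank is hypothetical, and the input layout (one rank<TAB>name line per node of one lineage) is an assumption made for illustration; it is not how reformat2 reads its input.

```sh
# first_valid_rank ALTERNATIVES (hypothetical helper): ALTERNATIVES is a
# "|"-separated rank list such as "domain|acellular root|superkingdom";
# stdin holds "rank<TAB>name" lines for the nodes of one lineage.
first_valid_rank() {
    awk -F'\t' -v alts="$1" '
        { name[$1] = $2 }                         # rank -> taxon name
        END {
            n = split(alts, a, "|")
            for (i = 1; i <= n; i++)
                if (a[i] in name) { print name[a[i]]; exit }
            print ""                              # none of the ranks is present
        }'
}

# new-style viral lineage: "acellular root" is the first alternative present
printf 'acellular root\tViruses\nrealm\tDuplodnaviria\n' \
    | first_valid_rank "domain|acellular root|superkingdom"
```

With older taxonomy data where the top rank is still "superkingdom", the same placeholder falls through to the last alternative.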
Examples
Default format
```sh
$ echo 562 | taxonkit reformat2
562	Bacteria;Pseudomonadota;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli
```
Change the format
```sh
$ echo 562 | taxonkit reformat2 -f "g__{genus}\ts__{species}"
562	g__Escherichia	s__Escherichia coli
```
Subspecies
```sh
$ echo 511145 | taxonkit lineage
511145	cellular organisms;Bacteria;Pseudomonadota;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli;Escherichia coli K-12;Escherichia coli str. K-12 substr. MG1655

$ echo 511145 | taxonkit reformat -I 1 -f "{d};{p};{c};{o};{f};{g};{s};{t}"
511145	Bacteria;Pseudomonadota;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli;Escherichia coli K-12

$ echo 511145 | taxonkit reformat2 -I 1 -f "{domain|acellular root|superkingdom};{phylum};{class};{order};{family};{genus};{species};{subspecies|strain|no rank}"
511145	Bacteria;Pseudomonadota;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli;Escherichia coli K-12
```
Trim
```sh
$ echo 561 | taxonkit reformat2 -r unknown
561	Bacteria;Pseudomonadota;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;unknown

$ echo 561 | taxonkit reformat2 -r unknown -T
561	Bacteria;Pseudomonadota;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;

# -----------------------------------------------------------
# another example where the order rank is missing

$ echo 102403 | taxonkit reformat2 -I 1 -r "0" -f "{domain|acellular root|superkingdom};{phylum};{class};{order};{family};{genus};{species}"
102403	Eukaryota;Mollusca;Bivalvia;0;Poromyidae;Tropidomya;0

$ echo 102403 | taxonkit reformat2 -I 1 -r "0" -f "{domain|acellular root|superkingdom};{phylum};{class};{order};{family};{genus};{species}" -T
102403	Eukaryota;Mollusca;Bivalvia;0;Poromyidae;Tropidomya;

# and now, the lowest rank in the output format is order, but the trailing "0" is not trimmed,
# because the missing order rank is higher than the rank of 102403 (genus).

$ echo 102403 | taxonkit reformat2 -I 1 -r "0" -f "{domain|acellular root|superkingdom};{phylum};{class};{order}"
102403	Eukaryota;Mollusca;Bivalvia;0

$ echo 102403 | taxonkit reformat2 -I 1 -r "0" -f "{domain|acellular root|superkingdom};{phylum};{class};{order}" -T
102403	Eukaryota;Mollusca;Bivalvia;0
```
name2taxid
Usage
```text
Convert taxon names to TaxIds

Attention:
  1. Some TaxIds share the same names, e.g, Drosophila.
     These input lines are duplicated with multiple TaxIds.

    $ echo Drosophila | taxonkit name2taxid | taxonkit lineage -i 2 -r -L
    Drosophila      7215    genus
    Drosophila      32281   subgenus
    Drosophila      2081351 genus

Usage:
  taxonkit name2taxid [flags]

Flags:
  -h, --help             help for name2taxid
  -i, --name-field int   field index of name. data should be tab-separated (default 1)
  -s, --sci-name         only searching scientific names
  -r, --show-rank        show rank
```
Examples
Example data
```sh
$ cat names.txt
Homo sapiens
Akkermansia muciniphila ATCC BAA-835
Akkermansia muciniphila
Mouse Intracisternal A-particle
Wei Shen
uncultured murine large bowel bacterium BAC 54B
Croceibacter phage P2559Y
```
Default.
```sh
# taxonkit name2taxid names.txt
$ cat names.txt | taxonkit name2taxid | csvtk pretty -H -t
Homo sapiens                                      9606
Akkermansia muciniphila ATCC BAA-835              349741
Akkermansia muciniphila                           239935
Mouse Intracisternal A-particle                   11932
Wei Shen
uncultured murine large bowel bacterium BAC 54B   314101
Croceibacter phage P2559Y                         1327037
```
Show rank.
```sh
$ cat names.txt | taxonkit name2taxid --show-rank | csvtk pretty -H -t
Homo sapiens                                      9606      species
Akkermansia muciniphila ATCC BAA-835              349741    strain
Akkermansia muciniphila                           239935    species
Mouse Intracisternal A-particle                   11932     species
Wei Shen
uncultured murine large bowel bacterium BAC 54B   314101    species
Croceibacter phage P2559Y                         1327037   species
```
From name to lineage.
```sh
$ cat names.txt | taxonkit name2taxid | taxonkit lineage --taxid-field 2
Homo sapiens	9606	cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Homo;Homo sapiens
Akkermansia muciniphila ATCC BAA-835	349741	cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835
Akkermansia muciniphila	239935	cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
Mouse Intracisternal A-particle	11932	Viruses;Ortervirales;Retroviridae;unclassified Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle
Wei Shen
uncultured murine large bowel bacterium BAC 54B	314101	cellular organisms;Bacteria;environmental samples;uncultured murine large bowel bacterium BAC 54B
Croceibacter phage P2559Y	1327037	Viruses;Caudovirales;Siphoviridae;unclassified Siphoviridae;Croceibacter phage P2559Y
```
Convert old names to new names.
```sh
$ echo Lactobacillus fermentum | taxonkit name2taxid | taxonkit lineage -i 2 -n | cut -f 1,2,4
Lactobacillus fermentum	1613	Limosilactobacillus fermentum
```
Some TaxIds share the same scientific names, e.g., Drosophila.

```sh
$ echo Drosophila \
    | taxonkit name2taxid \
    | taxonkit lineage -i 2 -r \
    | taxonkit reformat -i 3 \
    | csvtk cut -H -t -f 1,2,4,5 \
    | csvtk pretty -H -t
Drosophila   7215      genus      Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;
Drosophila   32281     subgenus   Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;
Drosophila   2081351   genus      Eukaryota;Basidiomycota;Agaricomycetes;Agaricales;Psathyrellaceae;Drosophila;
```
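Conceptually, name2taxid -s resembles a lookup in the "scientific name" entries of names.dmp, where one name may map to several TaxIds, as with Drosophila above. The awk sketch below illustrates that idea under stated assumptions: name2taxid_sketch and the toy names.dmp content are hypothetical, and unlike TaxonKit it ignores all non-scientific names, so the old-name example above would not be resolved.

```sh
# name2taxid_sketch NAMES_DMP (hypothetical helper): for each name on stdin,
# print "name<TAB>taxid" once per TaxId whose scientific name matches exactly;
# names without a match are echoed alone, like "Wei Shen" above.
# names.dmp fields are separated by "<TAB>|<TAB>" and lines end with "<TAB>|".
name2taxid_sketch() {
    awk -F'\t[|]\t?' '
        NR == FNR {                               # first input: names.dmp
            if ($4 == "scientific name") {
                c[$2]++
                ids[$2] = (c[$2] > 1) ? ids[$2] " " $1 : $1
            }
            next
        }
        {                                         # then stdin: one name per line
            if ($0 in ids) {
                n = split(ids[$0], t, " ")
                for (i = 1; i <= n; i++) print $0 "\t" t[i]
            } else {
                print $0
            }
        }' "$1" -
}

# toy names.dmp modelled on the Drosophila example (not a real excerpt)
toy=$(mktemp)
printf '7215\t|\tDrosophila\t|\t\t|\tscientific name\t|\n' > "$toy"
printf '32281\t|\tDrosophila\t|\t\t|\tscientific name\t|\n' >> "$toy"
printf '9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n' >> "$toy"
printf '9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n' >> "$toy"

printf 'Drosophila\nhuman\nWei Shen\n' | name2taxid_sketch "$toy"
rm -f "$toy"
```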
filter
Usage
```text
Filter TaxIds by taxonomic rank range

Attention:
  1. Flag -L/--lower-than and -H/--higher-than are exclusive, and can be
     used along with -E/--equal-to which values can be different.
  2. A list of pre-ordered ranks is in ~/.taxonkit/ranks.txt, you can use
     your list by -r/--rank-file, the format specification is below.
  3. All ranks in taxonomy database should be defined in rank file.
  4. Ranks can be removed with black list via -B/--black-list.
  5. TaxIDs with no rank (those starting with ! in the rank file) are kept by default!
     They can be optionally discarded by -N/--discard-noranks.
     One exception: -N/--discard-noranks is switched on automatically when only -E/--equal-to is given
     and the value is not one of ranks without order ("no rank", "clade").
  6. [Recommended] When filtering with -L/--lower-than, you can use
     -n/--save-predictable-norank to save some special ranks without order,
     where rank of the closest higher node is still lower than rank cutoff.

Rank file:
  1. Blank lines or lines starting with "#" are ignored.
  2. Ranks are in decending order and case ignored.
  3. Ranks with same order should be in one line separated with comma (",", no space).
  4. Ranks without order should be assigned a prefix symbol "!" for each rank.

Usage:
  taxonkit filter [flags]

Flags:
  -B, --black-list strings       black list of ranks to discard, e.g., '-B "no rank" -B "clade"
  -N, --discard-noranks          discard all ranks without order, type "taxonkit filter --help" for details
  -R, --discard-root             discard root taxid, defined by --root-taxid
  -E, --equal-to strings         output TaxIds with rank equal to some ranks, multiple values can be separated with comma "," (e.g., -E "genus,species"), or give multiple times (e.g., -E genus -E species)
  -h, --help                     help for filter
  -H, --higher-than string       output TaxIds with rank higher than a rank, exclusive with --lower-than
      --list-order               list user defined ranks in order, from "$HOME/.taxonkit/ranks.txt"
      --list-ranks               list ordered ranks in taxonomy database, sorted in user defined order
  -L, --lower-than string        output TaxIds with rank lower than a rank, exclusive with --higher-than
  -r, --rank-file string         user-defined ordered taxonomic ranks, type "taxonkit filter --help" for details
      --root-taxid uint32        root taxid (default 1)
  -n, --save-predictable-norank  do not discard some special ranks without order when using -L, where rank of the closest higher node is still lower than rank cutoff
  -i, --taxid-field int          field index of taxid. input data should be tab-separated (default 1)
```
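The rank-file rules above amount to assigning each line a level and comparing levels. The sketch below applies them to a hypothetical taxid<TAB>rank input (for example, columns 1 and 3 of taxonkit lineage -r -n -L output) as a -L/--lower-than style filter. filter_lower_than and the toy rank file are illustrative assumptions rather than TaxonKit's code; as with taxonkit filter without -N, ranks marked with "!" are kept, while ranks missing from the file are simply dropped here.

```sh
# filter_lower_than RANK_FILE CUTOFF (hypothetical helper): keep the
# "taxid<TAB>rank" lines from stdin whose rank is lower than CUTOFF.
filter_lower_than() {
    awk -F'\t' -v cutoff="$2" '
        NR == FNR {                               # first input: the rank file
            if ($0 ~ /^[ \t]*$/ || $0 ~ /^#/) next
            level++                               # one level per line, descending
            m = split(tolower($0), items, ",")
            for (i = 1; i <= m; i++) {
                if (substr(items[i], 1, 1) == "!") unordered[substr(items[i], 2)] = 1
                else order[items[i]] = level
            }
            next
        }
        {                                         # then stdin: taxid<TAB>rank
            r = tolower($2)
            if (r in unordered) print             # ranks without order are kept
            else if ((r in order) && order[r] > order[tolower(cutoff)]) print
        }' "$1" -
}

# toy rank file (a small subset, not the real ~/.taxonkit/ranks.txt)
ranks=$(mktemp)
printf '# toy rank file\ndomain\nkingdom\nphylum\nclass\norder\nfamily\ngenus\nspecies\nsubspecies,strain\n!no rank\n!clade\n' > "$ranks"
printf '74201\tphylum\n239934\tgenus\n239935\tspecies\n349741\tstrain\n2605619\tno rank\n' \
    | filter_lower_than "$ranks" genus
rm -f "$ranks"
```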
Examples
Example data
```sh
$ echo 349741 | taxonkit lineage -t | cut -f 3 | sed 's/;/\n/g' > taxids2.txt

$ cat taxids2.txt
131567
2
1783257
74201
203494
48461
1647988
239934
239935
349741

$ cat taxids2.txt | taxonkit lineage -r | csvtk -Ht cut -f 1,3,2 | csvtk pretty -H -t
131567    cellular root   cellular organisms
2         domain          cellular organisms;Bacteria
1783257   clade           cellular organisms;Bacteria;Pseudomonadati;PVC group
74201     phylum          cellular organisms;Bacteria;Pseudomonadati;PVC group;Verrucomicrobiota
203494    class           cellular organisms;Bacteria;Pseudomonadati;PVC group;Verrucomicrobiota;Verrucomicrobiia
48461     order           cellular organisms;Bacteria;Pseudomonadati;PVC group;Verrucomicrobiota;Verrucomicrobiia;Verrucomicrobiales
1647988   family          cellular organisms;Bacteria;Pseudomonadati;PVC group;Verrucomicrobiota;Verrucomicrobiia;Verrucomicrobiales;Akkermansiaceae
239934    genus           cellular organisms;Bacteria;Pseudomonadati;PVC group;Verrucomicrobiota;Verrucomicrobiia;Verrucomicrobiales;Akkermansiaceae;Akkermansia
239935    species         cellular organisms;Bacteria;Pseudomonadati;PVC group;Verrucomicrobiota;Verrucomicrobiia;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
349741    strain          cellular organisms;Bacteria;Pseudomonadati;PVC group;Verrucomicrobiota;Verrucomicrobiia;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835
```
Equal to certain rank(s) (-E/--equal-to)

```sh
$ cat taxids2.txt \
    | taxonkit filter -E Phylum -E Class -N \
    | taxonkit lineage -r \
    | csvtk -Ht cut -f 1,3,2 \
    | csvtk pretty -H -t
74201    phylum   cellular organisms;Bacteria;PVC group;Verrucomicrobia
203494   class    cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae
```
Lower than a rank (-L/--lower-than)

```sh
$ cat taxids2.txt \
    | taxonkit filter -L genus -N \
    | taxonkit lineage -r -n -L \
    | csvtk -Ht cut -f 1,3,2 \
    | csvtk pretty -H -t
239935   species   Akkermansia muciniphila
349741   strain    Akkermansia muciniphila ATCC BAA-835
```
Higher than a rank (-H/--higher-than)

```sh
$ cat taxids2.txt \
    | taxonkit filter -H phylum -N \
    | taxonkit lineage -r -n -L \
    | csvtk -Ht cut -f 1,3,2 \
    | csvtk pretty -H -t
131567   cellular root   cellular organisms
2        domain          Bacteria
```
TaxIds with no rank are kept by default! Nodes of rank "no rank" and "clade" have no order and can be filtered out via -N/--discard-noranks. Further ranks can be removed with a black list via -B/--black-list.

```sh
# 562 is the TaxId of Escherichia coli
$ taxonkit list --ids 562 \
    | taxonkit filter -L species \
    | taxonkit lineage -r -n -L \
    | csvtk cut -Ht -f 1,3,2 \
    | csvtk freq -Ht -f 2 -nr \
    | csvtk pretty -H -t
strain       2940
no rank      486
serotype     176
serogroup    110
isolate      1
subspecies   1

$ taxonkit list --ids 562 \
    | taxonkit filter -L species -N -B strain \
    | taxonkit lineage -r -n -L \
    | csvtk cut -Ht -f 1,3,2 \
    | csvtk freq -Ht -f 2 -nr \
    | csvtk pretty -H -t
serotype     176
serogroup    110
isolate      1
subspecies   1
```
Combining -L/-H with -E.

```sh
$ cat taxids2.txt \
    | taxonkit filter -L genus -E genus -N \
    | taxonkit lineage -r -n -L \
    | csvtk cut -Ht -f 1,3,2 \
    | csvtk pretty -H -t
239934   genus     Akkermansia
239935   species   Akkermansia muciniphila
349741   strain    Akkermansia muciniphila ATCC BAA-835
```
Special cases of "no rank" (-n/--save-predictable-norank). When filtering with -L/--lower-than, you can use -n/--save-predictable-norank to keep some special nodes without rank order, namely those where the rank of the closest higher node is still lower than the rank cutoff.

```sh
$ echo -ne "2605619\n1327037\n" \
    | taxonkit lineage -t \
    | csvtk cut -Ht -f 3 \
    | csvtk unfold -Ht -f 1 -s ";" \
    | taxonkit lineage -r -n -L \
    | csvtk cut -Ht -f 1,3,2 \
    | csvtk pretty -H -t
131567    cellular root    cellular organisms
2         domain           Bacteria
3379134   kingdom          Pseudomonadati
1224      phylum           Pseudomonadota
1236      class            Gammaproteobacteria
91347     order            Enterobacterales
543       family           Enterobacteriaceae
561       genus            Escherichia
562       species          Escherichia coli
2605619   no rank          Escherichia coli O16:H48
10239     acellular root   Viruses
2731341   realm            Duplodnaviria
2731360   kingdom          Heunggongvirae
2731618   phylum           Uroviricota
2731619   class            Caudoviricetes
2788787   no rank          unclassified Caudoviricetes
1327037   species          Croceibacter phage P2559Y

# save taxids
$ echo -ne "2605619\n1327037\n" \
    | taxonkit lineage -t \
    | csvtk cut -Ht -f 3 \
    | csvtk unfold -Ht -f 1 -s ";" \
    | tee taxids4.txt
131567
2
1224
1236
91347
543
561
562
2605619
10239
2731341
2731360
2731618
2731619
28883
10699
196894
1327037
```
Now, filter nodes of rank <= species.
```sh
$ cat taxids4.txt \
    | taxonkit filter -L species -E species -N -n \
    | taxonkit lineage -r -n -L \
    | csvtk cut -Ht -f 1,3,2 \
    | csvtk pretty -H -t
562       species   Escherichia coli
2605619   no rank   Escherichia coli O16:H48
1327037   species   Croceibacter phage P2559Y
```
Note that 2605619 (no rank) is kept because its parent node 562 has rank species, which is <= species.
lca
Usage
```text
Compute lowest common ancestor (LCA) for TaxIds

Attention:
  1. This command computes LCA TaxId for a list of TaxIds
     in a field ("-i/--taxids-field) of tab-delimited file or STDIN.
  2. TaxIDs should have the same separator ("-s/--separator"),
     single charactor separator is prefered.
  3. Empty lines or lines without valid TaxIds in the field are omitted.
  4. If some TaxIds are not found in database, it returns 0.

Examples:
    $ echo 239934, 239935, 349741 | taxonkit lca -s ", "
    239934, 239935, 349741  239934

    $ time echo 239934 239935 349741 9606 | taxonkit lca
    239934 239935 349741 9606       131567

Usage:
  taxonkit lca [flags]

Flags:
  -b, --buffer-size string   size of line buffer, supported unit: K, M, G. You need to increase the value when "bufio.Scanner: token too long" error occured (default "1M")
  -h, --help                 help for lca
      --separater string     separater for TaxIds. This flag is same to --separator. (default " ")
  -s, --separator string     separator for TaxIds (default " ")
  -D, --skip-deleted         skip deleted TaxIds and compute with left ones
  -U, --skip-unfound         skip unfound TaxIds and compute with left ones
  -i, --taxids-field int     field index of TaxIds. Input data should be tab-separated (default 1)
```
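The LCA of a set of TaxIds is the deepest node shared by all their paths to the root. A minimal awk sketch of that idea follows; it is not TaxonKit's implementation. lca_sketch is hypothetical, it assumes a two-column child<TAB>parent file (for example, the first two fields of nodes.dmp, where the root is its own parent), and the toy tree is built from the Homo subtree in the example data below. Like taxonkit lca, it prints 0 when a TaxId is unknown.

```sh
# lca_sketch TREE_FILE TAXID... (hypothetical helper). TREE_FILE holds
# "child<TAB>parent" lines; the root is its own parent, as in nodes.dmp.
lca_sketch() {
    _lca_tree=$1
    shift
    awk -F'\t' -v q="$*" '
        { parent[$1] = $2 }
        END {
            k = split(q, ids, " ")
            for (i = 1; i <= k; i++) {
                node = ids[i]
                if (!(node in parent)) { print 0; exit }   # unknown TaxId
                while (1) {                    # count each node on the path to the root
                    cnt[node]++
                    if (node == parent[node]) break
                    node = parent[node]
                }
            }
            # walk up from the first TaxId to the first node on all k paths
            node = ids[1]
            while (cnt[node] < k && node != parent[node]) node = parent[node]
            print node
        }' "$_lca_tree"
}

# toy tree: the Homo subtree below, with 9605 attached directly to root 1
# (a simplification; its real parent is not 1)
tree=$(mktemp)
printf '1\t1\n9605\t1\n9606\t9605\n63221\t9606\n741158\t9606\n1425170\t9605\n2665952\t9605\n2665953\t2665952\n' > "$tree"
lca_sketch "$tree" 63221 2665953    # prints 9605
lca_sketch "$tree" 63221 741158     # prints 9606
rm -f "$tree"
```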
Examples:
Example data
```sh
$ taxonkit list --ids 9605 -nr --indent " "
9605 [genus] Homo
 9606 [species] Homo sapiens
  63221 [subspecies] Homo sapiens neanderthalensis
  741158 [subspecies] Homo sapiens subsp. 'Denisova'
 1425170 [species] Homo heidelbergensis
 2665952 [no rank] environmental samples
  2665953 [species] Homo sapiens environmental sample
```
A simple case
```sh
$ echo 63221 2665953 | taxonkit lca
63221 2665953	9605
```
Custom field (-i/--taxids-field) and separator (-s/--separator).

```sh
$ echo -ne "a\t63221,2665953\nb\t63221, 741158\n"
a	63221,2665953
b	63221, 741158

$ echo -ne "a\t63221,2665953\nb\t63221, 741158\n" \
    | taxonkit lca -i 2 -s ","
a	63221,2665953	9605
b	63221, 741158	9606
```
Merged TaxIds.
```sh
# merged
$ echo 92487 92488 92489 | taxonkit lca
10:08:26.578 [WARN] taxid 92489 was merged into 796334
92487 92488 92489	1236
```
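The warning above reflects merged.dmp, which maps each old TaxId to the TaxId it was merged into. A sketch of that lookup in awk, assuming the usual "old<TAB>|<TAB>new<TAB>|" line layout; resolve_merged and the one-line toy file are hypothetical:

```sh
# resolve_merged MERGED_DMP (hypothetical helper): map each TaxId on stdin
# to its new TaxId if MERGED_DMP lists it, otherwise print it unchanged.
resolve_merged() {
    awk -F'\t[|]\t?' '
        NR == FNR { merged[$1] = $2; next }       # merged.dmp: old -> new
        { print (($1 in merged) ? merged[$1] : $1) }' "$1" -
}

# one-line toy merged.dmp matching the warning above
m=$(mktemp)
printf '92489\t|\t796334\t|\n' > "$m"
printf '92487\n92488\n92489\n' | resolve_merged "$m"
rm -f "$m"
```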
Deleted TaxIds: you can omit them and continue computing with the remaining ones (-D/--skip-deleted).

```sh
$ echo 1 2 3 | taxonkit lca
10:30:17.678 [WARN] taxid 3 not found
1 2 3	0

$ time echo 1 2 3 | taxonkit lca -D
10:29:31.828 [WARN] taxid 3 was deleted
1 2 3	1
```
TaxIds not found in the database: you can omit them and continue computing with the remaining ones (-U/--skip-unfound).

```sh
$ echo 61021 61022 11111111 | taxonkit lca
10:31:44.929 [WARN] taxid 11111111 not found
61021 61022 11111111	0

$ echo 61021 61022 11111111 | taxonkit lca -U
10:32:02.772 [WARN] taxid 11111111 not found
61021 61022 11111111	2628496
```
taxid-changelog
Usage
```text
Create TaxId changelog from dump archives

Attention:
  1. This command was originally designed for NCBI taxonomy, where the the TaxIds are stable.
  2. For other taxonomic data created by "taxonkit create-taxdump", e.g., GTDB-taxdump,
     some change events might be wrong, because
       a) There would be dramatic changes between the two versions.
       b) Different taxons in multiple versions might have the same TaxIds, because we only
          check and eliminate taxid collision within a single version.
     So a single version of taxonomic data created by "taxonkit create-taxdump" has no problem,
     it's just the changelog might not be perfect.

Steps:

    # dependencies:
    #   rush - https://github.com/shenwei356/rush/

    mkdir -p archive; cd archive;

    # --------- download ---------

    # option 1
    # for fast network connection
    wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_archive/taxdmp*.zip

    # option 2
    # for slow network connection
    url=https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_archive/
    wget $url -O - -o /dev/null \
        | grep taxdmp | perl -ne '/(taxdmp_.+?.zip)/; print "$1\n";' \
        | rush -j 2 -v url=$url 'axel -n 5 {url}/{}' \
            --immediate-output -c -C download.rush

    # --------- unzip ---------

    ls taxdmp*.zip | rush -j 1 'unzip {} names.dmp nodes.dmp merged.dmp delnodes.dmp -d {@_(.+)\.}'

    # optionally compress .dmp files with pigz, for saving disk space
    fd .dmp$ | rush -j 4 'pigz {}'

    # --------- create log ---------

    cd ..
    taxonkit taxid-changelog -i archive -o taxid-changelog.csv.gz --verbose

Output format (CSV):

    # fields        comments
    taxid           # taxid
    version         # version / time of archive, e.g, 2019-07-01
    change          # change, values:
                    #   NEW             newly added
                    #   REUSE_DEL       deleted taxids being reused
                    #   REUSE_MER       merged taxids being reused
                    #   DELETE          deleted
                    #   MERGE           merged into another taxid
                    #   ABSORB          other taxids merged into this one
                    #   CHANGE_NAME     scientific name changed
                    #   CHANGE_RANK     rank changed
                    #   CHANGE_LIN_LIN  lineage taxids remain but lineage remain
                    #   CHANGE_LIN_TAX  lineage taxids changed
                    #   CHANGE_LIN_LEN  lineage length changed
    change-value    # variable values for changes:
                    #   1) new taxid for MERGE
                    #   2) merged taxids for ABSORB
                    #   3) empty for others
    name            # scientific name
    rank            # rank
    lineage         # complete lineage of the taxid
    lineage-taxids  # taxids of the lineage

    # you can use csvtk to investigate them. e.g.,
    csvtk grep -f taxid -p 1390515 taxid-changelog.csv.gz

Usage:
  taxonkit taxid-changelog [flags]

Flags:
  -i, --archive string   directory containing uncompressed dumped archives
  -h, --help             help for taxid-changelog
```
Example 1 (E. coli with TaxId 562)

    $ pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -p 562 \
        | csvtk pretty
    taxid   version      change           change-value    name               rank      lineage   lineage-taxids
    562     2014-08-01   NEW                              Escherichia coli   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia coli   131567;2;1224;1236;91347;543;561;562
    562     2014-08-01   ABSORB           662101;662104   Escherichia coli   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia coli   131567;2;1224;1236;91347;543;561;562
    562     2015-11-01   ABSORB           1637691         Escherichia coli   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia coli   131567;2;1224;1236;91347;543;561;562
    562     2016-10-01   CHANGE_LIN_LIN                   Escherichia coli   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli   131567;2;1224;1236;91347;543;561;562
    562     2018-06-01   ABSORB           469598          Escherichia coli   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli   131567;2;1224;1236;91347;543;561;562

    # merged taxids
    $ pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -p 662101,662104,1637691,469598 \
        | csvtk pretty
    taxid     version      change           change-value   name                        rank      lineage   lineage-taxids
    469598    2014-08-01   NEW                             Escherichia sp. 3_2_53FAA   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia sp. 3_2_53FAA   131567;2;1224;1236;91347;543;561;469598
    469598    2016-10-01   CHANGE_LIN_LIN                  Escherichia sp. 3_2_53FAA   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia sp. 3_2_53FAA   131567;2;1224;1236;91347;543;561;469598
    469598    2018-06-01   MERGE            562            Escherichia sp. 3_2_53FAA   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia sp. 3_2_53FAA   131567;2;1224;1236;91347;543;561;469598
    662101    2014-08-01   MERGE            562
    662104    2014-08-01   MERGE            562
    1637691   2015-04-01   DELETE
    1637691   2015-05-01   REUSE_DEL                       Escherichia sp. MAR         species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia sp. MAR   131567;2;1224;1236;91347;543;561;1637691
    1637691   2015-11-01   MERGE            562            Escherichia sp. MAR         species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia sp. MAR   131567;2;1224;1236;91347;543;561;1637691
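MERGE records carry the new TaxId in change-value, so a list of merged TaxIds can be extracted from the changelog. The following is a minimal sketch. `merged_pairs` is a hypothetical helper. Splitting on commas assumes that the first four columns (taxid, version, change, change-value) never contain commas; ABSORB lists are ';'-separated. Under that assumption, a quoted comma in a later column (such as a name) cannot shift these four fields. The input is an excerpt of the records above, reduced to their first four columns.

```sh
#!/bin/sh
# Hypothetical helper: print "old TaxId <TAB> new TaxId <TAB> version" for every
# MERGE record of a taxid-changelog CSV read from stdin (header line skipped).
# Splitting on ',' assumes the first four columns (taxid, version, change,
# change-value) never contain commas; quoted commas in later columns only
# affect fields after $4.
merged_pairs() {
    awk -F',' -v OFS='\t' 'NR > 1 && $3 == "MERGE" { print $1, $4, $2 }'
}

# Columns 1-4 of some records shown above.
merged_pairs <<'EOF'
taxid,version,change,change-value
469598,2014-08-01,NEW,
469598,2018-06-01,MERGE,562
662101,2014-08-01,MERGE,562
662104,2014-08-01,MERGE,562
1637691,2015-04-01,DELETE,
1637691,2015-11-01,MERGE,562
EOF
```

To run it on the full file, use `pigz -cd taxid-changelog.csv.gz | merged_pairs`. A TaxId can be merged and later reused (REUSE_MER), so check the latest record of a TaxId before relying on an old mapping.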
Example 2 (SARS-CoV-2).

    $ time pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -p 2697049 \
        | csvtk pretty
    taxid     version      change           change-value   name                                              rank      lineage   lineage-taxids
    2697049   2020-02-01   NEW                             Wuhan seafood market pneumonia virus              species   Viruses;Riboviria;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;unclassified Betacoronavirus;Wuhan seafood market pneumonia virus   10239;2559587;76804;2499399;11118;2501931;694002;696098;2697049
    2697049   2020-03-01   CHANGE_NAME                     Severe acute respiratory syndrome coronavirus 2   no rank   Viruses;Riboviria;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2   10239;2559587;76804;2499399;11118;2501931;694002;2509511;694009;2697049
    2697049   2020-03-01   CHANGE_RANK                     Severe acute respiratory syndrome coronavirus 2   no rank   Viruses;Riboviria;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2   10239;2559587;76804;2499399;11118;2501931;694002;2509511;694009;2697049
    2697049   2020-03-01   CHANGE_LIN_LEN                  Severe acute respiratory syndrome coronavirus 2   no rank   Viruses;Riboviria;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2   10239;2559587;76804;2499399;11118;2501931;694002;2509511;694009;2697049
    2697049   2020-06-01   CHANGE_LIN_LEN                  Severe acute respiratory syndrome coronavirus 2   no rank   Viruses;Riboviria;Orthornavirae;Pisuviricota;Pisoniviricetes;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2   10239;2559587;2732396;2732408;2732506;76804;2499399;11118;2501931;694002;2509511;694009;2697049
    2697049   2020-07-01   CHANGE_RANK                     Severe acute respiratory syndrome coronavirus 2   isolate   Viruses;Riboviria;Orthornavirae;Pisuviricota;Pisoniviricetes;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2   10239;2559587;2732396;2732408;2732506;76804;2499399;11118;2501931;694002;2509511;694009;2697049
    2697049   2020-08-01   CHANGE_RANK                     Severe acute respiratory syndrome coronavirus 2   no rank   Viruses;Riboviria;Orthornavirae;Pisuviricota;Pisoniviricetes;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2   10239;2559587;2732396;2732408;2732506;76804;2499399;11118;2501931;694002;2509511;694009;2697049

    real    0m7.644s
    user    0m16.749s
    sys     0m3.985s
Example 3 (all subspecies and strains in Akkermansia muciniphila 239935)

    # species in Akkermansia
    $ taxonkit list --show-rank --show-name --indent "    " --ids 239935
    239935 [species] Akkermansia muciniphila
        349741 [strain] Akkermansia muciniphila ATCC BAA-835

    # check them all
    $ pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -P <(taxonkit list --indent "" --ids 239935) \
        | csvtk pretty
    taxid    version      change           change-value   name                                   rank      lineage   lineage-taxids
    239935   2014-08-01   NEW                             Akkermansia muciniphila                species   cellular organisms;Bacteria;Chlamydiae/Verrucomicrobia group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Verrucomicrobiaceae;Akkermansia;Akkermansia muciniphila   131567;2;51290;74201;203494;48461;203557;239934;239935
    239935   2015-05-01   CHANGE_LIN_TAX                  Akkermansia muciniphila                species   cellular organisms;Bacteria;Chlamydiae/Verrucomicrobia group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila   131567;2;51290;74201;203494;48461;1647988;239934;239935
    239935   2016-03-01   CHANGE_LIN_TAX                  Akkermansia muciniphila                species   cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila   131567;2;1783257;74201;203494;48461;1647988;239934;239935
    239935   2016-05-01   ABSORB           1834199        Akkermansia muciniphila                species   cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila   131567;2;1783257;74201;203494;48461;1647988;239934;239935
    349741   2014-08-01   NEW                             Akkermansia muciniphila ATCC BAA-835   no rank   cellular organisms;Bacteria;Chlamydiae/Verrucomicrobia group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Verrucomicrobiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835   131567;2;51290;74201;203494;48461;203557;239934;239935;349741
    349741   2015-05-01   CHANGE_LIN_TAX                  Akkermansia muciniphila ATCC BAA-835   no rank   cellular organisms;Bacteria;Chlamydiae/Verrucomicrobia group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835   131567;2;51290;74201;203494;48461;1647988;239934;239935;349741
    349741   2016-03-01   CHANGE_LIN_TAX                  Akkermansia muciniphila ATCC BAA-835   no rank   cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835   131567;2;1783257;74201;203494;48461;1647988;239934;239935;349741
    349741   2020-07-01   CHANGE_RANK                     Akkermansia muciniphila ATCC BAA-835   strain    cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835   131567;2;1783257;74201;203494;48461;1647988;239934;239935;349741
create-taxdump
Usage
    Create NCBI-style taxdump files for custom taxonomy, e.g., GTDB and ICTV

    Input format:
      0. For GTDB taxonomy file, just use --gtdb.
         We use the numeric assembly accession as the taxon at subspecies rank
         (without the prefix GCA_ and GCF_, and version number).
      1. The input file should be tab-delimited, at least one column is needed.
      2. Ranks can be given either via the first row or the flag --rank-names.
      3. The column containing the genome/assembly accession is recommended to
         generate TaxId mapping file (taxid.map, id -> taxid).
           -A/--field-accession,   field containing genome/assembly accession
           --field-accession-re,   regular expression to extract the accession
         Note that multiple TaxIds pointing to the same accession are listed as
         comma-separated integers.

    Attention:
      1. Duplicated taxon names with different ranks are allowed since v0.16.0, since
         the rank and taxon name are concatenated for generating the TaxId.
      2. The generated TaxIds are not consecutive numbers, however some tools like MMSeqs2
         require this, you can use the script below for conversion:
         https://github.com/apcamargo/ictv-mmseqs2-protein-database/blob/master/scripts/fix_taxdump.py
      3. We only check and eliminate taxid collision within a single version of taxonomy data.
         Therefore, if you create taxid-changelog with "taxid-changelog", different taxa in
         multiple versions might have the same TaxIds and some change events might be wrong.

         So a single version of taxonomic data created by "taxonkit create-taxdump" has no
         problem, it's just the changelog might not be perfect.

    Usage:
      taxonkit create-taxdump [flags]

    Flags:
      -A, --field-accession int             field index of assembly accession (genome ID), for outputting taxid.map
      -S, --field-accession-as-subspecies   treat the accession as subspecies rank
          --field-accession-re string       regular expression to extract assembly accession (default "^(.+)$")
          --force                           overwrite existing output directory
          --gtdb                            input files are GTDB taxonomy file
          --gtdb-re-subs string             regular expression to extract assembly accession as the subspecies (default "^\\w\\w_GC[AF]_(.+)\\.\\d+$")
      -h, --help                            help for create-taxdump
          --line-chunk-size int             number of lines to process for each thread, and 4 threads is fast enough. (default 5000)
          --null strings                    null value of taxa (default [,NULL,NA])
      -x, --old-taxdump-dir string          taxdump directory of the previous version, for generating merged.dmp and delnodes.dmp
      -O, --out-dir string                  output directory
      -R, --rank-names strings              names of all ranks, leave it empty to use the (lowercase) first row of input as rank names
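To see which part of a GTDB genome ID the default --gtdb-re-subs pattern keeps as the subspecies-rank ID, the pattern can be approximated with POSIX `sed -E`. This is a sketch under three assumptions. First, `\w` and `\d` are not POSIX ERE, so they are replaced with `[A-Za-z0-9_]` and `[0-9]`. Second, `gtdb_subspecies_id` is a hypothetical name. Third, `RS_GCF_001027105.1` and `GB_GCA_000006945.2` are GTDB-style IDs built for illustration from accessions used elsewhere on this page.

```sh
#!/bin/sh
# Hypothetical helper approximating the default --gtdb-re-subs pattern
#   ^\w\w_GC[AF]_(.+)\.\d+$
# in POSIX sed -E (\w -> [A-Za-z0-9_], \d -> [0-9]). It prints the captured
# group for matching lines and drops lines that do not match.
gtdb_subspecies_id() {
    sed -nE 's/^[A-Za-z0-9_]{2}_GC[AF]_(.+)\.[0-9]+$/\1/p'
}

# GTDB-style IDs made up for illustration from accessions used in this page.
printf '%s\n' RS_GCF_001027105.1 GB_GCA_000006945.2 GCF_001027105.1 | gtdb_subspecies_id
# prints 001027105 and 000006945; GCF_001027105.1 lacks the two-letter prefix
```

Because only the numeric part is kept, a GenBank (GCA_) accession and a RefSeq (GCF_) accession with the same number produce the same ID.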
Examples:
GTDB. See more: https://github.com/shenwei356/gtdb-taxdump
    $ taxonkit create-taxdump --gtdb ar53_taxonomy_r207.tsv.gz bac120_taxonomy_r207.tsv.gz --out-dir taxdump
    16:42:35.213 [INFO] 317542 records saved to taxdump/taxid.map
    16:42:35.460 [INFO] 401815 records saved to taxdump/nodes.dmp
    16:42:35.611 [INFO] 401815 records saved to taxdump/names.dmp
    16:42:35.611 [INFO] 0 records saved to taxdump/merged.dmp
    16:42:35.611 [INFO] 0 records saved to taxdump/delnodes.dmp
ICTV. See more: https://github.com/shenwei356/ictv-taxdump
MGV. Only order, family, and genus information is available.
    $ cat mgv_contig_info.tsv \
        | csvtk cut -t -f ictv_order,ictv_family,ictv_genus,votu_id,contig_id \
        | sed 1d \
        > mgv.tsv

    $ taxonkit create-taxdump mgv.tsv --out-dir mgv --force -A 5 -R order,family,genus,species
    23:33:18.098 [INFO] 189680 records saved to mgv/taxid.map
    23:33:18.131 [INFO] 58102 records saved to mgv/nodes.dmp
    23:33:18.150 [INFO] 58102 records saved to mgv/names.dmp
    23:33:18.150 [INFO] 0 records saved to mgv/merged.dmp
    23:33:18.150 [INFO] 0 records saved to mgv/delnodes.dmp

    $ head -n 5 mgv/taxid.map
    MGV-GENOME-0364295	677052301
    MGV-GENOME-0364296	677052301
    MGV-GENOME-0364303	1414406025
    MGV-GENOME-0364311	1849074420
    MGV-GENOME-0364312	2074846424

    $ echo 677052301 | taxonkit lineage --data-dir mgv/
    677052301	Caudovirales;crAss-phage;OTU-61123

    $ echo 677052301 | taxonkit reformat --data-dir mgv/ -I 1 -P
    677052301	k__;p__;c__;o__Caudovirales;f__crAss-phage;g__;s__OTU-61123

    $ grep MGV-GENOME-0364295 mgv.tsv
    Caudovirales	crAss-phage	NULL	OTU-61123	MGV-GENOME-0364295
Custom lineages with the first row as rank names and treating one column as accession.
    $ csvtk pretty -t example/taxonomy.tsv
    id                superkingdom   phylum           class                 order              family               genus            species
    ---------------   ------------   --------------   -------------------   ----------------   ------------------   --------------   --------------------------
    GCF_001027105.1   Bacteria       Firmicutes       Bacilli               Bacillales         Staphylococcaceae    Staphylococcus   Staphylococcus aureus
    GCF_001096185.1   Bacteria       Firmicutes       Bacilli               Lactobacillales    Streptococcaceae     Streptococcus    Streptococcus pneumoniae
    GCF_001544255.1   Bacteria       Firmicutes       Bacilli               Lactobacillales    Enterococcaceae      Enterococcus     Enterococcus faecium
    GCF_002949675.1   Bacteria       Proteobacteria   Gammaproteobacteria   Enterobacterales   Enterobacteriaceae   Shigella         Shigella dysenteriae
    GCF_002950215.1   Bacteria       Proteobacteria   Gammaproteobacteria   Enterobacterales   Enterobacteriaceae   Shigella         Shigella flexneri
    GCF_006742205.1   Bacteria       Firmicutes       Bacilli               Bacillales         Staphylococcaceae    Staphylococcus   Staphylococcus epidermidis
    GCF_000006945.2   Bacteria       Proteobacteria   Gammaproteobacteria   Enterobacterales   Enterobacteriaceae   Salmonella       Salmonella enterica
    GCF_000017205.1   Bacteria       Proteobacteria   Gammaproteobacteria   Pseudomonadales    Pseudomonadaceae     Pseudomonas      Pseudomonas aeruginosa
    GCF_003697165.2   Bacteria       Proteobacteria   Gammaproteobacteria   Enterobacterales   Enterobacteriaceae   Escherichia      Escherichia coli
    GCF_009759685.1   Bacteria       Proteobacteria   Gammaproteobacteria   Moraxellales       Moraxellaceae        Acinetobacter    Acinetobacter baumannii
    GCF_000148585.2   Bacteria       Firmicutes       Bacilli               Lactobacillales    Streptococcaceae     Streptococcus    Streptococcus mitis
    GCF_000392875.1   Bacteria       Firmicutes       Bacilli               Lactobacillales    Enterococcaceae      Enterococcus     Enterococcus faecalis
    GCF_000742135.1   Bacteria       Proteobacteria   Gammaproteobacteria   Enterobacterales   Enterobacteriaceae   Klebsiella       Klebsiella pneumoniae

    # the first column as accession
    $ taxonkit create-taxdump -A 1 example/taxonomy.tsv -O example/taxdump
    16:31:31.828 [INFO] I will use the first row of input as rank names
    16:31:31.843 [INFO] 13 records saved to example/taxdump/taxid.map
    16:31:31.843 [INFO] 39 records saved to example/taxdump/nodes.dmp
    16:31:31.843 [INFO] 39 records saved to example/taxdump/names.dmp
    16:31:31.843 [INFO] 0 records saved to example/taxdump/merged.dmp
    16:31:31.843 [INFO] 0 records saved to example/taxdump/delnodes.dmp

    $ export TAXONKIT_DB=example/taxdump

    $ taxonkit list --ids 1 | taxonkit filter -E species | taxonkit lineage -r | csvtk pretty -H -t
    793223984    Bacteria;Proteobacteria;Gammaproteobacteria;Moraxellales;Moraxellaceae;Acinetobacter;Acinetobacter baumannii       species
    1220345221   Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas;Pseudomonas aeruginosa    species
    561101225    Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Shigella;Shigella flexneri         species
    1969112428   Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Shigella;Shigella dysenteriae      species
    599451526    Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli       species
    2034984046   Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Salmonella;Salmonella enterica     species
    1859674812   Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Klebsiella;Klebsiella pneumoniae   species
    773201972    Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus                      species
    1295317147   Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus epidermidis                 species
    182402976    Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Enterococcus;Enterococcus faecium                      species
    1566113429   Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Enterococcus;Enterococcus faecalis                     species
    891083107    Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;Streptococcus pneumoniae                species
    1357145446   Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;Streptococcus mitis                     species

    $ head -n 3 example/taxdump/taxid.map
    GCF_001027105.1	773201972
    GCF_001096185.1	891083107
    GCF_001544255.1	182402976
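taxid.map pairs each accession with its TaxId, so genome-level results can be linked to the custom taxonomy with an ordinary join. The sketch below uses a hypothetical `annotate_accessions` function. The map contains the three lines shown above, and `unknown_genome` is a made-up accession that demonstrates the fallback value.

```sh
#!/bin/sh
# Hypothetical helper: attach TaxIds from a taxid.map (accession<TAB>taxid)
# to accessions read from stdin; unknown accessions get "NA".
# Comma-separated TaxId lists in taxid.map are passed through unchanged.
annotate_accessions() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR { taxid[$1] = $2; next }   # first file: load the map
        { print $1, (($1 in taxid) ? taxid[$1] : "NA") }
    ' "$1" -
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# The first three lines of example/taxdump/taxid.map shown above.
printf 'GCF_001027105.1\t773201972\nGCF_001096185.1\t891083107\nGCF_001544255.1\t182402976\n' > "$tmp/taxid.map"

printf '%s\n' GCF_001096185.1 unknown_genome | annotate_accessions "$tmp/taxid.map"
# GCF_001096185.1  891083107
# unknown_genome   NA
```

Rows marked NA have no TaxId. Remove them before passing column 2 to `taxonkit lineage -i 2` with `TAXONKIT_DB=example/taxdump`.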
Custom lineages with the first row as rank names (pure lineage data)
    $ csvtk cut -t -f 2- example/taxonomy.tsv | head -n 2 | csvtk pretty -t
    superkingdom   phylum       class     order        family              genus            species
    ------------   ----------   -------   ----------   -----------------   --------------   ---------------------
    Bacteria       Firmicutes   Bacilli   Bacillales   Staphylococcaceae   Staphylococcus   Staphylococcus aureus

    $ csvtk cut -t -f 2- example/taxonomy.tsv \
        | taxonkit create-taxdump -O example/taxdump2
    16:53:08.604 [INFO] I will use the first row of input as rank names
    16:53:08.614 [INFO] 39 records saved to example/taxdump2/nodes.dmp
    16:53:08.614 [INFO] 39 records saved to example/taxdump2/names.dmp
    16:53:08.614 [INFO] 0 records saved to example/taxdump2/merged.dmp
    16:53:08.615 [INFO] 0 records saved to example/taxdump2/delnodes.dmp

    $ export TAXONKIT_DB=example/taxdump2

    $ taxonkit list --ids 1 | taxonkit filter -E species | taxonkit lineage -r | head -n 2
    793223984	Bacteria;Proteobacteria;Gammaproteobacteria;Moraxellales;Moraxellaceae;Acinetobacter;Acinetobacter baumannii	species
    1220345221	Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas;Pseudomonas aeruginosa	species
genautocomplete
Usage
    Generate shell autocompletion script

    Supported shell: bash|zsh|fish|powershell

    Bash:

        # generate completion shell
        taxonkit genautocomplete --shell bash

        # configure if never did.
        # install bash-completion if the "complete" command is not found.
        echo "for bcfile in ~/.bash_completion.d/* ; do source \$bcfile; done" >> ~/.bash_completion
        echo "source ~/.bash_completion" >> ~/.bashrc

    Zsh:

        # generate completion shell
        taxonkit genautocomplete --shell zsh --file ~/.zfunc/_taxonkit

        # configure if never did
        echo 'fpath=( ~/.zfunc "${fpath[@]}" )' >> ~/.zshrc
        echo "autoload -U compinit; compinit" >> ~/.zshrc

    fish:

        taxonkit genautocomplete --shell fish --file ~/.config/fish/completions/taxonkit.fish

    Usage:
      taxonkit genautocomplete [flags]

    Flags:
          --file string   autocompletion file (default "/home/shenwei/.bash_completion.d/taxonkit.sh")
      -h, --help          help for genautocomplete
          --type string   autocompletion type (currently only bash supported) (default "bash")
profile2cami
Usage
    Convert metagenomic profile table to CAMI format

    Input format:
      1. The input file should be tab-delimited
      2. At least two columns needed:
         a) TaxId of a taxon.
         b) Abundance (could be percentage, automatically detected or use -p/--percentage).

    Attention:
      0. If some TaxIds are parents of others, please switch on -S/--no-sum-up to disable summing up abundances.
      1. Some TaxIds may be merged to another ones in current taxonomy version, the abundances will be summed up.
      2. Some TaxIds may be deleted in current taxonomy version, the abundances can be optionally recomputed
         with the flag -R/--recompute-abd.

    Usage:
      taxonkit profile2cami [flags]

    Flags:
      -a, --abundance-field int   field index of abundance. input data should be tab-separated (default 2)
      -h, --help                  help for profile2cami
      -0, --keep-zero             keep taxons with abundance of zero
      -S, --no-sum-up             do not sum up abundance from child to parent TaxIds
      -p, --percentage            abundance is in percentage
      -R, --recompute-abd         recompute abundance if some TaxIds are deleted in current taxonomy version
      -s, --sample-id string      sample ID in result file
      -r, --show-rank strings     only show TaxIds and names of these ranks (default [superkingdom,phylum,class,order,family,genus,species,strain])
      -i, --taxid-field int       field index of taxid. input data should be tab-separated (default 1)
      -t, --taxonomy-id string    taxonomy ID in result file
Examples
Test data. Note that 2824115 is merged into 483329, and 1657696 is deleted in the current taxonomy version.

    $ cat example/abundance.tsv
    2824115	0.2	merged to 483329
    483329	0.2	absord 2824115
    239935	0.5	no change
    1657696	0.1	deleted
Example:
    $ taxonkit profile2cami -s sample1 -t 2021-10-01 \
        example/abundance.tsv
    13:17:40.552 [WARN] taxid is deleted in current taxonomy version: 1657696
    13:17:40.552 [WARN] you may recomputed abundance with the flag -R/--recompute-abd
    @SampleID:sample1
    @Version:0.10.0
    @Ranks:superkingdom|phylum|class|order|family|genus|species|strain
    @TaxonomyID:2021-10-01
    @@TAXID	RANK	TAXPATH	TAXPATHSN	PERCENTAGE
    2	superkingdom	2	Bacteria	50.000000000000000
    2759	superkingdom	2759	Eukaryota	40.000000000000000
    74201	phylum	2|74201	Bacteria|Verrucomicrobia	50.000000000000000
    6656	phylum	2759|6656	Eukaryota|Arthropoda	40.000000000000000
    203494	class	2|74201|203494	Bacteria|Verrucomicrobia|Verrucomicrobiae	50.000000000000000
    50557	class	2759|6656|50557	Eukaryota|Arthropoda|Insecta	40.000000000000000
    48461	order	2|74201|203494|48461	Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales	50.000000000000000
    7041	order	2759|6656|50557|7041	Eukaryota|Arthropoda|Insecta|Coleoptera	40.000000000000000
    1647988	family	2|74201|203494|48461|1647988	Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae	50.000000000000000
    57514	family	2759|6656|50557|7041|57514	Eukaryota|Arthropoda|Insecta|Coleoptera|Silphidae	40.000000000000000
    239934	genus	2|74201|203494|48461|1647988|239934	Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae|Akkermansia	50.000000000000000
    57515	genus	2759|6656|50557|7041|57514|57515	Eukaryota|Arthropoda|Insecta|Coleoptera|Silphidae|Nicrophorus	40.000000000000000
    239935	species	2|74201|203494|48461|1647988|239934|239935	Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae|Akkermansia|Akkermansia muciniphila	50.000000000000000
    483329	species	2759|6656|50557|7041|57514|57515|483329	Eukaryota|Arthropoda|Insecta|Coleoptera|Silphidae|Nicrophorus|Nicrophorus carolina	40.000000000000000
Recompute (normalize) the abundance
    $ taxonkit profile2cami -s sample1 -t 2021-10-01 \
        example/abundance.tsv --recompute-abd
    13:19:23.647 [WARN] taxid is deleted in current taxonomy version: 1657696
    @SampleID:sample1
    @Version:0.10.0
    @Ranks:superkingdom|phylum|class|order|family|genus|species|strain
    @TaxonomyID:2021-10-01
    @@TAXID	RANK	TAXPATH	TAXPATHSN	PERCENTAGE
    2	superkingdom	2	Bacteria	55.555555555555557
    2759	superkingdom	2759	Eukaryota	44.444444444444450
    74201	phylum	2|74201	Bacteria|Verrucomicrobia	55.555555555555557
    6656	phylum	2759|6656	Eukaryota|Arthropoda	44.444444444444450
    203494	class	2|74201|203494	Bacteria|Verrucomicrobia|Verrucomicrobiae	55.555555555555557
    50557	class	2759|6656|50557	Eukaryota|Arthropoda|Insecta	44.444444444444450
    48461	order	2|74201|203494|48461	Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales	55.555555555555557
    7041	order	2759|6656|50557|7041	Eukaryota|Arthropoda|Insecta|Coleoptera	44.444444444444450
    1647988	family	2|74201|203494|48461|1647988	Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae	55.555555555555557
    57514	family	2759|6656|50557|7041|57514	Eukaryota|Arthropoda|Insecta|Coleoptera|Silphidae	44.444444444444450
    239934	genus	2|74201|203494|48461|1647988|239934	Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae|Akkermansia	55.555555555555557
    57515	genus	2759|6656|50557|7041|57514|57515	Eukaryota|Arthropoda|Insecta|Coleoptera|Silphidae|Nicrophorus	44.444444444444450
    239935	species	2|74201|203494|48461|1647988|239934|239935	Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae|Akkermansia|Akkermansia muciniphila	55.555555555555557
    483329	species	2759|6656|50557|7041|57514|57515|483329	Eukaryota|Arthropoda|Insecta|Coleoptera|Silphidae|Nicrophorus|Nicrophorus carolina	44.444444444444450
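The percentages above can be reproduced by hand. The sketch below shows the arithmetic under assumptions inferred from the output: merged TaxIds are first replaced by their new TaxId, deleted TaxIds are dropped, and the remaining abundances are rescaled to sum to 100. `recompute_abd` is hypothetical, not TaxonKit's code. The small merged and deleted files are stand-ins for what TaxonKit reads from merged.dmp and delnodes.dmp. Summing up to parent ranks is left out.

```sh
#!/bin/sh
# Hypothetical sketch of the -R/--recompute-abd arithmetic (not TaxonKit's code).
# Assumptions: merged TaxIds are replaced by their new TaxId, deleted TaxIds are
# dropped, and the remaining abundances are rescaled to sum to 100.
# $1: profile (taxid<TAB>abundance), $2: merged pairs (old<TAB>new),
# $3: deleted TaxIds, one per line. Summing up to parent ranks is not done here.
recompute_abd() {
    awk -F'\t' '
        FILENAME == ARGV[1] { merged[$1] = $2; next }
        FILENAME == ARGV[2] { deleted[$1] = 1; next }
        {
            id = ($1 in merged) ? merged[$1] : $1
            if (id in deleted) next
            abd[id] += $2
            total += $2
        }
        END { for (id in abd) printf "%s\t%.2f\n", id, 100 * abd[id] / total }
    ' "$2" "$3" "$1" | sort -k1,1n
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Columns 1-2 of example/abundance.tsv; the merge and deletion are those noted above.
printf '2824115\t0.2\n483329\t0.2\n239935\t0.5\n1657696\t0.1\n' > "$tmp/abundance.tsv"
printf '2824115\t483329\n' > "$tmp/merged.tsv"
printf '1657696\n' > "$tmp/deleted.txt"

recompute_abd "$tmp/abundance.tsv" "$tmp/merged.tsv" "$tmp/deleted.txt"
# 239935  55.56
# 483329  44.44
```

The values 0.5/0.9 and 0.4/0.9 give the 55.56 and 44.44 shown above. Dividing by the original total of 1.0 instead gives the 50 and 40 printed without -R.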
Some abundance tables contain taxa that are parents of other taxa in the same table, e.g.,
    $ cat example/abundance2.tsv
    2	0.99
    1224	0.59
    1236	0.2
    28211	0.4
    1239	0.4
    91061	0.39
    2759	0.01
    9606	0.01
Please switch on -S/--no-sum-up to disable summing up abundances.
    $ taxonkit profile2cami example/abundance2.tsv -S
    @SampleID:
    @Version:0.10.0
    @Ranks:superkingdom|phylum|class|order|family|genus|species|strain
    @TaxonomyID:
    @@TAXID	RANK	TAXPATH	TAXPATHSN	PERCENTAGE
    2	superkingdom	2	Bacteria	99.000000000000000
    2759	superkingdom	2759	Eukaryota	1.000000000000000
    1224	phylum	2|1224	Bacteria|Pseudomonadota	59.000000000000000
    1239	phylum	2|1239	Bacteria|Bacillota	40.000000000000000
    7711	phylum	2759|7711	Eukaryota|Chordata	1.000000000000000
    28211	class	2|1224|28211	Bacteria|Pseudomonadota|Alphaproteobacteria	40.000000000000000
    91061	class	2|1239|91061	Bacteria|Bacillota|Bacilli	39.000000000000000
    1236	class	2|1224|1236	Bacteria|Pseudomonadota|Gammaproteobacteria	20.000000000000000
    40674	class	2759|7711|40674	Eukaryota|Chordata|Mammalia	1.000000000000000
    9443	order	2759|7711|40674|9443	Eukaryota|Chordata|Mammalia|Primates	1.000000000000000
    9604	family	2759|7711|40674|9443|9604	Eukaryota|Chordata|Mammalia|Primates|Hominidae	1.000000000000000
    9605	genus	2759|7711|40674|9443|9604|9605	Eukaryota|Chordata|Mammalia|Primates|Hominidae|Homo	1.000000000000000
    9606	species	2759|7711|40674|9443|9604|9605|9606	Eukaryota|Chordata|Mammalia|Primates|Hominidae|Homo|Homo sapiens	1.000000000000000
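You can check in advance whether a table needs -S: it does if any of its TaxIds is an ancestor of another TaxId in the same table. The following is a minimal sketch with a hypothetical `nested_taxids` function. It reads taxid<TAB>lineage, where the lineage lists TaxIds from the root down to the taxon itself, and prints the TaxIds that are ancestors of other input rows. The demo lineages are the '|'-separated TAXPATH values from the output above. For a new table, the ';'-separated lineage-taxids column from `taxonkit lineage -t` would be used instead.

```sh
#!/bin/sh
# Hypothetical check: print input TaxIds that are ancestors of another input TaxId.
# stdin: taxid<TAB>lineage (TaxIds from the root down to the taxon itself).
# $1: separator used in the lineage column (default ';').
nested_taxids() {
    awk -F'\t' -v sep="${1:-;}" '
        { lineage[$1] = $2 }
        END {
            for (id in lineage) {
                n = split(lineage[id], p, sep)
                for (i = 1; i < n; i++)          # p[n] is the taxon itself
                    if (p[i] in lineage) nested[p[i]] = 1
            }
            for (id in nested) print id
        }
    ' | sort -n
}

# TaxIds of example/abundance2.tsv with their TAXPATH from the output above.
printf '2\t2\n1224\t2|1224\n1236\t2|1224|1236\n28211\t2|1224|28211\n1239\t2|1239\n91061\t2|1239|91061\n2759\t2759\n9606\t2759|7711|40674|9443|9604|9605|9606\n' \
    | nested_taxids '|'
# prints 2, 1224, 1239 and 2759, one per line
```

For a new table, the command would be `cut -f 1 example/abundance2.tsv | taxonkit lineage -t | cut -f 1,3 | nested_taxids`. This assumes that lineage-taxids is the third column of that output. A non-empty result suggests switching on -S.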
Also see https://github.com/shenwei356/sun2021-cami-profiles
cami-filter
Usage
    Remove taxa of given TaxIds and their descendants in CAMI metagenomic profile

    Input format:
      The CAMI (Taxonomic) Profiling Output Format
      - https://github.com/CAMI-challenge/contest_information/blob/master/file_formats/CAMI_TP_specification.mkd
      - One file with multiple samples is also supported.

    How to:
      - No extra taxonomy data needed, so the original taxonomic information are used and not changed.
      - A mini taxonomic tree is built from records with abundance greater than zero,
        and only leaves are retained for later use. The rank of leaves may be "strain", "species", or "no rank".
      - Relative abundances (in percentage) are recomputed for all leaves (reference genome).
      - A new taxonomic tree is built from these leaves, and abundances are cumulatively added up
        from leaves to the root.

    Examples:
      1. Remove Archaea, Bacteria, and Eukaryotes, only keep Viruses:
           taxonkit cami-filter -t 2,2157,2759 test.profile -o test.filter.profile
      2. Remove Viruses:
           taxonkit cami-filter -t 10239 test.profile -o test.filter.profile

    Usage:
      taxonkit cami-filter [flags]

    Flags:
          --field-percentage int   field index of PERCENTAGE (default 5)
          --field-rank int         field index of RANK (default 2)
          --field-taxid int        field index of taxid (default 1)
          --field-taxpath int      field index of TAXPATH (default 3)
          --field-taxpathsn int    field index of TAXPATHSN (default 4)
      -h, --help                   help for cami-filter
          --leaf-ranks strings     only consider leaves at these ranks (default [species,strain,no rank])
          --show-rank strings      only show TaxIds and names of these ranks (default [superkingdom,phylum,class,order,family,genus,species,strain])
          --taxid-sep string       separator of taxid in TAXPATH and TAXPATHSN (default "|")
      -t, --taxids strings         the parent taxid(s) to filter out
      -f, --taxids-file strings    file(s) for the parent taxid(s) to filter out, one taxid per line
Examples:
- Remove Eukaryota
    $ taxonkit profile2cami -s sample1 -t 2021-10-01 \
        example/abundance.tsv --recompute-abd \
        | taxonkit cami-filter -t 2759
    @SampleID:sample1
    @Version:0.10.0
    @Ranks:superkingdom|phylum|class|order|family|genus|species|strain
    @TaxonomyID:2021-10-01
    @@TAXID	RANK	TAXPATH	TAXPATHSN	PERCENTAGE
    2	superkingdom	2	Bacteria	100.000000000000000
    74201	phylum	2|74201	Bacteria|Verrucomicrobia	100.000000000000000
    203494	class	2|74201|203494	Bacteria|Verrucomicrobia|Verrucomicrobiae	100.000000000000000
    48461	order	2|74201|203494|48461	Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales	100.000000000000000
    1647988	family	2|74201|203494|48461|1647988	Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae	100.000000000000000
    239934	genus	2|74201|203494|48461|1647988|239934	Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae|Akkermansia	100.000000000000000
    239935	species	2|74201|203494|48461|1647988|239934|239935	Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae|Akkermansia|Akkermansia muciniphila	100.000000000000000
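The "How to" steps can be illustrated with a much simplified sketch for a single sample. It rests on several assumptions. `cami_filter_sketch` is hypothetical and is not TaxonKit's implementation. It ignores --leaf-ranks and multi-sample files. It treats every TaxId that never appears as an ancestor in any TAXPATH as a leaf. It prints only TaxId and PERCENTAGE. The input is the superkingdom and species rows of the recomputed profile2cami output shown earlier.

```sh
#!/bin/sh
# Simplified, hypothetical sketch of the cami-filter steps for one sample.
# Not TaxonKit's code: --leaf-ranks and multi-sample files are ignored, and
# only TaxId<TAB>PERCENTAGE is printed.
# $1: comma-separated TaxIds to remove together with their descendants.
cami_filter_sketch() {
    awk -F'\t' -v remove="$1" '
        BEGIN { n = split(remove, r, ","); for (i = 1; i <= n; i++) drop[r[i]] = 1 }
        /^@/ || NF < 5 { next }                 # skip CAMI header lines
        $5 + 0 > 0 {                            # only records with abundance > 0
            path[$1] = $3; pct[$1] = $5
            m = split($3, p, "|")
            for (i = 1; i < m; i++) internal[p[i]] = 1   # ancestors are not leaves
        }
        END {
            for (id in path) {
                if (id in internal) continue    # keep leaves only
                m = split(path[id], p, "|")
                keep = 1
                for (i = 1; i <= m; i++) if (p[i] in drop) keep = 0
                if (keep) { leaf[id] = pct[id]; total += pct[id] }
            }
            if (total == 0) exit
            for (id in leaf) {                  # renormalize, then add up to the root
                m = split(path[id], p, "|")
                for (i = 1; i <= m; i++) if (p[i] != "") sum[p[i]] += 100 * leaf[id] / total
            }
            for (id in sum) printf "%s\t%.2f\n", id, sum[id]
        }
    ' | sort -k1,1n
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
{
    printf '@@TAXID\tRANK\tTAXPATH\tTAXPATHSN\tPERCENTAGE\n'
    printf '2\tsuperkingdom\t2\tBacteria\t55.555555555555557\n'
    printf '2759\tsuperkingdom\t2759\tEukaryota\t44.444444444444450\n'
    printf '239935\tspecies\t2|74201|203494|48461|1647988|239934|239935\tBacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae|Akkermansia|Akkermansia muciniphila\t55.555555555555557\n'
    printf '483329\tspecies\t2759|6656|50557|7041|57514|57515|483329\tEukaryota|Arthropoda|Insecta|Coleoptera|Silphidae|Nicrophorus|Nicrophorus carolina\t44.444444444444450\n'
} > "$tmp/sample1.profile"

cami_filter_sketch 2759 < "$tmp/sample1.profile"
```

Removing 2759 leaves Akkermansia muciniphila as the only leaf. Every node on its path therefore gets 100, which matches the cami-filter output above.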