Usage and Examples

Before use

  1. Download and uncompress taxdump.tar.gz: ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
  2. Copy names.dmp, nodes.dmp, delnodes.dmp and merged.dmp to the data directory $HOME/.taxonkit, e.g., /home/shenwei/.taxonkit.
  3. Optionally copy them to some other directory, which you can later refer to with the flag --data-dir or the environment variable TAXONKIT_DB.

All-in-one command:

wget -c ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
tar -zxvf taxdump.tar.gz

mkdir -p $HOME/.taxonkit
cp names.dmp nodes.dmp delnodes.dmp merged.dmp $HOME/.taxonkit

Update dataset: simply re-download the taxdump files, uncompress them, and overwrite the old ones.
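Before running any command, it can help to confirm that the data directory actually holds the four files. The following is a minimal sketch, not part of TaxonKit: the function names taxonkit_data_dir and check_taxdump are hypothetical, and the lookup order (an explicit directory first, then TAXONKIT_DB, then $HOME/.taxonkit) is an assumption based on TaxonKit's help message.

```shell
#!/bin/sh
# Hypothetical helpers, not part of TaxonKit.

# Print the data directory that would be used. An explicit argument
# (standing in for --data-dir) wins over $TAXONKIT_DB, which wins over
# the default $HOME/.taxonkit.
taxonkit_data_dir() {
    if [ -n "${1:-}" ]; then
        printf '%s\n' "$1"
    elif [ -n "${TAXONKIT_DB:-}" ]; then
        printf '%s\n' "$TAXONKIT_DB"
    else
        printf '%s\n' "$HOME/.taxonkit"
    fi
}

# Report which of the four taxdump files are missing from a directory.
# Returns 0 if all are present, 1 otherwise.
check_taxdump() {
    dir=$1
    missing=0
    for f in names.dmp nodes.dmp delnodes.dmp merged.dmp; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check the directory that would be read by default.
if check_taxdump "$(taxonkit_data_dir)"; then
    echo "taxdump files found in $(taxonkit_data_dir)"
else
    echo "taxdump files incomplete in $(taxonkit_data_dir)"
fi
```

Passing a path as the first argument, e.g. check_taxdump "$(taxonkit_data_dir /data/taxdump)", checks a custom location instead.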

taxonkit

TaxonKit - A Practical and Efficient NCBI Taxonomy Toolkit

Version: 0.20.0

Author: Wei Shen <shenwei356@gmail.com>

Source code: https://github.com/shenwei356/taxonkit
Documents  : https://bioinf.shenwei.me/taxonkit
Citation   : https://www.sciencedirect.com/science/article/pii/S1673852721000837

Dataset:

    Please download and uncompress "taxdump.tar.gz":
    http://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz

    and copy "names.dmp", "nodes.dmp", "delnodes.dmp" and "merged.dmp" to data directory:
    "/home/shenwei/.taxonkit"

    or some other directory, and later you can refer to using flag --data-dir,
    or environment variable TAXONKIT_DB.

    When environment variable TAXONKIT_DB is set, explicitly setting --data-dir will
    override the value of TAXONKIT_DB.

Usage:
  taxonkit [command]

Available Commands:
  cami-filter     Remove taxa of given TaxIds and their descendants in CAMI metagenomic profile
  create-taxdump  Create NCBI-style taxdump files for custom taxonomy, e.g., GTDB and ICTV
  filter          Filter TaxIds by taxonomic rank range
  genautocomplete generate shell autocompletion script (bash|zsh|fish|powershell)
  lca             Compute lowest common ancestor (LCA) for TaxIds
  lineage         Query taxonomic lineage of given TaxIds
  list            List taxonomic subtrees of given TaxIds
  name2taxid      Convert taxon names to TaxIds
  profile2cami    Convert metagenomic profile table to CAMI format
  reformat        Reformat lineage in canonical ranks
  reformat2       Reformat lineage in chosen ranks, allowing more ranks than 'reformat'
  taxid-changelog Create TaxId changelog from dump archives
  version         print version information and check for update

Flags:
      --data-dir string   directory containing nodes.dmp and names.dmp (default "/home/shenwei/.taxonkit")
  -h, --help              help for taxonkit
      --line-buffered     use line buffering on output, i.e., immediately writing to stdout/file for
                          every line of output
  -o, --out-file string   out file ("-" for stdout, suffix .gz for gzipped out) (default "-")
  -j, --threads int       number of CPUs. 4 is enough (default 4)
      --verbose           print verbose information

list

Usage

List taxonomic subtrees of given TaxIds

Attention:
  1. When multiple taxids are given, the output may contain duplicated records
     if some taxids are descendants of others.

Examples:

    $ taxonkit list --ids 9606 -n -r --indent "    "
    9606 [species] Homo sapiens
        63221 [subspecies] Homo sapiens neanderthalensis
        741158 [subspecies] Homo sapiens subsp. 'Denisova'

    $ taxonkit list --ids 9606 --indent ""
    9606
    63221
    741158

    # from stdin
    echo 9606 | taxonkit list

    # from file
    taxonkit list <(echo 9606)

Usage:
  taxonkit list [flags]

Flags:
  -h, --help            help for list
  -i, --ids string      TaxId(s), multiple values should be separated by comma
  -I, --indent string   indent (default "  ")
  -J, --json            output in JSON format. you can save the result in file with suffix ".json" and
                        open with modern text editor
  -n, --show-name       output scientific name
  -r, --show-rank       output rank
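The duplicated records mentioned in the attention note can be removed with a small post-processing step. This is a minimal sketch, assuming flat output produced with --indent "" (one TaxId per line). The helper name dedup_taxids is hypothetical, and the sample input is hand-written to imitate listing both 9605 and its descendant 9606; it is not captured from a real run.

```shell
# Keep the first occurrence of each TaxId and drop blank lines,
# preserving the original order.
dedup_taxids() {
    awk 'NF && !seen[$1]++ { print $1 }'
}

# Hand-written sample: 9606 and its subspecies appear twice because
# 9606 is a descendant of 9605.
sample='9605
9606
63221
741158
1425170

9606
63221
741158'

printf '%s\n' "$sample" | dedup_taxids
```

With TaxonKit installed, the same filter could be appended to a real call, e.g. taxonkit list --ids 9605,9606 --indent "" | dedup_taxids.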

Examples

  1. Default usage.

    $ taxonkit list --ids 9605,239934
    9605
      9606
        63221
        741158
      1425170
      2665952
        2665953

    239934
      239935
        349741
      512293
        512294
        1131822
        1262691
        1263034
      1679444
      2608915
        1131336
    ...
  2. Removing indent. The list could be used to extract sequences from a BLAST database with blastdbcmd (see tutorial).

    $ taxonkit list --ids 9605,239934 --indent ""
    9605
    9606
    63221
    741158
    1425170
    2665952
    2665953
    239934
    239935
    349741
    512293
    512294
    1131822
    1262691
    1263034
    1679444
    ...

    Performance: Time and memory usage for whole taxon tree:

    $ # emptying the buffers cache
    $ su -c "free && sync && echo 3 > /proc/sys/vm/drop_caches && free"

    $ memusg -t taxonkit list --ids 1 --indent "" --verbose > t0.txt
    21:05:01.782 [INFO] parsing merged file: /home/shenwei/.taxonkit/names.dmp
    21:05:01.782 [INFO] parsing names file: /home/shenwei/.taxonkit/names.dmp
    21:05:01.782 [INFO] parsing delnodes file: /home/shenwei/.taxonkit/names.dmp
    21:05:01.816 [INFO] 61023 merged nodes parsed
    21:05:01.889 [INFO] 437929 delnodes parsed
    21:05:03.178 [INFO] 2303979 names parsed

    elapsed time: 3.290s
    peak rss: 742.77 MB
  3. Adding names

    $ taxonkit list --show-rank --show-name --indent "    " --ids 9605,239934
    9605 [genus] Homo
        9606 [species] Homo sapiens
            63221 [subspecies] Homo sapiens neanderthalensis
            741158 [subspecies] Homo sapiens subsp. 'Denisova'
        1425170 [species] Homo heidelbergensis
        2665952 [no rank] environmental samples
            2665953 [species] Homo sapiens environmental sample

    239934 [genus] Akkermansia
        239935 [species] Akkermansia muciniphila
            349741 [strain] Akkermansia muciniphila ATCC BAA-835
        512293 [no rank] environmental samples
            512294 [species] uncultured Akkermansia sp.
            1131822 [species] uncultured Akkermansia sp. SMG25
            1262691 [species] Akkermansia sp. CAG:344
            1263034 [species] Akkermansia muciniphila CAG:154
        1679444 [species] Akkermansia glycaniphila
        2608915 [no rank] unclassified Akkermansia
            1131336 [species] Akkermansia sp. KLE1605
            1574264 [species] Akkermansia sp. KLE1797
    ...

    Performance: Time and memory usage for whole taxonomy tree:

    $ # emptying the buffers cache
    $ su -c "free && sync && echo 3 > /proc/sys/vm/drop_caches && free"

    $ memusg -t taxonkit list --show-rank --show-name --ids 1 > t1.txt

    elapsed time: 5.341s
    peak rss: 1.04 GB
  4. Output in JSON format. You can easily collapse and expand the taxonomy tree in a modern text editor.

    $ taxonkit list --show-rank --show-name --indent "    " --ids 9605,239934 --json
    {
        "9605 [genus] Homo": {
            "9606 [species] Homo sapiens": {
                "63221 [subspecies] Homo sapiens neanderthalensis": {},
                "741158 [subspecies] Homo sapiens subsp. 'Denisova'": {}
            },
            "1425170 [species] Homo heidelbergensis": {}
        },
        "239934 [genus] Akkermansia": {
            "239935 [species] Akkermansia muciniphila": {
                "349741 [no rank] Akkermansia muciniphila ATCC BAA-835": {}
            },
            "512293 [no rank] environmental samples": {
                "512294 [species] uncultured Akkermansia sp.": {},
                "1131822 [species] uncultured Akkermansia sp. SMG25": {},
                "1262691 [species] Akkermansia sp. CAG:344": {},
                "1263034 [species] Akkermansia muciniphila CAG:154": {}
            },
            "1679444 [species] Akkermansia glycaniphila": {},
            "2608915 [no rank] unclassified Akkermansia": {
                "1131336 [species] Akkermansia sp. KLE1605": {},
                "1574264 [species] Akkermansia sp. KLE1797": {},
                "1574265 [species] Akkermansia sp. KLE1798": {},
                "1638783 [species] Akkermansia sp. UNK.MGS-1": {},
                "1755639 [species] Akkermansia sp. MC_55": {}
            }
        }
    }

    Snapshot of taxonomy (taxid 1) in kate: taxon.json.png
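The indented output of the list command encodes parent-child relationships through indentation depth. The sketch below recovers them as "child<TAB>parent" pairs. It is an illustration under stated assumptions: the helper name tree_to_edges is hypothetical, the indent is assumed to consist of spaces (two per level by default, matching the default of --indent), and the sample tree is the Homo subtree whose hierarchy appears in the JSON example above.

```shell
# Convert indented `taxonkit list` output into "child<TAB>parent" pairs.
# $1 is the number of spaces per indent level (default 2).
# Root nodes get "-" as their parent.
tree_to_edges() {
    awk -v width="${1:-2}" '
        NF == 0 { next }                      # skip blank lines between subtrees
        {
            spaces = 0                        # count leading spaces
            while (substr($0, spaces + 1, 1) == " ") spaces++
            depth = spaces / width
            stack[depth] = $1                 # latest node seen at this depth
            parent = (depth > 0) ? stack[depth - 1] : "-"
            print $1 "\t" parent
        }'
}

# Sample: the Homo subtree with the default two-space indent.
printf '%s\n' '9605' '  9606' '    63221' '    741158' '  1425170' | tree_to_edges
```

Because only the first field of each line is used, the same function also accepts output produced with --show-name and --show-rank, as long as the indent width passed as the first argument matches the one given to --indent.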

lineage

Usage

Query taxonomic lineage of given TaxIds

Input:

  - List of TaxIds, one TaxId per line.
  - Or tab-delimited format, please specify TaxId field
    with flag -i/--taxid-field (default 1).
  - Supporting (gzipped) file or STDIN.

Output:

  1. Input line data.
  2. (Optional) Status code (-c/--show-status-code), values:
     - "-1" for queries not found in whole database.
     - "0" for deleted TaxIds, provided by "delnodes.dmp".
     - New TaxIds for merged TaxIds, provided by "merged.dmp".
     - Taxids for those found in "nodes.dmp".
  3. Lineage, delimiter can be changed with flag -d/--delimiter.
  4. (Optional) TaxIds of taxa in the lineage (-t/--show-lineage-taxids)
  5. (Optional) Name (-n/--show-name)
  6. (Optional) Rank (-r/--show-rank)

Filter out invalid and deleted taxids, and replace merged taxids with new ones:

    # input is one-column-taxid
    $ taxonkit lineage -c taxids.txt \
        | awk '$2>0' \
        | cut -f 2-

    # taxids are in 3rd field in a 4-columns tab-delimited file,
    # for $5, where 5 = 4 + 1.
    $ cat input.txt \
        | taxonkit lineage -c -i 3 \
        | csvtk filter2 -H -t -f '$5>0' \
        | csvtk -H -t cut -f -3

Usage:
  taxonkit lineage [flags]

Flags:
  -d, --delimiter string      field delimiter in lineage (default ";")
  -h, --help                  help for lineage
  -L, --no-lineage            do not show lineage, when user just want names or/and ranks
  -R, --show-lineage-ranks    appending ranks of all levels
  -t, --show-lineage-taxids   appending lineage consisting of taxids
  -n, --show-name             appending scientific name
  -r, --show-rank             appending rank of taxids
  -c, --show-status-code      show status code before lineage
  -i, --taxid-field int       field index of taxid. input data should be tab-separated (default 1)
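The four kinds of status code described in the help message can be told apart with a few comparisons. The following is a minimal sketch: the helper name classify_status and the labels it prints are hypothetical, and the sample input is hand-written (query TaxId and status code only, lineage column omitted) using TaxIds from the examples in this section.

```shell
# Label each query by its status code, which is column 2 of the
# tab-separated output of `taxonkit lineage -c`:
#   negative       -> not found in the database
#   zero           -> deleted TaxId
#   other TaxId    -> merged into a new TaxId
#   same TaxId     -> valid
classify_status() {
    awk -F '\t' '{
        if ($2 < 0)        label = "not_found"
        else if ($2 == 0)  label = "deleted"
        else if ($2 != $1) label = "merged"
        else               label = "valid"
        print $1 "\t" $2 "\t" label
    }'
}

# Hand-written sample: query TaxId, status code.
printf '9606\t9606\n92489\t796334\n3\t0\n123124124\t-1\n' | classify_status
```

With TaxonKit installed, the function could be fed directly, e.g. taxonkit lineage -c taxids.txt | classify_status.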

Examples

  1. Full lineage:

    # note that 123124124 is a fake taxid, 3 was deleted, 92489, 1458427 were merged
    $ cat taxids.txt
    9606
    9913
    376619
    349741
    239935
    314101
    11932
    1327037
    123124124
    3
    92489
    1458427

    $ taxonkit lineage taxids.txt | tee lineage.txt
    19:22:13.077 [WARN] taxid 92489 was merged into 796334
    19:22:13.077 [WARN] taxid 1458427 was merged into 1458425
    19:22:13.077 [WARN] taxid 123124124 not found
    19:22:13.077 [WARN] taxid 3 was deleted
    9606    cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Homo;Homo sapiens
    9913    cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Artiodactyla;Ruminantia;Pecora;Bovidae;Bovinae;Bos;Bos taurus
    376619  cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Thiotrichales;Francisellaceae;Francisella;Francisella tularensis;Francisella tularensis subsp. holarctica;Francisella tularensis subsp. holarctica LVS
    349741  cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835
    239935  cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
    314101  cellular organisms;Bacteria;environmental samples;uncultured murine large bowel bacterium BAC 54B
    11932   Viruses;Riboviria;Pararnavirae;Artverviricota;Revtraviricetes;Ortervirales;Retroviridae;unclassified Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle
    1327037 Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;Caudovirales;Siphoviridae;unclassified Siphoviridae;Croceibacter phage P2559Y
    123124124
    3
    92489   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Erwiniaceae;Erwinia;Erwinia oleae
    1458427 cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae;Serpentinomonas;Serpentinomonas raichei

    # wrapped table with csvtk pretty (>v0.26.0)
    $ taxonkit lineage taxids.txt | csvtk pretty -Ht -x ';' -W 70 -S bold
    [Table output: the lineages listed above, drawn as a bold-bordered two-column table (TaxId, lineage) with the lineage column wrapped at ';' to a maximum width of 70 characters.]
  2. Speed.

    $ time echo 9606 | taxonkit lineage
    9606    cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Homo;Homo sapiens

    real    0m1.190s
    user    0m2.365s
    sys     0m0.170s

    # all TaxIds
    $ time taxonkit list --ids 1 --indent "" | taxonkit lineage > t

    real    0m4.249s
    user    0m16.418s
    sys     0m1.221s
  3. Checking deleted or merged taxids

    $ taxonkit lineage --show-status-code taxids.txt | tee lineage.withcode.txt

    # valid
    $ cat lineage.withcode.txt | awk '$2 > 0' | cut -f 1,2
    9606      9606
    9913      9913
    376619    376619
    349741    349741
    239935    239935
    314101    314101
    11932     11932
    1327037   1327037
    92489     796334
    1458427   1458425

    # merged
    $ cat lineage.withcode.txt | awk '$2 > 0 && $2 != $1' | cut -f 1,2
    92489     796334
    1458427   1458425

    # deleted
    $ cat lineage.withcode.txt | awk '$2 == 0' | cut -f 1
    3

    # invalid
    $ cat lineage.withcode.txt | awk '$2 < 0' | cut -f 1
    123124124
  4. Filter out invalid and deleted taxids, and replace merged taxids with new ones. You may need to install csvtk.

    # input is one-column-taxid
    $ taxonkit lineage -c taxids.txt \
        | awk '$2>0' \
        | cut -f 2-

    # taxids are in 3rd field in a 4-columns tab-delimited file,
    # for $5, where 5 = 4 + 1.
    $ cat input.txt \
        | taxonkit lineage -c -i 3 \
        | csvtk filter2 -H -t -f '$5>0' \
        | csvtk -H -t cut -f -3
  5. Only show name and rank.

    $ taxonkit lineage -r -n -L taxids.txt \
        | csvtk pretty -H -t
    9606        Homo sapiens                                      species
    9913        Bos taurus                                        species
    376619      Francisella tularensis subsp. holarctica LVS      strain
    349741      Akkermansia muciniphila ATCC BAA-835              strain
    239935      Akkermansia muciniphila                           species
    314101      uncultured murine large bowel bacterium BAC 54B   species
    11932       Mouse Intracisternal A-particle                   species
    1327037     Croceibacter phage P2559Y                         species
    123124124
    3
    92489       Erwinia oleae                                     species
    1458427     Serpentinomonas raichei                           species
  6. Show lineage consisting of taxids:

    $ taxonkit lineage -t taxids.txt
    9606    cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Homo;Homo sapiens    131567;2759;33154;33208;6072;33213;33511;7711;89593;7742;7776;117570;117571;8287;1338369;32523;32524;40674;32525;9347;1437010;314146;9443;376913;314293;9526;314295;9604;207598;9605;9606
    9913    cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Artiodactyla;Ruminantia;Pecora;Bovidae;Bovinae;Bos;Bos taurus    131567;2759;33154;33208;6072;33213;33511;7711;89593;7742;7776;117570;117571;8287;1338369;32523;32524;40674;32525;9347;1437010;314145;91561;9845;35500;9895;27592;9903;9913
    376619  cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Thiotrichales;Francisellaceae;Francisella;Francisella tularensis;Francisella tularensis subsp. holarctica;Francisella tularensis subsp. holarctica LVS    131567;2;1224;1236;72273;34064;262;263;119857;376619
    349741  cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835    131567;2;1783257;74201;203494;48461;1647988;239934;239935;349741
    239935  cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila    131567;2;1783257;74201;203494;48461;1647988;239934;239935
    314101  cellular organisms;Bacteria;environmental samples;uncultured murine large bowel bacterium BAC 54B    131567;2;48479;314101
    11932   Viruses;Riboviria;Pararnavirae;Artverviricota;Revtraviricetes;Ortervirales;Retroviridae;unclassified Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle    10239;2559587;2732397;2732409;2732514;2169561;11632;35276;11749;11932
    1327037 Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;Caudovirales;Siphoviridae;unclassified Siphoviridae;Croceibacter phage P2559Y    10239;2731341;2731360;2731618;2731619;28883;10699;196894;1327037
    123124124
    3
    92489   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Erwiniaceae;Erwinia;Erwinia oleae    131567;2;1224;1236;91347;1903409;551;796334
    1458427 cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae;Serpentinomonas;Serpentinomonas raichei    131567;2;1224;28216;80840;80864;2490452;1458425

    or read taxids from STDIN:

    $ cat taxids.txt | taxonkit lineage
  7. And ranks of all nodes:

    $ echo 2697049 \
        | taxonkit lineage -t -R \
        | csvtk transpose -Ht
    2697049
    Viruses;Riboviria;Orthornavirae;Pisuviricota;Pisoniviricetes;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2
    10239;2559587;2732396;2732408;2732506;76804;2499399;11118;2501931;694002;2509511;694009;2697049
    superkingdom;clade;kingdom;phylum;class;order;suborder;family;subfamily;genus;subgenus;species;no rank

    Another way to show lineage detail of a TaxId

    $ echo 2697049 \
        | taxonkit lineage -t \
        | csvtk cut -Ht -f 3 \
        | csvtk unfold -Ht -f 1 -s ";" \
        | taxonkit lineage -r -n -L \
        | csvtk cut -Ht -f 1,3,2 \
        | csvtk pretty -H -t
    10239     superkingdom   Viruses
    2559587   clade          Riboviria
    2732396   kingdom        Orthornavirae
    2732408   phylum         Pisuviricota
    2732506   class          Pisoniviricetes
    76804     order          Nidovirales
    2499399   suborder       Cornidovirineae
    11118     family         Coronaviridae
    2501931   subfamily      Orthocoronavirinae
    694002    genus          Betacoronavirus
    2509511   subgenus       Sarbecovirus
    694009    species        Severe acute respiratory syndrome-related coronavirus
    2697049   no rank        Severe acute respiratory syndrome coronavirus 2
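The per-node view in the last example can also be derived from a single line of lineage -t -R output without calling TaxonKit a second time. This is a minimal sketch under stated assumptions: the helper name lineage_nodes is hypothetical, and the input is assumed to have four tab-separated columns (TaxId, names, TaxIds, ranks) with ";"-delimited lineages in the same order. The sample line reuses the values shown above for TaxId 2697049.

```shell
# Print one "taxid<TAB>rank<TAB>name" row per node of the lineage.
lineage_nodes() {
    awk -F '\t' '{
        n = split($2, names, ";")
        split($3, ids, ";")
        split($4, ranks, ";")
        for (i = 1; i <= n; i++)
            print ids[i] "\t" ranks[i] "\t" names[i]
    }'
}

# Sample values copied from the example output for TaxId 2697049.
names='Viruses;Riboviria;Orthornavirae;Pisuviricota;Pisoniviricetes;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2'
taxids='10239;2559587;2732396;2732408;2732506;76804;2499399;11118;2501931;694002;2509511;694009;2697049'
ranks='superkingdom;clade;kingdom;phylum;class;order;suborder;family;subfamily;genus;subgenus;species;no rank'

printf '%s\t%s\t%s\t%s\n' 2697049 "$names" "$taxids" "$ranks" | lineage_nodes
```

The sketch relies on the three lineage columns having the same number of entries; it does not handle a custom --delimiter.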

reformat

Usage

Reformat lineage in canonical ranks

Warning:

  - 'taxonkit reformat2' is recommended since March 2025 when NCBI made
    big changes to ranks.
    See more: https://ncbiinsights.ncbi.nlm.nih.gov/2025/02/27/new-ranks-ncbi-taxonomy/

Input:

  - List of TaxIds or lineages, one record per line.
    The lineage can be a complete lineage or only one taxonomy name.
  - Or tab-delimited format.
    Please specify the lineage field with flag -i/--lineage-field (default 2).
    Or specify the TaxId field with flag -I/--taxid-field (default 0),
    which overrides -i/--lineage-field.
  - Supporting (gzipped) file or STDIN.

Output:

  1. Input line data.
  2. Reformatted lineage.
  3. (Optional) TaxIds of taxa in the lineage (-t/--show-lineage-taxids)

Ambiguous names:

  - Some TaxIds have the same complete lineage, empty result is returned
    by default. You can use the flag -a/--output-ambiguous-result to
    return one possible result

Output format can be formatted by flag --format, available placeholders:

    {C}: cellular root
    {a}: acellular root
    {r}: realm
    {d}: domain
    {k}: superkingdom
    {K}: kingdom
    {p}: phylum
    {c}: class
    {o}: order
    {f}: family
    {g}: genus
    {s}: species
    {t}: subspecies/strain
    {S}: subspecies
    {T}: strain

When there are no nodes of rank "subspecies" nor "strain",
you can switch on -S/--pseudo-strain to use the node with lowest rank
as subspecies/strain name, if which rank is lower than "species".
This flag affects {t}, {S}, {T}.

Output format can contain some escape characters like "\t".

Usage:
  taxonkit reformat [flags]

Flags:
  -P, --add-prefix                     add prefixes for all ranks, single prefix for a rank is defined
                                       by flag --prefix-X
  -d, --delimiter string               field delimiter in input lineage (default ";")
  -F, --fill-miss-rank                 fill missing rank with lineage information of the next higher rank
  -f, --format string                  output format, placeholders of rank are needed (default
                                       "{k};{p};{c};{o};{f};{g};{s}")
  -h, --help                           help for reformat
  -i, --lineage-field int              field index of lineage. data should be tab-separated (default 2)
  -r, --miss-rank-repl string          replacement string for missing rank
  -p, --miss-rank-repl-prefix string   prefix for estimated taxon names (default "unclassified ")
  -s, --miss-rank-repl-suffix string   suffix for estimated taxon names. "rank" for rank name, "" for no
                                       suffix (default "rank")
  -R, --miss-taxid-repl string         replacement string for missing taxid
  -a, --output-ambiguous-result        output one of the ambiguous results
      --prefix-C string                prefix for cellular root, used along with flag -P/--add-prefix
                                       (default "d__")
      --prefix-K string                prefix for kingdom, used along with flag -P/--add-prefix (default
                                       "K__")
      --prefix-S string                prefix for subspecies, used along with flag -P/--add-prefix
                                       (default "S__")
      --prefix-T string                prefix for strain, used along with flag -P/--add-prefix (default
                                       "T__")
      --prefix-a string                prefix for acellular root, used along with flag -P/--add-prefix
                                       (default "d__")
      --prefix-c string                prefix for class, used along with flag -P/--add-prefix (default "c__")
      --prefix-d string                prefix for domain, used along with flag -P/--add-prefix (default
                                       "d__")
      --prefix-f string                prefix for family, used along with flag -P/--add-prefix (default
                                       "f__")
      --prefix-g string                prefix for genus, used along with flag -P/--add-prefix (default "g__")
      --prefix-k string                prefix for superkingdom, used along with flag -P/--add-prefix
                                       (default "k__")
      --prefix-o string                prefix for order, used along with flag -P/--add-prefix (default "o__")
      --prefix-p string                prefix for phylum, used along with flag -P/--add-prefix (default
                                       "p__")
      --prefix-r string                prefix for realm, used along with flag -P/--add-prefix (default "r__")
      --prefix-s string                prefix for species, used along with flag -P/--add-prefix (default
                                       "s__")
      --prefix-t string                prefix for subspecies/strain, used along with flag
                                       -P/--add-prefix (default "t__")
  -S, --pseudo-strain                  use the node with lowest rank as strain name, only if which rank
                                       is lower than "species" and not "subspecies" nor "strain". It
                                       affects {t}, {S}, {T}. This flag needs flag -F
  -t, --show-lineage-taxids            show corresponding taxids of reformatted lineage
  -I, --taxid-field int                field index of taxid. input data should be tab-separated. it
                                       overrides -i/--lineage-field
  -T, --trim                           do not fill or add prefix for missing rank lower than current rank
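Conceptually, the default format "{k};{p};{c};{o};{f};{g};{s}" keeps only the nodes whose ranks are superkingdom, phylum, class, order, family, genus and species, and leaves a rank empty when the lineage has no node for it. The sketch below imitates that selection for illustration only; it is not TaxonKit's implementation. The helper name pick_canonical_ranks is hypothetical, and its input (names and ranks as two tab-separated, ";"-delimited columns) is an assumed layout. The sample reuses the names and ranks shown for TaxId 2697049 in the lineage examples.

```shell
# For each input line (names<TAB>ranks), print the names of the seven
# canonical ranks joined by ";". Ranks absent from the lineage stay empty.
pick_canonical_ranks() {
    awk -F '\t' '{
        n = split($1, names, ";")
        split($2, ranks, ";")
        for (i = 1; i <= n; i++) by_rank[ranks[i]] = names[i]
        m = split("superkingdom;phylum;class;order;family;genus;species", want, ";")
        out = ""
        for (j = 1; j <= m; j++)
            out = out (j > 1 ? ";" : "") by_rank[want[j]]
        print out
        split("", by_rank)            # reset the lookup table for the next line
    }'
}

# Sample values copied from the lineage example for TaxId 2697049.
names='Viruses;Riboviria;Orthornavirae;Pisuviricota;Pisoniviricetes;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2'
ranks='superkingdom;clade;kingdom;phylum;class;order;suborder;family;subfamily;genus;subgenus;species;no rank'

printf '%s\t%s\n' "$names" "$ranks" | pick_canonical_ranks
```

The real command works from a lineage or a TaxId and looks the ranks up in the taxdump files, so it needs no rank column; the extra column here only keeps the sketch self-contained.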

Examples:

  1. For versions > 0.8.0, reformat accepts TaxIds as input via the flag -I/--taxid-field.

    $ echo 239935 | taxonkit reformat -I 1
    239935  Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila

    $ echo 349741 | taxonkit reformat -I 1 -f "{k}|{p}|{c}|{o}|{f}|{g}|{s}|{t}" -F -t
    349741  Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae|Akkermansia|Akkermansia muciniphila|Akkermansia muciniphila ATCC BAA-835    2|74201|203494|48461|1647988|239934|239935|349741
  2. Example lineage (produced by: taxonkit lineage taxids.txt | awk '$2!=""' > lineage.txt).

    $ cat lineage.txt
    9606    cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Homo;Homo sapiens
    9913    cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Artiodactyla;Ruminantia;Pecora;Bovidae;Bovinae;Bos;Bos taurus
    376619  cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Thiotrichales;Francisellaceae;Francisella;Francisella tularensis;Francisella tularensis subsp. holarctica;Francisella tularensis subsp. holarctica LVS
    349741  cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835
    239935  cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
    314101  cellular organisms;Bacteria;environmental samples;uncultured murine large bowel bacterium BAC 54B
    11932   Viruses;Riboviria;Pararnavirae;Artverviricota;Revtraviricetes;Ortervirales;Retroviridae;unclassified Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle
    1327037 Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;Caudovirales;Siphoviridae;unclassified Siphoviridae;Croceibacter phage P2559Y
    92489   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Erwiniaceae;Erwinia;Erwinia oleae
    1458427 cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae;Serpentinomonas;Serpentinomonas raichei
  3. Default output format ("{k};{p};{c};{o};{f};{g};{s}").

    # reformatted lineages are appended to the input data
    $ taxonkit reformat lineage.txt
    ...
    239935  cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila    Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
    ...

    $ taxonkit reformat lineage.txt | tee lineage.txt.reformat

    $ cut -f 1,3 lineage.txt.reformat
    9606    Eukaryota;Chordata;Mammalia;Primates;Hominidae;Homo;Homo sapiens
    9913    Eukaryota;Chordata;Mammalia;Artiodactyla;Bovidae;Bos;Bos taurus
    376619  Bacteria;Proteobacteria;Gammaproteobacteria;Thiotrichales;Francisellaceae;Francisella;Francisella tularensis
    349741  Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
    239935  Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
    314101  Bacteria;;;;;;uncultured murine large bowel bacterium BAC 54B
    11932   Viruses;Artverviricota;Revtraviricetes;Ortervirales;Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle
    1327037 Viruses;Uroviricota;Caudoviricetes;Caudovirales;Siphoviridae;;Croceibacter phage P2559Y
    92489   Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Erwiniaceae;Erwinia;Erwinia oleae
    1458427 Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae;Serpentinomonas;Serpentinomonas raichei

    # aligned
    $ cat lineage.txt \
        | taxonkit reformat \
        | csvtk -H -t cut -f 1,3 \
        | csvtk -H -t sep -f 2 -s ';' -R \
        | csvtk add-header -t -n taxid,kingdom,phylum,class,order,family,genus,species \
        | csvtk pretty -t
    taxid     kingdom     phylum            class                 order                family            genus                        species
    -------   ---------   ---------------   -------------------   ------------------   ---------------   --------------------------   -----------------------------------------------
    9606      Eukaryota   Chordata          Mammalia              Primates             Hominidae         Homo                         Homo sapiens
    9913      Eukaryota   Chordata          Mammalia              Artiodactyla         Bovidae           Bos                          Bos taurus
    376619    Bacteria    Proteobacteria    Gammaproteobacteria   Thiotrichales        Francisellaceae   Francisella                  Francisella tularensis
    349741    Bacteria    Verrucomicrobia   Verrucomicrobiae      Verrucomicrobiales   Akkermansiaceae   Akkermansia                  Akkermansia muciniphila
    239935    Bacteria    Verrucomicrobia   Verrucomicrobiae      Verrucomicrobiales   Akkermansiaceae   Akkermansia                  Akkermansia muciniphila
    314101    Bacteria                                                                                                                uncultured murine large bowel bacterium BAC 54B
    11932     Viruses     Artverviricota    Revtraviricetes       Ortervirales         Retroviridae      Intracisternal A-particles   Mouse Intracisternal A-particle
    1327037   Viruses     Uroviricota       Caudoviricetes        Caudovirales         Siphoviridae                                   Croceibacter phage P2559Y
    92489     Bacteria    Proteobacteria    Gammaproteobacteria   Enterobacterales     Erwiniaceae       Erwinia                      Erwinia oleae
    1458427   Bacteria    Proteobacteria    Betaproteobacteria    Burkholderiales      Comamonadaceae    Serpentinomonas              Serpentinomonas raichei
  4. Subspecies/strain ({t}), subspecies ({S}), and strain ({T}) are also available.

    # default operation
    $ echo -ne "239935\n83333\n1408252\n2697049\n2605619\n" \
        | taxonkit lineage -n -r \
        | taxonkit reformat -f '{t};{S};{T}' \
        | csvtk -H -t cut -f 1,4,3,5 \
        | csvtk -H -t sep -f 4 -s ';' -R \
        | csvtk -H -t add-header -n "taxid,rank,name,subspecies/strain,subspecies,strain" \
        | csvtk pretty -t
    taxid     rank         name                                              subspecies/strain       subspecies              strain
    -------   ----------   -----------------------------------------------   ---------------------   ---------------------   ---------------------
    239935    species      Akkermansia muciniphila
    83333     strain       Escherichia coli K-12                             Escherichia coli K-12                           Escherichia coli K-12
    1408252   subspecies   Escherichia coli R178                             Escherichia coli R178   Escherichia coli R178
    2697049   no rank      Severe acute respiratory syndrome coronavirus 2
    2605619   no rank      Escherichia coli O16:H48

    # fill missing ranks
    # see example below for -F/--fill-miss-rank
    #
    $ echo -ne "239935\n83333\n1408252\n2697049\n2605619\n" \
        | taxonkit lineage -n -r \
        | taxonkit reformat -f '{t};{S};{T}' --fill-miss-rank \
        | csvtk -H -t cut -f 1,4,3,5 \
        | csvtk -H -t sep -f 4 -s ';' -R \
        | csvtk -H -t add-header -n "taxid,rank,name,subspecies/strain,subspecies,strain" \
        | csvtk pretty -t
    taxid     rank         name                                              subspecies/strain                                                                      subspecies                                                                      strain
    -------   ----------   -----------------------------------------------   ------------------------------------------------------------------------------------   -----------------------------------------------------------------------------   -------------------------------------------------------------------------
    239935    species      Akkermansia muciniphila                           unclassified Akkermansia muciniphila subspecies/strain                                 unclassified Akkermansia muciniphila subspecies                                 unclassified Akkermansia muciniphila strain
    83333     strain       Escherichia coli K-12                             Escherichia coli K-12                                                                  unclassified Escherichia coli subspecies                                        Escherichia coli K-12
    1408252   subspecies   Escherichia coli R178                             Escherichia coli R178                                                                  Escherichia coli R178                                                           unclassified Escherichia coli R178 strain
    2697049   no rank      Severe acute respiratory syndrome coronavirus 2   unclassified Severe acute respiratory syndrome-related coronavirus subspecies/strain   unclassified Severe acute respiratory syndrome-related coronavirus subspecies   unclassified Severe acute respiratory syndrome-related coronavirus strain
    2605619   no rank      Escherichia coli O16:H48                          unclassified Escherichia coli subspecies/strain                                        unclassified Escherichia coli subspecies                                        unclassified Escherichia coli strain
  5. When there are no nodes of rank "subspecies" or "strain", you can switch on -S/--pseudo-strain to use the node with the lowest rank as the subspecies/strain name, provided that its rank is lower than "species". Using v0.14.1 or a later version is recommended.

    $ echo -ne "239935\n83333\n1408252\n2697049\n2605619\n" \
        | taxonkit lineage -n -r \
        | taxonkit reformat -f '{t};{S};{T}' --pseudo-strain \
        | csvtk -H -t cut -f 1,4,3,5 \
        | csvtk -H -t sep -f 4 -s ';' -R \
        | csvtk -H -t add-header -n "taxid,rank,name,subspecies/strain,subspecies,strain" \
        | csvtk pretty -t
    taxid     rank         name                                              subspecies/strain                                 subspecies                                        strain
    -------   ----------   -----------------------------------------------   -----------------------------------------------   -----------------------------------------------   -----------------------------------------------
    239935    species      Akkermansia muciniphila
    83333     strain       Escherichia coli K-12                             Escherichia coli K-12                                                                               Escherichia coli K-12
    1408252   subspecies   Escherichia coli R178                             Escherichia coli R178                             Escherichia coli R178
    2697049   no rank      Severe acute respiratory syndrome coronavirus 2   Severe acute respiratory syndrome coronavirus 2   Severe acute respiratory syndrome coronavirus 2   Severe acute respiratory syndrome coronavirus 2
    2605619   no rank      Escherichia coli O16:H48                          Escherichia coli O16:H48                          Escherichia coli O16:H48                          Escherichia coli O16:H48
  6. Add prefix (-P/--add-prefix).

    $ cat lineage.txt \
        | taxonkit reformat -P \
        | csvtk -H -t cut -f 1,3
    9606       k__Eukaryota;p__Chordata;c__Mammalia;o__Primates;f__Hominidae;g__Homo;s__Homo sapiens
    9913       k__Eukaryota;p__Chordata;c__Mammalia;o__Artiodactyla;f__Bovidae;g__Bos;s__Bos taurus
    376619     k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Thiotrichales;f__Francisellaceae;g__Francisella;s__Francisella tularensis
    349741     k__Bacteria;p__Verrucomicrobia;c__Verrucomicrobiae;o__Verrucomicrobiales;f__Akkermansiaceae;g__Akkermansia;s__Akkermansia muciniphila
    239935     k__Bacteria;p__Verrucomicrobia;c__Verrucomicrobiae;o__Verrucomicrobiales;f__Akkermansiaceae;g__Akkermansia;s__Akkermansia muciniphila
    314101     k__Bacteria;p__;c__;o__;f__;g__;s__uncultured murine large bowel bacterium BAC 54B
    11932      k__Viruses;p__Artverviricota;c__Revtraviricetes;o__Ortervirales;f__Retroviridae;g__Intracisternal A-particles;s__Mouse Intracisternal A-particle
    1327037    k__Viruses;p__Uroviricota;c__Caudoviricetes;o__Caudovirales;f__Siphoviridae;g__;s__Croceibacter phage P2559Y
    92489      k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Erwiniaceae;g__Erwinia;s__Erwinia oleae
    1458427    k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales;f__Comamonadaceae;g__Serpentinomonas;s__Serpentinomonas raichei
  7. Show the corresponding TaxIds of the reformatted lineage (flag -t/--show-lineage-taxids).

    $ cat lineage.txt \
        | taxonkit reformat -t \
        | csvtk -H -t cut -f 1,4 \
        | csvtk -H -t sep -f 2 -s ';' -R \
        | csvtk add-header -t -n taxid,kingdom,phylum,class,order,family,genus,species \
        | csvtk pretty -t
    taxid     kingdom   phylum    class     order     family    genus     species
    -------   -------   -------   -------   -------   -------   -------   -------
    9606      2759      7711      40674     9443      9604      9605      9606
    9913      2759      7711      40674     91561     9895      9903      9913
    376619    2         1224      1236      72273     34064     262       263
    349741    2         74201     203494    48461     1647988   239934    239935
    239935    2         74201     203494    48461     1647988   239934    239935
    314101    2                                                           314101
    11932     10239     2732409   2732514   2169561   11632     11749     11932
    1327037   10239     2731618   2731619   28883     10699               1327037
    92489     2         1224      1236      91347     1903409   551       79633
    1458427   2         1224      28216     80840     80864     2490452   1458425

    # both node name and taxids
    $ echo 562 \
        | taxonkit reformat -I 1 -t \
        | csvtk -H -t sep -f 2 -s ';' -R \
        | csvtk -H -t sep -f 2 -s ';' -R \
        | csvtk add-header -t -n "taxid,kingdom,phylum,class,order,family,genus,species,kingdom_taxid,phylum_taxid,class_taxid,order_taxid,family_taxid,genus_taxid,species_taxid" \
        | csvtk pretty -t
    taxid   kingdom    phylum           class                 order              family               genus         species            kingdom_taxid   phylum_taxid   class_taxid   order_taxid   family_taxid   genus_taxid   species_taxid
    -----   --------   --------------   -------------------   ----------------   ------------------   -----------   ----------------   -------------   ------------   -----------   -----------   ------------   -----------   -------------
    562     Bacteria   Pseudomonadota   Gammaproteobacteria   Enterobacterales   Enterobacteriaceae   Escherichia   Escherichia coli   2               1224           1236          91347         543            561           562
  8. Use custom symbols for unclassified ranks (-r/--miss-rank-repl).

    $ taxonkit reformat lineage.txt -r "__" | cut -f 3
    Eukaryota;Chordata;Mammalia;Primates;Hominidae;Homo;Homo sapiens
    Eukaryota;Chordata;Mammalia;Artiodactyla;Bovidae;Bos;Bos taurus
    Bacteria;Proteobacteria;Gammaproteobacteria;Thiotrichales;Francisellaceae;Francisella;Francisella tularensis
    Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
    Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
    Bacteria;__;__;__;__;__;uncultured murine large bowel bacterium BAC 54B
    Viruses;Artverviricota;Revtraviricetes;Ortervirales;Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle
    Viruses;Uroviricota;Caudoviricetes;Caudovirales;Siphoviridae;__;Croceibacter phage P2559Y
    Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Erwiniaceae;Erwinia;Erwinia oleae
    Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae;Serpentinomonas;Serpentinomonas raichei

    $ taxonkit reformat lineage.txt -r Unassigned | cut -f 3
    Eukaryota;Chordata;Mammalia;Primates;Hominidae;Homo;Homo sapiens
    Eukaryota;Chordata;Mammalia;Artiodactyla;Bovidae;Bos;Bos taurus
    Bacteria;Proteobacteria;Gammaproteobacteria;Thiotrichales;Francisellaceae;Francisella;Francisella tularensis
    Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
    Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
    Bacteria;Unassigned;Unassigned;Unassigned;Unassigned;Unassigned;uncultured murine large bowel bacterium BAC 54B
    Viruses;Artverviricota;Revtraviricetes;Ortervirales;Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle
    Viruses;Uroviricota;Caudoviricetes;Caudovirales;Siphoviridae;Unassigned;Croceibacter phage P2559Y
    Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Erwiniaceae;Erwinia;Erwinia oleae
    Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae;Serpentinomonas;Serpentinomonas raichei
  9. Estimate and fill missing ranks using the original lineage information (-F/--fill-miss-rank). This is very useful for formatting input data for LEfSe. You can change the prefix "unclassified" using the flag -p/--miss-rank-repl-prefix.

    $ cat lineage.txt \
        | taxonkit reformat -F \
        | csvtk -H -t cut -f 1,3 \
        | csvtk -H -t sep -f 2 -s ';' -R \
        | csvtk add-header -t -n taxid,kingdom,phylum,class,order,family,genus,species \
        | csvtk pretty -t
    taxid     kingdom     phylum                         class                         order                         family                         genus                             species
    -------   ---------   ----------------------------   ---------------------------   ---------------------------   ----------------------------   -------------------------------   -----------------------------------------------
    9606      Eukaryota   Chordata                       Mammalia                      Primates                      Hominidae                      Homo                              Homo sapiens
    9913      Eukaryota   Chordata                       Mammalia                      Artiodactyla                  Bovidae                        Bos                               Bos taurus
    376619    Bacteria    Proteobacteria                 Gammaproteobacteria           Thiotrichales                 Francisellaceae                Francisella                       Francisella tularensis
    349741    Bacteria    Verrucomicrobia                Verrucomicrobiae              Verrucomicrobiales            Akkermansiaceae                Akkermansia                       Akkermansia muciniphila
    239935    Bacteria    Verrucomicrobia                Verrucomicrobiae              Verrucomicrobiales            Akkermansiaceae                Akkermansia                       Akkermansia muciniphila
    314101    Bacteria    unclassified Bacteria phylum   unclassified Bacteria class   unclassified Bacteria order   unclassified Bacteria family   unclassified Bacteria genus       uncultured murine large bowel bacterium BAC 54B
    11932     Viruses     Artverviricota                 Revtraviricetes               Ortervirales                  Retroviridae                   Intracisternal A-particles        Mouse Intracisternal A-particle
    1327037   Viruses     Uroviricota                    Caudoviricetes                Caudovirales                  Siphoviridae                   unclassified Siphoviridae genus   Croceibacter phage P2559Y
    92489     Bacteria    Proteobacteria                 Gammaproteobacteria           Enterobacterales              Erwiniaceae                    Erwinia                           Erwinia oleae
    1458427   Bacteria    Proteobacteria                 Betaproteobacteria            Burkholderiales               Comamonadaceae                 Serpentinomonas                   Serpentinomonas raichei

    Do not add prefix or suffix for estimated nodes:

    $ echo 314101 | taxonkit reformat -I 1
    314101  Bacteria;;;;;;uncultured murine large bowel bacterium BAC 54B

    $ echo 314101 | taxonkit reformat -I 1 -F -p "" -s ""
    314101  Bacteria;Bacteria;Bacteria;Bacteria;Bacteria;Bacteria;uncultured murine large bowel bacterium BAC 54B
  10. Only some ranks.

    $ cat lineage.txt \
        | taxonkit reformat -F -f "{s};{p}" \
        | csvtk -H -t cut -f 1,3 \
        | csvtk -H -t sep -f 2 -s ';' -R \
        | csvtk add-header -t -n taxid,species,phylum \
        | csvtk pretty -t
    taxid     species                                           phylum
    -------   -----------------------------------------------   ----------------------------
    9606      Homo sapiens                                      Chordata
    9913      Bos taurus                                        Chordata
    376619    Francisella tularensis                            Proteobacteria
    349741    Akkermansia muciniphila                           Verrucomicrobia
    239935    Akkermansia muciniphila                           Verrucomicrobia
    314101    uncultured murine large bowel bacterium BAC 54B   unclassified Bacteria phylum
    11932     Mouse Intracisternal A-particle                   Artverviricota
    1327037   Croceibacter phage P2559Y                         Uroviricota
    92489     Erwinia oleae                                     Proteobacteria
    1458427   Serpentinomonas raichei                           Proteobacteria
  11. For TaxIds whose rank is higher than the lowest rank in -f/--format, use -T/--trim to avoid filling in missing ranks that are lower than the rank of the current node.

    $ echo -ne "2\n239934\n239935\n" \
        | taxonkit lineage \
        | taxonkit reformat -F \
        | sed -r "s/;+$//" \
        | csvtk -H -t cut -f 1,3
    2       Bacteria;unclassified Bacteria phylum;unclassified Bacteria class;unclassified Bacteria order;unclassified Bacteria family;unclassified Bacteria genus;unclassified Bacteria species
    239934  Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;unclassified Akkermansia species
    239935  Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila

    $ echo -ne "2\n239934\n239935\n" \
        | taxonkit lineage \
        | taxonkit reformat -F -T \
        | sed -r "s/;+$//" \
        | csvtk -H -t cut -f 1,3
    2       Bacteria
    239934  Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia
    239935  Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
  12. Support tab in format string

    $ echo 9606 \
        | taxonkit lineage \
        | taxonkit reformat -f "{k}\t{p}\t{c}\t{o}\t{f}\t{g}\t{s}\t{S}" \
        | csvtk cut -t -f -2
    9606    Eukaryota    Chordata    Mammalia    Primates    Hominidae    Homo    Homo sapiens
  13. List seven-level lineage for all TaxIds.

    # replace empty taxon with "Unassigned"
    $ taxonkit list --ids 1 \
        | taxonkit lineage \
        | taxonkit reformat -r Unassigned \
        | gzip -c > all.lineage.tsv.gz

    # tab-delimited seven-levels
    $ taxonkit list --ids 1 \
        | taxonkit lineage \
        | taxonkit reformat -r Unassigned -f "{k}\t{p}\t{c}\t{o}\t{f}\t{g}\t{s}" \
        | csvtk cut -H -t -f -2 \
        | head -n 5 \
        | csvtk pretty -H -t

    # 8-level
    $ taxonkit list --ids 1 \
        | taxonkit lineage \
        | taxonkit reformat -r Unassigned -f "{k}\t{p}\t{c}\t{o}\t{f}\t{g}\t{s}\t{t}" \
        | csvtk cut -H -t -f -2 \
        | head -n 5 \
        | csvtk pretty -H -t

    # Fill and trim
    $ memusg -t -s ' taxonkit list --ids 1 \
        | taxonkit lineage \
        | taxonkit reformat -F -T \
        | sed -r "s/;+$//" \
        | gzip -c > all.lineage.tsv.gz '
    elapsed time: 19.930s
    peak rss: 6.25 GB
  14. From taxid to 7-ranks lineage:

    $ cat taxids.txt | taxonkit lineage | taxonkit reformat

    # for taxonkit v0.8.0 or later versions
    $ cat taxids.txt | taxonkit reformat -I 1
  15. Some TaxIds have the same complete lineage; in that case an empty result is returned by default. You can use the flag -a/--output-ambiguous-result to return one possible result. See #42.

    $ echo -ne "2507530\n2516889\n" \
        | taxonkit lineage --data-dir . \
        | taxonkit reformat --data-dir . -t
    19:18:29.770 [WARN] we can't distinguish the TaxIds (2507530, 2516889) for lineage: cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019. But you can use -a/--output-ambiguous-result to return one possible result
    19:18:29.770 [WARN] we can't distinguish the TaxIds (2507530, 2516889) for lineage: cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019. But you can use -a/--output-ambiguous-result to return one possible result
    2507530 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019
    2516889 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019

    $ echo -ne "2507530\n2516889\n" \
        | taxonkit lineage --data-dir . \
        | taxonkit reformat --data-dir . -t -a
    2507530 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019    Eukaryota;Basidiomycota;Agaricomycetes;Russulales;Russulaceae;Russula;Russula sp. 8 KA-2019    2759;5204;155619;452342;5401;5402;2507530
    2516889 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019    Eukaryota;Basidiomycota;Agaricomycetes;Russulales;Russulaceae;Russula;Russula sp. 8 KA-2019    2759;5204;155619;452342;5401;5402;2507530
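
Several of the examples above pipe `taxonkit reformat -F -T` through `sed -r "s/;+$//"` to drop the separators left behind by trimmed ranks. The sketch below isolates just that step so its effect is easy to see. The function name `trim_trailing_semicolons` and the sample lines are illustrative assumptions, not captured TaxonKit output, and `-E` is used as the more portable spelling of GNU sed's `-r`.

```shell
#!/bin/sh
# Strip the run of semicolons at the end of each lineage line.
# Empty ranks in the middle of a lineage are left untouched.
trim_trailing_semicolons() {
    sed -E 's/;+$//'
}

# Illustrative input: lineages whose lowest ranks are empty after trimming.
trimmed=$(printf '%s\n' \
    'Bacteria;;;;;;' \
    'Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;' \
    'Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila' \
    | trim_trailing_semicolons)

printf '%s\n' "$trimmed"
```

Because the pattern is anchored with `$`, only the trailing run is removed, so the number of fields can differ between lines afterwards. If fixed-width columns are needed downstream, a replacement string via -r/--miss-rank-repl is likely the better fit.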

reformat2

Usage

Reformat lineage in chosen ranks, allowing more ranks than 'reformat'

Input:

  - List of TaxIds, one record per line.
  - Or tab-delimited format.
    Please specify the TaxId field with flag -I/--taxid-field (default 1)
  - Supporting (gzipped) file or STDIN.

Output:

  1. Input line data.
  2. Reformated lineage.
  3. (Optional) TaxIds taxons in the lineage (-t/--show-lineage-taxids)

Output format:

  1. It can contain some escape characters like "\t".
  2. You can use "|" to set multiple ranks, and the first valid one will be outputted.
     This is useful for a rank with different rank names, especially since NCBI
     made big changes to some ranks in March 2025:
        - "Domain" replaces "superkingdom" for Archaea, Bacteria, and Eukaryota
        - "Acellular root" replaces "superkingdom" for Viruses
        - Six viral groups are designated with the new rank "realm", the equivalent of "domain"
     So, we can use "{domain|acellular root|superkingdom}" to handle all these cases
     and keep compatible with old taxonomy data.

       $ echo -ne "Eukaryota\nBacteria\nViruses\n" \
           | taxonkit name2taxid -s -r \
           | taxonkit reformat2 -I 2 -f "{domain|acellular root|superkingdom}" \
           | csvtk add-header -Ht -n name,taxid,rank,kingdom/domain \
           | csvtk pretty -t

       name        taxid   rank             kingdom/domain
       ---------   -----   --------------   --------------
       Eukaryota   2759    domain           Eukaryota
       Bacteria    2       domain           Bacteria
       Viruses     10239   acellular root   Viruses

     Another example is for subspecies nodes, the rank might be "subpecies", "strain", or "no rank".
     For example,

       $ echo -ne "562\n83333\n2697049\n" \
          | taxonkit lineage -L -r \
          | taxonkit reformat2 -f "{species};{strain|subspecies|no rank}"

       562     species Escherichia coli;
       83333   strain  Escherichia coli;Escherichia coli K-12
       2697049 no rank Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2

Differences from 'taxonkit reformat':

  - [input] only accept TaxIDs
  - [format] accept more rank place holders, not just the seven canonical ones.
  - [format] use the full name of ranks, such as "{species}", rather than "{s}"
  - [format] support multiple ranks in one place holder, such as "{subspecies|strain}"
  - do not automatically add prefixes, but you can simply set them in the format

Usage:
  taxonkit reformat2 [flags] 

Flags:
  -f, --format string            output format, placeholders of rank are needed (default
                                 "{domain|acellular
                                 root|superkingdom};{phylum};{class};{order};{family};{genus};{species}")
  -h, --help                     help for reformat2
  -r, --miss-rank-repl string    replacement string for missing rank
  -R, --miss-taxid-repl string   replacement string for missing taxid
  -B, --no-ranks strings         rank names of no-rank. A lineage might have many "no rank" ranks, we
                                 only keep the last one below known ranks (default [no rank,clade])
  -t, --show-lineage-taxids      show corresponding taxids of reformated lineage
  -I, --taxid-field int          field index of taxid. input data should be tab-separated. it overrides
                                 -i/--lineage-field (default 1)
  -T, --trim                     do not replace missing ranks lower than the rank of the current node

Examples

  1. Default format

    $ echo 562 | taxonkit reformat2
    562     Bacteria;Pseudomonadota;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli
  2. Change the format

    $ echo 562 | taxonkit reformat2 -f "g__{genus}\ts__{species}"
    562     g__Escherichia  s__Escherichia coli
  3. Subspecies

    $ echo 511145 | taxonkit lineage
    511145  cellular organisms;Bacteria;Pseudomonadota;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli;Escherichia coli K-12;Escherichia coli str. K-12 substr. MG1655

    $ echo 511145 | taxonkit reformat -I 1 -f "{d};{p};{c};{o};{f};{g};{s};{t}"
    511145  Bacteria;Pseudomonadota;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli;Escherichia coli K-12

    $ echo 511145 | taxonkit reformat2 -I 1 -f "{domain|acellular root|superkingdom};{phylum};{class};{order};{family};{genus};{species};{subspecies|strain|no rank}"
    511145  Bacteria;Pseudomonadota;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli;Escherichia coli K-12
  4. Trim

    $ echo 561 | taxonkit reformat2 -r unknown
    561     Bacteria;Pseudomonadota;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;unknown

    $ echo 561 | taxonkit reformat2 -r unknown -T
    561     Bacteria;Pseudomonadota;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;

    # -----------------------------------------------------------
    # another example where the order rank is missing

    $ echo 102403 | taxonkit reformat2 -I 1 -r "0" -f "{domain|acellular root|superkingdom};{phylum};{class};{order};{family};{genus};{species}"
    102403  Eukaryota;Mollusca;Bivalvia;0;Poromyidae;Tropidomya;0

    $ echo 102403 | taxonkit reformat2 -I 1 -r "0" -f "{domain|acellular root|superkingdom};{phylum};{class};{order};{family};{genus};{species}" -T
    102403  Eukaryota;Mollusca;Bivalvia;0;Poromyidae;Tropidomya;

    # and now, the lowest rank in the output format is order, but the trailing "0" is not trimmed.
    $ echo 102403 | taxonkit reformat2 -I 1 -r "0" -f "{domain|acellular root|superkingdom};{phylum};{class};{order}"
    102403  Eukaryota;Mollusca;Bivalvia;0

    $ echo 102403 | taxonkit reformat2 -I 1 -r "0" -f "{domain|acellular root|superkingdom};{phylum};{class};{order}" -T
    102403  Eukaryota;Mollusca;Bivalvia;0
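
To make the "first valid one" rule for placeholders such as `{strain|subspecies|no rank}` concrete, here is a small stand-alone sketch of the selection logic. It is not TaxonKit's implementation: the function name `pick_rank` and the two-column `rank<TAB>name` input are assumptions made for illustration, and it ignores the special handling of repeated "no rank" nodes described for -B/--no-ranks (it simply keeps the last name seen for each rank).

```shell
#!/bin/sh
# pick_rank SPEC
# Read "rank<TAB>name" lines (one lineage, root to leaf) on stdin and print the
# name of the first rank in SPEC ("a|b|c") that occurs in the lineage.
# Prints nothing when none of the ranks is present.
pick_rank() {
    awk -F '\t' -v spec="$1" '
        { name[$1] = $2 }                  # remember the last name seen per rank
        END {
            n = split(spec, want, "[|]")   # candidate ranks in priority order
            for (i = 1; i <= n; i++)
                if (want[i] in name) { print name[want[i]]; exit }
        }'
}

# Hypothetical, abbreviated lineage of TaxId 83333 (names taken from the examples above).
printf 'species\tEscherichia coli\nstrain\tEscherichia coli K-12\n' \
    | pick_rank 'strain|subspecies|no rank'
```

With a lineage that stops at species, the function prints nothing, which mirrors the empty second field shown for TaxId 562 in the `reformat2` help text.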

name2taxid

Usage

Convert taxon names to TaxIds

Attention:

  1. Some TaxIds share the same names, e.g, Drosophila.
     These input lines are duplicated with multiple TaxIds.

    $ echo Drosophila | taxonkit name2taxid | taxonkit lineage -i 2 -r -L
    Drosophila      7215    genus
    Drosophila      32281   subgenus
    Drosophila      2081351 genus

Usage:
  taxonkit name2taxid [flags]

Flags:
  -h, --help             help for name2taxid
  -i, --name-field int   field index of name. data should be tab-separated (default 1)
  -s, --sci-name         only searching scientific names
  -r, --show-rank        show rank

Examples

Example data

$ cat names.txt
Homo sapiens
Akkermansia muciniphila ATCC BAA-835
Akkermansia muciniphila
Mouse Intracisternal A-particle
Wei Shen
uncultured murine large bowel bacterium BAC 54B
Croceibacter phage P2559Y
  1. Default.

    # taxonkit name2taxid names.txt
    $ cat names.txt | taxonkit name2taxid | csvtk pretty -H -t
    Homo sapiens                                      9606
    Akkermansia muciniphila ATCC BAA-835              349741
    Akkermansia muciniphila                           239935
    Mouse Intracisternal A-particle                   11932
    Wei Shen
    uncultured murine large bowel bacterium BAC 54B   314101
    Croceibacter phage P2559Y                         1327037
  2. Show rank.

    $ cat names.txt | taxonkit name2taxid --show-rank | csvtk pretty -H -t
    Homo sapiens                                      9606      species
    Akkermansia muciniphila ATCC BAA-835              349741    strain
    Akkermansia muciniphila                           239935    species
    Mouse Intracisternal A-particle                   11932     species
    Wei Shen
    uncultured murine large bowel bacterium BAC 54B   314101    species
    Croceibacter phage P2559Y                         1327037   species
  3. From name to lineage.

    $ cat names.txt | taxonkit name2taxid | taxonkit lineage --taxid-field 2
    Homo sapiens    9606    cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Homo;Homo sapiens
    Akkermansia muciniphila ATCC BAA-835    349741    cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835
    Akkermansia muciniphila    239935    cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
    Mouse Intracisternal A-particle    11932    Viruses;Ortervirales;Retroviridae;unclassified Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle
    Wei Shen
    uncultured murine large bowel bacterium BAC 54B    314101    cellular organisms;Bacteria;environmental samples;uncultured murine large bowel bacterium BAC 54B
    Croceibacter phage P2559Y    1327037    Viruses;Caudovirales;Siphoviridae;unclassified Siphoviridae;Croceibacter phage P2559Y
  4. Convert old names to new names.

    $ echo Lactobacillus fermentum | taxonkit name2taxid | taxonkit lineage -i 2 -n | cut -f 1,2,4
    Lactobacillus fermentum 1613    Limosilactobacillus fermentum
  5. Some TaxIds share the same scientific name, e.g., Drosophila.

    $ echo Drosophila \
        | taxonkit name2taxid \
        | taxonkit lineage -i 2 -r \
        | taxonkit reformat -i 3 \
        | csvtk cut -H -t -f 1,2,4,5 \
        | csvtk pretty -H -t
    Drosophila   7215      genus      Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;
    Drosophila   32281     subgenus   Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;
    Drosophila   2081351   genus      Eukaryota;Basidiomycota;Agaricomycetes;Agaricales;Psathyrellaceae;Drosophila;
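
Because one name can map to several TaxIds, it can help to flag such names before joining the result with other tables. The sketch below counts distinct TaxIds per name in two-column `name<TAB>taxid` data, which is the shape of the default `name2taxid` output shown above. The function name `ambiguous_names` is hypothetical, and the sample rows are copied from the examples in this section rather than produced by a fresh run.

```shell
#!/bin/sh
# ambiguous_names
# Read "name<TAB>taxid" lines on stdin and print "name<TAB>count" for every
# name that maps to more than one distinct TaxId. Names without a TaxId
# (empty second field) are ignored.
ambiguous_names() {
    awk -F '\t' '
        $2 != "" && !seen[$1 SUBSEP $2]++ { count[$1]++ }
        END { for (n in count) if (count[n] > 1) print n "\t" count[n] }
    ' | sort
}

printf 'Drosophila\t7215\nDrosophila\t32281\nDrosophila\t2081351\nHomo sapiens\t9606\nWei Shen\t\n' \
    | ambiguous_names
```

In a real pipeline the input would presumably come from `taxonkit name2taxid`, and adding -r/--show-rank to that command is one way to decide which of the reported TaxIds is the intended one.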

filter

Usage

Filter TaxIds by taxonomic rank range

Attention:

  1. Flag -L/--lower-than and -H/--higher-than are exclusive, and can be
     used along with -E/--equal-to which values can be different.
  2. A list of pre-ordered ranks is in ~/.taxonkit/ranks.txt, you can use
     your list by -r/--rank-file, the format specification is below.
  3. All ranks in taxonomy database should be defined in rank file.
  4. Ranks can be removed with black list via -B/--black-list.
  5. TaxIDs with no rank (those starting with ! in the rank file) are kept
     by default! They can be optionally discarded by -N/--discard-noranks.
     One exception: -N/--discard-noranks is switched on automatically
     when only -E/--equal-to is given and the value is not one of ranks
     without order ("no rank", "clade").
  6. [Recommended] When filtering with -L/--lower-than, you can use
    -n/--save-predictable-norank to save some special ranks without order,
    where rank of the closest higher node is still lower than rank cutoff.

Rank file:

  1. Blank lines or lines starting with "#" are ignored.
  2. Ranks are in decending order and case ignored.
  3. Ranks with same order should be in one line separated with comma (",", no space).
  4. Ranks without order should be assigned a prefix symbol "!" for each rank.

Usage:
  taxonkit filter [flags] 

Flags:
  -B, --black-list strings        black list of ranks to discard, e.g., '-B "no rank" -B "clade"
  -N, --discard-noranks           discard all ranks without order, type "taxonkit filter --help" for details
  -R, --discard-root              discard root taxid, defined by --root-taxid
  -E, --equal-to strings          output TaxIds with rank equal to some ranks, multiple values can be
                                  separated with comma "," (e.g., -E "genus,species"), or give multiple
                                  times (e.g., -E genus -E species)
  -h, --help                      help for filter
  -H, --higher-than string        output TaxIds with rank higher than a rank, exclusive with --lower-than
      --list-order                list user defined ranks in order, from "$HOME/.taxonkit/ranks.txt"
      --list-ranks                list ordered ranks in taxonomy database, sorted in user defined order
  -L, --lower-than string         output TaxIds with rank lower than a rank, exclusive with --higher-than
  -r, --rank-file string          user-defined ordered taxonomic ranks, type "taxonkit filter --help"
                                  for details
      --root-taxid uint32         root taxid (default 1)
  -n, --save-predictable-norank   do not discard some special ranks without order when using -L, where
                                  rank of the closest higher node is still lower than rank cutoff
  -i, --taxid-field int           field index of taxid. input data should be tab-separated (default 1)
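
The rank file rules above are easy to misread, so here is a small sketch that parses a rank file according to those four rules and prints the order assigned to each rank. It is an illustration only, not how `taxonkit filter` is implemented. The function name `rank_order` and the abridged rank list are assumptions and do not reproduce the shipped `~/.taxonkit/ranks.txt`.

```shell
#!/bin/sh
# rank_order
# Read a rank file on stdin and print "order<TAB>rank" lines.
# A smaller order means a higher rank. Ranks on the same line share one order.
# Ranks prefixed with "!" have no order and are printed with "-".
rank_order() {
    awk '
        /^[ \t]*$/ || /^#/ { next }            # rule 1: skip blank lines and comments
        {
            order++                            # rule 2: lines are in descending order
            n = split(tolower($0), ranks, ",") # rules 2 and 3: case ignored, comma-separated
            for (i = 1; i <= n; i++) {
                r = ranks[i]
                if (substr(r, 1, 1) == "!")    # rule 4: rank without order
                    print "-\t" substr(r, 2)
                else
                    print order "\t" r
            }
        }'
}

# Hypothetical, heavily abridged rank file.
rank_order <<'EOF'
# higher ranks first
domain,superkingdom
phylum
genus
species
strain

!no rank
!clade
EOF
```

Under this reading, a filter such as `-L genus` would keep ranks whose order is larger than that of "genus", while the "-" entries are the ones controlled by -N/--discard-noranks and -n/--save-predictable-norank.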

Examples

  1. Example data

    $ echo 349741 | taxonkit lineage -t | cut -f 3 | sed 's/;/\n/g' > taxids2.txt

    $ cat taxids2.txt
    131567
    2
    1783257
    74201
    203494
    48461
    1647988
    239934
    239935
    349741

    $ cat taxids2.txt | taxonkit lineage -r | csvtk -Ht cut -f 1,3,2 | csvtk pretty -H -t
    131567    cellular root   cellular organisms
    2         domain          cellular organisms;Bacteria
    1783257   clade           cellular organisms;Bacteria;Pseudomonadati;PVC group
    74201     phylum          cellular organisms;Bacteria;Pseudomonadati;PVC group;Verrucomicrobiota
    203494    class           cellular organisms;Bacteria;Pseudomonadati;PVC group;Verrucomicrobiota;Verrucomicrobiia
    48461     order           cellular organisms;Bacteria;Pseudomonadati;PVC group;Verrucomicrobiota;Verrucomicrobiia;Verrucomicrobiales
    1647988   family          cellular organisms;Bacteria;Pseudomonadati;PVC group;Verrucomicrobiota;Verrucomicrobiia;Verrucomicrobiales;Akkermansiaceae
    239934    genus           cellular organisms;Bacteria;Pseudomonadati;PVC group;Verrucomicrobiota;Verrucomicrobiia;Verrucomicrobiales;Akkermansiaceae;Akkermansia
    239935    species         cellular organisms;Bacteria;Pseudomonadati;PVC group;Verrucomicrobiota;Verrucomicrobiia;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
    349741    strain          cellular organisms;Bacteria;Pseudomonadati;PVC group;Verrucomicrobiota;Verrucomicrobiia;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835
  2. Equal to certain rank(s) (-E/--equal-to)

    $ cat taxids2.txt \
        | taxonkit filter -E Phylum -E Class -N \
        | taxonkit lineage -r \
        | csvtk -Ht cut -f 1,3,2 \
        | csvtk pretty -H -t
    74201    phylum   cellular organisms;Bacteria;PVC group;Verrucomicrobia
    203494   class    cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae
  3. Lower than a rank (-L/--lower-than)

    $ cat taxids2.txt \
        | taxonkit filter -L genus -N \
        | taxonkit lineage -r -n -L \
        | csvtk -Ht cut -f 1,3,2 \
        | csvtk pretty -H -t
    239935   species   Akkermansia muciniphila
    349741   strain    Akkermansia muciniphila ATCC BAA-835
  4. Higher than a rank (-H/--higher-than)

    $ cat taxids2.txt \
        | taxonkit filter -H phylum -N \
        | taxonkit lineage -r -n -L \
        | csvtk -Ht cut -f 1,3,2 \
        | csvtk pretty -H -t
    131567   cellular root   cellular organisms
    2        domain          Bacteria
  5. TaxIds with no rank are kept by default! "no rank" and "clade" are ranks without order and can be filtered out via -N/--discard-noranks. Further ranks can be removed with a black list via -B/--black-list.

    # 562 is the TaxId of Escherichia coli
    $ taxonkit list --ids 562 \
        | taxonkit filter -L species \
        | taxonkit lineage -r -n -L \
        | csvtk cut -Ht -f 1,3,2 \
        | csvtk freq -Ht -f 2 -nr \
        | csvtk pretty -H -t
    strain       2940
    no rank      486
    serotype     176
    serogroup    110
    isolate      1
    subspecies   1

    $ taxonkit list --ids 562 \
        | taxonkit filter -L species -N -B strain \
        | taxonkit lineage -r -n -L \
        | csvtk cut -Ht -f 1,3,2 \
        | csvtk freq -Ht -f 2 -nr \
        | csvtk pretty -H -t
    serotype     176
    serogroup    110
    isolate      1
    subspecies   1
  6. Combination of -L/-H with -E.

    $ cat taxids2.txt \
        | taxonkit filter -L genus -E genus -N \
        | taxonkit lineage -r -n -L \
        | csvtk cut -Ht -f 1,3,2 \
        | csvtk pretty -H -t
    239934   genus     Akkermansia
    239935   species   Akkermansia muciniphila
    349741   strain    Akkermansia muciniphila ATCC BAA-835
  7. Special cases of "no rank" (-n/--save-predictable-norank). When filtering with -L/--lower-than, you can use -n/--save-predictable-norank to keep some special ranks without order, where the rank of the closest higher node is still lower than the rank cutoff.

    $ echo -ne "2605619\n1327037\n" \
        | taxonkit lineage -t \
        | csvtk cut -Ht -f 3 \
        | csvtk unfold -Ht -f 1 -s ";" \
        | taxonkit lineage -r -n -L \
        | csvtk cut -Ht -f 1,3,2 \
        | csvtk pretty -H -t
    131567    cellular root    cellular organisms
    2         domain           Bacteria
    3379134   kingdom          Pseudomonadati
    1224      phylum           Pseudomonadota
    1236      class            Gammaproteobacteria
    91347     order            Enterobacterales
    543       family           Enterobacteriaceae
    561       genus            Escherichia
    562       species          Escherichia coli
    2605619   no rank          Escherichia coli O16:H48
    10239     acellular root   Viruses
    2731341   realm            Duplodnaviria
    2731360   kingdom          Heunggongvirae
    2731618   phylum           Uroviricota
    2731619   class            Caudoviricetes
    2788787   no rank          unclassified Caudoviricetes
    1327037   species          Croceibacter phage P2559Y

    # save taxids
    $ echo -ne "2605619\n1327037\n" \
        | taxonkit lineage -t \
        | csvtk cut -Ht -f 3 \
        | csvtk unfold -Ht -f 1 -s ";" \
        | tee taxids4.txt
    131567
    2
    1224
    1236
    91347
    543
    561
    562
    2605619
    10239
    2731341
    2731360
    2731618
    2731619
    28883
    10699
    196894
    1327037

    Now, filter nodes of rank <= species.

    $ cat taxids4.txt \
        | taxonkit filter -L species -E species -N -n \
        | taxonkit lineage -r -n -L \
        | csvtk cut -Ht -f 1,3,2 \
        | csvtk pretty -H -t
    562       species   Escherichia coli
    2605619   no rank   Escherichia coli O16:H48
    1327037   species   Croceibacter phage P2559Y

    Note that 2605619 (no rank) is saved because its parent node 562 is <= species.

lca

Usage

Compute lowest common ancestor (LCA) for TaxIds

Attention:

  1. This command computes LCA TaxId for a list of TaxIds 
     in a field ("-i/--taxids-field) of tab-delimited file or STDIN.
  2. TaxIDs should have the same separator ("-s/--separator"),
     single charactor separator is prefered.
  3. Empty lines or lines without valid TaxIds in the field are omitted.
  4. If some TaxIds are not found in database, it returns 0.

Examples:

    $ echo 239934, 239935, 349741 | taxonkit lca  -s ", "
    239934, 239935, 349741  239934

    $ time echo 239934  239935  349741 9606  | taxonkit lca
    239934 239935 349741 9606       131567

Usage:
  taxonkit lca [flags] 

Flags:
  -b, --buffer-size string   size of line buffer, supported unit: K, M, G. You need to increase the
                             value when "bufio.Scanner: token too long" error occured (default "1M")
  -h, --help                 help for lca
      --separater string     separater for TaxIds. This flag is same to --separator. (default " ")
  -s, --separator string     separator for TaxIds (default " ")
  -D, --skip-deleted         skip deleted TaxIds and compute with left ones
  -U, --skip-unfound         skip unfound TaxIds and compute with left ones
  -i, --taxids-field int     field index of TaxIds. Input data should be tab-separated (default 1)

Examples:

  1. Example data

    $ taxonkit list --ids 9605 -nr --indent "    "
    9605 [genus] Homo
        9606 [species] Homo sapiens
            63221 [subspecies] Homo sapiens neanderthalensis
            741158 [subspecies] Homo sapiens subsp. 'Denisova'
        1425170 [species] Homo heidelbergensis
        2665952 [no rank] environmental samples
            2665953 [species] Homo sapiens environmental sample
  2. Simple one

    $ echo 63221 2665953 | taxonkit lca
    63221 2665953   9605
  3. Custom field (-i/--taxids-field) and separator (-s/--separator).

    $ echo -ne "a\t63221,2665953\nb\t63221, 741158\n"
    a       63221,2665953
    b       63221, 741158

    $ echo -ne "a\t63221,2665953\nb\t63221, 741158\n" \
        | taxonkit lca -i 2 -s ","
    a       63221,2665953   9605
    b       63221, 741158   9606
  4. Merged TaxIds.

    # merged
    $ echo 92487 92488 92489 | taxonkit lca
    10:08:26.578 [WARN] taxid 92489 was merged into 796334
    92487 92488 92489       1236
  5. Deleted TaxIds. You can omit these and continue computing with the remaining ones using -D/--skip-deleted.

    $ echo 1 2 3 | taxonkit lca
    10:30:17.678 [WARN] taxid 3 not found
    1 2 3   0

    $ time echo 1 2 3 | taxonkit lca -D
    10:29:31.828 [WARN] taxid 3 was deleted
    1 2 3   1
  6. TaxIds not found in the database. You can omit these and continue computing with the remaining ones using -U/--skip-unfound.

    $ echo 61021 61022 11111111 | taxonkit lca
    10:31:44.929 [WARN] taxid 11111111 not found
    61021 61022 11111111    0

    $ echo 61021 61022 11111111 | taxonkit lca -U
    10:32:02.772 [WARN] taxid 11111111 not found
    61021 61022 11111111    2628496
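
As a way to reason about what `taxonkit lca` returns, the LCA of several TaxIds can be read off their root-to-leaf lineages: it is the last TaxId of their longest common prefix. The sketch below implements that idea with awk on lineage TaxId strings in the `;`-joined form that `taxonkit lineage -t` prints. It is a conceptual illustration, not TaxonKit's algorithm, and the function name `lca_of_lineages` is an assumption. The three lineages are taken from the `taxids2.txt` example in the filter section.

```shell
#!/bin/sh
# lca_of_lineages
# Read one root-to-leaf lineage per line (TaxIds joined by ";") on stdin and
# print the last TaxId of the prefix shared by all lineages.
# Prints 0 when there is no input or no shared prefix.
lca_of_lineages() {
    awk -F ';' '
        NR == 1 { n = NF; for (i = 1; i <= NF; i++) common[i] = $i; next }
        {
            # shrink the shared prefix to the part this lineage still agrees with
            for (i = 1; i <= n && i <= NF && common[i] == $i; i++) ;
            n = i - 1
        }
        END { if (n > 0) print common[n]; else print 0 }
    '
}

# Lineage TaxIds of 239934, 239935 and 349741.
printf '%s\n' \
    '131567;2;1783257;74201;203494;48461;1647988;239934' \
    '131567;2;1783257;74201;203494;48461;1647988;239934;239935' \
    '131567;2;1783257;74201;203494;48461;1647988;239934;239935;349741' \
    | lca_of_lineages
```

For these three lineages the shared prefix ends at 239934, which agrees with the first example in the `lca` help text. Merged, deleted, and unfound TaxIds are not modelled here; those cases are what -D/--skip-deleted and -U/--skip-unfound address in the real command.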

taxid-changelog

Usage

Create TaxId changelog from dump archives

Attention:

  1. This command was originally designed for NCBI taxonomy, where the the TaxIds are stable.
  2. For other taxonomic data created by "taxonkit create-taxdump", e.g., GTDB-taxdump,
    some change events might be wrong, because
     a) There would be dramatic changes between the two versions.
     b) Different taxons in multiple versions might have the same TaxIds, because we only
        check and eliminate taxid collision within a single version.
     So a single version of taxonomic data created by "taxonkit create-taxdump" has no problem,
     it's just the changelog might not be perfect.

Steps:

    # dependencies:
    #   rush - https://github.com/shenwei356/rush/

    mkdir -p archive; cd archive;

    # --------- download ---------

    # option 1
    # for fast network connection
    wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_archive/taxdmp*.zip

    # option 2
    # for slow network connection
    url=https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_archive/
    wget $url -O - -o /dev/null \
        | grep taxdmp | perl -ne '/(taxdmp_.+?.zip)/; print "$1\n";' \
        | rush -j 2 -v url=$url 'axel -n 5 {url}/{}' \
            --immediate-output  -c -C download.rush

    # --------- unzip ---------

    ls taxdmp*.zip | rush -j 1 'unzip {} names.dmp nodes.dmp merged.dmp delnodes.dmp -d {@_(.+)\.}'

    # optionally compress .dmp files with pigz, for saving disk space
    fd .dmp$ | rush -j 4 'pigz {}'

    # --------- create log ---------

    cd ..
    taxonkit taxid-changelog -i archive -o taxid-changelog.csv.gz --verbose

Output format (CSV):

    # fields        comments
    taxid           # taxid
    version         # version / time of archive, e.g, 2019-07-01
    change          # change, values:
                    #   NEW             newly added
                    #   REUSE_DEL       deleted taxids being reused
                    #   REUSE_MER       merged taxids being reused
                    #   DELETE          deleted
                    #   MERGE           merged into another taxid
                    #   ABSORB          other taxids merged into this one
                    #   CHANGE_NAME     scientific name changed
                    #   CHANGE_RANK     rank changed
                    #   CHANGE_LIN_LIN  lineage taxids remain but lineage remain
                    #   CHANGE_LIN_TAX  lineage taxids changed
                    #   CHANGE_LIN_LEN  lineage length changed
    change-value    # variable values for changes: 
                    #   1) new taxid for MERGE
                    #   2) merged taxids for ABSORB
                    #   3) empty for others
    name            # scientific name
    rank            # rank
    lineage         # complete lineage of the taxid
    lineage-taxids  # taxids of the lineage

    # you can use csvtk to investigate them. e.g.,
    csvtk grep -f taxid -p 1390515 taxid-changelog.csv.gz

Usage:
  taxonkit taxid-changelog [flags]

Flags:
  -i, --archive string   directory containing uncompressed dumped archives
  -h, --help             help for taxid-changelog

Details

  1. Example 1 (E. coli with TaxId 562)

    $ pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -p 562 \
        | csvtk pretty
    taxid   version      change           change-value    name               rank      lineage                                                                                                                            lineage-taxids
    562     2014-08-01   NEW                              Escherichia coli   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia coli   131567;2;1224;1236;91347;543;561;562
    562     2014-08-01   ABSORB           662101;662104   Escherichia coli   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia coli   131567;2;1224;1236;91347;543;561;562
    562     2015-11-01   ABSORB           1637691         Escherichia coli   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia coli   131567;2;1224;1236;91347;543;561;562
    562     2016-10-01   CHANGE_LIN_LIN                   Escherichia coli   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli    131567;2;1224;1236;91347;543;561;562
    562     2018-06-01   ABSORB           469598          Escherichia coli   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli    131567;2;1224;1236;91347;543;561;562

    # merged taxids
    $ pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -p 662101,662104,1637691,469598 \
        | csvtk pretty
    taxid     version      change           change-value   name                        rank      lineage                                                                                                                                     lineage-taxids
    469598    2014-08-01   NEW                             Escherichia sp. 3_2_53FAA   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia sp. 3_2_53FAA   131567;2;1224;1236;91347;543;561;469598
    469598    2016-10-01   CHANGE_LIN_LIN                  Escherichia sp. 3_2_53FAA   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia sp. 3_2_53FAA    131567;2;1224;1236;91347;543;561;469598
    469598    2018-06-01   MERGE            562            Escherichia sp. 3_2_53FAA   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia sp. 3_2_53FAA    131567;2;1224;1236;91347;543;561;469598
    662101    2014-08-01   MERGE            562
    662104    2014-08-01   MERGE            562
    1637691   2015-04-01   DELETE
    1637691   2015-05-01   REUSE_DEL                       Escherichia sp. MAR         species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia sp. MAR         131567;2;1224;1236;91347;543;561;1637691
    1637691   2015-11-01   MERGE            562            Escherichia sp. MAR         species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia sp. MAR         131567;2;1224;1236;91347;543;561;1637691

  2. Example 2 (SARS-CoV-2).

    $ time pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -p 2697049 \
        | csvtk pretty
    taxid     version      change           change-value   name                                              rank      lineage                                                                                                                                                                                                                                                        lineage-taxids
    2697049   2020-02-01   NEW                             Wuhan seafood market pneumonia virus              species   Viruses;Riboviria;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;unclassified Betacoronavirus;Wuhan seafood market pneumonia virus                                                                                               10239;2559587;76804;2499399;11118;2501931;694002;696098;2697049
    2697049   2020-03-01   CHANGE_NAME                     Severe acute respiratory syndrome coronavirus 2   no rank   Viruses;Riboviria;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2                                              10239;2559587;76804;2499399;11118;2501931;694002;2509511;694009;2697049
    2697049   2020-03-01   CHANGE_RANK                     Severe acute respiratory syndrome coronavirus 2   no rank   Viruses;Riboviria;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2                                              10239;2559587;76804;2499399;11118;2501931;694002;2509511;694009;2697049
    2697049   2020-03-01   CHANGE_LIN_LEN                  Severe acute respiratory syndrome coronavirus 2   no rank   Viruses;Riboviria;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2                                              10239;2559587;76804;2499399;11118;2501931;694002;2509511;694009;2697049
    2697049   2020-06-01   CHANGE_LIN_LEN                  Severe acute respiratory syndrome coronavirus 2   no rank   Viruses;Riboviria;Orthornavirae;Pisuviricota;Pisoniviricetes;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2   10239;2559587;2732396;2732408;2732506;76804;2499399;11118;2501931;694002;2509511;694009;2697049
    2697049   2020-07-01   CHANGE_RANK                     Severe acute respiratory syndrome coronavirus 2   isolate   Viruses;Riboviria;Orthornavirae;Pisuviricota;Pisoniviricetes;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2   10239;2559587;2732396;2732408;2732506;76804;2499399;11118;2501931;694002;2509511;694009;2697049
    2697049   2020-08-01   CHANGE_RANK                     Severe acute respiratory syndrome coronavirus 2   no rank   Viruses;Riboviria;Orthornavirae;Pisuviricota;Pisoniviricetes;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2   10239;2559587;2732396;2732408;2732506;76804;2499399;11118;2501931;694002;2509511;694009;2697049

    real    0m7.644s
    user    0m16.749s
    sys     0m3.985s

  3. Example 3 (all subspecies and strains in Akkermansia muciniphila, TaxId 239935)

    # species in Akkermansia
    $ taxonkit list --show-rank --show-name --indent "    " --ids 239935
    239935 [species] Akkermansia muciniphila
        349741 [strain] Akkermansia muciniphila ATCC BAA-835

    # check them all
    $ pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -P <(taxonkit list --indent "" --ids 239935) \
        | csvtk pretty
    taxid    version      change           change-value   name                                   rank      lineage                                                                                                                                                                                                         lineage-taxids
    239935   2014-08-01   NEW                             Akkermansia muciniphila                species   cellular organisms;Bacteria;Chlamydiae/Verrucomicrobia group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Verrucomicrobiaceae;Akkermansia;Akkermansia muciniphila                                        131567;2;51290;74201;203494;48461;203557;239934;239935
    239935   2015-05-01   CHANGE_LIN_TAX                  Akkermansia muciniphila                species   cellular organisms;Bacteria;Chlamydiae/Verrucomicrobia group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila                                            131567;2;51290;74201;203494;48461;1647988;239934;239935
    239935   2016-03-01   CHANGE_LIN_TAX                  Akkermansia muciniphila                species   cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila                                                                   131567;2;1783257;74201;203494;48461;1647988;239934;239935
    239935   2016-05-01   ABSORB           1834199        Akkermansia muciniphila                species   cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila                                                                   131567;2;1783257;74201;203494;48461;1647988;239934;239935
    349741   2014-08-01   NEW                             Akkermansia muciniphila ATCC BAA-835   no rank   cellular organisms;Bacteria;Chlamydiae/Verrucomicrobia group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Verrucomicrobiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835   131567;2;51290;74201;203494;48461;203557;239934;239935;349741
    349741   2015-05-01   CHANGE_LIN_TAX                  Akkermansia muciniphila ATCC BAA-835   no rank   cellular organisms;Bacteria;Chlamydiae/Verrucomicrobia group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835       131567;2;51290;74201;203494;48461;1647988;239934;239935;349741
    349741   2016-03-01   CHANGE_LIN_TAX                  Akkermansia muciniphila ATCC BAA-835   no rank   cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835                              131567;2;1783257;74201;203494;48461;1647988;239934;239935;349741
    349741   2020-07-01   CHANGE_RANK                     Akkermansia muciniphila ATCC BAA-835   strain    cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835                              131567;2;1783257;74201;203494;48461;1647988;239934;239935;349741
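
The change events shown in these examples can also be pulled out with standard shell tools. The following is a minimal sketch, not part of TaxonKit: it lists the MERGE events from three rows copied from Example 1. The variable names (changelog, merges) and the naive comma split are assumptions for illustration; a real changelog may contain quoted fields, for which csvtk is the safer choice.

```shell
#!/bin/sh
# Three rows copied from the Example 1 output above; the name, rank and
# lineage columns are empty for these events in the source.
changelog=$(cat <<'EOF'
taxid,version,change,change-value,name,rank,lineage,lineage-taxids
662101,2014-08-01,MERGE,562,,,,
662104,2014-08-01,MERGE,562,,,,
1637691,2015-04-01,DELETE,,,,,
EOF
)

# Print "old TaxId -> new TaxId" for every MERGE event, skipping the header.
# Assumption: no field contains a comma, so splitting on "," is safe here.
merges=$(printf '%s\n' "$changelog" \
    | awk -F, 'NR > 1 && $3 == "MERGE" { print $1 " -> " $4 }')

printf '%s\n' "$merges"
```

For the MERGE change, the change-value column holds the new TaxId, which is why the fourth field is printed as the target.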

More

create-taxdump

Usage

Create NCBI-style taxdump files for custom taxonomy, e.g., GTDB and ICTV

Input format:
  0. For GTDB taxonomy file, just use --gtdb.
     We use the numeric assembly accession as the taxon at subspecies rank.
     (without the prefix GCA_ and GCF_, and version number).
  1. The input file should be tab-delimited, at least one column is needed.
  2. Ranks can be given either via the first row or the flag --rank-names.
  3. The column containing the genome/assembly accession is recommended to
     generate TaxId mapping file (taxid.map, id -> taxid).
       -A/--field-accession,    field containing genome/assembly accession
       --field-accession-re,    regular expression to extract the accession
     Note that multiple TaxIds pointing to the same accession are listed as
     comma-separated integers.

Attention:
  1. Duplicated taxon names with different ranks are allowed since v0.16.0, since
     the rank and taxon name are concatenated for generating the TaxId.
  2. The generated TaxIds are not consecutive numbers, however some tools like MMSeqs2
     require this, you can use the script below for conversion:
     https://github.com/apcamargo/ictv-mmseqs2-protein-database/blob/master/scripts/fix_taxdump.py
  3. We only check and eliminate taxid collision within a single version of taxonomy data.
     Therefore, if you create taxid-changelog with "taxid-changelog", different taxons
     in multiple versions might have the same TaxIds and some change events might be wrong.
     So a single version of taxonomic data created by "taxonkit create-taxdump" has no problem,
     it's just the changelog might not be perfect.

Usage:
  taxonkit create-taxdump [flags]

Flags:
  -A, --field-accession int             field index of assembly accession (genome ID), for outputting
                                        taxid.map
  -S, --field-accession-as-subspecies   treat the accession as subspecies rank
      --field-accession-re string       regular expression to extract assembly accession (default "^(.+)$")
      --force                           overwrite existing output directory
      --gtdb                            input files are GTDB taxonomy file
      --gtdb-re-subs string             regular expression to extract assembly accession as the
                                        subspecies (default "^\\w\\w_GC[AF]_(.+)\\.\\d+$")
  -h, --help                            help for create-taxdump
      --line-chunk-size int             number of lines to process for each thread, and 4 threads is
                                        fast enough. (default 5000)
      --null strings                    null value of taxa (default [,NULL,NA])
  -x, --old-taxdump-dir string          taxdump directory of the previous version, for generating
                                        merged.dmp and delnodes.dmp
  -O, --out-dir string                  output directory
  -R, --rank-names strings              names of all ranks, leave it empty to use the (lowercase) first
                                        row of input as rank names

Examples:

  1. GTDB. See more: https://github.com/shenwei356/gtdb-taxdump

    $ taxonkit create-taxdump --gtdb ar53_taxonomy_r207.tsv.gz bac120_taxonomy_r207.tsv.gz --out-dir taxdump
    16:42:35.213 [INFO] 317542 records saved to taxdump/taxid.map
    16:42:35.460 [INFO] 401815 records saved to taxdump/nodes.dmp
    16:42:35.611 [INFO] 401815 records saved to taxdump/names.dmp
    16:42:35.611 [INFO] 0 records saved to taxdump/merged.dmp
    16:42:35.611 [INFO] 0 records saved to taxdump/delnodes.dmp

  2. ICTV. See more: https://github.com/shenwei356/ictv-taxdump

  3. MGV. Only Order, Family, Genus information are available.

    $ cat mgv_contig_info.tsv \
        | csvtk cut -t -f ictv_order,ictv_family,ictv_genus,votu_id,contig_id \
        | sed 1d \
        > mgv.tsv

    $ taxonkit create-taxdump mgv.tsv --out-dir mgv --force -A 5 -R order,family,genus,species
    23:33:18.098 [INFO] 189680 records saved to mgv/taxid.map
    23:33:18.131 [INFO] 58102 records saved to mgv/nodes.dmp
    23:33:18.150 [INFO] 58102 records saved to mgv/names.dmp
    23:33:18.150 [INFO] 0 records saved to mgv/merged.dmp
    23:33:18.150 [INFO] 0 records saved to mgv/delnodes.dmp

    $ head -n 5 mgv/taxid.map
    MGV-GENOME-0364295    677052301
    MGV-GENOME-0364296    677052301
    MGV-GENOME-0364303    1414406025
    MGV-GENOME-0364311    1849074420
    MGV-GENOME-0364312    2074846424

    $ echo 677052301 | taxonkit lineage --data-dir mgv/
    677052301    Caudovirales;crAss-phage;OTU-61123

    $ echo 677052301 | taxonkit reformat --data-dir mgv/ -I 1 -P
    677052301    k__;p__;c__;o__Caudovirales;f__crAss-phage;g__;s__OTU-61123

    $ grep MGV-GENOME-0364295 mgv.tsv
    Caudovirales    crAss-phage    NULL    OTU-61123    MGV-GENOME-0364295

  4. Custom lineages with the first row as rank names and treating one column as accession.

    $ csvtk pretty -t example/taxonomy.tsv
    id                superkingdom   phylum           class                 order              family               genus            species
    ---------------   ------------   --------------   -------------------   ----------------   ------------------   --------------   --------------------------
    GCF_001027105.1   Bacteria       Firmicutes       Bacilli               Bacillales         Staphylococcaceae    Staphylococcus   Staphylococcus aureus
    GCF_001096185.1   Bacteria       Firmicutes       Bacilli               Lactobacillales    Streptococcaceae     Streptococcus    Streptococcus pneumoniae
    GCF_001544255.1   Bacteria       Firmicutes       Bacilli               Lactobacillales    Enterococcaceae      Enterococcus     Enterococcus faecium
    GCF_002949675.1   Bacteria       Proteobacteria   Gammaproteobacteria   Enterobacterales   Enterobacteriaceae   Shigella         Shigella dysenteriae
    GCF_002950215.1   Bacteria       Proteobacteria   Gammaproteobacteria   Enterobacterales   Enterobacteriaceae   Shigella         Shigella flexneri
    GCF_006742205.1   Bacteria       Firmicutes       Bacilli               Bacillales         Staphylococcaceae    Staphylococcus   Staphylococcus epidermidis
    GCF_000006945.2   Bacteria       Proteobacteria   Gammaproteobacteria   Enterobacterales   Enterobacteriaceae   Salmonella       Salmonella enterica
    GCF_000017205.1   Bacteria       Proteobacteria   Gammaproteobacteria   Pseudomonadales    Pseudomonadaceae     Pseudomonas      Pseudomonas aeruginosa
    GCF_003697165.2   Bacteria       Proteobacteria   Gammaproteobacteria   Enterobacterales   Enterobacteriaceae   Escherichia      Escherichia coli
    GCF_009759685.1   Bacteria       Proteobacteria   Gammaproteobacteria   Moraxellales       Moraxellaceae        Acinetobacter    Acinetobacter baumannii
    GCF_000148585.2   Bacteria       Firmicutes       Bacilli               Lactobacillales    Streptococcaceae     Streptococcus    Streptococcus mitis
    GCF_000392875.1   Bacteria       Firmicutes       Bacilli               Lactobacillales    Enterococcaceae      Enterococcus     Enterococcus faecalis
    GCF_000742135.1   Bacteria       Proteobacteria   Gammaproteobacteria   Enterobacterales   Enterobacteriaceae   Klebsiella       Klebsiella pneumonia

    # the first column as accession
    $ taxonkit create-taxdump -A 1 example/taxonomy.tsv -O example/taxdump
    16:31:31.828 [INFO] I will use the first row of input as rank names
    16:31:31.843 [INFO] 13 records saved to example/taxdump/taxid.map
    16:31:31.843 [INFO] 39 records saved to example/taxdump/nodes.dmp
    16:31:31.843 [INFO] 39 records saved to example/taxdump/names.dmp
    16:31:31.843 [INFO] 0 records saved to example/taxdump/merged.dmp
    16:31:31.843 [INFO] 0 records saved to example/taxdump/delnodes.dmp

    $ export TAXONKIT_DB=example/taxdump

    $ taxonkit list --ids 1 | taxonkit filter -E species | taxonkit lineage -r | csvtk pretty -Ht
    793223984    Bacteria;Proteobacteria;Gammaproteobacteria;Moraxellales;Moraxellaceae;Acinetobacter;Acinetobacter baumannii       species
    1220345221   Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas;Pseudomonas aeruginosa    species
    561101225    Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Shigella;Shigella flexneri         species
    1969112428   Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Shigella;Shigella dysenteriae      species
    599451526    Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli       species
    2034984046   Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Salmonella;Salmonella enterica     species
    1859674812   Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Klebsiella;Klebsiella pneumoniae   species
    773201972    Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus                      species
    1295317147   Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus epidermidis                 species
    182402976    Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Enterococcus;Enterococcus faecium                      species
    1566113429   Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Enterococcus;Enterococcus faecalis                     species
    891083107    Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;Streptococcus pneumoniae                species
    1357145446   Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;Streptococcus mitis                     species

    $ head -n 3 example/taxdump/taxid.map
    GCF_001027105.1    773201972
    GCF_001096185.1    891083107
    GCF_001544255.1    182402976

  5. Custom lineages with the first row as rank names (pure lineage data)

    $ csvtk cut -t -f 2- example/taxonomy.tsv | head -n 2 | csvtk pretty -t
    superkingdom   phylum       class     order        family              genus            species
    ------------   ----------   -------   ----------   -----------------   --------------   ---------------------
    Bacteria       Firmicutes   Bacilli   Bacillales   Staphylococcaceae   Staphylococcus   Staphylococcus aureus

    $ csvtk cut -t -f 2- example/taxonomy.tsv \
        | taxonkit create-taxdump -O example/taxdump2
    16:53:08.604 [INFO] I will use the first row of input as rank names
    16:53:08.614 [INFO] 39 records saved to example/taxdump2/nodes.dmp
    16:53:08.614 [INFO] 39 records saved to example/taxdump2/names.dmp
    16:53:08.614 [INFO] 0 records saved to example/taxdump2/merged.dmp
    16:53:08.615 [INFO] 0 records saved to example/taxdump2/delnodes.dmp

    $ export TAXONKIT_DB=example/taxdump2

    $ taxonkit list --ids 1 | taxonkit filter -E species | taxonkit lineage -r | head -n 2
    793223984     Bacteria;Proteobacteria;Gammaproteobacteria;Moraxellales;Moraxellaceae;Acinetobacter;Acinetobacter baumannii      species
    1220345221    Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas;Pseudomonas aeruginosa   species
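
Before running create-taxdump on a custom table, it can help to confirm that the header row holds the rank names and that every row has the same number of tab-delimited fields. The following is a minimal sketch under those assumptions, using the header and two rows copied from example/taxonomy.tsv. The variable names (table, ranks, bad_rows) are hypothetical, and this pre-check is an illustration rather than something the text above says TaxonKit performs.

```shell
#!/bin/sh
# Header and two rows copied from example/taxonomy.tsv (tab-delimited).
table=$(
    printf 'id\tsuperkingdom\tphylum\tclass\torder\tfamily\tgenus\tspecies\n'
    printf 'GCF_001027105.1\tBacteria\tFirmicutes\tBacilli\tBacillales\tStaphylococcaceae\tStaphylococcus\tStaphylococcus aureus\n'
    printf 'GCF_001096185.1\tBacteria\tFirmicutes\tBacilli\tLactobacillales\tStreptococcaceae\tStreptococcus\tStreptococcus pneumoniae\n'
)

# Rank names: every header column except column 1, which holds the accession.
ranks=$(printf '%s\n' "$table" | awk -F'\t' '
    NR == 1 { for (i = 2; i <= NF; i++) printf "%s%s", $i, (i < NF ? "," : "\n") }
')

# Count data rows whose number of fields differs from the header.
bad_rows=$(printf '%s\n' "$table" | awk -F'\t' '
    NR == 1 { n = NF; next }
    NF != n { bad++ }
    END     { print bad + 0 }
')

echo "ranks: $ranks"
echo "rows with unexpected field count: $bad_rows"
```

If the table had no header row, the same comma-separated rank list is the kind of value the documented -R/--rank-names flag expects.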

genautocomplete

Usage

Generate shell autocompletion script

Supported shell: bash|zsh|fish|powershell

Bash:

    # generate completion shell
    taxonkit genautocomplete --shell bash

    # configure if never did.
    # install bash-completion if the "complete" command is not found.
    echo "for bcfile in ~/.bash_completion.d/* ; do source \$bcfile; done" >> ~/.bash_completion
    echo "source ~/.bash_completion" >> ~/.bashrc

Zsh:

    # generate completion shell
    taxonkit genautocomplete --shell zsh --file ~/.zfunc/_taxonkit

    # configure if never did
    echo 'fpath=( ~/.zfunc "${fpath[@]}" )' >> ~/.zshrc
    echo "autoload -U compinit; compinit" >> ~/.zshrc

fish:

    taxonkit genautocomplete --shell fish --file ~/.config/fish/completions/taxonkit.fish

Usage:
  taxonkit genautocomplete [flags]

Flags:
      --file string   autocompletion file (default "/home/shenwei/.bash_completion.d/taxonkit.sh")
  -h, --help          help for genautocomplete
      --type string   autocompletion type (currently only bash supported) (default "bash")

profile2cami

Usage

Convert metagenomic profile table to CAMI format

Input format:
  1. The input file should be tab-delimited
  2. At least two columns needed:
     a) TaxId of a taxon.
     b) Abundance (could be percentage, automatically detected or use -p/--percentage).

Attention:
  0. If some TaxIds are parents of others, please switch on -S/--no-sum-up to disable
     summing up abundances.
  1. Some TaxIds may be merged to another ones in current taxonomy version,
     the abundances will be summed up.
  2. Some TaxIds may be deleted in current taxonomy version,
     the abundances can be optionally recomputed with the flag -R/--recompute-abd.

Usage:
  taxonkit profile2cami [flags]

Flags:
  -a, --abundance-field int   field index of abundance. input data should be tab-separated (default 2)
  -h, --help                  help for profile2cami
  -0, --keep-zero             keep taxons with abundance of zero
  -S, --no-sum-up             do not sum up abundance from child to parent TaxIds
  -p, --percentage            abundance is in percentage
  -R, --recompute-abd         recompute abundance if some TaxIds are deleted in current taxonomy version
  -s, --sample-id string      sample ID in result file
  -r, --show-rank strings     only show TaxIds and names of these ranks (default
                              [superkingdom,phylum,class,order,family,genus,species,strain])
  -i, --taxid-field int       field index of taxid. input data should be tab-separated (default 1)
  -t, --taxonomy-id string    taxonomy ID in result file

Examples

  • Test data. Note that 2824115 is merged into 483329, and 1657696 is deleted in the current taxonomy version.

    $ cat example/abundance.tsv
    2824115    0.2    merged to 483329
    483329     0.2    absord 2824115
    239935     0.5    no change
    1657696    0.1    deleted

  • Example:

    $ taxonkit profile2cami -s sample1 -t 2021-10-01 \
        example/abundance.tsv
    13:17:40.552 [WARN] taxid is deleted in current taxonomy version: 1657696
    13:17:40.552 [WARN] you may recomputed abundance with the flag -R/--recompute-abd
    @SampleID:sample1
    @Version:0.10.0
    @Ranks:superkingdom|phylum|class|order|family|genus|species|strain
    @TaxonomyID:2021-10-01
    @@TAXID    RANK    TAXPATH    TAXPATHSN    PERCENTAGE
    2    superkingdom    2    Bacteria    50.000000000000000
    2759    superkingdom    2759    Eukaryota    40.000000000000000
    74201    phylum    2|74201    Bacteria|Verrucomicrobia    50.000000000000000
    6656    phylum    2759|6656    Eukaryota|Arthropoda    40.000000000000000
    203494    class    2|74201|203494    Bacteria|Verrucomicrobia|Verrucomicrobiae    50.000000000000000
    50557    class    2759|6656|50557    Eukaryota|Arthropoda|Insecta    40.000000000000000
    48461    order    2|74201|203494|48461    Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales    50.000000000000000
    7041    order    2759|6656|50557|7041    Eukaryota|Arthropoda|Insecta|Coleoptera    40.000000000000000
    1647988    family    2|74201|203494|48461|1647988    Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae    50.000000000000000
    57514    family    2759|6656|50557|7041|57514    Eukaryota|Arthropoda|Insecta|Coleoptera|Silphidae    40.000000000000000
    239934    genus    2|74201|203494|48461|1647988|239934    Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae|Akkermansia    50.000000000000000
    57515    genus    2759|6656|50557|7041|57514|57515    Eukaryota|Arthropoda|Insecta|Coleoptera|Silphidae|Nicrophorus    40.000000000000000
    239935    species    2|74201|203494|48461|1647988|239934|239935    Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae|Akkermansia|Akkermansia muciniphila    50.000000000000000
    483329    species    2759|6656|50557|7041|57514|57515|483329    Eukaryota|Arthropoda|Insecta|Coleoptera|Silphidae|Nicrophorus|Nicrophorus carolina    40.000000000000000

  • Recompute (normalize) the abundance

    $ taxonkit profile2cami -s sample1 -t 2021-10-01 \
        example/abundance.tsv --recompute-abd
    13:19:23.647 [WARN] taxid is deleted in current taxonomy version: 1657696
    @SampleID:sample1
    @Version:0.10.0
    @Ranks:superkingdom|phylum|class|order|family|genus|species|strain
    @TaxonomyID:2021-10-01
    @@TAXID    RANK    TAXPATH    TAXPATHSN    PERCENTAGE
    2    superkingdom    2    Bacteria    55.555555555555557
    2759    superkingdom    2759    Eukaryota    44.444444444444450
    74201    phylum    2|74201    Bacteria|Verrucomicrobia    55.555555555555557
    6656    phylum    2759|6656    Eukaryota|Arthropoda    44.444444444444450
    203494    class    2|74201|203494    Bacteria|Verrucomicrobia|Verrucomicrobiae    55.555555555555557
    50557    class    2759|6656|50557    Eukaryota|Arthropoda|Insecta    44.444444444444450
    48461    order    2|74201|203494|48461    Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales    55.555555555555557
    7041    order    2759|6656|50557|7041    Eukaryota|Arthropoda|Insecta|Coleoptera    44.444444444444450
    1647988    family    2|74201|203494|48461|1647988    Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae    55.555555555555557
    57514    family    2759|6656|50557|7041|57514    Eukaryota|Arthropoda|Insecta|Coleoptera|Silphidae    44.444444444444450
    239934    genus    2|74201|203494|48461|1647988|239934    Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae|Akkermansia    55.555555555555557
    57515    genus    2759|6656|50557|7041|57514|57515    Eukaryota|Arthropoda|Insecta|Coleoptera|Silphidae|Nicrophorus    44.444444444444450
    239935    species    2|74201|203494|48461|1647988|239934|239935    Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae|Akkermansia|Akkermansia muciniphila    55.555555555555557
    483329    species    2759|6656|50557|7041|57514|57515|483329    Eukaryota|Arthropoda|Insecta|Coleoptera|Silphidae|Nicrophorus|Nicrophorus carolina    44.444444444444450

  • Some abundance tables contain taxa that are parents of other taxa in the same table. E.g.,

    $ cat example/abundance2.tsv
    2        0.99
    1224     0.59
    1236     0.2
    28211    0.4
    1239     0.4
    91061    0.39
    2759     0.01
    9606     0.01

    Please switch on -S/--no-sum-up to disable summing up abundances.

    $ taxonkit profile2cami example/abundance2.tsv -S
    @SampleID:
    @Version:0.10.0
    @Ranks:superkingdom|phylum|class|order|family|genus|species|strain
    @TaxonomyID:
    @@TAXID    RANK    TAXPATH    TAXPATHSN    PERCENTAGE
    2    superkingdom    2    Bacteria    99.000000000000000
    2759    superkingdom    2759    Eukaryota    1.000000000000000
    1224    phylum    2|1224    Bacteria|Pseudomonadota    59.000000000000000
    1239    phylum    2|1239    Bacteria|Bacillota    40.000000000000000
    7711    phylum    2759|7711    Eukaryota|Chordata    1.000000000000000
    28211    class    2|1224|28211    Bacteria|Pseudomonadota|Alphaproteobacteria    40.000000000000000
    91061    class    2|1239|91061    Bacteria|Bacillota|Bacilli    39.000000000000000
    1236    class    2|1224|1236    Bacteria|Pseudomonadota|Gammaproteobacteria    20.000000000000000
    40674    class    2759|7711|40674    Eukaryota|Chordata|Mammalia    1.000000000000000
    9443    order    2759|7711|40674|9443    Eukaryota|Chordata|Mammalia|Primates    1.000000000000000
    9604    family    2759|7711|40674|9443|9604    Eukaryota|Chordata|Mammalia|Primates|Hominidae    1.000000000000000
    9605    genus    2759|7711|40674|9443|9604|9605    Eukaryota|Chordata|Mammalia|Primates|Hominidae|Homo    1.000000000000000
    9606    species    2759|7711|40674|9443|9604|9605|9606    Eukaryota|Chordata|Mammalia|Primates|Hominidae|Homo|Homo sapiens    1.000000000000000

  • Also see https://github.com/shenwei356/sun2021-cami-profiles
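
The recomputed percentages in the --recompute-abd example are consistent with dividing each remaining abundance by the total that is left after the deleted TaxId is dropped. The sketch below reproduces that arithmetic with awk. It is an inference from the numbers shown, not a description of TaxonKit's internals, and the variable names (kept, normalized) are hypothetical.

```shell
#!/bin/sh
# Abundances from example/abundance.tsv after folding the merged TaxId 2824115
# into 483329 (0.2 + 0.2) and dropping the deleted TaxId 1657696 (0.1).
kept=$(printf '483329\t0.4\n239935\t0.5\n')

# Divide each abundance by the remaining total and express it as a percentage.
normalized=$(printf '%s\n' "$kept" | awk -F'\t' '
    { taxid[NR] = $1; abd[NR] = $2; total += $2 }
    END {
        for (i = 1; i <= NR; i++)
            printf "%s\t%.2f\n", taxid[i], abd[i] / total * 100
    }
')

printf '%s\n' "$normalized"
```

Rounded to two decimals this gives 44.44 for 483329 and 55.56 for 239935, matching the species rows of the recomputed profile.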

cami-filter

Usage

Remove taxa of given TaxIds and their descendants in CAMI metagenomic profile

Input format:
  The CAMI (Taxonomic) Profiling Output Format
  - https://github.com/CAMI-challenge/contest_information/blob/master/file_formats/CAMI_TP_specification.mkd
  - One file with multiple samples is also supported.

How to:
  - No extra taxonomy data needed, so the original taxonomic information are
    used and not changed.
  - A mini taxonomic tree is built from records with abundance greater than
    zero, and only leaves are retained for later use. The rank of leaves may
    be "strain", "species", or "no rank".
  - Relative abundances (in percentage) are recomputed for all leaves
    (reference genome).
  - A new taxonomic tree is built from these leaves, and abundances are
    cumulatively added up from leaves to the root.

Examples:
  1. Remove Archaea, Bacteria, and Eukaryota, only keep Viruses:
      taxonkit cami-filter -t 2,2157,2759 test.profile -o test.filter.profile
  2. Remove Viruses:
      taxonkit cami-filter -t 10239 test.profile -o test.filter.profile

Usage:
  taxonkit cami-filter [flags]

Flags:
      --field-percentage int   field index of PERCENTAGE (default 5)
      --field-rank int         field index of RANK (default 2)
      --field-taxid int        field index of taxid (default 1)
      --field-taxpath int      field index of TAXPATH (default 3)
      --field-taxpathsn int    field index of TAXPATHSN (default 4)
  -h, --help                   help for cami-filter
      --leaf-ranks strings     only consider leaves at these ranks (default [species,strain,no rank])
      --show-rank strings      only show TaxIds and names of these ranks (default
                               [superkingdom,phylum,class,order,family,genus,species,strain])
      --taxid-sep string       separator of taxid in TAXPATH and TAXPATHSN (default "|")
  -t, --taxids strings         the parent taxid(s) to filter out
  -f, --taxids-file strings    file(s) for the parent taxid(s) to filter out, one taxid per line

Examples:

  1. Remove Eukaryota

    $ taxonkit profile2cami -s sample1 -t 2021-10-01 \
        example/abundance.tsv --recompute-abd \
        | taxonkit cami-filter -t 2759
    @SampleID:sample1
    @Version:0.10.0
    @Ranks:superkingdom|phylum|class|order|family|genus|species|strain
    @TaxonomyID:2021-10-01
    @@TAXID    RANK    TAXPATH    TAXPATHSN    PERCENTAGE
    2    superkingdom    2    Bacteria    100.000000000000000
    74201    phylum    2|74201    Bacteria|Verrucomicrobia    100.000000000000000
    203494    class    2|74201|203494    Bacteria|Verrucomicrobia|Verrucomicrobiae    100.000000000000000
    48461    order    2|74201|203494|48461    Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales    100.000000000000000
    1647988    family    2|74201|203494|48461|1647988    Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae    100.000000000000000
    239934    genus    2|74201|203494|48461|1647988|239934    Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae|Akkermansia    100.000000000000000
    239935    species    2|74201|203494|48461|1647988|239934|239935    Bacteria|Verrucomicrobia|Verrucomicrobiae|Verrucomicrobiales|Akkermansiaceae|Akkermansia|Akkermansia muciniphila    100.000000000000000
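
The removal step can be pictured as dropping every record whose TAXPATH contains the given TaxId, which covers the taxon itself and all of its descendants. The sketch below illustrates only that step on four rows copied from the recomputed profile above; it does not rebuild the tree or recompute percentages as cami-filter does, and the variable names (profile, kept, drop) are hypothetical.

```shell
#!/bin/sh
# Header lines and four records copied from the recomputed CAMI profile above
# (tab-delimited: TAXID, RANK, TAXPATH, TAXPATHSN, PERCENTAGE).
profile=$(
    printf '@SampleID:sample1\n'
    printf '@@TAXID\tRANK\tTAXPATH\tTAXPATHSN\tPERCENTAGE\n'
    printf '2\tsuperkingdom\t2\tBacteria\t55.555555555555557\n'
    printf '2759\tsuperkingdom\t2759\tEukaryota\t44.444444444444450\n'
    printf '74201\tphylum\t2|74201\tBacteria|Verrucomicrobia\t55.555555555555557\n'
    printf '6656\tphylum\t2759|6656\tEukaryota|Arthropoda\t44.444444444444450\n'
)

# Keep records whose TAXPATH (field 3) does not contain TaxId 2759.
kept=$(printf '%s\n' "$profile" | awk -F'\t' -v drop=2759 '
    /^@/ { next }                          # skip metadata and column header
    {
        n = split($3, path, "[|]")         # TAXPATH is separated by "|"
        for (i = 1; i <= n; i++)
            if (path[i] == drop) next      # taxon is 2759 or a descendant
        print $1 "\t" $2
    }
')

printf '%s\n' "$kept"
```

Only the Bacteria records (TaxIds 2 and 74201) survive, which is the same set of lineages that remains in the cami-filter output above.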
