shenwei356/taxid-changelog

taxid-changelog

NCBI taxonomic identifier (taxID) changelog, tracking taxID deletions, additions, merges, reuse, and rank/name changes.

Please cite TaxonKit: https://doi.org/10.1016/j.jgg.2021.03.006

Table of contents

Download

Release

Format

File format (CSV format with 8 fields):

    # fields        comments
    taxid           # taxid
    version         # version / time of archive, e.g., 2019-07-01
    change          # change, values:
                    #   NEW             newly added
                    #   REUSE_DEL       deleted taxids being reused
                    #   REUSE_MER       merged taxids being reused
                    #   DELETE          deleted
                    #   MERGE           merged into another taxid
                    #   ABSORB          other taxids merged into this one
                    #   CHANGE_NAME     scientific name changed
                    #   CHANGE_RANK     rank changed
                    #   CHANGE_LIN_LIN  lineage taxids remain but lineage changed
                    #   CHANGE_LIN_TAX  lineage taxids changed
                    #   CHANGE_LIN_LEN  lineage length changed
    change-value    # variable values for changes:
                    #   1) new taxid for MERGE
                    #   2) merged taxids for ABSORB
                    #   3) empty for others
    name            # scientific name
    rank            # rank
    lineage         # full lineage of the taxid
    lineage-taxids  # taxids of the lineage
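The eight fields map naturally onto a small record type. The following is a minimal Python sketch, not part of the repository: the names `Change`, `parse_changelog`, and `open_changelog` are hypothetical, and it assumes that multi-valued fields (`change-value`, `lineage`, `lineage-taxids`) are separated by `;`, as the examples in this README suggest.

```python
import csv
import gzip
import io
from typing import Iterator, List, NamedTuple, TextIO


class Change(NamedTuple):
    """One row of the changelog (hypothetical record type)."""
    taxid: int
    version: str               # archive date, e.g. 2019-07-01
    change: str                # NEW, DELETE, MERGE, ABSORB, ...
    change_value: List[int]    # new taxid for MERGE, merged taxids for ABSORB
    name: str
    rank: str
    lineage: List[str]
    lineage_taxids: List[int]


def parse_changelog(handle: TextIO) -> Iterator[Change]:
    """Yield one Change per data row; empty fields become empty lists."""
    for row in csv.DictReader(handle):
        yield Change(
            taxid=int(row["taxid"]),
            version=row["version"],
            change=row["change"],
            change_value=[int(t) for t in row["change-value"].split(";") if t],
            name=row["name"],
            rank=row["rank"],
            lineage=[n for n in row["lineage"].split(";") if n],
            lineage_taxids=[int(t) for t in row["lineage-taxids"].split(";") if t],
        )


def open_changelog(path: str) -> TextIO:
    """Open the gzip-compressed changelog as text (path supplied by the caller)."""
    return gzip.open(path, "rt", newline="")


# Three rows copied from the examples in this README.
SAMPLE = """\
taxid,version,change,change-value,name,rank,lineage,lineage-taxids
1,2014-08-01,NEW,,root,no rank,root,1
3,2014-08-01,DELETE,,,,,
562,2014-08-01,ABSORB,662101;662104,Escherichia coli,species,cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia coli,131567;2;1224;1236;91347;543;561;562
"""

records = list(parse_changelog(io.StringIO(SAMPLE)))
```

The `csv` module is used instead of splitting on commas because some lineage names contain commas and are therefore quoted in the file.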

Example 1:

    $ gzip -dc taxid-changelog.csv.gz | head -n 11
    taxid,version,change,change-value,name,rank,lineage,lineage-taxids
    1,2014-08-01,NEW,,root,no rank,root,1
    2,2014-08-01,NEW,,Bacteria,superkingdom,cellular organisms;Bacteria,131567;2
    3,2014-08-01,DELETE,,,,,
    4,2014-08-01,DELETE,,,,,
    5,2014-08-01,DELETE,,,,,
    6,2014-08-01,NEW,,Azorhizobium,genus,cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Xanthobacteraceae;Azorhizobium,131567;2;1224;28211;356;335928;6
    7,2014-08-01,NEW,,Azorhizobium caulinodans,species,cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Xanthobacteraceae;Azorhizobium;Azorhizobium caulinodans,131567;2;1224;28211;356;335928;6;7
    7,2014-08-01,ABSORB,395,Azorhizobium caulinodans,species,cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Xanthobacteraceae;Azorhizobium;Azorhizobium caulinodans,131567;2;1224;28211;356;335928;6;7
    8,2014-08-01,DELETE,,,,,
    9,2014-08-01,NEW,,Buchnera aphidicola,species,cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Buchnera;Buchnera aphidicola,131567;2;1224;1236;91347;543;32199;9

Example 2 (SARS-CoV-2)

    $ pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -p 2697049 \
        | csvtk pretty
    taxid     version      change           change-value   name                                              rank      lineage   lineage-taxids
    2697049   2020-02-01   NEW                             Wuhan seafood market pneumonia virus              species   Viruses;Riboviria;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;unclassified Betacoronavirus;Wuhan seafood market pneumonia virus   10239;2559587;76804;2499399;11118;2501931;694002;696098;2697049
    2697049   2020-03-01   CHANGE_NAME                     Severe acute respiratory syndrome coronavirus 2   no rank   Viruses;Riboviria;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2   10239;2559587;76804;2499399;11118;2501931;694002;2509511;694009;2697049
    2697049   2020-03-01   CHANGE_RANK                     Severe acute respiratory syndrome coronavirus 2   no rank   Viruses;Riboviria;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2   10239;2559587;76804;2499399;11118;2501931;694002;2509511;694009;2697049
    2697049   2020-03-01   CHANGE_LIN_LEN                  Severe acute respiratory syndrome coronavirus 2   no rank   Viruses;Riboviria;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2   10239;2559587;76804;2499399;11118;2501931;694002;2509511;694009;2697049
    2697049   2020-06-01   CHANGE_LIN_LEN                  Severe acute respiratory syndrome coronavirus 2   no rank   Viruses;Riboviria;Orthornavirae;Pisuviricota;Pisoniviricetes;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2   10239;2559587;2732396;2732408;2732506;76804;2499399;11118;2501931;694002;2509511;694009;2697049
    2697049   2020-07-01   CHANGE_RANK                     Severe acute respiratory syndrome coronavirus 2   isolate   Viruses;Riboviria;Orthornavirae;Pisuviricota;Pisoniviricetes;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2   10239;2559587;2732396;2732408;2732506;76804;2499399;11118;2501931;694002;2509511;694009;2697049
    2697049   2020-08-01   CHANGE_RANK                     Severe acute respiratory syndrome coronavirus 2   no rank   Viruses;Riboviria;Orthornavirae;Pisuviricota;Pisoniviricetes;Nidovirales;Cornidovirineae;Coronaviridae;Orthocoronavirinae;Betacoronavirus;Sarbecovirus;Severe acute respiratory syndrome-related coronavirus;Severe acute respiratory syndrome coronavirus 2   10239;2559587;2732396;2732408;2732506;76804;2499399;11118;2501931;694002;2509511;694009;2697049

Example 3 (E. coli with taxid 562)

    $ pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -p 562 \
        | csvtk pretty
    taxid   version      change           change-value    name               rank      lineage   lineage-taxids
    562     2014-08-01   NEW                              Escherichia coli   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia coli   131567;2;1224;1236;91347;543;561;562
    562     2014-08-01   ABSORB           662101;662104   Escherichia coli   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia coli   131567;2;1224;1236;91347;543;561;562
    562     2015-11-01   ABSORB           1637691         Escherichia coli   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia coli   131567;2;1224;1236;91347;543;561;562
    562     2016-10-01   CHANGE_LIN_LIN                   Escherichia coli   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli   131567;2;1224;1236;91347;543;561;562
    562     2018-06-01   ABSORB           469598          Escherichia coli   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli   131567;2;1224;1236;91347;543;561;562

    # merged taxids
    $ pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -p 662101,662104,1637691,469598 \
        | csvtk pretty
    taxid     version      change           change-value   name                        rank      lineage   lineage-taxids
    469598    2014-08-01   NEW                             Escherichia sp. 3_2_53FAA   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia sp. 3_2_53FAA   131567;2;1224;1236;91347;543;561;469598
    469598    2016-10-01   CHANGE_LIN_LIN                  Escherichia sp. 3_2_53FAA   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia sp. 3_2_53FAA   131567;2;1224;1236;91347;543;561;469598
    469598    2018-06-01   MERGE            562            Escherichia sp. 3_2_53FAA   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia sp. 3_2_53FAA   131567;2;1224;1236;91347;543;561;469598
    662101    2014-08-01   MERGE            562
    662104    2014-08-01   MERGE            562
    1637691   2015-04-01   DELETE
    1637691   2015-05-01   REUSE_DEL                       Escherichia sp. MAR         species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia sp. MAR   131567;2;1224;1236;91347;543;561;1637691
    1637691   2015-11-01   MERGE            562            Escherichia sp. MAR         species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia sp. MAR   131567;2;1224;1236;91347;543;561;1637691

Example 4 (all subspecies and strains in Akkermansia muciniphila 239935)

    # species in Akkermansia
    $ taxonkit list --show-rank --show-name --ids 239935
    239935 [species] Akkermansia muciniphila
      349741 [strain] Akkermansia muciniphila ATCC BAA-835

    # check them all
    $ pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -P <(taxonkit list --indent "" --ids 239935) \
        | csvtk pretty
    taxid    version      change           change-value   name                                   rank      lineage   lineage-taxids
    239935   2014-08-01   NEW                             Akkermansia muciniphila                species   cellular organisms;Bacteria;Chlamydiae/Verrucomicrobia group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Verrucomicrobiaceae;Akkermansia;Akkermansia muciniphila   131567;2;51290;74201;203494;48461;203557;239934;239935
    239935   2015-05-01   CHANGE_LIN_TAX                  Akkermansia muciniphila                species   cellular organisms;Bacteria;Chlamydiae/Verrucomicrobia group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila   131567;2;51290;74201;203494;48461;1647988;239934;239935
    239935   2016-03-01   CHANGE_LIN_TAX                  Akkermansia muciniphila                species   cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila   131567;2;1783257;74201;203494;48461;1647988;239934;239935
    239935   2016-05-01   ABSORB           1834199        Akkermansia muciniphila                species   cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila   131567;2;1783257;74201;203494;48461;1647988;239934;239935
    349741   2014-08-01   NEW                             Akkermansia muciniphila ATCC BAA-835   no rank   cellular organisms;Bacteria;Chlamydiae/Verrucomicrobia group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Verrucomicrobiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835   131567;2;51290;74201;203494;48461;203557;239934;239935;349741
    349741   2015-05-01   CHANGE_LIN_TAX                  Akkermansia muciniphila ATCC BAA-835   no rank   cellular organisms;Bacteria;Chlamydiae/Verrucomicrobia group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835   131567;2;51290;74201;203494;48461;1647988;239934;239935;349741
    349741   2016-03-01   CHANGE_LIN_TAX                  Akkermansia muciniphila ATCC BAA-835   no rank   cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835   131567;2;1783257;74201;203494;48461;1647988;239934;239935;349741
    349741   2020-07-01   CHANGE_RANK                     Akkermansia muciniphila ATCC BAA-835   strain    cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835   131567;2;1783257;74201;203494;48461;1647988;239934;239935;349741
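The per-taxid queries in the examples above can also be done without csvtk. This is a rough Python sketch under stated assumptions: `change_history` is a hypothetical helper name, and the sample rows are taken from Example 4 with the two lineage columns left empty for brevity.

```python
import csv
import io
from collections import defaultdict


def change_history(handle, taxids):
    """Collect (version, change, change-value) tuples for the given taxids."""
    wanted = {str(t) for t in taxids}
    history = defaultdict(list)
    for row in csv.DictReader(handle):
        if row["taxid"] in wanted:
            history[int(row["taxid"])].append(
                (row["version"], row["change"], row["change-value"])
            )
    return dict(history)


# Rows from Example 4; lineage and lineage-taxids are blanked to keep this short.
SAMPLE = """\
taxid,version,change,change-value,name,rank,lineage,lineage-taxids
562,2014-08-01,NEW,,Escherichia coli,species,,
239935,2014-08-01,NEW,,Akkermansia muciniphila,species,,
239935,2015-05-01,CHANGE_LIN_TAX,,Akkermansia muciniphila,species,,
239935,2016-03-01,CHANGE_LIN_TAX,,Akkermansia muciniphila,species,,
239935,2016-05-01,ABSORB,1834199,Akkermansia muciniphila,species,,
349741,2014-08-01,NEW,,Akkermansia muciniphila ATCC BAA-835,no rank,,
349741,2015-05-01,CHANGE_LIN_TAX,,Akkermansia muciniphila ATCC BAA-835,no rank,,
349741,2016-03-01,CHANGE_LIN_TAX,,Akkermansia muciniphila ATCC BAA-835,no rank,,
349741,2020-07-01,CHANGE_RANK,,Akkermansia muciniphila ATCC BAA-835,strain,,
"""

history = change_history(io.StringIO(SAMPLE), [239935, 349741])
for taxid, events in history.items():
    for version, change, value in events:
        print(taxid, version, change, value)
```

The function reads the file row by row, so it should also work on the full changelog when given a handle from `gzip.open(path, "rt", newline="")`.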

Results

Tools used:

  • csvtk for querying and other manipulations.
  • pigz, which is faster than gzip.

Taxids were deleted, newly added and re-used

Stats:

    $ csvtk join -k -f version \
        <(pigz -cd taxid-changelog.csv.gz \
            | csvtk grep -f change -p DELETE \
            | csvtk freq -f version) \
        <(pigz -cd taxid-changelog.csv.gz \
            | csvtk grep -f change -p NEW \
            | csvtk freq -f version) \
        <(pigz -cd taxid-changelog.csv.gz \
            | csvtk grep -f change -p MERGE \
            | csvtk freq -f version) \
        <(pigz -cd taxid-changelog.csv.gz \
            | csvtk grep -f change -p REUSE_DEL \
            | csvtk freq -f version) \
        <(pigz -cd taxid-changelog.csv.gz \
            | csvtk grep -f change -p REUSE_MER \
            | csvtk freq -f version) \
        | csvtk rename -f -1 -n deleted,newly_added,merged,deleted_reused,merged_reused \
        | csvtk pretty
    version      deleted   newly_added   merged   deleted_reused   merged_reused
    2014-08-01   310603    1184830       33153
    2014-09-01   7957      3362          211      7694
    2014-10-01   6535      3856          381      5590
    2014-11-01   9568      5849          463      6037
    2014-12-01   7147      3699          255      6378
    2015-01-01   7650      5104          295      4762
    2015-02-01   14911     2699          240      9900
    2015-03-01   9123      4117          273      6025
    2015-04-01   13748     3398          316      9951
    2015-05-01   8381      2639          554      6279
    2015-06-01   8629      3320          447      4385
    2015-07-01   12746     4140          438      14992
    2015-08-01   14680     4300          232      8300
    2015-09-01   6646      4083          404      8330
    2015-10-01   16933     6052          342      12273
    2015-11-01   12169     4727          620      15613
    2015-12-01   9782      6298          436      13206
    2016-01-01   7834      4133          446      9175
    2016-03-01   18825     13532         1230     12250
    2016-04-01   10135     6106          469      6461
    2016-05-01   13835     4704          449      14778            1
    2016-06-01   6303      6110          434      14237            1
    2016-08-01   14866     13730         633      10324            2
    2016-09-01   8033      4683          273      8450
    2016-10-01   6420      2084          219      6213
    2016-11-01   5097      3548          1419     3765             2
    2016-12-01   5881      3517          219      7113             1
    2017-01-01   4208      3102          298      5800
    2017-02-01   6762      3108          310      4588
    2017-03-01   8929      13341         371      5509
    2017-04-01   6721      4026          347      6778
    2017-05-01   4904      4306          217      7364
    2017-06-01   12789     9715          287      4746
    2017-07-01   6974      4622          329      3735             3
    2017-08-01   3669      2483          285      14135
    2017-09-01   6716      2853          253      3892
    2017-10-01   5766      2451          369      4865
    2017-12-01   8194      7087          564      8602
    2018-01-01   7732      2711          382      4123             1
    2018-02-01   9020      4106          325      6676
    2018-03-01   17113     9415          326      3743
    2018-04-01   14688     16683         422      16358
    2018-05-01   31551     4950          292      9794
    2018-06-01   31336     6040          430      22034
    2018-07-01   35696     10657         260      17726
    2018-08-01   21046     11585         246      20586
    2018-09-01   4992      10658         404      15697
    2018-10-01   78319     34128         384      12118
    2018-11-01   34245     31692         509      68228
    2018-12-01   5699      1856          210      33487
    2019-01-01   3722      2562          330      4653
    2019-02-01   5488      5990          404      4411
    2019-03-01   12567     6473          219      3598
    2019-04-01   12807     19580         271      11672            1
    2019-05-01   10502     2453          345      4103
    2019-06-01   8008      1458          509      13797
    2019-07-01   5050      2192          371      6695
    2019-08-01   6041      1611          256      6501
    2019-09-01   4975      2541          191      5328
    2019-10-01   10854     32970         278      3497
    2019-11-01   10648     2025          223      3530
    2019-12-01   11905     5727          351      7961
    2020-01-01   8928      4337          262      15423
    2020-02-01   8566      2024          292      8309
    2020-03-01   5752      3051          390      3998
    2020-04-01   6982      2149          522      4085
    2020-05-01   6380      2277          434      7099
    2020-06-01   5648      3182          429      5504
    2020-07-01   6715      1982          366      4379
    2020-08-01   8667      2053          632      4155
    2020-09-01   7135      1457          325      5425
    2020-10-01   6091      2567          346      6654
    2020-11-01   5126      1479          365      4754
    2020-12-01   5726      3696          358      4986
    2021-01-01   5506      1661          472      3561
    2021-02-01   4320      2260          559      6528
    2021-03-01   7015      1395          400      5145
    2021-04-01   5230      2101          650      4633             1
    2021-05-01   7419      1687          549      4324             2
    2021-06-01   6654      2052          321      4623
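The command above decompresses the changelog once per change type. As an alternative, the same per-version counts can be gathered in a single pass. This is a minimal Python sketch rather than the method used to produce the table; `count_changes` and `KINDS` are hypothetical names, and the sample rows are copied from the examples in this README.

```python
import csv
import io
from collections import Counter, defaultdict

# Change types tallied in the table above.
KINDS = ("DELETE", "NEW", "MERGE", "REUSE_DEL", "REUSE_MER")


def count_changes(handle, kinds=KINDS):
    """Return {version: Counter({change: n})} for the selected change types."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(handle):
        if row["change"] in kinds:
            counts[row["version"]][row["change"]] += 1
    return counts


SAMPLE = """\
taxid,version,change,change-value,name,rank,lineage,lineage-taxids
1,2014-08-01,NEW,,root,no rank,root,1
2,2014-08-01,NEW,,Bacteria,superkingdom,cellular organisms;Bacteria,131567;2
3,2014-08-01,DELETE,,,,,
4,2014-08-01,DELETE,,,,,
5,2014-08-01,DELETE,,,,,
1637691,2015-04-01,DELETE,,,,,
1637691,2015-05-01,REUSE_DEL,,Escherichia sp. MAR,species,cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia sp. MAR,131567;2;1224;1236;91347;543;561;1637691
"""

counts = count_changes(io.StringIO(SAMPLE))
for version in sorted(counts):
    print(version, *(counts[version][kind] for kind in KINDS))
```

Sorting the version strings gives chronological order because the dates are in ISO format.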

The NCBI Taxonomy database paper states:

Taxids are stable and persistent—they may be deleted (when taxa are removed from the database), and they may be merged (when taxa are synonymized), but they will never be reused to identify a different taxon.

Deleted taxids can be reused, e.g., 1157319 was reused to identify the same taxon.

    $ pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -p 7343,1157319 \
        | csvtk pretty
    taxid     version      change           change-value   name                     rank      lineage   lineage-taxids
    7343      2014-08-01   DELETE
    7343      2015-04-01   REUSE_DEL                       Paraliodrosophila        genus     cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Diptera;Brachycera;Muscomorpha;Eremoneura;Cyclorrhapha;Schizophora;Acalyptratae;Ephydroidea;Drosophilidae;Drosophilinae;Drosophilini;Drosophilina;Drosophiliti;Paraliodrosophila   131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7147;7203;43733;480118;480117;43738;43741;43746;7214;43845;46877;46879;186285;7343
    7343      2015-06-01   DELETE                          Paraliodrosophila        genus     cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Diptera;Brachycera;Muscomorpha;Eremoneura;Cyclorrhapha;Schizophora;Acalyptratae;Ephydroidea;Drosophilidae;Drosophilinae;Drosophilini;Drosophilina;Drosophiliti;Paraliodrosophila   131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7147;7203;43733;480118;480117;43738;43741;43746;7214;43845;46877;46879;186285;7343
    7343      2016-05-01   REUSE_DEL                       Paraliodrosophila        genus     cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Diptera;Brachycera;Muscomorpha;Eremoneura;Cyclorrhapha;Schizophora;Acalyptratae;Ephydroidea;Drosophilidae;Drosophilinae;Drosophilini;Drosophilina;Drosophiliti;Paraliodrosophila   131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7147;7203;43733;480118;480117;43738;43741;43746;7214;43845;46877;46879;186285;7343
    7343      2016-08-01   CHANGE_LIN_LEN                  Paraliodrosophila        genus     cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Diptera;Brachycera;Muscomorpha;Eremoneura;Cyclorrhapha;Schizophora;Acalyptratae;Ephydroidea;Drosophilidae;Drosophilinae;Drosophilini;Paraliodrosophila   131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7147;7203;43733;480118;480117;43738;43741;43746;7214;43845;46877;7343
    7343      2017-03-01   CHANGE_LIN_LIN                  Paraliodrosophila        genus     cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Holometabola;Diptera;Brachycera;Muscomorpha;Eremoneura;Cyclorrhapha;Schizophora;Acalyptratae;Ephydroidea;Drosophilidae;Drosophilinae;Drosophilini;Paraliodrosophila   131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7147;7203;43733;480118;480117;43738;43741;43746;7214;43845;46877;7343
    1157319   2014-08-01   NEW                             Lactococcus phage ASCC   no rank   Viruses;dsDNA viruses, no RNA stage;Caudovirales;Siphoviridae;unclassified Siphoviridae;Lactococcus phage 936 sensu lato;Lactococcus phage ASCC   10239;35237;28883;10699;196894;354259;1157319
    1157319   2018-06-01   CHANGE_LIN_LEN                  Lactococcus phage ASCC   no rank   Viruses;dsDNA viruses, no RNA stage;Caudovirales;Siphoviridae;Sk1virus;unclassified Sk1virus;Lactococcus phage 936 sensu lato;Lactococcus phage ASCC   10239;35237;28883;10699;1623305;2050979;354259;1157319
    1157319   2019-04-01   CHANGE_LIN_LEN                  Lactococcus phage ASCC   no rank   Viruses;Caudovirales;Siphoviridae;Skunavirus;unclassified Sk1virus;Lactococcus phage 936 sensu lato;Lactococcus phage ASCC   10239;28883;10699;1623305;2050979;354259;1157319
    1157319   2019-05-01   CHANGE_LIN_LIN                  Lactococcus phage ASCC   no rank   Viruses;Caudovirales;Siphoviridae;Skunavirus;unclassified Skunavirus;Lactococcus phage 936 sensu lato;Lactococcus phage ASCC   10239;28883;10699;1623305;2050979;354259;1157319
    1157319   2019-06-01   DELETE                          Lactococcus phage ASCC   no rank   Viruses;Caudovirales;Siphoviridae;Skunavirus;unclassified Skunavirus;Lactococcus phage 936 sensu lato;Lactococcus phage ASCC   10239;28883;10699;1623305;2050979;354259;1157319
    1157319   2019-07-01   REUSE_DEL                       Lactococcus phage ASCC   species   Viruses;Caudovirales;Siphoviridae;Skunavirus;unclassified Skunavirus;Lactococcus phage ASCC   10239;28883;10699;1623305;2050979;1157319
    1157319   2020-06-01   CHANGE_LIN_LEN                  Lactococcus phage ASCC   species   Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;Caudovirales;Siphoviridae;Skunavirus;unclassified Skunavirus;Lactococcus phage ASCC   10239;2731341;2731360;2731618;2731619;28883;10699;1623305;2050979;1157319

The full list:

    $ pigz -cd taxid-changelog.csv.gz \
        | grep REUSE_DEL \
        | csvtk cut -f 1 \
        > reuse_del.txt

    $ pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -P reuse_del.txt \
        | csvtk fold -f taxid -v change \
        | csvtk grep -f change -r -p ^DELETE -v \
        | csvtk cut -f taxid \
        > reuse_del.afterAug2014.txt

    $ pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -P reuse_del.afterAug2014.txt \
        > reuse_del.afterAug2014.txt.detail

Don't worry, reused taxIDs are assigned to the same taxon.
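The selection logic of the three commands (taxids with a REUSE_DEL record whose first recorded change is not DELETE) can be sketched in Python as follows. This is an illustration only: `reused_after_observed_deletion` is a hypothetical name, it assumes the rows of each taxid appear in chronological order as in the examples, and the sample rows have their lineage columns left empty for brevity.

```python
import csv
import io


def reused_after_observed_deletion(handle):
    """Taxids reused after a deletion that happened within the archive period."""
    first_change = {}   # taxid -> first change type seen for that taxid
    reused = set()      # taxids with at least one REUSE_DEL record
    for row in csv.DictReader(handle):
        taxid = int(row["taxid"])
        first_change.setdefault(taxid, row["change"])
        if row["change"] == "REUSE_DEL":
            reused.add(taxid)
    # Drop taxids that were already deleted in the first archive.
    return sorted(t for t in reused if first_change[t] != "DELETE")


# Rows for 7343 and 1157319 as shown above, lineage columns blanked.
SAMPLE = """\
taxid,version,change,change-value,name,rank,lineage,lineage-taxids
7343,2014-08-01,DELETE,,,,,
7343,2015-04-01,REUSE_DEL,,Paraliodrosophila,genus,,
1157319,2014-08-01,NEW,,Lactococcus phage ASCC,no rank,,
1157319,2019-06-01,DELETE,,Lactococcus phage ASCC,no rank,,
1157319,2019-07-01,REUSE_DEL,,Lactococcus phage ASCC,species,,
"""

result = reused_after_observed_deletion(io.StringIO(SAMPLE))
print(result)
```

Here 7343 is excluded because its first record is DELETE, mirroring the `-r -p ^DELETE -v` filter on the folded change column.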

Merged taxids can also be reused (becoming independent again?), e.g.,

    $ pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -p 101480,36032 \
        | csvtk pretty
    taxid    version      change      change-value   name                         rank      lineage   lineage-taxids
    36032    2014-08-01   MERGE       1249076
    36032    2016-06-01   REUSE_MER                  Barnettozyma wickerhamii     species   cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Ascomycota;saccharomyceta;Saccharomycotina;Saccharomycetes;Saccharomycetales;Phaffomycetaceae;Barnettozyma;Barnettozyma wickerhamii   131567;2759;33154;4751;451864;4890;716545;147537;4891;4892;115784;599802;36032
    101480   2014-08-01   MERGE       63407
    101480   2016-05-01   REUSE_MER                  Trichophyton interdigitale   species   cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Ascomycota;saccharomyceta;Pezizomycotina;leotiomyceta;Eurotiomycetes;Eurotiomycetidae;Onygenales;Arthrodermataceae;Trichophyton;Trichophyton interdigitale   131567;2759;33154;4751;451864;4890;716545;147538;716546;147545;451871;33183;34384;5550;101480
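Because a merge can later be undone by REUSE_MER, mapping an old taxid to its current one means replaying MERGE and REUSE_MER records in order. The sketch below is one possible approach, not something shipped with this repository: `build_merge_map` and `resolve` are hypothetical names, and the sample rows (from the E. coli example and the output above) have their lineage columns left empty for brevity.

```python
import csv
import io


def build_merge_map(handle):
    """Replay MERGE / REUSE_MER records and return {old taxid: new taxid}."""
    rows = [
        row for row in csv.DictReader(handle)
        if row["change"] in ("MERGE", "REUSE_MER")
    ]
    rows.sort(key=lambda r: r["version"])  # ISO dates sort chronologically
    merged_into = {}
    for row in rows:
        taxid = int(row["taxid"])
        if row["change"] == "MERGE":
            merged_into[taxid] = int(row["change-value"])
        else:  # REUSE_MER: the taxid is independent again
            merged_into.pop(taxid, None)
    return merged_into


def resolve(taxid, merged_into):
    """Follow merges until reaching a taxid that is not merged (cycle-safe)."""
    seen = set()
    while taxid in merged_into and taxid not in seen:
        seen.add(taxid)
        taxid = merged_into[taxid]
    return taxid


SAMPLE = """\
taxid,version,change,change-value,name,rank,lineage,lineage-taxids
469598,2014-08-01,NEW,,Escherichia sp. 3_2_53FAA,species,,
469598,2018-06-01,MERGE,562,Escherichia sp. 3_2_53FAA,species,,
662101,2014-08-01,MERGE,562,,,,
36032,2014-08-01,MERGE,1249076,,,,
36032,2016-06-01,REUSE_MER,,Barnettozyma wickerhamii,species,,
"""

merge_map = build_merge_map(io.StringIO(SAMPLE))
print(resolve(469598, merge_map), resolve(36032, merge_map))
```

Taxids that were later deleted are not handled here; a fuller version would also need to track DELETE and REUSE_DEL records.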

Scientific names and ranks changed

Scientific names changed, e.g.,

    $ pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -p 11,152 \
        | csvtk cut -f -lineage,-lineage-taxids \
        | csvtk pretty
    taxid   version      change           change-value   name                      rank
    11      2014-08-01   NEW                             [Cellvibrio] gilvus       species
    11      2015-05-01   CHANGE_LIN_LEN                  [Cellvibrio] gilvus       species
    11      2015-11-01   CHANGE_NAME                     Cellulomonas gilvus       species
    11      2015-11-01   CHANGE_LIN_LIN                  Cellulomonas gilvus       species
    11      2016-03-01   CHANGE_LIN_LEN                  Cellulomonas gilvus       species
    152     2014-08-01   NEW                             Treponema stenostrepta    species
    152     2015-06-01   CHANGE_NAME                     Treponema stenostreptum   species
    152     2015-06-01   CHANGE_LIN_LIN                  Treponema stenostreptum   species

Rank changed, e.g.,

$ pigz -cd taxid-changelog.csv.gz \
    | csvtk grep -f taxid -p 1189,2763 \
    | csvtk cut -f -lineage,-lineage-taxids \
    | csvtk pretty
taxid   version      change           change-value   name              rank
1189    2014-08-01   NEW                             Stigonematales    order
1189    2016-03-01   CHANGE_LIN_LEN                  Stigonematales    order
1189    2016-09-01   CHANGE_NAME                     Stigonemataceae   family
1189    2016-09-01   CHANGE_RANK                     Stigonemataceae   family
1189    2016-09-01   CHANGE_LIN_LEN                  Stigonemataceae   family
2763    2014-08-01   NEW                             Rhodophyta        no rank
2763    2019-02-01   CHANGE_RANK                     Rhodophyta        phylum

Stats:

$ csvtk join -k -f version \
    <(pigz -cd taxid-changelog.csv.gz \
    | csvtk grep -f change -p CHANGE_NAME \
    | csvtk freq -f version) \
    <(pigz -cd taxid-changelog.csv.gz \
    | csvtk grep -f change -p CHANGE_RANK \
    | csvtk freq -f version) \
    | csvtk rename -f -1 -n name-changed,rank-changed \
    | csvtk pretty
version      name-changed   rank-changed
2014-09-01   386            32
2014-10-01   643            51
2014-11-01   468            71
2014-12-01   460            48
2015-01-01   483            104
2015-02-01   407            82
2015-03-01   553            86
2015-04-01   595            64
2015-05-01   415            44
2015-06-01   1278           31
2015-07-01   673            141
2015-08-01   280            31
2015-09-01   418            36
2015-10-01   529            34
2015-11-01   1087           58
2015-12-01   1063           104
2016-01-01   929            107
2016-03-01   2032           173
2016-04-01   1190           123
2016-05-01   706            31
2016-06-01   573            153
2016-08-01   1073           338
2016-09-01   884            109
2016-10-01   994            50
2016-11-01   898            174
2016-12-01   853            174
2017-01-01   732            43
2017-02-01   948            91
2017-03-01   2022           408
2017-04-01   784            102
2017-06-01   574            361
2017-05-01   606            276
2017-07-01   523            88
2017-08-01   568            22
2017-09-01   660            97
2017-10-01   894            96
2017-12-01   1829           109
2018-01-01   733            28
2018-02-01   701            29
2018-03-01   706            97
2018-04-01   1562           75
2018-05-01   640            70
2018-06-01   721            54
2018-07-01   1028           220
2018-08-01   718            122
2018-09-01   599            147
2018-10-01   536            61
2018-11-01   371            69
2018-12-01   244            34
2019-01-01   532            77
2019-02-01   372            572
2019-03-01   395            59
2019-04-01   537            297
2019-05-01   415            580
2019-06-01   326            293
2019-07-01   293            37
2019-08-01   423            70
2019-09-01   287            14
2019-10-01   465            11
2019-11-01   378            78
2020-01-01   497            40
2020-02-01   361            24
2020-03-01   805            33
2020-04-01   572            15
2020-05-01   848            36
2020-06-01   495            775
2020-07-01   611            215932
2020-08-01   631            167799
2020-09-01   748            34
2020-10-01   1206           42
2020-11-01   610            126
2020-12-01   566            84
2021-01-01   983            48
2021-02-01   1931           51
2021-03-01   1327           50

What happened on 2020-07-01?

$ pigz -cd taxid-changelog.csv.gz \
    | csvtk grep -f version -p 2020-07-01 \
    | csvtk grep -f change -p CHANGE_RANK \
    | csvtk freq -f rank -nr \
    | csvtk pretty
rank              frequency
isolate           111108
strain            101971
serotype          1144
clade             822
forma specialis   740
serogroup         71
genotype          20
biotype           17
species           14
morph             11
pathogroup        5
subvariety        5
family            1
genus             1
subgenus          1
tribe             1

# where are they from
$ pigz -cd taxid-changelog.csv.gz \
    | csvtk grep -f version -p 2020-07-01 \
    | csvtk grep -f change -p CHANGE_RANK \
    | csvtk cut -f taxid \
    > rank-changed-2020-07.taxid

# ranks before 2020-07-01
$ pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -P rank-changed-2020-07.taxid \
        | csvtk grep -f version -p 2020-07-01 -p 2020-08-01 -p 2020-09-01 -v \
        | csvtk cut -f taxid,version,rank \
        | csvtk sort -k taxid:n -k version:r \
        | csvtk uniq -f taxid \
        > rank-changed-2020-07.taxid.before-2020-07.csv

# ranks on 2020-07-01
$ pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -P rank-changed-2020-07.taxid \
        | csvtk grep -f version -p 2020-07-01 \
        | csvtk cut -f taxid,version,rank \
        > rank-changed-2020-07.taxid.2020-07.csv

# join
$ csvtk join --outer-join -f taxid rank-changed-2020-07.taxid.before-2020-07.csv rank-changed-2020-07.taxid.2020-07.csv \
    | csvtk rename -f 3 -n "<2020.07" \
    | csvtk rename -f 5 -n 2020.07 \
    | csvtk freq -f "<2020.07,2020.07" -nr \
    | csvtk pretty
<2020.07     2020.07           frequency
no rank      isolate           110937
no rank      strain            102294
no rank      serotype          1144
no rank      clade             939
no rank      forma specialis   759
species      isolate           663
no rank      serogroup         71
no rank      genotype          20
no rank      biotype           17
no rank      species           14
no rank      morph             11
subspecies   species           11
no rank      pathogroup        5
no rank      subvariety        5
species      strain            3
subfamily    family            3
subfamily    tribe             3
varietas     species           3
genus        subgenus          2
subgenus     genus             2
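
The "ranks before 2020-07-01" step keeps, for every taxid, the most recent record dated before the cutoff. For readers without csvtk at hand, the same idea can be sketched with plain awk. This is only an illustration on a reduced three-column sample (taxid,version,rank); the function name `latest_rank_before` is made up here, and the sketch assumes no field contains a quoted comma. Unlike the pipeline above, which excludes three specific versions, it drops every record at or after the cutoff.

```shell
# Reduced sample (taxid,version,rank) taken from records shown in this README.
sample='taxid,version,rank
8492,2014-08-01,no rank
8492,2015-01-01,no rank
8492,2020-07-01,clade
2763,2014-08-01,no rank
2763,2019-02-01,phylum'

# latest_rank_before CUTOFF: hypothetical helper, reads CSV from stdin and
# prints the latest record per taxid with version < CUTOFF.
latest_rank_before() {
    awk -F, -v cutoff="$1" '
        NR > 1 && $2 < cutoff {
            # ISO dates (YYYY-MM-DD) sort correctly as plain strings
            if (!($1 in ver) || $2 > ver[$1]) { ver[$1] = $2; rk[$1] = $3 }
        }
        END { for (t in ver) print t "," ver[t] "," rk[t] }
    ' | sort -t, -k1,1n
}

printf '%s\n' "$sample" | latest_rank_before 2020-07-01
```

On this sample, taxid 8492 is reported with rank "no rank" (its 2015-01-01 record), since the "clade" record is dated on the cutoff itself.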

What happened on 2020-08-01?

$ pigz -cd taxid-changelog.csv.gz \
    | csvtk grep -f version -p 2020-08-01 \
    | csvtk grep -f change -p CHANGE_RANK \
    | csvtk freq -f rank -nr \
    | csvtk pretty
rank         frequency
no rank      167603
serotype     106
clade        59
species      15
subspecies   9
subgenus     3
varietas     3
family       1

Well, lots of "isolate" and "strain" taxids were changed back to "no rank".

# taxid with rank changed to "no rank" on 2020-08-01
$ pigz -cd taxid-changelog.csv.gz \
    | csvtk grep -f version -p 2020-08-01 \
    | csvtk grep -f rank -p "no rank" \
    | csvtk cut -f taxid \
    > norank-2020-08.taxid

# ranks before 2020-07-01
$ pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -P norank-2020-08.taxid \
        | csvtk grep -f version -p 2020-07-01 -p 2020-08-01 -p 2020-09-01 -v \
        | csvtk cut -f taxid,version,rank \
        | csvtk sort -k taxid:n -k version:r \
        | csvtk uniq -f taxid \
        > norank-2020-08.taxid.before-2020-07.csv

# ranks on 2020-07-01
$ pigz -cd taxid-changelog.csv.gz \
        | csvtk grep -f taxid -P norank-2020-08.taxid \
        | csvtk grep -f version -p 2020-07-01 \
        | csvtk cut -f taxid,version,rank \
        > norank-2020-08.taxid.2020-07.csv

# join
$ csvtk join --outer-join -f taxid norank-2020-08.taxid.before-2020-07.csv norank-2020-08.taxid.2020-07.csv \
    | csvtk rename -f 3 -n "<2020.07" \
    | csvtk rename -f 5 -n 2020.07 \
    | csvtk mutate2 -n 2020.08 -e '"no rank"' \
    | csvtk freq -f "<2020.07,2020.07,2020.08" -nr \
    | csvtk pretty
<2020.07   2020.07    2020.08   frequency
no rank    isolate    no rank   109585
no rank    strain     no rank   57724
species    isolate    no rank   660
                      no rank   198
no rank               no rank   186
species               no rank   59
no rank    serotype   no rank   52
           isolate    no rank   18
no rank    no rank    no rank   9
species    species    no rank   3
           no rank    no rank   2
species    strain     no rank   2

Lineage taxids remained but lineage changed

The lineage text changes while the lineage taxids stay the same when the scientific name of the taxid itself changes (730), or when the names of some higher-rank taxids in its lineage change (8492).

$ pigz -cd taxid-changelog.csv.gz \
    | csvtk grep -f taxid -p 730,8492 \
    | csvtk pretty
taxid   version      change           change-value   name                    rank      lineage                                                                                                                                                                                                                                                                    lineage-taxids
730     2014-08-01   NEW                             Haemophilus ducreyi     species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;Haemophilus ducreyi                                                                                                                                              131567;2;1224;1236;135625;712;724;730
730     2015-06-01   CHANGE_NAME                     [Haemophilus] ducreyi   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;[Haemophilus] ducreyi                                                                                                                                            131567;2;1224;1236;135625;712;724;730
730     2015-06-01   CHANGE_LIN_LIN                  [Haemophilus] ducreyi   species   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;[Haemophilus] ducreyi                                                                                                                                            131567;2;1224;1236;135625;712;724;730
8492    2014-08-01   NEW                             Archosauria             no rank   cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Sauropsida;Sauria;Testudines + Archosauria group;Archosauria   131567;2759;33154;33208;6072;33213;33511;7711;89593;7742;7776;117570;117571;8287;1338369;32523;32524;8457;32561;1329799;8492
8492    2015-01-01   CHANGE_LIN_LIN                  Archosauria             no rank   cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Sauropsida;Sauria;Archelosauria;Archosauria                    131567;2759;33154;33208;6072;33213;33511;7711;89593;7742;7776;117570;117571;8287;1338369;32523;32524;8457;32561;1329799;8492
8492    2020-07-01   CHANGE_RANK                     Archosauria             clade     cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Sauropsida;Sauria;Archelosauria;Archosauria                    131567;2759;33154;33208;6072;33213;33511;7711;89593;7742;7776;117570;117571;8287;1338369;32523;32524;8457;32561;1329799;8492
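
The three lineage-related change types can be read off the format definition by comparing the lineage and lineage-taxids fields of two consecutive records of the same taxid. The helper below is a hypothetical illustration of those definitions, not TaxonKit's implementation; in particular, checking the lineage length before the taxids is an assumption.

```shell
# classify_lineage_change OLD_LINEAGE OLD_TAXIDS NEW_LINEAGE NEW_TAXIDS
# Hypothetical helper: prints the change type implied by the field definitions.
classify_lineage_change() {
    awk -v ol="$1" -v ot="$2" -v nl="$3" -v nt="$4" 'BEGIN {
        n_old = split(ot, a, ";")   # number of taxids in the old lineage
        n_new = split(nt, b, ";")   # number of taxids in the new lineage
        if (n_old != n_new)  print "CHANGE_LIN_LEN"   # assumption: length is checked first
        else if (ot != nt)   print "CHANGE_LIN_TAX"   # same length, different taxids
        else if (ol != nl)   print "CHANGE_LIN_LIN"   # same taxids, different names
        else                 print "UNCHANGED"        # label invented for this sketch
    }'
}

# The two lineages of taxid 730 shown above (2014-08-01 vs 2015-06-01)
classify_lineage_change \
    "cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;Haemophilus ducreyi" \
    "131567;2;1224;1236;135625;712;724;730" \
    "cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;[Haemophilus] ducreyi" \
    "131567;2;1224;1236;135625;712;724;730"
```

For taxid 730 only the last name in the lineage differs, so the call prints CHANGE_LIN_LIN, matching the record in the table.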

Rank changed from subspecies to species

Steps:

# get taxids which change rank to species
$ pigz -cd taxid-changelog.csv.gz \
    | csvtk grep -f change -p CHANGE_RANK \
    | csvtk grep -f rank -p species \
    | csvtk cut -f taxid \
    > t.txt

# filter taxids which change rank from subspecies to species
$ pigz -cd taxid-changelog.csv.gz \
    | csvtk grep -f taxid -P t.txt \
    | csvtk collapse -f taxid -v rank -s ";" \
    | csvtk grep -f rank -r -p "subspecies.*species" \
    | csvtk cut -f taxid \
    > t.f.txt

# count
$ csvtk nrow t.f.txt
651
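
As a rough cross-check without csvtk, a subspecies-to-species transition can also be detected by walking the records of each taxid in order. The sketch below is an assumption-laden illustration, not the method used for the count above: it runs on a reduced sample (taxid,version,change,rank), assumes records are sorted by version within each taxid and contain no quoted commas, and uses a made-up function name. It only reports direct transitions between consecutive records, which is stricter than the regular expression "subspecies.*species".

```shell
# Reduced sample (taxid,version,change,rank) taken from records shown in this README.
sample='taxid,version,change,rank
8757,2014-08-01,NEW,subspecies
8757,2017-12-01,CHANGE_RANK,species
2763,2014-08-01,NEW,no rank
2763,2019-02-01,CHANGE_RANK,phylum
1189,2014-08-01,NEW,order
1189,2016-09-01,CHANGE_RANK,family'

# subspecies_to_species: hypothetical helper, reads CSV from stdin and prints
# "taxid,version" for each record where the rank goes from subspecies to species.
subspecies_to_species() {
    awk -F, 'NR > 1 {
        # prev[] holds the rank of the previous record of the same taxid
        if (($1 in prev) && prev[$1] == "subspecies" && $4 == "species")
            print $1 "," $2
        prev[$1] = $4
    }'
}

printf '%s\n' "$sample" | subspecies_to_species
```

On this sample only taxid 8757 qualifies, with the change dated 2017-12-01.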

When did these happen?

$ pigz -cd taxid-changelog.csv.gz \
    | csvtk grep -f taxid -P t.f.txt \
    | csvtk grep -f change -p CHANGE_RANK \
    | csvtk grep -f rank -p species \
    | csvtk freq -f version -k \
    | csvtk pretty
version      frequency
2014-09-01   14
2014-11-01   29
2014-12-01   6
2015-01-01   1
2015-02-01   9
2015-03-01   5
2015-04-01   3
2015-05-01   2
2015-06-01   3
2015-07-01   3
2015-08-01   3
2015-09-01   3
2015-10-01   6
2015-11-01   2
2016-01-01   4
2016-03-01   6
2016-04-01   4
2016-05-01   1
2016-06-01   105
2016-08-01   11
2016-09-01   6
2016-10-01   2
2016-11-01   6
2016-12-01   20
2017-01-01   10
2017-02-01   2
2017-03-01   8
2017-04-01   1
2017-05-01   6
2017-06-01   2
2017-07-01   3
2017-08-01   5
2017-09-01   43
2017-10-01   6
2017-12-01   9
2018-01-01   4
2018-02-01   4
2018-03-01   3
2018-04-01   9
2018-05-01   3
2018-06-01   2
2018-07-01   22
2018-08-01   7
2018-09-01   4
2018-10-01   2
2018-11-01   1
2018-12-01   1
2019-01-01   7
2019-02-01   1
2019-03-01   2
2019-04-01   4
2019-05-01   5
2019-06-01   3
2019-08-01   6
2019-09-01   3
2019-10-01   1
2019-11-01   4
2019-11-01   4
2019-12-01   1
2020-02-01   6
2020-03-01   9
2020-04-01   2
2020-05-01   6
2020-06-01   170
2020-07-01   5
2020-08-01   5
2020-09-01   1
2020-10-01   7
2020-11-01   8

Examples:

$ pigz -cd taxid-changelog.csv.gz \
    | csvtk grep -f taxid -p 8757,40746,41264 \
    | csvtk cut -f -lineage,-lineage-taxids \
    | csvtk pretty
taxid   version      change           change-value   name                                     rank
8757    2014-08-01   NEW                             Sistrurus catenatus tergeminus           subspecies
8757    2014-08-01   ABSORB           8761           Sistrurus catenatus tergeminus           subspecies
8757    2017-08-01   CHANGE_NAME                     Sistrurus tergeminus                     subspecies
8757    2017-08-01   CHANGE_LIN_LEN                  Sistrurus tergeminus                     subspecies
8757    2017-12-01   CHANGE_RANK                     Sistrurus tergeminus                     species
40746   2014-08-01   NEW                             Langloisia setosissima subsp. punctata   subspecies
40746   2014-12-01   ABSORB           1570882        Langloisia punctata                      species
40746   2014-12-01   CHANGE_NAME                     Langloisia punctata                      species
40746   2014-12-01   CHANGE_RANK                     Langloisia punctata                      species
40746   2014-12-01   CHANGE_LIN_LEN                  Langloisia punctata                      species
40746   2019-06-01   CHANGE_LIN_LIN                  Langloisia punctata                      species
41264   2014-08-01   NEW                             Gerbilliscus kempi gambiana              subspecies
41264   2014-08-01   ABSORB           410304         Gerbilliscus kempi gambiana              subspecies
41264   2014-12-01   CHANGE_NAME                     Gerbilliscus gambianus                   species
41264   2014-12-01   CHANGE_RANK                     Gerbilliscus gambianus                   species
41264   2014-12-01   CHANGE_LIN_LEN                  Gerbilliscus gambianus                   species
41264   2017-04-01   CHANGE_LIN_TAX                  Gerbilliscus gambianus                   species

A superkingdom disappeared

$ pigz -cd taxid-changelog.csv.gz \
    | csvtk grep -f rank -p superkingdom \
    | csvtk pretty
taxid   version      change   change-value   name        rank           lineage                        lineage-taxids
2       2014-08-01   NEW                     Bacteria    superkingdom   cellular organisms;Bacteria    131567;2
2157    2014-08-01   NEW                     Archaea     superkingdom   cellular organisms;Archaea     131567;2157
2759    2014-08-01   NEW                     Eukaryota   superkingdom   cellular organisms;Eukaryota   131567;2759
10239   2014-08-01   NEW                     Viruses     superkingdom   Viruses                        10239
12884   2014-08-01   NEW                     Viroids     superkingdom   Viroids                        12884
12884   2019-05-01   DELETE                  Viroids     superkingdom   Viroids                        12884

To be continued ...

Method

Data source: https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_archive/

Dependencies:

Hardware requirements:

  • DISK: > 30 GiB
  • RAM: >= 100 GiB (a run over 125 archives on 2024/11/01 used 75 GiB and took 32 min)

Steps:

mkdir -p archive; cd archive;

# --------- download ---------

# option 1
# for fast network connection
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_archive/taxdmp*.zip

# option 2
# for bad network connection like mine
url=https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_archive/
wget $url -O - -o /dev/null \
    | grep taxdmp | perl -ne '/(taxdmp_.+?.zip)/; print "$1\n";' \
    | rush -j 2 -v url=$url 'axel -n 5 {url}/{}' \
        --immediate-output  -c -C _download.rush

# --------- unzip ---------

ls taxdmp*.zip \
    | rush -j 1 'unzip {} names.dmp nodes.dmp merged.dmp delnodes.dmp -d {@_(.+)\.}' \
    -c -C _unzip.rush --eta

# optionally compress .dmp files with pigz, for saving disk space
fd .dmp$ | rush -j 4 'pigz {}' --eta

# --------- create log ---------

cd ..
time taxonkit taxid-changelog -i archive -o taxid-changelog.csv.gz --verbose
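
Since creating the changelog takes a long time and a lot of memory, it may be worth checking the unzipped archive before starting. The sketch below is a hypothetical pre-flight check, not part of TaxonKit: it assumes one sub-directory per archive (as produced by the unzip step above) and reports every directory lacking one of the four .dmp files, accepting the pigz-compressed .gz form as well.

```shell
# check_archive ARCHIVE_DIR
# Hypothetical helper: lists missing files and returns non-zero if any is missing.
check_archive() {
    rc=0
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue          # skip when the glob matches nothing
        for f in names.dmp nodes.dmp merged.dmp delnodes.dmp; do
            # accept either the plain file or its pigz-compressed form
            if [ ! -e "$dir$f" ] && [ ! -e "$dir$f.gz" ]; then
                echo "missing: $dir$f"
                rc=1
            fi
        done
    done
    return $rc
}

# usage (run from the directory that contains "archive"):
#   check_archive archive && echo "archive looks complete"
```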

Citation

Shen, W., Ren, H., TaxonKit: a practical and efficient NCBI Taxonomy toolkit, Journal of Genetics and Genomics, https://doi.org/10.1016/j.jgg.2021.03.006

Contributing

We welcome pull requests, bug fixes and issue reports.

License

MIT License
