📜 the Great Automatic Nomenclator — The Next Million Names for Archaea and Bacteria
The Next Million Names for Archaea and Bacteria
Mark J. Pallen et al. The Next Million Names for Archaea and Bacteria, Trends in Microbiology (2020). DOI: 10.1016/j.tim.2020.10.009
To generate a large number of new names, we apply a combinatorial approach: starting from two or three sets of curated roots, the scripts produce every possible combination while keeping track of each root's grammatical metadata, so that a valid etymology can be drafted for every name.
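The combinatorial step can be illustrated with a minimal sketch. This is not the code in `gan-genus.py`: the placeholder roots, the field names (`root`, `meaning`) and the `combine` helper are assumptions made for illustration, and real name formation also involves connecting vowels and grammatical agreement that are ignored here.

```python
from itertools import product

# Hypothetical root tables: each entry carries the root plus the metadata
# needed to draft an etymology. Roots and field names are placeholders.
first = [
    {"root": "Aa", "meaning": "first meaning A"},
    {"root": "Bb", "meaning": "first meaning B"},
]
second = [
    {"root": "cc", "meaning": "second meaning C"},
    {"root": "dd", "meaning": "second meaning D"},
    {"root": "ee", "meaning": "second meaning E"},
]


def combine(*tables, connector=""):
    """Return one record per combination of roots, one root per table."""
    records = []
    for parts in product(*tables):
        # Join the roots into a candidate name (simplified: no grammar rules).
        name = connector.join(p["root"] for p in parts)
        # Keep the metadata of every part so an etymology can be drafted.
        etymology = "; ".join(f'{p["root"]}: {p["meaning"]}' for p in parts)
        records.append({"name": name, "etymology": etymology})
    return records


combos = combine(first, second)
print(len(combos))  # number of combinations = product of the table sizes
for record in combos:
    print(record["name"], "|", record["etymology"])
```

The number of records is the product of the table sizes, which is why a few hundred curated roots per table are enough to reach a very large number of candidate names.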
The scripts in this repository require Python (at least 3.6) and these modules:
- itertools (ships with Python)
- pandas (>1.0)
- xlrd (1.2.0)
To run the scripts in this repository, we suggest creating a conda environment as follows:
    conda create -c conda-forge -n gan python=3.8 pandas pip ipython
    conda activate gan
    pip install xlrd==1.2.0
A set of two (or three) Excel tables formatted as shown below is used to generate the list of combinations in JSON, HTML and LaTeX format.
Synopsis:
usage: gan-genus.py [-h] -1 FIRST -2 SECOND [-3 THIRD] -o OUTDIR [-p PREFIX] [-c CONNECTOR] [-v]
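As a reading aid, the sketch below builds an `argparse` parser that accepts the same options as the synopsis. It is not the parser used by `gan-genus.py`: the `dest` names, the help strings, the example file names and the interpretation of `-v` as a verbosity switch are all assumptions; only the option letters and metavars come from the usage line.

```python
import argparse


def build_parser():
    """Parser mirroring the synopsis; help texts are assumptions."""
    parser = argparse.ArgumentParser(prog="gan-genus.py")
    parser.add_argument("-1", dest="first", metavar="FIRST", required=True,
                        help="first table of roots (assumed meaning)")
    parser.add_argument("-2", dest="second", metavar="SECOND", required=True,
                        help="second table of roots (assumed meaning)")
    parser.add_argument("-3", dest="third", metavar="THIRD",
                        help="optional third table of roots (assumed meaning)")
    parser.add_argument("-o", dest="outdir", metavar="OUTDIR", required=True,
                        help="output directory (assumed meaning)")
    parser.add_argument("-p", dest="prefix", metavar="PREFIX",
                        help="prefix for output files (assumed meaning)")
    parser.add_argument("-c", dest="connector", metavar="CONNECTOR",
                        help="string joining the roots (assumed meaning)")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="verbose output (assumed meaning)")
    return parser


# Example invocation with hypothetical file names: two tables, no third one.
args = build_parser().parse_args(
    ["-1", "first.xlsx", "-2", "second.xlsx", "-o", "out"]
)
print(args.first, args.second, args.third, args.outdir)
```

The required options are the two input tables and the output directory; the bracketed options in the synopsis can be omitted.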
For full usage and installation instructions, please check the documentation.
Using three small files in the input_test directory (8, 11 and 8 words, respectively), GAN produced 968 (8 x 11 x 8) combinations:
"The Great Automatic Nomenclator" is a reference to a short story ("The Great Automatic Grammatizator") written by the British author Roald Dahl.