koskenni/pytwolc
Experimental two-level rule compilation using Python HFST. For more information, see https://github.com/hfst/python
The Python program twol.py is a rule compiler and tester for rules of the simplified two-level model; see https://pytwolc.readthedocs.io/en/latest/formalism.html for more information on the rule formalism and the compiler. The HFST package can be installed using the command:
$ python3 -m pip install hfst
The program twol.py uses and depends on the TatSu Python parser generator by Juancarlo Añez; see http://tatsu.readthedocs.io/en/stable/index.html for detailed documentation. You can download and install TatSu from the net using the command:
$ python3 -m pip install tatsu
The program is prepared to handle input in Unicode, including user-perceived graphemes which are combined out of two or more Unicode characters (so-called code points). In order to recognize such graphemes, an additional package has to be installed:
$ python3 -m pip install grapheme
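To illustrate what a grapheme "combined out of two or more code points" looks like, here is a minimal sketch that uses only the Python standard library. It is a rough approximation for illustration: it merely attaches combining marks to the preceding base character, whereas the grapheme package follows the full Unicode segmentation rules. The function name `split_simple_graphemes` is hypothetical and is not part of pytwolc or of the grapheme package.

```python
import unicodedata

def split_simple_graphemes(text):
    """Group each base character with the combining marks that follow it.

    Simplified illustration only; this is NOT the algorithm used by the
    grapheme package or by twol.py.
    """
    clusters = []
    for ch in text:
        # unicodedata.combining() returns a non-zero class for combining marks
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# 'a' followed by U+0308 COMBINING DIAERESIS is perceived as one grapheme
word = "ka\u0308si"
print(len(word))                          # number of code points
print(len(split_simple_graphemes(word)))  # number of perceived graphemes
```

The point of the sketch is that the plain string length counts code points, so any program that treats symbols as single characters would split such a grapheme in two.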
The compiler needs two files: (1) examples as an FST and (2) a rule file. The human-readable examples must be converted into an FST using the twexamp.py program.
The compiler is normally executed as follows:
$ python3 twol.py examples.fst rules.twolc
One can get more information by using the --help parameter. More documentation on twol.py can be found at https://pytwolc.readthedocs.io/en/latest/compiletest.html
The module twexamp.py handles various tasks for the compiler during the compilation process. It is also needed for converting human-readable examples into an FST so that it is not necessary to recompile them at every step of testing rules. A recompilation is only needed when the examples are changed. In order to convert examples from the pair string format into an FST, you can run e.g.:
$ python3 twexamp.py examples.pstr examples.fst
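Since the examples only need to be recompiled when they have changed, the check can be automated. The following is a minimal sketch, assuming that comparing file modification times is an adequate test for "changed"; the helper names `needs_rebuild` and `twexamp_command` are hypothetical and not part of pytwolc. The command line itself is the one shown above.

```python
import os

def needs_rebuild(source_path, target_path):
    """Return True if the target is missing or older than the source."""
    if not os.path.exists(target_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(target_path)

def twexamp_command(pstr_path, fst_path):
    """Build the twexamp.py command line as an argument list."""
    return ["python3", "twexamp.py", pstr_path, fst_path]

# Possible usage (file names are the ones used in this README):
#
#   import subprocess
#   if needs_rebuild("examples.pstr", "examples.fst"):
#       subprocess.run(twexamp_command("examples.pstr", "examples.fst"),
#                      check=True)
```

A Makefile rule achieves the same effect; the Python version may be convenient when the rule testing is driven from a script.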
The sequence of programs parad2words.py, words2zerofilled.py, zerofilled2raw.py and raw2named.py is intended for determining the underlying or morphophonemic representations for word stems. It starts from a table of word forms or paradigms where morphs are separated from each other, e.g. by a period (.). See https://pytwolc.readthedocs.io/en/latest/morphophon.html for more information on their use. Each program is run from the command line, and one can get detailed information on the parameters by running the command with a --help argument, e.g.
$ python3 words2zerofilled.py --help
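As a small illustration of the input convention mentioned above (morphs separated by a period), the following sketch splits a word form into its morphs. It only demonstrates the separator convention; it does not reproduce the table layout or the processing done by parad2words.py, and the function name `split_morphs` is hypothetical.

```python
def split_morphs(word_form, separator="."):
    """Split a word form such as 'talo.ssa' into its morphs."""
    return word_form.split(separator)

# Finnish 'talo.ssa' ("in the house"): stem 'talo' + inessive ending 'ssa'
print(split_morphs("talo.ssa"))
```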
Some of the programs in this sequence need the package orderedset, which one can get from the net by
$ python3 -m pip install orderedset
In particular, the zero-filling program needs the same package for handling combined graphemes as twol.py uses:
$ python3 -m pip install grapheme
There is a Makefile in the subdirectory parad and examples which may help in testing and using the programs.
This program builds tentative or raw rules out of a set of examples. The examples must be given one example per line as a space-separated list of symbol pairs. See https://pytwolc.readthedocs.io/en/latest/twdiscov.html for more information.
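To make the input convention concrete, here is a minimal sketch that reads one example line as a space-separated list of symbol pairs. It rests on two assumptions that should be checked against the linked documentation: that a pair is written as `input:output`, and that a bare symbol abbreviates an identical pair. The function name `parse_pair_string` and the sample line are made up for illustration and are not taken from pytwolc.

```python
def parse_pair_string(line):
    """Parse a space-separated list of symbol pairs into (input, output) tuples.

    Assumptions (not confirmed by this README): pairs are written as
    'input:output', and a bare symbol 'x' stands for the pair 'x:x'.
    """
    pairs = []
    for token in line.split():
        if ":" in token:
            insym, outsym = token.split(":", 1)
        else:
            insym = outsym = token
        pairs.append((insym, outsym))
    return pairs

# Hypothetical example line, invented for this sketch
print(parse_pair_string("k a {td}:d o n"))
```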