Do lookups on an FST in Python!
eddieantonio/fst-lookup
Implements lookup for Foma finite state transducers.
Supports Python 3.5 and up.
    pip install fst-lookup

Import the library, and load an FST from a file:

Hint: Test this module by downloading the eat FST!

    >>> from fst_lookup import FST
    >>> fst = FST.from_file('eat.fomabin')

fst_lookup assumes that the lower label corresponds to the surface form, while the upper label corresponds to the lemma and the linguistic tags and features. For example, your LEXC will look something like this (note what is on each side of the colon (:)):

    Multichar_Symbols +N +Sg +Pl

    Lexicon Root
    cow+N+Sg:cow #;
    cow+N+Pl:cows #;
    goose+N+Sg:goose #;
    goose+N+Pl:geese #;
    sheep+N+Sg:sheep #;
    sheep+N+Pl:sheep #;

If your FST has labels on the opposite sides (e.g., the upper label corresponds to the surface form and the lower label corresponds to the lemma and linguistic tags), then instantiate the FST by providing the labels="invert" keyword argument:

    fst = FST.from_file('eat-inverted.fomabin', labels="invert")

Hint: FSTs originating from the HFST suite are often inverted, so try loading the FST inverted first if .generate() or .analyze() aren't working correctly!
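
The label convention can be illustrated without an FST at all. The following is a toy sketch in plain Python, not how fst_lookup works internally: it treats the LEXC entries above as a list of (upper, lower) pairs and shows why lookups come back empty when the sides are swapped. The names `ENTRIES`, `toy_analyze`, `toy_generate`, and `invert` are made up for this illustration.

```python
# Toy model of the LEXC entries: each pair is (upper, lower),
# i.e. (lemma + tags, surface form).
ENTRIES = [
    ("cow+N+Sg", "cow"),
    ("cow+N+Pl", "cows"),
    ("goose+N+Sg", "goose"),
    ("goose+N+Pl", "geese"),
    ("sheep+N+Sg", "sheep"),
    ("sheep+N+Pl", "sheep"),
]


def toy_analyze(surface_form, entries=ENTRIES):
    """Match on the lower side and return every paired upper side."""
    return [upper for upper, lower in entries if lower == surface_form]


def toy_generate(analysis, entries=ENTRIES):
    """Match on the upper side and return every paired lower side."""
    return [lower for upper, lower in entries if upper == analysis]


def invert(entries):
    """Swap the two sides, mimicking an FST whose labels are inverted."""
    return [(lower, upper) for upper, lower in entries]


print(toy_analyze("sheep"))                     # ['sheep+N+Sg', 'sheep+N+Pl']
print(toy_generate("goose+N+Pl"))               # ['geese']
print(toy_analyze("geese", invert(ENTRIES)))    # [] because the sides are swapped
```

With the sides swapped, analyzing a surface form finds nothing, which is the symptom the hint above describes.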
To analyze a form (take a word form, and get its linguistic analyses), call the analyze() function:

    def analyze(self, surface_form: str) -> Iterator[Analysis]

This will yield all possible linguistic analyses produced by the FST.
An analysis is a tuple of strings. The strings are either linguistic tags, or the lemma (base form of the word).

FST.analyze() is a generator, so you must call list() to get a list.

    >>> list(sorted(fst.analyze('eats')))
    [('eat', '+N', '+Mass'), ('eat', '+V', '+3P', '+Sg')]
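
Since an analysis mixes the lemma and the tags in one tuple, it can be handy to separate them. Here is a minimal sketch that works on plain tuples like the ones shown above. It assumes that every tag starts with `+`, which holds for this example but is a convention of the FST, not something fst_lookup guarantees. The helper name `split_analysis` is hypothetical.

```python
def split_analysis(analysis, tag_prefix="+"):
    """Split an analysis tuple into (lemma, tags).

    Assumption: tags are exactly the strings starting with tag_prefix;
    everything else is treated as part of the lemma.
    """
    lemma = "".join(s for s in analysis if not s.startswith(tag_prefix))
    tags = tuple(s for s in analysis if s.startswith(tag_prefix))
    return lemma, tags


# The analyses from the example above, written out as literal tuples.
analyses = [('eat', '+N', '+Mass'), ('eat', '+V', '+3P', '+Sg')]

for analysis in analyses:
    lemma, tags = split_analysis(analysis)
    print(lemma, tags)
```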
To generate a form (take a linguistic analysis, and get its concrete word forms), call the generate() function:

    def generate(self, analysis: str) -> Iterator[str]

FST.generate() is a Python generator, so you must call list() to get a list.

    >>> list(fst.generate('eat+V+Past'))
    ['ate']

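
Note that generate() takes a single string, while analyze() yields tuples. A small sketch of bridging the two follows, assuming the analysis string is simply the lemma and tags concatenated in order, as in `'eat+V+Past'`. That matches this example, but other FSTs may order or delimit tags differently. The helper name `to_analysis_string` is hypothetical.

```python
def to_analysis_string(parts):
    """Concatenate a lemma and its tags into one string.

    Assumption: the FST expects the parts joined with no separator,
    in the same order as they appear in the tuple.
    """
    return "".join(parts)


print(to_analysis_string(('eat', '+V', '+Past')))       # eat+V+Past
print(to_analysis_string(('eat', '+V', '+3P', '+Sg')))  # eat+V+3P+Sg
```

The resulting string could then be passed to fst.generate().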
If you plan to contribute code, it is recommended you use Poetry. Fork and clone this repository, then install development dependencies by typing:

    poetry install

Then, do all your development within a virtual environment, managed by Poetry:

    poetry shell

This project uses mypy to check static types. To invoke it on this package, type the following:

    mypy -p fst_lookup

To run this project's tests, we use py.test:

    poetry run pytest

Building the C extension is handled in build.py.

To disable building the C extension, add the following line to .env:

    export FST_LOOKUP_BUILD_EXT=False

(by default, this is True).

To enable debugging flags while working on the C extension, add the following line to .env:

    export FST_LOOKUP_DEBUG=TRUE

(by default, this is False).

If you are creating or modifying existing test fixtures (i.e., mostly pre-built FSTs used for testing), you will need the following dependencies:

- GNU make
- Foma

Fixtures are stored in tests/data/. Here, you will use make to compile all pre-built FSTs from source:

    make

Copyright © 2019–2021 National Research Council Canada.
Licensed under the MIT license.