Do lookups on an FST in Python!

eddieantonio/fst-lookup

Implements lookup for Foma finite state transducers.

Supports Python 3.5 and up.

Install

pip install fst-lookup

Usage

Import the library, and load an FST from a file:

Hint: Test this module by downloading the eat FST!

>>> from fst_lookup import FST
>>> fst = FST.from_file('eat.fomabin')

Assumed format of the FSTs

fst_lookup assumes that the lower label corresponds to the surface form, while the upper label corresponds to the lemma plus linguistic tags and features. For example, your LEXC will look something like this (note what is on each side of the colon, :):

Multichar_Symbols +N +Sg +Pl
Lexicon Root
    cow+N+Sg:cow #;
    cow+N+Pl:cows #;
    goose+N+Sg:goose #;
    goose+N+Pl:geese #;
    sheep+N+Sg:sheep #;
    sheep+N+Pl:sheep #;
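To make the upper/lower convention concrete, the LEXC entries above can be pictured as (upper, lower) string pairs. The sketch below is a plain-Python illustration of that relation only; it does not use fst_lookup, and the names PAIRS, analyze_toy, and generate_toy are hypothetical.

```python
# Toy model of the relation encoded by the LEXC above (not fst_lookup itself).
# Each pair is (upper, lower): upper = lemma + tags, lower = surface form.
PAIRS = [
    ("cow+N+Sg", "cow"),
    ("cow+N+Pl", "cows"),
    ("goose+N+Sg", "goose"),
    ("goose+N+Pl", "geese"),
    ("sheep+N+Sg", "sheep"),
    ("sheep+N+Pl", "sheep"),
]


def analyze_toy(surface_form):
    """Lower -> upper: every analysis whose surface form matches."""
    return [upper for upper, lower in PAIRS if lower == surface_form]


def generate_toy(analysis):
    """Upper -> lower: every surface form for the given analysis."""
    return [lower for upper, lower in PAIRS if upper == analysis]


print(analyze_toy("sheep"))        # ['sheep+N+Sg', 'sheep+N+Pl']
print(generate_toy("goose+N+Pl"))  # ['geese']
```

Note that "sheep" has two analyses, which is why a lookup returns a collection rather than a single answer.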

If your FST has labels on the opposite sides (e.g., the upper label corresponds to the surface form and the lower label corresponds to the lemma and linguistic tags), then instantiate the FST by providing the labels="invert" keyword argument:

fst = FST.from_file('eat-inverted.fomabin', labels="invert")

Hint: FSTs originating from the HFST suite are often inverted, so try loading the FST inverted first if .generate() or .analyze() aren't working correctly!
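The hint can be automated with a small helper that loads the FST in the default orientation and falls back to labels="invert" when a known word yields no analyses. This is a sketch, not part of fst_lookup: load_with_fallback and probe_word are hypothetical names, and it assumes that a mis-oriented FST simply yields nothing for the probe word (it might instead raise, in which case adapt the check).

```python
def load_with_fallback(load, path, probe_word):
    """Try the default label orientation first, then labels="invert".

    `load` is expected to behave like FST.from_file; `probe_word` is a
    surface form that the FST should be able to analyze.
    """
    for kwargs in ({}, {"labels": "invert"}):
        fst = load(path, **kwargs)
        # analyze() is a generator, so any() stops at the first analysis.
        if any(True for _ in fst.analyze(probe_word)):
            return fst
    raise ValueError(
        f"no analyses for {probe_word!r} in either label orientation"
    )
```

With the library installed, a call might look like load_with_fallback(FST.from_file, 'eat.fomabin', 'eats').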

Analyze a word form

To analyze a form (take a word form, and get its linguistic analyses), call the analyze() method:

def analyze(self, surface_form: str) -> Iterator[Analysis]

This will yield all possible linguistic analyses produced by the FST.

An analysis is a tuple of strings. The strings are either linguistic tags, or the lemma (base form of the word).

FST.analyze() is a generator, so you must call list() to get a list.

>>> list(sorted(fst.analyze('eats')))
[('eat', '+N', '+Mass'), ('eat', '+V', '+3P', '+Sg')]
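Because each analysis is a flat tuple, it is often handy to separate the lemma from the tags. The helper below is a minimal sketch, not part of fst_lookup; split_analysis is a hypothetical name, and it assumes that tags always start with "+", as they do in the example output above.

```python
def split_analysis(analysis):
    """Split an analysis tuple into (lemma, tags).

    Assumption: tags start with "+", as in the example output above.
    Strings that are not tags are joined to form the lemma.
    """
    tags = [part for part in analysis if part.startswith("+")]
    lemma = "".join(part for part in analysis if not part.startswith("+"))
    return lemma, tags


# The two analyses shown above for 'eats'.
analyses = [("eat", "+N", "+Mass"), ("eat", "+V", "+3P", "+Sg")]
for analysis in analyses:
    lemma, tags = split_analysis(analysis)
    print(lemma, tags)
```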

Generate a word form

To generate a form (take a linguistic analysis, and get its concrete word forms), call the generate() method:

def generate(self, analysis: str) -> Iterator[str]

FST.generate() is a Python generator, so you must call list() to get a list.

>>> list(fst.generate('eat+V+Past'))
['ate']
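Note the asymmetry: analyze() yields tuples, while generate() takes a single string. In the examples here, concatenating the tuple's parts reproduces the analysis string, so a thin wrapper can bridge the two. This is a sketch under that assumption; generate_from_tuple and StubFST are hypothetical names, and StubFST only stands in for a loaded FST so that the snippet runs on its own.

```python
def generate_from_tuple(fst, analysis):
    """Concatenate an analysis tuple and pass it to fst.generate()."""
    return list(fst.generate("".join(analysis)))


class StubFST:
    """Stand-in with the same generate() shape, for demonstration only."""

    def generate(self, analysis):
        # Mirrors the single example above; a real FST covers far more.
        if analysis == "eat+V+Past":
            yield "ate"


print(generate_from_tuple(StubFST(), ("eat", "+V", "+Past")))  # ['ate']
```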

Contributing

If you plan to contribute code, it is recommended you use Poetry. Fork and clone this repository, then install development dependencies by typing:

poetry install

Then, do all your development within a virtual environment, managed by Poetry:

poetry shell

Type-checking

This project uses mypy to check static types. To invoke it on this package, type the following:

mypy -p fst_lookup

Running tests

To run this project's tests, we use pytest:

poetry run pytest

C Extension

Building the C extension is handled in build.py.

To disable building the C extension, add the following line to .env:

export FST_LOOKUP_BUILD_EXT=False

(by default, this is True).

To enable debugging flags while working on the C extension, add the following line to .env:

export FST_LOOKUP_DEBUG=TRUE

(by default, this is False).
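The two examples above spell their values with different capitalization (False and TRUE), which suggests the flags are read case-insensitively, though that is not stated here. The snippet below is a hypothetical sketch of how a build script could read such flags; it is not the contents of build.py, and env_flag, TRUTHY, and FALSY are names invented for illustration.

```python
import os

# Assumed spellings; the real build script may accept a different set.
TRUTHY = {"1", "true", "yes", "on"}
FALSY = {"0", "false", "no", "off"}


def env_flag(name, default, environ=None):
    """Read a boolean flag from the environment, case-insensitively."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in TRUTHY:
        return True
    if value in FALSY:
        return False
    raise ValueError(f"{name} must be a boolean-like value, got {raw!r}")


# Defaults follow the text above: build the extension, no debug flags.
build_ext = env_flag("FST_LOOKUP_BUILD_EXT", default=True)
debug = env_flag("FST_LOOKUP_DEBUG", default=False)
```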

Fixtures

If you are creating or modifying existing test fixtures (i.e., mostly pre-built FSTs used for testing), you will need the following dependencies:

Fixtures are stored in tests/data/. Here, you will use make to compile all pre-built FSTs from source:

make

License

Copyright © 2019–2021 National Research Council Canada.

Licensed under the MIT license.
