Examples
After installing HFST on your computer, start Python. For example, the following simple program

    import hfst
    tr1 = hfst.regex('foo:bar')
    tr2 = hfst.regex('bar:baz')
    tr1.compose(tr2)
    print(tr1)

should print to standard output the following text when run:

    0	1	foo	baz	0
    1	0

Download a Finnish lexicon text file from:
http://hfst.github.io/downloads/finntreebank.lexc
Then start Python and execute:

    import hfst
    hfst.set_default_fst_type(hfst.ImplementationType.FOMA_TYPE)
    tr = hfst.compile_lexc_file('finntreebank.lexc')
    tr.invert()
    tr.convert(hfst.ImplementationType.HFST_OL_TYPE)

You may also download some precompiled lexicons for various languages from
https://sourceforge.net/projects/hfst/files/resources/morphological-transducers/
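
The two operations used so far, `compose` and `invert`, can be pictured on plain sets of string pairs. The following is a minimal sketch in pure Python that does not use the `hfst` package at all; the helper names `compose_pairs` and `invert_pairs` are hypothetical and exist only for this illustration. It assumes that the compiled lexicon has analyses on the input side and surface forms on the output side, which would explain why the example above inverts it before looking up words.

```python
# A transducer over a finite language modelled as a set of (input, output) pairs.
# Illustration only: these helpers are not part of the hfst API.

def compose_pairs(first, second):
    """Map a to c whenever first maps a to b and second maps b to c."""
    return {(a, c) for (a, b) in first for (b2, c) in second if b == b2}

def invert_pairs(relation):
    """Swap the input and output side of every pair."""
    return {(b, a) for (a, b) in relation}

# Same relations as in the first example: foo:bar composed with bar:baz.
tr1 = {('foo', 'bar')}
tr2 = {('bar', 'baz')}
print(compose_pairs(tr1, tr2))

# Assumed orientation of a lexicon: analysis on the input side, surface form on the output side.
lexicon = {('testi<N><sg><nom>', 'testi')}
analyzer = invert_pairs(lexicon)

# After inversion, a lookup can start from the surface form.
print([out for (inp, out) in analyzer if inp == 'testi'])
```

Real transducers represent possibly infinite relations as state machines, so this set-based view only conveys what the operations mean, not how HFST implements them.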
You can try out the Finnish lexicon with some words:

    import sys
    for line in sys.stdin:
        print(tr.lookup(line.replace('\n',''), output='text'))

Try the word "testi" and you should get something like:

    testi<N><sg><nom>	0.000000

Try a non-word "xtesti" and you get something like:

    0

An example of creating a simple transducer from scratch, converting between transducer formats, testing transducer properties and handling exceptions:

    import hfst

    # Create an HFST basic transducer [a:b] with transition weight 0.3 and final weight 0.5.
    t = hfst.HfstBasicTransducer()
    t.add_state(1)
    t.add_transition(0, 1, 'a', 'b', 0.3)
    t.set_final_weight(1, 0.5)

    # Convert to tropical OpenFst format (the default) and push weights toward final state.
    T = hfst.HfstTransducer(t)
    T.push_weights_to_end()

    # Convert back to HFST basic transducer.
    tc = hfst.HfstBasicTransducer(T)
    try:
        # Tropical weights add along a path, so the pushed final weight
        # should be 0.3 + 0.5 = 0.8. Rounding might affect the precision.
        if (0.79 < tc.get_final_weight(1)) and (tc.get_final_weight(1) < 0.81):
            print("TEST PASSED")
            exit(0)
        else:
            print("TEST FAILED")
            exit(1)
    # If the state does not exist or is not final
    except hfst.exceptions.HfstException as e:
        print("TEST FAILED: An exception was thrown.")
        exit(1)

An example of creating transducers from strings, applying rules to them and printing the string pairs recognized by the resulting transducer:

    import hfst
    hfst.set_default_fst_type(hfst.ImplementationType.FOMA_TYPE) # we use foma implementation as there are no weights involved

    # Create a simple lexicon transducer [[foo bar foo] | [foo bar baz]].
    tok = hfst.HfstTokenizer()
    tok.add_multichar_symbol('foo')
    tok.add_multichar_symbol('bar')
    tok.add_multichar_symbol('baz')
    words = hfst.tokenized_fst(tok.tokenize('foobarfoo'))
    t = hfst.tokenized_fst(tok.tokenize('foobarbaz'))
    words.disjunct(t)

    # Create a rule transducer that optionally replaces 'bar' with 'baz' between 'foo' and 'foo'.
    rule = hfst.regex('bar (->) baz || foo _ foo')

    # Apply the rule transducer to the lexicon.
    words.compose(rule)
    words.minimize()

    # Extract all string pairs from the result and print them to standard output.
    results = 0
    try:
        # Extract paths and remove tokenization
        results = words.extract_paths(output='dict')
    except hfst.exceptions.TransducerIsCyclicException as e:
        # This should not happen because transducer is not cyclic.
        print("TEST FAILED")
        exit(1)

    for input,outputs in results.items():
        print('%s:' % input)
        for output in outputs:
            print(' %s\t%f' % (output[0], output[1]))

The output:

    foobarfoo:
     foobarfoo	0.000000
     foobazfoo	0.000000
    foobarbaz:
     foobarbaz	0.000000

Python's built-in `help` command is useful for finding information on the package, a class in it or a given function in a class:

    help(hfst)
    help(hfst.HfstTransducer)
    help(hfst.HfstTransducer.lookup)
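
To see why the rule example above yields two outputs for `foobarfoo` but only one for `foobarbaz`, the optional replacement can be imitated on token lists. This is a rough pure-Python sketch, not the way HFST compiles rules: the function `optional_replace` is a hypothetical helper, and it assumes that both contexts are matched on the input side and that the target is a single symbol.

```python
from itertools import product

def optional_replace(tokens, target, replacement, left, right):
    """Return every output of optionally rewriting target between left and right."""
    choices = []
    for i, tok in enumerate(tokens):
        in_context = (
            tok == target
            and i > 0 and tokens[i - 1] == left
            and i + 1 < len(tokens) and tokens[i + 1] == right
        )
        # An optional rule keeps the original symbol and also offers the replacement.
        choices.append([tok, replacement] if in_context else [tok])
    # Every combination of per-position choices is one output string.
    return [''.join(path) for path in product(*choices)]

# The two words of the lexicon, already split into multicharacter symbols.
for word in (['foo', 'bar', 'foo'], ['foo', 'bar', 'baz']):
    print(''.join(word) + ':')
    for out in optional_replace(word, 'bar', 'baz', 'foo', 'foo'):
        print(' %s' % out)
```

In `foobarbaz` the symbol `bar` is followed by `baz` rather than `foo`, so the context does not match and the word passes through unchanged.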