PackageHfst

eaxelson edited this page Feb 8, 2018 · 33 revisions

Package hfst

...

List of contents of package hfst

| Item | Description |
| --- | --- |
| class AttReader | A class for reading input in AT&T text format and converting it into transducer(s). |
| class PrologReader | A class for reading input in prolog text format and converting it into transducer(s). |
| class HfstBasicTransducer | A simple transducer class with tropical weights. |
| class HfstBasicTransition | A transition class that consists of a target state, input and output symbols and a tropical weight. |
| class HfstTransducer | A synchronous finite-state transducer. |
| class HfstInputStream | A stream for reading HFST binary transducers. |
| class HfstOutputStream | A stream for writing HFST binary transducers. |
| class MultiCharSymbolTrie | ??? |
| class HfstTokenizer | A tokenizer for creating transducers from UTF-8 strings. |
| class LexcCompiler | A compiler holding information contained in lexc style lexicons. |
| class XreCompiler | A regular expression compiler. |
| class PmatchContainer | A class for performing pattern matching. |
| class ImplementationType | Back-end implementations. |
| set_default_fst_type | Set default transducer implementation type. |
| get_default_fst_type | Get default transducer implementation type. |
| fst_type_to_string | Get a string representation of transducer implementation type. |
| EPSILON | The string for epsilon symbol. |
| UNKNOWN | The string for unknown symbol. |
| IDENTITY | The string for identity symbol. |
| fst | Get a transducer that recognizes one or more paths. |
| fst_to_fsa | Get an automaton representation of a transducer. |
| fsa_to_fst | Get a transducer representation of an automaton. |
| tokenized_fst | Get a transducer that recognizes the concatenation of symbols or symbol pairs. |
| empty_fst | Get an empty transducer. |
| epsilon_fst | Get an epsilon transducer. |
| regex | Get a transducer as defined by regular expression. |
| compile_sfst_file | Compile sfst file into a transducer. |
| compile_lexc_file | Compile lexc file into a transducer. |
| compile_xfst_file | Compile (is 'run' a better term?) xfst file. |
| compile_pmatch_file | Compile pmatch expressions as defined in a file and return a tuple of transducers. |
| compile_twolc_file | Compile twolc file and store the result to file. |
| compile_pmatch_expression | Compile a pmatch expression into a tuple of transducers. |
| start_xfst | Start interactive xfst compiler. |
| read_att_input | Read AT&T input from the user and return a transducer. |
| read_att_string | Read a multiline AT&T string and return a transducer. |
| read_att_transducer | Read next transducer from file in AT&T format. |
| read_prolog_transducer | Read next transducer from file in prolog format. |
| read_prolog_transducer | Read next transducer from file in prolog format keeping track of lines. |
| concatenate | Return a concatenation of transducers. |
| disjunct | Return a union of transducers. |
| intersect | Return an intersection of transducers. |
| compose | Return a composition of transducers. |
| cross_product | Return a cross product of transducers. |
| is_diacritic | Whether a symbol is a flag diacritic. |

set_default_fst_type (impl)

Set default transducer implementation type.

Set the implementation type (SFST_TYPE, TROPICAL_OPENFST_TYPE, FOMA_TYPE) that is used by default by all operations that create transducers. The default value is TROPICAL_OPENFST_TYPE.


get_default_fst_type ()

Get default transducer implementation type.

If the default type is not set, it defaults to TROPICAL_OPENFST_TYPE.


fst_type_to_string (type)

Get a string representation of the transducer implementation type passed as type.


EPSILON='@_EPSILON_SYMBOL_@'

The string for epsilon symbol.

An example:

    import hfst

    fsm = hfst.HfstBasicTransducer()
    fsm.add_state(1)
    fsm.set_final_weight(1, 2.0)
    fsm.add_transition(0, 1, "foo", hfst.EPSILON)
    if not hfst.HfstTransducer(fsm).compare(hfst.regex('foo:0::2.0')):
        raise RuntimeError('')

Note: In regular expressions, "0" is used for the epsilon.

See also: Symbols


UNKNOWN='@_UNKNOWN_SYMBOL_@'

The string for unknown symbol.

An example:

    import hfst

    fsm = hfst.HfstBasicTransducer()
    fsm.add_state(1)
    fsm.set_final_weight(1, -0.5)
    fsm.add_transition(0, 1, "foo", hfst.UNKNOWN)
    fsm.add_transition(0, 1, "foo", "foo")
    if not hfst.HfstTransducer(fsm).compare(hfst.regex('foo:?::-0.5')):
        raise RuntimeError('')

Note: In regular expressions, "?" on either or both sides of a transition is used for the unknown symbol.

See also: Symbols


IDENTITY='@_IDENTITY_SYMBOL_@'

The string for identity symbol.

An example:

    import hfst

    fsm = hfst.HfstBasicTransducer()
    fsm.add_state(1)
    fsm.set_final_weight(1, 1.5)
    fsm.add_transition(0, 1, hfst.IDENTITY, hfst.IDENTITY)
    if not hfst.HfstTransducer(fsm).compare(hfst.regex('?::1.5')):
        raise RuntimeError('')

Note: In regular expressions, a single "?" is used for the identity symbol.

See also: Symbols


fst (arg)

Get a transducer that recognizes one or more paths.

  • arg See example below

Possible inputs:

One unweighted identity path:

    'foo'  ->  [f o o]

Weighted path: a tuple of string and number, e.g.

    ('foo',1.4)
    ('bar',-3)
    ('baz',0)

Several paths: a list or a tuple of paths and/or weighted paths, e.g.

    ['foo', 'bar']
    ('foo', ('bar',5.0))
    ('foo', ('bar',5.0), 'baz', 'Foo', ('Bar',2.4))
    [('foo',-1), ('bar',0), ('baz',3.5)]

A dictionary mapping strings to any of the above cases:

    {'foo':'foo', 'bar':('foo',1.4), 'baz':(('foo',-1),'BAZ')}

fst_to_fsa (fst, separator='')

Get a transducer (automaton) where each transition symbol pair isymbol:osymbol of fst is replaced with a transition isymbolosymbol:isymbolosymbol, adding separator between isymbol and osymbol.

  • fst The transducer.
  • separator The separator symbol inserted between input and output symbols.

Examples

    import hfst
    foo2bar = hfst.fst({'foo':'bar'})

creates a transducer [f:b o:a o:r]. Calling

    foobar = hfst.fst_to_fsa(foo2bar)

will create the transducer [fb:fb oa:oa or:or] and

    foobar = hfst.fst_to_fsa(foo2bar, '^')

the transducer [f^b:f^b o^a:o^a o^r:o^r].

See also: hfst.fsa_to_fst
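The relabeling described above can be pictured on bare symbol pairs. The following is a minimal sketch in plain Python that does not call hfst at all; the helper name pairs_to_fsa_symbols is hypothetical and only mirrors the symbol fusion, not the construction of states and transitions.

```python
def pairs_to_fsa_symbols(pairs, separator=''):
    """Fuse each (isymbol, osymbol) pair into one automaton symbol.

    Hypothetical helper for illustration only; hfst.fst_to_fsa works on
    transducers, not on lists of pairs.
    """
    return [isymbol + separator + osymbol for isymbol, osymbol in pairs]


# Transition labels of [f:b o:a o:r], the transducer built from {'foo':'bar'}.
foo2bar = [('f', 'b'), ('o', 'a'), ('o', 'r')]

print(pairs_to_fsa_symbols(foo2bar))       # ['fb', 'oa', 'or']
print(pairs_to_fsa_symbols(foo2bar, '^'))  # ['f^b', 'o^a', 'o^r']
```

Each fused symbol then labels both the input and the output side of its transition, which is what makes the result an automaton.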


fsa_to_fst (fsa, separator='')

Get a transducer where each transition isymbolSosymbol:isymbolSosymbol of fsa is replaced with a transition isymbol:osymbol, if separator is S.

  • fsa The transducer. Must be an automaton, i.e. for each transition, the input and output symbols must be the same. Otherwise, a TransducerIsNotAutomatonException is thrown.
  • separator The symbol separating input and output symbol parts in fsa. If it is the empty string, the length of each symbol in fsa (excluding special symbols of form "@...@") must be exactly 2. Otherwise, a RuntimeError is thrown.

Examples:

    import hfst
    foo2bar = hfst.fst({'foo':'bar'})  # creates transducer [f:b o:a o:r]
    foobar = hfst.fst_to_fsa(foo2bar, '^')

creates the transducer [f^b:f^b o^a:o^a o^r:o^r]. Then calling

    foo2bar = hfst.fsa_to_fst(foobar, '^')

will create again the original transducer [f:b o:a o:r].

See also: hfst.fst_to_fsa
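The splitting rule, including the length-2 requirement for an empty separator, can be sketched on bare symbols in plain Python. The helper split_fsa_symbol is hypothetical. Three details are assumptions made for the sketch rather than documented behaviour: special "@...@" symbols are passed through unchanged, the split happens at the first occurrence of the separator, and a symbol lacking the separator raises RuntimeError.

```python
import re

# Special symbols of the form "@...@" are exempt from the length check.
SPECIAL_SYMBOL = re.compile(r'@.+@')


def split_fsa_symbol(symbol, separator=''):
    """Split one fused automaton symbol back into (isymbol, osymbol).

    Hypothetical helper for illustration only; hfst.fsa_to_fst works on
    transducers, not on single symbols.
    """
    if SPECIAL_SYMBOL.fullmatch(symbol):
        # Assumption: special symbols are kept as they are on both sides.
        return (symbol, symbol)
    if separator == '':
        if len(symbol) != 2:
            raise RuntimeError('symbol %r must have length 2' % symbol)
        return (symbol[0], symbol[1])
    isymbol, found, osymbol = symbol.partition(separator)
    if not found:
        # Assumption: a missing separator is treated as an error.
        raise RuntimeError('separator %r not in %r' % (separator, symbol))
    return (isymbol, osymbol)


foobar = ['f^b', 'o^a', 'o^r']
print([split_fsa_symbol(s, '^') for s in foobar])
# [('f', 'b'), ('o', 'a'), ('o', 'r')]
```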

tokenized_fst (arg, weight=0)

Get a transducer that recognizes the concatenation of symbols or symbol pairs in arg.

  • arg The symbols or symbol pairs that form the path to be recognized.

Example

    import hfst
    tok = hfst.HfstTokenizer()
    tok.add_multichar_symbol('foo')
    tok.add_multichar_symbol('bar')
    tr = hfst.tokenized_fst(tok.tokenize('foobar', 'foobaz'))

will create the transducer [foo:foo bar:b 0:a 0:z].
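To see where the pairs in the example come from, the sketch below reproduces them in plain Python without hfst. It assumes greedy longest-match tokenization and position-by-position pairing with epsilon padding on the shorter side. That assumption is consistent with the result shown above, but it is not taken from the HfstTokenizer implementation, and the function names are hypothetical.

```python
from itertools import zip_longest

EPSILON = '@_EPSILON_SYMBOL_@'


def tokenize(text, multichar_symbols):
    """Split text left to right, preferring the longest known multichar symbol."""
    symbols = sorted((s for s in multichar_symbols if s), key=len, reverse=True)
    tokens = []
    position = 0
    while position < len(text):
        # Fall back to a single character when no multichar symbol matches.
        token = next((s for s in symbols if text.startswith(s, position)),
                     text[position])
        tokens.append(token)
        position += len(token)
    return tokens


def tokenize_pair(input_text, output_text, multichar_symbols):
    """Pair input and output tokens by position, padding with epsilon."""
    input_tokens = tokenize(input_text, multichar_symbols)
    output_tokens = tokenize(output_text, multichar_symbols)
    return list(zip_longest(input_tokens, output_tokens, fillvalue=EPSILON))


pairs = tokenize_pair('foobar', 'foobaz', {'foo', 'bar'})
print(pairs)
```

Here 'foobar' splits into foo and bar, while 'foobaz' splits into foo, b, a and z because 'baz' is not a known multichar symbol, so the last two output symbols are paired with epsilon.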

empty_fst ()

Get an empty transducer.

An empty transducer has one state that is not final, i.e. it does not recognize any string.


epsilon_fst (weight=0)

Get an epsilon transducer.

An epsilon transducer has one state that is final (with final weight weight), i.e. it recognizes the empty string.

  • weight The weight of the final state.

regex (regexp, **kwargs)

Get a transducer as defined by regular expression regexp.

  • regexp The regular expression defined with Xerox transducer notation.
  • kwargs Arguments recognized are: error.
  • error Where warnings and errors are printed. Possible values are sys.stdout, sys.stderr (the default), a StringIO or None, indicating a quiet mode.

Regular expression operators:

    ~   complement
    \   term complement
    &   intersection
    -   minus
    $.  contains once
    $?  contains optionally
    $   contains once or more
    ( ) optionality
    +   Kleene plus
    *   Kleene star
    ./. ignore internally (not yet implemented)
    /   ignoring
    |   union
    <>  shuffle
    <   before
    >   after
    .o.   composition
    .O.   lenient composition
    .m>.  merge right
    .<m.  merge left
    .x.   cross product
    .P.   input priority union
    .p.   output priority union
    .-u.  input minus
    .-l.  output minus
    `[ ]  substitute
    ^n,k  catenate from n to k times, inclusive
    ^>n   catenate more than n times
    ^<n   catenate less than n times
    ^n    catenate n times
    .r   reverse
    .i   invert
    .u   input side
    .l   output side
    \\\  left quotient

Two-level rules:

    \<=   left restriction
    <=>   left and right arrow
    <=    left arrow
    =>    right arrow

Replace rules:

    ->    replace right
    (->)  optionally replace right
    <-    replace left
    (<-)  optionally replace left
    <->   replace left and right
    (<->) optionally replace left and right
    @->   left-to-right longest match
    @>    left-to-right shortest match
    ->@   right-to-left longest match
    >@    right-to-left shortest match

Rule contexts, markers and separators:

    ||   match contexts on input sides
    //   match left context on output side and right context on input side
    \\   match left context on input side and right context on output side
    \/   match contexts on output sides
    _    center marker
    ...  markup marker
    ,,   rule separator in parallel rules
    ,    context separator
    [. .]  match epsilons only once

Read from file:

    @bin" "  read binary transducer
    @txt" "  read transducer in att text format
    @stxt" " read spaced text
    @pl" "   read transducer in prolog text format
    @re" "   read regular expression

Symbols:

    .#.  word boundary symbol in replacements, restrictions
    0    the epsilon
    ?    any token
    %    escape character
    { }  concatenate symbols
    " "  quote symbol
    :    pair separator
    ::   weight
    ;    end of expression
    !    starts a comment until end of line
    #    starts a comment until end of line

compile_sfst_file (filename, **kwargs)

Compile sfst file filename into a transducer.

  • filename The name of the sfst file.
  • kwargs Arguments recognized are: verbose, output.
  • verbose Whether sfst file is processed in verbose mode, defaults to False.
  • output TODO: Where output is printed. Possible values are sys.stdout, sys.stderr, a StringIO, sys.stderr being the default.

Return: On success the resulting transducer, else None.

compile_lexc_file (filename, **kwargs)

Compile lexc file filename into a transducer.

  • filename The name of the lexc file.
  • kwargs Arguments recognized are: verbosity, with_flags, output.
  • verbosity The verbosity of the compiler, defaults to 0 (silent). Possible values are: 0, 1, 2.
  • with_flags Whether lexc flags are used when compiling, defaults to False.
  • output Where output is printed. Possible values are sys.stdout, sys.stderr, a StringIO, sys.stderr being the default?

Return: On success the resulting transducer, else None.

compile_xfst_file (filename, **kwargs)

Compile (is 'run' a better term?) xfst file filename.

  • filename The name of the xfst file.
  • kwargs Arguments recognized are: verbosity, quit_on_fail, output, type.
  • verbosity The verbosity of the compiler, defaults to 0 (silent). Possible values are: 0, 1, 2.
  • quit_on_fail Whether the script is exited on any error, defaults to True.
  • output Where output is printed. Possible values are sys.stdout, sys.stderr, a StringIO, sys.stderr being the default?
  • type Implementation type of the compiler, defaults to hfst.get_default_fst_type().

Return: On success 0, else an integer greater than 0.

compile_pmatch_file (filename)

Compile pmatch expressions as defined in filename and return a tuple of transducers.

An example:

If we have a file named streets.txt that contains:

    define CapWord UppercaseAlpha Alpha* ;
    define StreetWordFr [{avenue} | {boulevard} | {rue}] ;
    define DeFr [ [{de} | {du} | {des} | {de la}] Whitespace ] | [{d'} | {l'}] ;
    define StreetFr StreetWordFr (Whitespace DeFr) CapWord+ ;
    regex StreetFr EndTag(FrenchStreetName) ;

we can run:

    defs = hfst.compile_pmatch_file('streets.txt')
    cont = hfst.PmatchContainer(defs)
    assert cont.match("Je marche seul dans l'avenue des Ternes.") == "Je marche seul dans l'<FrenchStreetName>avenue des Ternes</FrenchStreetName>."

See also: hfst.PmatchContainer.match, hfst.PmatchContainer.__init__


compile_twolc_file (inputfilename, outputfilename, **kwargs)

Compile twolc file inputfilename and store the result to file outputfilename.

  • inputfilename The name of the twolc input file.
  • outputfilename The name of the transducer output file.
  • kwargs Arguments recognized are: silent, verbose, resolve_right_conflicts, resolve_left_conflicts, type.
  • silent Whether compilation is performed in silent mode, defaults to False.
  • verbose Whether compilation is performed in verbose mode, defaults to False.
  • resolve_right_conflicts Whether right arrow conflicts are resolved, defaults to True.
  • resolve_left_conflicts Whether left arrow conflicts are resolved, defaults to False.
  • type Implementation type of the compiler, defaults to hfst.get_default_fst_type().

Return: On success zero, else an integer other than zero.

compile_pmatch_expression (expr)

Compile a pmatch expression into a tuple of transducers.

  • expr A string defining how pmatch is done.

See also: hfst.compile_pmatch_file


start_xfst (**kwargs)

Start interactive xfst compiler.

  • kwargs Arguments recognized are: type, quit_on_fail.
  • quit_on_fail Whether the compiler exits on any error, defaults to False.
  • type Implementation type of the compiler, defaults to hfst.get_default_fst_type().

See also: command line tool hfst-xfst


read_att_input ()

Read AT&T input from the user and return a transducer.

Return: An HfstTransducer whose type is hfst.get_default_fst_type().

Read one AT&T line at a time from standard input and finally return an equivalent transducer. An empty line signals the end of input.


read_att_string (att)

Read a multiline string att and return a transducer.

  • att A string in AT&T format that defines the transducer.

Return: An HfstTransducer whose type is hfst.get_default_fst_type().

Read att and create a transducer as defined in it.


read_att_transducer (f, epsilonstr=hfst.EPSILON)

Read next transducer from the AT&T file pointed to by f. epsilonstr defines the symbol used for epsilon in the file.

  • f A python file.
  • epsilonstr How epsilon is represented in the file. By default, "@_EPSILON_SYMBOL_@" and "@0@" are both recognized.

If the file contains several transducers, they must be separated by "--" lines.

In AT&T format, the transition lines are of the form:

[0-9]+[\w]+[0-9]+[\w]+[^\w]+[\w]+[^\w]([\w]+(-)[0-9]+(\.[0-9]+))

and final state lines:

[0-9]+[\w]+([\w]+(-)[0-9]+(\.[0-9]+))

If several transducers are listed in the same file, they are separated by lines of two consecutive hyphens "--". If the weight

([\w]+(-)[0-9]+(\.[0-9]+))

is missing, the transition or final state is given a zero weight.

NOTE: If transition symbols contain spaces, they must be escaped as '@_SPACE_@' because spaces are used as field separators. Both '@0@' and '@_EPSILON_SYMBOL_@' are always interpreted as epsilons.
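The line format can be made concrete with a small stand-alone reader. The following is a rough sketch in plain Python, not the hfst parser: the function parse_att and its return shape (a list of (transitions, finals) tuples) are hypothetical, fields are split on any whitespace, blank lines are ignored, and the escape spelling '@_SPACE_@' is assumed.

```python
EPSILON = '@_EPSILON_SYMBOL_@'


def parse_att(text, epsilonstr=EPSILON):
    """Parse AT&T text into a list of (transitions, finals) tuples.

    Hypothetical helper for illustration only. A transition is
    (source, target, isymbol, osymbol, weight); finals maps state to weight.
    """
    def symbol(field):
        # '@0@' and '@_EPSILON_SYMBOL_@' always count as epsilon.
        if field in (epsilonstr, '@0@', EPSILON):
            return EPSILON
        return field.replace('@_SPACE_@', ' ')

    def weight(fields, index):
        # A missing weight field means zero weight.
        return float(fields[index]) if len(fields) > index else 0.0

    transducers, transitions, finals = [], [], {}
    for line in text.splitlines():
        fields = line.split()
        if fields == ['--']:
            # Separator line: close the current transducer, start a new one.
            transducers.append((transitions, finals))
            transitions, finals = [], {}
        elif len(fields) in (4, 5):
            transitions.append((int(fields[0]), int(fields[1]),
                                symbol(fields[2]), symbol(fields[3]),
                                weight(fields, 4)))
        elif len(fields) in (1, 2):
            finals[int(fields[0])] = weight(fields, 1)
        elif fields:
            raise ValueError('not valid AT&T format: %r' % line)
    transducers.append((transitions, finals))
    return transducers


sample = """\
0 1 foo bar 0.3
1 0.5
--
0 1 a <eps>
1
"""
parsed = parse_att(sample, '<eps>')
print(len(parsed))   # 2
print(parsed[1])
```

In the second transducer of the sample, both the transition and the final state omit the weight, so both receive weight 0.0, and '<eps>' is mapped to the epsilon string.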

An example:

    0      1      foo      bar      0.3
    1      0.5
    --
    0      0.0
    --
    --
    0      0.0
    0      0      a        <eps>    0.2

The example lists four transducers in AT&T format: one transducer accepting the string pair <'foo','bar'>, one epsilon transducer, one empty transducer and one transducer that accepts any number of 'a's and produces an empty string in all cases. The transducers can be read with the following commands (from a file named 'testfile.att'):

    import hfst

    transducers = []
    ifile = open('testfile.att', 'r')
    try:
        while (True):
            t = hfst.read_att_transducer(ifile, '<eps>')
            transducers.append(t)
            print("read one transducer")
    except hfst.exceptions.NotValidAttFormatException as e:
        print("Error reading transducer: not valid AT&T format.")
    except hfst.exceptions.EndOfStreamException as e:
        pass  # end of file reached, all transducers have been read
    ifile.close()
    print("Read %i transducers in total" % len(transducers))

Epsilon will be represented as hfst.EPSILON in the resulting transducer. The argument epsilonstr only denotes how epsilons are represented in ifile.

Known bugs: Empty transducers are in theory represented as empty strings in AT&T format. However, this sometimes results in them getting interpreted as end-of-file. To avoid this, use an empty line instead, i.e. a single newline character.

Throws:

See also: #write_att


read_prolog_transducer (f)

Read next transducer from the prolog file pointed to by f.

  • f A python file.

If the file contains several transducers, they must be separated by empty lines.


read_prolog_transducer (f, linecount=[0])

Create a transducer as defined in prolog format in file f. linecount keeps track of the current line in the file.


concatenate (transducers)

Return a concatenation of transducers.

  • transducers An iterable object of transducers.

disjunct (transducers)

Return a union of transducers.

  • transducers An iterable object of transducers.

intersect (transducers)

Return an intersection of transducers.

  • transducers An iterable object of transducers.

compose (transducers)

Return a composition of transducers.

  • transducers An iterable object of transducers.

cross_product (transducers)

Return a cross product of transducers.

  • transducers An iterable object of transducers.

is_diacritic (symbol)

Whether symbol symbol is a flag diacritic.

Flag diacritics are of the form

@[PNDRCU][.][A-Z]+([.][A-Z]+)?@
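The pattern can be tried out directly with Python's re module. The sketch below tests only the documented form. The helper name matches_flag_pattern is hypothetical, and hfst.is_diacritic itself may accept more or fewer symbols than this pattern does.

```python
import re

# The documented flag diacritic form: operator, feature, optional value.
FLAG_DIACRITIC = re.compile(r'@[PNDRCU][.][A-Z]+([.][A-Z]+)?@')


def matches_flag_pattern(symbol):
    """Return True if the whole symbol has the documented flag diacritic form."""
    return FLAG_DIACRITIC.fullmatch(symbol) is not None


for candidate in ('@P.CASE.NOM@', '@R.CASE@', '@_EPSILON_SYMBOL_@', 'foo'):
    print(candidate, matches_flag_pattern(candidate))
```

The first two candidates match, while the epsilon symbol and the plain string 'foo' do not.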