HfstTransducer

eaxelson edited this page Feb 8, 2018 · 25 revisions

class HfstTransducer

A synchronous finite-state transducer.


Argument handling

Transducer functions modify their calling object and return a reference to the calling object after modification, unless otherwise mentioned. Transducer arguments are usually not modified.

    # transducer is reversed
    transducer.reverse()
    # transducer2 is not modified, but a copy of it is disjuncted with
    # transducer1
    transducer1.disjunct(transducer2)
    # a chain of functions is possible
    transducer.reverse().determinize().reverse().determinize()

Implementation types

Currently, an HfstTransducer has three implementation types that are well supported. When an HfstTransducer is created, its type is defined with an argument. For functions that take a transducer as an argument, the type of the calling transducer must be the same as the type of the argument transducer:

    # this will cause a TransducerTypeMismatchException:
    tropical_transducer.disjunct(foma_transducer)
    # this works, but weights are lost in the conversion
    tropical_transducer.convert(hfst.ImplementationType.SFST_TYPE).disjunct(sfst_transducer)
    # this works, information is not lost
    tropical_transducer.disjunct(sfst_transducer.convert(hfst.ImplementationType.TROPICAL_OPENFST_TYPE))

Creating transducers

With HfstTransducer constructors it is possible to create empty, epsilon, one-transition and single-path transducers. Transducers can also be created from scratch with HfstBasicTransducer and converted to an HfstTransducer. More complex transducers can be combined from simple ones with various functions.

is_implementation_type_available (type)

Whether HFST is linked to the transducer library needed by implementation type type.


__init__ (self)

Create an empty transducer.

    tr = hfst.HfstTransducer()
    assert(tr.compare(hfst.empty_fst()))

__init__ (self, another)

Create a deep copy of HfstTransducer another or a transducer equivalent to HfstBasicTransducer another.

  • another An HfstTransducer or HfstBasicTransducer.

An example:

    tr1 = hfst.regex('foo bar foo')
    tr2 = hfst.HfstTransducer(tr1)
    tr2.substitute('foo','FOO')
    tr1.concatenate(tr2)

__init__ (self, t, type)

Create an HFST transducer equivalent to HfstBasicTransducer t. The type of the created transducer is defined by type.

  • t An HfstBasicTransducer.
  • type The type of the resulting transducer.

If you want to use the default type, you can just call hfst.HfstTransducer(t).


copy (self)

Return a deep copy of the transducer.

    tr = hfst.regex('[foo:bar::0.3]*')
    TR = tr.copy()
    assert(tr.compare(TR))

set_name (self, name)

Rename the transducer name.

  • name The name of the transducer.

See also: get_name


get_name (self)

Get the name of the transducer.

See also: set_name


__str__ (self)

An AT&T representation of the transducer.

Defined for print command. An example:

    >>> print(hfst.regex('[foo:bar::2]+'))
    0       1       foo     bar     2.000000
    1       1       foo     bar     2.000000
    1       0.000000

Todo: Works only for small transducers.


prune (self)

Make transducer coaccessible.

A transducer is coaccessible iff there is a path from every state to a final state.
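The definition can be tried out on a toy graph without HFST. The following is a minimal plain-Python sketch, not HFST's implementation: it assumes a hypothetical encoding where a transducer is a dict from each state to its successor states plus a set of final states, and it computes the states from which some final state is reachable. Pruning would then amount to discarding every other state.

```python
from collections import deque

def coaccessible_states(transitions, finals):
    """Return the states from which some final state can be reached.

    transitions: dict mapping a state to an iterable of successor states
                 (a hypothetical toy encoding, not an HFST structure).
    finals: set of final states.
    """
    # Build the reversed graph so we can search backwards from final states.
    predecessors = {}
    for source, targets in transitions.items():
        for target in targets:
            predecessors.setdefault(target, set()).add(source)

    reached = set(finals)
    queue = deque(finals)
    while queue:
        state = queue.popleft()
        for previous in predecessors.get(state, ()):
            if previous not in reached:
                reached.add(previous)
                queue.append(previous)
    return reached

# State 3 is a dead end: no final state can be reached from it.
toy_transitions = {0: [1, 3], 1: [2], 2: [], 3: [3]}
toy_finals = {2}
kept = coaccessible_states(toy_transitions, toy_finals)
```

In this toy graph, state 3 only loops to itself, so it is the one state a pruning step would remove.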


set_property (self, property, value)

Set arbitrary string property property to value.

  • property A string naming the property.
  • value A string expressing the value of property.

Note: set_property('name', 'name of the transducer') equals set_name('name of the transducer').

Note: While this function is capable of creating endless amounts of arbitrary metadata, it is suggested that property names are drawn from a central repository, or prefixed with "x-". A property that does not follow this convention may affect the behavior of the transducer in future releases.


get_property (self, property)

Get arbitrary string property property.

  • property The name of the property whose value is returned.

get_property('name') works like get_name().


get_properties (self)

Get all properties from the transducer.

Return: A dictionary whose keys are properties and whose values are the values of those properties.


get_alphabet (self)

Get the alphabet of the transducer.

The alphabet is defined as the set of symbols known to the transducer.

Return: A tuple of strings.


insert_to_alphabet (self, symbol)

Explicitly insert symbol to the alphabet of the transducer.

  • symbol The symbol (string) to be inserted.

Note: Usually this function is not needed since new symbols are added to the alphabet by default.


remove_from_alphabet (self, symbol)

Remove symbol from the alphabet of the transducer.

  • symbol The symbol (string) to be removed.

Precondition: symbol does not occur in any transition of the transducer.

Note: Use with care, removing a symbol that occurs in a transition of the transducer can have unexpected results.


eliminate_flag (self, symbol)

Eliminate flag diacritic symbol from the transducer.

  • symbol The flag to be eliminated. TODO: explain more.

Return: An equivalent transducer with no flags symbol.


eliminate_flags (self, symbols)

Eliminate flag diacritics listed in symbols from the transducer.

  • symbols The flags to be eliminated. TODO: explain more.

Return: An equivalent transducer with no flags listed in symbols.


is_automaton (self)

Whether each transition in the transducer has equivalent input and output symbols.

Note: Transition with hfst.UNKNOWN on both sides IS NOT a transition with equivalent input and output symbols.

Note: Transition with hfst.IDENTITY on both sides IS a transition with equivalent input and output symbols.
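The rule and its two notes can be restated as a small check over symbol pairs. This is a plain-Python sketch rather than HFST code: the list-of-pairs encoding is hypothetical, and the two string constants are assumed stand-ins for the values of hfst.UNKNOWN and hfst.IDENTITY, which should be verified against an installed HFST.

```python
# Assumed string values of hfst.UNKNOWN and hfst.IDENTITY; check them
# against your HFST installation before relying on them.
UNKNOWN = '@_UNKNOWN_SYMBOL_@'
IDENTITY = '@_IDENTITY_SYMBOL_@'

def is_automaton(transitions):
    """transitions: iterable of (input_symbol, output_symbol) pairs
    (a hypothetical toy encoding, not an HFST structure)."""
    for isymbol, osymbol in transitions:
        if isymbol != osymbol:
            return False
        # Per the note above, unknown:unknown does not count as a pair of
        # equivalent symbols, even though both labels are the same string.
        if isymbol == UNKNOWN:
            return False
    return True
```

The identity symbol needs no special case here, because identity:identity already passes the equality test.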


is_cyclic (self)

Whether the transducer is cyclic.


get_type (self)

The implementation type of the transducer.

Return: hfst.ImplementationType


compare (self, another)

Whether this transducer and another are equivalent.

  • another The compared transducer.

Precondition: self and another must have the same implementation type.

Two transducers are equivalent iff they accept the same input/output string pairs with the same weights and the same alignments.

Note: For weighted transducers, the function often returns false negatives due to weight precision issues.


remove_epsilons (self)

Remove all epsilon:epsilon transitions from the transducer so that the resulting transducer is equivalent to the original one.


determinize (self)

Determinize the transducer.

Determinizing a transducer yields an equivalent transducer that has no state with two or more transitions whose input:output symbol pairs are the same.


number_of_states (self)

The number of states in the transducer.


number_of_arcs (self)

The number of transitions in the transducer.


write (self, ostr)

Write the transducer in binary format to ostr.

  • ostr A hfst.HfstOutputStream where the transducer is written.

write_att (self, f, write_weights=True)

Write the transducer in AT&T format to file f. write_weights defines whether weights are written.

  • f A python file where transducer is written.
  • write_weights Whether weights are written.

write_prolog (self, f, name, write_weights=True)

Write the transducer in prolog format with name name to file f. write_weights defines whether weights are written.

  • f A python file where the transducer is written.
  • name The name of the transducer that must be given in a prolog file.
  • write_weights Whether weights are written.

minimize (self)

Minimize the transducer.

Minimizing a transducer yields an equivalent transducer with the smallest number of states.

Known bugs: OpenFst's minimization algorithm seems to add epsilon transitions to weighted transducers?


n_best (self, n)

Extract n best paths of the transducer.

In the case of a weighted transducer (TROPICAL_OPENFST_TYPE or LOG_OPENFST_TYPE), best paths are defined as paths with the lowest weight. In the case of an unweighted transducer (SFST_TYPE or FOMA_TYPE), the function returns random paths.

This function is not implemented for FOMA_TYPE or SFST_TYPE. If this function is called by an HfstTransducer of type FOMA_TYPE or SFST_TYPE, it is converted to TROPICAL_OPENFST_TYPE, paths are extracted and it is converted back to FOMA_TYPE or SFST_TYPE. If HFST is not linked to OpenFst library, an hfst.exceptions.ImplementationTypeNotAvailableException is thrown.
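What "best" means for weighted paths can be shown on a plain list. The sketch below does not use HFST and says nothing about how HFST finds the paths; it assumes a hypothetical encoding of each path as a (weight, input_string, output_string) tuple and simply keeps the n paths with the lowest weight.

```python
import heapq

def n_best(paths, n):
    """Keep the n lowest-weight paths.

    paths: iterable of (weight, input_string, output_string) tuples
           (a hypothetical toy encoding, not an HFST structure).
    """
    return heapq.nsmallest(n, paths, key=lambda path: path[0])

toy_paths = [(2.5, 'cat', 'CAT'), (0.5, 'cat', 'Cat'), (1.0, 'cat', 'cat')]
best_two = n_best(toy_paths, 2)
```

Here best_two holds the paths weighted 0.5 and 1.0, in that order.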


repeat_star (self)

A concatenation of N transducers where N is any number from zero to infinity.


repeat_plus (self)

A concatenation of N transducers where N is any number from one to infinity.


repeat_n (self, n)

A concatenation of n transducers.


repeat_n_minus (self, n)

A concatenation of N transducers where N is any number from zero to n, inclusive.


repeat_n_plus (self, n)

A concatenation of N transducers where N is any number from n to infinity, inclusive.


repeat_n_to_k (self, n, k)

A concatenation of N transducers where N is any number from n to k, inclusive.
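The bounded repetition functions can be modelled on finite sets of strings. The sketch below is plain Python, not HFST: a "language" is a hypothetical stand-in for a transducer, and the three helper names merely mirror the method names. The unbounded variants (repeat_star, repeat_plus, repeat_n_plus) are left out because their results are infinite sets.

```python
from itertools import product

def repeat_n(language, n):
    """Concatenate n strings drawn from language (a finite set of strings)."""
    return {''.join(parts) for parts in product(language, repeat=n)}

def repeat_n_to_k(language, n, k):
    """Union of repeat_n for every count from n to k, inclusive."""
    result = set()
    for count in range(n, k + 1):
        result |= repeat_n(language, count)
    return result

def repeat_n_minus(language, n):
    """Zero to n repetitions, inclusive."""
    return repeat_n_to_k(language, 0, n)

toy_language = {'ab', 'c'}
```

Zero repetitions give the set holding only the empty string, which is why repeat_n_minus always accepts the empty string in this model.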


optionalize (self)

Disjunct the transducer with an epsilon transducer.


invert (self)

Swap the input and output symbols of each transition in the transducer.


reverse (self)

Reverse the transducer.

A reversed transducer accepts the string 'n(0) n(1) ... n(N)' iff the original transducer accepts the string 'n(N) n(N-1) ... n(0)'.


input_project (self)

Extract the input language of the transducer.

All transition symbol pairs isymbol:osymbol are changed to isymbol:isymbol.


output_project (self)

Extract the output language of the transducer.

All transition symbol pairs isymbol:osymbol are changed to osymbol:osymbol.
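The symbol-pair rewrites described for invert, input_project and output_project can be written out directly. This is a plain-Python sketch on a hypothetical list of (isymbol, osymbol) pairs; unlike the HFST methods it returns new lists and ignores states and weights.

```python
def invert(transitions):
    """Swap the input and output symbol of every pair."""
    return [(osymbol, isymbol) for isymbol, osymbol in transitions]

def input_project(transitions):
    """Change every isymbol:osymbol into isymbol:isymbol."""
    return [(isymbol, isymbol) for isymbol, _ in transitions]

def output_project(transitions):
    """Change every isymbol:osymbol into osymbol:osymbol."""
    return [(osymbol, osymbol) for _, osymbol in transitions]

toy_transitions = [('foo', 'bar'), ('baz', 'baz')]
```

In this model, projecting the output side gives the same result as inverting first and then projecting the input side.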


compose (self, another)

Compose this transducer with another.

  • another The second argument in the composition. Not modified.

lenient_composition (self, another)

Perform a lenient composition on this transducer and another. TODO: explain more.


compose_intersect (self, v, invert=False)

Compose this transducer with the intersection of transducers in v. If invert is true, then compose the intersection of the transducers in v with this transducer.

The algorithm used by this function is faster than intersecting all transducers one by one and then composing this transducer with the intersection.

Precondition: The transducers in v are deterministic and epsilon-free.

  • v A tuple of transducers.
  • invert Whether the intersection of the transducers in v is composed with this transducer.

concatenate (self, another)

Concatenate this transducer with another.


disjunct (self, another)

Disjunct this transducer with another.


intersect (self, another)

Intersect this transducer with another.


subtract (self, another)

Subtract transducer another from this transducer.


minus (self, another)

Alias for subtract.

See also: subtract


conjunct (self, another)

Alias for intersect.

See also: intersect


convert (self, type, options='')

Convert the transducer into an equivalent transducer in format type.

If a weighted transducer is converted into an unweighted one, all weights are lost. In the reverse case, all weights are initialized to the semiring's one.

A transducer of type SFST_TYPE, TROPICAL_OPENFST_TYPE, LOG_OPENFST_TYPE or FOMA_TYPE can be converted into an HFST_OL_TYPE or HFST_OLW_TYPE transducer, but an HFST_OL_TYPE or HFST_OLW_TYPE transducer cannot be converted to any other type.

Note: For conversion between HfstBasicTransducer and HfstTransducer, see HfstTransducer.__init__ and HfstBasicTransducer.__init__.


write_att (self, ofile, write_weights=True)

Write the transducer in AT&T format to file ofile. write_weights defines whether weights are written.

The fields in the resulting AT&T format are separated by tabulator characters.

NOTE: If the transition symbols contain space characters, the spaces are printed as '@_SPACE_@' because whitespace characters are used as field separators in AT&T format. Epsilon symbols are printed as '@0@'.

If several transducers are written in the same file, they must be separated by a line of two consecutive hyphens "--", so that they will be read correctly by hfst.read_att.

An example:

    tr1 = hfst.regex('[foo:bar baz:0 " "]::0.3')
    tr2 = hfst.empty_fst()
    tr3 = hfst.epsilon_fst(0.5)
    tr4 = hfst.regex('[foo]')
    tr5 = hfst.empty_fst()

    f = hfst.hfst_open('testfile.att', 'w')
    for tr in [tr1, tr2, tr3, tr4]:
        tr.write_att(f)
        f.write('--\n')
    tr5.write_att(f)
    f.close()

This will yield a file 'testfile.att' that looks as follows:

    0       1       foo     bar     0.299805
    1       2       baz     @0@     0.000000
    2       3       @_SPACE_@       @_SPACE_@       0.000000
    3       0.000000
    --
    --
    0       0.500000
    --
    0       1       foo     foo     0.000000
    1       0.000000
    --
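To make the layout of such a file concrete, here is a minimal plain-Python reader for it. It is a sketch and not a replacement for hfst.read_att: the function name parse_att and the dict layout are hypothetical, and the only assumptions are the ones stated above (tab-separated fields, "--" between transducers) plus the shape of the listing, where a transition line has four or five fields and a final-state line has one or two.

```python
def parse_att(text):
    """Split AT&T text into a list of transducers.

    Each transducer is a dict with 'transitions', a list of
    (source, target, input, output, weight) tuples, and 'finals', a dict
    from state to final weight. A missing weight is read as 0.0.
    """
    transducers = []
    current = {'transitions': [], 'finals': {}}
    for line in text.splitlines():
        if line == '--':
            # Separator line: close the current transducer, start a new one.
            transducers.append(current)
            current = {'transitions': [], 'finals': {}}
            continue
        if not line.strip():
            continue
        fields = line.split('\t')
        if len(fields) in (4, 5):
            source, target, isymbol, osymbol = fields[:4]
            weight = float(fields[4]) if len(fields) == 5 else 0.0
            current['transitions'].append(
                (int(source), int(target), isymbol, osymbol, weight))
        elif len(fields) in (1, 2):
            weight = float(fields[1]) if len(fields) == 2 else 0.0
            current['finals'][int(fields[0])] = weight
        else:
            raise ValueError('unexpected AT&T line: %r' % line)
    # Whatever follows the last separator is the last transducer.
    transducers.append(current)
    return transducers

# The listing above, with real tab characters as field separators.
sample = (
    '0\t1\tfoo\tbar\t0.299805\n'
    '1\t2\tbaz\t@0@\t0.000000\n'
    '2\t3\t@_SPACE_@\t@_SPACE_@\t0.000000\n'
    '3\t0.000000\n'
    '--\n'
    '--\n'
    '0\t0.500000\n'
    '--\n'
    '0\t1\tfoo\tfoo\t0.000000\n'
    '1\t0.000000\n'
    '--\n'
)
parsed = parse_att(sample)
```

The sample yields five entries, matching tr1 to tr5: the two empty transducers appear as entries with no transitions and no final states.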

Throws:

See also: HfstOutputStream.write


write_att (self, filename, write_weights=True)

Write the transducer in AT&T format to file named filename. write_weights defines whether weights are written.

If the file exists, it is overwritten. If the file does not exist, it is created.


priority_union (self, another)

Make priority union of this transducer with another.

For the operation t1.priority_union(t2), the result is a union of t1 and t2, except that whenever t1 and t2 have the same string on left side, the path in t2 overrides the path in t1.

Example

    Transducer 1 (t1):
    a : a
    b : b

    Transducer 2 (t2):
    b : B
    c : C

    Result ( t1.priority_union(t2) ):
    a : a
    b : B
    c : C

For more information, read fsmbook.
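The override rule can be checked on the example with a few lines of plain Python. This sketch models each transducer as a hypothetical set of (input_string, output_string) pairs; it ignores weights and does not use HFST.

```python
def priority_union(t1, t2):
    """Union of t1 and t2 where t2 wins for every shared input string.

    t1, t2: sets of (input_string, output_string) pairs
            (a hypothetical toy encoding, not an HFST structure).
    """
    inputs_in_t2 = {istring for istring, _ in t2}
    # Keep a pair from t1 only if t2 says nothing about its input string.
    kept_from_t1 = {pair for pair in t1 if pair[0] not in inputs_in_t2}
    return kept_from_t1 | set(t2)

t1 = {('a', 'a'), ('b', 'b')}
t2 = {('b', 'B'), ('c', 'C')}
result = priority_union(t1, t2)
```

The pair b:b from t1 is dropped because t2 also maps b, so result contains a:a, b:B and c:C.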


cross_product (self, another)

Make cross product of this transducer with another. It pairs every string of this with every string of another. If strings are not the same length, epsilon padding will be added in the end of the shorter string.

Precondition: Both transducers must be automata, i.e. map strings onto themselves.
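The pairing and padding can be illustrated on finite sets of symbol sequences. The following plain-Python sketch does not use HFST; the tuple encoding is hypothetical, and '@0@' is used only as a placeholder epsilon label, borrowed from the AT&T notation.

```python
from itertools import product, zip_longest

EPSILON = '@0@'  # placeholder epsilon label for this sketch only

def cross_product(language1, language2):
    """Pair every symbol sequence of language1 with every one of language2.

    Each language is a set of tuples of symbols. The result is a set of
    paths, each path a tuple of (input_symbol, output_symbol) pairs.
    """
    paths = set()
    for upper, lower in product(language1, language2):
        # zip_longest pads the shorter sequence at its end with EPSILON.
        paths.add(tuple(zip_longest(upper, lower, fillvalue=EPSILON)))
    return paths

toy_paths = cross_product({('c', 'a', 't')}, {('k', 'o')})
```

Pairing the three-symbol sequence with the two-symbol one gives the single path c:k a:o t:@0@.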


shuffle (self, another)

Shuffle this transducer with transducer another.

If transducer A accepts string 'foo' and transducer B string 'bar', the transducer that results from shuffling A and B accepts all strings [(f|b)(o|a)(o|r)].

Precondition: Both transducers must be automata, i.e. map strings onto themselves.


insert_freely (self, ins)

Freely insert a transition or a transducer into the transducer.

  • ins The transition or transducer to be inserted.

If ins is a transition, i.e. a 2-tuple of strings: A transition is added to each state in this transducer. The transition leads from that state to itself with input and output symbols defined by ins. The weight of the transition is zero.

If ins is an HfstTransducer: A copy of ins is attached with epsilon transitions to each state of this transducer. After the operation, for each state S in this transducer, there is an epsilon transition that leads from state S to the initial state of ins, and for each final state of ins, there is an epsilon transition that leads from that final state to state S in this transducer. The weights of the final states in ins are copied to the epsilon transitions leading to state S.


set_final_weights (self, weight)

Set the weights of all final states to weight. If the HfstTransducer is of unweighted type (SFST_TYPE or FOMA_TYPE), nothing is done.


push_weights_to_start (self)

Push weights towards initial state.

If the HfstTransducer is of unweighted type (SFST_TYPE or FOMA_TYPE), nothing is done.

An example:

    >>> import hfst
    >>> tr = hfst.regex('[a::1 a:b::0.3 (b::0)]::0.7;')
    >>> tr.push_weights_to_start()
    >>> print(tr)
    0       1       a       a       2.000000
    1       2       a       b       0.000000
    2       3       b       b       0.000000
    2       0.000000
    3       0.000000

See also: push_weights_to_end

push_weights_to_end (self)

Push weights towards final state(s).

If the HfstTransducer is of unweighted type (SFST_TYPE or FOMA_TYPE), nothing is done.

An example:

    >>> import hfst
    >>> tr = hfst.regex('[a::1 a:b::0.3 (b::0)]::0.7;')
    >>> tr.push_weights_to_end()
    >>> print(tr)
    0       1       a       a       0.000000
    1       2       a       b       0.000000
    2       3       b       b       0.000000
    2       2.000000
    3       2.000000

See also: push_weights_to_start


substitute (self, s, S=None, **kwargs)

Substitute symbols or transitions in the transducer.

  • s The symbol or transition to be substituted. Can also be a dictionary of substitutions, if S == None.
  • S The symbol, transition, a tuple of transitions or a transducer (hfst.HfstTransducer) that substitutes s.
  • kwargs Arguments recognized are 'input' and 'output', their values can be False or True, True being the default. These arguments are valid only if s and S are strings, else they are ignored.
  • input Whether substitution is performed on input side, defaults to True. Valid only if s and S are strings.
  • output Whether substitution is performed on output side, defaults to True. Valid only if s and S are strings.

For more information, see HfstBasicTransducer.substitute. The function works similarly, with the exception of argument S, which must be an HfstTransducer instead of HfstBasicTransducer.

See also: HfstBasicTransducer.substitute


lookup_optimize (self)

Optimize the transducer for lookup. This effectively converts the transducer into HFST_OL_TYPE.


lookup (self, input, **kwargs)

Lookup string input.

  • input The input. A string or a pre-tokenized tuple of symbols (i.e. a tuple of strings).
  • kwargs Possible parameters and their default values are: obey_flags=True, max_number=-1, time_cutoff=0.0, output='tuple'
  • obey_flags Whether flag diacritics are obeyed. Always True for HFST_OL(W)_TYPE transducers.
  • max_number Maximum number of results returned, defaults to -1, i.e. infinity.
  • time_cutoff How long the function can search for results before returning, expressed in seconds. Defaults to 0.0, i.e. infinitely. Always 0.0 for transducers that are not of HFST_OL(W)_TYPE.
  • output Possible values are 'tuple', 'text' and 'raw', 'tuple' being the default.

Note: This function has an efficient implementation only for optimized lookup format (HFST_OL_TYPE or HFST_OLW_TYPE). Other formats perform the lookup via composition. Consider converting the transducer to optimized lookup format or to a HfstBasicTransducer. Conversion to HFST_OL(W)_TYPE might take a while but the lookup is fast. Conversion to HfstBasicTransducer is quick but lookup is slower.


remove_optimization (self)

Remove lookup optimization. This effectively converts transducer (back) into default fst type.


extract_paths (self, **kwargs)

Extract paths that are recognized by the transducer.

  • kwargs Arguments recognized are filter_flags, max_cycles, max_number, obey_flags, output, random.
  • filter_flags Whether flags diacritics are filtered out from the result (default True).
  • max_cycles Indicates how many times a cycle will be followed, with negative numbers indicating unlimited (default -1 i.e. unlimited).
  • max_number The total number of resulting strings is capped at this value, with 0 or negative indicating unlimited (default -1 i.e. unlimited).
  • obey_flags Whether flag diacritics are validated (default True).
  • output Output format. Values recognized: 'text', 'raw', 'dict' (the default). 'text' returns a string where paths are separated by newlines and each path is represented as input_string + ":" + output_string + "\t" + weight. 'raw' yields a tuple of all paths where each path is a 2-tuple consisting of a weight and a tuple of all transition symbol pairs, each symbol pair being a 2-tuple of an input and an output symbol. 'dict' gives a dictionary that maps each input string into a list of possible outputs, each output being a 2-tuple of an output string and a weight.
  • random Whether result strings are fetched randomly (default False).

Return: The extracted strings. output controls how they are represented.

Precondition: The transducer must be acyclic if both max_number and max_cycles have unlimited values. Otherwise a hfst.exceptions.TransducerIsCyclicException will be thrown.

An example:

    >>> tr = hfst.regex('a:b+ (a:c+)')
    >>> print(tr)
    0       1       a       b       0.000000
    1       1       a       b       0.000000
    1       2       a       c       0.000000
    1       0.000000
    2       2       a       c       0.000000
    2       0.000000
    >>> print(tr.extract_paths(max_cycles=1, output='text'))
    a:b     0
    aa:bb   0
    aaa:bbc 0
    aaaa:bbcc       0
    aa:bc   0
    aaa:bcc 0
    >>> print(tr.extract_paths(max_number=4, output='text'))
    a:b     0
    aa:bc   0
    aaa:bcc 0
    aaaa:bccc       0
    >>> print(tr.extract_paths(max_cycles=1, max_number=4, output='text'))
    a:b     0
    aa:bb   0
    aa:bc   0
    aaa:bcc 0
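The relation between the 'raw' and 'dict' shapes described for the output argument can be sketched in plain Python. The helper raw_to_dict below is hypothetical and does not call HFST; the epsilon constant is an assumed value of hfst.EPSILON, and dropping epsilons when building strings is likewise an assumption about how the string forms are produced.

```python
EPSILON = '@_EPSILON_SYMBOL_@'  # assumed value of hfst.EPSILON

def raw_to_dict(raw_paths):
    """Regroup 'raw'-shaped paths into the 'dict' shape.

    raw_paths: tuple of (weight, symbol_pairs) where symbol_pairs is a
               tuple of (input_symbol, output_symbol) 2-tuples.
    Returns a dict from input string to a list of (output_string, weight).
    """
    result = {}
    for weight, symbol_pairs in raw_paths:
        istring = ''.join(i for i, _ in symbol_pairs if i != EPSILON)
        ostring = ''.join(o for _, o in symbol_pairs if o != EPSILON)
        result.setdefault(istring, []).append((ostring, weight))
    return result

# Three hand-written paths shaped like the 'raw' output for the example.
raw = (
    (0.0, (('a', 'b'),)),
    (0.0, (('a', 'b'), ('a', 'c'))),
    (0.0, (('a', 'b'), ('a', 'b'))),
)
as_dict = raw_to_dict(raw)
```

Both two-symbol paths share the input string 'aa', so they end up in the same list under that key.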

Throws:

See also: HfstTransducer.n_best

Note: Special symbols are printed as such.

Todo: a link to flag diacritics


extract_shortest_paths (self)

Extract shortest paths of the transducer.

Return: A dictionary.


extract_longest_paths (self, **kwargs)

Extract longest paths of the transducer.

Return: A dictionary.


longest_path_size (self, **kwargs)

Get length of longest path of the transducer.


lookup (self, input, limit=-1)

Lookup or apply a single tokenized string tok_input and return a maximum of limit results.

TODO: This is a version of lookup that handles flag diacritics as ordinary symbols and does not validate the sequences prior to outputting. Currently, this function calls lookup_fd.

Todo: Handle flag diacritics as ordinary symbols instead of calling lookup_fd.

See also: lookup_fd

Return: HfstOneLevelPaths pointer

  • tok_input A tuple of consecutive symbols (strings).
  • limit Number of strings to look up. -1 tries to look up all and may get stuck if infinitely ambiguous.

Lookup or apply a single string input and return a maximum of limit results.

This is an overloaded lookup function that leaves tokenizing to the transducer.

Return: HfstOneLevelPaths pointer


lookup (self, tok, input, limit=-1)

Lookup or apply a single string input and return a maximum of limit results. tok defines how input is tokenized.

This is an overloaded lookup function that leaves tokenizing to tok.

Return: HfstOneLevelPaths pointer


lookup_fd (self, tok_input, limit = -1)

Lookup or apply a single string tok_input minding flag diacritics properly and return a maximum of limit results.

Traverse all paths on logical first level of the transducer to produce all possible outputs on the second. This is in effect a fast composition of single path from left hand side.

This is a version of lookup that handles flag diacritics as epsilons and validates the sequences prior to outputting. Epsilons on the second level are represented by empty strings in the results.

Precondition: The transducer must be of type HFST_OL_TYPE or HFST_OLW_TYPE. This function is not implemented for other transducer types.

  • tok_input A tuple of consecutive symbols (strings) to look up.
  • limit (Currently ignored.) Number of strings to look up. -1 tries to look up all and may get stuck if infinitely ambiguous.

See also: is_lookup_infinitely_ambiguous

Return: HfstOneLevelPaths pointer

Todo: Do not ignore argument limit.


lookup_fd (self, input, limit = -1)

Lookup or apply a single string input minding flag diacritics properly and return a maximum of limit results.

This is an overloaded lookup function that leaves tokenizing to the transducer.

Return: HfstOneLevelPaths pointer


lookup_fd (self, tok, input, limit = -1)

Lookup or apply a single string input minding flag diacritics properly and return a maximum of limit results. tok defines how input is tokenized.

This is an overloaded lookup function that leaves tokenizing to tok.

Return: HfstOneLevelPaths pointer


is_lookup_infinitely_ambiguous (self, tok_input)

Whether lookup of path tok_input will have infinite results.

Currently, this function will return whether the transducer is infinitely ambiguous on any lookup path found in the transducer, i.e. the argument tok_input is ignored.

Todo: Do not ignore the argument tok_input.


is_infinitely_ambiguous (self)

Whether the transducer is infinitely ambiguous.

A transducer is infinitely ambiguous if there exists an input that will yield infinitely many results, i.e. there are input epsilon loops that are traversed with that input.


has_flag_diacritics (self)

Whether the transducer has flag diacritics in its transitions.
