
PmatchContainer

eaxelson edited this page May 14, 2018 · 17 revisions

class PmatchContainer

A class for performing pattern matching.

Probably the easiest way to perform pattern matching is with the functions hfst.compile_pmatch_expression and hfst.compile_pmatch_file.
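Because compile_pmatch_expression takes the grammar as a string, simple rules can also be generated programmatically. The sketch below is illustrative only: word_list_rule is a hypothetical helper (not part of the hfst package), it borrows the {…} string-literal and EndTag syntax from the streets.txt example further down, and it assumes the words consist of ordinary letters that pmatch would not treat specially.

```python
def word_list_rule(words, tag):
    """Build a pmatch rule that wraps any of `words` in <tag>...</tag>.

    Hypothetical helper, not part of the hfst package. Assumes every word
    consists of ordinary letters (no characters that pmatch would treat
    specially inside curly braces).
    """
    if not words:
        raise ValueError("need at least one word")
    # {word} is a pmatch string literal; | separates alternatives.
    alternatives = " | ".join("{%s}" % w for w in words)
    return "regex [%s] EndTag(%s) ;" % (alternatives, tag)


rule = word_list_rule(["avenue", "boulevard", "rue"], "StreetWord")
print(rule)  # regex [{avenue} | {boulevard} | {rue}] EndTag(StreetWord) ;
```

With hfst installed, the generated rule would then be compiled and wrapped like any other grammar, e.g. cont = hfst.PmatchContainer(hfst.compile_pmatch_expression(rule)), after which cont.match can be called on input text.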


__init__ (self)

Initialize a PmatchContainer. Is this needed?


__init__ (self, defs)

Create a PmatchContainer based on definitions defs.

  • defs: A tuple of transducers in HFST_OLW_TYPE defining how pmatch is done.

An example:

If we have a file named streets.txt that contains:

define CapWord UppercaseAlpha Alpha* ;
define StreetWordFr [{avenue} | {boulevard} | {rue}] ;
define DeFr [ [{de} | {du} | {des} | {de la}] Whitespace ] | [{d'} | {l'}] ;
define StreetFr StreetWordFr (Whitespace DeFr) CapWord+ ;
regex StreetFr EndTag(FrenchStreetName) ;

and which has previously been compiled and stored in the file streets.pmatch.hfst.ol with:

import hfst

# Compile the grammar into a tuple of transducers and write each one to the file.
defs = hfst.compile_pmatch_file('streets.txt')
ostr = hfst.HfstOutputStream(filename='streets.pmatch.hfst.ol', type=hfst.ImplementationType.HFST_OLW_TYPE)
for tr in defs:
    ostr.write(tr)
ostr.close()

we can read the pmatch definitions from the file and perform string matching with:

# Read the transducers back in the order they were written.
istr = hfst.HfstInputStream('streets.pmatch.hfst.ol')
defs = []
while not istr.is_eof():
    defs.append(istr.read())
istr.close()
# Build the container and tag the street name in a French sentence.
cont = hfst.PmatchContainer(defs)
assert cont.match("Je marche seul dans l'avenue des Ternes.") == "Je marche seul dans l'<FrenchStreetName>avenue des Ternes</FrenchStreetName>."

See also: hfst.compile_pmatch_file, hfst.compile_pmatch_expression


match (self, input, time_cutoff = 0)

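Match the string input and return the result, in which matched parts are marked with their tags (see the example above).

  • input: The input string.
  • time_cutoff: Time cutoff, defaults to zero, i.e. no cutoff.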
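The tagged string returned by match can be post-processed with ordinary string tools. The following sketch pulls the (tag, text) pairs back out of a result like the one in the streets example; extract_tagged is a hypothetical helper, and it assumes tags are not nested and that the text itself contains no angle brackets.

```python
import re

# <Tag>text</Tag>; the backreference \1 requires the closing tag to match.
# DOTALL lets a tagged span cross line breaks.
TAGGED = re.compile(r"<([^<>/]+)>(.*?)</\1>", re.DOTALL)


def extract_tagged(marked):
    """Return (tag, text) pairs from a string produced by PmatchContainer.match.

    Hypothetical helper. Assumes non-nested tags and no literal '<' or '>'
    in the input text.
    """
    return [(m.group(1), m.group(2)) for m in TAGGED.finditer(marked)]


marked = ("Je marche seul dans l'<FrenchStreetName>avenue des Ternes"
          "</FrenchStreetName>.")
print(extract_tagged(marked))  # [('FrenchStreetName', 'avenue des Ternes')]
```

A regular expression is enough here only because of the no-nesting assumption; grammars that produce nested tags would need a small stack-based parser instead.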


get_profiling_info (self)

todo


set_verbose (self, b)

todo


set_extract_tags_mode (self, b)

todo


set_profile (self, b)

todo


locate(self, input, time_cutoff, weight_cutoff)

Return the locations of pmatched strings in the string input, with the results limited as defined by time_cutoff and weight_cutoff.

  • input : The input string.
  • time_cutoff : Time cutoff, defaults to zero, i.e. no cutoff.
  • weight_cutoff : Weight cutoff, defaults to infinity, i.e. no cutoff.

Returns: A tuple of tuples of Location.
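A sketch of consuming the nested result of locate. It assumes each Location object exposes start, length, input, tag and weight attributes, and it leaves both cutoffs at their defaults; located_matches is a hypothetical helper that takes the container as an argument, so the function itself needs no import.

```python
def located_matches(cont, text):
    """Flatten cont.locate(text) into a list of plain tuples.

    Hypothetical helper. Assumes `cont` is an hfst.PmatchContainer (or any
    object with a compatible locate method) and that each Location has
    start, length, input, tag and weight attributes. Both cutoffs are left
    at their defaults.
    """
    rows = []
    for group in cont.locate(text):       # outer tuple
        for loc in group:                 # inner tuple of Location objects
            rows.append((loc.start, loc.length, loc.input, loc.tag, loc.weight))
    return rows
```

Called as located_matches(cont, "Je marche seul dans l'avenue des Ternes.") with the container from the example above, it would return one tuple per Location, which is easier to print, sort or filter than the nested tuples.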


tokenize(self, input)

Tokenize input and return a list of tokens, i.e. strings.

  • input: The string to be tokenized.
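Since tokenize returns a plain list of strings, standard Python tools apply directly. As one illustration, a hypothetical token_counts helper could count token frequencies with collections.Counter; the container is passed in and is assumed to be a PmatchContainer whose grammar defines the tokens.

```python
from collections import Counter


def token_counts(cont, text):
    """Count how often each token occurs in `text`.

    Hypothetical helper. `cont` is assumed to be an hfst.PmatchContainer;
    only its tokenize method, which returns a list of token strings, is used.
    """
    return Counter(cont.tokenize(text))
```

For example, token_counts(cont, text).most_common(10) would list the ten most frequent tokens.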

get_tokenized_output(self, input, **kwargs)

Tokenize input and get a string representation of the tokenization (essentially the same output that the command-line tool hfst-tokenize would give).

  • input: The input string to be tokenized.
  • kwargs: Possible parameters are: output_format, max_weight_classes, dedupe, print_weights, print_all, time_cutoff, verbose, beam, tokenize_multichar.
  • output_format: The format of output; possible values are tokenize, xerox, cg, finnpos, giellacg, conllu and visl, with tokenize being the default.
  • max_weight_classes: Maximum number of best weight classes to output (where analyses with equal weight constitute a class), defaults to None i.e. no limit.
  • dedupe: Whether duplicate analyses are removed, defaults to False.
  • print_weights: Whether weights are printed, defaults to False.
  • print_all: Whether nonmatching text is printed, defaults to False.
  • time_cutoff: Maximum number of seconds used per input, after which the search is limited.
  • verbose: Whether input is processed verbosely, defaults to True.
  • beam: The beam within which analyses must fall in order to be printed.
  • tokenize_multichar: Whether to tokenize input into multicharacter symbols present in the transducer, defaults to False.
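Because get_tokenized_output takes its options as keyword arguments, a misspelled output_format is easy to miss. A minimal wrapper sketch, assuming the format list above is complete: tokenized_output and OUTPUT_FORMATS are hypothetical names, and every other keyword argument is passed through to the container unchanged.

```python
# Output formats listed in the documentation above (assumed complete).
OUTPUT_FORMATS = frozenset(
    ["tokenize", "xerox", "cg", "finnpos", "giellacg", "conllu", "visl"])


def tokenized_output(cont, text, output_format="tokenize", **kwargs):
    """Call cont.get_tokenized_output after checking output_format.

    Hypothetical wrapper. `cont` is assumed to be an hfst.PmatchContainer.
    Further keyword arguments (dedupe, print_weights, print_all, ...) are
    passed through unchanged.
    """
    if output_format not in OUTPUT_FORMATS:
        raise ValueError("unknown output_format: %r" % (output_format,))
    return cont.get_tokenized_output(text, output_format=output_format, **kwargs)
```

A call such as tokenized_output(cont, text, output_format="cg", print_weights=True) then fails early with a clear error if the format name is wrong, instead of relying on whatever the underlying call does with an unknown value.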