PmatchContainer
A class for performing pattern matching.
Probably the easiest way to perform pattern matching is with the functions hfst.compile_pmatch_expression and hfst.compile_pmatch_file.
Initialize a PmatchContainer.
Create a PmatchContainer based on definitions defs.
defs: A tuple of transducers in HFST_OLW_TYPE defining how pmatch is done.
An example:
If we have a file named streets.txt that contains:
```
define CapWord UppercaseAlpha Alpha* ;
define StreetWordFr [{avenue} | {boulevard} | {rue}] ;
define DeFr [ [{de} | {du} | {des} | {de la}] Whitespace ] | [{d'} | {l'}] ;
define StreetFr StreetWordFr (Whitespace DeFr) CapWord+ ;
regex StreetFr EndTag(FrenchStreetName) ;
```

and which has earlier been compiled and stored in the file streets.pmatch.hfst.ol:
```python
defs = hfst.compile_pmatch_file('streets.txt')
ostr = hfst.HfstOutputStream(filename='streets.pmatch.hfst.ol',
                             type=hfst.ImplementationType.HFST_OLW_TYPE)
for tr in defs:
    ostr.write(tr)
ostr.close()
```

we can read the pmatch definitions from the file and perform string matching with:
```python
istr = hfst.HfstInputStream('streets.pmatch.hfst.ol')
defs = []
while not istr.is_eof():
    defs.append(istr.read())
istr.close()
cont = hfst.PmatchContainer(defs)
assert cont.match("Je marche seul dans l'avenue des Ternes.") == \
    "Je marche seul dans l'<FrenchStreetName>avenue des Ternes</FrenchStreetName>."
```

See also: hfst.compile_pmatch_file, hfst.compile_pmatch_expression
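Since match() returns the input with each hit wrapped in an XML-style element named after the EndTag, the annotated string can be post-processed with ordinary string tools. A small pure-Python sketch (no hfst needed) that pulls the tagged substrings back out; the helper name extract_tagged is illustrative, not part of the hfst API:

```python
import re

def extract_tagged(annotated, tag):
    """Return the substrings wrapped in <tag>...</tag> in the annotated string."""
    return re.findall(r'<{0}>(.*?)</{0}>'.format(tag), annotated)

matched = "Je marche seul dans l'<FrenchStreetName>avenue des Ternes</FrenchStreetName>."
print(extract_tagged(matched, 'FrenchStreetName'))  # ['avenue des Ternes']
```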
Match the input string input.
todo
todo
todo
todo
The locations of pmatched strings for string input, where the results are limited as defined by time_cutoff and weight_cutoff.
input: The input string.
time_cutoff: Time cutoff, defaults to zero, i.e. no cutoff.
weight_cutoff: Weight cutoff, defaults to infinity, i.e. no cutoff.
Returns: A tuple of tuples of Location.
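A hypothetical sketch of consuming that nested result, using a pure-Python namedtuple as a stand-in for hfst's Location (the attribute names start, length, input, output, tag and weight are assumptions here, not taken from this page):

```python
from collections import namedtuple

# Illustrative stand-in for hfst's Location -- the real class is returned
# by PmatchContainer.locate(); the field names below are assumed.
Location = namedtuple('Location', 'start length input output tag weight')

def tagged_spans(locations, tag):
    """Collect (start, output) pairs for locations carrying the given tag."""
    return [(loc.start, loc.output)
            for group in locations   # locate() returns a tuple of tuples
            for loc in group
            if loc.tag == tag]

locs = ((Location(22, 17, 'avenue des Ternes', 'avenue des Ternes',
                  'FrenchStreetName', 0.0),),)
print(tagged_spans(locs, 'FrenchStreetName'))  # [(22, 'avenue des Ternes')]
```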
Tokenize input and return a list of tokens, i.e. strings.
input: The string to be tokenized.
Tokenize input and get a string representation of the tokenization (essentially the same as the command line tool hfst-tokenize would give).
input: The input string to be tokenized.
kwargs: Possible parameters are: output_format, max_weight_classes, dedupe, print_weights, print_all, time_cutoff, verbose, beam, tokenize_multichar.
output_format: The format of the output; possible values are tokenize, xerox, cg, finnpos, giellacg, conllu and visl, tokenize being the default.
max_weight_classes: Maximum number of best weight classes to output (where analyses with equal weight constitute a class), defaults to None, i.e. no limit.
dedupe: Whether duplicate analyses are removed, defaults to False.
print_weights: Whether weights are printed, defaults to False.
print_all: Whether nonmatching text is printed, defaults to False.
time_cutoff: Maximum number of seconds used per input after limiting the search.
verbose: Whether input is processed verbosely, defaults to True.
beam: Beam within which analyses must be to get printed.
tokenize_multichar: Whether input is tokenized into multicharacter symbols present in the transducer, defaults to False.