- Notifications
You must be signed in to change notification settings - Fork19
Rust re-implementation of OpenFST - library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.
License
Unknown and 2 other licenses found
Licenses found
garvys-org/rustfst
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
Rust implementation of Weighted Finite States Transducers.
Rustfst is a library for constructing, combining, optimizing, and searching weightedfinite-state transducers (FSTs). Weighted finite-state transducers are automata whereeach transition has an input label, an output label, and a weight.The more familiar finite-state acceptor is represented as a transducerwith each transition's input and output label equal. Finite-state acceptorsare used to represent sets of strings (specifically, regular or rational sets);finite-state transducers are used to represent binary relations between pairs ofstrings (specifically, rational transductions). The weights can be used to representthe cost of taking a particular transition.
FSTs have key applications in speech recognition and synthesis, machine translation,optical character recognition, pattern matching, string processing, machine learning,information extraction and retrieval among others. Often a weighted transducer is used torepresent a probabilistic model (e.g., an n-gram model, pronunciation model). FSTs can beoptimized by determinization and minimization, models can be applied to hypothesis sets(also represented as automata) or cascaded by finite-state composition, and the bestresults can be selected by shortest-path algorithms.
For a basicexample see the section below.
Some simple and commonly encountered types of FSTs can be easilycreated with the macro [fst] or the functionsacceptor andtransducer.
For more complex cases you will likely start with theVectorFst type, which will be importedin the [prelude] along with most everything else you need.VectorFst<TropicalWeight> correspondsdirectly to the OpenFSTStdVectorFst, and can be used to loadits files usingread orread_text.
Because "iteration" over an FST can mean many different things,there are a variety of different iterators. To iterate over stateIDs you may usestates_iter, while toiterate over transitions out of a state, you may useget_trs. Since it is common toiterate over both, this can be done usingfst_iter orfst_into_iter. Itis also very common to iterate over paths accepted by an FST,which can be done withpaths_iter, and as a conveniencefor generating text,string_paths_iter.Alternately, in the case of a linear FST, you may retrieve theonly possible path withdecode_linear_fst.
Note that iterating over paths is not the same thing as findingtheshortest path or paths, which is done withshortest_path (for a single path)orshortest_path_with_config(for N-shortest paths).
For the complete list of algorithms, see the [algorithms] module.
You may now be wondering, especially if you have previously usedsuch linguist-friendly tools aspyfoma, "what if I just wanttotransduce some text???" The unfriendly answer is thatrustfst is a somewhat lower-level library, designed forimplementing things like speech recognizers. The somewhat morehelpful answer is that you would do this by constructing anacceptor for your input, which you willcompose with atransducer, thenproject the result to itsoutput, and finallyiterate over the paths inthe resulting FST.
Implementation heavily inspired from Mehryar Mohri's, Cyril Allauzen's and Michael Riley's work :
- Weighted automata algorithms
- The design principles of a weighted finite-state transducer library
- OpenFst: A general and efficient weighted finite-state transducer library
- Weighted finite-state transducers in speech recognition
The API closely resembles that of OpenFST, with somesimplifications and changes to make it more idiomatic in Rust, notablythe use ofTr instead ofArc. SeeDifferences fromOpenFST for more information.
use anyhow::Result;use rustfst::prelude::*;use rustfst::algorithms::determinize::{DeterminizeType, determinize};use rustfst::algorithms::rm_epsilon::rm_epsilon;fnmain() ->Result<()>{// Creates a empty wFSTletmut fst =VectorFst::<TropicalWeight>::new();// Add some stateslet s0 = fst.add_state();let s1 = fst.add_state();let s2 = fst.add_state();// Set s0 as the start state fst.set_start(s0)?;// Add a transition from s0 to s1 fst.add_tr(s0,Tr::new(3,5,10.0, s1))?;// Add a transition from s0 to s2 fst.add_tr(s0,Tr::new(5,7,18.0, s2))?;// Set s1 and s2 as final states fst.set_final(s1,31.0)?; fst.set_final(s2,45.0)?;// Iter over all the paths in the wFSTfor pin fst.paths_iter(){println!("{:?}", p);}// A lot of operations are available to modify/optimize the FST.// Here are a few examples :// - Remove useless states.connect(&mut fst)?;// - Optimize the FST by merging states with the same behaviour.minimize(&mut fst)?;// - Copy all the input labels in the output.project(&mut fst,ProjectType::ProjectInput);// - Remove epsilon transitions.rm_epsilon(&mut fst)?;// - Compute an equivalent FST but deterministic. fst =determinize(&fst)?;Ok(())}
Here is a non-exhaustive list of ways in which Rustfst's APIdiffers from OpenFST:
- The default epsilon symbol is
<eps>and not<epsilon>. - Functions and methods follow Rust naming conventions,e.g.
add_staterather thanAddState, but are otherwise mostlyequivalent, except that: - Transitions are called
Trand notArc, becauseArchas arather different and well-established meaning in Rust, and rustfstuses it (std::sync::Arc, that is) to reference-count symboltables. All associated functions also usetr. - Final states are not indicated by a final weight of
zero. Youcan test for finality usingis_final, andfinal_weightreturns an [Option]. Thisrequires some care when converting OpenFST code. - Transitions can be accessed directly as a slice rather than requiringan iterator.
- Semiring operations are expressed as plain old methods ratherthan strange C++ things. So write
w1.plus(w2)rather thanPlus(w1, w2), for instance. - Weights have in-place operations for ⊕(
plus_assign) and ⊗(times_assign). - Most of the type aliases (which would be trait aliases in Rust) suchas
StdArc,StdFst, and so forth, are missing, but type inferenceallows us to avoid explicit type arguments in most cases, such aswhen calling [Tr::new], for instance. - State IDs are unsigned, with [
NO_STATE_ID] used for a missing value.They are also 32 bits by default (presumably, 4 billion statesis enough for most applications). This means you must take care tocast them to [usize] when using them as indices, and vice-versa,preferably checking for overflows - Symbol IDs are also unsigned and 32-bits, with [
NO_LABEL] usedfor a missing value. - Floating-point weights are not generic, so are always single-precision.
I did a benchmark some time ago on almost every linear fst algorithm and compared the results withOpenFst. You can find the results here :
Spoiler alert:Rustfst is faster on all those algorithms 😅
The documentation of the last released version is available here :https://docs.rs/rustfst
- Use the script
update_version.shto update the version of every package. - Push
- Push a new tag with the prefix
rustfst-v
Example :
./update_version.sh 0.9.1-alpha.6git commit -am"Release 0.9.1-alpha.6"git pushgit tag -a rustfst-v0.9.1-alpha.6 -m"Release rustfst 0.9.1-alpha.6" git push --tags
Optionally, if this is a major release, create a GitHub release in the UI.
This repository contains two main projects:
rustfstis the Rust re-implementation.rustfst-pythonis the python binding ofrustfst.
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE orhttp://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT) orhttp://opensource.org/licenses/MIT)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
About
Rust re-implementation of OpenFST - library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.
Topics
Resources
License
Unknown and 2 other licenses found
Licenses found
Code of conduct
Uh oh!
There was an error while loading.Please reload this page.
Stars
Watchers
Forks
Packages0
Uh oh!
There was an error while loading.Please reload this page.