# CTCDecoder

Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.
## Update 2021: installable Python package

Python implementation of some common Connectionist Temporal Classification (CTC) decoding algorithms. A minimalistic language model is provided.
## Installation

- Go to the root level of the repository
- Execute `pip install .`
- Go to `tests/` and execute `pytest` to check if the installation worked
## Usage

Here is a minimalistic executable example:
```python
import numpy as np

from ctc_decoder import best_path, beam_search

mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])
chars = 'ab'

print(f'Best path: "{best_path(mat, chars)}"')
print(f'Beam search: "{beam_search(mat, chars)}"')
```
The output `mat` (a numpy array with softmax already applied) of the CTC-trained neural network is expected to have shape TxC and is passed as the first argument to the decoders. T is the number of time-steps, and C is the number of characters (the CTC-blank is the last element). The characters that can be predicted by the neural network are passed as the `chars` string to the decoder. Decoders return the decoded string.
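To make the shape convention concrete, here is the example matrix from above with the TxC layout annotated (a sketch; the probability values are taken from the example):

```python
import numpy as np

# T=2 time-steps (rows), C=3 characters (columns): 'a', 'b', and the CTC-blank,
# which occupies the last column
mat = np.array([
    [0.4, 0.0, 0.6],  # t=0: P('a')=0.4, P('b')=0.0, P(blank)=0.6
    [0.4, 0.0, 0.6],  # t=1: same distribution
])
```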
Running the code outputs:
```
Best path: ""
Beam search: "a"
```
To see more examples of how to use the decoders, please have a look at the scripts in the `tests/` folder.
## Language model and lexicon

Beam search can optionally integrate a character-level language model. Text statistics (bigrams) are used by beam search to improve reading accuracy.
```python
from ctc_decoder import beam_search, LanguageModel

# create language model instance from a (large) text
lm = LanguageModel('this is some text', chars)

# and use it in the beam search decoder
res = beam_search(mat, chars, lm=lm)
```
The lexicon search decoder computes a first approximation with best path decoding. Then, it uses a BK-tree to retrieve similar words, scores them, and finally returns the best-scoring word. The BK-tree is created by providing a list of dictionary words. A tolerance parameter defines the maximum edit distance from the query word to the returned dictionary words.
```python
from ctc_decoder import lexicon_search, BKTree

# create BK-tree from a list of words
bk_tree = BKTree(['words', 'from', 'a', 'dictionary'])

# and use the tree in the lexicon search
res = lexicon_search(mat, chars, bk_tree, tolerance=2)
```
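For intuition, here is a minimal sketch of how a BK-tree lookup by edit distance works. This is not the package's implementation, just an illustration of the retrieval step: the triangle inequality of the edit distance allows whole subtrees to be pruned during a query.

```python
def edit_distance(a: str, b: str) -> int:
    # classic dynamic-programming Levenshtein distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


class SimpleBKTree:
    def __init__(self, words):
        self.root = None  # each node is a pair: [word, {distance: child}]
        for w in words:
            self._insert(w)

    def _insert(self, word):
        if self.root is None:
            self.root = [word, {}]
            return
        node = self.root
        while True:
            d = edit_distance(word, node[0])
            if d in node[1]:
                node = node[1][d]  # descend into the child at this distance
            else:
                node[1][d] = [word, {}]
                return

    def query(self, word, tolerance):
        # only subtrees with distance in [d - tolerance, d + tolerance]
        # can contain matches; all others are pruned
        results, stack = [], [self.root] if self.root else []
        while stack:
            node = stack.pop()
            d = edit_distance(word, node[0])
            if d <= tolerance:
                results.append(node[0])
            for dist, child in node[1].items():
                if d - tolerance <= dist <= d + tolerance:
                    stack.append(child)
        return results


tree = SimpleBKTree(['words', 'from', 'a', 'dictionary'])
print(tree.query('worts', tolerance=2))  # -> ['words']
```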
Some notes on usage with deep learning frameworks:
- No adapter for TensorFlow or PyTorch is provided
- Apply softmax already in the model
- Convert the output tensor to a numpy array
- Usually, the output `rnn_output` of an RNN layer has shape TxBxC, with B the batch dimension
- Decoders work on single batch elements of shape TxC
- Therefore, iterate over all batch elements and apply the decoder to each of them separately, as shown in the sketch after this list
- Example: extract the matrix of batch element 0 via `mat = rnn_output[:, 0, :]`
- The CTC-blank is expected to be the last element along the character dimension
  - TensorFlow has the CTC-blank as the last element, so nothing to do here
  - PyTorch, however, has the CTC-blank as the first element by default, so you have to move it to the end, or change the default setting
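A minimal sketch of these steps with PyTorch is shown below. The tensor shapes and names are assumptions for illustration; only `best_path` and the `chars` convention come from this package.

```python
import numpy as np
import torch

from ctc_decoder import best_path

# hypothetical setup: a CTC-trained PyTorch model emits logits of shape TxBxC,
# with the CTC-blank at index 0 (the PyTorch default)
T, B, C = 10, 4, 3  # time-steps, batch size, characters incl. blank
logits = torch.randn(T, B, C)
chars = 'ab'  # characters predictable by the network, without the blank

# apply softmax over the character dimension and convert to a numpy array
probs = torch.softmax(logits, dim=2).numpy()

# move the CTC-blank from the first to the last position
probs = np.roll(probs, -1, axis=2)

# decode each batch element separately
for b in range(B):
    mat = probs[:, b, :]  # shape TxC
    print(best_path(mat, chars))
```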
Recommended decoders:

- `best_path`: best path (or greedy) decoder; the fastest of all algorithms, however, the other decoders often perform better
- `beam_search`: beam search decoder; optionally integrates a character-level language model and can be tuned via the beam width parameter (see the sketch after this list)
- `lexicon_search`: lexicon search decoder; returns the best-scoring word from a dictionary
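As a sketch of the beam width tuning mentioned above (the `beam_width` keyword name and its value here are assumptions; check the function signature in the package):

```python
import numpy as np

from ctc_decoder import beam_search

mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])  # TxC matrix from the earlier example
chars = 'ab'

# a larger beam width keeps more decoding hypotheses per time-step,
# trading speed for accuracy (beam_width keyword assumed)
res = beam_search(mat, chars, beam_width=50)
print(f'Beam search: "{res}"')
```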
Other decoders, from my experience not really suited for practical purposes, but which might be used for experiments or research:

- `prefix_search`: prefix search decoder
- `token_passing`: token passing algorithm
- Best path decoder implementation in OpenCL (see the `extras/` folder)
The paper "Comparison of Connectionist Temporal Classification Decoding Algorithms" (see the references below) gives suggestions on when to use best path decoding, beam search decoding, and token passing.
## Documentation

- Documentation of test cases
- Documentation of the data
## References

- Graves - Supervised sequence labelling with recurrent neural networks
- Hwang - Character-level incremental speech recognition with recurrent neural networks
- Shi - An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
- Marti - The IAM-database: an English sentence database for offline handwriting recognition
- Beam Search Decoding in CTC-trained Neural Networks
- An Intuitive Explanation of Connectionist Temporal Classification
- Scheidl - Comparison of Connectionist Temporal Classification Decoding Algorithms
- Scheidl - Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm