# CTCDecoder

Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.
## Update 2021: installable Python package

Python implementation of some common Connectionist Temporal Classification (CTC) decoding algorithms. A minimalistic language model is provided.
## Installation

- Go to the root level of the repository
- Execute `pip install .`
- Go to `tests/` and execute `pytest` to check if the installation worked
## Usage

Here is a minimalistic executable example:
```python
import numpy as np

from ctc_decoder import best_path, beam_search

mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])
chars = 'ab'

print(f'Best path: "{best_path(mat, chars)}"')
print(f'Beam search: "{beam_search(mat, chars)}"')
```
The output `mat` (a numpy array with softmax already applied) of the CTC-trained neural network is expected to have shape TxC and is passed as the first argument to the decoders. T is the number of time-steps, and C is the number of characters (the CTC-blank is the last element). The characters that can be predicted by the neural network are passed as the `chars` string to the decoder. Decoders return the decoded string.
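To make the shape convention concrete, here is the example matrix from above with the TxC layout annotated (a sketch; the probability values are taken from the example):

```python
import numpy as np

# T=2 time-steps (rows), C=3 characters (columns): 'a', 'b', and the CTC-blank,
# which occupies the last column
mat = np.array([
    [0.4, 0.0, 0.6],  # t=0: P('a')=0.4, P('b')=0.0, P(blank)=0.6
    [0.4, 0.0, 0.6],  # t=1: same distribution
])
```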
Running the code outputs:
```
Best path: ""
Beam search: "a"
```
To see more examples of how to use the decoders, please have a look at the scripts in the `tests/` folder.
## Language model and lexicon

Beam search can optionally integrate a character-level language model. Text statistics (bigrams) are used by beam search to improve reading accuracy.
```python
from ctc_decoder import beam_search, LanguageModel

# create language model instance from a (large) text
lm = LanguageModel('this is some text', chars)

# and use it in the beam search decoder
res = beam_search(mat, chars, lm=lm)
```
The lexicon search decoder computes a first approximation with best path decoding. Then, it uses a BK-tree to retrieve similar words, scores them, and finally returns the best-scoring word. The BK-tree is created by providing a list of dictionary words. A tolerance parameter defines the maximum edit distance from the query word to the returned dictionary words.
```python
from ctc_decoder import lexicon_search, BKTree

# create BK-tree from a list of words
bk_tree = BKTree(['words', 'from', 'a', 'dictionary'])

# and use the tree in the lexicon search
res = lexicon_search(mat, chars, bk_tree, tolerance=2)
```
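For intuition, here is a minimal sketch of how a BK-tree lookup by edit distance works. This is not the package's implementation, just an illustration of the retrieval step: the triangle inequality of the edit distance allows whole subtrees to be pruned during a query.

```python
def edit_distance(a: str, b: str) -> int:
    # classic dynamic-programming Levenshtein distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


class SimpleBKTree:
    def __init__(self, words):
        self.root = None  # each node is a pair: [word, {distance: child}]
        for w in words:
            self._insert(w)

    def _insert(self, word):
        if self.root is None:
            self.root = [word, {}]
            return
        node = self.root
        while True:
            d = edit_distance(word, node[0])
            if d in node[1]:
                node = node[1][d]  # descend into the child at this distance
            else:
                node[1][d] = [word, {}]
                return

    def query(self, word, tolerance):
        # only subtrees with distance in [d - tolerance, d + tolerance]
        # can contain matches; all others are pruned
        results, stack = [], [self.root] if self.root else []
        while stack:
            node = stack.pop()
            d = edit_distance(word, node[0])
            if d <= tolerance:
                results.append(node[0])
            for dist, child in node[1].items():
                if d - tolerance <= dist <= d + tolerance:
                    stack.append(child)
        return results


tree = SimpleBKTree(['words', 'from', 'a', 'dictionary'])
print(tree.query('worts', tolerance=2))  # -> ['words']
```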
Some notes on usage with deep learning frameworks:
- No adapter for TensorFlow or PyTorch is provided
- Apply softmax already in the model
- Convert the output tensor to a numpy array
- Usually, the output `rnn_output` of an RNN layer has shape TxBxC, with B the batch dimension
- Decoders work on single batch elements of shape TxC
- Therefore, iterate over all batch elements and apply the decoder to each of them separately, as shown in the sketch after this list
- Example: extract the matrix of batch element 0 via `mat = rnn_output[:, 0, :]`
- The CTC-blank is expected to be the last element along the character dimension
  - TensorFlow has the CTC-blank as the last element, so nothing to do here
  - PyTorch, however, has the CTC-blank as the first element by default, so you have to move it to the end, or change the default setting
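A minimal sketch of these steps with PyTorch is shown below. The tensor shapes and names are assumptions for illustration; only `best_path` and the `chars` convention come from this package.

```python
import numpy as np
import torch

from ctc_decoder import best_path

# hypothetical setup: a CTC-trained PyTorch model emits logits of shape TxBxC,
# with the CTC-blank at index 0 (the PyTorch default)
T, B, C = 10, 4, 3  # time-steps, batch size, characters incl. blank
logits = torch.randn(T, B, C)
chars = 'ab'  # characters predictable by the network, without the blank

# apply softmax over the character dimension and convert to a numpy array
probs = torch.softmax(logits, dim=2).numpy()

# move the CTC-blank from the first to the last position
probs = np.roll(probs, -1, axis=2)

# decode each batch element separately
for b in range(B):
    mat = probs[:, b, :]  # shape TxC
    print(best_path(mat, chars))
```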
Recommended decoders:

- `best_path`: best path (or greedy) decoder; the fastest of all algorithms, however, the other decoders often perform better
- `beam_search`: beam search decoder; optionally integrates a character-level language model and can be tuned via the beam width parameter (see the sketch after this list)
- `lexicon_search`: lexicon search decoder; returns the best-scoring word from a dictionary
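As a sketch of the beam width tuning mentioned above (the `beam_width` keyword name and its value here are assumptions; check the function signature in the package):

```python
import numpy as np

from ctc_decoder import beam_search

mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])  # TxC matrix from the earlier example
chars = 'ab'

# a larger beam width keeps more decoding hypotheses per time-step,
# trading speed for accuracy (beam_width keyword assumed)
res = beam_search(mat, chars, beam_width=50)
print(f'Beam search: "{res}"')
```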
Other decoders, from my experience not really suited for practical purposes, but which might be used for experiments or research:

- `prefix_search`: prefix search decoder
- `token_passing`: token passing algorithm
- Best path decoder implementation in OpenCL (see the `extras/` folder)
The paper "Comparison of Connectionist Temporal Classification Decoding Algorithms" (see the references below) gives suggestions on when to use best path decoding, beam search decoding, and token passing.
## Documentation

- Documentation of test cases
- Documentation of the data
## References

- Graves - Supervised sequence labelling with recurrent neural networks
- Hwang - Character-level incremental speech recognition with recurrent neural networks
- Shi - An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
- Marti - The IAM-database: an English sentence database for offline handwriting recognition
- Beam Search Decoding in CTC-trained Neural Networks
- An Intuitive Explanation of Connectionist Temporal Classification
- Scheidl - Comparison of Connectionist Temporal Classification Decoding Algorithms
- Scheidl - Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm