PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit. It provides easy-to-use, low-overhead, first-class Python wrappers for the C++ code in the Kaldi and OpenFst libraries. You can use PyKaldi to write Python code for things that would otherwise require writing C++ code such as calling low-level Kaldi functions, manipulating Kaldi and OpenFst objects in code or implementing new Kaldi tools.
You can think of Kaldi as a large box of legos that you can mix and match to build custom speech recognition solutions. The best way to think of PyKaldi is as a supplement, a sidekick if you will, to Kaldi. In fact, PyKaldi is at its best when it is used alongside Kaldi. To that end, replicating the functionality of myriad command-line tools, utility scripts and shell-level recipes provided by Kaldi is a non-goal for the PyKaldi project.
Like Kaldi, PyKaldi is primarily intended for speech recognition researchers and professionals. It is jam packed with goodies that one would need to build Python software taking advantage of the vast collection of utilities, algorithms and data structures provided by the Kaldi and OpenFst libraries.
If you are not familiar with FST-based speech recognition or have no interest in having access to the guts of Kaldi and OpenFst in Python, but only want to run a pre-trained Kaldi system as part of your Python application, do not fret. PyKaldi includes a number of high-level application oriented modules, such as asr, alignment and segmentation, that should be accessible to most Python programmers.
If you are interested in using PyKaldi for research or building advanced ASR applications, you are in luck. PyKaldi comes with everything you need to read, write, inspect, manipulate or visualize Kaldi and OpenFst objects in Python. It includes Python wrappers for most functions and methods that are part of the public APIs of the Kaldi and OpenFst C++ libraries. If you want to read/write files that are produced/consumed by Kaldi tools, check out the I/O and table utilities in the util package. If you want to work with Kaldi matrices and vectors, e.g. convert them to NumPy ndarrays and vice versa, check out the matrix package. If you want to use Kaldi for feature extraction and transformation, check out the feat, ivector and transform packages. If you want to work with lattices or other FST structures produced/consumed by Kaldi tools, check out the fstext, lat and kws packages. If you want low-level access to Gaussian mixture models, hidden Markov models or phonetic decision trees in Kaldi, check out the gmm, sgmm2, hmm and tree packages. If you want low-level access to Kaldi neural network models, check out the nnet3, cudamatrix and chain packages. If you want to use the decoders and language modeling utilities in Kaldi, check out the decoder, lm, rnnlm, tfrnnlm and online2 packages.
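For instance, converting between Kaldi matrices and NumPy arrays with the matrix package can look like the minimal sketch below. It reuses the Matrix constructor and numpy() method that also appear in the PyTorch example later in this README; the random feature matrix is purely illustrative.

import numpy as np
from kaldi.matrix import Matrix

# A NumPy array standing in for a feature matrix (random data for illustration).
arr = np.random.rand(10, 13).astype(np.float32)

# Kaldi matrix built from the NumPy array, as in the PyTorch example below.
mat = Matrix(arr)

# NumPy view of the Kaldi matrix; as described above, conversions like this
# do not copy the underlying memory buffer.
back = mat.numpy()
print(back.shape)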
Interested readers who would like to learn more about Kaldi and PyKaldi might find the following resources useful:
- Kaldi Docs: Read these to learn more about Kaldi.
- PyKaldi Docs: Consult these to learn more about the PyKaldi API.
- PyKaldi Examples: Check these out to see PyKaldi in action.
- PyKaldi Paper: Read this to learn more about the design of PyKaldi.
Since automatic speech recognition (ASR) in Python is undoubtedly the "killer app" for PyKaldi, we will go over a few ASR scenarios to get a feel for the PyKaldi API. We should note that PyKaldi does not provide any high-level utilities for training ASR models, so you need to train your models using Kaldi recipes or use pre-trained models available online. The reason why this is so is simply that there is no high-level ASR training API in the Kaldi C++ libraries. Kaldi ASR models are trained using complex shell-level recipes that handle everything from data preparation to the orchestration of myriad Kaldi executables used in training. This is by design and unlikely to change in the future. PyKaldi does provide wrappers for the low-level ASR training utilities in the Kaldi C++ libraries, but those are not really useful unless you want to build an ASR training pipeline in Python from basic building blocks, which is no easy task. Continuing with the lego analogy, this task is akin to building an intricate lego model given access to a truck full of legos you might need. If you are crazy enough to try though, please don't let this paragraph discourage you. Before we started building PyKaldi, we thought that was a madman's task too.
The PyKaldi asr module includes a number of easy-to-use, high-level classes to make it dead simple to put together ASR systems in Python. Ignoring the boilerplate code needed for setting things up, doing ASR with PyKaldi can be as simple as the following snippet of code:
asr = SomeRecognizer.from_files("final.mdl", "HCLG.fst", "words.txt", opts)

with SequentialMatrixReader("ark:feats.ark") as feats_reader:
    for key, feats in feats_reader:
        out = asr.decode(feats)
        print(key, out["text"])
In this simplified example, we first instantiate a hypothetical recognizer SomeRecognizer with the paths for the model final.mdl, the decoding graph HCLG.fst and the symbol table words.txt. The opts object contains the configuration options for the recognizer. Then, we instantiate a PyKaldi table reader SequentialMatrixReader for reading the feature matrices stored in the Kaldi archive feats.ark. Finally, we iterate over the feature matrices and decode them one by one. Here we are simply printing the best ASR hypothesis for each utterance, so we are only interested in the "text" entry of the output dictionary out. Keep in mind that the output dictionary contains a bunch of other useful entries, such as the frame-level alignment of the best hypothesis and a weighted lattice representing the most likely hypotheses. Admittedly, not all ASR pipelines will be as simple as this example, but they will often have the same overall structure. In the following sections, we will see how we can adapt the code given above to implement more complicated ASR pipelines.
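Before moving on, here is a quick sketch of pulling a few of those additional entries out of the output dictionary from the snippet above. The "text" and "lattice" keys appear elsewhere in this README; the alignment key name is an assumption and may differ across PyKaldi versions, so check the asr module documentation.

# Continuing the snippet above: inspect the recognizer output dictionary.
out = asr.decode(feats)

text = out["text"]            # best hypothesis as a plain string
lattice = out["lattice"]      # weighted lattice over the most likely hypotheses
alignment = out["alignment"]  # frame-level alignment of the best hypothesis
                              # (assumed key name; see the asr module docs)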
This is the most common scenario. We want to do offline ASR using pre-trained Kaldi models, such as ASpIRE chain models. Here we are using the term "models" loosely to refer to everything one would need to put together an ASR system. In this specific example, we are going to need:
- a neural network acoustic model,
- a transition model,
- a decoding graph,
- a word symbol table,
- and a couple of feature extraction configs.
Note that you can use this example code to decode with ASpIRE chain models.
from kaldi.asr import NnetLatticeFasterRecognizer
from kaldi.decoder import LatticeFasterDecoderOptions
from kaldi.nnet3 import NnetSimpleComputationOptions
from kaldi.util.table import SequentialMatrixReader, CompactLatticeWriter

# Set the paths and read/write specifiers
model_path = "models/aspire/final.mdl"
graph_path = "models/aspire/graph_pp/HCLG.fst"
symbols_path = "models/aspire/graph_pp/words.txt"
feats_rspec = ("ark:compute-mfcc-feats --config=models/aspire/conf/mfcc.conf "
               "scp:wav.scp ark:- |")
ivectors_rspec = (feats_rspec + "ivector-extract-online2 "
                  "--config=models/aspire/conf/ivector_extractor.conf "
                  "ark:spk2utt ark:- ark:- |")
lat_wspec = "ark:| gzip -c > lat.gz"

# Instantiate the recognizer
decoder_opts = LatticeFasterDecoderOptions()
decoder_opts.beam = 13
decoder_opts.max_active = 7000
decodable_opts = NnetSimpleComputationOptions()
decodable_opts.acoustic_scale = 1.0
decodable_opts.frame_subsampling_factor = 3
asr = NnetLatticeFasterRecognizer.from_files(
    model_path, graph_path, symbols_path,
    decoder_opts=decoder_opts, decodable_opts=decodable_opts)

# Extract the features, decode and write output lattices
with SequentialMatrixReader(feats_rspec) as feats_reader, \
     SequentialMatrixReader(ivectors_rspec) as ivectors_reader, \
     CompactLatticeWriter(lat_wspec) as lat_writer:
    for (fkey, feats), (ikey, ivectors) in zip(feats_reader, ivectors_reader):
        assert fkey == ikey
        out = asr.decode((feats, ivectors))
        print(fkey, out["text"])
        lat_writer[fkey] = out["lattice"]
The fundamental difference between this example and the short snippet from the last section is that for each utterance we are reading the raw audio data from disk and computing two feature matrices on the fly instead of reading a single precomputed feature matrix from disk. The script file wav.scp contains a list of WAV files corresponding to the utterances we want to decode. The additional feature matrix we are extracting contains online i-vectors that are used by the neural network acoustic model to perform channel and speaker adaptation. The speaker-to-utterance map spk2utt is used for accumulating separate statistics for each speaker in online i-vector extraction. It can be a simple identity mapping if the speaker information is not available. We pack the MFCC features and the i-vectors into a tuple and pass this tuple to the recognizer for decoding. The neural network recognizers in PyKaldi know how to handle the additional i-vector features when they are available. The model file final.mdl contains both the transition model and the neural network acoustic model. The NnetLatticeFasterRecognizer processes feature matrices by first computing phone log-likelihoods using the neural network acoustic model, then mapping those to transition log-likelihoods using the transition model and finally decoding transition log-likelihoods into word sequences using the decoding graph HCLG.fst, which has transition IDs on its input labels and word IDs on its output labels. After decoding, we save the lattice generated by the recognizer to a Kaldi archive for future processing.
This example also illustrates the powerful I/O mechanisms provided by Kaldi. Instead of implementing the feature extraction pipelines in code, we define them as Kaldi read specifiers and compute the feature matrices simply by instantiating PyKaldi table readers and iterating over them. This is not only the simplest but also the fastest way of computing features with PyKaldi since the feature extraction pipeline is run in parallel by the operating system. Similarly, we use a Kaldi write specifier to instantiate a PyKaldi table writer which writes output lattices to a compressed Kaldi archive. Note that for these to work, we need compute-mfcc-feats, ivector-extract-online2 and gzip to be on our PATH.
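The same specifier machinery works outside feature pipelines too. As a minimal sketch (the scp path, the output paths and the utterance key are placeholders, not part of the example above), a random-access table reader and a table writer can be combined like this:

from kaldi.util.table import RandomAccessMatrixReader, MatrixWriter

# Random-access reader over an scp index and a writer producing both an
# archive and an scp file; paths and the utterance key are placeholders.
with RandomAccessMatrixReader("scp:feats.scp") as reader, \
     MatrixWriter("ark,scp:feats_copy.ark,feats_copy.scp") as writer:
    if "utt-001" in reader:
        writer["utt-001"] = reader["utt-001"]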
This is similar to the previous scenario, but instead of a Kaldi acoustic model, we use a PyTorch acoustic model. After computing the features as before, we convert them to a PyTorch tensor, do the forward pass using a PyTorch neural network module outputting phone log-likelihoods and finally convert those log-likelihoods back into a PyKaldi matrix for decoding. The recognizer uses the transition model to automatically map phone IDs to transition IDs, the input labels on a typical Kaldi decoding graph.
from kaldi.asr import MappedLatticeFasterRecognizer
from kaldi.decoder import LatticeFasterDecoderOptions
from kaldi.matrix import Matrix
from kaldi.util.table import SequentialMatrixReader, CompactLatticeWriter

from models import AcousticModel  # Import your PyTorch model
import torch

# Set the paths and read/write specifiers
acoustic_model_path = "models/aspire/model.pt"
transition_model_path = "models/aspire/final.mdl"
graph_path = "models/aspire/graph_pp/HCLG.fst"
symbols_path = "models/aspire/graph_pp/words.txt"
feats_rspec = ("ark:compute-mfcc-feats --config=models/aspire/conf/mfcc.conf "
               "scp:wav.scp ark:- |")
lat_wspec = "ark:| gzip -c > lat.gz"

# Instantiate the recognizer
decoder_opts = LatticeFasterDecoderOptions()
decoder_opts.beam = 13
decoder_opts.max_active = 7000
asr = MappedLatticeFasterRecognizer.from_files(
    transition_model_path, graph_path, symbols_path, decoder_opts=decoder_opts)

# Instantiate the PyTorch acoustic model (subclass of torch.nn.Module)
model = AcousticModel(...)
model.load_state_dict(torch.load(acoustic_model_path))
model.eval()

# Extract the features, decode and write output lattices
with SequentialMatrixReader(feats_rspec) as feats_reader, \
     CompactLatticeWriter(lat_wspec) as lat_writer:
    for key, feats in feats_reader:
        feats = torch.from_numpy(feats.numpy())  # Convert to PyTorch tensor
        loglikes = model(feats)                  # Compute log-likelihoods
        loglikes = Matrix(loglikes.numpy())      # Convert to PyKaldi matrix
        out = asr.decode(loglikes)
        print(key, out["text"])
        lat_writer[key] = out["lattice"]
This section is a placeholder. Check out this script in the meantime.
Lattice rescoring is a standard technique for using large n-gram language models or recurrent neural network language models (RNNLMs) in ASR. In this example, we rescore lattices using a Kaldi RNNLM. We first instantiate a rescorer by providing the paths for the models. Then we use a table reader to iterate over the lattices we want to rescore and finally we use a table writer to write rescored lattices back to disk.
from kaldi.asr import LatticeRnnlmPrunedRescorer
from kaldi.fstext import SymbolTable
from kaldi.rnnlm import RnnlmComputeStateComputationOptions
from kaldi.util.table import SequentialCompactLatticeReader, CompactLatticeWriter

# Set the paths, extended filenames and read/write specifiers
symbols_path = "models/tedlium/config/words.txt"
old_lm_path = "models/tedlium/data/lang_nosp/G.fst"
word_feats_path = "models/tedlium/word_feats.txt"
feat_embedding_path = "models/tedlium/feat_embedding.final.mat"
word_embedding_rxfilename = ("rnnlm-get-word-embedding %s %s - |"
                             % (word_feats_path, feat_embedding_path))
rnnlm_path = "models/tedlium/final.raw"
lat_rspec = "ark:gunzip -c lat.gz |"
lat_wspec = "ark:| gzip -c > rescored_lat.gz"

# Instantiate the rescorer
symbols = SymbolTable.read_text(symbols_path)
opts = RnnlmComputeStateComputationOptions()
opts.bos_index = symbols.find_index("<s>")
opts.eos_index = symbols.find_index("</s>")
opts.brk_index = symbols.find_index("<brk>")
rescorer = LatticeRnnlmPrunedRescorer.from_files(
    old_lm_path, word_embedding_rxfilename, rnnlm_path, opts=opts)

# Read the lattices, rescore and write output lattices
with SequentialCompactLatticeReader(lat_rspec) as lat_reader, \
     CompactLatticeWriter(lat_wspec) as lat_writer:
    for key, lat in lat_reader:
        lat_writer[key] = rescorer.rescore(lat)
Notice the extended filename we used to compute the word embeddings from the word features and the feature embeddings on the fly. Also of note are the read/write specifiers we used to transparently decompress/compress the lattice archives. For these to work, we need rnnlm-get-word-embedding, gunzip and gzip to be on our PATH.
PyKaldi aims to bridge the gap between Kaldi and all the nice things Python has to offer. It is more than a collection of bindings into Kaldi libraries. It is a scripting layer providing first class support for essential Kaldi and OpenFst types in Python. PyKaldi vector and matrix types are tightly integrated with NumPy. They can be seamlessly converted to NumPy arrays and vice versa without copying the underlying memory buffers. PyKaldi FST types, including Kaldi style lattices, are first class citizens in Python. The API for the user facing FST types and operations is almost entirely defined in Python, mimicking the API exposed by pywrapfst, the official Python wrapper for OpenFst.
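As a small, hedged example of that pywrapfst-style API (assuming a decoding graph like the HCLG.fst used above is available on disk), reading and inspecting an FST and its word symbol table might look like this; the accessor names assume the API mirrors pywrapfst:

from kaldi.fstext import StdVectorFst, SymbolTable

# Read a decoding graph and its word symbol table from disk. The paths are
# the same placeholders used in the ASR examples above.
graph = StdVectorFst.read("models/aspire/graph_pp/HCLG.fst")
words = SymbolTable.read_text("models/aspire/graph_pp/words.txt")

# pywrapfst-style accessors (assumed to mirror the pywrapfst method names).
print(graph.start(), graph.num_states(), words.num_symbols())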
PyKaldi harnesses the power of CLIF to wrap Kaldi and OpenFst C++ libraries using simple API descriptions. The CPython extension modules generated by CLIF can be imported in Python to interact with Kaldi and OpenFst. While CLIF is great for exposing existing C++ APIs in Python, the wrappers do not always expose a "Pythonic" API that is easy to use from Python. PyKaldi addresses this by extending the raw CLIF wrappers in Python (and sometimes in C++) to provide a more "Pythonic" API. The figure below illustrates where PyKaldi fits in the Kaldi ecosystem.
PyKaldi has a modular design which makes it easy to maintain and extend. Source files are organized in a directory tree that is a replica of the Kaldi source tree. Each directory defines a subpackage and contains only the wrapper code written for the associated Kaldi library. The wrapper code consists of:
- CLIF C++ API descriptions defining the types and functions to be wrapped and their Python API,
- C++ headers defining the shims for Kaldi code that is not compliant with the Google C++ style expected by CLIF,
- Python modules grouping together related extension modules generated with CLIF and extending the raw CLIF wrappers to provide a more "Pythonic" API.
You can read more about the design and technical details of PyKaldi in our paper.
The following table shows the status of each PyKaldi package (we currently do not plan to add support for nnet, nnet2 and online) along the following dimensions:
- Wrapped?: If there are enough CLIF files to make the package usable in Python.
- Pythonic?: If the package API has a "Pythonic" look-and-feel.
- Documentation?: If there is documentation beyond what is automatically generated by CLIF. A single checkmark indicates that there is not much additional documentation (if any). Three checkmarks indicate that the package documentation is complete (or near complete).
- Tests?: If there are any tests for the package.
Package | Wrapped? | Pythonic? | Documentation? | Tests? |
---|---|---|---|---|
base | ✔ | ✔ | ✔ ✔ ✔ | ✔ |
chain | ✔ | ✔ | ✔ ✔ ✔ | |
cudamatrix | ✔ | ✔ | ✔ | |
decoder | ✔ | ✔ | ✔ ✔ ✔ | |
feat | ✔ | ✔ | ✔ ✔ ✔ | |
fstext | ✔ | ✔ | ✔ ✔ ✔ | |
gmm | ✔ | ✔ | ✔ ✔ | ✔ |
hmm | ✔ | ✔ | ✔ ✔ ✔ | ✔ |
ivector | ✔ | ✔ | | |
kws | ✔ | ✔ | ✔ ✔ ✔ | |
lat | ✔ | ✔ | ✔ ✔ ✔ | |
lm | ✔ | ✔ | ✔ ✔ ✔ | |
matrix | ✔ | ✔ | ✔ ✔ ✔ | ✔ |
nnet3 | ✔ | ✔ | ✔ | |
online2 | ✔ | ✔ | ✔ ✔ ✔ | |
rnnlm | ✔ | ✔ | ✔ ✔ ✔ | |
sgmm2 | ✔ | ✔ | | |
tfrnnlm | ✔ | ✔ | ✔ ✔ ✔ | |
transform | ✔ | ✔ | ✔ | |
tree | ✔ | ✔ | | |
util | ✔ | ✔ | ✔ ✔ ✔ | ✔ |
If you are using a relatively recent Linux or macOS, such as Ubuntu >= 16.04, CentOS >= 7 or macOS >= 10.13, you should be able to install PyKaldi without too much trouble. Otherwise, you will likely need to tweak the installation scripts.
You can now download official whl packages from our GitHub release page. We have whl packages for Python 3.7 through 3.11 on Linux and a few (experimental) builds for Mac M1/M2.
If you decide to use a whl package, you can skip the next sections and head straight to "Starting a new project with a pykaldi whl package" to set up your project. Note that you still need to compile a PyKaldi-compatible version of Kaldi.
To build and install PyKaldi from source, follow the steps given below.
git clone https://github.com/pykaldi/pykaldi.git
cd pykaldi
Although it is not required, we recommend installing PyKaldi and all of its Python dependencies inside a new isolated Python environment. If you do not want to create a new Python environment, you can skip the rest of this step.
You can use any tool you like for creating a new Python environment. Here we use virtualenv, but you can use another tool like conda if you prefer. Make sure you activate the new Python environment before continuing with the rest of the installation.
virtualenv env
source env/bin/activate
Running the commands below will install the system packages needed for building PyKaldi from source.
# Ubuntu
sudo apt-get install autoconf automake cmake curl g++ git graphviz \
    libatlas3-base libtool make pkg-config subversion unzip wget zlib1g-dev

# macOS
brew install automake cmake git graphviz libtool pkg-config wget gnu-sed openblas subversion
PATH="/opt/homebrew/opt/gnu-sed/libexec/gnubin:$PATH"
Running the commands below will install the Python packages needed for building PyKaldi from source.
pip install --upgrade pip
pip install --upgrade setuptools
pip install numpy pyparsing
pip install ninja  # not required but strongly recommended
In addition to the packages listed above, we also need PyKaldi-compatible installations of the following software:
- Google Protobuf, recommended v3.5.0. Both the C++ library and the Python package must be installed.
- PyKaldi compatible fork of CLIF. To streamline PyKaldi development, we made some changes to the CLIF codebase. We are hoping to upstream these changes over time. These changes are in the pykaldi branch:
# This command will be automatically run for you in the tools install scripts.
git clone -b pykaldi https://github.com/pykaldi/clif
- PyKaldi compatible fork of Kaldi. To comply with CLIF requirements, we had to make some changes to the Kaldi codebase. We are hoping to upstream these changes over time. These changes are in the pykaldi branch:
# This command will be automatically run for you in the tools install scripts.
git clone -b pykaldi https://github.com/pykaldi/kaldi
You can use the scripts in the tools directory to install or update this software locally. Make sure you check the output of these scripts. If you do not see Done installing {protobuf,CLIF,Kaldi} printed at the very end, it means that the installation has failed for some reason.
cd tools
./check_dependencies.sh  # checks if system dependencies are installed
./install_protobuf.sh    # installs both the C++ library and the Python package
./install_clif.sh        # installs both the C++ library and the Python package
./install_kaldi.sh       # installs the C++ library
cd ..
Note: if you are compiling Kaldi on Apple Silicon and ./install_kaldi.sh gets stuck right at the beginning while compiling sctk, you might need to remove -march=native from tools/kaldi/tools/Makefile, e.g. by commenting it out like this:
SCTK_CXFLAGS = -w # -march=native
If Kaldi is installed inside the tools directory and all Python dependencies (numpy, pyparsing, pyclif, protobuf) are installed in the active Python environment, you can install PyKaldi with the following command.
python setup.py install
Once installed, you can run PyKaldi tests with the following command.
python setup.py test
You can then also create a whl package. The whl package makes it easy to install PyKaldi into a new environment for your speech project.
python setup.py bdist_wheel
The whl file can then be found in the "dist" folder. The whl filename depends on the pykaldi version, your Python version and your architecture. For a Python 3.9 build on x86_64 with pykaldi 0.2.2 it may look like: dist/pykaldi-0.2.2-cp39-cp39-linux_x86_64.whl
Create a new project folder, for example:
mkdir -p ~/projects/myASR
cd ~/projects/myASR
Create and activate a virtual environment with the same Python version as the whl package, e.g. for Python 3.9:
virtualenv -p /usr/bin/python3.9 myasr_env
. myasr_env/bin/activate
Install numpy and pykaldi into your myASR environment:
pip3 install numpy
pip3 install pykaldi-0.2.2-cp39-cp39-linux_x86_64.whl
Copy pykaldi/tools/install_kaldi.sh to your myASR project. Use the install_kaldi.sh script to install a pykaldi compatible kaldi version for your project:
./install_kaldi.sh
Copy pykaldi/tools/path.sh to your project. Path.sh is used to make pykaldi find the Kaldi libraries and binaries in the kaldi folder. Source path.sh with:
. path.sh
Congratulations, you are ready to use pykaldi in your project!
Note: Anytime you open a new shell, you need to source the project environment and path.sh:
. myasr_env/bin/activate
. path.sh
Note: Unfortunately, the PyKaldi Conda packages are outdated. If you would like to maintain them, please get in touch with us.
To install PyKaldi with CUDA support:
conda install -c pykaldi pykaldi
To install PyKaldi without CUDA support (CPU only):
conda install -c pykaldi pykaldi-cpu
Note that the PyKaldi conda package does not provide Kaldi executables. If you would like to use Kaldi executables along with PyKaldi, e.g. as part of read/write specifiers, you need to install Kaldi separately.
Note: The docker instructions below may be outdated. If you would like to maintain a docker image for PyKaldi, please get in touch with us.
If you would like to use PyKaldi inside a Docker container, follow the instructions in the docker folder.
By default, the PyKaldi install command uses all available (logical) processors to accelerate the build process. If the system memory is relatively small compared to the number of processors, the parallel compilation/linking jobs might end up exhausting the system memory and result in swapping. You can limit the number of parallel jobs used for building PyKaldi as follows:
MAKE_NUM_JOBS=2 python setup.py install
We have no idea what is needed to build PyKaldi on Windows. It would probably require lots of changes to the build system.
At the moment, PyKaldi is not compatible with the upstream Kaldi repository. You need to build it against our Kaldi fork.
If you already have a compatible Kaldi installation on your system, you do not need to install a new one inside the pykaldi/tools directory. Instead, you can simply set the following environment variable before running the PyKaldi installation command.
export KALDI_DIR=<directory where Kaldi is installed, e.g. "$HOME/tools/kaldi">
At the moment, PyKaldi is not compatible with the upstream CLIF repository. You need to build it using our CLIF fork.
If you already have a compatible CLIF installation on your system, you do not need to install a new one inside the pykaldi/tools directory. Instead, you can simply set the following environment variables before running the PyKaldi installation command.
export PYCLIF=<path to pyclif executable, e.g. "$HOME/anaconda3/envs/clif/bin/pyclif">
export CLIF_MATCHER=<path to clif-matcher executable, e.g. "$HOME/anaconda3/envs/clif/clang/bin/clif-matcher">
While the need for updating Protobuf and CLIF should not come up very often, you might want or need to update the Kaldi installation used for building PyKaldi. Rerunning the relevant install script in the tools directory should update the existing installation. If this does not work, please open an issue.
The PyKaldi tfrnnlm package is built automatically along with the rest of PyKaldi if the kaldi-tensorflow-rnnlm library can be found among the Kaldi libraries. After building Kaldi, go to the KALDI_DIR/src/tfrnnlm/ directory and follow the instructions given in the Makefile. Make sure the symbolic link for the kaldi-tensorflow-rnnlm library is added to the KALDI_DIR/src/lib/ directory.
- Shennong - a toolbox for speech feature extraction (MFCC, PLP, etc.) using PyKaldi.
- Kaldi model server - a threaded Kaldi model server for live decoding. It can directly decode speech from your microphone with an nnet3 compatible model. Example models for English and German are available. Uses the PyKaldi online2 decoder.
- MeetingBot - an example web application for meeting transcription and summarization that uses a pykaldi/kaldi-model-server backend to display ASR output in the browser.
- Subtitle2go - automatic subtitle generation for any media file. Uses PyKaldi for ASR with a batch decoder.
If you have a cool open source project that makes use of PyKaldi that you'd like to showcase here, let us know!
If you use PyKaldi for research, please cite our paper as follows:
@inproceedings{pykaldi,
  title = {PyKaldi: A Python Wrapper for Kaldi},
  author = {Dogan Can and Victor R. Martinez and Pavlos Papadopoulos and Shrikanth S. Narayanan},
  booktitle = {Acoustics, Speech and Signal Processing (ICASSP), 2018 IEEE International Conference on},
  year = {2018},
  organization = {IEEE}
}
We appreciate all contributions! If you find a bug, feel free to open an issueor a pull request. If you would like to request or add a new feature please openan issue for discussion.