Python toolkit for Visual Speech Recognition
georgesterpu/pyVSR
pyVSR is a Python toolkit aimed at running Visual Speech Recognition (VSR) experiments in a traditional framework (e.g. handcrafted visual features, Hidden Markov Models for pattern recognition).
The main goal of pyVSR is to easily reproduce VSR experiments in order to have a baseline result on most publicly available audio-visual datasets.
Currently supported:
TCD-TIMIT
- speaker-dependent protocol (Gillen)
- speaker-independent protocol (Gillen)
- single person
OuluVS2
- speaker-independent protocol (Saitoh)
- single person
Discrete Cosine Transform (DCT)
- Automatic ROI extraction (grayscale, RGB, DCT)
- Face alignment (from 5 stable landmarks)
- Configurable window size
- Fourth order accurate derivatives
- Sample rate interpolation
- Storage in HDF5 format
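The DCT pipeline listed above can be illustrated with a minimal sketch. This is not pyVSR's own implementation: the function names `dct_features` and `fourth_order_derivative` are hypothetical, the choice of a square top-left coefficient window is an assumption about what "configurable window size" means, and replicating the edge frames before differentiating is also an assumption. The derivative uses the standard fourth-order accurate central difference stencil.

```python
import numpy as np
from scipy.fft import dctn


def dct_features(roi, window=4):
    """Flattened low-frequency 2D DCT coefficients of a grayscale ROI.

    Keeps the top-left `window` x `window` block (assumed selection scheme).
    """
    coeffs = dctn(roi.astype(np.float64), type=2, norm="ortho")
    return coeffs[:window, :window].ravel()


def fourth_order_derivative(features):
    """Fourth-order accurate central difference along the time axis (axis 0).

    `features` has shape (num_frames, num_coeffs). The sequence is padded by
    repeating the first and last frames (an assumption), so only frames at
    least two steps away from either end get the exact stencil.
    """
    padded = np.pad(features, ((2, 2), (0, 0)), mode="edge")
    return (
        -padded[4:] + 8.0 * padded[3:-1] - 8.0 * padded[1:-3] + padded[:-4]
    ) / 12.0
```

Stacking the static coefficients with their derivatives along the feature axis, for example with `np.hstack`, would give one feature vector per frame.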
Active Appearance Models (AAM)
- Do NOT require manually annotated landmarks
- Face, lips, and chin models supported
- Parameters obtainable either through fitting or projection
- Implementation based on Menpo
Point cloud of facial landmarks
- OpenFace wrapper
- easy HTK wrapper for Python
- optional bigram language model
- multi-threaded support (both training and decoding can use all CPU cores)
- pyVSR has a simple, modular, object-oriented architecture
Please refer to the attached examples.
pyVSR was re-designed to simplify its usage on multiple datasets.
Users can provide their own dictionaries of (input, output) pairs for all of pyVSR's functionalities.
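As a rough illustration of such a dictionary, the sketch below maps each input video to the feature file that should be written for it. The helper `build_file_dict`, the `.h5` extension, and the example paths are all hypothetical; the attached examples show the actual calls that consume these dictionaries.

```python
from pathlib import Path


def build_file_dict(video_paths, output_dir, extension=".h5"):
    """Map each input video path to an output feature path.

    Hypothetical helper: output names are derived from the video file stem,
    so inputs that share a stem would collide and need a different scheme.
    """
    out = Path(output_dir)
    return {str(p): str(out / (Path(p).stem + extension)) for p in video_paths}


# Example with made-up paths: one (input, output) pair per video
pairs = build_file_dict(["videos/spk01/clip1.mp4"], "features")
```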
The recommended way is to create an empty `conda` environment and install the following dependencies:
- conda install -c menpo menpo menpofit menpodetect menpowidgets
- conda install -c menpo pango harfbuzz
- conda install h5py
- conda install natsort
- conda install scipy
Alternatively, you can use the `environment.yml` file:
- conda env create -f environment.yml
It is the user's responsibility to compile OpenFace and HTK.
Please refer to the documentation upstream:
OpenFace
HTK 3.5
Add the HTK binaries to the system path (e.g. `/usr/local/bin/`) or to `./pyVSR/bins/htk/`
Add the OpenFace binaries to `./pyVSR/bins/openface/`
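Before running an experiment, it can help to check that the binaries are actually discoverable. The sketch below is an assumption-laden example rather than part of pyVSR: `find_binary` is a hypothetical helper, and the tool names checked (`HCopy`, `HERest`, `HVite` from HTK and `FeatureExtraction` from OpenFace) are only guesses at which executables a given experiment needs.

```python
import os
import shutil


def find_binary(name, extra_dirs=()):
    """Return the full path of an executable, or None if it is not found.

    Directories in `extra_dirs` are searched before the system PATH.
    """
    search_path = os.pathsep.join(list(extra_dirs) + [os.environ.get("PATH", "")])
    return shutil.which(name, path=search_path)


# Assumed tool names and the directories suggested above
checks = {
    "HCopy": ["./pyVSR/bins/htk/"],
    "HERest": ["./pyVSR/bins/htk/"],
    "HVite": ["./pyVSR/bins/htk/"],
    "FeatureExtraction": ["./pyVSR/bins/openface/"],
}
for tool, dirs in checks.items():
    location = find_binary(tool, dirs)
    print(f"{tool}: {location if location else 'not found'}")
```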
pyVSR was initially developed on a system running Manjaro Linux, frequently updated from the `testing` repositories. We also successfully tested it on Windows systems.
If you are not interested in using the AAM module, you can skip installing a large number of Python packages. We recommend running the example scripts and installing any missing dependencies (opencv, dlib, numpy).
If you use this work, please cite it as:
George Sterpu and Naomi Harte. Towards lipreading sentences using active appearance models. In AVSP, Stockholm, Sweden, August 2017.
We are always happy to hear from you:
George Sterpu sterpug [at] tcd.ie
Naomi Harte nharte [at] tcd.ie