aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
- Version: 1.7.3
- Date: 2017-03-15
- Developed by: ReadBeyond
- Lead Developer: Alberto Pettarin
- License: the GNU Affero General Public License Version 3 (AGPL v3)
- Contact: aeneas@readbeyond.it
- Quick Links: Home - GitHub - PyPI - Docs - Tutorial - Benchmark - Mailing List - Web App
aeneas automatically generates a synchronization map between a list of text fragments and an audio file containing the narration of the text. In computer science this task is known as (automatically computing a) forced alignment.
For example, given this text file and this audio file, aeneas determines, for each fragment, the corresponding time interval in the audio file:
1 => [00:00:00.000, 00:00:02.640]
From fairest creatures we desire increase, => [00:00:02.640, 00:00:05.880]
That thereby beauty's rose might never die, => [00:00:05.880, 00:00:09.240]
But as the riper should by time decease, => [00:00:09.240, 00:00:11.920]
His tender heir might bear his memory: => [00:00:11.920, 00:00:15.280]
But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.800]
Feed'st thy light's flame with self-substantial fuel, => [00:00:18.800, 00:00:22.760]
Making a famine where abundance lies, => [00:00:22.760, 00:00:25.680]
Thy self thy foe, to thy sweet self too cruel: => [00:00:25.680, 00:00:31.240]
Thou that art now the world's fresh ornament, => [00:00:31.240, 00:00:34.400]
And only herald to the gaudy spring, => [00:00:34.400, 00:00:36.920]
Within thine own bud buriest thy content, => [00:00:36.920, 00:00:40.640]
And tender churl mak'st waste in niggarding: => [00:00:40.640, 00:00:43.640]
Pity the world, or else this glutton be, => [00:00:43.640, 00:00:48.080]
To eat the world's due, by the grave and thee. => [00:00:48.080, 00:00:53.240]
This synchronization map can be output to file in several formats, depending on its application:
- research: Audacity (AUD), ELAN (EAF), TextGrid;
- digital publishing: SMIL for EPUB 3;
- closed captioning: SubRip (SRT), SubViewer (SBV/SUB), TTML, WebVTT (VTT);
- Web: JSON;
- further processing: CSV, SSV, TSV, TXT, XML.
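As an illustration of consuming a sync map programmatically, the sketch below prints fragment intervals in the `HH:MM:SS.mmm` style used in the example above. The layout of the fragment dictionaries (a `fragments` list whose items carry `begin`, `end`, and `lines`, with times given in seconds as strings) is an assumption about the JSON output and is not stated here; inspect an actual JSON sync map before relying on it.

```python
import json


def format_time(seconds):
    """Format a number of seconds (int, float, or numeric string) as HH:MM:SS.mmm."""
    millis = int(round(float(seconds) * 1000))
    hours, rest = divmod(millis, 3600 * 1000)
    minutes, rest = divmod(rest, 60 * 1000)
    secs, millis = divmod(rest, 1000)
    return "%02d:%02d:%02d.%03d" % (hours, minutes, secs, millis)


def describe_fragments(sync_map):
    """Return one 'text => [begin, end]' line per fragment."""
    lines = []
    for fragment in sync_map["fragments"]:
        text = " ".join(fragment["lines"])
        lines.append("%s => [%s, %s]" % (
            text,
            format_time(fragment["begin"]),
            format_time(fragment["end"]),
        ))
    return lines


# Hypothetical excerpt of a JSON sync map (layout assumed, see the note above).
SAMPLE = json.loads("""
{"fragments": [
  {"begin": "0.000", "end": "2.640", "lines": ["1"]},
  {"begin": "2.640", "end": "5.880",
   "lines": ["From fairest creatures we desire increase,"]}
]}
""")

for line in describe_fragments(SAMPLE):
    print(line)
```

To use it on a real file, replace `SAMPLE` with the result of `json.load()` on the sync map, after checking that the keys match.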
- a reasonably recent machine (recommended 4 GB RAM, 2 GHz 64bit CPU)
- Python 2.7 (Linux, OS X, Windows) or 3.5 or later (Linux, OS X)
- FFmpeg
- eSpeak
- Python packages BeautifulSoup4, lxml, and numpy
- Python headers to compile the Python C/C++ extensions (optional but strongly recommended)
- A shell supporting UTF-8 (optional but strongly recommended)
aeneas has been developed and tested on Debian 64bit, with Python 2.7 and Python 3.5, which are the only supported platforms at the moment. Nevertheless, aeneas has been confirmed to work on other Linux distributions, Mac OS X, and Windows. See the PLATFORMS file for details.
If installing aeneas natively on your OS proves difficult, you are strongly encouraged to use aeneas-vagrant, which provides aeneas inside a virtualized Debian image running under VirtualBox and Vagrant, which can be installed on any modern OS (Linux, Mac OS X, Windows).
All-in-one installers are available for Mac OS X and Windows, and a Bash script for deb-based Linux distributions (Debian, Ubuntu) is provided in this repository. It is also possible to download a VirtualBox+Vagrant virtual machine. Please see the INSTALL file for detailed, step-by-step installation procedures for different operating systems.
The generic OS-independent procedure is simple:
Make sure the following executables can be called from your shell: espeak, ffmpeg, ffprobe, pip, and python
First install numpy with pip and then aeneas (this order is important):
pip install numpy
pip install aeneas
To check whether you installed aeneas correctly, run:
python -m aeneas.diagnostics
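The executables listed above can also be checked from Python before installing. This is a minimal sketch using only the standard library; it is not part of aeneas and does not replace `aeneas.diagnostics`, it only reports whether each name resolves on the current `PATH`. It relies on `shutil.which`, which requires Python 3.3 or later.

```python
import shutil

# Executable names taken from the installation procedure above.
REQUIRED_EXECUTABLES = ("espeak", "ffmpeg", "ffprobe", "pip", "python")


def find_executables(names=REQUIRED_EXECUTABLES):
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_executables(names=REQUIRED_EXECUTABLES):
    """Return the sorted names that cannot be found on PATH."""
    found = find_executables(names)
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required executables were found.")
```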
Run without arguments to get the usage message:
python -m aeneas.tools.execute_task
python -m aeneas.tools.execute_job
You can also get a list of live examples that you can immediately run on your machine thanks to the included files:
python -m aeneas.tools.execute_task --examples
python -m aeneas.tools.execute_task --examples-all
To compute a synchronization map map.json for a pair (audio.mp3, text.txt in plain text format), you can run:
python -m aeneas.tools.execute_task \
    audio.mp3 \
    text.txt \
    "task_language=eng|os_task_file_format=json|is_text_type=plain" \
    map.json
(The command has been split into lines with \ for visual clarity; in production you can have the entire command on a single line and/or you can use shell variables.)
To compute a synchronization map map.smil for a pair (audio.mp3, page.xhtml containing fragments marked by id attributes like f001), you can run:
python -m aeneas.tools.execute_task \
    audio.mp3 \
    page.xhtml \
    "task_language=eng|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \
    map.smil
As you can see, the third argument (the configuration string) specifies the parameters controlling the I/O formats and the processing options for the task. Consult the documentation for details.
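Because the configuration string is a list of `key=value` pairs joined by `|`, it can be convenient to assemble it from a data structure when scripting many runs. The sketch below builds the string and the argument list for the first command above. The helper names `build_config_string` and `build_command` are hypothetical and not part of aeneas; only the parameter keys and values are taken from the example.

```python
import sys


def build_config_string(parameters):
    """Join (key, value) pairs into a 'key=value|key=value' string."""
    return "|".join("%s=%s" % (key, value) for key, value in parameters)


def build_command(audio_path, text_path, parameters, output_path):
    """Return the argument list for running aeneas.tools.execute_task."""
    return [
        sys.executable, "-m", "aeneas.tools.execute_task",
        audio_path,
        text_path,
        build_config_string(parameters),
        output_path,
    ]


# Parameters copied from the plain-text-to-JSON example above.
PLAIN_TO_JSON = [
    ("task_language", "eng"),
    ("os_task_file_format", "json"),
    ("is_text_type", "plain"),
]

command = build_command("audio.mp3", "text.txt", PLAIN_TO_JSON, "map.json")
print(command[5])  # the configuration string

# To actually run it (requires aeneas and the input files):
# import subprocess
# subprocess.check_call(command)
```

Passing the arguments as a list avoids shell quoting problems with the `|` characters in the configuration string.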
If you have several tasks to process,you can create ajob containerto batch process them:
python -m aeneas.tools.execute_job job.zip output_directory
File job.zip should contain a config.txt or config.xml configuration file, providing aeneas with all the information needed to parse the input assets and format the output sync map files. Consult the documentation for details.
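A job container can be assembled with the standard `zipfile` module. The sketch below is illustrative only: the helper `build_job_container` is hypothetical, and placing `config.txt` at the archive root with the assets next to it (a flat layout) is an assumption. The required contents of the configuration file are not covered here and must be taken from the documentation.

```python
import os
import zipfile


def build_job_container(zip_path, config_text, asset_paths):
    """Write config.txt plus the given asset files into a ZIP container.

    config_text is the full text of the job configuration file; its keys
    are defined by the aeneas documentation and are not validated here.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as container:
        container.writestr("config.txt", config_text)
        for path in asset_paths:
            # Store each asset under its base name (assumed flat layout).
            container.write(path, arcname=os.path.basename(path))
    return zip_path


# Example call (file names are placeholders):
# build_job_container("job.zip", open("config.txt").read(),
#                     ["audio.mp3", "page.xhtml"])
```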
The documentation contains a highly suggested tutorial which explains how to use the built-in command line tools.
- Documentation: http://www.readbeyond.it/aeneas/docs/
- Command line tools tutorial: http://www.readbeyond.it/aeneas/docs/clitutorial.html
- Library tutorial: http://www.readbeyond.it/aeneas/docs/libtutorial.html
- Old, verbose tutorial: A Practical Introduction To The aeneas Package
- Mailing list: https://groups.google.com/d/forum/aeneas-forced-alignment
- Changelog: http://www.readbeyond.it/aeneas/docs/changelog.html
- High level description of how aeneas works: HOWITWORKS
- Development history: HISTORY
- Testing: TESTING
- Benchmark suite: https://readbeyond.github.io/aeneas-benchmark/
- Input text files in parsed, plain, subtitles, or unparsed (XML) format
- Multilevel input text files in mplain and munparsed (XML) format
- Text extraction from XML (e.g., XHTML) files using id and class attributes
- Arbitrary text fragment granularity (single word, subphrase, phrase, paragraph, etc.)
- Input audio file formats: all those readable by ffmpeg
- Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB, TEXTGRID, TSV, TTML, TXT, VTT, XML
- Confirmed working on 38 languages: AFR, ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, JPN, LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR
- MFCC and DTW computed via Python C extensions to reduce the processing time
- Several built-in TTS engine wrappers: AWS Polly TTS API, eSpeak (default), eSpeak-ng, Festival, MacOS (via say), Nuance TTS API
- Default TTS (eSpeak) called via a Python C extension for fast audio synthesis
- Possibility of running a custom, user-provided TTS engine Python wrapper (e.g., included example for speect)
- Batch processing of multiple audio/text pairs
- Download audio from a YouTube video
- In multilevel mode, recursive alignment from paragraph to sentence to word level
- In multilevel mode, MFCC resolution, MFCC masking, DTW margin, and TTS engine can be specified for each level independently
- Robust against misspelled/mispronounced words, local rearrangements of words, background noise/sporadic spikes
- Adjustable splitting times, including a max character/second constraint for CC applications
- Automated detection of audio head/tail
- Output an HTML file for fine tuning the sync map manually (finetuneas project)
- Execution parameters tunable at runtime
- Code suitable for Web app deployment (e.g., on-demand cloud computing instances)
- Extensive test suite including 1,200+ unit/integration/performance tests, that run and must pass before each release
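The feature list above mentions DTW (dynamic time warping). For readers unfamiliar with it, here is a textbook DTW on short numeric sequences in pure Python. It is a generic illustration of the technique, not the aeneas implementation, which according to the list runs in C extensions on MFCC features; see HOWITWORKS for the actual pipeline.

```python
def dtw(a, b, distance=lambda x, y: abs(x - y)):
    """Align non-empty sequences a and b; return (total_cost, warping_path).

    The path is a list of (index_in_a, index_in_b) pairs from start to end.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = distance(a[i - 1], b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


total, path = dtw([1, 1, 2, 3], [1, 2, 3])
print(total, path)
```

This version uses time and memory proportional to the product of the two sequence lengths, which gives some intuition for why long audio files need a lot of RAM.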
- Audio should match the text: large portions of spurious text or audio might produce a wrong sync map
- Audio is assumed to be spoken: not suitable for song captioning, YMMV for CC applications
- No protection against memory swapping: be sure your amount of RAM is adequate for the maximum duration of a single audio file (e.g., 4 GB RAM => max 2h audio; 16 GB RAM => max 10h audio)
- Open issues
A significant number of users run aeneas to align audio and text at word-level (i.e., each fragment is a word). Although aeneas was not designed with word-level alignment in mind and the results might be inferior to ASR-based forced aligners for languages with good ASR models, aeneas offers some options to improve the quality of the alignment at word-level:
- multilevel text (since v1.5.1),
- MFCC nonspeech masking (since v1.7.0, disabled by default),
- use better TTS engines, like Festival or AWS/Nuance TTS API (since v1.5.0).
If you use the aeneas.tools.execute_task command line tool, you can add the --presets-word switch to enable MFCC nonspeech masking, for example:
$ python -m aeneas.tools.execute_task --example-words --presets-word
$ python -m aeneas.tools.execute_task --example-words-multilevel --presets-word
If you use aeneas as a library, just set the appropriate RuntimeConfiguration parameters. Please see the command line tutorial for details.
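For word-level alignment, each fragment must be a single word. Assuming the plain text input format treats every line as one fragment (an assumption not stated here; verify it against the documentation), an ordinary text can be converted with a small helper such as the hypothetical `words_to_plain_text` below.

```python
def words_to_plain_text(text):
    """Return text reflowed to one whitespace-separated word per line."""
    return "".join(word + "\n" for word in text.split())


sample = "From fairest creatures\nwe desire increase,"
print(words_to_plain_text(sample), end="")
```

Punctuation stays attached to the neighbouring word, so "increase," remains a single fragment.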
aeneas is released under the terms of the GNU Affero General Public License Version 3. See the LICENSE file for details.
Licenses for third party code and files included in aeneas can be found in the licenses directory.
No copy rights were harmed in the making of this project.
July 2015: Michele Gianella generously supported the development of the boundary adjustment code (v1.0.4)
August 2015: Michele Gianella partially sponsored the port of the MFCC/DTW code to C (v1.1.0)
September 2015: friends in West Africa partially sponsored the development of the head/tail detection code (v1.2.0)
October 2015: an anonymous donation sponsored the development of the "YouTube downloader" option (v1.3.0)
April 2016: the Fruch Foundation kindly sponsored the development and documentation of v1.5.0
December 2016: the Centro Internazionale Del Libro Parlato "Adriano Sernagiotto" (Feltre, Italy) partially sponsored the development of the v1.7 series
Would you like to support the development of aeneas?
I accept sponsorships to
- fix bugs,
- add new features,
- improve the quality and the performance of the code,
- port the code to other languages/platforms, and
- improve the documentation.
Feel free to get in touch.
If you think you found a bug or you have a feature request, please use the GitHub issue tracker to submit it.
If you want to ask a question about using aeneas, your best option is to send an email to the mailing list.
Finally, code contributions are welcome! Please refer to the Code Contribution Guide for details about the branch policies and the code style to follow.
Many thanks to Nicola Montecchio, who suggested using MFCCs and DTW, and co-developed the first experimental code for aligning audio and text.
Paolo Bertasi, who developed the APIs and Web application for ReadBeyond Sync, helped shaping the structure of this package for its asynchronous usage.
Chris Hubbard prepared the files for packaging aeneas as a Debian/Ubuntu .deb.
Daniel Bair prepared the brew formula for installing aeneas and its dependencies on Mac OS X.
Daniel Bair, Chris Hubbard, and Richard Margetts packaged the installers for Mac OS X and Windows.
Firat Ozdemir contributed the finetuneas HTML/JS code for fine tuning sync maps in the browser.
Willem van der Walt contributed the code snippet to output a sync map in TextGrid format.
Chris Vaughn contributed the MacOS TTS wrapper.
All the mighty GitHub contributors, and the members of the Google Group.