| Speech Recognition & Synthesis | |
|---|---|
| Developer | Google |
| Initial release | 10 October 2013 |
| Stable release | |
| Operating system | Android 8+ |
| Discontinued | |
| Type | Screen reader |
Speech Recognition & Synthesis, formerly known as Speech Services,[3] is a screen reader application developed by Google for its Android operating system. It powers applications to read aloud (speak) the text on the screen, with support for many languages. Text-to-Speech may be used by apps such as Google Play Books for reading books aloud, Google Translate for reading aloud translations for the pronunciation of words, Google TalkBack, and other spoken-feedback accessibility applications, as well as by third-party apps. Users must install voice data for each language.
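Third-party apps do not call the Google engine directly; they typically go through the platform's `android.speech.tts.TextToSpeech` API, which routes requests to whichever speech engine is installed and selected on the device. The Kotlin sketch below illustrates that pattern; the `Speaker` class name and utterance ID are illustrative, and error handling is reduced to a language-availability check.

```kotlin
import android.content.Context
import android.speech.tts.TextToSpeech
import java.util.Locale

// Minimal wrapper around the platform TextToSpeech API. The installed engine
// (for example, Speech Recognition & Synthesis) performs the actual synthesis.
class Speaker(context: Context) : TextToSpeech.OnInitListener {

    private val tts = TextToSpeech(context, this)
    private var ready = false

    override fun onInit(status: Int) {
        if (status == TextToSpeech.SUCCESS) {
            val result = tts.setLanguage(Locale.US)
            // Voice data for each language must be installed on the device;
            // these codes indicate the data is missing or unsupported.
            ready = result != TextToSpeech.LANG_MISSING_DATA &&
                    result != TextToSpeech.LANG_NOT_SUPPORTED
        }
    }

    fun speak(text: String) {
        if (ready) {
            // QUEUE_FLUSH interrupts anything currently being spoken.
            tts.speak(text, TextToSpeech.QUEUE_FLUSH, null, "demo-utterance")
        }
    }

    fun shutdown() = tts.shutdown()
}
```

Which engine actually services these requests is chosen by the user in the device's text-to-speech output settings, so the same app code works with Google's engine or any other installed synthesizer.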
Some app developers have started adapting and tweaking their Android Auto apps to include Text-to-Speech, such as Hyundai in 2015.[4] Apps such as textPlus and WhatsApp use Text-to-Speech to read notifications aloud and provide voice-reply functionality.
Google Cloud Text-to-Speech is powered by WaveNet,[5] software created by Google's UK-based AI subsidiary DeepMind, which Google acquired in 2014.[6] The service aims to distinguish itself from competing offerings by Amazon and Microsoft.[7]
Most voice synthesizers (including Apple's Siri) use concatenative synthesis,[5] in which a program stores individual phonemes and then pieces them together to form words and sentences. WaveNet synthesizes speech with human-like emphasis and inflection on syllables, phonemes, and words. Unlike most other text-to-speech systems, a WaveNet model creates raw audio waveforms from scratch. The model uses a neural network that has been trained using a large volume of speech samples. During training, the network extracts the underlying structure of the speech, such as which tones follow each other and what a realistic speech waveform looks like. When given a text input, the trained WaveNet model can generate the corresponding speech waveforms from scratch, one sample at a time, with up to 24,000 samples per second and smooth transitions between the individual sounds.[5]
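As a rough illustration of the WaveNet-backed Cloud service described above, the Kotlin sketch below requests a WaveNet voice through the Google Cloud Text-to-Speech client library. The voice name `en-US-Wavenet-D` and the output file name are illustrative assumptions, and the 24,000 Hz sample rate simply mirrors the figure cited above.

```kotlin
import com.google.cloud.texttospeech.v1.AudioConfig
import com.google.cloud.texttospeech.v1.AudioEncoding
import com.google.cloud.texttospeech.v1.SynthesisInput
import com.google.cloud.texttospeech.v1.TextToSpeechClient
import com.google.cloud.texttospeech.v1.VoiceSelectionParams
import java.nio.file.Files
import java.nio.file.Paths

fun main() {
    // Requires Google Cloud credentials to be configured in the environment.
    TextToSpeechClient.create().use { client ->
        val input = SynthesisInput.newBuilder()
            .setText("WaveNet generates raw audio one sample at a time.")
            .build()

        // Select a WaveNet voice by name; the available names can be listed
        // at runtime with client.listVoices(...).
        val voice = VoiceSelectionParams.newBuilder()
            .setLanguageCode("en-US")
            .setName("en-US-Wavenet-D")
            .build()

        // LINEAR16 returns uncompressed PCM with a WAV header; 24,000 Hz
        // matches the sample rate mentioned above for WaveNet output.
        val audioConfig = AudioConfig.newBuilder()
            .setAudioEncoding(AudioEncoding.LINEAR16)
            .setSampleRateHertz(24000)
            .build()

        val response = client.synthesizeSpeech(input, voice, audioConfig)
        Files.write(Paths.get("output.wav"), response.audioContent.toByteArray())
    }
}
```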
The service was renamed Speech Recognition & Synthesis in 2023.[citation needed]