Speech synthesis

From Wikipedia, the free encyclopedia

This is the latest accepted revision, reviewed on 20 November 2025.

Artificial production of human speech

Automatic announcement

A synthetic voice announcing an arriving train in Sweden.

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.^[1] The reverse process is speech recognition.

Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity.^[citation needed] For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic" voice output.^[2]
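As a rough illustration of the concatenative approach, the following Python sketch joins stored units from a small in-memory database, smoothing each boundary with a linear crossfade. It is a minimal sketch under stated assumptions, not a description of any particular synthesizer: the sample rate, the unit labels, the sine tones standing in for recorded speech, and the names `make_tone`, `UNIT_DATABASE`, and `concatenate_units` are all hypothetical.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def make_tone(freq_hz, n_samples):
    """Stand-in for a recorded speech unit: a short sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


# Hypothetical unit database: diphone-like label -> stored waveform samples.
UNIT_DATABASE = {
    "h-e": make_tone(220.0, 400),
    "e-l": make_tone(330.0, 400),
    "l-o": make_tone(440.0, 400),
}


def concatenate_units(labels, database, crossfade=40):
    """Join stored units in order, linearly crossfading at each boundary."""
    output = []
    for label in labels:
        unit = database[label]  # raises KeyError if the unit was never stored
        overlap = min(crossfade, len(output), len(unit))
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)  # weight of the incoming unit
            idx = len(output) - overlap + i
            output[idx] = (1 - w) * output[idx] + w * unit[i]
        output.extend(unit[overlap:])
    return output


speech = concatenate_units(["h-e", "e-l", "l-o"], UNIT_DATABASE)
```

Smaller units such as diphones let a small database cover arbitrary input, at the cost of many more joins per utterance; storing whole words or sentences reduces the number of joins but restricts what can be said.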

The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood clearly. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written words on a home computer. The earliest computer operating system to have included a speech synthesizer was Unix in 1974, through the Unix speak utility.^[3] In 2000, Microsoft Sam was the default text-to-speech voice synthesizer used by the narrator accessibility feature, which shipped with all Windows 2000 operating systems, and subsequent Windows XP systems.

A text-to-speech system (or "engine") is composed of two parts:^[4] a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. The back-end, often referred to as the synthesizer, then converts the symbolic linguistic representation into sound. In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations),^[5] which is then imposed on the output speech.
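The front-end stages described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions rather than a real engine: the abbreviation table, the ARPAbet-like pronunciation lexicon, the comma-based phrase splitting, and the function names `normalize`, `to_phonemes`, and `front_end` are all hypothetical. Numbers are only spelled out from 0 to 99, and words missing from the lexicon are returned with `None` instead of a transcription.

```python
import re

# Hypothetical abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street", "no.": "number"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]


def normalize(text):
    """Text normalization: expand abbreviations and small numbers into words."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
            continue
        token = token.strip(".,;:!?")
        if re.fullmatch(r"[0-9]{1,2}", token):
            words.extend(number_to_words(int(token)).split())
        elif token:
            words.append(token)
    return words


# Toy pronunciation lexicon (ARPAbet-like symbols, chosen for illustration).
LEXICON = {
    "doctor": ["D", "AA", "K", "T", "ER"],
    "smith": ["S", "M", "IH", "TH"],
    "has": ["HH", "AE", "Z"],
    "twenty": ["T", "W", "EH", "N", "T", "IY"],
    "one": ["W", "AH", "N"],
    "cats": ["K", "AE", "T", "S"],
}


def to_phonemes(words):
    """Grapheme-to-phoneme step by dictionary lookup; unknown words get None."""
    return [(word, LEXICON.get(word)) for word in words]


def front_end(text):
    """Return one list of (word, phonemes) pairs per comma-delimited phrase."""
    phrases = [p for p in re.split(r"[,;]", text) if p.strip()]
    return [to_phonemes(normalize(p)) for p in phrases]


units = front_end("Dr. Smith has 21 cats, one has 9 lives")
```

Phrases are split only at commas and semicolons here because a period is ambiguous between an abbreviation ("Dr.") and a sentence end. A back-end would take the resulting word and phoneme lists, together with prosody information that this sketch does not compute, and turn them into sound.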

History

Electronic devices

Artificial intelligence

Synthesizer technologies

Concatenation synthesis

Unit selection synthesis

Diphone synthesis

Domain-specific synthesis

Formant synthesis

Articulatory synthesis

HMM-based synthesis

Sinewave synthesis

Deep learning-based synthesis

Audio deepfakes

Challenges

Text normalization challenges

Text-to-phoneme challenges

Evaluation challenges

Prosodics and emotional content

Dedicated hardware

Hardware and software systems

Texas Instruments

Mattel

SAM

Atari

Apple

Amazon

AmigaOS

Microsoft Windows

Votrax

Text-to-speech systems

Android

Internet

Open source

Others

Digital sound-alikes

Speech synthesis markup languages

Applications

Singing synthesis

See also

References

External links