Linear predictive coding

From Wikipedia, the free encyclopedia

Speech analysis and encoding technique

Linear predictive coding (LPC) is a method used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital signal of speech in compressed form, using the information of a linear predictive model.^[1]^[2]

LPC is the most widely used method in speech coding and speech synthesis. It is a powerful speech analysis technique, and a useful method for encoding good quality speech at a low bit rate.

Overview

LPC starts with the assumption that a speech signal is produced by a buzzer at the end of a tube (for voiced sounds), with occasional added hissing and popping sounds (for voiceless sounds such as sibilants and plosives). Although apparently crude, this source–filter model is actually a close approximation of the reality of speech production. The glottis (the space between the vocal folds) produces the buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat and mouth) forms the tube, which is characterized by its resonances; these resonances give rise to formants, or enhanced frequency bands in the sound produced. Hisses and pops are generated by the action of the tongue, lips and throat during sibilants and plosives.

LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the remaining signal after the subtraction of the filtered modeled signal is called the residue.
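The analysis step can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular codec: it assumes the autocorrelation method with the Levinson-Durbin recursion, and the function names, the predictor convention (prediction = sum of `a[k] * x[n - k]`) and the toy two-pole test signal are all choices made for this example.

```python
def autocorrelation(frame, max_lag):
    """r[k] = sum over n of frame[n] * frame[n + k], for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[1..order].

    Assumed convention: x_hat[n] = sum_{k=1..order} a[k] * x[n - k].
    Returns (a, error); a[0] is unused and left at 0.0.
    """
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # degenerate frame, e.g. silence
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a, error


def inverse_filter(frame, a):
    """Residue e[n] = x[n] - sum_k a[k] * x[n - k]; samples before the frame count as 0."""
    order = len(a) - 1
    residue = []
    for n, x in enumerate(frame):
        pred = sum(a[k] * frame[n - k] for k in range(1, min(order, n) + 1))
        residue.append(x - pred)
    return residue


# Toy "tube": impulse response of a two-pole resonator with known coefficients.
frame = [1.0]
for n in range(1, 400):
    prev2 = frame[n - 2] if n >= 2 else 0.0
    frame.append(1.3 * frame[n - 1] - 0.6 * prev2)

a, err = levinson_durbin(autocorrelation(frame, 2), order=2)
residue = inverse_filter(frame, a)
print([round(c, 4) for c in a[1:]])  # estimated resonator coefficients
```

On this toy signal the estimated coefficients come out very close to the 1.3 and -0.6 used to generate it, and the residue is essentially the single pulse that excited the resonator. Real speech frames are usually windowed first and analyzed with a higher order.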

The numbers that describe the intensity and frequency of the buzz, the formants, and the residue signal can be stored or transmitted elsewhere. LPC synthesizes the speech signal by reversing the process: use the buzz parameters and the residue to create a source signal, use the formants to create a filter (which represents the tube), and run the source through the filter, resulting in speech.
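The synthesis step described above can be sketched as an all-pole filter driven by a source signal. The sketch below is illustrative only: the function names, the use of a plain impulse train as the "buzz", and the example coefficients are assumptions made for this example, using the convention that the filter output is the source plus a weighted sum of past outputs.

```python
def impulse_train(length, period, gain=1.0):
    """A crude 'buzz': one pulse of height `gain` every `period` samples."""
    return [gain if n % period == 0 else 0.0 for n in range(length)]


def synthesize(source, a):
    """All-pole synthesis filter: y[n] = source[n] + sum_k a[k] * y[n - k].

    `a[0]` is unused; outputs before the start of the frame count as 0.
    """
    order = len(a) - 1
    out = []
    for n, s in enumerate(source):
        feedback = sum(a[k] * out[n - k] for k in range(1, min(order, n) + 1))
        out.append(s + feedback)
    return out


# Hypothetical parameters: a two-pole "tube" and a pitch period of 80 samples.
a = [0.0, 1.3, -0.6]
buzz = impulse_train(240, 80)
speech = synthesize(buzz, a)
```

If the source is the exact residue produced by inverse filtering with the same coefficients, this filter reproduces the original frame; with a synthetic buzz or noise source it produces an approximation with the same spectral envelope.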

Because speech signals vary with time, this process is done on short chunks of the speech signal, which are called frames; generally, 30 to 50 frames per second give intelligible speech with good compression.
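As a rough illustration of framing, the sketch below cuts a sample sequence into non-overlapping frames. The 8000 Hz sample rate is an assumption made for this example (the text gives only the frame rate), and the function name is hypothetical; practical coders often use overlapping, windowed frames instead.

```python
def split_into_frames(samples, sample_rate, frames_per_second):
    """Cut `samples` into consecutive, non-overlapping frames.

    A trailing partial frame is dropped in this sketch.
    """
    frame_len = sample_rate // frames_per_second
    if frame_len <= 0:
        raise ValueError("frame rate must not exceed the sample rate")
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


# One second of placeholder samples at an assumed 8000 Hz, 50 frames per second.
one_second = list(range(8000))
frames = split_into_frames(one_second, sample_rate=8000, frames_per_second=50)
print(len(frames), len(frames[0]))  # 50 160
```

At these assumed settings each frame covers 20 ms of signal, and each frame would be analyzed and coded independently.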

Early history

Linear prediction (signal estimation) goes back to at least the 1940s when Norbert Wiener developed a mathematical theory for calculating the best filters and predictors for detecting signals hidden in noise.^[3]^[4] Soon after Claude Shannon established a general theory of coding, work on predictive coding was done by C. Chapin Cutler,^[5] Bernard M. Oliver^[6] and Henry C. Harrison.^[7] Peter Elias in 1955 published two papers on predictive coding of signals.^[8]^[9]

Linear predictors were applied to speech analysis independently by Fumitada Itakura of Nagoya University and Shuzo Saito of Nippon Telegraph and Telephone in 1966 and in 1967 by Bishnu S. Atal, Manfred R. Schroeder and John Burg. Itakura and Saito described a statistical approach based on maximum likelihood estimation; Atal and Schroeder described an adaptive linear predictor approach; Burg outlined an approach based on the principle of maximum entropy.^[4]^[10]^[11]^[12]

In 1969, Itakura and Saito introduced a method based on partial correlation (PARCOR), Glen Culler proposed real-time speech encoding, and Bishnu S. Atal presented an LPC speech coder at the Annual Meeting of the Acoustical Society of America. In 1971, real-time LPC using 16-bit LPC hardware was demonstrated by Philco-Ford; four units were sold.^[13] LPC technology was advanced by Bishnu Atal and Manfred Schroeder during the 1970s–1980s.^[13] In 1978, Atal and Vishwanath et al. of BBN developed the first variable-rate LPC algorithm.^[13] The same year, Atal and Manfred R. Schroeder at Bell Labs proposed an LPC speech codec called adaptive predictive coding, which used a psychoacoustic coding algorithm exploiting the masking properties of the human ear.^[14]^[15] This later became the basis for the perceptual coding technique used by the MP3 audio compression format, introduced in 1993.^[14] Code-excited linear prediction (CELP) was developed by Schroeder and Atal in 1985.^[16]

LPC is the basis for voice-over-IP (VoIP) technology.^[13] In 1972, Bob Kahn of ARPA with Jim Forgie of Lincoln Laboratory (LL) and Dave Walden of BBN Technologies started the first developments in packetized speech, which would eventually lead to voice-over-IP technology. In 1973, according to Lincoln Laboratory's informal history, the first real-time 2400 bit/s LPC was implemented by Ed Hofstetter. In 1974, the first real-time two-way LPC packet speech communication was accomplished over the ARPANET at 3500 bit/s between Culler-Harrison and Lincoln Laboratory.

LPC coefficient representations

LPC is frequently used for transmitting spectral envelope information, and as such it has to be tolerant of transmission errors. Transmission of the filter coefficients directly (see linear prediction for a definition of coefficients) is undesirable, since they are very sensitive to errors. In other words, a very small error can distort the whole spectrum, or worse, a small error might make the prediction filter unstable.

There are more advanced representations such as log area ratios (LAR), line spectral pairs (LSP) decomposition and reflection coefficients. Of these, LSP decomposition in particular has gained popularity, since it ensures the stability of the predictor and keeps spectral errors local for small coefficient deviations.
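To make two of these representations concrete, the sketch below converts direct predictor coefficients to reflection coefficients with the step-down recursion, uses the condition that every reflection coefficient has magnitude below 1 as the stability test, and derives log area ratios. The function names and the predictor convention (prediction = sum of `a[k] * x[n - k]`, `a[0]` unused) are assumptions of this example, and the sign convention for LAR varies between texts.

```python
import math


def predictor_to_reflection(a):
    """Step-down recursion: a[1..p] -> reflection coefficients k[1..p].

    Raises ValueError if some |k| >= 1, i.e. the synthesis filter is not stable.
    """
    cur = list(a)
    p = len(a) - 1
    k = [0.0] * (p + 1)
    for i in range(p, 0, -1):
        k[i] = cur[i]
        if abs(k[i]) >= 1.0:
            raise ValueError("unstable predictor: |k[%d]| >= 1" % i)
        denom = 1.0 - k[i] * k[i]
        prev = cur[:]
        for j in range(1, i):
            prev[j] = (cur[j] + k[i] * cur[i - j]) / denom
        cur = prev
    return k[1:]


def reflection_to_predictor(k):
    """Step-up recursion, the inverse of predictor_to_reflection."""
    a = [0.0]
    for i, ki in enumerate(k, start=1):
        new_a = a + [ki]
        for j in range(1, i):
            new_a[j] = a[j] - ki * a[i - j]
        a = new_a
    return a


def log_area_ratios(reflection):
    """LAR_i = log((1 + k_i) / (1 - k_i)); one common sign convention."""
    return [math.log((1.0 + k) / (1.0 - k)) for k in reflection]


a = [0.0, 1.3, -0.6]               # hypothetical stable two-pole predictor
k = predictor_to_reflection(a)     # approximately [0.8125, -0.6]
lar = log_area_ratios(k)
a_back = reflection_to_predictor(k)
```

Any set of reflection coefficients that stays strictly inside (-1, 1) after quantization maps back to a stable filter, which is why such representations are preferred over sending the direct coefficients.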

Applications

LPC is the most widely used method in speech coding and speech synthesis.^[17] It is generally used for speech analysis and resynthesis. It is used as a form of voice compression by phone companies, for example in the GSM standard. It is also used for secure wireless, where voice must be digitized, encrypted and sent over a narrow voice channel; an early example of this is the US government's Navajo I.

LPC synthesis can be used to construct vocoders where musical instruments are used as an excitation signal to the time-varying filter estimated from a singer's speech. This is somewhat popular in electronic music. Paul Lansky made the well-known computer music piece notjustmoreidlechatter using linear predictive coding.^[18] A 10th-order LPC was used in the popular 1980s Speak & Spell educational toy.

LPC predictors are used in lossless audio codecs such as Shorten, MPEG-4 ALS and FLAC, as well as in the SILK audio codec.
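The idea behind prediction in lossless coding can be shown with integer arithmetic: the encoder stores only the prediction error, and the decoder, running the same predictor, recovers every sample exactly. The sketch below uses a fixed second-order predictor chosen purely for illustration; it is not the bitstream format of any of the codecs named above, and the function names are hypothetical.

```python
def encode_residual(samples):
    """Replace each sample by its error against pred[n] = 2*x[n-1] - x[n-2].

    The first two samples are kept verbatim as warm-up values.
    """
    residual = []
    for n, x in enumerate(samples):
        if n < 2:
            residual.append(x)
        else:
            residual.append(x - (2 * samples[n - 1] - samples[n - 2]))
    return residual


def decode_residual(residual):
    """Invert encode_residual exactly by re-running the same predictor."""
    samples = []
    for n, e in enumerate(residual):
        if n < 2:
            samples.append(e)
        else:
            samples.append(e + 2 * samples[n - 1] - samples[n - 2])
    return samples


samples = [10, 12, 15, 19, 24, 30, 37]
residual = encode_residual(samples)
print(residual)  # [10, 12, 1, 1, 1, 1, 1]
assert decode_residual(residual) == samples
```

The residual values are much smaller than the samples themselves, so a subsequent entropy coder can represent them in fewer bits while the reconstruction stays bit-exact.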

LPC has received some attention as a tool for use in the tonal analysis of violins and other stringed musical instruments.^[19]

References

1. Deng, Li; Douglas O'Shaughnessy (2003). Speech processing: a dynamic and optimization-oriented approach. Marcel Dekker. pp. 41–48. ISBN 978-0-8247-4040-5.
2. Beigi, Homayoon (2011). Fundamentals of Speaker Recognition. Berlin: Springer-Verlag. ISBN 978-0-387-77591-3.
3. B.S. Atal (2006). "The history of linear prediction". IEEE Signal Processing Magazine. 23 (2): 154–161. Bibcode:2006ISPM...23..154A. doi:10.1109/MSP.2006.1598091. S2CID 15601493.
4. Y. Sasahira; S. Hashimoto (1995). "Voice pitch changing by Linear Predictive Coding Method to keep the Singer's Personal Timbre" (PDF). Michigan Publishing.
5. US 2605361, C. C. Cutler, "Differential quantization of communication signals", published 1952-07-29.
6. B. M. Oliver (1952). "Efficient coding". The Bell System Technical Journal. 31 (4). Nokia Bell Labs: 724–750. Bibcode:1952BSTJ...31..724O. doi:10.1002/j.1538-7305.1952.tb01403.x.
7. H. C. Harrison (1952). "Experiments with linear prediction in television". Bell System Technical Journal. 31 (4): 764–783. Bibcode:1952BSTJ...31..764H. doi:10.1002/j.1538-7305.1952.tb01405.x.
8. P. Elias (1955). "Predictive coding I". IRE Trans. Inform. Theory. IT-1 no. 1: 16–24. Bibcode:1955IRTIT...1...16E. doi:10.1109/TIT.1955.1055126.
9. P. Elias (1955). "Predictive coding II". IRE Trans. Inform. Theory. IT-1 no. 1: 24–33. Bibcode:1955IRTIT...1...24E. doi:10.1109/TIT.1955.1055116.
10. S. Saito; F. Itakura (Jan 1967). "Theoretical consideration of the statistical optimum recognition of the spectral density of speech". J. Acoust. Soc. Jpn.
11. B.S. Atal; M.R. Schroeder (1967). "Predictive coding of speech". Conf. Communications and Proc.
12. J.P. Burg (1967). "Maximum Entropy Spectral Analysis". Proceedings of 37th Meeting, Society of Exploration Geophysics, Oklahoma City.
13. Gray, Robert M. (2010). "A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol" (PDF). Found. Trends Signal Process. 3 (4): 203–303. doi:10.1561/2000000036. ISSN 1932-8346. Archived (PDF) from the original on 2022-10-09.
14. Schroeder, Manfred R. (2014). "Bell Laboratories". Acoustics, Information, and Communication: Memorial Volume in Honor of Manfred R. Schroeder. Springer. p. 388. ISBN 9783319056609.
15. Atal, B.; Schroeder, M. (1978). "Predictive coding of speech signals and subjective error criteria". ICASSP '78. IEEE International Conference on Acoustics, Speech, and Signal Processing. Vol. 3. pp. 573–576. doi:10.1109/ICASSP.1978.1170564.
16. Schroeder, Manfred R.; Atal, Bishnu S. (1985). "Code-excited linear prediction (CELP): High-quality speech at very low bit rates". ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing. Vol. 10. pp. 937–940. doi:10.1109/ICASSP.1985.1168147. S2CID 14803427.
17. Gupta, Shipra (May 2016). "Application of MFCC in Text Independent Speaker Recognition" (PDF). International Journal of Advanced Research in Computer Science and Software Engineering. 6 (5): 805–810 (806). ISSN 2277-128X. S2CID 212485331. Archived from the original (PDF) on 2019-10-18. Retrieved 18 October 2019.
18. Lansky, Paul. "More Than Idle Chatter". Archived from the original on 2017-12-24. Retrieved 2024-06-02.
19. Tai, Hwan-Ching; Chung, Dai-Ting (June 14, 2012). "Stradivari Violins Exhibit Formant Frequencies Resembling Vowels Produced by Females". Savart Journal. 1 (2). Archived from the original on December 3, 2013. Retrieved March 7, 2013.
