JP2002062899A


Info

Publication number: JP2002062899A
Application number: JP2000251969A
Authority: JP
Inventors: Tetsujiro Kondo; 哲二郎近藤; Masaaki Hattori; 正明服部; Yasuhiro Fujimori; 泰弘藤森; Tsutomu Watanabe; 勉渡辺; Hiroto Kimura; 裕人木村
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2000-08-23
Filing date: 2000-08-23
Publication date: 2002-02-28

Abstract

PROBLEM TO BE SOLVED: To obtain high-quality synthesized sound. SOLUTION: In a receiving section 94 of a CELP (Code Excited Linear Prediction coding) portable telephone, the codes output by a channel decoder 21 are decoded into a decoded residual signal and decoded linear prediction coefficients. In a predicting section 106, predicted values of the true residual signal are obtained using the decoded residual signal and tap coefficients obtained by learning. Then, in a voice synthesis filter 29, voice synthesis is performed using the residual signal and the linear prediction coefficients obtained by the sections 106 and 107, respectively.

Description

Translated from Japanese

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a data processing device and a data processing method, a learning device and a learning method, and a recording medium, and in particular to ones that enable speech encoded by, for example, the CELP (Code Excited Linear Prediction coding) method to be decoded into high-quality speech.

[0002]

[Description of the Related Art] FIGS. 1 and 2 show the configuration of an example of a conventional portable telephone.

[0003] This portable telephone performs transmission processing, in which speech is encoded into predetermined codes by the CELP method and transmitted, and reception processing, in which codes transmitted from another portable telephone are received and decoded into speech. FIG. 1 shows the transmitting unit that performs the transmission processing, and FIG. 2 shows the receiving unit that performs the reception processing.

[0004] In the transmitting unit shown in FIG. 1, speech uttered by the user is input to a microphone 1, where it is converted into a speech signal in the form of an electric signal and supplied to an A/D (Analog/Digital) conversion unit 2. The A/D conversion unit 2 converts the analog speech signal from the microphone 1 into a digital speech signal by sampling it at a predetermined sampling frequency (for example, 8 kHz), quantizes it with a predetermined number of bits, and supplies the result to an arithmetic unit 3 and an LPC (Linear Prediction Coefficient) analysis unit 4.

[0005] The LPC analysis unit 4 performs LPC analysis on the speech signal from the A/D conversion unit 2 for each predetermined frame (for example, 160 samples) and obtains P-th order linear prediction coefficients $\alpha_1, \alpha_2, \ldots, \alpha_P$. The LPC analysis unit 4 then supplies a vector whose elements are these P-th order linear prediction coefficients $\alpha_p$ ($p = 1, 2, \ldots, P$) to a vector quantization unit 5 as a feature vector of the speech.

[0006] The vector quantization unit 5 stores a codebook that associates codes with code vectors (centroid vectors) whose elements are linear prediction coefficients, and vector-quantizes the feature vector $\alpha$ from the LPC analysis unit 4 on the basis of that codebook. The vector quantization unit 5 then supplies the code obtained as a result of the vector quantization (hereinafter referred to as the A code (A_code) where appropriate) to a code determination unit 15.

[0007] Further, the vector quantization unit 5 supplies the linear prediction coefficients $\alpha_1', \alpha_2', \ldots, \alpha_P'$, which are the elements of the code vector $\alpha'$ corresponding to the A code, to a speech synthesis filter 6.
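The codebook lookup of paragraphs [0006] and [0007] can be sketched as a nearest-neighbour search. This is a minimal sketch, not the implementation described in the patent: the squared Euclidean distance, the function name `vector_quantize`, and the three-entry second-order codebook are assumptions made purely for illustration.

```python
def vector_quantize(feature, codebook):
    """Return (a_code, code_vector) for the codebook entry nearest to feature.

    The distance measure (squared Euclidean) is an assumption; the source
    does not say which measure the vector quantization unit 5 uses.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    a_code = min(range(len(codebook)), key=lambda c: sq_dist(feature, codebook[c]))
    return a_code, codebook[a_code]


# Hypothetical codebook of second-order coefficient vectors (alpha_1', alpha_2').
codebook = [[-0.9, 0.2], [-0.5, 0.1], [0.3, -0.4]]

# The A code is what gets transmitted; the code vector feeds the synthesis filter.
a_code, alpha_q = vector_quantize([-0.6, 0.15], codebook)
```

The returned index plays the role of the A code, and the returned vector plays the role of $\alpha'$; the difference between the input vector and $\alpha'$ is the quantization error that the later paragraphs are concerned with.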

[0008] The speech synthesis filter 6 is, for example, an IIR (Infinite Impulse Response) digital filter. It performs speech synthesis using the linear prediction coefficients $\alpha_p'$ ($p = 1, 2, \ldots, P$) from the vector quantization unit 5 as the tap coefficients of the IIR filter and the residual signal e supplied from an arithmetic unit 14 as its input signal.

[0009] That is, the LPC analysis performed by the LPC analysis unit 4 assumes that the linear combination

$$ s_n + \alpha_1 s_{n-1} + \alpha_2 s_{n-2} + \cdots + \alpha_P s_{n-P} = e_n \quad \cdots (1) $$

holds for (the sample value of) the speech signal $s_n$ at the current time n and the P past sample values $s_{n-1}, s_{n-2}, \ldots, s_{n-P}$ adjacent to it. When the predicted value (linear prediction value) $s_n'$ of the sample value $s_n$ at the current time n is linearly predicted from the P past sample values $s_{n-1}, s_{n-2}, \ldots, s_{n-P}$ by

$$ s_n' = -(\alpha_1 s_{n-1} + \alpha_2 s_{n-2} + \cdots + \alpha_P s_{n-P}) \quad \cdots (2) $$

the LPC analysis obtains the linear prediction coefficients $\alpha_p$ that minimize the square error between the actual sample value $s_n$ and the linear prediction value $s_n'$.

[0010] Here, in equation (1), $\{e_n\}$ ($\ldots, e_{n-1}, e_n, e_{n+1}, \ldots$) are mutually uncorrelated random variables with a mean of 0 and a variance of a predetermined value $\sigma^2$.

[0011] From equation (1), the sample value $s_n$ can be expressed as

$$ s_n = e_n - (\alpha_1 s_{n-1} + \alpha_2 s_{n-2} + \cdots + \alpha_P s_{n-P}) \quad \cdots (3) $$

and taking the Z-transform of this yields the following equation.

[0012]

$$ S = \frac{E}{1 + \alpha_1 z^{-1} + \alpha_2 z^{-2} + \cdots + \alpha_P z^{-P}} \quad \cdots (4) $$

where S and E in equation (4) denote the Z-transforms of $s_n$ and $e_n$ in equation (3), respectively.
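The step from equation (3) to equation (4) can be written out as follows. This is a short worked derivation rather than text from the source; it assumes only the time-shift property of the Z-transform, under which $s_{n-p}$ transforms to $z^{-p} S$.

```latex
\begin{align*}
S &= E - \left(\alpha_1 z^{-1} + \alpha_2 z^{-2} + \cdots + \alpha_P z^{-P}\right) S
  && \text{(Z-transform of (3))} \\
\left(1 + \alpha_1 z^{-1} + \alpha_2 z^{-2} + \cdots + \alpha_P z^{-P}\right) S &= E \\
S &= \frac{E}{1 + \alpha_1 z^{-1} + \alpha_2 z^{-2} + \cdots + \alpha_P z^{-P}}
  && \text{(equation (4))}
\end{align*}
```

The denominator contains only negative powers of z, which is why the resulting filter is recursive (IIR): each output sample depends on the previous P output samples.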

[0013] Here, from equations (1) and (2), $e_n$ can be expressed as

$$ e_n = s_n - s_n' \quad \cdots (5) $$

and is called the residual signal between the actual sample value $s_n$ and the linear prediction value $s_n'$.
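The least-squares fit behind equations (1), (2), and (5) can be sketched as below. The source states only that the square error is minimized; the autocorrelation method with the Levinson-Durbin recursion used here is one common way of doing that and is an assumption, as are the function names `lpc_analyze` and `lpc_residual`. Samples before the start of the frame are treated as 0.

```python
def lpc_analyze(frame, order):
    """Return [alpha_1, ..., alpha_P] in the sign convention of equation (2),
    i.e. s_n is predicted by -(alpha_1*s_{n-1} + ... + alpha_P*s_{n-P})."""
    n = len(frame)
    # Autocorrelation r[k] of the frame (samples outside the frame count as 0).
    r = [sum(frame[i] * frame[i + k] for i in range(n - k))
         for k in range(order + 1)]
    a = [1.0] + [0.0] * order   # a[0] is the implicit coefficient of s_n
    err = r[0]                  # prediction error energy so far
    for m in range(1, order + 1):
        if err == 0.0:
            break               # silent or perfectly predicted frame
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err        # reflection coefficient of stage m
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + k_m * prev[m - k]
        a[m] = k_m
        err *= 1.0 - k_m * k_m
    return a[1:]


def lpc_residual(frame, alpha):
    """Residual e_n = s_n + alpha_1*s_{n-1} + ... + alpha_P*s_{n-P} (equation (1))."""
    return [
        s + sum(a * frame[n - 1 - p] for p, a in enumerate(alpha) if n - 1 - p >= 0)
        for n, s in enumerate(frame)
    ]


# A decaying exponential s_n = 0.5**n is predicted exactly by s_n' = 0.5*s_{n-1},
# so the analysis should return alpha_1 close to -0.5.
frame = [0.5 ** i for i in range(30)]
alpha = lpc_analyze(frame, 2)
residual = lpc_residual(frame, [-0.5])
```

For this test signal the residual is zero everywhere except at the first sample, which is the behaviour equation (5) describes: the residual is whatever the linear predictor cannot explain.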

[0014] Therefore, from equation (4), the speech signal $s_n$ can be obtained by using the linear prediction coefficients $\alpha_p$ as the tap coefficients of an IIR filter and the residual signal $e_n$ as the input signal of the IIR filter.

[0015] Accordingly, as described above, the speech synthesis filter 6 uses the linear prediction coefficients $\alpha_p'$ from the vector quantization unit 5 as tap coefficients and the residual signal e supplied from the arithmetic unit 14 as the input signal, computes equation (4), and obtains a speech signal (synthesized sound signal) ss.
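In the time domain, computing equation (4) amounts to running the recursion of equation (3). The following is a minimal sketch under the assumption that the filter starts from a zero state; the function name `synthesize` is hypothetical.

```python
def synthesize(residual, alpha):
    """All-pole (IIR) synthesis: ss_n = e_n - sum_p alpha_p * ss_{n-p}.

    alpha holds [alpha_1, ..., alpha_P]; output samples before the start of
    the input are taken to be 0 (an assumption, not stated in the source).
    """
    out = []
    for n, e in enumerate(residual):
        feedback = sum(a * out[n - 1 - p]
                       for p, a in enumerate(alpha) if n - 1 - p >= 0)
        out.append(e - feedback)
    return out


# A unit impulse through a first-order filter with alpha_1 = -0.5 gives the
# decaying sequence 1, 0.5, 0.25, 0.125.
ss = synthesize([1.0, 0.0, 0.0, 0.0], [-0.5])
```

Feeding the filter the exact residual and the exact coefficients reproduces the original signal; any error in either input shows up in the synthesized sound.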

[0016] Note that the speech synthesis filter 6 uses not the linear prediction coefficients $\alpha_p$ obtained by the LPC analysis in the LPC analysis unit 4 but the linear prediction coefficients $\alpha_p'$ of the code vector corresponding to the code obtained by vector-quantizing them. The synthesized sound signal output by the speech synthesis filter 6 is therefore basically not identical to the speech signal output by the A/D conversion unit 2.

[0017] The synthesized sound signal ss output by the speech synthesis filter 6 is supplied to the arithmetic unit 3. The arithmetic unit 3 subtracts the speech signal s output by the A/D conversion unit 2 from the synthesized sound signal ss from the speech synthesis filter 6 and supplies the difference to a square error calculation unit 7. The square error calculation unit 7 calculates the sum of squares of the differences from the arithmetic unit 3 (the sum of squares over the sample values of the k-th frame) and supplies the resulting square error to a square error minimum determination unit 8.

[0018] The square error minimum determination unit 8 stores, in association with the square error output by the square error calculation unit 7, an L code (L_code) as a code representing a lag, a G code (G_code) as a code representing a gain, and an I code (I_code) as a code representing a codeword, and outputs the L code, G code, and I code corresponding to the square error output by the square error calculation unit 7. The L code is supplied to an adaptive codebook storage unit 9, the G code to a gain decoder 10, and the I code to an excitation codebook storage unit 11. The L code, G code, and I code are also supplied to the code determination unit 15.

[0019] The adaptive codebook storage unit 9 stores an adaptive codebook that associates L codes with predetermined delay times (lags). It delays the residual signal e supplied from the arithmetic unit 14 by the delay time associated with the L code supplied from the square error minimum determination unit 8 and outputs the result to an arithmetic unit 12.

[0020] Because the adaptive codebook storage unit 9 outputs the residual signal e delayed by the time corresponding to the L code, its output is a periodic signal whose period is that delay time. In speech synthesis using linear prediction coefficients in the speech synthesis filter 6, this periodic signal mainly serves as a driving signal for generating synthesized voiced sounds.

[0021] The gain decoder 10 stores a table that associates G codes with predetermined gains β and γ, and outputs the gains β and γ associated with the G code supplied from the square error minimum determination unit 8. The gains β and γ are supplied to the arithmetic units 12 and 13, respectively.

[0022] The excitation codebook storage unit 11 stores an excitation codebook that associates I codes with predetermined excitation signals, and outputs the excitation signal associated with the I code supplied from the square error minimum determination unit 8 to the arithmetic unit 13.

[0023] The excitation signals stored in the excitation codebook are, for example, white noise or the like. In speech synthesis using linear prediction coefficients in the speech synthesis filter 6, they mainly serve as driving signals for generating synthesized unvoiced sounds.

[0024] The arithmetic unit 12 multiplies the output signal of the adaptive codebook storage unit 9 by the gain β output by the gain decoder 10 and supplies the product l to the arithmetic unit 14. The arithmetic unit 13 multiplies the output signal of the excitation codebook storage unit 11 by the gain γ output by the gain decoder 10 and supplies the product n to the arithmetic unit 14. The arithmetic unit 14 adds the product l from the arithmetic unit 12 and the product n from the arithmetic unit 13 and supplies the sum to the speech synthesis filter 6 as the residual signal e.
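The way the units 9 to 14 combine into one residual frame can be sketched as follows. This is an illustrative reading of the description, not the patent's implementation: the function name `make_residual`, the sample-by-sample loop, and the representation of the delay line as a Python list are assumptions.

```python
def make_residual(history, lag, beta, codeword, gamma):
    """Build one frame of the residual signal e.

    history  : past residual samples, oldest first (the delay line of unit 9)
    lag      : delay in samples selected by the L code
    beta     : gain applied to the delayed residual (unit 12)
    codeword : excitation signal selected by the I code (unit 11)
    gamma    : gain applied to the excitation signal (unit 13)
    """
    if lag < 1 or lag > len(history):
        raise ValueError("lag must be between 1 and len(history)")
    line = list(history)
    frame = []
    for c in codeword:
        delayed = line[-lag]                # residual from `lag` samples ago
        e = beta * delayed + gamma * c      # sum formed by unit 14
        frame.append(e)
        line.append(e)                      # new residual feeds the delay line
    return frame


# With a silent excitation the output simply repeats the history with
# period `lag`, scaled by beta on every pass.
periodic = make_residual([1.0, 2.0], 2, 0.5, [0.0, 0.0, 0.0, 0.0], 1.0)
```

With β close to 1 and a small γ the frame is dominated by the periodic component, which corresponds to voiced sound; with a small β and a large γ it is dominated by the noise-like codeword, which corresponds to unvoiced sound.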

[0025] In the speech synthesis filter 6, the residual signal e supplied from the arithmetic unit 14 in this way is filtered, as the input signal, by the IIR filter whose tap coefficients are the linear prediction coefficients $\alpha_p'$ supplied from the vector quantization unit 5, and the resulting synthesized sound signal is supplied to the arithmetic unit 3. The arithmetic unit 3 and the square error calculation unit 7 then perform the same processing as described above, and the resulting square error is supplied to the square error minimum determination unit 8.

[0026] The square error minimum determination unit 8 determines whether the square error from the square error calculation unit 7 has become minimum (a local minimum). If it determines that the square error is not minimum, it outputs the L code, G code, and I code corresponding to that square error as described above, and the same processing is repeated.

[0027] On the other hand, when the square error minimum determination unit 8 determines that the square error has become minimum, it outputs a confirmation signal to the code determination unit 15. The code determination unit 15 successively latches the A code supplied from the vector quantization unit 5 and the L code, G code, and I code supplied from the square error minimum determination unit 8. On receiving the confirmation signal from the square error minimum determination unit 8, it supplies the A code, L code, G code, and I code latched at that time to a channel encoder 16. The channel encoder 16 multiplexes the A code, L code, G code, and I code from the code determination unit 15 and outputs them as code data. This code data is transmitted over a transmission path.
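The closed loop formed by units 3 and 6 to 15 can be sketched as a search over code combinations. The source says only that codes are tried until the square error is minimum; the exhaustive search below is a stand-in for that procedure, and the function names, the tiny codebooks, and the zero initial filter state are all assumptions made for illustration.

```python
from itertools import product


def synthesize(residual, alpha):
    """All-pole synthesis, zero initial state: ss_n = e_n - sum_p alpha_p*ss_{n-p}."""
    out = []
    for n, e in enumerate(residual):
        out.append(e - sum(a * out[n - 1 - p]
                           for p, a in enumerate(alpha) if n - 1 - p >= 0))
    return out


def make_residual(history, lag, beta, codeword, gamma):
    """Residual frame: beta * delayed residual + gamma * excitation codeword."""
    line = list(history)
    frame = []
    for c in codeword:
        e = beta * line[-lag] + gamma * c
        frame.append(e)
        line.append(e)
    return frame


def search_codes(target, alpha, history, lags, gains, codewords):
    """Return the (L_code, G_code, I_code) whose synthesized frame has the
    smallest square error against the target speech frame."""
    best_err, best_codes = None, None
    for l_code, g_code, i_code in product(range(len(lags)),
                                          range(len(gains)),
                                          range(len(codewords))):
        beta, gamma = gains[g_code]
        e = make_residual(history, lags[l_code], beta, codewords[i_code], gamma)
        ss = synthesize(e, alpha)
        err = sum((a - b) ** 2 for a, b in zip(ss, target))
        if best_err is None or err < best_err:
            best_err, best_codes = err, (l_code, g_code, i_code)
    return best_codes


# Hypothetical codebooks: two lags, two gain pairs (beta, gamma), two codewords.
lags = [1, 2]
gains = [(0.5, 1.0), (0.0, 2.0)]
codewords = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
history, alpha = [1.0, 2.0], [-0.5]

# Build a target from known codes and check that the search recovers them.
target = synthesize(make_residual(history, 2, 0.5, codewords[1], 1.0), alpha)
codes = search_codes(target, alpha, history, lags, gains, codewords)
```

Because the error is measured on the synthesized speech rather than on the residual itself, the selected codes are the ones that sound closest after filtering, which is the point of routing the candidates through the speech synthesis filter 6 inside the encoder.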

[0028] In the following, to simplify the description, the A code, L code, G code, and I code are assumed to be obtained for each frame. However, it is also possible, for example, to divide one frame into four subframes and obtain the L code, G code, and I code for each subframe.

[0029] In FIG. 1 (and likewise in FIGS. 2, 11, and 12 described later), each variable is suffixed with [k] and treated as an array variable. This k represents the frame number, but it is omitted in the specification where appropriate.

[0030] The code data transmitted in this way from the transmitting unit of another portable telephone is received by a channel decoder 21 of the receiving unit shown in FIG. 2. The channel decoder 21 separates the L code, G code, I code, and A code from the code data and supplies them to an adaptive codebook storage unit 22, a gain decoder 23, an excitation codebook storage unit 24, and a filter coefficient decoder 25, respectively.

[0031] The adaptive codebook storage unit 22, the gain decoder 23, the excitation codebook storage unit 24, and arithmetic units 26 to 28 are configured in the same way as the adaptive codebook storage unit 9, the gain decoder 10, the excitation codebook storage unit 11, and the arithmetic units 12 to 14 of FIG. 1, respectively. By performing the same processing as described for FIG. 1, the L code, G code, and I code are decoded into the residual signal e. This residual signal (decoded residual signal) e is given to a speech synthesis filter 29 as its input signal.

[0032] The filter coefficient decoder 25 stores the same codebook as that stored in the vector quantization unit 5 of FIG. 1, decodes the A code into the linear prediction coefficients $\alpha_p'$, and supplies them to the speech synthesis filter 29.

[0033] The speech synthesis filter 29 is configured in the same way as the speech synthesis filter 6 of FIG. 1. Using the linear prediction coefficients (decoded linear prediction coefficients) $\alpha_p'$ from the filter coefficient decoder 25 as tap coefficients and the residual signal e supplied from the arithmetic unit 28 as the input signal, it computes equation (4) and thereby generates the synthesized sound signal obtained when the square error minimum determination unit 8 of FIG. 1 determined the square error to be minimum. This synthesized sound signal is supplied to a D/A (Digital/Analog) conversion unit 30. The D/A conversion unit 30 converts the synthesized sound signal from the speech synthesis filter 29 from a digital signal into an analog signal and supplies it to a speaker 31 for output.

[0034]

[Problems to Be Solved by the Invention] As described above, the transmitting unit (FIG. 1) of the portable telephone encodes and transmits the residual signal and the linear prediction coefficients that serve as the filter data given to the speech synthesis filter 29 of the receiving unit (FIG. 2), and the receiving unit decodes the codes into a residual signal and linear prediction coefficients. However, the decoded residual signal and linear prediction coefficients (hereinafter referred to as the decoded residual signal and the decoded linear prediction coefficients, respectively, where appropriate) contain errors such as quantization errors, and therefore do not match the residual signal and linear prediction coefficients obtained by LPC analysis of the speech.

[0035] As a result, the synthesized sound signal output by the speech synthesis filter 29 of the receiving unit may be distorted and degraded in sound quality.

[0036] The present invention has been made in view of this situation and makes it possible to obtain high-quality synthesized sound.

[0037]

[Means for Solving the Problems] The data processing device of the present invention comprises: code decoding means for decoding a code and outputting decoded filter data; acquisition means for acquiring predetermined tap coefficients obtained by learning; and prediction means for obtaining a predicted value of the filter data by performing a predetermined prediction operation using the tap coefficients and the decoded filter data, and supplying the predicted value to a speech synthesis filter.

[0038] The data processing method of the present invention comprises: a code decoding step of decoding a code and outputting decoded filter data; an acquisition step of acquiring predetermined tap coefficients obtained by learning; and a prediction step of obtaining a predicted value of the filter data by performing a predetermined prediction operation using the tap coefficients and the decoded filter data, and supplying the predicted value to a speech synthesis filter.

[0039] The first recording medium of the present invention has recorded on it a program comprising: a code decoding step of decoding a code and outputting decoded filter data; an acquisition step of acquiring predetermined tap coefficients obtained by learning; and a prediction step of obtaining a predicted value of the filter data by performing a predetermined prediction operation using the tap coefficients and the decoded filter data, and supplying the predicted value to a speech synthesis filter.

[0040] The learning device of the present invention comprises: code decoding means for decoding a code corresponding to filter data and outputting decoded filter data; and learning means for obtaining tap coefficients by performing learning such that the prediction error of the predicted value of the filter data, obtained by performing a prediction operation using the tap coefficients and the decoded filter data, is statistically minimized.

[0041] The learning method of the present invention comprises: a code decoding step of decoding a code corresponding to filter data and outputting decoded filter data; and a learning step of obtaining tap coefficients by performing learning such that the prediction error of the predicted value of the filter data, obtained by performing a prediction operation using the tap coefficients and the decoded filter data, is statistically minimized.

[0042] The second recording medium of the present invention has recorded on it a program comprising: a code decoding step of decoding a code corresponding to filter data and outputting decoded filter data; and a learning step of obtaining tap coefficients by performing learning such that the prediction error of the predicted value of the filter data, obtained by performing a prediction operation using the tap coefficients and the decoded filter data, is statistically minimized.

[0043] In the data processing device, the data processing method, and the first recording medium of the present invention, a code is decoded and decoded filter data is output. Further, predetermined tap coefficients obtained by learning are acquired, and a predicted value of the filter data is obtained by performing a predetermined prediction operation using the tap coefficients and the decoded filter data.

[0044] In the learning device, the learning method, and the second recording medium of the present invention, a code corresponding to filter data is decoded and decoded filter data is output. Learning is then performed such that the prediction error of the predicted value of the filter data, obtained by performing a prediction operation using the tap coefficients and the decoded filter data, is statistically minimized, and the tap coefficients are obtained.

[0045]

[Embodiments of the Invention] FIG. 3 shows a configuration example of an embodiment of a speech synthesis device to which the present invention is applied.

[0046] This speech synthesis device is supplied with code data in which a residual code and an A code, obtained by encoding the residual signal and the linear prediction coefficients to be given to a speech synthesis filter 47, are multiplexed. The residual signal and the linear prediction coefficients are obtained from the residual code and the A code, respectively, and given to the speech synthesis filter 47 to generate synthesized sound.

[0047] However, when the residual code is decoded into a residual signal on the basis of a codebook that associates residual signals with residual codes, the decoded residual signal contains errors, as described above, and the sound quality of the synthesized sound deteriorates. Likewise, when the A code is decoded into linear prediction coefficients on the basis of a codebook that associates linear prediction coefficients with A codes, the decoded linear prediction coefficients contain errors, and the sound quality of the synthesized sound deteriorates.

[0048] The speech synthesis device of FIG. 3 therefore obtains predicted values of the true residual signal and the true linear prediction coefficients by performing a prediction operation using tap coefficients obtained by learning, and generates high-quality synthesized sound by using these predicted values.

[0049] That is, in the speech synthesis device of FIG. 3, the decoded linear prediction coefficients are decoded into (predicted values of) the true linear prediction coefficients by using, for example, class classification adaptive processing.

[0050] Class classification adaptive processing consists of class classification processing and adaptive processing. The class classification processing divides data into classes according to its properties, and the adaptive processing is applied to each class. The adaptive processing is the following technique.

[0051] In the adaptive processing, a predicted value of the true linear prediction coefficient is obtained by, for example, a linear combination of decoded linear prediction coefficients and predetermined tap coefficients.

[0052] Specifically, suppose, for example, that the true linear prediction coefficients are used as teacher data, and that the decoded linear prediction coefficients, obtained by vector-quantizing the true linear prediction coefficients on the basis of a predetermined codebook and then decoding the resulting A code on the basis of the codebook used for the vector quantization, are used as student data. Consider obtaining the predicted value E[y] of a linear prediction coefficient y, which is teacher data, by a linear combination model defined by a linear combination of a set of several decoded linear prediction coefficients $x_1, x_2, \ldots$ and predetermined tap coefficients $w_1, w_2, \ldots$. In this case, the predicted value E[y] can be expressed by the following equation.

[0053]

$$ E[y] = w_1 x_1 + w_2 x_2 + \cdots \quad \cdots (6) $$
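Equation (6), applied per class as outlined in the description of class classification adaptive processing, can be sketched as below. The way classes are formed is not described at this point in the text, so the one-bit `classify` rule, the tap table, and the function names are purely hypothetical placeholders.

```python
def classify(x):
    """Hypothetical 1-bit class: 1 if the first decoded coefficient is negative."""
    return 1 if x[0] < 0 else 0


def predict(x, tap_table):
    """Equation (6): E[y] = w_1*x_1 + w_2*x_2 + ..., with the tap
    coefficients w_j selected by the class of the input."""
    w = tap_table[classify(x)]
    return sum(wj * xj for wj, xj in zip(w, x))


# Made-up tap coefficients for the two classes; in the scheme described,
# these would come from learning.
tap_table = {0: [1.0, 0.0], 1: [0.5, 0.5]}

y_neg = predict([-0.5, 0.25], tap_table)   # class 1: 0.5*(-0.5) + 0.5*0.25
y_pos = predict([0.5, 0.25], tap_table)    # class 0: 1.0*0.5 + 0.0*0.25
```

Keeping a separate set of tap coefficients per class lets a simple linear model behave differently for inputs with different properties, which is the motivation for combining classification with the linear prediction of equation (6).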

[0054] To generalize equation (6), define the matrix W consisting of the set of tap coefficients $w_j$, the matrix X consisting of the set of student data $x_{ij}$, and the matrix Y' consisting of the set of predicted values $E[y_j]$ by

[Math. 1]

Then the following observation equation holds.

[0055]

$$ XW = Y' \quad \cdots (7) $$

[0056] Here, the component $x_{ij}$ of the matrix X denotes the j-th student data in the i-th set of student data (the set of student data used to predict the i-th teacher data $y_i$), and the component $w_j$ of the matrix W denotes the tap coefficient that is multiplied by the j-th student data in a set of student data. Further, $y_i$ denotes the i-th teacher data, so $E[y_i]$ denotes the predicted value of the i-th teacher data. Note that y on the left side of equation (6) is the component $y_i$ of the matrix Y with the suffix i omitted, and $x_1, x_2, \ldots$ on the right side of equation (6) are likewise the components $x_{ij}$ of the matrix X with the suffix i omitted.

[0057] Now consider applying the least squares method to this observation equation to obtain a predicted value E[y] close to the true linear prediction coefficient y. In this case, if the matrix Y consisting of the set of true linear prediction coefficients y serving as teacher data and the matrix E consisting of the set of residuals e of the predicted values E[y] with respect to the linear prediction coefficients y are defined by

[Math. 2]

the following residual equation holds from equation (7).

[0058] XW = Y + E (8)

[0059] In this case, the tap coefficients w_j for obtaining a predicted value E[y] close to the true linear prediction coefficient y can be found by minimizing the squared error

(Equation 3)

[0060] Therefore, the tap coefficients w_j for which the derivative of the above squared error with respect to w_j is zero, that is, the tap coefficients w_j satisfying the following equation, are the optimum values for obtaining a predicted value E[y] close to the true linear prediction coefficient y.

[0061] (Equation 4) ... (9)

[0062] First, differentiating equation (8) with respect to the tap coefficients w_j yields the following equation.

[0063] (Equation 5) ... (10)

[0064] Equation (11) is obtained from equations (9) and (10).

[0065] (Equation 6) ... (11)

[0066] Further, taking into account the relationship among the student data x_ij, the tap coefficients w_j, the teacher data y_i, and the errors e_i in the residual equation (8), the following normal equations are obtained from equation (11).

[0067] (Equation 7) ... (12)

[0068] The normal equations shown in equation (12) can be expressed as

AW = v (13)

where the matrix (covariance matrix) A and the vector v are defined by

(Equation 8)

and the vector W is defined as shown in Equation 1.
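The steps from the residual equation (8) to the matrix form (13) can be written out as follows. This is a reconstruction from the definitions in the surrounding paragraphs, not the original typesetting of Equations 3 through 8; the upper limit I (the number of teacher data) is a symbol assumed here, while J is the number of tap coefficients named in the text.

```latex
\begin{align*}
e_i &= \sum_{k=1}^{J} x_{ik} w_k - y_i , & i &= 1, \dots, I
  && \text{(row $i$ of $XW = Y + E$, eq. (8))} \\
\sum_{i=1}^{I} e_i \frac{\partial e_i}{\partial w_j} &= 0 , & j &= 1, \dots, J
  && \text{(derivative of $\textstyle\sum_i e_i^2$ set to zero, cf. (9))} \\
\frac{\partial e_i}{\partial w_j} &= x_{ij}
  &&&& \text{(cf. (10))} \\
\sum_{i=1}^{I} e_i x_{ij} &= 0 , & j &= 1, \dots, J
  && \text{(cf. (11))} \\
\sum_{k=1}^{J} \Bigl( \sum_{i=1}^{I} x_{ij} x_{ik} \Bigr) w_k
  &= \sum_{i=1}^{I} x_{ij} y_i , & j &= 1, \dots, J
  && \text{(normal equations, cf. (12))} \\
A = X^{\mathsf T} X , \quad v &= X^{\mathsf T} Y , \quad A W = v
  &&&& \text{(cf. (13))}
\end{align*}
```

Under this reading, the entry of A in row j and column k is the sum over i of x_ij x_ik, and the j-th entry of v is the sum over i of x_ij y_i.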

[0069] By preparing a sufficient number of sets of student data x_ij and teacher data y_i, as many normal equations in equation (12) can be set up as the number J of tap coefficients w_j to be obtained. Accordingly, solving equation (13) for the vector W yields the optimum tap coefficients w_j (here, the tap coefficients that minimize the squared error); note that solving equation (13) requires the matrix A to be nonsingular. Equation (13) can be solved by, for example, the sweep-out method (Gauss-Jordan elimination).
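As an illustration of solving equation (13), the following is a minimal Python sketch that accumulates A and v from learning pairs and solves AW = v by Gauss-Jordan elimination. The function names, the pivoting strategy, and the tolerance `eps` are assumptions made for this example and are not taken from the specification.

```python
def gauss_jordan_solve(a, v, eps=1e-12):
    """Solve A w = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy [A | v] so the caller's data is untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, v)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < eps:
            raise ValueError("matrix A is singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        # Sweep the pivot column out of every other row.
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


def learn_tap_coefficients(student_sets, teacher_values):
    """Accumulate A = sum(x x^T) and v = sum(x y), then solve A w = v."""
    j = len(student_sets[0])
    a = [[0.0] * j for _ in range(j)]
    v = [0.0] * j
    for x, y in zip(student_sets, teacher_values):
        for r in range(j):
            v[r] += x[r] * y
            for c in range(j):
                a[r][c] += x[r] * x[c]
    return gauss_jordan_solve(a, v)
```

With too few or linearly dependent student data sets, A becomes singular and the sketch raises an error, which corresponds to the requirement in the text that A be nonsingular.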

[0070] The adaptive processing consists of obtaining the optimum tap coefficients w_j as described above and then using those tap coefficients w_j in the prediction operation of equation (6) to obtain a predicted value E[y] close to the true linear prediction coefficient y.

[0071] Note that, for example, suppose that linear prediction coefficients obtained by LPC analysis of a speech signal sampled at a high sampling frequency, or of a speech signal to which many bits are assigned, are used as the teacher data, and that decoded linear prediction coefficients, obtained by LPC-analyzing a speech signal sampled at a low sampling frequency or a speech signal to which few bits are assigned, vector-quantizing the result, and decoding the vector quantization result, are used as the student data. The resulting tap coefficients then yield linear prediction coefficients whose prediction error is statistically minimal for generating the speech signal sampled at the high sampling frequency or the speech signal to which many bits are assigned. In this case, therefore, synthesized sound of higher quality can be obtained.

[0072] In the speech synthesis apparatus of FIG. 3, the class classification adaptive processing described above decodes the decoded linear prediction coefficients into (predicted values of) the true linear prediction coefficients, and likewise decodes the decoded residual signal into (a predicted value of) the true residual signal.

[0073] That is, code data is supplied to a demultiplexer (DEMUX) 41. The demultiplexer 41 separates the A code and the residual code of each frame from the code data supplied to it, and supplies them to a filter coefficient decoder 42A and a residual codebook storage unit 42E, respectively.

[0074] Here, the A code and the residual code included in the code data in FIG. 3 are codes obtained by vector-quantizing, with predetermined codebooks, the linear prediction coefficients and the residual signal obtained by LPC analysis of speech in units of predetermined frames.

[0075] The filter coefficient decoder 42A decodes the A code of each frame supplied from the demultiplexer 41 into decoded linear prediction coefficients, based on the same codebook that was used to obtain the A code, and supplies them to a tap generation unit 43A.

[0076] The residual codebook storage unit 42E stores the same codebook that was used to obtain the residual code of each frame supplied from the demultiplexer 41. It decodes the residual code from the demultiplexer 41 into a decoded residual signal based on that codebook, and supplies it to a tap generation unit 43E.
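The codebook-based decoding described above can be sketched as a table lookup. The following Python example is illustrative only: the codebook contents, the nearest-neighbor encoding rule, and the names `vector_quantize`, `decode`, and `A_CODEBOOK` are assumptions, and the encoder side is included only to show that both sides share the same codebook.

```python
def vector_quantize(vector, codebook):
    """Return the code whose code vector is nearest to `vector` (squared error)."""
    return min(
        codebook,
        key=lambda code: sum((a - b) ** 2 for a, b in zip(codebook[code], vector)),
    )


def decode(code, codebook):
    """Look the code up in the same codebook to recover the code vector."""
    return list(codebook[code])


# Toy codebook (made-up values): code -> code vector of P = 2 coefficients.
A_CODEBOOK = {0: [0.9, -0.2], 1: [0.5, 0.1], 2: [-0.3, 0.4]}

a_code = vector_quantize([0.55, 0.05], A_CODEBOOK)  # encoder side
decoded_lpc = decode(a_code, A_CODEBOOK)            # decoder side
```

The decoded vector generally differs from the vector that was encoded; that quantization error is what the subsequent prediction stage is meant to reduce.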

[0077] From the decoded linear prediction coefficients of each frame supplied from the filter coefficient decoder 42A, the tap generation unit 43A extracts those to be used as class taps for class classification in a class classification unit 44A, described later, and those to be used as prediction taps for the prediction operation in a prediction unit 46A, also described later. That is, the tap generation unit 43A uses, for example, all the decoded linear prediction coefficients of the frame currently being processed as both the class taps and the prediction taps for the linear prediction coefficients. The tap generation unit 43A then supplies the class taps for the linear prediction coefficients to the class classification unit 44A and the prediction taps to the prediction unit 46A.

[0078] From the decoded residual signal of each frame supplied from the residual codebook storage unit 42E, the tap generation unit 43E extracts the samples to be used as class taps and those to be used as prediction taps. That is, the tap generation unit 43E uses, for example, all the sample values of the decoded residual signal of the frame currently being processed as both the class taps and the prediction taps for the residual signal. The tap generation unit 43E then supplies the class taps for the residual signal to a class classification unit 44E and the prediction taps to a prediction unit 46E.

[0079] Here, the configuration patterns of the prediction taps and the class taps are not limited to those described above.

[0080] The tap generation unit 43A may also extract the class taps and prediction taps for the linear prediction coefficients from both the decoded linear prediction coefficients and the decoded residual signal. It may further extract them from the A code and the residual code, or from signals already output by the downstream prediction units 46A and 46E and the synthesized sound signal already output by a speech synthesis filter 47. The tap generation unit 43E can extract the class taps and prediction taps for the residual signal in the same manner.

[0081] Based on the class taps for the linear prediction coefficients from the tap generation unit 43A, the class classification unit 44A classifies the linear prediction coefficients of the frame of interest (the frame for which predicted values of the true linear prediction coefficients are to be obtained), and outputs the class code corresponding to the resulting class to a coefficient memory 45A.

[0082] Here, ADRC (Adaptive Dynamic Range Coding), for example, can be adopted as the method of class classification.

[0083] In the method using ADRC, the decoded linear prediction coefficients constituting the class taps are subjected to ADRC processing, and the class of (the linear prediction coefficients of) the frame of interest is determined according to the resulting ADRC code.

[0084] In K-bit ADRC, for example, the maximum value MAX and the minimum value MIN of the decoded linear prediction coefficients constituting the class taps are detected, DR = MAX - MIN is taken as the local dynamic range of the set, and the decoded linear prediction coefficients constituting the class taps are requantized to K bits based on this dynamic range DR. That is, the minimum value MIN is subtracted from each decoded linear prediction coefficient constituting the class taps, and the result is divided (quantized) by DR/2^K. The K-bit values thus obtained for the decoded linear prediction coefficients constituting the class taps are arranged in a predetermined order, and the resulting bit string is output as the ADRC code. Thus, when the class taps are subjected to, for example, 1-bit ADRC processing, the minimum value MIN is subtracted from each decoded linear prediction coefficient constituting the class taps, and the result is divided by the average of the maximum value MAX and the minimum value MIN, so that each decoded linear prediction coefficient becomes 1 bit (is binarized). The bit string in which these 1-bit decoded linear prediction coefficients are arranged in a predetermined order is output as the ADRC code.
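The K-bit requantization can be sketched as below. This is a minimal Python illustration of the DR/2^K rule stated above, not the circuit of the embodiment: the function name `adrc_code`, the clamping of MAX into the top level, the handling of a flat tap (DR = 0), and the packing order (first tap value in the most significant bits) are all assumptions made for the example.

```python
def adrc_code(class_tap, k_bits=1):
    """Requantize each tap value to k_bits and pack the results into one integer."""
    lo, hi = min(class_tap), max(class_tap)
    dr = hi - lo  # local dynamic range DR = MAX - MIN
    levels = 1 << k_bits
    code = 0
    for value in class_tap:
        if dr == 0:
            q = 0  # flat tap: every value maps to level 0 (assumption)
        else:
            # Subtract MIN, divide by DR / 2^K, and clamp MAX into the top level.
            q = min(int((value - lo) / (dr / levels)), levels - 1)
        code = (code << k_bits) | q
    return code
```

For K = 1 this rule amounts to comparing each value with the midpoint between MIN and MAX, so a class tap of P values yields a P-bit class code.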

[0085] The class classification unit 44A could, for example, output the sequence of values of the decoded linear prediction coefficients constituting the class taps directly as the class code. In that case, however, if the class taps consist of P-th order decoded linear prediction coefficients and K bits are assigned to each decoded linear prediction coefficient, the number of possible class codes output by the class classification unit 44A is (2^P)^K, an enormous number that grows exponentially with the number of bits K of the decoded linear prediction coefficients.
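To give a sense of scale, the count can be evaluated for concrete numbers. In the Python sketch below, the function name and the values P = 10 and K = 8 are illustrative assumptions, not figures from the specification; a class tap of P values with K bits each has 2^(P*K) possible value sequences.

```python
def class_count(num_values, bits_per_value):
    """Number of distinct codes for num_values values of bits_per_value bits each."""
    return 2 ** (num_values * bits_per_value)


raw_classes = class_count(10, 8)   # raw coefficient values used directly as the class code
adrc_classes = class_count(10, 1)  # same tap after 1-bit requantization
```

With these assumed numbers the raw count is 2^80, while 1-bit requantization reduces it to 2^10 = 1024 classes.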

[0086] Therefore, the class classification unit 44A preferably compresses the amount of information in the class taps by the ADRC processing described above, by vector quantization, or the like before performing class classification.

[0087] The class classification unit 44E likewise classifies the frame of interest based on the class taps supplied from the tap generation unit 43E, in the same manner as the class classification unit 44A, and outputs the resulting class code to a coefficient memory 45E.

[0088] The coefficient memory 45A stores the tap coefficients for the linear prediction coefficients for each class, obtained by the learning processing performed in the learning apparatus of FIG. 6 described later, and outputs the tap coefficients stored at the address corresponding to the class code output by the class classification unit 44A to the prediction unit 46A.

[0089] The coefficient memory 45E stores the tap coefficients for the residual signal for each class, obtained by the learning processing performed in the learning apparatus of FIG. 6 described later, and outputs the tap coefficients stored at the address corresponding to the class code output by the class classification unit 44E to the prediction unit 46E.

[0090] Here, if P-th order linear prediction coefficients are obtained for each frame, P sets of tap coefficients are required to obtain the P-th order linear prediction coefficients of the frame of interest by the prediction operation of equation (6). Accordingly, the coefficient memory 45A stores P sets of tap coefficients at the address corresponding to one class code. For the same reason, the coefficient memory 45E stores as many sets of tap coefficients as there are sample points of the residual signal in each frame.

[0091] The prediction unit 46A acquires the prediction taps output by the tap generation unit 43A and the tap coefficients output by the coefficient memory 45A, performs the linear prediction operation (product-sum operation) shown in equation (6) using them, obtains (predicted values of) the P-th order linear prediction coefficients of the frame of interest, and outputs them to the speech synthesis filter 47.

[0092] The prediction unit 46E acquires the prediction taps output by the tap generation unit 43E and the tap coefficients output by the coefficient memory 45E, performs the linear prediction operation shown in equation (6) using them, obtains (a predicted value of) the residual signal of the frame of interest, and outputs it to the speech synthesis filter 47.

[0093] Here, the coefficient memory 45A outputs P sets of tap coefficients, one for each of the predicted values of the P-th order linear prediction coefficients constituting the frame of interest. The prediction unit 46A obtains the linear prediction coefficient of each order by performing the product-sum operation of equation (6) using the prediction taps and the set of tap coefficients corresponding to that order. The same applies to the prediction unit 46E.
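The per-order product-sum operation can be sketched as follows. This Python example assumes the prediction tap is a flat list of numbers and that the coefficient memory returns one list of tap coefficients per output value; the function name and data layout are illustrative, not part of the specification.

```python
def predict(prediction_tap, tap_coefficient_sets):
    """Apply equation (6) once per output: one tap-coefficient set per predicted value."""
    return [
        sum(w * x for w, x in zip(weights, prediction_tap))
        for weights in tap_coefficient_sets
    ]
```

For the linear prediction coefficients the list holds P sets, one per order; for the residual signal it holds one set per sample point of the frame.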

[0094] The speech synthesis filter 47 is, for example, an IIR digital filter like the speech synthesis filter 29 of FIG. 1. It uses the linear prediction coefficients from the prediction unit 46A as the tap coefficients of the IIR filter and the residual signal from the prediction unit 46E as the input signal, filters that input signal to generate a synthesized sound signal, and supplies it to a D/A conversion unit 48. The D/A conversion unit 48 converts the synthesized sound signal from the speech synthesis filter 47 from a digital signal into an analog signal and supplies it to a speaker 49 for output.

[0095] In FIG. 3, the tap generation units 43A and 43E each generate class taps, the class classification units 44A and 44E each perform class classification based on those class taps, and the tap coefficients for the linear prediction coefficients and for the residual signal corresponding to the resulting class codes are acquired from the coefficient memories 45A and 45E, respectively. However, the tap coefficients for the linear prediction coefficients and for the residual signal can also be acquired, for example, as follows.

[0096] That is, the tap generation units 43A and 43E, the class classification units 44A and 44E, and the coefficient memories 45A and 45E are each configured as a single unit. Calling the integrated tap generation unit, class classification unit, and coefficient memory the tap generation unit 43, the class classification unit 44, and the coefficient memory 45, respectively, the tap generation unit 43 forms class taps from the decoded linear prediction coefficients and the decoded residual signal, and the class classification unit 44 performs class classification based on those class taps and outputs a single class code. The coefficient memory 45 stores, at the address corresponding to each class, a pair consisting of the tap coefficients for the linear prediction coefficients and the tap coefficients for the residual signal, and outputs the pair stored at the address corresponding to the class code output by the class classification unit 44. The prediction units 46A and 46E can then each perform their processing based on the tap coefficients for the linear prediction coefficients and the tap coefficients for the residual signal output as a pair from the coefficient memory 45.

[0097] When the tap generation units 43A and 43E, the class classification units 44A and 44E, and the coefficient memories 45A and 45E are configured separately, the number of classes for the linear prediction coefficients and the number of classes for the residual signal are not necessarily the same; when they are configured integrally, the two numbers of classes are the same.

[0098] Next, FIG. 4 shows a configuration example of the speech synthesis filter 47 of FIG. 3.

[0099] In FIG. 4, the speech synthesis filter 47 uses P-th order linear prediction coefficients, and therefore consists of one adder 51, P delay circuits (D) 52_1 to 52_P, and P multipliers 53_1 to 53_P.

[0100] The P-th order linear prediction coefficients α_1, α_2, ..., α_P supplied from the prediction unit 46A are set in the multipliers 53_1 to 53_P, respectively, so that the speech synthesis filter 47 performs the operation according to equation (4) and generates the synthesized sound signal.

[0101] That is, the residual signal e output by the prediction unit 46E is supplied to the delay circuit 52_1 via the adder 51. Each delay circuit 52_p delays its input signal by one sample of the residual signal and outputs it to the subsequent delay circuit 52_{p+1} and to the multiplier 53_p. The multiplier 53_p multiplies the output of the delay circuit 52_p by the linear prediction coefficient α_p set in it, and outputs the product to the adder 51.

[0102] The adder 51 adds all the outputs of the multipliers 53_1 to 53_P and the residual signal e, supplies the sum to the delay circuit 52_1, and also outputs it as the speech synthesis result (synthesized sound signal).
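The recursive structure of FIG. 4 can be sketched in Python as below. The sketch follows the block description literally: the adder sums the residual and every multiplier output, and its result enters the delay line. The parameter `multiplier_values` stands for whatever values are set in the multipliers 53_1 to 53_P; whether these equal α_p or carry an opposite sign depends on the sign convention of equation (4), which is not restated in this passage, so that mapping is left to the caller. The function name and the zero initial state of the delay circuits are assumptions.

```python
def synthesize(residual, multiplier_values):
    """Run the residual through the feedback structure of FIG. 4."""
    order = len(multiplier_values)
    delays = [0.0] * order  # delay circuits 52_1 .. 52_P, assumed to start at zero
    output = []
    for e in residual:
        # Adder 51: residual plus the output of every multiplier 53_p.
        s = e + sum(c * d for c, d in zip(multiplier_values, delays))
        output.append(s)
        # The new sample enters delay circuit 52_1; older samples shift down.
        delays = ([s] + delays)[:order]
    return output
```

Because each output sample feeds back into later samples, a single nonzero residual sample produces an infinitely long response, which is why the filter is of the IIR type.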

[0103] Next, the processing of the speech synthesis apparatus of FIG. 3 (speech synthesis processing) will be described with reference to the flowchart of FIG. 5.

[0104] The demultiplexer 41 sequentially separates the A code and the residual code of each frame from the code data supplied to it, and supplies them to the filter coefficient decoder 42A and the residual codebook storage unit 42E, respectively.

[0105] The filter coefficient decoder 42A sequentially decodes the A code of each frame supplied from the demultiplexer 41 into decoded linear prediction coefficients and supplies them to the tap generation unit 43A. The residual codebook storage unit 42E sequentially decodes the residual code of each frame supplied from the demultiplexer 41 into a decoded residual signal and supplies it to the tap generation unit 43E.

[0106] The tap generation unit 43A sequentially takes each frame of the decoded linear prediction coefficients supplied to it as the frame of interest and, in step S1, generates class taps and prediction taps from the decoded linear prediction coefficients supplied from the filter coefficient decoder 42A. Also in step S1, the tap generation unit 43E generates class taps and prediction taps from the decoded residual signal supplied from the residual codebook storage unit 42E. The class taps generated by the tap generation unit 43A are supplied to the class classification unit 44A and its prediction taps to the prediction unit 46A; the class taps generated by the tap generation unit 43E are supplied to the class classification unit 44E and its prediction taps to the prediction unit 46E.

[0107] The processing then proceeds to step S2, where the class classification units 44A and 44E each perform class classification based on the class taps supplied from the tap generation units 43A and 43E, and supply the resulting class codes to the coefficient memories 45A and 45E, respectively. The processing then proceeds to step S3.

[0108] In step S3, the coefficient memories 45A and 45E read the tap coefficients from the addresses corresponding to the class codes supplied from the class classification units 44A and 44E, and supply them to the prediction units 46A and 46E, respectively.

[0109] The processing then proceeds to step S4, where the prediction unit 46A acquires the tap coefficients output by the coefficient memory 45A and performs the product-sum operation shown in equation (6) using those tap coefficients and the prediction taps from the tap generation unit 43A, obtaining (predicted values of) the true linear prediction coefficients of the frame of interest. Also in step S4, the prediction unit 46E acquires the tap coefficients output by the coefficient memory 45E and performs the product-sum operation shown in equation (6) using those tap coefficients and the prediction taps from the tap generation unit 43E, obtaining (a predicted value of) the true residual signal of the frame of interest.

[0110] The residual signal and linear prediction coefficients obtained as described above are supplied to the speech synthesis filter 47, which performs the operation of equation (4) using them to generate the synthesized sound signal of the frame of interest. This synthesized sound signal is supplied from the speech synthesis filter 47 to the speaker 49 via the D/A conversion unit 48, and the speaker 49 outputs the synthesized sound corresponding to the synthesized sound signal.

[0111] After the prediction units 46A and 46E have obtained the linear prediction coefficients and the residual signal, the processing proceeds to step S5, where it is determined whether there are still decoded linear prediction coefficients and a decoded residual signal of a frame to be processed as the frame of interest. If it is determined in step S5 that such a frame remains, the processing returns to step S1, the next frame is taken as the new frame of interest, and the same processing is repeated. If it is determined in step S5 that no such frame remains, the speech synthesis processing ends.
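Steps S1 through S5 can be summarized in a short Python sketch. It is a simplified illustration under several assumptions: each frame is given as a pair of already decoded data, the class taps and prediction taps are the whole frame (the example pattern in the text), classification is a 1-bit midpoint threshold standing in for ADRC, and the coefficient memories are plain dictionaries. All names are hypothetical.

```python
def classify(class_tap):
    """1-bit classification: threshold each value at the tap's midpoint."""
    lo, hi = min(class_tap), max(class_tap)
    mid = (lo + hi) / 2
    code = 0
    for v in class_tap:
        code = (code << 1) | (1 if hi > lo and v >= mid else 0)
    return code


def predict(tap, coefficient_sets):
    """Product-sum operation of equation (6), once per output value."""
    return [sum(w * x for w, x in zip(ws, tap)) for ws in coefficient_sets]


def decode_frames(frames, coeff_memory_a, coeff_memory_e):
    results = []
    for decoded_lpc, decoded_residual in frames:  # S5: loop while frames remain
        # S1: generate class taps and prediction taps (here, the whole frame).
        class_tap_a = pred_tap_a = decoded_lpc
        class_tap_e = pred_tap_e = decoded_residual
        # S2: class classification.
        class_a, class_e = classify(class_tap_a), classify(class_tap_e)
        # S3: read the tap coefficients at the address given by the class code.
        sets_a, sets_e = coeff_memory_a[class_a], coeff_memory_e[class_e]
        # S4: predict the linear prediction coefficients and the residual signal.
        results.append((predict(pred_tap_a, sets_a), predict(pred_tap_e, sets_e)))
    return results
```

The predicted pair for each frame would then be passed to the speech synthesis filter; that stage is omitted here.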

[0112] Next, FIG. 6 shows a configuration example of an embodiment of a learning apparatus that performs the learning processing of the tap coefficients to be stored in the coefficient memories 45A and 45E of FIG. 3.

[0113] A digital speech signal for learning is supplied to the learning apparatus in units of frames, and this digital speech signal for learning is supplied to an LPC analysis unit 61A and a prediction filter 61E.

[0114] The LPC analysis unit 61A sequentially takes each frame of the speech signal supplied to it as the frame of interest and performs LPC analysis on the speech signal of the frame of interest to obtain P-th order linear prediction coefficients. These linear prediction coefficients are supplied to the prediction filter 61E and a vector quantization unit 62A, and are also supplied to a normal equation addition circuit 66A as teacher data for obtaining the tap coefficients for the linear prediction coefficients.

[0115] The prediction filter 61E obtains the residual signal of the frame of interest by performing, for example, the operation according to equation (1) using the speech signal of the frame of interest and the linear prediction coefficients supplied to it. It supplies the residual signal to a vector quantization unit 62E, and also supplies it to a normal equation addition circuit 66E as teacher data for obtaining the tap coefficients for the residual signal.

[0116] That is, denoting the Z-transforms of s_n and e_n in equation (1) by S and E, respectively, equation (1) can be expressed as follows.

[0117] E = (1 + α_1 z^-1 + α_2 z^-2 + ... + α_P z^-P) S (14)

[0118] From equation (14), the residual signal e can be obtained by a product-sum operation of the speech signal s and the linear prediction coefficients α_p, and therefore the prediction filter 61E that obtains the residual signal e can be configured as an FIR (Finite Impulse Response) digital filter.

【０１１９】即ち、図７は、予測フィルタ６１Ｅの構成
例を示している。That is, FIG. 7 shows a configuration example of the prediction filter 61E.

【０１２０】予測フィルタ６１Ｅには、ＬＰＣ分析部６
１Ａから、Ｐ次の線形予測係数が供給されるようになっ
ており、従って、予測フィルタ６１Ｅは、Ｐ個の遅延回
路（Ｄ）７１₁乃至７１_P、Ｐ個の乗算器７２₁乃至７
２_P、および１つの加算器７３から構成されている。The prediction filter 61E is supplied with the P-th order linear prediction coefficients from the LPC analysis unit 61A. Accordingly, the prediction filter 61E consists of P delay circuits (D) 71₁ to 71_P, P multipliers 72₁ to 72_P, and one adder 73.

【０１２１】乗算器７２₁乃至７２_Pには、それぞれ、Ｌ
ＰＣ分析部６１Ａから供給されるＰ次の線形予測係数の
うちのα₁，α₂，・・・，α_Pがセットされる。The multipliers 72₁ to 72_P are set with α₁, α₂, ..., α_P, respectively, of the P-th order linear prediction coefficients supplied from the LPC analysis unit 61A.

【０１２２】一方、注目フレームの音声信号ｓは、遅延
回路７１₁と加算器７３に供給される。遅延回路７１
_pは、そこへの入力信号を、残差信号の１サンプル分だ
け遅延して、後段の遅延回路７１_p+1に出力するととも
に、乗算器７２_pに出力する。乗算器７２_pは、遅延回路
７１_pの出力と、そこにセットされた線形予測係数α_pと
を乗算し、その乗算値を、加算器７３に出力する。Meanwhile, the audio signal s of the frame of interest is supplied to the delay circuit 71₁ and the adder 73. Each delay circuit 71_p delays its input signal by one sample of the residual signal, and outputs the delayed signal to the subsequent delay circuit 71_{p+1} and to the multiplier 72_p. The multiplier 72_p multiplies the output of the delay circuit 71_p by the linear prediction coefficient α_p set in it, and outputs the product to the adder 73.

【０１２３】加算器７３は、乗算器７２₁乃至７２_Pの出
力すべてと、音声信号ｓとを加算し、その加算結果を、
残差信号ｅとして出力する。The adder 73 adds all the outputs of the multipliers 72₁ to 72_P and the audio signal s, and outputs the sum as the residual signal e.
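The structure of FIG. 7 described above can be sketched in Python as follows. This is only an illustrative sketch: the function name `prediction_filter` is hypothetical, and the delay circuits are assumed to start at zero, which the text does not specify.

```python
from collections import deque


def prediction_filter(s, alpha):
    """Compute e[n] = s[n] + sum_{p=1..P} alpha[p-1] * s[n-p] (FIR filter).

    Assumption: the delay line starts filled with zeros.
    """
    P = len(alpha)
    # delay[p-1] holds s[n-p], mirroring delay circuits 71_1 .. 71_P
    delay = deque([0.0] * P, maxlen=P)
    e = []
    for x in s:
        acc = x  # the adder receives the undelayed sample directly
        for p in range(P):
            acc += alpha[p] * delay[p]  # multipliers 72_1 .. 72_P
        e.append(acc)
        delay.appendleft(x)  # shift the delay line by one sample
    return e


residual = prediction_filter([1.0, 2.0, 3.0], [0.5])
```

With a single coefficient 0.5, each output sample is the current input plus half of the previous input.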

【０１２４】図６に戻り、ベクトル量子化部６２Ａは、
線形予測係数を要素とするコードベクトルとコードとを
対応付けたコードブックを記憶しており、そのコードブ
ックに基づいて、ＬＰＣ分析部６１Ａからの注目フレー
ムの線形予測係数で構成される特徴ベクトルをベクトル
量子化し、そのベクトル量子化の結果得られるＡコード
を、フィルタ係数復号器６３Ａに供給する。ベクトル量
子化部６２Ｅは、残差信号のサンプル値を要素とするコ
ードベクトルとコードとを対応付けたコードブックを記
憶しており、そのコードブックに基づいて、予測フィル
タ６１Ｅからの注目フレームの残差信号のサンプル値で
構成される残差ベクトルをベクトル量子化し、そのベク
トル量子化の結果得られる残差コードを、残差コードブ
ック記憶部６３Ｅに供給する。Returning to FIG. 6, the vector quantization unit 62A stores a codebook that associates codes with code vectors whose elements are linear prediction coefficients. Based on this codebook, it vector-quantizes the feature vector composed of the linear prediction coefficients of the frame of interest from the LPC analysis unit 61A, and supplies the resulting A code to the filter coefficient decoder 63A. The vector quantization unit 62E stores a codebook that associates codes with code vectors whose elements are sample values of the residual signal. Based on this codebook, it vector-quantizes the residual vector composed of the sample values of the residual signal of the frame of interest from the prediction filter 61E, and supplies the resulting residual code to the residual codebook storage unit 63E.

【０１２５】フィルタ係数復号器６３Ａは、ベクトル量
子化部６２Ａが記憶しているのと同一のコードブックを
記憶しており、そのコードブックに基づいて、ベクトル
量子化部６２ＡからのＡコードを、復号線形予測係数に
復号し、線形予測係数についてのタップ係数を求めるた
めの生徒データとして、タップ生成部６４Ａに供給す
る。ここで、図３のフィルタ係数復号器４２Ａは、図６
のフィルタ係数復号器６３Ａと同様に構成されている。The filter coefficient decoder 63A stores the same codebook as the vector quantization unit 62A. Based on that codebook, it decodes the A code from the vector quantization unit 62A into decoded linear prediction coefficients, and supplies them to the tap generation unit 64A as student data for obtaining the tap coefficients for the linear prediction coefficients. The filter coefficient decoder 42A of FIG. 3 is configured in the same way as the filter coefficient decoder 63A of FIG. 6.

【０１２６】残差コードブック記憶部６３Ｅは、ベクト
ル量子化部６２Ｅが記憶しているのと同一のコードブッ
クを記憶しており、そのコードブックに基づいて、ベク
トル量子化部６２Ｅからの残差コードを、復号残差信号
に復号し、残差信号についてのタップ係数を求めるため
の生徒データとして、タップ生成部６４Ｅに供給する。
ここで、図３の残差コードブック記憶部４２Ｅは、図６
の残差コードブック記憶部６３Ｅと同様に構成されてい
る。The residual codebook storage unit 63E stores the same codebook as the vector quantization unit 62E. Based on that codebook, it decodes the residual code from the vector quantization unit 62E into a decoded residual signal, and supplies it to the tap generation unit 64E as student data for obtaining the tap coefficients for the residual signal. The residual codebook storage unit 42E of FIG. 3 is configured in the same way as the residual codebook storage unit 63E of FIG. 6.
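The codebook-based quantization and decoding described above can be illustrated with the following Python sketch. The names `vq_encode` and `vq_decode`, the toy codebook, and the squared Euclidean distance are all assumptions made for illustration; the text does not state which distortion measure is used.

```python
def vq_encode(vector, codebook):
    """Return the code (index) of the code vector nearest to `vector`.

    Assumption: nearness is measured by squared Euclidean distance.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda c: sq_dist(vector, codebook[c]))


def vq_decode(code, codebook):
    """Look the code up in the same codebook to get the decoded vector."""
    return list(codebook[code])


# Hypothetical two-element codebook shared by encoder and decoder.
codebook = [[0.0, 0.0], [1.0, -0.5], [-0.8, 0.3]]

teacher = [0.9, -0.4]                 # e.g. true linear prediction coefficients
code = vq_encode(teacher, codebook)   # corresponds to the A code or residual code
student = vq_decode(code, codebook)   # decoded (quantization-distorted) values
```

The gap between `teacher` and `student` is the quantization error that the learned tap coefficients are meant to compensate.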

【０１２７】タップ生成部６４Ａは、図３のタップ生成
部４３Ａにおける場合と同様に、フィルタ係数復号器６
３Ａから供給される復号線形予測係数から、予測タップ
とクラスタップを構成し、クラスタップを、クラス分類
部６５Ａに供給するとともに、予測タップを、正規方程
式加算回路６６Ａに供給する。タップ生成部６４Ｅは、
図３のタップ生成部４３Ｅにおける場合と同様に、残差
コードブック記憶部６３Ｅから供給される復号残差信号
から、予測タップとクラスタップを構成し、クラスタッ
プを、クラス分類部６５Ｅに供給するとともに、予測タ
ップを、正規方程式加算回路６６Ｅに供給する。The tap generation unit 64A, in the same way as the tap generation unit 43A of FIG. 3, constructs prediction taps and class taps from the decoded linear prediction coefficients supplied from the filter coefficient decoder 63A, supplies the class taps to the class classification unit 65A, and supplies the prediction taps to the normal equation addition circuit 66A. The tap generation unit 64E, in the same way as the tap generation unit 43E of FIG. 3, constructs prediction taps and class taps from the decoded residual signal supplied from the residual codebook storage unit 63E, supplies the class taps to the class classification unit 65E, and supplies the prediction taps to the normal equation addition circuit 66E.

【０１２８】クラス分類部６５Ａと６５Ｅは、図３のク
ラス分類部４４Ａと４４Ｅにおける場合とそれぞれ同様
に、そこに供給されるクラスタップに基づいて、クラス
分類を行い、その結果得られるクラスコードを、正規方
程式加算回路６６Ａと６６Ｅに、それぞれ供給する。The class classification units 65A and 65E, in the same way as the class classification units 44A and 44E of FIG. 3, respectively, perform class classification based on the class taps supplied to them, and supply the resulting class codes to the normal equation addition circuits 66A and 66E, respectively.

【０１２９】正規方程式加算回路６６Ａは、ＬＰＣ分析
部６１Ａからの教師データとしての注目フレームの線形
予測係数と、タップ生成部６４Ａからの生徒データとし
ての予測タップ（を構成する復号線形予測係数）を対象
とした足し込みを行う。正規方程式加算回路６６Ｅは、
予測フィルタ６１Ｅからの教師データとしての注目フレ
ームの残差信号と、タップ生成部６４Ｅからの生徒デー
タとしての予測タップ（を構成する復号残差信号）を対
象とした足し込みを行う。The normal equation addition circuit 66A performs accumulation over the linear prediction coefficients of the frame of interest, supplied as teacher data from the LPC analysis unit 61A, and the prediction taps (the decoded linear prediction coefficients that make them up), supplied as student data from the tap generation unit 64A. The normal equation addition circuit 66E performs accumulation over the residual signal of the frame of interest, supplied as teacher data from the prediction filter 61E, and the prediction taps (the decoded residual signal samples that make them up), supplied as student data from the tap generation unit 64E.

【０１３０】即ち、正規方程式加算回路６６Ａは、クラ
ス分類部６５Ａから供給されるクラスコードに対応する
クラスごとに、予測タップ（生徒データ）を用い、式
（１３）の行列Ａにおける各コンポーネントとなってい
る、生徒データどうしの乗算（ｘ_inｘ_im）と、サメーシ
ョン（Σ）に相当する演算を行う。That is, for each class corresponding to the class code supplied from the class classification unit 65A, the normal equation addition circuit 66A uses the prediction taps (student data) to perform the operations corresponding to the products of student data (x_in x_im) and their summation (Σ), which form the components of the matrix A in equation (13).

【０１３１】さらに、正規方程式加算回路６６Ａは、や
はり、クラス分類部６５Ａから供給されるクラスコード
に対応するクラスごとに、生徒データ（予測タップを構
成する復号線形予測係数）および教師データ（注目フレ
ームの線形予測係数）を用い、式（１３）のベクトルｖ
における各コンポーネントとなっている、生徒データと
教師データの乗算（ｘ_inｙ_i）と、サメーション（Σ）
に相当する演算を行う。Further, again for each class corresponding to the class code supplied from the class classification unit 65A, the normal equation addition circuit 66A uses the student data (the decoded linear prediction coefficients that make up the prediction taps) and the teacher data (the linear prediction coefficients of the frame of interest) to perform the operations corresponding to the products of student data and teacher data (x_in y_i) and their summation (Σ), which form the components of the vector v in equation (13).

【０１３２】正規方程式加算回路６６Ａは、以上の足し
込みを、ＬＰＣ分析部６１Ａから供給される線形予測係
数のフレームすべてを注目フレームとして行い、これに
より、各クラスについて、線形予測係数に関する式（１
３）に示した正規方程式をたてる。The normal equation addition circuit 66A performs the above accumulation with every frame of linear prediction coefficients supplied from the LPC analysis unit 61A taken as the frame of interest, and thereby sets up, for each class, the normal equation shown in equation (13) for the linear prediction coefficients.
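The per-class accumulation described above can be sketched as follows. The class `NormalEquationAccumulator` and its method names are hypothetical, and a single scalar teacher value per sample is assumed for simplicity.

```python
from collections import defaultdict


class NormalEquationAccumulator:
    """Accumulate A[n][m] += x_n * x_m and v[n] += x_n * y for each class."""

    def __init__(self, n_taps):
        self.n_taps = n_taps
        # One matrix A and one vector v per class code, created on first use.
        self.A = defaultdict(lambda: [[0.0] * n_taps for _ in range(n_taps)])
        self.v = defaultdict(lambda: [0.0] * n_taps)

    def add(self, class_code, prediction_tap, teacher):
        """Add one (student data, teacher data) pair to its class."""
        A = self.A[class_code]
        v = self.v[class_code]
        for n in range(self.n_taps):
            for m in range(self.n_taps):
                A[n][m] += prediction_tap[n] * prediction_tap[m]
            v[n] += prediction_tap[n] * teacher


acc = NormalEquationAccumulator(n_taps=2)
acc.add(3, [1.0, 2.0], 5.0)   # class 3, first sample
acc.add(3, [0.0, 1.0], 1.0)   # class 3, second sample
```

Keeping only the running sums means the learning data need not be stored; each frame contributes to its class and can then be discarded.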

【０１３３】正規方程式加算回路６６Ｅも、同様の足し
込みを、予測フィルタ６１Ｅから供給される残差信号の
フレームすべてを注目フレームとして行い、これによ
り、各クラスについて、残差信号に関する式（１３）に
示した正規方程式をたてる。The normal equation addition circuit 66E likewise performs the same accumulation with every frame of the residual signal supplied from the prediction filter 61E taken as the frame of interest, and thereby sets up, for each class, the normal equation shown in equation (13) for the residual signal.

【０１３４】タップ係数決定回路６７Ａと６７Ｅは、正
規方程式加算回路６６Ａと６６Ｅにおいてクラスごとに
生成された正規方程式それぞれを解くことにより、クラ
スごとに、線形予測係数と残差信号についてのタップ係
数をそれぞれ求め、係数メモリ６８Ａと６８Ｅの、各ク
ラスに対応するアドレスにそれぞれ供給する。The tap coefficient determination circuits 67A and 67E solve the normal equations generated for each class in the normal equation addition circuits 66A and 66E, thereby obtaining, for each class, the tap coefficients for the linear prediction coefficients and for the residual signal, respectively, and supply them to the addresses corresponding to each class in the coefficient memories 68A and 68E, respectively.

【０１３５】なお、学習用の音声信号として用意した音
声信号によっては、正規方程式加算回路６６Ａや６６Ｅ
において、タップ係数を求めるのに必要な数の正規方程
式が得られないクラスが生じる場合があり得るが、タッ
プ係数決定回路６７Ａと６７Ｅは、そのようなクラスに
ついては、例えば、デフォルトのタップ係数を出力す
る。Depending on the audio signals prepared for learning, there may be classes for which the normal equation addition circuit 66A or 66E cannot obtain the number of normal equations needed to determine the tap coefficients. For such classes, the tap coefficient determination circuits 67A and 67E output, for example, default tap coefficients.
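One way to picture the solve-or-fall-back behaviour described above is the following Python sketch. Gaussian elimination with partial pivoting is an assumed solution method (the text does not name one), and the function names and the choice of default coefficients are hypothetical.

```python
def solve_normal_equation(A, v, eps=1e-12):
    """Solve A w = v; return None if A is (numerically) singular."""
    n = len(v)
    M = [list(A[i]) + [v[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < eps:
            return None  # too few independent equations for this class
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w


def determine_tap_coefficients(A, v, default):
    """Return the solved tap coefficients, or `default` if unsolvable."""
    w = solve_normal_equation(A, v)
    return list(default) if w is None else w


solved = determine_tap_coefficients([[1.0, 2.0], [2.0, 5.0]], [5.0, 11.0], [1.0, 0.0])
fallback = determine_tap_coefficients([[1.0, 2.0], [2.0, 4.0]], [5.0, 10.0], [1.0, 0.0])
```

In the second call the matrix has rank 1, as happens when a class has received too few distinct samples, so the assumed default is returned.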

【０１３６】係数メモリ６８Ａと６８Ｅは、タップ係数
決定回路６７Ａと６７Ｅからそれぞれ供給されるクラス
ごとの線形予測係数と残差信号についてのタップ係数
を、それぞれ記憶する。The coefficient memories 68A and 68E store the per-class tap coefficients for the linear prediction coefficients and for the residual signal supplied from the tap coefficient determination circuits 67A and 67E, respectively.

【０１３７】次に、図８のフローチャートを参照して、
図６の学習装置の処理（学習処理）について説明する。Next, the processing of the learning device of FIG. 6 (the learning process) is described with reference to the flowchart of FIG. 8.

【０１３８】学習装置には、学習用の音声信号が供給さ
れ、ステップＳ１１では、その学習用の音声信号から、
教師データと生徒データが生成される。An audio signal for learning is supplied to the learning device, and in step S11, teacher data and student data are generated from that audio signal.

【０１３９】即ち、ＬＰＣ分析部６１Ａは、学習用の音
声信号のフレームを、順次、注目フレームとし、その注
目フレームの音声信号をＬＰＣ分析することで、Ｐ次の
線形予測係数を求め、教師データとして、正規方程式加
算回路６６Ａに供給する。さらに、この線形予測係数
は、予測フィルタ６１Ｅおよびベクトル量子化部６２Ａ
にも供給され、ベクトル量子化部６２Ａは、ＬＰＣ分析
部６１Ａからの注目フレームの線形予測係数で構成され
る特徴ベクトルをベクトル量子化し、そのベクトル量子
化の結果得られるＡコードを、フィルタ係数復号器６３
Ａに供給する。フィルタ係数復号器６３Ａは、ベクトル
量子化部６２ＡからのＡコードを、復号線形予測係数に
復号し、その復号線形予測係数を、生徒データとして、
タップ生成部６４Ａに供給する。That is, the LPC analysis unit 61A sequentially takes each frame of the audio signal for learning as the frame of interest, performs LPC analysis on the audio signal of that frame to obtain P-th order linear prediction coefficients, and supplies them to the normal equation addition circuit 66A as teacher data. These linear prediction coefficients are also supplied to the prediction filter 61E and the vector quantization unit 62A. The vector quantization unit 62A vector-quantizes the feature vector composed of the linear prediction coefficients of the frame of interest from the LPC analysis unit 61A, and supplies the resulting A code to the filter coefficient decoder 63A. The filter coefficient decoder 63A decodes the A code from the vector quantization unit 62A into decoded linear prediction coefficients, and supplies them to the tap generation unit 64A as student data.

【０１４０】一方、注目フレームの線形予測係数を、Ｌ
ＰＣ分析部６１Ａから受信した予測フィルタ６１Ｅは、
その線形予測係数と、注目フレームの学習用の音声信号
とを用いて、式（１）にしたがった演算を行うことによ
り、注目フレームの残差信号を求め、教師データとし
て、正規方程式加算回路６６Ｅに供給する。さらに、こ
の残差信号は、ベクトル量子化部６２Ｅにも供給され、
ベクトル量子化部６２Ｅは、予測フィルタ６１Ｅからの
注目フレームの残差信号のサンプル値で構成される残差
ベクトルをベクトル量子化し、そのベクトル量子化の結
果得られる残差コードを、残差コードブック記憶部６３
Ｅに供給する。残差コードブック記憶部６３Ｅは、ベク
トル量子化部６２Ｅからの残差コードを、復号残差信号
に復号し、その復号残差信号を、生徒データとして、タ
ップ生成部６４Ｅに供給する。Meanwhile, the prediction filter 61E, having received the linear prediction coefficients of the frame of interest from the LPC analysis unit 61A, uses them together with the audio signal for learning of the frame of interest to perform the operation of equation (1), thereby obtaining the residual signal of the frame of interest, and supplies it to the normal equation addition circuit 66E as teacher data. This residual signal is also supplied to the vector quantization unit 62E. The vector quantization unit 62E vector-quantizes the residual vector composed of the sample values of the residual signal of the frame of interest from the prediction filter 61E, and supplies the resulting residual code to the residual codebook storage unit 63E. The residual codebook storage unit 63E decodes the residual code from the vector quantization unit 62E into a decoded residual signal, and supplies it to the tap generation unit 64E as student data.

【０１４１】そして、ステップＳ１２に進み、タップ生
成部６４Ａが、フィルタ係数復号器６３Ａから供給され
る復号線形予測係数から、線形予測係数についての予測
タップとクラスタップを構成するとともに、タップ生成
部６４Ｅが、残差コードブック記憶部６３Ｅから供給さ
れる復号残差信号から、残差信号についての予測タップ
とクラスタップを構成する。線形予測係数についてのク
ラスタップは、クラス分類部６５Ａに供給され、予測タ
ップは、正規方程式加算回路６６Ａに供給される。ま
た、残差信号についてのクラスタップは、クラス分類部
６５Ｅに供給され、予測タップは、正規方程式加算回路
６６Ｅに供給される。The process then proceeds to step S12, where the tap generation unit 64A constructs the prediction taps and class taps for the linear prediction coefficients from the decoded linear prediction coefficients supplied from the filter coefficient decoder 63A, and the tap generation unit 64E constructs the prediction taps and class taps for the residual signal from the decoded residual signal supplied from the residual codebook storage unit 63E. The class taps for the linear prediction coefficients are supplied to the class classification unit 65A, and the prediction taps to the normal equation addition circuit 66A. The class taps for the residual signal are supplied to the class classification unit 65E, and the prediction taps to the normal equation addition circuit 66E.

【０１４２】その後、ステップＳ１３において、クラス
分類部６５Ａが、線形予測係数についてのクラスタップ
に基づいて、クラス分類を行い、その結果得られるクラ
スコードを、正規方程式加算回路６６Ａに供給するとと
もに、クラス分類部６５Ｅが、残差信号についてのクラ
スタップに基づいて、クラス分類を行い、その結果得ら
れるクラスコードを、正規方程式加算回路６６Ｅに供給
する。Thereafter, in step S13, the class classification unit 65A performs class classification based on the class taps for the linear prediction coefficients and supplies the resulting class code to the normal equation addition circuit 66A, and the class classification unit 65E performs class classification based on the class taps for the residual signal and supplies the resulting class code to the normal equation addition circuit 66E.

【０１４３】そして、ステップＳ１４に進み、正規方程
式加算回路６６Ａは、ＬＰＣ分析部６１Ａからの教師デ
ータとしての注目フレームの線形予測係数、およびタッ
プ生成部６４Ａからの生徒データとしての予測タップ
（を構成する復号線形予測係数）を対象として、式（１
３）の行列Ａとベクトルｖの、上述したような足し込み
を行う。さらに、ステップＳ１４では、正規方程式加算
回路６６Ｅが、予測フィルタ６１Ｅからの教師データと
しての注目フレームの残差信号、およびタップ生成部６
４Ｅからの生徒データとしての予測タップ（を構成する
復号残差信号）を対象として、式（１３）の行列Ａとベ
クトルｖの、上述したような足し込みを行い、ステップ
Ｓ１５に進む。The process then proceeds to step S14, where the normal equation addition circuit 66A performs the accumulation described above for the matrix A and the vector v of equation (13), using the linear prediction coefficients of the frame of interest as teacher data from the LPC analysis unit 61A and the prediction taps (the decoded linear prediction coefficients that make them up) as student data from the tap generation unit 64A. Also in step S14, the normal equation addition circuit 66E performs the accumulation described above for the matrix A and the vector v of equation (13), using the residual signal of the frame of interest as teacher data from the prediction filter 61E and the prediction taps (the decoded residual signal samples that make them up) as student data from the tap generation unit 64E, and the process proceeds to step S15.

【０１４４】ステップＳ１５では、まだ、注目フレーム
として処理すべきフレームの学習用の音声信号があるか
どうかが判定される。ステップＳ１５において、まだ、
注目フレームとして処理すべきフレームの学習用の音声
信号があると判定された場合、ステップＳ１１に戻り、
次のフレームを新たに注目フレームとして、以下、同様
の処理が繰り返される。In step S15, it is determined whether there are still frames of the audio signal for learning to be processed as the frame of interest. If it is determined that there are, the process returns to step S11, the next frame is taken as the new frame of interest, and the same processing is repeated.

【０１４５】また、ステップＳ１５において、注目フレ
ームとして処理すべきフレームの学習用の音声信号がな
いと判定された場合、即ち、正規方程式加算回路６６Ａ
と６６Ｅにおいて、各クラスについて、正規方程式が得
られた場合、ステップＳ１６に進み、タップ係数決定回
路６７Ａは、各クラスごとに生成された正規方程式を解
くことにより、各クラスごとに、線形予測係数について
のタップ係数を求め、係数メモリ６８Ａの、各クラスに
対応するアドレスに供給して記憶させる。さらに、タッ
プ係数決定回路６７Ｅも、各クラスごとに生成された正
規方程式を解くことにより、各クラスごとに、残差信号
についてのタップ係数を求め、係数メモリ６８Ｅの、各
クラスに対応するアドレスに供給して記憶させ、処理を
終了する。If it is determined in step S15 that there are no more frames of the audio signal for learning to be processed as the frame of interest, that is, if the normal equations have been obtained for each class in the normal equation addition circuits 66A and 66E, the process proceeds to step S16. There, the tap coefficient determination circuit 67A solves the normal equation generated for each class to obtain the tap coefficients for the linear prediction coefficients for each class, and supplies them to the address corresponding to each class in the coefficient memory 68A for storage. The tap coefficient determination circuit 67E likewise solves the normal equation generated for each class to obtain the tap coefficients for the residual signal for each class, supplies them to the address corresponding to each class in the coefficient memory 68E for storage, and the process ends.

【０１４６】以上のようにして、係数メモリ６８Ａに記
憶された各クラスごとの線形予測係数についてのタップ
係数が、図３の係数メモリ４５Ａに記憶されているとと
もに、係数メモリ６８Ｅに記憶された各クラスごとの残
差信号についてのタップ係数が、図３の係数メモリ４５
Ｅに記憶されている。The per-class tap coefficients for the linear prediction coefficients stored in the coefficient memory 68A in this way are what is stored in the coefficient memory 45A of FIG. 3, and the per-class tap coefficients for the residual signal stored in the coefficient memory 68E are what is stored in the coefficient memory 45E of FIG. 3.

【０１４７】従って、図３の係数メモリ４５Ａに記憶さ
れたタップ係数は、線形予測演算を行うことにより得ら
れる真の線形予測係数の予測値の予測誤差（ここでは、
自乗誤差）が、統計的に最小になるように学習を行うこ
とにより求められたものであり、また、係数メモリ４５
Ｅに記憶されたタップ係数も、線形予測演算を行うこと
により得られる真の残差信号の予測値の予測誤差（自乗
誤差）が、統計的に最小になるように学習を行うことに
より求められたものであるから、図３の予測部４６Ａと
４６Ｅが出力する線形予測係数と残差信号は、それぞれ
真の線形予測係数と残差信号にほぼ一致することとな
り、その結果、これらの線形予測係数と残差信号によっ
て生成される合成音は、歪みの少ない、高音質のものと
なる。Therefore, the tap coefficients stored in the coefficient memory 45A of FIG. 3 have been obtained by learning such that the prediction error (here, the squared error) of the predicted values of the true linear prediction coefficients obtained by the linear prediction operation is statistically minimized. Likewise, the tap coefficients stored in the coefficient memory 45E have been obtained by learning such that the prediction error (squared error) of the predicted values of the true residual signal obtained by the linear prediction operation is statistically minimized. The linear prediction coefficients and the residual signal output by the prediction units 46A and 46E of FIG. 3 therefore closely match the true linear prediction coefficients and the true residual signal, respectively, and as a result the synthesized sound generated from them has little distortion and high quality.
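As a reading aid, the squared-error minimization and the accumulated sums of paragraphs [0130] and [0131] can be related as follows. The notation is an assumption inferred from this passage: x_{in} is the n-th student-data value of the i-th sample in a class, y_i is the corresponding teacher-data value, w_m is the m-th tap coefficient, and N is the number of taps. Equation (13) itself appears earlier in the document and is not reproduced here.

```latex
\min_{w_1,\dots,w_N} \sum_i \Bigl( y_i - \sum_{n=1}^{N} w_n x_{in} \Bigr)^2
\;\Longrightarrow\;
\sum_{m=1}^{N} \Bigl( \sum_i x_{in} x_{im} \Bigr) w_m = \sum_i x_{in} y_i
\qquad (n = 1, \dots, N)
```

Under this reading, the bracketed sum on the left corresponds to component (n, m) of the matrix A, and the right-hand side to component n of the vector v.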

【０１４８】なお、図３の音声合成装置において、上述
したように、例えば、タップ生成部４３Ａに、復号線形
予測係数と復号残差信号との両方から、線形予測係数の
クラスタップや予測タップを抽出させるようにする場合
には、図６のタップ生成部６４Ａにも、復号線形予測係
数と復号残差信号との両方から、線形予測係数のクラス
タップや予測タップを抽出させるようにする必要があ
る。タップ生成部６４Ｅについても同様である。In the speech synthesis device of FIG. 3, if, as described above, the tap generation unit 43A, for example, is made to extract the class taps and prediction taps for the linear prediction coefficients from both the decoded linear prediction coefficients and the decoded residual signal, then the tap generation unit 64A of FIG. 6 must also be made to extract the class taps and prediction taps for the linear prediction coefficients from both the decoded linear prediction coefficients and the decoded residual signal. The same applies to the tap generation unit 64E.

【０１４９】また、図３の音声合成装置において、上述
したように、タップ生成部４３Ａと４３Ｅ、クラス分類
部４４Ａと４４Ｅ、係数メモリ４５Ａと４５Ｅを、それ
ぞれ一体的に構成する場合には、図６の学習装置におい
ても、タップ生成部６４Ａと６４Ｅ、クラス分類部６５
Ａと６５Ｅ、正規方程式加算回路６６Ａと６６Ｅ、タッ
プ係数決定回路６７Ａと６７Ｅ、係数メモリ６８Ａと６
８Ｅを、それぞれ一体的に構成する必要がある。この場
合、正規方程式加算回路６６Ａと６６Ｅを一体的に構成
した正規方程式加算回路では、ＬＰＣ分析部６１Ａが出
力する線形予測係数と、予測フィルタ６１Ｅが出力する
残差信号との両方を、一度に、教師データとするととも
に、フィルタ係数復号器６３Ａが出力する復号線形予測
係数と、残差コードブック記憶部６３Ｅが出力する復号
残差信号との両方を、一度に、生徒データとして、正規
方程式がたてられ、タップ係数決定回路６７Ａと６７Ｅ
とを一体的に構成したタップ係数決定回路では、その正
規方程式を解くことにより、クラスごとの、線形予測係
数と残差信号それぞれについてのタップ係数が、一度に
求められる。Likewise, if, as described above, the tap generation units 43A and 43E, the class classification units 44A and 44E, and the coefficient memories 45A and 45E of the speech synthesis device of FIG. 3 are each configured as a single unit, then in the learning device of FIG. 6 the tap generation units 64A and 64E, the class classification units 65A and 65E, the normal equation addition circuits 66A and 66E, the tap coefficient determination circuits 67A and 67E, and the coefficient memories 68A and 68E must also each be configured as a single unit. In this case, the combined normal equation addition circuit sets up the normal equations using both the linear prediction coefficients output by the LPC analysis unit 61A and the residual signal output by the prediction filter 61E together as teacher data, and both the decoded linear prediction coefficients output by the filter coefficient decoder 63A and the decoded residual signal output by the residual codebook storage unit 63E together as student data. The combined tap coefficient determination circuit then solves those normal equations to obtain, at once, the per-class tap coefficients for both the linear prediction coefficients and the residual signal.

【０１５０】次に、図９は、本発明を適用した伝送シス
テム（システムとは、複数の装置が論理的に集合した物
をいい、各構成の装置が同一筐体中にあるか否かは問わ
ない）の一実施の形態の構成を示している。Next, FIG. 9 shows the configuration of an embodiment of a transmission system to which the present invention is applied (a system here means a logical collection of a plurality of devices, regardless of whether the constituent devices are in the same housing).

【０１５１】この伝送システムでは、携帯電話機８１₁
と８１₂が、基地局８２₁と８２₂それぞれとの間で、無
線による通信を行うとともに、基地局８２₁と８２₂それ
ぞれが、交換局８３との間で通信を行うことにより、最
終的には、携帯電話機８１₁と８１₂との間において、基
地局８２₁および８２₂、並びに交換局８３を介して、音
声の送受信を行うことができるようになっている。な
お、基地局８２₁と８２₂は、同一の基地局であっても良
いし、異なる基地局であっても良い。In this transmission system, the mobile phones 81₁ and 81₂ communicate wirelessly with the base stations 82₁ and 82₂, respectively, and the base stations 82₁ and 82₂ each communicate with the exchange 83, so that voice can ultimately be transmitted and received between the mobile phones 81₁ and 81₂ via the base stations 82₁ and 82₂ and the exchange 83. The base stations 82₁ and 82₂ may be the same base station or different base stations.

【０１５２】ここで、以下、特に区別する必要がない限
り、携帯電話機８１₁と８１₂を、携帯電話機８１と記述
する。Here, the portable telephones 81₁ and 81₂ will be described as the portable telephone 81 unless it is particularly necessary to distinguish them.

【０１５３】図１０は、図９の携帯電話機８１の構成例
を示している。FIG. 10 shows a configuration example of the mobile phone 81 of FIG. 9.

【０１５４】アンテナ９１は、基地局８２₁または８２₂
からの電波を受信し、その受信信号を、変復調部９２に
供給するとともに、変復調部９２からの信号を、電波
で、基地局８２₁または８２₂に送信する。変復調部９２
は、アンテナ９１からの信号を復調し、その結果得られ
る、図１で説明したようなコードデータを、受信部９４
に供給する。また、変復調部９２は、送信部９３から供
給される、図１で説明したようなコードデータを変調
し、その結果得られる変調信号を、アンテナ９１に供給
する。送信部９３は、図１に示した送信部と同様に構成
され、そこに入力されるユーザの音声を、コードデータ
に符号化して、変復調部９２に供給する。受信部９４
は、変復調部９２からのコードデータを受信し、そのコ
ードデータから、図３の音声合成装置における場合と同
様の高音質の音声を復号して出力する。The antenna 91 receives radio waves from the base station 82₁ or 82₂ and supplies the received signal to the modem unit 92, and also transmits the signal from the modem unit 92 to the base station 82₁ or 82₂ by radio waves. The modem unit 92 demodulates the signal from the antenna 91 and supplies the resulting code data, as described with reference to FIG. 1, to the receiving unit 94. The modem unit 92 also modulates the code data, as described with reference to FIG. 1, supplied from the transmitting unit 93, and supplies the resulting modulated signal to the antenna 91. The transmitting unit 93 is configured in the same way as the transmitting unit shown in FIG. 1; it encodes the user's voice input to it into code data and supplies the code data to the modem unit 92. The receiving unit 94 receives the code data from the modem unit 92, and decodes from it and outputs high-quality speech like that of the speech synthesis device of FIG. 3.

【０１５５】即ち、図１１は、図１０の受信部９４の構
成例を示している。なお、図中、図２における場合と対
応する部分については、同一の符号を付してあり、以下
では、その説明は、適宜省略する。That is, FIG. 11 shows a configuration example of the receiving unit 94 of FIG. 10. In the figure, parts corresponding to those in FIG. 2 are denoted by the same reference numerals, and their description is omitted below as appropriate.

【０１５６】タップ生成部１０１には、チャネルデコー
ダ２１が出力する、フレーム（またはサブフレーム）ご
とのＬコード、Ｇコード、Ｉコード、およびＡコードが
供給されるようになっており、タップ生成部１０１は、
そのＬコード、Ｇコード、Ｉコード、およびＡコードか
ら、クラスタップとするものを抽出し、クラス分類部１
０４に供給する。ここで、タップ生成部１０１が生成す
るような、Ｌコード等で構成されるクラスタップを、以
下、適宜、第１のクラスタップという。The tap generation unit 101 is supplied with the L code, G code, I code, and A code output by the channel decoder 21 for each frame (or subframe). The tap generation unit 101 extracts from these codes the items to be used as a class tap and supplies them to the class classification unit 104. A class tap composed of the L code and the like, as generated by the tap generation unit 101, is hereinafter referred to as a first class tap where appropriate.

【０１５７】タップ生成部１０２には、演算器２８が出
力する、フレーム（またはサブフレーム）ごとの残差信
号ｅが供給されるようになっており、タップ生成部１０
２は、その残差信号から、クラスタップとするもの（サ
ンプル点）を抽出し、クラス分類部１０４に供給する。
さらに、タップ生成部１０２は、演算器２８からの残差
信号から、予測タップとするものを抽出し、予測部１０
６に供給する。ここで、タップ生成部１０２が生成する
ような、残差信号で構成されるクラスタップを、以下、
適宜、第２のクラスタップという。The tap generation unit 102 is supplied with the residual signal e output by the arithmetic unit 28 for each frame (or subframe). The tap generation unit 102 extracts from the residual signal the items (sample points) to be used as a class tap and supplies them to the class classification unit 104. It also extracts from the residual signal from the arithmetic unit 28 the items to be used as a prediction tap and supplies them to the prediction unit 106. A class tap composed of the residual signal, as generated by the tap generation unit 102, is hereinafter referred to as a second class tap where appropriate.

【０１５８】タップ生成部１０３には、フィルタ係数復
号器２５が出力する、フレームごとの線形予測係数α_p
が供給されるようになっており、タップ生成部１０３
は、その線形予測係数から、クラスタップとするものを
抽出し、クラス分類部１０４に供給する。さらに、タッ
プ生成部１０３は、フィルタ係数復号器２５からの線形
予測係数から、予測タップとするものを抽出し、予測部
１０７に供給する。ここで、タップ生成部１０３が生成
するような、線形予測係数で構成されるクラスタップ
を、以下、適宜、第３のクラスタップという。The tap generation unit 103 is supplied with the linear prediction coefficients α_p output by the filter coefficient decoder 25 for each frame. The tap generation unit 103 extracts from the linear prediction coefficients the items to be used as a class tap and supplies them to the class classification unit 104. It also extracts from the linear prediction coefficients from the filter coefficient decoder 25 the items to be used as a prediction tap and supplies them to the prediction unit 107. A class tap composed of linear prediction coefficients, as generated by the tap generation unit 103, is hereinafter referred to as a third class tap where appropriate.

【０１５９】クラス分類部１０４は、タップ生成部１０
１乃至１０３それぞれから供給される第１乃至第３のク
ラスタップをまとめて、最終的なクラスタップとし、そ
の最終的なクラスタップに基づいて、クラス分類を行
い、そのクラス分類結果としてのクラスコードを、係数
メモリ１０５に供給する。The class classification unit 104 combines the first to third class taps supplied from the tap generation units 101 to 103, respectively, into a final class tap, performs class classification based on that final class tap, and supplies the class code obtained as the classification result to the coefficient memory 105.
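The following Python sketch illustrates one conceivable way of turning the three class taps into a single class code. It is purely hypothetical: this passage does not say how the final class tap is mapped to a class code, so the 1-bit quantization around each tap's mean and the names `one_bit_pattern` and `classify` are assumptions made only for illustration.

```python
def one_bit_pattern(values):
    """Quantize each value to 1 bit: 1 if it is at or above the mean."""
    mean = sum(values) / len(values)
    return [1 if v >= mean else 0 for v in values]


def classify(first_tap, second_tap, third_tap):
    """Combine three class taps into one integer class code.

    Assumption: each tap is quantized separately and the resulting bits
    are concatenated in the order first, second, third.
    """
    bits = (one_bit_pattern(first_tap)
            + one_bit_pattern(second_tap)
            + one_bit_pattern(third_tap))
    code = 0
    for b in bits:
        code = (code << 1) | b  # pack the bits, most significant first
    return code


# Made-up tap values: code-derived, residual samples, LPC coefficients.
class_code = classify([3, 1], [0.2, -0.4], [0.5, 0.9])
```

The resulting integer can serve directly as an address into a per-class coefficient table.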

【０１６０】係数メモリ１０５は、後述する図１２の学
習装置において学習処理が行われることにより得られ
る、クラスごとの線形予測係数についてのタップ係数
と、残差信号についてのタップ係数を記憶しており、ク
ラス分類部１０４が出力するクラスコードに対応するア
ドレスに記憶されているタップ係数を、予測部１０６と
１０７に供給する。なお、係数メモリ１０５から予測部
１０６に対しては、残差信号についてのタップ係数Ｗｅ
が供給され、係数メモリ１０５から予測部１０７に対し
ては、線形予測係数についてのタップ係数Ｗａが供給さ
れる。The coefficient memory 105 stores the per-class tap coefficients for the linear prediction coefficients and for the residual signal obtained through the learning process of the learning device of FIG. 12, described later. It supplies the tap coefficients stored at the address corresponding to the class code output by the class classification unit 104 to the prediction units 106 and 107. The tap coefficients We for the residual signal are supplied from the coefficient memory 105 to the prediction unit 106, and the tap coefficients Wa for the linear prediction coefficients are supplied from the coefficient memory 105 to the prediction unit 107.

【０１６１】予測部１０６は、図３の予測部４６Ｅと同
様に、タップ生成部１０２が出力する予測タップと、係
数メモリ１０５が出力する残差信号についてのタップ係
数とを取得し、その予測タップとタップ係数とを用い
て、式（６）に示した線形予測演算を行う。これによ
り、予測部１０６は、注目フレームの残差信号（の予測
値）ｅｍを求めて、音声合成フィルタ２９に、入力信号
として供給する。The prediction unit 106, like the prediction unit 46E of FIG. 3, acquires the prediction tap output by the tap generation unit 102 and the tap coefficients for the residual signal output by the coefficient memory 105, and performs the linear prediction operation of equation (6) using them. The prediction unit 106 thereby obtains (the predicted value of) the residual signal em of the frame of interest and supplies it to the speech synthesis filter 29 as the input signal.

【０１６２】予測部１０７は、図３の予測部４６Ａと同
様に、タップ生成部１０３が出力する予測タップと、係
数メモリ１０５が出力する線形予測係数についてのタッ
プ係数とを取得し、その予測タップとタップ係数とを用
いて、式（６）に示した線形予測演算を行う。これによ
り、予測部１０７は、注目フレームの線形予測係数（の
予測値）ｍα_pを求めて、音声合成フィルタ２９に供給
する。The prediction unit 107, like the prediction unit 46A of FIG. 3, acquires the prediction tap output by the tap generation unit 103 and the tap coefficients for the linear prediction coefficients output by the coefficient memory 105, and performs the linear prediction operation of equation (6) using them. The prediction unit 107 thereby obtains (the predicted value of) the linear prediction coefficients mα_p of the frame of interest and supplies them to the speech synthesis filter 29.
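The lookup and prediction performed by the coefficient memory 105 and the prediction units 106 and 107 can be sketched as follows. Equation (6) is not reproduced in this passage, so the sketch assumes it is a plain weighted sum of the prediction tap elements. The dictionary layout, the function name `linear_prediction`, and all numeric values are hypothetical.

```python
def linear_prediction(prediction_tap, tap_coefficients):
    """Return sum_n w_n * x_n, the assumed form of the prediction."""
    if len(prediction_tap) != len(tap_coefficients):
        raise ValueError("tap and coefficient lengths differ")
    return sum(w * x for w, x in zip(tap_coefficients, prediction_tap))


# Hypothetical coefficient memory: class code -> tap coefficients.
# The values are made up for illustration, not learned.
coefficient_memory = {
    0: {"We": [1.0, 0.0, 0.0], "Wa": [1.0, 0.0]},
    1: {"We": [0.5, 0.25, 0.25], "Wa": [0.9, 0.1]},
}

class_code = 1
residual_tap = [0.4, 0.2, -0.2]  # decoded residual samples (prediction tap)
lpc_tap = [0.5, -0.5]            # decoded linear prediction coefficients

predicted_residual = linear_prediction(
    residual_tap, coefficient_memory[class_code]["We"])
predicted_lpc = linear_prediction(
    lpc_tap, coefficient_memory[class_code]["Wa"])
```

In practice one such weighted sum would be evaluated for every residual sample and every coefficient to be predicted, each with its own prediction tap.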

【０１６３】以上のように構成される受信部９４では、
基本的には、図５に示したフローチャートにしたがった
処理と同様の処理が行われることで、高音質の合成音
が、音声の復号結果として出力される。In the receiving unit 94 configured as described above, basically the same processing as that of the flowchart of FIG. 5 is performed, so that high-quality synthesized sound is output as the speech decoding result.
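The speech synthesis filter 29 is not detailed in this passage. Assuming that it is the inverse of the prediction filter of equation (14), so that s_n = e_n − (α₁s_{n−1} + ... + α_P s_{n−P}), its operation could be sketched as follows. The function name `synthesis_filter` and the zero initial state are assumptions.

```python
from collections import deque


def synthesis_filter(e, alpha):
    """Compute s[n] = e[n] - sum_{p=1..P} alpha[p-1] * s[n-p] (IIR filter).

    Assumption: past outputs before the first sample are zero.
    """
    P = len(alpha)
    history = deque([0.0] * P, maxlen=P)  # history[p-1] holds s[n-p]
    s = []
    for x in e:
        y = x - sum(alpha[p] * history[p] for p in range(P))
        s.append(y)
        history.appendleft(y)  # feed the output back into the filter
    return s


speech = synthesis_filter([1.0, 2.5, 4.0], [0.5])
```

Feeding this filter the predicted residual signal and the predicted linear prediction coefficients would correspond to the synthesis step that produces the output sound.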

【０１６４】即ち、チャネルデコーダ２１は、そこに供
給されるコードデータから、Ｌコード、Ｇコード、Ｉコ
ード、Ａコードを分離し、それぞれを、適応コードブッ
ク記憶部２２、ゲイン復号器２３、励起コードブック記
憶部２４、フィルタ係数復号器２５に供給する。さら
に、Ｌコード、Ｇコード、Ｉコード、およびＡコード
は、タップ生成部１０１にも供給される。That is, the channel decoder 21 separates the L code, G code, I code, and A code from the code data supplied to it, and supplies them to the adaptive codebook storage unit 22, the gain decoder 23, the excitation codebook storage unit 24, and the filter coefficient decoder 25, respectively. The L code, G code, I code, and A code are also supplied to the tap generation unit 101.

【０１６５】そして、適応コードブック記憶部２２、ゲ
イン復号器２３、励起コードブック記憶部２４、演算器
２６乃至２８では、図１の適応コードブック記憶部９、
ゲイン復号器１０、励起コードブック記憶部１１、演算
器１２乃至１４における場合と同様の処理が行われ、こ
れにより、Ｌコード、Ｇコード、およびＩコードが、残
差信号ｅに復号される。この復号残差信号は、演算器２
８からタップ生成部１０２に供給される。The adaptive codebook storage unit 22, the gain decoder 23, the excitation codebook storage unit 24, and the arithmetic units 26 to 28 then perform the same processing as the adaptive codebook storage unit 9, the gain decoder 10, the excitation codebook storage unit 11, and the arithmetic units 12 to 14 of FIG. 1, whereby the L code, G code, and I code are decoded into the residual signal e. This decoded residual signal is supplied from the arithmetic unit 28 to the tap generation unit 102.

【０１６６】さらに、フィルタ係数復号器２５は、図１
で説明したように、そこに供給されるＡコードを、復号
線形予測係数に復号し、タップ生成部１０３に供給す
る。Further, as described with reference to FIG. 1, the filter coefficient decoder 25 decodes the A code supplied to it into decoded linear prediction coefficients and supplies them to the tap generation unit 103.

【０１６７】タップ生成部１０１は、そこに供給される
Ｌコード、Ｇコード、Ｉコード、およびＡコードのフレ
ームを、順次、注目フレームとし、ステップＳ１（図
５）において、チャネルデコーダ２１からのＬコード、
Ｇコード、Ｉコード、およびＡコードから、第１のクラ
スタップを生成し、クラス分類部１０４に供給する。さ
らに、ステップＳ１では、タップ生成部１０２が、演算
器２８からの復号残差信号から、第２のクラスタップを
生成し、クラス分類部１０４に供給するとともに、タッ
プ生成部１０３が、フィルタ係数復号器２５からの線形
予測係数から、第３のクラスタップを生成し、クラス分
類部１０４に供給する。また、ステップＳ１では、タッ
プ生成部１０２が、演算器２８からの残差信号から、予
測タップとするものを抽出し、予測部１０６に供給する
とともに、タップ生成部１０３が、フィルタ係数復号器
２５からの線形予測係数から、予測タップを生成し、予
測部１０７に供給する。The tap generation unit 101 sequentially sets each frame of the L code, G code, I code, and A code supplied to it as the frame of interest and, in step S1 (FIG. 5), generates a first class tap from the L code, G code, I code, and A code from the channel decoder 21 and supplies it to the class classification unit 104. Also in step S1, the tap generation unit 102 generates a second class tap from the decoded residual signal from the arithmetic unit 28 and supplies it to the class classification unit 104, and the tap generation unit 103 generates a third class tap from the linear prediction coefficients from the filter coefficient decoder 25 and supplies it to the class classification unit 104. In addition, in step S1, the tap generation unit 102 extracts a prediction tap from the residual signal from the arithmetic unit 28 and supplies it to the prediction unit 106, and the tap generation unit 103 generates a prediction tap from the linear prediction coefficients from the filter coefficient decoder 25 and supplies it to the prediction unit 107.

【０１６８】そして、ステップＳ２に進み、クラス分類
部１０４は、タップ生成部１０１乃至１０３それぞれか
ら供給される第１乃至第３のクラスタップをまとめた、
最終的なクラスタップに基づいて、クラス分類を行い、
その結果得られるクラスコードを、係数メモリ１０５に
供給して、ステップＳ３に進む。The process then proceeds to step S2, where the class classification unit 104 performs class classification based on a final class tap that combines the first to third class taps supplied from the tap generation units 101 to 103, respectively, and supplies the resulting class code to the coefficient memory 105. The process then proceeds to step S3.
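The following is a minimal sketch of step S2, not the method of this specification: this passage does not say how the final class tap is mapped to a class code. The sketch assumes 1-bit quantization of each tap element against the midpoint of its own tap (an ADRC-like scheme, chosen here only for illustration). The function names one_bit_quantize and class_code are hypothetical.

```python
def one_bit_quantize(tap):
    """Quantize each element to 1 bit relative to the midpoint of the tap's range.

    Assumption: an ADRC-like rule used only for illustration.
    """
    lo, hi = min(tap), max(tap)
    mid = (lo + hi) / 2.0
    return [1 if x >= mid else 0 for x in tap]


def class_code(first_tap, second_tap, third_tap):
    """Combine the three class taps into one final class tap and pack it into an integer."""
    bits = []
    for tap in (first_tap, second_tap, third_tap):
        bits.extend(one_bit_quantize(tap))
    code = 0
    for b in bits:
        code = (code << 1) | b  # most significant bit first
    return code
```

The returned integer can serve directly as the address used to look up tap coefficients in a coefficient memory; the number of classes grows as 2 to the power of the total tap length, so practical class taps must stay short.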

【０１６９】ステップＳ３では、係数メモリ１０５は、
クラス分類部１０４から供給されるクラスコードに対応
するアドレスから、残差信号と線形予測係数それぞれに
ついてのタップ係数を読み出し、残差信号についてのタ
ップ係数を、予測部１０６に供給するとともに、線形予
測係数についてのタップ係数を、予測部１０７に供給す
る。In step S3, the coefficient memory 105 reads the tap coefficients for the residual signal and for the linear prediction coefficients from the address corresponding to the class code supplied from the class classification unit 104, supplies the tap coefficients for the residual signal to the prediction unit 106, and supplies the tap coefficients for the linear prediction coefficients to the prediction unit 107.

【０１７０】そして、ステップＳ４に進み、予測部１０
６は、係数メモリ１０５が出力する残差信号についての
タップ係数を取得し、そのタップ係数と、タップ生成部
１０２からの予測タップとを用いて、式（６）に示した
積和演算を行い、注目フレームの真の残差信号（の予測
値）を得る。さらに、ステップＳ４では、予測部１０７
は、係数メモリ１０５が出力する線形予測係数について
のタップ係数を取得し、そのタップ係数と、タップ生成
部１０３からの予測タップとを用いて、式（６）に示し
た積和演算を行い、注目フレームの真の線形予測係数
（の予測値）を得る。The process then proceeds to step S4, where the prediction unit 106 acquires the tap coefficients for the residual signal output by the coefficient memory 105 and performs the product-sum operation of Expression (6) using those tap coefficients and the prediction tap from the tap generation unit 102, obtaining (a predicted value of) the true residual signal of the frame of interest. Also in step S4, the prediction unit 107 acquires the tap coefficients for the linear prediction coefficients output by the coefficient memory 105 and performs the product-sum operation of Expression (6) using those tap coefficients and the prediction tap from the tap generation unit 103, obtaining (a predicted value of) the true linear prediction coefficients of the frame of interest.
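The product-sum operation of step S4 can be sketched as follows. Expression (6) is not restated in this passage, so the sketch assumes it is the first-order product-sum y = w1*x1 + w2*x2 + ... + wN*xN. The names predict and predict_frame are hypothetical, as is the convention of one coefficient row per predicted value.

```python
def predict(tap_coefficients, prediction_tap):
    """First-order product-sum: y = sum_i w_i * x_i (assumed form of Expression (6))."""
    if len(tap_coefficients) != len(prediction_tap):
        raise ValueError("tap coefficients and prediction tap must have equal length")
    return sum(w * x for w, x in zip(tap_coefficients, prediction_tap))


def predict_frame(coefficient_rows, prediction_tap):
    """Predict every value of a frame; each row of coefficients yields one predicted value."""
    return [predict(row, prediction_tap) for row in coefficient_rows]
```

The same routine would serve both prediction units: only the coefficient rows (residual signal or linear prediction coefficients) and the prediction tap differ.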

【０１７１】以上のようにして得られた残差信号および
線形予測係数は、音声合成フィルタ２９に供給され、音
声合成フィルタ２９では、その残差信号および線形予測
係数を用いて、式（４）の演算が行われることにより、
注目フレームの合成音信号が生成される。この合成音信
号は、音声合成フィルタ２９から、Ｄ／Ａ変換部３０を
介して、スピーカ３１に供給され、これにより、スピー
カ３１からは、その合成音信号に対応する合成音が出力
される。The residual signal and linear prediction coefficients obtained as described above are supplied to the speech synthesis filter 29, which performs the operation of Expression (4) using them to generate the synthesized sound signal of the frame of interest. This synthesized sound signal is supplied from the speech synthesis filter 29 to the speaker 31 via the D/A conversion unit 30, and the speaker 31 outputs the synthesized sound corresponding to that signal.
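As an illustration of the speech synthesis filter, the sketch below implements an all-pole (IIR) filter. Expression (4) is not restated in this passage; the sketch assumes the sign convention s[n] = e[n] - (a1*s[n-1] + ... + aP*s[n-P]). The function name synthesize and the history argument are hypothetical.

```python
def synthesize(residual, lpc, history=None):
    """All-pole synthesis: s[n] = e[n] - sum_p lpc[p-1] * s[n-p].

    residual: residual samples e[n] of one frame.
    lpc:      linear prediction coefficients a1..aP (assumed sign convention).
    history:  the P most recent past output samples, newest first (zeros if omitted).
    """
    order = len(lpc)
    past = list(history) if history is not None else [0.0] * order
    out = []
    for e in residual:
        s = e - sum(a * sp for a, sp in zip(lpc, past))
        out.append(s)
        past = ([s] + past)[:order]  # shift the newest sample into the history
    return out
```

Passing the tail of the previous frame as history keeps the filter state continuous across frame boundaries.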

【０１７２】予測部１０６と１０７において、残差信号
と線形予測係数がそれぞれ得られた後は、ステップＳ５
に進み、まだ、注目フレームとして処理すべきフレーム
のＬコード、Ｇコード、Ｉコード、およびＡコードがあ
るかどうかが判定される。ステップＳ５において、ま
だ、注目フレームとして処理すべきフレームのＬコー
ド、Ｇコード、Ｉコード、およびＡコードがあると判定
された場合、ステップＳ１に戻り、次に注目フレームと
すべきフレームを、新たに注目フレームとして、以下、
同様の処理を繰り返す。また、ステップＳ５において、
注目フレームとして処理すべきフレームのＬコード、Ｇ
コード、Ｉコード、およびＡコードがないと判定された
場合、処理を終了する。After the prediction units 106 and 107 have obtained the residual signal and the linear prediction coefficients, respectively, the process proceeds to step S5, where it is determined whether any L code, G code, I code, and A code of a frame still to be processed as the frame of interest remain. If step S5 determines that such codes remain, the process returns to step S1, the next frame becomes the new frame of interest, and the same processing is repeated. If step S5 determines that no such codes remain, the process ends.
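The control flow of steps S1 to S5 can be summarized in the following sketch. It is a simplified illustration, not the specification's implementation: the tap generation, classification, and synthesis stages are passed in as callables, a single combined class tap stands in for the first to third class taps, and all names (decode_stream, make_taps, classify, coefficient_memory) are hypothetical.

```python
def dot(weights, tap):
    """Product-sum of one coefficient row and a prediction tap."""
    return sum(w * x for w, x in zip(weights, tap))


def decode_stream(frames, make_taps, classify, coefficient_memory, synthesize):
    """Run steps S1 to S5 over every frame and return the synthesized samples.

    make_taps(frame)         -> (class_tap, residual_tap, lpc_tap)       step S1
    classify(class_tap)      -> class code                               step S2
    coefficient_memory[code] -> (residual_rows, lpc_rows)                step S3
    synthesize(residual, lpc) -> list of samples for the frame
    """
    output = []
    for frame in frames:  # step S5: repeat while frames remain
        class_tap, residual_tap, lpc_tap = make_taps(frame)        # S1
        code = classify(class_tap)                                 # S2
        residual_rows, lpc_rows = coefficient_memory[code]         # S3
        residual = [dot(row, residual_tap) for row in residual_rows]  # S4
        lpc = [dot(row, lpc_tap) for row in lpc_rows]                 # S4
        output.extend(synthesize(residual, lpc))
    return output
```

Keeping the stages as parameters makes it easy to substitute a different classifier or coefficient set without touching the loop.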

【０１７３】次に、図１２は、図１１の係数メモリ１０
５に記憶させるタップ係数の学習処理を行う学習装置の
一実施の形態の構成例を示している。Next, FIG. 12 shows a configuration example of an embodiment of a learning device that learns the tap coefficients to be stored in the coefficient memory 105 of FIG. 11.

【０１７４】マイク２０１乃至コード決定部２１５は、
図１のマイク１乃至コード決定部１５とそれぞれ同様に
構成される。そして、マイク２０１には、学習用の音声
信号が入力されるようになっており、従って、マイク２
０１乃至コード決定部２１５では、その学習用の音声信
号に対して、図１における場合と同様の処理が施され
る。The microphone 201 through the code determination unit 215 are configured in the same way as the microphone 1 through the code determination unit 15 of FIG. 1, respectively. A learning audio signal is input to the microphone 201, so the microphone 201 through the code determination unit 215 apply the same processing as in FIG. 1 to that learning audio signal.

【０１７５】そして、予測フィルタ１１１Ｅには、Ａ／
Ｄ変換部２０２が出力する、ディジタル信号とされた学
習用の音声信号と、ＬＰＣ分析部２０４が出力する線形
予測係数が供給される。また、タップ生成部１１２Ａに
は、ベクトル量子化部２０５が出力する線形予測係数
（ベクトル量子化に用いられるコードブックのコードベ
クトル（セントロイドベクトル）を構成する線形予測係
数）が供給され、タップ生成部１１２Ｅには、演算器２
１４が出力する残差信号（音声合成フィルタ２０６に供
給されるのと同一の残差信号）が供給される。さらに、
正規方程式加算回路１１４Ａには、ＬＰＣ分析部２０４
が出力する線形予測係数が供給され、タップ生成部１１
７には、コード決定部２１５が出力するＬコード、Ｇコ
ード、Ｉコード、およびＡコードが供給される。The prediction filter 111E is supplied with the learning audio signal, converted into a digital signal, output by the A/D conversion unit 202 and with the linear prediction coefficients output by the LPC analysis unit 204. The tap generation unit 112A is supplied with the linear prediction coefficients output by the vector quantization unit 205 (the linear prediction coefficients constituting a code vector (centroid vector) of the codebook used for vector quantization), and the tap generation unit 112E is supplied with the residual signal output by the arithmetic unit 214 (the same residual signal as that supplied to the speech synthesis filter 206). Further, the normal equation addition circuit 114A is supplied with the linear prediction coefficients output by the LPC analysis unit 204, and the tap generation unit 117 is supplied with the L code, G code, I code, and A code output by the code determination unit 215.

【０１７６】予測フィルタ１１１Ｅは、Ａ／Ｄ変換部２
０２から供給される学習用の音声信号のフレームを、順
次、注目フレームとして、その注目フレームの音声信号
と、ＬＰＣ分析部２０４から供給される線形予測係数を
用いて、例えば、式（１）にしたがった演算を行うこと
により、注目フレームの残差信号を求める。この残差信
号は、教師データとして、正規方程式加算回路１１４Ｅ
に供給される。The prediction filter 111E sequentially sets each frame of the learning audio signal supplied from the A/D conversion unit 202 as the frame of interest and obtains the residual signal of the frame of interest by performing, for example, the operation of Expression (1) using the audio signal of that frame and the linear prediction coefficients supplied from the LPC analysis unit 204. This residual signal is supplied to the normal equation addition circuit 114E as teacher data.
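The role of the prediction filter 111E can be illustrated with the sketch below, which computes the residual signal used as teacher data. Expression (1) is not restated in this passage; the sketch assumes the convention e[n] = s[n] + a1*s[n-1] + ... + aP*s[n-P], which makes it the exact inverse of an all-pole synthesis filter with the same coefficients. The function name compute_residual and the history argument are hypothetical.

```python
def compute_residual(speech, lpc, history=None):
    """Prediction-error filter: e[n] = s[n] + sum_p lpc[p-1] * s[n-p].

    speech:  speech samples s[n] of the frame of interest.
    lpc:     linear prediction coefficients a1..aP (assumed sign convention).
    history: the P most recent past speech samples, newest first (zeros if omitted).
    """
    order = len(lpc)
    past = list(history) if history is not None else [0.0] * order
    out = []
    for s in speech:
        out.append(s + sum(a * sp for a, sp in zip(lpc, past)))
        past = ([s] + past)[:order]  # shift the newest input sample into the history
    return out
```

Because this residual is derived from the unquantized speech and the unquantized linear prediction coefficients, it is free of coding error, which is what makes it suitable as the target (teacher) of the learning.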

【０１７７】タップ生成部１１２Ａは、ベクトル量子化
部２０５から供給される線形予測係数から、図１１のタ
ップ生成部１０３における場合と同一の予測タップと第
３のクラスタップを構成し、第３のクラスタップを、ク
ラス分類部１１３Ａおよび１１３Ｅに供給するととも
に、予測タップを、正規方程式加算回路１１４Ａに供給
する。The tap generation unit 112A forms, from the linear prediction coefficients supplied from the vector quantization unit 205, the same prediction tap and third class tap as the tap generation unit 103 of FIG. 11, supplies the third class tap to the class classification units 113A and 113E, and supplies the prediction tap to the normal equation addition circuit 114A.

【０１７８】タップ生成部１１２Ｅは、演算器２１４か
ら供給される残差信号から、図１１のタップ生成部１０
２における場合と同一の予測タップと第２のクラスタッ
プを構成し、第２のクラスタップを、クラス分類部１１
３Ａおよび１１３Ｅに供給するとともに、予測タップ
を、正規方程式加算回路１１４Ｅに供給する。The tap generation unit 112E forms, from the residual signal supplied from the arithmetic unit 214, the same prediction tap and second class tap as the tap generation unit 102 of FIG. 11, supplies the second class tap to the class classification units 113A and 113E, and supplies the prediction tap to the normal equation addition circuit 114E.

【０１７９】クラス分類部１１３Ａおよび１１３Ｅに
は、タップ生成部１１２Ａと１１２Ｅから、それぞれ第
３と第２のクラスタップが供給される他、タップ生成部
１１７から第１のクラスタップも供給される。そして、
クラス分類部１１３Ａと１１３Ｅは、図１１のクラス分
類部１０４における場合と同様に、そこに供給される第
１乃至第３のクラスタップをまとめて、最終的なクラス
タップとし、その最終的なクラスタップに基づいて、ク
ラス分類を行い、その結果得られるクラスコードを、正
規方程式加算回路１１４Ａと１１４Ｅに、それぞれ供給
する。The class classification units 113A and 113E are supplied with the third and second class taps from the tap generation units 112A and 112E, respectively, and also with the first class tap from the tap generation unit 117. As in the class classification unit 104 of FIG. 11, the class classification units 113A and 113E combine the first to third class taps supplied to them into a final class tap, perform class classification based on that final class tap, and supply the resulting class codes to the normal equation addition circuits 114A and 114E, respectively.

【０１８０】正規方程式加算回路１１４Ａは、ＬＰＣ分
析部２０４からの注目フレームの線形予測係数を、教師
データとして受信するとともに、タップ生成部１１２Ａ
からの予測タップを、生徒データとして受信し、その教
師データおよび生徒データを対象として、クラス分類部
１１３Ａからのクラスコードごとに、図６の正規方程式
加算回路６６Ａにおける場合と同様の足し込みを行うこ
とにより、各クラスについて、線形予測係数に関する式
（１３）に示した正規方程式をたてる。正規方程式加算
回路１１４Ｅは、予測フィルタ１１１Ｅからの注目フレ
ームの残差信号を、教師データとして受信するととも
に、タップ生成部１１２Ｅからの予測タップを、生徒デ
ータとして受信し、その教師データおよび生徒データを
対象として、クラス分類部１１３Ｅからのクラスコード
ごとに、図６の正規方程式加算回路６６Ｅにおける場合
と同様の足し込みを行うことにより、各クラスについ
て、残差信号に関する式（１３）に示した正規方程式を
たてる。The normal equation addition circuit 114A receives the linear prediction coefficients of the frame of interest from the LPC analysis unit 204 as teacher data and the prediction tap from the tap generation unit 112A as student data and, for each class code from the class classification unit 113A, performs the same summation on that teacher data and student data as the normal equation addition circuit 66A of FIG. 6, thereby setting up, for each class, the normal equation of Expression (13) for the linear prediction coefficients. The normal equation addition circuit 114E receives the residual signal of the frame of interest from the prediction filter 111E as teacher data and the prediction tap from the tap generation unit 112E as student data and, for each class code from the class classification unit 113E, performs the same summation on that teacher data and student data as the normal equation addition circuit 66E of FIG. 6, thereby setting up, for each class, the normal equation of Expression (13) for the residual signal.
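The per-class summation performed by the normal equation addition circuits can be sketched as below. Expression (13) is not restated in this passage; the sketch assumes the usual least-squares form A w = v, with A accumulating products of student data (x_i * x_j) and v accumulating products of student and teacher data (x_i * y). The class name NormalEquationAccumulator and its attributes are hypothetical.

```python
from collections import defaultdict


class NormalEquationAccumulator:
    """Accumulate, per class code, the matrix A and vector v of A w = v."""

    def __init__(self, n_taps):
        self.n_taps = n_taps
        self.A = defaultdict(lambda: [[0.0] * n_taps for _ in range(n_taps)])
        self.v = defaultdict(lambda: [0.0] * n_taps)
        self.count = defaultdict(int)  # number of samples seen per class

    def add(self, class_code, student_tap, teacher_value):
        """Add one (student tap, teacher value) pair to the sums of its class."""
        if len(student_tap) != self.n_taps:
            raise ValueError("unexpected prediction tap length")
        A, v = self.A[class_code], self.v[class_code]
        for i, xi in enumerate(student_tap):
            v[i] += xi * teacher_value
            for j, xj in enumerate(student_tap):
                A[i][j] += xi * xj
        self.count[class_code] += 1
```

One accumulator instance would be kept for the linear prediction coefficients and another for the residual signal, mirroring the separate circuits 114A and 114E.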

【０１８１】タップ係数決定回路１１５Ａと１１５Ｅ
は、正規方程式加算回路１１４Ａと１１４Ｅにおいてク
ラスごとに生成された正規方程式それぞれを解くことに
より、クラスごとに、線形予測係数と残差信号について
のタップ係数をそれぞれ求め、係数メモリ１１６Ａと１
１６Ｅの、各クラスに対応するアドレスにそれぞれ供給
する。The tap coefficient determination circuits 115A and 115E solve the normal equations generated for each class by the normal equation addition circuits 114A and 114E, thereby obtaining, for each class, the tap coefficients for the linear prediction coefficients and for the residual signal, respectively, and supply them to the addresses corresponding to each class in the coefficient memories 116A and 116E, respectively.

【０１８２】なお、学習用の音声信号として用意する音
声信号によっては、正規方程式加算回路１１４Ａや１１
４Ｅにおいて、タップ係数を求めるのに必要な数の正規
方程式が得られないクラスが生じる場合があり得るが、
タップ係数決定回路１１５Ａと１１５Ｅは、そのような
クラスについては、例えば、デフォルトのタップ係数を
出力する。Depending on the audio signal prepared as the learning audio signal, there may be classes for which the normal equation addition circuit 114A or 114E cannot obtain the number of normal equations needed to determine the tap coefficients. For such classes, the tap coefficient determination circuits 115A and 115E output, for example, default tap coefficients.
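Solving one class's normal equation, with a fallback for classes that received too little training data, might look like the sketch below. It uses Gaussian elimination with partial pivoting as one possible solver; the specification does not name a solver here. The function name solve_normal_equation, the eps threshold, and the default argument are hypothetical, and the passage does not say what the default tap coefficients are.

```python
def solve_normal_equation(A, v, default, eps=1e-12):
    """Solve A w = v for the tap coefficients w of one class.

    Returns a copy of `default` when A is (numerically) singular, which happens
    when the class did not receive enough independent training samples.
    """
    n = len(v)
    M = [list(row) + [rhs] for row, rhs in zip(A, v)]  # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < eps:
            return list(default)  # under-determined class: fall back
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        rest = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - rest) / M[i][i]
    return w
```

The result for each class would then be written to the coefficient memory at the address corresponding to that class code.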

【０１８３】係数メモリ１１６Ａと１１６Ｅは、タップ
係数決定回路１１５Ａと１１５Ｅから、それぞれ供給さ
れるクラスごとの線形予測係数と残差信号についてのタ
ップ係数を、それぞれ記憶する。The coefficient memories 116A and 116E store the linear prediction coefficients for each class and the tap coefficients for the residual signal supplied from the tap coefficient determination circuits 115A and 115E, respectively.

【０１８４】タップ生成部１１７は、コード決定部２１
５から供給されるＬコード、Ｇコード、Ｉコード、およ
びＡコードから、図１１のタップ生成部１０１における
場合と同一の第１のクラスタップを生成し、クラス分類
部１１３Ａおよび１１３Ｅに供給する。The tap generation unit 117 generates, from the L code, G code, I code, and A code supplied from the code determination unit 215, the same first class tap as the tap generation unit 101 of FIG. 11 and supplies it to the class classification units 113A and 113E.

【０１８５】以上のように構成される学習装置では、基
本的には、図８に示したフローチャートにしたがった処
理と同様の処理が行われることで、高音質の合成音を得
るためのタップ係数が求められる。In the learning device configured as described above, processing basically similar to that of the flowchart shown in FIG. 8 is performed, so that tap coefficients for obtaining a high-quality synthesized sound are determined.

【０１８６】学習装置には、学習用の音声信号が供給さ
れ、ステップＳ１１において、その学習用の音声信号か
ら、教師データと生徒データが生成される。The learning device is supplied with a learning voice signal, and in step S11, teacher data and student data are generated from the learning voice signal.

【０１８７】即ち、学習用の音声信号は、マイク２０１
に入力され、マイク２０１乃至コード決定部２１５は、
図１のマイク１乃至コード決定部１５における場合とそ
れぞれ同様の処理を行う。That is, the learning audio signal is input to the microphone 201, and the microphone 201 through the code determination unit 215 perform the same processing as the microphone 1 through the code determination unit 15 of FIG. 1, respectively.

【０１８８】その結果、ＬＰＣ分析部２０４で得られる
線形予測係数は、教師データとして、正規方程式加算回
路１１４Ａに供給される。また、この線形予測係数は、
予測フィルタ１１１Ｅにも供給される。さらに、演算器
２１４で得られる残差信号は、生徒データとして、タッ
プ生成部１１２Ｅに供給される。As a result, the linear prediction coefficients obtained by the LPC analysis unit 204 are supplied to the normal equation addition circuit 114A as teacher data. These linear prediction coefficients are also supplied to the prediction filter 111E. Further, the residual signal obtained by the arithmetic unit 214 is supplied to the tap generation unit 112E as student data.

【０１８９】また、Ａ／Ｄ変換部２０２が出力するディ
ジタルの音声信号は、予測フィルタ１１１Ｅに供給さ
れ、ベクトル量子化部２０５が出力する線形予測係数
は、生徒データとして、タップ生成部１１２Ａに供給さ
れる。さらに、コード決定部２１５が出力するＬコー
ド、Ｇコード、Ｉコード、およびＡコードは、タップ生
成部１１７に供給される。The digital audio signal output from the A / D converter 202 is supplied to the prediction filter 111E, and the linear prediction coefficient output from the vector quantizer 205 is supplied to the tap generator 112A as student data. Is done. Further, the L code, G code, I code, and A code output by the code determination unit 215 are supplied to the tap generation unit 117.

【０１９０】そして、予測フィルタ１１１Ｅは、Ａ／Ｄ
変換部２０２から供給される学習用の音声信号のフレー
ムを、順次、注目フレームとして、その注目フレームの
音声信号と、ＬＰＣ分析部２０４から供給される線形予
測係数を用いて、式（１）にしたがった演算を行うこと
により、注目フレームの残差信号を求める。この予測フ
ィルタ１１１Ｅで得られる残差信号は、教師データとし
て、正規方程式加算回路１１４Ｅに供給される。The prediction filter 111E then sequentially sets each frame of the learning audio signal supplied from the A/D conversion unit 202 as the frame of interest and obtains the residual signal of the frame of interest by performing the operation of Expression (1) using the audio signal of that frame and the linear prediction coefficients supplied from the LPC analysis unit 204. The residual signal obtained by the prediction filter 111E is supplied to the normal equation addition circuit 114E as teacher data.

【０１９１】以上のようにして、教師データと生徒デー
タが得られた後は、ステップＳ１２に進み、タップ生成
部１１２Ａが、ベクトル量子化部２０５から供給される
線形予測係数から、線形予測係数についての予測タップ
と第３のクラスタップを生成するとともに、タップ生成
部１１２Ｅが、演算器２１４から供給される残差信号か
ら、残差信号についての予測タップと第２のクラスタッ
プを生成する。さらに、ステップＳ１２では、タップ生
成部１１７が、コード決定部２１５から供給されるＬコ
ード、Ｇコード、Ｉコード、およびＡコードから、第１
のクラスタップを生成する。After the teacher data and student data have been obtained as described above, the process proceeds to step S12, where the tap generation unit 112A generates a prediction tap for the linear prediction coefficients and the third class tap from the linear prediction coefficients supplied from the vector quantization unit 205, and the tap generation unit 112E generates a prediction tap for the residual signal and the second class tap from the residual signal supplied from the arithmetic unit 214. Also in step S12, the tap generation unit 117 generates the first class tap from the L code, G code, I code, and A code supplied from the code determination unit 215.

【０１９２】線形予測係数についての予測タップは、正
規方程式加算回路１１４Ａに供給され、残差信号につい
ての予測タップは、正規方程式加算回路１１４Ｅに供給
される。また、第１乃至第３のクラスタップは、クラス
分類回路１１３Ａおよび１１３Ｅに供給される。The prediction tap for the linear prediction coefficient is supplied to the normal equation addition circuit 114A, and the prediction tap for the residual signal is supplied to the normal equation addition circuit 114E. The first to third class taps are supplied to the classifying circuits 113A and 113E.

【０１９３】その後、ステップＳ１３において、クラス
分類部１１３Ａと１１３Ｅが、第１乃至第３のクラスタ
ップに基づいて、クラス分類を行い、その結果得られる
クラスコードを、正規方程式加算回路１１４Ａと１１４
Ｅに、それぞれ供給する。Thereafter, in step S13, the class classification units 113A and 113E perform class classification based on the first to third class taps and supply the resulting class codes to the normal equation addition circuits 114A and 114E, respectively.

【０１９４】そして、ステップＳ１４に進み、正規方程
式加算回路１１４Ａは、ＬＰＣ分析部２０４からの教師
データとしての注目フレームの線形予測係数、およびタ
ップ生成部１１２Ａからの生徒データとしての予測タッ
プを対象として、式（１３）の行列Ａとベクトルｖの、
上述したような足し込みを、クラス分類部１１３Ａから
のクラスコードごとに行う。さらに、ステップＳ１４で
は、正規方程式加算回路１１４Ｅが、予測フィルタ１１
１Ｅからの教師データとしての注目フレームの残差信
号、およびタップ生成部１１２Ｅからの生徒データとし
ての予測タップを対象として、式（１３）の行列Ａとベ
クトルｖの、上述したような足し込みを、クラス分類部
１１３Ｅからのクラスコードごとに行い、ステップＳ１
５に進む。The process then proceeds to step S14, where the normal equation addition circuit 114A performs the above-described summation into the matrix A and the vector v of Expression (13), for each class code from the class classification unit 113A, on the linear prediction coefficients of the frame of interest from the LPC analysis unit 204 as teacher data and the prediction tap from the tap generation unit 112A as student data. Also in step S14, the normal equation addition circuit 114E performs the above-described summation into the matrix A and the vector v of Expression (13), for each class code from the class classification unit 113E, on the residual signal of the frame of interest from the prediction filter 111E as teacher data and the prediction tap from the tap generation unit 112E as student data. The process then proceeds to step S15.

【０１９５】ステップＳ１５では、まだ、注目フレーム
として処理すべきフレームの学習用の音声信号があるか
どうかが判定される。ステップＳ１５において、まだ、
注目フレームとして処理すべきフレームの学習用の音声
信号があると判定された場合、ステップＳ１１に戻り、
次のフレームを新たに注目フレームとして、以下、同様
の処理が繰り返される。In step S15, it is determined whether any frame of the learning audio signal still remains to be processed as the frame of interest. If step S15 determines that such a frame remains, the process returns to step S11, the next frame becomes the new frame of interest, and the same processing is repeated.

【０１９６】また、ステップＳ１５において、注目フレ
ームとして処理すべきフレームの学習用の音声信号がな
いと判定された場合、即ち、正規方程式加算回路１１４
Ａと１１４Ｅそれぞれにおいて、各クラスについて、正
規方程式が得られた場合、ステップＳ１６に進み、タッ
プ係数決定回路１１５Ａは、各クラスごとに生成された
正規方程式を解くことにより、各クラスごとに、線形予
測係数についてのタップ係数を求め、係数メモリ１１６
Ａの、各クラスに対応するアドレスに供給して記憶させ
る。さらに、タップ係数決定回路１１５Ｅも、各クラス
ごとに生成された正規方程式を解くことにより、各クラ
スごとに、残差信号についてのタップ係数を求め、係数
メモリ１１６Ｅの、各クラスに対応するアドレスに供給
して記憶させ、処理を終了する。If step S15 determines that no frame of the learning audio signal remains to be processed as the frame of interest, that is, when the normal equation addition circuits 114A and 114E have each obtained a normal equation for every class, the process proceeds to step S16. There, the tap coefficient determination circuit 115A solves the normal equation generated for each class to obtain, for each class, the tap coefficients for the linear prediction coefficients, and supplies them to the address corresponding to each class in the coefficient memory 116A for storage. Likewise, the tap coefficient determination circuit 115E solves the normal equation generated for each class to obtain, for each class, the tap coefficients for the residual signal, and supplies them to the address corresponding to each class in the coefficient memory 116E for storage. The process then ends.

【０１９７】以上のようにして、係数メモリ１１６Ａに
記憶された各クラスごとの線形予測係数についてのタッ
プ係数と、係数メモリ１１６Ｅに記憶された各クラスご
との残差信号についてのタップ係数が、図１１の係数メ
モリ１０５に記憶されている。The tap coefficients for the linear prediction coefficients for each class stored in the coefficient memory 116A and the tap coefficients for the residual signal for each class stored in the coefficient memory 116E, obtained as described above, are what is stored in the coefficient memory 105 of FIG. 11.

【０１９８】従って、図１１の係数メモリ１０５に記憶
されたタップ係数は、線形予測演算を行うことにより得
られる真の線形予測係数や残差信号の予測値の予測誤差
（自乗誤差）が、統計的に最小になるように学習を行う
ことにより求められたものであるから、図１１の予測部
１０６と１０７が出力する残差信号と線形予測係数は、
それぞれ真の残差信号と線形予測係数にほぼ一致するこ
ととなり、その結果、これらの残差信号と線形予測係数
によって生成される合成音は、歪みの少ない、高音質の
ものとなる。Accordingly, the tap coefficients stored in the coefficient memory 105 of FIG. 11 were obtained by learning so that the prediction error (squared error) of the predicted values of the true linear prediction coefficients and residual signal obtained by the linear prediction operation is statistically minimized. The residual signal and linear prediction coefficients output by the prediction units 106 and 107 of FIG. 11 therefore substantially match the true residual signal and linear prediction coefficients, respectively, and as a result the synthesized sound generated from them has little distortion and high sound quality.

【０１９９】次に、上述した一連の処理は、ハードウェ
アにより行うこともできるし、ソフトウェアにより行う
こともできる。一連の処理をソフトウェアによって行う
場合には、そのソフトウェアを構成するプログラムが、
汎用のコンピュータ等にインストールされる。Next, the series of processes described above can be performed by hardware or by software. When the series of processes is performed by software, a program constituting the software is installed on a general-purpose computer or the like.

【０２００】そこで、図１３は、上述した一連の処理を
実行するプログラムがインストールされるコンピュータ
の一実施の形態の構成例を示している。FIG. 13 shows a configuration example of an embodiment of a computer in which a program for executing the above-described series of processing is installed.

【０２０１】プログラムは、コンピュータに内蔵されて
いる記録媒体としてのハードディスク３０５やＲＯＭ３
０３に予め記録しておくことができる。The program can be recorded in advance on the hard disk 305 or in the ROM 303, which serve as recording media built into the computer.

【０２０２】あるいはまた、プログラムは、フロッピー
（登録商標）ディスク、CD-ROM(Compact Disc Read Onl
y Memory)，MO(Magneto optical)ディスク，DVD(Digita
l Versatile Disc)、磁気ディスク、半導体メモリなど
のリムーバブル記録媒体３１１に、一時的あるいは永続
的に格納（記録）しておくことができる。このようなリ
ムーバブル記録媒体３１１は、いわゆるパッケージソフ
トウエアとして提供することができる。Alternatively, the program can be stored (recorded) temporarily or permanently on a removable recording medium 311 such as a floppy (registered trademark) disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disc, a DVD (Digital Versatile Disc), a magnetic disk, or a semiconductor memory. Such a removable recording medium 311 can be provided as so-called packaged software.

【０２０３】なお、プログラムは、上述したようなリム
ーバブル記録媒体３１１からコンピュータにインストー
ルする他、ダウンロードサイトから、ディジタル衛星放
送用の人工衛星を介して、コンピュータに無線で転送し
たり、LAN(Local Area Network)、インターネットとい
ったネットワークを介して、コンピュータに有線で転送
し、コンピュータでは、そのようにして転送されてくる
プログラムを、通信部３０８で受信し、内蔵するハード
ディスク３０５にインストールすることができる。Besides being installed on the computer from the removable recording medium 311 as described above, the program can be transferred wirelessly to the computer from a download site via an artificial satellite for digital satellite broadcasting, or transferred by wire to the computer via a network such as a LAN (Local Area Network) or the Internet. The computer can receive the program so transferred with the communication unit 308 and install it on the built-in hard disk 305.

【０２０４】コンピュータは、CPU(Central Processing
Unit)３０２を内蔵している。CPU３０２には、バス３
０１を介して、入出力インタフェース３１０が接続され
ており、CPU３０２は、入出力インタフェース３１０を
介して、ユーザによって、キーボードや、マウス、マイ
ク等で構成される入力部３０７が操作等されることによ
り指令が入力されると、それにしたがって、ROM(Read O
nly Memory)３０３に格納されているプログラムを実行
する。あるいは、また、CPU３０２は、ハードディスク
３０５に格納されているプログラム、衛星若しくはネッ
トワークから転送され、通信部３０８で受信されてハー
ドディスク３０５にインストールされたプログラム、ま
たはドライブ３０９に装着されたリムーバブル記録媒体
３１１から読み出されてハードディスク３０５にインス
トールされたプログラムを、RAM(Random Access Memor
y)３０４にロードして実行する。これにより、CPU３０
２は、上述したフローチャートにしたがった処理、ある
いは上述したブロック図の構成により行われる処理を行
う。そして、CPU３０２は、その処理結果を、必要に応
じて、例えば、入出力インタフェース３１０を介して、
LCD(Liquid CryStal Display)やスピーカ等で構成され
る出力部３０６から出力、あるいは、通信部３０８から
送信、さらには、ハードディスク３０５に記録等させ
る。The computer incorporates a CPU (Central Processing Unit) 302. An input/output interface 310 is connected to the CPU 302 via a bus 301. When the user enters a command via the input/output interface 310 by operating the input unit 307, which consists of a keyboard, a mouse, a microphone, and the like, the CPU 302 executes a program stored in the ROM (Read Only Memory) 303 accordingly. Alternatively, the CPU 302 loads into the RAM (Random Access Memory) 304 and executes a program stored on the hard disk 305, a program transferred from a satellite or a network, received by the communication unit 308, and installed on the hard disk 305, or a program read from the removable recording medium 311 mounted in the drive 309 and installed on the hard disk 305. The CPU 302 thereby performs the processing according to the flowcharts described above or the processing performed by the configurations of the block diagrams described above. Then, as necessary, the CPU 302 outputs the processing result via the input/output interface 310 from the output unit 306, which consists of an LCD (Liquid Crystal Display), a speaker, and the like, transmits it from the communication unit 308, or records it on the hard disk 305.

【０２０５】ここで、本明細書において、コンピュータ
に各種の処理を行わせるためのプログラムを記述する処
理ステップは、必ずしもフローチャートとして記載され
た順序に沿って時系列に処理する必要はなく、並列的あ
るいは個別に実行される処理（例えば、並列処理あるい
はオブジェクトによる処理）も含むものである。Here, in this specification, processing steps for describing a program for causing a computer to perform various processing do not necessarily have to be processed in chronological order in the order described in the flowchart, and may be performed in parallel. Alternatively, it also includes processing executed individually (for example, parallel processing or processing by an object).

【０２０６】また、プログラムは、１のコンピュータに
より処理されるものであっても良いし、複数のコンピュ
ータによって分散処理されるものであっても良い。さら
に、プログラムは、遠方のコンピュータに転送されて実
行されるものであっても良い。Further, the program may be processed by one computer, or may be processed in a distributed manner by a plurality of computers. Further, the program may be transferred to a remote computer and executed.

【０２０７】なお、本実施の形態においては、学習用の
音声信号として、どのようなものを用いるかについて
は、特に言及しなかったが、学習用の音声信号として
は、人が発話した音声の他、例えば、曲（音楽）等を採
用することが可能である。そして、上述したような学習
処理によれば、学習用の音声信号として、人の発話を用
いた場合には、そのような人の発話の音声の音質を向上
させるようなタップ係数が得られ、曲を用いた場合に
は、曲の音質を向上させるようなタップ係数が得られる
ことになる。In this embodiment, nothing in particular has been said about what kind of signal is used as the learning audio signal; besides speech uttered by a person, for example, a piece of music can be used. With the learning process described above, using human speech as the learning audio signal yields tap coefficients that improve the sound quality of such speech, and using music yields tap coefficients that improve the sound quality of music.

【０２０８】また、図１１の実施の形態では、係数メモ
リ１０５には、タップ係数をあらかじめ記憶させておく
ようにしたが、係数メモリ１０５に記憶させるタップ係
数は、携帯電話機８１において、図９の基地局８２（あ
るいは交換局８３）や、図示しないＷＷＷ(World Wide
Web)サーバ等からダウンロードするようにすることがで
きる。即ち、上述したように、タップ係数は、人の発話
用や曲用等のように、ある種類の音声信号に適したもの
を、学習によって得ることができる。さらに、学習に用
いる教師データおよび生徒データによっては、合成音の
音質に差が生じるタップ係数を得ることができる。従っ
て、そのような各種のタップ係数を、基地局８２等に記
憶させておき、ユーザには、自身の所望するタップ係数
をダウンロードさせるようにすることができる。そし
て、このようなタップ係数のダウンロードサービスは、
無料で行うこともできるし、有料で行うこともできる。
さらに、タップ係数のダウンロードサービスを有料で行
う場合には、タップ係数のダウンロードに対する対価と
しての代金は、例えば、携帯電話機８１の通話料等とと
もに請求するようにすることが可能である。In the embodiment of FIG. 11, the tap coefficients are stored in the coefficient memory 105 in advance; however, the mobile phone 81 can instead download the tap coefficients to be stored in the coefficient memory 105 from the base station 82 (or the exchange 83) of FIG. 9, from a WWW (World Wide Web) server (not shown), or the like. That is, as described above, tap coefficients suited to a particular kind of audio signal, such as human speech or music, can be obtained by learning. Moreover, depending on the teacher data and student data used for learning, tap coefficients that produce differences in the sound quality of the synthesized sound can be obtained. Such various tap coefficients can therefore be stored at the base station 82 or the like, and the user can download the tap coefficients he or she wants. Such a tap coefficient download service can be provided free of charge or for a fee. When it is provided for a fee, the charge for downloading the tap coefficients can be billed together with, for example, the call charges of the mobile phone 81.

【０２０９】また、係数メモリ１０５は、携帯電話機８
１に対して着脱可能なメモリカード等で構成することが
できる。この場合、上述したような各種のタップ係数そ
れぞれを記憶させた、異なるメモリカードを提供するよ
うにすれば、ユーザは、場合に応じて、所望のタップ係
数が記憶されたメモリカードを、携帯電話機８１に装着
して使用することが可能となる。The coefficient memory 105 can also be implemented as a memory card or the like that is detachable from the mobile phone 81. In this case, if different memory cards each storing one of the various tap coefficients described above are provided, the user can, as the occasion requires, insert the memory card storing the desired tap coefficients into the mobile phone 81 and use it.

【０２１０】さらに、本発明は、例えば、ＶＳＥＬＰ(V
ector Sum Excited Liner Prediction)，ＰＳＩ−ＣＥ
ＬＰ(Pitch Synchronous Innovation CELP)，ＣＳ−Ａ
ＣＥＬＰ(Conjugate Structure Algebraic CELP)等のＣ
ＥＬＰ方式による符号化の結果得られるコードから合成
音を生成する場合に、広く適用可能である。Furthermore, the present invention is widely applicable to generating a synthesized sound from a code obtained by encoding with a CELP method such as VSELP (Vector Sum Excited Linear Prediction), PSI-CELP (Pitch Synchronous Innovation CELP), or CS-ACELP (Conjugate Structure Algebraic CELP).

【０２１１】また、本発明は、ＣＥＬＰ方式による符号
化の結果得られるコードから合成音を生成する場合に限
らず、あるコードから、残差信号と線形予測係数を得
て、合成音を生成する場合に、広く適用可能である。The present invention is not limited to generating a synthesized sound from a code obtained by CELP encoding; it is widely applicable to cases where a residual signal and linear prediction coefficients are obtained from some code and a synthesized sound is generated from them.

【０２１２】さらに、本実施の形態では、タップ係数を
用いた線形１次予測演算によって、残差信号や線形予測
係数の予測値を求めるようにしたが、この予測値は、そ
の他、２次以上の高次の予測演算によって求めることも
可能である。Furthermore, in the present embodiment, the predicted values of the residual signal and the linear prediction coefficients are obtained by a linear first-order prediction operation using the tap coefficients; these predicted values can also be obtained by a second- or higher-order prediction operation.
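As an illustration of what a second-order prediction operation could look like, the sketch below adds pairwise product terms to the first-order product-sum. The specification only states that higher-order prediction is possible; the particular form y = sum_i w_i*x_i + sum_{i<=j} w_ij*x_i*x_j, the function name predict_second_order, and the ordering of the quadratic coefficients are assumptions made here.

```python
def predict_second_order(linear_w, quadratic_w, tap):
    """Second-order prediction: y = sum_i w_i x_i + sum_{i<=j} w_ij x_i x_j.

    quadratic_w lists the coefficients w_ij in row-major order over pairs i <= j,
    so it must have n * (n + 1) / 2 entries for a tap of length n.
    """
    n = len(tap)
    if len(linear_w) != n or len(quadratic_w) != n * (n + 1) // 2:
        raise ValueError("coefficient counts do not match the tap length")
    y = sum(w * x for w, x in zip(linear_w, tap))
    k = 0
    for i in range(n):
        for j in range(i, n):
            y += quadratic_w[k] * tap[i] * tap[j]
            k += 1
    return y
```

This prediction is still linear in its coefficients, so the same normal-equation learning applies if the student data is extended with the product terms x_i*x_j.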

【０２１３】また、ＣＥＬＰ方式では、ソフト補間ビッ
トや、フレームエネルギが、コードデータに含められる
場合があるが、この場合、そのソフト補間ビットや、フ
レームエネルギも用いて、クラス分類を行うようにする
ことが可能である。In the CELP system, the soft interpolation bits and the frame energy may be included in the code data. In this case, the classification is performed using the soft interpolation bits and the frame energy. It is possible.

【０２１４】[0214]

【発明の効果】[Effects of the Invention] According to the data processing device, the data processing method, and the first recording medium of the present invention, a code is decoded and decoded filter data is output. Further, predetermined tap coefficients obtained by learning are acquired, and a predicted value of the filter data is obtained by performing a predetermined prediction operation using the tap coefficients and the decoded filter data. A high-quality synthesized sound can therefore be generated from that filter data.

【０２１５】According to the learning device, the learning method, and the second recording medium of the present invention, a code corresponding to filter data is decoded and decoded filter data is output. Learning is then performed so that the prediction error of the predicted value of the filter data, obtained by a prediction operation using tap coefficients and the decoded filter data, is statistically minimized, and the tap coefficients are thereby obtained. Those tap coefficients therefore make it possible to obtain filter data for generating a high-quality synthesized sound.
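The learning described above, which statistically minimizes the prediction error, can be sketched as an ordinary least-squares fit solved through normal equations. This is a minimal illustration under stated assumptions: a single class is handled (per-class bookkeeping is omitted), the prediction is the linear first-order form, and the function names `accumulate`, `solve`, and `learn_tap_coefficients` are hypothetical. The accumulation step loosely corresponds to what the symbol list calls a normal equation addition circuit, and the solve step to a tap coefficient determination circuit.

```python
def accumulate(A, b, taps, teacher):
    """Add one (decoded taps, true value) pair to the normal-equation sums.

    A accumulates sum(x_i * x_j); b accumulates sum(x_i * y).
    """
    n = len(taps)
    for i in range(n):
        b[i] += taps[i] * teacher
        for j in range(n):
            A[i][j] += taps[i] * taps[j]


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations; more data needed")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w


def learn_tap_coefficients(samples, n_taps):
    """samples: iterable of (taps, teacher) pairs; returns least-squares w."""
    A = [[0.0] * n_taps for _ in range(n_taps)]
    b = [0.0] * n_taps
    for taps, teacher in samples:
        accumulate(A, b, taps, teacher)
    return solve(A, b)
```

In a class-adaptive setting, one such pair of sums would be kept for each class code and solved separately, so that each class receives its own set of tap coefficients.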

【図面の簡単な説明】[Brief description of the drawings]

【図１】FIG. 1 is a block diagram illustrating a configuration example of the transmitting unit of a conventional mobile phone.

【図２】FIG. 2 is a block diagram illustrating a configuration example of the receiving unit of a conventional mobile phone.

【図３】FIG. 3 is a block diagram illustrating a configuration example of an embodiment of a speech synthesis device to which the present invention is applied.

【図４】FIG. 4 is a block diagram illustrating a configuration example of the speech synthesis filter 47.

【図５】FIG. 5 is a flowchart illustrating the processing of the speech synthesis device of FIG. 3.

【図６】FIG. 6 is a block diagram illustrating a configuration example of an embodiment of a learning device to which the present invention is applied.

【図７】FIG. 7 is a block diagram illustrating a configuration example of the prediction filter 61E.

【図８】FIG. 8 is a flowchart illustrating the processing of the learning device of FIG. 6.

【図９】FIG. 9 is a diagram illustrating a configuration example of an embodiment of a transmission system to which the present invention is applied.

【図１０】FIG. 10 is a block diagram illustrating a configuration example of the mobile phone 81.

【図１１】FIG. 11 is a block diagram illustrating a configuration example of the receiving unit 94.

【図１２】FIG. 12 is a block diagram illustrating a configuration example of another embodiment of a learning device to which the present invention is applied.

【図１３】FIG. 13 is a block diagram illustrating a configuration example of an embodiment of a computer to which the present invention is applied.

【符号の説明】[Explanation of symbols]

21 channel decoder, 22 adaptive codebook storage unit, 23 gain decoder, 24 excitation codebook storage unit, 25 filter coefficient decoder, 26 to 28 arithmetic units, 29 speech synthesis filter, 30 D/A conversion unit, 31 speaker, 41 demultiplexer, 42A filter coefficient decoder, 42E residual codebook storage unit, 43A, 43E tap generation units, 44A, 44E class classification units, 45A, 45E coefficient memories, 46A, 46E prediction units, 47 speech synthesis filter, 48 D/A conversion unit, 49 speaker, 51 adder, 52₁ to 52_P delay circuits, 53₁ to 53_P multipliers, 61A LPC analysis unit, 61E prediction filter, 62A, 62E vector quantization units, 63A filter coefficient decoder, 63E residual codebook storage unit, 64A, 64E tap generation units, 65A, 65E class classification units, 66A, 66E normal equation addition circuits, 67A, 67E tap coefficient determination circuits, 68A, 68E coefficient memories, 71₁ to 71_P delay circuits, 72₁ to 72_P multipliers, 73 adder, 81₁, 81₂ mobile phones, 82₁, 82₂ base stations, 83 exchange, 91 antenna, 92 modulation/demodulation unit, 93 transmitting unit, 94 receiving unit, 101 to 103 tap generation units, 104 class classification unit, 105 coefficient memory, 106, 107 prediction units, 111E prediction filter, 112A, 112E tap generation units, 113A, 113E class classification units, 114A, 114E normal equation addition circuits, 115A, 115E tap coefficient determination circuits, 116A, 116E coefficient memories, 117 tap generation unit, 201 microphone, 202 A/D conversion unit, 203 arithmetic unit, 204 LPC analysis unit, 205 vector quantization unit, 206 speech synthesis filter, 207 square error calculation unit, 208 minimum square error determination unit, 209 adaptive codebook storage unit, 210 gain decoder, 211 excitation codebook storage unit, 212 to 214 arithmetic units, 215 code determination unit, 301 bus, 302 CPU, 303 ROM, 304 RAM, 305 hard disk, 306 output unit, 307 input unit, 308 communication unit, 309 drive, 310 input/output interface, 311 removable recording medium

Continued from the front page
(72) Inventor: Yasuhiro Fujimori, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo
(72) Inventor: Tsutomu Watanabe, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo
(72) Inventor: Hiroto Kimura, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo
F-terms (reference): 5D045 CA01 CC08; 5J064 AA01 BA13 BB03 BB13 BC01 BC12 BD02 BD03