JPH02287500A


Info

Publication number: JPH02287500A
Application number: JP1107615A
Authority: JP
Inventors: Yoshiaki Asakawa; 淺川　吉章; Hiroshi Ichikawa; 市川　熹
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1989-04-28
Filing date: 1989-04-28
Publication date: 1990-11-27

Abstract

(57) [Abstract] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

Translated from Japanese

[Detailed Description of the Invention]

[Field of Industrial Application]

The present invention relates to a high-efficiency speech coding apparatus, and in particular to a speech coding method suited to obtaining high-quality reproduced speech at a high information compression ratio.

[Prior Art]

Many methods have been proposed for high-efficiency speech coding. For example, Kazuo Nakata, "Digital Information Compression" (Kosaido Sanpo Shuppan, Electronic Science Series 100), explains them in an accessible way and presents numerous waveform coding methods and source coding (parameter coding) methods.

In general, high-efficiency coding methods exploit the fact that speech information is unevenly distributed, and allocate more codes to the regions where information is present. A method called "vector quantization" pushes this idea further: it considers the uneven distribution of information over combinations of several parameters and, for a set of parameters (called a vector), allocates more codes to the regions where speech information is present. This method has attracted attention [for example, S. Roucos et al., "Segment quantization for very-low-rate speech coding", Proc. ICASSP 82, p. 1563 (1982)].

In ordinary vector quantization, an input vector is represented by the code of the most similar code vector, and that code is transmitted or stored. The code vector is then read out from the code and taken as the quantization result of the input vector. The input vector is therefore replaced by the code vector nearest to it.

In contrast, there are methods that represent an input vector using information from several code vectors in the codebook. One is fuzzy vector quantization, which interpolates using membership functions between the input vector and each code vector [for example, H. P. Tseng et al., "Fuzzy vector quantization applied to hidden Markov modeling", ICASSP 87, 4 (1987)].

With the same codebook, fuzzy vector quantization can approximate an input vector with less error than ordinary vector quantization. However, the membership functions require a large amount of information and processing, so the method cannot be used as is for high-efficiency coding. Improved methods have therefore been proposed (for example, Nakamura et al., "Normalization of spectrograms using fuzzy vector quantization", Journal of the Acoustical Society of Japan, Vol. 45, No. 2 (February 1989); Japanese Patent Application No. Sho 63-240972; Japanese Patent Application No. Hei 1-57706).
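The contrast between ordinary and fuzzy vector quantization can be made concrete with a minimal Python sketch. The membership formula below is the common fuzzy c-means form with squared Euclidean distances and a fuzziness parameter `p`; this form, the function names, and the toy codebook in the usage note are assumptions for illustration and are not taken from the cited references.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(x, codebook):
    """Ordinary VQ: index of the code vector closest to x."""
    return min(range(len(codebook)), key=lambda i: sq_dist(x, codebook[i]))


def memberships(x, vectors, p=1.5):
    """Membership of x in each vector (assumed fuzzy c-means form); sums to 1."""
    d = [sq_dist(x, v) for v in vectors]
    if 0.0 in d:
        # Exact match: membership 1 for the matching vector, 0 for the rest.
        hit = d.index(0.0)
        return [1.0 if i == hit else 0.0 for i in range(len(d))]
    e = 1.0 / (p - 1.0)
    return [1.0 / sum((di / dj) ** e for dj in d) for di in d]


def reconstruct(u, vectors, p=1.5):
    """Inverse quantization: combination of the vectors weighted by u**p."""
    w = [ui ** p for ui in u]
    total = sum(w)
    dim = len(vectors[0])
    return [sum(wi * v[k] for wi, v in zip(w, vectors)) / total
            for k in range(dim)]
```

For a hypothetical codebook `[[0, 0], [1, 0], [0, 1]]` and input `[0.4, 0.1]`, ordinary quantization returns the first code vector, while `reconstruct(memberships(x, codebook), codebook)` yields a point between the code vectors that lies closer to the input. The price is that one membership value per code vector has to be transmitted, which is the information overhead the text refers to.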

[Problem to Be Solved by the Invention]

When the conventional methods described above are applied to high-efficiency coding of speech signals, the properties of the input signal are not sufficiently taken into account, so good characteristics are not always obtained. In a speech signal the features change relatively slowly over time, so the immediately preceding quantization result (the reproduced vector) is a good approximation of the next input vector. Using this reproduced vector therefore allows more accurate quantization.

An object of the present invention is to provide a fuzzy vector quantization method suited to vector quantization of signals whose temporal or spatial variation is relatively gradual, such as speech signals and image signals.

[Means for Solving the Problem]

To achieve the above object, the invention uses, together with the code vectors of a codebook prepared in advance, a reproduced vector obtained by inverse-quantizing a vector that was quantized before the input vector currently being quantized. To use the reproduced vector in quantizing the input vector, the encoding side has local inverse quantization means with the same function as the means on the decoding side, storage means for holding the reproduced vector until the next input vector is quantized, and means for reading the reproduced vector at quantization time.

[Operation]

The operation is explained for a typical procedure of the invention. When the speech to be transmitted is input, it is divided into frames at fixed intervals, and an analysis unit extracts a feature vector from each frame. This feature vector (the input vector) is compared with the code vectors in the codebook, and a predetermined number of code vectors are selected in order of closeness to the input vector. In a scheme in which neighboring vectors are registered in advance for each code vector (Japanese Patent Application No. Sho 63-240972), the registered neighboring vectors are read out. At the same time, the reproduced vector, which is the local inverse quantization result of the input vector quantized in the previous frame, is read out.

The code vectors read from the codebook and the reproduced vector are evaluated for quantization distortion and the like according to a predetermined criterion, and the vectors to be used are selected. The input vector is quantized using the codes of these vectors and their degrees of membership (membership functions) with respect to the input vector. The codes and membership functions are transmitted and, at the same time, input to the local inverse quantizer, which inverse-quantizes them using the code vectors and the reproduced vector that were used to quantize the input vector. The resulting reproduced vector is stored.

Using the previous frame's reproduced vector in fuzzy vector quantization in this way has the same effect as adding a code vector that is highly similar to the input vector, so vector quantization with less quantization distortion is possible without increasing the amount of information.

[Embodiments]

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram for explaining one embodiment of the present invention. Only one direction, with a transmitting side paired with a receiving side, is shown; the communication path in the reverse direction is omitted to keep the figure simple.

In FIG. 1, input speech 101 passes through an analog-to-digital (A/D) converter 102 and enters a two-bank buffer memory 103. This memory is provided to adjust the timing of the subsequent processing and to prevent interruption of the input speech. The speech from the buffer memory 103 is input to the analysis unit 104, which obtains pitch information 107, spectrum information 106, and level information 105.

The spectrum information 106 is applied to the fuzzy vector quantization unit 108, to which the present invention is applied, yielding vector codes 109 and membership functions 110. The vector codes 109, membership functions 110, pitch information 107, and level information 105 are sent through the transmitting unit 111 and the transmission path 112 to the receiving unit 113.

On the receiving side, the vector codes 109', membership functions 110', pitch information 107', and level information 105' received by the receiving unit are applied to the fuzzy vector inverse quantization unit 114, where the spectrum information 115 is restored and supplied to the synthesis unit 116 together with the pitch information 107' and the level information 105'. The synthesis unit 116 decodes these into a speech waveform, which passes through a two-bank output buffer memory 117, is converted to an analog signal by a digital-to-analog (D/A) converter 118, and is reproduced as output speech 119.

Each part is described in detail below.

FIG. 2 is a diagram for explaining the analysis unit 104. In this embodiment, the analysis unit uses the power spectrum envelope (PSE) analysis method. PSE analysis is described in detail in the paper by Nakajima et al., "Power spectrum envelope (PSE) speech analysis-synthesis system", Journal of the Acoustical Society of Japan, Vol. 44, No. 11 (November 1988). Only an outline is given here.

In FIG. 2, the pitch extraction unit 201 extracts the pitch information (pitch frequency or pitch period) of the input speech. Any known method, such as the correlation method or the AMDF method, may be used for pitch extraction. The waveform extraction unit 203 cuts out from the input speech a waveform segment for spectral analysis; the segment is about 20 to 60 ms long. A fixed-length segment is common, but the length may also be made variable, depending on the pitch period, at about three times that period. The extracted waveform is sent to the Fourier transform unit 204 and converted to a Fourier series. Here the extracted waveform is multiplied by a commonly used window function such as a Hamming window, zeros are padded before and after it to make 2048 points, and a fast Fourier transform (FFT) is applied, which gives data with high frequency resolution at high speed. The absolute values of the Fourier coefficients are the frequency components, that is, the spectrum, of the extracted waveform. When the extracted waveform has a periodic structure, the spectrum has a line spectrum structure due to the pitch harmonics.

The pitch resampling unit 205 takes from the spectrum obtained by the FFT only the harmonic components of the pitch frequency (the line spectrum components). The data extracted in this way are treated below as mapped onto the period π of the cosine series expansion described later.

The power spectrum unit 206 squares each spectral component to convert it to a power spectrum. The logarithm unit 207 then takes the logarithm of each component to obtain a log power spectrum. The level normalization unit 208 absorbs level variations due to the loudness of the input speech, although the level may instead be extracted together with the other coefficients in the cosine transform unit 209 that follows.

The cosine transform unit 209 uses the resampled log power spectrum data and approximates it by a cosine series with a finite number of terms. The number of terms m is usually set to about 25. The power spectrum envelope is expressed as

$$Y = A_0 + A_1 \cos\lambda + A_2 \cos 2\lambda + \cdots + A_m \cos m\lambda \qquad (1)$$

The coefficients A are determined so as to minimize the squared error between the resampled power spectrum data and Y of equation (1). Since the zeroth coefficient A_0 represents the input level, it is output as the level information 105, and A_1, ..., A_m are output as the spectrum information 106.

Next, the fuzzy vector quantization unit that uses the reproduced vector according to the present invention is described with reference to FIG. 3. In FIG. 3, the codebook 401 stores the element values of the code vectors and their codes.

When the spectrum information (input vector) 106 is input to the distance calculation unit 402, the previous frame's reproduced vector 411' is read from the reproduced vector memory 409 and each code vector is read from the codebook 401. Their distances to the input vector 106 are calculated, and distance values 403 are output. The distance measure here is a Euclidean distance with a weight on each vector element, but other suitable measures may of course be used. The range of code vectors subject to distance calculation can also be restricted using the pitch information 107 or the like.

The candidate vector selection unit 404 selects the candidate code vectors for the vector evaluation described next. Here, referring to the distance values 403, a predetermined number (c) of vectors with the smallest distances are selected and output as candidate vector codes 405, sorted in order of increasing distance. If the candidates include the reproduced vector 411', it is assigned a code value that is not allocated to any code vector, for example 0. Besides the above, the selection criterion may be that the distance value is below a predetermined threshold, or that the number of candidates is at most a predetermined number and the distance value is at most a predetermined threshold. When all code vectors in the codebook are to be considered, the candidate vector selection unit is unnecessary.

In many cases the previous frame's reproduced vector is selected as a candidate, so the reproduced vector may simply always be made a candidate. Furthermore, if only the reproduced vector and the code vector most similar to the input vector are used as candidates, the candidate vector selection unit becomes very simple.

The vector selection unit 406 calculates the quantization distortion of the candidate vectors and evaluates them.
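The candidate selection step described for units 402 and 404 can be sketched in a few lines of Python. The names `select_candidates`, `weighted_sq_dist`, and `PREV_CODE` are hypothetical, the codebook is assumed to be a dictionary whose codes start at 1, and the reserved code 0 for the previous frame's reproduced vector follows the example value given in the text.

```python
PREV_CODE = 0  # reserved code for the previous frame's reproduced vector


def weighted_sq_dist(x, v, weights):
    """Euclidean distance (squared) with a weight on each vector element."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, v))


def select_candidates(x, codebook, prev_recon, weights, n_candidates):
    """Return [(code, distance)] for the closest vectors, nearest first.

    codebook maps codes (assumed >= 1) to vectors; prev_recon is the
    previous frame's reproduced vector, or None for the first frame.
    """
    scored = [(weighted_sq_dist(x, v, weights), code)
              for code, v in codebook.items()]
    if prev_recon is not None:
        scored.append((weighted_sq_dist(x, prev_recon, weights), PREV_CODE))
    scored.sort()  # increasing distance
    return [(code, d) for d, code in scored[:n_candidates]]
```

Because the reproduced vector competes on the same distance measure as the code vectors, it appears in the candidate list whenever the signal has changed little since the previous frame. A threshold on the distance, as the text mentions, could be added by filtering `scored` before truncating it.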
The minimum d_min of the distance values 403 to the input vector is the quantization distortion obtained when the input vector is vector-quantized with its nearest-neighbor vector (that is, ordinary vector quantization), so it is taken first as the reference for evaluation.

Next, each candidate vector other than the nearest-neighbor vector is combined, one at a time, with the nearest-neighbor vector, fuzzy vector quantization is performed, and the quantization distortion is calculated. Fuzzy vector quantization is described in detail in Nakamura et al., "Normalization of spectrograms using fuzzy vector quantization" (Journal of the Acoustical Society of Japan, Vol. 45, No. 2 (1989)) and the references cited there, so only an outline is given here.

In fuzzy vector quantization, an input vector is represented by its degrees of membership in several code vectors. The degree of membership is quantified by a membership function. One way of obtaining the membership function is as follows. Consider c code vectors (v_1, ..., v_c), and let d_ik be the distance between the input vector x_k and the code vector v_i. When the input vector coincides with none of the code vectors, the membership function u_ik for each code vector is given by equation (2), which is not reproduced in this text. In that equation, p is a parameter called the fuzziness, usually set to about 1.5. When the input vector coincides with one of the code vectors, the membership value for that code vector is set to 1 and the others to 0.

Next, the reproduction of a vector from the membership functions (the inverse quantization operation) is described. The reproduced vector x_k' is expressed as a linear combination of the code vectors by equation (3), which is likewise not reproduced in this text. The error (distance) between the input vector x_k and the reproduced vector x_k' is the quantization distortion of fuzzy vector quantization.

Fuzzy vector quantization is performed with the nearest-neighbor vector and each of the remaining candidate vectors in turn, one at a time, and the quantization distortion of each is obtained. When the minimum of these distortions is at most d_min, the candidate vector giving that minimum is selected, and d_min is reset to this minimum. Next, the remaining candidate vectors are added one at a time to the nearest-neighbor vector and the vector just selected, and the same procedure is repeated until no candidate vectors remain. The above describes the case in which every candidate vector that reduces the quantization distortion is selected. Alternatively, processing may be stopped once the number of selected vectors reaches a predetermined number.

Other vector selection methods are described in detail in an application by the present inventors (Chuken 31890396) and are omitted here.

The fuzzy vector quantization unit 408 refers to the vector codes 407 output by the vector selection unit 406 and fuzzy-vector-quantizes the input vector using the selected vectors. Specifically, the membership functions are calculated by equation (2) above. The outputs are the selected vector codes 109 and the membership functions 110. When the number of selected vectors is variable, that number is also output. Because the membership functions by their nature sum to 1, it suffices to output one fewer value than the number of vectors. When the number of vectors actually selected is less than the predetermined (fixed) number, the membership values for the remainder may be set to 0.

Next, inverse quantization, which is a feature of the present invention, is described. The vector codes 109 and membership functions 110 output by fuzzy vector quantization are sent to the transmitting unit 111 and are also input to the local inverse quantizer 410. The local inverse quantizer 410 calculates the reproduced vector 411 of the input vector 106 using the same code vectors and reproduced vector 411' that were used to quantize the input vector 106. Specifically, the reproduced vector x_k' 411 is calculated by equation (3) above. The reproduced vector 411 is transferred to the reproduced vector memory 409 and stored; the previous frame's reproduced vector is erased at this point. Needless to say, the updated reproduced vector 411 is identical to the reproduced vector obtained on the receiving side.

The above described the case in which the vector selection function of the present invention is applied to the ordinary fuzzy vector quantization proposed previously. Its application to the type of fuzzy vector quantization already proposed by the present inventors, in which neighboring vectors are registered in advance for each code vector (Japanese Patent Application No. Sho 63-240972), is described briefly next.

In this case the candidate vectors are the nearest-neighbor vector, the neighboring vectors registered for it in advance, and the reproduced vector 411'. The function of the candidate vector selection unit 404 is therefore greatly simplified. Among the outputs of the final-stage fuzzy vector quantization unit 408, the only vector code is the code of the nearest-neighbor vector. Candidate vectors that were not selected can be identified by setting their membership values to 0.

Next, the decoding side (receiving side) is described.

FIG. 4 is a diagram for explaining the fuzzy vector inverse quantization unit 114. When the vector codes 109' are received, the corresponding code vectors v_i are read from the codebook 701, and at the same time the previous frame's reproduced vector is read from the reproduced vector memory 703. Using these and the received membership functions u_ik 110', the vector reproduction unit 702 reproduces the vector by equation (3) above. Needless to say, the receiving-side codebook 701 has the same contents as the transmitting-side codebook 401, and the reproduced vector 704 is identical to the previous-frame reproduced vector 411' on the encoding side. The reproduced vector x_k' = (A_1', A_2', ..., A_m') is sent to the synthesis unit 116 as the spectrum information 115.

Next, the synthesis unit 116 is described with reference to FIG. 5. In FIG. 5, the log power spectrum reproduction unit 801 uses the transmitted level information A_0' 105' and the elements A_1', A_2', ..., A_m' of the reproduced vector (spectrum information 115) to obtain the log power spectrum Y' 802 as

$$Y' = A_0' + A_1' \cos\lambda + A_2' \cos 2\lambda + \cdots + A_m' \cos m\lambda \qquad (4)$$

The reproduced log power spectrum Y' 802 undergoes the transformation (1/2) log^{-1} in the inverse logarithm unit 803 to give the zero-phase spectrum 804, which is sent to the inverse Fourier transform unit 805. The inverse Fourier transform unit 805 obtains a speech segment 806 by an inverse fast Fourier transform (IFFT). In the waveform synthesis unit 807, the speech segments 806 are shifted successively by the pitch interval according to the pitch information 107' and added together, and the result is output as reproduced speech 808.

According to this embodiment, quantization distortion is reduced particularly in portions where similar spectra continue, such as vowels, and smooth reproduced speech is obtained. Moreover, because the vectors used for fuzzy vector quantization are selected, the reproduced speech does not become blurred even when the spectral properties change greatly, as from a consonant to a vowel, and reproduced speech of high intelligibility was obtained.

Besides the above embodiment, two codebooks may be provided. The first codebook, as in the first embodiment, holds the representative vectors of the partitioned input vector space; the second codebook is for quantization errors. The second codebook is used as follows. The input vector is first vector-quantized in the ordinary way with the first codebook. The resulting quantization error (distortion) is then vector-quantized with the second codebook. The quantization error after this two-stage vector quantization is compared with the quantization error of the fuzzy vector quantization of the first embodiment, and the quantization method with the smaller error is selected.

According to the second embodiment, good quantization characteristics were obtained by providing a dedicated error codebook for input vectors for which fuzzy vector quantization gives little reduction in quantization distortion.
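The encoder loop of the first embodiment (vector selection, fuzzy quantization, local inverse quantization) and the matching decoder can be sketched in Python as follows. Equations (2) and (3) are not reproduced in this text, so the sketch assumes the common fuzzy c-means forms $u_{ik} = 1 / \sum_j (d_{ik}/d_{jk})^{1/(p-1)}$ and $x_k' = \sum_i u_{ik}^p v_i / \sum_i u_{ik}^p$ with squared Euclidean distances. The function names, the dictionary codebook with codes starting at 1, and the rule of stopping as soon as no candidate strictly lowers the distortion are further assumptions, and the candidate pool here is simply the whole codebook plus the previous reproduced vector.

```python
PREV = 0  # code reserved for the previous frame's reproduced vector


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(x, vectors, p):
    """Assumed form of equation (2); the values sum to 1."""
    d = [sq_dist(x, v) for v in vectors]
    if 0.0 in d:  # exact match: 1 for that vector, 0 for the others
        hit = d.index(0.0)
        return [1.0 if i == hit else 0.0 for i in range(len(d))]
    e = 1.0 / (p - 1.0)
    return [1.0 / sum((di / dj) ** e for dj in d) for di in d]


def reconstruct(u, vectors, p):
    """Assumed form of equation (3): combination weighted by u**p."""
    w = [ui ** p for ui in u]
    total = sum(w)
    return [sum(wi * v[k] for wi, v in zip(w, vectors)) / total
            for k in range(len(vectors[0]))]


def encode_frame(x, codebook, prev, p=1.5):
    """Return (codes, memberships, reproduced vector) for one frame."""
    pool = dict(codebook)            # codebook codes are assumed to be >= 1
    if prev is not None:
        pool[PREV] = prev            # previous reproduced vector joins the pool
    nearest = min(pool, key=lambda c: sq_dist(x, pool[c]))
    chosen, best = [nearest], sq_dist(x, pool[nearest])   # best starts at d_min
    remaining = [c for c in pool if c != nearest]
    while remaining:
        trials = []
        for c in remaining:          # try adding each remaining candidate
            vecs = [pool[k] for k in chosen + [c]]
            err = sq_dist(x, reconstruct(memberships(x, vecs, p), vecs, p))
            trials.append((err, c))
        err, c = min(trials)
        if err >= best:              # no candidate lowers the distortion
            break
        chosen.append(c)
        remaining.remove(c)
        best = err
    vecs = [pool[k] for k in chosen]
    u = memberships(x, vecs, p)
    return chosen, u, reconstruct(u, vecs, p)   # local inverse quantization


def decode_frame(codes, u, codebook, prev, p=1.5):
    """Receiver side: same reconstruction from the codes and memberships."""
    vecs = [prev if c == PREV else codebook[c] for c in codes]
    return reconstruct(u, vecs, p)
```

In use, the encoder keeps the third return value of `encode_frame` as `prev` for the next frame, and the decoder keeps the output of `decode_frame` in the same way. Both sides then hold the same reproduced vector without any extra transmitted data, which is the point of the local inverse quantizer. The final distortion never exceeds that of ordinary vector quantization on the same codebook, because the search starts from the nearest vector and only accepts additions that lower the error.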

[Effects of the Invention]

According to the present invention, quantization distortion can always be made smaller than with conventional fuzzy vector quantization, particularly in portions where the signal changes gradually, so higher-quality speech can be transmitted with the same amount of information. For the same quality, the amount of information can be reduced. Furthermore, applying the invention to fuzzy vector quantization that uses a codebook with neighboring vectors registered in advance allows a further reduction in the amount of information.

Although speech is used as the example throughout this description, the invention can of course be applied to any information with a similar structure.

[Brief Description of the Drawings]

FIG. 1 is a block diagram illustrating the system configuration of one embodiment of the present invention, FIG. 2 is a diagram illustrating the analysis unit, FIG. 3 is a diagram illustrating the fuzzy vector quantization unit, FIG. 4 is a diagram illustrating the fuzzy vector inverse quantization unit, and FIG. 5 is a diagram illustrating the synthesis unit.

Explanation of reference numerals

101: input speech; 103, 117: buffer memory; 104: analysis unit; 106, 115: spectrum information; 107, 107': pitch information; 108: fuzzy vector quantization unit; 109, 109': vector index; 110, 110': membership function; 114: fuzzy vector inverse quantization unit; 116: synthesis unit; 120: output speech.

Claims

[Claims]

1. A vector quantization method in which the encoding side and the decoding side have the same codebook; on the encoding side, an input vector is compared with the code vectors stored in the codebook, and the codes and/or degrees of membership of the code vectors are encoded and transmitted or stored; and on the decoding side, the transmitted or stored codes and/or degrees of membership are decoded and the input vector is reconstructed using the codebook; characterized in that the quantization result of an input vector that precedes the input vector in time or space is used to quantize the input vector.

2. The vector quantization method according to claim 1, characterized by having, on the encoding side, inverse quantization means for obtaining the same vector as the reproduction vector obtained on the decoding side, storage means for holding the reproduction vector obtained by the inverse quantization means for at least one processing unit, and means for collating the reproduction vector stored in the storage means.

3. The vector quantization method according to claim 1 or 2, characterized in that the reproduction vector and the code vectors in the codebook are evaluated according to a predetermined evaluation criterion and are used selectively based on the evaluation result.

4. The vector quantization method according to claim 1 or 2, characterized in that, among the reproduction vector and the code vectors in the codebook, the code vector closest to the input vector is used.

5. The vector quantization method according to any one of claims 1 to 4, characterized by having a local decoder on the encoding side, means for evaluating the error between the input signal and the decoded signal, and means for correcting the codes and/or degrees of membership based on the result of the evaluation means.

6. The vector quantization method according to any one of claims 1 to 5, characterized by having the codebook, a second codebook, and means for evaluating the error between the reproduction vector and the input vector or between the decoded signal and the input signal, wherein the codebook and the second codebook are used selectively based on the result of the evaluation means.

7. A vector quantization device in which the function of using the reproduction vector is implemented in hardware and/or software.