JPH06324697A


Info

Publication number: JPH06324697A
Application number: JP5127104A
Authority: JP
Inventors: Ho-Sun Chung; 鎬宣鄭; Jong-Un Park; 政雲朴
Original assignee: Goldstar Electron Co Ltd
Current assignee: SK Hynix Inc
Priority date: 1992-05-30
Filing date: 1993-05-28
Publication date: 1994-11-25
Also published as: KR930023908A; KR950003390B1; DE4317991A1

Abstract

PURPOSE: To obtain good recognition results in Korean monosyllable recognition by providing a speech recognition system with a speech input means, a speech analyzer, an interface means, a main computer connected to the interface means, an input/output storage medium, and a data output medium. CONSTITUTION: The system comprises a microphone 10 for inputting speech, an analog amplifier 20 for amplifying the speech from the microphone 10, a speech analyzer 30 for analyzing the speech signal from the amplifier 20, and an interface board 40 for interfacing with a PC. It further comprises a hard disk drive (HDD) and floppy disk drive (FDD) 50 for exchanging data with the computer, a main computer 60 connected to the board 40 and to the HDD and FDD 50, a keyboard 70 serving as the input device of the computer 60, and a monitor 80 serving as its output device. The computer 60 thus performs speech recognition using an IDMLP neural network.

Description

Translated from Japanese

[Detailed Description of the Invention]

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition system, and more particularly to a Korean monosyllable speech recognition system.

[0002]

[Prior Art] In extracting features for speech recognition, it is very difficult to find intrinsic features that distinguish one speech sound from another, because of the speaker's speaking rate and habits, differences in the environment at the time of speaking, the speaker's emotional state, the existence of dialects, and so on. Moreover, because sounds combine in various ways, even the same phoneme shows phonetic features altered by the influence of the preceding and following phonemes. These factors have made it difficult to develop algorithms that extract the intrinsic features of speech, and difficult to represent and integrate the knowledge obtained through such algorithms.

[0003] Much research has been carried out to solve these problems with existing methods. Notable among them are formant analysis, which separates voiced sounds using the peaks that appear in the frequency components of speech; the DTW (Dynamic Time Warping) method, which uses dynamic programming to reduce the time distortion between utterances of a word and then selects the closest utterance as the recognition result; and the HMM method, which represents the speech signal as an HMM (Hidden Markov Model) and uses it for speech recognition.

[0004] However, most speech recognition systems implemented to date with these methods have required a great deal of computation to recognize naturally spoken human speech and to accommodate the wide variation in speech characteristics. This raises problems of practicality and validity and makes real-time speech recognition difficult. Neural network models and fuzzy logic have been proposed as ways to solve the general problems of pattern recognition tasks such as speech recognition.

[0005] Unlike a von Neumann computer, a neural network model can learn appropriate rules for solving a given problem from data that are ambiguous, incomplete, or mutually conflicting. In addition, parallel processing by a large number of computing elements (neurons) can be expected, which shows promise for fields such as speech recognition where parallel processing is essential. The advantages of the neural network model are as follows.

[0006] First, it is highly adaptive. Human speech takes many forms depending on ambient noise, speaking rate, and speaker characteristics, and a neural network model can be trained on them efficiently. Second, its learning process is well founded. It is very difficult to extract abstract characteristics from diverse speech data and cast them as an algorithm, but a neural network model can extract and learn the features by itself through learning from examples.

[0007] Third, it can process in parallel. In a neural network model, a large number of basic elements perform their operations in parallel to obtain a result, so the enormous time required for learning can be handled through parallel processing. Whereas existing methods fix the reference pattern of a particular pattern in advance, or program each of the many rules present in speech data one by one, a system using a neural network finds and learns the characteristics of the information presented to it by itself. It can therefore classify patterns without each characteristic of the variation being specified individually, and it also performs well on deformed patterns. A representative neural network model introduced into speech recognition to improve recognition performance is the TDNN (Time Delay Neural Network). The TDNN shows good recognition performance for phoneme-level speech recognition, and experiments in which sub-networks corresponding to phonological groups were built as modules showed that the range of recognition targets could be extended without lowering the high recognition rate of the sub-networks.

[0008] To apply and fully exploit these characteristics of neural networks in real problems, a hardware implementation is indispensable. Implementing a neural network in hardware, however, is subject to many constraints that do not arise in software simulation on a computer. The IDMLP neural network was therefore proposed for chip implementation. Further, even when everyone speaks the same language, the frequency characteristics differ from person to person; fuzzy logic was introduced to cope with this diversity of speech data.

[0009]

[Problem to Be Solved by the Invention] An object of the present invention is to provide a Korean monosyllable speech recognition system that uses a neural network and a fuzzy pattern matching algorithm.

[0010]

[Means for Solving the Problem] To achieve the above object, the speech recognition system of the present invention comprises: speech input means for inputting a speech signal; a speech analyzer for analyzing the speech signal from the speech input means; interface means for transmitting the signal from the speech analyzer to a main computer; the main computer, connected to the interface means; an input/output storage medium connected to the main computer for inputting and outputting data; an input medium connected to the main computer; and a data output medium connected to the main computer.

[0011]

[Operation] The recognition rate is increased by performing speech recognition on the computer with an IDMLP neural network.

[0012]

[Embodiment] The present invention is described in detail below with reference to the accompanying drawings. FIG. 1 is a block diagram of the hardware configuration for speech analysis in the speech recognition system according to the present invention. As shown in FIG. 1, the system consists of a microphone 10 for inputting speech, an analog amplifier 20 for amplifying the speech from the microphone 10, a speech analyzer 30 for analyzing the speech signal from the analog amplifier 20, an interface board 40 for interfacing with a PC, a hard disk drive and floppy disk drive 50 for exchanging data with the computer, a main computer 60 connected to the interface board 40 and to the hard disk drive and floppy disk drive 50, a keyboard 70 serving as the input device of the main computer 60, and a monitor 80 serving as the output device of the main computer 60.

[0013] FIG. 2 shows the speech input section. The level of the input speech can be adjusted with a variable resistor. Because an ordinary human voice is distributed up to 7 kHz, a TL072CP, whose frequency bandwidth is 10 kHz or more, was used as the differential amplifier 90. FIG. 3 is a block diagram of one embodiment of the speech analysis circuit, which is divided into two main parts. One is the interface to the PC, including the address decoder; the other is the part that performs the speech analysis.

[0014] A 74LS688 is used as the address decoder so that the 8255PPI on the interface board and the μPD7763 chip are each selected separately. Because the 74LS688 used as the address decoder is an open-collector device, a pull-up resistor is connected between pin 19, the output of the 74LS688, and the power supply. The 8255PPI was used for the interface to the PC, which made the interface board between the μPD7763 and the PC very simple to design and build. The 8255PPI and the μPD7763 are controlled through the data bus, the address bus, and several signals on the IBM PC AT slot; this is described in detail under the software configuration.

[0015] The reset (RESET) input of the 8255PPI is connected directly to the reset pin of the IBM PC I/O slot, so that the chip is reset automatically when the computer boots. The speech analysis section mixes digital and analog signals, so particular care must be taken with noise. In the present invention, capacitors of about 0.1 μF were used to remove this noise.

[0016] Control signals are input to the μPD7763 through the data bus and the address bus. A 4 MHz clock is required to synchronize the operation of the internal circuits; this was provided by a 4 MHz crystal module. The μPD7763 must always be reset before its input mode is set. Solving this in hardware would make the overall circuit too large, so the present invention solves it in software using the 8255PPI. That is, the value corresponding to the reset signal is sent over the data bus and passed through an output port of the 8255PPI to the reset pin of the μPD7763.

[0017] When the frame (FRAME) pin of the speech analyzer outputs "1", the analysis of one frame is complete, and the computer then only has to read the values of the 16-channel filter bank stored in the stack inside the μPD7763. By using the 8255PPI for this interface, the preprocessing stage, which takes the most time in speech recognition, is implemented in hardware, and a real-time speech recognition system could be built overall.

[0018] That is, in an ordinary software simulation a great deal of time is spent analyzing the speech signal, but in the present invention the outputs of the 16-channel band-pass filter bank are obtained in hardware. This eliminates the enormous time the step would otherwise require and makes it possible to build a system that recognizes speech in real time. FIG. 4 shows the frequency table of the 16 band-pass filters inside the μPD7763. To amplify the microphone output, a TL072CP was used, whose frequency bandwidth of 10 kHz or more covers the frequencies of an ordinary human voice. The output of this analog amplifier stage is the input of the μPD7763, and the output of the speech analyzer μPD7763 is 8-bit digital data.

[0019] FIG. 5 shows, for the vowel

[0020]

[External character 1]

[0021] the one-frame output of the speech analyzer. Formants, which are characteristic of ordinary vowels, were observed, which indicates that the speech analysis system designed and built in the present invention has no defect as a preprocessing board for speech recognition. The software consists of two parts. The first is a control program that lets the user operate the previously designed and built speech analysis board easily. This control program sets the operating modes of the 8255PPI and the μPD7763 and specifies the input and output of data.

[0022] The second part of the software detects the speech interval in the data that the speech analysis board has converted to the frequency domain, normalizes the utterance lengths, which vary from one utterance to another, and finally binarizes the data used as input to the IDMLP neural network. Setting the 8255PPI operating mode means specifying, in the basic operating mode, which ports perform input and which perform output. In the present invention ports "A" and "C" are defined as output ports and port "B" as the input port. This operating mode is set as follows.

[0023]

    outportb(0x307, 0x82); /* A & C ports output, B port input */

Because the hardware is configured so that the 8255PPI is used to initialize the speech analysis chip and to find out whether an analysis has finished, the program that controls the 8255PPI is simple. A program to control the μPD7763 is needed next. The point to watch here is that the time needed to analyze the data and to read out the analyzed data must be calculated carefully so that the whole system remains stable.
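The value 0x82 written to the control register above follows from the standard 8255 mode-set control word layout. The following is a minimal sketch of how that byte is composed, assuming mode 0 for both port groups; the function and enum names are illustrative and do not appear in the patent.

```c
#include <stdint.h>

/* Direction of one 8255 port in mode 0 (basic input/output). */
enum ppi_dir { PPI_OUT = 0, PPI_IN = 1 };

/*
 * Build an 8255 mode-set control word with groups A and B both in mode 0.
 * Bit layout: D7 = 1 (mode-set flag), D6-D5 = group A mode (00),
 * D4 = port A, D3 = port C upper, D2 = group B mode (0),
 * D1 = port B, D0 = port C lower; a direction bit of 1 means input.
 */
uint8_t ppi_mode0_control(enum ppi_dir a, enum ppi_dir b,
                          enum ppi_dir c_upper, enum ppi_dir c_lower)
{
    uint8_t word = 0x80;                     /* D7: mode-set flag */
    word |= (uint8_t)((unsigned)a << 4);       /* D4: port A direction */
    word |= (uint8_t)((unsigned)c_upper << 3); /* D3: port C upper half */
    word |= (uint8_t)((unsigned)b << 1);       /* D1: port B direction */
    word |= (uint8_t)((unsigned)c_lower << 0); /* D0: port C lower half */
    return word;
}
```

With ports A and C as outputs and port B as input, `ppi_mode0_control(PPI_OUT, PPI_IN, PPI_OUT, PPI_OUT)` gives 0x82, the value used in the text.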

[0024] First, the speech analysis chip must be initialized. This is done with the 8255PPI as follows.

    outportb(0x00); /* μPD7763 reset signal */
    delay(1);       /* reset signal duration */
    outportb(0xff); /* release reset signal */

Because the reset signal initializes the system, it must be applied for at least four clocks. This is why the delay(int) function above is needed.
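The reset pulse described here can be sketched without a DOS compiler by passing the port-write and delay routines in as callbacks. This is only an illustration: the patent's snippet does not state which 8255 port address drives the reset pin, so `reset_port` is left as a parameter, and the hook types, `trace_write`, and `no_wait` are hypothetical stand-ins for `outportb` and `delay`.

```c
#include <stdint.h>

/* Hypothetical hooks standing in for outportb() and delay(). */
typedef void (*port_write_fn)(uint16_t port, uint8_t value);
typedef void (*delay_ms_fn)(unsigned ms);

/* Pulse the analyzer reset line through one 8255 output port. */
void analyzer_reset(uint16_t reset_port, port_write_fn write_port,
                    delay_ms_fn wait_ms)
{
    write_port(reset_port, 0x00);  /* drive the reset value */
    wait_ms(1);                    /* hold it for the reset duration */
    write_port(reset_port, 0xFF);  /* release the reset */
}

/* Minimum reset hold time in nanoseconds: four periods of the chip clock. */
uint32_t min_reset_ns(uint32_t clock_hz)
{
    return (uint32_t)(4ULL * 1000000000ULL / clock_hz);
}

/* Dry-run hooks that only record the values that would be written. */
uint8_t trace[8];
int trace_len = 0;

void trace_write(uint16_t port, uint8_t value)
{
    (void)port;
    if (trace_len < 8)
        trace[trace_len++] = value;
}

void no_wait(unsigned ms) { (void)ms; }
```

At the 4 MHz clock given in [0016], four clocks last 1 μs, so a 1 ms delay satisfies the four-clock requirement with a wide margin.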

[0025] The input/output control section of the μPD7763 controls the exchange of data with the outside over the data bus (DB0 to DB7). FIG. 6 shows the operation for each state of the four control signals (CS, WR, A0, A1). The operating mode must be set within 378 μsec after the reset is released. The operating mode of the μPD7763 is set by writing data over the data bus into the COMMAND/STATUS register inside the chip. The operating-mode items that can be controlled from the computer are as follows.

[0026]
1. Analysis frame period.
2. Preamplifier (PRE-AMP) gain.
3. Equalizer on/off.
4. Cutoff frequency of the low-pass filter.
These four items are set by the computer writing data into the COMMAND/STATUS register over the data bus, which can be done simply in C as follows.

[0027]

    outportb(0x304, 0x4c); /* 0 dB, 16 ms */
    outportb(0x304, 0x02); /* 25 Hz, EQ off */

FIG. 7 is the overall flowchart from speech input to reading out the analysis results. FIG. 8 shows the process of storing the output of the speech analyzer in the computer's memory through the 8255PPI, detecting the speech interval, and obtaining the binarized data used as input to the IDMLP neural network.

[0028] In the present invention the length of one frame was set to 16 ms. The speech interval must be detected in the analyzed data; in the present invention a frame is taken to belong to the speech interval when its energy is greater than a predetermined threshold. When one person pronounces the same word several times, the utterances are almost never the same length, so time-axis normalization is required. When words corresponding to Korean monosyllables were pronounced repeatedly, the utterances lasted from as few as 8 frames to as many as 26 frames, so the reference was set to 15 frames, time-axis normalization was performed, and the resulting data were binarized.
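The threshold test for the speech interval can be sketched as follows. The patent does not say how the energy of a frame is computed, so taking it as the sum of the 16 filter-bank outputs is an assumption, as is treating everything between the first and last frame above the threshold as speech. All names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_CHANNELS 16

/* Frame energy, assumed here to be the sum of the 16 filter-bank outputs. */
unsigned frame_energy(const uint8_t *frame)
{
    unsigned sum = 0;
    for (size_t ch = 0; ch < NUM_CHANNELS; ++ch)
        sum += frame[ch];
    return sum;
}

/*
 * Find the first and last frames whose energy exceeds the threshold.
 * `frames` holds num_frames rows of NUM_CHANNELS bytes, row-major.
 * Returns the interval length in frames, or 0 if no frame qualifies
 * (in which case *first and *last are left unchanged).
 */
size_t detect_speech_interval(const uint8_t *frames, size_t num_frames,
                              unsigned threshold, size_t *first, size_t *last)
{
    int found = 0;
    size_t lo = 0, hi = 0;
    for (size_t i = 0; i < num_frames; ++i) {
        if (frame_energy(frames + i * NUM_CHANNELS) > threshold) {
            if (!found)
                lo = i;
            hi = i;
            found = 1;
        }
    }
    if (!found)
        return 0;
    *first = lo;
    *last = hi;
    return hi - lo + 1;
}
```

With 16 ms frames, the 8 to 26 frame range reported above corresponds to utterances of roughly 128 ms to 416 ms.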

[0029] In the present invention, everything from the microphone input to the outputs of the 16-channel filter bank is implemented in hardware, which reduces the time needed to collect the input data for the IDMLP neural network. Extracting the input data means obtaining the final normalized, binarized data described above; the procedure is as follows.

[0030]
1. Receive the speech signal with the microphone.
2. Amplify it with the TL072CP to a level suitable for the input of the speech analyzer μPD7763.
3. Read the analysis results from the speech analyzer.
4. Detect the speech interval using the predetermined threshold.

[0031]
5. Normalize on the time axis to match the reference number of frames.
6. Compare the output of each filter with the output of the adjacent filter and binarize their relative magnitude.
FIG. 9 shows the data immediately after the speech interval is detected. FIG. 10 shows the frequency spectrum of the speech signal.
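Step 5 can be illustrated with a simple nearest-frame resampling to the 15-frame reference. The patent does not spell out its normalization rule beyond the 15-frame target, so the rounded linear index mapping below is a swapped-in technique chosen for illustration, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_CHANNELS 16
#define REF_FRAMES   15

/*
 * Map reference frame r (0..REF_FRAMES-1) to a source frame index using a
 * rounded linear mapping, so the first and last frames map to each other.
 */
size_t source_frame_for(size_t r, size_t src_frames)
{
    if (src_frames <= 1)
        return 0;
    return (r * (src_frames - 1) + (REF_FRAMES - 1) / 2) / (REF_FRAMES - 1);
}

/*
 * Resample an utterance of src_frames frames to exactly REF_FRAMES frames.
 * Both buffers are row-major with NUM_CHANNELS bytes per frame.
 */
void normalize_time_axis(const uint8_t *src, size_t src_frames, uint8_t *dst)
{
    if (src_frames == 0)
        return;                       /* nothing to copy */
    for (size_t r = 0; r < REF_FRAMES; ++r) {
        size_t s = source_frame_for(r, src_frames);
        memcpy(dst + r * NUM_CHANNELS, src + s * NUM_CHANNELS, NUM_CHANNELS);
    }
}
```

An 8-frame utterance is stretched by repeating frames, and a 26-frame utterance is shortened by skipping frames; a 15-frame utterance is copied unchanged.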

[0032] FIG. 11 shows the binarized frequency spectrum. The energy of each filter read from the speech analyzer is compared with the output of the filter to its left, as in the following equation; the output value for that filter is 1 if it is larger and 0 if it is smaller.

[0033]

[Equation 1]
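The comparison described in [0032] can be sketched in C as follows. Sixteen filter outputs yield fifteen comparisons, which matches the 15 bits per frame stated in the text. The bit ordering and the handling of equal neighbors (treated as 0 here) are assumptions, since the equation itself is not reproduced in this text.

```c
#include <stdint.h>

#define NUM_CHANNELS 16

/*
 * Binarize one frame of filter-bank energies.
 * Bit (ch - 1) of the result is 1 when channel ch is larger than the
 * channel to its left (ch - 1), and 0 otherwise. Channel 0 has no left
 * neighbor, so the result carries 15 bits.
 */
uint16_t binarize_frame(const uint8_t *energy)
{
    uint16_t bits = 0;
    for (int ch = 1; ch < NUM_CHANNELS; ++ch) {
        if (energy[ch] > energy[ch - 1])
            bits |= (uint16_t)(1u << (ch - 1));
    }
    return bits;
}
```

Because only the relative size of neighboring channels is kept, the pattern does not change when the overall signal level is scaled, as long as the ordering of neighboring channels is preserved.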

[0034] FIG. 12 shows the output obtained with this method. After the above processing, one frame corresponds to 15 input bits. In the present invention, as a preliminary step toward Korean monosyllable speech recognition, the applicability of fuzzy logic and of the IDMLP neural network described above was tested on recognition of the spoken digits "0" to "9". Recognition experiments with the IDMLP neural network were carried out both for the network trained on all of the training data and for the network trained on data fuzzified into a single pattern, to explore the possibility of combining neural networks and fuzzy logic.

[0035] FIG. 13 shows the process of binarizing the fuzzified data again. The data for each digit from "0" to "9", each pronounced 10 times, were superimposed to fuzzify them, and the result was then binarized with a suitable threshold. FIG. 14 shows the result of binarizing the fuzzified data. Because the number of input nodes is fixed when the network is trained, the lengths of sounds pronounced with different durations must be normalized on the time axis. In the present invention, 15 frames were taken as the reference; when the input pattern had more frames than the reference, time-axis normalization was performed by filling the reference frames at suitable intervals.
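The superimpose-then-threshold step of FIG. 13 can be sketched as a per-position vote over the repeated utterances. The patent only says the data were superimposed and binarized with a suitable threshold, so the vote count `min_votes`, the one-byte-per-bit layout, and the function name are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Superimpose num_patterns binary patterns and binarize the result.
 * `patterns` is row-major: pattern p occupies pattern_len bytes, each 0 or 1.
 * out[i] becomes 1 when at least min_votes patterns have a 1 at position i.
 */
void fuzzify_and_binarize(const uint8_t *patterns, size_t num_patterns,
                          size_t pattern_len, unsigned min_votes,
                          uint8_t *out)
{
    for (size_t i = 0; i < pattern_len; ++i) {
        unsigned votes = 0;              /* superimposed value at position i */
        for (size_t p = 0; p < num_patterns; ++p)
            votes += patterns[p * pattern_len + i] ? 1u : 0u;
        out[i] = (votes >= min_votes) ? 1 : 0;
    }
}
```

For the digit experiment, each digit would have 10 patterns of 15 frames by 15 bits (225 positions); the threshold actually used in the patent is not stated.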

[0036] In the present invention, 200 data items pronounced by one speaker were used as training data for the IDMLP neural network. To judge how well the designed speech recognition system adapts to the diversity of speech data, recognition experiments were carried out on 300 data items collected in three time periods: morning, daytime, and night. After training, the training data were recognized at 100% for both the binary data and the fuzzy data, and the test data were recognized at a high rate of 94% or more both when the network was trained on binary data and when it was trained on fuzzy data. The results of each experiment are shown in FIG. 15.

[0037] When the IDMLP neural network was trained on binarized data, the recognition rate was 94% in the morning experiment, 99% in the daytime, and 96% at night, for an overall recognition rate of 96.3%. When the IDMLP neural network was trained on fuzzified data, the recognition rate was 97% in the morning experiment, 99% in the daytime, and 98% at night.
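The overall figure of 96.3% is consistent with pooling the three sessions. A small sketch of that arithmetic follows; it assumes the 300 test items were split evenly, 100 per time period, which the patent does not state explicitly. The function name is illustrative.

```c
#include <stddef.h>

/*
 * Pooled recognition rate in percent over n sessions.
 * correct[i] and total[i] are the counts for session i.
 * Returns 0.0 when there are no test items.
 */
double recognition_rate_percent(const unsigned *correct,
                                const unsigned *total, size_t n)
{
    unsigned long sum_correct = 0, sum_total = 0;
    for (size_t i = 0; i < n; ++i) {
        sum_correct += correct[i];
        sum_total += total[i];
    }
    if (sum_total == 0)
        return 0.0;
    return 100.0 * (double)sum_correct / (double)sum_total;
}
```

With 94, 99, and 96 correct out of 100 each, the pooled rate is 289/300, or about 96.3%. Under the same assumption the fuzzified case (97, 99, 98) works out to 98.0%, a figure derived here rather than stated in the source.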

[0038] As the tables in FIGS. 15 to 20 show, the daytime experiment had the best recognition rate of the three time periods, and the digit with the worst recognition rate was "6". Training of the IDMLP neural network finished with a single layer in both cases, training on all of the binarized data and training on the fuzzy data. The structural properties of the IDMLP neural network were therefore not tested, but training on fuzzified data made no large difference to the recognition results; if anything, the recognition rate for spoken digits improved slightly.

[0039] To apply the IDMLP neural network to Korean monosyllable recognition, the present invention takes the vowels

[0040]

[External character 2]

[0041] as five module types. A type-classification neural network that sorts the input into these five types is constructed first, and the overall network is built from six modules so that each classified type is finally recognized by its own sub-network. Each module is an IDMLP neural network. FIG. 21 shows the configuration of the modular IDMLP neural network. The type-classification stage classifies the input speech into five groups on the basis of the five vowels mentioned above. As shown in FIG. 22, the syllables to be classified contain the vowels

[0042]

[External character 3]

[0043] and are 70 Korean C-V monosyllables. The training data for the type-classification neural network, which classifies the input speech into five groups, were obtained by extracting only the vowel portion of the input speech. Because the data used are consonant-vowel (C-V) monosyllables, extracting the vowel V is simple. Since the vowel is in the latter part, extracting only a few frames at the end of the whole utterance would suffice as data for type classification. That algorithm becomes unsuitable, however, when the recognition vocabulary is gradually extended, that is, when C-V-C monosyllables are considered.
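The six-module arrangement of FIG. 21 (one type classifier followed by five sub-networks) can be sketched as a two-stage dispatch. The function-pointer interface, the label values, and the choice to pass the vowel portion to the classifier and the whole utterance to the sub-network are assumptions made for illustration; `demo_type` and `demo_sub` are stand-ins, not trained IDMLP networks.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_TYPES 5   /* one sub-network per vowel group */

/* A module maps a binarized input pattern to an integer label. */
typedef int (*classify_fn)(const uint8_t *input, size_t len);

struct modular_net {
    classify_fn type_classifier;         /* returns vowel group 0..4 */
    classify_fn sub_network[NUM_TYPES];  /* returns a syllable label */
};

/*
 * Two-stage recognition: pick the vowel group first, then let the
 * sub-network for that group produce the final label. Returns -1 when
 * the group is out of range or no sub-network is installed for it.
 */
int modular_recognize(const struct modular_net *net,
                      const uint8_t *vowel_part, size_t vowel_len,
                      const uint8_t *utterance, size_t utterance_len)
{
    int type = net->type_classifier(vowel_part, vowel_len);
    if (type < 0 || type >= NUM_TYPES || net->sub_network[type] == NULL)
        return -1;
    return net->sub_network[type](utterance, utterance_len);
}

/* Stand-in modules for dry runs only. */
int demo_type(const uint8_t *in, size_t len)
{
    return len > 0 ? (int)in[0] : -1;
}

int demo_sub(const uint8_t *in, size_t len)
{
    return len > 0 ? 100 + (int)in[len - 1] : -1;
}
```

Keeping the classifier separate from the sub-networks mirrors the text's point that each module is trained on its own, so a sub-network can be added or retrained without touching the others.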

[0044] Therefore, in the present invention only the middle portion of the whole utterance was extracted and used as training data for the type-classification neural network. Type-classification experiments with the network trained on data extracted in this way were carried out, as in the digit recognition experiments, in three time periods: morning, daytime, and night. The training data were the vowel portions extracted from 350 data items, obtained by one speaker pronouncing the 70 syllables containing the five vowels five times each. For the test data, the module analysis test used 420 data items per module and 2500 data items in total.

[0045] FIGS. 23 and 24 show the classification rates for consonant-vowel (C-V) monosyllables. As shown in FIGS. 23 and 24, the average type-classification success rate was 98.4%. The type-classification rate for each module was as follows.

[0046]

[External character 4]

[0047] These were the type-classification success rates obtained. Although the network was trained on data extracted from C-V monosyllables, a type-classification experiment on consonant-vowel-consonant (C-V-C) monosyllable data also gave a type-classification success rate of 90% or more.

[0048]

[Effect of the Invention] The speech recognition system of the present invention gives good results in Korean monosyllable recognition, and these recognition experiments indicate that recognition of speech longer than a monosyllable is also possible. A new speech recognition system can also be realized.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of the hardware configuration for speech analysis according to the present invention.

FIG. 2 shows the analog circuit of the speech input section according to the present invention.

FIG. 3 shows the circuit for speech analysis according to the present invention.

FIG. 4 shows the frequency table of the 16 band-pass filters inside the speech analysis circuit.

FIG. 5 shows the one-frame output of the speech analyzer for the vowel [External character 5].

FIG. 6 shows the control signals and operation of the speech analyzer.

【図７】音声分析器の音声分析のフローチャートであ
る。FIG. 7 is a flow chart of voice analysis of a voice analyzer.

【図８】音声分析デ−タの抽出過程を示すフローチャー
トである。FIG. 8 is a flowchart showing a process of extracting voice analysis data.

FIG. 9 is a diagram showing the data immediately after a speech section is detected.

FIG. 10 is a diagram showing the frequency spectrum of a speech signal.

FIG. 11 is a diagram showing a binarized frequency spectrum.

FIG. 12 is a diagram showing a binarized speech signal.

FIG. 13 is a diagram showing the process of binarizing fuzzified data again.

FIG. 14 is a diagram showing the result of binarizing fuzzified data.

FIG. 15 is a diagram showing results of recognition experiments on the binarized data and the fuzzified data for morning, day, and night.

FIG. 16 is a diagram showing results of recognition experiments on the binarized data and the fuzzified data for morning, day, and night.

FIG. 17 is a diagram showing results of recognition experiments on the binarized data and the fuzzified data for morning, day, and night.

FIG. 18 is a diagram showing results of recognition experiments on the binarized data and the fuzzified data for morning, day, and night.

FIG. 19 is a diagram showing results of recognition experiments on the binarized data and the fuzzified data for morning, day, and night.

FIG. 20 is a diagram showing results of recognition experiments on the binarized data and the fuzzified data for morning, day, and night.

FIG. 21 is a diagram showing the configuration of the modular IDMLP neural network.

FIG. 22 is a diagram showing the syllables to be classified among the C-V short syllables.

FIG. 23 is a diagram showing the classification rate for each module.

FIG. 24 is a diagram showing the classification rate for each module.

[Description of Reference Numerals]

10 Microphone
20 Analog amplifier
30 Speech analyzer
40 Interface board
50 Hard disk drive and floppy disk drive
60 Main computer
70 Keyboard
80 Monitor
90 Differential amplifier

Claims

[Claims]

1. A speech recognition system comprising: speech input means for inputting a speech signal; speech analysis means for dividing the signal from the speech input means into predetermined frequency bands and expressing it as the energy of the signal in each frequency band; and a main computer for receiving the signal from the speech analysis means, comparing the magnitudes of the frequency bands of the signal to binarize the signal, and outputting to the outside a speech signal corresponding to the binarized data.

2. The speech recognition system according to claim 1, further comprising interface means, provided between the speech analysis means and the main computer, for interfacing the two means.

3. A speech recognition method for a speech recognition system comprising: speech input means for inputting a speech signal; speech analysis means for dividing the signal from the speech input means into predetermined frequency bands and expressing it as the energy of the signal in each frequency band; and a main computer for receiving the signal from the speech analysis means, comparing the magnitudes of the frequency bands of the signal to binarize the signal, and outputting to the outside a speech signal corresponding to the binarized data, the method comprising: a first step of inputting speech analysis data to the main computer; a second step of detecting a speech section using the data from the first step; a third step of performing time-axis normalization using the data from the second step; a fourth step of binarizing the time-axis-normalized data; and a fifth step of storing the result of the fourth step in a memory of the computer.
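The five steps recited in claim 3 can be illustrated with a minimal Python sketch. It is not the implementation described in the patent. The claims do not fix the rules used here, so the following are assumptions made only for illustration: a speech section is taken as the span between the first and last frames whose total energy exceeds a threshold, time-axis normalization is done by picking frames at evenly spaced indices, and a band is set to 1 when its energy is at least the mean energy of its frame. All function names and parameters are hypothetical.

```python
from typing import Dict, List

Frame = List[float]  # one frame: the energy of each filter-bank band


def detect_speech_section(frames: List[Frame], threshold: float) -> List[Frame]:
    """Step 2 (assumed rule): keep the span from the first to the last
    frame whose total energy exceeds the threshold."""
    active = [i for i, frame in enumerate(frames) if sum(frame) > threshold]
    if not active:
        return []
    return frames[active[0]:active[-1] + 1]


def normalize_time_axis(frames: List[Frame], target_len: int) -> List[Frame]:
    """Step 3 (assumed rule): resample to target_len frames by taking
    frames at evenly spaced indices."""
    if not frames:
        return []
    n = len(frames)
    return [frames[(i * n) // target_len] for i in range(target_len)]


def binarize(frames: List[Frame]) -> List[List[int]]:
    """Step 4 (assumed rule): 1 where a band's energy is at least the
    mean energy of its frame, otherwise 0."""
    result = []
    for frame in frames:
        mean = sum(frame) / len(frame)
        result.append([1 if energy >= mean else 0 for energy in frame])
    return result


def extract_pattern(frames: List[Frame], threshold: float,
                    target_len: int) -> List[List[int]]:
    """Steps 2 to 4 applied to analyzer output already read in (step 1)."""
    section = detect_speech_section(frames, threshold)
    normalized = normalize_time_axis(section, target_len)
    return binarize(normalized)


def store_pattern(memory: Dict[str, List[List[int]]], label: str,
                  pattern: List[List[int]]) -> None:
    """Step 5: keep the binarized pattern in memory under a label."""
    memory[label] = pattern


# Hypothetical 4-band input; the analyzer in FIG. 4 uses 16 bands.
example_frames = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 5.0, 1.0, 1.0],
    [4.0, 4.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
memory: Dict[str, List[List[int]]] = {}
store_pattern(memory, "example", extract_pattern(example_frames, 0.5, 2))
```

In this sketch every utterance is reduced to a fixed-size binary matrix, which is the kind of fixed-length input a neural network classifier needs. The threshold, the target length, and the comparison rule would have to be taken from the description rather than from this example.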