JP2867695B2


Info

Publication number: JP2867695B2
Application number: JP2327062A
Authority: JP
Inventors: 淳野口
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1990-11-28
Filing date: 1990-11-28
Publication date: 1999-03-08
Anticipated expiration: 2014-03-08
Also published as: JPH04195100A

Description

[Detailed Description of the Invention]

[Field of Industrial Application]

The present invention relates to a continuous speech recognition device that recognizes continuously uttered speech in systems such as automatic interpretation systems and spoken question-answering (QA) systems.

[Prior Art]

One conventional method of recognizing continuously uttered speech according to a predetermined grammar is described in the paper "Speeding up DP Matching by Integrating Frame Synchronization, Beam Search, and Vector Quantization," IEICE Transactions D, Vol. J71-D, No. 9, pp. 1650-1659 (hereinafter Reference 1). It recognizes continuous speech by DP matching against word-unit standard patterns concatenated according to a finite-state automaton that represents a regular grammar. Another method of recognizing continuous speech according to a finite-state automaton uses hidden Markov models (HMMs), as described on page 29 of Seiichi Nakagawa, "Speech Recognition by Stochastic Models," edited by the Institute of Electronics, Information and Communication Engineers (hereinafter Reference 2).

The following describes the case, taken from Reference 1, in which continuous speech is recognized by frame-synchronous DP matching using a grammar expressed as a finite-state automaton. The HMM-based method of Reference 2 uses the same basic processing for recognizing continuous speech and can be treated in the same way. Although words are used as the recognition unit below, other units such as phonemes may be used instead.

The input speech pattern (input pattern) is expressed as a time series of features

$$A = a_1, a_2, \ldots, a_i, \ldots, a_I \qquad (1)$$

and the standard pattern of word $n$ as

$$B^n = b_1^n, b_2^n, \ldots, b_j^n, \ldots, b_{J_n}^n \qquad (2)$$

Let $d(n; i, j)$ be the distance between the input feature $a_i$ and the standard-pattern feature $b_j^n$. As word-level processing, the distance between the input and each word is obtained by solving the DP recurrence for the cumulative value $g$ given in equations (3) to (6). By computing the path value $L$ at the same time, the result of continuous speech recognition can be traced back.
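Equations (3) to (6) are referred to but not reproduced here, so the following Python sketch assumes a common asymmetric form of the frame-synchronous recurrence, $g(n;i,j) = d(n;i,j) + \min_{k \in \{0,1,2\}} g(n;i-1,j-k)$, with an entry cost of 0 (as equation (3) is said to give) and the path value $L$ copied from the chosen predecessor. The function name `frame_sync_dp`, the scalar features, and the absolute-difference distance are illustrative assumptions, not the patent's definitions.

```python
import math

INF = math.inf


def frame_sync_dp(A, B, dist):
    """Frame-synchronous DP matching of input frames A against one word template B.

    Returns one (g, L) pair per input frame i: the cumulative distance g(i, J)
    at the last template frame and the path value L(i, J), i.e. the input
    frame at which the best path ending there entered the word.
    Assumed recurrence (not the patent's exact equations):
        g(i, j) = d(i, j) + min_{k in {0, 1, 2}} g(i-1, j-k)
    """
    J = len(B)
    g_prev = [INF] * (J + 1)  # g(i-1, j); index 0 is a virtual "word entry" cell
    L_prev = [0] * (J + 1)    # L(i-1, j): start frame of the path reaching (i-1, j)
    ends = []
    for i, a in enumerate(A, start=1):
        # Assumed initial condition: entering the word at frame i costs 0.
        g_prev[0], L_prev[0] = 0.0, i
        g_cur = [INF] * (J + 1)
        L_cur = [0] * (J + 1)
        for j in range(1, J + 1):
            # k* = argmin over the predecessors (i-1, j-k), k in {0, 1, 2}
            k_star = min((k for k in (0, 1, 2) if j - k >= 0),
                         key=lambda k: g_prev[j - k])
            g_cur[j] = dist(a, B[j - 1]) + g_prev[j - k_star]
            L_cur[j] = L_prev[j - k_star]  # the start frame travels with the path
        ends.append((g_cur[J], L_cur[J]))
        g_prev, L_prev = g_cur, L_cur
    return ends
```

For example, `frame_sync_dp([1, 2, 3], [1, 2, 3], lambda x, y: abs(x - y))` gives a word distance of 0 at the third input frame with $L = 1$, meaning the best path entered the word at input frame 1.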

The distance to word $n$ at frame $i$ of the input pattern is obtained as $g(n; i, J_n)$, and the input frame at which the matched standard pattern begins is obtained as the path value $L(n; i, J_n)$. Equation (3) gives 0 as the initial cumulative value; continuous speech recognition becomes possible when, as sentence-level processing, the cumulative value of the immediately preceding word is supplied instead, according to the finite-state automaton, and the word-level recognition results are stored.
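The sentence-level step just described, in which the preceding word's cumulative value replaces the initial 0, can be sketched as a one-pass search over the automaton. This is a minimal illustration under an assumed local recurrence with $k \in \{0,1,2\}$; the arc format `(from_state, word, to_state)`, the name `one_pass`, and carrying the word sequence alongside each score (in place of a separate backtrace memory) are assumptions made for brevity.

```python
import math

INF = math.inf


def one_pass(A, arcs, templates, start, final, dist):
    """Frame-synchronous connected-word DP over a finite-state network.

    arcs      -- (from_state, word, to_state) triples (hypothetical format)
    templates -- word -> list of template frames
    Returns (best total distance, word sequence) over paths from `start`
    to `final` that consume every frame of A.
    """
    states = {s for p, _, q in arcs for s in (p, q)}
    # h[s] = (best cumulative value, words so far) on reaching state s
    h_prev = {s: (INF, ()) for s in states}
    h_prev[start] = (0.0, ())  # only `start` is active before the first frame
    g_prev = [[(INF, ())] * (len(templates[w]) + 1) for _, w, _ in arcs]
    for a in A:
        h_cur = {s: (INF, ()) for s in states}
        g_next = []
        for idx, (p, w, q) in enumerate(arcs):
            B = templates[w]
            g_old = g_prev[idx]
            g_old[0] = h_prev[p]  # preceding word's cumulative value as initial value
            g_new = [(INF, ())] * (len(B) + 1)
            for j in range(1, len(B) + 1):
                score, hist = min(g_old[j - k] for k in (0, 1, 2) if j - k >= 0)
                g_new[j] = (score + dist(a, B[j - 1]), hist)
            end_score, end_hist = g_new[-1]
            # store the word-level result with the score so the sentence can be read back
            h_cur[q] = min(h_cur[q], (end_score, end_hist + (w,)))
            g_next.append(g_new)
        g_prev, h_prev = g_next, h_cur
    return h_prev[final]
```

With templates a = [1, 1] and b = [5, 5] and arcs 0-a->1, 1-b->2, 1-a->2, the input [1, 1, 5, 5] decodes as ("a", "b") with total distance 0.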

A speech recognition method that can handle a context-free grammar with the same amount of processing as these finite-automaton methods is "A Method of Continuous Speech Recognition Using an Augmented Transition Network" (Kazunaga Yoshida and Takao Watanabe, 1989 IEICE Spring National Convention; hereinafter Reference 3). Speech recognition using an augmented transition network (hereinafter ATN) has strong capabilities for describing natural language:
1. a context-free grammar can be handled by calling subnetworks; and
2. registers, together with a mechanism for testing them, allow history-dependent processing, so that free word order, co-occurrence relations, and dependency relations can be handled.

A method in which the path taken up to a point in a speech recognition network restricts the subsequent paths is described in "A Spoken Dialogue System Based on a Network Model with ε-nodes: Weighting of ε-nodes According to the Arrival Path" (Tetsunori Kobayashi, Proceedings of the Acoustical Society of Japan, September-October 1985, 2-4-16; hereinafter Reference 4).

[Problems to Be Solved by the Invention]

However, although Reference 3 considers item 1 above (handling a context-free grammar by calling subnetworks), it does not adequately consider item 2 (history-dependent processing using registers and a mechanism for testing them).

For example, consider a network that accepts sentences with different word orders, such as 「７月４日のマドンナのコンサートに行きたい。」 ("I want to go to the July 4 Madonna concert", with the date first) and 「マドンナの７月４日のコンサートに行きたい。」 (the same request with "Madonna" first). If a mechanism for testing registers is available, this can be written as the network shown in FIG. 11(a). Without such a mechanism, the network must be written with every word order expanded, as shown in FIG. 11(b); the numbers of states and transitions in the network grow, and so does the amount of computation.

Reference 4, on the other hand, describes a register testing mechanism but does not consider performing speech recognition frame-synchronously.

An object of the present invention is to overcome these drawbacks by providing a continuous speech recognition device that runs fast with good real-time performance through frame synchronization, that allows the network representing the sentences to be recognized to be simplified by performing path-dependent processing, and that can recognize freer utterances without outputting semantically incorrect sentences.

[Means for Solving the Problems]

The continuous speech recognition device according to the first invention has a network memory that stores a network expressing the grammar to be recognized, and recognizes continuous speech by concatenating standard patterns of predetermined recognition units according to that network. The device comprises: a standard pattern memory that stores the standard patterns; a distance calculation unit that computes the distance between each frame of an input speech pattern and each frame of a standard pattern; a cumulative value calculation unit that computes the cumulative value of those distances along a matching path aligning the frames of the speech pattern with those of the standard pattern; a cumulative value memory that stores the cumulative values; a return point memory that stores the path so that the cumulative value is minimized; a return processing unit that computes that path; a path memory that stores a path constraint value when a predetermined part of the network has been traversed; a path limiting unit that performs a test using the contents of the path memory and judgment conditions prepared in advance in the network and, according to the result, limits the path to be followed next; a dialogue information storage unit that stores the content of the dialogue; and a dialogue information processing unit that stores path constraint values in the path memory according to the contents of the dialogue information storage unit.

The continuous speech recognition device according to the second invention further comprises a path memory erasing unit that erases specific path constraint values from those stored in the path memory, according to information prepared in advance on the network.

The continuous speech recognition device according to the third invention further comprises a path constraint value holding unit that holds the contents of the path memory.

[Operation]

The operation of continuous speech recognition according to the present invention is described below. The present invention makes it possible to perform path-dependent processing frame-synchronously.

FIG. 6 illustrates the path-dependent processing. Suppose the input speech is "He speaks English." The network is traversed from the start position. If the path that recognizes "He" is followed, the path constraint value SANTAN is stored in the path memory, because that path is annotated "set SANTAN"; SANTAN denotes the third person singular. When the next choice is made between the path to "speak" and the path to "speaks", the path to "speak" cannot be followed because it is annotated "not SANTAN". Conversely, if the input speech is "I speak English," no path constraint value is stored when the path recognizing "I" is followed, so the path to "speaks", annotated "if SANTAN", cannot be followed. In the first invention, the recurrence of equations (3) to (6) is computed simultaneously with this path-dependent processing. Semantically incorrect sentences are thereby excluded, so the network does not need to be made redundant to exclude them and can be simplified.
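The set/test behaviour of FIG. 6 can be illustrated symbolically, leaving out the DP scores. In this hypothetical sketch each network arc carries at most one command, `("set", v)`, `("if", v)` or `("not", v)`, and every partial path carries its own set of path constraint values. The encoding and the names `accepts` and `FIG6_ARCS` are assumptions; in the invention these tests run jointly with the frame-synchronous recurrence rather than over a finished word string.

```python
def accepts(words, arcs, start, final):
    """Check whether `words` can be read along the network while obeying
    the set/test annotations on its arcs.

    arcs: (from_state, word, to_state, command), where command is None,
          ("set", v), ("if", v) or ("not", v)  (hypothetical encoding).
    """
    frontier = {(start, frozenset())}  # (state, path constraint values)
    for w in words:
        nxt = set()
        for state, regs in frontier:
            for p, word, q, cmd in arcs:
                if p != state or word != w:
                    continue
                op, value = cmd if cmd else (None, None)
                if op == "if" and value not in regs:
                    continue  # required constraint value absent: arc blocked
                if op == "not" and value in regs:
                    continue  # forbidden constraint value present: arc blocked
                new_regs = (regs | {value}) if op == "set" else regs
                nxt.add((q, new_regs))
        frontier = nxt
    return any(state == final for state, _ in frontier)


# Hypothetical encoding of the FIG. 6 network
FIG6_ARCS = [
    (0, "He", 1, ("set", "SANTAN")),
    (0, "I", 1, None),
    (1, "speaks", 2, ("if", "SANTAN")),
    (1, "speak", 2, ("not", "SANTAN")),
    (2, "English", 3, None),
]
```

Under this encoding, "He speaks English" and "I speak English" are accepted, while "He speak English" and "I speaks English" are rejected.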

However, when the speech "He speaks English and I speak Japanese." is input, the method above stores the path constraint value SANTAN in the path memory once "He speaks English" has been recognized. Left as is, SANTAN is still in the path memory while the following "and I speak Japanese." is recognized, which causes a recognition error. The second invention therefore erases specific path constraint values stored in the path memory according to information prepared in advance on the network, which makes it possible to initialize the path constraint values in the path memory at any point in the network.

For example, in the network of FIG. 7, information that erases the path constraint value SANTAN from the path memory is placed immediately before "and", so the specific value SANTAN stored in the path memory is erased. As a result, a sentence such as "He speaks English and I speak Japanese." is also recognized correctly.
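A hypothetical arc encoding with set, test (`if`/`not`) and `clear` commands shows why erasure is needed. The encoding, the function names, and the assumption that the network loops back to the sentence start after "and" (with the erase command attached to the "and" arc) are illustrative only.

```python
def apply_command(regs, cmd):
    """Apply one arc command to a path's constraint set; None means the arc is blocked."""
    op, value = cmd if cmd else (None, None)
    if op == "if" and value not in regs:
        return None
    if op == "not" and value in regs:
        return None
    if op == "set":
        return regs | {value}
    if op == "clear":
        return regs - {value}  # path memory erasure of one specific value
    return regs


def accepts(words, arcs, start, final):
    """True if `words` can be read from start to final with every test passing."""
    frontier = {(start, frozenset())}
    for w in words:
        nxt = set()
        for state, regs in frontier:
            for p, word, q, cmd in arcs:
                if p == state and word == w:
                    new_regs = apply_command(regs, cmd)
                    if new_regs is not None:
                        nxt.add((q, new_regs))
        frontier = nxt
    return any(state == final for state, _ in frontier)


# Hypothetical encoding of a FIG. 7-style network
BASE = [
    (0, "He", 1, ("set", "SANTAN")),
    (0, "I", 1, None),
    (1, "speaks", 2, ("if", "SANTAN")),
    (1, "speak", 2, ("not", "SANTAN")),
    (2, "English", 3, None),
    (2, "Japanese", 3, None),
]
WITH_ERASE = BASE + [(3, "and", 0, ("clear", "SANTAN"))]
WITHOUT_ERASE = BASE + [(3, "and", 0, None)]
```

Without the erase command, SANTAN survives past "and", so "He speaks English and I speak Japanese" is rejected while the ungrammatical "... and I speaks Japanese" is accepted; with the erase command the outcome is reversed.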

Next, suppose that in a dialogue a sentence expressed by the network of FIG. 8(b) is uttered after a sentence expressed by the network of FIG. 8(a). If, for example, the sentence "Will you pay in cash?" from FIG. 8(a) has been uttered, a next utterance such as "No, I'll pay in cash." is semantically inconsistent. In the third invention, a dialogue information processing unit that sets path constraint values in registers using contextual information from the dialogue excludes such semantically inconsistent sentences. This allows the network to be simplified.

In the fourth invention, the path constraint values on the path of the recognition result obtained for one input utterance are held in the path constraint value holding unit, and when recognition of a later input utterance begins, the held values are set in the path memory. The paths in the network can thereby be constrained according to the content of earlier input utterances.

By using the method of the present invention in this way, semantically inconsistent sentences in such a dialogue can be excluded.

[Embodiments]

Embodiments of the continuous speech recognition device according to the present invention are described with reference to the drawings.

First, an embodiment according to the first invention is described.

FIG. 1 is a block diagram showing an embodiment according to the first invention.

FIGS. 2(a) and 2(b) are schematic diagrams explaining the sentence-level processing in the first invention: FIG. 2(a) shows the contents of the path memory, and FIG. 2(b) shows an example network.

This continuous speech recognition device has a network memory 7 that stores a network expressing the grammar to be recognized; a standard pattern memory 1 that stores standard patterns; a distance calculation unit 2 that computes the distance between each frame of the input speech pattern and each frame of a standard pattern; a cumulative value calculation unit 3 that computes the cumulative value of those distances along a matching path aligning the frames of the speech pattern and the standard pattern; a cumulative value memory 4 that stores the cumulative values; a return point memory 8 that stores the path minimizing the cumulative value; a return processing unit 9 that computes that path; a path memory 6 that stores a path constraint value when a predetermined part of the network has been traversed; and a path limiting unit 5 that performs a test using the contents of path memory 6 and judgment conditions prepared in advance in the network and, according to the result, limits the path to be followed next.

Standard patterns $B$ are stored in advance in standard pattern memory 1. Distance calculation unit 2 reads the feature $a_i$ of the $i$-th frame of input pattern $A$ and the feature $b_j^n$ of the $j$-th frame of the standard pattern $B^n$ of word $n$, and computes and outputs the distance $d(n; i, j)$ between them. Cumulative value calculation unit 3 uses the input distance $d$ to perform the recurrence computation of equation (5). The cumulative values needed for the recurrence are held in cumulative value memory 4 and are read and written by cumulative value calculation unit 3 as required. Each time a cumulative value is computed, the path value $L$ of the candidate with the smaller cumulative value is stored in return point memory 8 according to equation (6). After all cumulative values have been computed, return processing unit 9 obtains the optimal path from the path values $L$ in return point memory 8.

As sentence-level processing, when a predetermined part of the network in network memory 7 is traversed, the path constraint value corresponding to that part is stored in path memory 6. Path limiting unit 5 performs a test using the contents e of path memory 6 and a judgment condition c prepared in advance in network memory 7, and its result g limits the path to be followed next. A recognition result Z is then obtained from return processing unit 9.

In FIG. 1, path memory 6 holds, for standard-pattern frame $j$ of each word $n$ at each input frame $i$, the path constraint values on the path from the initial state to $(n; i, j)$, denoted $P(n; i, j)$.

Now assume that network memory 7 in FIG. 1 is prepared at the start frame of each word as shown in FIG. 2(b). The processing for input frame $i$ is described below.

First, to set path constraint values, path limiting unit 5 operates at the end frame $j_e^n$ of the standard pattern of word $n$: if the network has a set command for a path constraint value on the transition from word $n$ to another word, that value is added to $P(n; i, j_e^n)$ in path memory 6.

Next, equation (7) is computed using the $k^*$ obtained from equation (4), propagating the path constraint values along the chosen path:

$$P(n; i, j) = P(n; i-1, j-k^*) \qquad (7)$$

Next, the test is described. If the network has a test condition on the transition into the start frame $j_s^n$ of word $n$, the test is performed according to that condition. If the condition is not satisfied, the cumulative distance $g(n; i, j_s^n)$ is set to infinity.
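Equations (4) and (7) and the infinite-value test can be combined into one DP step per word. The Python sketch below assumes an asymmetric local recurrence with $k \in \{0,1,2\}$ (equation (4) itself is not reproduced in the text), represents $P(n;i,j)$ as a `frozenset` of constraint values, and uses the hypothetical helper names `dp_step` and `leave_word`; it is an illustration, not the patent's implementation.

```python
import math

INF = math.inf


def dp_step(a, B, g_prev, P_prev, entry, test, dist):
    """Process one input frame for one word template B, tracking path constraint values.

    g_prev, P_prev -- cumulative values g(n; i-1, j) and constraint sets P(n; i-1, j);
                      index 0 is a virtual cell just before the start frame j_s
    entry          -- (cumulative value, constraint set) handed over by the preceding word
    test           -- predicate on the constraint set guarding entry into this word
    """
    score, regs = entry
    if not test(regs):
        score = INF  # failed test: the word start receives an infinite cumulative value
    g_prev = [score] + g_prev[1:]
    P_prev = [regs] + P_prev[1:]
    g_cur = [INF] * len(g_prev)
    P_cur = [frozenset()] * len(g_prev)
    for j in range(1, len(g_prev)):
        k_star = min((k for k in (0, 1, 2) if j - k >= 0),
                     key=lambda k: g_prev[j - k])
        g_cur[j] = dist(a, B[j - 1]) + g_prev[j - k_star]
        P_cur[j] = P_prev[j - k_star]  # equation (7): P(n; i, j) = P(n; i-1, j-k*)
    return g_cur, P_cur


def leave_word(g_cur, P_cur, set_value=None):
    """At the end frame j_e, apply a set command on the outgoing transition, if any."""
    regs = (P_cur[-1] | {set_value}) if set_value else P_cur[-1]
    return g_cur[-1], regs
```

For instance, after "He" is left with `set_value="SANTAN"`, a guard `lambda r: "SANTAN" not in r` on "speak" gives that word an infinite cumulative value, while a guard `lambda r: "SANTAN" in r` on "speaks" lets its match continue.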

For example, suppose the path constraint value SANTAN has been set in path memory 6 from information in network memory 7 of FIG. 1. As shown in FIG. 6, among the paths from the standard patterns "I" and "He" to the standard patterns "speak" and "speaks", the path through "speak" carries the judgment condition "not SANTAN" in network memory 7. Path limiting unit 5 therefore sends information x to cumulative value calculation unit 3 so that an infinite cumulative value is given at the start of "speak", and the recurrence of equation (5) is computed. As a result, the path to "speak" is not recognized. The same processing is performed for the other states.

As a result, recognition can proceed without producing ungrammatical sentences such as "I speaks English." or "He speak English." as recognition results.

Next, an embodiment according to the second invention is described. FIG. 3 is a block diagram showing an embodiment according to the second invention. In addition to the configuration of FIG. 1, this continuous speech recognition device has a path memory erasing unit 12 that erases specific path constraint values stored in path memory 6 according to information prepared in advance on the network. Path memory erasing unit 12 sends an instruction m that erases specific path constraint values stored in path memory 6, according to information prepared in advance in network memory 7.

FIG. 7 shows an example of the network memory. In the case of FIG. 7, when for example the speech "He speaks English and I speak Japanese." is input, the path constraint value SANTAN is stored in path memory 6 of FIG. 3 once the path for "He speaks English" has been followed. Left as is, SANTAN remains in path memory 6 while "and I speak Japanese." is recognized, which causes a recognition error. Therefore, as in FIG. 7, information is prepared in the network memory so that the path constraint value SANTAN in path memory 6 is erased after the "and" path has been followed, and path memory erasing unit 12 sends an instruction to erase the specific value SANTAN stored in path memory 6. The other processing is exactly the same as in the continuous speech recognition device of FIG. 1. As a result, a sentence such as "He speaks English and I speak Japanese." is also recognized correctly.

Next, an embodiment according to the third invention is described. FIG. 4 is a block diagram showing an embodiment according to the third invention. In addition to the configuration of FIG. 1, this continuous speech recognition device has a dialogue information storage unit 11 that stores the content of the dialogue, and a dialogue information processing unit 10 that stores path constraint values in path memory 6 according to the contents of that storage unit.

FIG. 10 is a schematic diagram explaining an embodiment according to the third invention.

The sequence of network nodes obtained as speech recognition results during the dialogue is stored in dialogue information storage unit 11. From that content and from information recorded in advance in network memory 7, dialogue information processing unit 10 instructs path memory 6 to store path constraint values.

For example, suppose recognition networks such as those in FIGS. 8(a) and 8(b) are prepared and the recognition result is "Will you pay by card?" When such an utterance is recognized during the dialogue, next utterances such as "Yes, I'll pay in cash." or "No, I'll pay by card." are semantically inconsistent. The node "Will you pay by card?" is node 2, as shown in FIG. 8(a), so the number 2 is stored in dialogue information storage unit 11.

Dialogue information processing unit 10 then uses the node sequence stored in dialogue information storage unit 11, together with information given in advance in an internal table, to add information to the network based on the numbers of the nodes that have appeared so far in the dialogue, so that path constraint values are set or tested at specific places in the network, as shown in FIG. 8(b). As a result, if for example the node "Yes" is selected in FIG. 10(b), the value X is set as a path constraint value, and the judgment condition "test not X" prevents a transition to node 6. Semantically inconsistent sentences in the dialogue can thus be excluded.
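The table-driven behaviour of dialogue information processing unit 10 can be sketched as follows. Only node 2 ("Will you pay by card?") and node 6 come from the text; the remaining node numbers, the assignment of "yes", "no", "card" and "cash" to them, the second value Y, and the function names are hypothetical.

```python
# Hypothetical internal table: node of an earlier utterance -> annotations
# to add to the next network. Only nodes 2 and 6 appear in the text; the
# other numbers and their meanings are assumptions for illustration.
DIALOG_TABLE = {
    2: {
        "set": {3: "X", 4: "Y"},       # node 3 = "yes", node 4 = "no" (assumed)
        "test_not": {6: "X", 5: "Y"},  # node 6 = "cash", node 5 = "card" (assumed)
    },
}


def annotate(history, table):
    """Collect the set/test annotations implied by the nodes recognized so far."""
    sets, tests = {}, {}
    for node in history:
        rule = table.get(node, {})
        sets.update(rule.get("set", {}))
        tests.update(rule.get("test_not", {}))
    return sets, tests


def path_allowed(path, sets, tests):
    """Follow a node path in the next network, applying set and 'test not' annotations."""
    regs = set()
    for node in path:
        if node in tests and tests[node] in regs:
            return False  # e.g. "test not X" blocks the transition into node 6
        if node in sets:
            regs.add(sets[node])
    return True
```

With history [2], the paths yes-then-cash and no-then-card are blocked while yes-then-card and no-then-cash remain; with an empty history nothing is blocked.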

Instead of using a table prepared in advance, as shown in FIG. 10(a), dialogue information processing unit 10 may also build a logical expression from the contents of dialogue information storage unit 11 and evaluate it, sending path memory 6 information that assigns path constraint values so that sentences inconsistent with the meaning of the dialogue are not output.

The other processing is exactly the same as in the continuous speech recognition device of FIG. 1.

Next, an embodiment according to the fourth invention is described. FIG. 5 is a block diagram showing an embodiment according to the fourth invention. In addition to the configuration of the continuous speech recognition device of FIG. 2, this device has a path memory holding unit 13 that holds the contents of path memory 6. Path memory holding unit 13 saves the path constraint values in path memory 6 that were set by certain predetermined sentences among those input so far, so that a path constraint value stored while one network in network memory 7 was in use can be used in another network in network memory 7. The other processing is exactly the same as in the continuous speech recognition device of FIG. 2.

For example, consider recognition using the networks of FIGS. 9(a) and 9(b). When the sentence "Will you pay in cash?" is recognized by the network of FIG. 9(a), the path constraint value GENKIN is stored in the path constraint value holding unit. Later, when cumulative values are computed with the network of FIG. 9(b), if for example a path through "Yes" is chosen, the test uses the path constraint value GENKIN stored with FIG. 9(a) together with the value YES stored at that point. In this case, therefore, "I'll pay by card" is not output as a recognition result because of the judgment conditions prepared in the network of FIG. 9(b).
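The holding unit of the fourth invention can be sketched as a store that is written after a predetermined sentence is recognized and copied back into the path memory when the next recognition starts. The class name, the node names, and the test on "pay_by_card" (blocked when both YES and GENKIN are present) are assumptions chosen to reproduce the FIG. 9 example, not the figure's actual contents.

```python
class ConstraintHolder:
    """Path constraint value holding unit (hypothetical interface)."""

    def __init__(self):
        self.held = frozenset()

    def keep(self, regs):
        # Save the constraint values on the recognized path of a predetermined sentence.
        self.held = frozenset(regs)

    def preload(self):
        # Values copied into the path memory when the next recognition starts.
        return set(self.held)


def follow(path, net, regs):
    """Follow a node path; net maps node -> (set value or None, test or None).
    Returns the resulting constraint set, or None if a test blocks the path."""
    regs = set(regs)
    for node in path:
        set_value, test = net.get(node, (None, None))
        if test is not None and not test(regs):
            return None
        if set_value:
            regs.add(set_value)
    return regs


# Hypothetical encodings of FIG. 9(a) and 9(b); node names are assumptions.
NET_A = {"pay_in_cash?": ("GENKIN", None)}
NET_B = {
    "yes": ("YES", None),
    "pay_by_card": (None, lambda r: not {"YES", "GENKIN"} <= r),
    "pay_in_cash": (None, None),
}
```

After `holder.keep(follow(["pay_in_cash?"], NET_A, holder.preload()))`, the path "yes" then "pay_by_card" in NET_B is blocked, whereas the same path would pass if recognition started with an empty path memory.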

Semantically inconsistent sentences can thus be excluded, so the network memory can be simplified.

[Effects of the Invention]

As described above, the present invention provides a continuous speech recognition device that runs fast with good real-time performance through frame synchronization, that allows the network representing the sentences to be recognized to be simplified by performing path-dependent processing, and that can recognize freer utterances.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment according to the first invention; FIG. 2 is a schematic diagram explaining the sentence-level processing in the first invention; FIG. 3 is a block diagram showing an embodiment according to the second invention; FIG. 4 is a block diagram showing an embodiment according to the third invention; FIG. 5 is a block diagram showing an embodiment according to the fourth invention; FIG. 6 is a schematic diagram explaining the sentence-level processing in the first invention; FIG. 7 is a schematic diagram explaining the sentence-level processing in the second invention; FIG. 8 is a schematic diagram explaining the sentence-level processing in the third invention; FIG. 9 is a schematic diagram explaining the sentence-level processing in the fourth invention; FIG. 10 is a schematic diagram explaining an embodiment according to the third invention; and FIG. 11 is a schematic diagram explaining the problem to be solved by the present invention.

1: Standard pattern memory
2: Distance calculation unit
3: Cumulative value calculation unit
4: Cumulative value memory
5: Path limiting unit
6: Path memory
7: Network memory
8: Return point memory
9: Return processing unit
10: Dialogue information processing unit
11: Dialogue information storage unit
12: Path memory erasing unit
13: Path memory holding unit (path constraint value holding unit)

Continued from the front page: (58) Fields searched (Int.Cl.⁶, DB name): G10L 3/00 - 9/18; JICST (JOIS)

Claims


(57) [Claims]

1. A continuous speech recognition apparatus that has a network memory storing a network expressing the grammar of the recognition target, and that recognizes continuous speech by concatenating standard patterns of predetermined recognition units according to the network, the apparatus comprising: a standard pattern memory that stores the standard patterns; a distance calculation unit that calculates the distance between each frame of an input speech pattern and each frame of the standard patterns; a cumulative value calculation unit that calculates the cumulative value of the distances along a matching path associating frames of the speech pattern with frames of the standard patterns; a cumulative value memory that stores the cumulative values; a return point memory that stores the path along which the cumulative value is minimized; a return processing unit that performs the path calculation; a path memory that stores a path constraint value when a predetermined portion of the network has been traversed; a path limiting unit that performs a test based on the contents of the path memory and a judgment condition prepared in advance on the network, and restricts the path to be followed next according to the result; a dialogue information storage unit that stores the content of the dialogue; and a dialogue information processing unit that stores path constraint values in the path memory according to the contents of the dialogue information storage unit.
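How a path memory, a path limiting unit, and dialogue information might interact can be sketched symbolically in Python, leaving out the acoustic matching. In this sketch each hypothesis carries its own copy of the path memory; traversing an arc stores the values attached to it, each arc may carry a test that the path limiting unit evaluates before the arc is followed, and the dialogue state seeds the initial memory. The arc layout, the `expect`/`number` constraint names, the dialogue mapping, and the example network are all hypothetical; the document does not specify how constraint values or judgment conditions are encoded.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

PathMemory = Dict[str, str]  # path constraint values carried by one hypothesis


@dataclass
class Arc:
    """One word arc of a hypothetical grammar network."""
    src: int
    dst: int
    word: str
    sets: Dict[str, str] = field(default_factory=dict)   # stored when the arc is traversed
    test: Optional[Callable[[PathMemory], bool]] = None  # judgment condition prepared on the network


def allowed(arc: Arc, memory: PathMemory) -> bool:
    """Path limiting unit: may a hypothesis with this memory follow `arc` next?"""
    return arc.test is None or arc.test(memory)


def traverse(arc: Arc, memory: PathMemory) -> PathMemory:
    """Copy the memory (hypotheses must not share it) and store the arc's values."""
    updated = dict(memory)
    updated.update(arc.sets)
    return updated


def dialogue_to_memory(dialogue: Dict[str, str]) -> PathMemory:
    """Dialogue information processing unit: seed constraints from the dialogue state."""
    expected = {"how_many": "count", "when": "date"}  # hypothetical mapping
    question = dialogue.get("last_question")
    return {"expect": expected[question]} if question in expected else {}


def accepted_sentences(arcs: List[Arc], start: int, finals: Set[int], memory: PathMemory) -> List[str]:
    """Enumerate the word sequences the constrained network accepts (acyclic networks only)."""
    results: List[str] = []

    def walk(node: int, mem: PathMemory, words: List[str]) -> None:
        if node in finals:
            results.append(" ".join(words))
        for arc in arcs:
            if arc.src == node and allowed(arc, mem):
                walk(arc.dst, traverse(arc, mem), words + [arc.word])

    walk(start, memory, [])
    return results


# Hypothetical network: number agreement is enforced through the path memory
# instead of duplicating the "ticket"/"tickets" part of the network.
ARCS = [
    Arc(0, 1, "one", sets={"number": "sg"}, test=lambda m: m.get("expect") == "count"),
    Arc(0, 1, "two", sets={"number": "pl"}, test=lambda m: m.get("expect") == "count"),
    Arc(1, 2, "ticket", test=lambda m: m.get("number") == "sg"),
    Arc(1, 2, "tickets", test=lambda m: m.get("number") == "pl"),
    Arc(0, 2, "tomorrow", test=lambda m: m.get("expect") == "date"),
]
```

With `dialogue_to_memory({"last_question": "how_many"})` as the initial memory, `accepted_sentences(ARCS, 0, {2}, ...)` yields `["one ticket", "two tickets"]` and never "one tickets"; after a "when" question it yields only `["tomorrow"]`. In an actual recognizer the same test would presumably be applied to each hypothesis during frame-synchronous matching rather than by exhaustive enumeration.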

2. The continuous speech recognition apparatus according to claim 1, further comprising a path memory erasing unit that erases specific path constraint values from among those stored in the path memory, based on information prepared in advance on the network.
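The erasing of specific path constraint values can be pictured as arc-level information that lists constraint names to drop when the arc is traversed, for example at a clause boundary where a number-agreement constraint should no longer apply. This is a minimal Python sketch under that assumption: the `ArcActions` layout, the erase-before-store ordering, and the constraint names are hypothetical choices, not taken from this document.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet

PathMemory = Dict[str, str]


@dataclass
class ArcActions:
    """Information prepared in advance on one network arc (hypothetical layout)."""
    sets: Dict[str, str] = field(default_factory=dict)  # constraint values to store
    erases: FrozenSet[str] = frozenset()                 # constraint names to erase


def apply_arc(memory: PathMemory, actions: ArcActions) -> PathMemory:
    """Return a new path memory: erase the listed values first, then store the new ones."""
    kept = {name: value for name, value in memory.items() if name not in actions.erases}
    kept.update(actions.sets)
    return kept
```

Traversing a boundary arc such as `ArcActions(erases=frozenset({"number"}))` turns `{"expect": "count", "number": "pl"}` into `{"expect": "count"}`, so a later arc can set `number` afresh while the dialogue-derived `expect` value survives. Returning a new dictionary keeps hypotheses that share a prefix from affecting one another.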

3. The continuous speech recognition apparatus according to claim 1 or 2, further comprising a path constraint value holding unit that holds the contents of the path memory.
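One plausible reading of the path constraint value holding unit is that it keeps a snapshot of the path memory, for instance the memory of the best hypothesis at the end of one utterance, so it can be restored as the starting memory for the next utterance. The Python sketch below assumes that reading; the class and method names are hypothetical and the document does not describe the unit's interface.

```python
from typing import Dict, Optional

PathMemory = Dict[str, str]


class PathConstraintHolder:
    """Keeps a snapshot of path memory contents (hypothetical interface)."""

    def __init__(self) -> None:
        self._held: Optional[PathMemory] = None

    def hold(self, memory: PathMemory) -> None:
        # Copy so later changes to the live path memory do not alter the snapshot.
        self._held = dict(memory)

    def restore(self) -> PathMemory:
        # Return a fresh copy, or an empty memory if nothing has been held yet.
        return dict(self._held) if self._held is not None else {}
```

Copying on both `hold` and `restore` keeps the held contents fixed even if the live path memory, or a restored copy, is modified afterwards.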