JP2022094442A

Info

Publication number: JP2022094442A
Application number: JP2020207326A
Authority: JP
Inventors: 伸也日月; Shinya Tachimori
Original assignee: Onkyo Corp
Current assignee: Onkyo Corp
Priority date: 2020-12-15
Filing date: 2020-12-15
Publication date: 2022-06-27

Abstract

PROBLEM TO BE SOLVED: To provide a means for preventing misrecognition in voice recognition. SOLUTION: A voice recognition system executes: a voice recognition process that recognizes voice; an identification process that identifies, from the result of the voice recognition process, a result estimated to be a predetermined portion; a kanji search process that searches for kanji having the same sound as the predetermined portion identified by the identification process; and a display process that displays the same-sound kanji found by the kanji search process together with the kanji produced by the voice recognition process. The predetermined portion is a surname or a given name. The voice recognition system further executes a reception process that accepts a selection of one of the kanji displayed by the display process, that is, the same-sound kanji found by the kanji search process and the kanji produced by the voice recognition process. SELECTED DRAWING: Figure 1

Description

The present invention relates to a voice recognition system that performs voice recognition, and to a voice recognition method.

When voice recognition is used to convert a person's name from Japanese speech to text, a surname pronounced the same as a prefecture name may be misrecognized as the prefecture name rather than as a personal name, or a name may be output with the wrong kanji. For example, the surname pronounced "Nagano" may be written 長野 or 永野, but it may be misrecognized as 長野, the same kanji as the prefecture name (see Fig. 2). In a service that delivers packages or the like based on voice-recognized information (see, for example, Patent Document 1), sending a package under an incorrect name may result in misdelivery, which is a problem.

Patent Document 1: Japanese Unexamined Patent Publication No. 2020-057932

As described above, misrecognition in voice recognition can cause problems.

An object of the present invention is to provide a means for preventing misrecognition in voice recognition.

The voice recognition system of the first invention is characterized by executing: a voice recognition process that recognizes voice; an identification process that identifies, from the result of the voice recognition process, a result estimated to be a predetermined portion; a kanji search process that searches for kanji having the same sound as the predetermined portion identified by the identification process; and a display process that displays the same-sound kanji found by the kanji search process together with the kanji produced by the voice recognition process.

In the present invention, the kanji produced by the voice recognition process and the same-sound kanji found by the kanji search process are displayed. This allows an operator or the like to select the correct kanji, so misrecognition in voice recognition can be prevented.

The voice recognition system of the second invention is characterized in that, in the voice recognition system of the first invention, it further executes a reception process that accepts a selection of one of the kanji displayed by the display process, that is, the same-sound kanji found by the kanji search process and the kanji produced by the voice recognition process.

The voice recognition system of the third invention is characterized in that, in the voice recognition system of the second invention, it further executes a determination process that determines the kanji accepted by the reception process as the voice recognition result.

The voice recognition system of the fourth invention is characterized in that, in the voice recognition system of any one of the first to third inventions, it further executes a call receiving process that answers an incoming call from outside, and, in the voice recognition process, recognizes the voice from the incoming call answered by the call receiving process.

The voice recognition system of the fifth invention is characterized in that, in the voice recognition system of the fourth invention, the call receiving process plays a voice guide prompting an utterance after the call is answered.

The voice recognition system of the sixth invention is characterized in that, in the voice recognition system of the fifth invention, the voice guide is a voice guide prompting the utterance of an address and a name.

The voice recognition system of the seventh invention is characterized in that, in the voice recognition system of any one of the first to sixth inventions, the predetermined portion is a portion for which a plurality of same-sound kanji exist.

The voice recognition system of the eighth invention is characterized in that, in the voice recognition system of any one of the first to seventh inventions, the predetermined portion is a surname or a given name.

The voice recognition method of the ninth invention is characterized by executing: a voice recognition process that recognizes voice; an identification process that identifies, from the result of the voice recognition process, a result estimated to be a predetermined portion; a kanji search process that searches for kanji having the same sound as the predetermined portion identified by the identification process; and a display process that displays the same-sound kanji found by the kanji search process together with the kanji produced by the voice recognition process.

According to the present invention, an operator or the like can select the correct kanji, so misrecognition in voice recognition can be prevented.

Fig. 1 is a diagram showing the result of the voice recognition process.
Fig. 2 is a diagram showing the result of a conventional voice recognition process.

An embodiment of the present invention is described below. This embodiment describes a case in which the voice recognition system is applied to an automatic telephone answering system (hereinafter simply "automatic answering system"). The automatic answering system comprises a cloud server that recognizes voice, a database that stores the voice recognition results produced by the cloud server, and so on. In outline, the automatic answering system operates as follows. It answers an incoming call from a user. It recognizes the voice from the call (the address and name spoken by the user) and stores the voice recognition result in the database. Packages and the like are shipped based on the voice recognition results (data) stored in the database. Voice recognition is not limited to the cloud server described above and may be performed by any other device capable of voice recognition.

The automatic answering system performs a call receiving process, a voice recognition process, an identification process, a kanji search process, a display process, a reception process, a determination process, and so on. Each process is described below.

The call receiving process answers an incoming call from outside. In the call receiving process, after answering the call, the automatic answering system plays a voice guide prompting the user to speak. The voice guide prompts the user to state an address and a name, for example: "After the beep, please state your address and name." The user calls the telephone number assigned to the automatic answering system and states an address and name as instructed by the voice guide.

The voice recognition process recognizes the voice from the incoming call answered by the call receiving process. The automatic answering system first stores the voice data of the user's utterance in storage (recording). It then converts the voice data into text and stores the converted text in the database.
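The record, transcribe, and store flow described above can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the table name `recognition_results`, the function `store_recognition_results`, and the stand-in recognizer `fake_transcribe` are hypothetical, since the source specifies neither a database schema nor a recognition API.

```python
import sqlite3
from typing import Callable, Dict


def store_recognition_results(
    conn: sqlite3.Connection,
    call_id: str,
    recordings: Dict[str, bytes],
    transcribe: Callable[[bytes], str],
) -> None:
    """Transcribe each recorded utterance and store the text per field.

    `recordings` maps a field label (e.g. "address", "name") to the audio
    recorded for that utterance. The schema is an assumption for this sketch.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recognition_results ("
        "call_id TEXT, field TEXT, text TEXT, "
        "PRIMARY KEY (call_id, field))"
    )
    for field, audio in recordings.items():
        # `transcribe` stands in for the cloud server's recognizer.
        text = transcribe(audio)
        conn.execute(
            "INSERT OR REPLACE INTO recognition_results VALUES (?, ?, ?)",
            (call_id, field, text),
        )
    conn.commit()


def fake_transcribe(audio: bytes) -> str:
    """Placeholder recognizer: treats the audio bytes as UTF-8 text."""
    return audio.decode("utf-8")
```

Passing the recognizer in as a callable reflects the statement that recognition need not be performed by the cloud server; any component that turns audio into text could be substituted.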

Here, because the voice data is separated for each utterance, it is easy for the automatic answering system to determine which item (name or address) a given piece of voice data contains. However, when a general-purpose system is used for voice recognition, names may be misrecognized. For example, when a user says "Nagano", the result is likely to be 長野, the same kanji as the prefecture, even if the correct surname is 永野. Likewise, when a user says "Ogawa", there are many candidates, such as 小川, 尾川, and 緒川 (see Fig. 1), so it is difficult for the automatic answering system to select the correct kanji. To solve this problem, in this embodiment, when the automatic answering system detects a word estimated to be a surname, it searches a built-in list of same-sound words for surnames with the same reading and adds them as other candidates for the surname.

The identification process identifies, from the result of the voice recognition process, a result estimated to be the predetermined portion (a portion for which a plurality of same-sound kanji exist). In this embodiment, the predetermined portion is a surname (or a given name). Accordingly, the identification process identifies a result estimated to be a surname. The kanji search process searches for kanji having the same sound as the surname identified by the identification process. For example, when the surname identified by the identification process is 小川 (Ogawa), 尾川 and 緒川 are found as same-sound kanji.
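A minimal sketch of the kanji search process, assuming the built-in same-sound word list is a mapping from a reading to the surnames that share it. The names `HOMOPHONE_SURNAMES`, `READING_OF`, and `search_homophone_kanji` are hypothetical, and the list contains only the examples given in the text. The sketch also assumes each surname has a single reading, which does not hold for every real surname.

```python
from typing import Dict, List

# Hypothetical built-in same-sound list: reading -> surnames with that reading.
# The entries are the examples from the description, not a real dictionary.
HOMOPHONE_SURNAMES: Dict[str, List[str]] = {
    "おがわ": ["小川", "尾川", "緒川"],
    "ながの": ["長野", "永野"],
}


def build_reading_index(homophones: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert the list so a recognized surname can be mapped to its reading.

    Simplification: if a surname appears under several readings, only the
    first one encountered is kept.
    """
    index: Dict[str, str] = {}
    for reading, surnames in homophones.items():
        for surname in surnames:
            index.setdefault(surname, reading)
    return index


READING_OF = build_reading_index(HOMOPHONE_SURNAMES)


def search_homophone_kanji(recognized_surname: str) -> List[str]:
    """Return other surnames with the same reading as the recognized one."""
    reading = READING_OF.get(recognized_surname)
    if reading is None:
        # Not in the list: no additional candidates are offered.
        return []
    return [s for s in HOMOPHONE_SURNAMES[reading] if s != recognized_surname]
```

With this list, looking up 小川 yields 尾川 and 緒川, and looking up 長野 yields 永野, matching the examples in the description.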

The display process displays the same-sound kanji found by the kanji search process together with the kanji produced by the voice recognition process. For example, when the voice recognition process produces 小川 and the kanji search process finds 尾川 and 緒川 as same-sound kanji, the same-sound kanji 尾川 and 緒川 and the recognized kanji 小川 are all displayed.

The reception process accepts a selection of one of the kanji displayed by the display process, that is, the same-sound kanji found by the kanji search process and the kanji produced by the voice recognition process. For example, a call center operator or the like who uses the automatic answering system selects one of the displayed kanji candidates. The determination process determines the kanji accepted by the reception process as the voice recognition result. The automatic answering system stores the determined kanji in the database, and packages and the like are shipped based on the information stored in the database.
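The display, reception, and determination steps could look roughly like the following Python sketch. The class `NameCandidates` and the functions `render` and `decide` are hypothetical names chosen for illustration; the source does not describe the operator interface, so a numbered text list is assumed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NameCandidates:
    """Recognized kanji plus the same-sound kanji found by the search."""

    recognized: str
    homophones: List[str]

    def options(self) -> List[str]:
        # The recognized kanji is listed first, followed by the alternatives.
        return [self.recognized] + [
            h for h in self.homophones if h != self.recognized
        ]


def render(candidates: NameCandidates) -> str:
    """Display process: format all candidates as a numbered list."""
    return "\n".join(
        f"{i}. {kanji}" for i, kanji in enumerate(candidates.options(), start=1)
    )


def decide(candidates: NameCandidates, selection: int) -> str:
    """Reception and determination: validate the operator's 1-based choice
    and return the chosen kanji as the final recognition result."""
    options = candidates.options()
    if not 1 <= selection <= len(options):
        raise ValueError(f"selection must be between 1 and {len(options)}")
    return options[selection - 1]
```

The returned kanji is what would then be written to the database as the determined result; rejecting out-of-range selections keeps an operator's mis-keyed choice from being stored silently.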

As described above, in this embodiment, the kanji produced by the voice recognition process (for example, 小川) and the same-sound kanji found by the kanji search process (for example, 尾川 and 緒川) are displayed. This allows an operator or the like to select the correct kanji, so misrecognition in voice recognition can be prevented.

Although an embodiment of the present invention has been described above, the forms to which the present invention can be applied are not limited to this embodiment, and changes can be made as appropriate without departing from the spirit of the present invention.

The present invention can be suitably applied to a voice recognition system that performs voice recognition and to a voice recognition method.

Claims

A voice recognition system characterized by executing:
a voice recognition process that recognizes voice;
an identification process that identifies, from the result of the voice recognition process, a result estimated to be a predetermined portion;
a kanji search process that searches for kanji having the same sound as the predetermined portion identified by the identification process; and
a display process that displays the same-sound kanji found by the kanji search process together with the kanji produced by the voice recognition process.

The voice recognition system according to claim 1, characterized by further executing a reception process that accepts a selection of one of the kanji displayed by the display process, that is, the same-sound kanji found by the kanji search process and the kanji produced by the voice recognition process.

The voice recognition system according to claim 2, characterized by further executing a determination process that determines the kanji accepted by the reception process as the voice recognition result.

The voice recognition system according to any one of claims 1 to 3, characterized by further executing a call receiving process that answers an incoming call from outside, wherein the voice recognition process recognizes the voice from the incoming call answered by the call receiving process.

The voice recognition system according to claim 4, characterized in that the call receiving process plays a voice guide prompting an utterance after the call is answered.

The voice recognition system according to claim 5, characterized in that the voice guide is a voice guide prompting the utterance of an address and a name.

The voice recognition system according to any one of claims 1 to 6, characterized in that the predetermined portion is a portion for which a plurality of same-sound kanji exist.

The voice recognition system according to any one of claims 1 to 7, characterized in that the predetermined portion is a surname or a given name.

A voice recognition method characterized by executing:
a voice recognition process that recognizes voice;
an identification process that identifies, from the result of the voice recognition process, a result estimated to be a predetermined portion;
a kanji search process that searches for kanji having the same sound as the predetermined portion identified by the identification process; and
a display process that displays the same-sound kanji found by the kanji search process together with the kanji produced by the voice recognition process.