KR102785151B1


Info

Publication number: KR102785151B1
Application number: KR1020180132805A
Authority: KR
Inventors: 양우석
Original assignee: 현대자동차주식회사; 기아 주식회사
Priority date: 2018-11-01
Filing date: 2018-11-01
Publication date: 2025-03-21
Anticipated expiration: 2038-11-01
Also published as: KR20200050152A

Abstract

Translated from Korean

The present invention relates to a voice recognition system, and more particularly, to a voice recognition system capable of efficiently outputting a voice recognition result when a plurality of devices simultaneously receive the same voice command, and a control method thereof. A voice recognition method according to one embodiment of the present invention may include: a step in which each of a plurality of adjacent devices receives a voice command from the same speaker; a step in which each of the plurality of devices transmits voice recognition data, including voice command data corresponding to the received voice command and auxiliary data, to a voice recognition server; a step in which the voice recognition server determines, based on the auxiliary data, at least one device that has transmitted voice recognition data including the same voice command; a step in which the voice recognition server generates a voice recognition result for the voice command data transmitted by the at least one determined device; and a step in which the voice recognition server determines, from among the at least one determined device, a device to output the voice recognition result.

Description


SYSTEM FOR RECOGNIZING VOICE USING MULTIPLE DEVICES AND METHOD OF CONTROLLING THE SAME

The present invention relates to a voice recognition system, and more specifically, to a voice recognition system capable of efficiently outputting a voice recognition result when a plurality of devices simultaneously receive the same voice command, and a control method thereof.

Recent devices, from large ones such as vehicles to small ones such as smart devices, are increasingly adopting voice recognition functions. As a result, there are many cases in which multiple voice recognition capable devices are present around a speaker in a voice recognition standby state.

However, in such a situation, if multiple devices individually perform voice recognition in response to the speaker's voice command, voice responses to the same command may be output at nearly the same time, making it difficult for the speaker to make out the voice recognition result. Of course, some recent devices selectively recognize only the voices of pre-registered users, or recognize only speech that accompanies a wake-up command predetermined for each service.

However, even when such functions are applied, the problem remains that each device individually outputs a response to the recognized voice when multiple devices have pre-registered the same speaker as a recognition target, or when multiple devices share the same wake-up command.

The present invention aims to provide a more convenient usage environment where multiple voice recognition capable devices can recognize the same voice command from a speaker.

In particular, the present invention relates to a voice recognition system, and a control method thereof, that can prevent a situation in which a plurality of voice recognition capable devices individually output voice recognition results for the same voice command from a speaker.

The technical problems to be achieved in the present invention are not limited to the technical problems mentioned above, and other technical problems not mentioned will be clearly understood from the description below by a person having ordinary skill in the technical field to which the present invention belongs.

In order to solve the above technical problem, a voice recognition method according to an embodiment of the present invention may include: a step in which each of a plurality of adjacent devices receives a voice command from the same speaker; a step in which each of the plurality of devices transmits voice recognition data, including voice command data corresponding to the received voice command and auxiliary data, to a voice recognition server; a step in which the voice recognition server determines, based on the auxiliary data, at least one device that has transmitted voice recognition data including the same voice command; a step in which the voice recognition server generates a voice recognition result for the voice command data transmitted by the determined at least one device; and a step in which the voice recognition server determines, from among the determined at least one device, a device to output the voice recognition result.

In addition, a voice recognition server according to one embodiment of the present invention may include: a communication unit that receives, from a plurality of devices, voice recognition data including voice command data corresponding to a voice command of a speaker and auxiliary data; and a control unit that processes the received voice recognition data, wherein the control unit may include an identical command determination unit that determines, based on the auxiliary data, at least one device that has transmitted voice recognition data including the same voice command; a voice recognition processing unit that generates a voice recognition result for the voice command data transmitted by the determined at least one device; and an output device determination unit that determines, from among the determined at least one device, a device to output the voice recognition result.

In addition, a voice recognition method according to another embodiment of the present invention may include: a step in which each of a plurality of adjacent devices receives a voice command from the same speaker; a step in which each of the plurality of devices transmits voice recognition data, including voice command data corresponding to the received voice command and auxiliary data, to an external device; a step in which the external device determines, based on the auxiliary data, at least one device that has transmitted voice recognition data including the same voice command; a step in which the external device requests and receives, from a voice recognition server, a voice recognition result for the voice command data transmitted by the determined at least one device; and a step in which the external device determines, from among the determined at least one device, a device to output the voice recognition result.

In addition, a voice recognition device according to another embodiment of the present invention may include: a first communication unit that receives, from a plurality of already connected devices, voice recognition data including voice command data corresponding to a voice command of a speaker and auxiliary data; a control unit including an identical command determination unit that determines, based on the auxiliary data, at least one device that has transmitted voice recognition data including the same voice command, and an output device determination unit that determines, from among the determined at least one device, a device to output the voice recognition result; and a second communication unit that requests and receives, from a voice recognition server, a voice recognition result for the voice command data transmitted by the determined at least one device.

With a voice recognition system according to at least one embodiment of the present invention configured as described above, a speaker can obtain voice recognition results more conveniently in a situation where a plurality of voice recognition capable devices coexist.

In particular, since both whether the received data represents a single command from the same speaker and which device will output the voice recognition result are determined from the location and time information of each device, a situation in which each of multiple devices individually outputs the voice recognition result can be prevented.

The effects obtainable from the present invention are not limited to the effects mentioned above, and other effects not mentioned will be clearly understood from the description below by those skilled in the art to which the present invention belongs.

FIG. 1 is a drawing for explaining a system configuration to which embodiments of the present invention can be applied.
FIG. 2 illustrates an example of a configuration of a voice recognition capable device according to one embodiment of the present invention.
FIG. 3 illustrates an example of a voice recognition server configuration according to one embodiment of the present invention.
FIG. 4 shows an example of a process in which voice recognition is performed in a voice recognition system according to one embodiment of the present invention.
FIG. 5 is a drawing for explaining a system configuration according to another embodiment of the present invention.
FIG. 6 shows an example of a process in which voice recognition is performed in a voice recognition system according to another embodiment of the present invention.

Hereinafter, with reference to the attached drawings, embodiments of the present invention will be described in detail so that those skilled in the art can easily practice the present invention. However, the present invention may be implemented in various different forms and is not limited to the embodiments described herein. In addition, in order to clearly describe the present invention, parts of the drawings that are not related to the description are omitted, and similar parts are assigned similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a certain component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. Furthermore, parts designated by the same reference numerals throughout the specification refer to the same components.

One embodiment of the present invention proposes that, in a situation where multiple voice recognition capable devices coexist, each device attaches additional information to a single voice command from a speaker and forwards it to a server, and the server determines which devices received that single voice command and decides, for each device, the form in which the voice recognition result is to be output.

First, an environment in which embodiments of the present invention can be performed will be described with reference to FIG. 1. FIG. 1 is a drawing for explaining a system configuration to which embodiments of the present invention can be applied.

Referring to FIG. 1, multiple devices (200A, 200B, 200C) supporting a voice recognition function are present around a speaker (100) (e.g., within a distance at which the voice of the speaker (100) can be recognized). In addition, each device (200A, 200B, 200C) can communicate with a voice recognition server (400) via a wireless communication network (300).

The situation illustrated in FIG. 1 is exemplary: the server (400) and the devices (200A, 200B, 200C) can also be connected through a wired communication network (not shown) rather than the wireless communication network (300), and any number of devices of two or more is sufficient.

Hereinafter, the configurations of a voice recognition capable device and a voice recognition server according to an embodiment will be described with reference to FIGS. 2 and 3, respectively.

FIG. 2 illustrates an example of a configuration of a voice recognition capable device according to one embodiment of the present invention.

Referring to FIG. 2, a voice recognition capable device (200) according to an embodiment may include a microphone (210), a GPS (220), a wireless communication unit (230), an output unit (240), and a control unit (250).

The microphone (210) receives a voice command from the speaker (100) and converts it into an electrical signal, and the GPS (220) can obtain location information of the device (200).

The wireless communication unit (230) can exchange data with the voice recognition server (400) at least through the wireless communication network (300) and, depending on the embodiment, can be configured to exchange data with other voice recognition capable devices in the vicinity. To this end, the wireless communication unit (230) can support at least one of wireless communication protocols such as 3G/4G/5G and short-range wireless communication protocols such as Wi-Fi, Bluetooth, and Zigbee.

The output unit (240) can output a voice recognition result corresponding to the speaker's voice command as at least one of visual information and sound, according to voice recognition result information received from the voice recognition server (400) at least through the wireless communication unit (230). To this end, the output unit (240) can include at least one of a display and a speaker.

The control unit (250) can control the overall operation of each component described above and, in particular, can include a voice recognition processing unit (251) that handles the voice recognition functions according to the embodiment. When a voice command of the speaker (100) is input through the microphone (210), the voice recognition processing unit (251) can control the device so that voice recognition data, consisting of the command with auxiliary data added, is transmitted to the voice recognition server (400) through the wireless communication unit (230).

That is, the voice recognition data may include at least the voice command information of the speaker (100) and auxiliary data. The voice command information may be audio information obtained by processing the voice command input through the microphone using a predetermined procedure, text information obtained by applying STT (Speech To Text) to that audio information, or a combination thereof. In addition, the auxiliary data may include at least one of time information (i.e., a time stamp) for the moment the voice command was input, the reception sensitivity of the voice command (volume level, SNR, etc.), location information of the device at the moment the voice command was input, and network information of the device. The time information may be acquired through the wireless communication unit (230) or the GPS (220), the location information of the device may be acquired through the GPS (220), and the network information may be acquired through the wireless communication unit (230).
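The voice recognition data described above can be pictured as a small record type. The following is a minimal sketch, not the format used by the invention: the class names, field names, and units (`device_id`, `snr_db`, seconds since epoch) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class AuxiliaryData:
    """Auxiliary data added by a device. Every field is optional because the
    description only requires at least one of them (hypothetical field names)."""
    timestamp: Optional[float] = None               # time the command was input, in seconds
    snr_db: Optional[float] = None                  # reception sensitivity, expressed here as SNR
    location: Optional[Tuple[float, float]] = None  # (latitude, longitude) from GPS
    network_id: Optional[str] = None                # e.g. network name or base-station info


@dataclass(frozen=True)
class VoiceRecognitionData:
    """Voice command information plus auxiliary data, as sent by one device."""
    device_id: str                  # hypothetical identifier such as "200A"
    audio: Optional[bytes] = None   # processed audio of the command
    text: Optional[str] = None      # STT output, if the device produced one
    aux: AuxiliaryData = field(default_factory=AuxiliaryData)

    def __post_init__(self) -> None:
        # The command may be audio, text, or both, but not neither.
        if self.audio is None and self.text is None:
            raise ValueError("voice command information requires audio, text, or both")
```

A device that performed STT locally might then build `VoiceRecognitionData(device_id="200A", text="turn on the radio", aux=AuxiliaryData(timestamp=100.0, snr_db=18.0))`.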

Here, network information may refer to information about the wireless network to which the device is attached, such as carrier information, base station information, the network name, and identification information for a nearby-device protocol (such as Apple's AirDrop).

FIG. 3 illustrates an example of a voice recognition server configuration according to one embodiment of the present invention.

Referring to FIG. 3, a voice recognition server according to an embodiment may include a control unit (410) and a communication unit (420). The control unit (410) may in turn include an identical command determination unit (411), a voice recognition processing unit (413), and an output device determination unit (415). The identical command determination unit (411) uses the auxiliary data included in the voice recognition data sent by the devices (200A, 200B, 200C, etc.) to determine whether the corresponding voice command data belongs to a voice command of the same speaker. This is because voice command data received at substantially the same point in time by devices located close to each other is likely to correspond to a voice command of the same speaker.
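The check performed by the identical command determination unit (411) can be sketched as a grouping step over time stamps and locations. The description does not give concrete thresholds or a grouping algorithm, so the 0.5 s and 10 m limits, the greedy grouping against the first member of each group, and all names below are assumptions for illustration.

```python
import math
from typing import List, NamedTuple, Tuple


class Report(NamedTuple):
    device_id: str
    timestamp: float                # seconds, taken from the auxiliary data
    location: Tuple[float, float]   # (latitude, longitude) in degrees


def distance_m(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in meters (haversine formula)."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(h))


def group_same_command(reports: List[Report],
                       max_dt_s: float = 0.5,      # assumed time tolerance
                       max_dist_m: float = 10.0    # assumed distance tolerance
                       ) -> List[List[Report]]:
    """Group reports heard at nearly the same time by devices close together."""
    groups: List[List[Report]] = []
    for report in sorted(reports, key=lambda r: r.timestamp):
        for group in groups:
            anchor = group[0]  # compare against the earliest report in the group
            if (abs(report.timestamp - anchor.timestamp) <= max_dt_s
                    and distance_m(report.location, anchor.location) <= max_dist_m):
                group.append(report)
                break
        else:
            groups.append([report])  # no matching group: treat as a new command
    return groups
```

Two devices in the same room reporting within a fraction of a second end up in one group, while a device several kilometers away reporting at the same instant forms its own group.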

The voice recognition processing unit (413) analyzes the voice command and generates a corresponding voice recognition result, based on the voice command data sent by the devices that the identical command determination unit (411) has determined to have received the same voice command.

In addition, the output device determination unit (415) determines which of the devices determined by the identical command determination unit (411) to have received the same voice command will output the voice recognition result generated by the voice recognition processing unit (413). For example, the output device determination unit (415) may select the device to output the voice recognition result based on the auxiliary data. More specifically, the output device determination unit (415) may select the device whose auxiliary data time stamp indicates the earliest point in time, or the device with the highest reception sensitivity for the voice command. This is because the device with the earliest time stamp or the highest reception sensitivity is likely to be the device closest to the speaker.
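The two selection rules named above (earliest time stamp, highest reception sensitivity) can be sketched as follows. The dictionary keys `device_id`, `timestamp`, and `snr_db` and the `criterion` parameter are hypothetical names chosen for this example; the description does not say how a choice between the two rules is made.

```python
from typing import Dict, List


def pick_output_device(group: List[Dict], criterion: str = "timestamp") -> str:
    """Return the id of the device that should output the recognition result.

    `group` holds the reports already judged to carry the same voice command.
    """
    if not group:
        raise ValueError("group must contain at least one report")
    if criterion == "timestamp":
        # Earliest time stamp: sound reached this device first.
        best = min(group, key=lambda r: r["timestamp"])
    elif criterion == "sensitivity":
        # Highest reception sensitivity, expressed here as SNR in dB.
        best = max(group, key=lambda r: r["snr_db"])
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return best["device_id"]
```

The two rules can disagree when clocks are not tightly synchronized or when one device has a better microphone, so a real system would need a policy for choosing between them.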

The voice recognition result generated by the voice recognition processing unit (413) and the information about the output device determined by the output device determination unit (415) may be transmitted to the corresponding devices via the communication unit (420). At this time, different information may be transmitted to the output device determined by the output device determination unit (415) and to the remaining devices that received the same voice command. For example, the output device information transmitted to the determined output device may carry a flag or field value indicating that this device outputs the voice recognition result as the representative of the devices that received the same voice command, while the output device information transmitted to the remaining devices may omit this indication or may include information indicating that the device is not the representative device. Alternatively, the information about the output device may specify the form in which the voice recognition result is output. For example, the information sent to the determined output device may instruct it to output the voice recognition result as sound, and the information sent to the remaining devices may instruct them to output the voice recognition result in a form other than sound.
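The per-device messages described above might be assembled as in the following sketch. The description presents the representative flag and the output-form instruction as alternatives; this example carries both fields in one message purely for illustration, and the field names `representative` and `output_modes` are assumptions.

```python
from typing import Dict, Iterable


def build_result_messages(device_ids: Iterable[str],
                          representative_id: str,
                          result_text: str) -> Dict[str, dict]:
    """Build one result message per device that received the same command."""
    messages: Dict[str, dict] = {}
    for device_id in device_ids:
        is_representative = device_id == representative_id
        messages[device_id] = {
            "result": result_text,
            # Flag-style indication of the representative output device.
            "representative": is_representative,
            # Form-style indication: only the representative may use sound.
            "output_modes": ["sound", "display"] if is_representative else ["display"],
        }
    return messages
```

With devices 200A and 200B and 200B chosen as representative, only the message for 200B lists `"sound"` among its output modes, so the speaker hears a single spoken answer.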

Hereinafter, a voice recognition process according to the above-described device configuration will be described with reference to FIG. 4. FIG. 4 shows an example of a process in which voice recognition is performed in a voice recognition system according to one embodiment of the present invention.

For convenience of explanation, FIG. 4 assumes that two voice recognition capable devices (200A, 200B) are present around the speaker.

Referring to FIG. 4, when the speaker utters a voice command, each voice recognition device (200A, 200B) can receive the voice command (S410A, S410B) and add auxiliary data to it to generate voice recognition data (S420A, S420B).

Voice recognition data A generated by voice recognition device A (200A) and voice recognition data B generated by voice recognition device B (200B) can each be transmitted to the voice recognition server (400) (S430A, S430B). Of course, the time required to generate the voice recognition data and transmit it to the voice recognition server (400) may differ depending on the performance of each device, but the time stamp information for the same voice command will not differ substantially.

Based on the received voice recognition data, the voice recognition server determines whether the voice command data included in each piece of data corresponds to the same voice command (S440), generates a voice recognition result for the same voice command (S450), and, if the same voice command was received from multiple devices, determines which of those devices will output the voice recognition result, or the output form of the voice recognition result for each device (S460).

For example, if voice recognition device B is determined to be the representative output device, the voice recognition server (400), when transmitting voice recognition result information to voice recognition device B, may include in it information indicating that the device is the representative output device, or information instructing the device to output the voice recognition result with sound (S470B).

In contrast, since voice recognition device A is not the representative output device, the voice recognition server (400), when transmitting voice recognition result information to voice recognition device A, may include in it information indicating that the device is not the representative output device, or information instructing the device to output the voice recognition result without sound (S470A).

Accordingly, each device can output the voice recognition result in the form specified by the voice recognition result information transmitted by the voice recognition server (S480A, S480B).

Meanwhile, according to another embodiment of the present invention, the devices around the speaker may be connected to an external device to form a predetermined network, and the external device may perform both the function of determining identical voice commands and the function of selecting the device that outputs the result. This will be described with reference to FIGS. 5 and 6.

FIG. 5 is a drawing for explaining a system configuration according to another embodiment of the present invention.

Referring to FIG. 5, multiple devices (200A, 200B, 200C) supporting a voice recognition function are present around a speaker (100) (e.g., within a distance at which the voice of the speaker (100) can be recognized). In addition, each device (200A, 200B, 200C) is connected to an external device (500, e.g., an AVN system of a vehicle) to form a predetermined network. In addition, the external device (500) can communicate with a voice recognition server (400) via a wireless communication network (300).

The process by which voice recognition is performed in this situation is described with reference to FIG. 6.

FIG. 6 shows an example of a process in which voice recognition is performed in a voice recognition system according to another embodiment of the present invention. FIG. 6 assumes that each device (200A, 200B) is inside a vehicle and is wirelessly connected to the AVN system (500).

In FIG. 6, the process up to the point where each device (200A, 200B) recognizes the speaker's voice command and adds auxiliary data to generate voice recognition data (S410A, S410B, S420A, S420B) is the same as in FIG. 4, so redundant description is omitted.

Voice recognition data A generated by voice recognition device A (200A) and voice recognition data B generated by voice recognition device B (200B) can each be transmitted to the AVN system (500) (S630A, S630B). Of course, the time required to generate the voice recognition data and transmit it to the AVN system (500) may differ depending on the performance of each device, but the time stamp information for the same voice command will not differ substantially.

Based on the received voice recognition data, the AVN system (500) determines whether the voice command data included in each item corresponds to the same voice command (S640). It then sends a voice recognition request for the same voice command to the voice recognition server (400) and may receive the voice recognition result from that server (S650). When the same voice command was received from a plurality of devices, the AVN system (500) may determine which of those devices will output the voice recognition result, or the form in which each device will output it (S660).
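The same-command determination of step S640 can be sketched as a grouping by time stamp, since the description notes that time stamps for the same voice command do not differ substantially. This is a minimal sketch, not the patented method: the `VoiceRecognitionData` fields, the function name `group_same_commands`, and the 0.5-second tolerance are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoiceRecognitionData:
    device_id: str    # hypothetical identifier, e.g. "200A"
    timestamp: float  # assumed auxiliary data: capture time in seconds
    audio: bytes      # voice command data


def group_same_commands(
    items: List[VoiceRecognitionData], tolerance_s: float = 0.5
) -> List[List[VoiceRecognitionData]]:
    """Group items whose time stamps lie within tolerance_s of the
    earliest item in the group (assumed criterion for 'same command')."""
    groups: List[List[VoiceRecognitionData]] = []
    for item in sorted(items, key=lambda d: d.timestamp):
        if groups and item.timestamp - groups[-1][0].timestamp <= tolerance_s:
            groups[-1].append(item)
        else:
            groups.append([item])
    return groups
```

A real implementation would likely combine the time stamp with other auxiliary data; the time stamp alone is used here only because it is the one criterion this passage mentions.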

For example, if voice recognition device B is selected as the representative output device, the AVN system (500), when sending the voice recognition result information to device B, may include in it either information indicating that the device is the representative output device or information instructing the device to output the voice recognition result with sound (S670B).

Voice recognition device A, by contrast, is not the representative output device. When sending the voice recognition result information to device A, the AVN system (500) may therefore include either information indicating that the device is not the representative output device or information instructing the device to output the voice recognition result without sound (S670A).
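The per-device result information of steps S670A and S670B might be assembled as follows. This is an illustrative sketch under stated assumptions: the message keys `is_representative` and `play_sound`, and the function name `build_result_messages`, are hypothetical and do not come from the patent, which leaves the message format open. The representative device is passed in because its selection rule is not specified in this passage.

```python
from typing import Dict, List


def build_result_messages(
    result_text: str, device_ids: List[str], representative_id: str
) -> Dict[str, dict]:
    """Build one result message per device. Only the representative
    device is told to output the result with sound."""
    if representative_id not in device_ids:
        raise ValueError("representative device must be one of the devices")
    return {
        device_id: {
            "result": result_text,
            # Hypothetical flag: marks the representative output device.
            "is_representative": device_id == representative_id,
            # Hypothetical flag: output with sound only on the representative.
            "play_sound": device_id == representative_id,
        }
        for device_id in device_ids
    }
```

With devices `"200A"` and `"200B"` and `"200B"` as the representative, device B receives a message with `play_sound` set to `True` and device A one with `play_sound` set to `False`, while both carry the same result text.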

Accordingly, each device can output the voice recognition result in a form corresponding to the voice recognition result information transmitted by the AVN system (500) (S680A, S680B).

To perform the functions described above with reference to FIG. 6, the AVN system (500) preferably has at least a first communication unit that maintains short-range wireless connections with a plurality of devices simultaneously, a second communication unit capable of wireless communication with the voice recognition server (400), and a control unit that includes components corresponding to the same command determination unit (411) and the output device determination unit (415) of the voice recognition server (400) described above with reference to FIG. 3.
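One way to picture this arrangement is a control unit that is handed the two communication units as callables. The sketch below is hypothetical throughout: the class name `AvnController`, the callable signatures, the message keys, and the rule of picking the first device in the group as the representative are placeholders, not details taken from the patent.

```python
from typing import Callable, List


class AvnController:
    """Hypothetical control unit wiring the two communication units together."""

    def __init__(
        self,
        send_to_device: Callable[[str, dict], None],
        recognize: Callable[[bytes], str],
    ) -> None:
        # Stands in for the first communication unit (short-range link to devices).
        self.send_to_device = send_to_device
        # Stands in for the second communication unit (link to the server).
        self.recognize = recognize

    def handle(self, same_command_group: List[dict]) -> str:
        """Process one group of items already judged to be the same command.
        Returns the identifier of the representative output device."""
        # In this sketch, a single server request per command (S650).
        result = self.recognize(same_command_group[0]["audio"])
        # Placeholder selection rule (S660): first device in the group.
        representative = same_command_group[0]["device_id"]
        for item in same_command_group:
            is_rep = item["device_id"] == representative
            self.send_to_device(
                item["device_id"],
                {"result": result, "is_representative": is_rep, "play_sound": is_rep},
            )
        return representative
```

Injecting the communication units as callables keeps the control logic testable without a real short-range link or server connection.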

The present invention described above can be implemented as computer-readable code on a medium on which a program is recorded. The computer-readable medium includes every kind of recording device that stores data readable by a computer system. Examples of the computer-readable medium include a hard disk drive (HDD), a solid state disk (SSD), a silicon disk drive (SDD), ROM, RAM, CD-ROM, magnetic tape, a floppy disk, and an optical data storage device.

Accordingly, the above detailed description should not be construed as restrictive in any respect and should be considered illustrative. The scope of the present invention should be determined by a reasonable interpretation of the appended claims, and all changes within the equivalent scope of the present invention are included in the scope of the present invention.