KR20240068017A


Info

Publication number: KR20240068017A
Application number: KR1020220148053A
Authority: KR
Inventors: 정민영; 장진예; 김산; 신사임
Original assignee: 한국전자기술연구원 (Korea Electronics Technology Institute)
Priority date: 2022-11-08
Filing date: 2022-11-08
Publication date: 2024-05-17
Also published as: WO2024101615A1

Abstract

According to the present disclosure, a turn-free conversation device is provided that includes one or more processors and a memory communicatively connected to the processors and storing program code executed by the processors. The program code is written to determine, each time a part of a user's utterance is input, whether it is time to provide an intermediate response, and to output an intermediate response corresponding to the user's utterance. By giving the user verbal, auditory, and visual responses while the user is still speaking a sentence, the device makes the user want to continue the conversation, gives the user the feeling of talking to a person, and reduces the user's aversion to the device.

Description


Turn-free conversation method and apparatus

This disclosure relates to a turn-free conversation method and apparatus.

Conversation services that use artificial intelligence (AI) to converse with people have recently become available. An AI device can execute designated functions based on a conversation with a person or answer a person's questions. The AI conversation services currently in operation are turn-based: when the person's turn to speak a sentence ends, the AI takes its turn, recognizes the sentence the person said, and provides an answer. This turn-taking scheme is far from how real conversations between people proceed.

KR 10-2020-0000604 A

The present disclosure provides a turn-free conversation method and device that can give verbal, auditory, and visual responses to a user while the user is still speaking a sentence.

This work was supported by (1) the Future Strategic Technology Development Program, part of the Basic Research Program of the Korea Electronics Technology Institute, under the project 'Research on AI complex dialogue system technology capable of multimodal interaction and knowledge-based discussion' (project number: 401C2906, contribution rate: 1/2); and (2) the Human-Centered AI Core Source Technology Development Program of the Ministry of Science and ICT, under the project '(Subproject 1) Multimodal interaction AI technology for building rapport with humans' (project ID: 1711160496, contribution rate: 1/2).

A turn-free conversation device according to the present disclosure includes one or more processors and a memory that is communicatively connected to the processors and stores program code executed by the processors. The program code, when executed by the processors, may include a turn-free conversation model that, each time a part of the user's utterance is input, determines whether it is time to provide an intermediate response and outputs an intermediate response corresponding to the user's utterance.

According to one embodiment, the intermediate response includes a verbal response expressed as a word or sentence, an auditory response expressed as a sound, and a visual response expressed as an emoticon, facial expression, or gesture, and may indicate that the user's utterance is being recognized.

According to one embodiment, the turn-free conversation model is generated by training on a training dataset built as follows: the first speaker's utterance is divided, in input order, into fragments of a predetermined size; each fragment is assigned a timestamp; and at each timestamp the fragments of all previous timestamps are cumulatively concatenated to form a fragment group. The fragment groups are the training data, and the second speaker's utterance for the fragment group at each timestamp is the label data. At inference time, the user's utterance is likewise divided in input order into fragments of a predetermined size, each fragment is timestamped, and at each timestamp the fragments of previous timestamps are cumulatively concatenated into a fragment group. When this fragment group is input to the turn-free conversation model, the model determines whether it is time to provide an intermediate response and outputs the intermediate response.
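The cumulative fragment grouping described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `make_fragment_groups`, joining fragments with spaces, and using arrival order as the timestamp are assumptions, not details given in the disclosure.

```python
from typing import List, Tuple


def make_fragment_groups(fragments: List[str]) -> List[Tuple[int, str]]:
    """Pair each timestamp with its cumulative fragment group.

    Timestamps are 1-based arrival indices (an assumption; elapsed time
    would work equally well). The group at timestamp t joins fragments 1..t.
    """
    groups: List[Tuple[int, str]] = []
    accumulated: List[str] = []
    for t, fragment in enumerate(fragments, start=1):
        accumulated.append(fragment)               # keep every earlier fragment
        groups.append((t, " ".join(accumulated)))  # cumulative concatenation
    return groups


for t, group in make_fragment_groups(["today", "weather", "how is it?"]):
    print(t, group)
```

This prints `1 today`, `2 today weather`, and `3 today weather how is it?`, one fragment group per timestamp.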

According to one embodiment, when the utterances of two people overlap in the collected conversation data, the training data is prepared by comparing the lengths of the sentences containing the overlapping utterances: the utterance of the speaker with the longer sentence is designated as the first speaker's utterance, and the utterance of the speaker with the shorter sentence is designated as the second speaker's utterance.
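A minimal sketch of this speaker-role rule follows. It assumes each utterance carries start and end times in seconds and that sentence length is measured in characters; the disclosure does not state the unit, and ties are resolved in favor of the first argument purely as an assumption. `Utterance` and `assign_roles` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Utterance:
    speaker: str
    text: str     # the full sentence containing the utterance
    start: float  # seconds
    end: float    # seconds


def overlaps(a: Utterance, b: Utterance) -> bool:
    """True if the two utterances share any stretch of time."""
    return a.start < b.end and b.start < a.end


def assign_roles(a: Utterance, b: Utterance) -> Tuple[Utterance, Utterance]:
    """Return (first_speaker, second_speaker) for two overlapping utterances."""
    if not overlaps(a, b):
        raise ValueError("utterances do not overlap")
    # Longer sentence -> first speaker (training input);
    # shorter sentence -> second speaker (label). Ties keep `a` first.
    if len(a.text) >= len(b.text):
        return a, b
    return b, a


talker = Utterance("A", "today weather how is it?", 0.0, 3.0)
listener = Utterance("B", "yeah", 0.5, 0.8)
first, second = assign_roles(listener, talker)
print(first.speaker, second.speaker)  # A B
```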

According to one embodiment, the program code, when executed by the processor, may further include a turn-based conversation model that, once the user's entire utterance has been input, analyzes its content and provides a substantive response.

According to one embodiment, the processor runs the turn-free conversation model and the turn-based conversation model independently of each other: the turn-free conversation model provides intermediate responses while the user is speaking, and the turn-based conversation model provides a substantive response once the user's utterance is complete.

A turn-free conversation method according to the present disclosure may include: receiving a user's utterance; dividing the user's utterance, in input order, into fragments of a predetermined size, assigning each fragment a timestamp, and, at each timestamp, cumulatively concatenating the fragments of previous timestamps to generate a fragment group; inputting the fragment group generated at each timestamp into a turn-free conversation model to obtain an intermediate response; and providing the intermediate response to the user.
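The method steps can be sketched as a streaming loop. This is a sketch under stated assumptions: the model is any callable that maps a fragment group to an output string, it signals "no response" with the code 'C3빈공간' used in the detailed description, and `toy_model` is a hypothetical stand-in for a trained turn-free conversation model.

```python
from typing import Callable, Iterable, List, Tuple

NO_RESPONSE = "C3빈공간"  # output code meaning "no intermediate response now"


def run_turn_free(fragments: Iterable[str],
                  model: Callable[[str], str],
                  respond: Callable[[int, str], None]) -> None:
    accumulated: List[str] = []
    for t, fragment in enumerate(fragments, start=1):  # timestamp = arrival order
        accumulated.append(fragment)
        group = " ".join(accumulated)                   # cumulative fragment group
        output = model(group)                           # query the turn-free model
        if output != NO_RESPONSE:
            respond(t, output)                          # provide the intermediate response


def toy_model(group: str) -> str:
    """Hypothetical stand-in: backchannel only after the first fragment."""
    return "yeah" if len(group.split()) == 1 else NO_RESPONSE


log: List[Tuple[int, str]] = []
run_turn_free(["today", "weather", "how is it?"], toy_model,
              lambda t, r: log.append((t, r)))
print(log)  # [(1, 'yeah')]
```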

The turn-free conversation method according to one embodiment further includes generating the turn-free conversation model. Generating the model may include: generating a training dataset by dividing the first speaker's utterance, in input order, into fragments of a predetermined size, assigning each fragment a timestamp, and cumulatively concatenating the fragments of previous timestamps at each timestamp to generate fragment groups, where the fragment groups are the training data and the second speaker's utterance for the fragment group at each timestamp is the label data; and using the training dataset to train the turn-free conversation model so that, when a fragment group generated at a timestamp is input, the model determines whether it is time to provide an intermediate response and outputs the intermediate response.

According to one embodiment, generating the training dataset further includes separating training data from label data: when the utterances of two people overlap in the collected conversation data, the lengths of the sentences containing the overlapping utterances are compared, the utterance of the speaker with the longer sentence is designated as the first speaker's utterance, and the utterance of the speaker with the shorter sentence is designated as the second speaker's utterance. The first speaker's utterance is then converted into fragment groups used as input data, and the second speaker's utterance is used as label data.

The turn-free conversation method according to the present disclosure may further use a turn-based conversation model, which analyzes the content of the user's utterance once the whole utterance has been input and provides a substantive response. The method then further includes: inputting the utterance into the turn-based conversation model when the user's utterance is complete, to obtain a substantive response; and providing the substantive response to the user.

According to one embodiment, obtaining the intermediate response and obtaining the substantive response are performed independently of each other, so that intermediate responses are provided while the user is speaking and a substantive response is provided once the user's utterance is complete.

The features and advantages of the present disclosure will become more apparent from the following detailed description with reference to the accompanying drawings.

Terms and words used in this specification and the claims should not be interpreted in their ordinary or dictionary meaning. Based on the principle that an inventor may appropriately define the concepts of terms to best describe the invention, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention.

According to one embodiment of the present disclosure, the turn-free conversation device provides verbal, auditory, and visual responses to the user while the user is speaking a sentence, which can make the user want to continue the conversation with the device.

According to one embodiment of the present disclosure, by providing verbal, auditory, and visual responses, the turn-free conversation device can give the user the feeling of talking to a person and reduce the user's aversion to the device.

Figure 1 is a diagram illustrating a conversation between a user and a turn-free conversation device according to one embodiment.
Figure 2 is a diagram showing a turn-free conversation device according to one embodiment.
Figure 3 is a diagram explaining a training dataset for a turn-free conversation model according to one embodiment.
Figure 4 is a diagram showing a turn-free conversation method according to one embodiment.
Figure 5 is a diagram illustrating a user's utterance and the responses to it according to one embodiment.

The objects, advantages, and features of the present disclosure will become more apparent from the following detailed description and preferred embodiments taken in conjunction with the accompanying drawings, but the present disclosure is not necessarily limited thereto. In describing the present disclosure, detailed descriptions of related known technologies are omitted where they might unnecessarily obscure its gist.

When assigning reference numerals to components in the drawings, identical components are given the same reference numerals wherever possible, even when they appear in different drawings, and similar components are given similar reference numerals.

The terms used to describe embodiments of the present disclosure are not intended to limit it. Singular expressions include plural expressions unless the context clearly dictates otherwise.

In this document, expressions such as "has," "may have," "includes," or "may include" indicate the presence of the corresponding feature (e.g., a numerical value, function, operation, or component such as a part) and do not exclude the presence of additional features.

"일(one)", "다른(other)", "또다른(another)", "제1(first)", "제2(second)" 등의 용어는 하나의 구성요소를 다른 구성요소로부터 구별하기 위해 사용되는 것으로, 구성요소가 상기 용어들에 의해 제한되는 것은 아니다."one", "other", "another", "first", "second" Terms such as are used to distinguish one component from another component, and the components are not limited by the above terms.

The embodiments described in this document and the accompanying drawings are not intended to limit the present disclosure to specific embodiments. The present disclosure should be understood to include various modifications, equivalents, and/or alternatives of the embodiments.

Figure 1 is a diagram illustrating a conversation between a turn-free conversation device 30 and a user 10 according to one embodiment.

The turn-free conversation device 30 according to one embodiment can respond to the utterances of the user 10. When the user 10 inputs an utterance into the device 30, the device can provide an intermediate response or a substantive response. An intermediate response is one the device 30 gives while the user 10 is still speaking; it can be expressed verbally, auditorily, or visually, including by non-verbal means. A substantive response is one the device 30 gives once the user 10 has finished the utterance, with content that addresses that utterance.

The turn-free conversation device 30 can be connected to a remote terminal 20 through a network, and the user 10 can converse with the device 30 through the remote terminal 20. The remote terminal 20 may be included in a smartphone, an AI assistant terminal, a tablet PC, a laptop PC, an air purifier, a smart TV, a car, and the like. The remote terminal 20 receives the utterance of the user 10 through a microphone and passes it to the device 30, then receives from the device 30 the response intended for the user 10 and outputs it through a speaker.

Intermediate responses may include verbal responses expressed as words or sentences, auditory responses expressed as sounds, and visual responses expressed as emoticons, facial expressions, or gestures. An intermediate response indicates that the utterance of the user 10 is being recognized.

Because it is given before the user 10 finishes the utterance, an intermediate response shows the user 10 that the turn-free conversation device 30 is following what is being said. Providing intermediate responses can give the user 10 the feeling of talking to a person, reduce the user's aversion, and make the user want to continue the conversation.

Figure 2 is a diagram illustrating a turn-free conversation device 30 according to one embodiment.

The turn-free conversation device 30 according to one embodiment may be a computer device that performs information processing, such as a PC, a server computer, or a tablet PC. The device 30 includes one or more processors 31 and a memory 32 that is communicatively connected to the processors 31 and stores program code 100 executed by the processors 31. The program code 100, executed by the processor 31, may include a turn-free conversation model 110 that, each time a part of the utterance of the user 10 is input, determines whether it is time to provide an intermediate response and outputs an intermediate response corresponding to the utterance. The device 30 may further include a communication unit 33 that connects to a network to transmit and receive data, or an input/output unit 34 that receives the utterance of the user 10 and provides responses to the user 10. The processor 31, memory 32, communication unit 33, and input/output unit 34 may be communicatively connected to one another.

The processor 31 can execute the program code 100 written to perform the turn-free conversation method. The processor 31 may include a CPU, a GPU, or other information processing elements.

The memory 32 can store the program code 100 executed by the processor 31 and other data needed to perform the turn-free conversation method. The memory 32 may include a hard disk, a memory chip, a database, cloud storage, and the like.

The communication unit 33 can use known communication schemes, such as short-range wireless communication (Wi-Fi, Bluetooth, Zigbee), mobile communication (LTE, 5G, 6G), and wired or network protocols (Ethernet, LAN, IPv4, IPv6). The communication unit 33 may be connected to the remote terminal 20 through a wired or wireless network.

The input/output unit 34 may include a microphone, a speaker, and a display. It can receive the utterance of the user 10 through the microphone, output responses audibly through the speaker, and output responses visually through the display.

The processor 31 can perform the turn-free conversation method by reading and executing the program code 100 stored in the memory 32. In this document, 'turn-free' refers to a scheme that can respond even before the user has finished speaking, and 'turn-based' refers to a scheme that responds only after the user has finished speaking.

The program code 100 may include the turn-free conversation model 110, an AI model generated by training on a training dataset. As parts of the utterance of the user 10 arrive in order, the model 110 determines, immediately upon receiving each part, whether it is time to provide an intermediate response and, if so, outputs one. The processor 31 can deliver the intermediate response output by the model 110 to the user 10 through the input/output unit 34 or the remote terminal 20.

The turn-free conversation model 110 can be implemented with models such as an RNN, an LSTM, a Seq2Seq model, or a Transformer-based pretrained model such as BERT, GPT, or T5. It can also be implemented with models of other architectures not described in this document. By training on the training dataset described below, the model 110 can learn whether it is time to provide an intermediate response and which intermediate response is appropriate.

The program code 100, executed by the processor 31, may further include a turn-based conversation model 120 that, once the entire utterance of the user 10 has been input, analyzes its content and provides a substantive response. The turn-based conversation model 120 is an AI model generated using a known method. It can analyze the utterance of the user 10 and output a substantive response whose content corresponds to that utterance. The processor 31 can deliver this substantive response to the user 10 through the input/output unit 34 or the remote terminal 20.

The processor 31 runs the turn-free conversation model 110 and the turn-based conversation model 120 independently of each other: the turn-free model 110 provides intermediate responses while the user 10 is speaking, and the turn-based model 120 provides a substantive response once the user 10 has finished. The turn-free conversation device 30 can therefore give intermediate responses during the utterance and a substantive response after it. As a result, the user 10 can obtain information or functions from the device 30 while enjoying the naturalness of talking with a person.
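The independent execution described above can be sketched with two Python threads. This is an assumption-laden illustration: the disclosure does not specify threads, queues, or an end-of-utterance signal. Here a `queue.Queue` carries fragments to the turn-free side, a `threading.Event` marks the end of the utterance for the turn-based side, and both "models" are toy rules standing in for models 110 and 120. The helper names are hypothetical.

```python
import queue
import threading
from typing import List, Optional, Tuple

END: Optional[str] = None  # sentinel: the user's utterance is complete (assumption)


def turn_free_worker(fragments: "queue.Queue[Optional[str]]", out: List[str]) -> None:
    """Consumes fragments as they arrive; a toy rule stands in for model 110."""
    group: List[str] = []
    while True:
        fragment = fragments.get()
        if fragment is END:
            break
        group.append(fragment)
        if len(group) == 1:  # toy rule: backchannel after the first fragment
            out.append("intermediate: yeah")


def turn_based_worker(done: threading.Event, heard: List[str], out: List[str]) -> None:
    """Waits until the utterance is complete; a toy reply stands in for model 120."""
    done.wait()
    out.append("substantive: reply to '" + " ".join(heard) + "'")


def demo(fragments: List[str]) -> Tuple[List[str], List[str]]:
    fragment_queue: "queue.Queue[Optional[str]]" = queue.Queue()
    done = threading.Event()
    heard: List[str] = []
    intermediate: List[str] = []  # separate output lists avoid ordering races
    substantive: List[str] = []
    workers = [
        threading.Thread(target=turn_free_worker, args=(fragment_queue, intermediate)),
        threading.Thread(target=turn_based_worker, args=(done, heard, substantive)),
    ]
    for w in workers:
        w.start()
    for fragment in fragments:  # fragments arrive while the user is speaking
        heard.append(fragment)
        fragment_queue.put(fragment)
    fragment_queue.put(END)
    done.set()  # end of utterance releases the turn-based worker
    for w in workers:
        w.join()
    return intermediate, substantive


print(demo(["today", "weather", "how is it?"]))
```

Because each model writes to its own list and neither waits on the other, the turn-free side can respond mid-utterance while the turn-based side stays idle until the utterance ends.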

Figure 3 is a diagram illustrating a training dataset of the turn-free conversation model 110 according to one embodiment.

The turn-free conversation model 110 can be generated by training on a training dataset constructed as follows: the utterance of the first speaker 210 is divided, in input order, into fragments of a predetermined size; each fragment is assigned a timestamp; and at each timestamp the fragments of previous timestamps are cumulatively concatenated to form a fragment group. The fragment groups are the training data, and the utterance of the second speaker 220 corresponding to the fragment group at each timestamp is the label data.

The turn-free conversation model 110 can be generated by training an AI model on the training dataset, which may include training data and label data for that training data. The training dataset can be built from conversation data 200 between people, which can be obtained from open-source datasets or from broadcast programs such as dramas, movies, and TV shows.

The conversation data 200 may exist in sound form (SF). Utterances can be extracted from the sound-form conversation data 200 to generate the training dataset, and the sound-form data can be converted into text form (TF) using a speech-to-text (STT) engine.

While generating text-form conversation data 200 from sound-form conversation data 200, speakers and sentences can be separated. Speaker separation technology splits the mixed utterances of multiple speakers into each speaker's utterances, and voice activity detection (VAD) then splits the several sentences spoken by one speaker into individual sentences.
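The disclosure names VAD without specifying an algorithm. As a stand-in, the sketch below uses a simple energy threshold over fixed-size frames, a common baseline but not necessarily what the authors used; production systems typically rely on trained VAD models. The function name and the `threshold` and `min_silence_frames` parameters are hypothetical.

```python
from typing import List, Optional, Tuple


def energy_vad_segments(samples: List[float], frame_len: int,
                        threshold: float, min_silence_frames: int) -> List[Tuple[int, int]]:
    """Return (start_sample, end_sample) spans of detected speech.

    A frame is "voiced" if its mean squared amplitude exceeds `threshold`.
    A run of at least `min_silence_frames` unvoiced frames closes the
    current segment, so each segment approximates one spoken sentence.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first voiced frame of the open segment
    silence = 0                  # consecutive unvoiced frames since last voiced frame
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                end = i - silence + 1  # one past the last voiced frame
                segments.append((start * frame_len, end * frame_len))
                start, silence = None, 0
    if start is not None:  # close a segment still open at the end of the audio
        end = n_frames - silence
        segments.append((start * frame_len, end * frame_len))
    return segments


audio = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0]  # toy signal: speech, pause, speech
print(energy_vad_segments(audio, frame_len=2, threshold=0.5, min_silence_frames=2))
# [(0, 4), (8, 10)]
```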

For example, when two speakers are having a conversation, speaker separation distinguishes the utterances of the first speaker 210 from those of the second speaker 220, and voice activity detection splits the first speaker's utterances into separate sentences and likewise the second speaker's utterances.

When generating the text-form conversation data 200, each utterance can be divided into fragments of a predetermined size. A fragment is part of an utterance of the user 10. The fragment size can be defined in terms of time: for example, with 1-second fragments, a 3-second utterance is divided into three fragments. Alternatively, the fragment size can be defined in terms of syllables or words (eojeol, the space-delimited units of Korean), so that the utterance of the user 10 is divided according to a number of words.
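Both fragment-size options can be sketched as follows. The sketch assumes the STT step yields each word with its start time in seconds, a format the disclosure does not specify; the function names are hypothetical. Time windows with no words become empty strings so that a 3-second utterance always yields three fragments.

```python
import math
from typing import List, Tuple


def fragments_by_time(words: List[Tuple[str, float]], duration: float,
                      seconds: float = 1.0) -> List[str]:
    """Split into fixed-length time windows; a window with no words yields ''."""
    n_windows = max(1, math.ceil(duration / seconds))
    windows: List[List[str]] = [[] for _ in range(n_windows)]
    for word, start in words:
        index = min(int(start // seconds), n_windows - 1)  # clamp words at the very end
        windows[index].append(word)
    return [" ".join(w) for w in windows]


def fragments_by_words(words: List[str], n: int = 1) -> List[str]:
    """Split into fragments of n words each (the last may be shorter)."""
    return [" ".join(words[i:i + n]) for i in range(0, len(words), n)]


timed = [("today", 0.1), ("weather", 1.2), ("how", 2.0), ("is", 2.3), ("it?", 2.6)]
print(fragments_by_time(timed, duration=3.0))        # ['today', 'weather', 'how is it?']
print(fragments_by_words([w for w, _ in timed], 2))  # ['today weather', 'how is', 'it?']
```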

When dividing the utterance of the user 10 into fragments, each fragment can be assigned the timestamp at which it was uttered. A timestamp may indicate the order of the fragments, the interval between fragments, or the actual elapsed time from the first fragment of the utterance (taken as 0 seconds) to each later fragment.
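The three timestamp readings can be derived from fragment start times, as sketched below. This assumes start times in seconds are available for each fragment; `Stamp` and `make_stamps` are hypothetical names, and the disclosure allows any one of these readings rather than requiring all three.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stamp:
    order: int      # position of the fragment in the utterance (1-based)
    gap: float      # seconds since the previous fragment started (0.0 for the first)
    elapsed: float  # seconds since the first fragment started


def make_stamps(starts: List[float]) -> List[Stamp]:
    """Derive order, gap, and elapsed-time timestamps from start times (seconds)."""
    stamps: List[Stamp] = []
    for i, start in enumerate(starts):
        gap = 0.0 if i == 0 else start - starts[i - 1]
        stamps.append(Stamp(order=i + 1, gap=gap, elapsed=start - starts[0]))
    return stamps


for s in make_stamps([2.0, 3.0, 4.5]):
    print(s)
```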

A training dataset can be created from the fragmented utterances of the user 10 and their timestamps. The dataset consists of training data and label data, with one example created per timestamp. The training data for a given timestamp contains the piece for that timestamp plus the accumulated pieces of all earlier timestamps. For example, the utterance of the first speaker 210 may be divided into three pieces, which receive a first, second, and third timestamp. At the first timestamp (t1), the training data is the piece 'today', and the label data is 'yeah', which is what the second speaker 220 said. At the second timestamp, the training data is a fragment group containing 'today' from the first timestamp and 'weather' from the second. The second speaker 220 says nothing at this point, so the label data is 'empty space'. At the third timestamp, the fragment group contains 'today', 'weather', and 'how is it?' from the first through third timestamps. The second speaker 220 again says nothing, so the label data is again 'empty space'. A label of 'empty space' means that no label data exists, and the drawing shows it as a dotted box. 'Empty space' can be stored as 'C3 empty space', a code that corresponds to an entry in the lookup table. When the turn-free conversation model 110 outputs 'C3 empty space', the processor 31 may determine that no intermediate response should be provided.
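The dataset construction described above can be written as a short function. This is a minimal sketch. It assumes that the second speaker's responses arrive as a dictionary keyed by timestamp index, which is a hypothetical format. The `EMPTY` string mirrors the 'C3 empty space' code in the text.

```python
EMPTY = "C3 empty space"  # code meaning "no intermediate response at this timestamp"


def build_training_examples(fragments, responses):
    """Pair each cumulative fragment group with the listener's response.

    fragments: the first speaker's fragments in timestamp order.
    responses: dict mapping a 1-based timestamp index to the second speaker's
               response at that timestamp (hypothetical format). Timestamps
               without an entry receive the EMPTY label.
    """
    examples = []
    for t in range(1, len(fragments) + 1):
        group = fragments[:t]            # pieces of timestamps 1..t, accumulated
        label = responses.get(t, EMPTY)  # no response from the listener -> empty space
        examples.append((group, label))
    return examples
```

Applied to the example in the text, `build_training_examples(["today", "weather", "how is it?"], {1: "yeah"})` gives one example per timestamp. The first is labelled 'yeah' and the other two are labelled 'C3 empty space'.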

The utterance of the second speaker 220 serves as the label data and corresponds to an intermediate response. It may be a verbal response expressed in words or sentences, such as 'yeah', 'no', 'so', 'and', or 'go on'. It may also be an auditory response expressed as sound, including non-verbal vocalizations such as a sigh, a surprised exclamation, 'um...', or 'uh...'.

In place of the second speaker 220's utterance, a visual response such as an emoticon, facial expression, or gesture may be added. Examples include a surprised, happy, sad, or angry expression. The visual response may be presented to the user 10 on the display of the input/output unit 34.

When the training dataset is created, verbal and auditory responses can be extracted from the conversation data 200 using voice activity detection. Visual responses can be assigned using separate data. In the label data, verbal responses can be stored as text, while auditory and visual responses can be stored as codes in a lookup table kept in the memory 32. For example, the lookup table may map the following responses to codes:
- the auditory response 'Ah!' to 'C1 surprise';
- the auditory response 'Hmm...' to 'C1 hmm';
- the visual response 'surprised expression' to 'C2 surprise';
- the visual response 'angry expression' to 'C2 angry'.

The label data uses codes such as 'C1 sigh', 'C1 hmm', 'C2 surprise', and 'C2 angry', so the trained turn-free conversation model 110 learns to output these codes. When the model outputs text, the processor 31 delivers it as a verbal response in sound or text form. When the model outputs a code, the processor 31 looks the code up in the lookup table and provides the corresponding auditory or visual response.
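The processor's handling of model outputs can be pictured as a small dispatch function. This is a sketch only. The table entries mirror the examples in the text, but the `(modality, rendering)` tuple format and the rendering strings are hypothetical placeholders. The rule "anything not in the table is plain text" is also an assumption.

```python
# Hypothetical lookup table: code -> (modality, rendering).
LOOKUP = {
    "C1 surprise": ("auditory", "Ah!"),
    "C1 hmm": ("auditory", "Hmm..."),
    "C2 surprise": ("visual", "surprised expression"),
    "C2 angry": ("visual", "angry expression"),
}
EMPTY = "C3 empty space"


def render(model_output):
    """Turn a model output into the response the processor would present."""
    if model_output == EMPTY:
        return None                      # no intermediate response at this timestamp
    if model_output in LOOKUP:
        return LOOKUP[model_output]      # auditory or visual response via the table
    return ("verbal", model_output)      # plain text: speak it or display it as-is
```

Keeping non-verbal responses as codes means that the model's output vocabulary stays small, and the device can change how a code is rendered without retraining.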

By learning from a training dataset generated in this way, the turn-free conversation model 110 learns two things: when to provide an intermediate response while the user 10 is speaking, and which intermediate response to provide.

Speaker separation requires deciding whose utterances go into the training data and whose go into the label data. In one implementation, this decision is made when the utterances of two people overlap in the collected conversation data 200. The lengths of the sentences containing the overlapping utterances are compared. The utterance of the speaker with the longer sentence is designated as the utterance of the first speaker 210. The utterance of the speaker with the shorter sentence is designated as the utterance of the second speaker 220. Utterances overlap when one speaker speaks while another speaker is still speaking, so that two or more sentences exist at the same time.

Referring to FIG. 3, the utterances of two different people ('today' and 'yeah') overlap at the first timestamp. In this case, the speaker of the longer sentence is determined to be the first speaker 210, and the longer sentence is placed in the training data. In the present disclosure, the turn-free conversation model outputs intermediate responses in order to react appropriately while the user 10 is still speaking. Therefore, the shorter of two overlapping utterances is treated as the utterance of the second speaker 220 and placed in the label data.
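The overlap rule can be sketched in a few lines. The text does not say how sentence length is measured, so this sketch counts characters; words or duration would work equally well. Breaking ties in favour of the first argument is also an assumption. The interval format `(start_s, end_s)` is hypothetical.

```python
def overlaps(a, b):
    """a, b: (start_s, end_s) intervals. True if they share any span of time."""
    return a[0] < b[1] and b[0] < a[1]


def assign_roles(utt_a, utt_b):
    """Return (first_speaker_utterance, second_speaker_utterance).

    The longer sentence goes to the first speaker (training data) and the
    shorter one to the second speaker (label data). Length is measured in
    characters here, which is an assumption.
    """
    if len(utt_a) >= len(utt_b):
        return utt_a, utt_b
    return utt_b, utt_a
```

For the FIG. 3 case, `assign_roles("yeah", "today weather how is it?")` puts the full sentence in the first-speaker role and 'yeah' in the second-speaker role.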

The turn-free conversation model 110 trained on this dataset works as follows. The user 10's utterance is divided into pieces of a fixed size in the order it is input, and each piece receives a timestamp. At each timestamp, the pieces of all previous timestamps are cumulatively joined into a fragment group, which is input to the turn-free conversation model 110. The model then determines whether it is time to provide an intermediate response and, if so, outputs one.

To use the turn-free conversation model 110, the processor 31 separates the user 10's utterance into pieces, generates a fragment group for each timestamp, and inputs each fragment group into the model. The user 10 may provide the utterance as speech or as text. The user 10 can speak into the turn-free conversation device 30 or the remote terminal 20, or type on a keyboard of either one. The processor 31 divides the utterance into pieces of a fixed size in the order they are input and assigns each piece a timestamp.

When the processor 31 inputs the piece for the first timestamp, the turn-free conversation model 110 outputs either a learned intermediate response or the learned 'empty space'. The same happens when the processor 31 inputs the fragment group for the second timestamp, which holds the pieces of the first and second timestamps. The processor 31 may deliver the intermediate response output by the turn-free conversation model 110 to the user 10 through the input/output unit 34 or the remote terminal 20.
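The processor-side loop can be sketched as a function that feeds one cumulative fragment group to the model per timestamp. Here `model` is any callable standing in for the trained turn-free conversation model 110. The callable interface is an assumption, and a real system would process fragments as they stream in rather than from a finished list.

```python
from typing import Callable, Iterable, List, Optional

EMPTY = "C3 empty space"


def run_turn_free(fragments: Iterable[str],
                  model: Callable[[List[str]], str]) -> List[Optional[str]]:
    """Call the model once per timestamp with the cumulative fragment group.

    Returns, for each timestamp, the intermediate response, or None when the
    model outputs the EMPTY code (no response at that timestamp).
    """
    group: List[str] = []
    out: List[Optional[str]] = []
    for frag in fragments:           # fragments arrive in spoken order
        group.append(frag)           # group at timestamp t = pieces 1..t
        result = model(list(group))  # pass a copy so the model cannot alter the group
        out.append(None if result == EMPTY else result)
    return out
```

Using a toy model that answers 'yeah' only to `["today"]`, the loop returns `["yeah", None, None]` for the three-piece example.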

FIG. 4 is a diagram showing a turn-free conversation method according to one implementation. FIG. 5 is a diagram illustrating a user's utterance and the responses to it according to one implementation. FIGS. 4 and 5 are referred to together.

The turn-free conversation method may include the following steps:
- receiving the utterance of the user 10 (S10);
- creating fragment groups (S20): the utterance is divided into pieces of a fixed size in the order it is input, each piece receives a timestamp, and at each timestamp the pieces of previous timestamps are cumulatively joined into a fragment group;
- inputting the fragment group created for each timestamp into the turn-free conversation model 110 to obtain an intermediate response (S30);
- providing the intermediate response to the user 10 (S40).

In the step of receiving the utterance (S10), the turn-free conversation device 30 receives the utterance of the user 10. The input/output unit 34 may receive the utterance as sound through a microphone or as text through a keyboard. The utterance is received sequentially, in real time.

In the step of creating fragment groups (S20), the processor 31 divides the utterance received through the input/output unit 34 into pieces of a fixed size in the order they are input and assigns each piece a timestamp. For each timestamp, the processor 31 creates a fragment group containing the piece for that timestamp together with the accumulated pieces of all earlier timestamps. As the utterance comes in, the processor 31 therefore produces a sequence of fragment groups.

In the step of obtaining an intermediate response (S30), the processor 31 inputs a fragment group into the turn-free conversation model 110. The model outputs either an intermediate response or 'empty space' for that fragment group.

In the step of providing the intermediate response (S40), the processor 31 delivers the intermediate response output by the turn-free conversation model 110 to the user 10 through the input/output unit 34 or the remote terminal 20. An output of 'empty space' means the model has decided that no intermediate response is needed at that timestamp, so the processor 31 may give the user 10 no response.

The turn-free conversation method according to one implementation may further include generating the turn-free conversation model 110 (S50). This step may include the following two sub-steps.
- Creating a training dataset (S51). The utterance of the first speaker 210 is divided into pieces of a fixed size in the order it is input, and each piece receives a timestamp. At each timestamp, the pieces of previous timestamps are cumulatively joined into a fragment group. The fragment groups are the training data, and the second speaker 220's utterance for each timestamp's fragment group is the label data.
- Training the turn-free conversation model 110 with this dataset (S52). The model learns to take the fragment group for each timestamp, determine whether it is time to provide an intermediate response, and output one.

The step of creating the training dataset (S51) may also separate the training data from the label data. This applies when the utterances of two people overlap in the collected conversation data 200. The lengths of the sentences containing the overlapping utterances are compared. The speaker of the longer sentence is designated as the first speaker 210, and the speaker of the shorter sentence is designated as the second speaker 220. The first speaker 210's utterance is turned into fragment groups that serve as input data, and the second speaker 220's utterance serves as label data.

The step of creating the training dataset (S51) was described with reference to FIG. 3, so it is not described in detail here. The step of training the turn-free conversation model 110 (S52) trains an artificial intelligence model on the training dataset. The model may have a multi-layer neural network structure. Training may be performed by using backpropagation to adjust the network's weights so that the model predicts the label data more accurately.
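The weight-update idea behind step S52 can be illustrated with a deliberately small stand-in. This sketch is not the multi-layer network of the disclosure. It is a one-layer logistic model trained by full-batch gradient descent, where the gradient of the cross-entropy loss (p - y) is propagated back to each weight. For brevity, it also collapses every response type into a single "respond" class (label 1) against 'empty space' (label 0). The learning rate, epoch count, and bag-of-tokens features are assumptions.

```python
import math


def train(groups, labels, epochs=3000, lr=0.1):
    """Fit weights so that sigmoid(b + sum of token weights) predicts the label.

    groups: list of fragment groups (lists of tokens).
    labels: 1 = respond, 0 = empty space (simplified from the text's codes).
    """
    vocab = sorted({tok for g in groups for tok in g})
    w = {tok: 0.0 for tok in vocab}
    b = 0.0
    for _ in range(epochs):
        gw = {tok: 0.0 for tok in vocab}
        gb = 0.0
        for g, y in zip(groups, labels):
            z = b + sum(w[tok] for tok in g)
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y                 # dLoss/dz for binary cross-entropy
            gb += err
            for tok in g:
                gw[tok] += err          # error propagated back to each token weight
        b -= lr * gb
        for tok in vocab:
            w[tok] -= lr * gw[tok]
    return w, b


def predict(w, b, group):
    """Probability that an intermediate response should be given."""
    z = b + sum(w.get(tok, 0.0) for tok in group)
    return 1.0 / (1.0 + math.exp(-z))
```

For the first four rows of Table 1 (respond, empty, respond, empty), the cumulative groups are linearly separable. With these settings, the trained model reproduces the four labels.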

According to one implementation, the turn-free conversation method may also use a turn-based conversation model 120. Once the user 10's utterance has been fully input, this model analyzes its content and provides a substantive response. In this case, the method may further include two steps. First, when the utterance is complete, it is input into the turn-based conversation model 120 to obtain a substantive response (S60). Second, the substantive response is provided to the user 10 (S70). The step of obtaining an intermediate response (S30) and the step of obtaining a substantive response (S60) run independently of each other. Intermediate responses can therefore be provided while the user 10 is still speaking, and the substantive response can be provided (S70) once the utterance is complete.
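The split between the two models can be sketched with `asyncio`. A listener task handles each incoming fragment with the turn-free model, and the turn-based model is called once the input ends. Both models are hypothetical callables here, and marking the end of the utterance with `None` on the queue is an assumption made for illustration.

```python
import asyncio

EMPTY = "C3 empty space"


async def converse(fragments, turn_free, turn_based, delay=0.0):
    """Produce intermediate responses while fragments arrive, then a final reply.

    turn_free:  callable(group) -> code or text (stand-in for model 110).
    turn_based: async callable(full_text) -> str (stand-in for model 120).
    """
    events = []
    queue = asyncio.Queue()

    async def listener():
        # Turn-free path: reacts to every new fragment independently.
        group = []
        while True:
            frag = await queue.get()
            if frag is None:            # end-of-utterance marker
                break
            group.append(frag)
            result = turn_free(list(group))
            if result != EMPTY:
                events.append(("intermediate", result))

    task = asyncio.create_task(listener())
    for frag in fragments:              # simulate the user speaking
        await queue.put(frag)
        await asyncio.sleep(delay)      # yield so the listener can react
    await queue.put(None)
    await task
    # Turn-based path: runs only on the completed utterance.
    events.append(("final", await turn_based(" ".join(fragments))))
    return events
```

Because the turn-free path never waits for the turn-based model, a slow substantive answer cannot delay the backchannel responses.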

When the user 10 starts speaking, the processor 31 performs the step of receiving the utterance (S10). The utterance is not yet complete, so the processor 31 next performs the step of creating a fragment group (S20) by separating part of the utterance into a piece. At the first timestamp, piece 1 on its own becomes fragment group 1. The processor 31 then performs the step of obtaining an intermediate response (S30): it inputs fragment group 1 into the turn-free conversation model 110 and obtains an intermediate response or 'empty space'. Finally, the processor 31 performs the step of providing the intermediate response to the user 10 (S40).

At the second timestamp, the processor 31 again creates a fragment group (S20). Piece 1 from the first timestamp and piece 2 from the second timestamp form fragment group 2. The processor 31 inputs fragment group 2 into the turn-free conversation model 110 to obtain an intermediate response or 'empty space' (S30). It then provides the intermediate response to the user 10 (S40).

At the third timestamp, the processor 31 inputs fragment group 3, containing pieces 1 to 3, into the turn-free conversation model 110. It obtains an intermediate response or 'empty space' and provides the intermediate response to the user 10. At the fourth timestamp, the processor 31 does the same with fragment group 4, which contains pieces 1 to 4.

When the user 10 stops providing speech input, the processor 31 may determine that the utterance is complete. The processor 31 may then input the completed utterance into the turn-based conversation model 120 and obtain a substantive response. It may deliver this response to the user 10 through the remote terminal 20 or the input/output unit 34.
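One way to decide that "no more input is coming" is a silence timeout. This is a minimal sketch of that approach. The text does not specify how completion is detected, so the timeout mechanism and the `timeout_s` value are assumptions.

```python
import asyncio


async def collect_until_silence(queue, timeout_s):
    """Gather fragments until none arrives for timeout_s seconds.

    The timeout is an assumed completion criterion. After it expires, the
    collected fragments would be handed to the turn-based model.
    """
    fragments = []
    while True:
        try:
            frag = await asyncio.wait_for(queue.get(), timeout=timeout_s)
        except asyncio.TimeoutError:
            return fragments            # silence: utterance treated as complete
        fragments.append(frag)
```

In practice, the timeout trades responsiveness against the risk of cutting off a user who is only pausing. A short pause inside a sentence should not trigger the substantive response.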

Bringing the above together, one more training dataset for the turn-free conversation model 110 according to an implementation is described. Suppose the first speaker 210 utters the sentence '아까 그 누구지 진영이를 우연히 봤어' ("Earlier I happened to see, who was it, Jinyoung"). A training dataset (Table 1) can then be created in which the second speaker 220 provides verbal, visual, and auditory responses as intermediate responses.

Table 1. Training dataset
Timestamp | Training data (fragment group) | Label data
t1 | 아까 ("earlier") | 네 ("yes")
t2 | 아까 그 ("earlier, that") | C3 empty space
t3 | 아까 그 누구지 ("earlier, that, who was it") | C2 curious
t4 | 아까 그 누구지 진영 ("... Jinyoung") | C3 empty space
t5 | 아까 그 누구지 진영이를 ("... Jinyoung [object]") | C3 empty space
t6 | 아까 그 누구지 진영이를 우연히 ("... Jinyoung, by chance") | C1 surprise
t7 | 아까 그 누구지 진영이를 우연히 봤어 ("... happened to see Jinyoung") | C3 empty space

The training dataset in Table 1 can train the turn-free conversation model 110 to respond as follows:
- when the speaker says '아까' ("earlier"), provide 'yes' as a verbal response;
- when the utterance has reached '아까 그' ("earlier, that"), output the code 'C3 empty space';
- when it has reached '아까 그 누구지' ("earlier, that, who was it"), output the code 'C2 curious';
- when it has reached '아까 그 누구지 진영이를 우연히' ("... Jinyoung, by chance"), output the code 'C1 surprise'.

The processor 31 can generate the turn-free conversation model 110 by training an artificial intelligence model on the training dataset in Table 1.

The turn-free conversation model 110 trained on Table 1 responds to the user 10 as follows:
- When the user 10 says '아까' ("earlier"), the model outputs the text 'yes', and the processor 31 can deliver 'yes' as a verbal response in sound or text.
- When the utterance has reached '아까 그' ("earlier, that"), the model outputs the code 'C3 empty space', so the processor 31 provides no intermediate response.
- When it has reached '아까 그 누구지' ("earlier, that, who was it"), the model outputs the code 'C2 curious'. The processor 31 looks this code up in the lookup table stored in the memory 32 and provides a visual intermediate response, a 'curious expression'.
- When it has reached '아까 그 누구지 진영이를 우연히' ("... Jinyoung, by chance"), the model outputs the code 'C1 surprise'. The processor 31 looks this code up in the lookup table and provides the auditory intermediate response 'Ah!'.

When the user's utterance ends, the processor 31 passes the sentence uttered by the user 10 to the turn-based conversation model 120. The processor 31 can then deliver the model's response, 'Where did you see them?', to the user 10.

As described, the turn-free conversation method uses the turn-free conversation model 110 to provide intermediate responses while the user 10 is still speaking. Once the user 10 has finished, it uses the turn-based conversation model 120 to provide a substantive response. Receiving intermediate responses gives the user 10 the feeling of actually talking with a person.

The present disclosure has been described in detail above through specific implementations. These implementations are intended to explain the disclosure concretely, and the disclosure is not limited to them. It will be apparent that those of ordinary skill in the art can modify or improve them within the technical spirit of the present disclosure.

All simple modifications or changes to the present disclosure fall within its scope, and the specific scope of protection of the disclosure will be made clear by the appended claims.

10: User
20: Remote terminal
30: Turn-free conversation device
31: Processor
32: Memory
33: Communication unit
34: Input/output unit
100: Program code
110: Turn-free conversation model
120: Turn-based conversation model
200: Conversation data
210: First speaker
220: Second speaker

Claims


1. A turn-free conversation device comprising:
one or more processors; and
a memory communicatively connected to the processors and storing program code executed by the processors,
wherein the program code includes a turn-free conversation model that is executed by the processors and that, each time part of a user's utterance is input, determines whether it is time to provide an intermediate response and outputs an intermediate response corresponding to the user's utterance.

2. The turn-free conversation device of claim 1, wherein the intermediate response includes a verbal response expressed as a word or sentence, an auditory response expressed as a sound, and a visual response expressed as an emoticon, facial expression, or gesture, and indicates that the user's utterance is being recognized.

3. The turn-free conversation device of claim 1, wherein the turn-free conversation model is generated by learning a training dataset created by dividing a first speaker's utterance, in the order it is input, into pieces of a fixed size, assigning a timestamp to each piece, and, for each timestamp, cumulatively joining the pieces of previous timestamps to create a fragment group, the fragment groups being training data and a second speaker's utterance for the fragment group generated for each timestamp being label data, and
wherein, when the user's utterance is divided into pieces of a fixed size in the order it is input, each piece is given a timestamp, and for each timestamp a fragment group created by cumulatively joining the pieces of previous timestamps is input into the turn-free conversation model, the turn-free conversation model determines whether it is time to provide an intermediate response and outputs the intermediate response.

4. The turn-free conversation device of claim 3, wherein, for the training data, when the utterances of two people overlap in collected conversation data, the lengths of the sentences containing the overlapping utterances are compared, the utterance of the speaker with the longer sentence is designated as the first speaker's utterance, and the utterance of the speaker with the shorter sentence is designated as the second speaker's utterance.

5. The turn-free conversation device of claim 1, wherein the program code further includes a turn-based conversation model that is executed by the processors and that, when the user's entire utterance has been input, analyzes the content of the utterance and provides a substantive response.

6. The turn-free conversation device of claim 5, wherein the processors execute the turn-free conversation model and the turn-based conversation model independently of each other, the turn-free conversation model providing an intermediate response during the user's utterance and the turn-based conversation model providing a substantive response when the user's utterance is complete.

A turn-free conversation method comprising:
receiving a user's utterance as input;
dividing the user's utterance, in the order in which it is input, into fragments of a predetermined size, assigning a timestamp to each fragment, and, at each timestamp, cumulatively concatenating the fragments of the previous timestamps to generate a fragment group;
inputting the fragment group generated at each timestamp into a turn-free conversation model and obtaining an intermediate response; and
providing the intermediate response to the user.
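The fragmenting step above can be illustrated with a short sketch. It assumes the utterance is already available as text and that a "fragment of a predetermined size" is a fixed number of characters; real input could instead be audio frames or tokens. `fragment_groups`, `intermediate_responses`, and the model callable that returns `None` when it is not yet time to respond are hypothetical names, not taken from the source.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, Tuple


@dataclass
class FragmentGroup:
    timestamp: int  # index of the newest fragment in the group
    text: str       # every fragment received up to and including this timestamp


def fragment_groups(utterance: str, size: int) -> Iterator[FragmentGroup]:
    """Split an utterance into fixed-size fragments in input order and yield,
    at each timestamp, the cumulative concatenation of all fragments so far."""
    if size <= 0:
        raise ValueError("size must be positive")
    prefix = ""
    for t, start in enumerate(range(0, len(utterance), size)):
        prefix += utterance[start:start + size]
        yield FragmentGroup(timestamp=t, text=prefix)


def intermediate_responses(
    utterance: str,
    size: int,
    model: Callable[[str], Optional[str]],  # hypothetical turn-free model
) -> List[Tuple[int, str]]:
    """Feed each fragment group to the model; keep only actual responses."""
    responses = []
    for group in fragment_groups(utterance, size):
        reply = model(group.text)  # None is taken to mean "not time to respond"
        if reply is not None:
            responses.append((group.timestamp, reply))
    return responses
```

With `size=3`, the text `"abcdefg"` produces the groups `"abc"`, `"abcdef"`, and `"abcdefg"` at timestamps 0, 1, and 2. Each group is a growing prefix of the utterance, so the model always sees everything said so far.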

In claim 7,
the intermediate response includes a verbal response expressed as a word or sentence, an auditory response expressed as a sound, and a visual response expressed as an emoticon, facial expression, or gesture, and indicates that the user's utterance is being recognized.
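One way to hold the three kinds of intermediate response listed above is a small record with one optional field per modality. This is a hypothetical sketch: the class name, the field names, and the rule that at least one modality must be present are assumptions, not taken from the claim.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IntermediateResponse:
    verbal: Optional[str] = None  # word or sentence, e.g. "I see."
    sound: Optional[str] = None   # auditory cue, e.g. a hypothetical clip id "hmm.wav"
    visual: Optional[str] = None  # emoticon, facial expression, or gesture label

    def modalities(self) -> List[str]:
        """Names of the modalities actually present in this response."""
        return [name for name in ("verbal", "sound", "visual")
                if getattr(self, name) is not None]

    def __post_init__(self) -> None:
        # Assumption: an empty response would not signal that the user is being heard.
        if not self.modalities():
            raise ValueError("an intermediate response needs at least one modality")
```

For example, `IntermediateResponse(verbal="I see.", visual="nod")` carries a spoken acknowledgement together with a nodding gesture.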

In claim 7,
the method further comprises generating the turn-free conversation model, wherein generating the turn-free conversation model comprises:
dividing a first speaker's utterance, in the order in which it is input, into fragments of a predetermined size, assigning a timestamp to each fragment, cumulatively concatenating, at each timestamp, the fragments of the previous timestamps to generate a fragment group, and generating a training dataset in which the fragment groups are the training data and a second speaker's utterance corresponding to the fragment group generated at each timestamp is the label data; and
training the turn-free conversation model with the training dataset so that, when the fragment group generated at each timestamp is input, the model determines whether it is time to provide an intermediate response and outputs the intermediate response.
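The dataset construction described above pairs each cumulative fragment group of the first speaker with whatever the second speaker said at that timestamp. The sketch below assumes character-based fragments and second-speaker utterances given as a mapping from timestamp to text. It also assumes that timestamps without a second-speaker utterance receive a hypothetical `NO_RESPONSE` label, so the model can learn when *not* to respond. The training itself is left out, since the source does not specify a model architecture.

```python
from typing import Dict, List, Tuple

NO_RESPONSE = "<none>"  # hypothetical label: "not time for an intermediate response"


def build_training_pairs(
    first_speaker: str,
    second_speaker_by_ts: Dict[int, str],
    size: int,
) -> List[Tuple[str, str]]:
    """Return (fragment_group, label) pairs, one per timestamp."""
    if size <= 0:
        raise ValueError("size must be positive")
    pairs = []
    prefix = ""
    for t, start in enumerate(range(0, len(first_speaker), size)):
        prefix += first_speaker[start:start + size]  # cumulative fragment group
        label = second_speaker_by_ts.get(t, NO_RESPONSE)
        pairs.append((prefix, label))
    return pairs
```

For instance, `build_training_pairs("abcdefgh", {1: "uh-huh"}, 4)` returns `[("abcd", "<none>"), ("abcdefgh", "uh-huh")]`. The explicit "no response" label is one way to encode the "whether it is time" decision. A separate binary classifier would be an equally plausible reading.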

In claim 9,
generating the training dataset further comprises, when the utterances of two people overlap in collected conversation data, comparing the lengths of the sentences containing the overlapping utterances, designating the utterance of the speaker with the longer sentence as the first speaker's utterance and the utterance of the speaker with the shorter sentence as the second speaker's utterance, and thereby separating training data from label data, wherein the first speaker's utterance is converted into fragment groups that serve as input data and the second speaker's utterance serves as label data.
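The overlap rule above (longer sentence becomes the first speaker, shorter becomes the second) reduces to a single comparison. This sketch assumes sentence length is measured in characters and that ties keep the original order. The claim does not specify a length unit or a tie-breaking rule, and the `Utterance` record and `assign_roles` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Utterance:
    speaker: str
    sentence: str  # full sentence containing the overlapping speech


def assign_roles(a: Utterance, b: Utterance) -> Tuple[Utterance, Utterance]:
    """Return (first_speaker, second_speaker) for two overlapping utterances.

    The longer sentence becomes the first speaker (source of input fragment
    groups); the shorter becomes the second speaker (source of labels).
    Assumption: length is len() in characters, and ties keep the given order.
    """
    if len(b.sentence) > len(a.sentence):
        return b, a
    return a, b
```

If speaker A says "I went to the market yesterday and" while speaker B says "Uh-huh" over it, A becomes the first speaker. A's sentence is cut into fragment groups, and B's "Uh-huh" becomes the label.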

In claim 7,
the method further comprises: using a turn-based conversation model that, once the user's entire utterance has been input, analyzes the content of the utterance and provides a substantive response, inputting the utterance into the turn-based conversation model when the user's utterance is complete to obtain a substantive response; and
providing the substantive response to the user.

In claim 11,
the step of obtaining the intermediate response and the step of obtaining the substantive response are performed independently of each other, such that the intermediate response is provided while the user is still speaking and the substantive response is provided once the user's utterance is complete.