KR20210120712A


Info

Publication number: KR20210120712A
Application number: KR1020200037778A
Authority: KR
Inventors: 송유선
Original assignee: 주식회사 케이티
Priority date: 2020-03-27
Filing date: 2020-03-27
Publication date: 2021-10-07
Anticipated expiration: 2040-03-27
Also published as: KR102780186B1

Abstract

Translated from Korean

A server that provides a video call service includes: a model learning unit that trains an image generation model based on a plurality of training data received from a user terminal; a video call performing unit that performs a video call service between the user terminal and another user terminal; a receiving unit that receives an image generation request message from the user terminal while the video call service is in progress; and an image generation unit that, based on the image generation request message, generates through the image generation model an alternative image in which the user of the user terminal appears. The video call performing unit may transmit the generated alternative image to the other user terminal in place of the actual image captured by the user terminal and provided through the video call service.

Description

Translated from Korean

SERVER, METHOD AND COMPUTER PROGRAM FOR PROVIDING VIDEO CALL SERVICE

The present invention relates to a server, a method, and a computer program for providing a video call service.

In recent years, as communication technologies have advanced rapidly and video calls have been commercialized, mobile communication has moved beyond services centered on voice calls and short messages. Users can now talk while seeing the other party on a user terminal, regardless of location.

Although video calls are now common, it is not easy to keep a video call going in an environment where video calling is difficult.

Meanwhile, Patent Document 1 discloses a configuration in which voice data received from a called voice terminal that has a video call set up with a calling video terminal is combined with preset video data, and the resulting video call data is transmitted to the calling video terminal.

Korean Registered Patent Publication No. 10-0928916 (registered on November 20, 2009)

The present invention is intended to solve the problems of the prior art described above. When an image generation request message is received from a user terminal while a video call service is in progress between the user terminal and another user terminal, the invention aims to transmit to the other user terminal an alternative image in which the user of the user terminal appears, in place of the actual image from the user terminal.

However, the technical problems to be solved by the present embodiment are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical problem, a server that provides a video call service according to a first aspect of the present invention includes: a model learning unit that trains an image generation model based on a plurality of training data received from a user terminal; a video call performing unit that performs a video call service between the user terminal and another user terminal; a receiving unit that receives an image generation request message from the user terminal while the video call service is in progress; and an image generation unit that, based on the image generation request message, generates through the image generation model an alternative image in which the user of the user terminal appears, wherein the video call performing unit may transmit the generated alternative image to the other user terminal in place of the actual image captured by the user terminal and provided through the video call service.

A method of providing a video call service according to a second aspect of the present invention includes: training an image generation model based on a plurality of training data received from a user terminal; performing a video call service between the user terminal and another user terminal; receiving an image generation request message from the user terminal while the video call service is in progress; and generating, based on the image generation request message and through the image generation model, an alternative image in which the user of the user terminal appears, wherein performing the video call service may include transmitting the generated alternative image to the other user terminal in place of the actual image captured by the user terminal and provided through the video call service.

A computer program according to a third aspect of the present invention is stored in a medium and includes a sequence of instructions for providing a video call service. When executed by a computing device, the sequence of instructions may cause the computing device to: train an image generation model based on a plurality of training data received from a user terminal; perform a video call service between the user terminal and another user terminal; receive an image generation request message from the user terminal while the video call service is in progress; generate, based on the image generation request message and through the image generation model, an alternative image in which the user of the user terminal appears; and transmit the generated alternative image to the other user terminal in place of the actual image captured by the user terminal and provided through the video call service.

The means for solving the problem described above are merely exemplary and should not be construed as limiting the present invention. In addition to the exemplary embodiments described above, there may be additional embodiments described in the drawings and the detailed description of the invention.

According to any one of the means for solving the problem described above, when an image generation request message is received from a user terminal while a video call service is in progress between the user terminal and another user terminal, the present invention can transmit to the other user terminal an alternative image in which the user of the user terminal appears, in place of the actual image from the user terminal.

As a result, even when the user or the other user needs a video call but is in a situation where an actual video call is difficult, the present invention allows the video call to continue by providing an alternative image in place of the party's actual image.

In addition, even when the user's actual image cannot be captured, the present invention generates an alternative image based on text or voice data entered by the user and provides it to the other user terminal, so that the video call can continue.

In addition, the present invention provides the other user terminal with an alternative image in which the real user appears, rather than a conventional avatar image, so that the other user can feel a sense of presence during the video call.

FIG. 1 is a configuration diagram of a system for providing a video call service according to an embodiment of the present invention.
FIG. 2 is a block diagram of the video call service providing server shown in FIG. 1 according to an embodiment of the present invention.
FIGS. 3A and 3B are diagrams for explaining a method of training an image generation model according to an embodiment of the present invention.
FIGS. 4A to 4C are diagrams for explaining a method of generating an alternative image according to an embodiment of the present invention.
FIGS. 5A to 5D are diagrams for explaining a method of generating a responsive image according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating a method of providing a video call service according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating a method of providing a responsive image according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice them. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to describe the present invention clearly, and similar reference numerals denote similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between. In addition, when a part is said to "include" a component, this means that it may further include other components, rather than excluding them, unless specifically stated otherwise.

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.

In this specification, some of the operations or functions described as being performed by a terminal or device may instead be performed by a server connected to that terminal or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal or device connected to that server.

Hereinafter, specific details for carrying out the present invention are described with reference to the accompanying configuration diagrams and process flowcharts.

FIG. 1 is a configuration diagram of a system for providing a video call service according to an embodiment of the present invention.

Referring to FIG. 1, the video call service providing system may include a video call service providing server 100, a user terminal 110, and another user terminal 120. However, the system of FIG. 1 is only one embodiment of the present invention, so the present invention is not to be construed as limited by FIG. 1, and the system may be configured differently from FIG. 1 according to various embodiments of the present invention.

In general, the components of the video call service providing system of FIG. 1 are connected through a network (not shown). A network is a connection structure that allows nodes such as terminals and servers to exchange information with one another, and includes a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, and wired and wireless television networks. Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), LTE (Long Term Evolution), WiMAX (Worldwide Interoperability for Microwave Access), Wi-Fi, Bluetooth communication, infrared communication, ultrasonic communication, visible light communication (VLC), and LiFi.

The user terminal 110 may transmit a plurality of training data, including at least one of image data and video data, to the video call service providing server 100.

The video call service providing server 100 may train an image generation model based on the plurality of training data received from the user terminal 110.

The user terminal 110 may request a video call with the other user terminal 120 from the video call service providing server 100.

When the video call service providing server 100 receives from the user terminal 110 a request for a video call with the other user terminal 120, it may provide a video call service between the user terminal 110 and the other user terminal 120 to both terminals.

The video call service providing server 100 may receive an image generation request message from the user terminal 110 while the video call service is in progress. Here, the image generation request message may be a message that the user terminal 110 sends when the user wants to continue the call with the other user terminal 120 but is unable to make a video call.

Based on the image generation request message received from the user terminal 110, the video call service providing server 100 may generate, through the image generation model, an alternative image in which the user of the user terminal 110 appears.

The video call service providing server 100 may transmit the generated alternative image to the other user terminal 120 in place of the actual image captured by the user terminal 110 and provided through the video call service.

The user terminal 110 and the other user terminal 120 may include mobile terminals capable of wireless communication, and according to various embodiments of the present invention they may be devices of various forms. For example, the user terminal 110 may be a portable terminal that can connect to a remote server through a network. Examples of portable terminals are wireless communication devices that guarantee portability and mobility, and may include all kinds of handheld wireless communication devices such as PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (Wideband Code Division Multiple Access), and WiBro (Wireless Broadband Internet) terminals, smartphones, and tablet PCs. However, the user terminal 110 is not to be construed as limited to these examples.

The operation of each component of the video call service providing system of FIG. 1 is described in more detail below.

FIG. 2 is a block diagram of the video call service providing server 100 shown in FIG. 1 according to an embodiment of the present invention.

Referring to FIG. 2, the video call service providing server 100 may include a model learning unit 200, a video call performing unit 210, a receiving unit 220, an image generation unit 230, and an analysis unit 240. However, the video call service providing server 100 shown in FIG. 2 is only one implementation example of the present invention, and various modifications are possible based on the components shown in FIG. 2.

FIG. 2 is described below with reference also to FIGS. 3A to 5D.

The receiving unit 220 may receive a plurality of training data from the user terminal 110. Here, the plurality of training data may include at least one of image data and video data. The image data is data that the user terminal 110 has classified based on at least one of person information and location information.

When the training data received from the user terminal 110 is image data, the receiving unit 220 may further receive from the user terminal 110 the person information or location information for each group of classified image data.

In addition, when the training data received from the user terminal 110 is video data, the receiving unit 220 may further receive from the user terminal 110 information about the persons appearing in the video data and information about the location where the video data was captured.

Referring to FIG. 3A, the user terminal 110 may classify a plurality of image data stored in its memory based on the current location information of the user terminal 110 or on location information set by the user terminal 110 (for example, the location where the image data was captured). For example, when the location information set for classifying the image data is the user's home location and the user's company location, the user terminal 110 may group the image data 301 captured at the user's home based on the home location information, and may group the image data captured at the user's company based on the company location information. The user terminal 110 may then transmit training data including the image data classified by location information to the video call service providing server 100.

If the user has not entered location information, the user terminal 110 may instead group together image data with similar backgrounds, based on the similarity between the background information of the image data.
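
The location-based grouping described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method of the specification: the photo tuple format, the `places` dictionary, the 0.2 km radius, and the function names `haversine_km` and `group_by_location` are all hypothetical. The specification does not say how a photo is matched to a preset location, so nearest-place matching within a radius is used here as one plausible choice.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def group_by_location(photos, places, radius_km=0.2):
    """Assign each photo to the nearest preset place within radius_km.

    photos: list of (photo_id, lat, lon) tuples (hypothetical format)
    places: dict mapping a place name to (lat, lon)
    Photos that are not near any preset place are left out.
    """
    groups = defaultdict(list)
    for photo_id, lat, lon in photos:
        best_name, best_dist = None, radius_km
        for name, (plat, plon) in places.items():
            d = haversine_km(lat, lon, plat, plon)
            if d <= best_dist:
                best_name, best_dist = name, d
        if best_name is not None:
            groups[best_name].append(photo_id)
    return dict(groups)
```

Photos left ungrouped by this sketch would be candidates for the background-similarity fallback mentioned above, which is not modeled here.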

In addition, when the user enters person information for classifying the image data, the user terminal 110 may classify the plurality of image data based on the entered person information. For example, when the person information entered by the user is "the user" and "person A", the user terminal 110 may group only the image data 303 in which only the user appears, and may separately group only the image data in which only person A appears. The user terminal 110 may then transmit training data including the image data classified by person information to the video call service providing server 100. If the user has not entered person information, the user terminal 110 may instead group together image data containing similar persons, based on the similarity between the persons in the image data.
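
As a rough illustration of the person-based grouping, the sketch below keeps only photos in which a single requested person appears alone. It assumes that each photo already carries a set of person labels produced by some upstream recognizer; that recognizer, the dictionary format, and the name `group_by_person` are hypothetical and not taken from the specification.

```python
from collections import defaultdict


def group_by_person(photos, target_people):
    """Group photo IDs in which exactly one target person appears alone.

    photos: dict mapping photo_id -> set of person labels detected in it
            (labels are assumed to come from an upstream recognizer)
    target_people: iterable of person labels entered by the user
    """
    targets = set(target_people)
    groups = defaultdict(list)
    for photo_id, people in photos.items():
        # Keep only single-person photos, matching the "only the user appears" example.
        if len(people) == 1:
            (person,) = people
            if person in targets:
                groups[person].append(photo_id)
    return dict(groups)
```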

The model learning unit 200 may train an image generation model based on the plurality of training data received from the user terminal 110.

When the plurality of training data is image data, the model learning unit 200 may train the image generation model based on the image data classified according to at least one of person information and location information.

For example, referring to FIG. 3B, the model learning unit 200 may extract feature elements from each piece of image data by analyzing the facial expression information of the object contained in the image data (the object corresponding to the person information) and the background information (the background corresponding to the location information). The model learning unit 200 may also perform segmentation to separate the object from the background in the image data, and may input the image of the separated object, the person information, and the feature elements of that image data into the image generation model 305, training the image generation model 305 to identify the object's image as the object corresponding to the person information.
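
The object/background separation step can be pictured with the small sketch below. It only shows how a binary mask splits an image into two layers; the mask itself is assumed to come from a separate segmentation model, which the specification does not name. The nested-list image format and the function name `split_object_and_background` are illustrative assumptions.

```python
def split_object_and_background(image, mask, fill=0):
    """Split a 2-D image into object and background layers using a binary mask.

    image: list of rows of pixel values
    mask:  list of rows of 0/1 flags, 1 where the pixel belongs to the person
           (assumed to be produced by a separate segmentation model)
    Returns (object_layer, background_layer); masked-out pixels are set to fill.
    """
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    obj = [[p if m else fill for p, m in zip(row, mrow)] for row, mrow in zip(image, mask)]
    bg = [[fill if m else p for p, m in zip(row, mrow)] for row, mrow in zip(image, mask)]
    return obj, bg
```

In such a setup, the object layer would be what is fed to the model together with the person information and feature elements, while the background layer could be kept for later compositing.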

When the plurality of training data is video data, the model learning unit 200 may separate the plurality of video frames contained in the video data according to person information, and may train the image generation model based on the video frames separated for each person.

The video call performing unit 210 may perform a video call service between the user terminal 110 and the other user terminal 120. For example, when the video call performing unit 210 receives from the user terminal 110 a request for a video call with the other user terminal 120, it may provide the user terminal 110 with the other user's actual image captured by the other user terminal 120, and may provide the other user terminal 120 with the user's actual image captured by the user terminal 110.

The receiving unit 220 may receive an image generation request message from the user terminal 110 while the video call service is in progress. Here, the image generation request message may include an alternative image request and user data. The user data may include the user's voice data or text data, made up of the conversation content that the user terminal 110 wants to convey to the other user terminal 120. For example, referring to FIG. 4A, if a situation arises in which it is difficult to continue the video call while the user terminal 110 is in a video call with the other user terminal 120, the user can enter the request through the image generation request interface displayed on the video call screen of the user terminal 110.

When information about the image generation request is entered on the user terminal 110 through the image generation request interface, the receiving unit 220 may receive from the user terminal 110 an image generation request message that includes the conversation content to be conveyed to the other user terminal 120, together with a notice message about stopping capture of the user's actual image. The receiving unit 220 may further receive from the user terminal 110 background information to be used as the background of the alternative image, and may further receive the current location information of the user terminal 110.
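
The contents of the image generation request message described above could be modeled along the following lines. The specification does not define a wire format, so the class name `ImageGenerationRequest` and every field name here are hypothetical; the sketch only mirrors the items the text lists (the alternative image request, voice or text user data, optional background information, optional current location).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ImageGenerationRequest:
    """Hypothetical payload for the image generation request message."""

    alternative_image_requested: bool
    text: Optional[str] = None        # conversation content typed by the user
    voice: Optional[bytes] = None     # conversation content as recorded audio
    background: Optional[str] = None  # optional background label, e.g. "park"
    location: Optional[Tuple[float, float]] = None  # optional (lat, lon)

    def __post_init__(self):
        # The description says the user data is voice data or text data,
        # so this sketch requires at least one of the two.
        if self.text is None and self.voice is None:
            raise ValueError("user data must include text or voice")
```

Requiring at least one of `text` or `voice` is a design choice of this sketch; the specification only says the user data may include either.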

The video call performing unit 210 may perform the video call service by providing the actual images of the user terminal 110 and the other user terminal 120 to each other, and may provide an alternative image in place of the actual image according to the alternative image request and the user data (the user's voice data or text data) contained in an image generation request message received during the video call service.
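
The switch from the actual image to the alternative image can be sketched as a small piece of per-call state. The class `CallSession` and its methods are hypothetical names. Falling back to the actual frame while no alternative frame is available yet is an assumption of this sketch and is not stated in the specification.

```python
class CallSession:
    """Tracks whether the alternative image should replace the actual image."""

    def __init__(self):
        self.use_alternative = False

    def on_image_generation_request(self):
        # Called when the request message arrives during the call.
        self.use_alternative = True

    def outgoing_frame(self, actual_frame, alternative_frame):
        # Assumption: fall back to the actual frame if no alternative frame is ready.
        if self.use_alternative and alternative_frame is not None:
            return alternative_frame
        return actual_frame
```

A real service might prefer to hold the last frame or show a placeholder rather than fall back to the camera feed, since the user has asked to stop sending it.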

To this end, based on the image generation request message received from the user terminal 110, the image generation unit 230 may generate, through the image generation model, an alternative image in which the user of the user terminal 110 appears.

The image generation unit 230 may generate the alternative image through the image generation model based on the user's voice data or text data contained in the image generation request message. For example, when the user's voice data contains "I'm about to start a meeting", the image generation unit 230 may generate, through the image generation model, an alternative image in which the user appears against the background of the company where the user works.

When background information to be applied to the background of the replacement image is received from the user terminal 110, the image generating unit 230 may generate the replacement image by compositing the background image corresponding to the background information with the user's image data. For example, on receiving 'park' background information from the user terminal 110 of a user who is at work, the image generating unit 230 may generate the replacement image by compositing the user's image data onto a 'park' background image.

When the user terminal 110 has not set background information for the replacement image, the image generating unit 230 may generate the replacement image based on time information indicating when the image generation request message was received from the user terminal 110 and on the location information of the user terminal 110 corresponding to that time information.

Referring to FIG. 4B, when the image generation request message received from the user terminal 110 is text data and the user terminal 110 has not set background information to be applied to the background of the replacement image, the image generating unit 230 may derive from the image generation model a background image corresponding to the time at which the text data was received and to the location information of the user terminal 110 at that time, and may generate the replacement image by compositing the user's image onto the derived background image. The image generating unit 230 may also convert the text data into speech and combine the converted speech with the replacement image. For example, when a child (the other user of the other user terminal 120) requests a video call with a mother (the user of the user terminal 110) who is in a meeting, the image generating unit 230 may generate a first replacement image 40 by compositing the user's image onto the background of the company where the user terminal 110 is located.

Alternatively, the image generating unit 230 may select one of the backgrounds contained in the user's image data and generate the first replacement image by compositing the user's image onto the selected background. For example, when a child (the other user of the other user terminal 120) requests a video call with a mother (the user of the user terminal 110) who is in a meeting, the image generating unit 230 may select, from the backgrounds contained in the user's image data, a background familiar to the other user (e.g., the background of a photo the user and the other user took together) and generate a second replacement image 42 using it as the background.

As another example, when the image generation request message received from the user terminal 110 is text data and the user terminal 110 has changed the background information to be applied to the replacement image to different background information, the image generating unit 230 may generate the replacement image by compositing the user's image onto the background image corresponding to the changed background information.

Referring to FIG. 4C, when the image generation request message received from the user terminal 110 is voice data and the user terminal 110 has not set background information to be applied to the background of the replacement image, the image generating unit 230 may derive the user's current situation from the voice data and generate the replacement image by compositing the user's image onto a background image suited to the derived situation. For example, when the user of the user terminal 110 is on the move and cannot conduct sales or consultation by video call, the image generating unit 230 may generate a replacement image 44 in which the user's image is composited onto a background-free image.
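The background selection rules described above can be illustrated with a minimal sketch. The `ImageRequest` type, the `choose_background` function, and the keyword table are hypothetical names introduced only for illustration; the description does not specify how the image generation model derives a situation from voice data, so a simple keyword lookup stands in for that analysis here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageRequest:
    """Hypothetical, simplified view of an image generation request message."""
    payload_type: str                      # "text" or "voice"
    payload: str                           # text, or a transcript of the voice data
    background: Optional[str] = None       # background explicitly set on the terminal
    location_label: Optional[str] = None   # place resolved from location at request time


# Assumption: a keyword table stands in for the model's situation analysis.
SITUATION_KEYWORDS = {
    "meeting": "company",
    "driving": "plain",
    "on the move": "plain",
}


def choose_background(req: ImageRequest) -> str:
    """Return a background label for the replacement image."""
    # 1. Background information set by the user terminal takes precedence.
    if req.background:
        return req.background
    # 2. Voice data: derive the current situation from what was said.
    if req.payload_type == "voice":
        lowered = req.payload.lower()
        for keyword, background in SITUATION_KEYWORDS.items():
            if keyword in lowered:
                return background
    # 3. Otherwise fall back to the location at the time of the request.
    if req.location_label:
        return req.location_label
    # 4. Nothing known: use a background-free image.
    return "plain"
```

In this sketch an explicit 'park' setting overrides the 'company' location, while a text request without a setting resolves to the location-derived background.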

The video call performing unit 210 may carry out the video call service between the user terminal 110 and the other user terminal 120 by transmitting the generated replacement image to the other user terminal 120 in place of the live-action image that is captured by the user terminal 110 and provided through the video call service.

The receiver 220 may collect surrounding-situation information, the users' voice information and behavior information, location information, and the like from each of the user terminal 110 and the other user terminal 120.

The analysis unit 240 may analyze call situation information between the user terminal 110 and the other user terminal 120 during a video call. Referring to FIG. 5A, the call situation information may include at least one of location information, time information, voice pitch information, facial expression information, behavior information, and surrounding environment information for each of the user terminal 110 and the other user terminal 120. For example, the analysis unit 240 may analyze the surrounding environment information of the user terminal 110 through a similarity analysis between the backgrounds in existing image data transmitted by the user terminal 110 and the location information of the user terminal 110.

The analysis unit 240 may also determine in real time whether the call situation information for each of the user terminal 110 and the other user terminal 120 has changed during the video call. For example, the analysis unit 240 may determine which of the user of the user terminal 110 and the user of the other user terminal 120 shows a change in voice pitch information, and which shows a change in facial expression information.
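As a rough illustration of this change detection, the following sketch compares two snapshots of call situation information. The dictionary layout, the terminal keys, and the function names are assumptions made for the example; the description does not define a data format.

```python
from typing import Dict, List

Snapshot = Dict[str, Dict[str, str]]  # terminal id -> {field name -> observed value}


def changed_fields(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return the names of the fields whose values differ between two observations."""
    keys = sorted(set(previous) | set(current))
    return [key for key in keys if previous.get(key) != current.get(key)]


def who_changed(previous: Snapshot, current: Snapshot, field: str) -> List[str]:
    """Return the terminals for which the given field changed."""
    return [
        terminal
        for terminal in sorted(current)
        if previous.get(terminal, {}).get(field) != current[terminal].get(field)
    ]


# Hypothetical snapshots: the user arrives at work and starts speaking more quietly.
before = {
    "user_110": {"location": "commute", "voice_pitch": "normal"},
    "other_120": {"location": "home", "voice_pitch": "normal"},
}
after = {
    "user_110": {"location": "company", "voice_pitch": "low"},
    "other_120": {"location": "home", "voice_pitch": "normal"},
}
```

Calling `who_changed(before, after, "voice_pitch")` on these snapshots identifies only the user terminal, which corresponds to deciding which user's voice pitch information has changed.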

The analysis unit 240 may also analyze call situation information for the other user terminal 120 that has received the replacement image. For example, referring also to FIG. 5B, the analysis unit 240 may analyze the other user's emotion from the other user's facial image included in the live-action image 50 of the other user terminal 120, and derive which of the preset emotion types (anger, contempt, disgust, fear, happiness, neutral, sadness, surprise, and so on) the other user's emotion belongs to. The analysis unit 240 may also analyze voice pitch information and conversation content from the voice data 52 of the other user terminal 120.

The image generating unit 230 may generate a responsive image based on the analyzed call situation information (or the changed call situation information). For example, the image generating unit 230 may change the background information based on the facial expression information or voice pitch information of the other user terminal 120 that received the replacement image, and generate the responsive image by compositing the user's image onto the changed background.

The video call performing unit 210 may switch from the replacement image to the responsive image and transmit the responsive image to the other user terminal 120.

The analysis unit 240 may analyze reaction information of the other user terminal 120 that has received the responsive image. The reaction information of the other user terminal 120 may be derived from the other user's facial expression information or voice pitch information. For example, referring to FIG. 5C, the analysis unit 240 may derive the other user's facial expression information from the live-action image of the other user terminal 120 that received the responsive image and, based on that information, calculate the other user's emotion score for each preset emotion type 54. The analysis unit 240 may then analyze the reaction information of the other user terminal 120 based on those per-type emotion scores.
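One way to turn per-type emotion scores into reaction information is sketched below. The emotion type names follow the list given in the description; the normalization step, the neutral fallback, and the function names are assumptions for illustration, since the description does not say how the scores are computed or combined.

```python
from typing import Dict

# Preset emotion types named in the description.
EMOTION_TYPES = (
    "anger", "contempt", "disgust", "fear",
    "happiness", "neutral", "sadness", "surprise",
)


def normalize_scores(raw: Dict[str, float]) -> Dict[str, float]:
    """Scale raw scores so that the preset emotion types sum to 1."""
    total = sum(raw.get(emotion, 0.0) for emotion in EMOTION_TYPES)
    if total <= 0:
        return {emotion: 0.0 for emotion in EMOTION_TYPES}
    return {emotion: raw.get(emotion, 0.0) / total for emotion in EMOTION_TYPES}


def dominant_emotion(raw: Dict[str, float]) -> str:
    """Return the emotion type with the highest score (assumed: 'neutral' if none)."""
    scores = normalize_scores(raw)
    if not any(scores.values()):
        return "neutral"
    return max(EMOTION_TYPES, key=lambda emotion: scores[emotion])
```

Under these assumptions, raw scores of 3 for sadness and 1 for neutral normalize to 0.75 and 0.25, so the reaction is classified as sadness.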

The analysis unit 240 may also analyze state information of the other user terminal 120 (e.g., the other user's physical condition) from the call situation information of the other user terminal 120 that received the responsive image.

The image generating unit 230 may generate another responsive image based on the analyzed reaction information or state information of the other user terminal 120. For example, the image generating unit 230 may change the background information based on the facial expression information or voice pitch information corresponding to the reaction information or state information of the other user terminal 120 that received the responsive image, and generate another responsive image by compositing the user's image onto the changed background.

Even when the content of a call between the user terminal 110 and the other user terminal 120 is the same as that of a previous call, a different responsive image may be generated depending on the reaction information or state information of the other user terminal 120.

The model learning unit 200 may retrain the image generation model by feeding the analyzed reaction information or state information of the other user terminal 120 into the model as feedback information. The model learning unit 200 may also retrain the image generation model so that it generates another responsive image that allows the emotion of the other user of the other user terminal 120 to change along a preset emotion direction. The preset emotion direction may be one set by the user terminal 110 or one set by the video call service providing server 100.

Through the retrained image generation model, the image generating unit 230 may generate another responsive image that induces a change in the other user's emotion toward the preset emotion direction. For example, when the reaction information of the other user of the other user terminal 120 carries 'sadness' emotion information, the image generating unit 230 may use the image generation model to generate another responsive image that steers the emotion from 'sadness' toward 'good'. Likewise, when the reaction information carries 'anger' emotion information, the image generating unit 230 may generate another responsive image that steers the emotion from 'anger' toward 'comfortable'.
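The preset emotion direction can be pictured as a lookup from the detected emotion to a target emotion. The sketch below is an assumption-laden illustration: the table contents come from the two examples in the description, while the rule that a direction set by the user terminal overrides one set by the server is a choice made here, not something the description states.

```python
from typing import Dict, Optional

# Directions taken from the examples in the description; treated as server defaults.
SERVER_DIRECTIONS: Dict[str, str] = {
    "sadness": "good",
    "anger": "comfortable",
}


def target_emotion(
    current: str,
    user_directions: Optional[Dict[str, str]] = None,
    server_directions: Dict[str, str] = SERVER_DIRECTIONS,
) -> str:
    """Return the emotion toward which the next responsive image should steer."""
    # Assumption: a direction set by the user terminal overrides the server's.
    if user_directions and current in user_directions:
        return user_directions[current]
    # No direction configured: keep the current emotion as the target.
    return server_directions.get(current, current)
```

The returned target would then be passed to the image generation model as a conditioning signal; how the model uses it is outside the scope of this sketch.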

The video call performing unit 210 may switch from the responsive image to the other responsive image and transmit it to the other user terminal 120.

For example, referring to FIG. 5D, suppose that the user of the user terminal 110 (e.g., a mother), on her way to work, is on a video call with the user of the other user terminal 120 (e.g., a daughter) at home, and that the user terminal 110 then arrives at the company. In this case, the analysis unit 240 may confirm that the location information of the user terminal 110 has changed from 'commute' to 'company'. The analysis unit 240 may also confirm that the call situation information of the user terminal 110 has changed, based on the call content between the user terminal 110 and the other user terminal 120 (e.g., 'Are you at work yet?') and the voice pitch information of the user terminal 110 (e.g., reduced volume).

Based on the changed location information and voice pitch information of the user terminal 110, the image generating unit 230 may generate a replacement image by compositing the user's image onto the background of the company where the user terminal 110 is located, and the video call performing unit 210 may transmit the replacement image to the other user terminal 120 in place of the user's live-action image.

Thereafter, the analysis unit 240 may analyze the call content and call situation information between the other user terminal 120, which has received the replacement image, and the user terminal 110, which has received the other user's live-action image captured by the other user terminal 120.

When the voice pitch information (e.g., a lowered voice pitch) and facial expression information (e.g., a gloomy expression) of the other user of the other user terminal 120 change, the image generating unit 230 may generate a first responsive image by compositing the user's image onto a home background, which is familiar to the other user, in order to steer the other user's emotion toward the preset emotion direction. The video call performing unit 210 may then transmit the first responsive image to the other user terminal 120 in place of the replacement image.

Thereafter, the analysis unit 240 may analyze the call content and call situation information between the other user terminal 120, which has received the first responsive image, and the user terminal 110, which has received the other user's live-action image.

Based on the other user's reaction information derived from the analyzed voice pitch information (e.g., a raised voice pitch) and facial expression information (e.g., a bright expression), the image generating unit 230 may generate a second responsive image by compositing the user's image onto a company background that reflects the situation of the user terminal 110. The video call performing unit 210 may then transmit the second responsive image to the other user terminal 120 in place of the first responsive image.

The following describes an embodiment in which, when an emergency occurs during a call, a responsive image appropriate to the emergency is provided.

For example, the analysis unit 240 may analyze the call content and call situation information between the other user terminal 120 (e.g., the terminal of a user who needs first aid) and the user terminal 110 (e.g., the terminal of an emergency medical technician). For instance, the analysis unit 240 may analyze the voice pitch information of the other user terminal 120, including an urgent tone of voice, and determine the emergency situation of the other user terminal 120 from the call content.

Based on the analyzed information about the emergency situation of the other user terminal 120 (e.g., onset of hyperventilation), the image generating unit 230 may generate a responsive image that includes the first-aid method needed by the other user terminal 120 (e.g., how to calm down during hyperventilation), and the video call performing unit 210 may transmit the responsive image including the first-aid method to the other user terminal 120.

The following describes an embodiment in which a responsive image is provided in the course of a consultation with a customer.

For example, when the user terminal 110 (e.g., a counselor's terminal) is consulting with the other user terminal 120 (e.g., a customer's terminal) and an explanation of the terms and conditions becomes necessary, the image generating unit 230 may generate a responsive image by rendering as text the terms-and-conditions information for the part that needs explanation, and the video call performing unit 210 may transmit the responsive image including the terms-and-conditions information to the other user terminal 120.

When the analysis unit 240 determines, from the call situation information of the other user terminal 120, that there is a part of the terms-and-conditions information that the other user has not understood, the image generating unit 230 may generate another responsive image that includes a supplementary image for that part, and the video call performing unit 210 may transmit the other responsive image to the other user terminal 120.

If the other user of the other user terminal 120 is driving and it is therefore difficult to receive the other responsive image, the video call performing unit 210 may, based on the location information of the other user terminal 120, switch from the other responsive image to the live-action image of the user terminal 110 and provide that live-action image to the other user terminal 120.

Those skilled in the art will readily understand that the model learning unit 200, the video call performing unit 210, the receiver 220, the image generating unit 230, and the analysis unit 240 may each be implemented separately, or that one or more of them may be implemented in integrated form.

FIG. 6 is a flowchart illustrating a method of providing a video call service according to an embodiment of the present invention.

Referring to FIG. 6, in step S601 the video call service providing server 100 may train an image generation model based on a plurality of pieces of learning data received from the user terminal 110.

In step S603, the video call service providing server 100 may perform a video call service between the user terminal 110 and the other user terminal 120.

In step S605, the video call service providing server 100 may receive an image generation request message from the user terminal 110 while the video call service is being performed.

In step S607, the video call service providing server 100 may generate, through the image generation model, a replacement image in which the user of the user terminal 110 appears, based on the image generation request message received from the user terminal 110.

In step S609, the video call service providing server 100 may transmit the generated replacement image to the other user terminal 120 in place of the live-action image that is captured by the user terminal 110 and provided through the video call service.

In the above description, steps S601 to S609 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. Some steps may also be omitted as necessary, and the order of the steps may be changed.
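Steps S603 to S609 can be sketched as a small server-side object that forwards live frames until a generation request arrives and then substitutes the generated image. The `VideoCallServer` class, its method names, and the use of strings in place of video frames are hypothetical simplifications; the trained model of step S601 is represented by a plain callable.

```python
from typing import Callable, Optional


class VideoCallServer:
    """Hypothetical stand-in for the video call service providing server 100."""

    def __init__(self, generate: Callable[[dict], str]) -> None:
        # S601: `generate` stands in for the trained image generation model.
        self._generate = generate
        self._replacement: Optional[str] = None

    def on_generation_request(self, message: dict) -> None:
        # S605: a request message arrives during the call.
        # S607: generate the replacement image from that message.
        self._replacement = self._generate(message)

    def outgoing_frame(self, live_frame: str) -> str:
        # S603: forward the live-action frame by default.
        # S609: once a replacement exists, send it instead of the live frame.
        if self._replacement is not None:
            return self._replacement
        return live_frame


# Usage with a trivial stand-in model.
server = VideoCallServer(lambda message: "replacement:" + message["text"])
first = server.outgoing_frame("live-1")
server.on_generation_request({"text": "in a meeting"})
second = server.outgoing_frame("live-2")
```

Here `first` is the unmodified live frame, and `second` is the generated replacement, which reflects the switch performed in step S609.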

FIG. 7 is a flowchart illustrating a method of providing a responsive image according to an embodiment of the present invention.

Referring to FIG. 7, in step S701 the video call service providing server 100 may analyze call situation information between the user terminal 110 and the other user terminal 120 that are on a video call. The call situation information may include at least one of location information, voice pitch information, behavior information, and facial expression information for each of the user terminal 110 and the other user terminal 120.

In step S703, the video call service providing server 100 may generate a responsive image based on the analyzed call situation information.

In step S705, the video call service providing server 100 may switch the replacement image, in which the user of the user terminal 110 appears and which is provided through the video call service, to the responsive image and transmit the responsive image to the other user terminal 120.

In step S707, the video call service providing server 100 may analyze reaction information of the other user terminal 120 that has received the responsive image.

In step S709, the video call service providing server 100 may generate another responsive image based on the reaction information of the other user terminal 120.

In step S711, the video call service providing server 100 may switch from the responsive image to the other responsive image and transmit it to the other user terminal 120.

In the above description, steps S701 to S711 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. Some steps may also be omitted as necessary, and the order of the steps may be changed.
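The analyze, generate, and transmit cycle of steps S701 to S711 can be sketched as a loop over successive reactions. The reduction of reaction information to the labels "negative" and "positive", the background names, and both function names are assumptions for illustration; they mirror the home and company backgrounds used in the example of FIG. 5D rather than any specified algorithm.

```python
from typing import Iterable, List


def next_background(reaction: str, situation_background: str, familiar_background: str) -> str:
    """S703 / S709: pick the background for the next responsive image."""
    # Assumption: a negative reaction calls for a background familiar to the other user.
    if reaction == "negative":
        return familiar_background
    return situation_background


def run_responsive_loop(
    reactions: Iterable[str],
    situation_background: str = "company",
    familiar_background: str = "home",
) -> List[str]:
    """Return a description of each image sent to the other terminal, in order."""
    sent: List[str] = []
    for reaction in reactions:  # S701 / S707: one analysis result per round
        background = next_background(reaction, situation_background, familiar_background)
        sent.append(f"user on {background} background")  # S705 / S711: transmit
    return sent
```

With the reactions "negative" then "positive", the loop first sends an image with the home background and then one with the company background, matching the order of the first and second responsive images in the example.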

An embodiment of the present invention may also be implemented in the form of a recording medium containing computer-executable instructions, such as a program module executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and non-volatile media as well as removable and non-removable media. A computer-readable medium may also include computer storage media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The foregoing description of the present invention is illustrative, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

The scope of the present invention is defined by the claims that follow rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: video call service providing server
110: user terminal
120: other user terminal
200: model learning unit
210: video call performing unit
220: receiver
230: image generating unit
240: analysis unit