WO2025075239A1


Info

Publication number: WO2025075239A1
Application number: PCT/KR2023/016892
Authority: WO
Inventors: 이수민
Original assignee: Wayne Hills Bryant AI Co Ltd
Current assignee: Wayne Hills Bryant AI Co Ltd
Priority date: 2023-10-05
Filing date: 2023-10-27
Publication date: 2025-04-10
Anticipated expiration: 2026-04-05
Also published as: WO2025075238A1

Abstract

A system for generating multimedia content on the basis of neural network signals, according to one embodiment, comprises: a neural network signal measurement instrument including sensors for measuring at least one type of neural network signal; an electronic device for controlling the neural network signal measurement instrument, identifying a user-intended prompt on the basis of at least one neural network signal acquired from the neural network signal measurement instrument, and outputting multimedia content generated by matching an image resource to the prompt; and a server for acquiring the prompt from the electronic device, and generating multimedia content by matching an image resource to the prompt on the basis of a core keyword for each of paragraphs including one or more sentences included in the prompt.

Description

Translated from Korean

Neural network signal-based artificial intelligence automated content generation/synthesis method and multimedia content generation system performing the same

The present disclosure relates to a method for generating/synthesizing content based on neural network signals. More specifically, it relates to a system for generating/synthesizing multimedia content based on neural network signals and an operating method thereof.

A wide variety of personalized content is being produced, and content is shifting from text-based forms toward video-based multimedia content, which is increasing exponentially. However, because producing, editing, and generating video is costly, various technologies for generating image- or video-based multimedia content are being introduced.

According to one embodiment, techniques for generating multimedia content from prompts have been disclosed, and services that use these techniques to let users produce the multimedia content they want at low cost have also been disclosed.

To generate the multimedia content that users want quickly and accurately, a combination of technologies must be developed, including technology for identifying user intent, technology for matching video to the identified intent and context, technology for supplying the generated video to the users who want it, and editing tools for editing the generated video.

Meanwhile, technologies are being developed that use brain-computer interfaces to generate text or images in order to recognize human thoughts effectively. In general, a brain-computer interface (BCI) acquires brain neural signals arising from neural responses induced by visual stimuli. For example, studies that generate characters, images, and video from electroencephalography (EEG) signals have been disclosed.

However, existing BCIs are limited in the accuracy of signal analysis and in the amount of information obtained per unit time, and technology is needed that accurately identifies user intent by using, in combination, the various types of signals obtainable from existing BCIs.

According to some embodiments, a system for generating multimedia content based on a user's neural network signal, and an operating method thereof, may be provided.

According to one embodiment, a system that generates multimedia content by identifying a prompt based on a user's neural network signal and matching at least one image resource to the prompt, and an operating method thereof, may be provided.

According to an embodiment of the present disclosure for achieving the above-described technical problem, a multimedia content generation system that generates multimedia content based on a neural network signal may be provided, the system comprising: a neural network signal measuring device including sensors for measuring at least one type of neural network signal; an electronic device that controls the neural network signal measuring device, identifies a prompt intended by a user based on at least one neural network signal obtained from the neural network signal measuring device, and outputs multimedia content generated by matching an image resource to the prompt; and a server that obtains the prompt from the electronic device and generates multimedia content by matching an image resource to the prompt based on a core keyword for each of the paragraphs, each including one or more sentences, included in the prompt.

According to another embodiment of the present disclosure for solving the above-described technical problem, a method of operating a multimedia content generation system may be provided, the method including: obtaining at least one type of neural network signal; identifying a prompt intended by a user based on the at least one neural network signal; matching an image resource to the prompt based on a core keyword for each of the paragraphs, each including one or more sentences, included in the identified prompt; and generating multimedia content by synthesizing the image resource matched to the prompt.

According to yet another embodiment of the present disclosure for achieving the above-described technical problem, a computer-readable recording medium may be provided that stores a program for performing a method by which a multimedia content generation system generates multimedia content based on a neural network signal, the method including: identifying a prompt intended by a user based on at least one type of neural network signal obtained from an external device; matching an image resource to the prompt based on a core keyword for each of the paragraphs, each including one or more sentences, included in the identified prompt; and generating multimedia content by synthesizing the image resource matched to the prompt.

According to one embodiment, the neural network signal measuring device, the electronic device, and the server operate in conjunction with one another, so that a prompt reflecting the user's intent can be identified accurately.

According to one embodiment, virtual input devices may be provided that are controlled according to the fatigue level of a user of the multimedia content generation service.

FIG. 1 is a diagram schematically illustrating a process of generating and synthesizing multimedia content based on a neural network signal according to one embodiment.

FIG. 2 is a diagram for explaining the types of neural network signals that an electronic device obtains from a neural network signal measuring device according to one embodiment, and the signal features that can be determined for each neural network signal.

FIG. 3 is a flowchart of a method by which an electronic device generates multimedia content based on a neural network signal according to one embodiment.

FIG. 4 is a diagram for explaining the operation of a neural-network-based network that an electronic device uses to generate a text image from a neural network signal according to one embodiment.

FIG. 5 is a flowchart of a specific method by which an electronic device identifies a text element or character element corresponding to a key area identified on a virtual input device image according to one embodiment.

FIG. 6 is a diagram for explaining a virtual input device image that an electronic device outputs on a display according to one embodiment, and a process of verifying a user's intention to select a predetermined key area on the virtual input device image.

FIG. 7 is a flowchart of a method by which an electronic device verifies a user's intention to select a predetermined key area on a virtual input device image according to one embodiment.

FIG. 8 is a flowchart of a specific method by which an electronic device verifies a user's intention according to one embodiment.

FIG. 9 is a diagram for explaining a process in which an electronic device provides a virtual input device and candidate texts in a plurality of interface areas according to one embodiment.

FIG. 10 is a diagram for explaining an operation in which an electronic device changes the frequency band of a visual stimulus according to a user's concentration level according to one embodiment.

FIG. 11 is a block diagram of a content generation system, an electronic device, and a server according to one embodiment.

FIG. 12 is a block diagram of an electronic device according to another embodiment.

FIG. 13 is a diagram for explaining a process in which a neural network signal measuring device, an electronic device, and a server operate in conjunction with one another to generate multimedia content according to one embodiment.

A system for generating multimedia content based on a neural network signal according to one embodiment may include: a neural network signal measuring device including sensors for measuring at least one type of neural network signal; an electronic device that controls the neural network signal measuring device, identifies a prompt intended by a user based on at least one neural network signal obtained from the neural network signal measuring device, and outputs multimedia content generated by matching an image resource to the prompt; and a server that obtains the prompt from the electronic device and generates multimedia content by matching an image resource to the prompt based on a core keyword for each of the paragraphs, each including one or more sentences, included in the prompt.

According to another embodiment, a method of operating a system for generating multimedia content based on a neural network signal may include: identifying a prompt intended by a user based on at least one type of neural network signal; matching an image resource to the prompt based on a core keyword for each of the paragraphs, each including one or more sentences, included in the identified prompt; and generating multimedia content by synthesizing the image resource matched to the prompt.

According to yet another embodiment, a computer-readable recording medium may be provided that stores a program for performing a method by which a multimedia content generation system generates multimedia content based on a neural network signal, the method including: identifying a prompt intended by a user based on at least one type of neural network signal obtained from an external device; matching an image resource to the prompt based on a core keyword for each of the paragraphs, each including one or more sentences, included in the identified prompt; and generating multimedia content by synthesizing the image resource matched to the prompt.

The terms used in this specification will be briefly explained, and then the present disclosure will be described in detail.

The terms used in the present disclosure have been chosen, as far as possible, from general terms in wide current use, taking into account their functions in the present disclosure; however, they may vary with the intent of those working in the field, precedent, the emergence of new technologies, and so on. In certain cases a term has been chosen arbitrarily by the applicant, in which case its meaning is set out in detail in the corresponding part of the description. Therefore, the terms used in the present disclosure should be defined based on their meaning and on the overall content of the present disclosure, not simply on their names.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components, not that it excludes them, unless specifically stated otherwise. In addition, terms such as "... unit" and "module" used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings so that a person of ordinary skill in the art to which the present disclosure pertains can readily practice them. However, the present disclosure may be implemented in many different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

According to one embodiment, the multimedia content generation system (10) may identify a prompt intended by a user based on a neural network signal, and may generate and synthesize multimedia content by applying an artificial intelligence model to the identified prompt. According to one embodiment, the multimedia content generation system (10) may include a neural network signal measuring device (120), an electronic device (1000), and a server (2000).

However, the system is not limited to the above example. According to another embodiment, the multimedia content generation system (10) may further include an electronic device (4000) connected via a network (3000) and a separate biosignal measuring device (140). The electronic device (1000) and the electronic device (4000) illustrated in FIG. 1 may be terminals or computer devices used by different users of the multimedia content provision service provided by the multimedia content generation system (10).

According to one embodiment, the neural network signal measuring device (120) may include at least one of a non-invasive sensor (122) or a brain-invasive sensor (124). According to another example, the neural network signal measuring device (120) may further include a network interface that connects to the electronic device (1000) to transmit and receive data, a processor that controls at least one of the network interface, the non-invasive sensor (122), or the brain-invasive sensor (124), and a memory that stores instructions executed by the processor. For example, the neural network signal measuring device (120) may obtain at least one type of neural network signal from a user by using at least one type of sensor.

According to one embodiment, the non-invasive sensor (122) may include at least one of an EEG sensor mounted on the scalp surface, an MEG sensor, or an infrared-based fNIRS (functional near-infrared spectroscopy) sensor, and may also include an fMRI device that measures changes in cerebral blood flow, a PET (positron emission tomography) device, or a SPECT (single-photon emission computed tomography) device. In addition, according to one embodiment, the brain-invasive sensor (124) may include an electrocorticography (ECoG) sensor connected to the surface of the cerebral cortex, a single signal measurement sensor, or a spike signal measurement sensor.

According to one embodiment, the neural network signal may include at least one of an electroencephalography (EEG) signal, an electrocorticography (ECoG) signal, or a spike signal, but is not limited thereto, and may include brain neural response signals generated by electrical or magnetic signals transmitted between the user's brain neurons. The neural network signal measuring device (120) may obtain neural network signals from a plurality of channels corresponding to at least one type of sensor. In addition, the neural network signal measuring device (120) may use the plurality of channels to measure, at predetermined intervals over time, neural network signals for each event relating to the user's actions or thoughts.

The biosignal measuring device (140) may measure the user's biosignals independently of the neural network signal. According to one embodiment, the biosignal measuring device (140) may measure at least one of the user's electrocardiogram (ECG), electromyogram (EMG), or electrooculogram (EOG) signals.

In S152, the electronic device (1000) may obtain at least one type of neural network signal from the neural network signal measuring device (120), and in S154, it may identify a prompt intended by the user based on the obtained at least one type of neural network signal. For example, the electronic device (1000) may control the neural network signal measuring device (120), identify a prompt intended by the user based on at least one neural network signal obtained from the neural network signal measuring device, and output multimedia content generated by matching an image resource to the prompt.

According to one embodiment, the electronic device (1000) may include at least one of a camera (130) or a display (131). The electronic device (1000) may identify the prompt based on the neural network signal of a user who has viewed a visual stimulus presented on the display, and on an eye-tracking signal determined from an image of the user's face captured by the camera. According to one embodiment, the prompt may include one or more sentences, paragraphs including one or more sentences, and text including the paragraphs.

In S156, the server (2000) may match an image resource to each paragraph included in the prompt obtained from the electronic device (1000). According to one embodiment, the server (2000) may obtain the prompt from the electronic device and generate multimedia content by matching an image resource to the prompt based on a core keyword for each of the paragraphs, each including one or more sentences, included in the prompt. For example, the server (2000) may match image resources by using an artificial-intelligence-based video generation model that, when text or a prompt is input, matches image resources based on the keywords of each paragraph in the text or in a summary of the text. In S158, the server (2000) may generate multimedia content by matching image resources to the prompt. Of course, S156 to S158 described above may be performed by either the server (2000) or the electronic device (1000).

According to one embodiment, the electronic device (1000) may obtain at least one type of neural network signal from a neural network signal measuring device (220) mounted on the outside or inside of the body of a user (210). From the neural network signal (230), the electronic device (1000) may measure features over time such as brain waves (242), SMR (243), ERP (244), or SSEP (246). In addition, although not shown in FIG. 2, the electronic device (1000) may further obtain, from the biosignal measuring device (140) rather than the neural network signal measuring device, an fNIRS feature relating to changes in the user's cerebral blood flow measured by an infrared sensor, an fMRI feature obtained from magnetic resonance imaging data, a PET feature, and a SPECT feature.
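As an illustration of one of the features listed above, the following is a minimal sketch of how an ERP feature might be computed from a single channel, assuming that ERP (244) denotes an event-related potential obtained by averaging fixed-length segments time-locked to events. The function names `extract_epochs` and `average_erp`, the sample values, and the event positions are hypothetical; the specification does not state how the feature is computed.

```python
from typing import List


def extract_epochs(samples: List[float], event_indices: List[int], length: int) -> List[List[float]]:
    """Cut one fixed-length segment per event, skipping events too close to the end."""
    return [samples[i:i + length] for i in event_indices if i + length <= len(samples)]


def average_erp(samples: List[float], event_indices: List[int], length: int) -> List[float]:
    """Average the event-locked segments sample by sample (assumed ERP estimate)."""
    epochs = extract_epochs(samples, event_indices, length)
    if not epochs:
        return []
    return [sum(column) / len(epochs) for column in zip(*epochs)]


# Hypothetical single-channel recording with two events at samples 0 and 4.
channel = [0.0, 1.0, 2.0, 3.0, 0.0, 3.0, 4.0, 5.0]
erp = average_erp(channel, event_indices=[0, 4], length=3)
```

Averaging across events suppresses activity that is not time-locked to the stimulus, which is why this kind of feature is usually computed over many repetitions rather than the two shown here.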

The electronic device (1000) may identify the prompt intended by the user by combining at least one type of neural network signal obtained from the neural network signal measuring device with at least one type of the multiple biosignals. According to one embodiment, the neural network signal measuring device (220) may include electrode sensors (221) of at least one of the brain-invasive or non-invasive type, an ADC circuit (222) for converting the potential signals obtained by the electrode sensors (221) into digital signals, and a network interface (223) for transmitting the obtained potential signals or the ADC-converted values to the electronic device (1000) over a wired or wireless connection. In addition, the neural network signal measuring device (220) may include a memory (224) that stores instructions for controlling the operation of the electrode sensors (221), the ADC circuit (222), and the network interface (223), and at least one processor (225) that executes the instructions stored in the memory (224). The neural network signal (230) that the electronic device (1000) obtains from the neural network signal measuring device (220) may include at least one of an EEG signal (232), an ECoG signal (234), or a spike signal (236).

In S310, the electronic device (1000) may identify a prompt intended by the user based on at least one type of neural network signal obtained from an external device. In S320, the electronic device (1000) may match an image resource to the prompt based on a core keyword for each of the paragraphs, each including one or more sentences, included in the identified prompt. In S330, the electronic device (1000) may generate multimedia content by synthesizing the image resource matched to the prompt.
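The three steps S310 to S330 can be pictured as a small pipeline. The sketch below only shows the order of operations. The decoder and keyword extractor are passed in as functions because the specification does not fix their implementation at this point, and every name here (`Scene`, `generate_content`, `fake_decoder`, `second_word`, the file names) is a hypothetical placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Scene:
    paragraph: str
    keyword: str
    resource: str


def generate_content(
    signals: List[float],
    identify_prompt: Callable[[List[float]], str],
    core_keyword: Callable[[str], str],
    resources: Dict[str, str],
) -> List[Scene]:
    """Run S310 -> S320 -> S330 for one batch of signal samples."""
    prompt = identify_prompt(signals)  # S310: signal -> prompt text
    paragraphs = [p.strip() for p in prompt.split("\n\n") if p.strip()]
    scenes = []
    for paragraph in paragraphs:  # S320: one image resource per paragraph
        keyword = core_keyword(paragraph)
        resource = resources.get(keyword, "placeholder.png")
        scenes.append(Scene(paragraph, keyword, resource))
    return scenes  # S330: the ordered scenes stand in for the synthesized content


# Hypothetical stand-ins for the signal decoder and the keyword extractor.
def fake_decoder(signals: List[float]) -> str:
    return "The sea is calm.\n\nThe mountain is high."


def second_word(paragraph: str) -> str:
    return paragraph.split()[1].lower()


scenes = generate_content(
    [0.1, 0.2],
    fake_decoder,
    second_word,
    {"sea": "sea.mp4", "mountain": "mountain.mp4"},
)
```

Passing the two stages in as callables mirrors the statement that any of S310 to S330 may run on the electronic device, on the server, or on both in cooperation.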

The electronic device (1000) may identify the prompt from the user's neural network signal by using an artificial intelligence model based on a generative adversarial network (GAN). The operation by which the electronic device (1000) identifies the prompt based on at least one type of neural network signal is described in detail below with reference to FIG. 4.

The electronic device (1000) may match image resources based on a core keyword for each of the one or more paragraphs included in the prompt. According to one embodiment, the electronic device (1000) may identify the target meaning of a core keyword depending on whether the core keyword determined for a paragraph is a polysemous word or a homonym. The electronic device (1000) may match an image resource to each paragraph based on either the target meaning that fits the context of the core keyword, or a sub-keyword. The sub-keyword is selected from among the morpheme keywords that have a dependency relationship with the target meaning and the core keyword, according to a priority based on how many times each is used in the paragraph.
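The specification does not say how the target meaning is chosen, so the following is only a sketch under a stated assumption: a simplified overlap heuristic picks the sense whose cue words appear most often in the paragraph, and the sub-keyword is the dependent word used most frequently. The sense inventory `SENSES`, the cue words, and the function names are all hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Optional, Set

# Hypothetical sense inventory: each sense of an ambiguous keyword lists cue words.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "bank": {
        "financial institution": {"money", "loan", "account", "deposit"},
        "river edge": {"river", "water", "shore", "fishing"},
    }
}


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def target_meaning(keyword: str, paragraph: str) -> Optional[str]:
    """Choose the sense whose cue words overlap most with the paragraph."""
    senses = SENSES.get(keyword)
    if not senses:
        return None  # keyword is not in the ambiguous-word inventory
    tokens = set(tokenize(paragraph))
    return max(senses, key=lambda sense: len(senses[sense] & tokens))


def sub_keyword(paragraph: str, dependents: List[str]) -> Optional[str]:
    """Pick the dependent word used most often in the paragraph, if any is used."""
    counts = Counter(tokenize(paragraph))
    ranked = sorted(dependents, key=lambda word: counts[word], reverse=True)
    return ranked[0] if ranked and counts[ranked[0]] > 0 else None


paragraph = "He sat on the bank of the river, fishing. The river was wide."
meaning = target_meaning("bank", paragraph)
sub = sub_keyword(paragraph, ["river", "fishing"])
```

In a real system the list of dependent words would come from a dependency parse of the paragraph; here it is supplied by hand to keep the example self-contained.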

The electronic device (1000) may determine the core keyword for each paragraph based on at least one of: the keyword frequency for each paragraph; user feedback cumulatively measured for each frequency; or a keyword network that includes information on the frequency of the keywords determined for each paragraph and on the connection relationships between those keywords. The electronic device (1000) may determine whether the core keyword determined for each paragraph is a homonym or a polysemous word based on a preset list of polysemous words or homonyms.
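A minimal sketch of the simplest of these options, keyword frequency alone, followed by the lookup against a preset list. The stop-word set, the contents of `AMBIGUOUS_WORDS`, and the function name are assumptions made for illustration; user feedback and the keyword network are not modeled.

```python
import re
from collections import Counter
from typing import Tuple

# Assumed stop words and an assumed preset list of polysemous words and homonyms.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "on", "in", "to", "by"}
AMBIGUOUS_WORDS = {"bank", "bat", "spring"}


def core_keyword(paragraph: str) -> Tuple[str, bool]:
    """Return the most frequent non-stop word and whether it is on the preset list."""
    words = [w for w in re.findall(r"[a-z]+", paragraph.lower()) if w not in STOPWORDS]
    if not words:
        return "", False
    keyword, _count = Counter(words).most_common(1)[0]
    return keyword, keyword in AMBIGUOUS_WORDS


keyword, needs_disambiguation = core_keyword("The bank by the river. The bank is old.")
```

When the second value is true, the keyword would be handed to the target-meaning step before any image resource is matched.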

The electronic device (1000) may generate a summary prompt by applying to the prompt a sequence-to-sequence model whose weights are determined based on attention, the attention in turn being determined according to the length of the prompt. It may adjust the paragraph division level by merging or splitting paragraphs based on the keyword network similarity of the paragraphs in the generated summary prompt, and may match an image resource to each of the divided paragraphs generated at the adjusted division level. The image resources may be labeled with labeling values that include at least one of a core keyword, a target meaning, or sub-keywords that have a dependency relationship with the target meaning. The electronic device (1000) may generate multimedia content by matching image resources to the prompt based on the labeling values of the image resources.
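The merging and label-matching steps can be sketched as follows. Jaccard overlap between keyword sets is used here as a stand-in for the keyword network similarity, which the specification does not define, and the 0.5 threshold, the function names, and the resource names are hypothetical. The summarization model itself is not shown.

```python
from typing import Dict, List, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two keyword sets, used as an assumed similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_similar(paragraph_keywords: List[Set[str]], threshold: float = 0.5) -> List[Set[str]]:
    """Merge a paragraph into the previous one when their keyword sets are similar enough."""
    merged: List[Set[str]] = []
    for keywords in paragraph_keywords:
        if merged and jaccard(merged[-1], keywords) >= threshold:
            merged[-1] = merged[-1] | keywords
        else:
            merged.append(set(keywords))
    return merged


def match_resource(keywords: Set[str], labeled: Dict[str, Set[str]]) -> Optional[str]:
    """Return the resource whose labels share the most keywords, or None if none overlap."""
    best = max(labeled, key=lambda name: len(labeled[name] & keywords), default=None)
    if best is None or not labeled[best] & keywords:
        return None
    return best


blocks = merge_similar([{"sea", "wave"}, {"sea", "wave", "boat"}, {"city"}])
library = {"harbor.mp4": {"sea", "boat"}, "street.mp4": {"city"}}
matches = [match_resource(block, library) for block in blocks]
```

Raising the threshold keeps more paragraphs separate and so yields more scenes; lowering it merges more aggressively, which is one way to read the adjustment of the paragraph division level.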

또 다른 예에 의하면, 도 3에 도시된 S310 내지 S330 중 적어도 하나의 동작은 전자 장치와 연결된 서버 또는 전자 장치와 서버의 연동에 의해 수행될 수 있음은 물론이다.As another example, it goes without saying that at least one of the operations S310 to S330 illustrated in FIG. 3 can be performed by a server connected to an electronic device or by linkage between an electronic device and a server.

도 4의 그림 (410) 내지 그림 (420)을 참조하여 전자 장치가 사용자의 뉴럴 네트워크 신호에 기초하여 텍스트 이미지를 식별하는 과정을 설명하기로 한다. 전자 장치(1000)는 적대적 생성 신경망(Generative Adversarial Network, GAN) 기반의 인공지능 모델(410)을 이용하여 뉴럴 네트워크 신호로부터 텍스트를 식별할 수 있다. 일 실시 예에 의하면, 적대적 생성 신경망 기반의 인공지능 모델(410)은 구별기(412), 생성기(414), 잠재벡터 생성기(416) 및 잠재 벡터 변환 네트워크(418)를 포함할 수 있다.Referring to Figures (410) to (420) of FIG. 4, a process in which an electronic device identifies a text image based on a user's neural network signal will be described. The electronic device (1000) can identify text from a neural network signal using an artificial intelligence model (410) based on a generative adversarial network (GAN). According to one embodiment, the artificial intelligence model (410) based on a generative adversarial network can include a discriminator (412), a generator (414), a latent vector generator (416), and a latent vector transformation network (418).

일 실시 예에 의하면, 적대적 생성 신경망 기반의 인공지능 모델(410)은 생성기(414)와 구별기(412)를 이용하여 뉴럴 네트워크 신호로부터 획득 가능한 텍스트 이미지 수준이 점진적으로 개선된, 텍스트 이미지를 생성할 수 있다. 예를 들어, 생성기(414)는 뉴럴 네트워크 신호에 기초하여 텍스트 이미지를 생성할 수 있다. 구별기(412)는 실제 텍스트 이미지와 가상의 텍스트 이미지를 입력으로 하여 참과 거짓을 구별할 수 있다. 생성기(414)는 구별기(412)의 판별에 따라 실제 텍스트 이미지와 유사한 가상의 텍스트 이미지를 생성하도록 학습되고, 구별기(412)는 실제 텍스트 이미지와 가상의 텍스트 이미지를 구별할 수 있도록 학습될 수 있다. 잠재 벡터 생성기(416)는 잠재 공간(LATENT SPACE)로부터 샘플링되어 생성기의 입력으로 사용될 잠재 벡터(LATENT VECTOR)를 생성할 수 있다.According to one embodiment, the generative adversarial network-based artificial intelligence model (410) can use the generator (414) and the discriminator (412) to generate text images whose quality, as obtainable from a neural network signal, improves progressively. For example, the generator (414) can generate a text image based on a neural network signal. The discriminator (412) can take a real text image and a generated (virtual) text image as input and classify each as real or fake. The generator (414) can be trained, according to the determination of the discriminator (412), to generate virtual text images similar to real text images, and the discriminator (412) can be trained to distinguish real text images from virtual text images. The latent vector generator (416) can generate a latent vector, sampled from a latent space, to be used as the input of the generator.
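For reference, the adversarial training described above is commonly written as the standard GAN minimax objective. The disclosure does not state a loss function, so the following is only the textbook form, with the symbols mapped onto the components here as an assumption: $G$ is the generator (414), $D$ the discriminator (412), $x$ a real text image, and $z$ a latent vector supplied by the latent vector generator (416).

```latex
\min_{G} \max_{D} \;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator maximizes this quantity by scoring real images high and generated images low, while the generator minimizes it by producing images the discriminator scores as real.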

일 실시 예에 의하면, 전자 장치(1000)는 제1 뉴럴 네트워크 신호(411)를 획득하고, 획득된 제1 뉴럴 네트워크 신호(411)를 생성기(414)로 전달할 수 있다. 생성기(414)는 제1 뉴럴 네트워크 신호(411)로부터 제1 텍스트 이미지 'X'(421)를 생성할 수 있다. 또 다른 예에 의하면, 생성기(414)는 잠재 벡터 생성기가, 제1 뉴럴 네트워크 신호(411)로부터 생성한 제1 잠재 벡터를 획득하고, 획득된 제1 잠재 벡터로부터 제1 텍스트 이미지(421)를 생성할 수 있다.According to one embodiment, the electronic device (1000) may obtain a first neural network signal (411) and transmit the obtained first neural network signal (411) to a generator (414). The generator (414) may generate a first text image 'X' (421) from the first neural network signal (411). According to another example, the generator (414) may obtain a first latent vector generated by a latent vector generator from the first neural network signal (411) and generate a first text image (421) from the obtained first latent vector.

전자 장치(1000)는 생성된 제1 텍스트 이미지(421)를 디스플레이 상에 출력하고, 디스플레이상에 출력된 제1 텍스트 이미지(421)를 확인한 사용자의 제2 뉴럴 네트워크 신호(423)를 뉴럴 네트워크 신호 측정기(120)로부터 획득할 수 있다. 전자 장치(1000)는 제1 텍스트 이미지(421)를 확인한 사용자의 제2 뉴럴 네트워크 신호를 적대적 생성 신경망 기반의 인공지능 모델에 입력함으로써 제2 텍스트 이미지(431)를 획득할 수 있다. 보다 상세하게는, 전자 장치(1000)는 제2 뉴럴 네트워크 신호(423)를 잠재 벡터 생성기(416)에 입력함으로써, 제2 잠재 벡터(425)를 획득할 수 있다. 전자 장치(1000)는 제2 뉴럴 네트워크 신호의 주파수 영역 별 파워 스펙트럼 특징 및 상기 특징들이 나타내는 패턴과, 상기 제1 텍스트 이미지가 시각적으로 제공된 시간과 상기 제2 뉴럴 네트워크 신호가 획득된 ERPs(Event Related Potentials)의 시간적 유사도를 결정할 수 있다.The electronic device (1000) can output the generated first text image (421) on the display, and obtain a second neural network signal (423) of a user who has confirmed the first text image (421) output on the display from a neural network signal measuring device (120). The electronic device (1000) can obtain a second text image (431) by inputting the second neural network signal of the user who has confirmed the first text image (421) into an artificial intelligence model based on an adversarial generative neural network. More specifically, the electronic device (1000) can obtain a second latent vector (425) by inputting the second neural network signal (423) into a latent vector generator (416). The electronic device (1000) can determine the power spectrum features of the second neural network signal in the frequency domain and the pattern represented by the features, and the temporal similarity between the time at which the first text image is visually provided and the ERPs (Event Related Potentials) from which the second neural network signal is acquired.

잠재 벡터 변환 네트워크(418)는 제2 뉴럴 네트워크 신호의 주파수 영역 별 파워 스펙트럼 특징 및 패턴과, 상기 시간적 유사도에 기초하여 제2 잠재 벡터(425)를 제3 잠재 벡터(427)로 변환할 수 있다. 전자 장치(1000)는 제3 잠재 벡터(427)를 적대적 생성 신경망 기반의 인공지능 모델의 생성기(414)에 재입력함으로써 재구성된 제2 텍스트 이미지(431)를 획득할 수 있다. 전자 장치(1000)는 제2 텍스트 이미지(431)가 미리 설정된 제3 텍스트 이미지(435)의 유사도를 결정하고, 제3 텍스트 이미지와 유사하다고 식별되면, 제2 텍스트 이미지에 대응되는 텍스트 요소 또는 문자 요소를 식별함으로써, 사용자가 의도한 프롬프트를 결정할 수 있다.The latent vector transformation network (418) can transform the second latent vector (425) into a third latent vector (427) based on the power spectrum features and patterns of the second neural network signal in the frequency domain and the temporal similarity. The electronic device (1000) can obtain a reconstructed second text image (431) by re-inputting the third latent vector (427) into the generator (414) of the artificial intelligence model based on the adversarial generative neural network. The electronic device (1000) determines the similarity between the second text image (431) and a preset third text image (435), and if the second text image (431) is identified as being similar to the third text image, identifies a text element or character element corresponding to the second text image, thereby determining a prompt intended by the user.

후술하는 도 6에 도시된 바와 같이, 전자 장치(1000)는 프롬프트 입력을 위해 적어도 하나의 타입의 가상 입력 장치 이미지를 디스플레이상에 출력하고, 뉴럴 네트워크 신호 측정기로부터, 가상 입력 장치 이미지를 확인한 사용자의 제1 뉴럴 네트워크 신호를 획득할 수 있다. 전자 장치(1000)는 도 4에서 상술한 방법에 따라 제2 텍스트 이미지가 식별되면, 제2 텍스트 이미지와 동일한 텍스트 요소 또는 자판 요소를 포함하는 자판 영역을, 가상 입력 장치 이미지상에서 식별할 수 있다.As illustrated in FIG. 6 described below, the electronic device (1000) can output at least one type of virtual input device image on a display for a prompt input, and obtain a first neural network signal of a user who has confirmed the virtual input device image from a neural network signal measuring device. When a second text image is identified according to the method described above in FIG. 4, the electronic device (1000) can identify a keyboard area including the same text element or keyboard element as the second text image on the virtual input device image.

S510에서, 전자 장치(1000)는 제1 뉴럴 네트워크 신호로부터 생성한 제2 텍스트 이미지와 상기 제3 텍스트 이미지의 유사도가 임계 유사도 이상으로 식별되면, 상기 적어도 하나의 타입의 가상 입력 장치 이미지상 상기 제2 텍스트 이미지에 대응되는 자판 영역을 식별할 수 있다. 예를 들어, 전자 장치(1000)는 제2 텍스트 이미지의 모양 특징 또는 신경망(예컨대 CNN 출력 값 특징) 기반 특징과, 가상 입력 장치 이미지에 포함된 자판 영역상 텍스트 이미지의 모양 특징 또는 신경망 기반 특징을 비교함으로써 유사도를 결정할 수 있고, 유사도가 임계 유사도 이상으로 식별되는 가상 입력 장치 이미지상 자판 영역을 식별할 수 있다.In S510, if the electronic device (1000) identifies that the similarity between the second text image generated from the first neural network signal and the third text image is greater than or equal to a threshold similarity, the electronic device (1000) can identify a keyboard area corresponding to the second text image on the at least one type of virtual input device image. For example, the electronic device (1000) can determine the similarity by comparing shape features or neural network (e.g., CNN output value features)-based features of the second text image with shape features or neural network-based features of the text image on the keyboard area included in the virtual input device image, and can identify a keyboard area on the virtual input device image whose similarity is greater than or equal to a threshold similarity.
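The feature comparison in S510 can be sketched as follows, assuming the features are plain numeric vectors (for example CNN outputs) and that cosine similarity is the similarity measure. The disclosure does not name a specific measure, and `find_key_region` and the 0.9 threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_key_region(image_features, key_features, threshold=0.9):
    """Return the key whose features are most similar to the text image,
    or None if no key reaches the threshold similarity."""
    best_key, best_sim = None, threshold
    for key, features in key_features.items():
        sim = cosine_similarity(image_features, features)
        if sim >= best_sim:
            best_key, best_sim = key, sim
    return best_key


# Hypothetical 3-dimensional features for two key regions.
key_features = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0]}
print(find_key_region([0.9, 0.1, 0.0], key_features))
```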

도 5에는 도시되지 않았지만, 또 다른 실시 예에 의하면, 전자 장치(1000)는 제1 뉴럴 네트워크 신호에 기초하여, 가상 입력 장치의 자판 영역에 대응되는 텍스트 요소 또는 문자 요소가 아닌, 단어 또는 어절 단위의 제1 텍스트 이미지를 생성하고, 상기 생성된 단어 또는 어절 단위의 제1 텍스트 이미지를 확인하는 사용자의 제2 뉴럴 네트워크 신호를 적대적 생성 신경망 모델에 입력함으로써, 상기 적대적 생성 신경망 모델로부터 단어 또는 어절 단위의 제2 텍스트 이미지를 획득하며, 상기 제2 텍스트 이미지가 미리 설정된 제3 텍스트 이미지와 유사한지 여부에 기초하여, 상기 제2 텍스트 이미지에 대응되는 단어 또는 어절을 포함하는 프롬프트를 식별할 수도 있다.Although not illustrated in FIG. 5, according to another embodiment, the electronic device (1000) may generate, based on a first neural network signal, a first text image in units of words or phrases rather than a text element or character element corresponding to a keyboard area of a virtual input device; obtain a second text image in units of words or phrases from a generative adversarial network model by inputting into that model a second neural network signal of the user viewing the generated first text image; and identify a prompt including the word or phrase corresponding to the second text image based on whether the second text image is similar to a preset third text image.

S520에서, 전자 장치(1000)는 자판 영역이 식별되면, 미리 설정된 검증 시간 동안, 상기 자판 영역에 대응되는 텍스트 요소 또는 문자 요소를 선택할 것인지 여부에 대한 사용자의 의사를 검증할 수 있다. 예를 들어, 전자 장치(1000)는 제1 뉴럴 네트워크 신호로부터 생성된 텍스트 이미지에 대응되는 자판 영역이 식별되면, 바로 자판 영역에 대응되는 텍스트 요소를 입력된 텍스트로 결정하는 것이 아니라, 해당 자판 영역에 대응되는 텍스트 요소 또는 문자 요소를 사용자가 선택하려는 의사가 있는지 여부를 검증함으로써, 텍스트 입력 정확도를 향상시킬 수 있다. S530에서, 전자 장치(1000)는 검증 결과에 기초하여 상기 자판 영역에 대응되는 텍스트 요소 또는 문자 요소를 식별할 수 있다.In S520, when a keyboard area is identified, the electronic device (1000) can verify the user's intention as to whether to select a text element or character element corresponding to the keyboard area during a preset verification time. For example, when a keyboard area corresponding to a text image generated from a first neural network signal is identified, the electronic device (1000) does not immediately determine the text element corresponding to the keyboard area as the input text, but verifies whether the user has an intention to select the text element or character element corresponding to the keyboard area, thereby improving text input accuracy. In S530, the electronic device (1000) can identify the text element or character element corresponding to the keyboard area based on the verification result.

도 6을 참조하면, 전자 장치(1000)가 제공하는 가상 입력 장치(610)의 일 예가 도시된다. 가상 입력 장치(610)는 디스플레이상 일부 영역에 제공될 수 있고, 가상 입력 장치를 통해 실시간으로 입력된 텍스트들이 표시되는 인터페이스 영역과 구분되어 디스플레이상에 제공될 수도 있다. 또한, 일 실시 예에 의하면, 가상 입력 장치는 본 개시에 따른 영상 생성/합성 서비스를 이용하는 사용자들이 메타 버스 공간상 영상 생성/합성 서비스를 수행하는 과정에서 이용될 수도 있다. 이를 위해, 상기 가상 입력 장치 인터페이스들은 다중 접속 허용이 가능한 메타버스 공간상에서 이미지 또는 3D 모델링화되어 제공될 수 도 있다. 가상 입력 장치(610)는 복수의 자판 영역들(614, 618)을 포함할 수 있다. 전자 장치(1000)는 사용자의 뉴럴 네트워크 신호에 기초하여 1차적으로 식별된 자판 영역(612)을 다른 자판 영역과 다르게 시각적으로 표시할 수 있다.Referring to FIG. 6, an example of a virtual input device (610) provided by an electronic device (1000) is illustrated. The virtual input device (610) may be provided in a portion of a display area, and may be provided on the display separately from an interface area where texts input in real time through the virtual input device are displayed. In addition, according to an embodiment, the virtual input device may be used by users of the image generation/synthesis service according to the present disclosure in a process of performing the image generation/synthesis service in a metaverse space. To this end, the virtual input device interfaces may be provided as images or 3D modeled in a metaverse space that allows multiple accesses. The virtual input device (610) may include a plurality of keyboard areas (614, 618). The electronic device (1000) may visually display a keyboard area (612) that is primarily identified based on a user's neural network signal differently from other keyboard areas.

예를 들어, 전자 장치(1000)는 사용자의 뉴럴 네트워크 신호에 기초하여 식별된 자판 영역(623)을 시각적 특성(색상특성, 글자크기 특성, 자판 면적 또는 크기 특성)에 기초하여 변조함으로써 변조된 자판 영역(625)을 생성할 수 있다. 전자 장치(1000)는 변조된 자판 영역(625)과 변조전 자판 영역(623)을 미리 설정된 주기(622)로 번갈아가면서 디스플레이상 또는 자판 영역의 일부 영역에 중첩하여 출력할 수 있다. 주기(622)는 변조전 자판 영역(623)이 처음 표시된 시점으로부터, 변조된 자판 영역 (625)이 표시된 이후, 변조 전 자판 영역(627)이 다시 출력되는 시점까지의 시간을 의미할 수 있다.For example, the electronic device (1000) may generate a modulated keyboard area (625) by modulating a keyboard area (623), identified based on a user's neural network signal, in terms of its visual characteristics (color, font size, or key area or size). The electronic device (1000) may output the modulated keyboard area (625) and the pre-modulation keyboard area (623) alternately at a preset cycle (622), overlaid on the display or on a portion of the keyboard area. The cycle (622) may mean the time from when the pre-modulation keyboard area (623) is first displayed, through the display of the modulated keyboard area (625), until the pre-modulation keyboard area (627) is output again.

전자 장치(1000)는 사용자의 뉴럴 네트워크 신호에 기초하여 식별된 자판 영역을 시각적으로 변조 후, 변조된 자판 영역과 변조 전 자판 영역을 미리 설정된 주기로 번갈아 표시하면서, 사용자로 하여금 가상 입력 장치 이미지상 어떤 자판 영역이 식별된 상태인지에 대한 정보를 제공하고, 해당 자판 영역을 선택할 의사가 있는 지 여부에 대한 사용자 입력을 대기 중인 상태임을 나타낼 수 있다.The electronic device (1000) can visually modulate a keyboard area identified based on a user's neural network signal, and then alternately display the modulated keyboard area and the keyboard area before modulation at a preset cycle, thereby providing the user with information about which keyboard area is identified on the virtual input device image, and indicating that the device is waiting for a user input regarding whether or not the user intends to select the corresponding keyboard area.

또한, 일 실시 예에 의하면, 전자 장치(1000)는 소정의 검증 시간 동안, 사용자의 의사를 검증하기 위해 서로 다른 주파수로 플리커링되는 기호 패턴들을 포함하는 검증 영역(642)을 자판 영역의 일부 영역에 중첩하여 표시할 수 있다. 일 실시 예에 의하면, 검증 영역(642)은 제1 뉴럴 네트워크 신호에 기초하여 1차적으로 자판 영역이 식별되면, 소정의 검증 시간 동안만 가상 입력 장치 이미지 상 소정의 자판 영역에 중첩하여 표시될 수 있다. 전자 장치(1000)는 가상 입력 장치 이미지 내 자판 영역들에 검증 영역(642)을 항상 표시하는 것이 아니라, 소정의 검증 시간 동안만 텍스트 식별 정확도 향상을 위해 검증 영역을 표시함으로써 사용자의 시각적 불편을 최소화할 수 있다.In addition, according to one embodiment, the electronic device (1000) may display a verification area (642) including symbol patterns that flicker at different frequencies, overlapping a part of the keyboard area for a predetermined verification time to verify the user's intention. According to one embodiment, when the keyboard area is primarily identified based on the first neural network signal, the verification area (642) may be displayed overlapping a predetermined keyboard area on the virtual input device image only for a predetermined verification time. The electronic device (1000) may minimize visual discomfort to the user by displaying the verification area only for a predetermined verification time to improve text identification accuracy, rather than always displaying the verification area (642) on the keyboard areas within the virtual input device image.

전자 장치(1000)는 검증 영역에 대한 사용자의 제3 뉴럴 네트워크 신호 또는 검증 영역에 대한 사용자의 제1 아이트래킹 신호 중 적어도 하나에 기초하여, 검증 영역을 포함하는 자판 영역에 대응되는 텍스트 요소를 선택하려는 사용자의 의사를 검증할 수도 있다. 그림 (640)에 도시된 바와 같이, 일 실시 예에 따른 검증 영역(642)은 제1 타입의 주파수로 플리커링되는 제1 부분 검증 영역(644)과 제2 타입의 주파수로 플리커링되는 제2 부분 검증 영역(646)을 포함할 수 있다. 상기 제1 부분 검증 영역(644)과 제2 부분 검증 영역(646)에 나타나는 시각적 자극이 플리커링되는 주파수 영역은 1Hz에서 40Hz 사이의 주파수 영역 중 서로 다른 일부 주파수 영역으로 결정될 수 있다. 일 실시 예에 의하면 제1 부분 검증 영역(644)은 제1 주파수 값으로 플리커링될 수 있고, 제2 부분 검증 영역(646)은 상기 제1 주파수와 다른 주파수 값으로 플리커링될 수 있다.The electronic device (1000) may verify a user's intention to select a text element corresponding to a keyboard area including a verification area based on at least one of a user's third neural network signal for the verification area or a user's first eye tracking signal for the verification area. As illustrated in the drawing (640), a verification area (642) according to an embodiment may include a first partial verification area (644) flickering with a first type of frequency and a second partial verification area (646) flickering with a second type of frequency. Frequency ranges in which visual stimuli appearing in the first partial verification area (644) and the second partial verification area (646) flicker may be determined as different frequency ranges from 1 Hz to 40 Hz. According to an embodiment, the first partial verification area (644) may flicker with a first frequency value, and the second partial verification area (646) may flicker with a frequency value different from the first frequency.

또한, 일 실시 예에 의하면, 변조전 자판 영역(623)과 변조된 자판 영역(625)이 번갈아가며 표시되는 주파수(예컨대 빈도수)는 제1 부분 검증 영역(644)과 제2 부분 검증 영역(646)이 플리커링되는 주파수 값들을 포함하는 주파수 영역대 보다 낮은 주파수 영역대 중 하나의 주파수 값으로 결정될 수 있다. 이를 통해 전자 장치(1000)는 1차적으로 식별된 자판 영역을 사용자에게 표시함과 함께 해당 자판 영역을 선택하려는 의사가 있는지 여부를 검증하기 위한 검증 영역에 사용자의 집중을 효과적으로 유도할 수 있다.In addition, according to one embodiment, the frequency (e.g., the alternation rate) at which the pre-modulation keyboard area (623) and the modulated keyboard area (625) are alternately displayed may be set to a frequency value in a band lower than the band containing the frequency values at which the first partial verification area (644) and the second partial verification area (646) flicker. In this way, the electronic device (1000) can show the user the primarily identified keyboard area while effectively drawing the user's attention to the verification area used to verify whether the user intends to select that keyboard area.

예를 들어, 전자 장치(1000)는 가상 입력 장치 이미지를 확인한 사용자의 제1 뉴럴 네트워크 신호에 기초하여 1차적으로 가상 입력 장치 이미지상 하나의 자판 영역이 식별되면, 해당 자판 영역에 따른 텍스트 요소를 선택할 의사가 있는지 여부를, 제3 뉴럴 네트워크 신호의 분석 결과에 기초하여 검증할 수 있다. 보다 상세하게는, 전자 장치(1000)는 소정의 검증 시간 동안 출력되는 검증 영역을 확인한 사용자의 제3 뉴럴 네트워크 신호를 획득하고, 획득된 제3 뉴럴 네트워크 신호의 푸리에 변환을 통한 주파수 도메인 영역에서, 서로 다른 주파수를 포함하는 주파수 영역대별 파워 스펙트럼 세기에 기초하여 자판 영역에 대응되는 텍스트 요소를 선택하려는 사용자의 의사를 검증할 수 있다.For example, once a keyboard area on the virtual input device image is primarily identified based on a first neural network signal of a user who has viewed the virtual input device image, the electronic device (1000) can verify, based on an analysis of a third neural network signal, whether the user intends to select the text element corresponding to that keyboard area. More specifically, the electronic device (1000) can obtain a third neural network signal of the user who has viewed the verification area output during a predetermined verification time, and can verify the user's intention to select the text element corresponding to the keyboard area based on the power spectrum intensity of each frequency band, the bands containing different frequencies, in the frequency domain obtained by Fourier-transforming the third neural network signal.

예를 들어, 자판 영역(643)이 1차적으로 식별된 상태에서 그림 (620)에 따라 식별된 자판 영역이 주기적으로 시각적으로 변조 표시되는 동안, 전자 장치(1000)는 제3 뉴럴 네트워크 신호로부터 제1 부분 검증 영역(644)의 플리커링 주파수에 대응되는 주파수 값을 포함하는 제1 주파수 영역대의 제1 파워 스펙트럼 세기를 결정할 수 있고, 제3 뉴럴 네트워크 신호로부터 제2 부분 검증 영역(646)의 플리커링 주파수에 대응되는 주파수 값을 포함하는 제2 주파수 영역대의 제2 파워 스펙트럼 세기를 결정할 수 있다. 전자 장치(1000)는 제1 파워 스펙트럼 세기와 제2 파워 스펙트럼 세기 중, 임계치 이상의 파워스펙트럼 세기를 나타내는 주파수 영역대의 주파수 값에 대응되는 부분 검증 영역의 플리커링 주파수를 식별할 수 있다.For example, while the keyboard area (643) is primarily identified and the keyboard area identified according to the drawing (620) is periodically visually modulated and displayed, the electronic device (1000) can determine a first power spectrum intensity of a first frequency domain including a frequency value corresponding to the flickering frequency of the first partial verification area (644) from the third neural network signal, and can determine a second power spectrum intensity of a second frequency domain including a frequency value corresponding to the flickering frequency of the second partial verification area (646) from the third neural network signal. The electronic device (1000) can identify a flickering frequency of the partial verification area corresponding to a frequency value of a frequency domain indicating a power spectrum intensity greater than a threshold value among the first power spectrum intensity and the second power spectrum intensity.

예를 들어, 전자 장치(1000)는 제1 파워 스펙트럼의 세기가 임계 파워스펙트럼 세기 이상으로 식별되는 경우, 사용자가 제1 부분 검증 영역(644)에 집중하고 있는 상태로 결정하고, 상기 자판 영역(643)에 대응되는 텍스트 요소를 정말로 선택하려는 의사가 검증된 것으로 결정할 수 있다. 또 다른 예로, 전자 장치(1000)는 제2 파워 스펙트럼의 세기가 임계 파워 스펙트럼 세기 이상으로 식별되는 경우, 사용자가 제2 부분 검증 영역(646)에 집중하고 있는 상태로 결정하고, 상기 1차적으로 식별된 자판 영역(643)에 대응되는 텍스트 요소를 선택하려는 의사가 검증되지 않은 것으로 결정할 수 있다. 또 다른 예에 의하면, 전자 장치(1000)는 제1 파워 스펙트럼의 세기와 제2 파워 스펙트럼의 세기를 비교하고, 파워 스펙트럼의 세기가 더 크게 식별되는 파워 스펙트럼 주파수 영역대의 주파수 값에 대응되는, 플리커링 주파수로 플리커링되고 있는 부분 검증 영역을 식별할 수도 있다.For example, if the intensity of the first power spectrum is identified as being greater than or equal to a threshold power spectrum intensity, the electronic device (1000) may determine that the user is focusing on the first partial verification area (644), and determine that the user's intention to select a text element corresponding to the keyboard area (643) has been verified. As another example, if the intensity of the second power spectrum is identified as being greater than or equal to a threshold power spectrum intensity, the electronic device (1000) may determine that the user is focusing on the second partial verification area (646), and determine that the user's intention to select a text element corresponding to the primarily identified keyboard area (643) has not been verified. As another example, the electronic device (1000) may compare the intensity of the first power spectrum and the intensity of the second power spectrum, and identify a partial verification area that is flickering at a flickering frequency corresponding to a frequency value of a power spectrum frequency range in which the intensity of the power spectrum is identified as being greater.
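The threshold comparison described in the preceding paragraphs can be sketched in a few lines of Python. The sketch evaluates a single DFT component at each flicker frequency instead of a full band, and the function names, sampling rate, flicker frequencies, and threshold are all assumed values chosen for illustration. The "ambiguous" result corresponds to the case the disclosure later resolves with the eye-tracking signal.

```python
import cmath
import math


def power_at(signal, fs, freq):
    """Normalized power of the single DFT component at `freq` Hz."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * k / fs)
              for k, x in enumerate(signal))
    return abs(acc) ** 2 / n ** 2


def verify_selection(signal, fs, f_first, f_second, threshold):
    """Compare power at the two flicker frequencies against a threshold.

    f_first:  flicker frequency of the first partial verification area
    f_second: flicker frequency of the second partial verification area
    """
    p_first = power_at(signal, fs, f_first)
    p_second = power_at(signal, fs, f_second)
    if p_first >= threshold and p_second < threshold:
        return "verified"        # user attends to the first partial area
    if p_second >= threshold and p_first < threshold:
        return "not verified"    # user attends to the second partial area
    return "ambiguous"           # both above or both below the threshold


# Synthetic 2-second signal at 250 Hz dominated by a 10 Hz component.
fs = 250
signal = [math.sin(2 * math.pi * 10 * k / fs) for k in range(2 * fs)]
print(verify_selection(signal, fs, f_first=10, f_second=15, threshold=0.1))
```

A unit-amplitude sinusoid yields a normalized power of 0.25 at its own frequency under this definition, which is why a threshold of 0.1 separates the two cases in the synthetic example.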

도 7을 참조하면 도 6을 참조하여 상술한 전자 장치가 사용자의 의사를 검증하는 흐름도가 도시된다. S710에서, 전자 장치(1000)는 검증 시간 동안, 상기 식별된 자판 영역을 시각적 특성에 기초하여 변조함으로써 생성되는 자판 영역과 변조 전 상기 식별된 자판 영역을 기 설정된 주기로 번갈아가면서 상기 디스플레이 상에 출력할 수 있다. S720에서, 전자 장치(1000)는 검증 시간 동안, 상기 사용자의 의사를 검증하기 위해 서로 다른 주파수로 플리커링되는 기호 패턴들을 포함하는 검증 영역을 상기 자판 영역의 일부 영역에 중첩하여 표시할 수 있다. 상기 검증 영역은 서로 다른 플리커링 주파수로 디스플레이 상에 표시되는 복수의 부분 검증 영역들을 포함할 수 있고, 플리커링 대상이 되는 기호 패턴들은 서로 다른 패턴으로 마련될 수 있다.Referring to FIG. 7, a flowchart for verifying a user's intention by the electronic device described above with reference to FIG. 6 is illustrated. In S710, the electronic device (1000) can alternately output a keyboard area generated by modulating the identified keyboard area based on visual characteristics and the identified keyboard area before modulation on the display at a preset cycle during a verification time. In S720, the electronic device (1000) can display a verification area including symbol patterns flickering at different frequencies to overlap a portion of the keyboard area in order to verify the user's intention during the verification time. The verification area can include a plurality of partial verification areas displayed on the display at different flickering frequencies, and symbol patterns to be flickered can be provided as different patterns.

S730에서, 전자 장치(1000)는 검증 영역에 대한 사용자의 제3 뉴럴 네트워크 신호 또는 상기 검증 영역에 대한 사용자의 제1 아이트래킹 신호 중 적어도 하나에 기초하여, 상기 검증 영역을 포함하는 자판 영역에 대응되는 텍스트 요소를 선택하려는 상기 사용자의 의사를 검증할 수 있다. 일 실시 예에 의하면 전자 장치(1000)는 제3 뉴럴 네트워크 신호에 포함된 서로 다른 주파수를 포함하는 주파수 영역 대별 파워 스펙트럼의 세기에 기초하여 자판 영역에 대응되는 텍스트 요소를 선택하려는 사용자의 의사를 검증할 수 있다. S740에서, 전자 장치(1000)는 상기 텍스트 요소를 선택하려는 사용자의 의사가 검증되면 상기 자판 영역에 대응되는 텍스트 요소를 식별할 수 있다.In S730, the electronic device (1000) can verify the user's intention to select a text element corresponding to a keyboard area including the verification area based on at least one of the user's third neural network signal for the verification area or the user's first eye tracking signal for the verification area. According to one embodiment, the electronic device (1000) can verify the user's intention to select a text element corresponding to the keyboard area based on the intensity of a power spectrum by frequency domain including different frequencies included in the third neural network signal. In S740, the electronic device (1000) can identify the text element corresponding to the keyboard area when the user's intention to select the text element is verified.

도 7에 도시되지는 않았지만 또 다른 실시 예에 의하면, 전자 장치(1000)는 제3 뉴럴 네트워크 신호에 포함된, 상기 주파수 영역대별 파워 스펙트럼의 세기들 중, 하나의 파워 스펙트럼의 세기가 임계치 이상으로 식별되더라도, 상기 제3 뉴럴 네트워크 신호의 알파 밴드 영역 에너지가 감소하고, 베타 밴드 에너지 영역이 증가하는 것으로 식별되면, 상기 제3 뉴럴 네트워크 신호에 따라 상기 텍스트 요소를 선택하려는 사용자의 의사가 검증되지 않은 것으로 결정할 수 있다. 예를 들어, 전자 장치(1000)는 소정의 검증 시간 동안 해당 자판 영역을 선택하려는 사용자 의사를 검증하기 위한 제3 뉴럴 네트워크 신호의 알파 밴드 영역(예컨대 8Hz 내지 12Hz)의 파워스펙트럼 세기와, 베타 밴드 영역(12Hz에서 30Hz)의 파워스펙트럼의 세기를 결정할 수 있다.Although not illustrated in FIG. 7, according to another embodiment, even if one of the per-band power spectrum intensities included in the third neural network signal is identified as being at or above the threshold, the electronic device (1000) may determine that the user's intention to select the text element according to the third neural network signal is not verified when the alpha band energy of the third neural network signal is identified as decreasing and the beta band energy as increasing. For example, the electronic device (1000) may determine the power spectrum intensity of the alpha band (e.g., 8 Hz to 12 Hz) and the power spectrum intensity of the beta band (12 Hz to 30 Hz) of the third neural network signal used to verify, during a predetermined verification time, the user's intention to select the corresponding keyboard area.

전자 장치(1000)는 제3 뉴럴 네트워크 신호 내 알파 영역의 파워 스펙트럼의 세기가 감소하고, 베타 밴드 영역의 파워 스펙트럼의 세기가 증가하는 것으로 식별되면, 제3 뉴럴 네트워크 신호상 동작상상(Motor Imagery)에 따른 노이즈로 식별하고, 제3 뉴럴 네트워크 신호에 따라 텍스트 요소를 선택하려는 사용자의 의사가 검증되지 않은 것으로 결정할 수도 있다. 본 개시에 따른 전자 장치(1000)는 가상 입력 장치에 대한 검증 영역에 집중하는 사용자의 상상동작에 의한 노이즈 발생을 감지하고, 해당 상상동작에 의한 노이즈 발생시 1차적으로 식별된 자판 영역에 대한 텍스트 요소에 대한 사용자 의사가 검증되지 않는 것으로 결정하기 때문에 보다 텍스트 정확도를 향상시킬 수 있는 장점이 있다.If the electronic device (1000) identifies that the power spectrum intensity of the alpha band in the third neural network signal decreases and that the power spectrum intensity of the beta band increases, it may treat this as noise in the third neural network signal caused by motor imagery, and determine that the user's intention to select a text element according to the third neural network signal is not verified. Because the electronic device (1000) according to the present disclosure detects noise caused by the motor imagery of a user concentrating on the verification area of the virtual input device and, when such noise occurs, determines that the user's intention regarding the text element of the primarily identified keyboard area is not verified, it has the advantage of further improving text input accuracy.
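A minimal sketch of this motor-imagery check follows. The disclosure does not say what the alpha decrease and beta increase are measured against; the sketch assumes a comparison with a baseline window recorded beforehand, and `band_power` and `is_motor_imagery_noise` are hypothetical names. Band edges follow the 8 to 12 Hz and 12 to 30 Hz ranges given in the text.

```python
import cmath
import math


def band_power(signal, fs, lo, hi):
    """Sum of normalized DFT power over bins with lo <= frequency < hi (Hz)."""
    n = len(signal)
    total = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if lo <= freq < hi:
            acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                      for i, x in enumerate(signal))
            total += abs(acc) ** 2 / n ** 2
    return total


def is_motor_imagery_noise(baseline, window, fs):
    """True when alpha power drops and beta power rises versus the baseline."""
    alpha_drop = band_power(window, fs, 8, 12) < band_power(baseline, fs, 8, 12)
    beta_rise = band_power(window, fs, 12, 30) > band_power(baseline, fs, 12, 30)
    return alpha_drop and beta_rise


def tone_mix(fs, seconds, amp_10hz, amp_20hz):
    """Synthetic signal: a 10 Hz (alpha) tone plus a 20 Hz (beta) tone."""
    return [amp_10hz * math.sin(2 * math.pi * 10 * i / fs)
            + amp_20hz * math.sin(2 * math.pi * 20 * i / fs)
            for i in range(int(fs * seconds))]


fs = 100
baseline = tone_mix(fs, 1, amp_10hz=1.0, amp_20hz=0.2)  # alpha-dominant
window = tone_mix(fs, 1, amp_10hz=0.3, amp_20hz=1.0)    # beta-dominant
print(is_motor_imagery_noise(baseline, window, fs))
```

When the function returns true, the selection would be treated as not verified even if one flicker-frequency component exceeded its threshold.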

S810에서, 전자 장치(1000)는 제3 뉴럴 네트워크 신호에 포함된, 상기 기호 패턴들을 플리커링하는데 사용되는 서로 다른 주파수에 대응되는 주파수 영역 대별 파워 스펙트럼의 세기들이 모두 임계치 이상 또는 모두 임계치 이하로 식별되는 경우 상기 제1 아이트래킹 신호에 기초하여, 상기 디스플레이 상에서 사용자 초점 영역을 식별할 수 있다. 예를 들어, 제3 뉴럴 네트워크 신호로부터 제1 부분 검증 영역의 플리커링 주파수에 대응되는 주파수 영역대의 제1 파워 스펙트럼의 세기와, 제2 부분 검증 영역의 플리커링 주파수에 대응되는 주파수 영역대의 제2 파워 스펙트럼의 세기 모두가 임계치 이상으로 식별되거나, 모두 임계치 이하로 식별되는 경우 사용자의 자판 영역을 선택하려는 의사가 불명확할 수 있다.In S810, the electronic device (1000) can identify a user focus area on the display based on the first eye tracking signal if the intensities of power spectra in frequency domains corresponding to different frequencies used to flicker the symbol patterns included in the third neural network signal are all identified as being above a threshold or below a threshold. For example, if the intensities of the first power spectrum in the frequency domain corresponding to the flickering frequency of the first partial verification area and the intensities of the second power spectrum in the frequency domain corresponding to the flickering frequency of the second partial verification area are both identified as being above a threshold or below a threshold, the user's intention to select a keyboard area may be unclear.

전자 장치(1000)는 제3 뉴럴 네트워크 신호에 포함된 제1 부분 검증 영역의 플리커링 주파수에 대응되는 주파수 영역대의 제1 파워 스펙트럼의 세기와, 제2 부분 검증 영역의 플리커링 주파수에 대응되는 주파수 영역대의 제2 파워 스펙트럼의 세기가 모두 임계치 이상이거나, 임계치 이하로 식별되는 경우에는, 사용자의 제1 아이트래킹 신호를 획득할 수 있다. 전자 장치(1000)는 제1 아이트래킹 신호에 기초하여 디스플레이 상에서 사용자 초점 영역을 식별할 수 있다.The electronic device (1000) can obtain a first eye-tracking signal of a user when the intensity of a first power spectrum in a frequency domain corresponding to a flickering frequency of a first partial verification region included in a third neural network signal and the intensity of a second power spectrum in a frequency domain corresponding to a flickering frequency of a second partial verification region are both identified as being equal to or greater than a threshold or equal to or less than a threshold. The electronic device (1000) can identify a user focus region on a display based on the first eye-tracking signal.

S820에서, 전자 장치(1000)는 제1 아이트래킹 신호에 기초하여 디스플레이 상에서 결정되는 초점 영역과 검증 영역 내 포함된 하나의 부분 검증 영역을 포함하는 전체 면적에 대한, 상기 초점 영역과 상기 검증 영역 내 포함되는 하나의 부분 검증 영역이 중첩되는 중첩 영역이 차지하는 면적의 비율에 기초하여 자판 영역에 대응되는 텍스트 요소를 선택하려는 사용자 의사를 검증할 수 있다. 예를 들어, 도 6에서 도시된 바와 같이, 전자 장치(1000)는 제1 아이트래킹 신호 기반 결정되는 초점 영역(648)과 중첩되는 부분 검증 영역(644)의 중첩 영역(645)을 결정할 수 있고, 초점 영역(648) 및 상기 초점 영역이 일부 관련된 부분 검증 영역(644)을 합한 전체 면적에 대한, 상기 중첩 영역(645) 면적의 비율에 기초하여, 사용자가 선택하려는 부분 검증 영역이 제1 부분 검증 영역(644)임을 식별할 수 있다.In S820, the electronic device (1000) can verify a user's intention to select a text element corresponding to a keyboard area based on a ratio of an area occupied by an overlapping area between a focus area determined on a display based on a first eye-tracking signal and a partial verification area included in the verification area, to a total area including the focus area and a partial verification area included in the verification area. For example, as illustrated in FIG. 6, the electronic device (1000) can determine an overlapping area (645) between a focus area (648) determined based on a first eye-tracking signal and a partial verification area (644) overlapping the focus area (648), and can identify that a partial verification area that the user intends to select is the first partial verification area (644) based on a ratio of an area of the overlapping area (645) to a total area including the focus area (648) and the partial verification area (644) to which the focus area is partially related.

또 다른 예에 의하면 도 6에 도시된 바와 같이, 전자 장치(1000)는 제1 아이 트래킹 신호에 따른 초점 영역(659)이 제1 부분 검증 영역(654)과 중첩되는 제1 중첩 영역(653)의 면적이, 제1 부분 검증 영역(654) 및 초점 영역(659)의 전체 면적에 대해 차지하는 제1 면적 비율을 결정하고, 제1 아이 트래킹 신호에 따른 초점 영역(659)이 제2 부분 검증 영역(656)과 중첩되는 제2 중첩 영역(655)의 면적이, 제2 부분 검증 영역(656) 및 초점 영역(659)의 전체 면적에 대해 차지하는 제2 면적 비율을 결정할 수 있다.In another example, as illustrated in FIG. 6, the electronic device (1000) may determine a first area ratio of an area of a first overlapping area (653) in which a focus area (659) according to a first eye tracking signal overlaps a first partial verification area (654) with respect to the total area of the first partial verification area (654) and the focus area (659), and may determine a second area ratio of an area of a second overlapping area (655) in which a focus area (659) according to the first eye tracking signal overlaps a second partial verification area (656) with respect to the total area of the second partial verification area (656) and the focus area (659).

전자 장치(1000)는 제1 파워 스펙트럼의 세기 및 제2 파워 스펙트럼의 세기가 모두 임계 범위 안에 있거나, 제1 파워 스펙트럼의 세기 및 제2 파워 스펙트럼의 세기가 모두 임계치 이상이거나 또는 제1 파워 스펙트럼의 세기 및 제2 파워 스펙트럼의 세기가 모두 임계치 이하인 경우, 아이트래킹 신호를 획득하고, 아이트래킹 신호에 기초하여 제1 면적 비율 및 제2 면적 비율을 결정하며, 제1 면적 비율 및 제2 면적 비율에 기초하여, 부분검증 영역들 중, 사용자가 선택하려는 부분 검증 영역을 정확하게 식별할 수 있다. 예를 들어, 전자 장치(1000)는 제1 면적 비율이 제2 면적 비율보다 크게 식별되는 경우, 검증 영역 내 제1 부분 검증 영역(654)을 선택하려는 사용자의 의사를 식별하고, 최종적으로는 자판 영역(661)을 선택하려는 사용자의 의사가 검증된 것으로 결정할 수 있다.The electronic device (1000) acquires an eye-tracking signal when both the intensity of the first power spectrum and the intensity of the second power spectrum are within a threshold range, or both the intensity of the first power spectrum and the intensity of the second power spectrum are equal to or greater than a threshold, or both the intensity of the first power spectrum and the intensity of the second power spectrum are equal to or less than a threshold, and determines a first area ratio and a second area ratio based on the eye-tracking signal, and accurately identifies a partial verification region that the user intends to select among the partial verification regions based on the first area ratio and the second area ratio. For example, when the first area ratio is identified to be greater than the second area ratio, the electronic device (1000) can identify the user's intention to select the first partial verification region (654) within the verification region, and ultimately determine that the user's intention to select the keyboard region (661) has been verified.
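The area-ratio comparison can be sketched as follows, assuming the focus area and the partial verification areas are axis-aligned rectangles and reading "total area" as the area of their union, so that each ratio is overlap divided by union. The rectangle model and the names `Rect`, `area_ratio`, and `pick_region` are assumptions for illustration.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles (0 if they are disjoint)."""
    w = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    h = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(0.0, w) * max(0.0, h)


def area_ratio(focus: Rect, region: Rect) -> float:
    """Overlap area divided by the combined (union) area of both rectangles."""
    inter = overlap_area(focus, region)
    union = focus.w * focus.h + region.w * region.h - inter
    return inter / union if union > 0 else 0.0


def pick_region(focus: Rect, first: Rect, second: Rect) -> Optional[str]:
    """Return which partial verification area the gaze favors, if either."""
    r1, r2 = area_ratio(focus, first), area_ratio(focus, second)
    if r1 > r2:
        return "first"
    if r2 > r1:
        return "second"
    return None


# Two adjacent partial verification areas and a focus area mostly on the first.
first, second = Rect(0, 0, 4, 4), Rect(4, 0, 4, 4)
focus = Rect(1, 1, 4, 2)
print(pick_region(focus, first, second))
```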

전자 장치(1000)가 아이트래킹 신호에 따른 초점 영역을 식별하는 과정은 공지의 아이트래킹 신호 기반 초점 영역을 식별하는 기술을 이용할 수 있다. 예를 들어, 전자 장치(1000)는 카메라를 통해 안면 이미지를 획득하고, 획득된 안면 이미지로부터 얼굴 영역과 안구 영역을 검출할 수 있다. 전자 장치(1000)는 안구 영역상 눈동자의 위치를 추적할 수 있다. 예를 들어, 전자 장치(1000)는 시간의 흐름에 따라 사용자의 이전 프레임 안면 이미지상 검출되는 눈동자의 위치와, 상기 이전 프레임 다음에 획득된 프레임 안면 이미지상 검출되는 눈동자의 위치 변화를 측정하고, 측정된 위치 변화에 기초하여 눈동자의 이동 방향과 속도를 결정할 수 있다. 전자 장치(1000)는 눈동자의 이동 방향과 속도에 기초하여 사용자가 현재 디스플레이상에서 응시하는 초점 영역을 식별할 수 있다. 일 실시 예에 의하면, 전자 장치(1000)는 눈동자의 이동 방향과 속도에 기초하여 다음 예측되는 초점 영역의 위치를 보정함으로써 초점 영역의 예측 정확도를 향상시킬 수도 있다.The process of identifying a focus area according to an eye-tracking signal by the electronic device (1000) may utilize a known technology for identifying a focus area based on an eye-tracking signal. For example, the electronic device (1000) may acquire a face image through a camera, and detect a face area and an eye area from the acquired face image. The electronic device (1000) may track the position of the pupil in the eye area. For example, the electronic device (1000) may measure the position of the pupil detected in the previous frame face image of the user over time, and the change in the position of the pupil detected in the frame face image acquired after the previous frame, and determine the movement direction and speed of the pupil based on the measured position change. The electronic device (1000) may identify the focus area that the user is currently gazing at on the display based on the movement direction and speed of the pupil. According to one embodiment, the electronic device (1000) may also improve the prediction accuracy of the focus area by correcting the position of the next predicted focus area based on the movement direction and speed of the pupil.
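
As a rough illustration of the frame-to-frame tracking described above, the sketch below derives a movement direction and speed from two pupil positions and linearly extrapolates the next focus position. The linear extrapolation, the `gain` parameter, and the function names are assumptions; the disclosure only states that a known eye-tracking technique may be used.

```python
import math


def pupil_motion(prev_pos, curr_pos, dt):
    """Return (direction in radians, speed) of the pupil between two frames.

    prev_pos and curr_pos are (x, y) tuples; dt is the frame interval in seconds.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    dx = curr_pos[0] - prev_pos[0]
    dy = curr_pos[1] - prev_pos[1]
    return math.atan2(dy, dx), math.hypot(dx, dy) / dt


def predict_next_focus(curr_focus, direction, speed, dt, gain=1.0):
    """Extrapolate the focus position one frame ahead along the motion vector.

    gain is a hypothetical factor mapping pupil displacement to display
    displacement; it would come from calibration in a real system.
    """
    step = speed * dt * gain
    return (curr_focus[0] + step * math.cos(direction),
            curr_focus[1] + step * math.sin(direction))


direction, speed = pupil_motion((0.0, 0.0), (3.0, 4.0), dt=0.5)
next_focus = predict_next_focus((100.0, 100.0), direction, speed, dt=0.5)
```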

또 다른 예에 의하면, 전자 장치(1000)가 이용하는 뉴럴 네트워크 신호 측정기(120)는 사용자의 머리에 착용될 수 있는 웨어러블 디바이스 타입으로 마련될 수 있다. 일 실시 예에 의하면 뉴럴 네트워크 신호 측정기(120)는 적어도 하나의 가속도 센서를 포함하고, 뉴럴 네트워크 신호 측정기(120)는 적어도 하나의 타입의 뉴럴 네트워크 신호 뿐만 아니라, 사용자의 머리에 착용된 상태에서 사용자의 머리의 움직임을 측정할 수도 있다.In another example, the neural network signal measuring device (120) used by the electronic device (1000) may be provided as a wearable device type that can be worn on the user's head. In one embodiment, the neural network signal measuring device (120) includes at least one acceleration sensor, and the neural network signal measuring device (120) may measure not only at least one type of neural network signal, but also the movement of the user's head while being worn on the user's head.

전자 장치(1000)는 뉴럴 네트워크 신호상 파워스펙트럼의 세기에 기초하여 검증 영역에 대한 사용자의 의사를 명확하게 식별하기 어려운 경우, 아이트래킹 신호 또는 웨어러블 디바이스에서 획득되는 가속도 센서 값에 기초하여 검증 영역에 포함된 하나의 부분 검증 영역들을 선택하려는 사용자의 의사를 더 정확하게 식별할 수도 있다.In cases where it is difficult to clearly identify a user's intention for a verification area based on the intensity of a power spectrum of a neural network signal, the electronic device (1000) may more accurately identify a user's intention to select one partial verification area included in the verification area based on an eye tracking signal or an acceleration sensor value obtained from a wearable device.

일 실시 예에 의하면 전자 장치는 서로 다른 인터페이스 영역들에 가상 입력 장치 인터페이스와 현재 식별된 텍스트 안내 인터페이스 및 후보 텍스트 안내 인터페이스를 제공할 수 있다. 예를 들어, 전자 장치(1000)는 제1 인터페이스 영역(910)에 가상 입력 장치 인터페이스에 대한 이미지를 표시하고, 상기 제1 인터페이스 영역에 인접하고, 상기 제1 인터페이스 영역(910)과 구별되는 제2 인터페이스 영역(930)에 사용자 의사 검증이 완료됨에 따라 현재까지 완성된 텍스트 요소 또는 문자 요소를 출력하는 텍스트 안내 인터페이스 이미지(930)와, 현재까지 완성된 텍스트 요소 또는 문자 요소와 관련된 후보 텍스트 이미지(932)를 표시할 수 있다.According to one embodiment, the electronic device may provide a virtual input device interface, a currently identified text guidance interface, and a candidate text guidance interface in different interface areas. For example, the electronic device (1000) may display an image for a virtual input device interface in a first interface area (910), and may display a text guidance interface image (930) that outputs text elements or character elements completed so far as user intention verification is completed in a second interface area (930) adjacent to the first interface area and distinct from the first interface area (910), and a candidate text image (932) related to the text elements or character elements completed so far.

제1 인터페이스 영역(910)과 제2 인터페이스 영역(930)은 각각의 인터페이스 경계(922, 931)로 구분될 수 있다. 제1 인터페이스 영역(910)에 제공되는 가상 입력 장치들은 하나의 화면 모드에서 제공될 수도 있으나, 1 내지 3단계로 구성된 전환 가능한 복수의 화면 모드들을 통해 제1 인터페이스 영역상에서 제공될 수도 있다. 예를 들어, 제1 인터페이스 영역(910)에 모든 영문 자판 영역들이 1개 화면 모드에서 제공될 수도 있지만, 또 다른 예에 의하면, 도 9에 도시되지 않은 I,O,P 또는 기호와 같은 자판 영역들은 제1 인터페이스 영역(910)에 출력가능한 별도의 화면 모드에서 제공될 수 있다.The first interface area (910) and the second interface area (930) may be separated by their respective interface boundaries (922, 931). The virtual input devices provided in the first interface area (910) may be provided in one screen mode, but may also be provided on the first interface area through a plurality of switchable screen modes configured in 1 to 3 stages. For example, all English keyboard areas in the first interface area (910) may be provided in one screen mode, but according to another example, keyboard areas such as I, O, P or symbols not shown in FIG. 9 may be provided in a separate screen mode that is outputtable in the first interface area (910).

According to one embodiment, the electronic device (1000) may detect an eye region in a face image of a user gazing at at least one of the first interface area (910) or the second interface area (930), and identify the user's blink state based on the aspect ratio of the vertical length to the horizontal length of the detected eye region. Based on this aspect ratio, the electronic device (1000) may determine the blink state (e.g., whether the eyes are open or closed), the time for which the eyes remain closed during a single blink, and the number of blinks. Based on the blink state, the eye-closed duration, and the number of blinks, the electronic device (1000) may determine the user's eye blinking motion and execute a control function assigned in advance to the determined motion.
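
A minimal sketch of the blink analysis described above, assuming that one aspect-ratio value (vertical length divided by horizontal length of the eye region) is available per video frame. The threshold value of 0.2 and the function name are illustrative assumptions, not values from the disclosure.

```python
def analyze_blinks(aspect_ratios, frame_dt, closed_threshold=0.2):
    """Return (blink_count, closed_durations) for a sequence of per-frame ratios.

    A blink is a run of consecutive frames whose aspect ratio is at or
    below closed_threshold, followed by a frame in which the eye is open
    again. closed_durations lists how long the eye stayed closed in each
    blink, in seconds.
    """
    durations = []
    closed_frames = 0
    for ratio in aspect_ratios:
        if ratio <= closed_threshold:
            closed_frames += 1
        elif closed_frames:
            durations.append(closed_frames * frame_dt)
            closed_frames = 0
    # A run that is still closed at the end is not counted as a completed blink.
    return len(durations), durations


ratios = [0.3, 0.3, 0.1, 0.1, 0.3, 0.1, 0.3, 0.3]
count, durations = analyze_blinks(ratios, frame_dt=0.05)
```

The blink count and the per-blink closed durations are the two quantities from which a blinking motion (for example, one long blink versus two short blinks) could then be classified.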

일 실시 예에 의하면, 사용자의 눈깜빡임 동작에 대응되는 제어 기능들은 인터페이스 영역 별로 다르게 설정될 수 있다. 예를 들어, 전자 장치(1000)는 사용자의 아이트래킹 신호에 따른 초점 영역이 머무르는 인터페이스 영역을 식별하 수 있다. 전자 장치(1000)는 사용자의 초점 영역(924)이 제1 인터페이스 영역(910)에 머무르는 것으로 식별되는 동안, 소정의 눈깜빡임 모션이 식별되면, 제1 인터페이스 영역에 대해서 미리 설정되어 있는, 눈깜빡임 동작에 대응되는 제1 기능 리스트(962) 중 하나의 제어 기능을 수행할 수 있다.According to one embodiment, control functions corresponding to the user's eye blinking motion may be set differently for each interface area. For example, the electronic device (1000) may identify an interface area where a focus area according to the user's eye tracking signal remains. When a predetermined eye blinking motion is identified while the user's focus area (924) is identified as remaining in the first interface area (910), the electronic device (1000) may perform one control function from a first function list (962) corresponding to the eye blinking motion, which is preset for the first interface area.

또 다른 예에 의하면, 전자 장치(1000)는 사용자의 초점 영역(924)이 제2 인터페이스 영역(930)에 머무르는 것으로 식별되는 동안, 소정의 눈깜빡임 모션이 식별되면, 제2 인터페이스 영역에 대해서 미리 설정되어 있는, 눈깜빡임 동작에 대응되는 제2 기능 리스트(942) 중 하나의 제어 기능을 수행할 수 있다. 예를 들어, 전자 장치(1000)는 사용자의 초점 영역이 제1 인터페이스 영역(910)에 머무르는 것으로 식별되는 동안, 사용자의 안면 이미지로부터 1번의 눈깜빡임 동작이 식별되는 경우, 해당 입력은 무시할 수 있으나, 사용자의 초점 영역(924)이 제2 인터페이스 영역(930)에 머무르는 동안 1번의 눈깜빡임 동작이 식별되는 경우, C 기능을 수행할 수 있다.In another example, the electronic device (1000) may perform one control function from a second function list (942) corresponding to the eye blinking motion, which is preset for the second interface area, when a predetermined eye blinking motion is identified while the user's focus area (924) is identified as staying in the second interface area (930). For example, when a single eye blinking motion is identified from the user's facial image while the user's focus area (924) is identified as staying in the first interface area (910), the electronic device (1000) may ignore the corresponding input, but may perform the C function when a single eye blinking motion is identified while the user's focus area (924) is identified as staying in the second interface area (930).
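
The per-area function lists can be pictured as a lookup keyed by the interface area in which the focus area rests and by the identified blinking motion. In the sketch below, only the two cases stated in the text are taken from the disclosure (a single blink is ignored in the first interface area and triggers the C function in the second); every other entry, and the use of a blink count as the key, is a hypothetical placeholder.

```python
# Blink count -> control function, per interface area.
# Only ("second", 1) -> "C" and the absence of ("first", 1) follow the text;
# the remaining entries are placeholders for illustration.
FUNCTION_LISTS = {
    "first": {2: "A", 3: "B"},
    "second": {1: "C", 2: "D"},
}


def control_function(interface_area, blink_count):
    """Return the control function for a blink motion, or None to ignore it."""
    return FUNCTION_LISTS.get(interface_area, {}).get(blink_count)
```

Keeping the mapping as data rather than branching logic makes it straightforward to configure different function lists per interface area.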

본 개시에 따른 전자 장치(1000)는 적어도 하나의 타입의 가상 입력 장치 이미지들이 표시되는 영역과, 입력된 텍스트 요소 또는 후보 텍스트들이 출력되는 인터페이스 영역을 하나의 화면상에서 구분 표시할 수 있을 뿐만 아니라, 현재 사용자의 초점 영역이 머무르는 인터페이스 영역의 위치를 식별하고, 인터페이스 영역 별로 서로 다르게 설정되는, 눈깜빡임 동작에 대응되는 제어 기능들을 수행함으로써, 사용자의 텍스트 입력 편의를 향상시킬 수 있다.An electronic device (1000) according to the present disclosure can display on a single screen an area where at least one type of virtual input device images are displayed and an interface area where input text elements or candidate texts are output, and can identify a location in an interface area where a current user's focus area remains, and perform control functions corresponding to an eye blinking motion that are set differently for each interface area, thereby improving convenience for a user's text input.

According to one embodiment, the electronic device (1000) may output candidate texts (934) to the second interface area (920) by using an artificial intelligence model for candidate text recommendation. The model includes a natural language processing engine that recognizes the context of a document, a message, a conversation, etc., and an inference engine that predicts the user's intention based on the recognized context. For example, the inference engine may be a pre-trained sequence model that generates a user intention table mapping the text (930) input so far to user intentions, and outputs candidate texts based on the generated user intention table.

In another example, the electronic device (1000) may obtain an fNIR signal from the neural network signal measuring device and determine the user's emotional state by detecting changes in blood flow and oxygen concentration in the cerebral cortex that appear in the fNIR signal. By recommending candidate texts that further take into account the context of the text identified so far and the user's emotional state, the electronic device (1000) may provide candidate texts that better match the user's intention.

예를 들어, 전자 장치(1000)는 인공지능 모델 내 자연어 처리 모델을 통해 수집된 텍스트('APP')의 문맥을 인식하고, 추론 모델을 이용하여 인식된 문맥에 기초하여 복수의 사용자 의도 (예컨대 APP 다음 LE를 입력하려는 의도 1, 또는 APP 다음 APP을 좋아한다는 LIKE를 입력하려는 의도 2) 들 중 하나의 사용자 의도를 결정하며, 결정된 사용자 의도에 기초하여 후보 텍스트들을 출력할 수 있다. 전자 장치가 후보 텍스트들 중 하나의 후보 텍스트를 식별하는 동작은 제1 인터페이스 영역상 하나의 자판 영역을 식별하고, 식별된 자판 영역을 선택하려는 사용자의 의사를 검증하는 동작에 대응될 수 있다.For example, the electronic device (1000) may recognize the context of a text ('APP') collected through a natural language processing model in an artificial intelligence model, determine one user intent among a plurality of user intents (e.g., intent 1 to input LE after APP, or intent 2 to input LIKE indicating that one likes APP after APP) based on the recognized context using an inference model, and output candidate texts based on the determined user intent. An operation of the electronic device identifying one candidate text among the candidate texts may correspond to an operation of identifying one keyboard area on a first interface area and verifying the user's intention to select the identified keyboard area.

For example, the electronic device (1000) may identify one of the candidate texts based on the similarity between a fourth text image, generated based on a fourth neural network signal of the user who has viewed the candidate texts output to the second interface area (930), and a predetermined text image displayed on the second interface area. For example, the electronic device (1000) may use an artificial intelligence model based on a generative adversarial network to reconstruct the text image generated based on the fourth neural network signal, and identify a candidate text image (934) based on the similarity between the reconstructed text image and the predetermined images displayed on the second interface area.

전자 장치(1000)는 식별된 후보 텍스트 이미지(934)를 시각적으로 변조함과 함께 서로 다른 주파수로 플리커링되는 후보 텍스트 부분 검증 영역들(936, 938)을 포함하는 후보 텍스트 검증 영역을 후보 텍스트 이미지 (934)에 중첩하여 표시할 수 있다. 전자 장치(1000)는 식별된 후보 텍스트가 표시된 이미지 영역에 중첩하여 표시되는 후보 텍스트 검증 영역에 대한 사용자의 제5 뉴럴 네트워크 신호 또는 제2 아이트래킹 신호 중 적어도 하나에 기초하여, 상기 식별된 후보 텍스트를 선택하려는 사용자의 의사를 검증할 수 있다. 상술한 과정은 제1 인터페이스 영역상 자판 영역을 식별하고, 식별된 자판 영역을 검증하는 과정에 대응될 수 있다. 전자 장치(1000)는 후보 텍스트(934)를 선택하려는 사용자 의사가 검증되면, 상기 선택된 후보 텍스트를 식별할 수 있다.The electronic device (1000) can display a candidate text verification area including candidate text portion verification areas (936, 938) that flicker at different frequencies while visually modulating the identified candidate text image (934) by overlaying it on the candidate text image (934). The electronic device (1000) can verify a user's intention to select the identified candidate text based on at least one of the user's fifth neural network signal or the second eye tracking signal with respect to the candidate text verification area displayed by overlaying it on the image area where the identified candidate text is displayed. The above-described process can correspond to a process of identifying a keyboard area on a first interface area and verifying the identified keyboard area. When the user's intention to select the candidate text (934) is verified, the electronic device (1000) can identify the selected candidate text.

In S410, the electronic device (1000) may obtain a face image of a user gazing at at least one type of virtual input device image. In S420, the electronic device (1000) may determine the user's concentration level based on the aspect ratio of the vertical length to the horizontal length of the eye region detected in the user's face image, and on the time during which the aspect ratio remains at or above a threshold ratio. For example, the electronic device (1000) may determine the aspect ratio of the eye region in the user's face image using a DTM (Distance Threshold Method) technique, determine that the user's eyes are open if the aspect ratio is at or above a first threshold ratio, and determine that the user's eyes are closed if the aspect ratio is at or below a second threshold ratio.

For example, the electronic device (1000) may determine the user's concentration level to be high if the time during which the aspect ratio remains at or above the threshold ratio is at or above a threshold time. According to another example, the electronic device (1000) may count the number of times the aspect ratio remains at or above the threshold ratio for the threshold time, and determine the user's concentration level to be high if the counted number is at or above a threshold number.
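
Both variants above (a single sustained open-eye period, or a count of such periods) can be sketched with one helper that extracts open-eye runs from per-frame aspect ratios. The function names, the per-frame input, and the parameter values in the usage lines are assumptions for illustration.

```python
def open_runs(aspect_ratios, frame_dt, open_threshold):
    """Durations (seconds) of consecutive runs with ratio >= open_threshold."""
    runs = []
    frames = 0
    for ratio in aspect_ratios:
        if ratio >= open_threshold:
            frames += 1
        else:
            if frames:
                runs.append(frames * frame_dt)
            frames = 0
    if frames:
        runs.append(frames * frame_dt)
    return runs


def is_concentration_high(aspect_ratios, frame_dt, open_threshold,
                          min_duration, min_count=1):
    """True if at least min_count open-eye runs last min_duration or longer.

    min_count=1 corresponds to the first variant in the text; a larger
    min_count corresponds to the counting variant.
    """
    runs = open_runs(aspect_ratios, frame_dt, open_threshold)
    sustained = [d for d in runs if d >= min_duration]
    return len(sustained) >= min_count


ratios = [0.3] * 10 + [0.1] + [0.3] * 4
high = is_concentration_high(ratios, frame_dt=0.1, open_threshold=0.25,
                             min_duration=0.5)
```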

In S430, if the user's concentration level is identified as being below a threshold level, the electronic device (1000) may upscale, at a preset ratio, the frequency band used for the visual stimulus associated with at least one of the image of the virtual input device displayed on the display, the verification area, or the keyboard area having modulated visual characteristics. According to another example, if the user's concentration level is identified as being below the threshold level, the electronic device (1000) may convert the frequency band used for the visual stimulus, at an arbitrary ratio, into a frequency band that causes less user fatigue. According to another example, if the user's concentration level is determined to be below the threshold level, the electronic device (1000) may replace the image in the first interface area or the second interface area displayed on the display with a predetermined refresh image and present the replacement image steadily on the monitor, thereby reducing the user's fatigue.
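
The first variant of S430 amounts to multiplying the flicker frequencies by a preset ratio when concentration falls below the threshold. The sketch below assumes that the stimulus frequencies are held as a list in Hz and that the concentration level is a single number; the function name and the ratio of 1.5 are illustrative.

```python
def adjust_flicker_frequencies(frequencies_hz, concentration, threshold, scale=1.5):
    """Return the flicker frequencies to use for the visual stimulus.

    Frequencies are upscaled by the preset ratio only when the
    concentration level is below the threshold; otherwise a copy of the
    original list is returned unchanged.
    """
    if concentration >= threshold:
        return list(frequencies_hz)
    return [f * scale for f in frequencies_hz]


adjusted = adjust_flicker_frequencies([10.0, 12.0], concentration=0.3, threshold=0.5)
```

A practical implementation would also have to keep the scaled frequencies within what the display refresh rate can render, which the text does not discuss.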

Although not illustrated in FIG. 10, the electronic device (1000) may determine whether the time during which the aspect ratio in the eye region image remains at or above the threshold ratio is at or above a threshold time and, if so, identify the intensity of the neural network signal of the user gazing at the virtual input device image. Even if the aspect ratio remains at or above the threshold ratio for the threshold time or longer, the electronic device (1000) may determine the user's concentration level to be below the threshold level if the intensity of the neural network signal of the user gazing at the virtual input device image is identified as being at or below a threshold value. In this way, the electronic device (1000) can accurately determine whether the user is dozing with open eyes, or whether the user's concentration is reduced even though the eyes are open.

In another example, if it is determined that the aspect ratio in the eye region image remains at or above the threshold ratio, the electronic device (1000) may identify the power spectrum intensity of the frequency band corresponding to theta waves (approximately 4 Hz to 8 Hz) in the user's neural network signal. If the identified power spectrum intensity is below a threshold intensity, the electronic device (1000) may determine the user's concentration level to be below the threshold level, even if the aspect ratio remains at or above the threshold ratio for the threshold time or longer. In another example, even if the aspect ratio remains at or above the threshold ratio, the electronic device (1000) may determine that the user is currently fatigued if the intensity of a predetermined frequency band in the neural network signal acquired from a channel mounted near the user's occipital lobe is below a threshold intensity. Through the above-described method, the electronic device (1000) can accurately measure the user's current fatigue level.
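
The theta-band check can be illustrated with a plain discrete Fourier transform over one window of samples. This is a sketch under stated assumptions: a single-channel signal sampled at a known rate, band power defined as the sum of squared DFT magnitudes normalized by the squared window length, and hypothetical function names. A real system would more likely use an FFT library and a windowed spectral estimate.

```python
import cmath
import math


def band_power(samples, sample_rate, f_low, f_high):
    """Sum of |X[k]|^2 / N^2 over DFT bins whose frequency lies in [f_low, f_high]."""
    n = len(samples)
    power = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate / n
        if f_low <= freq <= f_high:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(samples))
            power += abs(coeff) ** 2 / n ** 2
    return power


def concentration_below_threshold(samples, sample_rate, threshold_power,
                                  band=(4.0, 8.0)):
    """True if theta-band power is below the threshold, as in the text above."""
    return band_power(samples, sample_rate, band[0], band[1]) < threshold_power


# One second of a unit-amplitude 6 Hz sine sampled at 64 Hz (synthetic data).
theta_like = [math.sin(2 * math.pi * 6 * i / 64) for i in range(64)]
theta_power = band_power(theta_like, 64, 4.0, 8.0)
```

With this normalization, a unit-amplitude sine that falls exactly on a DFT bin contributes a power of 0.25, which gives a convenient reference when choosing a threshold for synthetic tests.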

또한 도 10에는 도시되지 않았지만, 전자 장치(1000)는 사용자의 집중력 수준이 임계 수준 이상으로 식별되는 경우, 제1 인터페이스 영역 또는 제2 인터페이스 영역 중 적어도 하나에서 표시되는 시각 자극의 변환 속도를 증가시킬 수도 있다. 일 실시 예에 의하면, 전자 장치(1000)는 사용자의 집중력 수준을 식별한 결과 임계 수준이상으로 식별되는 경우, 도 9에 도시된 제1 인터페이스 영역상에 표시된 가상 입력 장치의 모드 전환 속도를 증가시키거나, 제1 인터페이스 영역 또는 제2 인터페이스 영역 별로 서로 다르게 설정되는 사용자 눈깜빡임 모션에 대응되는 기능의 실행 속도를 증가시킬 수도 있다.In addition, although not illustrated in FIG. 10, the electronic device (1000) may increase the conversion speed of a visual stimulus displayed in at least one of the first interface area or the second interface area when the user's concentration level is identified as being above a threshold level. According to one embodiment, when the electronic device (1000) identifies the user's concentration level as being above a threshold level, the electronic device (1000) may increase the mode switching speed of a virtual input device displayed on the first interface area illustrated in FIG. 9, or increase the execution speed of a function corresponding to a user's eye blink motion that is set differently for each of the first interface area or the second interface area.

일 실시 예에 의하면, 멀티미디어 컨텐츠 생성 시스템(10)은 전자 장치(1000), 서버(2000) 및 뉴럴 네트워크 신호 측정기(120)를 포함할 수 있다. 그러나 상술한 예에 한정되는 것은 아니며 도 1에 도시된 바와 같이 멀티미디어 컨텐츠를 생성하기 위해 기타 생체 신호 측정기, 네트워크 장치를 더 포함할 수도 있다. 일 실시 예에 의하면, 전자 장치(1000)는 제1 네트워크 인터페이스(1500), 디스플레이(1210), 메모리(1700) 및 제1 프로세서(1300)를 포함할 수 있다. 그러나, 도시된 구성 요소가 모두 필수구성요소인 것은 아니다. 도시된 구성 요소보다 많은 구성 요소에 의해 전자 장치(1000)가 구현될 수도 있고, 그 보다 적은 구성 요소에 의해서도 전자 장치(1000)는 구현될 수도 있다.According to one embodiment, the multimedia content creation system (10) may include an electronic device (1000), a server (2000), and a neural network signal measuring device (120). However, it is not limited to the above-described example, and may further include other bio-signal measuring devices and network devices to create multimedia content as illustrated in FIG. 1. According to one embodiment, the electronic device (1000) may include a first network interface (1500), a display (1210), a memory (1700), and a first processor (1300). However, not all of the illustrated components are essential components. The electronic device (1000) may be implemented by more components than the illustrated components, or may be implemented by fewer components.

예를 들어, 도 12에 도시된 바와 같이, 전자 장치(1000)는 제1 프로세서(1300), 제1 네트워크 인터페이스(1500), 디스플레이(1210) 및 메모리(1700)외에, 사용자 입력 인터페이스(1100), 출력부(1200), 센싱부(1400), 네트워크 인터페이스(1500), A/V 입력부(1600) 및 메모리(1700)를 더 포함할 수도 있다. 도 11 내지 도 12를 참조하여 전자 장치 및 서버의 구성에 대해 구체적으로 설명하기로 한다.For example, as illustrated in FIG. 12, the electronic device (1000) may further include a user input interface (1100), an output unit (1200), a sensing unit (1400), a network interface (1500), an A/V input unit (1600), and a memory (1700), in addition to the first processor (1300), the first network interface (1500), the display (1210), and the memory (1700). The configuration of the electronic device and the server will be specifically described with reference to FIGS. 11 and 12.

사용자 입력 인터페이스(1100)는, 사용자가 전자 장치(1000)를 제어하기 위한 데이터를 입력하는 수단을 의미한다. 예를 들어, 사용자 입력 인터페이스(1100)에는 키 패드(key pad), 돔 스위치 (dome switch), 터치 패드(접촉식 정전 용량 방식, 압력식 저항막 방식, 적외선 감지 방식, 표면 초음파 전도 방식, 적분식 장력 측정 방식, 피에조 효과 방식 등), 조그 휠, 조그 스위치 등이 있을 수 있으나 이에 한정되는 것은 아니다.The user input interface (1100) refers to a means for a user to input data for controlling an electronic device (1000). For example, the user input interface (1100) may include, but is not limited to, a key pad, a dome switch, a touch pad (contact electrostatic capacitance type, pressure resistive film type, infrared detection type, surface ultrasonic conduction type, integral tension measurement type, piezo effect type, etc.), a jog wheel, a jog switch, etc.

사용자 입력 인터페이스(1100)는, 뉴럴 네트워크 신호를 획득하기 위한 시각적 응답을 표시 및 제어하기 위한 사용자 입력을 획득할 수 있다. 또 다른 예에 의하면, 출력부(1200)는, 오디오 신호 또는 비디오 신호 또는 진동 신호를 출력할 수 있으며, 출력부(1200)는 디스플레이부(1210), 음향 출력부(1220), 및 진동 모터(1230)를 포함할 수 있다.The user input interface (1100) can obtain a user input for displaying and controlling a visual response for obtaining a neural network signal. According to another example, the output unit (1200) can output an audio signal, a video signal, or a vibration signal, and the output unit (1200) can include a display unit (1210), an audio output unit (1220), and a vibration motor (1230).

디스플레이부(1210)는 전자 장치(1000)에서 처리되는 정보를 표시 출력하기 위한 화면을 포함한다. 또한, 화면은 전자 장치(1000)가 추론한 텍스트, 또는 텍스트에 부분 동영상을 매칭함으로써 생성한 렌더링 영상을 출력할 수 있다. 음향 출력부(1220)는 네트워크 인터페이스(1500)로부터 수신되거나 메모리(1700)에 저장된 오디오 데이터를 출력한다. 또한, 음향 출력부(1220)는 전자 장치(1000)에서 수행되는 기능(예를 들어, 호신호 수신음, 메시지 수신음, 알림음)과 관련된 음향 신호를 출력한다.The display unit (1210) includes a screen for displaying and outputting information processed in the electronic device (1000). In addition, the screen can output text inferred by the electronic device (1000) or a rendered image generated by matching a partial video to text. The audio output unit (1220) outputs audio data received from the network interface (1500) or stored in the memory (1700). In addition, the audio output unit (1220) outputs an audio signal related to a function performed in the electronic device (1000) (e.g., a call signal reception sound, a message reception sound, a notification sound).

제1 프로세서(1300)는 통상적으로 전자 장치(1000)의 전반적인 동작을 제어한다. 예를 들어, 제1 프로세서(1300)는, 메모리(1700)에 저장된 프로그램들을 실행함으로써, 사용자 입력 인터페이스(1100), 출력부(1200), 센싱부(1400), 네트워크 인터페이스(1500), A/V 입력부(1600) 등을 전반적으로 제어할 수 있다. 또한, 제1 프로세서(1300)는 메모리(1700)에 저장된 프로그램들을 실행함으로써, 도 1 내지 도 10에 기재된 전자 장치(1000)의 기능을 수행할 수 있다.The first processor (1300) typically controls the overall operation of the electronic device (1000). For example, the first processor (1300) may control the user input interface (1100), the output unit (1200), the sensing unit (1400), the network interface (1500), the A/V input unit (1600), etc., by executing programs stored in the memory (1700). In addition, the first processor (1300) may perform the functions of the electronic device (1000) described in FIGS. 1 to 10 by executing programs stored in the memory (1700).

일 실시 예에 의하면, 제1 프로세서(1300)는 복수개로 마련될 수 있다. 예를 들어, 복수의 제1 프로세서(1300)는 하나 이상의 인스트럭션을 실행함으로써, 외부 디바이스로부터 획득되는 적어도 하나의 타입의 뉴럴 네트워크 신호에 기초하여, 상기 사용자가 의도한 프롬프트를 식별하고, 상기 식별된 프롬프트에 포함된 하나 이상의 문장을 포함하는 문단들 별 핵심 키워드에 기초하여, 상기 프롬프트에 영상 리소스를 매칭하고, 상기 프롬프트에 매칭된 상기 영상 리소스를 합성함으로써 멀티미디어 컨텐츠를 생성할 수 있다.According to one embodiment, the first processor (1300) may be provided in plural units. For example, the plurality of first processors (1300) may execute one or more instructions to identify a prompt intended by the user based on at least one type of neural network signal obtained from an external device, match an image resource to the prompt based on a core keyword of paragraphs including one or more sentences included in the identified prompt, and generate multimedia content by synthesizing the image resource matched to the prompt.

센싱부(1400)는, 전자 장치(1000)의 상태 또는 전자 장치(1000) 주변의 상태를 감지하고, 감지된 정보를 프로세서(1300)로 전달할 수 있다. 센싱부(1400)는 전자 장치(1000)의 사양 정보, 모니터링 대상 공간에 대한 온도, 습도, 기압 정보 등을 센싱할 수 있다.The sensing unit (1400) can detect the status of the electronic device (1000) or the status around the electronic device (1000) and transmit the detected information to the processor (1300). The sensing unit (1400) can sense specification information of the electronic device (1000), temperature, humidity, and air pressure information for the monitoring target space, etc.

예를 들어, 센싱부(1400)는, 지자기 센서(Magnetic sensor)(1410), 가속도 센서(Acceleration sensor)(1420), 온/습도 센서(1430), 적외선 센서(1440), 자이로스코프 센서(1450), 위치 센서(예컨대, GPS)(1460), 기압 센서(1470), 근접 센서(1480), 및 RGB 센서(illuminance sensor)(1490) 중 적어도 하나를 포함할 수 있으나, 이에 한정되는 것은 아니다. 또다른 예에 의하면, 센싱부(1400)는 사용자 생체 신호들(예컨대 근전도 신호, 심전도 신호)을 측정하기 위한 센서들을 더 포함할 수도 있다. 각 센서들의 기능은 그 명칭으로부터 당업자가 직관적으로 추론할 수 있으므로, 구체적인 설명은 생략하기로 한다.For example, the sensing unit (1400) may include at least one of a magnetic sensor (1410), an acceleration sensor (1420), a temperature/humidity sensor (1430), an infrared sensor (1440), a gyroscope sensor (1450), a position sensor (e.g., GPS) (1460), a barometric pressure sensor (1470), a proximity sensor (1480), and an RGB sensor (illuminance sensor) (1490), but is not limited thereto. According to another example, the sensing unit (1400) may further include sensors for measuring user biosignals (e.g., electromyogram signals, electrocardiogram signals). Since the functions of each sensor can be intuitively inferred from its name by those skilled in the art, a detailed description thereof will be omitted.

제1 네트워크 인터페이스(1500)는 전자 장치(1000)가 다른 장치(미도시) 및 서버(2000)와 통신을 하게 하는 하나 이상의 구성요소를 포함할 수 있다. 다른 장치(미도시)는 전자 장치(1000)와 같은 컴퓨팅 장치이거나, 센싱 장치일 수 있으나, 이에 제한되지 않는다. 예를 들어, 제1 네트워크 인터페이스(1500)는, 무선 통신 인터페이스 (1510), 유선 통신 인터페이스 (1520), 이동 통신부(1530)를 포함할 수 있다. 무선 통신 인터페이스(1510)는 근거리 통신부(short-range wireless communication unit), 블루투스 통신부, BLE(Bluetooth Low Energy) 통신부, 근거리 무선 통신부(Near Field Communication unit), WLAN(와이파이) 통신부, 지그비(Zigbee) 통신부, 적외선(IrDA, infrared Data Association) 통신부, WFD(Wi-Fi Direct) 통신부, UWB(ultra wideband) 통신부 등을 포함할 수 있으나, 이에 한정되는 것은 아니다.The first network interface (1500) may include one or more components that allow the electronic device (1000) to communicate with other devices (not shown) and a server (2000). The other devices (not shown) may be computing devices such as the electronic device (1000) or sensing devices, but are not limited thereto. For example, the first network interface (1500) may include a wireless communication interface (1510), a wired communication interface (1520), and a mobile communication unit (1530). The wireless communication interface (1510) may include, but is not limited to, a short-range wireless communication unit, a Bluetooth communication unit, a BLE (Bluetooth Low Energy) communication unit, a near field communication unit, a WLAN (Wi-Fi) communication unit, a Zigbee communication unit, an infrared (IrDA, infrared Data Association) communication unit, a WFD (Wi-Fi Direct) communication unit, a UWB (ultra wideband) communication unit, etc.

The wired communication interface (1520) may include at least one wired interface for exchanging data with an external device connected to the electronic device via wired communication. The mobile communication unit (1530) transmits and receives wireless signals to and from at least one of a base station, an external terminal, or a server on a mobile communication network. Here, the wireless signals may include a voice call signal, a video call signal, or various forms of data associated with the transmission and reception of text/multimedia messages.

The A/V (Audio/Video) input unit (1600) is for inputting audio or video signals and may include a camera (1610), a microphone (1620), and the like. The camera (1610) may obtain image frames, such as still images or moving images, through an image sensor in a video call mode or a shooting mode. An image captured through the image sensor may be processed by the first processor (1300) or by a separate image processing unit (not shown).

The microphone (1620) receives an external sound signal and processes it into electrical voice data. For example, the microphone (1620) may receive a sound signal from an external device or a user, including the user's voice input. The microphone (1620) may use various noise removal algorithms to remove noise generated while receiving the external sound signal.

The memory (1700) may store programs for the processing and control performed by the first processor (1300), and may also store data input to or output from the electronic device (1000). In addition, the memory (1700) may store the artificial intelligence models used by the electronic device (1000): a model for learning neural network signal patterns, a model for identifying the context or core keywords of each text, and a model for generating a rendered image through text-based image resource matching.

For example, the memory (1700) may store the layers and nodes of at least one neural network model, together with weight values representing the connection strengths between the layers. The electronic device (1000) may further store training data that it generates in order to train the neural network model. The memory (1700) may also store information about the operating environments of cameras or servers connected to the electronic device.

The memory (1700) may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (for example, SD or XD memory), RAM (Random Access Memory), SRAM (Static Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), PROM (Programmable Read-Only Memory), magnetic memory, a magnetic disk, and an optical disk.

The programs stored in the memory (1700) may be classified into a plurality of modules according to their functions, for example, a UI module (1710), a touch screen module (1720), and a notification module (1730).

The UI module (1710) may provide a UI, a GUI, and the like through which the electronic device (1000) performs neural network signal-based text identification, text-based image generation, and editing of the generated image. The touch screen module (1720) may detect a user's touch gesture on the touch screen and transmit information about the touch gesture to the first processor (1300). The touch screen module (1720) according to some embodiments may recognize and analyze a touch code. The touch screen module (1720) may also be configured as separate hardware including a controller.

The notification module (1730) may generate a signal to notify the user that an event has occurred in the electronic device (1000). For example, a notification sound may be provided when user feedback on a rendered image provided by the electronic device (1000) is obtained, or when it is difficult to identify the context of text identified from a neural network signal. Other examples of events according to one embodiment include reception of a call signal, reception of a message, input of a key signal, and a schedule reminder. The notification module (1730) may output the notification signal as a video signal through the display unit (1210), as an audio signal through the audio output unit (1220), or as a vibration signal through the vibration motor (1230).

According to one embodiment, the server (2000) may include a second network interface (2100), a database (2200), and a second processor (2300). The configuration of the server (2000) illustrated in FIG. 11 may correspond to the configuration of the electronic device described in FIG. 12. According to one embodiment, the second network interface (2100) may obtain a prompt or text information from the electronic device (1000) and transmit image information generated from the obtained prompt or text information back to the electronic device.

According to another embodiment, the second network interface (2100) may exchange with the electronic device the artificial intelligence models and neural network models trained by the electronic device, as well as feature and pattern information of neural network signals. According to another example, the second network interface (2100) may also transmit and receive partial video information matched to text or prompts, and tag information matched to the partial video information.

In addition, according to one embodiment, the database (2200) may correspond to the memory described above with reference to FIG. 11. For example, the database (2200) may store neural network signal information and text information obtained from the electronic device (1000), as well as video or image information that can be matched to the text information.

According to one embodiment, the second processor (2300) may control the overall operation of the server (2000). For example, by controlling the second network interface (2100) and the database (2200), the second processor (2300) may perform all or part of the method of generating multimedia content based on neural network signals that the electronic device (1000) performs as described with reference to FIGS. 1 to 10.

According to one embodiment, the server (2000) may perform the following operations to accurately match an image resource to a prompt determined from a neural network signal. For example, the second processor (2300) may obtain a prompt from the electronic device and generate multimedia content by matching the image resource to the prompt based on either (i) a target meaning determined according to whether the core keyword of each of one or more paragraphs in the prompt is a polysemous word or a homonym, or (ii) at least one sub-keyword determined according to whether two or more paragraphs containing core keywords with the same target meaning are identified in the prompt.

According to one embodiment, the second processor (2300) may summarize the prompt based on a user input, obtained from the electronic device, that requests summarization of the prompt, or based on attention determined according to the length of the prompt; divide the summarized prompt into paragraphs; and determine a core keyword for each divided paragraph. For example, the server (2000) may express the importance of each text element in the prompt as an attention weight, and may summarize the prompt using a sequence model that outputs summary text when given the attention weights and a context vector derived from the text input.
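The attention weighting described above can be illustrated with a minimal sketch. It assumes that each sentence already has a raw relevance score and an embedding vector, and it replaces the sequence model with a simple extractive step that keeps the highest-weighted sentences. The function names, the scores, and the extractive selection are illustrative assumptions, not part of the disclosure.

```python
import math

def attention_weights(scores):
    """Softmax over raw relevance scores; the weights sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def context_vector(weights, embeddings):
    """Attention-weighted sum of the per-element embedding vectors."""
    dim = len(embeddings[0])
    return [sum(w * e[i] for w, e in zip(weights, embeddings)) for i in range(dim)]

def extractive_summary(sentences, scores, keep):
    """Assumed stand-in for the sequence model: keep the `keep`
    highest-weighted sentences, preserving their original order."""
    weights = attention_weights(scores)
    ranked = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:keep])]
```

In a real system the context vector would be fed to a trained decoder; here it is only computed to show how the weights combine the element embeddings.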

According to one embodiment, the second processor (2300) may determine first paragraph division points based on the number of core keywords per paragraph, and determine, for each of the first divided paragraphs separated at those points, a keyword network describing at least one of the frequencies of and the connection relationships between the core keywords it contains. The second processor (2300) may then determine the similarity between the keyword networks of adjacent first divided paragraphs, merge adjacent paragraphs whose keyword network similarity is equal to or greater than a threshold similarity to obtain second paragraph division points within the text, and divide the summarized prompt into the second divided paragraphs defined by the second paragraph division points.

For example, the server (2000) may compare the keyword networks, which capture the frequency and connection relationships of the keywords in each paragraph, using at least one of cosine similarity, Jaccard similarity, and graph similarity, and optimize the paragraph division based on the resulting similarity between keyword networks, thereby improving image resource matching accuracy. According to one embodiment, if the core keyword of a paragraph is neither a polysemous word nor a homonym, the second processor (2300) may determine the single meaning that corresponds one-to-one to the core keyword as its target meaning.
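The merge step can be sketched as follows, under stated assumptions: each paragraph is given as a list of its core keywords, the keyword network is reduced to a frequency counter plus a set of co-occurrence edges, and the overall similarity is taken as the larger of the cosine similarity of the frequencies and the Jaccard similarity of the edges. The data layout, the `max` combination, and the threshold value are hypothetical choices made only for illustration.

```python
import math
from collections import Counter
from itertools import combinations

def keyword_network(keywords):
    """Return (keyword frequencies, set of co-occurrence edges)."""
    freq = Counter(keywords)
    edges = {tuple(sorted(pair)) for pair in combinations(set(keywords), 2)}
    return freq, edges

def cosine(freq_a, freq_b):
    """Cosine similarity between two keyword-frequency vectors."""
    dot = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard(edges_a, edges_b):
    """Jaccard similarity between two edge sets."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 0.0

def merge_adjacent(paragraphs, threshold):
    """Merge neighbouring paragraphs whose networks are similar enough."""
    if not paragraphs:
        return []
    merged = [list(paragraphs[0])]
    for keywords in paragraphs[1:]:
        freq_a, edges_a = keyword_network(merged[-1])
        freq_b, edges_b = keyword_network(keywords)
        similarity = max(cosine(freq_a, freq_b), jaccard(edges_a, edges_b))
        if similarity >= threshold:
            merged[-1].extend(keywords)   # same topic: drop the split point
        else:
            merged.append(list(keywords))  # keep the split point
    return merged
```

For instance, with a threshold of 0.5, paragraphs with keywords `["cat", "pet"]` and `["cat", "pet", "food"]` are merged, while a following paragraph with `["stock", "market"]` stays separate.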

According to one embodiment, when the core keyword of a paragraph is identified as a polysemous word or a homonym, the second processor (2300) may determine that core keyword as a target keyword and convert the plurality of meanings associated with the target keyword into vectors of a predetermined dimension. Based on the distances between the converted vectors, the second processor (2300) may determine a sentence analysis range, that is, how far from the sentence containing the target keyword the adjacent sentences analyzed to identify the target meaning may lie. The second processor (2300) may then identify named entities for the words in at least one sentence selected according to the sentence analysis range, and determine the target meaning based on dependency relations obtained from a syntax tree representing the governor-dependent relationships between the identified named entities.
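A rough sketch of the range selection follows. It assumes a hypothetical rule in which candidate meanings whose vectors lie close together (and are therefore harder to tell apart) get a wider sentence window. The named-entity and dependency analysis is not reproduced here; it is replaced by a much simpler cue-word overlap count, which is a different and weaker technique used only to keep the example self-contained. All names, vectors, and thresholds are illustrative assumptions.

```python
import math

def min_sense_distance(sense_vectors):
    """Smallest Euclidean distance between any two sense vectors.
    Expects at least two senses, as for a polysemous word or homonym."""
    vecs = list(sense_vectors.values())
    return min(math.dist(a, b) for i, a in enumerate(vecs) for b in vecs[i + 1:])

def analysis_range(sense_vectors, near=1.0, wide=2, narrow=1):
    """Assumed rule: close senses need more surrounding sentences."""
    return wide if min_sense_distance(sense_vectors) < near else narrow

def pick_sense(sentences, target_idx, sense_cues, window):
    """Pick the sense whose cue words overlap most with the words found
    within `window` sentences of the target sentence (overlap count only)."""
    lo = max(0, target_idx - window)
    hi = min(len(sentences), target_idx + window + 1)
    context = {w.lower().strip(".,") for s in sentences[lo:hi] for w in s.split()}
    return max(sense_cues, key=lambda sense: len(sense_cues[sense] & context))
```

The window returned by `analysis_range` is passed to `pick_sense` as `window`, so the number of neighbouring sentences consulted depends on how separable the candidate meanings are.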

According to one embodiment, the target meaning is the single meaning identified as most appropriate among the plurality of candidate meanings of a polysemous word or homonym. According to one embodiment, the second processor (2300) may identify whether the prompt contains paragraphs whose core keywords have been assigned the same target meaning. If such paragraphs are identified, the second processor (2300) may determine at least one sub-keyword for each paragraph containing the core keyword and, based on the target meaning determined for each paragraph's core keyword and the at least one sub-keyword, generate the multimedia content by matching to each paragraph an image resource whose labeling information includes both the target meaning and the sub-keyword.
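The label-based matching can be sketched as below, assuming that each paragraph is represented by its target meaning and a list of sub-keywords, and that each resource carries a set of labels. When two or more paragraphs share a target meaning, the sub-keywords are added to the required labels so that the paragraphs do not all receive the same resource. The dictionary layout and the first-match policy are hypothetical.

```python
from collections import Counter

def match_resources(paragraphs, resources):
    """Return one resource name (or None) per paragraph.

    paragraphs: list of {"target": str, "subs": list of str}
    resources:  list of {"name": str, "labels": set of str}
    """
    meaning_counts = Counter(p["target"] for p in paragraphs)
    matched = []
    for p in paragraphs:
        required = {p["target"]}
        # Shared target meaning: also require this paragraph's sub-keywords.
        if meaning_counts[p["target"]] > 1:
            required |= set(p["subs"])
        # First resource whose labels contain every required label.
        hit = next((r["name"] for r in resources if required <= r["labels"]), None)
        matched.append(hit)
    return matched
```

With two paragraphs that both mean `apple_fruit` but carry the sub-keywords `orchard` and `pie`, a resource labeled `{"apple_fruit", "pie"}` is selected only for the second paragraph.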

In S1301, the server (2000) may build and train an artificial intelligence-based image generation model in advance. For example, the server (2000) may pre-train an artificial intelligence-based image generation model that, when a prompt is input, divides the prompt into paragraphs and generates an image by matching an image resource to each divided paragraph. In S1302, the neural network signal measuring device (120) may measure at least one type of neural network signal. In S1304, the neural network signal measuring device (120) may transmit the acquired neural network signal of at least one type to the electronic device (1000). In S1306, the electronic device (1000) may identify a prompt from the at least one type of neural network signal.

In S1308, the electronic device (1000) may transmit the prompt to the server (2000). In S1310, the server (2000) may preprocess the prompt obtained from the electronic device (1000); this preprocessing may include summarizing the prompt and removing noise from it. In S1312, the server (2000) may match image resources to one or more paragraphs in the preprocessed prompt. In S1314, the server (2000) may generate multimedia content by matching the image resources to the prompt. In S1316, the server (2000) may transmit the multimedia content to the electronic device (1000). In S1318, the electronic device (1000) may output the generated multimedia content.
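The order of steps S1306 to S1318 can be summarized in a small orchestration sketch. The four callables are placeholders supplied by the caller for the device-side prompt identification and the server-side preprocessing, matching, and composition; their names and signatures are assumptions, and the network transfers in S1308 and S1316 are only recorded in the trace rather than performed.

```python
def run_flow(signal, identify_prompt, preprocess, match_resources, compose):
    """Run steps S1306 to S1318 in order; return (content, trace)."""
    trace = []
    prompt = identify_prompt(signal)    # device: signal -> prompt
    trace.append("S1306")
    trace.append("S1308")               # device -> server: send prompt
    cleaned = preprocess(prompt)        # server: summarize, remove noise
    trace.append("S1310")
    matched = match_resources(cleaned)  # server: paragraph -> resource
    trace.append("S1312")
    content = compose(matched)          # server: assemble multimedia content
    trace.append("S1314")
    trace.append("S1316")               # server -> device: send content
    trace.append("S1318")               # device: output content
    return content, trace
```

Passing the stages in as functions keeps the sketch independent of any particular model or transport, which the text leaves open.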

The method of generating multimedia content based on a neural network signal according to the present disclosure may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the art of computer software.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto, and various modifications and improvements made by those skilled in the art using the basic concept of the present invention defined in the following claims also fall within the scope of the present invention.