CN108388399B


Info

Publication number: CN108388399B
Application number: CN201810032045.5A
Authority: CN
Inventors: 秦萌萌; 贾志强; 俞晓君
Original assignee: Beijing Guangnian Wuxian Technology Co Ltd
Current assignee: Beijing Virtual Point Technology Co Ltd
Priority date: 2018-01-12
Filing date: 2018-01-12
Publication date: 2021-04-06
Anticipated expiration: 2038-01-12
Also published as: CN108388399A

Abstract

The invention provides a state management method for a virtual idol. The virtual idol has a specific avatar and is displayed through a holographic device. The method comprises: obtaining a multimodal input; parsing intents or operations in the multimodal input to obtain a conversion intent or a conversion instruction for state conversion; converting the current state of the virtual idol into the new state indicated by the conversion intent or conversion instruction; and, in the new state, starting the capability or skill module required by the virtual idol. The virtual idol state management method and system provided by the invention offer a virtual idol that can carry out multimodal interaction with a user through holographic imaging. In addition, the virtual idol has multiple states, such as a pause state, an audio output state, a recording waiting state, a recording state, a standby state and a skill-on state, and managing these states improves the user's interactive experience.

Description

Virtual idol state management method and system

Technical Field

The invention relates to the field of artificial intelligence, in particular to a method and a system for managing the state of a virtual idol.

Background

The development of chatbot interaction systems has been aimed at mimicking human conversation. Early, widely used chatbot applications include the Xiaoi chatbot and Siri on Apple phones, among others. They process received input (text or speech) and respond accordingly, attempting to mimic human-to-human interaction in context.

However, chatbot interaction systems for virtual idols are still immature, and no product has yet appeared that can both interact with a user across multiple modalities and manage the state of a virtual idol.

Therefore, the invention provides a method and a system for managing the state of a virtual idol.

Disclosure of Invention

To solve the above problems, the present invention provides a state management method for a virtual idol, which has a specific avatar and is displayed by a holographic device. The method comprises the following steps:

obtaining a multi-modal input;

parsing intents or operations in the multimodal input to obtain a conversion intent or a conversion instruction for state conversion;

converting the current state of the virtual idol into a new state of the virtual idol indicated by the conversion intent or conversion instruction;

and, in the new state, starting the capability or skill module required by the virtual idol.

According to one embodiment of the present invention, the states of the virtual idol are divided into a sleep state, an active state, and a wait state, wherein,

the sleep state includes: a pause state and a standby state;

the active state includes: a recording state, an audio output state and a skill-on state;

in a pause state, stopping running the virtual idol;

in a standby state, running the virtual idol in a background;

in the recording state, stopping the previous multimodal output and starting to detect the audio signal;

in an audio output state, calling a language interaction module in the capability or skill module to carry out dialogue interaction;

and in the skill-on state, calling a dance performance module in the capability or skill module to give a dance performance.
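The state hierarchy above can be written down as a small lookup structure. The following Python sketch is illustrative only: the enum names and the module-name strings (`recording_module` and so on) are hypothetical placeholders, not identifiers from the patent.

```python
from enum import Enum


class Category(Enum):
    """Top-level grouping of virtual idol states."""
    SLEEP = "sleep"
    ACTIVE = "active"
    WAIT = "wait"


class State(Enum):
    PAUSE = "pause"                # virtual idol stops running
    STANDBY = "standby"            # virtual idol runs in the background
    RECORDING = "recording"        # previous output stopped, audio detection on
    AUDIO_OUTPUT = "audio_output"  # dialogue interaction
    SKILL_ON = "skill_on"          # dance performance
    RECORD_WAIT = "record_wait"    # the waiting state


# Grouping taken from the embodiment above.
CATEGORY = {
    State.PAUSE: Category.SLEEP,
    State.STANDBY: Category.SLEEP,
    State.RECORDING: Category.ACTIVE,
    State.AUDIO_OUTPUT: Category.ACTIVE,
    State.SKILL_ON: Category.ACTIVE,
    State.RECORD_WAIT: Category.WAIT,
}

# Module started on entry to an active state (hypothetical names).
ENTRY_MODULE = {
    State.RECORDING: "recording_module",
    State.AUDIO_OUTPUT: "language_interaction_module",
    State.SKILL_ON: "dance_performance_module",
}


def states_in(category: Category) -> list:
    """Return the states in one top-level category, in declaration order."""
    return [s for s in State if CATEGORY[s] is category]
```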

According to an embodiment of the present invention, the waiting state is a recording waiting state.

According to an embodiment of the invention, in the waiting state, whether to enter the audio output state or the skill-on state is determined from the cloud brain's analysis of the multimodal input; after the audio output state or the skill-on state is entered, the multimodal output of the started capability or skill module is executed according to feedback from the cloud brain.

According to an embodiment of the present invention, in any of the active states, if it is detected that the task in the current state has finished and no multimodal input data is detected, the current state transitions to the standby state or the pause state of the sleep state.

According to an embodiment of the present invention, the recording state has the highest priority among the active states: when the virtual idol is in the waiting state, that is, the recording waiting state, and the user's speech is captured, the virtual idol enters the recording state.
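The two rules above (idle fallback to sleep and the priority of the recording state) can be sketched as a single pure function. This is a minimal sketch under assumptions: the keyword arguments are hypothetical flags, and because the text allows either the standby or the pause state as the fallback, the choice is exposed as an assumed `prefer_pause` parameter.

```python
from enum import Enum


class State(Enum):
    PAUSE = "pause"
    STANDBY = "standby"
    RECORD_WAIT = "record_wait"
    RECORDING = "recording"
    AUDIO_OUTPUT = "audio_output"
    SKILL_ON = "skill_on"


ACTIVE = {State.RECORDING, State.AUDIO_OUTPUT, State.SKILL_ON}


def apply_rules(current: State, *, task_finished: bool, has_input: bool,
                user_speaking: bool, prefer_pause: bool = False) -> State:
    """Return the next state under the two rules; otherwise keep the current one."""
    # Recording has the highest priority: captured speech in the
    # recording waiting state always moves the idol into recording.
    if current is State.RECORD_WAIT and user_speaking:
        return State.RECORDING
    # An active state whose task is done and that sees no new input
    # falls back to a sleep state (which one is an assumed setting).
    if current in ACTIVE and task_finished and not has_input:
        return State.PAUSE if prefer_pause else State.STANDBY
    return current
```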

According to another aspect of the invention, there is also provided a program product containing a series of instructions for carrying out any of the method steps as described above.

According to another aspect of the present invention, there is also provided a virtual idol that has a specific avatar and preset attributes, and whose state transitions are carried out using the method described above.

According to another aspect of the present invention, there is also provided a state management system for virtual idols, the system comprising:

a smart device, which carries the virtual idol, acquires multimodal input, and has capabilities of natural language understanding, visual perception, tactile perception, speech output, and emotional expression and action output;

a holographic device, which acquires multimodal input, converts the avatar of the virtual idol into a holographic image, and displays the holographic image;

and a cloud brain, which, in the waiting state, determines from its analysis of the multimodal input whether the state to be entered is the audio output state or the skill-on state, and decides the multimodal output of the virtual idol after that state is entered.

The virtual idol state management method and system provided by the invention offer a virtual idol that can carry out multimodal interaction with a user through holographic imaging. In addition, the virtual idol in this system has multiple states, such as a pause state, an audio output state, a recording waiting state, a recording state, a standby state and a skill-on state.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 shows a multi-modal interaction diagram of a virtual idol state management system according to one embodiment of the invention;

FIG. 2 is a block diagram illustrating a state management system for virtual idols in accordance with one embodiment of the present invention;

FIG. 3 illustrates a state classification diagram of a state management system for virtual idols, according to one embodiment of the present invention;

FIG. 4 is a diagram illustrating state transition of a state management system for virtual idols in accordance with one embodiment of the present invention;

FIG. 5 illustrates a block diagram of a state management system for virtual idols, in accordance with one embodiment of the present invention;

FIG. 6 shows a flow diagram of a method for state management of virtual idols in accordance with one embodiment of the invention;

FIG. 7 illustrates another flowchart of a method for state management of virtual idols in accordance with one embodiment of the present invention; and

FIG. 8 shows a flow diagram of communication among four parties (a user, a smart device, a holographic device, and a cloud brain) according to an embodiment of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

For clarity, the following points are noted before the embodiments:

the virtual idol provided by the invention uses a holographic device as its main display interface and has a specific avatar;

the smart device, which supports input/output and control modules, enables multimodal human-machine interaction and has AI capabilities such as natural language understanding, visual perception, tactile perception, speech output, and emotional expression and action output;

social attributes, personality attributes, character skills and the like can be configured, so that users (for example, anime and 2D-culture enthusiasts) can enjoy an entertaining, personalized and smooth experience with the virtual character.

The cloud brain is the terminal that provides the virtual idol with processing capability: it performs semantic understanding (language semantic understanding, action semantic understanding, visual recognition, affective computing and cognitive computing) of the user's interaction needs, which enables the virtual idol to interact with the user and helps the user make decisions.

Various embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

FIG. 1 shows a multimodal interaction diagram of a virtual idol state management system according to one embodiment of the invention. As shown in FIG. 1, multimodal interaction requires a user 101, a smart device 102, a holographic device 103, and a cloud brain 104. The user 101 interacting with the virtual idol can be a real person, another virtual idol, or a physical virtual idol; the interaction of another virtual idol or a physical virtual idol with the virtual idol is similar to that of a single person. Thus, only the multimodal interaction process between a user (a person) and the virtual idol is illustrated in FIG. 1.

The process of interaction between the virtual idol and the user 101 in FIG. 1 is as follows.

The preparation required for interaction is that the virtual idol is loaded and run on the smart device 102 and has specific character features. The virtual idol has AI capabilities such as natural language understanding, visual perception, tactile perception, speech output, and emotional expression and action output. To support the virtual idol's touch-sensing function, a touch-sensing component must also be installed on the smart device. According to an embodiment of the invention, to improve the interactive experience, the virtual idol is displayed in a preset area of the holographic device as soon as it starts, so that the user does not have to wait too long.

It should be noted that the appearance and outfits of the virtual idol are not limited to a single style. The virtual idol may have different appearances, make-up and decorations, and its image is generally a high-poly 3D animated model. Each image of the virtual idol can also correspond to several outfits, which can be classified by season and occasion. These images and outfits may be stored in the cloud brain 104 or on the smart device 102 and can be invoked whenever needed.

The social attributes, personality attributes, and character skills of the virtual idol are likewise not limited to one type or category. The virtual idol can have various social attributes, personality attributes and character skills. These can be combined with one another, and the combinations are not fixed, so the user can select and combine social attributes, personality attributes and character skills as needed.

According to one embodiment of the invention, the holographic device 103 for displaying the virtual idol comprises a communication interface, an imaging device and an output device. The communication interface receives the image of the virtual idol and the interaction data transmitted by the smart device 102. The imaging device is connected to the communication interface and converts the image of the virtual idol into a holographic image, which it displays in a preset area. The output device is connected to the communication interface and the imaging device and presents the holographic image together with display data describing the virtual idol's current state.

The multimodal interaction process is as follows. First, a multimodal input is obtained. The multimodal input may be spoken by the user 101 or obtained by sensing the environment, and may contain information in several modalities, such as text, speech, visual and perceptual information. The receiving devices that acquire the multimodal input are installed or configured on the smart device or the holographic device, and include a text receiving device for text, a voice receiving device for speech, a camera for visual input, infrared equipment for perceptual information, and the like.
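A multimodal input as described here can be held in one container with an optional field per modality. The field names and types below are assumptions made for illustration; the patent does not specify a data format.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class MultimodalInput:
    """One multimodal input; any modality may be absent."""

    text: Optional[str] = None                    # from a text receiving device
    audio: Optional[bytes] = None                 # raw audio from a microphone
    image: Optional[bytes] = None                 # a camera frame
    perception: Optional[Dict[str, Any]] = None   # e.g. infrared readings (format assumed)

    def modalities(self) -> List[str]:
        """Names of the modalities actually present, in a fixed order."""
        names = ("text", "audio", "image", "perception")
        return [n for n in names if getattr(self, n) is not None]
```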

Next, intents or operations in the multimodal input are parsed to derive a conversion intent or conversion instruction for state conversion. During multimodal interaction, the virtual idol may interact with the user 101 in multiple states, and each state uses different capability or skill modules of the virtual idol.

To convert the state of the virtual idol during its interaction with the user 101, the intents or operations in the multimodal input must be analyzed in real time to determine whether the input expresses the user 101's intent to change the virtual idol's state, and thereby to obtain a conversion intent or conversion instruction.
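In the patent, parsing relies on the cloud brain's semantic understanding. As a stand-in only, the sketch below shows the shape of this step with a hypothetical keyword table that maps an utterance to a conversion intent, or to nothing when no state change is requested. The goodbye-to-standby entry mirrors the transition from the recording state to the standby state described for FIG. 4; the other keywords are invented for the example.

```python
import re
from typing import Optional

# Hypothetical keyword table: target state -> trigger words.
INTENT_KEYWORDS = {
    "standby": ("goodbye", "bye"),
    "skill_on": ("dance", "sing"),
}


def parse_conversion_intent(utterance: str) -> Optional[str]:
    """Return the target state named by the utterance, or None if no change is asked for."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    for target, keywords in INTENT_KEYWORDS.items():
        if words.intersection(keywords):
            return target
    return None
```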

After the conversion intent or conversion instruction is obtained, the current state of the virtual idol is converted into the new state it indicates. According to one embodiment of the invention, the states of the virtual idol include a sleep state, an active state, and a wait state. The sleep state comprises a pause state and a standby state; the active state comprises a recording state, an audio output state, and a skill-on state. The virtual idol behaves as follows in each state: in the pause state, it stops running; in the standby state, it runs in the background; in the recording state, the audio input signal is detected and a recording module in the capability or skill module is started to record audio input data; in the audio output state, a language interaction module in the capability or skill module is called to carry out dialogue interaction; and in the skill-on state, a dance performance module in the capability or skill module is called to give a dance performance.

Finally, the capability or skill module required by the virtual idol in the new state is started.

In one embodiment of the present invention, the screen of the smart device 102 faces the holographic device 103 and displays the image of the virtual idol as four views: a front view, a rear view, a left view, and a right view.

According to another embodiment of the present invention, a virtual idol has a specific avatar and preset attributes, and the method for managing the state of the virtual idol provided by the present invention is used to perform the process of converting the state of the virtual idol.

FIG. 2 is a block diagram illustrating a state management system for virtual idols according to an embodiment of the present invention. As shown in FIG. 2, multimodal interaction in this system requires a user 101, a smart device 102, and a cloud brain 104. The smart device 102 includes a receiving device 102A, a processing device 102B, an output device 102C, and a connection device 102D. The cloud brain 104 includes a communication device 1041.

The virtual idol state management system provided by the invention needs an uninterrupted communication channel among the user 101, the smart device 102 and the cloud brain 104 to complete the interaction between the user 101 and the virtual idol. To accomplish this, the smart device 102 and the cloud brain 104 are equipped with the devices and components that support the interaction. The party interacting with the virtual idol may be one party or several.

The smart device 102 includes a receiving device 102A, a processing device 102B, an output device 102C, and a connection device 102D. The receiving device 102A receives multimodal input. Examples of the receiving device 102A include a keyboard, a cursor control device (mouse), a microphone for voice operation, a scanner, touch functionality (e.g., a capacitive sensor to detect physical touch), a camera (detecting motion that does not involve touch, using visible or invisible wavelengths), and so forth. The smart device 102 may obtain multimodal input through these input devices. The output device 102C outputs the multimodal output data of the virtual idol interacting with the user 101 and is not described further here.

The processing device 102B processes the interaction data transmitted by the cloud brain 104 during interaction; that is, it processes the multimodal input preprocessed by the receiving device 102A or the data transmitted by the cloud brain. The connection device 102D communicates with the cloud brain 104 and sends call instructions to invoke the robot capabilities on the cloud brain 104.

In the waiting state, the cloud brain 104 determines from its analysis of the multimodal input whether the state to be entered is the audio output state or the skill-on state, and decides the multimodal output of the virtual idol after that state is entered.

The cloud brain 104 includes a communication device 1041 for communicating with the smart device 102. The communication device 1041 communicates with the connection device 102D on the smart device 102, receives requests from the smart device 102, and sends back the processing results of the cloud brain 104; it is the medium of communication between the smart device 102 and the cloud brain 104.

FIG. 3 shows a state classification diagram of a state management system for virtual idols according to one embodiment of the invention. As shown in FIG. 3, the virtual idol states 300 include a sleep state 301, an active state 302, and a wait state 303. The sleep state 301 includes a pause state 3011 and a standby state 3012. The active state 302 includes a recording state 3021, an audio output state 3022, and a skill-on state 3023.

According to one embodiment of the invention, the capabilities or skills in each state are as follows: in the pause state 3011, the virtual idol stops running; in the standby state 3012, the virtual idol runs in the background; in the recording state 3021, the audio input signal is detected and a recording module in the capability or skill module is started to record audio input data; in the audio output state 3022, a language interaction module in the capability or skill module is called to carry out dialogue interaction; and in the skill-on state 3023, a dance performance module in the capability or skill module is called to give a dance performance.

In the virtual idol state management system provided by the invention, the waiting state is an important part of the state model and acts as the bridge between the recording state and the audio output state or the skill-on state. According to an embodiment of the present invention, the waiting state 303 may be a recording waiting state, that is, a state that responds to interruption. In this state, whether to enter the audio output state 3022 or the skill-on state 3023 is determined from the cloud brain 104's analysis of the multimodal input, and after that state is entered, the multimodal output of the started capability or skill module is executed according to feedback from the cloud brain 104.
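The waiting state's role as a bridge can be sketched as a function that maps the cloud brain's analysis to the next state. The shape of `cloud_analysis` (a dict whose hypothetical `type` key is `"dialogue"` or `"performance"`) is an assumed format, and staying in the recording waiting state when the result is not actionable is an assumption rather than something the text specifies.

```python
from enum import Enum
from typing import Any, Dict


class State(Enum):
    RECORD_WAIT = "record_wait"
    AUDIO_OUTPUT = "audio_output"
    SKILL_ON = "skill_on"


def resolve_wait_state(cloud_analysis: Dict[str, Any]) -> State:
    """Choose the active state to enter from the recording waiting state."""
    kind = cloud_analysis.get("type")
    if kind == "performance":
        return State.SKILL_ON       # cloud brain asks for a dance performance
    if kind == "dialogue":
        return State.AUDIO_OUTPUT   # cloud brain returns a spoken reply
    return State.RECORD_WAIT        # assumed: nothing actionable, keep waiting
```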

According to an embodiment of the present invention, in any of the active states 302, if it is detected that the task in the current state has finished and no multimodal input data is detected, the current state transitions to the standby state 3012 or the pause state 3011 of the sleep state 301. In addition, the recording state 3021 has the highest priority among the active states 302: when the virtual idol is in the waiting state 303, i.e., the recording waiting state, and the user's speech is captured, the virtual idol enters the recording state.

FIG. 4 is a diagram illustrating state transition of a state management system for virtual idols according to an embodiment of the present invention.

The virtual idol in the state management system provided by the invention has several different states, with different capabilities or skills in each. During multimodal interaction, the virtual idol transitions between these states as directed by the user.

After the interactive program on the smart device 102 is started, the virtual idol enters the pause state, in which it stops running. When an activation event occurs, the virtual idol enters the recording waiting state. In an embodiment of the present invention, the activation event may be the user 101 pressing a button that starts the recording waiting state: the smart device 102 may include a physical or virtual recording-waiting button, and when the user 101 presses it, the virtual idol changes from the pause state to the recording waiting state. The activation event may also take other forms, and the present invention does not limit its form.

After the virtual idol changes from the pause state to the recording waiting state, if it detects that the user is speaking, it changes from the recording waiting state to the recording state. In the recording state, the virtual idol detects the audio input signal, and a recording module in the capability or skill module is started to record audio input data. If the virtual idol, while in the recording state, detects the word "goodbye", it changes from the recording state to the standby state: the user 101 has signaled the intent to end the recording, so the virtual idol immediately enters the standby state to wait for the next multimodal input from the user 101.

If the virtual idol is in the recording state and detects that the user has stopped speaking, it changes from the recording state to the recording waiting state. To switch from the recording state to the audio output state, the virtual idol first changes from the recording state to the recording waiting state and then from the recording waiting state to the audio output state. In the audio output state, the user 101 can hold a dialogue with the virtual idol, which plays the interactive audio for the user 101; when playback of the interactive audio ends, the virtual idol changes from the audio output state to the pause state.

In addition, when the virtual idol is in the standby state and the user 101 expresses a wake-up intent or issues a wake-up instruction, the virtual idol changes from the standby state to the recording waiting state. The wake-up intent may be a specific sound, a specific body movement, or a specific biometric feature of the user 101.

To switch from the recording state to the skill-on state, the virtual idol first changes from the recording state to the recording waiting state and then from the recording waiting state to the skill-on state. In the skill-on state, the dance performance module in the capability or skill module is called to give a dance performance for the user 101.

When the virtual idol is in the skill-on state and finishes singing or is interrupted, it changes from the skill-on state to the recording waiting state. When the virtual idol is in the skill-on state and is singing, it changes from the skill-on state to the standby state.
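The transitions described for FIG. 4 can be collected into a table-driven finite-state machine. In this sketch the event names are hypothetical labels for the triggers in the text, and the skill-on to standby transition is omitted because its trigger is not stated precisely. Unlisted (state, event) pairs raise an error, which makes explicit the rule that recording must pass through the recording waiting state before audio output or skill-on.

```python
from enum import Enum, auto


class State(Enum):
    PAUSE = auto()
    STANDBY = auto()
    RECORD_WAIT = auto()
    RECORDING = auto()
    AUDIO_OUTPUT = auto()
    SKILL_ON = auto()


# (current state, event) -> next state; event names are hypothetical labels.
TRANSITIONS = {
    (State.PAUSE, "activate"): State.RECORD_WAIT,
    (State.RECORD_WAIT, "user_speaks"): State.RECORDING,
    (State.RECORDING, "goodbye"): State.STANDBY,
    (State.RECORDING, "user_stops_speaking"): State.RECORD_WAIT,
    (State.RECORD_WAIT, "dialogue"): State.AUDIO_OUTPUT,
    (State.RECORD_WAIT, "performance"): State.SKILL_ON,
    (State.AUDIO_OUTPUT, "playback_finished"): State.PAUSE,
    (State.STANDBY, "wake"): State.RECORD_WAIT,
    (State.SKILL_ON, "finished_or_interrupted"): State.RECORD_WAIT,
}


class VirtualIdolFSM:
    def __init__(self) -> None:
        self.state = State.PAUSE  # state after the interactive program starts

    def handle(self, event: str) -> State:
        """Apply one event; the state is left unchanged if the event is not allowed."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state
```

A table keeps the allowed transitions in one place, so adding a state or trigger means adding rows rather than editing branching logic.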

FIG. 5 illustrates a block diagram of a state management system for virtual idols in accordance with one embodiment of the present invention. As shown in FIG. 5, the system includes an acquisition module 501, an intent module 502, a state module 503, and a skill module 504. The acquisition module 501 includes a text acquisition unit 5011, an audio acquisition unit 5012, a vision acquisition unit 5013, and a perception acquisition unit 5014.

The acquisition module 501 obtains the multimodal input. The text acquisition unit 5011 collects text information, the audio acquisition unit 5012 collects audio information, the vision acquisition unit 5013 collects visual information, and the perception acquisition unit 5014 collects perceptual information. Examples of the acquisition module 501 include a keyboard, a cursor control device (mouse), a microphone for voice operation, a scanner, touch functionality (e.g., capacitive sensors to detect physical touch), a camera, and sensing devices (for example, devices using visible or invisible wavelengths of radiation, signals, or environmental data). The multimodal input data may be acquired through these input devices. The multimodal input may comprise one or more of text, audio, visual, and perceptual data, and the invention is not limited in this respect.

The intent module 502 parses intents or operations in the multimodal input to obtain a conversion intent or conversion instruction for state conversion. The intent module 502 includes a parsing unit 5021, which parses the multimodal input to obtain the conversion intent or conversion instruction it contains. The conversion intent or conversion instruction directs the transitions between the states of the virtual idol.

The state module 503 converts the current state of the virtual idol into the new state indicated by the conversion intent. According to one embodiment of the invention, the virtual idol has multiple states, such as a sleep state, an active state, and a wait state. The sleep state includes a pause state and a standby state; the active state includes a recording state, an audio output state, and a skill-on state; and the wait state comprises the recording waiting state. The state module 503 comprises a converting unit 5031. In one embodiment, the converting unit 5031 can convert the virtual idol from the sleep state to the active state and from the active state to the sleep state.

The skill module 504 starts the capability or skill modules required by the virtual idol in the new state. The skill module 504 includes an opening unit 5041; when the virtual idol enters the new state, the opening unit 5041 immediately starts the capability or skill of the virtual idol corresponding to that state.
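The four modules of FIG. 5 can be wired together as in the following sketch. Class names follow the reference numerals, but the keyword matcher in `IntentModule`, the state strings and the module names are hypothetical stand-ins, and the sketch does not check whether a requested transition is allowed.

```python
from typing import Dict, List, Optional

MODALITIES = ("text", "audio", "vision", "perception")  # units 5011-5014


class AcquisitionModule:
    """Module 501: keeps only the modalities its acquisition units handle."""

    def acquire(self, raw: Dict[str, object]) -> Dict[str, object]:
        return {k: v for k, v in raw.items() if k in MODALITIES}


class IntentModule:
    """Module 502: parsing unit 5021 (toy keyword matcher, assumed)."""

    def parse(self, inputs: Dict[str, object]) -> Optional[str]:
        text = str(inputs.get("text", "")).lower()
        if "goodbye" in text:
            return "standby"
        if "dance" in text:
            return "skill_on"
        return None


class StateModule:
    """Module 503: converting unit 5031 holds and updates the current state."""

    def __init__(self, initial: str = "record_wait") -> None:
        self.state = initial

    def convert(self, target: str) -> str:
        self.state = target
        return self.state


class SkillModule:
    """Module 504: opening unit 5041 starts the module the new state needs."""

    MODULES = {
        "recording": "recording module",
        "audio_output": "language interaction module",
        "skill_on": "dance performance module",
    }

    def __init__(self) -> None:
        self.opened: List[str] = []

    def open_for(self, state: str) -> None:
        module = self.MODULES.get(state)
        if module is not None:
            self.opened.append(module)


def step(raw, acquisition, intent, states, skills) -> str:
    """One pass: acquire, parse, convert, open skill (no-op if no intent)."""
    inputs = acquisition.acquire(raw)
    target = intent.parse(inputs)
    if target is not None:
        states.convert(target)
        skills.open_for(states.state)
    return states.state
```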

FIG. 6 shows a flowchart of a method for state management of virtual idols, according to an embodiment of the invention.

As shown in FIG. 6, in step S601, a multimodal input is acquired. In this step, the smart device 102 or the holographic device 103 may obtain the multimodal input, which may come from the user 101 or from another device with an input function. The smart device 102 and the holographic device 103 are equipped with the corresponding means for obtaining multimodal input. The multimodal input may take the form of text, audio, or sensory input.

Next, in step S602, the intents or operations in the multimodal input are parsed to obtain a conversion intent or conversion instruction for state conversion. The multimodal input carries various kinds of information, so to learn the interaction intent of the user 101, the intents or operations in it must be analyzed and the conversion intent or conversion instruction extracted from them.

Then, in step S603, the current state of the virtual idol is converted into the new state indicated by the conversion intent or conversion instruction. According to one embodiment of the invention, the virtual idol has multiple states, such as a sleep state, an active state, and a wait state. The sleep state includes a pause state and a standby state; the active state includes a recording state, an audio output state, and a skill-on state; and the wait state comprises the recording waiting state.

Finally, once the virtual idol enters the new state, in step S604 the capability or skill module it requires in that state is started. Each state of the virtual idol has its own capability or skill modules. According to one embodiment of the invention, in the pause state, the virtual idol stops running; in the standby state, it runs in the background; in the recording state, the audio input signal is detected and a recording module in the capability or skill module is started to record audio input data; in the audio output state, a language interaction module in the capability or skill module is called to carry out dialogue interaction; and in the skill-on state, a dance performance module in the capability or skill module is called to give a dance performance.

In addition, the virtual idol state management system provided by the invention can be used together with a program product containing a series of instructions for executing the steps of the virtual idol state management method.

FIG. 7 shows another flowchart of a method for state management of virtual idols in accordance with one embodiment of the invention.

As shown in FIG. 7, in step S701, the smart device 102 issues a request to the cloud brain 104. Then, in step S702, the smart device 102 remains in communication with the cloud brain 104, and during this interaction it measures how long the response data takes to return.

In step S703, if no response data has been returned for a long time, for example longer than a predetermined duration of 5 s, the smart device 102 may choose to reply locally and generate generic local response data. Then, in step S704, an animation associated with the generic local response is output, and the voice playback device is called to play the response aloud.
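Steps S701 to S703 describe a request with a timeout and a local fallback. The Python sketch below uses the standard `concurrent.futures` module to wait at most 5 s (the example threshold above) for a cloud reply. `cloud_call` and `local_generic_reply` are hypothetical placeholders, and the animation and voice playback of step S704 are not modeled.

```python
import concurrent.futures
from typing import Callable, Tuple

LOCAL_TIMEOUT_S = 5.0  # the 5 s threshold given as an example in step S703


def local_generic_reply() -> str:
    # Hypothetical generic response generated on the device itself.
    return "Sorry, let me think about that for a moment."


def request_with_fallback(cloud_call: Callable[[], str],
                          timeout_s: float = LOCAL_TIMEOUT_S) -> Tuple[str, bool]:
    """Send a request to the cloud brain; fall back to a local reply if none arrives in time.

    Returns (reply, used_local_fallback).
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(cloud_call)
    try:
        return future.result(timeout=timeout_s), False
    except concurrent.futures.TimeoutError:
        return local_generic_reply(), True
    finally:
        # Do not block the interaction while the late cloud response is still pending.
        executor.shutdown(wait=False)
```

`shutdown(wait=False)` lets the device answer immediately; the late cloud response is simply discarded in this sketch.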

To realize multimodal interaction between the smart device 102 and the user 101, communication connections must be established among the user 101, the smart device 102, the holographic device 103, and the cloud brain 104. These connections should be real-time and unobstructed so that the interaction is not affected.

Completing the interaction requires certain preconditions: the virtual idol must be loaded and running on the smart device 102, and the smart device 102 must have hardware with sensing and control functions. In addition, the holographic device 103 receives the image of the virtual idol transmitted by the smart device 102, converts it into a holographic image, and displays the holographic image in a preset area.

After preparation is complete, the smart device 102 begins interacting with the user 101. First, the smart device 102 and/or the holographic device 103 obtains a multimodal input, which may come from the user 101 or from other devices. At this stage, the two parties exchanging data are the user 101 and the smart device 102 and/or the holographic device 103. Next, the intention or operation in the multimodal input is parsed to derive a conversion intention or conversion instruction for state conversion.

Then, while the virtual idol is in the waiting state, the smart device 102 sends a request to the cloud brain 104. Based on the analysis result of the multimodal input, the cloud brain 104 determines whether the state to be entered is the audio output state or the skill-on state, and once that state has been entered, the cloud brain 104 replies to the smart device 102 to decide the multimodal output of the virtual idol. At this stage, the two communicating parties are the smart device 102 and the cloud brain 104.
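The cloud-side decision can be sketched as a function that takes the parsed input and returns both the next state and the output to render. The keyword test below is a hypothetical stand-in for whatever analysis the cloud brain 104 performs; `CloudDecision` and `SKILL_KEYWORDS` are illustrative names, not part of the patent.

```python
from dataclasses import dataclass


@dataclass
class CloudDecision:
    next_state: str  # "audio_output" or "skill_on"
    output: str      # multimodal output decided by the cloud brain (text only here)


# Hypothetical requests treated as skill requests rather than dialogue.
SKILL_KEYWORDS = ("dance", "sing", "perform")


def cloud_decide(parsed_text: str) -> CloudDecision:
    """Decide, while the idol is in the recording waiting state, which state to enter next."""
    lowered = parsed_text.lower()
    for keyword in SKILL_KEYWORDS:
        if keyword in lowered:
            return CloudDecision("skill_on", f"start performance: {keyword}")
    # Anything else is answered through dialogue in the audio output state.
    return CloudDecision("audio_output", f"reply to: {parsed_text}")
```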

After the smart device 102 receives the data and instructions transmitted by the cloud brain 104, or after the current state of the virtual idol running on the smart device 102 has been converted into the new state indicated by the conversion intention, the smart device 102 transmits the image of the virtual idol and the display data of its current state to the holographic device 103. The holographic device 103 converts the image of the virtual idol into a hologram and displays it in a predetermined area of the holographic device 103. At this stage, the two communicating parties are the smart device 102 and the holographic device 103.

Finally, the holographic device 103 outputs the hologram of the virtual idol together with the display data of its current state and presents them to the user 101. At this stage, the two communicating parties are the holographic device 103 and the user 101.

It is to be understood that the disclosed embodiments of the invention are not limited to the particular structures, process steps, or materials disclosed herein, but extend to equivalents thereof as would be understood by those ordinarily skilled in the relevant arts. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.

Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, the appearances of the phrase "one embodiment" or "an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment.

Although the embodiments of the present invention have been described above, the above description is only for the convenience of understanding the present invention, and is not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method for managing the status of a virtual idol, wherein said virtual idol has specific image characteristics and is displayed by a holographic device, said method comprising the steps of:

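obtaining a multi-modal input;

parsing the intention or operation in the multi-modal input to obtain a conversion intention or a conversion instruction for state conversion;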

converting the current state of the virtual idol into a new state of the virtual idol indicated by the conversion intention or the conversion instruction, wherein the states of the virtual idol comprise a sleep state, an active state and a waiting state; the sleep state comprises a suspended state and a standby state; the active state comprises a recording state, an audio output state and a skill-on state; and the waiting state is a recording waiting state;

wherein entering the new state comprises: starting the capability or skill module required by the virtual idol in the new state;

after the interactive program in the smart device is activated, entering the pause state, in which the virtual idol stops running; entering the recording waiting state when an activation event occurs; if the user is detected speaking in the recording waiting state, converting the recording waiting state into the recording state, in which an audio input signal is detected and the recording module in the capability or skill module is started to record audio input data; and if it is detected in the recording state that the user has stopped speaking, converting the recording state into the recording waiting state;

if the recording state needs to be switched to the audio output state, first converting the recording state into the recording waiting state and then converting the recording waiting state into the audio output state; in the audio output state, the user can carry out dialogue interaction with the virtual idol and the virtual idol plays the interactive audio for that exchange, and when playback of the interactive audio ends, the audio output state is converted into the pause state;

when in the recording state, if the user expresses an intention to end the recording, converting into the standby state to wait for the user's next multi-modal input; and when in the standby state, if the user issues a wake-up intention or instruction, converting into the recording waiting state, wherein the wake-up intention can be a specific audio directed at the virtual idol, a specific body movement, or a specific biometric feature of the user;

if the recording state needs to be switched to the skill-on state, first converting the recording state into the recording waiting state and then converting the recording waiting state into the skill-on state; in the skill-on state, the dance performance module in the capability or skill module is called to perform a dance; if the singing finishes or is interrupted while in the skill-on state, converting into the recording waiting state; and if singing by the virtual idol is started while in the skill-on state, converting into the standby state.
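The transitions recited in claim 1 can be summarized as a lookup table keyed by (current state, event). In this Python sketch the event names are hypothetical labels for the conditions in the claim, and the final clause of claim 1 (singing started in the skill-on state leading to standby), whose translated wording is ambiguous, is deliberately not modeled.

```python
from enum import Enum, auto
from typing import Dict, Tuple


class State(Enum):
    SUSPENDED = auto()          # pause state
    STANDBY = auto()
    RECORDING_WAITING = auto()
    RECORDING = auto()
    AUDIO_OUTPUT = auto()
    SKILL_ON = auto()


# (current state, hypothetical event name) -> next state
TRANSITIONS: Dict[Tuple[State, str], State] = {
    (State.SUSPENDED, "activation"): State.RECORDING_WAITING,
    (State.RECORDING_WAITING, "user_speaks"): State.RECORDING,
    (State.RECORDING, "user_stops_speaking"): State.RECORDING_WAITING,
    (State.RECORDING, "end_recording_intent"): State.STANDBY,
    (State.STANDBY, "wake_up"): State.RECORDING_WAITING,
    # Recording never jumps straight to output: it passes through the recording waiting state.
    (State.RECORDING_WAITING, "to_audio_output"): State.AUDIO_OUTPUT,
    (State.RECORDING_WAITING, "to_skill_on"): State.SKILL_ON,
    (State.AUDIO_OUTPUT, "playback_finished"): State.SUSPENDED,
    (State.SKILL_ON, "performance_finished_or_interrupted"): State.RECORDING_WAITING,
}


class IdolStateMachine:
    def __init__(self) -> None:
        # The interactive program starts in the pause (suspended) state.
        self.state = State.SUSPENDED

    def handle(self, event: str) -> State:
        """Apply one event, rejecting transitions that claim 1 does not allow."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state
```

Rejecting unknown (state, event) pairs makes the "recording, then recording waiting, then output" ordering explicit: a direct jump from recording to audio output raises an error.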

2. The method for managing the state of a virtual idol according to claim 1, wherein the state of the virtual idol is divided into a sleep state, an active state, and a wait state, wherein,

the sleep state includes: a suspend state and a standby state;

in a pause state, stopping running the virtual idol;

in a standby state, running the virtual idol in a background;

3. The method for managing the state of a virtual idol according to claim 2, wherein the waiting state is the recording waiting state entered when the previous state ends.

4. The method for state management of virtual idols according to claim 1, wherein,

in the recording waiting state, whether the state to be entered is the audio output state or the skill-on state is determined in combination with the cloud brain's analysis result of the multi-modal input, and after the audio output state or the skill-on state is entered, the multi-modal output started by the capability or skill module is executed in combination with feedback from the cloud brain.

5. The method for state management of virtual idols according to claim 2, wherein,

in any active state, if it is detected that the task in the current state has been processed to end and no multi-modal input data is detected, the current state is transitioned to a standby state or a suspended state in the sleep state.

6. The method for state management of virtual idols according to claim 5, wherein,

the recording state has the highest priority among the active states, and when the virtual idol is in the waiting state, namely the recording waiting state, capturing the user's voice causes the virtual idol to enter the recording state.
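Claims 5 and 6 can be read as two rules applied on each update tick. The Python sketch below is one interpretation under stated assumptions: the flags `task_finished`, `input_detected`, `user_voice`, and `cloud_target` are hypothetical inputs, and `prefer_standby` is an assumed policy switch for choosing between the two sleep states that claim 5 allows.

```python
from enum import Enum, auto
from typing import Optional


class State(Enum):
    SUSPENDED = auto()
    STANDBY = auto()
    RECORDING_WAITING = auto()
    RECORDING = auto()
    AUDIO_OUTPUT = auto()
    SKILL_ON = auto()


ACTIVE_STATES = {State.RECORDING, State.AUDIO_OUTPUT, State.SKILL_ON}


def next_state(state: State, *, task_finished: bool = False, input_detected: bool = False,
               user_voice: bool = False, cloud_target: Optional[State] = None,
               prefer_standby: bool = True) -> State:
    if state is State.RECORDING_WAITING:
        # Claim 6: recording has the highest priority, so captured user voice
        # wins over any state proposed by the cloud brain.
        if user_voice:
            return State.RECORDING
        if cloud_target in (State.AUDIO_OUTPUT, State.SKILL_ON):
            return cloud_target
        return state
    # Claim 5: an active state whose task has ended, with no new multimodal input,
    # falls back into one of the sleep states.
    if state in ACTIVE_STATES and task_finished and not input_detected:
        return State.STANDBY if prefer_standby else State.SUSPENDED
    return state
```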

7. A storage medium containing a series of instructions for performing the method steps of any of claims 1-6.

8. A virtual idol, characterized in that it has a specific avatar and preset attributes, and its state conversion is performed using the method according to any one of claims 1-6.

9. A system for managing the status of a virtual idol, said system comprising:

a smart device loaded with the virtual idol according to claim 8, configured to acquire multi-modal input and having capabilities of natural language understanding, visual perception, tactile perception, speech output, and emotional expression and action output;

a holographic device for acquiring a multimodal input and converting the imagery of the virtual idol of claim 8 into a hologram and displaying the hologram;

a cloud brain configured, in the waiting state, to determine according to the analysis result of the multi-modal input whether the state to be entered is the audio output state or the skill-on state, and, after that state has been entered, to decide the multi-modal output of the virtual idol according to claim 8.