Disclosure of Invention
Embodiments of the invention provide an intelligent robot control method, an intelligent robot control apparatus, a server, a robot, and a storage medium, which are used to improve the flexibility and diversity of interaction between an intelligent robot and a user.
In a first aspect, an embodiment of the present invention provides an intelligent robot control method, where the method is applied to a server, and the method includes:
Receiving sensing data and touch operation data uploaded by an intelligent robot, wherein the sensing data are acquired by different types of sensors arranged in the intelligent robot, the touch operation data are generated by detecting touch operation of a user through a flexible screen arranged in the intelligent robot, and the flexible screen forms all or part of an outer shell of the intelligent robot;
Determining multi-modal response behavior data corresponding to the intelligent robot based on the sensing data and the touch operation data, wherein the multi-modal response behavior data comprises at least one of action data, voice output data, and display data, and the display data is displayed through the flexible screen;
and transmitting the multi-modal response behavior data to the intelligent robot, so that the intelligent robot performs response output based on the multi-modal response behavior data.
Optionally, the determining, based on the sensing data and the touch operation data, multi-modal response behavior data corresponding to the intelligent robot includes:
preprocessing the sensing data;
and inputting the preprocessed sensing data and the touch operation data into a pre-trained neural network model to obtain the multi-modal response behavior data corresponding to the intelligent robot.
Optionally, the sensing data includes image data captured by an image capturing device and/or voice reception data picked up by a voice pickup device;
the preprocessing of the sensing data comprises the following steps:
performing semantic extraction on the image data; and/or,
converting the voice reception data into text data.
Optionally, after the preprocessed sensing data and the touch operation data are input into the pre-trained neural network model to obtain the multi-modal response behavior data corresponding to the intelligent robot, the method further includes:
Determining reference response behavior data and behavior constraint conditions corresponding to the preprocessed sensing data and the touch operation data based on preset rules;
comparing the reference response behavior data with the multi-modal response behavior data, and supplementing the multi-modal response behavior data according to the comparison result;
and eliminating, from the multi-modal response behavior data, data that does not meet the behavior constraint conditions.
Optionally, after the preprocessed sensing data and the touch operation data are input into the pre-trained neural network model to obtain the multi-modal response behavior data corresponding to the intelligent robot, the method further includes:
acquiring the business logic currently being executed by the intelligent robot;
and adjusting the multi-modal response behavior data based on the business logic.
In a second aspect, an embodiment of the present invention provides an intelligent robot control method, the method being applied to an intelligent robot including different types of sensors and a flexible screen serving as all or part of an outer casing of the intelligent robot, the method comprising:
acquiring sensing data through the different types of sensors, and generating touch operation data from a touch operation of a user detected by the flexible screen;
Uploading the sensing data and the touch operation data to a server;
receiving multi-modal response behavior data, issued by the server, corresponding to the sensing data and the touch operation data, wherein the multi-modal response behavior data comprises at least one of action data, voice output data, and display data, and the display data is displayed through the flexible screen;
and performing response output based on the multi-modal response behavior data.
Optionally, the sensors include an image capturing device and/or a voice pickup device, and the sensing data includes image data captured by the image capturing device and/or voice reception data picked up by the voice pickup device;
After the sensing data is acquired by the different types of sensors, the method further comprises:
performing semantic extraction on the image data; and/or,
converting the voice reception data into text data.
Optionally, the display data includes two-dimensional display data or three-dimensional display data.
Optionally, the intelligent robot comprises a single one-piece flexible screen or a plurality of flexible screens.
Optionally, when the intelligent robot includes a plurality of flexible screens, the performing response output based on the multi-modal response behavior data includes:
determining, based on the current position of each flexible screen, the sub-display data in the display data corresponding to each flexible screen;
and controlling each flexible screen to display its corresponding sub-display data.
Optionally, the method further comprises:
detecting, through the flexible screen, a dividing operation by which the user divides a sub-display area within a main display area;
and in response to the dividing operation, dividing the sub-display area within the main display area, and displaying different display data through the main display area and the sub-display area respectively.
In a third aspect, an embodiment of the present invention provides an intelligent robot control apparatus, where the apparatus is applied to a server, the apparatus includes:
A receiving module, configured to receive sensing data and touch operation data uploaded by the intelligent robot, wherein the sensing data are acquired by different types of sensors arranged in the intelligent robot, the touch operation data are generated by detecting a touch operation of a user through a flexible screen arranged in the intelligent robot, and the flexible screen forms all or part of an outer shell of the intelligent robot;
A determining module, configured to determine multi-modal response behavior data corresponding to the intelligent robot based on the sensing data and the touch operation data, wherein the multi-modal response behavior data comprises at least one of action data, voice output data, and display data, and the display data is displayed through the flexible screen;
and a sending module, configured to send the multi-modal response behavior data to the intelligent robot, so that the intelligent robot performs response output based on the multi-modal response behavior data.
Optionally, the determining module is configured to:
preprocess the sensing data;
and input the preprocessed sensing data and the touch operation data into a pre-trained neural network model to obtain the multi-modal response behavior data corresponding to the intelligent robot.
Optionally, the sensing data includes image data captured by an image capturing device and/or voice reception data picked up by a voice pickup device;
the determining module is further configured to:
perform semantic extraction on the image data; and/or,
convert the voice reception data into text data.
Optionally, the apparatus further comprises an optimization module, configured to:
determine reference response behavior data and behavior constraint conditions corresponding to the preprocessed sensing data and the touch operation data based on preset rules;
compare the reference response behavior data with the multi-modal response behavior data, and supplement the multi-modal response behavior data according to the comparison result;
and eliminate, from the multi-modal response behavior data, data that does not meet the behavior constraint conditions.
Optionally, the optimization module is further configured to:
acquire the business logic currently being executed by the intelligent robot;
and adjust the multi-modal response behavior data based on the business logic.
In a fourth aspect, an embodiment of the present invention provides an intelligent robot control apparatus applied to an intelligent robot including different types of sensors and a flexible screen serving as all or part of an outer casing of the intelligent robot, the apparatus including:
an acquisition module, configured to acquire sensing data through the different types of sensors and to generate touch operation data from a touch operation of a user detected by the flexible screen;
a sending module, configured to upload the sensing data and the touch operation data to a server;
a receiving module, configured to receive multi-modal response behavior data, issued by the server, corresponding to the sensing data and the touch operation data, wherein the multi-modal response behavior data comprises at least one of action data, voice output data, and display data, and the display data is displayed through the flexible screen;
and an output module, configured to perform response output based on the multi-modal response behavior data.
Optionally, the sensors include an image capturing device and/or a voice pickup device, and the sensing data includes image data captured by the image capturing device and/or voice reception data picked up by the voice pickup device;
the apparatus further comprises a preprocessing module, configured to:
perform semantic extraction on the image data; and/or,
convert the voice reception data into text data.
Optionally, the display data includes two-dimensional display data or three-dimensional display data.
Optionally, the intelligent robot comprises a single one-piece flexible screen or a plurality of flexible screens.
Optionally, when the intelligent robot includes a plurality of the flexible screens, the output module is configured to:
determine, based on the current position of each flexible screen, the sub-display data in the display data corresponding to each flexible screen;
and control each flexible screen to display its corresponding sub-display data.
Optionally, the apparatus further comprises a dividing module, configured to:
detect, through the flexible screen, a dividing operation by which the user divides a sub-display area within a main display area;
and in response to the dividing operation, divide the sub-display area within the main display area, and display different display data through the main display area and the sub-display area respectively.
In a fifth aspect, an embodiment of the present invention provides a server, including a processor and a memory, where the memory stores executable code, and when the executable code is executed by the processor, causes the processor to implement at least the intelligent robot control method in the first aspect.
In a sixth aspect, an embodiment of the present invention provides an intelligent robot, including a processor and a memory, where the memory stores executable code, and when the executable code is executed by the processor, causes the processor to implement at least the intelligent robot control method in the second aspect.
In a seventh aspect, embodiments of the present invention provide a non-transitory machine-readable storage medium having executable code stored thereon, which, when executed by a processor of a server, causes the processor to implement at least the intelligent robot control method in the first aspect.
In an eighth aspect, embodiments of the present invention provide a non-transitory machine-readable storage medium having executable code stored thereon, which, when executed by a processor of an intelligent robot, causes the processor to implement at least the intelligent robot control method in the second aspect.
By adopting the invention, sensing data acquired by different sensors can be obtained, and a touch operation of a user can be detected through the flexible screen to generate touch operation data. Using the sensing data and the touch operation data as multi-modal input data enriches the ways of interacting with the intelligent robot, so that interaction is not limited to a single mode, and the flexibility and diversity of interaction are improved. By processing the multi-modal input data, multi-modal response behavior data can be output correspondingly, and the intelligent robot can output multi-modal response behaviors based on the multi-modal response behavior data; for example, the intelligent robot can display the display data in the multi-modal response behavior data through the flexible screen. In this way, the forms in which the intelligent robot expresses its external output are enriched, and the flexibility of the intelligent robot's responses is improved.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise; the term "plurality" generally means at least two.
The words "if", as used herein, may be interpreted as "at" or "when" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrase "if determined" or "if detected (stated condition or event)" may be interpreted as "when determined" or "in response to determination" or "when detected (stated condition or event)" or "in response to detection (stated condition or event), depending on the context.
In addition, the sequence of steps in the method embodiments described below is only an example and is not strictly limited.
The embodiment of the invention provides an intelligent robot control method, which can be applied to an intelligent robot. Before the method flow is introduced in detail, the structure of the intelligent robot is briefly described, so that the method flow provided by the embodiment of the invention can be understood in combination with the structure of the intelligent robot.
Different types of sensors can be arranged in the intelligent robot, each used for collecting a different type of sensing data. The sensors may include, but are not limited to, environmental sensors, vision sensors, voice sensors, force sensors, and the like. The environmental sensors may include, for example, a temperature sensor, a humidity sensor, a collision sensor, a drop sensor, an inertial sensor, and the like. The vision sensors may include, for example, at least one set of depth vision sensors, single-line or multi-line lidar, infrared sensors, ultrasonic sensors, etc. The voice sensors may include, for example, a microphone or a microphone array for receiving voice data. The force sensors may include, for example, torque sensors on each joint of the intelligent robot. Through these sensors, the intelligent robot can perceive the surrounding environment.
The outer shell of the intelligent robot can be provided with a flexible screen, and the flexible screen can be used as all or part of the outer shell of the intelligent robot. Based on this, the flexible screen may fully or partially enclose the internal components of the intelligent robot.
It will be appreciated that the flexible screen can be bent, so that it can be formed into a shape that serves as an outer housing of the intelligent robot. In some alternative embodiments, a humanoid intelligent robot may be composed of a head (e.g., the whole face, the back of the head, the sides of the head, etc., where the face may include eyes), neck joints, arms (e.g., upper arm, forearm, elbow, etc.), humanoid hands (e.g., 2-5 digits, with 2-3 joints in each digit), a multi-degree-of-freedom waist, a single leg or two legs, knee joints, two feet, a wheeled chassis of various shapes, and so on. The outer casings of the above-listed components of the intelligent robot may be provided as flexible screens: a subset of the components may be selected and their outer casings provided as flexible screens, or the outer casings of all the components may be provided as flexible screens.
For convenience of maintenance, some sensors of the intelligent robot may be disposed at positions that are easily exposed to the outside, covered on the robot's body by a cover. The cover may be arranged flush with the housing or recessed into it, so that the sensor is exposed once the cover is opened. In some alternative embodiments, when the flexible screen replaces the original housing, the flexible screen may be configured to cover these otherwise externally mounted sensors, so that the flexible screen covers the exterior surface of the intelligent robot as completely as possible. In this way, hollowing out small areas of the flexible screen can be avoided and the integrity of the flexible screen maintained. The sensors originally mounted externally may include microphones (or microphone arrays), various vision cameras, obstacle-avoidance cameras, lidar, environmental sensors, inertial navigation units, torque sensors, collision sensors, drop sensors, and the like.
The flexible screen provided in the intelligent robot may serve as an input component or an output component. When used as an input component, the flexible screen can detect a touch operation by a user; when used as an output component, the flexible screen can display data. Notably, the flexible screen can detect the user's touch operations in place of tactile sensors. With tactile sensors, enabling the intelligent robot to sense a user's touch from multiple directions requires installing many sensors in many positions. Due to space and cost limitations, it is often impossible to install enough tactile sensors on the intelligent robot, so the user can only touch certain structural parts of it; if the user touches a structural part that has no tactile sensor, the intelligent robot cannot sense the touch. With the invention, the flexible screen can be laid over a large area of the intelligent robot's surface, replacing the original tactile sensors as the component that detects touch operations, which solves the problem of whole-body touch sensing well. As long as the user touches a structural part of the intelligent robot covered by the flexible screen, the intelligent robot can sense the touch.
Meanwhile, the flexible screen can receive a user's touch operations more flexibly, rather than receiving only a single kind of touch as a tactile sensor does. The touch operations the flexible screen can detect may include a single-point touch, a multi-point touch, a single point sliding over a distance, multiple points sliding over a distance, and the like.
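As a purely illustrative aid (not part of the claimed method), the following Python sketch shows one hypothetical way raw touch tracks from the flexible screen could be classified into the touch operation types listed above; the data structures, function names, and the sliding threshold are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical sketch: classifying raw touch tracks from the flexible screen
# into the touch-operation types named in the text. All thresholds and names
# are illustrative assumptions, not part of the disclosure.

@dataclass
class TouchTrack:
    points: list  # [(x, y), ...] sampled positions of one finger

def track_distance(track: TouchTrack) -> float:
    """Total path length of one finger's track."""
    pts = track.points
    return sum(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

def classify_touch(tracks: list, slide_threshold: float = 10.0) -> dict:
    """Turn simultaneous touch tracks into touch operation data."""
    sliding = any(track_distance(t) > slide_threshold for t in tracks)
    if len(tracks) == 1:
        kind = "single_point_slide" if sliding else "single_point_touch"
    else:
        kind = "multi_point_slide" if sliding else "multi_point_touch"
    return {"type": kind, "num_points": len(tracks),
            "tracks": [t.points for t in tracks]}

# Example: one finger sliding across the screen.
print(classify_touch([TouchTrack(points=[(0, 0), (5, 5), (20, 20)])]))
```

A classifier of this kind would then package its result as the touch operation data that the robot uploads to the server.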
Fig. 1 is a flowchart of an intelligent robot control method according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
101. Sensing data are collected through the different types of sensors, and touch operation data are generated from a touch operation of a user detected by the flexible screen.
102. And uploading the sensing data and the touch operation data to a server.
103. And receiving multi-modal response behavior data corresponding to the sensing data and the touch operation data, wherein the multi-modal response behavior data comprises at least one of action data, voice output data, and display data, and the display data is displayed through the flexible screen.
104. And performing response output based on the multi-modal response behavior data.
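For orientation, the following Python sketch traces steps 101-104 on the robot side under assumed interfaces: the server URL, the JSON message format, and the sensor/actuator helper objects are all hypothetical, since the disclosure does not specify a transport or data format.

```python
import json
import urllib.request

# Schematic sketch of steps 101-104 on the robot side. The server URL,
# message format, and helper objects are illustrative assumptions.

SERVER_URL = "http://example-server/robot/interact"  # hypothetical endpoint

def interact_once(sensors, flexible_screen, actuators, speaker):
    # 101: collect sensing data and touch operation data.
    sensing_data = {name: s.read() for name, s in sensors.items()}
    touch_data = flexible_screen.read_touch_operations()

    # 102: upload both kinds of multi-modal input data to the server.
    payload = json.dumps({"sensing": sensing_data, "touch": touch_data}).encode()
    req = urllib.request.Request(SERVER_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        # 103: receive the multi-modal response behavior data.
        behavior = json.load(resp)

    # 104: perform response output on each modality that is present.
    if "action" in behavior:
        actuators.execute(behavior["action"])
    if "voice" in behavior:
        speaker.play(behavior["voice"])
    if "display" in behavior:
        flexible_screen.show(behavior["display"])
```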
In practical applications, different types of sensing data can be acquired through the different types of sensors. For example, when the sensor is a voice pickup device (a microphone or the like), voice reception data can be collected; when the sensor is an image capturing device (at least one set of depth vision sensors, etc.), image data can be captured. Meanwhile, a touch operation of the user can be detected through the flexible screen, and touch operation data can be generated based on the touch operation. The sensing data and the touch operation data may then be used as multi-modal input data. It is understood that "multi-modal" here may be interpreted as covering different dimensions and different types of data.
It should be noted that, in practical applications, not all sensors collect corresponding sensing data at the same time; if a sensor does not collect valid sensing data, its data is simply not used as part of the multi-modal input data. For example, if a user does not speak to the intelligent robot but merely waves a hand at it, the voice pickup device in the intelligent robot will not collect valid voice reception data, and voice reception data is then not one of the multi-modal inputs. In general, which dimensions of data the multi-modal input includes is determined by the actual usage in the application scenario.
After the intelligent robot collects sensing data and touch operation data through the sensors and flexible screen installed on it, the data can be uploaded to a server and processed there. In an actual application scenario, assume that a user provides multi-modal input data to the intelligent robot through his or her behavior, and the intelligent robot needs to respond accordingly. Which responses the intelligent robot should make is derived from an analysis of the user's behavior; this analysis logic may reside in the server, which determines the responses the intelligent robot should output to the user, represented as multi-modal response behavior data. How the server produces the multi-modal response behavior data from the multi-modal input data is described in detail in the next embodiment and is not discussed here.
The intelligent robot can receive the multi-modal response behavior data issued by the server and perform response output based on it. The multi-modal response behavior data may consist of one or more of action data, voice output data, and display data. The responses the intelligent robot can output are therefore diversified and not limited to results of a single dimension, which enriches the intelligent robot's forms of response output and improves their diversity. The display data may be displayed through the flexible screen.
For example, assume that during a conversation with a user, the intelligent robot wants to express to the user that its inner mood is surging. The intelligent robot can broadcast the voice output data "my mood is surging" by voice, and synchronously play, on the flexible screen, a video of rolling sea waves from the display data, so as to vividly convey what it wants to express to the user.
Optionally, the display data may include two-dimensional display data or three-dimensional display data. If three-dimensional display data is displayed through the flexible screen, the user can see a stereoscopic image from various angles around the intelligent robot.
In some alternative embodiments, the intelligent robot may be controlled to broadcast voice and display the display data synchronously or asynchronously, where the broadcast voice output data may or may not be related to the specific content of the display data.
In some alternative embodiments, the intelligent robot can be controlled to display the display data through the flexible screen while performing a time-sequenced series of actions; the actions performed by the intelligent robot may also be regarded as synchronized with the displayed data. Alternatively, the intelligent robot can be controlled to perform a time-sequenced series of actions, broadcast voice output data, and display the display data at the same time. The display data may or may not be related to the broadcast voice output data or the performed action data.
After the intelligent robot collects the sensing data, it can either upload the data directly to the server for processing there, or first preprocess the data locally and upload the preprocessed sensing data to the server; that is, the preprocessing may be performed either on the robot or in the server. Optionally, the sensors provided in the intelligent robot may include an image capturing device and/or a voice pickup device, and the sensing data includes image data captured by the image capturing device and/or voice reception data picked up by the voice pickup device. Accordingly, after collecting the sensing data through the different types of sensors, the intelligent robot may also perform semantic extraction on the image data and/or convert the voice reception data into text data.
Since the preprocessing may also be performed in the server, it is described later in the method flow executed by the server and is not expanded on here.
Optionally, the flexible screen arranged on the intelligent robot can be a single one-piece flexible screen or a plurality of spliced flexible screens. For example, a large area of the chest of the intelligent robot may be provided as a one-piece flexible screen. Alternatively, individual parts of the intelligent robot can each be provided as a flexible screen; for example, the left arm, the right arm, the chest, and the two legs can each be provided as a flexible screen, and these flexible screens can be spliced together for use. Of course, when the intelligent robot is provided with multiple flexible screens, the flexible screens can be spliced together and used as a whole, or used independently. The image frames displayed by different flexible screens can be the same or different, or some of the flexible screens can be set not to display any image at all. In addition, it should be noted that, for a flexible screen, the image displayed on it may be stretched, and stretching operations include, but are not limited to, unidirectional, bidirectional, and multidirectional stretching.
When a plurality of flexible screens are spliced together for display, optionally, the process of performing response output based on the multi-modal response behavior data can be implemented by determining, based on the current position of each flexible screen, the sub-display data in the display data corresponding to that screen, and controlling each flexible screen to display its corresponding sub-display data.
The process in which a plurality of flexible screens are spliced together for display is introduced with the example shown in fig. 2. As shown in fig. 2, the left and right arms, the chest, and the two legs of the intelligent robot are each provided as flexible screens. When the intelligent robot's arms hang down and it stands upright, the flexible screens at the corresponding positions of the arms, chest, and legs can be spliced into a polygon A. The bounding rectangle A' (or bounding box) of polygon A may be determined, and the display data to be displayed fills the bounding rectangle A'. Because some areas of the bounding rectangle A' are covered by flexible screens and some are not, the corresponding sub-display data in the display data can be displayed normally in the covered areas, while the sub-display data corresponding to the uncovered areas is simply not displayed. The flexible screens at the corresponding positions of the intelligent robot's arms, chest, and legs thus each display their corresponding sub-display data.
Because the intelligent robot may also perform actions while displaying, when it moves, the positions of the different flexible screens shift along with the components on which they are mounted. When a flexible screen shifts, the above operation can be repeated: the sub-display data corresponding to each flexible screen is re-determined based on its new position, and each flexible screen is controlled to display its corresponding sub-display data, as sketched below.
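The following Python sketch illustrates this mapping under a simplifying assumption: each flexible screen's current position is approximated as an axis-aligned rectangle, whereas in practice the regions would be arbitrary polygons derived from the robot's joint positions. All names are illustrative.

```python
# Illustrative sketch of the sub-display mapping described above: compute the
# bounding rectangle of all flexible-screen regions, fill it with the display
# frame, and give each screen the crop that falls under its own region.

def bounding_rect(rects):
    """Bounding rectangle (x, y, w, h) of a list of screen rects."""
    xs = [x for x, y, w, h in rects] + [x + w for x, y, w, h in rects]
    ys = [y for x, y, w, h in rects] + [y + h for x, y, w, h in rects]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)

def split_display_data(frame, screen_rects):
    """frame: 2-D list of pixels already stretched to the bounding rect size.
    Returns one cropped sub-frame per flexible screen."""
    bx, by, bw, bh = bounding_rect(screen_rects)
    subs = []
    for (x, y, w, h) in screen_rects:
        ox, oy = x - bx, y - by  # screen position inside the bounding rect
        subs.append([row[ox:ox + w] for row in frame[oy:oy + h]])
    return subs

# Re-run split_display_data whenever the robot moves and the screen
# positions shift, as described in the text.
```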
For example, as shown in fig. 3, the intelligent robot changes from a posture in which the arms hang down while standing upright to a posture in which the arms are opened horizontally to the two sides, so that the flexible screens at the corresponding positions of the arms, chest, and legs form a T shape. At this time, these flexible screens can be spliced into a polygon B. The bounding rectangle B' (or bounding box) of polygon B may be determined and filled with the display data to be displayed. Since the area of the bounding rectangle B' is smaller than that of the bounding rectangle A', in order for the display data to still fill B', a stretching operation is performed on the display data in A', so that the stretched display data fills B'.
Because some areas of the bounding rectangle B' are covered by flexible screens and some are not, the corresponding sub-display data in the display data can be displayed normally in the covered areas, while the sub-display data corresponding to the uncovered areas is not displayed.
It can be seen that in this posture, a large portion of the bounding rectangle B' is not covered by any flexible screen, so some information in the display data cannot be displayed. If the undisplayed information is important, the intelligent robot can be controlled to move appropriately, so that the flexible screens cover most or all of the bounding rectangle B' in a time-shared manner, thereby displaying more information. For example, as shown in fig. 4, the intelligent robot's two arms can be moved up and down so that the area they sweep encloses a sector. When the arms sweep past a given position, the sub-display data corresponding to that position is displayed, achieving a time-shared display effect. When the arms move fast enough that the human eye can hardly distinguish their positions, the user visually perceives all the content in the sector area at the same time, realizing a dynamic display effect.
The flexible screen provided by the embodiment of the invention can also support a "picture-in-picture" display effect. Optionally, a dividing operation by which the user divides a sub-display area within a main display area is detected through the flexible screen; in response to the dividing operation, the sub-display area is divided within the main display area, and different display data are displayed through the main display area and the sub-display area respectively.
In practical applications, the flexible screen may be divided into a main display area and a sub-display area to implement "picture-in-picture". The position, size, and shape of the sub-display area may be specified by the user through a touch operation; the touch operations for setting the sub-display area may include, but are not limited to, multi-point touches, multi-line strokes, and drawing an irregular figure. A multi-point operation means that, by touching several points on the flexible screen, a polygon is uniquely defined by the positions of those points. For example, the user may touch two points on the flexible screen serving as two opposite corners of a rectangle, so that a rectangle can be uniquely determined based on the two points, as in the sketch below. A multi-line operation means drawing several lines on the flexible screen that together enclose a polygon. Drawing an irregular figure means that the user freely draws a closed area of arbitrary shape.
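A minimal Python sketch of the two-point case follows; the coordinate conventions and function names are assumptions made for illustration only.

```python
# Minimal sketch of the two-point "picture-in-picture" division described
# above: two touched points are treated as opposite corners of a rectangle
# that becomes the sub-display area. Names are illustrative assumptions.

def rect_from_two_points(p1, p2):
    """Unique axis-aligned rectangle with p1 and p2 as diagonal corners."""
    (x1, y1), (x2, y2) = p1, p2
    return (min(x1, x2), min(y1, y2), abs(x2 - x1), abs(y2 - y1))

def divide_main_area(main_area, touch_points):
    """Return (main_area, sub_area) after a two-point dividing operation."""
    if len(touch_points) != 2:
        raise ValueError("this sketch only handles the two-point case")
    sub_area = rect_from_two_points(*touch_points)
    return main_area, sub_area

main, sub = divide_main_area((0, 0, 1080, 1920), [(100, 200), (500, 600)])
print(sub)  # (100, 200, 400, 400): the user-specified sub-display area
```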
In addition, it should be noted that "picture-in-picture" may be set within a single flexible screen, or within the whole display formed after a plurality of flexible screens are spliced together.
The embodiment of the invention further provides an intelligent robot control method, which can be applied to a server. Fig. 5 is a flowchart of an intelligent robot control method according to an embodiment of the present invention; as shown in fig. 5, the method includes the following steps:
501. And receiving sensing data and touch operation data uploaded by the intelligent robot, wherein the sensing data are acquired by different types of sensors arranged in the intelligent robot, the touch operation data are generated by detecting touch operation of a user through a flexible screen arranged in the intelligent robot, and the flexible screen forms all or part of an outer shell of the intelligent robot.
502. And determining multi-modal response behavior data corresponding to the intelligent robot based on the sensing data and the touch operation data, wherein the multi-modal response behavior data comprises at least one of action data, voice output data, and display data, and the display data is displayed through the flexible screen.
503. And transmitting the multi-modal response behavior data to the intelligent robot, so that the intelligent robot performs response output based on the multi-modal response behavior data.
The intelligent robot can collect sensing data through the different types of sensors and, at the same time, detect a touch operation of a user through the flexible screen to generate touch operation data. The intelligent robot may upload the sensing data and the touch operation data to the server, and the server receives them. The specific process by which the intelligent robot acquires the sensing data and the touch operation data is described in detail in the previous embodiment and is not repeated here.
The server can correspondingly output multi-modal response behavior data based on the sensing data and touch operation data uploaded by the intelligent robot. Optionally, determining the multi-modal response behavior data may be implemented by acquiring the preprocessed sensing data, and inputting the preprocessed sensing data and the touch operation data into a pre-trained neural network model to obtain the multi-modal response behavior data corresponding to the intelligent robot.
The preprocessing of the sensing data may be performed in the intelligent robot or locally in the server; the embodiment of the invention does not limit this.
Optionally, the sensing data may include image data captured by an image capturing device and/or voice reception data picked up by a voice pickup device. Accordingly, the preprocessing of the sensing data may be implemented as semantic extraction on the image data and/or conversion of the voice reception data into text data.
The image data contains a great deal of information; some of it helps in understanding the environment around the intelligent robot, while some of it interferes with that understanding. Therefore, semantic extraction can be performed on the image data to extract the important information of interest. Moreover, it is more effective for the neural network model to use the semantic extraction result of the image data: if raw image data were input directly into the neural network model, the model might not be able to extract its useful features efficiently. Similarly, it is more effective for the neural network model to use text data converted from the voice reception data. Therefore, the image data and the voice reception data may be preprocessed before being input into the neural network model, so that the model can use the input information more efficiently.
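A hedged sketch of this preprocessing step is given below; the `semantic_extractor` and `speech_recognizer` callables are placeholders, since the disclosure does not name any specific model or library.

```python
# Hedged sketch of the preprocessing step: turning raw image data into a
# semantic summary and raw voice reception data into text before they reach
# the fusion model. The extractor/recognizer calls are placeholders.

def preprocess(sensing_data, semantic_extractor, speech_recognizer):
    """Return model-ready inputs from raw sensing data."""
    prepared = {}
    if "image" in sensing_data:
        # e.g. "an adult waving a hand" rather than raw pixels
        prepared["image_semantics"] = semantic_extractor(sensing_data["image"])
    if "voice" in sensing_data:
        # speech-to-text so the model consumes tokens, not waveforms
        prepared["text"] = speech_recognizer(sensing_data["voice"])
    return prepared
```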
After the sensing data are preprocessed, a sequence of high-dimensional vectors can be obtained. This sequence of high-dimensional vectors may then be input into the neural network model, which outputs the multi-modal response behavior data the intelligent robot needs in order to respond to its environment.
The neural network model can be constructed using a simple-operation fusion method, an attention-based fusion method, a bilinear-pooling-based fusion method, or the like.
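As one concrete illustration of the attention-based option, the sketch below uses PyTorch's `nn.MultiheadAttention` to fuse the preprocessed token sequence and emit per-modality outputs. The architecture, dimensions, and output heads are assumptions; the disclosure only states that a pre-trained neural network maps the fused inputs to multi-modal response behavior data.

```python
import torch
import torch.nn as nn

# Minimal sketch of an attention-based fusion model of the kind named above.
# Sizes and output heads are illustrative assumptions.

class AttentionFusion(nn.Module):
    def __init__(self, dim=256, num_heads=4, num_display_ids=1000):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.action_head = nn.Linear(dim, 32)                 # action parameters
        self.voice_head = nn.Linear(dim, 128)                 # voice output embedding
        self.display_head = nn.Linear(dim, num_display_ids)   # display-data id logits

    def forward(self, modality_tokens):
        # modality_tokens: (batch, seq, dim) -- the "sequence of high-dimensional
        # vectors" from preprocessing (image semantics, text, touch features).
        fused, _ = self.attn(modality_tokens, modality_tokens, modality_tokens)
        pooled = fused.mean(dim=1)  # one fused vector per example
        return {"action": self.action_head(pooled),
                "voice": self.voice_head(pooled),
                "display_id": self.display_head(pooled).argmax(dim=-1)}

# Example with random inputs standing in for preprocessed multi-modal data.
model = AttentionFusion()
out = model(torch.randn(1, 10, 256))
print(out["display_id"])  # identifier later resolved in the multimedia library
```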
The output of the neural network model may be an identifier of the display data. The server may further be provided with a multimedia resource library; based on the identifier of the display data, the corresponding display data can be looked up in the multimedia resource library and sent to the intelligent robot. The multimedia resource library can store different types of data such as voice, text, images, audio, and video.
For example, the scene in which the intelligent robot expresses a surging inner mood can be realized by playing a video of rolling sea waves on the flexible screen. First, the neural network model outputs the identifier of the rolling-sea-waves video; the video is then found in the multimedia resource library based on that identifier and sent to the intelligent robot.
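A minimal sketch of this identifier-to-resource lookup follows; the library contents and identifier scheme are illustrative assumptions.

```python
# Hedged sketch of the multimedia-library lookup described above: the model
# outputs only an identifier, and the server resolves it to actual display
# data before sending it to the robot. Contents are illustrative.

MULTIMEDIA_LIBRARY = {
    # identifier -> stored resource (voice, text, image, audio, or video)
    "video_rolling_waves": {"type": "video", "path": "/media/waves.mp4"},
    "image_smile": {"type": "image", "path": "/media/smile.png"},
}

def resolve_display_data(display_id: str) -> dict:
    """Look up the display data for an identifier output by the model."""
    resource = MULTIMEDIA_LIBRARY.get(display_id)
    if resource is None:
        raise KeyError(f"no resource for identifier {display_id!r}")
    return resource

# e.g. the "mood is surging" scene: the model emits the video's identifier,
# and the server ships the resolved resource to the robot's flexible screen.
print(resolve_display_data("video_rolling_waves"))
```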
Optionally, in order for the intelligent robot to respond to its environment more appropriately, after the multi-modal response behavior data corresponding to the intelligent robot are obtained, reference response behavior data and behavior constraint conditions corresponding to the preprocessed sensing data and the touch operation data can be determined based on preset rules; the reference response behavior data are compared with the multi-modal response behavior data, the multi-modal response behavior data are supplemented according to the comparison result, and data that do not meet the behavior constraint conditions are eliminated from the multi-modal response behavior data.
In practical applications, the reference response behavior data and the behavior constraint conditions corresponding to the preprocessed sensing data and touch operation data can be determined based on a knowledge base, preset rules, and the like. The reference response behavior data serve as a supplement to the multi-modal response behavior data, and the behavior constraint conditions serve as constraints on behavior.
For example, the intelligent robot may greet a child after seeing one. Generally, if the intelligent robot faces an adult, it simply greets the adult normally. If it faces a child, it may bend over before greeting, so as to keep its line of sight as level as possible with the child's, which is friendlier. The bending action is an item of the reference response behavior data that supplements the multi-modal response behavior data.
For another example, if the intelligent robot faces an adult after greeting, the business logic it executes next may be to recommend a business product, such as a financial product, to the adult. But if it faces a child, it may not execute that business logic next; it may instead execute other logic, such as asking the child what cartoon he or she wants to watch. Here, not recommending business products to children is a constraint obtained on the multi-modal response behavior data; a sketch of this supplement-and-constrain step follows.
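The sketch below illustrates the supplement-and-constrain step with rules mirroring the child/adult examples above; the rule encoding and all names are assumptions.

```python
# Illustrative sketch of the rule-based post-processing described above:
# preset rules yield reference behaviors (used to supplement the model output)
# and constraints (used to eliminate non-conforming items).

def preset_rules(user_profile):
    reference, constraints = [], []
    if user_profile.get("is_child"):
        reference.append({"modality": "action", "name": "bend_over"})
        constraints.append(lambda item: item.get("name") != "recommend_product")
    return reference, constraints

def refine_behavior(model_output, user_profile):
    reference, constraints = preset_rules(user_profile)
    # Supplement: add reference behaviors the model did not already produce.
    names = {item.get("name") for item in model_output}
    refined = model_output + [r for r in reference if r["name"] not in names]
    # Constrain: eliminate items that violate any behavior constraint.
    return [item for item in refined if all(c(item) for c in constraints)]

behaviors = [{"modality": "voice", "name": "greet"},
             {"modality": "voice", "name": "recommend_product"}]
print(refine_behavior(behaviors, {"is_child": True}))
# -> greeting kept, product recommendation removed, bend_over added
```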
Optionally, in order for the intelligent robot to respond to its environment more appropriately, after the multi-modal response behavior data corresponding to the intelligent robot are obtained, the business logic currently being executed by the intelligent robot can be acquired, and the multi-modal response behavior data can be adjusted based on the business logic.
For example, assume that the intelligent robot is currently executing business logic for pouring coffee for user A, and that this business logic has a high priority. While executing it, assume the intelligent robot encounters user B on its way to the table where the coffee is poured. The intelligent robot can briefly pause the coffee-pouring task, greet user B first, and then continue the task of pouring coffee for user A. If the intelligent robot currently has no primary business logic to execute, for example, it is not performing the coffee-pouring task, then when it encounters user B it can, besides greeting user B, also ask whether user B needs help. It follows that the business logic currently being executed by the intelligent robot also affects the final response behavior.
In addition to being adjustable according to the business logic, the multi-modal response behavior data may also be adjusted according to control instructions given by background personnel. Notably, the knowledge base, the preset rules, the business logic, and the control instructions given by background personnel can form nested control logic that finally adjusts the multi-modal response behavior data, as sketched below.
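The following sketch illustrates one possible nesting of these adjustment layers, following the coffee-pouring example; task names, the priority scheme, and the operator-override convention are assumptions.

```python
# Minimal sketch of adjusting the response behavior according to the business
# logic currently being executed, following the coffee-pouring example above.

def adjust_for_business_logic(behavior, current_task, operator_override=None):
    """Wrap the planned behavior around the task the robot is executing."""
    if operator_override is not None:
        return operator_override          # background personnel win outright
    if current_task and current_task["priority"] == "high":
        # Pause the task, perform the brief response, then resume the task.
        return [{"name": "pause_task", "task": current_task["name"]},
                *behavior,
                {"name": "resume_task", "task": current_task["name"]}]
    # No primary task: free to extend the response, e.g. offer help.
    return [*behavior, {"name": "ask", "text": "Do you need any help?"}]

greeting = [{"name": "greet", "target": "user_B"}]
print(adjust_for_business_logic(greeting, {"name": "pour_coffee_for_A",
                                           "priority": "high"}))
```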
By adopting the invention, sensing data acquired by different sensors can be obtained, and a touch operation of a user can be detected through the flexible screen to generate touch operation data. Using the sensing data and the touch operation data as multi-modal input data enriches the ways of interacting with the intelligent robot, so that interaction is not limited to a single mode, and the flexibility and diversity of interaction are improved. By processing the multi-modal input data, multi-modal response behavior data can be output correspondingly, and the intelligent robot can output multi-modal response behaviors based on the multi-modal response behavior data; for example, the intelligent robot can display the display data in the multi-modal response behavior data through the flexible screen. In this way, the forms in which the intelligent robot expresses its external output are enriched, and the flexibility of the intelligent robot's responses is improved.
An intelligent robot control apparatus according to one or more embodiments of the present invention will be described in detail below. Those skilled in the art will appreciate that these intelligent robot control apparatuses can be constructed from commercially available hardware components configured through the steps taught by the present solution.
Fig. 6 is a schematic structural diagram of an intelligent robot control device according to an embodiment of the present invention, where the device is applied to a server, as shown in fig. 6, and the device includes:
The receiving module 61 is configured to receive sensing data and touch operation data uploaded by an intelligent robot, where the sensing data are acquired by different types of sensors arranged in the intelligent robot, the touch operation data are generated by detecting a touch operation of a user through a flexible screen arranged in the intelligent robot, and the flexible screen forms all or part of an outer shell of the intelligent robot;
the determining module 62 is configured to determine, based on the sensing data and the touch operation data, multi-modal response behavior data corresponding to the intelligent robot, where the multi-modal response behavior data comprises at least one of action data, voice output data, and display data, and the display data is displayed through the flexible screen;
and the sending module 63 is configured to send the multi-modal response behavior data to the intelligent robot, so that the intelligent robot performs response output based on the multi-modal response behavior data.
Optionally, the determining module 62 is configured to:
preprocess the sensing data;
and input the preprocessed sensing data and the touch operation data into a pre-trained neural network model to obtain the multi-modal response behavior data corresponding to the intelligent robot.
Optionally, the sensing data includes image data captured by an image capturing device and/or voice reception data picked up by a voice pickup device;
The determining module 62 is further configured to:
perform semantic extraction on the image data; and/or,
convert the voice reception data into text data.
Optionally, the apparatus further comprises an optimization module, configured to:
determine reference response behavior data and behavior constraint conditions corresponding to the preprocessed sensing data and the touch operation data based on preset rules;
compare the reference response behavior data with the multi-modal response behavior data, and supplement the multi-modal response behavior data according to the comparison result;
and eliminate, from the multi-modal response behavior data, data that does not meet the behavior constraint conditions.
Optionally, the optimization module is further configured to:
acquire the business logic currently being executed by the intelligent robot;
and adjust the multi-modal response behavior data based on the business logic.
Fig. 7 is a schematic structural diagram of an intelligent robot control apparatus according to an embodiment of the present invention; the apparatus is applied to an intelligent robot that includes different types of sensors and a flexible screen serving as all or part of the outer casing of the intelligent robot. As shown in fig. 7, the apparatus includes:
The acquisition module 71 is configured to acquire sensing data through the different types of sensors and to generate touch operation data from a touch operation of a user detected by the flexible screen;
A sending module 72, configured to upload the sensing data and the touch operation data to a server;
the receiving module 73 is configured to receive multi-modal response behavior data, issued by the server, corresponding to the sensing data and the touch operation data, where the multi-modal response behavior data comprises at least one of action data, voice output data, and display data, and the display data is displayed through the flexible screen;
and the output module 74 is configured to perform response output based on the multi-modal response behavior data.
Optionally, the sensors include an image capturing device and/or a voice pickup device, and the sensing data includes image data captured by the image capturing device and/or voice reception data picked up by the voice pickup device;
the apparatus further comprises a preprocessing module, configured to:
perform semantic extraction on the image data; and/or,
convert the voice reception data into text data.
Optionally, the display data includes two-dimensional display data or three-dimensional display data.
Optionally, the intelligent robot comprises a single one-piece flexible screen or a plurality of flexible screens.
Optionally, when the intelligent robot includes a plurality of the flexible screens, the output module 74 is configured to:
determine, based on the current position of each flexible screen, the sub-display data in the display data corresponding to each flexible screen;
and control each flexible screen to display its corresponding sub-display data.
Optionally, the apparatus further comprises a dividing module, configured to:
detect, through the flexible screen, a dividing operation by which the user divides a sub-display area within a main display area;
and in response to the dividing operation, divide the sub-display area within the main display area, and display different display data through the main display area and the sub-display area respectively.
The apparatuses shown in fig. 6 and fig. 7 can respectively execute the intelligent robot control methods provided in the embodiment shown in fig. 5 and in the embodiments shown in fig. 1 to fig. 4; for the detailed execution processes and technical effects, reference is made to the descriptions in the foregoing embodiments, which are not repeated here.
In one possible design, the structure of the intelligent robot control apparatus shown in fig. 6 may be implemented as a server, as shown in fig. 8, which may include a processor 91 and a memory 92. The memory 92 stores executable code which, when executed by the processor 91, causes the processor 91 to implement at least the intelligent robot control method provided in the embodiment shown in fig. 5.
Optionally, a communication interface 93 may be included in the server for communicating with other devices.
In one possible design, the structure of the intelligent robot control apparatus shown in fig. 7 may be implemented as an intelligent robot, as shown in fig. 9, which may include a processor 91' and a memory 92'. The memory 92' stores executable code which, when executed by the processor 91', causes the processor 91' to implement at least the intelligent robot control method provided in the embodiments shown in fig. 1 to fig. 4.
Optionally, a communication interface 93' may also be included in the intelligent robot for communicating with other devices.
In addition, embodiments of the present invention provide a non-transitory machine-readable storage medium having executable code stored thereon, which, when executed by a processor of an intelligent robot, causes the processor to implement at least the intelligent robot control method provided in the embodiments shown in fig. 1 to fig. 4.
Embodiments of the present invention provide a non-transitory machine-readable storage medium having executable code stored thereon, which, when executed by a processor of a server, causes the processor to implement at least the intelligent robot control method provided in the embodiment shown in fig. 5.
The apparatus embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of a necessary general-purpose hardware platform, or by a combination of hardware and software. Based on such understanding, the above technical solutions, in essence or in the parts contributing to the prior art, may be embodied in the form of a computer program product, which may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The intelligent robot control method provided by the embodiments of the present invention may be executed by a program/software, which may be provided by a network side. The server mentioned in the foregoing embodiments may download the program/software to a local non-volatile storage medium; when the intelligent robot control method needs to be executed, the program/software is read into memory by a CPU, which then executes it to implement the method provided in the foregoing embodiments. For the execution process, reference is made to the schematic diagrams in fig. 1 to 4.
It should be noted that the above-mentioned embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.