CN117021114A - Robot control method and device - Google Patents

Robot control method and device

Info

Publication number
CN117021114A
CN117021114A
Authority
CN
China
Prior art keywords
text
instruction
robot
environment map
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311246211.9A
Other languages
Chinese (zh)
Inventor
吴若溪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Opper Communication Co ltd
Original Assignee
Beijing Opper Communication Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Opper Communication Co ltd
Priority to CN202311246211.9A
Publication of CN117021114A
Legal status: Pending

Abstract

The application discloses a robot control method and device. When a text to be processed input by a user is acquired, the text to be processed is analyzed to obtain at least one instruction text; a simulation environment map corresponding to the real physical environment in a preset area is acquired; based on the instruction texts and the simulation environment map, the task instructions that the robot needs to perform to complete each instruction text on the simulation environment map are determined, obtaining each task instruction corresponding to each instruction text; and the robot is controlled to execute the task instructions in sequence. The application can reduce the complexity of converting a text to be processed in a high-level language into task instructions, thereby improving the accuracy of robot control.

Description

Robot control method and device
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a control method and device of a robot.
Background
With the application of intelligent robots in indoor home scenes, the intelligence requirements on such robots are ever higher, for example intelligent navigation, intelligent companionship, intelligent service and the like. However, existing robots are mainly driven by machine task instructions, while most user inputs are high-level language texts; the instructions in high-level language texts are complex, so the accuracy of robot control is low.
That is, the accuracy of robot control in the prior art is low.
Disclosure of Invention
The embodiment of the application provides a control method and a control device for a robot, which can improve the accuracy of robot control.
In a first aspect, the present application provides a method for controlling a robot, including:
under the condition that a text to be processed input by a user is obtained, the text to be processed is analyzed to obtain at least one instruction text;
acquiring a simulation environment map corresponding to a real physical environment in a preset area;
determining task instructions required to be carried out by the robot to complete the instruction text on the simulation environment map based on the instruction text and the simulation environment map, and obtaining each task instruction corresponding to each instruction text;
and controlling the robot to execute each task instruction sequentially.
In a second aspect, the present application provides a control device for a robot, including:
the analysis module is used for analyzing the text to be processed to obtain at least one instruction text under the condition that the text to be processed input by the user is acquired;
the acquisition module is used for acquiring a simulation environment map corresponding to the real physical environment in the preset area;
The determining module is used for determining task instructions which are required to be carried out by the robot when the instruction text is completed on the simulation environment map based on the instruction text and the simulation environment map, and obtaining the task instructions corresponding to the instruction text;
and the control module is used for controlling the robot to sequentially execute the task instructions.
In a third aspect, the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to run the computer program in the memory, to implement steps in the control method of a robot provided by the present application.
In a fourth aspect, the present application provides a computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor, implementing the steps in the method for controlling a robot provided by the present application.
In a fifth aspect, the present application provides a computer program product comprising a computer program or instructions which, when executed by a processor, implement the steps in the control method of a robot provided by the present application.
In the application, compared with the related art, under the condition that the text to be processed input by the user is obtained, the text to be processed is analyzed to obtain at least one instruction text; acquiring a simulation environment map corresponding to a real physical environment in a preset area; determining task instructions required to be carried out by the robot when the robot completes the instruction text on the simulation environment map based on the instruction text and the simulation environment map, and obtaining each task instruction corresponding to each instruction text; and controlling the robot to sequentially execute each task instruction. According to the application, the text to be processed is firstly decomposed into at least one instruction text, and then the instruction text is mapped into the task instruction which can be executed by the robot on the simulation environment map by combining with the simulation environment map, so that the text to be processed in the high-level language input by the user can be decomposed into the instruction text, and then mapped into the task instruction which can be directly executed by the robot, the complexity of converting the text to be processed in the high-level language into the task instruction can be reduced, and the accuracy of controlling the robot is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of a control system of a robot according to an embodiment of the present application;
FIG. 2 is a flow chart of an embodiment of a method for controlling a robot according to an embodiment of the present application;
FIG. 3 is a schematic view of a plurality of simulation environments in one embodiment of a control method of a robot according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a simulation environment changing before and after a robot executes a task instruction in an embodiment of a control method of a robot according to an embodiment of the present application;
FIG. 5 is a schematic diagram of generating an execution video in one embodiment of a control method of a robot according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a preset training set in one embodiment of a control method for a robot according to an embodiment of the present application;
FIG. 7 is a schematic diagram of adding a bypass module and a fusion module to a preset large-scale language model in an embodiment of a control method of a robot according to an embodiment of the present application;
FIG. 8 is a schematic diagram of training a preset text parsing model in one embodiment of a control method of a robot according to an embodiment of the present application;
fig. 9 is a schematic flow chart of another embodiment of a control method of a robot according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a prompt template in another embodiment of a control method of a robot according to an embodiment of the present application;
FIG. 11 is a schematic view of a control device of a robot according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
It should be noted that the principles of the present application are illustrated as implemented in a suitable computing environment. The following description is based on illustrative embodiments of the application and should not be taken as limiting other embodiments of the application not described in detail herein.
In the following description of the present application reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or a different subset of all possible embodiments and can be combined with each other without conflict.
In the following description of the present application, the terms "first", "second" and "third" are merely used to distinguish similar objects from each other and do not represent a particular ordering of the objects; it should be understood that "first", "second" and "third" may be interchanged in a particular order or sequence, where permitted, so that the embodiments of the application described herein can be practiced in an order other than that illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
In order to be able to improve the accuracy of robot control, embodiments of the present application provide a control method of a robot, a control device of a robot, an electronic apparatus, a computer-readable storage medium, and a computer program product. The control method of the robot may be executed by a control device of the robot or by an electronic device integrated with the control device of the robot.
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
Referring to fig. 1, the present application further provides a control system of a robot, as shown in fig. 1, where the control system of a robot includes an electronic device 100 and a robot 300 connected to the electronic device 100, and the control device of a robot provided by the present application is integrated in the electronic device 100.
The electronic device 100 may be any device with a processor and processing capability, such as a mobile electronic device with a processor, for example a smart phone, a tablet computer, a palm computer, a notebook computer or a smart speaker, or a stationary electronic device with a processor, for example a desktop computer, a television, a server or an industrial device.
In addition, the control system of the robot may further include a memory 200 for storing raw data, intermediate data, and result data.
In the embodiment of the application, the memory may be a cloud memory. Cloud storage is a concept extended and developed from cloud computing: a distributed cloud storage system (hereinafter referred to as a storage system) integrates a large number of storage devices of different types in a network (storage devices are also referred to as storage nodes) through application software or application interfaces, using functions such as cluster application, grid technology and distributed storage file systems, to jointly provide data storage and service access functions.
At present, the storage method of the storage system is as follows: when creating a logical volume, each logical volume is allocated physical storage space, which may be composed of the disks of one or of several storage devices. A client stores data on a certain logical volume, that is, the data is stored on a file system. The file system divides the data into multiple parts, each part being an object; an object contains not only the data itself but also additional information such as a data identification (ID). The file system writes each object into the physical storage space of the logical volume and records the storage location information of each object, so that when the client requests access to the data, the file system can let the client access the data according to the storage location information of each object.
The process by which the storage system allocates physical storage space for a logical volume is specifically as follows: physical storage space is divided into stripes in advance according to the estimated capacity of the objects to be stored on the logical volume (an estimate that tends to have a large margin over the capacity actually stored) and the configuration of the redundant array of independent disks (RAID, Redundant Array of Independent Disks); one logical volume can be understood as one stripe, whereby physical storage space is allocated for the logical volume.
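As an illustration of the storage flow described above, the following Python sketch splits data into objects, assigns each an identifier, and records storage-location information for later reads. All class and method names here are illustrative assumptions, not part of the patent.

```python
# Minimal sketch of the object-storage flow described above; names are illustrative.
import uuid

class LogicalVolume:
    """Stores data as objects and records each object's storage location."""
    def __init__(self, stripe_size: int = 4):
        self.stripe_size = stripe_size                   # capacity unit per object
        self.physical_space: dict[str, str] = {}         # object ID -> data chunk
        self.location_index: dict[str, list[str]] = {}   # file -> ordered object IDs

    def write(self, name: str, data: str) -> None:
        # The file system splits the data into objects; each object carries
        # its own identifier (ID) in addition to the payload.
        ids = []
        for i in range(0, len(data), self.stripe_size):
            oid = str(uuid.uuid4())
            self.physical_space[oid] = data[i:i + self.stripe_size]
            ids.append(oid)
        # Storage-location information is recorded so later reads can be served.
        self.location_index[name] = ids

    def read(self, name: str) -> str:
        return "".join(self.physical_space[oid] for oid in self.location_index[name])

volume = LogicalVolume()
volume.write("milk_task.log", "robot fetched milk")
assert volume.read("milk_task.log") == "robot fetched milk"
```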
It should be noted that, the schematic view of the scenario of the control system of the robot shown in fig. 1 is only an example, and the control system and scenario of the robot described in the embodiments of the present application are for more clearly describing the technical solution of the embodiments of the present application, and do not constitute a limitation on the technical solution provided by the embodiments of the present application, and as one of ordinary skill in the art can know, along with the evolution of the control system of the robot and the appearance of a new service scenario, the technical solution provided by the embodiments of the present application is equally applicable to similar technical problems.
The following will describe in detail. The numbers of the following examples are not intended to limit the preferred order of the examples.
Referring to fig. 2, fig. 2 is a flow chart of an embodiment of a control method of a robot according to an embodiment of the present application, and as shown in fig. 2, the flow chart of the control method of the robot according to the present application is as follows:
201. and under the condition that the text to be processed input by the user is acquired, analyzing the text to be processed to acquire at least one instruction text.
The text to be processed is text in natural language input by a user. Natural language is a naturally occurring language for human communication; it consists of phonemes, words, syntax and semantics and can be understood and used by humans. Natural languages are numerous, such as Chinese, English and French. Natural language has the advantages of rich expression, flexibility and variability, and the disadvantages of complex rules and many inconsistencies.
The text to be processed can be text in various languages such as Chinese, English and French. For example, the text to be processed may be "take a bottle of milk from a refrigerator". By inputting this text, the user hopes that the robot can understand "take a bottle of milk from a refrigerator" and execute the instruction.
The instruction text is a low-level language instruction that the robot can understand and execute; it can be directly input into the robot for execution. The at least one instruction text is a series of instruction texts that the robot needs to execute to complete the intent expressed by the text to be processed. Optionally, the text to be processed can be decomposed into at least two instruction texts; the more instruction texts obtained by analyzing the text to be processed, the more accurately the robot is controlled. For example, the text to be processed is "take a bottle of milk from a refrigerator", and after analysis, 6 instruction texts are obtained, respectively: "{find} <milk>", "{walk} <milk>", "{find} <fridge>", "{open} <fridge>", "{grab} <milk>" and "{close} <fridge>".
In a specific embodiment, keywords in the text to be processed are extracted, and matching instruction texts are searched for in a preset instruction text library according to the extracted keywords, obtaining at least one instruction text.
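The following is a minimal Python sketch of this keyword-matching embodiment, assuming a hypothetical preset instruction text library; the library contents and function names are illustrative only, and a real system would also need to order the matched instructions.

```python
# Hedged sketch of the keyword-matching embodiment; the library is an assumption.
PRESET_INSTRUCTION_LIBRARY = {
    "milk": ["{find} <milk>", "{walk} <milk>", "{grab} <milk>"],
    "refrigerator": ["{find} <fridge>", "{open} <fridge>", "{close} <fridge>"],
}

def parse_by_keywords(text_to_process: str) -> list[str]:
    instruction_texts: list[str] = []
    for keyword, instructions in PRESET_INSTRUCTION_LIBRARY.items():
        if keyword in text_to_process.lower():       # keyword extraction by matching
            instruction_texts.extend(instructions)   # look up the preset library
    return instruction_texts

print(parse_by_keywords("take a bottle of milk from a refrigerator"))
```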
In another specific embodiment, a preset text parsing model is trained in advance based on a preset training set, and a text to be processed is parsed based on the preset text parsing model, so that at least one instruction text is obtained. The preset text analysis model may be various neural network models, which are not limited herein. By means of semantic analysis of the text to be processed through the preset text analysis model, user intention in the text to be processed can be extracted more accurately compared with a keyword matching mode, so that more accurate instruction text is obtained, and accuracy of controlling the robot is improved.
202. And acquiring a simulation environment map corresponding to the real physical environment in the preset area.
The simulation environment map is a three-dimensional electronic map of the preset area, namely a three-dimensional abstract description, at a certain scale, of one or more aspects of the real world (or a part of it) based on a three-dimensional electronic map database. A networked three-dimensional electronic map not only provides map searching functions such as map query and travel navigation for users through an intuitive, geographically realistic mode of expression, but also integrates a series of services such as living information, e-government, e-commerce, virtual communities and travel navigation.
The robot can be controlled to avoid obstacles in the real physical environment of the preset area by means of a laser radar, and each object in the real physical environment of the preset area can be scanned through 360 degrees by the robot's camera. A simulation environment map corresponding to the real physical environment in the preset area is then built using a simultaneous localization and mapping (Simultaneous Localization and Mapping, SLAM) algorithm. SLAM can be described as follows: the robot starts to move from an unknown position in an unknown environment, localizes itself from its pose and the map during movement, and builds an incremental map on the basis of this self-localization, thereby realizing autonomous localization and navigation of the robot.
The relationships between the objects in the preset area are marked on the simulation environment map. A knowledge graph can be used to represent these relationships on the simulation environment map: the information of each object is represented by a node of the knowledge graph, and the interrelationships among the objects are represented by its edges. The object attribute information corresponding to each node comprises: index, category, current state of the object, detection frame of the object, 3D information of the object, and the like, where the current state of the object mainly comprises the following states: open, closed, on, in.
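A minimal sketch of such a knowledge graph, with nodes carrying the attribute fields listed above and edges carrying relationships, might look as follows (all names are illustrative assumptions):

```python
# Sketch of the simulated-environment knowledge graph described above.
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    index: int
    category: str      # e.g. "fridge", "milk"
    state: str         # one of: "open", "closed", "on", "in"
    bbox_2d: tuple     # detection frame of the object
    info_3d: tuple     # 3D position (x, y, z) on the simulation environment map

@dataclass
class EnvironmentGraph:
    nodes: dict = field(default_factory=dict)   # index -> ObjectNode
    edges: list = field(default_factory=list)   # (src_index, relation, dst_index)

    def add_relation(self, src: int, relation: str, dst: int) -> None:
        self.edges.append((src, relation, dst))

graph = EnvironmentGraph()
graph.nodes[0] = ObjectNode(0, "fridge", "closed", (10, 20, 60, 120), (1.2, 0.4, 0.0))
graph.nodes[1] = ObjectNode(1, "milk", "in", (15, 30, 25, 40), (1.2, 0.4, 0.5))
graph.add_relation(1, "inside", 0)   # the milk is inside the fridge
```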
In the embodiment of the application, there may be multiple simulation environment maps corresponding to the real physical environment in the preset area; the multiple maps are used for the robot to execute instruction texts in different scenes. As shown in fig. 3, the application simulates 6 different simulation environment maps, and the scene categories involved in fig. 3 mainly comprise: navigation movement, home management, companionship, entertainment, and daily user activities.
203. And determining task instructions required to be carried out by the robot when the robot completes the instruction text on the simulation environment map based on the instruction text and the simulation environment map, and obtaining each task instruction corresponding to each instruction text.
In the embodiment of the application, the instruction text comprises a target action and a target object on which the target action acts. For example, one of the instruction texts is: {walk} <milk>; the target action is {walk} and the target object is milk, i.e., move to the position of the target object milk.
Since the instruction output of the model and the input of the simulation may not be completely unified, instruction mapping is required: the instruction text and the simulation environment map need to be aligned. In a specific embodiment, determining, based on the instruction text and the simulation environment map, the task instructions the robot needs to perform to complete the instruction text on the simulation environment map, obtaining each task instruction corresponding to each instruction text, includes:
(1) And acquiring robot attribute information of the robot on the simulation environment map and object attribute information of the target object on the simulation environment map.
In a specific embodiment, the robot attribute information includes a position of the robot on the simulated environment map, a state of the robot, and the like; the object attribute information of the target object on the simulation environment map comprises: the position, current state, etc. of the target object on the simulated environment map.
For example, the robot attribute information is a robot position of the robot, and the object attribute information of the target object on the simulation environment map is a target object position of the target object on the simulation environment map.
(2) And determining a task instruction corresponding to the instruction text based on the robot attribute information of the robot on the simulation environment map, the object attribute information of the target object on the simulation environment map and the target action.
In a specific embodiment, the robot attribute information is the robot position of the robot, the object attribute information of the target object on the simulation environment map is the target object position on the simulation environment map, and the target action is walk. It is determined from the target action that the instruction text requires a navigation task, and the task instruction corresponding to the instruction text is: the robot performs a navigation move from the robot position to the target object position on the simulation environment map.
For example, the instruction text is: {walk} <milk>. Mapping the instruction text to a task instruction gives: <char()> {walk} <milk>. The task instruction <char()> {walk} <milk> means that the robot performs a navigation move from the robot position to the position of the milk on the simulation environment map.
In a specific embodiment, the text to be processed is "take a bottle of milk from a refrigerator", and after analysis, 6 instruction texts are obtained, respectively: "{find} <milk>", "{walk} <milk>", "{find} <fridge>", "{open} <fridge>", "{grab} <milk>", "{close} <fridge>". The task instructions the robot needs to perform to complete the instruction texts on the simulation environment map are determined based on the instruction texts and the simulation environment map, and the 6 corresponding task instructions are respectively: "<char()> {find} <milk>", "<char()> {walk} <milk>", "<char()> {find} <fridge>", "<char()> {open} <fridge>", "<char()> {grab} <milk>", "<char()> {close} <fridge>".
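A hedged sketch of this mapping step, producing task instructions of the form "<char()> {action} <object>" by aligning an instruction text with robot and object attribute information on the map, could look as follows; the parsing helpers and map layout are assumptions for illustration.

```python
# Sketch of instruction mapping: align an instruction text with the map.
def map_instruction_to_task(instruction_text: str, env_map: dict) -> str:
    # instruction_text looks like "{walk} <milk>"
    action = instruction_text.split("}")[0].strip("{ ")
    target = instruction_text.split("<")[1].strip("> ")
    robot_attrs = env_map["robot"]              # robot attribute information
    object_attrs = env_map["objects"][target]   # object attribute information
    # The positions are carried along so that, e.g., a {walk} action becomes a
    # navigation move from the robot position to the target object position.
    return (f"<char()> {{{action}}} <{target}>"
            f"  # from {robot_attrs['pos']} to {object_attrs['pos']}")

env_map = {"robot": {"pos": (0.0, 0.0)},
           "objects": {"milk": {"pos": (1.2, 0.4)}, "fridge": {"pos": (1.2, 0.5)}}}
print(map_instruction_to_task("{walk} <milk>", env_map))
```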
In order to avoid the situation in which an analyzed instruction text cannot be executed by the robot, in a specific embodiment, under the condition that each target action in each instruction text belongs to a preset action set, the task instructions required for the robot to complete the instruction text on the simulation environment map are determined based on the instruction text and the simulation environment knowledge graph, obtaining each task instruction corresponding to each instruction text.
The preset action set may be set in advance; for example, it may include {walk}, {grab}, {lie} and {read}. When all target actions in all instruction texts belong to the preset action set, all the instruction texts correspond to actions the robot can execute; at this point, the task instructions the robot needs to perform are determined based on the instruction texts and the simulation environment knowledge graph, which avoids the problem that an analyzed instruction text cannot be executed by the robot and causes confusion.
Further, when the positions of all target objects in all the instruction texts are located in the preset area, the target objects can be reached by the robot within the preset area; the task instructions the robot needs to perform are then determined based on the instruction texts and the simulation environment knowledge graph, likewise avoiding the problem that an analyzed instruction text cannot be executed by the robot. Both checks are sketched in code after this paragraph.
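The sketch below assumes the same illustrative instruction-text format and map layout as above; the preset action set shown is an example, not the patent's definitive set.

```python
# Sketch of the two validity checks: actions must be in the preset action set,
# and target objects must lie within the preset area.
PRESET_ACTIONS = {"walk", "grab", "lie", "read", "find", "open", "close"}

def all_instructions_executable(instruction_texts, env_map, preset_area) -> bool:
    (x0, y0), (x1, y1) = preset_area
    for text in instruction_texts:
        action = text.split("}")[0].strip("{ ")
        target = text.split("<")[1].strip("> ")
        if action not in PRESET_ACTIONS:
            return False                      # robot cannot perform this action
        x, y = env_map["objects"][target]["pos"]
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            return False                      # object is outside the preset area
    return True

env_map = {"objects": {"milk": {"pos": (1.2, 0.4)}, "fridge": {"pos": (1.2, 0.5)}}}
print(all_instructions_executable(["{walk} <milk>", "{grab} <milk>"],
                                  env_map, ((0.0, 0.0), (5.0, 5.0))))
```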
As noted above, the application simulates 6 different simulation environment maps, whose scene categories in fig. 3 mainly comprise: navigation movement, home management, companionship, entertainment, and daily user activities.
In a specific embodiment, scene categories of all the instruction texts are obtained, if the scene categories of all the instruction texts belong to the same scene category, a simulation environment map corresponding to the scene categories of all the instruction texts is obtained, task instructions required to be carried out by the robot on the simulation environment map are determined based on the instruction texts and the simulation environment map, and all the task instructions corresponding to all the instruction texts are obtained.
In another specific embodiment, a simulation environment map corresponding to a scene category set by a user is obtained, task instructions required to be carried out by the robot in the simulation environment map to finish the instruction text are determined based on the instruction text and the simulation environment map corresponding to the scene category set by the user, and each task instruction corresponding to each instruction text is obtained.
204. And controlling the robot to sequentially execute each task instruction.
Further, after each task instruction is acquired, the robot is required to call various applications or services to execute the task instruction on the simulation environment map.
Specifically, the instruction text includes a target action and a target object on which the target action acts. After the instruction text is mapped to a task instruction whose target action is moving, the tasks to be executed are a navigation task and a moving task. The robot is controlled to call a navigation application to plan a moving path to the target object on the simulation environment map, completing the navigation task; the robot is then controlled to call its operation module to move to the target object along the moving path on the simulation environment map, completing the moving task and thereby completing the task instruction.
For example, one of the task instructions is: <char()> {walk} <milk>; the target action is {walk} and the target object is milk. The robot therefore needs to acquire the robot position and the position of the milk on the simulation environment map, then perform the navigation task to obtain a moving path toward the milk, and finally perform the moving task to move to the milk along that path.
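A sketch of executing one such task instruction follows, with placeholder planner and motion interfaces standing in for the navigation application and operation module mentioned above; none of these interfaces come from the patent.

```python
# Hedged sketch: plan a path on the map (navigation task), then follow it
# (moving task). `planner` and `motion` are placeholder interfaces.
def execute_task_instruction(task: str, env_map: dict, planner, motion) -> None:
    target = task.split("<")[-1].strip("> ")
    start = env_map["robot"]["pos"]
    goal = env_map["objects"][target]["pos"]
    path = planner.plan(start, goal)       # navigation task: compute moving path
    for waypoint in path:
        motion.move_to(waypoint)           # moving task: follow the path
    env_map["robot"]["pos"] = goal         # update robot attribute information
```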
As shown in fig. 4, the upper half of fig. 4 is the current state diagram of the simulation environment map in the preset area. The text to be processed is "turn on tv and turn light off"; it is analyzed to obtain at least one instruction text, and the robot is controlled to sequentially execute each task instruction to complete the at least one instruction text. After the robot has sequentially executed each task instruction, the state diagram of the simulation environment map in the preset area is as shown in the lower half of fig. 4.
In the embodiment of the application, the control method of the robot further comprises the following steps: controlling the robot to shoot to obtain an execution video in the process of controlling the robot to sequentially execute each task instruction; and outputting an execution video after controlling the robot to sequentially execute each task instruction.
Referring to fig. 5, a user inputs the text to be processed "find a place where bananas is stored, bananas put in the fridge"; the text to be processed is analyzed to obtain at least one instruction text, and the robot is controlled to sequentially execute each task instruction to complete the at least one instruction text. The robot is controlled to shoot an execution video while it sequentially executes each task instruction, and the execution video is output after the robot has executed them all.
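A minimal sketch of this record-while-executing behavior, assuming a hypothetical recorder interface:

```python
# Sketch: start shooting when execution begins, stop and export afterwards.
def execute_with_recording(tasks, executor, recorder):
    recorder.start()                  # begin shooting the execution video
    for task in tasks:
        executor.execute(task)        # execute each task instruction in turn
    recorder.stop()
    return recorder.export()          # output the execution video afterwards
```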
In a specific embodiment, analyzing the text to be processed to obtain at least one instruction text includes analyzing the text to be processed based on a preset text analysis model to obtain at least one instruction text.
In the embodiment of the application, the preset text analysis model comprises a preset large-scale language model, a bypass module and a fusion module. The preset large-scale language model is ChatGLM-6b. The GLM model is a framework proposed by Tsinghua University; the adopted ChatGLM-6b model is a variant of the Transformer model that combines autoencoding and autoregressive bidirectional ideas and converts the problem into a blank-filling task. It has about 6.2 billion parameters, is trained on roughly 1T tokens of bilingual corpus, is optimized for Chinese question answering and dialogue, and can currently generate answers conforming to human preferences. Of course, in other embodiments, the preset large-scale language model may be a model of the ChatGPT series or LLaMA series, determined according to the specific situation.
In a specific embodiment, parsing the text to be processed based on the preset text analysis model to obtain at least one instruction text includes: adding a bypass module and a fusion module to the preset large-scale language model to obtain an initial language model; and training the initial language model based on a preset training set to obtain the preset text analysis model. The parameters of the bypass module in the initial language model are updated during training. Specifically, the bypass module comprises a dimension-reduction matrix and a dimension-increase matrix: the dimension-reduction matrix is used to process the text to be processed to obtain dimension-reduced data, and the dimension-increase matrix is used to process the dimension-reduced data to obtain bypass output data.
The preset training set is corpus data of tasks executed by an indoor home-scene robot. Since there is essentially no public dataset of such corpus data, the corpus dataset needs to be designed and constructed from scratch to obtain the preset training set. The application constructs 6 indoor home scene environments and 12 major categories of indoor home corpora, including: leisure, work, room cleaning, room arrangement, food preparation, sanitation, social activity, diet, activity, sleep, preparation work and the like, building a large volume of corpus data.
The preset training set comprises training texts and corresponding instruction text sets, and an instruction text set comprises at least one instruction text. Optionally, the instruction text set comprises at least two instruction texts. Referring to fig. 6, each training text is shown in the corpus task list in fig. 6; a training text may be "Read book", "Watch tv" or "Listen to music". The training text and the corresponding instruction text set are shown in the corpus format of fig. 6.
For example, the training text is "Read book", and the corresponding instruction text set includes a plurality of instruction texts, for example: "1.{walk} <living_room>", "2.{walk} <book>", "3.{grab} <book>", "4.{walk} <sofa>", "5.{lie} <sofa>" and "6.{read} <book>".
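An illustrative training sample in this corpus format might be represented as follows; the field names are assumptions, since the patent only shows the task/instruction pairing of fig. 6.

```python
# Illustrative training sample in the corpus format of fig. 6 (field names assumed).
training_sample = {
    "task": "Read book",
    "instructions": [
        "1.{walk} <living_room>",
        "2.{walk} <book>",
        "3.{grab} <book>",
        "4.{walk} <sofa>",
        "5.{lie} <sofa>",
        "6.{read} <book>",
    ],
}
```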
As shown in fig. 7, the left side of fig. 7 is the weight of the preset large-scale language model, with input and output dimension d; x is the processed text to be processed, with dimension d. W0 is the weight of the preset large-scale language model; it can be decomposed through a low-rank decomposition as shown in formula (1), where A and B are the decomposition matrices, A being the dimension-reduction matrix and B the dimension-increase matrix; together, A and B form the bypass module. During training, the weight W0 is frozen and receives no gradient updates, while the dimension-reduction matrix A and the dimension-increase matrix B contain trainable parameters; the forward propagation is shown in formula (2).
W0 + ΔW = W0 + BA    (1)
h = W0x + ΔWx = W0x + BAx    (2)
The dimension-reduction matrix A is initialized with a random Gaussian distribution, and the dimension-increase matrix B is initialized with zeros, so that ΔW = BA is zero at the initial stage of model training. r is the rank of the decomposition, and ΔWx is scaled by α/r, where α is a constant; h is the output after the fusion module fuses the data.
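The bypass computation of formulas (1) and (2) can be sketched as a LoRA-style layer; the following PyTorch snippet is a minimal illustration (dimensions and the scaling constant are arbitrary example values), not the patent's implementation.

```python
# Minimal LoRA-style bypass sketch matching formulas (1) and (2):
# W0 is frozen; only A (dim-reduction) and B (dim-increase) are trained,
# and the fused output is h = W0 x + (alpha / r) * B A x.
import torch
import torch.nn as nn

class BypassLinear(nn.Module):
    def __init__(self, d: int, r: int, alpha: float = 16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(d, d), requires_grad=False)  # W0, frozen
        nn.init.normal_(self.weight)
        self.A = nn.Parameter(torch.randn(r, d) * 0.01)  # Gaussian init (dim-reduction)
        self.B = nn.Parameter(torch.zeros(d, r))         # zero init, so BA = 0 at start
        self.scale = alpha / r                           # scaling factor alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = x @ self.weight.T                # W0 x  (frozen pretrained path)
        bypass = x @ self.A.T @ self.B.T        # B A x (trainable bypass path)
        return base + self.scale * bypass       # fusion: h = W0 x + ΔW x

layer = BypassLinear(d=8, r=2)
h = layer(torch.randn(1, 8))   # at initialization, h equals the frozen path only
```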
In a specific embodiment, a preset training set in json format is first obtained; secondly, the preset training set is preprocessed to convert the raw data into the required json data format, and at the same time each training text in the preset training set is tokenized and the data stored; then fine-tuning is performed on the basis of the ChatGLM-6b large model, the training parameters are set, and the trained model is saved to obtain the fine-tuned large-scale language model.
The GLM model adopts a structure combining autoencoding and autoregression, solving the problem in the manner of blank filling. Autoencoding refers to randomly masking out consecutive tokens, and autoregression refers to being able to attend to already-predicted tokens, so the model achieves good understanding.
As shown in fig. 8, in the embodiment of the present application, training an initial language model based on a preset training set, the steps of obtaining a preset text parsing model are as follows:
assume the inputs are [ x1, x2, x3, x4, x5, x6]. We perform different span sampling conforming to poisson distribution, i.e. [ x3 ]]And [ x5, x6 ]]. The input is split into two parts: part A is x1, x2, [ M ]],x4,x5,[M]]Part B is [ x3 ]]And [ x5, x6 ]]. Part A is the corrupted text xcorrupt Part B is the covered fragment. The words of Part A can see each other but cannot see any of the words in Part B. Words of Part B may see the prepositions in Part A and Part B, but not follow-on words in Part B.
GLM generates the Part B fragments autoregressively. We splice Part A and Part B, with each span in Part B taking [S] as input prefix and appending [E] as output terminator. Two position encodings are employed: position encoding 1 represents the index in the corrupted text, which for a span of Part B corresponds to the position of its [M] mask token; position encoding 2 represents the relative position within the span, so the tokens in Part A all take 0.
Given an input text x = [x1, x2, x3, x4, x5, ...], multiple text fragments [s1, s2, s3, ..., sm] are sampled from it, where each fragment si corresponds to a series of consecutive words in x. Each fragment is replaced with a single [MASK] symbol to form the corrupted text x_corrupt. The model predicts the missing words from the corrupted text in an autoregressive manner, meaning that when predicting the missing words in a fragment, the model can access the corrupted text and the previously predicted fragments. To fully capture the interdependencies between different fragments, the order of the fragments is randomly shuffled, similar to a permutation language model: all permutations of the set of Poisson-sampled spans (i.e., Part B) are considered, where Zm is the set of all possible permutations of the length-m index sequence [1, 2, ..., m]. The objective function of the initial language model is as follows:

max_θ E_{z∼Z_m} [ Σ_{i=1}^{m} log p_θ(s_{z_i} | x_corrupt, s_{z_<i}) ]    (3)

where the words in each blank are generated in left-to-right order, i.e., the probability p_θ of generating fragment s_i can be decomposed into the following form:

p_θ(s_i | x_corrupt, s_{z_<i}) = Π_{j=1}^{l_i} p_θ(s_{i,j} | x_corrupt, s_{z_<i}, s_{i,<j})    (4)
and training the initial language model according to the iteration of the objective function, and obtaining a preset text analysis model when the preset stop condition is met.
In the embodiment of the application, a text to be processed is analyzed based on a preset text analysis model to obtain at least one instruction text, which comprises the following steps:
(1) And processing the text to be processed by using a bypass module to obtain bypass output data.
In the embodiment of the application, the bypass module comprises a dimension-reduction matrix and a dimension-increase matrix, and processing the text to be processed with the bypass module to obtain bypass output data includes: processing the text to be processed with the dimension-reduction matrix to obtain dimension-reduced data; and processing the dimension-reduced data with the dimension-increase matrix to obtain the bypass output data.
(2) And processing the text to be processed by using a preset large-scale language model to obtain model output data.
(3) And fusing the bypass output data and the model output data by utilizing a fusion module to obtain at least one instruction text.
Referring to fig. 9, fig. 9 is a flow chart of another embodiment of a control method of a robot according to an embodiment of the present application, and as shown in fig. 9, the flow chart of the control method of a robot according to the present application is as follows:
401. and presetting a text analysis model.
Specifically, a bypass module and a fusion module are added to a preset large-scale language model to obtain an initial language model. Training the initial language model based on a preset training set to obtain a preset text analysis model.
The preset training set comprises a plurality of training texts and instruction text sets corresponding to the training texts. The instruction text set contains a plurality of instruction texts. The training text is human high-level language text. The instruction text is a low-level language instruction text that may be used to control the robot.
402. When the user input data input by the user is acquired, judging whether the user input data is of a text type or not.
In the embodiment of the application, the user input data input by the user may be voice-type data, picture-type data or text-type data. The user may enter the user input data through a chat window. Whether the user input data is of the text type is judged: if the user input data is voice-type data, speech recognition is performed on it to obtain the text to be processed; if it is picture-type data, image recognition is performed on it to obtain the text to be processed; and if it is text-type data, the user input data is determined to be the text to be processed.
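A sketch of this type dispatch, with placeholder recognizers standing in for whatever speech- and image-recognition services the device integrates (the names below are assumptions):

```python
# Sketch of step 402: route voice, picture, or text input to a recognizer.
from dataclasses import dataclass

@dataclass
class UserInput:
    kind: str        # "voice", "picture", or "text"
    payload: object

def recognize_speech(audio) -> str:
    return "<asr result>"    # placeholder for a speech-recognition service

def recognize_image(image) -> str:
    return "<ocr result>"    # placeholder for an image-recognition service

def to_pending_text(user_input: UserInput) -> str:
    if user_input.kind == "voice":
        return recognize_speech(user_input.payload)
    if user_input.kind == "picture":
        return recognize_image(user_input.payload)
    return str(user_input.payload)   # text type: use it as the text to be processed

print(to_pending_text(UserInput("text", "take a bottle of milk from a refrigerator")))
```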
403. User input data is determined to be text to be processed and a pre-constructed prompt template is acquired.
In the embodiment of the application, if the user input data is text type data, the user input data is determined to be the text to be processed.
The prompt template is used to instruct splitting the text to be processed into at least one instruction text. In the field of machine learning and natural language processing, a Prompt generally refers to a segment of input text used to steer text generation or language model training. A Prompt may be a question, a topic, a piece of text, or a set of keywords used to guide the model in generating relevant text. For example, in a text generation task, a Prompt may be used to specify the topic or style of the generated text so as to produce more satisfactory output. In language model training, a Prompt may be used as part of the input sequence to specify the language rules and context information the model needs to learn.
The pre-constructed prompt template is shown in fig. 10, wherein "take a bottle of milk from a refrigerator" in fig. 10 is a text to be processed, and LLM is a preset text analysis model.
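A hedged example of such a prompt template follows; the exact wording of the patent's template in fig. 10 is not reproduced here, so the text below is an illustrative stand-in.

```python
# Illustrative prompt template (wording assumed, not taken from fig. 10).
PROMPT_TEMPLATE = (
    "You control a home robot. Split the user request into numbered "
    "low-level instruction texts of the form {{action}} <object>.\n"
    "User request: {text}\n"
    "Instruction texts:"
)

prompt = PROMPT_TEMPLATE.format(text="take a bottle of milk from a refrigerator")
# `prompt` is then fed into the preset text analysis model (the fine-tuned LLM).
```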
404. Inputting the prompt template and the text to be processed into a preset text analysis model to obtain at least one instruction text corresponding to the text to be processed.
For example, the text to be processed is "take a bottle of milk from a refrigerator". The prompt template and the text to be processed are input into the preset text analysis model to obtain at least one instruction text corresponding to the text to be processed. After analysis, 6 instruction texts are obtained, respectively: "1.{find} <milk>", "2.{walk} <milk>", "3.{find} <fridge>", "4.{open} <fridge>", "5.{grab} <milk>" and "6.{close} <fridge>".
405. And judging whether each target action in each instruction text belongs to a preset action set.
In the embodiment of the present application, it is judged whether each target action in each instruction text belongs to the preset action set. If every target action in every instruction text belongs to the preset action set, 406 is executed: judging whether the positions of all target objects in all instruction texts are located in the preset area. If the target actions in the instruction texts do not all belong to the preset action set, prompt information is sent out.
For example, the 6 instruction texts are respectively: "<char()> {find} <milk>", "<char()> {walk} <milk>", "<char()> {find} <fridge>", "<char()> {open} <fridge>", "<char()> {grab} <milk>", "<char()> {close} <fridge>". The target actions in the 6 instruction texts are respectively: {find}, {walk}, {find}, {open}, {grab} and {close}.
406. And judging whether the positions of all target objects in all instruction texts are all located in a preset area.
In the embodiment of the application, whether the positions of all target objects in all instruction texts are located in the preset area is judged. If they are, 407 is executed: the task instructions the robot needs to perform to complete the instruction texts are determined based on the instruction texts and the simulation environment knowledge graph, obtaining each task instruction corresponding to each instruction text. If the positions of the target objects in the instruction texts are not all within the preset area, prompt information is sent out.
For example, the 6 instruction texts are respectively: "<char()> {find} <milk>", "<char()> {walk} <milk>", "<char()> {find} <fridge>", "<char()> {open} <fridge>", "<char()> {grab} <milk>", "<char()> {close} <fridge>". The target objects in the 6 instruction texts are respectively: milk, milk, fridge, fridge, milk and fridge.
407. And determining task instructions required to be carried out by the robot on the simulation environment map after completing the instruction text based on the instruction text and the simulation environment knowledge graph, and obtaining each task instruction corresponding to each instruction text.
In a specific embodiment, the text to be processed is "take a bottle of milk from a refrigerator", and after analysis, 6 instruction texts are obtained, respectively: "{find} <milk>", "{walk} <milk>", "{find} <fridge>", "{open} <fridge>", "{grab} <milk>", "{close} <fridge>". The task instructions the robot needs to perform to complete the instruction texts on the simulation environment map are determined based on the instruction texts and the simulation environment map, and the 6 corresponding task instructions are respectively: "<char()> {find} <milk>", "<char()> {walk} <milk>", "<char()> {find} <fridge>", "<char()> {open} <fridge>", "<char()> {grab} <milk>", "<char()> {close} <fridge>".
408. And controlling the robot to sequentially execute each task instruction.
409. And controlling the robot to shoot to obtain an execution video in the process of controlling the robot to sequentially execute each task instruction.
In the embodiment of the application, 409 and 408 are executed simultaneously, and the execution video can display the whole process of the robot for executing each task instruction in sequence.
410. And outputting an execution video after controlling the robot to sequentially execute each task instruction.
In the embodiment of the application, after the robot is controlled to sequentially execute each task instruction, an execution video is output. The user can see the whole process of the robot to sequentially execute each task instruction. And according to the advantages and disadvantages of the execution video, the constructed prompt template can be adjusted, so that the accuracy of controlling the robot is further improved.
The application opens possibilities for embodied robot intelligence, giving a large language model the capability to interact with the physical world and providing a one-stop service from text input to video output. First, massive indoor home-scene corpus data are constructed; second, the GLM pre-trained large language model is turned into a domain expert through fine-tuning; to further improve model accuracy, a prompt template (Prompt) is constructed; then a robot simulation environment map is built so that instructions can be executed; finally, the large model and the simulation environment are chained together, the instruction mapping is completed, and the pipeline from text input to a video of the robot executing the commands is realized, completing the one-stop service chain. The application also makes the smart home robot more intelligent, so that interaction between the intelligent robot and humans is closer; the natural language model is no longer limited to a chat-style dialogue function but can directly drive the robot through text, realizing an embodied intelligent system.
In order to facilitate better implementation of the control method of the robot provided by the embodiment of the application, the embodiment of the application also provides a control device of the robot based on the control method of the robot. The meaning of the noun is the same as that of the control method of the robot, and specific implementation details refer to the description in the embodiment of the method.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a control device of a robot according to an embodiment of the present application, where the control device of the robot may include an analysis module 601, an acquisition module 602, a determination module 603, and a control module 604, where,
the parsing module 601 is configured to parse the text to be processed to obtain at least one instruction text when the text to be processed input by the user is obtained;
the acquiring module 602 is configured to acquire a simulation environment map corresponding to a real physical environment in a preset area;
the determining module 603 is configured to determine, based on the instruction text and the simulated environment map, task instructions that the robot needs to perform when completing the instruction text on the simulated environment map, and obtain each task instruction corresponding to each instruction text;
and the control module 604 is used for controlling the robot to sequentially execute each task instruction.
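A skeleton mirroring these four modules, with placeholder implementations standing in for the steps described in the method embodiments, might look as follows:

```python
# Skeleton of the control device's four modules; bodies are placeholders.
class RobotControlDevice:
    def __init__(self, parser, map_provider, mapper, executor):
        self.parsing_module = parser            # module 601
        self.acquisition_module = map_provider  # module 602
        self.determining_module = mapper        # module 603
        self.control_module = executor          # module 604

    def run(self, text_to_process: str) -> None:
        instruction_texts = self.parsing_module.parse(text_to_process)
        env_map = self.acquisition_module.get_simulated_map()
        tasks = [self.determining_module.map(t, env_map) for t in instruction_texts]
        for task in tasks:
            self.control_module.execute(task)   # execute each task in sequence
```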
The specific implementation of each module can be referred to the previous embodiments, and will not be repeated here.
The embodiment of the application also provides electronic equipment, which comprises a memory and a processor, wherein the processor is used for executing the steps in the control method of the robot provided by the embodiment by calling the computer program stored in the memory.
Referring to fig. 12, fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the application.
The electronic device may include a processor 101 with one or more processing cores, a memory 102 with one or more computer-readable storage media, a power supply 103, an input unit 104, and other components. It will be appreciated by those skilled in the art that the electronic device structure shown in the figures does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or arrange the components differently. Wherein:
the processor 101 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 102, and invoking data stored in the memory 102. Optionally, processor 101 may include one or more processing cores; alternatively, the processor 101 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 101.
The memory 102 may be used to store software programs and modules, and the processor 101 executes various functional applications and data processing by executing the software programs and modules stored in the memory 102. The memory 102 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data created according to the use of the electronic device, etc. In addition, memory 102 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 102 may also include a memory controller to provide access to the memory 102 by the processor 101.
The electronic device further comprises a power supply 103 for powering the various components, optionally, the power supply 103 may be logically connected to the processor 101 by a power management system, whereby the functions of managing charging, discharging, and power consumption are performed by the power management system. The power supply 103 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The electronic device may further comprise an input unit 104, which input unit 104 may be used for receiving input digital or character information and for generating keyboard, mouse, joystick, optical or trackball signal inputs in connection with user settings and function control.
Although not shown, the electronic device may further include a display unit, an image acquisition component, and the like, which are not described herein. Specifically, in this embodiment, the processor 101 in the electronic device loads executable codes corresponding to one or more computer programs into the memory 102 according to the following instructions, and the steps in the control method of the robot provided by the present application are executed by the processor 101, for example:
under the condition that a text to be processed input by a user is obtained, the text to be processed is analyzed to obtain at least one instruction text; acquiring a simulation environment map corresponding to a real physical environment in a preset area; determining task instructions required to be carried out by the robot when the robot completes the instruction text on the simulation environment map based on the instruction text and the simulation environment map, and obtaining each task instruction corresponding to each instruction text; and controlling the robot to sequentially execute each task instruction.
It should be noted that, the electronic device provided in the embodiment of the present application and the control method of the robot in the above embodiment belong to the same concept, and detailed implementation processes of the electronic device are described in the above related embodiments, which are not repeated here.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed on a processor of an electronic device provided by an embodiment of the present application, causes the processor of the electronic device to execute the steps in the control method of a robot provided by the present application. The storage medium may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a random access Memory (Random Access Memory, RAM), or the like.
The present application also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from the computer-readable storage medium by a processor of a computer device, and executed by the processor, cause the computer device to perform various alternative implementations of the control method of the robot described above.
The above describes in detail the robot control method and apparatus provided by the present application; specific examples are used herein to illustrate its principles and embodiments, and the above description of the examples is only intended to help understand the method and core ideas of the present application. Meanwhile, since those skilled in the art may vary the specific embodiments and the scope of application according to the ideas of the present application, the content of this description should not be construed as limiting the present application.
It should be noted that when the above embodiments of the present application are applied to specific products or technologies, related data concerning users are required to obtain user approval or consent, and the collection, use and processing of the related data are required to comply with related laws and regulations and standards of related countries and regions.

Claims (12)

CN202311246211.9A | 2023-09-25 | 2023-09-25 | Robot control method and device | Pending | CN117021114A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202311246211.9A | 2023-09-25 | 2023-09-25 | CN117021114A (en) Robot control method and device

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202311246211.9A | 2023-09-25 | 2023-09-25 | CN117021114A (en) Robot control method and device

Publications (1)

Publication Number | Publication Date
CN117021114A | 2023-11-10

Family

ID=88626669

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202311246211.9A | CN117021114A (en) Robot control method and device (Pending) | 2023-09-25 | 2023-09-25

Country Status (1)

Country | Link
CN | CN117021114A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
WO2025126663A1 (en) * | 2023-12-11 | 2025-06-19 | ソニーグループ株式会社 (Sony Group Corporation) | Information processing device, information processing method, and program
CN118915598A (en) * | 2024-10-09 | 2024-11-08 | 武汉理工大学 (Wuhan University of Technology) | Multi-cabin-cleaning robot interactive control method and system based on large language model
CN118915598B (en) * | 2024-10-09 | 2025-01-24 | 武汉理工大学 (Wuhan University of Technology) | Interactive control method and system of multiple cleaning robots based on large language model


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
