Disclosure of Invention
To overcome the defects of the prior art, the present disclosure provides a spatial environment recognition method and system for robot intelligent service. By constructing relation triples, the relationships among the objects in a space and between the objects and their attributes are obtained, which greatly improves the capability of recognizing the spatial environment.
To achieve the above purpose, the present disclosure adopts the following technical solutions:
A first aspect of the present disclosure provides a spatial environment recognition method for robot intelligent service.
A spatial environment recognition method for robot intelligent service comprises the following steps:
acquiring at least one image of a spatial environment region to be recognized;
obtaining, according to the acquired image and using a preset visual relationship acquisition model, object features in the image and relation features between the objects, and further obtaining visual relation triples of the objects in the image;
obtaining the attribute of each object by using a preset attribute acquisition model according to the obtained object features;
obtaining an object multiple-relation graph of the current image according to the obtained visual relation triples and the attributes of the objects;
processing each acquired image of the environment region to be recognized to obtain an object multiple-relation graph, and constructing a robot semantic visual space from the multiple-relation graphs of all acquired images, thereby realizing recognition of the spatial environment.
A second aspect of the present disclosure provides a spatial environment recognition system for robot intelligent service.
A spatial environment recognition system for robot intelligent service comprises:
a data acquisition module configured to: acquire at least one image of a spatial environment region to be recognized;
a visual relation triple acquisition module configured to: obtain, according to the acquired image and using a preset visual relationship acquisition model, object features in the image and relation features between the objects, and further obtain visual relation triples of the objects in the image;
an object attribute acquisition module configured to: obtain the attribute of each object by using a preset attribute acquisition model according to the obtained object features;
an object multiple-relation graph acquisition module configured to: obtain an object multiple-relation graph of the current image according to the obtained visual relation triples and the attributes of the objects;
an environment recognition module configured to: process each image of the environment region to be recognized to obtain an object multiple-relation graph, and construct a robot semantic visual space from the obtained multiple-relation graphs of all images, thereby realizing recognition of the environment.
A third aspect of the present disclosure provides a medium on which a program is stored, the program, when executed by a processor, implementing the steps in the spatial environment recognition method for robot intelligent service according to the first aspect of the present disclosure.
A fourth aspect of the present disclosure provides an electronic device, comprising a memory, a processor, and a program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps in the spatial environment recognition method for robot intelligent service according to the first aspect of the present disclosure.
Compared with the prior art, the present disclosure has the following beneficial effects:
1. The method, system, medium, and electronic device of the present disclosure use a preset visual relationship acquisition model to obtain object features in an image and relation features between the objects, construct the triple features of the objects, and further obtain the visual relation triple of each object in the image, thereby recognizing the relative positional relationships of the objects in the image and improving the capability of recognizing the space.
2. The method, system, medium, and electronic device of the present disclosure obtain the attribute of each object from the obtained object features using a preset attribute acquisition model, and obtain the object multiple-relation graph of the current image from the obtained visual relation triples and attributes of the objects, effectively combining the visual relationships and the attributes of the objects so that the spatial environment is recognized more efficiently and accurately.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
Example 1:
As shown in Figs. 1 to 3, Example 1 of the present disclosure provides a spatial environment recognition method for robot intelligent service, including the following steps:
acquiring at least one image of a spatial environment region to be recognized;
obtaining, according to the acquired image and using a preset visual relationship acquisition model, object features in the image and relation features between the objects, constructing the triple features of the objects in the image, and further obtaining the visual relation triples of the objects in the image;
obtaining the attribute of each object by using a preset attribute acquisition model according to the obtained object features;
obtaining an object multiple-relation graph of the current image according to the obtained visual relation triples and the attributes of the objects;
processing each acquired image of the environment region to be recognized to obtain an object multiple-relation graph, and constructing a robot semantic visual space from the multiple-relation graphs of all acquired images, thereby realizing recognition of the spatial environment.
The semantic visual space is constructed as follows:
First, the object relationships in the environment are mapped to the semantic level, enhancing the robot's cognition of the spatial relationships in the service environment. Next, to further understand the intrinsic logical relationships of the objects, a method for obtaining object attributes from the network is given. Finally, the obtained environmental knowledge is stored in the form of a multiple-relation graph, completing the construction of the visual space.
The method specifically comprises the following steps:
(1) Acquisition of visual relationships
In the semantic visual space construction process, the visual relationships between objects in the scene, such as "computer on desk", are obtained first; these relationships are generally expressed as triples of the form <subject, predicate, object>. Visual relationships realize a higher-level understanding at the semantic level and are an important route from perceptual intelligence to cognitive intelligence. In the construction of the semantic visual space, the visual relationships directly provide the dominant knowledge of the environment. Fig. 1 shows a method for acquiring visual relationships, which combines the image with semantic information.
In this method, target detection is first performed using Faster R-CNN (a region-based convolutional neural network), yielding the semantics of each target object and the coordinates of its detection box. The semantics of the target objects are feature-encoded, and the relation features between the objects are obtained from the detection box information, completing the construction of the triple features. The triple features are then input into a Bi-RNN (bidirectional recurrent neural network) to construct the relation triples. In particular, since the presence of dynamic objects can affect the robustness of the constructed visual relationships, the framework also includes a dynamic object filtering module that reduces the influence of dynamic objects on the visual relation library.
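The disclosure does not prescribe a particular implementation of the detection step; a minimal sketch, assuming the pretrained Faster R-CNN shipped with torchvision (the image path, COCO label set, and score threshold are illustrative assumptions), might look as follows:

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Pretrained Faster R-CNN from torchvision; its COCO classes stand in for
# the target objects of the service environment (an assumption).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("scene.jpg").convert("RGB")   # illustrative input image
with torch.no_grad():
    pred = model([to_tensor(image)])[0]

# Keep confident detections: each yields a semantic label index and a
# detection box (x_min, y_min, x_max, y_max), from which the box centers,
# widths, and heights used in Section (1-1) can be derived.
keep = pred["scores"] > 0.8                      # illustrative threshold
boxes, labels = pred["boxes"][keep], pred["labels"][keep]
```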
(1-1) Triple feature construction
The triple features include the word vectors of the entity targets and the entity relation encoding. A BERT (Bidirectional Encoder Representations from Transformers) model is used to construct the word vectors of the entity targets: the semantic labels in the data set are used as training data and input into the BERT model for fine-tuning, and the representations produced by the model are taken as the features, which are the word vectors.
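As a sketch of the word-vector step, assuming the Hugging Face transformers library and a generic pretrained BERT checkpoint (the fine-tuning on the data set's semantic labels described above is omitted, and the mean-pooling choice is an assumption):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
bert.eval()

def word_vector(label: str) -> torch.Tensor:
    """Encode an entity label (e.g. 'computer') as a fixed-size word vector."""
    inputs = tokenizer(label, return_tensors="pt")
    with torch.no_grad():
        outputs = bert(**inputs)
    # Mean-pool the token embeddings into one 768-dimensional vector.
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

x1 = word_vector("computer")   # subject word vector
x3 = word_vector("desk")       # object word vector
```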
When obtaining the relation features, the spatial relationship between the targets is used to model them. The relationship between two objects is denoted $[x_1, x_2, x_3]$, where $x_1$ represents object 1, $x_3$ represents object 3, and $x_2$ represents the relationship between them. The spatial relationship $s$ is given by Equation (1), in which $b_i$ $(i = 1, 3)$ denotes the prediction box of target object $i$, $(x_i, y_i)$ the center coordinates of the prediction box, and $w_i$ and $h_i$ its width and height; $W$ and $H$ denote the width and height of the intersection of the prediction boxes of target 1 and target 3. The vector $s$ is an encoding of the relevant spatial relationships, and this encoding is particularly important for representing complex spatial relationships. After the spatial relationship $s$ is obtained, it is input into a multilayer perceptron (MLP) to obtain a feature expression $x_2$ with the same dimension as the word vectors, thereby obtaining the feature expression of the triple.
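Equation (1) itself is not reproduced above, so the geometric encoding below is only an illustrative assumption built from the quantities the text names (box centers, widths, heights, and intersection size); the MLP projection to the word-vector dimension follows the description:

```python
import torch
import torch.nn as nn

def spatial_relation(b1: tuple, b3: tuple) -> torch.Tensor:
    """Illustrative spatial encoding s of two prediction boxes given as
    (cx, cy, w, h); the exact form of Equation (1) is an assumption."""
    (cx1, cy1, w1, h1), (cx3, cy3, w3, h3) = b1, b3
    # Width W and height H of the intersection of the two prediction boxes.
    W = max(0.0, min(cx1 + w1 / 2, cx3 + w3 / 2) - max(cx1 - w1 / 2, cx3 - w3 / 2))
    H = max(0.0, min(cy1 + h1 / 2, cy3 + h3 / 2) - max(cy1 - h1 / 2, cy3 - h3 / 2))
    return torch.tensor([(cx1 - cx3) / w3, (cy1 - cy3) / h3,
                         w1 / w3, h1 / h3, W, H])

# MLP projecting s to the word-vector dimension (768 for BERT-base).
mlp = nn.Sequential(nn.Linear(6, 256), nn.ReLU(), nn.Linear(256, 768))
s = spatial_relation((120.0, 80.0, 60.0, 40.0), (130.0, 150.0, 200.0, 90.0))
x2 = mlp(s)   # relation feature with the same dimension as the word vectors
```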
(1-2) Construction of visual relationships
After the triple features are obtained, they are input into the Bi-RNN for visual relation prediction. The BRNN (bidirectional recurrent neural network) is an efficient natural language processing model formed by stacking two RNNs, with the output determined by the states of both. In the visual relationship acquisition model, the Bi-RNN is designed to produce a single output rather than multiple outputs, and its input is the triple features, as shown in Fig. 2.
The framework shown in Fig. 2 includes forward propagation and backward propagation and is divided into an input layer, a hidden layer, and an output layer, where the hidden layer and the output layer are defined as follows:

$$\overrightarrow{h}_t = f\big(\overrightarrow{W}_{xh}\, x_t + \overrightarrow{W}_{hh}\, \overrightarrow{h}_{t-1} + \overrightarrow{b}_h\big) \tag{2}$$

$$\overleftarrow{h}_t = f\big(\overleftarrow{W}_{xh}\, x_t + \overleftarrow{W}_{hh}\, \overleftarrow{h}_{t+1} + \overleftarrow{b}_h\big) \tag{3}$$

$$y_t = f\big(\overrightarrow{W}_{hy}\, \overrightarrow{h}_t + \overleftarrow{W}_{hy}\, \overleftarrow{h}_t + b_y\big) \tag{4}$$

In these formulas, $\overrightarrow{W}_{xh}$ and $\overrightarrow{W}_{hh}$ are the input-to-hidden and hidden-to-hidden parameter matrices in the forward propagation, and $\overrightarrow{W}_{hy}$ and $\overrightarrow{b}_h$ are the hidden-to-output parameter matrix and the offset vector in the forward propagation; the parameters of the backward propagation are defined analogously. $x_t$ denotes the input, $y$ the output, $f$ the activation function, and $b_y$ the output offset.

From the framework shown in Fig. 2 and Equation (4), the predicted relation output $y$ is obtained as:

$$y = f\Big(\sum_{t=1}^{3}\big(\overrightarrow{W}_{hy}\, \overrightarrow{h}_t + \overleftarrow{W}_{hy}\, \overleftarrow{h}_t\big) + b_y\Big) \tag{5}$$

where $h_1, h_2, h_3$ take their forward- and backward-propagation values from Equations (2) and (3), and $y$ represents the predicted relation, such as "with" or "on". After the word vectors and relation features are processed through the trained Bi-RNN, the visual relation triple is obtained.
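A minimal sketch of such a single-output Bi-RNN in PyTorch, with the hidden size and the number of candidate predicates as illustrative assumptions:

```python
import torch
import torch.nn as nn

class RelationBiRNN(nn.Module):
    """Bi-RNN over the three-step triple feature sequence (x1, x2, x3) that
    produces a single predicate prediction, in the spirit of Eqs. (2)-(5)."""
    def __init__(self, feat_dim=768, hidden_dim=256, num_predicates=50):
        super().__init__()
        self.rnn = nn.RNN(feat_dim, hidden_dim,
                          bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden_dim, num_predicates)

    def forward(self, triple_feats: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(triple_feats)   # (batch, 3, 2 * hidden_dim)
        # Sum the forward/backward hidden states over the three steps and
        # map them to one predicate distribution, as in Equation (5).
        return self.out(h.sum(dim=1))   # (batch, num_predicates)

model = RelationBiRNN()
triple = torch.randn(1, 3, 768)         # stacked (x1, x2, x3) features
predicate_logits = model(triple)        # argmax gives e.g. "on"
```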
(1-3) Dynamic object filtering
In the process of acquiring visual relationships, the service robot is concerned with the relationships between the objects in the environment, which generally remain unchanged for a period of time. However, dynamic objects in the environment directly affect the timeliness of the visual relationships. For example, relationships between people and objects are often extracted during visual relationship detection, and such relationships may change rapidly. If they are added to the relation library, they will be inconsistent with reality when visual relationships are retrieved in subsequent services; at the same time, the redundancy of the relation library increases and the service efficiency of the robot decreases. Therefore, the obtained triples containing dynamic objects need to be filtered out.
The dynamic filtering module maintains a dynamic object list; when a visual relation triple is detected to contain a dynamic object, the triple is filtered out, as sketched below. Objects may be added to the list according to common sense (e.g., "person"), or according to experience in actual operation: if repeated observation shows that the visual relationships of a certain class of objects change rapidly, that class is added. The acquisition of visual relationships is thereby completed.
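A minimal sketch of this filtering step (the list contents and example triples are illustrative):

```python
# Dynamic object list: seeded by common sense, extended from operation.
DYNAMIC_OBJECTS = {"person"}
DYNAMIC_OBJECTS.add("dog")   # e.g. added after observing rapid change

def filter_dynamic(triples):
    """Drop any <subject, predicate, object> triple involving a dynamic
    object, keeping the relation library temporally stable."""
    return [(s, p, o) for (s, p, o) in triples
            if s not in DYNAMIC_OBJECTS and o not in DYNAMIC_OBJECTS]

triples = [("computer", "on", "desk"), ("person", "near", "desk")]
print(filter_dynamic(triples))   # [('computer', 'on', 'desk')]
```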
(2) Attribute knowledge acquisition
When executing intelligent services, the robot needs to understand not only the physical relationships between objects but also their deeper attributes, so as to better grasp the relationships of the objects at the logical level. Therefore, when building the semantic visual space, the attributes of the objects also need to be added. Given the diversity of objects and related attributes in the environment, these attributes are obtained from the network.
In the process of acquiring the related attributes of an object, the object semantics are first used as keywords to search for related information on the network; the search returns a series of result strings, from which the attributes of the object are extracted.
To extract the object attributes from the result strings, an NER (named entity recognition) model is used to identify the attribute keywords in the results. Named entity recognition is performed with a BERT-CRF method: the text is encoded into features by a BERT model, and the encoding is input into a CRF layer for decoding to obtain the labeling sequence. CRF refers to a conditional random field, a discriminative probabilistic undirected graph model for labeling and segmenting sequence data. BERT-CRF is an improvement on Bi-LSTM-CRF that replaces the Bi-LSTM with a BERT model; the two have similar training principles.
Given an input sequence $X = (x_1, x_2, \ldots, x_n)$, the BERT-CRF model outputs a predicted tag sequence $y = (y_1, y_2, \ldots, y_n)$. The score of a predicted sequence is defined as:

$$s(X, y) = \sum_{i=1}^{n} P_{i, y_i} + \sum_{i=0}^{n} A_{y_i, y_{i+1}} \tag{6}$$

where $P_{i, y_i}$ denotes the probability that the output at the $i$-th position is $y_i$, and $A_{y_i, y_{i+1}}$ denotes the transition probability from state $y_i$ to $y_{i+1}$. This scoring function takes the preceding states into account, so the obtained result is more consistent with the actual output sequence.

During training, for each sample $X$, every possible sequence $\tilde{y}$ is scored as $s(X, \tilde{y})$, and the total score is normalized:

$$P(y \mid X) = \frac{e^{s(X, y)}}{\sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})}} \tag{7}$$

In Equation (7), $y$ represents the correct annotation sequence and $Y_X$ represents all possible annotation sequences. From Equation (7), the loss function is obtained as follows:

$$\mathcal{L} = -\log P(y \mid X) = -\Big(s(X, y) - \log \sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})}\Big) \tag{8}$$
through continuous iterative training of the loss function in the model, the loss function is reduced to the minimum value, and therefore an ideal entity labeling effect is obtained. The attribute keywords in the result character string are extracted through the model, and the relationship is established with the objects in the environment, so that the object semantics in the environment are enriched, and the knowledge system in the semantic visual space is further improved.
(3) Storage of environmental knowledge
After the visual relationships and object attributes in the environment are acquired, the data preparation for constructing the semantic visual space is complete. Because the connections among these pieces of knowledge are diverse, the information needs to be organized and expressed efficiently in the form of a multiple-relation graph. The basic elements of a multiple-relation graph are nodes and edges: the nodes store entity semantics and object attribute information, and the edges represent the relationships between entities and between entities and attributes. Unlike an ordinary graph, a multiple-relation graph contains multiple types of nodes and multiple types of edges. Fig. 3 shows an example of multiple-relation graph storage.
In Fig. 3, the gray boxes represent entity information in the environment and the white boxes represent object attributes. The visual relationships and the object attributes in the environment are combined in the form of a multiple-relation graph; through this combination, the robot not only understands the object relationships in the environment at the semantic level but also gains knowledge of the objects at the logical level, forming a unified and standard expression.
The acquired environmental knowledge is stored in the multiple-relation graph, finally forming a semantic visual space suitable for robot intelligent service.
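A minimal storage sketch, assuming the networkx library; the node and edge contents follow Fig. 3's convention of entity nodes, attribute nodes, and typed edges:

```python
import networkx as nx

# A multi-relation graph holds multiple node types (entity / attribute)
# and multiple edge types (visual relation / attribute relation).
G = nx.MultiDiGraph()

# Entity nodes and a visual relation from a triple (gray boxes in Fig. 3).
G.add_node("computer", kind="entity")
G.add_node("desk", kind="entity")
G.add_edge("computer", "desk", relation="on", kind="visual")

# An attribute node retrieved from the network (white boxes in Fig. 3).
G.add_node("electronic device", kind="attribute")
G.add_edge("computer", "electronic device", relation="is_a", kind="attribute")

# A query a service robot might issue against the semantic visual space:
for _, target, data in G.out_edges("computer", data=True):
    print(f"computer --{data['relation']}--> {target}")
```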
Example 2:
Example 2 of the present disclosure provides a spatial environment recognition system for robot intelligent service, comprising:
a data acquisition module configured to: acquire at least one image of a spatial environment region to be recognized;
a visual relation triple acquisition module configured to: obtain, according to the acquired image and using a preset visual relationship acquisition model, object features in the image and relation features between the objects, construct the triple features of the objects in the image, and further obtain the visual relation triples of the objects in the image;
an object attribute acquisition module configured to: obtain the attribute of each object by using a preset attribute acquisition model according to the obtained object features;
an object multiple-relation graph acquisition module configured to: obtain an object multiple-relation graph of the current image according to the obtained visual relation triples and the attributes of the objects;
an environment recognition module configured to: process each image of the environment region to be recognized to obtain an object multiple-relation graph, and construct a robot semantic visual space from the obtained multiple-relation graphs of all images, thereby realizing recognition of the environment.
The working method of the system described in this example is the same as the spatial environment recognition method for robot intelligent service in Example 1, and details are not repeated here.
Example 3:
Example 3 of the present disclosure provides a medium on which a program is stored, the program, when executed by a processor, implementing the steps in the spatial environment recognition method for robot intelligent service according to Example 1 of the present disclosure.
Example 4:
Example 4 of the present disclosure provides an electronic device, comprising a memory, a processor, and a program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps in the spatial environment recognition method for robot intelligent service according to Example 1 of the present disclosure.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.