Disclosure of Invention
In view of the problems existing in the prior art, the invention provides an agent learning method, system, device and medium which, based on a forward compression theory and a multi-level abstraction theory, give the agent a stronger ability to autonomously cognize the outside world.
In a first aspect, an embodiment of the present invention provides an agent learning method, including the steps of:
S1: the agent establishes effective cognition of the high-dimensional structure of each physical entity based on external scene data, and obtains effective cognition of the low-dimensional structure through segmentation and rendering;
S2: a corresponding small model is established based on each physical entity, and feature cognition based on forward propagation is performed in the small model on the effective cognition of the low-dimensional structure of the physical entity to obtain a feature vector of each physical entity;
S3: a corresponding abstract class is established based on the feature vector of each physical entity, a multi-level abstract network is obtained based on the relevance among the abstract classes, and the multi-level abstract network outputs a decision value;
S4: based on the decision value and on rewards obtained from changes in the physical environment, the agent corrects the decision value output by the multi-level abstract network, the feature vector of the physical entity corresponding to the decision value, and the multi-level abstract network;
S5: a correspondence is established between the corrected feature vector of the physical entity and the current human language to complete the learning of the agent.
Further, in step S1, the agent collects and inputs image and video data of different target scenes, and trains a deep neural network representing the 3D structure of the target scene;
different physical entity objects in the scene are segmented based on the deep neural network of the full scene, and deep neural networks corresponding to the different object entities are constructed, wherein the deep neural networks corresponding to the different object entities constitute the effective cognition of the high-dimensional structure of each physical entity;
and 2D images of the physical entities from different view angles are rendered based on a rendering algorithm of the deep neural network, wherein these 2D images serve as the effective cognition of the low-dimensional structure of each physical entity.
Further, the deep neural network is a neural radiance field (NeRF) network model.
Further, in the step S2, an organic growth mechanism is adopted in the process of feature cognition of the small model, and the organic growth mechanism is a mechanism for continuously generating a large number of small models and organically combining the small models into a large model; the organic growth mechanism actively builds a new small model applicable to identifying more new data when the small model cannot effectively identify more new data.
Further, the multi-level abstract network in step S3 is used for simulating a world model; the world model is based on a large model and is used for simulating regularity features of the real world, and the regularity features jointly form cognition of the world; wherein the regularity features comprise: regularity features in two-dimensional planes, three-dimensional space, and four-dimensional space-time.
Further, in step S4, the agent generates an action based on the decision value, and the reward includes human supervision feedback and real world feedback.
Further, in the step S5, the correspondence relationship between the corrected feature vector of the physical entity and the current human language is established, which includes naming the physical entity based on the feature vector of each physical entity, optimizing the feature vector of each physical entity based on more external scene data, and establishing a search and recommendation mechanism.
In a second aspect, an embodiment of the present invention provides an agent learning system, including:
a preprocessing module: the agent establishes effective cognition of the high-dimensional structure of each physical entity based on external scene data, and obtains effective cognition of the low-dimensional structure through segmentation and rendering;
a small model module: a corresponding small model is established based on each physical entity, and feature cognition based on forward propagation is performed in the small model on the effective cognition of the low-dimensional structure of the physical entity to obtain a feature vector of each physical entity;
an abstract network module: a corresponding abstract class is established based on the feature vector of each physical entity, a multi-level abstract network is obtained based on the relevance among the abstract classes, and the multi-level abstract network outputs a decision value;
a correction module: based on the decision value and on rewards obtained from changes in the physical environment, the agent corrects the decision value output by the multi-level abstract network, the feature vector of the physical entity corresponding to the decision value, and the multi-level abstract network;
an output module: a correspondence is established between the corrected feature vector of the physical entity and the current human language to complete the learning of the agent.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the agent learning method described above when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the agent learning method described above.
Compared with the prior art, the invention has the following beneficial technical effects:
The invention provides an agent learning method, system, device and medium, in which: the agent establishes effective cognition of the high-dimensional structure of each physical entity based on external scene data, and obtains effective cognition of the low-dimensional structure through segmentation and rendering; a corresponding small model is established based on each physical entity, and feature cognition is performed in the small model on the effective cognition of the low-dimensional structure of the physical entity to obtain a feature vector of each physical entity; a corresponding abstract class is established based on the feature vector of each physical entity, a multi-level abstract network is obtained based on the relevance among the abstract classes, and the multi-level abstract network outputs a decision value; based on the decision value and on rewards obtained from changes in the physical environment, the agent corrects the decision value output by the multi-level abstract network, the feature vector of the physical entity corresponding to the decision value, and the multi-level abstract network; and a correspondence is established between the corrected feature vector of the physical entity and the current human language, completing the learning of the agent.
The method can cognize the world autonomously: it does not rely only on data provided by humans, does not depend on human-labeled data, and can make better use of data perceived from the real world. Meanwhile, the method can handle zero-sample and small-sample problems more stably and efficiently; it does not rely on similarity and statistical probability for generalization, but establishes a more powerful multi-level abstract network for extracting and applying regularities. The model structure and parameter count of the method are no longer completely fixed, but can be increased or reduced more flexibly under the action of the organic growth mechanism. Training can also start from a small amount of data, without relying on huge manually collected datasets, and then enter the organic growth mode.
Detailed Description
The present invention will be described in further detail with reference to the following detailed description and the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent. The exemplary embodiments of the present invention and the descriptions thereof are used herein to explain the present invention, but are not intended to limit the invention.
The term "comprising" and variations thereof as used herein are open ended, i.e., "including but not limited to". The term "or" means "and/or" unless specifically stated otherwise. The term "based on" means "based at least in part on". The terms "one example embodiment" and "one embodiment" mean "at least one example embodiment". The term "another embodiment" means "at least one additional embodiment". The terms "first," "second," and the like may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
In view of the problems existing in the prior art, the invention provides an agent learning method, system, device and medium which, based on a forward compression theory and a multi-level abstraction theory, give the agent a stronger ability to autonomously cognize the outside world. The application can be applied to many technical fields involving image acquisition, including automatic driving, automatic content generation and various kinds of robotics. In the field of automatic driving, the application aims at truly full-automatic driving rather than "advanced assisted intelligent driving". In the field of automatic content generation, the application can provide a more advanced intelligent creation assistant that generates and processes text, images, video and other content. In the various fields of robotics, such as household robots and work robots, the application can provide stronger and more comprehensive intelligent assistants for humans.
Fig. 1 illustrates an agent learning method 100 according to an embodiment of the present invention. As shown in fig. 1, the method 100 includes the steps of:
S101: the agent establishes effective cognition of the high-dimensional structure of each physical entity based on external scene data, and obtains effective cognition of the low-dimensional structure through segmentation and rendering;
S102: a corresponding small model is established based on each physical entity, and feature cognition based on forward propagation is performed in the small model on the effective cognition of the low-dimensional structure of the physical entity to obtain a feature vector of each physical entity;
S103: a corresponding abstract class is established based on the feature vector of each physical entity, a multi-level abstract network is obtained based on the relevance among the abstract classes, and the multi-level abstract network outputs a decision value;
S104: based on the decision value and on rewards obtained from changes in the physical environment, the agent corrects the decision value output by the multi-level abstract network, the feature vector of the physical entity corresponding to the decision value, and the multi-level abstract network;
S105: a correspondence is established between the corrected feature vector of the physical entity and the current human language, completing the learning of the agent; on the one hand, the agent outputs human-readable feature values and decision values, and on the other hand, the agent can learn directly from human linguistic cognition to make better decisions, so that the learning of the agent is improved.
It should be noted that the high-dimensional structure and the low-dimensional structure described in the present application are relative concepts. A high-dimensional structure generally refers to a data set with a large number of features, where the features may represent different attributes or parameters. In the image analysis of this embodiment, a color picture has three color channels (red, green and blue), so each pixel is composed of three values and the picture forms a three-dimensional data structure; if a time dimension or other dimensions are added, a four-dimensional or higher-dimensional data structure is formed, as in video analysis. A low-dimensional structure is the opposite of a high-dimensional structure: the data set has fewer features, or the data has a lower dimension. In the image analysis of this embodiment, the parameters of each attribute obtained by segmenting or compressing the high-dimensional structure can be regarded as data of a low-dimensional structure. It should be further noted that the two terms are not absolute and depend mainly on the specific application scenario and data set; where necessary, the "high-dimensional structure" may be referred to as a "first structure" and the "low-dimensional structure" as a "second structure" in order to distinguish the two.
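By way of illustration only, the relative notions of high-dimensional and low-dimensional structure can be sketched on a toy image. The pixel values, the helper names `shape` and `channel_means`, and the choice of per-channel averaging as the compression are assumptions made for this sketch and are not part of the described method.

```python
from statistics import mean

# A tiny 2x2 RGB image: height x width x 3 colour channels (values are made up).
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]


def shape(nested):
    """Return the size of each nesting level of a regular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def channel_means(img):
    """Compress an H x W x 3 image into three per-channel averages."""
    pixels = [px for row in img for px in row]
    return [mean(px[c] for px in pixels) for c in range(3)]


video = [image, image, image]      # adding a time axis gives T x H x W x 3
summary = channel_means(image)     # a much lower-dimensional description
```

Here `shape(image)` has three axes, `shape(video)` has four, and `summary` holds only three numbers, which mirrors the sense in which the two structures are relative to each other.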
It should be further noted that the forward propagation described in this embodiment is based on the forward compression theory. It differs from the back propagation algorithm most commonly used at present, in which the loss value is passed from back to front and the parameters are optimized according to the chain rule. The forward compression theory is a newer theory whose core is "calculating from front to back according to a specific rule to find a set of orthogonal basis vectors of the original data, where the number of basis vectors is specified manually and is less than or equal to the dimension of the original data, and the low-dimensional data structure of the original (high-dimensional) data is obtained based on this set of basis vectors"; a typical representative is ReduNet, proposed by Professor Yi Ma of the University of California, Berkeley, in the United States.
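By way of illustration only, a front-to-back computation of a set of orthogonal basis vectors, followed by projection onto that set, can be sketched with plain Gram-Schmidt orthogonalization. This is a stand-in chosen for brevity: it is not ReduNet and not necessarily the rule used by the invention. The function names `orthonormal_basis` and `project`, the tolerance and the sample data are assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(samples, k, tol=1e-9):
    """Walk the samples front to back and keep up to k orthonormal directions."""
    dim = len(samples[0])
    if k > dim:
        raise ValueError("k must not exceed the dimension of the data")
    basis = []
    for x in samples:
        # Remove the part of x already explained by the current basis.
        residual = list(x)
        for b in basis:
            c = dot(residual, b)
            residual = [r - c * bi for r, bi in zip(residual, b)]
        norm = math.sqrt(dot(residual, residual))
        if norm > tol:                      # x contributes a new direction
            basis.append([r / norm for r in residual])
        if len(basis) == k:
            break
    return basis


def project(x, basis):
    """Low-dimensional coordinates of x with respect to the basis."""
    return [dot(x, b) for b in basis]


# Made-up 3-dimensional samples that actually lie in a 2-dimensional plane.
samples = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 2.0, 0.0]]
basis = orthonormal_basis(samples, k=2)
low_dim = project([3.0, 4.0, 0.0], basis)
```

No loss value is propagated backwards in this sketch; the basis is fixed by a single forward pass over the data, and the number of basis vectors `k` is chosen by the caller.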
Still further, the multi-level abstract network described in this embodiment is based on a multi-level abstraction theory, which differs from current algorithms that use structures such as CNN and Transformer. The multi-level abstraction theory starts from the most basic abstract classes, establishes a complex multi-level abstract network according to the relationships between abstract classes, and uses the multi-level abstract network to simulate a world model. The world model is the main function of a brand-new type of large model advocated by Yann LeCun, professor at New York University in the United States and Turing Award winner; its core is to use a large model to cognize and simulate the regularity features of the real world, including regularity features in the 2-dimensional plane, 3-dimensional space and 4-dimensional space-time, and these regularity features jointly form the cognition of the world.
Finally, the agent in this embodiment is an agent capable of autonomously cognizing the regularity features of the real world. The agent does not depend entirely on manually collected data, and in particular not on manually labeled data; it can use such manual data, but can also be trained and formed without relying on it. The zero-sample and small-sample problems of the prior art can therefore be addressed, where the zero-sample problem refers to samples that are completely absent from the original training data of a pre-trained model, and the small-sample problem refers to samples of which only a very small number exist in the original training data of a pre-trained model.
The scheme provided by this embodiment is a general-purpose artificial intelligence algorithm, that is, an algorithm with a certain general capability that can be applied to many fields rather than to a single specific field such as law or investment.
In one embodiment, as shown in fig. 2, the agent in the present application establishes an effective knowledge of a high-dimensional structure of each physical entity based on external scene data, and obtains the effective knowledge of a low-dimensional structure through segmentation and rendering, which specifically includes the following steps:
S1011: collecting and inputting image and video data of different target scenes, the image and video data being acquired by sensing equipment;
S1012: training to obtain a deep neural network of the 3D structure of the target scene;
S1013: segmenting the different physical entity objects in the scene based on the deep neural network of the full scene;
S1014: constructing deep neural networks corresponding to the different object entities, wherein the deep neural networks corresponding to the different object entities constitute the effective cognition of the high-dimensional structure of each physical entity;
S1015: rendering 2D images of the physical entities from different view angles based on a rendering algorithm of the deep neural network, wherein these 2D images serve as the effective cognition of the low-dimensional structure of each physical entity.
It should be noted that the deep neural network is a neural radiance field (NeRF) model. NeRF is a 3D reconstruction technique based on a multi-layer perceptron (MLP) deep neural network: a scene is modeled as a continuous 5D radiance field stored implicitly in the neural network, a NeRF model can be obtained by training on multi-angle 2D images alone, and a clear picture from any view angle can then be rendered from the model. As shown in fig. 3, the neural radiance field gives the correspondence between images photographed by the camera at different angles and the real scene: a ray emitted from the camera passes through the real scene and corresponds to the value of one pixel of the image at that angle. Knowing the world coordinate position of the camera (three coordinate values (x, y, z)) and the direction d of the ray (two direction values (theta, phi)), each ray can be traced. The multi-layer perceptron (MLP) in the NeRF model is then trained to produce four outputs for the ray (three are the R, G, B color values on the image corresponding to the ray, and the fourth is the volume density value of the corresponding entity in the scene), and the RGB values and the volume density value are used by an image rendering method to obtain the pixel values of points on the image at that camera position and angle.
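By way of illustration only, the step of turning color values and volume density values along a ray into one pixel value can be sketched with the discrete volume rendering sum commonly used with NeRF, in which each sample contributes its color weighted by its opacity and by the light that has not yet been absorbed. The function name `render_ray` and the sample values are assumptions; in a real system the color and density of each sample would come from the trained MLP rather than being written by hand.

```python
import math


def render_ray(samples):
    """Composite (rgb, sigma, delta) samples ordered from near to far.

    rgb   : colour at the sample, components in [0, 1]
    sigma : volume density at the sample
    delta : distance to the next sample along the ray
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0                          # light not yet absorbed
    for rgb, sigma, delta in samples:
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha
        pixel = [p + weight * c for p, c in zip(pixel, rgb)]
        transmittance *= 1.0 - alpha
    return pixel


# Made-up ray: empty space, then a half-transparent green layer,
# then an effectively opaque red surface.
ray = [
    ((0.0, 0.0, 1.0), 0.0, 1.0),
    ((0.0, 1.0, 0.0), math.log(2.0), 1.0),
    ((1.0, 0.0, 0.0), 1e9, 1.0),
]
pixel = render_ray(ray)
```

In this sketch the empty segment contributes nothing, the green layer absorbs half of the light, and the red surface receives the remaining half, so the pixel is an even mix of green and red.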
When a large number of images from different camera positions and angles are available, the pixel values calculated for points on the image can be compared with the pixel values of the actually photographed image to obtain a difference value; the parameters of the model are continuously corrected by back propagation until the difference between the two pixel values is minimized, finally yielding the neural radiance field (NeRF) corresponding to the entity in the scene. Thereafter, the neural radiance field can be used to compute, as needed, the image seen by a camera at any position and any angle, which serves as the input to the small model.
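By way of illustration only, the idea of correcting a parameter until the rendered pixel value matches the photographed pixel value can be sketched on a deliberately tiny problem with a single density parameter and a single pixel. The one-sample renderer, the squared-error loss, the learning rate and the names `rendered_intensity` and `fit_density` are assumptions for this sketch; a real NeRF optimizes all MLP weights over many rays.

```python
import math


def rendered_intensity(sigma, delta=1.0):
    """Intensity of a one-sample ray through a white medium of density sigma."""
    return 1.0 - math.exp(-sigma * delta)


def fit_density(target, sigma=0.1, lr=1.0, steps=200, delta=1.0):
    """Minimise (rendered - target)^2 over sigma by plain gradient descent."""
    for _ in range(steps):
        error = rendered_intensity(sigma, delta) - target
        # Chain rule: d(rendered)/d(sigma) = delta * exp(-sigma * delta)
        grad = 2.0 * error * delta * math.exp(-sigma * delta)
        sigma -= lr * grad
    return sigma


# A photographed pixel of intensity 0.5 is explained by sigma = ln 2.
fitted = fit_density(target=0.5)
```

The gradient line is the chain rule written out by hand for one parameter, which is what back propagation automates for a full network.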
In one embodiment, in step S2 of the present application, an organic growth mechanism is adopted in the feature cognition of the small models. The organic growth mechanism continuously generates a large number of small models and organically combines them into a large model; each small model corresponds to one abstract class, and different small models differ clearly in function and in the abstract classes to which they correspond. When the existing small models cannot effectively identify new data, the organic growth mechanism actively builds, alongside the current small models, a new small model suited to identifying the new data, and the new small model is used for the new feature cognition.
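By way of illustration only, the organic growth mechanism can be sketched as a collection of small models that grows whenever no existing small model recognizes a new input. The classes `SmallModel` and `GrowingModel`, the use of cosine similarity to a stored prototype as the recognition test, and the threshold of 0.9 are all assumptions made for this sketch.

```python
import math


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


class SmallModel:
    """Hypothetical small model: recognises inputs close to one prototype."""

    def __init__(self, class_id, prototype):
        self.class_id = class_id          # one abstract class per small model
        self.prototype = list(prototype)

    def score(self, x):
        return cosine(self.prototype, x)


class GrowingModel:
    """Large model formed as a growing collection of small models."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.small_models = []

    def recognise(self, x):
        best = max(self.small_models, key=lambda m: m.score(x), default=None)
        if best is not None and best.score(x) >= self.threshold:
            return best.class_id
        # No existing small model handles x: grow a new one for it.
        new_model = SmallModel(len(self.small_models), x)
        self.small_models.append(new_model)
        return new_model.class_id


model = GrowingModel()
first = model.recognise([1.0, 0.0])     # nothing known yet, so a model is built
again = model.recognise([0.99, 0.05])   # close to the first prototype
other = model.recognise([0.0, 1.0])     # unrecognised, so the model grows
```

The number of small models, and hence the parameter count of the combined model, is not fixed in advance but follows the data, which is the property the paragraph above attributes to organic growth.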
In one embodiment, establishing a corresponding abstract class based on the feature vector of each physical entity, obtaining a multi-level abstract network based on the relevance between abstract classes, and outputting a decision value by the multi-level abstract network specifically comprises the following steps:
S1031: taking the extracted features as corresponding abstract classes, and establishing relations (values) among different abstract classes;
S1032: constructing a multi-level abstract network based on the relations (values) of the different abstract classes;
S1033: analyzing and identifying the features of any new entity based on the multi-level abstract network;
wherein the relation (value) includes a specific numerical value of the corresponding relation, and also includes the logical relationship between the abstract classes.
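By way of illustration only, steps S1031 to S1033 can be sketched with abstract classes represented as sets of features. The use of Jaccard overlap as the relation value, the rule that strongly related classes form a higher-level class from their shared features, the threshold, and all names and feature labels below are assumptions for this sketch, not the relation defined by the invention.

```python
from itertools import combinations


def relation_value(features_a, features_b):
    """Jaccard overlap of two feature sets (an assumed relation measure)."""
    union = features_a | features_b
    return len(features_a & features_b) / len(union) if union else 0.0


def build_network(abstract_classes):
    """S1031/S1032: return {(name_a, name_b): value} for related classes."""
    edges = {}
    for (a, fa), (b, fb) in combinations(sorted(abstract_classes.items()), 2):
        value = relation_value(fa, fb)
        if value > 0:
            edges[(a, b)] = value
    return edges


def add_level(abstract_classes, edges, min_value=0.5):
    """Form higher-level classes from the shared features of related pairs."""
    return {f"{a}+{b}": abstract_classes[a] & abstract_classes[b]
            for (a, b), value in edges.items() if value >= min_value}


def identify(entity_features, abstract_classes):
    """S1033: pick the abstract class most related to a new entity."""
    return max(sorted(abstract_classes),
               key=lambda n: relation_value(entity_features, abstract_classes[n]))


classes = {
    "cup": {"handle", "round", "hollow"},
    "bowl": {"round", "hollow"},
    "book": {"flat", "rectangular"},
}
edges = build_network(classes)
upper = add_level(classes, edges)
guess = identify({"round", "hollow", "handle", "tall"}, classes)
```

Here the numerical part of the relation is the overlap score, while the logical part is carried by which features are shared; repeating `add_level` on its own output would add further levels.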
In one embodiment, as shown in fig. 6, the multi-level abstract network described herein is used to simulate a world model; the world model is a large model used to simulate the regularity features of the real world, which together make up the cognition of the world. The regularity features include regularity features in two-dimensional planes, three-dimensional space, and four-dimensional space-time. As shown in fig. 5, an abstract class is a set containing one or more features, and abstract classes sharing the same features are associated to obtain the multi-level abstract network; a person skilled in the art can set the associated content according to actual needs so that the output decision value approaches the actual situation. It should be noted that the extracted features are taken as corresponding abstract classes, relation values between different abstract classes are established, a multi-level abstract network is constructed based on the relation values, and the features of any new entity are then analyzed and identified based on the multi-level abstract network.
In one embodiment, in step S4 the agent generates an action based on the decision value, and the rewards include human supervision feedback and real-world feedback. It should be noted that the rewards include positive rewards, which encourage the model to develop toward a specific target or direction during training, and negative rewards, which discourage the model from behavior that is unfavorable to humans or otherwise adverse.
In one embodiment, establishing the correspondence between the corrected feature vector of the physical entity and the current human language includes naming each physical entity based on its feature vector, optimizing the feature vector of each physical entity based on more external scene data, and at the same time establishing a search and recommendation mechanism. In this embodiment, as shown in fig. 7, the process of establishing the correspondence between the corrected feature vector of the physical entity and the current human language, outputting a human-readable decision value, and completing the learning of the agent includes the following steps:
S1051: inputting a predicted value/decision value D (Decision) made by the large model in an original state S, wherein the large model is the multi-level abstract network, and the original state S refers to the external scene data most recently input or collected, from which the feature vector of each physical entity has been obtained;
S1052: taking an action A based on the predicted/decision value D;
S1053: the action A produces a result, causing an environmental change S' in the different physical entity objects, wherein the influencing factors include not only the environmental change but also feedback R given by humans for the action A;
S1054: obtaining a feedback/reward Z from the environmental change;
S1055: obtaining an adjusted predicted/decision value D' and action A' based on the feedback/reward Z;
S1056: feeding the adjusted predicted/decision value D' back to the large model, and correcting the parameters to optimize the model.
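By way of illustration only, the loop of steps S1051 to S1056 can be sketched with a small table of decision values. How the environmental change and the human feedback R are combined into the reward Z, the update rule that moves D toward Z, the learning rate, and the names `total_reward`, `choose_action` and `correct` are assumptions for this sketch rather than the correction actually used by the invention.

```python
def total_reward(env_change, human_feedback, human_weight=0.5):
    """Combine an environment-derived score with human feedback R into Z."""
    return (1.0 - human_weight) * env_change + human_weight * human_feedback


def choose_action(decisions, state):
    """S1052: take the action with the highest decision value D in a state."""
    return max(sorted(decisions[state]), key=lambda a: decisions[state][a])


def correct(decisions, state, action, reward, lr=0.5):
    """S1055/S1056: move D(state, action) toward the reward, giving D'."""
    old = decisions[state][action]
    decisions[state][action] = old + lr * (reward - old)
    return decisions[state][action]


# One pass of the loop on made-up numbers.
decisions = {"S": {"push": 0.2, "pull": 0.1}}                  # D in state S
action = choose_action(decisions, "S")                         # A
reward = total_reward(env_change=-1.0, human_feedback=-1.0)    # Z from S' and R
adjusted = correct(decisions, "S", action, reward)             # D'
next_action = choose_action(decisions, "S")                    # A'
```

In this run the first action is penalized by both the environment and the human, its decision value drops below that of the alternative, and the adjusted action A' therefore differs from A.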
It should be further noted that, as shown in fig. 8, when the small model obtains the feature vector of each physical entity, the feature vector is named through the following steps:
S1061: inputting the feature vector of each physical entity from the small model;
S1062: naming the input features in human language;
S1063: establishing a correspondence between the features and the names;
S1064: based on the correspondence, and using more image or video data, refining the feature vector of the physical entity of step S1061 so that it becomes clearer;
This process follows the way humans name physical entities: it establishes the correspondence between feature vectors and names, and refines the feature vectors obtained by the small model based on subsequently input image or video data, thereby improving the retrieval mechanism and helping the agent identify physical entities quickly.
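By way of illustration only, steps S1061 to S1064 and the associated retrieval can be sketched as a small index that maps names to feature vectors. The class `NameIndex`, the running-average refinement and the nearest-vector search are assumptions for this sketch; the invention does not specify these particular choices.

```python
import math


class NameIndex:
    """Hypothetical store linking human-language names to feature vectors."""

    def __init__(self):
        self.vectors = {}   # name -> averaged feature vector
        self.counts = {}    # name -> number of observations merged so far

    def name(self, label, vector):
        """S1062-S1064: attach a name, or refine an already named vector."""
        if label not in self.vectors:
            self.vectors[label] = list(vector)
            self.counts[label] = 1
            return
        n = self.counts[label] + 1
        old = self.vectors[label]
        # Running average: each new observation nudges the stored vector.
        self.vectors[label] = [o + (v - o) / n for o, v in zip(old, vector)]
        self.counts[label] = n

    def search(self, vector):
        """Return the stored name whose vector is closest to the query."""
        return min(self.vectors,
                   key=lambda k: math.dist(self.vectors[k], vector))


index = NameIndex()
index.name("cup", [1.0, 0.0])
index.name("cup", [0.8, 0.2])    # a later observation refines the vector
index.name("book", [0.0, 1.0])
found = index.search([0.7, 0.1])
```

The same index serves both directions: a name gives access to the stored feature vector, and a new feature vector can be looked up to retrieve the closest known name.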
In addition, as shown in fig. 9, the present invention provides an agent learning system 900, the system 900 including a preprocessing module 910, a small model module 920, an abstract network module 930, a correction module 940, and an output module 950.
The preprocessing module 910 is configured to enable the agent to establish an effective knowledge of the high-dimensional structure of each physical entity based on the external scene data, and obtain an effective knowledge of the low-dimensional structure through segmentation and rendering;
the small model module 920 is configured to establish a corresponding small model based on each physical entity, and perform feature cognition based on forward propagation on the effective cognition of the low-dimensional structure of the physical entity in the small model to obtain a feature vector of each physical entity;
the abstract network module 930 is configured to establish a corresponding abstract class based on the feature vector of each physical entity, and obtain a multi-level abstract network based on the relevance between the abstract classes, and the multi-level abstract network outputs a decision value;
the correction module 940 is configured to enable the agent to correct, based on the decision value and on rewards derived from changes in the physical environment, the decision value output by the multi-level abstract network, the feature vector of the physical entity corresponding to the decision value, and the multi-level abstract network;
the output module 950 is configured to establish a correspondence between the corrected feature vector of the physical entity and the current human language, thereby completing the learning of the agent.
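By way of illustration only, the chaining of the five modules of system 900 can be sketched as a pipeline of interchangeable stages. The class `AgentLearningSystem`, the `Stage` type and the `run` method are assumptions for this sketch; each stage is left as a caller-supplied function because the internal behavior of the modules is described elsewhere in this specification.

```python
from dataclasses import dataclass
from typing import Any, Callable

Stage = Callable[[Any], Any]


@dataclass
class AgentLearningSystem:
    """Hypothetical wiring of modules 910 to 950 as a sequential pipeline."""

    preprocessing: Stage       # module 910: scene data to low-dimensional views
    small_model: Stage         # module 920: views to feature vectors
    abstract_network: Stage    # module 930: feature vectors to decision value
    correction: Stage          # module 940: reward-based correction
    output: Stage              # module 950: mapping to human language

    def run(self, scene_data):
        data = scene_data
        for stage in (self.preprocessing, self.small_model,
                      self.abstract_network, self.correction, self.output):
            data = stage(data)
        return data
```

Keeping each module behind the same call signature means a stage can be replaced, for example by a different small model, without changing the surrounding system.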
In an embodiment of the present invention, there is provided an electronic device including: a processor and a memory storing a computer program, the processor being configured to perform any of the agent learning methods of the embodiments of the present invention when the computer program is run.
Fig. 10 shows a schematic diagram of an electronic device 1000 that may implement embodiments of the present invention; in some embodiments, more or fewer electronic devices than shown may be included. In some embodiments, implementation may be with a single electronic device or with multiple electronic devices. In some embodiments, implementation may be with cloud or distributed electronic devices.
As shown in fig. 10, the electronic device 1000 includes a processor 1001 that can perform various appropriate operations and processes in accordance with programs and/or data stored in a Read Only Memory (ROM) 1002 or programs and/or data loaded from a storage portion 1008 into a Random Access Memory (RAM) 1003. The processor 1001 may be a multi-core processor, or may include a plurality of processors. In some embodiments, the processor 1001 may include a general-purpose main processor and one or more special coprocessors, such as a Central Processing Unit (CPU), a Graphics Processor (GPU), a neural Network Processor (NPU), a Digital Signal Processor (DSP), and so forth. In the RAM 1003, various programs and data necessary for the operation of the electronic apparatus 1000 are also stored. The processor 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
The above-described processor is used in combination with a memory to execute a program stored in the memory, which when executed by a computer is capable of implementing the methods, steps or functions described in the above-described embodiments.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, a touch screen, and the like; an output portion 1007 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage portion 1008 including a hard disk or the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. A drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 1010 as needed, so that a computer program read from it can be installed into the storage portion 1008 as needed. Only some of the components are schematically illustrated in fig. 10, which does not mean that the electronic device 1000 includes only the components illustrated in fig. 10.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer or its associated components. The computer may be, for example, a mobile terminal, a smart phone, a personal computer, a laptop computer, a car-mounted human-computer interaction device, a personal digital assistant, a media player, a navigation device, a game console, a tablet, a wearable device, a smart television, an internet of things system, a smart home, an industrial computer, a server, or a combination thereof.
Although not shown, in an embodiment of the present invention, there is provided a storage medium storing a computer program configured to, when executed, perform any of the agent learning methods of the embodiments of the present invention.
Storage media in embodiments of the invention include permanent and non-permanent, removable and non-removable media that may implement information storage by any method or technology. Examples of storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device.
Methods, programs, systems, apparatus, etc. in accordance with embodiments of the invention may be implemented or realized in single or multiple networked computers, or in distributed computing environments. In the present description embodiments, tasks may be performed by remote processing devices that are linked through a communications network in such a distributed computing environment.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Thus, it will be apparent to those skilled in the art that the functional modules/units or controllers and associated method steps set forth in the above embodiments may be implemented in software, hardware, and a combination of software/hardware.
The acts of the methods, procedures, or steps described in accordance with the embodiments of the present invention do not have to be performed in a specific order and still achieve desirable results unless explicitly stated. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Various embodiments of the invention are described herein, but for brevity the description of each embodiment is not exhaustive, and features or parts that are the same or similar between embodiments may be omitted. Herein, "one embodiment," "some embodiments," "example," "specific example," or "some examples" means applicable to at least one embodiment or example according to the present invention, but not necessarily to all embodiments. The above terms do not necessarily refer to the same embodiment or example. Those skilled in the art may combine the different embodiments or examples described in this specification, and the features of the different embodiments or examples, provided that they do not contradict each other.
The exemplary systems and methods of the present invention have been particularly shown and described with reference to the foregoing embodiments, which are merely examples of the best modes for carrying out the systems and methods. It will be appreciated by those skilled in the art that various changes may be made to the embodiments of the systems and methods described herein in practicing the systems and/or methods without departing from the spirit and scope of the invention as defined in the following claims.