If the current weight value is 0.5 and the current error value is 0.1, the vector information is the gradient vector value when x is 0.5 and y is 0.1.

Step S20, training the longitudinal federal model of the preset longitudinal federal service side based on the vector information, and updating the network weight of each preset reinforcement learning model;

in this embodiment, it should be noted that the vector information includes a gradient vector.

Training the longitudinal federal model of the preset longitudinal federal service party based on the vector information, updating the network weight of each preset reinforcement learning model, specifically, training the longitudinal federal model of the preset longitudinal federal service party based on the vector information to obtain sample information, training each preset reinforcement learning model based on the sample information, and updating the network weight of each preset reinforcement learning model.

And step S30, inputting each updated preset reinforcement learning model into a preset horizontal federal server periodically, and performing iterative updating on each updated preset reinforcement learning model.

In this embodiment, it should be noted that the preset horizontal federal server is a preset server, and may combine different preset reinforcement learning models to perform horizontal federal learning, where the horizontal federal learning is to extract a part of data with the same data characteristics of participants but not identical users to perform joint machine learning when the data characteristics of the participants overlap more and the users overlap less. For example, if two banks in different regions are provided in a participant, the user groups of the two banks are respectively from the regions where the two banks are located, the intersection of the two banks is small, but the businesses of the two banks are very similar, and the recorded user data features are mostly the same, the two banks can be helped to construct a joint model to predict the customer behavior of the two banks by using horizontal federal learning, in addition, all information interaction in the embodiment can be selected to be encrypted, and whether the encryption is performed or not can be selected by the user.

And inputting each updated preset reinforcement learning model into a preset horizontal federated server at regular intervals, and performing iterative updating on each updated preset reinforcement learning model. Specifically, the updated model parameters of each preset reinforcement learning model are periodically input into a preset horizontal federal server, the model parameters are fused to obtain global model parameters, the model parameters include gradient information, weight information and the like, the global model parameters are further distributed to each preset reinforcement learning model, each preset reinforcement learning model uses the received global model parameters as a starting point of local model training or as the latest model parameters of the local model to start training or continue training the preset reinforcement learning model, as shown in fig. 2, the reinforcement learning Agent1 and the reinforcement learning Agent2 are different reinforcement learning models, the data is stored as a data storage library for storing sample information, and the data source is used for receiving sensor data sent by each preset reinforcement learning model, the controller is used for realizing the operation corresponding to the control information

In the embodiment, available public information is obtained, the available public information is input into a preset longitudinal federal service side to obtain vector information, then a longitudinal federal model of the preset longitudinal federal service side is trained based on the vector information, the network weight of each preset reinforcement learning model is updated, furthermore, each updated preset reinforcement learning model is input into a preset transverse federal server periodically, and each updated preset reinforcement learning model is updated in an iterative manner. That is, in this embodiment, first, available public information is obtained, and then, the available public information is input to a preset longitudinal federated server to obtain vector information, further, based on the vector information, training of the longitudinal federated model is performed to update the network weight of each preset reinforcement learning model, and finally, each updated preset reinforcement learning model is periodically input to a preset transverse federated server to perform iterative update on each updated preset reinforcement learning model. That is, in the present embodiment, the available public information is input into the preset longitudinal federated model, and the longitudinal federated learning is performed on the preset longitudinal federated model, so that the preset reinforcement learning models are updated, in the present embodiment, the training data for model training is more comprehensive and wide, so that the control performance of the model is improved, the model is more robust, further, the updated preset reinforcement learning models are periodically input into the preset horizontal federated server, the horizontal federated learning is performed on the preset reinforcement learning models, and the updated preset reinforcement learning models are iteratively updated, further, the control performance and robustness of the model are improved, the effective training data of the preset reinforcement learning models are increased, further, the training process with low training effect is reduced, and further, the consumption of the resources of the computing system of the single preset reinforcement learning model is reduced, so that the technical problem of high consumption of the resources of the computing system of the reinforcement learning model in the prior art is solved.

Further, referring to fig. 3, in another embodiment of the method for federated across federations and federated lengthwise federations based on the first embodiment of the present application, the step of training each of the predetermined reinforcement learning models to update each of the predetermined reinforcement learning models based on each of the vector information includes:

step S21, receiving sensor data sent by each preset reinforcement learning model, and generating control information through the longitudinal federal model based on the sensor data and the vector information;

in this embodiment, it should be noted that, based on the control information, a preset reinforcement learning model may be controlled by a preset controller, for example, if the longitudinal federal model is an unmanned vehicle, the traveling speed and the traveling direction of the unmanned vehicle may be controlled by the control information.

Receiving sensor data sent by each preset reinforcement learning model, generating control information through the longitudinal federal model based on the sensor data and the vector information, specifically, acquiring the sensor data from a local data source corresponding to the preset reinforcement learning model, and sending the sensor data to a preset public federal party, wherein the sensor data includes distance sensor data, pressure sensor data, speed sensor data and the like, that is, the sensor data indicates state information of the current time step of the longitudinal federal model, and further generating the control information through the longitudinal federal model based on the sensor data and the vector information, wherein the direction of a gradient vector corresponding to the vector information is the direction in which the longitudinal federal model needs to be trained, so that the longitudinal federal model trains the state information of the next time step, the control information may control the longitudinal federated model to train towards next time step state information.

Step S22, training the longitudinal federal model under the training environment corresponding to the control information to obtain reward information and state information of the next time step;

in this embodiment, it should be noted that the reward information is obtained by calculation through a preset reward function, the reward function is used for adding a non-linear factor to the longitudinal federal model, the next time step state information is the model state information of the longitudinal federal model after the network weight of the longitudinal federal model is updated after the longitudinal federal model is trained, and before the longitudinal federal model is updated, that is, before the next time step state information is obtained, it is determined whether the update is favorable for reducing a model error, if the model error can be reduced, the update is performed, and if the model error cannot be reduced, the update is not performed.

In a training environment corresponding to the control information, training the longitudinal federal model to obtain reward information and state information of a next time step, and specifically, in the training environment corresponding to the control information, training the longitudinal federal model to obtain reward information and network weight of each neuron of a neural network in the longitudinal federal model, that is, to obtain reward information and state information of the next time step, wherein the neuron includes a convolutional layer, a pooling layer, a full connection layer, and the like.

Step S23, storing the reward information, the next time step status information, and the control information as sample information, and updating the network weight of each of the preset reinforcement learning models based on the sample information.

In this embodiment, the reward information, the next time step state information, and the control information are stored as sample information, and the network weight of each of the preset reinforcement learning models is updated based on the sample information, specifically, the reward information, the next time step state information, and the control information are combined into sample information and stored in a data repository corresponding to each of the preset reinforcement learning models, and then each of the preset reinforcement learning models can extract the sample information from the data repository corresponding to each of the preset reinforcement learning models for training, and update the network weight of each of the preset reinforcement learning models according to a training result.

In step S23, the step of updating the network weight of each of the preset reinforcement learning models based on the sample information includes:

step S231, inputting the sample information as training data into the preset reinforcement learning model to train the preset reinforcement learning model to obtain a training output value;

In this embodiment, the sample information is input to the preset reinforcement learning model as training data to train the preset reinforcement learning model to obtain a training output value, and specifically, the sample information is input to the preset reinforcement learning model as training data to perform data processing on the training data, where the data processing includes convolution, pooling, full connection, and the like, so as to obtain a training output value, where the training output value includes an image, a vector, a numerical value, and the like.

Step S232, comparing the training output value with a real output value corresponding to the training data to obtain a model error value;

in this embodiment, the training output value is compared with a real output value corresponding to the training data to obtain a model error value, specifically, for example, if the training output value is X and the real output value is Y, a difference between the training output value and the real output value is X-Y, and the current error value is (X-Y)/X.

Step S233, comparing the model error value with a preset error threshold, and if the model error value is smaller than the preset error threshold, completing training of the preset reinforcement learning model;

In this embodiment, it should be noted that the condition that the model error value is smaller than the preset error threshold is one of optional training completion conditions for completing the training of the preset reinforcement learning model, where the training completion conditions further include loss function convergence, model parameter convergence, maximum iteration number reaching, maximum training time reaching, and the like, and the model parameter includes the model error value.

In step S234, if the model error value is greater than or equal to the preset error threshold, the network weight of the preset reinforcement learning model is updated based on the model error value, and the preset reinforcement learning model is retrained.

In this embodiment, it should be noted that the network weight is a convolution kernel or a weight matrix.

If the model error value is greater than or equal to the preset error threshold value, updating the network weight of the preset reinforcement learning model based on the model error value, and retraining the preset reinforcement learning model, specifically, if the model error value is greater than or equal to the preset error threshold value, obtaining a corresponding gradient vector value based on the model error value, updating the network weight of the preset reinforcement learning model based on the gradient vector value, and retraining the preset reinforcement learning model until a preset training completion condition is reached.

In this embodiment, the control information is generated by the longitudinal federal model by receiving the sensor data sent by each preset reinforcement learning model, based on the sensor data and the vector information, and then the longitudinal federal model is trained in the training environment corresponding to the control information to obtain reward information and next time step status information, and further, the reward information, the next time step status information and the control information are stored as sample information, and the network weight of each preset reinforcement learning model is updated based on the sample information. That is, in this embodiment, first, sensor data is obtained, then, control information is generated through the longitudinal federated model based on the sensor data and the vector information, further, in a training environment corresponding to the control information, training of the longitudinal federated model is performed, reward information and next time step state information are obtained, and finally, the reward information, the next time step state information and the control information are stored, so as to obtain sample information, so that, based on the sample information, each of the preset reinforcement learning models is updated. That is, in the embodiment, the available common information corresponding to each of the preset reinforcement learning models is converted into sample information, so that the purpose of training and updating each of the preset reinforcement learning models by combining data of a plurality of the preset reinforcement learning models is achieved, the control performance and the robustness of each of the preset reinforcement learning models are greatly enhanced, the model training time and the training amount of a single preset reinforcement learning model are reduced, and the resource consumption of a computing system of the single preset reinforcement learning model is reduced.

Further, referring to fig. 4, in another embodiment of the horizontal federal and vertical federal combined method based on the first embodiment and the second embodiment of the present application, the step of periodically inputting each updated preset reinforcement learning model into a preset horizontal federal server, and the step of iteratively updating each updated preset reinforcement learning model includes:

step S31, regularly inputting each updated preset reinforcement learning model into the preset horizontal federal server, so as to perform horizontal federal on each updated preset reinforcement learning model based on preset federal rules, and obtain a horizontal federal model;

in this embodiment, it should be noted that the preset horizontal federal server is a preset server that can be used for horizontal federal learning, and the regular period may be set by a user, for example, if the regular period is set to 10 minutes, the updated preset reinforced learning models are sent to the preset horizontal federal server every 10 minutes.

The updated preset reinforcement learning models are periodically input into the preset transverse federal server to perform transverse federal on the updated preset reinforcement learning models based on preset federal rules to obtain transverse federal models, specifically, the updated preset reinforcement learning models are periodically input into the preset transverse federal server to send model parameters of the preset reinforcement learning models to the transverse federal server to fuse the model parameters to obtain global model parameters, and the preset reinforcement learning models are updated based on the global model parameters to obtain transverse federal models.

Wherein each of the updated preset reinforcement learning models comprises updated model parameters,

step S311, regularly inputting each updated model parameter into the preset horizontal federated server to fuse each updated model parameter to obtain a global model parameter;

in this embodiment, each update model parameter is periodically input to the preset horizontal federal server to fuse each update model parameter, so as to obtain a global model parameter, specifically, each update model parameter is input to the preset horizontal federal server to perform data processing of a preset rule on each update model parameter, where the data processing of the preset rule includes averaging, weighted averaging, and the like, so as to obtain the global model parameter, and a weight ratio corresponding to each update model parameter participating in weighted averaging is set by a user.

Step S312, distributing the global model parameters to each updated preset reinforcement learning model, so as to train the updated preset reinforcement learning model based on the global model parameters, and obtain the horizontal federal model.

In this embodiment, the global model parameters are distributed to each updated preset reinforcement learning model, so that the updated preset reinforcement learning model is trained based on the global model parameters to obtain the horizontal federated model, specifically, the global model parameters are distributed to each updated preset reinforcement learning model, so that the global model parameters are used as a model training starting point of each preset reinforcement learning model or directly replace local model parameters of each preset reinforcement learning model, and then the updated preset reinforcement learning model is trained to obtain the horizontal federated model.

And step S32, performing iterative updating on each updated preset reinforcement learning model based on the transverse federal model.

In this embodiment, each updated preset reinforcement learning model is iteratively updated based on the horizontal federal model, specifically, based on global model parameters in the horizontal federal model, the global model parameters are used as model training starting points of each preset reinforcement learning model or directly replace local model parameters of each preset reinforcement learning model, so as to train the updated preset reinforcement learning model, and determine whether the trained preset reinforcement learning model reaches training completion conditions, if the training completion conditions are reached, the training of the preset reinforcement learning model is completed, if the training completion conditions are not reached, the network weights of the preset reinforcement learning model are updated, the preset reinforcement learning model is retrained, and the training completion conditions are known to be reached, the training completion conditions comprise loss function convergence, model parameter convergence, maximum iteration times, maximum training time and the like.

In this embodiment, each updated preset reinforcement learning model is periodically input to the preset horizontal federal server, so that each updated preset reinforcement learning model is subjected to horizontal federal based on preset federal rules to obtain a horizontal federal model, and then each updated preset reinforcement learning model is subjected to iterative update based on the horizontal federal model. That is, in this embodiment, each updated preset reinforcement learning model is periodically input to the preset horizontal federal server, so as to perform horizontal federation on each updated preset reinforcement learning model based on preset federal rules to obtain a horizontal federal model, and then perform iterative update on each updated preset reinforcement learning model based on the horizontal federal model. That is, the implementation provides a method for performing horizontal federation, in which updated preset reinforcement learning models are periodically input to a preset horizontal federation server, the updated preset reinforcement learning models are combined to learn, a horizontal federation model corresponding to each updated preset reinforcement learning model is obtained, and then, based on the horizontal federation model, each updated preset reinforcement learning model is iteratively updated, so that the control performance and robustness of the model are further improved, the model training time and training amount of a single preset reinforcement learning model are reduced, and further, the resource consumption of a computing system of the single preset reinforcement learning model is reduced, and therefore, a foundation is laid for solving the technical problems of poor control performance and low robustness of the reinforcement learning model in the prior art.

Referring to fig. 5, fig. 5 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.

As shown in fig. 5, the horizontal federal and vertical federal combined facility may include: aprocessor 1001, such as a CPU, amemory 1005, and acommunication bus 1002. Thecommunication bus 1002 is used for realizing connection communication between theprocessor 1001 and thememory 1005. Thememory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). Thememory 1005 may alternatively be a memory device separate from theprocessor 1001 described above.

Optionally, the horizontal and vertical federal combined devices may further include a rectangular user interface, a network interface, a camera, RF (Radio Frequency) circuits, a sensor, an audio circuit, a WiFi module, and the like. The rectangular user interface may comprise a Display screen (Display), an input sub-module such as a Keyboard (Keyboard), and the optional rectangular user interface may also comprise a standard wired interface, a wireless interface. The network interface may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface).

It will be understood by those skilled in the art that the configuration of the horizontal and vertical federal combined plant illustrated in fig. 5 is not intended to be limiting of the horizontal and vertical federal combined plant and may include more or fewer components than those illustrated, or some components in combination, or a different arrangement of components.

As shown in fig. 5, amemory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, and horizontal and vertical federal union programs. The operating system is a program that manages and controls the hardware and software resources of the horizontal and vertical federal combined device, and supports the operation of the horizontal and vertical federal combined device and other software and/or programs. The network communication module is used to implement communication between the components within thememory 1005, as well as communication with other hardware and software in the federated system across and across federations.

In the horizontal federal and vertical federal combined facility shown in fig. 5, theprocessor 1001 is configured to execute a horizontal federal and vertical federal combined program stored in thememory 1005, and implement the steps of any one of the horizontal federal and vertical federal combined methods described above.

The specific implementation of the horizontal federal and vertical federal combined device of the present invention is basically the same as the embodiments of the horizontal federal and vertical federal combined method, and is not described herein again.

The invention also provides a lateral federal and longitudinal federal combined device, which comprises:

Optionally, the first updating module includes:

Optionally, the first updating unit includes:

Optionally, the second updating module includes:

Optionally, the periodic transmitting unit includes:

Optionally, the input module comprises:

The specific implementation of the horizontal federal and vertical federal combined device of the present invention is basically the same as the above-mentioned embodiments of the horizontal federal and vertical federal combined method, and is not described herein again.

The present invention provides a medium which is a computer readable storage medium storing one or more programs which are further executable by one or more processors for implementing the steps of any of the above-described horizontal and vertical federal federation methods.

The specific implementation of the medium of the present invention is basically the same as the embodiments of the above-mentioned horizontal federal and vertical federal combined method, and is not described herein again.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A method of horizontal federation and vertical federation combination, the method comprising:

acquiring available public information, inputting the available public information into a longitudinal federal model of a preset longitudinal federal service party, and acquiring vector information, wherein the available public information is available model training information, and the vector information is a gradient vector value;

training the longitudinal federated model based on the vector information, and updating the network weight of each preset reinforcement learning model;

inputting each updated preset reinforcement learning model into a preset horizontal federal server at regular intervals, and performing iterative updating on each updated preset reinforcement learning model;

the step of training the longitudinal federated model based on the vector information and updating the network weights of the preset reinforcement learning models comprises the following steps:

training and updating the longitudinal federated model based on the vector information to obtain sample information, wherein the sample information is obtained by combining control information, next time step state information and reward information generated in the process of training and updating the longitudinal federated model;

And training each preset reinforcement learning model according to the sample information, and updating the network weight of each preset reinforcement learning model.

2. The method according to claim 1, wherein the step of training the longitudinal federated model based on the vector information and updating the network weights of the respective pre-set reinforcement learning models comprises:

3. The method of claim 2, wherein the step of updating the network weights of the pre-set reinforcement learning models based on the sample information comprises:

comparing the model error value with a preset error threshold, and finishing the training of the preset reinforcement learning model if the model error value is smaller than the preset error threshold;

and if the model error value is greater than or equal to the preset error threshold value, updating the network weight of the preset reinforcement learning model based on the model error value, and retraining the preset reinforcement learning model.

4. The method of claim 1, wherein the step of periodically inputting each updated pre-set reinforcement learning model to a pre-set horizontal federated server, and the step of iteratively updating each updated pre-set reinforcement learning model comprises:

5. The method of claim 4, wherein each of the updated pre-defined reinforcement learning models comprises updated model parameters,

6. The method of claim 1, wherein the pre-set longitudinal federal service policy includes a longitudinal federal model, the longitudinal federal model including current weight values,

7. The method of claim 1, wherein the step of obtaining available public information comprises:

8. A reinforcement learning apparatus for federal learning, which is applied to reinforcement learning equipment for federal learning, the reinforcement learning apparatus for federal learning comprising:

The system comprises an input module, a processing module and a processing module, wherein the input module is used for acquiring available public information and inputting the available public information into a longitudinal federal model of a preset longitudinal federal service side to acquire vector information, the available public information is available model training information, and the vector information is a gradient vector value;

the first updating module is used for training the longitudinal federated model based on the vector information and updating the network weight of each preset reinforcement learning model;

the second updating module is used for inputting each updated preset reinforcement learning model into a preset horizontal federal server periodically and carrying out iterative updating on each updated preset reinforcement learning model;

9. A federally learned reinforcement learning apparatus, comprising: a memory, a processor, and a program stored on the memory for implementing the reinforcement learning method for federal learning,

the memory is used for storing a program of a reinforced learning method realized in federal learning;

the processor is configured to execute a program for implementing the reinforcement learning method for federal learning so as to implement the steps of the reinforcement learning method for federal learning of any one of claims 1 to 7.

10. A medium having stored thereon a program for a method of reinforcement learning for federal learning, the program for a method of reinforcement learning for federal learning being executed by a processor to perform the steps of a method of reinforcement learning for federal learning as claimed in any one of claims 1 to 7.