Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a device in a hardware operating environment according to an embodiment of the present invention.
It should be noted that the horizontal federated learning system optimization device in the embodiment of the present invention may be a smart phone, a personal computer, a server, or the like, which is not limited herein.
As shown in FIG. 1, the horizontal federated learning system optimization device may include a processor 1001 (e.g., a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to realize connection and communication between these components. The user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as a disk memory. The memory 1005 may optionally also be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the device configuration shown in FIG. 1 does not constitute a limitation of the horizontal federated learning system optimization device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
As shown in FIG. 1, the memory 1005, which is a type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a horizontal federated learning system optimization program. The operating system is a program that manages and controls the hardware and software resources of the device, and supports the operation of the horizontal federated learning system optimization program and other software or programs.
When the horizontal federated learning system optimization device is a coordinating device participating in horizontal federated learning, in the device shown in FIG. 1, the user interface 1003 is mainly used for data communication with a client, the network interface 1004 is mainly used for establishing a communication connection with each participating device participating in horizontal federated learning, and the processor 1001 may be used to call the horizontal federated learning system optimization program stored in the memory 1005 and perform the following operations:
acquiring device resource information of each participating device participating in horizontal federated learning;
respectively configuring, according to the device resource information, calculation task parameters of each participating device for the federated learning model training process, wherein the calculation task parameters include an expected processing time step and/or an expected processing batch size;
and correspondingly sending the calculation task parameters to each participating device, so that each participating device executes the federated learning task according to its respective calculation task parameters.
Further, the step of respectively configuring, according to the device resource information, the calculation task parameters of each participating device for the federated learning model training process includes:
classifying each participating device according to the device resource information, and determining the resource category to which each participating device belongs;
and respectively configuring, according to the resource category to which each participating device belongs, the calculation task parameters of each participating device for the federated learning model training process.
Further, the step of respectively configuring, according to the resource category to which each participating device belongs, the calculation task parameters of each participating device for the federated learning model training process includes:
respectively determining, according to the resource category to which each participating device belongs, candidate task parameters corresponding to each participating device;
respectively determining, based on the candidate task parameters, the predicted processing duration corresponding to each participating device, and detecting whether a preset duration consistency condition is satisfied among the predicted processing durations;
and if the preset duration consistency condition is satisfied among the predicted processing durations, correspondingly taking the candidate task parameters of each participating device as the calculation task parameters of that participating device.
Further, when the model to be trained in the horizontal federated learning is a recurrent neural network model and the calculation task parameters include an expected processing time step, after the step of sending the calculation task parameters to each participating device, the processor 1001 may be configured to call the horizontal federated learning system optimization program stored in the memory 1005 and further perform the following operations:
configuring, according to the expected processing time step corresponding to each participating device, a time step selection strategy corresponding to each participating device;
and correspondingly sending the time step selection strategy to each participating device, so that each participating device selects sequence selection data from its own sequence data according to its time step selection strategy and executes the federated learning task according to the sequence selection data, wherein the time step of the sequence selection data is less than or equal to the expected processing time step of that participating device.
Further, when the calculation task parameters include an expected processing batch size, after the step of sending the calculation task parameters to each participating device, the processor 1001 may be configured to call the horizontal federated learning system optimization program stored in the memory 1005 and further perform the following operations:
configuring, according to the expected processing batch size corresponding to each participating device, a learning rate corresponding to each participating device;
and correspondingly sending the learning rate to each participating device, so that each participating device executes the federated learning task according to its learning rate and the expected processing batch size received from the coordinating device.
Further, the step of correspondingly sending the calculation task parameters to each participating device, so that each participating device executes the federated learning task according to its respective calculation task parameters, includes:
correspondingly sending the calculation task parameters to each participating device, and sending the predicted duration of the current round of global model updating to each participating device, so that each participating device, when performing local model training according to the calculation task parameters, adjusts the number of rounds of local model training according to the predicted duration.
Further, the step of acquiring device resource information of each participating device participating in horizontal federated learning includes:
receiving device resource information sent by each participating device participating in horizontal federated learning, wherein the device resource information includes at least one or more of power resource information, computing resource information, and communication resource information.
Based on the above structure, various embodiments of the horizontal federated learning system optimization method are provided.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a first embodiment of the horizontal federated learning system optimization method of the present invention. It should be noted that although a logical order is depicted in the flowchart, in some cases the steps depicted or described may be performed in an order different from that presented herein.
In this embodiment, the horizontal federated learning system optimization method is applied to a coordinating device participating in horizontal federated learning, where the coordinating device and each participating device participating in horizontal federated learning may be devices such as a smart phone, a personal computer, or a server. In this embodiment, the horizontal federated learning system optimization method includes:
Step S10, acquiring device resource information of each participating device participating in horizontal federated learning;
In this embodiment, the coordinating device and each participating device may establish a communication connection in advance through handshake and identity authentication, and determine the model to be trained in the federated learning, which may be, for example, a neural network model or another machine learning model. A model to be trained with the same or a similar structure may be built locally by each participating device, or the model to be trained may be built by the coordinating device and then sent to each participating device. Each participating device locally possesses training data for training the model to be trained.
In horizontal federated learning, the coordinating device and the participating devices cooperate to perform multiple rounds of global model updating on the model to be trained, where model updating refers to updating the model parameters of the model to be trained, such as the connection weights between neurons in a neural network, until a model meeting the quality requirement is finally obtained. In one round of global model updating, each participating device performs local training on its local model to be trained using its own local training data to obtain a local model parameter update, which may be gradient information for updating the model parameters or the locally updated model parameters themselves. Each participating device sends its local model parameter update to the coordinating device; the coordinating device fuses the local model parameter updates, for example by weighted averaging, to obtain a global model parameter update and sends it to each participating device; and each participating device uses the global model parameter update to update the model parameters of its local model to be trained, that is, performs a model update on the local model to be trained, thereby completing one round of global model updating. After each round of global model updating, the model parameters of the local models to be trained of the participating devices are synchronized.
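For illustration only, the following minimal Python/NumPy sketch shows one way the weighted-average fusion described above could look; the function and variable names are hypothetical, and weighting by local sample counts is an assumption rather than a requirement of this embodiment.

```python
import numpy as np

def fuse_local_updates(local_updates, sample_counts):
    """Weighted-average fusion of local model parameter updates.

    local_updates: list of 1-D arrays, one per participating device
                   (gradients or locally updated parameters).
    sample_counts: number of local training samples per device,
                   used here as the fusion weights.
    """
    weights = np.asarray(sample_counts, dtype=float)
    weights /= weights.sum()
    stacked = np.stack(local_updates)          # shape: (num_devices, num_params)
    return (weights[:, None] * stacked).sum(axis=0)

# Example: three participating devices with different data volumes.
updates = [np.array([0.1, -0.2]), np.array([0.3, 0.0]), np.array([0.2, 0.1])]
global_update = fuse_local_updates(updates, sample_counts=[100, 300, 600])
print(global_update)   # weighted toward the devices holding more data
```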
During the federated learning process, the coordinating device may obtain device resource information of each participating device. The device resource information may be resource information related to the computing efficiency and communication efficiency of the participating device, for example computing resource information, power resource information, and communication resource information, where the computing resource may be represented by the number of CPUs and GPUs owned by the participating device, the power resource may be represented by the time for which the participating device can continue to operate, and the communication resource may be represented by the communication rate of the participating device. The coordinating device may send a device resource query request to each participating device, and each participating device uploads its current device resource information to the coordinating device after receiving the request.
Step S20, respectively configuring, according to the device resource information, calculation task parameters of each participating device for the federated learning model training process, wherein the calculation task parameters include an expected processing time step and/or an expected processing batch size;
The coordinating device may acquire the device resource information of each participating device before the first round of global model updating, so as to configure for each participating device the calculation task parameters used in the subsequent rounds of global model updating; or it may acquire the device resource information before a particular round, or before every round, of global model updating, so as to configure the calculation task parameters used in that round. That is, during the federated learning model training process, the coordinating device configures the calculation task parameters with which each participating device participates in the training by acquiring the device resource information of each participating device.
The calculation task parameters include an expected processing time step and/or an expected processing batch size, that is, the calculation task parameters include an expected processing time step, or an expected processing batch size, or both. The time step refers to the number of time steps, a concept from recurrent neural network models; for sequence data, the expected processing time step is the number of time steps that the participating device is expected to process. The batch size (mini-batch size) refers to the size of the data batch used for each model update, and the expected processing batch size is the data batch size that the participating device is expected to use when performing a local model update.
After the coordinating device acquires the device resource information of each participating device, it configures the calculation task parameters for each participating device according to the device resource information. Specifically, the richer the computing resources of a participating device, the higher its computing efficiency; the richer its power resources, the longer it can continue to participate in federated learning; and the richer its communication resources (the larger the communication bandwidth and the shorter the communication delay), the more efficiently it can transmit data. For the same computing task, therefore, the richer the device resources, the less time the task takes. The principle for configuring the calculation task parameters is to assign a larger computing task to participating devices with rich device resources and a smaller computing task to participating devices with poor device resources, so that the processing times of the participating devices are as close to each other as possible. The size of the computing task can be quantified by the expected processing time step or the expected processing batch size: a larger expected processing time step or expected processing batch size means that the participating device needs to process more data, so a larger expected processing time step or a larger expected processing batch size indicates a larger computing task. It should be noted that, because what a participating device uploads is gradient information or model parameter information, the size of the uploaded data does not change with the batch size or time step; if the communication resources of a participating device are rich, its data transmission speed is fast, uploading takes less time, and more time can be spent on local model training, so the coordinating device may assign a larger computing task, that is, a larger expected processing time step or expected processing batch size, to a participating device with rich communication resources.
There are various ways in which the coordinating device may configure the calculation task parameters based on the device resource information of the participating devices. For example, a correspondence between device resource information and calculation task parameters may be preset: the device resource information is divided into several segments according to its value, with each segment representing a different degree of resource richness. For example, when the device resource information is the number of CPUs, the CPU count may be divided into several segments, where a larger CPU count represents richer computing resources; the calculation task parameters corresponding to each segment, that is, the correspondence, are preset, with segments representing richer resources being assigned calculation task parameters that define larger computing tasks. The coordinating device then determines into which segment the device resource information of a participating device falls and configures the calculation task parameters corresponding to that segment for the participating device.
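As a hedged illustration of this segment-based correspondence, the sketch below maps a CPU count to calculation task parameters; the segment boundaries and parameter values are invented for the example and are not prescribed by this embodiment.

```python
# (min_cpus, max_cpus, expected_time_step, expected_batch_size) -- illustrative only.
SEGMENTS = [
    (1, 2, 8, 32),
    (3, 4, 16, 64),
    (5, 8, 24, 96),
    (9, float("inf"), 32, 128),
]

def configure_task_parameters(cpu_count):
    """Return (expected_time_step, expected_batch_size) for a device,
    giving devices with richer computing resources a larger computing task."""
    for lo, hi, time_step, batch_size in SEGMENTS:
        if lo <= cpu_count <= hi:
            return time_step, batch_size
    raise ValueError("cpu_count outside configured segments")

print(configure_task_parameters(3))   # -> (16, 64)
```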
Step S30, correspondingly sending the calculation task parameters to each participating device, so that each participating device executes the federated learning task according to its respective calculation task parameters.
The coordinating device correspondingly sends the calculation task parameters of each participating device to that participating device. After receiving the calculation task parameters, the participating device executes the federated learning task using the received calculation task parameters. Specifically, when the coordinating device sends the calculation task parameters to be used in all subsequent rounds of global model updating, the participating device participates in each subsequent round of global model updating based on those calculation task parameters so as to complete the federated learning task; when the coordinating device sends the calculation task parameters to be used in one round of global model updating, the participating device participates in that round of global model updating based on those calculation task parameters.
Take as an example a participating device participating in one round of global model updating based on the calculation task parameters:
When the model to be trained is a recurrent neural network and the calculation task parameter is an expected processing time step, the participating device selects data of the expected processing time step from the sequence data used locally for model training (the selected data is called sequence selection data). Specifically, for each piece of sequence data composed of data at multiple time steps, the participating device selects a part of the time-step data from the sequence data as sequence selection data, and the time step of the selected part is less than or equal to the expected processing time step; when the time step of a piece of sequence data is already less than or equal to the expected processing time step, no selection is made and the sequence data is used directly as sequence selection data. For example, for a piece of sequence data with 32 time steps and an expected processing time step of 15, the participating device may select data at 15 time steps from that piece of sequence data as sequence selection data; the selection may take 15 consecutive time steps, or 15 time steps chosen at random, and so on, and different selection modes may be set according to the specific model application scenario. In addition, the selection mode applied to each piece of sequence data may differ between rounds of global model updating, so that as much of the local data as possible is used in model training. The participating device performs local model training using the sequence selection data obtained from each piece of sequence data, obtains a local model parameter update, and uploads it to the coordinating device; the coordinating device fuses the local model parameter updates of the participating devices to obtain a global model parameter update and sends it to each participating device, and each participating device, after receiving the global model parameter update, uses it to update the model parameters of its local model to be trained.
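The following minimal sketch illustrates the kind of time step selection just described (taking 15 of 32 time steps either consecutively or at random); the function name and mode labels are hypothetical.

```python
import random

def select_time_steps(sequence, expected_time_step, mode="consecutive"):
    """Select at most `expected_time_step` time steps from one piece of
    sequence data; a sequence that is already short enough is used as-is."""
    if len(sequence) <= expected_time_step:
        return list(sequence)
    if mode == "consecutive":          # e.g. the first 15 of 32 time steps
        return list(sequence[:expected_time_step])
    if mode == "random":               # e.g. 15 time steps drawn at random, order kept
        idx = sorted(random.sample(range(len(sequence)), expected_time_step))
        return [sequence[i] for i in idx]
    raise ValueError("unknown selection mode")

sequence = list(range(32))                     # 32 time steps of data
print(len(select_time_steps(sequence, 15)))    # 15
```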
When the model to be trained is a neural network model or another machine learning model and the calculation task parameter is an expected processing batch size, the participating device may divide its multiple pieces of local training data into multiple batches, where the size of each batch, that is, the number of pieces of training data it contains, is less than or equal to the expected processing batch size received from the coordinating device. For example, if the expected processing batch size received by the participating device is 100 and the participating device has 1000 pieces of training data locally, the participating device may divide its local training data into 10 batches. After batching the local training data according to the expected processing batch size, the participating device uses one batch of data for each local model update while participating in one round of global model updating.
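A minimal sketch of the batching in the example above (1000 local samples, expected processing batch size of 100); the names are illustrative.

```python
def make_batches(training_data, expected_batch_size):
    """Split local training data into batches no larger than the expected
    processing batch size received from the coordinating device."""
    return [training_data[i:i + expected_batch_size]
            for i in range(0, len(training_data), expected_batch_size)]

local_data = list(range(1000))          # 1000 local training samples
batches = make_batches(local_data, 100) # expected processing batch size = 100
print(len(batches), len(batches[0]))    # 10 batches of 100 samples each
```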
When the model to be trained is a recurrent neural network and the calculation task parameters are an expected processing time step and an expected processing batch size, the participating device combines the operations of the two cases above: it performs time step selection on each piece of training data, batches the resulting pieces of sequence selection data, and uses one batch of sequence selection data each time it participates in global model updating.
It will be appreciated that, since the neural network nodes corresponding to different time steps in a recurrent neural network share weights, the input data of the recurrent neural network may be of variable length, that is, the time steps of the input data may differ, which enables the participating devices to each perform local model training based on different expected processing time steps.
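To make the weight-sharing point concrete, the toy sketch below runs a plain RNN cell over sequences of different lengths with a single parameter set; it is an illustration only and not the model of this embodiment.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Plain RNN forward pass: the same weights are applied at every time
    step, so sequences of different lengths reuse one parameter set."""
    h = np.zeros(W_hh.shape[0])
    for x_t in inputs:                       # length may differ per device
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    return h

rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4)
short_seq = rng.normal(size=(15, 3))         # 15 time steps
long_seq = rng.normal(size=(32, 3))          # 32 time steps, same weights
print(rnn_forward(short_seq, W_xh, W_hh, b_h).shape,
      rnn_forward(long_seq, W_xh, W_hh, b_h).shape)   # both (4,)
```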
In this embodiment, the coordinating device acquires the device resource information of each participating device, configures calculation task parameters for each participating device according to its device resource information, where the calculation task parameters include an expected processing time step and/or an expected processing batch size, and correspondingly sends the calculation task parameters of each participating device to that participating device, so that each participating device executes the federated learning task according to its calculation task parameters. That is, by configuring an expected processing time step and/or an expected processing batch size for each participating device, the coordinating device coordinates how large a computing task each participating device needs to process locally. Because different calculation task parameters are configured for each participating device according to its device resource information, the differences in device resource conditions among the participating devices are taken into account: participating devices with rich device resources are assigned larger computing tasks, and participating devices with poor device resources are assigned smaller computing tasks, so that the resource-poor participating devices can also complete their local model parameter updates quickly, and the participating devices with abundant computing resources do not need to spend time waiting for them.
Further, based on the first embodiment, a second embodiment of the horizontal federated learning system optimization method of the present invention is provided. In this embodiment, the step S20 includes:
Step S201, classifying each participating device according to the device resource information, and determining the resource category to which each participating device belongs;
In this embodiment, one possible manner is provided in which the coordinating device configures the calculation task parameters according to the device resource information of the participating devices. Specifically, the coordinating device classifies the participating devices according to their device resource information and determines the resource category to which each participating device belongs. The coordinating device may sort the device resource information by value, set the number of categories in advance, and equally divide the interval between the minimum and maximum values of the sorted data to obtain the preset number of divided sections, where each divided section is one category; the category of a participating device is then determined by which divided section the value of its device resource information falls into. For example, when the device resource information includes computing resource information expressed as the number of CPUs of the participating device, the CPU counts of the participating devices may be sorted. It can be appreciated that, compared with presetting each resource category, determining the resource categories from the device resource information itself makes the division of resource categories better match the actual resource situation of the participating devices, and can adapt to the fact that the resource situation of a participating device does not remain unchanged during federated learning.
When the device resource information includes data on multiple types of device resources, the data of the various device resources may be normalized so that they can be computed and compared with each other. The normalization may be performed in a conventional manner and is not described in detail here. For example, when the device resources include computing resources, power resources, and communication resources, normalizing these three makes them comparable; for the device resource information of each participating device, the normalized data of its various device resources are obtained, weight values may be set in advance for the various resources according to the degree to which each resource affects the local computing efficiency of the participating device, and the normalized data of the various device resources are then weight-averaged to obtain a single value that evaluates the overall resource richness of the participating device. The coordinating device performs the sorting, dividing, and classifying operations described above based on the value computed for each participating device. By normalizing the device resource information of each participating device and computing an overall resource richness value, complex device resource information is quantified, so that the resource categories of the participating devices can be divided more conveniently and accurately, and computing tasks can be configured for the participating devices more quickly.
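A possible sketch of the normalization, weighted averaging, and equal-interval classification described above, assuming min-max normalization and illustrative resource weights; none of these specific choices is mandated by this embodiment.

```python
import numpy as np

def resource_scores(resource_matrix, weights):
    """Min-max normalize each resource column, then weight-average the columns
    into one overall resource-richness value per participating device."""
    m = np.asarray(resource_matrix, dtype=float)
    col_min, col_max = m.min(axis=0), m.max(axis=0)
    normalized = (m - col_min) / np.where(col_max > col_min, col_max - col_min, 1.0)
    return normalized @ (np.asarray(weights) / np.sum(weights))

def classify(scores, num_categories):
    """Equally divide [min, max] of the scores into num_categories intervals
    and return the category index (0 = poorest resources) of each device."""
    scores = np.asarray(scores, dtype=float)
    edges = np.linspace(scores.min(), scores.max(), num_categories + 1)
    return np.clip(np.digitize(scores, edges[1:-1]), 0, num_categories - 1)

# Columns: CPU count, remaining runtime (h), communication rate (Mbps) -- illustrative.
devices = [[2, 5, 10], [8, 20, 100], [4, 12, 50]]
scores = resource_scores(devices, weights=[0.5, 0.2, 0.3])
print(classify(scores, num_categories=4))
```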
Step S202, respectively configuring, according to the resource category to which each participating device belongs, the calculation task parameters of each participating device for the federated learning model training process.
After determining the resource category to which each participating device belongs, the coordinating device configures the calculation task parameters of each participating device according to that resource category. Specifically, a maximum expected processing time step may be set in advance, and the resource categories may be numbered 1, 2, 3, and so on from low to high resource richness; the expected processing time step corresponding to each category is then computed from the maximum expected processing time step: the maximum expected processing time step is divided by the number of resource categories to obtain a minimum time step, and the number of each resource category is multiplied by the minimum time step to obtain the expected processing time step of that category. For example, if the maximum expected processing time step is 32 and there are 4 resource categories, the expected processing time steps of the 4 categories, from low to high, are 8, 16, 24, and 32, respectively. Similarly, a maximum expected processing batch size may be set in advance, and the expected processing batch size corresponding to each category may then be computed from it.
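The worked example above (maximum expected processing time step 32, 4 resource categories) can be expressed as the following small sketch; the integer division is an assumption for the case where the maximum is not evenly divisible.

```python
def time_steps_per_category(max_time_step, num_categories):
    """Minimum time step = max // num_categories; category k (numbered from 1,
    poorest resources first) gets k * minimum time step."""
    minimum = max_time_step // num_categories
    return [k * minimum for k in range(1, num_categories + 1)]

print(time_steps_per_category(32, 4))   # [8, 16, 24, 32]
```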
Further, the step S202 includes:
Step S2021, respectively determining, according to the resource category to which each participating device belongs, candidate task parameters corresponding to each participating device;
Further, the coordinating device may determine the candidate task parameters corresponding to each participating device according to the resource category to which it belongs. Specifically, the calculation task parameters corresponding to each resource category may be computed in a manner similar to the way the expected processing time step of each resource category is computed from the maximum expected processing time step above, and the calculation task parameters corresponding to each participating device are then determined from the resource category to which it belongs; these calculation task parameters are first used as candidate task parameters.
Step S2022, respectively determining, based on the candidate task parameters, the predicted processing duration corresponding to each participating device, and detecting whether a preset duration consistency condition is satisfied among the predicted processing durations;
Based on the candidate task parameters corresponding to each participating device, the coordinating device may determine the predicted processing duration corresponding to each participating device, that is, the time each participating device would need to execute the federated learning task according to its candidate task parameters; specifically, this may be the time the participating device would need to perform local model training and upload the model parameter update when participating in the next round of global model updating according to the candidate task parameters.
Specifically, the coordinating device may estimate, according to the device resource information of each participating device, the time each participating device would need to perform local model training and upload the model parameter update according to its candidate task parameters. The unit time for a unit resource to process a unit time step or a unit batch size can be obtained from tests or experience, so that the predicted processing duration of a participating device can be computed from this unit time, the resources the participating device actually owns, and the expected processing time step or expected processing batch size in the candidate task parameters; the overall process is similar in principle to multiplying a unit cost by a quantity to obtain a total. For example, suppose it is set empirically that a participating device with 1 CPU takes time x to perform local model training on data of 1 time step and time y to upload the model parameter update; if a certain participating device has 3 CPUs and processes data of 10 time steps, its predicted processing duration is (10x + 10y) / 3.
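A sketch of the unit-cost estimate in the example above, with illustrative unit costs x and y; the linear cost model is taken directly from the worked example and is not the only possible estimator.

```python
def predicted_processing_duration(cpu_count, time_steps, unit_train_time, unit_upload_time):
    """Per-unit model from the text: with 1 CPU, one time step costs
    `unit_train_time` for local training and `unit_upload_time` for uploading
    the parameter update; the duration scales with time steps and divides by CPUs."""
    return time_steps * (unit_train_time + unit_upload_time) / cpu_count

# The worked example above: 3 CPUs, 10 time steps, unit costs x and y.
x, y = 2.0, 0.5                                   # illustrative unit costs
print(predicted_processing_duration(3, 10, x, y)) # equals (10*x + 10*y) / 3
```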
After computing the predicted processing duration corresponding to each participating device, the coordinating device can detect whether these predicted processing durations satisfy the preset duration consistency condition. The preset duration consistency condition may be set in advance according to the principle of keeping the predicted processing durations of the participating devices as consistent as possible. For example, the condition may require that the difference between the maximum and the minimum of the predicted processing durations be smaller than a set threshold: when the coordinating device detects that the difference between the maximum and the minimum is smaller than the threshold, the predicted processing durations of the participating devices satisfy the preset duration consistency condition, that is, they are substantially the same; when the coordinating device detects that the difference is not smaller than the threshold, the predicted processing durations do not satisfy the condition, that is, they differ considerably among the participating devices.
Step S2023, if the preset duration consistency condition is satisfied among the predicted processing durations, correspondingly taking the candidate task parameters of each participating device as the calculation task parameters of that participating device.
If the coordinating device detects that the predicted processing durations satisfy the preset duration consistency condition, the candidate task parameters of each participating device may be correspondingly taken as the final calculation task parameters of that participating device.
If the coordinating device detects that the predicted processing durations do not satisfy the preset duration consistency condition, the predicted processing durations differ considerably among the participating devices. In that case, if the participating devices executed the subsequent federated learning task based on their candidate task parameters, their processing times would differ considerably and some participating devices might have to wait for others. For example, in one round of global model updating, part of the participating devices would take a short time for local model updating and uploading of the model parameter update, while another part would take a long time; the coordinating device and the faster participating devices would then have to wait until the slower participating devices had uploaded their model parameter updates before the coordinating device could perform fusion to obtain the global model parameter update and complete the round of global model updating. Therefore, when the coordinating device detects that the predicted processing durations do not satisfy the preset duration consistency condition, it may make adjustments on the basis of the candidate task parameters. For example, the candidate task parameters of the participating device with the largest predicted processing duration may be scaled down: its expected processing time step may be reduced, or its expected processing batch size may be reduced, or both. After adjusting the candidate task parameters, the coordinating device again predicts the processing duration of each participating device based on the adjusted candidate task parameters and detects whether the preset duration consistency condition is satisfied, and this cycle repeats until the condition is detected to be satisfied.
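One possible sketch of the adjust-and-recheck loop described above, assuming the slowest device's candidate parameter is scaled down by a fixed factor each iteration; the shrink factor, threshold, and duration estimator are illustrative assumptions.

```python
def adjust_until_consistent(candidates, estimate_duration, threshold, shrink=0.8,
                            max_rounds=20):
    """Repeatedly shrink the candidate task parameter of the slowest device
    until max - min of the predicted processing durations falls below the
    threshold (the preset duration consistency condition)."""
    params = dict(candidates)                      # device -> candidate parameter
    for _ in range(max_rounds):
        durations = {d: estimate_duration(d, p) for d, p in params.items()}
        if max(durations.values()) - min(durations.values()) < threshold:
            return params                          # condition met: use as final parameters
        slowest = max(durations, key=durations.get)
        params[slowest] = max(1, int(params[slowest] * shrink))
    return params

# Illustrative: duration proportional to time step, inversely to CPU count.
cpus = {"A": 1, "B": 4}
final = adjust_until_consistent({"A": 16, "B": 16},
                                lambda d, p: p / cpus[d], threshold=2.0)
print(final)   # device A's candidate time step is scaled down
```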
In this embodiment, the calculation task parameters are configured for each participating device such that the predicted processing durations of the participating devices satisfy the preset duration consistency condition, so the participating devices are as consistent as possible in predicted processing duration. As a result, no participating device needs to wait long, or to wait at all; even a participating device with poor device resources can keep pace with those with rich device resources, so every participating device can take part in the horizontal federated learning. This improves the overall efficiency of the horizontal federated learning and, at the same time, allows the contribution of every participating device's data to be used: even a participating device with poor device resources can contribute the training data it possesses.
Further, based on the first and second embodiments, a third embodiment of the horizontal federated learning system optimization method of the present invention is provided. In this embodiment, after the step S30, the method further includes:
Step S40, configuring, according to the expected processing time step corresponding to each participating device, a time step selection strategy corresponding to each participating device;
Further, when the model to be trained in the horizontal federated learning is a recurrent neural network model and the calculation task parameters include an expected processing time step, the coordinating device may configure, according to the expected processing time step corresponding to each participating device, a time step selection strategy corresponding to that participating device. The recurrent neural network (RNN) model in the embodiments of the present invention may be an ordinary RNN, or a deep RNN, an LSTM (Long Short-Term Memory network), a GRU (Gated Recurrent Unit), an IndRNN (Independently Recurrent Neural Network), or the like.
The time step selection strategy is a strategy for selecting part of the time-step data from the sequence data, and the coordinating device may configure a different time step selection strategy for each participating device. For example, if the expected processing time steps of participating device A and participating device B are both 15, the time step selection strategy configured by the coordinating device for participating device A may be to select the odd-numbered time steps of the sequence data, and the strategy configured for participating device B may be to select the even-numbered time steps. By configuring different, or even complementary, time step selection strategies for the participating devices, the sequence data used by each participating device for local model training is distributed differently over the time steps, so that the model to be trained can learn features from more varied sequence data, which improves the generalization capability of the model, that is, the model will have better prediction ability for new samples of various forms.
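A sketch of the complementary odd/even time step selection policies in the example above; the policy names and the cap at the expected processing time step are illustrative.

```python
def apply_policy(sequence, policy, expected_time_step):
    """Hypothetical complementary policies: one device keeps odd-numbered time
    steps, another keeps even-numbered ones, both capped at the expected
    processing time step."""
    if policy == "odd":
        picked = sequence[0::2]    # time steps 1, 3, 5, ... (1-based numbering)
    elif policy == "even":
        picked = sequence[1::2]    # time steps 2, 4, 6, ...
    else:
        raise ValueError("unknown policy")
    return picked[:expected_time_step]

sequence = list(range(1, 33))                       # 32 time steps
print(apply_policy(sequence, "odd", 15)[:5])        # [1, 3, 5, 7, 9]
print(apply_policy(sequence, "even", 15)[:5])       # [2, 4, 6, 8, 10]
```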
Step S50, correspondingly sending the time step selection strategy to each participating device, so that each participating device selects sequence selection data from its own sequence data according to its time step selection strategy and executes the federated learning task according to the sequence selection data, wherein the time step of the sequence selection data is less than or equal to the expected processing time step of that participating device.
The coordinating device correspondingly sends the time step selection strategy of each participating device to that participating device. It should be noted that the coordinating device may send the time step selection strategy together with the expected processing time step, or may send them separately. After receiving the expected processing time step and the time step selection strategy, the participating device selects sequence selection data from its local sequence data according to the time step selection strategy, where the time step of the sequence selection data is less than or equal to the expected processing time step.
Further, in an embodiment, when the calculation task parameters include an expected processing batch size, after the step S30, the method further includes:
Step S60, configuring, according to the expected processing batch size corresponding to each participating device, a learning rate corresponding to each participating device;
Further, when the calculation task parameters include an expected processing batch size, the coordinating device may configure, according to the expected processing batch size corresponding to each participating device, a learning rate corresponding to that participating device. The learning rate is a hyperparameter of the model training process, and the coordinating device may configure the learning rates in various ways, based on the principle that the learning rate is proportional to the expected processing batch size. For example, the coordinating device may set a reference learning rate; if the expected processing batch size of a participating device is smaller than the batch size corresponding to the reference learning rate, a learning rate smaller than the reference learning rate is configured for that participating device, and if the expected processing batch size is larger than the batch size corresponding to the reference learning rate, a learning rate larger than the reference learning rate is configured.
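A sketch of one proportional learning-rate rule consistent with the principle above; the reference learning rate and reference batch size are illustrative assumptions, not values given by this embodiment.

```python
def configure_learning_rate(expected_batch_size, reference_lr=0.01, reference_batch_size=64):
    """Linear scaling around a reference point: a larger expected processing
    batch size gets a proportionally larger learning rate."""
    return reference_lr * expected_batch_size / reference_batch_size

print(configure_learning_rate(32))    # smaller batch -> smaller learning rate
print(configure_learning_rate(128))   # larger batch  -> larger learning rate
```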
Step S70, correspondingly sending the learning rate to each participating device, so that each participating device executes the federated learning task according to its learning rate and the expected processing batch size received from the coordinating device.
After the coordinating device has configured a learning rate for each participating device, it correspondingly sends the learning rate of each participating device to that participating device. The coordinating device may send the learning rate together with the expected processing batch size, or may send them separately. Upon receiving the learning rate and the expected processing batch size, the participating device executes the federated learning task based on them; for example, in local model training it trains the model using data batches of the expected processing batch size and updates the model parameters using the learning rate.
In this embodiment, the coordinating device configures a learning rate for each participating device based on its expected processing batch size, so that the coordinating device can control the model convergence speed of each participating device as a whole; by setting different learning rates for the participating devices, their model convergence speeds can be kept as consistent as possible, and the model to be trained converges better during the federated learning process.
It should be noted that, when the calculation task parameters include both the expected processing time step and the expected processing batch size, the schemes in which the coordinating device configures the time step selection strategy and the learning rate of each participating device may also be implemented in combination. The coordinating device configures the time step selection strategy of each participating device according to its expected processing time step, configures the learning rate of each participating device according to its expected processing batch size, and correspondingly sends the time step selection strategy, the learning rate, and the calculation task parameters to each participating device. The participating device selects sequence selection data from its local sequence data according to the time step selection strategy, performs local model training with the sequence selection data in batches of the expected processing batch size, and updates the model during training using the received learning rate.
Further, the step S30 includes:
Step S203, correspondingly sending the calculation task parameters to each participating device, and sending the predicted duration of the current round of global model updating to each participating device, so that each participating device, when performing local model training according to the calculation task parameters, adjusts the number of rounds of local model training according to the predicted duration.
The coordinating device sends the calculation task parameters of each participating device to that participating device and, at the same time, sends the predicted duration of the current round of global model updating to each participating device. The predicted duration of the current round of global model updating may be determined from the predicted processing durations of the participating devices, for example by taking the maximum of the predicted processing durations as the predicted duration of the round. After receiving the calculation task parameters and the predicted duration, the participating device performs local model training according to the calculation task parameters and may adjust the number of rounds of local model training according to the predicted duration. Specifically, after one round of local model training, the participating device computes the duration 1 spent on that round and judges whether the result of subtracting duration 1 from the predicted duration is greater than duration 1; if so, it performs another round of local model training, that is, increases the number of rounds, computes the duration 2 spent on the second round, and judges whether the result of subtracting duration 1 and duration 2 from the predicted duration is greater than duration 2; if so, it performs yet another round, and so on, until it detects that the remaining duration is smaller than the duration spent on the most recent round of local model training, at which point it stops training and uploads the local model parameter update obtained from the local model training to the coordinating device. That is, the participating device adds one more round of local model training whenever it judges, from the predicted duration, that the remaining time is still enough for one more round.
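A sketch of the remaining-time check described above, where another round of local model training is added whenever the time left within the predicted duration still covers the cost of the most recent round; the timing helper and the stand-in training function are illustrative.

```python
import time

def local_training_rounds(predicted_duration, train_once):
    """Keep running extra rounds of local model training as long as the time
    remaining within the predicted duration still covers one more round at
    the cost of the most recent round."""
    remaining = predicted_duration
    rounds = 0
    while True:
        start = time.monotonic()
        train_once()                          # one round of local model training
        spent = time.monotonic() - start
        rounds += 1
        remaining -= spent
        if remaining <= spent:                # not enough time left for another round
            return rounds

# Illustrative stand-in for a real local training step.
rounds = local_training_rounds(0.05, lambda: time.sleep(0.01))
print(rounds)   # roughly 4 rounds fit into the 0.05 s budget
```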
In this embodiment, the coordinating device sends the predicted duration of the global model update to each participating device, so that when a participating device actually performs local model training faster than predicted, it can increase the number of rounds of local model training instead of spending the time waiting for the other participating devices.
Further, the step S10 includes:
Step S101, receiving device resource information sent by each participating device participating in the horizontal federated learning, where the device resource information includes at least one or more of power resource information, computing resource information, and communication resource information.
Further, each participating device may actively upload its device resource information to the coordinating device, and the coordinating device receives the device resource information uploaded by each participating device. The device resource information may include at least one or more of power resource information, computing resource information, and communication resource information. Specifically, the computing resource may be represented by the number of CPUs and GPUs owned by the participating device, the power resource may be represented by the time for which the participating device can continue to operate, and the communication resource may be represented by the communication rate of the participating device.
Further, in an embodiment, each participating device may be a remote sensing satellite holding different sequence image data, and the remote sensing satellites perform horizontal federated learning using their respective image data to train an RNN for a weather prediction task. The coordinating device may be one of the remote sensing satellites or a base station located on the ground. The coordinating device acquires the device resource information of each remote sensing satellite, then configures calculation task parameters for each remote sensing satellite according to its device resource information, where the calculation task parameters include an expected processing time step and/or an expected processing batch size, and correspondingly sends the calculation task parameters of each remote sensing satellite to that satellite, so that each remote sensing satellite executes the federated learning task according to its calculation task parameters and completes the training of the RNN. After the trained RNN is obtained, each remote sensing satellite may input its most recently captured sequence of remote sensing image data into the RNN, and the subsequent weather conditions are obtained through RNN prediction. During RNN training, the coordinating device coordinates the computing tasks of the remote sensing satellites according to their device resource information, so that the remote sensing satellites with rich computing resources do not need to spend time waiting during training, which improves the overall efficiency of the horizontal federated learning among the remote sensing satellites and can accelerate the deployment of the weather forecasting RNN. In addition, during training, the contribution of the data of every remote sensing satellite to model training can be used, including the contribution of remote sensing satellites with relatively poor resources, which further improves the stability of the model and makes the weather prediction results obtained through the RNN more reliable.
In addition, an embodiment of the present invention further provides a horizontal federated learning system optimization apparatus, deployed in a coordinating device participating in horizontal federated learning. Referring to FIG. 3, the apparatus includes:
an acquiring module 10, configured to acquire device resource information of each participating device participating in horizontal federated learning;
a configuration module 20, configured to respectively configure, according to the device resource information, calculation task parameters of each participating device for the federated learning model training process, where the calculation task parameters include an expected processing time step and/or an expected processing batch size;
and a sending module 30, configured to correspondingly send the calculation task parameters to each participating device, so that each participating device executes the federated learning task according to its calculation task parameters.
Further, the configuration module 20 includes:
a classification unit, configured to classify the participating devices according to the device resource information and determine the resource category to which each participating device belongs;
and a configuration unit, configured to respectively configure, according to the resource category to which each participating device belongs, the calculation task parameters of each participating device for the federated learning model training process.
Further, the configuration unit includes:
a first determining subunit, configured to respectively determine, according to the resource category to which each participating device belongs, candidate task parameters corresponding to each participating device;
a detection subunit, configured to respectively determine, based on the candidate task parameters, the predicted processing duration corresponding to each participating device, and detect whether a preset duration consistency condition is satisfied among the predicted processing durations;
and a second determining subunit, configured to, if the preset duration consistency condition is satisfied among the predicted processing durations, correspondingly take the candidate task parameters of each participating device as the calculation task parameters of that participating device.
Further, when the model to be trained in the horizontal federated learning is a recurrent neural network model and the calculation task parameters include an expected processing time step, the configuration module 20 is further configured to configure, according to the expected processing time step corresponding to each participating device, a time step selection strategy corresponding to each participating device;
the sending module 30 is further configured to correspondingly send the time step selection strategy to each participating device, so that each participating device selects sequence selection data from its own sequence data according to its time step selection strategy and executes the federated learning task according to the sequence selection data, where the time step of the sequence selection data is less than or equal to the expected processing time step of that participating device.
Further, when the calculation task parameters include an expected processing batch size, the configuration module 20 is further configured to configure, according to the expected processing batch size corresponding to each participating device, a learning rate corresponding to each participating device;
the sending module 30 is further configured to correspondingly send the learning rate to each participating device, so that each participating device executes the federated learning task according to its learning rate and the expected processing batch size received from the coordinating device.
Further, the sending module 30 is further configured to correspondingly send the calculation task parameters to each participating device, and to send the predicted duration of the global model update to each participating device, so that each participating device, when performing local model training according to the calculation task parameters, adjusts the number of rounds of local model training according to the predicted duration.
Further, the acquiring module 10 is further configured to receive device resource information sent by each participating device participating in the horizontal federated learning, where the device resource information includes at least one or more of power resource information, computing resource information, and communication resource information.
The specific implementations of the horizontal federated learning system optimization apparatus extend in essentially the same way as the embodiments of the horizontal federated learning system optimization method described above, and are not repeated here.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, on which a horizontal federated learning system optimization program is stored, where the horizontal federated learning system optimization program, when executed by a processor, implements the steps of the horizontal federated learning system optimization method described above.
For the embodiments of the horizontal federated learning system optimization device and the computer-readable storage medium of the present invention, reference may be made to the embodiments of the horizontal federated learning system optimization method, which are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware, although in many cases the former is the preferable implementation. Based on such an understanding, the technical solution of the present invention, or the part of it that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and including instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods according to the embodiments of the present invention.
The foregoing description covers only the preferred embodiments of the present invention and does not limit the scope of the invention; any equivalent structural or process transformation based on the disclosure herein, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the present invention.