CN115511042B - Training method and device for realizing continuous learning neural network model and electronic equipment - Google Patents

Training method and device for realizing continuous learning neural network model and electronic equipment

Info

Publication number
CN115511042B
Authority
CN
China
Prior art keywords
floating point
historical
weight
neural network
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110686037.4A
Other languages
Chinese (zh)
Other versions
CN115511042A (en)
Inventor
尚大山
张握瑜
李熠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Microelectronics of CAS
Original Assignee
Institute of Microelectronics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Microelectronics of CAS
Priority to CN202110686037.4A
Publication of CN115511042A
Application granted
Publication of CN115511042B
Legal status: Active (current)
Anticipated expiration

Abstract

The invention discloses a training method and device for a neural network model that realizes continuous learning, and an electronic device, relating to the fields of machine learning and artificial intelligence. The training method comprises: obtaining historical floating point weights of the neural network model; binarizing the historical floating point weights to determine historical binary weights; processing training samples with the historical binary weights of the neural network model to obtain the error of the historical binary weights; determining the current actual floating point weights based on the historical floating point weights and the error of the historical binary weights; and adjusting the floating point weights of the neural network model according to the current actual floating point weights. The method can accurately recognize the current task while remembering previously learned tasks, realizes continuous learning for a binary neural network, improves the robustness of the neural network model, and reduces power consumption during data processing.

Description

Training method and device for realizing continuous learning neural network model and electronic equipment
Technical Field
The invention relates to the fields of machine learning and artificial intelligence, and in particular to a training method and device for a neural network model that realizes continuous learning, and to an electronic device.
Background
With the development of machine learning and artificial intelligence, deep neural networks capable of continuous learning are being studied more and more. Continuous learning is an important deep learning capability: the ability to retain previously learned knowledge while learning new knowledge.
The human brain realizes continuous learning through complex biological mechanisms such as the Hebbian rule and the complementary learning systems. However, a conventional artificial neural network suffers from catastrophic forgetting when learning new knowledge: its parameters are overwritten and the previously learned parameters cannot be retained, so continuous learning cannot be realized.
At present, various algorithms can avoid catastrophic forgetting and realize continuous learning on a data stream, but these algorithms place high requirements on data precision and need floating point data.
On the one hand, implementing such a continuous learning algorithm with conventional complementary metal oxide semiconductor (CMOS) circuit hardware requires a complex circuit design, and the structural complexity results in high power consumption. On the other hand, when the algorithm is implemented in hardware with emerging in-memory computing devices, the precision is difficult to meet: such devices have non-ideal characteristics and currently reach fixed-point 8-bit precision, which cannot satisfy the floating-point 32-bit computing precision the algorithms require. Moreover, high-precision data processing itself consumes considerable power.
Disclosure of Invention
The invention aims to provide a training method, a training device and electronic equipment for realizing a neural network model for continuous learning, which are used for solving the problems of high storage capacity requirement and high data precision requirement of the existing continuous learning algorithm.
In a first aspect, the present invention provides a training method for implementing a neural network model for continuous learning, including:
acquiring historical floating point weights of the neural network model;
performing binarization processing on the historical floating point weight to determine a historical binary weight;
processing a training sample by using the historical binary weight of the neural network model to obtain an error of the historical binary weight;
determining a current actual floating point weight based on the historical floating point weight and an error of the historical binary weight;
And adjusting the floating point weight of the neural network model according to the current actual floating point weight.
Under the above technical scheme, the training method for realizing continuous learning acquires the historical floating point weights of the neural network model, binarizes them to determine historical binary weights, processes training samples with the historical binary weights to obtain the error of the historical binary weights, determines the current actual floating point weights based on the historical floating point weights and this error, and adjusts the floating point weights of the neural network model according to the current actual floating point weights. By introducing this mixed-precision training method for a binary neural network, the method can accurately recognize the current task while remembering previously learned tasks, thereby alleviating catastrophic forgetting during continuous learning. In addition, continuous learning of the binary neural network is achieved without a dedicated circuit design, the robustness of the neural network model is improved, and the power consumption of data processing is reduced.
In one possible implementation, the processing training samples using the historical binary weights of the neural network model to obtain an error of the historical binary weights includes:
processing training samples by using historical binary weights of the neural network model to obtain sample loss;
And determining the error of the historical binary weight according to the sample loss.
In one possible implementation, the determining the current actual floating point weight based on the historical floating point weight and the error of the historical binary weight includes:
and under the condition that the error of the historical binary weight has the same sign as the historical binary weight, determining the sum of the historical floating point weight and the error of the historical binary weight as the current actual floating point weight.
In one possible implementation, the determining the current actual floating point weight based on the historical floating point weight and the error of the historical binary weight includes:
And under the condition that the sign of the error of the historical binary weight is opposite to that of the historical binary weight and the error of the historical binary weight is smaller than a weight error threshold, determining the sum of the historical floating point weight and the error of the historical binary weight as the current actual floating point weight.
In one possible implementation, the determining the current actual floating point weight based on the historical floating point weight and the error of the historical binary weight includes:
Acquiring an error weight preset value corresponding to the error of the historical binary weight under the condition that the error of the historical binary weight is opposite in sign to the historical binary weight and the error of the historical binary weight is larger than a weight error threshold value;
and determining the sum of the historical floating point weight and the error weight preset value as the current actual floating point weight.
In one possible implementation manner, if the loss function of the neural network model converges, the adjusting the floating point weight of the neural network model according to the current actual floating point weight includes:
and determining the floating point weight of the neural network model to be the current actual floating point weight.
In one possible implementation manner, if the loss function of the neural network model does not converge, the adjusting the floating point weight of the neural network model according to the current actual floating point weight includes:
And updating the initial floating point weight of the neural network model according to the current actual floating point weight.
In a second aspect, the present invention further provides a training apparatus for implementing a neural network model for continuous learning, the apparatus comprising:
The acquisition module is used for acquiring the historical floating point weight of the neural network model;
the first determining module is used for carrying out binarization processing on the historical floating point weight and determining historical binary weight;
the obtaining module is used for processing training samples by utilizing the historical binary weight of the neural network model to obtain errors of the historical binary weight;
The second determining module is used for determining the current actual floating point weight based on the historical floating point weight and the error of the historical binary weight;
and the adjusting module is used for adjusting the floating point weight of the neural network model according to the current actual floating point weight.
In one possible implementation, the obtaining module includes:
the obtaining submodule is used for processing training samples by utilizing the historical binary weight of the neural network model to obtain sample loss;
And the first determining submodule is used for determining the error of the historical binary weight according to the sample loss.
In one possible implementation manner, the second determining module includes:
And the second determining submodule is used for determining the sum of the historical floating point weight and the error of the historical binary weight as the current actual floating point weight under the condition that the error of the historical binary weight has the same sign as the historical binary weight.
In one possible implementation manner, the second determining module includes:
And the third determining submodule is used for determining the sum of the historical floating point weight and the error of the historical binary weight as the current actual floating point weight under the condition that the sign of the error of the historical binary weight is opposite to that of the historical binary weight and the error of the historical binary weight is smaller than a weight error threshold value.
In one possible implementation manner, the second determining module includes:
the acquisition sub-module is used for acquiring an error weight preset value corresponding to the error of the historical binary weight under the condition that the error of the historical binary weight is opposite to the sign of the historical binary weight and the error of the historical binary weight is larger than a weight error threshold value;
And the fourth determination submodule is used for determining that the sum of the historical floating point weight and the error weight preset value is the current actual floating point weight.
In one possible implementation manner, if the loss function of the neural network model converges, the adjusting module includes:
and a fifth determination submodule, configured to determine a floating point weight of the neural network model as the current actual floating point weight.
In one possible implementation, if the loss function of the neural network model does not converge, the adjustment module includes:
and the updating sub-module is used for updating the initial floating point weight of the neural network model according to the current actual floating point weight.
The training device for implementing the continuous learning neural network model provided in the second aspect has the same beneficial effects as the training method for implementing the continuous learning neural network model described in the first aspect or any possible implementation manner of the first aspect, and is not described herein.
In a third aspect, the invention also provides an electronic device comprising one or more processors and one or more machine-readable media having instructions stored thereon which, when executed by the one or more processors, cause the device to implement the functions of the training apparatus for the neural network model for continuous learning described in any possible implementation of the second aspect.
The beneficial effects of the electronic device provided in the third aspect are the same as the beneficial effects of the training device for implementing the neural network model for continuous learning described in the second aspect or any possible implementation manner of the second aspect, which are not described herein.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 shows a schematic structural diagram of an artificial intelligence processing device according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of a training method for implementing a neural network model for continuous learning according to an embodiment of the present application;
FIG. 3 is a flowchart of another training method for implementing a neural network model for continuous learning according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a data set segmentation provided in an embodiment of the present application;
FIG. 5 shows a structural flow chart of a training device for implementing a neural network model for continuous learning according to an embodiment of the present application;
fig. 6 is a schematic hardware structure of an electronic device according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a chip according to an embodiment of the present invention.
Detailed Description
In order to clearly describe the technical solutions of the embodiments of the present invention, words such as "first" and "second" are used in the embodiments of the present invention to distinguish between identical or similar items having substantially the same function and effect. For example, a first threshold and a second threshold are merely different thresholds, without implying an order. Those skilled in the art will appreciate that words such as "first" and "second" do not limit the quantity or the order of execution, and the items they qualify are not necessarily different.
In the present invention, the words "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the present invention, "at least one" means one or more, and "a plurality" means two or more. "and/or" describes an association of associated objects, meaning that there may be three relationships, e.g., A and/or B, and that there may be A alone, while A and B are present, and B alone, where A, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (a, b or c) of a, b, c, a and b combination, a and c combination, b and c combination, or a, b and c combination, wherein a, b and c can be single or multiple.
An embodiment of the application provides an artificial intelligence processing device. Fig. 1 shows a schematic structural diagram of the artificial intelligence processing device provided by the embodiment of the application. As shown in Fig. 1, the artificial intelligence processing device 01 comprises an artificial intelligence processor 01A, a storage unit 01B in communication connection with the artificial intelligence processor 01A, and an interface component 01C in communication connection with the artificial intelligence processor 01A and the storage unit 01B respectively. An instruction set is stored in the storage unit 01B, and the artificial intelligence processor 01A can realize the learning process based on the instruction set and under the control of the interface component 01C.
Among the built-in data types of the instruction set (Instruction Set Architecture), floating point numbers are stored in binary format. A floating point number represents a real number; the spacing between adjacent representable values is determined by the exponent, so floating point numbers cover a very wide range of values, and the closer a value is to zero, the higher its precision.
Fig. 2 is a schematic flow chart of a training method for implementing a neural network model for continuous learning according to an embodiment of the present application, where, as shown in fig. 2, the training method for a neural network model includes:
and 101, acquiring historical floating point weights of the neural network model.
In the present application, the neural network model may be a convolutional neural network, a fully-connected neural network, or other types of neural networks, which is not particularly limited in the embodiment of the present application. The neural network model has various architectures, such as ResNet network architecture, VGG network architecture, YOLO architecture, etc., and of course, the neural network model can also be designed according to actual situations. For example, the neural network model comprises a task input layer, a full connection layer and a prediction output layer which are sequentially connected in a communication way, and the neural network model can be trained in a continuous learning mode to realize correct recognition of re-input tasks.
In the method, the fully connected layers comprise a first fully connected layer, a second fully connected layer and a third fully connected layer. The task input layer comprises 784 neurons, the first and second fully connected layers each comprise 256 neurons, and the prediction output layer comprises 10 neurons; each fully connected layer is trained with the method described in the present application.
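As a concrete illustration of this architecture, the sketch below sets up the 784-256-256-10 fully connected network in Python/NumPy. Only the layer sizes come from the paragraph above; the variable names, the initialization range and the use of NumPy are assumptions, not part of the patented method.

```python
import numpy as np

# Hypothetical NumPy sketch of the 784-256-256-10 fully connected network described above.
layer_sizes = [784, 256, 256, 10]   # task input layer, two hidden fully connected layers, prediction output layer
rng = np.random.default_rng(0)

# Historical floating point weights Wh, one matrix per fully connected layer.
Wh = [rng.uniform(-0.1, 0.1, size=(n_in, n_out))
      for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
```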
Step 102, binarizing the historical floating point weight to determine the historical binary weight.
In the forward propagation of the neural network, a sign function is typically employed as the weight binarization function:
Wb = sign(Wh),
where Wh represents the historical floating point weight, Wb represents the historical binary weight, and sign(·) is the binary sign function that maps non-negative values to +1 and negative values to −1.
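A minimal sketch of this binarization step is given below; mapping sign(0) to +1 is an assumed convention, since the text only names the sign function.

```python
import numpy as np

def binarize(wh):
    """Historical binary weight Wb = sign(Wh); mapping sign(0) to +1 is an assumption."""
    return np.where(wh >= 0, 1.0, -1.0)

# Example: derive the binary weights used for forward propagation.
# Wb = [binarize(w) for w in Wh]
```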
After the historical binary weight is obtained, it can be used as the forward propagation weight of the neural network model. In the application, the activation function adopts a sign function. The training data corresponding to the current task is propagated forward sequentially through the fully connected layer, the normalization layer and the activation function layer, and the output of the last fully connected layer passes through the output function.
The output function may be y = Softmax(x), where y represents the output value and x represents the net activation value of the last fully connected layer for the current task. That is, the value of the i-th output neuron may be
y_i = exp(x_i) / Σ_j exp(x_j),
where y_i represents the output value of the i-th output neuron and x_i represents the net activation value of the i-th neuron for the current task.
And 103, processing the training sample by using the historical binary weight of the neural network model to obtain an error of the historical binary weight.
The training data can be propagated forward on the neural network with the historical binary weights, the error is calculated with the cross entropy function, and the update amount of the binary weights is obtained. That is, the training samples may be processed with the historical binary weights of the neural network model to obtain the error of the historical binary weights. After the training data corresponding to the current task has completed one forward and backward pass, the error (ΔWh) of the historical binary weight can be calculated with the cross entropy function. In particular, the implementation of step 103 may include the following substeps:
S1, initializing the weights of the neural network model as floating point weights to obtain initial floating point weights, and binarizing the initial floating point weights to obtain historical binary weights.
S2, propagating forward through the neural network with the historical binary weights, calculating the error, and calculating the update amount of the historical binary weights from the loss. For computing derivatives, y = Hardtanh(x) may be used as the activation function, where
Hardtanh(x) = −1 if x < −1, x if −1 ≤ x ≤ 1, and +1 if x > 1,
x represents the input data and y represents the output data.
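The following sketch illustrates sub-steps S1 and S2 for the binary-weight network: forward propagation with the binary weights and a sign activation, a Softmax/cross entropy loss, and a backward pass that uses the Hardtanh derivative as a surrogate gradient for the sign activation. It omits the normalization layer and batching details, so it is a simplified illustration under those assumptions rather than the exact patented procedure.

```python
import numpy as np

def sign_act(x):
    # Binary (sign) activation used in the forward pass.
    return np.where(x >= 0, 1.0, -1.0)

def hardtanh_grad(x):
    # Derivative of Hardtanh, used as a surrogate gradient for the sign activation.
    return (np.abs(x) <= 1.0).astype(x.dtype)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward_backward(x, targets, Wb):
    """One forward/backward pass with binary weights Wb.

    Returns the cross entropy loss and the error dWh of each layer's weights.
    """
    acts, pre = [x], []
    h = x
    for i, W in enumerate(Wb):
        z = h @ W
        pre.append(z)
        h = sign_act(z) if i < len(Wb) - 1 else z    # hidden layers: sign activation; output: raw logits
        acts.append(h)
    probs = softmax(acts[-1])                         # output function y = Softmax(x)
    n = x.shape[0]
    loss = -np.log(probs[np.arange(n), targets] + 1e-12).mean()

    grads = []
    delta = (probs - np.eye(probs.shape[1])[targets]) / n
    for i in reversed(range(len(Wb))):
        grads.insert(0, acts[i].T @ delta)            # error (update amount) for layer i's weights
        if i > 0:
            delta = (delta @ Wb[i].T) * hardtanh_grad(pre[i - 1])
    return loss, grads
```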
Step 104, determining the current actual floating point weight based on the errors of the historical floating point weight and the historical binary weight.
In an alternative embodiment provided by the invention, under the condition that the error of the historical binary weight has the same sign as the historical binary weight, the sum of the historical floating point weight and the error of the historical binary weight is determined as the current actual floating point weight.
In still another embodiment of the present invention, when the sign of the error of the historical binary weight is opposite to that of the historical binary weight and the error of the historical binary weight is smaller than the weight error threshold, the sum of the historical floating point weight and the error of the historical binary weight is determined as the current actual floating point weight.
The weight error threshold may be determined from ΔWh_max, m and ε, where ΔWh_max represents the maximum update amount of Wh, m represents the memory coefficient, and ε is a minimal value that prevents the denominator from being zero.
In another embodiment provided by the embodiment of the invention, under the condition that the sign of the error of the historical binary weight is opposite to that of the historical binary weight and the error of the historical binary weight is greater than the weight error threshold, an error weight preset value corresponding to the error of the historical binary weight is acquired;
and the sum of the historical floating point weight and the error weight preset value is determined as the current actual floating point weight.
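Combining the three cases above, the element-wise computation of the current actual floating point weight can be sketched as follows. The weight error threshold and the error weight preset value are taken as given inputs, because the text defines them only by reference to ΔWh_max, the memory coefficient m and ε; comparing the magnitude of the error against the threshold is likewise an interpretation.

```python
import numpy as np

def update_floating_weight(Wh, Wb, dWh, theta, preset):
    """Current actual floating point weight from the three cases described above.

    Wh:     historical floating point weight
    Wb:     historical binary weight, sign(Wh)
    dWh:    error (update amount) of the historical binary weight
    theta:  weight error threshold (taken as given; the patent derives it from dWh_max, m and epsilon)
    preset: error weight preset value for the large opposing-sign case (taken as given)
    Comparing |dWh| against theta is an assumption about how "smaller/larger than the threshold" is meant.
    """
    same_sign = np.sign(dWh) == np.sign(Wb)
    small_opposing = ~same_sign & (np.abs(dWh) < theta)
    large_opposing = ~same_sign & (np.abs(dWh) >= theta)

    W_new = np.where(same_sign | small_opposing, Wh + dWh, Wh)  # cases 1 and 2: add the error
    W_new = np.where(large_opposing, Wh + preset, W_new)        # case 3: add the preset value
    return W_new
```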
And step 105, adjusting floating point weights of the neural network model according to the current actual floating point weights.
In an alternative embodiment provided by the embodiment of the present invention, if the loss function of the neural network model converges, the floating point weight of the neural network model is determined to be the current actual floating point weight.
In another alternative embodiment provided by the embodiment of the present invention, if the loss function of the neural network model does not converge, the historical floating point weight of the neural network model is updated according to the current actual floating point weight.
After the current task is learned, the next task continues to be learned. At this time, the floating point weights of the neural network model do not need to be re-initialized; after learning of the current task is completed, its weights are taken as the historical floating point weights, and the above steps are repeated. When the next task arrives, the actual floating point weights are determined based on the historical floating point weights. In this way, the current task can be recognized accurately while the previously learned tasks are still remembered, continuous learning of the binary neural network is realized, the robustness of the neural network is improved, and the power consumption of data processing is reduced.
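A sketch of the resulting loop over sequential tasks is shown below. It reuses the binarize, forward_backward and update_floating_weight sketches given earlier; the task list, the number of epochs and the values of the threshold and preset are placeholders, and only the carrying-over of the floating point weights from task to task reflects the paragraph above.

```python
def train_one_task(Wh, task_data, epochs=1):
    """Placeholder per-task trainer: binarize, forward/backward, update (steps 102-105).

    Assumes the binarize, forward_backward and update_floating_weight sketches above;
    theta and preset are hypothetical hyperparameters.
    """
    images, labels = task_data
    for _ in range(epochs):
        Wb = [binarize(w) for w in Wh]
        _, dWh = forward_backward(images, labels, Wb)
        Wh = [update_floating_weight(w, b, g, theta=0.01, preset=0.0)
              for w, b, g in zip(Wh, Wb, dWh)]
    return Wh

# Continual learning over sequential tasks: Wh is carried over, never re-initialized.
# `tasks` is assumed to be a list of (images, labels) pairs, e.g. the split-MNIST tasks.
for task_data in tasks:
    Wh = train_one_task(Wh, task_data)
```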
The training method for the neural network model for realizing continuous learning provided by the embodiment of the invention acquires the historical floating point weights of the neural network model, binarizes them to determine historical binary weights, processes training samples with the historical binary weights to obtain the error of the historical binary weights, determines the current actual floating point weights based on the historical floating point weights and this error, and adjusts the floating point weights of the neural network model accordingly. By introducing this mixed-precision training method for a binary neural network, the neural network model can accurately recognize the current task while remembering previously learned tasks, thereby avoiding catastrophic forgetting during continuous learning. In addition, continuous learning of the binary neural network is achieved without a dedicated circuit design, the robustness of the neural network model is improved, and the power consumption of data processing is reduced.
Optionally, fig. 3 is a schematic flow chart of another training method for implementing a neural network model for continuous learning according to an embodiment of the present application, referring to fig. 3, the training method for implementing a neural network model for continuous learning includes:
step 201, acquiring historical floating point weights of a neural network model.
In the present application, the neural network model may be a convolutional neural network, a fully-connected neural network, or other types of neural networks, which is not particularly limited in the embodiment of the present application.
The neural network model comprises a task input layer, a first full-connection layer, a second full-connection layer, a third full-connection layer and a prediction output layer which are sequentially in communication connection, and can be trained in a continuous learning mode to realize correct recognition of re-input tasks.
In the method, the task input layer comprises 784 neurons, the first and second fully connected layers each comprise 256 neurons, and the prediction output layer comprises 10 neurons; each fully connected layer is trained with the method described in the present application.
Among the built-in data types of the instruction set (Instruction Set Architecture), floating point numbers are stored in binary format. A floating point number represents a real number; the spacing between adjacent representable values is determined by the exponent, so floating point numbers cover a very wide range of values, and the closer a value is to zero, the higher its precision.
In the application, the historical floating point weight refers to the floating point weight corresponding to the neural network model after the neural network model identifies the previous input task.
When the current task arrives, a historical floating point weight of the initial neural network model can be acquired. In the case where the current task is the first task, the floating point weight of the neural network model may be initialized first, with the initialized floating point weight being taken as the historical floating point weight.
In the case where the current task is not the first task, the floating point weight of the last task of the neural network model may be obtained as the historical floating point weight.
The input data set may be divided into a plurality of tasks by means of segmentation, for example, in the case that the input data set is an MNIST data set, the input data set may be divided into 5 tasks in sequence.
In the present application, the input data set may be voice data, image data, text data, or the like, which is not particularly limited in the embodiment of the present application.
Step 202, binarizing the historical floating point weight to determine the historical binary weight.
In the forward propagation of the neural network, a sign function is typically employed as the weight binarization function:
Wb = sign(Wh);
where Wh represents the historical floating point weight, Wb represents the historical binary weight, and sign(·) is the binary sign function that maps non-negative values to +1 and negative values to −1.
After the historical binary weight is obtained, it can be used as the forward propagation weight of the neural network model. In the application, the activation function adopts a sign function. The training data corresponding to the current task is propagated forward sequentially through the fully connected layer, the normalization layer and the activation function layer, and the output of the last fully connected layer passes through the output function.
The output function may be:
y = Softmax(x);
where y represents the output value and x represents the net activation value of the last fully connected layer for the current task.
That is, the value of the i-th output neuron may be
y_i = exp(x_i) / Σ_j exp(x_j),
where y_i represents the output value of the i-th output neuron and x_i represents the net activation value of the i-th neuron for the current task.
And 203, processing the training sample by using the historical binary weight of the neural network model to obtain sample loss.
In the application, the training data can be propagated forward on the neural network with the historical binary weights, the error is calculated with the cross entropy function, and the update amount of the binary weights is obtained. That is, the training samples may be processed with the historical binary weights of the neural network model to obtain the error of the historical binary weights. After the training data corresponding to the current task has completed one forward and backward pass, the error (ΔWh) of the historical binary weight can be calculated with the cross entropy function.
In the application, the weights of the neural network model can be initialized as floating point weights to obtain the historical floating point weights, the historical floating point weights are binarized to obtain the historical binary weights, the neural network with the historical binary weights is used for forward propagation, the error is calculated, and the update amount of the historical binary weights is calculated from the loss; that is, the error of the historical binary weights is obtained.
And 204, determining the error of the historical binary weight according to the sample loss.
After the training data corresponding to the current task is propagated back and forth once, the error (ΔWh) of the historical binary weight can be calculated by using a cross entropy function.
In the method, y = Hardtanh(x) may be used as the activation function for computing derivatives when the error of the historical binary weight is calculated with the cross entropy function, where
Hardtanh(x) = −1 if x < −1, x if −1 ≤ x ≤ 1, and +1 if x > 1.
In the above formula, x represents the input data and y represents the output data.
Step 205, determining the current actual floating point weight based on the error of the historical floating point weight and the historical binary weight.
In an alternative embodiment provided by the invention, under the condition that the error of the historical binary weight has the same sign as the historical binary weight, the sum of the historical floating point weight and the error of the historical binary weight is determined as the current actual floating point weight.
In still another embodiment of the present invention, when the sign of the error of the historical binary weight is opposite to that of the historical binary weight and the error of the historical binary weight is smaller than the weight error threshold, the sum of the historical floating point weight and the error of the historical binary weight is determined as the current actual floating point weight.
The weight error threshold may be determined from ΔWh_max, m and ε, where ΔWh_max represents the maximum update amount of Wh, m represents the memory coefficient, and ε is a minimal value that prevents the denominator from being zero.
In another embodiment provided by the embodiment of the invention, under the condition that the sign of the error of the historical binary weight is opposite to that of the historical binary weight and the error of the historical binary weight is larger than a weight error threshold value, an error weight preset value corresponding to the error of the historical binary weight is obtained;
And determining the sum of the historical floating point weight and the error weight preset value as the current actual floating point weight.
And 206, adjusting the floating point weight of the neural network model according to the current actual floating point weight.
In an alternative embodiment provided by the embodiment of the present invention, if the loss function of the neural network model converges, the floating point weight of the neural network model is determined to be the current actual floating point weight.
In another alternative embodiment provided by the embodiment of the present invention, if the loss function of the neural network model does not converge, the initial floating point weight of the neural network model is updated according to the current actual floating point weight.
By adopting the neural network model trained by the embodiment of the application, the accurate identification of the current task can be realized, the task learned before can be memorized, the continuous learning of the binary neural network can be realized, the robustness of the neural network model can be improved, and the power consumption in the data processing process can be reduced.
In the present application, the historical floating point weight refers to the floating point weight obtained from the most recent training of the neural network model. Of course, the floating point weight obtained from any previous training of the neural network model can also be used.
In the case where the current task is not the first task, the floating point weight of the last task of the neural network model may be obtained as the historical floating point weight.
The input data set can be divided into a plurality of tasks by segmentation. Fig. 4 shows a schematic diagram of the data set segmentation provided by the embodiment of the application. As shown in Fig. 4, the MNIST data set can be segmented into five tasks; the neural network model learns one task at a time, and when learning the current task only the training data corresponding to the current task is available, while the training data of the historical tasks is not. This reduces the requirement on storage capacity and the power consumption of the training process.
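For instance, the five-task segmentation of MNIST sketched in Fig. 4 can be produced by grouping the ten digit classes into consecutive pairs. The loader below is a hypothetical illustration of such a split; the patent does not prescribe a particular grouping or API.

```python
import numpy as np

def split_mnist_into_tasks(images, labels, num_tasks=5):
    """Hypothetical split of MNIST into 5 tasks of 2 consecutive digit classes each."""
    classes_per_task = 10 // num_tasks
    tasks = []
    for t in range(num_tasks):
        task_classes = range(t * classes_per_task, (t + 1) * classes_per_task)
        mask = np.isin(labels, list(task_classes))
        tasks.append((images[mask], labels[mask]))
    return tasks
```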
Referring to fig. 4, in processing a data set divided into five tasks, historical floating point weights of a neural network model may be obtained as the current task arrives. In the case where the current task is the first task, the floating point weight of the neural network model may be initialized first, with the initialized floating point weight being taken as the historical floating point weight.
Further, with the training method of the application, the floating point weights corresponding to the first task are obtained and taken as the historical floating point weights. When the second task arrives, learning continues with the next task; the floating point weights of the neural network model do not need to be re-initialized, and after the current task is learned its weights are again taken as the historical floating point weights. The above steps are repeated until the fifth task has been processed. In this process, whenever the next task arrives, the actual floating point weights are determined based on the historical floating point weights, so the current task can be recognized accurately while the previously learned tasks are still remembered; continuous learning of the binary neural network is realized, the robustness of the neural network is improved, and the power consumption of data processing is reduced.
The training method for the neural network model for realizing continuous learning provided by the embodiment of the invention acquires the historical floating point weights of the neural network model, binarizes them to determine historical binary weights, processes training samples with the historical binary weights to obtain the error of the historical binary weights, determines the current actual floating point weights based on the historical floating point weights and this error, and adjusts the floating point weights of the neural network model accordingly. By introducing this mixed-precision training method for a binary neural network, the method can accurately recognize the current task while remembering previously learned tasks, thereby avoiding catastrophic forgetting during continuous learning. In addition, continuous learning of the binary neural network is achieved without a dedicated circuit design, the robustness of the neural network model is improved, and the power consumption of data processing is reduced.
Fig. 5 shows a schematic structural diagram of a training device for implementing a neural network model for continuous learning according to an embodiment of the present application, and as shown in fig. 5, the training device 300 for implementing a neural network model for continuous learning includes:
an obtaining module 301, configured to obtain a historical floating point weight of the neural network model;
The first determining module 302 is configured to perform binarization processing on the historical floating point weight, and determine a historical binary weight;
An obtaining module 303, configured to process the training sample by using the historical binary weight of the neural network model, and obtain an error of the historical binary weight;
A second determining module 304, configured to determine a current actual floating point weight based on the historical floating point weight and the error of the historical binary weight;
an adjustment module 305 is configured to adjust the floating point weight of the neural network model according to the current actual floating point weight.
Optionally, the obtaining module includes:
the first obtaining submodule is used for processing training samples by using historical binary weights of the neural network model to obtain sample loss;
And the second obtaining submodule is used for determining the error of the historical binary weight according to the sample loss.
In one possible implementation, the second determining module includes:
and the second determining submodule is used for determining the sum of the historical floating point weight and the error of the historical binary weight as the current actual floating point weight under the condition that the error of the historical binary weight has the same sign as the historical binary weight.
Optionally, the second determining module includes:
and the third determining submodule is used for determining the sum of the historical floating point weight and the error of the historical binary weight as the current actual floating point weight under the condition that the sign of the error of the historical binary weight is opposite to that of the historical binary weight and the error of the historical binary weight is smaller than a weight error threshold value.
Optionally, the second determining module includes:
The acquisition sub-module is used for acquiring an error weight preset value corresponding to the error of the historical binary weight under the condition that the error of the historical binary weight is opposite to the sign of the historical binary weight and the error of the historical binary weight is larger than a weight error threshold value;
And the fourth determination submodule is used for determining that the sum of the historical floating point weight and the error weight preset value is the current actual floating point weight.
Optionally, if the loss function of the neural network model converges, the adjusting module includes:
and a fifth determining submodule, configured to determine the floating point weight of the neural network model as the current actual floating point weight.
Optionally, if the loss function of the neural network model does not converge, the adjusting module includes:
and the updating sub-module is used for updating the initial floating point weight of the neural network model according to the current actual floating point weight.
After the current task is learned, the next task continues to be learned. At this time, the floating point weights of the neural network model do not need to be re-initialized; after learning of the current task is completed, its weights are taken as the historical floating point weights, and the above steps are repeated. When the next task arrives, the actual floating point weights are determined based on the historical floating point weights, so the current task can be recognized accurately while the previously learned tasks are still remembered; continuous learning of the binary neural network is realized, the robustness of the neural network is improved, and the power consumption of data processing is reduced.
The training device for the neural network model for realizing continuous learning provided by the embodiment of the invention can acquire the historical floating point weights of the neural network model, binarize them to determine historical binary weights, process training samples with the historical binary weights to obtain the error of the historical binary weights, determine the current actual floating point weights based on the historical floating point weights and this error, and adjust the floating point weights of the neural network model accordingly. By introducing this mixed-precision training method for a binary neural network, the neural network model can accurately recognize the current task while remembering previously learned tasks, thereby alleviating catastrophic forgetting during continuous learning. In addition, continuous learning of the binary neural network is achieved without a dedicated circuit design, the robustness of the neural network model is improved, and the power consumption of data processing is reduced.
The training device for the neural network model for realizing continuous learning provided by the invention is applied to the training method for the neural network model for realizing continuous learning shown in any one of Figs. 1 to 4, and details are not repeated here.
The electronic device in the embodiment of the invention can be a device, a component in a terminal, an integrated circuit, or a chip. The device may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), etc., and the non-mobile electronic device may be a server, a network attached storage (Network Attached Storage, NAS), a personal computer (personal computer, PC), a Television (TV), a teller machine, a self-service machine, etc., and the embodiments of the present invention are not limited in particular.
The electronic device in the embodiment of the invention can be a device with an operating system. The operating system may be an Android operating system, an IOS operating system, or other possible operating systems, and the embodiment of the present invention is not limited specifically.
Fig. 6 shows a schematic hardware structure of an electronic device according to an embodiment of the present invention. As shown in fig. 6, the electronic device 400 includes a processor 410.
As shown in FIG. 6, the processor 410 may be a general purpose central processing unit (central processing unit, CPU), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the present invention.
As shown in fig. 6, the electronic device 400 may further include a communication line 440. Communication line 440 may include a path to communicate information between the above-described components.
Optionally, as shown in fig. 6, the electronic device may further include a communication interface 420. The communication interface 420 may be one or more. Communication interface 420 may use any transceiver-like device for communicating with other devices or communication networks.
Optionally, as shown in fig. 6, the electronic device may also include a memory 430. Memory 430 is used to store computer-executable instructions for performing aspects of the present invention and is controlled by the processor for execution. The processor is configured to execute computer-executable instructions stored in the memory, thereby implementing the method provided by the embodiment of the invention.
As shown in fig. 6, the memory 430 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 430 may be stand-alone and coupled to the processor 410 via the communication line 440. The memory 430 may also be integrated with the processor 410.
Alternatively, the computer-executable instructions in the embodiments of the present invention may be referred to as application program codes, which are not particularly limited in the embodiments of the present invention.
In a particular implementation, as one embodiment, as shown in FIG. 6, processor 410 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 6.
In a specific implementation, as an embodiment, as shown in fig. 6, the terminal device may include a plurality of processors, such as a first processor 4101 and a second processor 4102 in fig. 6. Each of these processors may be a single-core processor or a multi-core processor.
Fig. 7 is a schematic structural diagram of a chip according to an embodiment of the present invention. As shown in fig. 7, the chip 500 includes one or more (including two) processors 410.
Optionally, as shown in fig. 7, the chip further includes a communication interface 420 and a memory 430, and the memory 430 may include a read-only memory and a random access memory, and provides operation instructions and data to the processor. A portion of the memory may also include non-volatile random access memory (non-volatile random access memory, NVRAM).
In some implementations, as shown in FIG. 7, the memory 430 stores elements, execution modules or data structures, or a subset thereof, or an extended set thereof.
In the embodiment of the present invention, as shown in fig. 7, by calling the operation instruction stored in the memory (the operation instruction may be stored in the operating system), the corresponding operation is performed.
As shown in fig. 7, the processor 410 controls the processing operations of any one of the terminal devices, and the processor 410 may also be referred to as a central processing unit (central processing unit, CPU).
As shown in fig. 7, the memory 430 may include read-only memory and random access memory, and provides instructions and data to the processor. A portion of the memory 430 may also include NVRAM. The processor, the communication interface, and the memory are coupled together by a bus system, which may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 540 in fig. 7.
As shown in fig. 7, the method disclosed in the above embodiment of the present invention may be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general purpose processor, a digital signal processor (DSP), an ASIC, a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor, or any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or another storage medium well known in the art. The storage medium is located in the memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
In one aspect, a computer readable storage medium is provided, in which instructions are stored, which when executed, implement the functions performed by the terminal device in the above embodiments.
In one aspect, a chip is provided, where the chip is applied to a terminal device, and the chip includes at least one processor and a communication interface, where the communication interface is coupled to the at least one processor, and the processor is configured to execute instructions to implement the functions performed by the training method for implementing a neural network model for continuous learning in the foregoing embodiments.
In the above embodiments, the implementation may be in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer program or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are performed in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, a terminal, a user equipment, or other programmable apparatus. The computer program or instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium; for example, the computer program or instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means. The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The usable medium may be a magnetic medium such as a floppy disk, a hard disk, or a magnetic tape, an optical medium such as a digital video disc (DVD), or a semiconductor medium such as a solid state drive (SSD).
Although the invention is described herein in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Although the invention has been described in connection with specific features and embodiments thereof, it will be apparent that various modifications and combinations can be made without departing from the spirit and scope of the invention. Accordingly, the specification and drawings are merely exemplary illustrations of the present invention as defined in the appended claims and are considered to cover any and all modifications, variations, combinations, or equivalents that fall within the scope of the invention. It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (8)

CN202110686037.4A | 2021-06-21 | 2021-06-21 | Training method and device for realizing continuous learning neural network model and electronic equipment | Active | CN115511042B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110686037.4A | 2021-06-21 | 2021-06-21 | Training method and device for realizing continuous learning neural network model and electronic equipment | CN115511042B (en)

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110686037.4A | 2021-06-21 | 2021-06-21 | Training method and device for realizing continuous learning neural network model and electronic equipment | CN115511042B (en)

Publications (2)

Publication Number | Publication Date
CN115511042A (en) | 2022-12-23
CN115511042B (en) | 2025-09-19

Family

ID=84500359

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202110686037.4A | Training method and device for realizing continuous learning neural network model and electronic equipment | 2021-06-21 | 2021-06-21 | Active | CN115511042B (en)

Country Status (1)

Country | Link
CN (1) | CN115511042B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109102017A (en) * | 2018-08-09 | 2018-12-28 | 百度在线网络技术(北京)有限公司 | Neural network model processing method, device, equipment and readable storage medium
CN111950700A (en) * | 2020-07-06 | 2020-11-17 | 华为技术有限公司 | A neural network optimization method and related equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN108491927A (en) * | 2018-03-16 | 2018-09-04 | 新智认知数据服务有限公司 | A kind of data processing method and device based on neural network
WO2019237357A1 (en) * | 2018-06-15 | 2019-12-19 | 华为技术有限公司 | Method and device for determining weight parameters of neural network model
US12033067B2 (en) * | 2018-10-31 | 2024-07-09 | Google Llc | Quantizing neural networks with batch normalization
CN110929852A (en) * | 2019-11-29 | 2020-03-27 | 中国科学院自动化研究所 | Deep binary neural network training method and system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109102017A (en) * | 2018-08-09 | 2018-12-28 | 百度在线网络技术(北京)有限公司 | Neural network model processing method, device, equipment and readable storage medium
CN111950700A (en) * | 2020-07-06 | 2020-11-17 | 华为技术有限公司 | A neural network optimization method and related equipment

Also Published As

Publication number | Publication date
CN115511042A (en) | 2022-12-23

Similar Documents

Publication | Publication Date | Title
CN112085186B (en)Method for determining quantization parameter of neural network and related product
US20240070225A1 (en)Reduced dot product computation circuit
US20200097827A1 (en)Processing method and accelerating device
US20190171927A1 (en)Layer-level quantization in neural networks
CN111105017A (en)Neural network quantization method and device and electronic equipment
US20250077182A1 (en)Arithmetic apparatus, operating method thereof, and neural network processor
CN112183326B (en) Face age recognition model training method and related device
US20240135174A1 (en)Data processing method, and neural network model training method and apparatus
CN111079753A (en)License plate recognition method and device based on deep learning and big data combination
CN110874627B (en) Data processing method, data processing device and computer readable medium
CN110728350A (en)Quantification for machine learning models
CN118170347B (en) Precision conversion device, data processing method, processor, and electronic device
CN113126953A (en)Method and apparatus for floating point processing
US20240233358A9 (en)Image classification method, model training method, device, storage medium, and computer program
WO2024146203A1 (en)Training method and apparatus for text recognition model for images, device, and medium
WO2020005599A1 (en)Trend prediction based on neural network
CN111242322B (en)Detection method and device for rear door sample and electronic equipment
CN116542673A (en)Fraud identification method and system applied to machine learning
KR20230097540A (en)Object detection device using object boundary prediction uncertainty and emphasis neural network and method thereof
CN115511042B (en)Training method and device for realizing continuous learning neural network model and electronic equipment
US20230136209A1 (en)Uncertainty analysis of evidential deep learning neural networks
CN117391145A (en) A convolutional neural network quantitative reasoning optimization method and system
CN118036682A (en)Method, device, equipment and medium for implementing in-memory calculation of addition neural network
KR102722476B1 (en) Neural processing elements with increased precision
CN114971866A (en) Artificial intelligence-based credit limit management and control method, device, equipment and medium

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
