CN115511042B - Training method and device for realizing continuous learning neural network model and electronic equipment - Google Patents

Training method and device for realizing continuous learning neural network model and electronic equipment

Info

Publication number
CN115511042B
Authority
CN
China
Prior art keywords
floating point
historical
weight
neural network
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110686037.4A
Other languages
Chinese (zh)
Other versions
CN115511042A (en)
Inventor
尚大山
张握瑜
李熠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Microelectronics of CAS
Original Assignee
Institute of Microelectronics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Microelectronics of CAS
Priority to CN202110686037.4A
Publication of CN115511042A
Application granted
Publication of CN115511042B
Legal status: Active (current)
Anticipated expiration

Abstract

The invention discloses a training method and device for a neural network model that realizes continuous learning, and an electronic device, relating to the fields of machine learning and artificial intelligence. The training method comprises: obtaining historical floating point weights of the neural network model; binarizing the historical floating point weights to determine historical binary weights; processing training samples with the historical binary weights of the neural network model to obtain the error of the historical binary weights; determining the current actual floating point weights based on the historical floating point weights and the error of the historical binary weights; and adjusting the floating point weights of the neural network model according to the current actual floating point weights. The method can accurately recognize the current task while remembering previously learned tasks, realizes continuous learning for a binary neural network, improves the robustness of the neural network model, and reduces power consumption during data processing.

Description

Training method and device for realizing continuous learning neural network model and electronic equipment
Technical Field
The invention relates to the fields of machine learning and artificial intelligence, and in particular to a training method and device for a neural network model that realizes continuous learning, and to an electronic device.
Background
With the development of machine learning and artificial intelligence, deep neural networks capable of continuous learning are being studied more and more. Continuous learning is an important deep learning capability: the ability to retain previously learned knowledge while learning new knowledge.
The human brain realizes continuous learning through complex biological mechanisms such as the Hebbian rule and the complementary learning systems. However, a conventional artificial neural network suffers from catastrophic forgetting when learning new knowledge: its parameters are overwritten and the previously learned parameters cannot be retained, so continuous learning cannot be realized.
At present, various algorithms can avoid catastrophic forgetting and realize continuous learning on a data stream, but these algorithms place high requirements on data precision and need floating point data.
On the one hand, implementing such a continuous learning algorithm with conventional complementary metal oxide semiconductor (CMOS) circuit hardware requires a complex circuit design, and the structural complexity results in high power consumption. On the other hand, when the algorithm is implemented in hardware with emerging in-memory computing devices, the precision is difficult to meet: such devices have non-ideal characteristics and currently reach fixed-point 8-bit precision, which cannot satisfy the floating-point 32-bit computing precision the algorithms require. Moreover, high-precision data processing itself consumes considerable power.
Disclosure of Invention
The invention aims to provide a training method, a training device and electronic equipment for realizing a neural network model for continuous learning, which are used for solving the problems of high storage capacity requirement and high data precision requirement of the existing continuous learning algorithm.
In a first aspect, the present invention provides a training method for implementing a neural network model for continuous learning, including:
acquiring historical floating point weights of the neural network model;
performing binarization processing on the historical floating point weight to determine a historical binary weight;
processing a training sample by using the historical binary weight of the neural network model to obtain an error of the historical binary weight;
determining a current actual floating point weight based on the historical floating point weight and an error of the historical binary weight;
And adjusting the floating point weight of the neural network model according to the current actual floating point weight.
Under the above technical scheme, the training method for realizing continuous learning acquires the historical floating point weights of the neural network model, binarizes them to determine historical binary weights, processes training samples with the historical binary weights to obtain the error of the historical binary weights, determines the current actual floating point weights based on the historical floating point weights and this error, and adjusts the floating point weights of the neural network model according to the current actual floating point weights. By introducing this mixed-precision training method for a binary neural network, the method can accurately recognize the current task while remembering previously learned tasks, thereby alleviating catastrophic forgetting during continuous learning. In addition, continuous learning of the binary neural network is achieved without a dedicated circuit design, the robustness of the neural network model is improved, and the power consumption of data processing is reduced.
In one possible implementation, the processing training samples using the historical binary weights of the neural network model to obtain an error of the historical binary weights includes:
processing training samples by using historical binary weights of the neural network model to obtain sample loss;
And determining the error of the historical binary weight according to the sample loss.
In one possible implementation, the determining the current actual floating point weight based on the historical floating point weight and the error of the historical binary weight includes:
and under the condition that the error of the historical binary weight has the same sign as the historical binary weight, determining the sum of the historical floating point weight and the error of the historical binary weight as the current actual floating point weight.
In one possible implementation, the determining the current actual floating point weight based on the historical floating point weight and the error of the historical binary weight includes:
And under the condition that the sign of the error of the historical binary weight is opposite to that of the historical binary weight and the error of the historical binary weight is smaller than a weight error threshold, determining the sum of the historical floating point weight and the error of the historical binary weight as the current actual floating point weight.
In one possible implementation, the determining the current actual floating point weight based on the historical floating point weight and the error of the historical binary weight includes:
Acquiring an error weight preset value corresponding to the error of the historical binary weight under the condition that the error of the historical binary weight is opposite in sign to the historical binary weight and the error of the historical binary weight is larger than a weight error threshold value;
and determining the sum of the historical floating point weight and the error weight preset value as the current actual floating point weight.
In one possible implementation manner, if the loss function of the neural network model converges, the adjusting the floating point weight of the neural network model according to the current actual floating point weight includes:
and determining the floating point weight of the neural network model to be the current actual floating point weight.
In one possible implementation manner, if the loss function of the neural network model does not converge, the adjusting the floating point weight of the neural network model according to the current actual floating point weight includes:
And updating the initial floating point weight of the neural network model according to the current actual floating point weight.
In a second aspect, the present invention further provides a training apparatus for implementing a neural network model for continuous learning, the apparatus comprising:
The acquisition module is used for acquiring the historical floating point weight of the neural network model;
the first determining module is used for carrying out binarization processing on the historical floating point weight and determining historical binary weight;
the obtaining module is used for processing training samples by utilizing the historical binary weight of the neural network model to obtain errors of the historical binary weight;
The second determining module is used for determining the current actual floating point weight based on the historical floating point weight and the error of the historical binary weight;
and the adjusting module is used for adjusting the floating point weight of the neural network model according to the current actual floating point weight.
In one possible implementation, the obtaining module includes:
the obtaining submodule is used for processing training samples by utilizing the historical binary weight of the neural network model to obtain sample loss;
And the first determining submodule is used for determining the error of the historical binary weight according to the sample loss.
In one possible implementation manner, the second determining module includes:
And the second determining submodule is used for determining the sum of the historical floating point weight and the error of the historical binary weight as the current actual floating point weight under the condition that the error of the historical binary weight has the same sign as the historical binary weight.
In one possible implementation manner, the second determining module includes:
And the third determining submodule is used for determining the sum of the historical floating point weight and the error of the historical binary weight as the current actual floating point weight under the condition that the sign of the error of the historical binary weight is opposite to that of the historical binary weight and the error of the historical binary weight is smaller than a weight error threshold value.
In one possible implementation manner, the second determining module includes:
the acquisition sub-module is used for acquiring an error weight preset value corresponding to the error of the historical binary weight under the condition that the error of the historical binary weight is opposite to the sign of the historical binary weight and the error of the historical binary weight is larger than a weight error threshold value;
And the fourth determination submodule is used for determining that the sum of the historical floating point weight and the error weight preset value is the current actual floating point weight.
In one possible implementation manner, if the loss function of the neural network model converges, the adjusting module includes:
and a fifth determination submodule, configured to determine a floating point weight of the neural network model as the current actual floating point weight.
In one possible implementation, if the loss function of the neural network model does not converge, the adjustment module includes:
and the updating sub-module is used for updating the initial floating point weight of the neural network model according to the current actual floating point weight.
The training device for implementing the continuous learning neural network model provided in the second aspect has the same beneficial effects as the training method for implementing the continuous learning neural network model described in the first aspect or any possible implementation manner of the first aspect, and is not described herein.
In a third aspect, the invention also provides an electronic device comprising one or more processors and one or more machine-readable media having instructions stored thereon which, when executed by the one or more processors, cause the device to implement the functions of the training apparatus for the neural network model for continuous learning described in any possible implementation of the second aspect.
The beneficial effects of the electronic device provided in the third aspect are the same as the beneficial effects of the training device for implementing the neural network model for continuous learning described in the second aspect or any possible implementation manner of the second aspect, which are not described herein.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 shows a schematic structural diagram of an artificial intelligence processing device according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of a training method for implementing a neural network model for continuous learning according to an embodiment of the present application;
FIG. 3 is a flowchart of another training method for implementing a neural network model for continuous learning according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a data set segmentation provided in an embodiment of the present application;
FIG. 5 shows a structural flow chart of a training device for implementing a neural network model for continuous learning according to an embodiment of the present application;
fig. 6 is a schematic hardware structure of an electronic device according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a chip according to an embodiment of the present invention.
Detailed Description
In order to clearly describe the technical solutions of the embodiments of the present invention, words such as "first" and "second" are used in the embodiments of the present invention to distinguish between identical or similar items having substantially the same function and effect. For example, a first threshold and a second threshold are merely different thresholds, without implying an order. Those skilled in the art will appreciate that words such as "first" and "second" do not limit the quantity or the order of execution, and the items they qualify are not necessarily different.
In the present invention, the words "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the present invention, "at least one" means one or more, and "a plurality" means two or more. "and/or" describes an association of associated objects, meaning that there may be three relationships, e.g., A and/or B, and that there may be A alone, while A and B are present, and B alone, where A, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (a, b or c) of a, b, c, a and b combination, a and c combination, b and c combination, or a, b and c combination, wherein a, b and c can be single or multiple.
An embodiment of the application provides an artificial intelligence processing device. Fig. 1 shows a schematic structural diagram of the artificial intelligence processing device provided by the embodiment of the application. As shown in Fig. 1, the artificial intelligence processing device 01 comprises an artificial intelligence processor 01A, a storage unit 01B in communication connection with the artificial intelligence processor 01A, and an interface component 01C in communication connection with the artificial intelligence processor 01A and the storage unit 01B respectively. An instruction set is stored in the storage unit 01B, and the artificial intelligence processor 01A can realize the learning process based on the instruction set and under the control of the interface component 01C.
Among the built-in data types of the instruction set (Instruction Set Architecture), floating point numbers are stored in binary format. A floating point number represents a real number; the spacing between adjacent representable values is determined by the exponent, so floating point numbers cover a very wide range of values, and the closer a value is to zero, the higher its precision.
Fig. 2 is a schematic flow chart of a training method for implementing a neural network model for continuous learning according to an embodiment of the present application, where, as shown in fig. 2, the training method for a neural network model includes:
and 101, acquiring historical floating point weights of the neural network model.
In the present application, the neural network model may be a convolutional neural network, a fully-connected neural network, or other types of neural networks, which is not particularly limited in the embodiment of the present application. The neural network model has various architectures, such as ResNet network architecture, VGG network architecture, YOLO architecture, etc., and of course, the neural network model can also be designed according to actual situations. For example, the neural network model comprises a task input layer, a full connection layer and a prediction output layer which are sequentially connected in a communication way, and the neural network model can be trained in a continuous learning mode to realize correct recognition of re-input tasks.
In the method, the fully connected layers comprise a first fully connected layer, a second fully connected layer and a third fully connected layer. The task input layer comprises 784 neurons, the first and second fully connected layers each comprise 256 neurons, and the prediction output layer comprises 10 neurons; each fully connected layer is trained with the method described in the present application.
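As a concrete illustration of this architecture, the sketch below sets up the 784-256-256-10 fully connected network in Python/NumPy. Only the layer sizes come from the paragraph above; the variable names, the initialization range and the use of NumPy are assumptions, not part of the patented method.

```python
import numpy as np

# Hypothetical NumPy sketch of the 784-256-256-10 fully connected network described above.
layer_sizes = [784, 256, 256, 10]   # task input layer, two hidden fully connected layers, prediction output layer
rng = np.random.default_rng(0)

# Historical floating point weights Wh, one matrix per fully connected layer.
Wh = [rng.uniform(-0.1, 0.1, size=(n_in, n_out))
      for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
```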
Step 102, binarizing the historical floating point weight to determine the historical binary weight.
In the forward propagation of the neural network, a sign function is typically employed as the weight binarization function:
Wb = sign(Wh),
where Wh represents the historical floating point weight, Wb represents the historical binary weight, and sign(·) is the binary sign function that maps non-negative values to +1 and negative values to −1.
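A minimal sketch of this binarization step is given below; mapping sign(0) to +1 is an assumed convention, since the text only names the sign function.

```python
import numpy as np

def binarize(wh):
    """Historical binary weight Wb = sign(Wh); mapping sign(0) to +1 is an assumption."""
    return np.where(wh >= 0, 1.0, -1.0)

# Example: derive the binary weights used for forward propagation.
# Wb = [binarize(w) for w in Wh]
```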
After the historical binary weight is obtained, it can be used as the forward propagation weight of the neural network model. In the application, the activation function adopts a sign function. The training data corresponding to the current task is propagated forward sequentially through the fully connected layer, the normalization layer and the activation function layer, and the output of the last fully connected layer passes through the output function.
The output function may be y = Softmax(x), where y represents the output value and x represents the net activation value of the last fully connected layer for the current task. That is, the value of the i-th output neuron may be
y_i = exp(x_i) / Σ_j exp(x_j),
where y_i represents the output value of the i-th output neuron and x_i represents the net activation value of the i-th neuron for the current task.
And 103, processing the training sample by using the historical binary weight of the neural network model to obtain an error of the historical binary weight.
The training data can be propagated forward on the neural network with the historical binary weights, the error is calculated with the cross entropy function, and the update amount of the binary weights is obtained. That is, the training samples may be processed with the historical binary weights of the neural network model to obtain the error of the historical binary weights. After the training data corresponding to the current task has completed one forward and backward pass, the error (ΔWh) of the historical binary weight can be calculated with the cross entropy function. In particular, the implementation of step 103 may include the following substeps:
S1, initializing the weights of the neural network model as floating point weights to obtain initial floating point weights, and binarizing the initial floating point weights to obtain historical binary weights.
S2, propagating forward through the neural network with the historical binary weights, calculating the error, and calculating the update amount of the historical binary weights from the loss. For computing derivatives, y = Hardtanh(x) may be used as the activation function, where
Hardtanh(x) = −1 if x < −1, x if −1 ≤ x ≤ 1, and +1 if x > 1,
x represents the input data and y represents the output data.
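The following sketch illustrates sub-steps S1 and S2 for the binary-weight network: forward propagation with the binary weights and a sign activation, a Softmax/cross entropy loss, and a backward pass that uses the Hardtanh derivative as a surrogate gradient for the sign activation. It omits the normalization layer and batching details, so it is a simplified illustration under those assumptions rather than the exact patented procedure.

```python
import numpy as np

def sign_act(x):
    # Binary (sign) activation used in the forward pass.
    return np.where(x >= 0, 1.0, -1.0)

def hardtanh_grad(x):
    # Derivative of Hardtanh, used as a surrogate gradient for the sign activation.
    return (np.abs(x) <= 1.0).astype(x.dtype)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward_backward(x, targets, Wb):
    """One forward/backward pass with binary weights Wb.

    Returns the cross entropy loss and the error dWh of each layer's weights.
    """
    acts, pre = [x], []
    h = x
    for i, W in enumerate(Wb):
        z = h @ W
        pre.append(z)
        h = sign_act(z) if i < len(Wb) - 1 else z    # hidden layers: sign activation; output: raw logits
        acts.append(h)
    probs = softmax(acts[-1])                         # output function y = Softmax(x)
    n = x.shape[0]
    loss = -np.log(probs[np.arange(n), targets] + 1e-12).mean()

    grads = []
    delta = (probs - np.eye(probs.shape[1])[targets]) / n
    for i in reversed(range(len(Wb))):
        grads.insert(0, acts[i].T @ delta)            # error (update amount) for layer i's weights
        if i > 0:
            delta = (delta @ Wb[i].T) * hardtanh_grad(pre[i - 1])
    return loss, grads
```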
Step 104, determining the current actual floating point weight based on the errors of the historical floating point weight and the historical binary weight.
In an alternative embodiment provided by the invention, under the condition that the error of the historical binary weight has the same sign as the historical binary weight, the sum of the historical floating point weight and the error of the historical binary weight is determined as the current actual floating point weight.
In still another embodiment of the present invention, when the sign of the error of the historical binary weight is opposite to that of the historical binary weight and the error of the historical binary weight is smaller than the weight error threshold, the sum of the historical floating point weight and the error of the historical binary weight is determined as the current actual floating point weight.
The weight error threshold may be determined from ΔWh_max, m and ε, where ΔWh_max represents the maximum update amount of Wh, m represents the memory coefficient, and ε is a minimal value that prevents the denominator from being zero.
In another embodiment provided by the embodiment of the invention, under the condition that the sign of the error of the historical binary weight is opposite to that of the historical binary weight and the error of the historical binary weight is greater than the weight error threshold, an error weight preset value corresponding to the error of the historical binary weight is acquired;
and the sum of the historical floating point weight and the error weight preset value is determined as the current actual floating point weight.
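Combining the three cases above, the element-wise computation of the current actual floating point weight can be sketched as follows. The weight error threshold and the error weight preset value are taken as given inputs, because the text defines them only by reference to ΔWh_max, the memory coefficient m and ε; comparing the magnitude of the error against the threshold is likewise an interpretation.

```python
import numpy as np

def update_floating_weight(Wh, Wb, dWh, theta, preset):
    """Current actual floating point weight from the three cases described above.

    Wh:     historical floating point weight
    Wb:     historical binary weight, sign(Wh)
    dWh:    error (update amount) of the historical binary weight
    theta:  weight error threshold (taken as given; the patent derives it from dWh_max, m and epsilon)
    preset: error weight preset value for the large opposing-sign case (taken as given)
    Comparing |dWh| against theta is an assumption about how "smaller/larger than the threshold" is meant.
    """
    same_sign = np.sign(dWh) == np.sign(Wb)
    small_opposing = ~same_sign & (np.abs(dWh) < theta)
    large_opposing = ~same_sign & (np.abs(dWh) >= theta)

    W_new = np.where(same_sign | small_opposing, Wh + dWh, Wh)  # cases 1 and 2: add the error
    W_new = np.where(large_opposing, Wh + preset, W_new)        # case 3: add the preset value
    return W_new
```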
And step 105, adjusting floating point weights of the neural network model according to the current actual floating point weights.
In an alternative embodiment provided by the embodiment of the present invention, if the loss function of the neural network model converges, the floating point weight of the neural network model is determined to be the current actual floating point weight.
In another alternative embodiment provided by the embodiment of the present invention, if the loss function of the neural network model does not converge, the historical floating point weight of the neural network model is updated according to the current actual floating point weight.
After the current task is learned, the next task continues to be learned. At this time, the floating point weights of the neural network model do not need to be re-initialized; after learning of the current task is completed, its weights are taken as the historical floating point weights, and the above steps are repeated. When the next task arrives, the actual floating point weights are determined based on the historical floating point weights. In this way, the current task can be recognized accurately while the previously learned tasks are still remembered, continuous learning of the binary neural network is realized, the robustness of the neural network is improved, and the power consumption of data processing is reduced.
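A sketch of the resulting loop over sequential tasks is shown below. It reuses the binarize, forward_backward and update_floating_weight sketches given earlier; the task list, the number of epochs and the values of the threshold and preset are placeholders, and only the carrying-over of the floating point weights from task to task reflects the paragraph above.

```python
def train_one_task(Wh, task_data, epochs=1):
    """Placeholder per-task trainer: binarize, forward/backward, update (steps 102-105).

    Assumes the binarize, forward_backward and update_floating_weight sketches above;
    theta and preset are hypothetical hyperparameters.
    """
    images, labels = task_data
    for _ in range(epochs):
        Wb = [binarize(w) for w in Wh]
        _, dWh = forward_backward(images, labels, Wb)
        Wh = [update_floating_weight(w, b, g, theta=0.01, preset=0.0)
              for w, b, g in zip(Wh, Wb, dWh)]
    return Wh

# Continual learning over sequential tasks: Wh is carried over, never re-initialized.
# `tasks` is assumed to be a list of (images, labels) pairs, e.g. the split-MNIST tasks.
for task_data in tasks:
    Wh = train_one_task(Wh, task_data)
```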
The training method for the neural network model for realizing continuous learning provided by the embodiment of the invention acquires the historical floating point weights of the neural network model, binarizes them to determine historical binary weights, processes training samples with the historical binary weights to obtain the error of the historical binary weights, determines the current actual floating point weights based on the historical floating point weights and this error, and adjusts the floating point weights of the neural network model accordingly. By introducing this mixed-precision training method for a binary neural network, the neural network model can accurately recognize the current task while remembering previously learned tasks, thereby avoiding catastrophic forgetting during continuous learning. In addition, continuous learning of the binary neural network is achieved without a dedicated circuit design, the robustness of the neural network model is improved, and the power consumption of data processing is reduced.
Optionally, fig. 3 is a schematic flow chart of another training method for implementing a neural network model for continuous learning according to an embodiment of the present application, referring to fig. 3, the training method for implementing a neural network model for continuous learning includes:
step 201, acquiring historical floating point weights of a neural network model.
In the present application, the neural network model may be a convolutional neural network, a fully-connected neural network, or other types of neural networks, which is not particularly limited in the embodiment of the present application.
The neural network model comprises a task input layer, a first full-connection layer, a second full-connection layer, a third full-connection layer and a prediction output layer which are sequentially in communication connection, and can be trained in a continuous learning mode to realize correct recognition of re-input tasks.
In the method, the task input layer comprises 784 neurons, the first and second fully connected layers each comprise 256 neurons, and the prediction output layer comprises 10 neurons; each fully connected layer is trained with the method described in the present application.
Among the built-in data types of the instruction set (Instruction Set Architecture), floating point numbers are stored in binary format. A floating point number represents a real number; the spacing between adjacent representable values is determined by the exponent, so floating point numbers cover a very wide range of values, and the closer a value is to zero, the higher its precision.
In the application, the historical floating point weight refers to the floating point weight corresponding to the neural network model after the neural network model identifies the previous input task.
When the current task arrives, a historical floating point weight of the initial neural network model can be acquired. In the case where the current task is the first task, the floating point weight of the neural network model may be initialized first, with the initialized floating point weight being taken as the historical floating point weight.
In the case where the current task is not the first task, the floating point weight of the last task of the neural network model may be obtained as the historical floating point weight.
The input data set may be divided into a plurality of tasks by means of segmentation, for example, in the case that the input data set is an MNIST data set, the input data set may be divided into 5 tasks in sequence.
In the present application, the input data set may be voice data, image data, text data, or the like, which is not particularly limited in the embodiment of the present application.
Step 202, binarizing the historical floating point weight to determine the historical binary weight.
In the forward propagation of the neural network, a sign function is typically employed as the weight binarization function:
Wb = sign(Wh);
where Wh represents the historical floating point weight, Wb represents the historical binary weight, and sign(·) is the binary sign function that maps non-negative values to +1 and negative values to −1.
After the historical binary weight is obtained, it can be used as the forward propagation weight of the neural network model. In the application, the activation function adopts a sign function. The training data corresponding to the current task is propagated forward sequentially through the fully connected layer, the normalization layer and the activation function layer, and the output of the last fully connected layer passes through the output function.
The output function may be:
y = Softmax(x);
where y represents the output value and x represents the net activation value of the last fully connected layer for the current task.
That is, the value of the i-th output neuron may be
y_i = exp(x_i) / Σ_j exp(x_j),
where y_i represents the output value of the i-th output neuron and x_i represents the net activation value of the i-th neuron for the current task.
And 203, processing the training sample by using the historical binary weight of the neural network model to obtain sample loss.
In the application, the training data can be propagated forward on the neural network with the historical binary weights, the error is calculated with the cross entropy function, and the update amount of the binary weights is obtained. That is, the training samples may be processed with the historical binary weights of the neural network model to obtain the error of the historical binary weights. After the training data corresponding to the current task has completed one forward and backward pass, the error (ΔWh) of the historical binary weight can be calculated with the cross entropy function.
In the application, the weights of the neural network model can be initialized as floating point weights to obtain the historical floating point weights, the historical floating point weights are binarized to obtain the historical binary weights, the neural network with the historical binary weights is used for forward propagation, the error is calculated, and the update amount of the historical binary weights is calculated from the loss; that is, the error of the historical binary weights is obtained.
And 204, determining the error of the historical binary weight according to the sample loss.
After the training data corresponding to the current task is propagated back and forth once, the error (ΔWh) of the historical binary weight can be calculated by using a cross entropy function.
In the method, y = Hardtanh(x) may be used as the activation function for computing derivatives when the error of the historical binary weight is calculated with the cross entropy function, where
Hardtanh(x) = −1 if x < −1, x if −1 ≤ x ≤ 1, and +1 if x > 1.
In the above formula, x represents the input data and y represents the output data.
Step 205, determining the current actual floating point weight based on the error of the historical floating point weight and the historical binary weight.
In an alternative embodiment provided by the invention, under the condition that the error of the historical binary weight has the same sign as the historical binary weight, the sum of the historical floating point weight and the error of the historical binary weight is determined as the current actual floating point weight.
In still another embodiment of the present invention, when the sign of the error of the historical binary weight is opposite to that of the historical binary weight and the error of the historical binary weight is smaller than the weight error threshold, the sum of the historical floating point weight and the error of the historical binary weight is determined as the current actual floating point weight.
The weight error threshold may be determined from ΔWh_max, m and ε, where ΔWh_max represents the maximum update amount of Wh, m represents the memory coefficient, and ε is a minimal value that prevents the denominator from being zero.
In another embodiment provided by the embodiment of the invention, under the condition that the sign of the error of the historical binary weight is opposite to that of the historical binary weight and the error of the historical binary weight is larger than a weight error threshold value, an error weight preset value corresponding to the error of the historical binary weight is obtained;
And determining the sum of the historical floating point weight and the error weight preset value as the current actual floating point weight.
And 206, adjusting the floating point weight of the neural network model according to the current actual floating point weight.
In an alternative embodiment provided by the embodiment of the present invention, if the loss function of the neural network model converges, the floating point weight of the neural network model is determined to be the current actual floating point weight.
In another alternative embodiment provided by the embodiment of the present invention, if the loss function of the neural network model does not converge, the initial floating point weight of the neural network model is updated according to the current actual floating point weight.
By adopting the neural network model trained by the embodiment of the application, the accurate identification of the current task can be realized, the task learned before can be memorized, the continuous learning of the binary neural network can be realized, the robustness of the neural network model can be improved, and the power consumption in the data processing process can be reduced.
In the present application, the historical floating point weight refers to the floating point weight obtained from the most recent training of the neural network model. Of course, the floating point weight obtained from any previous training of the neural network model can also be used.
In the case where the current task is not the first task, the floating point weight of the last task of the neural network model may be obtained as the historical floating point weight.
The input data set can be divided into a plurality of tasks by segmentation. Fig. 4 shows a schematic diagram of the data set segmentation provided by the embodiment of the application. As shown in Fig. 4, the MNIST data set can be segmented into five tasks; the neural network model learns one task at a time, and when learning the current task only the training data corresponding to the current task is available, while the training data of the historical tasks is not. This reduces the requirement on storage capacity and the power consumption of the training process.
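For instance, the five-task segmentation of MNIST sketched in Fig. 4 can be produced by grouping the ten digit classes into consecutive pairs. The loader below is a hypothetical illustration of such a split; the patent does not prescribe a particular grouping or API.

```python
import numpy as np

def split_mnist_into_tasks(images, labels, num_tasks=5):
    """Hypothetical split of MNIST into 5 tasks of 2 consecutive digit classes each."""
    classes_per_task = 10 // num_tasks
    tasks = []
    for t in range(num_tasks):
        task_classes = range(t * classes_per_task, (t + 1) * classes_per_task)
        mask = np.isin(labels, list(task_classes))
        tasks.append((images[mask], labels[mask]))
    return tasks
```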
Referring to fig. 4, in processing a data set divided into five tasks, historical floating point weights of a neural network model may be obtained as the current task arrives. In the case where the current task is the first task, the floating point weight of the neural network model may be initialized first, with the initialized floating point weight being taken as the historical floating point weight.
Further, with the training method of the application, the floating point weights corresponding to the first task are obtained and taken as the historical floating point weights. When the second task arrives, learning continues with the next task; the floating point weights of the neural network model do not need to be re-initialized, and after the current task is learned its weights are again taken as the historical floating point weights. The above steps are repeated until the fifth task has been processed. In this process, whenever the next task arrives, the actual floating point weights are determined based on the historical floating point weights, so the current task can be recognized accurately while the previously learned tasks are still remembered; continuous learning of the binary neural network is realized, the robustness of the neural network is improved, and the power consumption of data processing is reduced.
The training method for the neural network model for realizing continuous learning provided by the embodiment of the invention acquires the historical floating point weights of the neural network model, binarizes them to determine historical binary weights, processes training samples with the historical binary weights to obtain the error of the historical binary weights, determines the current actual floating point weights based on the historical floating point weights and this error, and adjusts the floating point weights of the neural network model accordingly. By introducing this mixed-precision training method for a binary neural network, the method can accurately recognize the current task while remembering previously learned tasks, thereby avoiding catastrophic forgetting during continuous learning. In addition, continuous learning of the binary neural network is achieved without a dedicated circuit design, the robustness of the neural network model is improved, and the power consumption of data processing is reduced.
Fig. 5 shows a schematic structural diagram of a training device for implementing a neural network model for continuous learning according to an embodiment of the present application, and as shown in fig. 5, the training device 300 for implementing a neural network model for continuous learning includes:
an obtaining module 301, configured to obtain a historical floating point weight of the neural network model;
The first determining module 302 is configured to perform binarization processing on the historical floating point weight, and determine a historical binary weight;
An obtaining module 303, configured to process the training sample by using the historical binary weight of the neural network model, and obtain an error of the historical binary weight;
A second determining module 304, configured to determine a current actual floating point weight based on the historical floating point weight and the error of the historical binary weight;
an adjustment module 305 is configured to adjust the floating point weight of the neural network model according to the current actual floating point weight.
Optionally, the obtaining module includes:
the first obtaining submodule is used for processing training samples by using historical binary weights of the neural network model to obtain sample loss;
And the second obtaining submodule is used for determining the error of the historical binary weight according to the sample loss.
In one possible implementation, the second determining module includes:
and the second determining submodule is used for determining the sum of the historical floating point weight and the error of the historical binary weight as the current actual floating point weight under the condition that the error of the historical binary weight has the same sign as the historical binary weight.
Optionally, the second determining module includes:
and the third determining submodule is used for determining the sum of the historical floating point weight and the error of the historical binary weight as the current actual floating point weight under the condition that the sign of the error of the historical binary weight is opposite to that of the historical binary weight and the error of the historical binary weight is smaller than a weight error threshold value.
Optionally, the second determining module includes:
The acquisition sub-module is used for acquiring an error weight preset value corresponding to the error of the historical binary weight under the condition that the error of the historical binary weight is opposite to the sign of the historical binary weight and the error of the historical binary weight is larger than a weight error threshold value;
And the fourth determination submodule is used for determining that the sum of the historical floating point weight and the error weight preset value is the current actual floating point weight.
Optionally, if the loss function of the neural network model converges, the adjusting module includes:
and a fifth determining submodule, configured to determine the floating point weight of the neural network model as the current actual floating point weight.
Optionally, if the loss function of the neural network model does not converge, the adjusting module includes:
and the updating sub-module is used for updating the initial floating point weight of the neural network model according to the current actual floating point weight.
After the current task is learned, the next task continues to be learned. At this time, the floating point weights of the neural network model do not need to be re-initialized; after learning of the current task is completed, its weights are taken as the historical floating point weights, and the above steps are repeated. When the next task arrives, the actual floating point weights are determined based on the historical floating point weights, so the current task can be recognized accurately while the previously learned tasks are still remembered; continuous learning of the binary neural network is realized, the robustness of the neural network is improved, and the power consumption of data processing is reduced.
The training device for the neural network model for realizing continuous learning provided by the embodiment of the invention can acquire the historical floating point weights of the neural network model, binarize them to determine historical binary weights, process training samples with the historical binary weights to obtain the error of the historical binary weights, determine the current actual floating point weights based on the historical floating point weights and this error, and adjust the floating point weights of the neural network model accordingly. By introducing this mixed-precision training method for a binary neural network, the neural network model can accurately recognize the current task while remembering previously learned tasks, thereby alleviating catastrophic forgetting during continuous learning. In addition, continuous learning of the binary neural network is achieved without a dedicated circuit design, the robustness of the neural network model is improved, and the power consumption of data processing is reduced.
The training device for the neural network model for realizing continuous learning provided by the invention is applied to the training method for the neural network model for realizing continuous learning shown in any one of Figs. 1 to 4, and details are not repeated here.
The electronic device in the embodiment of the invention can be a device, a component in a terminal, an integrated circuit, or a chip. The device may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), etc., and the non-mobile electronic device may be a server, a network attached storage (Network Attached Storage, NAS), a personal computer (personal computer, PC), a Television (TV), a teller machine, a self-service machine, etc., and the embodiments of the present invention are not limited in particular.
The electronic device in the embodiment of the invention can be a device with an operating system. The operating system may be an Android operating system, an IOS operating system, or other possible operating systems, and the embodiment of the present invention is not limited specifically.
Fig. 6 shows a schematic hardware structure of an electronic device according to an embodiment of the present invention. As shown in fig. 6, the electronic device 400 includes a processor 410.
As shown in FIG. 6, the processor 410 may be a general purpose central processing unit (central processing unit, CPU), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the present invention.
As shown in fig. 6, the electronic device 400 may further include a communication line 440. Communication line 440 may include a path to communicate information between the above-described components.
Optionally, as shown in fig. 6, the electronic device may further include a communication interface 420. The communication interface 420 may be one or more. Communication interface 420 may use any transceiver-like device for communicating with other devices or communication networks.
Optionally, as shown in fig. 6, the electronic device may also include a memory 430. Memory 430 is used to store computer-executable instructions for performing aspects of the present invention and is controlled by the processor for execution. The processor is configured to execute computer-executable instructions stored in the memory, thereby implementing the method provided by the embodiment of the invention.
As shown in fig. 6, the memory 430 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 430 may be stand-alone and coupled to the processor 410 via the communication line 440. The memory 430 may also be integrated with the processor 410.
Alternatively, the computer-executable instructions in the embodiments of the present invention may be referred to as application program codes, which are not particularly limited in the embodiments of the present invention.
In a particular implementation, as one embodiment, as shown in FIG. 6, processor 410 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 6.
In a specific implementation, as an embodiment, as shown in fig. 6, the terminal device may include a plurality of processors, such as a first processor 4101 and a second processor 4102 in fig. 6. Each of these processors may be a single-core processor or a multi-core processor.
Fig. 7 is a schematic structural diagram of a chip according to an embodiment of the present invention. As shown in fig. 7, the chip 500 includes one or more (including two) processors 410.
Optionally, as shown in fig. 7, the chip further includes a communication interface 420 and a memory 430, and the memory 430 may include a read-only memory and a random access memory, and provides operation instructions and data to the processor. A portion of the memory may also include non-volatile random access memory (non-volatile random access memory, NVRAM).
In some implementations, as shown in FIG. 7, the memory 430 stores elements, execution modules or data structures, or a subset thereof, or an extended set thereof.
In the embodiment of the present invention, as shown in fig. 7, by calling the operation instruction stored in the memory (the operation instruction may be stored in the operating system), the corresponding operation is performed.
As shown in fig. 7, the processor 410 controls the processing operations of any one of the terminal devices, and the processor 410 may also be referred to as a central processing unit (central processing unit, CPU).
As shown in fig. 7, the memory 430 may include read-only memory and random access memory, and provides instructions and data to the processor. A portion of the memory 430 may also include NVRAM. The processor, the communication interface, and the memory are coupled together by a bus system, which may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 540 in fig. 7.
As shown in fig. 7, the method disclosed in the above embodiment of the present invention may be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general purpose processor, a digital signal processor (DSP), an ASIC, a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor, or any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or another storage medium well known in the art. The storage medium is located in the memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
In one aspect, a computer readable storage medium is provided, in which instructions are stored, which when executed, implement the functions performed by the terminal device in the above embodiments.
In one aspect, a chip is provided, where the chip is applied to a terminal device, and the chip includes at least one processor and a communication interface, where the communication interface is coupled to the at least one processor, and the processor is configured to execute instructions to implement the functions performed by the training method for implementing a neural network model for continuous learning in the foregoing embodiments.
In the above embodiments, the implementation may be in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer program or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are performed in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, a terminal, a user equipment, or other programmable apparatus. The computer program or instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium; for example, the computer program or instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means. The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The usable medium may be a magnetic medium such as a floppy disk, a hard disk, or a magnetic tape, an optical medium such as a digital video disc (DVD), or a semiconductor medium such as a solid state drive (SSD).
Although the invention is described herein in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Although the invention has been described in connection with specific features and embodiments thereof, it will be apparent that various modifications and combinations can be made without departing from the spirit and scope of the invention. Accordingly, the specification and drawings are merely exemplary illustrations of the present invention as defined in the appended claims and are considered to cover any and all modifications, variations, combinations, or equivalents that fall within the scope of the invention. It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (8)

CN202110686037.4A | 2021-06-21 | 2021-06-21 | Training method and device for realizing continuous learning neural network model and electronic equipment | Active | CN115511042B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110686037.4A | 2021-06-21 | 2021-06-21 | Training method and device for realizing continuous learning neural network model and electronic equipment | CN115511042B (en)

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110686037.4A | 2021-06-21 | 2021-06-21 | Training method and device for realizing continuous learning neural network model and electronic equipment | CN115511042B (en)

Publications (2)

Publication Number | Publication Date
CN115511042A (en) | 2022-12-23
CN115511042B (en) | 2025-09-19

Family

ID=84500359

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202110686037.4A | Training method and device for realizing continuous learning neural network model and electronic equipment | 2021-06-21 | 2021-06-21 | Active | CN115511042B (en)

Country Status (1)

Country | Link
CN (1) | CN115511042B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109102017A (en) * | 2018-08-09 | 2018-12-28 | 百度在线网络技术(北京)有限公司 | Neural network model processing method, device, equipment and readable storage medium
CN111950700A (en) * | 2020-07-06 | 2020-11-17 | 华为技术有限公司 | A neural network optimization method and related equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN108491927A (en) * | 2018-03-16 | 2018-09-04 | 新智认知数据服务有限公司 | A kind of data processing method and device based on neural network
WO2019237357A1 (en) * | 2018-06-15 | 2019-12-19 | 华为技术有限公司 | Method and device for determining weight parameters of neural network model
US12033067B2 (en) * | 2018-10-31 | 2024-07-09 | Google Llc | Quantizing neural networks with batch normalization
CN110929852A (en) * | 2019-11-29 | 2020-03-27 | 中国科学院自动化研究所 | Deep binary neural network training method and system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109102017A (en) * | 2018-08-09 | 2018-12-28 | 百度在线网络技术(北京)有限公司 | Neural network model processing method, device, equipment and readable storage medium
CN111950700A (en) * | 2020-07-06 | 2020-11-17 | 华为技术有限公司 | A neural network optimization method and related equipment

Also Published As

Publication number | Publication date
CN115511042A (en) | 2022-12-23

Similar Documents

Publication | Publication Date | Title
CN112085186B (en)Method for determining quantization parameter of neural network and related product
US20240070225A1 (en)Reduced dot product computation circuit
US20200097827A1 (en)Processing method and accelerating device
US20190171927A1 (en)Layer-level quantization in neural networks
CN111105017A (en)Neural network quantization method and device and electronic equipment
US20250077182A1 (en)Arithmetic apparatus, operating method thereof, and neural network processor
CN112183326B (en) Face age recognition model training method and related device
US20240135174A1 (en)Data processing method, and neural network model training method and apparatus
CN111079753A (en)License plate recognition method and device based on deep learning and big data combination
CN110874627B (en) Data processing method, data processing device and computer readable medium
CN110728350A (en)Quantification for machine learning models
CN118170347B (en) Precision conversion device, data processing method, processor, and electronic device
CN113126953A (en)Method and apparatus for floating point processing
US20240233358A9 (en)Image classification method, model training method, device, storage medium, and computer program
WO2024146203A1 (en)Training method and apparatus for text recognition model for images, device, and medium
WO2020005599A1 (en)Trend prediction based on neural network
CN111242322B (en)Detection method and device for rear door sample and electronic equipment
CN116542673A (en)Fraud identification method and system applied to machine learning
KR20230097540A (en)Object detection device using object boundary prediction uncertainty and emphasis neural network and method thereof
CN115511042B (en)Training method and device for realizing continuous learning neural network model and electronic equipment
US20230136209A1 (en)Uncertainty analysis of evidential deep learning neural networks
CN117391145A (en) A convolutional neural network quantitative reasoning optimization method and system
CN118036682A (en)Method, device, equipment and medium for implementing in-memory calculation of addition neural network
KR102722476B1 (en) Neural processing elements with increased precision
CN114971866A (en) Artificial intelligence-based credit limit management and control method, device, equipment and medium

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
