Domain incremental method based on meta-learning

Technical Field
The invention relates to the technical field of computers, and in particular to a domain incremental method based on meta-learning.
Background
With the rise of deep learning, object classification methods based on convolutional neural networks have developed rapidly and their recognition accuracy has improved greatly. However, approaches based on convolutional neural networks also have a drawback: when the distribution of the test image data is inconsistent with that of the training image data, for example when the illumination, background or pose changes, the accuracy of the model drops. Therefore, when new-domain data appear, i.e. data whose distribution does not match that of the original training data, a model is needed that can learn the new domain incrementally, i.e. learn to classify the new-domain data while still remembering how to classify the old-domain data.
At present, the most intuitive domain incremental learning approach is to continue training the model with data from the new domain, but its accuracy often fails to meet the requirement: if training is insufficient, accuracy on the new-domain data is low; if the model is over-trained, accuracy on the old-domain data drops, and the two are difficult to reconcile. If instead the convolutional neural network is retrained on a direct mixture of the old-domain and new-domain data, the cost in data storage and training time is high, and in practice it keeps growing as more and more new-domain data arrive. It is therefore important to find a domain incremental recognition method that achieves high accuracy at low cost.
Disclosure of Invention
In order to solve the above problems, the invention provides a domain incremental method based on meta-learning.
The invention adopts the following technical scheme:
A domain incremental method based on meta-learning comprises the following steps:
S1, constructing a pre-training model: using the meta-learning method iTAML, several public data sets are selected as meta-data, meta-tasks are constructed and a pre-training model is learned, obtaining the parameters φ of the pre-training model, the pre-training model being a convolutional neural classification network;
S2, training the old model with the pre-training model: a classification model of the same type as the pre-training model is constructed as the old model, the parameters φ of the pre-training model are loaded into the old model, the old model is trained on the old data D_old under the guidance of a cross-entropy loss function, and after training 5% of the old data D_old are randomly sampled and retained as memory data D_memory;
S3, training a new model: the old model is trained jointly on the memory data D_memory and the new data D_new; model learning is guided by a cross-entropy loss function on the new data D_new, and by both a cross-entropy loss function and a knowledge-distillation loss function on the memory data D_memory, thereby obtaining the new model.
Further, the convolutional neural classification network is one of VGG, ResNet, MobileNet, DenseNet or SENet.
Further, the training process of the meta-learning method iTAML in step S1 is incremental: T stages are trained in total, where T is the total number of tasks and t denotes the t-th task;
when t = 1, the data of task 1 are trained normally with the cross-entropy loss to obtain the pre-training parameters φ_1. The cross-entropy loss is computed as
Loss_ce = -(1/N) Σ_{i=1}^{N} y_i log(p_i),
where D_t denotes the data set belonging to the t-th task with N samples in total, x_i is one of these samples, p_i is the model's predicted value for x_i, and y_i is the true label value;
when t ≥ 2, the parameters are initialised with those trained in the previous stage, φ_base = φ_{t-1}. The data of task 1, task 2, …, task t are taken out separately, and for each of the t tasks the initial parameters are updated and optimised from φ_base with the cross-entropy loss, giving the task-specific temporary parameters φ_1, φ_2, …, φ_t. Then φ_base is updated; when the loss no longer decreases, the final parameters of this stage are obtained as φ_t = φ_base. Said update of φ_base follows the iTAML meta-update, which averages the deviations of the task-specific parameters from φ_base and may be written as
φ_base ← φ_base + (η/t) Σ_{j=1}^{t} (φ_j − φ_base),
where η is the meta update step size.
Finally, the resulting φ_T is used as the parameters of the pre-training model.
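As a minimal illustrative sketch of the incremental meta-training described above (per-task inner updates with the cross-entropy loss followed by an averaged update of φ_base), the following PyTorch code is given; the optimizer, learning rates, number of inner steps and stopping criterion are assumptions made for illustration and are not the exact iTAML settings.

```python
import copy
import torch
import torch.nn.functional as F

def inner_update(model, task_loader, lr=1e-3, steps=1):
    """Clone phi_base and take a few cross-entropy gradient steps on one task's data."""
    task_model = copy.deepcopy(model)
    opt = torch.optim.SGD(task_model.parameters(), lr=lr)
    for _ in range(steps):
        for x, y in task_loader:
            opt.zero_grad()
            F.cross_entropy(task_model(x), y).backward()   # Loss_ce over the task samples
            opt.step()
    return task_model                                      # task-specific parameters phi_j

def meta_update(model, task_models, meta_lr=1.0):
    """phi_base <- phi_base + meta_lr * mean_j(phi_j - phi_base): averaged meta-update."""
    with torch.no_grad():
        for name, p_base in model.named_parameters():
            deltas = [dict(tm.named_parameters())[name] - p_base for tm in task_models]
            p_base.add_(meta_lr * torch.stack(deltas).mean(dim=0))

def pretrain_stage(model, task_loaders, outer_iters=10):
    """One incremental stage t: task_loaders holds the data of tasks 1..t."""
    for _ in range(outer_iters):              # in practice, stop when the loss no longer decreases
        task_models = [inner_update(model, loader) for loader in task_loaders]
        meta_update(model, task_models)
    return model                              # parameters phi_t = phi_base of this stage
```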
Further, in step S3, the joint guidance of model learning by the cross-entropy loss function and the knowledge-distillation loss function means that the overall loss is Loss = Loss_ce + Loss_distill, where Loss_ce denotes the cross-entropy loss and Loss_distill denotes the knowledge-distillation loss.
Loss_ce is computed as
Loss_ce = -(1/N) Σ_{i=1}^{N} y_i log(p_i),
where x_i ∈ D_memory ∪ D_new denotes a sample belonging to the memory data or the new data (N samples in total), p_i is the model's predicted value for x_i, and y_i is the true label value;
Loss_distill is computed as
Loss_distill = -(1/N) Σ_{i=1}^{N} q_i log(p_i),
where x_i ∈ D_memory denotes a sample belonging to the memory data (N samples in total), q_i is the old model's predicted value for x_i, and p_i is the predicted value for x_i of the model being trained.
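A minimal sketch of the overall loss of step S3 is given below, assuming that the current model's logits, the old model's logits and a boolean mask marking memory samples are available for each mini-batch; this batch layout is an assumption of the sketch, and the distillation term simply follows the soft-target form given above.

```python
import torch
import torch.nn.functional as F

def overall_loss(logits, labels, is_memory, old_logits):
    """Loss = Loss_ce (over D_memory and D_new) + Loss_distill (over D_memory only)."""
    loss_ce = F.cross_entropy(logits, labels)                 # y_i vs. p_i for all samples
    loss_distill = logits.new_zeros(())
    if is_memory.any():
        q = F.softmax(old_logits[is_memory], dim=1)           # old-model prediction q_i
        log_p = F.log_softmax(logits[is_memory], dim=1)       # current-model prediction p_i
        loss_distill = -(q * log_p).sum(dim=1).mean()         # -(1/N) * sum_i q_i * log(p_i)
    return loss_ce + loss_distill
```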
After adopting the above technical solution, the invention has the following advantages over the background art:
1. Unlike conventional pre-training on a large public data set, the meta-learned pre-training model can quickly adapt to new task data, so the target task data can be trained in less time;
2. The meta-data used for the pre-training model do not need to be stored, so the training time does not grow as new data accumulate;
3. The new model is fine-tuned on a mixture of the randomly retained 5% memory data and the new data, and its learning is guided jointly by the cross-entropy loss function and the knowledge-distillation loss function, so that the classification knowledge of the new-domain data is learned while the classification knowledge of the old domain is remembered; this greatly reduces the cost of data storage and training time while ensuring the accuracy of the model after the new domain is introduced.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Examples
In this embodiment, a batch of old data D_old is given first; D_old consists of images of Mobike bicycles and Golden Retrievers. Later a new batch of data D_new is provided; D_new consists of images of ofo (small yellow) bicycles and Huskies. The goal of this embodiment is to achieve high accuracy on both the old and the new data. As shown in Fig. 1, the method of this embodiment is as follows:
s1, constructing a pre-training model: by using a meta-learning method iTAML, a plurality of public data sets are selected as meta-data, meta-tasks are constructed, and a pre-training model is learned, such as selecting airplanes and birds (task 1), driving trucks and deer (task 2), automobiles and horses (task 3) in a cifar10 data set, selecting MobileNetV2 for a classification model structure, and obtaining a parameter phi of the pre-training model, wherein the pre-training model is a convolutional neural classification network, and it is noted that iTAML is independent of models, and any convolutional neural classification network can be selected, such as VGG, ResNet, MobileNet, DenseNet or SENet.
The training process of the meta-learning method iTAML in step S1 is incremental: T stages are trained in total, where T is the total number of tasks and t denotes the t-th task. This embodiment takes 3 tasks as an example:
when t = 1, the data of task 1 are trained normally with the cross-entropy loss to obtain the pre-training parameters φ_1. The cross-entropy loss is computed as
Loss_ce = -(1/N) Σ_{i=1}^{N} y_i log(p_i),
where D_t denotes the data set belonging to the t-th task with N samples in total, x_i is one of these samples, p_i is the model's predicted value for x_i, and y_i is the true label value;
when t ≥ 2, the parameters are initialised with those trained in the previous stage, φ_base = φ_{t-1}. The data of task 1, task 2, …, task t are taken out separately, and for each of the t tasks the initial parameters are updated and optimised from φ_base with the cross-entropy loss, giving the task-specific temporary parameters φ_1, φ_2, …, φ_t. Then φ_base is updated; when the loss no longer decreases, the final parameters of this stage are obtained as φ_t = φ_base. Said update of φ_base follows the iTAML meta-update, which averages the deviations of the task-specific parameters from φ_base and may be written as
φ_base ← φ_base + (η/t) Σ_{j=1}^{t} (φ_j − φ_base),
where η is the meta update step size.
Finally, the resulting φ_3 is used as the parameters of the pre-training model.
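As an illustrative sketch of how the three meta-tasks of this example could be assembled from CIFAR-10 and paired with a MobileNetV2 backbone, the following PyTorch/torchvision code is given; the class indices, the two-class heads and the input resizing are assumptions made only for illustration and are not fixed by the method.

```python
import torch
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms, models

# CIFAR-10 label indices: 0 airplane, 2 bird, 9 truck, 4 deer, 1 automobile, 7 horse
META_TASKS = [(0, 2), (9, 4), (1, 7)]        # task 1, task 2, task 3

def make_task_loader(dataset, classes, batch_size=64):
    """Keep only the two classes of one meta-task and relabel them 0/1."""
    keep = [i for i, y in enumerate(dataset.targets) if y in classes]
    remap = {c: j for j, c in enumerate(classes)}

    def collate(batch):
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.tensor([remap[int(y)] for y in ys])

    return DataLoader(Subset(dataset, keep), batch_size=batch_size,
                      shuffle=True, collate_fn=collate)

transform = transforms.Compose([transforms.Resize(224), transforms.ToTensor()])
cifar = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
task_loaders = [make_task_loader(cifar, c) for c in META_TASKS]

model = models.mobilenet_v2(num_classes=2)   # iTAML is model-agnostic; any backbone could be used
```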
S2, training the old model with the pre-training model: a classification model of the same type as the pre-training model is constructed as the old model, the parameters φ of the feature-extraction layers of the pre-training model are loaded into the old model, the old model is trained on the old data D_old under the guidance of a cross-entropy loss function, and after training 5% of the old data D_old are randomly sampled and retained as memory data D_memory.
After this step, the old model classifies data distributed like D_old with good accuracy, but its accuracy cannot be guaranteed on unknown data with a different distribution.
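A minimal sketch of step S2, assuming that D_old is an indexable PyTorch dataset, is given below; loading the meta-learned parameters φ and randomly retaining 5% of D_old as D_memory follow the description above, while the optimizer, batch size and number of epochs are illustrative assumptions.

```python
import random
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset

def train_old_model(model, phi_state_dict, d_old, epochs=20, lr=1e-3, memory_ratio=0.05):
    """Initialise the old model with phi, train it on D_old, and keep 5% as D_memory."""
    model.load_state_dict(phi_state_dict, strict=False)   # strict=False: the classifier head may differ
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in DataLoader(d_old, batch_size=64, shuffle=True):
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()        # cross-entropy guidance on D_old
            opt.step()
    keep = random.sample(range(len(d_old)), max(1, int(memory_ratio * len(d_old))))
    return model, Subset(d_old, keep)                      # (old model, memory data D_memory)
```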
S3, training a new model: the old model is trained jointly on the memory data D_memory and the new data D_new; model learning is guided by a cross-entropy loss function on the new data D_new, and by both a cross-entropy loss function and a knowledge-distillation loss function on the memory data D_memory, thereby obtaining the new model.
In step S3, the joint guidance of model learning by the cross-entropy loss function and the knowledge-distillation loss function means that the overall loss is Loss = Loss_ce + Loss_distill, where Loss_ce denotes the cross-entropy loss and Loss_distill denotes the knowledge-distillation loss.
Loss_ce is computed as
Loss_ce = -(1/N) Σ_{i=1}^{N} y_i log(p_i),
where x_i ∈ D_memory ∪ D_new denotes a sample belonging to the memory data or the new data (N samples in total), p_i is the model's predicted value for x_i, and y_i is the true label value;
Loss_distill is computed as
Loss_distill = -(1/N) Σ_{i=1}^{N} q_i log(p_i),
where x_i ∈ D_memory denotes a sample belonging to the memory data (N samples in total), q_i is the old model's predicted value for x_i, and p_i is the predicted value for x_i of the model being trained.
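Tying step S3 together, a minimal fine-tuning loop might look as follows; the mixed loader yielding (x, y, is_memory) batches drawn from D_memory ∪ D_new and the frozen copy of the old model acting as distillation teacher are assumptions of this sketch.

```python
import copy
import torch
import torch.nn.functional as F

def train_new_model(old_model, mixed_loader, epochs=10, lr=1e-4):
    """Fine-tune a copy of the old model on D_memory and D_new with Loss = Loss_ce + Loss_distill."""
    new_model = copy.deepcopy(old_model)
    old_model.eval()                                        # frozen teacher for the distillation term
    opt = torch.optim.SGD(new_model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y, is_memory in mixed_loader:                # is_memory marks samples drawn from D_memory
            logits = new_model(x)
            loss = F.cross_entropy(logits, y)               # Loss_ce over all samples
            if is_memory.any():
                with torch.no_grad():
                    q = F.softmax(old_model(x[is_memory]), dim=1)
                log_p = F.log_softmax(logits[is_memory], dim=1)
                loss = loss - (q * log_p).sum(dim=1).mean() # + Loss_distill on memory samples
            opt.zero_grad()
            loss.backward()
            opt.step()
    return new_model
```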
The new model is fine-tuned on a mixture of the randomly retained 5% memory data and the new data, and its learning is guided jointly by the cross-entropy loss function and the knowledge-distillation loss function, so that the classification knowledge of the new-domain data is learned while the classification knowledge of the old domain is remembered; this greatly reduces the cost of data storage and training time while ensuring the accuracy of the model after the new domain is introduced.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.