Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a class-feature-based multi-scale metric few-shot learning method, and solves the problem of inaccurate class feature extraction in existing few-shot learning.
The technical scheme of the invention is as follows: the class-feature-based multi-scale metric few-shot learning method comprises the following steps:
 S1, a data preprocessing step: augment the data by random rotations at fixed angles to expand the data volume and add image samples at different angles for the same class, and obtain a support set and a query set by the N-way K-shot method;
 S2, a feature embedding step: pass each sample x_i in the support set and the query set through the feature embedding network to obtain its feature;
 S3, a class feature extraction step: fuse the features of the multiple samples of each class in the support set through a dynamic routing mechanism, iteratively updating the weight of each input vector to obtain the overall class feature;
 S4, a multi-scale metric step: measure the similarity between the support-set class features and the query-set samples by combining three metric criteria, namely the parametric network metric, the cosine distance metric and the Euclidean distance metric.
Obtaining the support set and the query set through the N-way K-shot method comprises the following steps:
 randomly extract N classes from the data set and draw K samples from each class as the support set; the support-set samples are used to generate the prototypes of the N classes;
 then draw K samples from the remaining samples of the N classes as the query set; the query set is used to compute the accuracy of the network and thereby verify the model performance.
The specific content of the class feature extraction step comprises the following steps:
Transform the support-set sample feature vector e_ij obtained in the feature embedding step to obtain ê_ij = squash(W_s · e_ij + b_s), where W_s and b_s are a transformation matrix and a bias term, and squash is a nonlinear function that compresses a vector so that its length is normalized to between 0 and 1;
 iteratively update the weight of each input vector ê_ij to obtain the overall class feature.
The specific iterative process comprises the following steps:
d_i = softmax(b_i)
c_i = squash(Σ_j d_ij · ê_ij)
b_ij ← b_ij + ê_ij · c_i
 wherein d_ij represents the association between the input vector ê_ij and the output class feature c_i; because the initial value of b_ij is 0, the weights d_ij are uniformly distributed after the softmax function on the first iteration; c_i represents the class feature of the i-th class of support-set samples.
The multi-scale measurement step specifically comprises the following steps:
obtain the sample feature e_q of the query set according to the feature embedding step and the class feature c_i of the i-th class support set according to the class feature extraction step, and obtain the matching score r_iq^E between the i-th class support set and the q-th query sample through the Euclidean distance: r_iq^E = −‖c_i − e_q‖²;
 when the cosine similarity is used as the metric criterion, the matching score is r_iq^C = (c_i · e_q) / (‖c_i‖ · ‖e_q‖);
 when the metric is a parametric network with an attention mechanism, the parameters of the network are obtained through optimization learning, and the matching score is r_iq^A = f_φ(M_Attention(C(c_i, e_q))), where C(·,·) is a concatenation function, M_Attention(·) represents a metric with an attention mechanism, and f_φ represents a fully connected network with an activation function;
 select the category i with the largest sum of the three matching scores as the category label of the query sample x_q.
The few-shot learning method further comprises setting a loss function. The loss function is a margin-based loss, calculated as L = Σ_q Σ_{i=1}^{N} [1_iq · (1 − r_iq) + α · (1 − 1_iq) · max(0, m⁺ − (1 − r_iq))], wherein m⁺ represents the margin, α represents the weight coefficient, 1_iq represents the indicator function, and r_iq represents the matching score of the query sample and the i-th class of support-set samples;
 the loss function expresses the mutual constraint between the query sample and all class-level features: an inward pulling force acts between same-class samples and an outward pushing force between different-class samples. The first term represents the pull between the query sample q and the support class i, and its optimization goal is to reduce the distance between same-class samples; the second term constrains the minimum distance between different-class samples to be no less than the threshold m⁺.
The class-feature-based multi-scale metric few-shot learning method has the following advantages: inspired by prior metric-based few-shot learning, it focuses on the N-way K-shot (K > 1) few-shot classification task and adopts a dynamic routing mechanism to generate overall class features, which are more representative than those obtained by direct weighted averaging. In the metric module, an attention mechanism is introduced into the parametric network metric, and in addition the strengths and weaknesses of multiple metric modes are combined to jointly determine the similarity between sample features, yielding a CFMMN network model with better expressive power.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Accordingly, the following detailed description of the embodiments of the application, as presented in conjunction with the accompanying drawings, is not intended to limit the scope of the application as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application. The application is further described below with reference to the accompanying drawings.
The invention mainly addresses the N-way K-shot (K > 1) few-shot classification task and the core problem of class feature selection and extraction in few-shot learning algorithms. It combines the strengths and weaknesses of multiple metric modes into a multi-scale metric method, and the algorithm is tested on few-shot image classification using two data sets widely used in the few-shot learning field, the Omniglot and mini-ImageNet data sets. A flow chart of the class-feature-based multi-scale metric few-shot learning algorithm is shown in FIG. 1, and the network model of the algorithm is shown in FIG. 2. The method mainly comprises the following steps:
 Step one, data preprocessing. Because few-shot learning is by nature short of data resources, the data are augmented in the simplest way, by random rotations of 90°, 180° and 270°; on the one hand this expands the data volume, and on the other hand it adds image samples at different angles for the same class, which helps test the effectiveness of the model's class feature extraction. In addition, training of the few-shot image classification model is actually completed through many tasks, so the image data must be organized into tasks before being input into the model. First, N classes are randomly extracted from the data set and K samples are drawn from each class as the support set; the support-set samples are used to generate the prototypes of the N classes. Then K samples are drawn from the remaining samples of the N classes as the query set, which is used to compute the accuracy of the network and thereby verify the model performance. Each task contains a small number of classes and each class contains a small number of samples; such a task setting simulates the few-shot image classification scenario.
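 As an illustration of this step, the following is a minimal sketch of the rotation augmentation and episode construction, assuming the data set is held as a mapping from class label to a list of images; the names sample_episode and augment_with_rotations are illustrative, not from the original disclosure:

```python
import random
import torchvision.transforms.functional as TF

def augment_with_rotations(image):
    """Return the original image plus copies rotated by 90, 180 and 270 degrees."""
    return [image] + [TF.rotate(image, angle) for angle in (90, 180, 270)]

def sample_episode(dataset, n_way=5, k_shot=5, k_query=5):
    """Sample one N-way K-shot task from a {class_label: [images]} mapping:
    k_shot samples per class form the support set, k_query more form the query set."""
    classes = random.sample(list(dataset.keys()), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        samples = random.sample(dataset[cls], k_shot + k_query)
        support += [(img, label) for img in samples[:k_shot]]
        query += [(img, label) for img in samples[k_shot:]]
    return support, query
```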
Step two, feature embedding. Given a support set of K samples for each of the N classes and a query set, each sample x_i is passed through the feature embedding network to obtain the feature h_i.
 The concrete structure of the embedding module is shown in FIG. 3. The embedding module of the relation network consists of four convolutional block structures, where each convolutional block comprises a convolutional layer with 64 3×3 convolution kernels, a batch normalization layer and a ReLU layer. The first and second convolutional blocks are each followed by a 2×2 max-pooling layer to adjust the feature map size, while the last two convolutional blocks are not followed by a pooling layer.
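 A PyTorch sketch of this embedding module follows; it is a minimal reading of the structure described above (four blocks of 64 3×3 convolutions with batch normalization and ReLU, pooling after the first two blocks only), with the input channel count left as a parameter since the disclosure does not fix it:

```python
import torch.nn as nn

def conv_block(in_channels, out_channels, pool):
    """One convolutional block: 64 3x3 kernels, batch norm, ReLU, optional 2x2 max pooling."""
    layers = [
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    ]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class EmbeddingNet(nn.Module):
    """Four-block embedding module; only the first two blocks are followed by pooling."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.encoder = nn.Sequential(
            conv_block(in_channels, 64, pool=True),
            conv_block(64, 64, pool=True),
            conv_block(64, 64, pool=False),
            conv_block(64, 64, pool=False),
        )

    def forward(self, x):
        return self.encoder(x)
```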
Step three, class feature extraction. After feature extraction is completed on the support set S and the query set Q, a dynamic routing mechanism is adopted to fuse the features of the samples of the same class in the support set. To allow the model to adapt to task inputs with more support-set samples, the support-set sample feature vector e_ij obtained in step two is first transformed as follows:
ê_ij = squash(W_s · e_ij + b_s)
 where W_s and b_s are the transformation matrix and bias term. The squash function behaves like a sigmoid: it is a nonlinear function that compresses a vector so that its length is normalized to between 0 and 1. For any vector s_i, the squash function is computed as:
squash(s_i) = (‖s_i‖² / (1 + ‖s_i‖²)) · (s_i / ‖s_i‖)
 The weight of each input vector ê_ij is then updated iteratively to obtain the overall class feature; the specific iterative process is as follows:
d_i = softmax(b_i)
c_i = squash(Σ_j d_ij · ê_ij)
b_ij ← b_ij + ê_ij · c_i
 where d_ij represents the association between the input vector ê_ij and the output class feature c_i; because b_ij is initialized to 0, the weights d_ij are uniformly distributed after the softmax function on the first iteration, and c_i is the class feature of the i-th class of support-set samples. If the current sample feature belongs to the class, its similarity to c_i will be higher and its weight will grow in the next iteration; if not, its weight will shrink. In general, after several iterations, the contributions of the individual samples within the same class become differentiated through learning. Once the iterations end, the class-level feature is obtained; 3 iterations are usually sufficient.
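 The routine below sketches this squash-and-route procedure for the K transformed features of one class; it follows the standard dynamic routing scheme that the equations above describe, with tensor shapes chosen for illustration:

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    """Compress vector s so its length lies in (0, 1) while keeping its direction."""
    sq_norm = (s * s).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

def class_feature_by_routing(e_hat, num_iters=3):
    """Fuse the K transformed support features e_hat (K, D) of one class
    into a single class feature c of shape (D,)."""
    b = torch.zeros(e_hat.size(0))           # routing logits b_ij, initialised to 0
    for _ in range(num_iters):
        d = F.softmax(b, dim=0)               # uniform on the first iteration
        c = squash((d.unsqueeze(-1) * e_hat).sum(dim=0))
        b = b + (e_hat * c).sum(dim=-1)       # raise weights of features that agree with c
    return c
```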
Step four, multi-scale metric. After obtaining the class features of the N classes in the support set, a suitable way is needed to measure the similarity between the support-set class features and the query sample. Common feature similarity metrics include the cosine distance and the Euclidean distance. As shown in the network model of FIG. 2, the algorithm combines three metric criteria, the parametric network metric, the cosine distance metric and the Euclidean distance metric, as follows.
The query sample feature e_q is obtained through the preceding feature embedding network, and the class feature c_i of the i-th class support set through the dynamic routing module. The matching score r_iq^E between the i-th class support set and the q-th query sample under the Euclidean distance is:
r_iq^E = −‖c_i − e_q‖²
 (the distance is negated so that a larger score means greater similarity). If the cosine similarity is used as the metric criterion, the matching score r_iq^C is:
r_iq^C = (c_i · e_q) / (‖c_i‖ · ‖e_q‖)
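 These two scores are straightforward to compute; a short sketch, assuming the N class features are stacked into an (N, D) tensor:

```python
import torch
import torch.nn.functional as F

def euclidean_score(class_feats, query_feat):
    """Negative squared Euclidean distance; larger score = more similar. Shapes: (N, D), (D,) -> (N,)."""
    return -((class_feats - query_feat) ** 2).sum(dim=-1)

def cosine_score(class_feats, query_feat):
    """Cosine similarity between each class feature and the query feature. Shapes: (N, D), (D,) -> (N,)."""
    return F.cosine_similarity(class_feats, query_feat.unsqueeze(0), dim=-1)
```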
 As shown in FIG. 4, when the metric is a parametric network with an attention mechanism, the parameters of the network must be obtained through optimization learning, and the final matching score is:
r_iq^A = f_φ(M_Attention(C(c_i, e_q)))
 where C(·,·) is a concatenation function, M_Attention(·) represents a metric with an attention mechanism, and f_φ represents a fully connected network with an activation function. Specifically, in the attention mechanism, the concatenated feature matrix P is passed through three 1×1 convolution kernels to generate three new feature maps A, B and C, and the attention layer is then computed as follows:
H(A, B) = softmax(AᵀB)
P_AttentionOut = P + C · H(A, B)
 These equations produce the attention-weighted feature map P_AttentionOut through the idea of a residual connection. Introducing attention into the network not only allows each class feature in the support set to be examined comprehensively, but also locates the more relevant parts between the class feature and the query feature for metric learning.
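 The disclosure does not fix the tensor shapes inside this attention metric, so the sketch below makes assumptions: the class and query features are reshaped to (channels, length/2) maps and concatenated along the length axis to form P, and the residual output is scored by a small fully connected network; the class name AttentionMetric and the hidden size are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionMetric(nn.Module):
    """Parametric metric with attention: 1x1 convolutions produce A, B, C from the
    concatenated map P; H = softmax(A^T B) re-weights C, the result is added back
    to P (residual connection), and a small FC network f_phi produces the score.
    `length` is the length of the concatenated map P, e.g. AttentionMetric(64, 50)
    for two (64, 25) feature maps."""
    def __init__(self, channels, length, hidden=64):
        super().__init__()
        self.to_a = nn.Conv1d(channels, channels, kernel_size=1)
        self.to_b = nn.Conv1d(channels, channels, kernel_size=1)
        self.to_c = nn.Conv1d(channels, channels, kernel_size=1)
        self.f_phi = nn.Sequential(
            nn.Linear(channels * length, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, class_feat, query_feat):
        # P: concatenate the class and query feature maps -> (1, channels, length)
        p = torch.cat([class_feat, query_feat], dim=-1).unsqueeze(0)
        a, b, c = self.to_a(p), self.to_b(p), self.to_c(p)
        h = F.softmax(a.transpose(1, 2) @ b, dim=-1)    # (1, length, length) attention map
        p_out = p + c @ h                                # residual connection
        return self.f_phi(p_out.flatten(1)).squeeze()    # scalar matching score r^A
```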
The matching scores obtained by the three metric modes jointly determine the final result of the few-shot classification task; the category i with the largest sum of the three matching scores is selected as the category label of the query sample x_q:
y_q = argmax_i (r_iq^E + r_iq^C + r_iq^A)
 Step five, loss function design. In the invention, the loss function is linked to the optimization problem as the learning criterion; that is, the model is solved and evaluated by minimizing the loss function. A margin-based loss function is designed specifically for the CFMMN few-shot learning scenario:
L = Σ_q Σ_{i=1}^{N} [1_iq · (1 − r_iq) + α · (1 − 1_iq) · max(0, m⁺ − (1 − r_iq))]
 where m⁺ represents the margin, α represents the weight coefficient, 1_iq represents the indicator function, and r_iq represents the matching score of the query sample and the i-th class of support-set samples. The formula expresses the mutual constraint between the query sample and all class-level features, with an inward pulling force between same-class samples and an outward pushing force between different-class samples. The first term represents the pull between the query sample q and the support class i feature, whose optimization goal is to reduce the distance between same-class samples, while the second term constrains the minimum distance between different-class samples to be no less than the threshold m⁺.
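 Since the exact formula was rendered as an image in the original, the implementation below is a sketch under the margin-loss reading given above, treating 1 − r as the distance between a query and a class; the default values of m_plus and alpha are illustrative only:

```python
import torch

def margin_loss(scores, target, m_plus=0.5, alpha=0.5):
    """Margin loss over the matching scores r_iq (N,) of one query with true class `target`.
    The pull term shrinks the distance (1 - r) to the true class; the push term
    penalizes non-target classes whose distance falls below the margin m_plus."""
    is_target = torch.zeros_like(scores)
    is_target[target] = 1.0                                        # indicator 1_iq
    pull = is_target * (1.0 - scores)
    push = alpha * (1.0 - is_target) * torch.clamp(m_plus - (1.0 - scores), min=0)
    return (pull + push).sum()
```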
The class-feature-based multi-scale metric few-shot learning network model is trained and tested in a task-wise manner, so tasks must be constructed by sampling the original data set D. First, the original data set D is divided into a training data set and a test data set, corresponding to the training and test phases of few-shot learning respectively. The training data set and the test data set are then randomly sampled to generate a number of tasks, where a single task comprises a support sample set and a query sample set, and the query sample labels of a task are necessarily contained in its support sample labels; that is, after learning from a large number of training tasks, the goal of the model is to decide which category label in the support set a query sample in a test task belongs to. If the support sample set of a task has N classes with K samples each, it is called an N-way K-shot task; a typical 5-way 1-shot task is shown in FIG. 5.
To evaluate the performance of the class-feature-based multi-scale metric few-shot learning CFMMN network model of the invention, 5-way 1-shot, 5-way 5-shot, 20-way 1-shot and 20-way 5-shot experiments were carried out on the Omniglot and mini-ImageNet data sets and compared against other algorithms. A training set : test set = 8 : 2 split is adopted, and the criterion for model evaluation is the accuracy Acc of the query sample labels on the test set; the baselines of the MN, PN and RN network models listed below are the same as those reported in the corresponding papers.
As shown in FIG. 6, the experimental results of the CFMMN network model on the Omniglot data set show that the accuracy reaches 99.34% ± 0.27% on the 5-way 1-shot task and 99.55% ± 0.19% on the 5-way 5-shot task, improvements of 1.74% and 1.25% over MN, 2.04% and 0.65% over PN, and 0.44% and 0.51% over RN respectively; on the 20-way 1-shot task the accuracy improves by 1.82% over RN, and on the 20-way 5-shot task by 0.51% over RN.
The experimental results of the CFMMN network model on the mini-ImageNet data set are shown in FIG. 7; the classification accuracy on the 5-way 1-shot and 5-way 5-shot tasks improves by 5.35% and 6.74% respectively compared with RN.
The foregoing is merely a preferred embodiment of the invention. It should be understood that the invention is not limited to the form disclosed herein and is not to be regarded as excluding other embodiments; it is capable of use in numerous other combinations, modifications and environments, and of changes within the scope of the inventive concept as taught herein or within the skill or knowledge of the relevant art. Modifications and variations that do not depart from the spirit and scope of the invention are intended to fall within the scope of the appended claims.