Disclosure of Invention
To address the defects of the prior art, the application provides a multi-scene adaptive model fusion method and a face recognition system. By fusing models, the application makes a single threshold value compatible with a variety of face recognition scenes, allows rapid deployment in different recognition scenes, and improves both the recognition speed and the recognition accuracy of the system. The application adopts the following technical scheme.
Firstly, in order to achieve the above aim, a multi-scene adaptive model fusion method is provided, which comprises the following steps. Step one: exhaustively enumerate the combinations of face recognition models corresponding to different scenes in a model library. Step two: from the combinations of step one, screen out those whose operation speed meets the requirement of the target platform, and record each as a model combination. Step three: for each model combination obtained in step two, calculate its accuracy and threshold in each scene, denoted {a1, t1}, {a2, t2}, ..., {an, tn}, and then normalize each accuracy and each threshold to obtain the normalized accuracy An and the normalized threshold Tn of the model combination in each scene, where n is the scene number, an is the accuracy of the model combination in the nth scene, and tn is its threshold in the nth scene. Step four: calculate the weighted sum of the normalized accuracies of each model combination over the scenes, ACC = w1*A1 + w2*A2 + ... + wn*An, and the variance of its normalized thresholds, VAR = var(T1, T2, ..., Tn), where wn is the weighting value of the normalized accuracy of the model combination in the nth scene. Step five: from the weighted sum ACC and the variance VAR, calculate the evaluation value Eval = ACC + (1 - VAR) of each model combination. Step six: screen out the model combination with the highest evaluation value Eval, construct a fusion model from it, splice the feature vectors extracted by each face recognition model in the model combination into a combined feature vector, and carry out face recognition on the combined feature vector.
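Steps three to six above can be sketched as follows. This is a minimal illustration only, with hypothetical combination names, made-up normalized statistics, and an assumed helper name `evaluate_combinations`; it is not the application's actual implementation.

```python
import statistics

def evaluate_combinations(combos, weights):
    """Score each model combination by Eval = ACC + (1 - VAR).

    `combos` maps a combination name to its per-scene (accuracy, threshold)
    pairs, already normalized to [0, 1]; `weights` are the per-scene
    weighting values w1..wn applied to the accuracies.
    """
    scores = {}
    for name, scene_stats in combos.items():
        accs = [a for a, _ in scene_stats]
        thrs = [t for _, t in scene_stats]
        acc = sum(w * a for w, a in zip(weights, accs))   # weighted sum ACC
        var = statistics.pvariance(thrs)                  # threshold variance VAR
        scores[name] = acc + (1.0 - var)                  # evaluation value Eval
    best = max(scores, key=scores.get)                    # step six: highest Eval
    return best, scores

# Hypothetical normalized stats for two combinations across three scenes:
# combo_B is more accurate in one scene, but its thresholds vary widely,
# so combo_A wins on the combined index.
combos = {
    "combo_A": [(0.90, 0.52), (0.85, 0.55), (0.88, 0.50)],
    "combo_B": [(0.95, 0.30), (0.70, 0.80), (0.75, 0.60)],
}
best, scores = evaluate_combinations(combos, weights=[0.4, 0.3, 0.3])
```

Note that a small threshold variance rewards combinations whose thresholds barely move across scenes, which is what makes a single unified threshold workable.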
Optionally, in the third step, the accuracy an of the model combination in the nth scene is obtained by calculating the ROC curve of the model combination in the nth scene from the test set corresponding to that scene, searching the ROC curve for the recall rate at the required false detection rate, and calculating the accuracy an of the model combination in the nth scene from it.
Optionally, in the third step of the multi-scene adaptive model fusion method, the threshold tn of the model combination in the nth scene is obtained by calculating the ROC curve of the model combination in the nth scene from the test set corresponding to that scene, and searching the ROC curve for the threshold tn that meets the false detection rate requirement.
Optionally, in the multi-scene adaptive model fusion method according to any one of the above, calculating the ROC curve of the model combination in the nth scene from the test set corresponding to the nth scene comprises: step r1, extracting, with each face recognition model included in the model combination, the model feature vectors corresponding to the face images in the test set; step r2, splicing the model feature vectors extracted in step r1 into multi-dimensional vectors; and step r3, comparing the inter-vector distance between each spliced multi-dimensional vector and the recognition vector corresponding to each face image, and obtaining the false detection rate and the recall rate at each of a series of thresholds.
The multi-scene adaptive model fusion method is characterized in that the fusion model performs recognition processing on a face image to be recognized according to the following steps: step S1, extracting, with each face recognition model contained in the fusion model, the model feature vector corresponding to the face image to be recognized; step S2, splicing the model feature vectors extracted in step S1 into a multi-dimensional vector; and step S3, comparing the inter-vector distance between the spliced multi-dimensional vector and the recognition vector corresponding to each recognition object, and, when the Euclidean distance between the two vectors is smaller than the threshold Tn, outputting the recognition object corresponding to that recognition vector as the recognition result.
Optionally, in the multi-scene adaptive model fusion method according to any one of the above, the recognition vectors corresponding to the recognition objects are stored in the storage unit in advance as follows: firstly, extract, with each face recognition model contained in the fusion model, the model feature vectors corresponding to each recognition object; then splice the extracted model feature vectors into a single recognition vector; finally, store the recognition vector in the storage unit and mark the correspondence between the recognition vector and the recognition object.
Meanwhile, in order to achieve the above purpose, the application also provides a face recognition system comprising an image acquisition module, a first storage unit and a second storage unit. The image acquisition module is used for acquiring the face image to be recognized. The first storage unit stores a model library in which each face recognition model corresponds to a different scene. The second storage unit stores an executable program which, when executed by a processor, causes the processor to construct a fusion model according to the method steps of any one of the above, record the recognition vectors corresponding to each recognition object with the constructed fusion model, and perform recognition processing on the face image to be recognized with the constructed fusion model.
Optionally, in the face recognition system according to any one of the above, performing recognition processing on the face image to be recognized with the constructed fusion model comprises: step S1, extracting, with each face recognition model included in the fusion model, the model feature vector corresponding to the face image to be recognized; step S2, splicing the model feature vectors extracted in step S1 into a multi-dimensional vector; and step S3, comparing the inter-vector distance between the spliced multi-dimensional vector and the recognition vector corresponding to each recognition object, outputting the recognition object corresponding to the recognition vector as the recognition result when the Euclidean distance between the two vectors is smaller than the threshold Tn, and otherwise judging that the recognition fails.
Optionally, the face recognition system according to any one of the above further comprises an interaction interface configured to receive, for the fourth step, the setting of the weighting value wn of the normalized accuracy corresponding to the model combination in the nth scene.
Optionally, the face recognition system according to any one of the above further comprises a recognition object storage unit for storing the recognition vectors corresponding to the recognition objects. The recognition vectors are stored by firstly extracting, with each face recognition model contained in the fusion model, the model feature vectors corresponding to each recognition object, then splicing the extracted model feature vectors into a multi-dimensional recognition vector, and finally storing the multi-dimensional recognition vector in the recognition object storage unit and marking its correspondence with the recognition object.
Advantageous effects
According to the application, a plurality of face recognition models whose operation speed meets the requirement of the target platform are screened out, combined into a plurality of model combinations, and the accuracy and threshold of each model combination are then evaluated in different scenes. The model combinations with high accuracy and strong threshold compatibility are screened out from these evaluations to construct a fusion model. The fusion model obtained in this way makes a single threshold compatible with a variety of face recognition scenes, can be rapidly deployed in different recognition scenes, and improves both the recognition speed and the recognition accuracy of the system.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application.
Detailed Description
In order to make the purpose and technical solutions of the embodiments of the present application clearer, the technical solutions of the embodiments are described fully below with reference to the accompanying drawings. It will be apparent that the described embodiments are some, but not all, embodiments of the application. All other embodiments obtained by a person skilled in the art without creative effort, based on the described embodiments, fall within the protection scope of the present application.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In the present application, "and/or" means that either item exists alone or both exist together.
"Connected" as used herein means either a direct connection between components or an indirect connection between components via other components.
Fig. 1 is a schematic diagram of the multi-scene adaptive model fusion method of the present application, which constructs a fusion model suitable for multiple application scenes in the manner shown in Fig. 2. The fusion model can be installed, through an installation program, in a face recognition system with an image acquisition module, so that face images can be recognized effectively under different working environments with a unified threshold through the steps of Fig. 1.
The face recognition system may be configured to include:
The image acquisition module, which can be realized by a camera or an image sensor, is used for acquiring the face image to be recognized;
The first storage unit can be arranged inside the face recognition system or accessed in the cloud through a communication network; it stores a model library in which each face recognition model corresponds to a different scene;
The second storage unit can optionally be arranged locally at the image acquisition module, or can optionally provide recognition through cloud interaction. It stores an executable program which, when executed by a processor local to the image acquisition module or a processor in the cloud, causes the corresponding processor to construct a fusion model according to the following method steps, record the recognition vectors corresponding to all recognition objects with the constructed fusion model, and perform recognition processing on the face image to be recognized with the constructed fusion model:
Step one, exhaustively enumerate the combinations of face recognition models corresponding to different scenes in the model library;
Step two, from the combinations of step one, screen out those whose face recognition model operation speed meets the target platform requirement, and record each as a model combination;
Step three, calculate the accuracy and threshold of each model combination in each scene. Taking one model combination as an example, its accuracy and threshold in n scenes can be recorded as {a1, t1}, {a2, t2}, ..., {an, tn}. Each accuracy and each threshold is then normalized to obtain the normalized accuracy An and the normalized threshold Tn of the model combination in each scene, where n is the scene number, an is the accuracy of the model combination in the nth scene, and tn is its threshold in the nth scene;
Step four, in order to unify the thresholds of the fusion model across the multiple scenes, calculate the weighted sum of the normalized accuracies of each model combination over the n scenes, ACC = w1*A1 + w2*A2 + ... + wn*An, and the variance of its normalized thresholds, VAR = var(T1, T2, ..., Tn), where a smaller threshold variance indicates stronger compatibility of the model across scenes, and wn is the weighting value of the normalized accuracy of the currently calculated model combination in the nth scene;
Step five, from the weighted sum ACC of the normalized accuracies and the variance VAR of the normalized thresholds, calculate the evaluation value Eval = ACC + (1 - VAR) of each model combination, so as to combine accuracy and threshold variance into a single evaluation index: the higher the value, the stronger the scene adaptation capability of the fusion model;
Step six, screen out the model combination with the highest evaluation value Eval and construct a fusion model from it. The fusion model splices the feature vectors extracted by each face recognition model in the combination, compares the spliced feature vector with the recognition vector corresponding to each recognition object, and performs face recognition according to whether the calculated inter-vector distance reaches the threshold Tn. For example, if a fusion model includes 3 face recognition models and each outputs a 512-dimensional feature vector, the fusion model splices the three vectors into a 3x512 = 1536-dimensional feature vector, which is then compared with the recognition vector corresponding to each recognition object to determine whether the face belongs to that recognition object.
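The 3 x 512 splicing in the example above amounts to a simple concatenation of per-model feature vectors. A minimal sketch, using random linear maps as stand-ins for the actual face recognition models (which are not specified in the application):

```python
import numpy as np

def fuse_features(models, image):
    """Concatenate the feature vectors produced by each model in the
    combination into one fused vector (e.g. 3 models x 512 dims -> 1536)."""
    return np.concatenate([m(image) for m in models])

# Stand-in "models": each maps a toy 4-dim image to a 512-dim feature vector.
rng = np.random.default_rng(0)
models = [lambda img, W=rng.standard_normal((512, 4)): W @ img
          for _ in range(3)]

image = np.ones(4)          # placeholder for a preprocessed face image
fused = fuse_features(models, image)
```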
In the above process, the accuracy an of any model combination in scene n is obtained by the following steps:
Firstly, a face recognition test set is established. In general, each face recognition scene, such as an adult scene, a child scene or a mask-wearing scene, needs its own test set. To evaluate the accuracy of a face recognition model in different scenes, an ROC curve is computed on each test set to obtain the recall rate at a given false detection rate, with each operating point corresponding to a different threshold. Each test set must contain a base-library image of every person together with snapshot images captured in the corresponding scene;
Then, each face recognition model contained in the model combination extracts the model feature vectors corresponding to each base-library face and each snapshot face in the test set; the model feature vectors extracted by the face recognition models are spliced into multi-dimensional vectors, and the Euclidean distance between the base-library features and the snapshot features of each face is calculated;
Taking n base-library images and m snapshots as an example, m x n Euclidean distances are obtained after this calculation. The feature comparison results between the face base library and the snapshots can then be sorted from large to small, giving the inter-vector distance between each spliced multi-dimensional vector and the recognition vector corresponding to each face image. The false detection rate and recall rate at different thresholds are calculated according to the scene's false detection rate requirement, yielding the ROC curve of the model combination on the test set corresponding to each scene;
Finally, the recall rate meeting the false detection rate requirement is found on the ROC curve, and the accuracy an of the model combination in the nth scene is calculated from it.
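The distance sweep described above can be sketched as follows. This is a simplified illustration with toy distances and a hypothetical helper name `recall_at_fpr`; it assumes the genuine/impostor label of every gallery-snapshot pair is known, as it is in a labelled test set.

```python
import numpy as np

def recall_at_fpr(distances, is_same, target_fpr):
    """Sweep candidate thresholds over the m x n gallery/snapshot distances
    and return (recall, threshold) at the largest threshold whose false
    detection rate does not exceed `target_fpr`."""
    distances = np.asarray(distances, dtype=float)
    is_same = np.asarray(is_same, dtype=bool)
    best = (0.0, 0.0)
    for t in np.unique(distances):             # each distance is a candidate threshold
        accepted = distances <= t              # pairs declared a match at threshold t
        fpr = np.mean(accepted[~is_same])      # impostor pairs wrongly accepted
        recall = np.mean(accepted[is_same])    # genuine pairs correctly accepted
        if fpr <= target_fpr and recall >= best[0]:
            best = (recall, float(t))
    return best

# Toy pairs: small distances for genuine pairs, large for impostors.
d = [0.2, 0.3, 0.4, 0.9, 1.1, 1.3]
same = [True, True, True, False, False, False]
recall, thr = recall_at_fpr(d, same, target_fpr=0.0)
```

The same sweep also yields the scene threshold tn: it is the threshold component of the operating point selected at the required false detection rate.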
The ROC curve serves as a measure of the model: its abscissa is the false detection rate and its ordinate is the recall rate, so each point on the curve represents the pass rate at a given false detection rate and corresponds to a different threshold. To control the false detection rate of face recognition, the recall rate at a given false detection rate is generally chosen as the evaluation index of the model.
Similarly, the threshold tn of any model combination in scene n is obtained by:
calculating the ROC curve of the model combination in the nth scene from the test set corresponding to the nth scene;
and searching the ROC curve for the threshold tn that meets the false detection rate requirement.
The fusion model obtained in the manner of Fig. 2 can perform recognition processing on the face image to be recognized according to the following steps:
First, the recognition vectors corresponding to the recognition objects are stored in a storage unit in advance, according to steps a-c below, to serve as the reference for face recognition:
Step a, respectively extracting model feature vectors corresponding to the recognition objects according to each face recognition model contained in the fusion model,
Step b, splicing the model feature vectors extracted by the face recognition models into a single recognition vector,
Step c, storing the identification vector in a storage unit and marking the corresponding relation between the identification vector and the identification object;
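Steps a-c can be sketched as a small enrollment routine. A plain dictionary stands in for the storage unit, and the two toy "models" are hypothetical stand-ins for the face recognition models in the fusion model:

```python
import numpy as np

def enroll(models, face_image, identity, store):
    """Steps a-c: extract a feature vector per model, splice them into one
    recognition vector, and store it keyed by the recognition object."""
    vec = np.concatenate([m(face_image) for m in models])
    store[identity] = vec      # marks the vector-to-object correspondence
    return vec

store = {}                                                # stand-in storage unit
models = [lambda img: img * 2.0, lambda img: img + 1.0]   # stand-in models
enroll(models, np.array([1.0, 2.0]), "alice", store)
```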
Then, the recognition objects in the storage unit are subjected to face recognition according to steps S1 to S3:
Step S1, respectively extracting model feature vectors corresponding to face images to be recognized according to each face recognition model contained in the fusion model;
Step S2, splicing and combining the model feature vectors respectively extracted by the face recognition models in step S1 into a multi-dimensional vector;
Step S3, comparing the inter-vector distance between the spliced multi-dimensional vector and the recognition vector corresponding to each recognition object; when the Euclidean distance between the two vectors is smaller than the threshold Tn, the recognition object corresponding to that recognition vector is output as the recognition result; otherwise, if the inter-vector distance exceeds the threshold for every recognition object, the recognition is judged to have failed.
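Steps S1-S3 can be sketched as follows, continuing with toy stand-in models and a dictionary as the storage unit (both hypothetical, not part of the application):

```python
import numpy as np

def recognize(models, face_image, store, threshold):
    """Steps S1-S3: fuse the per-model features of the probe image, then
    return the enrolled identity whose recognition vector lies within the
    Euclidean threshold Tn, or None if every distance exceeds it."""
    probe = np.concatenate([m(face_image) for m in models])   # S1 + S2
    best_id, best_dist = None, float("inf")
    for identity, vec in store.items():                       # S3
        dist = float(np.linalg.norm(probe - vec))
        if dist < threshold and dist < best_dist:
            best_id, best_dist = identity, dist
    return best_id

models = [lambda img: img * 2.0, lambda img: img + 1.0]       # stand-in models
store = {"alice": np.array([2.0, 4.0, 2.0, 3.0])}             # pre-enrolled vector
result = recognize(models, np.array([1.0, 2.0]), store, threshold=0.5)
```

Because a single threshold is used for every enrolled object, this is where the cross-scene threshold compatibility selected in step six pays off.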
Considering that different weighting values need to be set for the specific scenes when constructing fusion models for different scenes, the application can preferably add an interaction interface to the face recognition system, used to receive, for the fourth step, the setting of the weighting value wn of the normalized accuracy corresponding to the model combination in the nth scene.
Therefore, the application provides a cross-scene adaptive face recognition model fusion method to solve the scene adaptability problem of face recognition models. The method fuses a plurality of face recognition models into a fusion model suitable for multiple scenes and realizes face recognition in different scenes through a unified threshold. By this model fusion technique, the application overcomes the single-scene overfitting of face recognition models, makes the same threshold compatible with multiple use scenes, and, by constraining the speed of the fusion model, obtains the most accurate model that meets the speed requirement.
The foregoing description of the embodiments of the application is specific and detailed, but is not to be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the spirit of the application, and these all fall within the scope of the application.