Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiments of the application can acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, electromechanical integration, and the like. Artificial intelligence software technologies mainly comprise computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.
The invention provides a face recognition method. Referring to fig. 1, a flow chart of a face recognition method according to an embodiment of the invention is shown. The method may be performed by an apparatus, which may be implemented in software and/or hardware.
In this embodiment, the face recognition method includes:
Step S110, inputting a face image dataset into a face recognition network based on lightweight multi-scale feature fusion to perform face recognition model training, wherein the face image dataset comprises a face picture, a real frame of a face target marked on the face picture, the width, height and center point coordinates of the real frame of the face target, and an identity category marked on the real frame of the face target.
Specifically, the face image dataset includes a plurality of face pictures, each containing at least one face image. The face pictures may be acquired through a social network or shot specially for the actual purpose. For each face on each picture, a real frame of the face target is marked; the width, height and center point coordinates of each real frame are calculated; and each real frame is marked with an identity category. The identity category may be identity information of the face, such as an identity card number or a name, or other identity information that needs to be authenticated during face recognition, such as the work number of the corresponding employee in a staff attendance device.
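For concreteness, one annotated record of such a dataset might be laid out as follows. This is a minimal sketch in Python; the dict layout, file path, and identity value are illustrative assumptions, not a format prescribed by the invention.

```python
# Sketch of one annotated record in the face image dataset (assumed layout).
# Corner coordinates are in pixels: (x1, y1) upper left, (x2, y2) lower right.
record = {
    "image_path": "faces/0001.jpg",            # hypothetical file path
    "targets": [
        {
            "box": (120, 80, 220, 210),         # real frame (x1, y1, x2, y2)
            "width": 220 - 120,                 # w = x2 - x1
            "height": 210 - 80,                 # h = y2 - y1
            "center": ((120 + 220) / 2, (80 + 210) / 2),
            "identity": "E-1024",               # e.g. an employee work number
        },
    ],
}
```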
As an optional embodiment of the present invention, the face image dataset is stored in a blockchain, and before the face image dataset is input into the face recognition network based on lightweight multi-scale feature fusion for training the face recognition model, the method further comprises:
acquiring face image samples to obtain a face image sample set;
labeling the real frames of the face targets in the face image sample set, calculating the width, the height and the center point coordinates of the real frames of the labeled face targets, and marking the identity types of the real frames of the face targets to obtain a face image data set.
Specifically, corresponding face image samples are collected according to the actual face recognition application scene. For example, for places needing identity verification, such as stations and airports, the face images and corresponding information can be collected through a public security system; in this case each picture includes only one face. For staff attendance, the face images of workers can be collected directly, and each picture may contain one face or several faces. A plurality of face image samples are collected to form a face image sample set, and real frame marking is performed on each face image in the sample set. The marking can be completed with annotation files in different formats, for example TXT-format files that record the face coordinates: the real frame of the face target is marked through these coordinates, the width, height and center point coordinates of the real frame are calculated, and each marked real frame of a face target is given an identity category.
As an optional embodiment of the invention, marking the real frame of the face object in the face image sample set, calculating the width, height and center point coordinates of the marked real frame of the face object, marking the identity category of the real frame of the face object, and obtaining the face image data set comprises:
Defining a face image sample set as $\{Data_k(x, y),\ k \in [1, K],\ x \in [1, X],\ y \in [1, Y]\}$, wherein $Data_k(x, y)$ represents the pixel information of the x-th row and y-th column of the k-th picture in the face image sample set, K represents the number of pictures in the face image sample set, X represents the number of rows of pixels of the pictures in the face image sample set, and Y represents the number of columns of pixels of the pictures in the face image sample set;
Labeling a real frame for each face target in each picture in the defined face image sample set, wherein the real frame of the face target is defined as:

$$Box^t = \{(x_{k,n}^{t1},\ y_{k,n}^{t1},\ x_{k,n}^{t2},\ y_{k,n}^{t2}),\ k \in [1, K],\ n \in [1, N_k]\}$$

wherein $(x_{k,n}^{t1}, y_{k,n}^{t1})$ represents the upper left corner coordinates of the real frame of the n-th face target in the k-th picture in the face image sample set, $x_{k,n}^{t1}$ being the abscissa and $y_{k,n}^{t1}$ the ordinate of that corner point; $(x_{k,n}^{t2}, y_{k,n}^{t2})$ represents the lower right corner coordinates of the same real frame, $x_{k,n}^{t2}$ being the abscissa and $y_{k,n}^{t2}$ the ordinate of that corner point; K represents the number of pictures in the face image sample set; and $N_k$ represents the number of real frames of face targets in the k-th picture of the face image training dataset, that dataset being the portion of the data selected for training the preset face recognition network model;
Respectively calculating the width, height and center point coordinates of the real frame of the face target according to a preset real frame width calculation formula, a preset real frame height calculation formula and a preset real frame center point coordinate calculation formula, and marking the identity category of the real frame of the face target to obtain the face image dataset.

The preset real frame width calculation formula is as follows:

$$w_{k,n}^t = x_{k,n}^{t2} - x_{k,n}^{t1}$$

The preset real frame height calculation formula is as follows:

$$h_{k,n}^t = y_{k,n}^{t2} - y_{k,n}^{t1}$$

The preset real frame center point coordinate calculation formula is as follows:

$$(cx_{k,n}^t,\ cy_{k,n}^t) = \left(\frac{x_{k,n}^{t1} + x_{k,n}^{t2}}{2},\ \frac{y_{k,n}^{t1} + y_{k,n}^{t2}}{2}\right)$$

The identity category of the real frame of the face target is defined as:

$$Type^t = \{type_{k,n}^t,\ k \in [1, K],\ n \in [1, N_k]\}$$

wherein $w_{k,n}^t$ represents the width of the real frame of the n-th face target in the k-th picture in the face image sample set, $h_{k,n}^t$ represents the height of that real frame, and the superscript t denotes the true (ground-truth) value.
Specifically, by defining the face image sample set, each picture in the set and each face image on each picture can be represented by its corresponding position. The real frame of the face target is defined according to the position of the face image, the height, width and center coordinates of the face target are calculated from the real frame, and the identity category of the real frame is defined. In this way the picture information is converted into digital information that can be operated on, which facilitates subsequent calculation.
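The geometric part of this conversion reduces to simple arithmetic on the corner coordinates. A minimal sketch, assuming pixel coordinates:

```python
def real_frame_geometry(x1, y1, x2, y2):
    """Width, height and center point of a real frame, following the preset
    formulas: w = x2 - x1, h = y2 - y1, center = ((x1+x2)/2, (y1+y2)/2)."""
    w = x2 - x1
    h = y2 - y1
    center = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
    return w, h, center

# Example: a real frame with corners (120, 80) and (220, 210)
w, h, center = real_frame_geometry(120, 80, 220, 210)
# -> w = 100, h = 130, center = (170.0, 145.0)
```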
Step S120, constructing a loss function model in the face recognition network trained by the face recognition model through the real frame of the face target, the width and height of the real frame of the face target, the center point coordinates, and the identity category of the real frame of the face target.
Specifically, the loss function model is used to measure the degree of inconsistency between the predicted value f(x) of the trained face recognition network and the true value Y. It is a non-negative real-valued function, expressed as L(Y, f(x)); the smaller the loss function, the better the robustness of the trained network. The loss function is the core part of the empirical risk function and an important component of the structural risk function. The structural risk function of a model includes an empirical risk term and a regularization term.
In order to ensure the accuracy of the face recognition network model obtained after training, a loss function model must be built into the network being trained. During training, the face image dataset can be divided into two parts: one part serves as a face image training dataset and the other as a face image verification and optimizing dataset. The training dataset is used to train the face recognition network, and the verification and optimizing dataset is then used for optimization.
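A minimal sketch of such a split; the 80/20 ratio and the fixed seed are illustrative assumptions, as the embodiment does not fix a proportion:

```python
import random

def split_dataset(dataset, train_ratio=0.8, seed=42):
    """Divide the face image dataset into a training part and a
    verification/optimizing part (assumed 80/20 split)."""
    items = list(dataset)
    random.Random(seed).shuffle(items)          # deterministic shuffle
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]             # (training, verification)
```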
As an alternative embodiment of the invention, the loss function model includes a target bounding box loss model;
The calculation formula of the target bounding box loss model is as follows:

$$loss(box) = 1 - IOU + \frac{\rho^2}{c^2} + \alpha v$$

wherein $\rho$ represents the Euclidean distance between the center points of the predicted frame and the real frame of the face target, and c represents the length of the diagonal of the smallest rectangle that can cover both the predicted frame and the real frame of the face target, calculated as:

$$c = \sqrt{C_w^2 + C_h^2}$$

IOU represents the intersection-over-union ratio of the predicted frame and the real frame of the face target; $C_h$ and $C_w$ represent the height and width of the smallest rectangle that can cover both the predicted frame and the real frame; h and $h^{gt}$ respectively represent the height of the predicted frame and the height of the real frame of the face target; w and $w^{gt}$ respectively represent the width of the predicted frame and the width of the real frame of the face target; and v and $\alpha$ in the formula are calculated as:

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2, \qquad \alpha = \frac{v}{(1 - IOU) + v}$$
The width and height of the predicted frame of each face target in each face picture in the face image dataset are respectively defined as:

$$w_{k,n}^p = x_{k,n}^{p2} - x_{k,n}^{p1} \quad \text{and} \quad h_{k,n}^p = y_{k,n}^{p2} - y_{k,n}^{p1}$$

The center point coordinates of the predicted frame of each face target in each face picture in the face image dataset are defined as:

$$(cx_{k,n}^p,\ cy_{k,n}^p) = \left(\frac{x_{k,n}^{p1} + x_{k,n}^{p2}}{2},\ \frac{y_{k,n}^{p1} + y_{k,n}^{p2}}{2}\right)$$
Specifically, the created target bounding box loss model makes it possible to optimize how accurately the predicted frame marks the position of the face target in the face image to be recognized.
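The bounding box loss above matches the widely used CIoU form. The following sketch implements that formula directly from the quantities defined above; it assumes well-formed axis-aligned boxes (x2 > x1, y2 > y1) given as (x1, y1, x2, y2) tuples:

```python
import math

def box_loss(pred, gt, eps=1e-7):
    """Sketch of loss(box) = 1 - IOU + rho^2 / c^2 + alpha * v, where rho is
    the distance between box centers and c the diagonal of the smallest
    rectangle covering both boxes."""
    # IOU: intersection over union of the predicted and real frames
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter + eps)

    # rho^2: squared Euclidean distance between the two center points
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gcx, gcy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    rho2 = (pcx - gcx) ** 2 + (pcy - gcy) ** 2

    # c^2 = Cw^2 + Ch^2: squared diagonal of the smallest covering rectangle
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + eps

    # v measures aspect-ratio inconsistency; alpha is its trade-off weight
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    wgt, hgt = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(wgt / hgt) - math.atan(w / h)) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```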
As an alternative embodiment of the invention, the loss function model further comprises a target confidence loss model;
The calculation formula of the target confidence loss model is a cross-entropy of the following form:

$$loss(confidence) = -\sum_{k=1}^{K}\sum_{n \in obj}\left[type_{k,n}^{t}\log type_{k,n}^{p} + (1 - type_{k,n}^{t})\log(1 - type_{k,n}^{p})\right] - \lambda_{noobject}\sum_{k=1}^{K}\sum_{n \in noobj}\left[type_{k,n}^{t}\log type_{k,n}^{p} + (1 - type_{k,n}^{t})\log(1 - type_{k,n}^{p})\right]$$

wherein $type_{k,n}^t$ represents the identity category within the real frame of the n-th face target in the k-th picture in the face image dataset, $type_{k,n}^p$ represents the identity category within the predicted frame of the n-th face target in the k-th picture in the face image dataset, and $\lambda_{noobject}$ represents the confidence penalty weight applied when no target exists in the predicted frame of the face target.
Specifically, the created target confidence loss model makes it possible to optimize the accuracy of the target predicted by the predicted frame of the face image to be recognized.
As an alternative embodiment of the invention, the loss function model further comprises a target class loss model;
The calculation formula of the target class loss model is as follows:

$$loss(type) = -\sum_{k=1}^{K}\sum_{n=1}^{N_k'}\left[C_{k,n}^{t}\log C_{k,n}^{p} + (1 - C_{k,n}^{t})\log(1 - C_{k,n}^{p})\right]$$

wherein $C_{k,n}^t$ represents the confidence of the identity class in the real frame of the n-th face target in the k-th picture in the face image dataset, and $C_{k,n}^p$ represents the identity class confidence in the predicted frame of the n-th face target in the k-th picture in the face image dataset;
The predicted frame of each face target in each face picture in the face image dataset is defined as:

$$Box^p = \{(x_{k,n}^{p1},\ y_{k,n}^{p1},\ x_{k,n}^{p2},\ y_{k,n}^{p2}),\ k \in [1, K],\ n \in [1, N_k']\}$$

wherein $(x_{k,n}^{p1}, y_{k,n}^{p1})$ represents the upper left corner coordinates of the predicted frame of the n-th face target in the k-th picture in the face image dataset, $x_{k,n}^{p1}$ being the abscissa and $y_{k,n}^{p1}$ the ordinate of that corner point; $(x_{k,n}^{p2}, y_{k,n}^{p2})$ represents the lower right corner coordinates of the same predicted frame, $x_{k,n}^{p2}$ being the abscissa and $y_{k,n}^{p2}$ the ordinate of that corner point; and $N_k'$ represents the number of predicted frames of face targets in the k-th picture of the face image training dataset selected for training the preset face recognition network model.
Specifically, the created target class loss model makes it possible to optimize the accuracy of the identity class predicted within the predicted frame of the face image to be recognized.
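Since the exact summation forms are not reproduced above, the sketch below illustrates both the confidence loss and the class loss as binary cross-entropies, with no-target frames down-weighted by λ_noobject. The 0.5 default weight and the flat per-frame list encoding are assumptions:

```python
import math

def bce(t, p, eps=1e-7):
    """Binary cross-entropy between a true value t and a prediction p."""
    p = min(max(p, eps), 1.0 - eps)             # clamp away from 0 and 1
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))

def confidence_loss(conf_true, conf_pred, lambda_noobject=0.5):
    """Sketch of loss(confidence): frames with no target (true value 0)
    are penalized with the reduced weight lambda_noobject."""
    total = 0.0
    for t, p in zip(conf_true, conf_pred):
        weight = 1.0 if t > 0 else lambda_noobject
        total += weight * bce(t, p)
    return total

def class_loss(type_true, type_pred):
    """Sketch of loss(type): cross-entropy between the true and predicted
    identity-class confidences of each face target's frame."""
    return sum(bce(t, p) for t, p in zip(type_true, type_pred))
```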
Step S130, optimizing and training the loss function model by adopting a gradient descent method, obtaining a face recognition network model based on lightweight multi-scale feature fusion when the loss function of the loss function model reaches a preset threshold value, wherein,
The face recognition network model based on lightweight multi-scale feature fusion comprises a backbone network layer, a pooling layer, a feature fusion layer and a detection head layer. The backbone network layer is used for extracting features of three different image dimensions from a face image; the pooling layer is used for pooling the third output feature obtained by the backbone network layer; the feature fusion layer is used for performing feature fusion processing on the first output feature, the second output feature and the pooled third output feature obtained from the pooling layer; and the detection head layer is used for generating a prediction result according to the three fused features obtained by the feature fusion layer.
Specifically, the loss function of the loss function model of the face recognition network based on lightweight multi-scale feature fusion may comprise only one of the target confidence loss model, the target bounding box loss model and the target class loss model, or any two of the three. When all three loss function models are included, the loss function of the face recognition network based on lightweight multi-scale feature fusion is loss(object) = loss(box) + loss(confidence) + loss(type). When loss(object) reaches the preset threshold, the optimization training is complete and the face recognition network model based on lightweight multi-scale feature fusion is obtained.
Because the face recognition network based on lightweight multi-scale feature fusion is trained with this loss function model, it has the advantages of fewer calculation parameters, a smaller memory footprint and high accuracy. As shown in figs. 1.1 and 1.2, the invention improves the overall structure of the original YOLOv network structure to obtain the structure of the face recognition network based on lightweight multi-scale feature fusion: a lightweight network extracts the image features, and an adaptive feature fusion structure is introduced to fuse features of different scales, so that high-performance face recognition can be realized on mobile terminals and embedded devices.
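A minimal sketch of the optimization in step S130, assuming PyTorch-style model, optimizer and data loader objects (the embodiment does not name a framework), where loss_fn computes loss(box) + loss(confidence) + loss(type):

```python
def train_until_threshold(model, optimizer, loader, loss_fn, threshold):
    """Gradient descent on loss(object) until it reaches the preset
    threshold, then return the trained network model."""
    while True:
        for images, targets in loader:
            loss = loss_fn(model(images), targets)   # box + confidence + type
            optimizer.zero_grad()
            loss.backward()                          # gradient descent step
            optimizer.step()
            if loss.item() <= threshold:             # preset threshold reached
                return model
```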
Step S140, inputting the face image to be recognized into a face recognition network model based on lightweight multi-scale feature fusion for face recognition, and obtaining a predicted frame of a face target in the face image to be recognized, an identity class corresponding to the predicted frame and recognition accuracy corresponding to the identity class.
Specifically, the face image to be recognized is input into the face recognition network model based on lightweight multi-scale feature fusion, the face image is processed by each functional structure layer in the face recognition network model based on lightweight multi-scale feature fusion, and finally the predicted frame of the face target in the face image to be recognized, the identity class corresponding to the predicted frame and the recognition accuracy corresponding to the identity class are output.
As an optional embodiment of the invention, inputting the face image to be recognized into a face recognition network model based on lightweight multi-scale feature fusion for face recognition, and obtaining the predicted frame of the face target in the face image to be recognized, the identity class corresponding to the predicted frame and the recognition accuracy corresponding to the identity class comprise:
Extracting features of three different image dimensions from the face image to be recognized through the backbone network layer to obtain three features of different dimensions, namely feature X1, feature X2 and feature X3, and performing convolution operation processing on feature X1, feature X2 and feature X3 to obtain a first output feature Level1, a second output feature Level2 and a third output feature, respectively;
Pooling the third output feature through the pooling layer to obtain a pooled third output feature Level3;
Performing feature fusion processing on the first output feature Level1, the second output feature Level2 and the pooled third output feature Level3 through the feature fusion layer to obtain a first fusion feature ASFF1, a second fusion feature ASFF2 and a third fusion feature ASFF3 respectively, wherein

the first fusion feature ASFF1 is obtained by multiplying Level1, Level2 and Level3 by the parameters α1, β1, γ1 respectively and then adding the products; ASFF2 is obtained by multiplying Level1, Level2 and Level3 by the parameters α2, β2, γ2 respectively and then adding the products; and ASFF3 is obtained by multiplying Level1, Level2 and Level3 by the parameters α3, β3, γ3 respectively and then adding the products;
Generating, through the detection head layer, a predicted frame of the face target in the face image to be recognized, the identity category corresponding to the predicted frame, and the recognition accuracy corresponding to the identity category according to the first fusion feature ASFF1, the second fusion feature ASFF2 and the third fusion feature ASFF3.
Specifically, when the face image to be recognized is input into the face recognition network model based on lightweight multi-scale feature fusion, features are extracted from the face image at three different size dimensions in the backbone network layer. At this stage, individual features of the face, such as the eyes, nose and mouth, are extracted; these individual features are then combined by convolution calculation into three output features, namely the first output feature Level1, the second output feature Level2 and a third output feature. Because the feature layer of the third output feature is connected to the pooling layer, the third output feature is pooled; after this dimension reduction, the computation of the third output layer is simplified and the pooled third output feature Level3 is obtained. Feature fusion processing is then performed on the first output feature Level1, the second output feature Level2 and the pooled third output feature Level3 through the feature fusion layer, where α1, β1, γ1, α2, β2, γ2, α3, β3, γ3 are known parameters. Finally, the detection head layer obtains the prediction result of the face image to be recognized according to the fused features.
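The adaptive fusion itself is an element-wise weighted sum. A minimal sketch, assuming the three levels have already been brought to a common shape and that the fusion weights are given scalars (in practice they may be learned); the shapes and weight values are illustrative:

```python
import numpy as np

def asff_fuse(level1, level2, level3, a, b, g):
    """One fused feature: ASFFi = alpha_i*Level1 + beta_i*Level2 + gamma_i*Level3."""
    return a * level1 + b * level2 + g * level3

# Three output features of a common (channels, height, width) shape
level1 = np.random.rand(64, 52, 52).astype(np.float32)
level2 = np.random.rand(64, 52, 52).astype(np.float32)
level3 = np.random.rand(64, 52, 52).astype(np.float32)

# Each fused feature uses its own weight triple (example values)
asff1 = asff_fuse(level1, level2, level3, 0.5, 0.3, 0.2)
asff2 = asff_fuse(level1, level2, level3, 0.2, 0.6, 0.2)
asff3 = asff_fuse(level1, level2, level3, 0.1, 0.2, 0.7)
```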
Fig. 2 is a functional block diagram of a face recognition device according to an embodiment of the present invention.
The face recognition device 200 of the present invention may be installed in an electronic apparatus. Depending on the implemented functionality, the face recognition device may include a training module 210, a loss function construction module 220, an optimization module 230 and a prediction module 240. The modules of the present invention may also be referred to as units, meaning series of computer program segments that are stored in the memory of the electronic device, can be executed by the processor of the electronic device, and perform fixed functions.
In the present embodiment, the functions concerning the respective modules/units are as follows:
the training module 210 is configured to input the face image dataset into a face recognition network based on lightweight multi-scale feature fusion for face recognition model training.
The face image dataset comprises a face picture, a real frame of a face target marked on the face picture, the width and the height of the real frame of the face target, center point coordinates and identity categories marked on the real frame of the face target.
Specifically, the face image dataset includes a plurality of face pictures, each containing at least one face image. The face pictures may be acquired through a social network or shot specially for the actual purpose. For each face on each picture, a real frame of the face target is marked; the width, height and center point coordinates of each real frame are calculated; and each real frame is marked with an identity category. The identity category may be identity information of the face, such as an identity card number or a name, or other identity information that needs to be authenticated during face recognition, such as the work number of the corresponding employee in a staff attendance device.
As an alternative embodiment of the present invention, the face image dataset is stored in a blockchain, and the face recognition device 200 further includes a face image sample acquisition module and a labeling module (not shown). Wherein, the
The face image sample acquisition module is used for acquiring face image samples to obtain a face image sample set;
The labeling module is used for labeling the real frames of the face targets in the face image sample set, calculating the width, the height and the center point coordinates of the real frames of the labeled face targets, and labeling the identity categories of the real frames of the face targets to obtain the face image data set.
Specifically, through the face image sample acquisition module, corresponding face image samples are acquired according to the application scene of actual face recognition. For example, for places needing identity verification, such as stations and airports, the face images and corresponding information can be acquired through a public security system; in this case each picture includes only one face. For staff attendance, the face images of workers can be acquired directly, and each picture may contain one face or several faces. A plurality of face image samples are acquired to form a face image sample set, and through the labeling module real frame marking is performed on each face image in the sample set. The marking can be completed with annotation files in different formats, for example TXT-format files that record the face coordinates: the real frame of the face target is marked through these coordinates, the width, height and center point coordinates of the real frame are calculated, and each marked real frame of a face target is given an identity category.
As an alternative embodiment of the invention, the labeling module further comprises a sample set definition unit, a real border labeling unit and a computing unit (not shown in the figure). Wherein, the
The sample set definition unit is used for defining the face image sample set as $\{Data_k(x, y),\ k \in [1, K],\ x \in [1, X],\ y \in [1, Y]\}$, wherein $Data_k(x, y)$ represents the pixel information of the x-th row and y-th column of the k-th picture in the face image sample set, K represents the number of pictures in the face image sample set, X represents the number of rows of pixels of the pictures in the face image sample set, and Y represents the number of columns of pixels of the pictures in the face image sample set;
The real frame labeling unit is used for labeling a real frame for each face target in each picture in the defined face image sample set, wherein the real frame of the face target is defined as:

$$Box^t = \{(x_{k,n}^{t1},\ y_{k,n}^{t1},\ x_{k,n}^{t2},\ y_{k,n}^{t2}),\ k \in [1, K],\ n \in [1, N_k]\}$$

wherein $(x_{k,n}^{t1}, y_{k,n}^{t1})$ represents the upper left corner coordinates of the real frame of the n-th face target in the k-th picture in the face image sample set, $x_{k,n}^{t1}$ being the abscissa and $y_{k,n}^{t1}$ the ordinate of that corner point; $(x_{k,n}^{t2}, y_{k,n}^{t2})$ represents the lower right corner coordinates of the same real frame, $x_{k,n}^{t2}$ being the abscissa and $y_{k,n}^{t2}$ the ordinate of that corner point; K represents the number of pictures in the face image sample set; and $N_k$ represents the number of real frames of face targets in the k-th picture of the face image training dataset, that dataset being the portion of the data selected for training the preset face recognition network model;
The calculating unit is configured to respectively calculate the width, height and center point coordinates of the real frame of the face target according to a preset real frame width calculation formula, a preset real frame height calculation formula and a preset real frame center point coordinate calculation formula, and to mark the identity category of the real frame of the face target to obtain the face image dataset.

The preset real frame width calculation formula is as follows:

$$w_{k,n}^t = x_{k,n}^{t2} - x_{k,n}^{t1}$$

The preset real frame height calculation formula is as follows:

$$h_{k,n}^t = y_{k,n}^{t2} - y_{k,n}^{t1}$$

The preset real frame center point coordinate calculation formula is as follows:

$$(cx_{k,n}^t,\ cy_{k,n}^t) = \left(\frac{x_{k,n}^{t1} + x_{k,n}^{t2}}{2},\ \frac{y_{k,n}^{t1} + y_{k,n}^{t2}}{2}\right)$$

The identity category of the real frame of the face target is defined as:

$$Type^t = \{type_{k,n}^t,\ k \in [1, K],\ n \in [1, N_k]\}$$

wherein $w_{k,n}^t$ represents the width of the real frame of the n-th face target in the k-th picture in the face image sample set, $h_{k,n}^t$ represents the height of that real frame, and the superscript t denotes the true (ground-truth) value.
Specifically, the face image sample set is defined through the sample set definition unit so that each picture in the set and each face image on each picture can be represented by its corresponding position. The real frame of the face target is defined according to the position of the face image through the real frame labeling unit; the height, width and center coordinates of the face target are calculated from the real frame through the calculating unit, and the identity category of the real frame is defined. In this way the picture information is converted into digital information that can be operated on, which facilitates subsequent calculation.
The loss function construction module 220 is configured to construct a loss function model in the face recognition network trained by the face recognition model through the real frame of the face target, the width and height of the real frame of the face target, the center point coordinates, and the identity category of the real frame of the face target.
Specifically, the loss function model is used to measure the degree of inconsistency between the predicted value f(x) of the trained face recognition network and the true value Y. It is a non-negative real-valued function, expressed as L(Y, f(x)); the smaller the loss function, the better the robustness of the trained network. The loss function is the core part of the empirical risk function and an important component of the structural risk function. The structural risk function of a model includes an empirical risk term and a regularization term.
In order to ensure the accuracy of the face recognition network model obtained after training, a loss function model must be built into the network being trained. During training, the face image dataset can be divided into two parts: one part serves as a face image training dataset and the other as a face image verification and optimizing dataset. The training dataset is used to train the face recognition network, and the verification and optimizing dataset is then used for optimization.
As an alternative embodiment of the invention, the loss function model includes a target bounding box loss model;
The calculation formula of the target bounding box loss model is as follows:

$$loss(box) = 1 - IOU + \frac{\rho^2}{c^2} + \alpha v$$

wherein $\rho$ represents the Euclidean distance between the center points of the predicted frame and the real frame of the face target, and c represents the length of the diagonal of the smallest rectangle that can cover both the predicted frame and the real frame of the face target, calculated as:

$$c = \sqrt{C_w^2 + C_h^2}$$

IOU represents the intersection-over-union ratio of the predicted frame and the real frame of the face target; $C_h$ and $C_w$ represent the height and width of the smallest rectangle that can cover both the predicted frame and the real frame; h and $h^{gt}$ respectively represent the height of the predicted frame and the height of the real frame of the face target; w and $w^{gt}$ respectively represent the width of the predicted frame and the width of the real frame of the face target; and v and $\alpha$ in the formula are calculated as:

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2, \qquad \alpha = \frac{v}{(1 - IOU) + v}$$
The width and height of the predicted frame of each face target in each face picture in the face image dataset are respectively defined as:

$$w_{k,n}^p = x_{k,n}^{p2} - x_{k,n}^{p1} \quad \text{and} \quad h_{k,n}^p = y_{k,n}^{p2} - y_{k,n}^{p1}$$

The center point coordinates of the predicted frame of each face target in each face picture in the face image dataset are defined as:

$$(cx_{k,n}^p,\ cy_{k,n}^p) = \left(\frac{x_{k,n}^{p1} + x_{k,n}^{p2}}{2},\ \frac{y_{k,n}^{p1} + y_{k,n}^{p2}}{2}\right)$$
Specifically, the created target bounding box loss model makes it possible to optimize how accurately the predicted frame marks the position of the face target in the face image to be recognized.
As an alternative embodiment of the invention, the loss function model further comprises a target confidence loss model;
The calculation formula of the target confidence loss model is a cross-entropy of the following form:

$$loss(confidence) = -\sum_{k=1}^{K}\sum_{n \in obj}\left[type_{k,n}^{t}\log type_{k,n}^{p} + (1 - type_{k,n}^{t})\log(1 - type_{k,n}^{p})\right] - \lambda_{noobject}\sum_{k=1}^{K}\sum_{n \in noobj}\left[type_{k,n}^{t}\log type_{k,n}^{p} + (1 - type_{k,n}^{t})\log(1 - type_{k,n}^{p})\right]$$

wherein $type_{k,n}^t$ represents the identity category within the real frame of the n-th face target in the k-th picture in the face image dataset, $type_{k,n}^p$ represents the identity category within the predicted frame of the n-th face target in the k-th picture in the face image dataset, and $\lambda_{noobject}$ represents the confidence penalty weight applied when no target exists in the predicted frame of the face target.
Specifically, the created target confidence loss model makes it possible to optimize the accuracy of the target predicted by the predicted frame of the face image to be recognized.
As an alternative embodiment of the invention, the loss function model further comprises a target class loss model;
The calculation formula of the target class loss model is as follows:

$$loss(type) = -\sum_{k=1}^{K}\sum_{n=1}^{N_k'}\left[C_{k,n}^{t}\log C_{k,n}^{p} + (1 - C_{k,n}^{t})\log(1 - C_{k,n}^{p})\right]$$

wherein $C_{k,n}^t$ represents the confidence of the identity class in the real frame of the n-th face target in the k-th picture in the face image dataset, and $C_{k,n}^p$ represents the identity class confidence in the predicted frame of the n-th face target in the k-th picture in the face image dataset;
The predicted frame of each face target in each face picture in the face image dataset is defined as:

$$Box^p = \{(x_{k,n}^{p1},\ y_{k,n}^{p1},\ x_{k,n}^{p2},\ y_{k,n}^{p2}),\ k \in [1, K],\ n \in [1, N_k']\}$$

wherein $(x_{k,n}^{p1}, y_{k,n}^{p1})$ represents the upper left corner coordinates of the predicted frame of the n-th face target in the k-th picture in the face image dataset, $x_{k,n}^{p1}$ being the abscissa and $y_{k,n}^{p1}$ the ordinate of that corner point; $(x_{k,n}^{p2}, y_{k,n}^{p2})$ represents the lower right corner coordinates of the same predicted frame, $x_{k,n}^{p2}$ being the abscissa and $y_{k,n}^{p2}$ the ordinate of that corner point; and $N_k'$ represents the number of predicted frames of face targets in the k-th picture of the face image training dataset selected for training the preset face recognition network model.
Specifically, the created target class loss model makes it possible to optimize the accuracy of the identity class predicted within the predicted frame of the face image to be recognized.
The optimizing module 230 is configured to perform optimization training on the loss function model by using a gradient descent method, and obtain a face recognition network model based on lightweight multi-scale feature fusion when the loss function of the loss function model reaches a preset threshold value, where,
The face recognition network model based on lightweight multi-scale feature fusion comprises a backbone network layer, a pooling layer, a feature fusion layer and a detection head layer. The backbone network layer is used for extracting features of three different image dimensions from a face image; the pooling layer is used for pooling the third output feature obtained by the backbone network layer; the feature fusion layer is used for performing feature fusion processing on the first output feature, the second output feature and the pooled third output feature obtained from the pooling layer; and the detection head layer is used for generating a prediction result according to the three fused features obtained by the feature fusion layer.
Specifically, the loss function of the loss function model of the face recognition network based on lightweight multi-scale feature fusion may comprise only one of the target confidence loss model, the target bounding box loss model and the target class loss model, or any two of the three. When all three loss function models are included, the loss function of the face recognition network based on lightweight multi-scale feature fusion is loss(object) = loss(box) + loss(confidence) + loss(type). When loss(object) reaches the preset threshold, the optimization training is complete and the face recognition network model based on lightweight multi-scale feature fusion is obtained.
Because the face recognition network based on lightweight multi-scale feature fusion is trained with this loss function model, it has the advantages of fewer calculation parameters, a smaller memory footprint and high accuracy. As shown in figs. 1.1 and 1.2, the invention improves the overall structure of the original YOLOv network structure to obtain the structure of the face recognition network based on lightweight multi-scale feature fusion: a lightweight network extracts the image features, and an adaptive feature fusion structure is introduced to fuse features of different scales, so that high-performance face recognition can be realized on mobile terminals and embedded devices.
The prediction module 240 is configured to input a face image to be recognized into a face recognition network model based on lightweight multi-scale feature fusion to perform face recognition, so as to obtain a predicted frame of a face target in the face image to be recognized, an identity class corresponding to the predicted frame, and recognition accuracy corresponding to the identity class.
Specifically, the face image to be recognized is input into the face recognition network model based on lightweight multi-scale feature fusion, the face image is processed by each functional structure layer in the face recognition network model based on lightweight multi-scale feature fusion, and finally the predicted frame of the face target in the face image to be recognized, the identity class corresponding to the predicted frame and the recognition accuracy corresponding to the identity class are output.
As an alternative embodiment of the present invention, the prediction module 240 further includes a feature extraction unit, a pooling unit, a feature fusion unit, and a prediction unit (not shown in the figure). Wherein, the
The feature extraction unit is used for extracting features of three different image dimensions from the face image to be recognized through the backbone network layer to obtain three features of different dimensions, namely feature X1, feature X2 and feature X3, and for performing convolution operation processing on feature X1, feature X2 and feature X3 to obtain a first output feature Level1, a second output feature Level2 and a third output feature, respectively;
The pooling unit is used for pooling the third output feature through the pooling layer to obtain a pooled third output feature Level3;
The feature fusion unit is used for performing feature fusion processing on the first output feature Level1, the second output feature Level2 and the pooled third output feature Level3 through the feature fusion layer to obtain a first fusion feature ASFF1, a second fusion feature ASFF2 and a third fusion feature ASFF3 respectively, wherein

the first fusion feature ASFF1 is obtained by multiplying Level1, Level2 and Level3 by the parameters α1, β1, γ1 respectively and then adding the products; ASFF2 is obtained by multiplying Level1, Level2 and Level3 by the parameters α2, β2, γ2 respectively and then adding the products; and ASFF3 is obtained by multiplying Level1, Level2 and Level3 by the parameters α3, β3, γ3 respectively and then adding the products;
The prediction unit is used for generating, through the detection head layer, a predicted frame of the face target in the face image to be recognized, the identity category corresponding to the predicted frame and the recognition accuracy corresponding to the identity category according to the first fusion feature ASFF1, the second fusion feature ASFF2 and the third fusion feature ASFF3.
Specifically, when the face image to be recognized is input into the face recognition network model based on lightweight multi-scale feature fusion, the feature extraction unit extracts features from the face image at three different size dimensions in the backbone network layer. At this stage, individual features of the face, such as the eyes, nose and mouth, are extracted; these individual features are then combined by convolution calculation into three output features, namely the first output feature Level1, the second output feature Level2 and a third output feature. Because the feature layer of the third output feature is connected to the pooling layer, the pooling unit pools the third output feature; after this dimension reduction, the computation of the third output layer is simplified and the pooled third output feature Level3 is obtained. The feature fusion unit then performs feature fusion processing on the first output feature Level1, the second output feature Level2 and the pooled third output feature Level3, where α1, β1, γ1, α2, β2, γ2, α3, β3, γ3 are known parameters. Finally, the prediction unit obtains the prediction result of the face image to be recognized through the detection head layer according to the fused features.
Fig. 3 is a schematic structural diagram of an electronic device implementing a face recognition method according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a face recognition program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a mobile hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, for example a plug-in mobile hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of the face recognition program, but also to temporarily store data that has been output or is to be output.
The processor 10 may in some embodiments be composed of integrated circuits, for example a single packaged integrated circuit, or of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the control unit of the electronic device: it connects the various components of the entire electronic device using various interfaces and lines, and executes the various functions of the electronic device 1 and processes data by running or executing the programs or modules (e.g., the face recognition program) stored in the memory 11 and calling the data stored in the memory 11.
The bus may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, among others. The bus may be classified into an address bus, a data bus, a control bus, and so on. The bus is arranged to enable connection and communication between the memory 11 and the at least one processor 10, etc.
Fig. 3 shows only an electronic device with certain components; it will be understood by persons skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, which may comprise fewer or more components than shown, combine certain components, or arrange the components differently.
For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component, and preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 1 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which will not be described herein.
Further, the electronic device 1 may further comprise a network interface, optionally the network interface may comprise a wired interface and/or a wireless interface (e.g. WI-FI interface, bluetooth interface, etc.), typically used for establishing a communication connection between the electronic device 1 and other electronic devices.
The electronic device 1 may optionally further comprise a user interface, which may be a Display, an input unit, such as a Keyboard (Keyboard), or a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch, or the like. The display may also be referred to as a display screen or display unit, as appropriate, for displaying information processed in the electronic device 1 and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only and that the scope of the patent application is not limited to this configuration.
The face recognition program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
Inputting a face image dataset into a face recognition network based on lightweight multi-scale feature fusion to perform face recognition model training, wherein the face image dataset comprises a face picture, a real frame of a face target marked on the face picture, a width and a height of the real frame of the face target, a center point coordinate and identity categories marked on the real frame of the face target;
In a face recognition network trained by a face recognition model, constructing a loss function model through the real frame of the face target, the width and the height of the real frame of the face target, the center point coordinates and the identity class of the real frame of the face target;
Optimizing and training the loss function model by adopting a gradient descent method, obtaining a face recognition network model based on lightweight multi-scale feature fusion when the loss function of the loss function model reaches a preset threshold value, wherein,
The face recognition network model based on lightweight multi-scale feature fusion comprises a backbone network layer, a pooling layer, a feature fusion layer and a detection head layer. The backbone network layer is used for extracting features of three different image dimensions from a face image; the pooling layer is used for pooling the third output feature obtained by the backbone network layer; the feature fusion layer is used for performing feature fusion processing on the first output feature, the second output feature and the pooled third output feature obtained from the pooling layer; and the detection head layer is used for generating a prediction result according to the three fused features obtained by the feature fusion layer;
And inputting the face image to be recognized into a face recognition network model based on lightweight multi-scale feature fusion to perform face recognition, so as to obtain a predicted frame of a face target in the face image to be recognized, an identity class corresponding to the predicted frame and recognition accuracy corresponding to the identity class.
Specifically, the specific implementation method of the above instructions by the processor 10 may refer to the description of the relevant steps in the corresponding embodiment of fig. 1, which is not repeated herein. It is emphasized that, to further ensure the privacy and security of the face image dataset, the face image dataset may also be stored in a blockchain node.
Further, the modules/units integrated in the electronic device 1 may be stored in a computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, and a read-only memory (ROM).
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the embodiment.
In addition, each functional module in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in a form of hardware or a form of hardware and a form of software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a string of data blocks generated in association using cryptographic methods, each of which contains information from a batch of network transactions and is used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims can also be implemented by a single unit or means through software or hardware. Terms such as first and second are used to denote names and do not denote any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.