CN108805077A


Info

Publication number: CN108805077A
Application number: CN201810592758.7A
Authority: CN
Inventors: 夏春秋
Original assignee: Shenzhen Vision Technology Co Ltd
Current assignee: Shenzhen Vision Technology Co Ltd
Priority date: 2018-06-11
Filing date: 2018-06-11
Publication date: 2018-11-13

Abstract

The present invention proposes a face recognition system using a deep learning network based on a triplet loss function. Its main components are a cross-correlation matching convolutional neural network, a trunk-branch ensemble convolutional neural network, a deep convolutional neural network ensemble, and performance evaluation. The process is as follows. First, three samples are selected from the training data set: a still ROI (region of interest), a positive sample similar to the still ROI, and a negative sample dissimilar to the still ROI; together these three samples form a triplet. The triplet is then fed into the deep learning network for training, during which a triplet loss function is used to pull similar ROIs closer together. Finally, similar ROIs form clusters, which achieves face recognition. Because the invention uses a triplet loss function, it achieves higher recognition accuracy than traditional face recognition systems, with lower computational complexity and higher computational efficiency.

Description

A face recognition system using a deep learning network based on a triplet loss function

Technical field

The present invention relates to the field of face recognition, and more particularly to a face recognition system using a deep learning network based on a triplet loss function.

Background art

Face recognition is a biometric technology that identifies a person from facial feature information. A video camera or still camera captures images or video streams containing faces, the faces are automatically detected and tracked in the images, and a series of face-related processing steps is then applied to the detected faces; the technology is also commonly called portrait recognition or facial recognition. In enterprise and residential security and management, face recognition can be used in access-control and attendance systems, face-recognition anti-theft doors and the like. In public security, justice and criminal investigation, face recognition combined with face databases can be used to track down criminals worldwide. In e-government and e-commerce, because a face recognition system relies on biometric features rather than traditional character passwords, it can more accurately tie a party's online digital identity to that party's real identity, which greatly increases the reliability of e-commerce and e-government systems. Face recognition is also widely used in aerospace, electric power, border inspection, education and other fields. However, current face recognition systems have high computational complexity, low computational efficiency and insufficient recognition accuracy.

The present invention proposes a face recognition system using a deep learning network based on a triplet loss function. First, three samples are selected from the training data set: a still ROI (region of interest), a positive sample similar to the still ROI, and a negative sample dissimilar to the still ROI; together these three samples form a triplet. The triplet is then fed into the deep learning network for training, during which a triplet loss function is used to pull similar ROIs closer together. Finally, similar ROIs form clusters, which achieves face recognition. Because the invention uses a triplet loss function, it achieves higher recognition accuracy than traditional face recognition systems, with lower computational complexity and higher computational efficiency.

Summary of the invention

To address the high computational complexity, low computational efficiency and insufficient recognition accuracy of current face recognition systems, the present invention proposes a face recognition system using a deep learning network based on a triplet loss function. First, three samples are selected from the training data set: a still ROI, a positive sample similar to the still ROI, and a negative sample dissimilar to the still ROI; together these three samples form a triplet. The triplet is then fed into the deep learning network for training, during which a triplet loss function is used to pull similar ROIs closer together. Finally, similar ROIs form clusters, which achieves face recognition.
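The triplet selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's procedure: the `gallery` layout (one still ROI and a list of video ROIs per identity), the function name `sample_triplet` and the uniform random choice are all hypothetical, and the text does not say how positives and negatives are actually mined.

```python
import random

def sample_triplet(gallery, rng):
    """Pick (still ROI, positive ROI, negative ROI) from a toy dataset.

    gallery maps an identity label to {"still": roi, "video": [roi, ...]}.
    The layout and the uniform sampling are assumptions for illustration.
    """
    identities = sorted(gallery)
    anchor_id = rng.choice(identities)
    # The negative sample must come from a different identity.
    negative_id = rng.choice([i for i in identities if i != anchor_id])
    still = gallery[anchor_id]["still"]
    positive = rng.choice(gallery[anchor_id]["video"])
    negative = rng.choice(gallery[negative_id]["video"])
    return still, positive, negative

# ROIs are represented by placeholder strings instead of image arrays.
gallery = {
    "a": {"still": "a_still", "video": ["a_v1", "a_v2"]},
    "b": {"still": "b_still", "video": ["b_v1"]},
    "c": {"still": "c_still", "video": ["c_v1", "c_v2"]},
}
triplet = sample_triplet(gallery, random.Random(0))
```

Passing a seeded `random.Random` keeps the selection reproducible, which helps when comparing training runs.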

To solve the above problems, the present invention provides a face recognition system using a deep learning network based on a triplet loss function, whose main components are:

(1) a cross-correlation matching convolutional neural network (CCM-CNN);

(2) a trunk-branch ensemble convolutional neural network (TBE-CNN);

(3) a deep convolutional neural network ensemble (HaarNet);

(4) performance evaluation.

The cross-correlation matching convolutional neural network uses a matrix Hadamard product followed by a fully connected layer to simulate an adaptive weighted cross-correlation technique. Discriminative face representations are learned with a triplet-based optimization method; these representations are based on triplets consisting of the video regions of interest (ROIs) of a positive sample and a negative sample together with the corresponding still ROI. To further improve the robustness of the face models, synthetic faces are generated from still and video ROIs of non-target individuals, so that the fine-tuning process of the CCM-CNN incorporates diverse information. The cross-correlation matching convolutional neural network consists mainly of three parts: feature extraction, cross-correlation matching and triplet loss optimization.

Further, the feature extraction is implemented by a feature extraction pipeline that extracts discriminative features from ROIs of the same subject captured under different conditions. The pipeline comprises three sub-networks, corresponding to the still face, the positive-sample face and the negative-sample face respectively. Each sub-network contains 9 convolutional layers, and each convolutional layer is followed by a spatial batch normalization layer, a dropout layer and a rectified linear unit (ReLU) layer.
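The layer ordering of one sub-network can be written out as a plain specification. This is only a sketch: the layer names are hypothetical labels, the three layers after each convolution are read here as spatial batch normalization, dropout and ReLU, and kernel sizes, channel counts, dropout rate and any weight sharing between the three sub-networks are not given in the text and are therefore left out.

```python
def build_subnetwork_spec(num_conv_layers=9):
    """Return the ordered layer names of one feature-extraction sub-network.

    Each convolution is followed by spatial batch normalization, dropout
    and ReLU, as described in the text. Names are illustrative labels only.
    """
    layers = []
    for index in range(1, num_conv_layers + 1):
        layers.extend([
            f"conv{index}",
            f"spatial_batch_norm{index}",
            f"dropout{index}",
            f"relu{index}",
        ])
    return layers

# One sub-network per triplet member: still, positive and negative face.
pipeline = {
    branch: build_subnetwork_spec()
    for branch in ("still", "positive", "negative")
}
```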

Further, the cross-correlation matching mainly uses a pixel-wise matching method that compares feature maps efficiently and estimates their matching similarity. The comparison consists mainly of three parts: a matrix product, a fully connected layer and a Softmax layer. The method represents each ROI by its feature maps and multiplies these feature maps to encode the ROIs of the triplet, which greatly reduces the complexity of the comparison.
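The three-step comparison (element-wise product, fully connected layer, Softmax) can be illustrated on tiny feature maps. This is a hedged sketch rather than the patented network: the 2x2 maps, the hand-picked weights, the two-class output and the convention that index 1 means "match" are all assumptions, since in the described system the weights would be learned.

```python
import math

def hadamard(map_a, map_b):
    """Element-wise (Hadamard) product of two equally sized 2-D maps."""
    return [[x * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(map_a, map_b)]

def fully_connected(vector, weights, biases):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vector)) + bias
            for row, bias in zip(weights, biases)]

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]

def match_score(map_a, map_b, weights, biases):
    """Probability assigned to the 'match' class (index 1, by assumption)."""
    product = hadamard(map_a, map_b)
    flat = [value for row in product for value in row]
    return softmax(fully_connected(flat, weights, biases))[1]

# Hand-picked weights standing in for learned ones.
weights = [[-1.0] * 4, [1.0] * 4]
biases = [0.0, 0.0]
still_map = [[1.0, 0.0], [0.0, 1.0]]
similar_map = [[1.0, 0.0], [0.0, 1.0]]
dissimilar_map = [[0.0, 1.0], [1.0, 0.0]]
score_similar = match_score(still_map, similar_map, weights, biases)
score_dissimilar = match_score(still_map, dissimilar_map, weights, biases)
```

With these toy inputs the overlapping maps score higher than the non-overlapping ones, which is the behaviour the matching stage is meant to learn.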

Further, the triplet loss optimization uses a bidirectional triplet optimization function to train the network efficiently. To make the triplet loss optimization compatible with the network, an additional feature extraction branch needs to be added to the network. The triplet loss can be expressed by the following formula:

where S_tp, S_tn and S_np are the similarity scores produced by the cross-correlation matching: respectively, the comparison score between the still ROI and the positive-sample ROI, the comparison score between the still ROI and the negative-sample ROI, and the comparison score between the positive-sample ROI and the negative-sample ROI.
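The formula itself is not reproduced in this text, so the following is only a generic hinge-style sketch of how a loss over the three similarity scores could look, not the patent's formula. The function name, the `margin` value and the choice of two hinge terms (one comparing against the still ROI, one against the positive ROI) are assumptions made for illustration.

```python
def similarity_triplet_loss(s_tp, s_tn, s_np, margin=0.2):
    """Generic hinge loss over similarity scores (assumed form).

    The matching pair score s_tp should exceed both non-matching pair
    scores, s_tn and s_np, by at least `margin`.
    """
    still_term = max(0.0, margin + s_tn - s_tp)
    positive_term = max(0.0, margin + s_np - s_tp)
    return still_term + positive_term

# Well-separated scores give zero loss; confusable scores are penalised.
easy_loss = similarity_triplet_loss(s_tp=0.9, s_tn=0.1, s_np=0.2)
hard_loss = similarity_triplet_loss(s_tp=0.5, s_tn=0.6, s_np=0.4)
```

Because the scores are similarities rather than distances, the signs are reversed relative to the usual distance-based triplet loss.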

The trunk-branch ensemble convolutional neural network extracts complementary features from holistic face images and from patches around facial landmarks, using a trunk network and branch networks. To simulate real video data, blurred training data are synthesized artificially from still images, mainly by applying artificial out-of-focus blur and motion blur, so that blur-insensitive face representations are learned. The TBE-CNN comprises one trunk network and multiple branch networks; the trunk and branch networks share some layers in order to embed global and local information, which reduces computational cost and fuses the information effectively. The output feature maps of the trunk and branch networks are concatenated and fed to a fully connected layer to generate the final face representation.
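The synthesis of blurred training data from a still image can be sketched with two simple filters. This is an assumption-laden illustration: the text names out-of-focus blur and motion blur but gives no kernels, so a horizontal line kernel stands in for motion blur and a box kernel is used as a crude stand-in for a defocus disk; kernel sizes and the edge-clamping border rule are also hypothetical.

```python
def apply_kernel(image, kernel):
    """Filter a 2-D grayscale image with a kernel, clamping at the borders."""
    height, width = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    off_y, off_x = k_h // 2, k_w // 2
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            acc = 0.0
            for ky in range(k_h):
                for kx in range(k_w):
                    # Clamp sample coordinates so edges reuse border pixels.
                    sy = min(max(y + ky - off_y, 0), height - 1)
                    sx = min(max(x + kx - off_x, 0), width - 1)
                    acc += kernel[ky][kx] * image[sy][sx]
            row.append(acc)
        output.append(row)
    return output

def motion_blur_kernel(length):
    """Horizontal line kernel: averages `length` neighbouring pixels."""
    return [[1.0 / length] * length]

def defocus_kernel(size):
    """Box kernel used here as a rough approximation of defocus blur."""
    return [[1.0 / (size * size)] * size for _ in range(size)]

# A single bright pixel makes the effect of each kernel easy to see.
still_image = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
motion_blurred = apply_kernel(still_image, motion_blur_kernel(3))
defocused = apply_kernel(still_image, defocus_kernel(3))
```

The motion kernel smears the bright pixel along its row only, while the box kernel spreads it in both directions.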

The deep convolutional neural network ensemble efficiently learns highly discriminative face representations for video face recognition. HaarNet contains one trunk network and three branch networks, which are designed to embed facial features, pose features and other distinctive features. In addition, to improve its discriminative power, HaarNet uses a multi-stage training method together with a second-order-statistics regularized triplet loss function that draws information from intra-class and inter-class variations. Finally, in a fine-tuning stage, the correlation information of the face ROIs stored during enrollment is embedded in order to improve recognition accuracy.

Further, the multi-stage training method mainly comprises feeding triplets into HaarNet, applying L2 normalization to the output representations of HaarNet, and applying the triplet loss. This training method efficiently optimizes the internal parameters of HaarNet.
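The last two steps, L2 normalization of the network output followed by a triplet loss, can be sketched as below. This is a minimal illustration under assumptions: it uses the plain distance-based triplet loss with a hypothetical `margin` of 0.2, not the second-order-statistics regularized variant mentioned for HaarNet, and the short vectors stand in for HaarNet output representations.

```python
import math

def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(value * value for value in vector))
    return [value / max(norm, eps) for value in vector]

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Plain triplet loss on L2-normalized embeddings (assumed form)."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(0.0, squared_distance(a, p) - squared_distance(a, n) + margin)

# Toy embeddings: the positive points the same way as the anchor,
# the negative is orthogonal to it.
anchor = [3.0, 4.0]
positive = [6.0, 8.0]
negative = [4.0, -3.0]
good_loss = triplet_loss(anchor, positive, negative)
bad_loss = triplet_loss(anchor, negative, positive)
```

Normalizing first means only the direction of each embedding matters, so the anchor and the positive coincide after scaling even though their magnitudes differ.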

The performance evaluation assesses the system using the Cox face database (Cox Face DB). The facial data in Cox Face DB comprise high-quality face photographs taken with a still camera under controlled conditions and low-quality face images captured by video equipment under uncontrolled conditions. The performance evaluation covers two main aspects: comparison of still pictures against video images, and assessment of computational complexity.

Further, in the assessment of computational complexity, the computational complexity depends mainly on the number of operations (matching a still ROI against a video ROI), the number of network parameters and the number of layers. The computational complexity determines the efficiency of face recognition.
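The three quantities named above can be tallied with a few lines of arithmetic. The sketch below is one possible reading, not the patent's evaluation protocol: counting one matching operation per still/video ROI pair, the standard parameter count for a convolutional layer with bias, and the example layer sizes are all assumptions.

```python
def conv_parameter_count(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel of a square convolution."""
    return out_channels * (in_channels * kernel_size * kernel_size + 1)

def matching_operations(num_still_rois, num_video_rois):
    """One comparison per (still ROI, video ROI) pair (assumed reading)."""
    return num_still_rois * num_video_rois

def summarize(conv_layers, num_still_rois, num_video_rois):
    """Collect the three complexity indicators into one record."""
    return {
        "matching_operations": matching_operations(num_still_rois, num_video_rois),
        "parameters": sum(conv_parameter_count(*layer) for layer in conv_layers),
        "layers": len(conv_layers),
    }

# Hypothetical layer sizes: (in_channels, out_channels, kernel_size).
example_layers = [(3, 16, 3), (16, 32, 3)]
summary = summarize(example_layers, num_still_rois=10, num_video_rois=50)
```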

Description of the drawings

Fig. 1 is a system framework diagram of the face recognition system using a deep learning network based on a triplet loss function according to the present invention.

Fig. 2 is a network architecture diagram of the face recognition system using a deep learning network based on a triplet loss function according to the present invention.

Fig. 3 is a training flowchart of the face recognition system using a deep learning network based on a triplet loss function according to the present invention.

Detailed description of the embodiments

It should be noted that, provided there is no conflict, the embodiments of the present application and the features in those embodiments may be combined with one another. The invention is described in further detail below with reference to the drawings and specific embodiments.

Fig. 1 is a system framework diagram of the face recognition system using a deep learning network based on a triplet loss function according to the present invention. The system mainly comprises the cross-correlation matching convolutional neural network, the trunk-branch ensemble convolutional neural network, the deep convolutional neural network ensemble, and performance evaluation.

The cross-correlation matching convolutional neural network uses a matrix Hadamard product followed by a fully connected layer to simulate an adaptive weighted cross-correlation technique. Discriminative face representations are learned with a triplet-based optimization method; these representations are based on triplets consisting of the video regions of interest (ROIs) of a positive sample and a negative sample together with the corresponding still ROI. To further improve the robustness of the face models, synthetic faces are generated from still and video ROIs of non-target individuals, so that the fine-tuning process of the CCM-CNN incorporates diverse information. The cross-correlation matching convolutional neural network consists mainly of three parts: feature extraction, cross-correlation matching and triplet loss optimization.

The trunk-branch ensemble convolutional neural network extracts complementary features from holistic face images and from patches around facial landmarks, using a trunk network and branch networks. To simulate real video data, blurred training data are synthesized artificially from still images, mainly by applying artificial out-of-focus blur and motion blur, so that blur-insensitive face representations are learned. The TBE-CNN comprises one trunk network and multiple branch networks; the trunk and branch networks share some layers in order to embed global and local information, which reduces computational cost and fuses the information effectively. The output feature maps of the trunk and branch networks are concatenated and fed to a fully connected layer to generate the final face representation.

The deep convolutional neural network ensemble efficiently learns highly discriminative face representations for video face recognition. HaarNet contains one trunk network and three branch networks, which are designed to embed facial features, pose features and other distinctive features. In addition, to improve its discriminative power, HaarNet uses a multi-stage training method together with a second-order-statistics regularized triplet loss function that draws information from intra-class and inter-class variations. Finally, in a fine-tuning stage, the correlation information of the face ROIs stored during enrollment is embedded in order to improve recognition accuracy.

The performance evaluation assesses the system using the Cox face database (Cox Face DB). The facial data in Cox Face DB comprise high-quality face photographs taken with a still camera under controlled conditions and low-quality face images captured by video equipment under uncontrolled conditions. The performance evaluation covers two main aspects: comparison of still pictures against video images, and assessment of computational complexity.

In the assessment of computational complexity, the computational complexity depends mainly on the number of operations (matching a still ROI against a video ROI), the number of network parameters and the number of layers. The computational complexity determines the efficiency of face recognition.

Fig. 2 is a network architecture diagram of the face recognition system using a deep learning network based on a triplet loss function according to the present invention. Taking the cross-correlation matching convolutional neural network as an example, the architecture mainly comprises three parts: feature extraction, cross-correlation matching and triplet loss optimization.

Feature extraction is implemented by a feature extraction pipeline that extracts discriminative features from ROIs of the same subject captured under different conditions. The pipeline comprises three sub-networks, corresponding to the still face, the positive-sample face and the negative-sample face respectively. Each sub-network contains 9 convolutional layers, and each convolutional layer is followed by a spatial batch normalization layer, a dropout layer and a rectified linear unit (ReLU) layer.

Cross-correlation matching mainly uses a pixel-wise matching method that compares feature maps efficiently and estimates their matching similarity. The comparison consists mainly of three parts: a matrix product, a fully connected layer and a Softmax layer. The method represents each ROI by its feature maps and multiplies these feature maps to encode the ROIs of the triplet, which greatly reduces the complexity of the comparison.

Triplet loss optimization uses a bidirectional triplet optimization function to train the network efficiently. To make the triplet loss optimization compatible with the network, an additional feature extraction branch needs to be added to the network. The triplet loss can be expressed by the following formula:

Fig. 3 is a training flowchart of the face recognition system using a deep learning network based on a triplet loss function according to the present invention. Taking the training flow of the deep convolutional neural network ensemble as an example, the training method is a multi-stage training method that mainly comprises feeding triplets into HaarNet, applying L2 normalization to the output representations of HaarNet, and applying the triplet loss. This training method efficiently optimizes the internal parameters of HaarNet.

For those skilled in the art, the present invention is not limited to the details of the above embodiments, and it can be realized in other specific forms without departing from the spirit and scope of the invention. In addition, those skilled in the art can make various modifications and variations to the invention without departing from its spirit and scope, and such improvements and modifications shall also be regarded as falling within the scope of protection of the invention. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.

Claims

1. A face recognition system using a deep learning network based on a triplet loss function, characterized in that it mainly comprises a cross-correlation matching convolutional neural network (1); a trunk-branch ensemble convolutional neural network (2); a deep convolutional neural network ensemble (3); and performance evaluation (4).

2. The cross-correlation matching convolutional neural network (1) according to claim 1, characterized in that the cross-correlation matching convolutional neural network (CCM-CNN) uses a matrix Hadamard product followed by a fully connected layer to simulate an adaptive weighted cross-correlation technique; discriminative face representations are learned with a triplet-based optimization method, these representations being based on triplets consisting of the video regions of interest (ROIs) of a positive sample and a negative sample together with the corresponding still ROI; to further improve the robustness of the face models, synthetic faces are generated from still and video ROIs of non-target individuals, so that the fine-tuning process of the CCM-CNN incorporates diverse information; the cross-correlation matching convolutional neural network consists mainly of three parts: feature extraction, cross-correlation matching and triplet loss optimization.

3. The feature extraction according to claim 2, characterized in that it is implemented by a feature extraction pipeline that extracts discriminative features from ROIs of the same subject captured under different conditions; the feature extraction pipeline comprises three sub-networks, corresponding to the still face, the positive-sample face and the negative-sample face respectively; each sub-network contains 9 convolutional layers, and each convolutional layer is followed by a spatial batch normalization layer, a dropout layer and a rectified linear unit (ReLU) layer.

4. The cross-correlation matching according to claim 2, characterized in that the cross-correlation matching mainly uses a pixel-wise matching method that compares feature maps efficiently and estimates their matching similarity; the comparison consists mainly of three parts: a matrix product, a fully connected layer and a Softmax layer; the method represents each ROI by its feature maps and multiplies these feature maps to encode the ROIs of the triplet, which greatly reduces the complexity of the comparison.

5. The triplet loss optimization according to claim 2, characterized in that a bidirectional triplet optimization function is used to train the network efficiently; to make the triplet loss optimization compatible with the network, an additional feature extraction branch needs to be added to the network; the triplet loss can be expressed by the following formula:

where S_tp, S_tn and S_np are the similarity scores produced by the cross-correlation matching: respectively, the comparison score between the still ROI and the positive-sample ROI, the comparison score between the still ROI and the negative-sample ROI, and the comparison score between the positive-sample ROI and the negative-sample ROI.

6. The trunk-branch ensemble convolutional neural network (2) according to claim 1, characterized in that the trunk-branch ensemble convolutional neural network (TBE-CNN) extracts complementary features from holistic face images and from patches around facial landmarks, using a trunk network and branch networks; to simulate real video data, blurred training data are synthesized artificially from still images, mainly by applying artificial out-of-focus blur and motion blur, so that blur-insensitive face representations are learned; the TBE-CNN comprises one trunk network and multiple branch networks, and the trunk and branch networks share some layers in order to embed global and local information, which reduces computational cost and fuses the information effectively; the output feature maps of the trunk and branch networks are concatenated and fed to a fully connected layer to generate the final face representation.

7. The deep convolutional neural network ensemble (3) according to claim 1, characterized in that the deep convolutional neural network ensemble (HaarNet) efficiently learns highly discriminative face representations for video face recognition; HaarNet contains one trunk network and three branch networks, which are designed to embed facial features, pose features and other distinctive features; in addition, to improve its discriminative power, HaarNet uses a multi-stage training method together with a second-order-statistics regularized triplet loss function that draws information from intra-class and inter-class variations; finally, in a fine-tuning stage, the correlation information of the face ROIs stored during enrollment is embedded in order to improve recognition accuracy.

8. The multi-stage training method according to claim 7, characterized in that it mainly comprises feeding triplets into HaarNet, applying L2 normalization to the output representations of HaarNet, and applying the triplet loss; this training method efficiently optimizes the internal parameters of HaarNet.

9. The performance evaluation (4) according to claim 1, characterized in that the system is assessed using the Cox face database (Cox Face DB); the facial data in Cox Face DB comprise high-quality face photographs taken with a still camera under controlled conditions and low-quality face images captured by video equipment under uncontrolled conditions; the performance evaluation covers two main aspects: comparison of still pictures against video images, and assessment of computational complexity.

10. The assessment of computational complexity according to claim 9, characterized in that the computational complexity depends mainly on the number of operations (matching a still ROI against a video ROI), the number of network parameters and the number of layers; the computational complexity determines the efficiency of face recognition.