CN109271848A - Face detection method, face detection apparatus, and storage medium - Google Patents


Info

Publication number
CN109271848A
Authority
CN
China
Prior art keywords
face
image
detected
human face
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810866324.1A
Other languages
Chinese (zh)
Other versions
CN109271848B (en)
Inventor
孙晓航
袁誉乐
曾强
高飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Tiana Intelligent Technology Co Ltd
Original Assignee
Shenzhen Tiana Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Tiana Intelligent Technology Co Ltd
Priority to CN201810866324.1A
Publication of CN109271848A
Application granted
Publication of CN109271848B
Status: Active
Anticipated expiration


Abstract

A face detection method, a face detection apparatus, and a storage medium are disclosed. First, the method uses a selection mechanism that chooses the more suitable of two processing modes, face recognition and face tracking, according to the previous detection result, which enhances the practical effectiveness of the method. Second, because a lightweight deep neural network for face recognition is introduced into the recognition stage, face regions can be identified and located effectively, which improves detection accuracy. Third, a face confidence measure is introduced, which helps resolve the drift problem arising in the face tracking stage, corrects tracking deviation, and improves the output precision of the face region. Fourth, a region-of-interest (ROI) prediction method is added on top of the lightweight deep neural network, which avoids the long processing time caused by running face recognition over the entire image, speeds up the recognition stage, and reduces system overhead.

Description

Face detection method, face detection apparatus, and storage medium
Technical field
The present invention relates to face detection technology, and in particular to a face detection method, a face detection apparatus, and a storage medium.
Background technique
With the development of electronic technology, face detection and tracking has become one of the most promising means of biometric identity verification. An automatic face recognition system is expected to have a certain recognition capability on general images, and the series of problems this raises has made face detection an important research topic in its own right. At present, face detection is a key link in automatic face recognition systems, and its applications extend far beyond face recognition itself: it has important application value in content-based retrieval, digital video processing, video detection, and other fields.
Face detection is a necessary preprocessing step in fields such as face beautification, face special effects, face recognition, facial attribute analysis, and fatigue-driving detection, and therefore has high commercial and application value. In practical applications, however, face detection still faces considerable technical challenges due to factors such as facial expression changes, occlusion by hair or accessories, ambient lighting variation, body-angle changes, and imaging conditions. Only when the relevant face detection algorithms are further improved can the practical application of face detection be guaranteed.
At present, face detection algorithms can be roughly divided into: skin-color-based face detection, geometric-feature-based face detection, statistical-learning-based face detection, and deep-learning-based face detection. Among these, deep-learning-based face detection mostly relies on deep neural networks to reach the detection goal; by comparison it offers higher detection accuracy and clear optimization effects, and has considerable development prospects. For example, face detection based on the RCNN family abandons sliding windows for generating candidate regions in favor of region proposals; although such methods achieve high detection rates, the deep neural networks involved are structurally complex and detection is slow. In face detection based on cascaded CNNs, feature extraction and classification are usually both completed by CNNs: six CNNs must be arranged in a cascade structure, three of which perform face/non-face classification, which increases the time spent on classification judgment and works against fast face detection.
Summary of the invention
The technical problem solved by the present invention is how to improve the detection speed and detection accuracy of deep-learning-based face detection. To solve this problem, the present application provides a face detection method and apparatus.
According to a first aspect, an embodiment provides a face detection method comprising the following steps:
obtaining an image to be detected from an image sequence;
selecting the processing mode of the image to be detected according to the previous face detection result in the image sequence: when a face was detected in the previous detection in the image sequence, performing face tracking processing on the image to be detected; otherwise, performing face recognition processing on the image to be detected;
outputting the face region in the image to be detected according to the processing result.
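The selection step above can be sketched as a small dispatch function. This is a minimal illustration under stated assumptions: the function names and the recognizer/tracker interfaces are hypothetical stand-ins, not part of the patent.

```python
def process_frame(image, prev_region, recognize, track):
    """Select the processing mode for one frame.

    prev_region: face region from the previous detection, or None.
    recognize / track: hypothetical stand-ins for the face recognition
    stage and the KCF tracking stage described in the method.
    """
    if prev_region is not None:
        # A face was detected previously: the cheaper tracking path is chosen.
        return track(image, prev_region)
    # No previous face: run full face recognition on the image.
    return recognize(image)


# Toy stand-ins, for demonstration only.
recognize = lambda img: ("recognized", img)
track = lambda img, region: ("tracked", region)

print(process_frame("frame2", (10, 10, 50, 50), recognize, track))  # tracking path
print(process_frame("frame1", None, recognize, track))              # recognition path
```

Each frame's output region then becomes `prev_region` for the next frame, which is how the per-frame selection loop closes.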
Selecting the processing mode of the image to be detected according to the previous face detection result in the image sequence comprises:
processing each frame image in the image sequence in turn, taking the face detection result of the frame immediately preceding the image to be detected as the previous face detection result, and selecting the processing mode of the image to be detected according to that previous face detection result.
Taking the face detection result of the frame immediately preceding the image to be detected as the previous face detection result comprises:
obtaining the face region output for the preceding frame of the image to be detected;
inputting the face region output for the preceding frame into a deep neural network for face confidence calculation, to obtain a face confidence;
comparing the face confidence with a preset threshold: when the face confidence exceeds the preset threshold, the face detection result of the preceding frame is that a face was detected; otherwise, the face detection result of the preceding frame is that no face was detected.
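The threshold comparison amounts to a two-line decision. In this sketch, the 0.93 value is the preferred threshold given later in the embodiment, and the function name is hypothetical.

```python
FACE_CONFIDENCE_THRESHOLD = 0.93  # preferred preset threshold FT from the embodiment

def previous_face_detected(confidence, threshold=FACE_CONFIDENCE_THRESHOLD):
    """Map the face confidence of the previous frame's output region to the
    'previous face detection result': True only when confidence exceeds FT."""
    return confidence > threshold

print(previous_face_detected(0.97))  # True
print(previous_face_detected(0.50))  # False
```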
Inputting the face region of the preceding frame into a deep neural network for face confidence calculation to obtain a face confidence comprises:
scaling the face region of the preceding frame to obtain a scaled image;
inputting the scaled image into the deep neural network for face confidence calculation to obtain the face confidence. The deep neural network for face confidence calculation includes one or more bottleneck convolution units, each of which performs a convolution operation on the input image.
Performing face recognition processing on the image to be detected comprises:
down-sampling the image to be detected to obtain multiple images of different sizes;
inputting each image of a different size into a lightweight deep neural network for face recognition, so as to detect the face region in the image to be detected.
Performing face tracking processing on the image to be detected comprises:
obtaining the face region detected in the previous detection in the image sequence;
performing KCF target tracking on the previously detected face region within the image to be detected, to obtain the face region in the image to be detected.
The method further includes a frame-count judgment step before performing face tracking processing on the image to be detected. The frame-count judgment step comprises:
when a face region is detected in the image sequence, counting each detected frame; when the count exceeds a preset number of frames, performing ROI calculation on the image to be detected and clearing the count to begin the next round of counting; otherwise, performing face tracking processing on the image to be detected.
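A minimal sketch of the frame-count judgment step, read so that the count is cleared when the ROI branch is taken; the class name and the preset frame number are assumptions.

```python
class FrameCounter:
    """Count consecutively detected frames and decide, per frame, whether to
    keep tracking or to force an ROI-based re-recognition (hypothetical API)."""

    def __init__(self, max_tracked_frames=10):  # preset frame number (assumed)
        self.max_tracked = max_tracked_frames
        self.count = 0

    def next_mode(self):
        self.count += 1
        if self.count > self.max_tracked:
            self.count = 0  # clear the count to begin the next round
            return "roi_recognition"
        return "tracking"


counter = FrameCounter(max_tracked_frames=3)
print([counter.next_mode() for _ in range(5)])
# → ['tracking', 'tracking', 'tracking', 'roi_recognition', 'tracking']
```

Periodically forcing a fresh recognition pass is what keeps a long tracking run from accumulating drift.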
Performing ROI calculation on the image to be detected comprises:
performing ROI calculation on the image to be detected according to the face region detected in the previous detection in the image sequence, to obtain an estimated face area in the image to be detected;
inputting the estimated face area into the lightweight deep neural network for face recognition, so as to detect the face region in the image to be detected.
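The patent does not spell out how the ROI is computed from the previously detected face region; one plausible sketch simply expands the previous box by a margin and clamps it to the image bounds. The margin factor and function name are assumptions.

```python
def predict_roi(prev_box, img_w, img_h, margin=0.5):
    """Estimate the area where the face may appear in the current frame by
    expanding the previous frame's face box (x, y, w, h) by a margin factor
    and clamping the result to the image bounds."""
    x, y, w, h = prev_box
    dx, dy = int(w * margin), int(h * margin)
    x0 = max(0, x - dx)
    y0 = max(0, y - dy)
    x1 = min(img_w, x + w + dx)
    y1 = min(img_h, y + h + dy)
    return (x0, y0, x1 - x0, y1 - y0)


print(predict_roi((100, 100, 40, 40), 640, 480))  # → (80, 80, 80, 80)
```

Running the lightweight network only on this expanded crop, rather than the whole image, is what saves the recognition time the text describes.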
The lightweight deep neural network for face recognition includes:
a BP-Net network, for obtaining face candidate regions in the input image;
a BR-Net network, for screening the face candidate regions and removing non-face regions from the candidates;
a BO-Net network, for locating key facial positions within the candidate regions from which non-face regions have been removed, and obtaining the face region according to the located key positions.
The BP-Net network, the BR-Net network, and the BO-Net network form a cascade structure, and each network includes one or more bottleneck convolution units, each of which performs a convolution operation on the input image.
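The cascade of the three sub-networks can be sketched as three stages threaded together; the callables below are hypothetical stand-ins for the trained BP-Net, BR-Net, and BO-Net models.

```python
def cascade_detect(image, bp_net, br_net, bo_net):
    """BP-Net proposes candidate regions, BR-Net rejects non-face candidates,
    and BO-Net localizes key facial points to yield the final face regions."""
    candidates = bp_net(image)                     # face candidate regions
    faces = [c for c in candidates if br_net(c)]   # drop non-face regions
    return [bo_net(c) for c in faces]              # key-point-refined regions


# Toy stand-ins, for demonstration only.
bp_net = lambda img: [(0, 0, 12, 12), (30, 30, 12, 12)]
br_net = lambda box: box[0] < 10                   # keeps only the first candidate
bo_net = lambda box: {"box": box, "keypoints": 5}  # five facial key positions

print(cascade_detect("frame", bp_net, br_net, bo_net))
# → [{'box': (0, 0, 12, 12), 'keypoints': 5}]
```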
According to a second aspect, an embodiment provides a face detection apparatus, comprising:
an image acquisition unit, for obtaining an image to be detected from an image sequence;
a judging unit, for selecting the processing mode of the image to be detected according to the previous face detection result in the image sequence;
a face recognition processing unit, for performing face recognition processing on the image to be detected when no face region was detected in the previous detection in the image sequence;
a face tracking processing unit, for performing face tracking processing on the image to be detected when a face region was detected in the previous detection in the image sequence;
an output unit, for outputting the face region in the image to be detected according to the processing result.
According to a third aspect, an embodiment provides a computer-readable storage medium comprising a program, the program being executable by a processor to implement the method according to the first aspect.
The beneficial effects of the present application are as follows:
A face detection method, a face detection apparatus, and a storage medium are provided according to the embodiments above. First, the proposed face detection method adds a selection mechanism that chooses the more suitable of face recognition processing and face tracking processing according to the previous detection result, enhancing the practical effectiveness of the method. Second, because a lightweight deep neural network for face recognition is introduced into the recognition stage, the BP-Net, BR-Net, and BO-Net networks within it can effectively identify and locate face regions, improving detection accuracy. Third, because the KCF target tracking algorithm is introduced into the face tracking stage, face regions are detected quickly, improving the output efficiency of the face region. Fourth, because a face confidence detection method is introduced when outputting face regions, the drift problem arising in the tracking stage is mitigated and face-region tracking deviation is corrected, improving the output precision of the face region. Fifth, the application adds a region-of-interest (ROI) prediction method on top of the lightweight deep neural network, so that the region where a face may appear in the current frame can be predicted from the face position in the previous frame, and face recognition can then be restricted to that region; this avoids the long processing time of running face recognition on the whole image, speeds up the recognition stage, and reduces system overhead. In addition, the face detection apparatus proposed in this application has a simple structure and a stable algorithm, and is well suited to being combined with embedded hardware platforms to perform face detection.
Detailed description of the invention
Fig. 1 is a kind of structure chart of the human face detection device of embodiment;
Fig. 2 is a kind of flow chart of the method for detecting human face of embodiment;
Fig. 3 is the flow chart of face confidence level judgement;
Fig. 4 is the flow chart of face recognition processing;
Fig. 5 is the flow chart of face tracking processing;
Fig. 6 is the structure chart of the human face detection device of another embodiment;
Fig. 7 is the flow chart of the method for detecting human face of another embodiment;
Fig. 8 is the structure of the deep neural network for face confidence calculations;
Fig. 9 is the structure of BP-Net network;
Figure 10 is the structure of BR-Net network;
Figure 11 is the structure of BO-Net network;
Figure 12 is the structure of bottleneck convolution.
Specific embodiment
The present invention is described in further detail below through specific embodiments in combination with the accompanying drawings, in which similar elements in different embodiments use associated similar element numbers. In the following embodiments, many details are described so that the application may be better understood. However, those skilled in the art will readily recognize that some of the features may be omitted in different cases, or may be replaced by other elements, materials, or methods. In some cases, certain operations related to the application are not shown or described in the specification, in order to avoid the core of the application being obscured by excessive description; for those skilled in the art, a detailed description of these related operations is not necessary, as they can be fully understood from the description in the specification and the general technical knowledge of the field.
In addition, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Meanwhile, the steps or actions in the described methods may also be reordered or adjusted in a manner obvious to those skilled in the art. Therefore, the various orders in the specification and drawings are merely for clearly describing a certain embodiment and are not meant to be required orders, unless it is otherwise stated that a certain order must be followed.
The ordinal numbers used herein for components, such as "first" and "second", are merely used to distinguish the described objects and do not carry any ordinal or technical meaning. Moreover, "connection" and "coupling" as used in this application include both direct and indirect connection (coupling), unless otherwise indicated.
Embodiment one:
Referring to FIG. 1, the application discloses a face detection apparatus 1 comprising an image acquisition unit 11, a judging unit 12, a face recognition processing unit 13, a face tracking processing unit 14, and an output unit 15, which are described separately below.
The image acquisition unit 11 is used to obtain an image to be detected from an image sequence. In one embodiment, the image acquisition unit 11 obtains a frame image from a video stream and takes that frame image as the image to be detected. The video stream here may be video captured by surveillance cameras in public places, or video captured by electronic devices such as mobile phones and cameras; the captured video includes both video captured in real time and previously archived video.
The judging unit 12 is used to select the processing mode of the image to be detected according to the previous face detection result in the image sequence. In one embodiment, the face detection apparatus 1 processes each frame image in the image sequence in turn; that is, according to the temporal order of images in the video stream, it obtains one frame at a time and performs face detection on it (for the face detection process, refer to the face detection method below), takes the face detection result of the frame immediately preceding the image to be detected as the previous face detection result, and selects the processing mode of the image to be detected according to that result. It should be noted that when the image to be detected is the first frame in the video sequence, when the previous face detection process failed, or when no face region was output in the previous detection result, the judging unit 12 judges the previous face detection result to be negative, i.e., no face was detected. The specific implementation of the judging unit 12 can refer to the face detection method below.
The face recognition processing unit 13 is used to perform face recognition processing on the image to be detected when no face region was detected in the previous detection in the image sequence (i.e., the judging result of the judging unit 12 is negative). In one embodiment, the face recognition processing unit 13 down-samples the image to be detected to obtain multiple images of different sizes, and inputs each image of a different size into a lightweight deep neural network for face recognition, so as to detect the face region in the image to be detected.
The face tracking processing unit 14 is used to perform face tracking processing on the image to be detected when a face region was detected in the previous detection in the image sequence (i.e., the judging result of the judging unit 12 is positive). In one embodiment, the face tracking processing unit 14 obtains the face region detected in the previous detection in the image sequence, and performs KCF target tracking on that previously detected face region within the image to be detected, to obtain the face region in the image to be detected.
The output unit 15 outputs the face region in the image to be detected according to the processing result. In one embodiment, the output unit 15 marks the face region with a rectangle and displays the rectangular mark in the image to be detected.
Those skilled in the art will understand that the face detection apparatus 1 has a fast processing speed for each frame image in the video stream (usually tens to hundreds of frames per second), so the output unit 15 can output face regions continuously; when the user observes the video stream and the face regions through a display interface, the rectangular mark of the face region will appear to move dynamically in the video stream.
Correspondingly, referring to FIG. 2, the application also discloses a face detection method comprising steps S100-S500, which are described separately below.
Step S100: obtain an image to be detected from an image sequence. In one embodiment, the image acquisition unit 11 obtains a frame image from a video stream and takes that frame image as the image to be detected.
Step S200: select the processing mode of the image to be detected according to the previous face detection result in the image sequence. In one embodiment, the judging unit 12 processes each frame image in the image sequence in turn, takes the face detection result of the frame immediately preceding the image to be detected as the previous face detection result, and selects the processing mode of the image to be detected according to that result. As shown in Fig. 3, step S200 may include steps S210-S270.
Step S210: the judging unit 12 obtains the frame immediately preceding the image to be detected in the image sequence.
Step S220: the judging unit 12 judges whether a face region was output for the preceding frame; if so, step S230 is entered; otherwise, step S270 is entered. It should be noted that when the image to be detected is the first frame in the video sequence, or when the previous face detection process failed, the judging unit 12 also judges that no face was detected in the previous detection.
Step S230: scale the face region output for the preceding frame to obtain a scaled image; preferably, the face region output for the preceding frame is scaled to an image of 12 × 12 pixels.
Step S240: input the scaled image into a deep neural network for face confidence calculation (denoted FCNET) to obtain a face confidence (denoted C). In one embodiment, see the deep neural network for face confidence calculation in Fig. 8: the network FCNET includes one or more bottleneck convolution units (preferably four bottleneck convolution units, two with 16 channels and two with 24 channels); a bottleneck convolution unit is mainly used to perform a convolution operation on the input image. The specific structure of each bottleneck convolution unit is shown in Figure 12, where BN is a normalization function used to normalize each neuron and belongs to the prior art, and RELU is an activation function used to keep the training process efficient, which also belongs to the prior art and is not described in detail here. To improve the accuracy of the face confidence calculation, this embodiment also adds to the network a 2d convolution structure with 3 × 3 × 3 filters and 32 channels (mainly used for feature extraction) and a 1 × 1 convolution unit.
Step S250: compare the face confidence C with a preset threshold (denoted FT); when the face confidence C is greater than the preset threshold FT, step S260 is entered; otherwise, step S270 is entered. It should be noted that, for accurate judgment, the preset threshold in this embodiment is preferably 0.93.
Step S260: the face detection result of the preceding frame is considered to be that a face was detected, i.e., the judging result of the judging unit 12 is positive.
Step S270: the face detection result of the preceding frame is considered to be that no face was detected, i.e., the judging result of the judging unit 12 is negative.
Step S300: when no face region was detected in the previous detection in the image sequence, perform face recognition processing on the image to be detected. In one embodiment, the face recognition processing unit 13 down-samples the image to be detected to obtain multiple images of different sizes, and inputs each image of a different size into a lightweight deep neural network for face recognition so as to detect the face region in the image to be detected; accordingly, step S300 may include steps S310-S330, described separately below.
Step S310: the face recognition processing unit 13 down-samples the image to be detected to form an image pyramid. Preferably, the image pyramid is divided into three levels, forming images of 48 × 48, 24 × 24, and 12 × 12 resolution respectively, so as to obtain multiple images of different sizes. These images of different resolutions serve the input requirements of the different network structures.
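The three-level pyramid can be illustrated with plain average-pooling on a grayscale image stored as nested lists; a real implementation would use proper image resizing, so this is only a sketch with hypothetical helper names.

```python
def downsample(img, factor):
    """Average-pool a grayscale image (nested lists) by an integer factor."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(0, h - h % factor, factor):
        row = []
        for c in range(0, w - w % factor, factor):
            block = [img[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def build_pyramid(img48):
    """Build the three pyramid levels used above: 48x48, 24x24, and 12x12."""
    level24 = downsample(img48, 2)
    level12 = downsample(level24, 2)
    return img48, level24, level12


levels = build_pyramid([[1.0] * 48 for _ in range(48)])
print([len(level) for level in levels])  # → [48, 24, 12]
```

The three levels match the input sizes of the three sub-networks described below.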
Step S320: the face recognition processing unit 13 inputs each image of a different size into a lightweight deep neural network for face recognition (denoted BFACENET). It should be noted that a lightweight deep neural network such as BFACENET refers to a neural network composed of relatively few layers, usually no more than 10.
In one embodiment, the lightweight deep neural network BFACENET includes a BP-Net network, a BR-Net network, and a BO-Net network. The convolution framework of the BP-Net network is shown in Table 1; the convolution structure corresponding to Table 1 is shown in Fig. 9.
Table 1: convolution framework of the BP-Net network
In Table 1 above, t denotes the expansion factor, c the number of output channels, n the number of units, and s the stride. Each convolution unit preferably uses a bottleneck convolution structure, whose general composition is shown in Figure 12. It should be noted that in this embodiment the BP-Net network is mainly used to obtain face candidate windows and their regression vectors from the input 12 × 12 resolution image, so as to obtain face candidate regions.
The convolution framework of the BR-Net network is shown in Table 2; the convolution structure corresponding to Table 2 is shown in Fig. 10.
Table 2: convolution framework of the BR-Net network

| Input   | Convolution operation | Expansion factor t | Channels c | Units n | Stride s |
|---------|-----------------------|--------------------|------------|---------|----------|
| 24x24x3 | Conv2d                | -                  | 8          | 1       | 2        |
| 12x12x8 | Convolution unit      | 6                  | 16         | 2       | 2        |
| 6x6x16  | Convolution unit      | 6                  | 24         | 2       | 2        |
| 3x3x24  | 3x3 convolution unit  | -                  | 32         | 2       | 1        |
| 1x1x32  | Conv2d 1x1            | -                  | 96         | 1       | -        |
It should be noted that the BR-Net network is mainly used to screen the face candidate regions and remove non-face regions from the candidates. In one embodiment, the BR-Net network screens the face candidate regions according to the input 24 × 24 resolution image, so as to remove non-face regions.
The convolution framework of the BO-Net network is shown in Table 3; the convolution structure corresponding to Table 3 is shown in Fig. 11.
Table 3: convolution framework of the BO-Net network

| Input   | Convolution operation | Expansion factor t | Channels c | Units n | Stride s |
|---------|-----------------------|--------------------|------------|---------|----------|
| 48x48x3 | Conv2d                | -                  | 8          | 1       | 2        |
| 12x12x8 | Convolution unit      | 6                  | 16         | 2       | 2        |
| 6x6x16  | Convolution unit      | 6                  | 24         | 2       | 2        |
| 3x3x24  | 3x3 convolution unit  | -                  | 48         | 2       | 1        |
| 1x1x48  | Conv2d 1x1            | -                  | 128        | 1       | -        |
It should be noted that the BO-Net network is mainly used to locate key facial positions within the candidate regions from which non-face regions have been removed, and to obtain the face region according to the located key positions. In one embodiment, the BO-Net network locates key facial positions in the candidate regions according to the input 48 × 48 resolution image, determines the face from five key human positions (the two eye centers, the nose, and the two mouth corners), and obtains the face region.
Those skilled in the art will understand that in the lightweight deep neural network BFACENET used in this embodiment, the BP-Net, BR-Net, and BO-Net networks form a cascade structure, and each network includes one or more bottleneck convolution units (denoted BottleNeck) used to perform convolution operations on the input image. Because the structure of the bottleneck convolution unit is simple, it helps reduce the parameters of the constructed network and thus speeds up face detection.
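The parameter saving from the bottleneck units can be made concrete with a little arithmetic. Assuming the units follow the inverted-residual pattern that the "expansion factor t" in Tables 1-3 suggests (an assumption, since Figure 12 is not reproduced here), the 3 × 3 stage operates depthwise at the expanded width rather than as a full convolution:

```python
def full_conv3x3_params(channels):
    """Weights in a standard 3x3 convolution mixing all channels (bias omitted)."""
    return 3 * 3 * channels * channels


def depthwise_conv3x3_params(channels):
    """Weights in a 3x3 depthwise convolution: one 3x3 filter per channel."""
    return 3 * 3 * channels


expanded = 6 * 8  # Table 2, second row: expansion factor t = 6 on 8 input channels
print(full_conv3x3_params(expanded))       # 20736 weights for a full 3x3 conv
print(depthwise_conv3x3_params(expanded))  # 432 weights for the depthwise stage
```

At the expanded width of Table 2's second row (48 channels), the depthwise stage needs roughly 48 times fewer weights than a full 3 × 3 convolution, which is consistent with the lightweight, embedded-friendly design the text claims.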
The objective function used in this embodiment to train the associated deep neural network model is given by formulas (1-1) to (1-4):
In formulas (1-1) to (1-4), y_i denotes the face label of a sample and p_i the predicted probability of a face; det denotes the face classification task, box the bounding-box regression task, and mark the key-point localization task; α_j is the loss weight assigned to each of the three tasks (face classification, bounding-box regression, and key-point localization) at the current stage, preferably α_det = 0.5, α_box = 0.25, and α_mark = 0.25 in this embodiment; an indicator scalar specifies whether a face is present, with 1 indicating a face and 0 indicating no face; i and j index the current sample and task, the superscript and subscript indicating the task category and the stage of the current task; L is the loss function; the double vertical bars denote the L2 norm; and the braces denote a set operation.
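Formulas (1-1) to (1-4) themselves do not survive in this text. A reconstruction consistent with the symbol descriptions above, and with the multi-task losses commonly used in cascaded face-detection CNNs, is sketched below; the indicator symbol \(\beta\) is an assumed name for the unnamed face-presence scalar, and these are not the patent's exact formulas.

```latex
% Hedged reconstruction, not the patent's exact formulas.
% Per-sample task losses:
L_i^{det}  = -\bigl(y_i^{det}\log p_i + (1 - y_i^{det})\log(1 - p_i)\bigr)   % face classification
L_i^{box}  = \bigl\lVert \hat{y}_i^{box}  - y_i^{box}  \bigr\rVert_2^2       % bounding-box regression
L_i^{mark} = \bigl\lVert \hat{y}_i^{mark} - y_i^{mark} \bigr\rVert_2^2       % key-point localization
% Stage objective with task weights \alpha_j and face-presence indicator \beta_i^j \in \{0, 1\}:
\min \; \sum_{i=1}^{N} \sum_{j \in \{det,\, box,\, mark\}} \alpha_j \, \beta_i^{j} \, L_i^{j}
```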
It can be seen from formulas (1-1) to (1-4) that the result of an upper-layer sub-network is used by the next-layer sub-network, so that the BP-Net, BR-Net, and BO-Net networks achieve a mutually cascaded effect.
It should be noted that the face classification in Fig. 9, Figure 10, and Figure 11 is a 1x1x2 vector, i.e., its result is indicated by 1 or 0; the bounding-box regression is a 1x1x4 vector that mainly outputs the coordinates of the bounding box; and the key-point localization is a 1x1x10 vector that mainly outputs the coordinates of the five key facial points.
Step S330: the face recognition processing unit 13 detects the face region in the image to be detected. In one embodiment, the face recognition processing unit 13 obtains the face region according to the key facial positions located by the BO-Net network, and takes it as the detected face region.
Step S400: when a face region was detected in the previous detection in the image sequence, perform face tracking processing on the image to be detected. In one embodiment, the face tracking processing unit 14 obtains the face region detected in the previous detection in the image sequence, and performs KCF target tracking on that previously detected face region within the image to be detected, thereby obtaining the face region in the image to be detected; accordingly, step S400 may include steps S410-S430.
Step S410: obtain the face region previously detected in the image sequence. In one embodiment, when a face region was detected in the previous pass, the face tracking processing unit 14 obtains the face region detected in the previous frame of the video stream.
Step S420: perform KCF target tracking in the image to be detected. In one embodiment, the face tracking processing unit 14 feeds the face region detected in the previous frame together with the image to be detected into the KCF tracking algorithm, which tracks that face region within the image to be detected and thereby yields the face region in the image to be detected.
It should be noted that KCF target tracking is a common algorithm in image processing, often used to track and analyze a target object across frames. During tracking it trains an object detector, uses that detector to check whether the predicted position in the next frame contains the target, and then uses the new detection result to update the training set and, in turn, the detector. When the detector is trained, the target region is generally chosen as the positive sample and the regions around the target as negative samples; the closer a region is to the target, the more likely it is to be a positive sample. Exploiting this property of the KCF tracking algorithm, the present embodiment tracks the face region in the image to be detected from the face region in the previous frame. Since the KCF tracking algorithm is prior art, it is not described in further detail here.
Step S430: obtain the face region in the image to be detected. In one embodiment, the face tracking processing unit 14 marks the face region returned by the KCF tracking algorithm, thereby obtaining the face region in the image to be detected.
It should be noted that the KCF tracking algorithm inevitably drifts while tracking the face region, which may degrade tracking accuracy. To avoid this, before face tracking is performed on the next frame, the face confidence calculation of step S200 is also applied to the face region in the image to be detected; the face confidence effectively guards against the drift introduced by the KCF tracking algorithm and provides an accurate basis for the judgment in step S200.
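The per-frame selection between recognition (step S300) and tracking (step S400), with the step-S200 confidence calculation guarding against KCF drift, can be sketched as follows. Here `detect_face`, `track_face` and `face_confidence` are hypothetical placeholders for the recognition network of unit 13, the KCF tracker of unit 14 and the confidence calculation, which the text does not specify at code level, and the confidence threshold is an assumed value.

```python
CONF_THRESHOLD = 0.5  # assumed value; the text does not state a threshold

def process_frame(frame, prev_face, detect_face, track_face, face_confidence):
    """Return the face region found in this frame, or None.
    detect_face(frame) stands in for the recognition network (S300);
    track_face(frame, prev_face) for the KCF tracker (S400);
    face_confidence(frame, region) for the S200 confidence calculation."""
    if prev_face is None:
        # no face in the previous pass: run full recognition on the frame
        return detect_face(frame)
    # a face was detected previously: track it instead (cheaper than detection)
    face = track_face(frame, prev_face)
    # confidence check corrects tracker drift before the result is trusted
    if face is not None and face_confidence(frame, face) >= CONF_THRESHOLD:
        return face
    return detect_face(frame)  # low confidence: fall back to recognition
```

The fallback branch is what corrects the tracking deviation the text describes: once confidence drops, the next frame is handled by recognition rather than by continuing to track a drifted box.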
Step S500: output the face region in the image to be detected according to the processing result. In one embodiment, referring to Fig. 1, the output unit 15 draws a rectangle around the face region detected by the face recognition processing unit 13 in step S330 or by the face tracking processing unit 14 in step S430, and outputs each frame of the image sequence continuously: if a face region is detected in the current output frame, the frame is output together with the rectangle marking the face region on it; if no face region is detected in the current output frame, only the frame itself is output.
Embodiment two:
Referring to FIG. 6, the present application also discloses a face detection device 2 of another embodiment, which comprises the face detection device 1 of embodiment one and further includes a frame number judging unit 16 and an ROI region computing unit 17, described separately below.
The frame number judging unit 16 sits between the detection judging unit 12 and the face tracking processing unit 14. When a face region is detected in the image sequence, it counts each detected frame; when the count exceeds a preset frame number (denoted by the symbol T, preferably a value in the range 48-128), ROI region calculation is performed on the image to be detected; otherwise, face tracking is performed on the image to be detected and the count is cleared to carry out the next round of counting.
The ROI region computing unit 17 is connected to the frame number judging unit 16 and the face recognition processing unit 13. When the frame number judging unit 16 determines that the count exceeds the preset frame number, the ROI region computing unit 17 performs ROI calculation on the image to be detected from the face region previously detected in the image sequence, obtaining the estimated face area in the image to be detected. The ROI region computing unit 17 then feeds this estimated face area into the face recognition processing unit 13, so that the lightweight deep neural network for face recognition is run only within the estimated face area, thereby detecting the face region in the image to be detected.
It should be noted that in image processing a region of interest (ROI) is a region selected from the image and delineated as the focus of the analysis, to be processed further. Using an ROI allows the target object in the image to be located quickly, which reduces processing time and increases processing accuracy.
It should be noted that when the KCF tracking algorithm employed by the face tracking processing unit 14 processes images continuously for a long time, it drifts, which seriously degrades face region detection. The frame number judging unit 16 therefore triggers, after the face tracking processing unit 14 has run a certain number of consecutive tracking passes, an ROI calculation on the image to be detected, so that face recognition is performed within the ROI. This makes it possible to detect an accurate face region quickly within a smaller image area and thus correct the position of the face region, avoiding the errors and drift that the face tracking processing unit 14 accumulates when performing KCF target tracking on the face region.
Referring to FIG. 7, correspondingly, the present embodiment also discloses another face detection method, comprising steps S100-S600.
Relative to the face detection method of embodiment one, the face detection method of the present embodiment two adds step S600; step S600 may include steps S610-S630, described separately below.
Step S610 precedes step S400 and may be called the frame number judgment step. In one embodiment, the frame number judgment step includes: when face regions begin to be detected in the image sequence (i.e., when the judgment result of the detection judging unit 12 is "yes" for the first time), counting each detected frame; when the count exceeds the preset frame number T (preferably a value in the range 48-128), proceeding to step S620; otherwise, proceeding to step S400.
It should be noted that, in order to keep the counting judgment of step S610 running continuously, the frame number judging unit 16 clears the count so as to carry out the next round of counting; that is, when the judgment result of the detection judging unit 12 is "yes" again, the frame number judging unit 16 restarts counting.
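The frame number judgment of step S610 can be sketched as a small counter, a sketch under the stated behavior: each detected frame is counted, and once the count exceeds the preset frame number T (48-128 in the text) an ROI re-detection round (step S620) is triggered and the count is cleared for the next round; the default T here is only one value from the stated range.

```python
class FrameCountJudge:
    """Sketch of the frame number judging unit 16 / step S610."""

    def __init__(self, t=64):  # T chosen from the preferred 48-128 range
        self.t = t
        self.count = 0

    def should_redetect(self):
        """Count one detected frame. Returns True when the count exceeds T,
        meaning step S620 (ROI re-detection) should run; the count is then
        cleared so the next round of counting can begin. Returns False when
        step S400 (KCF tracking) should run instead."""
        self.count += 1
        if self.count > self.t:
            self.count = 0  # restart counting for the next round
            return True
        return False
```

With t = 3, for example, the first three calls return False (tracking) and the fourth returns True (ROI re-detection), after which counting starts over.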
Step S620: perform ROI region calculation on the image to be detected from the face region previously detected in the image sequence, obtaining the estimated face area in the image to be detected. In one embodiment, the calculation is:
ROI_W = T1 * FACE_W (2-3)
ROI_H = T2 * FACE_H (2-4)
In the formulas above, ROI_X is the x coordinate of the top-left pixel of the region of interest and ROI_Y its y coordinate; (FACE_X, FACE_Y) is the coordinate of the top-left pixel of the face region detected in the previous frame; FACE_W and FACE_H are the width and height of the face region detected in the previous frame; ROI_W and ROI_H are the width and height of the region of interest to be calculated; and T1 and T2 are thresholds set by the user (empirically, T1 = 2.5 and T2 = 1.6 are preferred). If the values of ROI_X, ROI_Y, ROI_W or ROI_H exceed the image boundary, the coordinate of the image boundary is taken as the true value.
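The ROI estimation can be sketched as follows. Formulas (2-3) and (2-4) for the width and height are taken from the text; the top-left formulas (2-1) and (2-2) are not reproduced here, so centering the ROI on the previous face box is an assumption of this sketch, and values falling outside the image are clamped to the image boundary as the text requires.

```python
def estimate_roi(face, img_w, img_h, t1=2.5, t2=1.6):
    """Estimate the face search region from the previous frame's face box
    (FACE_X, FACE_Y, FACE_W, FACE_H). Width/height follow formulas (2-3)
    and (2-4); the top-left placement is an assumed centering, since
    formulas (2-1)/(2-2) are not reproduced in the text."""
    face_x, face_y, face_w, face_h = face
    roi_w = t1 * face_w                    # (2-3)
    roi_h = t2 * face_h                    # (2-4)
    roi_x = face_x - (roi_w - face_w) / 2  # assumed: center ROI on the face
    roi_y = face_y - (roi_h - face_h) / 2
    # clamp to the image boundary, as required by the text
    roi_x = max(0.0, roi_x)
    roi_y = max(0.0, roi_y)
    roi_w = min(roi_w, img_w - roi_x)
    roi_h = min(roi_h, img_h - roi_y)
    return roi_x, roi_y, roi_w, roi_h
```

For a 40x50 face at (100, 100) in a 640x480 frame, the estimated ROI is 100x80 pixels, i.e. 2.5x the face width and 1.6x its height, and a face near the image edge simply yields an ROI clipped at that edge.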
Step S630: input the estimated face area of the image to be detected into the lightweight deep neural network used for face recognition, so as to detect the face region in the image to be detected. In one embodiment, the ROI region computing unit 17 feeds the estimated face area of the image to be detected into the face recognition unit 13, and the estimated area is input into the lightweight deep face recognition network BFACENET to obtain the face region in the image to be detected. The composition and computation of the lightweight deep face recognition network are as described in step S300 of embodiment one and are not repeated here.
Those skilled in the art will understand that the lightweight deep neural network approach of step S300 must scan the entire image to be detected and locate the face region by traversing the whole image, which adds a great deal of wasted time. Step S630, by contrast, exploits the temporal correlation within the image sequence: the lightweight deep neural network need only scan the image inside the estimated face area, narrowing the traversal range of face detection from the whole image to the estimated region. This reduces useless sliding-window evaluations, removes interference from non-face regions, and thereby increases the running speed of the algorithm.
Those skilled in the art will understand that all or part of the functions of the various methods in the embodiments above may be implemented in hardware or by a computer program. When all or part of these functions are implemented by a computer program, the program may be stored in a computer-readable storage medium, which may include read-only memory, random-access memory, magnetic disk, optical disc, hard disk, and so on; the functions above are realized when a computer executes the program. For example, the program may be stored in the memory of a device, and all or part of the functions above are realized when a processor executes the program in memory. In addition, when all or part of the functions in the embodiments above are implemented by a computer program, the program may also be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disc, a flash drive or a removable hard disk, and be downloaded or copied into the memory of a local device, or be used to update the local device's system; when the processor executes the program in memory, all or part of the functions of the embodiments above are realized.
The specific examples above are used to explain the present invention and are merely intended to help understand it, not to limit it. For those skilled in the art, several simple deductions, variations or substitutions may also be made according to the idea of the present invention.

Claims (12)

CN201810866324.1A | 2018-08-01 | 2018-08-01 | Face detection method, face detection device and storage medium | Active | CN109271848B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201810866324.1A | 2018-08-01 | 2018-08-01 | Face detection method, face detection device and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201810866324.1A | 2018-08-01 | 2018-08-01 | Face detection method, face detection device and storage medium

Publications (2)

Publication Number | Publication Date
CN109271848A (en) | 2019-01-25
CN109271848B (en) | 2022-04-15

Family

ID=65152962

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201810866324.1A | Active | CN109271848B (en)

Country Status (1)

Country | Link
CN (1) | CN109271848B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110348348A (en) * | 2019-06-30 | 2019-10-18 | 华中科技大学 | Rapid identity recognition method and early-warning system for construction personnel entering a site
CN110991287A (en) * | 2019-11-23 | 2020-04-10 | 深圳市恩钛控股有限公司 | Real-time video stream face detection and tracking method and detection and tracking system
CN111339936A (en) * | 2020-02-25 | 2020-06-26 | 杭州涂鸦信息技术有限公司 | Face tracking method and system
CN111694980A (en) * | 2020-06-13 | 2020-09-22 | 德沃康科技集团有限公司 | Robust visual supervision method and device for children's learning state at home
CN113243026A (en) * | 2019-10-04 | 2021-08-10 | SK Telecom Co., Ltd. | Apparatus and method for high-resolution object detection
US11170252B2 | 2019-09-16 | 2021-11-09 | Wistron Corporation | Face recognition method and computer system thereof

Citations (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN1794264A (en) * | 2005-12-31 | 2006-06-28 | 北京中星微电子有限公司 | Method and system for real-time detection and continuous tracking of human faces in a video sequence
CN104036237A (en) * | 2014-05-28 | 2014-09-10 | 南京大学 | Rotating face detection method based on online prediction
CN106874867A (en) * | 2017-02-14 | 2017-06-20 | 江苏科技大学 | Adaptive face detection and tracking fusing skin color and contour screening
CN107967456A (en) * | 2017-11-27 | 2018-04-27 | 电子科技大学 | Face recognition method using cascaded multiple neural networks based on facial key points
CN108090918A (en) * | 2018-02-12 | 2018-05-29 | 天津天地伟业信息系统集成有限公司 | Real-time face tracking method based on a deep fully-convolutional Siamese network
CN108197604A (en) * | 2018-01-31 | 2018-06-22 | 上海敏识网络科技有限公司 | Fast face localization and tracking method based on an embedded device
CN108229442A (en) * | 2018-02-07 | 2018-06-29 | 西南科技大学 | Fast and stable face detection method for image sequences based on MS-KCF



Also Published As

Publication number | Publication date
CN109271848B (en) | 2022-04-15

Similar Documents

Publication | Publication Date | Title
JP6830707B1 (en) | Person re-identification method combining random batch mask and multi-scale representation learning
CN109271848A (en) | Face detection method, face detection device and storage medium
CN112784869B (en) | Fine-grained image recognition method based on attention perception and adversarial learning
CN114241548A (en) | Small target detection algorithm based on improved YOLOv5
CN106951870B (en) | Intelligent detection and early-warning method for active visual attention to significant events in surveillance video
CN110738101A (en) | Behavior recognition method and device, and computer-readable storage medium
CN111989689A (en) | Method for recognizing objects in images and mobile device for performing the method
CN111860494A (en) | Optimization method and apparatus for image target detection, electronic device and storage medium
CN106897738A (en) | Pedestrian detection method based on semi-supervised learning
CN105512618B (en) | Video tracking method
CN110705565A (en) | Lymph node tumor region identification method and device
CN107633226A (en) | Human action tracking and recognition method and system
CN109598234A (en) | Key point detection method and apparatus
CN106327507A (en) | Color image saliency detection method based on background and foreground information
CN107516316A (en) | Method for segmenting static human images in FCN by introducing a focusing mechanism
CN114998362B (en) | Medical image segmentation method based on dual segmentation models
CN117173075A (en) | Medical image detection method and related equipment
CN112991280B (en) | Visual detection method, visual detection system and electronic equipment
Chawathe | Rice disease detection by image analysis
CN113096080A (en) | Image analysis method and system
CN113780145A (en) | Sperm morphology detection method and apparatus, computer equipment and storage medium
CN112861678A (en) | Image recognition method and device
CN113469224A (en) | Rice classification method based on fusion of a convolutional neural network and feature descriptors
CN113096079A (en) | Image analysis system and construction method thereof
Lv et al. | An image rendering-based identification method for apples with different growth forms

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
