Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
Referring to fig. 1, which is a schematic diagram of a monitoring system of the present application. As shown, the monitoring system includes a camera, a radar, and a processing system. The camera is communicatively connected to the processing system, as is the radar. The camera is connected to the radar either directly or indirectly through the processing system. The communication connections here may be wired or wireless.
The number of cameras is not limited in the present application; although two cameras are shown in fig. 1, the monitoring system of the present application may include only one camera or more than two cameras. The effective scanning range of the radar covers the shooting ranges of all the cameras.
Alternatively, the monitoring system of the present application may omit a separate processing system, with the corresponding functions implemented by processors built into the camera and the radar themselves. The specific method of the present application is described in detail below using the processors built into the camera and the radar as an example. The method may also be implemented centrally by a single processor.
Specifically, before the monitoring system is put into use, the camera and the radar need to be calibrated.
Calibration of the camera is used to determine the correspondence between pixels in the camera's image and spatial points in the real world.
In one embodiment, the calibration of the camera is mainly used to determine, in a two-dimensional coordinate system of the camera (for example, the ground monitored by the camera), the position of the human body corresponding to a target object such as a human face in the camera's image.
The position at which a three-dimensional real-world object is imaged by the camera depends on the object's position in the real world, the angle and position of the camera, the imaging distortion of the camera, and the like. Camera calibration may determine the correspondence between pixels and spatial points using a reference object, for example. Optionally, the calibration is performed using a camera self-calibration method.
For example, in a crosswalk red-light-running monitoring system to which the present application is applied, the position and angle of a camera in use are generally fixed, and the sizes of the pedestrians monitored by the camera fall within a regular range. The operator can calibrate the camera using a person or a human body model as a reference.
The application does not limit the specific camera calibration mode.
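By way of non-limiting illustration, one common way to realize such a pixel-to-ground correspondence for a fixed camera is a planar homography fitted to a few reference points on the monitored ground. The following Python sketch uses OpenCV; all coordinate values and the function name pixel_to_ground are hypothetical.

```python
import numpy as np
import cv2

# Hypothetical calibration data: pixel coordinates of reference markers
# in the camera image, and the corresponding ground-plane coordinates
# (in meters) measured on site.
pixel_pts = np.array([[320, 400], [960, 410], [340, 700], [940, 690]],
                     dtype=np.float32)
ground_pts = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0], [4.0, 3.0]],
                      dtype=np.float32)

# Estimate the homography that maps image pixels onto the ground plane.
H, _ = cv2.findHomography(pixel_pts, ground_pts)

def pixel_to_ground(u, v):
    """Map an image pixel (u, v) to two-dimensional ground coordinates."""
    p = H @ np.array([u, v, 1.0])
    return p[:2] / p[2]
```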
The calibration of the radar is used to determine the correspondence of points in the scanning space of the radar to spatial points in the real world.
In one embodiment, the calibration of the radar is primarily used to determine the position of a target object scanned by the radar in a two-dimensional coordinate system of the radar (e.g., the ground monitored by the radar).
In some embodiments, calibration of the radar further comprises determining the radar reflectivity of a range of targets, or determining a scaling factor between the radar echo intensity and the radar receiver output power value for those targets.
For example, in the pedestrian crossing red-light-running monitoring system to which the present application is applied, calibrating the radar can, on the one hand, establish the correspondence between points in the radar's scanning space and spatial points in the real world, and, on the other hand, determine the radar reflectivity of common targets such as the road surface, human bodies, and vehicles as reference parameters.
The calibration of the camera and the radar further includes joint calibration of the two. Joint calibration is used to determine the correspondence between points in the radar's scanning space and pixels imaged by the camera, or to determine the relationship between the camera's two-dimensional coordinate system and the radar's two-dimensional coordinate system, that is, to spatially synchronize the radar and the camera. Due to the presence of errors, this correspondence is typically approximate.
Joint calibration of the camera and the radar also includes synchronizing the radar to the camera's field of view; that is, the camera's two-dimensional acquisition area (its ground acquisition area) is marked in the scanning space of the radar.
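As a non-limiting sketch, the spatial synchronization between the two two-dimensional coordinate systems may be approximated by a similarity transform fitted to a few reference targets observed by both sensors; all coordinate values below are hypothetical.

```python
import numpy as np
import cv2

# Hypothetical ground-plane positions (meters) of the same reference
# targets as seen in the camera and radar coordinate systems.
camera_pts = np.array([[0.0, 0.1], [2.6, 0.0], [0.1, 4.0], [2.7, 4.1]],
                      dtype=np.float32)
radar_pts = np.array([[1.2, 5.0], [3.8, 5.1], [1.3, 9.0], [3.9, 9.2]],
                     dtype=np.float32)

# Fit a similarity transform (rotation, scale, translation) that maps
# camera ground coordinates into radar ground coordinates.
M, _ = cv2.estimateAffinePartial2D(camera_pts, radar_pts)

def camera_to_radar(x, y):
    """Map a point from the camera coordinate system into the radar's."""
    return M @ np.array([x, y, 1.0])

# Mark the camera's ground acquisition area in the radar scanning space:
corners = [camera_to_radar(x, y) for x, y in [(0, 0), (3, 0), (3, 5), (0, 5)]]
```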
In some embodiments, the radar may be a rotating mechanical-scanning radar, a solid-state lidar, or the like. The present application does not limit the specific type of radar, as long as it can achieve the object of the present application.
Reference is now made to fig. 2, which is a schematic flowchart of the face detection method of the present application. The face detection method of this embodiment includes the following steps.
Step S11, a first image of a scene captured by a camera is acquired.
In one embodiment, one or more cameras capture corresponding scenes or respective monitored areas to obtain a first image. The first image may be acquired by a separate processing system or a processor integrated with the camera.
The one or more cameras continuously photograph their one or more corresponding scenes, thereby obtaining consecutive multi-frame images of those scenes.
Step S12, at least one face frame T is identified in the first image.
The processor of the camera may process the first image with a face recognition algorithm to identify at least one face frame T. Here, a face frame refers to a rectangular frame, a circular frame, or a frame of any other shape that can enclose the detected face.
Further, the processor of the camera may mark the at least one face frame T so as to identify the same face frame T across consecutive multi-frame images, thereby obtaining trajectory information of the at least one face frame T, and further its motion state information, such as speed, acceleration, motion trajectory, and motion direction. For example, the camera may detect that the face frame T moves from left to right at a speed V across the multi-frame images.
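A minimal sketch of deriving such motion state information from the per-frame centers of a tracked face frame, assuming the centers and the frame interval are already available:

```python
import numpy as np

def motion_state(centers, dt):
    """Estimate speed, motion direction, and acceleration of a tracked
    face frame from its centers in consecutive frames.

    centers: list of (x, y) face-frame centers, one per frame.
    dt: time between frames in seconds.
    """
    c = np.asarray(centers, dtype=float)
    v = np.diff(c, axis=0) / dt                 # per-frame velocity vectors
    a = np.diff(v, axis=0) / dt                 # per-frame acceleration vectors
    speed = np.linalg.norm(v[-1])
    direction = v[-1] / (speed + 1e-9)          # unit vector of motion
    return speed, direction, a[-1] if len(a) else np.zeros(2)

# E.g. a face frame moving left to right at 25 frames per second:
speed, direction, accel = motion_state([(100, 50), (110, 50), (121, 50)], 1 / 25)
```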
The processor of the camera may also extract, using a skeleton extraction algorithm, human body skeleton information of the area around the face frame from the consecutive multiple frames of the first image. The human body skeleton information includes human skeleton features and human motion posture information such as gait features.
For example, in the pedestrian crossing red-light-running monitoring system to which the present application is applied, one or more cameras monitor the road surface near a red light. The processor of the camera processes the first image obtained from the camera to obtain at least one face frame T. Based on the calibration of the camera, the area on the road surface occupied by the object corresponding to the at least one face frame T can be further obtained. That is, in this scene, the two-dimensional face area corresponding to the at least one face frame T is the approximate area occupied on the road surface by the human body corresponding to that face frame.
Step S13, a second image of the scene scanned by the radar is acquired.
In one embodiment, the processor of the radar acquires a second image obtained by the radar scanning the monitoring area of the one or more cameras. During the calibration of the radar and the camera, the radar was synchronized with the camera's field of view; that is, the camera's two-dimensional acquisition area is known to the radar. For example, the radar synchronously scans the target corresponding to the face frame T according to the moving direction and moving speed of the at least one face frame T transmitted from the processor of the camera.
Optionally, the radar scans the corresponding scene upon receiving a request from the processor of the one or more cameras, resulting in the second image.
Optionally, the radar scans the entire scanning area of the radar upon receiving multiple requests from one or more cameras or processors, resulting in the second image.
Optionally, the radar scans the two-dimensional acquisition area of the corresponding camera after receiving the two-dimensional face area corresponding to the at least one face frame T from the camera or the processor.
Step S14, a comparison area corresponding to the face frame T is divided on the second image, wherein the comparison area's coverage of the scene contains and exceeds the face frame T's coverage of the scene.
Reference is now made to fig. 3, which is a schematic diagram of a radar scanning area showing a second image 304, a mapping area 300 of the face frame T on the second image 304, a comparison area 302, and one or more human body candidate frames 306.
The second image 304 may be obtained by the radar synchronously scanning the acquisition area of the corresponding camera.
As described above, the camera and the radar are calibrated in advance, so the spatial transformation relationship between the camera's coordinate system and the radar's coordinate system is known. Therefore, the face frame T can be mapped from the first image onto the second image 304 using this spatial mapping relationship, forming a corresponding mapping area 300. The mapping area 300 in the second image 304 is then enlarged to obtain a comparison area 302.
Optionally, the coordinate systems of the first image and the second image 304 are normalized; that is, the two coordinate systems are made to coincide.
The comparison area 302 is formed by taking the center point of the mapping area 300 as its own center and enlarging the width and height of the mapping area 300 by a certain ratio K.
In one embodiment, the height enlargement ratio of the mapping area 300 may be set greater than the width enlargement ratio, and the ratio of the two may be set to the aspect ratio of a normal human body.
Due to measurement errors of the radar and the camera, calibration errors, communication delays in the system, and the like, the face frame T and the mapping area 300 may not correspond precisely; therefore, a comparison area 302 larger than the face frame T needs to be divided.
In one embodiment, K is a real number between 1 and 5. Optionally, K has a value of 3. The value of K is related to the calibration precision, the communication delay between the radar and the camera, the target's moving speed, and the like.
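A minimal sketch of dividing such a comparison area, assuming axis-aligned rectangular frames, the hypothetical value K = 3 for the width, and a height enlargement chosen so that the ratio of the two enlargement factors matches a normal human aspect ratio:

```python
def expand_to_comparison_area(box, k=3.0, body_aspect=0.28):
    """Expand a mapping area into a comparison area around the same center.

    box: (cx, cy, w, h), center and size of the mapping area.
    k: width enlargement ratio K (hypothetical value 3).
    body_aspect: assumed human width-to-height ratio, so the height is
        enlarged by k / body_aspect, i.e. more than the width.
    """
    cx, cy, w, h = box
    return (cx, cy, w * k, h * (k / body_aspect))

# E.g. a 0.2 m x 0.2 m mapping area yields a 0.6 m x ~2.1 m comparison area:
print(expand_to_comparison_area((1.0, 4.0, 0.2, 0.2)))
```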
In yet another embodiment, when the radar and the camera have different resolutions over the same area, a normalization operation is performed on their corresponding areas.
Step S15, at least one human body candidate frame is identified in the comparison area 302.
As described above, the radar can detect the distance of a target from the radar, i.e., the depth data of the target. The radar can also detect the reflectivity of a target to radar waves. In the present application, the depth information of a human face is substantially consistent with that of the corresponding human body, while differing significantly from that of the background such as the road surface. Likewise, the reflectivity of the human face to radar waves is substantially consistent with that of the human body, while differing significantly from that of backgrounds such as the road.
Accordingly, the processor of the radar identifies, within the comparison area, a continuous or singly connected region in which the depth differences and/or reflectivity differences are less than or equal to a preset depth difference threshold and/or reflectivity difference threshold, and selects the minimum rectangular area that can enclose that region as the human body candidate frame 306.
The depth difference threshold and/or the reflectivity difference threshold are set by a user according to experience and actual application scenarios.
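A minimal sketch of this identification step, assuming the comparison area is available as a two-dimensional depth array and using only the depth criterion; the threshold value is hypothetical, and scipy provides the connected-region labeling.

```python
import numpy as np
from scipy import ndimage

def human_candidate_frames(depth, face_depth, depth_diff_thresh=0.3):
    """Find candidate frames as minimum rectangles enclosing connected
    regions whose depth stays within the preset difference threshold.

    depth: 2-D array of radar depth values over the comparison area.
    face_depth: depth measured at the mapped face frame.
    depth_diff_thresh: preset depth difference threshold in meters
        (hypothetical value chosen by the user).
    """
    mask = np.abs(depth - face_depth) <= depth_diff_thresh
    labels, _ = ndimage.label(mask)              # singly connected regions
    frames = []
    for sl in ndimage.find_objects(labels):
        # Minimum axis-aligned rectangle enclosing the region.
        frames.append((sl[1].start, sl[0].start, sl[1].stop, sl[0].stop))
    return frames                                # (x0, y0, x1, y1) per frame
```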
After identifying a human body candidate frame, the radar may mark it and track the marked candidate frame across scans to obtain its motion state, such as motion trajectory, motion direction, speed, and acceleration.
Through multiple scans, the processor of the radar may use a skeleton extraction algorithm to obtain the human body skeleton information of the human body candidate frame from multiple frames of the second image 304. The human body skeleton information includes human skeleton features and human motion posture information such as gait features.
Step S15 often yields more than one human body candidate frame.
Optionally, the obtained human body candidate frames may be preliminarily screened to eliminate targets that are obviously not human bodies and candidate frames that are obviously inconsistent with the face frame detected by the camera. If no human body candidate frame remains after the preliminary screening, the face in the face frame is determined to be a false face.
Methods of preliminarily screening the human body candidate frames are described below.
In one embodiment, it is determined whether the difference between the motion state of a human body candidate frame and the motion state of the face frame is greater than or equal to a preset motion difference threshold, and candidate frames at or above the threshold are removed, thereby eliminating candidate frames obviously inconsistent with the face frame detected by the camera. The preset motion difference threshold may be determined by the user empirically or experimentally.
In this embodiment, the motion state of the human body candidate frame and the motion state of the face frame are obtained by analyzing multiple frames of the second image 304 and multiple frames of the first image, respectively. The motion state is characterized by one or more of motion trajectory, motion direction, speed, acceleration, and the like.
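A minimal sketch of this screening, using velocity vectors as the motion-state characterization; the threshold value is hypothetical.

```python
import numpy as np

def screen_by_motion(candidates, face_velocity, motion_diff_thresh=0.5):
    """Remove candidate frames whose motion differs from the face
    frame's motion by at least the preset motion difference threshold.

    candidates: list of (frame, velocity) pairs from radar tracking,
        each velocity a (vx, vy) vector in the common coordinate system.
    face_velocity: (vx, vy) of the face frame from camera tracking.
    motion_diff_thresh: hypothetical threshold in meters per second.
    """
    fv = np.asarray(face_velocity, dtype=float)
    return [(f, v) for f, v in candidates
            if np.linalg.norm(np.asarray(v, dtype=float) - fv) < motion_diff_thresh]
```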
In another embodiment, it is determined whether the difference between a candidate frame's aspect ratio and an aspect ratio standard is greater than or equal to a preset aspect ratio difference threshold, and candidate frames at or above the threshold are removed, thereby eliminating candidate frames that are obviously not human.
The aspect ratio of a human male, i.e., the ratio of shoulder width to height, is around 0.28; that of a female is around 0.25. Owing to errors in the radar's actual data processing and the influence of clothing and the like, the appropriate preset aspect ratio difference threshold may vary. The user may determine it through field experiments on the monitoring system, or based on past experience.
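A corresponding sketch of the aspect ratio screening, with the standard taken from the figures above and a hypothetical difference threshold:

```python
def screen_by_aspect_ratio(frames, standard=0.28, diff_thresh=0.1):
    """Keep candidate frames whose width-to-height ratio is close to
    the human aspect ratio standard (shoulder width to height).

    frames: (x0, y0, x1, y1) candidate frames.
    standard: about 0.28 for males, about 0.25 for females.
    diff_thresh: hypothetical preset aspect ratio difference threshold.
    """
    kept = []
    for x0, y0, x1, y1 in frames:
        ratio = (x1 - x0) / max(y1 - y0, 1e-9)
        if abs(ratio - standard) < diff_thresh:
            kept.append((x0, y0, x1, y1))
    return kept
```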
In another embodiment, it is determined whether the difference between the first skeleton information of a candidate frame and a human skeleton standard is greater than or equal to a preset skeleton difference threshold, and candidate frames at or above the threshold are removed, thereby eliminating candidate frames that are obviously not human.
The human skeleton standard includes, for example, human skeleton topology information, limb posture information corresponding to human body movement, and the like.
For example, in a pedestrian crossing monitoring scene, the human body candidate frame of a passing pedestrian carries very distinct limb information, whereas a target such as a portrait poster carried by a person or a vehicle is rejected because it lacks the limb motion posture that accompanies movement.
In another embodiment, it is determined whether the difference between the first skeleton information and the second skeleton information of the area around the face frame is greater than or equal to a preset skeleton difference threshold, and candidate frames at or above the threshold are removed, thereby eliminating candidate frames obviously inconsistent with the face frame detected by the camera.
Optionally, determining the difference between the first skeletal information and the second skeletal information comprises determining a difference in motion pose information in the first skeletal information and the second skeletal information.
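One non-limiting way to quantify such a skeleton difference is a normalized distance between corresponding keypoints of the two skeletons; the keypoint layout is hypothetical and must be the same for both inputs.

```python
import numpy as np

def skeleton_difference(skel_a, skel_b):
    """Mean normalized distance between corresponding skeleton keypoints.

    skel_a, skel_b: (N, 2) arrays of keypoints (e.g. shoulders, hips,
        knees) in the same order and a common coordinate system.
    """
    a = np.asarray(skel_a, dtype=float)
    b = np.asarray(skel_b, dtype=float)
    # Center each skeleton and normalize by its own height so that
    # position and scale differences cancel out.
    a = (a - a.mean(axis=0)) / (np.ptp(a[:, 1]) + 1e-9)
    b = (b - b.mean(axis=0)) / (np.ptp(b[:, 1]) + 1e-9)
    return float(np.mean(np.linalg.norm(a - b, axis=1)))

# Candidate frames with skeleton_difference(...) at or above the preset
# skeleton difference threshold would be removed.
```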
Step S16, the authenticity of the face is evaluated based on the human body candidate frame.
In one embodiment, the similarity between at least one index obtained from a human body candidate frame and a reference standard is calculated and used as the score of that candidate frame. It is then determined whether the score is greater than or equal to a preset score threshold; if so, the face in the face frame is determined to be a real face.
Optionally, calculating the similarity between the at least one index obtained from the human body candidate frame and the reference standard includes: calculating the similarity between the candidate frame's aspect ratio and a preset human aspect ratio standard; calculating the similarity between the candidate frame's motion state and the face frame's motion state; calculating the similarity between the first skeleton information obtained from the candidate frame and a human skeleton standard; and calculating the similarity between the first skeleton information and the second skeleton information obtained from the area around the face frame in the first image.
Optionally, the score is a weighted sum of one or at least two of the above similarities.
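A minimal sketch of this scoring, with hypothetical similarity values, weights, and score threshold:

```python
def authenticity_score(similarities, weights):
    """Weighted sum of the similarity indices of a candidate frame.

    similarities: dict mapping index name to a similarity in [0, 1].
    weights: dict mapping index name to its weight (hypothetical).
    """
    return sum(weights[k] * similarities[k] for k in weights)

sims = {"aspect_ratio": 0.9, "motion": 0.8, "skeleton": 0.7}
weights = {"aspect_ratio": 0.2, "motion": 0.4, "skeleton": 0.4}
SCORE_THRESH = 0.6  # preset score threshold, hypothetical

if authenticity_score(sims, weights) >= SCORE_THRESH:
    print("real face")   # face in the face frame judged real
else:
    print("false face")
```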
Optionally, if the maximum score among the one or more human body candidate frames is greater than or equal to the score threshold, the face in the face frame is determined to be a real face; otherwise, it is determined to be a false face.
Alternatively, if the face in the face frame is determined to be a false face, the method returns to step S14 with the face frame information tracked and updated by the camera, until the face in the face frame is determined to be a real face or a preset execution time is exceeded. If the preset execution time is exceeded, the face in the face frame is determined to be a false face. The preset execution time is set by the user based on experience or experiment.
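This retry logic may be sketched as follows; the helper functions standing in for steps S14 to S16 and the timeout value are hypothetical placeholders.

```python
import time

def verify_face(face_track, timeout_s=2.0):
    """Repeat steps S14-S16 on updated face-frame information until the
    face is judged real or the preset execution time elapses.

    face_track: source of updated face frame information (hypothetical).
    timeout_s: preset execution time in seconds (hypothetical).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        frame_info = face_track.latest()                    # updated face frame
        comparison = divide_comparison_area(frame_info)     # step S14
        candidates = identify_candidate_frames(comparison)  # step S15
        if evaluate_authenticity(candidates):               # step S16
            return True                                     # real face
    return False                                            # false face on timeout
```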
In summary, the present application divides, in the radar image, a comparison area corresponding to the face frame detected by the camera, identifies a human body candidate frame within that comparison area, and then uses the candidate frame to assist in judging the authenticity of the face, so that the radar can effectively assist in verifying the authenticity of faces detected by the camera.
Compared with the prior art, the present application does not require the radar and the camera to be tightly coupled, and one radar can serve multiple cameras simultaneously. Therefore, with the method of the present invention, existing cameras can be combined with a radar after simple operations such as a software upgrade. The invention thus makes full use of existing resources, at low cost and with high efficiency.
The above description is only an embodiment of the present application and is not intended to limit its scope; all equivalent structures or equivalent process transformations made using the contents of the present application and the accompanying drawings, whether applied directly or indirectly in other related technical fields, are likewise included within the protection scope of the present application.