Disclosure of Invention
Embodiments of the invention aim to provide an image detection method and device that can determine the size and shape of a target to be detected at a detection position.
In order to achieve the above object, in one aspect, there is provided an image detection method, including the steps of:
step one, acquiring monitoring video data of monitoring equipment;
step two, obtaining calibration information of the monitoring equipment and actual shape information of the detected target;
step three, acquiring the shape and size information of the detected target at the detection position according to the detection position in the image to be detected of the video data, the calibration information and the actual shape information.
Preferably, the method further comprises:
step four, in the process of statistics-based target detection, searching for the target at the corresponding scale according to the shape and size information.
Preferably, in the above method, the calibration information includes a normalized vanishing line vector $\hat{l}$ and a vanishing point vector v in the vertical direction; the actual shape information is the actual height Z and the actual aspect ratio of the detected target; and the detection position is the top coordinate t of the detected target in the image to be detected.
Preferably, in the above method, the third step specifically includes:
using the formula $\alpha Z = \frac{-\lVert b \times t \rVert}{(\hat{l} \cdot b)\,\lVert v \times t \rVert}$, calculating the bottom coordinate b of the detected target in the image to be detected, wherein the parameter α is a prior value obtained through testing;
obtaining the image height h of the detected target in the image to be detected according to the difference between the top coordinate t and the bottom coordinate b;
and determining the shape and size information of the detected target at the detection position according to the image height h and the actual aspect ratio.
Preferably, in the above method, the parameter α is obtained by: measuring the top coordinate t and the bottom coordinate b of a known target in a known image of said video data, and substituting these, together with the normalized vanishing line vector $\hat{l}$, the vanishing point vector v and the actual height Z of the known target, into the above formula to obtain the parameter α.
Preferably, in the above method, in step two, the calibration information is known information stored in the monitoring equipment; or,
the calibration information is obtained by: acquiring geometric information for calibration from the video data, and calculating the calibration information according to the geometric information.
Preferably, in the above method, the geometric information includes: two pairs of parallel lines that are parallel to the ground but have different directions, and one pair of parallel lines perpendicular to the ground.
In another aspect of the present invention, an apparatus for image detection includes:
a data acquisition module, configured to acquire monitoring video data of monitoring equipment;
a calibration information acquisition module, configured to obtain calibration information of the monitoring equipment;
an actual shape information acquisition module, configured to acquire actual shape information of the detected target;
a calculation module, configured to acquire the shape and size information of the detected target at the detection position according to the detection position in the image to be detected of the video data, the calibration information and the actual shape information.
Preferably, the above apparatus further comprises a detection module, configured to: in the process of statistics-based target detection, perform a target search at the corresponding scale according to the shape and size information.
Preferably, in the above apparatus, the calculation module is configured to calculate, by the formula
$$\alpha Z = \frac{-\lVert b \times t \rVert}{(\hat{l} \cdot b)\,\lVert v \times t \rVert},$$
the bottom coordinate b of the detected target in the image to be detected, wherein α is a prior value obtained through testing, Z is the actual height of the detected target, t is the top coordinate of the detected target in the image to be detected, $\hat{l}$ is the normalized vanishing line vector, and v is the vanishing point vector in the vertical direction.
Preferably, in the above apparatus, the calibration information obtaining module includes:
a geometric information extraction unit, configured to obtain geometric information for calibration from the video data, where the geometric information includes: two pairs of parallel lines parallel to the ground but in different directions, and one pair of parallel lines perpendicular to the ground;
a calibration unit, configured to obtain, according to the geometric information, the normalized vanishing line vector $\hat{l}$ and the vanishing point vector v in the vertical direction.
Preferably, in the above apparatus, the actual shape information acquisition module further includes a storage unit, and the storage unit stores the actual heights and actual aspect ratios of a plurality of targets to be detected.
Preferably, in the above apparatus, the plurality of targets to be detected includes a human target and an automobile target; the actual height of the human target is set to a statistical average of human heights, the statistical average being any value between 165 cm and 175 cm, and the actual aspect ratio of the human target is set to a statistical average of human aspect ratios.
The embodiment of the invention has at least the following technical effects:
1) The embodiment of the invention extends the application of single-view calibration: from the calibration information of a single view and the actual shape information of the detected target, it can back-calculate the shape and size information of the detected target at the detection position, that is, the shape and size information of the real target after projection imaging.
2) The embodiment of the invention applies the obtained shape and size information of the detected target at the detection position to the statistics-based target detection process and searches for the target at the corresponding scale according to that information, thereby reducing the amount of calculation and accelerating detection.
3) The embodiment of the invention stores the actual shape information of real targets (such as cars, human heads and the like) in advance; since targets of the same type have consistent shapes and sizes, an average shape value is set for each type of target, which can be applied directly to detection targets of the corresponding type while maintaining detection precision.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the following detailed description of the embodiments is provided with reference to the accompanying drawings.
At present, much research has been conducted on the calibration of single-view imaging, most of it calculating some or all calibration parameters from typical geometric information in the image (such as parallel lines, right angles, etc.). Applications of single-view calibration mainly include three-dimensional reconstruction of regular buildings and calculation of the real height of a target from its position information in the image.
The embodiment of the invention extends the application of single-view calibration. Unlike the above-described calculation of three-dimensional information from image information, the embodiment derives a method of calculating the post-imaging shape of a target from the target's three-dimensional information. The method can be used to assist statistics-based target detection and to accelerate detection.
Fig. 1 is a flowchart of the steps of a method according to an embodiment of the present invention. The image detection method of this embodiment is an image target shape determination method based on single-view calibration, and can obtain target shape information at any position in a monitored video image. As shown in Fig. 1, the method includes:
step 101, acquiring monitoring video data of monitoring equipment;
step 102, obtaining calibration information of the monitoring equipment and actual shape information of a detected target;
step 103, acquiring the shape and size information of the detected target at the detection position according to the detection position in the image to be detected of the video data, the calibration information and the actual shape information.
Once the shape and size information of the detected target at the detection position has been determined, the search can be limited directly to the corresponding scale, so that in the statistics-based target detection process the number of search scales is reduced, the amount of calculation drops significantly, and detection is accelerated.
Therefore, when the invention is applied to statistics-based target detection, the method further comprises the following step: in the process of statistics-based target detection, performing the target search at the corresponding scale according to the shape and size information, as illustrated by the sketch below.
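As an illustration only (not part of the claimed method), the following Python sketch shows one way such a scale-restricted search could look: at each image position the detector tries only a narrow band of window sizes around the size predicted for that position, instead of a full image pyramid. The predict_size and score_window callables and the ±20% tolerance are hypothetical placeholders for whatever statistics-based detector is actually used.

```python
def detect_with_predicted_size(image, predict_size, score_window,
                               threshold=0.5, stride=8, scale_tolerance=0.2):
    """Sliding-window detection restricted to the predicted scale.

    predict_size(x, y) -> (h, w): expected target size at position (x, y),
    e.g. obtained from the back-calculation described in this document.
    score_window(patch) -> float: any statistics-based classifier score.
    """
    detections = []
    H, W = image.shape[:2]
    for y in range(0, H, stride):
        for x in range(0, W, stride):
            h_pred, w_pred = predict_size(x, y)
            # Try only a few scales around the predicted size, not a full pyramid.
            for s in (1.0 - scale_tolerance, 1.0, 1.0 + scale_tolerance):
                h, w = int(round(h_pred * s)), int(round(w_pred * s))
                if h < 1 or w < 1 or y + h > H or x + w > W:
                    continue
                patch = image[y:y + h, x:x + w]
                if score_window(patch) > threshold:
                    detections.append((x, y, w, h))
    return detections
```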
In the embodiment of the invention, the actual shape information and the calibration information are the basis for calculating the shape and size information of the detected target at the detection position (that is, the shape and size information of the real target after projection imaging). Since the actual shape information of real targets (such as a car or a human head) is known, and the shapes and sizes of targets of the same type are relatively consistent, the invention sets and stores an average shape value for each type of target. The actual shape information is therefore set in advance, before detection, and is a known quantity at calculation time.
As for the calibration information, it may be stored in the camera before detection; for a camera that has not been calibrated, the following method may be adopted.
Camera calibration method based on single view
Video monitoring generally uses a single camera to monitor a target area. In a monitoring scene, the traditional approach of placing a calibration control object has serious limitations, because the monitored area is often one that people cannot reach or where a calibration reference object cannot be placed, such as an expressway, a dangerous area, or an area people are prohibited from entering. Therefore, calibration in video monitoring generally relies on vanishing points or vanishing lines; for example, the geometric information of buildings in the scene and the edge information of roads are used to extract vanishing lines and vanishing points, and pedestrians or vehicles can also be used to extract vanishing lines.
The embodiment of the invention likewise performs calibration based on the principle of vanishing lines and vanishing points. To obtain the vanishing line conveniently, the invention uses information about targets perpendicular to the ground in the monitoring scene, such as images of the same person standing at different positions, telegraph poles, and lane lines on the road. According to the principles of projective geometry, parallel lines in three-dimensional Euclidean space intersect, after projection, at a vanishing point in the image, and two such vanishing points determine the vanishing line. Therefore, as long as two pairs of parallel lines that are parallel to the ground but have different directions in three-dimensional space are known, the position of the vanishing line can be calculated.
According to the calibration principle, besides the vanishing line, the position of the vanishing point in the direction perpendicular to the ground must also be known. This vanishing point can be calculated from a pair of parallel lines perpendicular to the ground (for example, from the walls of a building or from utility poles standing perpendicular to the ground). For simplicity, it may also be assumed that the vanishing point in the vertical direction lies at infinity in the y-direction.
Therefore, in the embodiment of the invention, the normalized vanishing line vector $\hat{l}$ is obtained from the two pairs of parallel lines that are parallel to the ground but have different directions, and the vanishing point vector v in the vertical direction is obtained from a pair of parallel lines perpendicular to the ground. Of course, the way of obtaining the vanishing line vector $\hat{l}$ and the vanishing point vector v is not limited to these geometric elements; those skilled in the art may obtain them from other geometric elements.
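As a minimal, editor-supplied sketch (not the claimed implementation), the following Python/NumPy code shows one standard projective-geometry route from these geometric elements to the calibration information: a line through two image points is their cross product in homogeneous coordinates, the vanishing point of a pair of parallel lines is the cross product of the two lines, the vanishing line is the cross product of the two ground vanishing points, and the vertical vanishing point comes from the pair of vertical lines. The unit-norm normalization of $\hat{l}$ is an assumption.

```python
import numpy as np

def line_through(p1, p2):
    """Homogeneous line through two image points given as (x, y)."""
    return np.cross([p1[0], p1[1], 1.0], [p2[0], p2[1], 1.0])

def meet(l1, l2):
    """Intersection of two homogeneous lines; for images of parallel 3-D lines
    this is their vanishing point."""
    return np.cross(l1, l2)

def calibrate(ground_pair_1, ground_pair_2, vertical_pair):
    """Each *_pair is two image lines, each line given by two (x, y) points.
    Returns the normalized vanishing line l_hat and the vertical vanishing
    point v, both as homogeneous 3-vectors."""
    vp1 = meet(line_through(*ground_pair_1[0]), line_through(*ground_pair_1[1]))
    vp2 = meet(line_through(*ground_pair_2[0]), line_through(*ground_pair_2[1]))
    l = np.cross(vp1, vp2)              # vanishing line of the ground plane
    l_hat = l / np.linalg.norm(l)       # unit-norm normalization (one convention)
    v = meet(line_through(*vertical_pair[0]), line_through(*vertical_pair[1]))
    # If no vertical lines are available, the text above allows assuming the
    # vertical vanishing point at infinity in the y-direction, e.g. v = [0, 1, 0].
    return l_hat, v
```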
After the calibration information has been obtained by the above method, the target size must be back-calculated from the actual shape information and the calibration information, yielding the shape and size information of the detected target at the detection position. The back-calculation method is as follows:
method for inverse calculation of target size
In statistics-based target detection, the size of a target at an arbitrary position in the monitored scene often needs to be known, for example the size of a human head in the image, or the size of a vehicle at any position in traffic monitoring. The monitoring device generally photographs the scene from the side, so by the principle of projection imaging an object close to the camera appears large in the image and an object far away appears small. To calculate the size of the projected target in the image, the calibration information of the camera and the height of the target must be known. Taking an actual target shaped as shown in Fig. 2 as an example:
the formula for calculating the height is:
$$\alpha Z = \frac{-\lVert b \times t \rVert}{(\hat{l} \cdot b)\,\lVert v \times t \rVert} \qquad (1)$$
wherein $\hat{l}$ is the normalized vanishing line vector, v is the vanishing point vector in the vertical direction, and b, t are the bottom and top coordinates of the target in the image. Since the target is perpendicular to the ground, b and t have the same abscissa and their ordinates differ by h, i.e. $t = (x, y, 1)^{\mathsf{T}}$ and $b = (x, y + h, 1)^{\mathsf{T}}$. Then:
Substituting these values into formula (1), we first obtain
$$b \times t = \begin{bmatrix} h \\ 0 \\ -xh \end{bmatrix}, \qquad v \times t = \begin{bmatrix} v_y - v_z y \\ v_z x - v_x \\ v_x y - v_y x \end{bmatrix}, \qquad \hat{l} \cdot b = l_x x + l_y y + l_y h + l_z$$
$$\lVert b \times t \rVert = h\sqrt{x^2 + 1}$$
Letting $A = \lVert v \times t \rVert$, formula (1) gives
$$-h\sqrt{x^2 + 1} = \alpha Z A \,(l_x x + l_y y + l_y h + l_z)$$
$$\left(\alpha Z A\, l_y + \sqrt{x^2 + 1}\right) h = -\alpha Z A \,(l_x x + l_y y + l_z) \qquad (2)$$
h can be calculated according to equation (2).
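As a concrete illustration (supplied by the editor, under the coordinate convention t = (x, y, 1), b = (x, y + h, 1) used above), the Python/NumPy sketch below forms A = ‖v × t‖, solves the linear equation (2) for h, and also returns the corresponding bottom coordinate b.

```python
import numpy as np

def image_height(t_xy, Z, alpha, l_hat, v):
    """Back-calculate the image height h of a target whose top point is t_xy = (x, y),
    following equations (1)-(2); l_hat is the normalized vanishing line vector and
    v the vertical vanishing point (both homogeneous 3-vectors)."""
    x, y = t_xy
    t = np.array([x, y, 1.0])
    A = np.linalg.norm(np.cross(v, t))          # A = ||v x t||
    lx, ly, lz = l_hat
    # Equation (2): (alpha*Z*A*ly + sqrt(x^2 + 1)) * h = -alpha*Z*A*(lx*x + ly*y + lz)
    return -alpha * Z * A * (lx * x + ly * y + lz) / (alpha * Z * A * ly + np.sqrt(x * x + 1.0))

def bottom_coordinate(t_xy, Z, alpha, l_hat, v):
    """Bottom coordinate b = (x, y + h) of the detected target."""
    h = image_height(t_xy, Z, alpha, l_hat, v)
    return (t_xy[0], t_xy[1] + h)
```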
As can be seen from the above description, when the true height Z of the object and the top coordinate t of the object in the image are known, the height value h of the object in the image can be calculated by the above formula. Then, according to the aspect ratio of the real target, the shape information of the target can be determined.
Take human-head detection as an example. The head size at an arbitrary position in the image needs to be known, i.e. how to calculate the head size at a point when t is known. It can be assumed that t and b have the same x-coordinate and y-coordinates that differ by h. The value h represents the height, projected on the image, of the person's body height when the head is at t. Because the height of the head and the height of the person approximately satisfy a fixed proportion, the head size can be calculated approximately once h is known. Z is taken to be the height of a person of standard height, 170 cm in this embodiment. α is a prior value obtained from experiments; it can be calculated in advance using formula (1) from the head-top and foot coordinates of a person of known height standing still at some position in a known image.
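For completeness, here is an editor-supplied sketch of how α could be measured once from such a reference person and then reused to predict a head size at any top coordinate, using the image_height function from the previous sketch. The head-to-body ratio of 1/7 and the treatment of the stored aspect ratio are illustrative assumptions only, not values taken from the text.

```python
import numpy as np

def estimate_alpha(t_xy, b_xy, Z, l_hat, v):
    """Compute alpha from formula (1) using a reference target whose top coordinate t,
    bottom coordinate b and real height Z are all known."""
    t = np.array([t_xy[0], t_xy[1], 1.0])
    b = np.array([b_xy[0], b_xy[1], 1.0])
    numerator = -np.linalg.norm(np.cross(b, t))
    denominator = Z * np.dot(l_hat, b) * np.linalg.norm(np.cross(v, t))
    return numerator / denominator          # alpha = -||b x t|| / (Z (l_hat . b) ||v x t||)

# Illustrative use (assumed ratios, hypothetical variable names):
# alpha = estimate_alpha(t_ref, b_ref, Z=170.0, l_hat=l_hat, v=v)
# h_body = abs(image_height((x, y), Z=170.0, alpha=alpha, l_hat=l_hat, v=v))
# head_h = h_body / 7.0                 # assumed head height as a fraction of body height
# head_w = head_h / head_aspect_ratio   # width from the stored height/width aspect ratio
```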
Fig. 3 is a block diagram of an apparatus according to an embodiment of the present invention. As shown in the figure, the apparatus for image detection according to the embodiment of the present invention includes:
a data acquisition module 310, configured to acquire monitoring video data of monitoring equipment;
a calibration information obtaining module 320, configured to obtain calibration information of the monitoring equipment;
an actual shape information obtaining module 330, configured to acquire actual shape information of the detected target;
a calculation module 340, configured to acquire the shape and size information of the detected target at the detection position according to the detection position in the image to be detected of the video data, the calibration information and the actual shape information.
The calculation module 340 includes: a detection position acquisition unit 341, configured to acquire the detection position in the image to be detected of the video data; and an arithmetic unit 342, configured to calculate the shape and size information.
After the calculation module 340 obtains the shape and size information of the detected target at the detection position, this information may be input to a detection module (not shown), which is configured to: in the process of statistics-based target detection, perform the target search at the corresponding scale according to the shape and size information.
The calculation module 340 calculates, by the formula
$$\alpha Z = \frac{-\lVert b \times t \rVert}{(\hat{l} \cdot b)\,\lVert v \times t \rVert},$$
the bottom coordinate b of the detected target in the image to be detected, wherein α is a prior value obtained through testing, Z is the actual height of the detected target, t is the top coordinate of the detected target in the image to be detected, $\hat{l}$ is the normalized vanishing line vector, and v is the vanishing point vector in the vertical direction. After the bottom coordinate b is obtained, the image height h of the detected target in the image to be detected is obtained from the difference between the top coordinate t and the bottom coordinate b; the shape and size information of the detected target at the detection position is then determined according to the image height h and the actual aspect ratio.
In the case that the camera is not calibrated, the calibration information obtaining module 320 may further include:
a geometric information extraction unit 321, configured to obtain geometric information for calibration from the video data, where the geometric information includes: two pairs of parallel lines parallel to the ground but in different directions, and one pair of parallel lines perpendicular to the ground;
a calibration unit 322, configured to obtain, according to the geometric information, the normalized vanishing line vector $\hat{l}$ and the vanishing point vector v in the vertical direction.
The actual shape information obtaining module 330 further comprises a storage unit, which stores the actual heights and actual aspect ratios of the various targets to be detected. The targets to be detected include a human target and an automobile target; the actual height of the human target is set to a statistical average of human heights, which is any value between 165 cm and 175 cm, and the actual aspect ratio of the human target is set to a statistical average of human aspect ratios. A sketch of such a storage unit follows.
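This is a minimal, editor-supplied sketch of such a storage unit as a per-class lookup table. Only the 170 cm human height reflects the 165–175 cm range stated above; the class names, the human aspect ratio and the automobile values are hypothetical placeholders to be replaced by actual statistical averages.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ActualShape:
    height_cm: float     # statistical average real height of the target class
    aspect_ratio: float  # statistical average height/width ratio of the target class

class ShapeStorageUnit:
    """Stores the actual heights and aspect ratios of the target classes to be detected."""
    def __init__(self):
        self._shapes = {
            "human": ActualShape(height_cm=170.0, aspect_ratio=3.5),  # aspect ratio is a placeholder
            "car":   ActualShape(height_cm=150.0, aspect_ratio=0.4),  # both values are placeholders
        }

    def get(self, target_class):
        """Return the stored ActualShape for a target class, e.g. "human" or "car"."""
        return self._shapes[target_class]
```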
From the above, the embodiments of the present invention have the following advantages:
1) The embodiment of the invention extends the application of single-view calibration: from the calibration information of a single view and the actual shape information of the detected target, it can back-calculate the shape and size information of the detected target at the detection position, that is, the shape and size information of the real target after projection imaging.
2) The embodiment of the invention applies the obtained shape and size information of the detected target at the detection position to the statistics-based target detection process and searches for the target at the corresponding scale according to that information, thereby reducing the amount of calculation and accelerating detection.
3) The embodiment of the invention stores the actual shape information of real targets (such as cars, human heads and the like) in advance; since targets of the same type have consistent shapes and sizes, an average shape value is set for each type of target, which can be applied directly to detection targets of the corresponding type while maintaining detection precision.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements should also be regarded as falling within the protection scope of the present invention.