Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without creative effort, shall fall within the protection scope of the present invention.
Current common traffic monitoring methods fall mainly into three types: manual monitoring and observation, detection by underground embedded sensors, and detection by suspended systems. The suspended detection methods mainly include ultrasonic, radar, infrared, and video sequence detection. Although the manual monitoring method has high accuracy, it consumes a large amount of manpower and material resources, and the detection system cannot be made intelligent. The underground embedded sensor detection method acquires traffic parameters by means such as geomagnetic detection and coil detection; although it has good stability and is not easily influenced by the external environment, it also has disadvantages such as inconvenient installation and large damage to the road surface. Among the suspended methods, those using ultrasonic waves, radar, infrared rays, and the like suffer from a single detection function, low detection precision, and poor real-time performance. In comparison, the video detection method has the advantages of convenient installation and maintenance, high detection precision, immunity to the external environment, and good real-time performance, so it has attracted wide attention and application.
In an intelligent traffic system based on video images, real-time traffic video is the key to urban traffic research. When monitoring traffic, the related art mainly determines vehicle information on roads in a video image by foreground detection and extraction, or by tracking and analyzing feature points in the video image. Both approaches are easily influenced by external environmental factors such as illumination and shadow, which makes the monitoring result inaccurate.
In view of the above, an embodiment of the present invention provides a traffic monitoring method. It should be noted that, in the application scenario of the method, a camera may be deployed in advance for each road segment, so that whether traffic congestion currently occurs is judged according to the real-time monitoring video returned by the camera. Referring to fig. 1, the method includes:
101. Input the current image frame into a vehicle detection model, and output a target detection frame in the current image frame, wherein the target detection frame is used for marking a vehicle in the current image frame.
This step rapidly and accurately identifies all vehicles in the monitored area of the current image frame through the vehicle detection model, and determines the spatial information of the vehicles. The target detection frame marks the possible position of a vehicle in the current image frame, and may be represented by the coordinates of the center and edge points of the frame, which is not particularly limited in the embodiment of the present invention. Accordingly, outputting the target detection frame in the current image frame means outputting the coordinates of the center and edge points of the target detection frame, that is, outputting the spatial information of the vehicle. The vehicle detection model may specifically be a deep neural network model or a convolutional neural network model, which is not specifically limited in this embodiment of the present invention.
Based on the content of the above embodiments, as an alternative embodiment, the vehicle detection model may be trained on sample images with labeled targets. In addition, the sample images used for training may cover a variety of categories, including but not limited to sample images under different external environmental factors, such as different illumination levels, so as to overcome the effects of these factors when the deep neural network is subsequently used.
Taking the vehicle detection model as the deep neural network model as an example, the deep neural network model may specifically be a YOLOv3 network model, which is not specifically limited in this embodiment of the present invention. Based on the content of the above embodiments, as an optional embodiment, the deep neural network model may be composed of a feature extraction network, a multi-scale prediction network, and a multi-label classification prediction network. The feature extraction network may be formed of a large number of 3 × 3 and 1 × 1 convolutional layers, and the layers may be connected by shortcut connections as in a residual network, as shown in fig. 2. In fig. 2, X is the output of the previous activation layer, F(X) is the output of the convolutional layers of the current block, the input to the next layer is F(X) + X, and ReLU denotes the activation function between layers.
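For illustration only, the shortcut connection of fig. 2 can be sketched in a few lines of Python; conv_fn here is a hypothetical stand-in for the 3 × 3 and 1 × 1 convolution stack of the block:

```python
import numpy as np

def relu(x):
    # Activation function applied between layers (ReLU in fig. 2).
    return np.maximum(0.0, x)

def residual_block(x, conv_fn):
    """Shortcut connection as in fig. 2: the block output is F(X) + X.

    x: output of the previous activation layer (X in fig. 2).
    conv_fn: stands in for the 3x3/1x1 convolution stack (computes F(X))."""
    fx = conv_fn(x)        # F(X): output of the current convolutional layers
    return relu(fx + x)    # the shortcut adds X before the next activation

# Toy usage with a scaling map in place of a real convolution, just to
# show the arithmetic of the shortcut:
out = residual_block(np.array([1.0, -2.0, 3.0]), lambda t: 0.5 * t)
print(out)   # [1.5 0.  4.5]
```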
The YOLOv3 network model performs prediction by multi-scale fusion and has a good detection and identification effect on small targets. The first prediction layer feature map has size N × N; the first prediction layer feature map is upsampled and, after certain convolutional layers, forms a second prediction layer feature map of size 2N × 2N; the second prediction layer feature map is upsampled and, after certain convolutional layers, forms a third prediction layer feature map of size 4N × 4N. Each grid cell on each prediction layer feature map is responsible for predicting the location, class, and score of 3 bounding boxes. The loss function consists of three parts: the coordinate error, the IoU (intersection over union) error, and the classification error, and can be expressed by the following formula (1):

loss = Σ_{i=0}^{W×W} (error_coord + error_iou + error_class)    (1)

where W is the size of the feature map, error_coord is the coordinate error, error_iou is the IoU error, and error_class is the classification error. Based on the contents of the above embodiments, as an alternative embodiment, the classification prediction for each target bounding box of the YOLOv3 network model no longer uses the common softmax classifier, but uses several simple logistic regression classifiers. The softmax classifier assigns each bounding box exactly one category, namely the category with the highest score; but a bounding box may carry overlapping class labels, so softmax is not suitable for multi-label classification. In addition, using several simple logistic regression classifiers does not reduce the classification precision of the system. It should be noted that the target is a vehicle that has been tracked in a previous image frame.
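Returning to the classifier choice: a small numerical sketch with hypothetical logits illustrates why softmax forces a single winner while independent logistic regressions permit multi-label output:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

# Hypothetical class logits for one bounding box, e.g. ("car", "truck", "bus").
logits = np.array([2.1, 1.9, -3.0])

# Softmax yields one competing distribution: the scores sum to 1, so two
# overlapping labels cannot both receive high probability.
print(softmax(logits))            # roughly [0.55, 0.45, 0.003]

# Independent logistic regression per class: each score is judged on its own,
# so a box can carry several labels at once (multi-label classification).
print(sigmoid(logits) > 0.5)      # [ True  True False]
```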
102. According to the target detection frame, determine the position information of the specified tracking target in the current image frame, wherein the specified tracking target is a vehicle designated for tracking at the time corresponding to the current image frame.
In step 101, the coordinates of the center and edge points of each target detection frame, that is, the position information of the target detection frame in the current image frame, can be output through the vehicle detection model. Some of the target detection frames correspond to tracking targets that have been tracked in previous image frames, while others may correspond to newly appearing vehicles that did not appear in any image frame before the current one. Whether traffic congestion occurs in the traffic video needs to be judged subsequently, and if the judgment is based on high-precision tracking targets, the monitoring result is more accurate. Therefore, the position information of the specified tracking target in the current image frame can be determined according to the target detection frame. The specified tracking target may be a vehicle designated for tracking at the time corresponding to the current image frame, that is, a high-precision tracking target screened from the tracking targets. Of course, the tracking targets may also be left unscreened, that is, every tracking target tracked in previous image frames is designated, which is not particularly limited in the embodiment of the present invention.
103. Determine the average speed of all the specified tracking targets at the time corresponding to the current image frame according to the position information of the specified tracking targets in the current image frame.
The position information of the specified tracking target on any image frame prior to the current image frame can be determined according to the above-mentioned process. Therefore, after the position information of the specified tracking target in the current image frame is obtained, the average speed of the specified tracking target at the corresponding moment of the current image frame can be determined by combining the position information of the specified tracking target on the previous image frame.
104. Judge whether traffic congestion occurs in the traffic video corresponding to the current image frame according to the average speed.
The average speed reflects the overall running speed of the vehicles, so whether traffic congestion occurs in the traffic video corresponding to the current image frame can be judged according to the average speed. Specifically, the average speed may be compared directly with a preset threshold: if the average speed is less than the preset threshold, it is determined that traffic congestion occurs; if it is not less than the preset threshold, it is determined that traffic congestion does not occur. It should be noted that the traffic video is composed of the image frames between a starting image frame and an ending image frame (including the starting and ending image frames themselves), while the average speed is calculated using only the data of those two frames, the starting image frame and the ending image frame. As can be seen from the above description, the ending image frame of the traffic video is the current image frame, which comes later, and the starting image frame of the traffic video is the first frame used for calculating the average speed, which comes earlier.
According to the method provided by the embodiment of the invention, the current image frame is input into the vehicle detection model, and the target detection frame in the current image frame is output. And determining the position information of the specified tracking target in the current image frame according to the target detection frame, and determining the average speed of all the specified tracking targets at the corresponding moment of the current image frame according to the position information of the specified tracking target in the current image frame. And judging whether traffic jam occurs in the traffic video corresponding to the current image frame according to the average speed. The target detection frame in the current image frame can be detected and identified through the vehicle detection model, and the vehicle detection model can be obtained based on sample images of various types through training, so that the interference of external environment factors can be overcome, and the accuracy of the monitoring result is improved.
According to the content of the above embodiment, whether traffic congestion occurs in the traffic video needs to be judged subsequently, and if the judgment is based on high-precision tracking targets, the monitoring result is more accurate. Based on this principle and the contents of the above embodiments, as an alternative embodiment, before the position information of the specified tracking target in the current image frame is determined according to the target detection frame, the specified tracking target may first be determined; the method for determining the specified tracking target is not specifically limited by the embodiments of the present invention. Referring to fig. 3, the method includes, but is not limited to:
301. For any tracking target, determine the starting image frame in which the tracking target was first detected, and determine the tracking length of the tracking target according to the frame number difference between the current image frame and the starting image frame; the tracking target is a vehicle tracked at the time corresponding to the current image frame.
For ease of understanding, suppose that for some tracking target the starting image frame in which it was first detected is the 1st frame and the current image frame is the 10th frame. Since tracking continues for 10 frames from the 1st frame to the 10th frame, the tracking length can be determined to be 10 according to the frame number difference between the current image frame and the starting image frame.
302. Count the total number of frames, from the starting image frame to the current image frame, in which the tracking target is detected, and take this total as the detection weight value of the tracking target.
Continuing the above example, if the vehicle corresponding to the tracking target is vehicle i, and vehicle i is detected in the 2nd, 3rd, 6th, and 7th frames in addition to the 1st and 10th frames, then the total number of frames in which vehicle i is detected from the starting image frame to the current image frame is 6. Accordingly, the detection weight value of vehicle i as the tracking target is 6.
303. If the tracking length of the tracking target is greater than a first preset threshold, the detection weight value of the tracking target is greater than a second preset threshold, and the tracking target is detected in the current image frame, take the tracking target as a specified tracking target.
The first preset threshold and the second preset threshold may be set as required, which is not specifically limited in the embodiment of the present invention. It should be noted that every tracking target can be examined according to the above process, so that after this step the specified tracking targets are obtained from the plurality of tracking targets. It should also be noted that, since the specified tracking targets are screened from the tracking targets through the above process, the number of specified tracking targets may be 0.
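The screening of steps 301 through 303 can be sketched as follows; the tracks bookkeeping structure, the function name, and the threshold values are illustrative assumptions, not part of the embodiment:

```python
def select_specified_targets(tracks, current_frame, first_thresh, second_thresh):
    """Screen high-precision specified tracking targets per steps 301-303.

    tracks maps a target id to the sorted list of frame numbers in which that
    target was detected (a hypothetical bookkeeping structure)."""
    specified = []
    for tid, detected in tracks.items():
        start_frame = detected[0]                       # first detection (301)
        track_length = current_frame - start_frame + 1  # tracking length (301)
        detection_weight = len(detected)                # detection weight (302)
        if (track_length > first_thresh                 # step 303
                and detection_weight > second_thresh
                and current_frame in detected):
            specified.append(tid)
    return specified

# Vehicle i from the example: detected in frames 1, 2, 3, 6, 7, 10.
print(select_specified_targets({"i": [1, 2, 3, 6, 7, 10]}, 10, 5, 4))   # ['i']
```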
According to the method provided by the embodiment of the invention, for any tracking target, the starting image frame in which the tracking target was first detected is determined, and the tracking length of the tracking target is determined according to the frame number difference between the current image frame and the starting image frame. The total number of frames in which the tracking target is detected from the starting image frame to the current image frame is counted and taken as the detection weight value of the tracking target. If the tracking length of the tracking target is greater than the first preset threshold, the detection weight value of the tracking target is greater than the second preset threshold, and the tracking target is detected in the current image frame, the tracking target is taken as a specified tracking target. Because high-precision specified tracking targets can thus be screened from the tracking targets, the accuracy of the monitoring result can be improved.
In the process of traffic monitoring, vehicles appearing in the first image frame are continuously tracked as tracking targets, and a vehicle that appears as a tracking target in a previous image frame is likely to appear again in the next image frame. Therefore, in actual implementation, the position information of the specified tracking target in the current image frame can be determined by combining its position information in the previous image frame with the position information of the vehicles detected and identified in the current image frame. In combination with the above principle and embodiments, as an alternative embodiment, the embodiment of the present invention does not specifically limit the method for determining the position information of the specified tracking target in the current image frame according to the target detection frame. Referring to fig. 4, the method includes, but is not limited to:
1021. Determine a target prediction frame of the specified tracking target in the current image frame according to the position information of the specified tracking target in the image frame preceding the current image frame.
Taking X_{k-1} as the position information of the specified tracking target in the image frame preceding the current image frame for example, the target prediction frame of the specified tracking target in the current image frame can be determined according to the following formula (2):

X'_k = A_k X_{k-1}    (2)

where X'_k represents the target prediction frame of the specified tracking target in the current image frame, A_k represents the state transition matrix corresponding to the specified tracking target, and X_{k-1} represents the position information of the specified tracking target in the preceding image frame. It should be noted that there may be more than one specified tracking target, and each specified tracking target may determine its target prediction frame in the current image frame according to the above process.
1022. Determine the position information of the specified tracking target in the current image frame according to the target prediction frame and the target detection frame.
After the target prediction frame and the target detection frame are obtained, the position information of the specified tracking target in the current image frame may be determined based on the intersection between them, which is not specifically limited in the embodiment of the present invention.
The method provided by the embodiment of the invention determines the target prediction frame of the specified tracking target in the current image frame according to the position information of the specified tracking target in the image frame preceding the current image frame, and determines the position information of the specified tracking target in the current image frame according to the target prediction frame and the target detection frame. Because the position information of the specified tracking target in the current image frame is determined by combining its position information in the previous image frame with the position information of the vehicles detected and identified in the current image frame, the accuracy of the subsequent monitoring result can be improved.
Based on the contents of the above embodiments, as an alternative embodiment, the embodiment of the present invention does not specifically limit the method for determining the position information of the specified tracking target in the current image frame according to the target prediction frame and the target detection frame. Referring to fig. 5, the method includes, but is not limited to:
10221. For any target prediction frame, calculate the intersection ratio between that target prediction frame and each target detection frame.
In this step, there is usually more than one target prediction frame, and the intersection ratio between each target prediction frame and each target detection frame can be calculated. For any target prediction frame, after the intersection ratios between it and each target detection frame are calculated, the maximum intersection ratio can be determined from all the intersection ratios.
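As a minimal sketch of the computation in this step (the box layout and function names are assumptions), the intersection ratio and its maximum over the detection frames can be written as:

```python
def intersection_ratio(box_a, box_b):
    """Intersection ratio (IoU) of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)            # intersection over union

def max_intersection_ratio(pred_box, det_boxes):
    """Step 10221: maximum intersection ratio between one target prediction
    frame and all target detection frames, plus the matching detection."""
    ratios = [intersection_ratio(pred_box, d) for d in det_boxes]
    best = max(range(len(det_boxes)), key=lambda i: ratios[i])
    return ratios[best], det_boxes[best]

# Example: one prediction frame against two detection frames.
ratio, match = max_intersection_ratio((0, 0, 10, 10),
                                      [(5, 5, 15, 15), (20, 20, 30, 30)])
print(ratio, match)   # 0.1428... (5, 5, 15, 15)
```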
It should be noted that, as can be seen from the above steps, for any tracking target, the total number of frames in which it is detected from its starting image frame to the current image frame needs to be counted. That is, it is necessary to judge whether the tracking target is detected in an image frame. Taking the judgment of whether the tracking target is detected in the current image frame as an example, the embodiment of the present invention does not specifically limit the manner of this judgment, which includes but is not limited to: determining a target prediction frame of the tracking target in the current image frame according to the position information of the tracking target in the preceding image frame; inputting the current image frame into the vehicle detection model and outputting the target detection frames in the current image frame; and calculating the intersection ratio between the target prediction frame and each target detection frame. If the maximum of all the intersection ratios is not smaller than a third preset threshold, it may be determined that the tracking target is detected in the current image frame. Of course, in practical implementation, whether the tracking target is detected in the image frame may also be judged in other ways, such as by image recognition, which is not specifically limited by the embodiment of the present invention.
It should be noted that, for the first image frame, a plurality of target detection frames may be output by inputting the first image frame into the vehicle detection model, and a plurality of tracking targets may thus be determined. Meanwhile, these tracking targets may all be considered to be detected in the first image frame.
10222. If the maximum of all the intersection ratios is smaller than a third preset threshold, take the target prediction frame as the position information, in the current image frame, of the specified tracking target corresponding to that target prediction frame. If the maximum intersection ratio is not smaller than the third preset threshold, determine the position information, in the current image frame, of the specified tracking target corresponding to the target prediction frame according to that target prediction frame and the target detection frame corresponding to the maximum intersection ratio.
Taking X_k as the position information in the current image frame for example, if the maximum intersection ratio is less than the third preset threshold, then X'_k obtained from formula (2) above can be directly assigned to X_k. That is, for any target prediction frame, the target prediction frame may be used as the position information, in the current image frame, of the specified tracking target corresponding to that target prediction frame.
If the maximum intersection ratio is not smaller than the third preset threshold, the position information, in the current image frame, of the specified tracking target corresponding to the target prediction frame may be determined with reference to the following formula (3):
X_k = X'_k + Q_k (Z_k − R_k X'_k)    (3)

where X'_k represents the target prediction frame, Q_k represents the Kalman filter gain coefficient of the current image frame, Z_k represents the target detection frame corresponding to the maximum intersection ratio, and R_k is the observation matrix.
Q_k can be calculated by the following formula (4):

Q_k = P'_k R_k^T (R_k P'_k R_k^T + S_k)^{-1}    (4)

where P'_k denotes the covariance prediction matrix of the specified tracking target corresponding to the target prediction frame in the predicted state, R_k denotes the observation matrix, R_k^T denotes the transpose of R_k, and S_k is the observation noise covariance matrix, which follows a standard normal distribution.
P'_k can be calculated by the following formula (5):

P'_k = A_k P_{k-1} A_k^T + B_{k-1}    (5)

where P'_k denotes the covariance prediction matrix of the specified tracking target corresponding to the target prediction frame in the predicted state, A_k denotes the state transition matrix of the specified tracking target corresponding to the target prediction frame, A_k^T denotes the transpose of A_k, P_{k-1} denotes the covariance matrix at the previous image frame, and B_{k-1} is the dynamic noise covariance matrix, which follows a standard normal distribution.
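The equations above amount to a standard Kalman predict/update cycle. A minimal numpy sketch follows, assuming a generic state vector; the final covariance correction line is the conventional Kalman step the text does not spell out:

```python
import numpy as np

def kalman_predict(x_prev, P_prev, A, B):
    """Prediction step, matching formulas (2) and (5)."""
    x_pred = A @ x_prev            # X'_k = A_k X_{k-1}                  (2)
    P_pred = A @ P_prev @ A.T + B  # P'_k = A_k P_{k-1} A_k^T + B_{k-1}  (5)
    return x_pred, P_pred

def kalman_update(x_pred, P_pred, z, R, S):
    """Correction step, matching formulas (3) and (4); z is the target
    detection frame matched by the maximum intersection ratio."""
    Q = P_pred @ R.T @ np.linalg.inv(R @ P_pred @ R.T + S)   # gain      (4)
    x = x_pred + Q @ (z - R @ x_pred)                        # state     (3)
    P = (np.eye(len(x_pred)) - Q @ R) @ P_pred               # conventional
    return x, P                                              # covariance step
```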
According to the method provided by the embodiment of the invention, for any target prediction frame, the intersection ratio between the target prediction frame and each target detection frame is calculated. If the maximum intersection ratio of all the intersection ratios is smaller than a third preset threshold, the target prediction frame is used as the position information of the specified tracking target corresponding to the target prediction frame in the current image frame, and if the maximum intersection ratio is not smaller than the third preset threshold, the position information of the specified tracking target corresponding to the target prediction frame in the current image frame is determined according to the target prediction frame and the target detection frame corresponding to the maximum intersection ratio. The position information of the appointed tracking target corresponding to the target prediction frame in the current image frame can be calculated based on different situations of the intersection ratio, so that the accuracy of the subsequent monitoring result can be improved.
Based on the content of the foregoing embodiments, as an alternative embodiment, the embodiment of the present invention does not specifically limit the method for determining the average speed of all the specified tracking targets at the corresponding time in the current image frame according to the position information of the specified tracking targets in the current image frame. Referring to fig. 6, the method includes, but is not limited to:
1031. Determine the specified starting image frame corresponding to each specified tracking target according to the detection weight value corresponding to each specified tracking target and a fourth preset threshold.
Continuing the example of the above embodiment, for some specified tracking target, the starting image frame in which it was first detected is the 1st frame and the current image frame is the 10th frame; since tracking continues for 10 frames from the 1st frame to the 10th frame, the tracking length is determined to be 10 from the frame number difference between the current image frame and the starting image frame. If the vehicle corresponding to the specified tracking target is vehicle i, and vehicle i is detected in the 2nd, 3rd, 6th, and 7th frames in addition to the 1st and 10th frames, then the total number of frames in which vehicle i is detected from the starting image frame to the current image frame is 6. Accordingly, the detection weight value of vehicle i as the specified tracking target is 6.
After the detection weight value is obtained, the difference between the detection weight value and the fourth preset threshold is taken as a new detection weight value, and a new tracking length is determined according to the new detection weight value. Finally, the specified starting image frame corresponding to the specified tracking target can be determined according to the new tracking length. Taking the fourth preset threshold as 2 for example, in combination with the above example, the new detection weight value is (6 − 2) = 4. Vehicle i, as the specified tracking target, reaches a detection count of 4 only at the 6th frame, so the new tracking length is 6 and the 6th frame can be taken as the specified starting image frame corresponding to the specified tracking target.
It should be noted that, through the above process, the specified starting image frame corresponding to each specified tracking target can be determined, and these specified starting image frames need not be the same. It should also be noted that the position information of the specified tracking target in the image frames before the current image frame is calculated by the same process as its position information in the current image frame, i.e., according to the method provided in the above embodiment, and details are not repeated here.
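Read this way, step 1031 can be sketched in a couple of lines; the list-of-detection-frames representation is an assumption carried over from the earlier sketch:

```python
def specified_start_frame(detected_frames, fourth_thresh):
    """Step 1031, following the example above: subtract the fourth preset
    threshold from the detection weight, then find the frame at which the
    cumulative detection count reaches the new detection weight value."""
    new_weight = len(detected_frames) - fourth_thresh
    return detected_frames[new_weight - 1]   # frame where the count is reached

# Vehicle i detected in frames 1, 2, 3, 6, 7, 10 with a fourth threshold of 2:
print(specified_start_frame([1, 2, 3, 6, 7, 10], 2))   # 6
```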
1032. Determine the average speed of all the specified tracking targets at the time corresponding to the current image frame according to the position information of each specified tracking target in the current image frame and its position information in the specified starting image frame corresponding to that target.
The embodiment of the present invention does not specifically limit the manner of determining the average speed of all the specified tracking targets at the time corresponding to the current image frame from the position information of each specified tracking target in the current image frame and in its corresponding specified starting image frame; the manner includes, but is not limited to, the following process:
(1) Transform, through a perspective transformation matrix, the position information of each specified tracking target in the current image frame and its position information in the corresponding specified starting image frame, so as to obtain the real position information of each specified tracking target in the current image frame and in its corresponding specified starting image frame.
Since the camera captures a two-dimensional image and may be mounted at any angle, the distance between any two points in a video image frame is not the real distance between those two points. Thus, the perspective transformation matrix may be acquired before performing step (1). Specifically, after the camera is fixed, the perspective transformation matrix may be calculated from the first image frame of the video and saved, so that the saved matrix can be reused for subsequent image frames of the video. It should be noted that, in practical implementation, the perspective transformation matrix need not be calculated from the first image frame; it may, for example, be calculated from the second or a later image frame.
The perspective transformation matrix M can be written as the following formula (6):

M = [ m_11  m_12  m_13 ; m_21  m_22  m_23 ; m_31  m_32  m_33 ]    (6)

Taking (c_x, c_y) as the position information of any specified tracking target in the current image frame for example, the real position information (c_t_x, c_t_y) of the specified tracking target can be calculated by the following formula (7):

c_t_x = (m_11 c_x + m_12 c_y + m_13) / (m_31 c_x + m_32 c_y + m_33)
c_t_y = (m_21 c_x + m_22 c_y + m_23) / (m_31 c_x + m_32 c_y + m_33)    (7)
it should be noted that, since the specified tracking target is marked in the current image frame in a frame form, the position information of the specified tracking target in the current image frame may be a center coordinate of the frame, which is not specifically limited in this embodiment of the present invention. Similarly, the real position information of the specified tracking target in the specified initial image frame corresponding to the specified tracking target can be calculated according to the formula.
(2) Calculate the average displacement of all the specified tracking targets at the time corresponding to the current image frame according to the real position information of each specified tracking target in the current image frame and in its corresponding specified starting image frame.
Taking the jth specified tracking target as an example, based on the real position information obtained by the above calculation, the distance between its real position in the current image frame and its real position in its specified starting image frame can be calculated and denoted d_j, and the frame number difference between the current image frame and the specified starting image frame corresponding to the jth specified tracking target can be denoted len1_j − len2_j. Accordingly, the average displacement of all the specified tracking targets at the time corresponding to the current image frame can be calculated by the following formula (8):

d_mean = (1 / ot_real) · Σ_{j=1}^{ot_real} d_j / (len1_j − len2_j)    (8)

where d_mean represents the average displacement of all the specified tracking targets at the time corresponding to the current image frame, and ot_real represents the total number of specified tracking targets.
(3) Calculate the average speed of all the specified tracking targets at the time corresponding to the current image frame according to the average displacement and the frame rate.
As can be seen from the above embodiments, the total number of specified tracking targets may be 0. When the total number of specified tracking targets is 0, a preset constant value may be taken as the average speed; this constant may be set to a relatively large value, so that an empty road is not judged as congested. When the total number of specified tracking targets is greater than 0, the average speed of all the specified tracking targets at the time corresponding to the current image frame is calculated by the following formula (9):

speed_mean = d_mean × fps    (9)

where speed_mean represents the average speed of all the specified tracking targets at the time corresponding to the current image frame, and fps represents the frame rate.
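Putting formulas (8) and (9) together, a sketch of the average speed computation might look as follows; the argument layout and the LARGE_SPEED constant are assumptions for illustration:

```python
LARGE_SPEED = 50.0   # hypothetical preset constant returned for an empty road

def mean_speed(real_now, real_start, frame_now, start_frames, fps):
    """Formulas (8) and (9): average speed of all specified tracking targets.

    real_now[j] / real_start[j]: real positions of target j in the current
    frame and in its specified starting image frame; start_frames[j]: the
    frame number of that specified starting image frame."""
    ot_real = len(real_now)
    if ot_real == 0:
        return LARGE_SPEED                    # no targets: report a large speed
    total = 0.0
    for j in range(ot_real):
        dx = real_now[j][0] - real_start[j][0]
        dy = real_now[j][1] - real_start[j][1]
        d_j = (dx * dx + dy * dy) ** 0.5              # displacement of target j
        total += d_j / (frame_now - start_frames[j])  # per-frame displacement
    d_mean = total / ot_real                  # formula (8)
    return d_mean * fps                       # formula (9): d_mean * fps
```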
According to the method provided by the embodiment of the invention, the specified starting image frame corresponding to each specified tracking target is determined according to the detection weight value corresponding to each specified tracking target and the fourth preset threshold. The average speed of all the specified tracking targets at the time corresponding to the current image frame is then determined according to the position information of each specified tracking target in the current image frame and in its corresponding specified starting image frame. Whether traffic congestion occurs can be determined from this average speed, so the accuracy of the monitoring result can be improved.
Based on the content of the foregoing embodiment, as an optional embodiment, the embodiment of the present invention does not specifically limit the method for determining whether traffic congestion occurs in the traffic video corresponding to the current image frame according to the average speed. Referring to fig. 7, including but not limited to:
1041. Determine the first total number of frames between the judgment starting image frame and the current image frame, and count the second total number of frames, between the judgment starting image frame and the current image frame, whose average speed is less than a fifth preset threshold.
By the method provided by the above embodiment, the average speed of all the specified tracking targets at the time corresponding to each image frame can be calculated; that is, each image frame corresponds to one average speed. Therefore, in this step, the second total number of frames whose average speed is less than the fifth preset threshold can be counted from the judgment starting image frame to the current image frame.
1042. If the ratio of the second total frame number to the first total frame number is greater than a sixth preset threshold, determine that traffic congestion occurs in the traffic video, wherein the traffic video is composed of the image frames from the judgment starting image frame to the current image frame.
The traffic video includes both the judgment starting image frame and the current image frame. The preset thresholds referred to in the above embodiments may be set according to actual requirements, which is not specifically limited in the embodiments of the present invention. For example, suppose the judgment starting image frame is the 1st frame and the current image frame is the 10th frame, so that the first total frame number is 10. If the average speed corresponding to 4 of those 10 frames is less than the fifth preset threshold, the second total frame number is 4, and the ratio of the two is 0.4. If 0.4 is greater than the sixth preset threshold, it may be determined that traffic congestion occurs in the traffic video.
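A compact sketch of this judgment (the speed values and thresholds are illustrative):

```python
def is_congested(avg_speeds, fifth_thresh, sixth_thresh):
    """Steps 1041-1042: avg_speeds holds one per-frame average speed for each
    image frame from the judgment starting image frame to the current frame."""
    first_total = len(avg_speeds)                        # first total frame number
    second_total = sum(1 for s in avg_speeds if s < fifth_thresh)
    return second_total / first_total > sixth_thresh     # congestion test

# The example above: 4 slow frames out of 10 gives a ratio of 0.4.
print(is_congested([3, 3, 9, 9, 2, 8, 8, 2, 9, 9], 5.0, 0.3))   # True
```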
In the method provided by the embodiment of the invention, the first total number of frames between the judgment starting image frame and the current image frame is determined, and the second total number of frames whose average speed is less than the fifth preset threshold is counted. If the ratio of the second total frame number to the first total frame number is greater than the sixth preset threshold, it is determined that traffic congestion occurs in the traffic video, wherein the traffic video is composed of the image frames from the judgment starting image frame to the current image frame. Whether traffic congestion occurs can thus be determined from the average speeds, so the accuracy of the monitoring result can be improved.
It should be noted that, all the above-mentioned alternative embodiments may be combined arbitrarily to form alternative embodiments of the present invention, and are not described in detail herein.
Based on the content of the foregoing embodiments, an embodiment of the present invention provides a traffic monitoring device, which is configured to execute the traffic monitoring method in the foregoing method embodiments. Referring to fig. 8, the apparatus includes: an output module 801, a first determining module 802, and a judging module 803; wherein,
the output module 801 is configured to input the current image frame into the deep neural network, and output a target detection frame in the current image frame, where the target detection frame is used to mark a vehicle in the current image frame;
the first determining module 802 is configured to determine, according to the target detection frame, position information of the designated tracking target in the current image frame, and determine, according to the position information of the designated tracking target in the current image frame, average speeds of all the designated tracking targets at a time corresponding to the current image frame, where the designated tracking target is a designated tracking vehicle at the time corresponding to the current image frame;
the determining module 803 is configured to determine whether a traffic jam occurs in the traffic video corresponding to the current image frame according to the average speed.
Based on the content of the foregoing embodiment, as an alternative embodiment, the apparatus further includes:
the second determining module is used for determining an initial image frame of any tracking target detected for the first time and determining the tracking length of any tracking target according to the frame number difference between the current image frame and the initial image frame; the tracking target is a vehicle tracked at the moment corresponding to the current image frame;
the counting module is used for counting the total number of detected frames of any tracking target from the initial image frame to the current image frame and taking the total number of detected frames as the detection weight value of any tracking target;
and the third determining module is used for taking any tracking target as the specified tracking target when the tracking length of any tracking target is greater than the first preset threshold, the detection weight value of any tracking target is greater than the second preset threshold and any tracking target is detected in the current image frame.
Based on the content of the foregoing embodiments, as an alternative embodiment, the first determining module 802 includes: a first determining unit and a second determining unit;
a first determination unit configured to determine a target prediction frame of the specified tracking target in the current image frame based on position information of the specified tracking target in an image frame preceding the current image frame;
and the second determining unit is used for determining the position information of the specified tracking target in the current image frame according to the target prediction frame and the target detection frame.
Based on the content of the foregoing embodiment, as an optional embodiment, the second determining unit is configured to calculate, for any target prediction box, an intersection ratio between the any target prediction box and each target detection box; if the maximum intersection ratio of all the intersection ratios is smaller than a third preset threshold, taking any one target prediction frame as the position information of the specified tracking target corresponding to any one target prediction frame in the current image frame, and if the maximum intersection ratio is not smaller than the third preset threshold, determining the position information of the specified tracking target corresponding to any one target prediction frame in the current image frame according to any one target prediction frame and the target detection frame corresponding to the maximum intersection ratio.
Based on the content of the foregoing embodiment, as an optional embodiment, the first determining module 802 further includes: a third determining unit and a fourth determining unit;
the third determining unit is used for determining the specified starting image frame corresponding to each specified tracking target according to the detection weight value corresponding to each specified tracking target and a fourth preset threshold;
and the fourth determining unit is used for determining the average speed of all the specified tracking targets at the time corresponding to the current image frame according to the position information of each specified tracking target in the current image frame and in the specified starting image frame corresponding to each specified tracking target.
Based on the content of the foregoing embodiment, as an optional embodiment, the fourth determining unit is configured to perform coordinate transformation on the position information of each specified tracking target in the current image frame and the position information of each specified tracking target in the specified starting image frame corresponding to each specified tracking target through a perspective transformation matrix, respectively, to obtain real position information of each specified tracking target in the current image frame and real position information of each specified tracking target in the corresponding specified starting image frame; calculating the average displacement of all the specified tracking targets on the corresponding moment of the current image frame according to the real position information of each specified tracking target in the current image frame and the real position information of each specified tracking target in the corresponding specified initial image frame; and calculating the average speed of all the specified tracking targets at the corresponding moment of the current image frame according to the average displacement and the frame frequency corresponding to all the specified tracking targets.
Based on the content of the foregoing embodiment, as an optional embodiment, the judging module 803 is configured to determine the first total number of frames between the judgment starting image frame and the current image frame, and count the second total number of frames, between the judgment starting image frame and the current image frame, whose average speed is less than a fifth preset threshold; and if the ratio of the second total frame number to the first total frame number is greater than a sixth preset threshold, determine that traffic congestion occurs in the traffic video, wherein the traffic video is composed of the image frames from the judgment starting image frame to the current image frame.
According to the device provided by the embodiment of the invention, the current image frame is input into the vehicle detection model, and the target detection frame in the current image frame is output. And determining the position information of the specified tracking target in the current image frame according to the target detection frame, and determining the average speed of all the specified tracking targets at the corresponding moment of the current image frame according to the position information of the specified tracking target in the current image frame. And judging whether traffic jam occurs in the traffic video corresponding to the current image frame according to the average speed. The target detection frame in the current image frame can be detected and identified through the vehicle detection model, and the vehicle detection model can be obtained based on sample images of various types through training, so that the interference of external environment factors can be overcome, and the accuracy of the monitoring result is improved.
Fig. 9 illustrates a physical structure diagram of an electronic device, and as shown in fig. 9, the electronic device may include: a processor (processor)910, a communication Interface (Communications Interface)920, a memory (memory)930, and a communication bus 940, wherein the processor 910, the communication Interface 920, and the memory 930 communicate with each other via the communication bus 940. Processor 910 may invoke logic instructions in memory 930 to perform the following method: inputting the current image frame into a vehicle detection model, and outputting a target detection frame in the current image frame, wherein the target detection frame is used for marking a vehicle in the current image frame; determining the position information of the specified tracking target in the current image frame according to the target detection frame, determining the average speed of all the specified tracking targets at the corresponding moment of the current image frame according to the position information of the specified tracking target in the current image frame, and specifying the tracking target as a vehicle specified to be tracked at the corresponding moment of the current image frame; and judging whether traffic jam occurs in the traffic video corresponding to the current image frame according to the average speed.
Furthermore, the logic instructions in the memory 930 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Embodiments of the present invention further provide a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program is implemented to perform the method provided in the foregoing embodiments when executed by a processor, and the method includes: inputting the current image frame into a vehicle detection model, and outputting a target detection frame in the current image frame, wherein the target detection frame is used for marking a vehicle in the current image frame; determining the position information of the specified tracking target in the current image frame according to the target detection frame, determining the average speed of all the specified tracking targets at the corresponding moment of the current image frame according to the position information of the specified tracking target in the current image frame, and specifying the tracking target as a vehicle specified to be tracked at the corresponding moment of the current image frame; and judging whether traffic jam occurs in the traffic video corresponding to the current image frame according to the average speed.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.