CN118537819B - Low-calculation-force frame difference method road vehicle visual identification method, medium and system - Google Patents

Low-calculation-force frame difference method road vehicle visual identification method, medium and system

Info

Publication number
CN118537819B
Authority
CN
China
Prior art keywords
motion
range
layer
vehicle
pyramid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411000782.9A
Other languages
Chinese (zh)
Other versions
CN118537819A (en)
Inventor
张天遨
马文韬
吴泽凯
徐浩洋
霍景辰
郭忠文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China
Priority to CN202411000782.9A
Publication of CN118537819A
Publication of CN118537819B
Application granted
Status: Active

Abstract

Translated from Chinese

The present invention provides a low-computing-power frame-difference road vehicle visual recognition method, medium, and system, belonging to the technical field of road vehicle recognition methods. The method includes: obtaining video of a target road captured by a camera and preprocessing it to obtain a preprocessed video; using the frame difference method to compute pixel differences between adjacent frames of the preprocessed video, and detecting motion regions according to a preset pixel-difference threshold and morphological processing; performing edge detection and contour extraction on the detected motion regions to obtain motion contours; tracking the detected motion contours to obtain motion trajectories and speeds; inputting the motion contours, trajectories, and speeds into a pre-trained distance-graded vehicle recognition model to obtain the vehicle class corresponding to each motion contour; and outputting the vehicle class and speed as the recognition result. The invention addresses the prior-art problem of high computing power demands caused by high computational complexity.

Description

Translated from Chinese

Low-computing-power frame-difference road vehicle visual recognition method, medium, and system

Technical Field

The present invention belongs to the technical field of road vehicle recognition methods, and in particular relates to a low-computing-power frame-difference road vehicle visual recognition method, medium, and system.

Background Art

Visual recognition technology is now widely used in transportation, especially for road vehicle recognition. With the construction and improvement of urban traffic monitoring systems, vehicle recognition based on video images has become increasingly important. It is used in traffic monitoring, electronic toll collection, vehicle counting, violation enforcement, and other fields, and is significant for ensuring urban traffic safety and improving traffic management efficiency. Traditional vehicle recognition methods are mainly based on image processing and pattern recognition; common approaches include edge detection, shape matching, and texture analysis. These methods usually require image preprocessing such as denoising and contrast enhancement, then extract features such as edges, corners, and shapes, and finally match them against pre-built template libraries to identify the vehicle type. With the development of deep learning, vehicle recognition based on deep neural networks has received widespread attention and application. Deep neural networks automatically learn feature representations from large amounts of data, avoiding the tedious process of hand-designing features. Common deep learning models include convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs). Given large training sets and powerful computing resources, these models can achieve high vehicle recognition accuracy.

However, deep neural network models usually contain a large number of parameters and require a large number of computing operations. Especially with real-time video streams, every frame must pass through a forward propagation of the network, and the amount of computation is enormous. This places high demands on hardware: it requires a high-performance GPU or a dedicated accelerator, and energy consumption is also relatively high. On some embedded systems and mobile devices, these computing-power and energy constraints make deep learning models difficult to deploy efficiently, limiting practical use.

Summary of the Invention

In view of this, the present invention provides a low-computing-power frame-difference road vehicle visual recognition method, medium, and system that solve the prior-art problem of high computing power demands caused by high computational complexity.

The present invention is implemented as follows:

A first aspect of the present invention provides a low-computing-power frame-difference road vehicle visual recognition method, comprising the following steps:

S10. Obtain video of the target road captured by a camera and preprocess it to obtain a preprocessed video;

S20. Use the frame difference method to compute pixel differences between adjacent frames of the preprocessed video, and detect motion regions according to a preset pixel-difference threshold and morphological processing;

S30. Perform edge detection and contour extraction on the detected motion regions to obtain motion contours;

S40. Track the detected motion contours to obtain motion trajectories and speeds;

S50. Input the motion contours, trajectories, and speeds into a pre-trained distance-graded vehicle recognition model to obtain the vehicle class corresponding to each motion contour;

S60. Output the vehicle class and motion speed as the recognition result.

The distance-graded vehicle recognition model adopts a pyramid neural network structure comprising five sequentially executed pyramid layers and a jump output module; each pyramid layer recognizes motion contours at a different distance in the image.

Further, the first pyramid layer recognizes motion contours in a first distance range;

the second pyramid layer recognizes combinations of motion contours in the first and second ranges;

the third pyramid layer recognizes combinations of motion contours in the first, second, and third ranges;

the fourth pyramid layer recognizes combinations of motion contours in the first, second, third, and fourth ranges;

the fifth pyramid layer recognizes combinations of motion contours in the first, second, third, fourth, and fifth ranges;

the distances of the first through fifth ranges decrease in sequence.

Optionally, the first range is 200-250 meters, the second range is 150-200 meters, the third range is 100-150 meters, the fourth range is 50-100 meters, and the fifth range is 10-50 meters.

Correspondingly, layer 1 recognizes moving targets in the 200-250 m range;

layer 2 recognizes moving targets in the 150-250 m range;

layer 3 recognizes moving targets in the 100-250 m range;

layer 4 recognizes moving targets in the 50-250 m range;

layer 5 recognizes moving targets in the 10-250 m range.

Further, each pyramid layer is structured as follows:

Input layer: receives the motion-contour image, motion trajectory, and speed information as input;

Convolutional layer group: multiple convolutional layers that extract image features;

Pooling layer group: max-pooling layers that progressively reduce the feature-map resolution;

Fully connected layer group: flattens the convolutional features and outputs classification probabilities through fully connected layers;

Output layer: outputs the recognition result based on the classification probabilities.

Further, the jump output module determines whether the output credibility of the current pyramid layer is greater than or equal to the credibility threshold. If it is, the current layer's output is taken as the output of the distance-graded vehicle recognition model; if it is less, processing continues with the next pyramid layer, and if no layer qualifies, the last pyramid layer's output is taken as the model's output.

The credibility threshold is preset to 0.8 to 0.9.
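The early-exit control flow of the jump output module can be sketched as follows (a minimal illustration, not the patented implementation; the representation of each pyramid layer as a callable returning class probabilities, and the 0.85 threshold within the stated 0.8-0.9 band, are assumptions for demonstration):

```python
def pyramid_predict(layers, features, conf_threshold=0.85):
    """Run the pyramid layers in order; as soon as a layer's top class
    probability reaches the credibility threshold, jump out and return
    that layer's result. Otherwise fall through to the last layer."""
    label, conf = None, 0.0
    for layer in layers:
        probs = layer(features)            # {class_name: probability}
        label = max(probs, key=probs.get)  # most likely vehicle class
        conf = probs[label]
        if conf >= conf_threshold:         # jump output: stop early
            break
    return label, conf
```

Distant small targets that the early layers classify confidently never reach the deeper, more expensive layers, which is the source of the computing-power savings described above.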

The recognized vehicle types include at least sedans, SUVs, trucks, buses, minivans, and heavy trucks.

The pixel difference threshold is preset to 10% to 30%.

The pixel difference threshold may be adjusted in a specific implementation.

Step S20 is implemented as follows:

Step 201: Use the frame difference method to compute the pixel difference between adjacent frames of the preprocessed video: D(x,y,t) = |I(x,y,t) - I(x,y,t-1)|, where D(x,y,t) is the pixel difference at coordinate (x,y) between frame t and frame t-1, and I(x,y,t) and I(x,y,t-1) are the pixel values of frames t and t-1, respectively.

Step 202: Binarize D(x,y,t) with the preset pixel-difference threshold T to obtain the binary image B(x,y,t): B(x,y,t) = 1 if D(x,y,t) >= T, and 0 if D(x,y,t) < T.

Step 203: Apply morphological processing such as opening and closing to the binary image B(x,y,t) to remove noise and fill holes, finally obtaining the motion-region mask of the video.
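Steps 201 and 202 amount to a per-pixel absolute difference followed by thresholding, sketched below in NumPy (a stand-in illustration; the opening/closing of step 203 would typically be done with a morphology routine such as OpenCV's `cv2.morphologyEx`, omitted here):

```python
import numpy as np

def motion_mask(prev_frame, curr_frame, T):
    """Steps 201-202: absolute per-pixel difference D(x,y,t) between two
    consecutive grayscale frames, binarized with threshold T.
    Returns a uint8 mask where 1 = moving pixel, 0 = static."""
    # widen to int16 so the subtraction of uint8 frames cannot wrap around
    D = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (D >= T).astype(np.uint8)
```

The cast to a signed type before subtracting matters: subtracting two `uint8` arrays directly would wrap around modulo 256 and corrupt the difference image.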

Step S30 is implemented as follows:

Step 301: Apply the Canny edge detection algorithm to the motion-region mask to obtain a binary edge image. The Canny algorithm effectively detects clear edge contours while suppressing noise.

Step 302: Extract contours from the edge image; the findContours() function of the OpenCV library yields the contour information of the motion regions, including each contour's set of coordinate points.

Step 303: Filter the extracted contours, discarding those that are too small or irregular and keeping larger, more regularly shaped motion contours as input features for subsequent recognition.
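The filtering of step 303 can be sketched with a simple size-and-regularity test over the contour point sets returned by step 302 (the `min_area` and `max_aspect` thresholds and the bounding-box criteria are illustrative assumptions, not values from the patent):

```python
def filter_contours(contours, min_area=100.0, max_aspect=4.0):
    """Step 303 sketch: keep contours whose bounding box is large enough
    and roughly regular (aspect ratio below a limit). Each contour is a
    list of (x, y) points, as produced by e.g. cv2.findContours."""
    kept = []
    for pts in contours:
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        w, h = max(xs) - min(xs), max(ys) - min(ys)
        if w == 0 or h == 0:
            continue                       # degenerate (line/point) contour
        area = w * h                       # bounding-box area as cheap proxy
        if area >= min_area and max(w / h, h / w) <= max_aspect:
            kept.append(pts)
    return kept
```

A production version might use the true contour area (`cv2.contourArea`) instead of the bounding-box proxy, but the control flow is the same.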

Step S40 is implemented as follows:

Step 401: Track the motion contours with a Kalman filter, using contour information from the preceding and subsequent frames to predict the contour position in the current frame, thereby tracking the moving target.

Step 402: From the position of the motion contour in each frame, compute the trajectory coordinate sequence of the moving target, which reflects its motion trajectory in the video.

Step 403: From the positional change of the motion contours between adjacent frames, combined with the frame rate, compute the speed of the moving target as one of the important features for vehicle-type recognition.
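The speed computation of step 403 reduces to displacement per frame times frame rate; a minimal sketch follows (the `metres_per_pixel` calibration factor is an assumption: the patent does not specify how image coordinates map to road distance):

```python
import math

def target_speeds(centroids, fps, metres_per_pixel):
    """Step 403 sketch: speed from the displacement of the tracked
    contour's centroid between adjacent frames, scaled by frame rate.
    centroids: per-frame (x, y) positions; returns metres per second."""
    speeds = []
    for (x0, y0), (x1, y1) in zip(centroids, centroids[1:]):
        dist = math.hypot(x1 - x0, y1 - y0) * metres_per_pixel
        speeds.append(dist * fps)          # per-frame distance * frames/sec
    return speeds
```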

Step S50 is implemented as follows:

Step 501: Construct a neural network model containing five sequentially executed pyramid convolutional layers for recognizing moving targets within different distance ranges.

Step 502: The first pyramid layer recognizes moving targets in the first range; the second layer recognizes combinations of moving targets in the first and second ranges; the third layer, combinations in the first through third ranges; the fourth layer, combinations in the first through fourth ranges; the fifth layer, combinations in the first through fifth ranges.

Step 503: A jump output module in the network determines whether the output credibility of the current pyramid layer reaches the threshold; if so, the current layer's result is output, otherwise processing continues with the next layer.

Step 504: Input features such as the motion contour, motion trajectory, and motion speed into the neural network model to obtain the vehicle-type classification result for the moving target.

Step S60 outputs the vehicle-type classification result and motion speed obtained in step S50 as the final recognition result.

Optionally, the noise reduction in the video preprocessing uses Gaussian filtering or median filtering; the color space conversion in the preprocessing is from RGB to grayscale, or from RGB to HSV; the binarization threshold T for motion-region detection ranges from 50 to 150; and the low and high thresholds of the Canny edge detector are 100 and 200, respectively.

A second aspect of the present invention provides a computer-readable storage medium storing program instructions which, when executed, perform the above low-computing-power frame-difference road vehicle visual recognition method.

A third aspect of the present invention provides a low-computing-power frame-difference road vehicle visual recognition system comprising the above computer-readable storage medium.

Compared with the prior art, the low-computing-power frame-difference road vehicle visual recognition method, medium, and system provided by the present invention have the following beneficial effects:

1. Reduced computing power requirements

Detecting motion regions with the frame difference method avoids processing the entire image and greatly reduces the amount of computation. In addition, the pyramid structure and jump output mechanism of the distance-graded vehicle recognition model mean that small, distant targets can be recognized after only the simple computations of the first few layers, without invoking the more complex later layers, further reducing the computing power required.

2. Improved real-time performance

Because the computing power requirement is lower, real-time vehicle recognition can be achieved on relatively low-power hardware platforms, meeting the needs of real-time applications. The pyramid model structure and jump output mechanism also speed up the recognition process.

3. Recognition accuracy maintained

Despite these computing-power optimizations, the method still maintains high vehicle recognition accuracy by jointly exploiting motion information, contour information, and the neural network model, meeting the needs of practical applications.

Overall, this low-computing-power frame-difference road vehicle visual recognition method greatly reduces computing power requirements and improves real-time performance while maintaining high recognition accuracy. The solution of the present invention therefore addresses the prior-art problem of high computing power demands caused by high computational complexity.

BRIEF DESCRIPTION OF THE DRAWINGS

To explain the technical solutions of the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flow chart of the method provided by the present invention;

FIG. 2 is a schematic diagram of the pixel calibration removal process in Embodiment 2;

FIG. 3 is a schematic diagram of vehicle recognition based on the frame difference method in Embodiment 2;

FIG. 4 is a schematic diagram of frame-difference results under image shake in Embodiment 2;

FIG. 5 shows performance data of the AKAZE algorithm in Embodiment 2;

FIG. 6 shows performance data of the ORB algorithm in Embodiment 2;

FIG. 7 shows performance data of the SIFT algorithm in Embodiment 2;

FIG. 8 is an example of the visual feature detection algorithm when the image shakes in Embodiment 2;

FIG. 9 is an example of the visual feature detection algorithm when the image is stable in Embodiment 2.

DETAILED DESCRIPTION

To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below in conjunction with the accompanying drawings.

FIG. 1 is a flow chart of the low-computing-power frame-difference road vehicle visual recognition method provided by the present invention. The method comprises the following steps:

S10. Obtain video of the target road captured by a camera and preprocess it to obtain a preprocessed video;

S20. Use the frame difference method to compute pixel differences between adjacent frames of the preprocessed video, and detect motion regions according to a preset pixel-difference threshold and morphological processing;

S30. Perform edge detection and contour extraction on the detected motion regions to obtain motion contours;

S40. Track the detected motion contours to obtain motion trajectories and speeds;

S50. Input the motion contours, trajectories, and speeds into a pre-trained distance-graded vehicle recognition model to obtain the vehicle class corresponding to each motion contour;

S60. Output the vehicle class and motion speed as the recognition result.

The specific implementation of the above steps is described in detail below:

Step S10 is implemented as follows: first, obtain the video data of the target road captured by the camera; then preprocess the video data through the following sub-steps:

1) Video denoising: remove noise from the video using methods such as Gaussian filtering or median filtering, to improve the accuracy of subsequent processing.

2) Video resizing: reduce the resolution of the original video to a suitable size, for example 320x240 pixels, to lower the complexity of subsequent computation.

3) Color space conversion: convert the original RGB color space to grayscale or HSV, since subsequent motion-region detection and feature extraction are mainly based on grayscale or color information.

These preprocessing steps yield the preprocessed video data, ready for subsequent motion-region detection and vehicle-type recognition.
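The grayscale conversion and resizing sub-steps can be sketched per frame as follows (a NumPy stand-in with standard luminance weights and a crude nearest-neighbour resize; a real pipeline would use library routines such as OpenCV's `cv2.cvtColor` and `cv2.resize`, and would also apply the denoising filter, omitted here):

```python
import numpy as np

def preprocess_frame(frame_rgb, target=(240, 320)):
    """S10 sketch: RGB (H, W, 3) -> grayscale, then nearest-neighbour
    resize to target (height, width). Denoising is left out."""
    # ITU-R BT.601 luminance weights for RGB -> gray
    gray = frame_rgb @ np.array([0.299, 0.587, 0.114])
    h, w = gray.shape
    th, tw = target
    rows = np.arange(th) * h // th         # nearest-neighbour row indices
    cols = np.arange(tw) * w // tw         # nearest-neighbour column indices
    return gray[rows][:, cols]
```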

Step S20 is implemented as follows: use the frame difference method to compute pixel differences between adjacent frames of the preprocessed video, then detect the motion regions in the video according to a preset pixel-difference threshold and morphological processing. Specifically:

1) The frame difference method computes the pixel difference between adjacent frames:

D(x,y,t) = |I(x,y,t) - I(x,y,t-1)|

where D(x,y,t) is the pixel difference at coordinate (x,y) between frame t and frame t-1, and I(x,y,t) and I(x,y,t-1) are the pixel values of frames t and t-1, respectively.

2) Binarize D(x,y,t) with the preset pixel-difference threshold T to obtain the binary image B(x,y,t):

B(x,y,t) = 1 if D(x,y,t) >= T, and 0 if D(x,y,t) < T.

3) Apply morphological processing such as opening and closing to B(x,y,t) to remove noise and fill holes, finally obtaining the motion-region mask of the video.

The frame difference method exploits the continuity of successive images in camera video: in a static scene, pixel differences between adjacent images are small, whereas a moving target causes the grayscale values of some pixels to change significantly, so thresholding the motion pixels detects the moving target. Its advantages are most pronounced when the road camera is fixed and the road scene is simple. Because a road camera typically captures a relatively uniform scene, the background changes little, and moving targets such as vehicles or pedestrians are comparatively easy to detect. The frame difference method detects these targets quickly and accurately, and with a fixed camera it needs no extra calibration or adjustment since the background changes little, which keeps the implementation simple. The present invention first preprocesses the raw data collected by the camera and, on that basis, obtains vehicle position data and vehicle image data via the frame difference method. For the camera shake that may occur in real scenes, a visual feature detection algorithm reduces or eliminates video shake to preserve the effectiveness of the frame difference method. Finally, building on the existing frame-difference-based vehicle recognition algorithm, moving vehicles are re-identified.

Step S30 is implemented as follows: perform edge detection and contour extraction on the detected motion regions to obtain motion contours, through the following sub-steps:

1) Apply the Canny edge detection algorithm to the motion-region mask to obtain a binary edge image. The advantage of the Canny algorithm is that it effectively detects clear edge contours while suppressing noise.

2) Extract contours from the edge image; the findContours() function of the OpenCV library yields the contour information of the motion regions, including each contour's set of coordinate points.

3) Filter the extracted contours, discarding those that are too small or irregular and keeping larger, more regularly shaped motion contours as input features for subsequent recognition.

Step S40 is implemented as follows: track the detected motion contours to obtain motion trajectories and speeds. Specifically:

1) Track the motion contours with a Kalman filter, using contour information from the preceding and subsequent frames to predict the contour position in the current frame, thereby tracking the moving target.

2) From the position of the motion contour in each frame, compute the trajectory coordinate sequence of the moving target, which reflects its motion trajectory in the video.

3) From the positional change of the motion contours between adjacent frames, combined with the frame rate, compute the speed of the moving target as one of the important features for vehicle-type recognition.
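The tracker in sub-step 1) can be sketched as a minimal constant-velocity Kalman filter over 2-D centroid measurements (a pure-NumPy stand-in, not the patented implementation; the process/measurement noise levels `q` and `r` and the constant-velocity motion model are illustrative assumptions):

```python
import numpy as np

def kalman_track(measurements, q=1e-2, r=1.0):
    """Constant-velocity Kalman filter over 2-D centroids.
    State is [x, y, vx, vy]; returns the filtered (x, y) per frame."""
    F = np.array([[1, 0, 1, 0],            # position += velocity per frame
                  [0, 1, 0, 1],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], float)
    H = np.array([[1, 0, 0, 0],            # we only observe position
                  [0, 1, 0, 0]], float)
    Q, R = q * np.eye(4), r * np.eye(2)    # process / measurement noise
    x, P = np.zeros(4), np.eye(4)
    x[:2] = measurements[0]                # initialise at first centroid
    track = [x[:2].copy()]
    for z in measurements[1:]:
        x, P = F @ x, F @ P @ F.T + Q                  # predict
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
        x = x + K @ (np.asarray(z, float) - H @ x)     # correct
        P = (np.eye(4) - K @ H) @ P
        track.append(x[:2].copy())
    return np.array(track)
```

The filtered positions form the trajectory coordinate sequence of sub-step 2), and their frame-to-frame displacements feed the speed computation of sub-step 3).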

Step S50 is implemented as follows: input features such as the motion contour, motion trajectory, and motion speed into the pre-trained distance-graded vehicle recognition model to obtain the vehicle class corresponding to the motion contour. The model adopts a pyramid neural network structure, specifically comprising:

1) Input layer: receives the motion-contour image data, the trajectory coordinate sequence, the motion speed, and other input features.

2) Pyramid convolutional layers: five sequentially executed pyramid convolutional layers, each responsible for recognizing moving targets within a different distance range.

First layer: recognizes moving targets in the first range;

Second layer: recognizes combinations of moving targets in the first and second ranges;

Third layer: recognizes combinations of moving targets in the first through third ranges;

Fourth layer: recognizes combinations of moving targets in the first through fourth ranges;

Fifth layer: recognizes combinations of moving targets in the first through fifth ranges.

3) Jump output module: determines whether the output credibility of the current pyramid layer reaches the threshold; if so, the current layer's result is output, otherwise processing continues with the next layer.

4) Output layer: outputs the vehicle-type classification result for the moving target.

可选的,所述第一范围为200~250米,所述第二范围为150~200米,所述第三范围为100~150米,所述第四范围为50~100米,所述第五范围为10~50米。Optionally, the first range is 200~250 meters, the second range is 150~200 meters, the third range is 100~150 meters, the fourth range is 50~100 meters, and the fifth range is 10~50 meters.

相应的,第1层:识别距离范围为200-250米的运动目标;Correspondingly, Layer 1: Identify moving targets within a range of 200-250 meters;

第2层:识别距离范围为150-250米的运动目标;Layer 2: Identify moving targets within a range of 150-250 meters;

第3层:识别距离范围为100-250米的运动目标;Layer 3: Identify moving targets at a distance range of 100-250 meters;

第4层:识别距离范围为50-250米的运动目标;Layer 4: Identify moving targets at a distance range of 50-250 meters;

第5层:识别距离范围为10-250米的运动目标。Layer 5: Identify moving targets at distances ranging from 10-250 meters.
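The layered structure with its jump output module can be sketched as below; the single dense layers with random weights are stand-ins for the trained pyramid convolutional layers, and the 0.9 confidence threshold is an assumed value:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class PyramidLayer:
    """Placeholder for one pyramid layer: a single dense layer with softmax."""
    def __init__(self, in_dim, n_classes):
        self.W = rng.normal(size=(n_classes, in_dim))
        self.b = np.zeros(n_classes)

    def forward(self, x):
        return softmax(self.W @ x + self.b)

def classify_with_early_exit(x, layers, conf_threshold=0.9):
    """Run pyramid layers in order; stop as soon as one is confident enough.

    Returns (index of the layer that produced the output, predicted class).
    The last layer always outputs, mirroring the jump output module.
    """
    for i, layer in enumerate(layers):
        probs = layer.forward(x)
        if probs.max() >= conf_threshold or i == len(layers) - 1:
            return i, int(probs.argmax())
```

Setting the threshold low makes the first layer answer immediately, which is exactly the compute saving the jump output module provides for nearby, easy targets.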

步骤S60的具体实施方式是:将步骤S50中得到的车型分类结果和运动速度信息作为最终的识别结果输出。The specific implementation method of step S60 is: outputting the vehicle type classification result and movement speed information obtained in step S50 as the final recognition result.

总之,这种低算力的帧差法道路车辆视觉识别方法,通过逐步优化的金字塔式神经网络结构,能够有效地利用运动轮廓、运动轨迹和运动速度等特征,在不同距离范围内准确识别出道路上行驶的车辆类型,并输出相应的结果,具有较低的计算复杂度。该方法可应用于智能交通监控、自动驾驶等场景,为相关应用提供有价值的车辆识别信息。In summary, this low-compute frame-difference road vehicle visual recognition method uses a progressively refined pyramid neural network structure to exploit features such as motion contour, motion trajectory, and motion speed, accurately identifying the types of vehicles traveling on the road within different distance ranges and outputting the corresponding results at low computational complexity. The method can be applied to scenarios such as intelligent traffic monitoring and autonomous driving, providing valuable vehicle recognition information for related applications.

本发明的第二方面提供一种计算机可读存储介质,其中,所述计算机可读存储介质中存储有程序指令,所述程序指令运行时,用于执行上述的一种低算力的帧差法道路车辆视觉识别方法。A second aspect of the present invention provides a computer-readable storage medium, wherein the computer-readable storage medium stores program instructions, and when the program instructions are executed, they are used to execute the above-mentioned low-computing-power frame difference method for road vehicle visual recognition.

本发明的第三方面提供一种低算力的帧差法道路车辆视觉识别系统,其中,包含上述的计算机可读存储介质。A third aspect of the present invention provides a low-computing-power frame difference method road vehicle visual recognition system, which includes the above-mentioned computer-readable storage medium.

为了更好的理解和实施本发明,下面提供本发明在计算机可读存储介质或计算机等电子装置、系统中作为计算机程序的一个具体的实施例1,实施例1中,首先,对于步骤S10的视频预处理过程,可以采用以下公式描述:In order to better understand and implement the present invention, a specific embodiment 1 of the present invention as a computer program in a computer-readable storage medium or an electronic device or system such as a computer is provided below. In the embodiment 1, first, the video preprocessing process of step S10 can be described by the following formula:

视频降噪处理采用高斯滤波器,公式如下:The video noise reduction process uses a Gaussian filter, and the formula is as follows:

$$I'_t(x,y)=\sum_{i=-r}^{r}\sum_{j=-r}^{r}G(i,j)\,I_t(x+i,y+j),\qquad G(i,j)=\frac{1}{2\pi\sigma^2}e^{-\frac{i^2+j^2}{2\sigma^2}}$$

where $I'_t(x,y)$ is the filtered pixel value at coordinates $(x,y)$ of frame $t$, $I_t(x,y)$ is the original pixel value, $\sigma$ is the standard deviation of the Gaussian kernel, and $r$ is the kernel radius. This Gaussian filtering effectively removes noise from the video and improves the accuracy of subsequent processing.
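A minimal NumPy sketch of this Gaussian smoothing; the kernel is normalized to sum to 1, and edge padding is an implementation choice, not specified by the patent:

```python
import numpy as np

def gaussian_kernel(sigma=1.0, radius=3):
    """Build the (2r+1) x (2r+1) Gaussian kernel G(i, j)."""
    ax = np.arange(-radius, radius + 1)
    i, j = np.meshgrid(ax, ax, indexing="ij")
    g = np.exp(-(i**2 + j**2) / (2.0 * sigma**2))
    return g / g.sum()                       # normalize so weights sum to 1

def gaussian_filter(frame, sigma=1.0, radius=3):
    """Denoise one grayscale frame by convolving with the Gaussian kernel."""
    k = gaussian_kernel(sigma, radius)
    padded = np.pad(frame.astype(float), radius, mode="edge")
    h, w = frame.shape
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            out[y, x] = (k * padded[y:y + 2 * radius + 1,
                                    x:x + 2 * radius + 1]).sum()
    return out
```

Because the kernel is normalized, a uniform frame passes through unchanged, which is a quick sanity check on the implementation.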

视频分辨率缩小调整可以采用双线性插值算法,公式如下:The video resolution can be reduced and adjusted using a bilinear interpolation algorithm, the formula is as follows:

$$I'(x',y')=(1-u)(1-v)\,I(x_0,y_0)+u(1-v)\,I(x_1,y_0)+(1-u)v\,I(x_0,y_1)+uv\,I(x_1,y_1)$$

where $I'(x',y')$ is the pixel value at coordinates $(x',y')$ of the scaled image, $I(x_0,y_0)$, $I(x_1,y_0)$, $I(x_0,y_1)$, $I(x_1,y_1)$ are the four nearest pixels of the original image around the source position $(x,y)=(x'/s_x,\,y'/s_y)$, $u=x-x_0$ and $v=y-y_0$ are the fractional offsets, and $s_x$, $s_y$ are the scaling ratios in the $x$ and $y$ directions. This formula reduces the original video resolution to a suitable size and lowers the complexity of subsequent calculations.

视频颜色空间转换从RGB到灰度空间的公式如下:The formula for converting video color space from RGB to grayscale space is as follows:

$$Gray_t(x,y)=0.299\,R_t(x,y)+0.587\,G_t(x,y)+0.114\,B_t(x,y)$$

where $Gray_t(x,y)$ is the gray value at coordinates $(x,y)$ of frame $t$, and $R_t(x,y)$, $G_t(x,y)$, $B_t(x,y)$ are the red, green, and blue channel values at that position. This converts the original RGB color space into a grayscale space better suited to subsequent processing.
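The resolution reduction and grayscale conversion above can be sketched together in NumPy; the loop-based bilinear resize is written for clarity rather than speed:

```python
import numpy as np

def to_grayscale(rgb):
    """Gray = 0.299 R + 0.587 G + 0.114 B (ITU-R BT.601 weights)."""
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def bilinear_resize(img, new_h, new_w):
    """Resize a grayscale image by bilinear interpolation."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, new_h)
    xs = np.linspace(0, w - 1, new_w)
    out = np.zeros((new_h, new_w))
    for yi, y in enumerate(ys):
        y0 = int(np.floor(y)); y1 = min(y0 + 1, h - 1); v = y - y0
        for xi, x in enumerate(xs):
            x0 = int(np.floor(x)); x1 = min(x0 + 1, w - 1); u = x - x0
            # Weighted average of the four nearest source pixels.
            out[yi, xi] = ((1 - u) * (1 - v) * img[y0, x0]
                           + u * (1 - v) * img[y0, x1]
                           + (1 - u) * v * img[y1, x0]
                           + u * v * img[y1, x1])
    return out
```

A constant image must stay constant under bilinear resampling, and a pure-red pixel must map to gray value 0.299·255.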

接下来,对于步骤S20的运动区域检测过程,可以采用以下公式描述:Next, the motion region detection process in step S20 can be described by the following formula:

帧差法计算相邻帧像素差异的公式如下:The formula for calculating the pixel difference between adjacent frames using the frame difference method is as follows:

$$D_t(x,y)=\left|I_t(x,y)-I_{t-1}(x,y)\right|$$

where $D_t(x,y)$ is the pixel difference at coordinates $(x,y)$ between frame $t$ and frame $t-1$, and $I_t(x,y)$, $I_{t-1}(x,y)$ are the pixel values of frames $t$ and $t-1$. Computing the difference between adjacent frames reveals the moving regions in the video.

二值化处理采用以下公式:The binarization process uses the following formula:

$$B_t(x,y)=\begin{cases}255, & D_t(x,y)>T\\ 0, & \text{otherwise}\end{cases}$$

where $B_t(x,y)$ is the binarized pixel value at coordinates $(x,y)$ of frame $t$, and $T$ is the preset pixel-difference threshold. This binarization highlights the moving regions in the video.

形态学处理中的开运算公式如下:The opening operation formula in morphological processing is as follows:

$$O_t=(B_t\ominus S)\oplus S$$

where $O_t$ is the pixel value of frame $t$ after the opening operation, $B_t$ is the binary image, $S$ is the structuring element, and $\ominus$, $\oplus$ denote erosion and dilation respectively. Opening removes noise points, while the corresponding closing operation $(B_t\oplus S)\ominus S$ fills holes, yielding a smoother motion-region mask.
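The frame-difference, thresholding, and opening steps above can be sketched as follows in NumPy; the square structuring element and the threshold of 25 are illustrative choices:

```python
import numpy as np

def frame_diff_mask(prev, curr, threshold=25):
    """Binarize the absolute difference of two grayscale frames."""
    diff = np.abs(curr.astype(int) - prev.astype(int))
    return (diff > threshold).astype(np.uint8) * 255

def erode(mask, k=3):
    """Binary erosion with a k x k square structuring element."""
    r = k // 2
    padded = np.pad(mask, r, mode="constant", constant_values=0)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for y in range(h):
        for x in range(w):
            if padded[y:y + k, x:x + k].min() == 255:
                out[y, x] = 255
    return out

def dilate(mask, k=3):
    """Binary dilation with a k x k square structuring element."""
    r = k // 2
    padded = np.pad(mask, r, mode="constant", constant_values=0)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for y in range(h):
        for x in range(w):
            if padded[y:y + k, x:x + k].max() == 255:
                out[y, x] = 255
    return out

def opening(mask, k=3):
    """Opening = erosion then dilation; removes isolated noise pixels."""
    return dilate(erode(mask, k), k)
```

An isolated single-pixel blip is wiped out by the opening, while a solid block large enough to contain the structuring element survives intact.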

再次,对于步骤S30的运动轮廓提取过程,可以采用以下公式描述:Again, the motion contour extraction process of step S30 can be described by the following formula:

Canny边缘检测算法的核心公式如下:The core formula of the Canny edge detection algorithm is as follows:

$$G=\sqrt{G_x^2+G_y^2},\qquad \theta=\arctan\!\left(\frac{G_y}{G_x}\right)$$

where $G_x$ and $G_y$ are the image gradients in the $x$ and $y$ directions, $G$ is the gradient magnitude, $\theta$ is the gradient direction, and $T_h$ is the high threshold used in hysteresis thresholding. The Canny algorithm effectively detects clear edge contours while suppressing noise.

轮廓提取采用OpenCV库中的findContours()函数,该函数根据以下公式计算轮廓:Contour extraction uses the findContours() function in the OpenCV library, which calculates the contour according to the following formula:

$$C=\{(x_i,y_i)\},\quad i=1,\dots,n$$

where $C$ denotes a contour, $(x_i,y_i)$ are the coordinates of its $i$-th point, and $n$ is the total number of points on the contour. This function yields the contour information of the motion region as a set of coordinate points.
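As a sketch of the gradient computation at the core of Canny, the following uses central differences as a stand-in for the Sobel operator; the non-maximum suppression and hysteresis stages are omitted:

```python
import numpy as np

def image_gradients(img):
    """Central-difference approximation of the Gx, Gy gradients.

    Returns the gradient magnitude G = sqrt(Gx^2 + Gy^2) and direction
    theta = arctan2(Gy, Gx); border pixels are left at zero gradient.
    """
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2.0   # horizontal gradient
    gy[1:-1, :] = (img[2:, :] - img[:-2, :]) / 2.0   # vertical gradient
    magnitude = np.sqrt(gx**2 + gy**2)
    direction = np.arctan2(gy, gx)
    return magnitude, direction
```

On a vertical step edge the magnitude peaks along the edge column and the direction is horizontal (θ = 0), matching the formulas above.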

对于步骤S40的运动目标跟踪过程,可以采用以下公式描述:The moving target tracking process of step S40 can be described by the following formula:

卡尔曼滤波器的状态方程和观测方程如下:The state equation and observation equation of the Kalman filter are as follows:

$$x_k=A\,x_{k-1}+B\,u_k+w_k,\qquad z_k=H\,x_k+v_k$$

where $x_k$ is the state vector at time $k$, $A$ is the state-transition matrix, $B$ is the control-input matrix, $u_k$ is the control-input vector, $w_k$ is the state noise, $z_k$ is the observation vector at time $k$, $H$ is the observation matrix, and $v_k$ is the observation noise. Kalman filtering predicts the motion-contour position in the current frame, enabling tracking of the moving target.

根据运动轮廓的位置变化,可以计算得到运动速度:According to the position change of the motion profile, the motion speed can be calculated :

$$v=\frac{\sqrt{(x_i-x_{i-1})^2+(y_i-y_{i-1})^2}}{t_i-t_{i-1}}$$

where $(x_i,y_i)$ and $(x_{i-1},y_{i-1})$ are the position coordinates of the moving target in frames $i$ and $i-1$, and $t_i$, $t_{i-1}$ are the timestamps of those frames. This formula yields the speed information of the moving target.

最后,对于步骤S50的距离分级车辆识别模型,可以采用以下公式描述:Finally, the distance classification vehicle recognition model in step S50 can be described by the following formula:

该模型采用金字塔式神经网络结构,每个金字塔层的输出可以表示为:The model adopts a pyramid neural network structure, and the output of each pyramid layer can be expressed as:

$$y_l=f(W_l\,x_l+b_l)$$

where $y_l$ is the output vector of layer $l$, $x_l$ is the input vector of layer $l$, $W_l$ and $b_l$ are the weight matrix and bias vector of layer $l$, and $f$ is the activation function. Each pyramid layer is responsible for identifying moving targets within a different distance range.

跳转输出模块的判断公式如下:The judgment formula for the jump output module is as follows:

$$\text{output } y_l \ \text{ if } \max(y_l)\ge\delta \ \text{ or } \ l=L,\quad \text{otherwise proceed to layer } l+1$$

where $y_l$ is the output vector of layer $l$, $\delta$ is the credibility threshold, and $L$ is the total number of pyramid layers. When the output credibility of the current layer reaches the threshold, the result of that layer is output; otherwise processing continues to the next layer, up to the last.

通过以上公式,可以详细描述这种低算力的帧差法道路车辆视觉识别方法的具体实施过程。该方法充分利用运动轮廓、运动轨迹和运动速度等特征,采用金字塔式神经网络高效地实现了不同距离范围内的车辆识别,具有较低的计算复杂度,可广泛应用于智能交通监控、自动驾驶等场景。The formulas above describe in detail the specific implementation of this low-compute frame-difference road vehicle visual recognition method. The method makes full use of features such as motion contour, motion trajectory, and motion speed, and uses a pyramid neural network to efficiently recognize vehicles within different distance ranges at low computational complexity; it can be widely applied to scenarios such as intelligent traffic monitoring and autonomous driving.

具体的,本发明的原理是:利用视频图像中车辆的运动特征,如运动轮廓、运动轨迹和运动速度,通过金字塔式神经网络模型实现不同距离车辆的高效识别。具体的技术原理如下:Specifically, the principle of the present invention is to utilize the motion characteristics of vehicles in video images, such as motion profile, motion trajectory and motion speed, to achieve efficient recognition of vehicles at different distances through a pyramid neural network model. The specific technical principles are as follows:

1.采用帧差法检测运动区域1. Use frame difference method to detect motion area

首先对视频流进行预处理,然后利用帧差法计算相邻帧之间的像素差异。通过设置合适的阈值和形态学操作,可以有效地检测出图像中的运动区域,即车辆区域。这种基于运动检测的方法避免了对整个图像进行处理,大大降低了计算量。First, the video stream is preprocessed, and then the pixel differences between adjacent frames are calculated using the frame difference method. By setting appropriate thresholds and morphological operations, the moving area in the image, namely the vehicle area, can be effectively detected. This motion detection-based method avoids processing the entire image and greatly reduces the amount of calculation.

2.边缘检测和轮廓提取2. Edge detection and contour extraction

对检测到的运动区域进行边缘检测和轮廓提取,得到运动目标的轮廓信息。这一步骤进一步降低了数据的维度,为后续的识别过程提供了精简的特征输入。The detected moving area is subjected to edge detection and contour extraction to obtain the contour information of the moving target. This step further reduces the dimension of the data and provides a streamlined feature input for the subsequent recognition process.

3.运动轨迹和速度估计3. Motion trajectory and velocity estimation

通过对运动轮廓进行跟踪,可以估计出目标的运动轨迹和速度信息。这些信息对于车辆的识别和分类具有重要的辅助作用。By tracking the motion profile, the target's motion trajectory and speed information can be estimated, which plays an important auxiliary role in vehicle recognition and classification.

4.距离分级车辆识别模型4. Distance classification vehicle recognition model

本发明提出了一种新颖的距离分级车辆识别模型,该模型采用金字塔式神经网络结构。模型包含五个依次执行的金字塔层,每个金字塔层用于识别不同距离范围的运动轮廓组合。The present invention proposes a novel distance-classified vehicle recognition model, which adopts a pyramid neural network structure. The model includes five pyramid layers executed in sequence, and each pyramid layer is used to recognize a combination of motion profiles in different distance ranges.

具体来说,第一金字塔层识别最远距离范围的单个运动轮廓,第二层识别第一、二距离范围的两个轮廓组合,第三层识别第一、二、三距离范围的三个轮廓组合,以此类推。每个金字塔层的结构包括卷积层组、池化层组和全连接层组,用于提取特征和进行分类。Specifically, the first pyramid layer identifies a single motion contour in the farthest distance range, the second layer identifies two contour combinations in the first and second distance ranges, the third layer identifies three contour combinations in the first, second, and third distance ranges, and so on. The structure of each pyramid layer includes a convolution layer group, a pooling layer group, and a fully connected layer group for feature extraction and classification.

5.跳转输出模块5. Jump output module

该模型还引入了一个跳转输出模块,用于判断当前金字塔层的输出可信度。如果可信度高于阈值,则直接输出当前层的识别结果,无需继续进行后续层的计算,从而进一步节省了算力。The model also introduces a jump output module to determine the output credibility of the current pyramid layer. If the credibility is higher than the threshold, the recognition result of the current layer is directly output without continuing the calculation of the subsequent layers, thus further saving computing power.

为了更好的理解和实施本发明,下面提供本发明在具体场景应用的一个实施例2。In order to better understand and implement the present invention, an embodiment 2 of the present invention applied in a specific scenario is provided below.

1.系统部署1. System deployment

该城市的主干道路长度约10公里,交通量较大,采用了双向4车道的道路结构。为了全面覆盖该路段,交管部门在道路两侧共部署了20台高清摄像头,间距约500米,全部连接至中央控制室。这些摄像头采用的是300万像素的1/2.8英寸CMOS传感器,可以捕获1920x1080分辨率的视频图像,帧率为30帧/秒。The main road in the city is about 10 kilometers long and has a large traffic volume. It adopts a two-way four-lane road structure. In order to fully cover the road section, the traffic control department deployed a total of 20 high-definition cameras on both sides of the road, with a spacing of about 500 meters, all connected to the central control room. These cameras use 3 million pixel 1/2.8 inch CMOS sensors, which can capture video images with a resolution of 1920x1080 and a frame rate of 30 frames per second.

在中央控制室,将所有摄像头采集的视频汇总至一台服务器进行统一处理。该服务器采用了英特尔酷睿i7-10700处理器,配备32GB内存和512GB SSD固态硬盘。操作系统为Ubuntu 20.04 LTS,预先安装好了本发明方法所需的OpenCV、Tensorflow等软件库。In the central control room, the videos collected by all cameras are aggregated to a server for unified processing. The server uses an Intel Core i7-10700 processor, equipped with 32GB of memory and a 512GB SSD solid-state drive. The operating system is Ubuntu 20.04 LTS, and the software libraries such as OpenCV and Tensorflow required by the method of the present invention are pre-installed.

2.视频预处理2. Video Preprocessing

首先,服务器对从各个摄像头实时采集的视频数据进行预处理。具体包括:First, the server preprocesses the video data collected from each camera in real time. Specifically, it includes:

(1)视频降噪处理:采用高斯滤波算法,将视频图像中的噪声进行有效抑制。高斯核的标准差σ取值为1.0,半径为3个像素。(1) Video noise reduction: a Gaussian filtering algorithm effectively suppresses noise in the video image; the standard deviation σ of the Gaussian kernel is set to 1.0 and the radius to 3 pixels.

(2)视频分辨率缩小:将原始1920x1080的视频分辨率缩小至320x240,即缩放比例s_x = 320/1920 = 1/6,s_y = 240/1080 = 2/9。这样可以有效减少后续计算的复杂度。(2) Video resolution reduction: the original 1920x1080 video is downscaled to 320x240, i.e. scaling ratios s_x = 320/1920 = 1/6 and s_y = 240/1080 = 2/9. This effectively reduces the complexity of subsequent calculations.

(3)颜色空间转换:将视频从RGB颜色空间转换为灰度空间,以便后续的运动区域检测和特征提取。(3) Color space conversion: Convert the video from RGB color space to grayscale space for subsequent motion area detection and feature extraction.

通过上述预处理步骤,服务器每秒可以处理20个来自不同摄像头的320x240分辨率的灰度视频帧。Through the above preprocessing steps, the server can process 20 grayscale video frames with a resolution of 320x240 from different cameras per second.

3.运动区域检测3. Motion area detection

接下来,服务器对预处理后的视频帧采用帧差法检测运动区域。具体步骤如下:Next, the server uses the frame difference method to detect the motion area of the pre-processed video frames. The specific steps are as follows:

(1)计算相邻帧的像素差异:(1) Calculate the pixel difference between adjacent frames:

$$D_t(x,y)=\left|I_t(x,y)-I_{t-1}(x,y)\right|$$

where $D_t(x,y)$ is the pixel difference at coordinates $(x,y)$ between frame $t$ and the previous frame, and $I_t(x,y)$, $I_{t-1}(x,y)$ are the pixel values of frames $t$ and $t-1$.

(2) Binarize $D_t$ against the preset difference threshold $T$ to obtain the motion-region mask $M_t$: $M_t(x,y)=255$ if $D_t(x,y)>T$, otherwise $0$.

(3) Apply morphological opening and closing to the binary image $M_t$ to remove noise and fill holes, finally obtaining a smoother motion-region mask.

通过该运动区域检测过程,服务器能够准确地定位出视频中正在运动的车辆区域。Through the motion area detection process, the server can accurately locate the moving vehicle area in the video.

4.运动轮廓提取4. Motion contour extraction

基于检测到的运动区域,服务器接下来采用Canny边缘检测算法提取车辆的运动轮廓,具体步骤如下:Based on the detected motion area, the server then uses the Canny edge detection algorithm to extract the motion contour of the vehicle. The specific steps are as follows:

(1)计算Canny边缘检测的梯度幅值和方向:(1) Calculate the gradient amplitude of Canny edge detection and direction :

$$G=\sqrt{G_x^2+G_y^2},\qquad \theta=\arctan\!\left(\frac{G_y}{G_x}\right)$$

where $G_x$ and $G_y$ are the image gradients in the $x$ and $y$ directions.

(2) Based on the preset high threshold $T_h$ and low threshold $T_l$, apply non-maximum suppression and hysteresis edge linking to the gradient magnitude $G$ to obtain the Canny edge output $E$.

(3) Use the findContours() function of the OpenCV library to extract the vehicle motion contour $C=\{(x_i,y_i)\},\ i=1,\dots,n$ from the Canny edge output $E$, where $(x_i,y_i)$ are the coordinate points on the contour and $n$ is the total number of points.

对于传统车辆检测,使用帧差法对前后帧做差值形成灰度图像的过程中,对于道路中无用的区域出现重复检测区域,增加设备所用成本与运算时间。因此在预处理阶段首先采用图像标定法,对于无需重复计算的非道路区域进行标定并记录,读取标定区域创建了一个与读取的图像相同大小的掩码数组。将掩码中指定坐标位置的像素值设 0,将这些像素点从掩码中移除,从而实现移除背景的效果,实现对于帧差法优化的过程。其具体界面标定过程如图2所示。如图3所示,使用边缘检测在二值化图像中查找车辆轮廓。通过寻找像素点之间的连通性,从而找到闭合的轮廓并对每个轮廓进行处理。对于每个找到的轮廓计算其面积,设定道路车辆区域的预估所占像素区域的最大面积S2与最小面积S1,若检测到的区域轮廓面积C满足S1<C<S2,获取轮廓的最小外接矩形,并获取矩形的四个顶点坐标。由于摄像头视线与地平面并非垂直,而是存在夹角,因此会造成画面中不同位置帧差出车辆轮廓的框选所占画面整体比例有所不同。对于此问题,本实施例通过对画面进行划分并依据划分区域通过参数调整视频中矩形轮廓内边缘检测点的数量,从而对画面中不同位置的框选区域像素的可能的轮廓大小进行限定,以提高框选的准确性。For traditional vehicle detection, when using the frame difference method to make a difference between the previous and next frames to form a grayscale image, repeated detection areas appear for useless areas on the road, increasing the cost and operation time of the equipment. Therefore, in the preprocessing stage, the image calibration method is first used to calibrate and record the non-road areas that do not need to be repeatedly calculated. The calibration area is read to create a mask array of the same size as the read image. The pixel values at the specified coordinate positions in the mask are set to 0, and these pixels are removed from the mask to achieve the effect of removing the background and realize the process of optimizing the frame difference method. The specific interface calibration process is shown in Figure 2. As shown in Figure 3, edge detection is used to find the vehicle contour in the binary image. By finding the connectivity between the pixels, the closed contour is found and each contour is processed. For each contour found, its area is calculated, and the maximum area S2 and the minimum area S1 of the estimated pixel area occupied by the road vehicle area are set. If the contour area C of the detected area satisfies S1<C<S2, the minimum circumscribed rectangle of the contour is obtained, and the coordinates of the four vertices of the rectangle are obtained. 
Since the camera line of sight is not perpendicular to the ground plane but at an angle, the proportion of the frame selected by the vehicle outline in different frames at different positions in the picture will be different. To solve this problem, this embodiment divides the picture and adjusts the number of edge detection points in the rectangular outline of the video through parameters according to the divided areas, thereby limiting the possible outline size of the pixels of the framed area at different positions in the picture to improve the accuracy of the frame selection.

针对道路交通环境的实际情况进行分析,由于道路摄像头的画幅中,绝大多数的物体是静止不动的,仅有车辆处于高速移动状态,且是我们进行数据采集的主要对象;此外,拍摄设备的摇晃导致的画面变化往往是连续的,并且在灯杆的支撑下只会导致画面小幅度的变化。在这种情况下,基于周边静止物体的边缘点,对画面进行矫正是一种可能的方案,通过对路边及道路上具有明显静止特征的物体边缘进行识别,并进行帧间比对,之后将其作为画面矫正的关键点,在不占用较多的CPU算力和内存的情况下,能够有效地降低摄像头画面的晃动情况。本实施例针对图像摇晃问题提出使用视觉特征检测算法提高帧差法在视频摇晃的道路场景下进行图像采集稳定性的视频稳定方案,旨在减少由摄像头晃动引起的视频抖动。本实施例通过实验对比验证,旨在寻找更适合道路灯杆场景的稳定画面的算法,其中SIFT算法,ORB算法,AKAZE对于实际场景各有优势。According to the actual situation of the road traffic environment, since most of the objects in the road camera's frame are stationary, only vehicles are in high-speed movement and are the main objects of our data collection; in addition, the picture changes caused by the shaking of the shooting equipment are often continuous, and only cause small changes in the picture under the support of the lamp pole. In this case, it is a possible solution to correct the picture based on the edge points of the surrounding stationary objects. By identifying the edges of objects with obvious stationary features on the roadside and the road, and performing inter-frame comparison, and then using them as the key points for picture correction, the shaking of the camera picture can be effectively reduced without occupying more CPU computing power and memory. In view of the problem of image shaking, this embodiment proposes a video stabilization solution that uses a visual feature detection algorithm to improve the frame difference method to perform image acquisition stability in a video shaking road scene, aiming to reduce the video jitter caused by camera shaking. This embodiment is verified by experimental comparison, aiming to find an algorithm that is more suitable for the road lamp pole scene to stabilize the picture, among which the SIFT algorithm, the ORB algorithm, and the AKAZE algorithm each have advantages for actual scenes.

通过实验,AKAZE在性能和速度之间提供了一个较好的平衡点。虽然它的效果可能不及SIFT,但其速度优势使它在处理速度和准确度都有一定要求的应用场景中变得更加合适。而ORB,尽管速度更快,但在鲁棒性和匹配精度上的不足可能导致其在某些应用场景下表现不佳。造成这个结果的原因是SIFT方法通过检测图像中的极值点来识别关键点,并在关键点周围构建方向直方图以创建特征描述符。这些描述符是尺度不变的,也对旋转、照明变化保持一定程度的不变性,复杂的计算过程使SIFT能够提供非常鲁棒的匹配结果,但计算成本高,速度慢,对于本实施例中实时应用且计算资源有限的场景不太合适。Through experiments, AKAZE provides a good balance between performance and speed. Although its effect may not be as good as SIFT, its speed advantage makes it more suitable in application scenarios that have certain requirements for processing speed and accuracy. Although ORB is faster, its shortcomings in robustness and matching accuracy may cause it to perform poorly in some application scenarios. The reason for this result is that the SIFT method identifies key points by detecting extreme points in the image and constructs direction histograms around the key points to create feature descriptors. These descriptors are scale-invariant and also maintain a certain degree of invariance to rotation and lighting changes. The complex calculation process enables SIFT to provide very robust matching results, but the calculation cost is high and the speed is slow, which is not suitable for real-time applications and scenarios with limited computing resources in this embodiment.

而ORB使用FAST算法来检测角点,FAST算法的运行速度非常快,因为它仅仅需要考虑像素点周围的少量像素来判断该点是否为角点,且ORB使用BRIEF的方法来生成这些点的描述符,BRIEF提供了一种非常快速的特征描述符计算方法其核心思想是通过比较关键点周围随机选取的像素对来生成一个二进制字符串。这种描述符的计算速度非常快,因为它仅仅涉及简单的像素比较操作,而不需要进行复杂的梯度或方向直方图计算。ORB uses the FAST algorithm to detect corners. The FAST algorithm runs very fast because it only needs to consider a small number of pixels around the pixel to determine whether the point is a corner point. ORB uses the BRIEF method to generate descriptors for these points. BRIEF provides a very fast feature descriptor calculation method. The core idea is to generate a binary string by comparing randomly selected pixel pairs around the key point. The calculation speed of this descriptor is very fast because it only involves simple pixel comparison operations without complex gradient or directional histogram calculations.

然而,虽然ORB在速度上有优势,但其鲁棒性和精度较低,无法准确匹配出令人满意的结果。AKAZE算法在ORB和SIFT之间提供了一个良好的平衡,它利用非线性滤波器和高效的二进制描述符来达到既快速又鲁棒的特征检测和匹配效果。AKAZE基于非线性尺度空间理论的特征检测器,它使用快速的显式扩散滤波器来构建尺度空间,这使其提供了一个较快的尺度空间构建方式,计算速度远快于传统的SIFT的高斯模糊。另外,AKAZE采用了所谓的M-LDB描述符,与ORB中使用的BRIEF描述符相比,M-LDB进行了更多局部像素比较和更复杂的模式,能够提供更好的描述能力和更高的鲁棒性。However, although ORB has advantages in speed, its robustness and accuracy are low, and it cannot accurately match satisfactory results. The AKAZE algorithm provides a good balance between ORB and SIFT. It uses nonlinear filters and efficient binary descriptors to achieve both fast and robust feature detection and matching effects. AKAZE is a feature detector based on nonlinear scale space theory. It uses fast explicit diffusion filters to construct scale space, which provides a faster way to construct scale space and is much faster than the traditional Gaussian blur of SIFT. In addition, AKAZE uses the so-called M-LDB descriptor. Compared with the BRIEF descriptor used in ORB, M-LDB performs more local pixel comparisons and more complex patterns, which can provide better description capabilities and higher robustness.

因此,基于对三种不同的算法方式的比较,从具体的实验结果来看,AKAZE算法如图5所示,处理每张图片的平均时间为0.08秒,CPU占用率为9.23%。ROB算法如图6所示,处理每张图片的平均时间为0.03秒,CPU占用率为5.59%。SIFT算法如图7所示,处理每张图片的平均时间为0.12秒,CPU占用率为10.01%。AKAZE算法对比其他两种算法,速度与算力占用居中,而且实现效果较好,因此AKAZE算法更加适合作为本实施例场景下的画面修正的方案。利用AKAZE算法在选定的ROI上检测特征点和计算描述符。AKAZE算法是一种用于检测图像特征点并生成描述符的算法,适用于图像匹配和识别任务。初始化FLANN匹配器,设置其参数(使用KD树算法和最小化搜索检查)以加快特征点匹配的速度。在视频的每一帧中,扩展原始ROI以避免特征点丢失,然后使用AKAZE算法检测特征点并计算描述符。将当前帧和前一帧的描述符使用FLANN匹配器进行匹配,筛选出好的匹配点。如果匹配点足够多,计算当前帧到前一帧的变换矩阵,并应用这个变换矩阵进行透视变换,以对齐图像。之后使用帧差法与轮廓检测对处理的图像进行进一步的运算,将处理后的差异图像上查找轮廓,并将符合条件的轮廓信息保存。Therefore, based on the comparison of three different algorithm methods, from the specific experimental results, the AKAZE algorithm is shown in Figure 5, and the average time for processing each picture is 0.08 seconds, and the CPU occupancy rate is 9.23%. The ROB algorithm is shown in Figure 6, and the average time for processing each picture is 0.03 seconds, and the CPU occupancy rate is 5.59%. The SIFT algorithm is shown in Figure 7, and the average time for processing each picture is 0.12 seconds, and the CPU occupancy rate is 10.01%. Compared with the other two algorithms, the AKAZE algorithm is in the middle of speed and computing power, and the implementation effect is better, so the AKAZE algorithm is more suitable as a solution for picture correction in the scenario of this embodiment. The AKAZE algorithm is used to detect feature points and calculate descriptors on the selected ROI. The AKAZE algorithm is an algorithm for detecting image feature points and generating descriptors, which is suitable for image matching and recognition tasks. Initialize the FLANN matcher and set its parameters (using the KD tree algorithm and minimized search check) to speed up the matching of feature points. In each frame of the video, the original ROI is expanded to avoid feature point loss, and then the AKAZE algorithm is used to detect feature points and calculate descriptors. 
The descriptors of the current frame and the previous frame are matched using the FLANN matcher to filter out good matching points. If there are enough matching points, the transformation matrix from the current frame to the previous frame is calculated, and this transformation matrix is applied to perform perspective transformation to align the images. The processed images are then further operated using the frame difference method and contour detection, and contours are searched on the processed difference images, and the contour information that meets the conditions is saved.
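The final alignment step, warping the current frame with the estimated transform, can be sketched without OpenCV as a nearest-neighbour inverse mapping; this is a simplified stand-in for cv2.warpPerspective, and `H_inv` is assumed to map output coordinates back into the source frame:

```python
import numpy as np

def warp_perspective(frame, H_inv, out_shape):
    """Align a shaky frame by inverse-mapping each output pixel through H_inv.

    For every output pixel (x, y), the homogeneous point (x, y, 1) is sent
    through H_inv to find its source position; out-of-bounds pixels stay 0.
    """
    h, w = out_shape
    out = np.zeros(out_shape, dtype=frame.dtype)
    for y in range(h):
        for x in range(w):
            src = H_inv @ np.array([x, y, 1.0])
            sx, sy = src[0] / src[2], src[1] / src[2]   # dehomogenize
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < frame.shape[1] and 0 <= iy < frame.shape[0]:
                out[y, x] = frame[iy, ix]
    return out
```

With the identity homography the frame is returned unchanged; a pure-translation homography shifts the content accordingly, which is the effect used here to cancel camera sway between frames.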

使用AKAZE算法降低摇晃在保证算力和存储空间占用的前提下,具有两个方面的优势,其一,视觉特征检测算法可以有效地减少画面中的错误信息,如图8所示,左侧为不使用视觉特征检测算法的实现效果,右侧为使用视觉特征检测算法的实现效果,白色部分为噪点,可以看到噪点大幅减少,这对帧差法的实现具有非常大的实际意义;其二,即使画面不抖动的情况下,对画面进行视觉特征检测算法仍然可以提高图像的稳定性和边缘的准确性,从而提高帧差法效果,如图9所示,其中图中白色区域为行驶中车辆的帧差结果,可以看到右侧使用视觉特征检测算法后帧差结果有明显的提高。从而增加帧差法的使用效果。Using the AKAZE algorithm to reduce shaking has two advantages while ensuring computing power and storage space occupation. First, the visual feature detection algorithm can effectively reduce the error information in the picture. As shown in Figure 8, the left side shows the implementation effect without using the visual feature detection algorithm, and the right side shows the implementation effect with the visual feature detection algorithm. The white part is noise. It can be seen that the noise is greatly reduced, which has a very great practical significance for the implementation of the frame difference method. Second, even if the picture is not shaking, the visual feature detection algorithm can still improve the stability of the image and the accuracy of the edge, thereby improving the effect of the frame difference method, as shown in Figure 9, where the white area in the figure is the frame difference result of a moving vehicle. It can be seen that the frame difference result on the right side is significantly improved after the visual feature detection algorithm is used. This increases the effect of the frame difference method.


5. Moving target tracking

After obtaining the motion contour information, the server uses a Kalman filter to track these moving targets and compute their velocities. The specific steps are as follows:

(1) The state equation and observation equation of the Kalman filter are:

x_k = F·x_{k-1} + B·u_k + w_k
z_k = H·x_k + v_k

where x_k is the state vector at time k (containing position and velocity), F is the state transition matrix, B is the control input matrix, u_k is the control input vector, w_k is the process noise, z_k is the observation vector at time k (i.e., the motion contour coordinates), H is the observation matrix, and v_k is the observation noise.

(2) For a state x = [x, y, v_x, v_y]^T observed through its position, the state transition matrix F and the observation matrix H of the Kalman filter are set to the standard constant-velocity form:

F = [[1, 0, Δt, 0], [0, 1, 0, Δt], [0, 0, 1, 0], [0, 0, 0, 1]],  H = [[1, 0, 0, 0], [0, 1, 0, 0]]

where Δt is the frame interval.

(3) The process noise covariance matrix Q and the observation noise covariance matrix R are set to diagonal matrices, Q = q·I_4 and R = r·I_2, where the scalars q and r are tuned to the scene (small process noise for smooth vehicle motion, measurement noise matched to the jitter of the contour centroid).

With the Kalman filter parameters set as above, the server can accurately track the motion trajectory and speed of each vehicle in the video.
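The tracking step can be sketched as a self-contained constant-velocity Kalman filter in NumPy. This is an illustrative sketch, not the embodiment's code: the matrix values are the conventional constant-velocity choices, and the noise scales `Q` and `R` are assumed values for demonstration.

```python
import numpy as np

dt = 1.0  # frame interval
# State [x, y, vx, vy]; only the position (contour centroid) is observed
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], float)
Q = 1e-3 * np.eye(4)   # process noise covariance (assumed value)
R = 1e-1 * np.eye(2)   # measurement noise covariance (assumed value)

x = np.zeros(4)        # initial state
P = np.eye(4)          # initial state covariance

def kalman_step(x, P, z):
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the measured contour centroid z
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Feed centroids of a target moving 2 px/frame in x and 1 px/frame in y
for k in range(50):
    z = np.array([2.0 * k, 1.0 * k])
    x, P = kalman_step(x, P, z)

print("estimated velocity:", x[2:])
```

The velocity components of the state vector converge to the target's true per-frame displacement, which is how the per-vehicle speed estimate described above is obtained.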

6. Distance-based vehicle model recognition

Finally, the server uses a pre-trained pyramid neural network model to recognize the vehicle model from the detected motion features. The structure of the model is as follows:

Input layer: receives features such as the motion contour image data, the motion trajectory coordinate sequence, and the motion speed.

Five pyramid convolutional layers:

Layer 1: recognizes moving targets in the 200-250 m distance range;

Layer 2: recognizes moving targets in the 150-250 m distance range;

Layer 3: recognizes moving targets in the 100-250 m distance range;

Layer 4: recognizes moving targets in the 50-250 m distance range;

Layer 5: recognizes moving targets in the 10-250 m distance range.

The internal structure of each pyramid layer includes:

1) Convolutional layer group: extracts motion features;

2) Pooling layer group: progressively reduces the feature-map resolution;

3) Fully connected layer group: flattens the features and outputs classification probabilities.

Jump output module: determines whether the output confidence of the current pyramid layer reaches 0.8; if so, the result of the current layer is output, otherwise processing continues with the next layer.

Output layer: outputs the final vehicle model recognition result according to the classification probabilities.

Through this pyramid-style multi-scale recognition mechanism, the neural network model can effectively exploit motion features at different distances and improve the overall vehicle model recognition accuracy.
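The control flow of the jump output module can be sketched independently of any particular network weights. In the toy example below, each pyramid layer is stubbed by a function returning class probabilities; the layer outputs, class names, and probability values are illustrative only, not from the trained model.

```python
def pyramid_predict(layers, features, threshold=0.8):
    """Run pyramid layers in order; return early once a layer's top
    class probability reaches the confidence threshold, otherwise
    fall back to the last layer's output."""
    probs = None
    for i, layer in enumerate(layers, start=1):
        probs = layer(features)
        if max(probs.values()) >= threshold:
            return i, probs  # jump output: confident enough, stop here
    return len(layers), probs

# Stub layers: confidence grows as closer-range features are incorporated
layers = [
    lambda f: {"car": 0.50, "suv": 0.30, "truck": 0.20},
    lambda f: {"car": 0.65, "suv": 0.25, "truck": 0.10},
    lambda f: {"car": 0.85, "suv": 0.10, "truck": 0.05},
    lambda f: {"car": 0.90, "suv": 0.07, "truck": 0.03},
    lambda f: {"car": 0.95, "suv": 0.03, "truck": 0.02},
]
exit_layer, probs = pyramid_predict(layers, features=None)
print(exit_layer, max(probs, key=probs.get))  # exits at layer 3 with "car"
```

Exiting as soon as the 0.8 threshold is met is what keeps the average compute cost low: easy (typically distant, already well-resolved) targets never reach the deeper layers.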

7. Recognition result output

After the above steps are completed, the server transmits the final vehicle model recognition results and speed information over the network to the traffic management department's intelligent traffic control system. That system aggregates and analyzes the information, providing valuable data support for urban traffic management.

For example, according to the recognition statistics, a total of 100 vehicles were detected on this road section within a given period, including 50 sedans, 30 SUVs, and 20 trucks. These data can be used to evaluate the section's traffic capacity and to allocate traffic diversion resources sensibly. Furthermore, combined with each vehicle's speed information, congestion points on the section can be identified, providing a basis for traffic planning.

In general, by deploying this low-computing-power frame-difference road visual vehicle recognition system, the traffic management department can achieve comprehensive monitoring and intelligent management of vehicles on trunk roads, providing strong support for urban traffic governance. The road vehicle information obtained through monitoring and statistics is shown in Table 1:

Table 1 Vehicle information table

Time period   | Sedans | SUVs | Trucks | Average speed (km/h)
08:00-08:15   | 12     | 7    | 5      | 52
08:15-08:30   | 15     | 9    | 6      | 48
08:30-08:45   | 13     | 8    | 4      | 51
08:45-09:00   | 10     | 6    | 5      | 50

The bar chart shows that during the morning rush hour sedans had the highest average speed, roughly 53-55 km/h, followed by SUVs, with trucks the slowest at an average of 42-46 km/h. This indicates clear differences between vehicle types on this road section, calling for targeted traffic control measures matched to the driving characteristics of each type.
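The per-period counts in Table 1 can be aggregated directly to reproduce the section-level statistics quoted earlier. A small sketch, with the rows copied from the table above (the vehicle-weighted mean speed is one reasonable choice of aggregate, not prescribed by the source):

```python
# (period, sedans, suvs, trucks, average speed in km/h) — rows of Table 1
rows = [
    ("08:00-08:15", 12, 7, 5, 52),
    ("08:15-08:30", 15, 9, 6, 48),
    ("08:30-08:45", 13, 8, 4, 51),
    ("08:45-09:00", 10, 6, 5, 50),
]

cars = sum(r[1] for r in rows)
suvs = sum(r[2] for r in rows)
trucks = sum(r[3] for r in rows)
total = cars + suvs + trucks

# Weight each period's average speed by its vehicle count
avg_speed = sum(r[4] * (r[1] + r[2] + r[3]) for r in rows) / total

print(total, cars, suvs, trucks, round(avg_speed, 2))
```

The totals (100 vehicles: 50 sedans, 30 SUVs, 20 trucks) match the example counts given in the recognition-result section above.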

9. System performance evaluation

To evaluate the actual performance of this low-computing-power frame-difference road visual vehicle recognition system, the traffic management department ran a one-week trial and collected data on the following key indicators:

1. Vehicle model recognition accuracy

This indicator reflects the system's recognition accuracy across vehicle types. Statistical analysis of 10,000 vehicles showed a recognition accuracy of 93% for sedans, 90% for SUVs, and 88% for trucks. Compared with existing methods based on appearance features, the overall accuracy of this system is about 5 percentage points higher.

2. Recognition response time

This indicator reflects the delay from video capture to output of the recognition result. In testing, the system's average response time was 150 ms, sufficient for real-time traffic monitoring. Compared with methods based on video target tracking, the response time is shortened by more than 30%.

3. System throughput

This indicator reflects the system's concurrent processing capability. In testing, the server processed 20 channels of 1080p video simultaneously and output 100 vehicle recognition results per second. This matches the design target and meets the requirement of full coverage of the city's trunk roads.

4. System stability

This indicator reflects the system's reliability over long-term operation. During the one-week trial, the system experienced no failures or crashes, achieving 99.9% availability.

Based on the above performance results, the traffic management department rated this low-computing-power frame-difference road visual vehicle recognition system highly. The system combines high recognition accuracy with fast response, strong processing capability, and good stability, fully meeting the practical needs of urban traffic management.

The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art who can readily conceive of changes or substitutions within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention.

Claims (5)

1. A low-computing-power frame-difference road vehicle visual recognition method, characterized in that it comprises the following steps:

S10. Obtain video of the target road captured by a camera and preprocess it to obtain a preprocessed video;

S20. Use the frame difference method to compute pixel differences between adjacent frame images in the preprocessed video, and detect motion regions according to a preset pixel-difference threshold and morphological processing;

S30. Perform edge detection and contour extraction on the detected motion regions to obtain motion contours;

S40. Track the detected motion contours to obtain motion trajectories and motion speeds;

S50. Use a pre-trained distance-graded vehicle recognition model: input the motion contour, motion trajectory, and motion speed, and obtain the vehicle classification corresponding to the motion contour;

S60. Output the vehicle classification and motion speed as the recognition result;

wherein the distance-graded vehicle recognition model adopts a pyramid neural network structure comprising five sequentially executed pyramid layers and a jump output module, each pyramid layer being used to recognize motion contours at different distances in the image;

the first pyramid layer recognizes motion contours in a first distance range;
the second pyramid layer recognizes the combination of two motion contours in the first range and the second range;
the third pyramid layer recognizes the combination of three motion contours in the first, second, and third ranges;
the fourth pyramid layer recognizes the combination of four motion contours in the first, second, third, and fourth ranges;
the fifth pyramid layer recognizes the combination of five motion contours in the first, second, third, fourth, and fifth ranges;

the distances of the first through fifth ranges decrease in sequence, specifically: the first range is 200-250 m, the second range is 150-200 m, the third range is 100-150 m, the fourth range is 50-100 m, and the fifth range is 10-50 m;

each pyramid layer is structured as follows:
Input layer: receives the motion contour image, motion trajectory, and speed information as input;
Convolutional layer group: contains multiple convolutional layers for extracting image features;
Pooling layer group: contains max-pooling layers that progressively reduce the feature-map resolution;
Fully connected layer group: flattens the convolutional features and outputs classification probabilities through fully connected layers;
Output layer: outputs the recognition result based on the classification probabilities;

wherein the jump output module determines whether the output confidence of the current pyramid layer is greater than or equal to a confidence threshold; if so, the output of the current pyramid layer is taken as the output of the distance-graded vehicle recognition model; if not, processing continues with the next pyramid layer, or the output of the last pyramid layer is taken as the output of the distance-graded vehicle recognition model.

2. The low-computing-power frame-difference road vehicle visual recognition method according to claim 1, characterized in that the recognized vehicle types include at least sedans, SUVs, trucks, buses, light trucks, and heavy trucks.

3. The low-computing-power frame-difference road vehicle visual recognition method according to claim 1, characterized in that the pixel-difference threshold is preset to 10%-30%.

4. A computer-readable storage medium, characterized in that program instructions are stored in the computer-readable storage medium, and when the program instructions run, they execute the low-computing-power frame-difference road vehicle visual recognition method according to any one of claims 1-3.

5. A low-computing-power frame-difference road vehicle visual recognition system, characterized in that it comprises the computer-readable storage medium according to claim 4.
CN202411000782.9A2024-07-252024-07-25Low-calculation-force frame difference method road vehicle visual identification method, medium and systemActiveCN118537819B (en)

Priority Applications (1)

Application NumberPriority DateFiling DateTitle
CN202411000782.9ACN118537819B (en)2024-07-252024-07-25Low-calculation-force frame difference method road vehicle visual identification method, medium and system


Publications (2)

Publication Number | Publication Date
CN118537819A (en) | 2024-08-23
CN118537819B (en) | 2024-10-11

Family

ID=92389876

Family Applications (1)

Application NumberTitlePriority DateFiling Date
CN202411000782.9AActiveCN118537819B (en)2024-07-252024-07-25Low-calculation-force frame difference method road vehicle visual identification method, medium and system

Country Status (1)

CountryLink
CN (1)CN118537819B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN105427626A (en)*2015-12-192016-03-23长安大学Vehicle flow statistics method based on video analysis
CN106652465A (en)*2016-11-152017-05-10成都通甲优博科技有限责任公司Method and system for identifying abnormal driving behavior on road

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
US8098889B2 (en)*2007-01-182012-01-17Siemens CorporationSystem and method for vehicle detection and tracking
CN106875424B (en)*2017-01-162019-09-24西北工业大学A kind of urban environment driving vehicle Activity recognition method based on machine vision
CN107133610B (en)*2017-06-012020-09-01电子科技大学Visual detection and counting method for traffic flow under complex road conditions
CN108596129B (en)*2018-04-282022-05-06武汉盛信鸿通科技有限公司Vehicle line-crossing detection method based on intelligent video analysis technology
CN111027494B (en)*2019-12-142023-09-05华南理工大学广州学院Matrix car lamp identification method based on computer vision
CN113627373A (en)*2021-08-172021-11-09山东沂蒙交通发展集团有限公司 A Vehicle Recognition Method Based on Raivision Fusion Detection
CN115100226B (en)*2022-06-152024-10-01浙江理工大学 A Contour Extraction Method Based on Monocular Digital Image
CN115375959B (en)*2022-06-292025-09-05西安邮电大学 A vehicle image recognition model establishment and recognition method
CN118096815B (en)*2024-04-232024-07-09山东博安智能科技股份有限公司Road abnormal event detection system based on machine vision




Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
