Technical Field
The present invention relates to the field of intelligent transportation and traffic text detection, and in particular to a traffic text detection and recognition method, apparatus, device, and storage medium.
Background
Text is the written form of human language and an important carrier of information; everyday scenes in modern society are filled with rich and diverse text, and text with clear, specific semantics is crucial for summarizing, explaining, and expressing natural scenes. Traffic text is a special kind of natural scene text whose detection and recognition differ in difficulty from traditional scanned-document text. Scene text detection and recognition face challenges such as complex backgrounds, diverse fonts, uneven illumination, multi-oriented text, and perspective distortion, which greatly increase the difficulty. Current research has achieved remarkable results in traffic sign detection, but the detection and recognition of traffic text still calls for in-depth study, and the rapid development of the driverless industry makes timely advances in traffic text detection and recognition technology all the more important. Because traffic signs are usually located in complex natural environments, the quality of raw camera images may be degraded by rain, snow, and fog; when a traffic sign image is blurred or hard to recognize, the detector and recognizer struggle to locate the traffic text, making recognition difficult. In natural scene text detection, current methods can only extract features from fixed-size target regions and cannot fully capture the features of traffic text regions with extreme aspect ratios. The background of traffic signs in natural environments is usually very complex, and learning salient background elements such as the sky and the signs themselves is important for distinguishing text from non-text and preventing false detections. Learning the local features of the text region is one of the most critical stages of detection and recognition: character features help the detector localize text regions more precisely and thereby improve recognition accuracy.
Traffic text is widely distributed throughout driving environments in natural scenes. Because traffic text is usually closely associated with traffic signs, and traffic signs are designed to be relatively easy to observe, traffic text carries richer visual context than general natural scene text. In practice, however, outdoor environments involve many uncertain factors, such as changing weather, poor lighting, partial occlusion, and low image quality, all of which can severely affect detection results. In addition, traffic sign images captured by photographic devices can vary greatly across random outdoor scenes, which further challenges traffic text detection.
At present, natural scene text detection typically uses a deep residual network (ResNet50) and conventional convolution kernels for feature extraction. Because ResNet50 is slow, it performs poorly for real-time detection and recognition, and conventional convolution kernels can only extract features from fixed-size target regions, failing to fully capture traffic text regions with extreme aspect ratios. Text detection algorithms generally fall into two types. The first is the traditional two-stage object detection approach, which first generates object proposal boxes and then regresses them; this approach is usually cumbersome, computationally expensive, and struggles to regress curved or deformed text. The second is based on semantic segmentation, treating the text region as a class to be segmented from the background; this approach is more accurate but often fails to separate nearby text instances, causing them to stick together. Conventional feature extraction networks usually process the entire image, which contains a large amount of background while traffic text occupies only a small portion, so the network may learn considerable noise that degrades localization accuracy.
Summary of the Invention
In view of this, an object of the present invention is to provide a traffic text detection and recognition method, apparatus, device, and storage medium that achieve high detection and recognition accuracy while remaining real-time. The specific scheme is as follows:
In a first aspect, the present invention discloses a traffic text detection and recognition method, including:
obtaining an image to be processed that contains traffic text, and performing an image preprocessing operation on the image to be processed to obtain a preprocessed image;
performing feature extraction on the preprocessed image through a first preset feature extraction operation to obtain a global feature map;
performing text region detection on the global feature map using a progressive scale expansion algorithm to obtain an initial text region, and performing a second preset feature extraction operation on the initial text region to obtain a local feature map; and
obtaining a fused feature map based on the global feature map and the local feature map, and inputting the fused feature map into a text recognition module to perform a text recognition operation.
Optionally, performing the image preprocessing operation on the image to be processed to obtain the preprocessed image includes:
solving the dark channel corresponding to the image to be processed, and selecting a preset percentage of points from the dark channel;
computing the mean of the points in the image to be processed that correspond to the preset percentage of points, to obtain a global atmospheric light estimate;
for each point of the image to be processed, computing the transmission of the local window corresponding to that point, the local window being centered on the point; and
obtaining the preprocessed image based on each point, the global atmospheric light estimate, the transmission, and a preset haze degradation model.
Optionally, performing feature extraction on the preprocessed image through the first preset feature extraction operation to obtain the global feature map includes:
performing feature extraction on the preprocessed image using a ResNet18 network to obtain the global feature map.
Optionally, performing feature extraction on the preprocessed image using the ResNet18 network to obtain the global feature map includes:
replacing the original convolution layer in the ResNet18 network with a target convolution layer, the target convolution layer containing a first convolution kernel, a second convolution kernel, and a third convolution kernel;
obtaining a first preset number of first feature maps output by the target convolution layer;
upsampling the first preset number of first feature maps to obtain corresponding second feature maps;
adding each first feature map to the second feature map of the same size and channel count to obtain target feature maps;
concatenating the target feature maps channel-wise to obtain a feature-fused feature map; and
performing the upsampling on the feature-fused feature map to obtain the global feature map.
Optionally, performing the second preset feature extraction operation on the initial text region to obtain the local feature map includes:
inputting the initial text region into the ResNet18 network for a convolution operation to obtain the local feature map.
Optionally, obtaining the fused feature map based on the global feature map and the local feature map includes:
concatenating the global feature map and the local feature map to obtain a concatenated feature map;
splitting the concatenated feature map according to its number of channels to obtain split feature maps equal in number to the number of channels;
inputting each split feature map into a global max pooling layer to obtain a third feature map;
inputting the split feature map into a convolution-batch normalization-activation layer and reshaping it to obtain a fourth feature map and a fifth feature map;
performing matrix multiplication of the fourth feature map and the fifth feature map to obtain a sixth feature map;
fusing the third feature map with the sixth feature map to obtain a seventh feature map;
concatenating all of the seventh feature maps to obtain an eighth feature map, and inputting the eighth feature map into the convolution-batch normalization-activation layer and performing the upsampling operation to obtain a ninth feature map; and
adding and fusing the ninth feature map with the concatenated feature map to obtain the fused feature map.
Optionally, performing text region detection on the global feature map using the progressive scale expansion algorithm to obtain the initial text region includes:
performing segmentation prediction on the global feature map using the progressive scale expansion algorithm to obtain a segmentation result;
randomly sampling a second preset number of boundary points on the boundary of each text instance in the segmentation result, and obtaining the shortest true distance between each boundary point and the real text instance; and
obtaining the initial text region based on the true distance, a predicted distance, and a preset loss function, the predicted distance being the shortest distance between a boundary point of the predicted text box in the segmentation result and the real text instance.
In a second aspect, the present invention discloses a traffic text detection and recognition apparatus, including:
an image preprocessing module, configured to obtain an image to be processed that contains traffic text and perform an image preprocessing operation on the image to be processed to obtain a preprocessed image;
a first feature extraction module, configured to perform feature extraction on the preprocessed image through a first preset feature extraction operation to obtain a global feature map;
a text region detection module, configured to perform text region detection on the global feature map using a progressive scale expansion algorithm to obtain an initial text region;
a second feature extraction module, configured to perform a second preset feature extraction operation on the initial text region to obtain a local feature map;
a fusion module, configured to obtain a fused feature map based on the global feature map and the local feature map; and
a text recognition module, configured to perform a text recognition operation on the fused feature map.
In a third aspect, the present invention discloses an electronic device, including:
a memory for storing a computer program; and
a processor for executing the computer program to implement the steps of the traffic text detection and recognition method disclosed above.
In a fourth aspect, the present invention discloses a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the traffic text detection and recognition method disclosed above.
As can be seen, the present invention provides a traffic text detection and recognition method, including: obtaining an image to be processed that contains traffic text, and performing an image preprocessing operation on it to obtain a preprocessed image; performing feature extraction on the preprocessed image through a first preset feature extraction operation to obtain a global feature map; performing text region detection on the global feature map using a progressive scale expansion algorithm to obtain an initial text region, and performing a second preset feature extraction operation on the initial text region to obtain a local feature map; and obtaining a fused feature map based on the global feature map and the local feature map and inputting it into a text recognition module for text recognition. Thus, the present invention improves the clarity of traffic text images through image preprocessing; extracts features from the preprocessed image through the first preset feature extraction operation so that detection accuracy is maintained with fewer parameters; performs text region detection on the global feature map to improve localization precision and obtain more accurate text regions; and performs text detection and recognition by combining the global features of the traffic scene image with the local features of the text candidate regions, achieving high detection and recognition accuracy while remaining real-time.
Brief Description of the Drawings
In order to explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from the provided drawings without creative effort.
Figure 1 is a flowchart of a traffic text detection and recognition method disclosed by the present invention;
Figure 2 is a block diagram of traffic text detection and recognition disclosed by the present invention;
Figure 3 is a flowchart of a specific traffic text detection and recognition method disclosed by the present invention;
Figure 4 is a schematic diagram of the global-local feature fusion module disclosed by the present invention;
Figure 5 is a schematic structural diagram of the traffic text detection and recognition apparatus provided by the present invention;
Figure 6 is a structural diagram of an electronic device provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.
At present, in natural scene text detection tasks, ResNet50 is slow and thus performs poorly in real-time detection and recognition, and conventional convolution kernels can only extract features from fixed-size target regions, failing to fully capture traffic text regions with extreme aspect ratios. Of the two families of text detection algorithms, one is cumbersome, computationally expensive, and struggles to regress curved or deformed text, while the other is more accurate but often fails to separate nearby text, causing adhesion. Conventional feature extraction networks usually process only the entire image, which contains a large amount of background information while traffic text occupies only a small portion, so the network may learn considerable noise that degrades localization accuracy. To this end, the present invention provides a traffic text detection and recognition method that achieves high detection and recognition accuracy while remaining real-time.
An embodiment of the present invention discloses a traffic text detection and recognition method, as shown in Figure 1. The method includes:
Step S11: obtain an image to be processed that contains traffic text, and perform an image preprocessing operation on the image to be processed to obtain a preprocessed image.
In this embodiment, an image to be processed that contains traffic text is obtained, and an image preprocessing operation is performed on it to obtain a preprocessed image. Specifically: the image to be processed containing traffic text is obtained; its corresponding dark channel is solved, and a preset percentage of points is selected from the dark channel; the mean of the points in the image that correspond to the preset percentage of points is computed to obtain a global atmospheric light estimate; for each point of the image to be processed, the transmission of the local window centered on that point is computed; and the preprocessed image is obtained based on each point, the global atmospheric light estimate, the transmission, and a preset haze degradation model.
In the image preprocessing module, the dark channel prior algorithm is a simple and efficient dehazing image enhancement algorithm; since rainy and snowy weather can be regarded as rain haze and snow haze, it also helps remove rain and snow. The dark channel prior is used to improve the clarity of traffic text images under rain, snow, and heavy fog. Compared with other traditional algorithms, it processes images quickly with good results, and compared with deep-learning dehazing algorithms, it is highly real-time, requires little data, and is easy to deploy on other open platforms.
It can be understood that the image captured by the original camera is fed into the preprocessing module. The dark channel is first solved from the foggy, rainy, or snowy image; then the top 0.1% of points (the preset percentage, which can be set differently for different situations) are selected from the dark channel, and the mean of the pixel values at the corresponding points in the original image to be processed is computed as the estimate of the global atmospheric light. Next, for each point k in the original image, the transmission of the local window w_k centered on it is computed, and for each point k of each channel c, J (the preprocessed image) is recovered according to the haze degradation model.
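The preprocessing steps above can be sketched as follows. This is a minimal illustrative implementation of the standard dark channel prior pipeline, not the patented code: the window size, the 0.1% percentage, and the ω/t_min constants are common defaults and are assumptions here.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dehaze_dark_channel(img, window=15, top_percent=0.001, omega=0.95, t_min=0.1):
    """Dark-channel-prior dehazing sketch: estimate atmospheric light from the
    brightest 0.1% of dark-channel points, estimate per-window transmission,
    then invert the haze degradation model I = J*t + A*(1-t)."""
    # Dark channel: per-pixel channel minimum followed by a local window minimum.
    dark = minimum_filter(img.min(axis=2), size=window)
    # Atmospheric light A: mean of the original pixels at the top 0.1%
    # brightest dark-channel locations.
    n = max(1, int(dark.size * top_percent))
    idx = np.unravel_index(np.argsort(dark.ravel())[-n:], dark.shape)
    A = img[idx].mean(axis=0)                                  # shape (3,)
    # Transmission of the local window w_k centered on each point k.
    t = 1.0 - omega * minimum_filter((img / A).min(axis=2), size=window)
    t = np.clip(t, t_min, 1.0)[..., None]
    # Invert the degradation model per channel: J = (I - A) / t + A.
    return np.clip((img - A) / t + A, 0.0, 1.0)

hazy = np.random.default_rng(0).uniform(0.4, 0.9, (64, 64, 3))
clear = dehaze_dark_channel(hazy)
print(clear.shape)  # (64, 64, 3)
```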
Step S12: perform feature extraction on the preprocessed image through a first preset feature extraction operation to obtain a global feature map.
In this embodiment, after the image preprocessing operation produces the preprocessed image, feature extraction is performed on it through the first preset feature extraction operation to obtain a global feature map. Specifically, the ResNet18 network is used to extract features from the preprocessed image. That is: the original convolution layer in the ResNet18 network is replaced with a target convolution layer containing a first, a second, and a third convolution kernel; a first preset number of first feature maps output by the target convolution layer are obtained; these are upsampled to obtain corresponding second feature maps; the first feature maps are added to the second feature maps of the same size and channel count to obtain target feature maps; the target feature maps are concatenated channel-wise to obtain a feature-fused feature map; and the upsampling is performed on the feature-fused feature map to obtain the global feature map.
The feature extraction module uses the ResNet18 network, which has fewer parameters. The original convolution layer in the ResNet18 network is replaced with a target convolution layer containing a first, a second, and a third convolution kernel. Since traffic text is usually flat or thin and vertical, 1×3, 3×1, and 3×3 kernels are used to extract horizontal, vertical, and generic features respectively, which are then fused (yielding the first feature map) as the input to the next convolution stage. Compared with an ordinary convolution kernel, this helps the network learn features that match the characteristics of traffic text (scale, shape, etc.), maintaining detection accuracy with fewer parameters. Compared with the ResNet50 network commonly used for feature extraction in convolutional neural networks, this network trains and infers faster with fewer parameters. After three convolution operations, bilinear interpolation is used for upsampling, and the four upsampled feature maps are concatenated as the rough global features.
As shown in Figure 2, in the feature enhancement module, ResNet18 is used as the backbone network for feature extraction. The ordinary convolution layer is replaced with a convolution layer containing three different kernels, 1×3, 3×1, and 3×3, and the fused result produced by this new convolution layer serves as the input to the next convolution layer. After three consecutive such stages, four upsampling operations are performed on the four resulting feature maps (the first preset number of first feature maps) to obtain four upsampled feature maps (the second feature maps). The feature maps with the same size and channels before and after upsampling are then added; the four resulting maps (the target feature maps) are concatenated channel by channel for feature fusion (yielding the feature-fused feature map), and one further upsampling operation produces the rough global feature map of size H×W×4C, where H is the height of the feature map, W is its width, and C is its number of channels.
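The replaced convolution layer described above can be sketched as a module with three parallel kernels whose responses are fused before the next stage. This is an illustrative PyTorch reading of the description (the fusion by summation, the BatchNorm/ReLU placement, and the channel counts are assumptions):

```python
import torch
import torch.nn as nn

class MultiShapeConv(nn.Module):
    """Sketch of the target convolution layer: 1x3, 3x1, and 3x3 kernels
    extract horizontal, vertical, and generic features of flat or narrow
    traffic text, and their outputs are fused as input to the next stage."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.horiz = nn.Conv2d(in_ch, out_ch, (1, 3), stride, padding=(0, 1))
        self.vert  = nn.Conv2d(in_ch, out_ch, (3, 1), stride, padding=(1, 0))
        self.plain = nn.Conv2d(in_ch, out_ch, (3, 3), stride, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Fuse the three kernel responses (summation assumed here).
        return self.act(self.bn(self.horiz(x) + self.vert(x) + self.plain(x)))

x = torch.randn(1, 64, 32, 32)
y = MultiShapeConv(64, 128)(x)
print(tuple(y.shape))  # (1, 128, 32, 32)
```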
Step S13: perform text region detection on the global feature map using a progressive scale expansion algorithm to obtain an initial text region, and perform a second preset feature extraction operation on the initial text region to obtain a local feature map.
In this embodiment, after the global feature map is obtained through the first preset feature extraction operation, text region detection is performed on it using the progressive scale expansion algorithm to obtain an initial text region, and a second preset feature extraction operation is performed on the initial text region to obtain a local feature map. It can be understood that segmentation prediction is performed on the global feature map using the progressive scale expansion algorithm to obtain a segmentation result; a second preset number of boundary points are randomly sampled on the boundary of each text instance in the segmentation result, and the shortest true distance between each boundary point and the real text instance is obtained; and the initial text region is obtained based on the true distance, the predicted distance, and a preset loss function, the predicted distance being the shortest distance between a boundary point of the predicted text box in the segmentation result and the real text instance. Performing the second preset feature extraction operation on the initial text region to obtain the local feature map includes inputting the initial text region into the ResNet18 network for a convolution operation to obtain the local feature map.
In the detection module, to prevent adhesion in the text segmentation results, a progressive scale expansion algorithm is used to distinguish adjacent text. After the rough text region is obtained, n boundary points are randomly sampled, and a distance map from the boundary points to the true text boundary is learned. An auxiliary L2 distance loss provides distance supervision, improving localization precision and yielding more accurate text regions. Ordinary semantic segmentation methods cause adjacent text to stick together, while regression methods have lower detection accuracy for deformed text; the method of the present invention combines the advantages of both so that they complement each other. After the accurate text region is obtained, the ResNet18 network performs further feature extraction to produce the local feature map.
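The auxiliary distance supervision above can be sketched as follows: sample boundary points of a predicted instance, compute each point's true shortest distance to the ground-truth instance with a distance transform, and regress the network's predicted distances onto them with an L2 loss. The sampling scheme, the use of `distance_transform_edt`, and the `pred_dist_map` network output are all illustrative assumptions.

```python
import numpy as np
import torch
from scipy.ndimage import binary_erosion, distance_transform_edt

def boundary_distance_loss(pred_mask, pred_dist_map, gt_mask, n_points=32, seed=0):
    """Sketch of the auxiliary L2 distance loss on sampled boundary points."""
    rng = np.random.default_rng(seed)
    # Boundary pixels = predicted pixels removed by one erosion step.
    boundary = pred_mask & ~binary_erosion(pred_mask)
    ys, xs = np.nonzero(boundary)
    pick = rng.choice(len(ys), size=min(n_points, len(ys)), replace=False)
    ys, xs = ys[pick], xs[pick]
    # True shortest distance of each sampled point to the GT text instance
    # (zero inside the instance).
    true_d = distance_transform_edt(~gt_mask)[ys, xs]
    pred_d = pred_dist_map[torch.as_tensor(ys), torch.as_tensor(xs)]
    # L2 regression of predicted distance onto the true distance.
    return torch.mean((pred_d - torch.as_tensor(true_d, dtype=torch.float32)) ** 2)

pred = np.zeros((32, 32), bool); pred[8:24, 6:26] = True   # predicted instance
gt   = np.zeros((32, 32), bool); gt[10:22, 8:24] = True    # ground truth
dist_map = torch.zeros(32, 32)                             # hypothetical output
loss = boundary_distance_loss(pred, dist_map, gt)
```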
Step S14: obtain a fused feature map based on the global feature map and the local feature map, and input the fused feature map into a text recognition module to perform a text recognition operation.
In this embodiment, after the second preset feature extraction operation produces the local feature map, a fused feature map is obtained based on the global feature map and the local feature map, and the fused feature map is input into the text recognition module for text recognition. Specifically: the global feature map and the local feature map are concatenated to obtain a concatenated feature map; the concatenated feature map is split according to its number of channels to obtain split feature maps equal in number to the number of channels; each split feature map is input into a global max pooling layer to obtain a third feature map; the split feature map is input into a convolution-batch normalization-activation layer and reshaped to obtain a fourth and a fifth feature map; matrix multiplication of the fourth and fifth feature maps yields a sixth feature map; the third and sixth feature maps are fused to obtain a seventh feature map; all the seventh feature maps are concatenated to obtain an eighth feature map, which is input into the convolution-batch normalization-activation layer and upsampled to obtain a ninth feature map; and the ninth feature map is added to and fused with the concatenated feature map to obtain the fused feature map.
It can be understood that a simple concatenation operation cannot fuse global and local features well. The present invention therefore designs a novel feature fusion module: the concatenated text features are evenly divided into k blocks, global max pooling is applied to each block, followed by a convolution-batch normalization-activation operation and a matrix multiplication to fully fuse the features. Finally, upsampling is performed to obtain the globally and locally fused features. The fused feature map is input into the recognition module to recognize the traffic text. The recognition module adopts a classic bidirectional long short-term memory network (BiLSTM) and an attention decoder.
Further, the obtained fused features are reduced in dimension through a 1×1 convolution to update the pre-detection feature map, which is fed into the detection module again to achieve more accurate text region detection.
The present invention provides a robust detection and recognition algorithm that can effectively cope with adverse factors in complex natural environments; develops a traffic text feature extraction network that alleviates the speed-accuracy trade-off of existing deep learning methods; designs a detector suited to traffic text that combines the advantages of two-stage object detection algorithms and semantic segmentation methods to improve detection performance; and optimizes the feature extraction network so that it can better learn important background information, reduce the impact of noise, and improve localization accuracy.
It can be seen that the present invention provides a traffic text detection and recognition method, which includes: obtaining an image to be processed that contains traffic text, and performing an image preprocessing operation on it to obtain a preprocessed image; performing feature extraction on the preprocessed image through a first preset feature extraction operation to obtain a global feature map; performing text region detection on the global feature map using a progressive scale expansion algorithm to obtain an initial text region, and performing a second preset feature extraction operation on the initial text region to obtain a local feature map; and obtaining a fused feature map based on the global feature map and the local feature map, and inputting the fused feature map into the text recognition module for text recognition. It can thus be seen that the present invention improves the clarity of traffic text images through image preprocessing; extracts features from the preprocessed image through the first preset feature extraction operation so as to guarantee detection precision with fewer parameters; performs text region detection on the global feature map to improve localization precision and obtain a more accurate text region; and performs text detection and recognition by combining the global features of the traffic scene image with the local features of the text candidate regions, achieving high detection precision and recognition accuracy while guaranteeing real-time performance.
Referring to Figure 3, an embodiment of the present invention discloses a traffic text detection and recognition method. Compared with the previous embodiment, this embodiment further explains and optimizes the technical solution.
Step S21: Obtain an image to be processed that contains traffic text, and perform an image preprocessing operation on the image to be processed to obtain a preprocessed image.
Step S22: Perform feature extraction on the preprocessed image through a first preset feature extraction operation to obtain a global feature map.
Step S23: Use the progressive scale expansion algorithm to perform segmentation prediction on the global feature map to obtain a segmentation result.
In this embodiment, after feature extraction is performed on the preprocessed image through the first preset feature extraction operation to obtain the global feature map, the progressive scale expansion algorithm is used to perform segmentation prediction on the global feature map to obtain a segmentation result. It can be understood that, as shown in Figure 2 above, the obtained coarse global feature map R^{H×W×4C} is input into the detection module, and the progressive scale expansion (PSE) algorithm is used to make multiple segmentation predictions for each text instance, yielding segmentation results that correspond to multiple kernels of different scales. Each kernel shares a similar shape with the original complete text instance and is located at the same center point, but differs in scale.
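The expansion step can be illustrated with a minimal sketch. The following is an assumed, simplified NumPy implementation (4-connected breadth-first growing from the smallest kernel map outward through the larger maps), not the patent's actual code; the function and variable names are illustrative.

```python
import numpy as np
from collections import deque

def progressive_scale_expansion(kernels, text_mask):
    """Labels from the smallest kernel map are grown pixel by pixel
    (4-connectivity) through the larger kernel maps until the full text
    mask is covered. kernels: list of binary (H, W) maps, smallest first."""
    H, W = text_mask.shape
    labels = np.zeros((H, W), dtype=int)
    next_label = 1
    # connected components of the smallest kernel seed the instance labels
    for i in range(H):
        for j in range(W):
            if kernels[0][i, j] and labels[i, j] == 0:
                q = deque([(i, j)])
                labels[i, j] = next_label
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < H and 0 <= nx < W and \
                                kernels[0][ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = next_label
                            q.append((ny, nx))
                next_label += 1
    # expand through successively larger kernels, then the full text mask
    for grow in list(kernels[1:]) + [text_mask]:
        q = deque(zip(*np.nonzero(labels)))
        while q:
            y, x = q.popleft()
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < H and 0 <= nx < W and \
                        grow[ny, nx] and labels[ny, nx] == 0:
                    labels[ny, nx] = labels[y, x]
                    q.append((ny, nx))
    return labels
```

Because each instance grows outward from its own shrunken kernel, two adjacent text instances keep distinct labels even when their full masks are close together.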
Step S24: Randomly sample a second preset number of boundary points on the boundary of each text instance in the segmentation result, and obtain the shortest true distance between each boundary point and the real text instance.
In this embodiment, after the progressive scale expansion algorithm is used to perform segmentation prediction on the global feature map to obtain the segmentation result, a second preset number of boundary points are randomly sampled on the boundary of each text instance in the segmentation result, and the shortest true distance between each boundary point and the real text instance is obtained. It can be understood that, as shown in Figure 2 above, the progressive scale expansion algorithm gradually expands the smallest-scale kernels to the largest (most complete) shape, thereby separating adjacent text. However, the segmentation result obtained by this method is not very precise, so n boundary points are randomly sampled along each text instance of the most complete segmentation result, and a distance map of these boundary points to the real text instance (i.e., the true distances) is obtained.
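The boundary sampling and distance-map computation can be sketched as follows. This is an illustrative NumPy version under the assumption that each boundary is given as a set of (x, y) points; the function name and sampling scheme are hypothetical, not taken from the patent.

```python
import numpy as np

def boundary_distance_map(pred_boundary, gt_boundary, n=16, seed=0):
    """Randomly sample n points on a predicted text-instance boundary and,
    for each, take the shortest Euclidean distance to the ground-truth
    boundary. Both inputs are (M, 2) arrays of (x, y) coordinates."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(pred_boundary), size=n,
                     replace=len(pred_boundary) < n)
    sampled = pred_boundary[idx]                        # (n, 2) sampled points
    # pairwise distances (n, M); per-point minimum gives the distance map
    diff = sampled[:, None, :] - gt_boundary[None, :, :]
    return np.linalg.norm(diff, axis=-1).min(axis=1)    # (n,) distance map
```

When the predicted boundary coincides with the ground truth, every entry of the distance map is zero, which is the target the auxiliary loss drives toward.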
Step S25: Obtain the initial text region based on the true distance, the predicted distance and a preset loss function, and perform a second preset feature extraction operation on the initial text region to obtain a local feature map.
In this embodiment, after the second preset number of boundary points are randomly sampled on the boundary of each text instance in the segmentation result and the shortest true distance between each boundary point and the real text instance is obtained, the initial text region is obtained based on the true distance, the predicted distance and a preset loss function, and the second preset feature extraction operation is performed on the initial text region to obtain the local feature map. The predicted distance is the shortest distance between a predicted text-box boundary point in the segmentation result and the real text instance. It can be understood that an L2 distance loss function is used for supervision, with the following specific formula:
where f(x_i) is the predicted distance from text-box boundary point x_i to the real text instance, and y_i is the true distance. When y_i is 0, the loss function becomes the following formula:
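The two formulas referenced above did not survive extraction. A standard L2 reconstruction consistent with the surrounding definitions (f(x_i) the predicted distance, y_i the true distance over n sampled boundary points; this is an assumed form, not the patent's verbatim equation) is:

```latex
L_{\mathrm{dist}} = \frac{1}{n}\sum_{i=1}^{n}\bigl(f(x_i) - y_i\bigr)^{2}
```

and, when y_i = 0 for the sampled points, this reduces to:

```latex
L_{\mathrm{dist}} = \frac{1}{n}\sum_{i=1}^{n} f(x_i)^{2}
```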
At this point, the final text region detection result (i.e., the initial text region) is obtained. As shown in Figure 2 above, after the text region detection result is obtained, it is input into the ResNet18 network and, after a 1×1 convolution, the local feature map R^{H×W×4C} is obtained.
Step S26: Obtain a fused feature map based on the global feature map and the local feature map, and input the fused feature map into the text recognition module to perform a text recognition operation.
In this embodiment, the global feature map and the local feature map are concatenated channel by channel (concat) to obtain a spliced feature map R^{H×W×8C}. The spliced feature map is then divided evenly into k parts along the channel dimension, each part being an R^{H×W×8C/k} split feature map.
After the k split feature maps are obtained, they are input into the global-local feature fusion module to obtain a refined fused feature map. Specifically, as shown in Figure 4, each R^{H×W×8C/k} split feature map is input into a global max pooling layer to obtain a third feature map of size 1×1×8C/k. At the same time, the split feature map is input into a convolution-batch normalization-activation layer (Conv-BatchNorm-ReLU) and reshaped to obtain a fourth feature map and a fifth feature map, which are then matrix-multiplied to obtain a sixth feature map. The third feature map produced by global max pooling and the sixth feature map produced by the matrix multiplication are fused by an addition operation to obtain a seventh feature map. After the above operations have been performed on every split feature map, all the seventh feature maps are concatenated channel by channel (concat) to obtain an eighth feature map R^{1×1×8C}. This passes through another convolution-batch normalization-activation layer and an upsampling operation to obtain a ninth feature map R^{H×W×8C}. Finally, the ninth feature map is added to the spliced feature map R^{H×W×8C} to obtain the fused global-local feature map R^{H×W×8C} (i.e., the fused feature map), which is input into the text recognition module to perform the text recognition operation.
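The tensor flow through the fusion module can be followed with a small NumPy sketch. This is a shape-level illustration only: a random 1×1 projection with ReLU stands in for the Conv-BatchNorm-ReLU stack, the exact reshape targets of the fourth and fifth feature maps are assumptions (they are not preserved in this text), and nearest-neighbor broadcasting stands in for the upsampling.

```python
import numpy as np

def global_local_fusion(spliced, k=4, seed=0):
    """Sketch of the global-local feature fusion module.
    spliced: (H, W, C) spliced global+local feature map (C = 8C in the
    patent's notation). Returns a fused map of the same shape."""
    rng = np.random.default_rng(seed)
    H, W, C = spliced.shape
    assert C % k == 0, "channels must split evenly into k blocks"
    c = C // k
    blocks = np.split(spliced, k, axis=-1)               # k blocks, (H, W, c)
    sevenths = []
    for b in blocks:
        # global max pooling -> (1, 1, c): the "third" feature map
        third = b.max(axis=(0, 1), keepdims=True)
        # stand-in conv-bn-relu, then reshape into two factors (assumed shapes)
        proj = np.maximum(b @ rng.standard_normal((c, c)), 0.0)
        fourth = proj.reshape(H * W, c).T                # (c, H*W)
        fifth = proj.mean(axis=-1).reshape(H * W, 1)     # (H*W, 1)
        # matrix multiplication -> (1, 1, c): the "sixth" feature map
        sixth = (fourth @ fifth).reshape(1, 1, c)
        # additive fusion of the two branches: the "seventh" feature map
        sevenths.append(third + sixth)
    # concat all seventh maps -> (1, 1, C): the "eighth" feature map
    eighth = np.concatenate(sevenths, axis=-1)
    # relu + nearest-neighbor upsample to (H, W, C): the "ninth" feature map
    ninth = np.broadcast_to(np.maximum(eighth, 0.0), (H, W, C))
    # residual addition with the spliced map gives the fused feature map
    return spliced + ninth
```

The final residual addition means the fused map never loses the information in the original spliced features; the pooled/multiplied branch only adds a non-negative global correction in this sketch.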
Further, the fused feature map is fed into the detection module to replace the previous coarse global feature map, so that a more accurate text region detection result can be obtained. The fused global and local feature map R^{H×W×8C} is then sent to the text recognition module. The recognition module consists of a bidirectional long short-term memory network (BiLSTM) and an attention decoder: the BiLSTM encodes the features, and the attention decoder decodes the encoded result to generate the final recognized characters.
The present invention provides an effective and theoretically grounded solution for improving the performance of traffic text detection and recognition, combining the global features of the traffic scene with the local features of the text regions to achieve high detection precision and recognition accuracy while guaranteeing real-time performance. It can be understood that the image preprocessing stage uses the dark channel prior algorithm to preprocess the original image, producing a clear image free of haze, rain and snow and laying the foundation for feature extraction. Based on the prior characteristics of traffic text and the real-time requirements of detection, ResNet18 is used to extract features, with the ordinary convolution layers replaced by fusion convolution layers with three different kernels (1×3, 3×1 and 3×3) to extract coarse global features, and multi-scale information is fused through upsampling. In the detection stage, the progressive scale expansion algorithm is first used to obtain a rough estimate of the text region; a distance map (the distance to the real bounding box) is then computed by sampling boundary points of the predicted text region, and an auxiliary loss is used for supervision to reduce this distance, thereby obtaining a more accurate text instance region. To learn more salient text features, the text instance region is input into the ResNet18 network to extract local features, handling dense scene text such as traffic text in a simple and efficient way. A global-local feature fusion module is proposed that fully fuses the whole-image features with the local text region features through global max pooling, convolution-batch normalization-activation layers and matrix multiplication, and then feeds them back into the detection network to update the feature map, thereby improving the detection precision of traffic text and fully preparing for the subsequent recognition module. The proposed detection and recognition method can be learned end to end and achieves high detection precision and recognition accuracy while guaranteeing real-time performance.
For the specific content of the above steps S21 and S22, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be repeated here.
It can be seen that the embodiment of the present invention obtains an image to be processed containing traffic text and performs an image preprocessing operation on it to obtain a preprocessed image; performs feature extraction on the preprocessed image through the first preset feature extraction operation to obtain a global feature map; uses the progressive scale expansion algorithm to perform segmentation prediction on the global feature map to obtain a segmentation result; randomly samples a second preset number of boundary points on the boundary of each text instance in the segmentation result, and obtains the shortest true distance between each boundary point and the real text instance; obtains the initial text region based on the true distance, the predicted distance and a preset loss function, and performs a second preset feature extraction operation on the initial text region to obtain a local feature map; and obtains a fused feature map based on the global feature map and the local feature map, and inputs the fused feature map into the text recognition module for text recognition, achieving high detection precision and recognition accuracy while guaranteeing real-time performance.
Referring to Figure 5, an embodiment of the present invention also discloses a traffic text detection and recognition device, which includes:
an image preprocessing module 11, used to obtain an image to be processed that contains traffic text, and perform an image preprocessing operation on the image to be processed to obtain a preprocessed image;
a first feature extraction module 12, used to perform feature extraction on the preprocessed image through a first preset feature extraction operation to obtain a global feature map;
a text region detection module 13, used to perform text region detection on the global feature map using a progressive scale expansion algorithm to obtain an initial text region;
a second feature extraction module 14, used to perform a second preset feature extraction operation on the initial text region to obtain a local feature map;
a fusion module 15, used to obtain a fused feature map based on the global feature map and the local feature map;
a text recognition module 16, used to input the fused feature map into the text recognition module to perform a text recognition operation.
It can be seen that the present invention includes: obtaining an image to be processed containing traffic text, and performing an image preprocessing operation on it to obtain a preprocessed image; performing feature extraction on the preprocessed image through a first preset feature extraction operation to obtain a global feature map; performing text region detection on the global feature map using a progressive scale expansion algorithm to obtain an initial text region, and performing a second preset feature extraction operation on the initial text region to obtain a local feature map; and obtaining a fused feature map based on the global feature map and the local feature map, and inputting the fused feature map into the text recognition module for text recognition. It can thus be seen that the present invention improves the clarity of traffic text images through image preprocessing; extracts features from the preprocessed image through the first preset feature extraction operation so as to guarantee detection precision with fewer parameters; performs text region detection on the global feature map to improve localization precision and obtain a more accurate text region; and performs text detection and recognition by combining the global features of the traffic scene image with the local features of the text candidate regions, achieving high detection precision and recognition accuracy while guaranteeing real-time performance.
In some specific embodiments, the image preprocessing module 11 specifically includes:
an image acquisition unit, used to obtain an image to be processed that contains traffic text;
a dark channel acquisition unit, used to solve for the dark channel corresponding to the image to be processed;
a point selection unit, used to select a preset percentage of points from the dark channel;
a global atmospheric light estimation unit, used to compute the mean of the points in the image to be processed that correspond to the preset percentage of points, to obtain a global atmospheric light estimate;
a transmission calculation unit, used to obtain each point of the image to be processed and compute the transmission of the local window corresponding to each point, where the local window is a local window centered on that point;
a preprocessed image acquisition unit, used to obtain the preprocessed image based on each point, the global atmospheric light estimate, the transmission and a preset foggy-weather degradation model.
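The dehazing pipeline carried out by the units above (dark channel, top-percentage point selection, atmospheric light, transmission, degradation-model inversion) can be sketched as follows. The patch size, top percentage, omega weight and lower transmission bound are conventional dark-channel-prior values assumed for illustration, not values specified by the patent.

```python
import numpy as np

def dehaze_dark_channel(img, patch=3, top_pct=0.001, omega=0.95, t0=0.1):
    """Sketch of the dark-channel-prior preprocessing stage.
    img: (H, W, 3) float array with values in [0, 1]."""
    H, W, _ = img.shape
    # dark channel: per-pixel channel minimum, then local-window minimum
    chan_min = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(chan_min, pad, mode="edge")
    dark = np.stack([padded[i:i + H, j:j + W]
                     for i in range(patch)
                     for j in range(patch)]).min(axis=0)
    # atmospheric light A: mean of image points at the brightest
    # top_pct fraction of dark-channel points
    n_top = max(1, int(top_pct * H * W))
    idx = np.argsort(dark.ravel())[-n_top:]
    A = img.reshape(-1, 3)[idx].mean(axis=0)
    # transmission estimate from the haze model I = J*t + A*(1 - t)
    t = 1.0 - omega * (img / A).min(axis=2)
    t = np.clip(t, t0, 1.0)
    # invert the degradation model to recover the scene radiance J
    J = (img - A) / t[..., None] + A
    return np.clip(J, 0.0, 1.0)
```

The clipping of the transmission to a lower bound t0 avoids division blow-ups in dense haze regions, a standard safeguard in dark-channel implementations.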
In some specific embodiments, the first feature extraction module 12 specifically includes:
a convolution layer replacement unit, used to replace the original convolution layers in the ResNet18 network with target convolution layers, where a target convolution layer is a convolution layer containing a first convolution kernel, a second convolution kernel and a third convolution kernel;
a first feature map acquisition unit, used to obtain a first preset number of first feature maps output by the target convolution layers;
a second feature map acquisition unit, used to upsample the first preset number of first feature maps to obtain corresponding second feature maps;
a target feature map acquisition unit, used to add the feature maps of the same size and channels in the first feature maps and the second feature maps to obtain target feature maps;
a fused-feature map acquisition unit, used to concatenate the target feature maps along the channel dimension to obtain a feature map after feature fusion;
a global feature map acquisition unit, used to perform the upsampling on the feature map after feature fusion to obtain the global feature map.
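The target convolution layer with three kernels (elsewhere described as 1×3, 3×1 and 3×3) can be sketched as follows. This is an illustrative single-channel NumPy version with random weights standing in for learned parameters, and addition is assumed as the fusion of the three branches.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Naive 'same'-padded 2-D convolution on a single-channel map."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw)))
    H, W = x.shape
    out = np.zeros_like(x, dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = (padded[i:i + kh, j:j + kw] * kernel).sum()
    return out

def mixed_kernel_conv(x, seed=0):
    """Sketch of the mixed-kernel layer: 1x3, 3x1 and 3x3 branches applied
    to the same input and fused by addition (assumed fusion)."""
    rng = np.random.default_rng(seed)
    branches = [rng.standard_normal(s) for s in ((1, 3), (3, 1), (3, 3))]
    return sum(conv2d_same(x, k) for k in branches)
```

The asymmetric 1×3 and 3×1 branches are a common way to capture the extreme aspect ratios of traffic text at lower parameter cost than stacking only square kernels.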
In some specific embodiments, the text region detection module 13 specifically includes:
a global feature map segmentation prediction unit, used to perform segmentation prediction on the global feature map using the progressive scale expansion algorithm to obtain a segmentation result;
a boundary point sampling unit, used to randomly sample a second preset number of boundary points on the boundary of each text instance in the segmentation result;
a true distance acquisition unit, used to obtain the true distance between each boundary point and the real text instance;
an initial text region determination unit, used to obtain the initial text region based on the true distance, the predicted distance and a preset loss function, where the predicted distance is the shortest distance between a predicted text-box boundary point in the segmentation result and the real text instance.
In some specific embodiments, the second feature extraction module 14 specifically includes:
a local feature map acquisition unit, used to input the initial text region into the ResNet18 network for convolution operations to obtain the local feature map.
In some specific embodiments, the fusion module 15 specifically includes:
a spliced feature map acquisition unit, used to concatenate the global feature map and the local feature map to obtain a spliced feature map;
a split feature map acquisition unit, used to divide the spliced feature map according to its number of channels to obtain split feature maps equal in number to the number of channels;
a third feature map acquisition unit, used to input each split feature map into a global max pooling layer to obtain a third feature map;
a fourth and fifth feature map acquisition unit, used to input the split feature map into a convolution-batch normalization-activation layer and reshape it to obtain a fourth feature map and a fifth feature map;
a sixth feature map acquisition unit, used to perform matrix multiplication of the fourth feature map and the fifth feature map to obtain a sixth feature map;
a seventh feature map acquisition unit, used to fuse the third feature map and the sixth feature map to obtain a seventh feature map;
an eighth feature map acquisition unit, used to concatenate all the seventh feature maps to obtain an eighth feature map;
a ninth feature map acquisition unit, used to input the eighth feature map into the convolution-batch normalization-activation layer and perform the upsampling operation to obtain a ninth feature map;
a fused feature map acquisition unit, used to add and fuse the ninth feature map and the spliced feature map to obtain the fused feature map.
In some specific embodiments, the text recognition module 16 specifically includes:
a text recognition unit, used to input the fused feature map into the text recognition module to perform a text recognition operation.
Further, an embodiment of the present invention also provides an electronic device. Figure 6 is a structural diagram of an electronic device 20 according to an exemplary embodiment, and its content should not be regarded as any limitation on the scope of application of the present invention.
Figure 6 is a schematic structural diagram of an electronic device 20 provided by an embodiment of the present invention. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25 and a communication bus 26. The memory 22 is used to store a computer program, which is loaded and executed by the processor 21 to implement the relevant steps of the traffic text detection and recognition method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in this embodiment may specifically be an electronic computer.
In this embodiment, the power supply 23 is used to provide a working voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and external devices, and the communication protocol it follows may be any communication protocol applicable to the technical solution of the present invention, which is not specifically limited here; the input/output interface 25 is used to obtain input data from the outside or output data to the outside, and its specific interface type may be selected according to specific application needs, which is not specifically limited here.
In addition, the memory 22, as a carrier for resource storage, may be a read-only memory, a random access memory, a magnetic disk, an optical disk or the like; the resources stored on it may include an operating system 221, a computer program 222, etc., and the storage mode may be transient or persistent.
The operating system 221 is used to manage and control the hardware devices and the computer program 222 on the electronic device 20, and may be Windows Server, Netware, Unix, Linux, etc. In addition to computer programs capable of implementing the traffic text detection and recognition method executed by the electronic device 20 disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs capable of performing other specific tasks.
进一步的,本发明实施例还公开了一种存储介质,所述存储介质中存储有计算机程序,所述计算机程序被处理器加载并执行时,实现前述任一实施例公开的交通文本检测与识别方法步骤。Further, the embodiment of the present invention also discloses a storage medium. A computer program is stored in the storage medium. When the computer program is loaded and executed by the processor, the traffic text detection and recognition disclosed in any of the foregoing embodiments is realized. Method steps.
本说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似部分互相参见即可。对于实施例公开的装置而言,由于其与实施例公开的方法相对应,所以描述的比较简单,相关之处参见方法部分说明即可。Each embodiment in this specification is described in a progressive manner. Each embodiment focuses on its differences from other embodiments. The same or similar parts between the various embodiments can be referred to each other. As for the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple. For relevant details, please refer to the description in the method section.
最后,还需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。Finally, it should be noted that in this article, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply that these entities or any such actual relationship or sequence between operations. Furthermore, the terms "comprises," "comprises," or any other variations thereof are intended to cover a non-exclusive inclusion such that a process, method, article, or apparatus that includes a list of elements includes not only those elements, but also those not expressly listed other elements, or elements inherent to the process, method, article or equipment. Without further limitation, an element defined by the statement "comprises a..." does not exclude the presence of additional identical elements in a process, method, article, or apparatus that includes the stated element.
The traffic text detection and recognition method, device, equipment, and storage medium provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the invention; the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. Meanwhile, those of ordinary skill in the art may, based on the idea of the invention, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the invention.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311558676.8A (CN117593733A) | 2023-11-21 | 2023-11-21 | A traffic text detection and recognition method, device, equipment and storage medium |
| Publication Number | Publication Date |
|---|---|
| CN117593733A | 2024-02-23 |
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311558676.8A (Pending, CN117593733A) | A traffic text detection and recognition method, device, equipment and storage medium | 2023-11-21 | 2023-11-21 |
| Country | Link |
|---|---|
| CN (1) | CN117593733A (en) |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119625707A * | 2024-11-26 | 2025-03-14 | 山东大学 (Shandong University) | Handwritten Chinese recognition method and system based on text detection and text recognition |
| CN119625707B * | 2024-11-26 | 2025-06-10 | 山东大学 (Shandong University) | Handwritten Chinese recognition method and system based on text detection and text recognition |
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |