Technical Field
The present invention relates to the technical fields of industrial Internet of Things information perception in the Internet and cloud computing industries and artificial intelligence for production in the artificial intelligence industry, and in particular to a deep learning model, method, storage medium, and device for identifying and segmenting linear structures in images.
Background Art
With the continuing advance of intelligent and unmanned technology, research on and application of drones have become increasingly widespread. In the military field, drones are widely used for reconnaissance, target identification, communication relay, electronic countermeasures, and other tasks. In the civilian field, they are widely used in aerial photography, agriculture, plant protection, express delivery, disaster relief, wildlife observation, power line inspection, film and television production, and more. Developed countries are actively expanding industry applications and developing drone technology, characterizing the drone as a "flying camera". Image processing and object detection are already mature and widely applied in drone operations, but there is currently no solution dedicated to recognizing and avoiding electric wires while a drone is working.
Drones are currently used extensively in power line inspection, but such operations are generally carried out only when the weather and other external conditions are suitable; the route is planned in advance according to the layout of the power lines, and the drone then photographs and inspects the lines from above.
In the course of implementing the present invention, the applicant found that conventional methods for identifying and segmenting linear structures in images have the following technical defects:
(1) Poor real-time applicability: the datasets used in existing research on wire detection consist of overhead or close-up views with unrepresentative backgrounds, so the resulting algorithms cannot be applied to the real-time operation of drones.
(2) Low accuracy: existing techniques achieve low recognition rates for wires during real-time drone flight, and new models are needed that suit real-time operation.
(3) Poor image clarity: most images in existing datasets were taken with professional cameras, whereas ordinary civilian drones are not equipped with high-quality cameras, so image processing is needed to improve image quality. Moreover, drones may have to operate in rain or fog, and ordinary drones carry no image processing tools for complex weather.
Given the above three aspects, there is currently no mature solution for wire identification and obstacle avoidance in real-time drone operations.
Summary of the Invention
1. Technical Problems to Be Solved
The present invention aims to at least partially solve one of the above technical problems.
2. Technical Solution
A first aspect of the present invention provides a deep learning model for identifying and segmenting linear structures in an image, comprising:
a main neural network, comprising:
an encoder module comprising N encoder blocks, N ≥ 2, the encoder blocks being used to convert the image to be recognized into spatially reduced feature maps;
a feature fusion module, which derives from the spatially reduced feature maps a fused feature map combining global context and local details;
a residual decoder module comprising N decoder blocks, the residual decoder module taking the fused feature map as input, progressively restoring the resolution of the feature maps, and refining the segmentation result;
a recognition result output module for converting the refined segmentation result into a binary segmentation mask for output; and
an edge detection decoder neural network, which takes the outputs of the first two encoder blocks in the encoder module as input and is used to strengthen the learning of edge features;
wherein the edge features extracted by the edge detection decoder neural network are fed into the corresponding decoder blocks.
A second aspect of the present invention provides a method for identifying and segmenting linear structures in an image, comprising: sharpening an original image to obtain an image to be recognized; and inputting the image to be recognized into the above deep learning model for identifying and segmenting linear structures in images, to obtain a recognition result for the linear structures in the input image.
A third aspect of the present invention provides a computer-readable storage medium having a computer program stored thereon which, when executed by a processor, implements the deep learning model as described above, or the method as described above.
A fourth aspect of the present invention provides a computer device comprising: a processor; and a memory having a computer program stored thereon; when the processor executes the computer program, it implements the deep learning model as described above, or the method as described above.
3. Beneficial Effects
It can be seen from the above technical solution that, compared with the prior art, the present invention has at least one of the following beneficial effects:
1. In the present invention, the deep learning model performs boundary prediction and semantic segmentation simultaneously, which significantly enhances its ability to learn edge features during training. On the one hand, the edge detection decoder neural network performs cross-layer feature fusion at different scales, effectively improving the model's sensitivity and accuracy in edge detection; on the other hand, a loss is computed between the predicted map and the ground-truth labels, enabling the main neural network to capture boundary information effectively and enrich the hierarchical features of the segmentation network, yielding finer segmentation results.
2. A single loss function offers only limited improvement to the model, so the present invention proposes a weighted loss function that assigns weights to the background and wire losses to address the sample imbalance problem in image segmentation, preventing it from disturbing the training convergence process and improving the model's generalization performance at test time.
3. To make full use of the edge information in low-level features while reducing noise interference, the present invention adopts a gated fusion strategy (Gated Fusion) to fuse the features of the first two encoder stages. The fused features pass through convolutional layers for edge extraction and are superimposed on the decoder's output features, so that edge features are effectively recovered during upsampling.
4. In working environments with complex external conditions, images may be over-exposed. The present invention divides the image into a sky region and a non-sky region, computes the illumination intensity from the non-sky region, and finally restores the brightness accordingly, improving the sharpening effect.
5. In the present invention, edge information is extracted from the image by combining a median filter with a minimum filter, which captures more, and more detailed, edge information than a traditional single filter.
6. In the present invention, the transmission map is optimized with the guided filtering algorithm, which avoids blurring and artifacts, further and effectively suppresses noise in the image, and also markedly improves computational efficiency.
Brief Description of the Drawings
FIG. 1 is an overall flowchart of the method for identifying and segmenting linear structures in an image according to an embodiment of the present invention.
FIG. 2 is a flowchart of the image sharpening part of the method for identifying and segmenting linear structures in an image shown in FIG. 1.
FIG. 3 shows the images obtained by executing each step of the image sharpening part shown in FIG. 2.
FIG. 4 is a diagram of the main structure of the EdgeAware TRUNet neural network model used for linear structure recognition in the method for identifying and segmenting linear structures in an image shown in FIG. 1.
FIG. 5 is a processing flowchart of the edge detection decoder neural network in the EdgeAware TRUNet neural network model of FIG. 4.
FIG. 6 is a schematic flowchart of a method for identifying and segmenting power lines in aerial images according to an embodiment of the present invention.
FIG. 7 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present invention.
FIG. 8 is a schematic diagram of the structure of a computer device according to an embodiment of the present invention.
Detailed Description
The application scenario for which the present invention is designed is the real-time operation of drones in unknown environments, where wires are identified from the forward-looking, long-range images taken by the drone's onboard camera. The design goals are to provide a tool that can rapidly process images under a variety of complex weather conditions, and to train a semantic-segmentation-based deep learning model validated on multiple test cases.
Those skilled in the art should understand that, although the present invention was developed for the scenario of drones identifying wires, it can equally be applied to other scenarios in which linear structures are identified in images, and these likewise fall within the scope of protection of the present invention.
The present invention as a whole is divided into two parts:
① Image sharpening
Because real-time images obtained in unknown environments are affected by weather and other external factors, which lowers the wire recognition rate, this part sharpens the acquired images.
② Neural network recognition
A deep learning model is used to identify wires in the sharpened image, and the drone is warned according to whether threatening wires are present in the real-time image, effectively preventing the drone from crashing or being damaged.
For ease of understanding, the description below follows the order of the above two parts. However, those skilled in the art should understand that, for images that have not been sharpened, if the resolution meets particular requirements the deep learning model of the present invention can also be applied directly for linear structure recognition; that is, the "deep learning model for identifying and segmenting linear structures in images" of the present invention can be implemented on its own, which likewise falls within the scope of protection of the present invention.
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
In an exemplary embodiment of the present invention, a method for identifying and segmenting linear structures in an image is provided. In this embodiment, the image is a real-time image obtained by the camera of a drone in flight; due to hardware and bandwidth limitations, the resolution of this real-time image may be low. In this embodiment, the linear structures are wires strung in the air, which may be high-voltage cables, civilian cables, ropes, or similar structures.
FIG. 1 is an overall flowchart of the method for identifying and segmenting linear structures in an image according to an embodiment of the present invention. As shown in FIG. 1, this embodiment has three main parts:
① image sharpening;
In the image sharpening part, the present invention first performs rapid image sharpening using the dark channel prior algorithm, the guided filtering algorithm, and the gray world assumption.
② wire identification;
In the wire identification part, the present invention introduces an advanced deep learning model, EdgeAware TRUNet, for the accurate identification and segmentation of power lines. The model combines an encoder-decoder structure, a Transformer encoder module, and a dedicated edge detection decoder neural network, aiming to strengthen the learning and extraction of edge features and to ensure accurate and efficient power line segmentation.
③ uploading the identification information and issuing a warning.
Through the above image processing and deep learning techniques, the method for identifying and segmenting linear structures in images of the present invention demonstrates significant practicality and advantages in real-time operation scenarios such as drone aerial photography.
The following description focuses on parts ① and ②. FIG. 2 is a flowchart of the image sharpening part of the method for identifying and segmenting linear structures in an image shown in FIG. 1. FIG. 3 shows the images obtained by executing each step of the image sharpening part shown in FIG. 2.
As shown in FIG. 2, in this embodiment the image sharpening part includes:
Step A: distinguish the sky region and the non-sky region in the original image IMG, store them as the sky region image Sky_IMG and the non-sky region image Non_Sky_IMG respectively, and compute the illumination intensity A from the non-sky region image;
Step B: for the original image, obtain all of its edges by combining a median filter and a minimum filter, yielding the edge image D;
Step C: for the original image, extract its transmission map using the illumination intensity A; then, using the transmission map and the grayscale version of the original image, refine it with the guided filtering algorithm to obtain the refined transmission image Tr_G;
Step D: for the non-sky region image, compute in the RGB color space the average color difference d_c of each color channel, c ∈ {R, G, B};
Step E: for the non-sky region image, apply the image restoration formula to each color channel of the RGB color space to obtain a sharpened image for each channel; the sharpened channel images are then combined to obtain the sharpened restored image J;
Step F: combine the restored image J with the sky region image Sky_IMG to obtain the sharpened image to be recognized.
Each step of the image sharpening part is described in detail below.
In a real scene, the original aerial image of power lines is as shown in FIG. 3(a). Given the camera's exposure, the sky is generally far brighter than the rest of the image, so the sky and non-sky regions of the image must first be separated as preprocessing for the subsequent computation of the atmospheric light value. Accordingly, in this embodiment, step A further comprises:
Sub-step A1: convert the original image IMG from the RGB color space to the HSV color space, yielding HSV_IMG;
Sub-step A2: separate the H, S, and V color channels of the HSV color space;
Sub-step A3: compute the average brightness AVG_B from the V channel;
Sub-step A4: convert the original image into a grayscale image Gray_IMG; if the average brightness AVG_B is greater than 120, apply histogram equalization to the grayscale image; if AVG_B is less than 120, keep the current grayscale image;
Sub-step A5: in the grayscale image Gray_IMG, classify the part whose gray level is greater than the gray threshold as the non-sky region, and the part whose gray level is less than or equal to the gray threshold as the sky region;
Sub-step A6: store the non-sky region and the sky region as the non-sky region image Non_Sky_IMG and the sky region image Sky_IMG respectively;
The non-sky region image Non_Sky_IMG is shown in FIG. 3(b).
Sub-step A7: compute the illumination intensity A from the non-sky region image Non_Sky_IMG.
It should be specifically noted that in this embodiment, in working environments with complex external conditions the image may be over-exposed; the present invention divides the image into sky and non-sky regions, computes the illumination intensity from the non-sky region, and finally restores the brightness accordingly, improving the sharpening effect.
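By way of illustration, a minimal Python sketch of sub-steps A1 to A7 follows (using OpenCV and NumPy). The gray threshold and the exact estimator for the illumination intensity A are not fixed by the text, so the threshold value and the bright-pixel averaging used here are assumptions.

```python
import cv2
import numpy as np

def split_sky_and_estimate_light(img_bgr, gray_thresh=180):
    """Sub-steps A1-A7: split sky / non-sky and estimate illumination A."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)            # A1: RGB -> HSV
    h, s, v = cv2.split(hsv)                                  # A2: split channels
    avg_b = float(v.mean())                                   # A3: average brightness
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)          # A4: grayscale
    if avg_b > 120:
        gray = cv2.equalizeHist(gray)                         # equalize when bright
    non_sky_mask = gray > gray_thresh                         # A5: threshold split
    sky_img = img_bgr.copy()                                  # A6: store both regions
    sky_img[non_sky_mask] = 0
    non_sky_img = img_bgr.copy()
    non_sky_img[~non_sky_mask] = 0
    # A7 (assumed estimator): mean of the brightest 0.1% of non-sky pixels,
    # as commonly done in dark-channel dehazing.
    pixels = img_bgr[non_sky_mask].astype(np.float64)
    k = max(1, int(0.001 * len(pixels)))
    brightest = pixels[np.argsort(pixels.sum(axis=1))[-k:]]
    A = brightest.mean(axis=0)                                # per-channel A_c
    return sky_img, non_sky_img, A, avg_b
```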
In this embodiment, step B obtains all edges of the image by combining a traditional median filter and a minimum filter. Specifically, step B further comprises:
Sub-step B1: for the original image, take the minimum across the color channels of the RGB color space to obtain the minimum image W;
Sub-step B2: denoise the minimum image W with a median filter to obtain W_Median, where the median filter takes the middle value within each 3×3 pixel window;
Sub-step B3: divide the minimum image W into 3×3 windows and erode each with its minimum value to obtain W_Min;
Sub-step B4: compute the edge image D: D = Omega*min(W_Median, W) - W_Min, where Omega is a preset parameter ranging from 0.9 to 0.99.
In this embodiment, the edge image D obtained in sub-step B4 is shown in FIG. 3(d).
It should be specifically noted that in this embodiment, extracting the edge information of the image by combining a median filter with a minimum filter captures more, and more detailed, edge information than a traditional single filter.
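The following short Python sketch reproduces sub-steps B1 to B4 under these definitions; the 3×3 minimum erosion is implemented with OpenCV's erode, and Omega defaults to 0.95 within the stated 0.9-0.99 range.

```python
import cv2
import numpy as np

def edge_image(img_bgr, omega=0.95):
    """Step B: edge map from combined median and minimum filtering."""
    w = img_bgr.min(axis=2).astype(np.uint8)        # B1: per-pixel channel minimum
    w_median = cv2.medianBlur(w, 3)                 # B2: 3x3 median denoising
    kernel = np.ones((3, 3), np.uint8)
    w_min = cv2.erode(w, kernel)                    # B3: 3x3 minimum erosion
    d = (omega * np.minimum(w_median, w).astype(np.float64)
         - w_min.astype(np.float64))                # B4: D = Omega*min(...) - W_Min
    return np.clip(d, 0, 255).astype(np.uint8)
```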
In this embodiment, step C is mainly used for smoothing and enhancement in image processing. The guided filter combines information from the input image and the guide image, achieving denoising, enhancement, and edge preservation at the same time. Specifically, step C further comprises:
Sub-step C1: obtain the transmission image from the original image: Tr = 1 - 0.95*(erode(min(IMG/A, 2))) - D, where erode() is the erosion function;
In this embodiment, the transmission image Tr obtained in sub-step C1 is shown in FIG. 3(c).
Sub-step C2: with a preset radius parameter R, compute the mean IMG_I of the original image, the mean IMG_p of the transmission image, and the mean IMG_Ip of the product of the original image and the transmission image;
In this embodiment, R = 20; in other embodiments of the present invention, 5 ≤ R ≤ 50.
Sub-step C3: compute the covariance Cov_Ip and the variance Var_I from the means over the grayscale image;
Sub-step C4: compute parameters a and b: a = Cov_Ip/(Var_I + Eps); b = IMG_p - a*IMG_I, where Eps is a preset parameter, Eps ≤ 0.0001;
Sub-step C5: filter the grayscale image with parameters a and b to obtain IMG_A and IMG_B;
Sub-step C6: obtain the refined transmission image Tr_G from IMG_A and IMG_B: Tr_G = IMG_A*IMG + IMG_B.
In this embodiment, the refined transmission image Tr_G obtained in sub-step C6 is shown in FIG. 3(e).
It should be specifically noted that in this embodiment, optimizing the transmission map with the guided filtering algorithm avoids blurring and artifacts, further and effectively suppresses noise in the image, and also markedly improves computational efficiency.
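Sub-steps C2 to C6 follow the classic guided filter; a minimal sketch is given below, with the grayscale image as the guide I and the raw transmission map as the input p. Box filtering is used for the window means, and r = 20 and Eps = 1e-4 follow the embodiment's parameter ranges.

```python
import cv2
import numpy as np

def guided_filter(I, p, r=20, eps=1e-4):
    """Steps C2-C6: refine transmission map p with grayscale guide I."""
    ksize = (2 * r + 1, 2 * r + 1)
    mean_I = cv2.blur(I, ksize)                       # C2: window means
    mean_p = cv2.blur(p, ksize)
    mean_Ip = cv2.blur(I * p, ksize)
    cov_Ip = mean_Ip - mean_I * mean_p                # C3: covariance and variance
    var_I = cv2.blur(I * I, ksize) - mean_I ** 2
    a = cov_Ip / (var_I + eps)                        # C4: linear coefficients
    b = mean_p - a * mean_I
    mean_a = cv2.blur(a, ksize)                       # C5: smooth the coefficients
    mean_b = cv2.blur(b, ksize)
    return mean_a * I + mean_b                        # C6: Tr_G = IMG_A*IMG + IMG_B

# Usage: Tr_G = guided_filter(gray.astype(np.float64) / 255.0, Tr)
```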
In this embodiment, step E is mainly used to restore the image. The image restoration formula is:
Jc = (Non_Sky_IMGc - Ac) / max(Tr_G, T(0)) + Ac
where Jc is the sharpened non-sky region image for the given channel; Non_Sky_IMGc is the non-sky region image of that channel before sharpening; Ac is the illumination intensity of that channel of the non-sky region image before sharpening; Tr_G is the refined transmission image; and T(0) is a set threshold ranging from 0.05 to 0.4.
It should be specifically noted that in this embodiment, applying the image restoration formula to each color channel of the non-sky region image separately restores the colors and preserves the image's fidelity better.
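A per-channel implementation of this restoration formula might look as follows; T(0) defaults to 0.1 within the stated 0.05-0.4 range, and A is the per-channel illumination from sub-step A7.

```python
import numpy as np

def restore_non_sky(non_sky_img, A, tr_g, t0=0.1):
    """Step E: per-channel recovery Jc = (Ic - Ac)/max(Tr_G, T0) + Ac."""
    img = non_sky_img.astype(np.float64)
    t = np.maximum(tr_g, t0)                 # clamp transmission at T(0)
    J = np.empty_like(img)
    for c in range(3):                       # restore each color channel
        J[..., c] = (img[..., c] - A[c]) / t + A[c]
    return np.clip(J, 0, 255).astype(np.uint8)
```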
In this embodiment, step F further comprises:
Sub-step F1: combine the sharpened restored image J with the sky region image Sky_IMG to obtain an intermediate image O;
Sub-step F2: for the intermediate image O, perform adaptive brightness adjustment using the average brightness AVG_B of the original image, yielding the sharpened image to be recognized, as shown in FIG. 3(f).
In the second part of this embodiment, wire identification, a new power line extraction network named EdgeAware TRUNet is proposed. The network is designed specifically for the automatic extraction of power lines from aerial images and aims to improve the accuracy and efficiency of power line segmentation by combining advanced techniques from the field of deep learning.
FIG. 4 is a diagram of the main structure of the EdgeAware TRUNet neural network model used for linear structure recognition in the method for identifying and segmenting linear structures in an image shown in FIG. 1.
As shown in FIG. 4, in this embodiment the deep learning model for identifying and segmenting linear structures in images comprises a main neural network and an edge detection decoder neural network. The output of the main decoder block passes through a 1×1 convolutional layer and a sigmoid activation function to generate a binary segmentation mask; a combined Dice and BCE loss is computed between this mask and the ground-truth labels, and minimizing these losses optimizes the model's segmentation, yielding the final power line segmentation map. The edge decoder neural network generates an edge feature map used to compute an edge loss against the ground-truth labels.
The main neural network comprises:
an encoder module comprising N encoder blocks, N ≥ 2, the encoder blocks being used to convert the image to be recognized into spatially reduced feature maps;
a feature fusion module, which derives from the spatially reduced feature maps a fused feature map combining global context and local details, comprising a Transformer encoder module and a dilated convolution module (Dilated Conv);
a residual decoder module (Residual Block) comprising N decoder blocks, the residual decoder module taking the fused feature map as input, progressively restoring the resolution of the feature maps, and refining the segmentation result;
a recognition result output module for converting the refined segmentation result into a binary segmentation mask for output; and
an edge detection decoder neural network, which takes the outputs of the first two encoder blocks in the encoder module as input and is used to strengthen the learning of edge features;
wherein the edge features extracted by the edge detection decoder neural network are fed into the corresponding decoder blocks.
In the present invention, the deep learning model performs boundary prediction and semantic segmentation simultaneously, significantly enhancing its ability to learn edge features during training. On the one hand, the edge detection decoder neural network performs cross-layer feature fusion at different scales, effectively improving the model's sensitivity and accuracy in edge detection; on the other hand, a loss is computed between the predicted map and the ground-truth labels, enabling the main neural network to capture boundary information effectively and enrich the hierarchical features of the segmentation network, yielding finer segmentation results.
The deep learning model for identifying and segmenting linear structures in images of this embodiment is described in detail below.
In a real scene, the sharpened image to be recognized is fed into the EdgeAware TRUNet neural network structure.
In this embodiment, the encoder module is a ResNet50 network structure. During actual image recognition, the image to be recognized is passed through the pretrained ResNet50 network structure to obtain feature information.
The encoder module consists of four ResNet50 encoder blocks (ResNet50 encoder). Each ResNet50 encoder block contains several residual bottleneck structures (Residual Bottleneck Blocks) and pooling layers (Pooling Layers), converting the input image into spatially reduced feature representations that serve as the encoder's output, with the aim of extracting deep features from the image.
In this embodiment, the feature fusion module comprises a Transformer encoder module and a dilated convolution module. The encoder module's output is passed to the Transformer encoder module and the dilated convolution module (Dilated Conv) for processing, yielding a fused feature map that combines global context with local details.
① Transformer encoder module
The Transformer encoder module contains an attention network and a feed-forward neural network; this design enhances the robustness of the model.
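As a sketch of this stage, the deepest encoder feature map can be flattened into a token sequence and passed through a standard PyTorch Transformer encoder; the depth, head count, and channel width below are illustrative assumptions rather than values fixed by this embodiment.

```python
import torch.nn as nn

class TransformerStage(nn.Module):
    """Attention + feed-forward over the flattened encoder feature map."""
    def __init__(self, dim=2048, heads=8, depth=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True)                       # attention + feed-forward
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):                           # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)       # (B, H*W, C) token sequence
        tokens = self.encoder(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)
```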
② Dilated convolution module (Dilated Conv)
The dilated convolution module enlarges the encoder's receptive field through four parallel 3×3 convolutional layers (3×3 Conv, each with a different dilation rate: 1, 3, 6, and 9), thereby improving the extraction of contextual information. Each of these convolutional layers is followed by batch normalization and a ReLU activation, after which a 1×1 convolutional layer (1×1 Conv) reduces the number of feature channels. The outputs of the Transformer encoder module and the dilated convolution module (Dilated Conv) are concatenated and fed into the first decoder block;
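A sketch of this module under the stated configuration follows; the input and output channel counts are assumptions.

```python
import torch
import torch.nn as nn

class DilatedConvModule(nn.Module):
    """Four parallel 3x3 convolutions (dilation 1, 3, 6, 9), each with
    BN + ReLU, then a 1x1 convolution to reduce the channel count."""
    def __init__(self, in_ch=2048, out_ch=256):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True))
            for d in (1, 3, 6, 9)])
        self.project = nn.Conv2d(4 * out_ch, out_ch, 1)   # 1x1 channel reduction

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]   # parallel branches
        return self.project(torch.cat(feats, dim=1))
```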
In the residual decoder module, the fused feature map is fed into the network, the resolution of the feature maps is progressively restored, and the segmentation result is refined. The residual decoder module consists of four residual decoder blocks (Residual Block); each decoder block begins with bilinear upsampling, followed by a skip connection to the features from the corresponding encoder block.
Referring to FIG. 4, in each residual decoder block:
① the fused feature map first passes through a bilinear upsampling layer to enlarge the feature map so that its size matches that of the feature map at the corresponding encoder stage;
② the upsampled feature map is then fused with the feature map from the corresponding encoder stage via a skip connection, yielding the fused feature map;
It should be specifically noted that the skip connections not only help prevent power line features from being lost in the deeper layers of the network, but also promote gradient flow during backpropagation, thereby improving network performance.
The fused feature map is fed into a residual block composed of 3×3 convolutional layers and an identity mapping. After processing by the series of residual decoder blocks, the resolution of the feature maps is gradually restored to a size close to that of the original input image while retaining rich semantic information and fine spatial detail; the feature maps are passed stage by stage to the subsequent decoder blocks, being progressively transformed into more semantically informative features.
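One residual decoder block might be sketched as follows; the channel arithmetic and the exact placement of the identity mapping are assumptions consistent with the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualDecoderBlock(nn.Module):
    """Bilinear upsampling, skip connection to the encoder feature,
    then a 3x3 residual block with an identity mapping."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch + skip_ch, out_ch, 1)
        self.body = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch))

    def forward(self, x, skip):
        x = F.interpolate(x, size=skip.shape[2:], mode="bilinear",
                          align_corners=False)            # bilinear upsampling
        x = self.reduce(torch.cat([x, skip], dim=1))      # skip connection
        return F.relu(x + self.body(x))                   # identity + residual path
```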
Referring to FIG. 4, the output of the decoder module passes through a 1×1 convolutional layer (1×1 Conv) and a sigmoid activation function to generate the binary segmentation mask; the weighted loss function LossPre is used to compute the loss between the binary segmentation mask and the ground-truth labels (GroundTruth), yielding the power line segmentation map;
Because power lines are thin and scattered, they occupy little space and are unevenly distributed compared with the background. Specifically, power line pixels typically account for only 1-5% of the total image pixels, while most of the remaining pixels belong to the background. This class imbalance biases the model heavily toward the dominant background pixels, impairing its ability to accurately classify the minority wire pixels.
LossBCE works best when the classes are evenly distributed; since power line pixels make up only a small fraction of the whole image, using LossBCE directly may cause the model to focus too much on background pixels and neglect power line pixels;
LossDice is a loss function commonly used in image segmentation tasks; it directly measures the overlap between the predicted segmentation and the ground-truth segmentation, and performs more accurately in scenes where the foreground-to-background ratio is highly imbalanced.
A single loss function offers only limited improvement to the model, so the present invention proposes a weighted loss function that assigns weights to the background and wire losses to address the sample imbalance problem in image segmentation, preventing it from disturbing the training convergence process and improving the model's generalization performance at test time.
Specifically, for the main neural network, during training the loss function LossPre is:
LossPre = α·LossBCE + (1 − α)·LossDice
where LossBCE denotes the binary cross-entropy loss, LossDice denotes the Dice similarity coefficient loss, both of which are obtained by comparing the binary segmentation mask with the ground-truth result, and α is the weight balancing parameter.
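A sketch of this weighted objective in PyTorch follows; it assumes the mask has already passed through the sigmoid, and α = 0.5 and the Dice smoothing term are illustrative defaults.

```python
import torch
import torch.nn as nn

class WeightedSegLoss(nn.Module):
    """LossPre = alpha * BCE + (1 - alpha) * Dice over sigmoid outputs."""
    def __init__(self, alpha=0.5, smooth=1.0):
        super().__init__()
        self.alpha, self.smooth = alpha, smooth
        self.bce = nn.BCELoss()

    def forward(self, pred, target):
        loss_bce = self.bce(pred, target)
        p, t = pred.flatten(1), target.flatten(1)
        inter = (p * t).sum(dim=1)
        dice = (2 * inter + self.smooth) / (p.sum(1) + t.sum(1) + self.smooth)
        loss_dice = 1 - dice.mean()                    # Dice similarity loss
        return self.alpha * loss_bce + (1 - self.alpha) * loss_dice
```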
It should be specifically noted that, considering that the edge features in the power line dataset are subtle and hard to learn, the present invention adds a dedicated edge detection decoder neural network to strengthen the learning of edge features. To make full use of the edge information in the low-level features while reducing noise interference, a gated fusion strategy (Gated Fusion) is adopted to fuse the features of the first two encoder stages. The fused features pass through convolutional layers for edge extraction and are superimposed on the decoder's output features, so that edge features are effectively recovered during upsampling.
FIG. 5 is a processing flowchart of the edge detection decoder neural network in the EdgeAware TRUNet neural network model of FIG. 4. As shown in FIG. 5, in this edge detection decoder neural network:
1. Gated fusion module
First, the gated fusion strategy (Gated Fusion) fuses the features from the first two stages of the encoder module (ResNet50 Encoder). Specifically, the two features from the encoder are each passed through a 3×3 convolutional layer (3×3 Conv), then through batch normalization (Batch Normalization) and a ReLU activation (ReLU); the two features are then concatenated, and two independent 1×1 convolutional layers (1×1 Conv) are applied to compute the combination weights, yielding F_edge1.
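The description does not spell out exactly how the two 1×1 convolutions turn the concatenated features into combination weights; one plausible reading, sketched below, applies a sigmoid gate from each 1×1 convolution to its corresponding branch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFusion(nn.Module):
    """Gated fusion of the first two encoder-stage features into F_edge1."""
    def __init__(self, ch1, ch2, out_ch):
        super().__init__()
        self.proj1 = nn.Sequential(
            nn.Conv2d(ch1, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.proj2 = nn.Sequential(
            nn.Conv2d(ch2, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.gate1 = nn.Conv2d(2 * out_ch, out_ch, 1)   # two independent 1x1 convs
        self.gate2 = nn.Conv2d(2 * out_ch, out_ch, 1)

    def forward(self, f1, f2):
        f1 = self.proj1(f1)                             # 3x3 conv + BN + ReLU
        f2 = self.proj2(F.interpolate(f2, size=f1.shape[2:],
                                      mode="bilinear", align_corners=False))
        cat = torch.cat([f1, f2], dim=1)                # concatenate branches
        w1 = torch.sigmoid(self.gate1(cat))             # combination weights
        w2 = torch.sigmoid(self.gate2(cat))
        return w1 * f1 + w2 * f2                        # F_edge1
```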
2. Three convolutional blocks in series
F_edge1 is further fed into three consecutive convolutional blocks, each consisting of a 3×3 convolutional layer (3×3 Conv), batch normalization (Batch Normalization), and a ReLU activation (ReLU). The outputs of these three convolutional blocks are denoted F_edge2, F_edge3, and F_edge4 respectively, where:
① F_edge4 is fed into a 1×1 convolutional layer (1×1 Conv) to generate a boundary map, which is upsampled to the same resolution as the original image to produce the final boundary map PredEdge of the edge detection decoder neural network;
② the fused, edge-extracted feature F_edge1 and the feature F_edge2 produced by the subsequent convolutional blocks are each superimposed on the output features of the part of the residual decoder module with the same resolution, so that the decoder features can effectively recover edge feature information during upsampling.
3. Loss function
The LossEdge loss function evaluates the loss between the boundary map PredEdge generated by the edge detection decoder neural network and the ground-truth labels (GroundTruth), and backpropagation with this LossEdge loss trains the edge detection decoder neural network.
LossEdge is 'BCEWithLogitsLoss', a loss function commonly used in PyTorch for binary classification tasks. It combines the sigmoid activation function with the binary cross-entropy loss and is intended for model outputs that have not passed through a sigmoid;
For each sample i, with label yi and model output (the logit before the sigmoid) xi, the LossEdge formula is as follows:
LossEdge = -(1/N) · Σi=1..N [ yi·log(σ(xi)) + (1 − yi)·log(1 − σ(xi)) ], where σ(x) = 1/(1 + e^(−x))
where xi is the model output for the i-th sample; yi is the ground-truth label of the i-th sample, taking the value 0 or 1; and N is the total number of samples.
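Minimal usage in PyTorch is shown below; nn.BCEWithLogitsLoss applies the sigmoid internally, so PredEdge is passed as raw logits. The tensor shapes are illustrative.

```python
import torch
import torch.nn as nn

loss_edge_fn = nn.BCEWithLogitsLoss()

# Raw logits from the edge decoder (no sigmoid applied) and binary labels.
pred_edge = torch.randn(4, 1, 512, 512, requires_grad=True)
gt_edge = torch.randint(0, 2, (4, 1, 512, 512)).float()

loss_edge = loss_edge_fn(pred_edge, gt_edge)   # sigmoid + BCE in one call
loss_edge.backward()                           # gradients for the edge branch
```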
It should be specifically noted that, although in this embodiment the number of encoder blocks in the encoder module and the number of decoder blocks in the residual decoder module (Residual Block) are both set to 4, the present invention is not limited thereto; in other embodiments, 2, 3, 5, or more may be used, all of which implement the present invention and likewise fall within its scope of protection. Similarly, the number of residual bottleneck structures inside each ResNet50 encoder block, the number of convolutional layers in the dilated convolution module, and so on may all be set according to the needs of the actual scene and are not limited to the numbers described in this embodiment.
Below, the method of this embodiment is applied to the concrete scenario of power line segmentation in aerial images, providing a more specific embodiment of the present invention.
FIG. 6 is a schematic flowchart of the method for identifying and segmenting power lines in aerial images according to an embodiment of the present invention. Referring to FIG. 6, the method for identifying and segmenting power lines in aerial images of this embodiment comprises:
S201: collect a dataset, preprocess and augment it, and divide it into a training set, a validation set, and a test set according to preset proportions;
The present invention selects the urban-scene power line dataset (PLDU, available at https://github.com/SnorkerHeng/PLD-UAV) as the dataset for model training. It contains 453 visible-light images of power lines captured by drones together with the corresponding pixel-wise ground-truth (GT) annotations; the images are 560×360 or 360×560, and the class imbalance ratio of wires to background is 1.18:98.82;
The collected dataset is preprocessed, the preprocessing comprising image denoising and image augmentation. The augmentation comprises standard data augmentation techniques such as rotation, flipping, and scaling, which increase the amount of training data, improve the model's generalization ability, and avoid overfitting. The expanded dataset is divided into training, validation, and test sets in a 3:1:1 ratio;
S202: build the EdgeAware TRUNet network model and train and validate it on the dataset;
In this step, the EdgeAware TRUNet network parameters are trained on the training set: the training data from S201 are used as the neural network's input, the initial learning rate is set to 0.0001, training runs for 200 epochs, the weighted loss function LossPre combining the Dice similarity coefficient loss (LossDice) and the binary cross-entropy loss (LossBCE) is used to address the class imbalance problem, and optimization is performed with the Adam optimizer;
After each round of training, the validation data from step S201 are used to evaluate the current model's performance; the current model's weights are saved, together with the evaluation metric results on the training and validation sets;
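A compact training-loop sketch under these settings follows; model, train_loader, val_loader, and evaluate are placeholders for the reader's own objects, and WeightedSegLoss is the sketch given earlier.

```python
import torch

def train(model, train_loader, val_loader, device="cuda"):
    """S202 sketch: Adam, lr 1e-4, 200 epochs, weighted Dice+BCE loss."""
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = WeightedSegLoss(alpha=0.5)
    for epoch in range(200):
        model.train()
        for images, masks in train_loader:
            images, masks = images.to(device), masks.to(device)
            preds = model(images)                  # sigmoid segmentation masks
            loss = criterion(preds, masks)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        metrics = evaluate(model, val_loader, device)     # per-epoch validation
        torch.save(model.state_dict(), f"edgeaware_trunet_{epoch:03d}.pt")
```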
S203: validate and optimize the EdgeAware TRUNet model;
The trained EdgeAware TRUNet network parameters are tested on the test set, the target evaluation metrics are computed, and the metric results are obtained;
The EdgeAware TRUNet network parameters are adjusted according to the target evaluation metric results;
The target evaluation metrics are F1 score, Recall, Precision, and IoU, defined respectively as:
F1 score = 2TP/(2TP+FP+FN)
Recall = TP/(TP+FN)
Precision = TP/(TP+FP)
IoU = TP/(TP+FP+FN)
where TP, FP, TN, and FN denote, respectively, pixels actually in the segmented region and correctly detected as such; pixels actually in the background but incorrectly detected as the segmented region; pixels actually in the background and correctly detected as background; and pixels actually in the segmented region but incorrectly detected as background;
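These metrics can be computed directly from binary prediction and ground-truth masks, as in the sketch below; the small epsilon guarding empty denominators is an added safeguard.

```python
import numpy as np

def segmentation_metrics(pred, gt, eps=1e-9):
    """S203 metrics from binary (0/1) prediction and ground-truth masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return {
        "F1":        2 * tp / (2 * tp + fp + fn + eps),
        "Recall":    tp / (tp + fn + eps),
        "Precision": tp / (tp + fp + eps),
        "IoU":       tp / (tp + fp + fn + eps),
    }
```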
The trained EdgeAware TRUNet model is evaluated on the test set; the present method outperforms common semantic segmentation models on the test set in both predictive performance and evaluation metrics;
S204: test the EdgeAware TRUNet model finally optimized in S203 on actual drone aerial images;
The EdgeAware TRUNet model finally optimized in S203 is tested on actual aerial images captured by drones, and its generalization ability, real-time performance, and segmentation accuracy are comprehensively analyzed. By measuring processing speed, boundary segmentation accuracy, and robustness across different scenes, the model's performance in practical applications is comprehensively assessed;
The model is further optimized and tuned according to its actual performance, providing strong technical support for the automatic segmentation and processing of drone aerial images. This will also promote the wide application of EdgeAware TRUNet in drone remote sensing, environmental monitoring, urban planning, and other fields.
A second aspect of the present invention provides a deep learning model for identifying and segmenting linear structures in images. The deep learning model has been described in detail in the foregoing embodiments and is not described again here.
A third aspect of the present invention provides a computer-readable storage medium. FIG. 7 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present invention. As shown in FIG. 7, the computer-readable storage medium of this embodiment stores a computer program which, when executed by a processor, implements the above deep learning model for identifying and segmenting linear structures in images, or the above method for identifying and segmenting linear structures in images. For both, reference may be made to the relevant descriptions above, which are not repeated here.
A fourth aspect of the present invention provides a computer device. FIG. 8 is a schematic diagram of the structure of a computer device according to an embodiment of the present invention. As shown in FIG. 8, the computer device of this embodiment comprises: a processor; and a memory having a computer program stored thereon; when the processor executes the computer program, it implements the above deep learning model for identifying and segmenting linear structures in images, or the above method for identifying and segmenting linear structures in images. For both, reference may be made to the relevant descriptions above, which are not repeated here.
This concludes the description of the embodiments of the present invention. From the above description, those skilled in the art should have a clear understanding of the present invention.
It should be noted that, unless expressly indicated to the contrary, the numerical parameters in the description and claims of the present invention may be approximate values that can vary according to the content of the present invention. Specifically, all numbers expressing quantities of components, reaction conditions, and the like recited in the description and claims should be understood as being modified in all instances by the term "about", which in some embodiments means a variation of ±10%.
The present invention may also be implemented as a device or apparatus program (for example, a computer program or a computer program product) for performing part or all of the methods described herein. Such a program implementing the present invention may be stored on a computer-readable medium or may take the form of one or more signals; such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
The present invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Physical implementations of the hardware structure include, but are not limited to, physical devices, which include, but are not limited to, transistors, memristors, DNA computers, microcontrollers, microprocessors, and digital signal processors (DSPs). Furthermore, the present invention is not directed to any particular programming language; it should be understood that the content of the present invention can be implemented in a variety of programming languages, and the descriptions of specific languages herein are given to disclose the best mode of the present invention.
Those skilled in the art should understand that, in the claims and description of the present invention, the word "comprising" does not exclude the presence of elements (or steps) not listed in a claim, and the word "a" or "an" preceding an element (or step) does not exclude the presence of a plurality of such elements (or steps).
For certain implementations that are not key content of the present invention and are well known to those of ordinary skill in the art, they are not described in detail in the drawings or text of the description due to space limitations; in such cases, reference may be made to the relevant prior art for understanding.
Similarly, it should be understood that, in order to streamline the present disclosure, in the above description of exemplary embodiments of the present invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim; rather, as the claims reflect, inventive aspects lie in less than all features of a single foregoing embodiment. Moreover, embodiments may be mixed and matched with one another or with other embodiments based on design and reliability considerations; that is, technical features of different embodiments may be freely combined to form further embodiments. Thus, the claims following the detailed description are hereby expressly incorporated into that detailed description, with each claim standing on its own as a separate embodiment of the present invention.
The above specific embodiments describe the objectives, technical means, and beneficial effects of the present invention in detail. It should be understood that the purpose of the detailed description is to enable those skilled in the art to understand the present invention more clearly; it is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.