CN118942114A - A pedestrian detection method based on multi-scale and multi-feature neural network - Google Patents

A pedestrian detection method based on multi-scale and multi-feature neural network

Info

Publication number
CN118942114A
CN118942114A (also published as CN 118942114 A; application CN202410996723.5A / CN 202410996723 A)
Authority
CN
China
Prior art keywords
model
feature
pedestrian
target detection
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410996723.5A
Other languages
Chinese (zh)
Inventor
石永响
马梓杰
李帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huai'an Zhongjia Information Technology Co ltd
Original Assignee
Huai'an Zhongjia Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huai'an Zhongjia Information Technology Co ltd
Priority to CN202410996723.5A
Publication of CN118942114A
Legal status: Pending

Abstract


The present invention provides a pedestrian target detection method based on a multi-scale, multi-feature neural network, belonging to the technical field of computer-vision-based target detection. First, pedestrian image data is collected and preprocessed. Second, a multi-scale feature extraction network is constructed to extract feature maps of different scales from the processed data. Third, a multi-scale feature fusion network is constructed to fuse the feature maps of different scales, so that the network can learn richer feature information and thereby improve the model's ability to detect targets of different sizes. Fourth, a pedestrian target detection network is constructed, and the fused features are input into it to obtain pedestrian detection results. Finally, a loss function is designed and the gradients computed by back-propagation are used to update the model weights; after sufficient training, the final target detection model is obtained and deployed.

Description

Pedestrian target detection method based on multi-scale multi-feature neural network
Technical Field
The invention relates to the technical field of pedestrian detection, in particular to a pedestrian target detection method based on a multi-scale multi-feature neural network.
Background
Pedestrian detection aims to identify and locate pedestrians in images or video, determining whether pedestrians are present in an image or video sequence and marking and displaying them. Pedestrian detection is now widely applied in fields such as intelligent driving assistance, intelligent robotics, and intelligent video surveillance, where it plays a key role in raising the intelligence level of these systems. It is not an isolated task: it is closely tied to tasks such as pedestrian tracking and behavior recognition in video analysis. However, complex factors such as varied pedestrian postures, occlusion between pedestrians, and occlusion between pedestrians and the background remain key challenges to detection accuracy. Improving the ability of pedestrian detection to find small pedestrian targets in complex environments therefore has significant research value and application prospects.
Chinese invention patent application CN202210979010.9 discloses an all-weather cross-modal adaptive-fusion pedestrian target detection system and method, mainly comprising a cross-modal differential information fusion module and a confidence-aware adaptive fusion module. The cross-modal differential information fusion module performs complementary feature enhancement on the visible-light and infrared modal features extracted by the network: it strengthens the spatial information of the visible/infrared differential feature maps through global pooling and average pooling, obtains the fusion feature vector of each modality through a fully connected layer and a Tanh activation function, and then separately enhances the initially extracted visible-light and infrared features. The confidence-aware adaptive fusion module uses confidence-aware expressions to adaptively weight the enhanced features of the different modalities, so that the detector can better select the reliable modality for processing, improving its robustness; finally, the network model parameters are optimized with a multi-task loss.
Chinese invention patent application CN202410103120.8 discloses a pedestrian target detection method, device, equipment and medium for surveillance scenes, comprising: step one, acquiring the current video frame from the monitored target in real time; step two, inputting the current video frame into a pre-trained head-localization deep learning network to obtain at least one head position for each frame image in the current video frame; step three, obtaining the pedestrian bounding boxes for each frame image through an improved target detection algorithm; and step four, refining the pedestrian bounding boxes using the pedestrian head positions to obtain the final pedestrian detection bounding boxes, from which the number and positions of pedestrians in the current video frame are determined.
The prior art has the following limitations:
(1) When the data input into the model are images with different scales, the detection accuracy of the method is lower;
(2) When the background in the image is too much to block pedestrians, the method cannot accurately detect the pedestrians in the image.
In view of the above, existing target detection techniques present challenges and shortcomings in pedestrian detection, and further research and improvement is needed to improve accuracy, efficiency and applicability.
Disclosure of Invention
In view of the above problems, a first aspect of the present invention proposes a pedestrian target detection method based on a multi-scale multi-feature neural network. First, image data containing pedestrians is collected and preprocessed. Second, feature maps of different scales are extracted from the processed data through a designed feature pyramid network and fused, enabling the network to learn richer feature information; the fused feature maps are further refined by a convolution operation. Finally, a cross entropy loss function and a smooth L1 loss function are introduced, gradients are computed by back-propagation to update the model weights, and after sufficient training the final target detection model is obtained and deployed. The method comprises the following steps:
Step 1, pedestrian data acquisition and processing; collecting image data containing pedestrians, preprocessing the original image data, and dividing the preprocessed data into a training set and a testing set;
Step 2, designing a multi-scale multi-feature fusion module; the image is processed on multiple scales simultaneously by introducing a characteristic pyramid structure network, and characteristic graphs of different scales are fused, so that the model can learn more abundant characteristic information;
step 3, designing a pedestrian target detection module; using a target detection model based on a deep convolutional neural network;
Step 4, training the model; testing and verifying the trained model effect by using a test set, and storing a final pedestrian target detection model;
Step 5, deploying the model; the model is deployed on a hardware platform to detect pedestrians.
Preferably, the step 1 specifically includes the following steps:
S201, acquiring real images containing pedestrians and annotating the acquired images to form a corresponding label data set; the image data containing pedestrians is recorded as {x1, x2, ..., xn}, and the corresponding labels as {y1, y2, ..., yn};
S202, preprocessing the image data of step S201, including smoothing the images with Gaussian filtering to improve their quality and clarity, and performing data augmentation by rotation, translation, and brightness and contrast adjustment.
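As an illustrative sketch (not code from the patent), the Gaussian smoothing and augmentation of S202 can be written in plain NumPy; the kernel size, sigma, and augmentation parameters below are assumptions:

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Build a normalized 2-D Gaussian kernel."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def smooth(image, size=5, sigma=1.0):
    """Gaussian-smooth a single-channel image (edge padding at borders)."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + size, j:j + size] * k)
    return out

def augment(image, quarter_turns=1, brightness=10.0, contrast=1.1):
    """Simple augmentations: 90-degree rotation plus brightness/contrast shift."""
    rotated = np.rot90(image, k=quarter_turns)
    return np.clip(rotated * contrast + brightness, 0, 255)
```

A real pipeline would also include the translation augmentation mentioned in S202 and operate per color channel; the sketch only shows the calling pattern.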
Preferably, the step 2 specifically includes the following steps:
S301, introducing a feature pyramid structure; the feature pyramid structure enables the network to process images at multiple scales simultaneously. Each level of the pyramid is fused with the feature map of the adjacent level through top-down upsampling and a lateral connection. Let Feature_l denote the feature map of the l-th level; the upsampling is computed as:
U_l = Upsample(Feature_l) (1)
where U_l is the upsampled feature map and Upsample(·) is the upsampling operation;
The feature fusion of the lateral connection is computed as:
P_l = Feature_l + U_(l-1) (2)
where P_l is the fused feature map, Feature_l is the bottom-up feature map of layer l, and U_(l-1) is the result of upsampling the top-down feature map of layer (l-1);
S302, a 3×3 convolution is applied to the fused feature maps for further refinement, so that each layer of the feature pyramid produces richer and more adaptable feature representations, improving the performance of the whole network. The convolution is computed as:
R_l = Conv_3×3(P_l) (3)
where R_l is the final feature map after the 3×3 convolution. The refinement can be viewed as a feature mapping process in which the new value of each pixel of the feature map is a weighted sum of the pixel values in its neighborhood:
R_ij = Σ_m Σ_n Kernel_mn × Feature_(i-m, j-n) (4)
where R_ij is a pixel value of the refined feature map, Feature_(i-m, j-n) is a pixel value of the original feature map, Kernel_mn is a weight of the convolution kernel, and m, n index the position of an element in the kernel.
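The pyramid operations of Eqs. (1)-(4) can be sketched as a NumPy toy (not the patent's implementation; nearest-neighbour upsampling and edge padding are assumptions):

```python
import numpy as np

def upsample2x(feat):
    """Eq. (1): nearest-neighbour 2x upsampling of a feature map."""
    return np.repeat(np.repeat(feat, 2, axis=0), 2, axis=1)

def lateral_fuse(feature_l, upsampled):
    """Eq. (2): element-wise sum of the lateral map and the upsampled map."""
    return feature_l + upsampled

def refine3x3(p, kernel):
    """Eqs. (3)-(4): 3x3 convolution refining the fused map (edge padding)."""
    padded = np.pad(p, 1, mode="edge")
    out = np.zeros_like(p, dtype=float)
    for i in range(p.shape[0]):
        for j in range(p.shape[1]):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

# Toy pyramid: a coarse 2x2 map fused into a finer 4x4 map, then refined.
coarse = np.ones((2, 2))
fine = np.ones((4, 4))
fused = lateral_fuse(fine, upsample2x(coarse))          # every entry becomes 2
refined = refine3x3(fused, np.full((3, 3), 1.0 / 9.0))  # averaging kernel
```

In a real network the maps would have a channel dimension and the kernels would be learned; the sketch only makes the data flow of the pyramid concrete.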
Preferably, the step 3 specifically includes the following steps:
The YOLOv8 network structure is selected as the main model of the pedestrian target detection module; it receives the multi-scale fused feature maps output by the multi-scale multi-feature fusion module:
Y = YOLOv8(P_l1, P_l2, ..., P_ln) (5)
where Y is the output of YOLOv8 and the P_li are feature maps of different scales.
Preferably, the step 4 specifically includes the following steps:
S501, defining a loss function, selecting a cross entropy loss function as a classification loss function, and selecting a smooth L1 loss function as a bounding box regression loss function;
The cross entropy loss function is calculated as follows:
L_cls = -(1/N) Σ_{i=1..N} Σ_{c=1..C} y_{i,c} log(p_{i,c}) (6)
where N is the total number of samples, C is the total number of categories, y_{i,c} is the one-hot encoding of the true label of sample i, and p_{i,c} is the predicted probability that sample i belongs to category c;
The smooth L1 loss function is calculated as follows:
SmoothL1(x) = 0.5x², if |x| < 1; |x| - 0.5, otherwise (7)
where x is the difference between the predicted value and the true value;
S502, setting the model termination condition: training is terminated when the training loss remains unchanged for 10 consecutive evaluations;
S503, setting an optimizer of the model, selecting Adam as the model training optimizer, and updating parameters of the model and improving the convergence rate of the model;
S504, the trained optimal model is stored and used for subsequent model deployment.
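The training-side pieces of this step can be sketched in NumPy (an illustration, not the patent's code; the numerical tolerance used to decide that the loss is "unchanged" is an assumption):

```python
import numpy as np

def cross_entropy(y_onehot, p, eps=1e-12):
    """Classification loss of Eq. (6): -(1/N) sum_i sum_c y_ic * log(p_ic)."""
    n = y_onehot.shape[0]
    return -np.sum(y_onehot * np.log(p + eps)) / n

def smooth_l1(x):
    """Box-regression loss of Eq. (7), averaged element-wise:
    0.5*x^2 where |x| < 1, and |x| - 0.5 elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1, 0.5 * x ** 2, np.abs(x) - 0.5).mean()

def should_stop(losses, patience=10, tol=1e-6):
    """Termination rule of S502: stop once the loss has stayed
    (numerically) unchanged for `patience` consecutive steps."""
    if len(losses) <= patience:
        return False
    recent = losses[-(patience + 1):]
    return max(recent) - min(recent) < tol
```

In practice the two losses would be combined into a weighted sum and minimized with Adam as described in S503; the functions above only pin down the formulas.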
Preferably, the step 5 specifically includes the following steps:
S601, converting the trained model into a format compatible with the hardware platform, then loading and initializing it, including loading the model weights and allocating runtime memory for the model;
S602, inputting the image data to be detected so that the model detects the pedestrians in the input image:
Result=Model(Image) (8)
The Model represents a trained network, the Image represents an input Image to be detected, and the Result represents a pedestrian detection Result.
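The calling pattern of Eq. (8) can be illustrated with a stub detector (hypothetical: the deployed model, its weight format, and the box values below are not specified by the patent, so detections are hard-coded purely to exercise the interface):

```python
import numpy as np

def model(image, score_thresh=0.5):
    """Stand-in for the trained detector: returns rows (x1, y1, x2, y2, score).
    A real deployment would run the converted network on the image; here the
    raw detections are stubbed so Result = Model(Image) can be exercised."""
    h, w = image.shape[:2]
    raw = np.array([
        [0.1 * w, 0.1 * h, 0.4 * w, 0.9 * h, 0.92],   # confident pedestrian box
        [0.6 * w, 0.2 * h, 0.8 * w, 0.8 * h, 0.31],   # low-confidence box
    ])
    return raw[raw[:, 4] >= score_thresh]              # keep confident boxes only

image = np.zeros((480, 640, 3))
result = model(image)  # Eq. (8): Result = Model(Image)
```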
The second aspect of the present invention also provides a pedestrian target detection device based on a multi-scale multi-feature neural network, which is characterized in that: the apparatus includes at least one processor and at least one memory, the processor and the memory coupled; a computer-implemented program of a pedestrian target detection model based on a multi-scale multi-feature neural network constructed by the construction method according to the first aspect is stored in the memory; when the processor executes a computer execution program stored in the memory, the processor is caused to execute a pedestrian target detection method based on the multi-scale multi-feature neural network.
Compared with the prior art, the invention has the following beneficial effects:
(1) The images are processed on multiple scales simultaneously through the characteristic pyramid structure network, and the characteristic images of different scales are fused, so that the model can learn richer characteristic information, and the detection capability of targets with different sizes is improved;
(2) The recognition capability of the model to the pedestrian target is enhanced by utilizing a multi-scale multi-feature fusion technology, and particularly, the detection of small target pedestrians in a complex environment is improved, so that the detection precision is improved;
(3) In the data preprocessing stage, gaussian filtering is adopted to carry out image smoothing and data enhancement is carried out by methods of rotation, translation, brightness adjustment, contrast adjustment and the like, so that the generalization capability of the model is further improved; in the model training stage, a cross entropy loss function and a smooth L1 loss function are selected, and an Adam optimizer is selected, so that the model training efficiency and the model training convergence speed are improved.
In general, the invention provides a reliable solution for pedestrian target detection and has wide application prospect.
Drawings
Fig. 1 shows the main steps of the present invention.
Fig. 2 is a general flow chart of the pedestrian target detection method based on a multi-scale multi-feature neural network.
Fig. 3 shows the pedestrian detection results.
Fig. 4 is a schematic structural diagram of the pedestrian target detection device based on a multi-scale multi-feature neural network in Embodiment 2 of the present invention.
Detailed Description
The invention will be further described with reference to specific examples.
Example 1:
The invention provides a pedestrian target detection method based on a multi-scale multi-feature neural network; the main steps are shown in Fig. 1 and the overall flow in Fig. 2. The method comprises the following key steps: first, image data containing pedestrians is collected and preprocessed; second, feature maps of different scales are extracted from the processed data through the designed feature pyramid network and fused, enabling the network to learn richer feature information, and the fused feature maps are further refined by a convolution operation; finally, a cross entropy loss function and a smooth L1 loss function are introduced, gradients are computed by back-propagation to update the model weights, and after sufficient training the final target detection model is obtained and deployed.
1. Pedestrian data acquisition and processing, including the following steps:
S201, acquiring real images containing pedestrians and annotating the acquired images to form a corresponding label data set; the image data containing pedestrians is recorded as {x1, x2, ..., xn}, and the corresponding labels as {y1, y2, ..., yn};
S202, preprocessing the image data of step S201, including smoothing the images with Gaussian filtering to improve their quality and clarity, and performing data augmentation by rotation, translation, and brightness and contrast adjustment.
2. The design of the multi-scale multi-feature fusion module comprises the following steps:
S301, introducing a feature pyramid structure; the feature pyramid structure enables the network to process images at multiple scales simultaneously. Each level of the pyramid is fused with the feature map of the adjacent level through top-down upsampling and a lateral connection. Let Feature_l denote the feature map of the l-th level; the upsampling is computed as:
U_l = Upsample(Feature_l) (1)
where U_l is the upsampled feature map and Upsample(·) is the upsampling operation;
The feature fusion of the lateral connection is computed as:
P_l = Feature_l + U_(l-1) (2)
where P_l is the fused feature map, Feature_l is the bottom-up feature map of layer l, and U_(l-1) is the result of upsampling the top-down feature map of layer (l-1);
S302, a 3×3 convolution is applied to the fused feature maps for further refinement, so that each layer of the feature pyramid produces richer and more adaptable feature representations, improving the performance of the whole network. The convolution is computed as:
R_l = Conv_3×3(P_l) (3)
where R_l is the final feature map after the 3×3 convolution. The refinement can be viewed as a feature mapping process in which the new value of each pixel of the feature map is a weighted sum of the pixel values in its neighborhood:
R_ij = Σ_m Σ_n Kernel_mn × Feature_(i-m, j-n) (4)
where R_ij is a pixel value of the refined feature map, Feature_(i-m, j-n) is a pixel value of the original feature map, Kernel_mn is a weight of the convolution kernel, and m, n index the position of an element in the kernel.
3. The pedestrian target detection module design comprises the following steps:
The YOLOv8 network structure is selected as the main model of the pedestrian target detection module; it receives the multi-scale fused feature maps output by the multi-scale multi-feature fusion module:
Y = YOLOv8(P_l1, P_l2, ..., P_ln) (5)
where Y is the output of YOLOv8 and the P_li are feature maps of different scales.
4. Neural network model training, comprising the steps of:
S501, defining a loss function, selecting a cross entropy loss function as a classification loss function, and selecting a smooth L1 loss function as a bounding box regression loss function;
The cross entropy loss function is calculated as follows:
L_cls = -(1/N) Σ_{i=1..N} Σ_{c=1..C} y_{i,c} log(p_{i,c}) (6)
where N is the total number of samples, C is the total number of categories, y_{i,c} is the one-hot encoding of the true label of sample i, and p_{i,c} is the predicted probability that sample i belongs to category c;
The smooth L1 loss function is calculated as follows:
SmoothL1(x) = 0.5x², if |x| < 1; |x| - 0.5, otherwise (7)
where x is the difference between the predicted value and the true value;
S502, setting the model termination condition: training is terminated when the training loss remains unchanged for 10 consecutive evaluations;
S503, setting an optimizer of the model, selecting Adam as the model training optimizer, and updating parameters of the model and improving the convergence rate of the model;
S504, the trained optimal model is stored and used for subsequent model deployment.
5. The target detection model deployment comprises the following steps:
S601, converting the trained model into a format compatible with the hardware platform, then loading and initializing it, including loading the model weights and allocating runtime memory for the model;
S602, inputting the image data to be detected so that the model detects the pedestrians in the input image:
Result=Model(Image) (8)
The Model represents a trained network, the Image represents an input Image to be detected, and the Result represents a pedestrian detection Result.
The following example results are presented for this method:
To verify the effectiveness of the proposed pedestrian target detection method based on the multi-scale multi-feature neural network, this embodiment provides the result shown in Fig. 3: all pedestrians in the image, including most of each pedestrian's body, are enclosed by white rectangular boxes, showing that the method achieves high pedestrian detection precision. The embodiment also compares the method against the current target detection models Fast R-CNN, SSD, and YOLOv7, using overall mean average precision (mAP) as the evaluation metric. The test results show a mAP of 83.4 for Fast R-CNN, 86.1 for SSD, and 86.8 for YOLOv7, versus 89.2 for the proposed method. The proposed method achieves the highest mAP among all compared methods, indicating the highest pedestrian detection precision and further verifying its effectiveness and practicability.
Example 2:
As shown in Fig. 4, the present application also provides a pedestrian target detection device based on a multi-scale multi-feature neural network. The device comprises at least one processor and at least one memory, as well as a communication interface and an internal bus; the memory stores a computer-executable program of the pedestrian target detection model constructed by the method described in Embodiment 1; when the processor executes the program stored in the memory, it performs the pedestrian target detection method based on the multi-scale multi-feature neural network. The internal bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, among others. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the drawings of the present application are not limited to only one bus or one type of bus. The memory may include high-speed RAM and may further include non-volatile memory (NVM), such as at least one magnetic disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, or an optical disk.
The device may be provided as a terminal, server or other form of device.
Fig. 4 is a block diagram of an apparatus shown for illustration. The device may include one or more of the following components: a processing component, a memory, a power component, a multimedia component, an audio component, an input/output (I/O) interface, a sensor component, and a communication component. The processing component generally controls overall operation of the electronic device, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component may include one or more processors to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component may include one or more modules that facilitate interactions between the processing component and other components. For example, the processing component may include a multimedia module to facilitate interaction between the multimedia component and the processing component.
The memory is configured to store various types of data to support operations at the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and the like. The memory may be implemented by any type of volatile or nonvolatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The communication component is configured to facilitate communication between the electronic device and other devices in a wired or wireless manner. The electronic device may access a wireless network based on a communication standard, such as WiFi,2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further comprises a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.
While the foregoing describes the embodiments of the present invention, it should be understood that the present invention is not limited to the embodiments, and that various modifications and changes can be made by those skilled in the art without any inventive effort.

Claims (8)

Translated fromChinese
1.一种基于多尺度多特征神经网络的行人目标检测方法,其特征在于,包括以下过程:1. A pedestrian target detection method based on a multi-scale multi-feature neural network, characterized by comprising the following processes:步骤1,行人数据采集与处理;采集行人图像数据,对原始图像数据进行预处理,将预处理后的数据划分为训练集和测试集;Step 1, pedestrian data collection and processing: collect pedestrian image data, pre-process the original image data, and divide the pre-processed data into a training set and a test set;步骤2,设计多尺度特征提取网络提取行人图像的多尺度特征,获得图像不同尺度的丰富特征;Step 2: Design a multi-scale feature extraction network to extract multi-scale features of pedestrian images and obtain rich features of different scales of images;步骤3,设计多尺度特征融合网络将所述步骤2获得的行人图像多尺度特征进行融合,使模型可以学习到更加丰富的特征信息;Step 3, designing a multi-scale feature fusion network to fuse the multi-scale features of the pedestrian image obtained in step 2, so that the model can learn richer feature information;步骤4,设计行人目标检测网络;使用基于深度卷积神经网络的目标检测模型检测行人图像中行人的位置;Step 4: Design a pedestrian target detection network; use a target detection model based on a deep convolutional neural network to detect the position of pedestrians in pedestrian images;步骤5,对模型进行训练;使用测试集对训练后的模型效果进行测试验证,并将最终的行人目标检测模型进行保存;Step 5: train the model; use the test set to test and verify the effect of the trained model, and save the final pedestrian target detection model;步骤6,模型部署;将模型部署在硬件平台上,实现对行人的实时检测。Step 6: Model deployment: Deploy the model on the hardware platform to achieve real-time detection of pedestrians.2.如权利要求1所述的一种基于多尺度多特征神经网络的行人目标检测方法,其特征在于,所述步骤1具体包括以下过程:2. 
The method for pedestrian target detection based on a multi-scale multi-feature neural network according to claim 1, wherein step 1 specifically comprises the following process:S201,获取包含行人的真实影像,并对获取的影响进行标注,形成对应的标签数据集,将包含行人的图像数据记为{x1,x2,...,xn},对应的标签记为{y1,y2,...,yn};S201, obtaining a real image containing pedestrians, and annotating the obtained images to form a corresponding label data set, wherein the image data containing pedestrians is recorded as {x1 ,x2 ,...,xn }, and the corresponding labels are recorded as {y1 ,y2 ,...,yn };S202,对S1所述的图像数据,进行预处理,包括使用高斯滤波对图像进行平滑操作,提升图像数据的质量和清晰度;使用旋转、平移、调整亮度和对比度进行数据增强。S202, preprocessing the image data described in S1, including using Gaussian filtering to smooth the image to improve the quality and clarity of the image data; using rotation, translation, and adjustment of brightness and contrast to perform data enhancement.3.如权利要求1所述的一种基于多尺度多特征神经网络的行人目标检测方法,其特征在于,所述步骤2的具体过程为:3. The method for pedestrian target detection based on a multi-scale multi-feature neural network according to claim 1, wherein the specific process of step 2 is:所述多尺度特征提取模块包括三个不同尺度的上采样层和三个通道注意力机制层;The multi-scale feature extraction module includes three upsampling layers of different scales and three channel attention mechanism layers;S301,将行人图像输入不同尺度的上采样层,不同尺度的上采样层分别对输入的行人图像进行上采样,获得不同尺度的特征图,计算过程如下所示:S301, input the pedestrian image into upsampling layers of different scales, and the upsampling layers of different scales upsample the input pedestrian image respectively to obtain feature maps of different scales. 
The calculation process is as follows:Ul=Upsamplel(image) (1)Ul =Upsamplel (image) (1)其中,Ul表示行人图像输入第l个上采样层后获得的特征图,Upsamplel(·)表示第l个上采样层执行的上采样操作,image为行人图像,l=1,2,3;Where Ul represents the feature map obtained after the pedestrian image is input into the lth upsampling layer, Upsamplel (·) represents the upsampling operation performed by the lth upsampling layer, image is the pedestrian image, l = 1, 2, 3;S302,将Ul输入通道注意力机制层获得通道注意力权重Pl,计算过程如下:S302, input Ul into the channel attention mechanism layer to obtain the channel attention weight Pl , and the calculation process is as follows:Pl=σ(MLP(MPool(Ul)+APool(Ul)))Pl =σ(MLP(MPool(Ul )+APool(Ul )))其中,σ(·)为Sigmoid激活函数,MPool(·)和APool(·)分别表示最大池化和平均池化操作,MLP(·)为多层感知机;将Pl与Ul进行乘积操作获得通道注意力特征通过通道注意力机制层,网络能够有效地捕捉行人图像中的局部空间结构信息,增强对行人特征的关注,抑制不重要的环境特征。Where σ(·) is the Sigmoid activation function, MPool(·) and APool(·) represent the maximum pooling and average pooling operations respectively, and MLP(·) is a multi-layer perceptron. The channel attention feature is obtained by multiplying Pl and Ul. Through the channel attention mechanism layer, the network can effectively capture the local spatial structure information in pedestrian images, enhance the attention to pedestrian features, and suppress unimportant environmental features.4.如权利要求1所述的一种基于多尺度多特征神经网络的行人目标检测方法,其特征在于,所述步骤3的具体过程为:4. 
4. The method for pedestrian target detection based on a multi-scale multi-feature neural network according to claim 1, wherein step 3 specifically comprises:
the multi-scale feature fusion network comprises a multi-head attention mechanism layer and a feature mapping layer;
S401, inputting the channel attention features obtained in step 2 in parallel into the multi-head attention mechanism layer to obtain the fused feature S1 characterizing their interaction, computed as:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W_O
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
where MultiHead(Q, K, V) is the output of the multi-head attention mechanism: each head generates an output vector, the output vectors are concatenated (Concat denotes the concatenation operation), and the concatenated vector is multiplied by a weight matrix W_O to linearly map it onto the desired output dimension; W_i^Q, W_i^K and W_i^V are the linear projections of the i-th attention head, mapping the query, key and value into the corresponding subspaces;
S402, inputting S1 into the feature mapping layer for further refinement to obtain a richer and more adaptable feature representation S2, thereby improving the performance of the whole network; the j-th element of S2 is computed as a weighted sum over the elements in the neighborhood of the j-th element of S1.
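The multi-head attention fusion described in S401 can be illustrated with a small self-contained sketch. The toy input X, the per-head projections and the output matrix Wo are hypothetical placeholders; a practical system would use a framework implementation of scaled dot-product attention rather than this list-based arithmetic.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(Q[0])
    Kt = [list(col) for col in zip(*K)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, Kt)]
    return matmul([softmax(row) for row in scores], V)

def multi_head(X, heads, Wo):
    """MultiHead = Concat(head_1, ..., head_h) @ Wo; each head applies its own
    (Wq, Wk, Wv) projections to the shared input X."""
    outs = [attention(matmul(X, Wq), matmul(X, Wk), matmul(X, Wv))
            for Wq, Wk, Wv in heads]
    concat = [sum((h[i] for h in outs), []) for i in range(len(X))]  # row-wise concat
    return matmul(concat, Wo)

# toy example: 2 tokens with 2 features, 2 single-dimension heads, identity Wo
X = [[1.0, 2.0], [3.0, 4.0]]
heads = [([[1.0], [0.0]], [[1.0], [0.0]], [[1.0], [0.0]]),
         ([[0.0], [1.0]], [[0.0], [1.0]], [[0.0], [1.0]])]
Wo = [[1.0, 0.0], [0.0, 1.0]]
Y = multi_head(X, heads, Wo)
```

Because each attention row is a convex combination of the value rows, every output element stays within the range of the corresponding head's values.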
5. The method for pedestrian target detection based on a multi-scale multi-feature neural network according to claim 1, wherein step 4 specifically comprises:
the pedestrian target detection network comprises a spatial attention mechanism layer and a YOLOv8 target detection head network;
S501, inputting the feature S2 obtained in step 3 into the spatial attention mechanism layer to obtain the spatial attention weight K, computed as:
K = σ(C(MPool(S2) + APool(S2)))
where C(·) denotes a convolution operation; the spatial attention feature S3 is obtained by multiplying K with S2; the spatial attention mechanism helps the model locate pedestrians in the image more accurately;
S502, inputting S3 into the YOLOv8 target detection head network to obtain the pedestrian target detection result; the YOLOv8 head network adopts a decoupled head structure that separates the classification and detection heads, allowing the model to focus on each task and thereby improving the localization accuracy of pedestrians in the image.
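The spatial-attention weighting of S501 can be sketched as follows. Here MPool/APool pool across channels at every pixel (the usual reading of spatial attention), and the convolution C(·) is simplified to a hypothetical 1x1 convolution with scalar weight w and bias b; the kernel size and weights of the real convolution are not specified by the claim.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def spatial_attention(S2, w=1.0, b=0.0):
    """K = sigma(Conv(MPool(S2) + APool(S2))); returns S3 = K * S2.

    S2 is a list of C channels, each an H x W grid. Pooling runs across
    channels at each pixel; Conv is reduced to a 1x1 conv (scalar w, b).
    """
    C, H, W = len(S2), len(S2[0]), len(S2[0][0])
    K = [[sigmoid(w * (max(S2[c][i][j] for c in range(C))
                       + sum(S2[c][i][j] for c in range(C)) / C) + b)
          for j in range(W)]
         for i in range(H)]
    return [[[K[i][j] * S2[c][i][j] for j in range(W)]
             for i in range(H)]
            for c in range(C)]

# toy example: 2 channels of a 2x2 feature map
S2 = [[[1.0, 2.0], [3.0, 4.0]],
      [[2.0, 1.0], [0.5, 0.5]]]
S3 = spatial_attention(S2)
```

Unlike channel attention, the weight map K here has one value per pixel shared by all channels, which is what lets the model emphasize pedestrian locations in the image.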
6. The method for pedestrian target detection based on a multi-scale multi-feature neural network according to claim 1, characterized in that the model training in step 5 specifically comprises the following process:
S601, defining the loss functions: the cross-entropy loss is selected as the classification loss, and the smooth L1 loss as the bounding-box regression loss;
the cross-entropy loss is calculated as:
L_cls = -(1/N) Σ_{i=1..N} Σ_{c=1..C} y_{i,c} log(p_{i,c})
where N is the total number of samples, C is the total number of classes, y_{i,c} is the one-hot encoding of the true label of sample i, and p_{i,c} is the predicted probability that sample i belongs to class c;
the smooth L1 loss is calculated as:
SmoothL1(x) = 0.5 x^2 if |x| < 1, and |x| - 0.5 otherwise
where x is the difference between the predicted value and the ground-truth value;
S602, setting the model termination condition: training is terminated when the loss does not change for 10 consecutive iterations;
S603, setting the model optimizer: Adam is selected as the optimizer for model training, to update the model parameters and improve the convergence speed;
S604, saving the fully trained optimal model for subsequent deployment.
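The two losses of S601 are standard and can be written down directly as a plain-Python sketch; the eps term guarding log(0) is an added numerical-stability assumption, not part of the claim.

```python
import math

def cross_entropy(y, p, eps=1e-12):
    """L = -(1/N) * sum_i sum_c y[i][c] * log(p[i][c]).

    y: one-hot true labels, p: predicted class probabilities.
    eps guards log(0) (numerical-stability assumption).
    """
    N = len(y)
    return -sum(y[i][c] * math.log(p[i][c] + eps)
                for i in range(N) for c in range(len(y[i]))) / N

def smooth_l1(x):
    """Smooth L1: 0.5 * x^2 if |x| < 1, else |x| - 0.5."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5
```

Smooth L1 behaves quadratically near zero and linearly for large errors, which is why it is preferred over plain L2 for bounding-box regression: outlier boxes do not dominate the gradient.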
7. The method for pedestrian target detection based on a multi-scale multi-feature neural network according to claim 1, characterized in that the model deployment in step 5 specifically comprises the following process:
S701, converting the trained model into a format compatible with the target hardware platform, and loading and initializing the model, including loading the model weights and allocating the memory the model needs to run;
S702, inputting the image data to be detected, whereupon the model detects the pedestrians in the image; the calculation process is:
Result = Model(Image)
where Model denotes the trained network, Image denotes the input image to be detected, and Result denotes the pedestrian detection result.
8. A pedestrian target detection device based on a multi-scale multi-feature neural network, characterized in that the device comprises at least one processor and at least one memory, the processor and the memory being coupled; the memory stores a computer-executable program of the pedestrian target detection model based on a multi-scale multi-feature neural network constructed by the construction method of any one of claims 1 to 5; when the processor executes the computer-executable program stored in the memory, the processor performs a pedestrian target detection method based on a multi-scale multi-feature neural network.
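The training-termination rule of S602 above (stop when the loss has not changed for 10 consecutive iterations) can be sketched as a small helper. The EarlyStop class and its tolerance tol are hypothetical, since the claim does not specify how "unchanged" is measured in floating point.

```python
class EarlyStop:
    """Stop when the loss has not changed for `patience` consecutive steps (S602)."""
    def __init__(self, patience=10, tol=1e-8):
        self.patience, self.tol = patience, tol
        self.last, self.count = None, 0

    def step(self, loss):
        """Record one step's loss; return True when training should terminate."""
        if self.last is not None and abs(loss - self.last) <= self.tol:
            self.count += 1          # loss unchanged: extend the plateau streak
        else:
            self.count = 0           # loss moved: reset the streak
        self.last = loss
        return self.count >= self.patience

# toy run: the loss plateaus after the third step
stopper = EarlyStop(patience=10)
losses = [1.0, 0.8, 0.5] + [0.5] * 12
stopped_at = next(i for i, L in enumerate(losses) if stopper.step(L))
```

In the toy run the streak starts at index 3 and reaches 10 unchanged steps at index 12, where training would terminate.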
CN202410996723.5A | 2024-07-24 | 2024-07-24 | A pedestrian detection method based on multi-scale and multi-feature neural network | Pending | CN118942114A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202410996723.5A | 2024-07-24 | 2024-07-24 | CN118942114A (en) A pedestrian detection method based on multi-scale and multi-feature neural network

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202410996723.5A | 2024-07-24 | 2024-07-24 | CN118942114A (en) A pedestrian detection method based on multi-scale and multi-feature neural network

Publications (1)

Publication Number | Publication Date
CN118942114A (en) | 2024-11-12

Family

ID=93358035

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202410996723.5A (Pending) | CN118942114A (en) A pedestrian detection method based on multi-scale and multi-feature neural network | 2024-07-24 | 2024-07-24

Country Status (1)

Country | Link
CN (1) | CN118942114A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN114202672A (en)* | 2021-12-09 | 2022-03-18 | Nanjing University of Science and Technology | A small object detection method based on attention mechanism
CN114332921A (en)* | 2021-12-14 | 2022-04-12 | Changxun Communication Service Co., Ltd. | Pedestrian detection method based on Faster R-CNN network with improved clustering algorithm
CN114581452A (en)* | 2022-03-02 | 2022-06-03 | Tencent Technology (Shanghai) Co., Ltd. | Split network training method, device, equipment, computer program and medium
CN117058632A (en)* | 2023-08-21 | 2023-11-14 | Zhejiang Huawei Communication Technology Co., Ltd. | YOLOv3 traffic sign detection and classification method and system based on a traffic simulation sand table
CN117274883A (en)* | 2023-11-20 | 2023-12-22 | Nanchang Institute of Technology | Target tracking method and system based on multi-head attention optimized feature fusion network
CN117974572A (en)* | 2024-01-03 | 2024-05-03 | Zhejiang Sci-Tech University | A three-stage cascade detection method for glass fiber cloth defects
CN118015539A (en)* | 2024-02-01 | 2024-05-10 | Jinling Institute of Technology | Improved YOLOv8 dense pedestrian detection method based on GSConv+VOV-GSCSP
CN118229961A (en)* | 2024-04-10 | 2024-06-21 | Hunan Junling Technology Co., Ltd. | Infrared target detection method, device, computer equipment and storage medium

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Youdong He et al.: "Object Detector with Multi-head Self-attention and Multi-scale Fusion", 2022 International Conference on Algorithms, Data Mining, and Information Technology (ADMIT), 25 September 2023 (2023-09-25) *
Zhang Guoli et al.: "Multispectral Pedestrian Detection Based on Deformable Convolution and Multi-scale Residual Attention", Laser & Optoelectronics Progress, 31 May 2024 (2024-05-31), p. 1037004 *
Xu Ruiqi et al.: "Non-intrusive Load Monitoring Based on Multi-scale Feature Fusion and Multi-head Self-attention Mechanism", Science Technology and Engineering, 30 June 2024 (2024-06-30), pp. 2385-2395 *
Fang Ming et al.: "Attention-based Multi-scale Underwater Image Enhancement Network", Journal of Electronics & Information Technology, 28 December 2021 (2021-12-28), pp. 3513-3521 *
Guo Qiang, Meng Xiangzhong: "Research on Underwater Image Enhancement with Multi-scale Semantic Features", Journal of Ordnance Equipment Engineering, 30 November 2022 (2022-11-30), pp. 95-102 *
Chen Chunyi et al.: "Image Super-resolution Reconstruction with Multi-scale Attention Fusion", Chinese Optics, 31 October 2023 (2023-10-31), p. 1034 *

Similar Documents

Publication | Publication Date | Title
CN114202672B (en) A small object detection method based on attention mechanism
CN110059558B (en) A Real-time Detection Method of Orchard Obstacles Based on Improved SSD Network
US11657602B2 (en) Font identification from imagery
Qu et al.: RGBD salient object detection via deep fusion
Alani et al.: Hand gesture recognition using an adapted convolutional neural network with data augmentation
CN109993102B (en) Similar face retrieval method, device and storage medium
CN112801018A (en) Cross-scene target automatic identification and tracking method and application
CN110929593B (en) Real-time significance pedestrian detection method based on detail discrimination
CN110929577A (en) An improved target recognition method based on YOLOv3 lightweight framework
US20180114071A1 (en) Method for analysing media content
CN110909820A (en) Image classification method and system based on self-supervised learning
CN113807399A (en) Neural network training method, neural network detection method and neural network detection device
CN110555420B (en) Fusion model network and method based on pedestrian regional feature extraction and re-identification
CN117036843B (en) Target detection model training method, target detection method and device
CN110414344A (en) A kind of human classification method, intelligent terminal and storage medium based on video
CN112861917A (en) Weak supervision target detection method based on image attribute learning
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN111881833B (en) Vehicle detection method, device, equipment and storage medium
CN115187772A (en) Target detection network training and target detection method, device and equipment
CN118429804B (en) Remote sensing image target detection method based on local-global feature complementary perception module
CN117876831A (en) Target detection and identification method, device, electronic equipment and storage medium
CN111178370B (en) Vehicle searching method and related device
CN118262258B (en) Ground environment image aberration detection method and system
CN113128487A (en) Dual-gradient-based weak supervision target positioning method and device
CN118379629A (en) A building color recognition detection model and method based on clustering algorithm

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
