
A Video Object Detection Method Based on Attention Mechanism

Info

Publication number
CN110287826A
Authority
CN
China
Prior art keywords
detected
frame
feature
feature map
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910499786.9A
Other languages
Chinese (zh)
Other versions
CN110287826B (en)
Inventor
李建强
白骏
刘雅琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kuaima (Beijing) Electronic Technology Co.,Ltd.
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN201910499786.9A
Publication of CN110287826A
Application granted
Publication of CN110287826B
Legal status: Active (current)
Anticipated expiration


Abstract

Translated from Chinese

The invention relates to a video object detection method based on an attention mechanism, in the field of computer vision. The method comprises the following steps. Step S1: extract the candidate feature map of the current frame. Step S2: set a fusion window over the past time period, compute the Laplacian variance of each frame in the window, normalize the variances to obtain the weight of each frame, compute a weighted sum of the candidate feature maps of all frames in the window to obtain the temporal features, and concatenate the candidate features of the current frame with the temporal features to obtain the feature map to be detected. Step S3: use convolutional layers to extract feature maps of additional scales from the feature map to be detected. Step S4: use convolutional layers on the feature maps of different scales to predict object categories and locations. The feature fusion method of the invention assigns different weights to frame features of different quality within the past time period, so that the temporal information is fused more thoroughly and the performance of the detection model is improved.

Description

Translated from Chinese
A Video Object Detection Method Based on Attention Mechanism

Technical Field

The present invention relates to computer vision, deep learning, and video object detection technology.

Background Art

Image object detection methods based on deep learning have made great progress over the past five years, for example the RCNN family of networks, the SSD network, and the YOLO family of networks. However, in fields such as video surveillance and driver assistance, video-based object detection is in much wider demand. Because video suffers from motion blur, occlusion, diverse changes in object appearance, and diverse changes in illumination, detecting objects in video with image object detection techniques alone does not give good results. Adjacent frames in a video are continuous in time and similar in space, and the positions of objects in neighbouring frames are correlated; how to exploit the temporal information about objects in the video is therefore the key to improving video object detection performance.

Current video object detection frameworks fall into three main categories. The first treats video frames as independent images and detects them with image object detection algorithms; such methods ignore temporal information and detect each frame independently, so the results are unsatisfactory. The second combines object detection with object tracking; these methods post-process the detection results in order to track the objects, so the tracking accuracy depends on the detection and errors propagate easily. The third detects only on a few key frames and then uses optical flow together with the key-frame features to generate the features of the remaining frames; although such methods exploit temporal information, computing optical flow is expensive, which makes fast detection difficult.

Summary of the Invention

The purpose of the present invention is to provide a fast and accurate video object detection method that fully fuses temporal features.

To solve the above technical problems, the present invention provides a video object detection method based on an attention mechanism, comprising the following steps:

Step S1: input the video frame image at the current time point into a MobileNet network to extract a candidate feature map;

Step S2: set a temporal feature fusion window within the past time period adjacent to the current time point; for each video frame to be fused within the fusion window, compute its image Laplacian variance; normalize the variances and use them as the fusion weights of the frames to be fused; compute, according to these fusion weights, a weighted sum of the candidate feature maps of all frames to be fused to obtain the temporal features required by the current frame; and concatenate the candidate features of the current frame with the temporal features along the channel dimension of the features, obtaining a feature map to be detected that fuses the temporal information;

Step S3: use convolutional feature extraction layers and max pooling layers to extract feature maps to be detected at additional scales from the feature map to be detected;

Step S4: on the feature maps to be detected at the different scales, use convolutional layers to predict the object categories and bounding box coordinates in the current frame.

Further, in step S1, the video frame at the current time point t is detected. The frame image I_t, of height H_I and width W_I, is first input into the MobileNet network for feature extraction, which yields the candidate feature map F_t ∈ ℝ^{C1×H1×W1}, where ℝ denotes the real numbers and C1, H1, and W1 are the number of feature channels, the height, and the width of the candidate feature map, respectively.
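A minimal sketch of this per-frame candidate feature extraction, assuming a torchvision mobilenet_v2 backbone as the MobileNet feature extractor (the patent does not specify the MobileNet variant or the layer at which the candidate feature map is taken):

```python
# Sketch of step S1 (assumed backbone: torchvision mobilenet_v2 trunk).
# The patent only states that a MobileNet network produces the candidate
# feature map F_t of shape (C1, H1, W1); the layer choice here is illustrative.
import torch
import torchvision

backbone = torchvision.models.mobilenet_v2(weights=None).features  # convolutional trunk only
backbone.eval()

def extract_candidate_features(frame: torch.Tensor) -> torch.Tensor:
    """frame: (3, H_I, W_I) image tensor -> candidate feature map of shape (C1, H1, W1)."""
    with torch.no_grad():
        return backbone(frame.unsqueeze(0)).squeeze(0)
```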

Further, in step S2, a feature fusion window of width w = s is set within the past time period of the current time point t. Let the video frame images to be fused within the feature fusion window be {I_{t−i}}, i ∈ [1, s], and the corresponding candidate feature maps be {F_{t−i}}, i ∈ [1, s]. Each frame image I_{t−i} to be fused is converted into a grayscale image G_{t−i}, and the Laplacian variance of the image is computed from the grayscale image, with the Laplacian operator evaluated at every coordinate (x, y) of the grayscale image G. By computing the second derivative of each pixel in every direction, the image Laplacian operator captures regions where pixel values change sharply and can be used to detect corners in the image; the Laplacian variance of the image reflects how the pixel values vary over the whole image: a large Laplacian variance indicates a relatively sharp image, whereas a small one indicates a relatively blurred image.

First, the Laplacian mean of each grayscale image G_{t−i} is computed by averaging the Laplacian response over all pixels, where H_I and W_I are the height and width of the grayscale image.

Next, the Laplacian variance of each grayscale image G_{t−i} is computed as the mean squared deviation of its Laplacian response from that mean.
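A small sketch of this sharpness measure; the use of OpenCV and its default 3×3 Laplacian kernel is an assumption, since the text only specifies "the Laplacian operator":

```python
# Sketch of the per-frame sharpness score: grayscale conversion, Laplacian
# response, then its mean and variance over all H_I x W_I pixels.
# cv2.Laplacian's default 3x3 kernel is an assumed choice of discrete operator.
import cv2
import numpy as np

def laplacian_variance(frame_bgr: np.ndarray) -> float:
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)   # G_{t-i}
    lap = cv2.Laplacian(gray, ddepth=cv2.CV_64F)          # second-derivative response
    mean = lap.mean()                                     # Laplacian mean
    return float(((lap - mean) ** 2).mean())              # Laplacian variance
```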

If a video frame is relatively sharp, its candidate features help detect the target; conversely, some frames are blurred by moving objects, and the candidate features of such frames are unfavourable for detecting the target. Video frames of different sharpness should therefore be assigned different fusion weights, so that the detection model concentrates on sharp rather than blurred features. First, the fusion weight α_{t−i} of every video frame to be fused is computed by normalizing the Laplacian variances of the frames within the window.

The frame candidate features within the feature fusion window are fused by weighted summation to obtain the temporal features at the current time point; the temporal features are then concatenated with the candidate features of the current frame along the channel dimension, which completes the fusion of the temporal information and yields the first feature map to be detected.
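The fusion itself might look like the following sketch. The variances are sum-normalized into the weights α_{t−i} (the text says the variances are normalized; plain sum-normalization rather than, say, a softmax is an assumption), the candidate feature maps are combined by a weighted sum, and the result is concatenated with the current frame's candidate features along the channel dimension:

```python
# Sketch of the attention-style temporal fusion of step S2.
# window_feats: list of s candidate feature maps F_{t-i}, each of shape (C1, H1, W1)
# window_vars:  list of s Laplacian variances for the same frames
# current_feat: candidate feature map F_t of the current frame, shape (C1, H1, W1)
import torch

def fuse_temporal_features(window_feats, window_vars, current_feat):
    vars_t = torch.tensor(window_vars, dtype=torch.float32)
    weights = vars_t / vars_t.sum()                   # alpha_{t-i}: sharper frames get larger weights
    stacked = torch.stack(window_feats, dim=0)        # (s, C1, H1, W1)
    temporal = (weights.view(-1, 1, 1, 1) * stacked).sum(dim=0)   # weighted sum over the window
    return torch.cat([current_feat, temporal], dim=0)             # channel concat -> (2*C1, H1, W1)
```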

Further, in step S3, once the feature map to be detected that fuses the temporal features at the current time point has been obtained, feature maps to be detected at additional scales are derived from it: a 3×3 convolutional layer and a 2×2 pooling layer perform further feature extraction while reducing the size of the feature map to be detected. Large feature maps to be detected retain rich local information and are suitable for predicting small objects, while small feature maps to be detected carry stronger global semantic information and are suitable for detecting larger objects. After e−1 rounds of feature extraction, e feature maps to be detected are finally obtained.
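A sketch of this extra-scale extraction; the channel width, the activation function, and the number of extra scales are illustrative placeholders, as the text only fixes the 3×3 convolution and 2×2 pooling:

```python
# Sketch of step S3: repeatedly apply a 3x3 convolution followed by 2x2 max
# pooling to obtain e feature maps of decreasing spatial size.
import torch
import torch.nn as nn

def build_extra_scales(in_ch: int, num_extra: int) -> nn.ModuleList:
    blocks = []
    for _ in range(num_extra):                        # e - 1 extra extraction stages
        blocks.append(nn.Sequential(
            nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),                    # activation is an assumption
            nn.MaxPool2d(kernel_size=2, stride=2),    # halves H and W
        ))
    return nn.ModuleList(blocks)

def extract_scales(x: torch.Tensor, blocks: nn.ModuleList):
    maps = [x]                                        # the fused map is the first map to be detected
    for blk in blocks:
        x = blk(x)
        maps.append(x)
    return maps                                       # e feature maps to be detected
```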

Further, in step S4, the additional feature extraction has produced multi-scale feature maps to be detected. Anchor boxes with prior positions are set on the maps to be detected at the different scales, and two 3×3 convolutional layers operate on these feature maps, using the channel dimension to predict, respectively, the offsets of the object bounding boxes relative to the anchor boxes and the categories of the objects. Let the number of categories be d (including the background); for each feature map to be detected, a 3×3 convolutional category prediction layer and a 3×3 convolutional bounding-box prediction layer produce the classification prediction result and the bounding-box prediction result.
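The per-scale prediction could be sketched as follows; laying the outputs out as (anchors × classes) and (anchors × 4) channels follows the usual SSD-style convention and is an assumption not stated in the text:

```python
# Sketch of step S4: 3x3 convolutional heads on one detection scale.
# num_anchors corresponds to n_i anchor boxes per pixel position and
# num_classes to d categories (including background).
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    def __init__(self, in_ch: int, num_anchors: int, num_classes: int):
        super().__init__()
        self.cls_head = nn.Conv2d(in_ch, num_anchors * num_classes, kernel_size=3, padding=1)
        self.box_head = nn.Conv2d(in_ch, num_anchors * 4, kernel_size=3, padding=1)

    def forward(self, feat: torch.Tensor):
        cls_pred = self.cls_head(feat)   # class scores for every anchor at every position
        box_pred = self.box_head(feat)   # offsets of each predicted box relative to its anchor
        return cls_pred, box_pred
```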

Brief Description of the Drawings

Figure 1 is a schematic diagram of the present invention.

Detailed Description of the Embodiments

The present invention is now described in further detail with reference to the accompanying drawing. The drawing is a simplified schematic that illustrates the basic structure of the invention in a schematic manner only, and therefore shows only the components related to the invention.

Embodiment 1

As shown in Figure 1, this embodiment provides a video object detection method based on an attention mechanism, comprising the following steps:

Step S1: input the video frame image at the current time point into a MobileNet network to extract a candidate feature map;

Step S2: set a temporal feature fusion window within the past time period adjacent to the current time point; for each video frame to be fused within the fusion window, compute its image Laplacian variance; normalize the variances and use them as the fusion weights of the frames to be fused; compute, according to these weights, a weighted sum of the candidate feature maps of all frames to be fused to obtain the temporal features required by the current frame; and concatenate the candidate features of the current frame with the temporal features along the channel dimension, obtaining a feature map to be detected that fuses the temporal information;

Step S3: use convolutional feature extraction layers and max pooling layers to extract feature maps to be detected at additional scales from the feature map to be detected;

Step S4: on the feature maps to be detected at the different scales, use convolutional layers to predict the object categories and bounding box coordinates in the current frame.

In step S1, to detect the video frame at the current time point t, the frame image I_t at the current time point is first input into MobileNet for feature extraction, where H_I and W_I are the height and width of the frame image; this yields the candidate feature map F_t ∈ ℝ^{C1×H1×W1}, where C1, H1, and W1 are the number of channels, the height, and the width of the candidate feature map, respectively.

In step S2, a feature fusion window of width w is set within the past time period of the current time point t. Let the length of the past time period be q; the feature fusion window width is then set according to the rule w = min(q, s): if the length of the past time period is greater than s, the fusion window width is set to s; if the length of the past time period is less than s, so that there are not enough features, the fusion window width is set to the length of the past time period.

Let the video frame images to be fused within the feature fusion window be {I_{t−i}}, i ∈ [1, s], and the candidate feature maps corresponding to the video frames to be fused within the feature fusion window be {F_{t−i}}, i ∈ [1, s]. Each frame image I_{t−i} to be fused is converted into a grayscale image G_{t−i}, and the Laplacian variance of the image is computed from the grayscale image, with the Laplacian operator evaluated at every coordinate (x, y) of the grayscale image G.

Here G(x, y) denotes the pixel value of the grayscale image G at coordinate (x, y). By computing the second derivative of each pixel in every direction, the image Laplacian operator captures regions where pixel values change sharply and can be used to detect corners in the image; the Laplacian variance of the image reflects how the pixel values vary over the whole image: a large Laplacian variance indicates a relatively sharp image, whereas a small one indicates a relatively blurred image.

First, the Laplacian mean of each grayscale image G_{t−i} is computed by averaging the Laplacian response over all pixels; H_I and W_I are the height and width of the grayscale image.

Next, the Laplacian variance of each grayscale image G_{t−i} is computed as the mean squared deviation of its Laplacian response from that mean.

If a video frame is relatively sharp, its candidate features help detect the target; conversely, some frames are blurred by moving objects, and the candidate features of such frames are unfavourable for detecting the target. Video frames of different sharpness should therefore be assigned different fusion weights, with sharper frames receiving larger weights, so that the detection model concentrates on sharp rather than blurred features. First, the fusion weight α_{t−i} of every video frame to be fused is computed by normalizing the Laplacian variances of the frames within the window.

The frame candidate features within the feature fusion window are fused by weighted summation to obtain the temporal features at the current time point.

The temporal features are concatenated with the candidate features of the current frame along the channel dimension, which completes the fusion of the temporal information and yields the first feature map to be detected.

In step S3, once the feature map to be detected that fuses the temporal features at the current time point has been obtained, feature maps to be detected at additional scales are derived from it: convolutional layers and pooling layers perform further feature extraction while reducing the size of the feature map to be detected. Large feature maps to be detected retain rich local information and are suitable for predicting small objects, while small feature maps to be detected carry stronger global semantic information and are suitable for detecting larger objects. After e−1 rounds of feature extraction, e feature maps to be detected are finally obtained.

In step S4, the additional feature extraction has produced multi-scale feature maps to be detected. Anchor boxes with prior positions are set on the maps to be detected at the different scales, and two convolutional layers operate on these feature maps, using the channel dimension to predict, respectively, the offsets of the object bounding boxes relative to the anchor boxes and the categories of the objects. Let the number of categories be d (including the background); for each feature map to be detected, whose number of channels, height, and width are C_{Fi}, H_{Fi}, and W_{Fi} respectively and which has n_i anchor boxes at each pixel position, the convolutional category prediction layer and the convolutional bounding-box prediction layer produce the classification prediction result and the bounding-box prediction result.

Claims (5)

In step S4, additional feature extraction yields multi-scale feature maps to be detected. Anchor boxes with prior positions are set on the maps to be detected at different scales, and two 3×3 convolutional layers operate on these feature maps, using the channel dimension to predict, respectively, the offsets of the object bounding boxes relative to the anchor boxes and the categories of the objects; each feature map to be detected is passed through a 3×3 convolutional category prediction layer and a 3×3 convolutional bounding-box prediction layer, yielding the classification prediction result and the bounding-box prediction result.
CN201910499786.9A (priority 2019-06-11, filed 2019-06-11) — Video object detection method based on attention mechanism — Active — granted as CN110287826B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910499786.9A (CN110287826B) | 2019-06-11 | 2019-06-11 | Video object detection method based on attention mechanism

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910499786.9A (CN110287826B) | 2019-06-11 | 2019-06-11 | Video object detection method based on attention mechanism

Publications (2)

Publication Number | Publication Date
CN110287826A | 2019-09-27
CN110287826B (en) | 2021-09-17

Family

ID=68003699

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910499786.9A (Active, CN110287826B) | Video object detection method based on attention mechanism | 2019-06-11 | 2019-06-11

Country Status (1)

Country | Link
CN (1) | CN110287826B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110674886A (en)*2019-10-082020-01-10中兴飞流信息科技有限公司Video target detection method fusing multi-level features
CN110751646A (en)*2019-10-282020-02-04支付宝(杭州)信息技术有限公司Method and device for identifying damage by using multiple image frames in vehicle video
CN111310609A (en)*2020-01-222020-06-19西安电子科技大学Video target detection method based on time sequence information and local feature similarity
CN112016472A (en)*2020-08-312020-12-01山东大学Driver attention area prediction method and system based on target dynamic information
CN112434607A (en)*2020-11-242021-03-02北京奇艺世纪科技有限公司Feature processing method and device, electronic equipment and computer-readable storage medium
CN112561001A (en)*2021-02-222021-03-26南京智莲森信息技术有限公司Video target detection method based on space-time feature deformable convolution fusion
CN112686913A (en)*2021-01-112021-04-20天津大学Object boundary detection and object segmentation model based on boundary attention consistency
CN113393491A (en)*2020-03-122021-09-14阿里巴巴集团控股有限公司Method and device for detecting target object from video and electronic equipment
CN113609995A (en)*2021-08-062021-11-05中国工商银行股份有限公司 Remote sensing image processing method, device and server
CN113688801A (en)*2021-10-222021-11-23南京智谱科技有限公司Chemical gas leakage detection method and system based on spectrum video
WO2022036567A1 (en)*2020-08-182022-02-24深圳市大疆创新科技有限公司Target detection method and device, and vehicle-mounted radar
CN114594770A (en)*2022-03-042022-06-07深圳市千乘机器人有限公司Inspection method for inspection robot without stopping
CN115131710A (en)*2022-07-052022-09-30福州大学 A real-time action detection method based on multi-scale feature fusion attention
CN116386134A (en)*2023-03-012023-07-04中国科学院深圳先进技术研究院 Timing action detection method, device, electronic device and storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN102393958A (en)*2011-07-162012-03-28西安电子科技大学Multi-focus image fusion method based on compressive sensing
CN103152513A (en)*2011-12-062013-06-12瑞昱半导体股份有限公司Image processing method and relative image processing device
CN103702032A (en)*2013-12-312014-04-02华为技术有限公司Image processing method, device and terminal equipment
CN105913404A (en)*2016-07-012016-08-31湖南源信光电科技有限公司Low-illumination imaging method based on frame accumulation
US20170127016A1 (en)*2015-10-292017-05-04Baidu Usa LlcSystems and methods for video paragraph captioning using hierarchical recurrent neural networks
CN107481238A (en)*2017-09-202017-12-15众安信息技术服务有限公司Image quality measure method and device
US20180060666A1 (en)*2016-08-292018-03-01Nec Laboratories America, Inc.Video system using dual stage attention based recurrent neural network for future event prediction
CN108921803A (en)*2018-06-292018-11-30华中科技大学A kind of defogging method based on millimeter wave and visual image fusion
CN109104568A (en)*2018-07-242018-12-28苏州佳世达光电有限公司The intelligent cleaning driving method and drive system of monitoring camera
CN109684912A (en)*2018-11-092019-04-26中国科学院计算技术研究所A kind of video presentation method and system based on information loss function
CN109829398A (en)*2019-01-162019-05-31北京航空航天大学A kind of object detection method in video based on Three dimensional convolution network

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN102393958A (en)*2011-07-162012-03-28西安电子科技大学Multi-focus image fusion method based on compressive sensing
CN103152513A (en)*2011-12-062013-06-12瑞昱半导体股份有限公司Image processing method and relative image processing device
CN103702032A (en)*2013-12-312014-04-02华为技术有限公司Image processing method, device and terminal equipment
US20170127016A1 (en)*2015-10-292017-05-04Baidu Usa LlcSystems and methods for video paragraph captioning using hierarchical recurrent neural networks
CN105913404A (en)*2016-07-012016-08-31湖南源信光电科技有限公司Low-illumination imaging method based on frame accumulation
US20180060666A1 (en)*2016-08-292018-03-01Nec Laboratories America, Inc.Video system using dual stage attention based recurrent neural network for future event prediction
CN107481238A (en)*2017-09-202017-12-15众安信息技术服务有限公司Image quality measure method and device
CN108921803A (en)*2018-06-292018-11-30华中科技大学A kind of defogging method based on millimeter wave and visual image fusion
CN109104568A (en)*2018-07-242018-12-28苏州佳世达光电有限公司The intelligent cleaning driving method and drive system of monitoring camera
CN109684912A (en)*2018-11-092019-04-26中国科学院计算技术研究所A kind of video presentation method and system based on information loss function
CN109829398A (en)*2019-01-162019-05-31北京航空航天大学A kind of object detection method in video based on Three dimensional convolution network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIN WANG: "Infrared dim target detection based on visual attention", Infrared Physics & Technology *
王昕: "Image sharpness evaluation algorithm based on lifting wavelet transform", Wanfang Data Knowledge Service Platform *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110674886B (en)*2019-10-082022-11-25中兴飞流信息科技有限公司Video target detection method fusing multi-level features
CN110674886A (en)*2019-10-082020-01-10中兴飞流信息科技有限公司Video target detection method fusing multi-level features
CN110751646A (en)*2019-10-282020-02-04支付宝(杭州)信息技术有限公司Method and device for identifying damage by using multiple image frames in vehicle video
CN111310609A (en)*2020-01-222020-06-19西安电子科技大学Video target detection method based on time sequence information and local feature similarity
CN113393491A (en)*2020-03-122021-09-14阿里巴巴集团控股有限公司Method and device for detecting target object from video and electronic equipment
WO2022036567A1 (en)*2020-08-182022-02-24深圳市大疆创新科技有限公司Target detection method and device, and vehicle-mounted radar
CN114450720A (en)*2020-08-182022-05-06深圳市大疆创新科技有限公司Target detection method and device and vehicle-mounted radar
CN112016472A (en)*2020-08-312020-12-01山东大学Driver attention area prediction method and system based on target dynamic information
CN112016472B (en)*2020-08-312023-08-22山东大学Driver attention area prediction method and system based on target dynamic information
CN112434607A (en)*2020-11-242021-03-02北京奇艺世纪科技有限公司Feature processing method and device, electronic equipment and computer-readable storage medium
CN112434607B (en)*2020-11-242023-05-26北京奇艺世纪科技有限公司Feature processing method, device, electronic equipment and computer readable storage medium
CN112686913A (en)*2021-01-112021-04-20天津大学Object boundary detection and object segmentation model based on boundary attention consistency
CN112686913B (en)*2021-01-112022-06-10天津大学 Object Boundary Detection and Object Segmentation Models Based on Boundary Attention Consistency
CN112561001A (en)*2021-02-222021-03-26南京智莲森信息技术有限公司Video target detection method based on space-time feature deformable convolution fusion
CN113609995A (en)*2021-08-062021-11-05中国工商银行股份有限公司 Remote sensing image processing method, device and server
CN113688801A (en)*2021-10-222021-11-23南京智谱科技有限公司Chemical gas leakage detection method and system based on spectrum video
CN114594770A (en)*2022-03-042022-06-07深圳市千乘机器人有限公司Inspection method for inspection robot without stopping
CN114594770B (en)*2022-03-042024-04-26深圳市千乘机器人有限公司Inspection method for inspection robot without stopping
CN115131710A (en)*2022-07-052022-09-30福州大学 A real-time action detection method based on multi-scale feature fusion attention
CN116386134A (en)*2023-03-012023-07-04中国科学院深圳先进技术研究院 Timing action detection method, device, electronic device and storage medium

Also Published As

Publication numberPublication date
CN110287826B (en)2021-09-17

Similar Documents

Publication | Publication Date | Title
CN110287826A (en) A Video Object Detection Method Based on Attention Mechanism
CN111460926B (en) A video pedestrian detection method incorporating multi-target tracking cues
CN111311666B (en)Monocular vision odometer method integrating edge features and deep learning
CN108509859B (en)Non-overlapping area pedestrian tracking method based on deep neural network
CN109242884B (en) Remote sensing video target tracking method based on JCFNet network
CN102426705B (en)Behavior splicing method of video scene
CN113344932B (en) A Semi-Supervised Single-Object Video Segmentation Method
WO2008020598A1 (en)Subject number detecting device and subject number detecting method
CN111160291B (en)Human eye detection method based on depth information and CNN
CN117949942A (en)Target tracking method and system based on fusion of radar data and video data
CN116645592B (en) A crack detection method and storage medium based on image processing
CN108256462A (en)A kind of demographic method in market monitor video
CN117974895B (en) A pipeline monocular video 3D reconstruction and depth prediction method and system
CN118172399A (en)Target ranging system based on self-supervision monocular depth estimation method
Liu et al.D-vpnet: A network for real-time dominant vanishing point detection in natural scenes
Guo et al.Monocular 3D multi-person pose estimation via predicting factorized correction factors
Midwinter et al.Unsupervised defect segmentation with pose priors
CN113888604B (en) A target tracking method based on deep optical flow
CN113920161B (en) A multi-target tracking method with fusion mechanism
CN111986233A (en)Large-scene minimum target remote sensing video tracking method based on feature self-learning
CN120147619A (en) Deformable object detection method based on adaptive feature extraction network and attention mechanism
WO2023093086A1 (en)Target tracking method and apparatus, training method and apparatus for model related thereto, and device, medium and computer program product
Long et al.Detail preserving residual feature pyramid modules for optical flow
CN110910497A (en)Method and system for realizing augmented reality map
CN112380970B (en)Video target detection method based on local area search

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
TR01 | Transfer of patent right

Effective date of registration:20241211

Address after:518000 1002, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province

Patentee after:Shenzhen Wanzhida Technology Co.,Ltd.

Country or region after:China

Address before:100124 No. 100 Chaoyang District Ping Tian Park, Beijing

Patentee before:Beijing University of Technology

Country or region before:China

TR01 | Transfer of patent right

Effective date of registration:20250521

Address after:Room 1-103, 1st Floor, Building 3, No. 5 Guangmao Street, Daxing Economic Development Zone, Daxing District, Beijing, 102600

Patentee after:Kuaima (Beijing) Electronic Technology Co.,Ltd.

Country or region after:China

Address before:518000 1002, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province

Patentee before:Shenzhen Wanzhida Technology Co.,Ltd.

Country or region before:China

TR01 | Transfer of patent right
