
Technical Field
The invention relates to the technical field of computer vision, and in particular to a weakly supervised method for estimating the number of objects based on a ranking network.
Background
Using cameras to count key objects such as people and vehicles in public places has significant research value. For example, crowd counts in a waiting hall and vehicle counts at traffic intersections can be used to optimize public-transport scheduling. A sudden change in the number of people in an area may either lead to an accident or result from one. Estimating the number of objects in images and videos is therefore valuable for intelligent security, and it is an important research topic in computer vision and intelligent video surveillance.
Current object counting methods fall roughly into three categories. 1) Object detection: the most direct approach, which detects the objects in an image and counts them. It works in scenes where objects are sparse but is much less effective when objects are crowded. 2) Clustering of visual feature trajectories: for video surveillance, a KLT tracker is typically combined with clustering, and the number of trajectory clusters is taken as the number of objects. 3) Feature-based regression: a regression model is built between image features and the number of objects, so that the number of objects in a scene can be estimated from measured image features. In crowded scenes, direct methods are easily disrupted by difficulties such as occlusion. Indirect methods, by contrast, start from the holistic features of the object group and can therefore count objects at large scale.
Existing feature-regression algorithms have the following shortcomings. First, annotating object locations is usually expensive. Existing object counting datasets provide the location of every object in order to train the count regression network. At evaluation time, however, these location labels are ignored and only the accuracy of the estimated count is assessed. When locations are not needed, it is possible to annotate only the number of objects in each image and train the counting model with a more efficient weakly supervised method.
Summary of the Invention
To solve the above technical problems, the present invention provides a weakly supervised method for estimating the number of objects based on a ranking network. The method requires no object location annotations, reduces manual labeling effort, and improves the accuracy of object number estimation.
The weakly supervised object number estimation method based on a ranking network according to the present invention comprises the following steps:
S1. Extract image features with a pre-trained deep neural network such as VGG-16, then regress a density map with convolution operations. Use adaptive pooling layers to extract multi-scale features from the density map that capture global and local information in the image, and feed these features to a fully connected layer that regresses the number of objects. The adaptive pooling layers are of two types: global sub-cluster layers and local sub-cluster layers.
S2. Train the multi-scale features with an object-count ranking network so that they become sensitive to the number of objects. The ranking network is a multi-branch network. Its input is the multi-scale features of several images, and its output is the ordering of those images by the number of objects they contain.
S3. In the ranking network, use a Sinkhorn layer to convert the ranking features into an ordinal matrix. Construct a soft-label transport matrix from the true number of objects in each image, and train the ranking network with a cross-entropy loss to obtain features that are sensitive to the number of objects. Then train the regression network to obtain the final object number regression model.
In the method of the present invention, step S1 is carried out as follows. Extract image features with a deep network model pre-trained on an image analysis task and regress a pseudo probability density map. Then build global sub-cluster layers from pooling layers with large strides to extract global features from the density map, and build local sub-cluster layers from pooling layers with small strides to extract local features from the density map.
In the method of the present invention, step S2 is carried out as follows. Fine-tune the feature extraction model with a multi-branch ranking network to obtain global and local features that are sensitive to the number of objects in the image.
In the method of the present invention, step S3 is carried out as follows. Use a differentiable Sinkhorn layer to convert the ranking features into an ordinal matrix, and construct a more effective soft-label transport matrix to train the ranking network. Train the ranking network with a cross-entropy loss and the regression network with a mean squared error loss.
The beneficial effects of the invention are as follows. The ranking network learns multi-scale, count-sensitive features from the relative numbers of objects across images and supplies them as input to the regression network. As a result, object location information is never used, and no large manual effort is needed to annotate object locations. The differentiable Sinkhorn layer allows the network to be trained end to end. Building the soft-label transport matrix from the relative numbers of objects in the images effectively reflects the difficulty of the ranking task and improves the accuracy of object number estimation.
Description of Drawings
Figure 1 is a schematic diagram of the present invention.
Detailed Description
Specific embodiments of the present invention are described in further detail below with reference to an example. The following example illustrates the present invention but does not limit its scope.
Example
S1. Extract image features with a pre-trained deep neural network such as VGG-16, then regress a density map with convolution operations. Use several pooling layers to extract multi-scale features from the density map that capture global and local information in the image, and feed them to a fully connected layer that regresses the number of objects. The adaptive pooling layers are of two types: global sub-cluster layers and local sub-cluster layers. The global sub-cluster layer uses three max-pooling layers with strides of 8, 16 and 32. The local sub-cluster layer uses two average-pooling layers with strides of 1 and 2.
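The pooling configuration in S1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions and is not the patented implementation. The sketch assumes that:

- the density map is a list of lists;
- each pooling window is non-overlapping, with size equal to its stride;
- the pooled maps are flattened and concatenated into one feature vector for the fully connected layer.

The helper names (`pool2d`, `mean`, `multiscale_features`) are hypothetical. Because the windows are fixed, the feature length depends on the density-map size. A truly adaptive layer, such as PyTorch's `nn.AdaptiveMaxPool2d`, would instead fix the output grid size.

```python
from typing import Callable, List, Sequence

Grid = List[List[float]]


def pool2d(density: Grid, stride: int, reducer: Callable[[List[float]], float]) -> Grid:
    """Non-overlapping 2-D pooling; the window size equals the stride (assumption)."""
    h, w = len(density), len(density[0])
    out: Grid = []
    for top in range(0, h, stride):
        row: List[float] = []
        for left in range(0, w, stride):
            window = [density[r][c]
                      for r in range(top, min(top + stride, h))
                      for c in range(left, min(left + stride, w))]
            row.append(reducer(window))
        out.append(row)
    return out


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def multiscale_features(density: Grid,
                        global_strides: Sequence[int] = (8, 16, 32),
                        local_strides: Sequence[int] = (1, 2)) -> List[float]:
    """Concatenate flattened max-pooled (global) and average-pooled (local) maps."""
    feats: List[float] = []
    for s in global_strides:  # global sub-cluster layer: max pooling
        feats.extend(v for row in pool2d(density, s, max) for v in row)
    for s in local_strides:   # local sub-cluster layer: average pooling
        feats.extend(v for row in pool2d(density, s, mean) for v in row)
    return feats
```

For a 32 × 32 density map, this yields 16 + 4 + 1 global values and 1024 + 256 local values.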
S2. Train the multi-scale features with an object-count ranking network so that they become sensitive to the number of objects. The ranking network is a multi-branch network. Its input is the multi-scale features of several images, and its output is the ordering of those images by the number of objects they contain. Specifically, a K-branch network can be used. Extract the multi-scale features $f_1, f_2, f_3, \dots, f_K$ of K images and compute the pairwise differences $f_1-f_2, f_1-f_3, \dots, f_1-f_K, f_2-f_3, \dots, f_2-f_K, \dots, f_{K-1}-f_K$. Feed these differences into the ranking network to obtain a $K(K-1)$-dimensional ranking vector $f_d$.
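The pairwise differences in S2 can be formed as in the following sketch, which assumes each branch outputs a flat list of floats. It produces the $K(K-1)/2$ difference vectors $f_i - f_j$ with $i < j$, in the order listed above. The ranking layers that map these differences to $f_d$ are not specified in detail and are omitted. The function name is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def pairwise_differences(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return f_i - f_j for all pairs i < j, ordered (1,2), (1,3), ..., (K-1,K)."""
    # combinations() yields index pairs in lexicographic order, matching the text.
    return [
        [a - b for a, b in zip(features[i], features[j])]
        for i, j in combinations(range(len(features)), 2)
    ]
```

With four images, for example, the function returns six difference vectors.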
S3. In the ranking network, a Sinkhorn layer converts the ranking feature $f_d$ into an ordinal matrix $P$. The element $P_{i,j}$ in row $i$ and column $j$ is the probability that the $i$-th image is ranked in position $j$. A soft-label transport matrix is then constructed from the true number of objects in each image.
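A Sinkhorn layer of the kind used in S3 can be sketched with standard Sinkhorn normalization. The layer exponentiates a $K \times K$ score matrix, then alternately normalizes its rows and columns until the matrix is approximately doubly stochastic. Each $P_{i,j}$ can then be read as the probability that image $i$ occupies position $j$.

The patent does not state how $f_d$ is arranged into a $K \times K$ score matrix, what temperature is used, or how many iterations are run. This sketch therefore takes a ready-made score matrix as input, and those parameters are assumptions. Every step is a differentiable operation, which is what allows end-to-end training when the same computation is written in an autodiff framework.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn(scores: Matrix, n_iters: int = 100, temperature: float = 1.0) -> Matrix:
    """Map a K x K score matrix to an approximately doubly stochastic matrix."""
    k = len(scores)
    # Subtract the global maximum before exponentiating, for numerical stability.
    top = max(max(row) for row in scores)
    p = [[math.exp((s - top) / temperature) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row step: each image's probabilities over the K positions sum to 1.
        row_sums = [sum(row) for row in p]
        p = [[v / rs for v in row] for row, rs in zip(p, row_sums)]
        # Column step: each position's probabilities over the K images sum to 1.
        col_sums = [sum(p[i][j] for i in range(k)) for j in range(k)]
        p = [[p[i][j] / col_sums[j] for j in range(k)] for i in range(k)]
    return p
```

Lower temperatures tend to push $P$ toward a hard permutation matrix.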
Let $\sigma$ denote the true ranking of the images, where its $i$-th element $\sigma(i)$ means that the $i$-th image is ranked in position $\sigma(i)$. The elements of the soft-label matrix are then computed as follows:
where
$\Delta_{thr}$ is a predefined threshold. The ranking network is then trained with the following cross-entropy loss to obtain features that are sensitive to the number of objects.
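The loss in S3 needs two ingredients: the true ranking $\sigma$ derived from the ground-truth counts, and a target matrix to compare with $P$. The soft-label formula with threshold $\Delta_{thr}$ is not reproduced in this text. The sketch below therefore substitutes a hard permutation matrix built from $\sigma$ as the target. It uses the common matrix cross-entropy $-\sum_{i,j} T_{i,j} \log P_{i,j}$, averaged over images. The ascending sort order, the tie-breaking rule, and all function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def true_ranking(counts: Sequence[float]) -> List[int]:
    """sigma[i] = 1-based position of image i when sorted by count.
    Ascending order and tie-breaking by index are assumptions."""
    order = sorted(range(len(counts)), key=lambda i: (counts[i], i))
    sigma = [0] * len(counts)
    for position, image in enumerate(order, start=1):
        sigma[image] = position
    return sigma


def hard_target(sigma: Sequence[int]) -> Matrix:
    """Permutation matrix with T[i][sigma[i]-1] = 1 (stand-in for the soft labels)."""
    k = len(sigma)
    return [[1.0 if j == sigma[i] - 1 else 0.0 for j in range(k)] for i in range(k)]


def ranking_cross_entropy(target: Matrix, p: Matrix, eps: float = 1e-12) -> float:
    """-sum_ij T_ij * log P_ij, averaged over the K images."""
    k = len(target)
    total = 0.0
    for i in range(k):
        for j in range(k):
            total -= target[i][j] * math.log(p[i][j] + eps)
    return total / k
```

For counts `[12, 3, 40]`, the image with 3 objects is ranked first, so `true_ranking` returns `[2, 1, 3]`.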
Finally, the regression network is trained with a mean squared error loss, yielding the object number regression model.
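The final MSE stage can be illustrated with a linear count-regression head trained by full-batch gradient descent. This is a plain-Python stand-in that assumes the multi-scale features are already fixed vectors and that the head is a single linear layer. In practice, the fully connected layer would be trained in a deep-learning framework. Only image-level counts enter the loss, which is what makes the scheme weakly supervised. The function names, learning rate, and epoch count are hypothetical.

```python
from typing import List, Sequence, Tuple


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and true counts."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def train_linear_head(features: Sequence[Sequence[float]], counts: Sequence[float],
                      lr: float = 0.05, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit count ~ w . x + b by full-batch gradient descent on the MSE loss."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        preds = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in features]
        errors = [p - c for p, c in zip(preds, counts)]
        # Gradient of mean((pred - count)^2) with respect to w and b.
        grad_w = [2.0 / n * sum(e * x[k] for e, x in zip(errors, features))
                  for k in range(d)]
        grad_b = 2.0 / n * sum(errors)
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b
```

On toy data where count = 10 × feature, the fitted weight approaches 10 and the bias approaches 0.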
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the technical principle of the present invention. Such improvements and modifications should also be regarded as falling within the protection scope of the present invention.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010845336.3A (CN112101122B) | 2020-08-20 | 2020-08-20 | Weak supervision object number estimation method based on sorting network |
| Publication Number | Publication Date |
|---|---|
| CN112101122A | 2020-12-18 |
| CN112101122B | 2024-02-09 |
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20240209 |