CN110751163A

Info

Publication number: CN110751163A
Application number: CN201810821904.9A
Authority: CN
Inventors: 张鹏
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Guangzhou Gaohang Technology Transfer Co ltd
Priority date: 2018-07-24
Filing date: 2018-07-24
Publication date: 2020-02-04
Anticipated expiration: 2038-07-24
Also published as: CN110751163B

Abstract

The invention discloses a target positioning method and device, a computer-readable storage medium, and an electronic device. The target positioning method includes: performing feature extraction on an image to be recognized to obtain multiple channel features; for any channel feature, determining a weighting parameter corresponding to the channel feature, where the weighting parameter characterizes the correlation between the channel feature and the target position located in the image to be recognized; correcting the channel feature according to its weighting parameter; and locating, using the corrected channel features, the target position containing the target in the image to be recognized. The target positioning method can improve the accuracy of target positioning.

Description

Target positioning method and device, computer-readable storage medium, and electronic device

Technical Field

The present invention relates to the technical field of image recognition, and in particular to a target positioning method and device, a computer-readable storage medium, and an electronic device.

Background

Image recognition refers to technology that uses computers to process, analyze, and understand images in order to detect and identify targets of various patterns in the images.

A target positioning method based on image recognition identifies a specific target in an image and determines the position of that target in the image. At present, a neural network can be used for target positioning.

The accuracy of existing neural-network-based target positioning methods needs further improvement.

Summary of the Invention

The present invention provides a target positioning method and device, a computer-readable storage medium, and an electronic device, so as to address the deficiencies in the related art.

According to a first aspect of the embodiments of the present invention, a target positioning method is provided, including:

performing feature extraction on an image to be recognized to obtain multiple channel features;

for any channel feature, determining a weighting parameter corresponding to the channel feature, where the weighting parameter characterizes the correlation between the channel feature and the target position located in the image to be recognized;

correcting the channel feature according to the weighting parameter corresponding to the channel feature;

locating, using the corrected channel features, the target position containing the target in the image to be recognized.

Optionally, performing feature extraction on the image to be recognized to obtain multiple channel features includes:

inputting the image to be recognized into a trained neural network, and performing feature extraction on the image by a convolutional layer of the neural network to obtain multiple channel features;

where the neural network is trained through the following steps:

building a neural network that includes a convolutional layer, a pooling layer, and a fully connected layer;

acquiring training samples, the training samples including labeled images labeled with target types;

inputting the training samples into the neural network so that the neural network outputs a target type recognition result for each labeled image, and updating the parameters of the neural network according to the difference between the target type recognition result output by the neural network and the target type in the training sample;

obtaining the trained neural network after the neural network has been trained on the training samples.

Optionally, after acquiring the training samples including the labeled images labeled with target types, the method further includes:

performing occlusion preprocessing on partial regions of the labeled images.

Optionally, determining the weighting parameter corresponding to the channel feature includes:

inputting the multiple channel features output by the convolutional layer into the fully connected layer, the fully connected layer determining the weighting parameter corresponding to each channel feature.

Optionally, determining the weighting parameter corresponding to the channel feature includes:

taking the derivative of each feature in the channel feature to obtain the derivative of each feature;

using the average of the calculated derivatives as the weighting parameter corresponding to the channel feature.

Optionally, locating, using the corrected channel features, the target position containing the target in the image to be recognized includes:

obtaining a response value for each position of the image to be recognized according to the corrected channel features, the response value representing the probability that a target exists at that position;

determining the positions whose response values are greater than a threshold, and taking a region that includes those positions as the target position.

According to a second aspect of the embodiments of the present invention, a target positioning device is provided, including:

a feature extraction module, configured to perform feature extraction on an image to be recognized to obtain multiple channel features;

a weighting parameter determination module, configured to determine, for any channel feature, a weighting parameter corresponding to the channel feature, where the weighting parameter characterizes the correlation between the channel feature and the target position located in the image to be recognized;

a feature correction module, configured to correct the channel feature according to the weighting parameter corresponding to the channel feature;

a target position locating module, configured to locate, using the corrected channel features, the target position containing the target in the image to be recognized.

Optionally, the feature extraction module is specifically configured to:

The device further includes a training module, the training module being configured to:

obtain a trained neural network after the neural network has been trained on a certain number of training samples.

Optionally, the weighting parameter determination module is specifically configured to:

input the multiple channel features output by the convolutional layer into the fully connected layer, the fully connected layer determining the weighting parameter corresponding to each channel feature.

take the derivative of each feature in each channel feature to obtain the derivative of each feature;

use the average of the calculated derivatives of the features of each channel as the weighting parameter corresponding to that channel feature.

Optionally, the target position locating module is specifically configured to:

determine the positions whose response values are greater than a threshold, and take a region that includes those positions as the target position.

According to a third aspect of the embodiments of the present invention, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, any one of the methods described above is implemented.

According to a fourth aspect of the embodiments of the present invention, an electronic device is provided, including a processor and a machine-readable storage medium, the machine-readable storage medium storing machine-executable instructions executable by the processor, the processor being caused by the machine-executable instructions to perform any one of the methods described above.

According to the above technical solutions, the target positioning method can improve the accuracy of target positioning.

It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present invention.

Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present invention and, together with the description, serve to explain the principles of the invention.

FIG. 1 is a flowchart of a target positioning method according to an exemplary embodiment of the present invention;

FIG. 2 is a flowchart of a target positioning method according to another exemplary embodiment of the present invention;

FIGS. 3A-3C are effect diagrams of target positions located in images to be recognized using the target positioning method provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of the process of locating a target position using the target positioning method according to an exemplary embodiment of the present invention;

FIG. 5 is a schematic diagram of a visual analysis of multi-channel features according to an exemplary embodiment of the present invention;

FIG. 6 is a block diagram of a target positioning device according to yet another embodiment of the present invention;

FIG. 7 is a hardware structure diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of devices and methods consistent with some aspects of the invention as recited in the appended claims.

The target positioning method of the present invention is a positioning method based on image recognition technology: the position of a specific target in an image is located by recognizing the image. The target positioning method provided by the present invention helps improve the accuracy of target positioning.

Several specific embodiments are given below to describe the technical solutions of the present application in detail. These embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments.

FIG. 1 is a flowchart of a target positioning method provided by an exemplary embodiment of the present invention. Referring to FIG. 1, the target positioning method includes:

Step S10, performing feature extraction on the image to be recognized to obtain multiple channel features;

Step S20, for any channel feature, determining the weighting parameter corresponding to the channel feature, where the weighting parameter characterizes the correlation between the channel feature and the target position located in the image to be recognized;

Step S30, correcting the channel feature according to the weighting parameter corresponding to the channel feature;

Step S40, locating, using the corrected channel features, the target position containing the target in the image to be recognized.
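The correction in step S30 can be illustrated with a minimal sketch. The text does not state the exact form of the correction; the sketch assumes it is a per-channel multiplication of each channel feature by its weighting parameter. The function name `correct_channel_features` and the toy values are hypothetical.

```python
import numpy as np

def correct_channel_features(features, weights):
    """Step S30 sketch: scale each channel feature by its weighting parameter.

    features: array of shape (H, W, C), one H x W map per channel.
    weights:  array of shape (C,), one weighting parameter per channel.
    Assumption: "correction" means element-wise multiplication.
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if features.ndim != 3 or weights.shape != (features.shape[2],):
        raise ValueError("expected features (H, W, C) and weights (C,)")
    # Broadcasting multiplies every position of channel c by weights[c].
    return features * weights

# Toy input: two 2 x 2 channel features; the second is treated as less relevant.
features = np.stack([np.ones((2, 2)), 2.0 * np.ones((2, 2))], axis=-1)
weights = np.array([0.9, 0.1])
corrected = correct_channel_features(features, weights)
```

Under this assumption, a channel with a small weighting parameter contributes little to any later per-position response, while a channel with a large weighting parameter is largely preserved.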

The present invention belongs to the field of target positioning/detection under weak supervision in machine vision; that is, the information used to train the positioning/detection algorithm is not the commonly used bounding box annotation but the category information of the picture. Although only the category information of the picture is used, through operations such as random occlusion at the data layer and weighting of the feature maps, the method can locate the target fairly accurately. It can serve as a front-end module for tasks such as sample annotation, target classification, and recognition, reducing the learning difficulty of those tasks.

The target positioning method for weakly supervised scenarios proposed by the present invention adjusts the network parameters under the guidance of weak supervision information (picture category information) and extracts representative features of the target. These features produce specific responses to different parts of the target, and weighting the feature maps yields both the type of the target and its position information.

The image to be recognized may be an image captured in real time by an image acquisition device (for example, a video camera or a camera), or an image stored in advance by the device to which the method is applied.

An image recognition algorithm or a deep-learning-based neural network can be used to recognize the image to be recognized (hereinafter, the image) and extract multiple channel features from it. A channel feature is the output of feature detection on the image; one channel feature represents the output of detecting one particular feature. When a neural network performs the feature extraction, a channel feature is the output of filtering with a convolution filter and may also be called a feature map. The number of channel features depends on the number of convolution filters used.

Channel features can represent global features of the image (that is, object-level features) or local features (that is, features of a certain part of an object, such as head features or torso features). Global features include, for example, texture features, color features, and spatial relationship features of the image; local features include, for example, shape features of the parts of an object contained in the image and object edge features.

The target is an object of a specific type that is to be recognized in the image. Which object types can be recognized as targets depends on the channel features that can be extracted and on the algorithm that classifies based on those features. The target position is the region of the image where the target is located. For image recognition, the region where the target is located can be delimited in the image, for example by a quadrilateral frame, a polygonal frame, or a frame of another shape; the position of that frame is the target position.

The weighting parameter corresponding to a channel feature characterizes the correlation between that channel feature and the target position located in the image to be recognized. In other words, the weighting parameter indicates how much the corresponding channel feature influences the target position recognition result: the larger the weighting parameter, the greater the influence. For example, when recognizing a vehicle in an image, channel features that characterize the wheels, windows, and logo of the vehicle have a large influence on the vehicle position recognition result, so their weighting parameters are large; channel features that characterize the color or texture of the vehicle have a small influence on the recognition result, so their weighting parameters are small.

After each channel feature is corrected using its weighting parameter, the weight of channel features with a large influence on the target recognition result is strengthened and the weight of channel features with a small influence is weakened. As a result, not only can the type of the target be recognized fairly accurately, but the position of the target can also be identified more easily, improving the accuracy of target positioning.

In an optional implementation, performing feature extraction on the image to be recognized to obtain multiple channel features, as described in step S10, includes:

inputting the image to be recognized into a trained neural network, and performing feature extraction on the image by a convolutional layer of the neural network to obtain multiple channel features.

In this embodiment, a neural network is used for feature extraction. The neural network includes one or more convolutional layers, and multiple channel features are obtained after convolution processing. Each convolutional layer may include one or more convolution kernels. A convolution kernel slides over the image with a certain stride and convolves each region of the image; convolution with each kernel yields one channel feature. The number of channel features finally obtained is determined by the number of convolution kernels in the last convolutional layer.

For example, suppose the convolution kernel is 4×4 and the image is 16×16; the stride may be 1, 2, 3, 4, or 6, and so on. The kernel slides with that stride and convolves each region of the image in turn. Convolving the whole image with one kernel yields one channel feature, and convolving with multiple kernels yields multiple channel features.

The multiple channel features can form a three-dimensional matrix of size H×W×C, where H is the height of a channel feature, that is, the number of pixels along its vertical direction; W is the width of a channel feature, that is, the number of pixels along its horizontal direction; and C is the number of channels. The number of channels is determined by the number of convolution kernels in the last convolutional layer of the base convolutional network, each of which computes the feature map of one channel. Note that multiple convolution kernels may be used in the convolution calculation; each kernel computes the channel feature of one channel, a single channel feature can be written as H×W×1, and one channel feature corresponds to one channel.
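The strided convolution and the H×W×C arrangement described above can be sketched as follows. This is an illustrative sketch only: it assumes a single-channel image, square kernels, and no padding, and the kernel count of 3 and the random values are hypothetical. As is usual in CNNs, the "convolution" is computed as a sliding dot product without flipping the kernel.

```python
import numpy as np

def conv_output_size(input_size, kernel_size, stride):
    """Number of kernel positions along one axis with no padding."""
    return (input_size - kernel_size) // stride + 1

def conv2d_single(image, kernel, stride):
    """Slide one square kernel over a 2-D image; returns one channel feature."""
    k = kernel.shape[0]
    out_h = conv_output_size(image.shape[0], k, stride)
    out_w = conv_output_size(image.shape[1], k, stride)
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(patch * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((16, 16))                      # 16 x 16 image, as in the example
kernels = [rng.random((4, 4)) for _ in range(3)]  # C = 3 kernels (hypothetical count)
# One channel feature per kernel, stacked along the last axis: H x W x C.
feature_stack = np.stack(
    [conv2d_single(image, k, stride=4) for k in kernels], axis=-1
)
```

With a 4×4 kernel, a 16×16 image, and a stride of 4, each axis has (16 − 4) / 4 + 1 = 4 kernel positions, so `feature_stack` has shape 4×4×3.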

The above neural network is a deep-learning-based neural network, for example a convolutional neural network (CNN). A CNN is a feedforward artificial neural network whose neurons respond to surrounding units within a limited receptive range, and it effectively extracts feature information from images through weight sharing and feature aggregation.

The above neural network is trained in a weakly supervised manner. The training process includes the following steps:

Step S01, building a neural network that includes a convolutional layer;

The neural network may include one or more convolutional layers.

Step S02, acquiring training samples, the training samples including labeled images labeled with target types;

In this step, labeled images are used as training samples. A labeled image is an image labeled with target types: it is only necessary to label which types of targets appear in the image, not where they are. A labeled image is thus a coarsely labeled image, for example an image carrying the labels "cow", "grass", and "sky". The neural network knows only that objects with these labels are present in the image, not their specific positions, so each pixel of the image could belong to "cow", "grass", or "sky".

Step S03, inputting the training samples into the neural network so that the neural network outputs a target type recognition result for each labeled image, and updating the parameters of the neural network according to the difference between the target type recognition result output by the neural network and the target type in the training sample.

Step S04, obtaining the trained neural network after the neural network has been trained on the training samples.

Specifically, a training sample is, for example, a labeled image X. Feature extraction by the neural network yields multiple channel features, which effectively preserve the relative spatial relationships of the target. Let Y = f(X) denote the target type recognition result output by the neural network, where f is the collective description of the neural network operations (including convolution, pooling, and full connection). If the weakly supervised task is classification, Y represents the probability that the labeled image X belongs to the target type; if the weakly supervised task is image annotation, Y represents the probability that the labeled image X carries the annotation.

The difference between Y and the target type in the training sample supervises the update of the parameters in the neural network, so that the neural network can be trained end to end.

The above parameters include, for example, the parameters of the functions used in the neural network. They can be modified through gradient backpropagation so as to minimize the difference between the target type recognition result output by the neural network and the target type in the training sample.
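The parameter update driven by the difference between Y and the labeled target type can be illustrated with a deliberately small stand-in. The sketch below is not the network of the embodiment: it assumes a single linear layer followed by softmax in place of f, cross-entropy as the measure of difference, and plain gradient descent. All names, sizes, and the learning rate are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_step(W, X, labels, lr=0.1):
    """One gradient-descent update of a linear softmax classifier.

    X: (N, D) inputs, labels: (N,) integer class ids, W: (D, K) parameters.
    Returns the updated parameters and the cross-entropy loss before the update.
    """
    n = X.shape[0]
    probs = softmax(X @ W)                       # Y = f(X)
    loss = -np.mean(np.log(probs[np.arange(n), labels]))
    grad_logits = probs.copy()
    grad_logits[np.arange(n), labels] -= 1.0     # d(loss)/d(logits) for cross-entropy
    grad_W = X.T @ grad_logits / n               # propagate the gradient back to W
    return W - lr * grad_W, loss

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))                      # 8 toy samples with 4 features each
labels = np.array([0, 1, 0, 1, 0, 1, 0, 1])      # image-level class labels only
W = np.zeros((4, 2))
losses = []
for _ in range(20):
    W, loss = train_step(W, X, labels)
    losses.append(loss)
```

Only class labels enter the loss; no position information is used, which is the sense in which the training is weakly supervised.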

A certain number of training samples can be input into the neural network to train it; after the neural network has been trained on that number of samples, the trained neural network is obtained.

Because the neural network is trained in a weakly supervised manner, only the target type needs to be labeled in each training sample, which requires far less work than labeling the specific position of every target.

The weakly supervised manner means training with only image-level labels, which provide weak supervision: without knowing the specific position of the target in the image, the target types contained in the image are used to recognize and locate the target in the image.

In an optional implementation, after acquiring the training samples including the labeled images labeled with target types, the method further includes:

performing occlusion processing on partial regions of the labeled images.

When the neural network is trained in the weakly supervised manner described above, the features it learns are mainly those of the salient regions of the target, and features of non-salient regions are harder to learn. To force the neural network to attend to non-salient regions and learn their general features, the sample images are randomly occluded, so that the network learns not only the salient features in the sample images but also the general features, which improves positioning accuracy. For the training samples, partial regions of the labeled images are occluded; with a large number of training samples, a random part of each labeled image can be occluded. For example, a labeled image can be divided into regions of various sizes (such as 32*32 or 64*64), and the color of one or some of these regions is changed to black with a certain probability so as to occlude them.
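The random occlusion described above can be sketched as follows. The sketch assumes a regular grid of square cells of a single size and an independent hiding probability per cell; the function name `random_occlude`, the parameter names, and the default values are hypothetical.

```python
import numpy as np

def random_occlude(image, patch=32, hide_prob=0.5, rng=None):
    """Return a copy of `image` with randomly chosen grid cells set to black."""
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    h, w = image.shape[:2]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            # Each cell is hidden independently with probability hide_prob.
            if rng.random() < hide_prob:
                out[top:top + patch, left:left + patch] = 0  # black out this cell
    return out

image = np.full((64, 64, 3), 255, dtype=np.uint8)   # toy all-white image
occluded = random_occlude(image, patch=32, hide_prob=0.5,
                          rng=np.random.default_rng(0))
```

Applying a fresh random mask each time a sample is drawn means that different parts of the same labeled image are hidden across training iterations.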

In some examples, determining the weighting parameter corresponding to each channel feature, as described in step S20, includes:

Step S21, inputting the multiple channel features output by the convolutional layer into the fully connected layer, the fully connected layer determining the weighting parameter corresponding to each channel feature.

The convolutional layer outputs multiple channel features after convolution processing. Each channel feature can characterize global and local properties of the target, such as its shape and color. After the multiple channel features are input into the fully connected layer, the fully connected layer can screen them according to certain rules and identify the key regions and non-key regions of the target. The weighting parameter of each channel feature is determined accordingly: channel features corresponding to key regions of the target receive larger weighting parameters, and channel features corresponding to non-key regions receive smaller ones.
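The text does not specify how the fully connected layer maps channel features to weighting parameters. One possible arrangement, shown here purely as an assumption, summarizes each channel by its spatial mean, applies a single fully connected layer, and squashes the result with a sigmoid so that each weighting parameter lies between 0 and 1. The function name and the random layer parameters are hypothetical; in practice the layer parameters would be learned during training.

```python
import numpy as np

def fc_channel_weights(features, fc_weight, fc_bias):
    """Map (H, W, C) channel features to C weighting parameters in (0, 1)."""
    pooled = features.mean(axis=(0, 1))       # assumption: summarize each channel by its mean
    logits = pooled @ fc_weight + fc_bias     # fully connected layer, (C,) -> (C,)
    return 1.0 / (1.0 + np.exp(-logits))      # assumption: sigmoid keeps weights in (0, 1)

rng = np.random.default_rng(0)
features = rng.random((4, 4, 3))              # toy H x W x C channel features
fc_weight = rng.normal(size=(3, 3))           # hypothetical stand-in for learned parameters
fc_bias = np.zeros(3)
weights = fc_channel_weights(features, fc_weight, fc_bias)
```

The output has one entry per channel, so it can be used directly to scale the corresponding channel features.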

在一些例子中，还可以通过下述方法确定每个通道对应的加权参数，该方法包括：In some examples, the weighting parameter corresponding to each channel can also be determined by the following method, and the method includes:

步骤S22、对通道特征中各特征进行求导得到各特征的导数；Step S22, derivation of each feature in the channel feature to obtain the derivative of each feature;

步骤S23、将计算出的各特征的导数的加权平均值作为该通道特征对应的加权参数。Step S23 , taking the calculated weighted average of the derivatives of each feature as a weighting parameter corresponding to the channel feature.

上述实施例是通过神经网络的全连接层确定每个通道特征对应的加权参数，本实施例中是确定每个通道特征对应的加权参数的另一种方法，具体而言，每个通道特征可以包括多个位置的特征，对于每个通道特征而言，对每个位置的特征进行求导，得到各特征的导数，然后计算这些导数的权平均值，将该平均值作为该通道特征对应的加权参数。The above embodiment is to determine the weighting parameter corresponding to each channel feature through the fully connected layer of the neural network. In this embodiment, it is another method for determining the weighting parameter corresponding to each channel feature. Specifically, each channel feature can be Including the features of multiple positions, for each channel feature, the feature of each position is derived to obtain the derivative of each feature, and then the weighted average value of these derivatives is calculated, and the average value is used as the corresponding channel feature. Weighting parameters.

Each channel feature can be represented by a function whose points correspond to the features at the respective positions. Computing the derivative at each point of the function yields the derivative of each feature. The derivative of a function at a point is the slope of the tangent, at that point, to the curve the function represents.

In this embodiment, the weighting parameters corresponding to the channel features are determined by differentiating the channel features output by the convolutional layer. The differentiation helps capture contour and texture features of the target and weakens the influence of image illumination on target recognition, and therefore helps improve the accuracy of target positioning.
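Steps S22 and S23 can be sketched as follows. The source does not say which variable the derivative is taken with respect to or which weights enter the weighted average, so this sketch assumes a spatial finite-difference derivative (gradient magnitude at each position) and uniform position weights by default; the names `channel_weights_from_derivatives` and `position_weights` are hypothetical.

```python
import numpy as np

def channel_weights_from_derivatives(features, position_weights=None):
    """Compute one weighting parameter per channel from (H, W, C) features.

    Assumptions: the derivative at each position is approximated by the
    magnitude of the spatial finite-difference gradient, and the weighted
    average uses uniform position weights unless others are supplied.
    """
    h, w, _ = features.shape
    dy, dx = np.gradient(features, axis=(0, 1))   # step S22: derivative at each position
    magnitude = np.hypot(dx, dy)                  # (H, W, C)
    if position_weights is None:
        position_weights = np.ones((h, w))
    normalized = position_weights / position_weights.sum()
    # Step S23: weighted average of the derivatives within each channel.
    return np.tensordot(normalized, magnitude, axes=([0, 1], [0, 1]))

# A flat channel (no contours) and a ramp channel (constant slope of 1).
flat = np.ones((4, 5))
ramp = np.tile(np.arange(5.0), (4, 1))
weights = channel_weights_from_derivatives(np.stack([flat, ramp], axis=2))
```

Under these assumptions a channel with no spatial variation receives a weight of zero, which matches the stated intent that differentiation emphasizes contours and texture and is insensitive to a uniform brightness offset.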

In an optional implementation, as shown in FIG. 2, locating the target position containing the target in the image to be recognized by using the corrected channel features, as described in step S40 above, includes:

Step S41: obtaining the response value of each position of the image to be recognized according to the corrected channel features, where the response value represents the probability that a target exists at that position;

Step S42: determining the positions whose response values are greater than a threshold, and taking the area that includes those positions as the target position.

The multiple channel features output by the convolutional layer are multi-dimensional data. For example, the channel features F(X_o) form a matrix of shape H*W*C, where H is the height of the matrix, W is its width, and C is the number of channels; the value at each position in the matrix corresponds to one channel feature.

After the channel features are weighted by their weighting parameters, each corrected channel feature can represent the response value of each position of the image. The response value represents the probability that a target exists at that position: the larger the response value, the more likely it is that a target exists there. Correcting each channel feature with its weighting parameter strengthens the weight of channel features that have a large influence on the target recognition result and weakens the weight of channel features that have a small influence on it, which helps locate the target position more accurately.
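The correction and the per-position response values can be sketched as follows. The source does not give the exact formula, so this assumes that correction is a per-channel multiplication and that the response map is the sum of the corrected channels, min-max normalized to [0, 1] so it can be read as a probability-like score; `correct_channels` and `response_map` are hypothetical names.

```python
import numpy as np

def correct_channels(features, weights):
    """Scale each channel of (H, W, C) features by its weighting parameter (C,)."""
    return features * weights  # broadcasts over the channel axis

def response_map(corrected):
    """Collapse corrected channels into one response value per position.

    Assumption: sum over channels, then min-max normalize to [0, 1].
    """
    summed = corrected.sum(axis=2)
    lo, hi = summed.min(), summed.max()
    if hi == lo:
        return np.zeros_like(summed)
    return (summed - lo) / (hi - lo)

# Channel 0 fires at the top-left corner, channel 1 at the bottom-right corner.
features = np.zeros((2, 2, 2))
features[0, 0, 0] = 1.0
features[1, 1, 1] = 1.0
response = response_map(correct_channels(features, np.array([1.0, 0.0])))
```

Here the second channel is given a weight of zero, so its activation no longer contributes to the response map, while the activation of the first channel is preserved.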

The response values of the positions may differ, that is, the probability that a target exists differs from position to position in the image. To locate the target further, a threshold is set and only the positions whose response values are greater than the threshold are retained; these are the positions where a target is likely to exist. In this way the positions of the individual parts of the target are screened out of the image and the background is filtered out. For example, if the target is a person, the positions whose response values are greater than the threshold may include the positions of the head, body, feet, arms and so on, and these positions indicate where the local regions of the target are. The area containing these positions is taken as the target position. For example, a bounding rectangle containing these positions is drawn as an annotation box, and the area enclosed by the annotation box is the target position, which completes the positioning of the target.
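The threshold screening and bounding rectangle described above can be sketched as follows. The function name `locate_target`, the `(x_min, y_min, x_max, y_max)` box convention, and the threshold value are assumptions made for illustration.

```python
import numpy as np

def locate_target(response, threshold):
    """Return the bounding rectangle of all positions whose response exceeds threshold.

    The box is (x_min, y_min, x_max, y_max) in array coordinates, or None
    when no position passes the threshold.
    """
    ys, xs = np.nonzero(response > threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Two strong responses (parts of the target) and one weak background response.
response = np.zeros((5, 6))
response[1, 2] = 0.9
response[3, 4] = 0.8
response[0, 0] = 0.2
box = locate_target(response, threshold=0.5)
```

The weak response at the top-left corner is filtered out as background, and the rectangle encloses only the two positions that pass the threshold.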

FIGS. 3A to 3C show the results of locating the target position in several images to be recognized using the target positioning method above. FIG. 3A shows the located position of a vehicle, FIG. 3B the located position of an aircraft, and FIG. 3C the located position of a bird. As these figures show, the positioning method provided by the present invention can locate the target position in an image fairly accurately.

The target positioning method is described below using an image that includes a person and a dog as an example. Referring to FIG. 4, the image including the person and the dog is the image to be recognized, and the dog is the target. The specific recognition process is as follows:

The image is input into the neural network, which includes multiple convolutional layers; the convolutional layers convolve the image to obtain multiple channel features;

The weighting parameter corresponding to each channel feature is determined;

During training of the neural network, the multiple channel features obtained by the convolutional layers can be processed by the pooling layer and then input to the fully connected layer, which determines the weighting parameter corresponding to each channel feature; for example, the weighting parameters of the channel features in FIG. 4 are w₁, w₂, …, w_n respectively.

Each channel feature is corrected according to its corresponding weighting parameter;

The response value of each position of the image to be recognized is obtained according to the corrected channel features;

FIG. 5 is a schematic diagram of a visual analysis of the channel features. The response value of each position of the image to be recognized can be obtained from the channel features. For example, the pictures to the left of the equals sign in FIG. 5 are the response maps of the individual channel features to the target. As FIG. 5 shows, each channel feature can represent the probability that a target exists at each position in the image: the brighter areas in a picture are the areas where, according to that channel feature, a target is likely to exist, and a larger response value at a position indicates a higher probability that a target exists there.

Finally, the target position is obtained by threshold screening of the positions: the positions whose response values are greater than the threshold are determined, and the area that includes those positions is taken as the target position. The positions whose response values are greater than the threshold can indicate where the local parts of the target are, and the screened area containing these positions can be marked with an annotation box; this area is the position of the target. For example, in the picture to the right of the equals sign in FIG. 5, the target position is marked with an annotation box.

An embodiment of the present invention also provides a target positioning device. As shown in FIG. 6, the target positioning device 06 includes:

a feature extraction module 61, configured to perform feature extraction on the image to be recognized to obtain multiple channel features;

a weighting parameter determination module 62, configured to determine, for any channel feature, the weighting parameter corresponding to the channel feature, where the weighting parameter corresponding to the channel feature characterizes the correlation between that channel feature and the target position located in the image to be recognized;

a feature correction module 63, configured to correct the channel feature according to the weighting parameter corresponding to the channel feature;

a target position locating module 64, configured to locate the target position containing the target in the image to be recognized by using the corrected channel features.
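The four modules listed above could be wired together in software roughly as follows. This is only a structural sketch: the class name `TargetPositioningDevice`, the callable signatures, and the stand-in implementations in the usage example are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), an assumed convention

@dataclass
class TargetPositioningDevice:
    extract_features: Callable[[np.ndarray], np.ndarray]              # module 61
    determine_weights: Callable[[np.ndarray], np.ndarray]             # module 62
    correct_features: Callable[[np.ndarray, np.ndarray], np.ndarray]  # module 63
    locate: Callable[[np.ndarray], Optional[Box]]                     # module 64

    def run(self, image: np.ndarray) -> Optional[Box]:
        features = self.extract_features(image)
        weights = self.determine_weights(features)
        corrected = self.correct_features(features, weights)
        return self.locate(corrected)

def _locate(corrected: np.ndarray) -> Optional[Box]:
    # Stand-in for module 64: threshold the summed channels and box the survivors.
    ys, xs = np.nonzero(corrected.sum(axis=2) > 0.5)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy stand-ins: two "channels" (the image and its inverse), with only the first kept.
device = TargetPositioningDevice(
    extract_features=lambda img: np.stack([img, 1.0 - img], axis=2),
    determine_weights=lambda f: np.array([1.0, 0.0]),
    correct_features=lambda f, w: f * w,
    locate=_locate,
)
image = np.zeros((4, 4))
image[1:3, 2:4] = 1.0
box = device.run(image)
```

Keeping each module behind its own callable mirrors the separation in FIG. 6, so a different weighting strategy can be swapped into module 62 without touching the other stages.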

In some examples, the feature extraction module is specifically configured to:

In an optional implementation, after acquiring the training samples that include marked images labeled with the target type, the method further includes:

In an optional implementation, the weighting parameter determination module is specifically configured to:

For example, the weighting parameter determination module is specifically configured to:

In some examples, the target position locating module is specifically configured to:

Corresponding to the foregoing embodiments of the target positioning method, the target positioning device provided by the present invention can improve the accuracy of target positioning.

For the device embodiments, the implementation of the functions and roles of the individual units is described in detail in the implementation of the corresponding steps of the method above, and is not repeated here.

Since the device embodiments basically correspond to the method embodiments, the relevant parts of the description of the method embodiments may be consulted. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the present application. Those of ordinary skill in the art can understand and implement it without creative effort.

As the description of the above implementations shows, the device of this embodiment can be implemented in software, or in software plus the necessary general-purpose hardware, and of course it can also be implemented in hardware. Based on this understanding, the essence of the technical solution of the present invention, or the part of it that contributes to the prior art, can be embodied in the form of a software product. Taking a software implementation as an example, the device in the logical sense is formed when the processor of the equipment in which the device is used reads the corresponding computer program instructions from non-volatile memory into memory and runs them.

The present invention also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the method described in any of the foregoing embodiments are implemented.

Referring to FIG. 7, the present invention also provides a hardware architecture diagram of an electronic device. The electronic device includes a communication interface 101, a processor 102, a machine-readable storage medium 103, a non-volatile storage medium 104 and a bus 105, where the communication interface 101, the processor 102, the machine-readable storage medium 103 and the non-volatile storage medium 104 communicate with one another through the bus 105. By reading and executing the machine-executable instructions in the machine-readable storage medium 103 that correspond to the control logic of the target positioning method, the processor 102 can execute the target positioning method described above.

The machine-readable storage medium 103 referred to herein can be any electronic, magnetic, optical or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (such as a hard disk drive), any type of storage disc (such as an optical disc or DVD), a similar storage medium, or a combination thereof.

In addition, the electronic device may be any of various terminal devices or back-end devices, such as a camera, a server, a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.

Other embodiments of the present invention will readily occur to those skilled in the art upon consideration of the specification and practice of the disclosure herein. The present invention is intended to cover any variations, uses or adaptations of the invention that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the invention being indicated by the claims.