Technical Field
The present invention relates to the field of image processing, and in particular to an image recognition method, an image recognition apparatus, and an electronic device.
Background
In some application scenarios, practical needs may require that images be recognized by machine. For example, intelligent transportation may require automatically recognizing vehicles in an image, and shopping mall management may require recognizing persons in an image. One approach is to use multiple convolutional layers connected in series to repeatedly extract image features from the image to be recognized until image features satisfying a condition are obtained, and then to use multiple convolutional or deconvolutional layers connected in series to map the resulting deep image features, through multiple mappings, to a recognition result.
However, if the extracted image features are deep image features, they may lack much of the texture information; if they are shallow image features, they may lack much of the semantic information. The extracted image features are therefore difficult to apply to different image recognition tasks, and image features must be extracted in different ways for different tasks, which complicates the image recognition process.
Summary of the Invention
An object of embodiments of the present invention is to provide an image recognition method, apparatus, and electronic device that simplify the image recognition process. The specific technical solutions are as follows:
In a first aspect of the embodiments of the present invention, an image recognition method is provided, the method comprising:
obtaining multiple image features of an image to be recognized at multiple different downsampling ratios;
for each of the multiple different downsampling ratios, fusing the projections of the multiple image features at that downsampling ratio to obtain a fused feature of the image to be recognized at that downsampling ratio;
determining a recognition result of the image to be recognized according to the fused features of the image to be recognized at all downsampling ratios.
With reference to the first aspect, in a first possible implementation, said fusing, for each of the multiple different downsampling ratios, the projections of the multiple image features at that downsampling ratio to obtain the fused feature of the image to be recognized at that downsampling ratio comprises:
for each of the multiple different downsampling ratios, repeatedly performing the following steps until the number of repetitions reaches a preset number, the preset number being not less than the number of the multiple different downsampling ratios:
projecting the image feature of the image to be recognized at a downsampling ratio adjacent to that downsampling ratio onto that downsampling ratio to obtain a projected feature, an adjacent downsampling ratio being a downsampling ratio that neighbors that downsampling ratio when the multiple downsampling ratios are sorted in descending or ascending order;
fusing the projected feature with the image feature of the image to be recognized at that downsampling ratio to obtain a new image feature of the image to be recognized at that downsampling ratio;
when the number of repetitions reaches the preset number, taking the image feature at each downsampling ratio as the fused feature of the image to be recognized at that downsampling ratio.
With reference to the first possible implementation of the first aspect, in a second possible implementation, said fusing the projected feature with the image feature of the image to be recognized at that downsampling ratio to obtain a new image feature of the image to be recognized at that downsampling ratio comprises:
if the current repetition is not the final repetition (that is, not the repetition whose ordinal equals the preset number), fusing the projected feature with the latest image feature of the image to be recognized at that downsampling ratio to obtain a new image feature of the image to be recognized at that downsampling ratio;
if the current repetition is the final repetition (the repetition whose ordinal equals the preset number), fusing the projected feature, the latest image feature of the image to be recognized at that downsampling ratio, and the initial image feature of the image to be recognized at that downsampling ratio.
With reference to the first possible implementation of the first aspect, in a third possible implementation, said projecting the image feature of the image to be recognized at a downsampling ratio adjacent to that downsampling ratio onto that downsampling ratio to obtain a projected feature comprises:
for the image feature at an adjacent downsampling ratio that is smaller than that downsampling ratio, performing pooling with a stride greater than 1 to obtain the projected feature at that downsampling ratio;
for the image feature at an adjacent downsampling ratio that is larger than that downsampling ratio, performing upsampling to obtain the projected feature at that downsampling ratio.
With reference to the first aspect, in a fourth possible implementation, said determining the recognition result of the image to be recognized according to the fused features of the image to be recognized at all downsampling ratios comprises:
inputting the fused features of the image to be recognized at all downsampling ratios into a pre-established recognition model for recognition to obtain the recognition result of the image to be recognized;
the recognition model comprises a plurality of sub-models among an object detection sub-model, a semantic segmentation sub-model, an instance segmentation sub-model, and a pose keypoint estimation sub-model;
the recognition model is trained in the following manner:
obtaining multiple image features of a sample image at the multiple different downsampling ratios, the sample image being annotated with a ground-truth value for each sub-model;
for each of the multiple different downsampling ratios, fusing the projections of the multiple image features at that downsampling ratio to obtain a fused feature of the sample image at that downsampling ratio;
inputting the fused features of the sample image at all downsampling ratios into each sub-model of the recognition model to obtain the predicted values output by all sub-models of the recognition model;
for each sub-model, adjusting the model parameters of that sub-model, using a training scheme preset for that sub-model, according to the loss between the predicted value output by that sub-model and the ground-truth value with which the sample image is annotated for that sub-model.
In a second aspect of the embodiments of the present invention, an image recognition apparatus is provided, the apparatus comprising:
a feature extraction module, configured to obtain multiple image features of an image to be recognized at multiple different downsampling ratios;
a feature fusion module, configured to, for each of the multiple different downsampling ratios, fuse the projections of the multiple image features at that downsampling ratio to obtain a fused feature of the image to be recognized at that downsampling ratio;
a recognition module, configured to determine a recognition result of the image to be recognized according to the fused features of the image to be recognized at all downsampling ratios.
With reference to the second aspect, in a first possible implementation, the feature fusion module is specifically configured to, for each of the multiple different downsampling ratios, repeatedly perform the following steps until the number of repetitions reaches a preset number, the preset number being not less than the number of the multiple different downsampling ratios:
projecting the image feature of the image to be recognized at a downsampling ratio adjacent to that downsampling ratio onto that downsampling ratio to obtain a projected feature, an adjacent downsampling ratio being a downsampling ratio that neighbors that downsampling ratio when the multiple downsampling ratios are sorted in descending or ascending order;
fusing the projected feature with the image feature of the image to be recognized at that downsampling ratio to obtain a new image feature of the image to be recognized at that downsampling ratio;
when the number of repetitions reaches the preset number, taking the image feature at each downsampling ratio as the fused feature of the image to be recognized at that downsampling ratio.
With reference to the first possible implementation of the second aspect, in a second possible implementation, the feature fusion module is specifically configured to: if the current repetition is not the final repetition (that is, not the repetition whose ordinal equals the preset number), fuse the projected feature with the latest image feature of the image to be recognized at that downsampling ratio to obtain a new image feature of the image to be recognized at that downsampling ratio;
if the current repetition is the final repetition (the repetition whose ordinal equals the preset number), fuse the projected feature, the latest image feature of the image to be recognized at that downsampling ratio, and the initial image feature of the image to be recognized at that downsampling ratio.
With reference to the first possible implementation of the second aspect, in a third possible implementation, the feature fusion module is specifically configured to: for the image feature at an adjacent downsampling ratio that is smaller than that downsampling ratio, perform pooling with a stride greater than 1 to obtain the projected feature at that downsampling ratio;
for the image feature at an adjacent downsampling ratio that is larger than that downsampling ratio, perform upsampling to obtain the projected feature at that downsampling ratio.
With reference to the second aspect, in a fourth possible implementation, the recognition module is specifically configured to input the fused features of the image to be recognized at all downsampling ratios into a pre-established recognition model for recognition to obtain the recognition result of the image to be recognized;
the recognition model comprises a plurality of sub-models among an object detection sub-model, a semantic segmentation sub-model, an instance segmentation sub-model, and a pose keypoint estimation sub-model;
the recognition model is trained in the following manner:
obtaining multiple image features of a sample image at the multiple different downsampling ratios, the sample image being annotated with a ground-truth value for each sub-model;
for each of the multiple different downsampling ratios, fusing the projections of the multiple image features at that downsampling ratio to obtain a fused feature of the sample image at that downsampling ratio;
inputting the fused features of the sample image at all downsampling ratios into each sub-model of the recognition model to obtain the predicted values output by all sub-models of the recognition model;
for each sub-model, adjusting the model parameters of that sub-model, using a training scheme preset for that sub-model, according to the loss between the predicted value output by that sub-model and the ground-truth value with which the sample image is annotated for that sub-model.
In a third aspect of the embodiments of the present invention, an electronic device is provided, comprising:
a memory, configured to store a computer program;
a processor, configured to implement the method steps of any implementation of the first aspect when executing the program stored in the memory.
In a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is provided. A computer program is stored in the computer-readable storage medium, and when executed by a processor, the computer program implements the method steps of any implementation of the first aspect.
The image recognition method, apparatus, and electronic device provided by the embodiments of the present invention fuse image features at different downsampling ratios to obtain fused features that contain both relatively complete texture information and relatively complete semantic information. The fused features are therefore applicable to a variety of image recognition tasks, so different image recognition tasks can be completed through a single process, which simplifies the image recognition process. Of course, a product or method implementing the present invention need not achieve all of the above advantages at the same time.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Figure 1 is a schematic flowchart of an image recognition method provided by an embodiment of the present invention;
Figure 2 is a schematic structural diagram of an image recognition framework provided by an embodiment of the present invention;
Figure 3a is a schematic structural diagram of a feature fusion framework provided by an embodiment of the present invention;
Figure 3b is another schematic structural diagram of the feature fusion framework provided by an embodiment of the present invention;
Figure 4 is a schematic diagram of the principle of the feature fusion framework provided by an embodiment of the present invention;
Figure 5 is a schematic flowchart of a training method for the image recognition framework provided by an embodiment of the present invention;
Figure 6 is a schematic structural diagram of an image recognition apparatus provided by an embodiment of the present invention;
Figure 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. The described embodiments are merely some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Figure 1, Figure 1 is a schematic flowchart of an image recognition method provided by an embodiment of the present invention, which may include:
S101: Obtain multiple image features of the image to be recognized at multiple different downsampling ratios.
The image to be recognized may differ by application scenario and does not refer to any particular image. The multiple different downsampling ratios may likewise differ by application scenario; for ease of description, they are assumed below to be 2x, 4x, 8x, 16x, and 32x.
The scale of an image feature at an x-fold downsampling ratio is 1/x that of the image to be recognized. For example, if the resolution of the image to be recognized is 800*600, the resolution of the image feature at 2x downsampling is 400*300. The image features at the multiple different downsampling ratios may be obtained by pooling the image to be recognized with multiple pooling layers connected in series. Taking downsampling ratios of 2x, 4x, 8x, 16x, and 32x as an example, five pooling layers connected in series may successively pool the image to be recognized with a stride of 2, the input of each pooling layer being the output of the previous pooling layer. The first pooling layer then outputs the image feature at 2x downsampling, the second pooling layer outputs the image feature at 4x downsampling, and so on.
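The chained pooling described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: it assumes a single-channel feature stored as a nested list, 2*2 max pooling as the pooling operation, and a small hypothetical 96*64 input in place of a real image. The names `max_pool_2x2` and `build_pyramid` are made up for this example.

```python
def max_pool_2x2(feature):
    """Stride-2 max pooling over a 2D list; halves both height and width."""
    rows, cols = len(feature) // 2, len(feature[0]) // 2
    return [
        [
            max(
                feature[2 * r][2 * c], feature[2 * r][2 * c + 1],
                feature[2 * r + 1][2 * c], feature[2 * r + 1][2 * c + 1],
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_pyramid(image, num_layers=5):
    """Chain pooling layers; the result maps each downsampling ratio to its feature."""
    pyramid = {}
    current = image
    for layer in range(1, num_layers + 1):
        current = max_pool_2x2(current)  # each layer consumes the previous layer's output
        pyramid[2 ** layer] = current    # ratios 2, 4, 8, 16, 32
    return pyramid


# Hypothetical 96*64 single-channel input (64 rows, 96 columns).
image = [[(r * 96 + c) % 251 for c in range(96)] for r in range(64)]
pyramid = build_pyramid(image)
shapes = {ratio: (len(f[0]), len(f)) for ratio, f in pyramid.items()}  # (width, height)
print(shapes)
```

Each key of `shapes` is a downsampling ratio and each value is the width and height of the feature at that ratio, which is the input size divided by the ratio.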
It can be understood that the lower the downsampling ratio, the less semantic information and the more texture information an image feature contains; the higher the downsampling ratio, the less texture information and the more semantic information an image feature contains.
S102: For each of the multiple different downsampling ratios, fuse the projections of the multiple image features at that downsampling ratio to obtain the fused feature of the image to be recognized at that downsampling ratio.
The projection of an image feature at a downsampling ratio may refer to the image feature obtained by scaling that image feature, through upsampling or downsampling, to the scale corresponding to that downsampling ratio. By way of example, if the image feature of the image to be recognized at 2x downsampling has a resolution of 400*300, then the image feature of the image to be recognized at 4x downsampling has a resolution of 200*150. The projection of the 2x-downsampling image feature at 4x downsampling may be the 200*150 image feature obtained by downsampling the 2x-downsampling image feature by a factor of 2, or the 200*150 image feature obtained by downsampling the 2x-downsampling image feature by a factor of 4 and then upsampling it by a factor of 2; this embodiment places no limitation on this.
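The two projection routes in the 400*300 example can be checked with a short sketch. It assumes average pooling for downsampling and nearest-neighbor repetition for upsampling, neither of which is mandated by the text; the helper names `downsample`, `upsample`, and `project` are hypothetical.

```python
def downsample(feature, factor):
    """Average pooling with window size and stride both equal to `factor`."""
    rows, cols = len(feature) // factor, len(feature[0]) // factor
    return [
        [
            sum(
                feature[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ) / (factor * factor)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def upsample(feature, factor):
    """Nearest-neighbor upsampling: repeat every value `factor` times along each axis."""
    return [
        [value for value in row for _ in range(factor)]
        for row in feature
        for _ in range(factor)
    ]


def project(feature, source_ratio, target_ratio):
    """Scale a feature from one downsampling ratio to the scale of another."""
    if target_ratio > source_ratio:
        return downsample(feature, target_ratio // source_ratio)
    if target_ratio < source_ratio:
        return upsample(feature, source_ratio // target_ratio)
    return feature


# Stand-in for the 2x feature of an 800*600 image: 400 columns by 300 rows.
feature_2x = [[float((r + c) % 7) for c in range(400)] for r in range(300)]

direct = project(feature_2x, 2, 4)                   # 2x downsampling in one step
indirect = upsample(downsample(feature_2x, 4), 2)    # 4x down, then 2x up

direct_shape = (len(direct[0]), len(direct))
indirect_shape = (len(indirect[0]), len(indirect))
print(direct_shape, indirect_shape)
```

Both routes produce a feature at the 200*150 scale, although the values generally differ because the indirect route discards more detail before upsampling.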
Each fused feature obtained through fusion can contain the relatively complete texture information of the image features at low downsampling ratios as well as the relatively complete semantic information of the image features at high downsampling ratios.
S103: Determine the recognition result of the image to be recognized according to the fused features of the image to be recognized at all downsampling ratios.
For example, the fused features of the image to be recognized at all downsampling ratios may be input into a recognition model to obtain the recognition result output by the recognition model, where the recognition model is a pre-trained model that maps features to recognition results. The model may be a neural network obtained through deep learning or an algorithmic model obtained through traditional machine learning; this embodiment places no limitation on this.
With this embodiment, image features at different downsampling ratios are fused to obtain fused features that contain both relatively complete texture information and relatively complete semantic information. The fused features are therefore applicable to a variety of image recognition tasks, so different image recognition tasks can be completed through a single process, which simplifies the image recognition process.
In the related art, image features cannot contain relatively complete texture information and relatively complete semantic information at the same time, so the image features obtained by feature extraction cannot serve a variety of image recognition tasks. A separate image recognition network must therefore be designed for each image recognition task, which makes the image recognition process cumbersome. In view of this, embodiments of the present invention provide a unified image recognition framework. The structure of the framework may be as shown in Figure 2 and includes a feature extraction model 210, a feature fusion model 220, and an image recognition model 230.
The feature extraction model 210 is configured to extract image features of the input image at multiple different downsampling ratios and input these image features into the feature fusion model 220. The feature fusion model 220 is configured to fuse the projections of the input image features at each downsampling ratio to obtain the fused feature at each downsampling ratio, and to input the fused features at all downsampling ratios into the image recognition model 230.
The image recognition model 230 may include an object detection sub-model 231, a semantic segmentation sub-model 232, an instance segmentation sub-model 233, and a pose keypoint estimation sub-model 234. In other possible embodiments, the image recognition model 230 may include only some (one or more) of these sub-models rather than all of them.
When the image recognition model 230 includes only one sub-model, the fused features at all downsampling ratios may be input into that sub-model, and the output of that sub-model is taken as the recognition result. When the image recognition model 230 includes multiple sub-models, the fused features at all downsampling ratios may be input into each sub-model separately, and the outputs of all sub-models are taken as the recognition result.
With this embodiment, the fact that the fused features are applicable to a variety of image recognition tasks is fully exploited: multiple sub-models that perform different image recognition tasks share the same feature extraction model and feature fusion model, so a single framework can perform multiple image recognition tasks while effectively reducing the amount of computation.
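The sharing of one set of fused features among several sub-models can be illustrated with a small sketch. Everything here is a stand-in: the sub-models are stub callables rather than trained networks, the fused features are placeholder strings, and the names `recognize` and `make_stub` are hypothetical.

```python
def recognize(fused_features, sub_models):
    """Feed the same fused features to every sub-model and collect all outputs."""
    return {name: model(fused_features) for name, model in sub_models.items()}


def make_stub(task):
    """Hypothetical sub-model that only reports which ratios it received."""
    return lambda features: {"task": task, "ratios_seen": sorted(features)}


sub_models = {
    "object_detection": make_stub("object_detection"),
    "semantic_segmentation": make_stub("semantic_segmentation"),
    "instance_segmentation": make_stub("instance_segmentation"),
    "pose_keypoint_estimation": make_stub("pose_keypoint_estimation"),
}

# Placeholder fused features, computed once and shared by all sub-models.
fused_features = {ratio: f"fused@{ratio}x" for ratio in (2, 4, 8, 16, 32)}
results = recognize(fused_features, sub_models)
print(sorted(results))
```

Because feature extraction and fusion run once per image, adding or removing a sub-model changes only the dictionary of heads, not the shared computation.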
To describe the image recognition method provided by the embodiments of the present invention more clearly, feature fusion in the method is described below with reference to the feature fusion model in the image recognition framework shown in Figure 2.
For ease of description, the downsampling ratios are again taken to be 2x, 4x, 8x, 16x, and 32x. The structure of the feature fusion model may then be as shown in Figure 3a and consists of cells arranged in five rows and multiple columns. A cell labeled S2 processes image features at 2x downsampling, a cell labeled S4 processes image features at 4x downsampling, a cell labeled S8 processes image features at 8x downsampling, a cell labeled S16 processes image features at 16x downsampling, and a cell labeled S32 processes image features at 32x downsampling. In other possible embodiments, the feature fusion model may have other structures; this embodiment places no limitation on this.
A horizontal arrow denotes a convolution operation, which may be a convolution with a stride of 1 using a kernel of any size. An arrow pointing diagonally upward denotes upsampling, which may be, for example, nearest-neighbor interpolation or bilinear interpolation. An arrow pointing diagonally downward denotes downsampling, which may be, for example, pooling with a stride of 2.
The first column of the structure can be regarded as the input and the last column as the output. That is, the cells in the first column represent the initial image features of the image to be recognized at the corresponding downsampling ratios: the cell in the first row and first column is the initial image feature at 2x downsampling, the cell in the second row and first column is the initial image feature at 4x downsampling, and so on.
Each column other than the first can be regarded as one more round of feature fusion. For example, the cell in the first row and second column is the new image feature of the image to be recognized at 2x downsampling after one round of feature fusion, the cell in the first row and third column is the new image feature at 2x downsampling after two rounds of feature fusion, and so on.
To ensure that each output fused feature incorporates the initial image features of the image to be recognized at every downsampling ratio, so that the fused features contain as much image information as possible, the number of rounds of feature fusion in this embodiment should be not less than the number of different downsampling ratios. In this application scenario, the number of rounds of feature fusion should be not less than 5.
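The effect of the round count can be illustrated by tracking which initial features have reached each level, without computing any feature values. This is a simplified model that assumes every level in every round merges only its own feature with those of its immediate neighbors, as in Figure 3a; the function name `sources_after` is hypothetical.

```python
def sources_after(num_levels, rounds):
    """For each level, return the set of initial levels whose information has arrived."""
    # Before any fusion, each level carries only its own initial feature.
    sources = [{level} for level in range(num_levels)]
    for _ in range(rounds):
        updated = []
        for level in range(num_levels):
            merged = set(sources[level])
            if level > 0:
                merged |= sources[level - 1]   # projection from the finer neighbor
            if level < num_levels - 1:
                merged |= sources[level + 1]   # projection from the coarser neighbor
            updated.append(merged)
        sources = updated
    return sources


ratios = [2, 4, 8, 16, 32]
after_one = sources_after(len(ratios), 1)
after_five = sources_after(len(ratios), 5)
print([sorted(ratios[i] for i in s) for s in after_one])
print([sorted(ratios[i] for i in s) for s in after_five])
```

Under this model, one round lets the 2x level see only the 2x and 4x inputs, whereas five rounds, which matches the number of ratios, leave every level holding contributions from all five inputs.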
以图中第二行第二列的单元为例,该单元为经过二次特征融合后待识别图像在2倍降采样下的新的图像特征,该新的图像特征是通过以下步骤计算得到的:Taking the unit in the second row and second column in the figure as an example, this unit is the new image feature of the image to be identified under 2x downsampling after secondary feature fusion. The new image feature is calculated through the following steps. :
Step 1: Downsample the unit in the first row, first column to obtain a projection feature.
Step 2: Apply convolution to the unit in the second row, first column to obtain the image feature of the image to be recognized at 4x downsampling.
Step 3: Upsample the unit in the third row, first column to obtain a projection feature.
Step 4: Fuse all the features obtained in steps 1 to 3 to obtain the unit in the second row, second column.
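The four steps above can be sketched on toy single-channel feature maps. This is a minimal illustration under stated assumptions: average pooling stands in for the downsampling, nearest-neighbour repetition for the upsampling, a 1x1 convolution for the convolution, and element-wise addition for the fusion. None of these specific choices, nor the helper names, are prescribed by the embodiment.

```python
def avg_pool_2x(feat):
    """Halve height and width by averaging each 2x2 block (stride 2)."""
    return [
        [
            (feat[r][c] + feat[r][c + 1] + feat[r + 1][c] + feat[r + 1][c + 1]) / 4.0
            for c in range(0, len(feat[0]), 2)
        ]
        for r in range(0, len(feat), 2)
    ]


def upsample_2x(feat):
    """Double height and width by nearest-neighbour repetition."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def conv_1x1(feat, weight=1.0, bias=0.0):
    """Stand-in for the convolution of step 2: a 1x1 convolution on one channel."""
    return [[weight * v + bias for v in row] for row in feat]


def fuse(*feats):
    """Fuse same-sized maps by element-wise addition (an assumed fusion rule)."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*feats)]


# Toy single-channel features for an 8x8 input image.
f2 = [[1.0] * 4 for _ in range(4)]   # row 1, column 1: 2x downsampling, 4x4
f4 = [[2.0] * 2 for _ in range(2)]   # row 2, column 1: 4x downsampling, 2x2
f8 = [[3.0]]                         # row 3, column 1: 8x downsampling, 1x1

step1 = avg_pool_2x(f2)               # step 1: project the 2x feature down to 4x
step2 = conv_1x1(f4)                  # step 2: process the 4x feature itself
step3 = upsample_2x(f8)               # step 3: project the 8x feature up to 4x
cell_2_2 = fuse(step1, step2, step3)  # step 4: fuse all three
```

All three inputs to `fuse` have the 2x2 size of the 4x level, which is what makes the element-wise fusion well defined.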
For a clearer description, the principle of this structure is explained below with reference to Figure 4. For each downsampling ratio, the procedure includes:
S401: Project the image features of the image to be recognized at the downsampling ratios adjacent to this downsampling ratio onto this downsampling ratio to obtain projection features.
Here, an adjacent downsampling ratio is one that is next to this downsampling ratio when the multiple downsampling ratios are sorted in descending or ascending order. For example, the downsampling ratios adjacent to 4x are 2x and 8x, and the only downsampling ratio adjacent to 2x is 4x.
S402: Fuse the projection features with the image feature of the image to be recognized at this downsampling ratio to obtain a new image feature of the image to be recognized at this downsampling ratio.
See the foregoing description of Figure 3a; details are not repeated here.
S403: Return to S401 until the number of repetitions reaches a preset number.
Here, the preset number is no less than the number of different downsampling ratios, and it corresponds to the number of columns in the structure other than the first.
S404: Take the image feature at each downsampling ratio as the fused feature of the image to be recognized at that downsampling ratio.
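Steps S401 to S404 can be sketched as a loop over one-dimensional toy features. This is a simplified illustration, not the embodiment's implementation: pairwise averaging, value repetition, and an element-wise mean are assumed stand-ins for the projection and fusion operations, and all function names are hypothetical.

```python
def pool(x):
    """Halve the length by averaging neighbouring pairs (stride 2)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x), 2)]


def upsample(x):
    """Double the length by repeating each value."""
    return [v for v in x for _ in range(2)]


def fuse_round(feats):
    """One pass of S401 and S402 over every level; feats[i] is level i."""
    new_feats = []
    for i, own in enumerate(feats):
        parts = [own]
        if i > 0:                      # S401: finer neighbour, projected down
            parts.append(pool(feats[i - 1]))
        if i < len(feats) - 1:         # S401: coarser neighbour, projected up
            parts.append(upsample(feats[i + 1]))
        # S402: fuse by element-wise mean (an assumed fusion rule)
        new_feats.append([sum(vals) / len(parts) for vals in zip(*parts)])
    return new_feats


def fuse_features(initial_feats, preset_rounds):
    """S403 and S404: repeat the round, then return the per-level features."""
    if preset_rounds < len(initial_feats):
        raise ValueError("preset_rounds must be at least the number of levels")
    feats = initial_feats
    for _ in range(preset_rounds):
        feats = fuse_round(feats)
    return feats


levels = [[1.0] * 8, [1.0] * 4, [1.0] * 2]   # toy 2x, 4x and 8x features
fused = fuse_features(levels, preset_rounds=3)
```

Each round reads only the features of the previous round, which mirrors the column-by-column layout of the structure: column k+1 is computed entirely from column k.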
With this embodiment, image features at different downsampling ratios can be fused through dense connections using a relatively simple architecture. However, as the number of fusion rounds grows, the resulting fused features may lose part of the information in the initial image features. In view of this, embodiments of the present invention provide another feature fusion architecture, shown in Figure 3b, in which the dashed lines represent shortcut connections.
The feature fusion architecture shown in Figure 3b works on a principle similar to that of Figure 3a; the only difference is the operation rule of the last column. That is, the difference arises in the final repetition, when the number of repetitions reaches the preset number: the architecture of Figure 3a still fuses the projection features with the latest image feature of the image to be recognized at this downsampling ratio, whereas the architecture of Figure 3b fuses the projection features, the latest image feature at this downsampling ratio, and the initial image feature at this downsampling ratio. The initial image feature at this downsampling ratio is thus additionally fused, so that the output fused features retain as much of the information in the initial image features as possible.
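The difference between the two architectures can be reduced to a single fusion function with a flag for the final repetition. The sketch below assumes element-wise addition as the fusion rule and features that have already been projected to a common size; the function and variable names are hypothetical.

```python
def fuse_with_shortcut(projections, latest, initial, is_last_round):
    """Fuse by element-wise addition; add the initial feature only in the last round."""
    parts = list(projections) + [latest]
    if is_last_round:
        parts.append(initial)   # shortcut connection from the first column
    return [sum(vals) for vals in zip(*parts)]


projections = [[0.5, 0.5], [1.0, 1.0]]   # already projected to this level
latest = [2.0, 2.0]                      # latest feature at this level
initial = [4.0, 4.0]                     # initial feature at this level

middle = fuse_with_shortcut(projections, latest, initial, is_last_round=False)
final = fuse_with_shortcut(projections, latest, initial, is_last_round=True)
```

With `is_last_round=False` the function behaves like the Figure 3a rule in every round; only the final call differs.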
The training process of the image recognition framework of Figure 2 is described below with reference to Figure 5. It includes:
S501: Obtain multiple image features of a sample image at multiple different downsampling ratios.
Here, the sample image is annotated with a ground truth for each sub-model. Taking Figure 2 as an example, the sample image is annotated with 4 ground truths: one for target detection, one for semantic segmentation, one for instance segmentation, and one for pose point estimation.
S502: For each of the multiple different downsampling ratios, fuse the projections of the multiple image features at that downsampling ratio to obtain the fused feature of the sample image at that downsampling ratio.
This step is the same as S102, except that it operates on the sample image rather than the image to be recognized. See the foregoing description of S102; details are not repeated here.
S503: Input the fused features of the sample image at all downsampling ratios into each sub-model of the recognition model to obtain the predicted values output by all sub-models of the recognition model.
Taking Figure 2 as an example, 4 predicted values are obtained: one for target detection, one for semantic segmentation, one for instance segmentation, and one for pose point estimation.
S504: For each sub-model, adjust the model parameters of that sub-model, using the training method preset for that sub-model, according to the loss between the predicted value output by that sub-model and the ground truth annotated on the sample image for that sub-model.
For example, the target detection sub-model may be trained in a one-stage manner such as YOLO or SSD, or in a two-stage manner such as Faster-RCNN; the semantic segmentation sub-model and the instance segmentation sub-model may each be trained with a cross-entropy loss; and the pose point estimation sub-model may be trained with an L2 loss. In other possible embodiments, other training methods may be used, and this embodiment places no limitation on this.
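The per-sub-model loss computation of S504 can be sketched as a lookup from task name to loss function. This is an illustration under stated assumptions: the task names, the dictionary layout, and the single-sample loss definitions are hypothetical, and the detection loss is left out because it depends on the chosen detector.

```python
import math


def cross_entropy(probs, target_index):
    """Cross-entropy for one sample given predicted class probabilities."""
    return -math.log(probs[target_index])


def l2_loss(pred_points, true_points):
    """Sum of squared coordinate differences over all pose points."""
    return sum(
        (px - tx) ** 2 + (py - ty) ** 2
        for (px, py), (tx, ty) in zip(pred_points, true_points)
    )


# One loss function per sub-model; detection is omitted because its loss
# depends on the chosen detector.
LOSSES = {
    "semantic_segmentation": cross_entropy,
    "instance_segmentation": cross_entropy,
    "pose_estimation": l2_loss,
}


def per_task_losses(predictions, ground_truths):
    """S504: compute each sub-model's loss from its own prediction and label."""
    return {
        task: LOSSES[task](predictions[task], ground_truths[task])
        for task in predictions
    }


predictions = {
    "semantic_segmentation": [0.5, 0.5],
    "pose_estimation": [(1.0, 2.0), (3.0, 5.0)],
}
ground_truths = {
    "semantic_segmentation": 0,
    "pose_estimation": [(1.0, 2.0), (3.0, 4.0)],
}
losses = per_task_losses(predictions, ground_truths)
```

Keeping the losses separate per task matches the description that each sub-model's parameters are adjusted from its own loss.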
Referring to Figure 6, Figure 6 is a schematic structural diagram of an image recognition device provided by an embodiment of the present invention, which may include:
a feature extraction module 601, configured to obtain multiple image features of an image to be recognized at multiple different downsampling ratios;
a feature fusion module 602, configured to, for each of the multiple different downsampling ratios, fuse the projections of the multiple image features at that downsampling ratio to obtain the fused feature of the image to be recognized at that downsampling ratio;
a recognition module 603, configured to determine the recognition result of the image to be recognized according to the fused features of the image to be recognized at all downsampling ratios.
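The way the three modules chain together can be sketched as a small class that holds one callable per module. The class name, the callable signatures, and the placeholder lambdas are assumptions made for illustration; in practice each module would be a neural network component.

```python
from typing import Callable, Dict, List

FeatureMap = List[float]


class ImageRecognitionDevice:
    """Chains the three modules: extraction (601), fusion (602), recognition (603)."""

    def __init__(
        self,
        extract: Callable[[List[float]], Dict[int, FeatureMap]],
        fuse: Callable[[Dict[int, FeatureMap]], Dict[int, FeatureMap]],
        recognize: Callable[[Dict[int, FeatureMap]], str],
    ) -> None:
        self.extract = extract
        self.fuse = fuse
        self.recognize = recognize

    def run(self, image: List[float]) -> str:
        features = self.extract(image)   # keyed by downsampling ratio
        fused = self.fuse(features)      # one fused feature per ratio
        return self.recognize(fused)     # uses the fused features of all ratios


# Placeholder callables so the sketch runs; real modules would be networks.
device = ImageRecognitionDevice(
    extract=lambda img: {2: img[::2], 4: img[::4]},
    fuse=lambda feats: feats,
    recognize=lambda fused: "vehicle" if sum(fused[2]) > 0 else "background",
)
result = device.run([1.0, 0.0, 1.0, 0.0])
```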
In a possible embodiment, the feature fusion module 602 is specifically configured to, for each of the multiple different downsampling ratios, repeat the following steps until the number of repetitions reaches a preset number, the preset number being no less than the number of different downsampling ratios:
project the image features of the image to be recognized at the downsampling ratios adjacent to this downsampling ratio onto this downsampling ratio to obtain projection features, where an adjacent downsampling ratio is one that is next to this downsampling ratio when the multiple downsampling ratios are sorted in descending order;
fuse the projection features with the image feature of the image to be recognized at this downsampling ratio to obtain a new image feature of the image to be recognized at this downsampling ratio;
when the number of repetitions reaches the preset number, take the image feature at each downsampling ratio as the fused feature of the image to be recognized at that downsampling ratio.
In a possible embodiment, the feature fusion module 602 is specifically configured to: if the current repetition is not the final one (the one at which the preset number is reached), fuse the projection features with the latest image feature of the image to be recognized at this downsampling ratio to obtain a new image feature of the image to be recognized at this downsampling ratio;
if the current repetition is the final one, fuse the projection features, the latest image feature of the image to be recognized at this downsampling ratio, and the initial image feature of the image to be recognized at this downsampling ratio.
In a possible embodiment, the feature fusion module 602 is specifically configured to: apply pooling with a stride greater than 1 to the image feature at the adjacent downsampling ratio that is smaller than this downsampling ratio, to obtain the projection feature at this downsampling ratio;
apply upsampling to the image feature at the adjacent downsampling ratio that is greater than this downsampling ratio, to obtain the projection feature at this downsampling ratio.
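The choice between pooling and upsampling can be sketched as a function that compares the source and destination downsampling ratios. The sketch assumes one-dimensional features, max pooling with a window equal to the stride, and nearest-neighbour upsampling; these specifics and the function names are illustrative only.

```python
def max_pool(x, stride):
    """Max pooling with window and stride both equal to `stride` (stride > 1)."""
    return [max(x[i:i + stride]) for i in range(0, len(x), stride)]


def upsample(x, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    return [v for v in x for _ in range(factor)]


def project(feature, src_ratio, dst_ratio):
    """Project a feature from src_ratio to the adjacent dst_ratio."""
    if src_ratio < dst_ratio:
        # Finer source: shrink it with strided pooling.
        return max_pool(feature, dst_ratio // src_ratio)
    if src_ratio > dst_ratio:
        # Coarser source: enlarge it by upsampling.
        return upsample(feature, src_ratio // dst_ratio)
    return list(feature)


from_2x = project([1.0, 3.0, 2.0, 5.0], src_ratio=2, dst_ratio=4)
from_8x = project([7.0], src_ratio=8, dst_ratio=4)
```

Both calls return a feature of length 2, the size of the 4x level in this toy setup, so the results can be fused directly.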
In a possible embodiment, the recognition module 603 is specifically configured to input the fused features of the image to be recognized at all downsampling ratios into a recognition model to obtain the recognition result output by the recognition model, the recognition model being a pre-trained model that maps features to recognition results.
In a possible embodiment, the recognition model includes multiple of the following sub-models: a target detection sub-model, a semantic segmentation sub-model, an instance segmentation sub-model, and a pose point estimation sub-model;
the recognition model is trained as follows:
obtain multiple image features of a sample image at multiple different downsampling ratios, the sample image being annotated with a ground truth for each sub-model;
for each of the multiple different downsampling ratios, fuse the projections of the multiple image features at that downsampling ratio to obtain the fused feature of the sample image at that downsampling ratio;
input the fused features of the sample image at all downsampling ratios into each sub-model of the recognition model to obtain the predicted values output by all sub-models of the recognition model;
for each sub-model, adjust the model parameters of that sub-model, using the training method preset for that sub-model, according to the loss between the predicted value output by that sub-model and the ground truth annotated on the sample image for that sub-model.
An embodiment of the present invention further provides an electronic device, as shown in Figure 7, including:
a memory 701, configured to store a computer program;
a processor 702, configured to implement the following steps when executing the program stored in the memory 701:
obtaining multiple image features of an image to be recognized at multiple different downsampling ratios;
for each of the multiple different downsampling ratios, fusing the projections of the multiple image features at that downsampling ratio to obtain the fused feature of the image to be recognized at that downsampling ratio;
determining the recognition result of the image to be recognized according to the fused features of the image to be recognized at all downsampling ratios.
In a possible embodiment, for each of the multiple different downsampling ratios, fusing the projections of the multiple image features at that downsampling ratio to obtain the fused feature of the image to be recognized at that downsampling ratio includes:
for each of the multiple different downsampling ratios, repeating the following steps until the number of repetitions reaches a preset number, the preset number being no less than the number of different downsampling ratios:
projecting the image features of the image to be recognized at the downsampling ratios adjacent to this downsampling ratio onto this downsampling ratio to obtain projection features, where an adjacent downsampling ratio is one that is next to this downsampling ratio when the multiple downsampling ratios are sorted in descending order;
fusing the projection features with the image feature of the image to be recognized at this downsampling ratio to obtain a new image feature of the image to be recognized at this downsampling ratio;
when the number of repetitions reaches the preset number, taking the image feature at each downsampling ratio as the fused feature of the image to be recognized at that downsampling ratio.
In a possible embodiment, fusing the projection features with the image feature of the image to be recognized at this downsampling ratio to obtain a new image feature of the image to be recognized at this downsampling ratio includes:
if the current repetition is not the final one (the one at which the preset number is reached), fusing the projection features with the latest image feature of the image to be recognized at this downsampling ratio to obtain a new image feature of the image to be recognized at this downsampling ratio;
if the current repetition is the final one, fusing the projection features, the latest image feature of the image to be recognized at this downsampling ratio, and the initial image feature of the image to be recognized at this downsampling ratio.
In a possible embodiment, projecting the image features of the image to be recognized at the downsampling ratios adjacent to this downsampling ratio onto this downsampling ratio to obtain projection features includes:
applying pooling with a stride greater than 1 to the image feature at the adjacent downsampling ratio that is smaller than this downsampling ratio, to obtain the projection feature at this downsampling ratio;
applying upsampling to the image feature at the adjacent downsampling ratio that is greater than this downsampling ratio, to obtain the projection feature at this downsampling ratio.
In a possible embodiment, determining the recognition result of the image to be recognized according to the fused features of the image to be recognized at all downsampling ratios includes:
inputting the fused features of the image to be recognized at all downsampling ratios into a recognition model to obtain the recognition result output by the recognition model, the recognition model being a pre-trained model that maps features to recognition results.
In a possible embodiment, the recognition model includes multiple of the following sub-models: a target detection sub-model, a semantic segmentation sub-model, an instance segmentation sub-model, and a pose point estimation sub-model;
the recognition model is trained as follows:
obtaining multiple image features of a sample image at multiple different downsampling ratios, the sample image being annotated with a ground truth for each sub-model;
for each of the multiple different downsampling ratios, fusing the projections of the multiple image features at that downsampling ratio to obtain the fused feature of the sample image at that downsampling ratio;
inputting the fused features of the sample image at all downsampling ratios into each sub-model of the recognition model to obtain the predicted values output by all sub-models of the recognition model;
for each sub-model, adjusting the model parameters of that sub-model, using the training method preset for that sub-model, according to the loss between the predicted value output by that sub-model and the ground truth annotated on the sample image for that sub-model.
The memory mentioned for the above electronic device may include random access memory (RAM), and may also include non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment provided by the present invention, a computer-readable storage medium is further provided. The computer-readable storage medium stores instructions which, when run on a computer, cause the computer to execute any of the image recognition methods in the above embodiments.
In yet another embodiment provided by the present invention, a computer program product containing instructions is further provided which, when run on a computer, causes the computer to execute any of the image recognition methods in the above embodiments.
The above embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by the phrase "includes a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
The embodiments in this specification are described in a related manner; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the embodiments of the device, the electronic device, the computer-readable storage medium, and the computer program product are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, see the corresponding description of the method embodiments.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201911054684.2A (CN111767934B) | 2019-10-31 | 2019-10-31 | Image recognition method and device and electronic equipment |
| Publication Number | Publication Date |
|---|---|
| CN111767934A | 2020-10-13 |
| CN111767934B | 2023-11-03 |
| Code | Title | Description |
|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| TR01 | Transfer of patent right | Effective date of registration: 2025-07-22. Patentee after: Guangzhou Gaohang Technology Transfer Co.,Ltd.; address after: Rooms 602 and 605, No. 85 Xiangxue Avenue Middle, Huangpu District, Guangzhou City, Guangdong Province 510000; country or region after: China. Patentee before: Hangzhou Hikvision Digital Technology Co.,Ltd.; address before: Hangzhou City, Zhejiang province 310051 Binjiang District Qianmo Road No. 555; country or region before: China. |