CN110598776A

Movatterモバイル変換

Info

Publication number: CN110598776A
Application number: CN201910830812.1A
Authority: CN
Inventors: 谢昱锐; 刘甲甲
Original assignee: Chengdu University of Information Technology
Current assignee: Chengdu University of Information Technology
Priority date: 2019-09-03
Filing date: 2019-09-03
Publication date: 2019-12-20

Abstract

Translated fromChinese

本发明提供一种基于类内视觉模式分享的图像分类方法，包括以下步骤：图像对象性窗口生成；图像窗口深度特征提取；基于类内分享特性的视觉字典学习：根据所有语义类别图像的候选对象性窗口的深度特征，通过优化视觉字典学习模型，获取具有类内分享特性的结构化视觉字典；输入图像对象性窗口生成及特征提取；对象性窗口的特征编码；视觉特征集成并构建图像全局特征；SVM分类器语义标签预测。本发明从更具有实用价值的引入视觉模式分享特性的角度去分析解决问题，通过协同挖掘各语义类别内具有分享特性的视觉字典单词，实现了图像特征表示语义性的增强，提高了图像类别识别的准确性。

The present invention provides an image classification method based on intra-class visual pattern sharing, comprising the following steps: image object window generation; image window depth feature extraction; visual dictionary learning based on intra-class sharing characteristics: according to the candidate objects of all semantic category images By optimizing the visual dictionary learning model, a structured visual dictionary with intra-class sharing characteristics can be obtained; input image object window generation and feature extraction; feature encoding of object windows; visual feature integration and construction of image global features ; SVM classifier semantic label prediction. The present invention analyzes and solves the problem from the perspective of introducing the sharing characteristic of the visual mode, which has more practical value, and realizes the enhancement of the semantics of the image feature representation and improves the recognition of the image category by collaboratively mining the visual dictionary words with the sharing characteristic in each semantic category accuracy.

Description

Translated fromChinese

一种基于类内视觉模式分享的图像分类方法An Image Classification Method Based on Intra-class Visual Pattern Sharing

技术领域technical field

本发明涉及图像识别技术领域，具体的说是一种基于类内视觉模式分享的图像分类方法。The invention relates to the technical field of image recognition, in particular to an image classification method based on intra-class visual pattern sharing.

背景技术Background technique

随着当前数字多媒体技术及互联网技术的不断发展，人类社会已经步入了多媒体数据迅猛增长的大数据时代。在不同形式的多媒体数据中，图像数据由于具有直观且容易获取的特点，在人们社会生活各个方面发挥着重要作用，进而使如何有效分析并理解图像数据的内容变得日趋重要。在过去数年中，众多图像语义对象分类方法在视觉特征生成、对象模型构建以及强监督学习方式上取得了一定进步。然而，由于当前公认存在的底层视觉特征与中高层信息之间的语义鸿沟，现有图像语义对象分类方法在判别性特征构建、关联信息协同分析、以及视觉特征语义性等关键问题上仍进展缓慢。With the continuous development of the current digital multimedia technology and Internet technology, human society has entered the era of big data with the rapid growth of multimedia data. Among the different forms of multimedia data, image data plays an important role in all aspects of people's social life due to its intuitive and easy-to-acquire characteristics, so how to effectively analyze and understand the content of image data has become increasingly important. In the past few years, many image semantic object classification methods have made some progress in visual feature generation, object model construction, and strong supervision learning methods. However, due to the currently recognized semantic gap between low-level visual features and high-level information, existing image semantic object classification methods are still progressing slowly on key issues such as discriminative feature construction, collaborative analysis of associated information, and visual feature semantics. .

针对图像分类问题，当前研究重点主要集中于图像特征语义表示的构建方面。当图像特征表示能充分描述对象语义内容时，仅通过简单的线性分类器便能实现图像语义内容的准确预测。早期图像特征的获取常基于底层视觉线索，如颜色、形状及纹理等，通过人工定义的图像特征构建方式，生成视觉信息的直方图表示。然而，此底层视觉特征表示仅是视觉信息的统计性描述，难以有效刻画语义对象内容，最终导致实际分类任务中无法准确预测图像所属类别。为了解决以上问题，后续研究工作致力于借助机器学习的方式，提取更具有语义判别性的图像特征表示。在众多图像分类模型中，基于视觉字典学习的方法将图像语义特征表示的构建问题，分解为底层特征提取、视觉字典学习、局部特征编码、图像全局特征生成四个子问题，该类方法优越的性能使其在语义特征构建、图像分类等视觉识别问题中得到了广泛的应用。For the problem of image classification, current research focuses on the construction of semantic representation of image features. When the image feature representation can fully describe the semantic content of the object, the accurate prediction of the semantic content of the image can be achieved only by a simple linear classifier. The acquisition of early image features is often based on the underlying visual cues, such as color, shape, and texture, and the histogram representation of visual information is generated through the artificially defined image feature construction method. However, this underlying visual feature representation is only a statistical description of visual information, and it is difficult to effectively describe the content of semantic objects, which ultimately leads to the inability to accurately predict the category of the image in the actual classification task. In order to solve the above problems, follow-up research work is devoted to extracting more semantically discriminative image feature representations by means of machine learning. Among many image classification models, the method based on visual dictionary learning decomposes the construction of image semantic feature representation into four sub-problems: bottom-level feature extraction, visual dictionary learning, local feature encoding, and image global feature generation. The superior performance of this type of method It has been widely used in visual recognition problems such as semantic feature construction and image classification.

当前基于视觉字典学习的图像分类方法，所获取的各字典单词相互独立，缺乏字典单词间彼此关联性的探索，进而削弱了视觉字典构建图像特征表示的判别能力。事实上，在视觉字典学习过程中，协同挖掘具有关联性的视觉字典单词，能有效增强同语义类别图像其特征表示的一致性，以及不同语义类别图像特征表示间的差异性，最终提高图像语义对象类别预测的性能。In the current image classification method based on visual dictionary learning, the acquired dictionary words are independent of each other, and there is a lack of exploration of the correlation between dictionary words, which in turn weakens the discriminative ability of visual dictionaries to construct image feature representations. In fact, in the process of visual dictionary learning, collaborative mining of related visual dictionary words can effectively enhance the consistency of feature representations of images of the same semantic category, as well as the differences between feature representations of images of different semantic categories, and ultimately improve image semantics. Performance of object category prediction.

发明内容Contents of the invention

针对现有技术中存在的上述不足之处，本发明要解决的技术问题是提供一种基于类内视觉模式分享的图像分类方法，以解决图像特征表示的语义信息匮乏问题。In view of the above shortcomings in the prior art, the technical problem to be solved by the present invention is to provide an image classification method based on intra-class visual pattern sharing to solve the problem of lack of semantic information represented by image features.

本发明为实现上述目的所采用的技术方案是：一种基于类内视觉模式分享的图像分类方法，包括以下步骤：The technical solution adopted by the present invention to achieve the above object is: an image classification method based on intra-class visual pattern sharing, comprising the following steps:

图像对象性窗口生成：给定包含多种语义类别对象的图像训练集，生成所述图像训练集中每幅图像的候选对象性窗口；Image objectivity window generation: Given an image training set containing multiple semantic category objects, generate a candidate objectivity window for each image in the image training set;

图像窗口深度特征提取：提取所述候选对象性窗口的深度特征；Image window depth feature extraction: extracting the depth features of the candidate objectivity window;

基于类内分享特性的视觉字典学习：根据所有语义类别图像的候选对象性窗口的深度特征，通过优化视觉字典学习模型，获取具有类内分享特性的结构化视觉字典；Visual dictionary learning based on intra-class sharing characteristics: According to the depth features of candidate objectivity windows of all semantic category images, a structured visual dictionary with intra-class sharing characteristics is obtained by optimizing the visual dictionary learning model;

对未知语义类别的输入图像，生成该图像候选对象性窗口，并提取各候选对象性窗口的深度特征；For an input image of an unknown semantic category, generate candidate objectivity windows of the image, and extract the depth features of each candidate objectivity window;

根据所述结构化视觉字典，计算输入图像候选对象性窗口的特征编码；According to the structured visual dictionary, calculate the feature encoding of the input image candidate objectivity window;

基于输入图像所有对象性窗口的特征编码，合并对象性窗口特征编码，以构建图像全局特征表示；Based on the feature codes of all object windows of the input image, the feature codes of the object windows are combined to construct a global feature representation of the image;

根据所述图像全局特征表示，利用线性SVM分类器，预测输入图像的语义类别标签，实现图像的分类。According to the global feature representation of the image, the linear SVM classifier is used to predict the semantic category label of the input image to realize the classification of the image.

所述生成所述图像训练集中每幅图像的候选对象性窗口通过EdgeBox算法实现。The generation of the candidate objectivity window of each image in the image training set is realized by EdgeBox algorithm.

所述提取所述候选对象性窗口的深度特征通过VGG19深度网络模型完成。The extraction of the depth features of the candidate objectivity window is accomplished through the VGG19 deep network model.

所述优化视觉字典学习模型通过下式进行：The optimization visual dictionary learning model is carried out by the following formula:

上式中，X_i为对应第i个语义对象类别所有训练样本的视觉特征矩阵；D_∈i表示将结构化视觉字典D中对应于第i类的字典单词保留，其余语义对象类别字典单词置零后的类特定视觉字典；A_i为视觉特征矩阵X_i在类特定字典D_∈i上的表示系数矩阵；D为待优化的结构化视觉字典，其为所有语义对象类别字典单词的集合；Z_i是特征矩阵X_i在结构化视觉字典D上的表示系数；D_i与D_j分别表示结构化字典D内对应于第i类和第j类对象的视觉字典；符号||·||_F表示计算矩阵的 Frobenius范数；参数α、β、λ₁、λ₂为目标函数中平衡不同的代价项的权重系数。In the above formula, X_i is the visual feature matrix of all training samples corresponding to the i-th semantic object category; D_∈ i means that the dictionary words corresponding to the i-th category in the structured visual dictionary D are reserved, and the remaining semantic object category dictionary words are set The class-specific visual dictionary after zero; A_i is the representation coefficient matrix of the visual feature matrix X_i on the class-specific dictionary D_∈ i; D is the structured visual dictionary to be optimized, which is the collection of all semantic object category dictionary words; Z_i is the representation coefficient of feature matrix X_i on the structured visual dictionary D; D_i and D_j respectively represent the visual dictionaries corresponding to the i-th and j-th types of objects in the structured dictionary D; the symbols ||·||_F represents the Frobenius norm of the calculation matrix; the parameters α, β, λ₁ , and λ₂ are weight coefficients for balancing different cost items in the objective function.

所述计算输入图像候选对象性窗口的特征编码的目标函数为：The objective function of calculating the feature encoding of the input image candidate objectivity window is:

上式中，x为对象性窗口的深度视觉特征，y表示待优化求解的对象性窗口特征编码，D为结构化视觉字典，参数η表示用于控制特征编码y中非零元素的个数，即特征编码的稀疏性；符号||·||_F表示计算矩阵的Frobenius范数。In the above formula, x is the depth visual feature of the object window, y represents the feature code of the object window to be optimized and solved, D is a structured visual dictionary, and the parameter η represents the number of non-zero elements used to control the feature code y, That is, the sparsity of feature encoding; the symbol ||·||_F represents the Frobenius norm of the calculation matrix.

本发明具有以下优点及有益效果：The present invention has the following advantages and beneficial effects:

1、针对当前基于视觉字典学习的图像分类方法在视觉字典单词学习过程中忽视了各单词间的关联性约束，导致了图像特征表示语义信息的匮乏，本发明从更具有实用价值的引入视觉模式分享特性的角度去分析解决问题，通过协同挖掘各语义类别内具有分享特性的视觉字典单词，实现了图像特征表示语义性的增强，提高了图像类别识别的准确性。1. In view of the fact that the current image classification method based on visual dictionary learning ignores the correlation constraints between words in the visual dictionary word learning process, resulting in the lack of semantic information represented by image features, the present invention introduces a visual model with more practical value To analyze and solve problems from the perspective of sharing characteristics, through collaborative mining of visual dictionary words with sharing characteristics in each semantic category, the semantic enhancement of image feature representation is realized, and the accuracy of image category recognition is improved.

2、本发明具有无人工参与，分类准确性高等特点。本发明区别于当前基于字典学习的图像分类方法孤立的学习各字典单词，引入语义对象类别内视觉模式的分享特性，协同挖掘同语义类别内具有分享特性的视觉字典单词，建立了各字典单词间的关联约束，改善了当前图像视觉特征表示语义信息的匮乏问题。2. The present invention has the characteristics of no manual participation and high classification accuracy. The present invention is different from the current image classification method based on dictionary learning, which learns each dictionary word in isolation, introduces the sharing characteristic of the visual pattern in the semantic object category, and collaboratively excavates the visual dictionary words with the sharing characteristic in the same semantic category, and establishes the relationship between each dictionary word. The association constraints of the image improve the lack of semantic information represented by the current image visual features.

3、本发明的方法实用并有效。3. The method of the present invention is practical and effective.

附图说明Description of drawings

图1为本发明的方法流程图；Fig. 1 is method flowchart of the present invention;

图2为本发明的基于图像对象性窗口的视觉特征集成图。Fig. 2 is a visual feature integration diagram based on the image objectivity window of the present invention.

具体实施方式Detailed ways

下面结合附图及实施例对本发明做进一步的详细说明。The present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments.

本发明在Matlab R2016b实验平台上进行实现，如图1所示主要包含两个部分，具体在类内分享视觉模式挖掘部分主要涉及以下内容，包括图像对象性窗口生成、窗口深度特征提取、以及基于类内分享特性的视觉字典学习；在图像全局特征构建及分类部分，涉及了输入图像对象性窗口生成及特征提取、对象性窗口特征编码、视觉特征集成并构建图像全局特征、SVM分类器语义标签预测四个步骤。具体如下：The present invention is realized on the Matlab R2016b experimental platform, and mainly includes two parts as shown in Fig. Visual dictionary learning of intra-class shared features; in the construction and classification of image global features, it involves input image object window generation and feature extraction, object window feature encoding, visual feature integration and construction of image global features, SVM classifier semantic labels Forecast four steps. details as follows:

类内分享视觉模式挖掘：Intra-class shared visual pattern mining:

步骤一、图像对象性窗口生成Step 1. Image object window generation

给定包含多种语义类别对象的图像训练集，利用EdgeBox算法(参见文献：C.Lawrence Zitnick and Piotr Doll′ar.Edge boxes：Locating object proposalsfrom edges.In European Conference on Computer Vision，2014.)生成每幅图像的候选对象性窗口。Given an image training set containing objects of various semantic categories, use the EdgeBox algorithm (see literature: C. Lawrence Zitnick and Piotr Doll'ar. Edge boxes: Locating object proposals from edges. In European Conference on Computer Vision, 2014.) to generate each Candidate objectivity window for an image.

步骤二、图像窗口深度特征提取Step 2. Image window depth feature extraction

利用VGG19深度网络模型(参见文献：K.Simonyan and A.Zisserman.Very deepconvolutional networks for large-scale image recognition.In InternationalConference on Learning Representations，2015.)，提取图像候选对象性窗口深度特征。Using the VGG19 deep network model (see literature: K.Simonyan and A.Zisserman.Very deep convolutional networks for large-scale image recognition.In International Conference on Learning Representations, 2015.), extract image candidate object window depth features.

步骤三、基于类内分享特性的视觉字典学习Step 3. Visual dictionary learning based on intra-class sharing features

为了挖掘各语义类别内具有分享特性的视觉模式，本发明的实施例中设计了以下结构化的视觉字典，其数学形式具体如下：In order to mine visual patterns with sharing characteristics in each semantic category, the following structured visual dictionary is designed in the embodiment of the present invention, and its mathematical form is specifically as follows:

D＝[D₁，D₂，...，D_C]D=[D₁ , D₂ , . . . , D_C ]

上式中D表示构建的结构化视觉字典，其为各对象类别字典D_i，i＝1，2，...，C 的级联，C表示图像集所包含的对象类别数目。In the above formula, D represents the constructed structured visual dictionary, which is the concatenation of each object category dictionary D_i , i=1, 2, . . . , C, and C denotes the number of object categories contained in the image set.

根据所有语义类别图像候选对象性窗口的深度特征，通过优化以下构建的视觉字典学习模型，获取具有类内分享特性的结构化视觉字典D。According to the deep features of all semantic category image candidate objectivity windows, a structured visual dictionary D with intra-class sharing characteristics is obtained by optimizing the visual dictionary learning model constructed below.

上式中，X_i为对应第i个语义对象类别所有训练样本的视觉特征矩阵，D_∈i表示将结构化字典D中对应于第i类的字典单词保留，其余语义对象类别字典单词置零后的类特定视觉字典；A_i为视觉特征矩阵X_i在类特定字典D_∈i上的表示系数矩阵；D为待优化的结构化视觉字典，其为所有语义对象类别字典单词的集合； Z_i是特征矩阵X_i在结构化字典D上的表示系数；D_i与D_j分别表示结构化字典D 内对应于第i类和第j类对象的视觉字典；符号||·||_F表示计算矩阵的Frobenius范数。In the above formula, X_i is the visual feature matrix of all training samples corresponding to the i-th semantic object category, D_∈ i means that the dictionary words corresponding to the i-th category in the structured dictionary D are reserved, and the remaining semantic object category dictionary words are set to zero The final class-specific visual dictionary; A_i is the representation coefficient matrix of the visual feature matrix X_i on the class-specific dictionary D_∈ i; D is the structured visual dictionary to be optimized, which is a collection of all semantic object category dictionary words; Z_i is the representation coefficient of the feature matrix X_i on the structured dictionary D; D_i and D_j respectively represent the visual dictionaries corresponding to the i-th and j-th types of objects in the structured dictionary D; the symbol ||·||_F represents Computes the Frobenius norm of a matrix.

在构建的视觉字典学习模型中，前两个代价项及为数据重构残差项，其作用是利用学习的类特定视觉字典D_∈i及结构化视觉字典 D，实现对第i个语义对象类别视觉特征的有效重构。代价项为引入的表示系数间的一致性约束，使得选择结构化字典D中对应第i类的字典单词，以对该类别视觉特征进行重构表示，进而保证相同对象类别特征数据的重构系数具有一致性。字典学习模型中代价项为各语义对象类别字典D_i， i＝1，2，...，C之间的正交性约束，能增强不同类别字典单词间的差异性，保证了后续图像特征编码的判别能力。优化模型中最后两项λ₁||A_i||_2，1，λ₂||Z_i||_2，1为对表示系数 A_i及Z_i施加的正则化约束，本实施例中具体采用了一种基于l_2，1范数的组稀疏形式，以使求解的表示系数矩阵按行具有稀疏性。在字典学习模型中，将l_2，1范数组稀疏正则化项与表示系数一致性约束进行联合优化，一方面保证了第i类视觉字典D_i对该语义类别视觉特征的重构，另一方面也能有效挖掘第i 类别内具有分享特性的视觉模式，最终提高同语义类别特征表示的一致性，以及不同语义类别特征间的差异性。字典学习模型中参数α、β、λ₁、λ₂为目标函数中平衡不同的代价项的权重系数，通过实验，均经验设定为0.01。该字典学习目标函数为多变量优化问题，本实施例采用交替优化的策略进行迭代计算。具体而言，当优化目标函数中某一个变量时，其余变量均固定，进而将原凸优化问题转换为多个凸优化子问题进行求解。In the constructed visual dictionary learning model, the first two cost items and Reconstruct the residual term for the data, and its function is to use the learned class-specific visual dictionary D_∈ i and the structured visual dictionary D to realize effective reconstruction of the visual features of the i-th semantic object category. consideration item For the consistency constraint between the introduced representation coefficients, the dictionary words corresponding to the i-th category in the structured dictionary D are selected to reconstruct the visual features of the category, thereby ensuring that the reconstruction coefficients of the feature data of the same object category are consistent sex. Cost term in dictionary learning model For each semantic object category dictionary D_i , i=1, 2, . In the optimization model, the last two terms λ₁ ||A_i ||_2,1 , λ₂ ||Z_i ||_2,1 are the regularization constraints imposed on the representation coefficients A_i and Z_i , and in this embodiment, A form of group sparsity based on l₂ , 1 norm is proposed to make the representation coefficient matrix to be solved row-wise sparse. In the dictionary learning model, the l₂ , 1 norm array sparse regularization term and the representation coefficient consistency constraint Joint optimization, on the one hand, ensures that the i-th visual dictionary D_i can reconstruct the visual features of the semantic category; Consistency, and the difference between features of different semantic categories. The parameters α, β, λ₁ , and λ₂ in the dictionary learning model are weight coefficients for balancing different cost items in the objective function, and are empirically set to 0.01 through experiments. The dictionary learning objective function is a multivariate optimization problem, and this embodiment adopts an alternate optimization strategy for iterative calculation. Specifically, when one variable in the objective function is optimized, the other variables are fixed, and then the original convex optimization problem is transformed into multiple convex optimization sub-problems for solution.

图像全局特征构建及分类：Image global feature construction and classification:

步骤一、输入图像对象性窗口生成及特征提取Step 1. Input image object window generation and feature extraction

给定一幅未知语义类别的输入图像，利用EdgeBox算法生成该图像候选对象性窗口，进一步提取各候选对象性窗口的VGG19深度网络视觉特征。Given an input image with an unknown semantic category, use the EdgeBox algorithm to generate candidate objectivity windows of the image, and further extract the VGG19 deep network visual features of each candidate objectivity window.

步骤二、对象性窗口特征编码Step 2. Object window feature encoding

根据获取的结构化视觉字典D，计算输入图像候选对象性窗口的特征编码。目标函数的具体数学形式如下：According to the obtained structured visual dictionary D, the feature encoding of the candidate objectivity window of the input image is calculated. The specific mathematical form of the objective function is as follows:

为求解以上目标函数，具体采用特征指派搜索(Feature-sign search，参见文献(Honglak Lee，Alexis Battle，Rajat Raina，and Andrew Y.Ng.Efficient sparsecoding algorithms.The Conference on Neural Information Processing Systems，pages 801-808.2007.)算法计算待优化变量y，即获得图像窗口的特征编码。In order to solve the above objective function, feature-sign search (Feature-sign search, see literature (Honglak Lee, Alexis Battle, Rajat Raina, and Andrew Y.Ng. Efficient sparsecoding algorithms. The Conference on Neural Information Processing Systems, pages 801- 808.2007.) The algorithm calculates the variable y to be optimized, that is, obtains the feature code of the image window.

步骤三、视觉特征集成并构建图像全局特征Step 3. Integrate visual features and construct global image features

基于输入图像所有对象性窗口的特征编码，进一步借鉴传统Max-Pooling 特征集成(参见文献：Jianchao Yang，Kai Yu，Yihong Gong，and T.Huang.Linear spatialpyramid matching using sparse coding for image classification.In IEEEConference on Computer Vision and Pattern Recognition，pages 1794-1801，2009.)的方式，如图2所示，合并对象性窗口特征编码，以构建图像全局特征表示。Based on the feature encoding of all object windows of the input image, further refer to the traditional Max-Pooling feature integration (see literature: Jianchao Yang, Kai Yu, Yihong Gong, and T.Huang.Linear spatialpyramid matching using sparse coding for image classification.In IEEEConference on Computer Vision and Pattern Recognition, pages 1794-1801, 2009.), as shown in Figure 2, incorporates object-wise window feature encoding to construct an image global feature representation.

传统Max-Pooling特征集成方法由于基于图像局部兴趣点，为了在图像全局特征中嵌入空间分布信息，在其方法中加入了对图像不同空间尺度划分的步骤。区别于以上传统方法，本发明方法基于图像对象性窗口区域，在图像全局特征构建过程中有效引入了图像空间分布及对象语义信息，最终图像全局特征的构建直接将图像所有窗口特征编码在各特征维度上取最大值获得。Because the traditional Max-Pooling feature integration method is based on the local interest points of the image, in order to embed the spatial distribution information in the global feature of the image, a step of dividing the image into different spatial scales is added to its method. Different from the above traditional methods, the method of the present invention is based on the image object window area, and effectively introduces the image spatial distribution and object semantic information in the process of image global feature construction. The final image global feature construction directly encodes all the window features of the image in each feature Obtained by taking the maximum value in the dimension.

步骤四、SVM分类器语义标签预测Step 4, SVM classifier semantic label prediction

根据步骤三构建的图像全局特征表示，利用线性SVM分类器(参见文献： R.-E.Fan，K.-W.Chang，C.-J.Hsieh，et al.Liblinear：A library for large linearclassification.Journal of Machine Learning Research，2008，9：1871-1874.)，预测输入图像的语义类别标签，最终实现图像的分类。According to the image global feature representation constructed in step 3, use a linear SVM classifier (see literature: R.-E.Fan, K.-W.Chang, C.-J.Hsieh, et al.Liblinear: A library for large linearclassification .Journal of Machine Learning Research, 2008, 9:1871-1874.), predict the semantic category label of the input image, and finally realize the classification of the image.

表1 本发明方法与现有图像分类方法在UIUC8对象识别数据库上准确性的评价Table 1 Evaluation of the accuracy of the method of the present invention and the existing image classification method on the UIUC8 object recognition database

如上表所示，实验中在UIUC8对象识别数据库上与现有方法进行了比较。该数据库包含了8种不同体育运动类别的图像数据，共1972幅图像。为了计算不同方法的分类准确性，从各类别图像集中随机选择70幅图像作为训练数据，该类别剩余图像中随机选择60幅图像作为测试数据，最终分类准确性为各类别分类准确性的平均值。As shown in the table above, experiments are compared with existing methods on the UIUC8 object recognition database. The database contains image data of 8 different sports categories, with a total of 1972 images. In order to calculate the classification accuracy of different methods, 70 images are randomly selected from the image sets of each category as training data, and 60 images are randomly selected from the remaining images of this category as test data, and the final classification accuracy is the average of the classification accuracy of each category .

注：上表中的图像分类方法LLC，参见文献(Jinjun Wang，Jianchao Yang，Kai Yu，Fengjun Lv，T.Huang，and Yihong Gong.Locality-constrained linear coding forimage classification.In IEEE Conference on Computer Vision and PatternRecognition，pages 3360-3367，2010.)；图像分类方法LSC，参见文献(Lingqiao Liu， LeiWang，and Xinwang Liu.In defense of soft-assignment coding.In IEEEInternational Conference on Computer Vision，pages 2486-2493，2011.)；图像分类方法CNN，参见文献(K.Simonyan and A.Zisserman.Very deep convolutional networksfor large-scale image recognition.In International Conference on LearningRepresentations，2015.)。Note: For the image classification method LLC in the above table, see the literature (Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, T.Huang, and Yihong Gong.Locality-constrained linear coding for image classification.In IEEE Conference on Computer Vision and PatternRecognition , pages 3360-3367, 2010.); image classification method LSC, see literature (Lingqiao Liu, LeiWang, and Xinwang Liu.In defense of soft-assignment coding.In IEEEInternational Conference on Computer Vision, pages 2486-2493, 2011.) ; Image classification method CNN, see literature (K.Simonyan and A.Zisserman.Very deep convolutional networks for large-scale image recognition.In International Conference on Learning Representations, 2015.).

Claims

1. An image classification method based on intra-class visual mode sharing is characterized by comprising the following steps:

image object window generation: giving an image training set containing multiple semantic class objects, and generating a candidate object window of each image in the image training set;

extracting depth features of an image window: extracting the depth feature of the candidate object window;

visual dictionary learning based on in-class sharing characteristics: according to the depth characteristics of the candidate object windows of all semantic category images, a structured visual dictionary with in-class sharing characteristics is obtained by optimizing a visual dictionary learning model;

generating candidate object windows of the image for the input image with unknown semantic category, and extracting the depth characteristics of the candidate object windows;

calculating the characteristic codes of candidate object windows of the input images according to the structured visual dictionary;

combining object window feature codes based on the feature codes of all object windows of the input image to construct an image global feature representation;

and predicting semantic category labels of the input images by utilizing a linear SVM classifier according to the image global feature representation to realize the classification of the images.

2. The method of claim 1, wherein the generating of the candidate object window for each image in the image training set is implemented by an EdgeBox algorithm.

3. The method of claim 1, wherein the extracting the depth features of the candidate object window is performed by a VGG19 depth network model.

4. The method according to claim 1, wherein the optimized visual dictionary learning model is performed by the following formula:

in the above formula, X_iA visual characteristic matrix of all training samples corresponding to the ith semantic object class; d_∈iRepresenting a class-specific visual dictionary in which dictionary words corresponding to the ith class in the structured visual dictionary D are reserved, and the rest semantic object class dictionary words are set to zero; a. the_iIs a visual feature matrix X_iClass specific dictionary D_∈iA matrix of representation coefficients of (a); d is a structured visual dictionary to be optimized, which is a set of dictionary words of all semantic object categories; z_iIs a feature matrix X_iRepresenting coefficients on a structured visual dictionary D; d_iAnd D_jRespectively representing visual dictionaries corresponding to the ith and jth class objects in the structured dictionary D; symbol | · | non-conducting phosphor_FA Frobenius norm representing a computational matrix; parameters alpha, beta, lambda₁、λ₂The weighting coefficients of the different cost terms are balanced in the objective function.

5. The method according to claim 1, wherein the objective function for calculating the feature codes of the candidate windows of the input image is:

in the formula, x is the depth visual feature of an object window, y represents the object window feature code to be optimized and solved, D is a structured visual dictionary, and parameter eta represents the number of nonzero elements in the control feature code y, namely the sparsity of the feature code; symbol | · | non-conducting phosphor_FRepresenting the Frobenius norm of the computational matrix.