Detailed Description
In the embodiments of the invention, on the one hand, the image to be classified is classified by a classification model whose training optimization for classification has been completed, so as to determine an initial category. The classification model can be trained and optimized through a dedicated classification training scheme and configured in advance in a device with a classification function, such as a server, so as to classify images into multiple classes more effectively. On the other hand, a more precise and accurate target feature vector is extracted from the image to be classified by a feature extraction model optimized for feature extraction, and the category of the image to be classified is finally determined based on a comparison between the target feature vector and the candidate classification feature vectors of one or more candidate maps under the initial category, thereby obtaining a classification result. In this way, the classification model performs an initial classification, and the more refined feature vectors then subdivide within the initial category, so that the determined category is more accurate, the amount of data processed in the feature comparison is effectively reduced, software and hardware resources are saved, and processing efficiency is improved.
In an embodiment, for the classification model, a general classification model may be used, and images may be classified more coarsely by the general classification model, for example, the content of a certain image may be classified into categories such as cat or dog. The classification model can also adopt a fine-grained image classification model based on strong supervision information or a fine-grained image classification model based on weak supervision information. Based on the fine-grained image classification model, more detailed classification can be performed, for example, the classification can be subdivided into cat varieties, dog varieties and the like.
In a fine-grained image classification model based on strong supervision information, during model training optimization, in addition to the class labels of the images, additional manual annotation information such as object bounding boxes and part annotation points is used in order to obtain better classification accuracy, so that the classification model can achieve higher classification precision.
A fine-grained image classification model based on weak supervision information can achieve good capture of local information without relying on part annotation points. Compared with the strongly supervised approach, it has a lower implementation cost at comparable classification precision and is better suited to engineering applications.
The embodiments of the invention mainly comprise a configuration stage for the classification model and the feature extraction model, and an image classification stage based on the two models. In the configuration stage, an initial classification model and an initial feature extraction model can be established offline on a configuration server and trained and optimized with a large number of training images to obtain a classification model and a feature extraction model that classify images well. The trained and optimized classification model and feature extraction model are then loaded onto a server (or other device used for image classification) to provide an online classification service for unknown images. In the image classification stage, a user only needs to shoot an image to be classified with a user terminal, or obtain it in another way (such as downloading). The image to be classified is carried in a classification request and sent to the server for image classification; the server calls the classification model and the feature extraction model to process the image, determines its classification result, and returns response information for the classification request to the user terminal, the response information carrying the classification result of the image to be classified.
In an embodiment, fig. 1 shows a schematic diagram of a framework for classifying images according to an embodiment of the present invention. The framework includes a user side and a service side. A user terminal on the user side may be an intelligent terminal with camera and network functions, such as a smartphone or a tablet computer, or a terminal such as a personal computer. The online service on the service side can be provided by a server or server group dedicated to image classification that receives classification requests from a target application, and the offline processing service on the service side can be provided by a server or server group dedicated to designing and training the classification model and the feature extraction model. In other embodiments, the online service and the offline service on the service side may be provided by the same dedicated server or server group. In one embodiment, the target application may be an application having a scan-and-identify module.
As shown in fig. 2a, an initial classification model is shown. The initial classification model includes an Input layer, a Stem layer, Inception-ResNet layers, Reduction layers, an Average Pooling layer, a Dropout layer, and a Softmax layer. The Input layer is used for inputting images. The Inception-ResNet layers are hidden layers; the model has three of them: an Inception-ResNet-A first hidden layer, an Inception-ResNet-B second hidden layer, and an Inception-ResNet-C third hidden layer. The Stem layer is a preprocessing layer used to preprocess the data input into Inception-ResNet-A; the preprocessing may include multiple rounds of convolution and pooling. The Average Pooling layer performs dimensionality reduction on the data output by Inception-ResNet-C. The Dropout layer prevents the initial classification model from over-fitting, effectively avoiding the situation in which the model classifies training images well but performs poorly on the actual images to be classified after deployment. The Softmax layer is the classification calculation layer; its output is the probability that the image input through the Input layer belongs to each class.
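The role of the Softmax layer can be sketched with a minimal, stand-alone computation (plain Python, not the actual network): it maps the raw per-class scores produced by the preceding layers to a probability for each class. The score values below are purely illustrative.

```python
import math

def softmax(logits):
    """What the Softmax layer computes: turn the network's raw per-class
    scores into probabilities that sum to 1."""
    m = max(logits)                           # subtract the max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three classes, e.g. (cat, dog, rabbit)
probs = softmax([2.0, 1.0, 0.1])
```

The class with the highest score receives the highest probability, which is how the initial category is read off the model's output.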
In one embodiment, when configuring the classification model, training images of a plurality of image classes may be input into the initial classification model in advance using a model generation tool, and the initial classification model may thus be trained. In one embodiment, when the model generation tool used is TensorFlow (a deep learning framework used in image recognition and related fields), since backward gradients are computed automatically in TensorFlow, the parameters corresponding to each node in the initial classification model can be quickly adjusted and generated from the input training images during the training optimization process, thereby realizing the training optimization of the initial classification model and obtaining the classification model. The classes of the training images can be adjusted according to different design goals.
In one embodiment, the design goal is to build a classification model that can distinguish images of a first category (e.g., cats) from images of a second category (e.g., dogs). In this case, when training the initial classification model, M (M is a positive integer, e.g., 10000) images already determined to be of the dog category may be selected in advance as dog training images, and P (P is a positive integer, e.g., 10000) images already determined to be of the cat category may be selected as cat training images. In one embodiment, after a dog training image is input into the initial classification model, the model may extract image feature data of the image and classify it according to that feature data; if the output classification result indicates that the category of the image is also dog, the classification of that dog training image by the initial classification model is successful. Further, after the M training images labeled as the dog category have been classified, if the success rate is greater than a preset success rate threshold (e.g., 90%), it is determined that the initial classification model can classify and identify images of the dog category well; otherwise, the parameters corresponding to each node in the initial classification model can be adjusted, and the M dog training images are classified again with the adjusted model.
Similarly, the initial classification model can be trained and optimized with the P cat training images in the same manner. If the classification success rates for both the dog training images and the cat training images meet the preset success rate threshold, the training of the initial classification model is complete, and the trained initial classification model is used as the classification model in the embodiments of the invention. In other embodiments, more categories may be set, and the initial classification model is trained and optimized with a large number of training images of those different categories, so that the success rate of the finally obtained classification model on each type of image is higher than a certain success rate threshold. Categories such as the first category and the second category may be coarse categories such as cat or dog, or more detailed categories such as specific breeds of cats and/or dogs, for example fine-grained categories such as shepherd dogs or other particular breeds.
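The success-rate-driven training loop described above can be sketched as follows. The `classify` and `adjust` stand-ins below are hypothetical toy functions, not the patented model; they only illustrate the control flow of classifying a labelled training set, checking the success rate against the threshold, and adjusting parameters when it falls short.

```python
def train_until_threshold(classify, adjust, images, label,
                          threshold=0.9, max_rounds=50):
    """Classify every training image of one labelled category; if the
    success rate does not exceed the threshold, adjust the model
    parameters and classify again with the adjusted model."""
    for _ in range(max_rounds):
        correct = sum(1 for img in images if classify(img) == label)
        if correct / len(images) > threshold:
            return True
        adjust()
    return False

# Toy stand-in model: it classifies the first `acc`-fraction of the 100
# training images correctly, and each adjustment improves it slightly.
state = {"acc": 0.5}
def classify(i):
    return "dog" if i < state["acc"] * 100 else "cat"
def adjust():
    state["acc"] = min(1.0, state["acc"] + 0.1)

trained = train_until_threshold(classify, adjust, list(range(100)), "dog")
```

After a few adjustment rounds the toy model's success rate exceeds the 90% threshold and training terminates successfully.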
In one embodiment, in the training optimization process of the initial classification model, each successfully trained training image may be determined as a candidate map, a category label of a category to which the candidate map belongs is set for the candidate map according to the category, and the candidate map and the corresponding category label are stored in the database in an associated manner, so as to subsequently determine the category to which the candidate map belongs based on the category label.
In an embodiment of the present invention, the feature extraction model may be configured according to a feature extraction network and a feature representation optimization module, the feature extraction network is configured to extract an initial feature vector of an image to be classified based on a neural network, the feature representation optimization module is configured to optimize the initial feature vector to obtain an N-dimensional target feature vector, where N is a positive integer, and the feature extraction model finally outputs the N-dimensional target feature vector related to the image, for example, outputs a 2048-dimensional target feature vector.
In one embodiment, in order to generate the above feature extraction model, an initial feature classification model, which includes an initial feature extraction network constructed based on a neural network, and training images may be obtained. Further, the initial feature classification model can be trained and optimized with the obtained training images to obtain a feature classification model, which includes the feature extraction network obtained after the initial feature extraction network is trained and optimized. The feature extraction network can then be taken from the feature classification model, a feature representation optimization module is generated based on whitening parameters, and the feature extraction model is generated from the feature extraction network and the feature representation optimization module. In one embodiment, the feature representation optimization module may be implemented based on R-MAC (an image feature extraction method).
In one embodiment, the feature representation optimization module may be obtained by performing training optimization on whitening parameters based on training images, where the whitening parameters consist of a matrix w and a bias matrix b, and the module optimizes the feature vectors output by the feature extraction network and outputs the optimized feature vectors. The processing performed by the feature representation optimization module may be a whitening process. In one embodiment, the training optimization process of the whitening parameters includes: calling the feature extraction network to extract feature vectors from a plurality of training images, and computing the matrix w and the bias matrix b from the obtained feature vectors, thereby obtaining the feature representation optimization module, which is generated based on w and b. For example, with one million training samples, that is, one million training images, a 2048-dimensional feature representation training matrix x is obtained for each training image; a 2048 × 2048 matrix w and a 2048-dimensional bias matrix b need to be calculated to whiten x, so that in the whitened matrix the dimension data are uncorrelated, that is, knowing the value of one dimension does not help to guess or calculate the value of another dimension. Specifically, as shown in fig. 2d, the image on the left of fig. 2d plots the one million 2048-dimensional vectors; it can be seen that the dimension data on the left are relatively dense and a certain correlation exists between their values. The matrix w and the bias matrix b are computed through mathematical calculation, and the image on the right is finally obtained from the left-hand vectors after applying the formula x_w = x·w + b; in the right-hand image, the correlation between the dimension data is low. In theory, w and b can be computed directly, but because the number of samples is usually too large, an approximate w and b are reached through training, such that the input vectors are converted from the left-hand image to the right-hand image after passing through w and b. Here 2048 is the number of dimensions that the feature representation optimization module needs to output; if other dimensionalities are needed, for example 4096-dimensional or even higher-dimensional output, the value 2048 is adjusted to 4096 or more when training w and b to generate the feature representation optimization module.
In one embodiment, the feature representation optimization module is generated according to the conversion formula x_w = x·w + b. That is, after an initial matrix x is input into the feature representation optimization module, the module outputs the optimized matrix x_w according to x_w = x·w + b. Similarly, when the feature extraction network inputs a feature vector into the feature representation optimization module, the module outputs the optimized feature vector in the same way. With this optimization processing, the correlation between the features represented by the feature values in the optimized feature vector is low, for example lower than a certain correlation threshold, and the features have the same or similar variances.
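As a minimal sketch of the whitening transform x_w = x·w + b, the fragment below computes w and b for two-dimensional toy data (the 2048-dimensional case in the document is analogous but needs a full eigendecomposition library). Here w is taken as the inverse square root of the data covariance, one standard choice that makes the whitened dimensions uncorrelated with equal variance; the function names and the toy data are illustrative, not from the source.

```python
import math

def whiten_params(data):
    """Compute a 2x2 whitening matrix w and bias b so that x_w = x.w + b
    has zero mean and identity covariance (w = covariance^(-1/2))."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    cxx = sum((p[0] - mx) ** 2 for p in data) / n
    cyy = sum((p[1] - my) ** 2 for p in data) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in data) / n
    # eigendecomposition of the symmetric 2x2 covariance matrix
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    c, s = math.cos(theta), math.sin(theta)
    l1 = cxx * c * c + 2 * cxy * c * s + cyy * s * s
    l2 = cxx * s * s - 2 * cxy * c * s + cyy * c * c
    d1, d2 = 1 / math.sqrt(l1), 1 / math.sqrt(l2)
    # w = V . diag(1/sqrt(eigenvalue)) . V^T
    w = [[c * c * d1 + s * s * d2, c * s * (d1 - d2)],
         [c * s * (d1 - d2), s * s * d1 + c * c * d2]]
    b = [-(mx * w[0][0] + my * w[1][0]),
         -(mx * w[0][1] + my * w[1][1])]
    return w, b

def whiten_apply(data, w, b):
    """Apply x_w = x.w + b to every row vector x in the data."""
    return [(p[0] * w[0][0] + p[1] * w[1][0] + b[0],
             p[0] * w[0][1] + p[1] * w[1][1] + b[1]) for p in data]

# Correlated toy data: y roughly follows 0.9*x plus a varying term
data = [(i / 10.0, 0.9 * (i / 10.0) + ((i * 37) % 10) / 10.0)
        for i in range(100)]
w, b = whiten_params(data)
out = whiten_apply(data, w, b)
```

After the transform, the covariance of `out` is the identity matrix: the two dimensions are uncorrelated with equal variance, which is exactly the property the module is trained to approximate for 2048-dimensional feature vectors.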
In one embodiment, the feature classification model and the feature representation optimization module may be generated by training the initial feature classification model and the whitening parameters using a model generation tool. Specifically, the model generation tool used can also be TensorFlow; since backward gradients are computed automatically in TensorFlow, the feature classification model and the feature representation optimization module can be obtained relatively quickly.
As shown in fig. 2b, an initial feature classification model is shown, which includes an initial feature extraction network constructed based on a neural network; the network may employ a convolutional neural network. As can be seen from the figure, the initial feature extraction network includes three convolutional layers: a first convolutional layer cfgl[0] block, a second convolutional layer cfgl[1] block, and a third convolutional layer cfgl[2] block. Other embodiments may include more convolutional layers, such as a fourth convolutional layer cfgl[3] block after the cfgl[2] block, and so on. In the training optimization process, each convolutional layer in the initial feature extraction network performs convolution processing on an input training image and outputs a feature vector related to that image (i.e., the initial classification feature vector of the training image). In fig. 2b, the cfgl[0] block and cfgl[1] block are convolutional layers of the first type: the cfgl[0] block passes its convolved data to the cfgl[1] block, and the cfgl[1] block passes its convolved data to the cfgl[2] block. The cfgl[2] block shown in fig. 2b is a convolutional layer of the second type: it outputs its convolved data (i.e., the initial classification feature vector of the training image) so that subsequent network layers can operate on the initial classification feature vector to determine the category of the image. After the initial feature classification model has been trained to obtain a feature classification model, the network layers used to calculate classification feature vectors are extracted from the trained feature classification model and used as the feature extraction network, for example the cfgl[0] block, cfgl[1] block, cfgl[2] block, and cfgl[3] block.
In the embodiments of the present invention, after the initial feature classification model shown in fig. 2b has been trained and optimized to obtain the feature classification model, the trained and optimized initial feature extraction network can be obtained from the feature classification model (that is, the feature extraction network is obtained from the feature classification model), and a connection relationship between the feature extraction network and the feature representation optimization module is established as shown in fig. 2c. As can be seen from the figure, after an image is input into the feature extraction network, each convolutional layer (cfgl[0] block, cfgl[1] block, and cfgl[2] block) performs convolution processing on the input image so as to determine its initial classification feature vector, and the last convolutional layer, the cfgl[2] block, sends that vector to the feature representation optimization module, which optimizes it and determines and outputs the candidate classification feature vector of the input image. The optimization process may produce a multidimensional vector representation of the initial classification feature vector, for example a 2048-dimensional representation; that is, the optimized feature vector (i.e., the candidate classification feature vector) may be a 2048-dimensional vector, whose feature data may take a form such as (0.1, 0.11, 0.15, …, 0.16). It is understood that vectors of other dimensions can be obtained with different feature representation optimization modules, and the higher the dimension of the vector used, the more accurate the classification result for the input image.
In the embodiments of the invention, the classification model can be called to perform initial classification on various training images, and the successfully classified training images are stored in a database as candidate maps. Furthermore, the candidate maps can be used as input to the feature extraction network: the feature extraction model is called to calculate the candidate classification feature vector of each candidate map, and each candidate map is stored in association with its candidate classification feature vector. In this way, when a classification request about an image is later received from a user, the classification result of the image to be classified can be determined directly from the stored candidate classification feature vectors and returned to the user terminal, without calling the feature extraction model to compute candidate classification feature vectors for a massive number of candidate maps after the classification request arrives, which saves a large amount of calculation and query time.
In an embodiment, when a candidate map and its candidate classification feature vector are stored in association, a mapping relationship list of candidate maps and candidate classification feature vectors may be established, where column 1 is the storage address of the candidate map and column 2 is the candidate classification feature vector. Given a candidate classification feature vector, the corresponding candidate map can be quickly found from the mapping relationship list, and vice versa.
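The two-column mapping relationship list can be sketched as a pair of dictionaries supporting lookup in both directions; the class name, the storage address, and the vector values used here are hypothetical stand-ins.

```python
class CandidateStore:
    """Column 1: storage address of the candidate map.
    Column 2: its candidate classification feature vector.
    Two dictionaries allow fast lookup in either direction."""

    def __init__(self):
        self.addr_to_vec = {}
        self.vec_to_addr = {}

    def add(self, address, vector):
        vec_key = tuple(vector)          # tuples are hashable dict keys
        self.addr_to_vec[address] = vec_key
        self.vec_to_addr[vec_key] = address

    def vector_of(self, address):
        return self.addr_to_vec[address]

    def address_of(self, vector):
        return self.vec_to_addr[tuple(vector)]

store = CandidateStore()
store.add("/imgs/cat_0001.jpg", [0.1, 0.11, 0.15, 0.16])
```

Once a matching candidate classification feature vector is found during comparison, `address_of` returns the candidate map it belongs to, and `vector_of` covers the reverse direction.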
After the training of the classification model and the feature extraction model and the optimization of the related parameters have been completed with a large number of training images, the trained and optimized classification model and feature extraction model are configured into the corresponding servers, and a large number of candidate maps and candidate classification feature vectors are stored in those servers in an associated manner, so as to provide an online image classification service for users.
In one embodiment, when a large number of candidate maps and candidate classification feature vectors are stored in association in the corresponding server, different candidate classification feature vectors may be configured for different categories according to the category to which each candidate map belongs. Specifically, when L (L is a positive integer) candidate maps are stored under a certain category, L candidate classification feature vectors may be configured for that category directly, that is, each candidate map corresponds to one candidate classification feature vector. Alternatively, similarity calculation may be performed on the candidate classification feature vectors of the candidate maps in the category, and candidate classification feature vectors whose mutual similarity reaches a first similarity threshold (that is, highly similar vectors) are classified as feature vectors of the same group, so that the candidate maps corresponding to the feature vectors of one group are all characterized by the same candidate classification feature vector; fewer than L candidate classification feature vectors are then configured for the category. With this method, the number of candidate classification feature vectors configured for each category can be reduced while the image classification accuracy is maintained, further improving the operation rate.
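Under the reading that highly similar vectors are merged, the reduction from L vectors to fewer representatives can be sketched with a greedy grouping by cosine similarity; the threshold value, the function names, and the toy vectors are illustrative assumptions, not from the source.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def deduplicate(vectors, threshold=0.95):
    """Greedily merge near-duplicate candidate vectors: a vector whose
    similarity to an existing representative reaches the threshold reuses
    that representative, so fewer vectors are stored than candidate maps."""
    reps = []         # the representative vectors actually kept
    assignment = []   # index of the representative for each input vector
    for v in vectors:
        for i, r in enumerate(reps):
            if cosine(v, r) >= threshold:
                assignment.append(i)
                break
        else:
            reps.append(v)
            assignment.append(len(reps) - 1)
    return reps, assignment

# Three candidate vectors, the first two nearly identical
reps, assignment = deduplicate([[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]])
```

The first two vectors share one representative, so only two candidate classification feature vectors are configured for three candidate maps.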
In an embodiment, the candidate maps may be the various training images used above, or images of the corresponding types acquired by a user through downloading or shooting. These images have known categories, and candidate classification feature vectors are extracted from them by the above feature extraction model; both the images and the corresponding candidate classification feature vectors may be stored in the database to facilitate subsequent lookup.
It is to be understood that the classification model and the feature extraction model may be trained and configured by a server in the embodiments of the present invention, and may also be implemented by a powerful personal computer with rich software and hardware resources in other embodiments, which is not limited in this respect.
Referring to fig. 3, it is a schematic flow chart of image classification according to an embodiment of the present invention, and the method according to an embodiment of the present invention may be executed by a server or a server group. The method of an embodiment of the present invention includes the following steps.
S301: and acquiring an image to be classified. In specific implementation, the server may receive a classification request sent by the user terminal, and obtain the carried image to be classified from the classification request. The image to be classified may be obtained by a user using a shooting module of the user terminal, or may be obtained by other methods, which is not specifically limited in the present invention.
S302: and calling a classification model to classify the images to be classified, and determining the initial class to which the images to be classified belong. In an embodiment, the classification model may be obtained by a user through training and optimization based on an initial classification model, where the classification model has a characteristic of high accuracy of TOP5, that is, the classification model may more accurately determine a category (i.e., an initial category) to which an image to be classified belongs, for example, when an object corresponding to the image to be classified is a bosch cat, the classification model may more accurately determine that the initial category to which the image to be classified belongs is a cat rather than a dog.
In one embodiment, the classification model is derived from an initial classification model through a number of training optimizations. As shown in fig. 2a, the initial classification model may be trained and optimized to obtain a classification model, which may refer to the above description, and is not described herein again.
In an embodiment, after the image to be classified is input into the classification model, the classification model may extract feature data of the image to be classified, and calculate probabilities that the image to be classified belongs to each candidate category according to the feature data, and further may determine K (K is a positive integer) candidate categories with the top probabilities as initial categories to which the image to be classified belongs.
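Selecting the K candidate categories with the highest probabilities can be sketched as follows; the category names and probability values are illustrative stand-ins for the classification model's output.

```python
def top_k_categories(probabilities, k):
    """Return the K candidate categories with the highest predicted
    probability, as (category, probability) pairs in descending order."""
    return sorted(probabilities.items(),
                  key=lambda kv: kv[1], reverse=True)[:k]

# Hypothetical per-category probabilities from the classification model
probs = {"cat": 0.62, "dog": 0.30, "rabbit": 0.08}
top2 = top_k_categories(probs, 2)
```

With K = 2, the initial categories for the subsequent feature comparison would be cat and dog.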
S303: and calling a feature extraction model to determine a target feature vector of the image to be classified. The feature extraction model can be formed according to a feature extraction network and a feature representation optimization module, the feature extraction network is used for extracting initial feature vectors of the images to be classified based on a neural network, the feature representation module is used for optimizing the initial feature vectors to obtain N-dimensional target feature vectors, and N is a positive integer.
In one embodiment, in order to generate the above feature extraction model, an initial feature classification model, which includes an initial feature extraction network constructed based on a neural network, and training images may be obtained. Further, the initial feature classification model can be trained and optimized with the obtained training images to obtain a feature classification model, which includes the feature extraction network obtained after the initial feature extraction network is trained and optimized; the feature extraction network can then be taken from the feature classification model, and the feature extraction model is generated from the feature extraction network and a pre-generated feature representation optimization module. The feature representation optimization module may be obtained by performing training optimization on whitening parameters based on training images, and may be configured to optimize the feature vectors output by the feature extraction network and output the optimized feature vectors.
In one embodiment, the connection relationship between the feature extraction network and the feature representation optimization module can be as shown in fig. 2c. The feature extraction network can be built by a development user based on a neural network and can comprise convolutional layers of a first type and a convolutional layer of a second type, where a convolutional layer of the first type outputs its convolved data to another convolutional layer in the feature extraction network, and the convolutional layer of the second type outputs its convolved data to the feature representation optimization module. After the image to be classified is input, the feature extraction network is called first to process it and obtain an initial feature vector related to the image; the initial feature vector is passed to the feature representation optimization module through the convolutional layer of the second type, and the feature representation optimization module is called to optimize it into an N-dimensional target feature vector related to the image to be classified, such as a 2048-dimensional vector. Such a vector represents the image to be classified to a certain extent; it is the image as understood by a computer and is not easy for a user to understand or visualize.
S304: and comparing the target characteristic vector of the image to be classified with the candidate classification characteristic vector configured for the initial class to obtain a comparison result. The comparison may be performed by calculating a similarity between each vector in the target feature vector and the candidate classification feature vector, for example, calculating a hamming distance or a euclidean distance of the similarity between each vector in each dimension. The higher the similarity determined finally is, the higher the probability that the image to be classified belongs to the category to which the corresponding candidate classification feature vector belongs is.
S305: and determining a classification result of the image to be classified according to the comparison result.
In one embodiment, the initial category may include at least one candidate graph, and the server or the server group may obtain the candidate graphs belonging to the initial category and invoke the feature extraction model to configure candidate classification feature vectors for the initial category according to the candidate graphs.
In an embodiment, the candidate classification feature vector configured for the initial class may be obtained by processing the candidate image through the feature extraction network in the feature extraction model to obtain an initial classification feature vector in the initial class, and optimizing the initial classification feature vector in the initial class through the feature representation optimization module in the feature extraction model. The candidate classification feature vector may be an N-dimensional feature vector, for example, a 2048-dimensional vector.
Q candidate images (one candidate classification feature vector for each candidate image) may be included under the initial category, where Q is a positive integer. In this situation, when the feature extraction model is called to configure candidate classification feature vectors for the initial category, Q candidate classification feature vectors may be configured directly, that is, each candidate image corresponds to one candidate classification feature vector. Alternatively, similarity calculation may be performed among the candidate classification feature vectors of the candidate images in the initial category, the candidate classification feature vectors whose mutual similarity reaches a first similarity threshold may be classified into the same group, and the candidate images corresponding to the candidate classification feature vectors in the same group may be characterized by a single shared candidate classification feature vector, that is, a plurality of candidate images correspond to the same candidate classification feature vector, so that fewer than Q candidate classification feature vectors are configured for the initial category. In this way, the number of candidate classification feature vectors configured for the initial category can be reduced while the image classification precision is ensured, which reduces the amount of calculation and improves the operation rate.
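The grouping step above can be sketched as a greedy deduplication, shown here under the assumption that similarity is derived from euclidean distance and that "reaching the threshold" means the similarity score is at least the threshold; the vectors, threshold value, and greedy strategy are illustrative, not part of the claimed embodiment.

```python
import math

def similarity(vec_a, vec_b):
    # Map euclidean distance into (0, 1]; higher means more similar.
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))
    return 1.0 / (1.0 + dist)

def deduplicate_candidates(vectors, threshold):
    """Greedy grouping: a vector joins the first group whose representative
    it is sufficiently similar to; otherwise it starts a new group.
    Returns one representative candidate classification feature vector per group."""
    representatives = []
    for vec in vectors:
        if not any(similarity(vec, rep) >= threshold for rep in representatives):
            representatives.append(vec)
    return representatives

# Hypothetical candidate classification feature vectors (Q = 3).
candidates = [[0.2, 0.4], [0.21, 0.41], [0.9, 0.1]]
reps = deduplicate_candidates(candidates, threshold=0.9)
```

Here the first two nearly identical vectors collapse into one group, so only two representatives remain for the three candidate images, reducing the later comparison cost.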
In one embodiment, the initial category may include a first category and a second category, and the server may perform similarity calculation on the target feature vector of the image to be classified and the candidate classification feature vector configured for the first category to obtain a first similarity; perform similarity calculation on the target feature vector and the candidate classification feature vector configured for the second category to obtain a second similarity; and compare the first similarity with the second similarity to obtain a comparison result, where the comparison result indicates the larger of the first similarity and the second similarity. Further, if the comparison result indicates that the first similarity is greater than the second similarity, it may be determined that the image to be classified belongs to the first category. The initial category may also include a third category, a fourth category, or other categories, which is not specifically limited in the present invention.
For example, assume the target feature vector of the image to be classified is (0.23, 0.44, …, 0.61), and the initial class is a dog class, in which the first category is Golden Retriever and the second category is Teddy. The candidate classification feature vector corresponding to the Golden Retriever is (0.23, 0.44, …, 0.67), and the candidate classification feature vector corresponding to the Teddy is (0.23, 0.31, …, 0.60). The server performs similarity calculation on the target feature vector and the candidate classification feature vector corresponding to the Golden Retriever to obtain the first similarity, and performs similarity calculation on the target feature vector and the candidate classification feature vector corresponding to the Teddy to obtain the second similarity. If it is determined that the first similarity is greater than the second similarity, it can be determined that the image to be classified belongs to the Golden Retriever category.
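This worked example can be reproduced numerically. The sketch below uses hypothetical 3-dimensional stand-ins for the elided N-dimensional vectors (only the dimensions shown in the text are real), and assumes a distance-derived similarity; the actual similarity measure is not fixed by the embodiment.

```python
import math

def similarity(vec_a, vec_b):
    # Map euclidean distance into (0, 1]; higher means more similar.
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))
    return 1.0 / (1.0 + dist)

# Illustrative 3-dimensional stand-ins for the elided N-dimensional vectors.
target = [0.23, 0.44, 0.61]
golden_retriever = [0.23, 0.44, 0.67]
teddy = [0.23, 0.31, 0.60]

first_similarity = similarity(target, golden_retriever)
second_similarity = similarity(target, teddy)
result = "Golden Retriever" if first_similarity > second_similarity else "Teddy"
```

With these values the target vector is much closer to the Golden Retriever candidate than to the Teddy candidate, so the first similarity wins.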
In an embodiment, the first category and the second category may each include candidate images associated with candidate classification feature vectors, and if the comparison result indicates that the first similarity is greater than the second similarity, an associated image of the image to be classified may be determined according to the first similarity, where the associated image is: the candidate image whose candidate classification feature vector has the first similarity with the target feature vector. For example, if the candidate images whose candidate classification feature vectors have the first similarity with the target feature vector are image JPG1 and image JPG2, then both image JPG1 and image JPG2 may be determined as associated images of the image to be classified.
In one embodiment, when similarity calculation is performed on the target feature vector of the image to be classified and the candidate classification feature vectors configured for the first category or the second category, the euclidean distance or the cosine similarity between the target feature vector and each candidate classification feature vector in the first category or the second category may be calculated through a matching algorithm, so as to obtain the similarity between the target feature vector and the candidate classification feature vector corresponding to each candidate image. When the similarity is calculated through a matching algorithm, matching can be performed by sequential traversal and comparison; alternatively, a search engine based on faiss (an open-source library for efficient similarity search and clustering of dense vectors) can be adopted to retrieve and match the candidate classification feature vectors.
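The sequential-traversal variant can be sketched as follows, here using cosine similarity; the candidate labels and 3-dimensional vectors are hypothetical, and a library such as faiss would replace the explicit loop in a large-scale deployment.

```python
import math

def cosine_similarity(vec_a, vec_b):
    # Cosine of the angle between two feature vectors; higher means more similar.
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    return dot / (norm_a * norm_b)

def match_by_traversal(target, candidates):
    """Sequential-traversal matching: compare the target feature vector with
    every candidate classification feature vector and return the label of the
    most similar candidate."""
    return max(candidates, key=lambda label: cosine_similarity(target, candidates[label]))

# Hypothetical candidate classification feature vectors per category.
candidates = {
    "golden_retriever": [0.23, 0.44, 0.67],
    "teddy": [0.23, 0.31, 0.60],
}
best = match_by_traversal([0.23, 0.44, 0.61], candidates)
```

Sequential traversal is O(Q) per query; an approximate index trades a little accuracy for much faster lookup when Q is large.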
In one embodiment, after the server determines the classification result of the image to be classified, response information for the classification request may be returned to the user terminal, where the response information may include at least one of the classification result and the associated image.
Referring to fig. 4, a flowchart of another image classification method according to an embodiment of the present invention is shown, where the method according to the embodiment of the present invention may be executed by a user terminal. The method of an embodiment of the present invention includes the following steps.
S401: and when the trigger operation aiming at the identification button is detected, calling the camera shooting assembly to acquire the current preview image.
S402: and if the current preview image is detected to be in a stable state, determining the current preview image as the image to be classified.
In one embodiment, the user terminal is installed with a target application that has an image classification function entry, and the target application may include a scan recognition module. When a user wants to identify an object, the user can input a trigger operation on an identification button provided by the scan recognition module; when the user terminal detects the trigger operation, a scan recognition operation event can be generated, and a camera component can be called based on the event to acquire a current preview image. Further, if the current preview image is detected to be in a stable state, the current preview image in the stable state may be taken as the image to be classified; alternatively, a stored image may be directly imported as the image to be classified. The target application may be, for example, a browser, an instant messaging application, a payment application, or another application having an image classification function entry.
In one embodiment, the current preview image may be acquired at preset time intervals. When determining whether the current preview image is in a stable state, the current preview image may be compared for similarity with the preview images acquired in the previous T acquisitions, and if the similarity is greater than or equal to a similarity threshold, the current preview image may be determined to be in a stable state. T is a positive integer, and the specific value of T may be adjusted according to different design requirements, which is not specifically limited in the present invention.
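The stability check above can be sketched as follows. The frame representation (a flat list of pixel values) and the pixel-match similarity measure are toy assumptions for illustration; a real implementation would use an image-level similarity.

```python
def similarity_between_frames(frame_a, frame_b):
    # Toy frame similarity for illustration: fraction of matching pixels.
    matches = sum(1 for a, b in zip(frame_a, frame_b) if a == b)
    return matches / len(frame_a)

def is_stable(current, previous_frames, threshold=0.95):
    """The preview is considered stable when the current frame is similar
    enough to each of the T previously acquired preview frames."""
    return all(similarity_between_frames(current, f) >= threshold
               for f in previous_frames)

history = [[1, 1, 1, 0], [1, 1, 1, 0]]   # T = 2 earlier preview frames
current = [1, 1, 1, 0]
```

An unchanged frame passes the check and becomes the image to be classified; a frame that differs from the recent history (e.g. while the camera is still moving) fails it.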
In one embodiment, as shown in fig. 5, the target application is a browser, and after clicking a scan identification button 501 on the browser, the user terminal enters an image acquisition interface, which includes an image display area 502 and a "+" button 503 for prompting to import an image; the user can enter an image selection and confirmation interface by clicking the button 503. When it is detected that the preview image displayed in the image display area 502 is in a stable state, the preview image in the stable state is determined as the image to be classified.
S403: and sending a classification request to a server, wherein the classification request carries the image to be classified. After obtaining the image to be classified, the user terminal may generate a classification request carrying the image to be classified, and then send the classification request to a server providing an online classification service.
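As an illustrative sketch of the request the user terminal might generate: the wire format is not specified by the embodiment, so the JSON field names and base64 encoding below are assumptions chosen for readability.

```python
import base64
import json

def build_classification_request(image_bytes):
    """Build a classification request carrying the image to be classified.
    The JSON field names here are illustrative assumptions, not mandated
    by the embodiment."""
    return json.dumps({
        "type": "classification_request",
        "image": base64.b64encode(image_bytes).decode("ascii"),
    })

payload = build_classification_request(b"\x89PNG")  # stand-in image bytes
parsed = json.loads(payload)
decoded = base64.b64decode(parsed["image"])
```

The server would decode the image field, run the classification model and feature extraction model, and answer with response information carrying the classification result.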
S404: and receiving response information returned by the server. The response information includes a classification result of the image to be classified, and the classification result can be obtained after the server classifies and determines the image to be classified according to the classification model and the feature extraction model. In an embodiment, after receiving the classification request, the server may determine a classification result of the image to be classified in response to the classification request, and return the classification result to the user terminal by carrying the classification result in the response information. Further, when the user terminal receives the response information carrying the classification result returned by the server, the classification result can be displayed on the user interface, so that the user can conveniently check the classification result. The response information may also include, in addition to the classification result, an associated image of the image to be classified or other information, which is not specifically limited in the present invention.
In one embodiment, after receiving the response information returned from the server, the user terminal may display a user interface displaying: any one or more of a classification result of the image to be classified, an associated image, and description information, where the classification result and the associated image can be carried in the response information, and the description information can be obtained by querying according to the classification result. The description information may be text information and image information associated with the classification result. For example, if the classification result shows that the object included in the image to be classified is the X-series off-road vehicle 01, then the description information may include a basic introduction of the X-series off-road vehicle 01, such as a performance list, the time to market, the price, and nearby 4S stores selling the X-series off-road vehicle 01, and may also include platform websites selling the X-series off-road vehicle 01, and the like, which is not specifically limited in the present invention.
In one embodiment, after receiving the response information, the user terminal may directly perform online search based on the classification result carried in the response information, thereby obtaining the description information of the image to be classified, and displaying the description information on the user interface. Alternatively, when a trigger operation for the user interface is received, the description information of the classification result may be obtained online or locally, and the obtained description information may be displayed on the user interface.
It should be noted that, the description information displayed on the user interface may be obtained by querying the classification result, or the response information carries the description information, and the user terminal obtains the description information from the response information, which is not limited in this invention.
Embodiments of the present invention further provide a computer storage medium, in which program instructions are stored, and when the program instructions are executed, the computer storage medium is configured to implement the corresponding method described in the above embodiments.
Referring to fig. 6, it is a schematic structural diagram of an image classification device according to an embodiment of the present invention, where the image classification device may be disposed in a server, or may also be disposed in some intelligent terminals with rich software and hardware resources, such as some personal computers.
In one implementation of the apparatus of the embodiment of the present invention, the apparatus includes the following structure.
An obtaining module 601, configured to obtain an image to be classified;
the calling module 602 is configured to call a classification model to classify the image to be classified, and determine an initial category to which the image to be classified belongs;
the calling module 602 is further configured to call a feature extraction model to determine a target feature vector of the image to be classified;
a comparison module 603, configured to compare the target feature vector of the image to be classified with the candidate classification feature vector configured for the initial class, so as to obtain a comparison result;
and the determining module 604 is configured to determine a classification result of the image to be classified according to the comparison result.
In one embodiment, the feature extraction model is formed according to a feature extraction network and a feature representation optimization module, the feature extraction network is used for extracting an initial feature vector of an image to be classified based on a neural network, the feature representation optimization module is used for optimizing the initial feature vector to obtain an N-dimensional target feature vector, and N is a positive integer.
In one embodiment, the apparatus may further include: a training module 605 and a generating module 606, wherein: the obtaining module 601 is further configured to obtain an initial feature classification model and a training image, where the initial feature classification model includes an initial feature extraction network formed based on a neural network; the training module 605 is configured to perform training optimization on the initial feature classification model according to the training image obtained by the obtaining module 601 to obtain a feature classification model, where the feature classification model includes a feature extraction network obtained after performing training optimization on the initial feature extraction network; the obtaining module 601 is further configured to obtain the feature extraction network from the feature classification model; the generating module 606 is configured to generate a feature representation optimization module based on whitening parameters; and the generating module 606 is further configured to generate a feature extraction model according to the feature extraction network and the feature representation optimization module, where the feature representation optimization module is configured to optimize a feature vector output by the feature extraction network and output the optimized feature vector.
In one embodiment, the apparatus may further include: a configuration module 607, wherein: the obtaining module 601 is further configured to obtain a candidate image belonging to the initial category; and the configuration module 607 is configured to invoke the feature extraction model to configure candidate classification feature vectors for the initial class according to the candidate image acquired by the obtaining module 601.
In an embodiment, the configuration module 607 may be specifically configured to process the candidate image through the feature extraction network in the feature extraction model to obtain an initial classification feature vector in the initial category, and optimize the initial classification feature vector in the initial category through the feature representation optimization module in the feature extraction model to obtain an N-dimensional candidate classification feature vector in the initial category.
In one embodiment, the initial category includes a first category and a second category, and the comparison module 603 may include: a calculating unit 6031, configured to perform similarity calculation on the target feature vector of the image to be classified and the candidate classification feature vector configured for the first category to obtain a first similarity, and to perform similarity calculation on the target feature vector and the candidate classification feature vector configured for the second category to obtain a second similarity; and a comparing unit 6032, configured to compare the first similarity with the second similarity to obtain a comparison result, where the comparison result indicates the larger of the first similarity and the second similarity.
In one embodiment, the determining module 604 may be specifically configured to determine that the image to be classified belongs to the first category if the comparison result indicates that the first similarity is greater than the second similarity.
In an embodiment, the first category and the second category each include candidate images associated with candidate classification feature vectors, and the determining module 604 may be further configured to determine an associated image of the image to be classified according to the first similarity if the comparison result indicates that the first similarity is greater than the second similarity, where the associated image is: the candidate image whose candidate classification feature vector has the first similarity with the target feature vector.
In the embodiment of the present invention, the specific implementation of the above modules may refer to the description of relevant contents in the embodiment corresponding to fig. 3.
Referring to fig. 7 again, it is a schematic structural diagram of a server according to an embodiment of the present invention. The server according to an embodiment of the present invention includes a power supply module and other structures, and includes a processor 701, a storage device 702, and a network interface 703. Data can be exchanged among the processor 701, the storage device 702, and the network interface 703, and the processor 701 realizes a corresponding image classification function.
The storage device 702 may include a volatile memory, such as a random-access memory (RAM); the storage device 702 may also include a non-volatile memory, such as a flash memory or a solid-state drive (SSD); the storage device 702 may also comprise a combination of the above kinds of memory.
The network interface 703 may exchange data with other servers and various user terminals. A user terminal may send a classification request carrying an image to be classified to the network interface 703, and the classification request is output by the network interface 703 to the processor 701 of the server for processing.
The processor 701 may be a central processing unit (CPU). In one embodiment, the processor 701 may also be a graphics processing unit (GPU), or a combination of a CPU and a GPU. The server may include a plurality of CPUs and GPUs as necessary to perform corresponding image processing. In one embodiment, the storage device 702 is used to store program instructions, and the processor 701 may invoke the program instructions to implement the various methods described above in the embodiments of the invention.
In a first possible implementation, the processor 701 of the server calls program instructions stored in the storage device 702 to obtain an image to be classified; call a classification model to classify the image to be classified and determine an initial class to which the image to be classified belongs; call a feature extraction model to determine a target feature vector of the image to be classified; compare the target feature vector of the image to be classified with the candidate classification feature vector configured for the initial class to obtain a comparison result; and determine the classification result of the image to be classified according to the comparison result.
In one embodiment, the feature extraction model is formed according to a feature extraction network and a feature representation optimization module, the feature extraction network is used for extracting an initial feature vector of an image to be classified based on a neural network, the feature representation optimization module is used for optimizing the initial feature vector to obtain an N-dimensional target feature vector, and N is a positive integer.
In one embodiment, the processor 701 is further configured to obtain an initial feature classification model, where the initial feature classification model includes an initial feature extraction network formed based on a neural network; acquire a training image, and train and optimize the initial feature classification model according to the training image to obtain a feature classification model, where the feature classification model includes a feature extraction network obtained after training and optimizing the initial feature extraction network; acquire the feature extraction network from the feature classification model; generate a feature representation optimization module based on whitening parameters; and generate a feature extraction model according to the feature extraction network and the feature representation optimization module, where the feature representation optimization module is used to optimize the feature vector output by the feature extraction network and output the optimized feature vector.
In an embodiment, the processor 701 is further configured to obtain a candidate image belonging to the initial category, and call the feature extraction model to configure candidate classification feature vectors for the initial class according to the candidate image.
In an embodiment, the processor 701 is further configured to process the candidate image through the feature extraction network in the feature extraction model to obtain an initial classification feature vector in the initial category, and optimize the initial classification feature vector in the initial category through the feature representation optimization module in the feature extraction model to obtain an N-dimensional candidate classification feature vector in the initial category.
In an embodiment, the initial category includes a first category and a second category, and the processor 701 is further configured to perform similarity calculation on the target feature vector of the image to be classified and the candidate classification feature vector configured for the first category to obtain a first similarity; perform similarity calculation on the target feature vector and the candidate classification feature vector configured for the second category to obtain a second similarity; and compare the first similarity with the second similarity to obtain a comparison result, where the comparison result indicates the larger of the first similarity and the second similarity.
In one embodiment, the processor 701 is further configured to determine that the image to be classified belongs to the first category if the comparison result indicates that the first similarity is greater than the second similarity.
In an embodiment, the first category and the second category each include candidate images associated with candidate classification feature vectors, and the processor 701 is further configured to determine an associated image of the image to be classified according to the first similarity if the comparison result indicates that the first similarity is greater than the second similarity, where the associated image is: the candidate image whose candidate classification feature vector has the first similarity with the target feature vector.
In the embodiment of the present invention, the processor 701 may be implemented as described with reference to the foregoing embodiment corresponding to fig. 3.
Referring to fig. 8, a schematic structural diagram of another image classification device according to an embodiment of the present invention is shown, where the image classification device may be disposed in a user terminal.
In one implementation of the apparatus of the embodiment of the present invention, the apparatus includes the following structure.
A detection module 801, configured to detect a trigger operation for an identification button;
an obtaining module 802, configured to, when the detection module detects a trigger operation for the identification button, call a camera component to acquire a current preview image;
the detection module 801 is further configured to detect whether the current preview image is in a stable state;
a determining module 803, configured to determine the current preview image as the image to be classified if the detection module detects that the current preview image is in a stable state;
a sending module 804, configured to send a classification request to a server, where the classification request carries the image to be classified;
and a receiving module 805, configured to receive response information returned by the server, where the response information includes a classification result of the image to be classified, and the classification result is obtained after the server classifies and determines the image to be classified according to a classification model and a feature extraction model.
In one embodiment, the apparatus may further comprise: a display module 806, configured to display a user interface after receiving the response information returned from the server, where the user interface displays: any one or more of the classification result of the image to be classified, the associated image, and the description information, where the classification result and the associated image are carried in the response information, and the description information is obtained by querying according to the classification result.
In the embodiment of the present invention, the specific implementation of the above modules may refer to the description of relevant contents in the embodiment corresponding to fig. 4.
Referring to fig. 9 again, it is a schematic structural diagram of a user terminal according to an embodiment of the present invention. The user terminal according to an embodiment of the present invention may include a power supply module and the like, and includes a processor 901, a storage device 902, and a transceiver 903. Data can be exchanged among the processor 901, the storage device 902, and the transceiver 903, and the processor 901 realizes a corresponding image classification function.
The storage device 902 may include a volatile memory, such as a random-access memory (RAM); the storage device 902 may also include a non-volatile memory, such as a flash memory or a solid-state drive (SSD); the storage device 902 may also comprise a combination of the above kinds of memory.
The transceiver 903 may exchange data with the server and other devices. The server may send response information carrying the classification result to the transceiver 903, and the transceiver 903 outputs the response information to the processor 901 of the user terminal for processing.
The processor 901 may be a central processing unit (CPU). In one embodiment, the processor 901 may also be a graphics processing unit (GPU), or a combination of a CPU and a GPU. The user terminal may include a plurality of CPUs and GPUs as necessary to perform corresponding image processing. In one embodiment, the storage device 902 is used to store program instructions, and the processor 901 may call the program instructions to implement the various methods described above in the embodiments of the present invention.
In a first possible implementation, the processor 901 of the user terminal calls program instructions stored in the storage device 902, so as to call a camera component to acquire a current preview image when a trigger operation for an identification button is detected; if the current preview image is detected to be in a stable state, determine the current preview image as the image to be classified; send a classification request to a server, where the classification request carries the image to be classified; and receive response information returned by the server, where the response information includes a classification result of the image to be classified, and the classification result is obtained after the server classifies and determines the image to be classified according to a classification model and a feature extraction model.
In one embodiment, the processor 901 is further configured to display a user interface after receiving the response information returned from the server, where the user interface displays: any one or more of the classification result of the image to be classified, the associated image, and the description information, where the classification result and the associated image are carried in the response information, and the description information is obtained by querying according to the classification result.
In the embodiment of the present invention, the processor 901 may be implemented as described in the foregoing embodiment corresponding to fig. 4.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the invention has been described with reference to a number of embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.