Detailed Description
In the embodiments of the invention, on the one hand, the image to be classified is classified by a classification model whose training optimization for classification has been completed, so as to determine an initial category. The classification model can be trained and optimized through a dedicated classification training scheme and configured in advance in a device with a classification function, such as a server, so as to classify images into multiple classes more effectively. On the other hand, a more precise and accurate target feature vector is extracted from the image to be classified by a feature extraction model optimized for feature extraction, and the category of the image to be classified is finally determined based on a comparison between the target feature vector and the candidate classification feature vectors of one or more candidate maps under the initial category, thereby obtaining a classification result. In this way, the classification model performs an initial classification, and the more refined feature vectors then subdivide within the initial category, so that the determined category is more accurate, the amount of data processed in the feature comparison is effectively reduced, software and hardware resources are saved, and processing efficiency is improved.
In an embodiment, for the classification model, a general classification model may be used, and images may be classified more coarsely by the general classification model, for example, the content of a certain image may be classified into categories such as cat or dog. The classification model can also adopt a fine-grained image classification model based on strong supervision information or a fine-grained image classification model based on weak supervision information. Based on the fine-grained image classification model, more detailed classification can be performed, for example, the classification can be subdivided into cat varieties, dog varieties and the like.
In a fine-grained image classification model based on strong supervision information, during model training optimization, in addition to the class labels of the images, additional manual annotation information such as object bounding boxes and part annotation points is used in order to obtain better classification accuracy, so that the classification model can achieve higher classification precision.
A fine-grained image classification model based on weak supervision information can achieve good capture of local information without relying on part annotation points. Compared with the strongly supervised approach, it has a lower implementation cost at comparable classification precision and is better suited to engineering applications.
The embodiments of the invention mainly comprise a configuration stage for the classification model and the feature extraction model, and an image classification stage based on the two models. In the configuration stage, an initial classification model and an initial feature extraction model can be established offline on a configuration server and trained and optimized with a large number of training images to obtain a classification model and a feature extraction model that classify images well. The trained and optimized classification model and feature extraction model are then loaded onto a server (or other device used for image classification) to provide an online classification service for unknown images. In the image classification stage, a user only needs to shoot an image to be classified with a user terminal, or obtain it in another way (such as downloading). The image to be classified is carried in a classification request and sent to the server for image classification; the server calls the classification model and the feature extraction model to process the image, determines its classification result, and returns response information for the classification request to the user terminal, the response information carrying the classification result of the image to be classified.
In an embodiment, fig. 1 shows a schematic diagram of a framework for classifying images according to an embodiment of the present invention. The framework includes a user side and a service side. A user terminal on the user side may be an intelligent terminal with camera and network functions, such as a smartphone or a tablet computer, or a terminal such as a personal computer. The online service on the service side can be provided by a server or server group dedicated to image classification that receives classification requests from a target application, and the offline processing service on the service side can be provided by a server or server group dedicated to designing and training the classification model and the feature extraction model. In other embodiments, the online service and the offline service on the service side may be provided by the same dedicated server or server group. In one embodiment, the target application may be an application having a scan-and-identify module.
As shown in fig. 2a, an initial classification model is shown. The initial classification model includes an Input layer, a Stem layer, Inception-ResNet layers, Reduction layers, an Average Pooling layer, a Dropout layer, and a Softmax layer. The Input layer is used for inputting images. The Inception-ResNet layers are hidden layers; the model has three of them: an Inception-ResNet-A first hidden layer, an Inception-ResNet-B second hidden layer, and an Inception-ResNet-C third hidden layer. The Stem layer is a preprocessing layer used to preprocess the data input into Inception-ResNet-A; the preprocessing may include multiple rounds of convolution and pooling. The Average Pooling layer performs dimensionality reduction on the data output by Inception-ResNet-C. The Dropout layer prevents the initial classification model from over-fitting, effectively avoiding the situation in which the model classifies training images well but performs poorly on the actual images to be classified after deployment. The Softmax layer is the classification calculation layer; its output is the probability that the image input through the Input layer belongs to each class.
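The role of the Softmax layer can be sketched with a minimal, stand-alone computation (plain Python, not the actual network): it maps the raw per-class scores produced by the preceding layers to a probability for each class. The score values below are purely illustrative.

```python
import math

def softmax(logits):
    """What the Softmax layer computes: turn the network's raw per-class
    scores into probabilities that sum to 1."""
    m = max(logits)                           # subtract the max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three classes, e.g. (cat, dog, rabbit)
probs = softmax([2.0, 1.0, 0.1])
```

The class with the highest score receives the highest probability, which is how the initial category is read off the model's output.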
In one embodiment, when configuring the classification model, training images of a plurality of image classes may be input into the initial classification model in advance using a model generation tool, and the initial classification model may thus be trained. In one embodiment, when the model generation tool used is TensorFlow (a deep learning framework used in image recognition and related fields), since backward gradients are computed automatically in TensorFlow, the parameters corresponding to each node in the initial classification model can be quickly adjusted and generated from the input training images during the training optimization process, thereby realizing the training optimization of the initial classification model and obtaining the classification model. The classes of the training images can be adjusted according to different design goals.
In one embodiment, the design goal is to build a classification model that can distinguish images of a first category (e.g., cats) from images of a second category (e.g., dogs). In this case, when training the initial classification model, M (M is a positive integer, e.g., 10000) images already determined to be of the dog category may be selected in advance as dog training images, and P (P is a positive integer, e.g., 10000) images already determined to be of the cat category may be selected as cat training images. In one embodiment, after a dog training image is input into the initial classification model, the model may extract image feature data of the image and classify it according to that feature data; if the output classification result indicates that the category of the image is also dog, the classification of that dog training image by the initial classification model is successful. Further, after the M training images labeled as the dog category have been classified, if the success rate is greater than a preset success rate threshold (e.g., 90%), it is determined that the initial classification model can classify and identify images of the dog category well; otherwise, the parameters corresponding to each node in the initial classification model can be adjusted, and the M dog training images are classified again with the adjusted model.
Similarly, the initial classification model can be trained and optimized with the P cat training images in the same manner. If the classification success rates for both the dog training images and the cat training images meet the preset success rate threshold, the training of the initial classification model is complete, and the trained initial classification model is used as the classification model in the embodiments of the invention. In other embodiments, more categories may be set, and the initial classification model is trained and optimized with a large number of training images of those different categories, so that the success rate of the finally obtained classification model on each type of image is higher than a certain success rate threshold. Categories such as the first category and the second category may be coarse categories such as cat or dog, or more detailed categories such as specific breeds of cats and/or dogs, for example fine-grained categories such as shepherd dogs or other particular breeds.
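The success-rate-driven training loop described above can be sketched as follows. The `classify` and `adjust` stand-ins below are hypothetical toy functions, not the patented model; they only illustrate the control flow of classifying a labelled training set, checking the success rate against the threshold, and adjusting parameters when it falls short.

```python
def train_until_threshold(classify, adjust, images, label,
                          threshold=0.9, max_rounds=50):
    """Classify every training image of one labelled category; if the
    success rate does not exceed the threshold, adjust the model
    parameters and classify again with the adjusted model."""
    for _ in range(max_rounds):
        correct = sum(1 for img in images if classify(img) == label)
        if correct / len(images) > threshold:
            return True
        adjust()
    return False

# Toy stand-in model: it classifies the first `acc`-fraction of the 100
# training images correctly, and each adjustment improves it slightly.
state = {"acc": 0.5}
def classify(i):
    return "dog" if i < state["acc"] * 100 else "cat"
def adjust():
    state["acc"] = min(1.0, state["acc"] + 0.1)

trained = train_until_threshold(classify, adjust, list(range(100)), "dog")
```

After a few adjustment rounds the toy model's success rate exceeds the 90% threshold and training terminates successfully.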
In one embodiment, in the training optimization process of the initial classification model, each successfully trained training image may be determined as a candidate map, a category label of a category to which the candidate map belongs is set for the candidate map according to the category, and the candidate map and the corresponding category label are stored in the database in an associated manner, so as to subsequently determine the category to which the candidate map belongs based on the category label.
In an embodiment of the present invention, the feature extraction model may be configured according to a feature extraction network and a feature representation optimization module, the feature extraction network is configured to extract an initial feature vector of an image to be classified based on a neural network, the feature representation optimization module is configured to optimize the initial feature vector to obtain an N-dimensional target feature vector, where N is a positive integer, and the feature extraction model finally outputs the N-dimensional target feature vector related to the image, for example, outputs a 2048-dimensional target feature vector.
In one embodiment, in order to generate the above feature extraction model, an initial feature classification model, which includes an initial feature extraction network constructed based on a neural network, and training images may be obtained. Further, the initial feature classification model can be trained and optimized with the obtained training images to obtain a feature classification model, which includes the feature extraction network obtained after the initial feature extraction network is trained and optimized. The feature extraction network can then be taken from the feature classification model, a feature representation optimization module is generated based on whitening parameters, and the feature extraction model is generated from the feature extraction network and the feature representation optimization module. In one embodiment, the feature representation optimization module may be implemented based on R-MAC (an image feature extraction method).
In one embodiment, the feature representation optimization module may be obtained by performing training optimization on whitening parameters based on training images, where the whitening parameters consist of a matrix w and a bias matrix b, and the module optimizes the feature vectors output by the feature extraction network and outputs the optimized feature vectors. The processing performed by the feature representation optimization module may be a whitening process. In one embodiment, the training optimization process of the whitening parameters includes: calling the feature extraction network to extract feature vectors from a plurality of training images, and computing the matrix w and the bias matrix b from the obtained feature vectors, thereby obtaining the feature representation optimization module, which is generated based on w and b. For example, with one million training samples, that is, one million training images, a 2048-dimensional feature representation training matrix x is obtained for each training image; a 2048 × 2048 matrix w and a 2048-dimensional bias matrix b need to be calculated to whiten x, so that in the whitened matrix the dimension data are uncorrelated, that is, knowing the value of one dimension does not help to guess or calculate the value of another dimension. Specifically, as shown in fig. 2d, the image on the left of fig. 2d plots the one million 2048-dimensional vectors; it can be seen that the dimension data on the left are relatively dense and a certain correlation exists between their values. The matrix w and the bias matrix b are computed through mathematical calculation, and the image on the right is finally obtained from the left-hand vectors after applying the formula x_w = x·w + b; in the right-hand image, the correlation between the dimension data is low. In theory, w and b can be computed directly, but because the number of samples is usually too large, an approximate w and b are reached through training, such that the input vectors are converted from the left-hand image to the right-hand image after passing through w and b. Here 2048 is the number of dimensions that the feature representation optimization module needs to output; if other dimensionalities are needed, for example 4096-dimensional or even higher-dimensional output, the value 2048 is adjusted to 4096 or more when training w and b to generate the feature representation optimization module.
In one embodiment, the feature representation optimization module is generated according to the conversion formula x_w = x·w + b. That is, after an initial matrix x is input into the feature representation optimization module, the module outputs the optimized matrix x_w according to x_w = x·w + b. Similarly, when the feature extraction network inputs a feature vector into the feature representation optimization module, the module outputs the optimized feature vector in the same way. With this optimization processing, the correlation between the features represented by the feature values in the optimized feature vector is low, for example lower than a certain correlation threshold, and the features have the same or similar variances.
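As a minimal sketch of the whitening transform x_w = x·w + b, the fragment below computes w and b for two-dimensional toy data (the 2048-dimensional case in the document is analogous but needs a full eigendecomposition library). Here w is taken as the inverse square root of the data covariance, one standard choice that makes the whitened dimensions uncorrelated with equal variance; the function names and the toy data are illustrative, not from the source.

```python
import math

def whiten_params(data):
    """Compute a 2x2 whitening matrix w and bias b so that x_w = x.w + b
    has zero mean and identity covariance (w = covariance^(-1/2))."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    cxx = sum((p[0] - mx) ** 2 for p in data) / n
    cyy = sum((p[1] - my) ** 2 for p in data) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in data) / n
    # eigendecomposition of the symmetric 2x2 covariance matrix
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    c, s = math.cos(theta), math.sin(theta)
    l1 = cxx * c * c + 2 * cxy * c * s + cyy * s * s
    l2 = cxx * s * s - 2 * cxy * c * s + cyy * c * c
    d1, d2 = 1 / math.sqrt(l1), 1 / math.sqrt(l2)
    # w = V . diag(1/sqrt(eigenvalue)) . V^T
    w = [[c * c * d1 + s * s * d2, c * s * (d1 - d2)],
         [c * s * (d1 - d2), s * s * d1 + c * c * d2]]
    b = [-(mx * w[0][0] + my * w[1][0]),
         -(mx * w[0][1] + my * w[1][1])]
    return w, b

def whiten_apply(data, w, b):
    """Apply x_w = x.w + b to every row vector x in the data."""
    return [(p[0] * w[0][0] + p[1] * w[1][0] + b[0],
             p[0] * w[0][1] + p[1] * w[1][1] + b[1]) for p in data]

# Correlated toy data: y roughly follows 0.9*x plus a varying term
data = [(i / 10.0, 0.9 * (i / 10.0) + ((i * 37) % 10) / 10.0)
        for i in range(100)]
w, b = whiten_params(data)
out = whiten_apply(data, w, b)
```

After the transform, the covariance of `out` is the identity matrix: the two dimensions are uncorrelated with equal variance, which is exactly the property the module is trained to approximate for 2048-dimensional feature vectors.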
In one embodiment, the feature classification model and the feature representation optimization module may be generated by training the initial feature classification model and the whitening parameters using a model generation tool. Specifically, the model generation tool used can also be TensorFlow; since backward gradients are computed automatically in TensorFlow, the feature classification model and the feature representation optimization module can be obtained relatively quickly.
As shown in fig. 2b, an initial feature classification model is shown, which includes an initial feature extraction network constructed based on a neural network; the network may employ a convolutional neural network. As can be seen from the figure, the initial feature extraction network includes three convolutional layers: a first convolutional layer cfgl[0] block, a second convolutional layer cfgl[1] block, and a third convolutional layer cfgl[2] block. Other embodiments may include more convolutional layers, such as a fourth convolutional layer cfgl[3] block after the cfgl[2] block, and so on. In the training optimization process, each convolutional layer in the initial feature extraction network performs convolution processing on an input training image and outputs a feature vector related to that image (i.e., the initial classification feature vector of the training image). In fig. 2b, the cfgl[0] block and cfgl[1] block are convolutional layers of the first type: the cfgl[0] block passes its convolved data to the cfgl[1] block, and the cfgl[1] block passes its convolved data to the cfgl[2] block. The cfgl[2] block shown in fig. 2b is a convolutional layer of the second type: it outputs its convolved data (i.e., the initial classification feature vector of the training image) so that subsequent network layers can operate on the initial classification feature vector to determine the category of the image. After the initial feature classification model has been trained to obtain a feature classification model, the network layers used to calculate classification feature vectors are extracted from the trained feature classification model and used as the feature extraction network, for example the cfgl[0] block, cfgl[1] block, cfgl[2] block, and cfgl[3] block.
In the embodiments of the present invention, after the initial feature classification model shown in fig. 2b has been trained and optimized to obtain the feature classification model, the trained and optimized initial feature extraction network can be obtained from the feature classification model (that is, the feature extraction network is obtained from the feature classification model), and a connection relationship between the feature extraction network and the feature representation optimization module is established as shown in fig. 2c. As can be seen from the figure, after an image is input into the feature extraction network, each convolutional layer (cfgl[0] block, cfgl[1] block, and cfgl[2] block) performs convolution processing on the input image so as to determine its initial classification feature vector, and the last convolutional layer, the cfgl[2] block, sends that vector to the feature representation optimization module, which optimizes it and determines and outputs the candidate classification feature vector of the input image. The optimization process may produce a multidimensional vector representation of the initial classification feature vector, for example a 2048-dimensional representation; that is, the optimized feature vector (i.e., the candidate classification feature vector) may be a 2048-dimensional vector, whose feature data may take a form such as (0.1, 0.11, 0.15, …, 0.16). It is understood that vectors of other dimensions can be obtained with different feature representation optimization modules, and the higher the dimension of the vector used, the more accurate the classification result for the input image.
In the embodiments of the invention, the classification model can be called to perform initial classification on various training images, and the successfully classified training images are stored in a database as candidate maps. Furthermore, the candidate maps can be used as input to the feature extraction network: the feature extraction model is called to calculate the candidate classification feature vector of each candidate map, and each candidate map is stored in association with its candidate classification feature vector. In this way, when a classification request about an image is later received from a user, the classification result of the image to be classified can be determined directly from the stored candidate classification feature vectors and returned to the user terminal, without calling the feature extraction model to compute candidate classification feature vectors for a massive number of candidate maps after the classification request arrives, which saves a large amount of calculation and query time.
In an embodiment, when a candidate map and its candidate classification feature vector are stored in association, a mapping relationship list of candidate maps and candidate classification feature vectors may be established, where column 1 is the storage address of the candidate map and column 2 is the candidate classification feature vector. Given a candidate classification feature vector, the corresponding candidate map can be quickly found from the mapping relationship list, and vice versa.
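The two-column mapping relationship list can be sketched as a pair of dictionaries supporting lookup in both directions; the class name, the storage address, and the vector values used here are hypothetical stand-ins.

```python
class CandidateStore:
    """Column 1: storage address of the candidate map.
    Column 2: its candidate classification feature vector.
    Two dictionaries allow fast lookup in either direction."""

    def __init__(self):
        self.addr_to_vec = {}
        self.vec_to_addr = {}

    def add(self, address, vector):
        vec_key = tuple(vector)          # tuples are hashable dict keys
        self.addr_to_vec[address] = vec_key
        self.vec_to_addr[vec_key] = address

    def vector_of(self, address):
        return self.addr_to_vec[address]

    def address_of(self, vector):
        return self.vec_to_addr[tuple(vector)]

store = CandidateStore()
store.add("/imgs/cat_0001.jpg", [0.1, 0.11, 0.15, 0.16])
```

Once a matching candidate classification feature vector is found during comparison, `address_of` returns the candidate map it belongs to, and `vector_of` covers the reverse direction.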
After the training of the classification model and the feature extraction model and the optimization of the related parameters have been completed with a large number of training images, the trained and optimized classification model and feature extraction model are configured into the corresponding servers, and a large number of candidate maps and candidate classification feature vectors are stored in those servers in an associated manner, so as to provide an online image classification service for users.
In one embodiment, when a large number of candidate maps and candidate classification feature vectors are stored in association in the corresponding server, different candidate classification feature vectors may be configured for different categories according to the category to which each candidate map belongs. Specifically, when L (L is a positive integer) candidate maps are stored under a certain category, L candidate classification feature vectors may be configured for that category directly, that is, each candidate map corresponds to one candidate classification feature vector. Alternatively, similarity calculation may be performed on the candidate classification feature vectors of the candidate maps in the category, and candidate classification feature vectors whose mutual similarity reaches a first similarity threshold (that is, highly similar vectors) are classified as feature vectors of the same group, so that the candidate maps corresponding to the feature vectors of one group are all characterized by the same candidate classification feature vector; fewer than L candidate classification feature vectors are then configured for the category. With this method, the number of candidate classification feature vectors configured for each category can be reduced while the image classification accuracy is maintained, further improving the operation rate.
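Under the reading that highly similar vectors are merged, the reduction from L vectors to fewer representatives can be sketched with a greedy grouping by cosine similarity; the threshold value, the function names, and the toy vectors are illustrative assumptions, not from the source.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def deduplicate(vectors, threshold=0.95):
    """Greedily merge near-duplicate candidate vectors: a vector whose
    similarity to an existing representative reaches the threshold reuses
    that representative, so fewer vectors are stored than candidate maps."""
    reps = []         # the representative vectors actually kept
    assignment = []   # index of the representative for each input vector
    for v in vectors:
        for i, r in enumerate(reps):
            if cosine(v, r) >= threshold:
                assignment.append(i)
                break
        else:
            reps.append(v)
            assignment.append(len(reps) - 1)
    return reps, assignment

# Three candidate vectors, the first two nearly identical
reps, assignment = deduplicate([[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]])
```

The first two vectors share one representative, so only two candidate classification feature vectors are configured for three candidate maps.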
In an embodiment, the candidate maps may be the various training images used above, or images of the corresponding types acquired by a user through downloading or shooting. These images have known categories, and candidate classification feature vectors are extracted from them by the above feature extraction model; both the images and the corresponding candidate classification feature vectors may be stored in the database to facilitate subsequent lookup.
It is to be understood that the classification model and the feature extraction model may be trained and configured by a server in the embodiments of the present invention, and may also be implemented by a powerful personal computer with rich software and hardware resources in other embodiments, which is not limited in this respect.
Referring to fig. 3, it is a schematic flow chart of image classification according to an embodiment of the present invention, and the method according to an embodiment of the present invention may be executed by a server or a server group. The method of an embodiment of the present invention includes the following steps.
S301: and acquiring an image to be classified. In specific implementation, the server may receive a classification request sent by the user terminal, and obtain the carried image to be classified from the classification request. The image to be classified may be obtained by a user using a shooting module of the user terminal, or may be obtained by other methods, which is not specifically limited in the present invention.
S302: and calling a classification model to classify the images to be classified, and determining the initial class to which the images to be classified belong. In an embodiment, the classification model may be obtained by a user through training and optimization based on an initial classification model, where the classification model has a characteristic of high accuracy of TOP5, that is, the classification model may more accurately determine a category (i.e., an initial category) to which an image to be classified belongs, for example, when an object corresponding to the image to be classified is a bosch cat, the classification model may more accurately determine that the initial category to which the image to be classified belongs is a cat rather than a dog.
In one embodiment, the classification model is derived from an initial classification model through a number of training optimizations. As shown in fig. 2a, the initial classification model may be trained and optimized to obtain a classification model, which may refer to the above description, and is not described herein again.
In an embodiment, after the image to be classified is input into the classification model, the classification model may extract feature data of the image to be classified, and calculate probabilities that the image to be classified belongs to each candidate category according to the feature data, and further may determine K (K is a positive integer) candidate categories with the top probabilities as initial categories to which the image to be classified belongs.
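Selecting the K candidate categories with the highest probabilities can be sketched as follows; the category names and probability values are illustrative stand-ins for the classification model's output.

```python
def top_k_categories(probabilities, k):
    """Return the K candidate categories with the highest predicted
    probability, as (category, probability) pairs in descending order."""
    return sorted(probabilities.items(),
                  key=lambda kv: kv[1], reverse=True)[:k]

# Hypothetical per-category probabilities from the classification model
probs = {"cat": 0.62, "dog": 0.30, "rabbit": 0.08}
top2 = top_k_categories(probs, 2)
```

With K = 2, the initial categories for the subsequent feature comparison would be cat and dog.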
S303: and calling a feature extraction model to determine a target feature vector of the image to be classified. The feature extraction model can be formed according to a feature extraction network and a feature representation optimization module, the feature extraction network is used for extracting initial feature vectors of the images to be classified based on a neural network, the feature representation module is used for optimizing the initial feature vectors to obtain N-dimensional target feature vectors, and N is a positive integer.
In one embodiment, in order to generate the above feature extraction model, an initial feature classification model, which includes an initial feature extraction network constructed based on a neural network, and training images may be obtained. Further, the initial feature classification model can be trained and optimized with the obtained training images to obtain a feature classification model, which includes the feature extraction network obtained after the initial feature extraction network is trained and optimized; the feature extraction network can then be taken from the feature classification model, and the feature extraction model is generated from the feature extraction network and a pre-generated feature representation optimization module. The feature representation optimization module may be obtained by performing training optimization on whitening parameters based on training images, and may be configured to optimize the feature vectors output by the feature extraction network and output the optimized feature vectors.
In one embodiment, the connection relationship between the feature extraction network and the feature representation optimization module can be as shown in fig. 2c. The feature extraction network can be built by a development user based on a neural network and can comprise convolutional layers of a first type and a convolutional layer of a second type, where a convolutional layer of the first type outputs its convolved data to another convolutional layer in the feature extraction network, and the convolutional layer of the second type outputs its convolved data to the feature representation optimization module. After the image to be classified is input, the feature extraction network is called first to process it and obtain an initial feature vector related to the image; the initial feature vector is passed to the feature representation optimization module through the convolutional layer of the second type, and the feature representation optimization module is called to optimize it into an N-dimensional target feature vector related to the image to be classified, such as a 2048-dimensional vector. Such a vector represents the image to be classified to a certain extent; it is the image as understood by a computer and is not easy for a user to understand or visualize.
S304: and comparing the target characteristic vector of the image to be classified with the candidate classification characteristic vector configured for the initial class to obtain a comparison result. The comparison may be performed by calculating a similarity between each vector in the target feature vector and the candidate classification feature vector, for example, calculating a hamming distance or a euclidean distance of the similarity between each vector in each dimension. The higher the similarity determined finally is, the higher the probability that the image to be classified belongs to the category to which the corresponding candidate classification feature vector belongs is.
S305: and determining a classification result of the image to be classified according to the comparison result.
In one embodiment, the initial category may include at least one candidate graph, and the server or the server group may obtain the candidate graphs belonging to the initial category and invoke the feature extraction model to configure candidate classification feature vectors for the initial category according to the candidate graphs.
In an embodiment, the candidate classification feature vector configured for the initial class may be obtained by processing the candidate image through the feature extraction network in the feature extraction model to obtain an initial classification feature vector in the initial class, and optimizing the initial classification feature vector in the initial class through the feature representation optimization module in the feature extraction model. The candidate classification feature vector may be an N-dimensional feature vector, for example, a 2048-dimensional vector.
Q candidate images (one candidate classification feature vector for each candidate image) may be included under the initial category, where Q is a positive integer. In this situation, when the feature extraction model is called to configure candidate classification feature vectors for the initial category, Q candidate classification feature vectors may be configured directly, that is, each candidate image corresponds to one candidate classification feature vector. Alternatively, similarity calculation may be performed among the candidate classification feature vectors of the candidate images in the initial category, the candidate classification feature vectors whose mutual similarity reaches a first similarity threshold may be classified into the same group, and the candidate images corresponding to the candidate classification feature vectors in the same group may be characterized by a single shared candidate classification feature vector, that is, a plurality of candidate images correspond to the same candidate classification feature vector, so that fewer than Q candidate classification feature vectors are configured for the initial category. In this way, the number of candidate classification feature vectors configured for the initial category can be reduced while the image classification precision is ensured, which reduces the amount of calculation and improves the operation rate.
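The grouping step above can be sketched as a greedy deduplication, shown here under the assumption that similarity is derived from euclidean distance and that "reaching the threshold" means the similarity score is at least the threshold; the vectors, threshold value, and greedy strategy are illustrative, not part of the claimed embodiment.

```python
import math

def similarity(vec_a, vec_b):
    # Map euclidean distance into (0, 1]; higher means more similar.
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))
    return 1.0 / (1.0 + dist)

def deduplicate_candidates(vectors, threshold):
    """Greedy grouping: a vector joins the first group whose representative
    it is sufficiently similar to; otherwise it starts a new group.
    Returns one representative candidate classification feature vector per group."""
    representatives = []
    for vec in vectors:
        if not any(similarity(vec, rep) >= threshold for rep in representatives):
            representatives.append(vec)
    return representatives

# Hypothetical candidate classification feature vectors (Q = 3).
candidates = [[0.2, 0.4], [0.21, 0.41], [0.9, 0.1]]
reps = deduplicate_candidates(candidates, threshold=0.9)
```

Here the first two nearly identical vectors collapse into one group, so only two representatives remain for the three candidate images, reducing the later comparison cost.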
In one embodiment, the initial category may include a first category and a second category, and the server may perform similarity calculation on the target feature vector of the image to be classified and the candidate classification feature vector configured for the first category to obtain a first similarity; perform similarity calculation on the target feature vector and the candidate classification feature vector configured for the second category to obtain a second similarity; and compare the first similarity with the second similarity to obtain a comparison result, where the comparison result indicates the larger of the first similarity and the second similarity. Further, if the comparison result indicates that the first similarity is greater than the second similarity, it may be determined that the image to be classified belongs to the first category. The initial category may also include a third category, a fourth category, or other categories, which is not specifically limited in the present invention.
For example, assume the target feature vector of the image to be classified is (0.23, 0.44, …, 0.61), and the initial class is a dog class, in which the first category is Golden Retriever and the second category is Teddy. The candidate classification feature vector corresponding to the Golden Retriever is (0.23, 0.44, …, 0.67), and the candidate classification feature vector corresponding to the Teddy is (0.23, 0.31, …, 0.60). The server performs similarity calculation on the target feature vector and the candidate classification feature vector corresponding to the Golden Retriever to obtain the first similarity, and performs similarity calculation on the target feature vector and the candidate classification feature vector corresponding to the Teddy to obtain the second similarity. If it is determined that the first similarity is greater than the second similarity, it can be determined that the image to be classified belongs to the Golden Retriever category.
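This worked example can be reproduced numerically. The sketch below uses hypothetical 3-dimensional stand-ins for the elided N-dimensional vectors (only the dimensions shown in the text are real), and assumes a distance-derived similarity; the actual similarity measure is not fixed by the embodiment.

```python
import math

def similarity(vec_a, vec_b):
    # Map euclidean distance into (0, 1]; higher means more similar.
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))
    return 1.0 / (1.0 + dist)

# Illustrative 3-dimensional stand-ins for the elided N-dimensional vectors.
target = [0.23, 0.44, 0.61]
golden_retriever = [0.23, 0.44, 0.67]
teddy = [0.23, 0.31, 0.60]

first_similarity = similarity(target, golden_retriever)
second_similarity = similarity(target, teddy)
result = "Golden Retriever" if first_similarity > second_similarity else "Teddy"
```

With these values the target vector is much closer to the Golden Retriever candidate than to the Teddy candidate, so the first similarity wins.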
In an embodiment, the first category and the second category may each include candidate images associated with candidate classification feature vectors, and if the comparison result indicates that the first similarity is greater than the second similarity, an associated image of the image to be classified may be determined according to the first similarity, where the associated image is: the candidate image whose candidate classification feature vector has the first similarity with the target feature vector. For example, if the candidate images whose candidate classification feature vectors have the first similarity with the target feature vector are image JPG1 and image JPG2, then both image JPG1 and image JPG2 may be determined as associated images of the image to be classified.
In one embodiment, when similarity calculation is performed on the target feature vector of the image to be classified and the candidate classification feature vectors configured for the first category or the second category, the euclidean distance or the cosine similarity between the target feature vector and each candidate classification feature vector in the first category or the second category may be calculated through a matching algorithm, so as to obtain the similarity between the target feature vector and the candidate classification feature vector corresponding to each candidate image. When the similarity is calculated through a matching algorithm, matching can be performed by sequential traversal and comparison; alternatively, a search engine based on faiss (an open-source library for efficient similarity search and clustering of dense vectors) can be adopted to retrieve and match the candidate classification feature vectors.
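The sequential-traversal variant can be sketched as follows, here using cosine similarity; the candidate labels and 3-dimensional vectors are hypothetical, and a library such as faiss would replace the explicit loop in a large-scale deployment.

```python
import math

def cosine_similarity(vec_a, vec_b):
    # Cosine of the angle between two feature vectors; higher means more similar.
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    return dot / (norm_a * norm_b)

def match_by_traversal(target, candidates):
    """Sequential-traversal matching: compare the target feature vector with
    every candidate classification feature vector and return the label of the
    most similar candidate."""
    return max(candidates, key=lambda label: cosine_similarity(target, candidates[label]))

# Hypothetical candidate classification feature vectors per category.
candidates = {
    "golden_retriever": [0.23, 0.44, 0.67],
    "teddy": [0.23, 0.31, 0.60],
}
best = match_by_traversal([0.23, 0.44, 0.61], candidates)
```

Sequential traversal is O(Q) per query; an approximate index trades a little accuracy for much faster lookup when Q is large.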
In one embodiment, after the server determines the classification result of the image to be classified, response information for the classification request may be returned to the user terminal, where the response information may include at least one of the classification result and the associated image.
Referring to fig. 4, a flowchart of another image classification method according to an embodiment of the present invention is shown, where the method according to the embodiment of the present invention may be executed by a user terminal. The method of an embodiment of the present invention includes the following steps.
S401: and when the trigger operation aiming at the identification button is detected, calling the camera shooting assembly to acquire the current preview image.
S402: and if the current preview image is detected to be in a stable state, determining the current preview image as the image to be classified.
In one embodiment, the user terminal is installed with a target application that has an image classification function entry, and the target application may include a scan recognition module. When a user wants to identify an object, the user can input a trigger operation on an identification button provided by the scan recognition module; when the user terminal detects the trigger operation, a scan recognition operation event can be generated, and a camera component can be called based on the event to acquire a current preview image. Further, if the current preview image is detected to be in a stable state, the current preview image in the stable state may be taken as the image to be classified; alternatively, a stored image may be directly imported as the image to be classified. The target application may be, for example, a browser, an instant messaging application, a payment application, or another application having an image classification function entry.
In one embodiment, the current preview image may be acquired at preset time intervals. When determining whether the current preview image is in a stable state, the current preview image may be compared for similarity with the preview images acquired in the previous T acquisitions, and if the similarity is greater than or equal to a similarity threshold, the current preview image may be determined to be in a stable state. T is a positive integer, and the specific value of T may be adjusted according to different design requirements, which is not specifically limited in the present invention.
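The stability check above can be sketched as follows. The frame representation (a flat list of pixel values) and the pixel-match similarity measure are toy assumptions for illustration; a real implementation would use an image-level similarity.

```python
def similarity_between_frames(frame_a, frame_b):
    # Toy frame similarity for illustration: fraction of matching pixels.
    matches = sum(1 for a, b in zip(frame_a, frame_b) if a == b)
    return matches / len(frame_a)

def is_stable(current, previous_frames, threshold=0.95):
    """The preview is considered stable when the current frame is similar
    enough to each of the T previously acquired preview frames."""
    return all(similarity_between_frames(current, f) >= threshold
               for f in previous_frames)

history = [[1, 1, 1, 0], [1, 1, 1, 0]]   # T = 2 earlier preview frames
current = [1, 1, 1, 0]
```

An unchanged frame passes the check and becomes the image to be classified; a frame that differs from the recent history (e.g. while the camera is still moving) fails it.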
In one embodiment, as shown in fig. 5, the target application is a browser, and after clicking a scan identification button 501 on the browser, the user terminal enters an image acquisition interface, which includes an image display area 502 and a "+" button 503 for prompting to import an image; the user can enter an image selection and confirmation interface by clicking the button 503. When it is detected that the preview image displayed in the image display area 502 is in a stable state, the preview image in the stable state is determined as the image to be classified.
S403: and sending a classification request to a server, wherein the classification request carries the image to be classified. After obtaining the image to be classified, the user terminal may generate a classification request carrying the image to be classified, and then send the classification request to a server providing an online classification service.
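As an illustrative sketch of the request the user terminal might generate: the wire format is not specified by the embodiment, so the JSON field names and base64 encoding below are assumptions chosen for readability.

```python
import base64
import json

def build_classification_request(image_bytes):
    """Build a classification request carrying the image to be classified.
    The JSON field names here are illustrative assumptions, not mandated
    by the embodiment."""
    return json.dumps({
        "type": "classification_request",
        "image": base64.b64encode(image_bytes).decode("ascii"),
    })

payload = build_classification_request(b"\x89PNG")  # stand-in image bytes
parsed = json.loads(payload)
decoded = base64.b64decode(parsed["image"])
```

The server would decode the image field, run the classification model and feature extraction model, and answer with response information carrying the classification result.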
S404: and receiving response information returned by the server. The response information includes a classification result of the image to be classified, and the classification result can be obtained after the server classifies and determines the image to be classified according to the classification model and the feature extraction model. In an embodiment, after receiving the classification request, the server may determine a classification result of the image to be classified in response to the classification request, and return the classification result to the user terminal by carrying the classification result in the response information. Further, when the user terminal receives the response information carrying the classification result returned by the server, the classification result can be displayed on the user interface, so that the user can conveniently check the classification result. The response information may also include, in addition to the classification result, an associated image of the image to be classified or other information, which is not specifically limited in the present invention.
In one embodiment, after receiving the response information returned from the server, the user terminal may display a user interface displaying: any one or more of a classification result of the image to be classified, an associated image, and description information, where the classification result and the associated image can be carried in the response information, and the description information can be obtained by querying according to the classification result. The description information may be text information and image information associated with the classification result. For example, if the classification result shows that the object included in the image to be classified is the X-series off-road vehicle 01, then the description information may include a basic introduction of the X-series off-road vehicle 01, such as a performance list, the time to market, the price, and nearby 4S stores selling the X-series off-road vehicle 01, and may also include platform websites selling the X-series off-road vehicle 01, and the like, which is not specifically limited in the present invention.
In one embodiment, after receiving the response information, the user terminal may directly perform online search based on the classification result carried in the response information, thereby obtaining the description information of the image to be classified, and displaying the description information on the user interface. Alternatively, when a trigger operation for the user interface is received, the description information of the classification result may be obtained online or locally, and the obtained description information may be displayed on the user interface.
It should be noted that, the description information displayed on the user interface may be obtained by querying the classification result, or the response information carries the description information, and the user terminal obtains the description information from the response information, which is not limited in this invention.
Embodiments of the present invention further provide a computer storage medium, in which program instructions are stored, and when the program instructions are executed, the computer storage medium is configured to implement the corresponding method described in the above embodiments.
Referring to fig. 6, it is a schematic structural diagram of an image classification device according to an embodiment of the present invention, where the image classification device may be disposed in a server, or may also be disposed in some intelligent terminals with rich software and hardware resources, such as some personal computers.
In one implementation of the apparatus of the embodiment of the present invention, the apparatus includes the following structure.
An obtaining module 601, configured to obtain an image to be classified;
the calling module 602 is configured to call a classification model to classify the image to be classified, and determine an initial category to which the image to be classified belongs;
the calling module 602 is further configured to call a feature extraction model to determine a target feature vector of the image to be classified;
a comparison module 603, configured to compare the target feature vector of the image to be classified with the candidate classification feature vector configured for the initial class, so as to obtain a comparison result;
and the determining module 604 is configured to determine a classification result of the image to be classified according to the comparison result.
In one embodiment, the feature extraction model is formed according to a feature extraction network and a feature representation optimization module, the feature extraction network is used for extracting an initial feature vector of an image to be classified based on a neural network, the feature representation optimization module is used for optimizing the initial feature vector to obtain an N-dimensional target feature vector, and N is a positive integer.
In one embodiment, the apparatus may further include: a training module 605 and a generating module 606, wherein: the obtaining module 601 is further configured to obtain an initial feature classification model and a training image, where the initial feature classification model includes an initial feature extraction network formed based on a neural network; the training module 605 is configured to perform training optimization on the initial feature classification model according to the training image obtained by the obtaining module 601 to obtain a feature classification model, where the feature classification model includes a feature extraction network obtained after performing training optimization on the initial feature extraction network; the obtaining module 601 is further configured to obtain the feature extraction network from the feature classification model; the generating module 606 is configured to generate a feature representation optimization module based on whitening parameters; and the generating module 606 is further configured to generate a feature extraction model according to the feature extraction network and the feature representation optimization module, where the feature representation optimization module is configured to optimize a feature vector output by the feature extraction network and output the optimized feature vector.
In one embodiment, the apparatus may further include: a configuration module 607, wherein: the obtaining module 601 is further configured to obtain a candidate image belonging to the initial category; and the configuration module 607 is configured to invoke the feature extraction model to configure candidate classification feature vectors for the initial class according to the candidate image acquired by the obtaining module 601.
In an embodiment, the configuration module 607 may be specifically configured to process the candidate image through the feature extraction network in the feature extraction model to obtain an initial classification feature vector in the initial category, and optimize the initial classification feature vector in the initial category through the feature representation optimization module in the feature extraction model to obtain an N-dimensional candidate classification feature vector in the initial category.
In one embodiment, the initial category includes a first category and a second category, and the comparison module 603 may include: a calculating unit 6031, configured to perform similarity calculation on the target feature vector of the image to be classified and the candidate classification feature vector configured for the first category to obtain a first similarity, and to perform similarity calculation on the target feature vector and the candidate classification feature vector configured for the second category to obtain a second similarity; and a comparing unit 6032, configured to compare the first similarity with the second similarity to obtain a comparison result, where the comparison result indicates the larger of the first similarity and the second similarity.
In one embodiment, the determining module 604 may be specifically configured to determine that the image to be classified belongs to the first category if the comparison result indicates that the first similarity is greater than the second similarity.
In an embodiment, the first category and the second category each include candidate images associated with candidate classification feature vectors, and the determining module 604 may be further configured to determine an associated image of the image to be classified according to the first similarity if the comparison result indicates that the first similarity is greater than the second similarity, where the associated image is: the candidate image whose candidate classification feature vector has the first similarity with the target feature vector.
In the embodiment of the present invention, the specific implementation of the above modules may refer to the description of relevant contents in the embodiment corresponding to fig. 3.
Referring to fig. 7 again, it is a schematic structural diagram of a server according to an embodiment of the present invention. The server according to an embodiment of the present invention includes a power supply module and other structures, and includes a processor 701, a storage device 702, and a network interface 703. Data can be exchanged among the processor 701, the storage device 702, and the network interface 703, and the processor 701 realizes a corresponding image classification function.
The storage device 702 may include a volatile memory, such as a random-access memory (RAM); the storage device 702 may also include a non-volatile memory, such as a flash memory or a solid-state drive (SSD); the storage device 702 may also comprise a combination of the above kinds of memory.
The network interface 703 may exchange data with other servers and various user terminals. A user terminal may send a classification request carrying an image to be classified to the network interface 703, and the classification request is output by the network interface 703 to the processor 701 of the server for processing.
The processor 701 may be a central processing unit (CPU). In one embodiment, the processor 701 may also be a graphics processing unit (GPU), or a combination of a CPU and a GPU. The server may include a plurality of CPUs and GPUs as necessary to perform corresponding image processing. In one embodiment, the storage device 702 is used to store program instructions, and the processor 701 may invoke the program instructions to implement the various methods described above in the embodiments of the invention.
In a first possible implementation, the processor 701 of the server calls program instructions stored in the storage device 702 to obtain an image to be classified; call a classification model to classify the image to be classified and determine an initial class to which the image to be classified belongs; call a feature extraction model to determine a target feature vector of the image to be classified; compare the target feature vector of the image to be classified with the candidate classification feature vector configured for the initial class to obtain a comparison result; and determine the classification result of the image to be classified according to the comparison result.
In one embodiment, the feature extraction model is formed according to a feature extraction network and a feature representation optimization module, the feature extraction network is used for extracting an initial feature vector of an image to be classified based on a neural network, the feature representation optimization module is used for optimizing the initial feature vector to obtain an N-dimensional target feature vector, and N is a positive integer.
In one embodiment, the processor 701 is further configured to obtain an initial feature classification model, where the initial feature classification model includes an initial feature extraction network formed based on a neural network; acquire a training image, and train and optimize the initial feature classification model according to the training image to obtain a feature classification model, where the feature classification model includes a feature extraction network obtained after training and optimizing the initial feature extraction network; acquire the feature extraction network from the feature classification model; generate a feature representation optimization module based on whitening parameters; and generate a feature extraction model according to the feature extraction network and the feature representation optimization module, where the feature representation optimization module is used to optimize the feature vector output by the feature extraction network and output the optimized feature vector.
In an embodiment, the processor 701 is further configured to obtain a candidate image belonging to the initial category, and call the feature extraction model to configure candidate classification feature vectors for the initial class according to the candidate image.
In an embodiment, the processor 701 is further configured to process the candidate image through the feature extraction network in the feature extraction model to obtain an initial classification feature vector in the initial category, and optimize the initial classification feature vector in the initial category through the feature representation optimization module in the feature extraction model to obtain an N-dimensional candidate classification feature vector in the initial category.
In an embodiment, the initial category includes a first category and a second category, and the processor 701 is further configured to perform similarity calculation on the target feature vector of the image to be classified and the candidate classification feature vector configured for the first category to obtain a first similarity; perform similarity calculation on the target feature vector and the candidate classification feature vector configured for the second category to obtain a second similarity; and compare the first similarity with the second similarity to obtain a comparison result, where the comparison result indicates the larger of the first similarity and the second similarity.
In one embodiment, the processor 701 is further configured to determine that the image to be classified belongs to the first category if the comparison result indicates that the first similarity is greater than the second similarity.
In an embodiment, the first category and the second category each include candidate images associated with candidate classification feature vectors, and the processor 701 is further configured to determine an associated image of the image to be classified according to the first similarity if the comparison result indicates that the first similarity is greater than the second similarity, where the associated image is: the candidate image whose candidate classification feature vector has the first similarity with the target feature vector.
In the embodiment of the present invention, the processor 701 may be implemented as described with reference to the foregoing embodiment corresponding to fig. 3.
Referring to fig. 8, a schematic structural diagram of another image classification device according to an embodiment of the present invention is shown, where the image classification device may be disposed in a user terminal.
In one implementation of the apparatus of the embodiment of the present invention, the apparatus includes the following structure.
A detection module 801, configured to detect a trigger operation for an identification button;
an obtaining module 802, configured to, when the detection module detects a trigger operation for the identification button, call a camera component to acquire a current preview image;
the detection module 801 is further configured to detect whether the current preview image is in a stable state;
a determining module 803, configured to determine the current preview image as the image to be classified if the detection module detects that the current preview image is in a stable state;
a sending module 804, configured to send a classification request to a server, where the classification request carries the image to be classified;
and a receiving module 805, configured to receive response information returned by the server, where the response information includes a classification result of the image to be classified, and the classification result is obtained after the server classifies and determines the image to be classified according to a classification model and a feature extraction model.
In one embodiment, the apparatus may further comprise: a display module 806, configured to display a user interface after receiving the response information returned from the server, where the user interface displays: any one or more of the classification result of the image to be classified, the associated image, and the description information, where the classification result and the associated image are carried in the response information, and the description information is obtained by querying according to the classification result.
In the embodiment of the present invention, the specific implementation of the above modules may refer to the description of relevant contents in the embodiment corresponding to fig. 4.
Referring to fig. 9 again, it is a schematic structural diagram of a user terminal according to an embodiment of the present invention. The user terminal according to an embodiment of the present invention may include a power supply module and the like, and includes a processor 901, a storage device 902, and a transceiver 903. Data can be exchanged among the processor 901, the storage device 902, and the transceiver 903, and the processor 901 realizes a corresponding image classification function.
The storage device 902 may include a volatile memory, such as a random-access memory (RAM); the storage device 902 may also include a non-volatile memory, such as a flash memory or a solid-state drive (SSD); the storage device 902 may also comprise a combination of the above kinds of memory.
The transceiver 903 may exchange data with the server and other devices. The server may send response information carrying the classification result to the transceiver 903, and the transceiver 903 outputs the response information to the processor 901 of the user terminal for processing.
The processor 901 may be a central processing unit (CPU). In one embodiment, the processor 901 may also be a graphics processing unit (GPU), or a combination of a CPU and a GPU. The user terminal may include a plurality of CPUs and GPUs as necessary to perform corresponding image processing. In one embodiment, the storage device 902 is used to store program instructions, and the processor 901 may call the program instructions to implement the various methods described above in the embodiments of the present invention.
In a first possible implementation, the processor 901 of the user terminal calls program instructions stored in the storage device 902, so as to call a camera component to acquire a current preview image when a trigger operation for an identification button is detected; if the current preview image is detected to be in a stable state, determine the current preview image as the image to be classified; send a classification request to a server, where the classification request carries the image to be classified; and receive response information returned by the server, where the response information includes a classification result of the image to be classified, and the classification result is obtained after the server classifies and determines the image to be classified according to a classification model and a feature extraction model.
In one embodiment, the processor 901 is further configured to display a user interface after receiving the response information returned from the server, where the user interface displays: any one or more of the classification result of the image to be classified, the associated image, and the description information, where the classification result and the associated image are carried in the response information, and the description information is obtained by querying according to the classification result.
In the embodiment of the present invention, the processor 901 may be implemented as described in the foregoing embodiment corresponding to fig. 4.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the invention has been described with reference to a number of embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.