Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
The embodiments of the application provide an image matching method and apparatus, an electronic device, and a storage medium. The image matching apparatus may be specifically integrated in an electronic device, and the electronic device may be a terminal, a server, or the like.
It is understood that the image matching method of the present embodiment may be executed on the terminal, may be executed on the server, or may be executed by both the terminal and the server.
The following description takes the case where the image matching method is executed jointly by the terminal and the server as an example.
As shown in fig. 1, the terminal 10 may obtain an image to be matched through an input module and send the image to be matched to the server, so that the server searches the candidate images for a similar image resembling the image to be matched and returns the similar image to the terminal for display.
The server 11 may be configured to: acquire an image to be matched; extract feature maps at multiple scales from the image to be matched; process the feature maps at the multiple scales to obtain a feature vector of the image to be matched; encode the feature vector to obtain a matching feature vector for image matching, where the elements in the matching feature vector meet preset requirements; acquire matching feature vectors of the candidate images; and determine, among the candidate images, a similar image resembling the image to be matched based on the matching feature vector of the image to be matched and the matching feature vectors of the candidate images.
The image matching method provided by the embodiments of the application relates to Computer Vision (CV) in the field of Artificial Intelligence (AI), and in particular to similar-image detection/deduplication within image recognition in computer vision. According to the embodiments of the application, feature maps of an image can be extracted and processed to obtain a matching feature vector for image matching, and similar images are determined through similarity calculation between the matching feature vectors of images.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning, and decision-making. Artificial intelligence is a comprehensive discipline that spans a wide range of fields, covering both hardware-level and software-level technologies. Artificial intelligence software technology mainly includes computer vision technology, machine learning/deep learning, and the like.
Computer Vision (CV) technology is a science that studies how to make machines "see". More specifically, it refers to using a computer instead of human eyes to perform machine vision tasks such as identifying and measuring a target, and to further process the image so that it becomes more suitable for human viewing or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, and other technologies, as well as common biometric recognition technologies such as face recognition and human body posture recognition.
Similar-image detection/deduplication technology detects and deduplicates similar images by extracting image features, calculating the similarity between images based on the extracted features, and thereby identifying the similar images.
Detailed descriptions are given below. It should be noted that the order in which the following embodiments are described is not intended to limit the preferred order of the embodiments.
Embodiment I
This embodiment will be described from the perspective of an image matching apparatus. The image matching apparatus may be specifically integrated in an electronic device, and the electronic device may be a server or a terminal; the terminal may include a tablet computer, a notebook computer, a personal computer (PC), and the like.
The image matching method of the embodiments of the application can be applied to searching for similar images in various recommendation scenarios, including but not limited to an information-flow recommendation scenario, a commodity recommendation scenario, and the like, where the commodity recommendation scenario may be an apparel commodity recommendation scenario.
As shown in fig. 2, the specific flow of the image matching method may be as follows:
101. Acquire an image to be matched.
In this embodiment, there are various ways to acquire an image to be matched.
For example, the image to be matched may be acquired by an image acquisition device on the electronic device, for example, when a shooting instruction is received, the image acquisition device is turned on to shoot an image, and the shot image is taken as the image to be matched, where the image acquisition device may be a camera or the like.
For example, the image to be matched may also be obtained from a local gallery of the electronic device. For example, the image to be matched is stored in the local gallery of the electronic device, and when an instruction to obtain the image to be matched is received, the image to be matched can be obtained directly from the local gallery, where "local" refers to the electronic device itself.
For example, the image to be matched may also be obtained through the internet and then provided to the image matching device, for example, the image to be matched is obtained through internet downloading.
For example, the image to be matched may also be obtained by other devices and then provided to the image matching apparatus, that is, the image matching apparatus may specifically receive the image to be matched sent by other devices, such as other terminals.
For a scenario in which the electronic device is a server, acquiring the image to be matched may include: receiving the image to be matched sent by the terminal.
The image to be matched in this embodiment may be a static image or a dynamic image, and may be an expression image, a commodity image, a portrait, or another type of image, where the commodity image may include a clothing image.
102. Extract feature maps at multiple scales from the image to be matched.
In this embodiment, the image to be matched may be preprocessed, and then the feature maps of the image to be matched at different scales are extracted by the plurality of feature extraction blocks in the neural network.
The preprocessing may include image resizing, image data enhancement, image rotation, or the like, of the image to be matched. Image data enhancement may include histogram equalization, sharpening, smoothing, and the like. In addition, the neural network may be a 16-layer Visual Geometry Group network (VGGNet), a Residual Network (ResNet), a Densely Connected Convolutional Network (DenseNet), and so on; it should be understood, however, that the neural network of this embodiment is not limited to the types listed above.
For example, referring to fig. 3, when the neural network is a 16-layer VGGNet, the image size of the image to be matched is adjusted first, for example, its resolution is adjusted to 224 × 224. Because the 16-layer VGGNet can be divided into five convolution groups, i.e., five feature extraction blocks, five feature maps are extracted through the five convolution groups in this embodiment. The feature maps extracted by different convolution groups have different scales, i.e., multi-scale features are obtained, and their dimensions also differ. For example, the first feature extraction block extracts a feature map with 64 channels, each of dimension 224 × 224, and the second feature extraction block extracts a feature map with 128 channels, each of dimension 112 × 112. The extracted features here are basic features of the image, such as shape features. For example, when the image is an image of a clothing article, the extracted features may be the shape outline of the clothing.
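To make the multi-scale extraction concrete, the following is a minimal sketch assuming PyTorch and torchvision's VGG16 layer layout (neither is mandated by this embodiment; the function name is illustrative). In torchvision's `vgg16().features`, indices 3, 8, 15, 22, and 29 mark the last ReLU of each of the five convolution groups:

```python
import torch
from torchvision import models

# Last ReLU of each of VGG16's five convolution groups in
# torchvision's vgg16().features (indices are torchvision-specific).
GROUP_ENDS = (3, 8, 15, 22, 29)

def extract_multiscale_maps(image: torch.Tensor) -> list[torch.Tensor]:
    """Collect one feature map per convolution group for a preprocessed
    batch of shape (N, 3, 224, 224)."""
    backbone = models.vgg16(weights=None).features.eval()
    maps, x = [], image
    with torch.no_grad():
        for idx, layer in enumerate(backbone):
            x = layer(x)
            if idx in GROUP_ENDS:
                maps.append(x)
    return maps

# Five maps with 64, 128, 256, 512, 512 channels at scales
# 224, 112, 56, 28, 14, matching the dimensions described above.
maps = extract_multiscale_maps(torch.randn(1, 3, 224, 224))
print([tuple(m.shape) for m in maps])
```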
The neural network can be obtained through deep learning, which is machine learning that realizes artificial intelligence in a computing system by establishing a neural network with a hierarchical structure. Because a hierarchical neural network can extract and filter input information layer by layer, deep learning has feature-learning capability and can realize end-to-end supervised and unsupervised learning.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer can simulate or realize human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence.
103. Process the feature maps at multiple scales to obtain a feature vector of the image to be matched.
The step of processing the feature maps at multiple scales to obtain the feature vector of the image to be matched may include steps a and b:
step a: and respectively carrying out dimension reduction operation on the feature maps under the multiple scales.
In this embodiment, the channel dimensions of the feature map in each scale may be respectively reduced to the same preset dimensions, where the channel dimensions are the dimensions of each channel of the feature map, and the preset dimensions may be set according to the requirements of practical applications, which is not described herein again.
For example, the feature maps at each scale are reduced by pooling, which may include max pooling (Max-Pooling), average pooling (Avg-Pooling), generalized mean pooling (GeM-Pooling), and the like.
If dimension reduction is performed by the generalized mean pooling algorithm, the pooled feature map is calculated through the following expression of the generalized mean pooling algorithm:

f^(g) = [f_1^(g), ..., f_k^(g), ..., f_g^(g)], where f_k^(g) = ((1/|X_k|) Σ_{x∈X_k} x^(p_k))^(1/p_k)

where x is the input of the pooling operation, the vector f is the output of the pooling operation, g is the total number of channels of the feature extraction block, k denotes the k-th channel, X_k is the feature map of the k-th channel, and p_k is the pooling parameter of each channel, which can take the correlation between channels into account. The feature map of each channel is first processed to obtain the per-channel output f_k^(g) of the feature extraction block, and the output f^(g) of the feature extraction block is then obtained from the per-channel outputs f_k^(g). Through this calculation, the dimension-reduced feature map at each scale is obtained. The channel dimension after reduction is preset, for example, to 1 × 1, so that the dimension of the reduced feature map 1 is 1 × 64 × 1 and, similarly, the dimension of the reduced feature map 2 is 1 × 128 × 1, as shown in fig. 3.
Step b: fuse all the dimension-reduced feature maps to obtain the feature vector of the image to be matched.
The fusion here refers to feature fusion; fusing features of different scales can improve the characterization capability of the features. Low-level features have higher resolution and contain more detail information, but, having passed through fewer convolutions, they are noisier and less semantic; high-level features carry strong semantic information, but their resolution is low and much detail is lost. Fusing multi-layer features, i.e., fusing multi-scale features, can therefore improve the accuracy of image matching.
In this embodiment, the step of fusing all the dimension-reduced feature maps to obtain the feature vector of the image to be matched may include:
splicing all the dimension-reduced feature maps to obtain the feature vector of the image to be matched.
For example, all the dimension-reduced feature maps are spliced in order of feature map size, from large to small, to obtain the feature vector of the image to be matched, as sketched below.
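As an illustration of steps a and b, here is a minimal PyTorch sketch of generalized mean pooling followed by splicing. For brevity it assumes a single shared pooling exponent p rather than a learned per-channel p_k, and the function names are illustrative:

```python
import torch

def gem_pool(feature_map: torch.Tensor, p: float = 3.0,
             eps: float = 1e-6) -> torch.Tensor:
    """Generalized mean pooling: ((1/|X_k|) * sum(x^p))^(1/p) per channel,
    reducing an (N, C, H, W) map to an (N, C) vector (channel dims 1 x 1)."""
    x = feature_map.clamp(min=eps)  # keep x^p well defined
    return x.pow(p).mean(dim=(-2, -1)).pow(1.0 / p)

def fuse(maps: list[torch.Tensor]) -> torch.Tensor:
    """Step b: splice all dimension-reduced maps into one feature vector."""
    return torch.cat([gem_pool(m) for m in maps], dim=1)

# With the five VGGNet maps (64 + 128 + 256 + 512 + 512 channels),
# the fused vector has 1472 dimensions, as in the second embodiment.
maps = [torch.randn(1, c, s, s)
        for c, s in [(64, 224), (128, 112), (256, 56), (512, 28), (512, 14)]]
print(fuse(maps).shape)  # torch.Size([1, 1472])
```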
It should be noted that the neural network is trained with a plurality of labeled training data; the training data of this embodiment include a plurality of training images, and a label refers to the category of the object carried by an image. The neural network may be trained by another device and then provided to the image matching apparatus, or may be trained by the image matching apparatus itself.
If the image matching device performs training by itself, before the step "acquiring an image to be matched", the image matching method may further include:
acquiring training data, where the training data include a plurality of training images and the objects carried by the training images belong to multiple categories; performing object classification training on the neural network based on the training data; extracting feature maps at multiple scales from a training image through the feature extraction blocks in the trained neural network; processing the feature maps of the training image to obtain a feature vector of the training image; and optimizing parameters of the neural network based on the feature vectors of the training images of the various objects, so that the differences between feature vectors corresponding to training images of different objects within the same category meet a preset requirement.
The classification training distributes the feature vectors of training data of the same category in the same region of the feature space, while feature vectors of training data that do not belong to the same category are not highly similar.
Optionally, in this embodiment, at least two training images carry the same object, and the object information displayed by different training images of the same object is not completely the same. For example, where the training images are commodity photographs, there are at least two commodity photographs taken from different angles for the same commodity.
For example, in a specific clothing commodity recommendation scenario, the neural network needs to be trained using images in a commodity library. The training data come from images in a commodity library that provides a large number of classified apparel images; for example, the library may contain 300,000 apparel images covering 15 apparel categories, among which 50,000 apparel images carry objects of the category "T-shirt", and each "T-shirt" has at least two apparel images taken from different angles.
In this embodiment, for each category of training images, the differences between the feature vectors corresponding to training images of different objects within the same category can be made to meet the preset requirement through the following process:
three images are extracted from the various training images to form a triplet, and a plurality of triplets are obtained in this way. Within a triplet, two images carry content belonging to the same object and are called the anchor example and the positive example, while the third image is a sample of a different object in the same category and is called the negative example. The distance between the anchor feature vector and the positive feature vector and the distance between the anchor feature vector and the negative feature vector are calculated to obtain the loss of the triplet; the parameters of the neural network are then optimized through the triplet loss to update the feature vectors, and this optimization process is repeated until convergence or until a preset number of iterations is reached, yielding the trained neural network.
The anchor example is a randomly selected training image, and the content carried by the anchor example and that carried by the positive example belong to the same object; for example, the anchor example and the positive example may be training images of the same object shot from different angles. The above optimization process can be expressed by the following equation:
L=max(d(a,p)-d(a,n)+margin,0)
where a denotes the feature vector of the anchor example, p the feature vector of the positive example, and n the feature vector of the negative example; L is the loss whose convergence serves as the stopping flag; d(a, p) denotes the distance between the anchor example and the positive example, d(a, n) denotes the distance between the anchor example and the negative example, and the parameter margin is a threshold that separates d(a, p) from d(a, n). The goal of the optimization is to reduce d(a, p) and to increase d(a, n). That is, during training, the feature vectors learned by the neural network make the positive-example feature vector more and more similar to the anchor feature vector, and the negative-example feature vector more and more different from it.
First, d (a, p) and d (a, n) of each triplet feature vector are calculated, and then the loss of the triplet is calculated: d (a, p) -d (a, n) + margin, the value of L is calculated from the loss of the triplet, and the value of L is output equal to 0 as long as d (a, n) is greater than d (a, p) + margin. When the value of L is not converged or the number of training iterations is less than the preset number, the parameters of the neural network need to be continuously optimized according to the loss of the triplet, the feature vector is updated, the loss of the triplet is recalculated by using the updated feature vector, and the process is continuously repeated to optimize the parameters of the neural network, as shown in fig. 4.
104. Encode the feature vector to obtain a matching feature vector for image matching, where the elements in the matching feature vector meet preset requirements.
In this embodiment, the feature vector may be encoded and converted into a matching feature vector composed of the two elements 0 and 1. For example, the feature vector is encoded with a hash coding algorithm, converting the floating-point vector into a binary hash code, as shown in fig. 5. The specific process is as follows:
learning a rotation matrix of the feature vector based on an iterative quantization algorithm; and carrying out hash coding on the feature vector through the rotation matrix to obtain a matching feature vector consisting of 0 and 1 elements.
The rotation matrix is learned through an Iterative Quantization (ITQ) algorithm, whose objective is as follows:

min_{B,R} ||B − XR||_F^2

where R is the rotation matrix, B is the quantization code, and X is the floating-point feature, i.e., the feature vector. The iteration may be performed with reference to the existing ITQ procedure: a random matrix is generated and Singular Value Decomposition (SVD) is applied to it to obtain a corresponding orthogonal matrix, which is used as the initial value of the rotation matrix; the value of the rotation matrix R is then fixed to solve for the quantization code B; after B is solved, singular value decomposition is applied to the product of the quantization code B and the floating-point feature X to update the rotation matrix. Iterating in this way multiple times yields a rotation matrix whose quantization error satisfies a preset condition; the rotation matrix can generally be learned in about 50 iterations.
Optionally, the iterative quantization algorithm may also obtain the hash mapping matrix by constraining neuron outputs, for example, limiting the outputs of the neurons to the two elements 0 and 1 and learning the hash mapping matrix from data.
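A minimal NumPy sketch of the alternating ITQ iteration described above, assuming zero-centered features and the standard orthogonal Procrustes update for the rotation; function and variable names are illustrative:

```python
import numpy as np

def learn_itq_rotation(X: np.ndarray, n_iter: int = 50) -> np.ndarray:
    """Alternately minimize ||B - XR||_F^2 over the code B and rotation R.

    X: zero-centered floating-point features, shape (n_samples, n_bits).
    """
    n_bits = X.shape[1]
    rng = np.random.default_rng(0)
    # Initialize R with the orthogonal factor of a random matrix (via SVD).
    u, _, vt = np.linalg.svd(rng.standard_normal((n_bits, n_bits)))
    R = u @ vt
    for _ in range(n_iter):
        B = np.sign(X @ R)                 # fix R, solve the quantization code B
        u, _, vt = np.linalg.svd(X.T @ B)  # fix B, orthogonal Procrustes step
        R = u @ vt                         # updated rotation matrix
    return R

def matching_vector(x: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Hash-code a feature vector into a matching vector of 0 and 1 elements."""
    return (np.sign(x @ R) > 0).astype(np.uint8)
```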
Before "encoding the feature vector to obtain a matching feature vector for image matching", the method may further include:
and reducing the dimension of the feature vector by using a whitening algorithm, wherein the dimension number of the feature vector after the dimension reduction is lower than a preset numerical value. Other non-whitening methods may alternatively be used.
In this embodiment, the feature vector may first be normalized, the normalized feature vector may be reduced in dimension through a whitening algorithm, and the reduced feature vector may be normalized again. The whitening algorithm may be principal component analysis (PCA) whitening, projection whitening, or the like, and the normalization may use the L2 norm, among others.
If the feature vector is reduced in dimension through a projection whitening algorithm, the process is as follows:

P = C_S^(−1/2) · eig(C_S^(−1/2) C_D C_S^(−1/2))

where f(i) and f(j) denote the feature vectors of two images; Y(i, j) = 1 indicates that f(i) and f(j) are feature vectors of two images whose carried content belongs to the same object, while Y(i, j) = 0 indicates that f(i) and f(j) are feature vectors of two images whose carried content belongs to different objects of the same category; the eig function returns eigenvalues and eigenvectors; P is the mapping matrix; and C_S and C_D are the covariance matrices for the two cases Y(i, j) = 1 and Y(i, j) = 0, respectively. For example, Y(i, j) = 1 indicates that f(i) and f(j) are feature vectors of images of the same "T-shirt", and Y(i, j) = 0 indicates that f(i) and f(j) are feature vectors of images of different "T-shirts".
After the supervision information for the feature vectors is obtained, the mapping matrix P can be obtained from that supervision information through the above formula, and the normalized feature vector is then multiplied by the mapping matrix P to reduce the high-dimensional feature vector to a lower-dimensional one. For example, the specific dimension-reduction process may be: the dimension of the normalized feature vector is 1 × n and the dimension of the mapping matrix P is n × m, where n > m; multiplying the normalized feature vector by the mapping matrix P yields a vector of dimension 1 × m, so the dimension decreases from n to m, as shown in fig. 6.
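The computation of P can be sketched in NumPy as follows, assuming the covariance matrices C_S and C_D are estimated from the differences of matching (Y = 1) and non-matching (Y = 0) feature-vector pairs; all names are illustrative:

```python
import numpy as np

def inv_sqrt(M: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Symmetric inverse square root via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T

def projection_whitening(diff_same: np.ndarray, diff_other: np.ndarray,
                         m: int) -> np.ndarray:
    """Learn an n x m mapping matrix P from labeled pair differences.

    diff_same:  rows f(i) - f(j) for pairs with Y(i, j) = 1;
    diff_other: rows f(i) - f(j) for pairs with Y(i, j) = 0.
    """
    C_S = np.cov(diff_same, rowvar=False)   # covariance, Y(i, j) = 1
    C_D = np.cov(diff_other, rowvar=False)  # covariance, Y(i, j) = 0
    W = inv_sqrt(C_S)
    w, V = np.linalg.eigh(W @ C_D @ W)      # eig of the whitened C_D
    # Keep the m most discriminative eigenvectors to reduce n -> m.
    return W @ V[:, np.argsort(w)[::-1][:m]]

# A normalized 1 x n feature vector times P then yields a 1 x m vector.
```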
Alternatively, the projection whitening can also be converted into a fully-connected layer of the neural network and learned within the network.
105. Acquire the matching feature vectors of the candidate images.
In this embodiment, the matching feature vector of a candidate image may be obtained from the electronic device that stores it, the matching feature vector having been calculated in advance and then stored on that electronic device.
Optionally, in order to improve the efficiency of image matching, before "obtaining the matching feature vectors of the candidate images", the method may further include: determining the category of the object carried by the image to be matched.
Correspondingly, the step of "obtaining matching feature vectors of candidate images" may include:
determining a candidate image, wherein the class of the object carried by the candidate image is the same as the class of the object carried by the image to be matched;
and acquiring the matching feature vector of the candidate image.
The matching feature vectors of the candidate images can be obtained by calculating them in real time. In this embodiment, the matching feature vectors of the candidate images may instead be calculated in advance and stored in a database, for example in the shared ledger of a blockchain; when a matching feature vector of a candidate image is needed, it is obtained from the database, for example from the shared ledger of the blockchain.
Optionally, in this embodiment, the calculation scheme for the matching feature vector of a candidate image is similar to that for the image to be matched in steps 102-104: feature maps at multiple scales are extracted from the candidate image through the neural network; the feature maps at the multiple scales are processed to obtain the feature vector of the candidate image; and the feature vector is encoded to obtain a matching feature vector for image matching, where the elements in the matching feature vector meet the preset requirements. After the computation is completed, the matching feature vectors of the candidate images may be stored in the database in correspondence with the candidate images.
106. Determine, among the candidate images, a similar image resembling the image to be matched based on the matching feature vector of the image to be matched and the matching feature vectors of the candidate images.
In this embodiment, similar images may be determined by measuring the distance between matching feature vectors, for example the Hamming distance between them, although it should be understood that the distance is not limited to the Hamming distance.
Optionally, the step of "determining a similar image similar to the image to be matched in the candidate image based on the matching feature vector of the image to be matched and the matching feature vector of the candidate image" may include:
calculating the Hamming distance between the matching feature vector of the image to be matched and the matching feature vector of each candidate image; and determining the candidate images whose Hamming distance meets a preset condition as similar images resembling the image to be matched.
The Hamming distance is the number of bit positions equal to 1 after an exclusive-OR operation is performed on the two bit strings. The magnitude of the Hamming distance gives the difference between matching feature vectors, and this difference reflects the similarity between the images, so the similar images resembling the image to be matched can be determined.
The preset condition in this embodiment may be that the Hamming distance is not higher than a preset threshold, and the preset threshold may be set according to actual needs, for example, according to the number of bits of the matching feature vectors, which is not limited in this embodiment. For example, the candidate images whose Hamming distance satisfies the preset condition may be those with a Hamming distance not higher than 3.
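As a small sketch of this matching step in NumPy (the threshold of 3 follows the example above; the function names are illustrative):

```python
import numpy as np

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of positions equal to 1 after XOR of two 0/1 vectors."""
    return int(np.count_nonzero(a != b))

def find_similar(query: np.ndarray, candidates: np.ndarray,
                 threshold: int = 3) -> np.ndarray:
    """Indices of candidates within the preset Hamming threshold,
    sorted by ascending distance (i.e., descending similarity)."""
    dists = np.count_nonzero(candidates != query, axis=1)
    keep = np.flatnonzero(dists <= threshold)
    return keep[np.argsort(dists[keep])]

# Example with 64-bit matching vectors: a query against 1000 candidates.
rng = np.random.default_rng(0)
codes = rng.integers(0, 2, size=(1000, 64), dtype=np.uint8)
print(find_similar(codes[0], codes))
```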
Optionally, after the step of "determining, among the candidate images, a similar image resembling the image to be matched based on the matching feature vector of the image to be matched and the matching feature vectors of the candidate images", the method may further include: outputting the similar image.
There are various ways to output the similar image.
For example, the similar image may be saved in a local folder of the electronic device, displayed on a display of the electronic device, or transmitted to and received by another device, such as another terminal.
In this embodiment, the number of similar images is at least one. When there are multiple similar images, the similarity between each similar image and the image to be matched may be determined from the calculation in step 106, and the similar images are sorted based on the similarity to obtain a similar image set composed of the sorted similar images; the step "outputting the similar images" may then include: outputting the similar image set composed of the sorted similar images.
The similarity between a similar image and the image to be matched can be determined based on the Hamming distance between them: the smaller the Hamming distance, the higher the similarity; the larger the Hamming distance, the lower the similarity.
For example, in a specific clothing commodity recommendation scenario, the matching feature vector of a clothing image is obtained through the neural network, matching feature vectors of the images in a clothing commodity library are obtained at the same time, images in the library similar to the clothing image are determined by calculating the Hamming distances between the matching feature vector of the clothing image and those of the library images, and the similar images are then recommended to the user. In this way, clothing can be retrieved efficiently and accurately, and the relevance of clothing-commodity advertisements and content recommendations can be improved, thereby increasing the user's click-through rate on the recommended clothing advertisements and content.
As can be seen from the above, in this embodiment, after an image to be matched is obtained, feature maps at multiple scales are extracted from it, the feature maps are processed to obtain a feature vector of the image to be matched, and the feature vector is encoded to obtain a matching feature vector for image matching, where the elements in the matching feature vector meet preset requirements; matching feature vectors of candidate images are then acquired, and a similar image resembling the image to be matched is determined among the candidate images based on the matching feature vector of the image to be matched and the matching feature vectors of the candidate images. Because the embodiments of the application extract feature maps at multiple scales through a neural network obtained by artificial-intelligence machine learning, the feature vector contains feature information of the image to be matched at multiple scales and has strong characterization capability, so that the accuracy of image matching can be greatly improved and a better matching effect obtained.
Embodiment II
The method described in the foregoing embodiment will now be described in further detail, with the image matching apparatus specifically integrated in a server.
The image matching method provided by the embodiments of the application involves artificial-intelligence technologies such as computer vision, which are explained through the following embodiment:
First, the training of the neural network may specifically be as follows:
(1) Classification model training.
First, training data may be obtained, where the training data include a plurality of training images, the training images carry objects of a plurality of categories, and the number of training images for the same object is at least two. It should be noted that the training images are labeled, where a label refers to the category of the object carried by the image. For example, taking apparel images as training data, the training data include a plurality of apparel categories, a plurality of images carry objects of the category "T-shirt", and each "T-shirt" has 2 training images, which belong to the same piece of clothing but are shot from different angles.
Secondly, object classification training is performed on the neural network based on the training data, so that the feature vectors of clothes of the same category are distributed in the same region of the feature space, while the feature vectors of clothes that do not belong to the same category are not highly similar. For example, the feature vector of a "skirt" can only be similar to the feature vector of another "skirt" and cannot be highly similar to the feature vector of a pair of "trousers".
(2) The neural network is trained using the triplet loss (Triplet Loss).
As shown in fig. 4, feature maps at multiple scales may be extracted from a training image by the feature extraction blocks in the trained neural network, and the feature maps of the training image are then processed, for example by dimension reduction through generalized mean pooling followed by feature fusion of the reduced feature maps, to obtain the feature vector of the training image. The parameters of the neural network are then optimized based on the feature vectors of the training images of the various objects, so that the differences between the feature vectors corresponding to training images of different objects within the same category meet the preset requirement.
In this embodiment, for each category of training images, the differences between the feature vectors corresponding to training images of different objects within the same category can be made to meet the preset requirement through the following process:
three training images are extracted from the various categories of training data to form a triplet, where two images in the triplet carry content belonging to the same object and are called the anchor example and the positive example, while the content carried by the third image is a sample of a different object in the same category and is called the negative example; training is then performed using the triplet loss. The formula of the triplet loss is as follows:
L=max(d(a,p)-d(a,n)+margin,0)
where a denotes the feature vector of the anchor example, p the feature vector of the positive example, and n the feature vector of the negative example; L is the loss whose convergence serves as the stopping flag; d(a, p) denotes the distance between the anchor example and the positive example, d(a, n) the distance between the anchor example and the negative example, and the parameter margin is a threshold that separates d(a, p) from d(a, n). For example, in a specific scenario, a and p are two images whose carried objects belong to the same piece of apparel but were shot from different angles, while the objects carried by n and a belong to different pieces of apparel of the same apparel category; that is, the objects carried by a and p belong to the same T-shirt, and the objects carried by n and a belong to different T-shirts.
In the triplet loss training process, d(a, p) and d(a, n) are calculated first, and the triplet loss d(a, p) − d(a, n) + margin is then computed; the parameter margin may be set to 0.1 during training, and the value of L is obtained from the triplet loss. When the value of L has not converged, or the number of training iterations is less than the preset number, the parameters of the neural network are optimized, the feature vectors are updated, the triplet loss is recalculated with the updated feature vectors, and this process is repeated to optimize the parameters of the neural network, yielding the trained neural network.
Next, the image to be matched is matched through the trained neural network.
As shown in fig. 7 and 8, a specific flow of an image matching method may be as follows:
201. The server receives the image to be matched sent by the terminal.
In this embodiment, the server may receive an image to be matched obtained from the database of the terminal; for example, the image to be matched is stored in the database of the terminal, and upon receiving an instruction to obtain the image to be matched, the server can obtain it directly from the database of the terminal.
Optionally, in this embodiment, the server may further receive an image to be matched sent from the image acquisition device of the terminal, for example, when the server receives a shooting instruction, the image acquisition device of the terminal is started to shoot the image, and the shot image is used as the image to be matched, where the image acquisition device of the terminal may be a terminal camera or the like.
The image to be matched in this embodiment may be a static image or a dynamic image, and may be an expression image, a commodity image, a portrait, or another type of image, where the commodity image may include a clothing image.
202. The server extracts feature maps at multiple scales from the image to be matched through a neural network.
In this embodiment, the image to be matched may be preprocessed, for example by adjusting the image size so that the resolution of the image to be matched becomes 224 × 224. A 16-layer VGGNet is taken as an example, as shown in fig. 3. Because the 16-layer VGGNet can be divided into five convolution groups, i.e., five feature extraction blocks, five feature maps are extracted through the five convolution groups; the feature maps extracted by different convolution groups have different scales, i.e., multi-scale features are obtained, and their dimensions also differ. For example, the first convolution group extracts a feature map with 64 channels, each of dimension 224 × 224; the second, 128 channels of dimension 112 × 112; the third, 256 channels of dimension 56 × 56; the fourth, 512 channels of dimension 28 × 28; and the fifth, 512 channels of dimension 14 × 14.
203. The server processes the feature maps at multiple scales to obtain the feature vector of the image to be matched.
In this embodiment, a generalized mean pooling algorithm is used to reduce the dimension of the extracted feature maps at the multiple scales, and all the reduced feature maps are then fused, for example spliced, to obtain the feature vector of the image to be matched.
The expression of the generalized mean pooling algorithm is as follows:

f^(g) = [f_1^(g), ..., f_k^(g), ..., f_g^(g)], where f_k^(g) = ((1/|X_k|) Σ_{x∈X_k} x^(p_k))^(1/p_k)

where x is the input of the pooling operation, the vector f is the output of the pooling operation, g is the total number of channels of the feature extraction block, k denotes the k-th channel, X_k is the feature map of the k-th channel, and p_k is the pooling parameter of each channel, which can take the correlation between channels into account. The feature map of each channel is first processed to obtain the per-channel output f_k^(g) of the feature extraction block, and the output f^(g) of the feature extraction block is then obtained from the per-channel outputs f_k^(g). Through this calculation, the dimension-reduced feature map at each scale is obtained. The channel dimension after reduction is preset, for example, to 1 × 1, so that the dimension of the reduced feature map 1 is 1 × 64 × 1 and, similarly, the dimensions of the reduced feature maps 2 to 5 are 1 × 128 × 1, 1 × 256 × 1, 1 × 512 × 1, and 1 × 512 × 1, respectively, as shown in fig. 3.
After dimension reduction by the generalized mean pooling algorithm, all the reduced feature maps are spliced together to obtain the feature vector of the image to be matched, which is a 1472-dimensional floating-point feature vector.
204. The server performs dimensionality reduction on the feature vector using a whitening algorithm.
In this embodiment, a projection whitening algorithm is used to reduce the dimension of the feature vector; the process is as follows:

P = C_S^(−1/2) · eig(C_S^(−1/2) C_D C_S^(−1/2))

where f(i) and f(j) denote the feature vectors of two images; Y(i, j) = 1 indicates that f(i) and f(j) are feature vectors of the same "T-shirt", while Y(i, j) = 0 indicates that f(i) and f(j) are feature vectors of different "T-shirts"; the eig function returns eigenvalues and eigenvectors; P is the mapping matrix; and C_S and C_D are the covariance matrices for the two cases Y(i, j) = 1 and Y(i, j) = 0, respectively. After the supervision information for the feature vectors is obtained, the mapping matrix P can be obtained from that supervision information through the above formula, and the normalized feature vector is then multiplied by the mapping matrix P to reduce the high-dimensional feature vector to a lower-dimensional one. For example, the specific dimension-reduction process may be: the dimension of the normalized feature vector is 1 × n and the dimension of the mapping matrix P is n × m, where n > m; multiplying the normalized feature vector by the mapping matrix P yields a vector of dimension 1 × m, so the dimension decreases from n to m, as shown in fig. 6.
Alternatively, the projection whitening can also be converted into a fully connected layer of the neural network and learned in the neural network.
205. The server encodes the whitened feature vector and converts the reduced feature vector into a matching feature vector composed of 0 and 1 elements.
In this embodiment, a hash coding algorithm may be used to encode the whitened floating-point feature vector and convert it into a matching feature vector composed of the two elements 0 and 1, as shown in fig. 5. The specific process is as follows:
learning a rotation matrix of the feature vector based on an iterative quantization algorithm; and carrying out hash coding on the feature vector through the rotation matrix to obtain a matching feature vector consisting of 0 and 1 elements.
The rotation matrix is learned through an Iterative Quantization (ITQ) algorithm, whose objective is as follows:

min_{B,R} ||B − XR||_F^2

where R is the rotation matrix, B is the quantization code, and X is the floating-point feature, i.e., the feature vector. The iteration may be performed with reference to the existing ITQ procedure: a random matrix is generated and Singular Value Decomposition (SVD) is applied to it to obtain a corresponding orthogonal matrix, which is used as the initial value of the rotation matrix; the value of the rotation matrix R is then fixed to solve for the quantization code B; after B is solved, singular value decomposition is applied to the product of the quantization code B and the floating-point feature X to update the rotation matrix. Iterating in this way multiple times yields a rotation matrix whose quantization error satisfies a preset condition; the rotation matrix can generally be learned in about 50 iterations.
Optionally, the iterative quantization algorithm may also obtain the hash mapping matrix by constraining neuron outputs, for example, limiting the outputs of the neurons to the two elements 0 and 1 and learning the hash mapping matrix from data.
206. The server obtains the matching feature vectors of the candidate images.
Optionally, before "the server obtains the matching feature vectors of the candidate images", the category of the object carried by the image to be matched may be determined.
Correspondingly, the step "the server obtains matching feature vectors of the candidate images" may include:
determining candidate images, where the category of the object carried by a candidate image is the same as the category of the object carried by the image to be matched;
the server obtains the matching feature vectors of the candidate images.
The matching feature vectors of the candidate images can be obtained by calculating them in real time. In this embodiment, the matching feature vectors of the candidate images may instead be calculated in advance and stored in a database, for example in the shared ledger of a blockchain; when a matching feature vector of a candidate image is needed, it is obtained from the database, for example from the shared ledger of the blockchain.
Optionally, in this embodiment, the calculation scheme for the matching feature vector of a candidate image is similar to that for the image to be matched in steps 202-205: the server extracts feature maps at multiple scales from the candidate image through the neural network; processes the feature maps at the multiple scales to obtain the feature vector of the candidate image; reduces the dimension of the feature vector of the candidate image using a whitening algorithm; and encodes the whitened feature vector, converting the reduced feature vector into a matching feature vector composed of 0 and 1 elements. After the computation is completed, the matching feature vectors of the candidate images may be stored in the database in correspondence with the candidate images.
207. The server determines, among the candidate images, a similar image resembling the image to be matched based on the matching feature vector of the image to be matched and the matching feature vectors of the candidate images.
In this embodiment, similar images may be determined by measuring the distance between matching feature vectors, for example the Hamming distance between them, although it should be understood that the distance is not limited to the Hamming distance.
Optionally, the step of "determining a similar image similar to the image to be matched in the candidate image based on the matching feature vector of the image to be matched and the matching feature vector of the candidate image" may include:
calculating the Hamming distance between the matching feature vector of the image to be matched and the matching feature vector of each candidate image; and determining the candidate images whose Hamming distance meets a preset condition as similar images resembling the image to be matched.
The preset condition in this embodiment may be that the Hamming distance is not higher than a preset threshold, and the preset threshold may be set according to actual needs, for example, according to the number of bits of the matching feature vectors, which is not limited in this embodiment. For example, the candidate images whose Hamming distance satisfies the preset condition may be those with a Hamming distance not higher than 3.
208. The server transmits the similar image to the terminal.
In this embodiment, the server may store the similar images in a database of the terminal, where the number of the similar images is at least one.
209. The terminal receives the similar image and displays the similar image.
In this embodiment, the terminal receives the similar image sent by the server and displays it on a display of the terminal. The number of similar images is at least one; when there are multiple similar images, the similarity between each similar image and the image to be matched may be determined, before step 209, from the calculation in step 207, and the similar images are sorted based on the similarity to obtain a similar image set composed of the sorted similar images. The step "the terminal receives the similar image and displays the similar image" may then include: the terminal receives the similar image set sent by the server, in which the similar images are sorted from the highest similarity to the lowest, and the terminal displays the similar images on the display in that order.
The similarity between a similar image and the image to be matched can be determined based on the Hamming distance between them: the smaller the Hamming distance, the higher the similarity, and the larger the Hamming distance, the lower the similarity.
As can be seen from the above, in this embodiment, after an image to be matched is obtained, feature maps at multiple scales are extracted from it, the feature maps are processed to obtain a feature vector of the image to be matched, and the feature vector is encoded to obtain a matching feature vector for image matching, where the elements in the matching feature vector meet preset requirements; matching feature vectors of candidate images are then acquired, and a similar image resembling the image to be matched is determined among the candidate images based on the matching feature vector of the image to be matched and the matching feature vectors of the candidate images. Because feature maps at multiple scales are extracted, the feature vector contains feature information of the image to be matched at multiple scales and has strong characterization capability, so that the accuracy of image matching can be greatly improved and a better matching effect obtained.
Embodiment III
In order to better implement the above method, an image matching apparatus is further provided in an embodiment of the present application, as shown in fig. 9a. The image matching apparatus may include a first obtaining unit 901, an extraction unit 902, a processing unit 903, an encoding unit 904, a second obtaining unit 905, and a determination unit 906, as follows:
(1) First obtaining unit 901;
A first obtaining unit 901 is configured to obtain an image to be matched.
In this embodiment, the first obtaining unit 901 can obtain the image to be matched in various ways.
For example, upon receiving a shooting instruction, the first obtaining unit 901 turns on its image acquisition device to shoot an image and takes the shot image as the image to be matched, where the image acquisition device may be a camera or the like.
For example, the first obtaining unit 901 may also obtain the image to be matched from a database local to the image matching apparatus; for example, the image to be matched is stored in the local database, and upon receiving an instruction to obtain the image to be matched, the first obtaining unit 901 can obtain it directly from that database, where "local" refers to the image matching apparatus.
(2) Extraction unit 902;
An extraction unit 902 is configured to extract feature maps at multiple scales from the image to be matched.
In this embodiment, the extraction unit 902 may include a preprocessing subunit 9021 and an extraction subunit 9022, see fig. 9b, where:
the preprocessing subunit 9021 is configured to preprocess the image to be matched.
For example, the preprocessing subunit 9021 may be specifically configured to perform image resizing, image data enhancement, image rotation, and/or the like on the image to be matched.
The extraction subunit 9022 is configured to extract, through a plurality of feature extraction blocks in the neural network, feature maps of the preprocessed image to be matched at different scales.
(3) Processing unit 903;
A processing unit 903 is configured to process the feature maps at multiple scales to obtain a feature vector of the image to be matched.
In this embodiment, the processing unit 903 may include a dimension reduction subunit 9031 and a fusion subunit 9032, see fig. 9c, where:
the dimension reduction subunit 9031 is configured to perform dimension reduction operations on the feature maps at the multiple scales respectively.
The fusion subunit 9032 is configured to fuse all the dimension-reduced feature maps to obtain the feature vector of the image to be matched.
In this embodiment, the dimension reduction subunit 9031 may specifically reduce the channel dimensions of the feature map at each scale to the same preset dimensions, where the channel dimensions are the dimensions of each channel of the feature map, and the preset dimensions may be set according to the requirements of practical applications, which is not described herein again. For example, the feature maps at each scale may be reduced in dimension using a generalized mean pooling algorithm.
In this embodiment, the fusion subunit 9032 may specifically splice all the dimension-reduced feature maps to obtain the feature vector of the image to be matched. For example, all the dimension-reduced feature maps are spliced in order of feature map size, from large to small, to obtain the feature vector of the image to be matched.
(4) Encoding unit 904;
An encoding unit 904 is configured to encode the feature vector to obtain a matching feature vector for image matching, where the elements in the matching feature vector meet a preset requirement.
In this embodiment, the encoding unit 904 may specifically encode the feature vector and convert it into a matching feature vector composed of the two elements 0 and 1. For example, the feature vector is encoded using a hash coding algorithm, converting the floating-point vector into a binary hash code.
The process of "encoding the feature vector using a hash coding algorithm" specifically includes: learning a rotation matrix for the feature vector based on an iterative quantization algorithm, and hash-coding the feature vector through the rotation matrix to obtain a matching feature vector composed of 0 and 1 elements.
Before encoding the feature vector, the method may further include: reducing the dimension of the feature vector using a whitening algorithm, where the number of dimensions of the reduced feature vector is lower than a preset value. That is, optionally, the encoding unit 904 may include a whitening subunit 9041 and an encoding subunit 9042, as shown in fig. 9d.
Specifically, the whitening subunit 9041 may first normalize the feature vector, reduce the dimension of the normalized feature vector through a whitening algorithm, and normalize the reduced feature vector.
The whitening algorithm may be a projection whitening algorithm; the dimension reduction of the normalized feature vector through the projection whitening algorithm proceeds as follows: supervision information for the feature vectors is acquired, the mapping matrix of the feature vectors is calculated based on the supervision information, and the normalized feature vector is multiplied by the mapping matrix to reduce its dimension.
(5) A second acquisition unit 905;
a second acquisition unit 905, configured to acquire the matching feature vectors of the candidate images.
In this embodiment, the second acquisition unit 905 may specifically, before "acquiring the matching feature vectors of the candidate images", determine the category of the object carried by the image to be matched, and then acquire the matching feature vectors of the candidate images.
Correspondingly, the step of "acquiring the matching feature vectors of the candidate images" may include:
determining candidate images whose carried objects belong to the same category as the object carried by the image to be matched;
and acquiring the matching feature vectors of these candidate images.
The matching feature vectors of the candidate images may be obtained by calculating them in real time. Alternatively, in this embodiment, the matching feature vectors of the candidate images may be calculated in advance and stored in a database, for example, in a shared ledger of a blockchain; when the matching feature vector of a candidate image needs to be acquired, the second acquisition unit 905 acquires it from the database. For example, the matching feature vectors of the candidate images are acquired from the shared ledger of the blockchain.
(6) A determining unit 906;
a determining unit 906, configured to determine, based on the matching feature vector of the image to be matched and the matching feature vectors of the candidate images, a similar image similar to the image to be matched among the candidate images.
In this embodiment, the determining unit 906 may determine a similar image among the candidate images by measuring the distance between matching feature vectors. For example, the Hamming distance between the matching feature vector of the image to be matched and the matching feature vector of each candidate image may be calculated, and the candidate images whose Hamming distance meets a preset condition are determined as similar images similar to the image to be matched.
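On binary codes, the Hamming distance is simply the number of differing bits. A minimal sketch follows (NumPy assumed; the threshold parameter and function names are illustrative):

```python
import numpy as np

def hamming_distance(code_a: np.ndarray, code_b: np.ndarray) -> int:
    """Number of bit positions at which two {0, 1} codes differ."""
    return int(np.count_nonzero(code_a != code_b))

def find_similar(query_code, candidate_codes, max_dist):
    """Indices of candidates whose Hamming distance to the query meets
    the preset condition (here: distance <= max_dist)."""
    return [i for i, c in enumerate(candidate_codes)
            if hamming_distance(query_code, c) <= max_dist]
```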
Optionally, in this embodiment, the image matching apparatus may further include an output unit, as follows:
an output unit for outputting the similar image.
In the present embodiment, there are various ways of outputting the similar image. For example, the similar image may be saved locally to the image matching apparatus, or may be displayed on a display unit of the image matching apparatus.
The number of similar images in this embodiment is at least one. When there are multiple similar images, the similarity between each similar image and the image to be matched may be determined from the calculation in the determining unit 906, and the similar images may be sorted by similarity to obtain a similar image set composed of the sorted similar images. The output unit then outputs this sorted similar image set.
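A hedged sketch of this sorting step, building on the hamming_distance helper above (the threshold is again an illustrative assumption):

```python
def rank_similar(query_code, candidate_codes, max_dist):
    """Return candidate indices within max_dist, sorted by ascending
    Hamming distance, i.e. descending similarity to the query."""
    scored = sorted((hamming_distance(query_code, c), i)
                    for i, c in enumerate(candidate_codes))
    return [i for d, i in scored if d <= max_dist]
```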
Specifically, the neural network may be provided to the image matching apparatus after being trained by another device, or may be trained by the image matching apparatus itself. That is, optionally, as shown in fig. 10, the image matching apparatus may further include a third acquisition unit 907 and a training unit 908, as follows:
a third acquisition unit 907, configured to acquire training data, where the training data includes a plurality of training images and the training images carry objects of multiple categories.
The training unit 908 is specifically configured to: perform object classification training on the neural network based on the training data; extract feature maps under multiple scales from a training image through the feature extraction blocks in the trained neural network; process the feature maps of the training image to obtain a feature vector of the training image; and optimize the parameters of the neural network based on the feature vectors of the training images of the various objects, so that the difference between the feature vectors corresponding to training images of different objects within the same category meets a preset requirement.
The purpose of the classification training is to make the feature vectors of training data of the same category distribute within the same region of the feature space, while the feature vectors of training data of different categories are not highly similar.
Optionally, in this embodiment, at least two of the training images carry the same object, and the object information displayed by different training images of the same object is not completely identical. For example, where the training images are commodity photographs, there are at least two photographs of the same commodity taken from different angles.
In this embodiment, for each category of training images, the difference between the feature vectors corresponding to training images of different objects within the same category can be made to meet the preset requirement through the following process:
extracting three images from the training images to form a triplet, thereby obtaining a plurality of triplets, where two images in a triplet carry content belonging to the same object and are called the anchor example and the positive example, and the remaining image is a sample of a different object in the same category and is called the negative example; calculating the distance between the anchor feature vector and the positive feature vector and the distance between the anchor feature vector and the negative feature vector to obtain the loss of the triplet; and optimizing the parameters of the neural network through the losses of the triplets to update the feature vectors, repeating this optimization process until convergence or until a preset number of iterations is reached, so as to obtain the trained neural network.
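A minimal NumPy sketch of the triplet loss described above is given below; the margin value is an illustrative assumption, and in practice the loss would be computed and back-propagated in a deep learning framework:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: the anchor-positive distance should be
    smaller than the anchor-negative distance by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```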
As can be seen from the above, in this embodiment, after an image to be matched is obtained, feature maps under multiple scales may be extracted from the image to be matched through the extraction unit 902, the feature maps under the multiple scales are processed by the processing unit 903 to obtain a feature vector of the image to be matched, and the feature vector is encoded by the encoding unit 904 to obtain a matching feature vector for image matching, where the elements in the matching feature vector meet the preset requirements; next, the second acquisition unit 905 acquires the matching feature vectors of the candidate images, and the determining unit 906 determines, based on the matching feature vector of the image to be matched and the matching feature vectors of the candidate images, a similar image similar to the image to be matched among the candidate images. Because feature maps under multiple scales are extracted, the feature vector contains feature information of the image to be matched at multiple scales, giving the feature vector strong representational power; this can greatly improve the accuracy of image similarity matching, so that the matching effect is better.
Example four
An embodiment of the present application further provides an electronic device, as shown in fig. 11, which is a schematic structural diagram of the electronic device. Specifically:
the electronic device may include components such as a processor 1101 with one or more processing cores, a memory 1102 with one or more computer-readable storage media, a power supply 1103, and an input unit 1104. Those skilled in the art will appreciate that the electronic device structure shown in fig. 11 does not constitute a limitation of the electronic device; it may include more or fewer components than those shown, combine some components, or use a different arrangement of components. Wherein:
the processor 1101 is the control center of the electronic device; it connects the various parts of the entire electronic device using various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the software programs and/or modules stored in the memory 1102 and calling the data stored in the memory 1102, thereby monitoring the electronic device as a whole. Optionally, the processor 1101 may include one or more processing cores; preferably, the processor 1101 may integrate an application processor, which mainly handles the operating system, user interface, application programs, and the like, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 1101.
The memory 1102 may be used to store software programs and modules, and the processor 1101 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 1102. The memory 1102 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created according to the use of the electronic device, and the like. Further, the memory 1102 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 1102 may also include a memory controller to provide the processor 1101 with access to the memory 1102.
The electronic device further includes a power supply 1103 for supplying power to the components. Preferably, the power supply 1103 is logically connected to the processor 1101 through a power management system, so that functions such as managing charging, discharging, and power consumption are implemented through the power management system. The power supply 1103 may also include one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and other such components.
The electronic device may further include an input unit 1104, which may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
Although not shown, the electronic device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 1101 in the electronic device loads the executable files corresponding to the processes of one or more application programs into the memory 1102 according to the following instructions, and the processor 1101 runs the application programs stored in the memory 1102, thereby implementing various functions as follows:
acquiring an image to be matched; extracting feature maps under multiple scales from the image to be matched; processing the feature maps under multiple scales to obtain a feature vector of the image to be matched; coding the feature vector to obtain a matching feature vector for image matching, wherein elements in the matching feature vector meet preset requirements; acquiring matching feature vectors of the candidate images; and determining similar images similar to the image to be matched in the candidate images based on the matching feature vectors of the image to be matched and the matching feature vectors of the candidate images.
The neural network may be specifically provided to the image matching apparatus after being trained by another device, or may be trained by the image matching apparatus itself; for the specific training manner, reference may be made to the foregoing embodiments, and details are not described herein again.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
As can be seen from the above, in this embodiment, after an image to be matched is obtained, feature maps under multiple scales are extracted from the image to be matched, the feature maps under the multiple scales are processed to obtain a feature vector of the image to be matched, and the feature vector is then encoded to obtain a matching feature vector for image matching, where the elements in the matching feature vector meet the preset requirements; then, the matching feature vectors of the candidate images are acquired, and a similar image similar to the image to be matched is determined among the candidate images based on the matching feature vector of the image to be matched and the matching feature vectors of the candidate images. Because feature maps under multiple scales are extracted, the feature vector contains feature information of the image to be matched at multiple scales, giving the feature vector strong representational power; this can greatly improve the accuracy of image similarity matching, so that the matching effect is better.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, the present application provides a storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in the image matching method provided by the present application. For example, the instructions may perform the steps of:
acquiring an image to be matched; extracting feature maps under multiple scales from the image to be matched; processing the feature maps under multiple scales to obtain a feature vector of the image to be matched; coding the feature vector to obtain a matching feature vector for image matching, wherein elements in the matching feature vector meet preset requirements; acquiring matching feature vectors of the candidate images; and determining similar images similar to the image to be matched in the candidate images based on the matching feature vectors of the image to be matched and the matching feature vectors of the candidate images.
The neural network may be specifically provided to the image matching apparatus after being trained by another device, or may be trained by the image matching apparatus itself; that is, the instructions may further perform the following steps:
acquiring training data, where the training data includes a plurality of training images and the training images carry objects of multiple categories; performing object classification training on the neural network based on the training data; extracting feature maps under multiple scales from a training image through the feature extraction blocks in the trained neural network; processing the feature maps of the training image to obtain a feature vector of the training image; and optimizing the parameters of the neural network based on the feature vectors of the training images of the various objects, so that the difference between the feature vectors corresponding to training images of different objects within the same category meets the preset requirement.
The step of "optimizing the parameters of the neural network based on the feature vectors of the training images of the various objects so that the difference between the feature vectors corresponding to training images of different objects within the same category meets the preset requirement" may specifically include:
extracting three images from the training images to form a triplet, thereby obtaining a plurality of triplets, where two images in a triplet carry content belonging to the same object and are called the anchor example and the positive example, and the remaining image is a sample of a different object in the same category and is called the negative example; calculating the distance between the anchor feature vector and the positive feature vector and the distance between the anchor feature vector and the negative feature vector to obtain the loss of the triplet; and optimizing the parameters of the neural network through the losses of the triplets to update the feature vectors, repeating this optimization process until convergence or until a preset number of iterations is reached, so as to obtain the trained neural network.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Wherein the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the storage medium can execute the steps in the image matching method provided in the embodiment of the present application, the beneficial effects that can be achieved by the image matching method provided in the embodiment of the present application can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
The system related to the embodiments of the present application may be a distributed system formed by a client and a plurality of nodes (electronic devices of any form in an access network, such as servers and terminals) connected through network communication.
Taking the distributed system as a blockchain system as an example, referring to fig. 12, fig. 12 is an optional structural schematic diagram of the distributed system 100 applied to the blockchain system provided in this embodiment of the present application. The system is formed by a plurality of nodes 200 (computing devices of any form in an access network, such as servers and user terminals) and a client 300; a Peer-to-Peer (P2P) network is formed between the nodes, and the P2P protocol is an application-layer protocol running on top of the Transmission Control Protocol (TCP). In a distributed system, any machine, such as a server or a terminal, may join to become a node; a node comprises a hardware layer, an intermediate layer, an operating system layer, and an application layer. In this embodiment, information such as the candidate images, the categories of the candidate images, and the matching feature vectors may be stored in the shared ledger of the blockchain system through a node, and an electronic device (e.g., a terminal or a server) may acquire the matching feature vector of a candidate image based on the record data stored in the shared ledger.
Referring to the functions of each node in the blockchain system shown in fig. 12, the functions involved include:
1) Routing: a basic function of a node, used to support communication between nodes.
Besides the routing function, the node may also have the following functions:
2) Application: deployed in the blockchain to implement specific services according to actual business requirements; it records the data related to the implemented functions to form record data, carries a digital signature in the record data to indicate the source of the task data, and sends the record data to other nodes in the blockchain system, so that the other nodes add the record data to a temporary block when the source and integrity of the record data are verified successfully.
For example, the services implemented by the application include:
2.1) Wallet: provides functions for electronic money transactions, including initiating a transaction (i.e., sending the transaction record of the current transaction to other nodes in the blockchain system; after the other nodes verify it successfully, the record data of the transaction is stored in a temporary block of the blockchain as acknowledgment that the transaction is valid). Of course, the wallet also supports querying the electronic money remaining at an electronic money address.
2.2) Shared ledger: provides functions for operations such as storing, querying, and modifying account data; the record data of operations on the account data are sent to the other nodes in the blockchain system, and after the other nodes verify their validity, the record data are stored in a temporary block as acknowledgment that the account data are valid, and a confirmation may be sent to the node that initiated the operation.
2.3) Smart contracts: computerized agreements that can execute the terms of a contract, implemented by code deployed on the shared ledger and executed when certain conditions are met; they are used to complete automated transactions according to actual business requirements, for example querying the logistics status of goods purchased by a buyer and transferring the buyer's electronic money to the merchant's address after the buyer signs for the goods. Of course, smart contracts are not limited to executing contracts for transactions and may also execute contracts that process received information.
3) Blockchain: comprises a series of blocks connected to one another in the chronological order of their generation; once a new block is added to the blockchain, it cannot be removed, and the blocks record the record data submitted by the nodes in the blockchain system.
Referring to fig. 13, fig. 13 is an optional schematic diagram of a block structure provided in this embodiment. Each block includes the hash value of the transaction records stored in that block (the hash value of the block itself) and the hash value of the previous block, and the blocks are connected by these hash values to form a blockchain. A block may also include information such as a timestamp indicating when the block was generated. A blockchain is essentially a decentralized database, a chain of data blocks associated with one another using cryptography; each data block contains information used to verify the validity (anti-counterfeiting) of its information and to generate the next block.
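For illustration only, the hash-linked block structure of fig. 13 can be sketched as follows in Python; the field names and the use of SHA-256 over a JSON payload are assumptions for the sketch, not part of the embodiment:

```python
import hashlib
import json
import time

def make_block(records, prev_hash):
    """Illustrative block: JSON-serializable record data plus the previous
    block's hash, identified by the SHA-256 digest of its own contents."""
    block = {"timestamp": time.time(), "records": records,
             "prev_hash": prev_hash}
    payload = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block
```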
The foregoing describes in detail an image matching method, an image matching apparatus, an electronic device, and a storage medium provided in embodiments of the present application, and specific examples are applied herein to explain the principles and implementations of the present application, and the description of the foregoing embodiments is only used to help understand the method and the core ideas of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.