Summary of the invention
The objective of the invention is to provide a Web image clustering method based on mining the correlations between images and text. The method clusters Web image search results so that images of the same subject are gathered into one class, making retrieval more convenient for the user.
The Web image clustering method based on image-text correlation mining comprises the following steps:
(1) According to the user query, extract the images in the Google Image Search results together with their surrounding text, and extract the nouns in the surrounding text to form a vocabulary;
(2) Process the surrounding text and extract text features;
(3) Compute the visibility of each word in the vocabulary;
(4) Combine word visibility with the TF-IDF method to compute the word-image correlation;
(5) Analyze the surrounding-text collection with a topic model and extract the latent topic probability distributions to compute the topic relevance between any two words in the vocabulary;
(6) Use a complex graph model to model the word-image correlation and the word-word topic relevance;
(7) Apply a complex graph clustering algorithm to cluster the images.
Extracting, according to the user query, the images in the Google Image Search results and their surrounding text, and extracting the nouns in the surrounding text to form the vocabulary, proceeds as follows:
(1) Write a crawler program to download the images in the Google Image Search results, forming the image set IMG = {Image_1, ..., Image_{N_d}}, where N_d is the total number of images in IMG;
(2) Download the Web page containing each image in IMG and parse each page with a page-analysis program; after removing the HTML tags and punctuation, keep the remaining text content of the page as the surrounding text of the image;
(3) Apply part-of-speech tagging to the surrounding text of each image, remove the non-noun words and keep the nouns, forming the surrounding-text collection D = {d_1, ..., d_{N_d}}, where N_d is the total number of surrounding texts in D;
(4) Scan all words in each surrounding text d_i of D in order, i = 1, ..., N_d, keeping one copy of each distinct word, to form the vocabulary VOL = {w_1, ..., w_{N_w}}, where N_w is the total number of words in VOL.
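Steps (3) and (4) above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the part-of-speech tagger is replaced by a hypothetical `is_noun` predicate supplied by the caller, and each surrounding text is assumed to be already tokenized into a list of words.

```python
def surrounding_text_nouns(tokens, is_noun):
    """Step (3): keep only the tokens that the tagger marks as nouns.
    `is_noun` is a hypothetical predicate standing in for a real POS tagger."""
    return [t for t in tokens if is_noun(t)]


def build_vocabulary(docs):
    """Step (4): scan d_1, ..., d_Nd in order and keep one copy of each
    distinct word; dict.fromkeys preserves first-occurrence order."""
    return list(dict.fromkeys(w for d in docs for w in d))


nouns = {"bass", "lake", "guitar"}  # toy noun lexicon for illustration only
docs = [surrounding_text_nouns(["the", "bass", "in", "the", "lake"], nouns.__contains__),
        surrounding_text_nouns(["bass", "guitar", "solo"], nouns.__contains__)]
print(build_vocabulary(docs))  # ['bass', 'lake', 'guitar']
```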
Processing the surrounding text and extracting text features proceeds as follows:
(1) For each word w_i in VOL, i = 1, ..., N_w, where N_w is the total number of words in the vocabulary, scan each surrounding text d_j in D in order and count the number of occurrences n_ij of w_i in each document d_j, j = 1, ..., N_d, where N_d is the total number of surrounding texts; also count the number num(w_i) of surrounding texts in D that contain w_i;
(2) Compute the term frequency freq(w_i, d_j) of each word w_i in each surrounding text d_j according to formula (1), i = 1, ..., N_w, j = 1, ..., N_d;
(3) For each word w_i in VOL, compute its inverse document frequency idf(w_i) according to formula (2):
$$\mathrm{idf}(w_i) = \log\left(\frac{N_d}{\mathrm{num}(w_i)}\right) \qquad (2)$$
(4) According to the vector space model, represent each surrounding text d_j in D as an N_w-dimensional vector whose i-th dimension corresponds to the word w_i in the vocabulary and takes the value tfidf(w_i), computed as follows:
$$\mathrm{tfidf}(w_i) = \mathrm{freq}(w_i, d_j) \times \mathrm{idf}(w_i) \qquad (3)$$
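The four steps above can be sketched as follows. Formula (1) is not reproduced in this text, so the sketch assumes the usual length-normalized term frequency freq(w_i, d_j) = n_ij / Σ_k n_kj; it also assumes the natural logarithm in formula (2). Function and variable names are illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(docs, vocab):
    """docs: list of noun lists, one per surrounding text d_j.
    Returns one N_w-dimensional TF-IDF vector per document (formulas (1)-(3))."""
    n_d = len(docs)
    counts = [Counter(d) for d in docs]                          # n_ij
    num = {w: sum(1 for c in counts if c[w] > 0) for w in vocab}  # num(w_i)
    idf = {w: math.log(n_d / num[w]) for w in vocab}              # formula (2)
    vectors = []
    for c in counts:
        total = sum(c.values())
        # ASSUMED formula (1): freq(w_i, d_j) = n_ij / sum_k n_kj
        vectors.append([(c[w] / total if total else 0.0) * idf[w] for w in vocab])
    return vectors


docs = [["bass", "guitar"], ["bass", "fish", "fish"]]
vecs = tfidf_vectors(docs, ["bass", "guitar", "fish"])
```

A word occurring in every surrounding text (here "bass") gets idf = log(1) = 0, so it contributes nothing to any vector; this is the behavior the visibility factor introduced next is meant to complement.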
The visibility of each word in the vocabulary is computed as follows: the visibility value vis(w_i) of each word w_i in VOL is calculated by formula (4), where C_1 is the total number of results returned by Google Image Search when w_i is submitted as the query, and C_2 is the total number of results returned by Google text search when w_i is submitted as the query. The inverse document frequency factor IDF_Google(w_i) is computed as follows:
$$\mathrm{IDF}_{\mathrm{Google}}(w_i) = \log\left(\frac{|D_{\mathrm{Google}}|}{C_2}\right) \qquad (5)$$
where D_Google is the set of all Web pages indexed by Google and |D_Google| is the total number of pages in D_Google.
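A minimal sketch of the visibility computation. Formula (4) itself is not reproduced in this text; the sketch ASSUMES it is the ratio C_1/C_2 multiplied by the inverse document frequency factor of formula (5), which matches the later explanation that IDF_Google is used to suppress the C_1/C_2 value of topically broad words. Hit counts are passed in by the caller (no search engine is queried), and the index size defaults to the |D_Google| = 5 × 10^11 used in the embodiment.

```python
import math

GOOGLE_INDEX_SIZE = 5e11  # |D_Google| as used in the embodiment


def idf_google(c2, index_size=GOOGLE_INDEX_SIZE):
    """Formula (5): IDF_Google(w) = log(|D_Google| / C_2)."""
    return math.log(index_size / c2)


def visibility(c1, c2, index_size=GOOGLE_INDEX_SIZE):
    """ASSUMED form of formula (4): C_1 / C_2 damped by IDF_Google.
    c1: Google Image Search hit count; c2: Google text search hit count."""
    return (c1 / c2) * idf_google(c2, index_size)
```

Two words with the same C_1/C_2 ratio are then separated by C_2: the word that appears on far more Web pages (a large C_2, typical of topically broad words) receives the smaller visibility.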
Word visibility is combined with the TF-IDF method to compute the word-image correlation as follows: the correlation r(w_i, Image_j) between word w_i and image Image_j is calculated by formula (6), j = 1, ..., N_d, where N_d is the total number of surrounding texts:
$$r(w_i, \mathrm{Image}_j) = \mathrm{tfidf}(w_i) \times \mathrm{vis}(w_i) \qquad (6)$$
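Formula (6) applied to all word-image pairs amounts to scaling each word's row of TF-IDF values by that word's visibility. The sketch below assumes the TF-IDF values are given as one row per surrounding text d_j (N_d × N_w) and returns the N_w × N_d matrix of r(w_i, Image_j) values; names are illustrative.

```python
import numpy as np


def word_image_matrix(doc_vectors, vis):
    """doc_vectors: (N_d, N_w) array, row j = TF-IDF vector of d_j.
    vis: length-N_w visibility values.
    Returns A with A[i, j] = tfidf(w_i in d_j) * vis(w_i), i.e. formula (6)."""
    tfidf = np.asarray(doc_vectors, dtype=float).T       # (N_w, N_d)
    return tfidf * np.asarray(vis, dtype=float)[:, None]  # scale row i by vis(w_i)


A = word_image_matrix([[1.0, 2.0], [3.0, 4.0]], [10.0, 0.5])
```

Because image j and surrounding text d_j are in one-to-one correspondence, column j of the result holds the correlations of every vocabulary word with Image_j.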
Analyzing the surrounding-text collection with a topic model and extracting the latent topic probability distributions to compute the topic relevance between any two words in the vocabulary proceeds as follows:
(1) Take the vocabulary VOL, the surrounding-text collection D and the number of latent topics k set for D as the input of the latent Dirichlet allocation (LDA) topic model, and output the probability distribution P(z_j) of each latent topic z_j and the distribution P(w_i | z_j) of z_j over each word w_i, j = 1, ..., k;
(2) The topic relevance Topic_r(w_s, w_t) between any two words w_s and w_t in VOL is computed by the topic relevance function defined in formula (7), where σ is a normalization constant.
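Formula (7) is not reproduced in this text, so the sketch below uses a HYPOTHETICAL stand-in: the probability that w_s and w_t are generated by the same latent topic, Σ_j P(w_s | z_j) P(w_t | z_j) P(z_j), divided by σ. It also assumes σ defaults to the largest off-diagonal value so that relevances fall in [0, 1], and sets the diagonal to zero because the text only defines S_ij for i ≠ j. The inputs P(z_j) and P(w_i | z_j) are assumed to come from any LDA implementation.

```python
import numpy as np


def topic_relevance_matrix(p_z, p_w_given_z, sigma=None):
    """p_z: shape (k,), P(z_j); p_w_given_z: shape (k, N_w), P(w_i | z_j).
    HYPOTHETICAL stand-in for formula (7):
        Topic_r(w_s, w_t) = sum_j P(w_s|z_j) P(w_t|z_j) P(z_j) / sigma."""
    p_z = np.asarray(p_z, dtype=float)
    phi = np.asarray(p_w_given_z, dtype=float)
    S = phi.T @ (p_z[:, None] * phi)   # (N_w, N_w), symmetric by construction
    np.fill_diagonal(S, 0.0)           # only i != j is defined in the text
    if sigma is None:                  # assumed normalization into [0, 1]
        sigma = S.max() if S.max() > 0 else 1.0
    return S / sigma


S = topic_relevance_matrix([0.5, 0.5], [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]])
```

Any symmetric, non-negative pairwise function of the topic distributions could be substituted here; the result plays the role of the word-word matrix S in the complex graph model.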
The complex graph model is used to model the word-image correlation and the word-word topic relevance as follows. The complex graph model contains two different types of nodes, image nodes and word nodes; heterogeneous links between words and images and homogeneous links between words serve as the edges between nodes. The weight of a word-image link is computed by the word-image correlation r(w_i, Image_j) defined in formula (6), and the weight of a word-word link is computed by the word-word topic relevance function Topic_r(w_s, w_t) defined in formula (7). The complex graph model is expressed as the set of matrices shown in formula (8):
$$\left\{\, S \in \mathbb{R}_+^{N_w \times N_w},\; A \in \mathbb{R}_+^{N_w \times N_d} \,\right\} \qquad (8)$$
where the symmetric matrix S is the word-word correlation matrix, N_w is the total number of words in the vocabulary, and R_+ is the set of positive real numbers; the element S_ij (i ≠ j) denotes the topic relevance between words w_i and w_j, S_ij = Topic_r(w_i, w_j). The matrix A is the word-image correlation matrix, where N_d is the total number of images; its element A_ij denotes the correlation between word w_i and the j-th image Image_j, A_ij = tfidf(w_i) × vis(w_i).
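Clustering the images with the complex graph clustering algorithm can be expressed as the optimization problem defined in formula (9):
$$\min_{C^{(1)},\,C^{(2)},\,D \in \mathbb{R}^{k_1 \times k_1},\,B \in \mathbb{R}^{k_1 \times k_2}} \left\|S - C^{(1)} D \left(C^{(1)}\right)^{T}\right\|^{2} + \left\|A - C^{(1)} B \left(C^{(2)}\right)^{T}\right\|^{2}$$
$$\text{s.t.}\quad C^{(1)} \in \{0,1\}^{N_w \times k_1},\; C^{(2)} \in \{0,1\}^{N_d \times k_2},\; C^{(1)}\mathbf{1} = \mathbf{1},\; C^{(2)}\mathbf{1} = \mathbf{1} \qquad (9)$$
where every component of the vector 1 equals 1, k_1 and k_2 denote the numbers of word clusters and image clusters respectively, and the cluster indicator matrices C^(1) and C^(2) are the outputs of the complex graph clustering algorithm; the element C^(2)_pq = 1 indicates that the p-th image Image_p belongs to class q. The complex graph clustering algorithm that solves the optimization problem of formula (9) is given in Algorithm 1: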
Algorithm 1. Complex graph G_1 clustering algorithm CGC.
Input: matrices S and A;
Output: cluster indicator matrices C^(1) and C^(2), where k_1 and k_2 are the numbers of word clusters and image clusters respectively;
Step 1. Iterate Steps 2-5 until convergence;
Step 2. Compute $D = \left((C^{(1)})^{T} C^{(1)}\right)^{-1} (C^{(1)})^{T} S\, C^{(1)} \left((C^{(1)})^{T} C^{(1)}\right)^{-1}$;
Step 3. Compute $B = \left((C^{(1)})^{T} C^{(1)}\right)^{-1} (C^{(1)})^{T} A\, C^{(2)} \left((C^{(2)})^{T} C^{(2)}\right)^{-1}$;
Step 4. With D, B and C^(2) fixed, update C^(1) row by row so as to minimize L, where
$L = \|S - C^{(1)} D (C^{(1)})^{T}\|^{2} + \|A - C^{(1)} B (C^{(2)})^{T}\|^{2}$;
Step 5. With D, B and C^(1) fixed, update C^(2) row by row so as to minimize the same L:
$L = \|S - C^{(1)} D (C^{(1)})^{T}\|^{2} + \|A - C^{(1)} B (C^{(2)})^{T}\|^{2}$.
The image set IMG is clustered according to the cluster indicator matrix C^(2) output by Algorithm 1: if the matrix element C^(2)_pq = 1, the p-th image Image_p is assigned to class q, where p = 1, ..., N_d, N_d is the total number of images in IMG, q = 1, ..., k_2, and k_2 is the number of image clusters in IMG.
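A minimal NumPy sketch of Algorithm 1 and the final assignment rule follows. It is an illustration under stated assumptions, not the patented implementation: the initial clusters are random unless given (the text does not specify initialization); pseudo-inverses replace the ((C)^T C)^{-1} (C)^T factors of Steps 2-3, which is the same quantity when no cluster is empty and stays defined when one is; each row update of Steps 4-5 tries every cluster and keeps the one with the smallest L; and convergence is declared when L stops decreasing. Cluster indicator rows are stored as label vectors, so C^(2)_pq = 1 exactly when `l2[p] == q` (0-based).

```python
import numpy as np


def one_hot(labels, k):
    """Cluster indicator matrix: C[p, q] = 1 iff item p is in cluster q."""
    C = np.zeros((labels.size, k))
    C[np.arange(labels.size), labels] = 1.0
    return C


def cgc_loss(S, A, l1, l2, D, B):
    """L = ||S - C1 D C1^T||^2 + ||A - C1 B C2^T||^2, evaluated via labels."""
    return (((S - D[np.ix_(l1, l1)]) ** 2).sum()
            + ((A - B[np.ix_(l1, l2)]) ** 2).sum())


def cgc(S, A, k1, k2, max_iter=100, tol=1e-9, seed=0, init=None):
    """Sketch of Algorithm 1. S: (N_w, N_w) symmetric; A: (N_w, N_d).
    Returns word labels, image labels and indicator matrices C1, C2."""
    Nw, Nd = A.shape
    if init is None:  # assumed: random initial clusters
        rng = np.random.default_rng(seed)
        l1, l2 = rng.integers(k1, size=Nw), rng.integers(k2, size=Nd)
    else:
        l1, l2 = (np.array(x, dtype=int) for x in init)
    prev = np.inf
    for _ in range(max_iter):                      # Step 1
        # Steps 2-3: least-squares D and B via pseudo-inverses.
        P1 = np.linalg.pinv(one_hot(l1, k1))
        P2 = np.linalg.pinv(one_hot(l2, k2))
        D = P1 @ S @ P1.T
        B = P1 @ A @ P2.T
        # Step 4: row i of C1 only affects row/column i of S and row i of A.
        for i in range(Nw):
            mask = np.arange(Nw) != i
            o = l1[mask]
            cost = (((S[i, mask] - D[:, o]) ** 2).sum(axis=1)
                    + ((S[mask, i] - D[o, :].T) ** 2).sum(axis=1)
                    + (S[i, i] - np.diag(D)) ** 2
                    + ((A[i] - B[:, l2]) ** 2).sum(axis=1))
            l1[i] = np.argmin(cost)
        # Step 5: given C1 and B, each row of C2 can be chosen independently.
        Bw = B[l1]                                           # (N_w, k2)
        l2 = ((A[:, :, None] - Bw[:, None, :]) ** 2).sum(axis=0).argmin(axis=1)
        loss = cgc_loss(S, A, l1, l2, D, B)
        if prev - loss < tol:                      # L stopped decreasing
            break
        prev = loss
    return l1, l2, one_hot(l1, k1), one_hot(l2, k2)
```

Since Steps 2-3 give the least-squares D and B for fixed indicator matrices and every row update picks the cluster with the smallest contribution to L, L never increases from one iteration to the next; as with other alternating schemes, the result can depend on the initialization.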
The beneficial effects of the invention are as follows. The invention combines a word visibility model with the traditional TF-IDF method to define the correlation between words and images, overcoming the limitation that TF-IDF, being a text-processing technique, cannot directly measure the correlation between words and images. It also models the word-image correlation and the word-word topic relevance with a complex graph, yielding a new Web image clustering framework that improves Web image clustering precision, so that image search results are organized by topic and users can retrieve images more conveniently.
Embodiment
The invention proposes a Web image clustering method based on image-text correlation mining; its implementation is described in detail below with reference to the accompanying drawings.
Five visually polysemous words were selected as queries: "apple", "bass", "jaguar", "mouse" and "tower". A crawler program was written to automatically retrieve the results returned by Google Image Search™ for each submitted query keyword. For each image in the results, the image file and the Web page containing the image were downloaded. Because Google limits the number of results actually returned per search, the data set contains about 4,000 items in total. To extract the surrounding text of each image, the Web page containing the image was parsed and the text around the image was taken as its surrounding text. All surrounding texts were part-of-speech tagged and their nouns extracted. For each query, the noun vocabulary of the surrounding text contains 1,000 to 2,000 words. To obtain the ground-truth class label vectors, we manually labeled the category of each image in the data set.
Figure 1 shows the workflow of the key steps of the invention, taking the submitted query "bass" as an example. The concrete implementation steps are:
1. Write a crawler program to download all images in the Google Image Search results together with the Web pages containing them; parse each HTML page with a page parser and remove the HTML tags and punctuation, obtaining the image set IMG = {Image_1, ..., Image_{N_d}} shown in Fig. 1(a) and the surrounding-text collection D = {d_1, ..., d_{N_d}}, where N_d is the total number of surrounding texts and also the total number of images;
2. Use a part-of-speech tagging program to tag each surrounding text d_i, i = 1, ..., N_d, removing the non-noun words and keeping the nouns;
3. Scan all words in each surrounding text d_i of D in order, keeping one copy of each distinct word, to form the vocabulary VOL = {w_1, ..., w_{N_w}}, where N_w is the total number of words in VOL; for each word w_i in VOL, count the number of occurrences n_ij of w_i in each document d_j and the number num(w_i) of surrounding texts in D that contain w_i;
4. Extract the text features of each surrounding text d_j (j = 1, ..., N_d) as follows:
(1) For each word w_i in VOL, i = 1, ..., N_w, where N_w is the total number of words in the vocabulary, compute the term frequency freq(w_i, d_j) of w_i in the surrounding text d_j according to formula (1);
(2) For each word w_i in VOL, compute its inverse document frequency idf(w_i) = log(N_d / num(w_i));
(3) According to the vector space model, represent document d_j as an N_w-dimensional vector whose i-th dimension corresponds to the word w_i in the vocabulary and takes the value tfidf(w_i) = freq(w_i, d_j) × idf(w_i);
5. For each word w_i in VOL, compute its visibility vis(w_i) according to formula (4), where C_1 is the total number of results returned by Google Image Search when w_i is submitted as the query, and C_2 is the total number of results returned by Google text search when w_i is submitted as the query. The inverse document frequency factor IDF_Google(w_i) is computed as follows:
IDF_Google(w_i) = log(|D_Google| / C_2)
where D_Google is the set of all Web pages indexed by Google and |D_Google| is the total number of pages in D_Google; in this embodiment |D_Google| = 5 × 10^11.
The visibility of a word, especially a noun, reflects the degree to which its semantics can be depicted by an image. From the perspective of cognitive psychology and thinking in images, a high-visibility word such as "banana" forms a direct visual image in the human mind more easily than a low-visibility word such as "Bayesian". Visibility can therefore serve as a new attribute of words for expressing the semantic association between words and images. The words around an image on a Web page have different degrees of visibility, and high-visibility words have a stronger ability to describe the semantics of the image. The ratio C_1/C_2 can serve, to some extent, as a quantitative index of the visibility of different words; for example, the C_1/C_2 value of "banana" is greater than that of "Bayesian". Take Fig. 2 as an example: this image is one of the top 5 results returned by Google Image Search for the query "bass". The C_1 and C_2 values of the nouns in its surrounding text were retrieved from Google in May 2009 and are listed in Table 1. As shown in Fig. 3, the C_1/C_2 values of words such as "legend", "record" and "scale" are greater than those of "largemouth" and "fishermen". By the definition of visibility, however, "largemouth" and "fishermen" should have higher visibility, because they are the two main objects in this image. The reason is that topically broad words such as "record" appear in large numbers on Web pages in general and also in the surrounding text of images, which raises their C_1/C_2 values. Because topically broad words usually have very large C_2 values, the visibility model proposed by the invention uses the inverse document frequency factor IDF_Google(w_i) = log(|D_Google|/C_2) to suppress their visibility, where |D_Google| is the total number of Web pages indexed by Google. The vis(w) values of the nouns are shown in Fig. 3: "largemouth" and "fishermen" have the largest vis(w) values, which supports the soundness of the proposed visibility model.
Table 1. C_1 and C_2 values of the nouns in the surrounding text of the image in Fig. 2
6. For each word w_i in VOL and each image Image_j, compute the correlation r(w_i, Image_j) = tfidf(w_i) × vis(w_i), and construct the word-image correlation matrix A (an N_w × N_d matrix over R_+), whose element A_ij denotes the degree of correlation between word w_i and the j-th image Image_j, A_ij = r(w_i, Image_j).
7. For any two words w_s and w_t in VOL, compute their topic relevance and construct the word-word correlation matrix, as follows:
(1) Take the vocabulary VOL, the surrounding-text collection D and the number of latent topics k as the input of the LDA topic model, and output the probability distribution P(z_j) of each latent topic z_j (j = 1, ..., k) and the distribution P(w_i | z_j) of z_j over each word w_i;
(2) Compute the topic relevance Topic_r(w_s, w_t) between any two words w_s and w_t by formula (7), where σ is a normalization constant;
(3) Construct the word-word correlation matrix, the symmetric matrix S (an N_w × N_w matrix over R_+), whose element S_ij (i ≠ j) denotes the topic relevance between words w_i and w_j, S_ij = Topic_r(w_i, w_j).
8. The above steps yield the complex graph model shown in Fig. 1(b), which is expressed as the set of matrices {S, A} of formula (8). The image set IMG is then clustered by applying the complex graph clustering algorithm, that is, by solving the optimization problem of formula (9) with Algorithm 1 (CGC) given above, using S and A as input and obtaining the cluster indicator matrices C^(1) and C^(2) as output, where every component of the vector 1 equals 1 and k_1 and k_2 are the numbers of word clusters and image clusters respectively. If the matrix element C^(2)_pq = 1, the p-th image Image_p is assigned to class q, where p = 1, ..., N_d, N_d is the total number of images in IMG, q = 1, ..., k_2, and k_2 is the number of image clusters in IMG.
As shown in Fig. 1, the clustering result is obtained through the above steps; in this example the images are grouped into 3 topic classes: "bass fishing", "bass fish" and "bass guitar".
To demonstrate the effectiveness of the core contributions of the invention and the overall performance of the clustering framework, we compare the following clustering results:
(1) Clustering with r(w_i, Image_j) = tfidf(w_i) versus clustering with r(w_i, Image_j) = tfidf(w_i) × vis(w_i);
(2) Comparing the topic relevance Topic_r(w_s, w_t) between words with the co-occurrence correlation P(w_s, w_t). The co-occurrence correlation of any two words w_s and w_t in the surrounding text of images is defined as the probability that they appear together in the surrounding text of the same image, P(w_s, w_t) = num(w_s, w_t)/N_d, where num(w_s, w_t) is the number of images whose surrounding text contains both w_s and w_t. Combining topic relevance with co-occurrence correlation, the weight of the homogeneous word-word link is defined as λP(w_s, w_t) + (1 − λ)Topic_r(w_s, w_t), where λ ∈ [0, 1] is an adjustable parameter.
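The co-occurrence baseline and the mixed word-word weight described in comparison (2) can be sketched as follows. The sketch assumes each surrounding text is a list of nouns from the vocabulary, and it zeroes the diagonal because only pairs w_s ≠ w_t enter the graph; the Topic_r matrix is taken as an input.

```python
import numpy as np


def cooccurrence_matrix(docs, vocab):
    """P(w_s, w_t) = num(w_s, w_t) / N_d, where num counts the surrounding
    texts (one per image) that contain both words."""
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))   # X[j, i] = 1 iff w_i occurs in d_j
    for j, d in enumerate(docs):
        for w in set(d):
            X[j, index[w]] = 1.0
    P = X.T @ X / len(docs)
    np.fill_diagonal(P, 0.0)                # only pairs s != t are used
    return P


def word_link_weights(P, topic_r, lam):
    """Homogeneous word-word link weight: lam * P + (1 - lam) * Topic_r."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * np.asarray(P) + (1.0 - lam) * np.asarray(topic_r)


P = cooccurrence_matrix([["a", "b"], ["a", "c"], ["a", "b", "b"]], ["a", "b", "c"])
```

With lam = 1 the graph uses co-occurrence only, with lam = 0 it uses topic relevance only, and intermediate values such as 0.15 blend the two, matching the three cases evaluated below.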
Clustering performance is evaluated with normalized mutual information (NMI), defined as follows. Given the number of clusters k, a class label vector λ = (λ_1, ..., λ_n) takes values λ_i ∈ {1, ..., k}, where λ_i = j means that the i-th data item belongs to class C_j. Let λ^(a) and λ^(b) denote the clustering result and the ground-truth class label vector respectively. The normalized mutual information φ^(NMI) of λ^(a) and λ^(b) is defined as:
$$\phi^{(NMI)}\left(\lambda^{(a)}, \lambda^{(b)}\right) = \frac{\sum_{h=1}^{k}\sum_{l=1}^{k} n_{h,l} \log\left(\frac{n \cdot n_{h,l}}{n_h^{(a)}\, n_l^{(b)}}\right)}{\sqrt{\left(\sum_{h=1}^{k} n_h^{(a)} \log\frac{n_h^{(a)}}{n}\right)\left(\sum_{l=1}^{k} n_l^{(b)} \log\frac{n_l^{(b)}}{n}\right)}}$$
where n is the total number of data items, n_h^(a) is the number of data items in class C_h of λ^(a), n_l^(b) is the number of data items in class C_l of λ^(b), and n_{h,l} is the number of data items that are both in class C_h of λ^(a) and in class C_l of λ^(b). The larger the NMI value φ^(NMI)(λ^(a), λ^(b)) between a clustering result λ^(a) and the ground truth λ^(b), the better the clustering; a perfect clustering gives φ^(NMI)(λ^(a), λ^(b)) = 1.
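The NMI definition above translates directly into the counting sketch below. It assumes label vectors are given as equal-length Python sequences; when either labeling has a single cluster the denominator is zero and the measure is undefined, so the sketch returns 0.0 by an assumed convention.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information phi^(NMI)(lambda^(a), lambda^(b))."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    na, nb = Counter(labels_a), Counter(labels_b)   # n_h^(a), n_l^(b)
    nab = Counter(zip(labels_a, labels_b))          # n_{h,l}
    num = sum(c * math.log(n * c / (na[h] * nb[l])) for (h, l), c in nab.items())
    ha = sum(c * math.log(c / n) for c in na.values())
    hb = sum(c * math.log(c / n) for c in nb.values())
    if ha == 0 or hb == 0:   # a single cluster on either side: assumed 0.0
        return 0.0
    return num / math.sqrt(ha * hb)
```

Because only counts enter the formula, NMI ignores how clusters are named: `nmi([0, 0, 1, 1], [1, 1, 0, 0])` is 1, while a labeling independent of the ground truth, such as `[0, 1, 0, 1]` against `[0, 0, 1, 1]`, scores 0.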
For the parameter λ, three cases are considered:
1)λ=1;
2)λ=0;
3)λ=0.15;
As shown in Fig. 4, the NMI of the complex graph clustering reaches its best value at λ = 0 for all 5 queries, which demonstrates the soundness of the word-word topic relevance proposed by the invention.
In Fig. 4, "λ=0 (vis(w))" denotes that the word-image link weight is A_ij = tfidf(w_i) × vis(w_i). The NMI results show that, in complex graph clustering, introducing word visibility into the word-image link weight lets high-visibility words pass more topic relevance information to the image nodes they are linked to, which improves clustering performance.
Fig. 5 shows the image clustering results for the query "jaguar". Comparing panels (a) and (b) shows that clustering performance improves when visibility is introduced, which strengthens the link weights between images and the words that describe specific objects in them.
Fig. 6 shows the top 10 images in each of the three topics obtained when the Web image clustering method based on image-text correlation mining of the invention is applied to the results returned by Google Image Search for the query "mouse": the first column is the topic "computer mouse", the second column is the topic "mouse animal", and the third column is the topic "Mickey mouse". Images with red dashed borders are incorrectly clustered items.