CN101582080A

Info

Publication number: CN101582080A
Application number: CNA2009101000718A
Authority: CN
Inventors: 庄越挺; 吴飞; 韩亚洪
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2009-06-22
Filing date: 2009-06-22
Publication date: 2009-11-18
Anticipated expiration: 2029-06-22
Also published as: CN101582080B

Abstract

The invention discloses a Web image clustering method based on mining the correlation between images and text. The method comprises the following steps: (1) extracting the images in the Google Image Search results for a query, together with their accompanying text; (2) extracting the nouns in the accompanying text to form a vocabulary; (3) computing the visibility of each word in the vocabulary and integrating it with the TF-IDF method to compute the word-image relevance; (4) computing the topic relatedness between any two words in the vocabulary; (5) modeling these correlations with a complex graph; (6) applying a complex graph clustering algorithm to cluster the images. By combining word visibility with the TF-IDF method to define the relevance between words and images, the invention overcomes the limitation that TF-IDF, as a text-processing technique, cannot directly measure the relevance between a word and an image. By modeling the word-image and word-word correlations with a complex graph, it provides a Web image clustering framework that groups image retrieval results by topic, which makes retrieval more convenient for users.

Description

Web image clustering method based on image and text correlation mining

Technical field

The present invention relates to multimedia retrieval, and in particular to a Web image clustering method based on image and text correlation mining.

Background technology

Searching for images by keyword remains a common and effective retrieval method on the Web, as in the image search services of commercial search engines such as Google and AltaVista. In Web image retrieval, the keyword submitted by the user is often visually polysemous, that is, it has several distinct visual meanings. For example, the word "mouse" can refer to topics such as "computer mouse", "mouse animal" and "Mickey mouse". When such a visually polysemous word is used as a query, the returned image search results therefore cover several topics, with images of different topics mixed together. A post-processing step is needed to group the retrieved images by topic. Recently, many researchers have proposed Web image clustering methods to solve this problem. Because of the "semantic gap" between low-level image features and high-level semantics, these clustering methods often exploit, at the same time, the multi-modal information contained in the image collection being clustered, such as visual features, text and links. Multi-modal information from different feature spaces is interrelated, and mining these correlations to learn how to fuse multi-modal information has been a focus of recent machine learning research; representative work includes multi-view learning and transfer learning. The former learns simultaneously from multiple feature-space representations of the same data, while the latter studies learning problems in which the training data and test data have different distributions or belong to different feature spaces. The present invention mines the correlation between the two modalities of text and image, models the relationships with a graph model, and uses a graph clustering algorithm to cluster Web images.

A Web image usually coexists with its accompanying text in an HTML page, and the accompanying text and some text labels describe the semantic content of the image. In the fields of Web image retrieval and annotation, much research has exploited the correlation between images and text. However, different words in the accompanying text contribute differently to describing the image semantics. For some words, such as "chairs", a suitable image that vividly depicts the meaning of the word is easy to find; other words, such as "statistics", are more abstract, and a suitable image is hard to find. From the perspective of visual thinking, this difference shows that words and images have different semantic associations, and that words have a "visibility" attribute. The visibility of a word is the probability that the word can be visually perceived. As a text-processing technique, TF-IDF cannot directly measure the relevance between a word and an image, and the traditional use of TF-IDF to weigh the importance of words in the accompanying text to the image ignores, to some extent, the visual characteristics of the image itself. The present invention therefore proposes a word visibility model and combines it with the TF-IDF method to define a new word-image relevance.

On the other hand, for a Web image collection that covers several topics, the topic information latent in the accompanying text indirectly reflects the topic relatedness between images. To introduce this topic relatedness into Web image clustering, the present invention uses latent Dirichlet allocation to learn the latent topic probabilities distributed over each word, and computes the word-word topic relatedness with a defined topic relatedness function. Latent Dirichlet allocation (LDA) is an unsupervised learning model, proposed in recent years, that can extract the latent topics of text. As a generative probabilistic model, it models a collection of discrete data such as a text corpus. In the field of text representation, LDA is a typical topic model and can model the topic information contained in text data.

The present invention therefore obtains two kinds of relationships by mining the correlation between images and their accompanying text: the word-image relevance and the word-word topic relatedness. These relationships can be modeled with a graph model. A traditional graph model can only model homogeneous links between nodes of a single type. A bipartite graph can model two types of nodes, but it contains only heterogeneous links between nodes of different types. The two relationships involved in the present invention include both heterogeneous links between two different types of nodes (words and images) and homogeneous links between nodes of the same type (words and words). The invention therefore models the two relationships with a more general complex graph model and applies a complex graph clustering algorithm to cluster the images.

Summary of the invention

The object of the invention is to cluster Web image search results so that images of the same topic are gathered into one class, making retrieval more convenient for users. To this end, a Web image clustering method based on image and text correlation mining is proposed.

The Web image clustering method based on image and text correlation mining comprises the following steps:

(1) extracting, according to the user query, the images in the Google Image Search results and their accompanying text, and extracting the nouns in the accompanying text to form a vocabulary;

(2) performing text processing on the accompanying text and extracting text features;

(3) computing the visibility of each word in the vocabulary;

(4) integrating the word visibility with the TF-IDF method to compute the word-image relevance;

(5) analyzing the accompanying text collection with a topic model and extracting the latent topic probability distributions, so as to compute the topic relatedness between any two words in the vocabulary;

(6) modeling the word-image relevance and the word-word topic relatedness with a complex graph model;

(7) applying a complex graph clustering algorithm to cluster the images.

The step of extracting, according to the user query, the images in the Google Image Search results and their accompanying text, and extracting the nouns in the accompanying text to form a vocabulary, is as follows:

(1) write a crawler program to download the images in the Google Image Search results, forming the image set $IMG = \{Image_1, \ldots, Image_{N_d}\}$, where $N_d$ is the total number of images in $IMG$;

(2) download the Web page containing each image in $IMG$ and parse each page with a page parser; after removing HTML markup and punctuation, keep the text content of the page as the accompanying text of the image;

(3) perform part-of-speech tagging on the accompanying text of each image, remove the non-noun words and keep the nouns, forming the accompanying text set $D = \{d_1, \ldots, d_{N_d}\}$, where $N_d$ is the total number of accompanying texts in $D$;

(4) sequentially scan all the words in each accompanying text $d_i$ of $D$, where $i = 1, \ldots, N_d$, keeping one copy of each distinct word, to form the vocabulary $VOL = \{w_1, \ldots, w_{N_w}\}$ represented as a word list, where $N_w$ is the total number of words in $VOL$.
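Sub-step (4) above can be sketched in Python as follows. This is only a minimal illustration: it assumes that crawling, page parsing and part-of-speech tagging have already been done, so each accompanying text arrives as a list of nouns. The function name `build_vocabulary` and the toy data are hypothetical and do not come from the patent.

```python
from typing import List


def build_vocabulary(noun_texts: List[List[str]]) -> List[str]:
    """Scan the noun-only accompanying texts in order, keeping each distinct word once."""
    seen = set()
    vocabulary = []
    for text in noun_texts:          # d_1, ..., d_{N_d}
        for word in text:
            if word not in seen:     # keep only the first occurrence of each word
                seen.add(word)
                vocabulary.append(word)
    return vocabulary


# Hypothetical toy data: two accompanying texts already reduced to nouns.
D = [["bass", "fish", "lake"], ["bass", "guitar", "fish"]]
VOL = build_vocabulary(D)
```

The vocabulary keeps the order of first occurrence, which makes the index $i$ of each word $w_i$ reproducible across runs.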

The step of performing text processing on the accompanying text and extracting text features is as follows:

(1) for each word $w_i$ in the vocabulary $VOL$, where $i = 1, \ldots, N_w$ and $N_w$ is the total number of words in the vocabulary, sequentially scan each accompanying text $d_j$ in $D$ and count the number of times $n_{ij}$ that $w_i$ occurs in $d_j$, where $j = 1, \ldots, N_d$ and $N_d$ is the total number of accompanying texts; also count the number $num(w_i)$ of accompanying texts in $D$ that contain $w_i$;

(2) compute the term frequency $freq(w_i, d_j)$ of each word $w_i$ in each accompanying text $d_j$ according to formula (1):

$$freq(w_i, d_j) = \frac{n_{ij}}{\sum_{k=1}^{N_w} n_{kj}} \quad (1)$$

(3) for each word $w_i$ in $VOL$, compute its inverse document frequency $idf(w_i)$ according to formula (2):

$$idf(w_i) = \log\frac{N_d}{num(w_i)} \quad (2)$$

(4) following the vector space model, represent each accompanying text $d_j$ in $D$ as an $N_w$-dimensional vector whose $i$-th dimension corresponds to the word $w_i$ in the vocabulary and takes the value $tfidf(w_i)$, computed as:

$$tfidf(w_i) = freq(w_i, d_j) \times idf(w_i) \quad (3)$$
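Formulas (1) to (3) can be sketched as follows. The sketch assumes a natural logarithm (the text does not state the base) and assumes that every word of a text belongs to the vocabulary, so the denominator of formula (1) equals the text length. The function name `tfidf_vectors` and the toy data are hypothetical.

```python
import math
from typing import List


def tfidf_vectors(texts: List[List[str]], vocabulary: List[str]) -> List[List[float]]:
    """Return one N_w-dimensional TF-IDF vector per accompanying text (formulas 1-3)."""
    n_d = len(texts)
    # num(w_i): number of accompanying texts that contain w_i
    num = {w: sum(1 for t in texts if w in t) for w in vocabulary}
    vectors = []
    for text in texts:
        total = len(text)  # sum_k n_kj, assuming all words of the text are in the vocabulary
        row = []
        for w in vocabulary:
            freq = text.count(w) / total if total else 0.0    # formula (1)
            idf = math.log(n_d / num[w]) if num[w] else 0.0    # formula (2), natural log assumed
            row.append(freq * idf)                             # formula (3)
        vectors.append(row)
    return vectors


# Hypothetical toy data.
texts = [["bass", "fish", "lake"], ["bass", "guitar", "guitar"]]
vocab = ["bass", "fish", "lake", "guitar"]
vecs = tfidf_vectors(texts, vocab)
```

In this toy data "bass" occurs in every text, so its inverse document frequency and therefore its TF-IDF value are zero in both vectors.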

The visibility of each word in the vocabulary is computed as follows: the visibility value $vis(w_i)$ of each word $w_i$ in $VOL$ is given by formula (4):

$$vis(w_i) = \left(\frac{C_1 + 10^{-9}}{C_2 + 10^{-9}}\right)^{-IDF_{Google}(w_i)} \quad (4)$$

where $C_1$ is the total number of results returned by Google Image Search when $w_i$ is submitted as the query, and $C_2$ is the total number of results returned by Google text search when $w_i$ is submitted as the query. The exponent factor $IDF_{Google}(w_i)$ is computed as:

$$IDF_{Google}(w_i) = \log\frac{|D_{Google}|}{C_2} \quad (5)$$

where $D_{Google}$ is the set of all Web pages indexed by Google and $|D_{Google}|$ is the total number of pages in $D_{Google}$.
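The following sketch evaluates formulas (4) and (5) literally, exactly as they are written above. It assumes a natural logarithm and $C_2 > 0$; the function names and the counts passed in are hypothetical and are not taken from Table 1.

```python
import math

EPS = 1e-9  # smoothing term that appears in formula (4)


def idf_google(c2: float, total_pages: float) -> float:
    """Formula (5): log(|D_Google| / C2). Requires c2 > 0; natural log assumed."""
    return math.log(total_pages / c2)


def visibility(c1: float, c2: float, total_pages: float) -> float:
    """Formula (4), evaluated literally: ((C1 + eps) / (C2 + eps)) ** (-IDF_Google)."""
    ratio = (c1 + EPS) / (c2 + EPS)
    return ratio ** (-idf_google(c2, total_pages))


# Hypothetical counts, for illustration only.
v_equal = visibility(c1=1_000.0, c2=1_000.0, total_pages=5e11)
```

When $C_1 = C_2$ the base of the power is 1, so the visibility is 1 regardless of the exponent; for other counts the result depends on both the ratio and the exponent factor.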

The word visibility is integrated with the TF-IDF method to compute the word-image relevance as follows: the relevance $r(w_i, Image_j)$ between word $w_i$ and image $Image_j$ is given by formula (6), where $j = 1, \ldots, N_d$, $N_d$ is the total number of accompanying texts, and $tfidf(w_i)$ is evaluated on the accompanying text $d_j$ of $Image_j$:

$$r(w_i, Image_j) = tfidf(w_i) \times vis(w_i) \quad (6)$$
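Formula (6) can be applied to all word-image pairs at once, as in the sketch below. It assumes the TF-IDF values are stored as one row per accompanying text and the visibility values as one entry per word; the function name `relevance_matrix` and the numbers are hypothetical.

```python
from typing import List


def relevance_matrix(tfidf_by_text: List[List[float]], vis: List[float]) -> List[List[float]]:
    """Return an N_w x N_d matrix whose (i, j) entry is tfidf of w_i in d_j times vis(w_i)."""
    n_d = len(tfidf_by_text)
    n_w = len(vis)
    return [[tfidf_by_text[j][i] * vis[i] for j in range(n_d)] for i in range(n_w)]


# Hypothetical inputs: TF-IDF values for 2 texts x 3 words, and 3 visibility values.
tfidf_by_text = [[0.0, 0.5, 0.25], [0.5, 0.0, 0.25]]
vis = [2.0, 4.0, 1.0]
A = relevance_matrix(tfidf_by_text, vis)
```

The result has one row per word and one column per image, so scaling by visibility changes the weight of a whole row without changing which images a word is linked to.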

The step of analyzing the accompanying text collection with a topic model and extracting the latent topic probability distributions, so as to compute the topic relatedness between any two words in the vocabulary, is as follows:

(1) take the vocabulary $VOL$, the accompanying text set $D$ and the number $k$ of latent topics in $D$ as the input of the latent Dirichlet allocation topic model, and output the probability $P(z_j)$ of each latent topic $z_j$ and the probability distribution $P(w_i \mid z_j)$ of $z_j$ over each word $w_i$, where $j = 1, \ldots, k$;

(2) the topic relatedness $Topic\_r(w_s, w_t)$ between any two words $w_s$ and $w_t$ in $VOL$ is computed by the topic relatedness function defined in formula (7), where $\sigma$ is a normalization constant:

$$\begin{aligned} Topic\_r(w_s, w_t) &= \max_j P(z = j \mid w_s)\, P(z = j \mid w_t) \\ &= \max_j \frac{p(w_s \mid z = j)\, P(z = j)}{P(w_s)} \cdot \frac{p(w_t \mid z = j)\, P(z = j)}{P(w_t)} \\ &= \max_j \frac{p(w_s \mid z = j)\, p(w_t \mid z = j)\, P(z = j)}{\sigma} \end{aligned} \quad (7)$$
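The sketch below evaluates the last line of formula (7) for one pair of words. It assumes the topic model has already been fitted by some LDA implementation and that its output is available as plain lists; the value of $\sigma$ is not given in the text, so it is exposed as a parameter with an assumed default of 1. The function name and the probabilities are hypothetical.

```python
from typing import List


def topic_relatedness(p_w_given_z: List[List[float]], p_z: List[float],
                      s: int, t: int, sigma: float = 1.0) -> float:
    """Last line of formula (7): max_j p(w_s|z=j) * p(w_t|z=j) * P(z=j) / sigma."""
    return max(p_w_given_z[j][s] * p_w_given_z[j][t] * p_z[j]
               for j in range(len(p_z))) / sigma


# Hypothetical LDA output for k = 2 topics and 3 words (each row sums to 1).
p_z = [0.5, 0.5]
p_w_given_z = [[0.5, 0.5, 0.0],     # p(w | z = 1)
               [0.0, 0.25, 0.75]]   # p(w | z = 2)
same_topic = topic_relatedness(p_w_given_z, p_z, 0, 1)
cross_topic = topic_relatedness(p_w_given_z, p_z, 0, 2)
```

Two words that are both likely under the same topic get a positive score, while two words that never share a topic get zero; the function is symmetric in its two word indices.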

The word-image relevance and the word-word topic relatedness are modeled with a complex graph model as follows: the complex graph model contains two different types of nodes, image nodes and word nodes; the heterogeneous links between words and images and the homogeneous links between words and words serve as the edges between nodes. The weight of a word-image link is the word-image relevance $r(w_i, Image_j)$ defined in formula (6), and the weight of a word-word link is the word-word topic relatedness $Topic\_r(w_s, w_t)$ defined in formula (7). The complex graph model is expressed as the set of matrices in formula (8):

$$\left\{ S \in \mathbb{R}_+^{N_w \times N_w},\ A \in \mathbb{R}_+^{N_w \times N_d} \right\} \quad (8)$$

where the symmetric matrix $S \in \mathbb{R}_+^{N_w \times N_w}$ is the word-word correlation matrix, $N_w$ is the total number of words in the vocabulary, $\mathbb{R}_+$ is the set of positive real numbers, and the matrix element $S_{ij}$ ($i \neq j$) is the topic relatedness between words $w_i$ and $w_j$, $S_{ij} = Topic\_r(w_i, w_j)$. The matrix $A \in \mathbb{R}_+^{N_w \times N_d}$ is the word-image correlation matrix, $N_d$ is the total number of images, and the matrix element $A_{ij}$ is the relevance between word $w_i$ and the $j$-th image $Image_j$, $A_{ij} = tfidf(w_i) \times vis(w_i)$.
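As an illustration of the data structure, the pair of matrices in formula (8) can be held in a small container that checks the stated shapes and the symmetry of $S$. The class name `ComplexGraph` and its properties are hypothetical; the patent only specifies the two matrices.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass
class ComplexGraph:
    """Word-word matrix S (N_w x N_w) and word-image matrix A (N_w x N_d), as in formula (8)."""
    S: Matrix
    A: Matrix

    def __post_init__(self) -> None:
        n_w = len(self.S)
        if any(len(row) != n_w for row in self.S):
            raise ValueError("S must be square (N_w x N_w)")
        if any(self.S[i][j] != self.S[j][i] for i in range(n_w) for j in range(n_w)):
            raise ValueError("S must be symmetric")
        if len(self.A) != n_w:
            raise ValueError("A must have one row per word (N_w rows)")
        if len({len(row) for row in self.A}) > 1:
            raise ValueError("all rows of A must have the same length N_d")

    @property
    def n_words(self) -> int:
        return len(self.S)

    @property
    def n_images(self) -> int:
        return len(self.A[0]) if self.A else 0


# Hypothetical toy graph with 2 word nodes and 3 image nodes.
g = ComplexGraph(S=[[0.0, 0.2], [0.2, 0.0]], A=[[1.0, 0.0, 0.5], [0.0, 2.0, 0.5]])
```

Keeping the homogeneous links in $S$ and the heterogeneous links in $A$ separates the two edge types, which is what distinguishes this model from a bipartite graph.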

The method of applying the complex graph clustering algorithm to cluster the images can be expressed as the optimization problem defined in formula (9):

$$\begin{aligned} \min_{C^{(1)}, C^{(2)}, D, B}\ & \left\| S - C^{(1)} D (C^{(1)})^T \right\|^2 + \left\| A - C^{(1)} B (C^{(2)})^T \right\|^2 \\ \text{s.t.}\ & C^{(1)} \in \{0,1\}^{N_w \times k_1},\ C^{(2)} \in \{0,1\}^{N_d \times k_2},\ C^{(1)} \mathbf{1} = \mathbf{1},\ C^{(2)} \mathbf{1} = \mathbf{1} \end{aligned} \quad (9)$$

where every component of the vector $\mathbf{1}$ is 1, $k_1$ and $k_2$ are the numbers of word clusters and image clusters respectively, and the cluster indicator matrices $C^{(1)}$ and $C^{(2)}$ are the output of the complex graph clustering algorithm; the matrix element $C^{(2)}_{pq}$ indicates whether the $p$-th image $Image_p$ belongs to class $q$. The complex graph clustering algorithm that solves the optimization problem defined in formula (9) is shown in Algorithm 1:

Algorithm 1. Clustering algorithm CGC for the complex graph $G_1$.

Input: matrices $S$ and $A$;

Output: cluster indicator matrices $C^{(1)}$ and $C^{(2)}$, where $k_1$ and $k_2$ are the numbers of word clusters and image clusters respectively;

Step 1. Iterate steps 2-5 until convergence;

Step 2. Compute $D = ((C^{(1)})^T C^{(1)})^{-1} (C^{(1)})^T S\, C^{(1)} ((C^{(1)})^T C^{(1)})^{-1}$;

Step 3. Compute $B = ((C^{(1)})^T C^{(1)})^{-1} (C^{(1)})^T A\, C^{(2)} ((C^{(2)})^T C^{(2)})^{-1}$;

Step 4. Fix $D$, $B$ and $C^{(2)}$, and update $C^{(1)}$ row by row so as to minimize $L$, where

$$L = \left\| S - C^{(1)} D (C^{(1)})^T \right\|^2 + \left\| A - C^{(1)} B (C^{(2)})^T \right\|^2;$$

Step 5. Fix $D$, $B$ and $C^{(1)}$, and update $C^{(2)}$ row by row so as to minimize $L$, where

$$L = \left\| S - C^{(1)} D (C^{(1)})^T \right\|^2 + \left\| A - C^{(1)} B (C^{(2)})^T \right\|^2.$$

The image set $IMG$ is clustered according to the cluster indicator matrix $C^{(2)}$ output by Algorithm 1 as follows: if the matrix element $C^{(2)}_{pq} = 1$, the $p$-th image $Image_p$ is assigned to class $q$, where $p = 1, \ldots, N_d$, $N_d$ is the total number of images in $IMG$, $q = 1, \ldots, k_2$, and $k_2$ is the number of image clusters in $IMG$.
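Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under several assumptions that the text does not state: the initial cluster labels are supplied by the caller, each indicator matrix is stored as a label list (entry $p$ holds the index $q$ with $C_{pq} = 1$), empty clusters receive a block value of 0, and iteration stops when no label changes or after `max_iter` rounds. For 0/1 indicator matrices, the products in steps 2 and 3 reduce to block averages, which the sketch uses instead of matrix inverses. All function names are hypothetical.

```python
from typing import Callable, List, Tuple

Matrix = List[List[float]]


def block_means(M: Matrix, rows: List[int], cols: List[int], k_r: int, k_c: int) -> Matrix:
    """Mean of M over each (row cluster, column cluster) block (steps 2 and 3)."""
    total = [[0.0] * k_c for _ in range(k_r)]
    count = [[0] * k_c for _ in range(k_r)]
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            total[r][c] += M[i][j]
            count[r][c] += 1
    return [[total[a][b] / count[a][b] if count[a][b] else 0.0
             for b in range(k_c)] for a in range(k_r)]


def loss(S: Matrix, A: Matrix, c1: List[int], c2: List[int], D: Matrix, B: Matrix) -> float:
    """L = ||S - C1 D C1^T||^2 + ||A - C1 B C2^T||^2, written with labels."""
    l = sum((S[i][j] - D[c1[i]][c1[j]]) ** 2 for i in range(len(c1)) for j in range(len(c1)))
    l += sum((A[i][j] - B[c1[i]][c2[j]]) ** 2 for i in range(len(c1)) for j in range(len(c2)))
    return l


def update_labels(labels: List[int], k: int, evaluate: Callable[[], float]) -> None:
    """Reassign each label in place to the cluster that minimizes evaluate() (steps 4 and 5)."""
    for p in range(len(labels)):
        best_q, best_l = labels[p], evaluate()
        for q in range(k):
            labels[p] = q
            l = evaluate()
            if l < best_l:           # strict comparison keeps the current label on ties
                best_q, best_l = q, l
        labels[p] = best_q


def cgc(S: Matrix, A: Matrix, c1: List[int], c2: List[int],
        k1: int, k2: int, max_iter: int = 50) -> Tuple[List[int], List[int]]:
    """Alternate steps 2-5 until no label changes or max_iter is reached."""
    c1, c2 = list(c1), list(c2)
    for _ in range(max_iter):
        D = block_means(S, c1, c1, k1, k1)               # step 2
        B = block_means(A, c1, c2, k1, k2)               # step 3
        before = (list(c1), list(c2))
        evaluate = lambda: loss(S, A, c1, c2, D, B)
        update_labels(c1, k1, evaluate)                  # step 4
        update_labels(c2, k2, evaluate)                  # step 5
        if (c1, c2) == before:
            break
    return c1, c2


# Hypothetical toy graph: words 0-1 and images 0-1 form one topic, words 2-3 and images 2-3 another.
S = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
A = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
words, images = cgc(S, A, c1=[0, 0, 1, 0], c2=[0, 1, 1, 1], k1=2, k2=2)
```

Starting from deliberately perturbed labels, the toy run recovers the two blocks for both words and images. Recomputing the full loss for every trial label keeps the sketch short but is quadratic per trial; a practical implementation would update only the affected rows and columns.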

The beneficial effects of the invention are as follows. The invention combines the word visibility model with the traditional TF-IDF method to define the relevance between words and images, overcoming the limitation that TF-IDF, as a text-processing technique, cannot directly measure the relevance between a word and an image. By modeling the word-image relevance and the word-word topic relatedness with a complex graph, it proposes a new Web image clustering framework that improves Web image clustering accuracy and groups image search results by topic, making retrieval more convenient for users.

Description of drawings

Fig. 1 is a workflow diagram of the key steps of the Web image clustering method based on image and text correlation mining, in which (a) shows some of the images extracted from the Google Image Search results for the query "bass" and their corresponding accompanying text; (b) is an example of the complex graph model, where solid lines denote word-image relevance and dashed lines denote word-word topic relatedness; and (c) is the output clustering result. Processing step (1) performs text processing on the accompanying text and extracts text features, mines the correlation between text and images, and models the resulting word-image and word-word relationships with a complex graph; processing step (2) applies the complex graph clustering algorithm to the complex graph shown in Fig. 1(b);

Fig. 2 is a schematic diagram of a Web image and its accompanying text in the Web image clustering method based on image and text correlation mining; nouns are shown in italics;

Fig. 3 is a schematic diagram of the visibility computed for the nouns in the accompanying text of Fig. 2;

Fig. 4 compares the mutual information of the complex graph clustering results for the 5 example queries;

Fig. 5(a) shows, for the query "jaguar", the top 5 images in each of the three topic classes "jaguar car", "jaguar animal" and "jaguar car" of the complex graph clustering result without visibility; images with a red dashed border are incorrectly clustered items;

Fig. 5(b) shows, for the query "jaguar", the top 5 images in each of the three topic classes "jaguar car", "jaguar animal" and "jaguar car" of the complex graph clustering result after visibility is introduced; images with a red dashed border are incorrectly clustered items;

Fig. 6 shows, for the query "mouse", the top 10 images in each of the three topic classes "computer mouse", "mouse animal" and "Mickey mouse" of the result produced by the clustering method of the invention; images with a red dashed border are incorrectly clustered items.

Embodiment

The present invention proposes a Web image clustering method based on image and text correlation mining; its implementation is described in detail below with reference to the accompanying drawings.

Example

Five visually polysemous words were selected as queries: "apple", "bass", "jaguar", "mouse" and "tower". A crawler program was written to automatically extract the results returned by Google Image Search™ for each submitted keyword. For each image in the returned results, the image file and the Web page containing the image were downloaded. Because Google limits the number of search results actually returned, the data set contains about 4000 data items in total. To extract the accompanying text of an image, the Web page containing the image was parsed and the text surrounding the image was taken as its accompanying text. All accompanying texts were part-of-speech tagged and their nouns extracted. For each query, the noun vocabulary of the accompanying text contains 1000 to 2000 words. To obtain the ground-truth class label vector, the classes of the images in the data set were labeled manually.

The workflow of the key steps of the invention is shown in Fig. 1. Taking the submitted query "bass" as an example, the implementation steps are:

1. Write a crawler program to download all the images in the Google Image Search results and the Web pages containing them; parse each HTML page with a page parser and remove HTML markup and punctuation to obtain the image set $IMG = \{Image_1, \ldots, Image_{N_d}\}$ shown in Fig. 1(a) and the accompanying text set $D = \{d_1, \ldots, d_{N_d}\}$, where $N_d$ is the total number of accompanying texts and also the total number of images;

2. Use a part-of-speech tagger to tag each accompanying text $d_i$, where $i = 1, \ldots, N_d$; remove the non-noun words and keep the nouns;

3. Sequentially scan all the words in each accompanying text $d_i$ of $D$, keeping one copy of each distinct word, to form the vocabulary $VOL = \{w_1, \ldots, w_{N_w}\}$ represented as a word list, where $N_w$ is the total number of words in $VOL$; for each word $w_i$ in $VOL$, count the number of times $n_{ij}$ that it occurs in each document $d_j$ and the number $num(w_i)$ of accompanying texts in $D$ that contain it;

4. Extract the text features of each accompanying text $d_j$ ($j = 1, \ldots, N_d$) as follows:

(1) for each word $w_i$ in $VOL$, where $i = 1, \ldots, N_w$ and $N_w$ is the total number of words in the vocabulary, compute the term frequency of $w_i$ in $d_j$:

$$freq(w_i, d_j) = \frac{n_{ij}}{\sum_{k=1}^{N_w} n_{kj}};$$

(2) for each word $w_i$ in $VOL$, compute its inverse document frequency $idf(w_i) = \log(N_d / num(w_i))$;

(3) following the vector space model, represent document $d_j$ as an $N_w$-dimensional vector

$$d_j = (tfidf(w_1), \ldots, tfidf(w_{N_w})),$$

whose $i$-th dimension corresponds to the word $w_i$ in the vocabulary and takes the value $tfidf(w_i) = freq(w_i, d_j) \times idf(w_i)$;

5. For each word $w_i$ in $VOL$, compute its visibility

$$vis(w_i) = \left(\frac{C_1 + 10^{-9}}{C_2 + 10^{-9}}\right)^{-IDF_{Google}(w_i)}, \qquad IDF_{Google}(w_i) = \log\frac{|D_{Google}|}{C_2},$$

where $D_{Google}$ is the set of all Web pages indexed by Google and $|D_{Google}|$ is the total number of pages in $D_{Google}$; in this embodiment $|D_{Google}| = 5 \times 10^{11}$.

The visibility of a word, especially a noun, reflects the degree to which its meaning can be depicted by an image. From the perspective of cognitive psychology and visual thinking, a high-visibility word such as "banana" forms a direct visual image in the human mind more easily than a low-visibility word such as "Bayesian". As a new attribute of words, visibility can be used to express the semantic association between words and images. In a Web page, the words surrounding an image have different degrees of visibility, and high-visibility words describe the image semantics more strongly. The value of $C_1/C_2$ can serve, to some extent, as a quantitative measure of the visibility of different words; for example, the $C_1/C_2$ value of "banana" is greater than that of "Bayesian". Take Fig. 2 as an example: this image is one of the top 5 results returned by the Google image search engine for the query keyword "bass". The $C_1$ and $C_2$ values of the nouns in its accompanying text were retrieved from Google in May 2009 and are listed in Table 1. As shown in Fig. 3, the $C_1/C_2$ values of words such as "legend", "record" and "scale" are greater than those of "largemouth" and "fishermen". According to the definition of visibility, however, "largemouth" and "fishermen" should have higher visibility, because they are the two main objects in the image. The reason for this result is that words with broad topics, such as "record", appear in large numbers on Web pages and also in the accompanying text of images, which raises their $C_1/C_2$ values. The $C_2$ value of a broad-topic word is usually very large, so the visibility model proposed by the invention uses the "inverse document frequency factor" $IDF_{Google}(w_i) = \log(|D_{Google}| / C_2)$ to suppress its visibility, where $|D_{Google}|$ is the total number of Web pages indexed by Google. The $vis(w)$ values of the nouns are shown in Fig. 3, where "largemouth" and "fishermen" have the largest $vis(w)$ values, which demonstrates the soundness of the proposed visibility model.

Table 1

6. Compute the relevance $r(w_i, Image_j) = tfidf(w_i) \times vis(w_i)$ between each word $w_i$ in $VOL$ and each image $Image_j$, and construct the word-image correlation matrix $A \in \mathbb{R}_+^{N_w \times N_d}$, whose element $A_{ij}$ is the relevance between word $w_i$ and the $j$-th image $Image_j$, $A_{ij} = r(w_i, Image_j)$.

7. For any two words $w_s$ and $w_t$ in $VOL$, compute their topic relatedness and construct the word-word correlation matrix, as follows:

(1) take the vocabulary $VOL$, the accompanying text set $D$ and the number of latent topics $k$ as the input of the latent Dirichlet allocation topic model, and output the probability $P(z_j)$ of each latent topic $z_j$ ($j = 1, \ldots, k$) and the probability distribution $P(w_i \mid z_j)$ of $z_j$ over each word $w_i$;

(2) compute the topic relatedness $Topic\_r(w_s, w_t)$ between any two words $w_s$ and $w_t$ as follows, where $\sigma$ is a normalization constant:

$$\begin{aligned} Topic\_r(w_s, w_t) &= \max_j P(z = j \mid w_s)\, P(z = j \mid w_t) \\ &= \max_j \frac{p(w_s \mid z = j)\, P(z = j)}{P(w_s)} \cdot \frac{p(w_t \mid z = j)\, P(z = j)}{P(w_t)} \\ &= \max_j \frac{p(w_s \mid z = j)\, p(w_t \mid z = j)\, P(z = j)}{\sigma} \end{aligned}$$

(3) construct the word-word correlation matrix as the symmetric matrix $S \in \mathbb{R}_+^{N_w \times N_w}$, whose element $S_{ij}$ ($i \neq j$) is the topic relatedness between words $w_i$ and $w_j$, $S_{ij} = Topic\_r(w_i, w_j)$.

8. The above steps yield the complex graph model shown in Fig. 1(b), which can be expressed as the set of matrices $\{S \in \mathbb{R}_+^{N_w \times N_w},\ A \in \mathbb{R}_+^{N_w \times N_d}\}$. The image set $IMG$ can be clustered with the complex graph clustering algorithm, which is expressed as the optimization problem

$$\begin{aligned} \min_{C^{(1)}, C^{(2)}, D, B}\ & \left\| S - C^{(1)} D (C^{(1)})^T \right\|^2 + \left\| A - C^{(1)} B (C^{(2)})^T \right\|^2 \\ \text{s.t.}\ & C^{(1)} \in \{0,1\}^{N_w \times k_1},\ C^{(2)} \in \{0,1\}^{N_d \times k_2},\ C^{(1)} \mathbf{1} = \mathbf{1},\ C^{(2)} \mathbf{1} = \mathbf{1} \end{aligned}$$

where every component of the vector $\mathbf{1}$ is 1, $k_1$ and $k_2$ are the numbers of word clusters and image clusters respectively, and the cluster indicator matrices $C^{(1)}$ and $C^{(2)}$ are the output of the complex graph clustering algorithm; the matrix element $C^{(2)}_{pq}$ indicates whether the $p$-th image $Image_p$ belongs to class $q$. The concrete steps of the complex graph clustering algorithm are shown in Algorithm 1:

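Algorithm 1. Clustering algorithm CGC for the complex graph $G_1$.

Input: matrices $S$ and $A$;

Output: cluster indicator matrices $C^{(1)}$ and $C^{(2)}$, where $k_1$ and $k_2$ are the numbers of word clusters and image clusters respectively;

Step 1. Iterate steps 2-5 until convergence;

Step 2. Compute $D = ((C^{(1)})^T C^{(1)})^{-1} (C^{(1)})^T S\, C^{(1)} ((C^{(1)})^T C^{(1)})^{-1}$;

Step 3. Compute $B = ((C^{(1)})^T C^{(1)})^{-1} (C^{(1)})^T A\, C^{(2)} ((C^{(2)})^T C^{(2)})^{-1}$;

Step 4. Fix $D$, $B$ and $C^{(2)}$, and update $C^{(1)}$ row by row so as to minimize $L = \left\| S - C^{(1)} D (C^{(1)})^T \right\|^2 + \left\| A - C^{(1)} B (C^{(2)})^T \right\|^2$;

Step 5. Fix $D$, $B$ and $C^{(1)}$, and update $C^{(2)}$ row by row so as to minimize $L = \left\| S - C^{(1)} D (C^{(1)})^T \right\|^2 + \left\| A - C^{(1)} B (C^{(2)})^T \right\|^2$.

According to the cluster indicator matrix $C^{(2)}$ output by Algorithm 1, if the matrix element $C^{(2)}_{pq} = 1$, the $p$-th image $Image_p$ is assigned to class $q$.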

As shown in Fig. 1, processing step (2) produces the clustering result; in this example the images are grouped into 3 topic classes: "bass fishing", "bass fish" and "bass guitar".

To demonstrate the effectiveness of the core contributions of the invention and the overall performance of the clustering framework, the following clustering comparisons were carried out:

(1) clustering under the two settings $r(w_i, Image_j) = tfidf(w_i)$ and $r(w_i, Image_j) = tfidf(w_i) \times vis(w_i)$;

(2) comparing the word-word topic relatedness $Topic\_r(w_s, w_t)$ with the word co-occurrence correlation $P(w_s, w_t)$. The co-occurrence correlation of any two words $w_s$ and $w_t$ in the accompanying text is defined as the probability that they appear together in the accompanying text of an image, $P(w_s, w_t) = num(w_s, w_t) / N_d$, where $num(w_s, w_t)$ is the number of images whose accompanying text contains both $w_s$ and $w_t$. Combining topic relatedness with co-occurrence correlation, the weight of a homogeneous word-word link is defined as $\lambda P(w_s, w_t) + (1 - \lambda)\, Topic\_r(w_s, w_t)$, where $\lambda$ ($0 < \lambda < 1$) is an adjustable parameter.
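The co-occurrence correlation and the combined link weight in comparison (2) can be sketched as below. The function names and toy texts are hypothetical, and the topic relatedness value passed in is an arbitrary illustrative number rather than an LDA output.

```python
from typing import List


def cooccurrence(texts: List[List[str]], w_s: str, w_t: str) -> float:
    """P(w_s, w_t) = num(w_s, w_t) / N_d: share of accompanying texts containing both words."""
    both = sum(1 for text in texts if w_s in text and w_t in text)
    return both / len(texts)


def word_link_weight(p_cooc: float, topic_r: float, lam: float) -> float:
    """Combined word-word link weight: lam * P(w_s, w_t) + (1 - lam) * Topic_r(w_s, w_t)."""
    return lam * p_cooc + (1.0 - lam) * topic_r


# Hypothetical toy data: four accompanying texts reduced to nouns.
texts = [["bass", "fish"], ["bass", "guitar"], ["bass", "fish", "lake"], ["guitar"]]
p = cooccurrence(texts, "bass", "fish")             # both words occur in 2 of the 4 texts
w = word_link_weight(p, topic_r=0.25, lam=0.5)
```

Setting `lam` to 1 uses only the co-occurrence correlation and setting it to 0 uses only the topic relatedness, so the same function covers the settings compared in the experiments.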

Clustering performance is evaluated with the normalized mutual information (NMI). It is defined as follows: given the number of clusters $k$, each component $\lambda_i$ of a class label vector $\lambda = (\lambda_1, \ldots, \lambda_n)$ over $n$ data items takes a value in $1, \ldots, k$, and $\lambda_i = j$ means that the $i$-th data item belongs to class $C_j$. Let $\lambda^{(a)}$ and $\lambda^{(b)}$ denote the label vector of the clustering result and the ground-truth label vector respectively; the normalized mutual information $\phi^{(NMI)}$ of $\lambda^{(a)}$ and $\lambda^{(b)}$ is then defined as:

$$\phi^{(NMI)}(\lambda^{(a)}, \lambda^{(b)}) = \frac{\sum_{h=1}^{k} \sum_{l=1}^{k} n_{hl} \log\left(\frac{n \cdot n_{hl}}{n_h^{(a)} n_l^{(b)}}\right)}{\sqrt{\left(\sum_{h=1}^{k} n_h^{(a)} \log\frac{n_h^{(a)}}{n}\right) \left(\sum_{l=1}^{k} n_l^{(b)} \log\frac{n_l^{(b)}}{n}\right)}}$$

where $n$ is the total number of data items, $n_h^{(a)}$ is the number of data items in class $C_h$ under $\lambda^{(a)}$, $n_l^{(b)}$ is the number of data items in class $C_l$ under $\lambda^{(b)}$, and $n_{hl}$ is the number of data items that are in class $C_h$ under $\lambda^{(a)}$ and in class $C_l$ under $\lambda^{(b)}$ at the same time. The larger the mutual information value $\phi^{(NMI)}(\lambda^{(a)}, \lambda^{(b)})$ between a clustering result $\lambda^{(a)}$ and the ground truth $\lambda^{(b)}$, the better the clustering. An ideal clustering has $\phi^{(NMI)}(\lambda^{(a)}, \lambda^{(b)}) = 1$.
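The NMI formula above can be computed directly from two label lists, as in this sketch. It follows the usual convention that empty joint cells contribute nothing to the numerator, and it returns 0 when the denominator vanishes (for example when one labeling has a single class); that fallback is an assumption, not part of the definition. The function name `nmi` is hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def nmi(labels_a: Sequence[int], labels_b: Sequence[int]) -> float:
    """Normalized mutual information between two labelings of the same n data items."""
    n = len(labels_a)
    n_a = Counter(labels_a)                   # n_h^(a)
    n_b = Counter(labels_b)                   # n_l^(b)
    n_ab = Counter(zip(labels_a, labels_b))   # n_hl, only non-empty cells are stored
    num = sum(c * math.log(n * c / (n_a[h] * n_b[l])) for (h, l), c in n_ab.items())
    den_a = sum(c * math.log(c / n) for c in n_a.values())
    den_b = sum(c * math.log(c / n) for c in n_b.values())
    den = math.sqrt(den_a * den_b)
    return num / den if den else 0.0          # assumed fallback for a zero denominator


perfect = nmi([0, 0, 1, 1], [1, 1, 0, 0])       # same partition, class names swapped
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])   # partitions that share no information
```

Because the measure depends only on how the items are partitioned, renaming the classes does not change the score.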

For the parameter $\lambda$, three cases are considered:

1) $\lambda = 1$;

2) $\lambda = 0$;

3) $\lambda = 0.15$.

As shown in Fig. 4, the NMI value of the complex graph clustering is best at $\lambda = 0$ for all 5 queries, which demonstrates the soundness of the word-word topic relatedness proposed by the invention.

In Fig. 4, "$\lambda = 0$ (vis(w))" denotes the setting in which the word-image link weight is $A_{ij} = tfidf(w_i) \times vis(w_i)$. The clustering mutual information shows that, in complex graph clustering, introducing word visibility into the word-image link weight lets high-visibility words pass more topic relatedness information to the image nodes associated with them, which improves clustering performance.

Taking the clustering result of the images retrieved for the query "jaguar" in Fig. 5 as an example, a comparison of panels (a) and (b) shows that clustering performance improves when visibility is introduced to strengthen the link weights between images and the words that describe specific objects in them.

Fig. 6 shows the top 10 images in each of the three topics obtained by applying the Web image clustering method of the invention to the results returned by Google Image Search for the query "mouse": the first column is the topic "computer mouse", the second column is the topic "mouse animal", and the third column is the topic "Mickey mouse"; images with a red dashed border are incorrectly clustered items.