CN102779193A

Info

Publication number: CN102779193A
Application number: CN2012102445195A
Authority: CN
Inventors: 杨沐昀; 王晓春; 李生; 齐浩亮; 赵铁军
Original assignee: Harbin Institute of Technology Shenzhen
Current assignee: Harbin University Of Technology High Tech Development Corp
Priority date: 2012-07-16
Filing date: 2012-07-16
Publication date: 2012-11-14
Anticipated expiration: 2032-07-16
Also published as: CN102779193B

Abstract

An adaptive personalized information retrieval system and method, relating to computer information retrieval technology. The invention promptly captures users' dynamic, irregularly distributed retrieval needs and updates the retrieval model as the user interacts with the search engine. The system comprises: a data input subsystem for forming a feature matrix from the current query information combined with historical query information and historical click information, and for obtaining, according to the feature matrix, the parameter prediction model to be trained; a parameter training and prediction subsystem for training and applying the parameter prediction model according to the feature matrix to obtain predicted parameters; a retrieval execution subsystem for organizing the current query, the historical queries, and the historical clicks with the predicted parameters, and for combining the user model and the query model into a personalized query model; and a data output subsystem for finding, among the documents to be retrieved, the documents that match the personalized query as preliminary retrieval results, ranking the preliminary retrieval results by relevance, and outputting the final retrieval results.

Description

Adaptive personalized information retrieval system and method

Technical field

The present invention relates to computer information retrieval technology.

Background technology

The vastness of network information and the rapid development of related technologies have led people to use search engines more and more frequently. According to statistics from the China Internet Network Information Center (CNNIC), the search engine has become the most common tool for helping people retrieve Web information.

In recent years, many information retrieval models have been proposed to improve retrieval precision, make retrieval more convenient, and improve the user's search experience, and they have achieved good results. One major improvement is to build a user interest model, whose purpose is to ensure that documents are relevant to the user's interests as well as to the content of the query. By time span, user interest is divided into long-term interest and short-term interest. Short-term interest comes from the search history of a single query session. In personalized retrieval research based on short-term interest, Cao et al. (2008; 2009) treat the queries and clicks in a query session as sequential data and train an HMM, an improved HMM (vlHMM), and a CRF model to predict query intent. Zhu and Mishne (2009) cluster user query sessions and then aggregate the importance produced by all query sessions into a global importance, proposing the ClickRank model for measuring the importance of a web page or website. Besides these methods that model the query session directly, other researchers use the query session as a feature in a ranking model. Xiang et al. (2010) add several query reformulation relations to RankSVM as features. Traditional retrieval models can also be applied to the study of short-term user interest. Chen et al. (2009) combine, on top of a traditional language model, the similarity between the current query and the summaries of clicked documents. By contrast, the great majority of personalized retrieval models that include long-term interest are based on traditional information retrieval models. Tan (2006) proposes, on the basis of the language model, several methods for computing the historical information related to the current query; this retrieval model benefits both new and old queries. Dou et al. (2007) carried out similar experiments on the vector space model and the language model. Ahn et al. (2008) chain multiple query sessions together by task and build, on the BM25 probabilistic model, a personalized retrieval system that reflects the user's long-term interest.

The personalized retrieval models based on user interest described above share a significant drawback: once training is complete, the internal parameters of the model are fixed values that remain unchanged. In practice, information needs differ from one retrieval situation to another, and handling all user searches in a uniform way inevitably lacks flexibility. Take a personalized retrieval model based on query expansion as an example: the user model is combined with the current query model, and previous studies usually set the weights of the two parts to constants. However, if the current query is very short, the user's query intent is expressed neither clearly nor completely, so the role of the user model should be strengthened and the importance of the current query model reduced. Conversely, if the current query is long and expresses the query intent clearly, the user model matters less. Therefore, an adaptive, dynamic retrieval model can in theory further improve the user's personalized retrieval experience, and it is a key feature that current retrieval systems lack.
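The trade-off between the user model and the current query model can be illustrated with a small Python sketch. It assumes both models are unigram term distributions mixed by a single weight; the function names, the example texts, and the weight 0.7 are hypothetical and are not taken from the patent.

```python
from collections import Counter


def unigram_model(text):
    """Maximum-likelihood unigram distribution over whitespace-separated terms."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def interpolate(query_model, user_model, user_weight):
    """Mix two term distributions: (1 - w) * query model + w * user model."""
    if not 0.0 <= user_weight <= 1.0:
        raise ValueError("user_weight must lie in [0, 1]")
    terms = set(query_model) | set(user_model)
    return {
        t: (1.0 - user_weight) * query_model.get(t, 0.0)
        + user_weight * user_model.get(t, 0.0)
        for t in terms
    }


# A one-word query says little about intent, so this toy example leans on
# the user model. The weight 0.7 is an arbitrary illustration.
query_model = unigram_model("jaguar")
user_model = unigram_model("jaguar car price used car dealer")
short_query_mix = interpolate(query_model, user_model, user_weight=0.7)
```

With a fixed weight, every query is mixed in the same proportion; the point of the passage above is that this weight should instead vary from query to query.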

An ideal dynamic personalized retrieval model should be grounded in real retrieval applications, and the following aspects should be considered when designing and implementing it:

1. User distribution

In the real world users are randomly distributed, yet previous studies often make assumptions about the user distribution. Radlinski (2007) assumes that users are selected at random from a population of fixed size; in other words, the user is assumed always to belong to a definite, fixed population. Existing research confirms that user behavior is irregular (Agichtein et al., 2006), so assumptions about the user distribution should be avoided as far as possible.

2. User interest

User interest is also changeable. Belkin (1997) found early on that a user's retrieval needs can change while the user is searching for information, and Sofia Stamou (2009) likewise holds that user interest changes over time.

3. Query ability

The process of interaction between the user and the search engine is also a process of learning to use the search engine (Shen et al., 2005). The user submits a new query according to the quality of, and satisfaction with, the returned results. That is, the interaction with the search engine influences the query that the user submits next. As the user's search experience grows, so does the user's ability to formulate queries. The importance of each historical query therefore changes over time: the more recent the query, the higher its importance (Bin Tan et al., 2006; Dou et al., 2007).

Summary of the invention

In order to promptly capture the dynamic retrieval needs of irregularly distributed users and to update the retrieval model in step with the interaction between the user and the search engine, the present invention provides an adaptive personalized information retrieval system and method.

The adaptive personalized information retrieval system of the present invention comprises:

a data input subsystem for forming a feature matrix from the current query information combined with historical query information and historical click information, and for obtaining, according to the feature matrix, the parameter prediction model to be trained;

a parameter training and prediction subsystem for training and applying the parameter prediction model according to the feature matrix to obtain predicted parameters;

a retrieval execution subsystem for organizing the current query, the historical queries, and the historical clicks with the predicted parameters, and for combining the user model and the query model into a personalized query model; and

a data output subsystem for finding, among the documents to be retrieved, the documents that match the personalized query as preliminary retrieval results, ranking the preliminary retrieval results by relevance, and outputting the ranked results as the final retrieval results.

The data input subsystem comprises:

a module for generating user behavior features from the current query information; and

a module for forming the feature matrix from all the user behavior features obtained.

The parameter training and prediction subsystem comprises:

a data input module for receiving the data to be processed;

a module for computing the historical queries and historical clicks corresponding to each query and organizing them into the required data format;

a module for forming the feature matrix;

a module for finding the optimal parameter for the current query by exhaustive (traversal) search, the step size of the traversal being 0.1; and

a module for using an SVM-logistic regression model to establish the mapping between user features and the optimal parameter.
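The traversal search with step size 0.1 can be sketched in Python as follows. This is a minimal illustration that assumes the parameter lies in [0, 1] and that some scoring function rates the retrieval quality obtained with a given parameter value; `grid_search_weight` and `toy_score` are hypothetical names, and the toy score merely stands in for a real retrieval metric.

```python
def grid_search_weight(evaluate, step=0.1):
    """Try 0, step, 2*step, ..., 1 and return (best_weight, best_score)."""
    n_steps = round(1.0 / step)
    best_weight, best_score = None, float("-inf")
    for i in range(n_steps + 1):
        # Rounding avoids floating-point drift such as 0.30000000000000004.
        weight = round(i * step, 10)
        score = evaluate(weight)
        if score > best_score:  # ties keep the smallest weight
            best_weight, best_score = weight, score
    return best_weight, best_score


def toy_score(weight):
    """Hypothetical stand-in for a retrieval metric; peaks at weight 0.6."""
    return 1.0 - (weight - 0.6) ** 2


best_weight, best_score = grid_search_weight(toy_score)
```

The weight found this way for each training query would then serve as the regression target that the mapping module learns to predict from the user features.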

The adaptive personalized information retrieval method of the present invention comprises:

a step of forming a feature matrix from the current query information combined with historical query information and historical click information;

a step of obtaining, according to the feature matrix, the parameter prediction model to be trained;

a step of training and applying the parameter prediction model according to the feature matrix to obtain predicted parameters;

a step of organizing the current query, the historical queries, and the historical clicks with the predicted parameters, and combining the user model and the query model into a personalized query model; and

a step of finding, among the documents to be retrieved, the documents that match the personalized query model as preliminary retrieval results, ranking the preliminary retrieval results by relevance, and outputting the ranked results as the final retrieval results.

The step of forming a feature matrix from the current query information combined with historical query information and historical click information comprises:

a step of generating user behavior features from the current query information; and

a step of forming the feature matrix from all the user behavior features obtained.

The step of training and applying the parameter prediction model according to the feature matrix to obtain predicted parameters further comprises:

a step of receiving the data to be processed;

a step of computing the historical queries and historical clicks corresponding to each query and organizing them into the required data format;

a step of forming the feature matrix;

a step of finding the optimal parameter for the current query by exhaustive (traversal) search, the step size of the traversal being 0.1; and

a step of using an SVM regression model to establish the mapping between user features and the optimal parameter.

In the technical solution of the present invention, the user behavior features comprise:

historical click features, which represent the web documents the user viewed within one query session, that is, the web documents the user viewed within a very short time;

historical query features, which represent the queries the user submitted to the retrieval system within one query session, that is, the queries the user submitted within a very short time;

current query features, which represent the current query;

features between the current query and the historical queries, which represent the relation between the current query and the historical queries; and

features between the current query and the historical clicks, which represent the relation between the current query and the historical clicks.

The specific content of these five types of features is as follows:

The historical click features comprise: the total number of historical clicks; the total length of historical clicks; the mean length of historical clicks (the mean of the total click length corresponding to each query); the average length per click; the total length of the last historical click; the number of documents clicked last time; and the mean length of the documents clicked last time.

The historical query features comprise: the total length of historical queries; the average length of historical queries; and the total number of historical queries.

The current query feature comprises: the length of the current query.

The features between the current query and the historical queries comprise: the recurrence probability, in the last historical click, of the words added to the current query relative to the last historical query; the number of words added to the current query relative to the last query; the percentage of the current query length accounted for by words that co-occur in the current query and the last historical query; the mean similarity between the current query and the historical queries; the maximum similarity between the current query and the historical queries; the similarity between the current query and the last historical query; the recurrence probability, in the current query, of the words added relative to the last historical query; the number of added words; the total number of occurrences of the added words; the recurrence probability, in the last historical query, of the words deleted relative to the last historical query; the number of deleted words in the last historical query; the total number of occurrences of the deleted words in the last historical query; the recurrence probability, in the last historical query, of the co-occurring words; the number of co-occurring words in the last historical query; and the total number of occurrences of the co-occurring words in the last historical query.

The features between the current query and the historical clicks comprise: the mean similarity between the current query and all historical clicks; the maximum similarity between the current query and all historical clicks; the similarity between the current query and the last historical click; the number of words in the current query that are new relative to the last historical click; the total number of occurrences of the added words in the last historical click; the recurrence probability, in the last historical click, of the words deleted from the current query relative to the last historical query; the number of deleted words; the number of deleted words in the last historical click; the total number of occurrences of the deleted words in the last historical click; the recurrence probability, in the last historical click, of the words that co-occur in the current query and the last historical click; the number of co-occurring words; the number of co-occurring words in the last historical click; and the total number of occurrences of the co-occurring words in the last historical click.

Because the user behavior features corresponding to each query are not necessarily the same, the parameters in the corresponding query model are not necessarily the same either. The method of the present invention, which assigns parameters dynamically according to the specific retrieval context of each query, therefore conforms more closely to the actual patterns of user retrieval behavior.

During actual information retrieval, the feature weights obtained in training are called to predict the optimal parameters that the retrieval model should use. The present invention uses five kinds of features of the retrieval context to jointly determine which of the three parts (the current query, the historical queries, and the historical clicks) expresses the user's retrieval intent more accurately and how much each contributes to the current retrieval task; the weights of the three parts are thereby assigned dynamically, achieving the goal of obtaining the optimal parameters.

In summary, the parameters of the adaptive personalized retrieval model of the present invention are all predicted, for the current query model, from each user's interaction behavior, and a machine learning algorithm is used in the prediction. Such a retrieval model handles its parameters flexibly and therefore offers greater flexibility and higher retrieval accuracy.

The adaptive retrieval model of the present invention continually adapts itself as the number of interactions between the user and the retrieval system grows. Historical information is weighted dynamically according to its time distance from the present, and the parameter that determines the amount of decay is produced by the parameter prediction model. To compare the present invention with mainstream techniques, the data of (Shen et al., 2005) were used, and the experimental setup is also consistent with that paper. Considering the special case in which the importance of historical information changes with its time distance from the present, the present invention also compares the dynamic retrieval model with the fixed-coefficient retrieval model in that setting. Overall, as the historical information becomes richer, the retrieval effectiveness of the personalized retrieval models improves steadily and the gap between the models becomes more pronounced; see the following table for details:

Take the fourth query Q4 submitted by the user in a query session as an example, with the first query Q1, the second query Q2, and the third query Q3 likewise used as historical information. Even when differences in importance among the historical information are not considered (the AdaptiveEW result), the proposed method achieves a relative improvement of 38.18% in MAP and 17.74% in PR20 over the traditional model under the same condition (BayesInt). When differences in importance among the historical information are considered, the proposed AdaptiveDW model improves on the BatchUp model by 27.54% in MAP and 15.94% in PR20. The data show that the retrieval effectiveness of the adaptive personalized retrieval model proposed by the present invention (AdaptiveDW) exceeds that of the best personalized retrieval model among current mainstream methods (the BatchUp mode).
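For readers unfamiliar with the measures quoted above, the following Python sketch shows how such figures are conventionally computed. It assumes that PR20 denotes precision over the top 20 returned documents and that "relative improvement" is the percentage change over the baseline score; the document identifiers and scores are made up for illustration and are not the patent's experimental data.

```python
def precision_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / k


def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(runs):
    """MAP over a list of (ranked_ids, relevant_ids) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def relative_improvement(new_score, baseline_score):
    """Percentage change of new_score over baseline_score."""
    return (new_score - baseline_score) / baseline_score * 100.0


# Made-up ranking: relevant documents d1 and d2 appear at ranks 2 and 4.
ranking = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
ap = average_precision(ranking, relevant)
p_at_2 = precision_at_k(ranking, relevant, k=2)
```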

In summary, the adaptive personalized retrieval model of the present invention uses the parameter prediction model to produce the individual weights, which balances the flexibility and the soundness of the weight assignment. On the same data set, the adaptive dynamic personalized retrieval model outperforms the mainstream techniques in retrieval effectiveness, which confirms the validity of the technique proposed in this invention.

The specific effects of the invention are as follows:

One, the present invention is effective for both new and old queries submitted by the user.

An old query is a query that has appeared in the user's search history; a new query is a query that the user submits for the first time. For an old query, historical information is available for reference, so the weight given to historical information in a personalized retrieval model is increased, usually to a constant close to 1. For a new query, no history is available for reference, so the weight given to historical information in a personalized retrieval model is reduced, usually to a constant close to 0. Unlike the prior art, the adaptive retrieval model of the present invention does not first need to classify a query as new or old; instead it sets the parameters of the retrieval model flexibly and directly according to the user behavior features. The present invention is therefore applicable to all types of user behavior features.

Two, the present invention assigns weights dynamically according to the user's interaction behavior.

The prior art does not draw on rich user behavior features when setting the parameters of the retrieval model. In fact, the user's retrieval behavior itself provides important interest information, and assigning weights on the basis of this information makes the assignment far more reasonable. For example, if the current query is short, it carries little information, and the weight of the historical information should be increased. Conversely, if the user has very little historical information, the weight of the current query should be increased. The parameter training and prediction subsystem of the present invention assigns weights dynamically on the basis of the interest information that the user behavior itself provides, which greatly increases the soundness of the weight assignment.

Three, the present invention uses a machine learning algorithm to make the prediction automatically.

For example, if the current query is short, the weight of the historical information should be increased; if the user has very little historical information, the weight of the current query should be increased. But when the current query is short and the historical information is also scarce, deciding how to assign the weights becomes complicated. The adaptive personalized retrieval model uses a machine learning algorithm to solve the problem that model parameters are hard to determine, which ensures the accuracy of the predicted weights to a certain extent.

Four, the present invention takes into account the temporal order of queries.

The user's query history is ordered by time, and newer queries are more important than older ones, so the weights of historical queries are decayed according to their time distance from the current query.
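A minimal Python sketch of such time-based weight decay follows. The patent does not state the functional form of the decay, so the exponential form, the function name `decay_weights`, and the decay value 0.5 are assumptions made purely for illustration; in the described system the decay parameter would come from the parameter prediction model rather than being fixed by hand.

```python
import math


def decay_weights(ages, decay):
    """Normalized weights for history items given their time distance ("age").

    Assumed form: each item gets exp(-decay * age), so older items
    (larger age) receive smaller weights; the weights sum to 1.
    """
    raw = [math.exp(-decay * age) for age in ages]
    total = sum(raw)
    return [w / total for w in raw]


# Three historical queries, the oldest first: 3, 2 and 1 steps before the current query.
weights = decay_weights([3, 2, 1], decay=0.5)
```

Setting `decay` to 0 reduces this to equal weights for all historical queries, which corresponds to ignoring the differences in importance among them.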

Five, the present invention answers the question of how to organize the relation among the current query, the historical queries, and the historical clicks in personalized retrieval modeling.

Six, the present invention strengthens the processing of personalized information and explores how to mine the current user's historical information and current query information to further improve the retrieval effectiveness of the current query.

Seven, the present invention makes no assumption about the user distribution, which avoids the situation in which a mismatch between the users' true distribution and the assumed one degrades retrieval effectiveness.

Description of drawings

Fig. 1 is a schematic diagram of the principle of the present invention. Fig. 2 is the information processing flowchart of the parameter prediction subsystem.

Embodiment

Embodiment one: the adaptive personalized information retrieval system of this embodiment comprises:

Embodiment two: this embodiment further defines the data input subsystem of the adaptive personalized information retrieval system of embodiment one. In this embodiment the data input subsystem comprises:

Embodiment three: this embodiment further defines the parameter training and prediction subsystem of the adaptive personalized information retrieval system of embodiment one. In this embodiment the parameter training and prediction subsystem comprises:

a data input module for receiving the data to be processed;

a module for forming the feature matrix; and

a module for using an SVM regression model to establish the mapping between user features and the optimal parameter.

Embodiment four: this embodiment further describes the user behavior features in the adaptive personalized information retrieval system of embodiment one. The user behavior features comprise:

historical click features, which represent the web documents the user viewed within one query session, that is, the historical clicks the user viewed within a very short time;

historical query features, which represent the queries the user submitted to the retrieval system within one query session, that is, the historical queries the user submitted within a very short time;

current query features, which represent the current query;

Embodiment five: this embodiment further describes the adaptive personalized information retrieval system of embodiment four.

The historical click features comprise: the total number of historical clicks; the total length of historical clicks (measured in single words/terms); the mean length of historical clicks (the mean of the total click length corresponding to each query); the average length per click; the total length of the last historical click; the number of documents clicked last time; and the mean length of the documents clicked last time.

Embodiment six: the adaptive personalized information retrieval method of this embodiment comprises:

a step of obtaining, according to the feature matrix, the parameter prediction model to be trained;

Embodiment seven: this embodiment further defines, in the adaptive personalized information retrieval method of embodiment six, the step of forming a feature matrix from the current query information combined with historical query information and historical click information. This step further comprises:

Embodiment eight: this embodiment further defines, in the adaptive personalized information retrieval method of embodiment six, the step of training and applying the parameter prediction model according to the feature matrix to obtain predicted parameters. This step further comprises:

a step of receiving the data to be processed;

a step of forming the feature matrix;

Embodiment nine: this embodiment further defines the user behavior features in the adaptive personalized information retrieval method of embodiment six. The user behavior features comprise:

historical query features, which represent the historical queries the user submitted to the retrieval system within one query session, that is, the historical queries the user submitted within a very short time;

current query features, which represent the current query;

Embodiment ten: this embodiment further describes the five types of technical features of embodiment nine:

The input data of the present invention are, in chronological order, the consecutive query actions that each user performs to satisfy one search need. They include each query the user submits to the retrieval system, the documents the retrieval system returns (including title and summary), and the numbers of the documents the user viewed.

Taking the file query_history.topic2 as an example, the data format is:

The retrieval results for the query string "acquisition u.s.foreign company" are recorded between the <检索结果> and </检索结果> ("retrieval result") tags. The order in which the document numbers appear reflects the ranking of the documents in the results returned by the retrieval system. The click set records the numbers of the documents that the user clicked to view.

The step of forming a feature matrix from the current query information combined with historical query information and historical click information is as follows:

After the data are input, feature extraction is performed. The relations among the current query and the historical queries, the current query and the historical clicks, the historical queries, and the historical clicks within the query session are analyzed, and five types comprising 39 search behavior features are finally extracted for each user at the time each query is submitted:

Historical click features, representing the web documents the user viewed within one query session, comprising:
    Total number of historical clicks
    Total length of historical clicks
    Mean length of historical clicks (the mean of the total click length corresponding to each query)
    Average length per click
    Total length of the last historical click
    Number of documents clicked last time
    Mean length of the documents clicked last time

Historical query features, representing the queries the user submitted to the retrieval system within one query session, comprising:
    Total length of historical queries
    Mean length of historical queries
    Number of historical queries

Current query features, representing the current query, comprising:
    Length of the current query

Features between the current query and the historical clicks, representing the relation between the current query and the historical clicks, comprising:
    Mean similarity between the current query terms and all historical clicks
    Maximum similarity between the current query terms and all historical clicks
    Similarity between the current query terms and the last historical click
    Number of words in the current query that are new relative to the last historical click
    Total number of occurrences of the added words in the last historical click
    Current query terms compared with the last historical query: recurrence probability of the deleted words in the last historical click
    Number of deleted words
    Number of deleted words in the last historical click
    Total number of occurrences of the deleted words in the last historical click
    Current query terms compared with the last historical click: recurrence probability of the co-occurring words in the last historical click
    Number of co-occurring words
    Number of co-occurring words in the last historical click
    Total number of occurrences of the co-occurring words in the last historical click

Features between the current query and the historical queries, representing the relation between the current query and the historical queries, comprising:
    Current query terms compared with the last historical query: recurrence probability of the added words in the last historical click
    Number of words added to the current query relative to the last query
    Current query terms compared with the last historical query: percentage of the current query length accounted for by co-occurring words
    Mean similarity between the current query and the historical queries
    Maximum similarity between the current query and the historical queries
    Similarity between the current query terms and the last historical query
    Current query terms compared with the last historical query: recurrence probability of the added words in the current query
    Number of added words
    Total number of occurrences of the added words
    Current query terms compared with the last historical query: recurrence probability of the deleted words in the last historical query
    Number of deleted words in the last historical query
    Total number of occurrences of the deleted words in the last historical query
    Current query terms compared with the last historical query: recurrence probability of the co-occurring words in the last historical query
    Number of co-occurring words in the last historical query
    Total number of occurrences of the co-occurring words in the last historical query

In addition, the optimal weight value of each query is computed. The 39 features and the optimal weight value together form the training data of the parameter prediction model. The beginning of the training data gives the file name, the name of each feature, and the attribute description of the corresponding feature. The part after DATA is the feature matrix (this format can be fed directly to existing SVM regression toolkits).

Taking q₂ as an example, the corresponding training data is:

RELATION q2.arff

ATTRIBUTE cqlenth numeric

......

ATTRIBUTE class numeric

DATA

3,2,20,20,10,20,0,2,0.0869565217391304,0.0869565217391304,0.0869565217391304,0,0,0,2,2,1,0.4,0.4,0.4,0.333333333333333,0,0.5,0.4

4,3,2,2,0.666666666666667,2,1,2,0,0,0,0,0,0,2,2,1,0.333333333333333,0.333333333333333,0.333333333333333,0.25,0,0.5,0.4

......

The first line of the above training data gives the file name "q2.arff", with the keyword "RELATION". The second line describes the first feature, "length of the current query", with the keyword "ATTRIBUTE". The remaining features follow in the same way, giving 39 feature descriptions in total.

The next line describes the optimal parameter, whose type is a decimal between 0 and 1, also with the keyword "ATTRIBUTE". The feature matrix follows DATA: it is the content of the training data file with the header removed, and it consists of the 39 user-behavior feature values described above together with the corresponding optimal parameter. Each row has 40 data items: the first 39 are feature values and the 40th is the optimal parameter. Each training instance can therefore be represented by one 40-item row vector, which forms one row of the feature matrix. The number of training instances determines the number of rows of the feature matrix. Data items are separated by commas.
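As an illustration of the row layout just described, the following minimal Python sketch splits one comma-separated data row into its 39 feature values and the final target value. The function name `parse_row` and the sample rows are hypothetical and are not taken from the patent; the handling of "?" in the last column follows the test data convention described in this document.

```python
def parse_row(line, n_features=39):
    """Split one comma-separated data row into (features, target)."""
    items = [item.strip() for item in line.split(",")]
    if len(items) != n_features + 1:
        raise ValueError(f"expected {n_features + 1} items, got {len(items)}")
    features = [float(item) for item in items[:n_features]]
    # Test rows carry "?" in the last column: the value is still to be predicted.
    target = None if items[-1] == "?" else float(items[-1])
    return features, target


# Synthetic rows for illustration only (not data from the source).
train_row = ",".join(["0.5"] * 39 + ["0.4"])
test_row = ",".join(["0.5"] * 39 + ["?"])

features, target = parse_row(train_row)
_, unknown = parse_row(test_row)
```

Rows with a different number of items are rejected here, which is one possible design choice for catching malformed input early.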

Based on the above training data, the machine learning method SVM regression (SVM-Regression) is used to train the parameter prediction model, which expresses the functional relation between the optimal weights and the features;

The search target is the maximum MAP value of each query, and the parameter space is traversed with a step size of 0.1. Support Vector Regression (SVR) (Chang and Lin, 2001) is then used for training to determine the functional relation between the optimal weights of each query and the 39 features, which yields the trained parameter prediction model.
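The traversal with step size 0.1 can be sketched as an exhaustive grid search. This is only an illustration under stated assumptions: the two searched weights are taken to be α and β, the grid is assumed to include the end points 0 and 1, and `score_fn` is a hypothetical callback standing in for "run retrieval with these weights and measure MAP". The toy scoring function exists only to make the sketch runnable.

```python
from itertools import product


def grid_search_weights(score_fn, step=0.1):
    """Return the (alpha, beta) pair on a regular grid that maximizes score_fn."""
    n = round(1 / step)
    # Build 0.0, 0.1, ..., 1.0 by division to avoid accumulated float drift.
    grid = [i / n for i in range(n + 1)]
    return max(product(grid, grid), key=lambda pair: score_fn(*pair))


def toy_score(alpha, beta):
    """Hypothetical stand-in for the MAP obtained with the given weights."""
    return -((alpha - 0.7) ** 2 + (beta - 0.3) ** 2)


best_alpha, best_beta = grid_search_weights(toy_score)
```

The best pair found for each training query would then serve as the regression target for that query's 39 features.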

When the parameter prediction model is applied, the 39 feature values of each test query are input and the model outputs the corresponding weight values. The current query, the historical queries and the historical clicks are then combined using these weights. The test data format is essentially the same as the training data format; the difference is that the last column of each feature vector in the test data is "?", denoting the value to be predicted. The test data format is:

The main task of the execution retrieval subsystem is to take the TREC AP88-90 documents as the documents to be searched, build an index with Lemur, and then carry out the retrieval task within the conventional language model framework.

Based on the predictions produced by applying the parameter prediction model in the previous step, the current query and the historical information are organized to form the personalized query model.

If the current query is the k-th query Q_k in the query session, the user interest represented by the short-term query history is reflected in the average occurrence probability of the terms in the historical queries Q_i (1≤i≤k-1). Similarly, the user's short-term interest is also reflected in the average occurrence of the terms in the historical clicks C_i (1≤i≤k-1). The query history consists of the historical queries H_Q and the historical clicks H_C. A query word is denoted by ω.

A) Compute the current query model

$$p(\omega \mid Q_i) = \frac{c(\omega, Q_i)}{|Q_i|} \tag{1}$$

The parameters in the formula have the following meanings: ω denotes a word, Q_i denotes a query, p denotes probability, and i denotes the i-th query. The current query model is determined by the number of times each current query word occurs and by the length of the current query. p(ω|Q_i) is the probability that word ω occurs in query Q_i. c(ω, Q_i) is the number of times word ω occurs in query Q_i. |Q_i| is the length of query Q_i, that is, the number of words it contains. In other words, the probability of a word in the query string is the number of times the word occurs in the query divided by the total number of words in the query.
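Formula (1) amounts to a maximum-likelihood unigram estimate and can be sketched in a few lines of Python. The function name `query_model` and the sample query are hypothetical, and the query is assumed to be already split into words.

```python
from collections import Counter


def query_model(terms):
    """Unigram model of one query: p(w | Q) = c(w, Q) / |Q|."""
    if not terms:
        return {}
    counts = Counter(terms)
    total = len(terms)  # |Q|: number of words in the query
    return {word: count / total for word, count in counts.items()}


model = query_model(["java", "island", "java", "coffee"])
```

Here "java" occurs twice in a four-word query, so its probability is 2/4, while each of the other two words receives 1/4.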

B) Compute the historical query model

$$p(\omega \mid H_Q) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(\omega \mid Q_i) \tag{2}$$

The parameters in the formula have the following meanings: ω denotes a word, Q_i denotes a query, p denotes probability, H_Q denotes all historical queries, and i denotes the i-th query. The historical query model p(ω|H_Q) is obtained by summing and averaging the single historical query models p(ω|Q_i). For the current query Q_k, the historical queries are Q₁, Q₂, ..., Q_{k-1}. The single historical query models p(ω|Q_i) are added up and the sum is divided by the number of historical queries, k-1. Each single historical query model p(ω|Q_i) is computed according to formula (1). In other words, the probability of a single word ω in the whole query history H_Q is computed as follows: first, in each historical query, the number of times the word occurs is divided by the total number of words in that historical query; next, the resulting k-1 probabilities are summed; finally, the sum is divided by k-1.

C) Compute the historical click model

$$p(\omega \mid H_C) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(\omega \mid C_i) \tag{3}$$

The parameters in the formula have the following meanings: ω denotes a word, C_i denotes a web document the user has viewed, p denotes probability, H_C denotes all historical web documents the user has viewed, and i denotes the i-th click. Like the historical query model, the historical click model p(ω|H_C) is obtained by summing and averaging the single historical click models p(ω|C_i). For the current query Q_k, the historical clicks are C₁, C₂, ..., C_{k-1}. The single historical click models p(ω|C_i) are added up and the sum is divided by the number of historical clicks, k-1. Each single historical click model is computed according to formula (1).
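Formulas (2) and (3) share the same structure: each history item (a past query or a clicked document) gets its own unigram model, and the models are averaged with equal weight. A minimal Python sketch follows; the names `unigram_model` and `history_model` and the sample history are hypothetical.

```python
from collections import Counter


def unigram_model(terms):
    """p(w | item) = c(w, item) / |item|, as in formula (1)."""
    counts = Counter(terms)
    total = len(terms)
    return {word: count / total for word, count in counts.items()} if total else {}


def history_model(items):
    """Equal-weight average of the per-item models, as in formulas (2) and (3)."""
    if not items:
        return {}
    summed = {}
    for terms in items:
        for word, prob in unigram_model(terms).items():
            summed[word] = summed.get(word, 0.0) + prob
    n = len(items)  # k - 1 history items
    return {word: total / n for word, total in summed.items()}


# Two past queries; the same function applies to lists of clicked-document words.
p_hq = history_model([["apple", "pie"], ["apple", "tart", "recipe", "easy"]])
```

In this example "apple" has probability 1/2 in the first query and 1/4 in the second, so its averaged probability is 3/8.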

D) Extract the current-query features

These mainly include the length of the current query.

E) Extract the historical-query features

These mainly include the number of historical queries, their total length and their average length.

F) Extract the features between the current query and the historical queries

These mainly include the similarity between the current query and the last query, the similarity between the current query and all historical queries, the numbers of newly added words and deleted words, and their proportions in the current query or in the historical queries.
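A few of the word-overlap features named above can be sketched with set operations. This is an illustration only: the function and key names are hypothetical, words are compared as exact strings, and the proportion is taken relative to the number of words in the current query, which is an assumption about how the percentage is defined.

```python
def query_change_features(current, previous):
    """Count added, deleted and co-occurring words between two queries."""
    cur, prev = set(current), set(previous)
    added = cur - prev      # words new in the current query
    deleted = prev - cur    # words dropped from the previous query
    shared = cur & prev     # words occurring in both queries
    return {
        "n_added": len(added),
        "n_deleted": len(deleted),
        "n_shared": len(shared),
        # Assumed definition: shared words relative to current query length.
        "shared_ratio": len(shared) / len(current) if current else 0.0,
    }


feats = query_change_features(
    ["cheap", "flights", "paris"],
    ["flights", "paris", "hotels"],
)
```

The similarity features are not covered here because the similarity measure is not specified in this passage.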

G) Extract the features between the current query and the historical clicks

These mainly include the similarity between the current query and all historical clicks and the last historical click, the numbers of newly added words and deleted words, and their proportions in the current query and in the historical clicks.

H) Obtain the parameters with the parameter prediction model

The user features serve as the input of the parameter prediction system, and the output is the best parameters for the current query.

I) Organize the current query model, the historical query model and the historical click model according to the predicted parameters

The parameter β_k ∈ (0,1) determines how weight is allocated between the historical queries and the historical clicks: the larger β_k is, the more important the historical clicks are. When β_k = 1, the user interest model is represented entirely by the historical clicks. Likewise, the larger α_k is, the more important the current query is.

Two methods were tried for the adaptive personalized retrieval model. The first is a retrieval model in which all history items are equally important (AdaptiveEW); its formal expression is given in formula (4). The second is a retrieval model in which the importance of a history item increases as its time distance from the current query decreases (AdaptiveDW); its formal expression is given in formula (5). Here Q_k denotes the current query, H_C denotes the historical clicks before the current query in the current query session, and H_Q denotes the historical queries in the current query session. The parameters α_k, β_k, m_k and n_k are weights, each taking any decimal value between 0 and 1.

The query model p(ω|θ_k) of the adaptive personalized retrieval model (AdaptiveEW) comprises two parts: the current query model p(ω|Q_k) and the history model. The current query model has weight α_k and the history model has weight 1-α_k. The current query model gives the occurrence probability of the current query word ω and is computed according to formula (1). The history information consists of the historical click model p(ω|H_C) and the historical query model p(ω|H_Q). The historical query model is computed according to formula (2), and the historical click model according to formula (3). All historical queries carry equal weight, as do all historical clicks. The historical query model has weight 1-β_k and the historical click model has weight β_k, as shown in formula (4).

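$$p(\omega \mid \theta_k) = \alpha_k\, p(\omega \mid Q_k) + (1 - \alpha_k)\left[\beta_k\, p(\omega \mid H_C) + (1 - \beta_k)\, p(\omega \mid H_Q)\right] \tag{4}$$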
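Formula (4) is a two-level linear interpolation and can be sketched as follows. The function name `adaptive_ew` and the toy distributions are hypothetical; each model is assumed to be a dictionary mapping words to probabilities, with missing words treated as probability 0.

```python
def adaptive_ew(p_q, p_hc, p_hq, alpha, beta):
    """Formula (4): alpha * p(w|Q_k) + (1 - alpha) * [beta * p(w|H_C) + (1 - beta) * p(w|H_Q)]."""
    vocab = set(p_q) | set(p_hc) | set(p_hq)
    return {
        w: alpha * p_q.get(w, 0.0)
        + (1 - alpha) * (beta * p_hc.get(w, 0.0) + (1 - beta) * p_hq.get(w, 0.0))
        for w in vocab
    }


# Toy models for illustration only.
p_current = {"jaguar": 1.0}
p_clicks = {"jaguar": 0.5, "car": 0.5}
p_queries = {"car": 1.0}

theta = adaptive_ew(p_current, p_clicks, p_queries, alpha=0.6, beta=0.5)
```

With these toy inputs the word "car", which is absent from the current query, still receives probability mass from the history, which is the intended personalization effect.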


The above is the adaptive retrieval model AdaptiveEW, in which all history items carry equal weight. The other adaptive retrieval model assumes that the importance of a history item depends on its time distance from the current query. The query model p(ω|ψ_k) of this adaptive retrieval model (AdaptiveDW) comprises two parts: the query model p(ω|θ_k) and the historical click model p(ω|H_C). The historical click model p(ω|H_C) has weight m_k and the query model p(ω|θ_k) has weight 1-m_k. The query model p(ω|θ_k) in turn consists of the current query model p(ω|Q_k) and the query model of the previous moment p(ω|θ_{k-1}). The current query model p(ω|Q_k) has weight n_k and the query model of the previous moment p(ω|θ_{k-1}) has weight 1-n_k. The historical query model is computed according to formula (2), and the historical click model according to formula (3). In AdaptiveDW the weight of older query models decays over time, so a newer historical query carries more weight than an older one. The formal expression is given in formula (5).

$$p(\omega \mid \theta_k) = n_k\, p(\omega \mid Q_k) + (1 - n_k)\, p(\omega \mid \theta_{k-1}) \tag{5}$$
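The recursion of formula (5), followed by the mixing with the historical click model described in the text (weight m_k for the click model, 1-m_k for the query model), can be sketched as below. Several points are assumptions made only for illustration: the recursion is started with θ₁ equal to the model of the first query, one n value is supplied per later query, and the names `mix` and `adaptive_dw` are hypothetical.

```python
def mix(p_a, p_b, weight_a):
    """Return weight_a * p_a + (1 - weight_a) * p_b over the union vocabulary."""
    vocab = set(p_a) | set(p_b)
    return {w: weight_a * p_a.get(w, 0.0) + (1 - weight_a) * p_b.get(w, 0.0) for w in vocab}


def adaptive_dw(query_models, p_hc, n_weights, m_k):
    """Apply formula (5) along the session, then mix in the click model."""
    theta = query_models[0]  # assumed base case: theta_1 = p(. | Q_1)
    for p_q, n in zip(query_models[1:], n_weights):
        theta = mix(p_q, theta, n)  # theta_k = n_k * p(.|Q_k) + (1 - n_k) * theta_{k-1}
    return mix(p_hc, theta, m_k)    # psi_k = m_k * p(.|H_C) + (1 - m_k) * theta_k


# Toy session of three one-word queries plus one clicked-document model.
session = [{"a": 1.0}, {"b": 1.0}, {"c": 1.0}]
psi = adaptive_dw(session, p_hc={"d": 1.0}, n_weights=[0.5, 0.5], m_k=0.2)
```

In the toy session the most recent query word "c" ends up with twice the weight of the oldest word "a", which illustrates the decay of older queries.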


J) Start retrieval

The documents to be retrieved are searched for results that match the personalized query, and the results are sorted in descending order of their relevance probability. Each query returns 1000 documents.
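The ranking step can be illustrated with a small sketch. It is not the Lemur implementation: it scores each document by the sum over query-model words of p(ω|θ_k)·log p(ω|d), and it assumes Jelinek-Mercer smoothing of the document model with the collection model (the parameter `lam`), which the source does not specify. All names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def rank_documents(query_model, docs, lam=0.5, top_n=1000):
    """Rank documents by sum_w p(w|query) * log p(w|doc), highest score first."""
    collection = Counter()
    for terms in docs.values():
        collection.update(terms)
    total = sum(collection.values())

    scored = []
    for doc_id, terms in docs.items():
        counts = Counter(terms)
        score = 0.0
        for word, p_word in query_model.items():
            p_doc = counts[word] / len(terms) if terms else 0.0
            p_coll = collection[word] / total if total else 0.0
            p = (1 - lam) * p_doc + lam * p_coll  # assumed smoothing scheme
            if p > 0:  # words unseen in the whole collection are skipped
                score += p_word * math.log(p)
        scored.append((doc_id, score))

    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


docs = {
    "d1": ["jaguar", "car", "speed"],
    "d2": ["jaguar", "cat", "jungle"],
}
ranked = rank_documents({"jaguar": 0.7, "car": 0.3}, docs)
```

The `top_n` default mirrors the 1000 documents returned per query in the text.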

After the personalized query is submitted to the retrieval system, the retrieval system returns the retrieval results. The data format of the personalized retrieval results is:

The first column is the query number, the second column is the document number, the third column is the rank, and the fourth column is the language model score. This completes the implementation of the whole adaptive personalized retrieval model.