CN102779193A - Self-adaptive personalized information retrieval system and method - Google Patents

Self-adaptive personalized information retrieval system and method

Info

Publication number
CN102779193A
Authority
CN
China
Prior art keywords
historical
query
current
previous
current query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012102445195A
Other languages
Chinese (zh)
Other versions
CN102779193B (en)
Inventor
杨沐昀
王晓春
李生
齐浩亮
赵铁军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University Of Technology High Tech Development Corp
Original Assignee
Harbin Institute of Technology Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology Shenzhen
Priority to CN201210244519.5A
Publication of CN102779193A
Application granted
Publication of CN102779193B
Expired - Fee Related
Anticipated expiration

Abstract

A self-adaptive personalized information retrieval system and method, relating to computer information retrieval technology. The invention promptly captures the dynamic retrieval needs of irregularly distributed users and promptly updates the retrieval model as the user interacts with the search engine. The system comprises: a data input subsystem for forming a feature matrix from the current query information combined with historical query information and historical click information, and for obtaining, according to the feature matrix, the parameter prediction model to be trained; a parameter training and prediction subsystem for training and applying the parameter prediction model according to the feature matrix to obtain the predicted parameters; a retrieval execution subsystem for organizing the current query, the historical queries, and the historical clicks with the predicted parameters, and for combining the user model with the query model to form a personalized query model; and a data output subsystem for finding, among the documents to be retrieved, the documents that match the personalized query as preliminary retrieval results, and for ranking the preliminary retrieval results by relevance to obtain and output the final retrieval results.

Figure 201210244519

Description

Self-adaptive personalized information retrieval system and method
Technical field
The present invention relates to computer information retrieval technology.
Background
The vastness of online information and the rapid development of related technology have led people to use search engines more and more frequently. According to statistics from the China Internet Network Information Center (CNNIC), the search engine has become the most common tool for helping people retrieve Web information.
In recent years, to improve retrieval precision, make retrieval more convenient, and improve the user's search experience, the information retrieval field has produced many excellent retrieval models with good results. One major improvement is the user interest model, whose purpose is to ensure that documents are relevant to the user's interests as well as to the content of the query. User interest is divided by time span into long-term interest and short-term interest. Short-term interest derives from the search history within one query session. In personalized retrieval research based on short-term interest, Cao et al. (2008; 2009) treat the queries and clicks in a query session as sequential data and train an HMM, an improved HMM (vlHMM), and a CRF model to predict query intent. Zhu and Mishne (2009) cluster user query sessions, aggregate the importance produced by all query sessions into a global importance, and propose the ClickRank model for measuring the importance of a web page or website. Besides these methods that model the query session directly, other researchers use the query session as a feature in a ranking model. Xiang et al. (2010) add several kinds of query reformulation relations to RankSVM as features. Traditional retrieval models can also be applied to short-term interest: Chen et al. (2009) incorporate the similarity between the current query and the summaries of clicked documents into a traditional language model. In contrast, the vast majority of personalized retrieval models that include long-term interest are based on traditional information retrieval models. Tan (2006) proposes several methods, built on the language model, for computing the historical information relevant to the current query; this retrieval model benefits both new and old queries. Dou et al. (2007) carried out similar experiments on the vector space model and on the language model. Ahn et al. (2008) chain multiple query sessions together by task and build a personalized retrieval system, based on the BM25 probabilistic model, that reflects the user's long-term interest.
These interest-based personalized retrieval models share a significant drawback: once training is complete, the model's internal parameters are fixed values. In practice, information needs differ from one retrieval situation to another, and handling every search in a uniform way inevitably lacks flexibility. Take a personalized retrieval model based on query expansion as an example: the user model is combined with the current query model, and earlier studies usually set the weights of the two parts to constants. But if the current query is very short, the user's query intent is not expressed clearly or completely enough, so the role of the user model should be strengthened and the importance of the current query model reduced. Conversely, if the current query is long and the intent is clearly expressed, the user model matters less. An adaptive, dynamic retrieval model could therefore in principle further improve the user's personalized retrieval experience, and it is a key feature that current retrieval systems lack.
An ideal dynamic personalized retrieval model should be grounded in real retrieval applications, and its design and implementation should take the following aspects into account:
1. User distribution
Users are randomly distributed in the real world, yet earlier studies often make assumptions about user distribution. Radlinski (2007) assumes that users are drawn at random from a population of fixed size. Other work assumes that users always belong to one definite, fixed population. Existing research confirms that user behavior is irregular (Agichtein et al., 2006), so assumptions about user distribution should be avoided as far as possible.
2. User interest
User interest is also changeable. Belkin (1997) found early on that a user's retrieval needs can change while the user is searching, and Sofia Stamou (2009) likewise holds that user interest changes over time.
3. Query ability
The interaction between user and search engine is also a process of learning to use the search engine (Shen et al., 2005). The user submits a new query according to the quality of, and satisfaction with, the returned results. That is, the interaction with the search engine influences the next query the user submits. As the user's search experience grows, the user's ability to construct queries also improves. The importance of each historical query therefore changes over time: the more recent the query, the more important it is (Bin Tan et al., 2006; Dou et al., 2007).
Summary of the invention
To promptly capture the dynamic retrieval needs of irregularly distributed users and to update the retrieval model promptly as the user interacts with the search engine, the present invention provides a self-adaptive personalized information retrieval system and method.
The self-adaptive personalized information retrieval system of the present invention comprises:
a data input subsystem for forming a feature matrix from the current query information combined with historical query information and historical click information, and for obtaining, according to the feature matrix, the parameter prediction model to be trained;
a parameter training and prediction subsystem for training and applying the parameter prediction model according to the feature matrix to obtain the predicted parameters;
a retrieval execution subsystem for organizing the current query, the historical queries, and the historical clicks with the predicted parameters, and for combining the user model with the query model to form a personalized query model;
a data output subsystem for finding, among the documents to be retrieved, the documents that match the personalized query as preliminary retrieval results, for ranking the preliminary retrieval results by relevance, and for outputting the ranked results as the final retrieval results.
The data input subsystem comprises:
a module for generating user behavior features from the current query information, and
a module for forming the feature matrix from all the user behavior features obtained.
The parameter training and prediction subsystem comprises:
a data input module for receiving the data to be processed;
a module for computing the historical queries and historical clicks corresponding to each query and organizing them into the required data format;
a module for forming the feature matrix;
a module for finding the optimal parameters of the current query by traversal search, the step size of the traversal being 0.1;
a module for using an SVM regression model to establish the mapping between user features and optimal parameters.
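The traversal search with a step size of 0.1 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that the parameters being searched are three weights (current query, historical queries, historical clicks) that sum to 1, and that a caller-supplied `score` function returns the retrieval quality for a weight triple. The names `weight_grid`, `best_weights`, and `score` are hypothetical. The SVM regression stage that maps features to these optimal weights is not shown.

```python
from itertools import product
from typing import Callable, Iterator, Tuple

# (current query, historical queries, historical clicks); assumed layout.
Weights = Tuple[float, float, float]


def weight_grid(steps: int = 10) -> Iterator[Weights]:
    """Yield every weight triple on a 1/steps grid whose entries sum to 1.

    With steps=10 the grid spacing is 0.1, the step size named in the text.
    """
    for i, j in product(range(steps + 1), repeat=2):
        k = steps - i - j
        if k >= 0:
            yield (i / steps, j / steps, k / steps)


def best_weights(score: Callable[[Weights], float], steps: int = 10) -> Weights:
    """Return the grid point with the highest score (exhaustive traversal)."""
    return max(weight_grid(steps), key=score)


if __name__ == "__main__":
    # Toy scoring function standing in for a real retrieval metric.
    def toy(w: Weights) -> float:
        return -((w[0] - 0.3) ** 2 + (w[1] - 0.5) ** 2)

    print(best_weights(toy))
```

In a training pipeline of this kind, the best triple found for each training query would serve as the regression target paired with that query's feature vector.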
The self-adaptive personalized information retrieval method of the present invention comprises:
a step of forming a feature matrix from the current query information combined with historical query information and historical click information;
a step of obtaining, according to the feature matrix, the parameter prediction model to be trained;
a step of training and applying the parameter prediction model according to the feature matrix to obtain the predicted parameters;
a step of organizing the current query, the historical queries, and the historical clicks with the predicted parameters, and combining the user model with the query model to form a personalized query model;
a step of finding, among the documents to be retrieved, the documents that match the personalized query model as preliminary retrieval results, ranking the preliminary retrieval results by relevance, and outputting the ranked results as the final retrieval results.
The step of forming a feature matrix from the current query information combined with historical query information and historical click information comprises:
a step of generating user behavior features from the current query information, and
a step of forming the feature matrix from all the user behavior features obtained.
The step of training and applying the parameter prediction model according to the feature matrix to obtain the predicted parameters further comprises:
a step of receiving the data to be processed;
a step of computing the historical queries and historical clicks corresponding to each query and organizing them into the required data format;
a step of forming the feature matrix;
a step of finding the optimal parameters of the current query by traversal search, the step size of the traversal being 0.1;
a step of using an SVM regression model to establish the mapping between user features and optimal parameters.
In the technical solution of the present invention, the user behavior features comprise:
historical click features, representing the web documents the user viewed within one query session, that is, the web documents the user viewed within a very short time;
historical query features, representing the queries the user submitted to the retrieval system within one query session, that is, the queries the user submitted within a very short time;
current query features, representing the current query;
features between the current query and the historical queries, representing the relationship between them;
features between the current query and the historical clicks, representing the relationship between them.
The specific content of these five feature categories is as follows:
The historical click features comprise: the total number of historical clicks; the total length of the historical clicks; the mean length of the historical clicks (the mean of all click lengths corresponding to each query); the average length of each click; the total length of the previous historical click; the number of documents clicked last; and the mean length of the documents clicked last.
The historical query features comprise: the total length of the historical queries, the average length of the historical queries, and the total number of historical queries.
The current query features comprise: the length of the current query.
The features between the current query and the historical queries comprise: the recurrence probability, with the previous historical click, of the words added in the current query relative to the previous historical query; the number of words added in the current query relative to the previous query; the percentage of the current query length accounted for by words shared with the previous historical query; the mean similarity between the current query and the historical queries; the maximum similarity between the current query and the historical queries; the similarity between the current query and the previous historical query; for the words added in the current query relative to the previous historical query, their recurrence probability with the current query, their number, and their total number of occurrences; for the words deleted in the current query relative to the previous historical query, their recurrence probability with the previous historical query, their number in the previous historical query, and their total number of occurrences in the previous historical query; and for the words shared by the current query and the previous historical query, their recurrence probability with the previous historical query, their number in the previous historical query, and their total number of occurrences in the previous historical query.
The features between the current query and the historical clicks comprise: the mean similarity between the current query and all historical clicks; the maximum similarity between the current query and all historical clicks; the similarity between the current query and the previous historical click; the number of words added in the current query relative to the previous historical click; the total number of occurrences of the added words in the previous historical click; for the words deleted in the current query relative to the previous historical query, their recurrence probability with the previous historical click, their number, their number in the previous historical click, and their total number of occurrences in the previous historical click; and for the words shared by the current query and the previous historical click, their recurrence probability with the previous historical click, their number, their number in the previous historical click, and their total number of occurrences in the previous historical click.
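A few of the features between the current query and the previous historical query can be sketched as follows. This is a minimal illustration under stated assumptions: queries are already tokenized into word lists, and the similarity measure is Jaccard overlap, which is chosen here only for the example because the text does not specify how similarity is computed. The function name `query_pair_features` and the dictionary keys are hypothetical.

```python
from typing import Dict, List


def query_pair_features(current: List[str], previous: List[str]) -> Dict[str, float]:
    """Compute a small subset of the current-query / previous-query features."""
    cur, prev = set(current), set(previous)
    added = cur - prev      # words new in the current query
    deleted = prev - cur    # words dropped from the previous query
    shared = cur & prev     # words that occur in both queries
    union = cur | prev
    return {
        "current_query_length": float(len(current)),
        "added_word_count": float(len(added)),
        "deleted_word_count": float(len(deleted)),
        "shared_word_count": float(len(shared)),
        # Share of the current query's length taken up by shared words.
        "shared_fraction_of_current": (
            sum(1 for w in current if w in shared) / len(current) if current else 0.0
        ),
        # Assumed similarity measure (Jaccard); not specified in the text.
        "similarity": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    print(query_pair_features(["cheap", "flights", "paris"],
                              ["flights", "paris", "hotel"]))
```

One such dictionary per query, flattened into a fixed column order, would form one row of the feature matrix.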
Because the user behavior features corresponding to each query are not necessarily the same, the parameters in the corresponding query model are not necessarily the same either. Dynamically assigning parameters according to the specific retrieval context of each query, as the present invention does, therefore comes closer to the user's actual retrieval behavior.
During actual retrieval, the feature weights obtained in training are used to predict the optimal parameters for the retrieval model. The present invention uses five kinds of features related to the retrieval information to jointly determine which of three parts (the current query, the historical queries, and the historical clicks) expresses the user's retrieval intent more accurately and how much each contributes to the current retrieval task. The weights of the three parts are thus assigned dynamically to obtain the optimal parameters.
In summary, the parameters of the adaptive personalized retrieval model are all predicted for the current query model from each user's interaction behavior, and the prediction uses a machine learning algorithm. Such a retrieval model handles its parameters flexibly and therefore offers greater flexibility and retrieval accuracy.
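The dynamic weighting of the three parts can be sketched as a linear mixture of unigram language models. This is an illustrative assumption: the text states that the current query, the historical queries, and the historical clicks receive predicted weights, but it does not give the combination formula. The function names and the flat token-list inputs are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def unigram_model(tokens: Iterable[str]) -> Dict[str, float]:
    """Maximum-likelihood unigram distribution over the given tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def personalized_query_model(
    current: Sequence[str],
    hist_queries: Sequence[str],
    hist_clicks: Sequence[str],
    weights: Sequence[float],
) -> Dict[str, float]:
    """Mix the three unigram models using the predicted weights.

    weights = (current, historical queries, historical clicks); the
    triple is assumed to sum to 1.
    """
    models = [unigram_model(current), unigram_model(hist_queries), unigram_model(hist_clicks)]
    mixed: Dict[str, float] = {}
    for weight, model in zip(weights, models):
        for word, prob in model.items():
            mixed[word] = mixed.get(word, 0.0) + weight * prob
    return mixed


if __name__ == "__main__":
    model = personalized_query_model(
        ["jaguar"],
        ["car", "price"],
        ["jaguar", "car", "dealer", "car"],
        (0.5, 0.3, 0.2),
    )
    print(sorted(model.items(), key=lambda kv: -kv[1]))
```

A short current query such as the one above carries little information on its own, so a predictor of this kind would be expected to shift weight toward the two history components.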
The adaptive retrieval model of the present invention updates itself continually as the number of interactions between the user and the retrieval system grows. Historical information is weighted dynamically according to its time distance from the present, and the parameter that determines the decay amplitude is produced by the parameter prediction model. To compare the present invention with mainstream techniques, the data of (Shen et al., 2005) were used, with the same experimental setup as that paper. To account for the special case in which the importance of historical information varies with its distance from the current time, the dynamic retrieval model was also compared with a fixed-coefficient retrieval model under that condition. Overall, as historical information accumulates, the personalized retrieval models perform better and better and the gaps between the models become more pronounced; see the table below for details:
Figure BDA00001891411100051
Take the fourth query Q4 submitted by the user in a query session as an example, with the first query Q1, the second query Q2, and the third query Q3 as historical information. Even when differences in the importance of historical information are ignored, the proposed method (the AdaptiveEW result) improves on the traditional model under the same condition (BayesInt) by a relative 38.18% in MAP and a relative 17.74% in PR20. When differences in importance among historical information are taken into account, the proposed AdaptiveDW model improves on the BatchUp model by 27.54% in MAP and 15.94% in PR20. The data show that the adaptive personalized retrieval model proposed by the present invention (AdaptiveDW) outperforms the best personalized retrieval model among current mainstream methods (the BatchUp mode).
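The evaluation measures behind these figures can be sketched as follows. The sketch uses the standard definition of average precision (MAP is its mean over queries) and assumes that PR20 denotes precision over the top 20 results, which the text does not define. The function names are hypothetical, and the numbers in the example are toy values, not the patent's data.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 20) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def relative_improvement(new: float, old: float) -> float:
    """Relative gain of `new` over the baseline `old`, e.g. 0.5 means +50%."""
    return (new - old) / old


if __name__ == "__main__":
    ranked = ["d1", "d2", "d3", "d4"]
    relevant = {"d1", "d3"}
    print(average_precision(ranked, relevant))
    print(precision_at_k(ranked, relevant, k=2))
```

MAP for a system is the mean of `average_precision` over all evaluated queries, and the percentages quoted above are relative improvements of one system's score over another's.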
In summary, the adaptive personalized retrieval model of the present invention uses the parameter prediction model to produce each weight, balancing flexibility and soundness in weight assignment. On the same data set, the adaptive dynamic personalized retrieval model outperforms mainstream techniques in retrieval effectiveness, which confirms the validity of the technique proposed in this invention.
The specific effects of the invention are:
One, the present invention is effective for both new and old queries submitted by the user.
An old query is one that has appeared in the user's search history; a new query is one the user submits for the first time. For an old query, historical information is available for reference, so the weight of historical information in a personalized retrieval model is increased, usually to a constant close to 1. For a new query, no history is available for reference, so the weight of historical information is reduced, usually to a constant close to 0. Unlike the prior art, the adaptive retrieval model of the present invention does not first need to classify a query as new or old; it sets the parameters in the retrieval model directly and flexibly from the user behavior features. The present invention is therefore applicable to all kinds of user behavior features.
Two, the present invention assigns weights dynamically according to user interaction behavior.
The prior art does not draw on rich user behavior features to set the parameters in the retrieval model. In fact, the user's retrieval behavior itself provides important interest information, and basing the weights on this information greatly increases the soundness of their assignment. For instance, if the current query is short, it provides little information, and the weight of historical information should be increased. Conversely, if the user has little historical information, the weight of the current query should be increased. The parameter training and prediction subsystem of the present invention assigns weights dynamically on the basis of the interest information that user behavior itself provides, which greatly increases the soundness of weight assignment.
Three, the present invention uses a machine learning algorithm to perform prediction automatically.
For instance, if the current query is short, the weight of historical information should be increased; if the user has little historical information, the weight of the current query should be increased. But when the current query is short and historical information is scarce at the same time, how to assign weights becomes complicated. The adaptive personalized retrieval model uses a machine learning algorithm to solve the problem that model parameters are hard to determine, which ensures the accuracy of the predicted weights to a certain extent.
Four, the present invention takes the temporal order of queries into account.
The user's query history is ordered by time, and newer queries are more important than older ones, so the weights of historical queries are decayed according to their time distance from the current query.
Five, the present invention addresses how to organize the relationship among the current query, the historical queries, and the historical clicks in personalized retrieval modeling.
Six, the present invention strengthens the processing of personalized information and explores how to mine the current user's historical information and current query information to further improve the retrieval effectiveness of the current query.
Seven, the present invention makes no assumption about user distribution. This avoids the situation in which the users' true distribution differs from the assumed one and retrieval effectiveness suffers.
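The weight decay described under effect four can be sketched as follows. The exponential form is an assumption made for illustration: the text says only that historical queries are decayed by their time distance from the current query and that the decay amplitude comes from the parameter prediction model. The function name `decayed_history_weights` and the `decay` parameter are hypothetical, and distance is measured here in query positions rather than clock time.

```python
import math
from typing import List


def decayed_history_weights(n_history: int, decay: float) -> List[float]:
    """Normalized weights for n_history past queries, oldest first.

    The most recent historical query has distance 0 from the current
    query and the oldest has distance n_history - 1. A larger `decay`
    favours recent queries more strongly; decay = 0 gives equal weights.
    """
    raw = [math.exp(-decay * (n_history - 1 - i)) for i in range(n_history)]
    total = sum(raw)
    return [r / total for r in raw]


if __name__ == "__main__":
    # Weights for Q1, Q2, Q3 when the current query is Q4.
    print(decayed_history_weights(3, 0.0))
    print(decayed_history_weights(3, 1.0))
```

Setting `decay` to 0 corresponds to treating all historical queries as equally important, while a positive value corresponds to giving newer queries more weight.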
Description of the drawings
Fig. 1 is a schematic diagram of the principle of the present invention. Fig. 2 is the information processing flowchart of the parameter prediction subsystem.
Embodiments
Embodiment one: the self-adaptive personalized information retrieval system of this embodiment comprises:
a data input subsystem for forming a feature matrix from the current query information combined with historical query information and historical click information, and for obtaining, according to the feature matrix, the parameter prediction model to be trained;
a parameter training and prediction subsystem for training and applying the parameter prediction model according to the feature matrix to obtain the predicted parameters;
a retrieval execution subsystem for organizing the current query, the historical queries, and the historical clicks with the predicted parameters, and for combining the user model with the query model to form a personalized query model;
a data output subsystem for finding, among the documents to be retrieved, the documents that match the personalized query as preliminary retrieval results, for ranking the preliminary retrieval results by relevance, and for outputting the ranked results as the final retrieval results.
Embodiment two: this embodiment further limits the data input subsystem of the self-adaptive personalized information retrieval system described in embodiment one. In this embodiment the data input subsystem comprises:
a module for generating user behavior features from the current query information, and
a module for forming the feature matrix from all the user behavior features obtained.
Embodiment three: this embodiment further limits the parameter training and prediction subsystem of the self-adaptive personalized information retrieval system described in embodiment one. In this embodiment the parameter training and prediction subsystem comprises:
a data input module for receiving the data to be processed;
a module for computing the historical queries and historical clicks corresponding to each query and organizing them into the required data format;
a module for forming the feature matrix;
a module for finding the optimal parameters of the current query by traversal search, the step size of the traversal being 0.1;
a module for using an SVM regression model to establish the mapping between user features and optimal parameters.
Embodiment four: this embodiment further describes the user behavior features in the self-adaptive personalized information retrieval system described in embodiment one. The user behavior features comprise:
historical click features, representing the web documents the user viewed within one query session, that is, the historical clicks the user viewed within a very short time;
historical query features, representing the queries the user submitted to the retrieval system within one query session, that is, the historical queries the user submitted within a very short time;
current query features, representing the current query;
features between the current query and the historical queries, representing the relationship between them;
features between the current query and the historical clicks, representing the relationship between them.
Embodiment five: this embodiment further describes the self-adaptive personalized information retrieval system of embodiment four.
The historical click features comprise: the total number of historical clicks; the total length of the historical clicks (in units of single words/terms); the mean length of the historical clicks (the mean of all click lengths corresponding to each query); the average length of each click; the total length of the previous historical click; the number of documents clicked last; and the mean length of the documents clicked last.
The historical query features comprise: the total length of the historical queries, the average length of the historical queries, and the total number of historical queries.
The current query features comprise: the length of the current query.
The features between the current query and the historical queries comprise: the recurrence probability, with the previous historical click, of the words added in the current query relative to the previous historical query; the number of words added in the current query relative to the previous query; the percentage of the current query length accounted for by words shared with the previous historical query; the mean similarity between the current query and the historical queries; the maximum similarity between the current query and the historical queries; the similarity between the current query and the previous historical query; for the words added in the current query relative to the previous historical query, their recurrence probability with the current query, their number, and their total number of occurrences; for the words deleted in the current query relative to the previous historical query, their recurrence probability with the previous historical query, their number in the previous historical query, and their total number of occurrences in the previous historical query; and for the words shared by the current query and the previous historical query, their recurrence probability with the previous historical query, their number in the previous historical query, and their total number of occurrences in the previous historical query.
The features between the current query and the historical clicks comprise: the mean similarity between the current query and all historical clicks; the maximum similarity between the current query and all historical clicks; the similarity between the current query and the previous historical click; the number of words added in the current query relative to the previous historical click; the total number of occurrences of the added words in the previous historical click; for the words deleted in the current query relative to the previous historical query, their recurrence probability with the previous historical click, their number, their number in the previous historical click, and their total number of occurrences in the previous historical click; and for the words shared by the current query and the previous historical click, their recurrence probability with the previous historical click, their number, their number in the previous historical click, and their total number of occurrences in the previous historical click.
Embodiment six: the adaptive personalized information retrieval method of this embodiment comprises:
a step of forming a feature matrix from the current query information in combination with historical query information and historical click information;
a step of obtaining a trained parameter prediction model from the feature matrix;
a step of training and applying the parameter prediction model according to the feature matrix to obtain predicted parameters;
a step of organizing the current query, the historical queries and the historical clicks with the predicted parameters, and combining the user model and the query model into a personalized query model;
a step of finding, among the documents to be retrieved, the documents that match the personalized query model as preliminary retrieval results, sorting the preliminary retrieval results by relevance, and outputting the sorted results as the final retrieval result data.
Embodiment seven: this embodiment further defines the step, in the adaptive personalized information retrieval method of embodiment six, of forming the feature matrix from the current query information in combination with historical query information and historical click information. The step further comprises:
a step of generating user behavior features from the current query information, and
a step of forming the feature matrix from all of the obtained user behavior features.
Embodiment eight: this embodiment further defines the step, in the adaptive personalized information retrieval method of embodiment six, of training and applying the parameter prediction model according to the feature matrix to obtain predicted parameters. The step further comprises:
a step of receiving the data to be processed;
a step of computing the historical queries and historical clicks corresponding to each query and organizing them into the required data format;
a step of forming the feature matrix;
a step of searching for the optimal parameters of the current query by traversal search, the traversal step size being 0.1;
a step of establishing a mapping between the user features and the optimal parameters using an SVM regression model.
Embodiment nine: this embodiment further defines the user behavior features in the adaptive personalized information retrieval method of embodiment six. The user behavior features comprise:
historical click features, representing the web documents the user viewed within one query session, i.e. the historical clicks the user viewed within a short period of time;
historical query features, representing the queries the user submitted to the retrieval system within one query session, i.e. the historical queries the user submitted within a short period of time;
current query features, representing the current query;
features between the current query and the historical queries, representing the relationship between them;
features between the current query and the historical clicks, representing the relationship between them.
Embodiment ten: this embodiment further describes the five classes of features described in embodiment nine:
The historical click features comprise: the total number of historical clicks; the total length of historical clicks (in units of single words/terms); the mean historical click length (the mean of the total click length corresponding to each query); the average length per click; the total length of the previous historical click; the number of documents clicked the last time; and the mean length of the documents clicked the last time;
The historical query features comprise: the total length of the historical queries, the average length of the historical queries, and the total number of historical queries;
The current query features representing the current query comprise: the current query length;
The features between the current query and the historical queries comprise: the repetition probability between the words newly added relative to the previous historical query and the previous historical click; the number of words newly added relative to the previous query; the percentage of the current query length accounted for by words co-occurring with the previous historical query; the mean similarity between the current query and the historical queries; the maximum similarity between the current query and the historical queries; the similarity between the current query and the previous historical query; the repetition probability between the words newly added relative to the previous historical query and the current query; the number of newly added words; the total number of occurrences of the newly added words; the repetition probability between the words deleted relative to the previous historical query and the previous historical query; the number of deleted words in the previous historical query; the total number of occurrences of the deleted words in the previous historical query; the repetition probability between the words co-occurring with the previous historical query and the previous historical query; the number of co-occurring words in the previous historical query; and the total number of occurrences of the co-occurring words in the previous historical query;
The features between the current query and the historical clicks comprise: the mean similarity between the current query terms and all historical clicks; the maximum similarity between the current query terms and all historical clicks; the similarity between the current query terms and the previous historical click; the number of words newly added in the current query relative to the previous historical click; the total number of occurrences of the newly added words in the previous historical click; the repetition probability between the words deleted relative to the previous historical query and the previous historical click; the number of deleted words; the number of deleted words in the previous historical click; the total number of occurrences of the deleted words in the previous historical click; the repetition probability between the words co-occurring with the previous historical click and the previous historical click; the number of co-occurring words; the number of co-occurring words in the previous historical click; and the total number of occurrences of the co-occurring words in the previous historical click.
The input data of the present invention are the consecutive query actions, in chronological order, that each user performs to satisfy one search need. They include each query the user submits to the retrieval system, the documents returned by the retrieval system (with title and abstract), and the numbers of the documents the user viewed.
Taking the file query_history.topic2 as an example, the data format is:
Figure BDA00001891411100111
The retrieval results for the query string "acquisition u.s.foreign company" are recorded between <jian Suojieguo> and </Jian Suojieguo> (pinyin for "retrieval results"). The order in which the document numbers appear reflects the ranking of the documents in the results returned by the retrieval system. The click set records the numbers of the documents the user clicked to view.
The step of forming the feature matrix from the current query information in combination with historical query information and historical click information is as follows:
After the data are input, features are extracted. The relationships among the current query and the historical queries, the current query and the historical clicks, the historical queries, and the historical clicks within the query session are analyzed, and 39 search-behavior features in five classes are finally extracted for each user at the time each query is submitted, namely:
Historical click features, representing the web documents the user viewed within one query session, comprising:
Total number of historical clicks
Total length of historical clicks
Mean historical click length (the mean of the total click length corresponding to each query)
Average length per click
Total length of the previous historical click
Number of documents clicked the last time
Mean length of the documents clicked the last time
Historical query features, representing the queries the user submitted to the retrieval system within one query session, comprising:
Total length of the historical queries
Mean length of the historical queries
Number of historical queries
Current query features, representing the current query, comprising:
Current query length
Features between the current query and the historical clicks, representing the relationship between them, comprising:
Mean similarity between the current query terms and all historical clicks
Maximum similarity between the current query terms and all historical clicks
Similarity between the current query terms and the previous historical click
Number of words newly added in the current query relative to the previous historical click
Total number of occurrences of the newly added words in the previous historical click
Repetition probability between the words deleted relative to the previous historical query and the previous historical click
Number of deleted words
Number of deleted words in the previous historical click
Total number of occurrences of the deleted words in the previous historical click
Repetition probability between the words co-occurring with the previous historical click and the previous historical click
Number of co-occurring words
Number of co-occurring words in the previous historical click
Total number of occurrences of the co-occurring words in the previous historical click
Features between the current query and the historical queries, representing the relationship between them, comprising:
Repetition probability between the words newly added relative to the previous historical query and the previous historical click
Number of words newly added in the current query relative to the previous query
Percentage of the current query length accounted for by words co-occurring with the previous historical query
Mean similarity between the current query and the historical queries
Maximum similarity between the current query and the historical queries
Similarity between the current query terms and the previous historical query
Repetition probability between the words newly added relative to the previous historical query and the current query
Number of newly added words
Total number of occurrences of the newly added words
Repetition probability between the words deleted relative to the previous historical query and the previous historical query
Number of deleted words in the previous historical query
Total number of occurrences of the deleted words in the previous historical query
Repetition probability between the words co-occurring with the previous historical query and the previous historical query
Number of co-occurring words in the previous historical query
Total number of occurrences of the co-occurring words in the previous historical query
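As an illustration of the query-to-query features listed above, the following Python sketch computes the counts of newly added, deleted and co-occurring words between the current query and the previous historical query. The function name `query_change_features`, the dictionary keys, and the set-based definitions of "added", "deleted" and "co-occurring" are assumptions made for illustration; the text does not define these quantities precisely, and the similarity and repetition-probability features are left out.

```python
def query_change_features(current, previous):
    """Compare two tokenized queries (lists of words).

    Assumed definitions (not specified in the text):
      added   = words in the current query but not in the previous one
      deleted = words in the previous query but not in the current one
      shared  = words occurring in both queries (co-occurring words)
    """
    cur_set, prev_set = set(current), set(previous)
    added = cur_set - prev_set
    deleted = prev_set - cur_set
    shared = cur_set & prev_set
    return {
        "current_length": len(current),
        "n_added": len(added),
        "n_deleted": len(deleted),
        "n_shared": len(shared),
        # share of the current query length taken up by co-occurring words
        "shared_ratio": len(shared) / len(current) if current else 0.0,
        # total occurrences of deleted / shared words in the previous query
        "deleted_occurrences": sum(previous.count(w) for w in deleted),
        "shared_occurrences": sum(previous.count(w) for w in shared),
    }


features = query_change_features(
    ["foreign", "company", "acquisition"],
    ["u.s.", "foreign", "company"],
)
```

The same comparison could be applied between the current query and the text of the previous clicked document to approximate the query-to-click counts.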
In addition, the optimal weight of each query is computed. The 39 features and the optimal weight together form the training data of the parameter prediction model. The header of the training data gives the file name of the training data, the name of each feature, and the attribute description of the corresponding feature. The part below DATA is the feature matrix (this format can be fed directly to existing SVM regression toolkits).
Taking q2 as an example, the corresponding training data are:
RELATION q2.arff
ATTRIBUTE cqlenth numeric
......
ATTRIBUTE class numeric
DATA
3,2,20,20,10,20,0,2,0.0869565217391304,0.0869565217391304,0.0869565217391304,0,0,0,2,2,1,0.4,0.4,0.4,0.333333333333333,0,0.5,0.4
4,3,2,2,0.666666666666667,2,1,2,0,0,0,0,0,0,2,2,1,0.333333333333333,0.333333333333333,0.333333333333333,0.25,0,0.5,0.4
......
The first line of the above training data gives the file name "q2.arff", with the keyword "RELATION". The second line describes the first feature, "length of the current query", with the keyword "ATTRIBUTE". The remaining features follow in the same way, for 39 feature descriptions in total.
The next line describes the type of the optimal parameter, a decimal between 0 and 1, also with the keyword "ATTRIBUTE". The feature matrix follows DATA, i.e. it is the content of the training data file with the header removed, and it consists of the 39 user behavior features mentioned above and the corresponding optimal parameter. Each row has 40 data items: the first 39 are feature values and the 40th is the optimal parameter. Each training instance is represented by one 40-item row vector and forms one row of the feature matrix, so the number of training instances determines the number of rows of the feature matrix. Data items are separated by commas.
From the above training data, the parameter prediction model is trained with the machine learning method SVM regression (SVM-Regression). The model represents the functional relationship between the optimal weight and the features.
The search target is the maximum MAP of each query, and the traversal step size is 0.1. Support Vector Regression (SVR) (Chang and Lin, 2001) is used for training to determine the functional relationship between the optimal weight of each query and the 39 features, thereby obtaining the trained parameter prediction model.
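The traversal search for the optimal weight with a step size of 0.1 can be sketched as follows. This is only a minimal illustration: `score_fn` stands in for a hypothetical evaluation routine that returns the retrieval quality (for example average precision) obtained for the current query with a given weight setting, and including the endpoints 0 and 1 in the grid is an assumption.

```python
from itertools import product


def weight_grid(step=0.1):
    """Candidate weights 0.0, 0.1, ..., 1.0 (endpoints included by assumption)."""
    n_steps = round(1 / step)
    return [round(i * step, 10) for i in range(n_steps + 1)]


def best_weights(score_fn, n_params=1, step=0.1):
    """Exhaustively try every weight combination and keep the best-scoring one.

    score_fn is a hypothetical callable taking n_params weights and returning
    the retrieval quality for the current query.
    """
    best_score, best_combo = None, None
    for combo in product(weight_grid(step), repeat=n_params):
        score = score_fn(*combo)
        if best_score is None or score > best_score:
            best_score, best_combo = score, combo
    return best_combo, best_score


# Toy stand-in for the real evaluation: quality peaks at alpha=0.3, beta=0.7.
def toy_score(alpha, beta):
    return 1.0 - (alpha - 0.3) ** 2 - (beta - 0.7) ** 2


weights, score = best_weights(toy_score, n_params=2)
```

The weights found this way would serve as the regression targets (the 40th data item of each training row) for the SVR model.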
When the parameter prediction model is applied, the 39 feature values of each test query are input and the model outputs the corresponding weight. The three parts, namely the current query, the historical queries and the historical clicks, are combined using these weights. The test data format is essentially the same as the training data format; the difference is that the last column of each feature vector in the test data is "?", denoting the value to be predicted. The test data format is:
Figure BDA00001891411100141
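The row layout shared by the training and test data (39 comma-separated feature values followed by either the optimal parameter or "?") can be sketched as below. The helper names `format_row` and `parse_row` are hypothetical and not part of any toolkit mentioned in the text.

```python
N_FEATURES = 39  # number of behavior features per query, as stated in the text


def format_row(features, optimal_weight=None):
    """Build one comma-separated row: 39 feature values plus the target.

    When optimal_weight is None the row is a test row and ends with "?".
    """
    if len(features) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features, got {len(features)}")
    target = "?" if optimal_weight is None else str(float(optimal_weight))
    return ",".join(str(v) for v in features) + "," + target


def parse_row(line):
    """Split a row back into (feature list, target or None)."""
    items = line.strip().split(",")
    if len(items) != N_FEATURES + 1:
        raise ValueError(f"expected {N_FEATURES + 1} items, got {len(items)}")
    features = [float(x) for x in items[:-1]]
    target = None if items[-1] == "?" else float(items[-1])
    return features, target


train_row = format_row([1] * N_FEATURES, 0.4)
test_row = format_row([0] * N_FEATURES)
```

Checking the item count on both write and read is a cheap way to catch rows that were truncated or mis-split before they reach the regression toolkit.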
The main task of the retrieval execution subsystem is to take the TREC AP88-90 documents as the documents to be retrieved, build an index with Lemur, and then carry out the retrieval task within the conventional language-model framework.
Based on the predictions produced by applying the parameter prediction model in the previous step, the current query and the history information are organized into the personalized query model.
Let the current query be the k-th query $Q_k$ in the query session. The user interest represented by the short-term query history is then reflected in the mean occurrence probability of the terms in the historical queries $Q_i$ ($1 \le i \le k-1$). Similarly, the user's short-term interest is also reflected in the mean occurrence probability of the terms in the historical clicks $C_i$ ($1 \le i \le k-1$). The query history consists of the historical queries $H_Q$ and the historical clicks $H_C$. A query word is denoted by $\omega$.
A) Compute the current query model
$$p(\omega \mid Q_i) = \frac{c(\omega, Q_i)}{|Q_i|} \tag{1}$$
where $\omega$ denotes a word, $Q_i$ denotes the i-th query, and $p$ denotes a probability. The current query model is determined by the number of occurrences of each query word and the length of the current query. $p(\omega \mid Q_i)$ is the probability of each word $\omega$ in query $Q_i$; $c(\omega, Q_i)$ is the number of times the word $\omega$ occurs in query $Q_i$; $|Q_i|$ is the length of query $Q_i$, i.e. the number of words it contains. In other words, the probability of a word in the query string is the number of times the word occurs in the query divided by the total number of words in the query.
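A minimal Python sketch of formula (1) follows, assuming the query has already been split into words; the function name `query_model` is chosen here for illustration only.

```python
from collections import Counter


def query_model(query_terms):
    """Formula (1): p(w | Q) = c(w, Q) / |Q| for every word w in the query."""
    if not query_terms:
        return {}
    counts = Counter(query_terms)
    length = len(query_terms)
    return {word: count / length for word, count in counts.items()}


# A four-word query in which "a" occurs twice.
p_q = query_model(["a", "b", "a", "c"])
```

Here `p_q["a"]` is 2/4 because "a" accounts for two of the four query words.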
B) Compute the historical query model
$$p(\omega \mid H_Q) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(\omega \mid Q_i) \tag{2}$$
where $\omega$ denotes a word, $Q_i$ denotes the i-th query, $p$ denotes a probability, and $H_Q$ denotes all historical queries. The historical query model $p(\omega \mid H_Q)$ is obtained by summing the individual historical query models $p(\omega \mid Q_i)$ and taking the mean. For the current query $Q_k$, the historical queries are $Q_1, Q_2, \ldots, Q_{k-1}$. The individual historical query models $p(\omega \mid Q_i)$, each computed according to formula (1), are summed and then divided by the number of historical queries, $k-1$. That is, the probability of a single word $\omega$ in the whole query history $H_Q$ is computed by first dividing the number of times the word occurs in each historical query by the total number of words in that historical query, then summing the resulting $k-1$ probabilities, and finally dividing by $k-1$.
C) Compute the historical click model
$$p(\omega \mid H_C) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(\omega \mid C_i) \tag{3}$$
where $\omega$ denotes a word, $C_i$ denotes the i-th web document the user viewed, $p$ denotes a probability, and $H_C$ denotes all historical web documents the user has viewed. Like the historical query model, the historical click model $p(\omega \mid H_C)$ is obtained by summing the individual historical click models $p(\omega \mid C_i)$ and taking the mean. For the current query $Q_k$, the historical clicks are $C_1, C_2, \ldots, C_{k-1}$. The individual historical click models $p(\omega \mid C_i)$ are summed and then divided by the number of historical clicks, $k-1$. Each individual historical click model is computed according to formula (1), with $C_i$ in place of $Q_i$.
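Formulas (2) and (3) share the same averaging structure, so one helper can serve both. The sketch below assumes tokenized input and uses the illustrative names `unigram_model` and `history_model`.

```python
from collections import Counter, defaultdict


def unigram_model(terms):
    """Formula (1): relative frequency of each word in one query or document."""
    counts = Counter(terms)
    return {word: count / len(terms) for word, count in counts.items()}


def history_model(items):
    """Formulas (2)/(3): average the unigram models of the k-1 history items.

    items is a list of tokenized historical queries (for H_Q) or tokenized
    clicked documents (for H_C).
    """
    if not items:
        return {}
    summed = defaultdict(float)
    for terms in items:
        for word, prob in unigram_model(terms).items():
            summed[word] += prob
    return {word: total / len(items) for word, total in summed.items()}


# Two historical queries: "a b" and "a a".
p_hq = history_model([["a", "b"], ["a", "a"]])
```

For the word "a" the two per-query probabilities are 0.5 and 1.0, so the averaged history probability is 0.75.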
D) Extract the current query features
These mainly comprise the length of the current query.
E) Extract the historical query features
These mainly comprise the number, total length and average length of the historical queries.
F) Extract the features between the current query and the historical queries
These mainly comprise the similarity between the current query and the previous query, the similarity between the current query and all historical queries, the numbers of newly added and deleted words, and their proportions in the current query or the historical queries.
G) Extract the features between the current query and the historical clicks
These mainly comprise the similarity between the current query and all historical clicks and the previous historical click, the numbers of newly added and deleted words, and their proportions in the current query and the historical click set.
H) Obtain the parameters with the parameter prediction model
The user features are the input of the parameter prediction system, and the output is the optimal parameters for the current query.
I) Organize the current query model, the historical query model and the historical click model according to the predicted parameters
The parameter $\beta_k \in (0,1)$ determines the weight allocation between the historical queries and the historical clicks: the larger $\beta_k$ is, the more important the historical clicks are. When $\beta_k = 1$, the user interest model is represented entirely by the historical clicks. Likewise, the larger $\alpha_k$ is, the more important the current query is.
Two adaptive personalized retrieval models were tried. One is the retrieval model in which all history items are equally important (AdaptiveEW), formalized in formula (4). The other is the retrieval model in which the importance of a history item increases as its time distance from the current query decreases (AdaptiveDW), formalized in formula (5). Here $Q_k$ denotes the current query, $H_C$ denotes the historical clicks before the current query in the current query session, and $H_Q$ denotes the historical queries in the current query session. The parameters $\alpha_k$, $\beta_k$, $m_k$, $n_k$ are weights, each taking a decimal value between 0 and 1.
The query model $p(\omega \mid \theta_k)$ of the adaptive personalized retrieval model (AdaptiveEW) comprises two parts: the current query model $p(\omega \mid Q_k)$ and the history model. The weight of the current query model is $\alpha_k$ and the weight of the history model is $1-\alpha_k$. The current query model represents the probability of the current query word $\omega$ and is computed according to formula (1). The history information consists of the historical click model $p(\omega \mid H_C)$ and the historical query model $p(\omega \mid H_Q)$. The historical query model is computed according to formula (2), and the historical click model according to formula (3). All historical queries have equal weight, and all historical clicks have equal weight. The weight of the historical query model is $1-\beta_k$ and the weight of the historical click model is $\beta_k$, as shown in formula (4).
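$$p(\omega \mid \theta_k) = \alpha_k\, p(\omega \mid Q_k) + (1-\alpha_k)\left[\beta_k\, p(\omega \mid H_C) + (1-\beta_k)\, p(\omega \mid H_Q)\right] \tag{4}$$
where $\alpha_k$ is the weight of the current query model $p(\omega \mid Q_k)$, $1-\alpha_k$ is the weight of the history model, $\beta_k$ is the weight of the historical click model $p(\omega \mid H_C)$, and $1-\beta_k$ is the weight of the historical query model $p(\omega \mid H_Q)$.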
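The interpolation of formula (4) can be sketched in Python as follows, assuming each component model is given as a dictionary from word to probability. The function name `adaptive_ew` and the toy distributions are illustrative assumptions.

```python
def adaptive_ew(p_query, p_hist_click, p_hist_query, alpha, beta):
    """Formula (4): mix the current query model with the two history models.

    alpha weights the current query model; beta splits the remaining
    (1 - alpha) between historical clicks (beta) and historical queries
    (1 - beta).
    """
    vocab = set(p_query) | set(p_hist_click) | set(p_hist_query)
    return {
        w: alpha * p_query.get(w, 0.0)
        + (1 - alpha)
        * (beta * p_hist_click.get(w, 0.0) + (1 - beta) * p_hist_query.get(w, 0.0))
        for w in vocab
    }


theta = adaptive_ew(
    p_query={"a": 1.0},
    p_hist_click={"b": 1.0},
    p_hist_query={"a": 0.5, "c": 0.5},
    alpha=0.5,
    beta=0.5,
)
```

Because each input is a probability distribution and the weights sum to one at every level, the mixed model is again a distribution over the combined vocabulary.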
The above is the adaptive retrieval model AdaptiveEW, in which all history items have equal weight. The other adaptive retrieval model holds that the importance of a history item depends on its time distance from the current query. The query model $p(\omega \mid \psi_k)$ of this adaptive retrieval model (AdaptiveDW) comprises two parts: the query model $p(\omega \mid \theta_k)$ and the historical click model $p(\omega \mid H_C)$. The weight of the historical click model $p(\omega \mid H_C)$ is $m_k$, and the weight of the query model $p(\omega \mid \theta_k)$ is $1-m_k$. The query model $p(\omega \mid \theta_k)$ in turn consists of the current query model $p(\omega \mid Q_k)$ and the query model of the previous moment $p(\omega \mid \theta_{k-1})$. The weight of the current query model $p(\omega \mid Q_k)$ is $n_k$, and the weight of the previous query model $p(\omega \mid \theta_{k-1})$ is $1-n_k$. The historical query model is computed according to formula (2), and the historical click model according to formula (3). In AdaptiveDW the weight of older query models decays over time, so that newer historical queries carry more weight than older ones. The formalization is shown in formula (5).
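$$p(\omega \mid \psi_k) = m_k\, p(\omega \mid H_C) + (1-m_k)\, p(\omega \mid \theta_k)$$
$$p(\omega \mid \theta_k) = n_k\, p(\omega \mid Q_k) + (1-n_k)\, p(\omega \mid \theta_{k-1}) \tag{5}$$
where $m_k$ is the weight of the historical click model $p(\omega \mid H_C)$, $1-m_k$ is the weight of the query model $p(\omega \mid \theta_k)$, $n_k$ is the weight of the current query model $p(\omega \mid Q_k)$, and $1-n_k$ is the weight of the query model of the previous moment $p(\omega \mid \theta_{k-1})$.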
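The recursive weighting of AdaptiveDW can be sketched as below. Several points are assumptions made for illustration and are not stated in the text: the base case $\theta_1 = p(\cdot \mid Q_1)$, the representation of models as word-to-probability dictionaries, and the names `mix` and `adaptive_dw`.

```python
def mix(p_a, p_b, weight_a):
    """Return weight_a * p_a + (1 - weight_a) * p_b over the joint vocabulary."""
    vocab = set(p_a) | set(p_b)
    return {
        w: weight_a * p_a.get(w, 0.0) + (1 - weight_a) * p_b.get(w, 0.0)
        for w in vocab
    }


def adaptive_dw(query_models, p_hist_click, n_weights, m_k):
    """Decayed query model combined with the historical click model.

    query_models: [p(.|Q_1), ..., p(.|Q_k)] in chronological order.
    n_weights:    [n_2, ..., n_k], one weight per update step.
    m_k:          weight of the historical click model in the final mix.
    """
    theta = dict(query_models[0])  # assumed base case: theta_1 = p(.|Q_1)
    for p_q, n in zip(query_models[1:], n_weights):
        # theta_i = n_i * p(.|Q_i) + (1 - n_i) * theta_{i-1}
        theta = mix(p_q, theta, n)
    # psi_k = m_k * p(.|H_C) + (1 - m_k) * theta_k
    return mix(p_hist_click, theta, m_k)


psi = adaptive_dw(
    query_models=[{"a": 1.0}, {"b": 1.0}],
    p_hist_click={"c": 1.0},
    n_weights=[0.5],
    m_k=0.5,
)
```

Each update multiplies the contribution of all earlier queries by $1-n_i$, which is how older queries end up with smaller weight than recent ones.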
J) Start retrieval
Among the documents to be retrieved, the results that match the personalized query are found and sorted in descending order of relevance probability. Each query returns 1000 documents.
After the personalized query is submitted to the retrieval system, the retrieval system returns the retrieval results. The data format of the personalized retrieval results is:
Figure BDA00001891411100172
The first column is the query number, the second column is the document number, the third column is the rank, and the fourth column is the language-model score. This completes the implementation of the whole adaptive personalized retrieval model.
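The descending sort and the four-column output described above can be sketched as follows. The input dictionary of scores, the document numbers, the function name `rank_results`, and ranks starting at 1 are all assumptions for illustration; the actual scoring is done by the retrieval system and is not reproduced here.

```python
def rank_results(query_id, scores, limit=1000):
    """Sort documents by descending score and keep the top `limit`.

    scores is a hypothetical dict mapping document number -> relevance score.
    Returns rows of (query number, document number, rank, score); ranks start
    at 1 by assumption.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        (query_id, doc_id, rank, score)
        for rank, (doc_id, score) in enumerate(ordered[:limit], start=1)
    ]


rows = rank_results("2", {"AP-1": -3.2, "AP-2": -1.5, "AP-3": -2.0}, limit=2)
```

With the default `limit=1000` the function returns at most 1000 rows per query, matching the number of documents stated above.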

Claims (10)

1. An adaptive personalized information retrieval system, characterized in that the system comprises:
a data input subsystem for forming a feature matrix from the current query information in combination with historical query information and historical click information, and for obtaining a trained parameter prediction model from the feature matrix;
a parameter training and prediction subsystem for training and applying the parameter prediction model according to the feature matrix to obtain predicted parameters;
a retrieval execution subsystem for organizing the current query, the historical queries and the historical clicks with the predicted parameters, and for combining the user model and the query model into a personalized query model; and
a data output subsystem for finding, among the documents to be retrieved, the documents that match the personalized query as preliminary retrieval results, sorting the preliminary retrieval results by relevance, and outputting the sorted results as the final retrieval results.
2. The adaptive personalized information retrieval system according to claim 1, characterized in that the data input subsystem comprises:
a module for generating user behavior features from the current query information, and
a module for forming the feature matrix from all of the obtained user behavior features.
3. The adaptive personalized information retrieval system according to claim 1, characterized in that the parameter training and prediction subsystem comprises:
a data input module for receiving the data to be processed;
a module for computing the historical queries and historical clicks corresponding to each query and organizing them into the required data format;
a module for forming the feature matrix;
a module for searching for the optimal parameters of the current query by traversal search, the traversal step size being 0.1;
a module for establishing a mapping between the user features and the optimal parameters using an SVM regression model.
4. The adaptive personalized information retrieval system according to claim 1, characterized in that the user behavior features comprise:
historical click features representing the web documents the user viewed within one query session;
historical query features representing the queries the user submitted to the retrieval system within one query session;
current query features representing the current query;
features between the current query and the historical queries, representing the relationship between them;
features between the current query and the historical clicks, representing the relationship between them.
5. The adaptive personalized information retrieval system according to claim 4, characterized in that
the historical click features comprise: the total number of historical clicks, the total length of historical clicks, the mean historical click length, the average length per click, the total length of the previous historical click, the number of documents clicked the last time, and the mean length of the documents clicked the last time;
the historical query features comprise: the total length of the historical queries, the average length of the historical queries, and the total number of historical queries;
the current query features representing the current query comprise: the current query length;
the features between the current query and the historical queries comprise: the repetition probability between the words newly added relative to the previous historical query and the previous historical click, the number of words newly added relative to the previous query, the percentage of the current query length accounted for by words co-occurring with the previous historical query, the mean similarity between the current query and the historical queries, the maximum similarity between the current query and the historical queries, the similarity between the current query and the previous historical query, the repetition probability between the words newly added relative to the previous historical query and the current query, the number of newly added words, the total number of occurrences of the newly added words, the repetition probability between the words deleted relative to the previous historical query and the previous historical query, the number of deleted words in the previous historical query, the total number of occurrences of the deleted words in the previous historical query, the repetition probability between the words co-occurring with the previous historical query and the previous historical query, the number of co-occurring words in the previous historical query, and the total number of occurrences of the co-occurring words in the previous historical query;
the features between the current query and the historical clicks comprise: the mean similarity between the current query terms and all historical clicks, the maximum similarity between the current query terms and all historical clicks, the similarity between the current query terms and the previous historical click, the number of words newly added in the current query relative to the previous historical click, the total number of occurrences of the newly added words in the previous historical click, the repetition probability between the words deleted relative to the previous historical query and the previous historical click, the number of deleted words, the number of deleted words in the previous historical click, the total number of occurrences of the deleted words in the previous historical click, the repetition probability between the words co-occurring with the previous historical click and the previous historical click, the number of co-occurring words, the number of co-occurring words in the previous historical click, and the total number of occurrences of the co-occurring words in the previous historical click.
6. An adaptive personalized information retrieval method, characterized in that the method comprises:
a step of forming a feature matrix from the current query information in combination with historical query information and historical click information;
a step of obtaining a trained parameter prediction model from the feature matrix;
a step of training and applying the parameter prediction model according to the feature matrix to obtain predicted parameters;
a step of organizing the current query, the historical queries and the historical clicks with the predicted parameters, and combining the user model and the query model into a personalized query model;
a step of finding, among the documents to be retrieved, the documents that match the personalized query model as preliminary retrieval results, sorting the preliminary retrieval results by relevance, and outputting the sorted results as the final retrieval result data.
7. The adaptive personalized information retrieval method according to claim 6, characterized in that the step of forming the feature matrix from the current query information in combination with historical query information and historical click information comprises:
a step of generating user behavior features from the current query information, and
a step of forming the feature matrix from all of the obtained user behavior features.
8. The adaptive personalized information retrieval method according to claim 6, characterized in that the step of training and applying the parameter prediction model according to the feature matrix to obtain predicted parameters further comprises:
a step of receiving the data to be processed;
a step of computing the historical queries and historical clicks corresponding to each query and organizing them into the required data format;
a step of forming the feature matrix;
a step of searching for the optimal parameters of the current query by traversal search, the traversal step size being 0.1;
a step of establishing a mapping between the user features and the optimal parameters using an SVM regression model.
9. The adaptive personalized information retrieval method according to claim 6, characterized in that the user behavior features comprise:
historical click features representing the web documents the user viewed within one query session;
historical query features representing the queries the user submitted to the retrieval system within one query session;
current query features representing the current query;
features between the current query and the historical queries, representing the relationship between them;
features between the current query and the historical clicks, representing the relationship between them.
10. The adaptive personalized information retrieval method according to claim 9, characterized in that
the historical click features comprise: the total number of historical clicks, the total length of historical clicks, the mean historical click length, the average length per click, the total length of the previous historical click, the number of documents clicked the last time, and the mean length of the documents clicked the last time;
the historical query features comprise: the total length of the historical queries, the average length of the historical queries, and the total number of historical queries;
the current query features representing the current query comprise: the current query length;
the features between the current query and the historical queries comprise: the repetition probability between the words newly added relative to the previous historical query and the previous historical click, the number of words newly added relative to the previous query, the percentage of the current query length accounted for by words co-occurring with the previous historical query, the mean similarity between the current query and the historical queries, the maximum similarity between the current query and the historical queries, the similarity between the current query and the previous historical query, the repetition probability between the words newly added relative to the previous historical query and the current query, the number of new words, the sum of the times
of new words, the current query compared with the previous historical query, the deletion The repetition probability of a word and the previous historical query, the number of deleted words in the previous historical query, the sum of the occurrence times of deleted words in the previous historical query, the current query compared with the previous historical query, the co-occurrence of words compared with the previous The repetition probability of historical queries, the number of co-occurring words in the previous historical query, and the sum of the times of co-occurring words in the previous historical query;所述当前查询和历史点击之间的特征包括:当前查询词与全部历史点击相似度均值,当前查询词与全部历史点击相似度最大值,当前查询词与上一个历史点击相似度,当前查询与上一个历史点击中新增词个数,新增词在上一历史点击出现次数总和,当前查询词与上一个历史查询相比,删减词与上一个历史点击的重复概率,删减词的数量,上一历史点击中删减词个数,在上一历史点击中删减词出现的次数总和,当前查询词与上一个历史点击相比,共现词与上一个历史点击的重复概率,共现词的数量,上一历史点击中共现词的数量,上一历史点击中共现词出现次数总和。The features between the current query and historical clicks include: the average similarity between the current query word and all historical clicks, the maximum similarity between the current query word and all historical clicks, the similarity between the current query word and the previous historical click, the current query and the previous historical click The number of new words in the last historical click, the sum of the number of new words in the previous historical click, the current query word compared with the previous historical query, the repetition probability of the deleted word and the previous historical click, the number of deleted words Quantity, the number of deleted words in the previous historical click, the sum of the times of deleted words in the previous historical click, the repetition probability of co-occurring words and the previous historical click compared with the current query word and the previous historical click, The number of co-occurrence words, the number of co-occurrence words in the last historical click, and the sum of the occurrence times of the last historical 
click.
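To make two of the claimed steps concrete, the following is a minimal Python sketch of (a) the new-word, deleted-word and co-occurring-word counts between the current query and the previous historical query, and (b) a parameter traversal with step size 0.1. It is an illustration under stated assumptions, not the patented implementation: the function names, the dictionary keys, the representation of queries as token lists, the parameter range [0, 1], and the `score` callback (a stand-in for whatever retrieval-quality measure is used) are all hypothetical. The claims do not define how the "repetition probability" is computed, so it is left out.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def query_change_features(current: List[str], previous: List[str]) -> Dict[str, float]:
    """Term-overlap features between the current query and the previous
    historical query. Keys are hypothetical names, not taken from the patent."""
    cur, prev = set(current), set(previous)
    prev_counts = Counter(previous)
    new_words = cur - prev        # terms only in the current query
    deleted_words = prev - cur    # terms only in the previous query
    shared_words = cur & prev     # co-occurring terms
    return {
        "num_new": len(new_words),
        "num_deleted": len(deleted_words),
        "num_shared": len(shared_words),
        "shared_occurrences_in_prev": sum(prev_counts[w] for w in shared_words),
        "deleted_occurrences_in_prev": sum(prev_counts[w] for w in deleted_words),
        # Assumption: "query length" is measured in distinct terms.
        "shared_fraction_of_current": len(shared_words) / len(cur) if cur else 0.0,
    }


def grid_search_weight(score: Callable[[float], float], step: float = 0.1) -> Tuple[float, float]:
    """Try every weight 0.0, 0.1, ..., 1.0 and return (best_weight, best_score).
    The [0, 1] range is an assumption; the claims only fix the step size."""
    n = round(1.0 / step)
    candidates = [round(i * step, 10) for i in range(n + 1)]
    best = max(candidates, key=score)
    return best, score(best)


if __name__ == "__main__":
    feats = query_change_features(
        ["cheap", "flights", "paris"],
        ["flights", "paris", "paris", "hotel"],
    )
    print(feats)
    # Toy score that peaks at 0.3, standing in for a real retrieval metric.
    print(grid_search_weight(lambda w: -(w - 0.3) ** 2))
```

In the claimed method, the best parameter found per query would then serve as the regression target for the SVM model, with the feature vector as input; that training step is not shown here.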
CN201210244519.5A | 2012-07-16 | 2012-07-16 | Self-adaptive personalized information retrieval system and method | Expired - Fee Related | CN102779193B (en)

Priority Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN201210244519.5A | CN102779193B (en) | 2012-07-16 | 2012-07-16 | Self-adaptive personalized information retrieval system and method

Publications (2)

Publication Number | Publication Date
CN102779193A (en) | 2012-11-14
CN102779193B (en) | 2015-05-13

Family

ID=47124105

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201210244519.5A | Expired - Fee Related | CN102779193B (en) | 2012-07-16 | 2012-07-16 | Self-adaptive personalized information retrieval system and method

Country Status (1)

Country | Link
CN (1) | CN102779193B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
WO2007051397A1 (en) * | 2005-11-01 | 2007-05-10 | Huawei Technologies Co., Ltd. | An information retrieval system and information retrieval method
CN101127043A (en) * | 2007-08-03 | 2008-02-20 | 哈尔滨工程大学 | A lightweight personalized search engine and its search method
CN102346899A (en) * | 2011-10-08 | 2012-02-08 | 亿赞普(北京)科技有限公司 | Method and device for predicting advertisement click rate based on user behaviors

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN104346160A (en) * | 2013-08-09 | 2015-02-11 | 联想(北京)有限公司 | Method for processing information and electronic equipment
CN104346160B (en) * | 2013-08-09 | 2018-02-27 | 联想(北京)有限公司 | The method and electronic equipment of information processing
CN104462146A (en) * | 2013-09-24 | 2015-03-25 | 北京千橡网景科技发展有限公司 | Method and device for information retrieval
CN104516897A (en) * | 2013-09-29 | 2015-04-15 | 国际商业机器公司 | Method and device for sorting application objects
CN104516897B (en) * | 2013-09-29 | 2018-03-02 | 国际商业机器公司 | A kind of method and apparatus being ranked up for application
CN104679771B (en) * | 2013-11-29 | 2018-09-18 | 阿里巴巴集团控股有限公司 | A kind of individuation data searching method and device
CN104679771A (en) * | 2013-11-29 | 2015-06-03 | 阿里巴巴集团控股有限公司 | Individual data searching method and device
CN104778176A (en) * | 2014-01-13 | 2015-07-15 | 阿里巴巴集团控股有限公司 | Data search processing method and device
CN104951637B (en) * | 2014-03-25 | 2018-04-03 | 腾讯科技(深圳)有限公司 | A kind of method and device for obtaining training parameter
CN104951637A (en) * | 2014-03-25 | 2015-09-30 | 腾讯科技(深圳)有限公司 | Method and device for obtaining training parameters
US9892368B2 (en) | 2014-03-25 | 2018-02-13 | Tencent Technology (Shenzhen) Company Limited | Method and apparatus for acquiring training parameters to calibrate a model
CN104462357B (en) * | 2014-12-08 | 2017-11-17 | 百度在线网络技术(北京)有限公司 | The method and apparatus for realizing personalized search
CN104462357A (en) * | 2014-12-08 | 2015-03-25 | 百度在线网络技术(北京)有限公司 | Method and device for realizing personalized search
CN104537502A (en) * | 2015-01-15 | 2015-04-22 | 北京嘀嘀无限科技发展有限公司 | Method and device for processing orders
CN105022787A (en) * | 2015-06-12 | 2015-11-04 | 广东小天才科技有限公司 | Composition pushing method and device
CN105095357A (en) * | 2015-06-24 | 2015-11-25 | 百度在线网络技术(北京)有限公司 | Method and device for processing consultation data
CN105045875B (en) * | 2015-07-17 | 2018-06-12 | 北京林业大学 | Personalized search and device
CN105045875A (en) * | 2015-07-17 | 2015-11-11 | 北京林业大学 | Personalized information retrieval method and apparatus
CN107423298B (en) * | 2016-05-24 | 2021-02-19 | 北京百度网讯科技有限公司 | Searching method and device
CN107423298A (en) * | 2016-05-24 | 2017-12-01 | 北京百度网讯科技有限公司 | A kind of searching method and device
US11500954B2 (en) | 2017-02-28 | 2022-11-15 | Huawei Technologies Co., Ltd. | Learning-to-rank method based on reinforcement learning and server
WO2018157625A1 (en) * | 2017-02-28 | 2018-09-07 | 华为技术有限公司 | Reinforcement learning-based method for learning to rank and server
CN107133321A (en) * | 2017-05-04 | 2017-09-05 | 广东神马搜索科技有限公司 | The analysis method and analytical equipment of the search attribute of the page
CN107229948A (en) * | 2017-05-19 | 2017-10-03 | 四川新网银行股份有限公司 | A kind of method for reducing customer churn on line based on customer problem forecast model
CN107256267A (en) * | 2017-06-19 | 2017-10-17 | 北京百度网讯科技有限公司 | Querying method and device
CN107256267B (en) * | 2017-06-19 | 2020-07-24 | 北京百度网讯科技有限公司 | Inquiry method and device
CN108345696A (en) * | 2018-03-20 | 2018-07-31 | 广东欧珀移动通信有限公司 | Card sort method, device, server and storage medium
CN114021019A (en) * | 2021-11-10 | 2022-02-08 | 中国人民大学 | Retrieval method integrating personalized search and search result diversification
CN114021019B (en) * | 2021-11-10 | 2024-03-29 | 中国人民大学 | Retrieval method integrating personalized search and diversification of search results
CN115016873A (en) * | 2022-05-05 | 2022-09-06 | 上海乾臻信息科技有限公司 | Front-end data interaction method and system, electronic equipment and readable storage medium

Also Published As

Publication number | Publication date
CN102779193B (en) | 2015-05-13

Similar Documents

Publication | Title
CN102779193A (en) | Self-adaptive personalized information retrieval system and method
CN102609433B (en) | Method and system for recommending query based on user log
CN100507920C (en) | A method for reordering search engine retrieval results based on user behavior information
CN109918563B (en) | Book recommendation method based on public data
CN101501630B (en) | Method for ranking computerized search result list and its database search engine
CN102054004B (en) | Webpage recommendation method and device adopting same
CN100440224C (en) | An automatic processing method for search engine performance evaluation
CN112632359A (en) | Information recommendation method and device, electronic equipment and storage medium
CN111797214A (en) | Question screening method, device, computer equipment and medium based on FAQ database
CN107704467B (en) | Search quality evaluation method and device
WO2021184674A1 (en) | Text keyword extraction method, electronic device, and computer readable storage medium
CN101464897A (en) | Word matching and information query method and device
CN107808278A (en) | A kind of Github open source projects based on sparse self-encoding encoder recommend method
CN104268142B (en) | Based on the Meta Search Engine result ordering method for being rejected by strategy
CN103399891A (en) | Method, device and system for automatic recommendation of network content
JP6056610B2 (en) | Text information processing apparatus, text information processing method, and text information processing program
CN101127046A (en) | Method and system for sequencing to blog article
CN101556603A (en) | Coordinate search method used for reordering search results
CN1996316A (en) | Search engine searching method based on web page correlation
CN103049470A (en) | Opinion retrieval method based on emotional relevancy
CN103688256A (en) | Method, device and system for determining video quality parameters based on comment information
CN113486232B (en) | Query method, device, server, medium and product
CN118568355B (en) | Personalized data retrieval method and system based on artificial intelligence
CN101814085A (en) | WEB data bank selection method based on WDB (World Data Bank) characteristics and user query requests
CN114090877A (en) | Position information recommendation method and device, electronic equipment and storage medium

Legal Events

Code | Title
C06 | Publication
PB01 | Publication
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
C14 | Grant of patent or utility model
GR01 | Patent grant
TR01 | Transfer of patent right

Effective date of registration: 2020-03-30

Address after: No. 118 West Dazhi Street, Nangang District, Harbin, Heilongjiang 150001

Patentee after: Harbin University of technology high tech Development Corp.

Address before: No. 92 West Dazhi Street, Nangang District, Harbin 150001

Patentee before: HARBIN INSTITUTE OF TECHNOLOGY

CF01 | Termination of patent right due to non-payment of annual fee

Granted publication date: 2015-05-13

