A personalized literature retrieval method
Technical field
The present invention relates to the technical field of literature and information retrieval, and in particular to a personalized literature retrieval method.
Background art
Literature retrieval is the process of obtaining documents as needed for study and work. Most existing literature retrieval systems are built on static information about the documents themselves, such as their attributes, keywords, authors and references. They do not take the characteristics of the person requesting or performing the search into account, so anyone who enters the same search keywords obtains the same results. In this era of information explosion, literature retrieval also faces massive result sets. If the identity and characteristics of the searcher could be brought into the retrieval process and the results adapted to that individual, far more useful results would be obtained. For example, the results returned to a logistics researcher searching for "network" and those returned to a fiber-optic communication researcher entering the same keyword should differ, each reflecting research achievements in that person's own field. In other words, literature retrieval should be personalized according to the searcher's identity.
Chinese patent document CN 101373486, published on February 25, 2009, discloses a personalized summarization system based on a user interest model. The system consists of a web information retrieval unit, a user interest unit and a personalized summary unit. It analyzes users' search logs with a conceptual clustering method to build and/or update a user interest model described by a hierarchical concept structure. It then uses this interest model and the search results to analyze the similarity between the user's interests and the sentences in the search results, producing a summary tailored to the user. Because the personalized summary is obtained by personalized sentence scoring, it fully considers the user's interest characteristics and matches summary generation to those interests, which can improve both the usefulness of the summary and user satisfaction.
In the prior art represented by the above patent document, an interest model is used to analyze the similarity between the user's interests and the sentences in the search results in order to obtain a summary tailored to the user. However, this requires parsing sentence similarity, the accuracy of the resulting personalized summaries is not high enough, and the retrieval procedure is complex. Moreover, users of literature retrieval systems are mostly professional researchers whose searches mainly target specialized research literature; an automatically generated summary is a poor match for such search results.
Summary of the invention
In view of the defects and shortcomings of the prior art described above, the present invention aims to provide a personalized literature retrieval method. When searching with this method, the user's interest keywords and their corresponding interest degrees are added to the retrieval, and every retrieval result is adjusted on the basis of the user's interest keyword library, so that personalized search results are output. The output results are more accurate, and the retrieval method is simple.
The present invention is realized by the following technical solution:
A personalized literature retrieval method, characterized by the following steps:
A. Build a static user information library for each user, including but not limited to the user's identity information and research fields, which the user enters into the retrieval system;
B. Build an interest keyword library X for each user, containing multiple interest keywords and the interest degree corresponding to each. Formally, X = {x1, x2, ..., xm}, where m is a positive integer and each element x = (k, w), with k an interest keyword and w its interest degree. X is initialized with the fields of interest entered by the user in step A, all of which are assigned the same static interest degree;
C. Information retrieval: when the user performs a search, let Q be the set of entered keywords. Retrieving with Q yields the results R1, R2, ..., Rn, where n is a positive integer. Each interest keyword in the user's interest keyword library X is then added to the keyword set in turn and the retrieval is repeated. If the results obtained share elements with R1, R2, ..., Rn, those shared elements are moved forward in the ranking, by a distance determined by the interest degree of that interest keyword.
If the user's interest keyword library X contains m interest keywords, m such retrievals are performed, and the search results after all adjustments are output as the final result.
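Steps B and C can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the patented implementation: X is a plain dict {k: w}; the names seed_interests, move_forward, personalized_search and toy_search, the initial degree 1.0 and the toy corpus are all hypothetical; each interest keyword's adjustment is applied cumulatively to the running ranking; shared documents are moved in the order of the expanded result list; and the rule turning w into a distance is passed in as a function (here Python's built-in round), since step C only says the distance depends on w.

```python
from typing import Callable

# Hypothetical: step B only requires one shared "static value" for seed keywords.
INITIAL_INTEREST = 1.0

# Hypothetical search backend: maps a keyword set to a ranked list of document ids.
SearchFn = Callable[[frozenset[str]], list[str]]


def seed_interests(research_fields: list[str],
                   initial: float = INITIAL_INTEREST) -> dict[str, float]:
    """Step B: X = {k: w}, seeded from the step A fields with one shared degree."""
    return {k: initial for k in research_fields}


def move_forward(ranking: list[str], doc: str, distance: int) -> None:
    """Move `doc` up by `distance` places in place, never above the first place."""
    i = ranking.index(doc)
    ranking.insert(max(0, i - distance), ranking.pop(i))


def personalized_search(query: set[str], interests: dict[str, float],
                        search: SearchFn,
                        distance: Callable[[float], int]) -> list[str]:
    """Step C: retrieve with Q, then re-rank once per interest keyword (m times)."""
    ranking = list(search(frozenset(query)))            # R1, R2, ..., Rn
    for keyword, w in interests.items():
        expanded = search(frozenset(query | {keyword}))  # Q plus one interest keyword
        for doc in expanded:
            if doc in ranking:                            # element repeated in both lists
                move_forward(ranking, doc, distance(w))
    return ranking


# Toy in-memory corpus: Boolean AND retrieval, ranked by document id.
CORPUS = {
    "d1": {"network", "optical"},
    "d2": {"network", "routing"},
    "d3": {"network", "logistics"},
    "d4": {"network", "optical", "fiber"},
}


def toy_search(terms: frozenset[str]) -> list[str]:
    return sorted(d for d, words in CORPUS.items() if terms <= words)


logistics_user = seed_interests(["logistics"])
optics_user = seed_interests(["optical"])
print(personalized_search({"network"}, logistics_user, toy_search, round))
# ['d1', 'd3', 'd2', 'd4']
print(personalized_search({"network"}, optics_user, toy_search, round))
# ['d1', 'd2', 'd4', 'd3']
```

With the same query "network", the logistics researcher and the optics researcher receive differently ordered results, which is the behavior the background section calls for.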
Updating the interest keyword library X: whenever the user enters search keywords, they are added to X as new interest keywords, and the interest degree of each is initialized to a static value. If a search keyword t already exists in X, the interest degree w of that interest keyword is increased by 1.
In addition, after every retrieval the interest degrees of all interest keywords are decayed, that is, reduced by a value e. This value reflects how fast interest fades; it can be a fixed value such as 0.01, or it can be determined by adaptive learning from the user's search habits. If an interest degree decays to 0 or below, the corresponding interest keyword is deleted from X, which keeps the interest keyword library up to date.
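The update and decay rules can be sketched as follows. This is a minimal Python illustration, not the patented implementation. It assumes X is a plain dict from keyword to interest degree, that the +1 reinforcement for a retrieval is applied before that retrieval's decay, and that the initial value 1.0 is a hypothetical choice for the unspecified static value; the name update_interests is also hypothetical. The default e = 0.01 is the example value from the text, and the adaptive variant of e is not modeled.

```python
# Hypothetical default for the unspecified "static value"; 0.01 is the text's example e.
INITIAL_INTEREST = 1.0
DECAY_E = 0.01


def update_interests(interests: dict[str, float], search_keywords: list[str],
                     initial: float = INITIAL_INTEREST,
                     e: float = DECAY_E) -> dict[str, float]:
    """Apply one retrieval's update to X in place and return it."""
    # Reinforce: new search keywords enter X at `initial`; existing ones gain 1.
    for t in search_keywords:
        if t in interests:
            interests[t] += 1
        else:
            interests[t] = initial
    # Decay: every interest degree drops by e after the retrieval, and keywords
    # whose degree falls to 0 or below are removed from X.
    for k in list(interests):
        interests[k] -= e
        if interests[k] <= 0:
            del interests[k]
    return interests


# A coarse e = 0.25 keeps the arithmetic easy to follow.
X = {"logistics": 0.25, "routing": 2.0}
print(update_interests(X, ["routing", "warehousing"], e=0.25))
# {'routing': 2.75, 'warehousing': 0.75}
```

Here "routing" is reinforced to 3.0 and then decayed, "warehousing" enters at the initial value, and "logistics" decays to 0 and is dropped.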
The keyword set contains both interest keywords and search keywords.
Compared with the prior art, the present invention has the following beneficial effects:
1. When retrieving with steps A, B and C of the present invention, an interest keyword library X is first built for each user. During retrieval, the search keywords are used first to obtain results; the user's interest keywords are then added to the keyword set to obtain further results; finally, the elements that appear in both are moved forward in the ranking by a distance determined by the interest degree of the corresponding interest keyword. In this way every retrieval result is adjusted on the basis of the user's interest keyword library, and personalized results are output that better match the user's needs.
2. The method updates the interest keyword library X dynamically according to each user's search behavior, so that the system's understanding of the user keeps deepening, future results match the user's interests more closely, and retrieval becomes more accurate.
Detailed description of the embodiments
As a preferred embodiment of the invention, a personalized literature retrieval method is disclosed, with the following steps:
A. Build a static user information library for each user, including but not limited to the user's identity information and research fields, which the user enters into the retrieval system;
B. Build an interest keyword library X for each user, containing multiple interest keywords and the interest degree corresponding to each. Formally, X = {x1, x2, ..., xm}, where m is a positive integer and each element x = (k, w), with k an interest keyword and w its interest degree. X is initialized with the fields of interest entered by the user in step A, all of which are assigned the same static interest degree;
C. Information retrieval: when the user performs a search, let Q be the set of entered keywords. Retrieving with Q yields the results R1, R2, ..., Rn, where n is a positive integer. Each interest keyword in the user's interest keyword library X is then added to the keyword set in turn and the retrieval is repeated. If the results obtained share elements with R1, R2, ..., Rn, those shared elements are moved forward in the ranking, by a distance that scales linearly with the interest degree w of that interest keyword.
If the user's interest keyword library X contains m interest keywords, m such retrievals are performed, and the search results after all adjustments are output as the final result.
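In this embodiment the forward shift is linear in w. A minimal sketch of that rule follows; the scale factor alpha, the truncation to whole positions and the names linear_distance and new_rank are hypothetical, since the text specifies only a linear relation. Ranks are counted from 1 and clamped so that a result cannot move above the top.

```python
def linear_distance(w: float, alpha: float = 1.0) -> int:
    """Forward shift proportional to interest degree w, truncated to whole places.

    alpha is a hypothetical scale factor; the embodiment only says "linear".
    """
    return max(0, int(alpha * w))


def new_rank(rank: int, w: float, alpha: float = 1.0) -> int:
    """1-based rank of a repeated result after the shift, clamped at rank 1."""
    return max(1, rank - linear_distance(w, alpha))


print(linear_distance(1.5, alpha=2.0))  # 3
print(new_rank(5, 1.5, alpha=2.0))      # 2
print(new_rank(2, 2.75, alpha=2.0))     # 1
```

Because interest degrees grow by 1 per repeated search and shrink by e per retrieval, a linear rule lets frequently searched topics push their documents up further than topics the user touched only once.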
Updating the interest keyword library X: whenever the user enters search keywords, they are added to X as new interest keywords, and the interest degree of each is initialized to a static value. If a search keyword t already exists in X, the interest degree w of that interest keyword is increased by 1.
In addition, after every retrieval the interest degrees of all interest keywords are decayed, that is, reduced by a value e. This value reflects how fast interest fades; it can be a fixed value such as 0.01, or it can be determined by adaptive learning from the user's search habits. If an interest degree decays to 0 or below, the corresponding interest keyword is deleted from X, which keeps the interest keyword library up to date.
In this embodiment, the keyword set contains both interest keywords and search keywords.
In practical application, the method maintains a dynamic interest keyword library X for each user, containing the user's interest keywords and their interest degrees, and adjusts each retrieval result on the basis of this library to output personalized results. At the same time, the library is adjusted dynamically according to each user's search behavior, so that the system's understanding of the user keeps deepening, future results match the user's interests more closely, and retrieval becomes more accurate.