A data processing method based on user logs
Technical field
The present invention relates to the field of information technology, and in particular to a data processing method based on user logs.
Background art
Log files are produced while a system runs and record both the operating state of the system and the operations performed by users. When the system runs slowly or abnormally, the problem can be diagnosed by inspecting the log files so that normal operation is restored. User logs are also an important source of information: on social networking sites or commercial websites, mining the user logs can reveal users' latent access patterns, which helps in designing web pages that are more convenient to access.
In the search field, log-based query recommendation is divided into association-rule recommendation, clustering recommendation, and time-distribution recommendation. In the association-rule method, query phrases are treated as the items of an association rule and the query log is treated as a set of sessions, so that the high-frequency terms in a session are recommended. The clustering method clusters query strings to find related queries; it requires a large amount of rich log data. Time-distribution recommendation assumes that similar queries have similar search-frequency distributions over time, and that particular points in time usually have particular queries and recommendations; this kind of method can supplement the other methods.
In the traditional query approach, the server computes the relevant query fields only when the user issues a query. It cannot compute in real time, the computation load is large, queries are relatively slow, and the demands on the database are high, so it no longer meets the development needs of present retrieval systems.
Summary of the invention
The object of the invention is to overcome the above problems of the prior art by providing a data processing method based on user logs. The invention computes in real time and can therefore retrieve query results faster.
To achieve the above technical purpose and technical effect, the invention is realized through the following technical solution:
A data processing method based on user logs, comprising the following steps:
S101: collect user logs in real time, select among the collected user logs to obtain the valid user logs, and build a first data set;
S102: tag the user logs in the first data set, and build a second data set from the tagged user logs;
S103: perform real-time precomputation on the second data set, and build a dynamic precomputed data set;
S104: match the user's query field against the data in the dynamic precomputed data set; successfully matched data is pushed to the user as the query result, and if the match fails, proceed to the next step;
S105: extract from the second data set the user log data that is similar to the user's query field, and build a third data set;
S106: classify the user log data in the third data set, either by grouping identical or similar query fields under one query string, or by grouping user logs with the same cluster tag, or by grouping user logs with the same query frequency and time; the classification module builds a fourth data set;
S107: establish a linear regression model according to the query rules, feed the user logs that match the query field into the linear regression model to obtain the processed composite model, and compute the relevance of each query field;
S108: in the fourth data set, find the user logs that match the query field entered by the user, take them as the query set, and build a fifth data set;
S109: in the fifth data set, sort by the relevance obtained by the first data processing module, and finally select N results as the query result to be pushed to the user.
Preferably, the user logs in S101 are collected in real time by a user log collection end.
Preferably, the user log collection end allows user logs to be customized, and selectively collects user logs according to a custom log format, log type, log content, and log key characters.
Preferably, the user logs collected by the user log collection end are stored temporarily.
Preferably, the tags in S102 include: historical query field, query string, time, and cluster name.
Preferably, the precomputation in S103 calculates the probability that the user will query again from the frequency of the historical query fields and the source of the user logs, and sorts the results by probability.
Preferably, 1 ≤ N ≤ 10, where N is an integer.
The beneficial effects of the invention are as follows:
The invention is based on user logs and computes in real time, so recommendation results can be retrieved quickly. The precomputation module of the system computes results in advance, and the matching module then matches the query against them; if the match succeeds, the result is pushed directly to the user. Precomputing results improves the efficiency of pushing results; if no result was precomputed, the result is computed on demand.
The above is only an overview of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and practiced according to the content of the description, preferred embodiments of the invention are described in detail below with reference to the accompanying drawing. The specific embodiments of the invention are given in detail by the following examples and the accompanying drawing.
Brief description of the drawings
To illustrate the technical solution of the embodiments of the invention more clearly, the drawings needed to describe the embodiments are briefly introduced below. The drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is the flow chart of the present invention.
Detailed description of the embodiments
The technical solution in the embodiments of the invention is described clearly and completely below with reference to the accompanying drawing. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative effort fall within the scope of protection of the invention.
With reference to Fig. 1, a data processing method based on user logs is provided. The method is based on a user log collection end and a data processing end. The user log collection end collects the operation logs of the user side in real time and transmits the collected user logs to the data processing end.
The user log collection end allows user logs to be customized: it selectively collects user logs according to a custom log format, log type, log content, and log key characters. The user logs it collects are stored temporarily.
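The selective collection described above can be sketched as follows. This is a minimal illustration only: the `LogRule` and `LogCollector` names, the dictionary-based log entries, the field names `type` and `content`, and the bounded in-memory buffer used as temporary storage are all assumptions, not part of the described method.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class LogRule:
    """Hypothetical user-defined collection rule."""
    log_types: set            # accepted log types, e.g. {"search"}
    required_fields: tuple    # fields the custom log format must contain
    keyword: str = ""         # key characters the log content must contain

    def accepts(self, entry: dict) -> bool:
        # Check the log type first, then the format, then the content.
        if entry.get("type") not in self.log_types:
            return False
        if any(field not in entry for field in self.required_fields):
            return False
        return self.keyword in str(entry.get("content", ""))


class LogCollector:
    """Collects only the entries accepted by the rule and buffers them."""

    def __init__(self, rule: LogRule, capacity: int = 1000):
        self.rule = rule
        # Assumed temporary storage: a bounded in-memory queue.
        self.buffer = deque(maxlen=capacity)

    def collect(self, entry: dict) -> bool:
        if self.rule.accepts(entry):
            self.buffer.append(entry)
            return True
        return False

    def flush(self) -> list:
        # Hand the buffered logs over (e.g. to the data processing end).
        batch = list(self.buffer)
        self.buffer.clear()
        return batch
```

A collector built with `LogRule(log_types={"search"}, required_fields=("user", "content"))` would keep search logs that carry both fields and drop everything else until `flush()` is called.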
The data processing end performs computation on the user logs collected in real time. Results that have been precomputed allow the query result to be pushed very quickly; if a result has not been precomputed, it is computed on demand.
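The precompute-then-match behaviour can be sketched roughly as below. The source states only that the re-query probability is derived from the frequency of historical query fields and the source of the logs; the source-weighted frequency used here, the `query` and `source` field names, and the `fallback` callable are hypothetical choices made for illustration.

```python
from collections import Counter


def precompute(tagged_logs, source_weight=None):
    """Estimate, per historical query field, the probability of a re-query.

    Assumption: probability = source-weighted frequency / total weight.
    Returns a dict ordered from most to least probable.
    """
    source_weight = source_weight or {}
    scores = Counter()
    for log in tagged_logs:
        # Unknown sources get a neutral weight of 1.0.
        scores[log["query"]] += source_weight.get(log.get("source"), 1.0)
    total = sum(scores.values())
    if total == 0:
        return {}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return {query: score / total for query, score in ranked}


def answer(query, precomputed, fallback):
    """Return the precomputed entry on a match, otherwise compute on demand."""
    if query in precomputed:
        return ("precomputed", precomputed[query])
    return ("computed", fallback(query))
```

The dictionary lookup is what makes the matched path fast; only a miss pays for the slower fallback computation.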
Specifically, the method comprises the following steps:
S101: collect user logs in real time, select among the collected user logs to obtain the valid user logs, and build a first data set.
S102: tag the user logs in the first data set, and build a second data set from the tagged user logs. The tags include: historical query field, query string, time, and cluster name.
S103: perform real-time precomputation on the second data set, and build a dynamic precomputed data set. The precomputation calculates the probability that the user will query again from the frequency of the historical query fields and the source of the user logs, and sorts the results by probability.
S104: match the user's query field against the data in the dynamic precomputed data set. Successfully matched data is pushed to the user as the query result; if the match fails, proceed to the next step.
S105: extract from the second data set the user log data that is similar to the user's query field, and build a third data set.
S106: classify the user log data in the third data set, either by grouping identical or similar query fields under one query string, or by grouping user logs with the same cluster tag, or by grouping user logs with the same query frequency and time; the classification module builds a fourth data set.
S107: establish a linear regression model according to the query rules, feed the user logs that match the query field into the linear regression model to obtain the processed composite model, and compute the relevance of each query field.
S108: in the fourth data set, find the user logs that match the query field entered by the user, take them as the query set, and build a fifth data set.
S109: in the fifth data set, sort by the relevance obtained by the first data processing module, and finally select N results as the query result to be pushed to the user.
Here 1 ≤ N ≤ 10, and N is an integer.
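The fallback path of steps S105 to S109 might look roughly like the sketch below. The source does not specify the similarity measure, the regression features, or the training data, so this sketch substitutes a linear scoring function with fixed, hypothetical weights for the fitted linear regression model, and uses `difflib.SequenceMatcher` as an assumed similarity measure. The function name `top_n` and the `min_sim` threshold are likewise illustrative.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Assumed similarity measure: character-level matching ratio in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


def top_n(query, tagged_logs, n=5, min_sim=0.5, weights=(0.0, 0.8, 0.2)):
    """Rank historical query fields for a query that missed the precomputed set."""
    if not 1 <= n <= 10:
        raise ValueError("N must be an integer between 1 and 10")

    # S105: keep logs whose query field is similar to the user's query.
    similar = [log for log in tagged_logs
               if similarity(query, log["query"]) >= min_sim]

    # S106: group by query string and count occurrences per group.
    freq = Counter(log["query"] for log in similar)
    if not freq:
        return []
    max_freq = max(freq.values())

    # S107 (stand-in): linear relevance score with hypothetical fixed weights,
    # relevance = b0 + w_sim * similarity + w_freq * normalized frequency.
    b0, w_sim, w_freq = weights
    relevance = {
        q: b0 + w_sim * similarity(query, q) + w_freq * count / max_freq
        for q, count in freq.items()
    }

    # S108/S109: sort the candidates by relevance and keep the first N.
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    return ranked[:n]
```

In a fuller implementation the weights would come from fitting the regression model to observed data rather than being hard-coded; the ranking and top-N selection would stay the same.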
The above method is based on user logs and computes in real time, so recommendation results can be retrieved quickly. The precomputation module of the system computes results in advance, and the matching module then matches the query against them; if the match succeeds, the result is pushed directly to the user. Precomputing results improves the efficiency of pushing results; if no result was precomputed, the result is computed on demand.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.