Disclosure of Invention
In the course of research, the applicant found that users with different usage habits favor different search results. Taking the differences between users into account during ranking therefore helps obtain rankings closer to each user's behavior pattern.
The present application provides a search result ranking method and apparatus, which aim to solve the problem of how to take differences between users into account when ranking search results, so as to obtain rankings closer to the users' behavior patterns.
In order to achieve the above object, the present application provides the following technical solutions:
A search result ranking method, comprising:
obtaining search results according to a search keyword input by a user;
selecting a target scene from historical scenes according to the search keyword and historical behavior data of the user, wherein the historical scenes are determined by historical search keywords and sample behavior data, and the sample behavior data is data on user behaviors that occurred on the search results obtained for the historical search keywords;
obtaining ranking weights corresponding to the target scene;
and ranking the search results based in part on the ranking weights.
Optionally, the selecting a target scene from the historical scenes according to the search keyword and the historical behavior data of the user includes:
extracting first-class data, wherein the first-class data is scene feature data extracted from a scene formed by the search keyword and the historical behavior data of the user;
extracting second-class data, wherein the second-class data is scene feature data extracted from the historical scenes;
searching the second-class data for the data closest to the first-class data to obtain target data;
and taking the historical scene to which the target data belongs as the target scene.
Optionally, the process of determining the historical scenes includes:
extracting scene feature data for each sample, each sample comprising a combination of a historical search keyword and sample behavior data;
and clustering the scene feature data of a plurality of samples to obtain cluster centers, wherein each cluster center is a historical scene.
Optionally, the method of generating the ranking weights corresponding to a historical scene includes:
taking the sample behavior data of the historical scene and the historical search keywords as positive sample data;
taking the historical search keyword of the historical scene, together with the attributes of those of its search result objects on which no user behavior occurred, as negative sample data;
acquiring the weight of the positive sample and the weight of the negative sample, wherein the weight of the positive sample of any historical scene is determined by the weighted sum of the user behaviors that occurred on the search result objects of that historical scene, and the weight of the negative sample of any historical scene is determined by the inverse of the number of times that the objects of user behavior among the search result objects of that historical scene appear across all historical scenes;
and determining the ranking weights corresponding to the historical scene according to the positive sample data, the negative sample data, the weight of the positive sample, and the weight of the negative sample.
Optionally, the historical behavior data of the user includes:
the identity of the user, the type of the user's historical behavior, and the attributes of the user's historical behavior objects;
the sample behavior data includes: the identity of the user performing the behavior, the type of behavior, and the properties of the behavior object.
A search result ranking apparatus, comprising:
a search module, configured to obtain search results according to a search keyword input by a user;
a selection module, configured to select a target scene from historical scenes according to the search keyword and historical behavior data of the user, wherein the historical scenes are determined by historical search keywords and sample behavior data, and the sample behavior data is data on user behaviors that occurred on the search results obtained for the historical search keywords;
an acquisition module, configured to acquire ranking weights corresponding to the target scene;
and a ranking module, configured to rank the search results based on the ranking weights.
Optionally, the selection module is specifically configured to:
extract first-class data, wherein the first-class data is scene feature data extracted from a scene formed by the search keyword and the historical behavior data of the user; extract second-class data, wherein the second-class data is scene feature data extracted from the historical scenes; search the second-class data for the data closest to the first-class data to obtain target data; and take the historical scene to which the target data belongs as the target scene.
Optionally, the apparatus further comprises:
a scene determination module, configured to determine the historical scenes as follows: extracting scene feature data for each sample, each sample comprising a combination of a historical search keyword and sample behavior data; and clustering the scene feature data of a plurality of samples to obtain cluster centers, wherein each cluster center is a historical scene.
Optionally, the apparatus further comprises:
a ranking weight determination module, configured to: take the sample behavior data of a historical scene and the historical search keywords as positive sample data; take the historical search keyword of the historical scene, together with the attributes of those of its search result objects on which no user behavior occurred, as negative sample data; acquire the weight of the positive sample and the weight of the negative sample, wherein the weight of the positive sample of any historical scene is determined by the weighted sum of the user behaviors that occurred on the search result objects of that historical scene, and the weight of the negative sample of any historical scene is determined by the inverse of the number of times that the objects of user behavior among the search result objects of that historical scene appear across all historical scenes; and determine the ranking weights corresponding to the historical scene according to the positive sample data, the negative sample data, the weight of the positive sample, and the weight of the negative sample.
Optionally, the historical behavior data of the user includes:
the identity of the user, the type of the user's historical behavior, and the attributes of the user's historical behavior objects;
the sample behavior data includes: the identity of the user performing the behavior, the type of behavior, and the properties of the behavior object.
A computer-readable medium storing instructions which, when executed on a computer, cause the computer to perform the following functions: obtaining search results according to a search keyword input by a user; selecting a target scene from historical scenes according to the search keyword and historical behavior data of the user, wherein the historical scenes are determined by historical search keywords and sample behavior data, and the sample behavior data is data on user behaviors that occurred on the search results obtained for the historical search keywords; obtaining ranking weights corresponding to the target scene; and ranking the search results based in part on the ranking weights.
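According to the search result ranking method and apparatus of the present application, a scene is determined from the user's historical behavior data and search keyword, a target scene is selected from the historical scenes, and the ranking weights corresponding to the target scene are obtained and used to rank the search results. Because these weights depend on the user's historical behavior, the resulting ranking is closer to that user's behavior pattern.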
Detailed Description
An embodiment of the present application discloses a search result ranking method that can be applied to a search engine. The search engine may be deployed in a website, for example as the search engine of an e-commerce website.
After receiving the keyword input by a user, the search engine can rank the search results according to the user's historical behavior data and the keyword, thereby obtaining a ranking closer to the user's behavior pattern.
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
Fig. 1 shows a search result ranking method disclosed in an embodiment of the present application. Applied to a search engine, the method includes the following steps:
S101: Receive the search keyword input by the user.
S102: Search according to the search keyword to obtain search results.
S103: Obtain the user's historical behavior data.
Specifically, after the search keyword input by the user is received, the user identifier may be obtained, and the user's historical behavior data may then be retrieved according to the user identifier.
In this embodiment, the historical behavior data of the user is data generated by the historical operation behavior of the user on the website, and may be obtained from a server of the website. Historical behavioral data of a user includes, but is not limited to: the identity of the user, the type of historical behavior of the user, and the attributes of the historical behavior object of the user.
Taking an e-commerce website as an example, the user's historical behavior data includes, but is not limited to: a user name, the types of historical behavior associated with that user name (including but not limited to clicking, browsing, adding to favorites, adding to the shopping cart, and purchasing), and attributes of the goods that were the objects of the user's historical behavior, such as the price of the goods.
S104: Form a scene from the search keyword input by the user and the user's historical behavior data, denoted (q, s), where q is the search keyword and s is the user's historical behavior data.
S105: Extract from (q, s) the data x1, x2, …, xn representing scene features f1, f2, …, fn (xi is the data representing feature fi). The scene features include, but are not limited to: the category to which the keyword belongs, the category to which the objects of the user's historical behavior belong, and the attributes of those objects.
Taking an e-commerce website as an example, f1, f2, …, fn are: the user name, the category to which the goods involved in that user name's historical behavior belong, and the price of those goods.
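A minimal sketch of S104 and S105, assuming the three features used in the Fig. 2 example (keyword category, category of the goods in the user's behavior history, and price grade). The `Behavior` record, the lookup tables `KEYWORD_CATEGORY` and `CATEGORY_CODE`, the majority-vote category, and the averaged price grade are all hypothetical choices made for illustration, not encodings specified by the application.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Behavior:
    """One historical behavior record: user id, behavior type, and object attributes."""
    user_id: str
    behavior_type: str  # e.g. "click", "browse", "collect", "cart", "purchase"
    category: str       # category of the object acted on
    price_grade: int    # price grade of the object (higher means more expensive)


# Hypothetical lookup tables; a real system would derive these from its catalogue.
KEYWORD_CATEGORY = {"sweater": "clothing", "sofa": "furniture"}
CATEGORY_CODE = {"clothing": 0, "furniture": 1, "unknown": -1}


def scene_features(q: str, s: List[Behavior]) -> List[float]:
    """Map a scene (q, s) to feature values [x1, x2, x3]."""
    # x1: category of the search keyword
    x1 = CATEGORY_CODE[KEYWORD_CATEGORY.get(q, "unknown")]
    if s:
        # x2: most frequent category among the user's behavior objects
        top_category = Counter(b.category for b in s).most_common(1)[0][0]
        x2 = CATEGORY_CODE.get(top_category, -1)
        # x3: average price grade of the user's behavior objects
        x3 = mean(b.price_grade for b in s)
    else:
        x2, x3 = -1, 0.0  # no history: neutral fallback values (an assumption)
    return [float(x1), float(x2), float(x3)]
```

Encoding categories as integer codes keeps the example short. A one-hot or embedding encoding would avoid implying an order between categories.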
S106: Find, among the historical scenes, the scene closest to (q, s). This scene is called the target scene.
A historical scene is a scene formed by historical search keywords and sample behavior data. Historical search keywords are keywords that have been searched through the search engine. For any historical search keyword, the sample behavior data is the data on user behaviors that occurred on the search result objects of that keyword.
The sample behavior data may include: the identifier of the user performing the behavior, the type of behavior, and the attributes of the behavior object.
The sample behavior data includes the historical behavior data of a plurality of users (e.g., all users) on the search results of the historical keyword. Each historical search keyword and its sample behavior data form a historical scene, so the historical keywords and the corresponding sample behavior data constitute a plurality of historical scenes. How the historical scenes are determined is described in detail in the embodiment shown in Fig. 3.
The historical scene closest to (q, s) is the historical scene whose feature data is closest to x1, x2, …, xn.
S107: Obtain the ranking weights corresponding to the target scene.
A ranking weight is the weight assigned to each item of feature data that participates in the subsequent ranking.
The weights corresponding to a historical scene are determined from the sample behavior data in the search results of the historical search keywords, as described in the embodiment shown in Fig. 4.
S108: Extract from each search result the data y1, y2, …, yn representing ranking features m1, m2, …, mn, and compute the ranking score of each search result using the ranking weights of m1, m2, …, mn.
Specifically, each search result is scored using equation (1):
$$\mathrm{score} = \sum_{j=1}^{n} w_{ij}\, y_j \qquad (1)$$
where w_i1, w_i2, …, w_in are the ranking weights of features m1, m2, …, mn in the target scene i, and y1, y2, …, yn are the values of ranking features m1, m2, …, mn (i.e., the data representing them).
The ranking features m1, m2, …, mn can be preset. For example, m1, m2, …, mn include, but are not limited to, the user's purchasing power and the quality of the goods.
S109: Display the search results according to their ranking scores.
For example, the search results are displayed in descending order of ranking score.
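Steps S108 and S109 reduce to a weighted sum per result followed by a descending sort. The sketch below assumes equation (1) is the linear score sum_j w_ij * y_j. The function names and the dictionary representation of a search result are hypothetical.

```python
from typing import List, Mapping, Sequence


def ranking_score(weights: Sequence[float], values: Sequence[float]) -> float:
    """Equation (1) read as a linear score: sum_j w_ij * y_j for the target scene i."""
    if len(weights) != len(values):
        raise ValueError("one weight is needed per ranking feature")
    return sum(w * y for w, y in zip(weights, values))


def rank_results(results: Sequence[Mapping], feature_names: Sequence[str],
                 weights: Sequence[float]) -> List[Mapping]:
    """Score each result on the named ranking features and sort high to low (S109)."""
    scored = [(ranking_score(weights, [r[f] for f in feature_names]), r) for r in results]
    # list.sort is stable even with reverse=True, so equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored]
```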
Fig. 2 illustrates the process shown in Fig. 1.
In Fig. 2, assume that user A and user B both enter the search keyword "sweater" on an e-commerce website. After receiving "sweater" from user A, the search engine obtains the search results. It then obtains user A's historical behavior data and extracts scene feature data from (sweater, user A's historical behavior data): "category of the keyword: clothing; category(ies) of the goods purchased or clicked by the user: clothing; price grade of the goods purchased or clicked by the user: 1" (1 means that the average price grade is grade 1). Based on this feature data, the historical scene corresponding to user A is found to be "scene_1", and the ranking weights under "scene_1" for the ranking features "user_power (user purchasing power), auc_quality (goods quality score)" are obtained as "1.0, 1.0". Finally, the values of user_power and auc_quality are extracted from each search result, each result is scored using equation (1), and the results are displayed to user A in descending order of score, as shown in Fig. 2(a).
Following the same procedure for user B, the scene feature data extracted from user B's historical behavior data is: "category of the keyword: clothing; category(ies) of the goods purchased or clicked by user B: clothing; price grade of the goods purchased or clicked by user B: 5" (5 means that the average price grade is grade 5, and grade 5 prices are higher than grade 1 prices). The corresponding scene is "scene_2", under which the ranking weights of user_power and auc_quality are "2.0, 1.0". The resulting ranking of the search results is displayed to user B, as shown in Fig. 2(b).
As Fig. 2 shows, users A and B entered the same search keyword, but their historical behaviors differ. In this example the difference lies mainly in the price grades of the goods involved in their historical behavior. Their ranking weights therefore differ, with user B's weight for the ranking feature user_power being larger. As a result, the search results displayed to users A and B differ: the first few goods shown to user A are noticeably cheaper than the first few shown to user B.
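The Fig. 2 example can be reproduced numerically. The scene weights below are the ones given above for scene_1 (user A) and scene_2 (user B). The three sweaters and their user_power and auc_quality values are invented purely for illustration, and the score uses the same assumed linear form of equation (1).

```python
# Ranking weights for (user_power, auc_quality) per scene, as given in the Fig. 2 example.
SCENE_WEIGHTS = {"scene_1": (1.0, 1.0), "scene_2": (2.0, 1.0)}

# Hypothetical search results for "sweater": (name, price, user_power, auc_quality).
RESULTS = [
    ("sweater_basic", 99, 1.0, 4.0),
    ("sweater_mid", 299, 2.0, 2.5),
    ("sweater_premium", 899, 3.0, 1.2),
]


def ranked_names(scene_id):
    """Rank RESULTS by the linear score under the given scene's weights, high to low."""
    w_power, w_quality = SCENE_WEIGHTS[scene_id]
    ordered = sorted(RESULTS, key=lambda r: w_power * r[2] + w_quality * r[3], reverse=True)
    return [r[0] for r in ordered]


print(ranked_names("scene_1"))  # user A: ['sweater_basic', 'sweater_mid', 'sweater_premium']
print(ranked_names("scene_2"))  # user B: ['sweater_premium', 'sweater_mid', 'sweater_basic']
```

Under scene_1 the scores are 5.0, 4.5 and 4.2, so the cheapest sweater leads. Doubling the user_power weight in scene_2 gives 6.0, 6.5 and 7.2, which reverses the order.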
It can be seen that, in the search result ranking method shown in Fig. 1, a scene is determined from the user's historical behavior data and search keyword, and a target scene is selected from the historical scenes. The ranking weights corresponding to the target scene are then obtained and used for ranking, so that users with different historical behaviors receive rankings that better match their own behavior patterns.
In the above embodiment, step 3 of the process of obtaining ranking weights from the search keyword and the user's historical behavior data may be implemented using a pre-trained scene model.
Fig. 3 shows the training and application process of a scene model, comprising the following steps:
S301: Collect the historical search keywords and the sample behavior data for each historical search keyword.
The historical search keywords are keywords searched by a search engine. The sample behavior data includes historical behavior data of a plurality of users (e.g., all users). The sample behavior data may include: the identity of the user performing the behavior, the type of behavior, and the properties of the behavior object.
For example, for an e-commerce web site, the historical search keywords are all keywords that were once searched by the search engine, such as "towel," "package," "sofa," and the like.
S302: For each combination of a historical search keyword and sample behavior data, i.e., for each sample, extract the data x1, x2, …, xn representing scene features f1, f2, …, fn.
That is, the historical search keywords form a keyword set and the sample behavior data form a sample behavior data set. Any keyword in the keyword set, combined with any item of sample behavior data in the sample behavior data set, forms one sample.
S303: Cluster the feature data of all samples. The center of each resulting cluster is a scene, and together these scenes constitute the scene model.
Any clustering algorithm known in the prior art may be used, for example the k-means clustering algorithm.
Optionally, after clustering is completed, a unique identifier may be assigned to each cluster center, i.e., to each scene in the scene model.
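S303 and the identifier assignment can be sketched with plain Lloyd's k-means, the example algorithm named above. This is a minimal pure-Python illustration that assumes scene features are numeric vectors. The `kmeans` and `build_scene_model` names, the random initialisation, and the `scene_<k>` identifiers are hypothetical. A production system would more likely use a library implementation such as scikit-learn's `KMeans`.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on numeric feature vectors; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k samples as initial centers
    labels = None
    for _ in range(iters):
        # Assignment step: each sample joins its nearest center (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each center to the mean of the samples assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def build_scene_model(sample_features, k):
    """Cluster the samples and give each cluster center (scene) a unique identifier."""
    centers, labels = kmeans(sample_features, k)
    return {f"scene_{c + 1}": center for c, center in enumerate(centers)}, labels
```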
Based on the scene model, the scene closest to (q, s) is found among the historical scenes as follows: compute the distance between the data x1, x2, …, xn extracted from (q, s) and each scene using the distance function shown in formula (2), and take the nearest scene as the scene represented by (q, s), i.e., determine the identifier of the scene represented by (q, s).
In formula (2), x_i is a feature of scene (q1, s1), and y_i is the corresponding feature of scene (q2, s2).
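Finding the target scene is then a nearest-neighbour lookup over the scene centers. The body of formula (2) is not reproduced here, so the sketch below assumes Euclidean distance via `math.dist`. The `nearest_scene` name and the dictionary layout of the scene model are likewise assumptions.

```python
import math
from typing import Mapping, Sequence


def nearest_scene(x: Sequence[float], scene_centers: Mapping[str, Sequence[float]]) -> str:
    """Return the identifier of the scene whose center is closest to feature vector x."""
    if not scene_centers:
        raise ValueError("the scene model is empty")
    # Euclidean distance stands in for formula (2); swap in another metric if required.
    return min(scene_centers, key=lambda sid: math.dist(x, scene_centers[sid]))


scene_model = {"scene_1": [0.0, 0.0, 1.0], "scene_2": [0.0, 0.0, 5.0]}
print(nearest_scene([0.0, 0.0, 1.3], scene_model))  # scene_1
print(nearest_scene([0.0, 0.0, 4.6], scene_model))  # scene_2
```

If the scene features have very different scales (for example a category code next to a price), standardising each feature before computing distances keeps one feature from dominating the lookup.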
Fig. 4 shows the process of obtaining the ranking weights corresponding to the historical scenes, including the following steps:
S401: Collect historical search keywords and users' historical behavior data.
S402: For each piece of sample data, compute the weight of the positive sample and the weight of the negative sample.
A piece of sample data is all the data on the search results of one historical search keyword. It includes the data of the search result objects, the data of user behaviors that occurred on search result objects, and the data of search result objects on which no user behavior occurred.
In a piece of sample data, the positive samples are the objects on which user behavior occurred, and the negative samples are the objects on which no user behavior occurred.
The positive sample data is the sample behavior data (specifically, the user identifier, the behavior type, and the object attributes), and the negative sample data includes the historical search keyword and the attributes of the negative samples.
The weight of the positive sample of the j-th piece of sample data is denoted pos_j, and the weight of the negative sample is denoted neg_j, where:
$$pos_j = sat_j = \sum_i w_i\, act_i, \qquad neg_j \ge 0,\quad pos_j \ge 0$$
nauc_j is the number of times the object of user behavior (e.g., a product) in sample j appears in the sample set (i.e., across all samples). The parameter m is a preset value that is the same for every sample. Its specific value may be set differently for different sample sets.
act_i is the number of times the i-th type of behavior appears in the sample set, and w_i is the preset weight of the i-th type of behavior.
From a statistical point of view, pos_j can be regarded as the number of times the objects of user behavior in a sample are acted on, and neg_j as the probability that such an object is acted on. That is, this embodiment considers not only the click-through rate of an object (e.g., a product), i.e., neg_j, but also its number of clicks, i.e., pos_j.
Because a weighted sum of behaviors is used to represent the number of times an object is acted on, the large fluctuations in behavior probability that arise when behavior counts are small are reduced. Objects with few interactions, such as newly listed products, can therefore be given fairer ranking weights.
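The positive weight can be computed directly from its definition as the weighted behavior count sat_j. The formula for neg_j is not reproduced in this text, so `negative_weight` below is a hypothetical stand-in. It is a smoothed rate sat_j / (nauc_j + m) that matches the description of neg_j as a probability, uses the parameters nauc_j and m defined above, and shows how m damps the rate for rarely seen objects. The behavior weights are invented for the example.

```python
from typing import Mapping

# Hypothetical preset weights w_i per behavior type.
BEHAVIOR_WEIGHTS = {"click": 1.0, "collect": 2.0, "cart": 3.0, "purchase": 5.0}


def positive_weight(act: Mapping[str, int],
                    weights: Mapping[str, float] = BEHAVIOR_WEIGHTS) -> float:
    """pos_j = sat_j = sum_i w_i * act_i over the behavior types counted for sample j."""
    return sum(weights[behavior] * count for behavior, count in act.items())


def negative_weight(sat_j: float, nauc_j: int, m: float) -> float:
    """HYPOTHETICAL smoothed rate sat_j / (nauc_j + m); not the application's own formula."""
    if nauc_j < 0 or m <= 0:
        raise ValueError("nauc_j must be >= 0 and m must be > 0")
    return sat_j / (nauc_j + m)


pos = positive_weight({"click": 4, "purchase": 1})  # 4 * 1.0 + 1 * 5.0 = 9.0
print(pos, negative_weight(pos, nauc_j=100, m=10))
```

With m = 10, an object that appears once and is clicked once gets 1/11 rather than 1.0. This is the kind of damping of small-count fluctuation described above.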
S403: From each sample, extract the values y1, y2, …, yn of the ranking features m1, m2, …, mn.
The ranking features m1, m2, …, mn may be the same as or different from the scene features f1, f2, …, fn.
S404: From the n ranking features m1, m2, …, mn and the m ranking scenes c1, c2, …, cm, construct pairwise combination features
c_i_m_j, 1 <= i <= m, 1 <= j <= n,
where the value of feature c_i_m_j is y_j.
The m ranking scenes are those obtained by the process shown in Fig. 3, i.e., the scenes determined by the respective cluster centers.
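The pairwise features c_i_m_j can be laid out as one vector of length m*n per sample. This is a minimal sketch that assumes each sample belongs to exactly one scene i and that combinations involving other scenes are encoded as zero, a one-hot-by-scene layout the text does not spell out. `cross_features` is a hypothetical helper, and indices are 0-based.

```python
from typing import List, Sequence


def cross_features(scene_index: int, y: Sequence[float], num_scenes: int) -> List[float]:
    """Vector of c_i_m_j values: entry i*n + j holds y_j if the sample is in scene i, else 0."""
    if not 0 <= scene_index < num_scenes:
        raise IndexError("scene_index out of range")
    n = len(y)
    vec = [0.0] * (num_scenes * n)
    start = scene_index * n
    vec[start:start + n] = [float(v) for v in y]  # fill only the block for this sample's scene
    return vec


print(cross_features(1, [2.0, 3.0], num_scenes=3))  # [0.0, 0.0, 2.0, 3.0, 0.0, 0.0]
```

With this layout, a single linear model learns a separate weight w_ij for every (scene, feature) pair. That is what lets the same ranking feature carry different weights in different scenes.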
S405: Take the values y_j of the combination features c_i_m_j of each sample, together with the sample's weights (neg_j, pos_j), as inputs to a logistic regression model. Use the positive sample data and negative sample data as the model's positive and negative training examples, and thereby obtain the ranking weights w_ij.
For the computational principles of the logistic regression model, refer to the prior art. They are not repeated here.
For ranking scene c_i, the weights of ranking features m1, m2, …, mn are w_i1, w_i2, …, w_in. The ranking-feature weights of all the historical scenes together form a scene weight model.
In the process shown in Fig. 4, both the satisfaction rate and the satisfaction count are used as the basis for computing the weights. This reduces fluctuation for goods with few impressions and lessens the influence of impression counts on the ranking of search results.
Further, a ranking model may be trained from the scene model obtained in Fig. 3 and the scene weight model obtained in Fig. 4. The result is a ranking model that outputs ranking weights given the search keyword input by the user and the user's historical behavior features.
Fig. 5 shows a search result ranking apparatus disclosed in an embodiment of the present application, comprising a search module, a selection module, an acquisition module, and a ranking module. Optionally, it may further comprise a scene determination module and a ranking weight determination module.
The search module is configured to obtain search results according to the search keyword input by the user. The selection module is configured to select a target scene from historical scenes according to the search keyword and the user's historical behavior data. The historical scenes are determined by historical search keywords and sample behavior data, and the sample behavior data is data on user behaviors that occurred on the search results obtained for the historical search keywords. The acquisition module is configured to acquire the ranking weights corresponding to the target scene. The ranking module is configured to rank the search results based in part on the ranking weights. The scene determination module is configured to determine the historical scenes using the method shown in Fig. 3. The ranking weight determination module is configured to determine the ranking weights using the method shown in Fig. 4.
The specific implementation process of the functions of the above modules may refer to the above method embodiments, and are not described herein again.
The apparatus shown in fig. 5 is capable of presenting personalized search results for users having different historical behaviors.
If the functions described in the methods of the present application are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a storage medium readable by a computing device. Based on such an understanding, the part of the embodiments of the present application that contributes to the prior art, or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the others, and for the same or similar parts the embodiments may refer to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.