[Summary of the Invention]
In view of this, the present invention provides a method and an apparatus for identifying a consumption intention, which identify the consumption intention of a user and thereby help deliver information to the user more accurately.
The specific technical scheme is as follows:
A method for identifying a consumption intention, the method comprising:
a process of establishing a consumption intention recognition model, comprising the following steps:
S11: screening out related behavior logs of a set consumption field from the historical behavior logs of each user;
S12: determining the behavior logs corresponding to pre-purchase behavior and the behavior logs corresponding to post-purchase behavior, based on behavior pattern analysis performed on the screened related behavior logs;
S13: selecting, from the behavior logs determined in step S12, the behavior logs that meet a training data screening condition as training samples;
S14: extracting features from the training samples and training a classification model with them to obtain a consumption intention recognition model corresponding to the set consumption field, the consumption intention recognition model being able to distinguish pre-purchase behavior from post-purchase behavior;
and a process of identifying a consumption intention, comprising the following steps:
S21: determining the consumption field of a user to be identified;
S22: classifying, with the consumption intention recognition model corresponding to the determined consumption field, the related behavior logs of the user to be identified in that consumption field within a recent period of time, to obtain whether the consumption intention of the user to be identified is pre-purchase or post-purchase.
According to a preferred embodiment of the present invention, step S11 specifically includes:
matching the historical behavior logs of each user against n keyword lists corresponding to the set consumption field, and screening out the behavior logs that simultaneously contain keywords from all n keyword lists, where n is a positive integer; or
matching the historical behavior logs of each user against expression templates corresponding to the set consumption field, and screening out the behavior logs that match the expression templates.
According to a preferred embodiment of the present invention, the behavior pattern analysis performed on the screened related behavior logs in step S12 is:
manual behavior pattern analysis of the screened related behavior logs; or
behavior pattern analysis based on keywords, contained in the related behavior logs, that indicate the consumption intention, where the keywords indicating the consumption intention comprise keywords indicating a pre-purchase intention or keywords indicating a post-purchase intention.
According to a preferred embodiment of the present invention, the training data screening condition includes one or any combination of the following conditions:
the number of logs of a user among the behavior logs is greater than or equal to a preset number threshold;
the proportion of keywords indicating the consumption intention in a behavior log is greater than or equal to a preset proportion threshold, the keywords indicating the consumption intention being keywords indicating a pre-purchase intention or keywords indicating a post-purchase intention;
the ratio of the number of occurrences of the most frequent brand in a behavior log to the total number of occurrences of all brands in that behavior log exceeds a preset proportion threshold.
According to a preferred embodiment of the present invention, the keywords indicating the consumption intention are mined as follows:
A1: determining seed words indicating the consumption intention for the set consumption field;
A2: classifying sample logs of the set consumption field with the seed words to obtain pre-purchase sample logs and post-purchase sample logs;
A3: performing word segmentation and useless-word filtering on the pre-purchase sample logs and the post-purchase sample logs respectively, and then counting the frequency of each word;
A4: determining keywords indicating a pre-purchase intention and keywords indicating a post-purchase intention based on the frequency of each word in the pre-purchase sample logs and in the post-purchase sample logs, respectively.
According to a preferred embodiment of the present invention, the features extracted from the training samples in step S14 include one or any combination of the following features:
the number of logs;
the number of logs involving the most frequently occurring brand;
the number of logs involving brands that occur multiple times;
the number of logs involving the most frequently occurring model;
the number of logs involving models that occur multiple times;
the proportion of keywords indicating the consumption intention;
the position at which a keyword indicating the consumption intention appears;
the proportion of logs that contain a query;
the proportion of keywords, contained in the queries, that indicate the user's intention;
the time span covered by the logs;
the maximum time span of the logs involving the same brand;
the maximum time span of the logs involving the same model.
According to a preferred embodiment of the present invention, step S13 specifically includes: selecting, from the behavior logs determined in step S12, N behavior log sets that respectively satisfy N sets of training data screening conditions;
step S14 specifically includes: training a classification model with each behavior log set as a separate group of training samples to obtain N candidate models, and selecting the optimal one of the N candidate models as the consumption intention recognition model corresponding to the set consumption field.
According to a preferred embodiment of the present invention, the optimal one is the candidate model, among the N candidate models, that meets a preset recall requirement and has the highest accuracy.
According to a preferred embodiment of the present invention, after step S22 the method further includes:
S23: displaying information to the user to be identified according to a display strategy corresponding to the consumption intention of the user to be identified.
According to a preferred embodiment of the present invention, step S23 specifically includes:
displaying, to the user to be identified, promotion information of a type corresponding to the consumption intention of the user to be identified; or
ranking, in the search results displayed to the user to be identified, the information corresponding to the consumption intention of the user to be identified higher in the search results.
An apparatus for identifying a consumption intention, the apparatus comprising a modeling unit and an identification unit;
the modeling unit includes:
a first screening subunit, configured to screen out related behavior logs of a set consumption field from the historical behavior logs of each user;
a pattern analysis subunit, configured to determine the behavior logs corresponding to pre-purchase behavior and the behavior logs corresponding to post-purchase behavior, based on behavior pattern analysis performed on the screened related behavior logs;
a second screening subunit, configured to select, from the behavior logs determined by the pattern analysis subunit, the behavior logs that meet a training data screening condition as training samples;
a model training subunit, configured to extract features from the training samples and train a classification model with them to obtain a consumption intention recognition model corresponding to the set consumption field, the consumption intention recognition model being able to distinguish pre-purchase behavior from post-purchase behavior;
the identification unit includes:
a field determining subunit, configured to determine the consumption field of a user to be identified;
a model classification subunit, configured to classify, with the consumption intention recognition model corresponding to the consumption field determined by the field determining subunit, the related behavior logs of the user to be identified in that consumption field within a recent period of time, to obtain whether the consumption intention of the user to be identified is pre-purchase or post-purchase.
According to a preferred embodiment of the present invention, the first screening subunit specifically performs:
matching the historical behavior logs of each user against n keyword lists corresponding to the set consumption field, and screening out the behavior logs that simultaneously contain keywords from all n keyword lists, where n is a positive integer; or
matching the historical behavior logs of each user against expression templates corresponding to the set consumption field, and screening out the behavior logs that match the expression templates.
According to a preferred embodiment of the present invention, the pattern analysis subunit performs behavior pattern analysis on the screened related behavior logs manually, or performs it according to keywords, contained in the related behavior logs, that indicate the consumption intention, where the keywords indicating the consumption intention include keywords indicating a pre-purchase intention or keywords indicating a post-purchase intention.
According to a preferred embodiment of the present invention, the training data screening condition includes one or any combination of the following conditions:
the number of logs of a user among the behavior logs is greater than or equal to a preset number threshold;
the proportion of keywords indicating the consumption intention in a behavior log is greater than or equal to a preset proportion threshold, the keywords indicating the consumption intention being keywords indicating a pre-purchase intention or keywords indicating a post-purchase intention;
the ratio of the number of occurrences of the most frequent brand in a behavior log to the total number of occurrences of all brands in that behavior log exceeds a preset proportion threshold.
According to a preferred embodiment of the present invention, the modeling unit further includes an intention keyword mining subunit configured to mine the keywords indicating the consumption intention as follows:
A1: determining seed words indicating the consumption intention for the set consumption field;
A2: classifying sample logs of the set consumption field with the seed words to obtain pre-purchase sample logs and post-purchase sample logs;
A3: performing word segmentation and useless-word filtering on the pre-purchase sample logs and the post-purchase sample logs respectively, and then counting the frequency of each word;
A4: determining keywords indicating a pre-purchase intention and keywords indicating a post-purchase intention based on the frequency of each word in the pre-purchase sample logs and in the post-purchase sample logs, respectively.
According to a preferred embodiment of the present invention, the features extracted from the training samples by the model training subunit when training the classification model, and the features extracted by the model classification subunit from the related behavior logs of the user to be identified in the determined consumption field when classifying with the consumption intention recognition model, include, but are not limited to, one or any combination of the following features:
the number of logs;
the number of logs involving the most frequently occurring brand;
the number of logs involving brands that occur multiple times;
the number of logs involving the most frequently occurring model;
the number of logs involving models that occur multiple times;
the proportion of keywords indicating the consumption intention;
the position at which a keyword indicating the consumption intention appears;
the proportion of logs that contain a query;
the proportion of keywords, contained in the queries, that indicate the user's intention;
the time span covered by the logs;
the maximum time span of the logs involving the same brand;
the maximum time span of the logs involving the same model.
According to a preferred embodiment of the present invention, the second screening subunit selects, from the behavior logs determined by the pattern analysis subunit, N behavior log sets that respectively satisfy N sets of training data screening conditions;
the model training subunit trains a classification model with each behavior log set as a separate group of training samples to obtain N candidate models, and selects the optimal one of the N candidate models as the consumption intention recognition model corresponding to the set consumption field.
According to a preferred embodiment of the present invention, the optimal one is the candidate model, among the N candidate models, that meets a preset recall requirement and has the highest accuracy.
According to a preferred embodiment of the present invention, the identification unit further comprises an information display subunit, configured to display information to the user to be identified according to a display strategy corresponding to the consumption intention of the user to be identified.
According to a preferred embodiment of the present invention, the information display subunit specifically performs:
displaying, to the user to be identified, promotion information of a type corresponding to the consumption intention of the user to be identified; or
ranking, in the search results displayed to the user to be identified, the information corresponding to the consumption intention of the user to be identified higher in the search results.
As can be seen from the above technical solutions, the present invention can establish a consumption intention recognition model from users' historical behavior logs and use the established model to identify a user's consumption intention, that is, whether the user has a pre-purchase or a post-purchase intention, which facilitates more accurate information delivery to the user.
[Detailed Description of the Embodiments]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
The invention identifies consumption intentions by analyzing users' historical behavior logs and is mainly divided into two processes: one establishes a consumption intention recognition model by analyzing the historical behavior logs of a large number of users; the other identifies a user's consumption intention with that model. The two processes are described below in the first embodiment and the second embodiment, respectively.
Embodiment 1
Fig. 1 is a flowchart of a method for establishing a consumption intention recognition model according to an embodiment of the present invention. The consumption intention recognition models established in the embodiment of the present invention are specific to individual consumption fields, and each consumption field has its own corresponding model, for example, a consumption intention recognition model for the digital consumer products field, one for the home appliance field, one for the home property field, one for the automobile field, one for the cosmetics field, and so on. This embodiment describes how the consumption intention recognition model of one consumption field is established. As shown in Fig. 1, the method includes the following steps:
Step 101: screen out the related behavior logs of a set consumption field from the historical behavior logs of each user.
In this step, the behavior logs related to the set consumption field are screened out of the historical behavior logs of a large number of users and serve as the data source for subsequently establishing the consumption intention recognition model of that field. For example, to establish a consumption intention recognition model for the digital consumer products field, the related behavior logs of that field are screened out of each user's historical behavior logs. To keep the consumption intention recognition model timely, the screening source in this step is usually each user's behavior logs within the most recent set time period, for example the last week.
Screening may use, but is not limited to, one or a combination of the following approaches: keyword matching and template matching.
In the keyword-matching approach, a keyword list corresponding to the set consumption field is mined in advance, each behavior log is matched against that keyword list, and the behavior logs containing keywords from the list are screened out. There may be one or more keyword lists; either the behavior logs containing keywords from any one list are screened out, or only those containing keywords from several lists at the same time.
Taking the digital consumer products field as an example, a commodity model list corresponding to the field, such as "nokia 5233", "HTC visit", "Nexus 7" and "ipad 2", is mined in advance. During log screening in this step, the commodity model list is loaded and the behavior logs containing a keyword from it are screened out; for example, a log containing "nokia 5233" is screened out. That is one case. In another case, a brand list and a commodity type list corresponding to the digital consumer products field are mined in advance, the brand list including "Lenovo", "Apple", "Samsung", "Nokia" and so on, and the commodity type list including "notebook", "mobile phone", "tablet computer", "camera" and so on. The behavior logs that simultaneously contain keywords from both the brand list and the commodity type list are then screened out; for example, a log containing both "Samsung" and "mobile phone" is screened out.
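The keyword-matching approach can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: it assumes each behavior log is a plain text string and uses case-insensitive substring matching, and the function names and list contents (`screen_by_keyword_lists`, `MODELS`, `BRANDS`, `PRODUCT_TYPES`) are hypothetical.

```python
from typing import Iterable, List, Sequence


def contains_any(log: str, keywords: Iterable[str]) -> bool:
    """True if the log contains at least one keyword (case-insensitive substring match)."""
    text = log.lower()
    return any(k.lower() in text for k in keywords)


def screen_by_keyword_lists(logs: Sequence[str],
                            keyword_lists: Sequence[Sequence[str]],
                            require_all: bool = True) -> List[str]:
    """Keep logs that hit every keyword list (require_all=True) or at least one list."""
    combine = all if require_all else any
    return [log for log in logs
            if combine(contains_any(log, kws) for kws in keyword_lists)]


# Illustrative lists for the digital consumer products field (hypothetical contents).
MODELS = ["nokia 5233", "Nexus 7", "ipad 2"]
BRANDS = ["Lenovo", "Apple", "Samsung", "Nokia"]
PRODUCT_TYPES = ["notebook", "mobile phone", "tablet computer", "camera"]

logs = [
    "Samsung mobile phone price comparison",
    "Samsung refrigerator review",
    "nokia 5233 battery replacement",
]
print(screen_by_keyword_lists(logs, [MODELS]))                 # single model list
print(screen_by_keyword_lists(logs, [BRANDS, PRODUCT_TYPES]))  # brand AND product type
```

Passing only the model list corresponds to the first case above; passing the brand list and the commodity type list with `require_all=True` corresponds to the second case, where a log must hit both lists.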
In the template-matching approach, expression templates corresponding to the set consumption field are mined in advance, each behavior log is matched against the expression templates, and the behavior logs that match an expression template are screened out.
Again taking the digital consumer products field as an example, an expression template such as "display screen: [number] pixels" is mined in advance for the field; if a behavior log contains "display screen: 8 million pixels", that behavior log is screened out.
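Expression templates map naturally onto regular expressions. The sketch below assumes that a mined template such as "display screen: [number] pixels" has been compiled into a Python `re` pattern; the exact pattern and the name `screen_by_templates` are illustrative assumptions.

```python
import re
from typing import List, Pattern, Sequence

# Hypothetical compiled form of the template "display screen: [number] pixels".
# The optional "million" covers phrasings such as "8 million pixels".
TEMPLATES = [
    re.compile(r"display screen:\s*\d+(?:\.\d+)?\s*(?:million\s+)?pixels?", re.IGNORECASE),
]


def screen_by_templates(logs: Sequence[str],
                        templates: Sequence[Pattern[str]] = TEMPLATES) -> List[str]:
    """Keep the logs that match at least one expression template."""
    return [log for log in logs if any(t.search(log) for t in templates)]


logs = ["display screen: 8 million pixels", "display screen: large", "camera pixels"]
print(screen_by_templates(logs))
```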
Step 102: determine the behavior logs corresponding to pre-purchase behavior and the behavior logs corresponding to post-purchase behavior based on behavior pattern analysis performed on the screened related behavior logs.
Before a purchase, a user is typically in an early research stage and is concerned with price, parameters, performance and similar questions; after the purchase, the user is concerned with after-sales service, usage, related peripheral products and similar information. Based on this characteristic, it can be determined whether a behavior log corresponds to the pre-purchase or the post-purchase stage. The behavior pattern analysis may be performed manually, with the analysis result labeled on each related behavior log, or automatically.
If automatic analysis is used, the behavior pattern can be analyzed based on the keywords, contained in the related behavior logs, that indicate the consumption intention. These keywords may also be called polarity keywords and are divided into keywords indicating a pre-purchase intention and keywords indicating a post-purchase intention. For example, keywords such as "offer", "parameters" and "price" indicate a pre-purchase intention, while keywords such as "settings", "repair point" and "software" indicate a post-purchase intention. The keywords indicating the consumption intention may be set manually in advance, but to improve efficiency and make it easy to extend the method to new consumer goods, automatic mining is preferred.
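A minimal sketch of the automatic analysis follows. It assumes a log is labeled by comparing how many pre-purchase and post-purchase polarity keywords it contains; the keyword sets reuse the examples above, while the margin rule and the names `label_log` and `min_margin` are hypothetical choices, not part of the described method.

```python
from typing import Iterable, Optional

# Example polarity keywords from the text; real lists would be mined per field.
PRE_PURCHASE = {"offer", "parameters", "price"}
POST_PURCHASE = {"settings", "repair point", "software"}


def count_hits(text: str, keywords: Iterable[str]) -> int:
    """Total number of (substring) occurrences of the keywords in the text."""
    return sum(text.count(k) for k in keywords)


def label_log(text: str, min_margin: int = 1) -> Optional[str]:
    """Return 'pre', 'post', or None when the keyword evidence is balanced or absent."""
    text = text.lower()
    pre = count_hits(text, PRE_PURCHASE)
    post = count_hits(text, POST_PURCHASE)
    if pre - post >= min_margin:
        return "pre"
    if post - pre >= min_margin:
        return "post"
    return None


print(label_log("Nexus 7 price and parameters"))
print(label_log("Nexus 7 software update, repair point nearby"))
```

Logs left unlabeled (`None`) would simply be excluded from both the pre-purchase and the post-purchase sets.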
The method for mining keywords indicating consumption intentions is shown in fig. 2 and comprises the following steps:
Step 201: determine seed words indicating the consumption intention for the set consumption field.
In this step, the seed words may be set manually in advance, usually for a specific consumption field. Because a consumption field usually contains several subclasses, it is preferable to choose pre-purchase seed words and post-purchase seed words that apply to every subclass. For example, the digital consumer products field generally contains subclasses such as computers, cameras and mobile phones, and keywords that indicate a pre-purchase intention in each of these subclasses may be used as the pre-purchase seed words of the digital consumer products field.
Step 202: classify the sample logs of the set consumption field with the seed words indicating the consumption intention to obtain pre-purchase sample logs and post-purchase sample logs.
Some sample logs of the set consumption field are selected in advance. If the number of pre-purchase seed words in a sample log exceeds the number of post-purchase seed words by an amount that meets a preset condition, the sample log is regarded as a pre-purchase sample log; if the number of post-purchase seed words exceeds the number of pre-purchase seed words by an amount that meets the preset condition, the sample log is regarded as a post-purchase sample log. The preset condition may be, for example, that the difference is larger than a preset difference threshold, or that the difference, as a proportion of the total number of seed words contained in the sample log, exceeds a preset proportion threshold.
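Step 202 can be sketched as below. The text offers two example preset conditions (an absolute difference threshold and a difference-to-total ratio); this sketch assumes both must hold, and the function name `split_samples`, the parameters `min_diff` and `min_ratio`, and their defaults are illustrative.

```python
from typing import Iterable, List, Sequence, Tuple


def seed_count(text: str, seeds: Iterable[str]) -> int:
    """Number of seed-word occurrences (substring matches) in the text."""
    return sum(text.count(s) for s in seeds)


def split_samples(samples: Sequence[str], pre_seeds: Iterable[str],
                  post_seeds: Iterable[str], min_diff: int = 1,
                  min_ratio: float = 0.5) -> Tuple[List[str], List[str]]:
    """Split sample logs into (pre_purchase, post_purchase) lists.

    A log goes to one side only if that side's seed-word excess is at least
    min_diff AND the excess is at least min_ratio of all seed-word hits.
    Logs that satisfy neither side are left out.
    """
    pre_seeds, post_seeds = list(pre_seeds), list(post_seeds)
    pre_logs, post_logs = [], []
    for sample in samples:
        text = sample.lower()
        pre, post = seed_count(text, pre_seeds), seed_count(text, post_seeds)
        total = pre + post
        if total == 0:
            continue  # no seed words: cannot decide
        diff = pre - post
        if diff >= min_diff and diff / total >= min_ratio:
            pre_logs.append(sample)
        elif -diff >= min_diff and -diff / total >= min_ratio:
            post_logs.append(sample)
    return pre_logs, post_logs
```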
Step 203: perform word segmentation and useless-word filtering on the pre-purchase sample logs and the post-purchase sample logs respectively, and then count the frequency of each word.
The filtered useless words may include, but are not limited to, at least one of the following: words related to brands and models; words that contribute nothing to the semantic expression, such as function words, auxiliary words and modal particles; and noise words consisting of letters, digits or a single Chinese character.
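A sketch of step 203 follows. It assumes the logs are already segmented into space-separated words (a Chinese word segmenter would be needed in practice) and that the brand/model words and function words are supplied as sets; the helper name `word_frequencies` and the noise rule (a single letter, a run of digits, or a single CJK character) are assumptions based on the list above.

```python
import re
from collections import Counter
from typing import Iterable, Set

# Noise tokens: a single letter, a run of digits, or a single CJK character.
NOISE = re.compile(r"[a-z]|\d+|[\u4e00-\u9fff]")


def word_frequencies(logs: Iterable[str], brand_model_words: Set[str],
                     function_words: Set[str]) -> Counter:
    """Count word frequencies after removing brand/model, function and noise words."""
    useless = {w.lower() for w in brand_model_words | function_words}
    counts: Counter = Counter()
    for log in logs:
        for token in log.lower().split():  # stand-in for a real word segmenter
            if token in useless or NOISE.fullmatch(token):
                continue
            counts[token] += 1
    return counts


freq = word_frequencies(["the nexus 7 price is good", "price of x ipad"],
                        {"nexus", "ipad"}, {"the", "is", "of"})
print(freq)
```

The same function would be run once on the pre-purchase sample logs and once on the post-purchase sample logs.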
Step 204: determine keywords indicating a pre-purchase intention and keywords indicating a post-purchase intention based on the frequency of each word in the pre-purchase sample logs and in the post-purchase sample logs, respectively.
If a word has a high frequency in the pre-purchase sample logs but a low or zero frequency in the post-purchase sample logs, it is determined to be a keyword indicating a pre-purchase intention; likewise, if a word has a high frequency in the post-purchase sample logs but a low or zero frequency in the pre-purchase sample logs, it is determined to be a keyword indicating a post-purchase intention.
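Step 204 needs a concrete reading of "high" and "low or zero". The sketch below assumes a minimum frequency on the word's own side and a maximum ratio for the opposite side; both parameters (`min_freq`, `max_other_ratio`) and the name `polar_keywords` are hypothetical, not values given in the text.

```python
from collections import Counter
from typing import List, Tuple


def polar_keywords(pre_freq: Counter, post_freq: Counter, min_freq: int = 3,
                   max_other_ratio: float = 0.1) -> Tuple[List[str], List[str]]:
    """Return (pre_purchase_keywords, post_purchase_keywords).

    A word is a pre-purchase keyword if it occurs at least min_freq times in the
    pre-purchase logs and at most max_other_ratio times as often in the
    post-purchase logs (zero qualifies); post-purchase keywords are symmetric.
    """
    def pick(own: Counter, other: Counter) -> List[str]:
        return sorted(w for w, f in own.items()
                      if f >= min_freq and other.get(w, 0) <= max_other_ratio * f)

    return pick(pre_freq, post_freq), pick(post_freq, pre_freq)


pre = Counter({"price": 20, "parameters": 8, "phone": 30, "rare": 1})
post = Counter({"phone": 25, "software": 12, "price": 1, "settings": 5})
print(polar_keywords(pre, post))
```

Words such as "phone" that are frequent on both sides are rejected, which is the intent of the "low or 0" requirement.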
Continuing with Fig. 1, Step 103: select the behavior logs that meet the training data screening conditions from the behavior logs determined in step 102 as training samples.
Because manually labeling training samples is laborious and inefficient, yields only a limited number of labels, and often leads to disagreement, the embodiment of the present invention selects training samples automatically. The training data screening conditions may be, but are not limited to, one or any combination of the following:
Condition 1: if the number of logs of a user among the determined behavior logs is greater than or equal to a preset number threshold, for example 5, the logs of that user are taken as training samples.
Condition 2: if, among the determined behavior logs, the proportion of keywords indicating a particular consumption intention in a log is greater than or equal to a preset proportion threshold, that log is taken as a training sample.
Condition 3: if, among the determined behavior logs, the ratio of the number of occurrences of the most frequent brand in a log to the total number of occurrences of all brands in that log exceeds a preset proportion threshold, that log is taken as a training sample. For example, suppose brands such as "Lenovo", "Samsung" and "Apple" appear in a log and "Apple" is the most frequent. If the occurrences of "Apple" account for more than 0.5 of all brand occurrences in the log, the user's intention is concentrated and the log is used as a training sample; otherwise the user's intention is dispersed, the user is probably just browsing casually, and the log is discarded.
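The three screening conditions can be combined as in the following sketch. It assumes each log record carries its user ID, its segmented words and the brand mentions found in it, and it reads the "proportion of keywords" in condition 2 as the share of the log's words that are polarity keywords; these data structures, names (`Log`, `select_training_logs`) and default thresholds are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Log:
    user: str
    tokens: List[str]                                 # segmented words of the log
    brands: List[str] = field(default_factory=list)  # brand mentions, with repeats


def cond1_users(logs: List[Log], min_logs: int = 5) -> Set[str]:
    """Condition 1: users with at least min_logs logs."""
    counts = Counter(log.user for log in logs)
    return {u for u, n in counts.items() if n >= min_logs}


def cond2(log: Log, keywords: Set[str], min_ratio: float) -> bool:
    """Condition 2: share of polarity keywords among the log's words >= min_ratio."""
    if not log.tokens:
        return False
    hits = sum(1 for t in log.tokens if t in keywords)
    return hits / len(log.tokens) >= min_ratio


def cond3(log: Log, min_share: float = 0.5) -> bool:
    """Condition 3: the most frequent brand's share of brand mentions exceeds min_share."""
    if not log.brands:
        return False
    top = Counter(log.brands).most_common(1)[0][1]
    return top / len(log.brands) > min_share


def select_training_logs(logs: List[Log], keywords: Set[str], min_logs: int = 5,
                         min_kw_ratio: float = 0.1,
                         min_brand_share: float = 0.5) -> List[Log]:
    """Apply all three conditions together (the text allows any one or any combination)."""
    users = cond1_users(logs, min_logs)
    return [log for log in logs
            if log.user in users
            and cond2(log, keywords, min_kw_ratio)
            and cond3(log, min_brand_share)]
```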
Step 104: train the classification model with the training samples to obtain a consumption intention recognition model corresponding to the set consumption field, the consumption intention recognition model being able to distinguish pre-purchase behavior from post-purchase behavior.
The present invention does not limit the type of classification model: any prior-art binary classification model, such as an SVM (support vector machine), may be used, and the finally trained consumption intention recognition model makes its decision according to the probability of each class.
The per-user features used when training the classification model with the training samples may include one or any combination of the following:
the number of logs;
the number of logs involving the most frequently occurring brand;
the number of logs involving brands that occur multiple times;
the number of logs involving the most frequently occurring model;
the number of logs involving models that occur multiple times;
the proportion of keywords indicating the consumption intention;
the position at which a keyword indicating the consumption intention appears;
the proportion of logs that contain a query;
the proportion of keywords, contained in the queries, that indicate the user's intention;
the time span covered by the logs;
the maximum time span of the logs involving the same brand;
the maximum time span of the logs involving the same model.
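The sketch below computes the listed features for one user's logs. Several definitions are assumptions where the text is terse: the keyword position is taken as the relative position (0 to 1) of the first polarity keyword among the concatenated words, a log "involves" a brand or model when its single extracted brand or model field matches, and timestamps are in seconds. The `UserLog` record and `extract_features` are hypothetical names.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class UserLog:
    timestamp: float             # seconds since epoch
    tokens: List[str]            # segmented words of the log
    brand: Optional[str] = None
    model: Optional[str] = None
    query: Optional[str] = None  # search query, if the log came from a search


def _max_span(logs: List[UserLog], key: Callable[[UserLog], Optional[str]]) -> float:
    """Largest (last - first) timestamp over logs sharing the same key value."""
    times: Dict[str, List[float]] = defaultdict(list)
    for log in logs:
        k = key(log)
        if k is not None:
            times[k].append(log.timestamp)
    return max((max(t) - min(t) for t in times.values()), default=0.0)


def _top_and_repeated(values: List[Optional[str]]) -> Tuple[int, int]:
    """(logs of the most frequent value, logs whose value occurs more than once)."""
    c = Counter(v for v in values if v is not None)
    top = c.most_common(1)[0][1] if c else 0
    return top, sum(n for n in c.values() if n > 1)


def extract_features(logs: List[UserLog], keywords: Set[str]) -> Dict[str, float]:
    n = len(logs)
    tokens = [t for log in logs for t in log.tokens]
    kw_pos = [i for i, t in enumerate(tokens) if t in keywords]
    top_brand, rep_brand = _top_and_repeated([log.brand for log in logs])
    top_model, rep_model = _top_and_repeated([log.model for log in logs])
    queries = [log.query for log in logs if log.query]
    q_tokens = [t for q in queries for t in q.split()]
    times = [log.timestamp for log in logs]
    return {
        "num_logs": n,
        "top_brand_logs": top_brand,
        "repeated_brand_logs": rep_brand,
        "top_model_logs": top_model,
        "repeated_model_logs": rep_model,
        "keyword_ratio": len(kw_pos) / len(tokens) if tokens else 0.0,
        "first_keyword_pos": kw_pos[0] / len(tokens) if kw_pos else 1.0,
        "query_log_ratio": len(queries) / n if n else 0.0,
        "query_keyword_ratio": (sum(t in keywords for t in q_tokens) / len(q_tokens)
                                if q_tokens else 0.0),
        "time_span": max(times) - min(times) if times else 0.0,
        "max_brand_span": _max_span(logs, lambda log: log.brand),
        "max_model_span": _max_span(logs, lambda log: log.model),
    }
```

The resulting dictionary would be turned into a fixed-order feature vector for whichever binary classifier is chosen.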
Besides the approach of steps 103 and 104 above, in which a classification model is trained on a single set of training samples to obtain the consumption intention recognition model, a preferred embodiment is provided here to improve the recognition accuracy of the model:
In step 103, N behavior log sets that respectively satisfy N sets of training data screening conditions are selected from the behavior logs determined in step 102. That is, N sets of training data screening conditions are preset, N being a positive integer greater than 1; each set of conditions yields one behavior log set, so N behavior log sets are obtained in total. The N sets of conditions may be obtained by assigning different number thresholds, proportion thresholds and the like to the three conditions above, for example setting the proportion threshold of condition 2 to 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6 and so on, and the proportion threshold of condition 3 to 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 and 0.9.
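Generating the N sets of screening conditions amounts to a grid over the thresholds. This sketch uses the example values from the text for conditions 2 and 3 and assumes a single number threshold of 5 for condition 1; `condition_grid` and its dictionary keys are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence


def condition_grid(min_logs: Sequence[int] = (5,),
                   kw_ratios: Sequence[float] = (0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6),
                   brand_shares: Sequence[float] = (0.1, 0.2, 0.3, 0.4, 0.5,
                                                    0.6, 0.7, 0.8, 0.9),
                   ) -> List[Dict[str, float]]:
    """One dict of thresholds per combination; each dict is one set of screening conditions."""
    return [{"min_logs": a, "min_kw_ratio": b, "min_brand_share": c}
            for a, b, c in product(min_logs, kw_ratios, brand_shares)]


grid = condition_grid()
print(len(grid), grid[0])
```

With these defaults the grid has 1 * 7 * 9 = 63 entries, so N = 63 behavior log sets would be produced.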
The N behavior log sets are each used as training samples to train a classification model, yielding N candidate models; the optimal one of the N candidate models is then selected as the consumption intention recognition model, the optimal one being the candidate model that meets a preset recall requirement and has the highest accuracy. Specifically, the N candidate models may each classify a set of test samples, and the recall and accuracy of each candidate model are determined from its classification results on the test samples.
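Selecting the optimal candidate can be sketched as follows, assuming each candidate's predictions on a labeled test set are available as 0/1 lists (1 being the class under evaluation, for example pre-purchase). The sketch reads the "accuracy rate" as precision on that class, which is an assumption; `select_model` and `min_recall` are hypothetical names.

```python
from typing import Dict, Optional, Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[float, float]:
    """Precision and recall of y_pred for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def select_model(predictions: Dict[str, Sequence[int]], y_true: Sequence[int],
                 min_recall: float) -> Optional[str]:
    """Highest-precision candidate among those meeting the recall requirement."""
    best, best_precision = None, -1.0
    for name, y_pred in predictions.items():
        precision, recall = precision_recall(y_true, y_pred)
        if recall >= min_recall and precision > best_precision:
            best, best_precision = name, precision
    return best


y = [1, 1, 1, 1, 0, 0, 0, 0]
preds = {"A": [1, 1, 1, 1, 1, 1, 0, 0],
         "B": [1, 1, 0, 0, 0, 0, 0, 0],
         "C": [1, 1, 1, 0, 0, 0, 0, 0]}
print(select_model(preds, y, 0.7))
```

Here candidate B has perfect precision but misses the recall requirement, so the more balanced candidate C is chosen; `None` is returned when no candidate meets the requirement.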
In addition, the probability threshold of a classification model directly affects its classification results: if the probability, predicted by the model, that object A belongs to class X is greater than the probability threshold, object A is assigned to class X. Therefore, after the N candidate models are obtained, different probability thresholds may be set for them to obtain further candidate models. For example, if each of the N candidate models is given the five probability thresholds 0.5, 0.6, 0.7, 0.8 and 0.9, 5N candidate models are actually obtained, and the one of the 5N candidate models that meets the preset recall requirement and has the highest accuracy is selected as the consumption intention recognition model.
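The threshold sweep can be sketched by expanding each candidate's predicted probabilities into one thresholded candidate per threshold and then applying the recall-constrained selection described above. Precision stands in for the "accuracy rate" here, which is an assumption, and the names `expand_with_thresholds` and `pick_best` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

THRESHOLDS = (0.5, 0.6, 0.7, 0.8, 0.9)


def expand_with_thresholds(scores: Dict[str, Sequence[float]],
                           thresholds: Sequence[float] = THRESHOLDS
                           ) -> Dict[Tuple[str, float], List[int]]:
    """Turn N scored models into len(thresholds) * N thresholded candidates.

    A test sample is assigned to class 1 when its predicted probability is
    greater than the candidate's threshold.
    """
    return {(name, th): [1 if p > th else 0 for p in probs]
            for name, probs in scores.items() for th in thresholds}


def pick_best(candidates: Dict[Tuple[str, float], List[int]], y_true: Sequence[int],
              min_recall: float) -> Optional[Tuple[str, float]]:
    """Highest-precision candidate among those with recall >= min_recall on class 1."""
    best, best_precision = None, -1.0
    actual = sum(y_true)
    for key, y_pred in candidates.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        predicted = sum(y_pred)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        if recall >= min_recall and precision > best_precision:
            best, best_precision = key, precision
    return best


scores = {"m1": [0.95, 0.85, 0.65, 0.55, 0.4], "m2": [0.9] * 5}
candidates = expand_with_thresholds(scores)
print(len(candidates), pick_best(candidates, [1, 1, 1, 0, 0], 0.9))
```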
Embodiment 2
Fig. 3 is a flowchart of a method for identifying a consumption intention according to the second embodiment of the present invention. The method is based on the consumption intention recognition model established as described in the first embodiment and, as shown in Fig. 3, includes the following steps:
Step 301: determine the consumption field of the user to be identified.
Different approaches may be used in this step depending on the application scenario. The field may be determined from the type of web page the user is currently on: for example, if the user is browsing the digital products section of a website, the consumption field of the user to be identified can be directly determined to be the digital consumer products field. Alternatively, the consumption field to which the behavior logs of the user to be identified within a recent period of time belong may be identified, using a prior-art topic determination method that is not described further here; accuracy is generally higher when the length of this recent period matches the length of the period used to screen the related behavior logs when establishing the consumption intention recognition model. Alternatively, the field may be determined from the content the user is currently entering: for example, when the user enters "Samsung Galaxy S4" into a search engine, the consumption field can be determined from this input to be the digital consumer products field.
Step 302: classify the behavior logs of the user to be identified using the consumption intention recognition model corresponding to the consumption field determined in step 301, to determine whether the consumption intention of the user to be identified is before or after purchase.
In this step, features must be extracted from the behavior logs of the user to be identified, because the consumption intention recognition model actually classifies these features. The extracted features should be consistent with those used to train the classification model in step 104 of the first embodiment; the specific features available are not described again here. Classifying these features yields whether the consumption intention of the user to be identified is before or after purchase.
After the consumption intention of the user is identified, information can be presented to the user in a targeted manner according to the presentation policy corresponding to that consumption intention, as shown in step 303. This can be applied in, but is not limited to, the following two scenarios:
The first scenario: promotion information of a type corresponding to the consumption intention of the user to be identified is displayed to that user. For example, if the user's consumption intention is before purchase, product purchase information, price comparison information, review information and the like may be recommended to the user. If the user's consumption intention is after purchase, usage experience information, related peripheral product information, after-sales information and the like may be recommended.
The second scenario: when the user to be identified uses a search engine, the search results can be ranked according to that user's consumption intention, so that information corresponding to the consumption intention is moved ahead in the results. For example, if the user's consumption intention is recognized as before purchase, web pages related to product purchase information, price comparison information and review information can be ranked higher in the search results. If the user's consumption intention is recognized as after purchase, web pages related to usage experience information, related peripheral product information, after-sales information and the like can be ranked higher.
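The two presentation scenarios can be sketched as follows, assuming each promotion item and search result carries a content-type tag. The tag names, the `CONTENT_FOR_INTENT` table and the function names are hypothetical, and "moved ahead" is implemented here as a stable partition of the results. The patent does not specify a ranking formula.

```python
from typing import Dict, List

# Hypothetical content-type tags for the information categories named in the two scenarios.
CONTENT_FOR_INTENT = {
    "before_purchase": {"purchase", "price_comparison", "review"},
    "after_purchase": {"usage_experience", "accessory", "after_sales"},
}


def select_promotions(intent: str, promotions: List[Dict]) -> List[Dict]:
    """Scenario 1: keep only promotion items whose type matches the intent."""
    wanted = CONTENT_FOR_INTENT[intent]
    return [p for p in promotions if p["type"] in wanted]


def rerank(intent: str, results: List[Dict]) -> List[Dict]:
    """Scenario 2: move matching results ahead of the rest.

    sorted() is stable and False sorts before True, so matching results come first
    and the engine's original order is preserved within each group."""
    wanted = CONTENT_FOR_INTENT[intent]
    return sorted(results, key=lambda r: r["type"] not in wanted)
```

A stable partition keeps the search engine's own relevance order intact within each group. A production system would more likely blend an intent score into its ranking function.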
The method provided by the present invention has been described in detail above. The apparatus for identifying consumption intentions provided by the present invention is described in detail below through the third embodiment.
Example III,
Fig. 4 is a block diagram of an apparatus for identifying consumption intentions according to the third embodiment of the present invention. The apparatus is generally deployed on the server side. As shown in Fig. 4, it includes a modeling unit 00 and a recognition unit 10.
The modeling unit 00 includes a first screening subunit 01, a pattern analysis subunit 02, a second screening subunit 03, and a model training subunit 04.
First, the first screening subunit 01 screens out the related behavior logs of a set consumption field from the historical behavior logs of each user. Screening can use either of the following approaches or a combination of both: keyword matching and template matching.
In the keyword-matching approach, keyword lists corresponding to the set consumption field are mined in advance. Each behavior log is matched against these keyword lists, and the behavior logs containing keywords from the lists are screened out. There may be one or more keyword lists. The screening can keep either the behavior logs that contain keywords from any one list, or only the behavior logs that contain keywords from several lists at the same time.
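Keyword-list screening in both of its modes ("any list" and "several lists at the same time") can be sketched as follows. Plain substring matching is an assumption made here. It suits unsegmented text such as Chinese queries, but a real system might match against segmented words instead. The function name and example lists are hypothetical.

```python
from typing import List, Sequence


def screen_by_keywords(logs: Sequence[str],
                       keyword_lists: Sequence[Sequence[str]],
                       require_all: bool = False) -> List[str]:
    """Keep logs containing a keyword from any list (default) or from every list."""
    kept = []
    for log in logs:
        text = log.lower()
        # One boolean per keyword list: does this log contain any keyword of that list?
        hits = [any(kw.lower() in text for kw in kws) for kws in keyword_lists]
        if (all(hits) if require_all else any(hits)):
            kept.append(log)
    return kept
```

With a product-name list and a price-word list, `require_all=True` keeps only logs that mention both a product and a price term.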
In the template-matching approach, expression templates corresponding to the set consumption field are mined in advance. Each behavior log is matched against the expression templates, and the behavior logs that match a template are screened out.
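Template matching can be sketched with regular expressions standing in for "expression templates". Both the regex representation and the three English templates below are hypothetical. The patent says only that templates are mined in advance, not how they are encoded.

```python
import re
from typing import List, Sequence

# Hypothetical expression templates for the digital-products field; real templates
# would be mined from data as the text describes.
TEMPLATES: List[re.Pattern] = [
    re.compile(r"\bhow much is (?:the )?\w+"),
    re.compile(r"\b\w+ vs\.? \w+"),
    re.compile(r"\bhow to (?:set up|use) (?:my )?\w+"),
]


def screen_by_templates(logs: Sequence[str],
                        templates: Sequence[re.Pattern] = TEMPLATES) -> List[str]:
    """Keep logs that match at least one expression template (case-insensitive)."""
    return [log for log in logs if any(t.search(log.lower()) for t in templates)]
```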
Then, the pattern analysis subunit 02 performs behavior pattern analysis on the related behavior logs screened out by the first screening subunit 01, and determines the behavior logs corresponding to pre-purchase behavior and those corresponding to post-purchase behavior.
Here, the behavior pattern analysis of the screened related behavior logs may be performed manually, or automatically by the pattern analysis subunit 02. In the automatic case, the analysis relies on the keywords indicating consumption intention contained in the related behavior logs, which are either keywords indicating a pre-purchase intention or keywords indicating a post-purchase intention.
Thereafter, the second screening subunit 03 selects, as a training sample, a behavior log that satisfies the training data screening condition from among the behavior logs determined by the pattern analysis subunit 02. The training data screening conditions comprise one or any combination of the following conditions:
Condition 1: the number of logs in the user's behavior log is greater than or equal to a preset number threshold.
Condition 2: the proportion of keywords indicating consumption intention in the behavior log is greater than or equal to a preset proportion threshold, where the keywords indicating consumption intention are keywords indicating a pre-purchase intention or keywords indicating a post-purchase intention.
Condition 3: the number of occurrences of the most frequent brand in the behavior log, as a proportion of the occurrences of all brands in the behavior log, exceeds a preset proportion threshold.
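The three screening conditions can be sketched as follows. Several assumptions apply. The `LogEntry` schema is hypothetical. The condition 2 proportion is taken to be intent keywords over all tokens. Each log entry is assumed to carry at most one brand. All three conditions are applied together, although the text allows any one or any combination. The default thresholds are placeholders, not values from the patent.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence, Set


@dataclass
class LogEntry:
    text: str
    brand: Optional[str] = None  # brand mentioned in the entry, if any (hypothetical schema)


def keyword_ratio(logs: Sequence[LogEntry], intent_keywords: Set[str]) -> float:
    """Condition 2 quantity: share of tokens that are intent keywords (assumed definition)."""
    tokens = [tok for e in logs for tok in e.text.lower().split()]
    return sum(tok in intent_keywords for tok in tokens) / len(tokens) if tokens else 0.0


def top_brand_ratio(logs: Sequence[LogEntry]) -> float:
    """Condition 3 quantity: occurrences of the most frequent brand / occurrences of all brands."""
    brands = Counter(e.brand for e in logs if e.brand)
    total = sum(brands.values())
    return brands.most_common(1)[0][1] / total if total else 0.0


def passes_screening(logs: Sequence[LogEntry], intent_keywords: Set[str],
                     min_logs: int = 3, min_kw_ratio: float = 0.1,
                     min_brand_ratio: float = 0.5) -> bool:
    """Apply conditions 1-3 together; the default thresholds are placeholders."""
    return (len(logs) >= min_logs                                      # condition 1
            and keyword_ratio(logs, intent_keywords) >= min_kw_ratio   # condition 2
            and top_brand_ratio(logs) > min_brand_ratio)               # condition 3
```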
In addition, to mine the keywords indicating consumption intention used above, the modeling unit 00 further includes an intent keyword mining subunit 05, which mines these keywords through the following operations A1 to A4:
Operation A1: determine seed words indicating consumption intention for the set consumption field. The seed words can be set manually in advance and are usually specific to one consumption field. Because a consumption field usually contains multiple subclasses, it is preferable to choose seed words, indicating pre-purchase and post-purchase intentions, that apply to all of its subclasses. For example, the digital consumer products field generally includes subclasses such as computers, cameras and mobile phones, and keywords that indicate a pre-purchase intention across all of these subclasses may be used as the seed words indicating a pre-purchase intention for the digital consumer products field.
Operation A2: classify the sample logs of the set consumption field using the seed words to obtain pre-purchase sample logs and post-purchase sample logs.
Specifically, some sample logs of the set consumption field are selected in advance. If the number of seed words indicating a pre-purchase intention in a sample log exceeds the number of seed words indicating a post-purchase intention by a margin that satisfies a preset condition, the sample log is regarded as a pre-purchase sample log. If the number of post-purchase seed words exceeds the number of pre-purchase seed words by a margin that satisfies the preset condition, the sample log is regarded as a post-purchase sample log. The preset condition may be, for example, that the difference is greater than a preset difference threshold, or that the difference, as a proportion of the total number of seed words in the sample log, exceeds a preset proportion threshold.
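Operation A2 can be sketched as follows, implementing both example forms of the preset condition (a difference threshold, or a difference-to-total proportion threshold). The function names, the default threshold of 0, and the whitespace tokens in the usage are hypothetical. The sketch also assumes that a log in which neither side dominates is placed in neither sample set.

```python
from typing import Optional, Sequence, Set


def _meets(margin: int, total: int, diff_threshold: int,
           ratio_threshold: Optional[float]) -> bool:
    """Preset condition: margin > difference threshold, or, if a ratio threshold
    is given, margin / total seed words > ratio threshold."""
    if ratio_threshold is not None:
        return total > 0 and margin / total > ratio_threshold
    return margin > diff_threshold


def label_sample_log(tokens: Sequence[str], pre_seeds: Set[str], post_seeds: Set[str],
                     diff_threshold: int = 0,
                     ratio_threshold: Optional[float] = None) -> Optional[str]:
    pre = sum(tok in pre_seeds for tok in tokens)    # pre-purchase seed words in the log
    post = sum(tok in post_seeds for tok in tokens)  # post-purchase seed words in the log
    total = pre + post
    if _meets(pre - post, total, diff_threshold, ratio_threshold):
        return "before_purchase"
    if _meets(post - pre, total, diff_threshold, ratio_threshold):
        return "after_purchase"
    return None  # neither side dominates: the log joins neither sample set
```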
Operation A3: perform word segmentation and useless-word filtering on the pre-purchase sample logs and the post-purchase sample logs separately, and then count the frequency of each word.
The filtered-out useless words may include, but are not limited to, at least one of the following: words related to brands and models; words that carry no meaning for semantic expression, such as function words, auxiliary words and modal particles; and noise words such as letters, numbers, or a single Chinese character.
Operation A4: determine the keywords indicating a pre-purchase intention and the keywords indicating a post-purchase intention according to the word frequencies in the pre-purchase sample logs and the post-purchase sample logs, respectively.
If the word frequency of a word in the sample log before purchase is high, but the word frequency of the word in the sample log after purchase is low or 0, the word is determined to be a keyword indicating the intention before purchase, and similarly, if the word frequency of a word in the sample log after purchase is high, but the word frequency of the word in the sample log before purchase is low or 0, the word is determined to be a keyword indicating the intention after purchase.
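Operations A3 and A4 can be sketched together as follows. Whitespace splitting stands in for real word segmentation. Digits and single-character tokens are treated as noise words, and brand or model words are supplied in a hypothetical `useless_words` set. "High" and "low or 0" frequency are made concrete as `min_freq` and `max_other`, whose defaults are placeholders, not values from the patent.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def count_words(sample_logs: Iterable[str], useless_words: Set[str]) -> Counter:
    """Operation A3: segment, drop useless words, count word frequencies.

    Whitespace splitting stands in for real word segmentation; digits and
    single-character tokens are dropped as noise words."""
    counts: Counter = Counter()
    for log in sample_logs:
        for tok in log.lower().split():
            if tok in useless_words or tok.isdigit() or len(tok) == 1:
                continue
            counts[tok] += 1
    return counts


def contrast_keywords(pre_counts: Counter, post_counts: Counter,
                      min_freq: int = 2, max_other: int = 0) -> Tuple[Set[str], Set[str]]:
    """Operation A4: a word is a pre-purchase keyword if it is frequent (>= min_freq) in
    the pre-purchase logs and rare (<= max_other, default absent) in the post-purchase
    logs, and vice versa. Indexing a Counter with a missing word returns 0."""
    pre_kw = {w for w, c in pre_counts.items() if c >= min_freq and post_counts[w] <= max_other}
    post_kw = {w for w, c in post_counts.items() if c >= min_freq and pre_counts[w] <= max_other}
    return pre_kw, post_kw
```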
Finally, the model training subunit 04 extracts features from the training samples and trains a classification model on them, obtaining a consumption intention recognition model for the set consumption field that can distinguish pre-purchase behavior from post-purchase behavior. The classification model may be any binary classification model in the prior art, such as an SVM model. The finally trained consumption intention recognition model makes its recognition according to the probability it assigns to each class.
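A minimal sketch of a binary classifier that outputs per-class probabilities is shown below. To stay dependency-free, it swaps in logistic regression trained by gradient descent in place of the SVM the text mentions. It is not the patent's model. The two-dimensional feature vectors in the usage are hypothetical, class 1 is taken to mean "before purchase", and the learning rate and epoch count are placeholders.

```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model: Model, x: Sequence[float]) -> float:
    """Probability that a feature vector belongs to class 1 (before purchase)."""
    w, b = model
    return _sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.1, epochs: int = 500) -> Model:
    """Full-batch gradient descent on the mean log loss."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba((w, b), xi) - yi  # derivative of log loss w.r.t. the logit
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def classify_intent(model: Model, x: Sequence[float], threshold: float = 0.5) -> str:
    """Apply a probability threshold, as in the threshold sweep of the first embodiment."""
    return "before_purchase" if predict_proba(model, x) > threshold else "after_purchase"
```

With scikit-learn available, `sklearn.svm.SVC(probability=True)` would be the closer match to the SVM named in the text.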
The features extracted from the training samples by the model training subunit 04 in training the classification model include at least one or any combination of the following features:
the number of logs;
the number of logs accounted for by the most frequent brand;
the number of logs accounted for by brands that occur multiple times;
the number of logs accounted for by the most frequent model;
the number of logs accounted for by models that occur multiple times;
the proportion of keywords indicating consumption intention;
the position at which a keyword indicating consumption intention appears;
the proportion of logs that contain a query;
the proportion of keywords in queries that indicate the user's intention;
the time span covered by the logs;
the maximum time span of the logs accounted for by the same brand;
the maximum time span of the logs accounted for by the same model.
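Several of the listed features can be computed as follows. This sketch covers a subset only: it omits keyword position and the query-keyword proportion, whose exact definitions the text leaves open. The `Log` schema is hypothetical. "Occurring multiple times" is taken to mean at least twice, and the keyword proportion is taken over all tokens. The same function would be reused at recognition time so that training and inference features stay consistent.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Set, Tuple


@dataclass
class Log:
    timestamp: float              # seconds (hypothetical schema)
    text: str
    brand: Optional[str] = None
    model: Optional[str] = None
    is_query: bool = False        # True if the log records a search query


def _group_stats(logs: Sequence[Log], attr: str) -> Tuple[int, int, float]:
    """For brand or model: logs of the most frequent value, logs of values seen
    at least twice, and the maximum time span covered by a single value."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for log in logs:
        value = getattr(log, attr)
        if value:
            groups[value].append(log.timestamp)
    top = max((len(ts) for ts in groups.values()), default=0)
    repeated = sum(len(ts) for ts in groups.values() if len(ts) >= 2)
    max_span = max((max(ts) - min(ts) for ts in groups.values()), default=0.0)
    return top, repeated, max_span


def extract_features(logs: Sequence[Log], intent_keywords: Set[str]) -> Dict[str, float]:
    tokens = [t for log in logs for t in log.text.lower().split()]
    times = [log.timestamp for log in logs]
    top_brand, rep_brand, brand_span = _group_stats(logs, "brand")
    top_model, rep_model, model_span = _group_stats(logs, "model")
    return {
        "num_logs": len(logs),
        "top_brand_logs": top_brand,
        "repeated_brand_logs": rep_brand,
        "top_model_logs": top_model,
        "repeated_model_logs": rep_model,
        "keyword_ratio": sum(t in intent_keywords for t in tokens) / len(tokens) if tokens else 0.0,
        "query_log_ratio": sum(log.is_query for log in logs) / len(logs) if logs else 0.0,
        "log_span": max(times) - min(times) if times else 0.0,
        "max_brand_span": brand_span,
        "max_model_span": model_span,
    }
```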
Besides training the classification model on a single set of training samples through the second screening subunit 03 and the model training subunit 04 as described above, the following preferred implementation is provided to improve the recognition accuracy of the model:
The second screening subunit 03 selects, from the behavior logs determined by the pattern analysis subunit 02, N behavior log sets that respectively satisfy N sets of training data screening conditions. The N sets of conditions may be obtained by setting different number thresholds, proportion thresholds and the like for the three conditions above. For example, the proportion threshold in condition 2 may be set to 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, etc., and the proportion threshold in condition 3 may be set to 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 and 0.9, respectively.
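The N condition sets can be generated as a grid over the thresholds, and each set then yields one behavior log set. The grid below uses the condition 2 and condition 3 thresholds listed in the text. The condition 1 values (3 and 5) are hypothetical. The screening predicate is passed in as a parameter, so any implementation of conditions 1 to 3 can be plugged in. Treating the N sets as a full Cartesian product is an assumption; the text does not say how the thresholds are combined.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence


def condition_grid(min_logs_values: Sequence[int],
                   kw_ratio_values: Sequence[float],
                   brand_ratio_values: Sequence[float]) -> List[Dict[str, float]]:
    """One screening-condition set per combination of thresholds (N = product of sizes)."""
    return [
        {"min_logs": n, "min_kw_ratio": k, "min_brand_ratio": b}
        for n, k, b in product(min_logs_values, kw_ratio_values, brand_ratio_values)
    ]


def build_training_sets(user_logs: Sequence[Sequence], grid: List[Dict[str, float]],
                        passes: Callable[..., bool]) -> List[List[Sequence]]:
    """For each condition set, keep the users' behavior logs that pass it."""
    return [[logs for logs in user_logs if passes(logs, **cond)] for cond in grid]


# Thresholds from the text for conditions 2 and 3; the condition 1 values are hypothetical.
KW_RATIOS = [round(0.1 * i, 1) for i in range(7)]         # 0.0 ... 0.6
BRAND_RATIOS = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1 ... 0.9
GRID = condition_grid([3, 5], KW_RATIOS, BRAND_RATIOS)     # N = 2 * 7 * 9 = 126
```

Each resulting set trains one candidate model. With five probability thresholds, this grid would yield 5 × 126 = 630 (model, threshold) pairs to compare.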
The model training subunit 04 trains a classification model on each behavior log set, used as one group of training samples, to obtain N candidate models. It then selects the best of the N candidate models as the consumption intention recognition model for the set consumption field. Specifically, test samples may be classified with each of the N candidate models, and the recall and accuracy of each candidate model are determined from its classification results on the test samples.
After the consumption intention recognition model is built, the recognition unit 10 can use it to recognize users' consumption intentions. As shown in Fig. 4, the recognition unit 10 includes a domain determining subunit 11 and a model classification subunit 12, and may further include an information presentation subunit 13.
First, the domain determining subunit 11 determines the consumption field of the user to be identified. Depending on the application scenario, the domain determining subunit 11 may do so in different ways. The consumption field can be determined from the type of webpage the user is currently on. For example, if the user is browsing the digital products column of a website, the consumption field of the user to be identified can be directly determined to be the digital consumer products field. Alternatively, the consumption field can be identified from the behavior logs of the user to be identified over a recent period of time, using a prior-art topic determination method to find the consumption field those logs belong to; this method is not described again here. Generally, accuracy is higher when the length of this recent period matches the period used to screen the related behavior logs when the consumption intention recognition model was established. Alternatively, the consumption field can be determined from the content the user is currently entering. For example, when the user enters "Samsung Galaxy S4" in a search engine, the consumption field can be determined from that input to be the digital consumer products field.
Then, the model classification subunit 12 uses the consumption intention recognition model corresponding to the consumption field determined by the domain determining subunit 11 to classify the related behavior logs of the user to be identified in that consumption field over a recent period of time. This determines whether the consumption intention of the user to be identified is before or after purchase. To do so, the model classification subunit 12 extracts features from the behavior logs of the user to be identified, because the consumption intention recognition model actually classifies these features. The extracted features should be consistent with those extracted by the model training subunit 04.
Furthermore, the information presentation subunit 13 can present information to the user to be identified according to the presentation policy corresponding to that user's consumption intention. This applies in particular, but is not limited, to the following two scenarios:
The first scenario: promotion information of a type corresponding to the consumption intention of the user to be identified is displayed to that user. For example, if the user's consumption intention is before purchase, product purchase information, price comparison information, review information and the like may be recommended to the user. If the user's consumption intention is after purchase, usage experience information, related peripheral product information, after-sales information and the like may be recommended.
The second scenario: in the search results displayed to the user to be identified, information corresponding to that user's consumption intention is ranked ahead of other results. For example, if the user's consumption intention is recognized as before purchase, web pages related to product purchase information, price comparison information and review information can be ranked higher in the search results. If the user's consumption intention is recognized as after purchase, web pages related to usage experience information, related peripheral product information, after-sales information and the like can be ranked higher.
The method and apparatus provided by the present invention can be widely applied in fields such as web search, product search and advertising. By displaying content that better matches the user's consumption intention, they make the delivered information more accurate and better meet user needs. They also reduce interference from irrelevant content and raise the conversion rate of effective commercial behavior.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.