Content of the invention
Technical problem: The technical problem to be solved is to provide a commodity information recommendation method and system based on user historical behavior. Using users' historical behavior data, the method and system analyze each user's behavior precisely and provide a personalized commodity recommendation list, making commodity recommendations more accurate.
Technical scheme: To solve the above problems, the embodiments of the present invention adopt the following technical scheme:
In a first aspect, this embodiment provides a commodity information recommendation method based on user historical behavior, the method comprising the following steps:
S11: collect the user's historical behavior data on an e-commerce website, the historical behavior data including user information and commodity information;
S12: build user-commodity probability prediction feature vectors from the historical behavior data;
S13: train a model on the user-commodity probability prediction feature vectors to obtain a user recommended-commodity prediction model;
S14: input the data of the user to be predicted into the user recommended-commodity prediction model and calculate the predicted purchase probability of each behavior commodity;
S15: calculate the predicted purchase probability of the associated commodities from the predicted purchase probability of the behavior commodities, and merge the behavior commodities and associated commodities to obtain a commodity recommendation list.
With reference to the first aspect, in a first possible implementation, in S11 the historical behavior data comes from the PC terminal, the WAP terminal, the APP terminal, and offline channels; the user information includes the user's identification ID, gender, age, and access preferences; the commodity information includes the commodity's identification code, commodity traffic features, commodity behavior features, and commodity decision cost.
With reference to the first aspect, in a second possible implementation, building the user-commodity probability prediction feature vectors in S12 specifically includes:
S201: data cleansing: remove abnormal data and data that do not match normal user browsing habits;
S202: feature processing to obtain user feature values: compute daily statistics of each terminal user's historical behavior features after cleansing and construct a user historical behavior feature statistical function for each feature; divide the statistical period (in days) into M intervals; compute the feature value of each interval using a time decay function; and sum the feature values of all intervals to obtain the user feature value;
S203: build the user-commodity probability prediction feature vector, expressed as: fingerprint ID of each terminal + commodity ID + user feature vector value, where the fingerprint ID is the user's identification ID and the commodity ID is the commodity's identification code.
With reference to the first aspect, in a third possible implementation, S15 specifically includes:
S301: determine associated commodities: based on the access history and purchase history in the user's historical behavior data, use association rules or collaborative filtering to calculate the degree of association between each behavior commodity and candidate associated commodities, and take the top b commodities by degree of association as the behavior commodity's associated commodity set;
S302: calculate the purchase probability of the associated commodities according to formula (1):
$$\mathrm{Score}_i = \mathrm{Master\_Pos} \times \frac{\mathrm{SKU\_Score}_i}{\max_j \mathrm{SKU\_Score}_j} \qquad (1)$$
where Score_i is the purchase probability of associated commodity i; Master_Pos is the purchase probability of the behavior commodity; max_j SKU_Score_j is the highest degree of association in the associated commodity set; and SKU_Score_i is the degree of association between associated commodity SKU_i and the behavior commodity;
S303: merge the behavior commodities and associated commodities to generate the commodity recommendation list: if the behavior commodity is in the associated commodity set obtained in step S301, sort the behavior commodity and the associated commodities by predicted purchase probability to obtain the commodity recommendation list; if the behavior commodity is not in that set and its predicted purchase probability is below a probability threshold, multiply its predicted purchase probability by a penalty coefficient to obtain its final predicted purchase probability, then sort the associated commodities and behavior commodities by predicted purchase probability to obtain the commodity recommendation list.
With reference to the first aspect, in a fourth possible implementation, the commodity information recommendation method based on user historical behavior further includes step S16: apply filtering and output logic to the commodity recommendation list obtained in S15 to generate the final commodity recommendation list.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation, step S16 specifically includes: take the commodities the user ordered within the last H days, use the commodity groups to which those ordered commodities belong as the user's filter commodity groups, and remove from the commodity recommendation list obtained in S15 the commodities belonging to the filter commodity groups; then re-sort the commodities in the filtered list by predicted purchase probability to generate the final commodity recommendation list.
In a second aspect, this embodiment provides a commodity information recommendation system based on user historical behavior, the system comprising:
Acquisition module: collects the user's historical behavior data on the e-commerce website;
Feature vector building module: builds user-commodity probability prediction feature vectors from the historical behavior data collected by the acquisition module;
Model building module: trains a model on the user-commodity probability prediction feature vectors built by the feature vector building module to obtain the user recommended-commodity prediction model;
Calculation module: inputs data into the user recommended-commodity prediction model and calculates the predicted purchase probability of the behavior commodities;
First generation module: calculates the predicted purchase probability of the associated commodities from the behavior commodities' predicted purchase probabilities computed by the calculation module, and merges the behavior commodities and associated commodities to obtain the commodity recommendation list.
With reference to the second aspect, in a first possible implementation, the historical behavior data collected by the acquisition module comes from the PC terminal, the WAP terminal, the APP terminal, and offline channels.
With reference to the second aspect, in a second possible implementation, the feature vector building module includes:
Cleansing submodule: removes abnormal data and data that do not match normal user browsing habits;
Calculation submodule: computes daily statistics of each terminal user's historical behavior features after cleansing, constructs a user historical behavior feature statistical function for each feature, divides the statistical period (in days) into M intervals, computes the feature value of each interval with a time decay function, and sums the interval feature values to obtain the user feature value;
Building submodule: builds the user-commodity probability prediction feature vector, expressed as: fingerprint ID of each terminal + user ID + commodity ID + user feature value, where the fingerprint ID is the user's identification ID, the user ID is the user's fingerprint, and the commodity ID is the commodity's identification code.
With reference to the second aspect, in a third possible implementation, the first generation module includes:
Determination submodule: based on the access history and purchase history in the user's historical behavior data, uses association rules or collaborative filtering to calculate the degree of association between the behavior commodity and candidate associated commodities, and takes the top b commodities by degree of association as the behavior commodity's associated commodity set;
Calculation submodule: calculates the purchase probability of the associated commodities;
First generation submodule: merges the behavior commodities and associated commodities to generate the commodity recommendation list: if the behavior commodity is in the associated commodity set built by the determination submodule, sorts the behavior commodity and associated commodities by predicted purchase probability to obtain the commodity recommendation list; if the behavior commodity is not in that set and its predicted purchase probability is below the probability threshold, multiplies its predicted purchase probability by the penalty coefficient to obtain its final predicted purchase probability, then sorts the associated commodities and behavior commodities by predicted purchase probability to obtain the commodity recommendation list.
With reference to the second aspect, in a fourth possible implementation, the commodity information recommendation system based on user historical behavior further includes a second generation module, which applies filtering and output logic to the commodity recommendation list obtained by the first generation module to generate the final commodity recommendation list.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation, the second generation module includes:
Filter commodity group building submodule: takes the commodities the user ordered within the last H days and uses the commodity groups to which those commodities belong as the user's filter commodity groups;
Filter submodule: removes from the commodity recommendation list generated by the first generation module the commodities belonging to the filter commodity groups;
Second generation submodule: re-sorts the commodities in the list filtered by the filter submodule by predicted purchase probability to generate the final commodity recommendation list.
Beneficial effects: Compared with the prior art, the commodity information recommendation method and system based on user historical behavior provided by the embodiments of the present invention provide users with personalized and more accurate commodity recommendations that meet their needs. The method of this embodiment analyzes the user's historical behavior data, builds a user recommended-commodity prediction model, also includes associated commodities related to the behavior commodities in the recommendation list, and generates the commodity recommendation list after jointly comparing the purchase probabilities of the behavior commodities and the associated commodities.
Specific embodiment
The technical solutions of the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the commodity information recommendation method based on user historical behavior of this embodiment comprises the following steps:
S11: collect the user's historical behavior data on the e-commerce website, the historical behavior data including user information and commodity information;
S12: build user-commodity probability prediction feature vectors from the user's attributes, the commodity features, and the user's historical behavior features;
S13: train a model on the user-commodity probability prediction feature vectors to obtain the user recommended-commodity prediction model;
S14: input data into the user recommended-commodity prediction model to obtain the predicted purchase probability of the behavior commodities;
S15: calculate the predicted purchase probability of the associated commodities from the predicted purchase probability of the behavior commodities, and merge the behavior commodities and associated commodities to obtain the commodity recommendation list.
In the above recommendation method, the user's historical behavior data on the e-commerce website is used to calculate the predicted purchase probability of the behavior commodities, and the behavior commodities and associated commodities are combined to generate the commodity recommendation list. Because different users behave differently, calculating the predicted purchase probability of behavior commodities from each user's own historical behavior makes the final recommendation list personalized: different users receive different commodity recommendation lists.
To make the recommended commodity list better match the user's needs, in step S11 the historical behavior data comes from the PC terminal, the WAP terminal, the APP terminal, and offline channels. Drawing on multiple terminals broadens the range of historical behavior data collected, so that the collected data more accurately reflects the user's historical demand and gives a more accurate historical data basis for generating the subsequent commodity recommendation list. The types of historical behavior data collected can be chosen according to actual needs and include user information and commodity information, for example user tag information, user access information, user click information, browsing duration, user search information, user favorites, shopping cart information, pre-sale information, and order and sales information. User information includes the user ID, user attribute information, and so on. Commodity information includes the commodity identification code, commodity feature information, and so on. User attribute information includes the user's gender, age, and access preferences, where access preferences reflect the user's tastes, such as color and style. User attributes can be identified from the historical behavior data by modeling with methods such as statistical analysis and machine learning. Commodity features include commodity traffic features, commodity behavior features, and commodity decision cost. Commodity traffic features are PV, UV, conversion rate, sales, order quantity, sales growth rate, order growth rate, and so on. Commodity behavior features are promotions, price reductions, new products, pre-sale reservations, best-selling items, promotion intensity, price, and so on. Commodity decision cost refers to the time taken to decide on a purchase, the number of views, the number of days on which the commodity was viewed, and so on.
Building the user's historical behavior features means analyzing the user's historical behavior data, identifying the factors that influence the user's purchases, extracting a feature value for each factor, and composing the factor values into a numerical vector, which constitutes the user's historical behavior features.
Preferably, as shown in Fig. 2, building the user-commodity probability prediction feature vectors in S12 specifically includes:
S201: data cleansing: remove abnormal data.
Abnormal data are data that differ markedly from the rest of the data, or are anomalous or inconsistent. For example, the following data must be filtered out, and all count as abnormal data: users for whom the number of commodity categories added to the shopping cart exceeds the category threshold Na; commodity detail page browsing records with a browsing time below the browsing time threshold Nbs; commodity detail page browsing records with a browsing time above the browsing time threshold Ncs; sessions in which the user's level-4 page views exceed the level-4 page view threshold Nd; and users whose page views (PV) for the day are below the PV threshold Ne.
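The five cleansing rules above can be sketched as a single filtering pass. This is a minimal Python sketch under stated assumptions, not the embodiment's implementation: the record layout (`PageView`, `Session`, `UserDay`) and the function name `clean` are hypothetical, and the thresholds Na, Nbs, Ncs, Nd, and Ne are passed in as parameters because the text gives no values for them.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class PageView:
    dwell_seconds: float          # browsing time on the page
    is_detail_page: bool = False  # commodity detail page?
    is_level4_page: bool = False  # level-4 page?


@dataclass
class Session:
    views: List[PageView]


@dataclass
class UserDay:
    user_id: str
    cart_categories: Set[str]  # distinct categories added to the cart that day
    sessions: List[Session]
    pv: int                    # page views for the day


def clean(days: List[UserDay], na: int, nbs: float, ncs: float,
          nd: int, ne: int) -> List[UserDay]:
    """Apply the abnormal-data rules; all thresholds are hypothetical inputs."""
    cleaned = []
    for day in days:
        # Drop users who added more than Na categories to the cart.
        if len(day.cart_categories) > na:
            continue
        # Drop users whose PV for the day is below Ne.
        if day.pv < ne:
            continue
        sessions = []
        for session in day.sessions:
            # Drop whole sessions with more than Nd level-4 page views.
            if sum(v.is_level4_page for v in session.views) > nd:
                continue
            # Keep detail-page views only if Nbs <= browsing time <= Ncs.
            views = [v for v in session.views
                     if not v.is_detail_page or nbs <= v.dwell_seconds <= ncs]
            sessions.append(Session(views))
        cleaned.append(UserDay(day.user_id, day.cart_categories, sessions, day.pv))
    return cleaned
```

Removing data that do not match normal browsing habits (crawlers, order brushing) would need its own detector and is not covered by this sketch.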
Besides abnormal data, data that do not match normal user browsing habits can also be removed, so that both kinds of data are cleansed. Data that do not match user browsing habits are data whose behavior differs greatly from that of normal shoppers, for example the browsing patterns of crawlers or of users who fake orders (order brushing).
S202: feature processing: according to the distribution of each terminal user's historical behavior features, compute statistics for each feature and construct the function shown in formula (2) for each:
Formula (2)
where f(X) is the user historical behavior feature statistical function, X is the feature variable, a is the threshold for each feature, and x is the statistical count of feature variable X.
If the statistical period is N days, divide it into M intervals and compute the feature value of each interval using the time decay function shown in formula (3);
Formula (3)
where K is the half-life of the decay function and t is the number of days back from the current day; for example, t = 1 when computing the feature value of the previous day and t = 2 for the day before that;
Decay the feature value of each interval according to formula (3) and sum the results to obtain the user's final feature value:
Formula (4)
where N is the number of days of historical behavior data counted and Nt is the integer sequence 1:N/M;
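Formulas (2) to (4) are not reproduced in this text, so the following Python sketch only illustrates the idea of multi-level decay under explicit assumptions. It assumes formula (3) is the standard half-life decay 2^(-t/K), applies it per day using t = days back from today, and additionally decays whole intervals by their index (a hypothetical second level, suggested by the later remark that decay acts in both the interval and time dimensions). The names `decay_weight`, `user_feature_value`, `k_day`, and `k_interval` are illustrative, and the raw daily counts stand in for the output of formula (2).

```python
from typing import Sequence


def decay_weight(t: float, half_life: float) -> float:
    """Assumed form of formula (3): the weight halves every `half_life` steps."""
    return 0.5 ** (t / half_life)


def user_feature_value(daily_counts: Sequence[float], m: int,
                       k_day: float, k_interval: float) -> float:
    """Two-level decayed sum of daily feature statistics (illustrative only).

    daily_counts[0] is the previous day (t = 1), daily_counts[1] is t = 2, etc.
    The N days are split into M equal intervals.
    """
    n = len(daily_counts)
    if m <= 0 or n % m != 0:
        raise ValueError("this sketch requires N to be a positive multiple of M")
    size = n // m
    total = 0.0
    for idx in range(m):  # idx = 0 is the most recent interval
        start = idx * size
        interval = daily_counts[start:start + size]
        # Day-level decay: t is the number of days back from today (1 = yesterday).
        interval_value = sum(c * decay_weight(start + offset + 1, k_day)
                             for offset, c in enumerate(interval))
        # Interval-level decay (assumption): older intervals contribute less.
        total += interval_value * decay_weight(idx, k_interval)
    return total
```

With a half-life of one day, the same activity contributes half as much for each day it lies further in the past, so recent behavior dominates the final feature value.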
S203: build the user-commodity probability prediction feature vector, expressed as: fingerprint ID of each terminal + commodity ID + user feature vector value.
The fingerprint ID is the user's identification ID, such as a cookie ID, IMEI, or membership code. The commodity ID is the commodity's identification code.
The user-commodity probability prediction feature vectors are built separately for each type of terminal user, specifically:
(1) PC user: fingerprint ID (PC) + commodity ID + behavior features;
(2) WAP user: fingerprint ID (WAP) + commodity ID + behavior features;
(3) APP user: fingerprint ID (APP) + commodity ID + behavior features;
(4) Cross-screen user: fingerprint ID1 (PC) + fingerprint ID2 (WAP) + fingerprint ID3 (APP) + commodity ID + behavior features.
Here, fingerprint ID (PC) is the PC user's identification ID, fingerprint ID (WAP) is the WAP user's identification ID, and fingerprint ID (APP) is the APP user's identification ID.
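The four vector layouts above differ only in which terminal fingerprints precede the commodity ID. A minimal Python sketch, assuming the key is a tuple of fingerprint IDs in a fixed PC, WAP, APP order followed by the commodity ID (the function name `build_feature_vector` and the fixed order are illustrative choices, not taken from the text):

```python
from typing import Dict, List, Sequence, Tuple

TERMINAL_ORDER = ("PC", "WAP", "APP")


def build_feature_vector(fingerprints: Dict[str, str], sku_id: str,
                         behavior_features: Sequence[float]
                         ) -> Tuple[Tuple[str, ...], List[float]]:
    """Return (key, features): key = fingerprint IDs (PC, WAP, APP order) + commodity ID.

    A single-terminal user passes one fingerprint; a cross-screen user passes several.
    """
    if not fingerprints or set(fingerprints) - set(TERMINAL_ORDER):
        raise ValueError("fingerprints must use the terminals PC, WAP, APP")
    ids = tuple(fingerprints[t] for t in TERMINAL_ORDER if t in fingerprints)
    return ids + (sku_id,), list(behavior_features)
```

For example, a cross-screen user with a PC cookie and an APP device ID yields a key of the form (cookie ID, APP ID, commodity ID).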
In step S13, a model is trained on the user-commodity probability prediction feature vectors to obtain the user recommended-commodity prediction model.
The model is built with any one or more of logistic regression, lasso regression, and random forests. During training, separate models are trained on the data from each terminal (PC, WAP, APP, and so on). The user-commodity probability prediction feature vectors of shopping cart commodities that were converted into orders are taken as positive training samples, and the vectors of SKUs that appeared in the user's behavior but were not converted into orders are taken as negative training samples. The model training in this embodiment uses a learned classification model, such as logistic regression, lasso regression, or random forests, to calculate the purchase probability of each commodity.
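The sample-labeling rule above can be sketched as follows. This is a minimal Python sketch, assuming each user-commodity pair is identified by a hashable key (for example the S203 key) and that cart and order events are available as sets of such keys; the function name `label_samples` is hypothetical. Pairs that were ordered without a cart event are not covered by the text, so the sketch simply skips them.

```python
from typing import Dict, Hashable, List, Set, Tuple


def label_samples(behavior_vectors: Dict[Hashable, List[float]],
                  carted: Set[Hashable],
                  ordered: Set[Hashable]) -> Tuple[List[List[float]], List[int]]:
    """Split user-commodity vectors into positive (1) and negative (0) samples."""
    X, y = [], []
    for key, vec in behavior_vectors.items():
        if key in ordered:
            if key in carted:
                X.append(vec)
                y.append(1)  # in the cart and converted into an order: positive
            # Ordered without a cart event: not covered by the text, skipped here.
        else:
            X.append(vec)
            y.append(0)      # behavior commodity never ordered: negative
    return X, y
```

To train per terminal as described, the function would be called once on each terminal's vectors.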
Logistic regression model: for classification, the trained LR classifier yields a set of weights; the weights are combined linearly with the input features to give a weighted sum, and the sigmoid function is applied to that sum to obtain a probability, which is the purchase probability.
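The scoring rule just described (weighted linear sum, then sigmoid) is shown below in plain Python, together with one way to learn the weights. The training routine is an assumption: it uses batch gradient descent on the average log loss, whereas a production system would more likely use a library solver. Function names and the learning-rate and epoch defaults are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def purchase_probability(weights: Sequence[float], bias: float,
                         features: Sequence[float]) -> float:
    """Linear weighted sum of the features, mapped to (0, 1) by the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                              lr: float = 0.5, epochs: int = 2000
                              ) -> Tuple[List[float], float]:
    """Batch gradient descent on the average log loss (one possible optimizer)."""
    n, m = len(X[0]), len(X)
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            err = purchase_probability(w, b, xi) - yi  # gradient of log loss w.r.t. z
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b
```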
Lasso regression model: the Lasso (least absolute shrinkage and selection operator, Tibshirani) method is a shrinkage estimator. By constructing a penalty function it obtains a more refined model that shrinks some coefficients and sets others exactly to zero. It therefore retains the advantage of subset selection and is a biased estimator suited to data with multicollinearity. The basic idea of Lasso is to minimize the residual sum of squares subject to the sum of the absolute values of the regression coefficients being less than a constant, which can drive some regression coefficients exactly to 0 and yields an interpretable model. This makes the predicted probabilities more accurate.
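The "some coefficients become exactly zero" property comes from the soft-thresholding step. The Python sketch below solves the penalized form, minimizing (1/2m)·RSS + λ·Σ|w_j|, which for a suitable λ is equivalent to the constrained form described above, by cyclic coordinate descent. It assumes centered data and omits the intercept; the function names are illustrative.

```python
from typing import List, Sequence


def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X: Sequence[Sequence[float]], y: Sequence[float],
                             lam: float, iters: int = 100) -> List[float]:
    """Minimize (1/2m) * sum((y - Xw)^2) + lam * sum(|w_j|); no intercept."""
    m, n = len(X), len(X[0])
    w = [0.0] * n
    for _ in range(iters):
        for j in range(n):
            z = sum(X[i][j] ** 2 for i in range(m)) / m
            if z == 0.0:
                w[j] = 0.0
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(n) if k != j))
                for i in range(m)
            ) / m
            w[j] = soft_threshold(rho, lam) / z
    return w
```

On a toy dataset where only the first feature explains y, the first coefficient is shrunk by λ and the irrelevant second coefficient is set exactly to zero.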
Random forest model: a random forest is a classifier made up of multiple decision trees, and its output class is the mode of the classes output by the individual trees. The user's purchase probability is calculated from the output classes.
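The text does not state exactly how the trees' class outputs become a probability. A common choice, assumed here, is the fraction of trees voting "buy", while the predicted class is the mode of the votes. In this Python sketch the trees are hypothetical callables standing in for trained decision trees.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

Tree = Callable[[Sequence[float]], int]  # returns 1 (buy) or 0 (no buy)


def forest_predict(trees: Sequence[Tree], features: Sequence[float]) -> Tuple[int, float]:
    """Return (mode of the tree votes, fraction of trees voting 'buy')."""
    votes = [tree(features) for tree in trees]
    majority = Counter(votes).most_common(1)[0][0]
    return majority, sum(votes) / len(votes)
```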
In step S14, the data of the user to be predicted is loaded into the user recommended-commodity prediction model to obtain the predicted purchase probability of the behavior commodities.
Preferably, S15 specifically includes the following steps:
S301: determine associated commodities: based on the access history and purchase history in the user's historical behavior data, use association rules or collaborative filtering to calculate the degree of association between the behavior commodity and candidate associated commodities, and take the top b commodities by degree of association as the behavior commodity's associated commodity set, where b is an integer and b > 1.
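As one concrete reading of the association-rule option, the degree of association can be taken as the confidence of the rule "behavior commodity → other commodity", i.e. the share of baskets containing the behavior commodity that also contain the other commodity. This Python sketch assumes baskets are lists of SKU IDs built from access or purchase history; the function names are hypothetical, and ties are broken by SKU ID only to make the ordering deterministic.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def association_degrees(baskets: Sequence[Sequence[str]], sku: str) -> Dict[str, float]:
    """Confidence(sku -> other) = count(baskets with both) / count(baskets with sku)."""
    with_sku = 0
    co_counts: Counter = Counter()
    for basket in baskets:
        items = set(basket)
        if sku in items:
            with_sku += 1
            for other in items - {sku}:
                co_counts[other] += 1
    if with_sku == 0:
        return {}
    return {other: c / with_sku for other, c in co_counts.items()}


def top_b_associated(baskets: Sequence[Sequence[str]], sku: str,
                     b: int) -> List[Tuple[str, float]]:
    """Associated commodity set: the b commodities with the highest degree."""
    degrees = association_degrees(baskets, sku)
    ranked = sorted(degrees.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:b]
```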
S302: calculate the purchase probability of the associated commodities according to formula (1):
$$\mathrm{Score}_i = \mathrm{Master\_Pos} \times \frac{\mathrm{SKU\_Score}_i}{\max_j \mathrm{SKU\_Score}_j} \qquad (1)$$
where Score_i is the purchase probability of associated commodity i; Master_Pos is the purchase probability of the behavior commodity; max_j SKU_Score_j is the highest degree of association in the associated commodity set; and SKU_Score_i is the degree of association between associated commodity SKU_i and the behavior commodity;
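Formula (1) rescales the degrees of association so that the most strongly associated commodity inherits exactly the behavior commodity's purchase probability and the others receive proportionally less. A direct Python transcription, with a hypothetical function name and an added guard for an all-zero set (a case the text does not address):

```python
from typing import Dict


def associated_purchase_scores(master_pos: float,
                               sku_scores: Dict[str, float]) -> Dict[str, float]:
    """Formula (1): Score_i = Master_Pos * SKU_Score_i / max_j SKU_Score_j."""
    top = max(sku_scores.values())
    if top <= 0:
        return {sku: 0.0 for sku in sku_scores}  # guard: no positive association
    return {sku: master_pos * s / top for sku, s in sku_scores.items()}
```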
S303: merge the behavior commodities and associated commodities to obtain the commodity recommendation list:
If the behavior commodity is in the associated commodity set obtained in step S301, sort the behavior commodity and associated commodities by predicted purchase probability to obtain the commodity recommendation list. If the behavior commodity is not in that set and its predicted purchase probability is below the probability threshold, multiply its predicted purchase probability by the penalty coefficient to obtain its final predicted purchase probability, then re-sort the associated commodities and behavior commodities by predicted purchase probability to obtain the commodity recommendation list.
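The merge rule can be sketched in Python as below. Assumptions not stated in the text: associated scores from formula (1) are pooled over all behavior commodities into one dictionary, and when a commodity appears both as a behavior commodity and as an associated commodity, its higher probability is kept. The function name and parameters are illustrative.

```python
from typing import Dict, List, Tuple


def merge_recommendations(behavior_probs: Dict[str, float],
                          assoc_scores: Dict[str, float],
                          threshold: float, penalty: float) -> List[Tuple[str, float]]:
    """S303 sketch: penalize weak, unsupported behavior commodities, then sort.

    behavior_probs: sku -> predicted purchase probability from S14.
    assoc_scores: sku -> Score_i from formula (1), pooled over behavior commodities.
    """
    final: Dict[str, float] = {}
    for sku, p in behavior_probs.items():
        if sku not in assoc_scores and p < threshold:
            p *= penalty  # not in the associated set and below the threshold
        final[sku] = p
    for sku, s in assoc_scores.items():
        # Assumption: a commodity in both lists keeps its higher probability.
        final[sku] = max(final.get(sku, 0.0), s)
    # Sort by probability (highest first); SKU ID breaks ties deterministically.
    return sorted(final.items(), key=lambda kv: (-kv[1], kv[0]))
```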
In step S303, the probability threshold and the penalty coefficient are selected by optimizing the comprehensive evaluation index (F-Measure): the values at which the F-Measure is maximal are taken.
where hit rate (precision) = number of correctly identified items / total number of items identified;
recall = number of correctly identified items / total number of relevant items in the test set;
When hit rate and recall conflict, the comprehensive evaluation index (F-Measure, also called F-Score) is used to take both into account and choose the optimal value. The F-Measure is the weighted harmonic mean of hit rate and recall:
$$\text{F-Measure} = \frac{(1 + a^2)\,P\,R}{a^2 P + R}$$
where P is the hit rate and R is the recall;
When a = 1, this is the most common form, F1: F1 = 2 × hit rate × recall / (hit rate + recall).
F1 thus combines hit rate and recall: the higher F1 is, the more effective the method. Here F1 is mainly used to tune the ranking.
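Selecting the probability threshold and penalty coefficient by maximal F-Measure amounts to a grid search. In this Python sketch, `evaluate` is a hypothetical callback that would rerun S303 with a given (threshold, penalty) pair on held-out data and return (hit rate, recall); the candidate grids are also inputs the text does not specify.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple


def precision_recall(recommended: Sequence[str], purchased: Sequence[str]) -> Tuple[float, float]:
    """Hit rate (precision) and recall of one recommendation list."""
    hits = len(set(recommended) & set(purchased))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(purchased) if purchased else 0.0
    return precision, recall


def f_measure(precision: float, recall: float, a: float = 1.0) -> float:
    """Weighted harmonic mean (1 + a^2) P R / (a^2 P + R); a = 1 gives F1."""
    denom = a * a * precision + recall
    return 0.0 if denom == 0 else (1 + a * a) * precision * recall / denom


def select_threshold_and_penalty(
        evaluate: Callable[[float, float], Tuple[float, float]],
        thresholds: Iterable[float], penalties: Sequence[float],
        a: float = 1.0) -> Optional[Tuple[float, float, float]]:
    """Grid search returning (best F-Measure, threshold, penalty)."""
    best = None
    for t in thresholds:
        for c in penalties:
            f = f_measure(*evaluate(t, c), a=a)
            if best is None or f > best[0]:
                best = (f, t, c)
    return best
```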
As shown in Fig. 4, the recommendation method of this embodiment adds step S16 to the above embodiment: the commodity recommendation list obtained in S15 is filtered and passed through output logic according to the behavior filter logic, and the final commodity recommendation list is output.
The detailed procedure for filtering and output processing according to the behavior filter logic is: take the commodities the user ordered within the last H days, use the commodity groups to which those commodities belong as the user's filter commodity groups, and remove from the commodity recommendation list obtained in S15 the commodities belonging to the filter commodity groups; then re-sort the commodities in the filtered list by predicted purchase probability to obtain the final commodity recommendation list. A user who has ordered a commodity within the last H days is unlikely to buy it again in the near term, so applying the behavior filter logic ensures that commodities ordered within the last H days no longer appear in the final recommendation list. With these commodities removed, the recommendation list reflects the user's needs more accurately.
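The S16 filter can be sketched in Python as follows. Assumptions: orders are given as (SKU, order date) pairs, a SKU-to-commodity-group mapping is available, and "within the last H days" is read as order dates after today minus H days. The function name and data layout are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def filter_recent_order_groups(ranked: List[Tuple[str, float]],
                               sku_group: Dict[str, str],
                               orders: List[Tuple[str, date]],
                               today: date, h: int) -> List[Tuple[str, float]]:
    """S16 sketch: drop commodities whose group the user ordered from in the last H days."""
    cutoff = today - timedelta(days=h)
    # Filter commodity groups: groups of commodities ordered within the last H days.
    blocked = {sku_group[sku] for sku, day in orders
               if day > cutoff and sku in sku_group}
    kept = [(sku, p) for sku, p in ranked if sku_group.get(sku) not in blocked]
    # Re-sort by predicted purchase probability, highest first.
    return sorted(kept, key=lambda item: -item[1])
```

Because filtering works at the group level, other commodities in the same group as a recent order (for example a second phone after a phone purchase) are removed as well.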
The recommendation method of above-described embodiment, considered user behavior and product features and the twoCross feature, improve the accuracy of prediction, further increase the accuracy of recommendation.Hand overFork feature refers to the linear of characteristic attribute or nonlinear combination.Cross feature is to user behaviorPortray abundanter, the dimension of the characteristic variable of increase, further increase the precision of model.
Accuracy test: a comparative example was compared with this embodiment. The test data for both were obtained in the same way as the training data of this embodiment and computed over a new time window. The comparative example uses a logistic regression model, and this embodiment uses the model built in step S13. The comparative example uses the user's browsing behavior as user features, while this embodiment uses user behavior features and commodity features. Both output recommendation lists after model testing. According to the prediction results, the prediction accuracy (AUC) of the comparative example is 0.70 and that of this embodiment is 0.83.
The recommendation results are ranked over all dimensions. Because recall and hit rate, the core evaluation indexes for recommendation, can conflict, this embodiment measures them with a weighted harmonic mean and uses its optimal value to optimize the final recommendation ranking: results are ranked according to the settings at which the comprehensive evaluation index is maximal, which improves the accuracy of the recommendation ranking.
The method uses multi-level decay to build the user's historical behavior features. The user's historical behavior is divided into M intervals and decayed along two dimensions: interval and time. This preserves the user's continuous browsing habits, because the method treats user behavior within the same interval as continuous behavior, while also accounting for the effect of time on the user's purchasing demand. Multi-level decay affects the commodity scores used for ranking: the decay rate and the interval division change the final feature vector values and hence the user's scores. Because the user's commodities are ranked by score, differences in score change the recommendation ranking.
The method of this embodiment predicts users' purchase probabilities for e-commerce commodities; the prediction results serve as basic forecast data for precision marketing, personalized recommendation, and similar applications on e-commerce websites.
In addition, as shown in Fig. 5, a commodity information recommendation system based on user historical behavior is also provided, the system comprising:
Acquisition module: collects the user's historical behavior data on the e-commerce website;
Feature vector building module: builds user-commodity probability prediction feature vectors from the historical behavior data collected by the acquisition module;
Model building module: trains a model on the user-commodity probability prediction feature vectors built by the feature vector building module to obtain the user recommended-commodity prediction model;
Calculation module: inputs data into the user recommended-commodity prediction model and calculates the predicted purchase probability of the behavior commodities;
First generation module: calculates the predicted purchase probability of the associated commodities from the behavior commodities' predicted purchase probabilities computed by the calculation module, and merges the behavior commodities and associated commodities to obtain the commodity recommendation list.
In the above system, the user's historical behavior data on the e-commerce website is used to calculate the predicted purchase probability of the behavior commodities, and the behavior commodities and associated commodities are combined to generate the commodity recommendation list. Because different users behave differently, calculating the predicted purchase probability of behavior commodities from each user's own historical behavior makes the final recommendation list personalized: different users receive different commodity recommendation lists.
The historical behavior data collected by the acquisition module comes from the PC terminal, the WAP terminal, the APP terminal, and offline channels. Drawing on multiple terminals broadens the range of historical behavior data collected, so that the collected data more accurately reflects the user's historical demand and gives a more accurate historical data basis for generating the subsequent commodity recommendation list. The historical behavior data includes user information and commodity information. User information includes the user's identification ID, user attribute information, and so on. Commodity information includes the commodity's identification code, commodity feature information, and so on. User attributes include the user's gender, age, and access preferences. Commodity features include commodity traffic features, commodity behavior features, and commodity decision cost.
Preferably, as shown in Fig. 6, the feature vector building module includes:
Cleaning submodule:For by abnormal data and do not meet the data that user browses custom, enteringRow cleaning;
Measuring and calculating submodule:For by each terminal use's historical behavior feature after cleaning, daily unitingMeter, structuring user's historical behavior characteristic statisticses function respectively, M area is divided into statistics natural lawBetween section, to each segment according to time attenuation function, calculate the eigenvalue of segment, add upThe eigenvalue of each segment, obtains user characteristicses value:
Establishing submodule: for establishing the user-commodity probability prediction feature vector, which is expressed as: fingerprint ID of each terminal + commodity ID + user feature values; here the fingerprint ID is the user's identification ID and the commodity ID is the commodity's identification code.
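As a rough illustration, the vector layout "fingerprint ID + commodity ID + user feature values" can be represented as one row per user-commodity pair. The class and function names below are hypothetical, and sorting by commodity ID is only an assumption to make the output deterministic.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ProbabilityFeatureVector:
    fingerprint_id: str                # user identification ID on a terminal
    commodity_id: str                  # commodity identification code
    user_features: tuple[float, ...]   # user feature values (e.g. decayed sums)

    def as_row(self) -> list:
        """Flatten to: fingerprint ID + commodity ID + user feature values."""
        return [self.fingerprint_id, self.commodity_id, *self.user_features]


def build_vectors(fingerprint_id: str,
                  features_by_commodity: dict[str, Sequence[float]]
                  ) -> list[ProbabilityFeatureVector]:
    """Build one feature vector per commodity for a single user (hypothetical helper)."""
    return [ProbabilityFeatureVector(fingerprint_id, cid, tuple(vals))
            for cid, vals in sorted(features_by_commodity.items())]
```

The resulting rows could then be fed to whatever model the training step uses; the ID columns would normally be kept as keys rather than used directly as numeric inputs.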
In the feature vector establishing module, the cleaning submodule first cleans out abnormal data and data that does not conform to users' browsing habits; the measuring and calculating submodule then calculates the user feature values; and finally the establishing submodule establishes the user-commodity probability prediction feature vector. The measuring and calculating submodule calculates the feature value of each interval according to the time attenuation function and then sums the feature values of all intervals. This multi-level decay method affects the commodity scores used for ranking: different decay rates and interval divisions yield different final feature vector values and therefore different user scores. Because a user's commodities are ranked by score, differences in score change the ranking result (the recommendation ranking).
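To make the effect of the decay rate concrete, the sketch below scores two hypothetical commodities for one user, assuming exponential decay over interval totals (most recent interval first). A slow decay favors heavy but old interest; a fast decay favors recent interest.

```python
import math


def decayed_score(interval_totals, decay_rate):
    """Sum interval totals (index 0 = most recent) weighted by exp(-decay_rate * k).

    Assumption: exponential time attenuation; the source does not fix the form.
    """
    return sum(t * math.exp(-decay_rate * k) for k, t in enumerate(interval_totals))


def rank(history, decay_rate):
    """Order commodity IDs by decayed score, highest first."""
    return sorted(history, key=lambda c: decayed_score(history[c], decay_rate),
                  reverse=True)


# Hypothetical interval totals for one user, most recent interval first.
history = {
    "sku_recent": [4, 0, 0],   # modest but recent interest
    "sku_old": [0, 0, 9],      # heavy but old interest
}
print(rank(history, 0.1))  # slow decay: ['sku_old', 'sku_recent']
print(rank(history, 1.0))  # fast decay: ['sku_recent', 'sku_old']
```

With a decay rate of 0.1 the old commodity scores about 7.37 against 4; with 1.0 it drops to about 1.22, so the order flips.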
In the cleaning submodule, abnormal data refers to data that differs markedly from the rest of the data, that is, anomalous or inconsistent data. For example, all of the following data are abnormal and need to be filtered out: users who add more than Na merchandise categories to the shopping cart, Na being the merchandise category threshold; commodity detail page browsing records whose browsing time is less than the browsing time threshold Nb seconds; commodity detail page browsing records whose browsing time is greater than the browsing time threshold Nc seconds; sessions in which the user's number of level-four page views exceeds the level-four page view threshold Nd (the whole session is filtered); and users whose page views (pv) on that day are below the pv threshold Ne (the user is filtered).
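The five filtering rules could be sketched as below. The record layout (a `View` with user, session, page level, detail-page flag, and dwell time) and the mapping of cart categories per user are assumptions, not taken from the source, and pv is counted on the raw records before any other rule is applied.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class View:
    # Hypothetical browsing record layout.
    user_id: str
    session_id: str
    page_level: int        # assumed encoding: 4 = level-four page
    is_detail_page: bool
    dwell_seconds: float


def clean_views(views, cart_categories, Na, Nb, Nc, Nd, Ne):
    """Apply the abnormal-data rules; cart_categories maps user_id -> set of
    merchandise categories that user added to the cart (assumed input)."""
    # Rule 1: drop users whose cart spans more than Na categories.
    bad_users = {u for u, cats in cart_categories.items() if len(cats) > Na}
    # Rule 5: drop users whose daily page views (pv) are below Ne.
    pv = defaultdict(int)
    for v in views:
        pv[v.user_id] += 1
    bad_users |= {u for u, n in pv.items() if n < Ne}
    # Rule 4: drop whole sessions with more than Nd level-four page views.
    level4 = defaultdict(int)
    for v in views:
        if v.page_level == 4:
            level4[v.session_id] += 1
    bad_sessions = {s for s, n in level4.items() if n > Nd}

    kept = []
    for v in views:
        if v.user_id in bad_users or v.session_id in bad_sessions:
            continue
        # Rules 2 and 3: drop detail-page views shorter than Nb s or longer than Nc s.
        if v.is_detail_page and (v.dwell_seconds < Nb or v.dwell_seconds > Nc):
            continue
        kept.append(v)
    return kept
```

Whether pv should be counted before or after the other filters is not stated; counting on raw data, as here, is one reasonable choice.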
Data that does not conform to users' browsing habits refers to data that differs greatly from the behavior of users who shop normally, such as the browsing behavior of crawler users or order-brushing users.
Preferably, as shown in Fig. 7, the first generation module includes:
Determination submodule: for calculating, from the access history data and purchase history data in the user's historical behavior data, the degree of association between each behavior commodity and other commodities using association rules or collaborative filtering, and taking the top b commodities with the highest degree of association as the associated commodity set of that behavior commodity;
Calculation submodule: for calculating the purchase probability of each associated commodity according to formula (1):
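One simple way to obtain a degree of association is association-rule confidence over co-occurrence in historical baskets (sessions or orders), which the sketch below assumes. Collaborative filtering would be an equally valid choice; the function name and basket format are hypothetical.

```python
from collections import Counter


def associated_commodities(baskets, behavior_sku, b):
    """Return the top-b (sku, confidence) pairs for the rule behavior_sku -> sku.

    baskets: iterable of sets of SKUs from access/purchase history (assumed format).
    Confidence = baskets containing both / baskets containing behavior_sku.
    """
    with_a = [basket for basket in baskets if behavior_sku in basket]
    if not with_a:
        return []
    co = Counter()
    for basket in with_a:
        for other in basket:
            if other != behavior_sku:
                co[other] += 1
    n = len(with_a)
    # Highest co-occurrence first; ties broken by SKU for determinism.
    ranked = sorted(co.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(sku, count / n) for sku, count in ranked[:b]]
```

For instance, with baskets `[{"A","B","C"}, {"A","B"}, {"A","D"}, {"B","C"}]` and b = 2, commodity A's associated set is B (confidence 2/3) and C (1/3).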
$$\mathrm{Score}_i = \mathrm{Master\_Pos} \times \frac{\mathrm{SKU\_Score}_i}{\max_j \mathrm{SKU\_Score}_j} \qquad (1)$$
where Score_i is the purchase probability of associated commodity SKU_i; Master_Pos is the predicted purchase probability of the behavior commodity; max_j SKU_Score_j is the highest degree of association in the associated commodity set; and SKU_Score_i is the degree of association between associated commodity SKU_i and the behavior commodity.
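Formula (1) scales the behavior commodity's purchase probability by each associated commodity's association degree relative to the strongest one, so the most strongly associated commodity inherits Master_Pos exactly. A minimal sketch (the function name is hypothetical):

```python
def associated_purchase_probabilities(master_pos, association):
    """Formula (1): Score_i = Master_Pos * SKU_Score_i / max_j SKU_Score_j.

    master_pos: predicted purchase probability of the behavior commodity.
    association: dict sku -> degree of association with the behavior commodity.
    """
    if not association:
        return {}
    top = max(association.values())
    if top <= 0:
        raise ValueError("the highest degree of association must be positive")
    return {sku: master_pos * score / top for sku, score in association.items()}
```

With Master_Pos = 0.6 and association degrees B = 0.5, C = 0.25, B receives 0.6 and C receives 0.3.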
First generation submodule: for merging the behavior commodities and associated commodities to generate the commodity recommendation list. If a behavior commodity is in the associated commodity set established by the determination submodule, the behavior commodities and associated commodities are sorted by predicted purchase probability to obtain the commodity recommendation list. If a behavior commodity is not in the associated commodity set established by the determination submodule and its predicted purchase probability is less than the probability threshold, its predicted purchase probability multiplied by a penalty coefficient is taken as its final predicted purchase probability; the associated commodities and behavior commodities are then sorted by predicted purchase probability to obtain the commodity recommendation list.
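The merge rule can be sketched as follows. When a SKU appears both as a behavior commodity and as an associated commodity, the source does not say which probability wins; this sketch assumes the higher one is kept. Names are hypothetical.

```python
def merge_recommendations(behavior, associated, associated_set, threshold, penalty):
    """Merge behavior and associated commodities into one ranked list of SKUs.

    behavior: sku -> predicted purchase probability of behavior commodities.
    associated: sku -> purchase probability of associated commodities (formula (1)).
    associated_set: SKUs in the associated commodity set.
    """
    merged = dict(associated)
    for sku, p in behavior.items():
        # Penalize weak behavior commodities that no association supports.
        if sku not in associated_set and p < threshold:
            p = p * penalty
        # Assumption: on duplicates, keep the higher probability.
        merged[sku] = max(p, merged.get(sku, 0.0))
    # Highest probability first; ties broken by SKU for determinism.
    return sorted(merged, key=lambda s: (-merged[s], s))
```

For example, with threshold 0.25 and penalty 0.5, a behavior commodity at 0.2 outside the associated set drops to 0.1 and falls below an associated commodity at 0.3.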
In the first generation submodule, the probability threshold and the penalty coefficient are selected using the comprehensive evaluation index (F-Measure) as the optimization criterion: the probability threshold and penalty coefficient at which F-Measure is maximal are taken.
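Selecting the two parameters by maximal F-Measure amounts to a search over candidate values. The sketch below assumes a grid search and a hypothetical `evaluate(threshold, penalty)` callback that returns true-positive, false-positive, and false-negative counts of the resulting recommendation lists on held-out purchase data; how those counts are produced is left open, as in the source.

```python
from itertools import product


def f_measure(tp, fp, fn, beta=1.0):
    """F-Measure (F1 when beta = 1) from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def select_threshold_and_penalty(thresholds, penalties, evaluate):
    """Return the (threshold, penalty) pair with the highest F-Measure.

    evaluate: hypothetical callback (threshold, penalty) -> (tp, fp, fn).
    """
    return max(product(thresholds, penalties),
               key=lambda params: f_measure(*evaluate(*params)))
```

A finer grid or a coordinate search would also work; grid search is shown only because it is the simplest correct way to take the maximizing pair.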
The first generation submodule considers not only the behavior commodities but also the associated commodities, treating both together as commodities to be recommended. When selecting commodities to recommend, the predicted purchase probability of a behavior commodity is processed differently depending on whether that commodity already appears in the associated commodity set. The associated commodities and the processed behavior commodities are then sorted together by purchase probability, so that the position of each behavior commodity in the recommendation list better matches the user's demand.
As shown in Fig. 8, the commodity information recommendation system based on user historical behavior further includes a second generation module, which filters the commodity recommendation list obtained by the first generation module and applies output logic processing to generate the final commodity recommendation list. Because a user generally does not buy a recently purchased commodity again, the commodity recommendation list generated by the first generation module is filtered and processed by the output logic so that the final list contains no commodities the user bought recently, and the list better matches the user's real demand.
As shown in Fig. 9, the second generation module includes:
Filtered commodity group establishing submodule: for taking the commodities the user ordered within the last H days and using the commodity groups to which those ordered commodities belong as the user's filtered commodity groups, where H is an integer and H > 3;
Filtering submodule: for filtering out, from the commodity recommendation list generated by the first generation module, the commodities that belong to the filtered commodity groups;
Second generation submodule: for re-sorting the commodities in the commodity recommendation list filtered by the filtering submodule according to their predicted purchase probabilities, generating the final commodity recommendation list.
The filtered commodity group establishing submodule selects the filtered commodity groups, which are the groups of commodities the user bought recently. The filtering submodule removes, from the commodity recommendation list generated by the first generation module, the commodities that belong to the filtered commodity groups. The second generation submodule re-sorts the commodities remaining in the filtered list by predicted purchase probability to generate the final commodity recommendation list. Through these three submodules, commodities in the same group as those the user bought recently are removed from the list generated by the first generation module, so that the commodities in the final commodity recommendation list meet the user's real demand.
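The three submodules of the second generation module can be sketched in one function. It assumes each order is a (SKU, order date) pair, that "within the last H days" means an order date later than today minus H days, and that a mapping from SKU to commodity group is available; all names are hypothetical.

```python
from datetime import date, timedelta


def final_recommendations(candidates, probabilities, commodity_group, orders, today, H):
    """Filter recently bought commodity groups and re-sort by purchase probability.

    candidates: SKUs from the first generation module's list.
    probabilities: sku -> predicted purchase probability.
    commodity_group: sku -> commodity group (assumed lookup table).
    orders: list of (sku, order_date) for this user.
    """
    if not isinstance(H, int) or H <= 3:
        raise ValueError("H must be an integer greater than 3")
    cutoff = today - timedelta(days=H)
    # Filtered commodity groups: groups of commodities ordered within the last H days.
    blocked = {commodity_group[sku] for sku, d in orders
               if d > cutoff and sku in commodity_group}
    # Filtering: drop candidates that belong to a blocked group.
    kept = [s for s in candidates if commodity_group.get(s) not in blocked]
    # Second generation: re-sort by predicted purchase probability, highest first.
    return sorted(kept, key=lambda s: -probabilities[s])
```

For example, if the user ordered a phone five days ago and H = 7, every phone in the candidate list is removed, while a TV ordered months ago does not block the TV group.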
Those skilled in the art should understand that the method or system of the above embodiments can be implemented by computer program instructions. These instructions are loaded into a programmable data processing device, such as a computer, which executes them to realize the functions of the method or system of the above embodiments.
Those skilled in the art may make non-inventive technical improvements to the present application based on the above embodiments without departing from the spirit of the invention. Such improvements shall still be regarded as falling within the scope of the claims of the present application.