User portrait method based on massive cross-screen behavioral data
Technical field:
The present invention relates to network information processing technology in the media field, and more particularly to a user portrait method based on massive cross-screen behavioral data.
Background technology:
With the arrival of the post-digital-conversion era in broadcast television, digital television services have steadily matured. Pay channels, time-shifted and catch-up viewing, VOD, and other two-way interactive services such as value-added services (stocks, TV shopping, games, etc.) continue to enrich the service offerings of broadcast television network operators. The operators' development focus has gradually shifted from building digital platforms and upgrading networks for two-way transmission to business operation and more diversified profit models.
Although value-added services have become an important lever for broadcast television network operators to increase revenue and efficiency, operators lack data support and a multidimensional understanding of their users. As a result, the construction and operation of value-added services often deviate considerably from users' actual needs: projects are approved without clear criteria, and features go unused after launch. How to obtain portraits of in-network users, fully grasp their potential needs, and guide business development precisely according to user demand has become an urgent problem for operators.
On the other hand, the way broadcast television network operators understand their users still remains at the level of basic business marketing. Usage habits and potential needs are judged from historical development experience, which is hard to quantify and cannot provide accurate data support for the operators' business operations.
Summary of the invention:
In view of this, the present invention provides a user portrait method based on massive cross-screen behavioral data. The method mainly addresses increasingly flexible two-way new media services and massive behavioral data at the million-user scale and beyond. The collected user behavior data is stored in HDFS distributed storage. After the ETL module extracts, transforms, and loads the data, a combined algorithm optimized for the characteristics of the media industry fuses the massive user behavior data with content labels, user labels, consumption labels, geographical labels, device labels, user attributes, and so on for efficient data preprocessing, and finally forms user portraits. A WEB application then calls the relevant user portraits to provide accurate data support for the business operations of broadcast television network operators.
The specific technical solution of the present invention is as follows:
A user portrait method based on massive cross-screen behavioral data comprises the following steps:
(1) a terminal data acquisition module, an HDFS distributed storage module, an ETL module, a portrait module, and a WEB application module are set up;
(2) the terminal data acquisition module collects users' viewing behavior data on multimedia playback terminals and forwards the collected data to the HDFS distributed storage module for storage;
(3) the HDFS distributed storage module stores the user viewing data and also stores heterogeneous data from other third-party systems;
(4) the ETL module extracts, transforms, and loads the user viewing data stored in the HDFS distributed storage module, and provides basic element data for the behavior modeling module in the portrait module;
(5) the portrait module comprises the behavior modeling, portrait label, model prediction, and user portrait modules;
(6) the WEB application module is a web application embedded in the terminal, used for the visual presentation and download of user labels.
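The extract, transform, and load work of step (4) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the comma-separated record layout, the `ViewingEvent` fields, and the in-memory dictionary that stands in for data read from the HDFS distributed storage module.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ViewingEvent:
    # Hypothetical record layout; the source does not specify one.
    user_id: str
    terminal: str
    program_id: str
    start: datetime
    seconds: int


def extract(raw_lines):
    """Extract: split each non-blank raw line into its fields."""
    return [line.strip().split(",") for line in raw_lines if line.strip()]


def transform(rows):
    """Transform: build typed events and drop malformed rows."""
    events = []
    for row in rows:
        if len(row) != 5:
            continue  # wrong number of fields
        user_id, terminal, program_id, start, seconds = row
        try:
            events.append(ViewingEvent(user_id, terminal, program_id,
                                       datetime.fromisoformat(start),
                                       int(seconds)))
        except ValueError:
            continue  # unparseable timestamp or duration
    return events


def load(events, store):
    """Load: group events by user in an in-memory store."""
    for event in events:
        store.setdefault(event.user_id, []).append(event)
    return store


# Stand-in for records read from distributed storage.
raw = [
    "u1,stb,p100,2024-01-01T20:00:00,1800",
    "u1,mobile,p101,2024-01-02T08:30:00,600",
    "bad line",
    "u2,ott,p100,2024-01-01T21:00:00,abc",
]
store = load(transform(extract(raw)), {})
```

Only the two well-formed `u1` records survive the transform stage; the malformed rows are discarded before anything reaches the modeling stage.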
In the above solution, the multimedia playback terminals include DVB set-top boxes (digital TV set-top boxes), OTT (Internet set-top boxes), smart TVs, mobile phones, and tablet computers.
In the above solution, the heterogeneous data from other third-party systems are page-browsing data such as PV and UV.
In the above solution, the behavior modeling module in the portrait module performs behavior modeling on the data produced by the preceding ETL stage in order to abstract the user's portrait labels. This stage focuses on high-probability events and uses mathematical algorithm models to exclude the user's incidental behavior as far as possible. Behavior modeling algorithms include text mining, natural language processing, prediction algorithms, clustering algorithms, machine learning algorithms, etc.
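One simple way to favor high-probability events over incidental behavior is sketched below in Python. It is not the algorithm of the invention, which the text does not specify. The sketch assumes that very short sessions are incidental (for example channel surfing) and that a category matters only if it accounts for a minimum share of the remaining sessions; the thresholds `min_seconds` and `min_share` are hypothetical.

```python
from collections import Counter


def frequent_categories(sessions, min_seconds=60, min_share=0.2):
    """Return {category: share} for categories the user watches regularly.

    sessions: iterable of (category, seconds) pairs.
    Sessions shorter than min_seconds are treated as incidental and dropped;
    categories below min_share of the remaining sessions are dropped too.
    """
    kept = [category for category, seconds in sessions if seconds >= min_seconds]
    if not kept:
        return {}
    counts = Counter(kept)
    total = len(kept)
    return {category: n / total
            for category, n in counts.items()
            if n / total >= min_share}


sessions = [
    ("film", 3600), ("film", 2400), ("news", 1200), ("film", 5400),
    ("sports", 15), ("variety", 900), ("film", 1800), ("news", 900),
    ("film", 2000), ("film", 2500),
]
preferences = frequent_categories(sessions)
```

In this example the 15-second sports session is discarded as incidental, and the single variety session falls below the share threshold, leaving film and news as the candidate portrait labels.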
In the above solution, the portrait label module in the portrait module forms labels once the models have been verified and adjusted for reliability. The labels it defines include content labels, user attributes, behavior labels, user labels, consumption labels, geographical labels, and device labels. The content labels are obtained from EPG (electronic program guide) program listing data collected by the terminal acquisition module; content labels define dimensions such as first-level label, label dimension, and detailed label, and provide the algorithm processing module with label data based on program information. The user attributes define the subject of the label object; basic user attribute data include the user number, the MAC address of the digital TV set-top box, the region to which the user belongs, and similar information. The behavior labels are derived from the terminal viewing behavior data obtained by the terminal acquisition module; analyzing the user viewing data yields the viewing duration, viewing count, viewing frequency, and similar data, which provide the calculation basis for the algorithm processing module. The user labels define the user's viewing preferences; all basic metadata of the user labels come from automatic machine collection and processing under a standardized collection specification, with no manual intervention at any point, forming a standardized user label taxonomy. The user labels include: sports, film, variety entertainment, lifestyle services, children's animation, science and education, TV columns, news programs, documentaries, finance and economics, TV series, other, etc. The consumption labels define the user's consumption preferences; they include shopping category, number of visits, dwell time per page, access duration, transaction frequency, ratings, favorites, etc. The geographical labels define the historical address information of user behavior; they include latitude and longitude, structured address information, business district information, etc. The device labels define the user's device information; they include device type, brand, model, device characteristics, etc.
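The derivation of behavior labels (viewing duration, viewing count, viewing frequency) from raw viewing records can be sketched as a simple aggregation. This is a minimal Python illustration under stated assumptions: the `(user_id, day, seconds)` event shape, the output field names, and the definition of frequency as sessions per day over a fixed period are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def behavior_labels(events, period_days):
    """Aggregate per-user viewing statistics over a period.

    events: iterable of (user_id, day, seconds) tuples.
    """
    totals = defaultdict(lambda: {"seconds": 0, "sessions": 0, "days": set()})
    for user_id, day, seconds in events:
        entry = totals[user_id]
        entry["seconds"] += seconds
        entry["sessions"] += 1
        entry["days"].add(day)
    return {
        user_id: {
            "viewing_seconds": entry["seconds"],      # viewing duration
            "viewing_count": entry["sessions"],       # number of sessions
            "sessions_per_day": entry["sessions"] / period_days,  # frequency
            "active_days": len(entry["days"]),
        }
        for user_id, entry in totals.items()
    }


events = [
    ("u1", date(2024, 1, 1), 1800),
    ("u1", date(2024, 1, 1), 600),
    ("u1", date(2024, 1, 3), 1200),
    ("u2", date(2024, 1, 2), 300),
]
labels = behavior_labels(events, period_days=7)
```

Each user ends up with one compact summary record, which is the kind of calculation basis the text says is handed to the algorithm processing module.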
In the above solution, the model prediction module in the portrait module analyzes the business and combines the portrait labels with marketing models, business models, and the like to form a user value model, a content popularity model, a user loyalty model, a height and body-shape model, a user churn model, etc. The user value model uses the RFM model to calculate a value model based on users' viewing behavior, with variables R = most recent viewing time, F = viewing frequency, and M = total viewing duration within the period. The content popularity model uses a popularity ranking algorithm to predict the popularity of video content; its key indicators are page views, upvotes, downvotes, and time. Combined with the weight ratio of each element of the user interest labels, each content item is scored by weighted calculation, and the scores form a popularity chart. The user loyalty model judges the loyalty of users by means of business rules, portrait labels, and clustering algorithms. The height and body-shape model makes its judgment from the user's purchases of goods such as clothes, shoes, and hats together with the consumption labels. The user churn model makes its judgment from user behavior labels, business rules, the time dimension, consumption frequency, etc.
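The RFM variables named above can be computed from viewing records as in the following Python sketch. Only the meaning of R, F, and M comes from the text; expressing R as days since the most recent viewing, the 1-to-3 banding, and every cutoff value are assumptions chosen for illustration.

```python
from datetime import date


def rfm_values(events, today):
    """Return {user: (R, F, M)} from (user_id, day, seconds) events.

    R = days since the most recent viewing, F = number of viewing sessions,
    M = total viewing seconds in the period covered by the events.
    """
    latest = {}
    for user_id, day, seconds in events:
        last, freq, total = latest.get(user_id, (day, 0, 0))
        latest[user_id] = (max(last, day), freq + 1, total + seconds)
    return {user: ((today - last).days, freq, total)
            for user, (last, freq, total) in latest.items()}


def band(value, cutoffs, lower_is_better=False):
    """Map a value to a band from 1 to len(cutoffs) + 1; higher is better."""
    rank = 1 + sum(value >= cutoff for cutoff in cutoffs)
    return len(cutoffs) + 2 - rank if lower_is_better else rank


def rfm_scores(values, r_cut=(7, 30), f_cut=(3, 10), m_cut=(3600, 36000)):
    """Band each of R, F, M; the cutoffs are hypothetical."""
    return {user: (band(r, r_cut, lower_is_better=True),
                   band(f, f_cut),
                   band(m, m_cut))
            for user, (r, f, m) in values.items()}


events = [
    ("u1", date(2024, 1, 1), 1800),
    ("u1", date(2024, 1, 5), 1200),
    ("u1", date(2024, 1, 6), 600),
    ("u2", date(2023, 12, 1), 7200),
]
values = rfm_values(events, today=date(2024, 1, 8))
scores = rfm_scores(values)
```

Recency is banded in reverse because a smaller number of days since the last viewing indicates a more valuable user, whereas larger F and M values are better.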
In the above solution, the user portrait module in the portrait module is the label system in its essentially finished form. It includes multi-level labels and multi-level classifications such as user value, activity, loyalty, influence, psychological characteristics, social relations, demographic attributes, consumption capacity, current needs, and potential needs.
With the above method, broadcast television network operators can make full use of the massive user behavior data obtained through existing two-way network channels, fuse it with other third-party consumption data, geographic data, etc., and quickly and effectively obtain multidimensional user portraits and accurate user needs, providing a basis for the operators' operational decisions. In terms of resource utilization, the method can also save substantial hardware resources and personnel costs compared with existing sample-survey techniques.
Brief description of the drawings:
The present invention is further illustrated below in conjunction with the drawings and specific embodiments.
Fig. 1 is a block diagram of the steps of the user portrait method based on massive cross-screen behavioral data according to the present invention.
Embodiment:
To make the technical means, inventive features, objectives, and effects of the present invention easy to understand, the invention is further described below with reference to the drawing.
As shown in Fig. 1, in the user portrait method based on massive cross-screen behavioral data according to the present invention, a terminal data acquisition module, an HDFS distributed storage module, an ETL module, a portrait module, and a WEB application module are first set up. Next, the terminal data acquisition module collects users' viewing behavior data on multimedia playback terminals (including DVB set-top boxes (digital TV set-top boxes), OTT (Internet set-top boxes), smart TVs, mobile phones, tablet computers, etc.) and forwards the collected data to the HDFS distributed storage module for storage. The HDFS distributed storage module stores the user viewing data and also stores heterogeneous data from other third-party systems (page-browsing data such as PV and UV). The ETL module extracts, transforms, and loads the user viewing data stored in the HDFS distributed storage module, and provides basic element data for the behavior modeling module in the portrait module. The portrait module comprises the behavior modeling, portrait label, model prediction, and user portrait modules. The WEB application module is a web application embedded in the terminal, used for the visual presentation and download of user labels.
It should be noted that the behavior modeling module in the portrait module performs behavior modeling on the data produced by the preceding ETL stage in order to abstract the user's portrait labels. This stage focuses on high-probability events and uses mathematical algorithm models to exclude the user's incidental behavior as far as possible. Behavior modeling algorithms include text mining, natural language processing, prediction algorithms, clustering algorithms, machine learning algorithms, etc.
The portrait label module in the portrait module forms labels once the models have been verified and adjusted for reliability. The labels it defines include content labels, user attributes, behavior labels, user labels, consumption labels, geographical labels, and device labels. Content labels are obtained from EPG (electronic program guide) program listing data collected by the terminal acquisition module; they define dimensions such as first-level label, label dimension, and detailed label, and provide the algorithm processing module with label data based on program information. User attributes define the subject of the label object; basic user attribute data include the user number, the MAC address of the digital TV set-top box, the region to which the user belongs, and similar information. Behavior labels are derived from the terminal viewing behavior data obtained by the terminal acquisition module; analyzing the user viewing data yields the viewing duration, viewing count, viewing frequency, and similar data, which provide the calculation basis for the algorithm processing module. User labels define the user's viewing preferences; all basic metadata of the user labels come from automatic machine collection and processing under a standardized collection specification, with no manual intervention at any point, forming a standardized user label taxonomy. The user labels include: sports, film, variety entertainment, lifestyle services, children's animation, science and education, TV columns, news programs, documentaries, finance and economics, TV series, other, etc. Consumption labels define the user's consumption preferences; they include shopping category, number of visits, dwell time per page, access duration, transaction frequency, ratings, favorites, etc. Geographical labels define the historical address information of user behavior; they include latitude and longitude, structured address information, business district information, etc. Device labels define the user's device information; they include device type, brand, model, device characteristics, etc.
The model prediction module in the portrait module analyzes the business and combines the portrait labels with marketing models, business models, and the like to form a user value model, a content popularity model, a user loyalty model, a height and body-shape model, a user churn model, etc. The user value model uses the RFM model to calculate a value model based on users' viewing behavior, with variables R = most recent viewing time, F = viewing frequency, and M = total viewing duration within the period. The content popularity model uses a popularity ranking algorithm to predict the popularity of video content; its key indicators are page views, upvotes, downvotes, and time. Combined with the weight ratio of each element of the user interest labels, each content item is scored by weighted calculation, and the scores form a popularity chart. The user loyalty model judges the loyalty of users by means of business rules, portrait labels, and clustering algorithms. The height and body-shape model makes its judgment from the user's purchases of goods such as clothes, shoes, and hats together with the consumption labels. The user churn model makes its judgment from user behavior labels, business rules, the time dimension, consumption frequency, etc.
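The weighted scoring behind the content popularity chart can be illustrated as follows. The text names only the indicators (page views, upvotes, downvotes, time) and the use of interest-label weights; the specific formula below, a weighted engagement sum divided by a power of the content's age, is an assumed time-decay form, and all weights and the `gravity` exponent are hypothetical.

```python
def popularity_score(views, ups, downs, age_hours, interest_weight=1.0,
                     w_views=1.0, w_up=5.0, w_down=5.0, gravity=1.5):
    """Weighted engagement, scaled by interest weight and decayed by age."""
    engagement = w_views * views + w_up * ups - w_down * downs
    return max(engagement, 0.0) * interest_weight / (age_hours + 2.0) ** gravity


def popularity_chart(items, interest):
    """Return [(item_id, score)] sorted from most to least popular.

    interest maps a content category to the weight of the matching
    user interest label; unknown categories default to 1.0.
    """
    scored = [
        (item["id"],
         popularity_score(item["views"], item["ups"], item["downs"],
                          item["age_hours"],
                          interest.get(item["category"], 1.0)))
        for item in items
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


items = [
    {"id": "a", "category": "film", "views": 1000, "ups": 50, "downs": 5, "age_hours": 2},
    {"id": "b", "category": "news", "views": 3000, "ups": 20, "downs": 10, "age_hours": 48},
    {"id": "c", "category": "sports", "views": 500, "ups": 10, "downs": 0, "age_hours": 1},
]
chart = popularity_chart(items, interest={"film": 1.5, "news": 1.0, "sports": 0.5})
```

With these inputs the recent film item ranks first, and the older news item ranks last despite having the most page views, because the age term outweighs its raw engagement.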
The user portrait module in the portrait module is the label system in its essentially finished form. It includes multi-level labels and multi-level classifications such as user value, activity, loyalty, influence, psychological characteristics, social relations, demographic attributes, consumption capacity, current needs, and potential needs.
Because the above method uses an optimized combination of algorithm packages and data models to preprocess the massive data, processing each user label only requires extracting the relevant data from the preprocessed data for real-time computation; there is no need to query and compute over the complete raw behavioral data. Analysis time falls from the several hours, or even more than ten hours, of waiting required by the prior art to real-time responses at the second or even millisecond level, greatly improving data processing efficiency. Moreover, the entire computation relies on machine self-learning algorithms and can be completed with ordinary PC server resources, greatly reducing the investment in human resources and hardware servers.
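The efficiency argument, answering queries from preprocessed data rather than rescanning the raw behavioral logs, can be illustrated with a precomputed label index. This Python sketch is an assumption about how such a lookup might be organized; the source does not describe its storage layout, and the label names are hypothetical.

```python
from collections import defaultdict


def build_label_index(user_labels):
    """Precompute label -> set of users once, after preprocessing."""
    index = defaultdict(set)
    for user_id, labels in user_labels.items():
        for label in labels:
            index[label].add(user_id)
    return dict(index)


def users_with_all(index, labels):
    """Answer an audience query by intersecting precomputed sets."""
    sets = [index.get(label, set()) for label in labels]
    return set.intersection(*sets) if sets else set()


# Hypothetical output of the preprocessing stage.
user_labels = {
    "u1": {"film", "high_value"},
    "u2": {"film"},
    "u3": {"news", "high_value"},
}
index = build_label_index(user_labels)
audience = users_with_all(index, ["film", "high_value"])
```

The index is built once from the preprocessed labels, so each later query touches only the sets for the requested labels instead of the full behavior history.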
The basic principles, main features, and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited to the above embodiments; the embodiments and the description merely illustrate the principles of the invention. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the scope of the claimed invention. The scope of protection claimed is defined by the appended claims and their equivalents.