[detailed description of the invention]
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flowchart of Embodiment 1 of the user identity processing method of the present invention. As shown in Fig. 1, the user identity processing method of this embodiment may specifically include the following steps:
100. Obtain the behavior data of the user to be processed;
The behavior data of the user to be processed in this embodiment can be obtained by collecting that user's historical behavior. Specifically, the behavior data is a set of related data and can include many fields. In practice, the collection can be targeted according to the set identity. For example, if the set identity is car owner, the collection can focus on the car-owner-related behavior data of the user to be processed. Such behavior data can include, for a given period: the number of map searches and the number of days of map use; the number of point of interest (POI) searches and the corresponding number of days; the number of searches, the share of searches, and the number of days for public transport routes, driving routes, walking routes, and subway routes; the cumulative duration of map use; the number of days of map use; and, for each car-related search term, the number of searches, its share of the total number of car-related searches, and the number of days. Each POI can contain four kinds of information: name, category, longitude and latitude, and nearby hotels, restaurants, and shops. POI information can also be called "navigation map information". The car-related search terms include: gas station, automobile service, automobile sales, 4S shop, car detailing, car maintenance, car decoration, car wash, auto parts, vehicle inspection station, and so on.
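A minimal sketch of how one user's behavior record might be held in memory is given below. The class name, field names, and the selection of fields are hypothetical and only illustrate the kind of multi-field data described above; the description does not prescribe any storage format.

```python
from dataclasses import dataclass, field
from typing import Dict

# Car-related search terms listed in the description (English renderings).
CAR_TERMS = {
    "gas station", "automobile service", "automobile sales", "4S shop",
    "car detailing", "car maintenance", "car decoration", "car wash",
    "auto parts", "vehicle inspection station",
}


@dataclass
class BehaviorRecord:
    """One user's behavior over a collection period (hypothetical fields)."""
    user_id: str
    map_search_count: int = 0
    map_active_days: int = 0
    poi_search_count: int = 0
    driving_route_searches: int = 0
    transit_route_searches: int = 0
    # Search term -> number of searches for that term in the period.
    term_search_counts: Dict[str, int] = field(default_factory=dict)

    def car_term_searches(self) -> int:
        """Total number of searches for car-related terms."""
        return sum(c for t, c in self.term_search_counts.items() if t in CAR_TERMS)

    def car_term_shares(self) -> Dict[str, float]:
        """Share of each car-related term in the car-related search total."""
        total = self.car_term_searches()
        if total == 0:
            return {}
        return {
            t: c / total
            for t, c in self.term_search_counts.items()
            if t in CAR_TERMS
        }
```

A record built with `term_search_counts={"gas station": 3, "car wash": 1, "restaurant": 6}` counts four car-related searches, and `car_term_shares()` splits them between the two car-related terms while ignoring the unrelated one.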
101. Convert the behavior data of the user to be processed into an identity feature vector using an identity converter;
Because the subsequent identity processor cannot process the behavior data of the user to be processed directly, this embodiment first converts that behavior data. The resulting identity feature vector uniquely identifies the behavior data of the user to be processed; for example, the identity feature vector of this embodiment can consist of the digits 0 and 1.
102. Process the identity feature vector of the user to be processed using an identity processor, to determine whether the identity of the user to be processed is the set identity.
The identity processor of this embodiment processes the identity feature vector of the user to be processed in order to identify that user's identity and determine whether it is the set identity. The set identity of this embodiment can be car owner, or another identity such as a drafting practitioner, a practitioner of a particular art specialty, or a practitioner in machinery or engineering.
By adopting the above technical solution, the user identity processing method of this embodiment determines the identity of the user to be processed from that user's behavior data. Compared with the prior art, it can effectively ensure the accuracy of user identity determination and improve its efficiency, so that information related to the determined user identity can subsequently be pushed, which in turn improves the efficiency of information pushing.
Further optionally, on the basis of the technical solution shown in Fig. 1, the method may also include the following step a before step 101: generate the identity converter according to a first behavior data set of set-identity users and a second behavior data set of non-set-identity users.
Specifically, the first behavior data set of set-identity users and the second behavior data set of non-set-identity users can be collected in advance, for example from each user's access records on the Internet. The first behavior data set includes the behavior data of multiple set-identity users, and the behavior data of each set-identity user can include data in multiple fields. The multi-field data of each set-identity user is stored as one associated record, so the behavior data in the first behavior data set is stored user by user. Likewise, the second behavior data set includes the behavior data of multiple non-set-identity users and is stored in the same way, user by user. From the behavior data of the set-identity users in the first behavior data set and the behavior data of the non-set-identity users in the second behavior data set, it is possible to find the characteristics that are common to the behavior data of set-identity users, as well as the individual traits that do not belong to users of the specific identity. The identity converter can therefore be generated according to the first behavior data set and the second behavior data set. The identity converter can not only identify a user's identity from behavior data but also generate an identity feature vector from the identification result.
In this embodiment, the number of set-identity users in the first behavior data set and the number of non-set-identity users in the second behavior data set can be the same or different, and each can be any number, such as 20, 500, or 800. The larger the number of set-identity users in the first behavior data set and the number of non-set-identity users in the second behavior data set, the higher the accuracy of the identity conversion and identity processing performed by the resulting identity converter and identity processor.
For example, the set identity of this embodiment can be car owner, in which case the non-set identity is non-car-owner. An identity converter can be trained from all the behavior data in the first behavior data set of car owners and the second behavior data set of non-car-owners. The identity converter converts a user's behavior data into a corresponding identity feature vector, which simplifies the representation of the behavior data and facilitates the subsequent identification of user identity from the identity feature vector.
For example, step a may specifically be: train a preset identity conversion model using the behavior data of each set-identity user in the first behavior data set and the behavior data of each non-set-identity user in the second behavior data set, to obtain the identity converter.
The identity conversion model of this embodiment can specifically be implemented with a random forest or a gradient boosting decision tree (GBDT).
For example, the behavior data of the set-identity users in the first behavior data set and the behavior data of the non-set-identity users in the second behavior data set can together be called the training data. A random forest or GBDT model is then trained with this training data, yielding N decision trees, where the number of trees N is controlled by the tree-count parameter configured for training and can be set according to actual needs before training. The leaf nodes of each of the N decision trees are numbered; for example, the m-th leaf node of the n-th decision tree is (n, m). For each training record, one leaf node of every tree outputs the probability that the user is a set-identity user. If the probability is greater than a specified threshold, the result is marked as 1; otherwise it is marked as 0. Because the identity of the user of each training record is known, when the output is inconsistent with the known identity, the random forest or GBDT model is adjusted so as to adjust the N decision trees finally obtained and improve their accuracy in identifying user identity.
After the random forest or GBDT model has been trained with all the behavior data of known identity in the training data, the resulting N decision trees are the final identity converter. The identity converter can identify a user's identity from behavior data and can also generate a unique identity feature vector from the identification result. For example, for the behavior data of each user, the leaf nodes of the N trees output a vector of N*M elements, where M is the number of leaf nodes of the decision tree with the most leaf nodes. If the m-th leaf node of the n-th tree marks the record as 1, the ((n-1)*M+m)-th element is 1; otherwise it is 0. Elements corresponding to leaf nodes that output no result are 0. If a decision tree has fewer than M leaf nodes, the elements beyond its leaf count are filled with 0 to stand in for the missing leaf nodes.
Because the identity of the user corresponding to each behavior record in the training data is known, training on this data gives the N decision trees the ability to identify user identity, and for the behavior data of each user the N decision trees output a corresponding identity feature vector. After training, the parameters of the N decision trees are fixed and the trees can accurately identify the identity of the user to be processed; the N decision trees then form the identity converter.
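The encoding rule described above can be sketched as follows. The function names and the dictionary input format are assumptions made for illustration; only the position rule ((n-1)*M+m), the 0/1 marking by threshold, and the zero filling come from the description. The default leaf threshold of 0.5 is likewise a placeholder, since the description only speaks of a specified threshold.

```python
from typing import Dict, List, Tuple


def label_from_probability(probability: float, threshold: float = 0.5) -> int:
    """Mark a leaf's output as 1 when its probability exceeds the threshold."""
    return 1 if probability > threshold else 0


def encode_leaf_outputs(
    leaf_labels: Dict[Tuple[int, int], int],
    n_trees: int,
    max_leaves: int,
) -> List[int]:
    """Build the N*M identity feature vector of 0s and 1s.

    leaf_labels maps (n, m), the 1-based tree number and leaf number of each
    leaf that produced a result, to that leaf's 0/1 label. All other
    positions, including those of missing leaves, stay 0.
    """
    vector = [0] * (n_trees * max_leaves)
    for (n, m), label in leaf_labels.items():
        if not (1 <= n <= n_trees and 1 <= m <= max_leaves):
            raise ValueError(f"leaf ({n}, {m}) is out of range")
        # 1-based position (n-1)*M + m becomes 0-based index (n-1)*M + (m-1).
        vector[(n - 1) * max_leaves + (m - 1)] = 1 if label else 0
    return vector


# Two trees with at most three leaves each: leaf (1, 2) outputs probability
# 0.8 and leaf (2, 3) outputs probability 0.3 for some user.
example = encode_leaf_outputs(
    {(1, 2): label_from_probability(0.8), (2, 3): label_from_probability(0.3)},
    n_trees=2,
    max_leaves=3,
)
```

Note that under this rule a leaf marked 0 and a leaf that was never reached both leave a 0 in the vector, so only the leaves marked 1 are visible in the result.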
Further optionally, on the basis of the technical solution of the embodiment shown in Fig. 1, the method may also include step b before step 102: generate the identity processor according to the first behavior data set, the second behavior data set, and the identity converter.
For example, step b may be implemented in the following two steps:
(b1) Input the behavior data of each set-identity user in the first behavior data set and the behavior data of each non-set-identity user in the second behavior data set into the identity converter, to obtain the identity feature vector of each set-identity user in the first behavior data set and the identity feature vector of each non-set-identity user in the second behavior data set;
(b2) Train a preset identity processing model using the identity feature vector of each set-identity user in the first behavior data set and the identity feature vector of each non-set-identity user in the second behavior data set, to obtain the identity processor.
Because the identity converter converts a user's behavior data into the corresponding identity feature vector, once the identity converter has been generated in this embodiment, the behavior data of each set-identity user in the first behavior data set and the behavior data of each non-set-identity user in the second behavior data set can each be converted into a feature vector by the identity converter. This yields a first feature vector set corresponding to the first behavior data set and a second feature vector set corresponding to the second behavior data set. The feature vectors in both sets are then used as training data to train the preset identity processing model and obtain the identity processor. For example, the preset identity processing model can be a logistic regression model. Because the feature vectors in the first feature vector set are those of set-identity users, when a feature vector from the first feature vector set is input into the preset identity processing model, the model calculates the probability that the corresponding user is a set-identity user. If the probability is greater than or equal to the preset probability threshold, the user is determined to be a set-identity user; if the probability is less than the preset probability threshold, the preset identity processing model needs to be adjusted so that the calculated probability value exceeds the preset probability threshold. By training with all the feature vectors in the first feature vector set and the second feature vector set, the preset identity processing model is adjusted continually until the identity processor is obtained.
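As an illustration of step (b2), the sketch below fits a logistic regression model to 0/1 identity feature vectors. It uses plain batch gradient descent on the log loss, which is one standard way to fit such a model and is an assumption here, not the adjustment procedure of this embodiment; the function names, learning rate, and epoch count are likewise illustrative.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(
    weights: Sequence[float], bias: float, vector: Sequence[int]
) -> float:
    """Probability that the user behind this feature vector has the set identity."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, vector)))


def train_logistic_regression(
    vectors: Sequence[Sequence[int]],
    labels: Sequence[int],
    learning_rate: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log loss.

    labels holds 1 for set-identity users and 0 for non-set-identity users.
    """
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    count = len(vectors)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for vector, label in zip(vectors, labels):
            # Gradient of the log loss with respect to the linear score.
            error = predict_probability(weights, bias, vector) - label
            grad_b += error
            for j, x in enumerate(vector):
                grad_w[j] += error * x
        weights = [w - learning_rate * g / count for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / count
    return weights, bias
```

The returned weights and bias play the role of the identity processor: `predict_probability(weights, bias, vector)` gives the probability value that is later compared with the preset probability threshold.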
For example, when a GBDT model is trained with the number of decision trees set to 3, each trained decision tree can have at most 4 leaf nodes. For the behavior data of a user a, if the discrimination results given by leaf nodes (1,1), (2,2), and (3,4) are 1, 0, and 1 respectively, the 0-1 identity feature vector corresponding to this user's behavior data is (1,0,0,0,0,0,0,0,0,0,0,1), in which the 1st, 6th, and 12th elements are the result values of the leaf nodes that gave a discrimination result.
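The index arithmetic behind this example can be written out explicitly. The symbol p(n, m) for the position of leaf node (n, m) in the vector is introduced here only for illustration; the position rule itself is the ((n-1)*M+m) rule given earlier, with N = 3 and M = 4.

```latex
\begin{aligned}
p(n, m) &= (n - 1)M + m, \qquad N = 3,\ M = 4,\ NM = 12 \\
p(1, 1) &= 0 \cdot 4 + 1 = 1, \qquad
p(2, 2) = 1 \cdot 4 + 2 = 6, \qquad
p(3, 4) = 2 \cdot 4 + 4 = 12 \\
v &= (\underbrace{1}_{p = 1}, 0, 0, 0, 0, \underbrace{0}_{p = 6}, 0, 0, 0, 0, 0, \underbrace{1}_{p = 12})
\end{aligned}
```

Leaf nodes (1,1) and (3,4) gave the result 1, so elements 1 and 12 are 1; leaf node (2,2) gave the result 0, so element 6 is 0, the same value as the nine positions whose leaf nodes gave no result.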
Further optionally, on the basis of the technical solution of the embodiment shown in Fig. 1, step 102 may specifically include the following steps:
(c1) Calculate, using the identity processor, the probability value that the user to be processed, to whom the identity feature vector corresponds, is a set-identity user;
(c2) Judge whether the probability value is greater than or equal to the preset probability threshold; if so, determine that the user to be processed is a set-identity user; otherwise, determine that the user to be processed is not a set-identity user.
The preset probability threshold of this embodiment can be chosen empirically.
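Step (c2) reduces to a single comparison, sketched below. The function name and the default threshold of 0.5 are placeholders for illustration; as noted above, the threshold itself is chosen empirically.

```python
def is_set_identity(probability: float, threshold: float = 0.5) -> bool:
    """Return True when the probability value reaches the preset threshold."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Step (c2) uses "greater than or equal to", so equality counts as a match.
    return probability >= threshold
```

The comparison is inclusive, so a probability value exactly equal to the threshold is treated as the set identity.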
Compared with the prior art, the technical solution of the above embodiment can effectively ensure the accuracy of user identity determination and improve its efficiency, so that information related to the determined user identity can subsequently be pushed, which in turn improves the efficiency of information pushing.
Fig. 2 is a flowchart of Embodiment 2 of the user identity processing method of the present invention. Building on the technical solution of the above embodiment, this embodiment describes the technical solution of the present invention using car owner as the set identity and non-car-owner as the non-set identity. As shown in Fig. 2, the user identity processing method of this embodiment may specifically include the following steps:
200. Collect a first behavior data set of car owners and a second behavior data set of non-car-owners;
For example, the first behavior data set and the second behavior data set can be collected from user data on the Internet. The first behavior data set includes the behavior data of multiple car owners, and the second behavior data set includes the behavior data of multiple non-car-owners.
The behavior data of this embodiment includes data in multiple car-related fields; for details, refer to the description of the above embodiment, which is not repeated here.
201. Train a random forest or GBDT model using the behavior data in the first behavior data set of car owners and the second behavior data set of non-car-owners, and use the resulting N decision trees as the identity converter;
For the specific training principle, refer to the description of the above embodiment, which is not repeated here. Among the N decision trees finally obtained, the decision tree with the most leaf nodes has M leaf nodes.
202. Convert the behavior data in the first behavior data set of car owners and the second behavior data set of non-car-owners with the identity converter, to obtain a first identity feature vector set corresponding to the first behavior data set of car owners and a second identity feature vector set corresponding to the second behavior data set of non-car-owners;
Because the identity converter converts each behavior record into exactly one corresponding identity feature vector, the behavior data of each car owner in the first behavior data set corresponds one-to-one to an identity feature vector in the first identity feature vector set of car owners, and the behavior data of each non-car-owner in the second behavior data set corresponds one-to-one to an identity feature vector in the second identity feature vector set of non-car-owners.
203. Train a logistic regression model using the feature vectors in the first identity feature vector set corresponding to the first behavior data set of car owners and the second identity feature vector set corresponding to the second behavior data set of non-car-owners, to obtain the identity processor;
For the specific training principle, refer to the description of the above embodiment, which is not repeated here.
204. Obtain the behavior data of the user to be processed;
205. Input the behavior data of the user to be processed, after conversion by the identity converter, into the identity processor, and calculate the probability value that the user to be processed is a car owner;
206. Judge whether the probability value is greater than or equal to the preset probability threshold; if so, determine that the user to be processed is a car owner; otherwise, determine that the user to be processed is a non-car-owner.
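The order of steps 200 to 206 can be sketched as one function that takes the two training routines as arguments. Everything named here is hypothetical: `run_embodiment_two` only fixes the sequence of steps, and the two `toy_` routines are deliberately simple stand-ins for the random forest or GBDT training and the logistic regression training, included so that the sketch runs without external libraries.

```python
import math
from typing import Callable, List, Sequence

Behavior = Sequence[float]
Vector = List[int]
Converter = Callable[[Behavior], Vector]
Processor = Callable[[Vector], float]


def run_embodiment_two(
    owner_data: Sequence[Behavior],
    non_owner_data: Sequence[Behavior],
    pending: Behavior,
    train_converter: Callable[[List[Behavior], List[int]], Converter],
    train_processor: Callable[[List[Vector], List[int]], Processor],
    threshold: float = 0.5,
) -> bool:
    # Step 200: the two behavior data sets arrive already collected.
    rows = list(owner_data) + list(non_owner_data)
    labels = [1] * len(owner_data) + [0] * len(non_owner_data)
    # Step 201: train the identity converter on the behavior data.
    converter = train_converter(rows, labels)
    # Step 202: convert every training record into an identity feature vector.
    vectors = [converter(row) for row in rows]
    # Step 203: train the identity processor on the feature vectors.
    processor = train_processor(vectors, labels)
    # Steps 204 and 205: convert the pending user's data, then score it.
    probability = processor(converter(pending))
    # Step 206: compare with the preset probability threshold.
    return probability >= threshold


def toy_train_converter(rows: List[Behavior], labels: List[int]) -> Converter:
    """Toy stand-in: one element per field, 1 when above the training mean."""
    means = [sum(column) / len(column) for column in zip(*rows)]
    return lambda row: [int(value > mean) for value, mean in zip(row, means)]


def toy_train_processor(vectors: List[Vector], labels: List[int]) -> Processor:
    """Toy stand-in: logistic curve centred between the two class averages."""
    owner_sums = [sum(v) for v, y in zip(vectors, labels) if y == 1]
    other_sums = [sum(v) for v, y in zip(vectors, labels) if y == 0]
    midpoint = (sum(owner_sums) / len(owner_sums) + sum(other_sums) / len(other_sums)) / 2
    return lambda vector: 1.0 / (1.0 + math.exp(-(sum(vector) - midpoint)))
```

Passing the training routines as parameters keeps the step order separate from the choice of model, so a real tree ensemble and a real logistic regression can replace the stand-ins without changing `run_embodiment_two`.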
The above method of this embodiment can also be used to identify users of other preset identities.
With the development of the Internet, car-related Internet services such as car travel, car washing, and car maintenance are gradually being accepted. In these car-related Internet applications, car owners are the core of all services, so identifying unknown car owners becomes extremely important, especially during service promotion and ownership. With the above method of this embodiment, whether the user to be processed is a car owner can be determined accurately and efficiently, so that vehicle-related information can subsequently be pushed according to the user's car-owner identity, which in turn improves the efficiency of information pushing.
Fig. 3 is a structural diagram of Embodiment 1 of the user identity processing apparatus of the present invention. As shown in Fig. 3, the user identity processing apparatus of this embodiment may specifically include an acquisition module 10, a conversion module 11, and a processing module 12.
The acquisition module 10 is configured to obtain the behavior data of the user to be processed. The conversion module 11 is configured to convert, using the identity converter, the behavior data obtained by the acquisition module 10 into an identity feature vector, which uniquely identifies the behavior data of the user to be processed. The processing module 12 is configured to process, using the identity processor, the identity feature vector produced by the conversion module 11, to determine whether the identity of the user to be processed is the set identity.
The implementation principle and technical effect of processing user identity with the above modules in the user identity processing apparatus of this embodiment are the same as those of the related method embodiment above; for details, refer to the description of the above embodiment, which is not repeated here.
Fig. 4 is a structural diagram of Embodiment 2 of the user identity processing apparatus of the present invention. As shown in Fig. 4, the user identity processing apparatus of this embodiment further includes the following technical solution on the basis of the technical solution of the embodiment shown in Fig. 3.
As shown in Fig. 4, the user identity processing apparatus of this embodiment further includes an identity converter generation module 13. The identity converter generation module 13 is configured to generate the identity converter according to the first behavior data set of set-identity users and the second behavior data set of non-set-identity users.
Further optionally, the identity converter generation module 13 in the user identity processing apparatus of this embodiment is specifically configured to train a preset identity conversion model using the behavior data of each set-identity user in the first behavior data set and the behavior data of each non-set-identity user in the second behavior data set, to obtain the identity converter. The conversion module 11 is specifically configured to convert, using the identity converter generated by the identity converter generation module 13, the behavior data of the user to be processed obtained by the acquisition module 10 into an identity feature vector.
Further optionally, as shown in Fig. 4, the user identity processing apparatus of this embodiment further includes an identity processor generation module 14. The identity processor generation module 14 is configured to generate the identity processor according to the first behavior data set, the second behavior data set, and the identity converter generated by the identity converter generation module 13.
Further optionally, the identity processor generation module 14 is specifically configured to input the behavior data of each set-identity user in the first behavior data set and the behavior data of each non-set-identity user in the second behavior data set into the identity converter generated by the identity converter generation module 13, to obtain the identity feature vector of each set-identity user in the first behavior data set and the identity feature vector of each non-set-identity user in the second behavior data set; and to train a preset identity processing model using those identity feature vectors, to obtain the identity processor.
For example, the preset identity processing model of this embodiment includes a logistic regression model.
Further optionally, as shown in Fig. 4, in the user identity processing apparatus of this embodiment, the processing module 12 specifically includes a calculation unit 121 and a judging unit 122.
The calculation unit 121 is configured to calculate, using the identity processor generated by the identity processor generation module 14, the probability value that the user to be processed, to whom the identity feature vector produced by the conversion module 11 corresponds, is a set-identity user. The judging unit 122 is configured to judge whether the probability value calculated by the calculation unit 121 is greater than or equal to the preset probability threshold and, if so, to determine that the user to be processed is a set-identity user.
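One way to picture how modules 10, 11, and 12 and units 121 and 122 fit together is the composition sketched below. The class names, the in-memory store, and the wrapping `IdentityDevice` class are hypothetical; the converter and processor are passed in as plain callables and are assumed to have been produced beforehand by generation modules 13 and 14.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class AcquisitionModule:  # module 10
    store: Dict[str, Sequence[float]]  # hypothetical in-memory behavior store

    def acquire(self, user_id: str) -> Sequence[float]:
        return self.store[user_id]


@dataclass
class ConversionModule:  # module 11
    converter: Callable[[Sequence[float]], List[int]]

    def convert(self, behavior: Sequence[float]) -> List[int]:
        return self.converter(behavior)


@dataclass
class CalculationUnit:  # unit 121
    processor: Callable[[List[int]], float]

    def probability(self, vector: List[int]) -> float:
        return self.processor(vector)


@dataclass
class JudgingUnit:  # unit 122
    threshold: float

    def judge(self, probability: float) -> bool:
        return probability >= self.threshold


@dataclass
class ProcessingModule:  # module 12, made up of units 121 and 122
    calculation: CalculationUnit
    judging: JudgingUnit

    def process(self, vector: List[int]) -> bool:
        return self.judging.judge(self.calculation.probability(vector))


@dataclass
class IdentityDevice:
    acquisition: AcquisitionModule
    conversion: ConversionModule
    processing: ProcessingModule

    def is_set_identity(self, user_id: str) -> bool:
        behavior = self.acquisition.acquire(user_id)
        vector = self.conversion.convert(behavior)
        return self.processing.process(vector)
```

Each module only depends on the output of the one before it, which mirrors the data flow from acquisition module 10 through conversion module 11 to processing module 12.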
The implementation principle and technical effect of processing user identity with the above modules in the user identity processing apparatus of this embodiment are the same as those of the related method embodiment above; for details, refer to the description of the above embodiment, which is not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed system, apparatus, and method can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in actual implementation.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention can be integrated into one processing unit, each unit can exist physically on its own, or two or more units can be integrated into one unit. The integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated unit implemented in the form of a software functional unit can be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.