It is on May 8th, 2012 that the application, which is the applying date, and application No. is 201210152084.1, entitled " one kind is obtainedTake the method and system of family individualized feature " patent divisional application.
Specific embodiment
The method of the present invention is described in further detail in conjunction with attached drawing.
To the explanation of this patent method specific embodiment, including following components.Firstly, illustrate user, document andThe parameter vector representation method of the method for numbering serial of feature and user and document;Then, illustrate to access subscriber signal based on userParameter vector more new algorithm, and based on user access document signal parameter vector more new algorithm;Later, illustrate in social activityThe method for searching user or user group according to given feature on network;Finally, illustrating that a kind of user individual feature of obtaining isSystem.
Illustrate the method for numbering serial of user, document and feature first.
Multiple users are obtained on the internet, and each user possesses at least one user identifier, and the user identifier includesOne in user account number, phone number, Cookie identification code, IP address, the address Email and instant communication number.To acquisitionMultiple users carry out Unified number, and by Customs Assigned Number pool together composition user collect U={ 1,2 ..., M }, wherein M isUser's number, each user have unique subscriber-coded.Equally, multiple documents are obtained on the internet, such as pass through spider journeySequence obtains multiple Web pages.Document on internet has unique identification, such as the address URL of Web page.To the more of acquisitionA document carries out Unified number, and document code is pooled together composition document sets D={ 1,2 ..., N }, and wherein N is textShelves number, each document have unique document coding.
The user is collected into feature possessed by each element in the U and document sets D and carries out Unified number, composition is specialIt collects K={ 1,2 ..., L }, wherein L is characterized number.The attribute of the character representation user and document, such as news, wealthThrough, science, music, military affairs and sport etc..
The representation method of the parameter vector of user and document is described below.The parameter vector representation method and vector spaceThe vector expression method of model VSM is similar, i.e., using characteristic item as user characteristics or the basic unit of file characteristics.With user withThe parameter vector of the degree of correlation of each feature gathered to indicate user, with the set of document and the degree of correlation of each feature come tableShow the parameter vector of document.If some user or document do not have some feature, user or document and this featureThe degree of correlation is zero.
Fig. 1 is the parameter vector representation method that user collects each user in U.Collect any one user i (i ∈ in U in userU parameter vector) is set as Ku (i)=(uwi1, uwi2 ..., uwik ..., uwiL), wherein the uwik indicates the useThe degree of correlation of family i and feature k (k ∈ K), uwik ∈ [a, b], a and b are nonnegative constant.In addition, the user is collected every in UThe degree of correlation of k-th of feature of a user and feature set K pools together one vector of composition, is called k-th of the use that user collects UFamily column vector (uw1k, uw2k ..., uwMk).
Fig. 2 is the parameter vector representation method of each document in document sets D.Any one document n (n ∈ in document sets DD parameter vector) is set as Kd (n)=(dwn1, dwn2 ..., dwnk ..., dwnL), wherein the dwnk indicates the textThe degree of correlation of shelves n and feature k (k ∈ K), dwnk ∈ [a, b], a and b are nonnegative constant.In addition, by every in the document sets DThe degree of correlation of k-th of feature of a document and feature set K pools together one vector of composition, is called k-th of text of document sets DShelves column vector (dw1k, dw2k ..., dwNk).
The degree of correlation is a real number value, it indicates the relationship of some feature in document or user and feature set KTightness degree.As soon as if document or user be associated with musical features it is more be associated with sports feature it is a little less, weSay that the degree of correlation of the document or user and musical features is high, it is low with the degree of correlation of sports feature.In addition, being between some featuresThe dimension of feature set K can be reduced by reducing the correlation between feature with correlation, therefore in feature selecting,The demand to server storage is reduced, efficiency of algorithm is improved.Some features need not be directly included in feature set, because theseThe degree of correlation of feature can be come out by the relatedness computation of one or several other features in feature set K.
Illustrate the setting method of the parameter vector initial value of user or document below.It is illustrated for following three example.If initial value is not set in the parameter vector of user or document, parameter vector initial value is default to be set as null vector.
The method that example 1 is artificial setting user i (i ∈ U) or the parameter vector initial value of document n (n ∈ D).Such as it setsSet feature sum L=5, feature set K=(science, finance and economics, education, music, sport), setting Ku (i)=(uwi1, uwi2,Uwi3, uwi4, uwi5)=(0,0.00032,0,0.00059,0).That is user i and the degree of correlation of " finance and economics " feature are0.00032, the degree of correlation with " music " feature is 0.00059, and the degree of correlation with other feature is zero.It, can using similar approachParameter vector Kd (n)=(dwn1, the dwn2 ..., dwnk ..., dwnL) initial value of any document n is arranged.
Example 2 is that the method for the parameter vector initial value of user i (i ∈ U) is arranged.One group of document sets is submitted by the user iConjunction H=..., r ... }The parameter vector of the document r (r ∈ H) is Kd (r)=(dwr1, dwr2 ..., dwrL),Therefore, for each k ∈ K, it is arranged uwik=(σ 1/s) ∑ (r ∈ H) [dwrk/ (∑ (k ∈ K) dwrk)], wherein s is instituteThe element number of set H is stated, σ 1 is setting constant.Using similar approach, the user i can also collect in U in the user to be selectedOne group of user is selected to calculate the parameter vector initial value of the user i.
Example 3 is a kind of method of parameter vector initial value that document is arranged.Catalogue is a kind of special document, there is corresponding textShelves number.Such as portal website generally includes the classified catalogues such as news, music, sport, finance and economics and science and technology.We assume that identical meshDocument under record is all related to sport with certain identical features, such as the document under sport catalogue.If document n (n ∈ D)A document under catalogue h (h ∈ D), then the parameter vector initial value of the document n by the parameter vector of the catalogue h LaiIt determines.For example, dwnk=σ 2dwhk is arranged for each k ∈ K, wherein σ 2 is setting constant.
Fig. 3 is the method that the acquisition user individual feature of subscriber signal is accessed based on user.Specifically comprise the following steps:
S11. obtain multiple users on the internet, storage by the user that the multiple user forms collect U=1,2 ..., M };Multiple features are set, the feature set K={ 1,2 ..., L } being made of the multiple feature is stored;
S12. collect multiple user setting parameter vector initial values in U for the user;
S13. the signal that any one user i (i ∈ U) accesses any one user j (j ∈ U) is received;
S14. parameter vector Ku (i)=(uwi 1, uwi2 ..., the uwik ..., uwiL) of the user i is read,Described in uwik indicate the degree of correlation of the user i and feature k (k ∈ K);
S15. parameter vector Ku (j)=(uwj1, uwj2 ..., uwjk ..., the uwjL) of the user j is read, whereinThe uwjk indicates the degree of correlation of the user j and feature k (k ∈ K);
S16. with following parameter vector more new algorithm, the parameter vector of the user i and the user j are updated,
Ku* (i)=function1 [Ku (i), Ku (j)],
Ku* (j)=function2 [Ku (i), Ku (j)];
Return step S13.
Wherein, the Ku (i) and the Ku* (i), which respectively indicate, updates the preceding parameter vector with the user i after update,The Ku (j) and the Ku* (j), which respectively indicate, updates the preceding parameter vector with the user j after update;The Ku* (i)=(uwi1*, uwi2* ..., uwik* ..., uwiL*), the Ku* (j)=(uwj1*, uwj2* ..., uwjk* ...,uwjL*);The function1 indicates that Ku* (i) is the function of Ku (i) He Ku (j), and the function2 indicates that Ku* (j) isThe function of Ku (i) and Ku (j);The user i and user j represents any two user in user's collection U, and is not specific to certainTwo users, for example, when n-th executes step S13, i=1023, j=29328 in the signal, and (n+1)th time executes stepWhen rapid S13, i=737443, j=837487 in the signal.
It further include updating the Ku (i) and the Ku (j) after having executed the step S16 in Fig. 3 the methodThe step of, i.e. progress assignment Ku (i)=Ku* (i) and Ku (j)=Ku* (j).
In an application example of Fig. 3 the method, the type of the signal is with one of Types Below: T=1Indicate that described user i concern (follow) described user j, T=2 indicate that the user i adds as a friend the user j, T=3Indicate that the user i forwards the information of the user j, T=4 indicates that the user i comments on the information of the user j publication, T=5 indicate that the user i collects the information of the user j, and T=6 indicates that the user i sends personal letter, T=7 to the user jIndicate that the user i labels to the user j, T=8 indicates that the user j information issued is set as liking by the user iVigorously.In an application example of Fig. 3 the method, the signal is acquired from system log.
In an application example of Fig. 3 the method, the method meets Ku* (i) >=Ku (i) and Ku* (j) >=Ku(j).Wherein inequality Ku* (i) >=Ku (i) is meant that for each k ∈ K, there is uwik* >=uwik;Inequality Ku* (j) >=Ku (j) is meant that for each k ∈ K, there is uwjk* >=uwjk.
It is the increasing of the uwjk for each k ∈ K, the uwik* in an application example of Fig. 3 the methodFunction;It is the increasing function of the uwik for each k ∈ K, the uwjk*.
It is ∑ (k ∈ K) uwjk for each k ∈ K, the uwiK* in an application example of Fig. 3 the methodSubtraction function;It is the subtraction function of ∑ (k ∈ K) uwik for each k ∈ K, the uwjK*.
In an application example of Fig. 3 the method, the signal includes the user of the user i and the user jMark, each user identifier with uniquely it is subscriber-coded corresponding.The application example is read according to the user identifier of the user iThe user i's is subscriber-coded, and according to the parameter vector of the subscriber-coded reading of the user i user i;According to describedThe user identifier of user j reads the subscriber-coded of the user j, and according to the subscriber-coded reading of the user j user jParameter vector.
In Fig. 3 the method, after executing the parameter vector more new algorithm and reaching setting number, for each featureK ∈ K needs to collect user k-th of user's column vector (uw1k, uw2k ..., uwMk) in U and is normalized(normalization).Wherein, it executes primary parameter vector more new algorithm to be meant that, by the Ku (i) and the Ku (j)It brings the function1 and the function2 into, obtains the process of the Ku* (j) He the Ku* (i).The normalizationThe specific application example of method is as follows:
Example 1: the side that k-th of user's column vector (uw1k, uw2k ..., uwMk) is normalized in U is collected to userMethod, including setting temp=∑ (t ∈ U) uwtk, and for each i ∈ U, uwik=uwik/temp is set.
Example 2: the side that k-th of user's column vector (uw1k, uw2k ..., uwMk) is normalized in U is collected to userMethod is as follows.Temp=∑ (t ∈ U) uwtk is calculated first, and uwik=uwik/temp is calculated for each i ∈ U;Then rightUw1k, uw2k ..., uwMk are ranked up and uw1k, uw2k ..., uwMk are divided into r group according to ranking results, and takeThe smallest data composition set { s1, s2 ..., sr } in every group out, and s1 < s2 < ... < sr;Finally to uw1k,Uw2k ..., uwMk are handled as follows: for each i ∈ U, if uwik < s1, is arranged uwik=a;If sm≤Uwik=g (sm) is then arranged in uwik≤sm+1;If uwik > sr, is arranged uwik=b.Wherein g (sm) is increasing function, and g(sm) ∈ (a, b), 1≤m < r, a and b are nonnegative constant, and r is setup parameter.
Application example 1
This is an application example of Fig. 3 the method.In Fig. 3 the method, the parameter vector more new algorithm is logicalFollowing specific application example is crossed, to update the parameter vector of the user i and the user j:
Uwik*=uwik+ λ 1 (j, i, T) f1 [Ku (j)] is (for each)
Uwjk*=uwjk+ λ 2 (i, j, T) f2 [Ku (i)] is (for each)
Wherein, the uwik and uwik* is respectively indicated update before and after updating the parameter vector of the user i theK component, the uwjk and the uwjk*, which are respectively indicated, updates preceding k-th with the parameter vector of the user j after updateComponent;The λ 1 (j, i, T) is influence coefficient of the user j to the user i at the type T of the signal, the λ 2(i, j, T) is influence coefficient of the user i to the user j at the type T of the signal.The UKi is by describedIn the parameter vector Ku (i) of user i=(uwi1, uwi2 ..., uwik ..., uwiL) corresponding to the maximum Pi component of numerical valueFeature composition set, the UKj be by the user j parameter vector Ku (j)=(uwj1, uwj2 ...,Uwjk ..., uwjL) in the composition of feature corresponding to the maximum Pj component of numerical value set, Pi and Pj are setup parameter, andPi≤L, Pj≤L.Such as i=30, P30=3, UK30={ literature, computer, biology };J=265, P265=2, UK265={ music, history }.It is carried out after executing the specific algorithm arranged below, i.e., for each k ∈ UKi, uwjk=is setuwjk*;For each k ∈ UKj, uwik=uwik* is set.
In the application example 1, the specific algorithm can be further defined to for each k ∈ UKi, be metuwjk*≥uwjk;For each k ∈ UKj, meet uwik* >=uwik.
In the application example 1, the f1 [Ku (j)] is the function of the parameter vector Ku (j) of the user j, describedF2Ku (i)] be the user i parameter vector Ku (i) function.The specific reality of the f1 [Ku (j)] and the f2 [Ku (i)]Existing method includes following instance:
Example 1: the f1 [Ku (j)] is the increasing function of the uwjk, is the subtraction function of ∑ (k ∈ K) uwjk;F2 [the Ku(i)] be the uwik increasing function, be the subtraction function of ∑ (k ∈ K) uwik.
Example 2:f1 [Ku (j)]=σ 3uwjk/ (∑ (k ∈ K) uwjk), f2 [Ku (i)]=σ 4uwik/ (∑ (k ∈ K)Uwik), wherein σ 3 and σ 4 is setting constant.
Example 3:f1 [Ku (j)]=σ 5uwjk, f2 [Ku (i)]=σ 6uwik, wherein σ 5 and σ 6 is setting constant.
Example 4:f1 [Ku (j)]=σ 7 { 1/ [1+exp (- uwjk)] }, f2 [Ku (i)]=σ 8 1/ [1+exp (-Uwik)] }, wherein σ 7 and σ 8 is setting constant.
In the application example 1, the concrete methods of realizing of the λ 1 (j, i, T) and the λ 2 (i, j, T) include:
Example 1: the λ 1 (j, i, T) and the λ 2 (i, j, T) are the parameter vector of the user i and the user j respectivelyBetween similarity sim (i, j) function.Such as λ 1 (j, i, T)=c1sim (i, j), λ 2 (i, j, T)=c2sim (i,And sim (i, j) j) ,=| | Ku (i), Ku (j) | |=[∑ (k ∈ K) (uwikuwjk)]/{ [∑ (k ∈ K) (uwik) 2] 1/2 [∑ (k ∈ K) (uwjk) 2] 1/2 }, c1 and c2 are setting constant.This example is meant that the parameter of user i and user jSimilarity between vector is higher, then the proportionality coefficient that they " vote " each other is bigger.
Example 2: 1 (j, i, T)=u1 (j) u2 (i) of λ, 2 (i, j, T)=u1 (i) u2 (j) of λ, wherein u1 (j)Indicate whether the parameter vector of user j can be used for updating the parameter vector that user collects other users in U, u1 (i) indicates user iParameter vector whether can be used for updating user collect U in other users parameter vector;U2 (i) indicate user i parameter toThe parameter vector whether amount can be collected other users in U by user updates;U2 (j) indicates whether the parameter vector of user j can be withIt is updated by the parameter vector that user collects other users in U.U1 (j), u2 (j), u1 (i) and u2 (i) are setup parameters, they takeValue is 0 or 1.1 represent be, 0 represent it is no.This example is meant that prevent malicious attack, does not pass through reliability certificationUser, parameter vector cannot be updated the parameter vector of other users;Some special users, parameter vector cannotIt is updated by the parameter vector of other users.
Example 3: 1 (j, i, the T)=s1 (T) of λ, 2 (i, j, the T)=s2 (T) of λ.Wherein the T is that user accesses useThe type of family signal, the s1 (T) and the s2 (T) are the functions of the T.
Example 4: the λ 1 (j, i, T) and λ 2 (i, j, T) are generated using the combination of above-mentioned 1~3 each method of example.Such as
λ 1 (j, i, T)={ c1sim (i, j) } { u1 (j) u2 (i) } s1 (T)
λ 2 (i, j, T)={ c2sim (i, j) } { u1 (i) u2 (j) } s2 (T).
Example 5: the λ 1 (j, i, T) is the function of number of users in the relational network of the user j, the λ 2 (i, j, T)It is the function of number of users in the relational network of the user i.
Example 6: the λ 1 (j, i, T) and the λ 2 (i, j, T) are setting constant.
In the application example 1, after the execution specific algorithm reaches setting number, need for each feature kK-th of user's column vector (uw1k, uw2k ..., uwMk) is normalized in ∈ K.
Application example 2
This is a citing of 1 method of application example.For convenience of illustration,let us suppose that on the internet there are threeUser, there are two features, i.e. user to collect U={ 1,2,3 } by each user, feature set K={ 1,2 }.User 1, user 2 and user 3Parameter vector be respectively (uw11, uw12), (uw21, uw22) and (uw31, uw32).Wherein uwik (i ∈ U, k ∈ K) is indicatedThe degree of correlation of the user i and feature k.
If having received the signal that the user 2 accesses the user 3, and signal type T=1, then according to following parameterVector more new algorithm updates the parameter vector of the user 2 and the user 3:
Uw21*=uw21+ λ 1 (3,2,1) { uw31/ (uw31+uw32) }
Uw22*=uw22+ λ 1 (3,2,1) { uw32/ (uw31+uw32) }
Uw31*=uw31+ λ 2 (2,3,1) { uw21/ (uw21+uw22) }
Uw32*=uw32+ λ 2 (2,3,1) { uw22/ (uw21+uw22) }
Wherein λ 1 (3,2,1) indicates influence coefficient of the user 3 to the user 2 at signal type T=1;λ2(2,3,1) influence coefficient of the user 2 to the user 3 at signal type T=1 is indicated.If λ 1 (3,2,1)=c1Sim (2,3) u1 (3) u2 (2) s1 (1);λ 2 (2,3,1)=c2sim (2,3) u1 (2) u2 (3) s2 (1),If s1 (1)=3, s2 (1)=1.5;C1 and c2 is setting constant;U1 (3) indicates whether the parameter vector of user 3 can be used forThe parameter vector that user collects other users in U is updated, u1 (2) indicates whether the parameter vector of user 2 can be used for updating userCollect the parameter vector of other users in U, u2 (2) indicates whether the parameter vector of user 2 can be collected other users in U by userParameter vector updates, and u2 (3) indicates whether the parameter vector of user 3 can collect the parameter vector of other users in U more by userNewly, u1 (2)=u2 (2)=u1 (3)=u2 (3)=1;The sim (2,3) indicate the parameter of the user 2 and the user 3 toSimilarity between amount, i.e. sim (2,3)=(uw21uw31+uw22uw32)/{ [(uw21) 2+ (uw22) 2] 1/2[(uw31)2+(uw32)2]1/2} 。
After having executed above-mentioned algorithm, the parameter vector of the user 2 and the user 3 are updated, i.e. setting uw31=Uw31*, uw32=uw32*, uw21=uw21* and uw22=uw22*.
After having executed above-mentioned algorithm, to user's column vector (uw11, uw21, uw31) and (uw12, uw22, uw32) intoRow normalized.Its algorithm is as follows: setting temp1=uw11+uw21+uw31, then uw11=uw11/ is arranged to feature k=1Temp1, uw21=uw21/temp1, uw31=uw31/temp1;If temp2=uw12+uw22+uw32, then to feature k=2Uw12=uw12/temp2, uw22=uw22/temp2, uw32=uw32/temp2 are set.
Fig. 4 is the method that the acquisition user individual feature of document signal is accessed based on user.Specifically comprise the following steps:
S21. obtain multiple users on the internet, storage by the user that the multiple user forms collect U=1,2 ...,M};Multiple documents are obtained on the internet, store document sets D={ 1, the 2 ..., N } being made of the multiple document;SettingMultiple features store the feature set K={ 1,2 ..., L } being made of the multiple feature;
S22. collect multiple user setting parameter vector initial values in U for the user, and in the document sets DMultiple document setup parameter vector initial values;
S23. the signal that any one user m (m ∈ U) accesses any one document n (n ∈ D) is received;
S24. parameter vector Ku (m)=(uwm1, uwm2 ..., uwmk ..., the uwmL) of the user m is read, whereinThe uwmk indicates the degree of correlation of the user m and feature k (k ∈ K);
S25. parameter vector Kd (n)=(dwn1, dwn2 ..., dwnk ..., the dwnL) of the document n is read, whereinThe dwnk indicates the degree of correlation of the document n and feature k (k ∈ K);
S26. with following parameter vector more new algorithm 2, the parameter vector of the user m and the document n are updated,
Ku* (m)=function3 [Ku (m), Kd (n)],
Kd* (n)=function4 [Ku (m), Kd (n)];
After having executed the step S26, the step S23 is returned.
Wherein, the Ku (m) and the Ku* (m), which respectively indicate, updates the preceding parameter vector with the user m after update,The Kd (n) and the Kd* (n), which respectively indicate, updates the preceding parameter vector with the document n after update;The Ku* (m)=(uwm1*, uwm2* ..., uwmk* ..., uwmL*), the Kd* (n)=(dwn1*, dwn2* ..., dwnk* ...,dwnL*);The function3 indicates that Ku* (m) is the function of Ku (m) He Kd (n), and the function4 indicates that Kd* (n) isThe function of Ku (m) and Kd (n);The user m represents user and collects any one of U user, and is not specific to some user, describedDocument n represents any one of document sets D document, and is not specific to some document, for example, n-th execute step S23 whenM=1023, n=3428 in the signal, and m=33456 in the signal when (n+1)th execution step S23, n=28477。
It further include updating the Ku (m) and the Kd (n) after having executed the step S26 in Fig. 4 the methodThe step of, i.e. progress assignment Kd (n)=Kd* (n) and Ku (m)=Ku* (m).
In Fig. 4 the method, the type of the signal is at least with one of Types Below: T=9 indicates the userM clicks the link of the document n, and T=10 indicates that the user m keys in the address of the document n, and T=11 indicates the user mLabel is set to the document n, T=12 indicates that the document n is set bookmark by the user m, and T=13 indicates the userThe document n is set as liking+the 1 of Google (Like of such as types of facial makeup in Beijing operas and) by m, and T=14 indicates that the user m forwards the documentN, T=15 indicate that the user m comments on the document n, and T=16 indicates that the user m collects the document n.The side described in Fig. 4In one application example of method, the signal is acquired from Web log.The Web log, including server log(server log), error log (error log) and Cookie log etc..
In an application example of Fig. 4 the method, the method meets Ku* (m) >=Ku (m) and Kd* (n) >=Kd(n).Wherein inequality Ku* (m) >=Ku (m) is meant that for each k ∈ K, there is uwmk* >=uwmk;Inequality Kd* (n) >=Kd (n) is meant that for each k ∈ K, there is dwnk* >=dwnk.
It is the increasing of the dwnk for each k ∈ K, the uwmk* in an application example of Fig. 4 the methodFunction is the subtraction function of ∑ (k ∈ K) dwnk;It is the increasing function of the uwmk for each k ∈ K, the dwnk*, is ∑ (k∈ K) uwmk subtraction function.
In an application example of Fig. 4 the method, user identifier of the signal comprising the user m and the textThe document identification of shelves n, the user identifier and unique subscriber-coded corresponding, the document identification and unique document coding pairIt answers.The application example reads the subscriber-coded of the user m by the user identifier of the user m, and according to the user mThe subscriber-coded reading user m parameter vector;The document of the document n is read by the document identification of the document nIt encodes, and reads the parameter vector of the document n according to the document coding of the document n.
In Fig. 4 the method, after executing the parameter vector more new algorithm 2 and reaching setting number t1, for eachK-th of user's column vector (uw1k, uw2k ..., uwMk) is normalized in feature k ∈ K;Execute the parameter toAfter amount more new algorithm 2 reaches setting number t2, for each feature k ∈ K, to k-th document column vector (dw1k, dw2k ...,DwNk it) is normalized;Wherein t1 and t2 is positive integer.Primary parameter vector more new algorithm 2 is executed to be meant that, it willThe Ku (m) and the Kd (n) bring the function3 and the function4 into, obtain the Ku* (m) and the Kd*(n) process.The specific application example of the method for normalizing is as follows:
Example 1: the side that k-th of user's column vector (uw1k, uw2k ..., uwMk) is normalized in U is collected to userMethod, including setting temp=∑ (t ∈ U) uwtk, and for each m ∈ U, uwmk=uwmk/temp is set.To document sets DIn the method that is normalized of k-th of document column vector (dw1k, dw2k ..., dwNk), including setting temp=∑ (t∈ D) dwtk, and for each n ∈ D, dwnk=dwnk/temp is set.
Example 2: to the side that k-th of document column vector (dw1k, dw2k ..., dwNk) is normalized in document sets DMethod is as follows.Temp=∑ (t ∈ D) dwtk is calculated first, and dwnk=dwnk/temp is calculated for each n ∈ D;Then rightDw1k, dw2k ..., dwNk are ranked up and dw1k, dw2k ..., dwNk are divided into r group according to ranking results, and takeThe smallest data composition set { s1, s2 ..., sr } in every group out, and s1 < s2 < ... < sr;Finally to dw1k,Dw2k ..., dwNk are handled as follows: for each n ∈ D, if dwnk < s1, is arranged dwnk=a;If sm≤Dwnk=g (sm) is then arranged in dwnk≤sm+1;If dwnk > sr, is arranged dwnk=b.Wherein g (sm) is increasing function, and g(sm) ∈ (a, b), 1≤m < r, a and b are nonnegative constant, and r is setup parameter.With same method, user can be collected in U k-thUser's column vector is normalized.
Application example 3
This is an application example of Fig. 4 the method.The parameter vector more new algorithm 2 passes through following concrete applicationExample, to update the parameter vector of the user m and the document n:
Uwmk*=uwmk+ λ 3 (n, m, T) f3 [Kd (n)] is (for each)
Dwnk*=dwnk+ λ 4 (m, n, T) f4 [Ku (m)] is (for each)
Wherein, the uwmk and uwmk* is respectively indicated update before and after updating the parameter vector of the user m theK component, the dwnk and the dwnk*, which are respectively indicated, updates preceding k-th with the parameter vector of the document n after updateComponent;The λ 3 (n, m, T) is influence coefficient of the document n to the user m at the type T of the signal, the λ 4(m, n, T) is influence coefficient of the user m to the document n at the type T of the signal.The UKm is by describedIn the parameter vector Ku (m) of user m=(uwm1, uwm2 ..., uwmk ..., uwmL) corresponding to the maximum Pm component of numerical valueFeature composition set, the DKn be by the document n parameter vector Kd (n)=(dwn1, dwn2 ...,Dwnk ..., dwnL) in the composition of feature corresponding to the maximum Qn component of numerical value set, Pm and Qn are setup parameter, andPm≤L, Qn≤L.Such as m=30, P30=3, UK30={ music, sport, finance and economics };N=265, Q265=2, DK265={ soundIt is happy, building }.In addition, carrying out after executing above-mentioned specific algorithm arranged below, i.e., for each k ∈ UKm, dwnk=is setUwmk=uwmk* is arranged for each k ∈ DKn in dwnk*.
In the application example 3, the specific algorithm can be further defined to for each k ∈ DKn, be metuwmk*≥uwmk;For each k ∈ UKm, meet dwnk* >=dwnk.
In the application example 3, the f3 [Kd (n)] is the function of the parameter vector Kd (n) of the document n, describedF4 [Ku (m)] is the function of the parameter vector Ku (m) of the user m.The f3 [Kd (n)] and the f4's [Ku (m)] is specificImplementation method includes:
Example 1: the f3 [Kd (n)] is the increasing function of the dwnk, is the subtraction function of ∑ (k ∈ K) dwnk;F4 [the Ku(m)] be the uwmk increasing function, be the subtraction function of ∑ (k ∈ K) uwmk.
Example 2:f3 [Kd (n)]=σ 3dwnk/ (∑ (k ∈ K) dwnk), f4 [Ku (m)]=σ 4uwmk/ (∑ (k ∈ K)Uwmk), wherein σ 3 and σ 4 is setting constant.
Example 3:f3 [Kd (n)]=σ 5dwnk, f4 [Ku (m)]=σ 6uwmk, wherein σ 5 and σ 6 is setting constant.
Example 4:f3 [Kd (n)]=σ 7 { 1/ [1+exp (- dwnk)] }, f4 [Ku (m)]=σ 8 1/ [1+exp (-Uwmk)] }, wherein σ 7 and σ 8 is setting constant.
In the application example 3, the concrete methods of realizing of the λ 3 (n, m, T) and the λ 4 (m, n, T) include as followsExample:
Example 1: the λ 3 (n, m, T) and the λ 4 (m, n, T) are the parameter vector of the user m and the document n respectivelyBetween similarity sim (m, n) function.Such as λ 3 (n, m, T)=c1sim (m, n), λ 4 (m, n, T)=c2sim (m,And sim (m, n) n) ,=| | Ku (m), Kd (n) | |=[∑ (k ∈ K) (uwmkdwnk)]/{ [∑ (k ∈ K) (uwmk) 2] 1/2 [∑ (k ∈ K) (dwnk) 2] 1/2 }, c1 and c2 are setting constant.This example be meant that the parameter of user and document toSimilarity between amount is higher, and the proportionality coefficient that they " vote " each other is bigger.
Example 2: 3 (n, m, T)=u2 (m) d1 (n) of λ, 4 (n, m, T)=u1 (m) d2 (n) of λ, wherein d1 (n)Indicate whether the parameter vector of document n can be used for updating the parameter vector that user collects user in U, u2 (m) indicates the ginseng of user mWhether number vector can be updated by the parameter vector of document in document sets D, and u1 (m) indicates whether the parameter vector of user m can be withFor updating the parameter vector of document in document sets D, d2 (n) indicates whether the parameter vector of document n can be collected in U by userThe parameter vector of user updates.U1 (m), u2 (m), d1 (n) and d2 (n) are setup parameters, their value is 0 or 1.1 generationTable is, 0 represent it is no.This example is meant that prevent malicious attack, some documents (or user) are not due to by reliableProperty certification, parameter vector cannot be updated the parameter vector of other users (or document);Some special documents (or useFamily), parameter vector cannot be updated by the parameter vector of other users (or document).
Example 3: 3 (n, m, the T)=s1 (T) of λ, 4 (m, n, the T)=s2 (T) of λ.Wherein the T is that user accesses textThe type of shelves signal, the s1 (T) and the s2 (T) are the functions of the T.
Example 4: the λ 3 (n, m, T) and λ 4 (m, n, T) are generated using the combination of above-mentioned 1~3 each method of example.I.e.
λ 3 (n, m, T)={ c1sim (m, n) } { u2 (m) d1 (n) } s1 (T)
λ 4 (m, n, T)={ c2sim (m, n) } { u1 (m) d2 (n) } s2 (T).
Example 5: the λ 3 (n, m, T) is the accessed number of the document n or the function of PageRank value, the λ 4(m, n, T) is the function of number of users in the relational network of the user m.
Example 6: the λ 3 (n, m, T) and the λ 4 (m, n, T) are setting constant.
In the application example 3, after the execution specific parameter vector more new algorithm reaches setting number, needFor each feature k ∈ K, respectively to k-th of document column vector (dw1k, dw2k ..., dwNk) and k-th of user's column vector(uw1k, uw2k ..., uwMk) is normalized.
Application example 4
This is an applicating example of described 3 the method for application example.For convenience of illustration,let us suppose that in internetUpper there are two user and three documents, and each user and each document are there are two feature, i.e. user collects U={ 1,2 }, document setsD={ 1,2,3 }, feature set K={ 1,2 }.The parameter vector of user 1 and user 2 be respectively (uw11, uw12) and (uw21,Uw22), the parameter vector of document 1, document 2 and document 3 is respectively (dw11, dw12), (dw21, dw22) and (dw31, dw32).Wherein uwmk (m ∈ U, k ∈ K) indicates the degree of correlation of the user m and feature k;Dwnk (n ∈ D, k ∈ K) indicates the document nWith the degree of correlation of feature k.
Assuming that have received the signal that the user 2 accesses the document 3 in the server, and signal type T=9, then rootThe parameter vector of the user 2 and the document 3 are updated according to following algorithm:
Uw21*=uw21+ λ 3 (3,2,9) { dw31/ (dw31+dw32) }
Uw22*=uw22+ λ 3 (3,2,9) { dw32/ (dw31+dw32) }
Dw31*=dw31+ λ 4 (2,3,9) { uw21/ (uw21+uw22) }
Dw32*=dw32+ λ 4 (2,3,9) { uw22/ (uw21+uw22) }
Wherein λ 3 (3,2,9) indicates influence coefficient of the document 3 to the user 2 at signal type T=9;λ4(2,3,9) influence coefficient of the user 2 to the document 3 at signal type T=9 is indicated.Such as set λ 3 (3,2,9)=C1sim (2,3) s1 (9);λ 4 (2,3,9)=c2sim (2,3) s2 (9), if s1 (9)=3, s2 (9)=1.5;c1It is setting constant with c2;The sim (2,3) indicates the similarity between the user 2 and the parameter vector of the document 3,That is: sim (2,3)=(uw21dw31+uw22dw32)/{ [(uw21) 2+ (uw22) 2] 1/2 [(dw31) 2+ (dw32)2]1/2} 。
After having executed above-mentioned algorithm, the parameter vector of the user 2 and the document 3 are updated, i.e. setting uw21=Uw21*, uw22=uw22*, dw31=dw31* and dw32=dw32*.
After having executed above-mentioned algorithm, place is normalized to user's column vector (uw11, uw21) and (uw12, uw22)Reason, and document column vector (dw11, dw21, dw31) and (dw12, dw22, dw32) are normalized.
It is as follows to the algorithm of user's standardization on series vectors processing: to set temp1=uw11+uw21, then feature k=1 is setSet uw11=uw11/temp1, uw21=uw21/temp1;If temp2=uw12+uw22, then uw12=is arranged to feature k=2Uw12/temp2, uw22=uw22/temp2.
It is as follows to the algorithm of the normalized of document column vector: to set temp1=dw11+dw21+dw31, then to feature k=1 setting dw11=dw11/temp1, dw21=dw21/temp1, dw31=dw31/temp1;If temp2=dw12+dw22Then dw12=dw12/temp2, dw22=dw22/temp2, dw32=dw32/temp2 is arranged to feature k=2 in+dw32.
Fig. 5 is the method flow diagram for the user that inquiry has special characteristic.This method includes that execution in the server is as followsStep:
A11. the query vector of either query user e (e ∈ U) setting is received;
A12. the inquiry user e collects in U in the user chooses one group of user Q (Q ∈ U), for example, in social networksAll users for selecting the age to set in geographic area in one group of user in a given section or position at one;If withFamily is without above-mentioned selection, default value Q=U;
A13. according to the parameter vector of each user in the query vector and one group of user Q, described one is calculatedThe personalized ordering value UR (e, m) of each user m (m ∈ Q) in group user Q;The UR (e, m) indicates to be based on the user eQuery vector the user m personalized ordering value;
A14. in one group of user Q, the mark that the personalized ordering is worth maximum some users is sent to instituteState inquiry user e.
In Fig. 5 the method, the query vector of user e setting be Ks (e)=(swe1, swe2 ...,Swek ..., sweL), wherein swek indicates the degree of correlation for the user and feature k (k ∈ K) that the user e expectation inquires,Swek ∈ [a, b], a and b are default nonnegative constant.There are several types of setting methods for the query vector Ks (e).
The first is feature to be selected in feature set K by the user e, and it is arranged the feature degree of correlation, such as be arrangedThe degree of correlation of swe2=0.00023, swe6=0.00061, the user e and other feature is 0.
Second is that the parameter vector Ku (e) of the user e is assigned to the query vector Ks (e).
The third is that the user e submits one group of user or document Se { ..., r ... }.WhenWhen, the user rThe parameter vector of (r ∈ Se) is (uwr1, uwr2 ..., uwrL), therefore the query vector of the user e is set as: for eachFeature k ∈ K, swek=(σ 9/s) ∑ (r ∈ Se) [uwrk/ (∑ (k ∈ K) uwrk)];WhenWhen, the document r (r∈ Se) parameter vector be (dwr1, dwr2 ..., dwrL), therefore the query vector of the user e is set as: for each spyIt levies k ∈ K, swek=(σ 10/s) ∑ (r ∈ Se) [dwrk/ (∑ (k ∈ K) dwrk)].
In an application example of Fig. 5 the method, the personalized ordering value UR (e, m) is by the user eThe parameter vector Ku (m) of query vector Ks (e)=(swe1, swe2 ..., swek ..., sweL) and the user m (m ∈ Q)Obtained from=(uwm1, uwm2 ..., uwmk ..., uwmL) is calculated, such as
Fig. 6 is the method flow diagram for the user group that inquiry has special characteristic.Multiple user groups are obtained, subscriber cluster is formedG={ 1,2 ..., E }, wherein E is the number of user group.The parameter vector of user group i (i ∈ G) be set as (gwi1, gwi2 ...,Gwik ..., gwiL), wherein the gwik indicates the degree of correlation of the user group i and feature k (k ∈ K).Therefore, inquiry hasThe method of special characteristic user group is as follows:
A21. the parameter vector of each user group in the subscriber cluster G is calculated;The parameter vector of one user group is by thisThe parameter vector for each user that user group includes is calculated;For example, all users set in user group i form user's setBi, then the parameter vector calculation method of user group i is that each feature k ∈ K is arranged gwik=(σ 11/s) ∑ (t ∈ Bi)[uwtk/ (∑ (k ∈ K) uwtk)], wherein s is the element number of user's set Bi, and σ 11 is setting constant.
A22. the query vector of either query user e (e ∈ U) setting is received;
A23. according to the parameter vector of each user group in the query vector and the subscriber cluster G, described in calculatingThe personalized ordering value GR (e, i) of each user group i (i ∈ G) in subscriber cluster G;The GR (e, i) indicates to be based on the useThe personalized ordering value of the user group i of the query vector of family e;
A24. in the subscriber cluster G, the mark that the personalized ordering is worth maximum some user groups is sent toThe inquiry user e.
In Fig. 6 the method, the query vector of user e setting be Ks (e)=(swe1, swe2 ...,Swek ..., sweL), wherein swek indicates the degree of correlation for the user group and feature k (k ∈ K) that the user e expectation inquires,Swek ∈ [a, b], a and b are default nonnegative constant.There are four types of setting method, first three and Fig. 5 institutes for the query vector Ks (e)The three kinds of methods stated are identical.4th kind is that the user e submits a user group, and the parameter vector of the user group is assigned toThe Ks (e).
In the method for searching user group, the personalized ordering value GR (e, i) is by the query vector Ks of the user e(e) the parameter vector Ku (i) of=(swe1, swe2 ..., swek ..., sweL) and the user group i (i ∈ G)=(gwi1,Gwi2 ..., gwi ..., gwiL) calculated obtained from, such as
A kind of system construction drawing for obtaining user individual feature of Fig. 7.The system 200 includes following functional module:
User, document and feature setup module 211: obtaining multiple users on the internet, composition user collect U=1,2 ..., M }, user collection U is stored in customer data base 220;Multiple documents are obtained on the internet, form document sets DThe document sets D is stored in document database 230 by={ 1,2 ..., N };Multiple features are set, composition characteristic collection K=1,2 ..., L }, the feature set is stored in property data base 240;
The parameter vector initial value setup module 212 of user and document: collect multiple user settings ginseng in U for the userNumber vector initial value, and it is stored in the customer data base 220;For multiple document setup parameters in the document sets DVector initial value, and it is stored in the document database 230;User and the document of parameter vector initial value are not set,Parameter vector initial value is default is set as null vector for its;
User accesses subscriber signal acquisition module 213: accessing any one use for acquiring any one user i (i ∈ U)The signal 1 of family j (j ∈ U), the signal 1 are stored in web log data library 250;The user i (101) accesses the userThe signal of j (102), will be sent to social network server 302;
User accesses document signal acquisition module 214: any for acquiring any one user m (m ∈ U) (103) accessThe signal 2 of one document n (n ∈ D), the signal 2 are stored in web log data library 250;The user m (103) accesses instituteThe signal of document n is stated, will be sent at least one application server, the application server includes portal site server301, social network server 302, search engine server 303 and instant communication server 304;
The parameter vector update module 215 of user and document: according to the signal 1, in the customer data base 220The parameter vector of the user i (101) and the user j (102) is read, then by described in the update of parameter vector more new algorithmThe parameter vector of user i (101) and the user j (102), and the store-updated institute in the customer data base 220State the parameter vector of user i and the user j;According to the signal 2, the user m is read in the customer data base 220(103) parameter vector and the parameter vector that the document n is read in the document database 230, then pass through parameterVector more new algorithm 2 updates the parameter vector of the user m (103) and the document n, and in the customer data base 220In the store-updated user m parameter vector and the store-updated text in the document database 230The parameter vector of shelves n;
User query module 216: the module has the query function of user and user group;User query function includes: to connectIt receives the query vector by inquiry user setting and obtains one group of userThen according to the query vector andThe parameter vector of each user in one group of user Q calculates the personalized row of each user in one group of user QSequence value, and according to the size of the personalized ordering value, the mark of at least one user in one group of user Q is sent to instituteInquiry user is stated, referring to step A11 to A14;User group query function includes: the query vector received by inquiry user setting,Then it according to the parameter vector of each user group in the query vector and subscriber cluster G, calculates in the subscriber cluster GThe personalized ordering value of each user group, and according to the size of the personalized ordering value, by least one in the subscriber cluster GThe mark of a user group is sent to the inquiry user, referring to step A21 to A24.
Application example described above is only preferable application example of the invention, the protection model being not intended to limit the inventionIt encloses.