Summary of the invention
In view of this, the present invention provides an Internet resource analysis method and device that can analyze resources automatically and assist substantially in resource classification, thereby reducing the manpower consumed in sorting and classifying resources.
To this end, the Internet resource analysis method provided by the invention includes the following steps:
From the historical record of resource requests made by users, obtaining the number of users in a first user group that requested a first resource within a set time period, and a first resource set consisting of the resources, other than the first resource, that the first user group requested within another set time before and after requesting the first resource;
Calculating, from the times at which the first user group requested each resource in the first resource set and the number of such requests, the correlation between each resource in the first resource set and the first resource;
Obtaining, from the first resource set, a second resource whose degree of correlation exceeds a set correlation condition;
Determining the first resource and the second resource to be related resources.
Optionally, the step of calculating, from the times at which the first user group requested the resources in the first resource set and the number of such requests, the correlation between each resource in the first resource set and the first resource specifically includes:
For any one resource in the first resource set, calculating a request count parameter of that resource from the number of times the first user group requested it, and calculating a request time parameter of that resource from the time at which the first user group requested it;
Multiplying the request count parameter and the request time parameter by their respective set weights and taking the sum of the products as a first correlation coefficient, which expresses the strength of the correlation of that resource relative to the first resource;
Calculating the correlation between the first resource and that resource from the first correlation coefficient.
Optionally, before the step of obtaining, from the first resource set, a second resource whose degree of correlation exceeds the set correlation condition, the method further includes:
For any one resource in the first resource set, obtaining the number of users in the user group that requested that resource within a corresponding set time period, together with the number of times and the times at which that user group requested the first resource;
Calculating, from the data obtained in the preceding step, the request count parameter and the request time parameter of the first resource relative to that resource in the first resource set;
Multiplying the request count parameter and the request time parameter of the first resource relative to that resource by their respective set weights and taking the sum of the products as a second correlation coefficient of the first resource relative to that resource;
The step of calculating the correlation between the first resource and that resource from the first correlation coefficient then specifically includes:
Calculating the correlation between the first resource and that resource from the first correlation coefficient and the second correlation coefficient.
Optionally, the step of determining the first resource and the second resource to be related resources specifically includes:
Determining the correlation level of the first resource and the second resource according to the correspondence, defined in preset rules, between correlation coefficient values and correlation levels;
The correlation coefficient includes the first correlation coefficient and the second correlation coefficient.
Optionally, after the step of determining the first resource and the second resource to be related resources, the method further includes:
Assigning identical first classification information to the first resource and the second resource, so that the first classification information can serve as a basis when Internet resources are sorted;
The first classification information includes a first theme and/or a first label.
Further, the present invention provides an Internet resource analysis device, comprising:
First user group obtaining module: configured to obtain, from the historical record of resource requests made by users, the number of users in a first user group that requested a first resource within a set time period;
First resource set obtaining module: configured to obtain a first resource set consisting of the resources, other than the first resource, that the first user group requested within another set time before and after requesting the first resource;
Correlation calculation module: configured to calculate, from the times at which the first user group requested each resource in the first resource set and the number of such requests, the correlation between each resource in the first resource set and the first resource;
Related resource obtaining module: configured to obtain, from the first resource set, a second resource whose degree of correlation exceeds a set correlation condition;
Relationship output module: configured to determine the first resource and the second resource to be related resources.
Optionally, the correlation calculation module specifically includes:
First request count parameter calculation unit: configured to calculate, for any one resource in the first resource set, a request count parameter of that resource from the number of times the first user group requested it;
First request time parameter calculation unit: configured to calculate a request time parameter of that resource from the time at which the first user group requested it;
First correlation coefficient calculation unit: configured to multiply the request count parameter and the request time parameter by their respective set weights and take the sum of the products as a first correlation coefficient, which expresses the strength of the correlation of that resource relative to the first resource;
Correlation output unit: configured to calculate the correlation between the first resource and that resource from the first correlation coefficient.
Optionally, the device further includes:
Second user group obtaining module: configured to obtain, for any one resource in the first resource set, the number of users in the user group that requested that resource within a corresponding set time period;
Second request count and time obtaining module: configured to obtain the number of times and the times at which the user group that requested that resource requested the first resource;
Second request count and time parameter calculation module: configured to calculate, from the data obtained by the second user group obtaining module and the second request count and time obtaining module, the request count parameter and the request time parameter of the first resource relative to that resource in the first resource set;
Second correlation coefficient calculation module: configured to multiply the request count parameter and the request time parameter of the first resource relative to that resource by their respective set weights and take the sum of the products as a second correlation coefficient of the first resource relative to that resource;
The correlation output unit calculates the correlation between the first resource and that resource from the first correlation coefficient and the second correlation coefficient.
Optionally, the relationship output module specifically includes:
Correlation level calculation unit: configured to determine the correlation level of the first resource and the second resource according to the correspondence, defined in preset rules, between correlation coefficient values and correlation levels;
The correlation coefficient includes the first correlation coefficient and the second correlation coefficient.
Optionally, the device further includes:
First classification information obtaining module: configured to assign identical first classification information to the first resource and the second resource, so that the first classification information can serve as a basis when Internet resources are sorted;
The first classification information includes a first theme and/or a first label.
As can be seen from the above, the Internet resource analysis method and device provided by the invention analyze the relationships between Internet resources according to users' resource-request behavior, so that those relationships are derived from objective behavior. Determining the relationships between Internet resources also facilitates sorting and classifying them, giving the personnel responsible for sorting Internet resources an effective auxiliary basis for classification.
Specific embodiment
To provide an effective implementation, the present invention provides the following embodiments, which are described below with reference to the accompanying drawings.
The present invention first provides an Internet resource analysis method which, as shown in Figure 1, includes the following steps:
Step 101: from the historical record of resource requests made by users, obtain the number of users in a first user group that requested a first resource within a set time period, and a first resource set consisting of the resources, other than the first resource, that the first user group requested within another set time before and after requesting the first resource;
Step 102: calculate, from the times at which the first user group requested each resource in the first resource set and the number of such requests, the correlation between each resource in the first resource set and the first resource;
Step 103: obtain, from the first resource set, a second resource whose degree of correlation exceeds a set correlation condition;
Step 104: determine the first resource and the second resource to be related resources.
As can be seen from the above, the resource analysis method provided by the invention calculates the correlation between resources from the historical resource-request behavior of the user groups that use the network. In general, the correlation between Internet resources is reflected in users' access behavior; for example, users tend to visit websites of the same kind according to their own interests. The invention therefore derives resource correlations by analyzing the resource-request behavior of user groups and identifies resources that are associated with one another, so that the category of a resource can be judged from the related-resource analysis when resources are sorted and classified. This provides effective help in classifying massive volumes of data.
In step 101 above, the historical record of all users' requests for Internet resources can be obtained from a network server. For any one Internet resource, the user group that requested the resource can be obtained from that record, and then the other resources that this user group requested within a short period before and after requesting the resource can be obtained. In the embodiments of the invention, the first user group is the set of users who request the first resource at a time point or within a set time period. Specifically, if 1000 users request the first resource at a time point or within a set time period, these 1000 users are the first user group.
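Step 101 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Request` record layout, the function names, and the use of each user's earliest request for the first resource as the reference time are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical log record: who requested which resource, and when (in seconds).
Request = namedtuple("Request", ["user", "resource", "time"])


def first_user_group(log, first_resource, start, end):
    """Map each user who requested `first_resource` in [start, end]
    to the time of that user's earliest such request."""
    group = {}
    for r in log:
        if r.resource == first_resource and start <= r.time <= end:
            group[r.user] = min(r.time, group.get(r.user, r.time))
    return group


def first_resource_set(log, first_resource, group, window):
    """Other resources requested by group members within `window` seconds
    before or after their request for `first_resource`."""
    resources = set()
    for r in log:
        if r.user in group and r.resource != first_resource:
            if abs(r.time - group[r.user]) <= window:
                resources.add(r.resource)
    return resources


log = [
    Request("u1", "A", 0), Request("u1", "C", 8),
    Request("u2", "A", 5), Request("u2", "D", 400),
    Request("u3", "C", 3),
]
group = first_user_group(log, "A", start=0, end=60)
others = first_resource_set(log, "A", group, window=60)
```

In this toy log, `u3` never requests `A` and so is outside the first user group, and `u2`'s request for `D` falls outside the 60-second window, so only `C` enters the first resource set.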
When the correlation between the first resource and the second resource is calculated in step 102 above, the information about how the first resource and the second resource were requested can be used as the information that reflects correlation. For example, the number of times the first resource was requested within the first user group and the number of times the second resource was requested within the first user group can serve as the basis: if the first resource was requested 1000 times within the first user group and the second resource was also requested 1000 times within the first user group, the first resource and the second resource can be regarded as perfectly correlated. In a particular embodiment, a corresponding parameter can be set to express the strength of correlation; for example, the ratio of the number of times the second resource was requested within the first user group to the number of times the first resource was requested within the first user group expresses how strongly the second resource is correlated with the first resource.
In step 103, in a particular embodiment, the set correlation condition requires that the degree of correlation between the second resource and the first resource reach a certain level. For example, the set correlation condition may be that the ratio of the number of times the first user group requested the second resource to the number of times it requested the first resource is higher than 50%. As another example, the set correlation condition may be that the first user group requested the second resource within 1 minute before or after the time point at which it requested the first resource, and that the ratio of the number of times the first user group requested the second resource to the number of times it requested the first resource is higher than 50%.
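The two example conditions can be sketched as a single check. The function name `meets_condition` and its parameters are hypothetical, and counting only the requests that fall inside the time limit is one possible reading of the second condition, not something the text specifies.

```python
def meets_condition(gaps_seconds, first_count, min_ratio=0.5, max_gap=None):
    """Check a candidate resource against the set correlation condition.

    gaps_seconds: one signed time gap per request for the candidate resource,
                  measured from the same user's request for the first resource.
    first_count:  number of requests for the first resource in the user group.
    max_gap:      if given, only requests within this many seconds count
                  (an assumed interpretation of the 1-minute condition).
    """
    if first_count <= 0:
        raise ValueError("first_count must be positive")
    if max_gap is None:
        qualifying = len(gaps_seconds)
    else:
        qualifying = sum(1 for g in gaps_seconds if abs(g) <= max_gap)
    return qualifying / first_count > min_ratio


gaps = [10, -30, 90, 45, 200, 5]                               # six requests for the candidate
ratio_only = meets_condition(gaps, first_count=10)             # 6/10 exceeds 50%
with_time = meets_condition(gaps, first_count=10, max_gap=60)  # 4/10 does not
```

The comparison is strict because the text asks for a ratio "higher than" 50%, so a ratio of exactly 50% does not qualify.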
In step 104, after the first resource and the second resource have been determined to be related resources, network administrators or the personnel who sort and classify resources can place the first resource and the second resource in the same class according to the related-resource analysis, or extract a label, title, category and so on from the specific information of the first resource and the second resource and assign it to both, so that the first resource and the second resource have the same label, title and category.
In some embodiments of the invention, the step of calculating, from the times at which the first user group requested the resources in the first resource set and the number of such requests, the correlation between each resource in the first resource set and the first resource specifically includes:
For any one resource in the first resource set, calculating a request count parameter of that resource from the number of times the first user group requested it, and calculating a request time parameter of that resource from the time at which the first user group requested it;
Multiplying the request count parameter and the request time parameter by their respective set weights and taking the sum of the products as a first correlation coefficient, which expresses the strength of the correlation of that resource relative to the first resource;
Calculating the correlation between the first resource and that resource from the first correlation coefficient.
In a particular embodiment, the request count parameter of a resource can be the number of times that resource was requested. For example, if the first resource is requested at a time point by 1000 users, and these 1000 users also request resource A in the first resource set 300 times, then the request count parameter of resource A is 300 (times).
The request count parameter of a resource can also be the ratio of the number of times that resource was requested to the number of times the first resource was requested. For example, if the first resource is requested at a time point by 1000 users, and these 1000 users also request resource B in the first resource set 500 times, then the request count parameter of resource B is 500:1000, that is, 0.5.
The request time parameter of a resource can be the absolute value of the difference between the request time of that resource and the request time of the first resource. For example, if the first resource is requested at a time point by 1000 users, and one of these users requests resource C in the first resource set 10 seconds after requesting the first resource, then the request time parameter of resource C is 10 (seconds).
The request time parameter of a resource can also be the value that the absolute difference between the request time of that resource and the request time of the first resource maps to under a preset duration grading rule. For example, suppose the first resource is requested at a time point by 1000 users, and one of these users requests resource D in the first resource set 2.6 seconds before requesting the first resource. If the preset duration grading rule maps 0-1.0 seconds to grade 1, 1.1-2.0 seconds to grade 2, 2.1-3 seconds to grade 3, and so on, then the request time parameter of resource D is 3.
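The four parameter variants above can be sketched as small helper functions. The names are hypothetical, and extending the duration grading rule beyond the three listed grades with a ceiling function is an assumption; the text only gives the first three grades and "and so on".

```python
import math


def count_parameter(resource_count, first_count=None):
    """Raw request count, or its ratio to the first resource's request
    count when `first_count` is supplied."""
    if first_count is None:
        return resource_count
    return resource_count / first_count


def time_parameter(resource_time, first_time):
    """Absolute difference between the two request times, in seconds."""
    return abs(resource_time - first_time)


def time_grade(gap_seconds):
    """Grade of a time gap under the example rule:
    0-1.0 s -> 1, 1.1-2.0 s -> 2, 2.1-3 s -> 3, ...
    Continuing the pattern with ceil() is an assumption of this sketch."""
    return max(1, math.ceil(gap_seconds))


param_a = count_parameter(300)        # resource A: raw count
param_b = count_parameter(500, 1000)  # resource B: ratio
param_c = time_parameter(110, 100)    # resource C: requested 10 s later
param_d = time_grade(2.6)             # resource D: 2.6 s gap
```

With the figures from the text, these evaluate to 300, 0.5, 10 and 3 respectively.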
In the step of calculating the correlation between the first resource and that resource from the first correlation coefficient, the first correlation coefficient can be used directly as the parameter that reflects the strength of correlation and as the basis for deciding whether that resource and the first resource are related resources.
Before the step of obtaining a second resource whose degree of correlation exceeds the set correlation condition, the method further includes:
For any one resource in the first resource set, obtaining the number of users in the user group that requested that resource within a corresponding set time period, together with the number of times and the times at which that user group requested the first resource;
Calculating, from the data obtained in the preceding step, the request count parameter and the request time parameter of the first resource relative to that resource in the first resource set;
Multiplying the request count parameter and the request time parameter of the first resource relative to that resource by their respective set weights and taking the sum of the products as a second correlation coefficient of the first resource relative to that resource;
The step of calculating the correlation between the first resource and that resource from the first correlation coefficient then specifically includes:
Calculating the correlation between the first resource and that resource from the first correlation coefficient and the second correlation coefficient.
As a specific embodiment of the invention, suppose 10000 users request resource A between 10:00 and 10:01; these 10000 users are the first user group. Within 10 seconds after requesting resource A, 1000 users in the first user group also request C and 200 users also request D; within 30 seconds, 1500 users also request C and 500 users also request D; within 1 minute, 1500 users also request C and 900 users also request D; within 5 minutes, 1500 users also request C and 2000 users also request D; and so on. Here resource A is the first resource, and the set formed by resource B, resource C and resource D is the first resource set.
Given data that meets the required statistical volume, the closeness of a relationship is determined from a correlation coefficient calculated to a uniform standard from the proportion of users showing the same behavior and the weight assigned to the time interval of that behavior. Suppose the weights for the reference time parameters are 100 for 10 seconds, 20 for 30 seconds, 10 for 1 minute and 1 for 5 minutes, and suppose the simple approach is used, that is, each ratio is multiplied by its weight and the products are summed. Then the first correlation coefficient of resource C relative to resource A is 100 × 1000/10000 + 20 × 1500/10000 + 10 × 1500/10000 + 1 × 1500/10000 + ... = 14.65, and the first correlation coefficient of resource D relative to resource A is 100 × 200/10000 + 20 × 500/10000 + 10 × 900/10000 + 1 × 2000/10000 + ... = 4.1. The single-user operation, sampling intervals, weights and algorithm used here are chosen only for convenience of illustration.
The correlation between resource A and resource C, and between resource A and resource D, must also take into account the reverse correlation coefficients over the same time points, that is, the coefficient of resource A relative to resource C and of resource A relative to resource D; the relation value is then calculated from the forward and reverse values. In terms of the calculation process, this is equivalent to taking C and D in turn as the first resource and calculating the correlation coefficient of resource A relative to resource C and of resource A relative to resource D.
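The worked example can be reproduced with a few lines of code. This sketch uses only the four time windows and weights listed above (the terms elided by "..." are not modelled), and the function and variable names are hypothetical.

```python
def weighted_coefficient(counts_by_window, group_size, weights):
    """Sum over time windows of weight * (users in window / group size)."""
    return sum(weights[w] * counts_by_window[w] / group_size for w in weights)


# Weights per time window, as given in the example.
WEIGHTS = {"10s": 100, "30s": 20, "1min": 10, "5min": 1}
GROUP_SIZE = 10000  # users who requested resource A between 10:00 and 10:01

# Users who also requested C or D within each window after requesting A.
counts_c = {"10s": 1000, "30s": 1500, "1min": 1500, "5min": 1500}
counts_d = {"10s": 200, "30s": 500, "1min": 900, "5min": 2000}

coeff_c = weighted_coefficient(counts_c, GROUP_SIZE, WEIGHTS)
coeff_d = weighted_coefficient(counts_d, GROUP_SIZE, WEIGHTS)
print(round(coeff_c, 2), round(coeff_d, 2))  # 14.65 4.1
```

The heavy weight on the 10-second window is what separates the two resources: D is requested by more users overall within 5 minutes, but C is requested far more often immediately after A.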
In a particular embodiment, the first correlation coefficient and the second correlation coefficient can be added to give the correlation between two resources. For example, the higher the sum of the first correlation coefficient and the second correlation coefficient of the first resource and the second resource, the higher their degree of correlation.
In some embodiments of the invention, the step of determining the first resource and the second resource to be related resources specifically includes:
Determining the correlation level of the first resource and the second resource according to the correspondence, defined in preset rules, between correlation coefficient values and correlation levels;
The correlation coefficient includes the first correlation coefficient and the second correlation coefficient.
In practice, some resources are not directly related but indirectly related, or are related only weakly. The above embodiment helps determine the differing degrees of correlation among multiple resources.
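A preset rule that maps coefficient values to correlation levels might look like the sketch below. The thresholds and level names are invented for illustration, since the text does not specify them; combining the two coefficients by addition follows the embodiment described earlier.

```python
def combined_coefficient(first_coeff, second_coeff):
    """Combine the forward and reverse coefficients by simple addition."""
    return first_coeff + second_coeff


# Hypothetical preset rule: (lower bound, level), checked from highest to lowest.
LEVEL_RULES = [(20.0, "strong"), (5.0, "moderate"), (1.0, "weak")]


def correlation_level(coefficient, rules=LEVEL_RULES):
    """Return the first level whose lower bound the coefficient reaches."""
    for lower_bound, level in rules:
        if coefficient >= lower_bound:
            return level
    return "unrelated"


level_strong = correlation_level(combined_coefficient(14.65, 9.0))
level_weak = correlation_level(4.1)
level_none = correlation_level(0.2)
```

Keeping the rule as an ordered table rather than hard-coded branches lets the thresholds be tuned without touching the lookup logic.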
In some embodiments of the invention, after the step of determining the first resource and the second resource to be related resources, the method further includes:
From the historical record of resource requests made by users, obtaining the number of users in a second user group that requested the second resource, and a second resource set consisting of the resources, other than the second resource, requested by the second user group;
Calculating, from the times at which the second user group requested the resources in the second resource set and the number of such requests, the correlation between each resource in the second resource set and the second resource;
Obtaining, from the second resource set, a third resource whose correlation coefficient exceeds the set correlation condition;
Determining the third resource and the second resource to be related resources, and determining the third resource and the first resource to be indirectly related resources.
The closeness of the relationship between two resources can be measured from the frequency with which one resource and another are requested by the same people within a certain interval, expressed as a percentage of the frequency with which the resource is requested in that period. Combining a time attribute and a region attribute makes it possible to measure the relationship between certain regions and each resource within a given class of time period; propagating relationships then enriches the relationships between resources and forms a relationship network of resources. For example, most travel enthusiasts usually request Internet resources about tourist attractions in their own country, while a small proportion also request Internet resources about attractions in other countries. The correlation among domestic Internet resources on attractions of the same theme can be determined from the historical request records of the majority of domestic travel users. At the same time, the limited requests that cross-border travel enthusiasts in each country make for same-theme attractions in different countries confirm the correlation between foreign and domestic attraction resources. By propagating these relationships, the correlation among Internet resources about tourist attractions worldwide can be confirmed. Although the resources lie in different hemispheres and countries, region attributes and relationship propagation allow correlations between all Internet resources to be established from a large volume of resource data, which effectively assists the classification of Internet resources.
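Relationship propagation can be illustrated by treating each pair of related resources as an edge in a graph and measuring how many hops separate two resources. The breadth-first search below is one way to do this and is an assumption of this sketch; the text does not prescribe a particular traversal, and the resource names are placeholders.

```python
from collections import deque


def build_graph(related_pairs):
    """Undirected adjacency map built from pairs of directly related resources."""
    graph = {}
    for a, b in related_pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def relation_distances(graph, start):
    """Hop count from `start` to every reachable resource:
    1 means directly related, 2 or more means indirectly related."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    del distances[start]
    return distances


pairs = [("first", "second"), ("second", "third"), ("x", "y")]
graph = build_graph(pairs)
dist = relation_distances(graph, "first")
```

Here the third resource is reached through the second, so it is two hops from the first resource, while `x` and `y` are unreachable and therefore absent from the result.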
In some embodiments of the invention, after the step of determining the first resource and the second resource to be related resources, the method further includes:
Assigning identical first classification information to the first resource and the second resource, so that the first classification information can serve as a basis when Internet resources are sorted;
The first classification information includes a first theme and/or a first label.
Extracting identical classification information, such as theme, label and category, for the related resources identified in the embodiments of the invention gives resource classification operators an effective and accurate basis when they classify Internet resources.
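Assigning identical classification information to a pair of related resources can be sketched as below. The dictionary layout and the function name are hypothetical, and how the theme and labels are extracted from the resources is outside the scope of this example.

```python
def assign_classification(classification, first, second, theme=None, labels=()):
    """Give two related resources the same classification information.

    `classification` maps a resource name to its theme and label set.
    """
    info = {"theme": theme, "labels": frozenset(labels)}
    classification[first] = info
    classification[second] = info
    return classification


catalog = assign_classification({}, "A", "C", theme="travel", labels=["scenic spot"])
```

After the call, looking up either resource in `catalog` returns the same theme and labels, which is the property a person sorting the resources would rely on.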
Further, the present invention also provides an Internet resource analysis device whose structure, as shown in Figure 2, comprises:
First user group obtaining module: configured to obtain, from the historical record of resource requests made by users, the number of users in a first user group that requested a first resource within a set time period;
First resource set obtaining module: configured to obtain a first resource set consisting of the resources, other than the first resource, that the first user group requested within another set time before and after requesting the first resource;
Correlation calculation module: configured to calculate, from the times at which the first user group requested each resource in the first resource set and the number of such requests, the correlation between each resource in the first resource set and the first resource;
Related resource obtaining module: configured to obtain, from the first resource set, a second resource whose degree of correlation exceeds a set correlation condition;
Relationship output module: configured to determine the first resource and the second resource to be related resources.
In some embodiments of the invention, the correlation calculation module specifically includes:
First request count parameter calculation unit: configured to calculate, for any one resource in the first resource set, a request count parameter of that resource from the number of times the first user group requested it;
First request time parameter calculation unit: configured to calculate a request time parameter of that resource from the time at which the first user group requested it;
First correlation coefficient calculation unit: configured to multiply the request count parameter and the request time parameter by their respective set weights and take the sum of the products as a first correlation coefficient, which expresses the strength of the correlation of that resource relative to the first resource;
Correlation output unit: configured to calculate the correlation between the first resource and that resource from the first correlation coefficient.
In some embodiments of the invention, the device further includes:
Second user group obtaining module: configured to obtain, for any one resource in the first resource set, the number of users in the user group that requested that resource within a corresponding set time period;
Second request count and time obtaining module: configured to obtain the number of times and the times at which the user group that requested that resource requested the first resource;
Second request count and time parameter calculation module: configured to calculate, from the data obtained by the second user group obtaining module and the second request count and time obtaining module, the request count parameter and the request time parameter of the first resource relative to that resource in the first resource set;
Second correlation coefficient calculation module: configured to multiply the request count parameter and the request time parameter of the first resource relative to that resource by their respective set weights and take the sum of the products as a second correlation coefficient of the first resource relative to that resource;
The correlation output unit calculates the correlation between the first resource and that resource from the first correlation coefficient and the second correlation coefficient.
In some embodiments of the invention, the relationship output module specifically includes:
Correlation level calculation unit: configured to determine the correlation level of the first resource and the second resource according to the preset relationship between correlation coefficient values and correlation levels;
The correlation coefficient includes the first correlation coefficient and the second correlation coefficient.
In some embodiments of the invention, the device further includes:
First classification information obtaining module: configured to assign identical first classification information to the first resource and the second resource, so that the first classification information can serve as a basis when Internet resources are sorted;
The first classification information includes a first theme and/or a first label.
As can be seen from the above, the Internet resource analysis method and device provided by the invention analyze the relationships between Internet resources according to users' resource-request behavior, so that those relationships are derived from objective behavior. Determining the relationships between Internet resources also facilitates sorting and classifying them, giving the personnel responsible for sorting Internet resources an effective auxiliary basis for classification.
It should be understood that the embodiments described in this specification serve only to illustrate and explain the present invention and are not intended to limit it. Where no conflict arises, the embodiments of the present application and the features in those embodiments can be combined with one another.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If such modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.