CN101634995B - A Machine Learning-Based Network Connection Speed Prediction Method - Google Patents

A Machine Learning-Based Network Connection Speed Prediction Method

Publication number
CN101634995B
Authority
CN
China
Prior art keywords
user
website
neural network
training set
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009101021269A
Other languages
Chinese (zh)
Other versions
CN101634995A (en)
Inventor
徐颂华
江浩
金涛
刘智满
潘云鹤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN2009101021269A
Publication of CN101634995A
Application granted
Publication of CN101634995B
Expired - Fee Related
Anticipated expiration

Abstract

The invention discloses a machine learning-based method for predicting network connection speed. The method comprises the following steps: (1) using a customized browser to record the connection speed between a user and each website the user has browsed, and using the records as a training set and a test set; (2) using the recorded website connection speeds to train a neural network and to predict the connection speed between the user and every website in the training set; (3) depending on how much the prediction error of the neural networks has decreased, either proceeding to step (4), or dividing the training set into smaller training sets and repeating step (2) on each of them; (4) using a decision tree to test the prediction performance of the neural networks; and (5) using the decision tree and the neural networks to predict the connection speed between the user and an unknown website. By applying artificial intelligence and machine learning to predict the connection speed between the user and each website, the invention improves the accuracy of network condition assessment and makes fuller use of the user's bandwidth, providing the user with a better internet experience.

Description

A machine learning-based network connection speed prediction method
Technical field
The present invention relates to the field of computer search and web technology, and in particular to a machine learning-based network connection speed prediction method.
Background technology
In recent years, a series of research efforts has studied personalized, user-oriented search engines and algorithms, for example the article "A large-scale evaluation and analysis of personalized search strategies", published in 2007 at the 16th International Conference on World Wide Web (WWW '07: Proceedings of the 16th International Conference on World Wide Web). In the article "A user-oriented webpage ranking algorithm based on user attention time", presented in 2008 at the 23rd conference of the American Association for Artificial Intelligence, the authors likewise propose a personalized solution for building a user-oriented web search engine. The present invention is dedicated to optimizing the network connection conditions of an individual user. It studies how to select the best network connection for a given user, a question rarely addressed in earlier research and inventions.
Because quality of service is critical in web browsing and many other kinds of network access, any method that improves it has great commercial value. A number of solutions have been proposed, and some have been put into commercial use. Among these, the most successful large-scale commercial software relies on a simple idea: automatically opening connections to multiple network content providers and downloading or accessing content in parallel. One example is Xunlei (http://www.xunlei.com/), one of the most popular Chinese software products. With this kind of program, however, the websites of web content providers are badly affected, because their pages are accessed by an automated program rather than by the end user, so the online advertisements on those pages lose their value. This problem has led to several lawsuits between companies that provide such quality-of-service improvement services and website content providers. The present invention proposes a data mining-based method for predicting a user's network conditions. The predictions can serve as an important factor in building a personalized website recommendation system, and thereby provide the best quality of service for the individual user.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a machine learning-based network connection speed prediction method.
The machine learning-based network connection speed prediction method comprises the following steps:
1) use a custom browser to record the connection speed of the websites the user has browsed, as a training set and a test set;
2) use the recorded website connection speeds to train a neural network and predict the connection speed between the user and each website in the training set;
3) depending on how much the prediction error of the neural network has decreased, either execute step 4), or divide the training set into smaller training sets and re-execute step 2) on each of them;
4) use a decision tree to test the prediction performance of the neural networks;
5) use the decision tree and the neural networks to predict the connection speed between the user and any unknown website.
The step of using a custom browser to record the connection speed of the websites the user has browsed, as a training set and a test set, comprises:
(a) for each website the user visits, record the interval between each time the user sends an access request to the website and the time the user receives the response; this is the user connection time of the website;
(b) for each website the user visits, record the download speed each time the user downloads data from the website; this is the user bandwidth of the website;
(c) if the user has visited a website more than once, take the mean of the user connection times from the most recent week, or from the most recent 10 visits, as the user connection time of the website, and the mean of the user bandwidths from the most recent week, or from the most recent 10 visits, as the user bandwidth of the website;
(d) randomly select 10% of the user's historical data as the test set; the remaining 90% is the training set.
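Steps (c) and (d) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the `(timestamp, value)` record layout and the fixed random seed are assumptions, and so is the reading of "most recent week or most recent 10 visits" as "use the last week's visits if there are any, otherwise fall back to the 10 most recent".

```python
import random
from datetime import datetime, timedelta
from statistics import mean

def recent_average(samples, now, max_count=10, window=timedelta(days=7)):
    """Average a site's measurements as in step (c).

    samples: list of (timestamp, value) pairs for one website.
    Assumption: prefer visits from the last week; if there are none,
    use the max_count most recent visits instead.
    """
    ordered = sorted(samples, key=lambda s: s[0])
    in_window = [value for ts, value in ordered if now - ts <= window]
    chosen = in_window if in_window else [value for _, value in ordered[-max_count:]]
    return mean(chosen)

def split_history(records, test_fraction=0.1, seed=0):
    """Randomly hold out 10% of the history as the test set, as in step (d)."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)
```

The same `recent_average` helper would be applied once to the connection times and once to the bandwidths of each website.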
The step of using the recorded website connection speeds to train a neural network and predict the connection speed between the user and each website in the training set comprises:
(e) build an artificial neural network whose input is the feature data of a website: the network IP address expressed as a 32-bit integer, and one integer between 0 and 23 representing the hour of the current time; its output is 2 real numbers, the estimated connection time and the estimated bandwidth between the user and the website;
(f) using the user connection time and user bandwidth history obtained in steps (a)-(d) as the training set, train the neural network built in step (e) with the back-propagation algorithm, and save the trained neural network.
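The two inputs named in step (e) can be derived as in the sketch below. The function names are hypothetical, and the scaling of both inputs into [0, 1] is an assumption added here because raw 32-bit integers are awkward inputs for sigmoid units; the patent does not say whether or how the inputs are normalized.

```python
import ipaddress

def encode_site(ip: str, hour: int) -> tuple[int, int]:
    """Return the raw inputs of step (e): the IPv4 address as a 32-bit
    integer and the current hour as an integer between 0 and 23."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    return int(ipaddress.IPv4Address(ip)), hour

def scale_inputs(ip_int: int, hour: int) -> list[float]:
    """Assumption (not in the patent): map both inputs into [0, 1]."""
    return [ip_int / (2**32 - 1), hour / 23]
```

For example, `encode_site("192.168.0.1", 14)` yields the integer form of the address together with the hour, and `scale_inputs` turns that pair into two floating-point inputs.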
The step of either executing step 4), or dividing the training set into smaller training sets and re-executing step 2) on each of them, depending on how much the prediction error of the neural network has decreased, comprises:
(g) use the neural network trained in step (f) to predict the user connection time and user bandwidth of each website in the training set, and compute the error e between the predicted and actual values for the website:
e=t+Kb*b
where t is the prediction error of the user connection time, in milliseconds; b is the prediction error of the user bandwidth, in kilobits per second; and Kb is a coefficient that usually takes a value of 200~1000;
(h) if this is not the first time step (g) has been executed, and the total prediction error differs from the previous total prediction error by no more than 3%, jump to step (k);
(i) sort the website data in the training set in ascending order of the prediction error from step (g), and use the k-nearest-neighbor clustering algorithm to divide the websites into m groups, where m is an integer between 1 and 5 chosen to maximize the difference in average prediction error between the groups;
(j) for each group of websites from step (i), take it as a training set and jump to step (e);
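The error formula of step (g) and the stopping test of step (h) can be sketched as follows. Several details are assumptions: the function names, the default Kb of 500 (chosen from the stated 200~1000 range), taking t and b as absolute differences, and measuring the 3% change relative to the previous total.

```python
def prediction_error(predicted, actual, kb=500.0):
    """Combined error e = t + Kb*b for one website.

    predicted, actual: (connection_time_ms, bandwidth_kbps) pairs.
    Assumption: t and b are absolute differences.
    """
    t = abs(predicted[0] - actual[0])  # connection time error, ms
    b = abs(predicted[1] - actual[1])  # bandwidth error, kilobits per second
    return t + kb * b

def has_converged(previous_total, current_total, tolerance=0.03):
    """Step (h): stop once the total error changes by no more than 3%.

    previous_total is None on the first pass, which never counts as converged.
    Assumption: the change is measured relative to the previous total.
    """
    if previous_total is None:
        return False
    if previous_total == 0:
        return current_total == 0
    return abs(current_total - previous_total) <= tolerance * previous_total
```

In a training loop, the per-website errors would be summed after each pass and the sum handed to `has_converged` together with the sum from the previous pass.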
The step of using a decision tree to test the prediction performance of the neural networks comprises:
(k) divide the websites in the test set obtained in step (d) into 1000 groups by network IP address, numbered 1~1000; supposing that n neural networks were finally used in steps (e)-(j), record for each website the number, between 1 and n, of the neural network last used to train on it when steps (e)-(j) finished;
(l) build a decision tree whose input is the network IP address group number, with a value of 1~1000, and whose output is the neural network number, with a value of 1~n;
(m) using the test set data obtained in step (d), train the decision tree built in step (l) with the C4.5 decision tree algorithm, and save the trained decision tree.
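The patent does not say how IP addresses are divided into 1000 groups. The sketch below assumes the simplest reading, equal-width slices of the 32-bit address space, and shows how the (group number, neural network number) pairs for the decision tree of steps (k)-(m) could be assembled. The names `ip_group` and `tree_examples` are hypothetical, and the C4.5 training itself is not shown.

```python
import ipaddress

def ip_group(ip: str, n_groups: int = 1000) -> int:
    """Map an IPv4 address to a group number between 1 and n_groups.

    Assumption: groups are equal-width slices of the 32-bit address space.
    """
    value = int(ipaddress.IPv4Address(ip))
    return value * n_groups // 2**32 + 1

def tree_examples(site_ips, network_of_site):
    """Pair each site's group number (tree input) with the number of the
    neural network last trained on it (tree output)."""
    return [(ip_group(ip), network_of_site[ip]) for ip in site_ips]
```

Any other grouping, for example by address prefix or by ISP allocation, would fit the same interface as long as it returns an integer between 1 and 1000.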
The step of using the decision tree and the neural networks to predict the connection speed between the user and any unknown website comprises:
(n) for any website whose connection speed is unknown, obtain its network IP address group number, between 1 and 1000, as described in step (k), and use the decision tree obtained in step (m) to obtain the corresponding neural network number;
(o) use the corresponding neural network to predict the user connection time and user bandwidth of the website.
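Steps (n) and (o) amount to a two-stage lookup, sketched below. All three collaborators are passed in as plain callables, which is a simplification made for this illustration: `group_of` stands for the IP grouping of step (k), `tree` for the trained decision tree, and `networks` for the saved neural networks keyed by number. None of these names come from the patent.

```python
def predict_site(ip, hour, group_of, tree, networks):
    """Route an unknown website to one neural network and query it.

    group_of: callable, IP address -> group number (1..1000)
    tree:     callable, group number -> neural network number (1..n)
    networks: mapping, network number -> callable(ip, hour) returning
              (connection_time_ms, bandwidth_kbps)
    """
    group = group_of(ip)              # step (n): find the IP address group
    network_id = tree(group)          # step (n): decision tree picks a network
    return networks[network_id](ip, hour)  # step (o): that network predicts
```

Because only one neural network is evaluated per query, the cost of a prediction does not grow with the number of networks in the group.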
The present invention makes effective use of artificial intelligence, applying several machine learning methods to predict the connection speed between the user and each website. This improves the accuracy of network condition assessment, allows network resources to be used in a way that exploits the user's bandwidth more fully, and provides the user with a better internet experience.
Description of drawings
Fig. 1 is a flow chart of an embodiment of the machine learning-based network connection speed prediction method;
Fig. 2 is a flow chart of the present invention applied to a personalized network resource recommendation system;
Fig. 3 is a schematic diagram of the virtual network architecture used in the virtual network experiments;
Fig. 4 is a schematic diagram of the prediction error when the number of neural networks in the artificial neural network group is limited; the horizontal axis is the number of neural networks in the group, and the vertical axis is the prediction error when the group and the decision tree are used to predict the data in the training set; the three curves show the prediction error when the training set contains 10000, 50000 and 100000 records, respectively;
Fig. 5 plots the data obtained from experiments in the virtual network shown in Fig. 3, which simulates internet conditions; panels (a)-(f) correspond to the data in Table 1 (a)-(f), drawn as performance improvement percentages; in each panel, the horizontal axis is the number of neural networks in the artificial neural network group, and the vertical axis is the percentage improvement in data download speed obtained when the group and the decision tree are used to predict the connection speed of unknown websites for the network resource recommendation system; in the experiments of (a)-(c), the user is placed on a computer far from the network root, simulating a dial-up user, with 10000, 50000 and 100000 user history records available in advance; in the experiments of (d)-(f), the user is placed on a computer near the network root, simulating a broadband user, with 10000, 50000 and 100000 user history records available in advance; each data point is the average over 100 repetitions of the same experimental procedure, and the unit of the experimental data is milliseconds (ms);
Fig. 6 shows experimental data for the embodiment of the invention on the real Chinese internet; panels (a)-(c) show the time required for users to download text, PDF documents and online game installation files, respectively; in each panel, the horizontal axis lists the 20 participating users, numbered #1~#20, and the vertical axis is the time taken to download the network resource data; the bars show, under the same conditions, the data without the network resource recommendation system incorporating the embodiment, the data with it, and the data for Xunlei in its single-site download mode; the data of every user is shown; each data point is the average over 100 repetitions with similar resources; the average sizes of the three classes of resource are 10.6K bytes, 3.49M bytes and 784M bytes, respectively; the unit of the experimental data is seconds (sec.).
Embodiment
As shown in Fig. 1, the method comprises two parts, a training stage and a prediction stage. The training stage comprises user historical data 10, training set 20, test set 30, artificial neural network 40, error judgment 50, training set splitting 60, artificial neural network group 70 and C4.5 decision tree 80. The prediction stage comprises unknown website 90, C4.5 decision tree 80, artificial neural network group 70 and connection speed predicted value 99.
User historical data 10: the user connection time on each visit to a site and the user bandwidth during data transfer. The user connection time of a website is defined as the interval between the user sending an access request to the website and receiving its response; if the user has visited the website more than once, the mean of the user connection times from the most recent week or the most recent 10 visits is used. The user bandwidth of a website is defined as the download speed when the user downloads data from the website; if the user has visited the website more than once, the mean of the user bandwidths from the most recent week or the most recent 10 visits is used.
Training set 20: after 10% of user historical data 10 has been randomly selected as the test set, the remaining 90% serves as the training set;
Test set 30: 10% of the user historical data, selected at random, serves as the test set;
Artificial neural network 40: the embodiment uses a 4-layer artificial neural network. The input layer takes the feature data of a website: the network IP address expressed as a 32-bit integer, and one integer between 0 and 23 representing the hour of the current time. The output layer is 2 real numbers, the estimated connection time and the estimated bandwidth between the user and the website. Each neuron in the other two layers is a sigmoid function, and every pair of neurons in adjacent layers is connected. The network is trained continuously in the background on user historical data 10 using the back-propagation algorithm.
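The forward pass of such a 4-layer, fully connected network can be sketched in plain Python as below. Only the layer count, the two inputs, the two outputs and the sigmoid hidden units come from the text; the hidden layer width of 8, the random weight range, the linear output units and the assumption that the inputs have already been scaled to [0, 1] are illustrative choices, and back-propagation training is omitted.

```python
import math
import random

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def make_layer(n_in, n_out, rng):
    # Each neuron holds n_in weights followed by one bias term.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def build_network(hidden=8, seed=0):
    """2 inputs -> hidden -> hidden -> 2 outputs (hidden width is an assumption)."""
    rng = random.Random(seed)
    return [make_layer(2, hidden, rng),
            make_layer(hidden, hidden, rng),
            make_layer(hidden, 2, rng)]

def forward(network, inputs):
    """Propagate [scaled_ip, scaled_hour] through the fully connected layers.

    Hidden layers apply the sigmoid; the output layer is left linear
    (an assumption) so it can produce unbounded real estimates.
    """
    values = list(inputs)
    for index, layer in enumerate(network):
        sums = [sum(w * v for w, v in zip(neuron[:-1], values)) + neuron[-1]
                for neuron in layer]
        is_output = index == len(network) - 1
        values = sums if is_output else [sigmoid(s) for s in sums]
    return values  # [connection time estimate, bandwidth estimate]
```

An untrained network built with `build_network()` returns arbitrary values; the estimates only become meaningful after the weights have been fitted to the user's history.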
Error judgment 50: use the trained artificial neural network 40 to predict the user connection time and user bandwidth of each website in training set 20, and compute the error e between the predicted and actual values for the website:
e=t+Kb*b
where t is the prediction error of the user connection time, in milliseconds; b is the prediction error of the user bandwidth, in kilobits per second; and Kb is a coefficient that usually takes a value of 200~1000. If this is not the first time this step has been executed and the current total prediction error differs from the previous total by no more than 3%, training of the neural networks ends, and all neural networks at that point are saved as artificial neural network group 70.
Training set splitting 60: sort the site data in training set 20 in ascending order of its prediction error under artificial neural network 40, and use the k-nearest-neighbor clustering algorithm to divide the sites into m groups, where m is an integer between 1 and 5 chosen to maximize the difference in average prediction error between the groups. Each of the m groups is then taken as a new training set 20, and execution jumps back to artificial neural network 40.
Artificial neural network group 70: in error judgment 50, if the total prediction error of all artificial neural networks over all training data differs by no more than 3% from the total before the last training set splitting 60, the artificial neural networks at that point form the artificial neural network group. Fig. 4 shows the prediction error when the number of neural networks in the group is limited, which demonstrates the need to build a group of neural networks and to split the training set.
C4.5 decision tree 80: all websites in test set 30 are divided into 1000 groups by network IP address, numbered 1~1000. If artificial neural network group 70 contains n artificial neural networks, then for each website in training set 20 the number, between 1 and n, of the neural network in group 70 last used to train on it is recorded. The input of the decision tree is the network IP address group number, with a value of 1~1000, and its output is the neural network number, with a value of 1~n. The decision tree is trained on test set 30 with the C4.5 decision tree algorithm, and the trained tree is saved;
Unknown website 90: a website on the internet whose connection speed is unknown;
Connection speed predicted value 99: for unknown website 90, obtain its network IP address group number, between 1 and 1000, use C4.5 decision tree 80 to obtain the corresponding neural network number, and then use the corresponding neural network in artificial neural network group 70 to predict the user connection time and user bandwidth of the website.
An important application of the present invention is a personalized network resource recommendation system, whose flow structure is shown in Fig. 2. The system comprises two parts, a front end and a back end: the back end comprises custom browser 100 and resource recommendation result 700; the front end comprises user historical data 200, machine learning-based network connection speed prediction 300, general search engine 400, basic search results 500, and search result merging and adjustment 600.
Custom browser 100: a plug-in module embedded in an existing web browser, such as Firefox or Internet Explorer, that records the user connection time on each visit to a site and the user bandwidth during data transfer.
User historical data 200: the user connection time on each visit to a site and the user bandwidth during data transfer, as obtained by custom browser 100. The user connection time of a website is defined as the interval between the user sending an access request to the website and receiving its response; if the user has visited the website more than once, the mean of the user connection times from the most recent week or the most recent 10 visits is used. The user bandwidth of a website is defined as the download speed when the user downloads data from the website; if the user has visited the website more than once, the mean of the user bandwidths from the most recent week or the most recent 10 visits is used.
Machine learning-based network connection speed prediction 300: uses the machine learning-based network connection speed prediction method of the present invention to predict the user connection time and user bandwidth of each website in the basic search results.
General search engine 400: provides a user interface that calls a network resource search service. In this embodiment the interface is implemented in JSP; when the user submits a query, a general web search engine (such as Google) is called to obtain the search results.
Basic search results 500: after a search with general search engine 400, the results page is parsed to obtain the first 100 returned results; if a resource is a document, the document is downloaded and stored locally.
Search result merging and adjustment 600: 1) Merging: if the network resource the user needs is text, then for every pair of items in the search results, the "Code for estimating document similarity" from Microsoft's open source code package is used to compute their text similarity; if the similarity is greater than 95%, the two items are marked as having identical content. If the network resource the user needs is in another form, then for every pair of items in the search results, 10 offsets are chosen at random and 1K bytes of data at each offset in the two data files are compared; if the two files are identical at all 10 positions, they are marked as having identical content. All websites with identical content are then merged into the earliest search result item in which that content appears, forming a single search result item. 2) Adjustment: if the size of the network resource the user needs is less than 100K, then within each search result item that contains two or more websites, the websites are re-sorted in ascending order of the user connection time estimated by network connection speed prediction 300; if the size is greater than 100K, the websites within each such item are re-sorted in descending order of the user bandwidth estimated by network connection speed prediction 300;
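The sampled comparison for non-text resources and the size-dependent re-sorting can be sketched as follows. The function names and the dictionary keys are hypothetical. Three details are assumptions the text leaves open: files of different length are treated as different, files shorter than one 1K-byte sample are compared in full, and a resource of exactly 100K bytes is treated as large.

```python
import random

def same_content(data_a: bytes, data_b: bytes, n_offsets=10, chunk=1024, seed=0):
    """Compare two files at n_offsets random positions, 1K bytes each."""
    if len(data_a) != len(data_b):   # assumption: unequal sizes never match
        return False
    if len(data_a) <= chunk:         # assumption: small files are compared fully
        return data_a == data_b
    rng = random.Random(seed)
    for _ in range(n_offsets):
        offset = rng.randrange(len(data_a) - chunk + 1)
        if data_a[offset:offset + chunk] != data_b[offset:offset + chunk]:
            return False
    return True

def order_mirrors(mirrors, resource_size_bytes, threshold=100 * 1024):
    """Sort the websites inside one merged search result item.

    mirrors: list of dicts with predicted 'connect_ms' and 'bandwidth_kbps'.
    Small resources: shortest predicted connection time first.
    Large resources: highest predicted bandwidth first.
    """
    if resource_size_bytes < threshold:
        return sorted(mirrors, key=lambda m: m["connect_ms"])
    return sorted(mirrors, key=lambda m: m["bandwidth_kbps"], reverse=True)
```

The rationale for the two orderings is that the transfer time of a small resource is dominated by connection latency, while that of a large resource is dominated by throughput.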
Resource recommendation result 700: the user-oriented, personalized resource recommendation result obtained after search result merging and adjustment 600. Because the recommendation takes full account of the user's own network conditions, network resources can be used in a way that exploits the user's bandwidth more fully, providing the user with a better internet experience.
The experimental results in Tables 1~2 clearly demonstrate the superiority of the method;
Table 1 gives the data obtained from experiments in a virtual network that simulates internet conditions; the structure of this virtual network is shown in Fig. 3. Each website sits within a multi-level network structure organized by a number of gateways. The virtual network contains about 30000 computers in total, distributed among three different Internet service providers (ISPs). For gateways at every level, the closer a gateway is to the network root, the smaller its delay and the larger the bandwidth of its nearby links; the farther it is from the network root, the larger its delay and the smaller the bandwidth of its nearby links. The delay of a backbone gateway between different ISPs is about 1/100 of that of a gateway inside a single ISP, and its bandwidth is about 50 times as large. In our experiments, we created 500 different resource data items, made 2000 copies of each, and distributed them at random among the computers in the virtual network. For each resource query, we assume that the search engine returns 90% of the relevant websites, in random order. We assume that the probability of the user clicking the i-th item in the returned site list is
Figure G2009101021269D00091
Table 1 lists the total time a user spends obtaining the required resource before and after using the network resource recommendation system incorporating an embodiment of the method. The columns give the time required to obtain the resource when its size is 10K bytes, 1M bytes and 100M bytes. The rows give the data without the network resource recommendation system incorporating the embodiment, and with it when the artificial neural network group is limited to 1, 5, 10, 50 and 100 neural networks. Each value is the average over 100 repetitions of the same experimental procedure, in milliseconds (ms). In the experiments of Table 1 (a)-(c), the user is placed on a computer far from the network root, simulating a dial-up user, with 10000, 50000 and 100000 user history records available in advance. In the experiments of Table 1 (d)-(f), the user is placed on a computer near the network root, simulating a broadband user, with 10000, 50000 and 100000 user history records available in advance. All data in Table 1 (a)-(f) are also plotted, as performance improvement percentages, in the corresponding panels of Fig. 5 (a)-(f).
Table 1
(a) Simulated dial-up network, 10,000 user history records

| Number of neural networks | 10 KB (ms) | 1 MB (ms) | 100 MB (ms) |
|---|---|---|---|
| 0 (invention not used) | 285.3 | 23472 | 2290176 |
| 1 | 233.7 | 17205 | 1662668 |
| 5 | 117.8 | 10445 | 846298 |
| 10 | 97.0 | 8098 | 785530 |
| 50 | 87.1 | 6957 | 695418 |
| 100 | 88.3 | 6329 | 718152 |
(b) Simulated dial-up network, 50,000 user history records

| Number of neural networks | 10 KB (ms) | 1 MB (ms) | 100 MB (ms) |
|---|---|---|---|
| 0 (invention not used) | 296.6 | 23631 | 2394101 |
| 1 | 238.8 | 18149 | 1941615 |
| 5 | 118.6 | 8791 | 945567 |
| 10 | 77.1 | 6940 | 825965 |
| 50 | 76.3 | 5334 | 631479 |
| 100 | 73.3 | 5404 | 598086 |
(c) Simulated dial-up network, 100,000 user history records

| Number of neural networks | 10 KB (ms) | 1 MB (ms) | 100 MB (ms) |
|---|---|---|---|
| 0 (invention not used) | 269.8 | 22255 | 2250904 |
| 1 | 195.6 | 17381 | 1609396 |
| 5 | 80.1 | 7077 | 841838 |
| 10 | 54.8 | 4874 | 567228 |
| 50 | 41.5 | 4015 | 457065 |
| 100 | 41.5 | 3387 | 338572 |
(d) Simulated broadband network, 10,000 user history records

| Number of neural networks | 10 KB (ms) | 1 MB (ms) | 100 MB (ms) |
|---|---|---|---|
| 0 (invention not used) | 135.8 | 3808 | 486680 |
| 1 | 133.1 | 4116 | 476460 |
| 5 | 123.1 | 3610 | 435579 |
| 10 | 124.5 | 3397 | 346516 |
| 50 | 102.5 | 3046 | 305148 |
| 100 | 98.9 | 2871 | 279206 |
(e) Simulated broadband network, 50,000 user history records

| Number of neural networks | 10 KB (ms) | 1 MB (ms) | 100 MB (ms) |
|---|---|---|---|
| 0 (invention not used) | 140.2 | 4597 | 369387 |
| 1 | 142.4 | 4951 | 361999 |
| 5 | 125.9 | 4275 | 340575 |
| 10 | 108.7 | 3480 | 271869 |
| 50 | 106.1 | 3273 | 253768 |
| 100 | 103.5 | 3167 | 249336 |
(f) Simulated broadband network, 100,000 user history records

| Number of neural networks | 10 KB (ms) | 1 MB (ms) | 100 MB (ms) |
|---|---|---|---|
| 0 (invention not used) | 175.8 | 8012 | 494828 |
| 1 | 180.9 | 7403 | 510168 |
| 5 | 152.2 | 6794 | 422088 |
| 10 | 114.8 | 3332 | 220693 |
| 50 | 101.8 | 2660 | 196942 |
| 100 | 97.2 | 2732 | 191004 |
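The description states that Fig. 5 plots the Table 1 data as percentage performance improvement, but it does not give the formula. The following is a minimal sketch of one plausible reading, assuming improvement is measured relative to the "invention not used" row as (baseline - time) / baseline. The names `TABLE_1A`, `improvement_percent`, and `improvements` are illustrative and do not come from the patent; the numbers are copied from Table 1(a).

```python
# Times in ms from Table 1(a): simulated dial-up network, 10,000 history records.
# Key: number of neural networks (0 = invention not used).
# Value: (10 KB, 1 MB, 100 MB) retrieval times.
TABLE_1A = {
    0: (285.3, 23472, 2290176),
    1: (233.7, 17205, 1662668),
    5: (117.8, 10445, 846298),
    10: (97.0, 8098, 785530),
    50: (87.1, 6957, 695418),
    100: (88.3, 6329, 718152),
}


def improvement_percent(baseline_ms, with_system_ms):
    """Relative time saved versus the baseline, in percent (assumed definition)."""
    return 100.0 * (baseline_ms - with_system_ms) / baseline_ms


def improvements(table):
    """Percentage improvement for every non-baseline row of one sub-table."""
    baseline = table[0]
    return {
        n: tuple(improvement_percent(b, t) for b, t in zip(baseline, times))
        for n, times in table.items()
        if n != 0
    }


if __name__ == "__main__":
    for n, row in improvements(TABLE_1A).items():
        print(n, [f"{p:.1f}%" for p in row])
```

Under this definition a negative value would mean the system made retrieval slower, which is how the 1-network rows of Table 1(d)-(f) would appear for some resource sizes.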
Table 2 gives experimental data comparing the present invention with the Xunlei (Thunder) download software in the virtual network. Tables 2(a)-(c) show the data when the user history is limited to 10,000, 50,000, and 100,000 records, respectively. In each table, the columns give the time needed to obtain a resource of 10 KB, 1 MB, and 100 MB; the rows give the data when the system of the embodiment is not used and when the network resource recommendation system incorporating the embodiment is used. To better show the distinctive effect of the method, the data for Xunlei under the same conditions (simulating its single-site download mode) are also listed for comparison. Each value is the average over 100 repetitions with similar resources, in milliseconds (ms).
Table 2
(a) 10,000 user history records

| Configuration | 10 KB (ms) | 1 MB (ms) | 100 MB (ms) |
|---|---|---|---|
| Invention not used | 285.3 | 23472 | 2290176 |
| Invention used | 88.3 | 6329 | 718152 |
| Xunlei (Thunder) | 254.4 | 19603 | 1951160 |
(b) 50,000 user history records

| Configuration | 10 KB (ms) | 1 MB (ms) | 100 MB (ms) |
|---|---|---|---|
| Invention not used | 296.6 | 23631 | 2394101 |
| Invention used | 73.3 | 5404 | 590086 |
| Xunlei (Thunder) | 253.6 | 18858 | 1855428 |
(c) 100,000 user history records

| Configuration | 10 KB (ms) | 1 MB (ms) | 100 MB (ms) |
|---|---|---|---|
| Invention not used | 269.8 | 22255 | 2250904 |
| Invention used | 41.5 | 3387 | 338572 |
| Xunlei (Thunder) | 204.8 | 15710 | 1482256 |
Fig. 6 shows experimental data for the embodiment of the invention on the real Chinese Internet. In the experiment of Fig. 6, 10 dial-up users from different regions (users #1 to #10) and 10 broadband users from different regions (users #11 to #20) used the network resource recommendation system incorporating the embodiment. After two weeks of use, they accessed three typical kinds of Internet resources: text, PDF documents, and online game installation files. Fig. 6(a)-(c) show the time the users needed to download text, PDF documents, and online game installation files, respectively. In each panel, the bars give the data without the recommendation system and with it. To better show the distinctive effect of the method, the data for the Xunlei (Thunder) download software under the same conditions (restricted to its single-site download mode) are also shown as bars for comparison. The data for every user are listed in the figure. Each value is the average over 100 repetitions with similar resources. The average sizes of the three classes of resources are 10.6 KB, 3.49 MB, and 784 MB, respectively. The unit is seconds (sec.).
The above experiments show that the present invention makes effective use of the user's historical web access records and uses artificial intelligence methods to predict the connection speed between the user and each website. By taking the user's individual network conditions into account during network access, it allows network resources to be used in a way that exploits the user's bandwidth more fully and provides the user with a better Internet experience.
The above is only a preferred embodiment of the machine-learning-based network connection speed prediction method of the present invention and is not intended to limit the scope of its substantive technical content. The substantive technical content of the method is defined broadly in the claims; any technical entity or method accomplished by others that is identical to what the claims define, or is an equivalent variation of it, shall be regarded as falling within the scope of protection of this patent.

Claims (1)

Translated from Chinese
1. A machine-learning-based method for predicting network connection speed, characterized by comprising the following steps:

1) using a customized browser, recording the connection speed of the websites the user has browsed, to serve as a training set and a test set;
2) using the recorded website connection speeds, training a neural network and predicting the connection speed between the user and each website in the training set;
3) depending on how the prediction error of the neural network has decreased, either performing step 4), or dividing the training set into smaller training sets and returning to step 2) for each of them;
4) using a decision tree to test the prediction performance of the neural networks;
5) using the decision tree and the neural networks to predict the connection speed between the user and any unknown website.

The step of using a customized browser to record the connection speed of the websites the user has browsed, to serve as a training set and a test set, comprises:

(a) for each website the user has visited, recording the time interval between each access request sent by the user and the response received, as the user connection time of the website;
(b) for each website the user has visited, recording the download speed each time the user downloads data from the website, as the user bandwidth of the website;
(c) if the user has visited the website several times, taking the average user connection time over the last week or the last 10 visits as the user connection time of the website, and the average user bandwidth over the last week or the last 10 visits as the user bandwidth of the website;
(d) randomly selecting 10% of the user history data as the test set and using the remaining 90% as the training set.

The step of using the recorded website connection speeds to train a neural network and predict the connection speed between the user and each website in the training set comprises:

(e) establishing an artificial neural network whose input is the feature data of a website, consisting of a network IP address expressed as a 32-bit integer and one integer between 0 and 23 representing the hour of the current time, and whose output is two real numbers representing the estimated connection time and the estimated bandwidth between the user and the website;
(f) training the neural network established in step (e) with the back-propagation algorithm, using the user connection time and user bandwidth history obtained in steps (a)-(d) as the training set, and saving the trained neural network.

The step of either performing step 4), or dividing the training set into smaller training sets and returning to step 2) for each of them, depending on how the prediction error of the neural network has decreased, comprises:

(g) using the neural network trained in step (f) to predict the user connection time and user bandwidth of each website in the training set, and computing the prediction error e of the website:

$$e = t + K_b \cdot b$$

where t is the prediction error of the user connection time, in milliseconds; b is the prediction error of the user bandwidth, in kilobits per second; and K_b is a coefficient with a value between 200 and 1000;
(h) if step (g) is not being executed for the first time, and the sum of the prediction errors e differs from the previous sum of prediction errors by no more than 3%, jumping to step (k);
(i) sorting the website data in the training set by their prediction error e from step (g) in ascending order, and using a k-nearest-neighbor clustering algorithm to divide the websites into m groups, where m is an integer between 1 and 5 chosen so that the difference in average prediction error between the groups is largest;
(j) for each group of websites from step (i), taking the group as a training set and jumping to step (e).

The step of using a decision tree to test the prediction performance of the neural networks comprises:

(k) dividing the websites in the test set obtained in step (d) into 1000 groups according to their network IP addresses, numbered 1 to 1000; supposing that n neural networks were finally used in the course of steps (e)-(j), recording for each site the number, between 1 and n, of the neural network last used to train it when steps (e)-(j) finished;
(l) establishing a decision tree whose input is a network IP address group number between 1 and 1000 and whose output is a neural network number between 1 and n;
(m) using the test set data obtained in step (d), training the decision tree established in step (l) with the C4.5 decision tree algorithm, and saving the trained decision tree.

The step of using the decision tree and the neural networks to predict the connection speed between the user and any unknown website comprises:

(n) for any website whose connection speed is unknown, obtaining its network IP address group number between 1 and 1000 as described in step (k), and using the decision tree obtained in step (m) to obtain the corresponding neural network number;
(o) using the corresponding neural network to predict the user connection time and user bandwidth of the website.
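The bookkeeping parts of the claimed procedure can be illustrated with a short sketch. It covers the input encoding of step (e), the averaging of step (c), the 10%/90% split of step (d), the error formula of step (g), the 3% stopping test of step (h), and the IP grouping of step (k). Neural network training and the C4.5 decision tree are deliberately left out. All function names are hypothetical, and several details are assumptions not stated in the claim: only the "last 10 visits" branch of step (c) is shown, the errors t and b are taken as absolute values, and the 1000 IP groups are assumed to be equal-width ranges of the 32-bit address space.

```python
import ipaddress
import random


def ip_to_int(ip):
    """Step (e): encode an IPv4 address as a 32-bit integer input feature."""
    return int(ipaddress.IPv4Address(ip))


def recent_average(samples, window=10):
    """Step (c): average of the most recent `window` measurements."""
    recent = samples[-window:]
    return sum(recent) / len(recent)


def split_history(records, test_fraction=0.1, seed=0):
    """Step (d): random 10% test set, remaining 90% training set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def prediction_error(t_err_ms, b_err_kbps, kb=500.0):
    """Step (g): e = t + Kb * b, with Kb between 200 and 1000.

    Taking absolute values of the two errors is an assumption.
    """
    if not 200 <= kb <= 1000:
        raise ValueError("Kb must lie between 200 and 1000")
    return abs(t_err_ms) + kb * abs(b_err_kbps)


def should_stop(prev_total, curr_total, tolerance=0.03):
    """Step (h): stop splitting once the summed error changes by at most 3%."""
    if prev_total is None:  # first execution of step (g): never stop
        return False
    return abs(curr_total - prev_total) <= tolerance * prev_total


def ip_group(ip, groups=1000):
    """Step (k): map an address to a group number in 1..groups.

    Equal-width buckets over the 32-bit space are an assumption.
    """
    return ip_to_int(ip) * groups // 2**32 + 1
```

With these helpers, the loop of steps (g)-(j) would call `should_stop` on the summed `prediction_error` values after each round and, when it returns `False`, cluster the sites and retrain one network per cluster.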

Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2009101021269A | 2009-08-13 | 2009-08-13 | A Machine Learning-Based Network Connection Speed Prediction Method |


Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN101634995A (en) | 2010-01-27 |
| CN101634995B (en) | 2011-09-21 |

Family

ID=41594185

Family Applications (1)

| Application Number | Status | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| CN2009101021269A | Expired - Fee Related | CN101634995B (en) | 2009-08-13 | 2009-08-13 | A Machine Learning-Based Network Connection Speed Prediction Method |

Country Status (1)

| Country | Link |
|---|---|
| CN (1) | CN101634995B (en) |



Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |
| | C17 | Cessation of patent right | |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2011-09-21; Termination date: 2013-08-13 |

