CN109242257A - A 4G Internet user complaint model based on key index association analysis - Google Patents

Publication number
CN109242257A
Authority
CN
China
Prior art keywords
model
data
complains
association analysis
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810902832.0A
Other languages
Chinese (zh)
Other versions
CN109242257B (en)
Inventor
仇春芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GUANGZHOU HANXIN COMMUNICATION TECHNOLOGY Co Ltd
Original Assignee
GUANGZHOU HANXIN COMMUNICATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-08-09
Filing date
2018-08-09
Publication date
2019-01-18
Application filed by GUANGZHOU HANXIN COMMUNICATION TECHNOLOGY Co Ltd
Priority to CN201810902832.0A
Publication of CN109242257A
Application granted
Publication of CN109242257B
Legal status: Active

Abstract

The invention discloses a 4G Internet user complaint model based on key index association analysis, built by the following steps: S1, use a logistic regression model to find the factors that influence user complaints and build a decision-tree model; S2, extract user data, then clean and merge it; S3, run t-tests on the user data to make a preliminary identification of the factors that influence complaints; S4, set a training data set and build a logistic regression model in R, then optimize it with backward stepwise logistic regression based on the output of the preliminary model to obtain the final model; S5, run a chi-square test on the model; S6, use the model to predict the test data set and cross-tabulate the predictions against the actual outcomes. The model is intended to explain why 4G Internet users complain and to allow preventive action for users who are likely to complain.

Description

A 4G Internet user complaint model based on key index association analysis
Technical field
The present invention relates to the field of mobile communications, and more particularly to a 4G Internet user complaint model based on key index association analysis.
Background
With the rapid development of the mobile Internet, users' requirements for mobile networks keep rising. Reducing the number of complaints and preventing potential 4G user complaints in advance is a key task for network staff. The present invention can reduce the number of complaints from 4G users and improve their satisfaction with Internet access.
Summary of the invention
The purpose of the invention is to address one or more of the above defects by proposing a 4G Internet user complaint model based on key index association analysis.
To achieve this purpose, the following technical solution is adopted:
A method for building a 4G Internet user complaint model based on key index association analysis, comprising the following steps:
S1: explore the factors and conditions that lead users to complain; use a logistic regression model to find the factors that influence user complaints, and build a decision-tree model;
S2: extract user data covering the factors from step S1, including data for users who originally complained and data for users who did not; clean and merge the data;
S3: run t-tests on the complaining-user and non-complaining-user data obtained in step S2, compare the indicators between the two groups, and make a preliminary identification of the factors that influence complaints;
S4: set a training data set and build a logistic regression model in R, with whether the user complained as the dependent variable, coded 0 or 1; based on the output of the preliminary model, optimize it with backward stepwise logistic regression to obtain the final result and determine the final model;
S5: run a chi-square test on the model, so that in addition to each variable passing its significance test, the model as a whole is significant;
S6: use the model to predict the test data set, and cross-tabulate the predictions against the actual outcomes.
Further, the factors influencing user complaints in step S1 include attach success rate, attach latency, default bearer success rate, default bearer latency, Tcp23 handshake success rate, and Tcp23 handshake latency.
Further, the data cleaning in step S2 comprises the following steps:
S2.1: strip the percent sign from every success rate, keeping a number between 0 and 100 with 2 decimal places;
S2.2: reject records with more than 5 missing indicators;
S2.3: for records with 1 to 5 missing values, fill the missing values using the k-nearest-neighbor method;
S2.4: randomly select 80% of the complaint records and 80% of the non-complaint records to train the model; the remaining 20% are used for prediction.
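Step S2.3 can be illustrated with a minimal k-nearest-neighbor imputation sketch in Python. The source does not state the distance metric, the value of k, or whether indicators are rescaled first, so the Euclidean distance over shared indicators, `k=3`, and the function name `knn_impute` are all assumptions made for illustration; a practical version would likely standardize the indicators before measuring distance.

```python
import math


def knn_impute(records, k=3):
    """Fill None entries using the mean of the k nearest records.

    records: list of equal-length lists of floats or None (hypothetical layout).
    Distance is the root-mean-square difference over the indicators that
    both records observed (an assumption, not stated in the source).
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # nothing to compare on
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = []
    for i, rec in enumerate(records):
        new_rec = list(rec)
        for j, value in enumerate(rec):
            if value is not None:
                continue
            # Donors are the other records that observed indicator j.
            donors = [(distance(rec, other), other[j])
                      for m, other in enumerate(records)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                new_rec[j] = round(sum(nearest) / len(nearest), 2)
        filled.append(new_rec)
    return filled


if __name__ == "__main__":
    demo = [[99.0, 100.0], [98.0, None], [50.0, 20.0], [97.0, 96.0]]
    print(knn_impute(demo, k=2))
```

Distances are always computed against the original records, so a value imputed for one record is never used as a donor for another.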
The final model is:
Compared with the prior art, the beneficial effects of the invention are:
The invention provides a 4G Internet user complaint model based on key index association analysis, which explains why 4G Internet users complain and allows preventive action to be taken in advance for users who are likely to complain.
Description of the drawings
Fig. 1 is a flow chart of the invention.
Detailed description
The drawings are for illustration only and shall not be construed as limiting the patent;
The invention is further described below with reference to the drawings and embodiments.
Embodiment 1
A method for building a 4G Internet user complaint model based on key index association analysis, shown in Fig. 1, comprises the following steps:
S1: explore the factors and conditions that lead users to complain; use a logistic regression model to find the factors that influence user complaints, and build a decision-tree model;
Explore the factors and conditions that lead users to complain. T-tests on 15 indicators between the complaint group and the non-complaint group show that attach success rate, attach latency, default bearer success rate, default bearer latency, Tcp23 handshake success rate, and Tcp23 handshake latency differ significantly between the groups. A logistic regression model then identifies 5 factors that influence user complaints: attach success rate, attach latency, default bearer latency, Tcp23 handshake success rate, and Tcp23 handshake latency. A decision-tree model built on these 5 factors gives the conditions under which a user is likely to complain:
1) Tcp23 handshake success rate < 100 and Tcp23 handshake latency < 80 and attach latency < 178;
2) Tcp23 handshake success rate < 100 and Tcp23 handshake latency >= 80;
3) Tcp23 handshake success rate = 100 and 179 <= attach latency < 374 and default bearer latency >= 191;
4) Tcp23 handshake success rate = 100 and attach latency < 179;
5) Tcp23 handshake success rate = 100 and attach latency >= 374.
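The five decision-tree conditions above can be written as a single predicate. The Python sketch below is only a transcription of the listed rules; the function and parameter names are hypothetical, no units are assumed because the source gives none for these thresholds, and the 178 in rule 1 is kept exactly as printed even though rule 4 uses 179.

```python
def tree_predicts_complaint(tcp23_success_rate, tcp23_latency,
                            attach_latency, default_bearer_latency):
    """Return True when any of the five listed tree conditions holds."""
    if tcp23_success_rate < 100:
        if tcp23_latency >= 80:
            return True                      # rule 2
        return attach_latency < 178          # rule 1
    # From here on the Tcp23 handshake success rate equals 100.
    if attach_latency < 179:
        return True                          # rule 4
    if attach_latency >= 374:
        return True                          # rule 5
    return default_bearer_latency >= 191     # rule 3 (179 <= attach latency < 374)


if __name__ == "__main__":
    print(tree_predicts_complaint(100, 50, 200, 191))
```

Written this way, the only users the tree does not flag are those with a success rate below 100, a Tcp23 handshake latency under 80 and an attach latency of 178 or more, and those with a success rate of 100, an attach latency from 179 up to 374 and a default bearer latency under 191.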
The decision-tree model has a prediction accuracy of 65.7%, but the complaint conditions it yields are not satisfactory. Finally, each indicator is analyzed on its own to check whether a threshold exists beyond which users are likely to complain. The conclusions are:
1) when attach success rate is 60% or lower, the user is likely to complain;
2) when attach latency is 1500 ms or higher, the user is likely to complain;
3) when default bearer success rate is 20% or lower, the user is likely to complain;
4) when default bearer latency is 1000 ms or higher, the user is likely to complain;
5) when Tcp23 handshake success rate is 90% or lower, the user is likely to complain.
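The single-indicator thresholds above lend themselves to a simple lookup. The following Python sketch checks a user's indicators against the five limits; the dictionary keys and the function name `triggered_thresholds` are hypothetical names chosen for illustration, while the limits themselves are the ones listed above.

```python
import operator

# (comparison, limit) per indicator, taken from the five conclusions above.
# Key names are illustrative, not from the source.
THRESHOLDS = {
    "attach_success_rate": (operator.le, 60.0),            # percent
    "attach_latency_ms": (operator.ge, 1500.0),
    "default_bearer_success_rate": (operator.le, 20.0),    # percent
    "default_bearer_latency_ms": (operator.ge, 1000.0),
    "tcp23_handshake_success_rate": (operator.le, 90.0),   # percent
}


def triggered_thresholds(user):
    """Return the names of the indicators that cross their complaint threshold.

    Indicators that are absent or None are skipped.
    """
    hits = []
    for name, (compare, limit) in THRESHOLDS.items():
        value = user.get(name)
        if value is not None and compare(value, limit):
            hits.append(name)
    return hits


if __name__ == "__main__":
    print(triggered_thresholds({"attach_success_rate": 55.0,
                                "attach_latency_ms": 1600.0}))
```

Returning the list of triggered indicators rather than a single flag keeps the reason for each alert visible, which matters if the goal is preventive follow-up.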
S2: extract user data covering the factors from step S1, including data for users who originally complained and data for users who did not; clean and merge the data;
This embodiment extracts 2336 records of users who originally complained and 2993 records of non-complaining users, 5329 records in total. The data is merged into one table with the following indicators:
attach success rate, attach latency, default bearer success rate, default bearer latency, DNS success rate, DNS latency, Tcp12 handshake success rate, Tcp12 handshake latency, Tcp23 handshake success rate, Tcp23 handshake latency, Get response success rate, Get response latency, Post response success rate, Post response latency, and large-packet (over 500 KB) download rate.
For modeling, the data is processed as follows:
1) strip the percent sign from every success rate, keeping a number between 0 and 100 with 2 decimal places. For example, 99.5% becomes 99.50;
2) reject records with more than 5 missing indicators: first count the missing indicators in each record (out of 15); records with more than 5 missing values (a record with exactly 5 is kept) are rejected outright, because they would harm the subsequent modeling. After rejection, 4819 records remain (1999 complaint, 2820 non-complaint);
3) for records with 1 to 5 missing values, fill the missing values using the k-nearest-neighbor method, so that the data to be modeled has no missing values;
4) randomly select 80% of the complaint records and 80% of the non-complaint records to train the model (3820 records in total); the remaining 20% are used for prediction.
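Items 1), 2) and 4) of the processing above can be sketched in a few lines of Python. This is an illustration under assumptions, not the patent's implementation: the record layout (a dict with an `indicators` list and a 0/1 `complained` label), the function names, and the fixed random seed are all hypothetical.

```python
import random


def parse_rate(text):
    """Convert a success rate such as '99.5%' to 99.5 (2 decimals); '' -> None."""
    text = text.strip()
    if not text:
        return None
    value = round(float(text.rstrip("%")), 2)
    if not 0.0 <= value <= 100.0:
        raise ValueError(f"success rate out of range: {text!r}")
    return value


def drop_sparse(records, max_missing=5):
    """Keep records whose number of missing indicators is at most max_missing."""
    return [r for r in records
            if sum(v is None for v in r["indicators"]) <= max_missing]


def stratified_split(records, train_fraction=0.8, seed=0):
    """Split complaint and non-complaint records separately, then combine."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):
        group = [r for r in records if r["complained"] == label]
        rng.shuffle(group)
        cut = int(round(len(group) * train_fraction))
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


if __name__ == "__main__":
    print(parse_rate("99.5%"))
```

Splitting each class separately keeps the complaint to non-complaint ratio the same in the training and prediction sets, which is what selecting 80% from each group implies.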
S3: run t-tests on the complaining-user and non-complaining-user data obtained in step S2, compare the indicators between the two groups, and make a preliminary identification of the factors that influence complaints;
The P value in a t-test is interpreted as follows:
P value | Probability of occurring by chance | Null hypothesis | Statistical significance
P > 0.05 | greater than 5% | cannot be rejected | the difference between the two groups is not significant
P < 0.05 | less than 5% | can be rejected | the difference between the two groups is significant
P < 0.01 | less than 1% | can be rejected | the difference between the two groups is highly significant
Comparing the indicators of complaining and non-complaining users with t-tests gives a preliminary identification of the factors that influence complaints. The data consists of the 2336 records of users who originally complained and the 2993 records of non-complaining users, 5329 records in total (including missing values). Each indicator is tested independently, and records with a missing value for that indicator are ignored automatically. The results are shown in the table below:
As the table above shows, attach success rate, attach latency, default bearer latency, Tcp23 handshake success rate, and Tcp23 handshake latency differ very significantly between complaining and non-complaining users (99% confidence level). Default bearer success rate differs significantly at the 95% confidence level but not at the 99% confidence level. The other indicators show no significant difference.
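A per-indicator comparison of this kind can be sketched in Python with the standard library alone. The source does not say which t-test variant was used, so Welch's unequal-variance form is an assumption here, as is the function name `welch_t_test`. Because the standard library has no Student t distribution, the p-value below uses a normal approximation, which is only reasonable for large groups such as the thousands of records described above.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(group_a, group_b):
    """Return (t statistic, Welch degrees of freedom, approximate two-sided p).

    None entries are dropped first, mirroring the text's note that records
    with a missing value for the indicator are ignored.
    Each group needs at least two values and non-zero variance.
    """
    a = [x for x in group_a if x is not None]
    b = [x for x in group_b if x is not None]
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # Normal approximation to the t distribution (large-sample assumption).
    p_approx = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


if __name__ == "__main__":
    complaint = [95.0, 90.0, None, 85.0, 99.0]
    no_complaint = [99.0, 100.0, 98.0, 100.0]
    print(welch_t_test(complaint, no_complaint))
```

Under the interpretation table above, an indicator would be kept as a candidate factor when the resulting p-value falls below 0.05, and treated as highly significant below 0.01.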
S4: set a training data set and build a logistic regression model in R, with whether the user complained as the dependent variable, coded 0 or 1; based on the output of the preliminary model, optimize it with backward stepwise logistic regression to obtain the final result and determine the final model;
Below, a logistic regression model is built in R on the training data set (3820 records). Whether the user complained is the dependent variable, taking only the values 0 and 1 (0 = no complaint, 1 = complaint), and the 15 indicators are the independent variables. The output R returns for the preliminary model is:
Call:
glm(formula = complained ~ attach_success_rate + attach_latency + default_bearer_success_rate +
    default_bearer_latency + dns_success_rate + dns_latency + tcp12_handshake_success_rate +
    tcp12_handshake_latency + tcp23_handshake_success_rate + tcp23_handshake_latency +
    get_response_success_rate + get_response_latency + post_response_success_rate +
    post_response_latency + large_packet_over_500KB_download_rate,
    family = "binomial", data = train.dt)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.2577  -0.9793  -0.9161   1.3126   2.3556
Coefficients:
large_packet_over_500KB_download_rate  -1.491e-06  2.603e-06  -0.573  0.566695
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 5182.4  on 3819  degrees of freedom
Residual deviance: 5006.4  on 3804  degrees of freedom
AIC: 5038.4
Number of Fisher Scoring iterations: 5
In the output above, *** marks a highly significant coefficient, ** a very significant one, * a significant one, and . a marginally significant one; no mark means the coefficient is not significant. The coefficients of several indicators are not significant, so the model needs to be optimized.
The model is optimized with backward stepwise logistic regression (each step removes the least influential factor so that the model improves on the previous one, until only significant factors remain). The final result of the backward stepwise regression is:
Call:
glm(formula = complained ~ attach_success_rate + attach_latency + default_bearer_latency +
    tcp12_handshake_latency + tcp23_handshake_success_rate + tcp23_handshake_latency,
    family = "binomial", data = train.dt)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.2489  -0.9789  -0.9198   1.3174   1.7080
Coefficients:
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 5182.4  on 3819  degrees of freedom
Residual deviance: 5012.0  on 3813  degrees of freedom
AIC: 5026
Number of Fisher Scoring iterations: 5
This model retains 6 indicators, but the coefficient of Tcp12 handshake latency is still not significant, so this indicator is dropped and the model is refitted with the remaining 5 indicators:
Call:
glm(formula = complained ~ attach_success_rate + attach_latency + default_bearer_latency +
    tcp23_handshake_success_rate + tcp23_handshake_latency,
    family = "binomial", data = train.dt)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.2410  -0.9790  -0.9201   1.3180   1.5413
Coefficients:
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 5182.4  on 3819  degrees of freedom
Residual deviance: 5014.2  on 3814  degrees of freedom
AIC: 5026.2
Number of Fisher Scoring iterations: 5
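The AIC values reported in the three R outputs can be cross-checked from the residual deviances and degrees of freedom they print. For a binomial model fitted to 0/1 responses, AIC equals the residual deviance plus twice the number of estimated coefficients, and that number is the training-set size minus the residual degrees of freedom. The Python sketch below applies this arithmetic; the function names are hypothetical, and the training-set size of 3820 is taken from the null deviance's 3819 degrees of freedom.

```python
N_TRAIN = 3819 + 1  # null-model degrees of freedom plus the intercept


def n_coefficients(residual_df, n_obs=N_TRAIN):
    """Number of fitted coefficients, intercept included."""
    return n_obs - residual_df


def aic_from_deviance(residual_deviance, residual_df, n_obs=N_TRAIN):
    """AIC = residual deviance + 2 * (number of coefficients).

    Valid for a Bernoulli (0/1) response, where the saturated model's
    log-likelihood is zero.
    """
    return residual_deviance + 2 * n_coefficients(residual_df, n_obs)


if __name__ == "__main__":
    # (residual deviance, residual df) as printed in the three outputs.
    for deviance, df in [(5006.4, 3804), (5012.0, 3813), (5014.2, 3814)]:
        print(n_coefficients(df), round(aic_from_deviance(deviance, df), 1))
```

The check also makes the trade-off in each elimination step visible: dropping an indicator raises the deviance slightly while lowering the parameter penalty by 2.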
At this point every coefficient is significant, and the relatively important variables have been retained. The fitted model is:
… + 0.0005773 * default bearer latency - 0.0666100 * Tcp23 handshake success rate + 0.0038896 * Tcp23 handshake latency
When the P computed by the model is greater than 0.5, the user is predicted to complain; otherwise the user is predicted not to complain.
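The decision rule can be sketched in Python as a logistic function applied to the linear predictor. Only three coefficients are legible in the equation as reproduced here, so the intercept and the two attach coefficients are left as arguments the caller must supply; they are not values from the source. The function names and dictionary keys are hypothetical, and the logit link is the one named in the R output (link: logit).

```python
import math

# Coefficients legible in the fitted model above.
KNOWN_COEFFICIENTS = {
    "default_bearer_latency": 0.0005773,
    "tcp23_handshake_success_rate": -0.0666100,
    "tcp23_handshake_latency": 0.0038896,
}


def complaint_probability(user, intercept, attach_success_coef, attach_latency_coef):
    """Logistic-model probability P that a user complains.

    intercept, attach_success_coef and attach_latency_coef are NOT given in
    the text reproduced here; the caller must provide them.
    """
    z = (intercept
         + attach_success_coef * user["attach_success_rate"]
         + attach_latency_coef * user["attach_latency"])
    for name, coef in KNOWN_COEFFICIENTS.items():
        z += coef * user[name]
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))


def predicts_complaint(probability, cutoff=0.5):
    """Apply the rule from the text: complain when P > 0.5."""
    return probability > cutoff
```

The signs of the legible coefficients match the earlier findings: a higher Tcp23 handshake success rate lowers the predicted probability, while higher latencies raise it.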
S5: run a chi-square test on the model, so that in addition to each variable passing its significance test, the model as a whole is significant;
In addition to each variable passing its significance test, the model as a whole must be significant; only then can the model be considered correct and meaningful. A chi-square test on the model gives:
Analysis of Deviance Table
Model: binomial, link: logit
Response: complained
Terms added sequentially (first to last)
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The model passes the overall significance test, which indicates that the model built from the above variables is meaningful.
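One way to reproduce an overall significance check from the figures already printed is a likelihood-ratio chi-square test: the drop from the null deviance (5182.4 on 3819 degrees of freedom) to the residual deviance of the 5-indicator model (5014.2 on 3814) is compared with a chi-square distribution on 5 degrees of freedom. This is a sketch of that whole-model test, not necessarily the sequential table R printed; the function names are hypothetical, and the upper-tail probability is computed with the standard recurrence for the regularized incomplete gamma function.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)               # Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))    # Q(1/2, h)
    while a < df / 2.0 - 1e-9:
        # Recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q


def overall_model_test(null_dev, null_df, resid_dev, resid_df):
    """Return (chi-square statistic, degrees of freedom, p-value)."""
    statistic = null_dev - resid_dev
    df = null_df - resid_df
    return statistic, df, chi2_sf(statistic, df)


if __name__ == "__main__":
    print(overall_model_test(5182.4, 3819, 5014.2, 3814))
```

With a statistic of about 168.2 on 5 degrees of freedom, the p-value is far below 0.01, which agrees with the conclusion that the model as a whole is significant.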
S6: use the model to predict the test data set, and cross-tabulate the predictions against the actual outcomes.
Below, the model is used to predict the test data set (956 records in total: 396 complaint, 560 non-complaint), that is, to predict whether an unseen user is likely to complain. When the P computed by the model is greater than 0.5, the user is predicted to complain; otherwise the user is predicted not to complain. The predictions are cross-tabulated against the actual outcomes in the table below:
 | Complaining users | Non-complaining users
Predicted to complain | 162 | 94
Predicted not to complain | 234 | 466
Accuracy of complaint predictions: 162 / (162 + 94) * 100% = 63.3%;
Accuracy of non-complaint predictions: 466 / (466 + 234) * 100% = 66.6%;
Overall prediction accuracy: (466 + 162) / (466 + 162 + 234 + 94) * 100% = 65.7%;
Recall: 162 / (162 + 234) = 40.9%.
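The four rates can be recomputed directly from the cross-tabulation. The Python sketch below takes the four cell counts of the table and returns each rate as a fraction; the function name and the key names in the returned dictionary are hypothetical labels for the quantities defined in the text.

```python
def confusion_metrics(true_pos, false_pos, false_neg, true_neg):
    """Compute the four rates used in the text from the cross-tabulation.

    true_pos:  predicted to complain, did complain
    false_pos: predicted to complain, did not complain
    false_neg: predicted not to complain, did complain
    true_neg:  predicted not to complain, did not complain
    """
    total = true_pos + false_pos + false_neg + true_neg
    return {
        "complaint_prediction_accuracy": true_pos / (true_pos + false_pos),
        "non_complaint_prediction_accuracy": true_neg / (true_neg + false_neg),
        "overall_accuracy": (true_pos + true_neg) / total,
        "recall": true_pos / (true_pos + false_neg),
    }


if __name__ == "__main__":
    # Cell counts from the cross-tabulation of the 956 test records.
    for name, value in confusion_metrics(162, 94, 234, 466).items():
        print(f"{name}: {value:.1%}")
```

Reporting recall alongside overall accuracy is useful here because the two classes are not balanced: a model can score well overall while still missing many of the users who actually complain.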
Summary: the model has fairly good predictive ability (65.7% overall accuracy on the test set).
Obviously, the above embodiment is only an example given to illustrate the invention clearly, and does not limit the embodiments of the invention. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to list all embodiments exhaustively. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the protection scope of the claims of the invention.

Claims (4)

Priority Applications (1)

Application Number | Publication | Priority Date | Filing Date | Status | Title
CN201810902832.0A | CN109242257B (en) | 2018-08-09 | 2018-08-09 | Active | 4G internet user complaint model based on key index correlation analysis

Publications (2)

Publication Number | Publication Date
CN109242257A | 2019-01-18
CN109242257B | 2021-08-20

Family

ID=65069998

Country Status (1)

Country | Link
CN (1) | CN109242257B (en)


Also Published As

Publication number | Publication date
CN109242257B (en) | 2021-08-20


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
