CN102833085A - System and method for classifying communication network messages based on mass user behavior data - Google Patents


Info

Publication number
CN102833085A
Authority
CN
China
Prior art keywords
data
message
module
communication network
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011101620972A
Other languages
Chinese (zh)
Other versions
CN102833085B (en)
Inventor
刘晓亮
罗峰
黄苏支
李娜
王琪
张玉波
阎飞飞
刘书良
刘生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Izp (China) Network Technology Co. Ltd.
Original Assignee
BEIJING IZP TECHNOLOGIES Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2011-06-16
Filing date
2011-06-16
Publication date
2012-12-19
2011-06-16: Application filed by BEIJING IZP TECHNOLOGIES Co Ltd
2011-06-16: Priority to CN201110162097.2A
2012-12-19: Publication of CN102833085A
2015-09-16: Application granted
2015-09-16: Publication of CN102833085B
Status: Expired - Fee Related

Abstract

The invention discloses a system and a method for classifying communication network messages based on mass user behavior data. The system comprises a user data acquisition system that transmits the collected data to a data cleaning module. The data cleaning module cleans the data, extracts message features, and transmits the resulting feature matrix to a classification algorithm module. The classification algorithm module and a classification model exchange data with each other, and the classification model outputs, through a model output module, the final model against which messages are compared. With this system and method, the various kinds of messages can be accurately identified, the fine-grained data requirements of message analysis can be met, and user behavior data, including user access and search data, can be analyzed in detail through effective message classification.

Description

Communication network message classification system and method based on mass user behavior data
Technical field
The present invention relates to analyzing the communication network messages produced when a mass of users access the network through various network devices and terminals, extracting message features, and using data mining and machine learning techniques to classify and predict those messages correctly. In particular, it concerns a communication network message classification system and method, designed around user behavior, that is based on mass user behavior data.
Background
Most traditional message classification relies on rule-based systems: the keywords that occur in different messages are counted and compiled into a rule base, and when a new message arrives it is matched against the rule base to obtain its approximate category.
The shortcomings of this approach are clear: (1) because a very large number of messages exist, a highly accurate rule base cannot be obtained; (2) rules in different rule bases may overlap, so a matching strategy may yield an inaccurate classification; (3) when the message volume is huge, a matching strategy cannot meet timeliness requirements.
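To make shortcoming (2) concrete, here is a minimal sketch of a keyword rule base of the kind described above. The rule base contents, the category names, and the function `match_rules` are illustrative assumptions and do not come from the patent.

```python
# Hypothetical rule base (assumption): category -> keywords seen in past messages.
RULE_BASE = {
    "search": ["search", "query"],
    "advertisement": ["ad", "doubleclick"],
}


def match_rules(url, rule_base=RULE_BASE):
    """Return every category whose keyword list has a substring hit in the URL."""
    url = url.lower()
    return sorted(
        category
        for category, keywords in rule_base.items()
        if any(keyword in url for keyword in keywords)
    )


# The keyword "ad" also occurs inside "download", so this search URL
# matches two categories and the rule base cannot decide between them.
print(match_rules("http://example.com/search?q=download"))
```

With naive substring matching, one URL can satisfy overlapping rules, which is the kind of ambiguity the background section attributes to rule-based classification.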
Summary of the invention
The object of the invention is to provide a communication network message classification system and method based on mass user behavior data. The system and method can accurately identify all kinds of messages, meet the fine-grained data requirements of message analysis, and, through effective message classification, support detailed analysis of user behavior data, including user access and search data.
The technical solution of the invention is as follows:
A communication network message classification system based on mass user behavior data comprises a user data acquisition system. The user data acquisition system transmits the collected data to a data cleaning module; the data cleaning module cleans the data, extracts message features, generates a feature matrix, and transmits it to a classification algorithm module; the classification algorithm module and a classification model exchange data with each other; and the classification model outputs, through a model output module, the final model used for message comparison.
The user data acquisition module stores the data collected from the network in a user data storage system.
The classification algorithm module also receives data from a training dataset, and the classification model also receives verification data from an evaluation dataset.
A communication network message classification method based on mass user behavior data classifies messages through the following steps:
(1) the information in the user data acquisition module is imported into the data cleaning module, which cleans the user data, extracts the features of the users' communication network messages, and generates a feature matrix; the feature matrix is imported into the classification algorithm module to generate a classification model;
(2) in parallel, the category of each communication network message is labeled manually to build a training dataset and an evaluation dataset; the feature matrix generated from the training dataset is also input to the classification algorithm module, which learns a classification model for messages from the training dataset; the feature matrix generated from the evaluation dataset is input to the intermediate classification model, the model's output is checked against the manual labels, and the model's quality is judged by the resulting accuracy rate and recall rate;
(3) the parameters obtained from validating the classification model are fed back to the classification algorithm module, which is optimized continuously to improve the system's robustness and the model's precision under complex real-world conditions;
(4) the final model is established and output through the model output module, where it is applied to new messages to predict the category of each communication network message.
The manually assigned message category labels include search engine messages, web page browsing messages, resource download page messages, and advertising material messages.
User behavior data is collected by the user data acquisition module, which stores the information in the user data storage system.
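The train-then-predict flow of steps (1), (2), and (4) can be sketched as follows. The patent does not name a specific classification algorithm, so the token-voting classifier here is only a stand-in, and `extract_features`, `train`, `predict`, and the sample URLs are all hypothetical.

```python
import re
from collections import Counter, defaultdict


def extract_features(url):
    """Toy feature extractor (assumption): lowercase alphanumeric tokens of the URL."""
    return {token for token in re.split(r"[^a-z0-9]+", url.lower()) if token}


def train(labeled_messages):
    """Learn, for every token, how often it appears under each manual label."""
    model = defaultdict(Counter)
    for url, label in labeled_messages:
        for token in extract_features(url):
            model[token][label] += 1
    return model


def predict(model, url, default="web_browsing"):
    """Let the tokens of a new message vote for a category."""
    votes = Counter()
    for token in extract_features(url):
        votes.update(model.get(token, {}))
    return votes.most_common(1)[0][0] if votes else default


# Manually labeled training set (illustrative data only).
training_set = [
    ("http://search.example.com/s?q=shoes", "search"),
    ("http://search.example.com/s?q=phones", "search"),
    ("http://ads.example.net/banner/1", "advertisement"),
    ("http://ads.example.net/banner/2", "advertisement"),
]
model = train(training_set)
print(predict(model, "http://search.example.org/s?q=books"))
```

Any real deployment would replace the voting step with whichever learning algorithm the classification algorithm module actually uses; the sketch only shows how labeled messages become a model that is then applied to new messages.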
The technical effect of the invention is:
Communication network traffic contains a large number of widely varied message types. To analyze and mine these messages in depth, each type must be identified correctly. Because the data volume is huge, completing this task within a target time and at a target accuracy is very difficult. The invention analyzes communication network messages in detail, extracts message features according to user behavior, and then uses data mining and machine learning techniques to build a complete system that accurately identifies all kinds of messages. The system covers the entire flow from raw message collection to final online use, and ensures that messages are identified accurately within the target time.
Description of the drawings
Fig. 1 is a flow chart of the steps of the communication network message classification system and method based on mass user behavior data according to the invention.
Embodiment
The invention is further described below with reference to the accompanying drawing.
As shown in Fig. 1, a communication network message classification system based on mass user behavior data comprises a user data acquisition system. The user data acquisition system transmits the collected data to a data cleaning module; the data cleaning module cleans the data, extracts message features, generates a feature matrix, and transmits it to a classification algorithm module; the classification algorithm module and a classification model exchange data with each other; and the classification model outputs, through a model output module, the final model used for message comparison.
The user data acquisition module stores the data collected from the network in a user data storage system.
The classification algorithm module also receives data from a training dataset, and the classification model also receives verification data from an evaluation dataset.
A communication network message classification method based on mass user behavior data classifies messages through the following steps:
(1) the information in the user data acquisition module is imported into the data cleaning module, which cleans the user data, extracts the features of the users' communication network messages, and generates a feature matrix; the feature matrix is imported into the classification algorithm module to generate a classification model;
(2) in parallel, the category of each communication network message is labeled manually to build a training dataset and an evaluation dataset; the feature matrix generated from the training dataset is also input to the classification algorithm module, which learns a classification model for messages from the training dataset; the feature matrix generated from the evaluation dataset is input to the intermediate classification model, the model's output is checked against the manual labels, and the model's quality is judged by the resulting accuracy rate and recall rate;
(3) the parameters obtained from validating the classification model are fed back to the classification algorithm module, which is optimized continuously to improve the system's robustness and the model's precision under complex real-world conditions;
(4) the final model is established and output through the model output module, where it is applied to new messages to predict the category of each communication network message.
The manually assigned message category labels include search engine messages, web page browsing messages, resource download page messages, and advertising material messages.
User behavior data is collected by the user data acquisition module, which stores the information in the user data storage system.
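Step (2) judges the model by comparing its output with the manual labels. A minimal sketch of that check is shown below; it assumes the translated "accuracy rate" corresponds to per-category precision, and the function name and sample labels are hypothetical.

```python
def precision_recall(predicted, actual, label):
    """Compute precision and recall for one category from paired label lists."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(p == label and a == label for p, a in pairs)
    false_pos = sum(p == label and a != label for p, a in pairs)
    false_neg = sum(p != label and a == label for p, a in pairs)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Model output versus manual labels on a tiny evaluation set (illustrative data).
model_output = ["ad", "ad", "search", "ad"]
manual_labels = ["ad", "search", "search", "ad"]
print(precision_recall(model_output, manual_labels, "search"))
```

Reporting the two numbers per category makes it visible when a model is accurate on common message types but misses a rare one.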
Classification algorithm module optimization process: the classification algorithm module receives the message classification feature matrices generated by computer and by manual labeling, and generates a classification model. The classification model receives the verification message classification feature matrix generated from the manually supplied evaluation dataset, and feeds the verified data back to the classification algorithm module, so that the module can be optimized and later classification becomes more accurate.
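One simple form such a feedback loop could take is parameter selection against the evaluation dataset. The sketch below is an assumption about how validation results might drive optimization, not the patent's stated procedure; the single threshold parameter, `tune_threshold`, and the sample scores are hypothetical.

```python
def tune_threshold(scores, labels, candidates):
    """Pick the decision threshold with the highest accuracy on the evaluation set."""
    best_threshold, best_accuracy = None, -1.0
    for threshold in candidates:
        predictions = [score >= threshold for score in scores]
        correct = sum(p == y for p, y in zip(predictions, labels))
        accuracy = correct / len(labels)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold, best_accuracy


# Classifier scores for "is an advertisement" and the manual labels (illustrative data).
eval_scores = [0.1, 0.4, 0.6, 0.9]
eval_labels = [False, False, True, True]
print(tune_threshold(eval_scores, eval_labels, [0.3, 0.5, 0.7]))
```

Each round, the validation result for a candidate setting is fed back and the best setting is kept, which mirrors the loop between the classification model and the classification algorithm module.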
The purpose of the cleaning module is to remove noise from the data. It has two parts: (1) removing unnecessary samples; (2) removing noise information within individual samples.
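The two cleaning tasks can be sketched as below. The patent does not say what counts as an unnecessary sample or as noise, so the choices here (dropping non-HTTP records, stripping fragments and the parameters listed in `NOISY_PARAMS`) are assumptions for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters treated as noise (assumption).
NOISY_PARAMS = {"sessionid", "timestamp"}


def clean(records):
    """Drop unusable samples and strip noisy parts from the remaining URLs."""
    cleaned = []
    for raw in records:
        parts = urlsplit(raw.strip())
        # Part 1: remove unnecessary samples (empty or non-HTTP records).
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        # Part 2: remove noise inside a sample (fragment, noisy parameters).
        query = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in NOISY_PARAMS]
        cleaned.append(
            urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(query), ""))
        )
    return cleaned


print(clean(["  http://Example.com/a?q=1&sessionid=xyz#top ", "", "ftp://example.com/file"]))
```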
The training dataset consists of two parts: the manually labeled network message category, and the feature vector that represents the network message. The feature vector is generally represented as a sparse vector, and can be converted to whatever format a specific classification algorithm requires.
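As an illustration of a sparse representation and a format conversion, the sketch below stores only non-zero features and writes a sample in the LIBSVM text layout (`label index:value ...`, 1-based indices). The patent does not name a target format, so LIBSVM is an assumed example, and both helper functions are hypothetical.

```python
def to_sparse(dense):
    """Keep only the non-zero entries of a dense feature vector, keyed by position."""
    return {index: value for index, value in enumerate(dense) if value != 0}


def to_libsvm_line(label, sparse):
    """Render one labeled sample as a LIBSVM-style line with 1-based indices."""
    features = " ".join(f"{index + 1}:{value}" for index, value in sorted(sparse.items()))
    return f"{label} {features}".strip()


sparse_vector = to_sparse([0, 1, 0, 0.5])
print(sparse_vector)
print(to_libsvm_line(2, sparse_vector))
```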
Features are pieces of information that can distinguish between message types; they are derived through manual analysis and statistics. For example, the features of an advertisement URL can consist of three parts: (1) it contains particular keywords, such as alimama, doubleclick, or ad; (2) it is generally a leaf node of the user access tree; (3) the proportion of visits in which the user types the URL directly is generally small.
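The three advertisement URL features could be computed roughly as follows. The keyword list comes from the text; the inputs `child_count`, `direct_inputs`, and `total_visits` are hypothetical statistics assumed to be available from the user access records, and matching keywords against whole tokens is a design choice made here, not something the patent specifies.

```python
import re

AD_KEYWORDS = ("alimama", "doubleclick", "ad")


def ad_url_features(url, child_count, direct_inputs, total_visits):
    """Return [has_keyword, is_leaf, direct_input_ratio] for one URL."""
    tokens = set(re.split(r"[^a-z0-9]+", url.lower()))
    # (1) contains a particular keyword, matched as a whole token
    has_keyword = int(any(keyword in tokens for keyword in AD_KEYWORDS))
    # (2) leaf node of the access tree: no further pages were reached from it
    is_leaf = int(child_count == 0)
    # (3) share of visits where the user typed the URL directly
    direct_ratio = direct_inputs / total_visits if total_visits else 0.0
    return [has_keyword, is_leaf, direct_ratio]


print(ad_url_features("http://ad.doubleclick.net/x", 0, 1, 100))
```

Matching on whole tokens keeps a short keyword such as ad from firing on unrelated words like download.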
The feature matrix is the matrix formed by the feature values of each sample.
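A minimal sketch of assembling such a matrix is given below: one row per sample and one column per feature, with missing features filled with zero. The feature names and the helper `build_feature_matrix` are hypothetical.

```python
def build_feature_matrix(samples):
    """Turn per-sample feature dicts into (column names, row-major matrix)."""
    names = sorted({name for sample in samples for name in sample})
    matrix = [[sample.get(name, 0) for name in names] for sample in samples]
    return names, matrix


feature_samples = [
    {"has_keyword": 1, "is_leaf": 1},
    {"direct_ratio": 0.5},
]
print(build_feature_matrix(feature_samples))
```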
The performance of a classification system is evaluated in two respects: model precision and algorithm efficiency. A key factor affecting model precision is the adequacy of the features, including their strength and their number. On the basis of in-depth analysis of massive communication network messages, the invention classifies messages in detail according to user behavior and carefully extracts the features of each message type, which ensures the precision and prediction accuracy of the model. In addition, the algorithm has been heavily optimized for efficiency, which ensures the timeliness of mass data processing.
The above is merely a preferred embodiment of the invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (6)

CN201110162097.2A | 2011-06-16 | 2011-06-16 | Communication network message classification system and method based on mass user behavior data | Expired - Fee Related | CN102833085B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201110162097.2A / CN102833085B (en) | 2011-06-16 | 2011-06-16 | Communication network message classification system and method based on mass user behavior data

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201110162097.2A / CN102833085B (en) | 2011-06-16 | 2011-06-16 | Communication network message classification system and method based on mass user behavior data

Publications (2)

Publication Number | Publication Date
CN102833085A (en) | 2012-12-19
CN102833085B (en) | 2015-09-16

Family

ID=47336064

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201110162097.2A, Expired - Fee Related, CN102833085B (en) | Communication network message classification system and method based on mass user behavior data | 2011-06-16 | 2011-06-16

Country Status (1)

Country | Link
CN (1) | CN102833085B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20090171662A1 (en) * | 2007-12-27 | 2009-07-02 | Sehda, Inc. | Robust Information Extraction from Utterances
CN101540048A (en) * | 2009-04-21 | 2009-09-23 | 北京航空航天大学 | Image quality evaluating method based on support vector machine
CN101853277A (en) * | 2010-05-14 | 2010-10-06 | 南京信息工程大学 | A Vulnerability Data Mining Method Based on Classification and Association Analysis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘博等: "改进的KNN方法及其在中文文本分类中的应用", 《西华大学学报(自然科学版)》*
谢华: "Internet网页自动分类技术的研究", 《中国优秀硕士学位论文全文数据库信息科技辑》*

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN106649455A (en) * | 2016-09-24 | 2017-05-10 | 孙燕群 | Big data development standardized systematic classification and command set system
CN106649455B (en) * | 2016-09-24 | 2021-01-12 | 孙燕群 | Standardized system classification and command set system for big data development
CN107404398A (en) * | 2017-05-31 | 2017-11-28 | 中山大学 | A kind of networks congestion control judgement system
CN113366500A (en) * | 2019-05-13 | 2021-09-07 | 三星电子株式会社 | Classification result verification method and classification result learning method using verification neural network, and computing device executing the methods
CN112016617A (en) * | 2020-08-27 | 2020-12-01 | 中国平安财产保险股份有限公司 | Fine-grained classification method and device and computer-readable storage medium
CN112016617B (en) * | 2020-08-27 | 2023-12-01 | 中国平安财产保险股份有限公司 | Fine granularity classification method, apparatus and computer readable storage medium

Also Published As

Publication number | Publication date
CN102833085B (en) | 2015-09-16

Legal Events

Date | Code | Title | Description
| C06 | Publication |
| PB01 | Publication |
| C10 | Entry into substantive examination |
| SE01 | Entry into force of request for substantive examination |
| C14 | Grant of patent or utility model |
| GR01 | Patent grant |
| C56 | Change in the name or address of the patentee |
| CP01 | Change in the name or title of a patent holder |

Address after: 100081, Beijing, Zhongguancun, Haidian District South Avenue, No. 18, International Building, Beijing, block 18, B

Patentee after: Izp (China) Network Technology Co. Ltd.

Address before: 100081, Beijing, Zhongguancun, Haidian District South Avenue, No. 18, International Building, Beijing, block 18, B

Patentee before: Beijing IZP Technologies Co., Ltd.

| CF01 | Termination of patent right due to non-payment of annual fee |

Granted publication date: 20150916

Termination date: 20160616

