CN112615966B - A terminal identification method for cat pool - Google Patents

A terminal identification method for cat pool

Info

Publication number
CN112615966B
Authority
CN
China
Prior art keywords
mobile phone
terminal
phone terminal
terminals
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011462856.2A
Other languages
Chinese (zh)
Other versions
CN112615966A (en)
Inventor
王保强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southern Power Grid Digital Grid Group Hainan Co ltd
Original Assignee
Hainan Digital Power Grid Research Institute of China Southern Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hainan Digital Power Grid Research Institute of China Southern Power Grid Co Ltd
Priority to CN202011462856.2A
Publication of CN112615966A
Application granted
Publication of CN112615966B
Active
Anticipated expiration

Abstract

The invention provides a cat pool (modem pool) terminal identification method. Activated mobile phone terminals are selected from the sold mobile phone terminals to form a target mobile phone terminal group; data for each mobile phone terminal in the group are acquired and processed to obtain an index feature library for each terminal; each index feature in the library is visualized to obtain several key index features; and the key index features are then used to judge whether a mobile phone terminal in the target group is a cat pool terminal. The method makes it possible to identify whether an agent is using cat pool terminals to defraud the operator of commissions, which helps protect the operator's legitimate interests.

Description

Translated from Chinese
A cat pool terminal identification method

Technical field

The present invention relates to the technical field of mobile communication terminal identification, and more specifically to a cat pool terminal identification method.

Background

A cat pool (modem pool) is a device that connects a considerable number of modems through special dial-up access equipment and can accept dial-up connections from many users at the same time. A cat pool can be regarded as a special kind of terminal: an expansion device built purely on telephony, regardless of whether it is used for Internet access or for sending and receiving mobile signals. An ordinary mobile phone terminal can run only one or two SIM cards, whereas an automated tool such as a cat pool terminal can run several hundred SIM cards at once.

In the communications business, operators often pay commissions to agents to encourage them to acquire customers and so grow the operator's customer base. In practice, however, some agents obtain these commissions fraudulently, which inflates the operator's customer numbers. For example, after taking mobile phone terminals and SIM cards from the operator in bulk, an agent studies the operator's sales policy, fee structure, and preferential offers, and then sells terminals and cards separately. For the terminals and cards that remain unsold, the agent often buys cat pool equipment and uses a cat pool terminal to simulate the IMEM identification codes of many mobile phone terminals while carrying out ordinary communication behavior. The operator is misled into believing that the terminals with those IMEM identification codes have been activated and are in use. The agent also keeps many SIM cards minimally active on the cat pool equipment ("card raising") in order to collect the operator's commissions. Commissions obtained through simulated terminal activation and card raising on cat pool equipment are often larger, and arrive faster, than those earned by actually selling terminals and SIM cards; this is the profit motive behind agents' use of cat pool equipment.
How to identify whether an agent is using cat pool terminals to simulate mobile phone terminal communication and is using card raising to commit fraud, so as to protect the operator's interests, has therefore become a technical problem that urgently needs to be solved in this field.

Summary of the invention

The technical problem to be solved by the present invention is to provide a cat pool terminal identification method that acquires multiple key index features of newly sold, activated mobile phone terminals and compares them with the characteristics of cat pool terminals, thereby determining whether an agent is using cat pool terminals to simulate mobile phone terminal communication and using card raising to fraudulently obtain commissions, which helps protect the operator's legitimate interests.

To solve the above technical problem, the technical solution adopted by the present invention is a cat pool terminal identification method comprising the following steps:

Step S1: Select the activated mobile phone terminals from the sold mobile phone terminals to form a target mobile phone terminal group;

Step S2: Acquire the data of each mobile phone terminal in the target mobile phone terminal group and process it to obtain an index feature library corresponding to each mobile phone terminal;

Step S3: Obtain several key index features by performing data visualization on each index feature in the index feature library;

Step S4: Use the key index features to judge whether the mobile phone terminals in the target mobile phone terminal group are cat pool terminals.

Preferably, in step S1, the rule for judging whether a mobile phone terminal has been activated is: if the terminal has produced any call behavior or data-traffic behavior, it is judged to have been activated and put into use.

Preferably, in step S2, acquiring the data of each mobile phone terminal in the target mobile phone terminal group includes:

Acquiring, for each mobile phone terminal, the sale time, activation time, activation number, bundled number, total call duration, total billed duration, total data traffic, and total number of calls;

Acquiring, for each SIM card number used on the mobile phone terminal, the call duration, billed duration, data traffic, and number of calls.

Preferably, the index feature library includes the following:

The total billed duration of each mobile phone terminal;

The total data traffic of each mobile phone terminal;

The total number of calls of each mobile phone terminal;

The number of primary terminals and the number of other terminals of each mobile phone terminal;

The terminal income of each mobile phone terminal;

The calling-number call ratio of each mobile phone terminal;

The number of terminals in the contact circle of each mobile phone terminal;

The number of concentrated call events of each mobile phone terminal;

The called-number attribution ratio of each mobile phone terminal;

The number of call base stations of each mobile phone terminal;

The number of activated terminals of each mobile phone terminal.

Preferably, in step S3, performing data visualization on each index feature in the index feature library to obtain several key index features includes:

Step S31: Construct a histogram for every index in the index feature library, obtaining one histogram per type of index feature;

Step S32: Analyze the histogram of each type of index feature; if a type of index feature does not follow a normal distribution in its histogram, take that type as a key index feature.

Preferably, in step S3, the key index features are the following indices: terminal income, number of call base stations, number of terminals in the contact circle, number of activated terminals, and calling-number call ratio. The values of these five indices form the five-dimensional virtual coordinates of each mobile phone terminal in the target mobile phone terminal group.

Preferably, in step S4, using the key index features to judge whether the mobile phone terminals in the target mobile phone terminal group are cat pool terminals includes:

Step S41: Divide all mobile phone terminals in the target mobile phone terminal group into K clusters with a clustering algorithm, and obtain the coordinates of the K center points;

Step S42: If, in one cluster, the terminal income, number of call base stations, and number of terminals in the contact circle are all lower than in the other clusters, while the number of activated terminals and the calling-number call ratio are both higher than in the other clusters, that cluster is the cat pool terminal cluster and the mobile phone terminals in it are all cat pool terminals.

Preferably, the method further includes step S5, which includes:

Step S51: Take the cat pool terminal cluster as the cat pool test library, and extract from a newly activated mobile phone terminal its terminal income, number of call base stations, number of terminals in the contact circle, number of activated terminals, and calling-number call ratio to form the five-dimensional virtual coordinates of the newly activated terminal;

Step S52: Compute and compare the Euclidean distances between the five-dimensional virtual coordinates of the newly activated terminal and the coordinates of each of the K center points; if the distance to the center point of the cat pool test library is the smallest, list the newly activated terminal as a suspected cat pool terminal;

Step S53: Manually screen the suspected cat pool terminals, and judge from the screening result whether the newly activated mobile phone terminal is a cat pool terminal.

Compared with the prior art, the beneficial effects of the present invention are as follows:

The present invention provides a cat pool terminal identification method. It acquires the data of newly sold, activated mobile phone terminals, processes the data to obtain multiple index features for each terminal, selects several key index features through data visualization, and uses those key index features to judge whether the mobile phone terminals in the target mobile phone terminal group are cat pool terminals. The invention makes it possible to identify whether an agent is using cat pool terminals to defraud the operator of commissions, which helps protect the operator's legitimate interests.

Description of the drawings

To explain the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below are merely preferred embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of the cat pool terminal identification method of the present invention.

Detailed description

For a better understanding of the technical content of the present invention, specific embodiments are given below and the invention is further described with reference to the drawings.

Referring to Fig. 1, the present invention provides a cat pool terminal identification method comprising the following steps:

Step S1: Select the activated mobile phone terminals from the sold mobile phone terminals to form a target mobile phone terminal group;

Step S2: Acquire the data of each mobile phone terminal in the target mobile phone terminal group and process it to obtain an index feature library corresponding to each mobile phone terminal;

Step S3: Obtain several key index features by performing data visualization on each index feature in the index feature library;

Step S4: Use the key index features to judge whether the mobile phone terminals in the target mobile phone terminal group are cat pool terminals.

Specifically, in step S1, the rule for judging whether a mobile phone terminal has been activated is: if the terminal has produced any call behavior or data-traffic behavior, it is judged to have been activated and put into use. Because the operator's user base is very large and cat pool terminals mostly arise from newly sold mobile phone terminals, the call and data-traffic behavior of newly sold terminals is an important basis for selecting suitable target terminals.
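The activation rule in step S1 can be sketched as a simple filter. This is a minimal illustration only: the `TerminalUsage` record and its field names are hypothetical and are not taken from the patent or from any operator system.

```python
from dataclasses import dataclass


@dataclass
class TerminalUsage:
    terminal_id: str    # hypothetical terminal identifier
    call_count: int     # number of calls observed on the terminal
    traffic_bytes: int  # data traffic observed on the terminal


def select_target_group(sold):
    """Keep terminals that show any call or data-traffic behavior (step S1)."""
    return [t for t in sold if t.call_count > 0 or t.traffic_bytes > 0]


sold = [
    TerminalUsage("A", 3, 0),     # calls only: activated
    TerminalUsage("B", 0, 0),     # no usage: not activated
    TerminalUsage("C", 0, 2048),  # traffic only: activated
]
target_group = select_target_group(sold)
```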

Specifically, cat pool terminals share similar characteristics in indices such as income, data traffic, activation, and usage, and these characteristics differ markedly from those of normal mobile phone terminals, so identifying cat pool terminals is feasible. The first task is therefore to obtain the index features of the mobile phone terminals in the target group so that they can be compared with the characteristics of cat pool terminals; only then can a judgment be made about whether a terminal is a cat pool terminal. In step S2, acquiring the data of each mobile phone terminal in the target mobile phone terminal group includes:

Acquiring, for each mobile phone terminal, the sale time, activation time, activation number, bundled number, total call duration, total billed duration, total data traffic, and total number of calls;

Acquiring, for each SIM card number used on the mobile phone terminal, the call duration, billed duration, data traffic, and number of calls.

Specifically, the index feature library includes the following: the total billed duration of each mobile phone terminal; the total data traffic of each mobile phone terminal; the total number of calls of each mobile phone terminal; the number of primary terminals and the number of other terminals of each mobile phone terminal; the terminal income of each mobile phone terminal; the calling-number call ratio of each mobile phone terminal; the number of terminals in the contact circle of each mobile phone terminal; the number of concentrated call events of each mobile phone terminal; the called-number attribution ratio of each mobile phone terminal; the number of call base stations of each mobile phone terminal; and the number of activated terminals of each mobile phone terminal. The total billed duration, total data traffic, and total number of calls can be read directly from the operator's database; the remaining index features are obtained as follows:

Number of primary terminals and number of other terminals: for a SIM card number used on a mobile phone terminal, if the card's call duration on this terminal exceeds 20% of the card's total call duration and the card's data traffic on this terminal exceeds 20% of the card's total data traffic, the terminal's primary terminal count is increased by 1; otherwise its other terminal count is increased by 1;
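The 20% rule for the primary and other terminal counts can be sketched as follows. The dictionary keys are hypothetical names chosen for this illustration; only the two strict "greater than 20%" conditions come from the text.

```python
def count_primary_and_other(cards, threshold=0.2):
    """Return (primary_count, other_count) for one terminal.

    Each entry in `cards` describes one SIM card number used on the terminal:
    its call duration and data traffic on this terminal and in total.
    """
    primary = other = 0
    for card in cards:
        call_ok = card["call_on_terminal"] > threshold * card["call_total"]
        data_ok = card["data_on_terminal"] > threshold * card["data_total"]
        if call_ok and data_ok:
            primary += 1  # both shares exceed the threshold
        else:
            other += 1
    return primary, other


cards = [
    {"call_on_terminal": 50, "call_total": 100, "data_on_terminal": 300, "data_total": 1000},
    {"call_on_terminal": 10, "call_total": 100, "data_on_terminal": 900, "data_total": 1000},
    {"call_on_terminal": 60, "call_total": 100, "data_on_terminal": 50, "data_total": 1000},
]
counts = count_primary_and_other(cards)
```

Only the first card passes both conditions; the second fails on call share and the third on traffic share.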

Terminal income: the terminal income of each mobile phone terminal is calculated by the following formula:

$$S = \sum_{x=1}^{n} \frac{t_x}{t_x^{\text{total}}} \cdot \mathrm{ARPU}_x$$

where $S$ is the terminal income of the mobile phone terminal, $n$ is the number of SIM card numbers that have been used on the terminal, $t_x^{\text{total}}$ is the total call duration of SIM card number $x$, $t_x$ is the call duration of that SIM card number on this terminal, and $\mathrm{ARPU}_x$ is the service revenue that the SIM card number contributes to the operator over a period;
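A small sketch of the terminal income calculation follows. It assumes, as an interpretation of the symbol definitions above, that each card's ARPU is apportioned to the terminal by the share of the card's call time spent on that terminal; the key names are hypothetical.

```python
def terminal_income(cards):
    """Sum each card's ARPU weighted by its call-time share on this terminal.

    Assumed reading of the definitions: S = sum(t_x / t_x_total * ARPU_x).
    Cards with no recorded call time are skipped to avoid dividing by zero.
    """
    total = 0.0
    for card in cards:
        if card["call_total"] > 0:
            share = card["call_on_terminal"] / card["call_total"]
            total += share * card["arpu"]
    return total


cards = [
    {"call_on_terminal": 30, "call_total": 60, "arpu": 100.0},  # share 0.50
    {"call_on_terminal": 10, "call_total": 40, "arpu": 80.0},   # share 0.25
    {"call_on_terminal": 0, "call_total": 0, "arpu": 20.0},     # skipped
]
income = terminal_income(cards)
```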

Calling-number call ratio: the database provides the number of outgoing calls and incoming calls of each mobile phone terminal together with the corresponding SIM card numbers; dividing the terminal's number of outgoing calls by its total number of calls gives the calling-number call ratio of the terminal;

Number of terminals in the contact circle: the database retrieves the call targets of all numbers used on each mobile phone terminal when calling from that terminal, and from these derives the number of terminals in the terminal's contact circle;

Number of concentrated call events: the database retrieves the user records of each mobile phone terminal and counts how many times in a month the following occurs: on the same day and under the same base station, the terminal calls the same person more than 2 times, with each call lasting no more than 30 s. The result is the number of concentrated call events of the terminal;
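One way to count concentrated call events is to group short calls by day, base station, and callee. The sketch below assumes that "within 30 s" means a duration of at most 30 seconds and that each qualifying group counts once; the record keys are hypothetical.

```python
from collections import Counter


def concentrated_call_count(calls, max_seconds=30, min_calls=2):
    """Count (day, station, callee) groups with more than `min_calls` short calls."""
    groups = Counter(
        (c["day"], c["station"], c["callee"])
        for c in calls
        if c["seconds"] <= max_seconds  # only short calls are considered
    )
    return sum(1 for n in groups.values() if n > min_calls)


calls = [
    # Three short calls to the same callee, same day, same station: counts once.
    {"day": "2020-12-01", "station": "s1", "callee": "p1", "seconds": 5},
    {"day": "2020-12-01", "station": "s1", "callee": "p1", "seconds": 8},
    {"day": "2020-12-01", "station": "s1", "callee": "p1", "seconds": 12},
    # Only two short calls: does not count.
    {"day": "2020-12-02", "station": "s1", "callee": "p2", "seconds": 6},
    {"day": "2020-12-02", "station": "s1", "callee": "p2", "seconds": 9},
    # Three calls, but one is long, leaving two short ones: does not count.
    {"day": "2020-12-03", "station": "s2", "callee": "p3", "seconds": 7},
    {"day": "2020-12-03", "station": "s2", "callee": "p3", "seconds": 40},
    {"day": "2020-12-03", "station": "s2", "callee": "p3", "seconds": 10},
]
events = concentrated_call_count(calls)
```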

Called-number attribution ratio: the database determines the sales channel to which each called number of a mobile phone terminal belongs and counts the terminal's called numbers; dividing the total number of called numbers of the terminal by the number of channels to which those numbers belong gives the called-number attribution ratio;

Number of call base stations: the database counts the base stations each mobile phone terminal has used, giving the number of call base stations of the terminal;

Number of activated terminals: the database counts how many other mobile phone terminals were additionally activated by the SIM card number that activated a given terminal, giving the number of activated terminals of that terminal.

Obtaining these index features draws on the data of every SIM card number used on a mobile phone terminal and reflects how each SIM card number is used. The index features are therefore useful for judging whether card raising is taking place on a terminal.
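Several of the simpler features above can be derived from a terminal's call records in one pass. The sketch below is illustrative: the record keys are hypothetical, and the contact circle is approximated as the number of distinct peers, which is an assumption rather than the patent's stated procedure.

```python
def simple_call_features(calls):
    """Compute calling ratio, base-station count, and contact-circle size."""
    total = len(calls)
    outgoing = sum(1 for c in calls if c["direction"] == "out")
    return {
        # Outgoing calls divided by all calls on the terminal.
        "calling_ratio": outgoing / total if total else 0.0,
        # Distinct base stations the terminal has used.
        "base_station_count": len({c["station"] for c in calls}),
        # Distinct call peers (assumed proxy for the contact circle).
        "contact_circle_size": len({c["peer"] for c in calls}),
    }


calls = [
    {"direction": "out", "station": "s1", "peer": "p1"},
    {"direction": "out", "station": "s1", "peer": "p1"},
    {"direction": "out", "station": "s2", "peer": "p2"},
    {"direction": "in", "station": "s1", "peer": "p3"},
]
features = simple_call_features(calls)
```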

Specifically, data exploration is a preliminary study of the data intended to give a better understanding of its particular properties. A common method of data exploration is visualization, that is, presenting information in graphical or tabular form. Successful visualization converts data into a visual form from which the characteristics of the data, and the relationships between data items or attributes, can be analyzed or reported. The histogram is one such visualization technique. In statistics, a histogram is a graphical representation of the distribution of data: a two-dimensional chart whose two axes are the statistical samples and the measure of some attribute of those samples. Step S3 therefore uses histograms to visualize the index features and obtain several key index features, including:

Step S31: Construct a histogram for every index in the index feature library, obtaining one histogram per type of index feature;

Step S32: Analyze the histogram of each type of index feature; if a type of index feature does not follow a normal distribution in its histogram, take that type as a key index feature.
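The patent judges normality by inspecting each histogram. As a rough numerical stand-in (an assumption, not the patent's procedure), the sketch below bins one feature and flags it when the sample skewness or excess kurtosis is far from zero. The function names and the limits of 1.0 are illustrative choices.

```python
def histogram(values, bins=10):
    """Return equal-width bin counts for one index feature (step S31)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum value
        counts[idx] += 1
    return counts


def skewness_and_excess_kurtosis(values):
    """Population skewness and excess kurtosis; both are 0 for a normal law."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0, 0.0
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def looks_non_normal(values, skew_limit=1.0, kurt_limit=1.0):
    """Heuristic stand-in for the visual check in step S32."""
    skew, kurt = skewness_and_excess_kurtosis(values)
    return abs(skew) > skew_limit or abs(kurt) > kurt_limit


symmetric = [-2, -1, -1, 0, 0, 0, 1, 1, 2]
skewed = [0] * 9 + [100]
```

A formal normality test could replace the moment check; the moments are used here only to keep the sketch dependency-free.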

Specifically, after the histograms have been analyzed, the key index features are the following indices: terminal income, number of call base stations, number of terminals in the contact circle, number of activated terminals, and calling-number call ratio. The values of these five indices form the five-dimensional virtual coordinates of each mobile phone terminal in the target mobile phone terminal group.

Specifically, using the key index features to judge whether the mobile phone terminals in the target mobile phone terminal group are cat pool terminals includes:

Step S41: Divide all mobile phone terminals in the target mobile phone terminal group into K clusters with a clustering algorithm, and obtain the coordinates of the K center points;

Step S42: If, in one cluster, the terminal income, number of call base stations, and number of terminals in the contact circle are all lower than in the other clusters, while the number of activated terminals and the calling-number call ratio are both higher than in the other clusters, that cluster is the cat pool terminal cluster and the mobile phone terminals in it are all cat pool terminals.

A clustering algorithm gathers mobile phone terminals whose key index features have similar data characteristics into one cluster. The data characteristics of the key index features in each of the K clusters are therefore compared with the characteristics of cat pool terminals, and the terminals in the cluster whose characteristics match are judged to be cat pool terminals. In other words, in practice a cat pool terminal simulates the IMEM identification code of each mobile phone terminal in that cluster in order to communicate, creating the illusion that the terminals bound to those IMEM identification codes have been activated, and thereby collecting the operator's commissions.
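The rule in step S42 can be applied to the cluster center points directly. The sketch below assumes each center is a 5-tuple in the order (terminal income, call base stations, contact circle, activated terminals, calling ratio); that ordering and the function name are illustrative.

```python
LOW = (0, 1, 2)  # income, call base stations, contact circle: expected lowest
HIGH = (3, 4)    # activated terminals, calling ratio: expected highest


def find_modem_pool_cluster(centroids):
    """Return the index of the cluster matching step S42, or None if none does."""
    for k, center in enumerate(centroids):
        others = [c for j, c in enumerate(centroids) if j != k]
        if not others:
            return None  # a single cluster cannot be compared with others
        low_ok = all(center[i] < o[i] for i in LOW for o in others)
        high_ok = all(center[i] > o[i] for i in HIGH for o in others)
        if low_ok and high_ok:
            return k
    return None


centroids = [
    (50.0, 8.0, 30.0, 1.0, 0.50),
    (5.0, 1.0, 2.0, 6.0, 0.95),   # low income and mobility, high activations
    (80.0, 12.0, 40.0, 1.0, 0.45),
]
pool_index = find_modem_pool_cluster(centroids)
```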

In this embodiment the clustering algorithm used is k-means. As the number of clusters K grows, the classification error (sum of squared errors, SSE) becomes smaller and smaller, but the time spent on classification (the number of iterations) rises sharply, and too many clusters interfere with decision-making and cause deviations on the business side. Choosing a reasonable value of K is therefore essential. The relationship between the number of clusters, the number of iterations, and the SSE is shown in Table 1 below:

Table 1 shows that while K increases in the range below 8, the SSE falls quickly, and that while K increases in the range above 30, the SSE falls only gradually but the number of iterations rises rapidly. Balancing a small SSE, a small number of iterations, and a moderate number of clusters, this embodiment preferably sets K to 10; that is, all mobile phone terminals are divided into 10 clusters by the k-means algorithm.
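To see how SSE and iteration count can be measured for a given K, the following is a minimal pure-Python k-means (Lloyd's algorithm). It is a sketch under stated assumptions: points are tuples, initial centers are sampled at random with a fixed seed, and no feature scaling is applied. It does not reproduce the values of Table 1.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centers, sse, iterations) for `points` split into k clusters."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    sse = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return centers, sse, iteration


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, sse, iterations = kmeans(points, k=2)
```

Running `kmeans` over a range of K values and recording `sse` and `iterations` gives the kind of trade-off the text describes.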

Further, other newly sold mobile phone terminals can be identified on the basis of the existing 10 clusters and the coordinates of their 10 center points. This is step S5, which includes:

Step S51: Take the cat pool terminal cluster as the cat pool test library, and extract from a newly activated mobile phone terminal its terminal income, number of call base stations, number of terminals in the contact circle, number of activated terminals, and calling-number call ratio to form the five-dimensional virtual coordinates of the newly activated terminal;

Step S52: Compute and compare the Euclidean distances between the five-dimensional virtual coordinates of the newly activated terminal and the coordinates of each of the 10 center points; if the distance to the center point of the cat pool test library is the smallest, list the newly activated terminal as a suspected cat pool terminal;

The Euclidean distance is calculated as:

$$d_k = \sqrt{\sum_{y=1}^{m} \left(X_y - X_{ky}\right)^2}$$

where $d_k$ is the Euclidean distance from the five-dimensional virtual coordinate point of the newly activated mobile phone terminal to the center point of cluster $k$, $m$ is the number of key index features of the terminal (that is, $m = 5$), $X_y$ is the $y$-th coordinate of the newly activated terminal, and $X_{ky}$ is the $y$-th coordinate of the center point;

Step S53: Manually screen the suspected cat pool terminals, and judge from the screening result whether the newly activated mobile phone terminal is a cat pool terminal.
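Step S52 amounts to a nearest-center lookup. The sketch below flags a new terminal when its closest center is the one for the cat pool cluster; the function names and sample coordinates are illustrative, and the raw, unscaled features are an assumption because the text does not mention normalization.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two coordinate tuples of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_suspected_modem_pool(new_point, centroids, pool_index):
    """Return (flag, distances): flag is True if the nearest center is the pool's."""
    distances = [euclidean(new_point, c) for c in centroids]
    nearest = min(range(len(centroids)), key=distances.__getitem__)
    return nearest == pool_index, distances


# Order: income, call base stations, contact circle, activated terminals, calling ratio.
centroids = [(50.0, 8.0, 30.0, 1.0, 0.50), (5.0, 1.0, 2.0, 6.0, 0.95)]
suspect, _ = is_suspected_modem_pool((6.0, 1.0, 3.0, 5.0, 0.90), centroids, pool_index=1)
normal, _ = is_suspected_modem_pool((48.0, 7.0, 28.0, 1.0, 0.50), centroids, pool_index=1)
```

A flagged terminal is only a suspect; the manual screening in step S53 makes the final decision.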

The embodiments described above are merely examples given to explain the present invention clearly and do not limit how the invention may be implemented. A person of ordinary skill in the art can make other changes or variations on the basis of the above description; it is neither necessary nor possible to list every implementation exhaustively here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.

Claims (5)

Application CN202011462856.2A | Priority date 2020-12-14 | Filing date 2020-12-14 | A terminal identification method for cat pool | Active | CN112615966B (en)

Priority Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN202011462856.2A | CN112615966B (en) | 2020-12-14 | 2020-12-14 | A terminal identification method for cat pool

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN202011462856.2A | CN112615966B (en) | 2020-12-14 | 2020-12-14 | A terminal identification method for cat pool

Publications (2)

Publication Number | Publication Date
CN112615966A (en) | 2021-04-06
CN112615966B (en) | 2023-04-14

Family

ID=75233593

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202011462856.2A | Active | CN112615966B (en) | 2020-12-14 | 2020-12-14 | A terminal identification method for cat pool

Country Status (1)

Country | Link
CN (1) | CN112615966B (en)


Also Published As

Publication number | Publication date
CN112615966A (en) | 2021-04-06


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
CP03 | Change of name, title or address

Address after: No. 32, Haifu Road, Meilan District, Haikou City, Hainan Province, 570203

Patentee after: Southern Power Grid Digital Grid Group (Hainan) Co.,Ltd.

Country or region after: China

Address before: No. 32 Haifu Road, Meilan District, Haikou City, Hainan Province

Patentee before: China Southern Power Grid Hainan Digital Power Grid Research Institute Co.,Ltd.

Country or region before: China
