CN114189585B - Method, device and computing equipment for detecting abnormal harassment calls - Google Patents


Info

Publication number
CN114189585B
Authority
CN
China
Prior art keywords
data
call
decision tree
historical
harassing
Legal status
Active
Application number
CN202010961602.9A
Other languages
Chinese (zh)
Other versions
CN114189585A (en)
Inventor
宋维平
向倞
张麾军
董宇翔
廖珺
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Chongqing Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Chongqing Co Ltd
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Chongqing Co Ltd
Priority to CN202010961602.9A
Publication of CN114189585A
Application granted
Publication of CN114189585B
Active
Anticipated expiration


Abstract

Embodiments of the invention relate to the field of communication technology and disclose a method, device, computing device and storage medium for detecting abnormal harassing calls. The method comprises: acquiring call detail record data and Internet crawler data; analyzing the call detail record data with a constructed harassing-call warning model to obtain warning data; performing a secondary review of the warning data against the Internet crawler data to obtain genuine harassing phone numbers; and issuing those numbers so that they are shut down. Because it processes data from multiple sources, the method better fits practical scenarios and can improve the accuracy and completeness of detection.

Description

Translated from Chinese
Method, device and computing equipment for detecting abnormal harassment calls

Technical Field

Embodiments of the present invention relate to the field of communication technology, and in particular to a method, device, computing device and storage medium for detecting abnormal harassing calls.

Background Art

As harassing calls have become increasingly frequent in recent years, network-side and terminal-side vendors have launched a variety of harassing-call analysis and interception products. On the network side, these are mainly harassing-call analysis models and interception systems built by operators on call signaling data; on the terminal side, they are mainly applications (APPs) provided by Internet vendors, such as 360 Mobile Guardian and the Hunter Network system. Terminal-side products rely mainly on reports submitted by end users to build a harassing-call database, which is downloaded to the terminal so that the user is alerted when such a number calls.

Existing detection schemes for harassing-call governance have the following limitations. On the network side, harassing calls are classified coarsely, generally only by signaling attributes, without finer-grained attributes such as the caller's industry or credit status; the data dimensions and extracted features are therefore limited, and processing accuracy is insufficient. On the terminal side, the APP approach requires users to grant permissions, which raises privacy concerns, and it currently covers only some smartphone users, leaving users of non-smart terminals uncovered. It also depends on existing data and cannot respond promptly to newly emerging harassing numbers, so detection lags; it may also wrongly intercept numbers that the operator has reclaimed. Finally, because all of its data comes from end users, it suffers from omissions and malicious tagging.

Summary of the Invention

In view of the above problems, embodiments of the present invention provide a method, apparatus, computing device and storage medium for detecting abnormal harassing calls that overcome, or at least partially solve, these problems.

According to one aspect of the embodiments of the present invention, a method for detecting abnormal harassing calls is provided, comprising: acquiring call detail record data and Internet crawler data; analyzing the call detail record data with a constructed harassing-call warning model to obtain warning data; performing a secondary review of the warning data against the Internet crawler data to obtain genuine harassing phone numbers; and issuing the harassing phone numbers so that they are shut down.

In an optional manner, acquiring the call detail record data includes: collecting the call detail record data from the business domain and the operation domain, and computing behavior features from it.

In an optional manner, the harassing-call warning models include: a high-frequency call warning model, a modem pool warning model, a high-risk user monitoring model, a silent card-activation monitoring model, and an hourly model.

In an optional manner, before the call detail record data is analyzed with the constructed harassing-call warning model to obtain the warning data, the method includes: acquiring historical call detail record data and historical Internet crawler data, and computing historical behavior features from the historical call detail record data; constructing a decision tree from the historical behavior features and the historical Internet crawler data; validating the decision tree with random subsampling or bootstrap sampling, depending on the sample size, and adjusting its parameters to determine the final decision tree; and using the decision tree to establish the harassing-call warning model.

In an optional manner, constructing the decision tree from the historical behavior features and the historical Internet crawler data includes: computing information gain from the historical behavior features; building branches in descending order of information gain, starting from the feature with the largest gain; and pruning the decision tree according to the historical Internet crawler data, stopping construction when the information gain falls below a preset threshold.

In an optional manner, computing the information gain from the historical behavior features includes: obtaining the empirical entropy and conditional entropy corresponding to the historical behavior features, and calculating the information gain from the empirical entropy and the conditional entropy.

In an optional manner, issuing the harassing phone numbers for shutdown includes: automatically dispatching each harassing phone number to the company to which it belongs, which is notified to shut it down; or linking the harassing phone numbers to an existing automatic handling system so that they are shut down automatically.

According to another aspect of the embodiments of the present invention, a device for detecting abnormal harassing calls is provided, comprising: a data acquisition unit for acquiring call detail record data and Internet crawler data; a model analysis unit for analyzing the call detail record data with a constructed harassing-call warning model to obtain warning data; a secondary review unit for performing a secondary review of the warning data against the Internet crawler data to obtain genuine harassing phone numbers; and a number shutdown unit for issuing the harassing phone numbers so that they are shut down.

According to another aspect of the embodiments of the present invention, a computing device is provided, comprising a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another via the communication bus;

the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the steps of the above harassing-call anomaly detection method.

According to yet another aspect of the embodiments of the present invention, a computer storage medium is provided, storing at least one executable instruction that causes a processor to perform the steps of the above harassing-call anomaly detection method.

In the embodiments of the present invention, call detail record data and Internet crawler data are acquired; the call detail record data is analyzed with the constructed harassing-call warning model to obtain warning data; the warning data undergoes a secondary review against the Internet crawler data to obtain genuine harassing phone numbers; and those numbers are issued so that they are shut down. Because processing draws on multiple data sources, the approach better fits practical scenarios and can improve the accuracy and completeness of detection.

The above is only an overview of the technical solutions of the embodiments of the present invention. To make their technical means clearer, so that they can be implemented according to this specification, and to make the above and other objects, features and advantages more apparent, specific embodiments of the present invention are described below.

Brief Description of the Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting. Throughout the drawings, the same reference symbols denote the same components. In the drawings:

FIG. 1 is a schematic diagram of the architecture of a harassing-call anomaly detection system provided by an embodiment of the present invention;

FIG. 2 is a flowchart of a method for detecting abnormal harassing calls provided by an embodiment of the present invention;

FIG. 3 is a flowchart of building the harassing-call warning model in the harassing-call anomaly detection method provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of the behavior features used by the harassing-call warning model in the harassing-call anomaly detection method provided by an embodiment of the present invention;

FIG. 5 is a flowchart of another method for detecting abnormal harassing calls provided by an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a device for detecting abnormal harassing calls provided by an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments, it should be understood that the invention can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the invention will be understood more thoroughly and its scope fully conveyed to those skilled in the art.

To meet the need for anomaly detection in harassing-call governance, the embodiments of the present invention provide a comprehensive harassing-call anomaly detection system. Built on an existing general-purpose big data architecture, it uses the MapReduce, Hive and Spark distributed technologies of the Hadoop ecosystem to analyze and raise warnings for scenarios such as silent card activation within the network, card issuance by high-risk agents, calls through suspicious base stations, high-risk users, and harassing traffic from modem pools.

The architecture of the system is shown in FIG. 1 and has three parts: a security data center, a security analysis sub-platform, and a security situation management sub-platform. Collected external data is first sorted and stored in the security data center; the security analysis sub-platform then analyzes it further; and the analysis results are output to the security situation management sub-platform for display and automated handling.

The security data center collects raw data, classifies and organizes it, and provides resources and interfaces for upper-layer analysis. Raw data is cleaned, standardized, completed and labeled in the security data center and then stored in the Hadoop Distributed File System (HDFS) in preparation for subsequent analysis. The security data center also provides interfaces (API, JDBC, FTP) and resources for analysis, including SQL, Spark, HDFS and Elasticsearch (ES). This layer additionally provides metadata management, component management and operation monitoring. Metadata management flexibly manages data tables, supports resumable data collection and transmission verification, and offers a unified interface for onboarding new data, in preparation for future expansion. Component management flexibly manages the collection, analysis and storage components and supports operations such as component upgrades and restarts. Operation monitoring tracks, in real time, the running status of each component and the utilization of system resources.
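The cleaning, standardization, completion and labeling steps can be illustrated with a small Python sketch. It is not the system's actual pipeline. It assumes mainland-China numbers with an optional `0086`/`86` country prefix in front of an 11-digit mobile number. The record fields, the default values used for completion, and the `"unlabeled"` placeholder label are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class RawRecord:
    caller: str
    callee: str
    duration_s: Optional[int]   # may be missing in raw data
    record_type: Optional[str]  # e.g. "voice", "sms", "data"; may be missing


@dataclass
class CleanRecord:
    caller: str
    callee: str
    duration_s: int
    record_type: str
    label: str


def normalize_number(number: str) -> str:
    """Standardize a phone number: keep digits only and strip a China
    country prefix (0086 or 86) in front of an 11-digit mobile number."""
    digits = re.sub(r"\D", "", number)
    for prefix in ("0086", "86"):
        if digits.startswith(prefix) and len(digits) - len(prefix) == 11:
            return digits[len(prefix):]
    return digits


def clean_record(raw: RawRecord) -> Optional[CleanRecord]:
    """Clean one record; return None when it cannot be used."""
    caller = normalize_number(raw.caller)
    callee = normalize_number(raw.callee)
    if not caller or not callee:
        return None  # cleaning: drop records missing either party
    return CleanRecord(
        caller=caller,
        callee=callee,
        # completion: hypothetical defaults for missing fields
        duration_s=raw.duration_s if raw.duration_s is not None else 0,
        record_type=raw.record_type or "voice",
        # labeling: placeholder until the models and review assign a label
        label="unlabeled",
    )
```

For example, `clean_record(RawRecord("+8613800138000", "021-5555 1234", None, None))` yields a record with caller `13800138000`, callee `02155551234`, duration `0` and type `"voice"`.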

The security analysis sub-platform analyzes harassing-call service anomalies. It contains a basic algorithm library and various service anomaly analysis models, including a high-frequency call monitoring model, a high-risk user monitoring model, an hourly detection model and other anomaly detection models. This layer also provides engine management and operation monitoring. Engine management allows new detection engines to be added flexibly to cover more detection scenarios, and allows each engine's algorithm to be upgraded and debugged. Operation monitoring tracks the running status of each engine, including whether it is running correctly and whether it has hung.

The security situation management sub-platform provides situation presentation in the form of visual threat warnings and risk notifications, and can automatically handle analysis results as required. Situation presentation visualizes the current status, history and development trends of security risks as charts across dimensions such as time and space.

Using statistics over a large volume of historical call detail record data, the embodiments of the present invention identify how harassing call behavior differs from the call behavior of ordinary users, and then classify call records by these features. Internet crawler data is additionally imported on top of this. A decision tree that uses the crawler data as an additional dimension serves as the machine learning method for learning the characteristics of telecommunications harassment.

FIG. 2 is a flowchart of the method for detecting abnormal harassing calls provided by an embodiment of the present invention. The method runs on the operator's server side. As shown in FIG. 2, it includes:

Step S11: Acquire call detail record data and Internet crawler data.

The embodiment uses a dedicated big data storage platform built on existing distributed frameworks such as MapReduce, Hive and Spark. Raw data is obtained from the business domain and the operation domain and includes the following:

Business and Operation Support System (BOSS) basic data, including card-using unit data, card-issuing unit data and card issuance information.

BOSS service data, including voice, SMS and data-traffic call detail records.

User network-access data, including network-access information and the channel through which the user joined the network.

User profile data, including billing data and Internet usage data.

In this embodiment, the call detail record data is collected from the business domain and the operation domain, and behavior features are computed from it. Specifically, the collected records are grouped by the category to which they belong, and a separate category of behavior features is derived from each group.
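The grouping step can be sketched as a per-category, per-caller aggregation. This is only an illustration. The `(caller, category, usage)` tuple and the meaning of `usage` (seconds for voice, one per message for SMS, bytes for data traffic) are hypothetical choices, not the embodiment's actual record format.

```python
from typing import Dict, Iterable, Tuple

# (caller, category, usage): usage is seconds for voice, 1 per message for SMS,
# and bytes for data traffic (a hypothetical encoding for this sketch).
Record = Tuple[str, str, int]


def stats_by_category(records: Iterable[Record]) -> Dict[str, Dict[str, Dict[str, int]]]:
    """Group records by category, then by caller, accumulating a record
    count and total usage for each caller within each category."""
    stats: Dict[str, Dict[str, Dict[str, int]]] = {}
    for caller, category, usage in records:
        per_caller = stats.setdefault(category, {}).setdefault(
            caller, {"count": 0, "total_usage": 0}
        )
        per_caller["count"] += 1
        per_caller["total_usage"] += usage
    return stats
```

On a Hadoop/Spark platform, the same aggregation would run as a distributed group-by. The plain-Python version only shows the shape of the result.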

The crawler data in this embodiment comes from a wide range of sources. It includes phone-number tag data from multiple online platforms, such as Baidu, 360 and other Internet security vendors. It also includes multi-dimensional data such as user industry information, enterprise information, credit information, and real estate and advertising/promotion information.

Step S12: Analyze the call detail record data with the constructed harassing-call warning model to obtain warning data.

Before step S12, the harassing-call warning model must be constructed. As shown in FIG. 3, this includes:

Step S121: Acquire historical call detail record data and historical Internet crawler data, and compute historical behavior features from the historical call detail record data.

The data processing is the same as in step S11: the collected data is grouped by the category to which it belongs into different categories of historical behavior features.

Step S122: Construct a decision tree from the historical behavior features and the historical Internet crawler data.

First, information gain is obtained from the historical behavior features. Specifically, the empirical entropy and conditional entropy corresponding to the historical behavior features are computed, and the information gain is calculated from them. The entropy of a feature X, i.e. the expected information over all of its possible values, is:

$$H(X) = -\sum_{i=1}^{n} p(x_i)\log_2 p(x_i)$$

where n is the number of possible values (categories) and p(x_i) is the probability that the feature takes the value x_i. In this embodiment, the data in the sample table is defined as the training data set D, whose empirical entropy is H(D); |D| denotes its sample capacity, i.e. the number of samples. Suppose there are K classes C_k, k = 1, 2, ..., K, and |C_k| is the number of samples belonging to class C_k. The empirical entropy corresponding to the historical behavior features is then:

$$H(D) = -\sum_{k=1}^{K}\frac{|C_k|}{|D|}\log_2\frac{|C_k|}{|D|}$$

The conditional entropy H(Y|X) represents the uncertainty of random variable Y when random variable X is known. It is the mathematical expectation, over X, of the entropy of the conditional probability distribution of Y given X. The conditional entropy corresponding to the historical behavior features is calculated as:

$$H(Y \mid X) = \sum_{i=1}^{n} p_i\, H(Y \mid X = x_i)$$

where p_i = P(X = x_i). For each feature, the information gain is the difference between the empirical entropy and the conditional entropy:

$$G(D, A) = H(D) - H(D \mid A)$$

where A is the feature and H(D|A) is the conditional entropy of the class of D given A.

Then, starting from the feature with the largest information gain, branches are built in descending order of gain to construct the decision tree.

Because decision trees are prone to over-fitting, the values used when constructing the tree must be chosen carefully to improve the accuracy of the telecommunications-harassment classifier and its ability to recognize new data. In this embodiment, the decision tree is pruned according to the historical Internet crawler data, and construction stops when the information gain falls below a preset threshold. That is, the attribute dimensions of the Internet data are used to prune the tree, so that construction stops whenever the information gain is below the threshold set from that data. This yields a suitably sized decision tree.

Step S123: Validate the decision tree with random subsampling or bootstrap sampling, depending on the sample size, adjust its parameters, and determine the final decision tree.

After the decision tree is built, data must be fed into it for validation, and the tree's quality is judged from the computed evaluation metrics. Specifically, depending on the number of samples, the decision tree is validated with random subsampling or bootstrap sampling. The construction parameters are adjusted according to the validation results, and the final usable decision tree is obtained. Random subsampling suits larger data volumes, and bootstrap sampling suits smaller ones. The evaluation metrics are classification accuracy, recall, false alarm rate and precision.
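The following Python sketch shows the two validation schemes and the four metrics. The sample-size cut-off `large_threshold` and the 30% hold-out fraction are hypothetical. The description also does not define "false alarm rate," so this sketch assumes the usual FP / (FP + TN), i.e. the share of normal numbers wrongly flagged.

```python
import random
from typing import Dict, List, Sequence, Tuple


def random_subsampling_split(n: int, test_fraction: float, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Hold out a random fraction of the n sample indices (drawn without replacement)."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(n * test_fraction)
    return idx[cut:], idx[:cut]  # (train, test)


def bootstrap_split(n: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw n indices with replacement for training; unseen (out-of-bag) indices form the test set."""
    train = [rng.randrange(n) for _ in range(n)]
    seen = set(train)
    return train, [i for i in range(n) if i not in seen]


def choose_split(n: int, rng: random.Random, large_threshold: int = 10_000,
                 test_fraction: float = 0.3) -> Tuple[List[int], List[int]]:
    """Random subsampling for large samples, bootstrap for small ones (threshold is hypothetical)."""
    if n >= large_threshold:
        return random_subsampling_split(n, test_fraction, rng)
    return bootstrap_split(n, rng)


def evaluate(y_true: Sequence[str], y_pred: Sequence[str], positive: str = "harassing") -> Dict[str, float]:
    """Accuracy, recall, false alarm rate (FP / (FP + TN)) and precision."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_alarm_rate": fp / (fp + tn) if fp + tn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }
```

Bootstrap suits small samples because every record can still appear in training, with the roughly one third of records never drawn serving as the test set.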

Step S124: Use the decision tree to establish the harassing-call warning model.

In this embodiment, the decision tree is used to establish the harassing-call warning model that is deployed in the actual production environment. The model's behavior features, shown in FIG. 4, comprise caller statistics and callee statistics. Caller statistics include call frequency, the rate of calls to vacant numbers, the regional dispersion of callees, the time distribution of calls, the dispersion of callee numbers, and the mean call duration. Callee statistics include the frequency of being called, the time distribution, the dispersion of calling numbers, and the regional dispersion of calling numbers. The harassing-call warning models include a high-frequency call warning model, a modem pool warning model, a high-risk user monitoring model, a silent card-activation monitoring model and an hourly model. Each model targets a different type of harassing call. Its decision tree is built in the same way as above, but from a different category of behavior features.
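The following Python sketch computes several of the caller statistics above from voice call records. It is illustrative only. The `VoiceCdr` fields are hypothetical, "dispersion" is assumed to mean distinct values divided by total calls, and the time distribution is reduced to the share of calls in the busiest hour. The embodiment's exact definitions are not given.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VoiceCdr:
    caller: str
    callee: str
    callee_region: str
    hour: int             # hour of day the call started (0-23)
    duration_s: int
    callee_vacant: bool   # True if the called number is unassigned (vacant)


def caller_features(cdrs: List[VoiceCdr]) -> Dict[str, Dict[str, float]]:
    """Per-caller statistics; 'dispersion' = distinct values / total calls (an assumption)."""
    by_caller: Dict[str, List[VoiceCdr]] = {}
    for cdr in cdrs:
        by_caller.setdefault(cdr.caller, []).append(cdr)

    features: Dict[str, Dict[str, float]] = {}
    for caller, calls in by_caller.items():
        n = len(calls)
        hours = Counter(c.hour for c in calls)
        features[caller] = {
            "call_count": n,                                         # call frequency over the window
            "vacant_rate": sum(c.callee_vacant for c in calls) / n,  # share of calls to vacant numbers
            "region_dispersion": len({c.callee_region for c in calls}) / n,
            "callee_dispersion": len({c.callee for c in calls}) / n,
            "mean_duration_s": sum(c.duration_s for c in calls) / n,
            "peak_hour_share": max(hours.values()) / n,              # crude time-distribution summary
        }
    return features
```

Callee statistics would be computed the same way with the roles of caller and callee swapped.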

In step S12, the established harassing-call warning models are applied to the call detail record data. Specifically, one or more of the models can be used to analyze and filter the raw data along each dimension, and the records that meet a model's criteria are marked as warning data.
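Applying "one or more" models can be sketched as running each selected model over every number's features and keeping the numbers that at least one model flags. In the embodiment, each model is a trained decision tree. Here, the models are hypothetical rule-based stand-ins with illustrative thresholds, so that only the filtering logic is shown.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]
WarningModel = Callable[[Features], bool]


def apply_models(features_by_number: Dict[str, Features],
                 models: Dict[str, WarningModel]) -> Dict[str, List[str]]:
    """Run every selected model on every number. Numbers flagged by at least
    one model become warning data, together with the names of the models that fired."""
    warnings: Dict[str, List[str]] = {}
    for number, feats in features_by_number.items():
        hits = [name for name, model in models.items() if model(feats)]
        if hits:
            warnings[number] = hits
    return warnings


# Hypothetical stand-ins for trained decision trees; thresholds are illustrative only.
EXAMPLE_MODELS: Dict[str, WarningModel] = {
    "high_frequency": lambda f: f["call_count"] >= 100 and f["mean_duration_s"] < 15,
    "hourly": lambda f: f["peak_hour_share"] >= 0.8,
}
```

Recording which models fired lets the later review and handling steps treat, for example, a modem pool hit differently from an hourly-pattern hit.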

Step S13: Perform a secondary review of the warning data against the Internet crawler data to obtain the genuine harassing phone numbers.

Internet crawler data is introduced for verification. Using it as an additional dimension to check the warning data under secondary review makes it easier to tell whether a number is a genuine harassing number, so that the genuine harassing numbers can be obtained.
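The description does not state the exact review rule. One plausible reading is to confirm a warned number only when enough independent platforms tag it as harassment. The Python sketch below implements that rule under these assumptions: the `number -> {platform: tag}` layout, the set of harassment-type tags, and the default `min_sources=2` are all hypothetical.

```python
from typing import Dict, Iterable, List, Set


def secondary_review(warning_numbers: Iterable[str],
                     crawler_tags: Dict[str, Dict[str, str]],
                     harassment_tags: Set[str],
                     min_sources: int = 2) -> List[str]:
    """Confirm a warned number only if at least `min_sources` distinct
    platforms tag it with a harassment-type tag.
    crawler_tags maps number -> {platform: tag}."""
    confirmed: List[str] = []
    for number in warning_numbers:
        tags = crawler_tags.get(number, {})
        agreeing = {platform for platform, tag in tags.items() if tag in harassment_tags}
        if len(agreeing) >= min_sources:
            confirmed.append(number)
    return confirmed
```

Requiring agreement from several platforms is one way to dampen the malicious or mistaken end-user tags mentioned in the background section. Other crawler dimensions, such as enterprise or credit information, could be added as further checks.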

Step S14: Issue the harassing phone numbers so that they are shut down.

In this embodiment, harassing phone numbers that match a harassing-call warning model and pass verification against harassment characteristics are automatically dispatched to the company to which they belong, which is notified to shut them down. Alternatively, the harassing numbers can be linked to an existing automatic handling system so that they are shut down automatically, making shutdown more timely.
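The two handling routes can be sketched as a small dispatcher. The description presents the routes as alternatives. This sketch combines them as a design choice: it tries the automatic handling system first, when one is linked, and falls back to notifying the owning company. The `owner_of` mapping and the `notify`/`auto_shutdown` callbacks are hypothetical interfaces.

```python
from typing import Callable, Dict, Iterable, List, Optional


def dispatch_shutdowns(numbers: Iterable[str],
                       owner_of: Dict[str, str],
                       notify: Callable[[str, str], None],
                       auto_shutdown: Optional[Callable[[str], bool]] = None) -> Dict[str, List[str]]:
    """Route confirmed numbers for shutdown. If an automatic handling system is
    linked and accepts the number, it is shut down automatically; otherwise the
    owning company is notified to shut it down."""
    routed: Dict[str, List[str]] = {"auto": [], "notified": []}
    for number in numbers:
        if auto_shutdown is not None and auto_shutdown(number):
            routed["auto"].append(number)
        else:
            notify(owner_of.get(number, "unknown"), number)
            routed["notified"].append(number)
    return routed
```

Passing `auto_shutdown=None` reproduces the notification-only route, while supplying it gives the faster automatic route for the numbers the system accepts.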

By analyzing the characteristic differences between the calling behavior of suspected harassing callers and that of ordinary users, this embodiment uses a decision tree, a machine learning method, to learn communication harassment behavior and build a harassing call warning model. The model is then applied to find high-risk suspected harassing phone numbers, which are flagged and disposed of. The complete harassing call anomaly detection method is shown in Figure 5 and comprises:

Step S201: Acquire raw data.

Specifically, raw data is obtained from the business domain and the operation domain, and Internet crawler data is obtained from the Internet. The raw data includes call detail record (CDR) data, user profile data, and so on.

Step S202: Apply the harassing call warning models for analysis.

The harassing call warning models include: a high-frequency call warning model, a cat pool warning model, a high-risk user monitoring model, a silent card activation monitoring model, and an hourly model. One or more of these models can be applied to analyze and filter the raw data along various dimensions, and the records that meet a model's criteria are marked as warning data.

Step S203: Perform a secondary review using Internet crawler data.

Specifically, Internet crawler data serves as an additional dimension for verifying the warning data under secondary review, which better distinguishes whether a number is genuinely a harassing number.

Step S204: Determine whether the shutdown conditions are met. If not, go to step S205; if so, go to step S206.

Specifically, if a number matches the harassing call warning model and passes the Internet crawler data verification, it meets the shutdown conditions and is a genuine harassing phone number. Otherwise, it does not meet the shutdown conditions.

Step S205: Take no action.

If a number does not match the harassing call warning model and/or fails the Internet crawler data verification, it does not meet the shutdown conditions; no action is taken and its status remains unchanged.

Step S206: Automatically issue the harassing phone number.

If a number matches the harassing call warning model and passes the Internet crawler data verification, it is a genuine harassing phone number and is sent to the company to which it belongs, or linked to an existing automatic disposal system.

Step S207: Shut down the harassing phone number.

The notification body of the company that received the harassing phone number shuts the number down; alternatively, the existing automatic disposal system linked to the number shuts it down automatically.
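Steps S204 to S207 amount to a simple routing rule. The sketch below encodes it with a hypothetical `Action` enum and `dispose` function: a number is shut down only if it was both flagged by a warning model and verified by crawler data, and the `auto_system_available` flag (an assumption, not a parameter named in the patent) chooses between notifying the owning company and handing off to an automatic disposal system.

```python
from enum import Enum
from typing import Dict, Iterable, Set


class Action(Enum):
    NO_ACTION = "S205: no processing"
    NOTIFY_OPERATOR = "S206/S207: notify owning company to shut down"
    AUTO_SHUTDOWN = "S206/S207: hand off to automatic disposal system"


def dispose(warned: Set[str], verified: Set[str], candidates: Iterable[str],
            auto_system_available: bool) -> Dict[str, Action]:
    """Apply the S204 shutdown condition: model hit AND crawler check passed."""
    decisions = {}
    for number in candidates:
        if number in warned and number in verified:
            decisions[number] = (Action.AUTO_SHUTDOWN if auto_system_available
                                 else Action.NOTIFY_OPERATOR)
        else:
            # S205: leave the number's status unchanged.
            decisions[number] = Action.NO_ACTION
    return decisions
```

Requiring both conditions is what keeps a number with harassment-like call statistics but no outside corroboration from being shut down.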

The harassing call anomaly detection method of this embodiment uses more data sources: besides the caller's signaling data, it also ingests billing records, network access (subscription) information, data traffic information, and other related information to build the harassing call warning model. A warning model based on multiple data sources improves the accuracy and completeness of the analysis results. Moreover, in addition to conventional warning model analysis, this embodiment imports Internet crawler data; the fused data sources better distinguish normal calls from actual harassing calls with similar features, so the analysis results fit practical scenarios better and are more accurate and reliable. This innovative analysis approach offers strong guidance for building new harassing call analysis systems in the future. The generated harassing call analysis results can be further processed according to actual business needs, which improves the flexibility and breadth of applicability of the data and improves production efficiency.

In this embodiment of the present invention, call detail record (CDR) data and Internet crawler data are obtained; the harassing call warning model built from the CDR data is applied for analysis to obtain warning data; a secondary review of the warning data is performed based on the Internet crawler data to obtain the genuine harassing phone numbers; and the harassing phone numbers are issued so that they can be shut down. Processing is thus based on multiple data sources, fits practical scenarios better, and improves the accuracy and completeness of detection.

FIG. 6 is a schematic structural diagram of a harassing call anomaly detection device according to an embodiment of the present invention. As shown in FIG. 6, the device includes: a data acquisition unit 601, a model analysis unit 602, a secondary review unit 603, a number shutdown unit 604, and a model construction unit 605, where:

The data acquisition unit 601 is used to acquire call detail record (CDR) data and Internet crawler data; the model analysis unit 602 is used to apply the harassing call warning model built from the CDR data for analysis to obtain warning data; the secondary review unit 603 is used to perform a secondary review of the warning data based on the Internet crawler data to obtain the genuine harassing phone numbers; and the number shutdown unit 604 is used to issue the harassing phone numbers so that they can be shut down.

In an optional manner, the data acquisition unit 601 is used to collect the CDR data from the business domain and the operation domain, and to compute statistical behavioral features from the CDR data.

In an optional manner, the harassing call warning models include: a high-frequency call warning model, a cat pool warning model, a high-risk user monitoring model, a silent card activation monitoring model, and an hourly model.

In an optional manner, the model construction unit 605 is used to: obtain historical CDR data and historical Internet crawler data, and compute historical behavioral features from the historical CDR data; build a decision tree based on the historical behavioral features and the historical Internet crawler data; depending on the sample size, validate the decision tree using random subsampling or bootstrap sampling and adjust its parameters to determine the final decision tree; and use the decision tree to establish the harassing call warning model.
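The validation step can be sketched as generating train/test index splits with either random subsampling (repeated hold-out) or bootstrap sampling, where the training set is drawn with replacement and the out-of-bag samples form the test set. The rule "bootstrap for small samples, subsampling otherwise", the 1000-sample cut-off, the 30% hold-out fraction, and the function names are all assumptions; the patent only says the method is chosen by sample size. Parameter tuning on top of these splits is not shown.

```python
import random
from typing import List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def random_subsampling(n: int, rounds: int, test_frac: float,
                       rng: random.Random) -> List[Split]:
    """Repeated hold-out: each round shuffles indices and holds out test_frac of them."""
    n_test = max(1, round(n * test_frac))
    splits = []
    for _ in range(rounds):
        idx = list(range(n))
        rng.shuffle(idx)
        splits.append((idx[n_test:], idx[:n_test]))
    return splits


def bootstrap(n: int, rounds: int, rng: random.Random) -> List[Split]:
    """Draw n indices with replacement for training; unseen (out-of-bag) ones are the test set."""
    splits = []
    for _ in range(rounds):
        train = [rng.randrange(n) for _ in range(n)]
        seen = set(train)
        test = [i for i in range(n) if i not in seen]
        splits.append((train, test))
    return splits


def validation_splits(n: int, rounds: int = 5, small_sample_limit: int = 1000,
                      seed: int = 0) -> List[Split]:
    """Pick the scheme by sample size; the cut-off value is an assumption."""
    rng = random.Random(seed)
    if n < small_sample_limit:
        return bootstrap(n, rounds, rng)
    return random_subsampling(n, rounds, test_frac=0.3, rng=rng)
```

Bootstrap reuses every sample for training while still leaving roughly a third out-of-bag for testing, which is why it is a common choice when labeled data is scarce.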

In an optional manner, the model construction unit 605 is used to: compute information gain from the historical behavioral features; build branches in descending order of information gain, starting with the item that has the maximum gain, to construct the decision tree; and prune the decision tree based on the historical Internet crawler data, stopping tree construction when the information gain falls below a preset threshold.
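Read as the standard ID3 procedure, the construction rule above picks the feature with the largest information gain at each node, splits on it, and recurses, returning a majority-class leaf once the best remaining gain drops below a preset threshold. The minimal Python sketch below assumes categorical feature values and an illustrative default threshold of 0.01; the crawler-data pruning step is not modeled because the patent does not describe how it works.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Union

Row = Dict[str, str]
Tree = Union[str, Dict]  # a leaf label, or {feature: {value: subtree}}


def entropy(labels: Sequence[str]) -> float:
    """Empirical entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows: List[Row], labels: List[str], feature: str) -> float:
    """Empirical entropy minus the conditional entropy given `feature`."""
    n = len(rows)
    cond = 0.0
    for value in {r[feature] for r in rows}:
        sub = [lab for r, lab in zip(rows, labels) if r[feature] == value]
        cond += len(sub) / n * entropy(sub)
    return entropy(labels) - cond


def build_tree(rows: List[Row], labels: List[str], features: List[str],
               threshold: float = 0.01) -> Tree:
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    gains = {f: info_gain(rows, labels, f) for f in features}
    best = max(gains, key=gains.get)
    if gains[best] < threshold:
        return majority  # stop: best remaining gain is below the preset threshold
    remaining = [f for f in features if f != best]
    branches = {}
    for value in {r[best] for r in rows}:
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], remaining, threshold)
    return {best: branches}
```

For example, with features `empty` and `freq` where only `empty` separates harassing from normal numbers, the tree splits once on `empty` and never reaches `freq`.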

In an optional manner, the model construction unit 605 is used to: obtain the empirical entropy and conditional entropy corresponding to the historical behavioral features; and compute the information gain from the empirical entropy and the conditional entropy.
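For reference, the standard ID3 definitions behind this step are shown below, with $D$ the training set, $C_k$ its $K$ classes (for example harassing vs. normal), and $D_i$ the $n$ subsets produced by splitting on feature $A$. The worked numbers use a hypothetical 10-sample set and are illustrative only.

$$
H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|}\log_2\frac{|C_k|}{|D|},\qquad
H(D \mid A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|}\,H(D_i),\qquad
g(D, A) = H(D) - H(D \mid A)
$$

Suppose $D$ holds 4 harassing and 6 normal numbers, and a hypothetical feature $A$ ("high empty-call rate") splits it into $D_1$ (4 harassing, 1 normal) and $D_2$ (0 harassing, 5 normal):

$$
\begin{aligned}
H(D) &= -0.4\log_2 0.4 - 0.6\log_2 0.6 \approx 0.971\\
H(D_1) &= -0.8\log_2 0.8 - 0.2\log_2 0.2 \approx 0.722,\qquad H(D_2) = 0\\
H(D \mid A) &= \tfrac{5}{10}(0.722) + \tfrac{5}{10}(0) \approx 0.361\\
g(D, A) &\approx 0.971 - 0.361 = 0.610
\end{aligned}
$$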

In an optional manner, the number shutdown unit 604 is used to: automatically send the harassing phone number to the notification body of the company to which it belongs for shutdown; or link the harassing phone number to an existing automatic disposal system for automatic shutdown.

In this embodiment of the present invention, call detail record (CDR) data and Internet crawler data are obtained; the harassing call warning model built from the CDR data is applied for analysis to obtain warning data; a secondary review of the warning data is performed based on the Internet crawler data to obtain the genuine harassing phone numbers; and the harassing phone numbers are issued so that they can be shut down. Processing is thus based on multiple data sources, fits practical scenarios better, and improves the accuracy and completeness of detection.

An embodiment of the present invention provides a non-volatile computer storage medium storing at least one executable instruction; the executable instruction can perform the harassing call anomaly detection method of any of the above method embodiments.

The executable instruction may specifically be used to cause a processor to perform the following operations:

Obtain call detail record (CDR) data and Internet crawler data;

Apply the harassing call warning model built from the CDR data for analysis to obtain warning data;

Perform a secondary review of the warning data based on the Internet crawler data to obtain the genuine harassing phone numbers;

Issue the harassing phone numbers so that they can be shut down.

In an optional manner, the executable instruction causes the processor to perform the following operation:

Collect the CDR data from the business domain and the operation domain, and compute statistical behavioral features from the CDR data.

In an optional manner, the harassing call warning models include: a high-frequency call warning model, a cat pool warning model, a high-risk user monitoring model, a silent card activation monitoring model, and an hourly model.

In an optional manner, the executable instruction causes the processor to perform the following operations:

Obtain historical CDR data and historical Internet crawler data, and compute historical behavioral features from the historical CDR data;

Build a decision tree based on the historical behavioral features and the historical Internet crawler data;

Depending on the sample size, validate the decision tree using random subsampling or bootstrap sampling and adjust its parameters to determine the final decision tree;

Use the decision tree to establish the harassing call warning model.

In an optional manner, the executable instruction causes the processor to perform the following operations:

Compute information gain from the historical behavioral features;

Build branches in descending order of information gain, starting with the item that has the maximum gain, to construct the decision tree;

Prune the decision tree based on the historical Internet crawler data, and stop tree construction when the information gain falls below a preset threshold.

In an optional manner, the executable instruction causes the processor to perform the following operations:

Obtain the empirical entropy and conditional entropy corresponding to the historical behavioral features;

Compute the information gain from the empirical entropy and the conditional entropy.

In an optional manner, the executable instruction causes the processor to perform the following operations:

Automatically send the harassing phone number to the notification body of the company to which it belongs for shutdown; or

Link the harassing phone number to an existing automatic disposal system for automatic shutdown.

In this embodiment of the present invention, call detail record (CDR) data and Internet crawler data are obtained; the harassing call warning model built from the CDR data is applied for analysis to obtain warning data; a secondary review of the warning data is performed based on the Internet crawler data to obtain the genuine harassing phone numbers; and the harassing phone numbers are issued so that they can be shut down. Processing is thus based on multiple data sources, fits practical scenarios better, and improves the accuracy and completeness of detection.

An embodiment of the present invention provides a computer program product, which includes a computer program stored on a computer storage medium. The computer program includes program instructions which, when executed by a computer, cause the computer to perform the harassing call anomaly detection method of any of the above method embodiments.

The executable instructions may specifically be used to cause a processor to perform the following operations:

Obtain call detail record (CDR) data and Internet crawler data;

Apply the harassing call warning model built from the CDR data for analysis to obtain warning data;

Perform a secondary review of the warning data based on the Internet crawler data to obtain the genuine harassing phone numbers;

Issue the harassing phone numbers so that they can be shut down.

In an optional manner, the executable instructions cause the processor to perform the following operation:

Collect the CDR data from the business domain and the operation domain, and compute statistical behavioral features from the CDR data.

In an optional manner, the harassing call warning models include: a high-frequency call warning model, a cat pool warning model, a high-risk user monitoring model, a silent card activation monitoring model, and an hourly model.

In an optional manner, the executable instructions cause the processor to perform the following operations:

Obtain historical CDR data and historical Internet crawler data, and compute historical behavioral features from the historical CDR data;

Build a decision tree based on the historical behavioral features and the historical Internet crawler data;

Depending on the sample size, validate the decision tree using random subsampling or bootstrap sampling and adjust its parameters to determine the final decision tree;

Use the decision tree to establish the harassing call warning model.

In an optional manner, the executable instructions cause the processor to perform the following operations:

Compute information gain from the historical behavioral features;

Build branches in descending order of information gain, starting with the item that has the maximum gain, to construct the decision tree;

Prune the decision tree based on the historical Internet crawler data, and stop tree construction when the information gain falls below a preset threshold.

In an optional manner, the executable instructions cause the processor to perform the following operations:

Obtain the empirical entropy and conditional entropy corresponding to the historical behavioral features;

Compute the information gain from the empirical entropy and the conditional entropy.

In an optional manner, the executable instructions cause the processor to perform the following operations:

Automatically send the harassing phone number to the notification body of the company to which it belongs for shutdown; or

Link the harassing phone number to an existing automatic disposal system for automatic shutdown.

In this embodiment of the present invention, call detail record (CDR) data and Internet crawler data are obtained; the harassing call warning model built from the CDR data is applied for analysis to obtain warning data; a secondary review of the warning data is performed based on the Internet crawler data to obtain the genuine harassing phone numbers; and the harassing phone numbers are issued so that they can be shut down. Processing is thus based on multiple data sources, fits practical scenarios better, and improves the accuracy and completeness of detection.

FIG. 7 is a schematic structural diagram of a computing device according to an embodiment of the present invention; the specific embodiments of the present invention do not limit the specific implementation of the device.

As shown in FIG. 7, the computing device may include: a processor 702, a communications interface 704, a memory 706, and a communication bus 708.

The processor 702, the communications interface 704, and the memory 706 communicate with one another via the communication bus 708. The communications interface 704 is used to communicate with network elements of other devices, such as clients or other servers. The processor 702 is used to execute a program 710, and may specifically perform the relevant steps of the above harassing call anomaly detection method embodiments.

Specifically, the program 710 may include program code, and the program code includes computer operation instructions.

The processor 702 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The one or more processors included in the device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

The memory 706 is used to store the program 710. The memory 706 may include high-speed RAM and may also include non-volatile memory, such as at least one disk storage device.

The program 710 may specifically be used to cause the processor 702 to perform the following operations:

Obtain call detail record (CDR) data and Internet crawler data;

Apply the harassing call warning model built from the CDR data for analysis to obtain warning data;

Perform a secondary review of the warning data based on the Internet crawler data to obtain the genuine harassing phone numbers;

Issue the harassing phone numbers so that they can be shut down.

In an optional manner, the program 710 causes the processor to perform the following operation:

Collect the CDR data from the business domain and the operation domain, and compute statistical behavioral features from the CDR data.

In an optional manner, the harassing call warning models include: a high-frequency call warning model, a cat pool warning model, a high-risk user monitoring model, a silent card activation monitoring model, and an hourly model.

In an optional manner, the program 710 causes the processor to perform the following operations:

Obtain historical CDR data and historical Internet crawler data, and compute historical behavioral features from the historical CDR data;

Build a decision tree based on the historical behavioral features and the historical Internet crawler data;

Depending on the sample size, validate the decision tree using random subsampling or bootstrap sampling and adjust its parameters to determine the final decision tree;

Use the decision tree to establish the harassing call warning model.

In an optional manner, the program 710 causes the processor to perform the following operations:

Compute information gain from the historical behavioral features;

Build branches in descending order of information gain, starting with the item that has the maximum gain, to construct the decision tree;

Prune the decision tree based on the historical Internet crawler data, and stop tree construction when the information gain falls below a preset threshold.

In an optional manner, the program 710 causes the processor to perform the following operations:

Obtain the empirical entropy and conditional entropy corresponding to the historical behavioral features;

Compute the information gain from the empirical entropy and the conditional entropy.

In an optional manner, the program 710 causes the processor to perform the following operations:

Automatically send the harassing phone number to the notification body of the company to which it belongs for shutdown; or

Link the harassing phone number to an existing automatic disposal system for automatic shutdown.

In this embodiment of the present invention, call detail record (CDR) data and Internet crawler data are obtained; the harassing call warning model built from the CDR data is applied for analysis to obtain warning data; a secondary review of the warning data is performed based on the Internet crawler data to obtain the genuine harassing phone numbers; and the harassing phone numbers are issued so that they can be shut down. Processing is thus based on multiple data sources, fits practical scenarios better, and improves the accuracy and completeness of detection.

The algorithms and displays provided herein are not inherently tied to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein, and the structure required to construct such systems is apparent from the description above. Furthermore, the embodiments of the present invention are not directed to any particular programming language. It should be understood that the content of the invention described herein can be implemented in various programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.

The description provided herein sets out numerous specific details. However, it is understood that embodiments of the present invention can be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.

Similarly, it should be understood that, to streamline the disclosure and aid understanding of one or more of the various inventive aspects, the features of the embodiments of the present invention are sometimes grouped together into a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim.

Those skilled in the art will appreciate that the modules in the devices of the embodiments may be adaptively changed and arranged in one or more devices different from those of the embodiments. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and may also be divided into multiple submodules, subunits, or subcomponents. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.

It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names. Unless otherwise specified, the steps in the above embodiments should not be understood as limiting the order of execution.

Claims (8)

The model construction unit is used for acquiring historical call detail record (CDR) data and historical Internet crawler data and computing historical behavioral features from the historical CDR data; constructing a decision tree according to the historical behavioral features and the historical Internet crawler data, which comprises: acquiring information gain according to the historical behavioral features; sequentially establishing branches in descending order of information gain, starting from the item with the maximum value, to construct the decision tree; and pruning the decision tree according to the historical Internet crawler data, stopping construction of the decision tree when the information gain is smaller than a preset threshold; verifying the decision tree by a random subsampling method or a bootstrap sampling method according to the number of samples, adjusting the parameters of the decision tree, and determining the final decision tree; and establishing a harassing call warning model by using the decision tree;

Priority Applications (1)

Application number | Priority date | Filing date | Title
CN202010961602.9A (CN114189585B) | 2020-09-14 | 2020-09-14 | Method, device and computing equipment for detecting abnormal harassment calls

Applications Claiming Priority (1)

Application number | Priority date | Filing date | Title
CN202010961602.9A (CN114189585B) | 2020-09-14 | 2020-09-14 | Method, device and computing equipment for detecting abnormal harassment calls

Publications (2)

Publication number | Publication date
CN114189585A (en) | 2022-03-15
CN114189585B (en) | 2024-08-27

Family

ID=80539037

Family Applications (1)

Application number | Title | Priority date | Filing date
CN202010961602.9A (Active, CN114189585B) | Method, device and computing equipment for detecting abnormal harassment calls | 2020-09-14 | 2020-09-14

Country Status (1)

Country | Link
CN (1) | CN114189585B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115022464A (en) * | 2022-05-06 | 2022-09-06 | China United Network Communications Group Co Ltd | Number processing method, system, computing device and storage medium
CN115221940A (en) * | 2022-06-01 | 2022-10-21 | National Computer Network and Information Security Management Center | Method and system for analyzing abnormal behavior in historical communication records based on a decision tree

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN108462785A (en) * | 2017-02-21 | 2018-08-28 | China Mobile Group Zhejiang Co Ltd | Method and device for processing malicious phone calls
CN110401779A (en) * | 2018-04-24 | 2019-11-01 | China Mobile Communications Group Co Ltd | Method, device and computer-readable storage medium for identifying telephone numbers

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US9692885B2 (en) * | 2015-11-17 | 2017-06-27 | Microsoft Technology Licensing, Llc | Determining scam risk during a voice call
US10284720B2 (en) * | 2016-11-01 | 2019-05-07 | Transaction Network Services, Inc. | Systems and methods for automatically conducting risk assessments for telephony communications
CN110147430A (en) * | 2019-04-25 | 2019-08-20 | Shanghai Xinfang Intelligent System Co Ltd | Harassing call identification method and system based on a random forest algorithm

Also Published As

Publication number | Publication date
CN114189585A (en) | 2022-03-15

Similar Documents

Publication | Title
US12184697B2 (en) | AI-driven defensive cybersecurity strategy analysis and recommendation system
US20220210200A1 (en) | AI-driven defensive cybersecurity strategy analysis and recommendation system
US11218510B2 (en) | Advanced cybersecurity threat mitigation using software supply chain analysis
US11184401B2 (en) | AI-driven defensive cybersecurity strategy analysis and recommendation system
US20220239672A1 (en) | Malware data clustering
US20240241752A1 (en) | Risk profiling and rating of extended relationships using ontological databases
US20210019674A1 (en) | Risk profiling and rating of extended relationships using ontological databases
US20200389495A1 (en) | Secure policy-controlled processing and auditing on regulated data sets
US10862906B2 (en) | Playbook based data collection to identify cyber security threats
US12225049B2 (en) | System and methods for integrating datasets and automating transformation workflows using a distributed computational graph
WO2021216163A2 (en) | AI-driven defensive cybersecurity strategy analysis and recommendation system
CN112733045B (en) | User behavior analysis method and device and electronic equipment
CN111724069A (en) | Method, apparatus, device and storage medium for processing data
US20220405535A1 (en) | Data log content assessment using machine learning
CN113282920B (en) | Log abnormality detection method, device, computer equipment and storage medium
CN114189585B (en) | Method, device and computing equipment for detecting abnormal harassment calls
CN115001934A (en) | Industrial control safety risk analysis system and method
CN116963072A (en) | Fraud user early warning method and device, electronic equipment and storage medium
CN118138284A (en) | API interface data safety monitoring and automatic response method and system based on flow characteristic analysis
CN117421640A (en) | API asset identification method, device, equipment and storage medium
US12321454B1 (en) | Command line parsing for classification of process start alerts
CN114663100A (en) | Transaction data processing method and device and terminal equipment
CN116033428A (en) | Method and device for processing abnormal telecommunication users
CN118734263B (en) | Service content digital management system and operation method based on data processing
US20240195841A1 (en) | System and method for manipulation of secure data

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
