CN112134719A


Info

Publication number: CN112134719A
Application number: CN201910556924.2A
Authority: CN
Inventors: 穆青
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2019-06-25
Filing date: 2019-06-25
Publication date: 2020-12-25
Also published as: WO2020258982A1

Abstract



Embodiments of the present invention disclose a method and system for analyzing base station security logs. The method includes the following steps.

- A data aggregation system caches the security logs reported by base stations in real time into a first data cache system, based on a first distributed cluster.
- A log feature calculation system calculates feature values of the security logs based on a second distributed cluster and caches them in a second data cache system.
- A feature value real-time analysis system analyzes the feature values in real time, based on a streaming computing framework, to obtain analysis results.
- A data presentation device displays the analysis results.

The embodiments use distributed clusters and a streaming computing framework to analyze base station security logs as they are reported. This shortens the time needed to discover high-risk security problems, greatly reduces manual analysis time, and shortens the problem feedback cycle. It also reduces manpower input and improves management efficiency.

Description


A method and system for analyzing base station security logs

Technical field

The embodiments of the present invention relate to, but are not limited to, the fields of computers and communications. In particular, they relate to a method and system for analyzing base station security logs.

Background

With the development of 5G wireless network technology, radio access networks contain more and more base stations. Collecting and analyzing the command operations, data operations, and access security of these base stations is therefore increasingly important.

Traditionally, base station security logs are collected by a periodic scheduled task. After collection, all security logs are parsed and stored in a database, and they are analyzed manually only after a fault occurs. This process has two weaknesses:

- There is a large delay in obtaining the data.
- Manual analysis may overlook critical security logs, which can greatly delay locating and handling the problem.

The growth of 5G wireless networks and the Internet of Things has greatly expanded network scale. A live network produces thousands of security logs every day, so manual analysis requires a large investment of manpower.

Summary of the invention

Embodiments of the present invention provide a method and system that analyze base station security logs in real time, thereby shortening the time needed to discover high-risk security problems.

An embodiment of the present invention provides a method for analyzing base station security logs, including:

A data aggregation system caches the security logs reported by base stations in real time into a first data cache system, based on a first distributed cluster;

A log feature calculation system calculates feature values of the security logs based on a second distributed cluster and caches them in a second data cache system;

A feature value real-time analysis system analyzes the feature values in real time, based on a streaming computing framework, to obtain analysis results;

A data presentation device displays the analysis results.

An embodiment of the present invention provides a system for analyzing base station security logs, including:

a data aggregation system, configured to cache the security logs reported by base stations in real time into a first data cache system, based on a first distributed cluster;

a log feature calculation system, configured to calculate feature values of the security logs based on a second distributed cluster and cache them in a second data cache system;

a feature value real-time analysis system, configured to analyze the feature values in real time, based on a streaming computing framework, to obtain analysis results;

a data presentation device, configured to display the analysis results.

The embodiments of the present invention work as follows:

- The data aggregation system caches the security logs reported by base stations in real time into the first data cache system, based on the first distributed cluster.
- The log feature calculation system calculates feature values of the security logs based on the second distributed cluster and caches them in the second data cache system.
- The feature value real-time analysis system analyzes the feature values in real time, based on the streaming computing framework, to obtain analysis results.
- The data presentation device displays the analysis results.

This design analyzes security logs as base stations report them. It shortens the time needed to discover high-risk security problems, greatly reduces manual analysis time, shortens the problem feedback cycle, reduces manpower input, and improves management efficiency.

Other features and advantages of the embodiments are set forth in the description that follows. Some will be apparent from the description, and others may be learned by practicing the embodiments. The objectives and other advantages of the embodiments may be realized and attained by the structures particularly pointed out in the description, claims, and drawings.

Description of drawings

The accompanying drawings provide a further understanding of the technical solutions of the embodiments and form part of the specification. Together with the embodiments, they explain the technical solutions; they do not limit them.

FIG. 1 is a flowchart of a method for analyzing base station security logs according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a system for analyzing base station security logs according to another embodiment of the present invention.

Detailed description

The embodiments of the present invention are described in detail below with reference to the accompanying drawings. Unless they conflict, the embodiments and the features within them may be combined with each other in any way.

The steps shown in the flowcharts may be executed in a computer system, for example as a set of computer-executable instructions. Although the flowcharts show a logical order, in some cases the steps may be performed in a different order.

Referring to FIG. 1, an embodiment of the present invention provides a method for analyzing base station security logs, including the following steps.

Step 100: The data aggregation system caches the security logs reported by base stations in real time into the first data cache system, based on the first distributed cluster.

In this embodiment, a security log includes one or more records. Each record contains:

- basic information: log type, log level, base station identifier (ID), base station Internet Protocol (IP) address, service name, and PID;
- the specific log content, such as the source and cause of the security event.

For example, one record of a security log is:

45673491238900123<37>1 2017-12-13T05:42:16.156Z gnb1 g2z7p - 2 - src_ip:192.254.1.100,u:ad,content:login fail.

The fields of this record are:

- 45673491238900123 is the unique identifier of the record (the security log identifier).
- <37> is the PRI (priority) value.
- 1 is the version number.
- 2017-12-13T05:42:16.156Z is the time the record was generated.
- gnb1 is the host name.
- g2z7p is the service name.
- The first "-" is the PID field; "-" is the placeholder used when there is no PID.
- 2 is the event number.
- The second "-" is the structured-data field; "-" is the placeholder used when there is no structured data.
- src_ip:192.254.1.100,u:ad,content:login fail. is the specific content of the log.

As shown above, each log record consists of the PRI, log version, time, base station identifier (host name), service name, PID, event number, structured data, and log content.
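The record layout above follows the RFC 5424 header order, with the security log identifier prefixed. The sketch below splits such a record into its fields. It is a minimal sketch that assumes two things not stated in the text: header fields are separated by single spaces, and "-" is the placeholder for an absent field. `parse_record` and `LogRecord` are illustrative names, not part of the described system.

```python
from dataclasses import dataclass
from typing import Optional

NIL = "-"  # RFC 5424 NILVALUE, used as the placeholder for absent PID / structured data


@dataclass
class LogRecord:
    log_id: str                     # security log identifier prefixed to the record
    pri: int                        # PRI value between '<' and '>'
    version: int
    timestamp: str
    hostname: str                   # base station identifier
    app_name: str                   # service name
    procid: Optional[str]           # PID, None when the placeholder is used
    msgid: str                      # event number
    structured_data: Optional[str]  # None when the placeholder is used
    msg: str                        # free-text log content


def parse_record(line: str) -> LogRecord:
    """Split '<id><PRI>VERSION TIME HOST APP PID MSGID SD CONTENT' into fields."""
    log_id, rest = line.split("<", 1)
    pri_text, rest = rest.split(">", 1)
    # maxsplit=7 keeps the log content intact even though it may contain spaces
    version, ts, host, app, procid, msgid, sd, msg = rest.split(" ", 7)
    return LogRecord(
        log_id=log_id.strip(),
        pri=int(pri_text),
        version=int(version),
        timestamp=ts,
        hostname=host,
        app_name=app,
        procid=None if procid == NIL else procid,
        msgid=msgid,
        structured_data=None if sd == NIL else sd,
        msg=msg,
    )


rec = parse_record("45673491238900123<37>1 2017-12-13T05:42:16.156Z gnb1 g2z7p - 2 - "
                   "src_ip:192.254.1.100,u:ad,content:login fail.")
```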

The log type (facility) and log level (severity) can be parsed from the PRI value. Each event number defines its own format for the log content. In this example, event 2 is a user login failure, so the log content has three attributes:

- src_ip: the IP address of the user attempting to log in;
- u: the login name used;
- content: a description.
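The event-specific content shown above is a list of key:value pairs. The following is a minimal sketch of parsing it, under two assumptions the text does not state: values never contain commas, and only the first ":" in each pair separates key from value. `parse_event_content` and `LOGIN_FAILURE_ATTRS` are hypothetical names.

```python
def parse_event_content(content: str) -> dict[str, str]:
    """Parse 'key:value,key:value' log content into a dict.

    Assumes values never contain commas. Only the first ':' in each pair is
    treated as the separator, so a value that itself contains ':' survives.
    """
    attrs: dict[str, str] = {}
    for pair in content.split(","):
        key, sep, value = pair.partition(":")
        if not sep:
            raise ValueError(f"malformed attribute {pair!r}")
        attrs[key.strip()] = value.strip()
    return attrs


# Attributes of event 2 (user login failure), per the description above
LOGIN_FAILURE_ATTRS = ("src_ip", "u", "content")

attrs = parse_event_content("src_ip:192.254.1.100,u:ad,content:login fail.")
missing = [name for name in LOGIN_FAILURE_ATTRS if name not in attrs]
```

Different event numbers would map to different expected attribute sets. This sketch only checks the one listed for event 2.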

In this embodiment, both parameter modifications on a base station and access logins to it generate security logs. Each security log is reported in real time as soon as it is generated.

In an exemplary instance, the first data cache system is message middleware implemented with the open-source distributed messaging system Kafka.

In another embodiment of the present invention, the data aggregation system caches the security logs into the first data cache system, based on the first distributed cluster, as follows:

the data aggregation system assigns a security log identifier to each security log based on the first distributed cluster, and caches a first correspondence between the security log identifier and the security log in the first data cache system;

the feature value of the security log includes the security log identifier;

and the method further includes: a data storage system persistently stores the first correspondence, the feature values of the security logs, and the analysis results.

In an exemplary instance, the first distributed cluster includes an access gateway, ZooKeeper, and two or more collection servers (Collection-Server). Assigning security log identifiers and caching the first correspondence in the first data cache system then works as follows:

ZooKeeper maintains the cluster state of the collection servers. The cluster state is either joined (logged in to the cluster) or exited (logged out of the cluster);

the access gateway obtains the cluster state of the collection servers from ZooKeeper and dispatches the security logs to the collection servers that are in the joined state;

the collection server in the joined state assigns a security log identifier to the security log and caches the first correspondence between the identifier and the security log in the first data cache system.

The access gateway exposes a single, unified access entry point to the base stations.
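The dispatch rule above can be sketched as follows: the gateway sends each security log only to a collection server in the joined state. In this sketch a plain dict stands in for the cluster state held in ZooKeeper, and round-robin is an assumed balancing policy; the text does not say how the gateway chooses among joined servers. `AccessGateway`, `ClusterState`, and the server names are hypothetical.

```python
import itertools
from enum import Enum


class ClusterState(Enum):
    JOINED = "joined"  # the collection server has logged in to the cluster
    EXITED = "exited"  # the collection server has logged out of the cluster


class AccessGateway:
    """Round-robin dispatch of security logs to collection servers in the JOINED state.

    `registry` is a stand-in for the cluster state that ZooKeeper maintains;
    this sketch reads a plain dict instead of talking to ZooKeeper.
    """

    def __init__(self, registry: dict[str, ClusterState]):
        self.registry = registry
        self._counter = itertools.count()

    def pick_server(self) -> str:
        # Re-read the registry on every call so servers that join or exit are honoured.
        joined = sorted(name for name, state in self.registry.items()
                        if state is ClusterState.JOINED)
        if not joined:
            raise RuntimeError("no collection server has joined the cluster")
        return joined[next(self._counter) % len(joined)]


registry = {"cs-1": ClusterState.JOINED, "cs-2": ClusterState.EXITED, "cs-3": ClusterState.JOINED}
gateway = AccessGateway(registry)
targets = [gateway.pick_server() for _ in range(4)]  # ['cs-1', 'cs-3', 'cs-1', 'cs-3']
```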

In an exemplary instance, the method further includes: ZooKeeper assigns an identifier to each collection server.

In an exemplary instance, the security log identifier is globally unique for each security log. It can be assigned with Snowflake, the distributed globally unique ID generation algorithm open-sourced by Twitter.
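The sketch below shows a Snowflake-style ID generator. It is not Twitter's code. It assumes the classic bit layout: a 41-bit millisecond timestamp, a 10-bit worker ID, and a 12-bit per-millisecond sequence. It also assumes the worker ID is the per-collection-server identifier assigned by ZooKeeper, as described above, so IDs from different collection servers never collide. `SnowflakeIdGenerator` is an illustrative name.

```python
import threading
import time


class SnowflakeIdGenerator:
    """Snowflake-style 64-bit IDs: 41-bit ms timestamp | 10-bit worker id | 12-bit sequence."""

    EPOCH_MS = 1288834974657  # Twitter's custom epoch; any fixed past instant would do
    WORKER_BITS = 10
    SEQUENCE_BITS = 12
    MAX_WORKER = (1 << WORKER_BITS) - 1
    SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1

    def __init__(self, worker_id: int):
        # Assumption: worker_id is the identifier ZooKeeper assigned to this collection server
        if not 0 <= worker_id <= self.MAX_WORKER:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue duplicate IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & self.SEQUENCE_MASK
                if self.sequence == 0:  # 4096 IDs used in this millisecond: wait for the next one
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS)) \
                | (self.worker_id << self.SEQUENCE_BITS) | self.sequence


generator = SnowflakeIdGenerator(worker_id=7)
ids = [generator.next_id() for _ in range(1000)]
```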

In an exemplary instance, the data storage system includes two or more data servers and a search server cluster;

the data storage system persistently stores the first correspondence, the feature values of the security logs, and the analysis results as follows:

each data server obtains the first correspondence from the first data cache system and the feature values from the second data cache system, and saves both to the search server cluster;

the data server also obtains the analysis results from the feature value real-time analysis system and saves them to the search server cluster.

In an exemplary instance, the search server cluster is an Elasticsearch (ES) cluster.

In another exemplary instance, the data server also maintains the storage time of the first correspondence and the feature values in the search server cluster.
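The text says only that the data server tracks storage time. The sketch below assumes that storage time is used to expire old entries after a fixed retention period. An in-memory dict keyed by security log identifier stands in for the search server cluster. In a real Elasticsearch deployment, retention is more commonly handled with time-based indices or index lifecycle management. `RetentionStore` and `retention_seconds` are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredEntry:
    raw_log: str      # the security log from the first correspondence
    features: dict    # the feature value sharing the same security log identifier
    stored_at: float  # storage time tracked by the data server (epoch seconds)


class RetentionStore:
    """In-memory stand-in for the search server cluster, keyed by security log identifier."""

    def __init__(self, retention_seconds: float):
        self.retention_seconds = retention_seconds
        self.entries: dict[str, StoredEntry] = {}

    def save(self, log_id: str, raw_log: str, features: dict,
             now: Optional[float] = None) -> None:
        stored_at = time.time() if now is None else now
        self.entries[log_id] = StoredEntry(raw_log, features, stored_at)

    def purge_expired(self, now: Optional[float] = None) -> int:
        """Delete entries older than the retention period; return how many were removed."""
        now = time.time() if now is None else now
        expired = [log_id for log_id, entry in self.entries.items()
                   if now - entry.stored_at > self.retention_seconds]
        for log_id in expired:
            del self.entries[log_id]
        return len(expired)
```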

Step 101: The log feature calculation system calculates feature values of the security logs based on the second distributed cluster and caches them in the second data cache system.

In an exemplary instance, the second data cache system is message middleware implemented with the open-source distributed messaging system Kafka.

In an exemplary instance, the second distributed cluster includes two or more feature calculation servers. Calculating the feature values and caching them in the second data cache system then works as follows:

each feature calculation server calculates the feature values of the security logs and caches them in the second data cache system.

In an exemplary instance, a feature calculation server calculates the feature value of a security log as follows:

the feature calculation server extracts one or more predefined feature vector values from the security log and combines them into the feature value.

For example, as shown in Table 1, a security log includes the following fields:

- id: the unique identifier of the record;
- PRI: the priority value;
- time: the time the record was generated;
- hostname: the host name;
- app-name: the service name;
- msgid: the event ID.

The feature vectors used are:

- Id: the unique identifier of the record;
- Facility: the facility code;
- Severity: the severity level;
- Time: the time the record was generated;
- NBId: the base station identifier;
- ServerName: the service name;
- EventId: the event ID.

Facility and Severity are both computed from the PRI field, while the other feature vectors are extracted directly from the security log. As shown in Table 1, Facility = PRI >> 3 and Severity = PRI & 0x7.

| Feature vector | Field in the security log |
|---|---|
| Id | Id |
| Facility | PRI >> 3 |
| Severity | PRI & 0x7 |
| Time | Time |
| NBId | Hostname |
| ServerName | App-name |
| EventId | MsgId |

Table 1
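The two computed rows of Table 1 reflect how RFC 5424 encodes PRI: PRI = Facility × 8 + Severity. Shifting right by 3 bits therefore recovers Facility, and masking the low 3 bits recovers Severity. For the sample record's <37>, this gives Facility 4 (security/authorization messages) and Severity 5 (Notice). `decode_pri` is an illustrative name.

```python
def decode_pri(pri: int) -> tuple[int, int]:
    """Split an RFC 5424 PRI value into (Facility, Severity), where PRI = Facility * 8 + Severity."""
    if not 0 <= pri <= 191:  # 23 facilities * 8 + 7 is the largest valid PRI
        raise ValueError("PRI must be in the range 0..191")
    return pri >> 3, pri & 0x7


facility, severity = decode_pri(37)  # (4, 5)
```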

Extraction of feature vector Id: Id is the unique identifier that the data aggregation system assigns to the log record. It is copied as-is, without any computation;

Extraction of feature vector Facility: Facility is obtained from the PRI value in the log record by a bitwise operation (right shift by 3 bits);

Extraction of feature vector Severity: Severity is obtained from the PRI value in the log record by a bitwise operation (bitwise AND with 0x7);

Extraction of feature vector Time: Time is taken from the Time field of the log record;

Extraction of feature vector NBId: NBId is taken from the Hostname field of the log record;

Extraction of feature vector ServerName: ServerName is taken from the App-name field of the log record;

Extraction of feature vector EventId: EventId is taken from the MsgId field of the log record. MsgId also contains some internal special identifiers, which are stripped when EventId is extracted;

Other feature vectors, such as the client IP and the login user name, are obtained from the log content. The log content uses an internally defined, normalized format shared by all service modules.
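The extraction rules above can be combined into a single function that builds one feature value per record. This is a minimal sketch. The input is assumed to be a dict of already-parsed header fields with hypothetical key names. The text does not define the internal markers in MsgId, so the stripping rule here (keep the leading run of digits) is purely an assumption.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureValue:
    id: str           # security log identifier, copied as-is
    facility: int     # PRI >> 3
    severity: int     # PRI & 0x7
    time: str         # from Time
    nb_id: str        # from Hostname
    server_name: str  # from App-name
    event_id: int     # from MsgId, internal markers stripped


# Hypothetical stripping rule: the event number is assumed to be MsgId's leading digits.
_EVENT_DIGITS = re.compile(r"\d+")


def extract_features(record: dict) -> FeatureValue:
    """Build a feature value from parsed header fields, following the Table 1 mapping."""
    pri = int(record["pri"])
    match = _EVENT_DIGITS.match(record["msgid"])
    if match is None:
        raise ValueError(f"no event number in MsgId {record['msgid']!r}")
    return FeatureValue(
        id=record["id"],
        facility=pri >> 3,
        severity=pri & 0x7,
        time=record["time"],
        nb_id=record["hostname"],
        server_name=record["app_name"],
        event_id=int(match.group()),
    )


sample = {"id": "45673491238900123", "pri": 37, "time": "2017-12-13T05:42:16.156Z",
          "hostname": "gnb1", "app_name": "g2z7p", "msgid": "2"}
fv = extract_features(sample)
```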

In an exemplary instance, the security log identifier can be used as one of the feature vectors, so that each feature value can be associated with its security log.

In this embodiment, different feature calculation servers may use the same feature vectors or different ones, to meet business needs flexibly.

Step 102: The feature value real-time analysis system analyzes the feature values of the security logs in real time, based on the streaming computing framework, to obtain analysis results.

In this embodiment, three feature vectors describe where a log came from and how serious it is:

- Facility captures the type of program that generated the log;
- Severity captures the severity level of the log record;
- ServerName captures the base station program module that generated the security log.

A predefined scoring algorithm over Facility, Severity, and ServerName can therefore compute a score for each log record. The score is used to assess whether the log indicates a serious security problem in a main program module;

the feature vector NBId captures the base station identifier, so it identifies which base station produced the security problem.
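The text says only that a predefined scoring algorithm combines Facility, Severity, and ServerName. The sketch below illustrates one possible shape for it. Everything in it is assumed: the weight tables, the set of "main" service modules (seeded with the sample record's g2z7p), and the threshold. The only fixed facts used are the RFC 5424 codes: Severity 0 is the most severe, and facilities 4 and 10 are the two security/authorization facilities.

```python
# Hypothetical weights: lower RFC 5424 Severity values are more severe, so they score higher.
SEVERITY_WEIGHT = {0: 100, 1: 90, 2: 80, 3: 60, 4: 40, 5: 20, 6: 10, 7: 0}
# RFC 5424 facilities 4 and 10 are security/authorization messages; the boost is assumed.
FACILITY_WEIGHT = {4: 1.5, 10: 1.5}
# Assumed set of "main program modules"; the sample record's service is used as a placeholder.
CORE_SERVICES = {"g2z7p"}


def score(facility: int, severity: int, server_name: str) -> float:
    """Score one log record from its Facility, Severity and ServerName feature vectors."""
    value = float(SEVERITY_WEIGHT[severity])
    value *= FACILITY_WEIGHT.get(facility, 1.0)
    if server_name in CORE_SERVICES:
        value *= 2
    return value


def serious_base_stations(features: list[dict], threshold: float = 100.0) -> list[str]:
    """Return the NBIds whose feature values score at or above the threshold."""
    return sorted({f["NBId"] for f in features
                   if score(f["Facility"], f["Severity"], f["ServerName"]) >= threshold})
```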

In an exemplary instance, the streaming computing framework is JStorm. Topologies are defined according to business scenarios and deployed to a JStorm cluster. Each topology analyzes the feature values, sends the analysis results to the data presentation device for display, and saves them to the data storage system.

A topology is a piece of code running on the JStorm cluster. It is a dataflow transformation graph that defines how data is acquired, computed, and distributed.

In an exemplary instance, the defined topologies may cover one or more of the following scenarios, without being limited to them:

the N base stations with the largest number of feature values satisfying a first preset condition;

the number of feature values satisfying a second preset condition within a preset time;

the security logs satisfying a third preset condition.

Step 103: The data presentation device displays the analysis results.

In another embodiment of the present invention, the method further includes: the base station reports security logs in real time.

In this embodiment, the security log messages reported by the base station may conform to any protocol definition, such as RFC 5424.

In another embodiment of the present invention, the method further includes either of the following:

the data presentation device searches the data storage system for the first correspondence and the security log feature values that correspond to an analysis result. It associates the found first correspondence with the feature values to obtain a second correspondence among the analysis result, the first correspondence, and the feature values, and displays the second correspondence;

or, the data presentation device searches the data storage system for the first correspondence and the security log feature values that correspond to a search instruction entered by the user. It associates them to obtain a second correspondence among the found analysis result, the first correspondence, and the feature values, and displays the second correspondence.

Specifically, the search can be performed using the security log identifier.
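Because the raw log (first correspondence), the feature value, and the analysis result all carry the security log identifier, building the second correspondence is a join on that key. The sketch below shows this join. Plain dicts stand in for lookups against the data storage system. `correlate` and `SecondCorrespondence` are hypothetical names, and skipping IDs that are not yet persisted is an assumed policy.

```python
from dataclasses import dataclass


@dataclass
class SecondCorrespondence:
    log_id: str           # security log identifier shared by all three sources
    raw_log: str          # from the first correspondence (identifier -> security log)
    features: dict        # feature value stored under the same identifier
    analysis_result: str  # analysis result that referenced this identifier


def correlate(analysis: dict[str, str], raw_logs: dict[str, str],
              features: dict[str, dict]) -> list[SecondCorrespondence]:
    """Join analysis results with stored logs and feature values on the security log identifier."""
    rows = []
    for log_id, result in analysis.items():
        # Skip identifiers whose log or feature value has not been persisted yet
        if log_id in raw_logs and log_id in features:
            rows.append(SecondCorrespondence(log_id, raw_logs[log_id], features[log_id], result))
    return rows
```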

By building on distributed clusters and a streaming computing framework, the embodiments analyze the security logs that base stations report in real time. This shortens the time needed to discover high-risk security problems, greatly reduces manual analysis time, shortens the problem feedback cycle, reduces manpower input, and improves management efficiency.

Several examples below illustrate specific implementations of the above method. They do not limit the scope of protection of the embodiments.

Example 1

In this example, the system counts the security logs whose Severity is Warning or above and displays the top 20 base stations.

First, the feature value is defined to contain the feature vector values Id, Facility, Severity, Time, NBId, and ServerName. Table 2 shows how these feature vectors map to the fields of the security log.

| Feature vector | Field in the security log |
|---|---|
| Id | Id |
| Facility | PRI >> 3 |
| Severity | PRI & 0x7 |
| Time | Time |
| NBId | Hostname |
| ServerName | App-name |

Table 2

Counting the security logs at Warning level or above relies on the feature vector Severity. The security logs follow RFC 5424, which defines eight Severity values, shown in Table 3.

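| Value | Severity (RFC 5424) |
|---|---|
| 0 | Emergency: system is unusable |
| 1 | Alert: action must be taken immediately |
| 2 | Critical: critical conditions |
| 3 | Error: error conditions |
| 4 | Warning: warning conditions |
| 5 | Notice: normal but significant condition |
| 6 | Informational: informational messages |
| 7 | Debug: debug-level messages |

Table 3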

According to Table 3, Warning has the value 4. Warning and above therefore covers the five values 0, 1, 2, 3, and 4.

Because the goal is to rank base stations, the security logs are counted per NBId, the unique ID of a base station. For each base station, the system counts the security logs whose Severity is 0, 1, 2, 3, or 4. The count can be computed every 30 minutes, and the top 20 base stations from each computation are sent to the data presentation device for display.
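The per-window computation in Example 1 amounts to a filter on Severity followed by a count grouped by NBId. The sketch below shows that computation for the feature values of one 30-minute batch. Windowing is left to the caller, and the dict keys mirror the feature vector names in Table 2. `top_base_stations` is an illustrative name. It relies on `Counter.most_common`, which breaks ties by first-seen order.

```python
from collections import Counter
from typing import Iterable

WARNING = 4  # RFC 5424: Warning is 4, and lower values are more severe


def top_base_stations(features: Iterable[dict], n: int = 20) -> list[tuple[str, int]]:
    """Count feature values with Severity 0..4 per NBId and return the n largest counts."""
    counts = Counter(f["NBId"] for f in features if f["Severity"] <= WARNING)
    return counts.most_common(n)


batch = [
    {"NBId": "gnb1", "Severity": 2},
    {"NBId": "gnb1", "Severity": 4},
    {"NBId": "gnb2", "Severity": 5},  # Notice: below Warning, not counted
    {"NBId": "gnb2", "Severity": 3},
    {"NBId": "gnb3", "Severity": 7},  # Debug: not counted
]
ranking = top_base_stations(batch)  # [('gnb1', 2), ('gnb2', 1)]
```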

Example 2

In this example, the system counts base station logins within a 5-minute window.

First, the feature value is defined to contain the feature vector values Id, Facility, Severity, Time, NBId, ServerName, and EventId. Table 1 shows how these feature vectors map to the fields of the security log.

To count base station logins within a 5-minute window, the field value that identifies a login must be known first. For example, if the msgId of a login event is defined as 1, the feature vector EventId must equal 1.

For each base station (each NBId), the system counts the security logs whose EventId is 1. The data presentation device then displays the number of logins per base station within the 5 minutes, that is, the number of security logs with EventId equal to 1.
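The text does not say whether the 5-minute window is tumbling or sliding. The sketch below assumes tumbling windows aligned to the epoch, and buckets login feature values (EventId = 1, per the example) by window start and NBId. It uses `datetime.fromisoformat`, replacing the trailing "Z" so that Python versions before 3.11 accept the timestamp. `logins_per_window` is an illustrative name.

```python
from collections import Counter, defaultdict
from datetime import datetime

LOGIN_EVENT_ID = 1       # per the example, login events carry msgId 1
WINDOW_SECONDS = 5 * 60  # assumed: 5-minute tumbling windows aligned to the epoch


def window_start(ts: str) -> int:
    """Epoch second at which the 5-minute window containing the ISO timestamp begins."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    epoch = int(dt.timestamp())
    return epoch - epoch % WINDOW_SECONDS


def logins_per_window(features) -> dict[int, Counter]:
    """Map each window start to a Counter of login events per NBId."""
    windows: dict[int, Counter] = defaultdict(Counter)
    for f in features:
        if f["EventId"] == LOGIN_EVENT_ID:
            windows[window_start(f["Time"])][f["NBId"]] += 1
    return windows


counts = logins_per_window([
    {"EventId": 1, "NBId": "gnb1", "Time": "2017-12-13T05:42:16.156Z"},
    {"EventId": 1, "NBId": "gnb1", "Time": "2017-12-13T05:44:59.000Z"},
    {"EventId": 2, "NBId": "gnb1", "Time": "2017-12-13T05:43:00.000Z"},  # login failure, not a login
    {"EventId": 1, "NBId": "gnb2", "Time": "2017-12-13T05:45:00.000Z"},  # falls in the next window
])
```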

Example 3

In this example, the system retrieves the security logs whose Severity is Critical.

Retrieving Critical-level security logs relies on the feature vector Severity. The security logs follow RFC 5424, which defines eight Severity values, shown in Table 3.

According to Table 3, Critical has the value 2. Because the statistics are per base station, they are grouped by NBId, the unique ID of a base station. For each base station, the system:

1. obtains the feature values whose Severity is 2;
2. retrieves the corresponding security logs using the security log identifier contained in those feature values;
3. sends the retrieved security logs to the data presentation device for display.
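Example 3 filters feature values on Severity = 2 and then uses the security log identifier stored in each feature value to fetch the original log. The sketch below groups the results by NBId. A dict stands in for the first correspondence held in the data storage system. `critical_logs` is an illustrative name.

```python
CRITICAL = 2  # RFC 5424 Severity code for Critical


def critical_logs(features: list[dict], raw_logs: dict[str, str]) -> dict[str, list[str]]:
    """Return {NBId: [security log, ...]} for feature values whose Severity is Critical."""
    result: dict[str, list[str]] = {}
    for f in features:
        # The security log identifier in the feature value links back to the original log
        if f["Severity"] == CRITICAL and f["id"] in raw_logs:
            result.setdefault(f["NBId"], []).append(raw_logs[f["id"]])
    return result
```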

Referring to FIG. 2, another embodiment of the present invention provides a system for analyzing base station security logs, including:

a data aggregation system 201, configured to cache the security logs reported by base stations in real time into a first data cache system 202, based on a first distributed cluster 2011;

a log feature calculation system 203, configured to calculate feature values of the security logs based on a second distributed cluster 2031 and cache them in a second data cache system 204;

a feature value real-time analysis system 205, configured to analyze the feature values in real time, based on a streaming computing framework 2051, to obtain analysis results;

a data presentation device 206, configured to display the analysis results.

In another embodiment of the present invention, the system further includes a base station 207, configured to report security logs in real time.

In this embodiment, the data aggregation system 201 is specifically configured to:

基于第一分布式集群2011为所述安全日志分配安全日志标识，将所述安全日志标识和所述安全日志之间的第一对应关系缓存到所述第一数据缓存系统中；Allocate a security log identifier to the security log based on the first distributed cluster 2011, and cache the first correspondence between the security log identifier and the security log in the first data cache system;

所述系统还包括：The system also includes:

数据存储系统208，用于将所述第一对应关系、所述安全日志的特征值和所述分析结果进行持久化保存。The data storage system 208 is configured to persistently store the first correspondence, the characteristic value of the security log, and the analysis result.
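As one possible illustration of the data storage system 208, the sketch below persists the three kinds of data listed above (the first correspondence, the feature values, and the analysis results) into SQLite using Python's standard `sqlite3` module. The table names, columns, and JSON encoding are assumptions made for the example; the text does not specify a storage engine or schema.

```python
import json
import sqlite3
from typing import Dict, List


def persist(
    conn: sqlite3.Connection,
    first_correspondence: Dict[str, str],
    feature_values: Dict[str, dict],
    analysis_results: List[dict],
) -> None:
    """Persist logs, feature values and analysis results (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS security_log (log_id TEXT PRIMARY KEY, raw TEXT)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature_value (log_id TEXT PRIMARY KEY, features TEXT)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS analysis_result "
        "(id INTEGER PRIMARY KEY AUTOINCREMENT, result TEXT)")
    # First correspondence: security log identifier -> security log.
    conn.executemany("INSERT OR REPLACE INTO security_log VALUES (?, ?)",
                     first_correspondence.items())
    # Feature values are stored as JSON, keyed by the same identifier.
    conn.executemany("INSERT OR REPLACE INTO feature_value VALUES (?, ?)",
                     [(k, json.dumps(v)) for k, v in feature_values.items()])
    conn.executemany("INSERT INTO analysis_result (result) VALUES (?)",
                     [(json.dumps(r),) for r in analysis_results])
    conn.commit()


conn = sqlite3.connect(":memory:")
persist(conn,
        {"cs1-1": "raw log A"},
        {"cs1-1": {"log_id": "cs1-1", "NBId": "NB-001", "Severity": 2}},
        [{"NBId": "NB-001", "critical_count": 1}])
print(conn.execute("SELECT raw FROM security_log").fetchall())
```

Keying both tables by the security log identifier is what later allows the first correspondence and the feature values to be joined back together.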

In this embodiment of the present invention, the data presentation device 206 is further configured to:

search the data storage system for the first correspondence and the feature value of the security log that correspond to the analysis result, associate the retrieved first correspondence with the retrieved feature value to obtain a second correspondence among the analysis result, the retrieved first correspondence, and the feature value of the security log, and display the second correspondence;

or search the data storage system for the first correspondence and the feature value of the security log that correspond to a search instruction input by a user, associate the retrieved first correspondence with the retrieved feature value to obtain a second correspondence among the retrieved analysis result, the first correspondence, and the feature value of the security log, and display the second correspondence.
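The association step performed by the data presentation device 206 amounts to a join on the security log identifier. The following sketch assumes that both the first correspondence and the feature values are keyed by that identifier (claim 3 states that the feature value contains it). The function names and the record layout of the second correspondence are hypothetical, and `search` models the user's search instruction as exact-match field criteria, which is only one possible interpretation.

```python
from typing import Dict, List, Optional


def search(feature_values: Dict[str, dict], **criteria: object) -> List[str]:
    """Return identifiers of logs whose feature values match every criterion."""
    return [log_id for log_id, fv in feature_values.items()
            if all(fv.get(key) == value for key, value in criteria.items())]


def build_second_correspondence(
    log_ids: List[str],
    first_correspondence: Dict[str, str],
    feature_values: Dict[str, dict],
    analysis_result: Optional[dict] = None,
) -> List[dict]:
    """Join logs and feature values on the security log identifier."""
    records = []
    for log_id in log_ids:
        # Skip identifiers that are missing from either store.
        if log_id in first_correspondence and log_id in feature_values:
            records.append({
                "analysis_result": analysis_result,
                "log_id": log_id,
                "security_log": first_correspondence[log_id],
                "feature_value": feature_values[log_id],
            })
    return records


logs = {"cs1-1": "raw log A", "cs2-1": "raw log C"}
features = {
    "cs1-1": {"NBId": "NB-001", "Severity": 2},
    "cs2-1": {"NBId": "NB-002", "Severity": 2},
}
# A user search instruction, modelled here as field criteria.
for record in build_second_correspondence(search(features, NBId="NB-002"), logs, features):
    print(record)
```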

In this embodiment of the present invention, the first distributed cluster 2011 includes an access gateway 2012, a zookeeper 2013, and two or more collection servers 2014;

the zookeeper 2013 is configured to maintain the cluster state of the two or more collection servers, where the cluster state is either logged in to the cluster or exited from the cluster;

the access gateway 2012 is configured to obtain the cluster state of the two or more collection servers from the zookeeper, and to distribute the security logs to those of the two or more collection servers that are logged in to the cluster;

each collection server 2014 that is logged in to the cluster is configured to allocate a security log identifier to a security log, and to cache the first correspondence between the security log identifier and the security log in the first data cache system.
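The interaction between the zookeeper 2013, the access gateway 2012 and the collection servers 2014 can be sketched as below. The `ClusterRegistry` class is an in-memory stand-in for the cluster state held in zookeeper, not the ZooKeeper client API; in a real ZooKeeper deployment, membership of this kind is commonly tracked with ephemeral znodes, which disappear when a server's session ends. The round-robin dispatch policy and the identifier format (server name plus a per-server sequence number, which keeps identifiers unique without cross-server coordination) are assumptions; the text does not state how logs are balanced or how identifiers are generated.

```python
import itertools
from typing import Dict, List


class ClusterRegistry:
    """In-memory stand-in for the cluster state that zookeeper maintains."""

    def __init__(self) -> None:
        self._logged_in: Dict[str, bool] = {}

    def log_in(self, server: str) -> None:
        self._logged_in[server] = True

    def log_out(self, server: str) -> None:
        self._logged_in[server] = False

    def logged_in_servers(self) -> List[str]:
        return sorted(name for name, up in self._logged_in.items() if up)


class CollectionServer:
    """Allocates identifiers and caches the first correspondence."""

    def __init__(self, name: str, first_cache: Dict[str, str]) -> None:
        self.name = name
        self.first_cache = first_cache  # shared stand-in for the first data cache system
        self._seq = itertools.count(1)

    def ingest(self, security_log: str) -> str:
        log_id = f"{self.name}-{next(self._seq)}"  # unique without coordination
        self.first_cache[log_id] = security_log
        return log_id


class AccessGateway:
    """Sends each security log to a collection server logged in to the cluster."""

    def __init__(self, registry: ClusterRegistry,
                 servers: Dict[str, CollectionServer]) -> None:
        self.registry = registry
        self.servers = servers
        self._turn = itertools.count()

    def dispatch(self, security_log: str) -> str:
        active = self.registry.logged_in_servers()
        if not active:
            raise RuntimeError("no collection server is logged in to the cluster")
        target = active[next(self._turn) % len(active)]  # round-robin (assumed policy)
        return self.servers[target].ingest(security_log)


first_cache: Dict[str, str] = {}
registry = ClusterRegistry()
servers = {name: CollectionServer(name, first_cache) for name in ("cs1", "cs2")}
for name in servers:
    registry.log_in(name)
gateway = AccessGateway(registry, servers)
print([gateway.dispatch(log) for log in ("log A", "log B", "log C")])
```

When a collection server exits the cluster, the gateway stops selecting it on the next dispatch because it re-reads the cluster state each time.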

In this embodiment of the present invention, the data storage system 208 includes two or more data servers 2081 and a search server cluster 2082;

the data server 2081 is configured to obtain the first correspondence from the first data cache system, obtain the feature values from the second data cache system, determine the second correspondence according to the first correspondence and the feature values, and save the second correspondence in the search server cluster 2082;

and to obtain the analysis result from the feature value real-time analysis system 205 and save the analysis result in the search server cluster 2082.
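The join-and-index work of the data server 2081 can be sketched with a small in-memory inverted index standing in for the search server cluster 2082. The sketch is hypothetical: it indexes each second-correspondence record under its feature vector values so that records can later be retrieved by, for example, NBId and Severity, and it skips feature values whose raw log has not yet been cached. Storing the analysis results is omitted for brevity.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class SearchIndex:
    """Tiny inverted index standing in for the search server cluster."""

    def __init__(self) -> None:
        self.records: Dict[str, dict] = {}
        self.postings: Dict[Tuple[str, object], Set[str]] = defaultdict(set)

    def add(self, log_id: str, record: dict, fields: dict) -> None:
        self.records[log_id] = record
        for key, value in fields.items():
            self.postings[(key, value)].add(log_id)

    def query(self, **criteria: object) -> List[dict]:
        """Return records matching all criteria (an empty query returns nothing)."""
        matches = [self.postings.get((k, v), set()) for k, v in criteria.items()]
        if not matches:
            return []
        return [self.records[i] for i in sorted(set.intersection(*matches))]


def sync_second_correspondence(
    first_correspondence: Dict[str, str],
    feature_values: Dict[str, dict],
    index: SearchIndex,
) -> int:
    """Join the two caches on the log identifier and index the result."""
    indexed = 0
    for log_id, fv in feature_values.items():
        raw = first_correspondence.get(log_id)
        if raw is None:
            continue  # raw log not cached yet; a real data server would retry later
        index.add(log_id, {"log_id": log_id, "security_log": raw, "feature_value": fv}, fv)
        indexed += 1
    return indexed


index = SearchIndex()
count = sync_second_correspondence(
    {"cs1-1": "raw log A", "cs1-2": "raw log B"},
    {
        "cs1-1": {"NBId": "NB-001", "Severity": 2},
        "cs1-2": {"NBId": "NB-001", "Severity": 1},
        "cs2-1": {"NBId": "NB-002", "Severity": 2},
    },
    index,
)
print(count, index.query(NBId="NB-001", Severity=2))
```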

In this embodiment of the present invention, the second distributed cluster 2031 includes two or more feature calculation servers 2022;

the feature calculation server 2022 is configured to calculate the feature value of a security log and cache the feature value in the second data cache system.

In this embodiment of the present invention, the feature calculation server 2022 is specifically configured to calculate the feature value of a security log as follows:

extract one or more predefined feature vector values from the security log, and combine the one or more feature vector values into the feature value.
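A minimal sketch of the feature calculation performed by the feature calculation server 2022 follows. The log format (`key=value` pairs separated by semicolons), the `Event` field, and the set of predefined feature vectors are assumptions; only NBId and Severity are named in the text, and the log is assumed to carry Severity already as a number (for example 2 for Critical, per Table 3). Following claim 3, the resulting feature value also carries the security log identifier.

```python
from typing import Callable, Dict

# Predefined feature vectors and how to convert each raw value (hypothetical;
# the text names NBId and Severity, other vectors would be added here).
FEATURE_VECTORS: Dict[str, Callable[[str], object]] = {
    "NBId": str,
    "Severity": int,
}


def compute_feature_value(log_id: str, security_log: str) -> dict:
    """Extract the predefined feature vector values and combine them into one feature value."""
    fields: Dict[str, str] = {}
    for part in security_log.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()

    feature_value: dict = {"log_id": log_id}  # keeps the security log identifier
    for name, convert in FEATURE_VECTORS.items():
        if name in fields:
            feature_value[name] = convert(fields[name])
    return feature_value


print(compute_feature_value("cs1-1", "NBId=NB-001; Severity=2; Event=login_failed"))
```

Fields that are not predefined feature vectors (such as `Event` here) are ignored, so the feature value stays small and uniform regardless of how verbose the raw log is.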

The specific implementation of the above system for analyzing base station security logs is the same as that of the method for analyzing base station security logs in the foregoing embodiments, and is not repeated here.

Those of ordinary skill in the art can understand that all or some of the steps of the methods disclosed above, and the functional modules/units in the systems and devices, can be implemented as software, firmware, hardware, or any suitable combination thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or a microprocessor, as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Although the embodiments of the present invention are disclosed as above, the described content is merely an implementation adopted to facilitate understanding of the embodiments of the present invention and is not intended to limit them. Any person skilled in the art to which the embodiments of the present invention belong may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the embodiments of the present invention, but the scope of patent protection of the embodiments of the present invention shall still be subject to the scope defined by the appended claims.

Claims

1. A method of analyzing a base station security log, comprising:

the data aggregation system caches, based on a first distributed cluster, the security log reported by a base station in real time into a first data cache system;

the log feature calculation system calculates a feature value of the security log based on a second distributed cluster, and caches the feature value of the security log into a second data cache system;

the feature value real-time analysis system analyzes the feature value of the security log in real time based on a streaming computing framework to obtain an analysis result;

and the data presentation device displays the analysis result.

2. The method of claim 1, further comprising:

the base station reporting the security log in real time.

3. The method of claim 1, wherein the data aggregation system caching, based on the first distributed cluster, the security log into the first data cache system comprises:

the data aggregation system allocates, based on the first distributed cluster, a security log identifier to the security log, and caches a first correspondence between the security log identifier and the security log in the first data cache system;

the feature value of the security log comprises the security log identifier;

the method further comprises:

the data storage system persistently stores the first correspondence, the feature value of the security log, and the analysis result.

4. The method of claim 3, further comprising:

the data presentation device searches the data storage system for the first correspondence and the feature value of the security log that correspond to the analysis result, associates the retrieved first correspondence with the feature value of the security log to obtain a second correspondence among the analysis result, the retrieved first correspondence, and the feature value of the security log, and displays the second correspondence;

or, the data presentation device searches the data storage system for the first correspondence and the feature value of the security log that correspond to a search instruction input by a user, associates the retrieved first correspondence with the feature value of the security log to obtain a second correspondence among the retrieved analysis result, the first correspondence, and the feature value of the security log, and displays the second correspondence.

5. The method of claim 3, wherein the first distributed cluster comprises an access gateway, a zookeeper, and two or more collection servers; and the data aggregation system allocating, based on the first distributed cluster, a security log identifier to the security log and caching the first correspondence between the security log identifier and the security log in the first data cache system comprises:

the zookeeper maintains the cluster states of the two or more collection servers, wherein the cluster state comprises a logged-in-cluster state or an exited-cluster state;

the access gateway obtains the cluster states of the two or more collection servers from the zookeeper, and distributes the security log to a collection server, among the two or more collection servers, that is in the logged-in-cluster state;

and the collection server in the logged-in-cluster state allocates a security log identifier to the security log, and caches the first correspondence between the security log identifier and the security log in the first data cache system.

6. The method of claim 3, wherein the data storage system comprises two or more data servers and a search server cluster;

the data storage system persistently storing the security log, the feature value of the security log, and the analysis result comprises:

for each data server, the data server obtains the first correspondence from the first data cache system, obtains the feature value from the second data cache system, determines the second correspondence according to the first correspondence and the feature value, and stores the second correspondence in the search server cluster;

and the data server obtains the analysis result from the feature value real-time analysis system and stores the analysis result in the search server cluster.

7. The method of claim 1, wherein the second distributed cluster comprises two or more feature calculation servers; and the log feature calculation system calculating the feature value of the security log based on the second distributed cluster and caching the feature value of the security log in the second data cache system comprises:

for each feature calculation server, the feature calculation server calculates a feature value of the security log and caches the feature value of the security log in the second data cache system.

8. The method of claim 7, wherein the feature calculation server calculating the feature value of the security log comprises:

the feature calculation server extracts one or more predefined feature vector values from the security log and combines the one or more feature vector values into the feature value.

9. A system for analyzing a base station security log, comprising:

the data aggregation system is configured to cache, based on a first distributed cluster, the security log reported by a base station in real time into a first data cache system;

the log feature calculation system is configured to calculate a feature value of the security log based on a second distributed cluster and cache the feature value of the security log into a second data cache system;

the feature value real-time analysis system is configured to analyze the feature value of the security log in real time based on a streaming computing framework to obtain an analysis result;

and the data presentation device is configured to display the analysis result.

10. The system of claim 9, further comprising:

a base station configured to report the security log in real time.

11. The system of claim 9, wherein the data aggregation system is specifically configured to:

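allocate, based on the first distributed cluster, a security log identifier to the security log, and cache a first correspondence between the security log identifier and the security log in the first data cache system;

the system further comprises: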

a data storage system configured to persistently store the first correspondence, the feature value of the security log, and the analysis result.

12. The system of claim 11, wherein the data presentation device is further configured to:

search the data storage system for the first correspondence and the feature value of the security log that correspond to the analysis result, associate the retrieved first correspondence with the feature value of the security log to obtain a second correspondence among the analysis result, the retrieved first correspondence, and the feature value of the security log, and display the second correspondence;

or search the data storage system for the first correspondence and the feature value of the security log that correspond to a search instruction input by a user, associate the retrieved first correspondence with the feature value of the security log to obtain a second correspondence among the retrieved analysis result, the first correspondence, and the feature value of the security log, and display the second correspondence.

13. The system of claim 11, wherein the first distributed cluster comprises an access gateway, a zookeeper, and two or more collection servers;

the zookeeper is configured to maintain the cluster states of the two or more collection servers, wherein the cluster state comprises a logged-in-cluster state or an exited-cluster state;

the access gateway is configured to obtain the cluster states of the two or more collection servers from the zookeeper, and distribute the security log to a collection server, among the two or more collection servers, that is in the logged-in-cluster state;

the collection server in the logged-in-cluster state is configured to allocate a security log identifier to the security log and cache the first correspondence between the security log identifier and the security log in the first data cache system.

14. The system of claim 11, wherein the data storage system comprises two or more data servers and a search server cluster;

the data server is configured to obtain the first correspondence from the first data cache system, obtain the feature value from the second data cache system, determine the second correspondence according to the first correspondence and the feature value, and store the second correspondence in the search server cluster;

and to obtain the analysis result from the feature value real-time analysis system and store the analysis result in the search server cluster.

15. The system of claim 9, wherein the second distributed cluster comprises two or more feature calculation servers;

the feature calculation server is configured to calculate the feature value of the security log and cache the feature value of the security log in the second data cache system.

16. The system of claim 15, wherein the feature calculation server is specifically configured to calculate the feature value of the security log by:

extracting one or more predefined feature vector values from the security log, and combining the one or more feature vector values into the feature value.