Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments in accordance with the present disclosure. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
In order to make the technical solutions and advantages of the present disclosure more apparent, the present disclosure will be further described in detail below with reference to the accompanying drawings and specific embodiments.
FIG. 1 illustrates an exemplary system architecture 100 to which embodiments of the memory overflow event processing method or memory overflow event processing apparatus of the present disclosure may be applied.
As shown in FIG. 1, system architecture 100 may include a node 101, a network 102, a server 103, and a database 104. Network 102 is the medium used to provide communication links between node 101, server 103, and database 104. Network 102 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The operating system of node 101 may include a user mode and a kernel mode. A CPU in the kernel mode can access any data and peripheral devices, can switch from one program to another, and its occupation of the CPU is not preempted. A CPU in the user mode can only access limited resources; it cannot directly access hardware devices such as memory, and can only access privileged resources by trapping into the kernel through system calls. When a memory overflow event occurs at node 101, node 101 may upload a log of the memory overflow event to database 104. Node 101 may also be connected to a display device to display information such as the memory overflow event log.
Node 101 may be a worker node in a Kubernetes cluster and is responsible for running containerized applications. Each node has a container runtime (e.g., Docker) and runs multiple Pods. A node may be a physical machine or a virtual machine. A Pod is the smallest schedulable unit in Kubernetes and typically contains one or more containers that share storage, network, and configuration. The containers in one Pod cooperate to accomplish a certain task and are always scheduled and run together.
Server 103 may obtain the memory overflow event logs from database 104, analyze the logs, and feed the analysis results back to node 101. Node 101 may adjust its configuration information based on the results fed back by server 103.
Database 104 may be any of various types of databases capable of storing memory overflow event logs. Database 104 may be accessed at a specified time or under specified conditions, and node 101 may also access database 104.
It should be noted that some steps of the memory overflow event processing method provided in the embodiments of the present disclosure may be performed by node 101, and other steps may be performed by server 103. Alternatively, all steps of the memory overflow event processing method may be performed by node 101. Accordingly, a portion of the memory overflow event processing apparatus may be provided in node 101 and another portion in server 103, or all of the memory overflow event processing apparatus may be provided in node 101.
It should be understood that the numbers of nodes, networks, servers, and databases in FIG. 1 are merely illustrative. There may be any number of nodes, networks, servers, and databases as desired for an implementation.
FIG. 2 illustrates a flow 200 of one embodiment of a memory overflow event processing method of the present disclosure. As shown in FIG. 2, the memory overflow event processing method of the present embodiment may include the following steps:
In step 201, in the kernel mode, a memory overflow event is captured in real time through a probe function pre-mounted on a preset function.
In this embodiment, the execution body of the memory overflow event processing method (for example, node 101 shown in FIG. 1) may capture a memory overflow (Out Of Memory, OOM) event in real time through a preset function. Specifically, the preset function may be a function for handling memory overflow events. In some specific practices, the preset function may be the oom_kill_process function, which is a key function used by the kernel to process OOM events. The probe function may be mounted on the preset function in advance, so that when an OOM event occurs, the probe function can sense it, thereby realizing real-time capture of the OOM event. Specifically, the probe function may be mounted in the form of a hook function.
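As a non-limiting illustration, the following Go sketch shows how such a probe function might be mounted on the preset function from the user mode using the open-source cilium/ebpf library. The object file name oom_probe.o and the program name handle_oom are assumptions for illustration, and the in-kernel probe program itself is assumed to have been compiled separately.

```go
package main

import (
	"log"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/link"
)

func main() {
	// Load a pre-compiled eBPF object that contains the probe function
	// (object file and program names are assumptions for illustration).
	coll, err := ebpf.LoadCollection("oom_probe.o")
	if err != nil {
		log.Fatalf("load eBPF collection: %v", err)
	}
	defer coll.Close()

	// Mount the probe function on the preset kernel function oom_kill_process.
	kp, err := link.Kprobe("oom_kill_process", coll.Programs["handle_oom"], nil)
	if err != nil {
		log.Fatalf("attach kprobe: %v", err)
	}
	defer kp.Close()

	log.Println("probe mounted on oom_kill_process; capturing OOM events")
	select {} // keep running while the kernel-side program captures events
}
```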
In step 202, in response to capturing the memory overflow event, relevant data of the memory overflow event is collected and sent to the user mode.
In this embodiment, the relevant data of the OOM event may be collected before or while the preset function processes the OOM event. The relevant data may include the context of the OOM event, and may further include information on the thread triggering the OOM event and on the process killed by the preset function. After the relevant data is acquired, it can be sent to the user mode through a communication channel between the kernel mode and the user mode.
In some specific practices, the acquisition of the relevant data may be accomplished by inserting a probe into the oom_kill_process function described above using eBPF. eBPF stands for extended Berkeley Packet Filter; its predecessor, referred to as BPF (Berkeley Packet Filter) or cBPF (classic BPF), was first described in 1992 by Steven McCanne and Van Jacobson in "The BSD Packet Filter: A New Architecture for User-level Packet Capture". Briefly, eBPF is a very flexible and efficient virtual-machine-like component in the Linux kernel that can safely execute bytecode at many kernel hook points. Many kernel subsystems already use BPF, for example networking, tracing, and security. Accordingly, the collected relevant data of the OOM event can be sent to the user mode for processing by using the eBPF perf buffer mechanism. The perf buffer mechanism is an efficient mechanism for securely and efficiently transferring data collected by eBPF programs to user space. It is based on the perf events subsystem in the Linux kernel and provides a way for eBPF programs to pass events or other data to processes running in user space.
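As a non-limiting illustration, the following Go sketch shows how data written by the kernel-side eBPF program into a perf buffer might be read in the user mode with the cilium/ebpf library. The map name events and the handler signature are assumptions for illustration.

```go
package oomagent

import (
	"log"
	"os"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/perf"
)

// readOOMEvents drains raw OOM event records that the kernel-side eBPF
// program writes into a perf event array map (assumed here to be named
// "events") and hands each record to the user-mode handler.
func readOOMEvents(coll *ebpf.Collection, handle func(raw []byte)) error {
	rd, err := perf.NewReader(coll.Maps["events"], os.Getpagesize())
	if err != nil {
		return err
	}
	defer rd.Close()

	for {
		rec, err := rd.Read()
		if err != nil {
			return err
		}
		if rec.LostSamples > 0 {
			log.Printf("perf buffer dropped %d samples", rec.LostSamples)
			continue
		}
		handle(rec.RawSample) // raw bytes of one OOM event, decoded later
	}
}
```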
In step 203, in the user mode, enrichment processing is performed on the relevant data of the memory overflow event to obtain a memory overflow event log.
In the user mode, enrichment processing can be carried out on the relevant data of the OOM event to obtain the memory overflow event log. Data enrichment refers to the process of enriching raw data by adding or supplementing additional information. This includes invoking external services to obtain additional information, cleaning and validating data, format conversion, and so on, to improve the integrity and applicability of the data. Specifically, fields related to specified fields in the relevant data can be looked up according to those specified fields, and associations between the specified fields and the found fields can be established, so as to obtain the OOM event log.
In step 204, the memory overflow event log is reported.
After the memory overflow event log is obtained, it can be reported to the database. In this way, it can be read for further processing.
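As a non-limiting illustration, the following Go sketch shows one possible way of reporting an obtained memory overflow event log entry to a collection endpoint. The endpoint URL and payload shape are assumptions; in practice the log could equally be written directly to a database such as database 104 in FIG. 1.

```go
package oomagent

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// reportLog serializes one memory overflow event log entry and reports it to
// a collection endpoint so it can later be read for further processing.
func reportLog(endpoint string, entry map[string]any) error {
	body, err := json.Marshal(entry)
	if err != nil {
		return err
	}
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("report failed: %s", resp.Status)
	}
	return nil
}
```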
According to the memory overflow event processing method provided by the embodiment of the present disclosure, the probe function can be mounted on the preset function in the kernel mode to collect the relevant data of the memory overflow event, the relevant data is sent to the user mode, enrichment processing is performed on the relevant data in the user mode to obtain the memory overflow event log, and the memory overflow event log is further reported, so that information on memory overflow events can be obtained in real time, efficiently, and accurately, avoiding delays and omissions.
With continued reference to FIG. 3, a flow 300 of another embodiment of a memory overflow event processing method according to the present disclosure is shown. As shown in FIG. 3, the memory overflow event processing method of the present embodiment may include the following steps:
In step 301, a probe is inserted at a preset function, and a probe function is mounted at the probe.
In this embodiment, at each node, a probe may be inserted from the user mode at the kernel function oom_kill_process using an eBPF kprobe; oom_kill_process is the kernel function dedicated to terminating processes under out-of-memory conditions. Mounting the probe in this way allows OOM events occurring in the system to be captured in real time.
In step 302, in the kernel mode, the memory overflow event is captured in real time through the probe function pre-mounted on the preset function.
The OOM event can be perceived in real time through the probe function pre-mounted on the kernel function oom_kill_process.
In step 303, in response to capturing the memory overflow event, relevant data of the memory overflow event is collected and sent to the user mode.
After the OOM event is perceived, context information of the OOM event can be obtained, including, for the process triggering the OOM event: the PID, NSPID, process path, process name, container identifier of the process, number of memory pages occupied by the process, parent process PID, parent process NSPID, parent process name, and parent process path; and, for the process killed by the OOM kill: the PID, NSPID, process path, process name, container identifier of the process, parent process PID, parent process NSPID, parent process name, and parent process path. Specifically, the collected data are shown in Table 1. The acquired information is sent to the user mode through the perf buffer for further processing. This process runs in the kernel mode as an eBPF program and can achieve real-time OOM event capture with extremely low performance overhead.
Table 1 Relevant data collected for the OOM event
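As a non-limiting illustration, the following Go sketch shows how the data of Table 1 might be represented and decoded in the user mode after being received from the perf buffer. The exact field order, sizes, and string lengths are assumptions and must match the structure emitted by the kernel-side eBPF program; process paths and parent process names are omitted for brevity.

```go
package oomagent

import (
	"bytes"
	"encoding/binary"
)

// oomEvent mirrors part of the data in Table 1 for one OOM event. Field
// layout is an assumption for illustration only.
type oomEvent struct {
	TriggerPid         uint32   // PID of the process triggering the OOM event
	TriggerNsPid       uint32   // NSPID of the triggering process
	TriggerParentPid   uint32   // parent process PID of the triggering process
	TriggerParentNsPid uint32   // parent process NSPID of the triggering process
	VictimPid          uint32   // PID of the process killed by the OOM kill
	VictimNsPid        uint32   // NSPID of the killed process
	VictimParentPid    uint32   // parent process PID of the killed process
	VictimParentNsPid  uint32   // parent process NSPID of the killed process
	VictimPages        uint64   // number of memory pages occupied by the killed process
	TriggerComm        [16]byte // process name of the triggering process
	VictimComm         [16]byte // process name of the killed process
	TriggerContainerID [64]byte // container identifier of the triggering process
	VictimContainerID  [64]byte // container identifier of the killed process
}

// decodeOOMEvent parses the raw bytes received from the perf buffer.
func decodeOOMEvent(raw []byte) (oomEvent, error) {
	var ev oomEvent
	err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, &ev)
	return ev, err
}
```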
In step 304, in the user mode, a container identifier is determined from the relevant data, container metadata corresponding to the container identifier is determined according to the container identifier, and the relevant data is enriched according to the container metadata to obtain a memory overflow event log.
In this embodiment, metadata of containers and of processes may be cached in the user mode. In the Kubernetes environment, the Pod is the smallest resource management component. Pod metadata refers to various information related to a Pod in the Kubernetes environment, including the name of the Pod, its IP address, the namespace where it is located, the name of the node it runs on, the name of its service account, CPU and memory usage and limits of its containers, the labels and annotations of the Pod, and so on. When the relevant data of the OOM event is received, data enrichment may be performed according to the container identifier (i.e., container_id) in the relevant data. In some specific practices, the enriched data includes the container name, Pod name, and namespace, so that the resulting OOM event log is more complete. The enriched OOM event log includes the data shown in Table 2.
Table 2 Data included in the enriched OOM event log
The container_name, pod_name, and namespace fields in Table 2 are extended fields.
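As a non-limiting illustration, the following Go sketch shows how the relevant data might be enriched with cached container metadata keyed by container_id. The podMeta cache and the field names are assumptions for illustration, and how the cache is populated (for example, from the Kubernetes API) is outside this sketch.

```go
package oomagent

// podMeta is the cached container/Pod metadata used for enrichment.
type podMeta struct {
	ContainerName string
	PodName       string
	Namespace     string
}

// oomLogEntry is an enriched OOM event log entry with the extended fields.
type oomLogEntry struct {
	ContainerID   string `json:"container_id"`
	Pid           uint32 `json:"pid"`
	ProcessName   string `json:"process_name"`
	ContainerName string `json:"container_name"` // extended field
	PodName       string `json:"pod_name"`       // extended field
	Namespace     string `json:"namespace"`      // extended field
}

// enrich looks up the cached metadata by container_id and attaches the
// container name, Pod name, and namespace to the raw event data.
func enrich(cache map[string]podMeta, containerID string, pid uint32, comm string) oomLogEntry {
	entry := oomLogEntry{ContainerID: containerID, Pid: pid, ProcessName: comm}
	if meta, ok := cache[containerID]; ok {
		entry.ContainerName = meta.ContainerName
		entry.PodName = meta.PodName
		entry.Namespace = meta.Namespace
	}
	return entry
}
```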
In step 305, the memory overflow event log is reported.
In step 306, potential malicious behaviors are determined according to the memory overflow event log and preset alarm rules, and an alarm is output for the potential malicious behaviors.
In this embodiment, the memory overflow event log may be further analyzed and processed. Specifically, potential malicious behaviors may be determined in combination with preset alarm rules. For example, a preset alarm rule may be that a Pod under a specified user namespace triggers an OOM event. User namespaces are a very important resource in Kubernetes systems; their main role is to achieve resource isolation between multiple sets of environments or between multiple tenants. After the cluster is started, several user namespaces are created by default. If analysis of the memory overflow event log shows that a Pod under the specified user namespace triggered an OOM event, the behavior may be considered either resource abuse or an attack.
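As a non-limiting illustration, the following Go sketch shows how such a preset alarm rule might be evaluated against an enriched memory overflow event log entry. The watched namespace set and the message format are assumptions for illustration.

```go
package oomagent

import "fmt"

// checkAlarmRule flags an OOM event as potential malicious behavior when the
// Pod that triggered it belongs to one of the specified user namespaces.
func checkAlarmRule(namespace, podName string, watched map[string]bool) (string, bool) {
	if watched[namespace] {
		msg := fmt.Sprintf("potential resource abuse or attack: pod %s in namespace %s triggered an OOM event",
			podName, namespace)
		return msg, true
	}
	return "", false
}
```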
Furthermore, a big data analysis platform, such as the Flink stream computing engine, can be used to analyze the logs, identify Pods, containers, and processes that frequently trigger OOM events, generate alarms for potential malicious behaviors (such as resource abuse, DDoS attacks, etc.), and notify operation and maintenance personnel through mail, short messages, or enterprise communication tools.
In step 307, the memory overflow event log is analyzed, a container identifier for which the number of occurrences of memory overflow events exceeds a preset value is determined, and configuration information corresponding to the container identifier is adjusted.
In addition, the memory overflow event log can be analyzed to determine the container identifiers for which the number of occurrences of memory overflow events exceeds a preset value. For example, if Pod B under user namespace A has triggered three OOM events within one hour, the memory resource limit of Pod B under user namespace A may be considered too low. At this point, the Kubernetes API may be called to increase the memory resource limit of Pod B under user namespace A by a factor of 1.5, or the Kubernetes API may be called to set the node affinity of Pod B under user namespace A and schedule the Pod to a node with abundant memory resources.
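As a non-limiting illustration, the following Go sketch shows how a memory resource limit might be raised through the Kubernetes API using the client-go library. It assumes the Pod is managed by a Deployment of the same name and that the new limit (for example, 1.5 times the old one) has already been computed by the caller; both are simplifications for illustration.

```go
package oomagent

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// raiseMemoryLimit patches the memory limit of one container of a Deployment
// so that subsequently scheduled Pods receive the larger limit.
func raiseMemoryLimit(ctx context.Context, namespace, deployment, container, newLimit string) error {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}

	patch := fmt.Sprintf(
		`{"spec":{"template":{"spec":{"containers":[{"name":%q,"resources":{"limits":{"memory":%q}}}]}}}}`,
		container, newLimit)

	_, err = client.AppsV1().Deployments(namespace).Patch(
		ctx, deployment, types.StrategicMergePatchType, []byte(patch), metav1.PatchOptions{})
	return err
}
```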
In step 308, the memory overflow event log is analyzed to determine the memory overflow information of each node and each container, as well as the parent-child relationships of the process triggering the memory overflow event and of the killed process, and the memory overflow information and the parent-child relationships are visually displayed.
In addition, the memory overflow event log can be analyzed to determine the memory overflow information of each node and each container, as well as the parent-child relationships of the process triggering the memory overflow event and of the killed process. In particular, it may be determined when and why a memory overflow event occurred for each node or each container. Further, the memory overflow information and the parent-child relationships can be displayed through a visualization tool. When displayed, both historical and real-time memory overflow information of each node and container can be presented.
In some optional implementations of this embodiment, memory overflow information of a single node or container may be displayed in real time through a dashboard, and the memory overflow event triggering trend of the single node or container may be determined from historical memory overflow information within a preset time period. In this way, operation and maintenance personnel can make decisions accordingly.
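As a non-limiting illustration, the following Go sketch shows one way such a triggering trend might be derived from historical OOM event timestamps by counting events in fixed time windows. The window size and the source of the timestamps are assumptions for illustration.

```go
package oomagent

import (
	"sort"
	"time"
)

// oomTrendPoint is one bucket of the OOM event triggering trend.
type oomTrendPoint struct {
	WindowStart time.Time
	Count       int
}

// buildTrend buckets historical OOM event timestamps of a single node or
// container into fixed windows (e.g. one hour) so a dashboard can plot the
// triggering trend over a preset time period.
func buildTrend(timestamps []time.Time, window time.Duration) []oomTrendPoint {
	buckets := map[time.Time]int{}
	for _, ts := range timestamps {
		buckets[ts.Truncate(window)]++
	}
	trend := make([]oomTrendPoint, 0, len(buckets))
	for start, n := range buckets {
		trend = append(trend, oomTrendPoint{WindowStart: start, Count: n})
	}
	sort.Slice(trend, func(i, j int) bool { return trend[i].WindowStart.Before(trend[j].WindowStart) })
	return trend
}
```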
According to the memory overflow event processing method provided by the embodiment of the present disclosure, the eBPF kprobe technology is used to mount a probe on the kernel function oom_kill_process, which is non-intrusive to the service and can capture OOM events in real time with extremely low performance overhead. Compared with traditional monitoring means, the method can acquire OOM event information in real time, efficiently, and accurately, avoiding delays and missed reports. In addition, detailed context information of the process triggering the OOM event and of the process killed by the OOM kill can be accurately captured in the kernel mode, including the process ID, path, container ID, memory used by the process, and the like; the relevant data is then supplemented with container and Pod metadata in the user mode, ensuring the integrity and accuracy of the OOM event data. This provides complete underlying data for subsequent analysis and response. By enriching the OOM event log, the events are associated with metadata such as containers, Pods, and namespaces, so that OOM events are tightly associated with the services in a cloud-native environment. This association of context information enables operation and maintenance personnel to rapidly identify the problematic service and container, improving problem-locating efficiency. By automatically adjusting resources or migrating nodes for services with frequent OOM events, service interruption caused by resource contention is avoided. This automatic response capability significantly reduces manual intervention by operation and maintenance personnel and improves system stability and service continuity. Real-time OOM events are analyzed through the big data analysis platform, abnormal situations can be detected and identified in a timely manner in combination with the preset alarm rules, and relevant personnel are notified through various channels, realizing a flexible alarm mechanism that allows the operation and maintenance team to respond rapidly. Real-time display and historical query of OOM events are supported, providing strong monitoring and analysis capabilities. Operation and maintenance personnel can view the system state in real time by means of the visual dashboard, identify resource hotspots, and further optimize resource configuration and strategy by analyzing historical data.
With further reference to FIG. 4, as an implementation of the method shown in the foregoing figures, the present disclosure provides an embodiment of a memory overflow event processing apparatus. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus may be applied to various electronic devices.
As shown in FIG. 4, the memory overflow event processing apparatus 400 of the present embodiment includes an event capturing unit 401, a data acquisition unit 402, a data enrichment unit 403, and a log reporting unit 404.
The event capturing unit 401 is configured to capture, in the kernel mode, the memory overflow event in real time through the probe function pre-mounted on the preset function.
The data acquisition unit 402 is configured to acquire relevant data of the memory overflow event in response to capturing the memory overflow event, and send the relevant data to the user mode.
The data enrichment unit 403 is configured to enrich the relevant data of the memory overflow event in the user mode, so as to obtain a memory overflow event log.
The log reporting unit 404 is configured to report the memory overflow event log.
In addition, the present disclosure also provides an electronic device.
FIG. 5 shows a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
As shown in FIG. 5, the electronic device may include a processor 501, a memory 502, a bus 503, and a computer program stored in the memory 502 and executable on the processor 501, wherein the processor 501 and the memory 502 communicate with each other via the bus 503. When executing the computer program, the processor 501 implements the steps of the method described above, for example: in the kernel mode, capturing a memory overflow event in real time through a probe function pre-mounted on a preset function; in response to capturing the memory overflow event, acquiring relevant data of the memory overflow event and sending the relevant data to the user mode; in the user mode, performing enrichment processing on the relevant data of the memory overflow event to obtain a memory overflow event log; and reporting the memory overflow event log.
In addition, an embodiment of the present disclosure provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the steps of the method described above, for example: in the kernel mode, capturing a memory overflow event in real time through a probe function pre-mounted on a preset function; in response to capturing the memory overflow event, acquiring relevant data of the memory overflow event and sending the relevant data to the user mode; in the user mode, performing enrichment processing on the relevant data of the memory overflow event to obtain a memory overflow event log; and reporting the memory overflow event log.
In summary, in the technical solution of the present disclosure, the probe function may be mounted on the preset function in the kernel mode to collect relevant data of the memory overflow event, the relevant data is sent to the user mode, enrichment processing is performed on the relevant data in the user mode to obtain the memory overflow event log, and the memory overflow event log is further reported, so that information on memory overflow events can be obtained in real time, efficiently, and accurately, avoiding delays and omissions.
The foregoing description relates only to preferred embodiments of the present disclosure and is not intended to limit the present disclosure; any modifications, equivalents, improvements, and alternatives falling within the spirit and principles of the present disclosure shall be covered by the scope of the present disclosure.