CN116166181A - Cloud monitoring method and cloud management platform - Google Patents

Cloud monitoring method and cloud management platform

Publication number
CN116166181A
Authority
CN
China
Prior art keywords
tenant
management platform
cloud management
cloud
slo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210142252.2A
Other languages
Chinese (zh)
Inventor
王宝林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Cloud Computing Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Cloud Computing Technologies Co Ltd
Priority to PCT/CN2022/116891 (WO2023093194A1)
Publication of CN116166181A

Abstract

Embodiments of this application disclose a cloud monitoring method and a cloud management platform, relating to the technical field of data processing. The method includes: the cloud management platform determines service level objective (SLO) information input or selected by a tenant on the cloud management platform, where the SLO information represents the tenant's usage requirements for indicator data, and the indicator data is data generated when a cloud monitoring service provided by the cloud management platform monitors the cloud resources the tenant purchased on the cloud management platform; and the cloud management platform selects, from at least two back-end storage modules, one back-end storage module that matches the SLO information to store the indicator data. The tenants' indicator data is thus tiered according to each tenant's usage requirements for the indicator data of the cloud monitoring service, which helps meet the differing usage requirements of different tenants for cloud monitoring data while improving the effective utilization of cloud resources.

Description

Translated from Chinese
A cloud monitoring method and cloud management platform

This application claims priority to patent application No. 202111399739.0, filed on November 24, 2021 and entitled "A Cloud Monitoring Method and System", the entire contents of which are incorporated herein by reference.

Technical Field

This application relates to the field of data processing technology, and in particular to a cloud monitoring method and a cloud management platform.

Background Art

Cloud computing is a form of distributed computing in which a large data-processing program is split, over the network "cloud", into a large number of small programs; a system composed of multiple servers then processes and analyzes these small programs, and the results are returned to users (both enterprise and individual users).

A cloud monitoring service is a common basic capability of cloud vendors. It is responsible for ingesting, processing, aggregating, and storing the indicator data generated by monitoring the cloud resources that tenants have purchased, and it provides an application programming interface (API) through which tenants query and use the indicator data of the cloud monitoring service.

As cloud computing develops, more and more tenants migrate their businesses to the cloud, and the growing data volume makes monitoring cloud resources more challenging. At present, however, the APIs that cloud vendors open to tenants for querying cloud monitoring data offer the same default capabilities to every tenant. This cannot meet the differing usage requirements that different types of tenants have for the indicator data of the cloud monitoring service, nor does it take the effective utilization of cloud resources into account.

Summary of the Invention

Embodiments of this application provide a cloud monitoring method and a cloud management platform that help meet the differing usage requirements of different types of tenants for the indicator data of a cloud monitoring service while improving the effective utilization of cloud resources.

In a first aspect, an embodiment of this application provides a cloud monitoring method. The method may be performed by a cloud management platform, which may be a cloud server. The method may include: the cloud management platform determines service level objective (SLO) information input or selected by a tenant on the cloud management platform, where the SLO information represents the tenant's usage requirements for indicator data, and the indicator data is data generated when the cloud monitoring service provided by the cloud management platform monitors the cloud resources the tenant purchased on the cloud management platform; and the cloud management platform selects, from at least two types of back-end storage modules, one type that matches the SLO information to store the indicator data, where the at least two types of back-end storage modules differ in the rate at which they read data and/or in the time after which they age out data.

With this method, the cloud management platform can store a tenant's indicator data in different back-end storage modules according to the tenant's usage requirements, giving tenants different SLO capabilities in terms of data read rate, data aging time, and so on. This helps meet the differing usage requirements of different types of tenants for the indicator data of the cloud monitoring service while improving the effective utilization of cloud resources.

With reference to the first aspect, in a possible implementation, the at least two types of back-end storage modules include a memory-type back-end storage module, a solid-state drive (SSD)-type back-end storage module, and an OBS-type back-end storage module backed by an object storage (OBS) bucket. The memory-type module reads data fastest and the OBS-type module reads data slowest; the memory-type module ages out data soonest and the OBS-type module ages out data latest.

With this method, the cloud management platform can connect to the at least two types of back-end storage modules over an internal network, and these modules can provide tenants with different SLO capabilities in terms of data read rate, data aging time, and so on. This helps meet the differing usage requirements of different types of tenants for the indicator data of the cloud monitoring service while improving the effective utilization of cloud resources.

It should be understood that the embodiments of this application use data read rate and data aging time only as examples for configuring the at least two types of back-end storage modules connected to the cloud management platform; they do not limit the specific types or capabilities of those modules. In other embodiments, a cloud vendor may deploy back-end storage modules according to its business needs, which is not limited in the embodiments of this application. In an optional implementation, the cloud management platform may also provide data processing capabilities for the indicator data other than storage, such as ingestion, processing, and aggregation. Correspondingly, the cloud management platform may be connected to at least two types of back-end processing modules that provide any one of these capabilities, and may select from them one type that matches the tenant's SLO information to process (for example, ingest, process, or aggregate) the indicator data. The matching process is similar to that used for the storage capability; the two implementations may be read in light of each other, and details are not repeated here.

With reference to the first aspect, in a possible implementation, when the SLO information indicates that the tenant's usage requirement for the indicator data is high priority, the cloud management platform selecting, from the at least two types of back-end storage modules, one type that matches the SLO information to store the indicator data includes: the cloud management platform selects the memory-type back-end storage module to store the indicator data.

Under this implementation, high priority can represent a high demand on the read rate, aging time, and so on of the indicator data. When the tenant's usage requirement for the indicator data is high priority, the cloud management platform selects the memory-type back-end storage module to store the indicator data. Query responses from the memory-type storage can achieve a P99 latency below 50 ms, so when the tenant reads indicator data, the cloud management platform can read the required data from the memory-type back-end storage module and return it with a P99 latency below 50 ms, which is lower than the latency of the other types of back-end storage modules.
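
The P99 < 50 ms figure is a percentile target: 99% of query responses complete within the bound. The following is a minimal sketch of how such a target could be checked against measured latencies. It assumes the nearest-rank percentile definition (the source does not say how P99 is computed), and the function names `p99` and `meets_target` are hypothetical.

```python
from typing import Sequence


def p99(latencies_ms: Sequence[float]) -> float:
    """Return the 99th-percentile latency using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(latencies_ms)
    # Nearest rank: ceil(0.99 * n), computed with integers to avoid float error.
    rank = -(-99 * len(ordered) // 100)
    return ordered[rank - 1]


def meets_target(latencies_ms: Sequence[float], target_ms: float = 50.0) -> bool:
    """True if the P99 latency is strictly below the target (50 ms by default)."""
    return p99(latencies_ms) < target_ms
```

With 100 samples, a single slow outlier does not break the target, but two do, because the 99th ranked sample is then one of the slow ones.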

With reference to the first aspect, in a possible implementation, when the SLO information indicates that the tenant's usage requirement for the indicator data is medium priority, the cloud management platform selecting, from the at least two types of back-end storage modules, one type that matches the SLO information to store the indicator data includes: the cloud management platform selects the memory-type back-end storage module to store the indicator data.

Under this implementation, medium priority can represent a moderate demand on the read rate, aging time, and so on of the indicator data. When the tenant's usage requirement for the indicator data is medium priority, the cloud management platform may, as an optional implementation, also select the memory-type back-end storage module to store the indicator data. For example, the cloud management platform may dynamically adjust the tenant's SLO information according to the actual usage of cloud resources so that, while the tenant's usage requirements for the indicator data are still met, cloud resources are used as fully as possible and overall query response latency, storage cost, and the like are reduced, improving the user experience.

With reference to the first aspect, in a possible implementation, when the SLO information indicates that the tenant's usage requirement for the indicator data is medium priority, the cloud management platform selecting, from the at least two types of back-end storage modules, one type that matches the SLO information to store the indicator data includes: the cloud management platform selects the SSD-type back-end storage module to store the indicator data.

Under this implementation, when the tenant's usage requirement for the indicator data is medium priority, the cloud management platform may select the SSD-type back-end storage module to store the indicator data. For example, medium priority places a looser requirement on query response latency than high priority does, so storing the indicator data in the SSD-type back-end storage module reduces storage cost while still meeting the tenant's usage requirements.

With reference to the first aspect, in a possible implementation, when the SLO information indicates that the tenant's usage requirement for the indicator data is low priority, the cloud management platform selecting, from the at least two types of back-end storage modules, one type that matches the SLO information to store the indicator data includes: the cloud management platform selects the OBS-type back-end storage module to store the indicator data.

Under this implementation, low priority can represent a low demand on the read rate, aging time, and so on of the indicator data. When the tenant's usage requirement for the indicator data is low priority, the cloud management platform may select the OBS-type back-end storage module to store the indicator data, further reducing storage cost.
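
The priority-to-backend matching described in the preceding implementations can be sketched as follows. This is an illustrative sketch only: the names `Priority`, `Backend`, and `select_backend` are hypothetical, and the `memory_has_headroom` flag is an assumed stand-in for the "actual usage of cloud resources" that the text says may move a medium-priority tenant onto the memory-type module.

```python
from enum import Enum


class Priority(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class Backend(Enum):
    MEMORY = "memory"  # fastest reads, shortest retention
    SSD = "ssd"
    OBS = "obs"        # slowest reads, longest retention


def select_backend(priority: Priority, memory_has_headroom: bool = False) -> Backend:
    """Pick the back-end storage type that matches a tenant's SLO priority."""
    if priority is Priority.HIGH:
        return Backend.MEMORY
    if priority is Priority.MEDIUM:
        # The text describes both options for medium priority; the flag used
        # to choose between them here is an assumption, not from the source.
        return Backend.MEMORY if memory_has_headroom else Backend.SSD
    return Backend.OBS
```

For example, `select_backend(Priority.MEDIUM)` yields the SSD-type module by default, and the memory-type module only when the platform reports spare memory capacity.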

With reference to the first aspect, in a possible implementation, the cloud management platform determining the SLO information input or selected by the tenant on the cloud management platform includes: the cloud management platform provides the tenant with an application programming interface (API), where the API indicates multiple fields representing different attributes of the SLO information; and the cloud management platform receives the SLO information sent by the tenant, where the SLO information includes the multiple fields and the parameter the tenant entered for each field.

Under this implementation, the cloud management platform can provide the tenant with an API together with the API's fields for setting the different attributes of the SLO information. The tenant enters a parameter for each field, and the fields together with the entered parameters are sent to the cloud management platform as the tenant's SLO information, completing the configuration of the tenant's SLO information.

With reference to the first aspect, in a possible implementation, the multiple fields include: a namespace, a group name, an instance name, a monitoring item name, a time period, or the SLO information expected by the tenant.
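
As a rough illustration of what such an API request body might carry, the sketch below models the listed fields as a Python dataclass. The source names the fields but not their types, key names, or which are mandatory, so the attribute names, the choice to make only `expected_slo` required, and the `to_payload` helper are all assumptions.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SloInfo:
    """Hypothetical container for the SLO fields a tenant submits."""
    expected_slo: str                     # e.g. "high", "medium", "low" (assumed values)
    namespace: Optional[str] = None
    group_name: Optional[str] = None
    instance_name: Optional[str] = None
    metric_name: Optional[str] = None     # the "monitoring item name"
    time_period: Optional[str] = None

    def to_payload(self) -> dict:
        """Return only the fields the tenant actually filled in."""
        return {key: value for key, value in asdict(self).items() if value is not None}
```

A tenant that wants high-priority handling for one namespace would then send something like `SloInfo(expected_slo="high", namespace="ns1").to_payload()`.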

With reference to the first aspect, in a possible implementation, the cloud management platform determining the SLO information input or selected by the tenant on the cloud management platform includes: the cloud management platform provides the tenant with a console interface; and the cloud management platform determines the SLO information input or selected by the tenant in the console interface, where the SLO information includes multiple indicator attribute configuration items provided by the console interface and the parameter the tenant entered or selected for each configuration item.

Under this implementation, the cloud management platform can obtain the tenant's SLO information through the console interface.

With reference to the first aspect, in a possible implementation, the multiple indicator attribute configuration items include: a tier name, a tier resource scope, and a tier policy.

In a second aspect, an embodiment of this application provides a cloud management platform, including: an SLO storage unit, configured to determine service level objective (SLO) information input or selected by a tenant on the cloud management platform, where the SLO information represents the tenant's usage requirements for indicator data, and the indicator data is data generated when the cloud monitoring service provided by the cloud management platform monitors the cloud resources the tenant purchased on the cloud management platform; and an analysis unit, configured to select, from at least two types of back-end storage modules, one type that matches the SLO information to store the indicator data, where the at least two types of back-end storage modules differ in the rate at which they read data and/or in the time after which they age out data.

With reference to the second aspect, in a possible implementation, the at least two types of back-end storage modules include a memory-type back-end storage module, an SSD-type back-end storage module, and an OBS-type back-end storage module backed by an OBS bucket. The memory-type module reads data fastest and the OBS-type module reads data slowest; the memory-type module ages out data soonest and the OBS-type module ages out data latest.

With reference to the second aspect, in a possible implementation, when the SLO information indicates that the tenant's usage requirement for the indicator data is high priority, the analysis unit is configured to select the memory-type back-end storage module to store the indicator data.

With reference to the second aspect, in a possible implementation, when the SLO information indicates that the tenant's usage requirement for the indicator data is medium priority, the analysis unit is configured to select the memory-type back-end storage module to store the indicator data.

With reference to the second aspect, in a possible implementation, when the SLO information indicates that the tenant's usage requirement for the indicator data is medium priority, the analysis unit is configured to select the SSD-type back-end storage module to store the indicator data.

With reference to the second aspect, in a possible implementation, when the SLO information indicates that the tenant's usage requirement for the indicator data is low priority, the analysis unit is configured to select the OBS-type back-end storage module to store the indicator data.

With reference to the second aspect, in a possible implementation, the SLO storage unit determining the SLO information input or selected by the tenant on the cloud management platform includes: providing the tenant with an application programming interface (API), where the API indicates multiple fields representing different attributes of the SLO information; and, in response to the API being called, receiving the SLO information sent by the tenant, where the SLO information includes the multiple fields and the parameter the tenant entered for each field.

With reference to the second aspect, in a possible implementation, the multiple fields include: a namespace, a group name, an instance name, a monitoring item name, a time period, or the SLO information expected by the tenant.

With reference to the second aspect, in a possible implementation, the SLO storage unit determining the SLO information input or selected by the tenant on the cloud management platform includes: providing the tenant with a console interface; and determining the SLO information input or selected by the tenant in the console interface, where the SLO information includes multiple indicator attribute configuration items provided by the console interface and the parameter the tenant entered or selected for each configuration item.

With reference to the second aspect, in a possible implementation, the multiple indicator attribute configuration items include: a tier name, a tier resource scope, and a tier policy.

In a third aspect, an embodiment of this application provides a communication apparatus including one or more processors and one or more memories. The one or more memories are coupled to the one or more processors and are configured to store computer program code that includes computer instructions. When the one or more processors execute the computer instructions, the apparatus performs the method of the first aspect or of any possible design of the first aspect.

In a fourth aspect, an embodiment of this application provides a computer-readable storage medium configured to store a computer program. When the computer program runs on a computing apparatus, it causes the apparatus to perform the method of the first aspect or of any possible design of the first aspect.

The implementations provided in the above aspects may be further combined to provide additional implementations.

For the technical effects achievable by any possible implementation of any of the second to fourth aspects, refer to the description of the technical effects achievable by the corresponding possible implementation of the first aspect; repeated content is not discussed again.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the architecture of a cloud service system;

FIG. 2 is a schematic diagram of the architecture of a cloud monitoring system according to an embodiment of this application;

FIG. 3 is a schematic diagram of configuring SLO information through a Console user interface according to an embodiment of this application;

FIG. 4 is a schematic flowchart of a cloud monitoring method according to an embodiment of this application;

FIG. 5 is a schematic flowchart of a cloud monitoring method according to an embodiment of this application;

FIG. 6 is a schematic flowchart of a cloud monitoring method according to an embodiment of this application;

FIG. 7 is a schematic structural diagram of a cloud management platform according to an embodiment of this application;

FIG. 8 is a schematic structural diagram of a communication apparatus according to an embodiment of this application.

Detailed Description

Some terms used in the embodiments of this application are explained below to aid understanding by those skilled in the art.

1. Time line: an encoding, in a single direction, of events that have happened and that will happen. In a cloud monitoring scenario, a time line is the presentation of different indicator data over time.

2. Object Storage Service (OBS): an object-based mass storage service that provides tenants with massive, secure, highly reliable, low-cost data storage.

The basic components of OBS are buckets and objects. An OBS bucket is a container for storing objects in OBS. Each bucket has its own attributes, such as storage class, access permissions, and region, and a tenant locates a bucket on the Internet through the bucket's access domain name. An object is the basic unit of data storage in OBS. An object is a collection of a file's data and its related attribute information, and consists of three parts: a key, metadata, and data.

3. Namespace: an abstract grouping of a set of resources and objects. Different namespaces can be created within the same cluster. Data in different namespaces is isolated, so the namespaces can share the services of the same cluster without interfering with one another.

4. Work order: a notification that a tenant sends to the cloud vendor when a cloud service and/or cloud resource fails. On receiving the work order, the cloud vendor decides whether to accept it based on the actual situation.

5. Cloud resources: the resources on the cloud that the cloud management platform provides to tenants, including cloud services and cloud instances. Cloud services include, for example, VPC network services, gateway services, firewall services, NAT services, cloud disks, elastic public IP (EIP), the cloud monitoring service, and the various other cloud services offered by cloud vendors. Cloud instances are, for example, virtual machines, containers, or bare metal servers, all of which are virtual instances that the cloud vendor provides to tenants in the cloud vendor's data center.

This application is described in detail below with reference to the accompanying drawings and embodiments.

FIG. 1 is a schematic diagram of the architecture of a cloud service system.

As shown in FIG. 1, the cloud service system 100 may include a cloud service module 110, a cloud management platform 120, and a back-end storage 130, provided by a cloud vendor.

Externally, the cloud management platform 120 can connect over the Internet to terminal devices operated by tenants. A tenant can order cloud services from the cloud vendor so that the cloud service module 110 provides the corresponding services according to the order. Within the system 100, the cloud management platform 120 can connect over an internal network to the cloud service module 110 and the back-end storage 130. The back-end storage 130 can provide data storage for data from the tenants' terminal devices or from the cloud service module 110.

The cloud management platform 120 can provide an API for the cloud service module 110 to call. The cloud service module 110 provides cloud services to tenants and generates related service data while doing so. It calls the API provided by the cloud management platform 120 to execute write commands that report the generated service data to the cloud management platform 120, which stores the data in the back-end storage 130. The cloud management platform 120 can also provide an API for tenants to call; according to the command a tenant submits to the API, it accesses the back-end storage 130 and returns the access result to the tenant.

In the embodiments of the present application, the backend storage 130 may include at least two kinds of backend storage modules, for example a memory-type backend storage module and a persistent-type backend storage module. The persistent-type backend storage module may include, for example, an SSD-type backend storage module backed by a solid-state disk or solid-state drive (SSD), an OBS-type backend storage module backed by an Object Storage Service (OBS) bucket, and so on. The at least two kinds of backend storage modules have different capabilities, such as different data read rates and/or different data aging times. For example, if the backend storage 130 ages data by retention time, the memory-type backend storage module retains data for only 3 hours, the SSD-type backend storage module retains data for 1 month, and the OBS-type backend storage module can retain data for 1 year. The cloud management platform 120 can select a target backend storage module from the at least two kinds of backend storage modules to store real-time data from the cloud service module 110. Using different backend storage modules, different types of tenants can be given different capabilities for using the related service data.
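The retention example above can be written down as a small lookup table. The following is a minimal sketch, not the platform's implementation: the names `BackendTier`, `TIERS` and `tiers_able_to_retain` are hypothetical, and 1 month and 1 year are taken as 30 and 365 days purely for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class BackendTier:
    name: str
    retention: timedelta  # data older than this is aged out of the tier


# Retention values from the example in the text.
# Assumption: 1 month = 30 days, 1 year = 365 days.
TIERS = [
    BackendTier("memory", timedelta(hours=3)),
    BackendTier("ssd", timedelta(days=30)),
    BackendTier("obs", timedelta(days=365)),
]


def tiers_able_to_retain(age: timedelta) -> List[str]:
    """Return the tiers whose retention time is long enough for data of this age."""
    return [tier.name for tier in TIERS if age <= tier.retention]
```

For data that must stay queryable for a week, only the SSD-type and OBS-type tiers in this sketch qualify; the memory-type tier has already aged it out.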

A tenant can purchase cloud resources on the cloud management platform 120. After the purchase succeeds, the cloud management platform 120 notifies the cloud service system 100 to create the cloud resources and gives the tenant a suitable access method for using them remotely. For example, if the cloud resource is a virtual machine, the tenant can select the specification of the virtual machine (memory, processor and disk) on the cloud management platform. After the tenant's payment succeeds, the cloud management platform notifies the cloud service system 100 to create a virtual machine with that specification and to open the remote desktop of the virtual machine. The cloud management platform 120 provides the remote desktop account and password to the tenant, so that the tenant can log in to the virtual machine remotely.

The cloud resource may also be any of various cloud services, such as a container, a bare metal server, or an elastic public IP address (Elastic IP Address, EIP). The embodiments of the present application do not limit the type of cloud service.

The cloud service system 100 can also provide a cloud monitoring service to tenants. A tenant can order the cloud monitoring service from the cloud vendor, and the cloud service system 100 is configured according to the order. While the cloud service module 110 provides cloud services to the tenant, the system monitors the running cloud services and/or cloud resources provided by the cloud service module 110 and generates metric data for the tenant's cloud services and/or cloud resources (collectively referred to as cloud monitoring data).

Based on this metric data, the tenant can learn the working state of its cloud resources, so that actions preset on the cloud management platform 120 are triggered according to the metric data. For example, consider a virtual machine that handles multiple service requests (for instance, the virtual machine serves web pages to Internet users and receives multiple service requests, specifically web page access requests). The monitored metric data is, for example, the CPU utilization of the virtual machine. If the CPU utilization exceeds a threshold, the cloud management platform 120 notifies the cloud service system 100 to increase the number of such virtual machines, for example from 1 to 2, which may be done by copying the virtual machine image. The 2 virtual machines receive different service requests under the load balancing policy preset by the tenant on the cloud management platform 120 and process them separately, which lowers the CPU utilization of the original virtual machine.
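The threshold-triggered action described above can be sketched as a single decision function. This is an illustrative sketch only; `desired_vm_count` and the `max_vms` cap are assumptions, not part of the described platform.

```python
def desired_vm_count(cpu_utilization: float, threshold: float,
                     current: int, max_vms: int = 2) -> int:
    """Add one virtual machine when CPU utilization exceeds the threshold.

    max_vms is a hypothetical upper bound so the sketch cannot scale without limit.
    """
    if cpu_utilization > threshold and current < max_vms:
        return current + 1
    return current
```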

However, different types of tenants usually have different capability requirements for services such as querying and using metric data, and usage needs also differ across business scenarios. For example, in cloud monitoring scenarios, most queries (for example, more than 90%) are real-time queries over data from the last hour; most queries (for example, more than 90%) are standing (resident) queries, and the time series touched by standing queries account for less than 1% of the time series written; most standing queries (for example, more than 95%) are initiated by larger tenants. A larger tenant is, for example, one that has purchased many virtual machines. In the embodiments of the present application, a larger tenant is, for example, a tenant that has purchased more cloud resources, and a smaller tenant is, for example, a tenant that has purchased fewer cloud resources.

Large tenants have higher requirements for service capability and are willing to pay for better service capability. For example, a large tenant may rent cloud resources from several cloud vendors and call the open APIs of each vendor to obtain metric data for secondary processing, such as analyzing and presenting the metric data on an operations platform the tenant has built itself. Such a tenant therefore usually needs the cloud service system 100 to answer queries faster and at higher volume, for example with a P99 latency at the millisecond level (for example, 99% of responses within 50 milliseconds (ms)) and a query rate on the order of hundreds of queries per second (Queries Per Second, QPS). Small tenants have fewer resources and mostly do no secondary processing. They usually only query and analyze data as needed and have modest latency requirements; for example, they can accept a P99 latency at the second level and a lower QPS.
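To make the P99 requirement concrete, the check below computes a percentile with the nearest-rank method and compares it with a target. It is a minimal sketch under the assumption that latencies are collected as a list of millisecond samples; the function names are hypothetical, and other percentile definitions (for example, interpolated ones) would give slightly different values.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def meets_latency_slo(latencies_ms: Sequence[float], p99_target_ms: float) -> bool:
    """True when the P99 latency of the samples is within the target."""
    return percentile_nearest_rank(latencies_ms, 99) <= p99_target_ms
```

With 100 samples, one slow response does not break a 50 ms P99 target, but two do, because the 99th ranked sample is then one of the slow ones.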

At present, the default capability of the APIs that each cloud vendor opens to all tenants is the same. Tenants cannot communicate their service capability requirements to the cloud vendor, and the cloud vendor cannot learn the different requirements of tenants of different sizes. Only after being throttled by traffic control (flow control) or suffering service impairment can a tenant submit a work order asking the cloud vendor to adjust the required service capability. Because the cloud vendor does not know the query and usage needs of different tenants, it processes all metric data uniformly and stores all of it in the memory-type backend storage module.

However, the data that is actually queried and used usually accounts for less than 1% of the total, so 99% of the metric data has no need to occupy the memory-type backend storage module. For example, with 100 million time series, 0.03 KB per data point after time-series compression, and one data point reported per minute, retaining 3 hours of data requires about 515 GB of memory, of which about 510 GB is never queried or used. In addition, the cost of the memory-type, SSD-type and OBS-type backend storage modules decreases in that order while their response latency increases in that order. If the cloud service system 100 ignores tenants' query and usage needs and still processes all metric data uniformly, it neither matches tenants' actual requirements nor ensures effective utilization of cloud resources.
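The 515 GB figure can be reproduced directly from the numbers in the example. The short calculation below assumes binary units (1 GB = 1024 × 1024 KB), which is the reading under which the quoted figures match.

```python
SERIES = 100_000_000         # 100 million time series
KB_PER_POINT = 0.03          # size of one data point after compression
POINTS_PER_SERIES = 3 * 60   # one point per minute, retained for 3 hours

total_kb = SERIES * POINTS_PER_SERIES * KB_PER_POINT
total_gb = total_kb / 1024 / 1024   # assumption: binary units
unused_gb = total_gb * 0.99         # 99% of the data is never queried

print(f"total: {total_gb:.0f} GB, unused: {unused_gb:.0f} GB")
# prints "total: 515 GB, unused: 510 GB"
```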

To address the above problems, the embodiments of the present application provide a cloud monitoring method and a cloud management platform that help satisfy the different usage requirements of different types of tenants for metric data while improving the effective utilization of cloud resources. The method and the apparatus are based on the same technical concept. Because they solve the problem on similar principles, the implementations of the apparatus and the method may refer to each other, and repeated descriptions are omitted.

The following uses the cloud monitoring service as an example to introduce the system architecture to which the embodiments of the present application apply.

FIG. 2 is a schematic diagram of the architecture of a cloud monitoring system according to an embodiment of the present application.

As shown in FIG. 2, the cloud monitoring system 200 may include a cloud service module 210, a cloud management platform 220 and a backend storage 230. The cloud service module 210 and/or tenants can communicate with the cloud management platform 220 by calling the API it provides. The cloud management platform 220 and the backend processing modules connected to it (including but not limited to the backend storage modules shown in FIG. 1) can provide data processing functions such as ingestion, processing, aggregation and storage for metric data from the cloud service module 210, and can provide tenants with functions such as querying and using the stored metric data.

It should be understood that FIG. 2 only shows, by way of example, functional modules that the system 200 may include, and does not limit the number or function of the modules. For example, the system 200 may include multiple types of backend processing modules, and any backend processing module may include its own submodules. The embodiments of the present application do not limit this.

For example, the cloud management platform 220 may include a service level objective (SLO) storage unit 250 and an analysis unit 260. Optionally, the cloud management platform 220 may also include a cache unit 270. The SLO storage unit 250, the analysis unit 260 and the cache unit 270 may be independent modules in the cloud management platform 220 or submodules of other modules. The embodiments of the present application do not limit how the SLO storage unit 250, the analysis unit 260 and the cache unit 270 are implemented; the dashed boxes in the figure only indicate that they are optional independent modules.

The SLO storage unit 250, the analysis unit 260 and the cache unit 270 of the embodiments of the present application are described below with reference to the drawings.

1. SLO storage unit 250

The SLO storage unit 250 may be used to persistently store tenant-defined SLO information. In an optional implementation, the SLO storage unit 250 may be part of the backend storage.

In the embodiments of the present application, when a tenant orders the cloud monitoring service, the cloud management platform 220 can give the tenant an entry point for configuring SLO information, for example configuration through an API. The tenant can use this entry point to define its own SLO information, and the tenant-defined SLO information can be persistently stored in the SLO storage unit 250. Afterwards, other functional modules of the cloud monitoring system 200 can obtain the tenant's SLO information from the SLO storage unit 250 and, based on it, provide the tenant with the various processing, query and usage functions for the metric data of the cloud monitoring service.

In a specific implementation, the cloud management platform 220 can provide the tenant with SLO configuration items that the tenant uses to define its SLO information. The SLO information may include the query scope and the query capability parameters that the tenant expects for the metric data. For example, the query scope may include at least one of the following: the namespace of the metric to be queried, the monitoring item name, the instance name, resource group information, the resource identifier, and the time period. The query capability parameters may include at least one of the following: latency, query rate, the upper limit on the volume of a single query, and the SLO level.

In the query scope, the namespace of the metric to be queried is the namespace to which the metric data belongs; the monitoring item name is the metric name of the metric to be queried; the resource group information indicates the resource group of the metric to be queried; the resource identifier identifies the resource of the metric to be queried; and the time period is the time range of the metric to be queried. In the query capability parameters, the latency may include the maximum response latency, the average response latency and so on for query requests; the query rate may include, for example, the query rate that query requests can reach, such as QPS; the single-query upper limit is the maximum volume of a single query; and the SLO level may be, for example, any one of high priority, medium priority and low priority.
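The query scope and query capability parameters listed above map naturally onto a small data model. The sketch below is one possible shape, not the format used by the platform; every class and field name (`QueryScope`, `QueryCapability`, `SloInfo`, `p99_latency_ms`, and so on) is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SloLevel(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class QueryScope:
    namespace: str = "*"                  # namespace of the metric data
    metric_name: Optional[str] = None     # monitoring item name
    instance_name: Optional[str] = None
    resource_group: Optional[str] = None
    resource_id: Optional[str] = None
    period_hours: Optional[int] = None    # how far back queries reach


@dataclass(frozen=True)
class QueryCapability:
    p99_latency_ms: Optional[float] = None
    qps: Optional[int] = None
    max_points_per_query: Optional[int] = None  # single-query upper limit
    level: SloLevel = SloLevel.LOW


@dataclass(frozen=True)
class SloInfo:
    tenant_id: str
    scope: QueryScope
    capability: QueryCapability


# Example: a tenant asking for fast queries over the last hour of EIP metrics.
example = SloInfo(
    tenant_id="tenant-a",
    scope=QueryScope(namespace="EIP", period_hours=1),
    capability=QueryCapability(p99_latency_ms=50, qps=100, level=SloLevel.HIGH),
)
```

Making the fields optional mirrors the "at least one of the following" wording: a tenant may specify only the parts of the scope and capability it cares about.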

In a possible implementation, when a tenant orders the cloud monitoring service on the cloud management platform 220, the tenant can send a cloud monitoring service configuration request to the cloud management platform 220 from the terminal device it operates. The cloud management platform 220 receives the configuration request from the terminal device and, in response, returns a cloud monitoring service configuration response to the terminal device. The configuration response can indicate candidate configuration information, and the tenant can determine target configuration information based on the candidate configuration information and its own needs for using the metric data of the cloud monitoring service. The terminal device can then send the target configuration information to the cloud management platform 220, so that the platform obtains the tenant's SLO information from it and completes the custom configuration of the tenant's SLO. It should be understood that, in a specific implementation, the target configuration information may be the tenant's SLO information itself, or it may be information used to obtain the tenant's SLO information. The embodiments of the present application do not limit how the SLO information is configured.

Taking configuration of SLO information through an API as an example, the candidate configuration information above may be implemented in at least one way. An example of the configuration process follows.

The API format provided by the cloud management platform 220 is as follows:

[Figure BDA0003507537470000091: API format (image not reproduced)]

Specifically, the cloud management platform 220 may display the above API format on a web page provided over the Internet and annotate the usage of each field, for example with the hints that follow // above. After seeing the API format, the tenant fills in the corresponding parameters. For example, filling in "EIP" after "namespace":, that is, "namespace":"EIP", indicates that the service concerned is the EIP service purchased by the tenant. Alternatively, filling in "*" after "namespace":, that is, "namespace":"*", indicates that all cloud resources purchased by the tenant are concerned.

For example, a tenant may fill in the parameters of the API format as follows:

[Figure BDA0003507537470000092: API format with tenant-supplied parameters (image not reproduced)]

[Figure BDA0003507537470000101: API format with tenant-supplied parameters (image not reproduced)]

The tenant can send the API with its parameters filled in to the cloud management platform 220 over the Internet as a template. The cloud management platform 220 reads the parameters of the different fields in the API and thereby obtains the tenant's requirement for each field. In this embodiment, therefore, the SLO information includes the API fields and the parameters entered by the tenant. The cloud management platform 220 then stores the tenant's SLO information in the SLO storage unit 250.
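The exact API format appears only in figures that are not reproduced here, so the following is a hedged sketch of how a platform might read a filled-in template. Only the "namespace" field and the "*" wildcard come from the text; the other field names (`period_hours`, `p99_latency_ms`, `level`), the JSON encoding, and both function names are hypothetical.

```python
import json

# Hypothetical filled-in template. Only "namespace" is taken from the text;
# the remaining fields are illustrative assumptions.
FILLED_TEMPLATE = """
{
  "namespace": "EIP",
  "period_hours": 1,
  "p99_latency_ms": 50,
  "level": "high"
}
"""

REQUIRED_FIELDS = ("namespace", "level")  # assumed, not specified by the source


def parse_slo_template(text: str) -> dict:
    """Parse a submitted template and check that the assumed required fields are present."""
    slo = json.loads(text)
    missing = [field for field in REQUIRED_FIELDS if field not in slo]
    if missing:
        raise ValueError(f"missing SLO fields: {missing}")
    return slo


def namespace_matches(pattern: str, namespace: str) -> bool:
    """'*' covers all of the tenant's cloud resources; otherwise require an exact match."""
    return pattern == "*" or pattern == namespace
```

The wildcard check reflects the two cases described above: "namespace":"EIP" restricts the SLO to the EIP service, while "namespace":"*" applies it to everything the tenant has purchased.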

In addition to the API, as shown in FIG. 3, the cloud management platform 220 can provide a console interface for tenants to perform configuration. The tenant's terminal device displays the console interface, and the tenant enters or selects parameters similar to those of the API above in the relevant attribute configuration items of the console interface, according to its own needs for using metric data.

For example, the cloud monitoring services provided by the cloud vendor may include, but are not limited to, resource grouping, alarms, host monitoring, cloud service monitoring, tiered processing and event monitoring. After the tenant selects any one of these specific cloud monitoring services, the console interface can present the metric attribute configuration items of that service for the tenant to select or fill in, which completes the parameter configuration of that service.

In this embodiment, therefore, the SLO information includes the metric attribute configuration items provided by the console interface and the parameters the tenant enters or selects for each of them.

Taking the tiered processing service as an example of a specific cloud monitoring service, as shown in FIG. 3, its metric attribute configuration items may include multiple pieces of tier information. Each piece of tier information may be associated with a tier name (or ID), the range of resources the tier covers, a tiering policy, and so on. Based on its own needs for using the metric data of the cloud monitoring service, the tenant can pick target tier information from the multiple pieces of tier information, where the target tier information satisfies those needs.

For example, the tenant may select "EIP" as the service concerned, with a policy that includes: query data from the last hour; maximum response latency P99 < 50 ms; process at high priority by default. Alternatively, the tenant may select "EIP" as the service concerned, with a policy that includes: query data from the last hour; maximum response latency P99 < 500 ms; process at low priority by default.

By operating the terminal device, the tenant feeds back a cloud monitoring service configuration response to the cloud management platform 220. The configuration response may carry target configuration information, which indicates the target tier information among the multiple pieces of tier information. The cloud management platform 220 can then determine the tenant's SLO information from the target configuration information and store it in the SLO storage unit 250.

It should be noted that the above example describes the configuration process only by taking configuration of SLO information through an API as an example, and does not limit how a tenant defines its SLO information. In other embodiments, the SLO information may be configured in other ways, which are not described here. It should be understood that, in the console-based example above, each tiering policy may be carried out with the cooperation of at least one API. When a tenant selects a tiering policy, it is therefore in essence also configuring the APIs it requires.

It should be understood that, in the embodiments of the present application, the cloud vendor can charge a tenant according to the SLO information the tenant has defined. The base fee may differ between service levels, and the cloud vendor can apply different billing rules according to the service level that the tenant's SLO information involves. For example, a high-priority SLO may be billed at 100 yuan per day, a medium-priority SLO at 10 yuan per day, and a low-priority SLO may be free. These figures only illustrate that different billing rules can apply to different SLO levels; they do not limit the actual fee for any level.
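As a small illustration of level-based billing, the sketch below uses the example rates from the paragraph above. The rates are the illustrative figures from the text, not actual prices, and `slo_fee` is a hypothetical helper.

```python
# Example rates from the text, in yuan per day; illustrative only.
DAILY_RATE_YUAN = {"high": 100, "medium": 10, "low": 0}


def slo_fee(level: str, days: int) -> int:
    """Fee for holding an SLO of the given level for a number of days."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return DAILY_RATE_YUAN[level] * days
```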

2. Analysis unit 260

The analysis unit 260 can read the tenant's SLO information from the SLO storage unit 250, select from the at least two kinds of backend storage modules the one that matches the tenant's SLO information, and route the metric data reported to the cloud management platform 220 to the selected backend storage module for storage. The analysis unit 260 can also route query requests from the tenant to the selected backend storage module to read the metric data the tenant needs.

In an optional implementation, the analysis unit 260 can obtain the tenant's query instrumentation statistics and analyze them to derive the tenant's query model and tiered processing policy. The query model and the tiered processing policy indicate how the metric data associated with the tenant is to be processed.

Taking collection of the tenant's query instrumentation statistics through the API as an example, the analysis unit 260 can be connected to the entry point of each API provided by the cloud vendor. When the tenant triggers a query, the API that is called records key instrumentation data about the query, such as the queried time range, the number of time series involved, the response latency and the number of records returned, and sends these query instrumentation statistics to the analysis unit 260.
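One way to collect such statistics at an API entry point is to wrap the query function so that every call reports a record. The sketch below assumes a query function that returns one list of points per time series; `QueryStat`, `instrumented_query` and the `report` callback are hypothetical names standing in for the hand-off to the analysis unit.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QueryStat:
    tenant_id: str
    range_seconds: int      # queried time range
    series_count: int       # number of time series involved
    records_returned: int   # number of records returned
    latency_ms: float       # response latency


def instrumented_query(
    run_query: Callable[[str, int], List[list]],
    report: Callable[[QueryStat], None],
) -> Callable[[str, int], List[list]]:
    """Wrap a query function so that each call reports a QueryStat."""

    def wrapper(tenant_id: str, range_seconds: int) -> List[list]:
        start = time.perf_counter()
        series = run_query(tenant_id, range_seconds)  # one list of points per series
        latency_ms = (time.perf_counter() - start) * 1000
        report(QueryStat(
            tenant_id=tenant_id,
            range_seconds=range_seconds,
            series_count=len(series),
            records_returned=sum(len(points) for points in series),
            latency_ms=latency_ms,
        ))
        return series

    return wrapper
```

Wrapping at the entry point keeps the measurement out of the query logic itself, so the same wrapper can be applied to every API that serves tenant queries.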

The analysis unit 260 can obtain the tenant's SLO information from the SLO storage unit 250 and, by analyzing the SLO information together with the tenant's query instrumentation statistics, derive the tenant's query model and tiered processing policy. The query model and/or tiered processing policy obtained by the analysis unit 260 can be cached in the cache unit 270 as analysis data. The tenant's query model indicates the metric metadata that the tenant's queries involve, and the tiered processing policy indicates the processing mode suited to the tenant. From the query model, the analysis unit 260 can determine the hotspot metadata involved in the tenant's queries and, from the hotspot metadata, determine whether to cache the metric metadata associated with the tenant's SLO information in the cache unit 270 and how to cache it. The cached metric metadata can be matched against the metric data from the cloud service module 210 or against the metadata carried in query requests from the tenant, in order to implement the storage, query and usage functions for the metric data. The analysis unit 260 can also determine, from the tenant's tiered processing policy, how to process the metric data from the cloud service module 210 and the query requests from the tenant, for example which of the at least two kinds of backend storage modules serves as the target backend storage module for storing the metric data, and from which backend storage module the metric data is read.
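The text does not specify how hotspot metadata is derived from the query model. A simple frequency count over queried metadata keys is one plausible approach, sketched below; the `(namespace, metric name)` key, the `min_hits` cutoff and the function name are all assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

MetricKey = Tuple[str, str]  # (namespace, metric name); a hypothetical metadata key


def hot_metadata(queried_keys: Iterable[MetricKey], min_hits: int) -> List[MetricKey]:
    """Return metadata keys queried at least min_hits times, most frequent first."""
    counts = Counter(queried_keys)
    return [key for key, hits in counts.most_common() if hits >= min_hits]
```

Keys that fall below the cutoff would then be candidates for the default processing path rather than for caching.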

It should be noted that, in the embodiments of the present application, when a tenant has not defined SLO information, the cloud management platform 220 can apply the system default lowest SLO policy to the storage, query and usage of that tenant's metric data, for example by writing the metric data to different backend storage modules according to time range, as in the scheme described above with reference to FIG. 1. In an optional implementation, the cloud management platform 220 may give priority to the needs of tenants that have set SLO information and provide them with storage, query and usage services for metric data according to their tenant-defined SLO information. In another optional implementation, when the system has spare cloud resources (for example, storage capacity), the cloud management platform 220 may order the metric metadata involved in tenants' hotspot queries by SLO priority from high to low and cache the metric metadata involved in SLO queries of the different priority levels in the cache unit 270. Metric metadata that is not written to the cache unit is handled according to the system default lowest SLO policy.
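The priority-ordered caching described above can be sketched as a sort followed by a capacity cut. This is an illustrative sketch: the capacity is counted in entries rather than bytes, and `fill_cache` and `PRIORITY_ORDER` are hypothetical names.

```python
from typing import List, Tuple

PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def fill_cache(candidates: List[Tuple[str, str]],
               capacity: int) -> Tuple[List[str], List[str]]:
    """Split (metadata_key, slo_level) candidates into cached and default-policy keys.

    Higher-priority entries are cached first; entries that do not fit fall back
    to the system default (lowest) SLO policy.
    """
    capacity = max(capacity, 0)
    ordered = sorted(candidates, key=lambda item: PRIORITY_ORDER[item[1]])  # stable sort
    cached = [key for key, _ in ordered[:capacity]]
    fallback = [key for key, _ in ordered[capacity:]]
    return cached, fallback
```

Because the sort is stable, entries of equal priority keep their original order, so an upstream ranking by query frequency would be preserved within each level.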

A tenant's query hotspots may change dynamically, for example as the tenant's business changes, as the tenant's requirements for cloud monitoring data change, or as cloud resource usage changes. Accordingly, the analysis unit 260 can also dynamically determine and/or adjust, based on updates to the tenant's query hotspots, the priority of the indicator metadata associated with the tenant's SLO information, the caching strategy for the indicator metadata, or the tenant's hierarchical processing strategy. This is described below with reference to the method flowcharts and is not detailed here.

3. Cache unit 270

The cache unit 270 can communicate with the analysis unit 260 and caches analysis data from the analysis unit 260, for example the indicator metadata associated with a tenant's SLO information and/or the hierarchical processing strategy suited to the tenant. In another optional implementation, the cache unit can be part of the backend storage.

After receiving indicator data from the cloud service module 210 and/or a query request from a tenant, the cloud management platform 220 can obtain the indicator metadata associated with the tenant's SLO information from the cache unit 270 and match it against the metadata carried in the indicator data reported by the cloud service module 210 or the cloud monitoring indicator metadata carried in the tenant's query request. Based on the matching result, it selects, from at least two backend storage modules (which correspond to different processing levels), the one that matches the tenant's SLO information as the target backend storage module, and uses it to store the corresponding indicator data, respond to queries, and so on.

Specifically, the cloud management platform 220 loads the metadata associated with the tenant's SLO information from the cache unit 270 (for example, by loading it periodically or by subscribing to change notifications) and, based on that metadata, writes the indicator data that the cloud service module 210 reports through the API to different backend storage modules in real time, according to metadata category. For example, indicator data covered by a high-priority SLO requirement is written to the memory-type backend storage module, indicator data covered by a medium-priority SLO requirement is written to the memory-type or SSD-type backend storage module, and indicator data covered by the default lowest-priority SLO requirement is written to the OBS-type backend storage module.
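As an illustration of the priority-to-backend mapping in this example, the following sketch routes a reported metric by looking up its SLO priority. The `Priority` enum, the backend labels, and the choice of SSD for medium priority (the text also allows memory) are assumptions made for the sketch only.

```python
from enum import Enum


class Priority(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    DEFAULT = "default"


# Hypothetical mapping; the text allows medium priority to use memory or SSD.
BACKEND_FOR_PRIORITY = {
    Priority.HIGH: "memory",
    Priority.MEDIUM: "ssd",
    Priority.DEFAULT: "obs",
}


def route_write(metric_name, slo_metadata):
    """Return the backend type for a reported metric.

    slo_metadata maps metric names to a Priority; metrics without an entry
    fall back to the default lowest-priority tier.
    """
    priority = slo_metadata.get(metric_name, Priority.DEFAULT)
    return BACKEND_FOR_PRIORITY[priority]
```

A metric with no SLO entry is routed to the OBS tier, which mirrors the default lowest SLO policy.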

Because the storage media differ, queries against the memory-type backend storage module can achieve a P99 latency below 50 ms, and queries against the SSD-type backend storage module can achieve a P99 latency below 500 ms. When a tenant sets a target of P99 < 50 ms, the cloud management platform 220 stores the indicator data in the cache or the memory-type backend storage module as the target backend, which reduces latency, better matches the tenant's usage requirements, and helps ensure effective utilization of cloud resources. When a tenant sets a target of P99 < 500 ms, the cloud management platform 220 uses the SSD-type backend storage module as the target backend, which reduces storage cost.
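The trade-off between latency and cost can be sketched as picking the cheapest backend whose achievable P99 still meets the tenant's target. The 50 ms and 500 ms bounds come from the text; the `relative_cost` values, the unbounded latency assumed for OBS, and the fallback to the fastest backend are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Backend:
    name: str
    p99_ms: float       # P99 query latency the backend can achieve
    relative_cost: int  # hypothetical cost rank, higher is more expensive


BACKENDS = [
    Backend("memory", 50, 3),
    Backend("ssd", 500, 2),
    Backend("obs", float("inf"), 1),  # no latency bound assumed for OBS
]


def select_backend(target_p99_ms: Optional[float]) -> Backend:
    """Pick the cheapest backend that satisfies the tenant's P99 target."""
    # No custom SLO means any backend qualifies, so the cheapest one wins.
    target = float("inf") if target_p99_ms is None else target_p99_ms
    candidates = [b for b in BACKENDS if b.p99_ms <= target]
    if not candidates:
        # Target is tighter than any tier; fall back to the fastest backend.
        return min(BACKENDS, key=lambda b: b.p99_ms)
    return min(candidates, key=lambda b: b.relative_cost)
```

Under these assumptions a 500 ms target selects SSD rather than memory, which is the cost saving the paragraph describes.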

It can be understood that, in the embodiments of the present application, the cache unit 270 can also cache other information; the embodiments do not limit the specific functions of the cache unit.

So far, the cloud monitoring system 200 and its functional modules have been described with reference to the accompanying drawings. The cloud management platform can provide a channel through which tenants define their desired service level objectives, so that tenants can configure custom SLO information. For tenants that set SLO information through the API, the cloud management platform 220 processes the tenant's indicator data in tiers according to that custom SLO information, to better match how the tenant needs to use the data. For example, when a tenant sets a target of P99 latency below 50 ms, the cloud management platform 220 uses the cache or the memory-type backend storage module to reduce latency and meet the tenant's requirement. The cloud management platform 220 can provide an analysis unit 260 that derives the tenant's query model from the tenant's query instrumentation statistics and, taking into account the platform's configured parameters and actual cloud resource usage, dynamically adjusts the tenant's hierarchical processing strategy. This makes the fullest possible use of cloud resources while still meeting the tenant's requirements for indicator data, lowering overall query latency and storage cost and improving user experience.

It should be noted that the above embodiments use storage modules and the storage function only as an example of how indicator data is processed, not as a limitation. In other embodiments, the cloud management platform can also implement other processing functions for indicator data, including but not limited to ingestion, transformation, and aggregation. Accordingly, any processing function can correspond to at least two backend processing modules, and for a given function the cloud management platform can select one of them as the target processing module, which then performs the corresponding processing of the indicator data. For implementation details, refer to the description of the storage function; they are not repeated here.

The cloud monitoring method of the embodiments of the present application is described below with reference to the accompanying drawings and embodiments.

FIG. 4 is a schematic flowchart of a cloud monitoring method according to an embodiment of the present application. The method can be implemented by a tenant's terminal device and by the cloud management platform shown in FIG. 2 and its submodules. As shown in FIG. 4, the cloud monitoring method may include the following steps:

S410: The cloud management platform determines the SLO information that a tenant inputs or selects on the cloud management platform.

In the embodiments of the present application, the SLO information is custom-configured by the tenant and represents the tenant's usage requirements for indicator data. The indicator data is data generated when the cloud monitoring service provided by the cloud management platform monitors the cloud resources that the tenant has purchased on the platform. The SLO information can be stored persistently in the SLO storage unit. When performing S410, the cloud management platform can obtain the tenant's SLO information from the SLO storage unit, for example by looking it up with the unique identifier of the tenant (or of the tenant's terminal device).
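The lookup in S410 can be sketched as a keyed store with a default entry. The class `SloStore`, the field names, and the in-memory dictionary (standing in for persistent storage) are all assumptions for illustration; the application does not specify a data layout.

```python
# Hypothetical default applied when a tenant has not configured SLO information.
DEFAULT_SLO = {"priority": "default", "p99_ms": None}


class SloStore:
    """In-memory stand-in for the persistent SLO storage unit."""

    def __init__(self):
        self._by_tenant = {}

    def put(self, tenant_id, slo):
        # Copy so later changes by the caller do not alter the stored record.
        self._by_tenant[tenant_id] = dict(slo)

    def get(self, tenant_id):
        # Tenants without custom SLO information get the default policy.
        return dict(self._by_tenant.get(tenant_id, DEFAULT_SLO))
```

Returning the default for unknown tenants matches the fallback to the system's lowest SLO policy described earlier.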

In one optional implementation, when configuring custom SLO information, the tenant can send a cloud monitoring service configuration request to the cloud management platform through a terminal device. The cloud management platform receives the request and returns a cloud monitoring service configuration response to the terminal device; the response indicates candidate configuration information. The tenant determines target configuration information from the candidates and sends it to the cloud management platform through the terminal device; the target configuration information is used to obtain the tenant's SLO information. For configuration details, refer to the earlier description of the API format and the console interface; they are not repeated here.

S420: The cloud management platform selects, from at least two backend storage modules, one backend storage module that matches the SLO information to store the indicator data.

In the embodiments of the present application, the tenant's indicator data can be generated by the monitored cloud resources.

Typically, a tenant subscribes to the cloud monitoring service (including the cloud monitoring indicators it covers) from the cloud monitoring system. The cloud management platform monitors cloud resources according to the subscribed service and generates the corresponding indicator data. When S420 is performed, the indicator data can be reported to the cloud management platform by the monitored cloud service module through an API call (for example, a data communication API), or the cloud management platform can obtain it by monitoring the cloud resources itself. The embodiments do not limit how the indicator data is generated. Below, the sources of indicator data are collectively referred to as data sources.

For example, the cloud monitoring indicators covered by the cloud monitoring service may include, but are not limited to, at least one of the following: write traffic/read traffic, raw data size, overall QPS, number of operations, service status, UE parse-success traffic, UE parse-success line count, UE parse-failure line count, UE error count, and statistics on IP addresses where errors occurred. The indicator data to be stored in S420 may include the cloud monitoring data corresponding to at least one of these indicators.

In S420, the cloud management platform takes the selected backend storage module as the target backend storage module and routes the indicator data to be stored to it.

The steps of the cloud monitoring method shown in FIG. 4 are described in detail below with reference to FIG. 5 and FIG. 6.

S501: The tenant calls an API (for example, a management API) through a terminal device to configure custom SLO information.

S502: The called API persistently stores the tenant's custom SLO information in the SLO storage unit.

It should be noted that, in the embodiments of the present application, the tenant's custom SLO information may include the query range and query capability parameters the tenant expects. The optional implementation above is only an example of the custom configuration process, not a limitation; in other embodiments the tenant can configure the SLO information in other ways, which are not described here.

S503: A data source (for example, the cloud service module described above) calls an API (for example, a data communication API) to report the tenant's indicator data to the cloud management platform.

S504: In response to the API call, the analysis unit obtains the tenant's cached indicator metadata from the cache unit. Specifically, S504 may include: S504a: the analysis unit sends a read request to the cache unit; S504b: in response to the read request, the cache unit returns a read response to the analysis unit, and the read response includes the tenant's cached indicator metadata.

S505: The analysis unit matches the metadata carried in the indicator data against the cached indicator metadata and, based on the matching result, determines which of the at least two backend storage modules matches the tenant's SLO information and uses it as the target backend storage module.

S506: The analysis unit routes the indicator data to the target backend storage module, which stores it.

As shown in FIG. 5, the at least two backend storage modules (also called tiered backends) may include backend storage modules at multiple levels obtained by partitioning the backend storage, for example a level 1 backend, a level 2 backend, a level 3 backend, ..., and a level n backend, where n is the number of tiers of the backend storage. After making a decision based on the cached indicator metadata, the analysis unit can determine at least one target backend storage module among the n backend storage modules according to the processing strategy for the indicator data, and send the indicator data to that module (or those modules) for storage.

It can be understood that FIG. 5 only illustrates that the analysis unit can send indicator data to a target backend storage module among the n backend storage modules in order to store it; it does not mean that all n backend storage modules are target backend storage modules.

In one optional implementation, the cloud management platform can also provide tenants with the ability to query and use indicator data.

For example, when a tenant triggers a query request, as shown in FIG. 5, the query process may include the following steps:

S507: The tenant's terminal device calls the API to submit a query request.

S508: In response to the API call, the analysis unit obtains the tenant's cached indicator metadata from the cache unit. Specifically, S508 may include: S508a: the analysis unit sends a read request to the cache unit; S508b: in response to the read request, the cache unit returns a read response to the analysis unit, and the read response includes the tenant's cached indicator metadata.

S509: The analysis unit matches the cloud monitoring indicator metadata carried in the query request against the cached indicator metadata and, based on the matching result, selects from the at least two backend storage modules the target backend storage module that matches the tenant's SLO information.

S510: The analysis unit routes the query request to the target backend storage module, which returns to the terminal device the target indicator data corresponding to the cloud monitoring indicator metadata, that is, the access result.
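The write path (S503 to S506) and the query path (S507 to S510) rely on the same metadata match to pick a tier, which the following sketch makes explicit. It is a toy model under stated assumptions: the class `TieredStore`, the `(tenant, metric)` key layout, and the use of the last level as the default tier are hypothetical and not part of the application.

```python
class TieredStore:
    """Toy stand-in for the analysis unit plus n levelled backends."""

    def __init__(self, levels, cached_metadata):
        # One dict per level; the last level acts as the default (lowest SLO) tier.
        self.backends = {level: {} for level in range(1, levels + 1)}
        self.default_level = levels
        # Cached indicator metadata: (tenant, metric) -> level (assumed layout).
        self.cached_metadata = dict(cached_metadata)

    def _target_level(self, tenant, metric):
        # Match the carried metadata against the cached metadata (S505 / S509).
        return self.cached_metadata.get((tenant, metric), self.default_level)

    def write(self, tenant, metric, timestamp, value):
        # Route the data point to the matching tier and store it (S506).
        level = self._target_level(tenant, metric)
        series = self.backends[level].setdefault((tenant, metric), [])
        series.append((timestamp, value))
        return level

    def query(self, tenant, metric):
        # Route the query to the same tier and return the stored points (S510).
        level = self._target_level(tenant, metric)
        return list(self.backends[level].get((tenant, metric), []))
```

Because both paths call the same lookup, a query is always sent to the tier that received the writes, as long as the cached metadata has not changed in between; handling data that moves between tiers is outside this sketch.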

In one optional implementation, the analysis unit can dynamically determine and/or adjust the tenant's SLO information, so as to make the fullest possible use of cloud resources while still meeting the tenant's requirements for indicator data, lowering overall query response latency and storage cost and improving user experience. For example, the process may include the following steps:

S511: When the tenant triggers a query request, the called API collects the tenant's query instrumentation statistics and sends them to the analysis unit.

S512: The analysis unit obtains the tenant's SLO information from the SLO storage unit.

S513: The analysis unit determines the hotspot metadata from the tenant's query instrumentation statistics and analyzes the tenant's SLO information against that hotspot metadata to obtain an analysis result, for example whether to cache the indicator metadata associated with the tenant's SLO information.

S514: The analysis unit caches the tenant's indicator metadata that needs caching in the cache unit.
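Steps S511 to S514 can be sketched as counting how often each metric is queried and caching only the hot metrics that are also covered by the tenant's SLO information. The hit threshold `min_hits` and the representation of the statistics as a flat list of queried metric names are assumptions of this sketch.

```python
from collections import Counter


def hot_metadata(query_log, min_hits):
    """Return the metrics queried at least min_hits times (S513, first half)."""
    counts = Counter(query_log)
    return {metric for metric, hits in counts.items() if hits >= min_hits}


def metadata_to_cache(query_log, slo_metadata, min_hits=3):
    """Return the SLO-associated metrics that are hot enough to cache (S513/S514).

    query_log:    list of metric names taken from query instrumentation statistics.
    slo_metadata: set of metric names covered by the tenant's SLO information.
    """
    hot = hot_metadata(query_log, min_hits)
    # Only metadata that is both hot and SLO-associated is written to the cache.
    return sorted(hot & set(slo_metadata))
```

A metric that is queried often but has no SLO association is left out, as is an SLO-covered metric that is rarely queried.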

In another optional implementation, the SLO information stored in the SLO storage unit can be replaced with the tenant's hierarchical processing strategy, which the scheduler of the cloud management platform can derive by dynamic adjustment based on the tenant's SLO information and the available cloud resources.

In a specific embodiment, as shown in FIG. 6, the tenant's hierarchical processing strategy can be obtained through the following steps:

S601: Based on the cloud services and/or cloud monitoring services the tenant has subscribed to, the system administrator uses a management API to set the tenant's ratio threshold for dedicated cloud resources and ratio threshold for shared cloud resources, and persistently stores these thresholds in the SLO storage unit as the initial hierarchical processing strategy.

S602: The tiered backends (including at least two backend storage modules) send the cloud monitoring indicators that the tenant writes in real time to the analysis unit; these indicators can be used to indicate the tenant's query hotspots.

S603: The analysis unit combines the updated query hotspots of the tenant with the initial hierarchical processing strategy set by the system administrator to generate an adaptive hierarchical processing strategy.

S604: The scheduler obtains the hierarchical processing strategy from the analysis unit and, according to it, updates the priority of the indicator metadata associated with the tenant's SLO information and the caching strategy for that indicator metadata.

S605: The scheduler sends the priority and the caching strategy of the indicator metadata associated with the tenant's SLO information to the cache unit, so that the cache unit caches that indicator metadata according to the priority and the caching strategy.

In one optional implementation, the analysis unit and the scheduler can periodically update the priority and the caching strategy of the indicator metadata associated with the tenant's SLO information, and then refresh the cached indicator metadata accordingly. This dynamically adjusts the processing strategies of the different tiered backends, so that different tenants' requirements for cloud monitoring indicator data are met while effective utilization of cloud resources improves.
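One possible reading of S601 to S605 is that the administrator's thresholds bound how many cache entries a tenant may occupy, and the analysis unit fills that budget with the hottest metrics. The sketch below follows that reading; the function name, the percentage parameters, and the interpretation of the thresholds as cache-slot budgets are assumptions and are not stated in the application.

```python
def adaptive_policy(hot_counts, capacity, exclusive_pct, shared_pct, shared_free):
    """Assign 'cache' or 'default' handling to each metric.

    hot_counts:    dict of metric name -> recent query count.
    capacity:      total cache slots in the system.
    exclusive_pct: tenant's dedicated share of capacity, in percent (set in S601).
    shared_pct:    tenant's maximum share of the shared pool, in percent (set in S601).
    shared_free:   slots currently free in the shared pool.
    """
    exclusive_slots = capacity * exclusive_pct // 100
    # Shared slots are limited both by the threshold and by what is actually free.
    shared_slots = min(capacity * shared_pct // 100, shared_free)
    budget = exclusive_slots + shared_slots
    # Hottest first; ties are broken by name so the result is deterministic.
    ranked = sorted(hot_counts, key=lambda m: (-hot_counts[m], m))
    return {m: ("cache" if i < budget else "default") for i, m in enumerate(ranked)}
```

Re-running the function with fresh counts and a fresh `shared_free` value corresponds to the periodic update performed by the analysis unit and the scheduler.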

In connection with the above method embodiments, the embodiments of the present application also provide a cloud management platform. Its structure can be as shown in FIG. 2 above, and it can be used to perform the methods performed by the cloud management platform and its submodules in the above method embodiments.

As shown in FIG. 7, the cloud management platform 700 may include: an SLO storage unit 701, configured to determine the service level objective (SLO) information that a tenant inputs or selects on the cloud management platform, where the SLO information represents the tenant's usage requirements for indicator data, and the indicator data is data generated when the cloud monitoring service provided by the cloud management platform monitors the cloud resources purchased by the tenant on the cloud management platform; and an analysis unit 702, configured to select, from at least two backend storage modules, one backend storage module that matches the SLO information to store the indicator data, where the at least two backend storage modules differ in data read rate and/or in data aging time.

For example, the SLO storage unit 701 and the analysis unit 702 can be integrated in the cloud management platform shown in FIG. 2. The embodiments do not limit the product form of the SLO storage unit 701 and the analysis unit 702. For details of their functions, refer to the method embodiments above; they are not repeated here.

It should be noted that the division into units in the embodiments of the present application is schematic and is only a division by logical function; other divisions are possible in an actual implementation. The functional units in the embodiments may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit. An integrated unit may be implemented in hardware or as a software functional unit.

If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. On this understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, can be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

In a simple embodiment, those skilled in the art will appreciate that the cloud management platform or the terminal device in the above embodiments can take the form shown in FIG. 8. The communication apparatus 800 shown in FIG. 8 includes at least one processor 810 and a memory 820 and, optionally, a communication interface 830.

The memory 820 may be a volatile memory, such as random access memory. It may also be a non-volatile memory, such as read-only memory, flash memory, a hard disk drive (HDD), or a solid-state drive (SSD), or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, without being limited to these. The memory 820 may also be a combination of the above memories.

The embodiments of the present application do not limit the specific connection medium between the processor 810 and the memory 820.

The apparatus shown in FIG. 8 further includes the communication interface 830, through which the processor 810 can transfer data when communicating with other devices.

When the cloud management platform takes the form shown in FIG. 8, the processor 810 can invoke the computer-executable instructions stored in the memory 820 so that the apparatus 800 performs the method performed by the cloud management platform in any of the above method embodiments.

When the tenant-side terminal device takes the form shown in FIG. 8, the processor 810 can invoke the computer-executable instructions stored in the memory 820 so that the apparatus 800 performs the method performed by the tenant-side terminal device in any of the above method embodiments.

The embodiments of the present application also relate to a chip system that includes a processor configured to invoke a computer program or computer instructions stored in a memory, so that the processor performs the above method embodiments.

In one possible implementation, the processor is coupled to the memory through an interface.

In one possible implementation, the chip system further includes a memory that stores the computer program or computer instructions.

The embodiments of the present application also relate to a processor configured to invoke a computer program or computer instructions stored in a memory, so that the processor performs the above method embodiments.

The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of the program of the method in the embodiment shown in FIG. 8. The memory mentioned anywhere above may be read-only memory (ROM) or another type of static storage device that can store static information and instructions, random access memory (RAM), or the like.

应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。It should be understood that the embodiments of the present application can be provided as methods, systems, or computer program products. Therefore, the present application can adopt the form of complete hardware embodiments, complete software embodiments, or embodiments in combination with software and hardware. Moreover, the present application can adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Obviously, those skilled in the art can make various modifications and variations to the embodiments of the present application without departing from the scope of the embodiments of the present application. Thus, if these modifications and variations fall within the scope of the claims of the present application and their equivalents, the present application is also intended to include them.

Claims (22)

1. A cloud monitoring method, the method comprising:
determining, by a cloud management platform, service level objective (SLO) information input or selected by a tenant on the cloud management platform, wherein the SLO information represents the tenant's usage requirement for index data, and the index data is data generated when a cloud monitoring service provided by the cloud management platform monitors cloud resources purchased by the tenant on the cloud management platform; and
selecting, by the cloud management platform, from at least two back-end storage modules, one back-end storage module that matches the SLO information to store the index data, wherein the at least two back-end storage modules differ in data reading rate and/or in aging time.
2. The method of claim 1, wherein the at least two back-end storage modules comprise a memory-type back-end storage module, a solid state drive (SSD)-type back-end storage module, and an object storage (OBS) bucket-type back-end storage module, wherein the memory-type back-end storage module has the fastest data reading rate and the OBS-type back-end storage module has the slowest data reading rate, and wherein data ages out soonest in the memory-type back-end storage module and latest in the OBS-type back-end storage module.
3. The method of claim 2, wherein, in a case where the SLO information indicates that the tenant's usage requirement for index data is high priority, the selecting, by the cloud management platform, from at least two back-end storage modules, one back-end storage module that matches the SLO information to store the index data comprises:
selecting, by the cloud management platform, the memory-type back-end storage module to store the index data.
4. The method of claim 2 or 3, wherein, in a case where the SLO information indicates that the tenant's usage requirement for index data is medium priority, the selecting, by the cloud management platform, from at least two back-end storage modules, one back-end storage module that matches the SLO information to store the index data comprises:
selecting, by the cloud management platform, the memory-type back-end storage module to store the index data.
5. The method of any of claims 2-4, wherein, in a case where the SLO information indicates that the tenant's usage requirement for index data is medium priority, the selecting, by the cloud management platform, from at least two back-end storage modules, one back-end storage module that matches the SLO information to store the index data comprises:
selecting, by the cloud management platform, the SSD-type back-end storage module to store the index data.
6. The method of any of claims 2-5, wherein, in a case where the SLO information indicates that the tenant's usage requirement for index data is low priority, the selecting, by the cloud management platform, from at least two back-end storage modules, one back-end storage module that matches the SLO information to store the index data comprises:
selecting, by the cloud management platform, the OBS bucket-type back-end storage module to store the index data.
7. The method of any of claims 1-6, wherein the determining, by the cloud management platform, service level objective (SLO) information input or selected by a tenant on the cloud management platform comprises:
providing, by the cloud management platform, an application programming interface (API) to the tenant, wherein the API indicates a plurality of fields representing different attributes of the SLO information; and
receiving, by the cloud management platform, the SLO information sent by the tenant, wherein the SLO information comprises the plurality of fields and the parameters input by the tenant for each field.
8. The method of claim 7, wherein the plurality of fields comprise: a namespace, a group name, an instance name, a monitoring item name, a time period, or the SLO information desired by the tenant.
9. The method of any of claims 1-6, wherein the determining, by the cloud management platform, service level objective (SLO) information input or selected by a tenant on the cloud management platform comprises:
providing, by the cloud management platform, a console interface to the tenant; and
determining, by the cloud management platform, the SLO information input or selected by the tenant on the console interface, wherein the SLO information comprises a plurality of index attribute configuration items provided by the console interface and the parameters input or selected by the tenant for each index attribute configuration item.
10. The method of claim 9, wherein the plurality of index attribute configuration items comprise: a hierarchical name, a hierarchical resource scope, and a hierarchical policy.
11. A cloud management platform, comprising:
an SLO storage unit, configured to determine service level objective (SLO) information input or selected by a tenant on the cloud management platform, wherein the SLO information represents the tenant's usage requirement for index data, and the index data is data generated when a cloud monitoring service provided by the cloud management platform monitors cloud resources purchased by the tenant on the cloud management platform; and
an analysis unit, configured to select, from at least two back-end storage modules, one back-end storage module that matches the SLO information to store the index data, wherein the at least two back-end storage modules differ in data reading rate and/or in aging time.
12. The cloud management platform of claim 11, wherein the at least two back-end storage modules comprise a memory-type back-end storage module, an SSD-type back-end storage module, and an OBS bucket-type back-end storage module, wherein the memory-type back-end storage module has the fastest data reading rate and the OBS-type back-end storage module has the slowest data reading rate, and wherein data ages out soonest in the memory-type back-end storage module and latest in the OBS-type back-end storage module.
13. The cloud management platform of claim 12, wherein, in a case where the SLO information indicates that the tenant's usage requirement for index data is high priority, the analysis unit is configured to:
select the memory-type back-end storage module to store the index data.
14. The cloud management platform of claim 12 or 13, wherein, in a case where the SLO information indicates that the tenant's usage requirement for index data is medium priority, the analysis unit is configured to:
select the memory-type back-end storage module to store the index data.
15. The cloud management platform of any of claims 12-14, wherein, in a case where the SLO information indicates that the tenant's usage requirement for index data is medium priority, the analysis unit is configured to:
select the SSD-type back-end storage module to store the index data.
16. The cloud management platform of any of claims 12-15, wherein, in a case where the SLO information indicates that the tenant's usage requirement for index data is low priority, the analysis unit is configured to:
select the OBS bucket-type back-end storage module to store the index data.
17. The cloud management platform of any of claims 11-16, wherein the SLO storage unit being configured to determine service level objective (SLO) information input or selected by a tenant on the cloud management platform comprises:
providing an application programming interface (API) to the tenant, wherein the API indicates a plurality of fields representing different attributes of the SLO information; and
receiving, in response to the API being called, the SLO information sent by the tenant, wherein the SLO information comprises the plurality of fields and the parameters input by the tenant for each field.
18. The cloud management platform of claim 17, wherein the plurality of fields comprise: a namespace, a group name, an instance name, a monitoring item name, a time period, or the SLO information desired by the tenant.
19. The cloud management platform of any of claims 11-16, wherein the SLO storage unit being configured to determine service level objective (SLO) information input or selected by a tenant on the cloud management platform comprises:
providing a console interface to the tenant; and
determining the SLO information input or selected by the tenant on the console interface, wherein the SLO information comprises a plurality of index attribute configuration items provided by the console interface and the parameters input or selected by the tenant for each index attribute configuration item.
20. The cloud management platform of claim 19, wherein the plurality of index attribute configuration items comprise: a hierarchical name, a hierarchical resource scope, and a hierarchical policy.
21. A communication device, comprising: one or more processors and one or more memories;
wherein the one or more memories are coupled to the one or more processors and are configured to store computer program code comprising computer instructions which, when executed by the one or more processors, cause the communication device to perform the method of any of claims 1-10.
22. A computer-readable storage medium for storing a computer program which, when run on a computing device, causes the computing device to perform the method of any one of claims 1-10.
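
As a non-limiting illustration of the selection recited in claims 1 to 8, the following Python sketch maps a tenant's SLO information to one of the three back-end storage module types. It is a minimal sketch under stated assumptions, not the applicant's implementation: the names SloInfo, Priority, Backend, select_backend and the medium_uses_memory flag are hypothetical and do not appear in the application. The flag exists only because claims 4 and 5 allow medium-priority index data to be stored in either the memory-type or the SSD-type module.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    """Tenant's usage requirement for index data (hypothetical encoding)."""
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class Backend(Enum):
    """The three back-end storage module types named in claim 2."""
    MEMORY = "memory"  # fastest reads, data ages out soonest
    SSD = "ssd"
    OBS = "obs"        # slowest reads, data ages out latest


@dataclass(frozen=True)
class SloInfo:
    """SLO information with the fields listed in claim 8 (names assumed)."""
    namespace: str
    group_name: str
    instance_name: str
    monitoring_item_name: str
    period_seconds: int
    priority: Priority


def select_backend(slo: SloInfo, medium_uses_memory: bool = False) -> Backend:
    """Select the back-end storage module that matches the SLO information."""
    if slo.priority is Priority.HIGH:
        return Backend.MEMORY
    if slo.priority is Priority.MEDIUM:
        # Claims 4 and 5 permit either module for medium priority.
        return Backend.MEMORY if medium_uses_memory else Backend.SSD
    return Backend.OBS
```

Under these assumptions, a platform would call select_backend once for each SLO record received through the API or the console interface and route the matching index data to the returned module type.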
CN202210142252.2A | 2021-11-24 | 2022-02-16 | Cloud monitoring method and cloud management platform | Pending | CN116166181A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
PCT/CN2022/116891 (WO2023093194A1 (en)) | 2021-11-24 | 2022-09-02 | Cloud monitoring method and cloud management platform

Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
CN2021113997390 | 2021-11-24
CN202111399739 | 2021-11-24

Publications (1)

Publication Number | Publication Date
CN116166181A (en) | 2023-05-26

Family

ID=86413761

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210142252.2A (Pending, CN116166181A (en)) | Cloud monitoring method and cloud management platform | 2021-11-24 | 2022-02-16

Country Status (2)

Country | Link
CN (1) | CN116166181A (en)
WO (1) | WO2023093194A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN119030982A (en) * | 2024-08-21 | 2024-11-26 | 北京火山引擎科技有限公司 | A method, device and equipment for scheduling tasks with multi-tenant isolation on the cloud
WO2024251115A1 (en) * | 2023-06-06 | 2024-12-12 | 华为云计算技术有限公司 | Packet flow transmission method and related device
WO2025011056A1 (en) * | 2023-07-07 | 2025-01-16 | 华为云计算技术有限公司 | Cloud disk service change recommendation method and apparatus
CN119807278A (en) * | 2024-12-18 | 2025-04-11 | 北京百度网讯科技有限公司 | Monitoring index data collection method and device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN116956363B (en) * | 2023-09-20 | 2023-12-05 | 微网优联科技(成都)有限公司 | Data management method and system based on cloud computer technology

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US8468241B1 (en) * | 2011-03-31 | 2013-06-18 | Emc Corporation | Adaptive optimization across information technology infrastructure
US10078533B2 (en) * | 2014-03-14 | 2018-09-18 | Amazon Technologies, Inc. | Coordinated admission control for network-accessible block storage
WO2017074320A1 (en) * | 2015-10-27 | 2017-05-04 | Hewlett Packard Enterprise Development Lp | Service scaling for batch processing
CN108462596B (en) * | 2017-02-21 | 2021-02-23 | 华为技术有限公司 | SLA decomposition method, equipment and system
CN109451008B (en) * | 2018-10-31 | 2021-05-28 | 中国人民大学 | A multi-tenant bandwidth guarantee framework and cost optimization method under cloud platform

Also Published As

Publication number | Publication date
WO2023093194A1 (en) | 2023-06-01

Similar Documents

Publication | Publication Date | Title
CN116166181A (en) | Cloud monitoring method and cloud management platform
US10757035B2 (en) | Provisioning cloud resources
CN108776934B (en) | Distributed data calculation method and device, computer equipment and readable storage medium
US9971823B2 (en) | Dynamic replica failure detection and healing
US10169090B2 (en) | Facilitating tiered service model-based fair allocation of resources for application servers in multi-tenant environments
US20210342193A1 (en) | Multi-cluster container orchestration
US7984151B1 (en) | Determining placement of user data to optimize resource utilization for distributed systems
US8429096B1 (en) | Resource isolation through reinforcement learning
US8429097B1 (en) | Resource isolation using reinforcement learning and domain-specific constraints
CN109981702B (en) | File storage method and system
US9438665B1 (en) | Scheduling and tracking control plane operations for distributed storage systems
US9466036B1 (en) | Automated reconfiguration of shared network resources
US10712958B2 (en) | Elastic storage volume type selection and optimization engine for public cloud environments
US11275667B2 (en) | Handling of workload surges in a software application
US10158709B1 (en) | Identifying data store requests for asynchronous processing
CN110597858A (en) | Task data processing method, device, computer equipment and storage medium
US11914894B2 (en) | Using scheduling tags in host compute commands to manage host compute task execution by a storage device in a storage system
TW201820165A (en) | Server and cloud computing resource optimization method thereof for cloud big data computing architecture
US10102230B1 (en) | Rate-limiting secondary index creation for an online table
US10277529B2 (en) | Visualization of computer resource quotas
US20170272475A1 (en) | Client identification for enforcing computer resource quotas
CN110839069A (en) | Node data deployment method, node data deployment system and medium
WO2023196042A1 (en) | Data flow control in distributed computing systems
CN102111438A (en) | Method and device for parameter adjustment and distributed computation platform system
CN114296891A (en) | Task scheduling method, system, computing device, storage medium and program product

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
