CN117519978A - AI cloud edge platform DevOps method - Google Patents

AI cloud edge platform DevOps method

Info

Publication number
CN117519978A
Authority
CN
China
Prior art keywords
model
cloud
edge
deployment
DevOps method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311553961.0A
Other languages
Chinese (zh)
Inventor
廖望
王涛
周丽斌
袁明明
李世钰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Communication Information System Co Ltd
Original Assignee
Inspur Communication Information System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Communication Information System Co Ltd
Priority to CN202311553961.0A
Publication of CN117519978A
Legal status: Pending (current)

Abstract

The invention relates in particular to an AI cloud edge platform DevOps method. The method adopts a GitOps operating model, using Git as the version control tool, to implement model development and version control; it uses automated edge computing resource management to implement resource prediction and allocation, and declarative application deployment to define the deployment requirements and characteristics of a model; it continuously monitors the deployed AI model through a continuous verification mechanism; and, using edge-computing chaos engineering, it periodically simulates faults on edge devices and optimizes the model according to the observations. The AI cloud edge platform DevOps method not only simplifies the deployment and management flow and improves resource utilization and computing efficiency, but also strengthens the security and observability of microservice communication between the cloud and the edge, improves the resilience and stability of applications in real fault scenarios, and can meet the requirements of a variety of cloud edge scenarios.

Description

AI cloud edge platform DevOps method
Technical Field
The invention relates to the technical field of cloud computing and edge computing, in particular to an AI cloud edge platform DevOps method.
Background
With the rapid development of cloud computing and artificial intelligence technology, more and more enterprises and developers wish to be able to deploy and run AI applications on cloud and edge devices. Cloud computing provides powerful computing power and flexible resource management, while edge computing can bring computing tasks closer to the data source, thereby reducing latency and improving response speed. However, how to efficiently deploy, manage, and optimize AI applications on cloud and edge devices remains a challenge.
The traditional software development and deployment approach often separates the development and operations stages, which causes problems such as deployment delays, configuration errors, and resource waste. To address these issues, the DevOps methodology emerged; it emphasizes close collaboration between development and operations teams, enabling continuous integration and continuous delivery. However, most existing DevOps tools and practices are primarily directed at cloud applications, not edge computing environments.
Edge computing environments pose unique challenges, such as resource constraints, network instability, and device diversity. These challenges complicate the deployment and management of AI applications on edge devices. A new technical approach is therefore needed that can combine the advantages of cloud computing and edge computing while applying DevOps practices and principles to achieve efficient deployment and operation of AI applications.
To improve the overall agility and operational efficiency of the supply chain, the invention provides an AI cloud edge platform DevOps method.
Disclosure of Invention
The invention provides a simple and efficient AI cloud edge platform DevOps method to overcome the defects of the prior art.
The invention is realized by the following technical scheme:
An AI cloud edge platform DevOps method, characterized in that it comprises the following steps:
step S1, research and development stage
Step S1.1, model development and version control
Adopting a GitOps operating model with Git as the version control tool, developers develop the AI model locally and place it under version control to ensure version consistency; whenever the AI model or its related configuration is updated in the Git repository, the deployment or update flow is triggered automatically;
step S1.2, resource prediction and allocation
After model development is completed, the resource demand of the AI model on the edge device is predicted using automated edge computing resource management; according to the prediction result and the actual resource situation of the edge device, suitable resources are automatically allocated and scheduled for the AI model;
step S1.3, model deployment
Deployment requirements and characteristics are defined for the model using declarative application deployment; the AI model and its dependencies are packaged into a container through containerization technology such as Docker, so that the model can run stably on different edge devices;
step S1.4, microservice communication
Service mesh technology is used to manage microservice communication on the cloud and the edge devices, implementing load balancing, service discovery, and secure communication between microservices and ensuring that inter-service communication is secure, efficient, and reliable;
step S2, maintenance stage
Step S2.1, continuous monitoring and verification
The deployed AI model is continuously monitored through a continuous verification mechanism; the operating data of the AI model is collected and compared with preset performance indicators;
step S2.2, fault simulation and optimization
Using edge-computing chaos engineering, faults are periodically simulated on the edge devices; the behavior of the AI model after fault injection is observed, its resilience and stability are evaluated, and the model is optimized according to the observations;
step S2.3, AI model update and deployment
When the AI model needs to be updated, the developer updates it locally and submits it through Git; the cloud edge platform automatically detects the model update and triggers the automated deployment flow, ensuring that the AI model on the edge devices always stays synchronized with the cloud.
In step S1.1, a continuous integration/continuous deployment (CI/CD) pipeline is adopted to automate the testing, validation, and deployment of the model.
In step S1.2, given the limited computing capacity, storage, and network bandwidth of edge devices, a machine learning model is used to predict the resource demand of the AI application, including CPU, memory, and storage; according to the predicted demand and the actual resource situation of the edge device, resources are allocated and scheduled automatically, ensuring efficient operation of the AI application.
In step S1.3, with declarative application deployment, devices do not need to be configured manually one by one; the user only declares the requirements and characteristics of the AI application, including the required computing resources and the dependent libraries and services, and the platform completes the deployment process automatically.
In step S1.4, a service mesh technology such as Istio or Linkerd is used to manage microservice communication on the cloud and the edge devices, enhancing security, observability, and resilience between microservices.
In step S2.1, the system periodically checks performance and security metrics and compares them with preset thresholds to ensure that the service level agreement (SLA) is satisfied; after the AI application is deployed, its operating data, including response time and error rate, is collected continuously and compared with the preset performance indicators; if degraded performance or increased errors are detected, an optimization flow is triggered automatically, and resources are reallocated after the AI model is fine-tuned, ensuring the stability and reliability of the AI application.
In step S2.2, a network interruption fault or a CPU overload fault is periodically introduced and the response of the AI model is observed, verifying that the AI model can still operate normally in real fault scenarios.
An AI cloud edge platform DevOps device, characterized in that it comprises a memory and a processor; the memory stores a computer program, and the processor implements the above method steps when executing the computer program.
A readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the above method steps.
The beneficial effects of the invention are as follows: the AI cloud edge platform DevOps method not only simplifies the deployment and management flow and improves resource utilization and computing efficiency, but also strengthens the security and observability of microservice communication between the cloud and the edge, improves the resilience and stability of applications in real fault scenarios, and can meet the requirements of a variety of cloud edge scenarios.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the embodiments or in the description of the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention; other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of the AI cloud edge platform of the invention.
Detailed Description
In order to enable those skilled in the art to better understand the technical solution of the present invention, the technical solution is described below clearly and completely in combination with the embodiments of the present invention. It will be apparent that the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
The AI cloud edge platform DevOps method comprises the following steps:
step S1, research and development stage
Step S1.1, model development and version control
Adopting a GitOps operating model with Git as the version control tool, developers develop the AI model locally and place it under version control to ensure version consistency; whenever the AI model or its related configuration is updated in the Git repository, the deployment or update flow is triggered automatically;
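As one illustration of this step, the following is a minimal sketch of a GitOps-style trigger, assuming an edge node with the plain git command-line tool available; the repository path, branch name, and deploy_model hook are hypothetical stand-ins for the platform's actual deployment flow:

```python
import subprocess
import time

REPO_DIR = "/opt/models/repo"  # hypothetical local clone of the model repository
BRANCH = "main"                # hypothetical deployment branch
POLL_INTERVAL_S = 60


def head_commit() -> str:
    """Return the commit currently checked out in the local clone."""
    return subprocess.check_output(
        ["git", "-C", REPO_DIR, "rev-parse", "HEAD"], text=True
    ).strip()


def remote_commit() -> str:
    """Fetch and return the latest commit on the remote deployment branch."""
    subprocess.check_call(["git", "-C", REPO_DIR, "fetch", "origin", BRANCH])
    return subprocess.check_output(
        ["git", "-C", REPO_DIR, "rev-parse", f"origin/{BRANCH}"], text=True
    ).strip()


def deploy_model(commit: str) -> None:
    """Hypothetical hook: sync the working tree and hand off to the deployment flow."""
    subprocess.check_call(["git", "-C", REPO_DIR, "checkout", commit])
    print(f"triggering deployment of model version {commit}")


if __name__ == "__main__":
    while True:
        latest = remote_commit()
        if head_commit() != latest:  # a new model or config version was pushed
            deploy_model(latest)
        time.sleep(POLL_INTERVAL_S)
```

In practice the same effect is usually achieved with a push-based webhook or a GitOps controller rather than polling; the loop above only makes the trigger mechanism explicit.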
step S1.2, resource prediction and allocation
After model development is completed, the resource demand of the AI model on the edge device is predicted using automated edge computing resource management; according to the prediction result and the actual resource situation of the edge device, suitable resources are automatically allocated and scheduled for the AI model;
step S1.3, model deployment
Deployment requirements and characteristics are defined for the model using declarative application deployment; the AI model and its dependencies are packaged into a container through containerization technology such as Docker, so that the model can run stably on different edge devices;
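As a sketch of the containerized packaging described in this step, the snippet below uses the docker Python SDK (docker-py) to build a model image and push it to a registry from which edge devices can pull it; the build context, image tag, and registry host are hypothetical:

```python
import docker  # pip install docker

# Hypothetical build context holding the model artifact, its dependencies, and a Dockerfile
BUILD_CONTEXT = "/opt/models/sentiment"
IMAGE_TAG = "registry.example.com/ai-models/sentiment:1.0.0"

client = docker.from_env()

# Package the AI model and its dependencies into a container image
image, build_logs = client.images.build(path=BUILD_CONTEXT, tag=IMAGE_TAG)
for entry in build_logs:
    if "stream" in entry:
        print(entry["stream"], end="")

# Push the image so that every edge device runs the same immutable artifact
client.images.push(IMAGE_TAG)
```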
step S1.4, microservice communication
Service mesh technology is used to manage microservice communication on the cloud and the edge devices, implementing load balancing, service discovery, and secure communication between microservices and ensuring that inter-service communication is secure, efficient, and reliable;
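As one possible embodiment of the secure-communication aspect of this step, the sketch below uses the Kubernetes Python client to apply an Istio PeerAuthentication policy that requires mutual TLS between all microservices in a namespace; it assumes a cluster with Istio installed, and the namespace name is hypothetical:

```python
from kubernetes import client, config  # pip install kubernetes

config.load_kube_config()

# Mesh policy: require mutual TLS for all service-to-service traffic
# in the (hypothetical) "edge-apps" namespace.
peer_auth = {
    "apiVersion": "security.istio.io/v1beta1",
    "kind": "PeerAuthentication",
    "metadata": {"name": "default", "namespace": "edge-apps"},
    "spec": {"mtls": {"mode": "STRICT"}},
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="security.istio.io",
    version="v1beta1",
    namespace="edge-apps",
    plural="peerauthentications",
    body=peer_auth,
)
```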
step S2, maintenance stage
Step S2.1, continuous monitoring and verification
The deployed AI model is continuously monitored through a continuous verification mechanism; the operating data of the AI model is collected and compared with preset performance indicators;
step S2.2, fault simulation and optimization
Using edge-computing chaos engineering, faults are periodically simulated on the edge devices; the behavior of the AI model after fault injection is observed, its resilience and stability are evaluated, and the model is optimized according to the observations;
step S2.3, AI model update and deployment
When the AI model needs to be updated, the developer updates it locally and submits it through Git; the cloud edge platform automatically detects the model update and triggers the automated deployment flow, ensuring that the AI model on the edge devices always stays synchronized with the cloud.
In step S1.1, a continuous integration/continuous deployment (CI/CD) pipeline is adopted to automate the testing, validation, and deployment of the model.
In step S1.2, given the limited computing capacity, storage, and network bandwidth of edge devices, a machine learning model is used to predict the resource demand of the AI application, including CPU, memory, and storage; according to the predicted demand and the actual resource situation of the edge device, resources are allocated and scheduled automatically, ensuring efficient operation of the AI application.
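A minimal sketch of such a resource predictor follows, using a scikit-learn regressor; the feature set (model size, parameter count, expected request rate) and the training rows are illustrative assumptions, not data from the invention:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor  # pip install scikit-learn

# Hypothetical training data: one row per past deployment.
# Features: [model size (MB), parameters (millions), requests/sec]
X_train = np.array([
    [120, 10, 50],
    [480, 60, 20],
    [45, 3, 200],
    [900, 110, 10],
])
# Targets: [CPU cores, memory (MB), storage (MB)] observed for those deployments
y_train = np.array([
    [1.0, 1024, 512],
    [2.0, 4096, 1024],
    [0.5, 512, 256],
    [4.0, 8192, 2048],
])

predictor = RandomForestRegressor(n_estimators=100, random_state=0)
predictor.fit(X_train, y_train)

# Predict resource demand for a new model before scheduling it onto an edge device
cpu, mem_mb, disk_mb = predictor.predict(np.array([[300, 40, 80]]))[0]
print(f"predicted demand: {cpu:.1f} CPU cores, {mem_mb:.0f} MB RAM, {disk_mb:.0f} MB storage")
```

The scheduler can then compare the prediction against each edge device's free capacity and place the model accordingly.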
In step S1.3, with declarative application deployment, devices do not need to be configured manually one by one; the user only declares the requirements and characteristics of the AI application, including the required computing resources and the dependent libraries and services, and the platform completes the deployment process automatically.
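A sketch of such a declarative deployment, assuming the platform is built on Kubernetes: the user only declares the image and the resources the AI application needs, and the cluster reconciles the actual state toward the declaration; all names, the image tag, and the resource figures are hypothetical:

```python
from kubernetes import client, config  # pip install kubernetes

config.load_kube_config()

# Declarative specification: what to run and what it needs, not how to install it
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "sentiment-model", "namespace": "edge-apps"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "sentiment-model"}},
        "template": {
            "metadata": {"labels": {"app": "sentiment-model"}},
            "spec": {
                "containers": [{
                    "name": "model",
                    "image": "registry.example.com/ai-models/sentiment:1.0.0",
                    "resources": {
                        "requests": {"cpu": "500m", "memory": "1Gi"},
                        "limits": {"cpu": "1", "memory": "2Gi"},
                    },
                }],
            },
        },
    },
}

client.AppsV1Api().create_namespaced_deployment(namespace="edge-apps", body=deployment)
```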
In step S1.4, a service mesh technology such as Istio or Linkerd is used to manage microservice communication on the cloud and the edge devices, enhancing security, observability, and resilience between microservices.
In step S2.1, the system periodically checks performance and security metrics and compares them with preset thresholds to ensure that the service level agreement (SLA) is satisfied; after the AI application is deployed, its operating data, including response time and error rate, is collected continuously and compared with the preset performance indicators; if degraded performance or increased errors are detected, an optimization flow is triggered automatically, and resources are reallocated after the AI model is fine-tuned, ensuring the stability and reliability of the AI application.
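A minimal sketch of this verification loop follows; the SLA thresholds are hypothetical, and collect_window is a stand-in for a real metrics scrape (for example from a monitoring system such as Prometheus):

```python
import random
import statistics
import time

# Hypothetical SLA thresholds
SLA_MAX_P95_LATENCY_MS = 200.0
SLA_MAX_ERROR_RATE = 0.01


def collect_window():
    """Stand-in metrics collector: returns simulated latency samples (ms) and an
    error rate for the last window; a real system would scrape these values."""
    return [random.gauss(150, 30) for _ in range(500)], random.random() * 0.02


def trigger_optimization():
    """Hypothetical hook: fine-tune the model and reallocate resources."""
    print("SLA violated - triggering optimization flow")


if __name__ == "__main__":
    while True:
        latencies_ms, error_rate = collect_window()
        p95 = statistics.quantiles(latencies_ms, n=20)[18]  # 95th-percentile latency
        if p95 > SLA_MAX_P95_LATENCY_MS or error_rate > SLA_MAX_ERROR_RATE:
            trigger_optimization()
        time.sleep(60)
```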
In step S2.2, a network interruption fault or a CPU overload fault is periodically introduced and the response of the AI model is observed, verifying that the AI model can still operate normally in real fault scenarios.
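A sketch of such periodic fault injection follows, assuming a Linux edge host (with root privileges) on which the tc traffic-control and stress utilities are installed; the fault catalogue, network interface, and durations are hypothetical:

```python
import random
import subprocess
import time

# Hypothetical fault catalogue: (inject command, revert command or None)
FAULTS = {
    "network_delay": (
        ["tc", "qdisc", "add", "dev", "eth0", "root", "netem", "delay", "500ms"],
        ["tc", "qdisc", "del", "dev", "eth0", "root", "netem"],
    ),
    "cpu_overload": (
        ["stress", "--cpu", "4", "--timeout", "60"],
        None,  # stress terminates itself after the timeout
    ),
}


def run_experiment(name: str, duration_s: int = 60) -> None:
    inject, revert = FAULTS[name]
    print(f"injecting fault: {name}")
    subprocess.Popen(inject)  # inject the fault without blocking
    # Observe the AI model while the fault is active; a real run would record
    # latency/error metrics here for comparison against a healthy baseline.
    time.sleep(duration_s)
    if revert:
        subprocess.check_call(revert)  # restore normal operation
    print(f"fault {name} cleared; evaluate resilience from the recorded metrics")


if __name__ == "__main__":
    run_experiment(random.choice(list(FAULTS)))
```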
The AI cloud edge platform DevOps device comprises a memory and a processor; the memory stores a computer program, and the processor implements the above method steps when executing the computer program.
The readable storage medium stores a computer program which, when executed by a processor, implements the above method steps.
Compared with the prior art, the AI cloud edge platform DevOps method has the following characteristics:
1) It provides an automation strategy that ensures consistency of applications and configuration between the cloud and the edge devices and simplifies the deployment and management processes.
2) Through an intelligent resource management strategy, it optimizes the resource allocation of edge devices and improves resource utilization and computing efficiency.
3) It introduces a continuous verification mechanism that automatically detects and verifies the performance, security, and reliability of an application to ensure that a specific service level agreement (SLA) is met.
4) It uses service mesh technology to strengthen the security and observability of microservice communication between the cloud and the edge.
5) Through chaos engineering, it improves the resilience and stability of applications in real fault scenarios.
The above examples are only one of the specific embodiments of the present invention; ordinary changes and substitutions made by those skilled in the art within the scope of the technical solution of the present invention shall be included in the scope of the present invention.

Claims (9)

6. The AI cloud edge platform DevOps method of claim 1, characterized in that: in step S2.1, the system periodically checks performance and security metrics and compares them with preset thresholds to ensure that the service level agreement (SLA) is satisfied; after the AI application is deployed, its operating data, including response time and error rate, is collected continuously and compared with the preset performance indicators; if degraded performance or increased errors are detected, an optimization flow is triggered automatically, and resources are reallocated after the AI model is fine-tuned, ensuring the stability and reliability of the AI application.
CN202311553961.0A | 2023-11-21 | 2023-11-21 | AI cloud edge platform DevOps method | Pending | CN117519978A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202311553961.0A (CN117519978A, en) | 2023-11-21 | 2023-11-21 | AI cloud edge platform DevOps method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202311553961.0A (CN117519978A, en) | 2023-11-21 | 2023-11-21 | AI cloud edge platform DevOps method

Publications (1)

Publication Number | Publication Date
CN117519978A (en) | 2024-02-06

Family

ID=89747052

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202311553961.0A | CN117519978A (en), Pending | 2023-11-21 | 2023-11-21

Country Status (1)

CountryLink
CN (1)CN117519978A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN119781803A (en)* | 2025-03-07 | 2025-04-08 | 宁波数益工联科技有限公司 | Industrial Internet of Things platform operation and maintenance system, method, equipment and medium



Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
