Disclosure of Invention
To solve the technical problems described in the background art, the present invention provides a container cluster-based multi-mode big data job scheduling method and apparatus, aiming to improve upon the scheduling of traditional multi-mode big data computing frameworks.
The invention provides a multi-mode big data job scheduling system based on a container cluster, which comprises at least one logic middleware, a resource management middleware and a container cluster;
the logic middleware is used for receiving a big data job request from a client through a uniform API (application programming interface), wherein the big data job request comprises basic information of the big data job;
the resource management middleware is used for carrying out mode-related adaptation and format processing according to the basic information of the big data job, converting the request into a resource description file supported by the container cluster, and submitting the resource description file to the container cluster;
the container cluster is used for processing the scheduling of jobs in the corresponding mode by using a pre-deployed Operator.
Preferably, the logic middleware is used for providing a uniform programming interface for the multi-mode big data framework; the logic middleware is also used for providing a uniform abstraction of the life cycle of multi-mode big data framework jobs.
Preferably, the resource management middleware is configured to provide resource layer configuration for the multi-mode big data framework;
the resource management middleware is used for adapting the scheduling of the multi-mode big data framework to the container cluster;
the resource management middleware is also used for managing the differences and configuration of the multi-mode big data framework job life cycles, converting them into a unified abstraction, and feeding the unified abstraction back to the logic middleware.
Preferably, the container cluster is used for providing a containerized running environment for the multi-mode big data job, wherein a scheduler supporting multiple big data framework modes is deployed on the container cluster.
Preferably, the resource management middleware is further configured to provide high-performance network resource allocation for big data framework jobs;
accordingly, each working node of the container cluster comprises an RDMA network card, and the container cluster is used for providing high-performance network communication among containers in a high-performance network mode.
In addition, in order to achieve the above object, the present invention provides a container cluster-based multi-mode big data job scheduling method, including the following steps:
the logic middleware receives a client's big data job request through a uniform API (application programming interface), wherein the big data job request comprises basic information of the big data job;
the resource management middleware performs mode-related adaptation and format processing according to the basic information of the big data job, converts the request into a resource description file supported by the container cluster, and submits the resource description file to the container cluster;
and processing the scheduling of the corresponding mode job by the container cluster by using a pre-deployed Operator.
Preferably, the step of performing, by the resource management middleware, mode-related adaptation and format processing according to the basic information of the big data job, converting the request into a resource description file supported by the container cluster, and submitting the resource description file to the container cluster includes:
converting the request into a uniform abstract model by the logic middleware and submitting the uniform abstract model to the resource management middleware;
converting the uniform abstract job model into the resource configuration of the corresponding big data scheduling plug-in on the container cluster by the resource management middleware according to the specific big data framework;
the resource management middleware sends the converted job resource request to the container cluster;
correspondingly, the step of using a pre-deployed Operator to process scheduling of the corresponding mode job by the container cluster specifically includes:
creating a containerized job by the container cluster, using the corresponding Operator, according to the parameters in the job resource request;
and querying and acquiring the corresponding job running status from the container cluster by the resource manager, converting the job state into a unified abstract job life-cycle state, and feeding the state back to the logic middleware.
Optionally, the scheduling method further includes:
the logic middleware receives a client's request to stop the big data framework job through the uniform API;
the logic middleware converts the stop request into a uniform abstract model and submits the uniform abstract model to the resource management middleware;
the resource management middleware converts the unified abstract model into the resource configuration of the corresponding big data scheduling plug-in on the container cluster according to the specific big data framework;
sending, by the resource management middleware, the converted job resource request to the container cluster;
the container cluster uses the corresponding Operator to stop the corresponding big data job according to the parameters in the job resource request, and releases the container cluster resources;
and the resource management middleware acquires the stop result of the big data job from the container cluster, converts the stop result into a unified abstract job life-cycle state, and feeds the state back to the logic middleware.
Optionally, the scheduling method further includes:
the logic middleware receives a client's query request for the big data framework job state through the uniform API;
converting the query request into a uniform abstract model by the logic middleware and submitting the uniform abstract model to the resource management middleware;
the resource management middleware converts the unified abstract model into the resource configuration of the corresponding big data scheduling plug-in on the container cluster according to the specific big data framework;
the resource management middleware sends the converted job resource request to the container cluster to acquire a resource state;
and converting the job state acquired from the container cluster into a unified abstract job life-cycle state by the resource management middleware, and feeding the state back to the logic middleware.
Preferably, the resource manager sets high-performance network parameters in the mode-dependent job resource configuration, and each working node of the container cluster comprises an RDMA network card;
correspondingly, the scheduling method further comprises the following steps:
the logic middleware provides the client with uniform job template parameters to declare the requirement for high-performance network resources;
the resource management middleware configures high-performance network resources for the job and submits them to the container cluster;
and the container cluster allocates the corresponding high-performance network resources to the job when the job runs.
The invention has the beneficial effect of providing a container cluster-based multi-mode big data job scheduling system and method that unify the scheduling and life-cycle management of multi-mode big data jobs.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Noun explanation
In the context of the high-performance network described herein, the following terms are defined as follows.
RDMA (Remote Direct Memory Access): a remote direct memory access technique and high-performance network communication protocol, characterized by high bandwidth and low latency. A network card supporting this protocol is called an RDMA network card.
InfiniBand: a generic term for a class of high-performance network hardware (including network cards, switches, communication links, etc.) and the corresponding software stack. This network supports the RDMA protocol and is therefore one type of RDMA network.
RoCE (RDMA over Converged Ethernet): an RDMA network running over converged Ethernet, comprising high-performance network hardware (including network cards, switches, communication links, etc.) and a software stack. This network supports the RDMA protocol and is therefore one type of RDMA network.
RDMA communication memory: memory space that meets the technical requirements of RDMA network data communication. It is physically equivalent to ordinary memory space, but must satisfy constraints such as "page locking".
Kubernetes Operator: a method for packaging, deploying and managing a Kubernetes application. An Operator is also an application-specific controller that extends the functions of the Kubernetes API, using custom resources to create, configure and manage complex container application orchestrations.
It can be appreciated that conventional multi-mode big data computation scheduling mechanisms are generally as follows:
1. The big data framework for each mode is an independent cluster, such as a standalone Spark cluster or a standalone Flink cluster.
2. The life cycle of a job differs in each mode and is defined by the mode itself.
3. Traditional big data frameworks are typically deployed and run on physical and virtual machine nodes.
4. The network layer of a traditional big data framework usually works over a TCP/IP network.
The mechanism provided by the invention comprises a container cluster-based multi-mode big data job scheduling system and method. The container cluster-based multi-mode big data job scheduling system comprises at least one container cluster, a logic middleware and a resource management middleware, and preferably may further comprise a high-performance network such as RDMA. The container cluster-based multi-mode big data job scheduling method comprises at least a job starting method, a job stopping method and a job state querying method.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating a container cluster-based multi-mode big data job scheduling method and system according to an embodiment of the present disclosure. The user submits a unified big data job request through a REST API interface provided by the logic middleware, wherein the job request includes basic information of the job. The resource management middleware carries out mode-related adaptation and format processing according to the job information carried by the job request, converts it into a resource description file supported by the container cluster, and submits the file to the container cluster; the container cluster uses a pre-deployed Operator to process the scheduling of the job in the corresponding mode.
Referring to fig. 1, an important feature of the embodiment of the present application is the use of a container cluster to run multi-mode big data jobs. In this embodiment, batch, streaming and ad hoc query big data jobs, represented by Spark, Flink and Hive respectively, are executed on a Kubernetes container cluster.
Referring to fig. 2, fig. 2 is a flowchart illustrating an overall workflow of a container-based multi-mode big data job scheduling system according to an embodiment of the present application, where the overall workflow is as follows:
1. The client submits a REST request for scheduling a big data framework job in a certain mode to the logic middleware.
2. The logic middleware obtains the parameters from the request and converts them into the unified big data framework job model.
3. The logic middleware uses the REST API to submit the job request to the resource management middleware.
4. The resource management middleware analyzes the job mode and job parameters from the received job request.
5. The resource manager renders the parsed job parameters into the Yaml format of the corresponding mode's job.
6. The resource manager submits the Yaml to the Kubernetes container cluster.
7. The container cluster uses a pre-deployed Operator component to process the resource request corresponding to the Yaml, performs job scheduling, and starts the job containers.
8. The resource management middleware requests the running state of the submitted job from the container cluster.
9. The resource management middleware feeds the acquired job state back to the logic middleware.
Referring to fig. 3, fig. 3 is a schematic diagram of a job management logic middleware according to an embodiment of the present application. The logic middleware in this embodiment is a Java middleware developed using Spring Boot; it provides an access interface to the end user in the form of a REST API, and its main functions include:
(1) A unified REST API provides management of multi-mode big data jobs to the client; specifically, the following interfaces and functions are realized in the embodiment of the application:
-providing a REST API that submits jobs in a uniform abstract data format;
-providing a REST API for obtaining job status in a uniform abstract data format;
-providing a REST API that stops jobs in a uniform abstract data format;
-format conversion of external requests to internal unified job model is handled internally;
(2) A unified job abstraction and life-cycle abstraction are defined internally. It should be understood that in practical applications the uniformly abstracted big data job format may include, but is not limited to, the following. The unified abstract job format in the embodiment of the present application is described in JSON as follows:
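The actual JSON is not reproduced in the text above. As a purely illustrative sketch, a unified abstract job description of the kind discussed here might take the following shape; every field name and value is hypothetical, not the embodiment's actual schema:

```json
{
  "jobName": "example-batch-job",
  "mode": "batch",
  "framework": "spark",
  "mainClass": "com.example.Main",
  "mainResource": "local:///opt/jobs/example.jar",
  "resources": {
    "driverCores": 1,
    "driverMemory": "512m",
    "executorInstances": 2,
    "executorCores": 1,
    "executorMemory": "512m"
  }
}
```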
(3) The resource management middleware interface is called, and the unified job request is sent to the resource management middleware.
Referring to the schematic diagram of the resource management middleware provided in an embodiment of the present application: the resource management middleware in the embodiment is a Java middleware developed using Spring Boot; it provides an access interface to the job logic management middleware in the form of a REST API, and its main functions are:
1. Provides REST APIs that manage the job life cycle in the unified abstract big data job data format, including but not limited to creating jobs, getting job status and stopping jobs.
2. Converts the unified abstract big data job format into the resource description corresponding to the mode; specifically, in the embodiment of the application, it is converted into the corresponding big data job format supported by the container cluster. The converted batch-type job Yaml and ad hoc query-type job Yaml in the embodiment of the present application are as follows:
· Spark batch job
· Hive ad hoc query job
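The converted Yaml files themselves are not reproduced above. As an illustrative sketch only, a batch job rendered for the community spark-on-k8s-operator typically takes the following SparkApplication form; the image, class and resource values here are hypothetical, not the embodiment's actual configuration:

```yaml
# Illustrative SparkApplication custom resource for the
# spark-on-k8s-operator; all concrete values are hypothetical.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: example-batch-job
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: example.registry/spark:3.1.1   # hypothetical image
  mainClass: com.example.Main           # hypothetical entry class
  mainApplicationFile: local:///opt/jobs/example.jar
  driver:
    cores: 1
    memory: 512m
  executor:
    instances: 2
    cores: 1
    memory: 512m
```

An ad hoc query job for Hive would be rendered analogously against the custom resource schema of its own Operator.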
3. Submits the converted mode-related job Yaml to the Kubernetes container cluster, which starts scheduling and running the job.
4. The resource management middleware of the embodiment acquires the state of the submitted job from the Kubernetes container cluster through the API provided by the cluster, converts the state into the unified abstraction, and feeds it back to the logic middleware.
An important feature of the embodiment of the present application is the use of a container cluster to run big data jobs; the container cluster in the embodiment adopts the container orchestration platform Kubernetes, which is mainstream in the industry. Kubernetes is open-source container orchestration platform software that provides good extension mechanisms, such as the Operator, to extend its own capabilities. In the embodiment of the application, several community open-source Operator plug-ins are deployed, providing the capability to adapt to multi-mode big data framework jobs. Specifically, the method comprises the following steps:
1. The Kubernetes cluster of this embodiment is provided with the spark-on-k8s-operator open-sourced by Google and two kinds of Kubernetes Operator plug-ins open-sourced by Huawei, so as to drive the corresponding multi-mode big data framework jobs respectively.
2. After receiving a job request in Yaml format submitted by the resource management middleware, the Kubernetes container cluster of this embodiment matches the job request to an installed Operator for task scheduling, and the Operator starts scheduling the big data framework job tasks corresponding to it according to the parameters in the specific Yaml.
3. After the Operator finishes scheduling, Kubernetes creates a big data framework job cluster in container form and runs the job; after the job finishes running, the containers are destroyed.
Preferably, the embodiment also deploys a high-performance network support component; in the embodiment, the work content and flow of this part are as follows:
1. Deploying and configuring high-performance network hardware and plug-ins on the container cluster:
(1) In the embodiment of the application, the nodes of the container cluster are provided with high-performance network cards; this embodiment is provided with RoCE and InfiniBand network cards.
(2) In the container cluster of the embodiment of the application, a self-developed Kubernetes Device Plugin is deployed, used for communicating with the high-performance network card on each node.
2. When a user submits a job, the high-performance hardware to be used is declared in the uniform abstract format. Specifically, the Yaml definition by which a big data job declares high-performance network resources in the embodiment of the present application is as follows:
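The declaration itself is not reproduced above. As a hypothetical sketch of such a unified template, the high-performance network requirement might be declared like this; all field names are illustrative, not the embodiment's actual schema:

```yaml
# Hypothetical unified job template fragment declaring the need
# for a high-performance (RDMA) network; field names illustrative.
jobName: example-batch-job
mode: batch
framework: spark
network:
  highPerformance: true   # request RDMA-capable networking
  type: RoCE              # e.g. RoCE or InfiniBand
```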
3. When the resource management middleware performs resource template rendering, it converts the high-performance network resources declared in the unified abstraction into a specific container cluster resource configuration; specifically, the configuration in the embodiment of the present application is as follows:
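The rendered configuration is likewise not reproduced above. In Kubernetes, device-plugin-managed resources are generally requested through a container's resources.limits, so a hypothetical rendering could look like the following; the extended resource name rdma/hca is illustrative and depends on the deployed device plugin:

```yaml
# Hypothetical pod-spec fragment after template rendering; the
# extended resource name "rdma/hca" depends on the device plugin.
spec:
  containers:
    - name: spark-executor
      image: example.registry/spark:3.1.1   # hypothetical image
      resources:
        limits:
          rdma/hca: 1   # one RDMA device per container
```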
the beneficial effect of this embodiment lies in:
Unified management of multi-mode big data job clusters is realized through containerized cluster orchestration and Operator management, effectively reducing operation and maintenance costs.
Consistency of multi-mode big data job scheduling management is realized through a unified abstraction mechanism, which shields users from the differences in the life cycles of multi-mode big data jobs; end users can submit jobs with a unified UI or API.
Based on the unified abstraction and Operator-driven implementation, it is convenient to extend support for big data framework jobs of other modes in the future: only the corresponding Operator needs to be implemented and deployed.
By using container cluster orchestration, the cluster environment for running a big data job does not need to be deployed in advance; instead, it is bound to the life cycle of the big data job and deleted when the job completes, greatly improving resource utilization.
The system provides high-performance network support for big data jobs in a manner transparent to users: users only need to declare that a high-performance network is required when submitting a job, and the system automatically performs driver-layer configuration and allocates the related resources, relieving the communication bottleneck in large-scale distributed job scenarios.
It is to be noted that the foregoing description is only exemplary of the invention and that the principles of the technology may be employed. Those skilled in the art will appreciate that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions will now be apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.