CN120780447A - Task resource allocation method and related device - Google Patents

Task resource allocation method and related device

Info

Publication number
CN120780447A
Authority
CN
China
Prior art keywords
task
data
target
amount
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410405255.XA
Other languages
Chinese (zh)
Inventor
罗达明
周正中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN120780447A
Legal status: Pending


Abstract

A task resource allocation method improves the rationality of task resource allocation and thereby the efficiency of a computing cluster when it executes tasks. In the method, the target data volume involved in executing a task is predicted from the data screening conditions indicated in the task and the total data accessible to the computing cluster, and the target resource amount required to execute the task is then predicted from that target data volume. When the task is delivered to the computing cluster, the target resource amount can therefore be specified, ensuring that the cluster allocates an appropriate amount of resources, that each task executes as efficiently as possible, and that unreasonable resource allocation does not disrupt normal execution, which effectively improves the task execution efficiency of the computing cluster.

Description

Task resource allocation method and related device
Technical Field
The present application relates to the field of cluster computing technologies, and in particular, to a task resource allocation method and a related device.
Background
With the development of technology, every field of society generates large amounts of data at every moment: transaction data in e-commerce, multimedia data in streaming media, doctor-patient data in medicine, and so on. In most of these fields, users want to analyze the large volumes of data maintained in databases. Computing clusters were developed to make such large-scale data analysis possible. Spark, a typical computing-cluster framework, is a powerful distributed computing engine capable of processing large-scale data sets to complete a wide variety of tasks.
In the related art, when a task is executed on a Spark computing cluster, a dynamic resource allocation scheme is generally used to allocate resources to the task automatically. To finish the tasks in a queue as quickly as possible, the scheduling mechanism allocates as many resources as possible to them, so resources are easily wasted.
In general, existing dynamic resource allocation schemes in the industry allocate resources unreasonably: tasks wait a long time, cluster computing resources run short, and the task failure rate ends up high. A better resource allocation scheme is therefore needed to solve the problem of unreasonable allocation of computing resources in computing clusters.
Disclosure of Invention
The application provides a resource allocation method for tasks, which can improve the rationality of task resource allocation, thereby improving the efficiency of a computing cluster when executing tasks.
A first aspect of the present application provides a task resource allocation method for allocating resources to a to-be-executed task in a computing cluster. The method first acquires a first task, which instructs the computing cluster to process the data that satisfies a data screening condition, so that the cluster knows which part of the data in the data storage system to process when executing the first task.
Then, the target data volume to be processed when the computing cluster executes the first task is predicted based on the data screening condition and the total data accessible to the computing cluster. That is, the target data volume indicates the total amount of data that needs to be processed when the first task is executed.
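As a minimal sketch of this prediction step (all names and numbers below are illustrative assumptions, not the application's actual implementation), the target data volume can be estimated by applying the task's data screening condition — here a date-range filter — to partition statistics the scheduler already holds for the cluster's tables, without scanning any data:

```python
# Hypothetical sketch: estimate the target data volume for a task by
# applying its data screening condition to table statistics.

def estimate_target_volume(partition_sizes, filter_start, filter_end):
    """Sum the sizes of every date partition that falls inside the
    task's filtering period (one simple kind of screening condition)."""
    return sum(size for date, size in partition_sizes.items()
               if filter_start <= date <= filter_end)

partitions = {  # date partition -> size in GB (illustrative numbers)
    "2024-01-01": 120, "2024-01-02": 95, "2024-01-03": 110,
}
volume = estimate_target_volume(partitions, "2024-01-02", "2024-01-03")
```

In practice the screening condition may combine custom selection criteria, statistics fields, and dimensions, but the principle is the same: the prediction needs only metadata about the total data accessible to the cluster.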
Next, based on the target data volume, the target resource amount required to execute the first task, i.e., the processing resources that the computing cluster needs to allocate to it, is predicted. The target resource amount indicates the amount of processor resources and/or the amount of memory, that is, at least one of the two. Moreover, the target resource amount predicted from the target data volume is an amount that makes the execution efficiency of the first task as high as possible: when processing resources are allocated to the first task according to the target resource amount, the task executes as efficiently as possible.
Finally, a task processing request is sent to the computing cluster. The request asks the cluster to process the first task and specifies that the amount of resources the cluster allocates to it is the target resource amount, so that the cluster allocates processing resources accordingly and does not over-allocate resources to the first task.
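As an illustrative sketch, the "specified resource amount" carried by such a request could be expressed with standard Spark configuration properties; disabling dynamic allocation and pinning executor cores and memory is one plausible way to keep the cluster from over-allocating (the helper function itself is hypothetical):

```python
def build_submit_conf(cores, memory_gb):
    """Pin the resource amount in the task processing request so the
    cluster's dynamic allocation cannot over-allocate resources.
    Property names are standard Spark configuration keys."""
    return {
        "spark.dynamicAllocation.enabled": "false",
        "spark.executor.cores": str(cores),
        "spark.executor.memory": f"{memory_gb}g",
    }

conf = build_submit_conf(8, 16)  # predicted target resource amount
```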
In this scheme, the target data volume involved in executing a task is predicted from the data screening conditions indicated in the task and the total data accessible to the computing cluster, and the target resource amount required to execute the task is then predicted from that target data volume. When the task is delivered to the computing cluster, the target resource amount can therefore be specified, ensuring that the cluster allocates an appropriate amount of resources, that each task executes as efficiently as possible, and that unreasonable resource allocation does not disrupt normal execution, which effectively improves the task execution efficiency of the computing cluster.
In one possible implementation, the method further includes determining, based on the data screening condition, a data redistribution operation involved in executing the first task; the data redistribution operation redistributes data across the computing nodes of the cluster. Since the first task is in fact processing the data that satisfies the data screening condition, whether a data redistribution operation is needed can be determined once the data that actually needs to be processed has been identified from that condition.
When predicting the target resource amount required to execute the first task, the prediction may be made specifically from the target data volume together with the data redistribution operation. In general, a task whose processing involves a data redistribution operation tends to require more resources than a task with the same data volume that involves no redistribution. That is, when a task involves a data redistribution operation, completing it tends to require more resources.
In the scheme, whether the task execution process involves data redistribution operation is determined based on the data screening condition indicated by the task, and the resource amount required by the task execution process is predicted further based on the data amount involved by the task and the data redistribution operation, so that the accuracy of the predicted resource amount can be effectively improved.
In one possible implementation, the target resource amount required to execute the first task may be predicted by a regression model based on the target data volume and the data redistribution operation. The regression model is obtained by fitting the computing cluster's historical task processing data. Substituting the target data volume and the data redistribution operation into the fitted regression model yields the target resource amount that the model predicts the first task needs.
Specifically, the historical task processing data of the computing cluster may include the amount of data actually involved in tasks executed over a historical period, the data redistribution operations those tasks involved, and the amounts of resources allocated when they executed. From this data, the relationship among a task's data volume, its data redistribution operations, and the resources allocated during execution can be obtained, and a corresponding regression model fitted.
In this scheme, fitting the regression model on the computing cluster's historical task processing data lets the model effectively represent the relationship between a task's data volume and redistribution operations on one hand and the resources the task requires on the other, so the resources required for task execution can be predicted accurately and effectively.
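A minimal sketch of such a fitted model follows, using ordinary least squares on made-up historical samples and treating the data redistribution (shuffle) flag as a selector between two fitted lines. The functional form and all numbers are illustrative assumptions, not the patent's concrete model:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Historical samples: (data volume in GB, allocated CPU cores),
# split by whether the task involved a data redistribution (shuffle).
history = {
    True:  ([10, 20, 40], [8, 14, 26]),   # with shuffle
    False: ([10, 20, 40], [4, 7, 13]),    # without shuffle
}
models = {flag: fit_line(xs, ys) for flag, (xs, ys) in history.items()}

def predict_resources(volume_gb, has_shuffle):
    """Predict the target resource amount (cores) for a new task."""
    slope, intercept = models[has_shuffle]
    return slope * volume_gb + intercept
```

With these toy samples, a 30 GB task is predicted to need roughly twice the cores when a shuffle is involved, matching the observation that redistribution increases resource demand.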
In one possible implementation, the task processing efficiency corresponding to the historical task processing data satisfies a preset condition; for example, the efficiency is greater than or equal to a preset threshold. Task processing efficiency is related to the task's data volume, its allocated resource amount, and its execution time.
In this scheme, the regression model is fitted only on task processing data whose execution efficiency was relatively high, so that when resources are allocated to the first task based on the model's predicted resource amount, the first task also executes with high efficiency, further ensuring that the resource allocation is reasonable.
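This screening step can be sketched as a simple filter. The efficiency metric below — data processed per core-second — and the threshold are assumptions chosen for illustration; the patent only requires that efficiency relate to data volume, allocated resources, and execution time:

```python
def efficiency(sample):
    """Illustrative metric: GB processed per core-second."""
    return sample["data_gb"] / (sample["cores"] * sample["seconds"])

samples = [
    {"data_gb": 100, "cores": 8,  "seconds": 50},  # efficient run
    {"data_gb": 100, "cores": 64, "seconds": 40},  # over-allocated run
]
THRESHOLD = 0.1  # assumed preset condition
training_set = [s for s in samples if efficiency(s) >= THRESHOLD]
```

Only the efficient run survives the filter, so the fitted model learns from allocations that actually paid off.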
In one possible implementation, the computing cluster includes a plurality of task queues for processing tasks, and the task processing request further specifies a target queue for processing the first task, the target queue being one of the plurality of task queues. That is, the task processing request simultaneously specifies the target resource amount the first task needs and the target queue that processes it. When the computing cluster processes the first task, it places the task in the target queue and allocates the target amount of resources to execute it.
In this scheme, when the computing cluster includes multiple task queues, specifying the target queue for the first task keeps the loads of the queues as balanced as possible and avoids the situation where some queues are overloaded while others sit underloaded, leaving cluster resources poorly utilized.
In one possible implementation, to determine the target queue for the first task, the method further includes the following steps. First, based on the resource occupancy of each of the task queues at the current time, retrieve for each queue, among that queue's historical samples, the several target samples whose resource occupancy is closest to the current one; a historical sample indicates a queue's resource occupancy during a historical period. The target samples for each task queue are drawn from that queue's own historical samples.
Then, based on the target samples for each task queue, predict the queue's expected resource occupancy at a future time. The resource occupancy observed a certain period after each target sample can be averaged, and the average taken as the queue's expected resource occupancy at the future time.
Finally, based on the expected resource occupancies and the target data volume, determine among the task queues the target queue for processing the first task. Because other tasks may be delivered at the same time as the first task, the queue for each task can be determined from the amount of resources each task needs and the expected resource occupancy of each queue.
In this scheme, the task queue assigned to the first task is determined by predicting the future resource occupancy of the computing cluster's task queues, which effectively keeps the load across queues balanced after assignment and makes effective use of the cluster's resources.
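The retrieve-then-average steps above amount to a nearest-neighbor prediction, which can be sketched as follows; the sample structure and values are illustrative assumptions:

```python
def predict_occupancy(current, history, k=3):
    """Find the k historical samples whose occupancy is closest to the
    queue's current occupancy, then average the occupancy each sample
    observed one period later -- the expected future occupancy."""
    nearest = sorted(history, key=lambda s: abs(s["now"] - current))[:k]
    return sum(s["later"] for s in nearest) / len(nearest)

queue_history = [  # one queue's historical samples (fractions of capacity)
    {"now": 0.30, "later": 0.40},
    {"now": 0.35, "later": 0.50},
    {"now": 0.90, "later": 0.95},
    {"now": 0.32, "later": 0.45},
]
expected = predict_occupancy(0.33, queue_history, k=3)
```

Running this per queue, the scheduler can then place each pending task on the queue whose expected occupancy leaves the most headroom for the task's predicted resource amount.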
In one possible implementation, the task processing request is further used to indicate a processing priority of the first task, the processing priority being determined based on the target amount of resources.
Specifically, the fewer resources a task requires, the less it consumes while running and the faster it finishes. The priority of tasks requiring fewer resources can therefore be raised so that they finish as soon as possible and free up resources for other tasks.
In this scheme, determining a task's priority from the amount of resources it requires makes the execution priority of each task in the queue explicit, ensures the queue processes tasks in priority order, and meets users' diverse demands as far as possible.
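This smallest-resource-first policy can be sketched with a priority heap (a shortest-job-first style ordering; task names and core counts are made up):

```python
import heapq

def run_order(tasks):
    """Pop tasks smallest-resource-amount-first; ties are broken by
    task name so the ordering is deterministic."""
    heap = [(cores, name) for name, cores in tasks.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _cores, name = heapq.heappop(heap)
        order.append(name)
    return order

order = run_order({"report": 32, "probe": 4, "etl": 16})
```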
In one possible implementation, the processing priority is also associated with an expected run-time of the first task, the expected run-time being determined based on the target amount of data and the target amount of resources. That is, the processing priority is related to both the target amount of resources of the first task and the expected runtime.
Specifically, in the case where the target data amount and the target resource amount corresponding to the first task have been determined, the expected running time of the first task may be predicted based on a regression model obtained by fitting in advance to determine the processing priority of the first task.
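The expected-runtime regression could take, for instance, the following assumed form — time growing with data volume and shrinking with allocated cores. The functional form and coefficients are purely illustrative; in the scheme they would come from the model fitted on historical data:

```python
def expected_runtime(data_gb, cores, a=2.0, b=5.0):
    """Illustrative fitted form: runtime rises with data volume and
    falls with allocated cores; a and b are assumed coefficients."""
    return a * data_gb / cores + b

t = expected_runtime(100, 8)  # seconds, under the assumed coefficients
```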
In one possible implementation, the first task is a task that is periodically performed in the computing cluster. For example, in the financial domain, the first task may be a sales data analysis task, where the computing cluster is required to periodically extract and analyze sales data from the past daily, weekly, or monthly.
In one possible implementation, the first task is a structured query language (Structured Query Language, SQL) based data query task, and the data filtering criteria include one or more of custom data selection criteria, data statistics fields, data statistics dimensions, and data filtering periods.
A second aspect of the application provides a task resource allocation apparatus comprising an acquisition module and a processing module. The acquisition module is configured to acquire a first task, which instructs a computing cluster to process data satisfying a data screening condition. The processing module is configured to predict, based on the data screening condition and the total data accessible to the computing cluster, the target data volume to be processed when the cluster executes the first task; to predict, based on the target data volume, the target resource amount required to execute the first task, the target resource amount indicating the amount of processor resources and/or memory; and to send a task processing request to the computing cluster, the request asking the cluster to process the first task and specifying that the amount of resources the cluster allocates to it is the target resource amount.
In one possible implementation, the processing module is further configured to determine, based on the data filtering condition, a data redistribution operation involved in executing the first task, the data redistribution operation configured to redistribute data processed by the plurality of computing nodes on the computing cluster, and predict, based on the target data amount and the data redistribution operation, a target amount of resources required for executing the first task.
In one possible implementation, the processing module is further configured to predict, based on the target data amount and the data redistribution operation, the target resource amount required for the execution of the first task through a regression model, where the regression model is obtained by fitting data based on historical task processing data of the computing cluster.
In one possible implementation manner, task processing efficiency corresponding to historical task processing data meets a preset condition, and the task processing efficiency is related to the data amount of a task, the amount of resources allocated to the task and the execution time of the task.
In one possible implementation, the computing cluster includes a plurality of task queues for processing tasks, and the task processing request is further for specifying a target queue for processing the first task, the target queue being one of the plurality of task queues.
In one possible implementation, the processing module is further configured to, based on a resource occupancy of each task queue of the plurality of task queues at the current time, retrieve, for each task queue, a plurality of target samples closest to the resource occupancy among a plurality of historical samples of the task queues, the historical samples being used to indicate the resource occupancy of the task queue at a historical time period, predict, based on the plurality of target samples corresponding to each task queue, an expected resource occupancy of each task queue at a future time, and determine, based on the expected resource occupancy and a target data amount, the target queue among the plurality of task queues for processing the first task.
In one possible implementation, the task processing request is further used to indicate a processing priority of the first task, the processing priority being determined based on the target amount of resources.
In one possible implementation, the processing priority is also associated with an expected run-time of the first task, the expected run-time being determined based on the target amount of data and the target amount of resources.
In one possible implementation, the first task is a task that is periodically performed in the computing cluster.
In one possible implementation, the first task is an SQL-based data query task, and the data filtering criteria includes one or more of a custom data selection criteria, a data statistics field, a data statistics dimension, and a data filtering period.
A third aspect of the application provides a computing device comprising a processor and a memory. The processor of the computing device is configured to execute instructions stored in the memory so that the computing device performs the method of the first aspect or any implementation thereof. For the steps of each possible implementation of the first aspect performed by the computing device, refer to the first aspect; they are not repeated here.
A fourth aspect of the application provides a computing device cluster comprising at least one computing device, each comprising a processor and a memory. The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device so that the computing device cluster performs the method of the first aspect or any implementation thereof. For the steps of each possible implementation of the first aspect performed by the computing device cluster, refer to the first aspect; they are not repeated here.
A fifth aspect of the application provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of any of the above aspects.
A sixth aspect of the application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any of the above aspects.
A seventh aspect of the application provides a chip system comprising a processor and a communication interface for communicating with a module external to the chip system, the processor being operable to execute a computer program or instructions such that a device in which the chip system is installed can perform the method of any of the above aspects.
The technical effects of any one of the design manners of the second aspect to the seventh aspect may be referred to the technical effects of the different implementation manners of the first aspect, which are not described herein.
Drawings
Fig. 1 is a schematic diagram of a system architecture 100 according to an embodiment of the present application;
Fig. 2 is a flow chart of a resource allocation method of a task according to an embodiment of the present application;
FIG. 3 is a schematic diagram of determining a target amount of resources required for executing a first task according to an embodiment of the present application;
FIG. 4 is a schematic diagram of determining an expected resource occupancy of a task queue according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an operation architecture of a task resource allocation method according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of generating a task processing request based on a timed task according to an embodiment of the present application;
FIG. 7 is a schematic diagram illustrating selection of a task queue according to an embodiment of the present application;
FIG. 8 is a schematic diagram of task priority determination according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a resource allocation device for tasks according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a computing device 1000 according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a computing device cluster according to an embodiment of the present application;
FIG. 12 is a schematic diagram of another computing device cluster according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of a computer readable storage medium according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application clearer, embodiments of the application are described below with reference to the accompanying drawings. Evidently, the described embodiments are merely some, not all, of the embodiments of the application. As persons skilled in the art will appreciate, as new application scenarios emerge, the technical solutions provided in these embodiments remain applicable to similar technical problems.
The terms "first", "second", and the like in the description, claims, and drawings are used to distinguish between similar elements and do not necessarily describe a particular sequence or chronological order. It is to be understood that terms so used are interchangeable under appropriate circumstances, so that the embodiments can operate in sequences other than those illustrated or described herein. Furthermore, the terms "comprises", "comprising", and "having", and any variations thereof, are intended to cover a non-exclusive inclusion, so that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to the steps or modules expressly listed, and may include steps or modules inherent to it.
The naming or numbering of steps in the application does not mean that the steps must be executed in the temporal or logical order indicated; the execution order of named or numbered steps may be changed according to the technical purpose to be achieved, as long as the same or similar technical effects are achieved.
The division into units in the application is a logical division and may be implemented differently in practice: multiple units may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the coupling, direct coupling, or communication connections shown or discussed between units may be through interfaces, and indirect couplings or communication connections between units may be electrical or take other similar forms, none of which limits the application.
The units or sub-units described as separate components may be physically separated or not, may be physical units or not, or may be distributed in a plurality of circuit units, and some or all of the units may be selected according to actual needs to achieve the purpose of the present application.
In order to facilitate understanding, some technical terms related to the embodiments of the present application are described below.
(1) Computing clusters
A computing cluster is a computer system formed by a group of loosely coupled computers, connected by software and/or hardware, that cooperate closely to perform computing work. In a sense, a computing cluster can be regarded as a single computer. The individual computers in a computing cluster are often called nodes, and the nodes are typically connected by a network.
(2)Spark
Spark is a computational engine designed for large-scale data processing, essentially a computational cluster. In general, spark can be applied to various data processing scenarios to implement the processing of tasks. Some scenarios in which Spark can be applied will be briefly described below.
1. Big data processing and analysis scenario: Spark is a powerful distributed computing framework capable of handling large-scale data sets and supporting various data sources and formats, such as text, images, audio, and video.
2. Machine learning scenario: Spark supports various machine learning algorithms and tools, such as classification, regression, clustering, and recommendation systems, and can be used in application fields such as finance, medical care, and retail.
3. Data mining and visualization scenario: Spark provides various data mining and visualization tools that can be used to explore data, discover data patterns, and visualize results.
4. Stream processing and real-time analysis scenario: Spark supports stream processing and real-time analysis and can be used to process real-time data streams and build real-time analysis applications, such as streaming media processing and Internet-of-Things applications.
5. Data warehouse and data lake scenario: Spark can be used to build big data warehouses and data lakes, integrating data from multiple sources and formats into a unified data set to facilitate data sharing and analysis.
(3) Structured query language (Structured Query Language, SQL)
SQL is a database language with multiple functions such as data manipulation and data definition. Databases are generally queried using SQL syntax, so most data retrieval tasks involve SQL. SQL can be used on its own at a terminal, or embedded as a sub-language in other programming languages to complement their functionality and provide users with more comprehensive information.
(4) Shuffle operator
The Shuffle operator is a data partitioning operator in Spark whose function is to shuffle and repartition data. In Spark, data processing is based on the resilient distributed dataset (Resilient Distributed Dataset, RDD). An RDD is a distributed, immutable data set divided into multiple partitions, each assigned to a different node for processing. The Shuffle operator repartitions the data so that records with the same key are distributed to the same node for processing, which improves data processing efficiency.
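The key-to-partition mapping that a shuffle performs can be sketched as hash partitioning. This is a simplified illustration (using CRC32 for a hash that is stable across runs); Spark's actual HashPartitioner differs in its details:

```python
import zlib

def shuffle_partition(records, num_partitions):
    """Assign each (key, value) record to a partition by key hash, so
    that records sharing a key land on the same partition/node."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in records:
        idx = zlib.crc32(key.encode()) % num_partitions
        parts[idx].append((key, value))
    return parts

parts = shuffle_partition([("a", 1), ("b", 2), ("a", 3)], 2)
```

Whatever the hash values turn out to be, both records with key "a" are guaranteed to end up in the same partition, which is the property downstream per-key processing relies on.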
(5) Regression model
A regression model is a mathematical model that quantitatively describes statistical relationships. Specifically, regression is a predictive modeling technique that studies the relationship between a dependent variable and one or more independent variables.
For example, a multiple linear regression model may be expressed as y = β0 + β1x1 + β2x2 + … + βpxp, where β0, β1, …, βp are the p+1 parameters to be estimated, y is the dependent variable, x1 through xp are the independent variables, and each regression coefficient βi characterizes the extent to which its independent variable affects the dependent variable.
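Evaluated numerically, such a model is just the intercept plus a weighted sum of the independent variables (the coefficient values below are assumed for illustration):

```python
def predict(beta, xs):
    """Evaluate y = beta0 + beta1*x1 + ... + betap*xp."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], xs))

# p = 2, with assumed coefficients beta0=2.0, beta1=0.5, beta2=-1.0
y = predict([2.0, 0.5, -1.0], [4.0, 3.0])  # 2 + 0.5*4 - 1.0*3 = 1.0
```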
When tasks are executed on a Spark computing cluster, a dynamic resource allocation scheme is generally adopted to allocate resources to tasks automatically. To finish the tasks in a queue as quickly as possible, the scheduling mechanism allocates as many resources as possible to them, so resources are easily wasted. For example, suppose a task can actually be completed efficiently with only 8 CPU (Central Processing Unit) cores and 16 gigabytes (GB) of memory; the current dynamic allocation scheme may allocate more than 100 CPU cores and 100 GB of memory to it, so that the task occupies a very large amount of resources. Moreover, the extra resources do not meaningfully increase the task's execution speed, improving it at most by a small margin, which ultimately makes the task's actual execution efficiency extremely low.
In general, the existing dynamic resource allocation scheme in the industry has the phenomenon of unreasonable resource allocation, is easy to have the conditions of long task waiting time and shortage of cluster computing resources, and finally leads to high task operation failure rate. Therefore, a better resource allocation scheme is needed to solve the problem of unreasonable allocation of computing resources in the computing clusters.
Based on the above, the embodiment of the application provides a resource allocation method for a task, which predicts a target data volume involved in task execution based on data screening conditions indicated in the task and the total data accessed by a computing cluster, and further predicts a target resource volume required by task execution based on the target data volume. Therefore, when the tasks are delivered to the computing cluster, the target resource amount required by the tasks can be designated, so that the computing cluster is ensured to allocate proper resource amount for the tasks, each task can be ensured to have the highest execution efficiency when being executed, the normal execution of the tasks is prevented from being influenced due to unreasonable allocation of the task resources, and the task execution efficiency of the computing cluster is effectively improved.
Referring to fig. 1, fig. 1 is a schematic diagram of a system architecture 100 according to an embodiment of the application. As shown in fig. 1, the system architecture 100 includes an execution device 110, a data storage system 120, and a computing cluster 130.
The execution device 110 may be implemented by at least one computing instance, such as a physical host (computing device), a virtual machine, or a container. When the execution device 110 is implemented by a virtual machine or a container, the execution device 110 actually takes the form of a cloud computing product capable of providing cloud services. Alternatively, the execution device 110 may be coupled to other computing devices, such as data storage devices, load balancers, etc., and the execution device 110 may be disposed on one physical site or distributed across multiple physical sites.
The computing cluster 130 includes a plurality of computing devices capable of performing analysis processing on data stored in the data storage system 120 to complete tasks posted by the executing devices 110. Alternatively, the execution device 110 may be a device independent of the computing cluster 130 (e.g., independent of a server outside of the computing cluster 130), or the execution device 110 may be a device within the computing cluster 130, i.e., the execution device 110 itself may be used to perform tasks.
Alternatively, the data storage system 120 may be located external to the execution device 110, exchanging data with the execution device 110 over a network. Alternatively, where the execution device 110 is a physical host, the data storage system 120 may also be located internal to the execution device 110, with the data storage system 120 exchanging data with the processor over a bus. At this time, the data storage system 120 appears as a hard disk. With the data storage system 120, the execution device 110 may use data in the data storage system 120 or invoke program code in the data storage system 120 to implement the resource allocation methods of tasks provided by embodiments of the present application.
In a specific implementation, the executing device 110 is configured to implement the resource allocation method for tasks provided by the embodiment of the present application, so as to determine an amount of resources required for executing the tasks, and further specify an amount of resources that needs to be allocated to the tasks when delivering the tasks to the computing cluster 130.
Referring to fig. 2, fig. 2 is a flow chart of a method for allocating resources of a task according to an embodiment of the present application. As shown in fig. 2, the method for allocating resources for a task according to the embodiment of the present application includes the following steps 201 to 204.
In step 201, the executing device acquires a first task, where the first task is used to instruct the computing cluster to process data that satisfies the data filtering condition.
In this embodiment, the first task is a task that needs to be processed by the computing cluster, and the first task includes a data filtering condition to indicate that the computing cluster should process data satisfying that specific condition. Briefly, since a computing cluster can be used to process a large amount of data stored in a data storage system, a first task often instructs the computing cluster to process only a particular portion of the data stored in the data storage system; because the data filtering condition is indicated in the first task, the computing cluster can determine explicitly, when executing the first task, which portion of the data in the data storage system to process.
Illustratively, the first task may be, for example, an SQL-based data query task, with the data screening conditions including one or more of a custom data selection condition (such as a WHERE condition in SQL), a data statistics field (such as a SELECT field in SQL), a data statistics dimension (such as a GROUP BY field in SQL), and a data screening time period (such as an event window specified in SQL). In addition, the first task may also be another task supported by the computing cluster, for example, a big data analysis task, a machine learning task, a data flow analysis task, etc., which is not limited by the specific type of the first task in this embodiment.
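As a hedged illustration of how such screening conditions might look in practice, the sketch below builds a hypothetical SQL first task (table and field names are invented) and pulls out its SELECT fields, WHERE condition, and GROUP BY dimension with naive regular expressions; a real system would use a proper SQL parser.

```python
import re

# Hypothetical first task: an SQL query whose clauses carry the
# data screening conditions described above (all names invented).
sql = (
    "SELECT region, SUM(amount) FROM sales "
    "WHERE dt BETWEEN '2024-01-01' AND '2024-01-31' "
    "GROUP BY region"
)

# Naive extraction of the screening conditions from the statement.
conditions = {
    "select_fields": re.search(r"SELECT (.+?) FROM", sql).group(1),
    "where": re.search(r"WHERE (.+?) GROUP BY", sql).group(1).strip(),
    "group_by": re.search(r"GROUP BY (.+)$", sql).group(1),
}
```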
The manner in which the execution device obtains the first task may be multiple.
In one possible implementation, the first task is a task stored in advance on the execution device, and the execution device can actively acquire the first task.
For example, the first task may be a task that needs to be periodically executed in the computing cluster, and then the executing device needs to periodically acquire the first task once and deliver the first task to the computing cluster to enable the first task to be periodically executed in the computing cluster. For example, in the financial domain, the first task may be a sales data analysis task, where the computing cluster is required to periodically extract and analyze sales data from the past daily, weekly, or monthly.
In another possible implementation, the first task is sent by the user to the execution device through the client. For example, when a user temporarily creates a first task that needs to be executed by the computing cluster, the user may send the first task to the execution device through the client, and then the execution device delivers the first task to the computing cluster.
Step 202, based on the data filtering condition and the total amount of data accessed by the computing cluster, the executing device predicts the target data amount to be processed when the computing cluster executes the first task.
Since the total amount of data accessed by the computing cluster is fixed at a time point, when all data accessed by the computing cluster is screened based on the data screening condition, the data which can be screened is always fixed. Therefore, based on the total amount of data accessed by the computing cluster and the data screening conditions indicated in the first task, the executing device can predict the target data amount corresponding to the data to be processed when the computing cluster executes the first task. That is, the target data amount is an indication of what total amount of data needs to be processed when the computing cluster performs the first task.
For example, assuming that the data stored in the data storage system to which the computing cluster is connected is sales detail data of each day in the last ten years of the company, and the total amount of data stored in the data storage system (i.e., the total amount of data to which the computing cluster is connected) is 100G, when the data filtering condition is to filter profit fields and sales amount fields of a month of a certain year among all sales detail data, it is possible to effectively predict what the data amount of data satisfying the data filtering condition is based on the total amount of data of the sales detail data.
Alternatively, in order to facilitate the execution device to accurately predict the target data amount to be processed when the computing cluster executes the first task, the execution device may predict the target data amount based on a regression model (such as a linear regression model) that is constructed in advance. Under the condition of having a pre-constructed regression model, the execution device can substitute the data screening condition and the total data accessed by the computing cluster into the regression model, so as to obtain the target data volume output by the regression model.
When constructing a regression model for predicting the data volume required by task execution, various forms of data screening conditions can be pre-constructed, and then data searching is performed in a data storage system accessed by a computing cluster based on the pre-constructed various forms of data screening conditions, so that the data volume under the various forms of data screening conditions is determined. Thus, based on the relation between the pre-constructed data screening condition and the corresponding data amount, a regression model is constructed, and the regression model is used for indicating and calculating the relation between the total amount of data accessed by the cluster and the data amount required by the data screening condition and the task execution.
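A minimal sketch of the fitting step described above, under the simplifying assumption that a single feature of the screening condition (here, the length of the screening time window) drives the matched data volume; the sample points are invented and happen to lie on a line.

```python
# Fit y = a + b*x by ordinary least squares, where x is the length of
# the screening time window (days) and y is the data volume (GB)
# actually matched in the data storage system. Sample points invented.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

days = [1, 7, 30, 90]
gigabytes = [0.3, 2.1, 9.0, 27.0]
a, b = fit_line(days, gigabytes)
predicted = a + b * 60  # predicted data volume for a 60-day window
```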
In step 203, based on the target data amount, the executing device predicts a target resource amount required for the execution of the first task, where the target resource amount is used to indicate the processor resource amount and/or the memory amount.
In the case where the amount of data involved in executing the first task is determined, the amount of resources required to execute the first task can often also be determined. Thus, based on the target amount of data required when executing the first task, the executing device may predict the target amount of resources required when executing the first task, i.e. the processing resources that the computing cluster needs to allocate to the first task. Specifically, the target amount of resources may be an amount of processor resources and/or an amount of memory, i.e. at least one of the amount of processor resources and the amount of memory.
For example, in the case where the computing cluster is a Spark computing cluster, the target resource amount may specifically indicate the number of Driver CPU cores, the Driver memory size, the number of Executors, the number of Executor CPU cores, and the Executor memory size. Driver and Executor are two different roles. The Driver is the master node of a Spark application (i.e., an application running on a Spark computing cluster) and is responsible for the control and coordination of the entire Spark application. The Driver node breaks the Spark application into tasks and distributes them to the Executor nodes for execution. The Driver node is also responsible for maintaining state information for the application and handling requests from users. An Executor is a working node of the Spark application, responsible for executing the tasks assigned by the Driver node. Each Executor node has its own virtual machine process and can run on a different physical machine. The Executor node receives tasks through communication with the Driver node and returns the results of the tasks to the Driver node. In general, the Driver node and the Executor nodes play different roles in a Spark application: the Driver node is the control center of the application, and the Executor nodes are the working nodes of the application.
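The five quantities named above correspond to standard spark-submit options; the sketch below assembles such a command line with illustrative values (the script name job.py is a placeholder).

```python
# Map the five Spark resource quantities onto standard spark-submit
# flags. The numeric values and the job.py script name are illustrative.
resources = {
    "driver_cores": 1,
    "driver_memory": "1g",
    "num_executors": 2,
    "executor_cores": 4,
    "executor_memory": "5g",
}

cmd = (
    "spark-submit"
    f" --driver-cores {resources['driver_cores']}"
    f" --driver-memory {resources['driver_memory']}"
    f" --num-executors {resources['num_executors']}"
    f" --executor-cores {resources['executor_cores']}"
    f" --executor-memory {resources['executor_memory']}"
    " job.py"
)
```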
In this embodiment, the target resource amount predicted by the execution device based on the target data amount is a resource amount that can make the execution efficiency of the first task as high as possible. That is, when the processing resources are allocated to the first task in accordance with the target resource amount, the execution efficiency of the first task can be made as high as possible.
Specifically, for any task, if the processing resources allocated to the task are too small, the execution time of the task is too long, resulting in lower execution efficiency of the task, and if the processing resources allocated to the task are too large, most of the processing resources allocated to the task are not effectively utilized, and the execution time of the task is not significantly shortened due to the fact that the processing resources are allocated more, resulting in lower execution efficiency of the task. Generally, when the amount of resources allocated to a task is within a certain range, the execution efficiency of the task tends to reach a relatively high value, and the range corresponding to the relatively high execution efficiency is often related to the data amount of the task itself. Therefore, in this embodiment, the target amount of resources that needs to be allocated to the first task is predicted based on the target amount of data of the first task, and the amount of resources that can make the execution efficiency of the first task as high as possible can be effectively determined.
In step 204, the executing device sends a task processing request to the computing cluster, where the task processing request is used to request the computing cluster to process the first task and designate an amount of resources allocated by the computing cluster for the first task as a target amount of resources.
After predicting the target amount of resources required for executing the first task, the execution device may then send a task processing request to the computing cluster to request the computing cluster to process the first task. In the task processing request, the execution device further designates the amount of resources allocated to the first task by the computing cluster when processing the first task as the target amount of resources, so that the computing cluster can allocate processing resources to the first task according to the designated amount of resources, and excessive allocation of resources to the first task by the computing cluster is avoided.
In some embodiments, when the computing cluster performs tasks, the computing cluster may further involve a data redistribution operation, i.e. re-partitioning the data, so as to adjust the data processed by the computing nodes on the computing cluster, so as to improve the processing efficiency of the data as much as possible. In this case, the amount of resources required for task execution may also vary somewhat. That is, for two tasks having the same amount of data, the amount of resources required for a task that involves a data redistribution operation is typically different from a task that does not involve a data redistribution operation, i.e., the data redistribution operation affects the amount of resources required for execution of the task. Based on this, it is proposed in the present embodiment that, when predicting the amount of resources required for the task to execute, whether the task involves a data redistribution operation or not is considered at the same time.
Optionally, in the foregoing embodiment, after the first task is acquired, a data redistribution operation involved in executing the first task may be further determined based on a data filtering condition, where the data redistribution operation is used to redistribute data processed by a plurality of computing nodes on the computing cluster. Since the first task is actually performed on the data that satisfies the data filtering condition, it can be determined whether the data redistribution operation can be performed after determining that the data that is actually required to be processed is based on the data filtering condition, so that the data redistribution operation involved in the execution of the first task can be determined based on the data filtering condition in the present embodiment.
Thus, in step 203 described above, the executing device may specifically predict the target amount of resources required for the execution of the first task based on the target amount of data and the data redistribution operation. Specifically, the data redistribution operation is a means in the first task processing process, and can adjust the distribution situation of the data to increase the speed of the first task processing as much as possible. Generally, if a data redistribution operation is involved in the task processing, the amount of resources required for the task tends to be increased compared to a task that has the same amount of data but does not involve the data redistribution operation. That is, where a task involves a data redistribution operation, the processing of the task tends to require more resources to complete.
In the scheme, whether the task execution process involves data redistribution operation is determined based on the data screening condition indicated by the task, and the resource amount required by the task execution process is predicted further based on the data amount involved by the task and the data redistribution operation, so that the accuracy of the predicted resource amount can be effectively improved.
Illustratively, taking the first task as an SQL-based data query task as an example, the data redistribution operation in the Spark computing cluster is performed by a shuffle operator. Generally, the cases that trigger the shuffle operator are roughly those shown below, and whether the shuffle operator is triggered can be determined by checking whether the corresponding operation statement appears in the SQL statement of the first task (i.e., the data filtering condition indicated by the first task).
1. Aggregation operations: when aggregation operations such as GROUP BY, DISTINCT, COUNT, or SUM are used, a Shuffle operator is triggered, specifically a Hash Shuffle operator. For example: SELECT department, AVG(salary) FROM employee GROUP BY department.
2. Connection operations: when connection operations such as JOIN or UNION are used, a Shuffle operator is triggered, specifically a Sort Shuffle operator. For example: SELECT * FROM employee JOIN department ON employee.dept_id = department.id.
3. Sorting operations: when sorting operations such as ORDER BY or SORT BY are used, a Shuffle operator is triggered, specifically a Sort Shuffle operator. For example: SELECT * FROM employee ORDER BY salary DESC.
4. Repartitioning operations: when a repartition operation is used, a Shuffle operator is triggered, specifically a Hash Shuffle operator. For example: SELECT /*+ REPARTITION(10) */ * FROM employee.
5. Window function operations: when a window function is used, a Shuffle operator is triggered, specifically a Sort Shuffle operator. For example: SELECT department, salary, RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank FROM employee.
In general, in the case where the first task is essentially a task represented by an SQL statement, by detecting whether a particular operation statement is included in the SQL statement, it can be determined whether the first task will involve a shuffle operator (i.e., a data redistribution operation) when executed.
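The five trigger cases above can be approximated by keyword matching on the SQL text, as in the rough sketch below; a production implementation would inspect the physical query plan rather than pattern-match the statement.

```python
import re

# Rough keyword-based check for the five shuffle triggers listed above
# (aggregation, join/union, sort, repartition, window function).
SHUFFLE_PATTERNS = [
    r"\bGROUP\s+BY\b", r"\bDISTINCT\b",   # aggregation operations
    r"\bJOIN\b", r"\bUNION\b",            # connection operations
    r"\bORDER\s+BY\b", r"\bSORT\s+BY\b",  # sorting operations
    r"\bREPARTITION\b",                   # repartition hint
    r"\bOVER\s*\(",                       # window function
]

def involves_shuffle(sql: str) -> bool:
    """True if the statement likely triggers a Shuffle operator."""
    return any(re.search(p, sql, re.IGNORECASE) for p in SHUFFLE_PATTERNS)
```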
Alternatively, when predicting the amount of resources required for the execution of the first task, the amount of resources required for the execution of the first task may be predicted by a regression model based on the target data amount and the data redistribution operation involved in the execution of the first task. The regression model is obtained by fitting data based on historical task processing of the computing clusters. And substituting the target data quantity and the data redistribution operation into the regression model obtained by fitting, so that the target resource quantity required by the execution of the first task output by the regression model can be obtained.
Specifically, since the computing clusters are continuously running and the computing clusters process a large number of similar tasks (such as data query tasks similar to SQL sentences) every day, the regression model can be effectively obtained by fitting the historical task processing data based on the computing clusters. The historical task processing data of the computing cluster may include the data volume actually related to the task executed in the historical time period, the data redistribution operation related to the task, and the allocated resource volume when the task is executed. Therefore, based on the historical task processing data of the computing cluster, the relation among the data quantity related to the task, the data redistribution operation related to the task and the resource quantity distributed during task execution can be obtained, and accordingly a corresponding regression model is obtained through fitting.
The regression model used for predicting the target resource amount required for executing the first task may be, for example, a linear regression model, an autoregressive integrated moving average (ARIMA) model, a Prophet model, or a random forest model; the specific type of regression model is not specifically limited in this embodiment.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating determination of the target resource amount required for executing the first task according to an embodiment of the present application. As shown in fig. 3, when the target data size of the first task is 1G and the Shuffle operator is triggered when the first task is executed, the target data size of 1G and the triggering of the Shuffle operator are taken as input to the regression model, so as to obtain the target resource amount output by the regression model. The target resource amount is specifically: 1 Driver CPU core, 1G of Driver memory, 2 Executors, 4 Executor CPU cores, and 5G of Executor memory.
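A sketch of what such a fitted model could look like at inference time; the coefficients below are invented solely so that the input of Fig. 3 (1G of data, Shuffle triggered) reproduces the example output, and do not come from real cluster data.

```python
import math

# Invented coefficients standing in for a fitted regression; chosen only
# so that (1 GB, shuffle=True) yields the example output shown in Fig. 3.
def predict_resources(data_gb: float, shuffle: bool) -> dict:
    executors = max(1, math.ceil(data_gb * (2 if shuffle else 1)))
    return {
        "driver_cores": 1,
        "driver_memory_gb": 1,
        "num_executors": executors,
        "executor_cores": 4,
        "executor_memory_gb": math.ceil(data_gb * 2) + (3 if shuffle else 1),
    }

r = predict_resources(1.0, shuffle=True)  # the Fig. 3 input
```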
Optionally, when the regression model is fitted, task processing efficiency corresponding to historical task processing data for fitting the regression model meets a preset condition, wherein the task processing efficiency corresponding to the historical task processing data is related to the data volume of the task, the resource volume allocated by the task and the execution time of the task. The task processing efficiency meeting the preset condition may be, for example, that the task processing efficiency is greater than or equal to a preset threshold.
Generally, for a given amount of data, the longer the execution time of a task, the lower its processing efficiency; likewise, the more resources are allocated to a task, the lower its processing efficiency, while a larger amount of data processed with the same resources in the same time corresponds to higher efficiency. Therefore, when the data amount of the task is fixed, the task execution efficiency can be improved by allocating as few resources as possible to the task and by shortening the task execution time as much as possible. However, the amount of resources allocated to the task and the execution time of the task are in conflict: the more resources are allocated to the task, the shorter its execution time tends to be, but after the allocated resources reach a certain level, the further shortening of the execution time diminishes markedly. Therefore, based on the data volume of the task, the resource amount allocated to the task is reasonably determined, so that the allocated resource amount and the execution time are balanced, thereby ensuring that the execution efficiency of the task meets the preset condition.
Specifically, since the historical task processing data itself indicates the task execution condition in the historical period, the historical task processing data includes specific contents such as the data amount of the task, the amount of resources allocated to the task, and the execution time of the task. Therefore, based on the data amount of the task, the resource amount allocated by the task and the execution time of the task, the execution efficiency of each task in all the historical task processing data of the computing cluster can be calculated, and further the historical task processing data with the task processing efficiency meeting the preset condition is screened out to fit the regression model, so that the regression model obtained by fitting is ensured to predict the resource amount corresponding to higher execution efficiency based on the data amount of the task. That is, by fitting the regression model by screening task processing data corresponding to higher execution efficiency, it is possible to ensure that the first task can also have higher execution efficiency when resources are allocated to the first task based on the amount of resources predicted by the regression model, thereby ensuring rationality of resource allocation.
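One possible way to score and filter history samples is sketched below; since the embodiment only states that efficiency is related to data amount, allocated resources, and execution time, the exact metric (data volume per unit of resources times time) and the threshold are assumptions made for illustration.

```python
# Assumed efficiency metric: data processed per unit of (resources x time).
# The embodiment only names the three factors; this formula is illustrative.
def efficiency(data_gb, cores, mem_gb, seconds):
    return data_gb / ((cores + mem_gb) * seconds)

history = [
    # (data_gb, cores, mem_gb, seconds) -- invented historical records
    (10, 8, 16, 60),      # well-sized allocation
    (10, 100, 100, 55),   # over-allocated: barely faster, far more resources
    (2, 4, 8, 30),
]

THRESHOLD = 1e-3  # preset condition: keep samples at or above this score
kept = [s for s in history if efficiency(*s) >= THRESHOLD]
```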
The above describes a process for predicting the amount of resources that need to be allocated for tasks that need to be delivered to processing in a computing cluster. However, in most scenarios, a plurality of task queues are often set in the computing cluster to process tasks, and the task queues where the tasks are located often affect the execution situation of the tasks, so how to specify the task queues to process the tasks when delivering the tasks will be described below.
Optionally, in the above embodiment, a plurality of task queues for processing tasks may be included in the computing cluster. Then, the task processing request is used to specify a target queue for processing the first task, in addition to the target amount of resources that need to be allocated for the first task, the target queue being one of a plurality of task queues. That is, the task processing request is a target queue that simultaneously specifies the target amount of resources that the first task needs to allocate and processes the first task. Thus, when the computing cluster processes the first task, the first task is placed in the target queue, and a target amount of resources is allocated for the first task to execute the first task.
Each task queue of the plurality of task queues included in the computing cluster is pre-allocated with a fixed amount of resources, and the amounts of resources allocated by different task queues may be the same or different. For example, assuming that there are 10 task queues in a computing cluster, the 1 st task queue is allocated 60% of the total computing cluster resources, the 2 nd task queue is allocated 20% of the total computing cluster resources, and the remaining 8 task queues are equally allocated the remaining 20% of the computing cluster resources.
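The 60% / 20% / remaining-20% split described above can be written down as a simple fraction table, as a sanity check that the shares sum to the whole cluster:

```python
# The example split above: queue 1 gets 60% of cluster resources,
# queue 2 gets 20%, and the remaining 8 queues share the last 20% equally.
shares = {"q1": 0.60, "q2": 0.20}
shares.update({f"q{i}": 0.20 / 8 for i in range(3, 11)})
```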
In the scheme, under the condition that the computing cluster comprises a plurality of task queues, the target queues for processing the first tasks are appointed, so that the loads of the plurality of task queues can be guaranteed to be in an equilibrium state as far as possible, and the resources in the computing cluster can be reasonably utilized under the condition that part of task queues are overloaded and the other part of task queues are underloaded.
Alternatively, the process of determining a target queue for processing the first task among the plurality of task queues may include the following steps.
First, based on the resource occupation condition of each task queue in the plurality of task queues at the current time, a plurality of target samples with the closest resource occupation condition are searched for each task queue in the history samples of the plurality of task queues. The historical sample is used for indicating the resource occupation condition of the task queue in the historical time period. The target samples corresponding to each task queue belong to the history samples of each task queue.
Specifically, since the task queues in the computing cluster are long-standing after partitioning, i.e., each task queue runs long-term, each task queue is able to find a corresponding plurality of history samples. For example, in the case where the computing cluster has been running for 100 hours, the 100 hours can be divided into 100 time periods in units of 1 hour, and the resource occupation of each queue in the computing cluster in the 100 time periods constitutes 100 history samples corresponding to each queue. Based on the current resource occupation situation of each task queue, a plurality of target samples with the closest resource occupation situation can be searched for each task queue in 100 historical samples corresponding to each task queue.
Then, based on a plurality of target samples corresponding to each task queue, the expected resource occupation condition of each task queue at the future time is predicted.
Because the first task is not delivered in real time, but needs to be delivered to the computing cluster after determining the queue to be delivered and the amount of resources to be allocated to the first task, and determining the queue to be delivered and the amount of resources to be allocated for the first task often needs a certain time, in this embodiment, the expected resource occupation situation of the task queue in future time is predicted, so as to ensure that the resource occupation situation of the task queue is close to the resource occupation situation based on when the task queue is selected when the task queue is actually delivered for the first task, thereby ensuring the accuracy when the task queue is selected for the first task.
Since the plurality of target samples corresponding to each task queue each represent the resource occupation during a certain historical time period, the time periods corresponding to those samples are determined once the target samples are determined. By pushing each of these time periods backwards by a fixed interval, the resource occupation during the period following each target sample is obtained; averaging these occupation values then yields a value that can serve as the expected resource occupation of the task queue at the future time.
Referring to fig. 4, fig. 4 is a schematic diagram illustrating determining an expected resource occupancy of a task queue according to an embodiment of the present application. As shown in FIG. 4, the task queue A occupies 75% of CPU and 80% of memory in the current time period, wherein 75% of CPU resources currently occupied by the task queue A are 75% of CPU resources owned by the task queue A, and 80% of memory resources currently occupied by the task queue A are 80% of memory resources owned by the task queue A. The resource occupation situation of each time period of the task queue a under the history time can be searched through a K-Nearest Neighbor (KNN) algorithm, so that K time periods, in which the resource occupation situation is closest to the resource occupation situation of the current time period, are found, and K is 3 in the current example. The 3 time periods comprise time period 1-time period 3 under the history time, wherein the resource occupation condition under the time period 1 is that a CPU occupies 73% and a memory occupies 79%, the resource occupation condition under the time period 2 is that the CPU occupies 76% and the memory occupies 82%, and the resource occupation condition under the time period 3 is that the CPU occupies 75% and the memory occupies 81%.
Based on the 3 retrieved time periods, the resource occupation X hours after each of the 3 time periods may be determined from the historical resource occupation of task queue A, where X may be chosen according to the actual situation, for example 1 or 2. Specifically, X hours after time period 1 the CPU occupation is 80% and the memory occupation is 85%; X hours after time period 2 the CPU occupation is 78% and the memory occupation is 83%; and X hours after time period 3 the CPU occupation is 76% and the memory occupation is 80%. Averaging the three post-X-hour occupation values gives a CPU occupation of 78% and a memory occupation of approximately 83% ((85% + 83% + 80%) / 3 ≈ 82.7%). This average can then be used as the expected resource occupation of task queue A at X hours after the current time.
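The KNN lookup and averaging described above can be sketched in plain Python as follows; this is a minimal illustration, and the function name, the fixed K = 3, and the tuple-based history format are assumptions for this sketch rather than part of the embodiment.

```python
import math

def predict_future_occupancy(current, history, k=3):
    """current: (cpu%, mem%); history: list of ((cpu%, mem%), (cpu_later%, mem_later%))."""
    # Find the k historical periods whose occupancy is closest to the current one.
    nearest = sorted(history, key=lambda h: math.dist(current, h[0]))[:k]
    # Average the occupancy observed X hours after each of those periods.
    cpu = sum(h[1][0] for h in nearest) / k
    mem = sum(h[1][1] for h in nearest) / k
    return cpu, mem

# Figures from the fig. 4 example: queue A currently at CPU 75%, memory 80%.
history = [
    ((73, 79), (80, 85)),  # period 1 and its occupancy X hours later
    ((76, 82), (78, 83)),  # period 2
    ((75, 81), (76, 80)),  # period 3
]
cpu, mem = predict_future_occupancy((75, 80), history)
```

In a real deployment the history would span many time periods per queue and K would be far smaller than the sample count, so the nearest-neighbor search actually filters.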
Finally, a target queue is determined from the plurality of task queues for processing the first task based on the expected resource occupancy and the target amount of resources.
After the expected resource occupation of each task queue at the future time is predicted, the resource occupation of each task queue at the moment the first task is delivered is known in advance, so a suitable task queue can be selected for the first task based on those expectations. Because other tasks may be delivered at the same time as the first task, in this embodiment the task queue corresponding to each task may be determined jointly, based on the amount of resources to be allocated to each task and the expected resource occupation of each task queue.
Specifically, when determining the task queue corresponding to the first task, it must be ensured, based on the expected resource occupation of the candidate queue, that the remaining resources of that queue are not less than the target amount of resources required by the first task, so that the first task can execute normally. In addition, when a corresponding task queue is determined for each task, the queues can be selected on the principle of load balancing, that is, the load on the task queues should be kept as balanced as possible, avoiding the situation where some task queues are overloaded while others sit largely idle.
Optionally, the task processing request is further configured to indicate a processing priority of the first task, where the processing priority is determined based on the target resource amount.
Specifically, the smaller the amount of resources required by a task, the fewer resources it consumes while running and the faster it completes. The priority of tasks that require fewer resources can therefore be raised so that they finish as soon as possible, freeing up more resources to run other tasks.
Optionally, the processing priority is further associated with an expected run-time of the first task, the expected run-time being determined based on the target amount of data and the target amount of resources. That is, the processing priority is related to both the target amount of resources of the first task and the expected runtime. Specifically, in the case where the target data amount and the target resource amount corresponding to the first task have been determined, the expected running time of the first task may be predicted based on a regression model obtained by fitting in advance to determine the processing priority of the first task.
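As a rough illustration of how the expected running time might be fitted from the target data amount and the target resource amount, the sketch below regresses run time on a single derived feature (data processed per CPU core); the feature choice, the helper name, and all sample numbers are invented for illustration and are not the embodiment's model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y ≈ k*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    k = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return k, my - k * mx

# Invented history: (data GB, CPU cores) -> observed run time in minutes.
runs = [((10, 4), 14.5), ((20, 4), 27.0), ((40, 8), 27.0), ((80, 8), 52.0)]
xs = [d / c for (d, c), _ in runs]   # derived feature: GB processed per core
ys = [t for _, t in runs]
k, b = fit_line(xs, ys)
eta = k * (60 / 8) + b               # expected run time of a 60 GB, 8-core task
```

A production model would likely use both inputs as separate features; the single-feature form keeps the sketch self-contained.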
In particular, in determining the processing priority of a task, the processing priority of the task may be determined from the amount of resources required by the task and the expected run time of the task. The smaller the amount of resources required by the task and the shorter the expected running time of the task, the higher the processing priority of the task, and the larger the amount of resources required by the task and the longer the expected running time of the task, the lower the processing priority of the task.
In general, in the embodiment of the present application, tasks in the task queue that require as few resources as possible and have as short a running time as possible tend to be executed preferentially, so as to ensure that the computing cluster completes the execution of tasks as quickly and efficiently as possible.
For easy understanding, the resource allocation method of the task provided in this embodiment will be described in detail below with reference to specific examples.
Referring to fig. 5, fig. 5 is a schematic diagram of an operation architecture of a task resource allocation method according to an embodiment of the present application. As shown in fig. 5, the operation architecture of the resource allocation method consists of two parts: a big data platform and an algorithm platform. The big data platform is provided with a computing cluster and a database; the computing cluster completes offline tasks delivered by the algorithm platform by retrieving data from the database and processing it. In addition, the database stores the various cluster resource conditions generated when the computing cluster processes real-time tasks, such as the resource occupation of each of the cluster's queues in historical time periods, the amount of resources allocated to each task, and the running time of each task. Offline tasks refer to tasks that have been delivered to the computing cluster and are waiting to be processed, and real-time tasks refer to tasks currently being processed by the computing cluster.
The algorithm platform is used to execute the resource allocation method for tasks provided by the embodiment of the present application, so as to determine the amount of resources required to execute a task and the queue selected for its execution, thereby instructing the computing cluster to process the task with the specified amount of resources and queue. The algorithm platform may specifically be deployed in the execution device described in the foregoing embodiment.
One or more timed tasks may be stored in the algorithm platform. The algorithm platform periodically acquires the timed tasks (for example, once a day) and calculates the resources required by each timed task, the selected task queue and the task priority, thereby delivering the timed task to the computing cluster while specifying the required resources, the selected task queue and the task priority.
Specifically, after the timing task is acquired, the algorithm platform acquires task information of the timing task to determine a data magnitude of the timing task and whether the timing task involves a data redistribution operation. In this way, the resources required for the execution of a timed task can be predicted by the task resource prediction model based on the data magnitude of the timed task and the data redistribution operations involved.
In addition, the algorithm platform acquires the current resource occupation condition of the computing cluster and the resource occupation condition under the historical time from a database of the big data platform. Based on the current resource occupation condition of the computing cluster and the resource occupation condition under the historical time, the future resource occupation condition of the task queues in the computing cluster can be predicted through a queue resource prediction model. Then, based on the resources required by the task in execution and the future resource occupation condition of the task queues in the computing cluster, a corresponding task queue can be selected for the task and the task priority corresponding to the task can be determined.
Finally, based on the resources required to execute the task, the task queue corresponding to the task and the task priority, a task processing request can be generated and sent to the computing cluster in the big data platform to complete the delivery of the task. When the computing cluster processes the task, it executes the task based on the required resources, the task queue and the task priority indicated in the task processing request.
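A task processing request of the kind just described could be represented as a simple structure like the following sketch; every field name here is an assumption chosen for illustration, not an API of any actual platform.

```python
from dataclasses import dataclass

@dataclass
class TaskProcessingRequest:
    task_id: str
    cpu_cores: int   # predicted processor resources (target resource amount)
    memory_gb: int   # predicted memory resources (target resource amount)
    queue: str       # task queue selected for the task
    priority: int    # processing priority; a higher value runs earlier

request = TaskProcessingRequest(
    task_id="daily_report", cpu_cores=8, memory_gb=32, queue="A", priority=2,
)
```

The computing cluster would read all three groups of fields (resources, queue, priority) from such a request when scheduling the task.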
Referring to fig. 6, fig. 6 is a schematic flow chart of generating a task processing request based on a timed task according to an embodiment of the present application. As shown in fig. 6, the process of generating task processing requests based on timed tasks may include four phases, data input, task resource occupancy prediction, queue resource occupancy prediction, and queue selection, respectively. The four phases described above will be described in detail below taking the computing cluster Spark as an example.
Stage one, data input.
First, timed tasks are set on the execution device, and the execution device periodically acquires and parses them. Parsing a task extracts the data filtering conditions in the task, namely the selection conditions (WHERE conditions), the statistics fields (SELECT fields), the statistics dimensions (GROUP BY fields) and the time window (the data filtering time period). Then, based on the data filtering conditions in the task and the total amount of data accessed by the computing cluster Spark, linear regression prediction is performed to obtain an accurate data magnitude (that is, the amount of data involved in executing the task). The Shuffle operators involved in running the task are then inferred from the statistics dimensions and the selection conditions.
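Extracting the filtering conditions from an SQL task can be sketched as below. This naive regex extraction is purely illustrative (a real system would use a proper SQL parser), and the rule that a GROUP BY implies a Shuffle is a simplification.

```python
import re

def parse_filter_conditions(sql):
    """Pull the SELECT fields, WHERE condition and GROUP BY dimension out of one SQL string."""
    def grab(pattern):
        m = re.search(pattern, sql, re.IGNORECASE | re.DOTALL)
        return m.group(1).strip() if m else None
    select = grab(r"SELECT\s+(.*?)\s+FROM")
    where = grab(r"WHERE\s+(.*?)(?:\s+GROUP\s+BY|$)")
    group_by = grab(r"GROUP\s+BY\s+(.*)$")
    # A GROUP BY over a non-partition key generally implies a Shuffle operator.
    return {"select": select, "where": where, "group_by": group_by,
            "needs_shuffle": group_by is not None}

cond = parse_filter_conditions(
    "SELECT region, COUNT(*) FROM sales "
    "WHERE dt BETWEEN '2024-01-01' AND '2024-01-31' GROUP BY region"
)
```

Here the WHERE clause supplies the time window and selection conditions used for the data-magnitude regression, while the GROUP BY dimension flags the Shuffle operator.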
In addition, when task analysis is triggered, the current cluster resource conditions and the historical cluster resource conditions are acquired as the current cluster input and the historical cluster input. The cluster resource conditions include, for each queue and for different time periods, the number of tasks, the CPU occupation, the memory occupation, the number of idle CPU cores and the idle memory resources, and are used to predict the cluster resource occupation in a subsequent time period.
In addition, while tasks are running, the running condition of each task is recorded in real time: the CPU resources, memory resources, execution queue and running time used by each task are recorded in the database, for use in predicting the resource and time consumption of subsequent tasks.
Stage two, task resource occupation prediction.
Based on the task data magnitude obtained in stage one and the Shuffle operators involved in the task, the CPU resource parameters and memory resource parameters required by the current task (such as the number of Driver CPU cores, Driver memory, the number of Executors, the number of Executor CPU cores and Executor memory) are fitted through a linear regression prediction model. A separate linear regression prediction model is constructed for each resource parameter required by the task, and each model is constructed and adjusted based on historical task operation data. In addition, after the amount of resources required for the task to run has been predicted, the expected running time of the task can be predicted based on that amount of resources and the data magnitude of the task.
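The per-parameter fitting in stage two can be illustrated with a single least-squares line; a real system would fit one such model per resource parameter (Driver cores, Executor memory, and so on) from its own task history. The sample numbers below are invented for the sketch.

```python
def fit_linear(xs, ys):
    """Least-squares fit of y ≈ slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Invented history: task data magnitude (GB) vs. Executor memory needed (GB).
data_gb = [10, 20, 40, 80]
exec_mem_gb = [4, 6, 10, 18]
slope, intercept = fit_linear(data_gb, exec_mem_gb)
predicted_mem = slope * 60 + intercept   # Executor memory for a 60 GB task
```

Whether the task involves Shuffle operators could enter such a model as an additional feature or as a separate model per operator profile.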
Stage three, queue resource occupancy prediction.
Because data preparation and model scheduling take time, queue resource occupancy prediction typically begins X hours before task delivery (X may be determined or adjusted according to the particular scenario, for example X is 2).
Specifically, according to the resource occupation of each task queue in the current time period, the task queue resource occupation in historical time periods is searched through the KNN algorithm, so as to find, for each task queue, the K samples with the most similar resource occupancy. Then, for each task queue, the average of the resource occupation of those K samples X hours later can be used as a reference to predict the resource occupation of that task queue X hours into the future. Performing this queue-resource prediction for all task queues yields the overall cluster resource occupation expected at the time the future tasks are delivered and run.
Stage four, queue selection.
Based on the future task queue resource occupation predicted in stage three, the suitability of running the current task in the different queues can be evaluated one by one. If the remaining amount of resources of a task queue is less than the resources required by the current task, the fit with that queue is skipped, i.e., the current task cannot select that queue. If a task queue will remain highly occupied for a long time during the task delivery period, then only tasks with small resource consumption are fitted to that queue. In general, the goal of selecting a corresponding task queue for each task is to balance the load among the task queues as much as possible and to avoid a severely unbalanced load distribution.
For example, when selecting task queues for a batch of tasks, different combinations of tasks may be assigned to candidate task queues, and the load balancing achieved by the different combinations compared, so that the combination that best balances the load among the task queues is selected.
Referring to fig. 7, fig. 7 is a schematic diagram illustrating selection of a task queue according to an embodiment of the present application. As shown in fig. 7, the tasks to be delivered comprise 4 tasks, namely task 1, task 2, task 3 and task 4, and the computing cluster currently has two task queues, the A queue and the B queue, which own the same amount of resources. The amount of resources required to execute task 1 is 60% of the total resources of one queue, task 2 requires 25%, task 3 requires 23% and task 4 requires 10%. The free resources of the A queue during the future task delivery period are 80%, and the free resources of the B queue during that period are 60%. By comparing the outcomes of delivering each task to each candidate queue, it can be determined that task 1 and task 4 should be delivered to the A queue, and task 2 and task 3 to the B queue. After task 1 and task 4 are delivered to the A queue, it has 10% of its resources remaining; after task 2 and task 3 are delivered to the B queue, it has 12% remaining. The loads of the A queue and the B queue are thus substantially balanced.
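The selection in this example can be reproduced with a brute-force enumeration over task-to-queue assignments, keeping the feasible assignment whose remaining resources are most balanced. This exhaustive search is only a sketch of the idea; a real scheduler would use a heuristic for large task counts.

```python
from itertools import product

tasks = {"task1": 60, "task2": 25, "task3": 23, "task4": 10}  # % of one queue
free = {"A": 80, "B": 60}                        # predicted idle % per queue

best = None
for choice in product(free, repeat=len(tasks)):
    load = {q: 0 for q in free}
    for task, q in zip(tasks, choice):
        load[q] += tasks[task]
    if any(load[q] > free[q] for q in free):
        continue                                 # queue over capacity: skip
    remaining = [free[q] - load[q] for q in free]
    imbalance = max(remaining) - min(remaining)
    if best is None or imbalance < best[0]:
        best = (imbalance, dict(zip(tasks, choice)))

assignment = best[1]                             # most balanced feasible plan
```

With the figures above the search recovers the assignment from the example: tasks 1 and 4 to the A queue (10% left) and tasks 2 and 3 to the B queue (12% left).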
Further, after selecting a corresponding task queue for each task, a task priority corresponding to each task may also be determined, so that the task queues process the assigned tasks in a priority order.
For each task queue, fitting can be performed by using a regression prediction model based on the resource amount and the expected running time of each task to be processed in the task queue, so as to obtain the task priority corresponding to each task.
Referring to fig. 8, fig. 8 is a schematic diagram illustrating task priority determination according to an embodiment of the present application. As shown in fig. 8, after the task queues corresponding to tasks 1 to 4 are determined, the task priority of each task may be further determined on the basis of the example shown in fig. 7. In the A queue, task 1 (requiring 60% of the resources), which has a large required resource amount and a long expected running time, may be set to a low task priority, while task 4 (requiring 10% of the resources), which has a small required resource amount and a short expected running time, may be set to a high task priority. Further, in the B queue, since the amounts of resources required by task 2 and task 3 are close, task 3, whose expected running time is shorter, can be set to a high task priority, and task 2, whose expected running time is longer, can be set to a low task priority.
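One way to realize such an ordering is a weighted score over required resources and expected run time, with smaller scores running first; the equal weights and the sample run times below are assumptions for illustration, not the embodiment's regression model.

```python
def priority_order(tasks, w_res=0.5, w_time=0.5):
    """tasks: {name: (resource_pct, expected_runtime_min)}; smallest score first."""
    score = lambda name: w_res * tasks[name][0] + w_time * tasks[name][1]
    return sorted(tasks, key=score)

# The B queue from the example: similar resource needs, different run times,
# so the shorter task 3 is given the higher priority (it runs first).
order = priority_order({"task2": (25, 120), "task3": (23, 45)})
```

The weights control the trade-off: raising w_time favors short tasks even when they need somewhat more resources.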
The method provided by the embodiment of the present application is described in detail above, and the apparatus for performing the method provided by the embodiment of the present application will be described next.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a resource allocation device for tasks according to an embodiment of the present application. As shown in fig. 9, the resource allocation device for tasks provided in the embodiment of the present application includes an obtaining module 901 and a processing module 902. The obtaining module 901 is configured to obtain a first task, where the first task is used to instruct a computing cluster to process data that meets a data filtering condition. The processing module 902 is configured to predict, based on the data filtering condition and the total amount of data accessed by the computing cluster, a target data volume that needs to be processed when the computing cluster executes the first task. The processing module 902 is further configured to predict, based on the target data volume, a target resource volume required when the first task is executed, the target resource volume being used to indicate a processor resource volume and/or a memory volume. The processing module 902 is further configured to send a task processing request to the computing cluster, where the task processing request is used to request the computing cluster to process the first task and to specify that the resource volume allocated by the computing cluster for the first task is the target resource volume.
In one possible implementation, the processing module 902 is further configured to determine, based on the data filtering condition, a data redistribution operation involved in executing the first task, the data redistribution operation configured to redistribute data processed by the plurality of computing nodes on the computing cluster, and predict, based on the target data amount and the data redistribution operation, a target amount of resources required for executing the first task.
In a possible implementation, the processing module 902 is further configured to predict, based on the target data amount and the data redistribution operation, the target resource amount required for the execution of the first task through a regression model, where the regression model is obtained by fitting data based on historical task processing data of the computing cluster.
In one possible implementation manner, task processing efficiency corresponding to historical task processing data meets a preset condition, and the task processing efficiency is related to the data amount of a task, the amount of resources allocated to the task and the execution time of the task.
In one possible implementation, the computing cluster includes a plurality of task queues for processing tasks, and the task processing request is further for specifying a target queue for processing the first task, the target queue being one of the plurality of task queues.
In a possible implementation, the processing module 902 is further configured to: retrieve, for each task queue, based on the resource occupation of each of the plurality of task queues at the current time, a plurality of target samples closest to that resource occupation from among historical samples of the plurality of task queues, where a historical sample indicates the resource occupation of a task queue in a historical period; predict, based on the plurality of target samples corresponding to each task queue, the expected resource occupation of each task queue at a future time; and determine, based on the expected resource occupation and the target resource amount, the target queue used to process the first task from among the plurality of task queues.
In one possible implementation, the task processing request is further used to indicate a processing priority of the first task, the processing priority being determined based on the target amount of resources.
In one possible implementation, the processing priority is also associated with an expected run-time of the first task, the expected run-time being determined based on the target amount of data and the target amount of resources.
In one possible implementation, the first task is a task that is periodically performed in the computing cluster.
In one possible implementation, the first task is an SQL-based data query task, and the data filtering criteria includes one or more of a custom data selection criteria, a data statistics field, a data statistics dimension, and a data filtering period.
The acquiring module 901 and the processing module 902 may be implemented by software or by hardware. The following describes an implementation taking the processing module 902 as an example; similarly, the implementation of the acquiring module 901 may refer to that of the processing module 902.
As an example of a software functional unit, the processing module 902 may include code that runs on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, and a container. Further, there may be one or more such computing instances. For example, the processing module 902 may include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers used to run the code may be distributed in the same region or in different regions. Further, they may be distributed in the same availability zone (AZ) or in different AZs, each AZ comprising one data center or multiple geographically close data centers. Typically, one region comprises a plurality of AZs.
Also, the multiple hosts/virtual machines/containers used to run the code may be distributed in the same virtual private cloud (VPC) or across multiple VPCs. In general, one VPC is deployed in one region, and a communication gateway is deployed in each VPC for interconnecting VPCs within the same region and VPCs in different regions.
As an example of a hardware functional unit, the processing module 902 may include at least one computing device, such as a server. Alternatively, the processing module 902 may be a device implemented using an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or the like. The PLD may be implemented as a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The multiple computing devices included in the processing module 902 may be distributed in the same region or may be distributed in different regions. The plurality of computing devices included in the processing module 902 may be distributed among the same AZ or may be distributed among different AZ. Likewise, multiple computing devices included in the processing module 902 may be distributed in the same VPC or may be distributed among multiple VPCs. Wherein the plurality of computing devices may be any combination of computing devices such as servers, ASIC, PLD, CPLD, FPGA, and GAL.
It should be noted that, because the content of information interaction and implementation process between the modules/units of the above-mentioned device and the method embodiment of the present application are based on the same concept, the technical effects brought by the content are the same as those brought by the method embodiment of the present application, and specific content may refer to the description in the foregoing illustrated method embodiment of the present application, and will not be repeated herein.
The present application also provides a computing device 1000. Referring to fig. 10, fig. 10 is a schematic structural diagram of a computing device 1000 according to an embodiment of the application. As shown in fig. 10, the computing device 1000 includes a bus 1002, a processor 1004, a memory 1006, and a communication interface 1008. The processor 1004, the memory 1006 and the communication interface 1008 communicate via the bus 1002. The computing device 1000 may be a server or a terminal device. It should be understood that the present application does not limit the number of processors or memories in the computing device 1000.
The bus 1002 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be divided into address buses, data buses, control buses, and so on. For ease of illustration, only one line is shown in fig. 10, but this does not mean that there is only one bus or one type of bus. The bus 1002 may include a path for transferring information between the components of the computing device 1000 (e.g., the memory 1006, the processor 1004, and the communication interface 1008).
The processor 1004 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
The memory 1006 may include volatile memory, such as random access memory (RAM). The memory 1006 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD) or a solid state drive (SSD).
The memory 1006 stores executable program codes, and the processor 1004 executes the executable program codes to implement the functions of the aforementioned acquisition module and processing module, respectively, thereby implementing the resource allocation method of the aforementioned tasks. That is, the memory 1006 has stored thereon instructions of a resource allocation method for executing a task.
Communication interface 1008 enables communication between computing device 1000 and other devices or communication networks using a transceiver module such as, but not limited to, a network interface card, transceiver, or the like.
The embodiment of the application also provides a computing device cluster. The cluster of computing devices includes at least one computing device. The computing device may be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop, notebook, or smart phone.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a computing device cluster according to an embodiment of the present application. As shown in fig. 11, a cluster of computing devices includes at least one computing device 1000. The memory 1006 in one or more computing devices 1000 in the cluster of computing devices may have stored therein instructions of the same resource allocation method for performing tasks.
In some possible implementations, some instructions of the resource allocation method for performing the task may also be stored separately in the memory 1006 of one or more computing devices 1000 in the computing device cluster. In other words, a combination of one or more computing devices 1000 may collectively execute instructions of a resource allocation method for performing a task.
It should be noted that the memories 1006 in different computing devices 1000 in the computing device cluster may store different instructions, each for performing part of the functions of the resource allocation device for tasks. That is, the instructions stored in the memories 1006 in different computing devices 1000 may implement the functionality of one or more of the acquiring module and the processing module described above.
In some possible implementations, one or more computing devices in a cluster of computing devices may be connected through a network. Wherein the network may be a wide area network or a local area network, etc. Fig. 12 shows one possible implementation. Fig. 12 is a schematic structural diagram of another computing device cluster according to an embodiment of the present application. As shown in fig. 12, in a computing device cluster 1200, two computing devices 1000A and 1000B are connected by a network. Specifically, the connection to the network is made through a communication interface in each computing device. In this type of possible implementation, instructions to perform the functions of the acquisition module are stored in memory 1006 in computing device 1000A. Meanwhile, instructions to perform the functions of a processing module are stored in the memory 1006 in the computing device 1000B.
It should be appreciated that the functionality of computing device 1000A shown in fig. 12 may also be performed by multiple computing devices 1000. Likewise, the functionality of computing device 1000B may also be performed by multiple computing devices 1000.
Referring to fig. 13, fig. 13 is a schematic structural diagram of a computer readable storage medium according to an embodiment of the present application. The present application also provides such a computer-readable storage medium. In some embodiments, the execution flow described in the above embodiments may be embodied as computer program instructions encoded in a machine-readable format on the computer-readable storage medium, or on another non-transitory medium or article of manufacture.
Fig. 13 schematically illustrates a conceptual partial view of an example computer-readable storage medium comprising a computer program for executing a computer process on a computing device, arranged in accordance with at least some embodiments presented herein.
In one embodiment, the computer-readable storage medium 1300 is provided using a signal bearing medium 1301. The signal bearing medium 1301 may include one or more program instructions 1302 that, when executed by one or more processors, may provide the functionality, or portions of the functionality, described above.
In some examples, signal bearing medium 1301 may comprise a computer readable medium 1303, such as, but not limited to, a hard disk drive, a Compact Disc (CD), a Digital Video Disc (DVD), a digital tape, memory, ROM, or RAM, and the like.
In some implementations, the signal bearing medium 1301 may comprise a computer recordable medium 1304, such as, but not limited to, memory, read/write (R/W) CD, R/W DVD, and the like. In some implementations, the signal bearing medium 1301 may include a communication medium 1305, such as, but not limited to, a digital and/or analog communication medium (e.g., fiber optic cable, waveguide, wired communications link, wireless communications link, etc.). Thus, for example, the signal bearing medium 1301 may be conveyed by a communication medium 1305 in wireless form (e.g., a wireless communication medium compliant with the IEEE 802.X standard or other transmission protocol).
The one or more program instructions 1302 may be, for example, computer-executable instructions or logic-implemented instructions. In some examples, a computing device may be configured to provide various operations, functions, or actions in response to the program instructions 1302 conveyed to the computing device through one or more of the computer-readable medium 1303, the computer-recordable medium 1304, and/or the communication medium 1305.
Embodiments of the present application also provide a computer program product comprising instructions. The computer program product may be a software product or program product containing instructions that can run on a computing device or be stored in any usable medium. When the computer program product runs on at least one computing device, the at least one computing device is caused to perform the task resource allocation method described in the above embodiments.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by software plus the necessary general-purpose hardware, or by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memories, special-purpose components, and the like. Generally, any function performed by a computer program can readily be implemented by corresponding hardware, and the specific hardware structure used to implement the same function can vary: for example, an analog circuit, a digital circuit, or a dedicated circuit. For the present application, however, a software implementation is the preferred embodiment in most cases. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc of a computer, and including several instructions for causing a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods of the various embodiments of the present application.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product.
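As a hedged illustration of the method summarized in the abstract and claimed below (obtain a task, predict the data volume it will touch from its data screening condition, map that volume to a target resource amount, and submit the task with that amount specified), the following minimal Python sketch shows one way the pieces fit together. The selectivity heuristic, the per-row resource factors, and all names are assumptions for illustration only, not the patent's actual implementation.

```python
# Hedged illustration of the overall flow: (1) predict the data volume a
# task will touch from its data screening condition, (2) map that volume
# to a target resource amount, and (3) build a task processing request
# that pins the allocation to that amount.

def predict_target_data_volume(total_rows: int, filter_selectivity: float) -> int:
    """Estimate rows kept: total data accessed x expected filter selectivity."""
    return int(total_rows * filter_selectivity)

def predict_target_resources(target_rows: int) -> dict:
    """Map the predicted data volume to a CPU/memory request (assumed factors)."""
    millions = target_rows / 1_000_000
    return {
        "cpu_cores": max(1, round(millions * 0.5)),   # assumed 0.5 core per 1M rows
        "memory_mb": max(256, round(millions * 80)),  # assumed 80 MB per 1M rows
    }

def build_task_request(task_id: str, total_rows: int, filter_selectivity: float) -> dict:
    rows = predict_target_data_volume(total_rows, filter_selectivity)
    return {"task": task_id, "target_rows": rows,
            "resources": predict_target_resources(rows)}

req = build_task_request("q42", total_rows=2_000_000_000, filter_selectivity=0.01)
print(req["resources"])
```

In a real deployment the linear per-row factors would be replaced by the regression model of claims 3 and 4, fitted on the cluster's historical task processing data.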

Claims (24)

(Translated from Chinese)

1. A method for allocating resources for a task, comprising:
obtaining a first task, the first task being used to instruct a computing cluster to process data that meets a data screening condition;
predicting, based on the data screening condition and the total amount of data accessed by the computing cluster, a target amount of data to be processed by the computing cluster when executing the first task;
predicting, based on the target data amount, a target amount of resources required for executing the first task, the target resource amount indicating an amount of processor resources and/or an amount of memory; and
sending a task processing request to the computing cluster, the task processing request being used to request the computing cluster to process the first task and to specify that the amount of resources allocated by the computing cluster to the first task is the target resource amount.

2. The method according to claim 1, further comprising:
determining, based on the data screening condition, a data redistribution operation involved in the execution of the first task, the data redistribution operation being used to redistribute data processed by multiple computing nodes of the computing cluster;
wherein predicting, based on the target data amount, the target amount of resources required for executing the first task comprises:
predicting, based on the target data amount and the data redistribution operation, the target amount of resources required for executing the first task.

3. The method according to claim 2, wherein predicting the target amount of resources required for executing the first task based on the target data amount comprises:
predicting, based on the target data amount and the data redistribution operation, the target amount of resources required for executing the first task through a regression model, the regression model being obtained by fitting historical task processing data of the computing cluster.

4. The method according to claim 3, wherein a task processing efficiency corresponding to the historical task processing data meets a preset condition, the task processing efficiency being related to the data volume of a task, the amount of resources allocated to the task, and the execution time of the task.

5. The method according to any one of claims 1 to 4, wherein the computing cluster includes multiple task queues for processing tasks, the task processing request is further used to specify a target queue for processing the first task, and the target queue is one of the multiple task queues.

6. The method according to claim 5, further comprising:
retrieving, based on the resource occupancy of each of the multiple task queues at the current time, multiple target samples whose resource occupancy is closest to that of each task queue from historical samples of the multiple task queues, the historical samples indicating the resource occupancy of a task queue over a historical time period;
predicting, based on the multiple target samples corresponding to each task queue, the expected resource occupancy of each task queue at a future time; and
determining, based on the expected resource occupancy and the target data amount, the target queue among the multiple task queues for processing the first task.

7. The method according to claim 5 or 6, wherein the task processing request is further used to indicate a processing priority of the first task, the processing priority being determined based on the target resource amount.

8. The method according to claim 7, wherein the processing priority is further related to an expected running time of the first task, the expected running time being determined based on the target data amount and the target resource amount.

9. The method according to any one of claims 1 to 8, wherein the first task is a task that is periodically executed in the computing cluster.

10. The method according to any one of claims 1 to 9, wherein the first task is a data query task based on the structured query language (SQL), and the data screening condition includes one or more of a custom data selection condition, a data statistics field, a data statistics dimension, and a data screening time period.

11. An apparatus for allocating resources for a task, comprising:
an acquisition module, configured to obtain a first task, the first task being used to instruct a computing cluster to process data that meets a data screening condition; and
a processing module, configured to predict, based on the data screening condition and the total amount of data accessed by the computing cluster, a target amount of data to be processed by the computing cluster when executing the first task;
the processing module being further configured to predict, based on the target data amount, a target amount of resources required for executing the first task, the target resource amount indicating an amount of processor resources and/or an amount of memory;
the processing module being further configured to send a task processing request to the computing cluster, the task processing request being used to request the computing cluster to process the first task and to specify that the amount of resources allocated by the computing cluster to the first task is the target resource amount.

12. The apparatus according to claim 11, wherein the processing module is further configured to:
determine, based on the data screening condition, a data redistribution operation involved in the execution of the first task, the data redistribution operation being used to redistribute data processed by multiple computing nodes of the computing cluster; and
predict, based on the target data amount and the data redistribution operation, the target amount of resources required for executing the first task.

13. The apparatus according to claim 12, wherein the processing module is further configured to:
predict, based on the target data amount and the data redistribution operation, the target amount of resources required for executing the first task through a regression model, the regression model being obtained by fitting historical task processing data of the computing cluster.

14. The apparatus according to claim 13, wherein a task processing efficiency corresponding to the historical task processing data meets a preset condition, the task processing efficiency being related to the data volume of a task, the amount of resources allocated to the task, and the execution time of the task.

15. The apparatus according to any one of claims 11 to 14, wherein the computing cluster includes multiple task queues for processing tasks, the task processing request is further used to specify a target queue for processing the first task, and the target queue is one of the multiple task queues.

16. The apparatus according to claim 15, wherein the processing module is further configured to:
retrieve, based on the resource occupancy of each of the multiple task queues at the current time, multiple target samples whose resource occupancy is closest to that of each task queue from historical samples of the multiple task queues, the historical samples indicating the resource occupancy of a task queue over a historical time period;
predict, based on the multiple target samples corresponding to each task queue, the expected resource occupancy of each task queue at a future time; and
determine, based on the expected resource occupancy and the target data amount, the target queue among the multiple task queues for processing the first task.

17. The apparatus according to claim 15 or 16, wherein the task processing request is further used to indicate a processing priority of the first task, the processing priority being determined based on the target resource amount.

18. The apparatus according to claim 17, wherein the processing priority is further related to an expected running time of the first task, the expected running time being determined based on the target data amount and the target resource amount.

19. The apparatus according to any one of claims 11 to 18, wherein the first task is a task that is periodically executed in the computing cluster.

20. The apparatus according to any one of claims 11 to 19, wherein the first task is a data query task based on the structured query language (SQL), and the data screening condition includes one or more of a custom data selection condition, a data statistics field, a data statistics dimension, and a data screening time period.

21. A computing device, comprising a processor and a memory, the processor being configured to execute instructions stored in the memory, so that the computing device performs the operating steps of the method according to any one of claims 1 to 10.

22. A computing device cluster, comprising at least one computing device, each computing device comprising a processor and a memory, the processor of the at least one computing device being configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the operating steps of the method according to any one of claims 1 to 10.

23. A computer storage medium, storing instructions that, when executed by a computer, cause the computer to implement the method according to any one of claims 1 to 10.

24. A computer program product, storing instructions that, when executed by a computer, cause the computer to implement the method according to any one of claims 1 to 10.
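The queue selection of claim 6 can be read as a nearest-neighbor lookup over historical occupancy samples. The sketch below is a hedged illustration under assumptions the claims do not specify: a simple (occupancy-then, occupancy-later) sample format, k-nearest averaging as the prediction rule, and an estimated load fraction for the new task.

```python
# Hedged sketch of claim 6's queue selection: for each task queue, find the
# historical samples whose recorded occupancy is closest to the queue's
# current occupancy, average the occupancy that followed those samples as
# the expected future occupancy, then pick the queue with the most headroom
# once the new task's predicted load fraction is added.

def expected_occupancy(current: float, history: list, k: int = 3) -> float:
    """history: (occupancy_then, occupancy_later) pairs from past periods."""
    nearest = sorted(history, key=lambda s: abs(s[0] - current))[:k]
    return sum(later for _, later in nearest) / len(nearest)

def pick_target_queue(queues: dict, task_fraction: float) -> str:
    """queues: name -> (current occupancy in [0, 1], historical samples)."""
    best_name, best_load = None, None
    for name, (current, history) in queues.items():
        load = expected_occupancy(current, history) + task_fraction
        if best_load is None or load < best_load:
            best_name, best_load = name, load
    return best_name

queues = {
    "etl":   (0.70, [(0.65, 0.80), (0.72, 0.85), (0.50, 0.60), (0.90, 0.95)]),
    "adhoc": (0.30, [(0.25, 0.20), (0.35, 0.40), (0.60, 0.70), (0.10, 0.15)]),
}
print(pick_target_queue(queues, task_fraction=0.05))
```

Here the busy "etl" queue is predicted to stay busy, so the task lands on "adhoc"; a production system would presumably weight samples by recency and use the target data amount, rather than a fixed fraction, to size the task's contribution.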
CN202410405255.XA | 2024-04-03 | Task resource allocation method and related device | Pending | CN120780447A (en)

Publications (1)

Publication Number | Publication Date
CN120780447A (en) | 2025-10-14

Similar Documents

Publication | Title
EP4160405A1 (en): Task execution method and storage device
Grover et al.: Extending map-reduce for efficient predicate-based sampling
CN110825526B (en): Distributed scheduling method and device based on ER relationship, equipment and storage medium
JP2016042284A (en): Parallel computer system, management device, method for controlling parallel computer system, and management device control program
US11762860B1 (en): Dynamic concurrency level management for database queries
Ma et al.: Dependency-aware data locality for MapReduce
CN116827950A (en): Cloud resource processing method, device, equipment and storage medium
CN116010447A (en): A load balancing method and device for optimizing heterogeneous database user query
Shahmirzadi et al.: Analyzing the impact of various parameters on job scheduling in the Google cluster dataset
CN113792079B (en): Data query method and device, computer equipment and storage medium
JP2009037369A (en): How to allocate resources to the database server
CN113407343A (en): Service processing method, device and equipment based on resource allocation
CN110928649A (en): Resource scheduling method and device
CN118467113A (en): A container-aware scheduling method, product, device and medium
CN109992468A (en): A process performance analysis method, device, system and computer storage medium
Maala et al.: Cluster trace analysis for performance enhancement in cloud computing environments
CN113296877A (en): Data processing method and device, computer storage medium and electronic equipment
CN120780447A (en): Task resource allocation method and related device
CN117215732A (en): Task scheduling method, device, system and related equipment
KR20230053809A (en): Big data preprocessing method and apparatus using distributed nodes
US20250130726A1 (en): Data processing method, apparatus, device, and system
CN111090796A (en): Data mining algorithm based on MapReduce
US20240272818A1 (en): Data processing method and apparatus, device, and system
CN119474765B (en): A web-based big data analysis method and related equipment
CN113055476B (en): Cluster type service system, method, medium and computing equipment

Legal Events

Code | Title
PB01 | Publication
