CN113034166A - Method and device for acquiring cloud service and cloud management server


Info

Publication number
CN113034166A
Authority
CN
China
Prior art keywords
job
instance
preemptive
time
expected
Legal status
Granted
Application number
CN201911344792.3A
Other languages
Chinese (zh)
Other versions
CN113034166B (en)
Inventor
田永军
何万青
贺荣徽
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201911344792.3A
Publication of CN113034166A
Application granted
Publication of CN113034166B
Status: Active

Abstract

The application discloses a method and device for acquiring a cloud service, and a cloud management server. Jobs are divided into priority levels and preemptive instances are allocated to low-priority jobs, which guarantees that low-priority jobs are computed on preemptive instances, either continuously or intermittently. This makes full use of low-cost preemptive instances and reduces job running costs.

Description

Method and device for acquiring cloud service and cloud management server
Technical Field
The present application relates to, but is not limited to, cloud computing technologies, and in particular to a method and apparatus for acquiring a cloud service, and a cloud management server.
Background
More and more High Performance Computing (HPC) users deploy HPC clusters on public clouds for large-scale computing.
Public clouds generally offer two payment modes: prepaid and postpaid. Prepaid options include annual and monthly subscriptions, and the longer the one-time purchase period, the larger the discount. Postpaid (pay-as-you-go) resources are billed by usage duration, and large cloud vendors can generally bill by the second. The postpaid mode lets users fully benefit from the elasticity of cloud computing, but its unit price is generally higher than that of the prepaid mode. For HPC users in some fields, such as animation rendering, the computing workload is not evenly distributed over the year; these users often need large clusters for a short time to finish a rendering task. The prepaid mode is therefore unsuitable, while the higher unit price of the postpaid mode becomes a cost bottleneck.
As public clouds grow in scale, large cloud vendors offer another postpaid model, the preemptive instance (also called a bidding instance). Its price floats and carries a relatively large discount, and the user can set a bid or buy preemptive instances at the floating market price. However, instances purchased by a user may be proactively released and reclaimed by the cloud service, for example because of public cloud inventory scheduling or because the market price exceeds the user's bid. As a result, there is no real guarantee that a job makes good use of preemptive instances, so job running costs are hard to control.
Disclosure of Invention
The present application provides a method and device for acquiring a cloud service, and a cloud management server, which make full use of low-cost preemptive instances and reduce running costs.
An embodiment of the present application provides a method for acquiring a cloud service, which includes the following steps:
the cloud management server determines the priority level of a job according to the attribute information of the job;
allocating a preemptive instance to a job with a low priority level and running the job; and
upon learning that a running preemptive instance is about to be released, continuously attempting to allocate a preemptive instance to the job until the job finishes running.
In one illustrative example, the attribute information of the job includes an expected computation time and an expected job end time;
determining the priority level of the job includes:
obtaining the priority level of the job according to the expected computation time of the job, the expected job end time of the job, the time the job has already run, and the length of the time interval from job submission to the current time.
In one illustrative example, obtaining the priority level of the job includes:
calculating the value of Priority according to the following formula:
$$\text{Priority} = \big((\text{ExpectedFinishTime} - \text{ElapsedTime}) - (\text{WallTime} - \text{AlreadyRunningTime})\big) \times \text{factor}$$
where factor is a preset global influence factor with a value range of [0, 1]; ExpectedFinishTime is the length of time the user expects between job submission and job completion; ElapsedTime is the length of the time interval from job submission to the current time; AlreadyRunningTime is the time the job has already run; and WallTime is the length of time the job needs to run;
when the calculated Priority is less than a preset first threshold, the priority level is high; when the calculated Priority is greater than or equal to a preset second threshold, the priority level is low; when the calculated Priority is greater than or equal to the first threshold and less than the second threshold, the priority level is medium; the first threshold is less than the second threshold.
In an exemplary embodiment, the cloud management server periodically determines the priority level of the job according to a preset period.
In one illustrative example, the attribute information of the job further includes an expected cost;
allocating a preemptive instance and running the job includes:
allocating computing resources to the job according to the expected cost, and tracking the cost already spent by the job in real time.
In one illustrative example, the attribute information of the job further includes an expected maximum running cost, and the method further includes:
scaling out, by the cloud management server, preemptive-instance computing resources whose cost stays below the expected maximum running cost, according to the tracked cost already spent by the job and the expected maximum running cost.
In one illustrative example, the method further comprises:
the cloud management server allocates prepaid instances to jobs with a medium priority level; and
the cloud management server allocates pay-as-you-go instances to jobs with a high priority level.
In one illustrative example, continuously attempting to allocate a preemptive instance to the job includes:
if the application software used by the user in the job supports resuming computation from a breakpoint, resetting the state of the job to queued and saving the running state information of the job, so that the computation resumes when new computing resources become available;
if the application software used by the user in the job does not support resuming from a breakpoint, hibernating the current preemptive instance before it is released; resetting the state of the job to queued and recording the hibernation file information while waiting for new computing resources; and, once the computing resources have been scaled out, continuing the computation from the hibernation file.
In one illustrative example, the method further comprises:
determining whether the current time is close to the expected job end time in the attribute information of the job, and if so, and preemptive instances still cannot be scaled out or cannot be retained for a sufficient duration, scaling out, by the cloud management server, pay-as-you-go instances to run the job.
In one illustrative example, determining whether the current time is close to the expected job end time in the attribute information of the job includes:
presetting a time interval threshold and calculating the absolute value of the difference between the current time and the expected job end time in the attribute information of the job; if the absolute value is less than or equal to the time interval threshold, determining that the current time is close to the expected job end time.
The present application also provides a computer-readable storage medium storing computer-executable instructions for performing any one of the above methods for acquiring cloud services.
The present application further provides an apparatus for implementing cloud computing resource allocation, including a memory and a processor, where the memory stores instructions executable by the processor for performing the steps of any one of the above methods for acquiring cloud services.
The present application further provides a device for acquiring cloud services, including a first processing module, a resource allocation module, and a second processing module, where:
the first processing module is configured to determine the priority level of a job according to the attribute information of the job;
the resource allocation module is configured to allocate a preemptive instance to a job with a low priority level and run the job, and, upon receiving a notification from the second processing module, to continuously attempt to allocate a preemptive instance to the job until the job finishes running; and
the second processing module is configured to learn that a running preemptive instance is about to be released and to notify the resource allocation module.
In an exemplary embodiment, the first processing module periodically determines the priority level of the job according to a preset period.
In an illustrative example, the resource allocation module is further configured to:
allocate prepaid instances to jobs with a medium priority level; and
allocate pay-as-you-go instances to jobs with a high priority level.
In an exemplary embodiment, when the resource allocation module receives the notification from the second processing module and continuously attempts to allocate a preemptive instance to the job until the job finishes running, it is specifically configured to:
upon learning that the running preemptive instance is about to be released: if the application software used by the user in the job supports resuming from a breakpoint, reset the state of the job to queued and save the running state information of the job, then wait for new computing resources to resume the computation; if the application software does not support resuming from a breakpoint, hibernate the current preemptive instance before it is released, reset the state of the job to queued, record the hibernation file information while waiting for new computing resources, and, once the computing resources have been scaled out, continue the computation from the hibernation file.
In an illustrative example, the resource allocation module is further configured to:
determine whether the current time is close to the expected job end time in the attribute information of the job, and if so, and preemptive instances still cannot be scaled out or retained for a sufficient duration, scale out pay-as-you-go instances and run the job on them.
In one illustrative example, allocating a preemptive instance and running the job in the resource allocation module includes: allocating computing resources to the job according to the expected cost in the attribute information of the job, and tracking the cost already spent by the job in real time;
the attribute information of the job further includes an expected maximum running cost; the resource allocation module is further configured to scale out preemptive-instance computing resources whose cost stays below the expected maximum running cost, according to the tracked cost already spent by the job and the expected maximum running cost.
The application also provides a cloud management server which comprises the device for acquiring the cloud service.
By dividing jobs into priority levels and allocating preemptive instances to low-priority jobs, the method and device ensure that low-priority jobs are computed on preemptive instances continuously or intermittently, make full use of low-cost preemptive instances, and reduce job running costs.
In an exemplary embodiment, if preemptive instances still cannot be scaled out or retained for a sufficient duration when the expected job end time in the attribute information of the job is near, the cloud management server automatically scales out pay-as-you-go instances to run the job. This ensures that the job finishes computing on time while keeping its running cost as low as possible.
In one illustrative example, the present application further scales out preemptive-instance computing resources whose cost stays below the expected maximum running cost. This ensures that the job is computed within a controllable cost.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the claimed subject matter and are incorporated in and constitute a part of this specification, illustrate embodiments of the subject matter and together with the description serve to explain the principles of the subject matter and not to limit the subject matter.
Fig. 1 is a flowchart of a method for acquiring cloud services according to the present application;
fig. 2 is a schematic view of an application scenario of the method for acquiring cloud services according to the present application;
fig. 3 is a schematic structural diagram of a device for acquiring cloud services according to the present application;
fig. 4 is a schematic flowchart of a first embodiment of obtaining cloud services according to the present application;
fig. 5 is a flowchart illustrating a second embodiment of obtaining cloud services according to the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more apparent, embodiments of the present application will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
In one exemplary configuration of the present application, a computing device includes one or more processors (CPUs), input/output interfaces, a network interface, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including permanent and non-permanent, removable and non-removable media, may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
The steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions. Also, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
A preemptive instance is an on-demand instance intended to reduce the cost of using Elastic Compute Service (ECS) in certain application scenarios. When creating a preemptive instance, the user must specify a bidding mode; if the current market price of the specified instance type is lower than the user's bid, the preemptive instance is created successfully and is billed at the current market price. Once created, a preemptive instance is used in the same way as a pay-as-you-go instance, and it can be combined with other cloud products, such as cloud disks and EIP addresses. A newly created preemptive instance has a guaranteed duration, for example a one-hour protection period: within the first hour after creation, the instance is not released because of market supply and demand, and the user can run services on it normally. After that, if the market price at some moment is higher than the user's bid or the resource inventory is insufficient, the preemptive instance is released; otherwise the user can keep using it. The user can also actively release a preemptive instance that is in use.
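The creation and release rules above can be summarized as two predicates. This is a minimal Python sketch, not a cloud provider API: the function names, the hour-based instance age, and the default one-hour protection period (taken from the example in the text) are assumptions for illustration.

```python
def can_create(market_price: float, bid: float) -> bool:
    """Creation succeeds when the current market price is below the user's bid."""
    return market_price < bid


def is_released(market_price: float, bid: float, age_hours: float,
                inventory_sufficient: bool, protection_hours: float = 1.0) -> bool:
    """Return True if the instance would be reclaimed at this moment.

    Inside the protection period the instance is never reclaimed; afterwards
    it is reclaimed if the market price exceeds the bid or inventory runs short.
    """
    if age_hours < protection_hours:
        return False
    return market_price > bid or not inventory_sufficient
```

For example, an instance bid at 0.4 survives a market price of 0.5 while it is 0.5 hours old, but is reclaimed at the same price once it is 2 hours old.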
Fig. 1 is a flowchart of a method for acquiring cloud services according to the present application, as shown in fig. 1, including:
Step 100: The cloud management server determines the priority level of a job according to the attribute information of the job.
In an illustrative example, a job submitted by a user is a specific task that runs an application, for example simulation software.
In one illustrative example, the attribute information of a job includes at least:
the expected computation time WallTime, which indicates how long the job needs to run; and
the expected job end time ExpectedFinishTime, which indicates the length of time the user expects between job submission and job completion, for example 2 days or 48 hours.
Determining the priority level of the job in this step includes:
obtaining the priority level of the job according to its expected computation time WallTime, the time it has already run AlreadyRunningTime, its expected job end time ExpectedFinishTime, and the length of the time interval ElapsedTime from job submission to the current time.
In one illustrative example, the priority information (Priority) of a job may be determined as shown in equation (1):
$$\text{Priority} = \big((\text{ExpectedFinishTime} - \text{ElapsedTime}) - (\text{WallTime} - \text{AlreadyRunningTime})\big) \times \text{factor} \tag{1}$$
where factor is a preset global influence factor with a value range of [0, 1] that can be set according to actual requirements; its default value may be 1. The first bracketed term is the time remaining before the user's expected end time, and the second is the computation time the job still needs, so Priority measures the job's slack: the lower the calculated Priority value, the higher the priority level.
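Equation (1) can be evaluated directly. The sketch below assumes all times are expressed in hours; the function name `priority` and the sample values are hypothetical.

```python
def priority(expected_finish_time: float, elapsed_time: float,
             wall_time: float, already_running_time: float,
             factor: float = 1.0) -> float:
    """Equation (1): slack before the expected end time, scaled by the global factor."""
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must lie in [0, 1]")
    time_left = expected_finish_time - elapsed_time   # time remaining before the user's deadline
    work_left = wall_time - already_running_time      # compute time the job still needs
    return (time_left - work_left) * factor


# A job that should finish within 48 h, was submitted 10 h ago,
# needs 20 h of compute in total and has already run for 5 h.
print(priority(48, 10, 20, 5))  # 23.0
```

Here the job has 38 h left before its deadline and 15 h of work remaining, leaving 23 h of slack.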
In an exemplary embodiment, a first threshold and a second threshold may be preset according to the actual application scenario, with the first threshold less than the second threshold, such that: when the calculated Priority is less than the first threshold, the priority level is high; when the calculated Priority is greater than or equal to the second threshold, the priority level is low; and when the calculated Priority is greater than or equal to the first threshold and less than the second threshold, the priority level is medium, that is, between high and low.
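The threshold rule can be written as a small mapping from a Priority value to a level. The `Level` enum and the `classify` function are illustrative names, not part of the original description.

```python
from enum import Enum


class Level(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def classify(priority: float, first_threshold: float, second_threshold: float) -> Level:
    """Map a Priority value to a level; a lower value means a more urgent job."""
    if first_threshold >= second_threshold:
        raise ValueError("first_threshold must be less than second_threshold")
    if priority < first_threshold:
        return Level.HIGH
    if priority >= second_threshold:
        return Level.LOW
    return Level.MEDIUM  # first_threshold <= priority < second_threshold
```

Note that a value equal to the first threshold is medium, while a value equal to the second threshold is already low.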
In an exemplary example, the cloud management server may periodically determine the priority level of the job at a preset period.
In one illustrative example, jobs with the same priority level may be placed in the same queue for subsequent processing. In the embodiments of the present application, the priority level of a job is re-determined in real time, so the job queues can be adjusted in real time and each job can be assigned to suitable instances.
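Because Priority depends on ElapsedTime, re-evaluating it each period moves a waiting job toward higher priority as its deadline approaches. The sketch below assumes hour-based timestamps on a shared clock and reuses the queue names High, Normal, and Low and the thresholds 10 and 20 from the example in this section; the `Job` fields and the `build_queues` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    name: str
    expected_finish_time: float  # hours the user allows from submission to completion
    wall_time: float             # hours of compute the job needs in total
    submitted_at: float          # submission time, in hours on a shared clock
    already_running_time: float = 0.0


def build_queues(jobs: List[Job], now: float, first_threshold: float,
                 second_threshold: float, factor: float = 1.0) -> Dict[str, List[str]]:
    """Recompute Priority for every job and place it in queue High, Normal or Low."""
    queues: Dict[str, List[str]] = {"High": [], "Normal": [], "Low": []}
    for job in jobs:
        elapsed = now - job.submitted_at
        p = ((job.expected_finish_time - elapsed)
             - (job.wall_time - job.already_running_time)) * factor
        if p < first_threshold:
            queues["High"].append(job.name)
        elif p >= second_threshold:
            queues["Low"].append(job.name)
        else:
            queues["Normal"].append(job.name)
    return queues


# A job that has not started yet: Low at 0 h, Normal at 12 h, High at 25 h.
job = Job("render-1", expected_finish_time=48, wall_time=20, submitted_at=0)
for now in (0, 12, 25):
    print(now, build_queues([job], now, first_threshold=10, second_threshold=20))
```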
Step 101: The cloud management server allocates a preemptive instance to a job with a low priority level and runs the job.
In one illustrative example, the attribute information of the job further includes an expected cost; allocating the preemptive instance and running the job in this step may include:
allocating suitable computing resources to the job according to the expected cost in its attribute information, that is, setting a suitable bid so that preemptive instances are scaled out automatically without exceeding the expected cost, and tracking the cost already spent by the job in real time.
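The text does not specify how the bid is derived from the expected cost. One possible reading, sketched here as an assumption, is to cap the hourly bid so that the remaining budget covers the remaining hours even if every hour were billed at the bid. The `CostTracker` class and its methods are hypothetical, and the bound relies on the assumption that the billed hourly price never exceeds the bid.

```python
class CostTracker:
    """Hypothetical per-job cost ledger used to cap preemptive-instance bids."""

    def __init__(self, expected_cost: float) -> None:
        self.expected_cost = expected_cost
        self.spent = 0.0

    def record(self, price_per_hour: float, hours: float, instances: int = 1) -> None:
        """Add the cost of `instances` instances billed at `price_per_hour` for `hours`."""
        self.spent += price_per_hour * hours * instances

    def max_bid(self, remaining_hours: float, instances: int) -> float:
        """Highest hourly bid per instance that keeps the job within its expected cost,
        assuming each remaining hour is billed at no more than the bid."""
        if remaining_hours <= 0 or instances <= 0:
            raise ValueError("remaining_hours and instances must be positive")
        budget_left = max(0.0, self.expected_cost - self.spent)
        return budget_left / (remaining_hours * instances)


tracker = CostTracker(expected_cost=100.0)
tracker.record(price_per_hour=0.5, hours=4, instances=10)  # 20.0 spent so far
print(tracker.max_bid(remaining_hours=8, instances=10))     # 1.0
```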
In one illustrative example, the present step further comprises:
the cloud management server allocates prepaid instances, such as annual or monthly subscription instances, to jobs with a medium priority level.
In one illustrative example, this step further includes:
the cloud management server allocates pay-as-you-go instances to jobs with a high priority level so that they are computed immediately.
For example, assuming that the first threshold is 10 and the second threshold is 20:
if Priority >= 20, the job is placed in queue Low;
if 10 <= Priority < 20, the job is placed in queue Normal;
if Priority < 10, the job is placed in queue High.
In this way, the cloud management server can obtain the job information of each queue. For example, it obtains the information of the jobs queued in queue Low, calculates the number of cores that need to be added, and tries to automatically scale out preemptive instances of various specifications. Similarly, it obtains the information of the jobs queued in queue High, calculates the number of cores that need to be added, and tries to automatically scale out pay-as-you-go instances of various specifications so that those jobs finish as soon as possible. Jobs in queue Normal, by contrast, may wait in the queue until computing resources become free.
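The per-queue scale-out step can be sketched in two parts: computing the missing cores, then trying instance specifications one after another. The greedy order, the `(name, cores per instance, instances in stock)` tuples, and the spec names are assumptions; the same routine could plan preemptive instances for queue Low or pay-as-you-go instances for queue High.

```python
import math
from typing import List, Tuple


def cores_to_expand(queued_job_cores: List[int], idle_cores: int) -> int:
    """Cores still missing after idle capacity is used for the queued jobs."""
    return max(0, sum(queued_job_cores) - idle_cores)


def plan_scale_out(cores_needed: int,
                   specs: List[Tuple[str, int, int]]) -> Tuple[List[Tuple[str, int]], int]:
    """Greedily try specs in order; each spec is (name, cores per instance, instances in stock).

    Returns the chosen (spec, count) pairs and the cores that could not be covered.
    """
    plan: List[Tuple[str, int]] = []
    remaining = cores_needed
    for name, cores_per_instance, in_stock in specs:
        if remaining <= 0:
            break
        count = min(in_stock, math.ceil(remaining / cores_per_instance))
        if count > 0:
            plan.append((name, count))
            remaining -= count * cores_per_instance
    return plan, max(0, remaining)


needed = cores_to_expand([16, 32, 8], idle_cores=8)  # 48
print(plan_scale_out(needed, [("spec-a", 16, 2), ("spec-b", 8, 10)]))
# ([('spec-a', 2), ('spec-b', 2)], 0)
```

If a spec is out of stock, the routine simply moves on to the next one, which matches the idea of trying "various specifications" until the demand is covered.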
Step 102: The cloud management server learns that a running preemptive instance is about to be released, and continuously tries to allocate a preemptive instance to the job until the job finishes running.
In one illustrative example, continuously attempting to allocate preemptive instances to the job in this step may include:
if the application software used by the user in the job supports resuming from a breakpoint, resetting the state of the job to queued (queue); the application software automatically saves a checkpoint file that holds the running state information of the job, so when new computing resources become available the computation continues from the previous progress;
if the application software used by the user in the job does not support resuming from a breakpoint, hibernating the current preemptive instance before it is released (so that no other job can be scheduled onto it); resetting the state of the job to queued (queue) and recording the hibernation file information while waiting for new computing resources; and, once the computing resources have been scaled out, continuing the computation from the hibernation file. The hibernation file information includes, for example, all in-memory data of the operating system and of the user's running job.
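The two release-handling branches can be sketched as a single handler. This is an illustration under assumptions: `hibernate` stands in for whatever hibernation mechanism the platform provides and returns a hypothetical file identifier, and the `checkpoint:` and `hibernation:` strings are placeholder labels, not real paths.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Job:
    name: str
    supports_checkpoint: bool  # can the application resume from a breakpoint?
    state: str = "running"
    resume_from: Optional[str] = None


def on_release_notice(job: Job, instance_id: str,
                      hibernate: Callable[[str], str]) -> Job:
    """React to a notice that the preemptive instance running `job` will be released."""
    if job.supports_checkpoint:
        # The application saves its own checkpoint file; resume from it later.
        job.resume_from = f"checkpoint:{job.name}"
    else:
        # Hibernate before reclamation; the instance then accepts no other job.
        job.resume_from = f"hibernation:{hibernate(instance_id)}"
    job.state = "queued"  # wait for new computing resources
    return job
```

In both branches the job goes back to the queue; only the source it resumes from differs.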
Fig. 2 is a schematic view of an application scenario of the method for acquiring cloud services according to the present application. As shown in Fig. 2, the present application divides jobs into priority levels and allocates preemptive instances to low-priority jobs, ensuring that low-priority jobs are computed on preemptive instances continuously or intermittently; this makes full use of low-cost preemptive instances and reduces job running costs.
In an exemplary embodiment, while continuously attempting to allocate preemptive instances to run a low-priority job in the method shown in Fig. 1, the present application further includes:
determining whether the current time is close to the expected job end time in the attribute information of the job; if so, and preemptive instances still cannot be scaled out or cannot be retained for a sufficient duration, the cloud management server automatically scales out pay-as-you-go instances to run the job.
This ensures that the job finishes computing on time while keeping its running cost as low as possible.
In an exemplary embodiment, determining whether the current time is close to the expected job end time ExpectedFinishTime in the attribute information of the job may include:
presetting a time interval threshold and calculating the absolute value of the difference between the current time and the expected job end time ExpectedFinishTime; if the absolute value is less than or equal to the time interval threshold, it is determined that the current time is close to ExpectedFinishTime.
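The near-deadline check and the pay-as-you-go fallback might look like the following. Because ExpectedFinishTime is defined as a duration measured from submission, this sketch assumes the expected end time is the submission time plus ExpectedFinishTime, all in hours; `preemptive_ok` is a hypothetical flag meaning preemptive instances can still be scaled out or retained long enough.

```python
def near_deadline(now: float, submitted_at: float,
                  expected_finish_time: float, threshold: float) -> bool:
    """True if |now - expected end time| <= threshold (all values in hours)."""
    expected_end = submitted_at + expected_finish_time
    return abs(now - expected_end) <= threshold


def instance_kind_for_low_priority_job(now: float, submitted_at: float,
                                       expected_finish_time: float, threshold: float,
                                       preemptive_ok: bool) -> str:
    """Stay on preemptive instances unless the deadline is near and they cannot
    be scaled out or retained long enough."""
    if near_deadline(now, submitted_at, expected_finish_time, threshold) and not preemptive_ok:
        return "pay-as-you-go"
    return "preemptive"
```

Far from the deadline the job keeps waiting for cheap preemptive capacity; only close to it does the more expensive fallback kick in.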
In one illustrative example, the attribute information of the job further includes an expected maximum running cost, which is the upper limit the user is willing to pay for computing the job; the method for acquiring cloud services further includes:
scaling out preemptive-instance computing resources whose cost stays below the expected maximum running cost, according to the cost tracked in step 101 and the expected maximum running cost of the job. This ensures that the job is computed within a controllable cost.
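One way to keep scale-out below the expected maximum running cost is to project the cost of each additional instance over a planning horizon. The horizon `hours`, the fixed hourly price, and the function name are assumptions for illustration; the text does not say how the projection is made.

```python
import math


def max_additional_instances(spent: float, max_cost: float,
                             price_per_hour: float, hours: float) -> int:
    """Largest number of extra preemptive instances whose projected cost keeps
    the job's total strictly below max_cost."""
    if price_per_hour <= 0 or hours <= 0:
        raise ValueError("price_per_hour and hours must be positive")
    budget = max_cost - spent
    if budget <= 0:
        return 0
    per_instance = price_per_hour * hours
    n = math.floor(budget / per_instance)
    if n * per_instance >= budget:  # landing exactly on the cap is not "below" it
        n -= 1
    return max(0, n)


print(max_additional_instances(spent=60, max_cost=100, price_per_hour=0.5, hours=8))  # 9
```

With 40 left in the budget and 4 per instance, ten instances would hit the cap exactly, so the sketch stops at nine.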
In one embodiment of the present application, the expected computation time, the expected computation cost, and similar parameters are used as factors in acquiring the cloud service, which provides more acquisition options than the related art. Moreover, low-cost preemptive instances on the cloud are fully utilized, and the computation cost is minimized while the job is still guaranteed to finish on time, so the job running cost is kept under control.
The present application also provides a computer-readable storage medium storing computer-executable instructions for performing any one of the above methods for acquiring cloud services.
The present application further provides an apparatus for acquiring cloud services, including a memory and a processor, where the memory stores instructions executable by the processor for performing the steps of any one of the above methods for acquiring cloud services.
Fig. 3 is a schematic structural diagram of a device for acquiring cloud services according to the present application. As shown in Fig. 3, the device includes at least a first processing module, a resource allocation module, and a second processing module, where:
the first processing module is configured to determine the priority level of a job according to the attribute information of the job;
the resource allocation module is configured to allocate a preemptive instance to a job with a low priority level and run the job, and, upon receiving a notification from the second processing module, to continuously attempt to allocate a preemptive instance to the job until the job finishes running; and
the second processing module is configured to learn that a running preemptive instance is about to be released and to notify the resource allocation module.
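The cooperation of the three modules can be sketched with a callback from the second processing module to the resource allocation module. The class names follow the text, but every method and the level-to-instance mapping are hypothetical illustrations.

```python
from typing import Callable, Dict, List


class FirstProcessingModule:
    """Maps a Priority value to a level using the two preset thresholds."""

    def __init__(self, first_threshold: float, second_threshold: float) -> None:
        self.first, self.second = first_threshold, second_threshold

    def level(self, priority: float) -> str:
        if priority < self.first:
            return "high"
        return "low" if priority >= self.second else "medium"


class ResourceAllocationModule:
    """Chooses an instance kind per level and keeps retrying released jobs."""

    KIND = {"low": "preemptive", "medium": "prepaid", "high": "pay-as-you-go"}

    def __init__(self) -> None:
        self.assignments: Dict[str, str] = {}
        self.retry: List[str] = []  # jobs waiting for a new preemptive instance

    def allocate(self, job: str, level: str) -> str:
        self.assignments[job] = self.KIND[level]
        return self.assignments[job]

    def on_release_notice(self, job: str) -> None:
        self.retry.append(job)


class SecondProcessingModule:
    """Watches running preemptive instances and forwards release notices."""

    def __init__(self, notify: Callable[[str], None]) -> None:
        self.notify = notify

    def instance_will_be_released(self, job: str) -> None:
        self.notify(job)


first = FirstProcessingModule(10, 20)
alloc = ResourceAllocationModule()
watcher = SecondProcessingModule(alloc.on_release_notice)
print(alloc.allocate("job-1", first.level(23)))  # preemptive
watcher.instance_will_be_released("job-1")
print(alloc.retry)                                # ['job-1']
```

Passing a bound method as the callback keeps the second processing module unaware of how the allocator retries.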
In one illustrative example, the attribute information of a job includes at least the expected computation time WallTime and the expected job end time ExpectedFinishTime.
In an exemplary embodiment, the first processing module is specifically configured to:
calculate the priority information Priority of the job according to the expected computation time WallTime, the expected job end time ExpectedFinishTime, the time the job has already run AlreadyRunningTime, and the length of the time interval from job submission to the current time; and
determine that the priority level is high when the calculated Priority is less than a preset first threshold, low when it is greater than or equal to a preset second threshold, and medium (between high and low) when it is greater than or equal to the first threshold and less than the second threshold, where the first threshold is less than the second threshold.
In an exemplary embodiment, the first processing module determines the priority level of the job at regular intervals according to a preset period.
In one illustrative example, the resource allocation module is further configured to:
allocate prepaid instances, such as annual or monthly subscription instances, to jobs with a medium priority level; and
allocate pay-as-you-go instances to jobs with a high priority level so that they are computed immediately.
In an exemplary embodiment, when the resource allocation module receives the notification from the second processing module and continuously attempts to allocate a preemptive instance to the job until the job finishes running, it is specifically configured to:
upon learning that the running preemptive instance is about to be released: if the application software used by the user in the job supports resuming from a breakpoint, reset the state of the job to queued (queue), where the application software automatically saves a checkpoint file holding the running state information of the job, and wait for new computing resources to continue the computation from the previous progress; if the application software does not support resuming from a breakpoint, hibernate the current preemptive instance before it is released (so that no other job can be scheduled onto it), reset the state of the job to queued (queue), record the hibernation file information while waiting for new computing resources, and, once the computing resources have been scaled out, continue the computation from the hibernation file.
By dividing jobs into priority levels and allocating preemptive instances to low-priority jobs, the method and device ensure that low-priority jobs are computed on preemptive instances, continuously or intermittently, making full use of low-cost preemptive instances and reducing the cost of running jobs.
In one illustrative example, the resource allocation module is further configured to:
determine whether the current time is approaching the expected job end time ExpectedFinishTime in the job's attribute information; if so, scale out or retain preemptive instances that can run for a sufficient duration, or otherwise automatically scale out pay-as-you-go instances and run the job on them.
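A minimal sketch of the deadline check. The text does not define "approaching", so this sketch assumes it means the slack (time left until ExpectedFinishTime minus the remaining compute time) has fallen to a chosen `margin`; `preemptive_hold`, the time a preemptive instance can be relied on, is likewise a hypothetical parameter.

```python
from datetime import datetime, timedelta


def choose_instance(now: datetime, expected_finish: datetime, remaining_work: timedelta,
                    margin: timedelta, preemptive_hold: timedelta) -> str:
    """Decide which instance type to run the job on as its deadline approaches.

    - Enough slack left: keep using preemptive instances.
    - Close to the deadline: use a preemptive instance only if it can be held
      long enough to finish the remaining work; otherwise scale out a
      pay-as-you-go instance.
    """
    slack = expected_finish - now - remaining_work
    if slack > margin:
        return "preemptive"
    if preemptive_hold >= remaining_work:
        return "preemptive"
    return "pay_as_you_go"


# First-embodiment numbers: WallTime 10 hours, ExpectedFinishTime two days after
# an arbitrary submission time, so the slack is 48 - 10 = 38 hours.
submitted = datetime(2024, 1, 1)
print(choose_instance(submitted, submitted + timedelta(days=2), timedelta(hours=10),
                      margin=timedelta(hours=4), preemptive_hold=timedelta(hours=1)))  # preemptive
```

As the slack shrinks below the margin, the same call switches to `"pay_as_you_go"` unless a preemptive instance can cover the remaining work.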
This ensures that the job's computation is completed on time while minimizing its running cost.
In one illustrative example, allocating a preemptive instance and running the job in the resource allocation module includes: allocating computing resources to the job according to the expected cost in the job's attribute information, and tracking the cost already spent by the job in real time;
the attribute information of the job further includes an expected maximum running cost, which is the upper limit of what the user is willing to spend on computing the job; the resource allocation module is further configured to:
scale out preemptive-instance computing resources according to the tracked cost of the job and the expected maximum running cost, keeping the job's cost below the expected maximum running cost.
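The cost cap can be sketched as a simple budget calculation. For illustration only, this assumes a fixed hourly price per preemptive instance and a planned runtime; real preemptive prices may vary over time, and the function name is hypothetical.

```python
import math


def max_additional_instances(spent: float, expected_max_cost: float,
                             hourly_price: float, hours: float) -> int:
    """How many more preemptive instances can run for `hours` without the job's
    total cost exceeding the expected maximum running cost."""
    if hourly_price <= 0 or hours <= 0:
        raise ValueError("hourly_price and hours must be positive")
    budget_left = expected_max_cost - spent
    if budget_left <= 0:
        return 0  # the budget is already used up: do not scale out
    cost_per_instance = hourly_price * hours
    return math.floor(budget_left / cost_per_instance)


# 40 already spent out of a 100 cap; each instance costs 0.5 per hour for 10 hours.
print(max_additional_instances(40.0, 100.0, 0.5, 10))  # 12
```

Re-running the calculation whenever the tracked cost is updated keeps scale-out decisions within the cap.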
This ensures that the job is computed within a controllable cost.
The present application also provides a cloud management server, comprising any one of the above devices for acquiring cloud services.
The technical solution of the present application is described in detail below with reference to specific embodiments.
First embodiment. Fig. 4 is a flowchart of a first embodiment of obtaining cloud services according to the present application; it describes a process in which a user submits a job that is computed using a preemptive instance. In this embodiment, the functions of the cloud management server are assumed to be performed by a queue service, a job scheduling service, and a resource scheduling service. As shown in Fig. 4, the method includes:
Step 400: The queue service receives a job submitted by a user.
In the first embodiment, it is assumed that the attribute information of the submitted job includes an expected computation time WallTime of 10 hours and an expected job end time ExpectedFinishTime of two days; that is, the job is expected to finish computing within two days.
Steps 401 to 402: The job scheduling service calculates the priority level of the job. Assuming the priority level calculated according to formula (1) is low, the job is placed in queue Low.
Steps 403 to 405: The resource scheduling service queries the job queue information, automatically scales out a preemptive instance for the job in queue Low through a cloud service such as ECS, and adds the node to the cluster to run the job.
Steps 406 to 408: While the job is running, the resource scheduling service receives a notification from the ECS service that the preemptive instance is about to be automatically released. In this embodiment, the resource scheduling service sets the current preemptive instance to hibernate, resets the job's state to queued (queue), places the job back in the corresponding queue, and saves a hibernation file or checkpoint file.
Step 409: The resource scheduling service keeps trying to automatically scale out a preemptive instance. Once a preemptive instance is allocated, the node is added to the cluster, the job continues running from the hibernation file, and its computation is completed.
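The "keep trying" behavior in step 409 amounts to a retry loop. A minimal sketch with exponential backoff, assuming a hypothetical `request` callable that wraps the cloud service's scale-out call (for example, creating an ECS preemptive instance) and returns an instance ID or None. The backoff and the attempt limit are choices made for the sketch, not part of the described method.

```python
import time
from typing import Callable, Optional


def acquire_preemptive_instance(request: Callable[[], Optional[str]],
                                max_attempts: int = 100,
                                base_delay: float = 1.0,
                                max_delay: float = 60.0,
                                sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Keep asking for a preemptive instance until one is granted.

    `request` returns an instance ID on success or None when no capacity is
    available. Delays double after each failure, capped at `max_delay`.
    Returns None if every attempt fails.
    """
    delay = base_delay
    for _ in range(max_attempts):
        instance_id = request()
        if instance_id is not None:
            return instance_id
        sleep(delay)
        delay = min(delay * 2, max_delay)
    return None
```

Passing `sleep` as a parameter lets the loop be exercised without real waiting; once an ID is returned, the node can be added to the cluster and the job resumed from its hibernation or checkpoint file.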
For low-priority jobs, this embodiment ensures that the job is computed on preemptive instances, continuously or intermittently, making full use of low-cost preemptive instances and reducing the job's running cost.
Second embodiment. Fig. 5 is a flowchart of a second embodiment of obtaining cloud services according to the present application; it describes a process in which a user submits a job whose priority changes while it is being computed and while it is waiting. In this embodiment, the functions of the cloud management server are assumed to be performed by a queue service, a job scheduling service, and a resource scheduling service. As shown in Fig. 5, the method includes:
Step 500: The queue service receives a job submitted by a user.
In the second embodiment, it is assumed that the attribute information of the submitted job includes an expected computation time WallTime of 5 hours and an expected job end time ExpectedFinishTime of one day; that is, the job is expected to finish computing within one day.
Steps 501 to 502: The job scheduling service calculates the priority level of the job. Assuming the priority level calculated according to formula (1) is low, the job is placed in queue Low.
Steps 503 to 505: The resource scheduling service queries the job queue information, automatically scales out a preemptive instance for the job in queue Low through a cloud service such as ECS, and adds the node to the cluster to run the job.
Steps 506 to 508: While the job is running, the resource scheduling service receives a notification from the ECS service that the preemptive instance is about to be automatically released. In this embodiment, the resource scheduling service sets the current preemptive instance to hibernate, resets the job's state to queued (queue), and saves a hibernation file or checkpoint file.
Steps 509 to 510: The job scheduling service periodically recalculates the job's priority level on a preset timer. Assuming the priority level calculated according to formula (1) is now medium, the job is placed in queue Normal. However, queue Normal has no free resources, so the job remains in the waiting state.
Steps 511 to 512: The job scheduling service again recalculates the job's priority level on the preset timer. Assuming the priority level calculated according to formula (1) is now high, the job is placed in queue High.
Steps 513 to 515: The resource scheduling service queries the job queue information, automatically scales out pay-as-you-go instances for the job in queue High through a cloud service such as ECS, and adds the nodes to the cluster to run the job.
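The re-queuing in steps 509 to 515 can be sketched with a small in-memory queue service. This is an illustrative model, not the patent's implementation: the class, the mapping from levels to the queue names Low, Normal, and High, and the `free_slots` counters (with the capacity of queue High standing in for pay-as-you-go scale-out) are assumptions.

```python
from collections import deque
from typing import Dict, Optional

QUEUE_FOR_LEVEL = {"low": "Low", "medium": "Normal", "high": "High"}


class QueueService:
    def __init__(self, free_slots: Dict[str, int]):
        self.queues = {name: deque() for name in QUEUE_FOR_LEVEL.values()}
        self.free_slots = dict(free_slots)  # free resources per queue
        self.location: Dict[str, str] = {}  # job_id -> queue it currently waits in

    def requeue(self, job_id: str, level: str) -> str:
        """Move a waiting job to the queue matching its recalculated level."""
        target = QUEUE_FOR_LEVEL[level]
        current = self.location.get(job_id)
        if current != target:
            if current is not None:
                self.queues[current].remove(job_id)
            self.queues[target].append(job_id)
            self.location[job_id] = target
        return target

    def dispatch(self, queue_name: str) -> Optional[str]:
        """Start the next job in a queue if that queue has free resources."""
        queue = self.queues[queue_name]
        if not queue or self.free_slots.get(queue_name, 0) <= 0:
            return None  # nothing to run, or the job keeps waiting
        self.free_slots[queue_name] -= 1
        job_id = queue.popleft()
        del self.location[job_id]
        return job_id


# Second embodiment: queue Normal has no free resources, queue High can scale out.
svc = QueueService({"Low": 1, "Normal": 0, "High": 1})
svc.requeue("job", "medium")
print(svc.dispatch("Normal"))  # None
svc.requeue("job", "high")
print(svc.dispatch("High"))    # job
```

Each timer tick only needs to call `requeue` with the newly calculated level; dispatching stays independent of how the level was computed.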
This embodiment ensures that the job is completed on time while minimizing its running cost.
Although the embodiments disclosed in the present application are described above, the descriptions are only for the convenience of understanding the present application, and are not intended to limit the present application. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims.

Claims (19)


Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201911344792.3A (CN113034166B) | 2019-12-24 | 2019-12-24 | Method and device for acquiring cloud service and cloud management server

Publications (2)

Publication Number | Publication Date
CN113034166A | 2021-06-25
CN113034166B | 2024-09-06

Family

ID=76451484

Country Status (1)

Country | Link
CN (1) | CN113034166B

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
