BACKGROUND The present invention relates generally to data processing systems, and relates more particularly to the management of hardware and software components of data processing systems. Specifically, the present invention provides a method and apparatus for automatic allocation of computing resources amongst multiple entities that obtain value by utilizing the resources to perform computation.
The problem of how to optimally allocate a limited set of resources amongst multiple entities that use or consume the resources has been extensively studied in disciplines including economics, manufacturing, telecommunications networks, and computing systems. Within the latter domain, the recent evolution of highly interconnected, rapidly changing, distributed computing systems such as the Internet has made it increasingly important to be able to rapidly compute and execute resource allocation decisions in an automated fashion.
Traditional approaches to provisioning and capacity planning typically aim to achieve an extremal value of some overall system performance metric (e.g., maximum average throughput or minimum average response time). Other conventional techniques employ market-based mechanisms for resource allocation (e.g., auction bidding or bilateral negotiation mechanisms). Another commonly used approach has been to anticipate the maximum possible load on the system, and then perform a one-time static allocation of resources capable of handling the maximum load within a specified margin of safety. A common problem with such approaches is that, with modern workloads such as hit rates on Web pages, the demand rate may vary dynamically and rapidly over many orders of magnitude, and a system that is statically provisioned for its peak workload may spend nearly all of its time sitting idle.
Thus, there is a need in the art for a method and apparatus for dynamic resource allocation in distributed computing systems.
SUMMARY OF THE INVENTION In one embodiment, the present invention is a method for optimal and automatic allocation of finite resources (e.g., hardware or software that can be used within any overall process that performs computation) amongst multiple entities that can provide computational services given the resource(s). One embodiment of the inventive method involves establishing, for each entity, a service-level utility indicative of how much business value is obtained for a given level of computational system performance and for a given level of demand for computing service. Each entity is capable of transforming its respective service-level utility into a corresponding resource-level utility indicative of how much business value may be obtained for a given set or amount of resources allocated to the entity. The resource-level utilities for each entity are aggregated, and resource allocations are subsequently determined and executed based upon the dynamic resource-level utility information established. The invention is thereby capable of making rapid allocation decisions according to the time-varying need for, or value of, the resources to each of the entities. In addition, the inventive method is motivated by the perspective of an enterprise comprising multiple entities that use said finite computational resources to provide service to one or more customers, and is thus structured to optimize the business value of the enterprise.
BRIEF DESCRIPTION OF THE DRAWINGS So that the manner in which the above recited embodiments of the invention are attained and can be understood in detail, a more particular description of the invention, briefly summarized above, may be obtained by reference to the embodiments thereof which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
FIG. 1 is a diagram of a networked data processing system in which the present invention may be implemented;
FIG. 2 is an overall view of a resource allocation system in accordance with one embodiment of the present invention;
FIG. 3 is a flow chart illustrating one embodiment of a method for dynamically allocating resources among multiple application environments;
FIG. 4 is a diagram illustrating the detailed functionality of an application environment module which constitutes a component of the overall system shown in FIG. 2; and
FIG. 5 is a high level block diagram of the present invention implemented using a general purpose computing device.
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.
DETAILED DESCRIPTION In one embodiment, the present invention is a method for optimal and automatic allocation of finite resources amongst multiple entities that can perform computational work given the resource(s). For the purposes of the present invention, the term “resource” may indicate an entire hardware or software component (e.g., a compute server, a storage device, a RAM circuit or a database server) or a portion of a component (e.g., bandwidth access or a fraction of a server). The method may be implemented, for example, within a data processing system such as a network, a server, or a client computer. The invention is capable of making allocation decisions in real time, according to the time-varying need for, or value of, the resources to each of the entities, thereby resolving the shortcomings associated with typical static resource allocation techniques. In addition, the method is structured to optimize the business value of an enterprise that provides computing services to multiple entities using said finite computational resources.
FIG. 1 is a schematic illustration of one embodiment of a network data processing system 100 comprising a network of computers (e.g., clients) in which the present invention may be implemented. The network data processing system 100 includes a network 102, a server 104, a storage unit 106 and a plurality of clients 108, 110 and 112. The network 102 is the medium used to provide communications links between the server 104, the storage unit 106 and the clients 108, 110, 112 connected together within the network data processing system 100. The network 102 may include connections such as wire, wireless communication links, or fiber optic cables.
In the embodiment illustrated, the server 104 provides data, such as boot files, operating system images, and applications, to the clients 108, 110, 112 (i.e., the clients 108, 110, and 112 are clients to the server 104). The clients 108, 110, and 112 may be, for example, personal computers or network computers. Although the network data processing system 100 depicted in FIG. 1 comprises a single server 104 and three clients 108, 110, 112, those skilled in the art will recognize that the network data processing system 100 may include additional servers, clients, and other devices not shown in FIG. 1.
In one embodiment, the network data processing system 100 is the Internet, with the network 102 representing a worldwide collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols to communicate with one another. In further embodiments, the network data processing system 100 is implemented as an intranet, a local area network (LAN), or a wide area network (WAN). Furthermore, although FIG. 1 illustrates a network data processing system 100 in which the method of the present invention may be implemented, those skilled in the art will realize that the present invention may be implemented in a variety of other data processing systems, including servers (e.g., the server 104) and client computers (e.g., the clients 108, 110, 112). Thus, FIG. 1 is intended as an example, and not as an architectural limitation, for the present invention.
FIG. 2 is a schematic illustration of one embodiment of a data center 200 for executing the method of the present invention. The data center 200 comprises a plurality of application environment modules 201, 202, and 203, one or more resource arbiters 204 and a plurality of resources 205, 206, 207, 208 and 209. Each application environment module 201-203 is responsible for handling respective demands 213, 214 and 215 (e.g., requests for information processing services) that may arrive from a particular customer or set of clients (e.g., the clients 108-112 in FIG. 1). Example client types include online shopping services, online trading services, and online auction services.
In order to process the client demands 213, 214 or 215, the application environments 201-203 may utilize the resources 205-209 within the data center 200. As each application environment 201-203 is independent from the others and provides different services, each application environment 201-203 has its own set of resources 205-209 at its disposal, the use of which must be optimized to maintain the appropriate quality of service (QoS) level for the application environment's clients. An arrow from an application environment 201-203 to a resource 205-209 denotes that the resource 205-209 is currently in use by the application environment 201-203 (e.g., in FIG. 2, the resource 205 is currently in use by the application environment 201). An application environment 201-203 also makes use of data or software objects, such as respective Service Level Agreements (SLAs) 210, 211 and 212 with its clients, in order to determine its service-level utility function U(S, D). An example SLA 210-212 may specify payments to be made by the client based on mean end-to-end response time averaged over, say, a five-minute time interval. Additionally, the client workload may be divided into a number of service classes (e.g., Gold, Silver and Bronze), and the SLA 210-212 may specify payments based on details of response time characteristics within each service class.
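As a concrete illustration, a response-time-based SLA payment schedule of the kind described above might be sketched as follows. The targets, rates, and function name here are hypothetical, chosen only to make the structure of such a schedule explicit; they are not part of the specification:

```python
def sla_payment(mean_response_time, service_class):
    """Payment owed by the client for one measurement interval (e.g., five
    minutes), given the mean end-to-end response time in seconds. The
    schedule is illustrative: full payment when the class target is met,
    and a proportional penalty when it is missed."""
    targets = {"Gold": 0.5, "Silver": 1.0, "Bronze": 2.0}    # seconds
    rates = {"Gold": 100.0, "Silver": 50.0, "Bronze": 20.0}  # payment units
    target, rate = targets[service_class], rates[service_class]
    if mean_response_time <= target:
        return rate
    # Payment decays linearly to zero at twice the target response time.
    return max(0.0, rate * (2.0 * target - mean_response_time) / target)
```

A schedule of this shape makes the service-level utility U(S, D) a piecewise-linear function of the measured performance within each service class.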
Each application environment 201-203 is in further communication with the resource arbiter module 204. Although the data center 200 illustrated in FIG. 2 utilizes only one resource arbiter 204, those skilled in the art will appreciate that multiple resource arbiters may be implemented in the data center 200. The resource arbiter 204 is responsible for deciding, at any given time while the data center 200 is in operation, which resources 205-209 may be used by which application environments 201-203. In one embodiment, the application environments 201-203 and the resource arbiter 204 are software modules consisting of autonomic elements (e.g., software components that couple conventional computing functionality with additional self-management capabilities), for example written in Java™, and communication between the modules 201-203 and 204 takes place using standard Java interfaces. The modules 201-203 and 204 may run on a single computer or on different computers connected by a network such as the Internet or a local area network (LAN), e.g., as depicted in FIG. 1. In the networked case, communication may additionally employ standard network communication protocols such as TCP/IP and HTTP, and standard Web interfaces such as OGSA.
FIG. 3 is a flow chart illustrating the method 300 by which the resource arbiter 204 makes resource allocation decisions. Referring simultaneously to FIGS. 2 and 3, the method 300 is initialized at block 302 and proceeds to block 304, where the method 300 establishes a service-level utility function U(S, D) for each application environment 201-203. In one embodiment, the variable S is a vector that characterizes multiple performance measures for multiple service classes, and the variable D is a vector that characterizes the demand. The service-level utility indicates how much business value U is obtained by the application environment 201, 202 or 203 for various levels S of computational system performance, and for a given level D of demand 213-215 for computing service.
In one embodiment, the service-level utility function U(S, D) is established by the application environment's SLA 210-212. While each application environment's service-level utility may be based on different performance metrics, all of the service-level utility functions U(S, D) share a common scale of valuation.
In block 306, the method 300 transforms the service-level utility function U(S, D) into a resource-level utility function V(R) for each application environment 201-203. The resource-level utility indicates how much business value V is obtained for a given actual or hypothetical set or amount of resources R (e.g., selected from the resources 205-209) allocated to the application environment 201-203. In one embodiment, R is a vector. For example, the utility information may express a utility curve V(m), the utility obtained from being able to use m compute servers, at various values of m ranging from zero to the total number of compute servers within the data center. Additionally, if the servers are of different types, the utility information may express the value of obtaining m servers of type A, n servers of type B, etc. More generally, the utility information may express V({x}), the value of assigning a particular collection or set {x} of resources 205-209, for various sets {x} ranging over the power set of possible resources 205-209 that could be assigned to the application environment 201-203. The utility information may be expressed, for example, in a parameterized functional form, or it may be expressed in terms of values at a set of discrete points, which may represent a subset or complete set of all possible resource levels that could be provided.
The transformation may additionally depend on a set of variables describing the application environment's current state (e.g., current demand213-215, system load, throughput or average response time), or on differences between a hypothetical resource allocation R and the application environment's current resource allocation R* (e.g., in a manner that reflects any costs associated with switching the allocation from R* to R, including delays, machine downtime, etc.). In one embodiment, the resource-level utility function is calculated according to the relation
V_i(R_i) = U_i(S_i, D_i, R_i)   (EQN. 1)
such that S_i ∈ S_i(R_i, D_i), where S_i(R_i, D_i) is a relation specifying the set of service levels attainable with resources R_i and demand D_i. In one embodiment, the relation S_i(R_i, D_i) is obtained by standard computer systems modeling techniques (e.g., queuing theory). In another embodiment, the relation S_i(R_i, D_i) may instead or additionally be refined by training on a collection of observed system performance data {(S_t, R_t, D_t)} using standard machine learning procedures (e.g., supervised learning methods employing standard linear or nonlinear function approximators).
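For instance, when the relation S_i(R_i, D_i) is obtained from queuing theory, the transformation of EQN. 1 can be sketched as below. This is a minimal sketch assuming undifferentiated compute servers modeled as an M/M/m queue; the function names and parameters are illustrative, not prescribed by the method:

```python
import math

def mmm_response_time(arrival_rate, service_rate, m):
    """Mean response time of an M/M/m queue via the Erlang C formula --
    one standard systems-modeling choice for the relation S(R, D)."""
    rho = arrival_rate / (m * service_rate)
    if rho >= 1.0:
        return float("inf")  # unstable: demand exceeds total capacity
    a = arrival_rate / service_rate  # offered load
    tail = a**m / (math.factorial(m) * (1.0 - rho))
    erlang_c = tail / (sum(a**k / math.factorial(k) for k in range(m)) + tail)
    mean_wait = erlang_c / (m * service_rate - arrival_rate)
    return mean_wait + 1.0 / service_rate  # queueing delay plus service time

def resource_level_utility(U, demand, m, service_rate):
    """V(m) = U(S(m, D), D): evaluate the service-level utility at the
    service level attainable with m servers under demand D (EQN. 1)."""
    return U(mmm_response_time(demand, service_rate, m), demand)
```

With, say, U(S, D) = max(0, 100 - 50 S), adding servers raises V(m) as the predicted mean response time falls, yielding the utility curve the arbiter consumes.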
In one embodiment, the resource-level utility function V(R) estimates the current value of the current state. In another embodiment, the resource-level utility function estimates the expected cumulative discounted or undiscounted future value starting from the current state. In one embodiment, any one or more of a number of standard methodologies may be employed in the process of estimating expected future value, including prediction and forecasting methodologies such as time-series prediction methods and machine learning methodologies such as reinforcement learning algorithms (e.g., Q-Learning, Temporal Difference Learning, R-Learning or SARSA).
In block 308, the method 300 communicates the respective resource-level utility functions for each application environment 201-203 to the resource arbiter 204 and aggregates all resource-level utility functions. In one embodiment, while the data center 200 is running, each application environment 201-203 communicates, from time to time, information regarding its current resource-level utility function to the resource arbiter 204. Said communication may take place either synchronously or asynchronously, and may be initiated by the application environments 201-203, or may be in response to a prompt or query issued by the resource arbiter 204.
In block 310, the method 300, having received the resource-level utility information from each application environment 201-203, combines said utility information and thereupon decides how to assign each available resource 205-209 in the data center 200, in a manner that optimizes the total utility obtained. In other words, the resource arbiter 204 maximizes the sum of the resource-level utilities, Σ_i V_i(R_i).
Said resource assignment may include the possibility of a null assignment (i.e., the resource 205-209 is not assigned to any application environment 201-203), so that the resource 205-209 may be kept in reserve to handle future workload. For example, in the case of undifferentiated compute servers within the data center 200, the resource arbiter 204 may utilize the most recent utility curves from each application environment 201-203 (V_1(m), V_2(m) and V_3(m), respectively), and then compute an integral number of servers (m_1, m_2, m_3) to assign to each application environment 201-203 so as to maximize the total V_1(m_1) + V_2(m_2) + V_3(m_3). The determination of an allocation that optimizes total utility will generally be made by executing an optimization method. In one embodiment, the values (m_1, m_2, m_3) are found by using standard linear or nonlinear optimization algorithms such as hill climbing, simulated annealing, linear programming, or mixed-integer programming. Additionally, the objective function optimized by the resource arbiter 204 may also include any switching costs that are incurred when a particular resource 205-209 is reallocated from one application environment 201-203 to another. Said switching costs may include, for example, machine downtime and/or other costs related to installing or removing data or software from the machine when it is reallocated.
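For the undifferentiated-server case just described, the arbiter's optimization can be sketched as a simple dynamic program over integral server counts. This is one illustrative implementation; the specification leaves the choice of optimization algorithm open:

```python
def allocate_servers(utility_curves, total_servers):
    """Choose integral server counts (m_1, ..., m_n), summing to at most
    total_servers, that maximize V_1(m_1) + ... + V_n(m_n). Returns
    (total_utility, allocation). Runs in O(n * total_servers**2)."""
    # best[m] = best (utility, allocation) using exactly m servers so far.
    best = [(0.0, ())] * (total_servers + 1)
    for v in utility_curves:
        best = [
            max(((best[m - k][0] + v(k), best[m - k][1] + (k,))
                 for k in range(m + 1)), key=lambda t: t[0])
            for m in range(total_servers + 1)
        ]
    # Null assignment is allowed: unassigned servers stay in reserve.
    return max(best, key=lambda t: t[0])
```

For example, with a linear curve V_1(m) = 10m and a concave V_2, three servers split so that each server goes where its marginal utility is highest.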
In block 312, the method 300 executes the resource allocation decision calculated in block 310, and communicates the resource allocation decision to the application environments 201-203. In one embodiment, block 312 additionally involves causing manipulations or operations to be performed upon the resources 205-209, either enabling the resources 205-209 to be used by the application environments 201-203 to which the resources 205-209 have been assigned, or de-allocating a resource 205-209 from an application environment 201-203 to which the resource 205-209 is no longer assigned.
FIG. 4 is a schematic illustration of the basic operations and functionality of one embodiment of an application environment module 401 according to the present invention, wherein the application environment module 401 is any of the application environments 201-203 depicted in FIG. 2. In one embodiment, the application environment module 401 comprises an autonomic manager element 402, a workload router 403, and a system performance monitoring element 404. Interactions of the application environment 401 with its SLA 410, its client demand 411, its currently allocated resources (e.g., compute servers 420, 421, and 422), and with the resource arbiter element 412, are depicted as they were in FIG. 2.
While the application environment 401 is in operation, client demand 411 is received from time to time and transmitted to the router 403, which thereupon sends said demand 411 to one of the assigned compute servers 420, 421, or 422, typically based on the use of a routing or load-balancing method. As client jobs are processed, their intermediate and final output are returned to the submitting client. From time to time, the performance monitor 404 may observe, request or receive information regarding measures or statistics of the system performance of the compute servers 420-422, such as CPU/memory usage, average throughput, average response time, and average queue depth. The autonomic manager 402 combines said performance measures with information regarding the demand 411, the SLA 410, and the currently allocated resources 420-422 to produce an estimated resource-level utility function.
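The router's dispatch decision can be as simple as sending each arriving job to the least-loaded assigned server. A minimal sketch of such a load-balancing rule follows; the queue-depth interface is assumed purely for illustration:

```python
def route_request(queue_depths):
    """Return the index of the assigned compute server with the shortest
    queue; ties go to the lowest-numbered server."""
    return min(range(len(queue_depths)), key=lambda i: queue_depths[i])
```

In practice the router 403 might instead weight servers by capacity or use round-robin; any such policy fits the same slot in FIG. 4.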
In one embodiment, said utility function indicates V(m), the value of being allocated an integral quantity m of undifferentiated compute servers, with the value of m ranging from zero to the total number of servers in the data center (e.g., the data center 200 in FIG. 2). From time to time, said utility function is transmitted to the resource arbiter 412, possibly in response to a prompt or query sent from the resource arbiter 412. From time to time, said resource arbiter 412 will additionally transmit to the application environment 401 updated information regarding its set of allocated resources. The updated information indicates, for example, that certain compute servers 420-422 are newly available for usage, or that certain compute servers 420-422 previously used by the application environment 401 are to be de-allocated and are no longer available for usage.
In another embodiment, the autonomic manager module 402 of FIG. 4 further comprises a capability to model the effect of any adjustable operational parameters the resources 420-422 may have (e.g., maximum queue depth, buffer pool sizes, etc.) on the observed system performance. The autonomic manager 402 further operates to set said parameters of the resources 420-422, of the router 403, or other internal parameters, to values such that the resulting system performance optimizes the resource-level utility function.
In another embodiment of the invention, the autonomic manager module 402 of FIG. 4 further comprises a capability to model or predict the demand at future times given the observed current demand 411, and a capability to model or predict the system performance at future times given the current demand 411, the current performance, and the future allocated resources, which may be the same as or different from the currently allocated resources 420-422. The autonomic manager 402 then computes a resource-level utility function indicating the cumulative discounted or undiscounted future utility associated with a hypothetical resource allocation made at the current time. In one embodiment, the predicted demand and predicted system performance are deterministic predictions at each future time. In another embodiment, the predicted demand and predicted system performance are probability distributions over possible levels of demand or performance at each future time. In one embodiment, the cumulative future utility is obtained by summation over a finite number of discrete future time steps. In another embodiment, the cumulative future utility is obtained by integration over a continuous future time interval.
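Under the discrete-time embodiment above, the cumulative discounted future utility for a hypothetical allocation reduces to a discounted sum of the utilities predicted at each step. A sketch follows; the prediction model supplying the per-step utilities is whatever the autonomic manager uses, and the names here are illustrative:

```python
def cumulative_discounted_utility(predicted_utilities, gamma=0.9):
    """Sum of gamma**t * u_t over a finite horizon of discrete future
    time steps; setting gamma = 1.0 gives the undiscounted variant."""
    return sum(gamma**t * u for t, u in enumerate(predicted_utilities))
```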
In another embodiment of the invention, the autonomic manager module 402 of FIG. 4 does not explicitly predict future demand or future system performance, but instead uses machine learning procedures to estimate the cumulative discounted or undiscounted future utility from a temporal sequence of observed data points, each data point consisting of an observed demand, an observed system performance, an observed resource allocation, and an observed payment as specified by the SLA 410. In one embodiment, the machine learning procedure consists of a standard reinforcement learning procedure such as Q-Learning, Temporal Difference Learning, R-Learning or SARSA.
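A Temporal Difference (TD(0)) update of the kind named above might look as follows, where each observed data point supplies a state (demand, performance, allocation) and the reward is the observed SLA payment. The tabular value representation is an illustrative simplification of what the autonomic manager could use:

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.95):
    """One TD(0) step: move the value estimate for `state` toward the
    observed reward plus the discounted estimate for `next_state`.
    `values` is a dict mapping states to estimated cumulative utility."""
    v, v_next = values.get(state, 0.0), values.get(next_state, 0.0)
    values[state] = v + alpha * (reward + gamma * v_next - v)
    return values[state]
```

Applied along the observed trajectory, updates of this form converge toward the expected cumulative discounted future utility without an explicit demand or performance model.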
FIG. 5 is a high-level block diagram of the present dynamic resource allocation system implemented using a general purpose computing device 500. In one embodiment, the general purpose computing device 500 comprises a processor 502, a memory 504, a dynamic resource allocator or module 505 and various input/output (I/O) devices 506, such as a display, a keyboard, a mouse, a modem, and the like. In one embodiment, at least one I/O device is a storage device (e.g., a disk drive, an optical disk drive, a floppy disk drive). It should be understood that the dynamic resource allocator 505 can be implemented as a physical device or subsystem that is coupled to a processor through a communication channel.
Alternatively, the dynamic resource allocator 505 can be represented by one or more software applications (or even a combination of software and hardware, e.g., using Application Specific Integrated Circuits (ASICs)), where the software is loaded from a storage medium (e.g., the I/O devices 506) and operated by the processor 502 in the memory 504 of the general purpose computing device 500. Thus, in one embodiment, the resource allocator 505 for allocating resources among entities, described herein with reference to the preceding Figures, can be stored on a computer readable medium or carrier (e.g., RAM, a magnetic or optical drive or diskette, and the like).
The functionalities of the arbiters and the application environments described with reference to FIGS. 2 and 4 may be performed by software modules of various types. For example, in one embodiment, the arbiters and/or application environments comprise autonomic elements. In another embodiment, the arbiters and/or application environments comprise autonomous software agents as may be constructed, for example, using the Agent Building and Learning Environment (ABLE). The arbiters and/or application environments may all run on a single computer, or they may run independently on different computers. Communication between the arbiters and the application environments may take place using standard interfaces and communication protocols. In the case of arbiters and application environments running on different computers, standard network interfaces and communication protocols may be employed, such as Web Services interfaces (e.g., those employed in the Open Grid Services Architecture (OGSA)).
Thus, the present invention represents a significant advancement in the field of dynamic resource allocation. A method and apparatus are provided that enable a finite number of resources to be dynamically allocated among a number of entities or application environments capable of performing computational work given the resources. The allocation is performed in a manner that optimizes the business value of the enterprise providing the computing services to a number of clients.
While the foregoing is directed to the preferred embodiment of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.