BACKGROUND
The field of the invention relates generally to distributed computing and, more particularly, to a computer-implemented system for provisioning heterogeneous computing resources using cloud computing resources and private computing resources.
Machine Learning, a branch of artificial intelligence, is a science concerned with developing algorithms that analyze empirical, real-world data, searching for patterns in that data in order to create accurate predictions about future events. One critical and challenging part of Machine Learning is model creation, a process of creating a model based on a set of “training data”. This empirical data, observed and recorded, may be used to generalize from those prior experiences. During the model creation process, practitioners find the best model for the given problem through trial and error, that is, generating numerous different models from the training data and choosing one that best meets the performance criteria based on a set of validation data. Model creation is a complex search problem in the space of model structure and parameters of the various modeling options, and is computationally expensive because of the size of the search space.
The increasing computational complexity of Machine Learning problems requires greater computational capacity, in the form of faster computing resources, more computing resources, or both. In the late 1990's, the SETI@home project implemented a distributed computing mechanism to harness thousands of individual computers to help solve computationally intensive workloads in the Search for Extra-Terrestrial Intelligence (“SETI”). SETI@home needed to analyze massive amounts of observational data from a radio telescope in the search for radio transmissions that might indicate intelligent life in distant galaxies.
Computationally, the problem was divided based on the collected data, parceling the problem into millions of tiny regions of the sky. To process the work load, each tiny region, along with its associated data, was sent out to individual computers on the internet. As each computer finished processing a single tiny region, it would transmit its results back to a central server for collection. For SETI@home, thousands of internet-based computers became a broad distributed computing environment harnessed to solve a computationally complex problem. Similarly, Machine Learning problems represent a computationally complex problem that can also be broken into components and processed with numerous individual computing resources.
In the late 2000's, “Cloud Computing” emerged as a source of computing resources available to consumers over the internet. Traditionally, a developer in need of computational resources had to purchase hardware, install the hardware in a datacenter, and install and maintain an operating system on the hardware. Now, various cloud service providers offer a variety of computing resource services on demand over the internet, such as “Infrastructure as a Service” (“IaaS”) and “Platform as a Service” (“PaaS”).
With IaaS and PaaS, consumers can “rent” individual computers or, more often, “virtual servers”, from the cloud service provider on an as-needed basis. These virtual servers may be pre-loaded with an operating system image, and accessible via the Internet through use of an Application Programming Interface (“API”). For example, a developer with a computationally complex problem could use the cloud service provider's API to provision a virtual server with the cloud service provider, transfer his software code or machine-executable instructions and data to the virtual server, and execute his job. When the job is finished, the developer could retrieve his results, and then shut down the virtual server. These IaaS and PaaS services offer an option to those in need of additional computational resources, but who do not have the regular need, the budget, or the infrastructure for having their own dedicated hardware. For developers who require an agile development environment for Machine Learning computation, cloud computing represents a promising source of computing resources.
BRIEF DESCRIPTION
In one aspect, a system for distributed computing is provided. The system includes a job scheduler module configured to identify a job request. The job request includes one or more request requirements, and one or more individual jobs. The system also includes a resource module configured to determine an execution set of computing resources from a pool of computing resources based at least partially on the one or more request requirements. Each computing resource of the pool of computing resources has an associated application programming interface. The pool of computing resources includes one of at least one internal computing resource and at least one public cloud computing resource, and a plurality of public cloud computing resources. The resource module also assigns a first computing resource from the execution set of computing resources to a first individual job of the one or more individual jobs. The system further includes a plurality of interface modules. Each interface module of the plurality of interface modules is configured to facilitate communication with one or more computing resources of the pool of computing resources using the associated application programming interface. The system also includes an executor module configured to identify a first interface module from the plurality of interface modules based at least in part on facilitating communication with the first computing resource. The executor module is also configured to transmit the first individual job for execution to the first computing resource using the first interface module.
In a further aspect, a method for distributed computing is provided. The method is implemented by at least one computer device including at least one processor and at least one memory device coupled to the at least one processor. The method includes identifying a job request comprising one or more individual jobs. The method also includes identifying one or more computing resource requirements for the job request. The method further includes determining an execution set of computing resources from a pool of computing resources based at least partially on the one or more computing resource requirements. Each computing resource of the pool of computing resources has an associated application programming interface. The pool of computing resources includes one of at least one internal computing resource and at least one external computing resource, and a plurality of external computing resources. The method also includes assigning a first computing resource from the execution set of computing resources to a first individual job of the one or more individual jobs. The method further includes identifying a plurality of interface modules. Each interface module of the plurality of interface modules is configured to facilitate communication with one or more computing resources of the pool of computing resources using the associated application programming interface. The method also includes selecting a first interface module from a plurality of interface modules based at least in part on facilitating communication with the first computing resource. The method further includes transmitting, by the at least one computer device, the first individual job for execution to the first computing resource using the first interface module.
In yet another aspect, a system for distributed computing is provided. The system includes a job scheduler module configured to identify a first job request and a second job request. The system also includes a resource module configured to assign a first computing resource to the first job request from a first execution set of computing resources associated with a first cloud service provider. The first computing resource has a first application programming interface. The resource module is also configured to assign a second computing resource to the second job request from one of a second execution set of computing resources associated with a second cloud service provider, and a set of internal computing resources. The second computing resource has a second application programming interface. The system further includes a first interface module configured to facilitate communication with the first computing resource using the first application programming interface. The system also includes a second interface module configured to facilitate communication with the second computing resource using the second application programming interface. The system further includes an executor module configured to transmit the first job request for execution to the first computing resource using the first interface module. The executor module is also configured to transmit the second job request for execution to the second computing resource using the second interface module.
DRAWINGS
These and other features, aspects, and advantages of the present invention will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
FIG. 1 is a block diagram of an exemplary computing system that may be used for automated provisioning of heterogeneous computing resources for Machine Learning;
FIG. 2 is a diagram of an exemplary application environment which includes a system for automated provisioning of heterogeneous computing resources for Machine Learning using the computing system shown in FIG. 1;
FIG. 3 is a diagram of the exemplary application environment shown in FIG. 2 showing the major components of the system for automated provisioning of heterogeneous computing resources for Machine Learning shown in FIG. 2;
FIG. 4 is a data flow diagram of an exemplary request module of the system shown in FIG. 3, responsible for receiving and processing a request related to Machine Learning;
FIG. 5 is a data flow diagram of the exemplary job scheduler/optimizer module of the system shown in FIG. 3, responsible for preparing jobs for execution;
FIG. 6 is a data flow diagram of the exemplary executor module and resource module of the system shown in FIG. 3, responsible for assigning jobs to computing resources and transmitting jobs for execution;
FIG. 7 is a block diagram of an exemplary method of provisioning heterogeneous computing resources for Machine Learning using the system shown in FIG. 3;
FIG. 8 is a block diagram of another exemplary method of provisioning heterogeneous computing resources for Machine Learning using the system shown in FIG. 3;
FIG. 9 is a block diagram showing a first portion of an exemplary database table structure for the system shown in FIG. 3, showing the primary tables used by the request module shown in FIG. 3;
FIG. 10 is a block diagram showing a second portion of the exemplary database structure for the system 201 shown in FIG. 3, showing the primary tables used by the job scheduler/optimizer module shown in FIG. 3; and
FIG. 11 is a block diagram showing a third portion of the exemplary database structure for the system shown in FIG. 3, showing the primary tables used by the executor module and the resource module shown in FIG. 3.
Unless otherwise indicated, the drawings provided herein are meant to illustrate key inventive features of the invention. These key inventive features are believed to be applicable in a wide variety of systems comprising one or more embodiments of the invention. As such, the drawings are not meant to include all conventional features known by those of ordinary skill in the art to be required for the practice of the invention.
DETAILED DESCRIPTION
In the following specification and the claims, reference will be made to a number of terms, which shall be defined to have the following meanings.
The singular forms “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise.
“Optional” or “optionally” means that the subsequently described event or circumstance may or may not occur, and that the description includes instances where the event occurs and instances where it does not.
Approximating language, as used herein throughout the specification and claims, may be applied to modify any quantitative representation that may permissibly vary without resulting in a change in the basic function to which it is related. Accordingly, a value modified by a term or terms, such as “about” and “substantially”, is not to be limited to the precise value specified. In at least some instances, the approximating language may correspond to the precision of an instrument for measuring the value. Here and throughout the specification and claims, range limitations may be combined and/or interchanged; such ranges are identified and include all the sub-ranges contained therein unless context or language indicates otherwise.
As used herein, the term “non-transitory computer-readable media” is intended to be representative of any tangible computer-based device implemented in any method or technology for short-term and long-term storage of information, such as, computer-readable instructions, data structures, program modules and sub-modules, or other data in any device. Therefore, the methods described herein may be encoded as executable instructions embodied in a tangible, non-transitory, computer readable medium, including, without limitation, a storage device and/or a memory device. Such instructions, when executed by a processor, cause the processor to perform at least a portion of the methods described herein. Moreover, as used herein, the term “non-transitory computer-readable media” includes all tangible, computer-readable media, including, without limitation, non-transitory computer storage devices, including, without limitation, volatile and nonvolatile media, and removable and non-removable media such as a firmware, physical and virtual storage, CD-ROMs, DVDs, and any other digital source such as a network or the Internet, as well as yet to be developed digital means, with the sole exception being a transitory, propagating signal.
As used herein, the term “cloud computing” refers generally to computing services offered over the internet. Also, as used herein, the term “cloud service provider” refers to the company or entity offering or hosting the computing service. There are many types of computing services that fall under the umbrella of “cloud computing,” including “Infrastructure as a Service” (“IaaS”) and “Platform as a Service” (“PaaS”). Further, as used herein, the term “IaaS” is used to refer to the computing service involving offering physical or virtual servers to consumers. Under the IaaS model, the consumer will “rent” a physical or virtual server from the cloud service provider, who provides the hardware but generally not the operating system or any higher-level application services. Moreover, as used herein, the term “PaaS” is used to refer to the computing service offering physical or virtual servers to consumers, but also including operating system installation and support, and possibly some base application installation and support such as a database or web server. Also, as used herein, the terms “cloud computing”, “IaaS”, and “PaaS” are used interchangeably. The systems and methods described herein are not limited to these two models of cloud computing. Any computing service that enables the operation of the systems and methods as described herein may be used.
As used herein, the term “private cloud” refers to a computing resources platform similar to “cloud computing”, as described above, but operated solely for a single organization. For example, and without limitation, a large company may establish a private cloud for its own computing needs. Rather than buying dedicated hardware for various specific internal projects or departments, the company may align its computing resources in the private cloud and allow its developers to leverage computing resources through the cloud model, thereby providing greater leverage of its computing resources across the company.
As used herein, the term “internal computing resources” refers generally to computing resources owned or otherwise available to the entity practicing the systems and methods described herein excluding the public “cloud computing” sources. Also, as used herein, private clouds are also considered internal computing resources. Further, as used herein, the term “external computing resources” includes the public “cloud computing” resources.
As used herein, the term “provisioning” refers to the process of establishing a computing resource for use. In order to make a resource available for use, the resource may need to be “provisioned”. For example, and without limitation, when a user seeks a computing resource such as a virtual server from a cloud service provider, the user engages in a transaction to “provision” the virtual server for the consumer's use for a period of time. “Provisioning” establishes the allocation of the computing resource for the user. In the setting of cloud computing of virtual servers, the “provisioning” process may actually cause the cloud service provider to create a virtual server, and perhaps install an operating system image and base applications on the virtual server, before allowing the user to use the resource. Alternatively, the term “provisioning” is also used to refer to the process of allocating an already-available but currently unused computing resource. For example, a cloud server that has already been “provisioned” from the cloud provider, but is not currently occupied with a computing task, can be referred to as being “provisioned” to a new computing task when it is assigned to that task. Also, as used herein, the terms “assignment”, “allocating”, and “provisioning”, with respect to cloud computing resources, are used interchangeably.
As used herein, the term “releasing” is the corollary to “provisioning.” “Releasing” is the process of relinquishing use of the computing resource. In order to vacate the resource, the resource may need to be “released”. For example, and without limitation, when a user has finished using a virtual server from a cloud service provider, the user “releases” the virtual server. “Releasing” informs the cloud service provider that the resource is no longer needed or in use by the user, and that the resource may be re-provisioned.
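For example, and without limitation, the second sense of “provisioning” (allocating an already-available but unused resource to a task) and the corresponding “releasing” may be sketched as follows. This is a minimal illustration only; the `ResourcePool` class, its method names, and the resource identifiers are hypothetical and are not part of the described system.

```python
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """Tracks which (hypothetical) resource identifiers are unused or in use."""
    free: set = field(default_factory=set)
    in_use: dict = field(default_factory=dict)  # resource id -> task name

    def provision(self, task: str) -> str:
        """Allocate an unused resource to a task and return its identifier."""
        if not self.free:
            raise RuntimeError("no unused resource available to provision")
        resource_id = min(self.free)  # deterministic choice for this sketch
        self.free.remove(resource_id)
        self.in_use[resource_id] = task
        return resource_id

    def release(self, resource_id: str) -> None:
        """Relinquish a resource so that it may be re-provisioned."""
        self.in_use.pop(resource_id)  # raises KeyError if not provisioned
        self.free.add(resource_id)


pool = ResourcePool(free={"vm-a", "vm-b"})
first = pool.provision("train-model-M1")
pool.release(first)
second = pool.provision("train-model-M2")  # the released resource is reused
```

In this sketch a released resource returns to the unused set and may be provisioned to a later task, mirroring the re-provisioning described above.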
As used herein, the term “algorithm” refers, generally, to any method of solving a problem. Also, as used herein, the term “model” refers, generally, to an algorithm for solving a problem. Further, as used herein, the terms “model” and “algorithm” are used interchangeably. More specifically, in the context of Machine Learning and supervised learning, “model” includes a dataset gathered from some real-world data source, in which a set of input variables and their corresponding output variables are gathered. When properly configured, the model can act as a predictor for a problem if the model utilizes variables similar to a problem. A model may be one of, without limitation, a one-class classifier, a multi-class classifier, or a predictor. In other contexts, the term “algorithm” may refer to methods of solving other problems, such as, without limitation, design of experiments and simulations. In some embodiments, an “algorithm” includes source code and/or computer-executable instructions that may be distributed and utilized to “solve” the problem through execution by a computing resource.
As used herein, the term “job” is used to refer, generally, to a body of work identified for, without limitation, execution, processing, or computing. The “job” may be divisible into multiple smaller jobs that, when executed and aggregated, satisfy completion of the “job”. Also, as used herein, the term “job” may also be used to refer to one or more of the multiple smaller jobs that make up a larger job. Further, as used herein, the term “execution job” is used interchangeably with “job”, and may also be used to signify a “job” that is ready for execution.
As used herein, the terms “execution request”, “job request”, and “request” are used, interchangeably, to refer to the computational problem to be solved using the systems and methods described herein.
As used herein, the terms “requirement”, “limitation”, and “restriction” refer generally to a configuration parameter associated with a job request. For example, and without limitation, when a user enters a job request that defines use of a particular model M1, the user has specified a “requirement” that the request be executed using model M1. A “requirement” may also be characterized as a “limitation” or a “restriction” on the job request. For example, and without limitation, when a user enters a job request that restricts processing of the request to only internal computing resources, that restriction may be characterized as both a “requirement” that “only internal computing resources are used,” as well as a “limitation” or “restriction” that “no non-internal computing resources may be used to process the request.”
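For example, and without limitation, a restriction to internal computing resources may be treated as a filter over the pool of computing resources. The following is a minimal sketch; the `Resource` and `JobRequest` types, their fields, and the resource names are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    internal: bool  # True for private clouds and other internal resources


@dataclass(frozen=True)
class JobRequest:
    model: str                   # requirement: model to use, e.g. "M1"
    internal_only: bool = False  # restriction: no public cloud resources


def eligible_resources(request, pool):
    """Return the resources in the pool that satisfy the request's restriction."""
    if request.internal_only:
        return [r for r in pool if r.internal]
    return list(pool)


pool = [
    Resource("public-cloud-vm", internal=False),
    Resource("private-cloud-vm", internal=True),
    Resource("cluster-node", internal=True),
]
request = JobRequest(model="M1", internal_only=True)
names = [r.name for r in eligible_resources(request, pool)]
```

Here the private cloud is counted as internal, consistent with the definition of “internal computing resources” above.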
As used herein, the term “heterogeneous computing resources” refers to a set of computing resources that differ in an aspect of one of operating system, processor configuration (e.g., single-processor versus multi-processor), and memory architecture (e.g., 32-bit versus 64-bit). For example, and without limitation, if a set of computing resources includes System X, which is running the Linux operating system, and System Y, which is running the Windows™ Server 2003 operating system, then the set of computing resources is considered “heterogeneous”. Additionally, for example, and without limitation, if a set of computing resources includes System 1, which has a single Intel-based processor running the Linux operating system, and System 2, which has four Intel-based processors running the Linux operating system, then this set of computing resources is considered “heterogeneous”.
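For example, and without limitation, the definition above may be expressed as a simple check over the three named aspects. This is a sketch only; the `ResourceSpec` type and its field names are hypothetical, and the 64-bit value assigned to every example system is an assumption made for illustration.

```python
from typing import NamedTuple


class ResourceSpec(NamedTuple):
    operating_system: str
    processor_count: int
    address_bits: int  # memory architecture, e.g. 32 or 64


def is_heterogeneous(resources):
    """True when the resources differ in at least one of the three aspects."""
    return any(
        len({getattr(r, aspect) for r in resources}) > 1
        for aspect in ResourceSpec._fields
    )


# System X / System Y: different operating systems.
mixed_os = [ResourceSpec("Linux", 1, 64), ResourceSpec("Windows Server 2003", 1, 64)]
# System 1 / System 2: same operating system, different processor counts.
mixed_cpu = [ResourceSpec("Linux", 1, 64), ResourceSpec("Linux", 4, 64)]
# Two identical systems are not heterogeneous under this check.
uniform = [ResourceSpec("Linux", 4, 64), ResourceSpec("Linux", 4, 64)]
```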
The exemplary systems and methods described herein allow a user to seamlessly leverage a diverse, heterogeneous pool of computing resources to perform computational tasks across various cloud computing providers, internal clouds, and other internal computing resources. More specifically, the system is used to search for optimal computational designs or configurations, such as machine learning models and associated model parameters, by automatically provisioning such search tasks across a variety of computing resources coming from a variety of computing providers. An algorithm database includes various versions of machine-executable code or binaries tailored for the variety of computing resource architectures that might be leveraged. An executor module maintains and communicates with the variety of computing resources through an Application Programming Interface (“API”) module, allowing the system to communicate with various different cloud computing providers, as well as internal computing resources such as a private cloud or a private server cluster. A user can input a request that tailors which algorithms are used to complete the request, as well as specifying computing restrictions to be used for execution. Therefore, the user can submit his computationally intensive job to the system, customized with performance requirements and certain restrictions, and thereby seamlessly leverage a potentially large, diverse, and heterogeneous pool of computing resources.
FIG. 1 is a block diagram of an exemplary computing system 120 that may be used for automated provisioning of heterogeneous computing resources for Machine Learning. Alternatively, any computer architecture that enables operation of the systems and methods as described herein may be used.
In the exemplary embodiment, computing system 120 includes a memory device 150 and a processor 152 operatively coupled to memory device 150 for executing instructions. In some embodiments, executable instructions are stored in memory device 150. Computing system 120 is configurable to perform one or more operations described herein by programming processor 152. For example, processor 152 may be programmed by encoding an operation as one or more executable instructions and providing the executable instructions in memory device 150. Processor 152 may include one or more processing units, e.g., without limitation, in a multi-core configuration.
In the exemplary embodiment, memory device 150 is one or more devices that enable storage and retrieval of information such as executable instructions and/or other data. Memory device 150 may include one or more tangible, non-transitory computer-readable media, such as, without limitation, random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), a solid state disk, a hard disk, read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), and/or non-volatile RAM (NVRAM) memory. The above memory types are exemplary only, and are thus not limiting as to the types of memory usable for storage of a computer program.
Also, in the exemplary embodiment, memory device 150 may be configured to store information associated with automated provisioning of heterogeneous computing resources for Machine Learning, including, without limitation, Machine Learning models, application programming interfaces, cloud computing resources, and internal computing resources.
In some embodiments, computing system 120 includes a presentation interface 154 coupled to processor 152. Presentation interface 154 presents information, such as a user interface and/or an alarm, to a user 156. For example, presentation interface 154 may include a display adapter (not shown) that may be coupled to a display device (not shown), such as a cathode ray tube (CRT), a liquid crystal display (LCD), an organic LED (OLED) display, and/or a hand-held device with a display. In some embodiments, presentation interface 154 includes one or more display devices. In addition, or alternatively, presentation interface 154 may include an audio output device (not shown) (e.g., an audio adapter and/or a speaker).
In some embodiments, computing system 120 includes a user input interface 158. In the exemplary embodiment, user input interface 158 is coupled to processor 152 and receives input from user 156. User input interface 158 may include, for example, a keyboard, a pointing device, a mouse, a stylus, and/or a touch sensitive panel, e.g., a touch pad or a touch screen. A single component, such as a touch screen, may function as both a display device of presentation interface 154 and user input interface 158.
Further, a communication interface 160 is coupled to processor 152 and is configured to be coupled in communication with one or more other devices, such as, without limitation, another computing system 120, and any device capable of accessing computing system 120 including, without limitation, a portable laptop computer, a personal digital assistant (PDA), and a smart phone. Communication interface 160 may include, without limitation, a wired network adapter, a wireless network adapter, a mobile telecommunications adapter, a serial communication adapter, and/or a parallel communication adapter. Communication interface 160 may receive data from and/or transmit data to one or more remote devices. For example, communication interface 160 of one computing system 120 may transmit transaction information to communication interface 160 of another computing system 120. Computing system 120 may be web-enabled for remote communications, for example, with a remote desktop computer (not shown).
Also, presentation interface 154 and communication interface 160 are both capable of providing information suitable for use with the methods described herein, e.g., to user 156 or another device. Accordingly, presentation interface 154 and communication interface 160 may be referred to as output devices. Similarly, user input interface 158 and communication interface 160 are capable of receiving information suitable for use with the methods described herein and may be referred to as input devices.
Further, processor 152 and/or memory device 150 may also be operatively coupled to a storage device 162. Storage device 162 is any computer-operated hardware suitable for storing and/or retrieving data, such as, but not limited to, data associated with a database 164. In the exemplary embodiment, storage device 162 is integrated in computing system 120. For example, computing system 120 may include one or more hard disk drives as storage device 162. Moreover, for example, storage device 162 may include multiple storage units such as hard disks and/or solid state disks in a redundant array of inexpensive disks (RAID) configuration. Storage device 162 may include a storage area network (SAN), a network attached storage (NAS) system, and/or cloud-based storage. Alternatively, storage device 162 is external to computing system 120 and may be accessed by a storage interface (not shown).
Moreover, in the exemplary embodiment, database 164 includes a variety of static and dynamic data associated with, without limitation, Machine Learning models, cloud computing resources, and internal computing resources.
The embodiments illustrated and described herein as well as embodiments not specifically described herein but within the scope of aspects of the disclosure, constitute exemplary means for automated provisioning of heterogeneous computing resources for Machine Learning. For example, computing system 120, and any other similar computer device added thereto or included within, when integrated together, include sufficient computer-readable storage media that is/are programmed with sufficient computer-executable instructions to execute processes and techniques with a processor as described herein. Specifically, computing system 120 and any other similar computer device added thereto or included within, when integrated together, constitute an exemplary means for recording, storing, retrieving, and displaying operational data associated with a system (not shown in FIG. 1) for automated provisioning of heterogeneous computing resources for Machine Learning.
FIG. 2 is a diagram of an exemplary application environment 200 which includes a system 201 for automated provisioning of heterogeneous computing resources for Machine Learning using computing system 120 (shown in FIG. 1). A user 202 conceives a problem 203 and submits a request 204 to system 201. System 201 interacts with computing resources 206 in order to process request 204. In the exemplary embodiment, computing resources 206 consist of one or more public clouds 208, one or more private clouds 210, and internal computing resources 212. Operationally, system 201 receives request 204 from user 202 and executes the request automatically across heterogeneous computing resources 206, thereby insulating user 202 from the execution details. System 201 is explained in detail below.
FIG. 3 is a diagram of the exemplary application environment 200 (shown in FIG. 2) showing the major components of system 201 for automated provisioning of heterogeneous computing resources for Machine Learning. User 202 creates and submits a request 204 to a request module 304. Request module 304 processes request 204 by creating one or more “jobs” for processing. A job scheduler/optimizer module 310 analyzes a library 308 and selects the most appropriate models and parameters to use for execution of the job, based on request 204. In some embodiments, library 308 is a database of models.
Also, in the exemplary embodiment, library 308 is a database of Machine Learning algorithms. Alternatively, library 308 is a database of other computational algorithms. Each model in library 308 includes one or more sets of computer-executable instructions compiled for different hardware and operating system architectures. The computer-executable instructions are pre-compiled binaries for a given architecture. Alternatively, the computer-executable instructions may be un-compiled source code written in a programming or scripting language, such as Java or C++. The number of algorithms in library 308 is not fixed, i.e., algorithms may be added or removed. Machine learning algorithms in library 308 may be scalable to data size.
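For example, and without limitation, a library entry holding several architecture-specific sets of instructions may be sketched as a lookup keyed by operating system and hardware architecture. The `ModelEntry` type, the file names, and the choice to fall back to source code when no matching binary exists are assumptions made for illustration, not details taken from library 308.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelEntry:
    name: str
    binaries: dict = field(default_factory=dict)  # (os, arch) -> file name
    source: Optional[str] = None                  # un-compiled source, if any


def select_artifact(entry, operating_system, architecture):
    """Pick the pre-compiled binary for the target, else fall back to source."""
    key = (operating_system, architecture)
    if key in entry.binaries:
        return entry.binaries[key]
    if entry.source is not None:
        return entry.source
    raise LookupError(f"{entry.name}: nothing available for {key}")


library = {
    "M1": ModelEntry(
        name="M1",
        binaries={
            ("linux", "x86_64"): "m1-linux-x86_64.bin",
            ("windows", "x86"): "m1-windows-x86.exe",
        },
        source="m1.cpp",
    ),
    "M2": ModelEntry(name="M2", binaries={("linux", "x86_64"): "m2-linux-x86_64.bin"}),
}

chosen = select_artifact(library["M1"], "linux", "x86_64")
fallback = select_artifact(library["M1"], "linux", "arm64")
```

Because the library is an ordinary mapping in this sketch, entries may be added or removed at any time, consistent with the number of algorithms not being fixed.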
Further, in the exemplary embodiment, a resource module 314 determines and assigns a subset of computing resources from computing resources 206 appropriate for the job. Once the job has a subset of computing resources assigned, an executor module 312 manages the submission of the job to the assigned computing resources. To communicate with the various computing resources 206, executor module 312 utilizes API modules 313. The operations of each system component are explained in detail below.
In FIGS. 4-8, the operation of each system 201 component is described. FIG. 4 shows exemplary request module 304. FIG. 5 shows exemplary job scheduler/optimizer module 310. FIG. 6 shows exemplary executor module 312 and resource module 314. FIG. 7 shows an exemplary illustration of system 201 including the components from FIGS. 4-6. The operations of each system component are explained in detail below.
In some embodiments, the components of system 201 communicate with each other through the use of database 164. Entry of information into a table of database 164 by one component may trigger action by another component. This mechanism of communication is only an exemplary method of passing information between components and advancing work flow. Alternatively, any mechanism of communication and work flow that enables operation of the systems and methods described herein may be used.
FIG. 4 is a data flow diagram 400 of exemplary request module 304 of system 201 (shown in FIG. 3), responsible for receiving and processing request 204 related to Machine Learning. For example, user 202 may submit request 204 asking to perform model exploration using the task of classification. This model space exploration represents a computationally intensive task which may be broken up into sub-tasks and executed in parallel across multiple computing resources.
Also, in the exemplary embodiment,request module304 stores requestinformation404 aboutrequest204. In some embodiments, requestinformation404 is stored in database164 (shown inFIG. 1). Alternatively, requestinformation404 may be stored in any other way, such as, without limitation, memory device150 (shown inFIG. 1), or any way that enables operation of the systems and methods described herein.Request information404 may include, without limitation, problem definition information, model names, model parameters, input data, label column number within data file providing the “ground truth” for training/optimization, task type, e.g., classification or clustering or regression or rule tuning, performance criteria, optimization method, e.g., grid search or evolutionary optimization, information regarding search space, e.g., grid points for grid search or search bounds for evolutionary optimization, computing requirements and preferences, data sensitivity, and encryption information. Alternatively, requestinformation404 may include any information that enables operation of the systems and methods described herein.
Further, in the exemplary embodiment, request module 304 creates a job 402. In some embodiments, job 402 is represented by a single row in database 164. Alternatively, multiple jobs 402 may be required to satisfy request 204. For example, when user 202 enters request 204 asking to perform model exploration using classification, request module 304 enters a row in jobs 402 table indicating a new classification job, and links job 402 to its own request information 404. Job scheduler/optimizer module 310 periodically checks jobs 402 table for new, unprocessed jobs. Once job scheduler/optimizer module 310 sees job 402, it will act to further process the job as described below.
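For illustration only, and without limitation, the table-driven hand-off between request module 304 and job scheduler/optimizer module 310 may be sketched as follows. The sketch assumes an in-memory SQLite database standing in for database 164; the table names, column names, and function names are hypothetical and are not taken from the figures.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical minimal schema; column names are illustrative only.
    conn.execute(
        "CREATE TABLE request_info (id INTEGER PRIMARY KEY, task_type TEXT)"
    )
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, request_info_id INTEGER, "
        "processed INTEGER DEFAULT 0)"
    )


def submit_request(conn, task_type):
    """Request module side: store request information and enter a linked job row."""
    cur = conn.execute(
        "INSERT INTO request_info (task_type) VALUES (?)", (task_type,)
    )
    info_id = cur.lastrowid
    cur = conn.execute(
        "INSERT INTO jobs (request_info_id) VALUES (?)", (info_id,)
    )
    conn.commit()
    return cur.lastrowid


def poll_new_jobs(conn):
    """Scheduler side: fetch unprocessed jobs, then mark them as picked up."""
    rows = conn.execute(
        "SELECT jobs.id, request_info.task_type FROM jobs "
        "JOIN request_info ON request_info.id = jobs.request_info_id "
        "WHERE jobs.processed = 0 ORDER BY jobs.id"
    ).fetchall()
    conn.executemany(
        "UPDATE jobs SET processed = 1 WHERE id = ?", [(r[0],) for r in rows]
    )
    conn.commit()
    return rows
```

In this sketch, a periodic call to `poll_new_jobs` plays the role of the scheduler checking the jobs table; a second call returns nothing until another request is submitted.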
Moreover, in the exemplary embodiment, request module 304 receives request results 406 once a job has been fully processed. In some embodiments, request results 406 are stored in database 164. In operation, request module 304 would receive request results 406 by noticing that a newly returned request result 406 has been written into database 164. This result processing is a later step in the overall operation of system 201 (shown in FIG. 3), and is discussed in more detail below.
FIG. 5 is a data flow diagram 500 of exemplary job scheduler/optimizer module 310 of system 201 (shown in FIG. 3), responsible for preparing jobs 402 for execution. Job scheduler/optimizer module 310 analyzes job 402 and request information 404, and selects one or more models from library 308. Based on request information 404, job scheduler/optimizer module 310 creates one or more job models 502. For example, when job scheduler/optimizer module 310 sees a job requesting classification, job scheduler/optimizer module 310 examines request information 404 to see if a particular type of classification, such as Support Vector Machine (“SVM”) or Artificial Neural Network (“ANN”), has been specified by user 202. If no specific model has been specified, then job scheduler/optimizer module 310 will create a row in job model 502 for each type of classification appropriate and available from library 308.
Also, in the exemplary embodiment, ajob model instance504 is created by scheduler/optimizer module310 for eachjob model502. In operation,Job model instance504 serves to further limit how and wherejob model502 may be executed. Scheduler/optimizer module310 limitsjob model instance504 based onrequest information404 and model restrictions, such as, without limitation, preferred computing resources specified by user202 (shown inFIG. 3), and required platform specified by the particular model selected fromlibrary308. For example, when job scheduler/optimizer310 creates ajob402 for classification using SVM, job scheduler/optimizer310 looks at the SVM model withinlibrary308 andrequest information404 for the request. If the SVM model within library has a computing restriction such as only having a compiled version of the model for 32-bit Linux, thenjob model instance504 will be restricted to using only 32-bit Linux hosts. Alternatively, ifrequest information404 specifies only usinginternal computing resources212, thenjob model instance504 will be so restricted. In some embodiments,job model instance504 may consist of one or more execution tasks that are defined by search space information as part ofrequest information404. The execution tasks may be distributed and executed on a plurality of computing resources inexecutor module312, as discussed below.
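By way of a non-limiting sketch, the restriction of a job model instance by model platform and by user preference may be expressed as follows. The class names, field names, and source labels are hypothetical and serve only to illustrate the 32-bit Linux and internal-resources examples given above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional

# Hypothetical labels for the three kinds of sources of computing resources.
ALL_SOURCES = frozenset({"public_cloud", "private_cloud", "internal"})


@dataclass(frozen=True)
class ModelEntry:
    name: str
    platforms: FrozenSet[str]  # platforms for which the library holds a build


@dataclass(frozen=True)
class JobModelInstance:
    model: str
    allowed_platforms: FrozenSet[str]
    allowed_sources: FrozenSet[str]


def build_instance(
    model: ModelEntry, preferred_sources: Optional[Iterable[str]] = None
) -> JobModelInstance:
    """Combine model restrictions with request preferences."""
    if preferred_sources is None:
        sources = ALL_SOURCES
    else:
        sources = ALL_SOURCES & frozenset(preferred_sources)
    if not sources:
        raise ValueError("request preferences exclude every known source")
    return JobModelInstance(model.name, model.platforms, sources)
```

Under these assumptions, a model compiled only for 32-bit Linux yields an instance limited to that platform, and a request naming only internal resources yields an instance limited to that source.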
In operation, in the exemplary embodiment, job scheduler/optimizer 310 periodically checks jobs 402 for unprocessed entries. Upon noticing new job 402, job scheduler/optimizer 310 analyzes request information 404 and selects several models from library 308. Job scheduler/optimizer 310 then creates a new row in job models 502 for each model required to process job 402. Further, job scheduler/optimizer 310 creates a job model instance 504 for each job model 502, further limiting how job model 502 is processed. Each of these job model instances 504 is created as an individual row in database 164. These job model instances 504 will be processed by executor module 312 and resource module 314, as discussed below.
Also, in some embodiments, job scheduler/optimizer310 may perform a series of iterative jobs that requires submitting506additional job models502 andjob model instances504 after receiving results from a previousjob model instance504. In some embodiments, such as where the optimization method is specified as grid search or other combinatorial optimization, submitting and processing a single set ofjob models502 andjob model instances504 will suffice forsatisfying job402. In other embodiments, where optimization methods such as, without limitation, heuristic search, evolutionary algorithms, and stochastic optimization are specified,certain jobs402 may require post-execution processing of a first set of results, followed by submission of additional sets ofjob models502 andjob model instances504. This post-processing of results and submission ofadditional job models502 may occur a certain number of times, or until a satisfaction condition is met. Dependent on the number of performance criteria specified inrequest information404, the optimization may be either single-objective or multi-objective optimization.
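The difference between a single-batch grid search and an iterative optimization may be sketched as follows, for illustration only. The function names, the `propose` and `evaluate` callables, and the scalar score are assumptions made for the sketch; the sketch shows a single-objective case and does not represent any particular evolutionary algorithm.

```python
import itertools


def grid_tasks(grid):
    """Expand grid points into one execution task per parameter combination."""
    names = sorted(grid)
    combos = itertools.product(*(grid[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


def iterative_search(propose, evaluate, max_rounds, target):
    """Submit a batch, post-process the results, and repeat.

    Stops after max_rounds batches or once the best score reaches target
    (the satisfaction condition). Returns a (score, candidate) pair.
    """
    best = None
    for _ in range(max_rounds):
        candidates = propose(best)  # next batch depends on prior results
        results = [(evaluate(c), c) for c in candidates]
        round_best = max(results, key=lambda pair: pair[0])
        if best is None or round_best[0] > best[0]:
            best = round_best
        if best[0] >= target:
            break
    return best
```

A grid search needs only one call to `grid_tasks`, since every task is known up front, whereas `iterative_search` cannot form a later batch until the earlier results have been returned.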
FIG. 6 is a data flow diagram 600 of exemplary executor module 312 and resource module 314 of system 201 (shown in FIG. 3), responsible for assigning jobs to computing resources 206 and transmitting jobs for execution. Computing resource availability is maintained by resource module 314 using instance resource 602 table. Each row in instance resource 602 table correlates to one or more computing resources 206 which may be used to execute job model instances 504. In some embodiments, each instance resource 602 is a row stored in database 164 (shown in FIG. 1). As used herein, the term “instance resource” may refer, alternatively, to either a database table used for tracking computing resources 206, or to the individual computing resources that the table is used to track.
Also, in the exemplary embodiment, resource module 314 selects a subset of computing resources 206 and assigns those instance resources 602 to each job model instance 504 based on, without limitation, computing restrictions associated with job model instance 504, request information 404, and computing resource availability. In operation, when an instance resource 602 is assigned to job model instance 504, resource module 314 creates a row in database 164 used for tracking the assignment of instance resource 602 to job model instance 504. For example, resource module 314 sees a new job model instance 504 as requiring a set of computing resources. Resource module 314 examines computing resource restrictions within job model instance 504, and finds that there is a restriction to use only Linux nodes, but any public or private Linux nodes are acceptable. Resource module 314 then searches instance resource 602 to find a set of Linux computing resources suitable for job model instance 504. The set of computing resources is then allocated to job model instance 504 for execution.
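The Linux-node example above may be sketched, without limitation, as a filter over a pool of instance resources. The `InstanceResource` class, its fields, and the `allocate` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class InstanceResource:
    name: str
    os: str
    source: str  # e.g. "public", "private", or "internal" (assumed labels)
    assigned_to: Optional[int] = None  # job model instance id, if any


def allocate(
    pool: List[InstanceResource],
    job_instance_id: int,
    required_os: str,
    allowed_sources: Set[str],
    count: int,
) -> List[InstanceResource]:
    """Assign `count` idle, matching resources, or none if too few exist."""
    candidates = [
        r
        for r in pool
        if r.assigned_to is None
        and r.os == required_os
        and r.source in allowed_sources
    ]
    if len(candidates) < count:
        return []  # job model instance stays unscheduled for now
    chosen = candidates[:count]
    for resource in chosen:
        resource.assigned_to = job_instance_id  # record the assignment
    return chosen
```

In this sketch an empty list means the job model instance is left for a later pass, rather than being partially scheduled.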
Further, in some embodiments,system201 may maintain a second table (not shown) indatabase164 that maintains a list of all of the current resources available tosystem201, such that each row ininstance resource602 correlates to a row in the second table. This second table may include individual computing resources currently provisioned frompublic cloud208 orprivate cloud210, and may also include individualinternal computing resources212. Also, in some embodiments,system201 may maintain a third table (not shown) indatabase164 that maintains a list of all of the computing resource providers, such that each individual computing resource listed in the second table correlates to a provider listed in the third table.
Moreover, in some embodiments,resource module314 considersrequest204 and/orrequest information404 when deciding how to allocate resources.Request204 may include cost, time, and/or security restrictions relative to computing resource utilization, such as, without limitation, using no-cost computing resources, using computing resources with a limited cost rate per node, using computing resources up to a fixed expense amount, time constraints, using only private computing resources, and using secure computing resources. For example, ifuser202 had specified a limitation to only using “secure” hosts in request, or to not spending more than a given dollar limit to execute the request, thenresource module314 would factor those additional limitations into the selection process during resource assignment. Alternatively, job scheduler/optimizer module310 may have consideredrequest204 and/orrequest information404 when adding restrictions tojob model instance504.
Also, in the exemplary embodiment, resource module 314 parallelizes execution of job model instance 504 by using multiple instance resources 602 to satisfy execution of job model instance 504. As used herein, “parallelization” is the process of breaking a single, large job up into smaller components and executing each small component individually using a plurality of computing resources. In some embodiments, job model instance 504 may be distributed across model parameters, i.e., each computing resource would get all of the training data but only a fraction of the model parameters. Alternatively, any other method of parallelizing job model instance 504 that enables operation of system 201 as described herein may be used, such as, without limitation, distributing across training data, i.e., each computing resource would get all model parameters, but only a fraction of training data, or distributing both training data and model parameters. Further, in some scenarios, job model instance 504 may be parallelized across heterogeneous computing resources, i.e., the set of instance resources 602 allocated to job model instance 504 is heterogeneous. Moreover, in some scenarios, job model instance 504 may be parallelized across multiple sources of computing resources, e.g., a portion of job model instance 504 being executed by public cloud 208, and another portion being executed by private cloud 210 or internal computing resource 212.
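The two distribution strategies described above may be sketched as follows, for illustration only. The round-robin partitioning and the function names are assumptions; any partitioning scheme that divides the work among the computing resources could be substituted.

```python
def split_evenly(items, n_workers):
    """Round-robin partition of a list into n_workers chunks."""
    return [items[i::n_workers] for i in range(n_workers)]


def distribute_by_parameters(training_data, parameter_sets, n_workers):
    """Each worker receives all training data but a fraction of the parameters."""
    return [
        {"data": training_data, "params": chunk}
        for chunk in split_evenly(parameter_sets, n_workers)
    ]


def distribute_by_data(training_data, parameter_sets, n_workers):
    """Each worker receives all parameters but a fraction of the training data."""
    return [
        {"data": chunk, "params": parameter_sets}
        for chunk in split_evenly(training_data, n_workers)
    ]
```

Each returned dictionary corresponds to one sub-task that could be handed to a separate instance resource.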
In operation, in the exemplary embodiment,resource module314 periodically checks for newjob model instances504. Whenresource module314 notices newjob model instances504 that do not yet have resources assigned,resource module314 consultsinstance resource602 to find appropriate computing resources, and assigns appropriate, currently-unutilized instance resources602 tojob model instances504.Resource module314 looks forinstance resources602 that satisfy, without limitation, platform requirements of the model, such as operating system and size of processors and memory, and minimum to maximum number of cores specified by the model. In some embodiments, if at least the minimum number of required nodes is not available, thenjob model instance504 remains unscheduled, and will be examined again at a later time. In the exemplary embodiment,resource module314 will decide whether or not to request more resources, based on factors such as, without limitation, the number of requests currently queued, the types of models requested, the final solution quality required, cost and time constraints, the current quality achieved relative to cost and time constraints, and estimated resources required to run each model. If more resources will likely be required, thenresource module314 may requestmore computing resources206 frompublic clouds208 orprivate cloud210 to bringmore instance resources602 into the available pool of resources. For example, and without limitation, ifresource module314 assesses that it can meet the time requirements imposed byrequest204 for finding a high quality solution based on the number ofinstance resources602 currently engaged, it need not engageadditional instance resources602, since there may be an extra cost incurred as a result. In some embodiments,resource module314 uses a lookup table which includes the performance metrics mentioned above, and created based on historical performance on previous similar problems. 
In some embodiments, resource module 314 may have a maximum number of resources that may be utilized at one time, such that resource module 314 may only provision up to this maximum amount. Once instance resources 602 have been assigned to job model instance 504, executor module 312 will continue processing the job model instance 504 using the instance resource 602, as described below.
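A deliberately simplified sketch of the provisioning decision follows. It considers only the minimum node count and the maximum-resource cap; the other factors listed above (queue length, solution quality, cost and time constraints, historical lookup tables) are omitted. The function and parameter names are hypothetical.

```python
def can_schedule(min_nodes, idle_nodes):
    """A job model instance is schedulable once enough idle nodes exist."""
    return idle_nodes >= min_nodes


def nodes_to_request(min_nodes, idle_nodes, provisioned_nodes, max_nodes):
    """Number of extra nodes to provision, never exceeding the overall cap."""
    shortfall = max(0, min_nodes - idle_nodes)
    headroom = max(0, max_nodes - provisioned_nodes)
    return min(shortfall, headroom)
```

For instance, with 8 nodes required, 3 idle, 10 already provisioned, and a cap of 12, the sketch requests only 2 more nodes, so the job model instance would remain unscheduled until other nodes become idle.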
Also, in the exemplary embodiment, executor module 312 utilizes API modules 313 to transmit job model instances 504 to computing resources 206. Executor module 312 is responsible for communicating with individual computing resources 206 to perform functions such as, without limitation, provisioning new computing resources, transmitting job model instances 504 to computing resources 206 for execution, receiving results from execution, and relinquishing computing resources no longer needed.
Further, in the exemplary embodiment,executor module312 submitsjob model instance504 toinstance resource602 for execution.Instance resource602 is one ormore computing resources206 from sources includingpublic clouds208,private clouds210, and/orinternal computing resources212. To facilitate communication with each source of computing resource,executor module312 utilizesAPI modules313. Each source of computingresources206 has an associated API. An API is a communications specification created as a protocol for communicating with a particular program, e.g., in the case of a cloud provider, the cloud provider's API creates a method of communicating with the cloud provider and the cloud resources, for performing functions such as, without limitation, provisioning new computing resources, communicating with currently-provisioned computing resources, and releasing computing resources. EachAPI module313 communicates with one source of computing resources, such as, without limitation, Amazon EC2®, or an internal high-availability cluster of private servers. AnAPI module313 for an associated source of computing resources must be included withinsystem201 in order forresource module314 to provision and allocatejob model instances504 to that source of computing resources, and in order forexecutor module312 to executejob model instances504 using that source of computing resources. In some embodiments,job model instance504 will havemultiple instance resources602 assigned from different sources, and will engage multiple API modules to communicate with each respective computing resource.
In operation, in the exemplary embodiment, executor module 312 periodically checks job model instances 504, looking for job model instances 504 that have computing resources allocated and which are prepared for execution. Executor module 312 examines instance resources 602 to determine which source of computing resource has been allocated to job model instance 504, then transmits a sub-task associated with execution of job model instance 504 to the particular computing resource using its associated API module. For example, if job model instance 504 has been assigned 10 Linux nodes, 8 from an internal Linux cluster, and 2 from a public cloud, then executor module 312 would engage the API module associated with the internal Linux cluster to execute the 8 sub-tasks on the internal Linux cluster, and would also engage the API module associated with the public cloud provider to execute the 2 sub-tasks on the public cloud. In some embodiments, executor module 312 submits the entire job model instance 504 task to a single instance resource 602.
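The 10-node example above may be sketched as follows, for illustration only. The `ApiModule` class is a hypothetical stand-in that merely records submissions; it does not represent the API of any actual cloud provider or cluster manager.

```python
from collections import defaultdict


class ApiModule:
    """Hypothetical per-source interface; records submissions for inspection."""

    def __init__(self, source):
        self.source = source
        self.submitted = []

    def submit(self, node, sub_task):
        # A real module would call the provider's API here.
        self.submitted.append((node, sub_task))


def dispatch(assignments, api_modules, sub_tasks):
    """Send one sub-task to each assigned node via that node's source module.

    assignments: list of (node, source) pairs, one per sub-task.
    Returns the number of sub-tasks submitted per source.
    """
    if len(assignments) != len(sub_tasks):
        raise ValueError("expected exactly one sub-task per assigned node")
    counts = defaultdict(int)
    for (node, source), sub_task in zip(assignments, sub_tasks):
        api_modules[source].submit(node, sub_task)
        counts[source] += 1
    return dict(counts)
```

Keying the modules by source keeps the executor logic independent of how any one source is actually reached.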
Also, in the exemplary embodiment,executor module312 periodicallypolls instance resources602 to check for completion of the assigned sub-tasks related tojob model instance504. Executor module aggregates results from multiple sub-tasks and returns the aggregated results back to job scheduler/optimizer module310.Executor module312 receivesresults data606 directly frominstance resource602, i.e., from the individual server that executed a portion ofjob model instance504. Alternatively,executor module312 receivesresults data606 from astorage manager603 or sharedstorage604, described below. For example, ifjob model instance504 was assigned to 10instance resources602, thenexecutor module312 distributes sub-tasks to each of the 10instance resources602, and subsequently polls them until completion. Onceresults data606 from all 10instance resources602 are collected, they are aggregated and returned to job scheduler/optimizer module310. In some scenarios, job scheduler/optimizer module310, depending on the type of job, analyzes the aggregated result ofjob model instance504 and returns the result214 (shown inFIG. 3). In other scenarios, job scheduler/optimizer module310 may analyze the aggregated result ofjob model instance504, but then execute a further one or morejob model instances504 before returning afinal result214. The result of the firstjob model instance504 may be used in the subsequent one or morejob model instances504. In the exemplary embodiment, job scheduler/optimizer returns the aggregated result to request module304 (shown inFIG. 3). Alternatively, job scheduler/optimizer module310 returns results ofjob model instance504 to user202 (shown inFIG. 3).
Further, in some embodiments, executor module 312 may monitor each instance resource 602 for any failure associated with its assigned sub-task related to job model instance 504. Such failures include, for example, and without limitation, a run-time error during execution, or an operating system failure of the instance resource 602 itself. Upon recognizing a failure, executor module 312 may restart the sub-task related to job model instance 504 on the original instance resource 602, or may reassign the sub-task to an alternate instance resource 602. In other embodiments, executor module 312 may be configured as a second layer of fault tolerance, allowing a cloud service provider to deliver the first layer of fault tolerance through their own proprietary mechanisms, and only implementing the above-described mechanisms if executor module 312 senses failure of the cloud service provider's fault tolerance mechanism.
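The restart-then-reassign behavior may be sketched as follows, without limitation. The sketch assumes that a failure surfaces as a `RuntimeError` raised by a caller-supplied `run` callable, and that the retry limit is a fixed number of attempts per resource; both are assumptions made for illustration.

```python
def run_with_failover(sub_task, resources, run, max_attempts_per_resource=2):
    """Try a sub-task on each candidate resource in turn.

    The sub-task is restarted on the same resource up to the attempt limit,
    then reassigned to the next resource in the list. Returns a pair of
    (resource used, result).
    """
    for resource in resources:
        for _ in range(max_attempts_per_resource):
            try:
                return resource, run(resource, sub_task)
            except RuntimeError:
                continue  # restart on the same resource, or fall through
    raise RuntimeError("sub-task failed on every candidate resource")
```

The first resource in the list plays the role of the original instance resource, and the remainder play the role of alternates.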
Further, in the exemplary embodiment, resource module 314 performs the task of provisioning and releasing computing resources. In operation, request 204 may require system 201 to utilize more computing resources than are currently provisioned and available. Resource module 314 utilizes API modules 313 to provision new nodes upon demand, as described above. Resource module 314 also releases computing resources when they are no longer required. In some embodiments, resource module 314 may release resources from instance resources 602 based on demand, or cost. For example, and without limitation, resource module 314 may release a node during a time of the day where peak demand increases the cost of the instance resource 602 based on time constraints and cost constraints of request 204, and may reacquire the instance resource 602 when the peak demand period has ended.
Moreover, in some embodiments,system201 includes astorage manager603 and sharedstorage604. Sharedstorage604 may be, without limitation, private storage or cloud storage. Sharedstorage604 is accessible by computingresources206 in such a way as to allowcomputing resources206 to storedata606 associated with execution ofjob model instances504. Sharedstorage604 may be used to storedata606, such as, without limitation, model information, model input data, and execution results. Sharedstorage604 may also be accessible bystorage manager603, which may act to passdata606 regarding the execution results back throughsystem201.Storage manager603 may also allocate sharedstorage604 to computingresources206, and may allocate sharedstorage604 based on a request fromexecutor module312 or job scheduler/optimizer module310.
FIG. 7 is a block diagram of an exemplary method 700 of automatic model identification and creation by provisioning heterogeneous computing resources 206 for Machine Learning using system 201 (shown in FIG. 3). Method 700 is implemented by at least one computing system 120 including at least one processor 152 (shown in FIG. 1) and at least one memory device 150 (shown in FIG. 1) coupled to the at least one processor 152. An execution request 204 is received 702.
Also, in the exemplary embodiment, one or more algorithms are selected 704 from library 308. Each algorithm in library 308 includes one of source code and machine-executable code. Selecting 704 a subset of algorithms is based at least partially on execution request 204. One or more execution jobs, e.g., job model instances 504, are identified 706 for execution. Each of the one or more execution jobs includes at least one algorithm from the library 308.
Further, in the exemplary embodiment, a subset of computing resources is determined708 from a plurality ofcomputing resources206. Plurality ofcomputing resources206 includes one of at least one internal computing resource, i.e.,private cloud210 andinternal computing resource212, and at least one third-party computing resource, i.e.,public cloud208, and a plurality of third-party computing resources, i.e.,public cloud208.Computing system120 transmits710 at least one of the one or more execution jobs to at least onecomputing resource206 of the subset of computing resources, and receives712 anexecution result214.
FIG. 8 is a block diagram of anotherexemplary method800 of automatic model identification and creation by provisioningheterogeneous computing resources206 for Machine Learning using system201 (shown inFIG. 3).Method800 is implemented by at least onecomputing system120 including at least one processor152 (shown inFIG. 1) and at least one memory device150 (shown inFIG. 1) coupled to the at least oneprocessor152. Ajob request204 comprising one or more individual jobs is identified802. Forjob request204, one or more computing resource requirements are identified804.
Also, in the exemplary embodiment, an execution set of computing resources is determined806 from a pool of computing resources based at least partially on the one or more computing resource requirements.Computing resources206 includes one of at least one internal computing resource, i.e.,private cloud210 andinternal computing resource212, and at least one external computing resource, i.e.,public cloud208, and a plurality of external computing resources, i.e.,public clouds208. Each computing resource of the pool of computing resources defines an associated API that facilitates communication between system201 (shown inFIG. 3) and the computing resource. From the execution set of computing resources, a first computing resource is assigned808 to a first individual job of the one or more individual jobs, e.g., job model instances504 (shown inFIGS. 5-6).
Further, in the exemplary embodiment, a plurality ofinterface modules313 is identified810. Each interface module is configured to facilitate communication with one ormore computing resources206 using the associated API. An interface module is selected812 from plurality ofinterface modules313 based at least in part on facilitating communication with the first computing resource.Computing system120 transmits814 the first individual job for execution to the first computing resource using the first interface module, and receives816 an execution result. As used herein, the term “interface modules” refers to API modules.
FIGS. 9-11 show a diagram of an exemplary database table structure900 for system201 (shown inFIG. 2) in three parts. Each element inFIGS. 9-11 represents a separate table indatabase164, and the contents of each element show the table name and the table structure, including field names and data types. The interconnections between elements indicate at least a relation between the two tables, such as, without limitation, a common field. In operation, each table is utilized by one or more of the components ofsystem201 to track and process the various stages of execution of job request204 (shown inFIG. 2). The relationships between the tables and the components ofsystem201 are described below.
FIG. 9 is a block diagram showing a first portion of exemplary database table structure 900 for system 201 (shown in FIG. 3), showing the primary tables used by request module 304 (shown in FIG. 3). Request 916 is a high level table containing information regarding requests 204 (shown in FIG. 3). Detailed information for request 204 is stored in Request Info 913, and includes, without limitation, information regarding performance criteria, computing resource preferences and limitations, model names, input and output files, wrapper files, model files, and information about data sensitivity and encryption. Job 914 is a table containing information relating to processing request 204. Job 914 ties together information from Tasks 908 and Request 916, and is used to initiate further processing by system 201. A single request 204 may generate one or more entries in Job 914. Models 903 is a table that maintains a library of machine learning models available to system 201. Tasks 908 is a table that maintains task types for the variety of models that system 201 can handle. Task Models 901 is a table that associates Models 903 with their respective task types.
In operation, in the exemplary embodiment, user202 (shown inFIG. 2) submitsrequest204 tosystem201. New requests302 are received and processed by request module304 (shown inFIG. 3). Upon receivingrequest204,request module304 creates a new row inRequest916, and a new row inRequest Info913. Information associated withrequest204 is stored inRequest Info913.Request204 may specify whichModel903user202 wants to be used. Alternatively,user202 may specify a task type, from whichsystem201 executes one ormore Models903 associated with that task type.Request module304 then creates a new row inJob914. The creation of theJob914 entries serves as an avenue of communication to job scheduler/optimizer module310 (shown inFIG. 3). When job scheduler/optimizer module310 notices new entries inJob914 table, job scheduler/optimizer module310 will continue processing.
FIG. 10 is a block diagram showing a second portion of the exemplary database structure 900 for system 201 (shown in FIG. 3), showing the primary tables used by job scheduler/optimizer module 310 (shown in FIG. 3). A Job Model 915 table includes information about jobs that need to be executed to complete request 204 (shown in FIG. 3). Each entry in Job Model 915 is associated with a single entry in Job 914 table, as well as a single entry in Models 903. A Job Model Instance 911 table includes information related to entries in Job Model 915, further refining restrictions of Job Model 915 based on, for example, and without limitation, computing resource limitations based on the particular model, and computing resource limitations based on request 204 (shown in FIG. 3). In the exemplary embodiment, Job Model Instance 911 includes a single row for each row in Job Model 915. Alternatively, a single row in Job Model 915 may result in multiple Job Model Instances 911.
In operation, in the exemplary embodiment, job scheduler/optimizer module 310 (shown in FIG. 3) notices that a new, unprocessed row has appeared in Job 914. Job scheduler/optimizer module 310 selects n models 903, and creates n new rows in Job Model 915 table. Each of these new rows in Job Model 915 represents a sub-task, affiliated with an individual model from Models 903, that needs to be executed to complete request 204. Job scheduler/optimizer module 310 then creates n rows in Job Model Instance 911 table, each correlating to one of the n new rows in Job Model 915. Job scheduler/optimizer module 310 considers and formulates constraints for each Job Model 915 when creating Job Model Instance 911. The creation of the Job Model Instance 911 entries serves as an avenue of communication to resource module 314 (shown in FIG. 3) and executor module 312 (shown in FIG. 3). When resource module 314 notices new entries in Job Model Instance 911, resource module 314 will continue processing.
FIG. 11 is a block diagram showing a third portion of the exemplary database structure900 for system201 (shown inFIG. 3), showing the primary tables used by executor module (shown inFIG. 3) and resource module (shown inFIG. 3). Information about computing resources206 (shown inFIG. 3) is maintained by three tables, ComputeResources912,Resource Instance920, andInstance Resource919.Compute Resources912 includes high-level information about sources of computing resources.Resource Instance920 provides details regarding each individual computing resource currently provisioned to or otherwise available for use bysystem201.Instance Resource919 tracks allocation ofResource Instances920. In the exemplary embodiment, eachResource Instance920 has a corresponding row inInstance Resource919 table whenever theResource Instance920 is assigned to perform work. Alternatively, a row inInstance Resource919 table is created when theResource Instance920 starts to perform assigned work.
In operation, in the exemplary embodiment, each cloud service provider with which system 201 is configured to act has a row in Compute Resources 912. Each private cloud or internal resource may also have rows in Compute Resources 912. For example, and without limitation, Compute Resources 912 table may have an entry for Amazon EC2®, Rackspace®, Terremark®, a private internal cloud, and internal computing resources. Each row represents a source of computing services with which system 201 is configured to interact. Resource Instance 920 has a row for each individual computing device currently provisioned to or otherwise available for use by system 201. Each Resource Instance 920 will have a “parent” Compute Resource 912 associated with it, based on which cloud service provider, or other source, the Resource Instance 920 comes from. For example, and without limitation, when system 201 provisions 10 virtual servers from Amazon EC2®, system 201 will create 10 entries in Resource Instance 920, each of which corresponds to a single Amazon EC2® virtual server. In the exemplary embodiment, for cloud resources, these rows are created and deleted as system 201 provisions and releases virtual servers from the Cloud Service Providers. Alternatively, rows may remain in the table despite release of the row's associated virtual server.
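For illustration only, the relationship among the three resource tables may be sketched with an in-memory SQLite database. The column names, the provider label, and the helper functions are hypothetical; FIG. 11 defines the actual field names and data types. The sketch follows the embodiment in which rows are deleted upon release.

```python
import sqlite3

# Hypothetical, reduced versions of the three tables described above.
SCHEMA = """
CREATE TABLE compute_resources (
    id INTEGER PRIMARY KEY,
    provider TEXT NOT NULL
);
CREATE TABLE resource_instance (
    id INTEGER PRIMARY KEY,
    compute_resource_id INTEGER NOT NULL REFERENCES compute_resources(id),
    hostname TEXT NOT NULL
);
CREATE TABLE instance_resource (
    id INTEGER PRIMARY KEY,
    resource_instance_id INTEGER NOT NULL REFERENCES resource_instance(id),
    job_model_instance_id INTEGER NOT NULL
);
"""


def add_provider(conn, provider):
    """Register a source of computing resources; returns its row id."""
    cur = conn.execute(
        "INSERT INTO compute_resources (provider) VALUES (?)", (provider,)
    )
    conn.commit()
    return cur.lastrowid


def provision(conn, provider_id, hostnames):
    """Create one resource_instance row per newly provisioned server."""
    conn.executemany(
        "INSERT INTO resource_instance (compute_resource_id, hostname) "
        "VALUES (?, ?)",
        [(provider_id, host) for host in hostnames],
    )
    conn.commit()


def release(conn, provider_id):
    """Delete the rows for servers released back to the provider."""
    conn.execute(
        "DELETE FROM resource_instance WHERE compute_resource_id = ?",
        (provider_id,),
    )
    conn.commit()


def count_instances(conn, provider_id):
    row = conn.execute(
        "SELECT COUNT(*) FROM resource_instance WHERE compute_resource_id = ?",
        (provider_id,),
    ).fetchone()
    return row[0]
```

Provisioning 10 servers from one provider in this sketch produces 10 child rows that share the same parent row, mirroring the example above.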
Also in operation, in the exemplary embodiment, the Resource Instances 920 are assigned to perform work, i.e., they are assigned to execute Job Model Instances 911. The Instance Resource 919 table tracks the assignment of Resource Instances 920 to Job Model Instances 911. When a new Job Model Instance 911 is added, resource module 314 assigns a Resource Instance 920 to the Job Model Instance 911. Resource module 314 assigns Resource Instance 920 to Job Model Instance 911 based on information in Job Model Instance 911. Alternatively, executor module 312 or resource module 314 creates or updates a row in Instance Resource 919 associated with the Resource Instance 920.
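The assignment tracking described above can be sketched as follows, again using an in-memory SQLite database. This is an illustrative sketch only: the schema, the `assign` function, and the use of an operating system requirement as the matching criterion are assumptions, since the description states only that the assignment is based on information in the Job Model Instance 911.

```python
import sqlite3

# Hypothetical schema; the `os` and `required_os` columns are assumed
# attributes chosen only to make the matching step concrete.
SCHEMA = """
CREATE TABLE resource_instance (
    id INTEGER PRIMARY KEY,
    os TEXT NOT NULL
);
CREATE TABLE job_model_instance (
    id          INTEGER PRIMARY KEY,
    required_os TEXT NOT NULL
);
CREATE TABLE instance_resource (
    resource_instance_id  INTEGER NOT NULL UNIQUE
                          REFERENCES resource_instance(id),
    job_model_instance_id INTEGER NOT NULL UNIQUE
                          REFERENCES job_model_instance(id)
);
"""


def assign(conn, job_id):
    """Assign the lowest-numbered idle, matching resource instance to a job.

    Returns the resource instance id, or None when no idle instance
    satisfies the job's (assumed) operating system requirement.
    """
    job = conn.execute(
        "SELECT required_os FROM job_model_instance WHERE id = ?",
        (job_id,)).fetchone()
    if job is None:
        raise ValueError(f"unknown job model instance {job_id}")

    # An instance is idle when it has no row in instance_resource.
    free = conn.execute(
        """
        SELECT ri.id
        FROM resource_instance AS ri
        LEFT JOIN instance_resource AS ir
               ON ir.resource_instance_id = ri.id
        WHERE ir.resource_instance_id IS NULL AND ri.os = ?
        ORDER BY ri.id
        LIMIT 1
        """,
        (job[0],)).fetchone()
    if free is None:
        return None

    # Record the allocation, mirroring a row in Instance Resource 919.
    conn.execute(
        "INSERT INTO instance_resource "
        "(resource_instance_id, job_model_instance_id) VALUES (?, ?)",
        (free[0], job_id))
    return free[0]
```

The UNIQUE constraints in this sketch enforce one job per instance at a time, which is a simplifying assumption rather than a requirement stated in the description.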
Also, in the exemplary embodiment, shared storage 604 (shown in FIG. 7) may be assigned for use by Resource Instances 920. A Storage Resources 906 table includes high-level information about storage resource providers available to system 201. A Storage Instances 907 table includes information about individual storage instances that have been provisioned by or assigned for use by system 201. In operation, the execution of a Job Model Instance 911 may require use of a Storage Instance 907. Storage manager 603 (shown in FIG. 7) assigns a Storage Instance 907, i.e., shared storage 604 (shown in FIG. 7), to the Job Model Instance 911 for use during execution.
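The storage assignment performed by storage manager 603 can be sketched in a similar way. The class names, fields, and the least-loaded selection policy below are hypothetical; the description does not state how a Storage Instance 907 is chosen, only that one is assigned to the Job Model Instance 911 for use during execution. Because the storage is shared, this sketch allows several jobs to use the same storage instance.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class StorageResource:
    """Stand-in for a row in Storage Resources 906 (a storage provider)."""
    name: str


@dataclass
class StorageInstance:
    """Stand-in for a row in Storage Instances 907."""
    instance_id: int
    provider: StorageResource
    jobs: Set[int] = field(default_factory=set)  # jobs sharing this storage


class StorageManager:
    """Illustrative storage manager; the selection policy is an assumption."""

    def __init__(self, instances: List[StorageInstance]):
        self.instances = list(instances)
        self.by_job: Dict[int, StorageInstance] = {}

    def assign(self, job_id: int) -> Optional[StorageInstance]:
        """Assign the least-loaded storage instance to a job.

        Ties are broken by the lowest instance id. Returns None when
        no storage instances are available at all.
        """
        if job_id in self.by_job:
            return self.by_job[job_id]
        if not self.instances:
            return None
        chosen = min(self.instances,
                     key=lambda s: (len(s.jobs), s.instance_id))
        chosen.jobs.add(job_id)
        self.by_job[job_id] = chosen
        return chosen

    def release(self, job_id: int) -> None:
        """Drop the assignment once the job finishes executing."""
        inst = self.by_job.pop(job_id, None)
        if inst is not None:
            inst.jobs.discard(job_id)
```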
The above-described systems and methods provide ways to automatically provision computing resources from a heterogeneous set of computing resources for purposes of Machine Learning. The embodiments described herein take a request from a user, select, from a database of models, a subset of models that meet the performance requirements specified in the user's request, and search for a single best model or best combination of a series of models. The search process is performed by breaking up the model space into individual job components consisting of one or more models, with each model having multiple individual instances using that model. The division of the user's request into discrete units of work allows the system to leverage multiple computing resources in processing the request. The system leverages many different sources of computing resources, including cloud computing resources from various cloud providers as well as private clouds or internal computing resources. The system also leverages different types of computing resources, such as computing resources differing in underlying operating system and hardware architecture. The ability to leverage multiple sources of computing resources, as well as multiple types of computing resources, gives the system greater flexibility and computational capacity. The combination of automation, flexibility, and capacity makes analysis of large search spaces feasible where, before, it was a manual, time-consuming process. The system also includes constraint features that allow a user to customize a request by restricting which types of computing resources it leverages, or how many computing resources it leverages.
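The division of a request into discrete units of work can be illustrated with a short sketch. It assumes, purely for illustration, that each model's instances are distinguished by parameter settings drawn from a per-model grid; the function names and the grid representation are hypothetical and are not taken from the description above.

```python
from itertools import product


def expand_model(model_name, param_grid):
    """Expand one model into its individual instances.

    `param_grid` maps a parameter name to the list of values to try
    (an assumed representation). One instance is produced for each
    combination of parameter values.
    """
    keys = sorted(param_grid)
    return [
        {"model": model_name, "params": dict(zip(keys, values))}
        for values in product(*(param_grid[k] for k in keys))
    ]


def build_job(models):
    """Flatten a set of selected models into independent units of work.

    `models` maps a model name to its parameter grid. Each returned
    entry can be executed on a separate computing resource.
    """
    instances = []
    for name in sorted(models):
        instances.extend(expand_model(name, models[name]))
    return instances
```

Because each entry returned by `build_job` is independent of the others, the entries can be distributed across whatever computing resources are available, which is the property the description relies on.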
An exemplary technical effect of the methods and systems described herein includes at least one of: (a) insulating the requesting user from the computational details of resource allocation; (b) leveraging different sources and types of computing resources for execution of the user's computational work; (c) leveraging distributed computing, from both internal and internet-based cloud computing providers, for processing a user's Machine Learning or other computational problems; (d) increasing flexibility and computational capacity available to users; (e) reducing man-hours by automating the processing of a user's Machine Learning or other computational requests through the use of a models database; and (f) increasing scalability to a particular problem's data size and computational complexity.
Exemplary embodiments of systems and methods for automated provisioning of heterogeneous computing resources for Machine Learning are described above in detail. The systems and methods described herein are not limited to the specific embodiments described herein, but rather, components of systems and/or steps of the methods may be utilized independently and separately from other components and/or steps described herein. For example, the methods may also be used in combination with other systems requiring distributed computing systems and methods, and are not limited to practice with only the automatic model identification and creation with high scalability systems and methods as described herein. Rather, the exemplary embodiments can be implemented and utilized in connection with many other concept extraction applications.
Although specific features of various embodiments may be shown in some drawings and not in others, this is for convenience only. In accordance with the principles of the systems and methods described herein, any feature of a drawing may be referenced and/or claimed in combination with any feature of any other drawing.
This written description uses examples to disclose the invention, including the best mode, and also to enable any person skilled in the art to practice the invention, including making and using any devices or systems and performing any incorporated methods. The patentable scope of the invention is defined by the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they have structural elements that do not differ from the literal language of the claims, or if they include equivalent structural elements with insubstantial differences from the literal language of the claims.