Simplify HPC and Batch workloads on Azure
Azure/batch-shipyard
This toolkit is no longer actively maintained. The `develop` branch has proposed fixes for outstanding issues, but they will not be merged back to `master`. Please see the main Azure Batch GitHub repository for more information about Azure Batch.
Batch Shipyard is a tool to help provision, execute, and monitor container-based batch processing and HPC workloads on Azure Batch. Batch Shipyard supports both Docker and Singularity containers. No experience with the Azure Batch SDK is needed; run your containers with easy-to-understand configuration files. All Azure regions are supported, including non-public Azure regions.
Additionally, Batch Shipyard provides the ability to provision and manage entire standalone remote file systems (storage clusters) in Azure, independent of any integrated Azure Batch functionality.
- Support for multiple container runtimes including Docker, Singularity, and Kata Containers tuned for Azure Batch compute nodes
- Automated deployment of container images required for tasks to compute nodes
- Support for container registries including Azure Container Registry for both Docker and Singularity images (ORAS), other Internet-accessible public and private registries, and support for the Sylabs Singularity Library and Singularity Hub
- Transparent support for GPU-accelerated container applications on both Docker and Singularity on Azure N-Series VM instances
- Transparent assist for running Docker and Singularity containers utilizing Infiniband/RDMA on HPC Azure VM instances including A-Series, H-Series, Hb/Hc-Series, and N-Series
- Integrated support for Singularity Encrypted Containers
- Comprehensive data movement support: move data easily between locally accessible storage systems, remote filesystems, Azure Blob or File Storage, and compute nodes
- Standalone Remote Filesystem Provisioning with integration to auto-link these filesystems to compute nodes with support for NFS and GlusterFS distributed network file system
- Automatic shared data volume support for linking to Remote Filesystems, Azure File via SMB, Azure Blob via blobfuse, GlusterFS provisioned directly on compute nodes, and custom Linux mount support (fstab)
- Support for automated on-demand, per-job distributed scratch space provisioning via BeeGFS BeeOND
- Automated, integrated resource monitoring with Prometheus and Grafana for Batch pools and RemoteFS storage clusters
- Support for Batch Insights
- Support for elastic cloud bursting on Slurm to Batch pools with automated RemoteFS shared file system linking
- Support for serverless execution binding with Azure Functions
- Support for credential management through Azure KeyVault
- Federation support: enables unified, constraint-based scheduling to collections of heterogeneous pools, including across multiple Batch accounts and Azure regions
- Support for simple, scenario-based pool autoscale and autopool to dynamically scale and control computing resources on-demand
- Support for Task Factories with the ability to generate tasks based on parametric (parameter) sweeps, randomized input, file enumeration, replication, and custom Python code-based generators
- Support for multi-instance tasks to accommodate MPI and multi-node cluster applications packaged as Docker or Singularity containers on compute pools with automatic job completion and task termination
- Seamless, direct high-level configuration support for popular MPI runtimes including OpenMPI, MPICH, MVAPICH, and Intel MPI with automatic configuration for Infiniband, including SR-IOV RDMA VM sizes
- Seamless integration with Azure Batch job, task and file concepts along with full pass-through of the Azure Batch API to containers executed on compute nodes
- Support for Azure Batch task dependencies allowing complex processing pipelines and DAGs
- Support for merge or final task specification that automatically depends on all other tasks within the job
- Support for job schedules and recurrences for automatic execution of tasks at set intervals
- Support for live job and job schedule migration between pools
- Support for Low Priority Compute Nodes
- Support for deploying Batch compute nodes into a specified Virtual Network and pre-defined public IP addresses
- Automatic setup of SSH or RDP users to all nodes in the compute pool and optional creation of SSH tunneling scripts to Docker Hosts on compute nodes
- Support for custom host images including Shared Image Gallery
- Support for Windows Containers on compliant Windows compute node pools with the ability to activate Azure Hybrid Use Benefit if applicable
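
The Task Factories and merge task items in the list above can be pictured with a small Python sketch. This is not Batch Shipyard's configuration syntax or API: the function names (`parametric_sweep`, `add_merge_task`), the task dictionary layout, and the `echo` command template are all hypothetical, chosen only to illustrate how a parameter sweep expands into individual tasks and how a merge task ends up depending on every other task in the job.

```python
from itertools import product


def parametric_sweep(command_template, *ranges):
    """Expand a command template over the Cartesian product of ranges.

    Hypothetical helper: it only illustrates the idea of a parametric
    sweep and does not mirror any real Batch Shipyard function.
    """
    tasks = []
    for index, values in enumerate(product(*ranges)):
        tasks.append({
            "id": f"task-{index:05d}",
            "command": command_template.format(*values),
            "depends_on": [],
        })
    return tasks


def add_merge_task(tasks, command):
    """Return a new task list with a final task depending on all others."""
    merge = {
        "id": "merge-task",
        "command": command,
        "depends_on": [task["id"] for task in tasks],
    }
    return tasks + [merge]


if __name__ == "__main__":
    # Sweep two illustrative parameters: 3 x 2 = 6 generated tasks.
    swept = parametric_sweep("echo {0} {1}", range(0, 3), ("a", "b"))
    job = add_merge_task(swept, "echo merge")
    for task in job:
        print(task["id"], task["command"], task["depends_on"])
```

In this sketch the sweep is a plain Cartesian product, so the number of generated tasks is the product of the range lengths, and the merge task simply lists every generated task id as a dependency.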
Please see the installation guide for more information regarding the various local installation options and requirements.
Batch Shipyard is integrated directly into Azure Cloud Shell and you can execute any Batch Shipyard workload using your web browser or the Microsoft Azure Android and iOS app.
Simply request a Cloud Shell session and type `shipyard` to invoke the CLI; no installation is required. Try Batch Shipyard now in your browser.
Please refer to the Batch Shipyard Documentation on Read the Docs.
Visit the Batch Shipyard Recipes section for various sample container workloads using Azure Batch and Batch Shipyard.
Batch Shipyard is currently compatible with popular Azure Batch supported Marketplace Linux VMs, compliant Linux custom images, and native Azure Batch Windows Server with Containers VMs. Please see the platform image support documentation for more information specific to Batch Shipyard support of compute node host operating systems.
Please see the Change Log for project history.
Please see this project's Code of Conduct and Contributing guidelines.