Best practices for using sole-tenant nodes to run VM workloads

When you're planning to run VM workloads on sole-tenant nodes, you must first decide how to deploy sole-tenant nodes. In particular, you have to decide how many node groups you need, and which maintenance policy the node groups should use:

  • Node groups: To choose the right number of node groups, you must weigh availability against resource utilization. While a small number of node groups lets you optimize resource utilization and cost, it limits you to a single zone. Deploying node groups across multiple zones lets you improve availability, but can result in lower resource utilization.

  • Autoscaling and maintenance policies: Depending on the licensing requirements of the operating systems and software that you're planning to run, node autoscaling and your choice of maintenance policy can impact your licensing cost and availability.

To make the right decision on how to use sole-tenant nodes, you must carefully consider your requirements.

Assessing requirements

The following sections list several requirements that you should consider before deciding how many node groups you need, and which maintenance policy the node groups should use.

BYOL and per-core licensing

If you're planning to bring your own license (BYOL) to Compute Engine, sole-tenant nodes can help you address the hardware requirements or constraints imposed by these licenses.

Some software and operating systems, such as Windows Server, can be licensed by physical CPU core and might define limits on how frequently you are allowed to change the hardware underlying your virtual machines. Node autoscaling and the default maintenance policy can lead to hardware being changed more often than your license terms permit, which can result in additional licensing fees.

To optimize for per-core BYOL, consider the following best practices:

  • Find a balance between optimizing infrastructure cost and licensing cost: To calculate the overall cost of running BYOL workloads on Compute Engine, you must consider both infrastructure cost and licensing cost. In some cases, minimizing infrastructure cost might increase licensing overhead, or vice versa. For example, using a node type with a high number of cores might be best from a cost/performance standpoint for certain workloads, but could lead to additional licensing cost if licenses are priced by core.

  • Use separate node groups for BYOL and non-BYOL workloads: To limit the number of cores you need to license, avoid mixing BYOL and non-BYOL workloads in a single node group and use separate node groups instead.

    If you use multiple different BYOL licensing models (for example, Windows Server Datacenter and Standard), splitting node groups by licensing model can help simplify license tracking and reduce licensing cost.

  • Use CPU overcommit and node types with a high memory-to-core ratio: Node types differ in their ratio between sockets, cores, and memory. Using a node type with relatively more memory per core and enabling CPU overcommit can help reduce the number of cores that you need to license.

  • Avoid scale-in autoscaling: Node group autoscaling lets you automatically grow or shrink a node group based on current demand. Frequent growing and shrinking of a node group implies that you're frequently changing the hardware that your VMs run on.

    If you're changing hardware more frequently than you're allowed to move licenses between physical machines, these hardware changes can lead to a situation where you have to license more cores than you're actually using.

    If there are limits on how frequently you're allowed to move licenses between physical machines, you can avoid excessive licensing cost by disabling autoscaling or by configuring autoscaling to only scale out.

  • Don't use the default maintenance policy: The default maintenance policy lets you optimize for VM availability, but can cause frequent hardware changes. To minimize hardware changes and optimize for licensing cost, use the migrate within node group or restart in place maintenance policy instead, as shown in the sketch after this list.
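
    As an illustration, a node group for per-core licensed workloads could combine the migrate within node group maintenance policy with scale-out-only autoscaling. The following gcloud sketch uses placeholder names, node type, region, zone, and sizes:

    # Create a node template (node type and region are placeholders).
    gcloud compute sole-tenancy node-templates create byol-template \
        --node-type=n2-node-80-640 \
        --region=us-central1

    # Create a node group that only scales out and that live-migrates VMs
    # within the group during host maintenance, so that licensed cores
    # aren't repeatedly released and re-acquired on new hardware.
    gcloud compute sole-tenancy node-groups create byol-group \
        --node-template=byol-template \
        --target-size=2 \
        --zone=us-central1-a \
        --maintenance-policy=migrate-within-node-group \
        --autoscaler-mode=only-scale-out \
        --min-nodes=2 \
        --max-nodes=6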

For workloads that don't require per-core licensing, these constraints don't apply: you can instead use node group autoscaling and the default maintenance policy to optimize for availability and resource utilization.

Management

When you have more than one workload, or when you have both development and production workloads that need to run on sole-tenant nodes, consider the following best practices:

  • Use separate node groups for development and production environments: Using separate node groups helps you isolate environments from one another and avoid situations such as the following:

    • A development VM might impact the performance of production VMs by excessively consuming CPU, disk, or network resources ("noisy neighbor").
    • A development workload might exhaust the capacity of a node group, preventing the creation of new production VMs.
  • Limit the number of node groups in each environment: If you run multiple node groups, it can be difficult to fully utilize each node group. To optimize utilization, combine workloads of each environment and schedule them on a small number of node groups.

  • Use dedicated projects for managing node groups: For each environment, create a dedicated project to manage node groups. Then share the node groups with projects that contain workloads.

    This approach lets you simplify access control by using a separate project for each workload while also letting you optimize resource utilization by sharing node groups across workloads.

  • Share node groups with individual projects: Instead of sharing a node group with an entire organization, share it with individual projects only, as shown in the sketch after this list. Selecting projects individually helps you maintain a separation between environments, and avoids disclosing information about node groups to other projects.

  • Establish a process for internal cost attribution: The cost for running a node group is incurred in the project that contains the node group. If you need to attribute this cost to individual projects or workloads, consider tracking the usage of your sole-tenant VMs and using this data to perform internal cost attribution.
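
    For example, a node group in a dedicated management project could be shared with two specific workload projects. The following sketch assumes the --share-setting and --share-with flags; the group, template, zone, and project IDs are placeholders:

    # Run this in the dedicated node-group management project.
    gcloud compute sole-tenancy node-groups create prod-group \
        --node-template=prod-template \
        --target-size=2 \
        --zone=us-central1-a \
        --share-setting=projects \
        --share-with=workload-project-1,workload-project-2

    # VMs in the listed workload projects can then be scheduled on the
    # shared node group by specifying node affinity for it.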

Availability

Your workloads might differ in their availability requirements, and in whether high availability can be achieved on the application layer or whether it needs to be implemented on the infrastructure layer:

  • Clustered applications: Some of your workloads might implement high availability in the application by using clustering techniques such as replication and load balancing.

    Examples of such workloads include Active Directory domain controllers, SQL Server Failover Cluster Instances and Availability Groups, or clustered, load-balanced applications running in IIS.

    Clustered applications can typically sustain individual VM outages as long as the majority of VMs remain available.

  • Non-clustered applications: Some of your workloads might not implement any clustering capabilities and instead require that the VM itself remains available.

    Examples of such workloads include non-replicated database servers or stateful application servers.

    To optimize the availability of individual VMs, you need to ensure that the VM can be live-migrated in case of an upcoming node maintenance event.

    Live migration is supported by the default maintenance policy and the migrate within node group maintenance policy, but isn't supported if you use the restart in place maintenance policy.

  • Non-critical applications: Batch workloads, development/test workloads, and other lower-priority workloads might have no particular availability requirements. For these workloads, it might be acceptable if individual VMs are unavailable during node maintenance.

To accommodate the availability requirements of your workloads, consider the following best practices:

  • Use node groups in different zones or regions to deploy clustered workloads: Sole-tenant nodes and node groups are zonal resources. To protect against zonal outages, deploy multiple node groups in different zones or regions. Use node affinity to schedule VMs so that each instance of your clustered application runs on a different node in a different zone or region.

    If two or more of your node groups use the default or restart in place maintenance policy, configure maintenance windows so that they are unlikely to overlap.

    If multiple instances of your clustered application must run in a single zone, use anti-affinity to ensure that the VM instances are scheduled on different nodes or node groups, as shown in the sketch after this list.

  • Avoid the restart in place maintenance policy for non-clustered workloads that require high availability: Because the restart in place maintenance policy shuts down VMs when the underlying node requires maintenance, prefer using a different maintenance policy for node groups that run non-clustered workloads that require high availability.

  • Use managed instance groups to increase resilience and availability of workloads: You can further increase the resilience and availability of your deployment by using managed instance groups to monitor the health of your workloads and to automatically recreate VM instances if necessary. You can use managed instance groups for both stateless and stateful workloads.
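
    The following sketch illustrates both approaches. The instance names, node group names, zones, and the node name in the anti-affinity file are placeholders:

    # Pin each cluster node to a node group in a different zone.
    gcloud compute instances create cluster-node-1 \
        --zone=us-central1-a \
        --node-group=cluster-group-a

    gcloud compute instances create cluster-node-2 \
        --zone=us-central1-b \
        --node-group=cluster-group-b

    # If a third cluster node must run in the first zone as well, an
    # anti-affinity file can keep it off the physical node that already
    # hosts cluster-node-1 (node name is a placeholder).
    # Contents of anti-affinity.json:
    #
    #   [
    #     {
    #       "key": "compute.googleapis.com/node-name",
    #       "operator": "NOT_IN",
    #       "values": ["cluster-group-a-node-001"]
    #     }
    #   ]
    gcloud compute instances create cluster-node-3 \
        --zone=us-central1-a \
        --node-affinity-file=anti-affinity.json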

Performance

Your workloads might differ in their sensitivity to performance fluctuations. For certain internal applications or test workloads, optimizing cost might be more important than ensuring consistent performance throughout the day. For other workloads, such as externally facing applications, performance might be critical and more important than resource utilization.

To make the best use of your sole-tenant nodes, consider the following best practices:

  • Use dedicated node groups and CPU overcommit for performance-insensitive workloads: CPU overcommit lets you increase the VM density on sole-tenant nodes and can help reduce the number of sole-tenant nodes required.

    To use CPU overcommit, you must use a node type that supports CPU overcommit. Enabling CPU overcommit for a node group causes additional charges per sole-tenant node.

    CPU overcommit can be most cost-effective if you use a dedicated node group for workloads that are suitable for CPU overcommit and enable CPU overcommit for this node group only, as shown in the sketch after this list. Leave CPU overcommit disabled for any node groups that need to run performance-sensitive workloads.

  • Use a node type with a high memory-to-core ratio for CPU overcommit: While CPU overcommit lets you share cores between VMs, it doesn't let you share memory between VMs. Using a node type that has relatively more memory per CPU core helps you ensure that memory doesn't become a limiting factor.

  • Use node autoscaling for performance-sensitive workloads: To accommodate varying resource needs for workloads that are performance-sensitive, configure your node group to use autoscaling.
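
    The following sketch combines these practices: one node group uses CPU overcommit for performance-insensitive workloads, while a second, autoscaled node group without overcommit serves performance-sensitive workloads. Template and group names, the node type, region, zone, and sizes are placeholders, and the node type must be one that supports CPU overcommit:

    # Node template with CPU overcommit enabled for performance-insensitive
    # workloads.
    gcloud compute sole-tenancy node-templates create overcommit-template \
        --node-type=n2-node-80-640 \
        --cpu-overcommit-type=enabled \
        --region=us-central1

    gcloud compute sole-tenancy node-groups create internal-apps-group \
        --node-template=overcommit-template \
        --target-size=1 \
        --zone=us-central1-a

    # Separate, autoscaled node group without overcommit (standard-template
    # is a placeholder for a template without overcommit) for
    # performance-sensitive workloads.
    gcloud compute sole-tenancy node-groups create frontend-group \
        --node-template=standard-template \
        --target-size=2 \
        --zone=us-central1-a \
        --autoscaler-mode=on \
        --min-nodes=2 \
        --max-nodes=10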

Deployment patterns

The best way to use sole-tenant nodes depends on your individual requirements. The following sections describe a selection of patterns that you can use as a starting point to derive an architecture that suits your requirements.

Multiple node groups for mixed performance requirements

If you have a combination of workloads that are performance-sensitive (for example, customer-facing applications) and performance-insensitive (for example, internal applications), you can use multiple node groups that use different node types:

  • One node group uses CPU overcommit and a node type with a 1:8 vCPU:memory ratio. This node group is used for performance-insensitive workloads.
  • A second node group uses a compute-optimized node type with a 1:4 vCPU:memory ratio without CPU overcommit. This node group is used for performance-critical workloads and is configured to scale up and down on demand.

Multi-zone high availability for clustered per-core licensed workloads

If you're running clustered workloads that use per-core licensing and need to minimize hardware changes, you can strike a balance between availability and licensing overhead by using multiple node groups with non-overlapping maintenance windows:

  • Multiple node groups are deployed across different zones or regions.
  • All node groups use the restart in place maintenance policy. The node groups use non-overlapping maintenance windows so that no more than one node group should experience maintenance-related outages at a time, as shown in the sketch after this list.
  • VM instances that run clustered workloads use affinity labels so that each cluster node is scheduled on a node group in a different zone.
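
As a sketch of this pattern, two node groups in different zones could use the restart in place maintenance policy with maintenance windows that start several hours apart. This assumes that the maintenance window is configured with the --maintenance-window-start-time flag; names, zones, and start times are placeholders:

    # Node group in the first zone, with a maintenance window starting
    # at 02:00 UTC.
    gcloud compute sole-tenancy node-groups create cluster-group-a \
        --node-template=cluster-template \
        --target-size=2 \
        --zone=us-central1-a \
        --maintenance-policy=restart-in-place \
        --maintenance-window-start-time=02:00

    # Node group in a second zone, with a window starting eight hours
    # later so that the two groups are unlikely to be in maintenance at
    # the same time.
    gcloud compute sole-tenancy node-groups create cluster-group-b \
        --node-template=cluster-template \
        --target-size=2 \
        --zone=us-central1-b \
        --maintenance-policy=restart-in-place \
        --maintenance-window-start-time=10:00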

Multi-zone high availability for mixed per-core licensed workloads

If you're using per-core licensing, but not all of your workloads are clustered, you can extend the previous pattern by using heterogeneous maintenance policies:

  • The primary node group is deployed in one zone and runs both clustered and non-clustered workloads. To minimize outages caused by hardware maintenance, the node group uses the migrate within node group maintenance policy.
  • One or more secondary node groups are deployed in additional zones or regions. These node groups use the restart in place maintenance policy, but use non-overlapping maintenance windows.
  • VM instances that run clustered workloads use affinity labels so that each cluster node is scheduled on a node group in a different zone.
  • VM instances that run non-clustered workloads use affinity labels so that they are deployed on the primary node group.

By scheduling only clustered workloads on the secondary node groups, you can ensure that the temporary outages caused by the restart in place maintenance policy have minimal impact on overall availability. At the same time, you limit licensing and infrastructure overhead by using the migrate within node group maintenance policy for the primary node group only.
