Take advantage of elasticity
This principle in the performance optimization pillar of the Google Cloud Well-Architected Framework provides recommendations to help you incorporate elasticity, which is the ability to adjust resources dynamically based on changes in workload requirements.
Elasticity allows different components of a system to scale independently. This targeted scaling can help improve performance and cost efficiency by allocating resources precisely where they're needed, without overprovisioning or underprovisioning your resources.
Principle overview
The performance requirements of a system directly influence when and how the system scales vertically or horizontally. You need to evaluate the system's capacity and determine the load that the system is expected to handle at baseline. Then, you need to determine how you want the system to respond to increases and decreases in the load.
When the load increases, the system must scale out horizontally, scale up vertically, or both. For horizontal scaling, add replica nodes to ensure that the system has sufficient overall capacity to fulfill the increased demand. For vertical scaling, replace the application's existing components with components that have more capacity, memory, and storage.
When the load decreases, the system must scale down (horizontally, vertically, or both).
Define the circumstances in which the system scales up or scales down. Plan to manually scale up systems for known periods of high traffic. For unpredictable changes, use tools like autoscaling, which responds automatically to increases or decreases in the load.
Recommendations
To take advantage of elasticity, consider the recommendations in the followingsections.
Plan for peak load periods
You need to plan an efficient scaling path for known events, such as expected periods of increased customer demand.
Consider scaling up your system ahead of known periods of high traffic. For example, if you're a retail organization, you expect demand to increase during seasonal sales. We recommend that you manually scale up or scale out your systems before those sales, or adjust existing limits ahead of time, to ensure that your system can immediately handle the increased load. Otherwise, the system might take several minutes to add resources in response to real-time changes. Your application's capacity might not increase quickly enough, which can cause some users to experience delays.
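As a back-of-the-envelope aid for this kind of planning, the following Python sketch estimates how many replicas to pre-provision for a known peak. The function name, request rates, per-replica capacity, and headroom factor are illustrative assumptions for this sketch, not values from this document.

```python
import math

def replicas_for_peak(peak_rps: float, rps_per_replica: float,
                      headroom: float = 1.2) -> int:
    """Estimate the replicas to pre-provision for an expected peak.

    A headroom factor greater than 1.0 leaves spare capacity so the
    system can absorb the load immediately, instead of waiting minutes
    for new resources to be added reactively.
    """
    return math.ceil(peak_rps * headroom / rps_per_replica)

# Example: a seasonal sale expected to peak at 12,000 requests/second,
# where each replica sustains about 500 requests/second.
print(replicas_for_peak(12_000, 500))  # 29
```

A calculation like this gives you a concrete target for the manual scale-up, rather than relying on reactive scaling during the event itself.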
For unknown or unexpected events, such as a sudden surge in demand or traffic, you can use autoscaling features to trigger elastic scaling that's based on metrics. These metrics can include CPU utilization, load balancer serving capacity, latency, and even custom metrics that you define in Cloud Monitoring.
For example, consider an application that runs on a Compute Engine managed instance group (MIG). This application has a requirement that each instance performs optimally until the average CPU utilization reaches 75%. In this example, you might define an autoscaling policy that creates more instances when the CPU utilization reaches the threshold. These newly created instances help absorb the load, which helps ensure that the average CPU utilization remains at an optimal rate until the maximum number of instances that you've configured for the MIG is reached. When the demand decreases, the autoscaling policy removes the instances that are no longer needed.
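The scale-out decision in this example can be sketched as target-tracking logic: choose the replica count that would bring average CPU utilization back to the target. The following Python snippet is a minimal illustration of that calculation under assumed parameters, not the actual autoscaler implementation.

```python
import math

def desired_replicas(current_replicas: int, avg_cpu: float,
                     target_cpu: float = 0.75,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Illustrative target-tracking calculation: pick the replica count
    that would bring average CPU utilization back to the target."""
    if current_replicas == 0:
        return min_replicas
    desired = math.ceil(current_replicas * avg_cpu / target_cpu)
    # Clamp to the group's configured size limits.
    return max(min_replicas, min(max_replicas, desired))

# At 90% average CPU across 4 instances, the sketch adds capacity:
print(desired_replicas(4, 0.90))   # 5
# When demand drops, instances that are no longer needed are removed:
print(desired_replicas(5, 0.30))   # 2
```

Note how the maximum replica limit caps scale-out, which mirrors the behavior described above: utilization stays near the target only until the configured maximum is reached.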
Plan resource slot reservations in BigQuery, or adjust the limits for autoscaling configurations in Spanner by using the managed autoscaler.
Use predictive scaling
If your system components include Compute Engine, you must evaluate whether predictive autoscaling is suitable for your workload. Predictive autoscaling forecasts the future load based on the historical trends of your metrics, such as CPU utilization. Forecasts are recomputed every few minutes, so the autoscaler rapidly adapts its forecast to very recent changes in load. Without predictive autoscaling, an autoscaler can only scale a group reactively, based on observed real-time changes in load. Predictive autoscaling works with both real-time data and historical data to respond to both the current and the forecasted load.
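The idea of combining a historical trend with real-time observations can be sketched as follows. This is a simplified illustration of the concept, assuming a plain linear trend fit over recent samples; it is not how the Compute Engine predictive autoscaler is actually implemented.

```python
def forecast_cpu(history: list[float], horizon: int = 1) -> float:
    """Fit a simple linear trend (least squares) to recent CPU samples
    and extrapolate `horizon` steps ahead."""
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    denom = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(xs, history)) / denom
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + horizon)

def predictive_cpu(history: list[float]) -> float:
    """Scale on whichever is higher: the observed load or the forecast,
    so that capacity is ready before a predicted increase arrives."""
    return max(history[-1], forecast_cpu(history))

# A steadily rising trend: the forecast leads the current observation,
# so the autoscaler can add capacity before utilization gets there.
print(round(predictive_cpu([0.50, 0.55, 0.60, 0.65, 0.70]), 2))  # 0.75
```

Taking the maximum of the observed and forecasted values captures the key property described above: the system responds to both the current and the predicted load.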
Implement serverless architectures
Consider implementing a serverless architecture with serverless services that are inherently elastic, such as Cloud Run. Unlike autoscaling in other services that require fine-tuning rules (for example, Compute Engine), serverless autoscaling is instant and can scale down to zero resources.
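To make the contrast concrete, request-driven serverless scaling can be sketched as sizing instances directly from in-flight requests, including scale-to-zero when there is no traffic. The concurrency model and the numbers below are illustrative assumptions for this sketch, not a specific service's implementation.

```python
import math

def serverless_instances(in_flight_requests: int,
                         max_concurrency: int = 80) -> int:
    """Each instance handles up to `max_concurrency` concurrent requests;
    with no traffic, the instance count drops to zero."""
    if in_flight_requests <= 0:
        return 0
    return math.ceil(in_flight_requests / max_concurrency)

print(serverless_instances(0))    # 0  (scaled to zero)
print(serverless_instances(250))  # 4
```

Because the instance count is derived directly from demand, there are no scaling rules to tune and no idle baseline to pay for.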
Use Autopilot mode for Kubernetes
For complex applications that require greater control over Kubernetes, consider Autopilot mode in Google Kubernetes Engine (GKE). Autopilot mode provides automation and scalability by default. GKE automatically scales nodes and resources based on traffic. GKE manages nodes, creates new nodes for your applications, and configures automatic upgrades and repairs.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-12-06 UTC.