Patterns for scalable and resilient apps

This document introduces some patterns and practices for creating apps that are resilient and scalable, two essential goals of many modern architecture exercises. A well-designed app scales up and down as demand increases and decreases, and is resilient enough to withstand service disruptions. Building and operating apps that meet these requirements requires careful planning and design.

Scalability: Adjusting capacity to meet demand

Scalability is the measure of a system's ability to handle varying amounts of work by adding or removing resources from the system. For example, a scalable web app is one that works well with one user or many users, and that gracefully handles peaks and dips in traffic.

The flexibility to adjust the resources consumed by an app is a key business driver for moving to the cloud. With proper design, you can reduce costs by removing under-utilized resources without compromising performance or user experience. You can similarly maintain a good user experience during periods of high traffic by adding more resources. In this way, your app can consume only the resources necessary to meet demand.

Google Cloud provides products and features to help you build scalable, efficient apps:

  • Compute Engine virtual machines and Google Kubernetes Engine (GKE) clusters integrate with autoscalers that let you grow or shrink resource consumption based on metrics that you define.
  • Google Cloud's serverless platform provides managed compute, database, and other services that scale quickly from zero to high request volumes, and you pay only for what you use.
  • Database products like BigQuery, Spanner, and Bigtable can deliver consistent performance across massive data sizes.
  • Cloud Monitoring provides metrics across your apps and infrastructure, helping you make data-driven scaling decisions.

Resilience: Designing to withstand failures

A resilient app is one that continues to function despite failures of system components. Resilience requires planning at all levels of your architecture. It influences how you lay out your infrastructure and network and how you design your app and data storage. Resilience also extends to people and culture.

Building and operating resilient apps is hard. This is especially true for distributed apps, which might contain multiple layers of infrastructure, networks, and services. Mistakes and outages happen, and improving the resilience of your app is an ongoing journey. With careful planning, you can improve the ability of your app to withstand failures. With proper processes and organizational culture, you can also learn from failures to further increase your app's resilience.

Google Cloud provides tools and services to help you build highly available and resilient apps:

  • Google Cloud services are available in regions and zones across the globe, enabling you to deploy your app to best meet your availability goals.
  • Compute Engine instance groups and GKE clusters can be distributed and managed across the available zones in a region.
  • Compute Engine regional persistent disks are synchronously replicated across zones in a region.
  • Google Cloud provides a range of load-balancing options to manage your app traffic, including global load balancing that can direct traffic to a healthy region closest to your users.
  • Google Cloud's serverless platform includes managed compute and database products that offer built-in redundancy and load balancing.
  • Cloud Build runs your builds on Google Cloud and lets you deploy on platforms like App Engine, Compute Engine, Cloud Run, and GKE.
  • Cloud Monitoring provides metrics across your apps and infrastructure, helping you make data-driven decisions about the performance and health of your apps.

Drivers and constraints

There are varying requirements and motivations for improving the scalability and resilience of your app. There might also be constraints that limit your ability to meet your scalability and resilience goals. The relative importance of these requirements and constraints varies depending on the type of app, the profile of your users, and the scale and maturity of your organization.

Drivers

To help prioritize your requirements, consider the drivers from the different parts of your organization.

Business drivers

Common drivers from the business side include the following:

  • Optimize costs and resource consumption.
  • Minimize app downtime.
  • Ensure that user demand can be met during periods of high usage.
  • Improve quality and availability of service.
  • Ensure that user experience and trust are maintained during any outages.
  • Increase flexibility and agility to handle changing market demands.

Development drivers

Common drivers from the development side include the following:

  • Minimize time spent investigating failures.
  • Increase time spent on developing new features.
  • Minimize repetitive toil through automation.
  • Build apps using the latest industry patterns and practices.

Operations drivers

Requirements to consider from the operations side include the following:

  • Reduce the frequency of failures requiring human intervention.
  • Increase the ability to automatically recover from failures.
  • Minimize repetitive toil through automation.
  • Minimize the impact from the failure of any particular component.

Constraints

Constraints might limit your ability to increase the scalability and resilience of your app. Ensure that your design decisions don't introduce or contribute to these constraints:

  • Dependencies on hardware or software that are difficult to scale.
  • Dependencies on hardware or software that are difficult to operate in a high-availability configuration.
  • Dependencies between apps.
  • Licensing restrictions.
  • Lack of skills or experience in your development and operations teams.
  • Organizational resistance to automation.

Patterns and practices

The remainder of this document defines patterns and practices to help you build resilient and scalable apps. These patterns touch all parts of your app lifecycle, including your infrastructure design, app architecture, storage choices, deployment processes, and organizational culture.

Three themes are evident in the patterns:

  • Automation. Building scalable and resilient apps requires automation. Automating your infrastructure provisioning, testing, and app deployments increases consistency and speed, and minimizes human error.
  • Loose coupling. Treating your system as a collection of loosely coupled, independent components allows flexibility and resilience. Independence covers how you physically distribute your resources and how you architect your app and design your storage.
  • Data-driven design. Collecting metrics to understand the behavior of your app is critical. Decisions about when to scale your app, or whether a particular service is unhealthy, need to be based on data. Metrics and logs should be core features.

Automate your infrastructure provisioning

Create immutable infrastructure through automation to improve the consistency of your environments and increase the success of your deployments.

Treat your infrastructure as code

Infrastructure as code (IaC) is a technique that encourages you to treat your infrastructure provisioning and configuration in the same way you handle application code. Your provisioning and configuration logic is stored in source control so that it's discoverable and can be versioned and audited. Because it's in a code repository, you can take advantage of continuous integration and continuous deployment (CI/CD) pipelines, so that any changes to your configuration can be automatically tested and deployed.

By removing manual steps from your infrastructure provisioning, IaC minimizes human error and improves the consistency and reproducibility of your apps and environments. In this way, adopting IaC increases the resilience of your apps.

Infrastructure Manager lets you automate the creation and management of Google Cloud resources. Alternatively, Config Connector lets you manage your resources using Kubernetes techniques and workflows. Google Cloud also has built-in support for popular third-party IaC tools, including Terraform, Chef, and Puppet.

Create immutable infrastructure

Immutable infrastructure is a philosophy that builds on the benefits of infrastructure as code. Immutable infrastructure mandates that resources never be modified after they're deployed. If a virtual machine, Kubernetes cluster, or firewall rule needs to be updated, you can update the configuration for the resource in the source repository. After you've tested and validated the changes, you fully redeploy the resource using the new configuration. In other words, rather than tweaking resources, you re-create them.
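The re-create-rather-than-modify workflow can be sketched in a few lines of Python. This is an illustrative sketch only: the resource names and fields are hypothetical, and real IaC tools such as Terraform perform this kind of reconciliation for you.

```python
# Sketch of the immutable-infrastructure idea: when the desired
# configuration (from source control) differs from what is deployed,
# the resource is replaced wholesale rather than patched in place.
# All resource names and fields here are hypothetical.

def reconcile(deployed: dict, desired: dict) -> list[str]:
    """Return the actions needed to converge deployed state on desired state."""
    actions = []
    for name, config in desired.items():
        if name not in deployed:
            actions.append(f"create {name}")
        elif deployed[name] != config:
            # Never mutate in place: destroy and re-create from the new config.
            actions.append(f"replace {name}")
    for name in deployed:
        if name not in desired:
            actions.append(f"delete {name}")
    return actions

deployed = {"web-vm": {"machine_type": "e2-small"}}
desired = {"web-vm": {"machine_type": "e2-medium"}, "db-disk": {"size_gb": 100}}
print(reconcile(deployed, desired))  # ['replace web-vm', 'create db-disk']
```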

Creating immutable infrastructure leads to more predictable deployments and rollbacks. It also mitigates issues that are common in mutable infrastructures, like configuration drift and snowflake servers. In this way, adopting immutable infrastructure further improves the consistency and reliability of your environments.

Design for high availability

Availability is a measure of the fraction of time that a service is usable. Availability is often used as a key indicator of overall service health. Highly available architectures aim to maximize service availability, typically through redundantly deploying components. In simplest terms, achieving high availability typically involves distributing compute resources, load balancing, and replicating data.

Physically distribute resources

Google Cloud services are available in locations across the globe. These locations are divided into regions and zones. How you deploy your app across these regions and zones affects the availability, latency, and other properties of your app. For more information, see best practices for Compute Engine region selection.

Redundancy is the duplication of components of a system in order to increase the overall availability of that system. In Google Cloud, redundancy is typically achieved by deploying your app or service to multiple zones, or even in multiple regions. If a service exists in multiple zones or regions, it can better withstand service disruptions in a particular zone or region. Although Google Cloud makes every effort to prevent such disruptions, certain events are unpredictable and it's best to be prepared.

With Compute Engine managed instance groups, you can distribute virtual machine instances across multiple zones in a region, and you can manage the instances as a logical unit. Google Cloud also offers regional persistent disks to automatically replicate your data to two zones in a region.

You can similarly improve the availability and resilience of your apps deployed on GKE by creating regional clusters. A regional cluster distributes GKE control plane components, nodes, and pods across multiple zones within a region. Because your control plane components are distributed, you can continue to access the cluster's control plane even during an outage involving one or more (but not all) zones.

Note: For more information about region-specific considerations, see Geography and regions.

Favor managed services

Rather than independently installing, supporting, and operating all parts of your application stack, you can use managed services to consume parts of your application stack as services. For example, rather than installing and managing a MySQL database on virtual machines (VMs), you can instead use a MySQL database provided by Cloud SQL. You then get an availability Service Level Agreement (SLA) and can rely on Google Cloud to manage data replication, backups, and the underlying infrastructure. By using managed services, you can spend less time managing infrastructure, and more time on improving the reliability of your app.

Many of Google Cloud's managed compute, database, and storage services offer built-in redundancy, which can help you meet your availability goals. Many of these services offer a regional model, which means the infrastructure that runs your app is located in a specific region and is managed by Google to be redundantly available across all the zones within that region. If a zone becomes unavailable, your app or data automatically serves from another zone in the region.

Certain database and storage services also offer multi-regional availability, which means that the infrastructure that runs your app is located in several regions. Multi-regional services can withstand the loss of an entire region, but typically at the cost of higher latency.

Load-balance at each tier

Load balancing lets you distribute traffic among groups of resources. When you distribute traffic, you help ensure that individual resources don't become overloaded while others sit idle. Most load balancers also provide health-checking features to help ensure that traffic isn't routed to unhealthy or unavailable resources.
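The combination of traffic distribution and health checking can be illustrated with a small sketch. This is a hypothetical round-robin balancer, not any Google Cloud API:

```python
# Minimal sketch of a load balancer that round-robins requests across
# backends but skips any backend whose health check fails. Real load
# balancers probe health on a schedule; here the check is a callable.
from itertools import cycle

class RoundRobinBalancer:
    def __init__(self, backends, health_check):
        self._backends = backends
        self._health_check = health_check
        self._iter = cycle(backends)

    def pick(self):
        """Return the next healthy backend, or raise if none are healthy."""
        for _ in range(len(self._backends)):
            backend = next(self._iter)
            if self._health_check(backend):
                return backend
        raise RuntimeError("no healthy backends")

healthy = {"10.0.0.1", "10.0.0.3"}
lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"],
                        health_check=lambda b: b in healthy)
print([lb.pick() for _ in range(4)])
# ['10.0.0.1', '10.0.0.3', '10.0.0.1', '10.0.0.3'] -- 10.0.0.2 is skipped
```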

Google Cloud offers several load-balancing choices. If your app runs on Compute Engine or GKE, you can choose the most appropriate type of load balancer depending on the type, source, and other aspects of the traffic. For more information, see the load-balancing overview and GKE networking overview.

Alternatively, some Google Cloud-managed services, such as App Engine and Cloud Run, automatically load-balance traffic.

It's common practice to load-balance requests received from external sources, such as from web or mobile clients. However, using load balancers between different services or tiers within your app can also increase resilience and flexibility. Google Cloud provides internal layer 4 and layer 7 load balancing for this purpose.

The following diagram shows an external load balancer distributing global traffic across two regions, us-central1 and asia-east1. It also shows internal load balancing distributing traffic from the web tier to the internal tier within each region.

Distributing global traffic across regions.

Monitor your infrastructure and apps

Before you can decide how to improve the resilience and scalability of your app, you need to understand its behavior. Having access to a comprehensive set of relevant metrics and time series about the performance and health of your app can help you discover potential issues before they cause an outage. They can also help you diagnose and resolve an outage if it does occur. The monitoring distributed systems chapter in the Google SRE book provides a good overview of some approaches to monitoring.

In addition to providing insight into the health of your app, metrics can also be used to control autoscaling behavior for your services.

Cloud Monitoring is Google Cloud's integrated monitoring tool. Cloud Monitoring ingests events, metrics, and metadata, and provides insights through dashboards and alerts. Most Google Cloud services automatically send metrics to Cloud Monitoring, and Google Cloud also supports many third-party sources. Cloud Monitoring can also be used as a backend for popular open source monitoring tools, providing a "single pane of glass" with which to observe your app.

Monitor at all levels

Gathering metrics at various levels or tiers within your architecture provides a holistic picture of your app's health and behavior.

Infrastructure monitoring

Infrastructure-level monitoring provides the baseline health and performance for your app. This approach to monitoring captures information like CPU load, memory usage, and the number of bytes written to disk. These metrics can indicate that a machine is overloaded or is not functioning as expected.

In addition to the metrics collected automatically, Cloud Monitoring provides an agent that can be installed to collect more detailed information from Compute Engine VMs, including from third-party apps running on those machines.

App monitoring

We recommend that you capture app-level metrics. For example, you might want to measure how long it takes to execute a particular query, or how long it takes to perform a related sequence of service calls. You define these app-level metrics yourself. They capture information that the built-in Cloud Monitoring metrics cannot. App-level metrics can capture aggregated conditions that more closely reflect key workflows, and they can reveal problems that low-level infrastructure metrics don't.
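As a sketch of what capturing an app-level metric can look like, the following hypothetical Python decorator records the latency of a critical operation into an in-memory registry. A real app would export these samples to Cloud Monitoring or OpenTelemetry rather than keep them in a dict:

```python
# Hedged sketch of an app-level metric: wrap a critical operation and
# record its latency. The metric name and registry are illustrative only.
import time
from collections import defaultdict
from functools import wraps

METRICS: dict[str, list[float]] = defaultdict(list)

def timed(metric_name):
    """Record the wall-clock duration of each call under metric_name."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                METRICS[metric_name].append(time.perf_counter() - start)
        return wrapper
    return decorator

@timed("checkout.latency_seconds")
def checkout(cart):
    time.sleep(0.01)  # stand-in for a sequence of service calls
    return sum(cart)

checkout([5, 10])
print(len(METRICS["checkout.latency_seconds"]))  # 1
```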

We also recommend using OpenTelemetry to capture your app-level metrics. OpenTelemetry provides a single open standard for telemetry data. Use OpenTelemetry to collect and export data from your cloud-first applications and infrastructure. You can then monitor and analyze the exported telemetry data.

Service monitoring

For distributed and microservices-driven apps, it's important to monitor the interactions between the different services and components in your apps. These metrics can help you diagnose problems like increased numbers of errors or latency between services.

Cloud Service Mesh is a service mesh available on Google Cloud that lets you manage, observe, measure, and secure your microservices on your chosen infrastructure, on and off Google Cloud.

End-to-end monitoring

End-to-end monitoring, also called end-user monitoring, tests externally visible behavior the way a user sees it. This type of monitoring checks whether a user is able to complete critical actions within your defined thresholds. This coarse-grained monitoring can uncover errors or latency that finer-grained monitoring might not, and it reveals availability as perceived by the user.

Expose the health of your apps

A highly available system must have some way of determining which parts of the system are healthy and functioning correctly. If certain resources appear unhealthy, the system can send requests elsewhere. Typically, health checks involve pulling data from an endpoint to determine the status or health of a service.

Health checking is a key responsibility of load balancers. When you create a load balancer that is associated with a group of virtual machine instances, you also define a health check. The health check defines how the load balancer communicates with the virtual machines to evaluate whether particular instances should continue to receive traffic. Load-balancer health checks can also be used to autoheal groups of instances such that unhealthy machines are re-created. If you are running on GKE and load-balancing external traffic through an ingress resource, GKE automatically creates appropriate health checks for the load balancer.

Kubernetes has built-in support for liveness and readiness probes. These probes help the Kubernetes orchestrator decide how to manage pods and requests within your cluster. If your app is deployed on Kubernetes, it's a good idea to expose the health of your app to these probes through appropriate endpoints.
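The distinction between the two probes can be sketched as follows. The class and field names are hypothetical; in a real deployment these checks would back HTTP endpoints that the kubelet polls:

```python
# Sketch of the checks behind Kubernetes liveness and readiness probes.
# The app reports "alive" as long as the process runs, but only "ready"
# once its dependencies (here, a hypothetical database flag) are up.

class AppHealth:
    def __init__(self):
        self.db_connected = False

    def liveness(self) -> tuple[int, str]:
        # Liveness: is the process functioning at all?
        return 200, "alive"

    def readiness(self) -> tuple[int, str]:
        # Readiness: can this instance serve traffic right now?
        if self.db_connected:
            return 200, "ready"
        return 503, "waiting for database"

health = AppHealth()
print(health.readiness())  # (503, 'waiting for database')
health.db_connected = True
print(health.readiness())  # (200, 'ready')
```

Failing a readiness probe removes the pod from load balancing; failing a liveness probe causes Kubernetes to restart the container, which is why the two checks should test different things.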

Establish key metrics

Monitoring and health checking provide you with metrics on the behavior and status of your app. The next step is to analyze those metrics to determine which are the most descriptive or impactful. The key metrics vary, depending on the platform that the app is deployed on, and on the work that the app is doing.

You're not likely to find just one metric that indicates whether to scale your app, or that a particular service is unhealthy. Often it's a combination of factors that together indicate a certain set of conditions. With Cloud Monitoring, you can create custom metrics to help capture these conditions. The Google SRE book advocates four golden signals for monitoring a user-facing system: latency, traffic, errors, and saturation.

Also consider your tolerance for outliers. Using an average or median value to measure health or performance might not be the best choice, because these measures can hide wide imbalances. It's therefore important to consider the metric distribution; the 99th percentile might be a more informative measure than the average.
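A quick illustration of why the distribution matters: a single slow outlier barely moves the mean but dominates the 99th percentile.

```python
# One very slow request among 100: the average hides it, p99 exposes it.
import statistics

latencies_ms = [20] * 99 + [2000]  # 99 fast requests, one very slow one

mean = statistics.mean(latencies_ms)
p99 = statistics.quantiles(latencies_ms, n=100)[98]  # 99th percentile

print(round(mean, 1))  # 39.8 -> the outlier is nearly invisible in the mean
print(p99 > 1000)      # True -> the tail latency is obvious at p99
```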

Define service level objectives (SLOs)

You can use the metrics that are collected by your monitoring system to define service level objectives (SLOs). SLOs specify a target level of performance or reliability for your service. SLOs are a key pillar of SRE practices and are described in detail in the service level objectives chapter in the SRE book, and also in the implementing SLOs chapter in the SRE workbook.

You can use service monitoring to define SLOs based on the metrics in Cloud Monitoring. You can create alerting policies on SLOs to let you know whether you are in danger of violating an SLO.
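As a sketch, SLO compliance over a window is often expressed as error-budget consumption. The function below is a simplified illustration, not the Cloud Monitoring API:

```python
# Hedged sketch of checking an availability SLO: given a target and a
# window of request counts, how much of the error budget is consumed?

def error_budget_consumed(good: int, total: int, slo_target: float) -> float:
    """Fraction of the error budget used (1.0 means fully spent)."""
    allowed_failures = total * (1 - slo_target)
    actual_failures = total - good
    return actual_failures / allowed_failures

# A 99.9% SLO over 1,000,000 requests allows 1,000 failed requests.
consumed = error_budget_consumed(good=999_400, total=1_000_000, slo_target=0.999)
print(round(consumed, 3))  # 0.6 -> 60% of the budget is spent
```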

Store the metrics

Metrics from your monitoring system are useful in the short term to help with real-time health checks or to investigate recent problems. Cloud Monitoring retains your metrics for several weeks to best meet those use cases.

However, there is also value in storing your monitoring metrics for longer-term analysis. Having access to a historical record can help you adopt a data-driven approach to refining your app architecture. You can use data collected during and after an outage to identify bottlenecks and interdependencies in your apps. You can also use the data to help create and validate meaningful tests.

Historical data can also help validate that your app is supporting business goals during key periods. For example, the data can help you analyze how your app scaled during high-traffic promotional events over the course of the last few quarters or even years.

For details on how to export and store your metrics, see the Cloud Monitoring metric export solution.

Determine scaling profile

You want your app to meet its user experience and performance goals without over-provisioning resources.

The following diagram shows a simplified representation of an app's scaling profile. The app maintains a baseline level of resources, and uses autoscaling to respond to changes in demand.

App scaling profile.

Balance cost and user experience

Deciding whether to scale your app is fundamentally about balancing cost against user experience. Decide what your minimum acceptable level of performance is, and potentially also where to set a ceiling. These thresholds vary from app to app, and also potentially across different components or services within a single app.

For example, a consumer-facing web or mobile app might have strict latency goals. Research shows that even small delays can negatively impact how users perceive your app, resulting in lower conversions and fewer signups. Therefore, it's important to ensure that your app has enough serving capacity to quickly respond to user requests. In this instance, the higher costs of running more web servers might be justified.

The cost-to-performance ratio might be different for a non-business-critical internal app where users are probably more tolerant of small delays. Hence, your scaling profile can be less aggressive. In this instance, keeping costs low might be of greater importance than optimizing the user experience.

Set baseline resources

Another key component of your scaling profile is deciding on an appropriate minimum set of resources.

Compute Engine virtual machines or GKE clusters typically take time to scale up, because new nodes need to be created and initialized. Therefore, it might be necessary to maintain a minimum set of resources, even if there is no traffic. Again, the extent of baseline resources is influenced by the type of app and traffic profile.

Conversely, serverless technologies like App Engine, Cloud Run functions, and Cloud Run are designed to scale to zero, and to start up and scale quickly, even in the case of a cold start. Depending on the type of app and traffic profile, these technologies can deliver efficiencies for parts of your app.

Configure autoscaling

Autoscaling helps you to automatically scale the computing resources consumed by your app. Typically, autoscaling occurs when certain metrics are exceeded or conditions are met. For example, if request latencies to your web tier start exceeding a certain value, you might want to automatically add more machines to increase serving capacity.

Many Google Cloud compute products have autoscaling features. Serverless managed services like Cloud Run, Cloud Run functions, and App Engine are designed to scale quickly. These services typically offer configuration options to limit or influence autoscaling behavior, but in general, much of the autoscaler behavior is hidden from the operator.

Compute Engine and GKE provide more options to control scaling behavior. With Compute Engine, you can scale based on various inputs, including Cloud Monitoring custom metrics and load-balancer serving capacity. You can set minimum and maximum limits on the scaling behavior, and you can define an autoscaling policy with multiple signals to handle different scenarios. With GKE, you can configure the cluster autoscaler to add or remove nodes based on workload or pod metrics, or on metrics external to the cluster.
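As an illustration of target-based autoscaling, the following sketch is modeled on the Kubernetes Horizontal Pod Autoscaler formula (desired = ceil(current × currentMetric / targetMetric)), clamped to configured limits. It is a simplified model, not the GKE or Compute Engine autoscaler itself:

```python
# Sketch of target-utilization autoscaling, modeled on the Kubernetes
# HPA formula. Metrics are expressed as integer percentage points to
# keep the arithmetic exact in this illustration.
import math

def desired_replicas(current_replicas: int, current_metric: int,
                     target_metric: int, min_replicas: int,
                     max_replicas: int) -> int:
    """desired = ceil(current * currentMetric / targetMetric), clamped."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(desired, max_replicas))

# CPU at 90% against a 60% target: scale 4 replicas up to 6.
print(desired_replicas(4, current_metric=90, target_metric=60,
                       min_replicas=2, max_replicas=10))  # 6
```

The min/max clamp corresponds to the minimum and maximum limits the document describes: the autoscaler never drops below your baseline or exceeds your ceiling, no matter what the metric says.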

We recommend that you configure autoscaling behavior based on key app metrics, on your cost profile, and on your defined minimum required level of resources.

Minimize startup time

For scaling to be effective, it must happen quickly enough to handle the increasing load. This is especially true when adding compute or serving capacity.

Use pre-baked images

If your app runs on Compute Engine VMs, you likely need to install software and configure the instances to run your app. Although you can use startup scripts to configure new instances, a more efficient way is to create a custom image. A custom image is a boot disk that you set up with your app-specific software and configuration.

For more information on managing images, see the image-management best practices article.

When you've created your image, you can define an instance template. Instance templates combine the boot disk image, machine type, and other instance properties. You can then use an instance template to create individual VM instances or a managed instance group. Instance templates are a convenient way to save a VM instance's configuration so you can use it later to create identical new VM instances.

Although creating custom images and instance templates can increase your deployment speed, it can also increase maintenance costs because the images might need to be updated more frequently. For more information, see the balancing image configuration and deployment speed documents.

Containerize your app

An alternative to building customized VM instances is to containerize your app. A container is a lightweight, standalone, executable package of software that includes everything needed to run an app: code, runtime, system tools, system libraries, and settings. These characteristics make containerized apps more portable, easier to deploy, and easier to maintain at scale than virtual machines. Containers are also typically fast to start, which makes them suitable for scalable and resilient apps.

Google Cloud offers several services to run your app containers. Cloud Run provides a serverless, managed compute platform to host your stateless containers. The App Engine flexible environment hosts your containers in a managed platform as a service (PaaS). GKE provides a managed Kubernetes environment to host and orchestrate your containerized apps. You can also run your app containers on Compute Engine when you need complete control over your container environment.

Optimize your app for fast startup

In addition to ensuring your infrastructure and app can be deployed as efficiently as possible, it's also important to ensure your app comes online quickly.

The optimizations that are appropriate for your app vary depending on the app's characteristics and execution platform. It's important to do the following:

  • Find and eliminate bottlenecks by profiling the critical sections of your app that are invoked at startup.
  • Reduce initial startup time by implementing techniques like lazy initialization, particularly of expensive resources.
  • Minimize app dependencies that might need to be loaded at startup time.
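Lazy initialization, the second technique above, can be sketched in Python: defer building an expensive resource until the first request that actually needs it, so startup stays fast. The class and resource here are hypothetical.

```python
# Sketch of lazy initialization: an expensive resource (a connection
# pool, a large model, a template cache) is not built at startup, only
# on first access, and the result is cached for subsequent accesses.
from functools import cached_property

class Service:
    started = False  # tracks whether the expensive setup has run

    @cached_property
    def connection_pool(self):
        # Expensive setup happens here, once, on first access.
        Service.started = True
        return ["conn-1", "conn-2"]

svc = Service()
print(Service.started)           # False: nothing built at startup
print(len(svc.connection_pool))  # 2: built on first use
print(Service.started)           # True
```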

Favor modular architectures

You can increase the flexibility of your app by choosing architectures that enable components to be independently deployed, managed, and scaled. This pattern can also improve resiliency by eliminating single points of failure.

Break your app into independent services

If you design your app as a set of loosely coupled, independent services, you can increase your app's flexibility. A loosely coupled design lets your services be independently released and deployed. In addition to many other benefits, this approach enables those services to use different tech stacks and to be managed by different teams. This loosely coupled approach is the key theme of architecture patterns like microservices and SOA.

As you consider how to draw boundaries around your services, availability and scalability requirements are key dimensions. For example, if a given component has a different availability requirement or scaling profile from your other components, it might be a good candidate for a standalone service.

Aim for statelessness

A stateless app or service does not retain any local persistent data or state. A stateless model ensures that you can handle each request or interaction with the service independent of previous requests. This model facilitates scalability and recoverability, because it means that the service can grow, shrink, or be restarted without losing data that's required in order to handle any in-flight processes or requests. Statelessness is especially important when you are using an autoscaler, because the instances, nodes, or pods hosting the service can be created and destroyed unexpectedly.

It might not be possible for all your services to be stateless. In such a case, be explicit about services that require state. By ensuring clean separation of stateless and stateful services, you can ensure straightforward scalability for stateless services while adopting a more considered approach for stateful services.

Manage communication between services

One challenge with distributed microservices architectures is managing communication between services. As your network of services grows, it's likely that service interdependencies will also grow. You don't want the failure of one service to result in the failure of other services, sometimes called a cascading failure.

You can help reduce traffic to an overloaded or failing service by adopting techniques like the circuit breaker pattern, exponential backoffs, and graceful degradation. These patterns increase the resiliency of your app either by giving overloaded services a chance to recover, or by gracefully handling error states. For more information, see the addressing cascading failures chapter in the Google SRE book.
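As a sketch of one of these techniques, a minimal circuit breaker might look like the following. Production implementations (and service meshes) also add a half-open state that probes for recovery after a timeout; this illustration covers only the fail-fast behavior:

```python
# Minimal sketch of the circuit breaker pattern: after a threshold of
# consecutive failures the breaker "opens" and fails fast, giving the
# downstream service room to recover instead of piling on more requests.

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3):
        self.failure_threshold = failure_threshold
        self.failures = 0
        self.open = False

    def call(self, fn, *args):
        if self.open:
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.open = True
            raise
        self.failures = 0  # any success resets the consecutive-failure count
        return result

breaker = CircuitBreaker(failure_threshold=2)

def flaky():
    raise TimeoutError("downstream overloaded")

for _ in range(2):
    try:
        breaker.call(flaky)
    except TimeoutError:
        pass

print(breaker.open)  # True: further calls fail fast without touching flaky()
```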

Using a service mesh can help you manage traffic across your distributed services. A service mesh is software that links services together, and helps decouple business logic from networking. A service mesh typically provides resiliency features like request retries, failovers, and circuit breakers.

Use appropriate database and storage technology

Certain databases and types of storage are difficult to scale and make resilient. Make sure that your database choices don't constrain your app's availability and scalability.

Evaluate your database needs

The pattern of designing your app as a set of independent services also extends to your databases and storage. It might be appropriate to choose different types of storage for different parts of your app, which results in heterogeneous storage.

Conventional apps often operate exclusively with relational databases. Relational databases offer useful functionality such as transactions, strong consistency, referential integrity, and sophisticated querying across tables. These features make relational databases a good choice for many common app features. However, relational databases also have some constraints. They are typically hard to scale, and they require careful management in a high-availability configuration. A relational database might not be the best choice for all your database needs.

Non-relational databases, often referred to as NoSQL databases, take a different approach. Although details vary across products, NoSQL databases typically sacrifice some features of relational databases in favor of increased availability and easier scalability. In terms of the CAP theorem, NoSQL databases often choose availability over consistency.

Whether a NoSQL database is appropriate often comes down to the required degree of consistency. If your data model for a particular service does not require all the features of an RDBMS, and can be designed to be eventually consistent, choosing a NoSQL database might offer increased availability and scalability.

In the realm of data management, relational and non-relational databases are often seen as complementary rather than competing technologies. By using both types of databases strategically, organizations can harness the strengths of each to achieve optimal results in data storage, retrieval, and analysis.

In addition to a range of relational and NoSQL databases, Google Cloud also offers Spanner, a strongly consistent, highly available, and globally distributed database with support for SQL. For information about choosing an appropriate database on Google Cloud, see Google Cloud databases.

Implement caching

A cache's primary purpose is to increase data retrieval performance by reducing the need to access the underlying, slower storage layer.

Caching supports improved scalability by reducing reliance on disk-based storage. Because requests can be served from memory, request latencies to the storage layer are reduced, typically allowing your service to handle more requests. In addition, caching can reduce the load on services that are downstream of your app, especially databases, allowing other components that interact with that downstream service to also scale more, or at all.

Caching can also increase resiliency by supporting techniques like graceful degradation. If the underlying storage layer is overloaded or unavailable, the cache can continue to handle requests. And even though the data returned from the cache might be incomplete or not up to date, that might be acceptable for certain scenarios.
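The cache-aside pattern with a stale-on-error fallback can be sketched as follows. The class and its in-process dict are illustrative stand-ins; in production the cache would typically be an external store such as Memorystore for Redis, and `fetch` would wrap your database access.

```python
import time

class CacheAside:
    """Cache-aside lookup that serves stale data if the backend fails.

    Illustrative sketch: a dict stands in for an external cache, and
    `fetch` stands in for a slow or failure-prone storage layer.
    """

    def __init__(self, fetch, ttl_seconds=60):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry and entry[1] > now:
            return entry[0]  # Fresh hit: no storage-layer access at all.
        try:
            value = self._fetch(key)
        except Exception:
            if entry:
                return entry[0]  # Graceful degradation: serve stale data.
            raise  # Nothing cached; the failure must propagate.
        self._store[key] = (value, now + self._ttl)
        return value
```

Whether serving expired entries is acceptable depends on the data: it is usually fine for a product catalog, rarely for an account balance.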

Memorystore for Redis provides a fully managed service that is powered by the Redis in-memory datastore. Memorystore for Redis provides low-latency access and high throughput for heavily accessed data. It can be deployed in a high-availability configuration that provides cross-zone replication and automatic failover.

Modernize your development processes and culture

DevOps can be considered a broad collection of processes, culture, and tooling that promote agility and reduced time-to-market for apps and features by breaking down silos between development, operations, and related teams. DevOps techniques aim to improve the quality and reliability of software.

A detailed discussion of DevOps is beyond the scope of this document, but some key aspects that relate to improving the reliability and resilience of your app are discussed in the following sections. For more details, see the Google Cloud DevOps page.

Design for testability

Automated testing is a key component of modern software delivery practices. The ability to execute a comprehensive set of unit, integration, and system tests is essential to verify that your app behaves as expected, and that it can progress to the next stage of the deployment cycle. Testability is a key design criterion for your app.

We recommend that you use unit tests for the bulk of your testing because they are quick to execute and typically straightforward to maintain. We also recommend that you automate higher-level integration and system tests. These tests are greatly simplified if you adopt infrastructure-as-code techniques, because dedicated test environments and resources can be created on demand, and then torn down once tests are complete.
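A fast, dependency-free unit test is the kind of check that should make up the bulk of your suite. The following sketch uses Python's standard `unittest` module against a hypothetical `apply_discount` business-logic function invented for illustration.

```python
import unittest

def apply_discount(price, percent):
    """Hypothetical business-logic function under test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

class ApplyDiscountTest(unittest.TestCase):
    # Tests like these have no external dependencies, so they run in
    # milliseconds and can execute on every code commit.
    def test_typical_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_out_of_range_percent_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 150)
```

Higher-level integration and system tests follow the same structure but exercise real (or on-demand, infrastructure-as-code) environments, which is why they are slower and fewer.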

As the percentage of your codebase covered by tests increases, you reduce uncertainty and the potential decrease in reliability from each code change. Adequate testing coverage means that you can make more changes before reliability falls below an acceptable level.

Automated testing is an integral component of continuous integration. Executing a robust set of automated tests on each code commit provides fast feedback on changes, improving the quality and reliability of your software. Google Cloud–native tools like Cloud Build and third-party tools like Jenkins can help you implement continuous integration.

Automate your deployments

Continuous integration and comprehensive test automation give you confidence in the stability of your software. And when they are in place, your next step is automating deployment of your app. The level of deployment automation varies depending on the maturity of your organization.

Choosing an appropriate deployment strategy is essential in order to minimize the risks associated with deploying new software. With the right strategy, you can gradually increase the exposure of new versions to larger audiences, verifying behavior along the way. You can also set clear provisions for rollback if problems occur.
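The decision logic behind a progressive (canary-style) rollout can be sketched as a simple gate: promote the new version in small traffic steps while its error rate stays close to the baseline, and roll back the moment it degrades. The function name, step size, and tolerance below are illustrative assumptions, not part of any particular deployment tool.

```python
def next_canary_weight(current_weight, canary_error_rate, baseline_error_rate,
                       tolerance=0.01, step=10, max_weight=100):
    """Decide the canary's next traffic share, as a percentage.

    Illustrative rollout gate: thresholds and step sizes would come
    from your own SLOs, not these hard-coded defaults.
    """
    if canary_error_rate > baseline_error_rate + tolerance:
        return 0  # Roll back: the new version is misbehaving.
    # Healthy so far: expose the new version to a larger audience.
    return min(max_weight, current_weight + step)
```

In practice a deployment tool or service mesh applies the resulting weight to its traffic split, and the comparison uses metrics gathered over a soak period rather than a single sample.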

Adopt SRE practices for dealing with failure

For distributed apps that operate at scale, some degree of failure in one or more components is common. If you adopt the patterns covered in this document, your app can better handle disruptions caused by a defective software release, unexpected termination of virtual machines, or even an infrastructure outage that affects an entire zone.

However, even with careful app design, you inevitably encounter unexpected events that require human intervention. If you put structured processes in place to manage these events, you can greatly reduce their impact and resolve them more quickly. Furthermore, if you examine the causes and responses to the event, you can help protect your app against similar events in the future.

Strong processes for managing incidents and performing blameless postmortems are key tenets of SRE. Although implementing the full practices of Google SRE might not be practical for your organization, if you adopt even a minimum set of guidelines, you can improve the resilience of your app. The appendixes in the SRE book contain some templates that can help shape your processes.

Validate and review your architecture

As your app evolves, user behavior, traffic profiles, and even business priorities can change. Similarly, other services or infrastructure that your app depends on can evolve. Therefore, it's important to periodically test and validate the resilience and scalability of your app.

Test your resilience

It's critical to test that your app responds to failures in the way you expect. The overarching theme is that the best way to avoid failure is to introduce failure and learn from it.

Simulating and introducing failures is complex. In addition to verifying the behavior of your app or service, you must also ensure that expected alerts are raised and appropriate metrics are recorded. We recommend a structured approach, where you introduce basic failures and then escalate.

For example, you might proceed as follows, validating and documenting behaviorat each stage:

  • Introduce intermittent failures.
  • Block access to dependencies of the service.
  • Block all network communication.
  • Terminate hosts.

For details, see the Breaking your systems to make them unbreakable video from Google Cloud Next 2019.

If you're using a service mesh like Istio to manage your app services, you can inject faults at the application layer instead of killing pods or machines, or you can inject corrupted packets at the TCP layer. You can introduce delays to simulate network latency or an overloaded upstream system. You can also introduce aborts, which mimic failures in upstream systems.

Test your scaling behavior

We recommend that you use automated nonfunctional testing to verify that your app scales as expected. Often this verification is coupled with performance or load testing. You can use tools like hey to send load to a web app. For a more detailed example that shows how to do load testing against a REST endpoint, see Distributed load testing using Google Kubernetes Engine.

One common approach is to ensure that key metrics stay within expected levels for varying loads. For example, if you're testing the scalability of your web tier, you might measure the average request latencies for spiky volumes of user requests. Similarly, for a backend processing feature, you might measure the average task-processing time when the volume of tasks suddenly increases.
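The latency measurement itself can be sketched as a small concurrent driver. This is an illustrative harness, not a replacement for a load tool like hey: `request_fn` is any callable, and in a real test it would wrap an HTTP request to your endpoint.

```python
import concurrent.futures
import statistics
import time

def measure_latency(request_fn, total_requests=100, concurrency=10):
    """Fire concurrent requests and report latency percentiles in seconds.

    Illustrative sketch: `request_fn` stands in for a call to the
    service under test.
    """
    def timed_call(_):
        start = time.perf_counter()
        request_fn()
        return time.perf_counter() - start

    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_requests)))
    return {
        "p50": statistics.median(latencies),
        "p95": latencies[int(len(latencies) * 0.95) - 1],
        "max": latencies[-1],
    }
```

An automated scaling test can then assert that, say, the p95 value stays below your latency objective while the load steps up, turning "scales as expected" into a pass/fail check.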

Also, you want your tests to verify that the number of resources created to handle the test load is within the expected range. For example, your tests might verify that the number of VMs created to handle some backend tasks does not exceed a certain value.

It's also important to test edge cases. What is the behavior of your app or service when maximum scaling limits are reached? What is the behavior if your service is scaling down and then load suddenly increases again?

Always be architecting

The technology world moves fast, and this is especially true of the cloud. New products and features are released frequently, new patterns emerge, and the demands from your users and internal stakeholders continue to grow.

As the principles for cloud-native architecture blog post explains, always be looking for ways to refine, simplify, and improve the architecture of your apps. Software systems are living things and need to adapt to reflect your changing priorities.



Last updated 2025-05-05 UTC.