You are viewing archived v1.24 Service Mesh documentation.
Designing SLOs
Note: This guide only supports Cloud Service Mesh with Istio APIs and does not support Google Cloud APIs. For more information, see Cloud Service Mesh overview.
This page provides information that you might need before creating a service level objective (SLO).
For an introduction to SLOs, see the Service level objectives overview.
SLI type and compliance targets
Cloud Service Mesh supports the following types of service level indicators:
- Latency: How long it takes a service to return a response to a request, measured in milliseconds.
- Availability: The fraction of the time that a service responds successfully.
- Other: Customizable SLO type based on your configurable metrics.
You also define the compliance target that you want from your service. In general, SLOs shouldn't be higher than is necessary or meaningful for your users. Consider at what point users might notice service degradation. For example, if your users cannot tell the difference between a latency of 300 ms and 500 ms for your service, use the higher value as the latency threshold in the SLO. The lower value is more expensive to meet, and your users won't notice the difference.
When you set a compliance target, consider the end-user requirements for your service. For example, an internal tool used by employees to book vacation time might be fine with a 99% availability target (~3 days of downtime per year). But a critical service for an online store might need 99.999% availability (~5 minutes of downtime per year).
Compliance periods
In addition to defining a target for an SLI, an SLO specifies a period of time in which the SLI is being measured. For example, 99% availability over a single day is different from 99% availability over a month. The first SLO would not permit more than about 14 minutes of consecutive downtime (24 hrs * 1%), whereas the second SLO would allow consecutive downtime of up to ~7 hours (30 days * 1%).
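The arithmetic in this comparison can be reproduced with a short sketch. The helper name `allowed_downtime` is hypothetical and not part of any Cloud Service Mesh API; it assumes a purely time-based reading of availability, where the permitted downtime is the period length multiplied by (100% - target).

```python
from datetime import timedelta


def allowed_downtime(target_percent: float, period: timedelta) -> timedelta:
    """Downtime that a time-based availability target permits in one period.

    Hypothetical helper for illustration only.
    """
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    # The unavailable share of the period is (100% - target).
    return period * ((100 - target_percent) / 100)


# Same 99% target, two different compliance periods.
per_day = allowed_downtime(99, timedelta(days=1))
per_month = allowed_downtime(99, timedelta(days=30))

print("99% over 1 day:  ", per_day)
print("99% over 30 days:", per_month)
```

The same target yields roughly 14 minutes of tolerated downtime over one day and roughly 7 hours over 30 days, which is why the period has to be stated alongside the target.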
The compliance period is particularly important when an SLO is included in a service level agreement (SLA) with your users. An SLA is a contract with the users of your service that typically specifies the consequences of not meeting the SLOs. Whether or not you have an SLA with your users is a product or business decision, but for monitoring purposes, you still need to specify a compliance period for your SLOs when you create them.
When you configure SLOs, you choose the type of compliance period:
- Calendar: When you select Calendar as the Period Type, you also specify the Period Length, which can be a day, week, or month. Periods are non-overlapping and fixed to the calendar start and end dates. Compliance can only be evaluated at the end of the period.
- Rolling: When you select Rolling as the Period Type, you also specify the number of days for the Period Length, for example, 30 days. Unlike Calendar periods, rolling periods don't have fixed start and end dates. Cloud Service Mesh continually evaluates SLOs with a rolling compliance period. The oldest data in the previous calculation drops out of the current calculation as it is replaced by new data. A rolling time period provides more compliance measurements because each day, you get a measure of compliance for the last 30 days, rather than one per month. However, services can hover between compliance and noncompliance as the SLO status changes daily.
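The difference in how many compliance measurements each period type produces can be illustrated with a minimal sketch. It assumes a request-based SLI summarized as daily good and total counts; `DayStats`, `calendar_compliance`, and `rolling_compliance` are hypothetical names, and this is not how Cloud Service Mesh computes compliance internally.

```python
from dataclasses import dataclass


@dataclass
class DayStats:
    good: int   # requests that met the SLI that day
    total: int  # all requests that day


def compliance(days):
    """Fraction of good requests across the given days."""
    good = sum(d.good for d in days)
    total = sum(d.total for d in days)
    return good / total if total else 1.0


def calendar_compliance(history, period_length):
    """One measurement per completed, non-overlapping period."""
    full = len(history) - len(history) % period_length
    return [compliance(history[i:i + period_length])
            for i in range(0, full, period_length)]


def rolling_compliance(history, period_length):
    """One measurement per day, once a full window of data exists."""
    return [compliance(history[i - period_length:i])
            for i in range(period_length, len(history) + 1)]


# 60 days of made-up data with a single bad day (day index 10).
history = [DayStats(good=1000, total=1000) for _ in range(60)]
history[10] = DayStats(good=500, total=1000)

calendar = calendar_compliance(history, 30)
rolling = rolling_compliance(history, 30)
print(len(calendar), "calendar measurements")
print(len(rolling), "rolling measurements")
```

With 60 days of data and a 30-day period, the calendar view yields two measurements, while the rolling view yields one for every day after the first full window.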
Error budgets
Another important monitoring concept is the error budget. An SLO specifies an SLI and a target value that measures success of the service in the compliance period. The error budget for an SLO represents the total amount of time that a service can be noncompliant before it is in violation of its SLO. Thus, an error budget is 100% - SLO%. For example, if you have a rolling 30-day availability SLO with a 99.99% compliance target, your error budget is 0.01% of 30 days: just over 4 minutes of allowed downtime each 30 days. A service required to meet a 100% SLO has no error budget.
Error budgets let you track how many bad SLI measurements are allowed to occur during the remainder of your compliance period before the service violates the SLO. You can use the error budget to help manage maintenance tasks like deployment of new versions. When the error budget is close to depleted, it's not a good time to take risky actions like deploying new updates. Conversely, if you have a full error budget near the end of a compliance period, you might want to launch new features since the risk of violating the SLO is lower.
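One way to turn this guidance into a check is sketched below. It assumes a request-based SLI and an estimate of the total requests expected in the compliance period; `error_budget_remaining`, `safe_to_deploy`, and the 25% threshold are all hypothetical choices for illustration, not Cloud Service Mesh features.

```python
def error_budget_remaining(target: float, expected_total: int, bad_so_far: int) -> float:
    """Fraction of the error budget still unspent.

    target is the SLO as a fraction (for example 0.999). A negative result
    means more bad events occurred than the budget allows.
    """
    allowed_bad = (1 - target) * expected_total
    if allowed_bad == 0:
        # A 100% target has no budget: any bad event is a violation.
        return 0.0 if bad_so_far == 0 else float("-inf")
    return (allowed_bad - bad_so_far) / allowed_bad


def safe_to_deploy(remaining: float, min_remaining: float = 0.25) -> bool:
    """Hypothetical policy: allow risky changes only above a chosen threshold."""
    return remaining >= min_remaining


# A 99.9% target over an expected 1,000,000 requests allows about 1,000 bad requests.
healthy = error_budget_remaining(0.999, 1_000_000, bad_so_far=250)
nearly_spent = error_budget_remaining(0.999, 1_000_000, bad_so_far=900)

print(f"healthy: {healthy:.0%} left, deploy={safe_to_deploy(healthy)}")
print(f"nearly spent: {nearly_spent:.0%} left, deploy={safe_to_deploy(nearly_spent)}")
```

The threshold is a team policy decision; the sketch only shows that the remaining budget gives a single number to base that decision on.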
If you are measuring an SLO with a calendar compliance period, Cloud Service Mesh starts the error budget at the maximum value and reduces the budget over time, triggering an SLO violation when the error budget drops below 0. Cloud Service Mesh resets the SLO's error budget at the end of the compliance period.
If you are measuring an SLO over a rolling compliance period, you are effectively always at the end of a compliance period. Rather than starting from scratch, old data points are continuously dropped and new data points are continuously added. If a period of poor compliance rolls out of the compliance window, and if the SLO is compliant, the error budget goes up. At any point in time, an error budget ≥ 0 indicates a compliant rolling SLO window, and an error budget < 0 indicates a non-compliant rolling SLO window.
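The rolling behavior can be sketched with a fixed-length queue of daily counts. This is an illustrative model only: `RollingErrorBudget` is a hypothetical class, the SLI is assumed to be request-based, and the budget is expressed as allowed bad requests minus observed bad requests in the window.

```python
from collections import deque


class RollingErrorBudget:
    """Hypothetical model of an error budget over a rolling window of days."""

    def __init__(self, target: float, window_days: int):
        self.target = target  # SLO as a fraction, for example 0.99
        # With maxlen set, appending a new day drops the oldest one.
        self.window = deque(maxlen=window_days)

    def add_day(self, good: int, total: int) -> None:
        self.window.append((good, total))

    def budget(self) -> float:
        """Allowed bad requests minus observed bad requests in the window."""
        good = sum(g for g, _ in self.window)
        total = sum(t for _, t in self.window)
        return (1 - self.target) * total - (total - good)

    def compliant(self) -> bool:
        return self.budget() >= 0


slo = RollingErrorBudget(target=0.99, window_days=3)
slo.add_day(good=900, total=1000)   # a bad day: 100 failed requests
slo.add_day(good=1000, total=1000)
slo.add_day(good=1000, total=1000)
before = (slo.budget(), slo.compliant())

slo.add_day(good=1000, total=1000)  # the bad day rolls out of the window
after = (slo.budget(), slo.compliant())

print("with bad day in window:", before)
print("after it rolls out:    ", after)
```

While the bad day is inside the window the budget is negative and the window is non-compliant; once that day rolls out, the budget rises back above zero without any reset.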
What's next
Learn more about SLOs from Site Reliability Engineering at Google:
Last updated 2026-02-19 UTC.