Migrate to Google Cloud: Optimize your environment

This document helps you plan and design the optimization phase of your migration to Google Cloud. After you've deployed your workloads in Google Cloud, you can start optimizing your environment.

This document is part of the following multi-part series about migrating to Google Cloud:

The following diagram illustrates the path of your migration journey.

Migration path with four phases.

In the optimization phase, you refine your environment to make it more efficient than your initial deployment.

This document is useful if you're planning to optimize an existing environment after migrating to Google Cloud, or if you're evaluating the opportunity to optimize and want to explore what it might look like.

The structure of the optimization phase follows the migration framework described in this series: assess, plan, deploy, and optimize. You can use this versatile framework to plan your entire migration and to break down independent actions in each phase. When you've completed the last step of the optimization phase, you can start this phase over and find new targets for optimization. The optimization phase is defined as an optimization loop. An execution of the loop is defined as an optimization iteration.

Optimization is an ongoing and continuous task. You constantly optimize your environment as it evolves. To avoid uncontrolled and duplicative efforts, you can set measurable optimization goals and stop when you meet these goals. After that, you can always set new and more ambitious goals, but consider that optimization has a cost, in terms of resources, time, effort, and skills.

The following diagram shows the optimization loop.

For a larger image of this diagram, see Optimization decision tree.

In this document, you perform the following repeatable steps of the optimization loop:

Assess your environment, teams, and the optimization loop that you're following.
Establish optimization requirements and goals.
Optimize your environment and train your teams.
Tune the optimization loop.

This document discusses some of the site reliability engineering (SRE) principles and concepts. Google developed the SRE discipline to efficiently and reliably run a global infrastructure serving billions of users. Adopting the complete SRE discipline in your organization might be impractical if you need to modify many of your business and collaboration processes. It might be simpler to apply a subset of the SRE discipline that best suits your organization.

Assess your environment, teams, and optimization loop

Before starting any optimization task, you need to evaluate your environment. You also need to assess your teams' skills because optimizing your environment might require skills that your teams lack. Finally, you need to assess the optimization loop. The loop is a resource that you can optimize like any other resource.

Assess your environment

You need a deep understanding of your environment. For any successful optimization, you need to understand how your environment works and you need to identify potential areas of improvement. This assessment establishes a baseline against which you can compare the results of the optimization phase and of the next optimization iterations.

Migrate to Google Cloud: Assess and discover your workloads contains extensive guidance about assessing your workloads and assessing your environments. If you recently completed a migration to Google Cloud, you already have detailed information on how your environment is configured, managed, and maintained. Otherwise, use that guidance to assess your environment.

Assess your teams

When you have a clear understanding of your environment, assess your teams to understand their skills. You start by listing all skills, the level of expertise for each skill, and which team members are the most knowledgeable for each skill. Use this assessment in the next phase to discover any missing skills that you need to meet your optimization goals. For example, if you start using a managed service, you need the skills to provision, configure, and interact with that service. If you want to add a caching layer to an application in your environment by using Memorystore, you need expertise to use that service.
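
As a minimal sketch of such a skills inventory, the following Python example records an expertise level for each team member and skill, and then reports the skills for which nobody reaches the level that an optimization goal needs. The member names, skill names, and the 0 to 3 level scale are hypothetical and only illustrate the idea.

```python
from typing import Dict, Tuple

# Hypothetical scale: 0 = no knowledge, 3 = expert.
SkillLevels = Dict[str, int]


def find_skill_gaps(
    team_skills: Dict[str, SkillLevels],
    required_levels: Dict[str, int],
) -> Dict[str, Tuple[int, int]]:
    """Return {skill: (best_level_in_team, required_level)} for unmet skills."""
    gaps = {}
    for skill, needed in required_levels.items():
        # Highest level that any team member has for this skill.
        best = max(
            (levels.get(skill, 0) for levels in team_skills.values()),
            default=0,
        )
        if best < needed:
            gaps[skill] = (best, needed)
    return gaps


# Hypothetical inventory and requirements.
team_skills = {
    "alice": {"terraform": 3, "gke": 2},
    "bob": {"bigquery": 3, "gke": 3},
}
required_levels = {"gke": 3, "memorystore": 2, "terraform": 2}

print(find_skill_gaps(team_skills, required_levels))
```

In this example, only the caching skill is reported as a gap, which is a candidate topic for the training program of the iteration.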

Take into account that optimizing your environment might impact your business and collaboration processes. For example, if you start using a fully managed service instead of a self-managed one, you can give your operators more time to eliminate toil.

Assess your optimization loop

The optimization loop is a resource that you can optimize too. Use the data gathered in this assessment to gain clear insights into how your teams performed during the last optimization iteration. For example, if you aim to shorten the iteration duration, you need data about your last iteration, including its complexity and the goals you were pursuing. You also need information about all blockers that you encountered during the last iteration to ensure that you have a mitigation strategy if those blockers reoccur.

If this optimization iteration is the first one, you might not have enough data to establish a baseline to compare your performance. Draft a set of hypotheses about how you expect your teams to perform during the first iteration. After the first optimization iteration, evaluate the loop and your teams' performance and compare it against the hypotheses.

Establish your optimization requirements and goals

Before starting any optimization task, draft a set of clearly measurable goals for the iteration.

In this step, you perform the following activities:

Define your optimization requirements.
Set measurable optimization goals according to your optimization requirements.

Define your optimization requirements

First, list your requirements for the optimization phase. A requirement expresses a need for improvement and doesn't necessarily have to be measurable.

Starting from a set of quality characteristics for your workloads, your environment, and your own optimization loop, you can draft a questionnaire to guide you in setting your requirements. The questionnaire covers the characteristics that you find valuable for your environment, processes, and workloads.

There are many sources to guide you in defining the quality characteristics. For example, the ISO/IEC 25010 standard defines the quality characteristics for a software product, or you can review the Google Cloud setup checklist.

For example, the questionnaire can ask the following questions:

Can your infrastructure and its components scale vertically or horizontally?
Does your infrastructure support rolling back changes without manual intervention?
Do you already have a monitoring system that covers your infrastructure and your workloads?
Do you have an incident management system for your infrastructure?
How much time and effort does it take to implement the planned optimizations?
Were you able to meet all goals in your past iterations?

Starting from the answers to the questionnaire, you draft the list of requirements for this optimization iteration. For example, your requirements might be the following:

Increase the performance of an application.
Increase the availability of a component of your environment.
Increase the reliability of a component of your environment.
Reduce the operational costs of your environment.
Shorten the duration of the optimization iteration to reduce the inherent risks.
Increase development velocity and reduce time-to-market.

When you have the list of improvement areas, evaluate the requirements in the list. In this evaluation, you analyze your optimization requirements, look for conflicts, and prioritize the requirements in the list. For example, increasing the performance of an application might conflict with operational cost reduction.

Set measurable goals

After you finalize the list of requirements, define measurable goals for each requirement. A goal might contribute to more than one requirement. If you have any area of uncertainty or if you're not able to define all goals that you need to cover your requirements, go back to the assessment phase of this iteration to gather any missing information, and then refine your requirements.

For help defining these goals, you can follow one of the SRE practices, the definition of service level indicators (SLIs) and service level objectives (SLOs):

SLIs are quantitative measures of the level of service that you provide. For example, a key SLI might be the average request latency, error rate, or system throughput.
SLOs are target values or ranges of values for a service level that is measured by an SLI. For example, an SLO might be that the average request latency is lower than 100 milliseconds.
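
The relationship between an SLI and an SLO can be sketched in a few lines of Python. The example below computes two of the SLIs mentioned above (average request latency and error rate) from a list of request records, and then checks the latency SLI against the 100 millisecond SLO from the example. The `Request` type, the sample data, and the function names are assumptions made for illustration; in practice the data would come from your monitoring system.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class Request:
    """One observed request (hypothetical record shape)."""
    latency_ms: float
    ok: bool


def average_latency_ms(requests: Sequence[Request]) -> float:
    """SLI: average request latency in milliseconds."""
    return mean(r.latency_ms for r in requests)


def error_rate(requests: Sequence[Request]) -> float:
    """SLI: fraction of requests that failed."""
    return sum(1 for r in requests if not r.ok) / len(requests)


# SLO from the example above: average latency lower than 100 ms.
SLO_MAX_AVG_LATENCY_MS = 100.0


def meets_latency_slo(
    requests: Sequence[Request],
    slo_ms: float = SLO_MAX_AVG_LATENCY_MS,
) -> bool:
    """Compare the latency SLI against its SLO target."""
    return average_latency_ms(requests) < slo_ms


# Hypothetical sample of observed requests.
sample = [
    Request(80.0, True),
    Request(120.0, False),
    Request(70.0, True),
    Request(90.0, True),
]

print(average_latency_ms(sample), error_rate(sample), meets_latency_slo(sample))
```

The SLI is the measured value and the SLO is the threshold that it's compared against. Keeping the two separate in code makes it possible to tighten the SLO in a later iteration without changing how the SLI is measured.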

After defining SLIs and SLOs, you might realize that you're not gathering all metrics that you need to measure your SLIs. This metrics collection is the first optimization goal that you can tackle. You set the goals related to extending your monitoring system to gather all metrics that you need for your SLIs.

Optimize your environment and your teams

After assessing your environment, teams, and optimization loop, as well as establishing requirements and goals for this iteration, you're ready to perform the optimization step.

In this step, you perform the following activities:

Measure your environment, teams, and optimization loop.
Analyze the data coming from these measurements.
Perform the optimization activities.
Measure and analyze again.

Measure your environment, teams, and optimization loop

You extend your monitoring system to gather data about the behavior of your environment, teams, and the optimization loop to establish a baseline against which you can compare after optimizing.

This activity builds on and extends what you did in the assessment phase. After you establish your requirements and goals, you know which metrics to gather for your measurements to be relevant to your optimization goals. For example, if you defined SLOs and the corresponding SLIs to reduce the response latency for one of the workloads in your environment, you need to gather data to measure that metric.

Understanding these metrics also applies to your teams and to the optimization loop. You can extend your monitoring system to gather data so that you measure the metrics relevant to your teams and the optimization loop. For example, if you have SLOs and SLIs to reduce the duration of the optimization iteration, you need to gather data to measure that metric.

When you design the metrics that you need to extend the monitoring system, take into account that gathering data might affect the performance of your environment and your processes. Evaluate the metrics that you need to implement for your measurements, and their sample intervals, to understand if they might affect performance. For example, a metric with a high sample frequency might degrade performance, in which case you need to optimize the data collection itself.

On Google Cloud, you can use Cloud Monitoring to implement the metrics that you need to gather data. To implement custom metrics in your workloads directly, you can use Cloud Client Libraries for Cloud Monitoring, or OpenTelemetry. If you're using Google Kubernetes Engine (GKE), you can use GKE usage metering to gather information about resource usage, such as CPU, GPU, and TPU usage, and then divide resource usage by namespace or label.

Finally, you can use the Cloud Architecture Center and Google Cloud Whitepapers as starting points to find new skills that your teams might require to optimize your environment.

Analyze data

After gathering your data, you analyze and evaluate it to understand how your environment, teams, and optimization loop are performing against your optimization requirements and goals.

In particular, you evaluate your environment against the following:

SLOs.
Industry best practices.
An environment without any technical debt.

The SLOs that you established according to your optimization goals can help you understand if you're meeting your expectations. If you're not meeting your SLOs, you need to enhance your teams or the optimization loop. For example, if you established an SLO for the response latency for a workload to be in a given percentile and that workload isn't meeting that mark, that is a signal that you need to optimize that part of the workload.
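
A percentile-based latency check like the one in this example can be sketched as follows. This Python example uses the nearest-rank method to compute a percentile, which is only one of several common percentile definitions; the sample latencies and the 250 millisecond target for the 90th percentile are hypothetical values.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    # Nearest rank: the smallest rank such that at least pct percent of the
    # values are less than or equal to the value at that rank.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_percentile_slo(
    values: Sequence[float], pct: float, target_ms: float
) -> bool:
    """Check that the pct-th percentile latency is at or below the target."""
    return percentile(values, pct) <= target_ms


# Hypothetical response latencies in milliseconds.
latencies_ms = [50, 60, 70, 80, 90, 100, 110, 120, 300, 900]

p90 = percentile(latencies_ms, 90)
print(p90, meets_percentile_slo(latencies_ms, 90, target_ms=250))
```

With this sample, the 90th percentile is above the hypothetical target even though most requests are fast, which is the kind of signal that points to a part of the workload to optimize.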

Additionally, you can compare your situation against a set of recognized best practices in the industry. For example, the Google Cloud setup checklist helps you configure a production-ready environment for enterprise workloads.

After collecting data, you can consider how to optimize your environment to make it more cost efficient. You can export Cloud Billing data to BigQuery and analyze data with Looker Studio to understand how many resources you're using, and to identify spending patterns.

Finally, you compare your environment to one where you don't have any technical debt, to see whether you're meeting your long-term goals and to see if the technical debt is increasing. For example, you might establish an SLO for how many resources in your environment you're monitoring versus how many resources you have provisioned since the last iteration. If you didn't extend the monitoring system to cover those new resources, your technical debt increased. When analyzing the changes in your technical debt, also consider the factors that led to those changes. For example, a business need might require an increase in technical debt, or the increase might be unexpected. Knowing the factors that caused a change in your technical debt gives you insights for future optimization targets.
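
The monitoring-coverage example can be expressed as a simple ratio. The following Python sketch assumes that you can list the identifiers of provisioned resources and of monitored resources; the resource names and the function names are hypothetical.

```python
from typing import Set


def monitoring_coverage(provisioned: Set[str], monitored: Set[str]) -> float:
    """Return the fraction of provisioned resources that are monitored."""
    if not provisioned:
        # Nothing is provisioned, so nothing is left uncovered.
        return 1.0
    return len(provisioned & monitored) / len(provisioned)


def unmonitored_resources(provisioned: Set[str], monitored: Set[str]) -> Set[str]:
    """Return provisioned resources that the monitoring system doesn't cover."""
    return provisioned - monitored


# Hypothetical inventory at the end of an iteration.
provisioned = {"vm-1", "vm-2", "db-1", "cache-1"}
monitored = {"vm-1", "vm-2", "db-1"}

print(monitoring_coverage(provisioned, monitored))
print(unmonitored_resources(provisioned, monitored))
```

If you track this ratio at the end of each iteration, a decreasing value indicates that this form of technical debt is growing.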

To monitor your environment on Google Cloud, you can use Monitoring to design charts, dashboards, and alerts. You can then route Cloud Logging data for a more in-depth analysis and extended retention period. For example, you can create aggregated sinks and use Cloud Storage, Pub/Sub, or BigQuery as destinations. If you export data to BigQuery, you can then use Looker Studio to visualize data so that you can identify trends and make predictions. You can also use evaluation tools such as Recommender and Security Command Center to automatically analyze your environment and processes, looking for optimization targets.

After you analyze all of the measurement data, you need to answer two questions:

Are you meeting your optimization goals?
If you answered yes, then this optimization iteration is completed, and you can start a new one. If you answered no, you can move to the second question.
Given the resources that you budgeted, can you achieve the optimization goals that you set for this iteration?

To answer this question, consider all resources that you need, such as time, money, and expertise. If you answered yes, you can move to the next section; otherwise, refine your optimization goals, considering the resources you can use for this iteration. For example, if you're constrained by a fixed schedule, you might need to schedule some optimization goals for the next iteration.
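
The two questions above form a small decision procedure, which the following Python sketch makes explicit. The `NextStep` names and the `next_step` function are hypothetical and only encode the outcomes described in this section.

```python
from enum import Enum


class NextStep(Enum):
    START_NEW_ITERATION = "start a new optimization iteration"
    PERFORM_OPTIMIZATION = "move on to the optimization activities"
    REFINE_GOALS = "refine the optimization goals for this iteration"


def next_step(goals_met: bool, achievable_with_budget: bool) -> NextStep:
    """Decide what to do after analyzing the measurement data."""
    # Question 1: are you meeting your optimization goals?
    if goals_met:
        return NextStep.START_NEW_ITERATION
    # Question 2: can you achieve the goals with the budgeted resources?
    if achievable_with_budget:
        return NextStep.PERFORM_OPTIMIZATION
    return NextStep.REFINE_GOALS


print(next_step(goals_met=False, achievable_with_budget=True).value)
```

The second question is only evaluated when the goals aren't met yet, which mirrors the order in which you answer the questions.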

Optimize your teams

Optimizing the environment is a continuous challenge and can require skills that your teams might lack, which you discovered during the assessment and the analysis. For this reason, optimizing your teams by acquiring new skills and making your processes more efficient is crucial to the success of your optimization activities.

To optimize your teams, you need to do the following:

Design and implement a training program.
Optimize your team structure and culture.

For your teams to acquire the skills that they are missing, you need to design and implement a training program or choose one that professional Google Cloud trainers prepared. For more information, see Migrate to Google Cloud: Assess and discover your workloads.

While optimizing your teams, you might find that there is room to improve structure and culture. It's difficult to prescribe an ideal situation upfront, because every company has its own history and idiosyncrasies that contributed to the evolution of your teams' structure and culture.

Transformational leadership is a good starting point to learn general frameworks for executing and measuring organizational changes aimed at adopting DevOps practices. For practical guidance on how to implement an effective DevOps culture in your organization, refer to Site Reliability Engineering, a comprehensive description of the SRE methodology. The Site Reliability Workbook, the companion to the book, uses concrete examples to show you how to put SRE principles and practices to work.

Optimize your environment

After measuring and analyzing metrics data, you know which areas you need to optimize.

This section covers general optimization techniques for your Google Cloud environment. You can also perform any optimization activity that's specific to your infrastructure and to the services that you're using.

Codify everything

One of the biggest advantages of adopting a public cloud environment like Google Cloud is that you can use well-defined interfaces such as Cloud APIs to provision, configure, and manage resources. You can use your own choice of tools to define your Infrastructure as Code (IaC) process, and your own choice of version control systems.

You can use tools such as Terraform to provision your Google Cloud resources, and then tools such as Ansible, Chef, or Puppet to configure these resources. An IaC process helps you implement an effective rollback strategy for your optimization tasks. You can revert any change that you applied to the code that describes your infrastructure. Also, you can avoid unexpected failures while updating your infrastructure by testing your changes.

Furthermore, you can apply similar processes to codify other aspects of your environment, like policies as code, using tools such as Open Policy Agent, and operations as code, such as GitOps.

Therefore, if you adopt an IaC process in the early optimization iterations, you can define further optimization activities as code. You can also adopt the process gradually, so you can evaluate if it's suitable to your environment.

Automate everything

To completely optimize your entire environment, you need to use resources efficiently. This means that you need to eliminate toil to save resources and to reinvest in more important tasks that produce value, like optimization activities.

Per the SRE recommendation, the way to eliminate toil is by increasing automation. Not all automation tasks require highly specialized software engineering skills or great effort. Sometimes a short script that runs periodically can save several hours per day. Google Cloud provides tools such as Google Cloud CLI and managed services such as Cloud APIs, Cloud Scheduler, Cloud Composer, and Cloud Run that your teams can use to automate repetitive tasks.
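
As an illustration of how small such a script can be, the following Python sketch lists the files in a directory that haven't been modified for a given number of days, a chore that operators often do by hand. The task, the function name, and the parameters are assumptions chosen for the example, and the script only reports files and doesn't delete anything.

```python
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


def find_stale_files(
    directory: Path,
    max_age_days: float,
    now: Optional[float] = None,
) -> List[Path]:
    """Return files in directory last modified more than max_age_days ago."""
    # Allow the caller to inject the current time, which simplifies testing.
    current = time.time() if now is None else now
    cutoff = current - max_age_days * SECONDS_PER_DAY
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and path.stat().st_mtime < cutoff
    )
```

You could run a function like this on a schedule with the scheduler of your choice and send the resulting list to the team that owns the directory.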

Monitor everything

If you can't gather detailed measures about your environment, you can't improve it, because you lack data to back up your assumptions. This means that you don't know what to do to meet your optimization goals.

A comprehensive monitoring system is a necessary component for your environment. The system monitors all essential metrics that you need to evaluate for your optimization goals. When you design your monitoring system, plan to monitor the four golden signals at minimum.
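
In the SRE literature, the four golden signals are latency, traffic, errors, and saturation. The following Python sketch shows one way to derive them from raw observations over a time window. The input shape (a list of latency and success pairs plus a CPU utilization value used as the saturation measure) is an assumption made for the example, not a prescribed data model.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Each observation is (latency in milliseconds, whether the request succeeded).
Observation = Tuple[float, bool]


def golden_signals(
    observations: List[Observation],
    window_seconds: float,
    cpu_utilization: float,
) -> Dict[str, float]:
    """Summarize the four golden signals for one measurement window."""
    if not observations:
        raise ValueError("observations must not be empty")
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    failed = sum(1 for _, ok in observations if not ok)
    return {
        # Latency: how long requests take.
        "latency_ms": mean(latency for latency, _ in observations),
        # Traffic: demand on the system, here requests per second.
        "traffic_rps": len(observations) / window_seconds,
        # Errors: fraction of requests that fail.
        "error_rate": failed / len(observations),
        # Saturation: how full the service is, here CPU utilization (0 to 1).
        "saturation": cpu_utilization,
    }


# Hypothetical two-second window with four requests.
window = [(100.0, True), (200.0, True), (300.0, False), (400.0, True)]
print(golden_signals(window, window_seconds=2.0, cpu_utilization=0.6))
```

An average hides slow outliers, so a production system would typically also track latency percentiles and separate the latency of failed requests from that of successful ones.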

You can use managed services such as Monitoring and Logging to monitor your environment without having to set up a complicated monitoring solution.

You might need to implement a monitoring system that can monitor hybrid and multicloud environments to satisfy data restriction policies that force you to store data only in certain physical locations, or to support services that use multiple cloud environments simultaneously.

Adopt a cloud-ready approach

Cloud-ready is a paradigm that describes an efficient way for designing and running an application on the cloud. The Cloud Native Computing Foundation (CNCF) defines a cloud-native application as an application that is scalable, resilient, manageable, and observable by technologies such as containers, service meshes, microservices, immutable infrastructure, and declarative APIs. Google Cloud provides managed services such as GKE, Cloud Run, Cloud Service Mesh, Logging, and Monitoring to empower users to design and run cloud-ready applications.

Learn more about cloud-ready technologies from CNCF Trail Map and CNCF Cloud Native Interactive Landscape.

Cost management

Because of their different billing and cost models, optimizing costs of a public cloud environment like Google Cloud is different from optimizing an on-premises environment.

For more information, see Migrate to Google Cloud: Minimize costs.

Measure and analyze again

When you complete the optimization activities for this iteration, you repeat the measurements and the analysis to check if you reached your goals. Answer the following question:

Did you meet your optimization goals?
If you answered yes, you can move to the next section.
If you answered no, go back to the beginning of the Optimize your environment and your teams phase.

Tune the optimization loop

In this section, you update and modify the optimization loop that you followed in this iteration to better fit your team structure and environment.

Codify the optimization loop

To optimize the optimization loop efficiently, you need to document and define the loop in a form that is standardized, straightforward, and manageable, allowing room for changes. You can use a fully managed service such as Cloud Composer to create, schedule, monitor, and manage your workflows. You can also first represent your processes with a language such as the business process model and notation (BPMN). After that, you can codify these processes with a standardized language such as the business process execution language (BPEL). After adopting IaC, describing your processes with code lets you manage them as you do the rest of your environment.

Automate the optimization loop

After you codify the optimization loop, you can automate repetitive tasks to eliminate toil, save time, and make the optimization loop more efficient. You can start by automating all tasks where a human decision is not required, such as measuring data and producing aggregate reports for your teams to analyze. For example, you can automate data analysis with Cloud Monitoring to check if your environment meets the SLOs that you defined. Given that optimization is a never-ending task and that you iterate on the optimization loop, even small automations can significantly increase efficiency.

Monitor the optimization loop

As you did for all the resources in your environment, you need to monitor the optimization loop to verify that it's working as expected and also look for bottlenecks and future optimization goals. You can start monitoring it by tracking how much time and how many resources your teams spent on each optimization step. For example, you can use an issue tracking system and a project management tool to monitor your processes and extract relevant statistics about metrics like issue resolution time and time to completion.
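
Statistics such as issue resolution time are straightforward to compute once you export the data from your issue tracking system. The following Python sketch assumes a hypothetical export in which each issue is a pair of opened and closed timestamps, with `None` for issues that are still open.

```python
from datetime import datetime
from statistics import median
from typing import List, Optional, Tuple

# (opened_at, closed_at); closed_at is None while the issue is still open.
Issue = Tuple[datetime, Optional[datetime]]


def resolution_hours(issues: List[Issue]) -> List[float]:
    """Return the resolution time in hours of every closed issue."""
    return [
        (closed - opened).total_seconds() / 3600
        for opened, closed in issues
        if closed is not None
    ]


def median_resolution_hours(issues: List[Issue]) -> float:
    """Return the median resolution time in hours of the closed issues."""
    hours = resolution_hours(issues)
    if not hours:
        raise ValueError("no closed issues to measure")
    return median(hours)


# Hypothetical export from an issue tracking system.
issues = [
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 17)),  # 8 hours
    (datetime(2024, 1, 2, 9), datetime(2024, 1, 3, 9)),   # 24 hours
    (datetime(2024, 1, 3, 10), datetime(2024, 1, 3, 12)), # 2 hours
    (datetime(2024, 1, 4, 9), None),                      # still open
]

print(median_resolution_hours(issues))
```

The median is used here instead of the average so that a single long-running issue doesn't dominate the statistic. Comparing the value across iterations shows whether the loop is getting faster.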

What's next

Read about Best practices for validating a migration plan.
Read the SRE books to learn about other concepts and techniques to prepare for optimization.
Learn when to find help for your migrations.
For more reference architectures, diagrams, and best practices, explore the Cloud Architecture Center.

Contributors

Author: Marco Ferrari | Cloud Solutions Architect

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2024-12-07 UTC.
