Migrate to Google Cloud: Best practices for validating a migration plan

Last reviewed 2025-05-05 UTC

This document describes the best practices for validating the plan to migrate your workloads to Google Cloud. This document doesn't list all of the possible best practices for validating a migration plan, and it doesn't give you guarantees of success. Instead, it helps you to stimulate discussions about potential changes and improvements to your migration plan.

This document is useful if you're planning a migration from an on-premises environment, from a private hosting environment, or from another cloud provider to Google Cloud. The document is also useful if you're evaluating the opportunity to migrate and want to explore what it might look like.

This document is part of a multi-part series about migrating to Google Cloud.

Assessment

Performing a complete assessment helps you develop a deep understanding of your workloads and environments. Developing this understanding helps you to minimize the risk of issues during and after your migration to Google Cloud.

Make a complete assessment

Before you proceed with the steps that follow the assessment phase, complete the assessment of your workloads and environments. To make a complete assessment, consider the following items, which are often overlooked:

  • Inventory: Ensure that the inventory of the workloads to migrate is up to date and that you completed the assessment. For example, consider how fresh and reliable the source data is for your assessment, and what gaps might exist in the data.
  • Downtimes: Identify the workloads that can handle downtimes. To minimize disruption, determine when these downtimes can occur, such as during late evenings or weekends, and how long they can last. Migrating workloads with zero or near-zero downtime is harder than migrating workloads that can afford downtimes. To complete a zero-downtime migration, you need to design for and implement redundancy for each workload to migrate. You also need to coordinate these redundant instances.

    When you assess how much downtime a workload can tolerate, assess whether the business benefit of a zero-downtime migration is greater than the added migration complexity. Where possible, avoid creating a zero-downtime requirement for a workload.

  • Clustering and redundancy: Assess which workloads support clustering and redundancy. If a workload supports clustering and redundancy, you can deploy multiple instances of that workload, even across different environments, such as the source environment and the target environment. Clustered and redundant deployments might simplify the migration because those workloads coordinate with each other with limited intervention.

  • Configuration updates: Assess how you update the configuration of your workloads. For example, consider how you deliver updates to the configuration of each workload that you want to migrate. This consideration is critical for the success of your migration because you might have to update the configuration of your workloads while you migrate them to the target environment.

  • Generate multiple assessment reports: During the assessment phase, it might be useful to generate more than one assessment report to account for different scenarios. For example, you can generate reports that take into account different load profiles for your workloads, such as at peak and off-peak times.
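To check how fresh your inventory data is, you can compare when each workload was last observed against a staleness threshold. The following Python sketch is illustrative only: the `InventoryRecord` format, the `find_stale_records` helper, and the 30-day threshold are assumptions, not part of any assessment tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class InventoryRecord:
    """Hypothetical inventory entry; real assessment tools export richer data."""

    name: str
    last_seen: datetime  # When discovery last observed this workload.


def find_stale_records(records, now, max_age=timedelta(days=30)):
    """Return the names of records that weren't observed within max_age of now."""
    return [record.name for record in records if now - record.last_seen > max_age]


if __name__ == "__main__":
    now = datetime(2025, 5, 5, tzinfo=timezone.utc)
    inventory = [
        InventoryRecord("billing-api", now - timedelta(days=2)),
        InventoryRecord("legacy-batch", now - timedelta(days=90)),
    ]
    print(find_stale_records(inventory, now))  # ['legacy-batch']
```

Records that the sketch flags are candidates for re-discovery before you rely on them in an assessment report.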

Assess the failure modes that your workloads support

Knowing how your workloads behave under exceptional circumstances helps you to ensure that you don't expose them to conditions from which they can't recover. As part of the assessment, gather information about the failure modes that your workloads support and their effects, including which failure modes your workloads can automatically recover from and which failure modes need your intervention. For example, you can start by considering questions about possible failure modes, such as the following:

  • What happens if a workload loses connectivity to the network?
  • Is a workload able to resume its work from where it left off after being stopped?
  • What happens if the performance of a workload or its dependencies is inadequate?
  • What happens if there are two workloads that have the same identifier in the architecture?
  • What happens if a scheduled task doesn't run?
  • What happens if two workloads process the same message?
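For the last question, one common way to make duplicate processing harmless is idempotent message handling, for example by recording the IDs of messages that were already processed. The following Python sketch illustrates that technique under stated assumptions: the `IdempotentProcessor` class is hypothetical, and it keeps processed IDs in memory, whereas a workload that runs as multiple instances would need shared, durable storage for them.

```python
class IdempotentProcessor:
    """Processes each message ID at most once (in-memory sketch)."""

    def __init__(self, handler):
        self._handler = handler
        # In-memory for illustration; multiple instances would need a shared store.
        self._processed_ids = set()

    def process(self, message_id, payload):
        """Run the handler unless this message ID was already processed.

        Returns True if the handler ran, and False for a duplicate. The ID is
        recorded only after the handler succeeds, so a failed attempt can be
        retried.
        """
        if message_id in self._processed_ids:
            return False
        self._handler(payload)
        self._processed_ids.add(message_id)
        return True


if __name__ == "__main__":
    totals = []
    processor = IdempotentProcessor(totals.append)
    processor.process("msg-1", 10)
    processor.process("msg-1", 10)  # Duplicate delivery is ignored.
    print(totals)  # [10]
```

Because the check and the update aren't atomic, concurrent instances could still process the same message twice; a shared store with an atomic insert-if-absent operation would close that gap.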

Another source of unsupported failure modes might be the migration plan itself. Determine whether your migration plan includes steps that depend on a particular condition being met, and whether it includes contingencies if the condition is not met. A plan that depends on such conditions without contingencies can indicate that the plan itself might fail or that individual components might fail during migration.

After you assess those failure modes and their effects, validate your findings in a non-critical environment by simulating failures and injecting faults that emulate those failure modes. For example, if a workload is designed to automatically recover after a network connectivity loss, validate the automatic recovery by forcibly interrupting its connectivity and then restoring it.
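The following Python sketch shows the shape of such a validation in-process: a fake dependency drops connectivity for a few requests and then recovers, and a hypothetical workload function retries with exponential backoff. The `FlakyConnection` class, the `send_with_retry` function, and the retry policy are assumptions for illustration; in a real non-critical environment, you would interrupt connectivity at the network level instead of in code.

```python
import time


class FlakyConnection:
    """Fake dependency that fails for the first `outage_calls` requests.

    This simulates a forced connectivity interruption followed by restoration.
    """

    def __init__(self, outage_calls):
        self.remaining_failures = outage_calls
        self.calls = 0

    def send(self, request):
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("simulated network loss")
        return f"ok:{request}"


def send_with_retry(conn, request, attempts=5, base_delay=0.01):
    """Hypothetical workload logic: retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return conn.send(request)
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # Recovery failed: surface the error.
            time.sleep(base_delay * (2 ** attempt))


if __name__ == "__main__":
    # Connectivity is lost for 3 requests, then restored.
    conn = FlakyConnection(outage_calls=3)
    print(send_with_retry(conn, "ping"), conn.calls)  # ok:ping 4
```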

Assess your data processing pipelines

Your workload assessment should be able to answer the following questions:

  • Are resources correctly sized for the migration?
  • How much time is required to migrate the data that your workloads need?
  • Can the target environment accommodate the full volume of data?
  • How do your workloads behave when they have to accommodate spikes in demand or spikes in the amount of data that they produce in a given time window?
  • If there are spikes in demand or spikes in the amount of data that your workloads produce, is there any adverse effect, such as increased latency or delays in responses?
  • After your workloads start, do they need time to ramp up to the expected levels of performance?
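To answer the question about how much time the data migration requires, a first-order estimate divides the data volume, in bits, by the effective bandwidth of the link. The following Python sketch assumes decimal units and an 80% efficiency factor for protocol overhead and competing traffic; both are illustrative assumptions, so measure your actual throughput before you rely on the estimate.

```python
def transfer_time_hours(data_tb, bandwidth_gbps, efficiency=0.8):
    """Estimate the hours needed to copy data_tb terabytes over a link.

    efficiency is the assumed fraction of the nominal bandwidth that the
    transfer actually achieves.
    """
    bits = data_tb * 1e12 * 8  # Decimal terabytes to bits.
    seconds = bits / (bandwidth_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # 100 TB over a 10 Gbps link at 80% efficiency.
    print(round(transfer_time_hours(100, 10), 1))  # 27.8
```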

The results of this assessment are often models of the demand that your workloads satisfy and the data that the workloads produce in a given time window. When you gather data points to produce such models, consider that those data points might vary significantly between peak and non-peak time windows. For more information about how and what to monitor, see Service Level Objectives in the Site Reliability Engineering book.
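The following Python sketch shows one way to summarize such data points separately for peak and non-peak windows, so that the models don't average away spikes. The sample format, the `demand_profile` function, and the choice of the mean and 95th percentile as summary statistics are assumptions for illustration.

```python
import statistics


def demand_profile(samples, peak_hours):
    """Summarize demand separately for peak and off-peak windows.

    samples: iterable of (hour_of_day, requests_per_second) tuples.
    peak_hours: set of hours (0-23) that you consider peak.
    Returns the mean and the 95th percentile of each window that has at
    least two data points.
    """
    windows = {"peak": [], "off_peak": []}
    for hour, rps in samples:
        windows["peak" if hour in peak_hours else "off_peak"].append(rps)

    profile = {}
    for name, values in windows.items():
        if len(values) < 2:
            continue  # Too few data points for a meaningful percentile.
        # The "inclusive" method keeps percentiles within the observed range.
        p95 = statistics.quantiles(values, n=20, method="inclusive")[-1]
        profile[name] = {"mean": statistics.fmean(values), "p95": p95}
    return profile


if __name__ == "__main__":
    samples = [(9, 100), (10, 120), (2, 10), (3, 12)]
    print(demand_profile(samples, peak_hours={9, 10}))
```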

Ensure that you can update and deploy each workload to migrate

During the migration, you might need to update some of the workloads that you're migrating. For example, you might need to deploy a fix for an issue, or roll back a recent change that is causing an issue. For each workload that you're migrating, ensure that you can apply and deploy changes. For example, if you're migrating a workload for which you have the source code, ensure that you can access that source code, and that you can build, package, and deploy the source code as needed.

Your migration might include workloads that you can't apply and deploy changes to, such as pre-existing and ready-made software. In that scenario, revise your migration plan to account for the additional effort needed to mitigate the issues that might occur after you migrate those workloads.

Assess your network infrastructure

A functional network infrastructure is fundamental for the migration. You can use the network infrastructure as part of your migration tooling. For example, you can use load balancers and DNS servers to direct traffic according to your migration plan.

To avoid issues during the migration, it's important to assess your network infrastructure and evaluate to what extent it can support your migration. For example, you can start by considering questions about your load-balancing infrastructure, such as the following:

  • What happens when you reconfigure your load balancers?
  • How long does it take for the updated configuration to be in effect?
  • When migrating with zero downtime, what happens if you get a spike of traffic before the updated configuration is in place?
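To measure how long an updated load balancer configuration takes to be in effect, you can poll through the load balancer until a run of consecutive responses comes from the new backend. In the following Python sketch, `probe` is a hypothetical callable that you supply, for example one that sends a request and returns the identifier of the backend that answered; the `measure_propagation` function, its defaults, and the consecutive-response rule are assumptions.

```python
import time


def measure_propagation(probe, expected, consecutive=5, timeout_s=300.0, interval_s=1.0):
    """Return the seconds until `probe` returns `expected` `consecutive` times in a row.

    Requiring several consecutive matches avoids declaring success while the
    old and new configurations are still serving traffic side by side.
    Raises TimeoutError if that doesn't happen within timeout_s.
    """
    start = time.monotonic()
    streak = 0
    while True:
        streak = streak + 1 if probe() == expected else 0
        elapsed = time.monotonic() - start
        if streak >= consecutive:
            return elapsed
        if elapsed >= timeout_s:
            raise TimeoutError(f"{expected!r} not seen {consecutive} times in {timeout_s}s")
        time.sleep(interval_s)


if __name__ == "__main__":
    # Simulated responses while the new configuration propagates.
    responses = iter(["source", "target", "source", "target", "target", "target"])
    elapsed = measure_propagation(lambda: next(responses), "target",
                                  consecutive=3, interval_s=0.01)
    print(f"configuration in effect after {elapsed:.2f}s")
```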

After you consider questions about your load-balancing infrastructure, next consider questions about your DNS infrastructure, such as the following:

  • Which DNS records should you update to point them to the target environment, and when should you update them?
  • Which clients are using those DNS records?
  • How is the time to live (TTL) configured for the DNS records that you plan to update?
  • Can you lower the TTL of those DNS records to a minimal value before you perform the migration?
  • Do your DNS clients and intermediaries respect the TTL of the DNS records that you plan to update? For example, do your applications have client-side DNS caching that ignores the TTL that you've configured for the migration? Remember that DNS resolution involves multiple layers of caching. Consider using Google Public DNS to avoid ISP DNS issues.
  • Do you detect traffic directed at your source environment even after you completed the migration?
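The TTL questions above interact: resolvers that cached a record before you lowered its TTL can keep it for the full original TTL. The following Python sketch computes the resulting timeline. It assumes that all caches respect TTLs, which is exactly the assumption that the questions about client-side caching ask you to verify; the `dns_cutover_plan` function and its parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def dns_cutover_plan(cutover, current_ttl_s, migration_ttl_s):
    """Compute key timestamps for a DNS-based cutover.

    cutover: when you plan to update the records to point to the target.
    current_ttl_s: the TTL, in seconds, that the records have today.
    migration_ttl_s: the lowered TTL, in seconds, to use during the migration.
    """
    # Caches can hold the old record, with its old TTL, for up to
    # current_ttl_s, so publish the lowered TTL at least that long before
    # the cutover.
    lower_ttl_by = cutover - timedelta(seconds=current_ttl_s)
    # After the cutover, caches that respect the TTL expire within
    # migration_ttl_s. Clients that ignore TTLs aren't bounded by this.
    converged_by = cutover + timedelta(seconds=migration_ttl_s)
    return {"lower_ttl_by": lower_ttl_by, "converged_by": converged_by}


if __name__ == "__main__":
    plan = dns_cutover_plan(datetime(2025, 6, 1, 2, 0),
                            current_ttl_s=86400, migration_ttl_s=60)
    print(plan["lower_ttl_by"])  # 2025-05-31 02:00:00
    print(plan["converged_by"])  # 2025-06-01 02:01:00
```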

Consider creating a proof of concept

A proof of concept (POC) is a small, preliminary implementation of a planned migration project. It validates the feasibility, functionality, and potential benefits of that project before you commit to a full implementation. A POC helps you to determine whether the migration workloads function correctly in the target environment.

Start by defining the scope and the specific success criteria for the POC. Your success criteria could include metrics like full target workload compatibility, minimal migration downtime, and specific performance demands.
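One way to make success criteria unambiguous is to express each one as a measurable threshold and to check POC results against it. The following Python sketch is illustrative: the `Criterion` type, the `evaluate_poc` function, the metric names, and the threshold values are hypothetical examples, not recommended targets.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """Hypothetical POC success criterion with a numeric threshold."""

    name: str
    threshold: float
    higher_is_better: bool

    def met(self, measured):
        if self.higher_is_better:
            return measured >= self.threshold
        return measured <= self.threshold


def evaluate_poc(criteria, measurements):
    """Return the names of unmet criteria; missing measurements count as unmet."""
    unmet = []
    for criterion in criteria:
        value = measurements.get(criterion.name)
        if value is None or not criterion.met(value):
            unmet.append(criterion.name)
    return unmet


if __name__ == "__main__":
    criteria = [
        Criterion("compatible_workloads_pct", 100.0, higher_is_better=True),
        Criterion("downtime_minutes", 15.0, higher_is_better=False),
        Criterion("p95_latency_ms", 200.0, higher_is_better=False),
    ]
    results = {"compatible_workloads_pct": 100.0, "downtime_minutes": 22.0,
               "p95_latency_ms": 180.0}
    print(evaluate_poc(criteria, results))  # ['downtime_minutes']
```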

After you identify your success criteria, test and validate your POC. In your migration plan documentation, capture your findings, the challenges that you encountered, and any potential solutions to those challenges.

Consider creating a POC when you want to investigate the following areas of interest:

  • Validate migration feasibility: Verify that your applications, workloads, and data function as expected in Google Cloud.
  • Estimate downtime and plan for rollback: Measure the downtime that's required to migrate your workloads and to transfer data. Validate your rollback scenarios.
  • Refine the migration plan: Use the following considerations to refine your plan before you commit to a full-scale migration:
    • Identify the best migration approach.
    • Identify your modernization or workload refactoring needs.
    • Identify the potential risks or issues with your migration.
    • Test the migration.
  • Perform security and compliance validation: Ensure that the security policies, the Identity and Access Management (IAM) roles, and the compliance requirements for your migration align with your organization's needs.
  • Build confidence and stakeholder buy-in: Help ensure stakeholder satisfaction. A successful POC builds stakeholder confidence in the migration plan by demonstrating tangible benefits to your leadership and technical teams.
  • Estimate costs and optimization possibilities: Estimate the costs that are associated with the migration. Explore optimization possibilities, such as testing different target environment sizes and migration tools.

Iterate through several POCs. Adjust the target workloads and the migration plan until you create a POC that fulfills your success criteria.

Migration planning

Thoroughly planning your migration helps you to avoid issues during and after the migration. Planning also helps you to avoid spending effort on unanticipated tasks.

Develop a rollback strategy for each step of the migration plan

During the migration, any step of the migration plan that you execute might result in unanticipated issues. To ensure that you're able to recover from those issues, prepare a rollback strategy for each step of the migration plan. To avoid losing time during an outage, do the following:

  • Ensure that your rollback strategies work by periodically reviewingand testing each rollback strategy.
  • Set a maximum-allowed execution time for each migration step. After this allowed execution time expires, your teams start rolling back the migration step.
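The following Python sketch shows one way to enforce a maximum-allowed execution time for a step and to trigger that step's rollback when the step fails or runs too long. The `run_step` function and its callbacks are hypothetical. Python can't forcibly stop a running thread, so a timed-out action keeps running in the background; a real runner would need a way to stop or fence off the step before it rolls back.

```python
import concurrent.futures
import time


def run_step(name, action, rollback, max_seconds):
    """Run one migration step and roll it back if it fails or times out.

    Returns True if the step completed within max_seconds, and False if the
    rollback callback was invoked.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(action)
    try:
        future.result(timeout=max_seconds)
        return True
    except concurrent.futures.TimeoutError:
        print(f"{name}: exceeded {max_seconds}s, rolling back")
    except Exception as exc:  # The step itself raised an error.
        print(f"{name}: failed ({exc!r}), rolling back")
    finally:
        # Don't wait for a timed-out action; it keeps running in its thread.
        executor.shutdown(wait=False)
    rollback()
    return False


if __name__ == "__main__":
    # Hypothetical step that takes longer than its 0.1-second budget.
    run_step("update-config", lambda: time.sleep(0.5),
             lambda: print("restoring previous configuration"), max_seconds=0.1)
```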

Even if you have rollback strategies ready for each step of the migration plan, some of those steps might still be potentially disruptive. A potentially disruptive step might cause some kind of loss even if you roll it back, such as a data loss. Assess which steps of the migration plan are potentially disruptive.

If you automated any step of the migration plan, ensure that you have a preplanned procedure for each automated step in case the automation fails. As with rollback strategies, periodically review and test each preplanned procedure.

If you set up communication channels as part of the migration, provision backup channels that you can use to recover from a failure, so that you aren't locked out of your environment. For example, if you're setting up Partner Interconnect during the migration, you can also set up backup access through the public internet in case you experience any issues during provisioning and configuration.

Plan to modernize and adapt your workloads

When planning your migration to Google Cloud, remember that migrating and integrating your workloads takes time and might present challenges. Consider creating an overview document that describes the general architecture of your workloads, including information about the following topics:

  • Dependencies on external systems and on third-party middleware, such asstorage, messaging, and hosting.
  • Mechanisms to authenticate and authorize workloads.
  • Processes to integrate with IAM.
  • Requirements for the runtime environment.
  • Interactions with storage layer options, like Cloud Storage and Google Cloud databases.
  • Requirements for your data transfer volume and bandwidth.
  • Changes in your application code that you might make during your migration.
  • Options for integrating with Google Cloud Observability.

You might need to modernize your workloads, for example by integrating Google Cloud libraries for authentication, authorization, storage, and observability. Modernizing legacy libraries can require significant effort, so plan for enough time to thoroughly test your workloads.

Plan for gradual rollouts and deployments

To limit the scope of issues that might occur during the migration, avoid large-scale changes, and design your migration plan to deploy changes gradually. For example, plan for gradual deployments and configuration changes.

When you plan for gradual rollouts, minimize the number and the size of the changes in each rollout to lower the risk of unanticipated issues. After you identify and resolve issues in your first small rollout, you can make the subsequent rollouts for similar changes at larger scales.
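The following Python sketch illustrates this pattern for traffic: the share of traffic that goes to the target environment grows geometrically, and any unhealthy stage sends all traffic back to the source. The `set_traffic` and `is_healthy` callbacks are hypothetical hooks into your load balancer and monitoring, and the doubling schedule is only one possible choice.

```python
def rollout_stages(start_pct=1, factor=2, max_pct=100):
    """Return traffic percentages that grow geometrically up to max_pct."""
    if start_pct <= 0 or factor <= 1:
        raise ValueError("start_pct must be positive and factor greater than 1")
    stages, pct = [], start_pct
    while pct < max_pct:
        stages.append(pct)
        pct *= factor
    stages.append(max_pct)
    return stages


def run_rollout(stages, set_traffic, is_healthy):
    """Apply each stage in order; roll back to 0% on the first unhealthy stage.

    Returns True if every stage stayed healthy.
    """
    for pct in stages:
        set_traffic(pct)
        if not is_healthy():
            set_traffic(0)  # Roll back: route all traffic to the source.
            return False
    return True


if __name__ == "__main__":
    print(rollout_stages())  # [1, 2, 4, 8, 16, 32, 64, 100]
    run_rollout(rollout_stages(), lambda pct: print(f"{pct}% to target"),
                lambda: True)
```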

Alert development and operations teams

To reduce the impact of issues that might occur during a migration, alert the teams that are responsible for any workload to migrate. Also alert the teams that are responsible for the infrastructure of both the source and target environments.

If your teams work in different time zones and you practice the follow-the-sun operating model, ensure the following:

  • Your teams properly cover those time zones and they cover multiple consecutive shifts, because they might be unable to resolve issues during a single shift.
  • Your teams are prepared to collect detailed information about the issues that they might face. This information gives the engineers on the next shift a complete understanding of what the previous shift did, and why.
  • Specific people in your teams are responsible for any given shift.

Remove proof-of-concept resources from the target production environment

As part of the assessment, you might have used the target environment to host experiments and proofs of concept. Before the migration, remove any resources that you created during those experiments and proofs of concept from the production area of the target environment.

You can keep resources in a non-production area of the target environment while the migration is in progress because they might help you to gather information about any issue that might arise during the migration. For example, to diagnose issues that affect your production workloads after the migration, you can compare the configuration and data logs of the production workload against the configuration and data logs of the proofs of concept and experiments.
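If you labeled the resources that you created for experiments and proofs of concept, you can derive the cleanup list from your resource inventory. The following Python sketch assumes a hypothetical inventory format in which each resource has an `area` field and a `purpose: poc` label; that labeling convention is an assumption, so adapt the filter to however you tagged those resources.

```python
def plan_poc_cleanup(resources):
    """Split POC resources into "remove before migration" and "keep for now".

    resources: list of dicts with "name", "area" ("production" or
    "non-production"), and "labels" keys.
    """
    remove_now, keep_for_now = [], []
    for resource in resources:
        if resource["labels"].get("purpose") != "poc":
            continue  # Not created for a POC; leave it alone.
        if resource["area"] == "production":
            remove_now.append(resource["name"])
        else:
            # Useful for comparisons until the migration is validated.
            keep_for_now.append(resource["name"])
    return remove_now, keep_for_now


if __name__ == "__main__":
    inventory = [
        {"name": "poc-db", "area": "production", "labels": {"purpose": "poc"}},
        {"name": "poc-vm", "area": "non-production", "labels": {"purpose": "poc"}},
        {"name": "orders-db", "area": "production", "labels": {}},
    ]
    print(plan_poc_cleanup(inventory))  # (['poc-db'], ['poc-vm'])
```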

After you complete the migration and you validate that the target environment works as expected, you can delete the resources in the non-production area of the target environment.

Define criteria to safely retire the source environment

To avoid the cost of running two environments indefinitely, define what conditions must be met for you to safely retire the source environment, such as the following:

  • All workloads, including their backups and high availability and disaster recovery mechanisms, are successfully migrated to the target environment.
  • The data migrated to the target environment is consistent, accessible, and usable.
  • The accuracy and completeness of the migrated data meet the standards that you defined.
  • Resources that remain in the source environment aren't dependencies for workloads that are out of the migration scope.
  • The performance of your workloads in the target environment meets your SLA targets.
  • Your monitoring systems report that there isn't any network traffic to the source environment that should be directed to the target environment.
  • After the workloads run without issues in the target environment for a period that you define, you are confident that you no longer need the ability to fall back to the source environment.
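For the conditions about data consistency and completeness, one common check is to compare row counts and content checksums between the source and target copies of each table. The following Python sketch works on rows that are already loaded in memory as tuples; how you fetch them is out of scope, and the `table_fingerprint` and `compare_tables` helpers are hypothetical.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, digest) for a table, independent of row order.

    Each row is serialized with repr() and hashed; sorting the per-row hashes
    before combining them makes the result independent of row order while
    still detecting duplicated or missing rows.
    """
    row_hashes = sorted(hashlib.sha256(repr(row).encode("utf-8")).hexdigest()
                        for row in rows)
    digest = hashlib.sha256("".join(row_hashes).encode("utf-8")).hexdigest()
    return len(row_hashes), digest


def compare_tables(source_rows, target_rows):
    """Return True if both tables contain exactly the same rows."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)


if __name__ == "__main__":
    source = [(1, "alice"), (2, "bob")]
    target = [(2, "bob"), (1, "alice")]  # Same rows, different order.
    print(compare_tables(source, target))  # True
```

The repr() serialization only matches when both sides produce values of the same Python types; if your source and target drivers return different types for the same column, normalize the values before hashing.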

Plan to update all documents and dashboards

After you complete the migration, plan to comprehensively update your production operation runbooks, your support documents, and your monitoring dashboards. The changes that you need to make to your documentation might include the following items:

  • Architectural diagrams: Update your architectural diagrams to reflect the Google Cloud architecture, especially if you modernize and refactor your workloads.
  • Connection and authentication: Update your documentation on authentication methods, such as IAM, and network configurations to reflect the Google Cloud architecture.
  • Security: Update your documentation that discusses Google Cloud security features, including encryption at rest and in transit, and IAM-based access controls.
  • Maintenance and scaling: Update your production operation runbooks on managed services maintenance windows, vertical and horizontal scaling procedures, and best practices for performance optimization.
  • High availability and failover: Update your documentation for high availability configurations, regional and zonal synchronization considerations, and failover mechanisms.
  • Backup and recovery: Update your backup and recovery processes to align with the processes that Google Cloud and the Backup and Disaster Recovery (DR) Service support. These processes include automated backups, point-in-time recovery possibilities, and export and import procedures.
  • Disaster recovery: Update your DR plan and procedures to align with the DR capabilities of Google Cloud, and then test the updated procedures.
  • Monitoring and logging: Integrate Google Cloud Observability into your monitoring dashboards and alerting systems. Update your documentation on Cloud Quotas, and specify how to interpret logs, metrics, and alerts.

Operations

To efficiently manage the source environment and the target environment during the migration, you need to engineer your operational processes as well.

Monitor your environments

To observe how your source and target environments are behaving and to help you diagnose issues as they occur, set up the following:

  • A monitoring system to gather metrics that are useful to your scenario.
  • A logging system to observe the flow of operations that is performed by your workloads and other components of your environments.
  • An alerting system that warns you before a problematic event occurs.

Google Cloud Observability supports integrated monitoring, logging, and alerting for your Google Cloud environment.

Because a workload and its dependencies can span multiple environments, you might need to use multiple monitoring and alerting tools for different environments. Consider the timing of when you migrate the monitoring and alerting policies that support the workloads. For example, if your source environment is configured to alert when a particular server is down, the alert triggers when you intentionally turn down that server. The alert is expected, but it isn't helpful. As part of the migration, you need to continuously adjust the alerts for the source environment and reconfigure them for the target environment.
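The following Python sketch shows one way to keep intentional turndowns from paging your teams: an alert for a resource that fires during that resource's planned turndown window is treated as expected. The `should_alert` function and the `planned_turndowns` mapping are hypothetical; in practice, you might instead update or disable the alerting policy itself as part of the migration step.

```python
from datetime import datetime


def should_alert(resource, fired_at, planned_turndowns):
    """Return True if a "resource down" alert should notify the on-call team.

    planned_turndowns maps resource names to (start, end) datetimes taken
    from the migration plan.
    """
    window = planned_turndowns.get(resource)
    if window is None:
        return True  # Not a planned turndown: treat it as a real incident.
    start, end = window
    return not (start <= fired_at <= end)


if __name__ == "__main__":
    turndowns = {"web-01": (datetime(2025, 6, 1, 1, 0), datetime(2025, 6, 1, 3, 0))}
    print(should_alert("web-01", datetime(2025, 6, 1, 2, 0), turndowns))  # False
    print(should_alert("db-01", datetime(2025, 6, 1, 2, 0), turndowns))   # True
```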

Manage the migration

To manage the migration, you review the performance of the migration to gather information that you can use in a retrospective after the migration is complete. After you gather information, you use it to analyze the migration performance and to prepare data points about potential improvements to your environments.

For example, to start planning to manage the migration, consider the following questions:

  • How long did each step of the migration plan take?
  • Were there any steps of the migration plan that took more time to complete than anticipated?
  • Were there any missing steps or checks?
  • Did any adverse events occur during the migration?
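To answer the first two questions with data, you can record the planned and actual duration of each step and flag the steps that overran. The following Python sketch assumes that you logged durations in minutes; the `find_overruns` function and the 10% tolerance are illustrative assumptions.

```python
def find_overruns(steps, tolerance=0.1):
    """Return (name, minutes_over_plan) for steps that overran the plan.

    steps: list of (name, planned_minutes, actual_minutes) tuples.
    tolerance: overrun that you accept, as a fraction of the planned time.
    """
    overruns = []
    for name, planned, actual in steps:
        if actual > planned * (1 + tolerance):
            overruns.append((name, actual - planned))
    # Largest overruns first, which is usually where a retrospective starts.
    return sorted(overruns, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    log = [("copy-data", 120, 180), ("update-dns", 10, 10), ("validate", 30, 45)]
    print(find_overruns(log))  # [('copy-data', 60), ('validate', 15)]
```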


Contributors

Author: Marco Ferrari | Cloud Solutions Architect

Other contributor: Alex Cârciu | Solutions Architect

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-05-05 UTC.