Migrate containers to Google Cloud: Migrate from Kubernetes to GKE

This document helps you plan, design, and implement your migration from a self-managed Kubernetes environment to Google Kubernetes Engine (GKE). Moving apps from one environment to another can be a challenging task, so you need to plan and execute your migration carefully.

This document is useful if you're planning to migrate from a self-managed Kubernetes environment to GKE. Your environment might be running in an on-premises environment, in a private hosting environment, or in another cloud provider. This document is also useful if you're evaluating the opportunity to migrate and want to explore what it might look like.

GKE is a Google-managed Kubernetes service that you can use to deploy and operate containerized applications at scale on Google's infrastructure. GKE provides features that help you manage your Kubernetes environment, such as:

  • Two modes of operation: Standard and Autopilot. With Standard, you manage the underlying infrastructure and the configuration of each node in your GKE cluster. With Autopilot, GKE manages the underlying infrastructure, such as node configuration, autoscaling, auto-upgrades, and baseline security and network configuration. For more information about GKE modes of operation, see Choose a GKE mode of operation.
  • Industry-unique service level agreement for Pods when using Autopilot in multiple zones.
  • Automated node pool creation and deletion with node auto-provisioning.
  • Google-managed multi-cluster networking to help you design and implement highly available, distributed architectures for your workloads.

For more information about GKE, see GKE overview.

For this migration to Google Cloud, we recommend that you follow the migration framework described in Migrate to Google Cloud: Get started.

The following diagram illustrates the path of your migration journey.

Migration path with four phases.

You might migrate from your source environment to Google Cloud in a series of iterations. For example, you might migrate some workloads first and others later. For each separate migration iteration, you follow the phases of the general migration framework:

  1. Assess and discover your workloads and data.
  2. Plan and build a foundation on Google Cloud.
  3. Migrate your workloads and data to Google Cloud.
  4. Optimize your Google Cloud environment.

For more information about the phases of this framework, see Migrate to Google Cloud: Get started.

To design an effective migration plan, we recommend that you validate each step of the plan, and ensure that you have a rollback strategy. To help you validate your migration plan, see Migrate to Google Cloud: Best practices for validating a migration plan.

Assess your environment

In the assessment phase, you determine the requirements and dependencies to migrate your source environment to Google Cloud.

The assessment phase is crucial for the success of your migration. You need to gain deep knowledge about the workloads you want to migrate, their requirements, their dependencies, and your current environment. You need to understand your starting point to successfully plan and execute a Google Cloud migration.

The assessment phase consists of the following tasks:

  1. Build a comprehensive inventory of your workloads.
  2. Catalog your workloads according to their properties and dependencies.
  3. Train and educate your teams on Google Cloud.
  4. Build experiments and proofs of concept on Google Cloud.
  5. Calculate the total cost of ownership (TCO) of the target environment.
  6. Choose the migration strategy for your workloads.
  7. Choose your migration tools.
  8. Define the migration plan and timeline.
  9. Validate your migration plan.

For more information about the assessment phase and these tasks, see Migrate to Google Cloud: Assess and discover your workloads. The following sections are based on information in that document.

Build your inventories

To scope your migration, you create two inventories:

  1. The inventory of your clusters.
  2. The inventory of your workloads that are deployed in those clusters.

After you build these inventories, you:

  1. Assess your deployment and operational processes for your source environment.
  2. Assess supporting services and external dependencies.

Build the inventory of your clusters

To build the inventory of your clusters, consider the following for eachcluster:

  • Number and type of nodes. When you know how many nodes you have in your current environment and the characteristics of each node, you can size your clusters when you move to GKE. The nodes in your new environment might run on a different hardware architecture or generation than the ones you use in your environment. The performance of each architecture and generation is different, so the number of nodes you need in your new environment might differ from your current environment. Evaluate any type of hardware that you're using in your nodes, such as high-performance storage devices, GPUs, and TPUs. Assess which operating system image you're using on your nodes.
  • Internal or external cluster. Evaluate which actors, either internal to your environment or external, each cluster is exposed to. To support your use cases, this evaluation includes the workloads running in the cluster and the interfaces that interact with your clusters.
  • Multi-tenancy. If you're managing multi-tenant clusters in your environment, assess if it works in your new Google Cloud environment. Now is a good time to evaluate how to improve your multi-tenant clusters because your multi-tenancy strategy influences how you build your foundation on Google Cloud.
  • Kubernetes version. Gather information about the Kubernetes version of your clusters to assess if there is a mismatch between those versions and the ones available in GKE. If you're running an older or a recently released Kubernetes version, you might be using features that are unavailable in GKE. The features might be deprecated, or the Kubernetes version that ships them is not yet available in GKE.
  • Kubernetes upgrade cycle. To maintain a reliable environment, understand how you're handling Kubernetes upgrades and how your upgrade cycle relates to GKE upgrades.
  • Node pools. If you're using any form of node grouping, you might want to consider how these groupings map to the concept of node pools in GKE because your grouping criteria might not be suitable for GKE.
  • Node initialization. Assess how you initialize each node before marking it as available to run your workloads so you can port those initialization procedures over to GKE.
  • Network configuration. Assess the network configuration of your clusters, their IP address allocation, how you configured their networking plugins, how you configured their DNS servers and DNS service providers, if you configured any form of NAT or SNAT for these clusters, and whether they are part of a multi-cluster environment.
  • Compliance. Assess any compliance and regulatory requirements that your clusters are required to satisfy, and whether you're meeting these requirements.
  • Quotas and limits. Assess how you configured quotas and limits for your clusters. For example, how many Pods can each node run? How many nodes can a cluster have?
  • Labels and tags. Assess any metadata that you applied to clusters, node pools, and nodes, and how you're using them. For example, you might be generating reports with detailed, label-based cost attribution.
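
You can gather much of this cluster inventory by querying each cluster's API server. The following commands are a minimal sketch, assuming `kubectl` is configured with a context for each source cluster; adapt the queries to the metadata and label keys that your environment uses:

```shell
# List nodes with their Kubernetes version, OS image, and other characteristics.
kubectl get nodes -o wide

# Summarize the Kubernetes versions running across the cluster's nodes.
kubectl get nodes -o jsonpath='{range .items[*]}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' | sort | uniq -c

# Inspect per-node capacity, such as the maximum number of Pods and any GPUs.
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.capacity}{"\n"}{end}'

# Review the labels applied to nodes, for example for cost attribution reports.
kubectl get nodes --show-labels
```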

The following items that you assess in your inventory focus on the security of your infrastructure and Kubernetes clusters:

  • Namespaces. If you use Kubernetes Namespaces in your clusters to logically separate resources, assess which resources are in each Namespace, and understand why you created this separation. For example, you might be using Namespaces as part of your multi-tenancy strategy. You might have workloads deployed in Namespaces reserved for Kubernetes system components, and you might not have as much control in GKE.
  • Role-based access control (RBAC). If you use RBAC authorization in your clusters, list and describe all ClusterRoles and ClusterRoleBindings that you configured in your clusters.
  • Network policies. List all network policies that you configured in your clusters, and understand how network policies work in GKE.
  • Pod security contexts. Capture information about the Pod security contexts that you configured in your clusters and learn how they work in GKE.
  • Service accounts. If any process in your cluster is interacting with the Kubernetes API server, capture information about the service accounts that they're using.
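
A sketch of how you might enumerate these security-related objects with `kubectl`, again assuming one context per source cluster:

```shell
# List Namespaces and the resources deployed in each one.
kubectl get namespaces
kubectl get all --all-namespaces

# Enumerate RBAC objects to describe in your inventory.
kubectl get clusterroles
kubectl get clusterrolebindings

# List network policies across all Namespaces.
kubectl get networkpolicies --all-namespaces

# List service accounts, then map each Pod to the service account it uses.
kubectl get serviceaccounts --all-namespaces
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.serviceAccountName}{"\n"}{end}'
```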

When you build the inventory of your Kubernetes clusters, you might find that some of the clusters need to be decommissioned as part of your migration. Make sure that your migration plan includes retiring these resources.

Build the inventory of your Kubernetes workloads

After you complete the Kubernetes clusters inventory and assess the security of your environment, build the inventory of the workloads deployed in those clusters. When evaluating your workloads, gather information about the following aspects:

  • Pods and controllers. To size the clusters in your new environment, assess how many instances of each workload you have deployed, and if you're using Resource quotas and compute resource consumption limits. Gather information about the workloads that are running on the control plane nodes of each cluster and the controllers that each workload uses. For example, how many Deployments are you using? How many DaemonSets are you using?
  • Jobs and CronJobs. Your clusters and workloads might need to run Jobs or CronJobs as part of their initialization or operation procedures. Assess how many instances of Jobs and CronJobs you have deployed, and the responsibilities and completion criteria for each instance.
  • Kubernetes Autoscalers. To migrate your autoscaling policies in the new environment, learn how the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler work on GKE.
  • Stateless and stateful workloads. Stateless workloads don't store data or state in the cluster or to persistent storage. Stateful applications save data for later use. For each workload, assess which components are stateless and which are stateful, because migrating stateful workloads is typically harder than migrating stateless ones.
  • Kubernetes features. From the cluster inventory, you know which Kubernetes version each cluster runs. Review the release notes of each Kubernetes version to know which features it ships and which features it deprecates. Then assess your workloads against the Kubernetes features that you need. The goal of this task is to know whether you're using deprecated features or features that are not yet available in GKE. If you find any unavailable features, migrate away from deprecated features and adopt the new ones when they're available in GKE.
  • Storage. For stateful workloads, assess if they use PersistentVolumeClaims. List any storage requirements, such as size and access mode, and how these PersistentVolumeClaims map to PersistentVolumes. To account for future growth, assess if you need to expand any PersistentVolumeClaim.
  • Configuration and secret injection. To avoid rebuilding your deployable artifacts every time there is a change in the configuration of your environment, inject configuration and secrets into Pods using ConfigMaps and Secrets. For each workload, assess which ConfigMaps and Secrets that workload is using, and how you're populating those objects.
  • Dependencies. Your workloads probably don't work in isolation. They might have dependencies, either internal to the cluster, or from external systems. For each workload, capture the dependencies, and whether your workloads have any tolerance for when the dependencies are unavailable. For example, common dependencies include distributed file systems, databases, secret distribution platforms, identity and access management systems, service discovery mechanisms, and any other external systems.
  • Kubernetes Services. To expose your workloads to internal and external clients, use Services. For each Service, you need to know its type. For externally exposed services, assess how that service interacts with the rest of your infrastructure. For example, how is your infrastructure supporting LoadBalancer services, Gateway objects, and Ingress objects? Which Ingress controllers did you deploy in your clusters?
  • Service mesh. If you're using a service mesh in your environment, assess how it's configured. You also need to know how many clusters it spans, which services are part of the mesh, and how you modify the topology of the mesh.
  • Taints and tolerations and affinity and anti-affinity. For each Pod and Node, assess if you configured any Node taints, Pod tolerations, or affinities to customize the scheduling of Pods in your Kubernetes clusters. These properties might also give you insights about possible non-homogeneous Node or Pod configurations, and might mean that either the Pods, the Nodes, or both need to be assessed with special focus and care. For example, if you configured a particular set of Pods to be scheduled only on certain Nodes in your Kubernetes cluster, it might mean that the Pods need specialized resources that are available only on those Nodes.
  • Authentication. Assess how your workloads authenticate against resources in your cluster, and against external resources.
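
Much of this workload inventory can also be collected with `kubectl`. The following commands are a sketch that assumes access to each source cluster; they enumerate the object types discussed above so you can count and review them:

```shell
# Count controller objects per Namespace to size the target clusters.
kubectl get deployments,daemonsets,statefulsets --all-namespaces
kubectl get jobs,cronjobs --all-namespaces

# List autoscaling policies to port to GKE.
kubectl get hpa --all-namespaces

# Identify stateful workloads through their storage claims.
kubectl get pvc --all-namespaces

# Capture configuration and secret injection per Namespace.
kubectl get configmaps,secrets --all-namespaces

# Review how workloads are exposed, including Service types and Ingress objects.
kubectl get services,ingresses --all-namespaces

# Check for taints on Nodes that hint at specialized scheduling requirements.
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
```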

Assess supporting services and external dependencies

After you assess your clusters and their workloads, evaluate the rest of the supporting services and aspects in your infrastructure, such as the following:

  • StorageClasses and PersistentVolumes. Assess how your infrastructure is backing PersistentVolumeClaims by listing StorageClasses for dynamic provisioning, and statically provisioned PersistentVolumes. For each PersistentVolume, consider the following: capacity, volume mode, access mode, class, reclaim policy, mount options, and node affinity.
  • VolumeSnapshots and VolumeSnapshotContents. For each PersistentVolume, assess if you configured any VolumeSnapshot, and if you need to migrate any existing VolumeSnapshotContents.
  • Container Storage Interface (CSI) drivers. If deployed in your clusters, assess if these drivers are compatible with GKE, and if you need to adapt the configuration of your volumes to work with CSI drivers that are compatible with GKE.
  • Data storage. If you depend on external systems to provision PersistentVolumes, provide a way for the workloads in your GKE environment to use those systems. Data locality has an impact on the performance of stateful workloads, because the latency between your external systems and your GKE environment is proportional to the distance between them. For each external data storage system, consider its type, such as block volumes, file storage, or object storage, and any performance and availability requirements that it needs to satisfy.
  • Custom resources and Kubernetes add-ons. Collect information about any custom Kubernetes resources and any Kubernetes add-ons that you might have deployed in your clusters, because they might not work in GKE, or you might need to modify them. For example, if a custom resource interacts with an external system, assess whether that's applicable to your Google Cloud environment.
  • Backup. Assess how you're backing up the configuration of your clusters and stateful workload data in your source environment.
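
A sketch of how you might enumerate these storage-related objects, assuming cluster access and that the VolumeSnapshot CRDs are installed in your source clusters:

```shell
# List StorageClasses, including their provisioners and reclaim policies.
kubectl get storageclasses

# Inspect PersistentVolumes: capacity, access mode, class, and reclaim policy.
kubectl get pv

# Check for volume snapshots that might need to be migrated.
kubectl get volumesnapshots --all-namespaces
kubectl get volumesnapshotcontents

# List CSI drivers deployed in the cluster to verify GKE compatibility.
kubectl get csidrivers

# Enumerate custom resource definitions that might need rework on GKE.
kubectl get crds
```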

Assess your deployment and operational processes

It's important to have a clear understanding of how your deployment and operational processes work. These processes are a fundamental part of the practices that prepare and maintain your production environment and the workloads that run there.

Your deployment and operational processes might build the artifacts that your workloads need to function. Therefore, you should gather information about each artifact type. For example, an artifact can be an operating system package, an application deployment package, an operating system image, a container image, or something else.

In addition to the artifact type, consider how you complete the following tasks:

  • Develop your workloads. Assess the processes that development teams have in place to build your workloads. For example, how are your development teams designing, coding, and testing your workloads?
  • Generate the artifacts that you deploy in your source environment. To deploy your workloads in your source environment, you might be generating deployable artifacts, such as container images or operating system images, or you might be customizing existing artifacts, such as third-party operating system images, by installing and configuring software. Gathering information about how you're generating these artifacts helps you to ensure that the generated artifacts are suitable for deployment in Google Cloud.
  • Store the artifacts. If you produce artifacts that you store in an artifact registry in your source environment, you need to make the artifacts available in your Google Cloud environment. You can do so by employing strategies like the following:

    • Establish a communication channel between the environments: Make the artifacts in your source environment reachable from the target Google Cloud environment.
    • Refactor the artifact build process: Complete a minor refactor of your source environment so that you can store artifacts in both the source environment and the target environment. This approach supports your migration by building infrastructure like an artifact repository before you have to implement artifact build processes in the target Google Cloud environment. You can implement this approach directly, or you can build on the previous approach of establishing a communication channel first.

    Having artifacts available in both the source and target environments lets you focus on the migration without having to implement artifact build processes in the target Google Cloud environment as part of the migration.

  • Scan and sign code. As part of your artifact build processes, you might be using code scanning to help you guard against common vulnerabilities and unintended network exposure, and code signing to help you ensure that only trusted code runs in your environments.

  • Deploy artifacts in your source environment. After you generate deployable artifacts, you might be deploying them in your source environment. We recommend that you assess each deployment process. The assessment helps ensure that your deployment processes are compatible with Google Cloud. It also helps you to understand the effort that will be necessary to eventually refactor the processes. For example, if your deployment processes work with your source environment only, you might need to refactor them to target your Google Cloud environment.

  • Inject runtime configuration. You might be injecting runtime configuration for specific clusters, runtime environments, or workload deployments. The configuration might initialize environment variables and other configuration values such as secrets, credentials, and keys. To help ensure that your runtime configuration injection processes work on Google Cloud, we recommend that you assess how you're configuring the workloads that run in your source environment.

  • Logging, monitoring, and profiling. Assess the logging, monitoring, and profiling processes that you have in place to monitor the health of your source environment, the metrics of interest, and how you're consuming data provided by these processes.

  • Authentication. Assess how you're authenticating against your source environment.

  • Provision and configure your resources. To prepare your source environment, you might have designed and implemented processes that provision and configure resources. For example, you might be using Terraform along with configuration management tools to provision and configure resources in your source environment.

Plan and build your foundation

In the plan and build phase, you provision and configure the infrastructure to do the following:

  • Support your workloads in your Google Cloud environment.
  • Connect your source environment and your Google Cloud environment to complete the migration.

The plan and build phase is composed of the following tasks:

  1. Build a resource hierarchy.
  2. Configure Google Cloud's Identity and Access Management (IAM).
  3. Set up billing.
  4. Set up network connectivity.
  5. Harden your security.
  6. Set up logging, monitoring, and alerting.

For more information about each of these tasks, see Migrate to Google Cloud: Plan and build your foundation.

The following sections integrate the considerations in Migrate to Google Cloud: Plan and build your foundation.

Plan for multi-tenancy

To design an efficient resource hierarchy, consider how your business and organizational structures map to Google Cloud. For example, if you need a multi-tenant environment on GKE, you can choose between the following options:

  • Creating one Google Cloud project for each tenant.
  • Sharing one project among different tenants, and provisioning multiple GKE clusters.
  • Using Kubernetes namespaces.

Your choice depends on your isolation, complexity, and scalability needs. For example, having one project per tenant isolates the tenants from one another, but the resource hierarchy becomes more complex to manage due to the high number of projects. Although managing Kubernetes Namespaces is relatively easier than managing a complex resource hierarchy, this option doesn't guarantee as much isolation. For example, the control plane might be shared between tenants. For more information, see Cluster multi-tenancy.
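
For example, a Namespace-per-tenant layout can be sketched as follows. The tenant names and label key are hypothetical, and a real multi-tenant setup would also add per-Namespace RBAC and network policies:

```shell
# Create one Namespace per tenant and label it for cost attribution.
for tenant in tenant-a tenant-b; do
  kubectl create namespace "${tenant}"
  kubectl label namespace "${tenant}" team="${tenant}"
done

# Constrain each tenant's resource consumption with a ResourceQuota.
kubectl apply -n tenant-a -f - <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
EOF
```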

Configure identity and access management

GKE supports multiple options for managing access to resources within your Google Cloud project and its clusters using RBAC. For more information, see Access control.

Configure GKE networking

Network configuration is a fundamental aspect of your environment. Before provisioning and configuring any cluster, we recommend that you assess the GKE network model, the best practices for GKE networking, and how to plan IP addresses when migrating to GKE.

Set up monitoring and alerting

Having a clear picture of how your infrastructure and workloads are performing is key to finding areas of improvement. GKE has deep integrations with Google Cloud Observability, so you get logging, monitoring, and profiling information about your GKE clusters and workloads inside those clusters.

Migrate data and deploy your workloads

In the deployment phase, you do the following:

  1. Provision and configure your GKE environment.
  2. Configure your GKE clusters.
  3. Refactor your workloads.
  4. Refactor deployment and operational processes.
  5. Migrate data from your source environment to Google Cloud.
  6. Deploy your workloads in your GKE environment.
  7. Validate your workloads and GKE environment.
  8. Expose workloads running on GKE.
  9. Shift traffic from the source environment to the GKE environment.
  10. Decommission the source environment.

Provision and configure your Google Cloud environment

Before moving any workload to your new Google Cloud environment, you provision the GKE clusters.

GKE supports enabling certain features on existing clusters, but there might be features that you can only enable at cluster creation time. To help you avoid disruptions and simplify the migration, we recommend that you enable the cluster features that you need at cluster creation time. Otherwise, you might need to destroy and recreate your clusters if the features that you need can't be enabled after cluster creation.

After the assessment phase, you now know how to provision the GKE clusters in your new Google Cloud environment to meet your needs. To provision your clusters, consider the following:

  • The number of clusters, the number of nodes per cluster, the types of clusters, the configuration of each cluster and each node, and the scalability plans of each cluster.
  • The mode of operation of each cluster. GKE offers two modes of operation for clusters: GKE Autopilot and GKE Standard.
  • The number of private clusters.
  • The choice between VPC-native or router-based networking.
  • The Kubernetes versions and release channels that you need in your GKE clusters.
  • The node pools to logically group the nodes in your GKE clusters, and if you need to automatically create node pools with node auto-provisioning.
  • The initialization procedures that you can port from your environment to the GKE environment and new procedures that you can implement. For example, you can automatically bootstrap GKE nodes by implementing one or more initialization procedures, possibly privileged, for each node or node pool in your clusters.
  • The additional GKE features that you need, such as Cloud Service Mesh, and GKE add-ons, such as Backup for GKE.
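
These choices translate directly into cluster creation parameters. The following commands are a sketch: the cluster names, region, and autoscaling limits are placeholders, and real clusters need the networking and security settings from your assessment:

```shell
# Autopilot cluster: GKE manages nodes, autoscaling, and upgrades.
gcloud container clusters create-auto my-autopilot-cluster \
  --region us-central1

# Standard cluster: you manage node configuration. Enable features that are
# hard to change later, such as VPC-native networking, at creation time.
gcloud container clusters create my-standard-cluster \
  --region us-central1 \
  --release-channel regular \
  --enable-ip-alias \
  --num-nodes 3 \
  --enable-autoprovisioning \
  --max-cpu 64 --max-memory 256
```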

For more information about provisioning GKE clusters, see the GKE documentation.

Fleet management

When you provision your GKE clusters, you might realize that you need a large number of them to support all the use cases of your environment. For example, you might need to separate production from non-production environments, or separate services across teams or geographies. For more information, see multi-cluster use cases.

As the number of clusters increases, your GKE environment might become harder to operate because managing a large number of clusters poses significant scalability and operational challenges. GKE provides tools and features to help you manage fleets, a logical grouping of Kubernetes clusters. For more information, see Fleet management.
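
For example, you can register a cluster to a fleet from the command line. The cluster name and location below are placeholders:

```shell
# Register an existing GKE cluster to a fleet so that it can be managed
# alongside the other clusters in the fleet.
gcloud container fleet memberships register my-standard-cluster \
  --gke-cluster us-central1/my-standard-cluster \
  --enable-workload-identity

# List the clusters that are members of the fleet.
gcloud container fleet memberships list
```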

Multi-cluster networking

To help you improve the reliability of your GKE environment, and to distribute your workloads across several GKE clusters, you can use:

  • Multi-Cluster Service Discovery, a cross-cluster service discovery and invocation mechanism. Services are discoverable and accessible across GKE clusters. For more information, see Multi-Cluster Service Discovery.
  • Multi-cluster gateways, a cross-cluster ingress traffic load balancing mechanism. For more information, see Deploying multi-cluster Gateways.
  • Multi-cluster mesh on managed Cloud Service Mesh. For more information, see Set up a multi-cluster mesh.

For more information about migrating from a single-cluster GKE environment to a multi-cluster GKE environment, see Migrate to multi-cluster networking.

Configure your GKE clusters

After you provision your GKE clusters and before deploying any workload or migrating data, you configure namespaces, RBAC, network policies, service accounts, and other Kubernetes and GKE objects for each GKE cluster.

To configure Kubernetes and GKE objects in your GKE clusters, we recommend that you:

  1. Ensure that you have the necessary credentials and permissions to access both the clusters in your source environment and those in your GKE environment.
  2. Assess if the objects in the Kubernetes clusters in your source environment are compatible with GKE, and how the implementations that back these objects differ between the source environment and GKE.
  3. Refactor any incompatible object to make it compatible with GKE, or retire it.
  4. Create these objects in your GKE clusters.
  5. Configure any additional objects that you need in your GKE clusters.
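
The steps above can be sketched with `kubectl` contexts, assuming you named them `source` and `gke`; in practice, you review and refactor the exported manifests before applying them:

```shell
# Export the objects to migrate from the source cluster.
kubectl --context source get namespaces,networkpolicies,serviceaccounts \
  --all-namespaces -o yaml > cluster-objects.yaml

# Review and refactor cluster-objects.yaml for GKE compatibility,
# then apply the result to the GKE cluster.
kubectl --context gke apply -f cluster-objects.yaml
```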

Config Sync

To help you adopt GitOps best practices to manage the configuration of your GKE clusters as your GKE environment scales, we recommend that you use Config Sync, a GitOps service to deploy configurations from a source of truth. For example, you can store the configuration of your GKE clusters in a Git repository, and use Config Sync to apply that configuration.
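
For example, a RootSync object points a cluster at the Git repository to sync from. This is a minimal sketch; the repository URL, branch, and directory are placeholders:

```shell
# Point Config Sync at a Git repository that holds the cluster configuration.
kubectl apply -f - <<EOF
apiVersion: configsync.gke.io/v1beta1
kind: RootSync
metadata:
  name: root-sync
  namespace: config-management-system
spec:
  sourceFormat: unstructured
  git:
    repo: https://example.com/my-org/cluster-config
    branch: main
    dir: clusters/prod
    auth: none
EOF
```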

For more information, see Config Sync architecture.

Policy Controller

Policy Controller helps you apply and enforce programmable policies to help ensure that your GKE clusters and workloads run in a secure and compliant manner. As your GKE environment scales, you can use Policy Controller to automatically apply policies, policy bundles, and constraints to all your GKE clusters. For example, you can restrict the repositories from which container images can be pulled, or you can require each namespace to have at least one label to help you ensure accurate resource consumption tracking.
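
The namespace-label example can be expressed as a constraint. This sketch assumes that the K8sRequiredLabels constraint template from the Policy Controller template library is installed, and the `team` label key is hypothetical:

```shell
# Require every Namespace to carry a "team" label for cost attribution.
kubectl apply -f - <<EOF
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: namespaces-must-have-team
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels:
      - key: team
EOF
```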

For more information, see Policy Controller.

Refactor your workloads

A best practice to design containerized workloads is to avoid dependencies on the container orchestration platform. This might not always be possible in practice due to the requirements and the design of your workloads. For example, your workloads might depend on environment-specific features that are available in your source environment only, such as add-ons, extensions, and integrations.

Although you might be able to migrate most workloads as-is to GKE, you might need to spend additional effort to refactor workloads that depend on environment-specific features, in order to minimize these dependencies and eventually switch to alternatives that are available on GKE.

To refactor your workloads before migrating them to GKE, you do the following:

  1. Review source environment-specific features, such as add-ons, extensions, and integrations.
  2. Adopt suitable alternative GKE solutions.
  3. Refactor your workloads.

Review source environment-specific features

If you're using source environment-specific features, and your workloads depend on these features, you need to:

  1. Find suitable alternative GKE solutions.
  2. Refactor your workloads in order to make use of the alternative GKE solutions.

As part of this review, we recommend that you do the following:

  • Consider whether you can deprecate any of these source environment-specific features.
  • Evaluate how critical a source environment-specific feature is for the success of the migration.

Adopt suitable alternative GKE solutions

After you review your source environment-specific features and map them to suitable alternative GKE solutions, you adopt these solutions in your GKE environment. To reduce the complexity of your migration, we recommend that you do the following:

  • Avoid adopting alternative GKE solutions for source environment-specific features that you aim to deprecate.
  • Focus on adopting alternative GKE solutions for the most critical source environment-specific features, and plan dedicated migration projects for the rest.

Refactor your workloads

While most of your workloads might work as is in GKE, you might need to refactor some of them, especially if they depend on source environment-specific features for which you adopted alternative GKE solutions.

This refactoring might involve:

  • Kubernetes object descriptors, such as Deployments and Services expressed in YAML format.
  • Container image descriptors, such as Dockerfiles and Containerfiles.
  • Workload source code.

To simplify the refactoring effort, we recommend that you focus on applying the minimum set of changes that you need to make your workloads suitable for GKE, and on critical bug fixes. You can plan other improvements and changes as part of future projects.
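
For example, a refactor of a Kubernetes object descriptor is often as small as swapping an environment-specific reference for a GKE equivalent. The storage class names and manifest paths below are hypothetical; the point is the scope of the change, not the specific values:

```shell
# Replace a source environment-specific StorageClass reference with a
# GKE-provided one in every exported PersistentVolumeClaim manifest.
sed -i 's/storageClassName: on-prem-fast/storageClassName: premium-rwo/' \
  manifests/*.yaml

# Verify that the refactored manifests are valid before applying them.
kubectl apply --dry-run=client -f manifests/
```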

Refactor deployment and operational processes

After you refactor your workloads, you refactor your deployment and operational processes to do the following:

  • Provision and configure resources in your Google Cloud environment instead of provisioning resources in your source environment.
  • Build and configure workloads, and deploy them in your Google Cloud environment instead of deploying them in your source environment.

You gathered information about these processes during the assessment phaseearlier in this process.

The type of refactoring that you need to consider for these processes depends on how you designed and implemented them. The refactoring also depends on what you want the end state to be for each process. For example, consider the following:

  • You might have implemented these processes in your source environment and you intend to design and implement similar processes in Google Cloud. For example, you can refactor these processes to use Cloud Build, Cloud Deploy, and Infrastructure Manager.
  • You might have implemented these processes in another third-party environment outside your source environment. In this case, you need to refactor these processes to target your Google Cloud environment instead of your source environment.
  • A combination of the previous approaches.

Refactoring deployment and operational processes can be complex and can require significant effort. If you try to perform these tasks as part of your workload migration, the workload migration can become more complex, and it can expose you to risks. After you assess your deployment and operational processes, you likely have an understanding of their design and complexity. If you estimate that you require substantial effort to refactor your deployment and operational processes, we recommend that you consider refactoring these processes as part of a separate, dedicated project.

For more information about how to design and implement deployment processes on Google Cloud, see:

This document focuses on the deployment processes that produce the artifacts to deploy, and deploy them in the target runtime environment. The refactoring strategy depends heavily on the complexity of these processes. The following list outlines a possible, general refactoring strategy:

  1. Provision artifact repositories on Google Cloud. For example, you can use Artifact Registry to store artifacts and build dependencies.
  2. Refactor your build processes to store artifacts both in your source environment and in Artifact Registry.
  3. Refactor your deployment processes to deploy your workloads in your target Google Cloud environment. For example, you can start by deploying a small subset of your workloads in Google Cloud, using artifacts stored in Artifact Registry. Then, you gradually increase the number of workloads deployed in Google Cloud, until all the workloads to migrate run on Google Cloud.
  4. Refactor your build processes to store artifacts in Artifact Registry only.
  5. If necessary, migrate earlier versions of the artifacts to deploy from the repositories in your source environment to Artifact Registry. For example, you can copy container images to Artifact Registry.
  6. Decommission the repositories in your source environment when you no longer require them.
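The first and fifth steps above can be sketched with the following commands. The project, region, repository, registry, and image names are hypothetical, and crane is one open source tool for copying images between registries:

```shell
# Step 1: provision a Docker-format repository in Artifact Registry
# (project, location, and repository names are placeholders).
gcloud artifacts repositories create my-repo \
  --repository-format=docker \
  --location=us-central1 \
  --project=my-project

# Step 5: copy an earlier image version from the source registry
# into Artifact Registry with the open source crane tool.
crane copy registry.example.com/team/app:1.4.2 \
  us-central1-docker.pkg.dev/my-project/my-repo/app:1.4.2
```

Both commands require credentials for the respective registries; crane reuses your local Docker configuration for authentication.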

To facilitate eventual rollbacks due to unanticipated issues during the migration, you can store container images both in your current artifact repositories and in Google Cloud while the migration to Google Cloud is in progress. Finally, as part of the decommissioning of your source environment, you can refactor your container image building processes to store artifacts in Google Cloud only.

Although it might not be crucial for the success of a migration, you might need to migrate earlier versions of your artifacts from your source environment to your artifact repositories on Google Cloud. For example, to support rolling back your workloads to arbitrary points in time, you might need to migrate earlier versions of your artifacts to Artifact Registry. For more information, see Migrate images from a third-party registry.

If you're using Artifact Registry to store your artifacts, we recommend that you configure controls to help you secure your artifact repositories, such as access control, data exfiltration prevention, vulnerability scanning, and Binary Authorization. For more information, see Control access and protect artifacts.

Deploy your workloads

When your deployment processes are ready, you deploy your workloads to GKE. For more information, see Overview of deploying workloads.

To prepare the workloads to deploy for GKE, we recommend that you analyze your Kubernetes descriptors, because some Google Cloud resources that GKE automatically provisions for you are configurable by using Kubernetes labels and annotations, instead of having to manually provision these resources. For example, you can provision an internal load balancer instead of an external one by adding an annotation to a LoadBalancer Service.
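For example, the following sketch shows a hypothetical LoadBalancer Service annotated so that GKE provisions an internal passthrough Network Load Balancer instead of an external one. The Service name, selector, and ports are placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app            # placeholder name
  annotations:
    # Ask GKE to provision an internal load balancer
    # instead of an external one.
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: my-app           # placeholder selector
  ports:
    - port: 80
      targetPort: 8080
```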

Validate your workloads

After you deploy workloads in your GKE environment, but before you expose these workloads to your users, we recommend that you perform extensive validation and testing. This testing can help you verify that your workloads are behaving as expected. For example, you can do the following:

  • Perform integration testing, load testing, compliance testing, reliability testing, and other verification procedures that help you ensure that your workloads are operating within their expected parameters, and according to their specifications.
  • Examine logs, metrics, and error reports in Google Cloud Observability to identify any potential issues, and to spot trends to anticipate problems before they occur.

For more information about workload validation, see Testing for reliability.

Expose your workloads

After you complete the validation testing of the workloads running in your GKE environment, expose your workloads to make them reachable.

To expose workloads running in your GKE environment, you can use Kubernetes Services and a service mesh.
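As a sketch of one option, the following hypothetical Ingress exposes a workload through the GKE Ingress controller, which provisions an external Application Load Balancer by default. The Ingress and backing Service names are placeholders:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress    # placeholder name
spec:
  defaultBackend:
    service:
      name: my-app        # placeholder Service exposing the workload
      port:
        number: 80
```

A service mesh gives you finer-grained traffic control, but a plain Service or Ingress is often enough to make workloads reachable.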

For more information about exposing workloads running in GKE, see:

Shift traffic to your Google Cloud environment

After you have verified that the workloads are running in your GKE environment, and after you have exposed them to clients, you shift traffic from your source environment to your GKE environment. To help you avoid large-scale migrations and the related risks, we recommend that you gradually shift traffic from your source environment to your GKE environment.

Depending on how you designed your GKE environment, you have several options to implement a load balancing mechanism that gradually shifts traffic from your source environment to your target environment. For example, you can implement a DNS resolution policy that resolves a certain percentage of requests to IP addresses belonging to your GKE environment. Or you can implement a load balancing mechanism using virtual IP addresses and network load balancers.
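As a sketch of the DNS-based approach, assuming you use Cloud DNS, a weighted round-robin routing policy can resolve a percentage of requests to your GKE environment. The zone, record name, weights, and IP addresses below are hypothetical, and the exact flag syntax may vary by gcloud version:

```shell
# Send roughly 10% of requests to the GKE environment
# (198.51.100.20) and 90% to the source environment (203.0.113.10).
# Zone, record, weights, and addresses are placeholders.
gcloud dns record-sets create app.example.com. \
  --zone=my-zone \
  --type=A \
  --ttl=60 \
  --routing-policy-type=WRR \
  --routing-policy-data="90.0=203.0.113.10;10.0=198.51.100.20"
```

To continue the shift, you update the weights over time until all traffic resolves to the GKE environment.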

After you start gradually shifting traffic to your GKE environment, we recommend that you monitor how your workloads behave as their loads increase.

Finally, you perform a cutover, which happens when you shift all the traffic from your source environment to your GKE environment.

For more information about load balancing, see Load balancing at the frontend.

Decommission the source environment

After the workloads in your GKE environment are serving requests correctly, you decommission your source environment.

Before you start decommissioning resources in your source environment, we recommend that you do the following:

  • Back up any data to help you restore resources in your source environment.
  • Notify your users before decommissioning the environment.

To decommission your source environment, do the following:

  1. Decommission the workloads running in the clusters in your source environment.
  2. Delete the clusters in your source environment.
  3. Delete the resources associated with these clusters, such as security groups, load balancers, and virtual networks.

To avoid leaving orphaned resources, the order in which you decommission the resources in your source environment is important. For example, certain providers require that you decommission Kubernetes Services that lead to the creation of load balancers before you can decommission the virtual networks containing those load balancers.
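For example, before deleting a source cluster, you can list and remove the Services that caused load balancers to be created. The cluster context, namespace, and Service names below are hypothetical:

```shell
# List the Services that caused the source environment to create
# load balancers (context name is a placeholder).
kubectl get services --all-namespaces --context=source-cluster \
  -o jsonpath='{range .items[?(@.spec.type=="LoadBalancer")]}{.metadata.namespace}{"/"}{.metadata.name}{"\n"}{end}'

# Delete those Services first so the provider tears down the load
# balancers, then delete the cluster, and only then the virtual network.
kubectl delete service my-app --namespace=my-namespace \
  --context=source-cluster
```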

Optimize your Google Cloud environment

Optimization is the last phase of your migration. In this phase, you iterate on optimization tasks until your target environment meets your optimization requirements. The steps of each iteration are as follows:

  1. Assess your current environment, teams, and optimization loop.
  2. Establish your optimization requirements and goals.
  3. Optimize your environment and your teams.
  4. Tune the optimization loop.

You repeat this sequence until you've achieved your optimization goals.

For more information about optimizing your Google Cloud environment, see Migrate to Google Cloud: Optimize your environment and Google Cloud Well-Architected Framework: Performance optimization.

The following sections integrate the considerations in Migrate to Google Cloud: Optimize your environment.

Establish your optimization requirements

Optimization requirements help you narrow the scope of the current optimization iteration. For more information about optimization requirements and goals, see Establish your optimization requirements and goals.

To establish your optimization requirements for your GKE environment, start by considering the following aspects:

  • Security, privacy, and compliance: help you enhance the security posture of your GKE environment.
  • Reliability: help you improve the availability, scalability, and resilience of your GKE environment.
  • Cost optimization: help you optimize the resource consumption and resulting spending of your GKE environment.
  • Operational efficiency: help you maintain and operate your GKE environment efficiently.
  • Performance optimization: help you optimize the performance of the workloads deployed in your GKE environment.

Security, privacy, and compliance

  • Monitor the security posture of your GKE clusters. You can use the security posture dashboard to get opinionated, actionable recommendations to help you improve the security posture of your GKE environment.
  • Harden your GKE environment. Understand the GKE security model, and how to harden your GKE clusters.
  • Protect your software supply chain. For security-critical workloads, Google Cloud provides a modular set of products that implement software supply chain security best practices across the software lifecycle.

Reliability

Cost optimization

For more information about optimizing the cost of your GKE environment, see:

Operational efficiency

To help you avoid issues that affect your production environment, we recommend that you do the following:

  • Design your GKE clusters to be fungible. By considering your clusters as fungible and by automating their provisioning and configuration, you can streamline and generalize the operational processes to maintain them, and also simplify future migrations and GKE cluster upgrades. For example, if you need to upgrade a fungible GKE cluster to a new GKE version, you can automatically provision and configure a new, upgraded cluster, automatically deploy workloads in the new cluster, and decommission the old, outdated GKE cluster.
  • Monitor metrics of interest. Ensure that all the metrics of interest about your workloads and clusters are properly collected. Also, verify that all the relevant alerts that use these metrics as inputs are in place and working.
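As a sketch of metrics collection, assuming you use Google Cloud Managed Service for Prometheus, a PodMonitoring resource can scrape workload metrics. The resource name, labels, and port below are hypothetical:

```yaml
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: my-app-monitoring   # placeholder name
spec:
  selector:
    matchLabels:
      app: my-app           # placeholder label selector
  endpoints:
    - port: metrics         # assumes the Pod exposes a named metrics port
      interval: 30s
```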

For more information about configuring monitoring, logging, and profiling in your GKE environment, see:

Performance optimization

  • Set up cluster autoscaling and node auto-provisioning. Automatically resize your GKE cluster according to demand by using cluster autoscaling and node auto-provisioning.
  • Automatically scale workloads. GKE supports several scaling mechanisms, such as:

For more information, see About GKE scalability.
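As one sketch of workload autoscaling, the following hypothetical HorizontalPodAutoscaler scales a Deployment based on CPU utilization. The names, replica bounds, and threshold are placeholders:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa        # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app          # placeholder Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```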

What's next

Contributors

Author: Marco Ferrari | Cloud Solutions Architect

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2024-06-22 UTC.