About service discovery

This document explains how service discovery in Google Kubernetes Engine (GKE) simplifies application management and how to extend service discovery beyond a single cluster by using Cloud DNS scopes, Multi-cluster Services (MCS), and Service Directory.

This document is for GKE users, Developers, and Admins and architects. To learn more about common roles and example tasks that we reference in Google Cloud content, see Common GKE Enterprise user roles and tasks.

Before you read this document, make sure you understand the following concepts:

Overview

Service discovery is a mechanism that lets services and applications find and communicate with each other dynamically without hardcoding IP addresses or endpoint configurations. Service discovery helps ensure that applications always have access to up-to-date Pod IP addresses, even when Pods are rescheduled or new Pods are added. GKE offers several ways to implement service discovery, including kube-dns, custom kube-dns deployments, and Cloud DNS. You can further optimize DNS performance with NodeLocal DNSCache.

Benefits of service discovery

Service discovery provides the following benefits:

  • Simplified application management: service discovery eliminates the need to hardcode IP addresses in your application configurations. Applications communicate by using logical Service names, which automatically resolve to the correct Pod IP addresses. This approach simplifies configuration, especially in dynamic environments where Pod IP addresses might change due to scaling or rescheduling.
  • Simplified scaling and resilience: service discovery simplifies scaling by decoupling service consumers from Pod IP addresses, which change frequently. While your application scales, or if Pods fail and are replaced, Kubernetes automatically updates which Pods are available to receive traffic for a given Service. Service discovery helps ensure that requests to the stable Service name are directed only to healthy Pods, which lets your application scale or recover from failures without manual intervention or client reconfiguration.
  • High availability: GKE uses load balancing together with service discovery to help ensure high availability and improve responsiveness for your applications, even under heavy loads.

Load balancing with service discovery

GKE helps ensure high availability for your applications by combining different levels of load balancing with service discovery.

  • Internal Services: for services that are accessible only within the cluster, GKE's dataplane (kube-proxy or Cilium) acts as a load balancer. It distributes incoming traffic evenly across multiple healthy Pods, preventing overload and helping to ensure high availability.
  • External Services: for services that need to be accessible from outside the cluster, GKE provisions Google Cloud Load Balancers. These load balancers include external Google Cloud Load Balancers for public internet access and internal Google Cloud Load Balancers for access within your Virtual Private Cloud network. These load balancers distribute traffic across the nodes in your cluster. The dataplane on each node then further routes the traffic to the appropriate Pods.

In both internal and external scenarios, service discovery continuously updates the list of available Pods for each Service. This continuous updating helps ensure that both the dataplane (for internal services) and the Google Cloud load balancers (for external services) direct traffic only to healthy instances.
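The Service type determines which of these load-balancing layers GKE uses. The following minimal sketch creates one internal ClusterIP Service and one external LoadBalancer Service for the same set of Pods; the `my-web-app` label, Service names, and ports are illustrative assumptions, and a Deployment with matching Pod labels is assumed to exist already.

```bash
# Internal Service: the dataplane (kube-proxy or Cilium) load-balances
# traffic across matching Pods inside the cluster.
# External Service: GKE provisions a Google Cloud load balancer.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: my-web-app-internal
spec:
  type: ClusterIP          # reachable only from inside the cluster
  selector:
    app: my-web-app        # assumed Pod label
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-web-app-external
spec:
  type: LoadBalancer       # GKE provisions an external load balancer
  selector:
    app: my-web-app
  ports:
  - port: 80
    targetPort: 8080
EOF
```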

Use cases for service discovery

The following are common use cases for service discovery:

  • Microservice architecture: in a microservice architecture, applications often consist of many smaller, independent services that need to interact. Service discovery enables these applications to find each other and exchange information, even while the cluster scales.
  • Enable zero-downtime deployments and improve resilience: service discovery facilitates zero-downtime updates for applications, including controlled rollouts and canary deployments. It automates the discovery of new service versions and shifts traffic to them, which helps reduce downtime during deployment and ensure a smooth transition for users. Service discovery also enhances resilience. When a Pod fails in GKE, a new one is deployed, and service discovery registers the new Pod and redirects traffic to it, which helps minimize application downtime.
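As a minimal sketch of how service discovery supports controlled rollouts, the following hypothetical Deployment combines a readiness probe with a rolling update strategy. During an update, a new Pod is added to the Service's endpoints only after it reports ready, so clients that use the stable Service name keep reaching healthy Pods; the image, probe path, and replica counts are illustrative assumptions.

```bash
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-web-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0    # keep all existing Pods serving during the rollout
      maxSurge: 1          # add one new Pod at a time
  template:
    metadata:
      labels:
        app: my-web-app
    spec:
      containers:
      - name: web
        image: IMAGE_URL:v2        # placeholder for your container image
        ports:
        - containerPort: 8080
        readinessProbe:            # Pod receives traffic only after it is ready
          httpGet:
            path: /healthz         # assumed health endpoint
            port: 8080
EOF
```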

How service discovery works

In GKE, applications often consist of multiple Pods that need to find and communicate with each other. Service discovery provides this capability by using the Domain Name System (DNS). Similar to how you use DNS to find websites on the internet, Pods in a GKE cluster use DNS to locate and connect with services by using their Service names. This approach lets Pods interact effectively, regardless of where they are running in the cluster, and allows applications to communicate by using consistent service names rather than unstable IP addresses.

How Pods perform DNS resolution

Pods in a GKE cluster resolve DNS names for Services and other Pods by using a combination of automatically generated DNS records and their local DNS configuration.

Service DNS names

When you create a Kubernetes Service, GKE automatically assigns a DNS name to it. This name follows a predictable format, which any Pod in the cluster can use to access the Service:

<service-name>.<namespace>.svc.cluster.local

The default cluster domain is cluster.local, but you can customize the domain when you create the cluster. For example, a Service that's named my-web-app in the default namespace would have the DNS name my-web-app.default.svc.cluster.local.
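To see this resolution in action, you could run a short-lived Pod and query the Service name directly. This is only a sketch: it assumes a Service named my-web-app exists in the default namespace and uses a busybox image for the lookup.

```bash
# Resolve the fully qualified Service name from inside the cluster.
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup my-web-app.default.svc.cluster.local
```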

The role of /etc/resolv.conf

To resolve these DNS names, Pods rely on their /etc/resolv.conf file. This configuration file tells the Pod which name server to send its DNS queries to. The IP address of the name server listed in this file depends on the specific DNS features that are enabled on your GKE cluster. The following table outlines which name server IP address a Pod uses, based on your configuration:

| Cloud DNS for GKE | NodeLocal DNSCache | `/etc/resolv.conf` name server value |
| --- | --- | --- |
| Enabled | Enabled | `169.254.20.10` |
| Enabled | Disabled | `169.254.169.254` |
| Disabled | Enabled | `kube-dns` Service IP address |
| Disabled | Disabled | `kube-dns` Service IP address |

This configuration helps ensure that DNS queries from the Pod are directed to the correct component:

  • NodeLocal DNSCache: provides fast, local lookups on the node.
  • Metadata server IP (169.254.169.254): is used when Cloud DNS for GKE is enabled without NodeLocal DNSCache. DNS queries are directed to this IP address, which Cloud DNS uses to intercept and handle DNS requests.
  • kube-dns Service IP address: is used for standard in-cluster resolution when Cloud DNS for GKE is disabled.
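To check which name server your Pods actually use, you can read /etc/resolv.conf from any running Pod. The Pod name is a placeholder, and the sample output illustrates only one row of the preceding table (Cloud DNS for GKE enabled, NodeLocal DNSCache disabled); your values depend on your cluster configuration.

```bash
# Replace POD_NAME with the name of any running Pod in your cluster.
kubectl exec POD_NAME -- cat /etc/resolv.conf

# Example output (illustrative only):
#   search default.svc.cluster.local svc.cluster.local cluster.local
#   nameserver 169.254.169.254
#   options ndots:5
```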

DNS architecture in GKE

GKE provides a flexible architecture for service discovery, primarily by using DNS. The following components work together to resolve DNS queries within your cluster (the commands after the list show how to check which components your cluster runs):

  • kube-dns: the default in-cluster DNS provider for GKE Standard clusters. It runs as a managed deployment of Pods in the kube-system namespace and monitors the Kubernetes API for new Services to create the necessary DNS records.
  • Cloud DNS: Google Cloud's fully managed DNS service. It offers a highly scalable and reliable alternative to kube-dns and is the default DNS provider for GKE Autopilot clusters.
  • NodeLocal DNSCache: a GKE add-on that improves DNS lookup performance. It runs a DNS cache on each node in your cluster, working with either kube-dns or Cloud DNS to serve DNS queries locally, which reduces latency and the load on the cluster's central DNS provider. For GKE Autopilot clusters, NodeLocal DNSCache is enabled by default and cannot be overridden.
  • Custom kube-dns Deployment: a Deployment that lets you deploy and manage your own instance of kube-dns, which provides more control over kube-dns configuration and resources.
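The following commands are one way to check which of these components a cluster runs. The label selectors match the standard GKE system workloads, but treat the commands as a sketch; component names can vary with your cluster configuration.

```bash
# kube-dns Deployment and Pods (present when kube-dns is the DNS provider).
kubectl get deployment,pods -n kube-system -l k8s-app=kube-dns

# NodeLocal DNSCache runs as a DaemonSet when the add-on is enabled.
kubectl get daemonset -n kube-system -l k8s-app=node-local-dns

# The kube-dns Service IP address that Pods use when Cloud DNS is disabled.
kubectl get service kube-dns -n kube-system
```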

Choose your DNS provider

The following table summarizes the DNS providers available in GKE, including their features and when to choose each one:

| Provider | Features | When to choose |
| --- | --- | --- |
| `kube-dns` | In-cluster DNS resolution for Services and Pods. | All clusters with standard networking needs. The new version of `kube-dns` is suitable for both small and large-scale clusters. |
| Cloud DNS | Advanced DNS features (private zones, traffic steering, global load balancing), and integration with other Google Cloud services. | Exposing services externally, multi-cluster environments, or clusters with high DNS query rates (QPS). |
| Custom `kube-dns` Deployment | Additional control over configuration, resource allocation, and the potential to use alternative DNS providers. | Large-scale clusters or specific DNS needs that require more aggressive scaling or fine-grained control over resource allocation. |
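For example, to move an existing Standard cluster from kube-dns to Cloud DNS in cluster scope, you can update the cluster's DNS settings. This is a sketch, not a migration guide: CLUSTER_NAME and REGION are placeholders, and existing node pools might need to be upgraded or re-created before all nodes pick up the change.

```bash
# Switch an existing cluster's DNS provider to Cloud DNS (cluster scope).
gcloud container clusters update CLUSTER_NAME \
    --location=REGION \
    --cluster-dns=clouddns \
    --cluster-dns-scope=cluster
```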

Service discovery outside a single cluster

You can extend service discovery beyond a single GKE cluster by using the following methods:

Cloud DNS scope

Clusters that use Cloud DNS for cluster DNS can operate in one of three available scopes:

  • Cluster scope: this is the default behavior for Cloud DNS. In this mode, Cloud DNS functions equivalently to kube-dns by providing DNS resolution exclusively for resources that are within the cluster. DNS records are resolvable only from within the cluster, and they adhere to the standard Kubernetes Service schema: <svc>.<ns>.svc.cluster.local.
  • Additive VPC scope: this optional feature extends the cluster scope by making headless Services resolvable from other resources within the same VPC network, such as Compute Engine VMs or on-premises clients that connect by using Cloud VPN or Cloud Interconnect.
  • VPC scope: with this configuration, DNS records for cluster Services are resolvable within the entire VPC network. This approach means that any client that's in the same VPC or is connected to it (through Cloud VPN or Cloud Interconnect) can directly resolve Service names.
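As a sketch of enabling VPC scope at cluster creation, the following command also sets a custom DNS domain so that Service names stay unique within the VPC. CLUSTER_NAME, REGION, and CUSTOM_DOMAIN are placeholders.

```bash
# Create a cluster whose Service DNS records resolve across the VPC network.
gcloud container clusters create CLUSTER_NAME \
    --location=REGION \
    --cluster-dns=clouddns \
    --cluster-dns-scope=vpc \
    --cluster-dns-domain=CUSTOM_DOMAIN   # for example, cluster1.example
```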

For more information about VPC scope DNS, see Using Cloud DNS for GKE.

Multi-cluster Services

Multi-cluster Services (MCS) enables service discovery and traffic management across multiple GKE clusters. MCS lets you build applications that span clusters while maintaining a unified service experience.

MCS uses DNS-based service discovery to connect Services across clusters. When you create a multi-cluster Service, MCS generates DNS records in the format <svc>.<ns>.svc.clusterset.local. These records resolve to the IP addresses of the Service's endpoints in each participating cluster.
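For example, a Service is typically made available to other clusters in the fleet by creating a ServiceExport object whose name matches the Service. The namespace and Service name below are assumptions, and the manifest assumes that the multi-cluster Services feature is already enabled for your fleet.

```bash
# Export the Service "my-web-app" in the "prod" namespace to the fleet.
kubectl apply -f - <<EOF
kind: ServiceExport
apiVersion: net.gke.io/v1
metadata:
  namespace: prod
  name: my-web-app    # must match the name of the Service being exported
EOF

# Clients in other fleet clusters can then resolve:
#   my-web-app.prod.svc.clusterset.local
```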

When a client in one cluster accesses a multi-cluster Service, requests are routed to the nearest available endpoint in any of the participating clusters. This traffic distribution is managed by kube-proxy (or Cilium in clusters that use GKE Dataplane V2) on each node, which helps ensure efficient communication and load balancing across clusters.

Service Directory for GKE

Service Directory for GKE provides a unified registry for service discovery across your Kubernetes and non-Kubernetes deployments. Service Directory can register both GKE and non-GKE services in a single registry.

Service Directory is particularly useful if you want any of the following:

  • A single registry for Kubernetes and non-Kubernetes applications to discover each other.
  • A managed service discovery tool.
  • The ability to store metadata about your Service that other clients can access.
  • The ability to set access permissions on a per-Service level.

Service Directory services can be resolved by using DNS, HTTP, and gRPC. Service Directory is integrated with Cloud DNS, and can populate Cloud DNS records that match services in Service Directory.
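As an illustration of querying the registry outside of DNS, a client with the appropriate permissions could look up a registered service through the gcloud CLI. The location, namespace, and service names below are placeholders, and the commands are a sketch of the Service Directory CLI rather than a complete workflow.

```bash
# List Service Directory namespaces in a location.
gcloud service-directory namespaces list --location=us-central1

# Resolve a registered service to its endpoints and metadata.
gcloud service-directory services resolve my-service \
    --namespace=my-namespace \
    --location=us-central1
```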

For more information, see Configuring Service Directory for GKE.

Optimize DNS performance and best practices

To help ensure reliable and efficient service discovery, especially in large-scale or high-traffic clusters, consider the following best practices and optimization strategies.

Optimize performance with NodeLocal DNSCache

For clusters that have a high density of Pods, or applications that generate a high volume of DNS queries, you can improve DNS lookup speeds by enabling NodeLocal DNSCache. NodeLocal DNSCache is a GKE add-on that runs a DNS cache on each node in your cluster. When a Pod makes a DNS request, the request goes to the cache that's on the same node. This approach reduces latency and network traffic.
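For example, on an existing Standard cluster you can enable the add-on with a cluster update; CLUSTER_NAME and REGION are placeholders, and nodes might need to be re-created before the cache is active on every node.

```bash
# Enable NodeLocal DNSCache on an existing cluster.
gcloud container clusters update CLUSTER_NAME \
    --location=REGION \
    --update-addons=NodeLocalDNS=ENABLED
```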

For more information about how to enable and configure this feature, see Setting up NodeLocal DNSCache.

Scale your DNS provider

If you use kube-dns and experience intermittent timeouts, particularly during periods of high traffic, you might need to scale the number of kube-dns replicas. The kube-dns-autoscaler adjusts the number of replicas based on the number of nodes and cores in the cluster, and its parameters can be tuned to deploy more replicas sooner.
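The autoscaler reads its parameters from a ConfigMap in the kube-system namespace. The following sketch lowers nodesPerReplica and coresPerReplica so that replicas are added sooner; the specific values are illustrative, not recommendations.

```bash
# Inspect the current autoscaler parameters.
kubectl get configmap kube-dns-autoscaler -n kube-system -o yaml

# Example: scale kube-dns replicas more aggressively (illustrative values).
kubectl patch configmap kube-dns-autoscaler -n kube-system --type merge -p \
  '{"data":{"linear":"{\"coresPerReplica\":128,\"nodesPerReplica\":8,\"preventSinglePointFailure\":true,\"min\":2}"}}'
```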

For detailed instructions, see Scaling up kube-dns.

General best practices

  • Select the appropriate DNS provider: choose your DNS provider based on your cluster's needs. Cloud DNS is recommended for high-QPS workloads, multi-cluster environments, and when you need integration with your broader VPC network. The new version of kube-dns is suitable for a wide range of clusters, from small to large, that have standard in-cluster service discovery needs.
  • Avoid Spot VMs or Preemptible VMs for kube-dns: help ensure the stability of your cluster's DNS service by not running critical system components like kube-dns on Spot VMs or Preemptible VMs. Unexpected node terminations can lead to DNS resolution issues.
  • Use clear and descriptive Service names: adopt consistent and meaningful naming conventions for your Services to make application configurations easier to read and maintain.
  • Organize with namespaces: use Kubernetes namespaces to group related services. This approach helps prevent naming conflicts and improves cluster resource organization.
  • Monitor and validate DNS: regularly monitor DNS metrics and logs to identify potential issues before they impact your applications. Periodically test DNS resolution from within your Pods to ensure that service discovery is functioning as expected.
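As a starting point for the monitoring and validation practice in the last item, the following commands check the DNS provider's logs, confirm that a Service has healthy endpoints, and test resolution from inside the cluster. The label selector, container name, Service name, and test image are assumptions based on a standard kube-dns setup.

```bash
# Check recent kube-dns logs for errors (when kube-dns is the DNS provider).
kubectl logs -n kube-system -l k8s-app=kube-dns -c kubedns --tail=50

# Confirm that a Service has healthy endpoints behind its stable name.
kubectl get endpointslices -l kubernetes.io/service-name=my-web-app

# Test in-cluster DNS resolution with a short-lived Pod.
kubectl run dns-check --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup kubernetes.default.svc.cluster.local
```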

What's next
