Posted on Jul 10, 2023 • Originally published at paulyu.dev on Jul 10, 2023

Streamline Network Observability on AKS

#azure #kubernetes #observability #terraform

Have you ever had to troubleshoot network issues in your Kubernetes clusters? If so, you know how challenging it can be to identify and resolve problems.

To troubleshoot network issues, you probably had to use a combination of tools like kubectl, tcpdump, wireshark, and netstat. The list goes on and on... While these tools are great for debugging and capturing network logs and traces, they don't provide a holistic view of your cluster's network traffic.

The good news is that there's a better way!

A few weeks ago, the Network Observability add-on for AKS was announced. This add-on is currently in preview and provides a simple way to enable network observability for your AKS clusters. The add-on is an eBPF-based solution that scrapes metrics from Kubernetes workloads and exposes them in Prometheus format. This allows you to use tools like Grafana to visualize your cluster's network traffic. You can pair it with either Bring-Your-Own Prometheus and Grafana or Azure-managed Prometheus and Grafana.

The AKS docs include a step-by-step guide for enabling the add-on using the Azure CLI.

In this blog post, I'll walk you through enabling the AKS add-on using Terraform instead.

Before you begin

You should have an Azure subscription and the Azure CLI installed. You'll also need to install the Terraform CLI.

If you have all of the above, you're ready to get started!
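If you're not sure whether both CLIs are on your PATH, a small shell check can confirm it before you go further. This is a minimal sketch and not part of the original walkthrough; the function name check_prereqs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report any required CLI that is not on the PATH.
# Returns 0 when every tool passed as an argument is found, 1 otherwise.
check_prereqs() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling check_prereqs az terraform prints the name of any missing tool to stderr and returns a non-zero status, so it can gate the rest of a setup script.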

Run the following command to log in to your Azure account using the Azure CLI:

az login

With the network observability add-on being in preview, you'll need to register the NetworkObservabilityPreview feature by running the following command:

az feature register \
  --namespace "Microsoft.ContainerService" \
  --name "NetworkObservabilityPreview"

NOTE: This command can take a few minutes to complete. You can check the status of the feature registration using the following command:
az feature show \
  --namespace "Microsoft.ContainerService" \
  --name "NetworkObservabilityPreview"

You can proceed once the feature's state is reported as Registered.
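Rather than re-running the status command by hand, you could poll it in a loop. The sketch below is an illustration, not something from the original post: the function name wait_for_feature and its attempt and delay defaults are hypothetical, and it assumes the registration state is exposed at properties.state in the az feature show output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll `az feature show` until the feature reports "Registered".
# Arguments: namespace, feature name, max attempts (default 30), delay in seconds (default 20).
wait_for_feature() {
  local namespace="$1" name="$2" attempts="${3:-30}" delay="${4:-20}"
  local state i=1
  while [ "$i" -le "$attempts" ]; do
    # Ask only for the state field, as plain text (assumed path: properties.state).
    state="$(az feature show --namespace "$namespace" --name "$name" \
      --query "properties.state" --output tsv)" || state="unknown"
    echo "attempt ${i}: ${state}"
    if [ "$state" = "Registered" ]; then
      return 0
    fi
    # Wait before the next attempt, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

A call such as wait_for_feature "Microsoft.ContainerService" "NetworkObservabilityPreview" returns 0 as soon as the state is Registered and 1 if it gives up. The AKS preview-feature docs also suggest running az provider register --namespace Microsoft.ContainerService afterwards so the change propagates.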

Overview of what we'll be doing

If you've used the Azure CLI to enable the network observability add-on in your AKS cluster, you'll know that all it takes is a single flag (--enable-network-observability) to enable the feature and a few commands to wire up the AKS cluster to the Azure-managed Prometheus and Grafana instances. Here, I want to use Terraform to provision the add-on. It's a bit more involved, but it's worth knowing how everything is wired up.

The process of enabling the network observability add-on using Terraform can be broken down into the following steps:

Create an AKS cluster
Create an Azure Monitor workspace with data collection rules, endpoints, and alerts for Prometheus
Enable the network observability add-on for the AKS cluster
Create an Azure Managed Grafana instance with the role-based access control (RBAC) assignments needed for you to log in to Grafana and for Grafana to access the Azure Monitor workspace
Import the Kubernetes / Networking dashboard into our Grafana instance

After following the steps above, we'll deploy a sample application to the AKS cluster and explore the network observability dashboard.

NOTE: If you're really curious to know what the --enable-network-observability flag does in Azure CLI, you can read through the source code here.

Setting up Terraform providers

All my Terraform code can be found here. You can use this as a reference to follow along with the steps below.

Create a new Terraform configuration file named main.tf and add the following code:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=3.62.1"
    }
    local = {
      source  = "hashicorp/local"
      version = "=2.4.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "=2.10.1"
    }
    azapi = {
      source  = "Azure/azapi"
      version = "=1.7.0"
    }
  }
}

provider "azurerm" {
  features {
    resource_group {
      prevent_deletion_if_contains_resources = false
    }
  }
}

provider "helm" {
  kubernetes {
    config_path = local_file.example.filename
  }
}

locals {
  name     = "neto11y${random_integer.example.result}"
  location = "eastus"
}

data "azurerm_client_config" "current" {}

Here we are defining the required Terraform providers and the Azure provider configuration. We are also defining a few local variables that will be used throughout the Terraform configuration.

Notice that we're using the azapi and helm providers in addition to the azurerm provider. The azapi provider is used to update our AKS cluster and enable the network observability add-on. With this AKS add-on being in preview, it is not yet available in azurerm, so this is a great opportunity to utilize the azapi provider to update the AKS resource.

The helm provider is used to deploy a sample application to our AKS cluster. We'll get to that later.