DEV Community
Paul Yu for Microsoft Azure

Originally published at paulyu.dev

     

Streamline Network Observability on AKS

Have you ever had to troubleshoot network issues in your Kubernetes clusters? If so, you know how challenging it can be to identify and resolve problems.

To troubleshoot network issues, you probably had to use a combination of tools like kubectl, tcpdump, wireshark, and netstat. The list goes on and on... While these tools are great for debugging and capturing network logs and traces, they don't provide a holistic view of your cluster's network traffic.

The good news is that there's a better way!

A few weeks ago, the Network Observability add-on for AKS was announced. This add-on is currently in preview and provides a simple way to enable network observability for your AKS clusters. The add-on is an eBPF-based solution that scrapes metrics from Kubernetes workloads and exposes them in Prometheus format. This allows you to use tools like Grafana to visualize your cluster's network traffic. You can either bring your own Prometheus and Grafana or use Azure-managed Prometheus and Grafana.

The AKS docs include a step-by-step guide for enabling the add-on using the Azure CLI.

In this blog post, I'll walk you through the steps on how you can enable the AKS add-on using Terraform.

Before you begin

You should have an Azure subscription and the Azure CLI installed. You'll also need to install the Terraform CLI.

If you have all of the above, you're ready to get started!

Run the following command to log in to your Azure account using the Azure CLI:

az login

With the network observability add-on being in preview, you'll need to register the NetworkObservabilityPreview feature by running the following command:

az feature register \
  --namespace "Microsoft.ContainerService" \
  --name "NetworkObservabilityPreview"

NOTE: This command can take a few minutes to complete. You can check the status of the feature registration using the following command:

az feature show \
  --namespace "Microsoft.ContainerService" \
  --name "NetworkObservabilityPreview"

You can proceed once the feature state shows as Registered.
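If you'd rather not re-run the status check by hand, here is a minimal sketch of a polling helper. The function name wait_for_feature, its variable names, and the default retry count and delay are my own assumptions for illustration; the only Azure CLI call it relies on is az feature show with the standard --query and --output options.

```shell
# Poll until a preview feature reports "Registered", or give up.
# Arguments: namespace, feature name, max attempts (default 30),
# delay between attempts in seconds (default 10).
# wait_for_feature is a hypothetical helper, not part of the Azure CLI.
wait_for_feature() {
  wf_namespace="$1"
  wf_name="$2"
  wf_attempts="${3:-30}"
  wf_delay="${4:-10}"
  wf_i=0
  while [ "$wf_i" -lt "$wf_attempts" ]; do
    # Ask Azure for just the registration state as plain text.
    wf_state="$(az feature show \
      --namespace "$wf_namespace" \
      --name "$wf_name" \
      --query properties.state \
      --output tsv)"
    if [ "$wf_state" = "Registered" ]; then
      echo "Feature $wf_name is registered."
      return 0
    fi
    echo "Current state: $wf_state; retrying in ${wf_delay}s..."
    sleep "$wf_delay"
    wf_i=$((wf_i + 1))
  done
  echo "Timed out waiting for $wf_name." >&2
  return 1
}

# Example call (commented out so that sourcing this file has no side effects):
# wait_for_feature "Microsoft.ContainerService" "NetworkObservabilityPreview"
```

Once the helper returns successfully, the AKS preview documentation generally suggests refreshing the resource provider registration with az provider register --namespace Microsoft.ContainerService so the change propagates.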

Overview of what we'll be doing

If you've used the Azure CLI command to enable the network observability add-on in your AKS cluster, you'll find that all it takes is a single flag (--enable-network-observability) to enable the feature and a few commands to wire up the AKS cluster to the Azure managed Prometheus and Grafana instances. I want to use Terraform to provision the add-on. It's a bit more involved but worth knowing how it's all wired up.

The process of enabling the network observability add-on using Terraform can be broken down into the following steps:

  1. Create an AKS cluster
  2. Create an Azure Monitor workspace with data collection rules, endpoints, and alerts for Prometheus
  3. Enable the network observability add-on for the AKS cluster
  4. Create an Azure Managed Grafana instance with the role-based access control (RBAC) assignments that let you log in to Grafana and let Grafana read from the Azure Monitor workspace
  5. Import the Kubernetes / Networking dashboard into our Grafana instance

After following the steps above, we'll deploy a sample application to the AKS cluster and explore the network observability dashboard.

NOTE: If you're really curious to know what the --enable-network-observability flag does in the Azure CLI, you can read through the source code here.

Setting up Terraform providers

All my Terraform code can be found here. You can use this as a reference to follow along with the steps below.

Create a new Terraform configuration file named main.tf and add the following code:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=3.62.1"
    }
    local = {
      source  = "hashicorp/local"
      version = "=2.4.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "=2.10.1"
    }
    azapi = {
      source  = "Azure/azapi"
      version = "=1.7.0"
    }
  }
}

provider "azurerm" {
  features {
    resource_group {
      prevent_deletion_if_contains_resources = false
    }
  }
}

provider "helm" {
  kubernetes {
    config_path = local_file.example.filename
  }
}

locals {
  name     = "neto11y${random_integer.example.result}"
  location = "eastus"
}

data "azurerm_client_config" "current" {}

Here we are defining the required Terraform providers and the Azure provider configuration. We are also defining a few local variables that will be used throughout the Terraform configuration.

Notice that we're using the azapi and helm providers in addition to the azurerm provider. The azapi provider is used to update our AKS cluster and enable the network observability add-on. With this AKS add-on being in preview, it is not yet available in azurerm, so this is a great opportunity to utilize the azapi provider to update the AKS resource.

The helm provider is used to deploy a sample application to our AKS cluster. We'll get to that later.

Deploy AKS and Azure Monitor workspace for Prometheus

Append the following code to your main.tf file:

resource "random_integer" "example" {
  min = 100
  max = 999
}

resource "azurerm_resource_group" "example" {
  name     = "rg-${local.name}"
  location = local.location
}

resource "azurerm_kubernetes_cluster" "example" {
  name                = "aks-${local.name}"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  dns_prefix          = "aks-${local.name}"

  default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_DS3_v2"
    os_sku     = "AzureLinux"
  }

  identity {
    type = "SystemAssigned"
  }

  monitor_metrics {
  }
}

resource "azurerm_monitor_workspace" "example" {
  name                = "amon-${local.name}"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
}

resource "azurerm_monitor_data_collection_endpoint" "example" {
  name                = "msprom--${azurerm_resource_group.example.location}-${azurerm_kubernetes_cluster.example.name}"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  kind                = "Linux"
}

resource "azurerm_monitor_data_collection_rule" "example" {
  name                        = "msprom--${azurerm_resource_group.example.location}-${azurerm_kubernetes_cluster.example.name}"
  resource_group_name         = azurerm_resource_group.example.name
  location                    = azurerm_resource_group.example.location
  data_collection_endpoint_id = azurerm_monitor_data_collection_endpoint.example.id

  data_sources {
    prometheus_forwarder {
      name    = "PrometheusDataSource"
      streams = ["Microsoft-PrometheusMetrics"]
    }
  }

  destinations {
    monitor_account {
      monitor_account_id = azurerm_monitor_workspace.example.id
      name               = azurerm_monitor_workspace.example.name
    }
  }

  data_flow {
    streams      = ["Microsoft-PrometheusMetrics"]
    destinations = [azurerm_monitor_workspace.example.name]
  }
}

# associate to a Data Collection Rule
resource "azurerm_monitor_data_collection_rule_association" "example_dcr_to_aks" {
  name                    = "dcr-${azurerm_kubernetes_cluster.example.name}"
  target_resource_id      = azurerm_kubernetes_cluster.example.id
  data_collection_rule_id = azurerm_monitor_data_collection_rule.example.id
}

# associate to a Data Collection Endpoint
resource "azurerm_monitor_data_collection_rule_association" "example_dce_to_aks" {
  target_resource_id          = azurerm_kubernetes_cluster.example.id
  data_collection_endpoint_id = azurerm_monitor_data_collection_endpoint.example.id
}

This will deploy an AKS cluster and an Azure Monitor workspace. It will also create a data collection endpoint and a data collection rule that will collect Prometheus metrics from the AKS cluster and send them to the Azure Monitor workspace.

The random_integer resource generates a random number that is appended to the resource names. This keeps the names unique, which helps with Azure resources whose names must not collide with existing ones.

There are additional alerts you can configure for Prometheus, but they aren't necessary for this walkthrough. We'll omit those for now to keep this post relatively short. You can view the code for that here for node metrics and here for k8s metrics.

Enable Network Observability add-on

Append the following code to your main.tf file:

resource "azapi_update_resource" "example" {
  type        = "Microsoft.ContainerService/managedClusters@2023-05-02-preview"
  resource_id = azurerm_kubernetes_cluster.example.id

  body = jsonencode({
    properties = {
      networkProfile = {
        monitoring = {
          enabled = true
        }
      }
    }
  })

  depends_on = [
    azurerm_monitor_data_collection_rule_association.example_dce_to_aks,
    azurerm_monitor_data_collection_rule_association.example_dcr_to_aks,
  ]
}

Here's where we use the azapi_update_resource resource to enable the Network Observability add-on. You can see that we're issuing a partial update to our AKS cluster resource. We're only updating the networkProfile.monitoring.enabled property to true. This single flag enables the add-on.
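After the deployment has run, one way to confirm that the partial update took effect is to read the same property back with the generic az resource show command. This is a sketch under assumptions: the helper name monitoring_enabled is hypothetical, and it assumes the property is readable at the same preview API version that the Terraform code above writes to.

```shell
# Read networkProfile.monitoring.enabled back from the cluster resource.
# Argument: the full Azure resource ID of the AKS cluster.
# monitoring_enabled is a hypothetical helper name used for illustration.
monitoring_enabled() {
  az resource show \
    --ids "$1" \
    --api-version "2023-05-02-preview" \
    --query "properties.networkProfile.monitoring.enabled" \
    --output json
}

# Example call (commented out; replace the placeholders with your own values):
# monitoring_enabled "$(az aks show \
#   --resource-group <resource-group> \
#   --name <cluster-name> \
#   --query id --output tsv)"
```

With JSON output, an enabled add-on should come back as true. If nothing is printed, the property is most likely not set on the cluster yet.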

Deploy Azure Managed Grafana and import dashboard

Append the following code to your main.tf file:

resource "azurerm_dashboard_grafana" "example" {
  name                = "amg-${local.name}"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location

  identity {
    type = "SystemAssigned"
  }

  azure_monitor_workspace_integrations {
    resource_id = azurerm_monitor_workspace.example.id
  }
}

resource "null_resource" "example" {
  provisioner "local-exec" {
    command = <<-EOT
      az grafana dashboard import \
        --name ${azurerm_dashboard_grafana.example.name} \
        --resource-group ${azurerm_resource_group.example.name} \
        --folder 'Managed Prometheus' \
        --definition 18814
    EOT
  }

  depends_on = [
    azurerm_role_assignment.example_amg_me
  ]
}

resource "azurerm_role_assignment" "example_amon_me" {
  scope                = azurerm_monitor_workspace.example.id
  role_definition_name = "Monitoring Data Reader"
  principal_id         = data.azurerm_client_config.current.object_id
}

resource "azurerm_role_assignment" "example_amon_amg" {
  scope                = azurerm_monitor_workspace.example.id
  role_definition_name = "Monitoring Data Reader"
  principal_id         = azurerm_dashboard_grafana.example.identity[0].principal_id
}

resource "azurerm_role_assignment" "example_amg_me" {
  scope                = azurerm_dashboard_grafana.example.id
  role_definition_name = "Grafana Admin"
  principal_id         = data.azurerm_client_config.current.object_id
}

This will deploy an Azure Managed Grafana instance and import the AKS Network Observability dashboard into a folder called "Managed Prometheus". It will also assign the necessary permissions to the Azure Managed Grafana instance and the Azure Monitor workspace. In the azurerm_dashboard_grafana resource definition, you can see that we're using the azure_monitor_workspace_integrations block to integrate the Azure Managed Prometheus with Azure Managed Grafana.

Currently, there isn't any way to automate the import of dashboards into Azure Managed Grafana using Terraform. This is why we're using the null_resource resource to run the az grafana dashboard import Azure CLI command, which imports the dashboard into Grafana. You need the proper permissions to import dashboards, which is why we're using the azurerm_role_assignment resource to assign the Grafana Admin role to the current user (you). This role also allows you to authenticate to the Grafana portal using your Azure AD credentials.
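Because the import runs through a local-exec provisioner, Terraform itself won't tell you much about whether the dashboard actually landed in Grafana. A small sketch of a check, assuming the amg extension's az grafana dashboard list command and a hypothetical helper name dashboard_imported:

```shell
# Succeed if a dashboard with the given title exists in the Grafana instance.
# Arguments: Grafana instance name, resource group, exact dashboard title.
# dashboard_imported is a hypothetical helper name used for illustration.
dashboard_imported() {
  # List every dashboard title, one per line.
  di_titles="$(az grafana dashboard list \
    --name "$1" \
    --resource-group "$2" \
    --query "[].title" \
    --output tsv)"
  # -F: fixed string, -x: match the whole line, -q: no output.
  printf '%s\n' "$di_titles" | grep -Fxq "$3"
}

# Example call (commented out; replace the placeholders with your own values):
# dashboard_imported <grafana-name> <resource-group> "Kubernetes / Networking" \
#   && echo "Dashboard found"
```

The title used in the example matches the dashboard name that shows up in the Managed Prometheus folder later in this post.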

Deploy a sample application using Helm

Append the following code to your main.tf file:

resource "local_file" "example" {
  filename = "mykubeconfig"
  content  = azurerm_kubernetes_cluster.example.kube_config_raw
}

resource "helm_release" "example" {
  name  = "aks-store-demo"
  chart = "../helm/aks-store-demo"

  depends_on = [
    azapi_update_resource.example
  ]
}

This deploys a sample application using the Helm provider. In order to authenticate to the AKS cluster, we're using the local_file resource to write the kubeconfig file to a local file called mykubeconfig. We pass this kubeconfig file to the Helm provider when deploying the helm_release resource and will also use the file when executing kubectl commands below.

Get the endpoint for Azure Managed Grafana

Append the following code to your main.tf file:

output "amg_endpoint" {
  value = azurerm_dashboard_grafana.example.endpoint
}

This will output the endpoint for Azure Managed Grafana. You can use this endpoint to access the Grafana portal.
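If you want to use the endpoint in a script once the deployment has finished, terraform output has a -raw flag that prints the value without surrounding quotes. A minimal sketch, where the helper name grafana_endpoint is my own:

```shell
# Print the Grafana endpoint URL without the quotes that
# plain `terraform output` wraps around string values.
# grafana_endpoint is a hypothetical helper name used for illustration.
grafana_endpoint() {
  terraform output -raw amg_endpoint
}

# Example call (commented out; run from the directory that holds main.tf):
# echo "Grafana is available at: $(grafana_endpoint)"
```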

Run the Terraform deployment

Before we run the Terraform deployment, you may need to run a few Azure CLI commands to enable the necessary features.

If you haven't deployed Azure Managed Grafana in your subscription yet, you may need to register the resource provider by running the following command:

az provider register --namespace Microsoft.Dashboard

As mentioned above, we'll be using the Azure CLI to import a dashboard into our Grafana instance. If you haven't used the Azure CLI to interact with Azure Managed Grafana before, you'll need to install the amg extension by running the following command:

az extension add --name amg
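Both prerequisites are safe to re-run, but if you script your setup you may prefer to check first and only act when something is missing. A minimal sketch, assuming a hypothetical helper name ensure_grafana_prereqs and using only az provider show, az provider register, az extension show, and az extension add:

```shell
# Register the Microsoft.Dashboard provider and install the amg
# extension only when they are not already in place.
# ensure_grafana_prereqs is a hypothetical helper name used for illustration.
ensure_grafana_prereqs() {
  gp_state="$(az provider show \
    --namespace Microsoft.Dashboard \
    --query registrationState \
    --output tsv)"
  if [ "$gp_state" != "Registered" ]; then
    az provider register --namespace Microsoft.Dashboard
  fi

  # `az extension show` exits non-zero when the extension is not installed.
  if ! az extension show --name amg >/dev/null 2>&1; then
    az extension add --name amg
  fi
}

# Example call (commented out so that sourcing this file has no side effects):
# ensure_grafana_prereqs
```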

Now that we have all the code in place, we can run Terraform to deploy the resources. Run the following commands:

terraform init
terraform apply

This will initialize Terraform and deploy the resources. When prompted, type yes to confirm the deployment. The deployment will take a few minutes to complete.
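If you'd like to review the changes before anything is created, a common alternative is to save a plan and apply exactly that plan. This is a sketch of that workflow rather than what the post's deployment requires; the function name deploy and the plan file name tfplan are arbitrary choices of mine.

```shell
# Initialize, validate, save a plan, then apply that saved plan.
# Each step only runs if the previous one succeeded.
# deploy and tfplan are hypothetical names used for illustration.
deploy() {
  terraform init &&
    terraform validate &&
    terraform plan -out=tfplan &&
    terraform apply tfplan
}

# Example call (commented out; run from the directory that holds main.tf):
# deploy
```

Note that applying a saved plan file does not prompt for confirmation, so review the plan output before letting the last step run.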

Verify that the application is running

If you do not have kubectl installed, you can install it by running the following command:

az aks install-cli

With the Azure resources and application deployed, you can verify that the application is running with the following kubectl command:

kubectl --kubeconfig mykubeconfig get pod

You should see output similar to the following:

NAME                                READY   STATUS    RESTARTS   AGE
makeline-service-7777968887-b5jgh   1/1     Running   0          9m20s
mongodb-588bb45ff4-68mrx            1/1     Running   0          9m20s
order-service-646cd9fbbb-nz2md      1/1     Running   0          9m20s
product-service-646dcdfc4d-k8dxw    1/1     Running   0          9m20s
rabbitmq-74699bc7f9-5q67r           1/1     Running   0          9m20s
store-admin-86d6c8c9c6-l6rt6        1/1     Running   0          9m20s
store-front-fb98898d5-j8285         1/1     Running   0          9m20s
virtual-customer-577f759489-kjpqs   1/1     Running   0          9m20s
virtual-worker-77dfb6d9c9-lw45q     1/1     Running   0          9m20s
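If some pods are still starting, you can let kubectl block until they are ready instead of re-running get pod. A minimal sketch using kubectl wait; the helper name wait_for_pods and the 300s default timeout are assumptions of mine, and the kubeconfig default matches the mykubeconfig file written by Terraform.

```shell
# Block until every pod in the current namespace reports Ready,
# or fail once the timeout is reached.
# Arguments: kubeconfig path (default mykubeconfig), timeout (default 300s).
# wait_for_pods is a hypothetical helper name used for illustration.
wait_for_pods() {
  kubectl --kubeconfig "${1:-mykubeconfig}" wait \
    --for=condition=Ready pod --all \
    --timeout="${2:-300s}"
}

# Example call (commented out; run from the directory that holds mykubeconfig):
# wait_for_pods
```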

Explore the Network Observability dashboard

Now that the application is running, you can explore the Network Observability dashboard. To do this, let's get the endpoint for Azure Managed Grafana. Run the following command:

terraform output amg_endpoint

Open a browser and navigate to the Azure Managed Grafana endpoint. You should see a login page, where you can log in using your Azure AD credentials. Once logged in, you'll see navigation on the left side of the page. Click on the Dashboards button.

Azure Managed Grafana dashboards button

You should see a list of dashboards. Click on the Managed Prometheus folder to expand it. Here you should see the Kubernetes / Networking dashboard. Click on the dashboard to open it.

Azure Managed Grafana Kubernetes Networking dashboard

From here you can explore the dashboard and see the metrics that are being collected. You'll notice there are several collapsible sections to view:

  • Traffic stats to view ingress and egress traffic packet rates
  • Drop stats to view drop counters including IP table rule drops
  • Connection stats to view TCP and UDP connection counters
  • Interface stats to view and identify any issues with network interfaces

Azure Managed Grafana Kubernetes Networking dashboard

Conclusion

In this article, you learned how to streamline network observability in AKS using the Azure Kubernetes Service Network Observability add-on. You learned how to deploy the add-on using Terraform, how to deploy a sample application using Helm and verify that it's running, and how to explore the resulting metrics in the Azure Managed Grafana dashboard.

Isn't this a better way?!?

This is the way

This feature is currently in preview and will continue to improve over time. It is limited to node-level metrics at this time, and pod-level metrics are coming soon.

If you have any feedback or suggestions, please feel free to reach out to me on Twitter or LinkedIn.

Peace ✌️
