
Back2Basics: Running Workloads on Amazon EKS
Overview
Welcome back to the Back2Basics series! In this part, we'll explore how Karpenter, a just-in-time node provisioner, automatically manages nodes based on your workload needs. We'll also walk you through deploying a voting application to showcase this functionality in action.
If you haven't read the first part, you can check it out here:


Back2Basics: Setting Up an Amazon EKS Cluster
Infrastructure Setup
In the previous post, we covered the fundamentals of cluster provisioning using OpenTofu and simple workload deployment. Now, we will enable additional addons, including Karpenter for automatic node provisioning based on workload needs.
First, we need to uncomment these lines in `03_eks.tf` to create taints on the nodes managed by the initial node group.
```hcl
# Uncomment this if you will use Karpenter
# taints = {
#   init = {
#     key    = "node"
#     value  = "initial"
#     effect = "NO_SCHEDULE"
#   }
# }
```
Taints ensure that only pods configured to tolerate these taints can be scheduled on those nodes. This allows us to reserve the initial nodes for specific purposes while Karpenter provisions additional nodes for other workloads.
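For instance, a pod that must run on those tainted initial nodes would need a matching toleration. Here's a minimal pod-spec fragment, illustrative only, that simply mirrors the taint defined above (note that the EKS API spells the effect `NO_SCHEDULE`, while the Kubernetes pod spec uses `NoSchedule`):

```yaml
# Fragment of a pod spec - tolerates the "node=initial:NoSchedule" taint above
tolerations:
  - key: node
    operator: Equal
    value: initial
    effect: NoSchedule
```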
We also need to uncomment the code in `04_karpenter` and `05_addons` to activate Karpenter and provision the other addons.
Once updated, run `tofu init`, `tofu plan` and `tofu apply`. When prompted to confirm, type `yes` to proceed with provisioning the additional resources.
Karpenter
Karpenter is an open-source project that automates node provisioning in Kubernetes clusters. By integrating with EKS, Karpenter dynamically scales the cluster by adding new nodes when workloads require additional resources and removing idle nodes to optimize costs. The Karpenter configuration defines different node classes and pools for specific workload types, ensuring efficient resource allocation. Read more: https://karpenter.sh/docs/
The template `04_karpenter` defines several node classes and pools categorized by workload type. These include:

- `critical-workloads`: for running essential cluster addons
- `monitoring`: dedicated to Grafana and other monitoring tools
- `vote-app`: for the voting application we'll be deploying
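To give a feel for what such a definition contains, here is a hypothetical sketch of a NodePool for the `vote-app` pool using the Karpenter v1beta1 API. The node class name, labels, and CPU limit are assumptions for illustration; the actual configuration lives in `04_karpenter` and may differ:

```yaml
# Hypothetical NodePool sketch - not the exact contents of 04_karpenter
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: vote-app
spec:
  template:
    metadata:
      labels:
        app: vote-app                    # matched by the workloads' nodeSelector later on
    spec:
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default                    # assumed EC2NodeClass name
      taints:
        - key: app
          value: vote-app
          effect: NoSchedule             # only pods tolerating this taint land on these nodes
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]               # the logs later show a spot instance being launched
  limits:
    cpu: "16"                            # assumed cap on how large this pool can grow
```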
Workload Setup
The voting application consists of several components: `vote`, `result`, `worker`, `redis`, and `postgresql`. While we'll deploy everything on Kubernetes for simplicity, you can leverage managed services like Amazon ElastiCache for Redis and Amazon RDS in a production environment.
| Component | Description |
|---|---|
| Vote | Handles receiving and processing votes. |
| Result | Provides real-time visualizations of the current voting results. |
| Worker | Synchronizes votes between Redis and PostgreSQL. |
| Redis | Stores votes temporarily, easing the load on PostgreSQL. |
| PostgreSQL | Stores all votes permanently for secure and reliable data access. |
Here's the Voting App UI for both voting and results.
Deployment Using Kubernetes Manifest
If you explore the `workloads/manifest` directory, you'll find separate YAML files for each workload. Let's take a closer look at the components used for stateful applications like `postgres` and `redis`:
```yaml
apiVersion: v1
kind: Secret
...
---
apiVersion: v1
kind: PersistentVolumeClaim
...
---
apiVersion: apps/v1
kind: StatefulSet
...
---
apiVersion: v1
kind: Service
...
```
As you can see, a `Secret`, `PersistentVolumeClaim`, `StatefulSet` and `Service` are used for both `postgres` and `redis`. Let's take a quick review of these API objects:

- `Secret` - used to store and manage sensitive information such as passwords, tokens, and keys.
- `PersistentVolumeClaim` - a request for storage, used to provision persistent storage dynamically.
- `StatefulSet` - manages stateful applications with guarantees about the ordering and uniqueness of pods.
- `Service` - used for exposing an application that is running as one or more pods in the cluster.
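To illustrate how these objects fit together, here is a trimmed-down sketch of a Redis StatefulSet that reads its password from the Secret and mounts the PersistentVolumeClaim. The object names, image, and secret key are hypothetical; refer to `workloads/manifest` for the real definitions:

```yaml
# Illustrative sketch - names and image are assumptions, not the actual redis manifest
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis                     # the Service that gives the pod a stable network identity
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          ports:
            - containerPort: 6379
          env:
            - name: REDIS_PASSWORD       # hypothetical variable read from the Secret
              valueFrom:
                secretKeyRef:
                  name: redis-secret     # hypothetical Secret name
                  key: password
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: redis-data        # hypothetical PVC name
```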
Now, let's view `vote-app.yaml`, `results-app.yaml` and `worker.yaml`:
```yaml
apiVersion: v1
kind: ConfigMap
...
---
apiVersion: apps/v1
kind: Deployment
...
---
apiVersion: v1
kind: Service
...
```
Similar to `postgres` and `redis`, we use a `Service` for these stateless workloads. We also introduce two new objects: `ConfigMap` and `Deployment`.

- `ConfigMap` - stores non-confidential configuration data in key-value pairs, decoupling configuration from code.
- `Deployment` - provides declarative updates for pods and ReplicaSets, typically used for stateless workloads.
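As a quick illustration of how the two work together, the sketch below shows a Deployment that loads its settings from a ConfigMap via `envFrom`. The names, keys, and image are placeholders, not the actual vote-app manifest:

```yaml
# Illustrative sketch - placeholder names and image
apiVersion: v1
kind: ConfigMap
metadata:
  name: vote-app-config
data:
  OPTION_A: "Cats"                       # hypothetical voting options
  OPTION_B: "Dogs"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vote-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vote-app
  template:
    metadata:
      labels:
        app: vote-app
    spec:
      containers:
        - name: vote
          image: example/vote-app:latest # placeholder image
          envFrom:
            - configMapRef:
                name: vote-app-config    # injects all keys above as environment variables
          ports:
            - containerPort: 80
```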
And lastly, the `ingress.yaml`. To make our services accessible from outside the cluster, we'll use an `Ingress`. This API object manages external access to the services in a cluster, typically over HTTP/S.
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
...
```
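For reference, an Ingress handled by the AWS Load Balancer Controller might look roughly like the sketch below. The class name and annotations are taken from the Helm values shown later in this post; the backend service name and port are assumptions rather than the exact contents of `ingress.yaml`:

```yaml
# Illustrative sketch - backend name/port are assumptions
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vote-app
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing   # provision an internet-facing ALB
    alb.ingress.kubernetes.io/target-type: instance     # register nodes (NodePort) as targets
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: vote-app           # hypothetical Service name
                port:
                  number: 80
```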
Now that we've examined the manifest files, let's deploy them to the cluster. You can use the following command to apply all YAML files within the `workloads/manifest/` directory:
kubectl apply -f workloads/manifest/
For more granular control, you can apply each YAML file individually. To clean up the deployment later, simply run `kubectl delete -f workloads/manifest/`.
While manifest files are a common approach, there are alternative tools for deployment management:

- Kustomize: allows customizing raw YAML files for various purposes without modifying the original files.
- Helm: a popular package manager for Kubernetes applications. Helm charts provide a structured way to define, install, and upgrade even complex applications within the cluster.
Deployment Using Kustomize
Let's check Kustomize first. If you haven't installed its binary, you can refer to the Kustomize Installation Docs. This example uses an overlay file to make specific changes to the default configuration. To apply the built kustomization, run:
kustomize build .\workloads\kustomize\overlays\dev\ | kubectl apply -f -
Here's what we've modified:

- Added an annotation: `note: "Back2Basics: A Series"`.
- Set the replicas for both the `vote` and `result` deployments to 3 (see the overlay sketch after this list).
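An overlay that does this could look roughly like the sketch below; the base path is an assumption, and the real file sits under `workloads/kustomize/overlays/dev`:

```yaml
# Illustrative kustomization.yaml sketch - the base path is an assumption
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                           # hypothetical location of the base manifests
commonAnnotations:
  note: "Back2Basics: A Series"          # added to every resource
replicas:
  - name: vote-app
    count: 3
  - name: result-app
    count: 3
```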
To verify, you can refer to the commands below:
```
D:\> kubectl get pod -o custom-columns=NAME:.metadata.name,ANNOTATIONS:.metadata.annotations
NAME                           ANNOTATIONS
postgres-0                     map[note:Back2Basics: A Series]
redis-0                        map[note:Back2Basics: A Series]
result-app-6c9dd6d458-8hxkf    map[note:Back2Basics: A Series]
result-app-6c9dd6d458-l4hp9    map[note:Back2Basics: A Series]
result-app-6c9dd6d458-r5srd    map[note:Back2Basics: A Series]
vote-app-cfd5fc88-lsbzx        map[note:Back2Basics: A Series]
vote-app-cfd5fc88-mdblb        map[note:Back2Basics: A Series]
vote-app-cfd5fc88-wz5ch        map[note:Back2Basics: A Series]
worker-bf57ddcb8-kkk79         map[note:Back2Basics: A Series]

D:\> kubectl get deploy
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
result-app   3/3     3            3           5m
vote-app     3/3     3            3           5m
worker       1/1     1            1           5m
```
To remove all the resources we created, run the following command:
kustomize build .\workloads\kustomize\overlays\dev\ | kubectl delete -f -
Deployment Using Helm Chart
Next up is Helm. If you haven't installed the Helm binary, you can refer to the Helm Installation Docs. Once installed, let's add the chart repository and update it:
```bash
helm repo add thecloudspark https://thecloudspark.github.io/helm-charts
helm repo update
```
Next, create a `values.yaml` and add some overrides to the default configuration. You can also use the existing config in `workloads/helm/values.yaml`. This is what it looks like:
```yaml
ingress:
  enabled: true
  className: alb
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: instance

# Vote Handler Config
vote:
  tolerations:
    - key: app
      operator: Equal
      value: vote-app
      effect: NoSchedule
  nodeSelector:
    app: vote-app
  service:
    type: NodePort

# Results Handler Config
result:
  tolerations:
    - key: app
      operator: Equal
      value: vote-app
      effect: NoSchedule
  nodeSelector:
    app: vote-app
  service:
    type: NodePort

# Worker Handler Config
worker:
  tolerations:
    - key: app
      operator: Equal
      value: vote-app
      effect: NoSchedule
  nodeSelector:
    app: vote-app
```
As you can see, we added a `nodeSelector` and `tolerations` to make sure the pods are scheduled on the dedicated nodes where we want them to run. This Helm chart offers various configuration options, and you can explore them in more detail on ArtifactHub: Vote App.
Now install the chart and apply the overrides from `values.yaml`:
```bash
# Install
helm install app -f workloads/helm/values.yaml thecloudspark/vote-app

# Upgrade
helm upgrade app -f workloads/helm/values.yaml thecloudspark/vote-app
```
Wait for the pods to be up and running, then access the UI using the provisioned application load balancer.
To uninstall, just run the command below:
helm uninstall app
Going back to Karpenter
Under the hood, Karpenter provisioned the nodes used by the voting app we've deployed. The sample logs below provide insight into its activities:
{"level":"INFO","time":"2024-06-16T10:15:38.739Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"fb4d75f","pods":"default/result-app-6c9dd6d458-l4hp9, default/worker-bf57ddcb8-kkk79, default/vote-app-cfd5fc88-lsbzx","duration":"153.662007ms"}{"level":"INFO","time":"2024-06-16T10:15:38.739Z","logger":"controller.provisioner","message":"computed new nodeclaim(s) to fit pod(s)","commit":"fb4d75f","nodeclaims":1,"pods":3}{"level":"INFO","time":"2024-06-16T10:15:38.753Z","logger":"controller.provisioner","message":"created nodeclaim","commit":"fb4d75f","nodepool":"vote-app","nodeclaim":"vote-app-r9z7s","requests":{"cpu":"510m","memory":"420Mi","pods":"8"},"instance-types":"m5.2xlarge, m5.4xlarge, m5.large, m5.xlarge, m5a.2xlarge and 55 other(s)"}{"level":"INFO","time":"2024-06-16T10:15:41.894Z","logger":"controller.nodeclaim.lifecycle","message":"launched nodeclaim","commit":"fb4d75f","nodeclaim":"vote-app-r9z7s","provider-id":"aws:///ap-southeast-1b/i-028457815289a8470","instance-type":"t3.small","zone":"ap-southeast-1b","capacity-type":"spot","allocatable":{"cpu":"1700m","ephemeral-storage":"14Gi","memory":"1594Mi","pods":"11"}}{"level":"INFO","time":"2024-06-16T10:16:08.946Z","logger":"controller.nodeclaim.lifecycle","message":"registered nodeclaim","commit":"fb4d75f","nodeclaim":"vote-app-r9z7s","provider-id":"aws:///ap-southeast-1b/i-028457815289a8470","node":"ip-10-0-206-99.ap-southeast-1.compute.internal"}{"level":"INFO","time":"2024-06-16T10:16:23.631Z","logger":"controller.nodeclaim.lifecycle","message":"initialized nodeclaim","commit":"fb4d75f","nodeclaim":"vote-app-r9z7s","provider-id":"aws:///ap-southeast-1b/i-028457815289a8470","node":"ip-10-0-206-99.ap-southeast-1.compute.internal","allocatable":{"cpu":"1700m","ephemeral-storage":"15021042452","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"1663292Ki","pods":"11"}}
As shown in the logs, when Karpenter found pods that needed to be scheduled, a new node claim was created, launched and initialized. Whenever there is a need for additional resources, this component is responsible for fulfilling it.
Additionally, Karpenter automatically labels the nodes it provisions with `karpenter.sh/initialized=true`. Let's use `kubectl` to list these nodes:
kubectl get nodes -l karpenter.sh/initialized=true
This command lists all nodes that carry this specific label. As you can see in the output below, three nodes have been provisioned by Karpenter:
```
NAME                                              STATUS   ROLES    AGE   VERSION
ip-10-0-208-50.ap-southeast-1.compute.internal    Ready    <none>   10m   v1.30.0-eks-036c24b
ip-10-0-220-238.ap-southeast-1.compute.internal   Ready    <none>   10m   v1.30.0-eks-036c24b
ip-10-0-206-99.ap-southeast-1.compute.internal    Ready    <none>   1m    v1.30.0-eks-036c24b
```
Lastly, let's check the related logs for node termination, the process of removing nodes from the cluster. Decommissioning typically involves tainting the node first to prevent further pod scheduling, followed by node deletion.
{"level":"INFO","time":"2024-06-16T10:35:39.165Z","logger":"controller.disruption","message":"disrupting via consolidation delete, terminating 1 nodes (0 pods) ip-10-0-206-99.ap-southeast-1.compute.internal/t3.small/spot","commit":"fb4d75f","command-id":"5e5489a6-a99d-4b8d-912c-df314a4b5cfa"}{"level":"INFO","time":"2024-06-16T10:35:39.483Z","logger":"controller.disruption.queue","message":"command succeeded","commit":"fb4d75f","command-id":"5e5489a6-a99d-4b8d-912c-df314a4b5cfa"}{"level":"INFO","time":"2024-06-16T10:35:39.511Z","logger":"controller.node.termination","message":"tainted node","commit":"fb4d75f","node":"ip-10-0-206-99.ap-southeast-1.compute.internal"}{"level":"INFO","time":"2024-06-16T10:35:39.530Z","logger":"controller.node.termination","message":"deleted node","commit":"fb4d75f","node":"ip-10-0-206-99.ap-southeast-1.compute.internal"}{"level":"INFO","time":"2024-06-16T10:35:39.989Z","logger":"controller.nodeclaim.termination","message":"deleted nodeclaim","commit":"fb4d75f","nodeclaim":"vote-app-r9z7s","node":"ip-10-0-206-99.ap-southeast-1.compute.internal","provider-id":"aws:///ap-southeast-1b/i-028457815289a8470"}
What's Next?
We've successfully deployed our voting application! And thanks to Karpenter, new nodes are added automatically when needed and terminated when not, making our setup more robust and cost-effective. In the final part of this series, we'll delve into monitoring the voting application we've deployed with Grafana and Prometheus, giving us visibility into resource utilization and application health.