
Hot-swap Kubernetes clusters while keeping your service up and running.



Okra

This project is under heavy development and has no tagged releases yet.
But I'd still appreciate it if you could help me by testing it and submitting pull requests, so that you can get the first release earlier!
We already have a thorough getting-started guide, a working Helm chart, and a container image published at mumoshu/okra:canary. So it shouldn't be that hard to give it a shot.

Okra is a Kubernetes controller and a set of CRDs which provide advanced multi-cluster application rollout capabilities, such as canary deployment of clusters.

okra eases managing a lot of ephemeral Kubernetes clusters.

If you've been using ephemeral Kubernetes clusters and employed blue-green or canary deployments for zero-downtime cluster updates, you might have suffered from the many manual steps required. okra is intended to automate all those steps.

In a standard scenario, a system update with okra would look like the below.

You provision one or more new clusters with cluster tags like name=web-1-v2, role=web, version=v2
Okra auto-imports the clusters intoArgoCD
ArgoCD ApplicationSet deploys your apps onto the new clusters
Okra updates the loadbalancer configuration to gradually migrate traffic to the new clusters, while running various checks to ensure application availability


Project Status and Scope

okra (currently) integrates with AWS ALB and target groups for traffic management, CloudWatch Metrics and Datadog for canary analysis.

okra currently works on AWS only, but its design and implementation are generic enough that support for more IaaS providers can be added. Any contribution around that is welcome.

Here's the list of possible additional IaaSes that the original author (@mumoshu) has thought of:

Cluster API
GKE

Here's the list of possible additional loadbalancers:

Concepts

Okra manages cells for you. A cell can be compared to a few things.

A cell is like a Kubernetes pod of containers. A Kubernetes pod is an isolated set of containers, where each container usually runs a single application, and you can have two or more pods for availability and scalability. An Okra cell is a set of Kubernetes clusters, where each cluster runs your application, and you can have two or more clusters behind a loadbalancer for horizontal scalability beyond the limit of a single cluster.

A cell is like a storage array but for Kubernetes clusters. You hot-swap a disk in a storage array while it is running. Similarly, with okra you hot-swap a cluster in a cell while keeping your application up and running.

Okra's cell-controller is responsible for managing the traffic shift across clusters.

You give each Cell a set of settings to discover AWS target groups and to configure loadbalancers and metrics.

The controller periodically discovers AWS target groups. Once there are enough new target groups, it compares them with the target groups associated with the loadbalancer. If there's any difference, it starts updating the ALB while checking various metrics for a safe rollout.
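The decision described above can be sketched roughly as follows. This is only an illustration and not Okra's actual controller code: plan_update and its parameters are hypothetical names, and the sketch assumes the newest target groups have already been discovered and passed in as a set of ARNs.

```python
from __future__ import annotations


def plan_update(candidates: set[str], attached: set[str], required: int) -> set[str] | None:
    """Return the target groups the ALB should move to, or None to do nothing.

    candidates: newest target group ARNs discovered so far (hypothetical input)
    attached:   target group ARNs the loadbalancer currently forwards to
    required:   how many new target groups must exist before a rollout starts
    """
    # Not enough new target groups yet: keep waiting.
    if len(candidates) < required:
        return None
    # The loadbalancer already points at the candidates: nothing to do.
    if candidates == attached:
        return None
    # There is a difference, so a rollout towards the candidates should start.
    return set(candidates)


# Only one of the two required target groups exists, so nothing happens yet.
waiting = plan_update({"tg-v2-a"}, {"tg-v1-a", "tg-v1-b"}, required=2)
# Both new target groups exist and differ from what is attached.
rollout = plan_update({"tg-v2-a", "tg-v2-b"}, {"tg-v1-a", "tg-v1-b"}, required=2)
```

In this sketch the metric checks that accompany the actual ALB update are left out; it covers only the "wait, compare, then start" part of the loop.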

Okra uses Kubernetes CRDs and custom resources as a state store and uses the standard Kubernetes API to interact with resources.

Okra calls various AWS APIs to create and update AWS target groups and update AWS ALB and NLB forward config for traffic management.

Comparison with Flagger and Argo Rollouts

Unlike Argo Rollouts and Flagger, in Okra there are no notions of "active" and "preview" services for a blue-green deployment, or "canary" and "stable" services for a canary deployment.

It assumes there's one or more target groups per cell. A cell basically does a canary deployment, where the old set of target groups is considered "stable" and the new set of target groups is considered "canary".

In Flagger or Argo Rollouts, you need to update its K8s resource to trigger a new rollout. In Okra you don't need to do so. You preconfigure its resource and Okra auto-starts a rollout once it discovers enough new target groups.

How it works

okra updates your Cell.

An okra Cell is composed of target groups, an AWS loadbalancer, and a set of metrics for canary analysis.

Each target group is tied to a cluster, where a cluster is a Kubernetes cluster that runs your container workloads.

An application is deployed onto clusters by ArgoCD. The traffic to the application is routed via an AWS ALB in front of the clusters.

okra acts as an application traffic migrator.

It detects new target groups and live-migrates traffic by hot-swapping the old target groups serving the affected applications with the new target groups, while keeping the applications up and running.

Getting Started

Install Okra

First, you need to provision a Kubernetes cluster that is running ArgoCD, Argo Rollouts, and the ArgoCD ApplicationSet controller. We call it the management cluster in the following guide.

To deploy required components onto the management cluster, use the following snippet:

# 1. Install ArgoCD and ApplicationSet
# https://argocd-applicationset.readthedocs.io/en/stable/Getting-Started/#b-install-applicationset-and-argo-cd-together
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj-labs/applicationset/v0.3.0/manifests/install-with-argo-cd.yaml

# 2. Install Argo Rollouts
# https://argoproj.github.io/argo-rollouts/installation/
kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml

Once your management cluster is up and running, install okra on it using Helm or Kustomize.

Option 1: Helm:

$ helm upgrade --install okra charts/okra -f values.yaml

Option 2: Kustomize:

$ kustomize build config/manager | kubectl apply -f -

You can set okra's container image tag to any tag available at https://hub.docker.com/r/mumoshu/okra/tags.
For Helm, you do it like helm upgrade --install okra charts/okra --set image.tag=$TAG.

Note that you need to provide AWS credentials to okra, as it calls various AWS APIs to list and describe EKS clusters, generate Kubernetes API tokens, and interact with loadbalancers.
For Helm, the simplest (but not recommended in production) way would be to provide AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY:
values.yaml:
region: ap-northeast-1
image:
  tag: "canary"
additionalEnv:
- name: AWS_ACCESS_KEY_ID
  value: "..."
- name: AWS_SECRET_ACCESS_KEY
  value: "..."
For production environments, you'd better use IAM roles for service accounts for security reasons.

Create Load Balancer

Create a loadbalancer in front of all the clusters you're going to manage with Okra.

Currently, only AWS Application LoadBalancer is supported.

You can use Terraform, AWS CDK, Pulumi, AWS Console, AWS CLI, or whatever tool to create the loadbalancer.

The only requirement for using it with Okra is to take note of the "ALB Listener ARN", which is used to tell Okra which loadbalancer to use for traffic management.

Provision Kubernetes Clusters

Once okra is ready and you see no errors, add one or more EKS clusters to your AWS account.

Tag your EKS clusters with Service=demo, as we use it to let okra auto-import those as ArgoCD cluster secrets.
Create one or more target groups per EKS cluster and take note of the target group ARNs.

Deploy Applications onto Clusters

Do either of the below to register clusters to ArgoCD and Okra:

Run argocd cluster add on the new cluster and either (1) create a new ArgoCD Application custom resource per cluster or (2) let an ArgoCD ApplicationSet custom resource auto-deploy onto the clusters
Use Okra's ClusterSet to auto-import EKS clusters to ArgoCD and use ApplicationSet to auto-deploy

See "Argocd cluster add - Argo CD - Declarative GitOps CD for Kubernetes" for more information on the argocd cluster add command.

Also see argoproj-labs/applicationset for more information on the ArgoCD ApplicationSet and its controller.

Auto-Deploy with ApplicationSet and ClusterSet

Assuming your Okra instance has access to AWS EKS and STS APIs, you can use Okra's ClusterSet custom resources to auto-discover EKS clusters and create corresponding ArgoCD cluster secrets.

This, in combination with ArgoCD ApplicationSet, enables you to auto-deploy your applications onto any newly created EKS clusters, without ever touching ArgoCD or Okra at all.

The following ClusterSet auto-discovers AWS EKS clusters tagged with Service=demo and creates corresponding ArgoCD cluster secrets.

apiVersion: okra.mumo.co/v1alpha1
kind: ClusterSet
metadata:
  name: cell1
spec:
  generators:
  - awseks:
      selector:
        matchTags:
          Service: demo
  template:
    metadata:
      labels:
        service: demo

Note that template.metadata.labels.service instructs cluster secrets to get metadata.labels of service: demo, so that AWSTargetGroupSet can discover those clusters by labels.

Let's say you had an EKS cluster that looks like the below:

$ aws eks describe-cluster --name cdk1
{
    "cluster": {
        "name": "cdk1",
        "arn": "arn:aws:eks:REGION:ACCOUNT:cluster/cdk1",
        "createdAt": "2021-09-20T03:21:44.391000+00:00",
        "version": "1.21",
        "endpoint": "https://SOME_CLUSTER_ID.SOME_SHARD_ID.REGION.eks.amazonaws.com",
        "roleArn": "arn:aws:iam::ACCOUNT:role/NAME",
        "resourcesVpcConfig": {
            "subnetIds": [
                "subnet-aaa",
                "subnet-bbb",
                "subnet-ccc"
            ],
            "securityGroupIds": [
                "sg-ddd"
            ],
            "clusterSecurityGroupId": "sg-eee",
            "vpcId": "vpc-fff",
            "endpointPublicAccess": true,
            "endpointPrivateAccess": true,
            "publicAccessCidrs": [
                "0.0.0.0/0"
            ]
        },
        "kubernetesNetworkConfig": {
            "serviceIpv4Cidr": "172.20.0.0/16"
        },
        "logging": {
            "clusterLogging": [
                {
                    "types": [
                        "api",
                        "audit",
                        "authenticator",
                        "controllerManager",
                        "scheduler"
                    ],
                    "enabled": false
                }
            ]
        },
        "identity": {
            "oidc": {
              ...
            }
        },
        "status": "ACTIVE",
        "certificateAuthority": {
          ...
        },
        "platformVersion": "eks.2",
        "tags": {
            "Service": "demo"
        }
    }
}

Note that Okra is able to find this EKS cluster because:

This cluster has tags of "Service": "demo", while
The ClusterSet created above has generators[].awseks.selector.matchTags of Service: demo
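The tag match described above can be sketched as follows. This is a minimal illustration rather than Okra's actual implementation: matches_tags is a hypothetical helper, and it assumes that every key/value pair under matchTags must be present on the cluster, the same way Kubernetes matchLabels behaves.

```python
from __future__ import annotations


def matches_tags(cluster_tags: dict[str, str], match_tags: dict[str, str]) -> bool:
    """Return True when the cluster carries every tag listed in the selector."""
    # Assumption: all selector entries must match exactly (AND semantics).
    return all(cluster_tags.get(key) == value for key, value in match_tags.items())


# Tags taken from the describe-cluster output above.
cdk1_tags = {"Service": "demo"}
selected = matches_tags(cdk1_tags, {"Service": "demo"})
skipped = matches_tags({"Service": "other"}, {"Service": "demo"})
```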

An ArgoCD cluster secret created by the above ClusterSet should look like the below, which is a regular ArgoCD cluster secret with the specified labels.

apiVersion: v1
kind: Secret
metadata:
  name: cdk1
  namespace: default
  labels:
    argocd.argoproj.io/secret-type: cluster
    service: demo
type: Opaque
data:
  config: <BASE64 ENCODED CONFIG JSON>
  name: <BASE64 ENCODED CLUSTER NAME>
  server: <BASE64 ENCODED HTTPS URL OF K8S API ENDPOINT>

Again, note that this cluster secret got metadata.labels of service: demo, because the ClusterSet had template.metadata.labels.service.
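If you want to inspect such a secret while debugging, keep in mind that the values under data are base64-encoded, as in any Kubernetes Secret. The following is a small sketch of decoding them; decode_cluster_secret and encode are hypothetical helpers, and the sample values are placeholders rather than a real cluster configuration.

```python
import base64
import json


def decode_cluster_secret(data: dict) -> dict:
    """Decode the base64-encoded name, server, and config fields of a cluster secret."""
    return {
        "name": base64.b64decode(data["name"]).decode("utf-8"),
        "server": base64.b64decode(data["server"]).decode("utf-8"),
        # The config field holds a JSON document, so parse it after decoding.
        "config": json.loads(base64.b64decode(data["config"])),
    }


def encode(value: str) -> str:
    """Base64-encode a string the way Secret data values are stored."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


# Placeholder values for illustration only.
example_data = {
    "name": encode("cdk1"),
    "server": encode("https://example.invalid"),
    "config": encode(json.dumps({"placeholder": True})),
}
decoded = decode_cluster_secret(example_data)
```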

Register Target Groups

Okra works by gradually updating target group weights behind a loadbalancer. In order to do so, you first need to tell it which target groups to manage, by creating an AWSTargetGroup custom resource on your management cluster per target group.

An AWSTargetGroup custom resource is basically a target group ARN with a version number and labels.

apiVersion: okra.mumo.co/v1alpha1
kind: AWSTargetGroup
metadata:
  name: default-web1
  labels:
    role: web
    okra.mumo.co/version: 1.0.0
spec:
  # Replace REGION, ACCOUNT, NAME, and ID with the actual values
  arn: arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/NAME/ID

Auto-Register Target Groups with AWSTargetGroupSet

Assuming you've already created ArgoCD cluster secrets for your clusters, Okra's AWSTargetGroupSet can be used to auto-discover target groups associated with the clusters and register those as AWSTargetGroup resources.

The following AWSTargetGroupSet auto-discovers TargetGroupBinding resources labeled with role=web from clusters labeled with service=demo, to create corresponding AWSTargetGroup resources in the management cluster.

apiVersion: okra.mumo.co/v1alpha1
kind: AWSTargetGroupSet
metadata:
  name: cell1
  namespace: default
spec:
  generators:
  - awseks:
      bindingSelector:
        matchLabels:
          role: web
      clusterSelector:
        matchLabels:
          service: demo
  template:
    metadata: {}

Let's say you had the below TargetGroupBinding custom resource labeled with role: web in the new cluster labeled with service: demo:

# In the new cluster
apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: web1
  namespace: default
  labels:
    role: web
    okra.mumo.co/version: 1.0.0
spec:
  arn: arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/NAME/ID

Okra is able to find the cluster thanks to clusterSelector.matchLabels.service=demo, and is also able to find this target group binding thanks to bindingSelector.matchLabels.role=web.

The outcome is that Okra creates the below AWSTargetGroup in the management cluster. Note that its metadata.name is derived from the original TargetGroupBinding's metadata.namespace and metadata.name, concatenated with - in between.

# In the management cluster
apiVersion: okra.mumo.co/v1alpha1
kind: AWSTargetGroup
metadata:
  name: default-web1
  labels:
    role: web
    okra.mumo.co/version: 1.0.0
spec:
  # Replace REGION, ACCOUNT, NAME, and ID with the actual values
  arn: arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/NAME/ID
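The mapping from a binding to the resulting AWSTargetGroup can be sketched like this. It is an illustration only: build_aws_target_group is a hypothetical function, and copying the labels and the ARN over unchanged is inferred from the two manifests above rather than taken from Okra's source.

```python
from __future__ import annotations


def build_aws_target_group(namespace: str, name: str, labels: dict[str, str], arn: str) -> dict:
    """Build an AWSTargetGroup manifest from a binding's namespace, name, labels, and ARN."""
    return {
        "apiVersion": "okra.mumo.co/v1alpha1",
        "kind": "AWSTargetGroup",
        "metadata": {
            # The name is the binding's namespace and name joined with "-".
            "name": f"{namespace}-{name}",
            "labels": dict(labels),
        },
        "spec": {"arn": arn},
    }


manifest = build_aws_target_group(
    namespace="default",
    name="web1",
    labels={"role": "web", "okra.mumo.co/version": "1.0.0"},
    arn="arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/NAME/ID",
)
```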

The label role: web is later used by Cell to detect it as a candidate for a canary target group, and the okra.mumo.co/version: 1.0.0 label is used to group and sort all the detected target groups, to finally determine which set of target groups is part of the next canary.
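To make the "group and sort" step concrete, here is a rough sketch. It assumes version labels are dotted numbers such as 1.0.0 and compares them numerically part by part; how Okra itself parses and orders versions is not stated here, so treat version_key and group_by_version as hypothetical helpers.

```python
from __future__ import annotations

from collections import defaultdict

VERSION_LABEL = "okra.mumo.co/version"


def version_key(version: str) -> tuple[int, ...]:
    """Turn "1.10.0" into (1, 10, 0) so that 1.10.0 sorts after 1.9.0."""
    return tuple(int(part) for part in version.split("."))


def group_by_version(target_groups: list[dict]) -> list[tuple[str, list[str]]]:
    """Group target group names by version label, ordered from oldest to newest."""
    groups: dict[str, list[str]] = defaultdict(list)
    for tg in target_groups:
        groups[tg["labels"][VERSION_LABEL]].append(tg["name"])
    return sorted(groups.items(), key=lambda item: version_key(item[0]))


detected = [
    {"name": "default-web1", "labels": {"role": "web", VERSION_LABEL: "1.9.0"}},
    {"name": "default-web2", "labels": {"role": "web", VERSION_LABEL: "1.10.0"}},
    {"name": "default-web3", "labels": {"role": "web", VERSION_LABEL: "1.10.0"}},
]
# The last entry holds the newest version and its target groups.
latest_version, latest_names = group_by_version(detected)[-1]
```

Comparing the parts as integers matters: a plain string comparison would rank 1.9.0 above 1.10.0.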

Create Cell

Finally, create a Cell resource.

It specifies how it utilizes an existing AWS ALB in Spec.Ingress.AWSApplicationLoadBalancer, which listener rule to use for the rollout, and the information needed to detect the target groups that serve your application.

An example Cell custom resource follows.

On each reconciliation loop, Okra looks for AWSTargetGroup resources labeled with role=web, and groups them by the version numbers saved under the okra.mumo.co/version labels.

As Spec.Replicas is set to 2, it waits until the 2 latest target groups appear, and starts a canary rollout only after that.

If your application is not that big and a single cluster suffices, you can safely set replicas: 1 or omit replicas altogether.

apiVersion: okra.mumo.co/v1alpha1
kind: Cell
metadata:
  name: cell1
spec:
  ingress:
    type: AWSApplicationLoadBalancer
    awsApplicationLoadBalancer:
      listener:
        rule:
          forward: {}
          hosts:
          - example.com
          priority: 10
      listenerARN: arn:aws:elasticloadbalancing:ap-northeast-1:ACCOUNT:listener/app/...
      targetGroupSelector:
        matchLabels:
          role: web
  replicas: 2
  updateStrategy:
    canary:
      steps:
      - setWeight: 20
      - analysis:
          args:
          - name: service-name
            value: exampleapp
          templates:
          - templateName: success-rate
      - pause:
          duration: 5s
      - setWeight: 40
    type: Canary

Configuring ALB listener rule for the Cell

The listener rule part is required in order to configure your ALB.

listener:
  rule:
    forward: {}
    hosts:
    - example.com
    priority: 10

And this directly corresponds to the configuration of an ALB Listener Rule.

priority: 10 is the priority of the listener rule to be added to your ALB listener, and hosts: [example.com] is the condition associated with the rule.

ALB supports both default and non-default listener rules. Every non-default listener rule requires a priority and non-empty rule conditions.

Okra is designed not to modify the default rule, as doing so can be disruptive. That's why it requires a priority and a rule condition.

Other rule conditions like headers, methods, pathPatterns, and so on are also supported. See the output of kubectl explain awsapplicationloadbalancerconfig.spec.listener --recursive for all the available conditions.
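A quick pre-flight check of a rule before applying a Cell could look like the sketch below. This is not how Okra validates its input: validate_rule is a hypothetical helper, and CONDITION_KEYS lists only the condition names mentioned in this section, so it is likely incomplete.

```python
from __future__ import annotations

# Only the condition names mentioned in this README; the real list may be longer.
CONDITION_KEYS = ("hosts", "headers", "methods", "pathPatterns")


def validate_rule(rule: dict) -> list[str]:
    """Return a list of problems that would make this a non-usable non-default rule."""
    errors = []
    # A non-default listener rule needs an explicit priority.
    if "priority" not in rule:
        errors.append("priority is required")
    # It also needs at least one non-empty condition.
    if not any(rule.get(key) for key in CONDITION_KEYS):
        errors.append("at least one rule condition is required")
    return errors


ok = validate_rule({"forward": {}, "hosts": ["example.com"], "priority": 10})
problems = validate_rule({"forward": {}})
```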

Configuring Canary Rollout Steps for the Cell

spec.updateStrategy.canary.steps contains a definition of canary rollout steps.

Each step can be any of the below:

setWeight: updates the canary target groups' total weight to the given value. For example, when there's only one canary target group to be rolled out and it's setWeight: 20, the canary target group gets a weight of 20. If there were two canary target groups, each gets a weight of 10.
analysis: runs an Argo Rollouts AnalysisRun with the given arguments. See Argo Rollouts' documentation on Analysis for more information on AnalysisRun and its template, AnalysisTemplate.
pause: pauses the rollout for the duration.
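The setWeight arithmetic above can be sketched as follows. split_canary_weight is a hypothetical helper; the even split matches the example in the list, while giving any remainder to the first target groups is purely an assumption made for this sketch.

```python
from __future__ import annotations


def split_canary_weight(total: int, target_groups: list[str]) -> dict[str, int]:
    """Spread the total canary weight across the canary target groups."""
    if not target_groups:
        raise ValueError("at least one canary target group is required")
    base, extra = divmod(total, len(target_groups))
    # Assumption: when the weight does not divide evenly, the first groups get one more.
    return {
        tg: base + (1 if index < extra else 0)
        for index, tg in enumerate(target_groups)
    }


one_group = split_canary_weight(20, ["tg-v2-a"])
two_groups = split_canary_weight(20, ["tg-v2-a", "tg-v2-b"])
```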

Create and Rollout New Clusters

Now you're all set!

Every time you provision new clusters with a greater version number, Cell automatically discovers the new target groups associated with the new clusters and gradually updates the loadbalancer's target group weights while running various analyses.

Need a Kubernetes version upgrade? Create new Kubernetes clusters with the new Kubernetes version and watch Cell automatically and safely roll out the clusters.

Need a host OS upgrade? Create new clusters with nodes running the new version of the host OS and watch Cell roll out the new clusters.

And you can do the same on every kind of cluster-wide change! Enjoy running your ephemeral Kubernetes clusters.

Analyses and Experiments

Okra provides almost the same features as Argo Rollouts' Analyses and Experiments.

A major difference between Argo Rollouts' and Okra's is that Okra's provides access to the status.desiredVersion field in an analysis and experiment query argument, so that you can analyze and experiment on your canary based on metrics specific to the version of your clusters.

In the below example, we have an analysis step that creates an analysis run from an analysis template named success-rate, whose argument cluster-version is set to the value obtained from cell.status.desiredVersion.

The desiredVersion status field contains the desired version number (obtained from e.g. EKS cluster tags and AWS target group tags) of the cluster replicas being rolled out, so that you can analyze based on metrics specific to the newly rolled out clusters.

A typical cell, one of whose canary steps is an analysis, would look like the below. Notice the fieldPath: status.desiredVersion used to dynamically generate the cluster-version analysis run argument.

apiVersion: okra.mumo.co/v1alpha1
kind: Cell
metadata:
  name: web
spec:
  updateStrategy:
    type: Canary
    canary:
      steps:
      # ...
      - analysis:
          templates:
          - templateName: success-rate
          args:
          - name: service-name
            value: guestbook-svc.default.svc.cluster.local
          - name: cluster-version
            valueFrom:
              fieldRef:
                fieldPath: status.desiredVersion

Similarly, an experiment step can include a fieldPath to have a dynamically generated argument:

apiVersion: okra.mumo.co/v1alpha1
kind: Cell
metadata:
  name: web
spec:
  updateStrategy:
    type: Canary
    canary:
      steps:
      # ...
      - experiment:
          duration: 5m
          templates:
          - name: wy
            # references the wy replicaset defined below
            specRef: wy
            # This should default to 1 as defined by Argo Rollouts but
            # the author observed that it doesn't work in practice.
            #replicas: 1
          analyses:
          - name: success-rate-dd
            templateName: success-rate-dd
            args:
            - name: service-name
              value: wy-serve
            - name: cluster-version
              valueFrom:
                fieldRef:
                  fieldPath: status.desiredVersion
---
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  labels:
    app: wy
  name: wy
spec:
  replicas: 0
  selector:
    matchLabels:
      app: wy
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: wy
    spec:
      containers:
      - image: mumoshu/wy:latest
        name: wy
        ports:
        - containerPort: 8080
        resources: {}
        args:
        - repeat
        - get
        - --forever
        - --interval=5s
        - --url=http://localhost:8080
        - --argocd-cluster-secret=cdk1
        - --service=wy-serve
        - --remote-port=8080
        - --local-port=8080
        envFrom:
        - secretRef:
            name: wy
            optional: true
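Conceptually, an argument with valueFrom.fieldRef.fieldPath is filled in by looking up that dotted path on the Cell object. The sketch below illustrates the idea with plain dictionaries; resolve_field_path and resolve_args are hypothetical helpers, and the version value used in the sample Cell is made up.

```python
from __future__ import annotations


def resolve_field_path(obj: dict, field_path: str):
    """Follow a dotted path such as "status.desiredVersion" through nested dicts."""
    current = obj
    for part in field_path.split("."):
        current = current[part]
    return current


def resolve_args(cell: dict, args: list[dict]) -> dict[str, str]:
    """Turn a list of analysis args into concrete name/value pairs."""
    resolved = {}
    for arg in args:
        if "value" in arg:
            # A literal value is used as-is.
            resolved[arg["name"]] = arg["value"]
        else:
            # Otherwise the value is read from the referenced field of the Cell.
            path = arg["valueFrom"]["fieldRef"]["fieldPath"]
            resolved[arg["name"]] = str(resolve_field_path(cell, path))
    return resolved


cell = {"metadata": {"name": "web"}, "status": {"desiredVersion": "1.0.1"}}
args = [
    {"name": "service-name", "value": "wy-serve"},
    {"name": "cluster-version", "valueFrom": {"fieldRef": {"fieldPath": "status.desiredVersion"}}},
]
resolved = resolve_args(cell, args)
```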

How it integrates with Datadog

As explained earlier, Okra relies on Argo Rollouts' Datadog support.

It works like this: you define an Okra Cell, so that the okra controller creates either an Argo Rollouts AnalysisRun or Experiment, which in turn instructs Argo Rollouts to periodically query Datadog metrics with the "Query timeseries points" API and update the AnalysisRun's or Experiment's status to either Successful or Failed. Finally, the okra controller gets notified about the status update and reacts to it by reconciling the parent Cell resource, incrementing the canary step.

The only part specific to Datadog is that it queries Datadog, which has been implemented in argoproj/argo-rollouts#705 in Argo Rollouts.
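The last part of that flow, reacting to the run's status, can be pictured with the small sketch below. It is an assumption-laden illustration: advance and the returned state names are hypothetical, and while incrementing the step on Successful follows the description above, stopping on Failed is a guess at the behavior rather than something documented here.

```python
from __future__ import annotations


def advance(step: int, total_steps: int, phase: str) -> tuple[int, str]:
    """Return the next canary step index and a rollout state for a given run phase."""
    if phase == "Successful":
        # The analysis or experiment passed, so move on to the next step.
        step += 1
        return (step, "Completed" if step >= total_steps else "Progressing")
    if phase == "Failed":
        # Assumption: a failed run stops the rollout at the current step.
        return (step, "Failed")
    # Any other phase: keep waiting at the current step.
    return (step, "Progressing")


after_success = advance(step=1, total_steps=4, phase="Successful")
after_failure = advance(step=1, total_steps=4, phase="Failed")
```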

If you're curious how you'd instrument your app so that its metrics can be used from Okra, you'd better get started by reading e.g. Mapping Prometheus Metrics to Datadog Metrics. There's nothing specific to Okra here.

Before authoring a complex Cell spec including Analysis and Experiment, the author recommends that you try browsing the Datadog dashboard, or use a simpler tool like curl to query metrics. After you've done so, start tinkering with Okra, so that when it breaks you can be extra sure when and where it broke!

Notes

It is intended to be deployed onto a "control-plane" cluster, where you usually deploy applications like ArgoCD.

It requires you to use:

NLB or ALB to load-balance traffic "across" clusters
- You bring your own LB and Listener, and tell okra the Listener ID, the number of target groups per Cell, and a label to group target groups by version.
ArgoCD ApplicationSets to deploy your applications onto cluster(s)

In the future, it may add support for using Route 53 Weighted Routing instead of ALB.

Although we assume you use ApplicationSet for app deployments, it isn't really a strict requirement. Okra doesn't communicate with ArgoCD or ApplicationSet. All Okra does is discover EKS clusters, create and label target groups for the discovered clusters, and roll out the target groups. You can just bring your own tool to deploy apps onto the clusters today.

It supports complex configurations like below:

One or more clusters per cell, or an ALB listener rule. Imagine a case where you need a pair of clusters to serve your service. okra is able to canary-deploy the pair of clusters by periodically updating two target group weights as a whole.

The following situations are handled by Okra:

When there are enough "new" target groups, Okra gradually updates target group weights for a rollout
Okra automatically falls back to the "old" target groups when there are only old target groups in the AWS account while the ALB points to "new" target groups that have disappeared

CRDs

Okra provides several Kubernetes CustomResourceDefinitions (CRDs) to achieve its goal.

See crd.md for more documentation and details of each CRD.

CLI

okra provides 3 executables.

okrad: the Kubernetes controller manager that consists of various Kubernetes controllers for Okra CRDs. Intended to be run in a Kubernetes cluster.
okractl: a kubectl-like CLI application for interacting with okrad through the Kubernetes API server. Intended to be run on your machine or on a CI system for automation.
okra: the standalone CLI application that does its best to provide every single piece of logic implemented in okrad's controllers. Intended to be run in CI to replicate okrad's functionality on a CI system, or to test each okra functionality in isolation.

The standard and author's recommended usage of Okra involves okrad and okractl.

For okra, we do our best to expose every single okrad + okractl functionality via respective okra CLI commands, so that you can test each functionality in isolation.

It may even be possible to build your own CI job that replaces okra out of those commands!

See CLI for more information and its usage.

Related Projects

Okra is inspired by various open-source projects listed below.

ArgoCD is a continuous deployment system that embraces GitOps to sync desired state stored in Git with the Kubernetes cluster's state. okra integrates with ArgoCD and especially its ApplicationSet controller for application deployments.
- okra relies on the ArgoCD ApplicationSet controller's Cluster Generator feature
Flagger and Argo Rollouts enable canary deployments of apps running across pods. okra enables canary deployments of clusters running on IaaS.
argocd-clusterset auto-discovers EKS clusters and turns those into ArgoCD cluster secrets. okra does the same with its ClusterSet CRD and argocdcluster-controller.
terraform-provider-eksctl's courier_alb resource enables canary deployments on target groups behind AWS ALB with metrics analysis for Datadog and CloudWatch metrics. okra does the same with its AWSApplicationLoadBalancerConfig CRD and awsapplicationloadbalancerconfig-controller.

Why is it named "okra"?

Initially it was named kubearray, but the original author wanted something more catchy and pretty.

In the beginning of this project, the author thought that hot-swapping a cluster while keeping your apps running looks like hot-swapping a drive while keeping a server running.

We tend to call a cluster of storage drives where each drive can be hot-swapped a "storage array", hence calling a tool to build a cluster of clusters where each cluster can be hot-swapped "kubearray" seemed like a good idea.

Later, he searched the Internet for a prettier and catchier alternative. While browsing a list of cool Japanese terms with 3 syllables, he encountered "okra". "Okra" is a pod vegetable full of edible seeds. The term is relatively unique in that it sounds almost the same in both Japanese and English. The author thought that "okra" can be a good metaphor for a cluster of sub-clusters when each seed in an okra is compared to a sub-cluster.

About


Releases: 3 in total; the latest is v0.0.7 (Feb 3, 2022).
