mumoshu/okra
This project is under heavy development and has no tagged releases yet.
But I'd still appreciate it if you could help me by testing it and submitting pull requests, so that you can get the first release earlier!
We already have a thorough getting-started guide, a working Helm chart, and a container image published at mumoshu/okra:canary, so it shouldn't be that hard to give it a shot.
Okra is a Kubernetes controller and a set of CRDs that provide advanced multi-cluster application rollout capabilities, such as canary deployment of clusters.
okra makes it easier to manage many ephemeral Kubernetes clusters.
If you've been using ephemeral Kubernetes clusters and employed blue-green or canary deployments for zero-downtime cluster updates, you have probably suffered from the many manual steps required. okra is intended to automate all of those steps.
In a standard scenario, a system update with okra looks like the following:
- You provision one or more new clusters with cluster tags like name=web-1-v2, role=web, version=v2
- Okra auto-imports the clusters into ArgoCD
- ArgoCD ApplicationSet deploys your apps onto the new clusters
- Okra updates the loadbalancer configuration to gradually migrate traffic to the new clusters, while running various checks to ensure application availability
okra (currently) integrates with AWS ALB and target groups for traffic management, and with CloudWatch Metrics and Datadog for canary analysis.
okra currently works on AWS only, but its design and implementation are generic enough to support more IaaS providers. Any contribution in that area is welcome.
Here's the list of possible additional IaaSes that the original author (@mumoshu) has thought of:
- Cluster API
- GKE
Here's the list of possible additional loadbalancers:
- AWS NLB
- Envoy
- Istio Ingress Gateway
- ingress-nginx
Okra manages cells for you. A cell can be compared to a few things.
A cell is like a Kubernetes pod of containers. A Kubernetes pod is an isolated set of containers, where each container usually runs a single application, and you can have two or more pods for availability and scalability. An Okra cell is a set of Kubernetes clusters, where each cluster runs your application, and you can have two or more clusters behind a load balancer for horizontal scalability beyond the limit of a single cluster.
A cell is like a storage array, but for Kubernetes clusters. You hot-swap a disk in a storage array while it keeps running. Similarly, with okra you hot-swap a cluster in a cell while keeping your application up and running.
Okra's cell-controller is responsible for managing the traffic shift across clusters.
You give each Cell a set of settings for discovering AWS target groups, configuring the load balancer, and reading metrics.
The controller periodically discovers AWS target groups. Once there are enough new target groups, it compares them with the target groups associated with the load balancer. If there's any difference, it starts updating the ALB while checking various metrics for a safe rollout.
Okra uses Kubernetes CRDs and custom resources as a state store and uses the standard Kubernetes API to interact with resources.
Okra calls various AWS APIs to create and update AWS target groups and update AWS ALB and NLB forward config for traffic management.
Unlike Argo Rollouts and Flagger, Okra has no notion of "active" and "preview" services for a blue-green deployment, or "canary" and "stable" services for a canary deployment.
It assumes there are one or more target groups per cell. A cell basically does a canary deployment, where the old set of target groups is considered "stable" and the new set of target groups is considered "canary".
In Flagger or Argo Rollouts, you need to update a K8s resource to trigger a new rollout. In Okra you don't need to do so. You preconfigure its resource, and Okra auto-starts a rollout once it discovers enough new target groups.
okra updates your Cell.
An okra Cell is composed of target groups, an AWS load balancer, and a set of metrics for canary analysis.
Each target group is tied to a cluster, where a cluster is a Kubernetes cluster that runs your container workloads.
An application is deployed onto clusters by ArgoCD. The traffic to the application is routed via an AWS ALB in front of the clusters.
okra acts as an application traffic migrator.
It detects new target groups and live-migrates traffic by hot-swapping the old target groups serving the affected applications with the new target groups, while keeping the applications up and running.
- Install Okra
- Create Load Balancer
- Provision Kubernetes Clusters
- Deploy Applications onto Clusters
- Register Target Groups
- Create Cell
- Create and Rollout New Clusters
- Analyses and Experiments
First, you need to provision a Kubernetes cluster that runs ArgoCD, Argo Rollouts, and the ArgoCD ApplicationSet controller. We call it the management cluster in the following guide.
To deploy required components onto the management cluster, use the following snippet:
# 1. Install ArgoCD and ApplicationSet
# https://argocd-applicationset.readthedocs.io/en/stable/Getting-Started/#b-install-applicationset-and-argo-cd-together
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj-labs/applicationset/v0.3.0/manifests/install-with-argo-cd.yaml

# 2. Install Argo Rollouts
# https://argoproj.github.io/argo-rollouts/installation/
kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml
Once your management cluster is up and running, install okra on it using Helm or Kustomize.
Option 1: Helm:
$ helm upgrade --install okra charts/okra -f values.yaml
Option 2: Kustomize:
$ kustomize build config/manager | kubectl apply -f -
You can set okra's container image tag to any tag available on https://hub.docker.com/r/mumoshu/okra/tags.
For Helm, you do it like this:
helm upgrade --install okra charts/okra --set image.tag=$TAG
Note that you need to provide AWS credentials to okra, as it calls various AWS APIs to list and describe EKS clusters, generate Kubernetes API tokens, and interact with load balancers. For Helm, the simplest (but not recommended in production) way is to provide AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in values.yaml:

region: ap-northeast-1
image:
  tag: "canary"
additionalEnv:
- name: AWS_ACCESS_KEY_ID
  value: "..."
- name: AWS_SECRET_ACCESS_KEY
  value: "..."

For production environments, you'd better use IAM roles for service accounts, for security reasons.
Create a load balancer in front of all the clusters you're going to manage with Okra.
Currently, only AWS Application Load Balancer is supported.
You can use Terraform, AWS CDK, Pulumi, the AWS Console, the AWS CLI, or any other tool to create the load balancer.
The only requirement for using it with Okra is to take note of the "ALB Listener ARN", which tells Okra which load balancer to use for traffic management.
Once okra is ready and you see no errors, add one or more EKS clusters to your AWS account.
- Tag your EKS clusters with Service=demo, as we use it to let okra auto-import them as ArgoCD cluster secrets.
- Create one or more target groups per EKS cluster and take note of the target group ARNs.
Do either of the following to register clusters with ArgoCD and Okra:
- Run argocd cluster add on the new cluster and either (1) create a new ArgoCD Application custom resource per cluster or (2) let an ArgoCD ApplicationSet custom resource auto-deploy onto the clusters
- Use Okra's ClusterSet to auto-import EKS clusters into ArgoCD and use ApplicationSet to auto-deploy
See "Argocd cluster add - Argo CD - Declarative GitOps CD for Kubernetes" for more information on the argocd cluster add command.
Also see argoproj-labs/applicationset for more information on ArgoCD ApplicationSet and its controller.
Assuming your Okra instance has access to the AWS EKS and STS APIs, you can use Okra's ClusterSet custom resources to auto-discover EKS clusters and create corresponding ArgoCD cluster secrets.
This, in combination with ArgoCD ApplicationSet, enables you to auto-deploy your applications onto any newly created EKS clusters without ever touching ArgoCD or Okra.
The following ClusterSet auto-discovers AWS EKS clusters tagged with Service=demo and creates corresponding ArgoCD cluster secrets.
apiVersion: okra.mumo.co/v1alpha1
kind: ClusterSet
metadata:
  name: cell1
spec:
  generators:
  - awseks:
      selector:
        matchTags:
          Service: demo
  template:
    metadata:
      labels:
        service: demo
Note that template.metadata.labels.service instructs Okra to add service: demo to the cluster secrets' metadata.labels, so that AWSTargetGroupSet can discover those clusters by labels.
Suppose you have an EKS cluster that looks like the following:
$ aws eks describe-cluster --name cdk1
{
    "cluster": {
        "name": "cdk1",
        "arn": "arn:aws:eks:REGION:ACCOUNT:cluster/cdk1",
        "createdAt": "2021-09-20T03:21:44.391000+00:00",
        "version": "1.21",
        "endpoint": "https://SOME_CLUSTER_ID.SOME_SHARD_ID.REGION.eks.amazonaws.com",
        "roleArn": "arn:aws:iam::ACCOUNT:role/NAME",
        "resourcesVpcConfig": {
            "subnetIds": ["subnet-aaa", "subnet-bbb", "subnet-ccc"],
            "securityGroupIds": ["sg-ddd"],
            "clusterSecurityGroupId": "sg-eee",
            "vpcId": "vpc-fff",
            "endpointPublicAccess": true,
            "endpointPrivateAccess": true,
            "publicAccessCidrs": ["0.0.0.0/0"]
        },
        "kubernetesNetworkConfig": {
            "serviceIpv4Cidr": "172.20.0.0/16"
        },
        "logging": {
            "clusterLogging": [
                {
                    "types": ["api", "audit", "authenticator", "controllerManager", "scheduler"],
                    "enabled": false
                }
            ]
        },
        "identity": {
            "oidc": { ... }
        },
        "status": "ACTIVE",
        "certificateAuthority": { ... },
        "platformVersion": "eks.2",
        "tags": {
            "Service": "demo"
        }
    }
}
Okra is able to find this EKS cluster because:
- This cluster has tags of "Service": "demo", while
- The ClusterSet created above has generators[].awseks.selector.matchTags of Service: demo
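A minimal sketch of this matching, in Python for illustration only. It assumes matchTags behaves like a Kubernetes matchLabels selector: every listed tag must be present on the cluster with exactly the same value, and extra tags are ignored. matches_tags is a hypothetical helper, not part of Okra's API.

```python
def matches_tags(cluster_tags: dict[str, str], match_tags: dict[str, str]) -> bool:
    """Return True if every key/value pair in match_tags appears in cluster_tags.

    Assumption: matchTags has AND semantics with exact string equality,
    like a Kubernetes matchLabels selector.
    """
    return all(cluster_tags.get(key) == value for key, value in match_tags.items())


# "tags" of the cdk1 cluster from the describe-cluster output above
cdk1_tags = {"Service": "demo"}
# generators[].awseks.selector.matchTags of the ClusterSet
match_tags = {"Service": "demo"}

print(matches_tags(cdk1_tags, match_tags))                         # True
print(matches_tags({"Service": "demo", "Team": "a"}, match_tags))  # True: extra tags are ignored
print(matches_tags({"Service": "other"}, match_tags))              # False
```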
An ArgoCD cluster secret created by the above ClusterSet should look like the following, which is a regular ArgoCD cluster secret with the specified labels.
apiVersion: v1
kind: Secret
metadata:
  name: cdk1
  namespace: default
  labels:
    argocd.argoproj.io/secret-type: cluster
    service: demo
type: Opaque
data:
  config: <BASE64 ENCODED CONFIG JSON>
  name: <BASE64 ENCODED CLUSTER NAME>
  server: <BASE64 ENCODED HTTPS URL OF K8S API ENDPOINT>
Again, note that this cluster secret got the metadata.labels entry service: demo because the ClusterSet had template.metadata.labels.service.
Okra works by gradually updating the weights of target groups behind a load balancer. To do so, you first need to tell it which target groups to manage, by creating one AWSTargetGroup custom resource per target group on your management cluster.
An AWSTargetGroup custom resource is basically a target group ARN with a version number and labels.
apiVersion: okra.mumo.co/v1alpha1
kind: AWSTargetGroup
metadata:
  name: default-web1
  labels:
    role: web
    okra.mumo.co/version: 1.0.0
spec:
  # Replace REGION, ACCOUNT, NAME, and ID with the actual values
  arn: arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/NAME/ID
Assuming you've already created ArgoCD cluster secrets for your clusters, Okra's AWSTargetGroupSet can be used to auto-discover target groups associated with the clusters and register them as AWSTargetGroup resources.
The following AWSTargetGroupSet auto-discovers TargetGroupBinding resources labeled with role=web from clusters labeled with service=demo, and creates corresponding AWSTargetGroup resources in the management cluster.
apiVersion: okra.mumo.co/v1alpha1
kind: AWSTargetGroupSet
metadata:
  name: cell1
  namespace: default
spec:
  generators:
  - awseks:
      bindingSelector:
        matchLabels:
          role: web
      clusterSelector:
        matchLabels:
          service: demo
  template:
    metadata: {}
Suppose you have the following TargetGroupBinding custom resource labeled with role: web in the new cluster labeled with service: demo:
# In the new cluster
apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: web1
  namespace: default
  labels:
    role: web
    okra.mumo.co/version: 1.0.0
spec:
  arn: arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/NAME/ID
Okra is able to find the cluster thanks to clusterSelector.matchLabels.service=demo, and to find this target group binding thanks to bindingSelector.matchLabels.role=web.
The outcome is that Okra creates the following AWSTargetGroup in the management cluster. Note that its metadata.name is derived from the original TargetGroupBinding's metadata.namespace and metadata.name, joined with a - in between.
# In the management cluster
apiVersion: okra.mumo.co/v1alpha1
kind: AWSTargetGroup
metadata:
  name: default-web1
  labels:
    role: web
    okra.mumo.co/version: 1.0.0
spec:
  # Replace REGION, ACCOUNT, NAME, and ID with the actual values
  arn: arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/NAME/ID
The label role: web is later used by Cell to detect it as a candidate for a canary target group, and the okra.mumo.co/version: 1.0.0 label is used to group and sort all the detected target groups to determine which set of target groups forms the next canary.
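The AWSTargetGroupSet behaviour described above can be sketched in Python as follows. This is a hypothetical model, not Okra's implementation: clusters and bindings are plain dataclasses instead of Kubernetes objects, both selectors use matchLabels-style exact matching, and it assumes the binding's labels are copied unchanged onto the generated AWSTargetGroup, as the example above suggests.

```python
from dataclasses import dataclass, field


@dataclass
class TargetGroupBinding:
    """Hypothetical stand-in for a TargetGroupBinding in a workload cluster."""
    namespace: str
    name: str
    labels: dict[str, str]
    arn: str


@dataclass
class Cluster:
    """Hypothetical stand-in for an ArgoCD cluster secret plus that cluster's bindings."""
    name: str
    labels: dict[str, str]
    bindings: list[TargetGroupBinding] = field(default_factory=list)


def matches(labels: dict[str, str], selector: dict[str, str]) -> bool:
    # matchLabels semantics: every selector entry must be present with the same value.
    return all(labels.get(k) == v for k, v in selector.items())


def generate_target_groups(clusters, cluster_selector, binding_selector):
    """Return AWSTargetGroup-like dicts for bindings that pass both selectors."""
    result = []
    for cluster in clusters:
        if not matches(cluster.labels, cluster_selector):
            continue  # clusterSelector.matchLabels did not match
        for tgb in cluster.bindings:
            if not matches(tgb.labels, binding_selector):
                continue  # bindingSelector.matchLabels did not match
            result.append({
                "apiVersion": "okra.mumo.co/v1alpha1",
                "kind": "AWSTargetGroup",
                "metadata": {
                    # metadata.name is "<namespace>-<name>" of the binding
                    "name": f"{tgb.namespace}-{tgb.name}",
                    # Assumption: labels are copied from the binding unchanged
                    "labels": dict(tgb.labels),
                },
                "spec": {"arn": tgb.arn},
            })
    return result


arn = "arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/NAME/ID"
cdk1 = Cluster("cdk1", {"service": "demo"}, [
    TargetGroupBinding("default", "web1", {"role": "web", "okra.mumo.co/version": "1.0.0"}, arn),
])
tgs = generate_target_groups([cdk1], {"service": "demo"}, {"role": "web"})
print(tgs[0]["metadata"]["name"])  # default-web1
```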
Finally, create a Cell resource.
It specifies how to use an existing AWS ALB in Spec.Ingress.AWSApplicationLoadBalancer, which listener rule to use for the rollout, and the information needed to detect the target groups that serve your application.
An example Cell custom resource follows.
On each reconciliation loop, Okra looks for AWSTargetGroup resources labeled with role=web and groups them by the version numbers stored in their okra.mumo.co/version labels.
Because Spec.Replicas is set to 2, it waits until two target groups of the latest version appear, and starts a canary rollout only after that.
If your application is not that big and a single cluster suffices, you can safely set replicas: 1 or omit replicas entirely.
apiVersion: okra.mumo.co/v1alpha1
kind: Cell
metadata:
  name: cell1
spec:
  ingress:
    type: AWSApplicationLoadBalancer
    awsApplicationLoadBalancer:
      listener:
        rule:
          forward: {}
          hosts:
          - example.com
          priority: 10
      listenerARN: arn:aws:elasticloadbalancing:ap-northeast-1:ACCOUNT:listener/app/...
      targetGroupSelector:
        matchLabels:
          role: web
  replicas: 2
  updateStrategy:
    canary:
      steps:
      - setWeight: 20
      - analysis:
          args:
          - name: service-name
            value: exampleapp
          templates:
          - templateName: success-rate
      - pause:
          duration: 5s
      - setWeight: 40
    type: Canary
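The version grouping and the replicas gate described above can be sketched like this. It is a simplified model under stated assumptions: target groups are plain dicts, version labels are dotted integers such as 1.0.0 (compared numerically, so 10.0.0 sorts after 9.1.0), and an omitted replicas behaves like replicas: 1. latest_ready_group and the tg-v1-a style ARNs are hypothetical placeholders.

```python
def parse_version(version: str) -> tuple[int, ...]:
    # Assumption: versions are dotted integers such as "1.0.0".
    return tuple(int(part) for part in version.split("."))


def latest_ready_group(target_groups, selector, replicas=1):
    """Return the sorted ARNs of the newest version once it has `replicas` target groups.

    Returns None while no matching target group exists or while the newest
    version is still incomplete, i.e. the rollout should not start yet.
    """
    by_version: dict[str, list[str]] = {}
    for tg in target_groups:
        labels = tg["labels"]
        if not all(labels.get(k) == v for k, v in selector.items()):
            continue  # not selected by targetGroupSelector.matchLabels
        version = labels.get("okra.mumo.co/version")
        if version is None:
            continue  # cannot be grouped without a version label
        by_version.setdefault(version, []).append(tg["arn"])

    if not by_version:
        return None
    newest = max(by_version, key=parse_version)
    arns = sorted(by_version[newest])
    return arns if len(arns) >= replicas else None


tgs = [
    {"arn": "tg-v1-a", "labels": {"role": "web", "okra.mumo.co/version": "1.0.0"}},
    {"arn": "tg-v1-b", "labels": {"role": "web", "okra.mumo.co/version": "1.0.0"}},
    {"arn": "tg-v2-a", "labels": {"role": "web", "okra.mumo.co/version": "2.0.0"}},
]
print(latest_ready_group(tgs, {"role": "web"}, replicas=2))  # None: only one 2.0.0 target group yet
tgs.append({"arn": "tg-v2-b", "labels": {"role": "web", "okra.mumo.co/version": "2.0.0"}})
print(latest_ready_group(tgs, {"role": "web"}, replicas=2))  # ['tg-v2-a', 'tg-v2-b']
```

Note that an incomplete newest version blocks the rollout instead of falling back to an older complete version, which matches "waits until two target groups of the latest version appear".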
The listener rule part is required in order to configure your ALB.
listener:
  rule:
    forward: {}
    hosts:
    - example.com
    priority: 10
This corresponds directly to the configuration of an ALB listener rule.
priority: 10 is the priority of the listener rule to be added to your ALB listener, and hosts: [example.com] is the condition associated with the rule.
ALB supports both default and non-default listener rules. Every non-default listener rule requires a priority and non-empty rule conditions.
Okra is designed not to modify the default rule, as doing so can be disruptive. That's why it requires a priority and a rule condition.
Other rule conditions like headers, methods, pathPatterns, and so on are also supported. See the output of kubectl explain awsapplicationloadbalancerconfig.spec.listener --recursive for all the available conditions.
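To illustrate why Okra insists on a priority and at least one condition, here is a hypothetical validation helper for the listener rule spec. The 1 to 50000 priority range is ALB's limit for non-default rules; the condition field names are only the ones mentioned in this section, so treat the list as incomplete. validate_rule is not an Okra function.

```python
# Condition fields named in this section; Okra may support more
# (see kubectl explain awsapplicationloadbalancerconfig.spec.listener --recursive).
CONDITION_FIELDS = ("hosts", "headers", "methods", "pathPatterns")


def validate_rule(rule: dict) -> list[str]:
    """Return a list of problems with a listener rule spec (an empty list means valid)."""
    errors = []
    priority = rule.get("priority")
    # Non-default ALB rules need a priority between 1 and 50000.
    if not isinstance(priority, int) or not 1 <= priority <= 50000:
        errors.append("priority must be an integer between 1 and 50000")
    # Non-default rules also need at least one non-empty condition.
    if not any(rule.get(name) for name in CONDITION_FIELDS):
        errors.append("at least one rule condition (e.g. hosts) is required")
    return errors


print(validate_rule({"forward": {}, "hosts": ["example.com"], "priority": 10}))  # []
print(validate_rule({"forward": {}}))  # two problems: no priority and no condition
```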
spec.updateStrategy.canary.steps contains the definition of the canary rollout steps.
Each step can be one of the following:
- setWeight: updates the total weight of the canary target groups to the given value. For example, when there's only one canary target group to be rolled out and the step is setWeight: 20, the canary target group gets a weight of 20. If there were two canary target groups, each would get a weight of 10.
- analysis: runs an Argo Rollouts AnalysisRun with the given arguments. See Argo Rollouts' documentation on Analysis for more information on AnalysisRun and its template, AnalysisTemplate.
- pause: pauses the rollout for the given duration.
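The setWeight arithmetic can be sketched as below. The even split among canary target groups follows the description above; giving the stable target groups the remaining weight out of a total of 100, and how integer remainders are distributed, are assumptions made for this example rather than documented Okra behaviour. split_weights is a hypothetical name.

```python
def split_weights(canary_arns, stable_arns, canary_total, total=100):
    """Compute per-target-group forward weights for one setWeight step.

    Assumptions: stable target groups share `total - canary_total`, each set's
    share is divided evenly, and any integer remainder goes to the first
    target groups in sorted order.
    """
    if not 0 <= canary_total <= total:
        raise ValueError("canary_total must be between 0 and total")

    def even(arns, amount):
        arns = sorted(arns)
        if not arns:
            return {}
        base, rem = divmod(amount, len(arns))
        return {arn: base + (1 if i < rem else 0) for i, arn in enumerate(arns)}

    weights = even(stable_arns, total - canary_total)
    weights.update(even(canary_arns, canary_total))
    return weights


# A pair of new clusters replacing a pair of old ones, at the setWeight: 20 step
print(split_weights(["tg-v2-a", "tg-v2-b"], ["tg-v1-a", "tg-v1-b"], 20))
# {'tg-v1-a': 40, 'tg-v1-b': 40, 'tg-v2-a': 10, 'tg-v2-b': 10}
```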
Now you're all set!
Every time you provision new clusters with a greater version number, Cell automatically discovers the new target groups associated with the new clusters and gradually updates the load balancer's target group weights while running various analyses.
Need a Kubernetes version upgrade? Create new Kubernetes clusters with the new Kubernetes version and watch Cell automatically and safely roll out the clusters.
Need a host OS upgrade? Create new clusters whose nodes run the new version of the host OS and watch Cell roll out the new clusters.
And you can do the same on every kind of cluster-wide change! Enjoy running your ephemeral Kubernetes clusters.
Okra provides almost the same features as Argo Rollouts' Analysis and Experiments.
A major difference between Argo Rollouts and Okra is that Okra provides access to the status.desiredVersion field in analysis and experiment query arguments, so that you can analyze and experiment with your canary based on metrics specific to the version of your clusters.
In the example below, we have an analysis step that creates an analysis run from an analysis template named success-rate, whose argument cluster-version is set to the value obtained from cell.status.desiredVersion.
The desiredVersion status field contains the desired version number (obtained from, e.g., EKS cluster tags and AWS target group tags) of the cluster replicas being rolled out, so that you can analyze based on metrics specific to the newly rolled out clusters.
A typical cell with an analysis canary step would look like the following. Notice the fieldPath: status.desiredVersion used to dynamically generate the cluster-version analysis run argument.
apiVersion: okra.mumo.co/v1alpha1
kind: Cell
metadata:
  name: web
spec:
  updateStrategy:
    type: Canary
    canary:
      steps:
      # ...
      - analysis:
          templates:
          - templateName: success-rate
          args:
          - name: service-name
            value: guestbook-svc.default.svc.cluster.local
          - name: cluster-version
            valueFrom:
              fieldRef:
                fieldPath: status.desiredVersion
Similarly, an experiment step can include a fieldPath to have a dynamically generated argument:
apiVersion: okra.mumo.co/v1alpha1
kind: Cell
metadata:
  name: web
spec:
  updateStrategy:
    type: Canary
    canary:
      steps:
      # ...
      - experiment:
          duration: 5m
          templates:
          - name: wy
            # references the wy replicaset defined below
            specRef: wy
            # This should default to 1 as defined by Argo Rollouts but
            # the author observed that it doesn't work in practice.
            #replicas: 1
          analyses:
          - name: success-rate-dd
            templateName: success-rate-dd
            args:
            - name: service-name
              value: wy-serve
            - name: cluster-version
              valueFrom:
                fieldRef:
                  fieldPath: status.desiredVersion
---
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  labels:
    app: wy
  name: wy
spec:
  replicas: 0
  selector:
    matchLabels:
      app: wy
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: wy
    spec:
      containers:
      - image: mumoshu/wy:latest
        name: wy
        ports:
        - containerPort: 8080
        resources: {}
        args:
        - repeat
        - get
        - --forever
        - --interval=5s
        - --url=http://localhost:8080
        - --argocd-cluster-secret=cdk1
        - --service=wy-serve
        - --remote-port=8080
        - --local-port=8080
        envFrom:
        - secretRef:
            name: wy
            optional: true
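Resolving an argument with valueFrom.fieldRef amounts to walking the Cell object along the dotted fieldPath and substituting the value found there. The sketch below assumes the Cell is available as a plain dict; get_field and resolve_args are hypothetical names, and the desiredVersion value 2.0.0 is made up for the example.

```python
def get_field(obj: dict, path: str):
    """Follow a dotted fieldPath such as "status.desiredVersion" through nested dicts."""
    current = obj
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"fieldPath {path!r} not found")
        current = current[key]
    return current


def resolve_args(args: list[dict], cell: dict) -> list[dict]:
    """Turn value / valueFrom.fieldRef arguments into literal name/value pairs."""
    resolved = []
    for arg in args:
        if "value" in arg:
            value = arg["value"]
        else:
            value = str(get_field(cell, arg["valueFrom"]["fieldRef"]["fieldPath"]))
        resolved.append({"name": arg["name"], "value": value})
    return resolved


# Hypothetical Cell whose status.desiredVersion is "2.0.0"
cell = {"metadata": {"name": "web"}, "status": {"desiredVersion": "2.0.0"}}
args = [
    {"name": "service-name", "value": "guestbook-svc.default.svc.cluster.local"},
    {"name": "cluster-version",
     "valueFrom": {"fieldRef": {"fieldPath": "status.desiredVersion"}}},
]
print(resolve_args(args, cell))
# [{'name': 'service-name', 'value': 'guestbook-svc.default.svc.cluster.local'},
#  {'name': 'cluster-version', 'value': '2.0.0'}]
```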
As explained earlier, Okra relies on Argo Rollouts' Datadog support.
It works like this: you define an Okra Cell, so that the okra controller creates either an Argo Rollouts AnalysisRun or Experiment, which in turn instructs Argo Rollouts to periodically query Datadog metrics with the "Query timeseries points" API and update the AnalysisRun's or Experiment's status to either Successful or Failed. Finally, the okra controller gets notified about the status update and reacts to it by reconciling the parent Cell resource, incrementing the canary step.
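The reaction to a finished AnalysisRun or Experiment can be modeled as a small state transition. This sketch only encodes what is described above (Successful advances the canary step); the handling of Failed and the Cell state names Progressing, Completed, Failed, and Waiting are assumptions for illustration, and next_step is a hypothetical helper.

```python
def next_step(current_step: int, num_steps: int, run_phase: str) -> tuple[int, str]:
    """Decide the Cell's canary step after its AnalysisRun/Experiment reports a phase.

    Returns (step index, hypothetical Cell state). "Successful" advances to the
    next step; "Failed" stops the rollout (assumption); any other phase, such
    as "Running", keeps waiting.
    """
    if run_phase == "Successful":
        nxt = current_step + 1
        return nxt, ("Completed" if nxt >= num_steps else "Progressing")
    if run_phase == "Failed":
        return current_step, "Failed"
    return current_step, "Waiting"


print(next_step(1, 4, "Running"))     # (1, 'Waiting')
print(next_step(1, 4, "Successful"))  # (2, 'Progressing')
print(next_step(1, 4, "Failed"))      # (1, 'Failed')
```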
The only Datadog-specific part is the query to Datadog, which was implemented in argoproj/argo-rollouts#705 in Argo Rollouts.
If you're curious how to instrument your app so that its metrics can be used from Okra, you'd better start by reading, e.g., Mapping Prometheus Metrics to Datadog Metrics. There's nothing specific to Okra here.
Before authoring a complex Cell spec including Analysis and Experiment, the author recommends that you try browsing the Datadog dashboard, or use a simpler tool like curl to query metrics. After you've done so, start tinkering with Okra, so that when something breaks you can be extra sure when and where it broke!
Okra is intended to be deployed onto a "control-plane" cluster where you usually deploy applications like ArgoCD.
It requires you to use:
- NLB or ALB to load-balance traffic "across" clusters
  - You bring your own LB and listener, and tell okra the listener ID, the number of target groups per cell, and a label to group target groups by version.
- ArgoCD ApplicationSets to deploy your applications onto cluster(s)
In the future, it may add support for using Route 53 Weighted Routing instead of ALB.
Although we assume you use ApplicationSet for app deployments, it isn't a strict requirement. Okra doesn't communicate with ArgoCD or ApplicationSet. All Okra does is discover EKS clusters, create and label target groups for the discovered clusters, and roll out the target groups. You can bring your own tool to deploy apps onto the clusters today.
It supports complex configurations like the following:
- One or more clusters per cell, or per ALB listener rule. Imagine a case where you need a pair of clusters to serve your service. okra is able to canary-deploy the pair of clusters by periodically updating the two target group weights as a whole.
The following situations are handled by Okra:
- When there are enough "new" target groups, Okra gradually updates target group weights for a rollout
- Okra automatically falls back to the "old" target groups when only old target groups exist in the AWS account while the ALB points to "new" target groups that have disappeared
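The fallback rule in the second bullet can be sketched as follows. This is an interpretation, not Okra's code: it assumes the fallback triggers only when none of the ARNs the listener rule forwards to still exist, and that the rule is then pointed at the newest version among the target groups that remain. fallback_targets and the tg-v1-a style ARNs are hypothetical.

```python
def parse_version(version: str) -> tuple[int, ...]:
    # Assumption: versions are dotted integers such as "1.0.0".
    return tuple(int(part) for part in version.split("."))


def fallback_targets(alb_arns, existing_by_version):
    """Return the ARNs to fall back to, or None if no fallback is needed.

    alb_arns: target group ARNs the listener rule currently forwards to.
    existing_by_version: {version: [arn, ...]} for target groups that still exist.
    """
    existing = {arn for arns in existing_by_version.values() for arn in arns}
    if any(arn in existing for arn in alb_arns):
        return None  # at least one forwarded target group still exists
    if not existing_by_version:
        return None  # nothing left to fall back to
    newest = max(existing_by_version, key=parse_version)
    return sorted(existing_by_version[newest])


# The ALB still points at v2 target groups, but only v1 target groups exist.
print(fallback_targets(["tg-v2-a", "tg-v2-b"], {"1.0.0": ["tg-v1-b", "tg-v1-a"]}))
# ['tg-v1-a', 'tg-v1-b']
```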
Okra provides several Kubernetes CustomResourceDefinitions (CRDs) to achieve its goal.
See crd.md for more documentation and details on each CRD.
okra provides 3 executables:
- okrad: the Kubernetes controller manager, which consists of various Kubernetes controllers for Okra CRDs. Intended to be run in a Kubernetes cluster.
- okractl: a kubectl-like CLI application for interacting with okrad through the Kubernetes API server. Intended to be run on your machine or on a CI system for automation.
- okra: the standalone CLI application that does its best to provide every single piece of logic implemented in okrad's controllers. Intended to be run in CI to replicate okrad's functionality on a CI system, or to test each okra functionality in isolation.
The standard and author-recommended usage of Okra involves okrad and okractl.
For okra, we do our best to expose every single okrad + okractl functionality via respective okra CLI commands, so that you can test each functionality in isolation.
It may even be possible to build your own CI job that replaces okrad out of those commands!
See CLI for more information and its usage.
Okra is inspired by various open-source projects listed below.
- ArgoCD is a continuous deployment system that embraces GitOps to sync the desired state stored in Git with the Kubernetes cluster's state. okra integrates with ArgoCD, and especially its ApplicationSet controller, for application deployments. okra relies on the ArgoCD ApplicationSet controller's Cluster Generator feature.
- Flagger and Argo Rollouts enable canary deployments of apps running across pods. okra enables canary deployments of clusters running on IaaS.
- argocd-clusterset auto-discovers EKS clusters and turns them into ArgoCD cluster secrets. okra does the same with its ClusterSet CRD and argocdcluster-controller.
- terraform-provider-eksctl's courier_alb resource enables canary deployments on target groups behind AWS ALB with metrics analysis for Datadog and CloudWatch metrics. okra does the same with its AWSApplicationLoadBalancerConfig CRD and awsapplicationloadbalancerconfig-controller.
Initially it was named kubearray, but the original author wanted something catchier and prettier.
At the beginning of this project, the author thought that hot-swapping a cluster while keeping your apps running looks like hot-swapping a drive while keeping a server running.
We tend to call a cluster of storage drives where each drive can be hot-swapped a "storage array", hence calling a tool that builds a cluster of clusters where each cluster can be hot-swapped "kubearray" seemed like a good idea.
Later, he searched the Internet for a prettier and catchier alternative. While browsing a list of cool Japanese terms with 3 syllables, he encountered "okra". "Okra" is a pod vegetable full of edible seeds. The term is relatively unique in that it sounds almost the same in both Japanese and English. The author thought that "okra" could be a good metaphor for a cluster of sub-clusters, where each seed in an okra is compared to a sub-cluster.
Hot-swap Kubernetes clusters while keeping your service up and running.