Kubernetes controller for GitHub Actions self-hosted runners
elafarge/actions-runner-controller
This controller operates self-hosted runners for GitHub Actions on your Kubernetes cluster.
ToC:
- Motivation
- Installation
- Setting Up Authentication with GitHub API
- Deploying Multiple Controllers
- Usage
- Contributing
GitHub Actions is a very useful tool for automating development. GitHub Actions jobs run in the cloud by default, but you may want to run your jobs in your own environment. A self-hosted runner can be used for such cases, but it requires provisioning and configuring a virtual machine instance. If you already have a Kubernetes cluster, it makes more sense to run the self-hosted runner on top of it.
actions-runner-controller makes that possible. Just create a Runner resource on your Kubernetes cluster, and it will run and operate the self-hosted runner for the specified repository. Combined with Kubernetes RBAC, you can also build a simple self-hosted-runners-as-a-service offering.
actions-runner-controller uses cert-manager for certificate management of its Admission Webhook. Make sure you have installed cert-manager before you install the controller; see the cert-manager installation instructions.
After that, install the custom resource definitions and actions-runner-controller with kubectl or helm. This creates the actions-runner-system namespace in your Kubernetes cluster and deploys the required resources.
Kubectl Deployment:

    # REPLACE "v0.20.2" with the version you wish to deploy
    kubectl apply -f https://github.com/actions-runner-controller/actions-runner-controller/releases/download/v0.20.2/actions-runner-controller.yaml

Helm Deployment:

Configure your values.yaml, see the chart's README for the values documentation

    helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller
    helm upgrade --install --namespace actions-runner-system --create-namespace \
      --wait actions-runner-controller actions-runner-controller/actions-runner-controller

The solution supports both GitHub Enterprise Cloud and Server editions as well as regular GitHub. Both PAT (personal access token) and GitHub App authentication work for installations that deploy repository level and/or organization level runners. If you need to deploy enterprise level runners, you are restricted to PAT based authentication, as GitHub doesn't currently support GitHub App based authentication for enterprise runners.
If you are deploying this solution into a GitHub Enterprise Server environment then you will need GitHub Enterprise Server version >= 3.0.0.
When deploying the solution for a GitHub Enterprise Server environment you need to provide an additional environment variable as part of the controller deployment:

    kubectl set env deploy controller-manager -c manager GITHUB_ENTERPRISE_URL=<GHEC/S URL> --namespace actions-runner-system

Note: The repository maintainers do not have an enterprise environment (cloud or server). Support for the enterprise specific feature set is community driven and on a best effort basis. PRs from the community are welcomed to add features and maintain support.
There are two ways for actions-runner-controller to authenticate with the GitHub API (only 1 can be configured at a time however):
- Using a GitHub App (not supported for enterprise level runners due to lack of support from GitHub)
- Using a PAT
Functionality wise, there isn't much of a difference between the two authentication methods. The primary benefit of authenticating via a GitHub App is an increased API quota.
If you are deploying the solution for a GitHub Enterprise Server environment you are able to configure your rate limit settings, making the main benefit irrelevant. If you're deploying the solution for a GitHub Enterprise Cloud or regular GitHub environment and you run into rate limit issues, consider deploying the solution using the GitHub App authentication method instead.
You can create a GitHub App for either your user account or any organization. Below are the app permissions required for each supported type of runner:
Note: Links are provided further down to create an app for your logged in user account or an organization with the permissions for all runner types set in each link's query string
Required Permissions for Repository Runners:
Repository Permissions
- Actions (read)
- Administration (read / write)
- Checks (read) (if you are going to use Webhook Driven Scaling)
- Metadata (read)
Required Permissions for Organization Runners:
Repository Permissions
- Actions (read)
- Metadata (read)
Organization Permissions
- Self-hosted runners (read / write)
Subscribe to events
- Check run (if you are going to use Webhook Driven Scaling)
Note: All API routes mapped to their permissions can be found here if you wish to review
Setup Steps
If you want to create a GitHub App for your account, open the following link to the creation page, enter any unique name in the "GitHub App name" field, and hit the "Create GitHub App" button at the bottom of the page.
If you want to create a GitHub App for your organization, replace the :org part of the following URL with your organization name before opening it. Then enter any unique name in the "GitHub App name" field, and hit the "Create GitHub App" button at the bottom of the page to create a GitHub App.
You will see an App ID on the page of the GitHub App you created. The value of this App ID will be used later.
Download the private key file by pushing the "Generate a private key" button at the bottom of the GitHub App page. This file will also be used later.
Go to the "Install App" tab on the left side of the page and install the GitHub App that you created for your account or organization.
When the installation is complete, you will be taken to a URL in one of the following formats. The last number in the URL will be used as the Installation ID later (for example, if the URL ends in settings/installations/12345, then the Installation ID is 12345).

    https://github.com/settings/installations/${INSTALLATION_ID}
    https://github.com/organizations/eventreactor/settings/installations/${INSTALLATION_ID}

Finally, register the App ID (APP_ID), Installation ID (INSTALLATION_ID), and downloaded private key file (PRIVATE_KEY_FILE_PATH) to Kubernetes as a Secret.
Kubectl Deployment:

    $ kubectl create secret generic controller-manager \
        -n actions-runner-system \
        --from-literal=github_app_id=${APP_ID} \
        --from-literal=github_app_installation_id=${INSTALLATION_ID} \
        --from-file=github_app_private_key=${PRIVATE_KEY_FILE_PATH}

Helm Deployment:
Configure your values.yaml, see the chart's README for deploying the secret via Helm
Personal Access Tokens can be used to register a self-hosted runner by actions-runner-controller.
Log in to a GitHub account that has admin privileges for the repository, and create a personal access token with the appropriate scopes listed below:
Required Scopes for Repository Runners
- repo (Full control)
Required Scopes for Organization Runners
- repo (Full control)
- admin:org (Full control)
- admin:public_key (read:public_key)
- admin:repo_hook (read:repo_hook)
- admin:org_hook (Full control)
- notifications (Full control)
- workflow (Full control)
Required Scopes for Enterprise Runners
- admin:enterprise (manage_runners:enterprise)
Note: When you deploy enterprise runners they will get access to organizations, however, access to the repositories themselves is NOT allowed by default. Each GitHub organization must allow enterprise runner groups to be used in repositories as an initial one-time configuration step; once done, the setting is permanent for that runner group.
Note: GitHub does not document exactly what permissions you get with each PAT scope beyond a vague description. The best documentation they provide on the topic can be found here if you wish to review. The docs target OAuth apps and so are incomplete and may not be 100% accurate.
Once you have created the appropriate token, deploy it as a secret to the Kubernetes cluster on which you are going to deploy the solution:
Kubectl Deployment:

    kubectl create secret generic controller-manager \
        -n actions-runner-system \
        --from-literal=github_token=${GITHUB_TOKEN}

Helm Deployment:

Configure your values.yaml, see the chart's README for deploying the secret via Helm
This feature requires controller version >= v0.18.0
Note: Be aware when using this feature that CRDs are cluster wide, so you should upgrade all of your controllers (and your CRDs) at the same time if you are doing an upgrade. Do not mix and match CRD versions with different controller versions. Doing so risks out of control scaling.
By default the controller will look for runners in all namespaces. The watch namespace feature allows you to restrict the controller to monitoring a single namespace, which then lets you deploy multiple controllers in a single cluster. You may want to do this either because you wish to scale beyond the API rate limit of a single PAT / GitHub App configuration or because you wish to support multiple GitHub organizations with runners installed at the organization level in a single cluster.
This feature is configured via the controller's --watch-namespace flag. When a namespace is provided via this flag, the controller will only monitor runners in that namespace.
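As a rough sketch of how the flag might be passed, the fragment below adds --watch-namespace to the controller container. The Deployment name controller-manager and the container name manager are taken from the kubectl set env command earlier in this document; the namespace team-a-runners is a hypothetical placeholder. In a real Deployment the argument would be added alongside the container's existing arguments rather than replacing them.

```yaml
# Fragment of the controller Deployment (not a complete manifest).
# "team-a-runners" is a hypothetical namespace used for illustration.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller-manager
  namespace: actions-runner-system
spec:
  template:
    spec:
      containers:
      - name: manager
        args:
        # Keep any arguments the container already has; only this one is new.
        - "--watch-namespace=team-a-runners"
```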
If you plan on installing all instances of the controller stack into a single namespace you will need to make the names of the resources unique to each stack. In the case of Helm this can be done by giving each install a unique release name, or via the fullnameOverride properties.
Alternatively, you can install each controller stack into its own unique namespace (relative to other controller stacks in the cluster), avoiding the need to uniquely prefix resources.
If you go the route of sharing the namespace while giving each stack a unique Helm release name, you must also ensure the following values are configured correctly:
- authSecret.name needs to be unique per stack when each stack is tied to runners in different GitHub organizations and repositories AND you want your GitHub credentials to be narrowly scoped.
- leaderElectionId needs to be unique per stack. If it is not unique, the controllers race for the same leader election lock, resulting in only one stack working at a time.
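A minimal values.yaml sketch for one of two stacks sharing a namespace, using only the value names mentioned above (fullnameOverride, authSecret.name, leaderElectionId). The concrete names prefixed with org-a are hypothetical, and the exact schema should be checked against the chart's README for the version you deploy.

```yaml
# values-org-a.yaml (hypothetical file name) for the first controller stack
fullnameOverride: org-a-actions-runner-controller
authSecret:
  # Secret holding the GitHub credentials used by this stack only
  name: org-a-controller-manager
# Must differ from the leaderElectionId of every other stack in the cluster
leaderElectionId: org-a-actions-runner-controller
# A second stack would use its own values file with different names,
# for example an org-b prefix for each of the three values above.
```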
GitHub self-hosted runners can be deployed at various levels in a management hierarchy:
- The repository level
- The organization level
- The enterprise level
There are two ways to use this controller:
- Manage runners one by one with Runner.
- Manage a set of runners with RunnerDeployment.
To launch a single self-hosted runner, you need to create a manifest file that includes a Runner resource as follows. This example launches a self-hosted runner with the name example-runner for the example/myrepo repository.

    # runner.yaml
    apiVersion: actions.summerwind.dev/v1alpha1
    kind: Runner
    metadata:
      name: example-runner
    spec:
      repository: example/myrepo
      env: []

Apply the created manifest file to your Kubernetes cluster.

    $ kubectl apply -f runner.yaml
    runner.actions.summerwind.dev/example-runner created

You can see that the Runner resource has been created.

    $ kubectl get runners
    NAME             REPOSITORY       STATUS
    example-runner   example/myrepo   Running

You can also see that the runner pod is running.

    $ kubectl get pods
    NAME             READY   STATUS    RESTARTS   AGE
    example-runner   2/2     Running   0          1m

The runner you created has been registered to your repository.
Now you can use your self-hosted runner. See the official documentation on how to run a job with it.
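For orientation, a minimal workflow sketch that targets a self-hosted runner might look like the following. The file name, workflow name, and echo step are hypothetical; the relevant part is the runs-on: self-hosted line.

```yaml
# .github/workflows/example.yaml (hypothetical file name)
name: example
on: push
jobs:
  build:
    # Route this job to a self-hosted runner instead of a GitHub-hosted one
    runs-on: self-hosted
    steps:
      - run: echo "Hello from a self-hosted runner"
```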
To add the runner to an organization, you only need to replace the repository field with organization, so the runner will register itself to the organization.

    # runner.yaml
    apiVersion: actions.summerwind.dev/v1alpha1
    kind: Runner
    metadata:
      name: example-org-runner
    spec:
      organization: your-organization-name

Now you can see the runner on the organization level (if you have organization owner permissions).
To add the runner to an enterprise, you only need to replace the repository field with enterprise, so the runner will register itself to the enterprise.

    # runner.yaml
    apiVersion: actions.summerwind.dev/v1alpha1
    kind: Runner
    metadata:
      name: example-enterprise-runner
    spec:
      enterprise: your-enterprise-name

Now you can see the runner on the enterprise level (if you have enterprise access permissions).
You can manage sets of runners, rather than individual runners, through the RunnerDeployment kind and its replicas: attribute. This kind is required for many of the advanced features.
There are RunnerReplicaSet and RunnerDeployment kinds that correspond to the ReplicaSet and Deployment kinds, but for the Runner kind.
You typically only need RunnerDeployment rather than RunnerReplicaSet, as the former manages the latter.

    # runnerdeployment.yaml
    apiVersion: actions.summerwind.dev/v1alpha1
    kind: RunnerDeployment
    metadata:
      name: example-runnerdeploy
    spec:
      replicas: 2
      template:
        spec:
          repository: mumoshu/actions-runner-controller-ci
          env: []

Apply the manifest file to your cluster:

    $ kubectl apply -f runnerdeployment.yaml
    runnerdeployment.actions.summerwind.dev/example-runnerdeploy created

You can see that 2 runners have been created as specified by replicas: 2:

    $ kubectl get runners
    NAME                             REPOSITORY                             STATUS
    example-runnerdeploy2475h595fr   mumoshu/actions-runner-controller-ci   Running
    example-runnerdeploy2475ht2qbr   mumoshu/actions-runner-controller-ci   Running

Since the release of GitHub's workflow_job webhook, webhook driven scaling is the preferred way of autoscaling: the payload includes the runs-on information needed to scale the appropriate runners for a workflow run, which enables targeted scaling of your RunnerDeployments / RunnerSets. More broadly, webhook driven scaling is the preferred scaling option as it is far quicker than pull driven scaling and is easy to set up.
A RunnerDeployment or RunnerSet (see stateful runners for more details on this kind) can scale the number of runners between the minReplicas and maxReplicas fields, driven by either pull based scaling metrics or a webhook event (see the limitations section of stateful runners for caveats of this kind). Whether the autoscaling is driven by a webhook event or pull based metrics, it is implemented by backing a RunnerDeployment or RunnerSet kind with a HorizontalRunnerAutoscaler kind.
For both pull driven and webhook driven scaling an anti-flapping implementation is included. By default a runner won't be scaled down within 10 minutes of it having been scaled up. This delay is configurable by including the attribute scaleDownDelaySecondsAfterScaleOut: in a HorizontalRunnerAutoscaler kind's spec:.
This configuration has the final say on whether a runner can be scaled down, regardless of the chosen scaling method. Depending on your requirements, you may want to adjust it by setting the scaleDownDelaySecondsAfterScaleOut: attribute.
Below is a complete basic example with one of the pull driven scaling metrics.

    apiVersion: actions.summerwind.dev/v1alpha1
    kind: RunnerDeployment
    metadata:
      name: example-runner-deployment
    spec:
      template:
        spec:
          repository: example/myrepo
    ---
    apiVersion: actions.summerwind.dev/v1alpha1
    kind: HorizontalRunnerAutoscaler
    metadata:
      name: example-runner-deployment-autoscaler
    spec:
      # Runners in the targeted RunnerDeployment won't be scaled down for 5 minutes instead of the default 10 minutes now
      scaleDownDelaySecondsAfterScaleOut: 300
      scaleTargetRef:
        name: example-runner-deployment
      minReplicas: 1
      maxReplicas: 5
      metrics:
      - type: PercentageRunnersBusy
        scaleUpThreshold: '0.75'
        scaleDownThreshold: '0.25'
        scaleUpFactor: '2'
        scaleDownFactor: '0.5'

To configure webhook driven scaling see the Webhook Driven Scaling section
The pull based metrics are configured in the metrics attribute of a HRA (see snippet below). The period between polls is defined by the controller's --sync-period flag. If this flag isn't provided, the controller defaults to a sync period of 10 minutes. This conservative default keeps default deployments from hitting GitHub API rate limits; you will most likely want to adjust it.

    apiVersion: actions.summerwind.dev/v1alpha1
    kind: HorizontalRunnerAutoscaler
    metadata:
      name: example-runner-deployment-autoscaler
    spec:
      scaleTargetRef:
        # Your RunnerDeployment Here
        name: example-runner-deployment
      minReplicas: 1
      maxReplicas: 5
      # Your chosen scaling metrics here
      metrics: []

Metric Options:
TotalNumberOfQueuedAndInProgressWorkflowRuns
The TotalNumberOfQueuedAndInProgressWorkflowRuns metric polls GitHub for all pending workflow runs against a given set of repositories. The metric scales the runner count to the total number of pending jobs at sync time, capped at the maxReplicas configuration.
Benefits of this metric
- Supports named repositories allowing you to restrict the runner to a specified set of repositories server-side.
- Scales the runner count based on the depth of the job queue meaning a more 1:1 scaling of runners to queued jobs (caveat, see drawback #4)
- Like all scaling metrics, you can manage workflow allocation to the RunnerDeployment through the use of GitHub labels.
Drawbacks of this metric
1. A list of repositories must be included within the scaling metric. Maintaining a list of repositories may not be viable in larger environments or self-serve environments.
2. May not scale quickly enough for some users' needs. This metric is pull based, so the queue depth is polled as configured by the sync period. As a result, scaling performance is bound by this sync period, meaning there is a lag to scaling activity.
3. A relatively large number of API requests is required to maintain this metric. You may run into API rate limit issues depending on the size of your environment and how aggressive your sync period configuration is.
4. The GitHub API doesn't provide a way to filter workflow jobs to just those targeting self-hosted runners. If your environment's workflows target both self-hosted and GitHub hosted runners then the queue depth this metric scales against isn't a true 1:1 mapping of queue depth to required runner count. As a result, this metric may scale too aggressively for your actual self-hosted runner count needs.
Example RunnerDeployment backed by a HorizontalRunnerAutoscaler:
Important!!! We no longer include the attribute replicas in our RunnerDeployment if we are configuring autoscaling!

    apiVersion: actions.summerwind.dev/v1alpha1
    kind: RunnerDeployment
    metadata:
      name: example-runner-deployment
    spec:
      template:
        spec:
          repository: example/myrepo
    ---
    apiVersion: actions.summerwind.dev/v1alpha1
    kind: HorizontalRunnerAutoscaler
    metadata:
      name: example-runner-deployment-autoscaler
    spec:
      scaleTargetRef:
        name: example-runner-deployment
      minReplicas: 1
      maxReplicas: 5
      metrics:
      - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
        repositoryNames:
        - example/myrepo

PercentageRunnersBusy
The HorizontalRunnerAutoscaler will poll GitHub for the number of runners in the busy state which live in the RunnerDeployment's namespace. It will then scale depending on how you have configured the scale factors.
Benefits of this metric
- Supports named repositories server-side the same as the TotalNumberOfQueuedAndInProgressWorkflowRuns metric (#313)
- Supports GitHub organization wide scaling without maintaining an explicit list of repositories, which is especially useful for those working at a larger scale (#223)
- Like all scaling metrics, you can manage workflow allocation to the RunnerDeployment through the use of GitHub labels
- Supports scaling desired runner count on both a percentage increase / decrease basis as well as on a fixed increase / decrease count basis (#223, #315)
Drawbacks of this metric
- May not scale quickly enough for some users' needs. This metric is pull based, so the number of busy runners is polled as configured by the sync period. As a result, scaling performance is bound by this sync period, meaning there is a lag to scaling activity.
- We are scaling up and down based on indicative information rather than a count of the actual number of queued jobs, so the desired runner count is likely to under-provision or over-provision runners relative to actual job queue depth. This may or may not be a problem for you.
Examples of each scaling type implemented with a RunnerDeployment backed by a HorizontalRunnerAutoscaler:
Important!!! We no longer include the attribute replicas in our RunnerDeployment if we are configuring autoscaling!

    ---
    apiVersion: actions.summerwind.dev/v1alpha1
    kind: HorizontalRunnerAutoscaler
    metadata:
      name: example-runner-deployment-autoscaler
    spec:
      scaleTargetRef:
        name: example-runner-deployment
      minReplicas: 1
      maxReplicas: 5
      metrics:
      - type: PercentageRunnersBusy
        scaleUpThreshold: '0.75'    # The percentage of busy runners at which the number of desired runners are re-evaluated to scale up
        scaleDownThreshold: '0.3'   # The percentage of busy runners at which the number of desired runners are re-evaluated to scale down
        scaleUpFactor: '1.4'        # The scale up multiplier factor applied to desired count
        scaleDownFactor: '0.7'      # The scale down multiplier factor applied to desired count

    ---
    apiVersion: actions.summerwind.dev/v1alpha1
    kind: HorizontalRunnerAutoscaler
    metadata:
      name: example-runner-deployment-autoscaler
    spec:
      scaleTargetRef:
        name: example-runner-deployment
      minReplicas: 1
      maxReplicas: 5
      metrics:
      - type: PercentageRunnersBusy
        scaleUpThreshold: '0.75'    # The percentage of busy runners at which the number of desired runners are re-evaluated to scale up
        scaleDownThreshold: '0.3'   # The percentage of busy runners at which the number of desired runners are re-evaluated to scale down
        scaleUpAdjustment: 2        # The scale up runner count added to desired count
        scaleDownAdjustment: 1      # The scale down runner count subtracted from the desired count

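Both pull driven metrics note that workflow allocation can be managed through GitHub labels. A minimal sketch of what that might look like follows. The labels field is not shown elsewhere in this document, so treat its name and placement as an assumption to verify against the RunnerDeployment reference for your controller version; the label custom-runner is a hypothetical value.

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runner-deployment
spec:
  template:
    spec:
      repository: example/myrepo
      # Assumed field: extra labels registered on each runner so that
      # workflows can select this RunnerDeployment specifically.
      labels:
      - custom-runner
```

A workflow job would then select these runners with runs-on: [self-hosted, custom-runner], so that only jobs carrying that label are allocated to this RunnerDeployment.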
To configure pull driven scaling see the Pull Driven Scaling section
Webhooks are processed by a separate webhook server. The webhook server receives GitHub Webhook events and scales RunnerDeployments by updating the corresponding HorizontalRunnerAutoscalers.
Today, the Webhook server can be configured to respond to GitHub check_run, workflow_job, pull_request and push events by scaling up the matching HorizontalRunnerAutoscaler by N replica(s), where N is configurable within the HorizontalRunnerAutoscaler's spec:.
More concretely, you can configure the targeted GitHub event types and the N in scaleUpTriggers:

    kind: HorizontalRunnerAutoscaler
    spec:
      scaleTargetRef:
        name: example-runners
      scaleUpTriggers:
      - githubEvent:
          checkRun:
            types: ["created"]
            status: "queued"
        amount: 1
        duration: "5m"

With the above example, the webhook server scales example-runners by 1 replica for 5 minutes on each check_run event received with the type created and the status queued.
Of note is the HRA.spec.scaleUpTriggers[].duration attribute. This attribute is used to calculate whether the replica number added via the trigger has expired. On each reconciliation loop, the controller sums up all the non-expired replica numbers from previous scale up triggers. It then compares the summed desired replica number against the current replica number. If the summed desired replica number is greater than the current number, the replica count needs to scale up.
As mentioned previously, the scaleDownDelaySecondsAfterScaleOut property still has the final say. If the latest scale-up time plus the anti-flapping duration is later than the current time, the controller doesn't immediately scale down and instead retries the calculation later to see if it needs to scale yet.
The primary benefit of autoscaling on Webhook compared to pull driven scaling is that it is far quicker, as it allows you to add runner resources immediately rather than waiting for the next sync period.
You can learn the implementation details in #282
To enable this feature, you first need to install the webhook server. Currently, only our Helm chart has the ability to install it (see the values documentation for all configuration options):

    $ helm upgrade --install --namespace actions-runner-system \
        actions-runner-controller actions-runner-controller/actions-runner-controller \
        --set githubWebhookServer.enabled=true \
        --set "githubWebhookServer.ports[0].nodePort=33080"

The above command will result in exposing the node port 33080 for Webhook events. Usually, you need to create an external load balancer targeting the node port, and register the hostname or the IP address of the external load balancer with the GitHub Webhook.
Once you have confirmed that the Webhook server is ready and running from GitHub (this is usually verified by GitHub sending PING events to the Webhook server), create or update your HorizontalRunnerAutoscaler resources by following the configuration examples below.
- Example 1: Scale on each workflow_job event
- Example 2: Scale up on each check_run event
- Example 3: Scale on each pull_request event against a given set of branches
- Example 4: Scale on each push event
This feature requires controller version >= v0.20.0
The most flexible webhook GitHub offers is the workflow_job webhook. It includes the runs-on information in the payload, allowing scaling based on runner labels.
This webhook should cover most people's needs. Please experiment with this webhook first before considering the others.

    kind: RunnerDeployment
    metadata:
      name: example-runners
    spec:
      template:
        spec:
          repository: example/myrepo
    ---
    kind: HorizontalRunnerAutoscaler
    spec:
      scaleTargetRef:
        name: example-runners
      scaleUpTriggers:
      - githubEvent: {}
        duration: "30m"

You can configure your GitHub webhook settings to only include Workflow Job events, so that it sends us three kinds of workflow_job events per job run.
Each kind has a status of queued, in_progress, or completed. With the above configuration, actions-runner-controller adds one runner for a workflow_job event whose status is queued. Similarly, it removes one runner for a workflow_job event whose status is completed. The caveat to remember is that the scale down is bound by your scaleDownDelaySecondsAfterScaleOut configuration: if this time hasn't passed, the scale down will be deferred.
Note: This should work almost like https://github.com/philips-labs/terraform-aws-github-runner
To scale up replicas of the runners for example/myrepo by 1 for 5 minutes on each check_run, you write manifests like the below:

    kind: RunnerDeployment
    metadata:
      name: example-runners
    spec:
      template:
        spec:
          repository: example/myrepo
    ---
    kind: HorizontalRunnerAutoscaler
    spec:
      scaleTargetRef:
        name: example-runners
      scaleUpTriggers:
      - githubEvent:
          checkRun:
            types: ["created"]
            status: "queued"
        amount: 1
        duration: "5m"

To scale up replicas of the runners for the myorg organization by 1 for 5 minutes on each check_run, you write manifests like the below:

    kind: RunnerDeployment
    metadata:
      name: example-runners
    spec:
      template:
        spec:
          organization: myorg
    ---
    kind: HorizontalRunnerAutoscaler
    spec:
      scaleTargetRef:
        name: example-runners
      scaleUpTriggers:
      - githubEvent:
          checkRun:
            types: ["created"]
            status: "queued"
            # Optionally restrict autoscaling to being triggered by events from specific repositories within your organization still
            # repositories: ["myrepo", "myanotherrepo"]
        amount: 1
        duration: "5m"

To scale up replicas of the runners for example/myrepo by 1 for 5 minutes on each pull_request against the main or develop branch, you write manifests like the below:

    kind: RunnerDeployment
    metadata:
      name: example-runners
    spec:
      template:
        spec:
          repository: example/myrepo
    ---
    kind: HorizontalRunnerAutoscaler
    spec:
      scaleTargetRef:
        name: example-runners
      scaleUpTriggers:
      - githubEvent:
          pullRequest:
            types: ["synchronize"]
            branches: ["main", "develop"]
        amount: 1
        duration: "5m"

See "activity types" for the list of valid values for scaleUpTriggers[].githubEvent.pullRequest.types.
To scale up replicas of the runners for example/myrepo by 1 for 5 minutes on each push, write manifests like the below:

    kind: RunnerDeployment
    metadata:
      name: example-runners
    spec:
      # The repository belongs under template.spec, as in the other examples
      template:
        spec:
          repository: example/myrepo
    ---
    kind: HorizontalRunnerAutoscaler
    spec:
      scaleTargetRef:
        name: example-runners
      scaleUpTriggers:
      - githubEvent:
          push:
        amount: 1
        duration: "5m"

This feature requires controller version >= v0.19.0
Note: The controller creates a "registration-only" runner per RunnerReplicaSet when it is being scaled to zero and retains it until there are one or more runners available. This is a deprecated feature for GitHub Cloud, as "registration-only" runners are no longer needed: GitHub changed their runner routing logic to no longer fail a workflow run if it targets a runner label that has no registered runners.
Both the regular RunnerDeployment replicas: attribute and the HorizontalRunnerAutoscaler minReplicas: attribute support being set to 0.
The main use case for scaling from 0 is with the HorizontalRunnerAutoscaler kind. To scale from 0 whilst still being able to provision runners as jobs are queued, the HorizontalRunnerAutoscaler must use one of the following scaling configurations:
- TotalNumberOfQueuedAndInProgressWorkflowRuns
- PercentageRunnersBusy + TotalNumberOfQueuedAndInProgressWorkflowRuns
- PercentageRunnersBusy + Webhook-based autoscaling
- Webhook-based autoscaling only
PercentageRunnersBusy can't be used alone as, by its definition, it needs one or more GitHub runners to become busy to be able to scale. If there isn't a runner to pick up a job and enter a busy state then the controller will never know to provision a runner to begin with, as this metric has no knowledge of the job queue and relies on the number of busy runners as a means of calculating the desired replica count.
If a HorizontalRunnerAutoscaler is configured with a secondary metric of TotalNumberOfQueuedAndInProgressWorkflowRuns then be aware that the controller will check the primary metric of PercentageRunnersBusy first and will only use the secondary metric to calculate the desired replica count if the primary metric returns 0 desired replicas.
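A sketch of the PercentageRunnersBusy + TotalNumberOfQueuedAndInProgressWorkflowRuns combination with minReplicas set to 0 is shown below. It reuses the field names and values from the earlier examples; that the first entry in metrics acts as the primary metric and the second as the secondary one is an assumption based on the description above.

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runner-deployment-autoscaler
spec:
  scaleTargetRef:
    name: example-runner-deployment
  # Allow the RunnerDeployment to scale all the way down to zero
  minReplicas: 0
  maxReplicas: 5
  metrics:
  # Assumed primary metric: cannot scale up from 0 on its own
  - type: PercentageRunnersBusy
    scaleUpThreshold: '0.75'
    scaleDownThreshold: '0.3'
    scaleUpFactor: '1.4'
    scaleDownFactor: '0.7'
  # Assumed secondary metric: consulted when the primary returns 0 replicas
  - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
    repositoryNames:
    - example/myrepo
```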
This feature requires controller version >= v0.19.0
Scheduled Overrides allow you to configure a HorizontalRunnerAutoscaler so that its spec: gets updated only during a certain period of time. This feature is usually used for the following scenarios:
- You want to reduce your infrastructure costs by scaling your Kubernetes nodes down outside a given period
- You want to scale for scheduled spikes in workloads
The most basic usage of this feature is to set a non-repeating override:

    apiVersion: actions.summerwind.dev/v1alpha1
    kind: HorizontalRunnerAutoscaler
    metadata:
      name: example-runner-deployment-autoscaler
    spec:
      scaleTargetRef:
        name: example-runner-deployment
      scheduledOverrides:
      # Override minReplicas to 100 only between 2021-06-01T00:00:00+09:00 and 2021-06-03T00:00:00+09:00
      - startTime: "2021-06-01T00:00:00+09:00"
        endTime: "2021-06-03T00:00:00+09:00"
        minReplicas: 100
      minReplicas: 1

A scheduled override without recurrenceRule is considered a one-off override that is active between startTime and endTime. The example above, which fits the second scenario, overrides minReplicas to 100 only between 2021-06-01T00:00:00+09:00 and 2021-06-03T00:00:00+09:00.
A more advanced configuration is to include arecurrenceRule in the override:
apiVersion:actions.summerwind.dev/v1alpha1kind:HorizontalRunnerAutoscalermetadata:name:example-runner-deployment-autoscalerspec:scaleTargetRef:name:example-runner-deploymentscheduledOverrides:# Override minReplicas to 0 only between 0am sat to 0am mon -startTime:"2021-05-01T00:00:00+09:00"endTime:"2021-05-03T00:00:00+09:00"recurrenceRule:frequency:Weekly# Optional sunset datetime attribute# untilTime: "2022-05-01T00:00:00+09:00"minReplicas:0minReplicas:1
A recurring override is initially active between `startTime` and `endTime`, and then it repeatedly gets activated after a certain period of time denoted by `frequency`.
`frequency` can take one of the following values:
- `Daily`
- `Weekly`
- `Monthly`
- `Yearly`
By default, a scheduled override repeats forever. If you want it to repeat until a specific point in time, define `untilTime`. The controller keeps creating recurrences of the override as long as the recurrence's `startTime` is equal to or earlier than `untilTime`.
Do ensure that you have enough slack for `untilTime` so that a delayed or offline `actions-runner-controller` is much less likely to miss the last recurrence. For example, you might want to set `untilTime` to `M` minutes after the last recurrence's `startTime`, so that `actions-runner-controller` being offline for up to `M` minutes doesn't miss the last recurrence.
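As a minimal sketch of the slack described above, assume a weekly override that first starts on 2021-05-01 and whose last intended recurrence starts on 2021-05-29. Setting `untilTime` ten minutes after that `startTime` (the value of `M` here is an arbitrary assumption) leaves room for a controller that is briefly delayed or offline:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runner-deployment-autoscaler
spec:
  scaleTargetRef:
    name: example-runner-deployment
  scheduledOverrides:
  - startTime: "2021-05-01T00:00:00+09:00"
    endTime: "2021-05-03T00:00:00+09:00"
    recurrenceRule:
      frequency: Weekly
      # The last intended recurrence starts at 2021-05-29T00:00:00+09:00.
      # untilTime is set M = 10 minutes later (illustrative value).
      untilTime: "2021-05-29T00:10:00+09:00"
    minReplicas: 0
  minReplicas: 1
```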
Combining Multiple Scheduled Overrides:
In case you have a more complex scenario, try writing two or more entries under `scheduledOverrides`.
Earlier entries take priority over later entries. So you usually define one-time overrides at the top of your list, then yearly, monthly, weekly, and lastly daily overrides.
A common use case for this may be to have 1 override to scale to 0 during the week outside of core business hours and another override to scale to 0 during all hours of the weekend.
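The use case above could be sketched as follows. The dates, the business hours (8am to 7pm), and the resource names are illustrative assumptions; adjust them to your own schedule. The weekly entry is listed before the daily one, following the ordering guideline above.

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runner-deployment-autoscaler
spec:
  scaleTargetRef:
    name: example-runner-deployment
  scheduledOverrides:
  # Override minReplicas to 0 for the whole weekend (0am sat to 0am mon), repeated weekly
  - startTime: "2021-05-01T00:00:00+09:00"
    endTime: "2021-05-03T00:00:00+09:00"
    recurrenceRule:
      frequency: Weekly
    minReplicas: 0
  # Override minReplicas to 0 outside the assumed business hours (7pm to 8am), repeated daily
  - startTime: "2021-05-03T19:00:00+09:00"
    endTime: "2021-05-04T08:00:00+09:00"
    recurrenceRule:
      frequency: Daily
    minReplicas: 0
  minReplicas: 1
```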
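When using the default runner, the runner pod starts up 2 containers: runner and DinD (Docker-in-Docker). This might create issues if there's a `LimitRange` set on the namespace. To avoid them, you can run the Docker daemon inside the runner container by using the `summerwind/actions-runner-dind` image and setting `dockerdWithinRunnerContainer: true`: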
```yaml
# dindrunnerdeployment.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-dindrunnerdeploy
spec:
  replicas: 2
  template:
    spec:
      image: summerwind/actions-runner-dind
      dockerdWithinRunnerContainer: true
      repository: mumoshu/actions-runner-controller-ci
      env: []
```
This also helps with resources, as you don't need to allocate resources separately to the docker and runner containers.
You can pass details through the spec selector. Here's an example of what you may like to do:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: actions-runner
  namespace: default
spec:
  replicas: 2
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      nodeSelector:
        node-role.kubernetes.io/test: ""

      securityContext:
        # All level/role/type/user values will vary based on your SELinux policies.
        # See https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html/container_security_guide/docker_selinux_security_policy for information about SELinux with containers
        seLinuxOptions:
          level: "s0"
          role: "system_r"
          type: "super_t"
          user: "system_u"

      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/test
          operator: Exists
        # Timeout after a node crashed or became unreachable to evict your pods somewhere else (default 5mins)
        - key: "node.kubernetes.io/unreachable"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 10

      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              runner-deployment-name: actions-runner

      repository: mumoshu/actions-runner-controller-ci
      # The default "summerwind/actions-runner" images are available at DockerHub:
      # https://hub.docker.com/r/summerwind/actions-runner
      # You can also build your own and specify it like the below:
      image: custom-image/actions-runner:latest
      imagePullPolicy: Always
      resources:
        limits:
          cpu: "4.0"
          memory: "8Gi"
        requests:
          cpu: "2.0"
          memory: "4Gi"
      # true (default) = The runner restarts after running jobs, to ensure a clean and reproducible build environment
      # false = The runner is persistent across jobs and doesn't automatically restart
      # This directly controls the behaviour of `--once` flag provided to the github runner
      ephemeral: false
      # true (default) = A privileged docker sidecar container is included in the runner pod.
      # false = A docker sidecar container is not included in the runner pod and you can't use docker.
      # If set to false, there are no privileged container and you cannot use docker.
      dockerEnabled: false
      # Optional Docker containers network MTU
      # If your network card MTU is smaller than Docker's default 1500, you might encounter Docker networking issues.
      # To fix these issues, you should setup Docker MTU smaller than or equal to that on the outgoing network card.
      # More information:
      # - https://mlohr.com/docker-mtu/
      dockerMTU: 1500
      # Optional Docker registry mirror
      # Docker Hub has an aggressive rate-limit configuration for free plans.
      # To avoid disruptions in your CI/CD pipelines, you might want to setup an external or on-premises Docker registry mirror.
      # More information:
      # - https://docs.docker.com/docker-hub/download-rate-limit/
      # - https://cloud.google.com/container-registry/docs/pulling-cached-images
      dockerRegistryMirror: https://mirror.gcr.io/
      # false (default) = Docker support is provided by a sidecar container deployed in the runner pod.
      # true = No docker sidecar container is deployed in the runner pod but docker can be used within the runner container instead. The image summerwind/actions-runner-dind is used by default.
      dockerdWithinRunnerContainer: true
      # Docker sidecar container image tweaks examples below, only applicable if dockerdWithinRunnerContainer = false
      dockerdContainerResources:
        limits:
          cpu: "4.0"
          memory: "8Gi"
        requests:
          cpu: "2.0"
          memory: "4Gi"
      # Additional N number of sidecar containers
      sidecarContainers:
        - name: mysql
          image: mysql:5.7
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: abcd1234
          securityContext:
            runAsUser: 0
      # workDir if not specified (default = /runner/_work)
      # You can customise this setting allowing you to change the default working directory location
      # for example, the below setting is the same as on the ubuntu-18.04 image
      workDir: /home/runner/work
      # You can mount some of the shared volumes to the dind container using dockerVolumeMounts, like any other volume mounting.
      # NOTE: in case you want to use an hostPath like the following example, make sure that Kubernetes doesn't schedule more than one runner
      # per physical host. You can achieve that by setting pod anti-affinity rules and/or resource requests/limits.
      volumes:
        - name: docker-extra
          hostPath:
            path: /mnt/docker-extra
            type: DirectoryOrCreate
        - name: repo
          hostPath:
            path: /mnt/repo
            type: DirectoryOrCreate
      dockerVolumeMounts:
        - mountPath: /var/lib/docker
          name: docker-extra
      # You can mount some of the shared volumes to the runner container using volumeMounts.
      # NOTE: Do not try to mount the volume onto the runner workdir itself as it will not work. You could mount it however on a sub directory in the runner workdir
      # Please see https://github.com/actions-runner-controller/actions-runner-controller/issues/630#issuecomment-862087323 for more information.
      volumeMounts:
        - mountPath: /home/runner/work/repo
          name: repo
      # Optional storage medium type of runner volume mount.
      # More info: https://kubernetes.io/docs/concepts/storage/volumes/#emptydir
      # "" (default) = Node's default medium
      # Memory = RAM-backed filesystem (tmpfs)
      # NOTE: Using RAM-backed filesystem gives you fastest possible storage on your host nodes.
      volumeStorageMedium: ""
      # Total amount of local storage resources required for runner volume mount.
      # The default limit is undefined.
      # NOTE: You can make sure that nodes' resources are never exceeded by limiting used storage size per runner pod.
      # You can even disable the runner mount completely by setting limit to zero if dockerdWithinRunnerContainer = true.
      # Please see https://github.com/actions-runner-controller/actions-runner-controller/pull/674 for more information.
      volumeSizeLimit: 4Gi
      # Optional name of the container runtime configuration that should be used for pods.
      # This must match the name of a RuntimeClass resource available on the cluster.
      # More info: https://kubernetes.io/docs/concepts/containers/runtime-class
      runtimeClassName: "runc"
```
To run a workflow job on a self-hosted runner, you can use the following syntax in your workflow:
```yaml
jobs:
  release:
    runs-on: self-hosted
```
When you have multiple kinds of self-hosted runners, you can distinguish between them using labels. In order to do so, you can specify one or more labels in your `Runner` or `RunnerDeployment` spec.
```yaml
# runnerdeployment.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: custom-runner
spec:
  replicas: 1
  template:
    spec:
      repository: actions-runner-controller/actions-runner-controller
      labels:
        - custom-runner
```
Once this spec is applied, you can see the labels for your runner in the GitHub settings page of the repository or organization. You can now select a specific runner from your workflow by using the label in `runs-on`:
```yaml
jobs:
  release:
    runs-on: custom-runner
```
Note that if you specify `self-hosted` in your workflow, then this will run your job on any self-hosted runner, regardless of the labels that they have.
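If you want to be explicit that a job must run on a self-hosted runner carrying a particular label, GitHub Actions also accepts a list of labels in `runs-on`; a runner must have all of the listed labels to be selected. A minimal sketch, reusing the `custom-runner` label from the example above:

```yaml
jobs:
  release:
    # The job is only routed to runners that have BOTH labels
    runs-on: [self-hosted, custom-runner]
```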
Runner groups can be used to limit which repositories are able to use the GitHub Runner at an organization level. Runner groups have to be created in GitHub first before they can be referenced.
To add the runner to the group `NewGroup`, specify the group in your `Runner` or `RunnerDeployment` spec.
```yaml
# runnerdeployment.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: custom-runner
spec:
  replicas: 1
  template:
    spec:
      group: NewGroup
```
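Because runner groups are an organization-level concept, a group is typically combined with an organization-wide runner. The sketch below assumes a hypothetical organization named `your-organization` and a `NewGroup` group that already exists there:

```yaml
# runnerdeployment.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: custom-runner
spec:
  replicas: 1
  template:
    spec:
      # Hypothetical organization name; replace it with your own
      organization: your-organization
      group: NewGroup
```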
This feature requires controller version >= v0.15.0
As with regular pods and deployments, you first need an existing service account with the IAM role associated. Create one using e.g. `eksctl`. You can refer to the EKS documentation for more details.
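For reference, a service account with an associated IAM role is an ordinary `ServiceAccount` carrying the `eks.amazonaws.com/role-arn` annotation. The sketch below shows roughly what a tool such as `eksctl` creates for you; the account ID, role name, and namespace are placeholders, not values from this project:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-service-account
  namespace: default
  annotations:
    # Placeholder ARN; use the IAM role you created for your runners
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-runner-role
```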
Once you set up the service account, all you need is to add `serviceAccountName` and `fsGroup` to any pods that use the IAM-role enabled service account.
For `RunnerDeployment`, you can set those two fields under the runner spec at `RunnerDeployment.Spec.Template`:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeploy
spec:
  template:
    spec:
      repository: USER/REPO
      serviceAccountName: my-service-account
      securityContext:
        fsGroup: 1000
```
Istio 1.7.0 or greater has `holdApplicationUntilProxyStarts`, added in istio/istio#24737, which enables you to delay the `runner` container startup until the injected `istio-proxy` container finishes starting. Try using it if you need to use Istio. Otherwise the runner is unlikely to work, because it fails to call any GitHub API to register itself while `istio-proxy` is not yet up and running.
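One way to enable this per pod is Istio's `proxy.istio.io/config` annotation. The following is a sketch rather than a tested configuration: it assumes that annotations set under `spec.template.metadata` are propagated to the runner pods, and `USER/REPO` is a placeholder:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeploy
spec:
  template:
    metadata:
      annotations:
        # Ask Istio to start the runner container only after istio-proxy is ready
        proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
    spec:
      repository: USER/REPO
```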
Note that there's no official Istio integration in actions-runner-controller. It should work, but it isn't covered by our acceptance test (a contribution to resolve this is welcomed). In addition to that, none of the actions-runner-controller maintainers use Istio daily. If you need more information, or have any issues using it, refer to the following links:
This feature requires controller version >= v0.20.0
actions-runner-controller supports the `RunnerSet` API, which lets you deploy stateful runners. A stateful runner is designed to be able to store some data that persists across GitHub Actions workflow and job runs. You might find it useful, for example, to speed up your docker builds by persisting the docker layer cache.
A basic `RunnerSet` would look like this:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
  name: example
spec:
  ephemeral: false
  replicas: 2
  repository: mumoshu/actions-runner-controller-ci
  # Other mandatory fields from StatefulSet
  selector:
    matchLabels:
      app: example
  serviceName: example
  template:
    metadata:
      labels:
        app: example
```
As it is based on `StatefulSet`, `selector` and `template.metadata.labels` need to be defined and have the exact same set of labels. `serviceName` must be set to some non-empty string as it is also required by `StatefulSet`.
Runner-related fields like `ephemeral`, `repository`, `organization`, `enterprise`, and so on should be written directly under `spec`.
Fields like `volumeClaimTemplates` that originate from `StatefulSet` should also be written directly under `spec`.
Pod-related fields like security contexts and volumes are written under `spec.template.spec`, as in `StatefulSet`.
Similarly, container-related fields like resource requests and limits, container image names and tags, security context, and so on are written under `spec.template.spec.containers`. There are two reserved container names, `runner` and `docker`. The former is for the container that runs the actions runner and the latter is for the container that runs dockerd.
For a more complex example, see below:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
  name: example
spec:
  # NOTE: RunnerSet supports non-ephemeral runners only today
  ephemeral: false
  replicas: 2
  repository: mumoshu/actions-runner-controller-ci
  dockerdWithinRunnerContainer: true
  template:
    spec:
      securityContext:
        # All level/role/type/user values will vary based on your SELinux policies.
        # See https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html/container_security_guide/docker_selinux_security_policy for information about SELinux with containers
        seLinuxOptions:
          level: "s0"
          role: "system_r"
          type: "super_t"
          user: "system_u"
      containers:
        - name: runner
          env: []
          resources:
            limits:
              cpu: "4.0"
              memory: "8Gi"
            requests:
              cpu: "2.0"
              memory: "4Gi"
        - name: docker
          resources:
            limits:
              cpu: "4.0"
              memory: "8Gi"
            requests:
              cpu: "2.0"
              memory: "4Gi"
```
You can also read the design and usage documentation written in the original pull request that introduced `RunnerSet` for more information.
Under the hood, `RunnerSet` relies on Kubernetes's `StatefulSet` and Mutating Webhook. A statefulset is used to create a number of pods that have stable names and dynamically provisioned persistent volumes, so that each statefulset-managed pod gets the same persistent volume even after restarting. A mutating webhook is used to dynamically inject a runner's "registration token", which is used to call GitHub's "Create Runner" API.
We envision that `RunnerSet` will eventually replace `RunnerDeployment`, as `RunnerSet` provides a more standard API that is easy to learn and use because it is based on `StatefulSet`, and it has support for `volumeClaimTemplates`, which is crucial for managing dynamically provisioned persistent volumes.
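As an illustration of how `volumeClaimTemplates` could be used to persist the docker layer cache, consider the sketch below. The resource name, the volume name `docker-cache`, the 10Gi size, and the `/var/lib/docker` mount path are assumptions made for this example, and it has not been verified against a particular controller version:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
  name: example-with-cache
spec:
  ephemeral: false
  replicas: 2
  repository: mumoshu/actions-runner-controller-ci
  dockerdWithinRunnerContainer: true
  selector:
    matchLabels:
      app: example-with-cache
  serviceName: example-with-cache
  template:
    metadata:
      labels:
        app: example-with-cache
    spec:
      containers:
        - name: runner
          volumeMounts:
            # Assumed location of the docker data directory inside the runner container
            - name: docker-cache
              mountPath: /var/lib/docker
  # One persistent volume claim is created per pod and reattached after restarts
  volumeClaimTemplates:
    - metadata:
        name: docker-cache
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```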
Limitations
- For autoscaling, the `RunnerSet` kind only supports pull driven scaling or the `workflow_job` event for webhook driven scaling.
- For autoscaling, the `RunnerSet` kind doesn't support the registration-only runner.
- A known downside of relying on `StatefulSet` is that it lacks support for `maxUnavailable`. A `StatefulSet` basically works like `maxUnavailable: 1` in `Deployment`, which means that it can take down only one pod at a time while doing a rolling update of pods. Kubernetes 1.22 doesn't support customizing it yet, so it will probably take more releases to arrive. See kubernetes/kubernetes#68397 for more information.
Both `RunnerDeployment` and `RunnerSet` have the ability to configure `ephemeral: true` in the spec.
When it is configured, it passes a `--once` flag to every runner.
`--once` is an experimental `actions/runner` feature that instructs the runner to stop after the first job run. However, it has a known race condition: a runner may fetch a job even while it's being terminated. If a runner fetches a job while terminating, the job is very likely to fail because the terminating runner doesn't wait for the job to complete. This is tracked in #466.
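A `RunnerDeployment` with ephemeral runners enabled explicitly might look like the following sketch (the resource name is made up for this example and `USER/REPO` is a placeholder):

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-ephemeral-runnerdeploy
spec:
  template:
    spec:
      repository: USER/REPO
      # The runner restarts after running a job, giving each job a clean environment
      ephemeral: true
```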
The below feature depends on an unreleased GitHub feature
GitHub seems to be adding another flag called `--ephemeral` that is race-free. The pull request to add it to `actions/runner` can be found at actions/runner#660.
actions-runner-controller has a feature flag backed by an environment variable to enable using `--ephemeral` instead of `--once`. The environment variable is `RUNNER_FEATURE_FLAG_EPHEMERAL`. You can set it to `true` on runner containers in your runner pods to enable the feature.
At the time of writing this, you need to wait until GitHub rolls out the server-side feature for `--ephemeral`, AND you need to include your own `actions/runner` binary built from actions/runner#660 into the runner container image to test this feature.
Please see comments in `runner/Dockerfile` for more information about how to build a custom image using your own `actions/runner` binary.
For example, a `RunnerSet` config with the flag enabled looks like:
```yaml
kind: RunnerSet
metadata:
  name: example-runnerset
spec:
  # ...
  template:
    metadata:
      labels:
        app: example-runnerset
    spec:
      containers:
        - name: runner
          imagePullPolicy: IfNotPresent
          env:
            - name: RUNNER_FEATURE_FLAG_EPHEMERAL
              value: "true"
```
Note that once actions/runner#660 becomes generally available on GitHub, you no longer need to build a custom runner image to use this feature. Just set `RUNNER_FEATURE_FLAG_EPHEMERAL` and it should use `--ephemeral`.
In the future, `--once` might get removed in `actions/runner`. actions-runner-controller will make `--ephemeral` the default option for `ephemeral: true` runners before the legacy flag is removed.
Cloud Tooling
The project supports being deployed on the various cloud Kubernetes platforms (e.g. EKS); it does not, however, aim to go beyond that. No cloud-specific tooling is bundled in the base runner. This is an active decision to keep the overhead of maintaining the solution manageable.
Bundled Software
The GitHub hosted runners include a large amount of pre-installed software packages. GitHub maintains a list in README files at https://github.com/actions/virtual-environments/tree/main/images/linux
This solution maintains a few runner images, with `latest` aligning with GitHub's Ubuntu version. Older images are maintained whilst GitHub also provides them as an option. These images do not contain all of the software installed on the GitHub runners. They contain the following subset of packages from the GitHub runners:
- Basic CLI packages
- git
- docker
- build-essentials
The virtual environments from GitHub contain a lot more software packages (different versions of Java, Node.js, Golang, .NET, etc.) which are not provided in the runner image. Most of these have dedicated setup actions which allow the tools to be installed on demand in a workflow, for example `actions/setup-java` or `actions/setup-node`.
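A workflow that installs Node.js on demand might look like the sketch below. The job name, the action versions, and the Node.js version are illustrative assumptions; check each action's documentation for the currently recommended versions:

```yaml
jobs:
  build:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v2
      # Installs the requested Node.js version at job run time
      - uses: actions/setup-node@v2
        with:
          node-version: "14"
      - run: node --version
```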
If there is a need to include packages in the runner image for which there is no setup action, then this can be achieved by building a custom container image for the runner. The easiest way is to start with the `summerwind/actions-runner` image and install the extra dependencies directly in the docker image:
```dockerfile
FROM summerwind/actions-runner:latest

# -y keeps apt non-interactive during the image build
RUN sudo apt update -y \
  && sudo apt install -y YOUR_PACKAGE \
  && sudo rm -rf /var/lib/apt/lists/*
```
You can then configure the runner to use a custom docker image by configuring the `image` field of a `Runner` or `RunnerDeployment`:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: custom-runner
spec:
  repository: actions-runner-controller/actions-runner-controller
  image: YOUR_CUSTOM_DOCKER_IMAGE
```
```
2020-11-12T22:17:30.693Z ERROR controller-runtime.controller Reconciler error {"controller": "runner", "request": "actions-runner-system/runner-deployment-dk7q8-dk5c9", "error": "failed to create registration token: Post \"https://api.github.com/orgs/$YOUR_ORG_HERE/actions/runners/registration-token\": net/http: invalid header field value \"Bearer $YOUR_TOKEN_HERE\\n\" for key Authorization"}
```
Solution
Your base64'ed PAT token has a newline at the end. It needs to be created without a trailing `\n`, using either of the following:
- `echo -n $TOKEN | base64`
- Create the secret as described in the docs using the shell and documented flags
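Another way to sidestep the encoding problem, if you manage the secret as a manifest, is the standard Kubernetes `stringData` field, which takes the plain value and leaves the base64 encoding to the API server. The secret name, namespace, and key below are assumptions based on the authentication section of this README; verify them against your installation:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: controller-manager
  namespace: actions-runner-system
type: Opaque
stringData:
  # Plain-text token: no manual base64 step, so no trailing newline sneaks in
  github_token: YOUR_PAT_HERE
```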
If you're running your action runners on a service mesh like Istio, you might have problems with runner configuration accompanied by logs like:
```
....
runner Starting Runner listener with startup type: service
runner Started listener process
runner An error occurred: Not configured
runner Runner listener exited with error code 2
runner Runner listener exit with retryable error, re-launch runner in 5 seconds.
....
```
This is because the `istio-proxy` has not completed configuring itself when the configuration script tries to communicate with the network.
Solution
Added originally to help users with older Istio instances. Newer Istio instances can use Istio's `holdApplicationUntilProxyStarts` attribute (istio/istio#11130) to avoid having to delay starting up the runner. Please read the discussion in #592 for more information.
Note: Prior to the runner version v2.279.0, the environment variable referenced below was called `STARTUP_DELAY`.
You can add a delay to the runner's entrypoint script by setting the `STARTUP_DELAY_IN_SECONDS` environment variable for the runner pod. This will cause the script to sleep for the given number of seconds. This works with any runner kind.
Example `RunnerDeployment` with a 2 second startup delay:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeployment-with-sleep
spec:
  template:
    spec:
      env:
        - name: STARTUP_DELAY_IN_SECONDS
          value: "2" # Remember! env var values must be strings.
```
For more details on contributing to the project (including requirements) please check out Getting Started with Contributing.