Kubernetes controller for GitHub Actions self-hosted runners


This controller operates self-hosted runners for GitHub Actions on your Kubernetes cluster.


People

actions-runner-controller is an open-source project currently developed and maintained in collaboration with maintainers @mumoshu and @toast-gear, various contributors, and the awesome community, mostly in their spare time.

If you think the project is awesome and it's becoming a basis for your important business, consider sponsoring us!

In case you are already the employer of one of the contributors, sponsoring via GitHub Sponsors might not be an option. Just support them by other means!

We don't currently have any sponsors dedicated to this project yet.

However, HelloFresh has recently started sponsoring @mumoshu for this project along with his other works. A part of their sponsorship will enable @mumoshu to add an E2E test to keep ARC even more reliable on AWS. Thank you for your sponsorship!

Status

Even though actions-runner-controller is used in production environments, it is still in an early stage of development, hence versioned 0.x.

actions-runner-controller complies with Semantic Versioning 2.0.0, in which v0.x means that there could be backward-incompatible changes in every release.

The documentation is kept in line with master@HEAD. We do our best to highlight any features that require a specific ARC version or higher, however this is not always easily done due to there being many moving parts. Additionally, we do not actively retain compatibility with every GitHub Enterprise Server version nor every Kubernetes version, so you will need to ensure you stay current within a reasonable timespan.

About

GitHub Actions is a very useful tool for automating development. GitHub Actions jobs are run in the cloud by default, but you may want to run your jobs in your own environment. A self-hosted runner can be used for such use cases, but it requires the provisioning and configuration of a virtual machine instance. If you already have a Kubernetes cluster, it makes more sense to run the self-hosted runner on top of it.

actions-runner-controller makes that possible. Just create a Runner resource on your Kubernetes cluster, and it will run and operate the self-hosted runner for the specified repository. Combined with Kubernetes RBAC, you can also build simple Self-hosted runners as a Service.

Installation

By default, actions-runner-controller uses cert-manager for certificate management of the Admission Webhook. Make sure you have already installed cert-manager before you install actions-runner-controller. The installation instructions for cert-manager can be found below.
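For reference, a minimal sketch of installing cert-manager with Helm from the upstream Jetstack chart might look like the below (pin a --version appropriate for your cluster):

# Sketch: install cert-manager from the upstream Jetstack chart
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --set installCRDs=true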

After installing cert-manager, install the custom resource definitions and actions-runner-controller with kubectl or helm. This will create an actions-runner-system namespace in your Kubernetes cluster and deploy the required resources.

Kubectl Deployment:

# REPLACE "v0.22.0" with the version you wish to deploy
kubectl apply -f https://github.com/actions-runner-controller/actions-runner-controller/releases/download/v0.22.0/actions-runner-controller.yaml

Helm Deployment:

Configure your values.yaml, see the chart's README for the values documentation

helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller
helm upgrade --install --namespace actions-runner-system --create-namespace \
  --wait actions-runner-controller actions-runner-controller/actions-runner-controller

GitHub Enterprise Support

The solution supports both GHEC (GitHub Enterprise Cloud) and GHES (GitHub Enterprise Server) editions as well as regular GitHub. Both PAT (personal access token) and GitHub App authentication work for installations that will be deploying either repository level and / or organization level runners. If you need to deploy enterprise level runners then you are restricted to PAT based authentication, as GitHub doesn't currently support GitHub App based authentication for enterprise runners.

If you are deploying this solution into a GHES environment then you will need to be running version >=3.3.0.

When deploying the solution for a GHES environment you need to provide an additional environment variable as part of the controller deployment:

kubectl set env deploy controller-manager -c manager GITHUB_ENTERPRISE_URL=<GHEC/S URL> --namespace actions-runner-system

Note: The repository maintainers do not have an enterprise environment (cloud or server). Support for the enterprise specific feature set is community driven and on a best effort basis. PRs from the community are welcome to add features and maintain support.

Setting Up Authentication with GitHub API

There are two ways for actions-runner-controller to authenticate with the GitHub API (only one can be configured at a time, however):

  1. Using a GitHub App (not supported for enterprise level runners due to lack of support from GitHub)
  2. Using a PAT

Functionality wise, there isn't much of a difference between the two authentication methods. The primary benefit of authenticating via a GitHub App is an increased API quota.

If you are deploying the solution for a GHES environment you are able to configure your rate limit settings, making the main benefit irrelevant. If you're deploying the solution for a GHEC or regular GitHub environment and you run into rate limit issues, consider deploying the solution using the GitHub App authentication method instead.

Deploying Using GitHub App Authentication

You can create a GitHub App for either your user account or any organization, below are the app permissions required for each supported type of runner:

Note: Links are provided further down to create an app for your logged in user account or an organization with the permissions for all runner types set in each link's query string

Required Permissions for Repository Runners:
Repository Permissions

  • Actions (read)
  • Administration (read / write)
  • Checks (read) (if you are going to use Webhook Driven Scaling)
  • Metadata (read)

Required Permissions for Organization Runners:
Repository Permissions

  • Actions (read)
  • Metadata (read)

Organization Permissions

  • Self-hosted runners (read / write)

Note: All API routes mapped to their permissions can be found here if you wish to review

Subscribe to events

At this point you have a choice of configuring a webhook; a webhook is needed if you are going to use webhook driven scaling. The webhook can be configured centrally in the GitHub app itself or separately. In either case the event details are:


Setup Steps

If you want to create a GitHub App for your account, open the following link to the creation page, enter any unique name in the "GitHub App name" field, and hit the "Create GitHub App" button at the bottom of the page.

If you want to create a GitHub App for your organization, replace the :org part of the following URL with your organization name before opening it. Then enter any unique name in the "GitHub App name" field, and hit the "Create GitHub App" button at the bottom of the page to create a GitHub App.

You will see an App ID on the page of the GitHub App you created as follows, the value of this App ID will be used later.

App ID

Download the private key file by pushing the "Generate a private key" button at the bottom of the GitHub App page. This file will also be used later.

Generate a private key

Go to the "Install App" tab on the left side of the page and install the GitHub App that you created for your account or organization.

Install App

When the installation is complete, you will be taken to a URL in one of the following formats, the last number of the URL will be used as the Installation ID later (For example, if the URL ends in settings/installations/12345, then the Installation ID is 12345).

  • https://github.com/settings/installations/${INSTALLATION_ID}
  • https://github.com/organizations/eventreactor/settings/installations/${INSTALLATION_ID}

Finally, register the App ID (APP_ID), Installation ID (INSTALLATION_ID), and the downloaded private key file (PRIVATE_KEY_FILE_PATH) to Kubernetes as a secret.

Kubectl Deployment:

$ kubectl create secret generic controller-manager \
    -n actions-runner-system \
    --from-literal=github_app_id=${APP_ID} \
    --from-literal=github_app_installation_id=${INSTALLATION_ID} \
    --from-file=github_app_private_key=${PRIVATE_KEY_FILE_PATH}

Helm Deployment:

Configure your values.yaml, see the chart's README for deploying the secret via Helm

Deploying Using PAT Authentication

Personal Access Tokens can be used to register a self-hosted runner by actions-runner-controller.

Log-in to a GitHub account that has admin privileges for the repository, and create a personal access token with the appropriate scopes listed below:

Required Scopes for Repository Runners

  • repo (Full control)

Required Scopes for Organization Runners

  • repo (Full control)
  • admin:org (Full control)
  • admin:public_key (read:public_key)
  • admin:repo_hook (read:repo_hook)
  • admin:org_hook (Full control)
  • notifications (Full control)
  • workflow (Full control)

Required Scopes for Enterprise Runners

  • admin:enterprise (manage_runners:enterprise)

Note: When you deploy enterprise runners they will get access to organizations, however, access to the repositories themselves is NOT allowed by default. Each GitHub organization must allow enterprise runner groups to be used in repositories as an initial one-time configuration step, this only needs to be done once after which it is permanent for that runner group.

Note: GitHub does not document exactly what permissions you get with each PAT scope beyond a vague description. The best documentation they provide on the topic can be found here if you wish to review. The docs target OAuth apps and so are incomplete and may not be 100% accurate.


Once you have created the appropriate token, deploy it as a secret to your Kubernetes cluster that you are going to deploy the solution on:

Kubectl Deployment:

kubectl create secret generic controller-manager \
    -n actions-runner-system \
    --from-literal=github_token=${GITHUB_TOKEN}

Helm Deployment:

Configure your values.yaml, see the chart's README for deploying the secret via Helm

Deploying Multiple Controllers

This feature requires controller version => v0.18.0

Note: Be aware when using this feature that CRDs are cluster-wide and so you should upgrade all of your controllers (and your CRDs) at the same time if you are doing an upgrade. Do not mix and match CRD versions with different controller versions. Doing so risks out of control scaling.

By default the controller will look for runners in all namespaces; the watch namespace feature allows you to restrict the controller to monitoring a single namespace. This then lets you deploy multiple controllers in a single cluster. You may want to do this either because you wish to scale beyond the API rate limit of a single PAT / GitHub App configuration, or because you wish to support multiple GitHub organizations with runners installed at the organization level in a single cluster.

This feature is configured via the controller's --watch-namespace flag. When a namespace is provided via this flag, the controller will only monitor runners in that namespace.

You can deploy multiple controllers either in a single shared namespace, or in a unique namespace per controller.

If you plan on installing all instances of the controller stack into a single namespace there are a few things you need to do for this to work.

  1. All resources per stack must have a unique name; in the case of Helm this can be done by giving each install a unique release name, or via the fullnameOverride properties (see the values sketch further below).
  2. authSecret.name needs to be unique per stack when each stack is tied to runners in different GitHub organizations and repositories AND you want your GitHub credentials to be narrowly scoped.
  3. leaderElectionId needs to be unique per stack. If it is not unique, the controllers race for the same leader election lock, resulting in only one stack working concurrently. Your controller will be stuck with a log message something like this: attempting to acquire leader lease arc-controllers/actions-runner-controller...
  4. The MutatingWebhookConfiguration in each stack must include a namespace selector for that stack's corresponding runner namespace; this is already configured in the helm chart.

Alternatively, you can install each controller stack into a unique namespace (relative to other controller stacks in the cluster). Implementing ARC this way avoids the first, second and third pitfalls (you still need to set the corresponding namespace selector for each stack's mutating webhook).
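As an illustration of points 1 to 3 above, a hedged values.yaml sketch for one stack might look like the below. The exact keys are defined by the Helm chart, so confirm them against the chart's README; the watched namespace itself is set via the controller's --watch-namespace flag described earlier.

# values-org-a.yaml (sketch; confirm key names against the chart's README)
fullnameOverride: arc-org-a       # unique resource names per stack
authSecret:
  name: arc-org-a-github-auth     # unique, narrowly scoped GitHub credential per stack
leaderElectionId: arc-org-a       # unique leader election lock per stack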

Usage

GitHub self-hosted runners can be deployed at various levels in a management hierarchy:

  • The repository level
  • The organization level
  • The enterprise level

There are two ways to use this controller:

  • Manage runners one by one with Runner.
  • Manage a set of runners with RunnerDeployment.

Repository Runners

To launch a single self-hosted runner, you need to create a manifest file that includes a Runner resource as follows. This example launches a self-hosted runner with name example-runner for the actions-runner-controller/actions-runner-controller repository.

# runner.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: example-runner
spec:
  repository: example/myrepo
  env: []

Apply the created manifest file to your Kubernetes.

$ kubectl apply -f runner.yaml
runner.actions.summerwind.dev/example-runner created

You can see that the Runner resource has been created.

$ kubectl get runners
NAME             REPOSITORY                                             STATUS
example-runner   actions-runner-controller/actions-runner-controller   Running

You can also see that the runner pod is running.

$ kubectl get pods
NAME             READY   STATUS    RESTARTS   AGE
example-runner   2/2     Running   0          1m

The runner you created has been registered to your repository.

Actions tab in your repository settings

Now you can use your self-hosted runner. See the official documentation on how to run a job with it.

Organization Runners

To add the runner to an organization, you only need to replace the repository field with organization, so the runner will register itself to the organization.

# runner.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: example-org-runner
spec:
  organization: your-organization-name

Now you can see the runner on the organization level (if you have organization owner permissions).

Enterprise Runners

To add the runner to an enterprise, you only need to replace the repository field with enterprise, so the runner will register itself to the enterprise.

# runner.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: example-enterprise-runner
spec:
  enterprise: your-enterprise-name

Now you can see the runner on the enterprise level (if you have enterprise access permissions).

RunnerDeployments

You can manage sets of runners, instead of managing them one by one, through the RunnerDeployment kind and its replicas: attribute. This kind is required for many of the advanced features.

There are RunnerReplicaSet and RunnerDeployment kinds that correspond to the ReplicaSet and Deployment kinds, but for the Runner kind.

You typically only need RunnerDeployment rather than RunnerReplicaSet, as the former is for managing the latter.

# runnerdeployment.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeploy
spec:
  replicas: 2
  template:
    spec:
      repository: mumoshu/actions-runner-controller-ci
      env: []

Apply the manifest file to your cluster:

$ kubectl apply -f runnerdeployment.yaml
runnerdeployment.actions.summerwind.dev/example-runnerdeploy created

You can see that 2 runners have been created as specified by replicas: 2:

$ kubectl get runners
NAME                             REPOSITORY                             STATUS
example-runnerdeploy2475h595fr   mumoshu/actions-runner-controller-ci   Running
example-runnerdeploy2475ht2qbr   mumoshu/actions-runner-controller-ci   Running

RunnerSets

This feature requires controller version => v0.20.0

Ensure you see the limitations before using this kind!!!!!

For scenarios where you require the advantages of a StatefulSet, for example persistent storage, ARC implements a runner based on Kubernetes' StatefulSets, the RunnerSet.

A basic RunnerSet would look like this:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
  name: example
spec:
  ephemeral: false
  replicas: 2
  repository: mumoshu/actions-runner-controller-ci
  # Other mandatory fields from StatefulSet
  selector:
    matchLabels:
      app: example
  serviceName: example
  template:
    metadata:
      labels:
        app: example

As it is based on StatefulSet, selector and template.metadata.labels need to be defined and have the exact same set of labels. serviceName must be set to some non-empty string as it is also required by StatefulSet.

Runner-related fields like ephemeral, repository, organization, enterprise, and so on should be written directly under spec.

Fields like volumeClaimTemplates that originate from StatefulSet should also be written directly under spec.

Pod-related fields like security contexts and volumes are written under spec.template.spec, like StatefulSet.

Similarly, container-related fields like resource requests and limits, container image names and tags, security context, and so on are written under spec.template.spec.containers. There are two reserved container names, runner and docker. The former is for the container that runs the actions runner and the latter is for the container that runs dockerd.

For a more complex example, see the below:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
  name: example
spec:
  ephemeral: false
  replicas: 2
  repository: mumoshu/actions-runner-controller-ci
  dockerdWithinRunnerContainer: true
  template:
    spec:
      securityContext:
        # All level/role/type/user values will vary based on your SELinux policies.
        # See https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html/container_security_guide/docker_selinux_security_policy for information about SELinux with containers
        seLinuxOptions:
          level: "s0"
          role: "system_r"
          type: "super_t"
          user: "system_u"
      containers:
      - name: runner
        env: []
        resources:
          limits:
            cpu: "4.0"
            memory: "8Gi"
          requests:
            cpu: "2.0"
            memory: "4Gi"
        # This is an advanced configuration. Don't touch it unless you know what you're doing.
        securityContext:
          # Usually, the runner container's privileged field is derived from dockerdWithinRunnerContainer.
          # But in the case where you need to run privileged job steps even if you don't use docker/don't need dockerd within the runner container,
          # just specify `privileged: true` like this.
          # See https://github.com/actions-runner-controller/actions-runner-controller/issues/1282
          # Do note that specifying `privileged: false` while using dind is very likely to fail, even if you use some vm-based container runtimes
          # like firecracker and kata. Basically they run containers within dedicated micro vms and so
          # it's more like you can use `privileged: true` safer with those runtimes.
          #
          # privileged: true
      - name: docker
        resources:
          limits:
            cpu: "4.0"
            memory: "8Gi"
          requests:
            cpu: "2.0"
            memory: "4Gi"

You can also read the design and usage documentation written in the original pull request that introduced RunnerSet for more information: #629.

Under the hood, RunnerSet relies on Kubernetes' StatefulSet and Mutating Webhook. A statefulset is used to create a number of pods that have stable names and dynamically provisioned persistent volumes, so that each statefulset-managed pod gets the same persistent volume even after restarting. A mutating webhook is used to dynamically inject a runner's "registration token" which is used to call GitHub's "Create Runner" API.

Limitations

  • For autoscaling the RunnerSet kind only supports pull driven scaling or the workflow_job event for webhook driven scaling.

Persistent Runners

Every runner managed by ARC is "ephemeral" by default. The life of an ephemeral runner managed by ARC looks like this: ARC creates a runner pod for the runner. As it's an ephemeral runner, the --ephemeral flag is passed to the actions/runner agent that runs within the runner container of the runner pod.

--ephemeral is an actions/runner feature that instructs the runner to stop and de-register itself after the first job run.

Once the ephemeral runner has completed running a workflow job, it stops with a status code of 0, hence the runner pod is marked as completed and removed by ARC.

As it's removed after a workflow job run, the runner pod is never reused across multiple GitHub Actions workflow jobs, providing you a clean environment per workflow job.

Although not generally recommended, it's possible to disable the passing of the --ephemeral flag by explicitly setting ephemeral: false in the RunnerDeployment or RunnerSet spec. When disabled, your runner becomes "persistent". A persistent runner does not stop after a workflow job ends, and in this mode actions/runner is known to clean only the runner's work dir after each job. Whilst this can seem helpful, it creates a non-deterministic environment which is not ideal for a CI/CD environment. Between runs, your actions cache, docker images stored in the dind and layer cache, globally installed packages etc. are retained across multiple workflow job runs, which can cause issues that are hard to debug and inconsistent.

Persistent runners are available as an option for some edge cases however they are not preferred as they can create challenges around providing a deterministic and secure environment.
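For illustration, a minimal sketch of a persistent runner, reusing the hypothetical example/myrepo repository used elsewhere in this document, simply sets ephemeral: false on the runner spec:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-persistent-runnerdeploy
spec:
  replicas: 1
  template:
    spec:
      repository: example/myrepo
      ephemeral: false   # opt out of the default --ephemeral behaviour; see the caveats above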

Autoscaling

Since the release of GitHub's workflow_job webhook, webhook driven scaling is the preferred way of autoscaling as it enables targeted scaling of your RunnerDeployment / RunnerSet: the event includes the runs-on information needed to scale the appropriate runners for that workflow run. More broadly, webhook driven scaling is the preferred scaling option as it is far quicker compared to pull driven scaling and is easy to set up.

If you are using controller version <v0.22.0 and you are not using GHES, and so can't set your rate limit budget, it is recommended that you use 100 replicas or fewer to prevent being rate limited.

A RunnerDeployment or RunnerSet can scale the number of runners between the minReplicas and maxReplicas fields driven either by pull based scaling metrics or by a webhook event (see the limitations section of RunnerSets for caveats of this kind). Whether the autoscaling is driven from a webhook event or pull based metrics, it is implemented by backing a RunnerDeployment or RunnerSet kind with a HorizontalRunnerAutoscaler kind.

Important!!! If you opt to configure autoscaling, ensure you remove the replicas: attribute in the RunnerDeployment / RunnerSet kinds that are configured for autoscaling (#206)

Anti-Flapping Configuration

For both pull driven and webhook driven scaling an anti-flapping implementation is included; by default a runner won't be scaled down within 10 minutes of it having been scaled up.

This anti-flap configuration also has the final say on if a runner can be scaled down or not regardless of the chosen scaling method.

This delay is configurable via 2 methods:

  1. By setting a new default via the controller's --default-scale-down-delay flag
  2. By setting the attribute scaleDownDelaySecondsAfterScaleOut: in a HorizontalRunnerAutoscaler kind's spec:.

Below is a complete basic example of one of the pull driven scaling metrics.

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runner-deployment
spec:
  template:
    spec:
      repository: example/myrepo
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runner-deployment-autoscaler
spec:
  # Runners in the targeted RunnerDeployment won't be scaled down
  # for 5 minutes instead of the default 10 minutes now
  scaleDownDelaySecondsAfterScaleOut: 300
  scaleTargetRef:
    name: example-runner-deployment
    # Uncomment the below in case the target is not RunnerDeployment but RunnerSet
    #kind: RunnerSet
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: PercentageRunnersBusy
    scaleUpThreshold: '0.75'
    scaleDownThreshold: '0.25'
    scaleUpFactor: '2'
    scaleDownFactor: '0.5'

Pull Driven Scaling

To configure webhook driven scaling see the Webhook Driven Scaling section

The pull based metrics are configured in the metrics attribute of a HRA (see snippet below). The period between polls is defined by the controller's --sync-period flag. If this flag isn't provided then the controller defaults to a sync period of 1m; this can be configured in seconds or minutes.

Be aware that the shorter the sync period, the quicker you will consume your rate limit budget; depending on your environment this may or may not be a risk. Consider monitoring ARC's rate limit budget when configuring this feature to find the optimal sync period.
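As a sketch, and assuming the Helm chart exposes the sync period as a syncPeriod value (check the chart's README for the exact key), a longer polling interval could be configured like this:

helm upgrade --install --namespace actions-runner-system --create-namespace \
  --wait actions-runner-controller actions-runner-controller/actions-runner-controller \
  --set syncPeriod=10m   # sketch: poll GitHub every 10 minutes to conserve the API rate limit budget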

apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runner-deployment-autoscaler
spec:
  scaleTargetRef:
    # Your RunnerDeployment Here
    name: example-runner-deployment
    # Uncomment the below in case the target is not RunnerDeployment but RunnerSet
    #kind: RunnerSet
  minReplicas: 1
  maxReplicas: 5
  # Your chosen scaling metrics here
  metrics: []

Metric Options:

TotalNumberOfQueuedAndInProgressWorkflowRuns

The TotalNumberOfQueuedAndInProgressWorkflowRuns metric polls GitHub for all pending workflow runs against a given set of repositories. The metric will scale the runner count up to the total number of pending jobs at the sync time, up to the maxReplicas configuration.

Benefits of this metric

  1. Supports named repositories allowing you to restrict the runner to a specified set of repositories server-side.
  2. Scales the runner count based on the depth of the job queue meaning a 1:1 scaling of runners to queued jobs.
  3. Like all scaling metrics, you can manage workflow allocation to the RunnerDeployment through the use of GitHub labels.

Drawbacks of this metric

  1. A list of repositories must be included within the scaling metric. Maintaining a list of repositories may not be viable in larger environments or self-serve environments.
  2. May not scale quickly enough for some users' needs. This metric is pull based and so the queue depth is polled as configured by the sync period; as a result, scaling performance is bound by this sync period, meaning there is a lag to scaling activity.
  3. Relatively large amounts of API requests are required to maintain this metric; you may run into API rate limit issues depending on the size of your environment and how aggressive your sync period configuration is.

Example RunnerDeployment backed by a HorizontalRunnerAutoscaler:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runner-deployment
spec:
  template:
    spec:
      repository: example/myrepo
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runner-deployment-autoscaler
spec:
  scaleTargetRef:
    name: example-runner-deployment
    # IMPORTANT : If your HRA is targeting a RunnerSet you must specify the kind in the scaleTargetRef:, uncomment the below
    #kind: RunnerSet
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
    repositoryNames:
    - example/myrepo

PercentageRunnersBusy

The HorizontalRunnerAutoscaler will poll GitHub for the number of runners in the busy state which live in the RunnerDeployment's namespace, it will then scale depending on how you have configured the scale factors.

Benefits of this metric

  1. Supports named repositories server-side the same as the TotalNumberOfQueuedAndInProgressWorkflowRuns metric (#313)
  2. Supports GitHub organization wide scaling without maintaining an explicit list of repositories, this is especially useful for those that are working at a larger scale (#223)
  3. Like all scaling metrics, you can manage workflow allocation to the RunnerDeployment through the use of GitHub labels
  4. Supports scaling desired runner count on both a percentage increase / decrease basis as well as on a fixed increase / decrease count basis (#223, #315)

Drawbacks of this metric

  1. May not scale quickly enough for some users' needs. This metric is pull based and so the number of busy runners is polled as configured by the sync period; as a result, scaling performance is bound by this sync period, meaning there is a lag to scaling activity.
  2. We are scaling up and down based on indicative information rather than a count of the actual number of queued jobs, so the desired runner count is likely to under-provision or over-provision runners relative to the actual job queue depth; this may or may not be a problem for you.

Examples of each scaling type implemented with a RunnerDeployment backed by a HorizontalRunnerAutoscaler:

---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runner-deployment-autoscaler
spec:
  scaleTargetRef:
    name: example-runner-deployment
    # Uncomment the below in case the target is not RunnerDeployment but RunnerSet
    #kind: RunnerSet
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: PercentageRunnersBusy
    scaleUpThreshold: '0.75'    # The percentage of busy runners at which the number of desired runners are re-evaluated to scale up
    scaleDownThreshold: '0.3'   # The percentage of busy runners at which the number of desired runners are re-evaluated to scale down
    scaleUpFactor: '1.4'        # The scale up multiplier factor applied to desired count
    scaleDownFactor: '0.7'      # The scale down multiplier factor applied to desired count
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runner-deployment-autoscaler
spec:
  scaleTargetRef:
    name: example-runner-deployment
    # Uncomment the below in case the target is not RunnerDeployment but RunnerSet
    #kind: RunnerSet
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: PercentageRunnersBusy
    scaleUpThreshold: '0.75'    # The percentage of busy runners at which the number of desired runners are re-evaluated to scale up
    scaleDownThreshold: '0.3'   # The percentage of busy runners at which the number of desired runners are re-evaluated to scale down
    scaleUpAdjustment: 2        # The scale up runner count added to desired count
    scaleDownAdjustment: 1      # The scale down runner count subtracted from the desired count

Webhook Driven Scaling

To configure pull driven scaling see the Pull Driven Scaling section

Webhooks are processed by a separate webhook server. The webhook server receives GitHub Webhook events and scales RunnerDeployments by updating corresponding HorizontalRunnerAutoscalers.

Today, the webhook server can be configured to respond to GitHub's check_run, workflow_job, pull_request, and push events by scaling up the matching HorizontalRunnerAutoscaler by N replica(s), where N is configurable within the HorizontalRunnerAutoscaler's spec:.

More concretely, you can configure the targeted GitHub event types and the N in scaleUpTriggers:

kind: HorizontalRunnerAutoscaler
spec:
  scaleTargetRef:
    name: example-runners
    # Uncomment the below in case the target is not RunnerDeployment but RunnerSet
    #kind: RunnerSet
  scaleUpTriggers:
  - githubEvent:
      checkRun:
        types: ["created"]
        status: "queued"
    amount: 1
    duration: "5m"

With the above example, the webhook server scales example-runners by 1 replica for 5 minutes on each check_run event with the type of created and the status of queued received.

Of note is the HRA.spec.scaleUpTriggers[].duration attribute. This attribute is used to calculate if the replica number added via the trigger is expired or not. On each reconciliation loop, the controller sums up all the non-expiring replica numbers from previous scale-up triggers. It then compares the summed desired replica number against the current replica number. If the summed desired replica number > the current number then it means the replica count needs to scale up.

As mentioned previously, the scaleDownDelaySecondsAfterScaleOut property still has the final say. If the latest scale-up time + the anti-flapping duration is later than the current time, it doesn't immediately scale up and instead retries the calculation again later to see if it needs to scale yet.


The primary benefit of autoscaling on Webhooks compared to the pull driven scaling is that it is far quicker as it allows you to immediately add runner resources rather than waiting for the next sync period.

You can learn the implementation details in #282

To enable this feature, you first need to install the GitHub webhook server. To install via our Helm chart, see the values documentation for all configuration options

$ helm upgrade --install --namespace actions-runner-system --create-namespace \
    --wait actions-runner-controller actions-runner-controller/actions-runner-controller \
    --set "githubWebhookServer.enabled=true,service.type=NodePort,githubWebhookServer.ports[0].nodePort=33080"

The above command will result in exposing the node port 33080 for Webhook events. Usually, you need to create an external load balancer targeted to the node port, and register the hostname or the IP address of the external load balancer to the GitHub Webhook.

With a custom Kubernetes ingress controller:

CAUTION: The Kubernetes ingress controllers described below are just a suggestion from the community and the ARC team will not provide any user support for ingress controllers as they are not a part of this project.

The following guide on creating an ingress has been contributed by the awesome ARC community and is provided here as-is. You may, however, still be able to ask for help from the community on GitHub Discussions if you have any problems.

Kubernetes provides Ingress resources to let you configure your ingress controller to expose a Kubernetes service. If you plan to expose ARC via Ingress, you might not be required to make it a NodePort service (although nothing would prevent an ingress controller from exposing NodePort services too):

$helm upgrade --install --namespace actions-runner-system --create-namespace \             --wait actions-runner-controller actions-runner-controller/actions-runner-controller \             --set "githubWebhookServer.enabled=true"

The command above will create a new deployment and a service for receiving GitHub Webhooks in the actions-runner-system namespace.

Now we need to expose this service so that GitHub can send these webhooks over the network with TLS protection.

You can do it in any way you prefer; here we'll suggest doing it with a k8s Ingress. For the sake of this example we'll expose this service on the following URL:

https://your.domain.com/actions-runner-controller-github-webhook-server

Where your.domain.com should be replaced by your own domain.

Note: This step assumes you already have a configured cert-manager and domain name for your cluster.

Let's start by creating an Ingress file called arc-webhook-server.yaml with the following contents:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: actions-runner-controller-github-webhook-server
  namespace: actions-runner-system
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
spec:
  tls:
  - hosts:
    - your.domain.com
    secretName: your-tls-secret-name
  rules:
  - http:
      paths:
      - path: /actions-runner-controller-github-webhook-server
        pathType: Prefix
        backend:
          service:
            name: actions-runner-controller-github-webhook-server
            port:
              number: 80

Make sure to set spec.tls.secretName to the name of your TLS secret and spec.tls.hosts[0] to your own domain.

Then create this resource on your cluster with the following command:

kubectl apply -n actions-runner-system -f arc-webhook-server.yaml

Configuring GitHub for sending webhooks for our newly created webhook server:

After this step your webhook server should be ready to start receiving webhooks from GitHub.

To configure GitHub to start sending you webhooks, go to the settings page of your repository or organization, then click on Webhooks, then on Add webhook.

There set the "Payload URL" field with the webhook URL you just created,if you followed the example ingress above the URL would be something like this:

Remember to replaceyour.domain.com with your own domain.

Then click on "let me select individual events" and chooseWorkflow Jobs.

You may also want to choose the following event(s) if you use them as a scale trigger in your HRA spec:

  • Check runs
  • Pushes
  • Pull Requests

Later you can remove any of these you are not using to reduce the amount of data sent to your server.

Then click onAdd Webhook.

GitHub will then send a ping event to your webhook server to check if it is working; if it is, you'll see a green check mark alongside your webhook on the Settings -> Webhooks page.

Once you have confirmed from GitHub that the webhook server is ready and running, create or update your HorizontalRunnerAutoscaler resources following the configuration examples below.

Example 1: Scale on each workflow_job event

This feature requires controller version => v0.20.0

Note: GitHub does not include the runner group information of a repository in the payload of the initial queued workflow_job event. The runner group information is only included for workflow_job events when the job has already been allocated to a runner (events with a status of in_progress or completed). Please do raise feature requests against GitHub for this information to be included in the initial queued event if this would improve autoscaling runners for you.

The most flexible webhook GitHub offers is the workflow_job webhook; it includes the runs-on information in the payload, allowing scaling based on runner labels.

This webhook should cover most people's needs; please experiment with this webhook first before considering the others.

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runners
spec:
  template:
    spec:
      repository: example/myrepo
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runners
spec:
  scaleDownDelaySecondsAfterScaleOut: 300
  minReplicas: 1
  maxReplicas: 10
  scaleTargetRef:
    name: example-runners
    # Uncomment the below in case the target is not RunnerDeployment but RunnerSet
    #kind: RunnerSet
  scaleUpTriggers:
  - githubEvent:
      workflowJob: {}
    duration: "30m"

This webhook requires you to explicitly set the labels in the RunnerDeployment / RunnerSet if you are using them in your workflow to match the agents (field runs-on). Only self-hosted is considered as included by default.

You can configure your GitHub webhook settings to only include Workflow Job events, so that it sends us three kinds of workflow_job events per job run.

Each kind has a status of queued, in_progress or completed. With the above configuration, actions-runner-controller adds one runner for a workflow_job event whose status is queued. Similarly, it removes one runner for a workflow_job event whose status is completed. The caveat to remember is that this scale-down happens within the bounds of your scaleDownDelaySecondsAfterScaleOut configuration; if that time hasn't passed, the scale down will be deferred.

Example 2: Scale up on each check_run event

Note: This should work almost like https://github.com/philips-labs/terraform-aws-github-runner

To scale up replicas of the runners for example/myrepo by 1 for 5 minutes on each check_run, you write manifests like the below:

kind: RunnerDeployment
metadata:
  name: example-runners
spec:
  template:
    spec:
      repository: example/myrepo
---
kind: HorizontalRunnerAutoscaler
spec:
  minReplicas: 1
  maxReplicas: 10
  scaleTargetRef:
    name: example-runners
    # Uncomment the below in case the target is not RunnerDeployment but RunnerSet
    #kind: RunnerSet
  scaleUpTriggers:
  - githubEvent:
      checkRun:
        types: ["created"]
        status: "queued"
    amount: 1
    duration: "5m"

To scale up replicas of the runners for the myorg organization by 1 for 5 minutes on each check_run, you write manifests like the below:

kind: RunnerDeployment
metadata:
  name: example-runners
spec:
  template:
    spec:
      organization: myorg
---
kind: HorizontalRunnerAutoscaler
spec:
  minReplicas: 1
  maxReplicas: 10
  scaleTargetRef:
    name: example-runners
    # Uncomment the below in case the target is not RunnerDeployment but RunnerSet
    #kind: RunnerSet
  scaleUpTriggers:
  - githubEvent:
      checkRun:
        types: ["created"]
        status: "queued"
        # Optionally restrict autoscaling to being triggered by events from specific repositories within your organization still
        # repositories: ["myrepo", "myanotherrepo"]
    amount: 1
    duration: "5m"
Example 3: Scale on each pull_request event against a given set of branches

To scale up replicas of the runners for example/myrepo by 1 for 5 minutes on each pull_request against the main or develop branch, you write manifests like the below:

kind: RunnerDeployment
metadata:
  name: example-runners
spec:
  template:
    spec:
      repository: example/myrepo
---
kind: HorizontalRunnerAutoscaler
spec:
  minReplicas: 1
  maxReplicas: 10
  scaleTargetRef:
    name: example-runners
    # Uncomment the below in case the target is not RunnerDeployment but RunnerSet
    #kind: RunnerSet
  scaleUpTriggers:
  - githubEvent:
      pullRequest:
        types: ["synchronize"]
        branches: ["main", "develop"]
    amount: 1
    duration: "5m"

See"activity types" for the list of valid values forscaleUpTriggers[].githubEvent.pullRequest.types.

Example 4: Scale on each push event

To scale up replicas of the runners for example/myrepo by 1 for 5 minutes on each push, write manifests like the below:

kind: RunnerDeployment
metadata:
  name: example-runners
spec:
  template:
    spec:
      repository: example/myrepo
---
kind: HorizontalRunnerAutoscaler
spec:
  minReplicas: 1
  maxReplicas: 10
  scaleTargetRef:
    name: example-runners
    # Uncomment the below in case the target is not RunnerDeployment but RunnerSet
    #kind: RunnerSet
  scaleUpTriggers:
  - githubEvent:
      push:
    amount: 1
    duration: "5m"

Autoscaling to/from 0

This feature requires controller version => v0.19.0

The regular RunnerDeployment / RunnerSet replicas: attribute as well as the HorizontalRunnerAutoscaler minReplicas: attribute support being set to 0.

The main use case for scaling from 0 is with the HorizontalRunnerAutoscaler kind. To scale from 0 whilst still being able to provision runners as jobs are queued, we must use the HorizontalRunnerAutoscaler with only certain scaling configurations; only the below configurations support scaling from 0 whilst also being able to provision runners as jobs are queued:

  • TotalNumberOfQueuedAndInProgressWorkflowRuns
  • PercentageRunnersBusy + TotalNumberOfQueuedAndInProgressWorkflowRuns
  • PercentageRunnersBusy + Webhook-based autoscaling
  • Webhook-based autoscaling only

PercentageRunnersBusy can't be used alone as, by its definition, it needs one or more GitHub runners to become busy to be able to scale. If there isn't a runner to pick up a job and enter a busy state then the controller will never know to provision a runner to begin with, as this metric has no knowledge of the job queue and relies on the number of busy runners as a means for calculating the desired replica count.

If a HorizontalRunnerAutoscaler is configured with a secondary metric of TotalNumberOfQueuedAndInProgressWorkflowRuns then be aware that the controller will check the primary metric of PercentageRunnersBusy first and will only use the secondary metric to calculate the desired replica count if the primary metric returns 0 desired replicas.

Webhook-based autoscaling is the best option as it is relatively easy to configure and also it can scale quickly.
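Putting this together, a minimal sketch of scaling from zero with webhook driven scaling, reusing the example-runners names from the examples above, could look like the below:

apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runners
spec:
  scaleTargetRef:
    name: example-runners
    # Uncomment the below in case the target is not RunnerDeployment but RunnerSet
    #kind: RunnerSet
  minReplicas: 0        # no idle runners while nothing is queued
  maxReplicas: 10
  scaleUpTriggers:
  - githubEvent:
      workflowJob: {}
    duration: "30m"     # each queued workflow_job adds a runner for up to 30 minutes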

Scheduled Overrides

This feature requires controller version => v0.19.0

Scheduled Overrides allows you to configure a HorizontalRunnerAutoscaler so that its spec: gets updated only during a certain period of time. This feature is usually used for the following scenarios:

  • You want to reduce your infrastructure costs by scaling your Kubernetes nodes down outside a given period
  • You want to scale for scheduled spikes in workloads

The most basic usage of this feature is to set a non-repeating override:

apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runner-deployment-autoscaler
spec:
  scaleTargetRef:
    name: example-runner-deployment
    # Uncomment the below in case the target is not RunnerDeployment but RunnerSet
    #kind: RunnerSet
  scheduledOverrides:
  # Override minReplicas to 100 only between 2021-06-01T00:00:00+09:00 and 2021-06-03T00:00:00+09:00
  - startTime: "2021-06-01T00:00:00+09:00"
    endTime: "2021-06-03T00:00:00+09:00"
    minReplicas: 100
  minReplicas: 1

A scheduled override without recurrenceRule is considered a one-off override that is active between startTime and endTime. In the second scenario, it overrides minReplicas to 100 only between 2021-06-01T00:00:00+09:00 and 2021-06-03T00:00:00+09:00.

A more advanced configuration is to include a recurrenceRule in the override:

apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runner-deployment-autoscaler
spec:
  scaleTargetRef:
    name: example-runner-deployment
    # Uncomment the below in case the target is not RunnerDeployment but RunnerSet
    #kind: RunnerSet
  scheduledOverrides:
  # Override minReplicas to 0 only between 0am sat to 0am mon
  - startTime: "2021-05-01T00:00:00+09:00"
    endTime: "2021-05-03T00:00:00+09:00"
    recurrenceRule:
      frequency: Weekly
      # Optional sunset datetime attribute
      # untilTime: "2022-05-01T00:00:00+09:00"
    minReplicas: 0
  minReplicas: 1

A recurring override is initially active between startTime and endTime, and then it repeatedly gets activated after a certain period of time denoted by frequency.

frequency can take one of the following values:

  • Daily
  • Weekly
  • Monthly
  • Yearly

By default, a scheduled override repeats forever. If you want it to repeat until a specific point in time, define untilTime. The controller creates the last recurrence of the override until the recurrence's startTime is equal to or earlier than untilTime.

Do ensure that you have enough slack for untilTime so that a delayed or offline actions-runner-controller is much less likely to miss the last recurrence. For example, you might want to set untilTime to M minutes after the last recurrence's startTime, so that actions-runner-controller being offline for up to M minutes doesn't miss the last recurrence.

Combining Multiple Scheduled Overrides:

In case you have a more complex scenario, try writing two or more entries under scheduledOverrides.

The earlier entry is prioritized higher than later entries. So you usually define one-time overrides at the top of your list, then yearly, monthly, weekly, and lastly daily overrides.

A common use case for this may be to have 1 override to scale to 0 during the week outside of core business hours and another override to scale to 0 during all hours of the weekend.
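As an illustrative sketch of that use case (the timestamps are placeholders you would replace with your own), the overrides could be combined like this, with the weekly weekend entry listed before the daily entry so it takes priority:

apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runner-deployment-autoscaler
spec:
  scaleTargetRef:
    name: example-runner-deployment
  scheduledOverrides:
  # Scale to 0 from Saturday 00:00 to Monday 00:00, repeating weekly
  - startTime: "2021-05-01T00:00:00+09:00"
    endTime: "2021-05-03T00:00:00+09:00"
    recurrenceRule:
      frequency: Weekly
    minReplicas: 0
  # Scale to 0 outside core business hours (18:00 to 09:00 the next day), repeating daily
  - startTime: "2021-05-03T18:00:00+09:00"
    endTime: "2021-05-04T09:00:00+09:00"
    recurrenceRule:
      frequency: Daily
    minReplicas: 0
  minReplicas: 5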

Runner with DinD

When using the default runner, the runner pod starts up 2 containers: runner and DinD (Docker-in-Docker). This might create issues if there's a LimitRange set on the namespace.

# dindrunnerdeployment.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-dindrunnerdeploy
spec:
  replicas: 2
  template:
    spec:
      image: summerwind/actions-runner-dind
      dockerdWithinRunnerContainer: true
      repository: mumoshu/actions-runner-controller-ci
      env: []

This also helps with resources, as you don't need to give resources separately to docker and runner.

Additional Tweaks

You can pass details through the spec selector. Here's an example of what you may like to do:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: actions-runner
  namespace: default
spec:
  replicas: 2
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      nodeSelector:
        node-role.kubernetes.io/test: ""

      securityContext:
        # All level/role/type/user values will vary based on your SELinux policies.
        # See https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html/container_security_guide/docker_selinux_security_policy for information about SELinux with containers
        seLinuxOptions:
          level: "s0"
          role: "system_r"
          type: "super_t"
          user: "system_u"

      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/test
        operator: Exists
      # Timeout after a node crashed or became unreachable to evict your pods somewhere else (default 5mins)
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 10

      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            runner-deployment-name: actions-runner

      repository: mumoshu/actions-runner-controller-ci

      # The default "summerwind/actions-runner" images are available at DockerHub:
      # https://hub.docker.com/r/summerwind/actions-runner
      # You can also build your own and specify it like the below:
      image: custom-image/actions-runner:latest

      imagePullPolicy: Always

      resources:
        limits:
          cpu: "4.0"
          memory: "8Gi"
        requests:
          cpu: "2.0"
          memory: "4Gi"

      # true (default) = The runner restarts after running jobs, to ensure a clean and reproducible build environment
      # false = The runner is persistent across jobs and doesn't automatically restart
      # This directly controls the behaviour of `--once` flag provided to the github runner
      ephemeral: false

      # true (default) = A privileged docker sidecar container is included in the runner pod.
      # false = A docker sidecar container is not included in the runner pod and you can't use docker.
      # If set to false, there are no privileged container and you cannot use docker.
      dockerEnabled: false

      # Optional Docker containers network MTU
      # If your network card MTU is smaller than Docker's default 1500, you might encounter Docker networking issues.
      # To fix these issues, you should setup Docker MTU smaller than or equal to that on the outgoing network card.
      # More information:
      # - https://mlohr.com/docker-mtu/
      dockerMTU: 1500

      # Optional Docker registry mirror
      # Docker Hub has an aggressive rate-limit configuration for free plans.
      # To avoid disruptions in your CI/CD pipelines, you might want to setup an external or on-premises Docker registry mirror.
      # More information:
      # - https://docs.docker.com/docker-hub/download-rate-limit/
      # - https://cloud.google.com/container-registry/docs/pulling-cached-images
      dockerRegistryMirror: https://mirror.gcr.io/

      # false (default) = Docker support is provided by a sidecar container deployed in the runner pod.
      # true = No docker sidecar container is deployed in the runner pod but docker can be used within the runner container instead. The image summerwind/actions-runner-dind is used by default.
      dockerdWithinRunnerContainer: true

      # Optional environment variables for docker container
      # Valid only when dockerdWithinRunnerContainer=false
      dockerEnv:
      - name: HTTP_PROXY
        value: http://example.com

      # Docker sidecar container image tweaks examples below, only applicable if dockerdWithinRunnerContainer = false
      dockerdContainerResources:
        limits:
          cpu: "4.0"
          memory: "8Gi"
        requests:
          cpu: "2.0"
          memory: "4Gi"

      # Additional N number of sidecar containers
      sidecarContainers:
      - name: mysql
        image: mysql:5.7
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: abcd1234
        securityContext:
          runAsUser: 0

      # workDir if not specified (default = /runner/_work)
      # You can customise this setting allowing you to change the default working directory location
      # for example, the below setting is the same as on the ubuntu-18.04 image
      workDir: /home/runner/work

      # You can mount some of the shared volumes to the dind container using dockerVolumeMounts, like any other volume mounting.
      # NOTE: in case you want to use an hostPath like the following example, make sure that Kubernetes doesn't schedule more than one runner
      # per physical host. You can achieve that by setting pod anti-affinity rules and/or resource requests/limits.
      volumes:
      - name: docker-extra
        hostPath:
          path: /mnt/docker-extra
          type: DirectoryOrCreate
      - name: repo
        hostPath:
          path: /mnt/repo
          type: DirectoryOrCreate
      dockerVolumeMounts:
      - mountPath: /var/lib/docker
        name: docker-extra

      # You can mount some of the shared volumes to the runner container using volumeMounts.
      # NOTE: Do not try to mount the volume onto the runner workdir itself as it will not work. You could mount it however on a subdirectory in the runner workdir
      # Please see https://github.com/actions-runner-controller/actions-runner-controller/issues/630#issuecomment-862087323 for more information.
      volumeMounts:
      - mountPath: /home/runner/work/repo
        name: repo

      # Optional storage medium type of runner volume mount.
      # More info: https://kubernetes.io/docs/concepts/storage/volumes/#emptydir
      # "" (default) = Node's default medium
      # Memory = RAM-backed filesystem (tmpfs)
      # NOTE: Using RAM-backed filesystem gives you fastest possible storage on your host nodes.
      volumeStorageMedium: ""

      # Total amount of local storage resources required for runner volume mount.
      # The default limit is undefined.
      # NOTE: You can make sure that nodes' resources are never exceeded by limiting used storage size per runner pod.
      # You can even disable the runner mount completely by setting limit to zero if dockerdWithinRunnerContainer = true.
      # Please see https://github.com/actions-runner-controller/actions-runner-controller/pull/674 for more information.
      volumeSizeLimit: 4Gi

      # Optional name of the container runtime configuration that should be used for pods.
      # This must match the name of a RuntimeClass resource available on the cluster.
      # More info: https://kubernetes.io/docs/concepts/containers/runtime-class
      runtimeClassName: "runc"

      # This is an advanced configuration. Don't touch it unless you know what you're doing.
      containers:
      - name: runner
        # Usually, the runner container's privileged field is derived from dockerdWithinRunnerContainer.
        # But in the case where you need to run privileged job steps even if you don't use docker/don't need dockerd within the runner container,
        # just specify `privileged: true` like this.
        # See https://github.com/actions-runner-controller/actions-runner-controller/issues/1282
        # Do note that specifying `privileged: false` while using dind is very likely to fail, even if you use some vm-based container runtimes
        # like firecracker and kata. Basically they run containers within dedicated micro vms and so
        # it's more like you can use `privileged: true` safer with those runtimes.
        #
        # privileged: true

Custom Volume mounts

You can configure your own custom volume mounts. For example, to have the work/docker data in memory or on NVME SSD, for I/O intensive builds. Other custom volume mounts should be possible as well, see the kubernetes documentation

RAM Disk

Example of how to place the runner work dir, docker sidecar and /tmp within the runner onto a ramdisk.

kind: RunnerDeployment
spec:
  template:
    spec:
      dockerVolumeMounts:
      - mountPath: /var/lib/docker
        name: docker
      volumeMounts:
      - mountPath: /tmp
        name: tmp
      volumes:
      - name: docker
        emptyDir:
          medium: Memory
      - name: work # this volume gets automatically used up for the workdir
        emptyDir:
          medium: Memory
      - name: tmp
        emptyDir:
          medium: Memory
      ephemeral: true # recommended to not leak data between builds.

NVME SSD

In this example we provide NVME-backed storage for the workdir, docker sidecar and /tmp within the runner. Here we use a working example on GKE, which will provide the NVME disk at /mnt/disks/ssd0. We will be placing the respective volumes in subdirectories there, and in order to be able to run multiple runners we will use the pod name as a prefix for the subdirectories. Also note that the disk will fill up over time and disk space will not be freed until the node is removed.

Beware that running these persistent backend volumes leaves data behind between two different jobs on the workdir and /tmp with ephemeral: false.

kind: RunnerDeployment
spec:
  template:
    spec:
      env:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
      dockerVolumeMounts:
      - mountPath: /var/lib/docker
        name: docker
        subPathExpr: $(POD_NAME)-docker
      - mountPath: /runner/_work
        name: work
        subPathExpr: $(POD_NAME)-work
      volumeMounts:
      - mountPath: /runner/_work
        name: work
        subPathExpr: $(POD_NAME)-work
      - mountPath: /tmp
        name: tmp
        subPathExpr: $(POD_NAME)-tmp
      dockerEnv:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
      volumes:
      - hostPath:
          path: /mnt/disks/ssd0
        name: docker
      - hostPath:
          path: /mnt/disks/ssd0
        name: work
      - hostPath:
          path: /mnt/disks/ssd0
        name: tmp
      ephemeral: true # VERY important. otherwise data inside the workdir and /tmp is not cleared between builds

Docker image layers caching

Note: Ensure that the volume mount is added to the container that is running the Docker daemon.

Docker stores pulled and built image layers in the daemon's (note: not the client's) local storage area, which is usually at /var/lib/docker.

By leveraging RunnerSet's dynamic PV provisioning feature and your CSI driver, you can let ARC maintain a pool of PVs that are reused across runner pods to retain /var/lib/docker.

Be sure to add the volume mount to the container that is supposed to run the docker daemon.

By default, ARC creates a sidecar container named docker within the runner pod for running the docker daemon. In that case, that's the container that needs the volume mount, so the manifest looks like:

kind: RunnerSet
metadata:
  name: example
spec:
  template:
    spec:
      containers:
        - name: docker
          volumeMounts:
            - name: var-lib-docker
              mountPath: /var/lib/docker
  volumeClaimTemplates:
    - metadata:
        name: var-lib-docker
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Mi
        storageClassName: var-lib-docker

With dockerdWithinRunnerContainer: true, you need to add the volume mount to the runner container instead.
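For completeness, here is a minimal sketch of that variant. It assumes the same var-lib-docker claim as above and that dockerdWithinRunnerContainer is set at the RunnerSet spec level; adjust the field placement to match your ARC version:

kind: RunnerSet
metadata:
  name: example
spec:
  dockerdWithinRunnerContainer: true
  template:
    spec:
      containers:
        # dockerd runs inside the runner container here, so the cache volume is mounted on it
        - name: runner
          volumeMounts:
            - name: var-lib-docker
              mountPath: /var/lib/docker
  volumeClaimTemplates:
    - metadata:
        name: var-lib-docker
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Mi
        storageClassName: var-lib-docker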

Go module and build caching

Go is known to cache builds under $HOME/.cache/go-build and downloaded modules under $GOPATH/pkg/mod ($HOME/go/pkg/mod by default). The module cache dir can be customized by setting GOMODCACHE, so by pointing it somewhere under $HOME/.cache we can have a single PV host both the build and module caches, which might improve Go module downloading and build times.

kind: RunnerSet
metadata:
  name: example
spec:
  template:
    spec:
      containers:
        - name: runner
          env:
            - name: GOMODCACHE
              value: "/home/runner/.cache/go-mod"
          volumeMounts:
            - name: cache
              mountPath: "/home/runner/.cache"
  volumeClaimTemplates:
    - metadata:
        name: cache
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Mi
        storageClassName: cache

PV-backed runner work directory

ARC works by automatically creating runner pods that run actions/runner and config.sh, which you would otherwise have to run manually without ARC.

config.sh is the script provided by actions/runner to pre-configure the runner process before it is started. One of the options provided by config.sh is --work, which specifies the working directory in which the runner runs your workflow jobs.

The volume and the partition that host the work directory should have several to dozens of gigabytes of free space, as it may be consumed by your workflow jobs.

By default, ARC uses /runner/_work as the work directory, which is backed by Kubernetes' emptyDir. emptyDir is usually backed by a directory created on a host volume, somewhere under /var/lib/kubelet/pods. Therefore the host volume backing /var/lib/kubelet/pods must have enough free space to serve all the concurrent runner pods that might be deployed onto your host at the same time.

So, in case you see a job failure seemingly due to "disk full", it's very likely you need to reconfigure your host to have more free space.

In case you can't rely on the host's volume, consider using RunnerSet and backing the work directory with an ephemeral PV.

Kubernetes 1.23 or greater provides support for generic ephemeral volumes, which are designed for this exact use case. They are defined in the Pod spec API, so this isn't currently available for RunnerDeployment. RunnerSet, however, is based on Kubernetes' StatefulSet, which mostly embeds the Pod spec under spec.template.spec, so you can use it there:

kind: RunnerSet
metadata:
  name: example
spec:
  template:
    spec:
      containers:
        - name: runner
          volumeMounts:
            - mountPath: /runner/_work
              name: work
        - name: docker
          volumeMounts:
            - mountPath: /runner/_work
              name: work
      volumes:
        - name: work
          ephemeral:
            volumeClaimTemplate:
              spec:
                accessModes: [ "ReadWriteOnce" ]
                storageClassName: "runner-work-dir"
                resources:
                  requests:
                    storage: 10Gi

Runner Labels

To run a workflow job on a self-hosted runner, you can use the following syntax in your workflow:

jobs:
  release:
    runs-on: self-hosted

When you have multiple kinds of self-hosted runners, you can distinguish between them using labels. In order to do so, you can specify one or more labels in your Runner or RunnerDeployment spec.

# runnerdeployment.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: custom-runner
spec:
  replicas: 1
  template:
    spec:
      repository: actions-runner-controller/actions-runner-controller
      labels:
        - custom-runner

Once this spec is applied, you can observe the labels for your runner in the GitHub settings page of the repository or organization. You can then select a specific runner in your workflow by using the label in runs-on:

jobs:
  release:
    runs-on: custom-runner

Note that if you specify self-hosted in your workflow, then this will run your job on any self-hosted runner, regardless of the labels that they have.
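A job can also require several labels at once; GitHub only routes it to runners that carry all of them, so combining self-hosted with your custom label is a common pattern:

jobs:
  release:
    runs-on: [self-hosted, custom-runner]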

Runner Groups

Runner groups can be used to limit which repositories are able to use the GitHub runner at an organization level. Runner groups have to be created in GitHub before they can be referenced.

To add the runner to the group NewGroup, specify the group in your Runner or RunnerDeployment spec.

# runnerdeployment.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: custom-runner
spec:
  replicas: 1
  template:
    spec:
      group: NewGroup

GitHub supports custom visibility on a runner group to make it available to a specific set of repositories only. By default, if no GitHub authentication is configured in the webhook server, ARC assumes that all runner groups are usable in all repositories. Currently, GitHub does not include the repository's runner group membership information in the workflow_job event (or any webhook). To make ARC "runner group aware", additional GitHub API calls are needed to find out which runner groups are visible to the webhook's repository. This behaviour will impact your rate-limit budget, so the option needs to be explicitly enabled by the end user.

This option will be enabled when proper GitHub authentication options (token, app or basic auth) are provided in the webhook server and useRunnerGroupsVisibility is set to true, e.g.

githubWebhookServer:
  enabled: true
  replicaCount: 1
  useRunnerGroupsVisibility: true
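Equivalently, the same values can be passed on the command line when installing or upgrading the chart; the release and chart names below follow the installation examples earlier in this document:

$ helm upgrade --install actions-runner-controller actions-runner-controller/actions-runner-controller \
  --namespace actions-runner-system \
  --set githubWebhookServer.enabled=true \
  --set githubWebhookServer.useRunnerGroupsVisibility=true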

Runner Entrypoint Features

Environment variable values must all be strings

The entrypoint script is aware of a few environment variables for configuring features:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeployment
spec:
  template:
    spec:
      env:
        # Issues a sleep command at the start of the entrypoint
        - name: STARTUP_DELAY_IN_SECONDS
          value: "2"
        # Disables the wait for the docker daemon to be available check
        - name: DISABLE_WAIT_FOR_DOCKER
          value: "true"
        # Disables automatic runner updates
        - name: DISABLE_RUNNER_UPDATE
          value: "true"
        # Configure runner with legacy --once instead of --ephemeral flag
        # WARNING: THIS ENV VAR IS DEPRECATED AND WILL BE REMOVED SOON.
        # SEE ISSUE #1196 FOR DETAILS
        - name: RUNNER_FEATURE_FLAG_ONCE
          value: "true"

Using IRSA (IAM Roles for Service Accounts) in EKS

This feature requires controller version >= v0.15.0

Similar to regular pods and deployments, you first need an existing service account with the IAM role associated. Create one using e.g. eksctl. You can refer to the EKS documentation for more details.
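As a rough sketch, creating such a service account with eksctl looks like the following; the cluster name, namespace, and policy ARN are placeholders that you need to replace with your own:

eksctl create iamserviceaccount \
  --name my-service-account \
  --namespace default \
  --cluster my-cluster \
  --attach-policy-arn arn:aws:iam::123456789012:policy/my-runner-policy \
  --approve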

Once you have set up the service account, all you need to do is add serviceAccountName and fsGroup to any pods that use the IAM-role-enabled service account.

For RunnerDeployment, you can set those two fields under the runner spec at RunnerDeployment.Spec.Template:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeploy
spec:
  template:
    spec:
      repository: USER/REPO
      serviceAccountName: my-service-account
      securityContext:
        fsGroup: 1000

Software Installed in the Runner Image

Cloud Tooling
The project supports being deployed on the various cloud Kubernetes platforms (e.g. EKS); however, it does not aim to go beyond that. No cloud-specific tooling is bundled in the base runner. This is an active decision to keep the overhead of maintaining the solution manageable.

Bundled Software
The GitHub-hosted runners include a large number of pre-installed software packages. GitHub maintains a list in README files at https://github.com/actions/virtual-environments/tree/main/images/linux

This solution maintains a few runner images whose latest tag aligns with GitHub's Ubuntu version; these images do not contain all of the software installed on the GitHub runners. They contain the following subset of packages from the GitHub runners:

  • Basic CLI packages
  • git
  • docker
  • build-essentials

The virtual environments from GitHub contain many more software packages (different versions of Java, Node.js, Golang, .NET, etc.) which are not provided in the runner image. Most of these have dedicated setup actions which allow the tools to be installed on demand in a workflow, for example actions/setup-java or actions/setup-node.
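For example, a workflow running on one of these runners could install Node.js on demand with actions/setup-node; this is a minimal sketch rather than a complete pipeline:

jobs:
  build:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v3
      # Installs the requested Node.js version on the runner for this job only
      - uses: actions/setup-node@v3
        with:
          node-version: 16
      - run: node --version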

If there is a need to include packages in the runner image for which there is no setup action, then this can be achieved by building a custom container image for the runner. The easiest way is to start with the summerwind/actions-runner image and install the extra dependencies directly in the Docker image:

FROM summerwind/actions-runner:latest

RUN sudo apt update -y \
  && sudo apt install YOUR_PACKAGE \
  && sudo rm -rf /var/lib/apt/lists/*
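Build and push the image to a registry that your cluster can pull from; the image name below is only a placeholder:

docker build -t ghcr.io/your-org/custom-runner:latest .
docker push ghcr.io/your-org/custom-runner:latest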

You can then configure the runner to use a custom Docker image by configuring the image field of a Runner or RunnerDeployment:

apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: custom-runner
spec:
  repository: actions-runner-controller/actions-runner-controller
  image: YOUR_CUSTOM_DOCKER_IMAGE

Using without cert-manager

Assuming you are installing in the default namespace, ensure your certificate has SANs:

  • webhook-service.actions-runner-system.svc
  • webhook-service.actions-runner-system.svc.cluster.local

It is possible to use a self-signed certificate by following a guide like this one using openssl.
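As a rough sketch, a self-signed certificate with the required SANs can be generated like this (the -addext flag requires OpenSSL 1.1.1 or newer):

openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
  -keyout tls.key -out tls.crt \
  -subj "/CN=webhook-service.actions-runner-system.svc" \
  -addext "subjectAltName=DNS:webhook-service.actions-runner-system.svc,DNS:webhook-service.actions-runner-system.svc.cluster.local"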

Install your certificate as a TLS secret:

$ kubectl create secret tls webhook-server-cert \
  -n actions-runner-system \
  --cert=path/to/cert/file \
  --key=path/to/key/file

Set the Helm chart values as follows:

$ CA_BUNDLE=$(cat path/to/ca.pem | base64)
$ helm upgrade --install actions-runner-controller actions-runner-controller/actions-runner-controller \
  --set certManagerEnabled=false \
  --set admissionWebHooks.caBundle=${CA_BUNDLE}

Troubleshooting

See the troubleshooting guide for solutions to various problems that people have run into consistently.

Contributing

For more details on contributing to the project (including requirements), please check out Getting Started with Contributing.
