Summer student project 2021: cleanup of Kubernetes resources
nais/babylon
Babylon detects failing applications in your Kubernetes cluster, notifies the responsible teams, and cleans the applications up. By doing this, Babylon keeps clusters tidy and avoids unnecessary resource usage.
By default, Babylon looks for broken deploys in all namespaces, but this can be configured. If you want to enable the allowlist, set the environment variable `USE_ALLOWED_NAMESPACES` to `true`, and add your namespaces to the environment variable `ALLOWED_NAMESPACES`, like `default,babylon`.
Note: `ALLOWED_NAMESPACES` is a comma-separated string without whitespace.
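The allowlist behaviour described above can be sketched in Go as follows. This is a minimal illustration, not Babylon's actual code: the helper names `parseAllowedNamespaces` and `isNamespaceAllowed` are hypothetical, and the sketch assumes the value is split on commas exactly as written, which is why stray whitespace would break a match.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAllowedNamespaces turns a value such as "default,babylon" into a set.
// Hypothetical helper for illustration; not taken from Babylon's source.
func parseAllowedNamespaces(raw string) map[string]struct{} {
	allowed := make(map[string]struct{})
	for _, ns := range strings.Split(raw, ",") {
		if ns != "" {
			// Entries are stored verbatim, so " babylon" != "babylon".
			allowed[ns] = struct{}{}
		}
	}
	return allowed
}

// isNamespaceAllowed reports whether cleanup may run in ns.
// With the allowlist disabled, every namespace is considered allowed.
func isNamespaceAllowed(useAllowlist bool, allowed map[string]struct{}, ns string) bool {
	if !useAllowlist {
		return true
	}
	_, ok := allowed[ns]
	return ok
}

func main() {
	allowed := parseAllowedNamespaces("default,babylon")
	fmt.Println(isNamespaceAllowed(true, allowed, "babylon"))
	fmt.Println(isNamespaceAllowed(true, allowed, "kube-system"))
	fmt.Println(isNamespaceAllowed(false, allowed, "kube-system"))
}
```

Under these assumptions, a value like `default, babylon` (with a space) would not match the namespace `babylon`, which illustrates why the string must not contain whitespace.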
Working hours can be configured by creating a file called `working-hours.yaml` in `/etc/config`. The syntax is exactly the same as that of Prometheus' Alertmanager; see their docs. Working hours only limit when resources are pruned; limiting when alerts are received is awaiting features in Alerterator.
Type of Error | Reason |
---|---|
CreateContainerConfigError | A container could not be created due to errors in the resource definition. This happens when, e.g., you reference a config map that doesn't exist or is missing keys |
ImagePullBackOff / ErrImagePull | Happens when a container cannot find or pull an image from its registry; usually terminal. This check covers both the containers in a deployment and their init containers |
CrashLoopBackOff | Happens when the application inside the container repeatedly crashes and restarts; see the restart threshold below. This check covers both the containers in a deployment and their init containers |
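To make the table concrete, the check could be sketched in Go roughly like this. It is an assumption-laden illustration rather than Babylon's implementation: the function `isFailing` and the map `failureReasons` are hypothetical names, the reason strings are the ones listed in the table, and the restart threshold corresponds to `RESTART_THRESHOLD` from the configuration table.

```go
package main

import "fmt"

// failureReasons holds the container waiting reasons from the table above.
// Hypothetical structure for illustration only.
var failureReasons = map[string]bool{
	"CreateContainerConfigError": true,
	"ImagePullBackOff":           true,
	"ErrImagePull":               true,
	"CrashLoopBackOff":           true,
}

// isFailing reports whether a container with the given waiting reason and
// restart count should be treated as failing. CrashLoopBackOff is ignored
// while the restart count is below restartThreshold.
func isFailing(reason string, restarts, restartThreshold int) bool {
	if !failureReasons[reason] {
		return false
	}
	if reason == "CrashLoopBackOff" {
		return restarts >= restartThreshold
	}
	return true
}

func main() {
	const restartThreshold = 200 // documented default of RESTART_THRESHOLD

	fmt.Println(isFailing("ImagePullBackOff", 0, restartThreshold))
	fmt.Println(isFailing("CrashLoopBackOff", 12, restartThreshold))
	fmt.Println(isFailing("CrashLoopBackOff", 200, restartThreshold))
	fmt.Println(isFailing("ContainerCreating", 0, restartThreshold))
}
```

In this sketch the same function would be applied to both regular containers and init containers, matching the note in the table.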
Name | Default | Description |
---|---|---|
ARMED | false | By default, the application will not perform destructive actions. To arm it, set the `ARMED` 💥 environment variable to `true`. |
RESOURCE_AGE | 10m | Any resources younger than this threshold will not be checked |
NOTIFICATION_DELAY | 24h | Time between when Babylon first detects a resource as failing and when a notification is sent. Note that Babylon only turns volatile against a resource after `NOTIFICATION_DELAY + GRACE_PERIOD`. Note: this does not actually affect when the notification is sent; that is configured in `alerts.yaml`. |
GRACE_PERIOD | 24h | The grace period starts with the first notification related to a resource. Resources will be handled (e.g. deleted, downscaled, or rolled back) at some point after the grace period has ended. |
RESTART_THRESHOLD | 200 | During `CrashLoopBackOff`, the pod will be ignored while the number of restarts is less than the threshold |
TICKRATE | 15m | The tick rate is the duration for which the application's main loop will wait between each run (somewhat similar to `Time.sleep`) |
LINKERD_DISABLED | none | Disable waiting on Linkerd sidecar during startup. |
UNLEASH_URL | none | URL to connect to Unleash |
USE_ALLOWED_NAMESPACES | false | Only allow Babylon to perform cleanup in allowed namespaces specified by `ALLOWED_NAMESPACES` |
ALLOWED_NAMESPACES | none | Comma-separated list of namespaces (without whitespace) where cleanup is allowed. |
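As a rough worked example of how the timing settings combine, the Go sketch below adds `NOTIFICATION_DELAY` and `GRACE_PERIOD` to find the earliest point at which a resource could be handled. The function `cleanupDelay` is hypothetical, and parsing the values with Go's `time.ParseDuration` is an assumption; the documented defaults (`10m`, `24h`, `15m`) happen to be valid in that format, but Babylon's own parsing is not confirmed here.

```go
package main

import (
	"fmt"
	"time"
)

// cleanupDelay returns NOTIFICATION_DELAY + GRACE_PERIOD, the minimum time
// between first detection and any destructive action.
// Hypothetical helper; assumes values in Go duration syntax (e.g. "24h").
func cleanupDelay(notificationDelay, gracePeriod string) (time.Duration, error) {
	nd, err := time.ParseDuration(notificationDelay)
	if err != nil {
		return 0, fmt.Errorf("parsing NOTIFICATION_DELAY: %w", err)
	}
	gp, err := time.ParseDuration(gracePeriod)
	if err != nil {
		return 0, fmt.Errorf("parsing GRACE_PERIOD: %w", err)
	}
	return nd + gp, nil
}

func main() {
	// Documented defaults: NOTIFICATION_DELAY=24h, GRACE_PERIOD=24h.
	delay, err := cleanupDelay("24h", "24h")
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}

	// Example detection time chosen purely for illustration.
	firstDetected := time.Date(2021, time.July, 1, 9, 0, 0, 0, time.UTC)
	fmt.Println("total delay:", delay)
	fmt.Println("earliest cleanup:", firstDetected.Add(delay).Format(time.RFC3339))
}
```

With the defaults, a resource first detected as failing would not be handled until at least 48 hours later, and then only if `ARMED` is `true` and, where configured, the current time falls within working hours.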
For development setup, see CONTRIBUTING.md.