CircleCI-Public/circleci-server-monitoring-reference


A reference for tools, configurations, and documentation used to monitor CircleCI server.

🚧 Under Development

This repository is currently under active development and is not yet a supported resource. Please refer to it at your own discretion until further notice.

Table of Contents

server-monitoring-stack

A reference Helm chart for setting up a monitoring stack for CircleCI server

Version: 0.1.0-alpha.8

Installing the Monitoring Stack

Requirements

| Repository | Name | Version |
|------------|------|---------|
| https://grafana.github.io/helm-charts | grafanaoperator(grafana-operator) | v5.18.0 |
| https://prometheus-community.github.io/helm-charts | prometheusOperator(prometheus-operator-crds) | 19.0.0 |

1. Configure Server for the Monitoring Stack

To set up monitoring for a CircleCI server instance, configure Telegraf with a Prometheus client output that exposes a metrics endpoint. Add the following configuration to the CircleCI server Helm chart values:

```yaml
telegraf:
  config:
    outputs:
      - file:
          files: ["stdout"]
      - prometheus_client:
          listen: ":9273"
```

2. Add Helm Repository

First, add the CircleCI Server Monitoring Stack Helm repository:

```shell
$ helm repo add server-monitoring-stack https://packagecloud.io/circleci/server-monitoring-stack/helm
$ helm repo update
```

3. Install Dependencies

Before installing the full chart, you must first install the dependency subcharts and operators.

3.1 Install Prometheus CRDs and Grafana Operator

Install the Prometheus Custom Resource Definitions (CRDs) and the Grafana operator chart. This assumes you are installing it in the same namespace as your CircleCI server installation:

```shell
$ helm install server-monitoring-stack server-monitoring-stack/server-monitoring-stack --set global.enabled=false --set prometheusOperator.installCRDs=true --version 0.1.0-alpha.8 -n <your-server-namespace>
```

NOTE: It's possible to install the monitoring stack in a different namespace than the CircleCI server installation. If you do so, set the `prometheus.serviceMonitor.selectorNamespaces` value to the target namespace.
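For example, if the monitoring stack were installed in a dedicated `monitoring` namespace (a hypothetical name) while CircleCI server runs in a namespace called `circleci-server`, the override might look roughly like this:

```yaml
# Hypothetical values override: the monitoring stack runs in its own namespace
# and scrapes the Telegraf service in the CircleCI server namespace.
prometheus:
  serviceMonitor:
    selectorNamespaces:
      - circleci-server   # assumed name of the CircleCI server namespace
```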

3.2 Install Tempo Operator (Optional)

If you plan to enable distributed tracing with Tempo (tempo.enabled=true), you must manually install the Tempo Operator. There is currently no official Helm chart available for the Tempo Operator or its CRDs, so manual installation is required. The Tempo Operator also requires cert-manager to be installed in your cluster. Additionally, this reference chart requires the `grafanaOperator` feature gate to be enabled for proper integration with Grafana.

For more detailed installation instructions, refer to the official Tempo Operator documentation.

Prerequisites:

  • cert-manager must be installed in your cluster

Example installation steps:

  1. Install the Tempo Operator:

     ```shell
     $ kubectl apply -f https://github.com/grafana/tempo-operator/releases/download/v0.17.0/tempo-operator.yaml
     ```

  2. Enable the `grafanaOperator` feature gate (required for integration with Grafana):

     ```shell
     $ kubectl get cm tempo-operator-manager-config -n tempo-operator-system -o yaml | \
         sed 's/^  *grafanaOperator: false$/      grafanaOperator: true/' | \
         kubectl apply -f -
     ```

  3. Restart the operator deployment to apply the configuration:

     ```shell
     $ kubectl rollout restart deployment/tempo-operator-controller -n tempo-operator-system
     $ kubectl wait --for=condition=available --timeout=120s deployment/tempo-operator-controller -n tempo-operator-system
     ```

4. Install the Helm Chart

Next, install the Helm chart using the following command:

```shell
$ helm upgrade --install server-monitoring-stack server-monitoring-stack/server-monitoring-stack --reset-values --version 0.1.0-alpha.8 -n <your-server-namespace>
```

5. Verify Prometheus Is Up and Targeting Telegraf

To verify that Prometheus is working correctly and targeting Telegraf, use the following command to port-forward Prometheus:

```shell
$ kubectl port-forward svc/server-monitoring-prometheus 9090:9090 -n <your-namespace-here>
```

Then visit http://localhost:9090/targets in your browser. Verify that Telegraf appears as a target and that its state is "up".

Prometheus UI showing Telegraf target as up

6. Verify Grafana Is Up and Connected to Prometheus

To verify that Grafana is working correctly and connected to Prometheus, use the following command to port-forward Grafana:

```shell
$ kubectl port-forward svc/server-monitoring-grafana-service 3000:3000 -n <your-namespace-here>
```

Then visit http://localhost:3000 in your browser. Once logged in with the default credentials, navigate to http://localhost:3000/dashboards and verify that the default dashboards are present and populating with data.


7. Next Steps

After ensuring both Prometheus and Grafana are operational, consider these enhancements:

Security

Secure Grafana by configuring credentials:

```yaml
grafana:
  credentials:
    # Directly set these for quick setups
    adminUser: "admin"
    adminPassword: "<your-secure-password-here>"
    # For production, use a Kubernetes secret to manage credentials securely
    existingSecretName: "<your-secret-here>"
```
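For the secret-based option, a plain Kubernetes Secret can back `existingSecretName`. The following is a rough sketch; the key names `adminUser` and `adminPassword` are assumptions mirroring the values above, so verify them against the chart's Grafana templates:

```yaml
# Hypothetical Secret backing grafana.credentials.existingSecretName.
# Key names are assumptions; confirm them against the chart's Grafana templates.
apiVersion: v1
kind: Secret
metadata:
  name: grafana-admin-credentials    # set existingSecretName to this value
type: Opaque
stringData:
  adminUser: admin
  adminPassword: <your-secure-password-here>
```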

Expose Grafana Externally

For external access, modify the service or ingress values. For example:

```yaml
grafana:
  service:
    type: LoadBalancer
```
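Alternatively, the `grafana.ingress` values from the table below can expose Grafana through an Ingress; the class name, hostname, and TLS secret here are placeholders for your environment:

```yaml
# Sketch of exposing Grafana via Ingress; all names below are placeholders.
grafana:
  ingress:
    enabled: true
    className: nginx               # assumed Ingress controller class
    host: grafana.example.com      # placeholder hostname
    tls:
      enabled: true
      secretName: grafana-tls      # placeholder TLS secret name
```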

Enabling Persistent Storage

Persist data by enabling storage for Prometheus and Grafana:

```yaml
prometheus:
  persistence:
    enabled: true
    storageClass: <your-custom-storage-class>
grafana:
  persistence:
    enabled: true
    storageClass: <your-custom-storage-class>
```

NOTE: Use a custom storage class with a 'Retain' policy to allow for data retention even after uninstalling the chart.
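As an illustration of such a class, the sketch below assumes the AWS EBS CSI provisioner; substitute the provisioner and parameters that match your cluster:

```yaml
# Hypothetical StorageClass with a Retain reclaim policy.
# The provisioner is an assumption (AWS EBS CSI); use whatever your cluster provides.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: monitoring-retain          # reference this name in the storageClass values above
provisioner: ebs.csi.aws.com
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
```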

Tempo Storage Configuration

When Tempo is enabled, it's recommended to use object storage instead of in-memory storage for trace persistence. Compatible storage backends for Tempo and CircleCI server include S3, GCS, and MinIO.

Configure object storage using the `tempo.storage` values detailed in the values section below.
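As an illustrative sketch only, switching off the in-memory default might look like the following. Only `backend`, `size`, and `storageClassName` appear in the values table below; any bucket or credential fields for an object storage backend are not shown there, so confirm the exact keys against the chart and the Tempo Operator documentation:

```yaml
# Illustrative only: replace the in-memory default with an object storage backend.
# "s3" as a backend value is an assumption; bucket and credential settings are omitted.
tempo:
  enabled: true
  storage:
    traces:
      backend: s3                              # default is "memory"
      size: 20Gi                               # WAL volume size for cloud backends
      storageClassName: <your-custom-storage-class>
```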

NOTE: For production deployments, object storage provides better durability and scalability compared to in-memory storage, which loses traces on pod restarts.

For detailed configuration options, consult the official Tempo documentation.

Modifying or Adding Grafana Dashboards

The default dashboards are located in the `dashboards` directory of the reference chart. To add new dashboards or modify existing ones, follow these steps.

Dashboards are provisioned directly from CRDs, which means any manual edits will be lost upon a refresh. As such, the workflow outlined below is recommended for making changes:

  1. Create a Copy:
    • Select **Edit** in the upper right corner.
    • Choose **Save dashboard** -> **Save as copy**.
    • After saving, navigate to the copy.
  2. Make Edits:
    • Modify the copy as needed and exit edit mode.
  3. Export as JSON:
    • Select **Export** in the upper right corner and then **Export as JSON**.
    • Ensure that **Export the dashboard to use in another instance** is toggled on.
  4. Update the JSON File:
    • Download the file and replace the `./dashboards/server-slis.json` file with the updated copy.
    • Run the following command to automatically validate the JSON and apply necessary updates:
      ./do validate-dashboards
  5. Commit and Open a PR:
    • Review and commit the changes.
    • Open a pull request for the On-Prem team to review.

Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| global.enabled | bool | true | |
| global.fullnameOverride | string | "server-monitoring" | Override the full name for resources |
| global.imagePullSecrets | list | [] | List of image pull secrets to be used across the deployment |
| global.nameOverride | string | "" | Override the release name |
| grafana.credentials.adminPassword | string | "admin" | Grafana admin password. Change from default for production environments. |
| grafana.credentials.adminUser | string | "admin" | Grafana admin username. |
| grafana.credentials.existingSecretName | string | "" | Name of an existing secret for Grafana credentials. Leave empty to create a new secret. |
| grafana.customConfig | string | "" | Add any custom Grafana configurations you require here. This should be a YAML-formatted string of additional settings for Grafana. |
| grafana.dashboards.jsonDirectory | string | "dashboards" | The directory containing JSON files for Grafana dashboards. |
| grafana.datasource.jsonData.timeInterval | string | "5s" | The time interval for Grafana to poll Prometheus. Specifies the frequency of data requests. |
| grafana.enabled | string | "-" | |
| grafana.image.repository | string | "grafana/grafana" | Image repository for Grafana. |
| grafana.image.tag | string | "12.0.0-security-01" | Tag for the Grafana image. |
| grafana.ingress.className | string | "" | Specifies the class of the Ingress controller. Required if the Kubernetes cluster includes multiple Ingress controllers. |
| grafana.ingress.enabled | bool | false | Enable to create an Ingress resource for Grafana. Disabled by default. |
| grafana.ingress.host | string | "" | Hostname to use for the Ingress. Must be set if Ingress is enabled. |
| grafana.ingress.tls.enabled | bool | false | Enable TLS for Ingress. Requires a TLS secret to be specified. |
| grafana.ingress.tls.secretName | string | "" | Name of the TLS secret used for securing the Ingress. Must be provided if TLS is enabled. |
| grafana.persistence.accessModes | list | ["ReadWriteOnce"] | Access modes for the persistent volume. |
| grafana.persistence.enabled | bool | false | Enable persistent storage for Grafana. |
| grafana.persistence.size | string | "10Gi" | Size of the persistent volume claim. |
| grafana.persistence.storageClass | string | "" | Storage class for persistent volume provisioner. You can create a custom storage class with a "retain" policy to ensure the persistent volume remains even after the chart is uninstalled. |
| grafana.replicas | int | 1 | Number of Grafana replicas to deploy. |
| grafana.service.annotations | object | {} | Metadata annotations for the service. |
| grafana.service.port | int | 3000 | Port on which the Grafana service will be exposed. |
| grafana.service.type | string | "ClusterIP" | Specifies the type of service for Grafana. Options include ClusterIP, NodePort, or LoadBalancer. Use NodePort or LoadBalancer to expose Grafana externally. Ensure that grafana.credentials are set for security purposes. |
| grafanaoperator | object | {"fullnameOverride":"server-monitoring-grafana-operator","image":{"repository":"quay.io/grafana-operator/grafana-operator","tag":"v5.18.0"}} | Full values for the Grafana Operator chart can be obtained at: https://github.com/grafana/grafana-operator/blob/master/deploy/helm/grafana-operator/values.yaml |
| grafanaoperator.fullnameOverride | string | "server-monitoring-grafana-operator" | Overrides the fully qualified app name. |
| grafanaoperator.image.repository | string | "quay.io/grafana-operator/grafana-operator" | Image repository for the Grafana Operator. |
| grafanaoperator.image.tag | string | "v5.18.0" | Tag for the Grafana Operator image. |
| prometheus.enabled | string | "-" | |
| prometheus.image.repository | string | "quay.io/prometheus/prometheus" | Image repository for Prometheus. |
| prometheus.image.tag | string | "v3.2.1" | Tag for the Prometheus image. |
| prometheus.persistence.accessModes | list | ["ReadWriteOnce"] | Access modes for the persistent volume. |
| prometheus.persistence.enabled | bool | false | Enable persistent storage for Prometheus. |
| prometheus.persistence.size | string | "10Gi" | Size of the persistent volume claim. |
| prometheus.persistence.storageClass | string | "" | Storage class for persistent volume provisioner. You can create a custom storage class with a "retain" policy to ensure the persistent volume remains even after the chart is uninstalled. |
| prometheus.replicas | int | 2 | Number of Prometheus replicas to deploy. |
| prometheus.serviceMonitor.endpoints[0].metricRelabelings[0].action | string | "labeldrop" | |
| prometheus.serviceMonitor.endpoints[0].metricRelabelings[0].regex | string | "instance" | |
| prometheus.serviceMonitor.endpoints[0].port | string | "prometheus-client" | Port name for the Prometheus client service. |
| prometheus.serviceMonitor.endpoints[0].relabelings[0].action | string | "labeldrop" | |
| prometheus.serviceMonitor.endpoints[0].relabelings[0].regex | string | "(containerendpoint | |
| prometheus.serviceMonitor.selectorLabels | object | {"app.kubernetes.io/instance":"circleci-server","app.kubernetes.io/name":"telegraf"} | Labels to select ServiceMonitors for scraping metrics. By default, it's configured to scrape the existing Telegraf deployment in CircleCI server. |
| prometheus.serviceMonitor.selectorNamespaces | list | [] | Namespaces to look for ServiceMonitor objects. Set this if the CircleCI server monitoring stack is deployed in a different namespace than the actual CircleCI server installation. |
| prometheusOperator.crds.annotations."helm.sh/resource-policy" | string | "keep" | |
| prometheusOperator.enabled | string | "-" | |
| prometheusOperator.image.repository | string | "quay.io/prometheus-operator/prometheus-operator" | Image repository for Prometheus Operator. |
| prometheusOperator.image.tag | string | "v0.81.0" | Tag for the Prometheus Operator image. |
| prometheusOperator.installCRDs | bool | false | |
| prometheusOperator.prometheusConfigReloader.image.repository | string | "quay.io/prometheus-operator/prometheus-config-reloader" | Image repository for Prometheus Config Reloader. |
| prometheusOperator.prometheusConfigReloader.image.tag | string | "v0.81.0" | Tag for the Prometheus Config Reloader image. |
| prometheusOperator.replicas | int | 1 | Number of Prometheus Operator replicas to deploy. |
| tempo.customConfig | object | {} | Add any custom Tempo configurations you require here. This should be a YAML object of additional settings for Tempo. |
| tempo.enabled | string | "-" | Enable Tempo distributed tracing. Requires manual installation of the Tempo Operator. Set to true to enable, false to disable, or "-" to use the global default. |
| tempo.podSecurityContext | object | {"fsGroup":10001,"runAsGroup":10001,"runAsNonRoot":true,"runAsUser":10001} | Pod security context for Tempo containers |
| tempo.podSecurityContext.fsGroup | int | 10001 | Filesystem group ID for volume ownership and permissions |
| tempo.podSecurityContext.runAsGroup | int | 10001 | Group ID to run the container processes |
| tempo.podSecurityContext.runAsNonRoot | bool | true | Run containers as non-root user |
| tempo.podSecurityContext.runAsUser | int | 10001 | User ID to run the container processes |
| tempo.resources | object | {"limits":{"cpu":"1000m","memory":"2Gi"},"requests":{"cpu":"500m","memory":"1Gi"}} | Resource requirements for Tempo pods. Adjust based on your trace volume and cluster capacity. |
| tempo.resources.limits.cpu | string | "1000m" | Maximum CPU Tempo pods can use |
| tempo.resources.limits.memory | string | "2Gi" | Maximum memory Tempo pods can use |
| tempo.resources.requests.cpu | string | "500m" | Minimum CPU guaranteed to Tempo pods |
| tempo.resources.requests.memory | string | "1Gi" | Minimum memory guaranteed to Tempo pods |
| tempo.storage | object | {"traces":{"backend":"memory","size":"20Gi","storageClassName":""}} | Storage configuration for trace data |
| tempo.storage.traces.backend | string | "memory" | Storage backend for traces. Default: in-memory storage (traces lost on pod restart). Suitable for development/testing environments only. |
| tempo.storage.traces.size | string | "20Gi" | Storage volume size. For memory/pv: actual volume size. For cloud backends: size of the WAL (Write-Ahead Log) volume. Increase for higher trace volumes or longer retention. |
| tempo.storage.traces.storageClassName | string | "" | Storage class for persistent volume provisioner. Applies to both persistent volume and object storage backends. |

Releases

Releases are managed by the CI/CD pipeline on the main branch, with an approval job gate called `approve-deploy-chart`. Before releasing, increment the Helm chart version in `Chart.yaml` and regenerate the documentation using `./do helm-docs`. Once approved, the release will be available in the package repository.
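For example, a version bump in `Chart.yaml` might look like the excerpt below; the new version number is illustrative only:

```yaml
# Chart.yaml (excerpt); the new version number is only an example.
apiVersion: v2
name: server-monitoring-stack
version: 0.1.0-alpha.9   # previously 0.1.0-alpha.8
```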

Server Monitoring Reference Support Policy

This monitoring reference is not part of CircleCI’s Server product. CircleCI provides it as a monitoring tooling and configuration repository for CircleCI Server User(s) that may be referred to when the User(s) plan and deploy their own monitoring implementations.

CircleCI strives to ensure that the monitoring tooling and configurations in this reference are functional and up to date. While CircleCI may provide reference to, answer questions regarding, and/or review contributions to the monitoring tooling and configurations, CircleCI does not make any judgment or recommendation as to the suitability for any customer installation of them with CircleCI Server, nor provide support for their installation and/or management in any customer’s system.

This monitoring reference and the monitoring tooling and configurations are provided on an ‘as-is’ and ‘as available’ basis without any warranties of any kind. CircleCI disclaims all warranties, express or implied, including, but not limited to, all implied warranties of merchantability, title, fitness for a particular purpose, and noninfringement.
