monitoringartist/opentelemetry-collector-monitoring

OpenTelemetry (OTEL) collector monitoring

grafana.com/grafana/dashboards/15983-opentelemetry-collector/

License

GPL-3.0 license


Metric name/label name conventions

OpenTelemetry Collector metric names and label names can vary significantly. These variations depend on several factors:

OpenTelemetry Collector version
Specific receivers/exporters in use
Receiver/exporter configuration settings
Metric storage backend configuration

For example, the uptime metric can be stored as any of the following:

```
otelcol_process_uptime{service_instance_id="123456", ...}
otelcol_process_uptime_total{service_instance_id="123456", ...}
otelcol_process_uptime_seconds_total{service_instance_id="123456", ...}
otelcol_process_uptime_seconds_total{service.instance.id="123456", ...}
otelcol_process_uptime_seconds_total_total{service.instance.id="123456", ...}
```

This dashboard attempts to detect all these variations through hidden dashboard variables to ensure compatibility across different configurations.
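If you also query these series outside the dashboard (ad-hoc PromQL, other dashboards), one possible approach, which is not what the dashboard itself does, is a Prometheus recording rule that matches every name variant with a `__name__` regex and stores the result under a single name. A minimal sketch, assuming a standard Prometheus rule file; the group name and the recorded metric name are hypothetical:

```yaml
groups:
  - name: otelcol-name-normalization            # hypothetical group name
    rules:
      # Hypothetical normalized name. The regex matches all five uptime
      # variants listed above: "(_seconds)?" and "(_total)*" are both optional.
      - record: otelcol:process_uptime_seconds
        # Quoted because a YAML value starting with "{" would otherwise be
        # parsed as a flow mapping.
        expr: '{__name__=~"otelcol_process_uptime(_seconds)?(_total)*"}'
```

This only unifies the metric name; it does not reconcile `service_instance_id` with `service.instance.id`. If two name variants with identical labels are scraped at the same time (for example during a collector upgrade), Prometheus will refuse to record the duplicated output series.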

Metrics

The collector can expose its own metrics in Prometheus format locally on port 8888 at path /metrics. In containerized environments it may be desirable to expose this port on a public interface instead of only on localhost.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: otel-collector-metrics
        scrape_interval: 10s
        static_configs:
        - targets: ['127.0.0.1:8888']
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: []
      exporters: [...]
  telemetry:
    resource:
      service.name: grafana-opentelemetry/metrics
    metrics:
      level: detailed
      readers:
        - pull:
            exporter:
              prometheus:
                host: 127.0.0.1
                port: 8888
                with_resource_constant_labels:
                  included:
                    - service.name
```
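To make the metrics endpoint reachable from outside a container, the telemetry Prometheus reader can be bound to all interfaces instead of 127.0.0.1. A minimal sketch of only the `service.telemetry` section, assuming the rest of the configuration above stays unchanged; the port still has to be published by your container runtime or orchestrator:

```yaml
service:
  telemetry:
    metrics:
      level: detailed
      readers:
        - pull:
            exporter:
              prometheus:
                # 0.0.0.0 listens on all interfaces, so the port can be
                # published from the container. The collector's own
                # prometheus receiver can keep scraping 127.0.0.1:8888.
                host: 0.0.0.0
                port: 8888
```

Binding to all interfaces also exposes the endpoint to anything that can reach the container network, so keep 127.0.0.1 when only the collector itself scrapes these metrics.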

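The collector can scrape its own metrics through its own metrics pipeline, so a real configuration that forwards them to a remote Prometheus endpoint (here with AWS SigV4 authentication) can look like this: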

```yaml
extensions:
  sigv4auth/aws:
receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: otel-collector-metrics
        scrape_interval: 10s
        static_configs:
          - targets: ['127.0.0.1:8888']
exporters:
  prometheusremotewrite/aws:
    endpoint: ${PROMETHEUS_ENDPOINT}
    auth:
      authenticator: sigv4auth/aws
    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval: 10s
      max_elapsed_time: 30s
service:
  # extensions must also be enabled here, otherwise the exporter
  # cannot resolve its sigv4auth/aws authenticator
  extensions: [sigv4auth/aws]
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: []
      exporters: [prometheusremotewrite/aws]
  telemetry:
    resource:
      service.name: grafana-opentelemetry/metrics
    metrics:
      level: detailed
      readers:
        - pull:
            exporter:
              prometheus:
                host: 127.0.0.1
                port: 8888
                with_resource_constant_labels:
                  included:
                    - service.name
```

Grafana dashboard for OpenTelemetry collector metrics

This dashboard can also be used for Grafana Alloy monitoring.

Prometheus alerts

Recommended Prometheus alerts for OpenTelemetry collector metrics:

```yaml
# Keep in mind that metric names may carry "_total", "_seconds_total" or "_seconds_total_total"
# suffixes, depending on collector version and configuration; check your metrics first.
groups:
  - name: opentelemetry-collector
    rules:
      - alert: processor-dropped-spans
        expr: sum(rate(otelcol_processor_dropped_spans{}[1m])) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Some spans have been dropped by processor
          description: The collector may have received non-standard spans or reached a limit
      - alert: processor-dropped-metrics
        expr: sum(rate(otelcol_processor_dropped_metric_points{}[1m])) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Some metric points have been dropped by processor
          description: The collector may have received non-standard metric points or reached a limit
      - alert: processor-dropped-logs
        expr: sum(rate(otelcol_processor_dropped_log_records{}[1m])) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Some log records have been dropped by processor
          description: The collector may have received non-standard log records or reached a limit
      - alert: receiver-refused-spans
        expr: sum(rate(otelcol_receiver_refused_spans{}[1m])) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Some spans have been refused by receiver
          description: The collector may have received non-standard spans or reached a limit
      - alert: receiver-refused-metrics
        expr: sum(rate(otelcol_receiver_refused_metric_points{}[1m])) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Some metric points have been refused by receiver
          description: The collector may have received non-standard metric points or reached a limit
      - alert: receiver-refused-logs
        expr: sum(rate(otelcol_receiver_refused_log_records{}[1m])) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Some log records have been refused by receiver
          description: The collector may have received non-standard log records or reached a limit
      - alert: exporter-enqueued-spans
        expr: sum(rate(otelcol_exporter_enqueue_failed_spans{}[1m])) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Some spans could not be enqueued by exporter
          description: The sending queue may be full, e.g. because the destination is slow or unavailable
      - alert: exporter-enqueued-metrics
        expr: sum(rate(otelcol_exporter_enqueue_failed_metric_points{}[1m])) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Some metric points could not be enqueued by exporter
          description: The sending queue may be full, e.g. because the destination is slow or unavailable
      - alert: exporter-enqueued-logs
        expr: sum(rate(otelcol_exporter_enqueue_failed_log_records{}[1m])) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Some log records could not be enqueued by exporter
          description: The sending queue may be full, e.g. because the destination is slow or unavailable
      - alert: exporter-failed-requests
        expr: sum(rate(otelcol_exporter_send_failed_requests{}[1m])) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Some exporter requests failed
          description: The destination may have a problem or the payload may be invalid
      - alert: high-cpu-usage
        expr: max(rate(otelcol_process_cpu_seconds{}[1m])*100) > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: High CPU usage
          description: The collector may need to scale up
```
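Instead of adding the right suffix to every expression by hand, a `__name__` regex can accept the unsuffixed and suffixed names in one rule. A minimal sketch for two of the rules above, assuming only one name variant exists per collector; the group name is hypothetical:

```yaml
groups:
  - name: opentelemetry-collector-suffix-tolerant   # hypothetical group name
    rules:
      - alert: exporter-failed-requests
        # "(_total)*" matches zero, one or two "_total" suffixes
        expr: sum(rate({__name__=~"otelcol_exporter_send_failed_requests(_total)*"}[1m])) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Some exporter requests failed
          description: The destination may have a problem or the payload may be invalid
      - alert: high-cpu-usage
        expr: max(rate({__name__=~"otelcol_process_cpu_seconds(_total)*"}[1m]) * 100) > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: High CPU usage
          description: The collector may need to scale up
```

`rate()` drops the metric name from its output, so if two variants with identical labels exist at the same time (for example while old and new collector versions are both scraped), Prometheus may reject the result with a duplicate-labelset error. Once you know which name your setup produces, pinning the exact name is the more robust choice.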

An example of alert rules shipped as part of the OpenTelemetry Collector Helm chart: https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/templates/prometheusrule.yaml

Documentation
