Device Metrics Exporter exports metrics from AMD devices (GPUs) to collectors like Prometheus.
ROCm/device-metrics-exporter
AMD Device Metrics Exporter enables real-time collection of telemetry data in Prometheus format from AMD GPUs in HPC and AI environments. It provides comprehensive metrics including temperature, utilization, memory usage, power consumption, and more.
The Metrics Exporter container is available on Docker Hub:
    docker run -d \
      --device=/dev/dri \
      --device=/dev/kfd \
      -p 5000:5000 \
      --name device-metrics-exporter \
      rocm/device-metrics-exporter:v1.0.0
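Once the container is running, the exporter can be queried over HTTP on the published port. The following is a minimal sketch, not part of the project, that fetches the metrics page and parses the Prometheus text exposition format using only the Python standard library. The `/metrics` path is an assumption based on the usual Prometheus convention, and `fetch_metrics` and `parse_metrics` are hypothetical helper names; only port 5000 comes from the `docker run` command above.

```python
import re
import urllib.request

# Matches label pairs such as gpu_id="0" inside the {...} part of a sample line.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(url="http://localhost:5000/metrics", timeout=5):
    """Download the raw metrics page.

    The /metrics path is an assumption (Prometheus convention);
    port 5000 matches the -p 5000:5000 mapping in the docker run command.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text):
    """Parse Prometheus text format into (name, labels, value) tuples.

    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    Optional trailing timestamps are ignored.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name = line[: line.index("{")]
            label_text = line[line.index("{") + 1 : line.rindex("}")]
            labels = dict(LABEL_RE.findall(label_text))
            rest = line[line.rindex("}") + 1 :].split()
            value = float(rest[0])
        else:
            parts = line.split()
            name, labels, value = parts[0], {}, float(parts[1])
        samples.append((name, labels, value))
    return samples
```

With the exporter running locally, `parse_metrics(fetch_metrics())` would return one tuple per exported sample. The actual metric and label names depend on the exporter version and configuration, so consult the project documentation for the authoritative list.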
Features:
- Prometheus-compatible metrics endpoint
- Rich GPU telemetry data including:
  - Temperature monitoring
  - Utilization metrics
  - Memory usage statistics
  - Power consumption data
  - PCIe bandwidth metrics
- Kubernetes integration via Helm chart
- Slurm integration support
- Configurable service ports
- Container-based deployment

Requirements:
- Ubuntu 22.04 or later
- ROCm 6.2.0
- Docker (or compatible container runtime)
For detailed documentation, including installation guides, configuration options, and metric descriptions, see the documentation.
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.