# pgme
PGME is a GPU metrics exporter that leverages the `nvidia-smi` binary. The initial work and key metric-gathering code is derived from:
The `nvidia-smi` command used to gather metrics:

```
nvidia-smi --query-gpu=name,index,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv,noheader,nounits
```
I have added the following in an attempt to make it a more robust service:
- configuration via environment variables
- Makefile for local builds
- liveness HTTP probe for Kubernetes (k8s)
- graceful shutdown of the HTTP server
- exporter details at http://[[ip of server]]:[[port]]/
- integration with AWS CodeBuild, publishing to Docker Hub or AWS ECR via different buildspec files
Working On:
- Kubernetes service and helm configuration
Local Mac Build (generates a binary that works on macOS/OSX systems)

```
git clone https://github.com/chhibber/pgme.git
cd pgme
make build-mac
```
Local Linux Build (generates a binary that works on Linux systems)

```
git clone https://github.com/chhibber/pgme.git
cd pgme
make build
```
Local Docker Build (generates a Docker image)

```
git clone https://github.com/chhibber/pgme.git
cd pgme
make docker-build IMAGE_REPO_NAME=[[ repo_name/app_name ]] IMAGE_TAG=[[ version info ]]
```

Example run:

```
nvidia-docker run -p 9101:9101 chhibber/pgme
2018/01/05 21:32:31 Starting the service...
2018/01/05 21:32:31 - PORT set to 9101. If environment variable PORT is not set the default is 9101
2018/01/05 21:32:31 The service is listening on 9101...
```
- The default port is 9101. You can change it by defining the environment variable PORT in front of the binary:

```
PORT=9101 ./pgme
```
To run a published image instead:

```
nvidia-docker run -p 9101:9101 chhibber/pgme:2017.01
```
Available metrics - http://localhost:9101/metrics
```
temperature_gpu{gpu="TITAN X (Pascal)[0]"} 41
utilization_gpu{gpu="TITAN X (Pascal)[0]"} 0
utilization_memory{gpu="TITAN X (Pascal)[0]"} 0
memory_total{gpu="TITAN X (Pascal)[0]"} 12189
memory_free{gpu="TITAN X (Pascal)[0]"} 12189
memory_used{gpu="TITAN X (Pascal)[0]"} 0
temperature_gpu{gpu="TITAN X (Pascal)[1]"} 78
utilization_gpu{gpu="TITAN X (Pascal)[1]"} 95
utilization_memory{gpu="TITAN X (Pascal)[1]"} 59
memory_total{gpu="TITAN X (Pascal)[1]"} 12189
memory_free{gpu="TITAN X (Pascal)[1]"} 1738
memory_used{gpu="TITAN X (Pascal)[1]"} 10451
temperature_gpu{gpu="TITAN X (Pascal)[2]"} 83
utilization_gpu{gpu="TITAN X (Pascal)[2]"} 99
utilization_memory{gpu="TITAN X (Pascal)[2]"} 82
memory_total{gpu="TITAN X (Pascal)[2]"} 12189
memory_free{gpu="TITAN X (Pascal)[2]"} 190
memory_used{gpu="TITAN X (Pascal)[2]"} 11999
temperature_gpu{gpu="TITAN X (Pascal)[3]"} 84
utilization_gpu{gpu="TITAN X (Pascal)[3]"} 97
utilization_memory{gpu="TITAN X (Pascal)[3]"} 76
memory_total{gpu="TITAN X (Pascal)[3]"} 12189
memory_free{gpu="TITAN X (Pascal)[3]"} 536
memory_used{gpu="TITAN X (Pascal)[3]"} 11653
```
Prometheus scrape configuration:

```
- job_name: "gpu_exporter"
  static_configs:
    - targets: ['localhost:9101']
```