Prometheus GPU Metrics Exporter

chhibber/pgme

PGME is a GPU metrics exporter that leverages the nvidia-smi binary. The initial work and key metric-gathering code is derived from:

Nvidia-smi command used to gather metrics:

nvidia-smi --query-gpu=name,index,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv,noheader,nounits
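As an illustration of how the CSV output of this command maps onto the metric lines shown in the sample output further down, here is a minimal Python sketch. It is not the exporter's own code; the function name `csv_to_metrics` and the `METRIC_NAMES` list are hypothetical, and the label layout (`name[index]`) is inferred from the sample output in this README.

```python
import csv
import io

# Metric names, in the order of the --query-gpu fields that follow
# "name" and "index". Taken from the sample output in this README.
METRIC_NAMES = [
    "temperature_gpu",
    "utilization_gpu",
    "utilization_memory",
    "memory_total",
    "memory_free",
    "memory_used",
]


def csv_to_metrics(csv_text):
    """Convert nvidia-smi CSV output (noheader, nounits) into metric lines."""
    lines = []
    # nvidia-smi separates fields with ", ", so skip the space after commas.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    for row in reader:
        if len(row) != len(METRIC_NAMES) + 2:
            continue  # skip blank or malformed rows
        name, index, values = row[0], row[1], row[2:]
        label = f"{name}[{index}]"
        for metric, value in zip(METRIC_NAMES, values):
            lines.append(f'{metric}{{gpu="{label}"}} {value.strip()}')
    return lines


if __name__ == "__main__":
    sample = "TITAN X (Pascal), 0, 41, 0, 0, 12189, 12189, 0\n"
    print("\n".join(csv_to_metrics(sample)))
```

The sketch passes values through unchanged, so a field that nvidia-smi cannot report would need extra handling before being exposed to Prometheus.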

I have added the following in an attempt to make it a more robust service:

  • configuration via environment variables
  • Makefile for local build
  • liveness HTTP request probe for Kubernetes (k8s)
  • graceful shutdown of the HTTP server
  • exporter details at http://[[ip of server]]:[[port]]/
  • integration with AWS CodeBuild, publishing to DockerHub or AWS ECR via different buildspec files
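To make the liveness probe and graceful shutdown items concrete, here is a minimal Python sketch of the general pattern. It is not taken from PGME: the `/healthz` path, the handler, and the helper names are all hypothetical, and the README does not say which path the real probe uses.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

HEALTH_PATH = "/healthz"  # hypothetical path; the README does not name it


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer 200 on the liveness path and 404 everywhere else.
        if self.path == HEALTH_PATH:
            status, body = 200, b"ok\n"
        else:
            status, body = 404, b"not found\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def start_server(port=0):
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), ProbeHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread


def stop_server(server, thread):
    """Let the serve loop finish its current request, then free the socket."""
    server.shutdown()
    server.server_close()
    thread.join(timeout=5)


def check_liveness(port, path=HEALTH_PATH):
    """Return the HTTP status code the server gives for the probe path."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

A Kubernetes liveness probe would play the role of `check_liveness` here, and a real service would typically call something like `stop_server` from a SIGTERM handler.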

Working On:

  • Kubernetes service and Helm configuration

Building

Local Mac Build (Generates a binary that works on OS X based systems)

git clone https://github.com/chhibber/pgme.git
cd pgme
make build-mac

Local Linux Build (Generates a binary that works on Linux systems)

git clone https://github.com/chhibber/pgme.git
cd pgme
make build

Local Docker Build (Generates a docker image)

git clone https://github.com/chhibber/pgme.git
cd pgme
make docker-build IMAGE_REPO_NAME=[[ repo_name/app_name ]] IMAGE_TAG=[[ version info ]]

# Example run
nvidia-docker run -p 9101:9101 chhibber/pgme
2018/01/05 21:32:31 Starting the service...
2018/01/05 21:32:31 - PORT set to 9101.  If  environment variable PORT is not set the default is 9101
2018/01/05 21:32:31 The service is listening on 9101...

Running the binary directly

  • The default port is 9101

You can change the port by setting the environment variable PORT in front of the binary.

> PORT=9101 ./pgme
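The port lookup described above (use PORT when it is set, otherwise fall back to 9101) can be sketched in Python as follows. This is an illustration of the pattern, not the exporter's implementation; `resolve_port` and `DEFAULT_PORT` are hypothetical names, and the range check is an added assumption.

```python
import os

DEFAULT_PORT = 9101  # default stated in this README


def resolve_port(environ=None):
    """Return the port from the PORT variable, or the default if unset."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PORT", "").strip()
    if not raw:
        return DEFAULT_PORT
    port = int(raw)  # raises ValueError for non-numeric input
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return port


if __name__ == "__main__":
    print(f"PORT resolved to {resolve_port()}")
```

Passing the environment in as a parameter keeps the function easy to exercise without changing the real process environment.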

Running via Docker (nvidia-docker is needed to expose the GPU to the running container)

nvidia-docker run -p 9101:9101 chhibber/pgme:2017.01

temperature_gpu{gpu="TITAN X (Pascal)[0]"} 41
utilization_gpu{gpu="TITAN X (Pascal)[0]"} 0
utilization_memory{gpu="TITAN X (Pascal)[0]"} 0
memory_total{gpu="TITAN X (Pascal)[0]"} 12189
memory_free{gpu="TITAN X (Pascal)[0]"} 12189
memory_used{gpu="TITAN X (Pascal)[0]"} 0
temperature_gpu{gpu="TITAN X (Pascal)[1]"} 78
utilization_gpu{gpu="TITAN X (Pascal)[1]"} 95
utilization_memory{gpu="TITAN X (Pascal)[1]"} 59
memory_total{gpu="TITAN X (Pascal)[1]"} 12189
memory_free{gpu="TITAN X (Pascal)[1]"} 1738
memory_used{gpu="TITAN X (Pascal)[1]"} 10451
temperature_gpu{gpu="TITAN X (Pascal)[2]"} 83
utilization_gpu{gpu="TITAN X (Pascal)[2]"} 99
utilization_memory{gpu="TITAN X (Pascal)[2]"} 82
memory_total{gpu="TITAN X (Pascal)[2]"} 12189
memory_free{gpu="TITAN X (Pascal)[2]"} 190
memory_used{gpu="TITAN X (Pascal)[2]"} 11999
temperature_gpu{gpu="TITAN X (Pascal)[3]"} 84
utilization_gpu{gpu="TITAN X (Pascal)[3]"} 97
utilization_memory{gpu="TITAN X (Pascal)[3]"} 76
memory_total{gpu="TITAN X (Pascal)[3]"} 12189
memory_free{gpu="TITAN X (Pascal)[3]"} 536
memory_used{gpu="TITAN X (Pascal)[3]"} 11653

Prometheus example config

- job_name: "gpu_exporter"
  static_configs:
  - targets: ['localhost:9101']
