GPU Monitoring: dcgm-exporter


I. Related Documentation

GitHub repository: https://github.com/NVIDIA/dcgm-exporter?spm=a2c63.p38356.0.0.7bc96ceair6E2w

Metric descriptions: https://www.alibabacloud.com/help/zh/container-service-for-kubernetes/latest/monitoring-metric-descriptions

Installation guide: https://www.hangge.com/blog/cache/detail_3184.html


II. Deployment

1. Install nvidia-container-runtime

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-runtime.repo
sudo yum install -y nvidia-container-runtime
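The distribution variable above is built by sourcing /etc/os-release and concatenating its ID and VERSION_ID fields, so on CentOS 7 it becomes centos7 and the repo file is fetched from .../centos7/nvidia-container-runtime.repo. A minimal sketch that makes this explicit, assuming a hypothetical helper repo_url_for that takes an os-release style file as its argument (so it can be tried against a sample file rather than the real system):

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any NVIDIA tooling): derive the
# nvidia-container-runtime repo URL from an os-release style file.
repo_url_for() {
  local os_release="${1:-/etc/os-release}"
  local distribution
  # Source the file inside a subshell so ID/VERSION_ID do not leak
  # into the caller, then join them exactly as the command above does.
  distribution=$( . "$os_release"; echo "${ID}${VERSION_ID}" )
  echo "https://nvidia.github.io/nvidia-container-runtime/${distribution}/nvidia-container-runtime.repo"
}

# Usage on the target host:
# repo_url_for            # reads /etc/os-release
```

Printing the URL before piping it into tee makes it easy to spot an unsupported distribution string before anything is written to /etc/yum.repos.d.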

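2. Install Docker. Docker 19 (19.03) or later is recommended, because the --gpus flag used in the next step was introduced in Docker 19.03.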
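Because docker run --gpus, used below to start dcgm-exporter, is only available from Docker 19.03 onward, it can help to check the installed version first. A minimal pre-flight sketch, assuming a hypothetical helper supports_gpus_flag that compares only the major and minor parts of a version string such as 19.03.15 or 18.06.1-ce:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a Docker version string is >= 19.03,
# the release that added `docker run --gpus`.
supports_gpus_flag() {
  local major minor
  # Split "19.03.15" into major=19, minor=03 (the rest is ignored).
  IFS=. read -r major minor _ <<< "$1"
  minor="${minor%%[!0-9]*}"        # drop suffixes such as "-ce"
  major=$((10#${major:-0}))        # force base 10 so "08" is not read as octal
  minor=$((10#${minor:-0}))
  (( major > 19 || (major == 19 && minor >= 3) ))
}

# Usage on the host (docker version --format is a standard Docker CLI option):
# if supports_gpus_flag "$(docker version --format '{{.Server.Version}}')"; then
#   echo "Docker supports --gpus"
# else
#   echo "Upgrade Docker to 19.03 or later"
# fi
```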

3. Install dcgm-exporter

docker run -d --gpus all --rm -p 9400:9400 liepin-registry-vpc.cn-beijing.cr.aliyuncs.com/liepin/liepin-app/dcgm-exporter:3.1.8-3.1.5-ubuntu20.04
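Once the container is up, dcgm-exporter serves Prometheus text-format metrics on the published port 9400 at /metrics, and DCGM_FI_DEV_GPU_UTIL (GPU utilization, in percent) is among its default counters. A minimal sketch for a quick sanity check, assuming a hypothetical helper gpu_util that reads the exposition text on stdin and prints the gpu label and the utilization value for each GPU; it also assumes each sample line ends with the value (no trailing timestamp):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<gpu index> <utilization %>" for every
# DCGM_FI_DEV_GPU_UTIL sample in Prometheus text read from stdin.
gpu_util() {
  awk '
    /^DCGM_FI_DEV_GPU_UTIL[{]/ {
      # Find the gpu="N" label (preceded by "{" or ","), skip the
      # 6-character prefix such as {gpu=" and drop the closing quote.
      if (match($0, /[{,]gpu="[0-9]+"/)) {
        idx = substr($0, RSTART + 6, RLENGTH - 7)
        # The sample value is the last whitespace-separated field,
        # even when label values (e.g. modelName) contain spaces.
        print idx, $NF
      }
    }'
}

# Usage against the running exporter:
# curl -s http://localhost:9400/metrics | gpu_util
```

If this prints one line per GPU, the exporter is working and Prometheus can be pointed at the same endpoint.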
