I. Related documentation
GitHub repository: https://github.com/NVIDIA/dcgm-exporter?spm=a2c63.p38356.0.0.7bc96ceair6E2w
Metric descriptions: https://www.alibabacloud.com/help/zh/container-service-for-kubernetes/latest/monitoring-metric-descriptions
Installation guide: https://www.hangge.com/blog/cache/detail_3184.html
II. Deployment
1. Deploy nvidia-container-runtime
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-runtime.repo
yum install nvidia-container-runtime
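The distribution string used in the repo URL is simply the ID and VERSION_ID fields of /etc/os-release joined together. As a minimal sketch (the helper name detect_distribution is hypothetical, not part of any NVIDIA tooling), the following lets you inspect that value before fetching the repo file:

```shell
# Hypothetical helper: prints ID and VERSION_ID from an os-release file, concatenated.
detect_distribution() {
  # The path is a parameter so the helper can be tried on a copy; it defaults to the real file.
  os_release_file="${1:-/etc/os-release}"
  # Source the file in a subshell so ID/VERSION_ID do not leak into the calling shell.
  ( . "$os_release_file" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}
```

Running detect_distribution with no arguments prints the value that the curl command above would substitute into the URL, so an unexpected or empty result points to a problem with /etc/os-release rather than with the repo.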
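2. Install Docker, preferably version 19.03 or later, because the --gpus flag used in the next step was introduced in Docker 19.03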
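To confirm that an existing Docker installation is recent enough, the version can be compared against 19.03. This is a rough sketch, assuming GNU sort (for the -V version ordering) and a running Docker daemon; the function names version_ge and docker_supports_gpus are made up for illustration:

```shell
# version_ge A B: succeeds when version A >= version B.
# Assumes GNU sort, whose -V flag orders version strings.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical wrapper: asks the local Docker daemon for its version
# and checks it against 19.03, the release that added --gpus.
docker_supports_gpus() {
  server_version=$(docker version --format '{{.Server.Version}}') || return 1
  version_ge "$server_version" "19.03"
}
```

Calling docker_supports_gpus returns success when the daemon reports 19.03 or newer, and failure when Docker is older or not reachable.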
3. Install dcgm-exporter
docker run -d --gpus all --rm -p 9400:9400 liepin-registry-vpc.cn-beijing.cr.aliyuncs.com/liepin/liepin-app/dcgm-exporter:3.1.8-3.1.5-ubuntu20.04
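Once the container is running, a quick sanity check is to scrape the exporter and count the DCGM samples it returns. The sketch below assumes the exporter serves Prometheus text on the /metrics path of the published port 9400 and that its metric names start with DCGM_; count_dcgm_samples is a hypothetical helper name:

```shell
# Hypothetical helper: reads Prometheus text from stdin and prints how many
# sample lines have a metric name starting with DCGM_.
# "# HELP" and "# TYPE" comment lines are not counted because they start with "#".
count_dcgm_samples() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps the helper's status clean.
  grep -c '^DCGM_' || true
}
```

A typical use would be curl -s localhost:9400/metrics | count_dcgm_samples. A result of 0 suggests the exporter is up but not reporting any GPU metrics, while a curl connection error suggests the container did not start.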
If you run into problems, contact the author on WeChat.