
Using GPUs inside Podman containers

Original
隔壁没老王
Published 2025-06-13 10:01:48

1. Make sure the GPU driver is installed on the server that will run the Podman containers

[root@test ~]# nvidia-smi 
Fri Jun 13 17:35:05 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.01             Driver Version: 535.216.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       On  | 00000000:00:08.0 Off |                    0 |
| N/A   39C    P8              11W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
[root@test ~]# 
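
The check in step 1 can also be scripted so that a setup script stops early on a host without a working driver. A minimal sketch, assuming bash; check_gpu_driver is a hypothetical name, while --query-gpu and --format=csv,noheader are standard nvidia-smi options:

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm the host NVIDIA driver works before setting up Podman.
check_gpu_driver() {
  # nvidia-smi ships with the driver; if it is missing, no container can use the GPU
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: install the NVIDIA driver first" >&2
    return 1
  fi
  # Print one "name, driver_version" line per GPU.
  # nvidia-smi exits non-zero if it cannot talk to the driver, and that status is returned.
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}
```

On the host above it should print one line with the GPU name and driver version (Tesla T4 and 535.216.01 here); a non-zero return status means the driver is missing or not loaded.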

2. Install Podman

[root@test ~]# yum install podman -y

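3. Start the Podman service (optional: Podman is daemonless; podman.service only provides its REST API, and podman run works without it)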

[root@test ~]# systemctl start podman

4. Pull an image and run a test

[root@test ~]# podman run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
✔ mirror.ccs.tencentyun.com/nvidia/cuda:11.0.3-base-ubuntu20.04
Trying to pull mirror.ccs.tencentyun.com/nvidia/cuda:11.0.3-base-ubuntu20.04...
Getting image source signatures
Copying blob e43c2058e496 done   | 
Copying blob 96d54c3075c9 done   | 
Copying blob 59f6381879f6 done   | 
Copying blob 655ed0df26cf done   | 
Copying blob 848b95ad96b5 done   | 
Copying config 97dfa1ef5e done   | 
Writing manifest to image destination
Error: runc: runc create failed: unable to start container process: exec: "nvidia-smi": executable file not found in $PATH: OCI runtime attempted to invoke a command that was not found
[root@test ~]# 

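5. Install the NVIDIA Container Toolkit, which is required to use the GPU inside a container

The error above is expected at this point: the CUDA base image does not ship nvidia-smi. It is the NVIDIA Container Toolkit that makes nvidia-smi, the driver libraries, and the GPU device nodes from the host available inside the container when it starts.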

For the installation steps, see https://cloud.tencent.com/document/product/560/118463
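
The hook configured in step 7 calls /usr/bin/nvidia-container-toolkit, so after installing the toolkit it is worth confirming that this file exists (on a yum-based host, rpm -q nvidia-container-toolkit also shows the installed package version). A minimal sketch, assuming bash; check_toolkit and the TOOLKIT_BIN override are hypothetical names introduced here:

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the toolkit binary referenced by the OCI hook is present.
# TOOLKIT_BIN is an illustrative override; the default matches the hook "path" in step 7.
check_toolkit() {
  local bin="${TOOLKIT_BIN:-/usr/bin/nvidia-container-toolkit}"
  # The OCI hook can only run if this file exists and is executable
  if [ ! -x "$bin" ]; then
    echo "missing or not executable: $bin" >&2
    return 1
  fi
  echo "found: $bin"
}
```

If check_toolkit reports the binary as missing, the hook in step 7 would point at a file that does not exist, so it makes sense to fix the installation (or adjust the hook's "path") first.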

6. Test again

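It still fails with the same error, so installing the toolkit is not enough on its own: Podman also has to be configured to call it.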

[root@test ~]# podman run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
Error: runc: runc create failed: unable to start container process: exec: "nvidia-smi": executable file not found in $PATH: OCI runtime attempted to invoke a command that was not found
[root@test ~]# 
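
A likely reason the error persists is that Podman has no hook telling it to call the toolkit. A minimal sketch for listing the hook files Podman would consider, assuming bash; list_oci_hooks is hypothetical, and the two default directories are an assumption (the effective list comes from hooks_dir in containers.conf or podman --hooks-dir):

```shell
#!/usr/bin/env bash
# Hypothetical helper: list OCI hook files (*.json) in the given directories.
# With no arguments it checks two commonly used Podman hook directories (an assumption).
list_oci_hooks() {
  local hook_dirs=("$@")
  if [ "${#hook_dirs[@]}" -eq 0 ]; then
    hook_dirs=(/usr/share/containers/oci/hooks.d /etc/containers/oci/hooks.d)
  fi
  local dir f found=0
  for dir in "${hook_dirs[@]}"; do
    [ -d "$dir" ] || continue
    for f in "$dir"/*.json; do
      # An unmatched glob stays literal, so skip anything that is not a real file
      [ -f "$f" ] || continue
      echo "$f"
      found=1
    done
  done
  # Non-zero status when no hook files exist, so scripts can branch on it
  if [ "$found" -eq 0 ]; then
    echo "no OCI hooks found" >&2
    return 1
  fi
}
```

On the host above, before step 7, it would presumably find no NVIDIA hook; after step 7 it should list /usr/share/containers/oci/hooks.d/oci-nvidia-hook.json.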

7. Add the OCI hook configuration

Reference: https://blog.csdn.net/jiqiren_dasheng/article/details/124857320

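I added the following hook definition. It registers /usr/bin/nvidia-container-toolkit as an OCI prestart hook, and "always": true applies it to every container: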

[root@test ~]# Content=`cat << 'EOF'
> {
>     "version": "1.0.0",
>     "hook": {
>         "path": "/usr/bin/nvidia-container-toolkit",
>         "args": ["nvidia-container-toolkit", "prestart"],
>         "env": [
>             "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
>         ]
>     },
>     "when": {
>         "always": true,
>         "commands": [".*"]
>     },
>     "stages": ["prestart"]
> }
> EOF`
[root@test ~]#  
[root@test ~]# HookFile=/usr/share/containers/oci/hooks.d/oci-nvidia-hook.json
[root@test ~]# sudo mkdir -p "$(dirname "$HookFile")"
[root@test ~]# echo "$Content" | sudo tee "$HookFile" > /dev/null   # tee writes with sudo's privileges; "sudo echo ... >" would not
[root@test ~]# 
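
A malformed hook file or a wrong "path" value tends to produce confusing errors at container start, so a quick check of the file written above can help. A minimal sketch, assuming bash and that python3 is installed on the host (it is used only to parse the JSON); validate_hook is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check an OCI hook file such as the one written above.
# Assumes python3 is installed; it is used only to parse the JSON.
validate_hook() {
  local file="${1:-/usr/share/containers/oci/hooks.d/oci-nvidia-hook.json}"
  local hook_path
  # Extract hook.path; python3 exits non-zero on bad JSON, a missing key, or an unreadable file
  if ! hook_path=$(python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))["hook"]["path"])' "$file" 2>/dev/null); then
    echo "invalid hook file: $file" >&2
    return 1
  fi
  # The runtime executes hook.path, so it must exist and be executable
  if [ ! -x "$hook_path" ]; then
    echo "hook binary not executable: $hook_path" >&2
    return 1
  fi
  echo "ok: $file -> $hook_path"
}
```

Run without arguments, validate_hook checks the file written above and, if both the JSON and the toolkit binary are in place, prints ok: /usr/share/containers/oci/hooks.d/oci-nvidia-hook.json -> /usr/bin/nvidia-container-toolkit.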

8. Test again

This time it works:

[root@test ~]# podman run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
Fri Jun 13 09:57:59 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.01             Driver Version: 535.216.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       On  | 00000000:00:08.0 Off |                    0 |
| N/A   39C    P8              11W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
[root@test ~]# 
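
Rather than comparing the host and container nvidia-smi tables by eye, one option is to compare GPU counts programmatically. A minimal sketch, assuming bash; compare_gpu_counts is hypothetical, it reuses this article's image and --gpus all flag, and it relies on nvidia-smi -L printing one "GPU <index>: ..." line per device:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a container sees as many GPUs as the host.
compare_gpu_counts() {
  local image="${1:-nvidia/cuda:11.0.3-base-ubuntu20.04}"
  local host_count container_count
  # nvidia-smi -L prints one line per GPU, starting with "GPU <index>:"
  host_count=$(nvidia-smi -L | grep -c '^GPU ')
  container_count=$(podman run --rm --gpus all "$image" nvidia-smi -L | grep -c '^GPU ')
  echo "host=$host_count container=$container_count"
  # Succeeds only if at least one GPU is visible and the counts match
  [ "$host_count" -gt 0 ] && [ "$host_count" -eq "$container_count" ]
}
```

On the single-T4 host above, a successful run should print host=1 container=1; if the hook is missing, the container count drops to 0 and the function returns a non-zero status.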

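Originality statement: this article was published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.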

If it infringes any rights, contact cloudcommunity@tencent.com to have it removed.
