本篇介绍腾讯云环境GPU云服务器nvidia tesla驱动安装步骤。有很多腾讯云的使用者,在使用GPU服务器过程中,对驱动安装或者使用中有一些疑惑,比如系统kernel更新了,驱动失效了等问题。
目前腾讯云环境下支持安装GPU驱动的方式如下:
NVIDIA Telsa GPU 的 Linux 驱动在安装过程种需要编译 kernel module,所以要求系统安装好了 gcc 和编译 Linux Kernel Module 所依赖的包,例如 kernel-devel-$(uname -r) 等。
注意:操作系统选择 Linux 64-bit 代表下载的是 shell 安装文件,如果选择具体的发行版下载的文件则是对应的包安装文件。
# wget http://us.download.nvidia.com/tesla/440.33.01/NVIDIA-Linux-x86_64-440.33.01.run
NVIDIA-Linux-x86_64-440.33.01.run
加执行权限:# chmod +x NVIDIA-Linux-x86_64-440.33.01.run
# sudo yum install -y gcc kernel-devel-xxx
xxx是内核版本号,可以通过 uname -r 查看。
# sudo yum install dkms -y
dkms的作用:nvidia-installer can optionally register the NVIDIA kernel module sources, if installed, with DKMS, then build and install a kernel module using the DKMS-registered sources. This will allow the DKMS infrastructure to automatically build a new kernel module when changing kernels. During installation, if DKMS is detected, nvidia-installer will ask the user if they wish to register the module with DKMS; the default response is 'no'. This option will bypass the detection of DKMS, and cause the installer to attempt a DKMS-based installation regardless of whether DKMS is present.
白话文翻译:即注册nvidia驱动到dkms中,通过dkms管理,当内核更新的时候,会自动build新的nvidia内核模块。
# ./NVIDIA-Linux-x86_64-440.33.01.run --dkms --silent
其中--silent的作用,不弹出UI界面,单台安装还好,否则批量操作,就比较尴尬了。
# nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P4 Off | 00000000:00:06.0 Off | Off |
| N/A 39C P8 7W / 75W | 0MiB / 8121MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
# lsmod|grep nvidia
nvidia_drm 43714 0
nvidia_modeset 1110184 1 nvidia_drm
nvidia 19893642 13 nvidia_modeset
ipmi_msghandler 46608 2 ipmi_devintf,nvidia
drm_kms_helper 159169 2 cirrus,nvidia_drm
drm 370825 5 ttm,drm_kms_helper,cirrus,nvidia_drm
i2c_core 40756 4 drm,i2c_piix4,drm_kms_helper,nvidia
# modinfo nvidia
filename: /lib/modules/3.10.0-693.el7.x86_64/extra/nvidia.ko.xz
alias: char-major-195-*
version: 440.33.01
supported: external
license: NVIDIA
rhelversion: 7.4
srcversion: A5E9226CB2A7B16B12DA2CA
alias: pci:v000010DEd*sv*sd*bc03sc02i00*
alias: pci:v000010DEd*sv*sd*bc03sc00i00*
depends: ipmi_msghandler,i2c-core
vermagic: 3.10.0-693.el7.x86_64 SMP mod_unload modversions
parm: NvSwitchRegDwords:NvSwitch regkey (charp)
parm: NVreg_Mobile:int
parm: NVreg_ResmanDebugLevel:int
parm: NVreg_RmLogonRC:int
parm: NVreg_ModifyDeviceFiles:int
parm: NVreg_DeviceFileUID:int
parm: NVreg_DeviceFileGID:int
parm: NVreg_DeviceFileMode:int
parm: NVreg_InitializeSystemMemoryAllocations:int
parm: NVreg_UsePageAttributeTable:int
parm: NVreg_MapRegistersEarly:int
parm: NVreg_RegisterForACPIEvents:int
parm: NVreg_EnablePCIeGen3:int
parm: NVreg_EnableMSI:int
parm: NVreg_TCEBypassMode:int
parm: NVreg_EnableStreamMemOPs:int
parm: NVreg_EnableBacklightHandler:int
parm: NVreg_RestrictProfilingToAdminUsers:int
parm: NVreg_PreserveVideoMemoryAllocations:int
parm: NVreg_DynamicPowerManagement:int
parm: NVreg_EnableUserNUMAManagement:int
parm: NVreg_MemoryPoolSize:int
parm: NVreg_KMallocHeapMaxSize:int
parm: NVreg_VMallocHeapMaxSize:int
parm: NVreg_IgnoreMMIOCheck:int
parm: NVreg_NvLinkDisable:int
parm: NVreg_RegisterPCIDriver:int
parm: NVreg_RegistryDwords:charp
parm: NVreg_RegistryDwordsPerDevice:charp
parm: NVreg_RmMsg:charp
parm: NVreg_GpuBlacklist:charp
parm: NVreg_TemporaryFilePath:charp
parm: NVreg_AssignGpus:charp
按照本文安装,可以保障系统内核更新的时候,驱动自动build,不会出现驱动不可用状态。
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。