Error:
Starting training for 200 epochs...
Epoch GPU_mem box_om cls_om dfl_om box_oo cls_oo dfl_oo Instances Size
0%| | 0/9 [00:00<?, ?it/s]Could not load library libcudnn_cnn_train.so.8. Error: /usr/local/cuda-11.8/lib64/libcudnn_cnn_train.so.8: undefined symbol: _ZN10cask_cudnn2kr8isAmpereEPNS0_5spa_tERKi, version libcudnn_cnn_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /usr/local/cuda-11.8/lib64/libcudnn_cnn_train.so.8: undefined symbol: _ZN10cask_cudnn2kr8isAmpereEPNS0_5spa_tERKi, version libcudnn_cnn_infer.so.8
0%| | 0/9 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 18, in <module>
    train_yolov10(model_type='n', data='coco128.yaml', epochs=200, imgsz=640, batch=32, device=0, workers=8,resume=False)
  File "train.py", line 13, in train_yolov10
    results = model.train(data=data, epochs=epochs, imgsz=imgsz, batch=batch, device=device, workers=workers, save=True,
  File "/home/fut/anaconda3/envs/yolo10/lib/python3.8/site-packages/ultralytics/engine/model.py", line 657, in train
    self.trainer.train()
  File "/home/fut/anaconda3/envs/yolo10/lib/python3.8/site-packages/ultralytics/engine/trainer.py", line 213, in train
    self._do_train(world_size)
  File "/home/fut/anaconda3/envs/yolo10/lib/python3.8/site-packages/ultralytics/engine/trainer.py", line 389, in _do_train
    self.scaler.scale(self.loss).backward()
  File "/home/fut/anaconda3/envs/yolo10/lib/python3.8/site-packages/torch/_tensor.py", line 525, in backward
    torch.autograd.backward(
  File "/home/fut/anaconda3/envs/yolo10/lib/python3.8/site-packages/torch/autograd/__init__.py", line 267, in backward
    _engine_run_backward(
  File "/home/fut/anaconda3/envs/yolo10/lib/python3.8/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
    return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: GET was unable to find an engine to execute this computation
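Analysis:
This error usually shows up with newer torch releases, and installing an older one generally avoids it. The failing version here was torch==2.3.0; the 1.x releases do not have this problem. The cause is that torch 2.3.0 depends on cuDNN and automatically installs its own copy into the Python environment as a pip package, and that copy conflicts with the cuDNN installed on the system. The log is consistent with this: libcudnn_cnn_train.so.8 is loaded from /usr/local/cuda-11.8/lib64 but cannot resolve a symbol versioned against libcudnn_cnn_infer.so.8, which suggests that libraries from two different cuDNN builds are being mixed. Uninstalling the cuDNN that came with the environment resolves the problem.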
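To confirm that two cuDNN copies really coexist, a small check like the one below may help. It is only a sketch: the function names are made up for illustration, the default directory is simply the path that appears in the log, and matching on "cudnn" in the package name is an assumption about how the pip package is named.

```python
import glob
import os
from importlib import metadata


def pip_cudnn_packages():
    """Return (name, version) for each installed distribution whose name mentions cudnn."""
    found = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or ""
        if "cudnn" in name.lower():
            found.append((name, dist.version))
    return sorted(found)


def system_cudnn_libs(lib_dir="/usr/local/cuda-11.8/lib64"):
    """Return the libcudnn shared objects in lib_dir (default path taken from the log)."""
    return sorted(glob.glob(os.path.join(lib_dir, "libcudnn*.so*")))


if __name__ == "__main__":
    # If both lists are non-empty, the environment and the system each ship a cuDNN.
    print("pip-installed cuDNN packages:", pip_cudnn_packages())
    print("system cuDNN libraries:", system_cudnn_libs())
```

Run it inside the same conda environment that runs train.py, otherwise the pip list it inspects belongs to a different interpreter.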
Solution:
Run pip list to see which cuDNN package is installed.
Uninstall only that package and leave the other packages untouched.
pip uninstall nvidia-cudnn-cu11
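After uninstalling, a quick smoke test can show whether the backward pass finds a cuDNN engine again before starting a 200-epoch run. The sketch below is an assumption-laden stand-in, not a reproduction of the training job: cudnn_backward_ok is a hypothetical helper, and it runs one plain convolution without the AMP scaler that the trainer uses.

```python
def cudnn_backward_ok():
    """Run one small convolution forward and backward on the GPU.

    Returns True on success, False if the backward pass raises RuntimeError,
    and None if torch cannot be imported or no CUDA device is available.
    """
    try:
        import torch
    except (ImportError, OSError):
        return None
    if not torch.cuda.is_available():
        return None
    try:
        conv = torch.nn.Conv2d(3, 8, kernel_size=3).cuda()
        x = torch.randn(2, 3, 32, 32, device="cuda")
        conv(x).sum().backward()      # same kind of call that failed in the traceback
        torch.cuda.synchronize()      # surface asynchronous CUDA errors here
    except RuntimeError as err:
        print("backward failed:", err)
        return False
    print("cuDNN version seen by torch:", torch.backends.cudnn.version())
    return True


if __name__ == "__main__":
    print("backward pass OK:", cudnn_backward_ok())
```

If this prints True and the "Could not load library" lines are gone, the original train.py is worth re-running.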