Introduction: TensorRT began life under the name GPU Inference Engine (GIE). It is a neural-network inference acceleration engine from NVIDIA, built on CUDA and cuDNN. TensorRT now supports almost every major deep learning framework, including TensorFlow, Caffe, MXNet, and PyTorch; pairing TensorRT with NVIDIA GPUs enables fast, efficient inference deployment from virtually any of these frameworks.
This article walks through the conversion process using a PyTorch ResNeSt-101 model as the example, and covers the problems we ran into with online inference.
The environment used to build the engine must match the online inference environment, otherwise inference will fail.
Here are some notes on problems hit during installation; the dependency versions were:
ENV CUDA_VERSION 10.2.89
ENV CUDNN_VERSION 8.0.3.33
ENV TENSORRT_VERSION 7.1.3
ENV PYCUDA 2020.1
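Since a version mismatch between the build and serving environments is the most common source of failures, it is worth printing the versions at runtime in both images and comparing them. A minimal sketch, assuming the packages above are installed:

import tensorrt as trt
import pycuda
import torch

# run this in both the engine-build image and the serving image;
# the outputs must match the versions listed above
print("TensorRT:", trt.__version__)             # expect 7.1.3
print("PyCUDA:", pycuda.VERSION_TEXT)           # expect 2020.1
print("CUDA (torch build):", torch.version.cuda)
print("cuDNN (torch build):", torch.backends.cudnn.version())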
The conversion is two steps: first export the model to ONNX, then convert the ONNX model to TensorRT. ONNX is an open format that lets algorithms and models move between frameworks; mainstream frameworks such as Caffe2, PyTorch, TensorFlow, and MXNet all support ONNX to varying degrees.
import torch
from resnest.torch import resnest101  # model definition; adjust the import to your project

model = None

def load_model(model_path):
    global model
    device = 'cuda'
    model = resnest101(num_classes=30)
    weight = torch.load(model_path, map_location=device)
    # the checkpoint was saved from nn.DataParallel under the key 'weight1';
    # strip the 'module.' prefix from every parameter name
    weight = {k[7:]: v for k, v in weight['weight1'].items()}
    model.load_state_dict(weight)
    del weight
    model.to(device)
    model.eval()  # inference/export mode
import onnx

def torch_2_onnx():
    global model
    input_name = ['input']
    output_name = ['output']
    model_name = 'model_batch1.onnx'
    # dummy input for tracing: batch 1, 3 channels, 320x180
    # (torch.autograd.Variable is deprecated; a plain tensor works)
    dummy_input = torch.randn(1, 3, 320, 180).cuda()
    torch.onnx.export(model, dummy_input, model_name, input_names=input_name,
                      output_names=output_name, verbose=True, opset_version=10)
    # reload, validate, and re-save to confirm the file is a well-formed ONNX model
    onnx_model = onnx.load(model_name)
    onnx.checker.check_model(onnx_model)  # raises if the graph is invalid
    onnx.save(onnx_model, model_name)
dummy_input sets the batch size, channel count, and image dimensions used for tracing.
model_name sets the output file name; the ONNX model is written to the current directory when the function finishes.
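If deployment needs a batch size other than 1, torch.onnx.export can mark the batch dimension as dynamic. A minimal sketch, reusing the 'input'/'output' names from above (note that a dynamic-batch engine also requires a TensorRT optimization profile at build time, which the build function below does not set up):

# sketch: export with a dynamic batch dimension (axis 0)
dummy_input = torch.randn(1, 3, 320, 180).cuda()
torch.onnx.export(
    model, dummy_input, 'model_dynamic.onnx',
    input_names=['input'], output_names=['output'],
    dynamic_axes={'input': {0: 'batch'}, 'output': {0: 'batch'}},
    opset_version=10,
)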
import tensorrt as trt
import common  # the 'common' helper module shipped with the TensorRT Python samples

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def onnx_2_trt(model_name):
    # Build a TensorRT engine and serialize it to disk.
    with build_engine_onnx(model_name) as engine:
        with open(model_name.replace(".onnx", ".engine"), "wb") as fid:
            fid.write(engine.serialize())

def build_engine_onnx(model_file):
    # TensorRT 7 requires an explicit-batch network for ONNX models
    explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(TRT_LOGGER) as builder, \
            builder.create_network(flags=explicit_batch) as network, \
            trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = common.GiB(10)
        with open(model_file, 'rb') as model:
            if not parser.parse(model.read()):
                print('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None
        return builder.build_cuda_engine(network)
When this finishes, a TensorRT model file with the .engine suffix is written to the current directory.
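An example invocation, assuming the ONNX file exported in the previous step:

if __name__ == '__main__':
    onnx_2_trt('model_batch1.onnx')  # produces model_batch1.engine alongside it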
At this step I hit an error in practice:
Assertion failed: tensors.count(input_name) error when converting onnx to tensorrt
I found a related issue on GitHub; upgrading TensorRT to 7.1.3 solved the problem.
import numpy as np

def inference(model_name):
    # Deserialize the TensorRT engine from disk.
    with trt.Runtime(TRT_LOGGER) as runtime:
        with open(model_name, "rb") as fid:
            engine = runtime.deserialize_cuda_engine(fid.read())
    # Allocate host/device buffers (helper from the TensorRT samples).
    inputs, outputs, bindings, stream = common.allocate_buffers(engine)
    with engine.create_execution_context() as context:
        # Random input standing in for a preprocessed image batch.
        batch4d = np.random.random([1, 3, 320, 180]).astype(np.float32)
        np.copyto(inputs[0].host, batch4d.ravel())
        # Run inference; returns a list of flat host arrays, one per output.
        trt_outputs = common.do_inference_v2(context, bindings=bindings,
                                             inputs=inputs, outputs=outputs,
                                             stream=stream)
        print(trt_outputs)
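do_inference_v2 returns flat host arrays, so the output must be reshaped before use. A minimal post-processing sketch, assuming the model outputs 30 class logits (num_classes=30 above):

# reshape the flat output into (batch, num_classes) and take a softmax
logits = np.asarray(trt_outputs[0]).reshape(1, 30)
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
print("predicted class:", probs.argmax(axis=1))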
In production we ran into the following problems:
Cask Error in checkCaskExecError<false>: 10 (Cask Convolution execution)
Fix: initialize a separate PyCUDA context inside each thread (see the sketch after this list).
invalid device context - no currently active context
Fix: follow the multi-threading example at https://wiki.tiker.net/PyCuda/Examples/MultipleThreads/
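A minimal sketch of the per-thread context pattern both fixes rely on, assuming a single GPU (device 0); the thread body is a placeholder for engine deserialization and inference:

import threading
import pycuda.driver as cuda

cuda.init()  # initialize the CUDA driver once, in the main thread

class InferenceThread(threading.Thread):
    def run(self):
        # each thread creates and owns its own CUDA context
        self.ctx = cuda.Device(0).make_context()
        try:
            pass  # deserialize the engine and run inference here
        finally:
            self.ctx.pop()  # release the context when the thread exits

threads = [InferenceThread() for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()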
torch.onnx.export parameter reference: https://blog.csdn.net/QFJIZHI/article/details/105245292
TensorRT developer guide: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#overview
PyCUDA documentation: https://wiki.tiker.net/PyCuda/
How TensorRT accelerates inference: https://blog.csdn.net/xh_hit/article/details/79769599
Original-work statement: this article was published on the Tencent Cloud Developer Community with the author's authorization; reproduction without permission is prohibited.
For infringement concerns, contact cloudcommunity@tencent.com for removal.