Deploying an mmdeploy-exported TensorRT model on Triton: a failure log
2022-08-04 05:38:00 【gy-77】
A record of the process. Deployment ultimately did not succeed, most likely because of the Ubuntu version. I don't have time to dig further right now, so I'm writing this down and will fill in the gaps when I come back to it.
Triton demo
git clone -b r22.06 https://github.com/triton-inference-server/server.git
cd server/docs/examples
./fetch_models.sh
# Build and start the server in container 1
docker run --gpus=1 --rm --net=host -v /home/xbsj/gaoying/triton/triton_demo/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:22.06-py3 tritonserver --model-repository=/models
# Enter container 2, which will send the requests
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:22.06-py3-sdk
# Send a request from container 2
/workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
Installing Triton and starting the service (Docker)
Triton container versions and their matching CUDA/TensorRT versions: Release Notes :: NVIDIA Deep Learning Triton Inference Server Documentation
More detail here: Frameworks Support Matrix :: NVIDIA Deep Learning Frameworks Documentation
1️⃣ Installing Triton
Pull the Docker image; 20.11 is the version number, and you can pick one here: Triton Inference Server (Formerly TensorRT Inference Server) | NVIDIA NGC
Create a file named Dockerfile.triton with the following content:
FROM nvcr.io/nvidia/tritonserver:20.11-py3
# append extra RUN instructions here if you need to customize the image
保存并推出,运行下面命令安装triton 的 docker。先创建Dockerfile.triton文件再安装的好处是,可以把镜像命名为triton:2104,方便查看。并且如果想对triton docker镜像添加一些操作的话,可以在Dockerfile.triton文件中继续添加。
nvidia-docker build -f Dockerfile.triton -t triton:2011 .
2️⃣ Writing the model configuration file
Create a local directory to be mounted into the Docker container.
Directory layout to mount (a Python scaffold sketch follows the tree):
.
└── model_rep                # root directory on the host to be mounted
    ├── demo1                # model 1
    │   ├── 1                # model version
    │   │   └── model.pt     # model file
    │   ├── 2                # model version
    │   │   └── model.pt     # model file
    │   └── config.pbtxt
    └── demo2                # model 2
        ├── 1
        │   └── model.pt
        └── config.pbtxt
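If helpful, this layout can be scaffolded with a few lines of Python (a sketch; the model names and version numbers are just the placeholders from the tree above):

import os

root = 'model_rep'
for model, versions in {'demo1': [1, 2], 'demo2': [1]}.items():
    for v in versions:
        os.makedirs(os.path.join(root, model, str(v)), exist_ok=True)
    # create an empty config.pbtxt, to be filled in as described below
    open(os.path.join(root, model, 'config.pbtxt'), 'a').close()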
Writing config.pbtxt
Below is an ONNX model opened in Netron. From it we can read the input and output names and their types, which is exactly what we need to fill in the input and output sections of the configuration file. What follows is the ONNX model for faster_rcnn_r50_trt together with its configuration file.
Here is the config.pbtxt corresponding to the model above:
name: "faster_rcnn_r50_trt" # 模型名,也是目录名
platform: "tensorrt_plan" # 模型对应的平台,参考文章下面给出的表格
max_batch_size : 8 # 一次送入模型的最大batch_size。
input [
{
name: "input"
data_type: TYPE_FP32
dims: [ 3,-1,-1 ] # 第一个维度默认是batch size,不用咱们配置。因此我们从第二个维度开始配置。
# 如果是可变维度,我们就用 -1
}
]
output [
{
name: "dets"
data_type: TYPE_FP32
dims: [-1,-1]
},
{
name: "labels"
data_type: TYPE_INT32
dims: [ -1 ]
}
]
default_model_filename: "end2end.engine"
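Once the server is running (section 3 below), you can cross-check that the names, types, and shapes in config.pbtxt match what Triton actually loaded by querying the model configuration over HTTP. A minimal sketch, assuming the server is up on localhost:8000:

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
# Returns the parsed config.pbtxt as a dict (name, platform, input, output, ...)
config = client.get_model_config("faster_rcnn_r50_trt")
print(config["input"])   # expect: name "input", TYPE_FP32, dims [3, -1, -1]
print(config["output"])  # expect: "dets" and "labels"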
Framework-to-platform mapping:
| Framework | platform |
|---|---|
| TensorRT | tensorrt_plan |
| TensorFlow SavedModel | tensorflow_savedmodel |
| TensorFlow GraphDef | tensorflow_graphdef |
| ONNX | onnxruntime_onnx |
| Torch | pytorch_libtorch |
Input/output data_type mapping:
| Model Config | TensorRT | TensorFlow | ONNX Runtime | PyTorch | API | NumPy |
|---|---|---|---|---|---|---|
| TYPE_BOOL | kBOOL | DT_BOOL | BOOL | kBool | BOOL | bool |
| TYPE_UINT8 | | DT_UINT8 | UINT8 | kByte | UINT8 | uint8 |
| TYPE_UINT16 | | DT_UINT16 | UINT16 | | UINT16 | uint16 |
| TYPE_UINT32 | | DT_UINT32 | UINT32 | | UINT32 | uint32 |
| TYPE_UINT64 | | DT_UINT64 | UINT64 | | UINT64 | uint64 |
| TYPE_INT8 | kINT8 | DT_INT8 | INT8 | kChar | INT8 | int8 |
| TYPE_INT16 | | DT_INT16 | INT16 | kShort | INT16 | int16 |
| TYPE_INT32 | kINT32 | DT_INT32 | INT32 | kInt | INT32 | int32 |
| TYPE_INT64 | | DT_INT64 | INT64 | kLong | INT64 | int64 |
| TYPE_FP16 | kHALF | DT_HALF | FLOAT16 | | FP16 | float16 |
| TYPE_FP32 | kFLOAT | DT_FLOAT | FLOAT | kFloat | FP32 | float32 |
| TYPE_FP64 | | DT_DOUBLE | DOUBLE | kDouble | FP64 | float64 |
| TYPE_STRING | | DT_STRING | STRING | | BYTES | dtype(object) |
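When building request payloads in Python, these strings do not have to be hard-coded; the tritonclient package ships helpers that translate between NumPy dtypes and the API column of the table above. A small sketch:

import numpy as np
from tritonclient.utils import np_to_triton_dtype, triton_to_np_dtype

# NumPy dtype -> Triton API datatype string
print(np_to_triton_dtype(np.dtype(np.float32)))  # "FP32"
print(np_to_triton_dtype(np.dtype(np.int32)))    # "INT32"
# Triton API datatype string -> NumPy dtype
print(triton_to_np_dtype("FP16"))                # <class 'numpy.float16'>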
3️⃣ Starting the service
Start the server:
--gpus all enables GPU access
-v /home/xbsj/gaoying/triton/model_rep/:/models maps the local directory into the container
8000 is the HTTP port, 8001 the gRPC port
nvcr.io/nvidia/tritonserver:21.11-py3: remember to change the tag to your own version
docker run --gpus all -p8000:8000 -p8001:8001 -p8002:8002 -v /home/xbsj/gaoying/triton/model_rep:/model_rep -v /home/xbsj/gaoying/triton/plugin_rep:/plugin_rep --env LD_PRELOAD=/plugin_rep/libmmdeploy_tensorrt_ops.so triton:2201 tritonserver --model-repository=/model_rep
Alternatively, enter the container first, then start the service:
docker run --gpus=all --network=host --shm-size=2g -v /home/xbsj/gaoying/triton/model_rep/:/models -it nvcr.io/nvidia/tritonserver:21.04-py3 # enter the container
./bin/tritonserver --model-repository=/models # start triton
docker run --gpus=all --network=host -v /home/xbsj/gaoying/triton/model_rep:/opt/ml/model -it triton:2104 # enter the container
./bin/tritonserver --model-repository=/opt/ml/model # start triton
Testing the endpoints from a client
1️⃣ Command-line test
Check whether the server is ready; run on the host:
curl -v localhost:8000/v2/health/ready
Expected output on success:
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET /v2/health/ready HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
<
* Connection #0 to host localhost left intact
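The same readiness checks are available from Python through the client library; a minimal sketch using the HTTP client against the default port:

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
print(client.is_server_live())                       # True once the server process is up
print(client.is_server_ready())                      # True once all models have loaded
print(client.is_model_ready("faster_rcnn_r50_trt"))  # readiness of a single model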
2️⃣ Testing with the Triton client libraries
gRPC
gRPC, faster rcnn r50, time for 10 iterations: 1.0688064098358154
import os
import time

import numpy as np
import tritonclient.grpc as grpcclient
from PIL import Image


def client_init(url="localhost:8001",
                ssl=False, private_key=None, root_certificates=None, certificate_chain=None,
                verbose=False):
    triton_client = grpcclient.InferenceServerClient(
        url=url,
        verbose=verbose,
        ssl=ssl,
        root_certificates=root_certificates,
        private_key=private_key,
        certificate_chain=certificate_chain)
    return triton_client


def infer_faster_rcnn_r50_trt_grpc(triton_client, model_name, input='input', dets='dets', labels='labels',
                                   compression_algorithm=None):
    inputs = []
    outputs = []
    # Declare the input tensor
    inputs.append(grpcclient.InferInput(input, [1, 3, 427, 640], "FP32"))
    # Fill the input tensor with image data
    root_dir = os.getcwd()
    img_path = os.path.join(root_dir, 'demo.jpg')  # put an image named demo.jpg in the working directory
    img = np.array(Image.open(img_path))
    img = img.astype(np.float32)
    img = img.transpose((2, 0, 1))
    img = np.expand_dims(img, axis=0)  # (1, 3, 427, 640)
    inputs[0].set_data_from_numpy(img)
    # Declare the requested outputs
    outputs.append(grpcclient.InferRequestedOutput(dets))
    outputs.append(grpcclient.InferRequestedOutput(labels))
    results = triton_client.infer(
        model_name=model_name,
        inputs=inputs,
        outputs=outputs,
        compression_algorithm=compression_algorithm
        # client_timeout=0.1
    )
    print(results)
    # results.as_numpy(dets) / results.as_numpy(labels) convert the outputs to numpy arrays


if __name__ == '__main__':
    client = client_init()
    st = time.time()
    for i in range(10):
        infer_faster_rcnn_r50_trt_grpc(triton_client=client, model_name='faster_rcnn_r50_trt')
    print("grpc faster rcnn r50, time for 10 iterations: {}".format(time.time() - st))
HTTP
HTTP, faster rcnn r50, time for 10 iterations: 1.1643376350402832
import os
import time

import gevent.ssl
import numpy as np
import tritonclient.http as httpclient
from PIL import Image


def client_init(url="localhost:8000",
                ssl=False, key_file=None, cert_file=None, ca_certs=None, insecure=False,
                verbose=False):
    if ssl:
        ssl_options = {}
        if key_file is not None:
            ssl_options['keyfile'] = key_file
        if cert_file is not None:
            ssl_options['certfile'] = cert_file
        if ca_certs is not None:
            ssl_options['ca_certs'] = ca_certs
        ssl_context_factory = None
        if insecure:
            ssl_context_factory = gevent.ssl._create_unverified_context
        triton_client = httpclient.InferenceServerClient(
            url=url,
            verbose=verbose,
            ssl=True,
            ssl_options=ssl_options,
            insecure=insecure,
            ssl_context_factory=ssl_context_factory)
    else:
        triton_client = httpclient.InferenceServerClient(
            url=url, verbose=verbose)
    return triton_client


def infer_faster_rcnn_r50_trt_http(triton_client, model_name='faster_rcnn_r50_trt',
                                   input='input', output0='dets', output1='labels',
                                   request_compression_algorithm=None,
                                   response_compression_algorithm=None):
    inputs = []
    outputs = []
    # Declare the input tensor
    inputs.append(httpclient.InferInput(input, [1, 3, 427, 640], "FP32"))
    # Fill the input tensor with image data
    root_dir = os.getcwd()
    img_path = os.path.join(root_dir, 'demo.jpg')  # put an image named demo.jpg in the working directory
    img = np.array(Image.open(img_path))
    img = img.astype(np.float32)
    img = img.transpose((2, 0, 1))
    img = np.expand_dims(img, axis=0)  # (1, 3, 427, 640)
    inputs[0].set_data_from_numpy(img)
    # output0 and output1 are the output node names from the config file
    outputs.append(httpclient.InferRequestedOutput(output0, binary_data=False))
    outputs.append(httpclient.InferRequestedOutput(output1, binary_data=False))
    results = triton_client.infer(
        model_name=model_name,
        inputs=inputs,
        outputs=outputs,
        request_compression_algorithm=request_compression_algorithm,
        response_compression_algorithm=response_compression_algorithm)
    print(results)
    # results.as_numpy(output0) / results.as_numpy(output1) convert the outputs to numpy arrays


if __name__ == '__main__':
    triton_client = client_init()
    st = time.time()
    for i in range(10):
        infer_faster_rcnn_r50_trt_http(triton_client)
    print("http faster rcnn r50, time for 10 iterations: {}".format(time.time() - st))
3️⃣ Testing with requests
requests, faster rcnn r50, time for 10 iterations: 3.843385934829712
import os
import time

import numpy as np
import requests
from PIL import Image


def infer_demo_torch_http():
    url = 'http://localhost:8000/v2/models/demo_torch/versions/1/infer'
    data = {
        "inputs": [{
            "name": "input__0",
            "shape": [2, 3],
            "datatype": "INT64",
            "data": [[1, 2, 3], [4, 5, 6]]
        }],
        "outputs": [{"name": "output__0"}, {"name": "output__1"}]
    }
    headers = {'Content-Type': 'application/json'}
    res = requests.post(url, json=data, headers=headers).json()
    print(res)


def infer_demo_onnx_http():
    url = 'http://localhost:8000/v2/models/demo_onnx/versions/1/infer'
    data = {
        "inputs": [{
            "name": "INPUT0",
            "shape": [8, 2],
            "datatype": "FP32",
            "data": [[0.1] * 2 for _ in range(8)]
        }, {
            "name": "INPUT1",
            "shape": [8, 2],
            "datatype": "INT32",
            "data": [[1] * 2 for _ in range(8)]
        }],
        "outputs": [{"name": "OUTPUT0"}, {"name": "OUTPUT1"}]
    }
    headers = {'Content-Type': 'application/json'}
    res = requests.post(url, json=data, headers=headers).json()
    print(res)


def infer_faster_rcnn_r50_onnx_http():
    root_dir = os.getcwd()
    img_path = os.path.join(root_dir, 'demo.jpg')
    img = np.array(Image.open(img_path))
    img = img.astype(np.float32)
    img = img.transpose((2, 0, 1))
    img = np.expand_dims(img, axis=0)  # (1, 3, 427, 640)
    # img = np.repeat(img, repeats=2, axis=0)  # (2, 3, 427, 640)
    img = img.tolist()
    url = 'http://localhost:8000/v2/models/faster_rcnn_r50_onnx/versions/1/infer'
    data = {
        "inputs": [{
            "name": "input",
            "shape": [1, 3, 427, 640],
            "datatype": "FP32",
            "data": img
        }],
        "outputs": [{"name": "dets"}, {"name": "labels"}]
    }
    headers = {'Content-Type': 'application/json'}
    res = requests.post(url, json=data, headers=headers).json()
    print(res)


def infer_faster_rcnn_r50_trt_http():
    root_dir = os.getcwd()
    img_path = os.path.join(root_dir, 'demo.jpg')
    img = np.array(Image.open(img_path))
    img = img.astype(np.float32)
    img = img.transpose((2, 0, 1))
    img = np.expand_dims(img, axis=0)  # (1, 3, 427, 640)
    img = img.tolist()
    url = 'http://localhost:8000/v2/models/faster_rcnn_r50_trt/versions/1/infer'
    data = {
        "inputs": [{
            "name": "input",
            "shape": [1, 3, 427, 640],
            "datatype": "FP32",
            "data": img
        }],
        "outputs": [{"name": "dets"}, {"name": "labels"}]
    }
    headers = {'Content-Type': 'application/json'}
    res = requests.post(url, json=data, headers=headers).json()
    print(res)


if __name__ == "__main__":
    print('=' * 50)
    print('| Infer demo_torch')
    print('_' * 20)
    infer_demo_torch_http()
    print('=' * 50)
    print('| Infer demo_onnx')
    print('_' * 20)
    infer_demo_onnx_http()
    print('=' * 50)
    print('| Infer faster_rcnn_r50_onnx')
    print('_' * 20)
    infer_faster_rcnn_r50_onnx_http()
    print('=' * 50)
    print('| Infer faster_rcnn_r50_trt')
    print('_' * 20)
    st = time.time()
    for _ in range(10):
        infer_faster_rcnn_r50_trt_http()
    print("requests faster rcnn r50, time for 10 iterations: {}".format(time.time() - st))
    print('=' * 50)
Load-testing Triton
First prepare the input data, input.json:
{
    "inputs": [{
        "name": "input__0",
        "shape": [2, 3],
        "datatype": "INT64",
        "data": [[1, 2, 3], [4, 5, 6]]
    }],
    "outputs": [{"name": "output__0"}, {"name": "output__1"}]
}
Install the tool we need:
sudo apt install apache2-utils
Run the load test:
ab -k -c 5 -n 500 -p input.json http://localhost:8000/v2/models/demo/versions/1/infer
The command has 5 concurrent clients call the endpoint 500 times in total, with input.json as the request body, against version 1 of the demo model.
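If you would rather stay in Python than use ab, roughly the same load pattern can be reproduced with requests plus a thread pool. A sketch, assuming the same input.json and demo model as above:

# Rough Python equivalent of the ab command: concurrency 5, 500 requests total.
import json
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = 'http://localhost:8000/v2/models/demo/versions/1/infer'
with open('input.json') as f:
    payload = json.load(f)

def one_request(_):
    return requests.post(URL, json=payload).status_code

st = time.time()
with ThreadPoolExecutor(max_workers=5) as pool:
    codes = list(pool.map(one_request, range(500)))
print("500 requests in {:.2f}s, non-200 responses: {}".format(
    time.time() - st, sum(c != 200 for c in codes)))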
Triton error collection:
INVALID_ARGUMENT: getPluginCreator could not find plugin TRTBatchedNMS version 1
A TensorRT model converted with the mmdeploy Docker image cannot be used in the Triton Docker image; it fails with the errors below. (Triton's startup log is a wall of text and hard to read at first; the trick is that lines starting with E are the errors.)
E0630 01:31:22.566631 1 logging.cc:43] INVALID_ARGUMENT: getPluginCreator could not find plugin TRTBatchedNMS version 1
E0630 01:31:22.566657 1 logging.cc:43] safeDeserializationUtils.cpp (322) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
E0630 01:31:22.566739 1 logging.cc:43] INVALID_STATE: std::exception
E0630 01:31:22.572629 1 logging.cc:43] INVALID_CONFIG: Deserialize the cuda engine failed.
E0630 01:31:22.587565 1 model_repository_manager.cc:1215] failed to load 'faster_rcnn_r50_tensorrt' version 1: Internal: unable to create TensorRT engine
Method 1 (recommended)
Reference: yolo模型部署——tensorRT模型加速+triton服务器模型部署
Just run the following command (adjust the paths and tags to your own setup):
docker run --gpus all -p8000:8000 -p8001:8001 -p8002:8002 -v /home/xbsj/gaoying/triton/model_rep:/model_rep -v /home/xbsj/gaoying/triton/plugin_rep:/plugin_rep --env LD_PRELOAD=/plugin_rep/libmmdeploy_tensorrt_ops.so triton:2104 tritonserver --model-repository=/model_rep
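Before restarting the server it is worth confirming that the plugin library can actually be loaded inside the container; loading it is what registers TRTBatchedNMS with TensorRT's plugin registry. A quick sanity check with ctypes (the path matches the -v mapping in the command above):

import ctypes

# Raises OSError if the library or one of its dependencies cannot be resolved
ctypes.CDLL('/plugin_rep/libmmdeploy_tensorrt_ops.so')
print('plugin loaded OK')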
Method 2
Solution source: end2end.engine to Triton · Issue #465 · open-mmlab/mmdeploy (github.com)
Steps (I tried this and it did not work for me... probably my own mistake):
1️⃣ Copy /root/workspace/mmdeploy/build/lib/libmmdeploy_tensorrt_ops.so from the mmdeploy Docker image to /opt/tritonserver/lib/ in the Triton Docker image.
docker run --gpus=all --network=host -v /home/xbsj/gaoying/triton/model_rep:/opt/ml/model -it triton:2104 # run on the host: enter the triton container without starting the server
docker ps # run on the host: look up the triton container id
docker cp /data/imagetd/xbsj/gaoying//mmdeploy_out/libmmdeploy_tensorrt_ops.so 7725e367f0f0:/opt/tritonserver/lib/libmmdeploy_tensorrt_ops.so # copy the file from the host into the triton container
2️⃣ Append LD_PRELOAD=libmmdeploy_tensorrt_ops.so near the end of /bin/serve, before tritonserver is launched.
vim /bin/serve
Add the following line (around line 105):
LD_PRELOAD=libmmdeploy_tensorrt_ops.so
Start the service:
./bin/tritonserver --model-store=/models
ImportError: cannot import name 'ORTWrapper' from 'mmdeploy.backend.onnxruntime' (/data/imagetd/xbsj/gaoying/mmdeploy/mmdeploy/backend/onnxruntime/__init__.py)
Solution source: Bug using ORTwrapper · Issue #37 · open-mmlab/mmdeploy (github.com)
Fix
In mmdeploy/codebase/mmdet/core/post_processing/bbox_nms.py::select_nms_index, changing return batched_dets, batched_labels to return batched_dets[:, 0:-1, :], batched_labels[:, 0:-1] may fix the bug.
Then run:
python setup.py install
and redo the model conversion afterwards.
Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
Solution reference: Bug using ORTwrapper · Issue #37 · open-mmlab/mmdeploy (github.com)