Introduction to TensorRT

2022-04-23 21:02:00 【Top of the program】

1. Basic TensorRT Workflow

[Figure: basic TensorRT workflow]

2. Conversion and Deployment Options
2.1 Conversion

Using TF-TRT. For converting TensorFlow models, the TensorFlow integration (TF-TRT) provides both model conversion and a high-level runtime API. It can also fall back to the TensorFlow implementation wherever TensorRT does not support a particular operator.
Automatic ONNX conversion from .onnx files. A more performant option for automatic model conversion and deployment is to convert via ONNX. ONNX is a framework-independent format: models from TensorFlow, PyTorch, and other frameworks can be exported to it. TensorRT supports automatic conversion from ONNX files using either the TensorRT API or trtexec; the latter is what this guide uses. ONNX conversion is all-or-nothing: every operation in your model must be supported by TensorRT, or you must provide a custom plug-in for each unsupported operation. The end result of ONNX conversion is a single TensorRT engine, which incurs less overhead than using TF-TRT.
Building the network manually with the TensorRT API (in C++ or Python). For maximum performance and customizability, you can also construct TensorRT engines manually using the TensorRT network definition API. This essentially means building a network identical to the target model in TensorRT, operation by operation, using only TensorRT operations. After the TensorRT network is created, you export only the weights of the model from the framework and load them into the TensorRT network. More information on constructing a model with the TensorRT network definition API can be found in:
Creating A Network Definition From Scratch Using The C++ API
Creating A Network Definition From Scratch Using The Python API
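
The difference between the all-or-nothing ONNX conversion and the TF-TRT fallback can be illustrated with a small toy model. This is only a sketch: it does not call TensorRT at all, and the operator names, the `SUPPORTED` set, and the `ConversionPlan` type are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversionPlan:
    """Outcome of a toy conversion check (illustrative only)."""
    convertible: bool
    trt_ops: tuple       # ops that would run inside TensorRT
    fallback_ops: tuple  # ops left to the framework (TF-TRT style)
    missing_ops: tuple   # ops that block an all-or-nothing conversion


def plan_onnx(model_ops, supported, plugins=frozenset()):
    """All-or-nothing: every op must be supported natively or by a plug-in."""
    available = set(supported) | set(plugins)
    missing = tuple(op for op in model_ops if op not in available)
    if missing:
        return ConversionPlan(False, (), (), missing)
    return ConversionPlan(True, tuple(model_ops), (), ())


def plan_tf_trt(model_ops, supported):
    """Fallback: unsupported ops stay in the framework, the rest go to TensorRT."""
    trt_ops = tuple(op for op in model_ops if op in supported)
    fallback = tuple(op for op in model_ops if op not in supported)
    return ConversionPlan(True, trt_ops, fallback, ())


# Hypothetical operator names, chosen only for the example.
SUPPORTED = {"Conv", "Relu", "MatMul"}
MODEL = ["Conv", "Relu", "MyCustomOp", "MatMul"]

onnx_plan = plan_onnx(MODEL, SUPPORTED)
onnx_with_plugin = plan_onnx(MODEL, SUPPORTED, plugins={"MyCustomOp"})
tf_trt_plan = plan_tf_trt(MODEL, SUPPORTED)
```

In this toy model a single unsupported operator blocks the ONNX path until a plug-in is supplied, while the TF-TRT path still converts and simply leaves that operator to the framework.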
2.2 Deployment

There are three options for deploying a model with TensorRT:

‣ Deploying within TensorFlow
‣ Using the standalone TensorRT runtime API
‣ Using NVIDIA Triton Inference Server

Your choice of deployment determines the steps required to convert the model.

When using TF-TRT, the most common deployment option is simply to deploy within TensorFlow. TF-TRT conversion produces a TensorFlow graph with TensorRT operations inserted into it. This means you can run a TF-TRT model from Python just like any other TensorFlow model.
The TensorRT runtime API allows the lowest overhead and the finest-grained control, but operators that TensorRT does not natively support must be implemented as plug-ins (a library of prewritten plug-ins is available). The most common path for deploying with the runtime API is ONNX export from a framework, which is covered in the following sections of this guide.
Finally, NVIDIA Triton Inference Server is open-source inference-serving software that lets teams deploy trained models from any framework (TensorFlow, TensorRT, PyTorch, ONNX Runtime, or a custom framework), loaded from local storage, Google Cloud Platform, or AWS S3, on any GPU- or CPU-based infrastructure (cloud, data center, or edge). It is a flexible project with several distinctive features, such as concurrent execution of both heterogeneous models and multiple copies of the same model (multiple copies of a model can further reduce latency), as well as load balancing and model analysis. It is a good choice if you need to serve models over HTTP, for example in a cloud inference solution.
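
Serving over HTTP can be sketched as follows. Triton's HTTP/REST API follows the KServe v2 inference protocol, where an inference request is a JSON body posted to `/v2/models/<model>/infer`. The server address, the model name `resnet50`, the tensor name `input`, and the tiny shape used here are assumptions made up for the example; they are not taken from the article.

```python
import json
from urllib import request


def build_infer_request(base_url, model_name, input_name, shape, data, datatype="FP32"):
    """Build (but do not send) a KServe-v2-style inference request."""
    url = f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"
    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": list(shape),
                "datatype": datatype,
                "data": list(data),  # flattened tensor contents
            }
        ]
    }
    return request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server address, model name, and tensor name.
req = build_infer_request("http://localhost:8000", "resnet50", "input", (1, 3), [0.1, 0.2, 0.3])

# Sending the request needs a running server with that model loaded:
# with request.urlopen(req) as resp:
#     result = json.load(resp)
```

Building the request separately from sending it keeps the example runnable without a server; in practice the official Triton client libraries are usually a more convenient choice than hand-written requests.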
2.3 Selecting the Correct Workflow
The two most important factors in selecting how to convert and deploy your model are:
1. Your choice of framework.
2. Your preferred TensorRT runtime to target.
The following flowchart covers the different workflows in this guide and helps you select a path based on these two factors.
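
As a rough illustration of how these two factors narrow the choice, the helper below encodes only what the text of section 2 states. It is not a transcription of the flowchart, and the function name, the string labels, and the `max_control` flag are hypothetical. The Triton branch in particular is an assumption: it treats Triton as able to serve either kind of converted model.

```python
def candidate_conversions(framework, runtime, max_control=False):
    """Return conversion options consistent with section 2 (not the official flowchart).

    framework: e.g. "tensorflow" or "pytorch"
    runtime:   "tensorflow", "tensorrt" (standalone runtime API), or "triton"
    """
    framework = framework.lower()
    runtime = runtime.lower()
    if runtime == "tensorflow":
        # Deploying within TensorFlow is the TF-TRT path.
        if framework != "tensorflow":
            raise ValueError("deploying within TensorFlow requires a TensorFlow model")
        return ["TF-TRT"]
    if runtime == "tensorrt":
        # ONNX export is the most common path to the runtime API.
        options = ["ONNX"]
        if max_control:
            options.append("TensorRT network definition API")
        return options
    if runtime == "triton":
        # Assumption: Triton can serve either artifact.
        return ["TF-TRT", "ONNX"] if framework == "tensorflow" else ["ONNX"]
    raise ValueError(f"unknown runtime: {runtime!r}")
```

For example, `candidate_conversions("pytorch", "tensorrt")` returns `["ONNX"]`, matching the statement that ONNX export is the usual route to the runtime API.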

3. Example Deployment Using ONNX
ONNX conversion is generally the most performant way of automatically converting an ONNX model to a TensorRT engine. This section walks through the five basic steps for converting a pre-trained ONNX model to a TensorRT engine in the context of deployment.
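
The conversion step itself can be sketched as a trtexec invocation. `--onnx`, `--saveEngine`, and `--fp16` are trtexec options; the file names `model.onnx` and `model.trt` and the helper functions are placeholders chosen for this example, and the sketch assumes trtexec is installed and on the PATH when `convert` is actually called.

```python
import shlex
import subprocess


def trtexec_command(onnx_path, engine_path, fp16=False):
    """Build the argument list for converting an ONNX file to a TensorRT engine."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # allow reduced-precision kernels
    return cmd


def convert(onnx_path, engine_path, fp16=False):
    """Run trtexec; raises CalledProcessError if the conversion fails."""
    subprocess.run(trtexec_command(onnx_path, engine_path, fp16), check=True)


# Placeholder file names; printing shows the equivalent shell command.
cmd = trtexec_command("model.onnx", "model.trt")
print(shlex.join(cmd))
```

Because ONNX conversion is all-or-nothing, a failing `convert` call typically points to an operator that TensorRT does not support and that would need a plug-in.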

References
https://github.com/NVIDIA/TensorRT/blob/main/quickstart/IntroNotebooks/4.%20Using%20PyTorch%20through%20ONNX.ipynb

https://github.com/ultralytics/yolov5/issues/251

Copyright notice
This article was written by [Top of the program]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/111/202204210545090116.html
