Introduction to TensorRT
2022-04-23 21:02:00 【Top of the program】
1. Basic TensorRT Workflow
2. Conversion and deployment options
2.1 Conversion
- Use TF-TRT: to convert TensorFlow models, the TensorFlow-TensorRT integration (TF-TRT) provides both model conversion and a high-level runtime API, and it can fall back to the native TensorFlow implementation for any operator that TensorRT does not support.
- Automatic conversion from .onnx files: a more performant option for automatic model conversion and deployment is to convert via ONNX. ONNX is a framework-agnostic format, so models from TensorFlow, PyTorch, and other frameworks can be exported to it. TensorRT supports automatic conversion from ONNX files using either the TensorRT API or trtexec; the latter is what this guide uses. ONNX conversion is all-or-nothing: every operation in your model must be supported by TensorRT, or you must provide a custom plug-in for the unsupported operations. The end result of an ONNX conversion is a single TensorRT engine, which incurs less overhead than TF-TRT.
- Manually construct the network using the TensorRT API (in C++ or Python): for maximum performance and customizability, you can also build the TensorRT engine manually with the TensorRT network-definition API. This essentially means rebuilding the same network as your target model using only TensorRT operations; once the TensorRT network is created, you export just the weights from your framework and load them into the TensorRT network. For more information on constructing a model with the network-definition API, see the references below (a short Python sketch follows them):
Creating A Network Definition From Scratch Using The C++ API
Creating A Network Definition From Scratch Using The Python API
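As a concrete illustration of this approach, here is a minimal, hypothetical sketch using the TensorRT 8.x Python network-definition API. The single convolution layer and its zero-filled weight arrays are placeholders; in practice you would replicate your target model layer by layer and load the real weights exported from your framework.

```python
# Minimal sketch: building a network manually with the TensorRT
# Python network-definition API (TensorRT 8.x style). The layer and
# its zero-filled weights are placeholders for a real model.
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)

# Declare the network input.
inp = network.add_input("input", trt.float32, (1, 3, 224, 224))

# One convolution layer; the weights would normally be exported
# from your training framework rather than zero-filled.
kernel = np.zeros((16, 3, 3, 3), dtype=np.float32)
bias = np.zeros(16, dtype=np.float32)
conv = network.add_convolution_nd(
    inp, num_output_maps=16, kernel_shape=(3, 3),
    kernel=trt.Weights(kernel), bias=trt.Weights(bias),
)
network.mark_output(conv.get_output(0))

# Build and serialize the engine.
config = builder.create_builder_config()
engine_bytes = builder.build_serialized_network(network, config)
```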
2.2 Deployment
There are three options for deploying a model with TensorRT:
‣ Deploying within TensorFlow
‣ Using the standalone TensorRT runtime API
‣ Using the NVIDIA Triton Inference Server
Your choice of deployment will determine the steps required to convert the model.
When using TF-TRT, the most common deployment option is simply to deploy within TensorFlow. The TF-TRT conversion produces a TensorFlow graph with TensorRT operations inserted into it, which means you can run a TF-TRT model from Python just like any other TensorFlow model.
The TensorRT runtime API allows for the lowest overhead and the finest-grained control, but operators that TensorRT does not natively support must be implemented as plug-ins (a library of pre-written plug-ins is available). The most common path for deploying with the runtime API is via ONNX export from a framework, which is covered in the following sections of this guide; a minimal inference sketch with the standalone runtime is given below.
Finally, NVIDIA Triton Inference Server is open-source inference-serving software that lets teams deploy trained models from any framework (TensorFlow, TensorRT, PyTorch, ONNX Runtime, or a custom framework), from local storage, Google Cloud Platform, or AWS S3, on any GPU- or CPU-based infrastructure (cloud, data center, or edge). It is a flexible project with several unique features, such as concurrent execution of heterogeneous models and of multiple copies of the same model (multiple copies can further reduce latency), as well as load balancing and model analytics. It is a good option if you need to serve your models over HTTP, for example in a cloud inferencing solution.
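To make the standalone-runtime option concrete, here is a minimal, hypothetical inference sketch with the TensorRT 8.x Python runtime plus pycuda. The engine file name and the input/output shapes are illustrative assumptions (a ResNet-50-style classifier built elsewhere), not fixed by the guide.

```python
# Minimal sketch: inference with the standalone TensorRT Python
# runtime (TensorRT 8.x) plus pycuda for device memory. The engine
# file and tensor shapes are illustrative assumptions.
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("resnet50.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Host buffers for one input and one output (assumed shapes).
h_input = np.random.randn(1, 3, 224, 224).astype(np.float32)
h_output = np.empty((1, 1000), dtype=np.float32)

# Matching device buffers.
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)

# Copy in, execute, copy out.
cuda.memcpy_htod(d_input, h_input)
context.execute_v2([int(d_input), int(d_output)])
cuda.memcpy_dtoh(h_output, d_output)
print("top-1 class:", int(h_output.argmax()))
```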
2.3 Choosing the right workflow
The two most important factors in choosing how to convert and deploy your model are:
1. Your choice of framework.
2. Your preferred TensorRT runtime target.
The following flowchart covers the different workflows in this guide and will help you choose a path based on these two factors.
3. Example deployment using ONNX
ONNX conversion is generally the most performant way of automatically converting an ONNX model into a TensorRT engine. In this section, we walk through the five basic steps of TensorRT conversion in the context of deploying a pretrained ONNX model.
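As a preview of step one, and following the spirit of the PyTorch-through-ONNX notebook referenced below, here is a minimal, hypothetical export sketch; resnet50 and the file names are illustrative stand-ins for your own model.

```python
# Minimal sketch: exporting a pretrained PyTorch model to ONNX so it
# can be converted to a TensorRT engine (e.g. with trtexec).
# resnet50 and the file name are illustrative placeholders.
import torch
import torchvision.models as models

model = models.resnet50(pretrained=True).eval()

# A dummy input fixes the input shape recorded in the ONNX graph.
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "resnet50.onnx",
    input_names=["input"],
    output_names=["output"],
)
```

The exported file can then be turned into an engine with trtexec, for example `trtexec --onnx=resnet50.onnx --saveEngine=resnet50.engine`.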
References
https://github.com/NVIDIA/TensorRT/blob/main/quickstart/IntroNotebooks/4.%20Using%20PyTorch%20through%20ONNX.ipynb
https://github.com/ultralytics/yolov5/issues/251
Copyright notice
This article was created by [Top of the program]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/111/202204210545090116.html