
Introduction to TensorRT

2022-04-23 21:02:00 Top of the program

1. Basic TensorRT Workflow

(Figure: the basic TensorRT workflow.)

2. Conversion and deployment options
2.1 Conversion

  • Using TF-TRT: for converting TensorFlow models, the TensorFlow integration (TF-TRT) provides both model conversion and a high-level runtime
    API, and it can fall back to TensorFlow implementations for specific operators that TensorRT does not support.
  • Automatic ONNX conversion from .onnx files: a more performant option for automatic model conversion and deployment is to convert using ONNX. ONNX is a framework-agnostic option that works with models from TensorFlow, PyTorch, and other frameworks. TensorRT supports automatic conversion from ONNX files using either the TensorRT API or trtexec; the latter is what we will use in this guide. ONNX conversion is all-or-nothing, meaning that every operation in your model must be supported by TensorRT (or you must provide custom plug-ins for the unsupported operations). The end result of ONNX conversion is a single TensorRT engine, which has less overhead than using TF-TRT.
  • Manually constructing a network using the TensorRT API (in C++ or Python)
    For the most performance and customizability, you can also build TensorRT engines manually using the TensorRT network definition API. This essentially involves rebuilding your target model in TensorRT operation by operation, using only TensorRT operations. After the TensorRT network is created, you export just the weights of your model from the framework and load them into the TensorRT network. More information about constructing a model with the TensorRT network definition API can be found here:
    Creating A Network Definition From Scratch Using The C++ API
    Creating A Network Definition From Scratch Using The Python API
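The "all-or-nothing" rule for ONNX conversion can be illustrated with a toy check. This is a minimal sketch, not a real parser: `SUPPORTED_OPS`, `ops_needing_plugins`, `can_convert`, and the operator name `MyCustomOp` are all hypothetical, and the actual set of supported operators depends on your TensorRT version. It only shows why a single unsupported operator blocks the whole conversion unless a plug-in covers it, whereas TF-TRT would fall back to TensorFlow for that operator.

```python
# Hypothetical subset of ONNX operators that a TensorRT build parses natively.
# The real list depends on the TensorRT version.
SUPPORTED_OPS = {"Conv", "Relu", "MaxPool", "Gemm", "Add", "Flatten", "Softmax"}


def ops_needing_plugins(model_ops, supported=SUPPORTED_OPS, plugins=()):
    """Return the sorted ops that are neither supported natively nor covered by a plug-in."""
    covered = set(supported) | set(plugins)
    return sorted({op for op in model_ops if op not in covered})


def can_convert(model_ops, supported=SUPPORTED_OPS, plugins=()):
    """True only if *every* op is covered: ONNX conversion has no partial result."""
    return not ops_needing_plugins(model_ops, supported, plugins)


if __name__ == "__main__":
    # Hypothetical op sequence of a small CNN; "MyCustomOp" stands in for an
    # operator that TensorRT does not implement.
    ops = ["Conv", "Relu", "MaxPool", "MyCustomOp", "Flatten", "Gemm", "Softmax"]
    print(ops_needing_plugins(ops))                   # ['MyCustomOp']
    print(can_convert(ops))                           # False
    print(can_convert(ops, plugins={"MyCustomOp"}))   # True
```

With TF-TRT the same model would still run, with `MyCustomOp` executed by TensorFlow; with ONNX conversion, it fails until a plug-in is provided.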
2.2 Deployment

There are three options for deploying a model with TensorRT:

‣ Deploying within TensorFlow
‣ Using the standalone TensorRT runtime API
‣ Using NVIDIA Triton Inference Server

Your choice of deployment option determines the steps required to convert the model.

  • When using TF-TRT, the most common deployment option is simply to deploy within TensorFlow. TF-TRT conversion produces a
    TensorFlow graph with TensorRT operations inserted into it. This means you can run TF-TRT models from Python just like any other
    TensorFlow model.

  • The TensorRT runtime API allows for the lowest overhead and the finest-grained control, but operators that TensorRT does not support natively must be implemented as plug-ins (a library
    of prewritten plug-ins is available). The most common path for deploying with the runtime API is to export the model from the framework to ONNX,
    which is covered in the following sections of this guide.

  • Finally, NVIDIA Triton Inference Server is open-source inference-serving software that lets teams deploy trained models from any framework (TensorFlow, TensorRT, PyTorch, ONNX Runtime, or a custom framework), stored locally or on Google Cloud or AWS S3, on any GPU- or CPU-based infrastructure (cloud, data center, or edge). It is a flexible project with several unique features, such as concurrent execution of heterogeneous models and of multiple copies of the same model (multiple model copies can further reduce latency), as well as load balancing and model analysis. It is a good choice if you need to serve your models over HTTP, for example in a cloud inference solution.
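Since Triton is suggested for serving models over HTTP, here is a minimal sketch of what a client request might look like. It assumes Triton's HTTP/REST endpoint, which follows the KServe v2 inference protocol (`POST /v2/models/<model>/infer` with a JSON body listing the inputs). The helper `build_infer_request`, the model name `my_trt_model`, and the input name `input_0` are hypothetical and must match your own model repository. The request is only built, not sent, so the sketch runs without a server.

```python
import json
import urllib.request


def build_infer_request(server_url, model_name, input_name, shape, data, datatype="FP32"):
    """Build, but do not send, a Triton HTTP/REST (KServe v2) inference request."""
    # The flat data list must contain exactly prod(shape) elements.
    expected = 1
    for dim in shape:
        expected *= dim
    if len(data) != expected:
        raise ValueError(f"shape {list(shape)} needs {expected} values, got {len(data)}")

    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": list(shape),
                "datatype": datatype,  # e.g. "FP32", "FP16", "INT32"
                "data": list(data),    # flattened tensor contents, row-major
            }
        ]
    }
    url = f"{server_url.rstrip('/')}/v2/models/{model_name}/infer"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical model with one 1x4 FP32 input; 8000 is Triton's default HTTP port.
    req = build_infer_request("http://localhost:8000", "my_trt_model", "input_0",
                              [1, 4], [0.1, 0.2, 0.3, 0.4])
    print(req.get_method(), req.full_url)
    # To send it to a running server:
    #   with urllib.request.urlopen(req) as resp:
    #       print(json.loads(resp.read())["outputs"])
```

For production clients, Triton also ships client libraries; plain HTTP is shown here only because it needs no extra packages.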
2.3 Choosing the right workflow
The two most important factors in choosing how to convert and deploy your model are:
  1. Your choice of framework.
  2. Your preferred TensorRT runtime target.
The following flowchart covers the different workflows described in this guide and can help you choose a path based on these two factors.
(Figure: flowchart for choosing a conversion and deployment workflow.)
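The two factors above can be encoded as a small lookup. This is a hypothetical helper (`choose_workflow` is not an NVIDIA API) and a simplified reading of the text rather than a reproduction of the flowchart: deploying inside TensorFlow implies TF-TRT, the standalone runtime API most commonly goes through ONNX, and Triton serves the converted model over HTTP. The manual network-definition path is left out for brevity.

```python
def choose_workflow(framework: str, runtime: str) -> str:
    """Suggest a conversion path from (framework, runtime target).

    Simplified reading of the text; not a reproduction of NVIDIA's flowchart.
    """
    framework = framework.strip().lower()
    runtime = runtime.strip().lower()

    if runtime == "tensorflow":
        # TF-TRT applies to TensorFlow models; unsupported ops fall back to TensorFlow.
        if framework != "tensorflow":
            raise ValueError("deploying inside TensorFlow assumes a TensorFlow model")
        return "TF-TRT: convert with TF-TRT and run the graph inside TensorFlow"
    if runtime == "tensorrt":
        # Most common runtime-API path: framework -> ONNX -> TensorRT engine.
        return "ONNX: export to ONNX, build an engine (e.g. with trtexec), run it with the TensorRT runtime API"
    if runtime == "triton":
        # Triton serves models from many frameworks, including TensorRT engines.
        return "Triton: convert as needed (e.g. to a TensorRT engine) and serve it over HTTP with Triton"
    raise ValueError(f"unknown runtime target: {runtime!r}")


if __name__ == "__main__":
    for fw, rt in [("TensorFlow", "TensorFlow"), ("PyTorch", "TensorRT"), ("PyTorch", "Triton")]:
        print(f"{fw:10} -> {rt:10}: {choose_workflow(fw, rt)}")
```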
3. Example deployment using ONNX
ONNX conversion is generally the most performant way to automatically convert an ONNX model to a TensorRT engine. In this section, we walk through the five basic steps of TensorRT conversion in the context of deploying a pretrained ONNX model.
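The conversion step of this workflow can be driven from a script. The sketch below only assembles a trtexec command line and does not run it; it assumes trtexec's `--onnx`, `--saveEngine`, `--fp16`, `--int8`, and `--shapes` options, while the helper name `trtexec_command` and the `.engine` output naming are hypothetical choices. Check `trtexec --help` for the options supported by your TensorRT version.

```python
import shlex
from pathlib import Path


def trtexec_command(onnx_path, engine_path=None, precision="fp32", shapes=None):
    """Return the argv list for converting an ONNX file with trtexec; nothing is executed."""
    onnx_path = Path(onnx_path)
    if engine_path is None:
        # Hypothetical naming convention: model.onnx -> model.engine
        engine_path = onnx_path.with_suffix(".engine")

    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]

    # FP32 is trtexec's default, so it needs no flag.
    if precision == "fp16":
        cmd.append("--fp16")
    elif precision == "int8":
        # Without calibration data, INT8 results are suitable for timing only.
        cmd.append("--int8")
    elif precision != "fp32":
        raise ValueError(f"unsupported precision: {precision!r}")

    if shapes:
        # Only needed for dynamic-shape inputs, e.g. {"input": (1, 3, 224, 224)}
        # becomes --shapes=input:1x3x224x224
        spec = ",".join(f"{name}:{'x'.join(str(d) for d in dims)}" for name, dims in shapes.items())
        cmd.append(f"--shapes={spec}")
    return cmd


if __name__ == "__main__":
    cmd = trtexec_command("resnet50.onnx", precision="fp16")
    print(shlex.join(cmd))
    # -> trtexec --onnx=resnet50.onnx --saveEngine=resnet50.engine --fp16
    # To run it (requires trtexec on PATH): subprocess.run(cmd, check=True)
```

Returning an argument list instead of a single string avoids shell-quoting problems when the list is later passed to `subprocess.run`.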

References
https://github.com/NVIDIA/TensorRT/blob/main/quickstart/IntroNotebooks/4.%20Using%20PyTorch%20through%20ONNX.ipynb

https://github.com/ultralytics/yolov5/issues/251

Copyright notice
This article was written by [Top of the program]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/111/202204210545090116.html