An ultra fast tiny model for lane detection, using onnx_parser, TensorRTAPI, torch2trt to accelerate. our model support for int8, dynamic input and profiling. (Nvidia-Alibaba-TensoRT-hackathon2021)

Last update: Dec 27, 2022

Overview

Ultra_Fast_Lane_Detection_TensorRT

An ultra fast tiny model for lane detection, using onnx_parser, TensorRTAPI to accelerate. our model support for int8, dynamic input and profiling. (Nvidia-Alibaba-TensoRT-hackathon2021)
这是一个基于TensorRT加速UFLD的repo，包含PyThon ONNX Parser以及C++ TensorRT API版本, 还包括Torch2TRT版本, 对源码和论文感兴趣的请参见：https://github.com/cfzd/Ultra-Fast-Lane-Detection

一. PyThon ONNX Parser

1. How to run

1) pip install -r requirements.txt

2) TensorRT7.x wil be fine, and other version may got some errors

2) For PyTorch, you can also try another version like 1.6, 1.5 or 1.4

2. Build ONNX（将训练好的pth/pt模型转换为onnx）

1) static（生成静态onnx模型）:
python3 torch2onnx.py onnx_dynamic_int8/configs/tusimple_4.py --test_model ./tusimple_18.pth 

2) dynamic（生成支持动态输入的onnx模型）:
First: vim torch2onnx.py
second: change "fix" from "True" to "False"
python3 torch2onnx.py onnx_dynamic_int8/configs/tusimple_4.py --test_model ./tusimple_18.pth

3. Build trt engine（将onnx模型转换为TensorRT的推理引擎）

We support many different types of engine export, such as static fp32, fp16, dynamic fp32, fp16, and int8 quantization
我们支持多种不同类型engine的导出，例如：静态fp32、fp16，动态fp32、fp16，以及int8的量化

static(fp32, fp16): 对于静态模型的导出，终端输入：

fp32:
python3 build_engine.py --onnx_path model_static.onnx --mode fp32<br/>
fp16:
python3 build_engine.py --onnx_path model_static.onnx --mode fp16<br/>

dynamic(fp32, fp16): 对于动态模型的导出，终端输入：

fp32:
python3 build_engine.py --onnx_path model_dynamic.onnx --mode fp32 --dynamic
fp16:
python3 build_engine.py --onnx_path model_dynamic.onnx --mode fp16 --dynamic

int8 quantization 如果想使用int8量化，终端输入：

python3 build_engine.py --onnx_path model_static.onnx --mode int8 --int8_data_path data/testset1000
# （int8_data_Path represents the calibration dataset）
# （其中int8_data_path表示校正数据集)

4. evaluate(compare)

（If you want to compare the acceleration and accuracy of reasoning through TRT with using pytorch, you can run the script）
（如果您想要比较通过TRT推理后，相对于使用PyTorch的加速以及精确度情况，可以运行该脚本）

python3 evaluate.py --pth_path PATH_OF_PTH_MODEL --trt_path PATH_OF_TRT_MODEL

二. torch2trt

torch2trt is an easy tool to convert pytorch model to tensorrt, you can check model details here:
https://github.com/NVIDIA-AI-IOT/torch2trt
（torch2trt 是一个易于使用的PyTorch到TensorRT转换器）

How to run

1) git clone https://github.com/NVIDIA-AI-IOT/torch2trt

2) python setup.py install

2) PyTorch >= 1.6 (other versions may got some errors)

生成trt模型

python3 export_trt.py

torch2trt 预测demo (可视化)

python3 demo_torch2trt.py --trt_path PATH_OF_TRT_MODEL --data_path PATH_OF_YOUR_IMG

evaluated

python3 evaluate.py --pth_path PATH_OF_PTH_MODEL --trt_path PATH_OF_TRT_MODEL --data_path PATH_OF_YOUR_IMG --torch2trt

三. C++ TensorRT API

生成权重文件

python3 export_trtcy.py

trt模型生成

修改第十行为 #define USE_FP32,则为FP32模式, 修改第十行为 #define USE_FP16,则为FP16模式

mkdir build
cd build
cmake ..
make
./lane_det -transfer             //  'lane_det.engine'

Tensorrt预测

./lane_det -infer  ../imgs

四. trtexec

test tensorrt_dynamic_model on terminal, for instance, for batch_size=BATCH_SIZE, just run:

trtexec  --explicitBatch --minShapes=1x3x288x800 --optShapes=1x3x288x800 --maxShapes=32x3x288x800 --shapes=BATCH_SIZEx3x288x800 --loadEngine=lane_fp32_dynamic.trt --noDataTransfers --dumpProfile --separateProfileRun

You might also like...

Gpt2-WebAPI - The objective of this API is to provide the 3 best possible responses to sentences that the user would input via http GET request as a parameter

This repository is a modification of: https://github.com/openai/gpt-2 for our sp

1 Jan 6, 2022

Arabic-Phonetic-Output - You can input the phonetic version of any Arabic text here. This software will show you output in Arabic (with vowels)

Arabic-Phonetic-Output You can input the phonetic version of any Arabic text her

1 Dec 30, 2021

One Stop Anomaly Shop: Anomaly detection using two-phase approach: (a) pre-labeling using statistics, Natural Language Processing and static rules; (b) anomaly scoring using supervised and unsupervised machine learning.

One Stop Anomaly Shop (OSAS) Quick start guide Step 1: Get/build the docker image Option 1: Use precompiled image (might not reflect latest changes):

148 Dec 26, 2022

Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller`` whi ch allows you to build, view, manipulate and query pattern models.

Colibri Core by Maarten van Gompel, [email protected], Radboud University Nijmegen Licensed under GPLv3 (See http://www.gnu.org/licenses/gpl-3.0.html

122 Nov 17, 2022

:hot_pepper: R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)

R²SQL The PyTorch implementation of paper Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing. (AAAI 2021) Requirement

60 Dec 31, 2022

AIDynamicTextReader - A simple dynamic text reader based on Artificial intelligence

AI Dynamic Text Reader: This is a simple dynamic text reader based on Artificial

1 Jan 18, 2022

A fast Text-to-Speech (TTS) model. Work well for English, Mandarin/Chinese, Japanese, Korean, Russian and Tibetan (so far). 快速语音合成模型，适用于英语、普通话/中文、日语、韩语、俄语和藏语（当前已测试）。

简体中文 | English 并行语音合成 [TOC] 新进展 2021/04/20 合并 wavegan 分支到 main 主分支，删除 wavegan 分支！ 2021/04/13 创建 encoder 分支用于开发语音风格迁移模块！ 2021/04/13 softdtw 分支支持使用 Sof

161 Dec 19, 2022

Simple and efficient RevNet-Library with DeepSpeed support

RevLib Simple and efficient RevNet-Library with DeepSpeed support Features Half the constant memory usage and faster than RevNet libraries Less memory

112 Dec 5, 2022

A high-level yet extensible library for fast language model tuning via automatic prompt search

ruPrompts ruPrompts is a high-level yet extensible library for fast language model tuning via automatic prompt search, featuring integration with Hugg

37 Dec 7, 2022

Comments

bug in UFLD_C++/main.cpp

in function softmax_mul() : exp() don't substruct channel's (100) largest value; int funcion argmax(): "int max" should change to "float max".

opened by tangjianping54 0
请问怎么用CULane数据集训练的权重来推理

我使用UFLD_C++来进行推理，修改了export_trtcy.py中的model = parsingNet(pretrained=False, backbone='18', cls_dim=(101, 56, 4), use_aux=False).cuda()，改为model = parsingNet(pretrained=False, backbone='18', cls_dim=(201, 18, 4), use_aux=False).cuda()，并且把OUTPUT_C改成201，把OUTPUT_H改成18，把OUTPUT_W改为4. 然后运行./lane_det -transfer的时候抛出了下面的错误： ./lane_det -transfer Loading weights: ../lane_culane.trtcy Platform supports fp16 mode and use it !!! Building engine, please wait for a while... [08/29/2022-11:29:31] [E] [TRT] (Unnamed Layer* 73) [Constant]: constant weights has count 29638656 but 46333952 was expected [08/29/2022-11:29:31] [E] [TRT] Could not compute dimensions for (Unnamed Layer* 73) [Constant]_output, because the network is not valid. [08/29/2022-11:29:31] [E] [TRT] Network validation failed. Build engine successfully! lane_det: /home/juche/Desktop/lmf_workspace/Ultra_Fast_Lane_Detection_TensorRT/UFLD_C++/UFLD/UFLD_net.cpp:138: void UFLD_net::APIToModel(nvinfer1::IHostMemory**): Assertion `engine != nullptr' failed. Aborted (core dumped)

请问我该怎么办？

opened by limengfei3675 1
Unpickling issue with torch2trt

I converted the tusimple_18.pth weight from the original UFLD repo using torch2onnx.py and build_engine.py scripts to a trt file. Running evaluate.py shows Inference time with PyTorch = 141.777 ms and Inference time with TensorRT_static = 27.395 ms in fp16. However, running UFLD_torch2trt/demo_torch2trt.py returns this error: Traceback (most recent call last): File "UFLD_torch2trt/demo_torch2trt.py", line 96, in <module> demo_with_torch2trt(trt_path, data_path) File "UFLD_torch2trt/demo_torch2trt.py", line 31, in demo_with_torch2trt model_trt.load_state_dict(torch.load(trt_file_path)) File "/home/nam/.local/lib/python3.6/site-packages/torch/serialization.py", line 593, in load return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args) File "/home/nam/.local/lib/python3.6/site-packages/torch/serialization.py", line 762, in _legacy_load magic_number = pickle_module.load(f, **pickle_load_args) _pickle.UnpicklingError: unpickling stack underflow It appears the issue mostly comes from loading old torchvision models, I tried to delete torch caches but it didnt work. I tried for both static and dynamic model but the result is the same. :(

opened by namKolorfuL 0
Issue with demo_trt.py

Hi, I downloaded tusimple_18.pth weight from the original UFLD repo and converted it to trt using your scipts in UFLD_Tiny. However, when doing inference with demo_trt.py, i got this error:

[email protected]:~/Desktop/Ultra_Fast_Lane_Detection_TensorRT$ python3 UFLD_Tiny/demo_trt.py --model ./model_static_fp16 Loading TRT file from path ./model_static_fp16.trt... [array([-0.2890625 , -1. , -1.4892578 , ..., 2.9804688 , 0.18823242, 9.140625 ], dtype=float32)] Traceback (most recent call last): File "UFLD_Tiny/demo_trt.py", line 123, in <module> main() File "UFLD_Tiny/demo_trt.py", line 93, in main out_j = trt_outputs[0].reshape(97, 56, 4) # tiny版本不一样 ValueError: cannot reshape array of size 22624 into shape (97,56,4) The output looks like a 1-D array. Any idea how to solve this? My system: Jetson TX2, Jetpack 4.5.1, Ubuntu 18.04, CUDA 10.2, Tensorrt 7.1.3

opened by namKolorfuL 0

Releases(TRT2021)

TRT2021(May 1, 2021)

Source code(tar.gz)
Source code(zip)

Owner

steven.yan

Algorithm engineer

GitHub Repository

Chinese segmentation library

What is loso? loso is a Chinese segmentation system written in Python. It was developed by Victor Lin ( Fang-Pen Lin 82 Jun 28, 2022

Winner system (DAMO-NLP) of SemEval 2022 MultiCoNER shared task over 10 out of 13 tracks.

KB-NER: a Knowledge-based System for Multilingual Complex Named Entity Recognition The code is for the winner system (DAMO-NLP) of SemEval 2022 MultiC

116 Dec 27, 2022

Shellcode antivirus evasion framework

Schrodinger's Cat Schrodinger'sCat is a Shellcode antivirus evasion framework Technical principle Please visit my blog https://idiotc4t.com/ How to us

27 Jul 09, 2022

A list of NLP(Natural Language Processing) tutorials

NLP Tutorial A list of NLP(Natural Language Processing) tutorials built on PyTorch. Table of Contents A step-by-step tutorial on how to implement and

1.3k Dec 25, 2022

End-to-end text to speech system using gruut and onnx. There are 40 voices available across 8 languages.

End to end text to speech system using gruut and onnx

673 Dec 28, 2022

Reproducing the Linear Multihead Attention introduced in Linformer paper (Linformer: Self-Attention with Linear Complexity)

Linear Multihead Attention (Linformer) PyTorch Implementation of reproducing the Linear Multihead Attention introduced in Linformer paper (Linformer:

58 Dec 23, 2022

DeepSpeech - Easy-to-use Speech Toolkit including SOTA ASR pipeline, influential TTS with text frontend and End-to-End Speech Simultaneous Translation.

(简体中文|English) Quick Start | Documents | Models List PaddleSpeech is an open-source toolkit on PaddlePaddle platform for a variety of critical tasks i

5.6k Jan 03, 2023

An ultra fast tiny model for lane detection, using onnx_parser, TensorRTAPI, torch2trt to accelerate. our model support for int8, dynamic input and profiling. (Nvidia-Alibaba-TensoRT-hackathon2021)

Related tags

Overview

Ultra_Fast_Lane_Detection_TensorRT

一. PyThon ONNX Parser

1. How to run

2. Build ONNX（将训练好的pth/pt模型转换为onnx）

3. Build trt engine（将onnx模型转换为TensorRT的推理引擎）

static(fp32, fp16): 对于静态模型的导出，终端输入：

dynamic(fp32, fp16): 对于动态模型的导出，终端输入：

int8 quantization 如果想使用int8量化，终端输入：

4. evaluate(compare)

二. torch2trt

How to run

生成trt模型

torch2trt 预测demo (可视化)

evaluated

三. C++ TensorRT API

生成权重文件

trt模型生成

修改第十行为 #define USE_FP32,则为FP32模式, 修改第十行为 #define USE_FP16,则为FP16模式

Tensorrt预测

四. trtexec

test tensorrt_dynamic_model on terminal, for instance, for batch_size=BATCH_SIZE, just run:

You might also like...

Gpt2-WebAPI - The objective of this API is to provide the 3 best possible responses to sentences that the user would input via http GET request as a parameter

Arabic-Phonetic-Output - You can input the phonetic version of any Arabic text here. This software will show you output in Arabic (with vowels)

One Stop Anomaly Shop: Anomaly detection using two-phase approach: (a) pre-labeling using statistics, Natural Language Processing and static rules; (b) anomaly scoring using supervised and unsupervised machine learning.

:hot_pepper: R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)

AIDynamicTextReader - A simple dynamic text reader based on Artificial intelligence

A fast Text-to-Speech (TTS) model. Work well for English, Mandarin/Chinese, Japanese, Korean, Russian and Tibetan (so far). 快速语音合成模型，适用于英语、普通话/中文、日语、韩语、俄语和藏语（当前已测试）。

Simple and efficient RevNet-Library with DeepSpeed support

A high-level yet extensible library for fast language model tuning via automatic prompt search

Comments

bug in UFLD_C++/main.cpp

请问怎么用CULane数据集训练的权重来推理

Unpickling issue with torch2trt

Issue with demo_trt.py

Releases(TRT2021)

TRT2021(May 1, 2021)

Owner

steven.yan

Chinese segmentation library

Winner system (DAMO-NLP) of SemEval 2022 MultiCoNER shared task over 10 out of 13 tracks.

Shellcode antivirus evasion framework

A list of NLP(Natural Language Processing) tutorials

End-to-end text to speech system using gruut and onnx. There are 40 voices available across 8 languages.

Reproducing the Linear Multihead Attention introduced in Linformer paper (Linformer: Self-Attention with Linear Complexity)

DeepSpeech - Easy-to-use Speech Toolkit including SOTA ASR pipeline, influential TTS with text frontend and End-to-End Speech Simultaneous Translation.

ASCEND Chinese-English code-switching dataset

Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.

تولید اسم های رندوم فینگیلیش

Blender addon - Scrub timeline from viewport with a shortcut

Calibre recipe to convert latest issue of Analyse & Kritik into an ebook

A look-ahead multi-entity Transformer for modeling coordinated agents.

NeuTex: Neural Texture Mapping for Volumetric Neural Rendering

Natural Language Processing Specialization

Official Stanford NLP Python Library for Many Human Languages

Beautiful visualizations of how language differs among document types.

Unofficial PyTorch implementation of Google AI's VoiceFilter system

The Easy-to-use Dialogue Response Selection Toolkit for Researchers

A high-level yet extensible library for fast language model tuning via automatic prompt search