PyTorch implementation of the YOLO (You Only Look Once) v2

Overview

YOLOv2 is one of the most popular one-stage object detectors. This project uses PyTorch as its development framework to increase productivity, and uses ONNX to convert models to Caffe2 for easier engineering deployment. If this project benefits you, a donation would be appreciated (via PayPal, WeChat Pay or Alipay).

Designs

  • Flexible configuration design. Program settings are configurable and can be modified from the command line, either by overlaying configuration files (-c/--config option) or by editing individual settings (-m/--modify option).

  • Monitoring via TensorBoard, including loss values and debugging images (such as IoU heatmaps, ground-truth bounding boxes and predicted bounding boxes).

  • Parallel model training design. Different models are saved into separate directories so that they can be trained simultaneously.

  • NoSQL storage of evaluation results. Evaluation results are stored in a NoSQL database together with multiple dimensions of information, which is useful when analyzing a large number of experiment results.

  • Time-based output design. Runtime information (such as the model, the TensorBoard summaries, and the evaluation results) is saved periodically at a predefined time interval.

  • Checkpoint management. Only the most recent checkpoint files (.pth) are kept in the model directory; older ones are deleted.

  • NaN debugging. When a NaN loss is detected, the running environment (the data batch) and the model are exported so that the cause can be analyzed.

  • Unified data cache design. Various datasets are converted into a unified data cache via corresponding cache plugins. Plugins are already implemented for datasets such as PASCAL VOC and MS COCO.

  • Arbitrarily replaceable model plugin design. The main deep neural network (DNN) can easily be replaced via configuration settings. Multiple models are already provided, such as Darknet, ResNet, Inception v3 and v4, MobileNet and DenseNet.

  • Extendable data preprocessing plugin design. The original images (of different sizes) and labels are processed by a sequence of operations to form a training batch (images of the same size, with padded bounding box lists). Multiple preprocessing plugins are already implemented, including augmentation operators that process images and labels simultaneously (such as random rotation and random flip), operators that resize both images and labels to a fixed size within a batch (such as random crop), and operators that augment images without touching labels (such as random blur, random saturation and random brightness).
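The preprocessing pipeline in the last item above can be illustrated with a minimal sketch. It assumes a hypothetical operator signature `(image, boxes) -> (image, boxes)`, where `image` is an HxWxC NumPy array and `boxes` is an Nx4 array of `[xmin, ymin, xmax, ymax]` pixel coordinates; the class and function names (`RandomHorizontalFlip`, `Compose`, `pad_boxes`) are illustrative and are not this project's actual plugin API. It shows a label-aware flip, a chain of operators, and padding of per-image box lists into one batch array.

```python
import random

import numpy as np


class RandomHorizontalFlip:
    """Hypothetical operator: flip an image and its boxes together with probability p."""

    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, image, boxes):
        if random.random() < self.p:
            width = image.shape[1]
            image = np.ascontiguousarray(image[:, ::-1])
            boxes = boxes.copy()
            # Mirror x coordinates: xmin' = W - xmax, xmax' = W - xmin.
            boxes[:, [0, 2]] = width - boxes[:, [2, 0]]
        return image, boxes


class Compose:
    """Apply a sequence of (image, boxes) operators in order."""

    def __init__(self, ops):
        self.ops = ops

    def __call__(self, image, boxes):
        for op in self.ops:
            image, boxes = op(image, boxes)
        return image, boxes


def pad_boxes(box_lists):
    """Pad variable-length box lists into a (B, max_n, 4) array plus per-image counts."""
    counts = np.array([len(b) for b in box_lists], dtype=np.int64)
    max_n = max((len(b) for b in box_lists), default=0)
    padded = np.zeros((len(box_lists), max_n, 4), dtype=np.float32)
    for i, boxes in enumerate(box_lists):
        padded[i, :len(boxes)] = boxes  # Remaining rows stay zero-padded.
    return padded, counts
```

Keeping the box counts alongside the zero-padded array lets a loss function ignore the padding rows. Image-only operators such as blur or brightness would fit the same signature and simply return `boxes` unchanged.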

Features

  • Reproduce the original paper's training results.
  • Multi-scale training.
  • Dimension clusters.
  • Darknet model file (.weights) parser.
  • Detection from images and camera input.
  • Video file processing.
  • Multi-GPU support.
  • Distributed training.
  • Focal loss.
  • Channel-wise model parameter analyzer.
  • Automatic adjustment of the number of channels.
  • Receptive field analyzer.
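The "Dimension clusters" feature refers to the YOLOv2 technique of choosing anchor box priors by running k-means on the widths and heights of the training boxes, using the distance d(box, centroid) = 1 - IoU(box, centroid) instead of Euclidean distance. The following is a minimal NumPy sketch of that idea, not this project's implementation; the function names `iou_wh` and `dimension_clusters`, the random initialization, and the use of the mean to update centroids are assumptions made for illustration.

```python
import numpy as np


def iou_wh(boxes, centroids):
    """IoU between (N, 2) and (K, 2) width/height pairs, assuming boxes share a common center."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    union = area_b[:, None] + area_c[None, :] - inter
    return inter / union  # Shape (N, K).


def dimension_clusters(boxes, k, iterations=100, seed=0):
    """Hypothetical k-means over box sizes with distance 1 - IoU; returns (k, 2) anchors."""
    boxes = np.asarray(boxes, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct training boxes.
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iterations):
        # Assign each box to the centroid with the smallest 1 - IoU distance.
        assign = np.argmin(1.0 - iou_wh(boxes, centroids), axis=1)
        # Move each centroid to the mean size of its members (keep empty clusters as-is).
        new = np.array([
            boxes[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids
```

Using 1 - IoU makes the clustering independent of absolute box size, so large boxes do not dominate the error the way they would with Euclidean distance; this is the motivation given in the YOLOv2 paper.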

Quick Start

This project uses Python 3. To install the required libraries, run the following command in a terminal.

sudo pip3 install -r requirements.txt

quick_start.sh contains examples of detection and evaluation. Running this script downloads multiple datasets and models (aria2 is required); the models are in the original Darknet format and are converted into PyTorch format. The datasets are cached into separate data profiles, and the models are evaluated on the cached data. Finally, the models are used to detect objects in an example image, and the detection results are displayed.

License

This project is released as open-source software under the GNU Lesser General Public License version 3 (LGPL v3).

Owner
申瑞珉 (Ruimin Shen)