This code provides various models combining dilated convolutions with residual networks

Related tags

Deep Learningdrn
Overview

Overview

This code provides various models combining dilated convolutions with residual networks. Our models can achieve better performance with less parameters than ResNet on image classification and semantic segmentation.

If you find this code useful for your publications, please consider citing

@inproceedings{Yu2017,
    title     = {Dilated Residual Networks},
    author    = {Fisher Yu and Vladlen Koltun and Thomas Funkhouser},
    booktitle = {Computer Vision and Pattern Recognition (CVPR)},
    year      = {2017},
}

@inproceedings{Yu2016,
    title     = {Multi-scale context aggregation by dilated convolutions},
    author    = {Yu, Fisher and Koltun, Vladlen},
    booktitle = {International Conference on Learning Representations (ICLR)},
    year      = {2016}
}

Code Highlights

  • The pretrained model can be loaded using Pytorch model zoo api. Example here.
  • Pytorch based image classification and semantic image segmentation.
  • BatchNorm synchronization across multipe GPUs.
  • High-resolution class activiation maps for state-of-the-art weakly supervised object localization.
  • DRN-D-105 gets 76.3% mIoU on Cityscapes with only fine training annotation and no context module.

Image Classification

Image classification is meant to be a controlled study to understand the role of high resolution feature maps in image classification and the class activations rising from it. Based on the investigation, we are able to design more efficient networks for learning high-resolution image representation. They have practical usage in semantic image segmentation, as detailed in image segmentation section.

Models

Comparison of classification error rate on ImageNet validation set and numbers of parameters. It is evaluated on single center 224x224 crop from resized images whose shorter side is 256-pixel long.

Name Top-1 Top-5 Params
ResNet-18 30.4% 10.8% 11.7M
DRN-A-18 28.0% 9.5% 11.7M
DRN-D-22 25.8% 8.2% 16.4M
DRN-C-26 24.9% 7.6% 21.1M
ResNet-34 27.7% 8.7% 21.8M
DRN-A-34 24.8% 7.5% 21.8M
DRN-D-38 23.8% 6.9% 26.5M
DRN-C-42 22.9% 6.6% 31.2M
ResNet-50 24.0% 7.0% 25.6M
DRN-A-50 22.9% 6.6% 25.6M
DRN-D-54 21.2% 5.9% 35.8M
DRN-C-58 21.7% 6.0% 41.6M
ResNet-101 22.4% 6.2% 44.5M
DRN-D-105 20.6% 5.5% 54.8M
ResNet-152 22.2% 6.2% 60.2M

The figure below groups the parameter and error rate comparison based on netwok structures.

comparison

Training and Testing

The code is written in Python using Pytorch. I started with code in torchvision. Please check their license as well if copyright is your concern. Software dependency:

  • Python 3
  • Pillow
  • pytorch
  • torchvision

Note If you want to train your own semantic segmentation model, make sure your Pytorch version is greater than 0.2.0 or includes commit 78020a.

Go to this page to prepare ImageNet 1K data.

To test a model on ImageNet validation set:

python3 classify.py test --arch drn_c_26 -j 4 
   
     --pretrained

   

To train a new model:

python3 classify.py train --arch drn_c_26 -j 8 
   
     --epochs 120

   

Besides drn_c_26, we also provide drn_c_42 and drn_c_58. They are in DRN-C family as described in Dilated Residual Networks. DRN-D models are simplified versions of DRN-C. Their code names are drn_d_22, drn_d_38, drn_d_54, and drn_d_105.

Semantic Image Segmentataion

Models

Comparison of mIoU on Cityscapes and numbers of parameters.

Name mIoU Params
DRN-A-50 67.3% 25.6M
DRN-C-26 68.0% 21.1M
DRN-C-42 70.9% 31.2M
DRN-D-22 68.0% 16.4M
DRN-D-38 71.4% 26.5M
DRN-D-105* 75.6% 54.8M

*trained with poly learning rate, random scaling and rotations.

DRN-D-105 gets 76.3% mIoU on Cityscapes testing set with multi-scale testing, poly learning rate and data augmentation with random rotation and scaling in training. Full results are here.

Prepare Data

The segmentation image data folder is supposed to contain following image lists with names below:

  • train_images.txt
  • train_labels.txt
  • val_images.txt
  • val_labels.txt
  • test_images.txt

The code will also look for info.json in the folder. It contains mean and std of the training images. For example, below is info.json used for training on Cityscapes.

{
    "mean": [
        0.290101,
        0.328081,
        0.286964
    ],
    "std": [
        0.182954,
        0.186566,
        0.184475
    ]
}

Each line in the list is a path to an input image or its label map relative to the segmentation folder.

For example, if the data folder is "/foo/bar" and train_images.txt in it contains

leftImg8bit/train/aachen/aachen_000000_000019_leftImg8bit.png
leftImg8bit/train/aachen/aachen_000001_000019_leftImg8bit.png

and train_labels.txt contrains

gtFine/train/aachen/aachen_000000_000019_gtFine_trainIds.png
gtFine/train/aachen/aachen_000001_000019_gtFine_trainIds.png

Then the first image path is expected at

/foo/bar/leftImg8bit/train/aachen/aachen_000000_000019_leftImg8bit.png

and its label map is at

/foo/bar/gtFine/train/aachen/aachen_000000_000019_gtFine_trainIds.png

In training phase, both train_* and val_* are assumed to be in the data folder. In validation phase, only val_images.txt and val_labels.txt are needed. In testing phase, when there are no available labels, only test_images.txt is needed. segment.py has a command line option --phase and the corresponding acceptable arguments are train, val, and test.

To set up Cityscapes data, please check this document.

Optimization Setup

The current segmentation models are trained on basic data augmentation (random crops + flips). The learning rate is changed by steps, where it is decreased by a factor of 10 at each step.

Training

To train a new model, use

python3 segment.py train -d 
   
     -c 
    
      -s 896 \
    --arch drn_d_22 --batch-size 32 --epochs 250 --lr 0.01 --momentum 0.9 \
    --step 100

    
   

category_number is the number of categories in segmentation. It is 19 for Cityscapes and 11 for Camvid. The actual label maps should contain values in the range of [0, category_number). Invalid pixels can be labeled as 255 and they will be ignored in training and evaluation. Depends on the batch size, lr and momentum can be 0.01/0.9 or 0.001/0.99.

If you want to train drn_d_105 to achieve best results on cityscapes dataset, you need to turn on data augmentation and use poly learning rate:

python3 segment.py train -d 
   
     -c 19 -s 840 --arch drn_d_105 --random-scale 2 --random-rotate 10 --batch-size 16 --epochs 500 --lr 0.01 --momentum 0.9 -j 16 --lr-mode poly --bn-sync

   

Note:

  • If you use 8 GPUs for 16 crops per batch, the memory for each GPU is more than 12GB. If you don't have enough GPU memory, you can try smaller batch size or crop size. Smaller crop size usually hurts the performance more.
  • Batch normalization synchronization across multiple GPUs is necessary to train very deep convolutional networks for semantic segmentation. We provide an implementation as a pytorch extenstion in lib/. However, it is not for the faint-hearted to build from scratch, although an Makefile is provided. So a built binary library for 64-bit Ubuntu is provided. It is tested on Ubuntu 16.04. Also remember to add lib/ to your PYTHONPATH.

Testing

Evaluate models on testing set or any images without ground truth labels using our related pretrained model:

python3 segment.py test -d 
   
     -c 
    
      --arch drn_d_22 \
    --pretrained 
     
       --phase test --batch-size 1

     
    
   

You can download the pretrained DRN models on Cityscapes here: http://go.yf.io/drn-cityscapes-models.

If you want to evaluate a checkpoint from your own training, use --resume instead of --pretrained:

python3 segment.py test -d 
   
     -c 
    
      --arch drn_d_22 \
    --resume 
     
       --phase test --batch-size 1

     
    
   

You can also turn on multi-scale testing for better results by adding --ms:

python3 segment.py test -d 
   
     -c 
    
      --arch drn_d_105 \
    --resume 
     
       --phase val --batch-size 1 --ms

     
    
   
"Projelerle Yapay Zeka Ve Bilgisayarlı Görü" Kitabımın projeleri

"Projelerle Yapay Zeka Ve Bilgisayarlı Görü" Kitabımın projeleri Bu Github Reposundaki tüm projeler; kaleme almış olduğum "Projelerle Yapay Zekâ ve Bi

Ümit Aksoylu 4 Aug 03, 2022
A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX

Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX Foolbox is a Python li

Bethge Lab 2.4k Dec 25, 2022
Semi-supervised Implicit Scene Completion from Sparse LiDAR

Semi-supervised Implicit Scene Completion from Sparse LiDAR Paper Created by Pengfei Li, Yongliang Shi, Tianyu Liu, Hao Zhao, Guyue Zhou and YA-QIN ZH

114 Nov 30, 2022
TensorFlow code for the neural network presented in the paper: "Structural Language Models of Code" (ICML'2020)

SLM: Structural Language Models of Code This is an official implementation of the model described in: "Structural Language Models of Code" [PDF] To ap

73 Nov 06, 2022
Generate image analogies using neural matching and blending

neural image analogies This is basically an implementation of this "Image Analogies" paper, In our case, we use feature maps from VGG16. The patch mat

Adam Wentz 3.5k Jan 08, 2023
PyTorch code for ICPR 2020 paper Future Urban Scene Generation Through Vehicle Synthesis

Future urban scene generation through vehicle synthesis This repository contains Pytorch code for the ICPR2020 paper "Future Urban Scene Generation Th

Alessandro Simoni 4 Oct 11, 2021
Generate pixel-style avatars with python.

face2pixel Generate pixel-style avatars with python. Run: Clone the project: git clone https://github.com/theodorecooper/face2pixel install requiremen

Theodore Cooper 2 May 11, 2022
Learning Features with Parameter-Free Layers (ICLR 2022)

Learning Features with Parameter-Free Layers (ICLR 2022) Dongyoon Han, YoungJoon Yoo, Beomyoung Kim, Byeongho Heo | Paper NAVER AI Lab, NAVER CLOVA Up

NAVER AI 65 Dec 07, 2022
Official implementation of NPMs: Neural Parametric Models for 3D Deformable Shapes - ICCV 2021

NPMs: Neural Parametric Models Project Page | Paper | ArXiv | Video NPMs: Neural Parametric Models for 3D Deformable Shapes Pablo Palafox, Aljaz Bozic

PabloPalafox 109 Nov 22, 2022
Bayesian Meta-Learning Through Variational Gaussian Processes

vmgp This is the repository of Vivek Myers and Nikhil Sardana for our CS 330 final project, Bayesian Meta-Learning Through Variational Gaussian Proces

Vivek Myers 2 Nov 17, 2022
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering

Graph ConvNets in PyTorch October 15, 2017 Xavier Bresson http://www.ntu.edu.sg/home/xbresson https://github.com/xbresson https://twitter.com/xbresson

Xavier Bresson 287 Jan 04, 2023
YOLO-v5 기반 단안 카메라의 영상을 활용해 차간 거리를 일정하게 유지하며 주행하는 Adaptive Cruise Control 기능 구현

자율 주행차의 영상 기반 차간거리 유지 개발 Table of Contents 프로젝트 소개 주요 기능 시스템 구조 디렉토리 구조 결과 실행 방법 참조 팀원 프로젝트 소개 YOLO-v5 기반으로 단안 카메라의 영상을 활용해 차간 거리를 일정하게 유지하며 주행하는 Adap

14 Jun 29, 2022
Unet network with mean teacher for altrasound image segmentation

Unet network with mean teacher for altrasound image segmentation

5 Nov 21, 2022
An implementation of Equivariant e2 convolutional kernals into a convolutional self attention network, applied to radio astronomy data.

EquivariantSelfAttention An implementation of Equivariant e2 convolutional kernals into a convolutional self attention network, applied to radio astro

2 Nov 09, 2021
Invertible conditional GANs for image editing

Invertible Conditional GANs This is the implementation of the IcGAN model proposed in our paper: Invertible Conditional GANs for image editing. Novemb

Guim 278 Dec 12, 2022
Drone detection using YOLOv5

This drone detection system uses YOLOv5 which is a family of object detection architectures and we have trained the model on Drone Dataset. Overview I

Tushar Sarkar 27 Dec 20, 2022
Python script that allows you to automatically setup your Growtopia server.

AutoSetup Python script that allows you to automatically setup your Growtopia server. How To Use Firstly, install all the required modules that used i

Aspire 3 Mar 06, 2022
Official code of the paper "ReDet: A Rotation-equivariant Detector for Aerial Object Detection" (CVPR 2021)

ReDet: A Rotation-equivariant Detector for Aerial Object Detection ReDet: A Rotation-equivariant Detector for Aerial Object Detection (CVPR2021), Jiam

csuhan 334 Dec 23, 2022
Docker containers of baseline agents for the Crafter environment

Crafter Baselines This repository contains Docker containers for running various baselines on the Crafter environment. Reward Agents DreamerV2 based o

Danijar Hafner 17 Sep 25, 2022
An Agnostic Computer Vision Framework - Pluggable to any Training Library: Fastai, Pytorch-Lightning with more to come

IceVision is the first agnostic computer vision framework to offer a curated collection with hundreds of high-quality pre-trained models from torchvision, MMLabs, and soon Pytorch Image Models. It or

airctic 789 Dec 29, 2022