Oriented Object Detection: Oriented RepPoints + Swin Transformer/ReResNet

Last update: Dec 13, 2022

Overview

Oriented RepPoints for Aerial Object Detection

This repository contains the implementation of “Oriented RepPoints + Swin Transformer/ReResNet”.

Introduction

Using the Oriented RepPoints detector with a Swin Transformer backbone, this work took 3rd place on Task 1 and 2nd place on Task 2 of the 2021 Learning to Understand Aerial Images (LUAI) challenge, held at ICCV 2021. Details are given in the paper "LUAI Challenge 2021 on Learning to Understand Aerial Images" (ICCVW 2021).

New Features

Backbone: added Swin Transformer and ReResNet
DataAug: added Mosaic4or9, Mixup, HSV, RandomPerspective, and RandomScaleCrop
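
As a rough illustration of one of the augmentations listed above, here is a minimal sketch of Mixup for detection. It is not taken from this repository: the function name `mixup`, the `alpha` default, and the 8-value polygon box layout are all assumptions chosen for the example. The idea is to blend two images with a random weight and keep the annotations of both.

```python
import numpy as np


def mixup(img_a, boxes_a, img_b, boxes_b, alpha=8.0, rng=None):
    """Blend two same-sized images and keep the boxes of both.

    Hypothetical helper for illustration only; the repository's own
    Mixup implementation may differ (e.g. in how it resizes or pads).
    """
    if img_a.shape != img_b.shape:
        raise ValueError("images must have the same shape")
    rng = np.random.default_rng() if rng is None else rng

    # Blend weight drawn from a symmetric Beta distribution (assumed choice).
    lam = float(rng.beta(alpha, alpha))

    # Blend in float to avoid uint8 overflow, then round back to the input dtype.
    mixed = lam * img_a.astype(np.float32) + (1.0 - lam) * img_b.astype(np.float32)
    mixed = np.clip(np.rint(mixed), 0, 255).astype(img_a.dtype)

    # Both images share one coordinate frame, so the boxes are simply stacked.
    boxes = np.concatenate([boxes_a, boxes_b], axis=0)
    return mixed, boxes, lam
```

With two 4x4 images carrying one and two polygon boxes respectively, `mixup` returns a blended 4x4 image and a stacked array of three boxes.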

Installation

Please refer to for installation and dataset preparation.

Getting Started

This repo is based on . Please see for the basic usage.

Results and Models

Results on the DOTA test-dev set are shown in the table below (download passwords: aabb/swin/ABCD). For more detailed results, please see the paper.

Model	Backbone	MS	DataAug	DOTAv1 mAP	DOTAv2 mAP	Download
OrientedReppoints	R-50	-	-	75.68	-	baidu(aabb)
OrientedReppoints	R-101	-	√	76.21	-	baidu(aabb)
OrientedReppoints	R-101	√	√	78.12	-	baidu(aabb)
OrientedReppoints	SwinT-tiny	-	√	-	-	-

ImageNet-1K and ImageNet-22K Pretrained Models

name	pretrain	resolution	acc@1	acc@5	#params	FLOPs	FPS	22K model	1K model	Need to turn read version
Swin-T	ImageNet-1K	224x224	81.2	95.5	28M	4.5G	755	-	github/baidu(swin)/config	✔
Swin-S	ImageNet-1K	224x224	83.2	96.2	50M	8.7G	437	-	github/baidu(swin)/config	✔
Swin-B	ImageNet-1K	224x224	83.5	96.5	88M	15.4G	278	-	github/baidu(swin)/config	✔
Swin-B	ImageNet-1K	384x384	84.5	97.0	88M	47.1G	85	-	github/baidu(swin)/test-config	✔
Swin-B	ImageNet-22K	224x224	85.2	97.5	88M	15.4G	278	github/baidu(swin)	github/baidu(swin)/test-config	✔
Swin-B	ImageNet-22K	384x384	86.4	98.0	88M	47.1G	85	github/baidu(swin)	github/baidu(swin)/test-config	✔
Swin-L	ImageNet-22K	224x224	86.3	97.9	197M	34.5G	141	github/baidu(swin)	github/baidu(swin)/test-config	✔
Swin-L	ImageNet-22K	384x384	87.3	98.2	197M	103.9G	42	github/baidu(swin)	github/baidu(swin)/test-config	✔
ReResNet50	ImageNet-1K	224x224	71.20	90.28	-	-	-	-	google/baidu(ABCD)/log	-

The mAOE results on the DOTAv1 val set are shown in the table below (password: aabb).

Model	Backbone	mAOE	Download
OrientedReppoints	R-50	5.93°	baidu(aabb)

Note:

Without ground truth for the test subset, the mAOE for orientation evaluation is calculated on the val subset (only the original train subset is used for training).
The orientation (angle) of an aerial object is defined as below; for the details of mAOE, please see the paper. The mAOE code is in mAOE_evaluation.py.
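
To make the idea of an orientation error concrete, here is a minimal sketch. It is not the logic of mAOE_evaluation.py, and the paper remains the authoritative definition. The sketch makes three assumptions: predictions are already matched one-to-one with ground-truth objects, angles are in degrees with a 360° period, and the final score averages per-class mean errors. The names `angular_error_deg` and `mean_orientation_error` are hypothetical.

```python
import numpy as np


def angular_error_deg(pred_deg, gt_deg, period=360.0):
    """Smallest absolute difference between two angles, wrapped to `period`."""
    pred = np.asarray(pred_deg, dtype=float)
    gt = np.asarray(gt_deg, dtype=float)
    diff = np.abs(pred - gt) % period
    # Going the other way around the circle may be shorter.
    return np.minimum(diff, period - diff)


def mean_orientation_error(pred_deg, gt_deg, labels, period=360.0):
    """Average of per-class mean angular errors over already matched pairs.

    Assumed aggregation for illustration; see the paper for the real mAOE.
    """
    errors = angular_error_deg(pred_deg, gt_deg, period)
    labels = np.asarray(labels)
    per_class = [errors[labels == c].mean() for c in np.unique(labels)]
    return float(np.mean(per_class))
```

The wrap-around matters for angles near the period boundary: a prediction of 350° against a ground truth of 10° is an error of 20°, not 340°.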

Visual results

The visual results of learning points and the oriented bounding boxes. The visualization code is .

Learning points

Oriented bounding box

Citation

@article{Li2021oriented,
  title={Oriented RepPoints for Aerial Object Detection},
  author={Wentong Li and Jianke Zhu},
  journal={arXiv preprint arXiv:2105.11111},
  year={2021}
}

Acknowledgements

I have used utility functions from other wonderful open-source projects. Special thanks to the authors of:

OrientedRepPoints

Swin-Transformer-Object-Detection

ReDet
