SwinTransformer + OBBDet

The sixth-place solution (6/220) in the Fine-grained Object Recognition in High-Resolution Optical Images track of the 2021 Gaofen Challenge on Automated High-Resolution Earth Observation Image Interpretation.

Members

Qi Ming, Junjie Song, Yunpeng Dong.

Solution

Off-line data augmentation
We use random combinations of affine transformation, flipping, scaling, and optical distortion for data augmentation.
Multi-scale training and testing
Images are resized to 600, 800, and 1024 for both training and testing.
Strong backbone
Swin Transformer is adopted as the backbone in ORCNN and RoI Transformer for better performance.
Model ensemble
We merge the results from RoI Transformer, ORCNN, S2ANet, and ReDet.
Lower confidence
Set the output score threshold to 0.005.
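
The last two steps above (model ensemble and lower confidence) can be sketched roughly as follows. The README does not say how the results were merged, so this minimal sketch only concatenates the detections of each model and drops those below the score threshold. The column layout `(cx, cy, w, h, angle, score)`, the function name `merge_detections`, and the example values are assumptions for illustration, and a real pipeline would likely add a rotated NMS pass afterwards.

```python
import numpy as np

SCORE_THR = 0.005  # output threshold stated in the solution list


def merge_detections(per_model_dets, score_thr=SCORE_THR):
    """Concatenate detections from several models and filter by score.

    per_model_dets: list of (N_i, 6) arrays, one per model, for one image
    and one class. Columns are assumed to be (cx, cy, w, h, angle, score).
    Returns an (M, 6) array sorted by descending score.
    """
    arrays = [np.asarray(d, dtype=np.float64).reshape(-1, 6) for d in per_model_dets]
    if not arrays:
        return np.zeros((0, 6))
    merged = np.concatenate(arrays, axis=0)
    # Keep every box whose score reaches the (deliberately low) threshold.
    merged = merged[merged[:, 5] >= score_thr]
    order = np.argsort(-merged[:, 5], kind="stable")
    return merged[order]


# Hypothetical outputs of two detectors on the same image and class.
roi_trans_dets = np.array([
    [100.0, 100.0, 40.0, 20.0, 0.3, 0.90],
    [300.0, 200.0, 30.0, 30.0, 0.0, 0.004],  # below 0.005, dropped
])
orcnn_dets = np.array([
    [102.0, 99.0, 41.0, 19.0, 0.3, 0.85],
])
merged = merge_detections([roi_trans_dets, orcnn_dets])
```

A threshold this low keeps nearly all candidate boxes, which tends to help mAP because the metric integrates over the whole precision-recall curve.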

Tried but didn't work

Soft-NMS.
Adjust NMS threshold.
Class-agnostic NMS.
Mosaic and MixUp for data augmentation.
Oversample the categories with fewer instances.
Train the detectors for specific classes with low AP.
Multi-scale training and testing on SwinTransformer-based detectors (mAP even dropped by about 1%).
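
For readers unfamiliar with the first item in this list, here is a minimal sketch of Gaussian Soft-NMS, which decays the scores of overlapping boxes instead of discarding them. It is not the authors' code: it uses axis-aligned `(x1, y1, x2, y2)` boxes for brevity, whereas the solution works with oriented boxes whose IoU is more involved. The function names, `sigma=0.5`, and the example boxes are assumptions.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def soft_nms(boxes, scores, sigma=0.5, score_thr=0.005):
    """Gaussian Soft-NMS on axis-aligned boxes.

    Returns the kept indices (into the input) in selection order and the
    corresponding, possibly decayed, scores.
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64).copy()
    remaining = np.arange(len(scores))
    keep, kept_scores = [], []
    while remaining.size > 0:
        # Select the highest-scoring box among those still in play.
        best = remaining[np.argmax(scores[remaining])]
        keep.append(int(best))
        kept_scores.append(float(scores[best]))
        remaining = remaining[remaining != best]
        if remaining.size == 0:
            break
        # Decay the scores of the other boxes according to their overlap.
        overlaps = iou_one_to_many(boxes[best], boxes[remaining])
        scores[remaining] *= np.exp(-(overlaps ** 2) / sigma)
        # Drop boxes whose decayed score falls below the threshold.
        remaining = remaining[scores[remaining] >= score_thr]
    return keep, kept_scores


# Two heavily overlapping boxes and one isolated box (hypothetical values).
example_boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
example_scores = [0.9, 0.8, 0.7]
keep, kept_scores = soft_nms(example_boxes, example_scores)
```

In this example the second box is kept but its score is reduced, so it ends up ranked below the isolated third box.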


Owner

ming71
