Xview3 solution - XView3 challenge, 2nd place solution

Overview

Xview3, 2nd place solution

https://iuu.xview.us/

test split    aggregate score
public        0.593
holdout       0.604

Inference

To reproduce the submission results, first you need to install the required packages. The easiest way is to use docker to build an image or pull a prebuilt docker image.

Prebuilt docker image

One can pull the image from Docker Hub and use it for inference:

docker pull selimsefhub/xview3:mse_v2l_v2l_v3m_nf_b7_r34

The inference specification is the same as for the XView3 reference solution:

docker run --shm-size 16G --gpus=1 --mount type=bind,source=/home/xv3data,target=/on-docker/xv3data selimsefhub/xview3:mse_v2l_v2l_v3m_nf_b7_r34 /on-docker/xv3data/ 0157baf3866b2cf9v /on-docker/xv3data/prediction/prediction.csv

Build from scratch

docker build -t xview3 .

Training

For training I used an instance with 4x RTX A6000. For GPUs with less VRAM you will need to reduce the crop sizes in the configurations. Because I did not split the large TIFF files into small tiles and used memmap instead, fast disks such as M.2 drives (ideally in RAID 0) should be used.

To reproduce training from scratch:

  1. build the docker image as described above
  2. run the docker image with a modified entrypoint, e.g. docker run --rm --network=host --entrypoint /bin/bash --gpus all --ipc host -v /mnt:/mnt -it xview3:latest
  3. run ./train_all.sh NUM_GPUS DATA_DIR SHORE_DIR VAL_OUT_DIR, where DATA_DIR is the root directory of the dataset, SHORE_DIR is the path to the shoreline data for the validation set, and VAL_OUT_DIR is any path where CSV predictions will be stored during the evaluation phase after each epoch
  4. example: ./train_all.sh 4 /mnt/md0/datasets/xview3/ /mnt/md0/datasets/xview3/shoreline/validation /mnt/md0/datasets/xview3/oof/
  5. note that this will overwrite the existing weights under the weights directory in the container

Training time

Because I used full-resolution segmentation, training was quite slow: 9-15 hours per model on 4 GPUs.

Solution approach overview

Maritime object detection can be reformulated as a binary segmentation and regression problem, solved with UNet-like convolutional neural networks that have multiple outputs.

Targets

Model architecture and outputs

In general I used a UNet-like encoder-decoder model with the following backbones:

  • EfficientNet V2 L - best performing
  • EfficientNet V2 M
  • EfficientNet B7
  • NFNet L0 (variant implemented by Ross Wightman). Works great with small batches due to absence of BatchNorm layers.
  • ResNet34

For the decoder I used a standard UNet decoder with nearest-neighbor upsampling and without batch norm. SiLU was used as the activation for convolutional layers. I used full-resolution prediction for the masks.

Detection

Object centers are predicted as Gaussians with sigma=2 pixels, with values scaled to the 0-255 range. The quality of the dense Gaussian predictions is the most important factor for a high aggregate score. During the competition I tried different loss functions with varied success:

  • Pure MSE loss had high precision but low recall, which was not good enough for the F1 score
  • MAE loss did not produce acceptable results
  • Thresholded MSE with sum reduction showed the best results. Low-value predictions did not affect the model's quality, so they are ignored. The loss weight needed to be tuned properly, though.
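
As a rough illustration of the target and loss described above, here is a minimal NumPy sketch. It renders Gaussian center targets with sigma=2 scaled to 0-255 and computes a sum-reduced squared error that skips low-value pixels. The function names, the threshold value of 1.0, and the exact masking rule (a pixel is skipped when both prediction and target are below the threshold) are assumptions made for this example, not details taken from the training code.

```python
import numpy as np

def render_center_heatmap(shape, centers, sigma=2.0, peak=255.0):
    """Draw one Gaussian per (row, col) center; overlapping blobs keep the max."""
    rows = np.arange(shape[0], dtype=np.float32)[:, None]
    cols = np.arange(shape[1], dtype=np.float32)[None, :]
    heatmap = np.zeros(shape, dtype=np.float32)
    for r, c in centers:
        dist_sq = (rows - r) ** 2 + (cols - c) ** 2
        blob = peak * np.exp(-dist_sq / (2.0 * sigma ** 2))
        np.maximum(heatmap, blob, out=heatmap)
    return heatmap

def thresholded_mse_sum(pred, target, threshold=1.0):
    """Sum of squared errors over pixels where pred or target exceeds threshold.

    The masking rule is an assumption for illustration only.
    """
    keep = (pred > threshold) | (target > threshold)
    sq_err = (pred - target) ** 2
    return float(sq_err[keep].sum())

target = render_center_heatmap((32, 32), [(10, 12), (20, 5)])
missed_everything = thresholded_mse_sum(np.zeros_like(target), target)
perfect = thresholded_mse_sum(target, target)
```

With sum reduction the loss grows with the number of objects in a crop, which is one plausible reason the loss weight needs tuning relative to the other heads.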

Vessel classification

Vessel masks were prepared as binary round objects with a fixed radius (4 pixels). A missing vessel value was written to the mask as 255, which was ignored in the loss function. As the loss function I used a combination of BCE, Focal and SoftDice losses.
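
The mask preparation can be sketched as follows. This is an illustrative NumPy version, assuming objects are given as hypothetical (row, col, label) tuples where label is True, False, or None for a missing annotation. Only a masked BCE term is shown; the Focal and SoftDice terms are omitted, and all names here are invented for the example.

```python
import numpy as np

IGNORE_VALUE = 255  # pixels with this value are excluded from the loss

def render_round_mask(shape, objects, radius=4):
    """objects: iterable of (row, col, label); label is True, False or None."""
    rows = np.arange(shape[0])[:, None]
    cols = np.arange(shape[1])[None, :]
    mask = np.zeros(shape, dtype=np.uint8)
    for r, c, label in objects:
        disk = (rows - r) ** 2 + (cols - c) ** 2 <= radius ** 2
        if label is None:
            mask[disk] = IGNORE_VALUE  # missing annotation
        else:
            mask[disk] = 1 if label else 0
    return mask

def masked_bce(prob, mask, eps=1e-7):
    """Binary cross-entropy averaged over pixels that are not IGNORE_VALUE."""
    valid = mask != IGNORE_VALUE
    p = np.clip(prob[valid], eps, 1.0 - eps)
    t = mask[valid].astype(np.float64)
    return float(-(t * np.log(p) + (1.0 - t) * np.log(1.0 - p)).mean())

mask = render_round_mask((20, 20), [(5, 5, True), (14, 14, None)])
loss = masked_bce(np.full((20, 20), 0.5), mask)
```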

Fishing classification

Fishing masks were prepared the same way as vessel masks.

Length estimation

The length mask consists of round objects with a fixed radius whose pixel values are set to the length of the object. Missing lengths were ignored in the loss function. For the length loss I first used MSE but then changed to a loss function that directly reflects the metric, i.e. length_loss = abs(target - predicted_value)/target.
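
A minimal sketch of this relative-error loss in NumPy is shown below. The boolean valid mask for pixels with a known length, the mean reduction, and the function name are assumptions made for illustration.

```python
import numpy as np

def length_loss(pred, target, valid):
    """Mean of abs(target - pred) / target over pixels with a known length."""
    if not valid.any():
        return 0.0  # nothing to supervise in this crop
    rel_err = np.abs(target[valid] - pred[valid]) / target[valid]
    return float(rel_err.mean())

target = np.array([100.0, 50.0, 0.0])  # 0 marks a missing length here
pred = np.array([90.0, 60.0, 7.0])
loss = length_loss(pred, target, valid=target > 0)
```

Dividing by the target makes a 10 m error on a 100 m vessel count the same as a 5 m error on a 50 m vessel, which is what a percentage-based length metric rewards.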

Training procedure

Data

I tried to use the train split, but its annotation quality is not good enough: even pretraining on the full train set and then fine-tuning on validation data was not better than simply using validation data alone. In the end I used only validation data, with small holdout sets for evaluation. In general there was a data leak between the val/train/test splits. I tried a clean, non-overlapping validation split, but it did not help and did not represent public scores well.

Optimization

Usually AdamW converges faster and provides better metrics for binary segmentation problems but it is prone to unstable training in mixed precision mode (NaNs/Infs in loss values). That's why as an optimizer I used SGD with the following parameters:

  • initial learning rate 0.003
  • cosine LR decay
  • weight decay 1e-4
  • nesterov momentum
  • momentum=0.9

Each model was trained for around 20-30k iterations. Because I used SyncBN across 4 GPUs, a batch size of 2 was good enough, and I used larger crops instead of a larger batch size.
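
For reference, the cosine decay from the list above can be written as a small pure-Python function. The total of 25,000 steps is an assumed value inside the 20-30k range, and decaying all the way to zero (min_lr=0) is also an assumption, since no final learning rate is given.

```python
import math

def cosine_lr(step, total_steps, base_lr=0.003, min_lr=0.0):
    """Learning rate after `step` iterations of a single cosine decay cycle."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

TOTAL_STEPS = 25_000  # assumed; the text gives 20-30k iterations
schedule = [cosine_lr(s, TOTAL_STEPS) for s in (0, 12_500, 25_000)]
```

In PyTorch, settings like these would typically map to torch.optim.SGD with momentum=0.9, nesterov=True and weight_decay=1e-4, though the exact construction used in the repository is not shown here.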

Inference

I used overlapping-tile inference with slices of size 3584x3584 and an overlap of 704 pixels. To reduce the memory footprint, predictions were converted to uint8 and float16 data types before postprocessing. See inference/run_inference.py for details.
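
The window layout for such overlapping inference might look like the sketch below. It only computes tile offsets. How predictions in the overlapping regions are merged is not covered, and shifting the last tile back to end at the image border is an assumption of this example rather than a description of inference/run_inference.py.

```python
def tile_starts(length, tile=3584, overlap=704):
    """Start offsets along one axis so tiles cover [0, length) with overlap."""
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile is shifted back to end at the border
    return starts

def tile_windows(height, width, tile=3584, overlap=704):
    """All (top, left) corners of the tiles covering a height x width scene."""
    return [(top, left)
            for top in tile_starts(height, tile, overlap)
            for left in tile_starts(width, tile, overlap)]

corners = tile_windows(10000, 8000)
```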

Postprocessing

After the center, vessel, fishing and length pixel masks are predicted, they need to be transformed into detections in CSV format. For the center Gaussians I simply used thresholding and found connected components. Each component is considered a detected object. I used the centroids of the objects to obtain mean values for vessel/fishing/length from the respective masks.
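
A simplified sketch of this postprocessing step follows. It uses a plain breadth-first search for 4-connected components and, as a simplification, samples each attribute mask at the single centroid pixel instead of averaging. The thresholds (128 for centers, 0.5 for vessel and fishing), the dictionary keys, and the function names are assumptions for illustration.

```python
from collections import deque
import numpy as np

def find_components(binary):
    """4-connected components of a boolean array, as lists of (row, col)."""
    h, w = binary.shape
    seen = np.zeros(binary.shape, dtype=bool)
    components = []
    for r0, c0 in zip(*np.nonzero(binary)):
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue, pixels = deque([(int(r0), int(c0))]), []
        while queue:
            r, c = queue.popleft()
            pixels.append((r, c))
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < h and 0 <= nc < w and binary[nr, nc] and not seen[nr, nc]:
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        components.append(pixels)
    return components

def extract_detections(center, vessel, fishing, length, threshold=128):
    """Turn predicted masks into one record per connected center blob."""
    detections = []
    for pixels in find_components(center > threshold):
        mean_r, mean_c = np.mean(pixels, axis=0)
        r, c = int(round(mean_r)), int(round(mean_c))
        detections.append({
            "row": r,
            "col": c,
            "is_vessel": bool(vessel[r, c] > 0.5),
            "is_fishing": bool(fishing[r, c] > 0.5),
            "length": float(length[r, c]),
        })
    return detections
```

Each returned record could then be written out as one CSV row using the column names the challenge expects.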

Data augmentations

I only used random crops and random 180-degree rotations. Ideally SAR orientation should be provided with the data (as in the Spacenet 6 challenge) because SAR artifacts depend on the satellite direction.
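
These two augmentations are simple enough to sketch in a few lines of NumPy. The function below assumes a channels-first image of shape (C, H, W) and a single target mask of shape (H, W). The name, the argument layout, and the 50% rotation probability are illustrative assumptions.

```python
import numpy as np

def random_crop_rot180(image, mask, crop_size, rng):
    """Apply the same random crop and optional 180-degree rotation to both arrays."""
    _, h, w = image.shape
    top = int(rng.integers(0, h - crop_size + 1))
    left = int(rng.integers(0, w - crop_size + 1))
    image = image[:, top:top + crop_size, left:left + crop_size]
    mask = mask[top:top + crop_size, left:left + crop_size]
    if rng.random() < 0.5:
        # Reversing both spatial axes is a 180-degree rotation.
        image = image[:, ::-1, ::-1]
        mask = mask[::-1, ::-1]
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)

rng = np.random.default_rng(0)
image = np.zeros((2, 256, 256), dtype=np.float32)
mask = np.zeros((256, 256), dtype=np.uint8)
crop_image, crop_mask = random_crop_rot180(image, mask, 128, rng)
```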

Data acquisition, processing, and manipulation

Input

  • 2 SAR channels (VV, VH)
  • custom normalization (Intensity + 40)/15
  • missing pixel values changed to -100 before normalization
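
The normalization above amounts to two array operations. In this sketch the missing pixels are passed in as a boolean mask, because how they are detected (for example a nodata value in the raster) is not specified here. The function and constant names are hypothetical.

```python
import numpy as np

MISSING_FILL = -100.0  # value substituted for missing pixels before scaling

def normalize_sar(intensity, missing_mask):
    """Replace missing pixels with -100, then apply (intensity + 40) / 15."""
    filled = np.where(missing_mask, MISSING_FILL, intensity).astype(np.float32)
    return (filled + 40.0) / 15.0

vv = np.array([-40.0, -25.0, np.nan])
normalized = normalize_sar(vv, missing_mask=np.isnan(vv))
```

With this scaling a missing pixel always maps to (-100 + 40) / 15 = -4, well below the values produced by typical valid intensities.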

Spatial resolution of the supplementary data is very low and doesn't bring any value to the models.

During training and inference I used tifffile.memmap and cropped data from memory mapped file in order to avoid tile splitting.
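
The idea of cropping directly from a memory-mapped scene can be illustrated as follows. To keep the example self-contained it uses numpy.memmap on a temporary raw file as a stand-in for tifffile.memmap, which also yields a NumPy memory map that can be sliced the same way. The file name, array shape, and helper name are invented for the example.

```python
import os
import tempfile
import numpy as np

def read_crop(mapped, top, left, size):
    """Copy one window out of a memory-mapped 2-D array into regular memory."""
    return np.array(mapped[top:top + size, left:left + size])

# Build a small raw file standing in for a large SAR scene (assumption).
path = os.path.join(tempfile.mkdtemp(), "scene.dat")
scene = np.memmap(path, dtype=np.float32, mode="w+", shape=(1024, 1024))
scene[500:504, 600:604] = -12.5
scene.flush()
del scene

# Reopen read-only and crop; only the touched region has to be read from disk.
mapped = np.memmap(path, dtype=np.float32, mode="r", shape=(1024, 1024))
crop = read_crop(mapped, 496, 596, 16)
```

Because each random crop reads a different region of the file, throughput depends heavily on disk speed, which fits the advice to use fast disks for training.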

Owner
Selim Seferbekov