Training code of Spatial Time Memory Network. Semi-supervised video object segmentation.

Overview

Training-code-of-STM

This repository fully reproduces Space-Time Memory Networks image

Performance on Davis17 val set&Weights

backbone training stage training dataset J&F J F weights
Ours resnet-50 stage 1 MS-COCO 69.5 67.8 71.2 link
Origin resnet-50 stage 2 MS-COCO -> Davis&Youtube-vos 81.8 79.2 84.3 link
Ours resnet-50 stage 2 MS-COCO -> Davis&Youtube-vos 82.0 79.7 84.4 link
Ours resnest-101 stage 2 MS-COCO -> Davis&Youtube-vos 84.6 82.0 87.2 link

Requirements

  • Python >= 3.6
  • Pytorch 1.5
  • Numpy
  • Pillow
  • opencv-python
  • imgaug
  • scipy
  • tqdm
  • pandas
  • resnest

Datasets

MS-COCO

We use MS-COCO's instance segmentation part to generate pseudo video sequence. Specifically, we cut out the objects in one image and paste them on another one. Then we perform different affine transformations on the foreground objects and the background image. If you want to visualize some of the processed training frame sequence:

python dataset/coco.py -Ddavis "path to davis" -Dcoco "path to coco" -o "path to output dir"

image image

DAVIS

Youtube-VOS

Structure

 |- data
      |- Davis
          |- JPEGImages
          |- Annotations
          |- ImageSets
      
      |- Youtube-vos
          |- train
          |- valid
          
      |- Ms-COCO
          |- train2017
          |- annotations
              |- instances_train2017.json

Demo

python demo.py -g "gpu id" -s "set" -y "year" -D "path to davis" -p "path to weights" -backbone "[resnet50,resnet18,resnest101]"
#e.g.
python demo.py -g 0 -s val -y 17 -D ../data/Davis/ -p /smart/haochen/cvpr/0628_resnest_aspp/davis_youtube_resnest101_699999.pth -backbone resnest101
bmx-trees.mp4

Training

Stage 1

Pretraining on MS-COCO.

python train_coco.py -Ddavis "path to davis" -Dcoco "path to coco" -backbone "[resnet50,resnet18]" -save "path to checkpoints"
#e.g.
python train_coco.py -Ddavis ../data/Davis/ -Dcoco ../data/Ms-COCO/ -backbone resnet50 -save ../coco_weights/

Stage 2

Training on Davis&Youtube-vos.

python train_davis.py -Ddavis "path to davis" -Dyoutube "path to youtube-vos" -backbone "[resnet50,resnet18]" -save "path to checkpoints" -resume "path to coco pretrained weights"
#e.g. 
train_davis.py -Ddavis ../data/Davis/ -Dyoutube ../data/Youtube-vos/ -backbone resnet50 -save ../davis_weights/ -resume ../coco_weights/coco_pretrained_resnet50_679999.pth

Evaluation

Evaluating on Davis 2017&2016 val set.

python eval.py -g "gpu id" -s "set" -y "year" -D "path to davis" -p "path to weights" -backbone "[resnet50,resnet18,resnest101]"
#e.g.
python eval.py -g 0 -s val -y 17 -D ../data/davis -p ../davis_weights/davis_youtube_resnet50_799999.pth -backbone resnet50
python eval.py -g 0 -s val -y 17 -D ../data/davis -p ../davis_weights/davis_youtube_resnest101_699999.pth -backbone resnest101

Notes

  • STM is an attention-based implicit matching architecture, which needs large amounts of data for training. The first stage of training is necessary if you want to get better results.
  • Training takes about three days on a single NVIDIA 2080Ti. There is no log during training, you could add logs if you need.
  • Due to time constraints, the code is a bit messy and need to be optimized. Questions and suggestions are welcome.

Acknowledgement

This codebase borrows the code and structure from official STM repository

Citing STM

@inproceedings{oh2019video,
  title={Video object segmentation using space-time memory networks},
  author={Oh, Seoung Wug and Lee, Joon-Young and Xu, Ning and Kim, Seon Joo},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={9226--9235},
  year={2019}
}
Owner
haochen wang
haochen wang
:mag: Transformers at scale for question answering & neural search. Using NLP via a modular Retriever-Reader-Pipeline. Supporting DPR, Elasticsearch, HuggingFace's Modelhub...

Haystack is an end-to-end framework that enables you to build powerful and production-ready pipelines for different search use cases. Whether you want

deepset 6.4k Jan 09, 2023
Gold standard corpus annotated with verb-preverb connections for Hungarian.

Hungarian Preverb Corpus A gold standard corpus manually annotated with verb-preverb connections for Hungarian. corpus The corpus consist of the follo

RIL Lexical Knowledge Representation Research Group 3 Jan 27, 2022
Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

STEMM: Self-learning with Speech-Text Manifold Mixup for Speech Translation This is a PyTorch implementation for the ACL 2022 main conference paper ST

ICTNLP 29 Oct 16, 2022
Implementation of Fast Transformer in Pytorch

Fast Transformer - Pytorch Implementation of Fast Transformer in Pytorch. This only work as an encoder. Yannic video AI Epiphany Install $ pip install

Phil Wang 167 Dec 27, 2022
Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

This is a fork of Fairseq(-py) with implementations of the following models: Pervasive Attention - 2D Convolutional Neural Networks for Sequence-to-Se

Maha 490 Dec 15, 2022
jiant is an NLP toolkit

jiant is an NLP toolkit The multitask and transfer learning toolkit for natural language processing research Why should I use jiant? jiant supports mu

ML² AT CILVR 1.5k Jan 04, 2023
A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN

artificial intelligence cosmic love and attention fire in the sky a pyramid made of ice a lonely house in the woods marriage in the mountains lantern

Phil Wang 2.3k Jan 01, 2023
⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x using fastT5.

Reduce T5 model size by 3X and increase the inference speed up to 5X. Install Usage Details Functionalities Benchmarks Onnx model Quantized onnx model

Kiran R 399 Jan 05, 2023
Code release for "COTR: Correspondence Transformer for Matching Across Images"

COTR: Correspondence Transformer for Matching Across Images This repository contains the inference code for COTR. We plan to release the training code

UBC Computer Vision Group 358 Dec 24, 2022
Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning

Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning English | 中文 ❗ Now we provide inferencing code and pre-training models

164 Jan 02, 2023
Mkdocs + material + cool stuff

Modern-Python-Doc-Example mkdocs + material + cool stuff Doc is live here Features out of the box amazing good looking website thanks to mkdocs.org an

Francesco Saverio Zuppichini 61 Oct 26, 2022
ACL'22: Structured Pruning Learns Compact and Accurate Models

☕ CoFiPruning: Structured Pruning Learns Compact and Accurate Models This repository contains the code and pruned models for our ACL'22 paper Structur

Princeton Natural Language Processing 130 Jan 04, 2023
Speach Recognitions

easy_meeting Добро пожаловать в интерфейс сервиса автопротоколирования совещаний Easy Meeting. Website - http://cf5c-62-192-251-83.ngrok.io/ Принципиа

Maksim 3 Feb 18, 2022
Composed Image Retrieval using Pretrained LANguage Transformers (CIRPLANT)

CIRPLANT This repository contains the code and pre-trained models for Composed Image Retrieval using Pretrained LANguage Transformers (CIRPLANT) For d

Zheyuan (David) Liu 29 Nov 17, 2022
hashily is a Python module that provides a variety of text decoding and encoding operations.

hashily is a python module that performs a variety of text decoding and encoding functions. It also various functions for encrypting and decrypting text using various ciphers.

DevMysT 5 Jul 17, 2022
Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch

N-Grammer - Pytorch Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch Install $ pip install n-grammer-pytorch Usage

Phil Wang 66 Dec 29, 2022
IEEEXtreme15.0 Questions And Answers

IEEEXtreme15.0 Questions And Answers IEEEXtreme is a global challenge in which teams of IEEE Student members – advised and proctored by an IEEE member

Dilan Perera 15 Oct 24, 2022
NeoDays-based tileset for the roguelike CDDA (Cataclysm Dark Days Ahead)

NeoDaysPlus Reduced contrast, expanded, and continuously developed version of the CDDA tileset NeoDays that's being completed with new sprites for mis

0 Nov 12, 2022
Conditional Transformer Language Model for Controllable Generation

CTRL - A Conditional Transformer Language Model for Controllable Generation Authors: Nitish Shirish Keskar, Bryan McCann, Lav Varshney, Caiming Xiong,

Salesforce 1.7k Dec 28, 2022
TFPNER: Exploration on the Named Entity Recognition of Token Fused with Part-of-Speech

TFPNER TFPNER: Exploration on the Named Entity Recognition of Token Fused with Part-of-Speech Named entity recognition (NER), which aims at identifyin

1 Feb 07, 2022