Official implementation of TMANet.

Temporal Memory Attention for Video Semantic Segmentation, arxiv

Introduction

We propose the Temporal Memory Attention Network (TMANet), which uses the self-attention mechanism to adaptively integrate long-range temporal relations across a video sequence without exhaustive optical flow prediction. Our method achieves new state-of-the-art performance on two challenging video semantic segmentation datasets: 80.3% mIoU on Cityscapes and 76.5% mIoU on CamVid with ResNet-50. (Accepted by ICIP 2021)
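
To make the idea of attending from the current frame to a memory of past frames more concrete, here is a minimal pure-Python sketch of generic scaled dot-product attention over a memory. It is not the TMANet implementation: the function names, the scaling factor, and the toy data are all assumptions chosen for illustration, and the real model operates on convolutional feature maps in PyTorch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_attention(queries, mem_keys, mem_values):
    """Attend from current-frame positions to memory positions.

    queries:    N vectors of size d (one per current-frame position)
    mem_keys:   M vectors of size d (positions gathered from past frames)
    mem_values: M vectors of size c
    Returns N vectors of size c, each a weighted average of mem_values.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))  # assumed Transformer-style scaling
    channels = len(mem_values[0])
    output = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in mem_keys]
        weights = softmax(scores)
        output.append([
            sum(w * v[c] for w, v in zip(weights, mem_values))
            for c in range(channels)
        ])
    return output


# Toy data (hypothetical): 2 query positions, 3 memory positions.
queries = [[1.0, 0.0], [0.0, 1.0]]
mem_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mem_values = [[10.0], [20.0], [30.0]]
attended = memory_attention(queries, mem_keys, mem_values)
```

Each output vector is a convex combination of the memory values, so no explicit motion estimate between frames is needed; the similarity between query and key decides which past positions contribute.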

If this codebase is helpful to you, please consider giving it a star ⭐ 😊.

Updates

2021/1: TMANet training and evaluation code released.

2021/6: Updated README.md:

  • added download links for the Camvid dataset;
  • updated the 'camvid_video_process.py' script.

Usage

  • Install mmseg

    • Please refer to mmsegmentation for the installation guide.
    • This repository is based on mmseg-0.7.0 and pytorch 1.6.0.
  • Clone the repository

    git clone https://github.com/wanghao9610/TMANet.git
    cd TMANet
    pip install -e .
  • Prepare the datasets

    • Download the Cityscapes dataset and the Camvid dataset.

    • For the Camvid dataset, frames must be extracted from the downloaded videos as follows:

      • Download the raw videos from here (a Google Drive link is provided).
      • Put the downloaded raw videos (e.g. 0016E5.MXF, 0006R0.MXF, 0005VD.MXF, 01TP_extract.avi) in ./data/camvid/raw .
      • Download the extracted images and labels from here and the split.txt file from here, then untar the tar.gz file into ./data/camvid . This yields two subdirectories: "./data/camvid/images" (the annotated images) and "./data/camvid/labels" (the ground truth for semantic segmentation). For example:
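        cd TMANet
        cd ./data/camvid
        # Quote the URL so the shell does not interpret '?'. This is a Google Drive
        # "view" link, so wget may save an HTML page instead of the archive;
        # if so, download the file on your PC first and upload it to your server.
        wget "https://drive.google.com/file/d/1FcVdteDSx0iJfQYX2bxov0w_j-6J7plz/view?usp=sharing"
        tar -xf camvid.tar.gz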
      • Generate the image_sequence directory frame by frame from the raw videos. For example:
        cd TMANet
        python tools/convert_datasets/camvid_video_process.py
    • For the Cityscapes dataset, request the download link for 'leftImg8bit_sequence_trainvaltest.zip' from the official Cityscapes dataset webpage.

    • The converted/downloaded datasets are stored under ./data/camvid and ./data/cityscapes.

      The file structure of the video semantic segmentation datasets is as follows.

      ├── data                                        ├── data
      │   ├── cityscapes                              │   ├── camvid
      │   │   ├── gtFine                              │   │   ├── images
      │   │   │   ├── train                           │   │   │   ├── xxx{img_suffix}
      │   │   │   │   ├── xxx{seg_map_suffix}         │   │   │   ├── yyy{img_suffix}
      │   │   │   │   ├── yyy{seg_map_suffix}         │   │   │   ├── zzz{img_suffix}
      │   │   │   │   ├── zzz{seg_map_suffix}         │   │   ├── annotations
      │   │   │   ├── val                             │   │   │   ├── train.txt
      │   │   ├── leftImg8bit                         │   │   │   ├── val.txt
      │   │   │   ├── train                           │   │   │   ├── test.txt
      │   │   │   │   ├── xxx{img_suffix}             │   │   ├── labels
      │   │   │   │   ├── yyy{img_suffix}             │   │   │   ├── xxx{seg_map_suffix}
      │   │   │   │   ├── zzz{img_suffix}             │   │   │   ├── yyy{seg_map_suffix}
      │   │   │   ├── val                             │   │   │   ├── zzz{seg_map_suffix}
      │   │   ├── leftImg8bit_sequence                │   │   ├── image_sequence
      │   │   │   ├── train                           │   │   │   ├── xxx{sequence_suffix}
      │   │   │   │   ├── xxx{sequence_suffix}        │   │   │   ├── yyy{sequence_suffix}
      │   │   │   │   ├── yyy{sequence_suffix}        │   │   │   ├── zzz{sequence_suffix}
      │   │   │   │   ├── zzz{sequence_suffix}
      │   │   │   ├── val
      
  • Evaluation

    • Download the trained models for Cityscapes and Camvid, and put them in ./work_dirs/{config_file} .
    • Run the following command (on Cityscapes):
    sh eval.sh configs/video/cityscapes/tmanet_r50-d8_769x769_80k_cityscapes_video.py
  • Training

    • Download the pretrained ResNet-50 model and put it in ./init_models .
    • Run the following command (on Cityscapes):
    sh train.sh configs/video/cityscapes/tmanet_r50-d8_769x769_80k_cityscapes_video.py

    Note: the evaluation and training commands above run on Cityscapes. To evaluate or train on Camvid, replace the config file in the command with the Camvid config file.
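
Before launching eval.sh or train.sh, it can help to confirm that the paths named in the steps above exist. The following is a small sketch of such a check, not part of the repository: the `preflight` function and the `DATASET_DIRS` table are hypothetical names, and the directory lists are simply copied from the file structure shown under "Prepare the datasets".

```python
from pathlib import Path

# Top-level dataset sub-directories taken from the file structure in this README.
DATASET_DIRS = {
    "cityscapes": ["gtFine", "leftImg8bit", "leftImg8bit_sequence"],
    "camvid": ["images", "annotations", "labels", "image_sequence"],
}


def preflight(repo_root, dataset, config):
    """Return a list of human-readable problems; an empty list means ready."""
    root = Path(repo_root)
    problems = []
    if not (root / config).is_file():
        problems.append(f"config not found: {config}")
    if not (root / "init_models").is_dir():
        problems.append("init_models/ not found (needed for training)")
    for sub in DATASET_DIRS[dataset]:
        if not (root / "data" / dataset / sub).is_dir():
            problems.append(f"missing data/{dataset}/{sub}")
    return problems


if __name__ == "__main__":
    config = "configs/video/cityscapes/tmanet_r50-d8_769x769_80k_cityscapes_video.py"
    for problem in preflight(".", "cityscapes", config):
        print(problem)
```

Run it from the repository root; no output means every expected path was found. To check Camvid instead, pass "camvid" and the Camvid config path.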

Citation

If you find TMANet useful in your research, please consider citing:

@misc{wang2021temporal,
    title={Temporal Memory Attention for Video Semantic Segmentation}, 
    author={Hao Wang and Weining Wang and Jing Liu},
    year={2021},
    eprint={2102.08643},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Acknowledgement

Thanks to mmsegmentation for its contribution to the community!
