Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers (arXiv2021)

Overview

Polyp-PVT

by Bo Dong, Wenhai Wang, Deng-Ping Fan, Jinpeng Li, Huazhu Fu, & Ling Shao.

This repo is the official implementation of "Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers".

1. Introduction

Polyp-PVT is initially described in arxiv.

Most polyp segmentation methods use CNNs as their backbone, leading to two key issues when exchanging information between the encoder and decoder: 1) taking into account the differences in contribution between different-level features; and 2) designing effective mechanism for fusing these features. Different from existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations. In addition, considering the image acquisition influence and elusive properties of polyps, we introduce three novel modules, including a cascaded fusion module (CFM), a camouflage identification module (CIM), a and similarity aggregation module (SAM). Among these, the CFM is used to collect the semantic and location information of polyps from high-level features, while the CIM is applied to capture polyp information disguised in low-level features. With the help of the SAM, we extend the pixel features of the polyp area with high-level semantic position information to the entire polyp area, thereby effectively fusing cross-level features. The proposed model, named Polyp-PVT , effectively suppresses noises in the features and significantly improves their expressive capabilities.

Polyp-PVT achieves strong performance on image-level polyp segmentation (0.808 mean Dice and 0.727 mean IoU on ColonDB) and video polyp segmentation (0.880 mean dice and 0.802 mean IoU on CVC-300-TV), surpassing previous models by a large margin.

2. Framework Overview

3. Results

3.1 Image-level Polyp Segmentation

3.2 Image-level Polyp Segmentation Compared Results:

We also provide some result of baseline methods, You could download from Google Drive/Baidu Drive [code:nhhv], including our results and that of compared models.

3.3 Video Polyp Segmentation

3.4 Video Polyp Segmentation Compared Results:

We also provide some result of baseline methods, You could download from Google Drive/Baidu Drive [code:33ie], including our results and that of compared models.

4. Usage:

4.1 Recommended environment:

Python 3.8
Pytorch 1.7.1
torchvision 0.8.2

4.2 Data preparation:

Downloading training and testing datasets and move them into ./dataset/, which can be found in this Google Drive/Baidu Drive [code:dr1h].

4.3 Pretrained model:

You should download the pretrained model from Google Drive/Baidu Drive [code:w4vk], and then put it in the './pretrained_pth' folder for initialization.

4.4 Training:

Clone the repository:

git clone https://github.com/DengPingFan/Polyp-PVT.git
cd Polyp-PVT 
bash train.sh

4.5 Testing:

cd Polyp-PVT 
bash test.sh

4.6 Evaluating your trained model:

Matlab: Please refer to the work of MICCAI2020 (link).

Python: Please refer to the work of ACMMM2021 (link).

Please note that we use the Matlab version to evaluate in our paper.

4.7 Well trained model:

You could download the trained model from Google Drive/Baidu Drive [code:9rpy] and put the model in directory './model_pth'.

4.8 Pre-computed maps:

Google Drive/Baidu Drive [code:x3jc]

5. Citation:

@aticle{dong2021PolypPVT,
  title={Polyp-PVT: Polyp Segmentation with PyramidVision Transformers},
  author={Bo, Dong and Wenhai, Wang and Deng-Ping, Fan and Jinpeng, Li and Huazhu, Fu and Ling, Shao},
  journal={arXiv preprint arXiv:2108.06932},
  year={2021}
}

6. Acknowledgement

We are very grateful for these excellent works PraNet, EAGRNet and MSEG, which have provided the basis for our framework.

7. FAQ:

If you want to improve the usability or any piece of advice, please feel free to contact me directly ([email protected]).

8. License

The source code is free for research and education use only. Any comercial use should get formal permission first.

Owner
Deng-Ping Fan
Researcher (PI)
Deng-Ping Fan
multimodal transformer

This repo holds the code to perform experiments with the multimodal autoregressive probabilistic model Transflower. Overview of the repo It is structu

Guillermo Valle 68 Dec 13, 2022
Locationinfo - A script helps the user to show network information such as ip address

Description This script helps the user to show network information such as ip ad

Roxcoder 1 Dec 30, 2021
Reproducing code of hair style replacement method from Barbershorp.

Barbershorp Reproducing code of hair style replacement method from Barbershorp. Also reproduces II2S, an improved version of Image2StyleGAN. Requireme

1 Dec 24, 2021
Towards Implicit Text-Guided 3D Shape Generation (CVPR2022)

Towards Implicit Text-Guided 3D Shape Generation Towards Implicit Text-Guided 3D Shape Generation (CVPR2022) Code for the paper [Towards Implicit Text

55 Dec 16, 2022
Exploration-Exploitation Dilemma Solving Methods

Exploration-Exploitation Dilemma Solving Methods Medium article for this repo - HERE In ths repo I implemented two techniques for tackling mentioned t

Aman Mishra 6 Jan 25, 2022
TrackFormer: Multi-Object Tracking with Transformers

TrackFormer: Multi-Object Tracking with Transformers This repository provides the official implementation of the TrackFormer: Multi-Object Tracking wi

Tim Meinhardt 321 Dec 29, 2022
Code for the paper: "On the Bottleneck of Graph Neural Networks and Its Practical Implications"

On the Bottleneck of Graph Neural Networks and its Practical Implications This is the official implementation of the paper: On the Bottleneck of Graph

75 Dec 22, 2022
A Large Scale Benchmark for Individual Treatment Effect Prediction and Uplift Modeling

large-scale-ITE-UM-benchmark This repository contains code and data to reproduce the results of the paper "A Large Scale Benchmark for Individual Trea

10 Nov 19, 2022
A minimal TPU compatible Jax implementation of NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

NeRF Minimal Jax implementation of NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. Result of Tiny-NeRF RGB Depth

Soumik Rakshit 11 Jul 24, 2022
A rough implementation of the paper "A Steering Algorithm for Redirected Walking Using Reinforcement Learning"

A rough implementation of the paper "A Steering Algorithm for Redirected Walking Using Reinforcement Learning"

Somnus ο½€Chen 2 Jun 09, 2022
Recreate CenternetV2 based on MMDET.

Introduction This project is trying to Recreate CenternetV2 based on MMDET, which is proposed in paper Probabilistic two-stage detection. This project

25 Dec 09, 2022
Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology (LMRL Workshop, NeurIPS 2021)

Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology Self-Supervised Vision Transformers Learn Visual Concepts in Histopatholog

Richard Chen 95 Dec 24, 2022
Image Lowpoly based on Centroid Voronoi Diagram via python-opencv and taichi

CVTLowpoly: Image Lowpoly via Centroid Voronoi Diagram Image Sharp Feature Extraction using Guide Filter's Local Linear Theory via opencv-python. The

Pupa 4 Jul 29, 2022
TensorFlow Implementation of Unsupervised Cross-Domain Image Generation

Domain Transfer Network (DTN) TensorFlow implementation of Unsupervised Cross-Domain Image Generation. Requirements Python 2.7 TensorFlow 0.12 Pickle

Yunjey Choi 865 Nov 17, 2022
Acute ischemic stroke dataset

AISD Acute ischemic stroke dataset contains 397 Non-Contrast-enhanced CT (NCCT) scans of acute ischemic stroke with the interval from symptom onset to

Kongming Liang 21 Sep 06, 2022
Code base for NeurIPS 2021 publication titled Kernel Functional Optimisation (KFO)

KernelFunctionalOptimisation Code base for NeurIPS 2021 publication titled Kernel Functional Optimisation (KFO) We have conducted all our experiments

2 Jun 29, 2022
Get 2D point positions (e.g., facial landmarks) projected on 3D mesh

points2d_projection_mesh Input 2D points (e.g. facial landmarks) on an image Camera parameters (extrinsic and intrinsic) of the image Aligned 3D mesh

5 Dec 08, 2022
A variational Bayesian method for similarity learning in non-rigid image registration (CVPR 2022)

A variational Bayesian method for similarity learning in non-rigid image registration We provide the source code and the trained models used in the re

daniel grzech 14 Nov 21, 2022
UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus

UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus General info This is

71 Oct 25, 2022
tensorflow implementation of 'YOLO : Real-Time Object Detection'

YOLO_tensorflow (Version 0.3, Last updated :2017.02.21) 1.Introduction This is tensorflow implementation of the YOLO:Real-Time Object Detection It can

Jinyoung Choi 1.7k Nov 21, 2022