Pre-Trained Image Processing Transformer (IPT)

Last update: Dec 18, 2022

Overview

By Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Chao Xu, Wen Gao. [arXiv]

We study low-level computer vision tasks (such as denoising, super-resolution and deraining) and develop a new pre-trained model, namely, the image processing transformer (IPT). We propose to use the well-known ImageNet benchmark to generate a large number of corrupted image pairs. The IPT model is trained on these images with multi-heads and multi-tails. The pre-trained model can therefore be efficiently employed on the desired task after fine-tuning. With only one pre-trained model, IPT outperforms the current state-of-the-art methods on various low-level benchmarks.
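To make the idea of a "corrupted image pair" concrete, here is a minimal sketch of building one denoising pair from a clean image. It assumes additive Gaussian noise with a given sigma on 8-bit pixels; the function names and the nested-list image layout are hypothetical and are not taken from the IPT code, whose actual degradation pipeline may differ.

```python
import random

def add_gaussian_noise(image, sigma, seed=None):
    """Return a noisy copy of an 8-bit RGB image.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    Additive Gaussian noise is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    noisy = []
    for row in image:
        noisy_row = []
        for pixel in row:
            # Perturb each channel, then round and clip back to [0, 255].
            noisy_row.append(tuple(
                min(255, max(0, round(c + rng.gauss(0.0, sigma))))
                for c in pixel
            ))
        noisy.append(noisy_row)
    return noisy

def make_denoising_pair(clean, sigma, seed=None):
    """Return (corrupted_input, clean_target) for one training sample."""
    return add_gaussian_noise(clean, sigma, seed), clean

if __name__ == "__main__":
    clean = [[(128, 128, 128)] * 4 for _ in range(4)]  # flat gray 4x4 image
    noisy, target = make_denoising_pair(clean, sigma=30, seed=0)
    print(noisy[0][0], target[0][0])
```

Pairs for other tasks would be built the same way with a different corruption (for example downsampling for super-resolution), keeping the clean image as the target.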

MindSpore Code

Requirements

python 3
pytorch == 1.4.0
torchvision

Dataset

The benchmark datasets can be downloaded as follows:

For super-resolution:

Set5, Set14, B100, Urban100.

For denoising:

CBSD68, Urban100.

For deraining:

Rain100L.

The result images are converted into YCbCr color space. The PSNR is evaluated on the Y channel only.
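The Y-channel PSNR described here can be sketched as follows. This is a minimal standard-library illustration, not the repository's evaluation code: it assumes the ITU-R BT.601 luma coefficients for 8-bit RGB (the convention used by many super-resolution benchmarks), and the function names are hypothetical. The actual script may use slightly different coefficients or crop image borders before comparing.

```python
import math

def rgb_to_y(r, g, b):
    """Y (luma) of an 8-bit RGB pixel under the BT.601 convention (assumed).

    Maps black to 16 and white to 235.
    """
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0

def psnr_y(img_a, img_b, peak=255.0):
    """PSNR in dB between the Y channels of two equally sized RGB images.

    Each image is a list of rows, each row a list of (r, g, b) tuples.
    """
    squared_error = 0.0
    count = 0
    for row_a, row_b in zip(img_a, img_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            diff = rgb_to_y(*pixel_a) - rgb_to_y(*pixel_b)
            squared_error += diff * diff
            count += 1
    mse = squared_error / count
    if mse == 0.0:
        return math.inf  # identical Y channels
    return 10.0 * math.log10(peak * peak / mse)

if __name__ == "__main__":
    reference = [[(100, 150, 200), (10, 20, 30)]]
    restored = [[(101, 149, 202), (12, 20, 29)]]
    print(f"PSNR (Y): {psnr_y(reference, restored):.2f} dB")
```

Evaluating on Y only means that differences confined to the chroma channels do not affect the reported score.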

Script Description

This is the inference script of IPT. Follow the steps below to test image processing tasks such as SR, denoising and deraining with the corresponding pretrained models.

Script Parameter

For details about hyperparameters, see option.py.

Evaluation

Pretrained models

The pretrained models are available on Google Drive.

Evaluation Process

Inference example:

For SR x2, x3, x4:

python main.py --dir_data $DATA_PATH --pretrain $MODEL_PATH --data_test Set5+Set14+B100+Urban100 --scale $SCALE

For denoising with noise level 30 or 50:

python main.py --dir_data $DATA_PATH --pretrain $MODEL_PATH --data_test CBSD68+Urban100 --scale 1 --denoise --sigma $NOISY_LEVEL

For deraining:

python main.py --dir_data $DATA_PATH --pretrain $MODEL_PATH --scale 1 --derain
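When several runs are needed (for example all three SR scales and both noise levels), the commands above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of the repository; it only uses the flags shown in the commands above, and the paths in the usage example are placeholders.

```python
import shlex

def build_eval_command(task, data_dir, model_path,
                       scale=1, sigma=None, data_test=None):
    """Return the argument list for one evaluation run.

    `task` is "sr", "denoise" or "derain". Only flags that appear in
    the README commands are emitted.
    """
    cmd = ["python", "main.py", "--dir_data", data_dir,
           "--pretrain", model_path]
    if data_test:
        cmd += ["--data_test", data_test]
    if task == "sr":
        cmd += ["--scale", str(scale)]
    elif task == "denoise":
        if sigma is None:
            raise ValueError("denoise requires a noise level (sigma)")
        cmd += ["--scale", "1", "--denoise", "--sigma", str(sigma)]
    elif task == "derain":
        cmd += ["--scale", "1", "--derain"]
    else:
        raise ValueError(f"unknown task: {task!r}")
    return cmd

if __name__ == "__main__":
    # Placeholder paths; substitute your own data and checkpoint locations.
    for scale in (2, 3, 4):
        cmd = build_eval_command("sr", "/path/to/data", "/path/to/model",
                                 scale=scale,
                                 data_test="Set5+Set14+B100+Urban100")
        print(shlex.join(cmd))
```

Each returned list can be passed to `subprocess.run` directly. Note that the pretrained checkpoint should match the task and scale being evaluated.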

Results

Detailed results on the image super-resolution task (PSNR in dB).

Method	Scale	Set5	Set14	B100	Urban100
VDSR	X2	37.53	33.05	31.90	30.77
EDSR	X2	38.11	33.92	32.32	32.93
RCAN	X2	38.27	34.12	32.41	33.34
RDN	X2	38.24	34.01	32.34	32.89
OISR-RK3	X2	38.21	33.94	32.36	33.03
RNAN	X2	38.17	33.87	32.32	32.73
SAN	X2	38.31	34.07	32.42	33.10
HAN	X2	38.27	34.16	32.41	33.35
IGNN	X2	38.24	34.07	32.41	33.23
IPT (ours)	X2	38.37	34.43	32.48	33.76

Method	Scale	Set5	Set14	B100	Urban100
VDSR	X3	33.67	29.78	28.83	27.14
EDSR	X3	34.65	30.52	29.25	28.80
RCAN	X3	34.74	30.65	29.32	29.09
RDN	X3	34.71	30.57	29.26	28.80
OISR-RK3	X3	34.72	30.57	29.29	28.95
RNAN	X3	34.66	30.52	29.26	28.75
SAN	X3	34.75	30.59	29.33	28.93
HAN	X3	34.75	30.67	29.32	29.10
IGNN	X3	34.72	30.66	29.31	29.03
IPT (ours)	X3	34.81	30.85	29.38	29.49

Method	Scale	Set5	Set14	B100	Urban100
VDSR	X4	31.35	28.02	27.29	25.18
EDSR	X4	32.46	28.80	27.71	26.64
RCAN	X4	32.63	28.87	27.77	26.82
SAN	X4	32.64	28.92	27.78	26.79
RDN	X4	32.47	28.81	27.72	26.61
OISR-RK3	X4	32.53	28.86	27.75	26.79
RNAN	X4	32.49	28.83	27.72	26.61
HAN	X4	32.64	28.90	27.80	26.85
IGNN	X4	32.57	28.85	27.77	26.84
IPT (ours)	X4	32.64	29.01	27.82	27.26
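To read the tables at a glance, the short script below computes, for each dataset, the margin of IPT over the strongest baseline. It is only an illustrative calculation on the X4 values copied from the table above; the function name is hypothetical.

```python
# PSNR values (dB) copied from the X4 table above.
DATASETS = ["Set5", "Set14", "B100", "Urban100"]
X4_RESULTS = {
    "VDSR": [31.35, 28.02, 27.29, 25.18],
    "EDSR": [32.46, 28.80, 27.71, 26.64],
    "RCAN": [32.63, 28.87, 27.77, 26.82],
    "SAN": [32.64, 28.92, 27.78, 26.79],
    "RDN": [32.47, 28.81, 27.72, 26.61],
    "OISR-RK3": [32.53, 28.86, 27.75, 26.79],
    "RNAN": [32.49, 28.83, 27.72, 26.61],
    "HAN": [32.64, 28.90, 27.80, 26.85],
    "IGNN": [32.57, 28.85, 27.77, 26.84],
    "IPT (ours)": [32.64, 29.01, 27.82, 27.26],
}

def margins_over_best_baseline(results, ours="IPT (ours)"):
    """Map each dataset to (best baseline name, margin of `ours` in dB)."""
    baselines = {name: vals for name, vals in results.items() if name != ours}
    margins = {}
    for i, dataset in enumerate(DATASETS):
        # On ties, max() keeps the first baseline in table order.
        best_name, best_vals = max(baselines.items(), key=lambda kv: kv[1][i])
        margins[dataset] = (best_name, round(results[ours][i] - best_vals[i], 2))
    return margins

if __name__ == "__main__":
    for dataset, (name, gain) in margins_over_best_baseline(X4_RESULTS).items():
        print(f"{dataset}: +{gain:.2f} dB over {name}")
```

On these numbers the gain is largest on Urban100 (0.41 dB over HAN), while on Set5 IPT ties the best baselines at 32.64 dB.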

Super-resolution result

Denoising result

Derain result

Citation

@misc{chen2020pre,
      title={Pre-Trained Image Processing Transformer}, 
      author={Chen, Hanting and Wang, Yunhe and Guo, Tianyu and Xu, Chang and Deng, Yiping and Liu, Zhenhua and Ma, Siwei and Xu, Chunjing and Xu, Chao and Gao, Wen},
      year={2021},
      eprint={2012.00364},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

Main code from EDSR-PyTorch
Transformer code from detr

Owner

HUAWEI Noah's Ark Lab
