RaftMLP

RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?

By Yuki Tatsunami and Masato Taki (Rikkyo University)

Abstract

For the past ten years, CNN has reigned supreme in the world of computer vision, but recently, Transformer has been on the rise. However, the quadratic computational cost of self-attention has become a serious problem in practice applications. There has been much research on architectures without CNN and self-attention in this context. In particular, MLP-Mixer is a simple architecture designed using MLPs and hit an accuracy comparable to the Vision Transformer. However, the only inductive bias in this architecture is the embedding of tokens. This leaves open the possibility of incorporating a non-convolutional (or non-local) inductive bias into the architecture, so we used two simple ideas to incorporate inductive bias into the MLP-Mixer while taking advantage of its ability to capture global correlations. A way is to divide the token-mixing block vertically and horizontally. Another way is to make spatial correlations denser among some channels of token-mixing. With this approach, we were able to improve the accuracy of the MLP-Mixer while reducing its parameters and computational complexity. The small model that is RaftMLP-S is comparable to the state-of-the-art global MLP-based model in terms of parameters and efficiency per calculation. In addition, we tackled the problem of fixed input image resolution for global MLP-based models by utilizing bicubic interpolation. We demonstrated that these models could be applied as the backbone of architectures for downstream tasks such as object detection. However, it did not have significant performance and mentioned the need for MLP-specific architectures for downstream tasks for global MLP-based models.

About Environment

Our base is PyTorch, Torchvision, and Ignite. We use mmdetection and mmsegmentation for object detection and semantic segmentation. We also use ClearML, AWS, etc., for experiment management.

We also use Docker for our environment, and with Docker and NVIDIA Container Toolkit installed, we can build a runtime environment at the ready.

Require

NVIDIA Driver
Docker(19.03+)
Docker Compose(1.28.0+)
NVIDIA Container Toolkit

Prepare

clearml.conf

Please copy clearml.conf.sample, you can easily create clearml.conf. Unless you have a Clear ML account, you should use the account. Next, you obtain the access key and secret key of the service. Let's write them on clearml.conf. If you don't have an AWS account, you will need one. Then, create an IAM user and an S3 bucket, and grant the IAM user a policy that allows you to read and write objects to the bucket you created. Include the access key and secret key of the IAM user you created and the region of the bucket you made in your clearml.conf.

docker-compose.yml

Please copy docker-compose.yml.sample to docker-compose.yml. Change the path/to/datasets in the volumes section to an appropriate directory where the datasets are stored. You can set device_ids on your environment. If you train semantic segmentation models or object detection models, you should set WANDB_API_KEY.

Datasets

Except for ImageNet, our codes automatically download datasets, but we recommend downloading them beforehand. Datasets need to be placed in the location set in the datasets directory in docker-compose.yml.

ImageNet1k

Please go to URL and register on the site. Then you can download ImageNet1k dataset. You should place it under path/to/datasets with the following structure.

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......

CIFAR10

No problem, just let the code download automatically. URL

CIFAR100

No problem, just let the code download automatically. URL

Oxford 102 Flowers

No problem, just let the code download automatically. URL

Stanford Cars

You should place it under path/to/datasets with the following structure.

│stanford_cars/
├──cars_train/
│  ├── 00001.jpg
│  ├── 00002.jpg
│  ├── ......
├──cars_test/
│  ├── 00001.jpg
│  ├── 00002.jpg
│  ├── ......
├──devkit/
│  ├── cars_meta.mat
│  ├── cars_test_annos.mat
│  ├── cars_train_annos.mat
│  ├── eval_train.m
│  ├── README.txt
│  ├── train_perfect_preds.txt
├──cars_test_annos_withlabels.matcars_test_annos_withlabels.mat

URL

iNaturalist18

You should place it under path/to/datasets with the following structure.

│i_naturalist_18/
├──train_val2018/
│  ├──Actinopterygii/
│  │  ├──2229/
│  │  │  ├── 014a31153ac74bf87f1f730480e4a27a.jpg
│  │  │  ├── 037d062cc1b8a85821449d2cdeca7749.jpg
│  │  │  ├── ......
│  │  ├── ......
│  ├── ......
├──train2018.json
├──val2018.json

URL

iNaturalist19

You should place it under path/to/datasets with the following structure.

│i_naturalist_19/
├──train_val2019/
│  ├──Amphibians/
│  │  ├──153/
│  │  │  ├── 0042d05b4ffbd5a1ce2fc56513a7777e.jpg
│  │  │  ├── 006f69e838b87cfff3d12120795c4ada.jpg
│  │  │  ├── ......
│  │  ├── ......
│  ├── ......
├──train2019.json
├──val2019.json

URL

MS COCO

You should place it under path/to/datasets with the following structure.

│coco/
├──train2017/
│  ├── 000000000009.jpg
│  ├── 000000000025.jpg
│  ├── ......
├──val2017/
│  ├── 000000000139.jpg
│  ├── 000000000285.jpg
│  ├── ......
├──annotations/
│  ├── captions_train2017.json
│  ├── captions_val2017.json
│  ├── instances_train2017.json
│  ├── instances_val2017.json
│  ├── person_keypoints_train2017.json
│  ├── person_keypoints_val2017.json

URL

ADE20K

In order for you to download the ADE20k dataset, you have to register at this site and get approved. Once downloaded the dataset, place it so that it has the following structure.

│ade/
├──ADEChallengeData2016/
│  ├──annotations/
│  │  ├──training/
│  │  │  ├── ADE_train_00000001.png
│  │  │  ├── ADE_train_00000002.png
│  │  │  ├── ......
│  │  ├──validation/
│  │  │  ├── ADE_val_00000001.png
│  │  │  ├── ADE_val_00000002.png
│  │  │  ├── ......
│  ├──images/
│  │  ├──training/
│  │  │  ├── ADE_train_00000001.jpg
│  │  │  ├── ADE_train_00000002.jpg
│  │  │  ├── ......
│  │  ├──validation/
│  │  │  ├── ADE_val_00000001.jpg
│  │  │  ├── ADE_val_00000002.jpg
│  │  │  ├── ......
│  │  ├──
│  ├──objectInfo150.txt
│  ├──sceneCategories.txt

ImageNet1k

configs/settings are available. Each of the training conducted in Subsection 4.1 can be performed in the following commands.

docker run trainer python run.py settings=imagenet-raft-mlp-cross-mlp-emb-s
docker run trainer python run.py settings=imagenet-raft-mlp-cross-mlp-emb-m
docker run trainer python run.py settings=imagenet-raft-mlp-cross-mlp-emb-l

The ablation study for channel rafts in subsection 4.2 ran the following commands.

Ablation Study

docker run trainer python run.py settings=imagenet-org-mixer
docker run trainer python run.py settings=imagenet-raft-mlp-r-1
docker run trainer python run.py settings=imagenet-raft-mlp-r-2
docker run trainer python run.py settings=imagenet-raft-mlp

The ablation study for multi-scale patch embedding in subsection 4.2 ran the following commands.

docker run trainer python run.py settings=imagenet-raft-mlp-cross-mlp-emb-m
docker run trainer python run.py settings=imagenet-raft-mlp-hierarchy-m

Transfer Learning

docker run trainer python run.py settings=finetune/cars-org-mixer.yaml
docker run trainer python run.py settings=finetune/cars-raft-mlp-cross-mlp-emb-s.yaml
docker run trainer python run.py settings=finetune/cars-raft-mlp-cross-mlp-emb-m.yaml
docker run trainer python run.py settings=finetune/cars-raft-mlp-cross-mlp-emb-l.yaml
docker run trainer python run.py settings=finetune/cifar10-org-mixer.yaml
docker run trainer python run.py settings=finetune/cifar10-raft-mlp-cross-mlp-emb-s.yaml
docker run trainer python run.py settings=finetune/cifar10-raft-mlp-cross-mlp-emb-m.yaml
docker run trainer python run.py settings=finetune/cifar10-raft-mlp-cross-mlp-emb-l.yaml
docker run trainer python run.py settings=finetune/cifar100-org-mixer.yaml
docker run trainer python run.py settings=finetune/cifar100-raft-mlp-cross-mlp-emb-s.yaml
docker run trainer python run.py settings=finetune/cifar100-raft-mlp-cross-mlp-emb-m.yaml
docker run trainer python run.py settings=finetune/cifar100-raft-mlp-cross-mlp-emb-l.yaml
docker run trainer python run.py settings=finetune/flowers102-org-mixer.yaml
docker run trainer python run.py settings=finetune/flowers102-raft-mlp-cross-mlp-emb-s.yaml
docker run trainer python run.py settings=finetune/flowers102-raft-mlp-cross-mlp-emb-m.yaml
docker run trainer python run.py settings=finetune/flowers102-raft-mlp-cross-mlp-emb-l.yaml
docker run trainer python run.py settings=finetune/inat18-org-mixer.yaml
docker run trainer python run.py settings=finetune/inat18-raft-mlp-cross-mlp-emb-s.yaml
docker run trainer python run.py settings=finetune/inat18-raft-mlp-cross-mlp-emb-m.yaml
docker run trainer python run.py settings=finetune/inat18-raft-mlp-cross-mlp-emb-l.yaml
docker run trainer python run.py settings=finetune/inat19-org-mixer.yaml
docker run trainer python run.py settings=finetune/inat19-raft-mlp-cross-mlp-emb-s.yaml
docker run trainer python run.py settings=finetune/inat19-raft-mlp-cross-mlp-emb-m.yaml
docker run trainer python run.py settings=finetune/inat19-raft-mlp-cross-mlp-emb-l.yaml

Object Detection

The weights already trained by ImageNet should be placed in the following path.

path/to/datasets/weights/imagenet-raft-mlp-cross-mlp-emb-s/last_model_0.pt
path/to/datasets/weights/imagenet-raft-mlp-cross-mlp-emb-l/last_model_0.pt
path/to/datasets/weights/imagenet-raft-mlp-cross-mlp-emb-m/last_model_0.pt
path/to/datasets/weights/imagenet-org-mixer/last_model_0.pt

Please execute the following commands.

docker run trainer bash ./detection.sh configs/detection/maskrcnn_org_mixer_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/maskrcnn_raftmlp_l_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/maskrcnn_raftmlp_m_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/maskrcnn_raftmlp_s_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/retinanet_org_mixer_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/retinanet_raftmlp_l_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/retinanet_raftmlp_m_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/retinanet_raftmlp_s_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8

Semantic Segmentation

As with object detection, the following should be executed after placing the weight files in advance.

docker run trainer bash ./segmentation.sh configs/segmentation/fpn_org_mixer_512x512_40k_ade20k.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./segmentation.sh configs/segmentation/fpn_raftmlp_s_512x512_40k_ade20k.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./segmentation.sh configs/segmentation/fpn_raftmlp_m_512x512_40k_ade20k.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./segmentation.sh configs/segmentation/fpn_raftmlp_l_512x512_40k_ade20k.py 8 --seed=42 --deterministic --gpus=8

Reference

@misc{tatsunami2021raftmlp,
  title={RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?},
  author={Yuki Tatsunami and Masato Taki},
  year={2021}
  eprint={2108.04384},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

License

This repository is relased under the Apache 2.0 license as douns in the LICENSE file.

RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?

Related tags

Overview

RaftMLP

Abstract

About Environment

Require

Prepare

clearml.conf

docker-compose.yml

Datasets

ImageNet1k

CIFAR10

CIFAR100

Oxford 102 Flowers

Stanford Cars

iNaturalist18

iNaturalist19

MS COCO

ADE20K

ImageNet1k

Ablation Study

Transfer Learning

Object Detection

Semantic Segmentation

Reference

License

Owner

Okojo

PyTorch Implementation of the SuRP algorithm by the authors of the AISTATS 2022 paper "An Information-Theoretic Justification for Model Pruning"

Final project code: Implementing MAE with downscaled encoders and datasets, for ESE546 FA21 at University of Pennsylvania

This is the source code for our ICLR2021 paper: Adaptive Universal Generalized PageRank Graph Neural Network.

A Python library for Deep Graph Networks

Self-training with Weak Supervision (NAACL 2021)

Mitsuba 2: A Retargetable Forward and Inverse Renderer

This repository contains all code and data for the Inside Out Visual Place Recognition task

PyTorch version repo for CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

Music library streaming app written in Flask & VueJS

VoxHRNet - Whole Brain Segmentation with Full Volume Neural Network

Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks (MAPDN)

Re-implememtation of MAE (Masked Autoencoders Are Scalable Vision Learners) using PyTorch.

A Python module for the generation and training of an entry-level feedforward neural network.

A PyTorch implementation of "Signed Graph Convolutional Network" (ICDM 2018).

Unofficial implement with paper SpeakerGAN: Speaker identification with conditional generative adversarial network

The implementation of "Bootstrapping Semantic Segmentation with Regional Contrast".

Multiple Object Tracking with Yolov5!

GLaRA: Graph-based Labeling Rule Augmentation for Weakly Supervised Named Entity Recognition

Readings for "A Unified View of Relational Deep Learning for Polypharmacy Side Effect, Combination Therapy, and Drug-Drug Interaction Prediction."

CSAC - Collaborative Semantic Aggregation and Calibration for Separated Domain Generalization