A TensorFlow 2.x implementation of Masked Autoencoders Are Scalable Vision Learners

Last update: Dec 10, 2022

Overview

Masked Autoencoders Are Scalable Vision Learners

A TensorFlow implementation of Masked Autoencoders Are Scalable Vision Learners [1]. Our implementation of the proposed method is available in the mae-pretraining.ipynb notebook, which also includes evaluation with linear probing. The notebook can be executed end to end on Google Colab. Our main objective is to present the core idea of the proposed method in a minimal and readable manner. We have also prepared a blog post to help you get started with Masked Autoencoders.

Source: Masked Autoencoders Are Scalable Vision Learners

With just 100 epochs of pre-training and a fairly lightweight, asymmetric autoencoder architecture, we achieve 49.33% accuracy with linear probing on the CIFAR-10 dataset. Our training logs and encoder weights are released in Weights and Logs. For comparison, we took the encoder architecture and trained it from scratch (refer to regular-classification.ipynb) in a fully supervised manner. This gave us ~76% test top-1 accuracy.

We note that with further hyperparameter tuning and more epochs of pre-training, better linear-probing performance can be achieved. Below we present some more results:

Config	Masking proportion	LP performance	Encoder weights & logs
Encoder & decoder layers: 3 & 1; batch size: 256	0.6	44.25%	Link
Same as above	0.75	46.84%	Link
Encoder & decoder layers: 6 & 2; batch size: 256	0.75	48.16%	Link
Encoder & decoder layers: 9 & 3; batch size: 256; weight decay: 1e-5	0.75	49.33%	Link

LP denotes linear probing. Config is mostly based on what we define in the hyperparameters section of the mae-pretraining.ipynb notebook.
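
To make the masking proportion column concrete, here is a minimal sketch of how a proportion turns into masked and unmasked patch counts. It is plain Python rather than the notebook's TensorFlow code, and both the helper name `split_patch_indices` and the patch count of 64 are assumptions chosen for illustration, not values taken from the notebook.

```python
import random


def split_patch_indices(num_patches, mask_proportion, seed=None):
    """Randomly split patch indices into (masked, unmasked) lists.

    Hypothetical helper for illustration; not the notebook's code.
    """
    if not 0.0 <= mask_proportion < 1.0:
        raise ValueError("mask_proportion must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)  # random permutation of patch positions
    num_mask = int(mask_proportion * num_patches)
    # The first num_mask shuffled positions are masked; the rest stay visible.
    return indices[:num_mask], indices[num_mask:]


# Assumed patch count of 64 (e.g. an 8 x 8 grid); purely illustrative.
for proportion in (0.6, 0.75):
    masked, unmasked = split_patch_indices(64, proportion, seed=0)
    print(proportion, len(masked), len(unmasked))
```

Under these assumptions, a proportion of 0.75 masks 48 patches and leaves only 16 visible patches for the encoder, which is what keeps the encoder side of the asymmetric design cheap.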

Acknowledgements

Xinlei Chen (one of the authors of the original paper)
Google Developers Experts Program and JarvisLabs for providing credits to perform extensive experimentation on A100 GPUs.

References

[1] Masked Autoencoders Are Scalable Vision Learners; He et al.; arXiv 2021; https://arxiv.org/abs/2111.06377.

Comments

Excellent work (`mae.ipynb`)!
@ariG23498 this is fantastic stuff. Super clean, readable, and coherent with the original implementation. A couple of suggestions that would likely make things even better:

Since you have already implemented masking visualization utilities, how about making them part of the PatchEncoder itself? That way you could let it accept a test image, apply random masking, and plot it just as you do in the earlier cells. I believe this would make the notebook cleaner.

AdamW (tfa.optimizers.AdamW) is a better choice when it comes to training Transformer-based models.

Are we taking the loss on the correct component? I remember you mentioning it being dealt with differently.

After these points are addressed I will take a crack at porting the training loop to TPUs along with other performance monitoring callbacks.
opened by sayakpaul 7
Unshuffle the patches?

Your code helps me a lot! However, I still have some questions. In the paper, the authors say they unshuffle the full list before applying the decoder. In the MaskedAutoencoder class of your implementation, the decoder inputs are built as decoder_inputs = tf.concat([encoder_outputs, masked_embeddings], axis=1) and no unshuffling is used. Could you explain the reasoning behind this? Thanks a lot!
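
For readers following this question, the unshuffling step described in the paper can be sketched as below. This is a dependency-free illustration in plain Python, not the repository's code: the names `shuffle_with_indices` and `unshuffle` are hypothetical, and strings stand in for token embeddings. The idea is that the argsort of the shuffle permutation is its inverse, so gathering with it restores the original patch order.

```python
import random


def shuffle_with_indices(tokens, seed=None):
    """Return (shuffled, perm) such that shuffled[i] == tokens[perm[i]]."""
    rng = random.Random(seed)
    perm = list(range(len(tokens)))
    rng.shuffle(perm)
    return [tokens[p] for p in perm], perm


def unshuffle(shuffled_tokens, perm):
    """Restore the original order using the inverse permutation."""
    # argsort(perm): position j of the result holds the i with perm[i] == j.
    inverse = sorted(range(len(perm)), key=perm.__getitem__)
    return [shuffled_tokens[i] for i in inverse]


patches = [f"patch_{i}" for i in range(8)]
shuffled, perm = shuffle_with_indices(patches, seed=0)

# Keep the first 2 shuffled tokens as "visible" and append mask tokens
# for the rest, mirroring a concat of encoder outputs and mask tokens.
num_visible = 2
decoder_inputs = shuffled[:num_visible] + ["[MASK]"] * (len(patches) - num_visible)

# After unshuffling, each visible patch sits at its original position.
restored = unshuffle(decoder_inputs, perm)
print(restored)
```

In TensorFlow the same inverse would typically come from `tf.argsort` on the shuffle indices followed by `tf.gather` with `batch_dims=1`. Whether an explicit unshuffle is needed depends on how positional information reaches the decoder, which this sketch does not model.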

opened by changtaoli 2
Could you also share the weights of the pretrained decoder?

Hi,

Thanks for your excellent implementation! I found that you have shared the weights of the encoder, but the pretrained decoder is also needed to replicate the reconstruction. Could you share the weights of the pretrained decoder as well?

Best Regards, Hongxin

opened by hongxin001 1

Issue with the plotting utility `show_masked_image`

Should be:

def show_masked_image(self, patches):
    # Utility method that helps visualize masked images.
    # Assumes np, plt, tf, NUM_PATCHES and PATCH_SIZE are defined in the notebook.
    _, unmask_indices = self.get_random_indices()
    unmasked_patches = tf.gather(patches, unmask_indices, axis=1, batch_dims=1)

    # Sort the unmasked indices (and their patches) so they can be walked in grid order.
    ids = tf.argsort(unmask_indices)
    sorted_unmask_indices = tf.sort(unmask_indices)
    unmasked_patches = tf.gather(unmasked_patches, ids, batch_dims=1)

    # Select a random image from the batch for visualization.
    idx = np.random.choice(len(sorted_unmask_indices))
    print(f"Index selected: {idx}.")

    n = int(np.sqrt(NUM_PATCHES))
    unmask_index = sorted_unmask_indices[idx]
    unmasked_patch = unmasked_patches[idx]

    plt.figure(figsize=(4, 4))

    count = 0
    for i in range(NUM_PATCHES):
        plt.subplot(n, n, i + 1)

        if count < unmask_index.shape[0] and unmask_index[count].numpy() == i:
            # Patch i is visible: draw it and advance to the next unmasked index.
            patch_img = tf.reshape(unmasked_patch[count], (PATCH_SIZE, PATCH_SIZE, 3))
            count += 1
        else:
            # Patch i is masked: draw a black square.
            patch_img = tf.zeros((PATCH_SIZE, PATCH_SIZE, 3))
        plt.imshow(patch_img)
        plt.axis("off")
    plt.show()

    # Return the random index to validate the image outside the method.
    return idx

opened by ariG23498 1

Releases (v1.0.0)

v1.0.0 (Nov 22, 2021)
This release contains the:

encoder weights and logs
linear probing weights and logs
full supervision weights and logs

This ensures complete reproducibility of the experiments.
44_25.zip(3.82 MB)
46_84.zip(3.82 MB)
48_16.zip(7.47 MB)
49_33.zip(11.11 MB)
[email protected]_76.17.tar.gz(4.73 MB)

Owner

Aritra Roy Gosthipaty

Learning with a learning rate of 1e-10.

GitHub Repository https://keras.io/examples/vision/masked_image_modeling/
