Scaling Vision with Sparse Mixture of Experts

This repository contains the code for training and fine-tuning Sparse MoE models for vision (V-MoE) on ImageNet-21k, reproducing the results presented in the paper:

Scaling Vision with Sparse Mixture of Experts, by Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, and Neil Houlsby.

We will soon provide a colab analysing one of the models that we have released, as well as "config" files to train from scratch and fine-tune checkpoints. Stay tuned.

Installation

Simply clone this repository.

The file requirements.txt contains the requirements that can be installed via PyPi. However, we recommend installing jax, flax and optax directly from GitHub, since we use some of the latest features that are not part of any release yet.

In addition, you also have to clone the Vision Transformer repository, since we use some parts of it.

If you want to use RandAugment to train models (which we recommend if you train on ImageNet-21k or ILSVRC2012 from scratch), you must also clone the Cloud TPU repository, and name it cloud_tpu.

Checkpoints

We release the checkpoints containing the weights of some models that we trained on ImageNet (either ILSVRC2012 or ImageNet-21k). All checkpoints contain an index file (with .index extension) and one or multiple data files ( with extension .data-nnnnn-of-NNNNN, called shards). In the following list, we indicate only the prefix of each checkpoint. We recommend using gsutil to obtain the full list of files, download them, etc.

V-MoE S/32, 8 experts on the last two odd blocks, trained from scratch on ILSVRC2012 with RandAugment: gs://vmoe_checkpoints/vmoe_s32_last2_ilsvrc2012_randaug_medium.
V-MoE B/16, 8 experts on every odd block, trained from scratch on ImageNet-21k with RandAugment: gs://vmoe_checkpoints/vmoe_b16_imagenet21k_randaug_strong.
- Fine-tuned on ILSVRC2012: gs://vmoe_checkpoints/vmoe_b16_imagenet21k_randaug_strong_ft_ilsvrc2012

Disclaimers

This is not an officially supported Google product.

Scaling Vision with Sparse Mixture of Experts

Related tags

Overview

Scaling Vision with Sparse Mixture of Experts

Installation

Checkpoints

Disclaimers

Owner

Google Research

A novel pipeline framework for multi-hop complex KGQA task. About the paper title: Improving Multi-hop Embedded Knowledge Graph Question Answering by Introducing Relational Chain Reasoning

Node for thenewboston digital currency network.

CIFAR-10_train-test - training and testing codes for dataset CIFAR-10

Sign Language is detected in realtime using video sequences. Our approach involves MediaPipe Holistic for keypoints extraction and LSTM Model for prediction.

Establishing Strong Baselines for TripClick Health Retrieval; ECIR 2022

The Official PyTorch Implementation of "LSGM: Score-based Generative Modeling in Latent Space" (NeurIPS 2021)

a reimplementation of UnFlow in PyTorch that matches the official TensorFlow version

Pretraining on Dynamic Graph Neural Networks

Implementations of polygamma, lgamma, and beta functions for PyTorch

A Self-Supervised Contrastive Learning Framework for Aspect Detection

Reimplementation of the paper `Human Attention Maps for Text Classification: Do Humans and Neural Networks Focus on the Same Words? (ACL2020)`

Oscar and VinVL

Einshape: DSL-based reshaping library for JAX and other frameworks.

Offical implementation of Shunted Self-Attention via Multi-Scale Token Aggregation

TensorFlow implementation of ENet, trained on the Cityscapes dataset.

Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity

Implement of "Training deep neural networks via direct loss minimization" in PyTorch for 0-1 loss

A set of tools to pre-calibrate and calibrate (multi-focus) plenoptic cameras (e.g., a Raytrix R12) based on the libpleno.

Tools for the Cleveland State Human Motion and Control Lab

The code succinctly shows how our ensemble learning based on deep learning CNN is used for LAM-avulsion-diagnosis.