Mesh Transformer Jax

A Haiku library that uses the new(ly documented) xmap operator in JAX for model parallelism of transformers.

See enwik8_example.py for an example of using this to implement an autoregressive language model.

Benchmarks

On a TPU v3-8 (see tpuv38_example.py):

~2.7B model

Initialized in 121.842s
Total parameters: 2722382080
Compiled in 49.0534s
it: 0, loss: 20.311113357543945
<snip>
it: 90, loss: 3.987450361251831
100 steps in 109.385s
effective flops (not including attn): 2.4466e+14

~4.8B model

Initialized in 101.016s
Total parameters: 4836720896
Compiled in 52.7404s
it: 0, loss: 4.632925987243652
<snip>
it: 40, loss: 3.2406811714172363
50 steps in 102.559s
effective flops (not including attn): 2.31803e+14

10B model

Initialized in 152.762s
Total parameters: 10073579776
Compiled in 92.6539s
it: 0, loss: 5.3125
<snip>
it: 40, loss: 3.65625
50 steps in 100.235s
effective flops (not including attn): 2.46988e+14
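
The "effective flops" lines can be sanity-checked with a little arithmetic. The sketch below assumes the common estimate of 6 FLOPs per parameter per token (forward plus backward pass, attention excluded). The README does not state this formula or the batch size, so the `tokens_per_step` values here are inferred from the logged numbers and should be read as assumptions, not documented settings.

```python
# Back-of-the-envelope check of the "effective flops" lines in the logs above.
# Assumptions (not stated in the README):
#   * FLOPs are estimated as 6 * params * tokens (attention excluded).
#   * tokens_per_step is inferred from the logged figures, not documented.

FLOPS_PER_PARAM_PER_TOKEN = 6  # forward + backward pass approximation


def effective_flops(params, tokens_per_step, steps, seconds):
    """Estimated training throughput in FLOP/s under the 6*N*D approximation."""
    total_flops = FLOPS_PER_PARAM_PER_TOKEN * params * tokens_per_step * steps
    return total_flops / seconds


# params, steps and seconds are copied from the logs; tokens_per_step is inferred.
runs = {
    "~2.7B": dict(params=2722382080, tokens_per_step=16384, steps=100, seconds=109.385),
    "~4.8B": dict(params=4836720896, tokens_per_step=16384, steps=50, seconds=102.559),
    "10B": dict(params=10073579776, tokens_per_step=8192, steps=50, seconds=100.235),
}

for name, run in runs.items():
    print(f"{name}: {effective_flops(**run):.4e} FLOP/s")
```

Under these assumptions the computed values agree with the logged figures to within 0.1%, which suggests (but does not prove) that the 10B run processed half as many tokens per step as the two smaller runs.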

Model parallel transformers in Jax and Haiku


Owner

Ben Wang
