PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models

Last update: Dec 06, 2022

This repository is the official implementation of the following paper:

PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models
Chaoyang He (USC), Shen Li (Facebook AI Research), Mahdi Soltanolkotabi (USC), Salman Avestimehr (USC)
Accepted to ICML 2021 (International Conference on Machine Learning 2021)

1. Introduction

The size of Transformer models is growing at an unprecedented rate. It took less than one year after the release of GPT-3 (175B) to reach trillion-level parameters. Training such models requires both substantial engineering effort and enormous computing resources, which are luxuries most research teams cannot afford. In this paper, we propose PipeTransformer, which leverages automated elastic pipelining for efficient distributed training of Transformer models. In PipeTransformer, we design an adaptive on-the-fly freeze algorithm that identifies and gradually freezes some layers during training, and an elastic pipelining system that dynamically allocates resources to train the remaining active layers. More specifically, PipeTransformer automatically excludes frozen layers from the pipeline, packs active layers into fewer GPUs, and forks more replicas to increase data-parallel width. We evaluate PipeTransformer using Vision Transformer (ViT) on ImageNet and BERT on the SQuAD and GLUE datasets. Our results show that, compared to the state-of-the-art baseline, PipeTransformer attains up to a 2.83-fold speedup without losing accuracy. We also provide various performance analyses for a more comprehensive understanding of our algorithmic and system-wise design. Finally, we have modularized our training system with flexible APIs and made the source code publicly available.
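The resource reallocation described above can be illustrated with a small sketch. This is not the repository's API: the function name `plan_pipeline`, its parameters, and the fixed `layers_per_gpu` capacity are all hypothetical, and the real system decides pipeline length from measured memory use rather than a fixed per-GPU layer count. The sketch only shows the arithmetic of the idea: as more layers are frozen, the active layers fit on fewer GPUs, and the freed GPUs can host additional data-parallel replicas.

```python
import math


def plan_pipeline(num_layers, num_frozen, num_gpus, layers_per_gpu):
    """Return (pipeline_length, num_replicas) for the active layers.

    Hypothetical helper for illustration only. Assumes every GPU can hold
    exactly `layers_per_gpu` active layers and that frozen layers are
    excluded from the pipeline entirely.
    """
    if not 0 <= num_frozen < num_layers:
        raise ValueError("num_frozen must be in [0, num_layers)")

    # Only the layers that are still being trained need pipeline stages.
    active_layers = num_layers - num_frozen

    # Pack the active layers into as few GPUs as the assumed capacity allows.
    pipeline_length = math.ceil(active_layers / layers_per_gpu)
    if pipeline_length > num_gpus:
        raise ValueError("not enough GPUs for a single pipeline")

    # Every group of `pipeline_length` GPUs can host one pipeline replica,
    # so a shorter pipeline means a wider data-parallel dimension.
    num_replicas = num_gpus // pipeline_length
    return pipeline_length, num_replicas


if __name__ == "__main__":
    # 16 layers on 8 GPUs, with an assumed capacity of 2 layers per GPU.
    for frozen in (0, 8, 12, 14):
        length, replicas = plan_pipeline(16, frozen, 8, 2)
        print(f"frozen={frozen:2d} -> pipeline length {length}, replicas {replicas}")
```

Under these assumptions, freezing half of the layers halves the pipeline length and doubles the number of replicas, which is the source of the increased data-parallel width.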

2. Overall Design

3. Slides

https://docs.google.com/presentation/d/1t6HWL33KIQo2as0nSHeBpXYtTBcy0nXCoLiKd0EashY/edit?usp=sharing

4. Understanding PipeTransformer by Animation

https://videos.files.wordpress.com/3vsRzoiw/pipetransformer-animation_m4v_hd.mp4

5. Installation

Please follow INSTALL-CONDA.md.

6. Experiments

See the README.md in each of the following directories:

examples/image_classification

examples/question_answering

examples/text_classification

7. Citation

If you use any part of this code in your research or any engineering project, please cite our paper:

@article{he2021pipetransformer,
  title={Pipetransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models},
  author={He, Chaoyang and Li, Shen and Soltanolkotabi, Mahdi and Avestimehr, Salman},
  journal={Thirty-eighth International Conference on Machine Learning},
  year={2021}
}

8. Contacts

Chaoyang He
https://chaoyanghe.com
[email protected]
[email protected]

Owner

DistributedML
