Code to reprudece NeurIPS paper: Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks

Last update: Feb 23, 2022

Overview

Accelerated Sparse Neural Training: A Provable and Efficient Method to FindN:M Transposable Masks

Recently, researchers proposed pruning deep neural network weights (DNNs) using an $N:M$ fine-grained block sparsity mask. In this mask, for each block of M weights, we have at least N zeros. In contrast to unstructured sparsity, N:M fine-grained block sparsity allows acceleration in actual modern hardware. Previously suggested solutions enabled DNN acceleration at the inference phase. To also allow such acceleration in the training phase, we suggest a novel transposable-fine-grained sparsity mask where the same mask can be used for both forward and backward passes. Our transposable mask ensures that both the weight matrix and its transpose follow the same sparsity pattern; thus the matrix multiplication required for passing the error backward can also be accelerated. We discuss the transposable constraint and devise a new measure for mask constraints, called mask-diversity (MD), which correlates with their expected accuracy. Lastly, we formulate the problem of finding the optimal transposable mask as a minimum-cost-flow problem and suggest a fast linear approximation that can be used when the masks dynamically change while training. Our experiments suggest 2x speed-up with no accuracy degradation over vision and language models. A reference implementation is available in the supplementary material.

Reproducing the results

This repository is partially based on convNet.pytorch repo. please ensure that you are using pytorch 1.7+. Reproducing AdaPrune results

cd AdaPrune
sh scripts/adaprune_dense_bnt.sh
sh scripts/adaprune_sparse.sh

Reproducing static NM-transposable starting from dense pre-trained model:

cd static_TNM
sh scripts/prune_pretrained_R50.sh

Reproducing dynamic NM-transposable from scratch:

cd dynamic_TNM
sh scripts/clone_and_copy.sh
sh scripts/run_R18.sh
sh scripts/run_R50.sh

Code to reprudece NeurIPS paper: Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks

Related tags

Overview

Accelerated Sparse Neural Training: A Provable and Efficient Method to FindN:M Transposable Masks

Reproducing the results

Owner

itay hubara

Deal or No Deal? End-to-End Learning for Negotiation Dialogues

Document processing using transformers

Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models

RuCLIP tiny (Russian Contrastive Language–Image Pretraining) is a neural network trained to work with different pairs (images, texts).

Generate product descriptions, blogs, ads and more using GPT architecture with a single request to TextCortex API a.k.a Hemingwai

Code for "Generating Disentangled Arguments with Prompts: a Simple Event Extraction Framework that Works"

This repository contains all the source code that is needed for the project : An Efficient Pipeline For Bloom’s Taxonomy Using Natural Language Processing and Deep Learning

A linter to manage all your python exceptions and try/except blocks (limited only for those who like dinosaurs).

Search-Engine - 📖 AI based search engine

nlabel is a library for generating, storing and retrieving tagging information and embedding vectors from various nlp libraries through a unified interface.

nlp-tutorial is a tutorial for who is studying NLP(Natural Language Processing) using Pytorch

Beautiful visualizations of how language differs among document types.

Label data using HuggingFace's transformers and automatically get a prediction service

PG-19 Language Modelling Benchmark

BERN2: an advanced neural biomedical namedentity recognition and normalization tool

GCRC: A Gaokao Chinese Reading Comprehension dataset for interpretable Evaluation

Training and evaluation codes for the BertGen paper (ACL-IJCNLP 2021)

A multi-lingual approach to AllenNLP CoReference Resolution along with a wrapper for spaCy.

IndoBERTweet is the first large-scale pretrained model for Indonesian Twitter. Published at EMNLP 2021 (main conference)

Textpipe: clean and extract metadata from text