TensorFlow implementation of PHM (Parameterization of Hypercomplex Multiplication)

Last update: Oct 26, 2022

Overview

Parameterization of Hypercomplex Multiplications (PHM)

This repository contains the TensorFlow implementation of PHM (Parameterization of Hypercomplex Multiplication) layers and PHM-Transformers in the paper Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters at ICLR 2021.

Installation

One may install the following libraries before running our code:

tensorflow-gpu (1.14.0)
tensor2tensor (1.14.0)

Usage

The usage of this repository follows the original tensor2tensor repository (e.g., t2t-datagen, t2t-trainer, t2t-avg-all, followed by t2t-decoder). It helps to gain familiarity with tensor2tensor before attempting to run our code. Specifically, setting --t2t_usr_dir=./Parameterization-of-Hypercomplex-Multiplications will allow tensor2tensor to register PHM-Transformers.

Training

For example, to evaluate PHM-Transformer (n=4) on the En-Vi machine translation task (with data generated by t2t-datagen --problem=translate_envi_iwslt32k), one may set the following flags when training:

t2t-trainer \
--problem=translate_envi_iwslt32k \
--model=light_transformer \
--hparams_set=light_transformer_base_single_gpu \
--hparams="light_mode='random',hidden_size=512,factor=4" \
--train_steps=50000

where light_transformer with light_mode='random' is the alias of the PHM-Transformer in our implementation.

Aggregating Checkpoints

After training, the latest 8 checkpoints are averaged:

t2t-avg-all --model_dir $TRAIN_DIR --output_dir $AVG_DIR --n 8

where $TRAIN_DIR and $AVG_DIR need to be specified by users.
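
As a conceptual illustration of what checkpoint averaging computes, the sketch below takes the element-wise mean of each variable across several checkpoints. It assumes each checkpoint has already been loaded into a plain dictionary of NumPy arrays; the function name average_checkpoints and the dictionary layout are hypothetical and are not part of tensor2tensor, which reads and writes TensorFlow checkpoint files directly.

```python
import numpy as np


def average_checkpoints(checkpoints):
    """Element-wise mean of each variable over a list of checkpoints.

    `checkpoints` is a list of dicts mapping variable names to NumPy
    arrays (a hypothetical in-memory stand-in for checkpoint files).
    All checkpoints are assumed to share the same names and shapes.
    """
    names = checkpoints[0].keys()
    return {
        name: np.mean([ckpt[name] for ckpt in checkpoints], axis=0)
        for name in names
    }


# Two toy "checkpoints" with a single weight vector each.
ckpts = [
    {"w": np.array([1.0, 3.0])},
    {"w": np.array([3.0, 5.0])},
]
averaged = average_checkpoints(ckpts)
print(averaged["w"])
```

With --n 8, the same idea would apply to the 8 most recent checkpoints in $TRAIN_DIR.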

Testing

To decode the target sequence, one has to additionally set the decode_hparams as follows:

t2t-decoder \
--decode_hparams="beam_size=5,alpha=0.6"

Then t2t-bleu is invoked to calculate the BLEU score.

PHM Implementations

PHM is implemented with operations in make_random_mul and random_ffn, which are mathematically equivalent to a sum of Kronecker products.
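
To make the sum-of-Kronecker-products view concrete, here is a minimal NumPy sketch, not the code in make_random_mul or random_ffn. It assumes the formulation from the paper, where a k x d weight matrix is built as H = sum_i kron(A_i, S_i) with n matrices A_i of shape n x n and n matrices S_i of shape (k/n) x (d/n). The function names phm_weight, phm_weight_einsum, and phm_param_count are hypothetical and chosen only for this illustration.

```python
import numpy as np


def phm_weight(A, S):
    """Build H = sum_i kron(A[i], S[i]).

    A has shape (n, n, n); S has shape (n, k // n, d // n).
    The result has shape (k, d).
    """
    return sum(np.kron(A[i], S[i]) for i in range(A.shape[0]))


def phm_weight_einsum(A, S):
    """Same matrix as phm_weight, without explicit Kronecker products."""
    n, rows, cols = S.shape
    # Entry [p, r, q, c] is sum_i A[i, p, q] * S[i, r, c]; flattening
    # (p, r) into rows and (q, c) into columns matches np.kron's layout.
    blocks = np.einsum("ipq,irc->prqc", A, S)
    return blocks.reshape(n * rows, n * cols)


def phm_param_count(n, k, d):
    """Parameters in A (n^3) plus parameters in S (k * d / n)."""
    return n**3 + (k * d) // n


rng = np.random.default_rng(0)
n, k, d = 4, 8, 12
A = rng.standard_normal((n, n, n))
S = rng.standard_normal((n, k // n, d // n))

H = phm_weight(A, S)
x = rng.standard_normal((2, d))   # a batch of two input vectors
y = x @ H.T                       # PHM layer output (bias omitted)

print(H.shape, y.shape)
print(np.allclose(H, phm_weight_einsum(A, S)))
print(phm_param_count(4, 512, 512), 512 * 512)
```

For n=4 and k=d=512, the count is 4^3 + 512*512/4 = 65,600 parameters, against 262,144 for a fully-connected layer of the same size, which is roughly the 1/n ratio in the paper title.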

Among works that use PHM, some have offered alternative PHM implementations:

Citation

If you find this repository helpful, please cite our paper:

@inproceedings{zhang2021beyond,
  title={Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with $1/n$ Parameters},
  author={Zhang, Aston and Tay, Yi and Zhang, Shuai and Chan, Alvin and Luu, Anh Tuan and Hui, Siu Cheung and Fu, Jie},
  booktitle={International Conference on Learning Representations},
  year={2021}
}

Owner

Aston Zhang
