TensorFlow implementation of PHM (Parameterization of Hypercomplex Multiplication)

Overview

Parameterization of Hypercomplex Multiplications (PHM)

This repository contains the TensorFlow implementation of PHM (Parameterization of Hypercomplex Multiplication) layers and PHM-Transformers in the paper Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters at ICLR 2021.

Installation

One may install the following libraries before running our code:

Usage

The usage of this repository follows the original tensor2tensor repository (e.g., t2t-datagen, t2t-trainer, t2t-avg-all, followed by t2t-decoder). It helps to gain familiarity on tensor2tensor before attempting to run our code. Specifically, setting --t2t_usr_dir=./Parameterization-of-Hypercomplex-Multiplications will allow tensor2tensor to register PHM-Transformers.

Training

For example, to evaluate PHM-Transformer (n=4) on the En-Vi machine translation task (t2t-datagen --problem=translate_envi_iwslt32k), one may set the following flags when training:

t2t-trainer \
--problem=translate_envi_iwslt32k \
--model=light_transformer \
--hparams_set=light_transformer_base_single_gpu \
--hparams="light_mode='random',hidden_size=512,factor=4" \
--train_steps=50000

where light_transformer with light_mode='random' is the alias of the PHM-Transformer in our implementation.

Aggretating Checkpoints

After training, the latest 8 checkpoints are averaged:

t2t-avg-all --model_dir $TRAIN_DIR --output_dir $AVG_DIR --n 8

where $TRAIN_DIR and $AVG_DIR need to be specified by users.

Testing

To decode the target sequence, one has to additionally set the decode_hparams as follows:

t2t-decoder \
--decode_hparams="beam_size=5,alpha=0.6"

Then t2t-bleu is invoked for calculating the BLEU.

PHM Implementations

PHM is implemented with operations in make_random_mul and random_ffn, which are mathematically equivalent to sum of Kronecker products.

Among works that use PHM, some have offered alternative PHM implementations:

Citation

If you find this repository helpful, please cite our paper:

@inproceedings{zhang2021beyond,
  title={Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with $1/n$ Parameters},
  author={Zhang, Aston and Tay, Yi and Zhang, Shuai and Chan, Alvin and Luu, Anh Tuan and Hui, ‪Siu Cheung and Fu, Jie},
  booktitle={International Conference on Learning Representations},
  year={2021}
}
Owner
Aston Zhang
Dive into Deep Learning: D2L.ai 《动手学深度学习》: zh.D2L.ai
Aston Zhang
FastReID is a research platform that implements state-of-the-art re-identification algorithms.

FastReID is a research platform that implements state-of-the-art re-identification algorithms.

JDAI-CV 2.8k Jan 07, 2023
Deep Learning for Time Series Forecasting.

nixtlats:Deep Learning for Time Series Forecasting [nikstla] (noun, nahuatl) Period of time. State-of-the-art time series forecasting for pytorch. Nix

Nixtla 5 Dec 06, 2022
A simple code to perform canny edge contrast detection on images.

CECED-Canny-Edge-Contrast-Enhanced-Detection A simple code to perform canny edge contrast detection on images. A simple code to process images using c

Happy N. Monday 3 Feb 15, 2022
Pytorch implementation of our paper under review -- 1xN Pattern for Pruning Convolutional Neural Networks

1xN Pattern for Pruning Convolutional Neural Networks (paper) . This is Pytorch re-implementation of "1xN Pattern for Pruning Convolutional Neural Net

Mingbao Lin (林明宝) 29 Nov 29, 2022
DTCN IJCAI - Sequential prediction learning framework and algorithm

DTCN This is the implementation of our paper "Sequential Prediction of Social Me

Bobby 2 Jan 24, 2022
利用yolov5和TensorRT从0到1实现目标检测的模型训练到模型部署全过程

写在前面 利用TensorRT加速推理速度是以时间换取精度的做法,意味着在推理速度上升的同时将会有精度的下降,不过不用太担心,精度下降微乎其微。此外,要有NVIDIA显卡,经测试,CUDA10.2可以支持20系列显卡及以下,30系列显卡需要CUDA11.x的支持,并且目前有bug。 默认你已经完成了

Helium 6 Jul 28, 2022
Libtorch yolov3 deepsort

Overview It is for my undergrad thesis in Tsinghua University. There are four modules in the project: Detection: YOLOv3 Tracking: SORT and DeepSORT Pr

Xu Wei 226 Dec 13, 2022
SIEM Logstash parsing for more than hundred technologies

LogIndexer Pipeline Logstash Parsing Configurations for Elastisearch SIEM and OpenDistro for Elasticsearch SIEM Why this project exists The overhead o

146 Dec 29, 2022
A repository for generating stylized talking 3D and 3D face

style_avatar A repository for generating stylized talking 3D faces and 2D videos. This is the repository for paper Imitating Arbitrary Talking Style f

Haozhe Wu 191 Dec 22, 2022
Using this codebase as a tool for my own research. Making some modifications to the original repo for my own purposes.

For SwapNet Create a list.txt file containing all the images to process. This can be done with the GNU find command: find path/to/input/folder -name '

Andrew Jong 2 Nov 10, 2021
FairEdit: Preserving Fairness in Graph Neural Networks through Greedy Graph Editing

FairEdit Relevent Publication FairEdit: Preserving Fairness in Graph Neural Networks through Greedy Graph Editing

5 Feb 04, 2022
Iterative Normalization: Beyond Standardization towards Efficient Whitening

IterNorm Code for reproducing the results in the following paper: Iterative Normalization: Beyond Standardization towards Efficient Whitening Lei Huan

Lei Huang 21 Dec 27, 2022
Context Decoupling Augmentation for Weakly Supervised Semantic Segmentation

Context Decoupling Augmentation for Weakly Supervised Semantic Segmentation The code of: Context Decoupling Augmentation for Weakly Supervised Semanti

54 Dec 12, 2022
Distance correlation and related E-statistics in Python

dcor dcor: distance correlation and related E-statistics in Python. E-statistics are functions of distances between statistical observations in metric

Carlos Ramos Carreño 108 Dec 27, 2022
[ICLR 2021 Spotlight Oral] "Undistillable: Making A Nasty Teacher That CANNOT teach students", Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang

Undistillable: Making A Nasty Teacher That CANNOT teach students "Undistillable: Making A Nasty Teacher That CANNOT teach students" Haoyu Ma, Tianlong

VITA 71 Dec 28, 2022
This is code of book "Learn Deep Learning with PyTorch"

深度学习入门之PyTorch Learn Deep Learning with PyTorch 非常感谢您能够购买此书,这个github repository包含有深度学习入门之PyTorch的实例代码。由于本人水平有限,在写此书的时候参考了一些网上的资料,在这里对他们表示敬意。由于深度学习的技术在

Xingyu Liao 2.5k Jan 04, 2023
A containerized REST API around OpenAI's CLIP model.

OpenAI's CLIP — REST API This is a container wrapping OpenAI's CLIP model in a RESTful interface. Running the container locally First, build the conta

Santiago Valdarrama 48 Nov 06, 2022
Source code of AAAI 2022 paper "Towards End-to-End Image Compression and Analysis with Transformers".

Towards End-to-End Image Compression and Analysis with Transformers Source code of our AAAI 2022 paper "Towards End-to-End Image Compression and Analy

37 Dec 21, 2022
Deep-Learning-Book-Chapter-Summaries - Attempting to make the Deep Learning Book easier to understand.

Deep-Learning-Book-Chapter-Summaries This repository provides a summary for each chapter of the Deep Learning book by Ian Goodfellow, Yoshua Bengio an

Aman Dalmia 1k Dec 27, 2022
pixelNeRF: Neural Radiance Fields from One or Few Images

pixelNeRF: Neural Radiance Fields from One or Few Images Alex Yu, Vickie Ye, Matthew Tancik, Angjoo Kanazawa UC Berkeley arXiv: http://arxiv.org/abs/2

Alex Yu 1k Jan 04, 2023