Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism

Last update: Sep 06, 2021

Overview

Period-alternatives-of-Softmax

Experimental Demo for our paper

'Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism'

We suggest that replacing the exponential function by periodic functions. Through experiments on a simply designed demo referenced to LeViT, our method is proved to be able to alleviate the gradient problem and yield substantial improvements compared to Softmax and its variants.

** Create your own 'dataset' fold, and maybe need to modify the demo.py file for your own dataset except for cifar-10, cifar-100 and Tiny-imageNet.

Function available:

softmax , norm_softmax
sinmax, norm_sinmax
cosmax, norm_cosmax
sin_2_max, norm_sin_2_max
sin_2_max_move, norm_sin_2_max_move
sirenmax, norm_sirenmax
sin_softmax, norm_sin_softmax

mode available:

search:
        Random search for a suitable set of learning rate and weight decay, and record the results in 
        Attention_test/*functions/lr_wd_search.txt
run:
        Train the demo, and there will be four .npy files created in root.
        (1) 'record_val_acc.npy' for val acc record every 100 iter;
        (2) 'record_train_acc.npy' for train acc record every batch;
        (3) 'record_loss.npy' for train loss record every batch;
        (4) 'kq_value.npy' for Q.K record *before sclaled*.
att_run:
        Same as the run mode but:
        (1) No kq_value record;
        (2) Every 5 epoch, input a test image and record the attention score map of each head of each layer.
            Saved in 'Attention_test/attention_maps.npy'

Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism

Related tags

Overview

Period-alternatives-of-Softmax

'Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism'

Function available:

mode available:

Owner

slwang9353

Pytorch implementation of CoCon: A Self-Supervised Approach for Controlled Text Generation

A script depending on VASP output for calculating Fermi-Softness.

This is the code for HOI Transformer

Implementation of the paper All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training

SysWhispers Shellcode Loader

The official PyTorch implementation for the paper "sMGC: A Complex-Valued Graph Convolutional Network via Magnetic Laplacian for Directed Graphs".

🔮 Execution time predictions for deep neural network training iterations across different GPUs.

MCMC samplers for Bayesian estimation in Python, including Metropolis-Hastings, NUTS, and Slice

PyTorch implementation of "Supervised Contrastive Learning" (and SimCLR incidentally)

This is the code of paper ``Contrastive Coding for Active Learning under Class Distribution Mismatch'' with python.

This repository contains the PyTorch implementation of the paper STaCK: Sentence Ordering with Temporal Commonsense Knowledge appearing at EMNLP 2021.

Teaches a student network from the knowledge obtained via training of a larger teacher network

Sentiment analysis translations of the Bhagavad Gita

Cross View SLAM

Command-line tool for downloading and extending the RedCaps dataset.

Second Order Optimization and Curvature Estimation with K-FAC in JAX.

Generate Contextual Directory Wordlist For Target Org

This is the official implementation for "Do Transformers Really Perform Bad for Graph Representation?".

Code for "Single-view robot pose and joint angle estimation via render & compare", CVPR 2021 (Oral).

This is the official implementation code repository of Underwater Light Field Retention : Neural Rendering for Underwater Imaging (Accepted by CVPR Workshop2022 NTIRE)