Norm-based Analysis of Transformer

Implementations for 2 papers introducing to analyze Transformers using vector norms:

Kobayashi+'20 Attention is Not Only a Weight: Analyzing Transformers with Vector Norms (EMNLP 2020)
Kobayashi+'21 Incorporating Residual and Normalization Layers into Analysis of Masked Language Models (EMNLP 2021)

Kobayashi+'20 Attention is Not Only a Weight: Analyzing Transformers with Vector Norms (EMNLP 2020)

This paper proposed to analyze attention, a core component of Transformer, using vector norms rather than attention weights.
Transformer analyses have been focused on mixing in attention and have typically observed attention weights.
However, in addition to attention weights, there are more factors to determine attention's outputs: the input vector itself and vector transformations.
Then, this paper proposed to analyze attention using vector norms considering them.
→ Check this paper's code: Code for emnlp2020.

Kobayashi+'21 Incorporating Residual and Normalization Layers into Analysis of Masked Language Models (EMNLP 2021)

This paper proposed to analyze attention block (i.e., attention, residual connection, and layer normalization) using vector norms.
Transformer analyses have been focused on mixing in attention.
However, there are components other than attention in Transformer, and they can play a role other than mixing.
Then, this paper proposed to expand the scope of Transformer analysis from attention into attention block.
→ Check this paper's code: Code for emnlp2021.

Citation

If you use our code for academic work, please cite:

@inproceedings{kobayashi-etal-2020-attention,  
   title = {Attention is Not Only a Weight: Analyzing Transformers with Vector Norms},  
   author = {Kobayashi, Goro and Kuribayashi, Tatsuki and Yokoi, Sho and Inui, Kentaro},  
   booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},  
   year = "2020",  
   url = "https://www.aclweb.org/anthology/2020.emnlp-main.574",  
   pages = "7057--7075",  
}
@inproceedings{kobayashi-etal-2021-incorporating,
   title = {Incorporating Residual and Normalization Layers into Analysis of Masked Language Models},
   author = {Kobayashi, Goro and Kuribayashi, Tatsuki and Yokoi, Sho and Inui, Kentaro},
   booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Proceeding (EMNLP)},
   year = "2021",
   url = "https://arxiv.org/abs/2109.07152",
   pages = "to appear",
}

Norm-based Analysis of Transformer

Related tags

Overview

Norm-based Analysis of Transformer

Kobayashi+'20 Attention is Not Only a Weight: Analyzing Transformers with Vector Norms (EMNLP 2020)

Kobayashi+'21 Incorporating Residual and Normalization Layers into Analysis of Masked Language Models (EMNLP 2021)

Citation

Owner

Goro Kobayashi

Code implementation from my Medium blog post: [Transformers from Scratch in PyTorch]

Experimenting with computer vision techniques to generate annotated image datasets from gameplay recordings automatically.

Wider or Deeper: Revisiting the ResNet Model for Visual Recognition

Fluency ENhanced Sentence-bert Evaluation (FENSE), metric for audio caption evaluation. And Benchmark dataset AudioCaps-Eval, Clotho-Eval.

curl-impersonate: A special compilation of curl that makes it impersonate Chrome & Firefox

Semi-Supervised Graph Prototypical Networks for Hyperspectral Image Classification, IGARSS, 2021.

for a paper about leveraging discourse markers for training new models

MGFN: Multi-Graph Fusion Networks for Urban Region Embedding was accepted by IJCAI-2022.

An OpenAI-Gym Package for Training and Testing Reinforcement Learning algorithms with OpenSim Models

Hand-distance-measurement-game - Hand Distance Measurement Game

A fast poisson image editing implementation that can utilize multi-core CPU or GPU to handle a high-resolution image input.

GAN Image Generator and Characterwise Image Recognizer with python

The Incredible PyTorch: a curated list of tutorials, papers, projects, communities and more relating to PyTorch.

An index of algorithms for learning causality with data

Dynamic hair modeling from monocular videos using deep neural networks

Code and Data for NeurIPS2021 Paper "A Dataset for Answering Time-Sensitive Questions"

SGPT: Multi-billion parameter models for semantic search

CasualHealthcare's Pneumonia detection with Artificial Intelligence (Convolutional Neural Network)

Pytorch implementation of paper "Efficient Nearest Neighbor Language Models" (EMNLP 2021)

LightLog is an open source deep learning based lightweight log analysis tool for log anomaly detection.