Graph neural network message passing reframed as a Transformer with local attention

Last update: Dec 28, 2022

Overview

Adjacent Attention Network

An implementation of a simple transformer that is equivalent to graph neural network where the message passing is done with multi-head attention at each successive layer. Since Graph Attention Network is already taken, I decided to name it Adjacent Attention Network instead. The design will be more transformer-centric. Instead of using the square root inverse adjacency matrix trick by Kipf and Welling, in this framework it will simply be translated to the proper attention mask at each layer.

This repository is for my own exploration into the graph neural network field. My gut tells me the transformers architecture can generalize and outperform graph neural networks.

Install

$ pip install adjacent-attention-network

Usage

Basically a transformers where each node pays attention to the neighbors as defined by the adjacency matrix. Complexity is O(n * max_neighbors). Max number of neighbors as defined by the adjacency matrix.

The following example will have a complexity of ~ 1024 * 100

import torch
from adjacent_attention_network import AdjacentAttentionNetwork

model = AdjacentAttentionNetwork(
    dim = 512,
    depth = 6,
    heads = 4
)

adj_mat = torch.empty(1, 1024, 1024).uniform_(0, 1) < 0.1
nodes   = torch.randn(1, 1024, 512)
mask    = torch.ones(1, 1024).bool()

model(nodes, adj_mat, mask = mask) # (1, 1024, 512)

If the number of neighbors contain outliers, then the above will lead to wasteful computation, since a lot of nodes will be doing attention on padding. You can use the following stop-gap measure to account for these outliers.

import torch
from adjacent_attention_network import AdjacentAttentionNetwork

model = AdjacentAttentionNetwork(
    dim = 512,
    depth = 6,
    heads = 4,
    num_neighbors_cutoff = 100
).cuda()

adj_mat = torch.empty(1, 1024, 1024).uniform_(0, 1).cuda() < 0.1
nodes   = torch.randn(1, 1024, 512).cuda()
mask    = torch.ones(1, 1024).bool().cuda()

# for some reason, one of the nodes is fully connected to all others
adj_mat[:, 0] = 1.

model(nodes, adj_mat, mask = mask) # (1, 1024, 512)

For non-local attention, I've decided to use a trick from the Set Transformers paper, the Induced Set Attention Block (ISAB). From the lens of graph neural net literature, this would be analogous as having global nodes for message passing non-locally.

import torch
from adjacent_attention_network import AdjacentAttentionNetwork

model = AdjacentAttentionNetwork(
    dim = 512,
    depth = 6,
    heads = 4,
    num_global_nodes = 5
).cuda()

adj_mat = torch.empty(1, 1024, 1024).uniform_(0, 1).cuda() < 0.1
nodes   = torch.randn(1, 1024, 512).cuda()
mask    = torch.ones(1, 1024).bool().cuda()

model(nodes, adj_mat, mask = mask) # (1, 1024, 512)

This is an open-source toolkit for Heterogeneous Graph Neural Network(OpenHGNN) based on DGL [Deep Graph Library] and PyTorch.

519 Jan 2, 2023

Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions.

Episodic Transformers (E.T.) Episodic Transformer for Vision-and-Language Navigation Alexander Pashevich, Cordelia Schmid, Chen Sun Episodic Transform

62 Dec 24, 2022

PyTorch code for our paper "Attention in Attention Network for Image Super-Resolution"

Under construction... Attention in Attention Network for Image Super-Resolution (A2N) This repository is an PyTorch implementation of the paper "Atten

71 Dec 30, 2022

Official Implementation of "LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks"

LUNAR Official Implementation of "LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks" Adam Goodge, Bryan Hooi, Ng See Kiong and

25 Dec 28, 2022

FIRA: Fine-Grained Graph-Based Code Change Representation for Automated Commit Message Generation

FIRA is a learning-based commit message generation approach, which first represents code changes via fine-grained graphs and then learns to generate commit messages automatically.

21 Dec 30, 2022

VSR-Transformer - This paper proposes a new Transformer for video super-resolution (called VSR-Transformer).

VSR-Transformer By Jiezhang Cao, Yawei Li, Kai Zhang, Luc Van Gool This paper proposes a new Transformer for video super-resolution (called VSR-Transf

225 Nov 13, 2022

Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.

PyTorch Implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers 1 Using Colab Please notic

489 Jan 7, 2023

Pytorch code for paper "Image Compressed Sensing Using Non-local Neural Network" TMM 2021.

NL-CSNet-Pytorch Pytorch code for paper "Image Compressed Sensing Using Non-local Neural Network" TMM 2021. Note: this repo only shows the strategy of

7 Nov 7, 2022

Losslandscapetaxonomy - Taxonomizing local versus global structure in neural network loss landscapes

Taxonomizing local versus global structure in neural network loss landscapes Int

8 Dec 30, 2022

Graph neural network message passing reframed as a Transformer with local attention

Related tags

Overview

Adjacent Attention Network

Install

Usage

You might also like...

This is an open-source toolkit for Heterogeneous Graph Neural Network(OpenHGNN) based on DGL [Deep Graph Library] and PyTorch.

Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions.

PyTorch code for our paper "Attention in Attention Network for Image Super-Resolution"

Official Implementation of "LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks"

FIRA: Fine-Grained Graph-Based Code Change Representation for Automated Commit Message Generation

VSR-Transformer - This paper proposes a new Transformer for video super-resolution (called VSR-Transformer).

Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.

Pytorch code for paper "Image Compressed Sensing Using Non-local Neural Network" TMM 2021.

Losslandscapetaxonomy - Taxonomizing local versus global structure in neural network loss landscapes

Releases(0.0.12)

0.0.12(Dec 24, 2022)

0.0.11(Dec 14, 2020)

0.0.10(Dec 14, 2020)

0.0.9(Dec 14, 2020)

0.0.8(Dec 14, 2020)

0.0.7(Dec 14, 2020)

0.0.6(Dec 14, 2020)

0.0.5(Dec 14, 2020)

0.0.4(Dec 14, 2020)

0.0.3(Dec 14, 2020)

0.0.2(Dec 14, 2020)

0.0.1(Dec 14, 2020)

Owner

Phil Wang

An open-access benchmark and toolbox for electricity price forecasting

PyTorch implementation for Score-Based Generative Modeling through Stochastic Differential Equations (ICLR 2021, Oral)

Implementation of Barlow Twins paper

A simple and lightweight genetic algorithm for optimization of any machine learning model

Minimal PyTorch implementation of Generative Latent Optimization from the paper "Optimizing the Latent Space of Generative Networks"

PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

This repo contains the pytorch implementation for Dynamic Concept Learner (accepted by ICLR 2021).

Code for "Offline Meta-Reinforcement Learning with Advantage Weighting" [ICML 2021]

Audio-Visual Generalized Few-Shot Learning with Prototype-Based Co-Adaptation

Reproducing-BowNet: Learning Representations by Predicting Bags of Visual Words

Backend code to use MCPI's python API to make infinite worlds with custom generation

Official PyTorch Implementation of "AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting".

Code base for the paper "Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation"

A Dynamic Residual Self-Attention Network for Lightweight Single Image Super-Resolution

NICE-GAN — Official PyTorch Implementation Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation

Source code to accompany Defunctland's video "FASTPASS: A Complicated Legacy"

DM-ACME compatible implementation of the Arm26 environment from Mujoco

DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

Object Detection and Multi-Object Tracking

Focal Loss for Dense Rotation Object Detection