Bootstrapped Representation Learning on Graphs

Related tags

Deep Learningbgrl
Overview

Bootstrapped Representation Learning on Graphs

Overview of BGRL

This is the PyTorch implementation of BGRL Bootstrapped Representation Learning on Graphs

The main scripts are train_transductive.py and train_ppi.py used for training on the transductive task datasets and the PPI dataset respectively.

For linear evaluation, using the checkpoints we provide

Setup

To set up a Python virtual environment with the required dependencies, run:

python3 -m venv bgrl_env
source bgrl_env/bin/activate
pip install --upgrade pip

Follow instructions to install PyTorch 1.9.1 and PyG:

pip install torch==1.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.9.0+cu111.html
pip install absl-py==0.12.0 tensorboard==2.6.0 ogb

The code uses PyG (PyTorch Geometric). All datasets are available through this package.

Experiments on transductive tasks

Train model from scratch

To run BGRL on a dataset from the transductive setting, use train_transductive.py and one of the configuration files that can be found in config/.

For example, to train on the Coauthor-CS dataset, use the following command:

python3 train_transductive.py --flagfile=config/coauthor-cs.cfg

Flags can be overwritten:

python3 train_transductive.py --flagfile=config/coauthor-cs.cfg\
                              --logdir=./runs/coauthor-cs-256\
                              --predictor_hidden_size=256

Evaluation is performed periodically during training. We fit a logistic regression model on top of the representation to assess its performance throughout training. Evaluation is triggered every eval_epochsand will not back-propagate any gradient to the encoder.

Test accuracies under linear evaluation are reported on TensorBoard. To start the tensorboard server run the following command:

tensorboard --logdir=./runs

Perform linear evaluation using the provided model weights

The configuration files we provide allow to reproduce the results in the paper, summarized in the table below. We also provide weights of the BGRL-trained encoders for each dataset.

WikiCS Amazon Computers Amazon Photos CoauthorCS CoauthorPhy
BGRL 79.98 ± 0.10
(weights)
90.34 ± 0.19
(weights)
93.17 ± 0.30
(weights)
93.31 ± 0.13
(weights)
95.73 ± 0.05
(weights)

To run linear evaluation, using the provided weights, run the following command for any of the datasets:

python3 linear_eval_transductive.py --flagfile=config-eval/coauthor-cs.cfg

Note that the dataset is split randomly between train/val/test, so the reported accuracy might be slightly different with each run. In our reported table, we average across multiple splits, as well as multiple randomly initialized network weights.

Experiments on inductive task with multiple graphs

To train on the PPI dataset, use train_ppi.py:

python3 train_ppi.py --flagfile=config/ppi.cfg

The evaluation for PPI is different due to the size of the dataset, we evaluate by training a linear layer on top of the representations via gradient descent for 100 steps.

The configuration files for the different architectures can be found in config/. We provide weights of the BGRL-trained encoder as well.

PPI
BGRL 69.41 ± 0.15 (weights)

To run linear evaluation, using the provided weights, run the following command:

python3 linear_eval_ppi.py --flagfile=config-eval/ppi.cfg

Note that our reported score is based on an average over multiple runs.

Citation

If you find the code useful for your research, please consider citing our work:

@misc{thakoor2021bootstrapped,
     title={Large-Scale Representation Learning on Graphs via Bootstrapping}, 
     author={Shantanu Thakoor and Corentin Tallec and Mohammad Gheshlaghi Azar and Mehdi Azabou and Eva L. Dyer and Rémi Munos and Petar Veličković and Michal Valko},
     year={2021},
     eprint={2102.06514},
     archivePrefix={arXiv},
     primaryClass={cs.LG}}
Owner
NerDS Lab :: Neural Data Science Lab
machine learning and neuroscience
NerDS Lab :: Neural Data Science Lab
Time Series Cross-Validation -- an extension for scikit-learn

TSCV: Time Series Cross-Validation This repository is a scikit-learn extension for time series cross-validation. It introduces gaps between the traini

Wenjie Zheng 222 Jan 01, 2023
A Python package for faster, safer, and simpler ML processes

Bender 🤖 A Python package for faster, safer, and simpler ML processes. Why use bender? Bender will make your machine learning processes, faster, safe

Otovo 6 Dec 13, 2022
Image super-resolution (SR) is a fast-moving field with novel architectures attracting the spotlight

Revisiting RCAN: Improved Training for Image Super-Resolution Introduction Image super-resolution (SR) is a fast-moving field with novel architectures

Zudi Lin 76 Dec 01, 2022
Convnext-tf - Unofficial tensorflow keras implementation of ConvNeXt

ConvNeXt Tensorflow This is unofficial tensorflow keras implementation of ConvNe

29 Oct 06, 2022
my graduation project is about live human face augmentation by projection mapping by using CNN

Live-human-face-expression-augmentation-by-projection my graduation project is about live human face augmentation by projection mapping by using CNN o

1 Mar 08, 2022
A PyTorch implementation for V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation

A PyTorch implementation of V-Net Vnet is a PyTorch implementation of the paper V-Net: Fully Convolutional Neural Networks for Volumetric Medical Imag

Matthew Macy 606 Dec 21, 2022
Extremely simple and fast extreme multi-class and multi-label classifiers.

napkinXC napkinXC is an extremely simple and fast library for extreme multi-class and multi-label classification, that focus of implementing various m

Marek Wydmuch 43 Nov 14, 2022
Learned model to estimate number of distinct values (NDV) of a population using a small sample.

Learned NDV estimator Learned model to estimate number of distinct values (NDV) of a population using a small sample. The model approximates the maxim

2 Nov 21, 2022
GoodNews Everyone! Context driven entity aware captioning for news images

This is the code for a CVPR 2019 paper, called GoodNews Everyone! Context driven entity aware captioning for news images. Enjoy! Model preview: Huge T

117 Dec 19, 2022
Deep Two-View Structure-from-Motion Revisited

Deep Two-View Structure-from-Motion Revisited This repository provides the code for our CVPR 2021 paper Deep Two-View Structure-from-Motion Revisited.

Jianyuan Wang 145 Jan 06, 2023
ColBERT: Contextualized Late Interaction over BERT (SIGIR'20)

Update: if you're looking for ColBERTv2 code, you can find it alongside a new simpler API, in the branch new_api. ColBERT ColBERT is a fast and accura

Stanford Future Data Systems 637 Jan 08, 2023
A (PyTorch) imbalanced dataset sampler for oversampling low frequent classes and undersampling high frequent ones.

Imbalanced Dataset Sampler Introduction In many machine learning applications, we often come across datasets where some types of data may be seen more

Ming 2k Jan 08, 2023
Lab course materials for IEMBA 8/9 course "Coding and Artificial Intelligence"

IEMBA 8/9 - Coding and Artificial Intelligence Dear IEMBA 8/9 students, welcome to our IEMBA 8/9 elective course Coding and Artificial Intelligence, t

Artificial Intelligence & Machine Learning (AI:ML Lab) @ HSG 1 Jan 11, 2022
BlueFog Tutorials

BlueFog Tutorials Welcome to the BlueFog tutorials! In this repository, we've put together a collection of awesome Jupyter notebooks. These notebooks

4 Oct 27, 2021
Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples

Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples This repository is the official implementation of paper [Qimera: Data-free Q

Kanghyun Choi 21 Nov 03, 2022
MediaPipe is a an open-source framework from Google for building multimodal

MediaPipe is a an open-source framework from Google for building multimodal (eg. video, audio, any time series data), cross platform (i.e Android, iOS, web, edge devices) applied ML pipelines. It is

Bhavishya Pandit 3 Sep 30, 2022
PyTorch implementations of Top-N recommendation, collaborative filtering recommenders.

PyTorch implementations of Top-N recommendation, collaborative filtering recommenders.

Yoonki Jeong 129 Dec 22, 2022
PyTorch Implementation of Small Lesion Segmentation in Brain MRIs with Subpixel Embedding (ORAL, MICCAIW 2021)

Small Lesion Segmentation in Brain MRIs with Subpixel Embedding PyTorch implementation of Small Lesion Segmentation in Brain MRIs with Subpixel Embedd

22 Oct 21, 2022
UPSNet: A Unified Panoptic Segmentation Network

UPSNet: A Unified Panoptic Segmentation Network Introduction UPSNet is initially described in a CVPR 2019 oral paper. Disclaimer This repository is te

Uber Research 622 Dec 26, 2022
a pytorch implementation of auto-punctuation learned character by character

Learning Auto-Punctuation by Reading Engadget Articles Link to Other of my work 🌟 Deep Learning Notes: A collection of my notes going from basic mult

Ge Yang 137 Nov 09, 2022