[NeurIPS-2021] Slow Learning and Fast Inference: Efficient Graph Similarity Computation via Knowledge Distillation

Overview

Efficient Graph Similarity Computation - (EGSC)

This repo contains the source code and dataset for our paper:

Slow Learning and Fast Inference: Efficient Graph Similarity Computation via Knowledge Distillation
2021 Conference on Neural Information Processing Systems (NeurIPS 2021)
[Paper]

@inproceedings{qin2021slow,
              title={Slow Learning and Fast Inference: Efficient Graph Similarity Computation via Knowledge Distillation},
              author={Qin, Can and Zhao, Handong and Wang, Lichen and Wang, Huan and Zhang, Yulun and Fu, Yun},
              booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
              year={2021}
            }
    

EGSC Illustration of knowledge distillation to achieve a fast model given a early-fusion model. Such the fast/individual model enables the online inference.

Introduction

Graph Similarity Computation (GSC) is essential to wide-ranging graph appli- cations such as retrieval, plagiarism/anomaly detection, etc. The exact computation of graph similarity, e.g., Graph Edit Distance (GED), is an NP-hard problem that cannot be exactly solved within an adequate time given large graphs. Thanks to the strong representation power of graph neural network (GNN), a variety of GNN-based inexact methods emerged. To capture the subtle difference across graphs, the key success is designing the dense interaction with features fusion at the early stage, which, however, is a trade-off between speed and accuracy. For Slow Learning of graph similarity, this paper proposes a novel early-fusion approach by designing a co-attention-based feature fusion network on multilevel GNN features. To further improve the speed without much accuracy drop, we introduce an efficient GSC solution by distilling the knowledge from the slow early-fusion model to the student one for Fast Inference. Such a student model also enables the offline collection of individual graph embeddings, speeding up the inference time in orders. To address the instability through knowledge transfer, we decompose the dynamic joint embedding into the static pseudo individual ones for precise teacher-student alignment. The experimental analysis on the real-world datasets demonstrates the superiority of our approach over the state-of-the-art methods on both accuracy and efficiency. Particularly, we speed up the prior art by more than 10x on the benchmark AIDS data.

Dataset

We have used the standard dataloader, i.e., ‘GEDDataset’, directly provided in the PyG, whose downloading link can be referred below.

AIDS700nef

LINUX

ALKANE

IMDBMulti

The code takes pairs of graphs for training from an input folder where each pair of graph is stored as a JSON. Pairs of graphs used for testing are also stored as JSON files. Every node id and node label has to be indexed from 0. Keys of dictionaries are stored strings in order to make JSON serialization possible.

Every JSON file has the following key-value structure:

{"graph_1": [[0, 1], [1, 2], [2, 3], [3, 4]],
 "graph_2":  [[0, 1], [1, 2], [1, 3], [3, 4], [2, 4]],
 "labels_1": [2, 2, 2, 2],
 "labels_2": [2, 3, 2, 2, 2],
 "ged": 1}

The **graph_1** and **graph_2** keys have edge list values which descibe the connectivity structure. Similarly, the **labels_1** and **labels_2** keys have labels for each node which are stored as list - positions in the list correspond to node identifiers. The **ged** key has an integer value which is the raw graph edit distance for the pair of graphs.

Requirements

The codebase is implemented in Python 3.6.12. package versions used for development are just below.

matplotlib        3.3.4
networkx          2.4
numpy             1.19.5
pandas            1.1.2
scikit-learn      0.23.2
scipy             1.4.1
texttable         1.6.3
torch             1.6.0
torch-cluster     1.5.9
torch-geometric   1.7.0
torch-scatter     2.0.6
torch-sparse      0.6.9
tqdm              4.60.0

The installation of pytorch-geometric (PyG) please refers to its official tutorial.

File Structure

.
├── README.md
├── LICENSE                            
├── EGSC-T
│   ├── src
│   │    ├── egsc.py 
│   │    ├── layers.py
│   │    ├── main.py
│   │    ├── parser.py        
│   │    └── utils.py                             
│   ├── README.md                      
│   └── train.sh                        
└── GSC_datasets
    ├── AIDS700nef
    ├── ALKANE
    ├── IMDBMulti
    └── LINUX

To Do

- [x] GED Datasets Processing
- [x] Teacher Model Training
- [ ] Student Model Training
- [ ] Knowledge Distillation
- [ ] Online Inference

The remaining implementations are coming soon.

Acknowledgement

We would like to thank the SimGNN and Extended-SimGNN which we used for this implementation.

Owner
PhD student in Northeastern University, Boston, USA
A curated list of awesome projects and resources related fastai

A curated list of awesome projects and resources related fastai

Tanishq Abraham 138 Dec 22, 2022
Exploring Image Deblurring via Blur Kernel Space (CVPR'21)

Exploring Image Deblurring via Encoded Blur Kernel Space About the project We introduce a method to encode the blur operators of an arbitrary dataset

VinAI Research 118 Dec 19, 2022
ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure

ViViT is a collection of numerical tricks to efficiently access curvature from the generalized Gauss-Newton (GGN) matrix based on its low-rank structure. Provided functionality includes computing

Felix Dangel 12 Dec 08, 2022
TYolov5: A Temporal Yolov5 Detector Based on Quasi-Recurrent Neural Networks for Real-Time Handgun Detection in Video

TYolov5: A Temporal Yolov5 Detector Based on Quasi-Recurrent Neural Networks for Real-Time Handgun Detection in Video Timely handgun detection is a cr

Mario Duran-Vega 18 Dec 26, 2022
A package for music online and offline rhythmic information analysis including music Beat, downbeat, tempo and meter tracking.

BeatNet A package for music online and offline rhythmic information analysis including music Beat, downbeat, tempo and meter tracking. This repository

Mojtaba Heydari 157 Dec 27, 2022
ComputerVision - This repository aims at realized easy network architecture

ComputerVision This repository aims at realized easy network architecture Colori

DongDong 4 Dec 14, 2022
Designing a Minimal Retrieve-and-Read System for Open-Domain Question Answering (NAACL 2021)

Designing a Minimal Retrieve-and-Read System for Open-Domain Question Answering Abstract In open-domain question answering (QA), retrieve-and-read mec

Clova AI Research 34 Apr 13, 2022
Sound and Cost-effective Fuzzing of Stripped Binaries by Incremental and Stochastic Rewriting

StochFuzz: A New Solution for Binary-only Fuzzing StochFuzz is a (probabilistically) sound and cost-effective fuzzing technique for stripped binaries.

Zhuo Zhang 164 Dec 05, 2022
VoxHRNet - Whole Brain Segmentation with Full Volume Neural Network

VoxHRNet This is the official implementation of the following paper: Whole Brain Segmentation with Full Volume Neural Network Yeshu Li, Jonathan Cui,

Microsoft 12 Nov 24, 2022
Deep Multimodal Neural Architecture Search

MMNas: Deep Multimodal Neural Architecture Search This repository corresponds to the PyTorch implementation of the MMnas for visual question answering

Vision and Language Group@ MIL 23 Dec 21, 2022
The FIRST GANs-based omics-to-omics translation framework

OmiTrans Please also have a look at our multi-omics multi-task DL freamwork 👀 : OmiEmbed The FIRST GANs-based omics-to-omics translation framework Xi

Xiaoyu Zhang 6 Dec 14, 2022
Multi-objective gym environments for reinforcement learning.

MO-Gym: Multi-Objective Reinforcement Learning Environments Gym environments for multi-objective reinforcement learning (MORL). The environments follo

Lucas Alegre 74 Jan 03, 2023
Source code of our BMVC 2021 paper: AniFormer: Data-driven 3D Animation with Transformer

AniFormer This is the PyTorch implementation of our BMVC 2021 paper AniFormer: Data-driven 3D Animation with Transformer. Haoyu Chen, Hao Tang, Nicu S

24 Nov 02, 2022
Official implementation of our paper "Learning to Bootstrap for Combating Label Noise"

Learning to Bootstrap for Combating Label Noise This repo is the official implementation of our paper "Learning to Bootstrap for Combating Label Noise

21 Apr 09, 2022
MAg: a simple learning-based patient-level aggregation method for detecting microsatellite instability from whole-slide images

MAg Paper Abstract File structure Dataset prepare Data description How to use MAg? Why not try the MAg_lib! Trained models Experiment and results Some

Calvin Pang 3 Apr 08, 2022
Code for the Active Speakers in Context Paper (CVPR2020)

Active Speakers in Context This repo contains the official code and models for the "Active Speakers in Context" CVPR 2020 paper. Before Training The c

43 Oct 14, 2022
the code of the paper: Recurrent Multi-view Alignment Network for Unsupervised Surface Registration (CVPR 2021)

RMA-Net This repo is the implementation of the paper: Recurrent Multi-view Alignment Network for Unsupervised Surface Registration (CVPR 2021). Paper

Wanquan Feng 205 Nov 09, 2022
Repo for the Video Person Clustering dataset, and code for the associated paper

Video Person Clustering Repo for the Video Person Clustering dataset, and code for the associated paper. This reporsitory contains the Video Person Cl

Andrew Brown 47 Nov 02, 2022
Official implementation of "Robust channel-wise illumination estimation"

This repository provides the official implementation of "Robust channel-wise illumination estimation." accepted in BMVC (2021).

Firas Laakom 4 Nov 08, 2022
This project deals with the detection of skin lesions within the ISICs dataset using YOLOv3 Object Detection with Darknet.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Skin Lesion detection using YOLO This project deal

Lalith Veerabhadrappa Badiger 1 Nov 22, 2021