MPLP: Metapath-Based Label Propagation for Heterogenous Graphs

Last update: Jun 28, 2022

Related tags

Overview

MPLP: Metapath-Based Label Propagation for Heterogenous Graphs

Results on MAG240M

Here, we demonstrate the following performance on the MAG240M dataset from [email protected] 2021.

Model	Test Acc	Validation Acc	Parameters	Hardware
Our Model	0.7447	0.7669 ± 0.0003 (ensemble 0.7696)	743,449	Tesla V100 (21GB)

Reproducing results

0. Requirements

Here just list python3 packages we used in this competition:

numpy==1.19.2
torch==1.5.1+cu101
dgl-cu101==0.6.0.post1
ogb==1.3.1
sklearn==0.23.2
tqdm==4.46.1

1. Prepare Graph and Features

The preprocess code modifed from dgl baseline. We created graph with 6 different edge types instead of 5.

# Time cost: 3hours,30mins

python3 $MAG_CODE_PATH/preprocess.py
        --rootdir $MAG_INPUT_PATH \
        --author-output-path $MAG_PREP_PATH/author.npy \
        --inst-output-path $MAG_PREP_PATH/inst.npy \
        --graph-output-path $MAG_PREP_PATH \
        --graph-as-homogeneous \
        --full-output-path $MAG_PREP_PATH/full_feat.npy

The graphs and features will be saved in MAG_PREP_PATH , where the MAG_PREP_PATH is specified in run.sh.

Calculate features

The meta-path based features are generated by this script. Details can be found in our technical report.

# Time cost: 2hours,20mins (only generate label related features)

python3 $MAG_CODE_PATH/feature.py
        $MAG_INPUT_PATH \
        $MAG_PREP_PATH/dgl_graph_full_heterogeneous_csr.bin \
        $MAG_FEAT_PATH \
        --seed=42

Train RGAT model and prepare RGAT features

The RGAT model is modifed from dgl baseline. The validation accuracy is 0.701 , as same as described in the dgl baseline github.

# Time cost: 33hours,40mins (20mins for each epoch)

python3 $MAG_CODE_PATH/rgat.py
        --rootdir $MAG_INPUT_PATH \
        --graph-path $MAG_PREP_PATH/dgl_graph_full_homogeneous_csc.bin \
        --full-feature-path $MAG_PREP_PATH/full_feat.npy \
        --output-path $MAG_RGAT_PATH/ \
        --epochs=100 \
        --model-path $MAG_RGAT_PATH/model.pt \
        --submission-path $MAG_RGAT_PATH/

You will get embeddings as input features of the following MPLP models.

2. Train MPLP models

The train process splits to two steps:

train the model with full train samples at a large learning rate (here we use StepLR(lr=0.01, step_size=100, gamma=0.25))
then fine tune the model with latest train samples (eg, paper with year >= 2018) with a small learning rate (0.000625)

You can train the MPLP model by running the following commands:

# Time cost: 2hours,40mins for each seed

for seed in $(seq 0 7);
do
    python3 $MAG_CODE_PATH/mplp.py \
            $MAG_INPUT_PATH \
            $MAG_MPLP_PATH/data/ \
            $MAG_MPLP_PATH/output/seed${seed} \
            --gpu \
            --seed=${seed} \
            --batch_size=10240 \
            --epochs=200 \
            --num_layers=2 \
            --learning_rate=0.01 \
            --dropout=0.5 \
            --num_splits=5
done

3. Ensemble MPLP results

While having all the results with k-fold cross validation training under 8 different seeds, you can average the results by running code below:

python3 $MAG_CODE_PATH/ensemble.py $MAG_MPLP_PATH/output/ $MAG_SUBM_PATH

MPLP: Metapath-Based Label Propagation for Heterogenous Graphs

Related tags

Overview

MPLP: Metapath-Based Label Propagation for Heterogenous Graphs

Results on MAG240M

Reproducing results

0. Requirements

1. Prepare Graph and Features

Calculate features

Train RGAT model and prepare RGAT features

2. Train MPLP models

3. Ensemble MPLP results

Owner

Qiuying Peng

This is the face keypoint train code of project face-detection-project

Code for "Solving Graph-based Public Good Games with Tree Search and Imitation Learning"

A quick recipe to learn all about Transformers

Official repository for Hierarchical Opacity Propagation for Image Matting

Implementation based on Paper - Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling

The official repository for "Intermediate Layers Matter in Momentum Contrastive Self Supervised Learning" paper.

The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)

PyTorch implementation for paper Neural Marching Cubes.

Neighbor2Seq: Deep Learning on Massive Graphs by Transforming Neighbors to Sequences

Some pvbatch (paraview) scripts for postprocessing OpenFOAM data

PyTorch IPFS Dataset

Codes for SIGIR'22 Paper 'On-Device Next-Item Recommendation with Self-Supervised Knowledge Distillation'

GPT, but made only out of gMLPs

Diverse Object-Scene Compositions For Zero-Shot Action Recognition

Active window border replacement for window managers.

Dogs classification with Deep Metric Learning using some popular losses

This repository is a series of notebooks that show solutions for the projects at Dataquest.io.

Greedy Gaussian Segmentation

Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition

[NeurIPS 2020] Official repository for the project "Listening to Sound of Silence for Speech Denoising"