KDD CUP 2020 Automatic Graph Representation Learning: 1st Place Solution

Overview

KDD CUP 2020: AutoGraph

Team: aister


  • Members: Jianqiang Huang, Xingyuan Tang, Mingjian Chen, Jin Xu, Bohang Zheng, Yi Qi, Ke Hu, Jun Lei
  • Team Introduction: Most of our members come from the Search Ads Algorithm Team of the Meituan Dianping Advertising Platform Department. We participated in three of the five competitions held by KDD CUP 2020 and achieved promising results. We won first place in Debiasing(1/1895), first place in AutoGraph(1/149), and third place in Multimodalities Recall(3/1433).
  • Based on the business scenario of Meituan and Dianping App, the Search Ads Algorithm Team of Meituan Dianping has rich expertise in innovation and algorithm optimization in the field of cutting-edge technology, including but not limited to, conducting algorithm research and application in the fileds of Debiasing, Graph Learning and Multimodalities.
  • If you are interested in our team or would like to communicate with our team(b.t.w., we are hiring), you can email to [email protected].

Introduction


  • The competition inviting participants deploy AutoML solutions for graph representation learning, where node classification is chosen as the task to evaluate the quality of learned representations. There are 15 graph datasets which consists of five public datasets to develop AutoML solutions, five feedback datasets to evaluate solutions and other five unseen datasets for the final rankings. Each dataset contains the index value of the node, the processed characteristic value, and the weight of the directed edge. We proposed automatic solutions that can effectively and efficiently learn high-quality representation for each node based on the given features, neighborhood and structural information underlying the graph. Please refer to the competition official website for more details: https://www.automl.ai/competitions/3

Preprocess


  • Feature
    • The size of node degree can obviously represent the importance of node, but the information of node degree with too much value is easy to overfit. So we bucket the node degree.
    • Node index embedding
    • The multi-hop neighbor information of the node.

Model Architecture


  • Automatic proxy evaluation is a better method to select proper models for a new dataset. However, the extremely limited time budget does not allow online model selection. For a trade-off of accuracy and speed, we offline evaluate many models and empirically find that GCN, GAT, GraphSAGE, and TAGConv can get robust and good results on the 5 public dataset and 5 feedback datasets. Thus we use them for ensemble in this code. One can get better results using proxy evaluation.
  • We design different network structures for directed graph and undirected graph, sparse graph and dense graph, graph with node features and graph without node features.

Training Procedure


  • Search learning rate
    • lr_list = [0.05, 0.03, 0.01, 0.0075, 0.005, 0.003, 0.001, 0.0005]
    • Select the optimal learning rate of each model in this data set. After 16 rounds of training, choose the learning rate which get lowest loss(average of epoch 14th, 15th and 16th) in the model.
  • Estimate running time
    • By running the model, estimating the model initialization time and training time for each epoch.
    • The model training epochs are determined according to remaining time and running time of the model.
  • Training and validation
    • The difference of training epochs will lead to the big difference of model effect. It is very easy to overfit for the graph with only node ID information and no original features. So we adopt cross validation and early stopping, which makes the model more robust.
    • training with the following parameters:
      • Learning rate = best_lr
      • Loss: NLL Loss
      • Optimizer: Adam

Reproducibility


  • Requirement
    • Python==3.6
    • torch==1.4.0
    • torch-geometric==1.3.2
    • numpy==1.18.1
    • pandas==1.0.1
    • scikit-learn==0.19.1
  • Training
    • Run ingestion.py.

Reference


[1] Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks[J]. arXiv preprint arXiv:1609.02907, 2016.
[2] Veličković P, Cucurull G, Casanova A, et al. Graph attention networks[J]. arXiv preprint arXiv:1710.10903, 2017.
[3] Hamilton W, Ying Z, Leskovec J. Inductive representation learning on large graphs[C]//Advances in neural information processing systems. 2017: 1024-1034.
[4] Du J, Zhang S, Wu G, et al. Topology adaptive graph convolutional networks[J]. arXiv preprint arXiv:1710.10370, 2017.

Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion (CVPR'2021, Oral)

DSA^2 F: Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion (CVPR'2021, Oral) This repo is the official imp

如今我已剑指天涯 46 Dec 21, 2022
Uncertain natural language inference

Uncertain Natural Language Inference This repository hosts the code for the following paper: Tongfei Chen*, Zhengping Jiang*, Adam Poliak, Keisuke Sak

Tongfei Chen 14 Sep 01, 2022
An implementation of a sequence to sequence neural network using an encoder-decoder

Keras implementation of a sequence to sequence model for time series prediction using an encoder-decoder architecture. I created this post to share a

Luke Tonin 195 Dec 17, 2022
Principled Detection of Out-of-Distribution Examples in Neural Networks

ODIN: Out-of-Distribution Detector for Neural Networks This is a PyTorch implementation for detecting out-of-distribution examples in neural networks.

189 Nov 29, 2022
Sequence lineage information extracted from RKI sequence data repo

Pango lineage information for German SARS-CoV-2 sequences This repository contains a join of the metadata and pango lineage tables of all German SARS-

Cornelius Roemer 24 Oct 26, 2022
✅ How Robust are Fact Checking Systems on Colloquial Claims?. In NAACL-HLT, 2021.

How Robust are Fact Checking Systems on Colloquial Claims? Official PyTorch implementation of our NAACL paper: Byeongchang Kim*, Hyunwoo Kim*, Seokhee

Byeongchang Kim 19 Mar 15, 2022
Finetune SSL models for MOS prediction

Finetune SSL models for MOS prediction This is code for our paper under review for ICASSP 2022: "Generalization Ability of MOS Prediction Networks" Er

Yamagishi and Echizen Laboratories, National Institute of Informatics 32 Nov 22, 2022
Totally Versatile Miscellanea for Pytorch

Totally Versatile Miscellania for PyTorch Thomas Viehmann [email protected] Thi

Thomas Viehmann 428 Dec 28, 2022
Colar: Effective and Efficient Online Action Detection by Consulting Exemplars, CVPR 2022.

Colar: Effective and Efficient Online Action Detection by Consulting Exemplars This repository is the official implementation of Colar. In this work,

LeYang 246 Dec 13, 2022
Neural-Pull: Learning Signed Distance Functions from Point Clouds by Learning to Pull Space onto Surfaces(ICML 2021)

Neural-Pull: Learning Signed Distance Functions from Point Clouds by Learning to Pull Space onto Surfaces(ICML 2021) This repository contains the code

149 Dec 15, 2022
Estimation of human density in a closed space using deep learning.

Siemens HOLLZOF challenge - Human Density Estimation Add project description here. Installing Dependencies: Install Python3 either system-wide, user-w

3 Aug 08, 2021
A simple, unofficial implementation of MAE using pytorch-lightning

Masked Autoencoders in PyTorch A simple, unofficial implementation of MAE (Masked Autoencoders are Scalable Vision Learners) using pytorch-lightning.

Connor Anderson 20 Dec 03, 2022
FIRM-AFL is the first high-throughput greybox fuzzer for IoT firmware.

FIRM-AFL FIRM-AFL is the first high-throughput greybox fuzzer for IoT firmware. FIRM-AFL addresses two fundamental problems in IoT fuzzing. First, it

356 Dec 23, 2022
Gesture-Volume-Control - This Python program can adjust the system's volume by using hand gestures

Gesture-Volume-Control This Python program can adjust the system's volume by usi

VatsalAryanBhatanagar 1 Dec 30, 2021
Pytorch codes for "Self-supervised Multi-view Stereo via Effective Co-Segmentation and Data-Augmentation"

Self-Supervised-MVS This repository is the official PyTorch implementation of our AAAI 2021 paper: "Self-supervised Multi-view Stereo via Effective Co

hongbin_xu 127 Jan 04, 2023
OpenMMLab Text Detection, Recognition and Understanding Toolbox

Introduction English | 简体中文 MMOCR is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and the correspondi

OpenMMLab 3k Jan 07, 2023
Given a 2D triangle mesh, we could randomly generate cloud points that fill in the triangle mesh

generate_cloud_points Given a 2D triangle mesh, we could randomly generate cloud points that fill in the triangle mesh. Run python disp_mesh.py Or you

Peng Yu 2 Dec 24, 2021
Improving Factual Completeness and Consistency of Image-to-text Radiology Report Generation

Improving Factual Completeness and Consistency of Image-to-text Radiology Report Generation The reference code of Improving Factual Completeness and C

46 Dec 15, 2022
Contains modeling practice materials and homework for the Computational Neuroscience course at Okinawa Institute of Science and Technology

A310 Computational Neuroscience - Okinawa Institute of Science and Technology, 2022 This repository contains modeling practice materials and homework

Sungho Hong 1 Jan 24, 2022
Detectron2 is FAIR's next-generation platform for object detection and segmentation.

Detectron2 is Facebook AI Research's next generation software system that implements state-of-the-art object detection algorithms. It is a ground-up r

Facebook Research 23.3k Jan 08, 2023