A large-scale database for graph representation learning

Last update: Nov 25, 2022

Overview

MalNet: Advancing State-of-the-art Graph Databases

Recent research on graph kernels, neural networks, and spectral methods for capturing graph topology has revealed several shortcomings in existing graph benchmark datasets, which often contain graphs that are relatively:

limited in number,
small in scale in terms of nodes and edges, and
restricted in class diversity.

To address these issues, we have been working at Georgia Tech’s Polo Club of Data Science to develop the world's largest public graph representation learning database to date. We release MalNet, which contains over 1.2 million function call graphs averaging over 17k nodes and 39k edges per graph, across a hierarchy of 47 types and 696 families of classes (see Figure 1 below).

Compared to the popular REDDIT-12K database, MalNet offers 105x more graphs, 44x larger graphs on average, and 63x more classes.
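As a rough sanity check of the first and last ratios, the sketch below divides MalNet's counts by REDDIT-12K's. The MalNet graph count assumes one graph per collected APK (1,262,024, as reported on this page). The REDDIT-12K figures (11,929 graphs, 11 classes) are commonly reported values and are an assumption here; they are not stated in this README.

```python
# MalNet counts as reported on this page (assuming one graph per APK).
MALNET_GRAPHS = 1_262_024
MALNET_FAMILIES = 696

# REDDIT-12K counts: assumed values, not given in this README.
REDDIT12K_GRAPHS = 11_929
REDDIT12K_CLASSES = 11

graph_ratio = MALNET_GRAPHS / REDDIT12K_GRAPHS
class_ratio = MALNET_FAMILIES / REDDIT12K_CLASSES

print(f"graphs:  {graph_ratio:.1f}x")
print(f"classes: {class_ratio:.1f}x")
```

Under these assumed values, the two ratios round down to 105x and 63x, which agrees with the figures quoted above.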

What is a function call graph (FCG)?

Function call graphs represent the control flow of programs (see Figure 2 below), and can be statically extracted from many types of software (e.g., EXE, PE, APK). We use the Android ecosystem due to its large market share, easy accessibility, and diversity of malicious software. With the generous permission of AndroZoo, we collected 1,262,024 Android APK files, specifically selecting APKs that contain both a family and a type label obtained from the Euphony classification structure.
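To make the idea of a function call graph concrete, here is a minimal toy sketch that statically extracts one from a few lines of Python source using the standard `ast` module. This is only an illustration of the concept: it is not how MalNet's graphs were extracted from APKs, and the sample program and the `build_call_graph` helper are hypothetical.

```python
import ast
from collections import defaultdict

# Hypothetical sample program; it is parsed, never executed.
SOURCE = '''
def load(path):
    return parse(read(path))

def read(path):
    return open(path).read()

def parse(text):
    return text.split()

def main():
    load("input.txt")
'''

def build_call_graph(source):
    """Map each defined function to the sorted names it calls directly.

    Only plain-name calls such as f(x) are recorded; method calls such
    as obj.f(x) are skipped to keep the sketch short.
    """
    tree = ast.parse(source)
    graph = defaultdict(set)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            graph[node.name]  # register the caller even if it calls nothing
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    graph[node.name].add(inner.func.id)
    return {caller: sorted(callees) for caller, callees in graph.items()}

call_graph = build_call_graph(SOURCE)
for caller, callees in call_graph.items():
    print(caller, "->", callees)
```

Each function is a node and each caller-to-callee pair is a directed edge. A real APK yields the same kind of structure at a much larger scale, which is why MalNet graphs average thousands of nodes.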

How do we download and explore MalNet?

We have designed and developed MalNet Explorer, an interactive graph exploration and visualization tool that helps people explore the data before downloading it. Figure 3 shows MalNet Explorer’s desktop web interface and its main components. MalNet Explorer and the data are available online at: www.mal-net.org.

How to run the code?

The experiments we conducted in the arXiv paper can be run using dm_experiments.py.

Owner

Scott Freitas
