Code for Parameter Prediction for Unseen Deep Architectures (NeurIPS 2021)


Parameter Prediction for Unseen Deep Architectures (NeurIPS 2021)

authors: Boris Knyazev, Michal Drozdzal, Graham Taylor, Adriana Romero-Soriano

[Figures: Overview; Results on ResNet-50]

This repository contains the code to train and evaluate Graph HyperNetworks (GHNs). It also contains DeepNets-1M, the dataset of neural architectures proposed in our paper to train and evaluate GHNs. Our improved GHNs, trained on DeepNets-1M, can predict parameters for diverse networks, even those that are very different from the ones used to train the GHNs (e.g. ResNet-50). Parameter prediction by GHNs is performed in a single forward pass and takes < 1 second on average, on either GPU or CPU!

Requirements and installation

The main requirements are:

  • Python 3.6+
  • PyTorch 1.9+
  • NetworkX

For graph visualizations we use pygraphviz. For it to work, graphviz may need to be installed first, e.g. with sudo apt-get install graphviz graphviz-dev.

To install the ppuda package:

pip install .  # use pip install -e . to install an editable version

To obtain all packages required for every file:

pip install -r requirements.txt

It's also possible to use conda to install this by running:

conda create --name ppuda --file requirements.txt --channel default --channel anaconda --channel conda-forge --channel pytorch
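After installation, the main requirements listed above can be checked from Python. The following is a minimal sketch using only the standard library; check_environment and parse_version are hypothetical helpers written for this illustration and are not part of the ppuda package.

```python
import sys
from importlib import import_module
from itertools import takewhile

# Minimum versions from the requirements list; NetworkX has no stated minimum.
REQUIREMENTS = {"torch": (1, 9), "networkx": None}


def parse_version(text):
    """Turn a version string such as '1.9.0+cu111' into a tuple of leading integers."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_environment():
    """Map each requirement to True if it is satisfied, False if missing or too old."""
    report = {"python": sys.version_info >= (3, 6)}
    for name, minimum in REQUIREMENTS.items():
        try:
            module = import_module(name)
        except ImportError:
            report[name] = False
            continue
        version = parse_version(getattr(module, "__version__", "0"))
        report[name] = minimum is None or version >= minimum
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'OK' if ok else 'missing or too old'}")
```

The check only covers the three main requirements; packages needed by individual scripts are still best installed through requirements.txt.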

Available GHNs

We release five GHNs. Below are top-1 accuracies on CIFAR-10 and top-5 accuracies on ImageNet using the parameters predicted by one of the GHNs.

Model             ResNet-50*    Best Architecture (index†)
MLP-CIFAR-10      17.7          60.2 (167)
GHN-1-CIFAR-10    19.2          59.9 (179)
GHN-2-CIFAR-10    58.6          77.1 (210)
GHN-1-ImageNet    6.9           32.1 (295)
GHN-2-ImageNet    5.3           48.3 (85)

* ResNet-50 is an unseen architecture (i.e. the trained GHNs have not observed this or even similar architectures during training). Our GHNs can still predict good parameters for such unseen architectures. On ImageNet, even though the accuracy is low, the predicted parameters are very useful for fine-tuning.

† Denotes the architecture index in the test split of DeepNets-1M.

Each GHN checkpoint takes just a few megabytes and is stored in the checkpoints folder of this repository.

Minimal example: predict parameters for ResNet-50

ImageNet:

from ppuda.ghn.nn import GHN2
import torchvision.models as models

ghn = GHN2('imagenet')      # load our GHN-2 trained on ImageNet
model = models.resnet50()   # ResNet-50 or any other torchvision model
model = ghn(model)          # predict parameters in < 1 second on GPU/CPU

# That's it! The model can now be evaluated on ImageNet to obtain top5=5.2%.

CIFAR-10:

from ppuda.ghn.nn import GHN2
import torchvision.models as models

# On CIFAR-10, we have an additional step of adjusting 
# the first layer(s) of the network for a 32x32 image size,
# since torchvision models expect a 224x224 input, 
# while GHNs on CIFAR-10 were trained on 32x32 inputs.

from ppuda.utils import adjust_net

ghn = GHN2('cifar10')                    # load our GHN-2 trained on CIFAR-10
model = models.resnet50(num_classes=10)  # ResNet-50 
model = adjust_net(model)                # adjust to a 32x32 input
model = ghn(model)                       # predict parameters in < 1 second on GPU/CPU

# That's it! The model can now be evaluated on CIFAR-10 to obtain top1=58.6%.

Full examples for ResNet-50 and other torchvision models can be found in examples/torch_models.py and examples/all_torch_models.py. See the examples folder for more.

Note 1: For the networks with batch norm, the running statistics of batch norm layers are not predicted (since these statistics are not trainable parameters). So to evaluate such networks, our code computes batch statistics on the evaluation set with batch size = 64. The networks without batch norm (e.g. in our BN-Free split) have the same accuracies regardless of the batch size.
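Note 1 implies that, for networks with batch norm, the output for a given image depends on the other images in its batch, which is why the batch size is fixed. The sketch below illustrates this effect for a single feature in plain Python. It is a toy illustration under that assumption, not the repository's evaluation code, and normalize_with_batch_stats is a hypothetical helper.

```python
from statistics import mean, pvariance


def normalize_with_batch_stats(batch, eps=1e-5):
    """Normalize one feature with the mean and variance of the current batch,
    as a batch norm layer does when no running statistics are available."""
    mu = mean(batch)
    var = pvariance(batch, mu)
    return [(x - mu) / (var + eps) ** 0.5 for x in batch]


# The same sample (value 1.0) evaluated as part of two different batches.
out_a = normalize_with_batch_stats([1.0, 2.0, 3.0, 4.0])
out_b = normalize_with_batch_stats([1.0, 1.0, 1.0, 5.0])

# The two outputs differ although the input sample is identical.
print(out_a[0], out_b[0])
```

Networks without batch norm have no such dependence, which is consistent with the BN-Free split giving the same accuracy for any batch size.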

Note 2: To evaluate/train on ImageNet, follow the data instructions below on how to prepare the ImageNet dataset.

Data

DeepNets-1M

To train or evaluate on DeepNets-1M, first download the dataset file by running ./data/download.sh.

To generate a new DeepNets-1M dataset, the following command can be used:

python experiments/net_generator.py train 1000000 ./data

The dataset generated using this command should be close to our training dataset. Other splits can be regenerated by specifying the split as the first argument.

CIFAR-10

CIFAR-10 is downloaded automatically and is saved in the --data_dir folder (default is ./data).

ImageNet

We implemented a simple wrapper of the torchvision.datasets.ImageNet implementation. The ImageNet root folder imagenet is expected to be in the ./data folder by default with the following structure:

./data
│   imagenet
│   │   train
│   │   │    n01440764
│   │   │    n01443537
│   │   │    ...
│   │   val
│   │   │    n01440764
│   │   │    n01443537
│   │   │    ...
│   │   ILSVRC2012_devkit_t12.tar.gz
│   deepnets1m_train.hdf5
│   deepnets1m_train_meta.json
│   ...

Both imagenet/train and imagenet/val must contain separate folders for each class. Follow the official instructions on how to obtain ImageNet (ILSVRC 2012) data.
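Before starting a long ImageNet run, it can help to verify the folder layout described above. The following is a minimal sketch using pathlib; check_imagenet_layout is a hypothetical helper, not a function of this repository, and it only checks that the folders and the devkit archive exist, not that their contents are complete.

```python
from pathlib import Path


def check_imagenet_layout(data_dir="./data"):
    """Return a list of problems found in the expected ImageNet folder layout."""
    root = Path(data_dir) / "imagenet"
    problems = []
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing folder: {split_dir}")
            continue
        # Each class (e.g. n01440764) should have its own sub-folder.
        class_dirs = [p for p in split_dir.iterdir() if p.is_dir()]
        if not class_dirs:
            problems.append(f"no class folders in: {split_dir}")
    devkit = root / "ILSVRC2012_devkit_t12.tar.gz"
    if not devkit.is_file():
        problems.append(f"missing file: {devkit}")
    return problems


if __name__ == "__main__":
    for problem in check_imagenet_layout():
        print(problem)
```

An empty list means the layout matches the structure shown above.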

Reproducing main results

The arguments of our scripts are described in config.py. The default hyperparameters are based on our paper. The examples below show how to run the scripts and override the default hyperparameters.

DeepNets-1M results

Training GHN

  • GHN-1 on CIFAR-10: python experiments/train_ghn.py --name ghn1

  • GHN-2 on CIFAR-10: python experiments/train_ghn.py -m 8 -n -v 50 --ln --name ghn2

  • MLP on CIFAR-10: python experiments/train_ghn.py -m 8 -n -v 50 --ln -H mlp --name mlp

where:

  • -m 8 sets the meta batch size to 8;
  • -n normalizes the predicted parameters;
  • -v 50 adds virtual edges to the graphs, with 50 as the maximum shortest path length;
  • --ln adds layer normalization before decoding the parameters;
  • --name ghn2 sets the directory name in which trained checkpoints are saved (combined with --save_dir to obtain the full path);
  • -H mlp uses an MLP instead of a GatedGNN.
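To make the -v option more concrete, the sketch below computes virtual edges for a small directed graph: every pair of nodes whose shortest path length s satisfies 1 < s <= max_length gets a virtual edge. Weighting each virtual edge by 1/s follows our reading of the paper and is an assumption here; the function names are hypothetical and the repository's implementation may differ.

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search over a directed graph given as {node: [successors]}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def virtual_edges(adjacency, max_length=50):
    """Return {(u, v): 1 / s_uv} for node pairs with 1 < s_uv <= max_length."""
    edges = {}
    for u in adjacency:
        for v, s in shortest_path_lengths(adjacency, u).items():
            if 1 < s <= max_length:
                edges[(u, v)] = 1.0 / s
    return edges


# A toy chain of four layers: 0 -> 1 -> 2 -> 3.
chain = {0: [1], 1: [2], 2: [3], 3: []}
print(virtual_edges(chain, max_length=2))  # {(0, 2): 0.5, (1, 3): 0.5}
```

With max_length=50, nodes in the toy chain that are three hops apart would also be connected, with a smaller weight.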

To train on ImageNet, use -d imagenet. To train GHNs on multiple GPUs (e.g. with a large meta batch size), add --multigpu to use all available CUDA devices (make sure to set CUDA_VISIBLE_DEVICES appropriately).

For example, to train GHN-2 on ImageNet using 4 GPUs: export CUDA_VISIBLE_DEVICES=0,1,2,3; python experiments/train_ghn.py -m 8 -n -v 50 --ln --name ghn2_imagenet -d imagenet --multigpu
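When launching several training runs from a script, the flags above can be assembled programmatically. The sketch below is one way to do so; build_train_command is a hypothetical helper (not part of the repository) that only uses the flags documented in this section.

```python
import shlex


def build_train_command(name, meta_batch_size=None, normalize=False, virtual_edges=None,
                        layernorm=False, hypernet=None, dataset=None, multigpu=False):
    """Assemble the argument list for experiments/train_ghn.py from the flags above."""
    cmd = ["python", "experiments/train_ghn.py"]
    if meta_batch_size is not None:
        cmd += ["-m", str(meta_batch_size)]
    if normalize:
        cmd.append("-n")
    if virtual_edges is not None:
        cmd += ["-v", str(virtual_edges)]
    if layernorm:
        cmd.append("--ln")
    if hypernet is not None:
        cmd += ["-H", hypernet]
    cmd += ["--name", name]
    if dataset is not None:
        cmd += ["-d", dataset]
    if multigpu:
        cmd.append("--multigpu")
    return cmd


ghn2_imagenet = build_train_command("ghn2_imagenet", meta_batch_size=8, normalize=True,
                                    virtual_edges=50, layernorm=True,
                                    dataset="imagenet", multigpu=True)
# Prints the same GHN-2 ImageNet command as in the example above.
print(" ".join(shlex.quote(arg) for arg in ghn2_imagenet))
```

The resulting list can be passed to subprocess.run, with CUDA_VISIBLE_DEVICES set through its env argument.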

Evaluating GHNs

  • Evaluate GHN-2 on CIFAR-10 on all architectures of $split from DeepNets-1M: python experiments/eval_ghn.py --ckpt ./checkpoints/ghn2_cifar10.pt -d cifar10 --split $split

  • Evaluate GHN-2 on CIFAR-10 on a single architecture from DeepNets-1M: python experiments/eval_ghn.py --ckpt ./checkpoints/ghn2_cifar10.pt -d cifar10 --split $split --arch $ind

where $split is one of val, test, wide, deep, dense, bnfree, predefined, and $ind is the integer index of an architecture in that split.
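To evaluate a checkpoint on every split in turn, the commands above can be generated in a loop. This is a small sketch; eval_command is a hypothetical helper and the split names are those listed above.

```python
SPLITS = ["val", "test", "wide", "deep", "dense", "bnfree", "predefined"]


def eval_command(split, ckpt="./checkpoints/ghn2_cifar10.pt", dataset="cifar10", arch=None):
    """Build the eval_ghn.py argument list for one split and, optionally, one architecture."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split}")
    cmd = ["python", "experiments/eval_ghn.py", "--ckpt", ckpt, "-d", dataset, "--split", split]
    if arch is not None:
        cmd += ["--arch", str(arch)]
    return cmd


for split in SPLITS:
    # To launch a run instead of printing: subprocess.run(eval_command(split), check=True)
    print(" ".join(eval_command(split)))
```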

Training and evaluating SGD

  • Train architecture=0 from the test split of DeepNets-1M for 50 epochs on CIFAR-10: python experiments/sgd/train_net.py --split test --arch 0 --epochs 50

  • Train the best architecture from the DARTS paper for 50 epochs on CIFAR-10: python experiments/sgd/train_net.py --arch DARTS --epochs 50

  • Train architecture=0 from the wide split of DeepNets-1M for 1 epoch on ImageNet: python experiments/sgd/train_net.py --split wide --arch 0 --epochs 1 -d imagenet

Fine-tuning predicted parameters on other tasks

The parameters predicted by GHN-2 trained on ImageNet can be fine-tuned on any vision dataset, such as CIFAR-10.

100-shot CIFAR-10

  • Fine-tune ResNet-50 initialized with the parameters predicted by GHN-1-ImageNet: python experiments/sgd/train_net.py --split predefined --arch 0 --epochs 50 -d cifar10 --n_shots 100 --wd 1e-3 --ckpt ./checkpoints/ghn1_imagenet.pt

  • Fine-tune ResNet-50 initialized with the parameters predicted by GHN-2-ImageNet: python experiments/sgd/train_net.py --split predefined --arch 0 --epochs 50 -d cifar10 --n_shots 100 --wd 1e-3 --ckpt ./checkpoints/ghn2_imagenet.pt

  • Fine-tune ResNet-50 initialized randomly with Kaiming He's method: python experiments/sgd/train_net.py --split predefined --arch 0 --epochs 50 -d cifar10 --n_shots 100 --wd 1e-3

  • Fine-tune ResNet-50 pretrained on ImageNet: python experiments/sgd/train_net.py --split predefined --arch 0 --epochs 50 -d cifar10 --n_shots 100 --wd 1e-3 --pretrained

  • Fine-tune ViT initialized with the parameters predicted by GHN-2-ImageNet: python experiments/sgd/train_net.py --split predefined --arch 1 --epochs 50 -d cifar10 --n_shots 100 --wd 1e-3 --ckpt ./checkpoints/ghn2_imagenet.pt

  • Fine-tune DARTS initialized with the parameters predicted by GHN-2-ImageNet: python experiments/sgd/train_net.py --arch DARTS --epochs 50 -d cifar10 --n_shots 100 --wd 1e-3 --init_channels 48 --layers 14 --ckpt ./checkpoints/ghn2_imagenet.pt

--wd 1e-3 was generally the best in these experiments. To report the results in the paper, we also tuned the initial learning rate on the 200 validation images of the 100-shot CIFAR-10 training set, so the results obtained with the scripts above might be a bit different from the reported ones.
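For readers unfamiliar with the few-shot setup, the sketch below shows one way to draw an n-shot subset from a list of labels. It assumes that --n_shots 100 means 100 training images per class; n_shot_indices is a hypothetical helper, and the repository's own sampling (and its validation split) may differ.

```python
import random
from collections import defaultdict


def n_shot_indices(labels, n_shots, seed=0):
    """Pick n_shots sample indices per class from a list of integer labels."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        if len(indices) < n_shots:
            raise ValueError(f"class {label} has only {len(indices)} samples")
        chosen.extend(rng.sample(indices, n_shots))
    return sorted(chosen)


# Toy labels: 10 classes with 500 samples each, standing in for a real training set.
toy_labels = [i % 10 for i in range(5000)]
subset = n_shot_indices(toy_labels, n_shots=100)
print(len(subset))  # 10 classes x 100 shots = 1000 indices
```

Fixing the seed keeps the subset identical across runs, which makes comparisons between different initializations fair.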

Object detection

In the paper, we fine-tune on Penn-Fudan object detection. Our experiments are based on PyTorch Object Detection Finetuning Tutorial.

The dataset should be downloaded and placed inside the ./data folder as ./data/PennFudanPed.

The commands to fine-tune/train networks for object detection are similar to those for 100-shot CIFAR-10 above, but are based on the experiments/sgd/detector/train_detector.py script and the hyperparameters from the tutorial. For example, to fine-tune DARTS initialized with the parameters predicted by GHN-2-ImageNet:

python experiments/sgd/detector/train_detector.py -d PennFudanPed --arch DARTS --ckpt ./checkpoints/ghn2_imagenet.pt --init_channels 48 --layers 14

Property prediction

To train and evaluate regression models on top of graph embeddings extracted using GHN-2-CIFAR-10:

python experiments/property_prediction.py cifar10 ./checkpoints/ghn2_cifar10.pt

The script will evaluate the four properties of architectures discussed in the paper: accuracy on the clean test set, accuracy on a corrupted test set, inference speed, and speed of convergence.

The extracted embeddings in the .npy format for each GHN are available in the checkpoints folder, but will be recomputed if they are missing.

NAS

Training the best (in terms of accuracy in this example) architecture on CIFAR-10 with SGD for 600 epochs according to the DARTS protocol:

python experiments/sgd/train_net.py --split search --arch 35133 --epochs 600 --cutout --drop_path_prob 0.2 --auxiliary

Architecture 35133 was found to be the best in the search split on CIFAR-10 using our GHN-2.

Visualization

Example of visualizing the computational graph of ResNet-50.

import torchvision
from ppuda.deepnets1m.graph import Graph

Graph(torchvision.models.resnet50()).visualize(node_size=100)

Example of visualizing the computational graph of the best DARTS network.

from ppuda.deepnets1m.graph import Graph
from ppuda.deepnets1m.net import Network
from ppuda.deepnets1m.genotypes import DARTS

model = Network(C=48, num_classes=1000, genotype=DARTS, n_cells=14)
Graph(model).visualize(node_size=50)

[Figures: computational graphs of ResNet-50, ViT and DARTS]

See more examples for different architectures in examples/graph_visualization.ipynb.

License

The majority of PPUDA is licensed under MIT license, however portions of the project are available under separate license terms: DARTS is licensed under the Apache 2.0 license and NetworkX is licensed under the 3-Clause BSD license.

Contributions

Please submit a pull request or open a GitHub issue (see the details). Make sure to comply with our code of conduct.

Acknowledgements

We thank the Vector AI Engineering team (Gerald Shen, Maria Koshkina and Deval Pandya) for code review.

Citation

@inproceedings{knyazev2021parameter,
  title={Parameter Prediction for Unseen Deep Architectures},
  author={Knyazev, Boris and Drozdzal, Michal and Taylor, Graham W and Romero-Soriano, Adriana},
  booktitle={Advances in Neural Information Processing Systems},
  year={2021}  
}