Code for "Learning Canonical Representations for Scene Graph to Image Generation", Herzig & Bar et al., ECCV2020

Last update: Jul 07, 2022

Overview

Learning Canonical Representations for Scene Graph to Image Generation (ECCV 2020)

Roei Herzig, Amir Bar, Huijuan Xu, Gal Chechik, Trevor Darrell, Amir Globerson

Main project page.

Generation of scenes with many objects. Our method achieves better performance on such scenes than previous methods. Left: A partial input scene graph. Middle: Generation using [1]. Right: Generation using our proposed method.

Our novel contributions are:

We propose a model that uses canonical representations of SGs, thus obtaining stronger invariance properties. This in turn leads to generalization on semantically equivalent graphs and improved robustness to graph size and noise in comparison to existing methods.
We show how to learn the canonicalization process from data.
We use our canonical representations within an SG-to-image model and demonstrate our approach results in an improved generation on Visual Genome, COCO, and CLEVR, compared to the state-of-the-art baselines.

Dependencies

To get started with the framework, install the following dependencies:

Python 3.7
pip install -r requirments.txt

Data

Follow the commands below to build the data.

COCO

./scripts/download_coco.sh

VG

./scripts/download_vg.sh

CLEVR

Please download the CLEVR-Dialog Dataset from here.

Training

Training a SG-to-Layout model:

python -m scripts.train --dataset={packed_coco, packed_vg, packed_clevr}

Training AttSpade - Layout-to-Image model:

Optional arguments:

--output_dir=output_path_dir/%s (s is the run_name param) --run_name=folder_name --checkpoint_every=N (default=5000) --dataroot=datasets_path --debug (a flag for debug)

Train on COCO (with boxes):

python -m scripts.train --dataset=coco --batch_size=16 --loader_num_workers=0 --skip_graph_model=0 --skip_generation=0 --image_size=256,256 --min_objects=1 --max_objects=1000 --gpu_ids=0 --use_cuda

Train on VG:

python -m scripts.train --dataset=vg --batch_size=16 --loader_num_workers=0 --skip_graph_model=0 --skip_generation=0 --image_size=256,256 --min_objects=3 --max_objects=30 --gpu_ids=0 --use_cuda

Train on CLEVR:

python -m scripts.train --dataset=packed_clevr --batch_size=6 --loader_num_workers=0 --skip_graph_model=0 --skip_generation=0 --image_size=256,256 --use_img_disc=1 --gpu_ids=0 --use_cuda

Inference

Inference SG-to-Layout

To produce layout outputs and IOU results, run:

python -m scripts.layout_generation --checkpoint=<trained_model_folder> --gpu_ids=<0/1/2>

A new folder with the results will be created in: <trained_model_folder>

Pre-trained Models:

Packed COCO: link

Packed Visual Genome: link

Inference Layout-to-Image (LostGANs)

Please use LostGANs implementation

Inference Layout-to-Image (from dataframe)

To produce the image from a dataframe, run:

python -m scripts.generation_dataframe --checkpoint=<trained_model_folder>

A new folder with the results will be created in: <trained_model_folder>

Inference Layout-to-Image (AttSPADE)

COCO/ Visual Genome

Generate images from a layout (dataframe):

python -m scripts.generation_dataframe --gpu_ids=<0/1/2> --checkpoint=<model_path> --output_dir=<output_path> --data_frame=<dataframe_path> --mode=<gt/pred>

mode=gt defines use gt_boxes while mode=pred use predicted box by our WSGC model from the paper (see the dataframe for more details).

Pre-trained Models:

COCO

dataframe: link; 128x128 resolution: link; 256x256 resolution: link

Visual Genome

dataframe: link; 128x128 resolution: link; 256x256 resolution: link

Generate images from a scene graph:

python -m scripts.generation_attspade --gpu_ids=<0/1/2> --checkpoint=<model/path> --output_dir=<output_path>

CLEVR

This script generates CLEVR images on large scene graphs from scene_graphs.pkl. It generates the CLEVR results for both WSGC + AttSPADE and Sg2Im + AttSPADE. For more information, please refer to the paper.

python -m scripts.generate_clevr --gpu_ids=<0/1/2> --layout_not_learned_checkpoint=<model_path> --layout_learned_checkpoint=<model_path> --output_dir=<output_path>

Pre-trained Models:

Baseline (Sg2Im): link; WSGC: link

Acknowledgment

This implementation is built on top of [1]: https://github.com/google/sg2im.

References

[1] Justin Johnson, Agrim Gupta, Li Fei-Fei, Image Generation from Scene Graphs, 2018.

Citation

@inproceedings{herzig2019canonical,
 author    = {Herzig, Roei and Bar, Amir and Xu, Huijuan and Chechik, Gal and Darrell, Trevor and Globerson, Amir},
 title     = {Learning Canonical Representations for Scene Graph to Image Generation},
 booktitle = {Proc. of the European Conf. on Computer Vision (ECCV)},
 year      = {2020}
}

Code for "Learning Canonical Representations for Scene Graph to Image Generation", Herzig & Bar et al., ECCV2020

Related tags

Overview

Learning Canonical Representations for Scene Graph to Image Generation (ECCV 2020)

Roei Herzig*, Amir Bar*, Huijuan Xu, Gal Chechik, Trevor Darrell, Amir Globerson

Our novel contributions are:

Dependencies

Data

COCO

VG

CLEVR

Training

Training a SG-to-Layout model:

Training AttSpade - Layout-to-Image model:

Inference

Inference SG-to-Layout

Pre-trained Models:

Inference Layout-to-Image (LostGANs)

Inference Layout-to-Image (from dataframe)

Inference Layout-to-Image (AttSPADE)

COCO/ Visual Genome

Generate images from a layout (dataframe):

Pre-trained Models:

COCO

Visual Genome

Generate images from a scene graph:

CLEVR

Pre-trained Models:

Acknowledgment

References

Citation

Owner

roei_herzig

Sparse-dense operators implementation for Paddle

How to Predict Stock Prices Easily Demo

The code of Zero-shot learning for low-light image enhancement based on dual iteration

[SIGGRAPH 2021 Asia] DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning

[ACM MM 2021] Yes, "Attention is All You Need", for Exemplar based Colorization

Bootstrapped Unsupervised Sentence Representation Learning (ACL 2021)

Asynchronous Advantage Actor-Critic in PyTorch

Ansible Automation Example: JSNAPY PRE/POST Upgrade Validation

DGN pymarl - Implementation of DGN on Pymarl, which could be trained by VDN or QMIX

Implémentation en pyhton de l'article Depixelizing pixel art de Johannes Kopf et Dani Lischinski

WTTE-RNN a framework for churn and time to event prediction

The official implementation of the CVPR2021 paper: Decoupled Dynamic Filter Networks

A PyTorch implementation of NeRF (Neural Radiance Fields) that reproduces the results.

Unofficial implementation of the Involution operation from CVPR 2021

The goal of the exercises below is to evaluate the candidate knowledge and problem solving expertise regarding the main development focuses for the iFood ML Platform team: MLOps and Feature Store development.

SporeAgent: Reinforced Scene-level Plausibility for Object Pose Refinement

Fuzzing JavaScript Engines with Aspect-preserving Mutation

Neural Network to colorize grayscale images

Deep Semisupervised Multiview Learning With Increasing Views (IEEE TCYB 2021, PyTorch Code)

[CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

Roei Herzig, Amir Bar, Huijuan Xu, Gal Chechik, Trevor Darrell, Amir Globerson