Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation

Last update: Sep 26, 2022

Woncheol Shin¹, Gyubok Lee¹, Jiyoung Lee¹, Joonseok Lee²,³, Edward Choi¹ | Paper (arXiv:2112.00384)

¹KAIST, ²Google Research, ³Seoul National University

Abstract

Recently, vector-quantized image modeling has demonstrated impressive performance on generation tasks such as text-to-image generation. However, we find that current image quantizers do not satisfy translation equivariance in the quantized space because of aliasing. This degrades performance in downstream text-to-image and image-to-text generation, even in simple experimental setups. Instead of focusing on anti-aliasing, we take a direct approach and encourage translation equivariance in the quantized space. In particular, we explore a desirable property of image quantizers, called 'Translation Equivariance in the Quantized Space', and propose a simple but effective way to achieve it by regularizing the codebook embedding vectors toward orthogonality. Using this method, we improve accuracy by +22% in text-to-image generation and +26% in image-to-text generation, outperforming VQGAN.
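The two ideas in the abstract, nearest-codeword quantization and an orthogonality penalty on the codebook, can be sketched in a few lines. This is a minimal, stdlib-only Python sketch under our own assumptions and is not the authors' implementation. It assumes the penalty is the mean squared deviation of the ℓ2-normalized codebook's Gram matrix from the identity; the paper's exact form and weighting may differ. The helper names (`quantize`, `orthogonality_loss`, `shift_agreement`) are hypothetical. `shift_agreement` is an illustrative check of translation equivariance in the quantized space: it compares the tokens of a shifted input with the shifted tokens of the original input, on 1D sequences of equal length rather than 2D token grids.

```python
import math


def l2_normalize(v):
    """Return v scaled to unit Euclidean length (the zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codeword (squared L2 distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
            for f in features]


def orthogonality_loss(codebook):
    """Mean squared entry of (E_hat @ E_hat^T - I), where E_hat holds the
    L2-normalized codewords as rows. It is 0 exactly when all codewords are
    nonzero and mutually orthogonal, which is only possible if K <= d."""
    rows = [l2_normalize(c) for c in codebook]
    k = len(rows)
    total = 0.0
    for i in range(k):
        for j in range(k):
            gram = sum(a * b for a, b in zip(rows[i], rows[j]))
            target = 1.0 if i == j else 0.0
            total += (gram - target) ** 2
    return total / (k * k)


def shift_agreement(tokens, tokens_of_shifted, shift):
    """Fraction of overlapping positions where quantize(shift(x)) equals
    shift(quantize(x)), for equal-length 1D token sequences; shift >= 0
    moves content to the right."""
    if not 0 <= shift < len(tokens):
        raise ValueError("shift must satisfy 0 <= shift < len(tokens)")
    pairs = [(tokens_of_shifted[i], tokens[i - shift])
             for i in range(shift, len(tokens))]
    return sum(a == b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    print(quantize([[0.9, 0.1], [0.2, 0.8], [1.0, 0.0]],
                   [[1.0, 0.0], [0.0, 1.0]]))          # [0, 1, 0]
    print(orthogonality_loss([[1.0, 0.0, 0.0],
                              [0.0, 2.0, 0.0],
                              [0.0, 0.0, 0.5]]))       # 0.0
    print(shift_agreement([3, 1, 4, 1, 5], [7, 3, 1, 9, 1], 1))  # 0.75
```

During training, a penalty like this would be added to the usual VQGAN objective with some weight (a hypothetical λ). In d dimensions at most d vectors can be mutually orthogonal, so the loss can reach zero only when the codebook size K does not exceed the embedding dimension d. For larger codebooks it only pushes codewords toward being as close to orthogonal as possible.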

Requirements

To be updated.

Download Dataset

To be updated.

Training TE-VQGAN (Stage 1)

To be updated.

Training Bi-directional Image-Text Generator (Stage 2)

To be updated.

Thanks to

The implementations of TE-VQGAN and the Bi-directional Image-Text Generator are based on VQGAN and DALLE-pytorch. Thanks to the authors of all related works!

Citation

@misc{shin2021translationequivariant,
      title={Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation}, 
      author={Woncheol Shin and Gyubok Lee and Jiyoung Lee and Joonseok Lee and Edward Choi},
      year={2021},
      eprint={2112.00384},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}