Unsupervised captioning - Code for Unsupervised Image Captioning

Overview

Unsupervised Image Captioning

by Yang Feng, Lin Ma, Wei Liu, and Jiebo Luo

Introduction

Most image captioning models are trained using paired image-sentence data, which are expensive to collect. We propose unsupervised image captioning to relax the reliance on paired data. For more details, please refer to our paper.

alt text

Citation

@InProceedings{feng2019unsupervised,
  author = {Feng, Yang and Ma, Lin and Liu, Wei and Luo, Jiebo},
  title = {Unsupervised Image Captioning},
  booktitle = {CVPR},
  year = {2019}
}

Requirements

mkdir ~/workspace
cd ~/workspace
git clone https://github.com/tensorflow/models.git tf_models
git clone https://github.com/tylin/coco-caption.git
touch tf_models/research/im2txt/im2txt/__init__.py
touch tf_models/research/im2txt/im2txt/data/__init__.py
touch tf_models/research/im2txt/im2txt/inference_utils/__init__.py
wget http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz
mkdir ckpt
tar zxvf inception_v4_2016_09_09.tar.gz -C ckpt
git clone https://github.com/fengyang0317/unsupervised_captioning.git
cd unsupervised_captioning
pip install -r requirements.txt
export PYTHONPATH=$PYTHONPATH:`pwd`

Dataset (Optional. The files generated below can be found at Gdrive).

In case you do not have the access to Google, the files are also available at One Drive.

  1. Crawl image descriptions. The descriptions used when conducting the experiments in the paper are available at link. You may download the descriptions from the link and extract the files to data/coco.

    pip3 install absl-py
    python3 preprocessing/crawl_descriptions.py
    
  2. Extract the descriptions. It seems that NLTK is changing constantly. So the number of the descriptions obtained may be different.

    python -c "import nltk; nltk.download('punkt')"
    python preprocessing/extract_descriptions.py
    
  3. Preprocess the descriptions. You may need to change the vocab_size, start_id, and end_id in config.py if you generate a new dictionary.

    python preprocessing/process_descriptions.py --word_counts_output_file \ 
      data/word_counts.txt --new_dict
    
  4. Download the MSCOCO images from link and put all the images into ~/dataset/mscoco/all_images.

  5. Object detection for the training images. You need to first download the detection model from here and then extract the model under tf_models/research/object_detection.

    python preprocessing/detect_objects.py --image_path\
      ~/dataset/mscoco/all_images --num_proc 2 --num_gpus 1
    
  6. Generate tfrecord files for images.

    python preprocessing/process_images.py --image_path\
      ~/dataset/mscoco/all_images
    

Training

  1. Train the model without the intialization pipeline.

    python im_caption_full.py --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt\
      --multi_gpu --batch_size 512 --save_checkpoint_steps 1000\
      --gen_lr 0.001 --dis_lr 0.001
    
  2. Evaluate the model. The last element in the b34.json file is the best checkpoint.

    CUDA_VISIBLE_DEVICES='0,1' python eval_all.py\
      --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt\
      --data_dir ~/dataset/mscoco/all_images
    js-beautify saving/b34.json
    
  3. Evaluate the model on test set. Suppose the best validation checkpoint is 20000.

    python test_model.py --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt\
      --data_dir ~/dataset/mscoco/all_images --job_dir saving/model.ckpt-20000
    

Initialization (Optional. The files can be found at here).

  1. Train a object-to-sentence model, which is used to generate the pseudo-captions.

    python initialization/obj2sen.py
    
  2. Find the best obj2sen model.

    python initialization/eval_obj2sen.py --threads 8
    
  3. Generate pseudo-captions. Suppose the best validation checkpoint is 35000.

    python initialization/gen_obj2sen_caption.py --num_proc 8\
      --job_dir obj2sen/model.ckpt-35000
    
  4. Train a captioning using pseudo-pairs.

    python initialization/im_caption.py --o2s_ckpt obj2sen/model.ckpt-35000\
      --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt
    
  5. Evaluate the model.

    CUDA_VISIBLE_DEVICES='0,1' python eval_all.py\
      --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt\
      --data_dir ~/dataset/mscoco/all_images --job_dir saving_imcap
    js-beautify saving_imcap/b34.json
    
  6. Train sentence auto-encoder, which is used to initialize sentence GAN.

    python initialization/sentence_ae.py
    
  7. Train sentence GAN.

    python initialization/sentence_gan.py
    
  8. Train the full model with initialization. Suppose the best imcap validation checkpoint is 18000.

    python im_caption_full.py --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt\
      --imcap_ckpt saving_imcap/model.ckpt-18000\
      --sae_ckpt sen_gan/model.ckpt-30000 --multi_gpu --batch_size 512\
      --save_checkpoint_steps 1000 --gen_lr 0.001 --dis_lr 0.001
    

Credits

Part of the code is from coco-caption, im2txt, tfgan, resnet, Tensorflow Object Detection API and maskgan.

Xinpeng told me the idea of self-critic, which is crucial to training.

Owner
Yang Feng
SWE @ Goolgle
Yang Feng
MADE (Masked Autoencoder Density Estimation) implementation in PyTorch

pytorch-made This code is an implementation of "Masked AutoEncoder for Density Estimation" by Germain et al., 2015. The core idea is that you can turn

Andrej 498 Dec 30, 2022
Learning Efficient Online 3D Bin Packing on Packing Configuration Trees

Learning Efficient Online 3D Bin Packing on Packing Configuration Trees This repository is being continuously updated, please stay tuned! Any code con

86 Dec 28, 2022
Neural Contours: Learning to Draw Lines from 3D Shapes (CVPR2020)

Neural Contours: Learning to Draw Lines from 3D Shapes This repository contains the PyTorch implementation for CVPR 2020 Paper "Neural Contours: Learn

93 Dec 16, 2022
Multi-Horizon-Forecasting-for-Limit-Order-Books

Multi-Horizon-Forecasting-for-Limit-Order-Books This jupyter notebook is used to demonstrate our work, Multi-Horizon Forecasting for Limit Order Books

Zihao Zhang 116 Dec 23, 2022
[CVPR 2021] Few-shot 3D Point Cloud Semantic Segmentation

Few-shot 3D Point Cloud Semantic Segmentation Created by Na Zhao from National University of Singapore Introduction This repository contains the PyTor

117 Dec 27, 2022
Implementation of StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation in PyTorch

StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation Implementation of StyleSpace Analysis: Disentangled Controls for StyleGAN Ima

Xuanchi Ren 86 Dec 07, 2022
LSTMs (Long Short Term Memory) RNN for prediction of price trends

Price Prediction with Recurrent Neural Networks LSTMs BTC-USD price prediction with deep learning algorithm. Artificial Neural Networks specifically L

5 Nov 12, 2021
Few-shot Relation Extraction via Bayesian Meta-learning on Relation Graphs

Few-shot Relation Extraction via Bayesian Meta-learning on Relation Graphs This is an implemetation of the paper Few-shot Relation Extraction via Baye

MilaGraph 36 Nov 22, 2022
Deep Learning segmentation suite designed for 2D microscopy image segmentation

Deep Learning segmentation suite dessigned for 2D microscopy image segmentation This repository provides researchers with a code to try different enco

7 Nov 03, 2022
Unofficial JAX implementations of Deep Learning models

JAX Models Table of Contents About The Project Getting Started Prerequisites Installation Usage Contributing License Contact About The Project The JAX

107 Jan 05, 2023
A minimal solution to hand motion capture from a single color camera at over 100fps. Easy to use, plug to run.

Minimal Hand A minimal solution to hand motion capture from a single color camera at over 100fps. Easy to use, plug to run. This project provides the

Yuxiao Zhou 824 Jan 07, 2023
Personal thermal comfort models using digital twins: Preference prediction with BIM-extracted spatial-temporal proximity data from Build2Vec

Personal thermal comfort models using digital twins: Preference prediction with BIM-extracted spatial-temporal proximity data from Build2Vec This repo

Building and Urban Data Science (BUDS) Group 5 Dec 02, 2022
OSLO: Open Source framework for Large-scale transformer Optimization

O S L O Open Source framework for Large-scale transformer Optimization What's New: December 21, 2021 Released OSLO 1.0. What is OSLO about? OSLO is a

TUNiB 280 Nov 24, 2022
The description of FMFCC-A (audio track of FMFCC) dataset and Challenge resluts.

FMFCC-A This project is the description of FMFCC-A (audio track of FMFCC) dataset and Challenge resluts. The FMFCC-A dataset is shared through BaiduCl

18 Dec 24, 2022
Long Expressive Memory (LEM)

Long Expressive Memory for Sequence Modeling This repository contains the implementation to reproduce the numerical experiments of the paper Long Expr

Konstantin Rusch 47 Dec 17, 2022
TCPNet - Temporal-attentive-Covariance-Pooling-Networks-for-Video-Recognition

Temporal-attentive-Covariance-Pooling-Networks-for-Video-Recognition This is an implementation of TCPNet. Introduction For video recognition task, a g

Zilin Gao 21 Dec 08, 2022
Data labels and scripts for fastMRI.org

fastMRI+: Clinical pathology annotations for the fastMRI dataset The fastMRI dataset is a publicly available MRI raw (k-space) dataset. It has been us

Microsoft 51 Dec 22, 2022
Code for the paper SphereRPN: Learning Spheres for High-Quality Region Proposals on 3D Point Clouds Object Detection, ICIP 2021.

SphereRPN Code for the paper SphereRPN: Learning Spheres for High-Quality Region Proposals on 3D Point Clouds Object Detection, ICIP 2021. Authors: Th

Thang Vu 15 Dec 02, 2022
Paper Title: Heterogeneous Knowledge Distillation for Simultaneous Infrared-Visible Image Fusion and Super-Resolution

HKDnet Paper Title: "Heterogeneous Knowledge Distillation for Simultaneous Infrared-Visible Image Fusion and Super-Resolution" Email:

wasteland 11 Nov 12, 2022
Learning Domain Invariant Representations in Goal-conditioned Block MDPs

Learning Domain Invariant Representations in Goal-conditioned Block MDPs Beining Han, Chongyi Zheng, Harris Chan, Keiran Paster, Michael R. Zhang, Jim

Chongyi Zheng 3 Apr 12, 2022