source code and pre-trained/fine-tuned checkpoint for NAACL 2021 paper LightningDOT

Overview

LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval

This repository contains source code and pre-trained/fine-tuned checkpoints for NAACL 2021 paper "LightningDOT". It currently supports fine-tuning on MSCOCO and Flickr30k. Pre-training code and a demo for FULL MSCOCO retrieval are also available.

Overview of LightningDot

Some code in this repo is copied/modifed from UNITER and DPR.

If you find the code useful for your research, please consider citing:

    @inproceedings{sun2021lightningdot,
    title={LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval},
    author={Sun, Siqi and Chen, Yen-Chun and Li, Linjie and Wang, Shuohang and Fang, Yuwei and Liu, Jingjing},
    booktitle={NAACL-HLT},
    year={2021}
    } 

UNITER Environment

To run UNITER for re-ranker, please set a seperate environment based on this repo.

All pre-training and fine-tuning are using a conda environment that can be created as follows.

Environment

Under the project home folder, first run (depends on your CUDA version)

conda env create -f DVL.yml
conda activate DVL
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

, then install apex by

cd ../
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

In order to use distributed training, under super user, install mpi by

rm -r /usr/local/mpi

wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.4.tar.gz 
tar -xvf openmpi-4.0.4.tar.gz 
cd openmpi-4.0.4
./configure --prefix=/usr/local/mpi --enable-orterun-prefix-by-default --disable-getpwuid --with-verbs
sudo apt-get install libnuma-dev
sudo make -j$(nproc) all && sudo make install
ldconfig

cd -
rm -r openmpi-4.0.4
rm openmpi-4.0.4.tar.gz

export OPENMPI_VERSION=4.0.4

. Finally install horovod by

echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 /" \
    > /etc/apt/sources.list.d/nvidia-ml.list
apt update
apt install libnccl2=2.4.7-1+cuda10.1 libnccl-dev=2.4.7-1+cuda10.1

export PATH=/usr/local/mpi/bin:$PATH
HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_WITH_PYTORCH=1 pip install --no-cache-dir horovod
ldconfig

If you see Error Msg: /usr/bin/ld: cannot find -lnuma, then try

sudo apt-get install libnuma-dev

Download Checkpoints and Meta file

Under project home folder, run

bash bash/download_data.sh

Currently the raw image files and extracted features are not available to download.

Pre-training

Modify the config file at ./config/pretrain-alldata-base.json accordingly, and run

horovodrun -np $NUM_GPU python pretrain.py --config ./config/pretrain-alldata-base.json

. Typically you need to change img_checkpoint, output_dir, and train/val datasets.

A pre-trained checkpoint is availabe at LightningDot.

The checkpoints for UNITER-base and BERT-base can be obtaind from UNITER-base and BERT-base.

Fine-tuning on MSCOCO and Flickr30k

We provide an sample bash script at ./bash/train_flickr.sh, which we used to search for learning rate.

Two checkpoints that have been already fine-tuned on MSCOCO and Flickr30k are also provided at COCO-FT and Flickr-FT.

Evaluation

Run

python eval_itm.py  your_eval_config.json  your_checkpoint.pt 

to run the evaluation. We provide three examples that could be obtained solely based on checkpoints and configurations provided in this repo.

Note that your results may NOT be exactly the same with results below due to different machine/environment configurations (but they should be close enough).

  • Zero-shot evaluation on Flickr30k:
python eval_itm.py ./config/flickr30k_eval_config.json ./data/model/LightningDot.pt
image retrieval recall = {1: 0.5332, 5: 0.8058, 10: 0.8804}
txt retrieval recall = {1: 0.682, 5: 0.891, 10: 0.94}.
  • Fine-tune on flickr, evaluate on flickr:
python eval_itm.py ./config/flickr30k_eval_config.json ./data/model/flickr-ft.pt
image retrieval recall = {1: 0.699, 5: 0.911, 10: 0.9518}
txt retrieval recall = {1: 0.839, 5: 0.972, 10: 0.986}
  • Fine-tune on MSCOCO, evaluate on MSCOCO:
python eval_itm.py ./config/coco_eval_config.json ./data/model/coco-ft.pt
image retrieval recall = {1: 0.4577, 5: 0.7453, 10: 0.8379}
txt retrieval recall = {1: 0.6004, 5: 0.8516, 10: 0.9172}

Meta File

You may need the meta file used in some scripts, which can be obtained from MSCOCO-Meta and Flickr-Meta.

Demo

TODO

Re-Ranking

Note that Re-ranker is using prediction file generated from UNITER or OSCAR due to use of different pytorch version.

Re-ranking script is currently provided as is, and has not been cleaned yet.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

License

MIT

Setup freqtrade/freqUI on Heroku

UNMAINTAINED - REPO MOVED TO https://github.com/p-zombie/freqtrade Creating the app git clone https://github.com/joaorafaelm/freqtrade.git && cd freqt

João 51 Aug 29, 2022
SHIFT15M: multiobjective large-scale fashion dataset with distributional shifts

[arXiv] The main motivation of the SHIFT15M project is to provide a dataset that contains natural dataset shifts collected from a web service IQON, wh

ZOZO, Inc. 138 Nov 24, 2022
Intel® Neural Compressor is an open-source Python library running on Intel CPUs and GPUs

Intel® Neural Compressor targeting to provide unified APIs for network compression technologies, such as low precision quantization, sparsity, pruning, knowledge distillation, across different deep l

Intel Corporation 846 Jan 04, 2023
'Aligned mixture of latent dynamical systems' (amLDS) for stimulus decoding probabilistic manifold alignment across animals. P. Herrero-Vidal et al. NeurIPS 2021 code.

Across-animal odor decoding by probabilistic manifold alignment (NeurIPS 2021) This repository is the official implementation of aligned mixture of la

Pedro Herrero-Vidal 3 Jul 12, 2022
Supervised & unsupervised machine-learning techniques are applied to the database of weighted P4s which admit Calabi-Yau hypersurfaces.

Weighted Projective Spaces ML Description: The database of 5-vectors describing 4d weighted projective spaces which admit Calabi-Yau hypersurfaces are

Ed Hirst 3 Sep 08, 2022
Nicholas Lee 3 Jan 09, 2022
Repositorio oficial del curso IIC2233 Programación Avanzada 🚀✨

IIC2233 - Programación Avanzada Evaluación Las evaluaciones serán efectuadas por medio de actividades prácticas en clases y tareas. Se calculará la no

IIC2233 @ UC 0 Dec 15, 2022
Audio2Face - Audio To Face With Python

Audio2Face Discription We create a project that transforms audio to blendshape w

FACEGOOD 724 Dec 26, 2022
This project implements "virtual speed" from heart rate monito

ANT+ Virtual Stride Based Speed and Distance Monitor Overview This project imple

2 May 20, 2022
SplineConv implementation for Paddle.

SplineConv implementation for Paddle This module implements the SplineConv operators from Matthias Fey, Jan Eric Lenssen, Frank Weichert, Heinrich Mül

北海若 3 Dec 29, 2021
Explainable Zero-Shot Topic Extraction

Zero-Shot Topic Extraction with Common-Sense Knowledge Graph This repository contains the code for reproducing the results reported in the paper "Expl

D2K Lab 56 Dec 14, 2022
Convenient tool for speeding up the intern/officer review process.

icpc-app-screen Convenient tool for speeding up the intern/officer applicant review process. Eliminates the pain from reading application responses of

1 Oct 30, 2021
Roadmap to becoming a machine learning engineer in 2020

Roadmap to becoming a machine learning engineer in 2020, inspired by web-developer-roadmap.

Chris Hoyean Song 1.7k Dec 29, 2022
3D-Transformer: Molecular Representation with Transformer in 3D Space

3D-Transformer: Molecular Representation with Transformer in 3D Space

55 Dec 19, 2022
A facial recognition doorbell system using a Raspberry Pi

Facial Recognition Doorbell This project expands on the person-detecting doorbell system to allow it to identify faces, and announce names accordingly

rydercalmdown 22 Apr 15, 2022
A knowledge base construction engine for richly formatted data

Fonduer is a Python package and framework for building knowledge base construction (KBC) applications from richly formatted data. Note that Fonduer is

HazyResearch 386 Dec 05, 2022
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

LightHuBERT LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT | Github | Huggingface | SUPER

WangRui 46 Dec 29, 2022
Metadata-Extractor - Metadata Extractor Script can be used to read in exif metadata

Metadata Extractor The exifextract script can be used to read in exif metadata f

1 Feb 16, 2022
Official implementation of ACTION-Net: Multipath Excitation for Action Recognition (CVPR'21).

ACTION-Net Official implementation of ACTION-Net: Multipath Excitation for Action Recognition (CVPR'21). Getting Started EgoGesture data folder struct

V-Sense 171 Dec 26, 2022
List some popular DeepFake models e.g. DeepFake, FaceSwap-MarekKowal, IPGAN, FaceShifter, FaceSwap-Nirkin, FSGAN, SimSwap, CihaNet, etc.

deepfake-models List some popular DeepFake models e.g. DeepFake, CihaNet, SimSwap, FaceSwap-MarekKowal, IPGAN, FaceShifter, FaceSwap-Nirkin, FSGAN, Si

Mingcan Xiang 100 Dec 17, 2022