CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP

Related tags

Deep Learningcloob
Overview

CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP

Andreas Fürst* 1, Elisabeth Rumetshofer* 1, Viet Tran1, Hubert Ramsauer1, Fei Tang3, Johannes Lehner1, David Kreil2, Michael Kopp2, Günter Klambauer1, Angela Bitto-Nemling1, Sepp Hochreiter1 2

1 ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria
2 Institute of Advanced Research in Artificial Intelligence (IARAI)
3 HERE Technologies
* Equal contribution


Detailed blog post on this paper at this link.

The full paper is available here.


Implementation of CLOOB

This repository contains the implemenation of CLOOB used to obtain the results reported in the paper. The implementation is based on OpenCLIP, an open source implementation of OpenAI's CLIP.

Setup

We provide an 'environment.yml' file to set up a conda environment with all required packages. Run the following command to clone the repository and create the environment.

# Clone repository and swtich into the directory
git clone https://github.com/ml-jku/cloob
cd cloob

# Create the environment and activate it
conda env create --file environment.yml
conda activate cloob

# Additionally, webdataset needs to be installed from git repo for pre-training on YFCC 
pip install git+https://github.com/tmbdev/webdataset.git

# Add the directory to the PYTHONPATH environment variable
export PYTHONPATH="$PYTHONPATH:$PWD/src"

Data

For pre-training we use the two datasets supported by OpenCLIP, namely Conceptual Captions and YFCC.

Conceptual Captions

OpenCLIP already provides a script to download and prepare the Conceptual Captions dataset, which contains 2.89M training images and 13k validation images. First, download the Conceptual Captions URLs and then run the script gather_cc.py.

python3 src/data/gather_cc.py path/to/Train_GCC-training.tsv path/to/Validation_GCC-1.1.0-Validation.tsv

YFCC

We use the same subset of ~15M images from the YFCC100M dataset as CLIP. They provide a list of (line number, photo identifier, photo hash) of each image contained in this subset here.

For more information see YFCC100m Subset on OpenAI's github.

Downstream Tasks

In the paper we report results on several downstream tasks. Except for ImageNet we provide links to already pre-processed versions (where necessary) of the respective test set.

Dataset Description Official Processed
Birdsnap This dataset contains images of North American bird species, however
our dataset is smaller than reported in CLIP as some samples are no longer available.
Link Link
Country211 This dataset was published in CLIP and is a small subset of the YFCC100m dataset.
It consists of photos that can be assigned to 211 countries via GPS coordinates.
For each country 200 photos are sampled for the training set and 100 for testing.
Link Link
Flowers102 Images of 102 flower categories commonly occuring in the United Kingdom were collected.
Several classes are very similar and there is a large variation in scale, pose and lighting.
Link Link
GTSRB This dataset was released for a challenge held at the IJCNN 2011.
The dataset contains images of german traffic signs from more than 40 classes.
Link Link
Stanford Cars This dataset contains images of 196 car models at the level of make,
model and year (e.g. Tesla Model S Sedan 2012).
Link Link
UCF101 The dataset has been created by extracting the middle frame from each video. Link Link
ImageNet This dataset spans 1000 object classes and contains 1,281,167 training images,
50,000 validation images and 100,000 test images.
Link -
ImageNet v2 The ImageNetV2 dataset contains new test data for the ImageNet benchmark. Link -

Usage

In the following there is an example command for pretraining on CC with an effective batch size of 512 when used on 4 GPUs.

/conceptual_captions/Train-GCC-training_output.csv" \ --val-data=" /conceptual_captions/Validation_GCC-1.1.0-Validation_output.csv" \ --path-data=" /conceptual_captions" \ --imagenet-val=" /imagenet/val" \ --warmup 20000 \ --batch-size=128 \ --lr=1e-3 \ --wd=0.1 \ --lr-scheduler="cosine-restarts" \ --restart-cycles=10 \ --epochs=70 \ --method="cloob" \ --init-inv-tau=30 \ --init-scale-hopfield=8 \ --workers=8 \ --model="RN50" \ --dist-url="tcp://127.0.0.1:6100" \ --batch-size-eval=512 ">
python -u src/training/main.py \
--train-data="
       
        /conceptual_captions/Train-GCC-training_output.csv
        "
        \
--val-data="
       
        /conceptual_captions/Validation_GCC-1.1.0-Validation_output.csv
        "
        \
--path-data="
       
        /conceptual_captions
        "
        \
--imagenet-val="
       
        /imagenet/val
        "
        \
--warmup 20000 \
--batch-size=128 \
--lr=1e-3 \
--wd=0.1 \
--lr-scheduler="cosine-restarts" \
--restart-cycles=10 \
--epochs=70 \
--method="cloob" \
--init-inv-tau=30 \
--init-scale-hopfield=8 \
--workers=8 \
--model="RN50" \
--dist-url="tcp://127.0.0.1:6100" \
--batch-size-eval=512

Zeroshot evaluation of downstream tasks

We provide a Jupyter notebook to perform zeroshot evaluation with a trained model.

LICENSE

MIT LICENSE

Owner
Institute for Machine Learning, Johannes Kepler University Linz
Software of the Institute for Machine Learning, JKU Linz
Institute for Machine Learning, Johannes Kepler University Linz
A customisable game where you have to quickly click on black tiles in order of appearance while avoiding clicking on white squares.

W.I.P-Aim-Memory-Game A customisable game where you have to quickly click on black tiles in order of appearance while avoiding clicking on white squar

dE_soot 1 Dec 08, 2021
Implementation of Segnet, FCN, UNet , PSPNet and other models in Keras.

Image Segmentation Keras : Implementation of Segnet, FCN, UNet, PSPNet and other models in Keras. Implementation of various Deep Image Segmentation mo

Divam Gupta 2.6k Jan 05, 2023
Pansharpening by convolutional neural networks in the full resolution framework

Z-PNN: Zoom Pansharpening Neural Network Pansharpening by convolutional neural networks in the full resolution framework is a deep learning method for

20 Nov 24, 2022
"Learning and Analyzing Generation Order for Undirected Sequence Models" in Findings of EMNLP, 2021

undirected-generation-dev This repo contains the source code of the models described in the following paper "Learning and Analyzing Generation Order f

Yichen Jiang 0 Mar 25, 2022
A PyTorch Implementation of ViT (Vision Transformer)

ViT - Vision Transformer This is an implementation of ViT - Vision Transformer by Google Research Team through the paper "An Image is Worth 16x16 Word

Quan Nguyen 7 May 11, 2022
Pytorch tutorials for Neural Style transfert

PyTorch Tutorials This tutorial is no longer maintained. Please use the official version: https://pytorch.org/tutorials/advanced/neural_style_tutorial

Alexis David Jacq 135 Jun 26, 2022
Code for the paper "Reinforced Active Learning for Image Segmentation"

Reinforced Active Learning for Image Segmentation (RALIS) Code for the paper Reinforced Active Learning for Image Segmentation Dependencies python 3.6

Arantxa Casanova 79 Dec 19, 2022
Official implementation for paper: A Latent Transformer for Disentangled Face Editing in Images and Videos.

A Latent Transformer for Disentangled Face Editing in Images and Videos Official implementation for paper: A Latent Transformer for Disentangled Face

InterDigital 108 Dec 09, 2022
Pytorch implementation of Integrating Tree Path in Transformer for Code Representation

This is an official Pytorch implementation of the approaches proposed in: Han Peng, Ge Li, Wenhan Wang, Yunfei Zhao, Zhi Jin “Integrating Tree Path in

Han Peng 16 Dec 23, 2022
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

Apache MXNet (incubating) for Deep Learning Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to m

The Apache Software Foundation 20.2k Jan 05, 2023
Automatically creates genre collections for your Plex media

Plex Auto Genres Plex Auto Genres is a simple script that will add genre collection tags to your media making it much easier to search for genre speci

Shane Israel 63 Dec 31, 2022
The official repository for "Intermediate Layers Matter in Momentum Contrastive Self Supervised Learning" paper.

Intermdiate layer matters - SSL The official repository for "Intermediate Layers Matter in Momentum Contrastive Self Supervised Learning" paper. Downl

Aakash Kaku 35 Sep 19, 2022
Lightweight plotting to the terminal. 4x resolution via Unicode.

Uniplot Lightweight plotting to the terminal. 4x resolution via Unicode. When working with production data science code it can be handy to have plotti

Olav Stetter 203 Dec 29, 2022
Download files from DSpace systems (because for some reason DSpace won't let you)

DSpaceDL A tool for downloading files from DSpace items. For some reason, DSpace systems have a dogshit UI, and Universities absolutely LOOOVE to use

Soumitra Shewale 5 Dec 01, 2022
Management Dashboard for Torchserve

Torchserve Dashboard Torchserve Dashboard using Streamlit Related blog post Usage Additional Requirement: torchserve (recommended:v0.5.2) Simply run:

Ceyda Cinarel 103 Dec 10, 2022
Official PyTorch Implementation of Unsupervised Learning of Scene Flow Estimation Fusing with Local Rigidity

UnRigidFlow This is the official PyTorch implementation of UnRigidFlow (IJCAI2019). Here are two sample results (~10MB gif for each) of our unsupervis

Liang Liu 28 Nov 16, 2022
Implementation of temporal pooling methods studied in [ICIP'20] A Comparative Evaluation Of Temporal Pooling Methods For Blind Video Quality Assessment

Implementation of temporal pooling methods studied in [ICIP'20] A Comparative Evaluation Of Temporal Pooling Methods For Blind Video Quality Assessment

Zhengzhong Tu 5 Sep 16, 2022
Prml - Repository of notes, code and notebooks in Python for the book Pattern Recognition and Machine Learning by Christopher Bishop

Pattern Recognition and Machine Learning (PRML) This project contains Jupyter notebooks of many the algorithms presented in Christopher Bishop's Patte

Gerardo Durán-Martín 1k Jan 07, 2023
Fast algorithms to compute an approximation of the minimal volume oriented bounding box of a point cloud in 3D.

ApproxMVBB Status Build UnitTests Homepage Fast algorithms to compute an approximation of the minimal volume oriented bounding box of a point cloud in

Gabriel Nützi 390 Dec 31, 2022
Recognize Handwritten Digits using Deep Learning on the browser itself.

MNIST on the Web An attempt to predict MNIST handwritten digits from my PyTorch model from the browser (client-side) and not from the server, with the

Harjyot Bagga 7 May 28, 2022