CLOOB training (JAX) and inference (JAX and PyTorch)

Overview

cloob-training

Pretrained models

There are two pretrained CLOOB models in this repo at the moment, a 16 epoch and a 32 epoch ViT-B/16 checkpoint trained on LAION 400M.

Zero-shot ImageNet validation set accuracy (using OpenCLIP's code):

Model name Top 1 Top 5
cloob_laion_400m_vit_b_16_16_epochs 0.61238 0.8492
cloob_laion_400m_vit_b_16_32_epochs 0.62816 0.85964
OpenAI CLIP ViT-B/32 0.6327 0.88772
OpenAI CLIP ViT-B/16 0.68132 0.91768
OpenAI CLIP ViT-L/14 0.75388 0.9454
OpenAI CLIP ViT-L/14 @ 336 px 0.76564 0.9515
OpenAI CLIP RN50 0.59806 0.86498
OpenAI CLIP RN101 0.62296 0.88106
OpenAI CLIP RN50x4 0.66268 0.9046
OpenAI CLIP RN50x16 0.70754 0.92822
OpenAI CLIP RN50x64 0.74134 0.94146

PyTorch

from cloob_training import model_pt, pretrained

pretrained.list_configs()

returns:

['cloob_laion_400m_vit_b_16_16_epochs', 'cloob_laion_400m_vit_b_16_32_epochs']

The models can be used by:

config = pretrained.get_config('cloob_laion_400m_vit_b_16_16_epochs')
model = model_pt.get_pt_model(config)
checkpoint = pretrained.download_checkpoint(config)
model.load_state_dict(model_pt.get_pt_params(config, checkpoint))
model.eval().requires_grad_(False).to('cuda')

Model class attributes:

model.config: the model config dict.

model.image_encoder: the image encoder, which expects NCHW batches of normalized images (preprocessed by model.normalize), where C = model.config['image_encoder']['input_channels'] and H, W = model.config['image_encoder']['image_size'].

model.text_encoder: the text encoder, which expects text tokenized by model.tokenize.

model.normalize: the preprocessor for image tensors.

model.tokenize: the preprocessor for text.

JAX

Coming soon...

Training (JAX only)

Coming soon...

Owner
Katherine Crowson
AI/generative artist.
Katherine Crowson
Repo for our ICML21 paper Unsupervised Learning of Visual 3D Keypoints for Control

Unsupervised Learning of Visual 3D Keypoints for Control [Project Website] [Paper] Boyuan Chen1, Pieter Abbeel1, Deepak Pathak2 1UC Berkeley 2Carnegie

Boyuan Chen 34 Jul 22, 2022
CSPML (crystal structure prediction with machine learning-based element substitution)

CSPML (crystal structure prediction with machine learning-based element substitution) CSPML is a unique methodology for the crystal structure predicti

8 Dec 20, 2022
This repository contains code to train and render Mixture of Volumetric Primitives (MVP) models

Mixture of Volumetric Primitives -- Training and Evaluation This repository contains code to train and render Mixture of Volumetric Primitives (MVP) m

Meta Research 125 Dec 29, 2022
Python implementation of "Single Image Haze Removal Using Dark Channel Prior"

##Dependencies pillow(~2.6.0) Numpy(~1.9.0) If the scripts throw AttributeError: __float__, make sure your pillow has jpeg support e.g. try: $ sudo ap

Joyee Cheung 73 Dec 20, 2022
Knowledge Distillation Toolbox for Semantic Segmentation

SegDistill: Toolbox for Knowledge Distillation on Semantic Segmentation Networks This repo contains the supported code and configuration files for Seg

9 Dec 12, 2022
Sleep staging from ECG, assisted with EEG

Sleep_Staging_Knowledge Distillation This codebase implements knowledge distillation approach for ECG based sleep staging assisted by EEG based sleep

2 Dec 12, 2022
TensorFlow implementation of AlexNet and its training and testing on ImageNet ILSVRC 2012 dataset

AlexNet training on ImageNet LSVRC 2012 This repository contains an implementation of AlexNet convolutional neural network and its training and testin

Matteo Dunnhofer 161 Nov 25, 2022
[CVPR2021] Domain Consensus Clustering for Universal Domain Adaptation

[CVPR2021] Domain Consensus Clustering for Universal Domain Adaptation [Paper] Prerequisites To install requirements: pip install -r requirements.txt

Guangrui Li 84 Dec 26, 2022
An Agnostic Computer Vision Framework - Pluggable to any Training Library: Fastai, Pytorch-Lightning with more to come

IceVision is the first agnostic computer vision framework to offer a curated collection with hundreds of high-quality pre-trained models from torchvision, MMLabs, and soon Pytorch Image Models. It or

airctic 789 Dec 29, 2022
RITA is a family of autoregressive protein models, developed by LightOn in collaboration with the OATML group at Oxford and the Debora Marks Lab at Harvard.

RITA: a Study on Scaling Up Generative Protein Sequence Models RITA is a family of autoregressive protein models, developed by a collaboration of Ligh

LightOn 69 Dec 22, 2022
Accelerated deep learning R&D

Accelerated deep learning R&D PyTorch framework for Deep Learning research and development. It focuses on reproducibility, rapid experimentation, and

Catalyst-Team 3.1k Jan 06, 2023
Calibrate your listeners! Robust communication-based training for pragmatic speakers. Findings of EMNLP 2021.

Calibrate your listeners! Robust communication-based training for pragmatic speakers Rose E. Wang, Julia White, Jesse Mu, Noah D. Goodman Findings of

Rose E. Wang 3 Apr 02, 2022
The implementation of "Bootstrapping Semantic Segmentation with Regional Contrast".

ReCo - Regional Contrast This repository contains the source code of ReCo and baselines from the paper, Bootstrapping Semantic Segmentation with Regio

Shikun Liu 128 Dec 30, 2022
bespoke tooling for offensive security's Windows Usermode Exploit Dev course (OSED)

osed-scripts bespoke tooling for offensive security's Windows Usermode Exploit Dev course (OSED) Table of Contents Standalone Scripts egghunter.py fin

epi 268 Jan 05, 2023
PyTorch reimplementation of Diffusion Models

PyTorch pretrained Diffusion Models A PyTorch reimplementation of Denoising Diffusion Probabilistic Models with checkpoints converted from the author'

Patrick Esser 265 Jan 01, 2023
Spectralformer: Rethinking hyperspectral image classification with transformers

Spectralformer: Rethinking hyperspectral image classification with transformers Danfeng Hong, Zhu Han, Jing Yao, Lianru Gao, Bing Zhang, Antonio Plaza

Danfeng Hong 102 Dec 29, 2022
The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Hierarchical Token Semantic Audio Transformer Introduction The Code Repository for "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound

Knut(Ke) Chen 134 Jan 01, 2023
[TIP 2021] SADRNet: Self-Aligned Dual Face Regression Networks for Robust 3D Dense Face Alignment and Reconstruction

SADRNet Paper link: SADRNet: Self-Aligned Dual Face Regression Networks for Robust 3D Dense Face Alignment and Reconstruction Requirements python

Multimedia Computing Group, Nanjing University 99 Dec 30, 2022
Yolo algorithm for detection + centroid tracker to track vehicles

Vehicle Tracking using Centroid tracker Algorithm used : Yolo algorithm for detection + centroid tracker to track vehicles Backend : opencv and python

6 Dec 21, 2022
SIEM Logstash parsing for more than hundred technologies

LogIndexer Pipeline Logstash Parsing Configurations for Elastisearch SIEM and OpenDistro for Elasticsearch SIEM Why this project exists The overhead o

146 Dec 29, 2022