GluonMM is a library of transformer models for computer vision and multi-modality research

Last update: Dec 02, 2022

Overview

GluonMM

GluonMM is a library of transformer models for computer vision and multi-modality research. It contains reference implementations of widely adopted baseline models as well as research work from Amazon Research.

Install

First, clone the repository locally:

git clone https://github.com/amazon-research/gluonmm.git

Then install the dependencies:

conda create -n gluonmm python=3.7
conda activate gluonmm
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install timm tensorboardX yacs tqdm requests pandas decord scikit-image opencv-python

# Install apex for half-precision training (optional)
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

The library has been tested extensively with PyTorch 1.8.1, torchvision 0.9.1, and CUDA 10.2.
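After installation, it can help to confirm that the environment matches the tested versions. The following is a minimal sketch, not part of GluonMM: the helper names `installed_version`, `matches`, and `report` are hypothetical, and the check assumes that each package exposes a `__version__` attribute (as PyTorch and torchvision do) and that build tags such as `+cu102` can be ignored when comparing.

```python
import importlib

# Versions the README reports as extensively tested.
TESTED = {"torch": "1.8.1", "torchvision": "0.9.1"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except (ImportError, OSError):
        # ImportError: package missing; OSError: e.g. a shared library failed to load.
        return None
    return getattr(module, "__version__", None)


def matches(installed, tested):
    """True if installed equals tested, ignoring a local build tag such as '+cu102'."""
    return installed is not None and installed.split("+")[0] == tested


def report():
    """Build one human-readable status line per tested package."""
    lines = []
    for name, tested in TESTED.items():
        found = installed_version(name)
        status = "ok" if matches(found, tested) else "differs from tested version"
        lines.append(f"{name}: installed={found}, tested={tested} ({status})")
    return lines


if __name__ == "__main__":
    for line in report():
        print(line)
```

Run the script inside the activated `gluonmm` environment. A version that differs from the tested one is not necessarily a problem; it only means the combination falls outside what the README states was tested.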

Model zoo

Image classification

Video action recognition

VidTr

Usage

For detailed usage, please refer to the README file of each model family. For example, the training, evaluation, and model zoo information for the video transformer VidTr can be found in its README.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Acknowledgement

Parts of the code are heavily derived from pytorch-image-models, DeiT, Swin-transformer, vit-pytorch and vision_transformer.
