A collection of multi-modal transformer architectures, including image, video, image-language, and video-language transformers, together with related datasets

Last update: Dec 21, 2022

Overview

Reading list in Transformer

We are a team from the KAUST Vision-CAIR group, and we focus on multi-modal representation learning.

This repo aims to collect recent popular Transformer papers, code, and learning resources across the domains of vision Transformers, NLP, multi-modal learning, etc.

Recent News

CVPR multi-modal papers are collected here

The code of VisualGPT is open sourced. It can be found here

The code and paper of LeViT are open sourced. They can be found here

The paper MLP-Mixer: An all-MLP Architecture for Vision is available here

The code and paper of MDETR are open sourced. They can be found here

The code and paper of RelTransformer are open sourced. They can be found here

The code and paper of Twins-SVT are open sourced. They can be found here

A Vision Transformer for deepfake detection. It can be found here

The code of VideoGPT is open sourced. It can be found here

The code of CoaT is open sourced. It can be found here

The code of Kaleido-BERT is open sourced. It can be found here

The code of TimeSformer is open sourced. It can be found here

The code of SwinTransformer is open sourced. It can be found here

Topics (paper and code)

Review papers on multi-modal learning

Video-language

Tutorials and workshops

Datasets

Multi-modal Datasets

Blogs

Lil's blogs

Tools

PyTorchVideo: a deep learning library for video understanding research
horovod: a tool for multi-GPU parallel processing
accelerate: an easy API for mixed precision and any kind of distributed computing
optuna: hyperparameter search
AI Conference Deadlines

Owner

Jun Chen
