TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.

Last update: Jan 06, 2023

Overview

TorchMultimodal (Alpha Release)

Introduction

TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale. It provides:

A repository of modular and composable building blocks (models, fusion layers, loss functions, datasets and utilities).
A repository of examples that show how to combine these building blocks with components and common infrastructure from across the PyTorch Ecosystem to replicate state-of-the-art models published in the literature. These examples should serve as baselines for ongoing research in the field, as well as a starting point for future work.

As a first open-source example, researchers can train and extend FLAVA using TorchMultimodal.

Installation

TorchMultimodal requires Python >= 3.8. The library can be installed with or without CUDA support.
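Before creating the environment, it can help to confirm that the interpreter meets the stated minimum. The following is a minimal sketch, not part of TorchMultimodal; the names MIN_PYTHON and python_is_supported are hypothetical and exist only for this illustration.

```python
import sys

# Minimum version taken from the requirement stated above (Python >= 3.8).
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if (major, minor) of version_info meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    status = "supported" if python_is_supported() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```

The comparison uses tuples rather than strings so that a version such as 3.10 is correctly treated as newer than 3.8.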

Building from Source

Create conda environment

conda create -n torch-multimodal python=<python_version>
conda activate torch-multimodal

Install PyTorch, torchvision, and torchtext (see the PyTorch documentation). Currently only Linux is supported.

conda install pytorch torchvision torchtext cudatoolkit=11.3 -c pytorch-nightly -c nvidia

# For CPU-only install
conda install pytorch torchvision torchtext cpuonly -c pytorch-nightly

Download and install TorchMultimodal and remaining requirements.

git clone --recursive https://github.com/facebookresearch/multimodal.git torchmultimodal
cd torchmultimodal

pip install -e .

Developers should follow the development installation instructions instead.

Documentation

The library builds on the following concepts:

Architectures: These are general and composable classes that capture the core logic associated with a family of models. In most cases these take modules as inputs instead of flat arguments (see Models below). Examples include the LateFusionArchitecture, FLAVA and CLIPArchitecture. Users should either reuse an existing architecture or contribute a new one. We avoid inheritance as much as possible.
Models: These are specific instantiations of a given architecture implemented using builder functions. The builder functions take as input all of the parameters for constructing the modules needed to instantiate the architecture. See cnn_lstm.py for an example.
Modules: These are self-contained components that can be stitched together in various ways to build an architecture. See lstm_encoder.py for an example.
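The relationship between the three concepts can be sketched as follows. This is an illustration of the pattern only, not the library's API: every name here (ToyLateFusion, make_scale_encoder, concat_fusion, sum_head, build_toy_image_text_model) is hypothetical. To keep the sketch dependency-free, plain Python callables and lists of floats stand in for what would normally be torch.nn.Module instances and tensors.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Encoder = Callable[[Sequence[float]], Vector]


# --- Modules: small self-contained components (all hypothetical) ---
def make_scale_encoder(factor: float) -> Encoder:
    """Return an encoder that multiplies every input feature by `factor`."""
    def encode(features: Sequence[float]) -> Vector:
        return [factor * x for x in features]
    return encode


def concat_fusion(embeddings: List[Vector]) -> Vector:
    """Fuse per-modality embeddings by concatenating them."""
    return [x for embedding in embeddings for x in embedding]


def sum_head(fused: Vector) -> float:
    """Toy prediction head: sum the fused features."""
    return sum(fused)


# --- Architecture: owns the core logic and receives modules, not flat args ---
class ToyLateFusion:
    def __init__(
        self,
        encoders: Dict[str, Encoder],
        fusion: Callable[[List[Vector]], Vector],
        head: Callable[[Vector], float],
    ) -> None:
        self.encoders = encoders
        self.fusion = fusion
        self.head = head

    def __call__(self, inputs: Dict[str, Sequence[float]]) -> float:
        # Encode each modality separately (in a fixed order), then fuse late.
        names = sorted(self.encoders)
        embeddings = [self.encoders[name](inputs[name]) for name in names]
        return self.head(self.fusion(embeddings))


# --- Model: a builder takes flat parameters and assembles the modules ---
def build_toy_image_text_model(image_scale: float, text_scale: float) -> ToyLateFusion:
    return ToyLateFusion(
        encoders={
            "image": make_scale_encoder(image_scale),
            "text": make_scale_encoder(text_scale),
        },
        fusion=concat_fusion,
        head=sum_head,
    )


model = build_toy_image_text_model(image_scale=2.0, text_scale=0.5)
score = model({"image": [1.0, 2.0], "text": [4.0]})
```

Because the architecture receives its modules through the constructor, swapping an encoder or fusion step means passing a different component to the builder or to ToyLateFusion directly, with no subclassing involved.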

Contributing

See the CONTRIBUTING file for how to help out.

License

TorchMultimodal is BSD licensed, as found in the LICENSE file.

Owner

Meta Research
