Multi-Modal Machine Learning toolkit based on PyTorch.

Last update: Jan 05, 2022

Related tags

Deep Learning TorchMM

Overview

TorchMM

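Introduction

TorchMM is a multi-modal learning toolkit that provides a model library for joint modal learning and cross-modal learning algorithms. It offers efficient solutions for processing multi-modal data such as images and text, and helps bring multi-modal learning applications into real-world use.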

Recent updates

2022.1.5: released the initial version of TorchMM, v1.0.

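Features

Rich task scenarios: the toolkit provides model libraries for several multi-modal learning tasks, including multi-modal fusion, cross-modal retrieval, and image captioning, and supports user-defined data and training.
Proven real-world applications: algorithms from the toolkit are already used in practice, for example in sneaker authenticity verification, sneaker style transfer, automatic furniture image description, and public opinion monitoring.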

Applications

Sneaker authenticity verification

For more information, please visit our website, Ysneaker!

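Framework

TorchMM includes the following modules:

Data processing: provides a unified data interface and multiple data processing formats.
Model zoo: includes multi-modal fusion, cross-modal retrieval, image captioning, and multi-task algorithms.
Trainer: sets up a unified training process and the related metric computation for each task.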

Usage

Download the toolkit:

git clone https://github.com/njustkmg/TorchMM.git

Usage example:

from torchmm import TorchMM

# config: Model running parameters, see configs/
# data_root: Path to dataset
# image_root: Path to images
# cuda: Index of the GPU to use

runner = TorchMM(config='configs/cmml.yml',
                 data_root='data/COCO',
                 image_root='data/COCO/images',
                 cuda=0)

Or, from the command line:

python run.py --config configs/cmml.yml --data_root data/COCO --image_root data/COCO/images --cuda 0
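The command-line call above passes the same four settings as the Python example. As a rough illustration only, a launcher could map those flags to constructor keyword arguments with `argparse` as sketched below. The function names `build_parser` and `to_runner_kwargs` are hypothetical, the defaults are assumptions, and the actual `run.py` in the repository may be organised differently.

```python
import argparse
from typing import Dict, List, Union


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the four flags shown in the README command."""
    parser = argparse.ArgumentParser(
        description="Hypothetical sketch of a TorchMM-style launcher"
    )
    parser.add_argument("--config", required=True,
                        help="Model running parameters, see configs/")
    parser.add_argument("--data_root", required=True,
                        help="Path to dataset")
    parser.add_argument("--image_root", required=True,
                        help="Path to images")
    # Assumption: the GPU index is an integer and defaults to 0.
    parser.add_argument("--cuda", type=int, default=0,
                        help="Index of the GPU to use")
    return parser


def to_runner_kwargs(argv: List[str]) -> Dict[str, Union[str, int]]:
    """Turn command-line arguments into keyword arguments for the runner."""
    args = build_parser().parse_args(argv)
    return {
        "config": args.config,
        "data_root": args.data_root,
        "image_root": args.image_root,
        "cuda": args.cuda,
    }


example_kwargs = to_runner_kwargs([
    "--config", "configs/cmml.yml",
    "--data_root", "data/COCO",
    "--image_root", "data/COCO/images",
    "--cuda", "0",
])
```

Under these assumptions, `example_kwargs` holds the same values that the Python usage example passes to the constructor, so both entry points would configure the same run.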

Model zoo (in progress)

[1] Comprehensive Semi-Supervised Multi-Modal Learning

[2] Stacked Cross Attention for Image-Text Matching

[4] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

[5] Attention on Attention for Image Captioning

[6] VQA: Visual Question Answering

[7] ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

Experimental results

Multi-modal fusion

Method	Average_Precision	Coverage	Example_AUC	Macro_AUC	Micro_AUC	Ranking_loss
CMML	0.682	18.827	0.948	0.927	0.950	0.052
Early(add)							ResNet+LSTM
Early(concat)							ResNet+GRU
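Three of the column names in the table (Average_Precision, Coverage, Ranking_loss) match standard multi-label ranking metrics. The sketch below shows one common way to compute them in plain Python, following the definitions used by scikit-learn's ranking metrics. It is an assumption that TorchMM uses these exact definitions (some papers, for instance, subtract 1 from coverage), and the function names are illustrative rather than part of the toolkit.

```python
from typing import List, Sequence


def _ranks(scores: Sequence[float]) -> List[int]:
    """Rank 1 is the highest score; tied scores share the worst rank of the tie."""
    return [sum(1 for other in scores if other >= s) for s in scores]


def average_precision(y_true, y_score) -> float:
    """For each relevant label, the fraction of labels ranked at or above it
    that are relevant; averaged over relevant labels, then over samples.
    Samples without any relevant label are skipped."""
    per_sample = []
    for labels, scores in zip(y_true, y_score):
        ranks = _ranks(scores)
        relevant = [r for l, r in zip(labels, ranks) if l == 1]
        if not relevant:
            continue
        precisions = [
            sum(1 for l, other in zip(labels, ranks) if l == 1 and other <= r) / r
            for r in relevant
        ]
        per_sample.append(sum(precisions) / len(precisions))
    return sum(per_sample) / len(per_sample) if per_sample else 0.0


def coverage(y_true, y_score) -> float:
    """Average depth in the ranked label list needed to cover all relevant labels."""
    per_sample = []
    for labels, scores in zip(y_true, y_score):
        ranks = _ranks(scores)
        relevant = [r for l, r in zip(labels, ranks) if l == 1]
        per_sample.append(max(relevant) if relevant else 0)
    return sum(per_sample) / len(per_sample) if per_sample else 0.0


def ranking_loss(y_true, y_score) -> float:
    """Average fraction of (relevant, irrelevant) label pairs in which the
    irrelevant label scores at least as high as the relevant one.
    Samples lacking either kind of label are skipped."""
    per_sample = []
    for labels, scores in zip(y_true, y_score):
        pos = [s for l, s in zip(labels, scores) if l == 1]
        neg = [s for l, s in zip(labels, scores) if l == 0]
        if not pos or not neg:
            continue
        bad = sum(1 for p in pos for n in neg if n >= p)
        per_sample.append(bad / (len(pos) * len(neg)))
    return sum(per_sample) / len(per_sample) if per_sample else 0.0


# Toy data: two samples, three labels each.
y_true = [[1, 0, 0], [0, 0, 1]]
y_score = [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]
ap = average_precision(y_true, y_score)
cov = coverage(y_true, y_score)
rl = ranking_loss(y_true, y_score)
```

Higher is better for average precision, while lower is better for coverage and ranking loss, which is consistent with how the CMML row reads.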

License

This project is released under the Apache 2.0 license.

Owner

njustkmg
