This repository contains the official implementation of the paper "Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis", accepted at EMNLP 2021.

Last update: Dec 26, 2022

Overview

MultiModal-InfoMax

🔥 If you are interested in other multimodal work from the DeCLaRe Lab, you are welcome to visit the clustered repository.

Introduction

MultiModal-InfoMax (MMIM) synthesizes fusion results from multimodal input through two-level mutual information (MI) maximization. We use the BA (Barber-Agakov) lower bound and contrastive predictive coding (CPC) as the objective functions to be maximized. To make the BA lower bound tractable and to stabilize training, we design an entropy estimation module backed by a memory of historical data.
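As a rough illustration of the contrastive predictive coding objective mentioned above, the following sketch computes an InfoNCE-style loss between a batch of fusion vectors and a batch of single-modality vectors. It is not the code in this repository: the function name `info_nce`, the cosine-similarity score, the `temperature` parameter, and the toy data are all assumptions made for this example. It relies only on the standard result that log(N) minus the InfoNCE loss is a lower bound on the mutual information for a batch of N pairs.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    # Scale to unit length so the dot product becomes cosine similarity.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def info_nce(fusion, modality, temperature=0.1):
    """InfoNCE loss for a batch (hypothetical helper, not the repo's API).

    Row i of `fusion` and row i of `modality` form the positive pair;
    every other row of `modality` serves as a negative for row i.
    """
    f = [normalize(v) for v in fusion]
    m = [normalize(v) for v in modality]
    losses = []
    for i, fi in enumerate(f):
        logits = [dot(fi, mj) / temperature for mj in m]
        # Numerically stable log-sum-exp over positives and negatives.
        max_logit = max(logits)
        log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        losses.append(log_sum - logits[i])
    return sum(losses) / len(losses)


# Toy data (assumed): "fusion" vectors are noisy copies of the modality vectors.
random.seed(0)
batch, dim = 8, 16
modality = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(batch)]
aligned = [[x + 0.05 * random.gauss(0, 1) for x in row] for row in modality]
shuffled = aligned[1:] + aligned[:1]  # break the pairing on purpose

loss_aligned = info_nce(aligned, modality)
loss_shuffled = info_nce(shuffled, modality)
mi_bound = math.log(batch) - loss_aligned  # InfoNCE lower bound on MI

print(f"aligned loss:  {loss_aligned:.4f}")
print(f"shuffled loss: {loss_shuffled:.4f}")
print(f"MI lower bound (aligned): {mi_bound:.4f} nats")
```

Minimizing this loss maximizes the bound, which is why correctly paired inputs yield a lower loss than deliberately mismatched ones. The bound saturates at log(N), so larger batches allow a tighter estimate.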

Usage

Download the CMU-MOSI and CMU-MOSEI datasets from Google Drive or Baidu Disk (extraction code: g3m2) and place them under the folder Multimodal-Infomax/datasets
Set up the environment (requires conda)

conda env create -f environment.yml
conda activate MMIM

Start training

python main.py --dataset mosi --contrast

Citation

Please cite our paper if you find our work useful for your research:

@article{han2021improving,
  title={Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis},
  author={Han, Wei and Chen, Hui and Poria, Soujanya},
  journal={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2021}
}

Contact

Should you have any questions, feel free to contact me at [email protected]

Owner

Deep Cognition and Language Research (DeCLaRe) Lab
