NAACL 2022: MCSE: Multimodal Contrastive Learning of Sentence Embeddings

Last update: Nov 15, 2022

Related tags

Overview

MCSE: Multimodal Contrastive Learning of Sentence Embeddings

This repository contains code and pre-trained models for our NAACL-2022 paper MCSE: Multimodal Contrastive Learning of Sentence Embeddings. If you find this reposity useful, please consider citing our paper.

Contact: Miaoran Zhang ([email protected])

Pre-trained Models & Results

Model	Avg. STS
flickr-mcse-bert-base-uncased [Google Drive]	77.70
flickr-mcse-roberta-base [Google Drive]	78.44
coco-mcse-bert-base-uncased [Google Drive]	77.08
coco-mcse-roberta-base [Google Drive]	78.17

Note: flickr indicates that models are trained on wiki+flickr, and coco indicates that models are trained on wiki+coco.

Quickstart

Setup

Python 3.9.5
Pytorch 1.7.1
Install other packages:

pip install -r requirements.txt

Data Preparation

Please organize the data directory as following:

REPO ROOT
|
|--data    
|  |--wiki1m_for_simcse.txt  
|  |--flickr_random_captions.txt    
|  |--flickr_resnet.hdf5    
|  |--coco_random_captions.txt    
|  |--coco_resnet.hdf5

Wiki1M

wget https://huggingface.co/datasets/princeton-nlp/datasets-for-simcse/resolve/main/wiki1m_for_simcse.txt

Flickr30k & MS-COCO
You can either download the preprocessed data we used:
(annotation sources: flickr30k-entities and coco).

Or preprocess the data by yourself (take Flickr30k as an example):

Download the flickr30k-entities.
Request access to the flickr-images from here. Note that the use of the images much abide by the Flickr Terms of Use.

Run script:

unzip ${path_to_flickr-entities}/annotations.zip

python preprocess/prepare_flickr.py \
    --flickr_entities_dir ${path_to_flickr-entities}  \  
    --flickr_images_dir ${path_to_flickr-images} \
    --output_dir data/
    --batch_size 32

Train & Evaluation

Prepare the senteval datasets for evaluation:

cd SentEval/data/downstream/
bash download_dataset.sh

Run scripts:
```
# For example:  (more examples are given in scripts/.)
sh scripts/run_wiki_flickr.sh
```
Note: In the paper we run experiments with 5 seeds (0,1,2,3,4). You can find the detailed parameter settings in Appendix.

Acknowledgements

The extremely clear and well organized codebase: SimCSE
SentEval toolkit

NAACL 2022: MCSE: Multimodal Contrastive Learning of Sentence Embeddings

Related tags

Overview

MCSE: Multimodal Contrastive Learning of Sentence Embeddings

Pre-trained Models & Results

Quickstart

Setup

Data Preparation

Train & Evaluation

Acknowledgements

Owner

Saarland University Spoken Language Systems Group

Repository for Project Insight: NLP as a Service

HAIS_2GNN: 3D Visual Grounding with Graph and Attention

DELTA is a deep learning based natural language and speech processing platform.

A text augmentation tool for named entity recognition.

KLUE-baseline contains the baseline code for the Korean Language Understanding Evaluation (KLUE) benchmark.

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

Spooky Skelly For Python

💫 Industrial-strength Natural Language Processing (NLP) in Python

CrossNER: Evaluating Cross-Domain Named Entity Recognition (AAAI-2021)

Gpt2-WebAPI - The objective of this API is to provide the 3 best possible responses to sentences that the user would input via http GET request as a parameter

This is the source code of RPG (Reward-Randomized Policy Gradient)

test

A curated list of efficient attention modules

An ActivityWatch watcher to pose questions to the user and record her answers.

A 30000+ Chinese MRC dataset - Delta Reading Comprehension Dataset

Official implementation of MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis

A fast and easy implementation of Transformer with PyTorch.

This repository details the steps in creating a Part of Speech tagger using Trigram Hidden Markov Models and the Viterbi Algorithm without using external libraries.

Image2pcl - Enter the metaverse with 2D image to 3D projections

Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.