State of the art Semantic Sentence Embeddings

Overview

Contrastive Tension

State of the art Semantic Sentence Embeddings

Published Paper · Huggingface Models · Report Bug

Overview

This is the official code accompanied with the paper Semantic Re-Tuning via Contrastive Tension.
The paper was accepted at ICLR-2021 and official reviews and responses can be found at OpenReview.

Contrastive Tension(CT) is a fully self-supervised algorithm for re-tuning already pre-trained transformer Language Models, and achieves State-Of-The-Art(SOTA) sentence embeddings for Semantic Textual Similarity(STS). All that is required is hence a pre-trained model and a modestly large text corpus. The results presented in the paper sampled text data from Wikipedia.

This repository contains:

  • Tensorflow 2 implementation of the CT algorithm
  • State of the art pre-trained STS models
  • Tensorflow 2 inference code
  • PyTorch inference code

Requirements

While it is possible that other versions works equally fine, we have worked with the following:

  • Python = 3.6.9
  • Transformers = 4.1.1

Usage

All the models and tokenizers are available via the Huggingface interface, and can be loaded for both Tensorflow and PyTorch:

import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained('Contrastive-Tension/RoBerta-Large-CT-STSb')

TF_model = transformers.TFAutoModel.from_pretrained('Contrastive-Tension/RoBerta-Large-CT-STSb')
PT_model = transformers.AutoModel.from_pretrained('Contrastive-Tension/RoBerta-Large-CT-STSb')

Inference

To perform inference with the pre-trained models (or other Huggigface models) please see the script ExampleBatchInference.py.
The most important thing to remember when running inference is to apply the attention_masks on the batch output vector before mean pooling, as is done in the example script.

CT Training

To run CT on your own models and text data see ExampleTraining.py for a comprehensive example. This file currently creates a dummy corpus of random text. Simply replace this to whatever corpus you like.

Pre-trained Models

Note that these models are not trained with the exact hyperparameters as those disclosed in the original CT paper. Rather, the parameters are from a short follow-up paper currently under review, which once again pushes the SOTA.

All evaluation is done using the SentEval framework, and shows the: (Pearson / Spearman) correlations

Unsupervised / Zero-Shot

As both the training of BERT, and CT itself is fully self-supervised, the models only tuned with CT require no labeled data whatsoever.
The NLI models however, are first fine-tuned towards a natural language inference task, which requires labeled data.

Model Avg Unsupervised STS STS-b #Parameters
Fully Unsupervised
BERT-Distil-CT 75.12 / 75.04 78.63 / 77.91 66 M
BERT-Base-CT 73.55 / 73.36 75.49 / 73.31 108 M
BERT-Large-CT 77.12 / 76.93 80.75 / 79.82 334 M
Using NLI Data
BERT-Distil-NLI-CT 76.65 / 76.63 79.74 / 81.01 66 M
BERT-Base-NLI-CT 76.05 / 76.28 79.98 / 81.47 108 M
BERT-Large-NLI-CT 77.42 / 77.41 80.92 / 81.66 334 M

Supervised

These models are fine-tuned directly with STS data, using a modified version of the supervised training object proposed by S-BERT.
To our knowledge our RoBerta-Large-STSb is the current SOTA model for STS via sentence embeddings.

Model STS-b #Parameters
BERT-Distil-CT-STSb 84.85 / 85.46 66 M
BERT-Base-CT-STSb 85.31 / 85.76 108 M
BERT-Large-CT-STSb 85.86 / 86.47 334 M
RoBerta-Large-CT-STSb 87.56 / 88.42 334 M

Other Languages

Model Language #Parameters
BERT-Base-Swe-CT-STSb Swedish 108 M

License

Distributed under the MIT License. See LICENSE for more information.

Contact

If you have questions regarding the paper, please consider creating a comment via the official OpenReview submission.
If you have questions regarding the code or otherwise related to this Github page, please open an issue.

For other purposes, feel free to contact me directly at: [email protected]

Acknowledgements

Owner
Fredrik Carlsson
Fredrik Carlsson
This is a simple plugin for Vim that allows you to use OpenAI Codex.

🤖 Vim Codex An AI plugin that does the work for you. This is a simple plugin for Vim that will allow you to use OpenAI Codex. To use this plugin you

Tom Dörr 195 Dec 28, 2022
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.

English | 简体中文 Easy Parallel Library Overview Easy Parallel Library (EPL) is a general and efficient library for distributed model training. Usability

Alibaba 185 Dec 21, 2022
[CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

TransFuser This repository contains the code for the CVPR 2021 paper Multi-Modal Fusion Transformer for End-to-End Autonomous Driving. If you find our

695 Jan 05, 2023
A very lightweight monitoring system for Raspberry Pi clusters running Kubernetes.

OMNI A very lightweight monitoring system for Raspberry Pi clusters running Kubernetes. Why? When I finished my Kubernetes cluster using a few Raspber

Matias Godoy 148 Dec 29, 2022
[CVPR 21] Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021.

Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, CVPR 2021. Ayan Kumar Bhunia, Pinaki nath Chowdhury, Yongxin Yan

Ayan Kumar Bhunia 44 Dec 12, 2022
Using this you can control your PC/Laptop volume by Hand Gestures (pinch-in, pinch-out) created with Python.

Hand Gesture Volume Controller Using this you can control your PC/Laptop volume by Hand Gestures (pinch-in, pinch-out). Code Firstly I have created a

Tejas Prajapati 16 Sep 11, 2021
[NeurIPS-2021] Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

MosaicKD Code for NeurIPS-21 paper "Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data" 1. Motivation Natural images share common l

ZJU-VIPA 37 Nov 10, 2022
PyTorch implementation of some learning rate schedulers for deep learning researcher.

pytorch-lr-scheduler PyTorch implementation of some learning rate schedulers for deep learning researcher. Usage WarmupReduceLROnPlateauScheduler Visu

Soohwan Kim 59 Dec 08, 2022
Repository for tackling Kaggle Ultrasound Nerve Segmentation challenge using Torchnet.

Ultrasound Nerve Segmentation Challenge using Torchnet This repository acts as a starting point for someone who wants to start with the kaggle ultraso

Qure.ai 46 Jul 18, 2022
YOLOX + ROS(1, 2) object detection package

YOLOX + ROS(1, 2) object detection package

Ar-Ray 158 Dec 21, 2022
Log4j JNDI inj. vuln scanner

Log-4-JAM - Log 4 Just Another Mess Log4j JNDI inj. vuln scanner Requirements pip3 install requests_toolbelt Usage # make sure target list has http/ht

Ashish Kunwar 66 Nov 09, 2022
DISTIL: Deep dIverSified inTeractIve Learning.

DISTIL: Deep dIverSified inTeractIve Learning. An active/inter-active learning library built on py-torch for reducing labeling costs.

decile-team 110 Dec 06, 2022
An example of time series augmentation methods with Keras

Time Series Augmentation This is a collection of time series data augmentation methods and an example use using Keras. News 2020/04/16: Repository Cre

九州大学 ヒューマンインタフェース研究室 229 Jan 02, 2023
The code for replicating the experiments from the LFI in SSMs with Unknown Dynamics paper.

Likelihood-Free Inference in State-Space Models with Unknown Dynamics This package contains the codes required to run the experiments in the paper. Th

Alex Aushev 0 Dec 27, 2021
CoReD: Generalizing Fake Media Detection with Continual Representation using Distillation (ACMMM'21 Oral Paper)

CoReD: Generalizing Fake Media Detection with Continual Representation using Distillation (ACMMM'21 Oral Paper) (Accepted for oral presentation at ACM

Minha Kim 1 Nov 12, 2021
State-Relabeling Adversarial Active Learning

State-Relabeling Adversarial Active Learning Code for SRAAL [2020 CVPR Oral] Requirements torch = 1.6.0 numpy = 1.19.1 tqdm = 4.31.1 AL Results The

10 Jul 14, 2022
Codes for AAAI 2022 paper: Context-aware Health Event Prediction via Transition Functions on Dynamic Disease Graphs

Context-Aware-Healthcare Codes for AAAI 2022 paper: Context-aware Health Event Prediction via Transition Functions on Dynamic Disease Graphs Download

LuChang 9 Dec 26, 2022
Repo for the paper Extrapolating from a Single Image to a Thousand Classes using Distillation

Extrapolating from a Single Image to a Thousand Classes using Distillation by Yuki M. Asano* and Aaqib Saeed* (*Equal Contribution) Extrapolating from

Yuki M. Asano 16 Nov 04, 2022
ICCV2021 - Mining Contextual Information Beyond Image for Semantic Segmentation

Introduction The official repository for "Mining Contextual Information Beyond Image for Semantic Segmentation". Our full code has been merged into ss

55 Nov 09, 2022