Fluency ENhanced Sentence-bert Evaluation (FENSE), a metric for audio caption evaluation, together with the benchmark datasets AudioCaps-Eval and Clotho-Eval.

Overview

FENSE

Fluency ENhanced Sentence-bert Evaluation (FENSE) is a metric for audio caption evaluation, proposed in the paper "Can Audio Captions Be Evaluated with Image Caption Metrics?"

The main branch contains an easy-to-use interface for fast evaluation of an audio captioning system.

An online demo is available at https://share.streamlit.io/blmoistawinde/fense/main/streamlit_demo/app.py .

To get the datasets (AudioCaps-Eval and Clotho-Eval) and the code to reproduce the paper's results, please refer to the experiment-code branch.

Installation

Clone the repository and install it in editable mode with pip:

git clone https://github.com/blmoistawinde/fense.git
cd fense
pip install -e .

Usage

Single Sentence

To get the detailed scores of each component for a single sentence:

from fense.evaluator import Evaluator

print("----Using tiny models----")
# lightweight SBERT backbone for semantic similarity + tiny error detector for fluency issues
evaluator = Evaluator(device='cpu', sbert_model='paraphrase-MiniLM-L6-v2', echecker_model='echecker_clotho_audiocaps_tiny')

eval_cap = "An engine in idling and a man is speaking and then"
ref_cap = "A machine makes stitching sounds while people are talking in the background"

# returns the raw SBERT similarity, the fluency error probability, and the penalized FENSE score
score, error_prob, penalized_score = evaluator.sentence_score(eval_cap, [ref_cap], return_error_prob=True)

print("Cand:", eval_cap)
print("Ref:", ref_cap)
print(f"SBERT sim: {score:.4f}, Error Prob: {error_prob:.4f}, Penalized score: {penalized_score:.4f}")

System Score

To get a system's overall score on a dataset (the average of sentence-level FENSE over all candidates), use eval_system.py with your system outputs prepared in the same format as test_data/audiocaps_cands.csv or test_data/clotho_cands.csv.

For the AudioCaps test set:

python eval_system.py --device cuda --dataset audiocaps --cands_dir ./test_data/audiocaps_cands.csv

For the Clotho evaluation set:

python eval_system.py --device cuda --dataset clotho --cands_dir ./test_data/clotho_cands.csv
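For readers who prefer to stay in Python, the loop below sketches what a system-level score amounts to: averaging sentence-level FENSE over all candidates. It reuses the sentence_score API from the example above; the CSV column names ("caption" and "reference") are hypothetical placeholders, so adapt them to the actual layout of your candidates file.

# Sketch only: average sentence-level FENSE over a CSV of candidate captions.
# Column names are hypothetical; adjust them to your own file.
import pandas as pd
from fense.evaluator import Evaluator

evaluator = Evaluator(device='cuda', sbert_model='paraphrase-MiniLM-L6-v2',
                      echecker_model='echecker_clotho_audiocaps_tiny')

df = pd.read_csv("./test_data/audiocaps_cands.csv")
scores = []
for _, row in df.iterrows():
    # without return_error_prob, sentence_score is assumed to return the penalized FENSE
    scores.append(evaluator.sentence_score(row["caption"], [row["reference"]]))

print("System FENSE:", sum(scores) / len(scores))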

Performance Benchmark

We benchmark the performance of FENSE with different choices of SBERT model and error detector (echecker) on the two benchmark datasets, AudioCaps-Eval and Clotho-Eval. Following the paper, HC, HI, HM, and MM denote human-correct, human-incorrect, human-machine, and machine-machine caption pairs, and each entry is the accuracy (%) of agreement with human preference. (*) marks the combination reported in the paper.

AudioCaps-Eval

| SBERT | echecker | HC | HI | HM | MM | total |
|---|---|---|---|---|---|---|
| paraphrase-MiniLM-L6-v2 | none | 62.1 | 98.8 | 93.7 | 75.4 | 80.4 |
| paraphrase-MiniLM-L6-v2 | tiny | 57.6 | 94.7 | 89.5 | 82.6 | 82.3 |
| paraphrase-MiniLM-L6-v2 | base | 62.6 | 98.0 | 82.5 | 85.4 | 85.5 |
| paraphrase-TinyBERT-L6-v2 | none | 64.0 | 99.2 | 92.5 | 73.6 | 79.6 |
| paraphrase-TinyBERT-L6-v2 | tiny | 58.6 | 95.1 | 88.3 | 82.2 | 82.1 |
| paraphrase-TinyBERT-L6-v2 | base | 64.5 | 98.4 | 91.6 | 84.6 | 85.3 (*) |
| paraphrase-mpnet-base-v2 | none | 63.1 | 98.8 | 94.1 | 74.1 | 80.1 |
| paraphrase-mpnet-base-v2 | tiny | 58.1 | 94.3 | 90.0 | 83.2 | 82.7 |
| paraphrase-mpnet-base-v2 | base | 63.5 | 98.0 | 92.5 | 85.9 | 85.9 |

Clotho-Eval

| SBERT | echecker | HC | HI | HM | MM | total |
|---|---|---|---|---|---|---|
| paraphrase-MiniLM-L6-v2 | none | 59.5 | 95.1 | 76.3 | 66.2 | 71.3 |
| paraphrase-MiniLM-L6-v2 | tiny | 56.7 | 90.6 | 79.3 | 70.9 | 73.3 |
| paraphrase-MiniLM-L6-v2 | base | 60.0 | 94.3 | 80.6 | 72.3 | 75.3 |
| paraphrase-TinyBERT-L6-v2 | none | 60.0 | 95.5 | 75.9 | 66.9 | 71.8 |
| paraphrase-TinyBERT-L6-v2 | tiny | 59.0 | 93.0 | 79.7 | 71.5 | 74.4 |
| paraphrase-TinyBERT-L6-v2 | base | 60.5 | 94.7 | 80.2 | 72.8 | 75.7 (*) |
| paraphrase-mpnet-base-v2 | none | 56.2 | 96.3 | 77.6 | 65.2 | 70.7 |
| paraphrase-mpnet-base-v2 | tiny | 54.8 | 91.8 | 80.6 | 70.1 | 73.0 |
| paraphrase-mpnet-base-v2 | base | 57.1 | 95.5 | 81.9 | 71.6 | 74.9 |

Reference

If you use FENSE in your research, please cite:

@misc{zhou2021audio,
      title={Can Audio Captions Be Evaluated with Image Caption Metrics?}, 
      author={Zelin Zhou and Zhiling Zhang and Xuenan Xu and Zeyu Xie and Mengyue Wu and Kenny Q. Zhu},
      year={2021},
      eprint={2110.04684},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}