BEAMetrics: Benchmark to Evaluate Automatic Metrics in Natural Language Generation

Overview

BEAMetrics: Benchmark to Evaluate Automatic Metrics in Natural Language Generation

Installing The Dependencies

$ conda create --name beametrics python>=3.8
$ conda activate beametrics

WARNING: You need to install, before any package, correct version of pytorch linked to your cuda version.

(beametrics) $ conda install pytorch cudatoolkit=10.1 -c pytorch

Install BEAMetrics:

(beametrics) $ cd BEAMetrics
(beametrics) $ pip install -e .

Install Nubia metric (not on PyPI, 16/08/2021):

(beametrics) git clone https://github.com/wl-research/nubia.git
(beametrics) pip install -r requirements.txt

Alternatively, you can remove nubia from _DEFAULT_METRIC_NAMES in metrics.metric_reporter.

Reproducing the results

First you need to get the processed files, which include the metric scores. You can do that either by simply downloading the processed data (see Section Download Data), or by re-computing the scores (see Section Compute Correlations).

Then, the first bloc in the notebook visualize.ipynb allows to get all the tables from the paper (and also to generate the latex code in data/correlation).

Download the data

All the dataset can be downloaded from this zip file. It needs to be unzipped into the path data before running the correlations.

unzip data.zip

The data folder contains:

  • a subfolder raw containing all the original dataset
  • a subfolder processed containing all the dataset processed in a unified format
  • a subfolder correlation containing all the final correlation results, and the main tables of the paper
  • a subfolder datacards containing all the data cards

Computing the correlations

Processing the files to a clean json with the metrics computed:

python beametrics/run_all.py

The optional argument --dataset allows to compute only on a specific dataset, e.g.:

python run_all.py --dataset SummarizationCNNDM.

The list of the datasets and their corresponding configuration can be found in configs/__init__.

When finished, you can print the final table as in the paper, see the notebook visualize.ipynb.

Data Cards:

For each dataset, a data card is available in the datacard folder. The cards are automatically generated when running run_all.py, by filling the template with the dataset configuration as detailed bellow, in Adding a new dataset.

Adding a new dataset:

In configs/, you need to create a new .py file that inherites from ConfigBase (in configs/co'nfig_base.py). You are expected to fill the mandatory fields that allow to run the code and fill the data card template:

  • file_name: the file name located in data/raw
  • file_name_processed: the file name once processed and formated
  • metric_names: you can pass _DEFAULT_METRIC_NAMES by default or customize it, e.g. metric_names = metric_names + ('sari',) where sari corresponds to a valid metric (see the next section)
  • name_dataset: the name of the dataset as it was published
  • short_name_dataset: few letters that will be used to name the dataset in the final table report
  • languages: the languages of the dataset (e.g. [en] or [en, fr])
  • task: e.g. 'simplification', 'data2text
  • number_examples: the total number of evaluated texts
  • nb_refs: the number of references available in the dataset
  • dimensions_definitions: the evaluated dimensions and their corresponding definition e.g. {'fluency: 'How fluent is the text?'}
  • scale: the scale used during the evaluation, as defined in the protocol
  • source_eval_sets: the dataset from which the source were collected to generate the evaluated examples
  • annotators: some information about who were the annotators
  • sampled_from: the URL where was released the evaluation dataset
  • citation: the citation of the paper where the dataset was released

Your class needs its custom method format_file. The function takes as input the dataset's file_name and return a dictionary d_data. The format for d_data has to be the same for all the datasets:

d_data = {
    key_1: {
        'source': "a_source", 
        'hypothesis': "an_hypothesis",
        'references': ["ref_1", "ref_2", ...],
        'dim_1': float(a_score),
        'dim_2': float(an_other_score),
    },
    ...
    key_n: {
        ...
    }
}

where 'key_1' and 'key_n' are the keys for the first and nth example, dim_1 and dim_2 dimensions corresponding to self.dimensions.

Finally, you need to add your dataset to the dictionary D_ALL_DATASETS located in config/__init__.

Adding a new metric:

First, create a class inheriting from metrics/metrics/MetricBase. Then, simply add it to the dictionary _D_METRICS in metrics/__init__.

For the metric to be computed by default, its name has to be added to either

  • _DEFAULT_METRIC_NAMES: metrics computed on each dataset
  • _DEFAULT_METRIC_NAMES_SRC: metrics computed on dataset that have a text format for their source (are excluded for now image captioning and data2text). These two tuples are located in metrics/metric_reported.

Alternatively, you can add the metric to a specific configuration by adding it to the attribute metric_names in the config.

Realtime_Multi-Person_Pose_Estimation

Introduction Multi Person PoseEstimation By PyTorch Results Require Pytorch Installation git submodule init && git submodule update Demo Download conv

tensorboy 1.3k Jan 05, 2023
MMGeneration is a powerful toolkit for generative models, based on PyTorch and MMCV.

Documentation: https://mmgeneration.readthedocs.io/ Introduction English | 简体中文 MMGeneration is a powerful toolkit for generative models, especially f

OpenMMLab 1.3k Dec 29, 2022
Collection of generative models, e.g. GAN, VAE in Pytorch and Tensorflow.

Generative Models Collection of generative models, e.g. GAN, VAE in Pytorch and Tensorflow. Also present here are RBM and Helmholtz Machine. Note: Gen

Agustinus Kristiadi 7k Jan 02, 2023
PyTorch implementation of UNet++ (Nested U-Net).

PyTorch implementation of UNet++ (Nested U-Net) This repository contains code for a image segmentation model based on UNet++: A Nested U-Net Architect

4ui_iurz1 642 Jan 04, 2023
HALO: A Skeleton-Driven Neural Occupancy Representation for Articulated Hands

HALO: A Skeleton-Driven Neural Occupancy Representation for Articulated Hands Oral Presentation, 3DV 2021 Korrawe Karunratanakul, Adrian Spurr, Zicong

Korrawe Karunratanakul 43 Oct 07, 2022
A particular navigation route using satellite feed and can help in toll operations & traffic managemen

How about adding some info that can quanitfy the stress on a particular navigation route using satellite feed and can help in toll operations & traffic management The current analysis is on the satel

Ashish Pandey 1 Feb 14, 2022
A Deep Learning Framework for Neural Derivative Hedging

NNHedge NNHedge is a PyTorch based framework for Neural Derivative Hedging. The following repository was implemented to ease the experiments of our pa

GUIJIN SON 17 Nov 14, 2022
The original weights of some Caffe models, ported to PyTorch.

pytorch-caffe-models This repo contains the original weights of some Caffe models, ported to PyTorch. Currently there are: GoogLeNet (Going Deeper wit

Katherine Crowson 9 Nov 04, 2022
Pacman-AI - AI project designed by UC Berkeley. Designed reflex and minimax agents for the game Pacman.

Pacman AI Jussi Doherty CAP 4601 - Introduction to Artificial Intelligence - Fall 2020 Python version 3.0+ Source of this project This repo contains a

Jussi Doherty 1 Jan 03, 2022
Learning hierarchical attention for weakly-supervised chest X-ray abnormality localization and diagnosis

Hierarchical Attention Mining (HAM) for weakly-supervised abnormality localization This is the official PyTorch implementation for the HAM method. Pap

Xi Ouyang 22 Jan 02, 2023
COVINS -- A Framework for Collaborative Visual-Inertial SLAM and Multi-Agent 3D Mapping

COVINS -- A Framework for Collaborative Visual-Inertial SLAM and Multi-Agent 3D Mapping Version 1.0 COVINS is an accurate, scalable, and versatile vis

ETHZ V4RL 183 Dec 27, 2022
The second project in Python course on FCC

Assignment Write a function named add_time that takes in two required parameters and one optional parameter: a start time in the 12-hour clock format

Denise T 1 Dec 13, 2021
automatic color-grading

color-matcher Description color-matcher enables color transfer across images which comes in handy for automatic color-grading of photographs, painting

hahnec 168 Jan 05, 2023
Intrusion Detection System using ensemble learning (machine learning)

IDS-ML implementation of an intrusion detection system using ensemble machine learning methods Data set This project is carried out using the UNSW-15

4 Nov 25, 2022
Learning View Priors for Single-view 3D Reconstruction (CVPR 2019)

Learning View Priors for Single-view 3D Reconstruction (CVPR 2019) This is code for a paper Learning View Priors for Single-view 3D Reconstruction by

Hiroharu Kato 38 Aug 17, 2022
Libraries, tools and tasks created and used at DeepMind Robotics.

Libraries, tools and tasks created and used at DeepMind Robotics.

DeepMind 270 Nov 30, 2022
Fully Convolutional Networks for Semantic Segmentation by Jonathan Long*, Evan Shelhamer*, and Trevor Darrell. CVPR 2015 and PAMI 2016.

Fully Convolutional Networks for Semantic Segmentation This is the reference implementation of the models and code for the fully convolutional network

Evan Shelhamer 3.2k Jan 08, 2023
[SIGGRAPH 2022 Journal Track] AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars Fangzhou Hong1*  Mingyuan Zhang1*  Liang Pan1  Zhongang Cai1,2,3  Lei Yang2 

Fangzhou Hong 749 Jan 04, 2023
A CROSS-MODAL FUSION NETWORK BASED ON SELF-ATTENTION AND RESIDUAL STRUCTURE FOR MULTIMODAL EMOTION RECOGNITION

CFN-SR A CROSS-MODAL FUSION NETWORK BASED ON SELF-ATTENTION AND RESIDUAL STRUCTURE FOR MULTIMODAL EMOTION RECOGNITION The audio-video based multimodal

skeleton 15 Sep 26, 2022
Weakly Supervised Segmentation with Tensorflow. Implements instance segmentation as described in Simple Does It: Weakly Supervised Instance and Semantic Segmentation, by Khoreva et al. (CVPR 2017).

Weakly Supervised Segmentation with TensorFlow This repo contains a TensorFlow implementation of weakly supervised instance segmentation as described

Phil Ferriere 220 Dec 13, 2022