Multi-angle c(q)uestion answering


Macaw

Introduction

Macaw (Multi-angle c(q)uestion answering) is a ready-to-use model capable of general question answering, showing robustness outside the domains it was trained on. It has been trained in "multi-angle" fashion, which means it can handle a flexible set of input and output "slots" (such as question, answer, and explanation).

Macaw was built on top of T5 and comes in different sizes: macaw-11b, macaw-3b, and macaw-large, as well as an answer-focused version featured on various leaderboards: macaw-answer-11b (see below).

Examples

Some suggestive examples from the Macaw (11B) model, for different angles:

  • (Q→A) Given a question, what's the answer?
    Q: James went camping in the woods, but forgot to bring a hammer to bang the tent pegs in. What else might he use?
    → A: rocks

  • (QM→A) Given a question and answer choices, what's the answer?
    Q: James went camping in the woods, but forgot to bring a hammer to bang the tent pegs in. What else might he use?
    M: (A) a leaf (B) a log (C) a worm
    → A: a log

  • (Q→AE) Given a question, what's the answer and an explanation?
    Q: Which force pulls objects to the ground?
    → A: gravity
    → E: Gravitational force causes objects that have mass to be pulled down on a planet.

  • (A→QE) Given an answer, what's a plausible question and explanation?
    A: elephant
    → Q: Which animal has the largest ears?
    → E: The ears of an elephant are the largest.

  • (C→QA) Given a context, what's a plausible question and answer?
    C: A car needs a battery to start.
    → Q: What is required for a car to start?
    → A: battery

For many more examples of the basic Q→A angle, see examples.md.

Usage examples

Macaw can be used directly from the Hugging Face transformers library. The example below uses the smallest model, macaw-large (not generally recommended, but with a much smaller footprint): given a question, it returns an answer and suggested multiple-choice answer options.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the smallest Macaw checkpoint and its tokenizer
tokenizer = AutoTokenizer.from_pretrained("allenai/macaw-large")
model = AutoModelForSeq2SeqLM.from_pretrained("allenai/macaw-large")
# Requested output slots ($answer$, $mcoptions$) have no value; the input slot ($question$) carries its value
input_string = "$answer$ ; $mcoptions$ ; $question$ = What is the color of a cloudy sky?"
input_ids = tokenizer.encode(input_string, return_tensors="pt")
# Generate at most 200 tokens
output = model.generate(input_ids, max_length=200)

>>> tokenizer.batch_decode(output, skip_special_tokens=True)
['$answer$ = gray ; $mcoptions$ = (A) blue (B) white (C) grey (D) white']

Run pip install -r requirements.txt if any dependencies are missing. Note that the different slots are not guaranteed to be fully coherent, as with gray/grey and the duplicate "white" here; this is more pronounced for macaw-large than for the larger models.
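
The decoded string can be split back into slots with a few lines of Python. The following is a minimal sketch, not part of the Macaw codebase: the helper names (parse_macaw_output, split_mcoptions, duplicate_options) are hypothetical, and the parsing rules are inferred only from the single output shown above. It also flags repeated multiple-choice options such as the duplicate "white".

```python
import re

def parse_macaw_output(text):
    """Split a decoded string like '$answer$ = gray ; $mcoptions$ = ...' into a dict."""
    slots = {}
    # Each slot looks like "$name$ = value"; a value runs until the next " ; $name$" or the end.
    for match in re.finditer(r"\$(\w+)\$ = (.*?)(?= ; \$\w+\$|$)", text):
        slots[match.group(1)] = match.group(2).strip()
    return slots

def split_mcoptions(mcoptions):
    """Turn '(A) blue (B) white' into ['blue', 'white']."""
    parts = re.split(r"\([A-Z]\)", mcoptions)
    return [part.strip() for part in parts if part.strip()]

def duplicate_options(options):
    """Return the options that occur more than once."""
    seen, dupes = set(), []
    for option in options:
        if option in seen and option not in dupes:
            dupes.append(option)
        seen.add(option)
    return dupes

decoded = "$answer$ = gray ; $mcoptions$ = (A) blue (B) white (C) grey (D) white"
slots = parse_macaw_output(decoded)
options = split_mcoptions(slots["mcoptions"])
print(slots["answer"], options, duplicate_options(options))
```

A check like duplicate_options only catches exact repeats; near-duplicates such as gray/grey would need a separate normalization step.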

The code in macaw/utils.py includes some convenience wrappers, such as load_model and run_macaw. The examples below load the macaw-11b model onto two GPUs (the largest model needs around 48GB of total GPU memory):

from macaw.utils import load_model, run_macaw
model_dict = load_model("allenai/macaw-11b", cuda_devices=[0,1])
res1 = run_macaw("Q: Which force pulls objects to the ground?\nA\nE", model_dict)
# Alternate input syntax
res2 = run_macaw({"Q:":"Which force causes a compass needle to point north?", "A":""}, model_dict)
# Add sampling options for the output
res3 = run_macaw("Q: Which force pulls objects to the ground?\nA\nE", model_dict, {"do_sample": True, "temperature": 2.0})

>>> [print(res["output_slots_list"][0]) for res in [res1, res2, res3]]
{'answer': 'gravity', 'explanation': 'Gravitational force causes objects that have mass to be pulled down on a planet.'}
{'answer': 'magnetism'}
{'answer': 'gravitional force', 'explanation': 'Gravitational force causes objects that have mass to be pulled down on a planet.'}
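
The figure of around 48GB for the largest model can be sanity-checked with rough arithmetic. The sketch below assumes the weights are held as 32-bit floats and that "11b" means about 11 billion parameters; neither is stated here, so treat both as assumptions. The helper weight_memory_gb is hypothetical and counts weights only, not activations or framework overhead.

```python
def weight_memory_gb(num_params, bytes_per_param=4):
    """Approximate memory for model weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9

params_11b = 11e9  # assumed nominal parameter count for the "11b" models

print(weight_memory_gb(params_11b))     # assuming 32-bit floats
print(weight_memory_gb(params_11b, 2))  # assuming 16-bit floats, if half precision were used
```

Under the 32-bit assumption the weights alone come to about 44 GB, which leaves only a few GB of the quoted 48GB for everything else.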

For batch evaluation of instances at various angles, see macaw/batch_eval.py for pointers.

Supported slots

Here are the slots available in Macaw, generally applicable for both input and output:

| Slot name | Description | Example |
|---|---|---|
| question (Q) | Question text | What is the color of a cloudy sky? |
| answer (A) | Answer text | The sky is blue |
| mcoptions (M) | Multiple-choice answer options | (A) blue (B) white (C) grey |
| context (C) | Potentially relevant context (noisy IR) | The sky looks blue to us because... |
| explanation (E) | Sentences explaining the answer | A cloudy sky is usually gray in color... |

An angle is a specific set of input/output slots. For instance, QM→AE is the task of producing an answer and explanation, given a question and multiple-choice options. Macaw is trained on a wide variety of angles and handles unseen angles as well. One exception is that the context (C) only appears as an input slot in the training data.
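
The mapping from an angle to a model input string can be sketched as follows. This is a hedged illustration, not the code in macaw/utils.py: the helper names (parse_angle, build_input_string) are hypothetical, and the format (output slots listed bare, input slots written as "$name$ = value", joined by " ; ") is inferred from the single Hugging Face example in this document. How multiple input slots are ordered is an assumption.

```python
# One-letter slot codes and the slot names used in the "$name$" input format.
SLOT_NAMES = {
    "Q": "question",
    "A": "answer",
    "M": "mcoptions",
    "C": "context",
    "E": "explanation",
}

def parse_angle(angle):
    """Split an angle such as 'QM->AE' (or 'QM→AE') into input and output slot names."""
    inputs, outputs = angle.replace("→", "->").split("->")
    return [SLOT_NAMES[c] for c in inputs], [SLOT_NAMES[c] for c in outputs]

def build_input_string(angle, values):
    """Write output slots as '$name$' and input slots as '$name$ = value', joined by ' ; '."""
    input_slots, output_slots = parse_angle(angle)
    parts = [f"${name}$" for name in output_slots]
    parts += [f"${name}$ = {values[name]}" for name in input_slots]
    return " ; ".join(parts)

print(build_input_string("Q->AM", {"question": "What is the color of a cloudy sky?"}))
# $answer$ ; $mcoptions$ ; $question$ = What is the color of a cloudy sky?
```

Keeping the angle as a short string makes it easy to loop over several angles for the same instance, as long as every input slot in the angle has a value.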

The Challenge300 dataset of probing questions

The Challenge300 dataset of 300 diverse probing examples can be found in challenge300-probes-v1.jsonl. The basic Q→A output from Macaw (at different sizes), as well as outputs from GPT3, Jurassic-1 and alternate T5 models trained on NaturalQuestions, can be seen in examples.md.
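
A .jsonl file holds one JSON object per line, so the probes can be loaded with the standard library alone. The sketch below is generic: the field names inside challenge300-probes-v1.jsonl are not described here, so it only counts top-level keys rather than assuming any. The helper names (parse_jsonl, read_jsonl, summarize_keys) are hypothetical.

```python
import json

def parse_jsonl(lines):
    """Parse an iterable of JSON Lines strings, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records

def read_jsonl(path):
    """Read a JSON Lines file from disk."""
    with open(path, encoding="utf-8") as handle:
        return parse_jsonl(handle)

def summarize_keys(records):
    """Count how often each top-level key appears across the records."""
    counts = {}
    for record in records:
        for key in record:
            counts[key] = counts.get(key, 0) + 1
    return counts

# Usage, assuming the file is in the working directory:
# records = read_jsonl("challenge300-probes-v1.jsonl")
# print(len(records), summarize_keys(records))
```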

Demo

See DEMO.md for instructions and code to host an interactive version of Macaw.

Training data

Macaw was trained in two steps from the text-to-text transformer model T5:

  1. Multi-angle version of UnifiedQA by fine-tuning T5 on the following 7 datasets and associated angles:

  2. Further fine-tuning of Multi-Angle UnifiedQA on multiple-choice and direct-answer elementary science questions, along with (up to 5) explanation sentences from WorldTreeV2:

    • ARC: QMC→AE, AQC→M, QMEC→A, QME→A, QE→A, QMC→A, QC→AE, QM→AE, QMAC→E, QMA→E
    • ARC-DA: QC→AE, Q→AE, QC→A, Q→A, QEC→A, QE→A, AE→Q, AC→Q, QA→E, AQC→E

A specialized answer-focused model, macaw-answer-11b (called "UnifiedQA + ARC MC/DA + IR" on the leaderboards for ARC, ARC-Easy, and ARC-DA), was trained on a smaller set of angles, not including explanations:

    • ARC: QMC→A, QAC→M, QC→A, QM→A, MAC→Q, AC→QM, M→QA
    • ARC-DA: QC→A, Q→A, AC→Q, C→QA

Available models

The Macaw models (macaw-11b, macaw-3b, macaw-large, and macaw-answer-11b) can be accessed from the Hugging Face model hub, for example as allenai/macaw-large.

For a sense of the degradation in performance for the smaller sizes, here are baseline scores on the ARC Challenge and ARC Easy multiple-choice development questions. Included are variants with and without IR context from a large science corpus (corresponding to angles QMC→A and QM→A respectively).

| Model | ARC Challenge | ARC Challenge (no IR) | ARC Easy | ARC Easy (no IR) |
|---|---|---|---|---|
| Macaw (11B) | 76.9 | 74.6 | 91.2 | 84.9 |
| Macaw-3B | 68.2 | 67.9 | 87.9 | 77.7 |
| Macaw-large | 57.2 | 50.5 | 82.5 | 63.9 |
| Macaw-answer (11B) | 79.9 | 75.2 | 90.5 | 85.8 |
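
To make the effect of IR context easier to read off, the sketch below computes the difference between the with-IR and no-IR scores for each model. The numbers are copied from the table above; the SCORES structure and the ir_gain helper are hypothetical names introduced only for this illustration.

```python
# Scores copied from the table above, as (with IR, without IR).
SCORES = {
    "Macaw (11B)":        {"ARC Challenge": (76.9, 74.6), "ARC Easy": (91.2, 84.9)},
    "Macaw-3B":           {"ARC Challenge": (68.2, 67.9), "ARC Easy": (87.9, 77.7)},
    "Macaw-large":        {"ARC Challenge": (57.2, 50.5), "ARC Easy": (82.5, 63.9)},
    "Macaw-answer (11B)": {"ARC Challenge": (79.9, 75.2), "ARC Easy": (90.5, 85.8)},
}

def ir_gain(model, dataset):
    """Points gained by adding IR context (QMC→A minus QM→A), rounded to one decimal."""
    with_ir, without_ir = SCORES[model][dataset]
    return round(with_ir - without_ir, 1)

for model in SCORES:
    print(model, ir_gain(model, "ARC Challenge"), ir_gain(model, "ARC Easy"))
```

On these numbers the gain from IR context is largest for Macaw-large on ARC Easy (18.6 points) and smallest for Macaw-3B on ARC Challenge (0.3 points).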

Disclaimer

Because Macaw generates free-form text, its output is not guaranteed to be free of offensive material, so appropriate caution is advised when using the model.

Citation

If you use Macaw in your work, please reference the related paper using

@article{Tafjord2021Macaw,
  title={General-Purpose Question-Answering with {M}acaw},
  author={Oyvind Tafjord and Peter Clark},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.02593}
}