The Few-Shot Bot: Prompt-Based Learning for Dialogue Systems

Related tags

Deep LearningFSB
Overview

Few-Shot Bot: Prompt-Based Learning for Dialogue Systems

This repository includes the dataset, experiments results, and code for the paper:

Few-Shot Bot: Prompt-Based Learning for Dialogue Systems PDF.

Authors: Andrea Madotto, Zhaojiang Lin, Genta Indra Winata, Pascale Fung

Abstract

Learning to converse using only a few examples is a grand challenge in Conversational AI. The current best conversational models, which are either good chit-chatters (e.g., BlenderBot) or goal-oriented systems (e.g., MinTL), are language models (LMs) fine-tuned on large conversational datasets. Training these models is expensive, both in terms of computational resources and time, and it is hard to keep these models up to date with new conversational skills. A simple yet unexplored solution is prompt-based few-shot learning (Brown et al. 2020) which does not require gradient-based fine-tuning but instead uses a few examples in the LM context as the only source of learning. In this paper, we explore prompt-based few-shot learning in dialogue tasks. We benchmark LMs of different sizes in 9 response generation tasks, which include a variety of knowledge-grounded tasks, task-oriented generations, general open-chat, and controlled stylistic generation, and 5 conversational parsing tasks, which include dialogue state tracking, graph path generation, persona information extraction, and document retrieval. The current largest, released, LM (GPT-J-6B) achieves competitive performance to full-training state-of-the-art models by using the prompt-based few-shot learning, thus no training. Moreover, we proposed a novel perplexity-based classifier, that also does not require any fine-tuning, to select the most appropriate prompt given a dialogue history, as to create an all-in-one model with multiple dialogue skills. Finally, by combining the power of prompt-based few-shot learning and the skill selector, we create an end-to-end chatbot named the Few-Shot Bot, which automatically selects the most appropriate conversational skill, queries different KBs or the internet, and uses it to generate a human-like response, all by using only one dialogue example per skill.

Installation

In this repo, we load all the validation and test sets used in the evaluation. For running the experiments and the demo, you should install the following requirements:

pip install -r requirements.txt

Basic Running

Reproducing the results and plots

The generation folder stores the generated responses of the experiments in all datasets. To generate the tables and the plots in the paper, run

python generate_plots_tables.py

This script loads all the files and computes the mean between different runs and it generates the plots. Note that this script is very custum for each datasets, but it can serve as guide line for future extentions.

Running the experiments

There are three main files to run 1) response generation (main_response_generation.py), 2) conversational parsing (main_conversational_parsing.py), and 3) skill-selector (main_skill_selector.py). In these files, we load the necessary prompt (load_prefix) and we run the generation (generate_response) for each sample in the test set. Since each dialogue skill require a different template, as shown in the paper, we create a function that converts structured data into the correct shot prompt. An example of this function can be found in prompts/persona_chat.py, and in generic_prompts.py we store the generation functions.

In each main file there is configuration object (mapper) which specify meta-information about the task (i.e., number of shots, generation length, decoding type, prompt converter). Expecially for conversational parsing, there are different decoding type. For example, in MWOZ the model generates the dialogue state, which is further looped into the next turn.

How to run?

For example, to run the persona chat experiments (0, 1, k-shots), you can use the following command:

python main_response_generation.py --model_checkpoint EleutherAI/gpt-j-6B --dataset persona --gpu 0

In case your GPU has less that 16GB, then you could add --multigpu to spawn 4 GPUs (e.g., 1080Ti) and do inference in parallel. Similarly, for conversational parsing tasks, you could use:

python main_conversational_parsing.py --model_checkpoint EleutherAI/gpt-j-6B --dataset wow-parse --gpu 0

Notice that some parsing task requires a knowledge base (e.g., dialKG-parse requires the KG in neo4j). Finally, to run the skill-selector task, you could use:

python main_skill_selector.py --model_checkpoint EleutherAI/gpt-j-6B --shots_k 6 --repetition 1 --gpu 0

where repetition is the seed for selecting random samples in the prompts.

Runners

In the runners folder, we provide a rudimental runner to run all the experiments and reproduce the results in the paper.

Few-Shot Bot

There are two modes for the FSB such as 1) controlled style generation and 2) full-model. Currently we support the controlled style generation model. Check the FSB-CG.ipynb to try to interact with FSB in your local machine, or try directly in colab at https://colab.research.google.com/drive/15hQv1V3Cs5kQVfLOE_FZc1VCWQ3YpWVd?usp=sharing (Remeber to select the enviroment with GPU).

Owner
Andrea Madotto
Deep learning, Machine Learning, Learning To Learn, Natural Language Processing.
Andrea Madotto
Temporally Coherent GAN SIGGRAPH project.

TecoGAN This repository contains source code and materials for the TecoGAN project, i.e. code for a TEmporally COherent GAN for video super-resolution

Duc Linh Nguyen 2 Jan 18, 2022
Sign-to-Speech for Sign Language Understanding: A case study of Nigerian Sign Language

Sign-to-Speech for Sign Language Understanding: A case study of Nigerian Sign Language This repository contains the code, model, and deployment config

16 Oct 23, 2022
1st Solution For ICDAR 2021 Competition on Mathematical Formula Detection

This project releases our 1st place solution on ICDAR 2021 Competition on Mathematical Formula Detection. We implement our solution based on MMDetection, which is an open source object detection tool

yuxzho 94 Dec 25, 2022
Causal estimators for use with WhyNot

WhyNot Estimators A collection of causal inference estimators implemented in Python and R to pair with the Python causal inference library whynot. For

ZYKLS 8 Apr 06, 2022
Technical Indicators implemented in Python only using Numpy-Pandas as Magic - Very Very Fast! Very tiny! Stock Market Financial Technical Analysis Python library . Quant Trading automation or cryptocoin exchange

MyTT Technical Indicators implemented in Python only using Numpy-Pandas as Magic - Very Very Fast! to Stock Market Financial Technical Analysis Python

dev 34 Dec 27, 2022
Training a Resilient Q-Network against Observational Interference, Causal Inference Q-Networks

Obs-Causal-Q-Network AAAI 2022 - Training a Resilient Q-Network against Observational Interference Preprint | Slides | Colab Demo | Environment Setup

23 Nov 21, 2022
Classify music genre from a 10 second sound stream using a Neural Network.

MusicGenreClassification Academic research in the field of Deep Learning (Deep Neural Networks) and Sound Processing, Tel Aviv University. Featured in

Matan Lachmish 453 Dec 27, 2022
Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021

ACTOR Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021. Please visit our we

Mathis Petrovich 248 Dec 23, 2022
Code, pre-trained models and saliency results for the paper "Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images".

Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB This repository is the official implementation of the paper. Our results comming soon in

Xiaoqiang Wang 8 May 22, 2022
Implementation of "Generalizable Neural Performer: Learning Robust Radiance Fields for Human Novel View Synthesis"

Generalizable Neural Performer: Learning Robust Radiance Fields for Human Novel View Synthesis Abstract: This work targets at using a general deep lea

163 Dec 14, 2022
The object detection pipeline is based on Ultralytics YOLOv5

AYOLOv2 The main goal of this repository is to rewrite the object detection pipeline with a better code structure for better portability and adaptabil

153 Dec 22, 2022
Computer Vision application in the web

Computer Vision application in the web Preview Usage Clone this repo git clone https://github.com/amineHY/WebApp-Computer-Vision-streamlit.git cd Web

Amine Hadj-Youcef. PhD 35 Dec 06, 2022
A tutorial showing how to train, convert, and run TensorFlow Lite object detection models on Android devices, the Raspberry Pi, and more!

A tutorial showing how to train, convert, and run TensorFlow Lite object detection models on Android devices, the Raspberry Pi, and more!

Evan 1.3k Jan 02, 2023
BankNote-Net: Open dataset and encoder model for assistive currency recognition

BankNote-Net: Open Dataset for Assistive Currency Recognition Millions of people around the world have low or no vision. Assistive software applicatio

Microsoft 13 Oct 28, 2022
PyTorch implementation of the ExORL: Exploratory Data for Offline Reinforcement Learning

ExORL: Exploratory Data for Offline Reinforcement Learning This is an original PyTorch implementation of the ExORL framework from Don't Change the Alg

Denis Yarats 52 Jan 01, 2023
Spam your friends and famly and when you do your famly will disown you and you will have no friends.

SpamBot9000 Spam your friends and family and when you do your family will disown you and you will have no friends. Terms of Use Disclaimer: Please onl

DJ15 0 Jun 09, 2022
Framework web SnakeServer.

SnakeServer - Framework Web 🐍 Documentação oficial do framework SnakeServer. Conteúdo Sobre Como contribuir Enviar relatórios de segurança Pull reque

Jaedson Silva 0 Jul 21, 2022
Styled text-to-drawing synthesis method. Featured at the 2021 NeurIPS Workshop on Machine Learning for Creativity and Design

Styled text-to-drawing synthesis method. Featured at the 2021 NeurIPS Workshop on Machine Learning for Creativity and Design

Peter Schaldenbrand 247 Dec 23, 2022
Scripts and outputs related to the paper Prediction of Adverse Biological Effects of Chemicals Using Knowledge Graph Embeddings.

Knowledge Graph Embeddings and Chemical Effect Prediction, 2020. Scripts and outputs related to the paper Prediction of Adverse Biological Effects of

Knowledge Graphs at the Norwegian Institute for Water Research 1 Nov 01, 2021
Dynamic View Synthesis from Dynamic Monocular Video

Dynamic View Synthesis from Dynamic Monocular Video Project Website | Video | Paper Dynamic View Synthesis from Dynamic Monocular Video Chen Gao, Ayus

Chen Gao 139 Dec 28, 2022