The Easy-to-use Dialogue Response Selection Toolkit for Researchers

Overview

An easy-to-use toolkit for retrieval-based chatbots.

Our released data can be found at this link. Follow the steps below to use our code.

How to Use

  1. Initialize the repo

    Before using the repo, run the following commands to initialize it:

    # create the necessary folders
    python init.py

    # prepare the environment
    # if a package cannot be installed this way, search for another way to install it
    pip install -r requirements.txt
  2. Train the model

    ./scripts/train.sh <dataset_name> <model_name> <cuda_ids>
  3. Test the model [rerank]

    ./scripts/test_rerank.sh <dataset_name> <model_name> <cuda_id>
  4. Test the model [recall]

    # different recall_modes are available: q-q, q-r
    ./scripts/test_recall.sh <dataset_name> <model_name> <cuda_id>
  5. Run inference on the responses and save them into the faiss index

    Inference sometimes misses data samples; if that happens, use a single GPU (faiss-gpu search is fast with 1 GPU).

    Note:

    1. For the writer dataset, use the extract_inference.py script to generate inference.txt.
    2. For the other datasets (douban, ecommerce, ubuntu), just cp train.txt inference.txt. The dataloader will automatically read test.txt to supplement the corpus.
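For the non-writer datasets, the copy step in the note above could be scripted. This is only a minimal sketch: the `data_root/<dataset_name>/` directory layout and the helper name `prepare_inference_file` are assumptions made for illustration and are not part of the toolkit.

```python
import shutil
from pathlib import Path

# Datasets for which inference.txt is a plain copy of train.txt (per the note above).
COPY_DATASETS = {"douban", "ecommerce", "ubuntu"}


def prepare_inference_file(data_root, dataset_name):
    """Copy train.txt to inference.txt for a copy-style dataset.

    `data_root/<dataset_name>/` is an assumed layout; adjust it to your checkout.
    """
    if dataset_name not in COPY_DATASETS:
        raise ValueError(
            f"{dataset_name!r} is not a copy-style dataset; "
            "for the writer dataset, use extract_inference.py instead"
        )
    dataset_dir = Path(data_root) / dataset_name
    src = dataset_dir / "train.txt"
    dst = dataset_dir / "inference.txt"
    shutil.copyfile(src, dst)  # overwrites an existing inference.txt
    return dst
```

Calling `prepare_inference_file("data", "douban")` would then be equivalent to running cp train.txt inference.txt inside the assumed dataset directory.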

    # work_mode=response: run inference on the responses and save them into faiss (for q-r matching) [dual-bert/dual-bert-fusion]
    # work_mode=context: run inference on the contexts for q-q matching
    # work_mode=gray: run inference on the contexts, read the faiss index (work_mode=response must already have been run), and search the top-k hard negative samples; remember to set BERTDualInferenceContextDataloader in config/base.yaml
    ./scripts/inference.sh <dataset_name> <model_name> <cuda_ids>

    To generate the gray (hard negative) data for a dataset:

    # 1. set the mode to response to generate the response faiss index; corresponding dataset name: BERTDualInferenceDataset
    ./scripts/inference.sh <dataset_name> response <cuda_ids>

    # 2. set the mode to gray to run inference on the contexts in train.txt and search the top-k candidates as the gray (hard negative) samples; corresponding dataset name: BERTDualInferenceContextDataset
    ./scripts/inference.sh <dataset_name> gray <cuda_ids>

    # 3. set the mode to gray-one2many to generate extra positive samples for each context in the train set; this mode has the same requirements as the gray work mode
    ./scripts/inference.sh <dataset_name> gray-one2many <cuda_ids>
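The idea behind the gray work mode (encode the contexts, search the response index, keep the top-k hits as hard negatives) can be illustrated with a toy example. This is a sketch under stated assumptions, not the toolkit's implementation: a brute-force inner-product search in plain Python stands in for the faiss index, the function name `mine_hard_negatives` is hypothetical, and excluding the ground-truth response from the candidates is an assumption.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mine_hard_negatives(context_vecs, response_vecs, gold_ids, topk=2):
    """For each context, return the ids of the top-k highest-scoring
    responses that are not its ground-truth response.

    Brute-force search stands in for the faiss index here.
    """
    hard_negatives = []
    for ctx, gold in zip(context_vecs, gold_ids):
        scores = [dot(ctx, resp) for resp in response_vecs]
        # Rank response ids from best to worst match.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        # Assumption: the gold response is skipped so it is never used as a negative.
        hard_negatives.append([i for i in ranked if i != gold][:topk])
    return hard_negatives


# Toy embeddings: 2 contexts, 4 responses, dimension 2.
responses = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
contexts = [[1.0, 0.0], [0.0, 1.0]]
print(mine_hard_negatives(contexts, responses, gold_ids=[0, 2], topk=1))
# prints [[1], [3]]
```

Each context gets the response that is closest to it without being its gold response, which is what makes the sample a "hard" negative rather than a random one.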

    To generate the pseudo positive pairs, run the following command:

    # make sure the dual-bert inference dataset name is BERTDualInferenceDataset
    ./scripts/inference.sh <dataset_name> unparallel <cuda_ids>
  6. Deploy the rerank and recall models

    # load the model on cuda:0 (can be changed in the deploy.sh script)
    ./scripts/deploy.sh <cuda_id>

    While the model is deployed, you can test it with:

    # test_mode: recall, rerank, pipeline
    ./scripts/test_api.sh <test_mode> <dataset>
  7. Test the recall performance of Elasticsearch

    Before testing the es recall, make sure the es index has been built:

    # recall_mode: q-q/q-r
    ./scripts/build_es_index.sh <dataset_name> <recall_mode>
    # recall_mode: q-q/q-r
    ./scripts/test_es_recall.sh <dataset_name> <recall_mode> 0
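The two recall modes can be read as follows: q-q matches the query against stored contexts and returns the responses paired with them, while q-r matches the query directly against stored responses. The sketch below illustrates that reading only. It does not use Elasticsearch; a crude token-overlap score stands in for the search engine's scoring, and the names `overlap_score` and `recall` are hypothetical.

```python
def overlap_score(query, text):
    """Count shared whitespace-separated tokens (a crude stand-in for ES scoring)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def recall(query, pairs, recall_mode="q-q", topk=1):
    """Return the top-k responses for `query` from (context, response) pairs.

    q-q: score the query against stored contexts, return their responses.
    q-r: score the query against the stored responses directly.
    """
    if recall_mode not in ("q-q", "q-r"):
        raise ValueError(f"unknown recall_mode: {recall_mode}")
    field = 0 if recall_mode == "q-q" else 1
    ranked = sorted(
        pairs, key=lambda pair: overlap_score(query, pair[field]), reverse=True
    )
    return [response for _, response in ranked[:topk]]


pairs = [
    ("how do i install faiss", "use pip or conda"),
    ("what is the weather today", "it is sunny today"),
]
print(recall("how do i install it", pairs, recall_mode="q-q"))  # ['use pip or conda']
print(recall("how do i install it", pairs, recall_mode="q-r"))  # ['it is sunny today']
```

The same query retrieves different responses in the two modes, which is why the index must be built with the same recall_mode that is later used for testing.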
  8. Generate the gray responses with simcse

    # train the simcse model
    ./script/train.sh <dataset_name> simcse <cuda_ids>
    # generate the faiss index, dataset name: BERTSimCSEInferenceDataset
    ./script/inference_response.sh <dataset_name> simcse <cuda_ids>
    # generate the context index
    ./script/inference_simcse_response.sh <dataset_name> simcse <cuda_ids>
    # generate the test set for unlikelyhood-gen dataset
    ./script/inference_simcse_unlikelyhood_response.sh <dataset_name> simcse <cuda_ids>
    # generate the gray response
    ./script/inference_gray_simcse.sh <dataset_name> simcse <cuda_ids>
    # generate the test set for unlikelyhood-gen dataset
    ./script/inference_gray_simcse_unlikelyhood.sh <dataset_name> simcse <cuda_ids>
Owner
GMFTBY