The Easy-to-use Dialogue Response Selection Toolkit for Researchers

Overview

Easy-to-use toolkit for retrieval-based Chatbot

Our released data can be found at this link. Make sure the following steps are adopted to use our codes.

How to Use

  1. Init the repo

    Before using the repo, please run the following command to init:

    # create the necessay folders
    python init.py
    
    # prepare the environment
    # if some package cannot be installed, just google and install it from other ways
    pip install -r requirements.txt
  2. train the model

    ./scripts/train.sh <dataset_name> <model_name> <cuda_ids>
  3. test the model [rerank]

    ./scripts/test_rerank.sh <dataset_name> <model_name> <cuda_id>
  4. test the model [recal]

    # different recall_modes are available: q-q, q-r
    ./scripts/test_recall.sh <dataset_name> <model_name> <cuda_id>
  5. inference the responses and save into the faiss index

    Somethings inference will missing data samples, please use the 1 gpu (faiss-gpu search use 1 gpu quickly)

    It should be noted that: 1. For writer dataset, use extract_inference.py script to generate the inference.txt 2. For other datasets(douban, ecommerce, ubuntu), just cp train.txt inference.txt. The dataloader will automatically read the test.txt to supply the corpus.

    # work_mode=response, inference the response and save into faiss (for q-r matching) [dual-bert/dual-bert-fusion]
    # work_mode=context, inference the context to do q-q matching
    # work_mode=gray, inference the context; read the faiss(work_mode=response has already been done), search the topk hard negative samples; remember to set the BERTDualInferenceContextDataloader in config/base.yaml
    ./scripts/inference.sh <dataset_name> <model_name> <cuda_ids>

    If you want to generate the gray dataset for the dataset:

    # 1. set the mode as the **response**, to generate the response faiss index; corresponding dataset name: BERTDualInferenceDataset;
    ./scripts/inference.sh <dataset_name> response <cuda_ids>
    
    # 2. set the mode as the **gray**, to inference the context in the train.txt and search the top-k candidates as the gray(hard negative) samples; corresponding dataset name: BERTDualInferenceContextDataset
    ./scripts/inference.sh <dataset_name> gray <cuda_ids>
    
    # 3. set the mode as the **gray-one2many** if you want to generate the extra positive samples for each context in the train set, the needings of this mode is the same as the **gray** work mode
    ./scripts/inference.sh <dataset_name> gray-one2many <cuda_ids>

    If you want to generate the pesudo positive pairs, run the following commands:

    # make sure the dual-bert inference dataset name is BERTDualInferenceDataset
    ./scripts/inference.sh <dataset_name> unparallel <cuda_ids>
  6. deploy the rerank and recall model

    # load the model on the cuda:0(can be changed in deploy.sh script)
    ./scripts/deploy.sh <cuda_id>

    at the same time, you can test the deployed model by using:

    # test_mode: recall, rerank, pipeline
    ./scripts/test_api.sh <test_mode> <dataset>
  7. test the recall performance of the elasticsearch

    Before testing the es recall, make sure the es index has been built:

    # recall_mode: q-q/q-r
    ./scripts/build_es_index.sh <dataset_name> <recall_mode>
    # recall_mode: q-q/q-r
    ./scripts/test_es_recall.sh <dataset_name> <recall_mode> 0
  8. simcse generate the gray responses

    # train the simcse model
    ./script/train.sh <dataset_name> simcse <cuda_ids>
    # generate the faiss index, dataset name: BERTSimCSEInferenceDataset
    ./script/inference_response.sh <dataset_name> simcse <cuda_ids>
    # generate the context index
    ./script/inference_simcse_response.sh <dataset_name> simcse <cuda_ids>
    # generate the test set for unlikelyhood-gen dataset
    ./script/inference_simcse_unlikelyhood_response.sh <dataset_name> simcse <cuda_ids>
    # generate the gray response
    ./script/inference_gray_simcse.sh <dataset_name> simcse <cuda_ids>
    # generate the test set for unlikelyhood-gen dataset
    ./script/inference_gray_simcse_unlikelyhood.sh <dataset_name> simcse <cuda_ids>
Owner
GMFTBY
Those who are crazy enough to think they can change the world are the ones who can.
GMFTBY
自用直播源集合,附带检测与分类功能。

myiptv 自用直播源集合,附带检测与分类功能。 为啥搞 TLDR: 太闲了。 自己有收集直播源的爱好,和录制直播源的需求。 一些软件自带的直播源太过难用。 网上现有的直播源太杂,且缺乏检测。 一些大源缺乏持续更新,如 iptv-org。 使用指南与 TODO 每次进行大更新后都会进行一次 rel

abc1763613206 171 Dec 11, 2022
A telegram mirror bot with an integrated RSS feed reader.

About What is this repo? This is a slightly modified fork which includes some extra features & memes added to my liking. How's this different from the

11 May 15, 2022
Python client for the iNaturalist APIs

pyinaturalist Introduction iNaturalist is a community science platform that helps people get involved in the natural world by observing and identifyin

Nicolas Noé 79 Dec 22, 2022
𝐀 𝐦𝐨𝐝𝐮𝐥𝐚𝐫 𝐓𝐞𝐥𝐞𝐠𝐫𝐚𝐦 𝐆𝐫𝐨𝐮𝐩 𝐦𝐚𝐧𝐚𝐠𝐞𝐦𝐞𝐧𝐭 𝐛𝐨𝐭 𝐰𝐢𝐭𝐡 𝐮𝐥𝐭𝐢𝐦𝐚𝐭𝐞 𝐟𝐞𝐚𝐭𝐮𝐫𝐞𝐬

𝐇𝐨𝐰 𝐓𝐨 𝐃𝐞𝐩𝐥𝐨𝐲 For easiest way to deploy this Bot click on the below button 𝐌𝐚𝐝𝐞 𝐁𝐲 𝐒𝐮𝐩𝐩𝐨𝐫𝐭 𝐆𝐫𝐨𝐮𝐩 𝐒𝐨𝐮𝐫𝐜𝐞𝐬 𝐆𝐞𝐧𝐞?

Mukesh Solanki 2 Oct 06, 2021
This Mirror Bot is a multipurpose Telegram Bot writen in Python for mirroring files on the Internet to our beloved Google Drive.

MIRROR HUNTER This Mirror Bot is a multipurpose Telegram Bot writen in Python for mirroring files on the Internet to our beloved Google Drive. Repo la

anime republic 130 May 28, 2022
A Superfast SMS & Call bomber for Linux And Termux

PSKR_BOMBER 💣 📱 💀 A Superfast SMS & Call bomber for Linux And Termux ! Disclaimer This tool is for educational purposes only ! Don't use this to ta

1 Dec 20, 2021
Automatically deploy freqtrade to a remote Docker host and auto update strategies.

Freqtrade Automatically deploy freqtrade to a remote Docker host and auto update strategies. I've been using it to automatically deploy to vultr, but

p-zombie 109 Jan 07, 2023
SickNerd aims to slowly enumerate Google Dorks via the googlesearch API then requests found pages for metadata

CLI tool for making Google Dorking a passive recon experience. With the ability to fetch and filter dorks from GHDB.

Jake Wnuk 21 Jan 02, 2023
Aria & Qbittorent Mirror Bot

Eunha Mirror Eunha Mirror is a multipurpose Telegram Bot writen in Python for mirroring files on the Internet to our beloved Google Drive. Features su

ovin 158 Dec 19, 2022
The easiest way to deploy this Bot

How To Host The easiest way to deploy this Bot Update Channe

Isekai Reszz 1 Jan 23, 2022
A Telegram bot to transcribe audio, video and image into text.

Transcriber Bot A Telegram bot to transcribe audio, video and image into text. Deploy to Heroku Local Deploying Install the FFmpeg. Make sure you have

10 Dec 19, 2022
A discord bot wrapper for python have slash command

A discord bot wrapper for python have slash command

4 Dec 04, 2021
DragDev Maintained Instance Of discord.py

discord.py - DragDev Flavour A modern, easy to use, feature-rich, and async ready API wrapper for Discord written in Python. The Future of discord.py

DragDev Studios 3 Aug 27, 2022
Python SCript to scrape members from a selected Telegram group.

A python script to scrape all the members in a telegram group anad save in a CSV file. REGESTRING Go to this link https://core.telegram.org/api/obtain

Gurjeet Singh 7 Dec 01, 2022
Hacktoberfest2021 - Submit Just 4 PRs to earn SWAGS and Tshirts🔥

dont contribute in this repo, contribute only in below mentioned repo Special Note For Everyone ''' always make more then 4 pull request lets you have

Keshav Singh 820 Jan 02, 2023
A unified API wrapper for YouTube and Twitch chat bots.

Chatto A unified API wrapper for YouTube and Twitch chat bots. Contributing Chatto is open to contributions. To find out where to get started, have a

Ethan Henderson 5 Aug 01, 2022
Cleiton Leonel 4 Apr 22, 2022
A Telegram bot to all media and documents files to web link .

FileStreamBot A Telegram bot to all media and documents files to web link . Report a Bug | Request Feature 🍁 About This Bot : This bot will give you

Code X Mania 129 Jan 03, 2023
Aria/qBittorrent Telegram mirror/leech bot.

Missneha Mirror Leech Bot Aria/qBittorrent Telegram mirror/leech bot. missneha Mirror Leech Bot is a multipurpose Telegram Bot written in Python for m

ACHAL 6 Sep 30, 2022
Monitoring plugin for MikroTik devices

check_routeros - Monitoring MikroTik devices This is a monitoring plugin for Icinga, Nagios and other compatible monitoring solutions to check MikroTi

DinoTools 6 Dec 24, 2022