Python Markov Chain chatbot running on Telegram

Overview

Hanasubot

Hanasubot (Japanese 話すボット, "talking bot") is a Python chatbot running on Telegram. The bot is based on Markov chains, so it learns your words instantly, unlike neural network chatbots, which require training. It uses a modified version of the markovify library for that purpose. However, its output may make no sense at all, though it can sometimes generate hilarious replies.
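A Markov chain can pick up new words instantly because learning a sentence only updates transition counts, and nothing has to be retrained. The sketch below is a minimal first-order word chain in plain Python, not Hanasubot's modified markovify code. The class name `TinyChain`, the state size of one word, and the `__BEGIN__`/`__END__` sentinels are illustrative assumptions.

```python
import random
from collections import defaultdict

BEGIN, END = "__BEGIN__", "__END__"  # hypothetical sentence-boundary markers


class TinyChain:
    """Minimal first-order word Markov chain (illustrative, not markovify)."""

    def __init__(self):
        # transitions[prev_word][next_word] = how often next_word followed prev_word
        self.transitions = defaultdict(lambda: defaultdict(int))

    def learn(self, words):
        """Add one tokenized sentence; it is usable immediately, with no training pass."""
        chain = [BEGIN] + list(words) + [END]
        for prev, nxt in zip(chain, chain[1:]):
            self.transitions[prev][nxt] += 1

    def generate(self, max_words=20, rng=random):
        """Random walk from BEGIN, choosing each next word in proportion to its count."""
        out, state = [], BEGIN
        while len(out) < max_words:
            options = self.transitions.get(state)  # .get() does not create empty entries
            if not options:
                break
            candidates = list(options)
            state = rng.choices(candidates, weights=[options[w] for w in candidates])[0]
            if state == END:
                break
            out.append(state)
        return out
```

After `chain.learn("the cat sat on the mat".split())`, `chain.generate()` walks the counts from the start marker, so it may reproduce the line or loop through it ("the cat sat on the cat ...") until it reaches `__END__` or the word limit. markovify uses a configurable state size (two words by default), which gives more coherent output at the cost of needing more corpus.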

In theory, the bot can learn any language, but some languages require word segmentation. The bot currently supports Chinese and Japanese word segmentation with pkuseg (Simplified Chinese), CkipTagger (Traditional Chinese) and MeCab (Japanese). Language detection relies on pycld2.
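Presumably each message is first run through language detection, then segmented, then learned. The sketch below illustrates only the routing step and is not the bot's code: because pycld2, pkuseg, CkipTagger and MeCab are external dependencies, it swaps in a crude Unicode-script heuristic as a stand-in for pycld2 and a per-character split as a placeholder for the real word segmenters. The names `guess_language`, `TOKENIZERS` and `tokenize` are hypothetical.

```python
import unicodedata
from typing import Callable, Dict, List


def guess_language(text: str) -> str:
    """Crude stand-in for pycld2: classify text by the Unicode script of its characters."""
    has_kana = has_han = False
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith(("HIRAGANA", "KATAKANA")):
            has_kana = True
        elif name.startswith("CJK UNIFIED IDEOGRAPH"):
            has_han = True
    if has_kana:
        return "ja"  # kana only occur in Japanese
    if has_han:
        return "zh"  # cannot tell Simplified from Traditional this way
    return "other"


def char_split(text: str) -> List[str]:
    """Placeholder segmenter: one token per non-space character (real segmenters return words)."""
    return [ch for ch in text if not ch.isspace()]


# Hypothetical registry; the bot would plug MeCab, pkuseg or CkipTagger in here.
TOKENIZERS: Dict[str, Callable[[str], List[str]]] = {
    "ja": char_split,
    "zh": char_split,
}


def tokenize(text: str) -> List[str]:
    """Route text to a language-specific tokenizer, falling back to whitespace splitting."""
    return TOKENIZERS.get(guess_language(text), str.split)(text)
```

Text in space-delimited languages falls through to `str.split`, while Japanese or Chinese text goes to the registered segmenter. A script heuristic like this cannot tell Simplified from Traditional Chinese, which matters here because the two are routed to different segmenters (pkuseg and CkipTagger).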

Hanasubot has a permission system, so you can easily stop the bot from learning from naughty kids in your group while still letting it reply to them. Users with admin rights can also erase lines from the bot's corpus.

The bot is designed for Chinese Telegram groups, so many of its messages are written in Chinese. Internationalization (i18n) is planned for the future, and any help is welcome.

Installation

Python 3.6+ is required.

VENV_PATH=/path/to/your/venv  # Change this
python3 -m venv $VENV_PATH
source $VENV_PATH/bin/activate

pip3 install -r requirements.txt

If you are using Python 3.6, dataclasses 0.8 is required as well:

pip3 install dataclasses==0.8

Python 3.7 and later include dataclasses in the standard library, so there is no need to install it.

To use CkipTagger for Traditional Chinese tokenization, you need to download its model files (see the CkipTagger README for a detailed guide):

python3 -c "from ckiptagger import data_utils; data_utils.download_data_gdown('./')"

Then unzip them into a folder named ckipdata in the same directory as the Python scripts.

Optionally, you can initialize the user dictionaries for pkuseg and CkipTagger before starting the bot:

touch ./pkuseg_dict.txt
touch ./ckip_dict.json

Configuration

Copy config.example.py to config.py and fill it out. Please read the comments in the config file.

cp config.example.py config.py

After that, simply start the bot:

python3 tgbot.py

Bot commands and usage

Simply reply to the bot and it will say some random words, provided it has collected enough corpus. The bot also learns from your message instantly. The special commands are as follows.

Require root

  • /reload_config - Reload the config file without restarting the bot. Some entries cannot be reloaded dynamically, though; see config.example.py for details.
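Since the config is a Python file, reloading it at runtime can be done with the standard library's `importlib.reload`. The sketch below assumes settings are upper-case module-level names (an assumption, not necessarily how config.example.py is organized), and `reload_config` is a hypothetical helper. It also hints at why some entries cannot be reloaded: a value read once at startup, such as a token used to open the Telegram connection, stays baked into objects that were already created.

```python
import importlib
import types
from typing import Any, Dict, Tuple


def reload_config(module: types.ModuleType) -> Dict[str, Tuple[Any, Any]]:
    """Re-execute a config module in place; return {name: (old, new)} for changed settings."""
    before = {k: v for k, v in vars(module).items() if k.isupper()}
    # reload() re-runs the source in the module's existing namespace, so names
    # deleted from the file keep their old values instead of disappearing.
    importlib.reload(module)
    after = {k: v for k, v in vars(module).items() if k.isupper()}
    return {
        k: (before.get(k), after.get(k))
        for k in before.keys() | after.keys()
        if before.get(k) != after.get(k)
    }
```

In a bot, a command handler might call `reload_config(config)` after `import config` and report the changed names back to the chat.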

Require admin

  • /erase - Remove lines from the corpus. (Non-admins can only erase lines they sent themselves.)
  • /userweight - Set user weight.
  • /ban - Set user right to -1.
  • /restrict - Set user right to 1.
  • /grantnormal - Set user right to 2.
  • /granttrusted - Set user right to 3.
  • /grantadmin - Set user right to 4. Admins are able to add/remove other admins with the above commands. See also the User right levels section.

Require trusted

  • /addword_cn - Add a word to the pkuseg user dictionary.
  • /addword_tw - Add a word to the CkipTagger user dictionary.
  • /rmword_cn - Remove a word from the pkuseg user dictionary.
  • /rmword_tw - Remove a word from the CkipTagger user dictionary.
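These commands edit the user dictionary files created with touch during installation. The sketch below shows one way such edits could be persisted. It assumes pkuseg_dict.txt holds one word per line (the format pkuseg's user_dict file option reads) and ckip_dict.json holds a JSON object mapping each word to a weight (the shape CkipTagger's `construct_dictionary` accepts). All helper names and the default weight of 1.0 are hypothetical. Note that a freshly touched JSON file is empty, which is not valid JSON, so the loader treats it as an empty dictionary.

```python
import json
from pathlib import Path
from typing import Dict, List

# --- CkipTagger: ckip_dict.json, assumed to be {"word": weight, ...} ---


def load_ckip_dict(path: Path) -> Dict[str, float]:
    """Read the word -> weight map; a missing or empty file (fresh from touch) counts as {}."""
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    return json.loads(text) if text.strip() else {}


def save_ckip_dict(path: Path, words: Dict[str, float]) -> None:
    # ensure_ascii=False keeps Chinese characters human-readable in the file
    path.write_text(json.dumps(words, ensure_ascii=False, indent=2), encoding="utf-8")


def add_word_tw(path: Path, word: str, weight: float = 1.0) -> None:
    """Hypothetical /addword_tw backend: add or update one word."""
    words = load_ckip_dict(path)
    words[word] = weight
    save_ckip_dict(path, words)


def rm_word_tw(path: Path, word: str) -> bool:
    """Hypothetical /rmword_tw backend; returns False if the word was not present."""
    words = load_ckip_dict(path)
    if word not in words:
        return False
    del words[word]
    save_ckip_dict(path, words)
    return True

# --- pkuseg: pkuseg_dict.txt, assumed to hold one word per line ---


def load_pkuseg_dict(path: Path) -> List[str]:
    if not path.exists():
        return []
    return [w for w in path.read_text(encoding="utf-8").splitlines() if w.strip()]


def save_pkuseg_dict(path: Path, words: List[str]) -> None:
    path.write_text("".join(w + "\n" for w in words), encoding="utf-8")


def add_word_cn(path: Path, word: str) -> None:
    """Hypothetical /addword_cn backend: append the word unless it is already listed."""
    words = load_pkuseg_dict(path)
    if word not in words:
        words.append(word)
        save_pkuseg_dict(path, words)


def rm_word_cn(path: Path, word: str) -> bool:
    """Hypothetical /rmword_cn backend; returns False if the word was not present."""
    words = load_pkuseg_dict(path)
    if word not in words:
        return False
    save_pkuseg_dict(path, [w for w in words if w != word])
    return True
```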

Other commands

  • /clddbg - Test language detection on some text.
  • /cutdbg - Test tokenization on some text.
  • /policy - See what data the bot collects, along with related policies.
  • /reload - Claim your bot admin rights after becoming a Telegram group admin.
  • /source - See the source code.
  • /start - Start chatting; useful when you can't find a bot message to reply to.

Database

Initialize

CREATE TABLE IF NOT EXISTS chat(
    chat_id integer PRIMARY KEY,
    chat_tgid integer NOT NULL UNIQUE,
    chat_name text
);
CREATE TABLE IF NOT EXISTS user(
    user_id integer PRIMARY KEY,
    user_tgid integer NOT NULL UNIQUE,
    user_name text,
    user_right integer DEFAULT 2,
    user_weight real DEFAULT 1.0
);
CREATE TABLE IF NOT EXISTS corpus(
    corpus_id integer PRIMARY KEY,
    corpus_time integer,
    corpus_line text NOT NULL UNIQUE,
    corpus_raw integer REFERENCES raw,
    corpus_chat integer REFERENCES chat,
    corpus_user integer REFERENCES user,
    corpus_weight real DEFAULT 1.0
);
CREATE TABLE IF NOT EXISTS raw(
    raw_id integer PRIMARY KEY,
    raw_text text UNIQUE
);
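The schema reads as SQLite (integer PRIMARY KEY row IDs, bare REFERENCES clauses), though the engine is not named here. Assuming SQLite, the sketch below uses Python's built-in sqlite3 module to store one learned line and read lines back with an effective weight. Two points are assumptions rather than documented behavior: that corpus_line holds the segmented line while raw holds the original message text, and that corpus_weight and user_weight combine by multiplication. The helper names are hypothetical.

```python
import sqlite3
import time
from typing import List, Optional, Tuple


def ensure_chat(conn: sqlite3.Connection, chat_tgid: int, name: Optional[str] = None) -> int:
    """Return the internal chat_id for a Telegram chat ID, inserting the row if new."""
    conn.execute("INSERT OR IGNORE INTO chat(chat_tgid, chat_name) VALUES (?, ?)", (chat_tgid, name))
    return conn.execute("SELECT chat_id FROM chat WHERE chat_tgid = ?", (chat_tgid,)).fetchone()[0]


def ensure_user(conn: sqlite3.Connection, user_tgid: int, name: Optional[str] = None) -> int:
    """Return the internal user_id; new users get the column defaults (right 2, weight 1.0)."""
    conn.execute("INSERT OR IGNORE INTO user(user_tgid, user_name) VALUES (?, ?)", (user_tgid, name))
    return conn.execute("SELECT user_id FROM user WHERE user_tgid = ?", (user_tgid,)).fetchone()[0]


def save_line(conn: sqlite3.Connection, chat_tgid: int, user_tgid: int, line: str, raw_text: str) -> bool:
    """Store one line; returns False if it already exists (corpus_line is UNIQUE)."""
    chat_id = ensure_chat(conn, chat_tgid)
    user_id = ensure_user(conn, user_tgid)
    conn.execute("INSERT OR IGNORE INTO raw(raw_text) VALUES (?)", (raw_text,))
    raw_id = conn.execute("SELECT raw_id FROM raw WHERE raw_text = ?", (raw_text,)).fetchone()[0]
    cur = conn.execute(
        "INSERT OR IGNORE INTO corpus(corpus_time, corpus_line, corpus_raw, corpus_chat, corpus_user)"
        " VALUES (?, ?, ?, ?, ?)",
        (int(time.time()), line, raw_id, chat_id, user_id),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 when the UNIQUE constraint made the insert a no-op


def weighted_lines(conn: sqlite3.Connection) -> List[Tuple[str, float]]:
    """Corpus lines with a hypothetical effective weight of corpus_weight * user_weight."""
    return conn.execute(
        "SELECT corpus_line, corpus_weight * user_weight FROM corpus"
        " JOIN user ON corpus_user = user_id"
    ).fetchall()
```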

User right levels

  • 5 - root.
  • 4 - admin, can change user rights (except for root users), can erase lines from the corpus, and can set user_weight and corpus_weight (WIP).
  • 3 - trusted user, can feed the bot via private messages and can add words to the dictionaries (for tokenization purposes).
  • 2 - normal user.
  • 1 - restricted user, the bot will not write their messages to the database.
  • -1 - banned user, the bot will not reply to their messages.
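The levels above, together with the command descriptions, imply a handful of simple checks. The sketch below expresses them as plain functions. It assumes rights are cumulative and compared numerically (so a banned user is not learned from either), and that admins cannot grant rights above admin, since the grant commands stop at /grantadmin. The `Right` enum and all function names are hypothetical, and the bot's actual checks may differ in detail.

```python
from enum import IntEnum


class Right(IntEnum):
    """User right levels as listed above (names are illustrative)."""
    BANNED = -1
    RESTRICTED = 1
    NORMAL = 2
    TRUSTED = 3
    ADMIN = 4
    ROOT = 5


def should_reply(right: int) -> bool:
    """Banned users get no replies; everyone else does."""
    return right > Right.BANNED


def should_learn(right: int) -> bool:
    """Restricted (and, assumed, banned) users' messages are not written to the database."""
    return right >= Right.NORMAL


def can_feed_privately(right: int) -> bool:
    """Trusted users and above can teach the bot via private messages."""
    return right >= Right.TRUSTED


def can_edit_dictionary(right: int) -> bool:
    """Trusted users and above can use /addword_* and /rmword_*."""
    return right >= Right.TRUSTED


def can_erase(right: int, actor_id: int, line_owner_id: int) -> bool:
    """Admins may erase any line; everyone else only lines they sent themselves."""
    return right >= Right.ADMIN or actor_id == line_owner_id


def can_set_right(actor: int, target: int, new: int) -> bool:
    """Admins may change anyone's right except root users, up to admin level."""
    return actor >= Right.ADMIN and target < Right.ROOT and new <= Right.ADMIN
```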

TODOs

  • Let admins set corpus_weight
  • Batch /erase

License

MIT
