A general-purpose repo that helps you build fast, effective NLP classifiers with Hugging Face

Last update: Mar 11, 2022

NLP Classifier

Introduction

This project trains a BERT model on any NLP classification dataset and then uses the trained model to make predictions on new data with batch_inference.py. The architecture can easily be extended to cover many more models.
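The batch prediction step can be pictured with a minimal sketch. This is not the code in batch_inference.py; it only illustrates splitting new texts into fixed-size batches and collecting one label per text. The names iter_batches, predict_in_batches and toy_predict are hypothetical, and toy_predict is a stand-in for a forward pass through the fine-tuned BERT model.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_in_batches(
    texts: Sequence[str],
    predict_fn: Callable[[Sequence[str]], List[int]],
    batch_size: int = 32,
) -> List[int]:
    """Run `predict_fn` on each batch and concatenate the labels in input order."""
    labels: List[int] = []
    for batch in iter_batches(texts, batch_size):
        labels.extend(predict_fn(batch))
    return labels


def toy_predict(batch: Sequence[str]) -> List[int]:
    """Hypothetical stand-in for a model: label 1 if the text contains "good"."""
    return [int("good" in text.lower()) for text in batch]


if __name__ == "__main__":
    sample = ["Good movie", "Bad plot", "so good"]
    print(predict_in_batches(sample, toy_predict, batch_size=2))
```

Passing the model as a plain callable keeps the batching logic independent of any particular model, which is one way an architecture like this could be extended to other classifiers.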

Installation

Set up

$ git clone https://github.com/abdullahtarek/nlp_classifier.git
$ cd nlp_classifier
Move train.csv and test.csv into the data folder.

Python

$ pip install -r requirements.txt
Copy the training or testing dataset into the "data" folder, then run one of the following:
$ python training.py
$ python batch_inference.py

Docker

$ docker build . -t nlp_classifier
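Run one of the following, where $DATA_FOLDER and $LOCAL_SAVED_MODEL_FOLDER are the local folders to mount into the container:
$ docker run -it -v $DATA_FOLDER:/app/data -v $LOCAL_SAVED_MODEL_FOLDER:/app/saved_models nlp_classifier python training.py
$ docker run -it -v $DATA_FOLDER:/app/data -v $LOCAL_SAVED_MODEL_FOLDER:/app/saved_models nlp_classifier python batch_inference.py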

Extra options

Managing Configurations

All configuration lives in the conf folder, where you can change the data path, the model path, and other settings.
You can also override a setting by passing a flag when you run a script. Append --help to the python command to see which settings you can change. Example: python3 batch_inference.py --help.
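To illustrate the idea of defaults that a command-line flag can override, here is a minimal sketch using only argparse from the standard library. It is an assumption about the general pattern, not the project's actual configuration code: the DEFAULTS dictionary, its keys and values, and the resolve_config function are all hypothetical, and the real settings are defined in the conf folder.

```python
import argparse
from typing import Dict, List, Optional

# Hypothetical defaults; the real keys and values live in the conf folder.
DEFAULTS: Dict[str, str] = {
    "data_path": "data/test.csv",
    "model_path": "saved_models",
}


def resolve_config(argv: Optional[List[str]] = None) -> Dict[str, str]:
    """Return the defaults with any command-line values applied on top."""
    parser = argparse.ArgumentParser(description="Override configuration values")
    for key, value in DEFAULTS.items():
        # Each configuration key becomes a flag such as --data_path.
        parser.add_argument(f"--{key}", default=value, help=f"default: {value}")
    return vars(parser.parse_args(argv))


if __name__ == "__main__":
    # With no arguments this prints the defaults; --help lists every flag.
    print(resolve_config())
```

In this sketch, running the file with --model_path some/other/folder would change only that one value and leave the other defaults in place.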

Owner

Abdullah Tarek
