Repository for Project Insight: NLP as a Service

Overview

Project Insight

NLP as a Service

Project Insight

GitHub issues GitHub forks Github Stars GitHub license Code style: black

Contents

  1. Introduction
  2. Installation
  3. Project Details
  4. License

Introduction

Project Insight is designed to create NLP as a service with code base for both front end GUI (streamlit) and backend server (FastApi) the usage of transformers models on various downstream NLP task.

The downstream NLP tasks covered:

  • News Classification

  • Entity Recognition

  • Sentiment Analysis

  • Summarization

  • Information Extraction To Do

The user can select different models from the drop down to run the inference.

The users can also directly use the backend fastapi server to have a command line inference.

Features of the solution

  • Python Code Base: Built using Fastapi and Streamlit making the complete code base in Python.
  • Expandable: The backend is desinged in a way that it can be expanded with more Transformer based models and it will be available in the front end app automatically.
  • Micro-Services: The backend is designed with a microservices architecture, with dockerfile for each service and leveraging on Nginx as a reverse proxy to each independently running service.
    • This makes it easy to update, manitain, start, stop individual NLP services.

Installation

  • Clone the Repo.
  • Run the Docker Compose to spin up the Fastapi based backend service.
  • Run the Streamlit app with the streamlit run command.

Setup and Documentation

  1. Download the models

    • Download the models from here
    • Save them in the specific model folders inside the src_fastapi folder.
  2. Running the backend service.

    • Go to the src_fastapi folder
    • Run the Docker Compose comnand
    $ cd src_fastapi
    src_fastapi:~$ sudo docker-compose up -d
  3. Running the frontend app.

    • Go to the src_streamlit folder
    • Run the app with the streamlit run command
    $ cd src_streamlit
    src_streamlit:~$ streamlit run NLPfily.py
  4. Access to Fastapi Documentation: Since this is a microservice based design, every NLP task has its own seperate documentation

Project Details

Demonstration

Project Insight Demo

Directory Details

  • Front End: Front end code is in the src_streamlit folder. Along with the Dockerfile and requirements.txt

  • Back End: Back End code is in the src_fastapi folder.

    • This folder contains directory for each task: Classification, ner, summary...etc
    • Each NLP task has been implemented as a microservice, with its own fastapi server and requirements and Dockerfile so that they can be independently mantained and managed.
    • Each NLP task has its own folder and within each folder each trained model has 1 folder each. For example:
    - sentiment
        > app
            > api
                > distilbert
                    - model.bin
                    - network.py
                    - tokeniser files
                >roberta
                    - model.bin
                    - network.py
                    - tokeniser files
    
    • For each new model under each service a new folder will have to be added.

    • Each folder model will need the following files:

      • Model bin file.
      • Tokenizer files
      • network.py Defining the class of the model if customised model used.
    • config.json: This file contains the details of the models in the backend and the dataset they are trained on.

How to Add a new Model

  1. Fine Tune a transformer model for specific task. You can leverage the transformers-tutorials

  2. Save the model files, tokenizer files and also create a network.py script if using a customized training network.

  3. Create a directory within the NLP task with directory_name as the model name and save all the files in this directory.

  4. Update the config.json with the model details and dataset details.

  5. Update the <service>pro.py with the correct imports and conditions where the model is imported. For example for a new Bert model in Classification Task, do the following:

    • Create a new directory in classification/app/api/. Directory name bert.

    • Update config.json with following:

      "classification": {
      "model-1": {
          "name": "DistilBERT",
          "info": "This model is trained on News Aggregator Dataset from UC Irvin Machine Learning Repository. The news headlines are classified into 4 categories: **Business**, **Science and Technology**, **Entertainment**, **Health**. [New Dataset](https://archive.ics.uci.edu/ml/datasets/News+Aggregator)"
      },
      "model-2": {
          "name": "BERT",
          "info": "Model Info"
      }
      }
    • Update classificationpro.py with the following snippets:

      Only if customized class used

      from classification.bert import BertClass

      Section where the model is selected

      if model == "bert":
          self.model = BertClass()
          self.tokenizer = BertTokenizerFast.from_pretrained(self.path)

License

This project is licensed under the GPL-3.0 License - see the LICENSE.md file for details

Owner
Abhishek Kumar Mishra
Eat, Sleep, Pray, and Code * An Operations Innovation Lead at IHS Markit during working hours. * Love to read manga and cook new cuisines.
Abhishek Kumar Mishra
Demo programs for the Talking Head Anime from a Single Image 2: More Expressive project.

Demo Code for "Talking Head Anime from a Single Image 2: More Expressive" This repository contains demo programs for the Talking Head Anime

Pramook Khungurn 901 Jan 06, 2023
Ray-based parallel data preprocessing for NLP and ML.

Wrangl Ray-based parallel data preprocessing for NLP and ML. pip install wrangl # for latest pip install git+https://github.com/vzhong/wrangl See exa

Victor Zhong 33 Dec 27, 2022
AEC_DeepModel - Deep learning based acoustic echo cancellation baseline code

AEC_DeepModel - Deep learning based acoustic echo cancellation baseline code

ε‡Œι€†ζˆ˜ 75 Dec 05, 2022
Predicting the usefulness of reviews given the review text and metadata surrounding the reviews.

Predicting Yelp Review Quality Table of Contents Introduction Motivation Goal and Central Questions The Data Data Storage and ETL EDA Data Pipeline Da

Jeff Johannsen 3 Nov 27, 2022
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

MMF is a modular framework for vision and language multimodal research from Facebook AI Research. MMF contains reference implementations of state-of-t

Facebook Research 5.1k Dec 26, 2022
:mag: Transformers at scale for question answering & neural search. Using NLP via a modular Retriever-Reader-Pipeline. Supporting DPR, Elasticsearch, HuggingFace's Modelhub...

Haystack is an end-to-end framework that enables you to build powerful and production-ready pipelines for different search use cases. Whether you want

deepset 6.4k Jan 09, 2023
A high-level Python library for Quantum Natural Language Processing

lambeq About lambeq is a toolkit for quantum natural language processing (QNLP). Documentation: https://cqcl.github.io/lambeq/ Getting started Prerequ

Cambridge Quantum 315 Jan 01, 2023
A deep learning-based translation library built on Huggingface transformers

DL Translate A deep learning-based translation library built on Huggingface transformers and Facebook's mBART-Large πŸ’» GitHub Repository πŸ“š Documentat

Xing Han Lu 244 Dec 30, 2022
A PyTorch implementation of VIOLET

VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling A PyTorch implementation of VIOLET Overview VIOLET is an implementati

Tsu-Jui Fu 119 Dec 30, 2022
Deal or No Deal? End-to-End Learning for Negotiation Dialogues

Introduction This is a PyTorch implementation of the following research papers: (1) Hierarchical Text Generation and Planning for Strategic Dialogue (

Facebook Research 1.4k Dec 29, 2022
A repo for open resources & information for people to succeed in PhD in CS & career in AI / NLP

A repo for open resources & information for people to succeed in PhD in CS & career in AI / NLP

420 Dec 28, 2022
ChessCoach is a neural network-based chess engine capable of natural-language commentary.

ChessCoach is a neural network-based chess engine capable of natural-language commentary.

Chris Butner 380 Dec 03, 2022
An easy-to-use framework for BERT models, with trainers, various NLP tasks and detailed annonations

FantasyBert English | δΈ­ζ–‡ Introduction An easy-to-use framework for BERT models, with trainers, various NLP tasks and detailed annonations. You can imp

Fan 137 Oct 26, 2022
Wikipedia-Utils: Preprocessing Wikipedia Texts for NLP

Wikipedia-Utils: Preprocessing Wikipedia Texts for NLP This repository maintains some utility scripts for retrieving and preprocessing Wikipedia text

Masatoshi Suzuki 44 Oct 19, 2022
pysentimiento: A Python toolkit for Sentiment Analysis and Social NLP tasks

A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks

297 Dec 29, 2022
TextAttack πŸ™ is a Python framework for adversarial attacks, data augmentation, and model training in NLP

TextAttack πŸ™ Generating adversarial examples for NLP models [TextAttack Documentation on ReadTheDocs] About β€’ Setup β€’ Usage β€’ Design About TextAttack

QData 2.2k Jan 03, 2023
Course project of [email protected]

NaiveMT Prepare Clone this repository git clone [email protected]:Poeroz/NaiveMT.git

Poeroz 2 Apr 24, 2022
PyTorch implementation of Microsoft's text-to-speech system FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

Chung-Ming Chien 1k Dec 30, 2022
REST API for sentence tokenization and embedding using Multilingual Universal Sentence Encoder.

What is MUSE? MUSE stands for Multilingual Universal Sentence Encoder - multilingual extension (16 languages) of Universal Sentence Encoder (USE). MUS

Dani El-Ayyass 47 Sep 05, 2022
Search Git commits in natural language

NaLCoS - NAtural Language COmmit Search Search commit messages in your repository in natural language. NaLCoS (NAtural Language COmmit Search) is a co

Pushkar Patel 50 Mar 22, 2022