Basic yet complete Machine Learning pipeline for NLP tasks

Last update: Aug 22, 2022

Related tags

Text Data & NLP ml-pipeline

Overview

Basic yet complete Machine Learning pipeline for NLP tasks

This repository accompanies the article on building basic yet complete ML pipelines for solving NLP tasks.

Requirements

Docker

telnet

Please refer to installation instructions for your system if needed.

Running the pipeline

The whole pipeline of 4 services (mail server, database, prediction service and orchestrator) can be started with one command:

docker-compose -f docker-compose.yaml up --build

It should start printing log messages from the services.

Sending an email

The pipeline is triggered by an unread email appearing in the mailbox. In order to send one, telnet util can be used.

Connecting to the IMAP mail server: telnet localhost 3025

Sending the email with telnet:

EHLO user
MAIL FROM:<[email protected]>
RCPT TO:<user>
DATA
Subject: Hello World
 
Hello!

She works at Apple now but before that she worked at Microsoft.
.
QUIT

If everything went well, something like this should appear in logs:

orchestrator_1                   | Polling mailbox...
prediction-worker_1              | INFO:     172.19.0.5:55294 - "POST /predict HTTP/1.1" 200 OK
orchestrator_1                   | Recorded to DB with id=34: [{'entity_text': 'Apple', 'start': 24, 'end': 29}, {'entity_text': 'Microsoft', 'start': 58, 'end': 67}]

Checking the result

The data must also be recorded to the database. In order to check that, any DB client can be used with the following connection parameters:

host: localhost
port: 5432
database: maildb
username: pguser
pasword: password

and running SELECT * FROM mail LIMIT 10 query.

Basic yet complete Machine Learning pipeline for NLP tasks

Related tags

Overview

Basic yet complete Machine Learning pipeline for NLP tasks

Requirements

Running the pipeline

Running the pipeline

Sending an email

Checking the result

Owner

Ivan

NumPy String-Indexed is a NumPy extension that allows arrays to be indexed using descriptive string labels

Simple Python script to scrape youtube channles of "Parity Technologies and Web3 Foundation" and translate them to well-known braille language or any language

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Torchrecipes provides a set of reproduci-able, re-usable, ready-to-run RECIPES for training different types of models, across multiple domains, on PyTorch Lightning.

Text to speech for Vietnamese, ez to use, ez to update

ChessCoach is a neural network-based chess engine capable of natural-language commentary.

Blackstone is a spaCy model and library for processing long-form, unstructured legal text

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

PyTorch Implementation of the paper Single Image Texture Translation for Data Augmentation

Code for the Findings of NAACL 2022(Long Paper): AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks

[EMNLP 2021] Mirror-BERT: Converting Pretrained Language Models to universal text encoders without labels.

Chatbot with Pytorch, Python & Nextjs

I can help you convert your images to pdf file.

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

Pipeline for training LSA models using Scikit-Learn.

Fully featured implementation of Routing Transformer

KoBART model on huggingface transformers

Sequence Modeling with Structured State Spaces

Few-shot Natural Language Generation for Task-Oriented Dialog

Korean Simple Contrastive Learning of Sentence Embeddings using SKT KoBERT and kakaobrain KorNLU dataset