Final Project for the Intel AI Readiness Boot Camp NLP (Jan)

Overview

NLP Boot Camp (Jan) Synopsis

Full Name:

Prameya Mohanty

Name of your School:

Delhi Public School, Rourkela

Class:

VIII

Title of the Project:

iTransect – A Language Detector cum Translator

Project Domain:

Natural Language Processing

Summary:

This application is an AI- and NLP-enabled language detector cum translator. It first detects the language of the text entered by the user and then translates the text into the user's desired language. The app can recognize and translate text in over 15 languages.

Context:

We frequently face problems while reading articles found through Google or websites that are not in English or in our mother tongue. Many people in rural areas understand only their mother tongue. With a translator, they too can translate such text and read it.

My solution to this problem is a translator that converts the text into a language we can understand. However, this raises another problem: we first need to recognize which language the original text is written in, and we often fail to do so. For this reason, my application simply takes the text as input, recognizes its language and then translates it into our desired language.

I turned this idea into a solution by applying Natural Language Processing to the text given by the user: the application first recognizes the language used in the text and then translates it into the user's desired language.

How does it work:

I have used the MultinomialNB model from the scikit-learn library. The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts; however, in practice, fractional counts such as tf-idf may also work.
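The point about integer counts versus fractional tf-idf values can be illustrated with a minimal sketch. It assumes word-level features from scikit-learn's `CountVectorizer` and `TfidfTransformer` and a tiny hypothetical training set; the project's actual feature extraction is not described here.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Tiny hypothetical training set (not the project's dataset).
texts = [
    "the cat sits on the mat",
    "where is the train station",
    "le chat est sur le tapis",
    "où est la gare",
    "el gato está en la alfombra",
    "dónde está la estación",
]
labels = ["English", "English", "French", "French", "Spanish", "Spanish"]

# Integer word counts: the discrete features MultinomialNB is designed for.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(texts)

# Fractional tf-idf weights derived from the same counts.
tfidf_transformer = TfidfTransformer().fit(counts)
tfidf = tfidf_transformer.transform(counts)

count_model = MultinomialNB().fit(counts, labels)
tfidf_model = MultinomialNB().fit(tfidf, labels)

# New text must pass through the same fitted transformations.
query_counts = vectorizer.transform(["le chat est sur la table"])
query_tfidf = tfidf_transformer.transform(query_counts)

count_prediction = count_model.predict(query_counts)[0]
tfidf_prediction = tfidf_model.predict(query_tfidf)[0]
print(count_prediction, tfidf_prediction)
```

Both models accept the sparse matrices directly; the only difference is whether the per-class feature totals are built from whole-number counts or from tf-idf weights.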

My application uses a large dataset containing sample texts in over 15 languages. A MultinomialNB model from the scikit-learn library is trained on this dataset, which allows it to predict the language of the text we provide. I then use the GoogleTrans API to translate the text into the user's desired language.
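Training on a dataset of labelled sample texts could look like the sketch below. The CSV layout with `Text` and `Language` columns, the sample rows, the `load_dataset` helper and the character n-gram features are all hypothetical choices for illustration; the project's own dataset file and preprocessing may differ.

```python
import csv
import io

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical CSV content standing in for the real (much larger) dataset file.
CSV_DATA = """Text,Language
Hello how are you,English
This is a good book,English
Bonjour comment allez vous,French
Ceci est un bon livre,French
यह एक अच्छी किताब है,Hindi
आप कैसे हैं,Hindi
"""


def load_dataset(handle):
    """Read texts and language labels from a CSV with Text and Language columns."""
    reader = csv.DictReader(handle)
    rows = [(row["Text"], row["Language"]) for row in reader]
    texts = [text for text, _ in rows]
    labels = [language for _, language in rows]
    return texts, labels


texts, labels = load_dataset(io.StringIO(CSV_DATA))

# Character n-grams (kept inside word boundaries) pick up spelling patterns and
# scripts, which can help with words that never appeared in training.
detector = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    MultinomialNB(),
)
detector.fit(texts, labels)

print(detector.predict(["यह किताब अच्छी है"])[0])
```

With a real file, the same helper could be given a handle from `open(path, encoding="utf-8", newline="")`, where `path` points to the project's dataset.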

In short, the application takes some text as input from the user, detects its language with the MultinomialNB model and then uses the GoogleTrans API to translate the text into the user's desired language.
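The two-step flow (detect, then translate) can be sketched as a small function. To keep the sketch runnable offline, the translator is passed in as a plain function; `build_detector`, `detect_and_translate` and `fake_translate` are hypothetical names, and the four training sentences are illustrative only.

```python
from typing import Callable, Tuple

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline, make_pipeline

# A translator is any function taking (text, destination language code) and returning text.
TranslateFn = Callable[[str, str], str]


def build_detector(texts, labels) -> Pipeline:
    """Train a character n-gram MultinomialNB language detector."""
    detector = make_pipeline(
        CountVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
        MultinomialNB(),
    )
    return detector.fit(texts, labels)


def detect_and_translate(
    text: str, detector: Pipeline, translate: TranslateFn, dest: str
) -> Tuple[str, str]:
    """Step 1: detect the source language. Step 2: translate the text into `dest`."""
    source_language = str(detector.predict([text])[0])
    return source_language, translate(text, dest)


def fake_translate(text: str, dest: str) -> str:
    """Offline stand-in for a real translation API call."""
    return f"[{dest}] {text}"


detector = build_detector(
    ["hello how are you", "this is a good book",
     "bonjour comment allez vous", "ceci est un bon livre"],
    ["English", "English", "French", "French"],
)
print(detect_and_translate("ceci est un bon livre", detector, fake_translate, "en"))
```

With the synchronous `googletrans` releases (for example 3.0.0 or 4.0.0rc1), a real translator could be `lambda text, dest: Translator().translate(text, dest=dest).text`; later releases may expose an asynchronous `translate`, so it is worth checking the version pinned in requirements.txt.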

In the future, the dataset can be expanded with more languages and more sample texts per language, so that the application recognizes more languages and its predictions become more accurate. This would also help the application reach a broader audience.

Instructions for Usage:

  1. Prerequisite: To use this application, you need Python installed on your system. Installing Git is recommended but not compulsory.

  2. Clone Repo: If you have Git installed on your system, run the command below; otherwise, click the Code button on the GitHub repository page and then click Download ZIP.

       git clone https://github.com/The-Coding-Hub/iTransect.git

  3. Install Requirements: Install the application's requirements with pip and the requirements.txt file by running the following command in the console:

       pip install -r ./requirements.txt

  4. Start App: You are now all set to use this application. Execute the command given below to start the Python Flask development server on your localhost.

  5. Enjoy App: Just open the link given in your console and then you can enjoy our application!
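As a rough sketch of how a language-detection endpoint could be wired into a Flask server like the one started in step 4: the `/detect` route, the JSON field names and the tiny training set are assumptions for illustration, not the repository's actual code.

```python
from flask import Flask, jsonify, request
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for the trained detector; the real app trains on its own dataset.
detector = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    MultinomialNB(),
)
detector.fit(
    ["hello how are you", "this is a good book",
     "bonjour comment allez vous", "ceci est un bon livre"],
    ["English", "English", "French", "French"],
)

app = Flask(__name__)


@app.route("/detect", methods=["POST"])  # hypothetical route name
def detect():
    """Return the predicted language of the JSON field "text"."""
    payload = request.get_json(silent=True) or {}
    text = str(payload.get("text", "")).strip()
    if not text:
        return jsonify(error="missing 'text'"), 400
    return jsonify(language=str(detector.predict([text])[0]))


# Exercise the endpoint with Flask's test client, without starting a server.
with app.test_client() as client:
    response = client.post("/detect", json={"text": "ceci est un bon livre"})
    print(response.status_code, response.get_json())
```

Calling `app.run(debug=True)` on such an app would start Flask's development server on localhost (port 5000 by default).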

Video Link:

https://youtu.be/QsJQ1lxI2Lw

Code Folder Link:

https://github.com/The-Coding-Hub/iTransect

Owner
TheCodingHub
Student at Delhi Public School, Rourkela, Odisha. Programming is my favorite sport. YouTube Channel: TheCodingHub