Text Classification Using LSTM

Overview

Ontology Classification Using LSTM

Introduction

Text classification is the task of assigning one or more categories from a predefined set to free text. Text classifiers can be used to organize, structure, and categorize almost any kind of text. For example, news articles can be organized by topic, support tickets by urgency, chat conversations by language, brand mentions by sentiment, and so on.

Technologies Used

1. IDE - PyCharm
2. LSTM - Deep learning model used for classification
3. GPU - P-4000
4. Google Colab - Text analysis
5. Flask - REST API
6. Postman - API testing
7. Gensim - Word2Vec embeddings

🔑 Prerequisites

All the dependencies and required libraries are listed in the file requirements.txt.

  Python 3.6

Dataset

The DBpedia ontology classification dataset is constructed by picking 14 non-overlapping classes from DBpedia 2014; they are listed in classes.txt. From each of these 14 ontology classes, we randomly choose 40,000 training samples and 5,000 testing samples. Therefore, the training set contains 560,000 samples and the testing set 70,000. The files train.csv and test.csv contain the training and testing samples, respectively, as comma-separated values. Each file has 3 columns, corresponding to the class index (1 to 14), title and content. The title and content are enclosed in double quotes ("), and any internal double quote is escaped by doubling it (""). There are no newlines in the title or content.
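Because the files follow standard CSV quoting, Python's built-in csv module can read them directly. The sketch below is illustrative, not the repository's loader: the function name read_dbpedia_csv and the shift to 0-based labels are assumptions, and the two sample rows are made up purely to show the quoting rules.

```python
import csv
import io


def read_dbpedia_csv(f):
    """Yield (label, title, content) tuples from a file-like object in the
    three-column DBpedia layout: class index (1-14), title, content.

    csv.reader's defaults already handle fields wrapped in double quotes
    and internal quotes escaped as "".
    """
    for class_index, title, content in csv.reader(f):
        # Shift the 1-based class index to a 0-based label for training
        # (an assumption; many training loops expect labels 0..13).
        yield int(class_index) - 1, title, content


# Two made-up rows, only to illustrate the quoting rules.
sample = (
    '3,"Jane ""JD"" Doe","An illustrative, made-up row."\n'
    '14,"Some Book","Another made-up row."\n'
)
rows = list(read_dbpedia_csv(io.StringIO(sample)))
print(rows[0])  # (2, 'Jane "JD" Doe', 'An illustrative, made-up row.')
```

For the real files, open them with newline="" as the csv module documentation recommends, for example open("train.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption.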

For the dataset, please click here.

Process Flow of This Project

🚀 Installation of Text-Classification-Using-LSTM

  1. Clone the repo
git clone https://github.com/KrishArul26/Text-Classification-DBpedia-ontology-classes-Using-LSTM.git
  2. Change your directory to the cloned repo
cd Text-Classification-DBpedia-ontology-classes-Using-LSTM

  3. Create a Python 3.6 virtual environment named 'lstm' and activate it
pip install virtualenv

virtualenv -p python3.6 lstm

lstm\Scripts\activate        (Windows)

source lstm/bin/activate     (Linux/macOS)

  4. Now, run the following command in your Terminal/Command Prompt to install the required libraries:
pip install -r requirements.txt

💡 Working

Type the following command:

python app.py

After that, you will see the IP address the app is running on. Copy and paste it into your browser, enter or upload your text, then click the Predict button.

Implementations

This section covers the project directory and explains each Python file in it.

1. Project Directory

The picture below illustrates the complete folder structure of this project.

2. preprocess.py

The picture below illustrates preprocess.py. It performs the necessary text cleaning, such as removing punctuation and numbers, and lemmatization. It then creates the train_preprocessed, validation_preprocessed and test_preprocessed pickle files for further analysis.
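A minimal, dependency-free sketch of the cleaning and splitting steps described above follows. It is not the repository's preprocess.py: the regular expressions, the 10% validation fraction, the seed, and the pickle file name are all assumptions. Lemmatization is left out so the sketch runs with the standard library alone; a lemmatizer such as NLTK's WordNetLemmatizer could be applied to each token afterwards.

```python
import pickle
import random
import re


def clean_text(text):
    """Lowercase, drop digits and punctuation, and collapse whitespace.

    Lemmatization is intentionally omitted in this sketch.
    """
    text = text.lower()
    text = re.sub(r"[0-9]+", " ", text)     # remove numbers
    text = re.sub(r"[^\w\s]|_", " ", text)  # remove punctuation and underscores
    return " ".join(text.split())           # collapse runs of whitespace


def train_validation_split(samples, val_fraction=0.1, seed=42):
    """Shuffle a copy of the samples and carve off a validation set.

    DBpedia ships only train/test files, so a validation set has to be
    taken from the training data; the fraction and seed are assumptions.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def save_pickle(obj, path):
    """Serialize obj to path, e.g. a hypothetical 'train_preprocessed.pkl'."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


print(clean_text("Founded in 1998, Acme Corp."))  # founded in acme corp
```

A fixed seed keeps the train/validation split reproducible between runs, which matters when the pickled files are regenerated.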

3. word_embedder_gensim.py

The picture below illustrates word_embedder_gensim.py. Once text preprocessing is done, this file takes the cleaned text as input and creates a Word2Vec embedding for each word.
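To feed Word2Vec vectors into an LSTM, they are commonly packed into an embedding matrix whose row i holds the vector for the word with integer id i. The sketch below shows that packing in plain Python; the helper names, the reserved id 0 for padding/unknown words, and the zero vector for words without an embedding are assumptions, not details taken from the repository. The gensim lines in the comment assume gensim 4.x, where the dimension parameter is vector_size (3.x calls it size).

```python
def build_vocab(tokenized_texts, min_count=1):
    """Map each word to an integer id; id 0 is reserved for padding/unknown."""
    counts = {}
    for tokens in tokenized_texts:
        for tok in tokens:
            counts[tok] = counts.get(tok, 0) + 1
    words = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i + 1 for i, w in enumerate(words)}


def build_embedding_matrix(vocab, word_vectors, dim):
    """Row i holds the vector for the word with id i; missing words stay zero.

    word_vectors can be any mapping supporting `in` and `[]`, e.g. a dict
    or, assuming gensim 4.x, a trained model's `.wv` KeyedVectors:
        from gensim.models import Word2Vec
        w2v = Word2Vec(sentences=tokenized_texts, vector_size=100, min_count=1)
        matrix = build_embedding_matrix(vocab, w2v.wv, 100)
    """
    matrix = [[0.0] * dim for _ in range(len(vocab) + 1)]
    for word, idx in vocab.items():
        if word in word_vectors:
            matrix[idx] = list(word_vectors[word])
    return matrix


texts = [["acme", "corp"], ["acme", "film"]]
vocab = build_vocab(texts)                      # {'acme': 1, 'corp': 2, 'film': 3}
vectors = {"acme": [1.0, 2.0], "film": [3.0, 4.0]}  # toy 2-d vectors
matrix = build_embedding_matrix(vocab, vectors, 2)
```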

4. rnn_w2v.py

The picture below illustrates rnn_w2v.py. Once a Word2Vec vector exists for each word, those vectors are used as input to build the LSTM (RNN) model, which is then trained on the article content (body) and the class labels.
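Which deep learning framework rnn_w2v.py uses is not stated here, so the sketch below is framework-free: it spells out the computation a standard LSTM cell (input, forget and output gates plus a candidate cell state, no peepholes) performs at each time step, runs it over a sequence of word vectors, and feeds the final hidden state to a softmax over the classes. Parameter layout and function names are assumptions for illustration; a real model would use a framework's LSTM layer and learn these weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step.

    params maps each gate name ('i', 'f', 'o', 'g') to (W, U, b):
    W is hidden x input, U is hidden x hidden, b has length hidden.
    """
    def gate(name, act):
        W, U, b = params[name]
        z = [a + r + bb for a, r, bb in zip(matvec(W, x), matvec(U, h_prev), b)]
        return [act(v) for v in z]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell state
    c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c


def encode(sequence, params, hidden):
    """Run the LSTM over a sequence of word vectors; return the final hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classify(h, W_out, b_out):
    """Dense layer + softmax: class probabilities from the final hidden state."""
    logits = [z + b for z, b in zip(matvec(W_out, h), b_out)]
    return softmax(logits)
```

With 14 DBpedia classes, W_out would have 14 rows and the predicted class is the index of the largest probability.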

5. index.html

The picture below illustrates the index.html file, which is used to create the web front end.

6. main.py

The picture below illustrates main.py. After the LSTM model has been evaluated, this file creates the REST API using the Flask framework: it receives a request from the customer or client, passes it to the prediction code, and delivers the answer to the web browser.
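The core of such an endpoint is a function that parses the request, calls the model, and formats the response. The sketch below keeps that logic separate from Flask so it can be tested without a server. The JSON shape {"text": ...}, the /predict route, and the names handle_predict, model_predict and CLASS_NAMES are hypothetical, not taken from main.py; predict_fn stands in for the trained LSTM.

```python
import json


def handle_predict(body, predict_fn, class_names):
    """Turn a raw JSON request body into (status_code, response_json).

    body: request bytes/str, assumed to look like {"text": "..."}.
    predict_fn: callable mapping text -> 0-based class index.
    class_names: list of class names, e.g. as read from classes.txt.
    """
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        return 400, json.dumps({"error": "request body must be JSON"})
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str) or not text.strip():
        return 400, json.dumps({"error": "missing 'text' field"})
    idx = predict_fn(text)
    # Report the 1-based index used in the dataset alongside the class name.
    return 200, json.dumps({"class_index": idx + 1, "class_name": class_names[idx]})


# Hypothetical Flask wiring:
# from flask import Flask, request
# app = Flask(__name__)
#
# @app.route("/predict", methods=["POST"])
# def predict():
#     status, out = handle_predict(request.get_data(), model_predict, CLASS_NAMES)
#     return out, status, {"Content-Type": "application/json"}
```

Keeping the request handling in a plain function means it can be unit-tested with a stub predictor before the model or web server is involved.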

7. Testing the REST API
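Besides Postman, the API can be exercised from a short Python script using only the standard library. The sketch below builds the kind of JSON POST request Postman would send; the URL (Flask's default development port 5000 with a hypothetical /predict route) and the {"text": ...} body are assumptions, so adjust them to match main.py.

```python
import json
import urllib.request


def build_predict_request(text, url="http://127.0.0.1:5000/predict"):
    """Build (but do not send) a JSON POST request for the prediction API."""
    data = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )


req = build_predict_request("Acme Corp is a company that makes widgets.")

# To call a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read().decode("utf-8")))
```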

Owner
KrishArul26
Google Certified - TensorFlow Developer | Google Cloud Associate Engineer | Enthusiastic about Machine Learning | Deep Learning | Object Detection | AI