Sentiment Analysis Project

This project contains two sentiment analysis programs for hotel reviews, built on a Hotel Reviews dataset from Datafiniti. The models are trained on features produced by Count Vectorizer (in the countvectorizer.py program) and TF-IDF Vectorizer (in the tdidf.py program). Run the two programs separately to compare the implementations and their accuracy results. TF-IDF Vectorizer is usually considered more accurate because it down-weights words that appear in most reviews, but the outcome depends on the data and the model.
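To make the difference between the two vectorizers concrete, here is a minimal sketch. The two reviews are made up for illustration and are not taken from the Datafiniti dataset; both vectorizers are used with scikit-learn defaults, which may differ from the settings in the project's programs.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy reviews, invented for this example (assumption: not real dataset rows).
reviews = [
    "the room was clean and the staff was friendly",
    "the room was dirty and the staff was rude",
]

# CountVectorizer stores raw term counts per review.
count_vec = CountVectorizer()
counts = count_vec.fit_transform(reviews)

# TfidfVectorizer rescales counts by inverse document frequency and
# L2-normalizes each row, so words shared by every review carry less weight.
tfidf_vec = TfidfVectorizer()
weights = tfidf_vec.fit_transform(reviews)

# Compare a word found in both reviews ("the") with one found in only one ("clean").
for word in ("the", "clean"):
    c = counts[0, count_vec.vocabulary_[word]]
    w = weights[0, tfidf_vec.vocabulary_[word]]
    print(f"{word!r}: count={c}, tf-idf={w:.3f}")
```

In the first review "the" occurs twice as often as "clean", but under TF-IDF the gap between the two narrows because "clean" is the rarer word across the reviews.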

System Requirements

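Use the pip install command to install the required packages (pandas, numpy and scikit-learn, which provides the sklearn module). The programs use the following imports: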

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn import svm
from sklearn.neighbors import KNeighborsClassifier

Usage (description of actions performed)

1. dataset imported
2. null values deleted
3. 30% representative sample is taken to avoid slowing down the system
4. sentiments column added
5. input training features and labels defined
6. dataset split into training sets and testing sets
7. text data vectorized (using CountVectorizer or TF-IDF Vectorizer)
8. models trained:
 -  Logistic Regression (linear classification)
 -  Support Vector Machine (linear/non-linear data separated into classes by a line/hyperplane)
 -  K Nearest Neighbor (local approximation)
9. Accuracy Scores, Confusion Matrix, True Positive and True Negative Rates printed for all three models
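The steps above can be sketched roughly as follows. This is not the code from countvectorizer.py or tdidf.py: the function name run_pipeline, the column names text and rating, the rule "rating of 4 or more is positive", the 80/20 split, the linear SVM kernel, and the toy data standing in for step 1 are all assumptions made for illustration.

```python
import pandas as pd
from sklearn import svm
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def run_pipeline(df, vectorizer, text_col="text", rating_col="rating"):
    """Hypothetical helper covering steps 2-9; returns metrics per model."""
    # Step 2: drop rows with missing text or rating.
    df = df.dropna(subset=[text_col, rating_col])
    # Step 3: keep a 30% random sample to limit run time.
    df = df.sample(frac=0.3, random_state=0)
    # Step 4: add the sentiment column (assumed rule: rating >= 4 is positive).
    df = df.assign(sentiment=(df[rating_col] >= 4).astype(int))
    # Step 5: define input features and labels.
    X, y = df[text_col], df["sentiment"]
    # Step 6: split into training and testing sets (assumed 80/20, stratified).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y
    )
    # Step 7: fit the vectorizer on training text only, then transform both sets.
    X_train_vec = vectorizer.fit_transform(X_train)
    X_test_vec = vectorizer.transform(X_test)
    # Step 8: the three models named in the list above.
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Support Vector Machine": svm.SVC(kernel="linear"),
        "K Nearest Neighbor": KNeighborsClassifier(),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train_vec, y_train)
        pred = model.predict(X_test_vec)
        # Step 9: confusion matrix, accuracy, true positive and negative rates.
        cm = confusion_matrix(y_test, pred, labels=[0, 1])
        tn, fp, fn, tp = cm.ravel()
        results[name] = {
            "confusion_matrix": cm,
            "accuracy": (tp + tn) / cm.sum(),
            "tpr": tp / (tp + fn) if (tp + fn) else float("nan"),
            "tnr": tn / (tn + fp) if (tn + fp) else float("nan"),
        }
    return results


# Step 1 stand-in: toy data instead of the Datafiniti file (assumption).
positive = ["great stay and friendly staff", "clean room with a lovely view"]
negative = ["dirty room and rude staff", "terrible noise and broken shower"]
toy = pd.DataFrame({
    "text": (positive + negative) * 50,
    "rating": ([5, 4] + [1, 2]) * 50,
})

for vec in (CountVectorizer(), TfidfVectorizer()):
    print(type(vec).__name__)
    for name, m in run_pipeline(toy, vec).items():
        print(f"  {name}: accuracy={m['accuracy']:.2f}, "
              f"TPR={m['tpr']:.2f}, TNR={m['tnr']:.2f}")
```

Passing the vectorizer in as an argument keeps the rest of the pipeline identical, so any difference in the printed scores comes from the vectorizer alone.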

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Owner

Simran Farrukh
