Sentiment-Analysis and EDA on the IMDB Movie Review Dataset

Last update: Jan 12, 2022

The main part of the work explores and studies different approaches used for Sentiment Analysis (e.g. Bag of Words, TF-IDF, Word Embeddings). In addition, the work applies and compares different classification algorithms for Sentiment Analysis tasks in Natural Language Processing (e.g. tree-based algorithms, linear models and Support Vector Machines).

Author: Nikolas Petrou, MSc in Data Science

Technical-Report and Code Availability

The complete text and analysis of the work is available in the EDA-and-Sentiment-Analysis-on IMDB-Dataset.pdf file.
The implementation and code of the project are located in the Implementation-Python Files folder.

Overview

The goal of this work is to explore and study different approaches used for Sentiment Analysis (e.g. Bag of Words, TF-IDF, Word Embeddings). In addition, the work applies and compares different classification algorithms for Sentiment Analysis tasks in Natural Language Processing (e.g. tree-based algorithms, linear models and Support Vector Machines).
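To make the first two text representations concrete, here is a minimal pure-Python sketch of Bag of Words and TF-IDF. It is an illustration only, not the code from the Implementation-Python Files folder: the function names, the whitespace tokenizer and the unsmoothed textbook formula `tf * log(N / df)` are all assumptions, and libraries typically use smoothed or normalized variants.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of documents."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in tokenized:
        vec = [0] * len(vocab)
        for tok, count in Counter(doc).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


def tf_idf(docs):
    """Return (vocabulary, TF-IDF vectors) using tf = count / doc length
    and idf = log(N / document frequency), without smoothing."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df_j) for df_j in df]
    weighted = []
    for vec in counts:
        total = sum(vec)
        weighted.append(
            [(c / total) * idf[j] if total else 0.0 for j, c in enumerate(vec)]
        )
    return vocab, weighted


# Hypothetical toy reviews, not taken from the IMDB dataset.
reviews = ["a great great movie", "a boring movie"]
vocab, counts = bag_of_words(reviews)
_, weights = tf_idf(reviews)
```

In this toy example, terms that occur in every review ("a", "movie") get a TF-IDF weight of zero, while "great" and "boring" keep a positive weight, which is the property that makes TF-IDF more discriminative than raw counts.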

Dataset

For this work, a large dataset of movie reviews was used: the publicly available Internet Movie Database (IMDB) review dataset.

The data can be obtained from Kaggle or directly from Stanford.
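As a rough sketch of how the Stanford copy could be read, the snippet below assumes the archive has been extracted locally and follows the layout `<root>/<split>/<pos|neg>/*.txt`, with one review per file. The function name `load_reviews`, the label encoding and the root path are hypothetical; the Kaggle copy may be packaged differently and would need its own loader.

```python
from pathlib import Path


def load_reviews(root, split="train"):
    """Return a list of (text, label) pairs; label 1 = positive, 0 = negative.

    Assumes one review per .txt file under <root>/<split>/pos and
    <root>/<split>/neg.
    """
    samples = []
    for label_name, label in (("neg", 0), ("pos", 1)):
        folder = Path(root) / split / label_name
        for path in sorted(folder.glob("*.txt")):
            samples.append((path.read_text(encoding="utf-8"), label))
    return samples
```

A call such as `load_reviews("aclImdb", split="train")` would then return the labeled training reviews, assuming the data was extracted to a directory with that name.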

Methodology

A high-level scheme of the methodology is illustrated in the following figure.

In summary, the initial questions were first set with respect to the dataset used. Subsequently, data scraping and data collection were performed. After the data preprocessing steps, different analyses were employed in order to better understand the data. Finally, different methodologies and models were used to classify the textual data by sentiment. It is important to note that the whole process followed a cyclical scheme.
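The preprocessing step is not detailed here, so the following is only a sketch of what typical text cleaning for movie reviews might look like. The specific steps (stripping HTML tags, lowercasing, dropping non-letter characters, whitespace tokenization) and the name `preprocess` are assumptions, not the steps used in the report.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")          # HTML tags such as <br />
NON_ALPHA_RE = re.compile(r"[^a-z\s]")   # anything that is not a-z or whitespace


def preprocess(review):
    """Return a list of lowercase alphabetic tokens from a raw review."""
    text = TAG_RE.sub(" ", review)       # replace tags with a space
    text = text.lower()
    # Punctuation and digits become spaces, so "don't" splits into "don", "t".
    text = NON_ALPHA_RE.sub(" ", text)
    return text.split()


tokens = preprocess("Great movie!<br /><br />Loved it.")
```

The resulting token lists can then be fed into a representation such as Bag of Words or TF-IDF before classification.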
