A custom IMDB dataset covering 2020-2021 is scraped, and a custom DistilBERT model is trained on it to predict the probability of a movie's success.

Last update: Jan 18, 2022

Overview

IMDB Success Predictor

The project scrapes custom IMDB data for 10,000 movies and shows from 2020 and 2021, sorted by number of votes; fine-tunes a pre-trained DistilBERT Transformer using transfer learning; and then saves the model so it can be reloaded for later predictions.
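As a rough illustration of the data-preparation step, the sketch below sorts a few scraped records by vote count and stores them as a CSV file and a pickle file. The record fields, the sample titles, and the file names `imdb_movies.csv` and `imdb_movies.pkl` are assumptions made for this example; the actual scraper may use a different schema and naming.

```python
import csv
import pickle
import tempfile
from pathlib import Path

# Hypothetical records: the real scraper's field names are not documented here.
records = [
    {"title": "Example Show A", "description": "A detective returns home.", "rating": 7.9, "votes": 120000},
    {"title": "Example Movie B", "description": "Two friends start a band.", "rating": 5.4, "votes": 450000},
    {"title": "Example Movie C", "description": "A heist goes wrong.", "rating": 6.8, "votes": 98000},
]

FIELDS = ["title", "description", "rating", "votes"]


def save_records(rows, out_dir):
    """Sort rows by vote count (descending) and write CSV and pickle copies."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    ordered = sorted(rows, key=lambda r: r["votes"], reverse=True)

    csv_path = out_dir / "imdb_movies.csv"  # assumed file name
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(ordered)

    pkl_path = out_dir / "imdb_movies.pkl"  # assumed file name
    with pkl_path.open("wb") as fh:
        pickle.dump(ordered, fh)
    return csv_path, pkl_path


# Round trip in a temporary directory to show that the data survives.
with tempfile.TemporaryDirectory() as tmp:
    csv_path, pkl_path = save_records(records, tmp)
    with pkl_path.open("rb") as fh:
        loaded = pickle.load(fh)
    with csv_path.open(newline="", encoding="utf-8") as fh:
        csv_rows = list(csv.DictReader(fh))

top_title = loaded[0]["title"]
```

Keeping both formats is a convenience: the CSV is easy to inspect by hand, while the pickle preserves Python types such as the integer vote counts.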

Stack

DistilBERT Transformer
TensorFlow
NumPy and Pandas
Selenium, BeautifulSoup4 and requests

Metrics

Accuracy achieved: 81.3492%
ROC_AUC_Score achieved: 0.7217
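For readers unfamiliar with the two metrics, here is a minimal dependency-free sketch of how each can be computed. The labels and scores are made-up toy values, not project data, and the project itself most likely used a library routine (the metric name suggests scikit-learn's `roc_auc_score`, but that is an assumption).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): 1 = successful title, 0 = not successful.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6]  # predicted probability of success
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy depends on the decision threshold (0.5 here), whereas ROC AUC is computed from the ranking of the scores alone, so the two numbers can differ considerably on the same model.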

Installation

1) Ensure Python and Jupyter Notebook are installed. A Conda environment can optionally be used.

2) Install the required modules using

pip install -r requirements.txt

or conda install --file requirements.txt

or !pip install -r requirements.txt for Google Colab.

3) Selenium requires browser-specific drivers. Guides for Chrome and Firefox are linked below. This step can be skipped if the notebook is run on Google Colab.
Chrome: https://chromedriver.chromium.org/getting-started
Firefox: https://www.lambdatest.com/blog/selenium-firefox-driver-tutorial/

Training

1) (Optional) Run the IMDB web scraper. This regenerates the csv file and the imdb_movies pickle file, both of which are already provided.

2) Run the model training in an environment with GPU acceleration. Here it was run on Google Colab, which allocates an Nvidia Tesla T4 or Nvidia Tesla K80.
```
Training Time: Roughly 20-25 mins
Epochs: 10
Training Batch Size: 8
Max length of each Sentence: 512 
```
A Movie_prediction_model directory is created containing a config.json file (provided) and a tf_model.h5 file (not provided due to space constraints).
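The training step can be sketched roughly as follows, using the hyperparameters listed above. This is not the project's notebook: the `distilbert-base-uncased` checkpoint, the learning rate, the optimizer, and the function and class names are assumptions chosen for illustration. The heavy imports are placed inside the function so that the configuration part can be read and run without TensorFlow installed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters taken from the Training section."""
    epochs: int = 10
    batch_size: int = 8
    max_length: int = 512  # maximum number of tokens per description


def steps_per_epoch(n_examples, batch_size):
    """Number of batches needed to cover the data once."""
    return math.ceil(n_examples / batch_size)


def fine_tune(texts, labels, cfg=TrainConfig(), out_dir="Movie_prediction_model"):
    """Fine-tune DistilBERT on (description, 0/1 label) pairs and save it.

    Requires the `tensorflow` and `transformers` packages.
    """
    import tensorflow as tf
    from transformers import (
        DistilBertTokenizerFast,
        TFDistilBertForSequenceClassification,
    )

    checkpoint = "distilbert-base-uncased"  # assumed starting checkpoint
    tokenizer = DistilBertTokenizerFast.from_pretrained(checkpoint)
    enc = tokenizer(list(texts), truncation=True, padding=True,
                    max_length=cfg.max_length)

    dataset = (
        tf.data.Dataset.from_tensor_slices((dict(enc), list(labels)))
        .shuffle(len(texts))
        .batch(cfg.batch_size)
    )

    model = TFDistilBertForSequenceClassification.from_pretrained(
        checkpoint, num_labels=2
    )
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),  # assumed rate
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    model.fit(dataset, epochs=cfg.epochs)

    # For TensorFlow models this writes config.json and tf_model.h5.
    model.save_pretrained(out_dir)
    return model
```

With a batch size of 8, every 8,000 training examples correspond to 1,000 optimizer steps per epoch, which gives a feel for why a GPU is recommended.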

Usage

1) Ensure the model has been created inside the Movie_prediction_model directory.

2) Run the Python file using python DistilBERT_Movie_Classifier.py

3) Enter the description of the movie or TV show you want a prediction for. The output is a binary prediction of success based on IMDB ratings.
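A minimal sketch of what the prediction step might look like is shown below. It is not the contents of DistilBERT_Movie_Classifier.py: the function names, the tokenizer checkpoint, the 0.5 threshold, and the assumption that class index 1 means "success" are all illustrative.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret_logits(logits, threshold=0.5):
    """Map two-class logits to a label and a success probability.

    Assumes index 1 is the "success" class.
    """
    p_success = softmax(logits)[1]
    label = "success" if p_success >= threshold else "not a success"
    return label, p_success


def predict_success(description, model_dir="Movie_prediction_model"):
    """Load the saved model and classify one description.

    Requires the `tensorflow` and `transformers` packages and a trained
    model saved in `model_dir`.
    """
    from transformers import (
        DistilBertTokenizerFast,
        TFDistilBertForSequenceClassification,
    )

    tokenizer = DistilBertTokenizerFast.from_pretrained(
        "distilbert-base-uncased"  # assumed tokenizer checkpoint
    )
    model = TFDistilBertForSequenceClassification.from_pretrained(model_dir)
    enc = tokenizer(description, truncation=True, max_length=512,
                    return_tensors="tf")
    logits = model(dict(enc)).logits.numpy()[0].tolist()
    return interpret_logits(logits)
```

Given a trained model, `predict_success("A retired spy is pulled back for one last mission.")` would return a label together with the estimated probability of success.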

Owner

Gautam Diwan
