A machine learning project that predicts the price of used cars in the UK

Overview

Car Price Prediction

Car Image

Image Credit: AA Cars

Project Overview

  • Scraped 3000 used cars data from AA Cars website using Python and BeautifulSoup.
  • Cleaned the data and built a model to help determine the price of cars on auction
  • Built a flask web app and deploy to cloud

Packages/Tools Used

  • Python Version: 3.9
  • BeautifulSoup
  • Request
  • Numpy
  • Matplotlib
  • Seaborn
  • Scikit-Learn

Data

The data was scraped from AA Cars. The data was scraped from multiple pages from the site and was stored as a csv file. The scraped data contains:

  • Name
  • Price
  • Year
  • Mileage
  • Engine
  • Transmisson

Data Cleaning

The features (columns) contained messy entries and were tidied using some custom functions. The following steps were taken.

  • Removed the duplicate rows in the data because it will affect the analysis.
  • Deleted thhe rows with missing values because they ae not up to 1% of the data.
  • Extracted the manufaturer of each car from the name column
  • Corrected some of the values in the manufacturers column by merging similar value and correcting those wrongly extracted.
  • Removed the pounds symbol and the comma in the values of the price column
  • Created an age column by substacting the values in the year column fom the current year, 2021. This is an easier column to work with.
  • Removed the commas, space and miles input in all the values of the mileage columns.
    • Corrected some of the values in the engine and transmission columns by merging similar value and correcting those wrongly extracted.

Exploratory Data Analysis

  • The count of the number of cars owned by each car manufacturer Car manufacturer distribution

  • The count of the number of cars from the different years Year distribution

  • The count of the number of cars with the diffrent car engine types Car engine distribution

  • The count of the number of cars with different car transmission types Car transmission distribution

  • The word cloud of all car manufacturers.

Car manufacturer wordcloud

Model Building

  • The 'name' and 'year' column were dropped because they are irrelevant.
  • The categorical features (name, colour and transmission) were transformed into numerical data and I scaled all the feature values to make all of them be in the same range
  • Linear Regression, Ridge Regression, Random Forest Regressor, Ada Boost Regressor and Support Vector Regressor models were all built.
  • Root mean squared error (RMSE) which is the square root of the sum of the difference between the true value and the predicted value was the metric used to evaluate the performance of the model.
  • The CatBoost Regressor model has the best performance and it was hypertuned using GridSearchCV to improve the performance.
  • The model was tested on new data and it gave a good output.

A flask web app is currently under construction

NB: I am open to constructive criticisms about this project

Owner
Victor Umunna
Victor Umunna
Bayesian optimization based on Gaussian processes (BO-GP) for CFD simulations.

BO-GP Bayesian optimization based on Gaussian processes (BO-GP) for CFD simulations. The BO-GP codes are developed using GPy and GPyOpt. The optimizer

KTH Mechanics 8 Mar 31, 2022
Model search (MS) is a framework that implements AutoML algorithms for model architecture search at scale.

Model Search Model search (MS) is a framework that implements AutoML algorithms for model architecture search at scale. It aims to help researchers sp

AriesTriputranto 1 Dec 13, 2021
ParaMonte is a serial/parallel library of Monte Carlo routines for sampling mathematical objective functions of arbitrary-dimensions

ParaMonte is a serial/parallel library of Monte Carlo routines for sampling mathematical objective functions of arbitrary-dimensions, in particular, the posterior distributions of Bayesian models in

Computational Data Science Lab 182 Dec 31, 2022
Automated Time Series Forecasting

AutoTS AutoTS is a time series package for Python designed for rapidly deploying high-accuracy forecasts at scale. There are dozens of forecasting mod

Colin Catlin 652 Jan 03, 2023
A Python library for detecting patterns and anomalies in massive datasets using the Matrix Profile

matrixprofile-ts matrixprofile-ts is a Python 2 and 3 library for evaluating time series data using the Matrix Profile algorithms developed by the Keo

Target 696 Dec 26, 2022
This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch

This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch. It uses a simple TestEnvironment to test the algorithm

Martin Huber 59 Dec 09, 2022
Time Series Prediction with tf.contrib.timeseries

TensorFlow-Time-Series-Examples Additional examples for TensorFlow Time Series(TFTS). Read a Time Series with TFTS From a Numpy Array: See "test_input

Zhiyuan He 476 Nov 17, 2022
Fit interpretable models. Explain blackbox machine learning.

InterpretML - Alpha Release In the beginning machines learned in darkness, and data scientists struggled in the void to explain them. Let there be lig

InterpretML 5.2k Jan 09, 2023
A modular active learning framework for Python

Modular Active Learning framework for Python3 Page contents Introduction Active learning from bird's-eye view modAL in action From zero to one in a fe

modAL 1.9k Dec 31, 2022
Firebase + Cloudrun + Machine learning

A simple end to end consumer lending decision engine powered by Google Cloud Platform (firebase hosting and cloudrun)

Emmanuel Ogunwede 8 Aug 16, 2022
Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

Prophet: Automatic Forecasting Procedure Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends ar

Facebook 15.4k Jan 07, 2023
Toolkit for building machine learning models that generalize to unseen domains and are robust to privacy and other attacks.

Toolkit for Building Robust ML models that generalize to unseen domains (RobustDG) Divyat Mahajan, Shruti Tople, Amit Sharma Privacy & Causal Learning

Microsoft 149 Jan 06, 2023
Adversarial Framework for (non-) Parametric Image Stylisation Mosaics

Fully Adversarial Mosaics (FAMOS) Pytorch implementation of the paper "Copy the Old or Paint Anew? An Adversarial Framework for (non-) Parametric Imag

Zalando Research 120 Dec 24, 2022
Merlion: A Machine Learning Framework for Time Series Intelligence

Merlion is a Python library for time series intelligence. It provides an end-to-end machine learning framework that includes loading and transforming data, building and training models, post-processi

Salesforce 2.8k Jan 05, 2023
A demo project to elaborate how Machine Learn Models are deployed on production using Flask API

This is a salary prediction website developed with the help of machine learning, this makes prediction of salary on basis of few parameters like interview score, experience test score.

1 Feb 10, 2022
SPCL 48 Dec 12, 2022
Predict the demand for electricity (R) - FRENCH

06.demand-electricity Predict the demand for electricity (R) - FRENCH Prédisez la demande en électricité Prérequis Pour effectuer ce projet, vous devr

1 Feb 13, 2022
Predico Disease Prediction system based on symptoms provided by patient- using Python-Django & Machine Learning

Predico Disease Prediction system based on symptoms provided by patient- using Python-Django & Machine Learning

Felix Daudi 1 Jan 06, 2022
Python Research Framework

Python Research Framework

EleutherAI 106 Dec 13, 2022
Tutorial for Decision Threshold In Machine Learning.

Decision-Threshold-ML Tutorial for improve skills: 'Decision Threshold In Machine Learning' (from GeeksforGeeks) by Marcus Mariano For more informatio

0 Jan 20, 2022