Machine Learning Algorithms

Overview

In this project, the dataset was collected through a survey published on Google Forms. The purpose of the form is to predict a person's favorite shopping type from the information they provide. To this end, respondents were asked 13 questions. Predicting the shopping type from these answers is a classification problem, which is tackled with 5 different algorithms.

These algorithms are:

  • Logistic Regression
  • Random Forest Classifier
  • Support Vector Machine
  • K Neighbors
  • Decision Tree

The algorithms take a total of 12 input features (the 13 survey questions minus the target).

A total of 219 people participated in the survey, and their answers were used to train the algorithms.

The target classes to be estimated are:

  • Clothing
  • Technology
  • Home/Life
  • Book/Magazine

The questions asked to make the estimation are as follows:

  • Gender
  • Age
  • Which store would you prefer to go to?
  • Which store would you prefer to go to?
  • Which store would you prefer to go to?
  • What is your favorite season?
  • What is the importance of the dollar exchange rate for your shopping?
  • What is your satisfaction level with your budget for shopping?
  • How would you rate your social life?
  • Which of the online shopping sites do you prefer?
  • How often do you go shopping?
  • What is your average sleep time per day?
  • What is your favorite type of shopping? // target

The dataset, stored as a CSV file, is read into a dataframe. The column recording the hour and minute at which the user filled out the form carries no meaning for the algorithms and is removed.
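The loading step can be sketched as follows. This is a minimal illustration, not the project's actual code: the inline CSV text and the column names (including `Timestamp`) are assumptions standing in for the real survey export.

```python
import io

import pandas as pd

# Hypothetical stand-in for the survey export; the real file and its
# column names may differ.
csv_text = """Timestamp,Gender,Age,Favorite shopping type
2022/01/05 10:15:00,Female,23,Clothing
2022/01/05 11:40:00,Male,31,Technology
"""

# Read the CSV into a dataframe (a file path would be passed in practice).
df = pd.read_csv(io.StringIO(csv_text))

# Drop the submission-time column, which carries no signal for the models.
df = df.drop(columns=["Timestamp"])

print(df.columns.tolist())
```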

Because the values in some columns are on a much larger scale than in others, the columns are standardized before PCA is performed, so that no single column has a disproportionate effect on the resulting components.
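A small sketch of standardization followed by PCA is given here. The data is randomly generated, and the number of components (2) is an assumption chosen for illustration; the project's actual setting is not stated.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: one wide-range column (age-like) and two
# narrow-range columns (1-5 ratings).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(18, 65, size=50),
    rng.integers(1, 6, size=50),
    rng.integers(1, 6, size=50),
]).astype(float)

# Standardize each column to zero mean and unit variance so the
# wide-range column does not dominate the principal components.
X_scaled = StandardScaler().fit_transform(X)

# Project onto 2 components (an assumed value for this sketch).
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(X_pca.shape)
```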

The feature and target columns to be passed to the algorithms are then determined.

To fit the algorithms, three versions of the dataset are kept separately: the original data, the normalized data, and the PCA-transformed data. Each version is split into train (0.8) and test (0.2) parts. Cross-validation is applied to the 0.8 train portion.
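The 0.8 / 0.2 split might look like the sketch below. The arrays are placeholders with the same number of rows (219) and features (12) as described above, and `random_state=42` is an arbitrary choice to make the example reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 219 respondents, 12 features, 4 target classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(219, 12))
y = rng.integers(0, 4, size=219)

# Hold out 20% of the rows for testing; the remaining 80% is used
# for training and cross-validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))
```

The same call would be repeated for the normalized and PCA-transformed versions of the data.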

Before the dataset is given to the 5 algorithms, the free-text answers and the text options chosen in the other questions are encoded, so that the whole dataset is converted into numbers.
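One way to encode text answers as numbers is sketched below. The choice of `OrdinalEncoder` for the features and `LabelEncoder` for the target is an assumption, since the encoder used in the project is not named here. The column names and answers are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Illustrative answers; the real survey columns may be named differently.
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male"],
    "Favorite season": ["Summer", "Winter", "Spring", "Summer"],
    "Favorite shopping type": [
        "Clothing", "Technology", "Home/Life", "Book/Magazine",
    ],
})

# Map each text category in the feature columns to a number.
feature_encoder = OrdinalEncoder()
X = feature_encoder.fit_transform(df[["Gender", "Favorite season"]])

# Map each target class to an integer label (classes sorted alphabetically).
target_encoder = LabelEncoder()
y = target_encoder.fit_transform(df["Favorite shopping type"])

print(list(target_encoder.classes_))
```

Keeping the fitted encoders makes it possible to translate predictions back to class names with `target_encoder.inverse_transform`.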

The 5 algorithms are taken from the sklearn library. Cross-validation was performed with the GridSearchCV() function for all of them except Logistic Regression. For Logistic Regression, cross-validation can be done with the logistic regression function itself, so GridSearchCV() is not needed.
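Built-in cross-validation for logistic regression is available in sklearn as `LogisticRegressionCV`. Whether the project uses exactly this class is an assumption. The sketch below runs it on synthetic data, with `Cs=5` and `max_iter=1000` as arbitrary illustrative settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split

# Synthetic 4-class problem with 12 features, mirroring the survey's shape.
X, y = make_classification(
    n_samples=219, n_features=12, n_informative=6,
    n_classes=4, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# LogisticRegressionCV searches over regularization strengths using
# 10-fold cross-validation on the train data, with no GridSearchCV wrapper.
clf = LogisticRegressionCV(Cs=5, cv=10, max_iter=1000)
clf.fit(X_train, y_train)

test_accuracy = clf.score(X_test, y_test)
print(test_accuracy)
```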

GridSearchCV() applies K-fold cross-validation while trying each of the parameter values I supply for the estimator; in my project, K is 10. By evaluating every parameter combination across the folds of the train data, it determines which values give the best result.
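A minimal sketch of a 10-fold grid search follows. K Neighbors is used as the example estimator, the data is synthetic, and the parameter grid is hypothetical; the grids actually used in the project are not listed here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the 0.8 train portion.
X_train, y_train = make_classification(
    n_samples=175, n_features=12, n_informative=6,
    n_classes=4, random_state=0,
)

# Hypothetical grid: every combination is scored with 10-fold CV.
param_grid = {
    "n_neighbors": [3, 5, 7],
    "weights": ["uniform", "distance"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=10)
search.fit(X_train, y_train)

# The combination with the best mean cross-validation score.
print(search.best_params_)
```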

A model is then created with the selected parameters, fitted on the train data, and evaluated on the test data.
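The final fit-and-test step could be sketched as below. The `best_params` dictionary is a hypothetical placeholder for whatever the cross-validation step selected, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic 4-class data with 12 features.
X, y = make_classification(
    n_samples=219, n_features=12, n_informative=6,
    n_classes=4, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical values standing in for the parameters chosen by the search.
best_params = {"n_estimators": 100, "max_depth": 5}

# Build the model with the chosen parameters, fit on train, test on held-out data.
model = RandomForestClassifier(random_state=0, **best_params)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(accuracy)
```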

Detailed information about the dataset can be found in the report.

Owner
Göktuğ Ayar
Computer Engineering student at Yildiz Technical University