Diabetes Prediction with Logistic Regression

Overview

  1. Exploratory Data Analysis
  2. Data Preprocessing
  3. Model & Prediction
  4. Model Evaluation
  5. Model Validation: Holdout
  6. Model Validation: 10-Fold Cross Validation
  7. Prediction for A New Observation
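The two validation steps in the outline (holdout and 10-fold cross validation) differ only in how the rows are partitioned. The following is a minimal sketch of both partitioning schemes in plain Python. The function names, the 20% test ratio, and the seed are assumptions made for illustration; the original project code is not shown here and may use a library routine instead.

```python
import random


def holdout_split(n_rows, test_ratio=0.2, seed=17):
    """Return (train_idx, test_idx) for a single random holdout split."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_rows * test_ratio))
    return indices[n_test:], indices[:n_test]


def kfold_splits(n_rows, k=10, seed=17):
    """Yield (train_idx, validation_idx) pairs for k-fold cross validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, validation_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, validation_idx


# 768 is the number of observations stated in the dataset description.
train_idx, test_idx = holdout_split(768)
print(len(train_idx), len(test_idx))

fold_sizes = [len(validation_idx) for _, validation_idx in kfold_splits(768)]
print(fold_sizes)
```

In the holdout scheme the model is scored once on the held-out rows. In the 10-fold scheme every row is used for validation exactly once, and the ten scores are typically averaged.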

Business Problem

The goal is to develop a machine learning model that predicts whether a person has diabetes, given that person's characteristics.

Dataset Story

The data set is part of a larger data set maintained by the National Institute of Diabetes and Digestive and Kidney Diseases in the United States. The data were used for a diabetes study conducted on Pima Indian women aged 21 years and older living in the city of Phoenix. The data consist of 768 observations and 8 numerical independent variables. The target variable is "Outcome": 1 indicates that the diabetes test result is positive, and 0 indicates that it is negative.
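A logistic regression model works with this 0/1 coding by estimating the probability that the target equals 1 and then applying a cutoff. The sketch below illustrates that mechanism only. The two-feature input, the weights, the bias, and the 0.5 threshold are hypothetical values chosen for the example, not coefficients fitted on the diabetes data.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(features, weights, bias, threshold=0.5):
    """Return (probability of class 1, predicted class) for one observation."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    p = sigmoid(z)
    return p, int(p >= threshold)


# Hypothetical coefficients for two standardized features (NOT fitted on real data).
weights = [1.2, 0.4]
bias = -0.8

p, label = predict([1.5, -0.3], weights, bias)
print(f"P(Outcome=1) = {p:.2f}, predicted class = {label}")
```

The threshold is a design choice: lowering it flags more observations as positive, which raises recall at the cost of precision.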

Variables

  • Pregnancies: Number of pregnancies
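  • Glucose: 2-hour plasma glucose concentration in the oral glucose tolerance test
  • BloodPressure: Diastolic blood pressure (mm Hg)
  • SkinThickness: Triceps skinfold thickness (mm)
  • Insulin: 2-hour serum insulin (mu U/ml)
  • BMI: Body mass index (weight in kg / (height in m)^2)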
  • DiabetesPedigreeFunction
  • Age: years
  • Outcome: Having diabetes (1) or not (0)

In this study, the diabetes data set was explored and a Logistic Regression model was built to predict whether a person has diabetes. First, the dependent variable "Outcome" was examined. In the last step, new variables were derived in an attempt to improve the model's performance. The accuracy and F1 score of the fitted model were determined as 0.63, and the AUC as 0.84. Finally, the fitted model was used to predict whether a randomly selected person has diabetes.
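For readers unfamiliar with the three reported metrics, the following sketch shows how accuracy, F1 score, and AUC can be computed from true labels, predicted labels, and predicted scores. The labels and scores are toy values invented for this example, and the function names are illustrative. It does not reproduce the 0.63 and 0.84 figures above.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of observations classified correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6, 0.7, 0.2]
y_pred = [int(s >= 0.5) for s in scores]

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), auc(y_true, scores))
# prints: 0.75 0.75 0.875
```

Accuracy and F1 depend on the classification threshold, while AUC is computed from the raw scores and does not. This is one reason a model can show a moderate accuracy alongside a noticeably higher AUC.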

Owner
AZİZE SULTAN PALALI
Doping Hafıza | Data Analyst | Data Science and Machine Learning Bootcamp Participant at Veri Bilimi Okulu