This repository contains a full machine learning pipeline for the Zillow Houses competition on the Kaggle platform.

Last update: Jan 09, 2022

Related tags

Machine Learning Zillow-Houses

Overview

Zillow-Houses

The pipeline consists of 10 general steps:

1. Exploratory Data Analysis (univariate, bivariate, hypothesis testing, confidence intervals)
2. Missing values (different imputation strategies, from simple to advanced: the MICE algorithm using gradient boosting, LightGBM, etc.)
3. Duplicate checking
4. Advanced Anomaly Detection (models such as KNN and Isolation Forests, plus a final detector, SUOD, which aggregates the results from the base models)
5. Solving the multicollinearity problem
6. Feature Engineering
7. Feature Transformation of some features, with hypothesis testing on it (fitting distributions and checking them with statistical tests)
8. Feature Selection, advanced and simple: Recursive Feature Elimination with cross-validation on different tree-based models (Gradient Boosting, Random Forests, etc.), Lasso with the L1 norm, and tree feature importances, all combined into one algorithm which takes into account all of the above methods
9. Modeling (different regression models, fine-tuning, learning curves, validation curves, residual analysis, etc.). Later, I want to use some stacking strategies on boosted trees and some NN models
10. Results analysis: best model selection using confidence intervals, different non-parametric statistical tests, etc.
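
As an illustration of the results analysis step, here is a minimal sketch of comparing two models with a percentile bootstrap confidence interval on their paired per-fold errors. It is not taken from the repository: the helper name `bootstrap_ci` and the MAE values are made up for the example, and the repository may use different tests.

```python
import numpy as np


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Hypothetical helper written for this example only.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement and record the mean of every resample.
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    means = values[idx].mean(axis=1)
    lower, upper = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return lower, upper


# Made-up per-fold MAE scores of two models evaluated on the same CV folds.
mae_a = np.array([0.070, 0.068, 0.072, 0.069, 0.071, 0.067, 0.073, 0.070])
mae_b = np.array([0.066, 0.065, 0.069, 0.066, 0.067, 0.064, 0.070, 0.066])

# Paired differences: positive values mean model B has the lower error.
diff = mae_a - mae_b
low, high = bootstrap_ci(diff)
print(f"95% CI for the mean MAE difference: [{low:.4f}, {high:.4f}]")
```

If the interval excludes zero, the difference between the two models is unlikely to be a fold-sampling artifact; if it contains zero, the simpler model is usually the safer choice.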

This solution also contains a custom preprocessing pipeline which can automatically perform steps 2-8 (all in :) ).
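
To give an idea of what such an all-in-one preprocessing pipeline can look like, below is a rough sketch built on scikit-learn. It is an assumption-laden illustration, not the repository's actual code: `DropCorrelated` is a hypothetical helper, the data are synthetic, `IterativeImputer` (a MICE-inspired imputer) runs with its default estimator rather than gradient boosting, and the row-removing steps (duplicates, anomalies) are left out because a plain scikit-learn `Pipeline` only transforms columns.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import IterativeImputer
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer


class DropCorrelated(TransformerMixin, BaseEstimator):
    """Hypothetical helper: keep a column only if it is not highly
    correlated with any column that was already kept."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold

    def fit(self, X, y=None):
        corr = np.abs(np.corrcoef(np.asarray(X), rowvar=False))
        keep = []
        for j in range(corr.shape[0]):
            if all(corr[j, k] < self.threshold for k in keep):
                keep.append(j)
        self.keep_ = keep
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.keep_]


# Synthetic data: column 1 nearly duplicates column 0, columns 3-4 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)
y = 3 * X[:, 0] + 2 * X[:, 2] + 0.1 * rng.normal(size=200)
X[rng.random(X.shape) < 0.05] = np.nan  # knock out about 5% of the values

preprocess = Pipeline([
    ("impute", IterativeImputer(random_state=0)),     # step 2: missing values
    ("decorrelate", DropCorrelated(threshold=0.95)),  # step 5: multicollinearity
    ("transform", PowerTransformer()),                # step 7: transformation
    ("select", SelectFromModel(Lasso(alpha=0.05))),   # step 8: L1 selection
])

X_out = preprocess.fit_transform(X, y)
print("input shape:", X.shape, "output shape:", X_out.shape)
```

Wrapping the steps in one `Pipeline` object means the same fitted transformations can be applied to the test set with `preprocess.transform`, which helps avoid leaking test information into the imputation and selection steps.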
