Accelerating model creation and evaluation.

Last update: Dec 06, 2021

Overview

EmeraldML

A machine learning library that streamlines
(1) cleaning and splitting data,
(2) training, optimizing, and testing various models suited to the task, and
(3) scoring and ranking them.
It is meant for the exploratory phase, as a first-pass analysis of which models perform better on a specific dataset.
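
The three steps listed here can be sketched with plain scikit-learn. This is not EmeraldML's implementation, only an illustration of the split / fit / rank pattern on synthetic data; the candidate list, the 80/20 split, and the use of R² as the score are all assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for an already cleaned dataset (assumption: numeric features only).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=3)

# (1) Split the data into training and held-out test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=3
)

# (2) Train a few candidate models for the regression task.
candidates = [
    LinearRegression(),
    DecisionTreeRegressor(random_state=3),
    KNeighborsRegressor(),
]

# (3) Score each model on the test set (R^2) and rank from best to worst.
ladder = sorted(
    (
        (type(model).__name__, model.fit(X_train, y_train).score(X_test, y_test))
        for model in candidates
    ),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, score in ladder:
    print(f"{name}: {score:.3f}")
```

A single train/test split keeps the sketch short; for a small dataset, cross-validated scores would give a steadier ranking.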

Installation

Dependencies

Python (>= 3.7)
NumPy (>= 1.21.2)
pandas (>= 1.3.3)
scikit-learn (>= 0.24.2)
statsmodels (>= 0.12.2)
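
To check an existing environment against these minimums, something like the following may help. It is a sketch that uses only the standard library's `importlib.metadata` (available from Python 3.8; on 3.7 the `importlib_metadata` backport would be needed). The helper names `parse` and `check` are made up for illustration, and the version parsing is deliberately naive.

```python
import re
from importlib import metadata

# Minimum versions copied from the dependency list above.
MINIMUMS = {
    "numpy": "1.21.2",
    "pandas": "1.3.3",
    "scikit-learn": "0.24.2",
    "statsmodels": "0.12.2",
}


def parse(version):
    """Keep the leading digits of each dot-separated part, e.g. '1.21.2' -> (1, 21, 2).

    Parsing stops at the first part that has no leading digits.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check(minimums=MINIMUMS):
    """Return {package: status}, where status is 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if parse(installed) >= parse(minimum) else "too old"
    return report


for package, status in check().items():
    print(f"{package}: {status}")
```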

User installation

pip install emeraldml

Development

Source code

You can check out the latest sources with the command:

git clone https://github.com/yu3ufff/emeraldml.git

Demo

Getting the data:

import pandas as pd
audi = pd.read_csv('audi.csv')
audi.head()

|    | model   |   year |   price | transmission   |   mileage | fuelType   |   tax |   mpg |   engineSize |
|---:|:--------|-------:|--------:|:---------------|----------:|:-----------|------:|------:|-------------:|
|  0 | A1      |   2017 |   12500 | Manual         |     15735 | Petrol     |   150 |  55.4 |          1.4 |
|  1 | A6      |   2016 |   16500 | Automatic      |     36203 | Diesel     |    20 |  64.2 |          2   |
|  2 | A1      |   2016 |   11000 | Manual         |     29946 | Petrol     |    30 |  55.4 |          1.4 |
|  3 | A4      |   2017 |   16800 | Automatic      |     25952 | Diesel     |   145 |  67.3 |          2   |
|  4 | A3      |   2019 |   17300 | Manual         |      1998 | Petrol     |   145 |  49.6 |          1   |

Using EmeraldML:

import emerald
from emerald.boa import RegressionBoa

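# Create the regression boa with a fixed seed for reproducible results
rboa = RegressionBoa(random_state=3)
# Train, optimize, and score the candidate models with 'price' as the target
rboa.hunt(data=audi, target='price')
# Show the ranked (model, score) pairs, best first
rboa.ladder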

[(OptimalRFRegressor, 0.9624889664024406),
 (OptimalDTRegressor, 0.9514992411732952),
 (OptimalKNRegressor, 0.9511411883559433),
 (OptimalLinearRegression, 0.8876961846248467),
 (OptimalABRegressor, 0.8491539140007975)]
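
The ladder printed above looks like a list of (model, score) pairs sorted from best to worst. Assuming that shape holds (the source does not document it, nor which metric the scores use), it can be handled like any Python list. The sketch below uses plain strings in place of the EmeraldML model objects, with scores rounded from the output above.

```python
# Stand-in for rboa.ladder: strings replace the model objects, scores are rounded.
ladder = [
    ("OptimalRFRegressor", 0.9625),
    ("OptimalDTRegressor", 0.9515),
    ("OptimalKNRegressor", 0.9511),
    ("OptimalLinearRegression", 0.8877),
    ("OptimalABRegressor", 0.8492),
]

# The first entry is the top-ranked model.
best_name, best_score = ladder[0]

# How far each model trails the leader.
gaps = {name: round(best_score - score, 4) for name, score in ladder}

print(best_name)
for name, gap in gaps.items():
    print(f"{name}: trails by {gap:.4f}")
```

Here the top three models sit within about 0.01 of each other, so the ranking among them may not be stable across different random seeds or splits.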

for i in range(len(rboa)):
    print(rboa.model(i))

RandomForestRegressor(min_samples_split=5, n_estimators=500, random_state=3)
DecisionTreeRegressor(max_depth=15, min_samples_split=10, random_state=3)
KNeighborsRegressor(n_neighbors=3, p=1)
LinearRegression()
AdaBoostRegressor(learning_rate=0.1, n_estimators=100, random_state=3)
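
The printed estimators carry non-default hyperparameters (for example `n_neighbors=3, p=1`), which suggests some form of hyperparameter search. The source does not say how EmeraldML performs it. One common way to do this in scikit-learn is `GridSearchCV`, sketched here on synthetic data with an assumed grid.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data standing in for a prepared training set.
X, y = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=3)

# Assumed search grid; these values are illustrative, not EmeraldML's.
param_grid = {"n_neighbors": [3, 5, 7], "p": [1, 2]}

# 5-fold cross-validated search, scored with the estimator's default metric (R^2).
search = GridSearchCV(KNeighborsRegressor(), param_grid, cv=5)
search.fit(X, y)

# The refitted best estimator prints in the same style as the demo output.
print(search.best_estimator_)
print(search.best_params_)
```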

Owner

Yusuf
