Python-based GBDT implementation

Last update: Sep 21, 2022

Related tags

Machine Learning Py-Boost

Overview

Py-boost: a research tool for exploring GBDTs

Modern gradient boosting toolkits are very complex and are written in low-level programming languages. As a result:

It is hard to customize them to suit one’s needs
New ideas and methods are not easy to implement
It is difficult to understand how they work

Py-boost is a Python-based gradient boosting library that aims to overcome these problems.

Authors: Anton Vakhrushev, Leonid Iosipoi.

Py-boost Key Features

Simple. Py-boost is a simplified gradient boosting library, but it supports all the main features and hyperparameters available in other implementations.

Fast with GPU. Although Py-boost is written in Python, it runs only on GPU and relies on Python GPU libraries such as CuPy and Numba.

Easy to customize. Py-boost can be customized even by someone who is not familiar with GPU programming (just replace np with cp). What can be customized? Almost everything, via custom callbacks. Examples: row/column sampling strategy, training control, losses/metrics, multioutput handling strategy.
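The "replace np with cp" remark can be illustrated with a small sketch. It is not taken from Py-boost itself: the function name mse_grad_hess and its xp argument are hypothetical, chosen only to show that array code written against NumPy usually runs on CuPy arrays when the array module is swapped.

```python
import numpy as np

try:
    import cupy as cp  # optional: needs a CUDA-capable GPU and a matching CuPy build
except Exception:  # CuPy is missing or unusable on this machine
    cp = None


def mse_grad_hess(y_true, y_pred, xp=np):
    """Gradient and Hessian of 0.5 * (y_pred - y_true) ** 2 with respect to y_pred.

    `xp` is the array module (hypothetical parameter): `np` for CPU, `cp` for GPU.
    """
    grad = y_pred - y_true
    hess = xp.ones_like(y_pred)
    return grad, hess


# CPU version with NumPy
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 1.0, 3.0])
grad, hess = mse_grad_hess(y_true, y_pred, xp=np)

# GPU version: same function, only the array module changes (requires CuPy and a GPU)
# grad_gpu, hess_gpu = mse_grad_hess(cp.asarray(y_true), cp.asarray(y_pred), xp=cp)
```

How such a function is plugged into Py-boost as a custom loss is covered by the customization tutorial rather than by this sketch.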

Installation

Before installing py-boost via pip, you should have cupy installed. You can install both with:

pip install -U cupy-cuda110 py-boost

Note: replace cuda110 with the suffix that matches your CUDA version. For details, see this guide
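Since cupy has to be present first, a quick check from Python can confirm that before installing py-boost. This is only a sketch that uses the standard library; the helper name is_installed is hypothetical, and the check only tells whether the package can be found, not whether it matches your CUDA version.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found by the current interpreter."""
    return importlib.util.find_spec(package) is not None


if not is_installed("cupy"):
    print("CuPy not found: install the cupy package matching your CUDA version first.")
```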

Quick tour

Py-boost is easy to use because its interface is similar to scikit-learn's. For usage examples, see:

Tutorial_1_Basics for simple usage examples
Tutorial_2_Advanced_multioutput for advanced multioutput features
Tutorial_3_Custom_features for examples of customization

More examples are coming soon.

Other Sber AI Lab Projects

LightAutoML: https://github.com/sberbank-ai-lab/LightAutoML
AutoWoE: https://github.com/sberbank-ai-lab/AutoMLWhitebox
RePlay: https://github.com/sberbank-ai-lab/RePlay

Owner

Sberbank AI Lab
