deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.

Last update: Oct 17, 2022

Related tags

Deep Learning deep-table

Overview

deep-table

deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.

Design

Architecture

As shown below, each pretraining/fine-tuning model is decomposed into two modules: Encoder and Head.

Encoder

Encoder has Embedding and Backbone.

Embedding makes continuous/categorical features tokenized or simply normalized.
Backbone processes the tokenized features.

Pretraining/Fine-tuning Head

Pretraining/Fine-tuning Head uses Encoder module for training.

Implemented Methods

Available Modules

Encoder - Embedding

FeatureEmbedding
TabTransformerEmbedding

Encoder - Backbone

MLPBackbone
FTTransformerBackbone
SAINTBackbone

Model - Head

MLPHeadModel

Model - Pretraining

DenoisingPretrainModel
SAINTPretrainModel
TabTransformerPretrainModel
VIMEPretrainModel

How To Use

Step 0. Install

python setup.py install

# Installation with pip
pip install -e .

Step 1. Define config.json

You have to define three configs at least.

encoder
model
trainer

Minimum configurations are as follows:

from omegaconf import OmegaConf

encoder_config = OmegaConf.create({
    "embedding": {
        "name": "FeatureEmbedding",
    },
    "backbone": {
        "name": "FTTransformerBackbone",
    }
})

model_config = OmegaConf.create({
    "name": "MLPHeadModel"
})

trainer_config = OmegaConf.create({
    "max_epochs": 1,
})

Other parameters can be changed also by config.json if you want.

Step 2. Define Datamodule

from deep_table.data.data_module import TabularDatamodule


datamodule = TabularDatamodule(
    train=train_df,
    validation=val_df,
    test=test_df,
    task="binary",
    dim_out=1,
    categorical_cols=["education", "occupation", ...],
    continuous_cols=["age", "hours-per-week", ...],
    target=["income"],
    num_categories=110,
)

Step 3. Run Training

>> {'accuracy': array([0.8553...]), 'AUC': array([0.9111...]), 'F1 score': array([0.9077...]), 'cross_entropy': array([0.3093...])} ">

from deep_table.estimators.base import Estimator
from deep_table.utils import get_scores


estimator = Estimator(
    encoder_config,      # Encoder architecture
    model_config,        # model settings (learning rate, scheduler...)
    trainer_config,      # training settings (epoch, gpu...)
)

estimator.fit(datamodule)
predict = estimator.predict(datamodule.dataloader(split="test"))
get_scores(predict, target, task="binary")
>>> {'accuracy': array([0.8553...]),
     'AUC': array([0.9111...]),
     'F1 score': array([0.9077...]),
     'cross_entropy': array([0.3093...])}

If you want to train a model with pretraining, write as follows:

from deep_table.estimators.base import Estimator
from deep_table.utils import get_scores


pretrain_model_config = OmegaConf.create({
    "name": "SAINTPretrainModel"
})

pretrain_model = Estimator(encoder_config, pretrain_model_config, trainer_config)
pretrain_model.fit(datamodule)

estimator = Estimator(encoder_config, model_config, trainer_config)
estimator.fit(datamodule, from_pretrained=pretrain_model)

See notebooks/train_adult.ipynb for more details.

Custom Datasets

You can use your own datasets.

Prepare datasets and create DataFrame
Preprocess DataFrame
Create your own datamodules using TabularDatamodule

Example code is shown below.

import pandas as pd

import os,sys; sys.path.append(os.path.abspath(".."))
from deep_table.data.data_module import TabularDatamodule
from deep_table.preprocess import CategoryPreprocessor


# 0. Prepare datasets and create DataFrame
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

# 1. Preprocessing pd.DataFrame
category_preprocesser = CategoryPreprocessor(categorical_columns=["species"], use_unk=False)
iris = category_preprocesser.fit_transform(iris)

# 2. TabularDatamodule
datamodule = TabularDatamodule(
    train=iris.iloc[:20],
    val=iris.iloc[20:40],
    test=iris.iloc[40:],
    task="multiclass",
    dim_out=3,
    categorical_columns=[],
    continuous_columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
    target=["species"],
    num_categories=0,
)

See notebooks/custom_dataset.ipynb for the full training example.

Custom Models

You can also use your Embedding/Backbone/Model. Set arguments as shown below.

estimator = Estimator(
    encoder_config, model_config, trainer_config,
    custom_embedding=YourEmbedding, custom_backbone=YourBackbone, custom_model=YourModel
)

If custom models are set, the attributes name in corresponding configs will be overwritten.

See notebooks/custom_model.ipynb for more details.

deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.

Related tags

Overview

deep-table

Design

Architecture

Encoder

Pretraining/Fine-tuning Head

Implemented Methods

Available Modules

Encoder - Embedding

Encoder - Backbone

Model - Head

Model - Pretraining

How To Use

Step 0. Install

Step 1. Define config.json

Step 2. Define Datamodule

Step 3. Run Training

Custom Datasets

Custom Models

Owner

Human4D Dataset tools for processing and visualization

Gym environments used in the paper: "Developmental Reinforcement Learning of Control Policy of a Quadcopter UAV with Thrust Vectoring Rotors"

Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning"

PyoMyo - Python Opensource Myo library

Accelerated deep learning R&D

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

Datasets, Transforms and Models specific to Computer Vision

Flybirds - BDD-driven natural language automated testing framework, present by Trip Flight

Supervised Contrastive Learning for Downstream Optimized Sequence Representations

Notebook and code to synthesize complex and highly dimensional datasets using Gretel APIs.

Rest API Written In Python To Classify NSFW Images.

This is an example implementation of the paper "Cross Domain Robot Imitation with Invariant Representation".

A cool little repl-based simulation written in Python

TFOD-MASKRCNN - Tensorflow MaskRCNN With Python

Malware Bypass Research using Reinforcement Learning

Fader Networks: Manipulating Images by Sliding Attributes - NIPS 2017

PixelPyramids: Exact Inference Models from Lossless Image Pyramids (ICCV 2021)

Package to compute Mauve, a similarity score between neural text and human text. Install with `pip install mauve-text`.

Matplotlib Image labeller for classifying images

A transformer-based method for Healthcare Image Captioning in Vietnamese