A library built upon PyTorch for building embeddings on discrete event sequences using self-supervision

Last update: Dec 17, 2022

Overview

pytorch-lifestream is a library built upon PyTorch for building embeddings on discrete event sequences using self-supervision. It can process terabyte-size volumes of raw events such as game history events, clickstream data, purchase history or card transactions.

It supports various methods of self-supervised training, adapted for event sequences:

Contrastive Learning for Event Sequences (CoLES)
Contrastive Predictive Coding (CPC)
Replaced Token Detection (RTD) from ELECTRA
Next Sequence Prediction (NSP) from BERT
Sequences Order Prediction (SOP) from ALBERT
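
As an illustration of how a contrastive method such as CoLES can build training pairs without labels, the sketch below samples random contiguous sub-sequences from one event sequence; slices taken from the same sequence would be treated as positives. This is a minimal plain-Python sketch under stated assumptions, not the library's implementation: the helper `sample_slices`, its parameters, and the sample data are hypothetical.

```python
import random


def sample_slices(events, n_slices, min_len, max_len, rng=None):
    """Sample random contiguous sub-sequences of one event sequence.

    Hypothetical helper for illustration; not part of pytorch-lifestream.
    """
    rng = rng or random.Random()
    # Clamp the bounds so a slice never exceeds the sequence itself.
    max_len = min(max_len, len(events))
    min_len = min(min_len, max_len)
    slices = []
    for _ in range(n_slices):
        length = rng.randint(min_len, max_len)        # inclusive bounds
        start = rng.randint(0, len(events) - length)  # slice stays inside the sequence
        slices.append(events[start:start + length])
    return slices


# One client's purchase history as merchant category codes (made-up data).
history = [5411, 5812, 5411, 4111, 5999, 5411, 5812, 7832]
views = sample_slices(history, n_slices=3, min_len=3, max_len=5, rng=random.Random(0))
```

Each element of `views` is a shorter view of the same client; slices drawn from different clients would play the role of negatives.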

It supports several types of encoders, including Transformer and RNN. It also supports many types of self-supervised losses.

The following variants of the contrastive losses are supported:

Contrastive loss (paper)
Triplet loss (paper)
Binomial deviance loss (paper)
Histogram loss (paper)
Margin loss (paper)
VICReg loss (paper)
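
To make the first item concrete, here is a minimal sketch of the classic pairwise contrastive loss for a single pair of embeddings, written in plain Python so it runs without PyTorch. The function names and the default `margin` are assumptions chosen for illustration; the library's own loss classes may differ in scaling and batching.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_entity, margin=0.5):
    """Pairwise contrastive loss for one pair of embeddings.

    Positive pairs are pulled together (squared distance).
    Negative pairs are pushed apart until they are at least `margin` away.
    Illustrative sketch only; not the pytorch-lifestream implementation.
    """
    d = euclidean(emb_a, emb_b)
    if same_entity:
        return d ** 2
    return max(0.0, margin - d) ** 2


# A positive pair far apart is penalised; a distant negative pair is not.
loss_pos = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_entity=True)
loss_neg_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_entity=False)
loss_neg_near = contrastive_loss([0.0, 0.0], [0.3, 0.4], same_entity=False, margin=1.0)
```

The other listed losses follow the same idea of comparing distances between positive and negative pairs, but weight or sample the pairs differently.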

Install from PyPI

pip install pytorch-lifestream

Install from source

# Ubuntu 20.04

sudo apt install python3.8 python3-venv
pip3 install pipenv

pipenv sync  --dev # install packages exactly as specified in Pipfile.lock
pipenv shell
pytest

Demo notebooks

Self-supervised training and embeddings for downstream task notebook
Self-supervised embeddings in CatBoost notebook
Self-supervised training and fine-tuning notebook
PySpark and Parquet for data preprocessing notebook

Experiments on public datasets

Experiments using pytorch-lifestream on several public event datasets are available in a separate repo

Comments

torch.stack in def collate_feature_dict

ptls/data_load/utils.py

Hello!

If the dataloader has a feature called target and the dataset length is not a multiple of the batch size, an error is raised on the last batch: "Sizes of tensors must match except in dimension 0". The cause is the use of torch.stack when processing a feature whose name starts with 'target'.

opened by Ivanich-spb 11
Multi-GPU option of pytorch_lightning.Trainer is not supported

Setting Trainer(gpus=[0,1]) while using PtlsDataModule as the data module produces this error:

AttributeError: Can't pickle local object 'PtlsDataModule.__init__.<locals>.train_dataloader'

opened by mazitovs 1
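
The error message above names a function defined inside `PtlsDataModule.__init__`. One plausible reading, stated here as an assumption, is that a multi-GPU strategy which spawns worker processes has to pickle the data module, and Python cannot pickle functions defined inside another function. The sketch below reproduces that behaviour with two hypothetical stand-in classes (`LocalFnModule`, `MethodModule`); it does not use the library's code.

```python
import pickle


class LocalFnModule:
    """Attaches a function defined inside __init__ (a local object)."""

    def __init__(self, data):
        def train_dataloader():
            return list(data)

        self.train_dataloader = train_dataloader


class MethodModule:
    """Same behaviour, but train_dataloader is a regular method."""

    def __init__(self, data):
        self.data = data

    def train_dataloader(self):
        return list(self.data)


def is_picklable(obj):
    """Return True if pickle can serialise obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # The exception type for local objects varies across Python versions.
        return False


local_ok = is_picklable(LocalFnModule([1, 2, 3]))
method_ok = is_picklable(MethodModule([1, 2, 3]))
```

Defining the dataloader factories as regular methods, as in `MethodModule`, keeps the object picklable; whether that is the right fix for `PtlsDataModule` is for the maintainers to decide.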
Correct seq_len for feature dict
rec = {
    'mcc': [0, 1, 2, 3],
    'target_distribution': [0.1, 0.2, 0.4, 0.1, 0.1, 0.0],
}

How do we get the correct seq_len? The true length is 4, but the candidate lengths are 4 and 6. 'target_distribution' is the wrong field to take the length from: it is not a sequence, it is a fixed-size array.
opened by ivkireev86 1
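
One possible approach to the question above, sketched under the assumption that non-sequence fields can be recognised by a name prefix such as 'target', is to compute the length only from the remaining list-valued fields. The helper `get_seq_len` and its `target_prefix` parameter are hypothetical and are not part of the library.

```python
def get_seq_len(rec, target_prefix="target"):
    """Return the common length of the sequence fields in a feature dict.

    Fields whose name starts with `target_prefix` are treated as per-record
    arrays and skipped. Hypothetical helper for illustration only.
    """
    lengths = {
        len(value)
        for key, value in rec.items()
        if not key.startswith(target_prefix) and isinstance(value, (list, tuple))
    }
    if len(lengths) != 1:
        raise ValueError(f"Expected one common sequence length, found: {sorted(lengths)}")
    return lengths.pop()


rec = {
    'mcc': [0, 1, 2, 3],
    'target_distribution': [0.1, 0.2, 0.4, 0.1, 0.1, 0.0],
}
seq_len = get_seq_len(rec)
```

Raising an error when the sequence fields disagree surfaces malformed records early instead of silently picking one of the lengths.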
Save categories encodings along with model weights in demos

The fitted preprocessor and the train-test split must be saved together with the trained model. Otherwise the category encodings may shift and the saved pretrained model will become useless.

opened by ivkireev86 1
Documentation index
A prototype of the documentation main page. Three sections:

a description of the library's models

a guide on how to use the library

how to write your own components

There is a brief description and links to detailed pages (which we will write later).

The module description proposes a structure for the library. The plan is to create these modules in the near future and move the corresponding classes from the library into them. Old modules that end up empty will be deleted. From then on we will follow the scheme described in this document.

For review, please check the proposed library structure, the module names, and the descriptive text of the document itself.
opened by ivkireev86 1
KL cyclostationarity test tools

The test provides a histogram of self-sample similarity vs. random-sample similarity. It shows compatibility with CoLES.

Think about tests for other frameworks.

opened by ivkireev86 0
Repair pyspark tests
def test_dt_to_timestamp():
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(data=[
        {'dt': '1970-01-01 00:00:00'},
        {'dt': '2012-01-01 12:01:16'},
        {'dt': '2021-12-30 00:00:00'}
    ])
    df = df.withColumn('ts', dt_to_timestamp('dt'))
    ts = [rec.ts for rec in df.select('ts').collect()]

    assert ts == [0, 1325419276, 1640822400]

E   assert [-10800, 1325...6, 1640811600] == [0, 1325419276, 1640822400]
E     At index 0 diff: -10800 != 0
E     Use -v to get more diff

ptls_tests/test_preprocessing/test_pyspark/test_event_time.py:16: AssertionError

def test_datetime_to_timestamp():
    t = DatetimeToTimestamp(col_name_original='dt')
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(data=[
        {'dt': '1970-01-01 00:00:00', 'rn': 1},
        {'dt': '2012-01-01 12:01:16', 'rn': 2},
        {'dt': '2021-12-30 00:00:00', 'rn': 3}
    ])
    df = t.fit_transform(df)
    et = [rec.event_time for rec in df.select('event_time').collect()]

    assert et[0] == 0

E   assert -10800 == 0

ptls_tests/test_preprocessing/test_pyspark/test_event_time.py:48: AssertionError
opened by ikretus 0
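
The constant difference of 10800 seconds in the failures above equals three hours, which is consistent with the datetime strings being interpreted in a UTC+3 session timezone rather than in UTC. That explanation is an assumption, not something the report confirms. The plain-Python sketch below reproduces both sets of numbers with a hypothetical helper `to_timestamp`; in Spark itself, pinning the `spark.sql.session.timeZone` setting to UTC for the tests may be one way to make them independent of the machine's timezone.

```python
from datetime import datetime, timedelta, timezone


def to_timestamp(dt_string, tz):
    """Parse 'YYYY-MM-DD HH:MM:SS' as wall-clock time in `tz`, return Unix seconds."""
    naive = datetime.strptime(dt_string, "%Y-%m-%d %H:%M:%S")
    return int(naive.replace(tzinfo=tz).timestamp())


dates = ['1970-01-01 00:00:00', '2012-01-01 12:01:16', '2021-12-30 00:00:00']
utc_plus_3 = timezone(timedelta(hours=3))

# Interpreted as UTC, the strings give the values the tests expect.
as_utc = [to_timestamp(d, timezone.utc) for d in dates]
# Interpreted as UTC+3, every value shifts by -10800 seconds.
as_utc_plus_3 = [to_timestamp(d, utc_plus_3) for d in dates]
```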
docs. Development guide (for demo notebooks)
add current patterns

When model training starts, print the message "model training starts, please wait. See tensorboard to track progress"; use it with enable_progress=False

documentation user feedback
opened by ivkireev86 0

Releases(v0.5.1)

v0.5.1(Dec 28, 2022)
What's Changed

fixed cpc import by @ArtyomVorobev in https://github.com/dllllb/pytorch-lifestream/pull/90

add softmaxloss and tests by @ArtyomVorobev in https://github.com/dllllb/pytorch-lifestream/pull/87

MLM NSP Module by @mazitovs in https://github.com/dllllb/pytorch-lifestream/pull/88

fix test dropout error by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/91

New Contributors

@ArtyomVorobev made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/90

@mazitovs made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/88

Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.5.0...v0.5.1
v0.5.0(Nov 9, 2022)
What's Changed

Fix metrics reset by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/72

Pandas preprocessing without df copy, faster preprocessing for large datasets by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/73

fix in supervised-sequence-to-target.ipynb by @blinovpd in https://github.com/dllllb/pytorch-lifestream/pull/74

ptls.nn.PBDropout by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/75

tanh for rnn starter by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/76

Auc regr metric by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/78

spatial dropout for NoisyEmbedding, LastMaxAvgEncoder, warning for bidir RnnEncoder by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/80

Hparam tuning demo. hydra, optuna, tensorboard by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/81

tabformer by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/83

Supervised Coles Module, trx_encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/84

New Contributors

@blinovpd made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/74

Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.4.0...v0.5.0
v0.4.0(Jul 27, 2022)
What's Changed

Seq encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/29

regr.task ZILNLoss, RMSE, BucketAccuracy by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/36

lighting modules and nn layers refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/34

Demo colab by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/40

Fix drop target arrays by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/42

feature naming by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/43

Update abs_module.py by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/37

Extended inference demo by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/45

fix import path by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/46

Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/50

Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/52

Target dist by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/58

Data load refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/60

doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/62

doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/63

New Contributors

@ikretus made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/36

Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.3.0...v0.4.0

v0.3.0(Jun 12, 2022)
More Pythonic Core API: constructor arguments instead of config objects

What's Changed

cpc params by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/9

All modules by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/15

Mlm pretrain by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/13

all encoders and get rid of get_loss by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/19

init by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/20

Documentation index by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/8

Demos api update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/18

loss output correction by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/22

Test fixes by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/23

readme_demo_link by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/25

init by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/26

work without logger by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/7

trx_encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/28

Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.1.2...v0.3.0

Owner

Dmitri Babaev

GitHub Repository

