Mixing up the Invariant Information clustering architecture, with self supervised concepts from SimCLR and MoCo approaches

Overview

Self Supervised clusterer

Combined IIC, and Moco architectures, with some SimCLR notions, to get state of the art unsupervised clustering while retaining interesting image latent representations in the feature space using contrastive learning.

Installation

Currently successfully tested on Ubuntu 18.04 and Ubuntu 20.04, with python 3.6 and 3.8

Works for Pytorch versions >= 1.4. Launch following command to install all pd

pip3 install -r requirements.txt

Logs

All information is logged to tensorboard. If you activate the neptune flag, you can also make logs to Neptune.ai.

Tensorboard

To check logs of your trainings using tensorboard, use the command :

tensorboard --logdir=./logs/NAME_OF_TEST/events

The NAME_OF_TEST is generated automatically for each automatic training you launch, composed of the inputed name of the training you chose (explained further below in commands), and the exact date and time when you launched the training. For example test_on_nocadozole_20210518-153531

Neptune

Before using neptune as a log and output control tool, you need to create a neptune account and get your developer token. Create a neptune_token.txt file and store the token in it.

Create in neptune a folder for your outputs, with a name of your choice, then go to main.py and modify from line 129 :

if args.offline :
    CONNECTION_MODE = "offline"
    run = neptune.init(project='USERNAME/PROJECT_NAME',# You should add your project name and username here
                   api_token=token,
                   mode=CONNECTION_MODE,
                   )
else :
    run = neptune.init(project='USERNAME/PROJECT_NAME',# You should add your project name and username here
               api_token=token,
               )

Preparing your own data

All datasets will be put in the ./data folder. As you might have to create various different datasets inside, create a folder inside for each dataset you use, while giving it a linux-friendly name.

To be completed

Commands

  • Adding the --labels command means you have ground truth for classes, and you wish to use it in evaluation

  • Adding the --neptune command means you wish to log your data in neptune (Check logging section)

  • output_k is the number of clusters

  • model_name is the name you'll use to keep track of this specific model. Date of training launch will be added to its name.

  • augmentation is the contrastive loss augmentation types you'll be using. They can be consulted and modified in the datasets/datasetgetter.py file.

  • epochs is the maximal number of epochs you wish to have. It is 1000 by default

  • batch_size is the training batch size. Default is 32

  • val_batch is the validation batch size. Default is 10

  • sty_dim is the size of the style vector. default is 128

  • img_size size of input images

  • --debug is a flag for activating debug mode, where the training is very fast, just to check if everything is working fine

training from scratch
python main.py --gpu 2  --output_k 9  --model_name=validating_best_image_transfer --augmentation BBC --data_type BBBC021_196  --data_folder N1 --neptune --img_size 196
training using pretrained model
python main.py --gpu 2  --output_k 9  --model_name=validating_best_image_transfer --augmentation improved_v2 --data_type BBBC021_196  --data_folder ND8D --labels --neptune --load_model testing_high_cluster_number_20210604-024131_
valiadtion using pretrained model
python main.py --gpu 2  --output_k 9  --model_name=validating_best_image_transfer --augmentation improved_v2 --data_type BBBC021_196  --data_folder ND8D --labels --validation --neptune --load_model testing_high_cluster_number_20210604-024131_
Owner
Bendidi Ihab
Computational Biologist & DL Eng
Bendidi Ihab
InfiniteBoost: building infinite ensembles with gradient descent

InfiniteBoost Code for a paper InfiniteBoost: building infinite ensembles with gradient descent (arXiv:1706.01109). A. Rogozhnikov, T. Likhomanenko De

Alex Rogozhnikov 183 Jan 03, 2023
Pandas DataFrames and Series as Interactive Tables in Jupyter

Pandas DataFrames and Series as Interactive Tables in Jupyter Star Turn pandas DataFrames and Series into interactive datatables in both your notebook

Marc Wouts 364 Jan 04, 2023
TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models.

TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models. The library is a collection of Keras models

538 Jan 01, 2023
Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

Prophet: Automatic Forecasting Procedure Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends ar

Facebook 15.4k Jan 07, 2023
Iris-Heroku - Putting a Machine Learning Model into Production with Flask and Heroku

Puesta en Producción de un modelo de aprendizaje automático con Flask y Heroku L

Jesùs Guillen 1 Jun 03, 2022
Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning

Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning. It features an imperative, define-by-run style user API.

7.4k Jan 04, 2023
onelearn: Online learning in Python

onelearn: Online learning in Python Documentation | Reproduce experiments | onelearn stands for ONE-shot LEARNning. It is a small python package for o

15 Nov 06, 2022
Tangram makes it easy for programmers to train, deploy, and monitor machine learning models.

Tangram Website | Discord Tangram makes it easy for programmers to train, deploy, and monitor machine learning models. Run tangram train to train a mo

Tangram 1.4k Jan 05, 2023
Markov bot - A Writing bot based on Markov Chain for Data Structure Lab

基于马尔可夫链的写作机器人 前端 用html/css完成 Demo展示(已给出文本的相应展示) 用户提供相关的语料库后训练的成果 后端 要完成的几个接口 解析文

DysprosiumDy 9 May 05, 2022
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

Microsoft 14.5k Jan 07, 2023
Microsoft 5.6k Jan 07, 2023
MegFlow - Efficient ML solutions for long-tailed demands.

Efficient ML solutions for long-tailed demands.

旷视天元 MegEngine 371 Dec 21, 2022
Time-series momentum for momentum investing strategy

Time-series-momentum Time-series momentum strategy. You can use the data_analysis.py file to find out the best trigger and window for a given asset an

Victor Caldeira 3 Jun 18, 2022
ZenML 🙏: MLOps framework to create reproducible ML pipelines for production machine learning.

ZenML is an extensible, open-source MLOps framework to create production-ready machine learning pipelines. It has a simple, flexible syntax, is cloud and tool agnostic, and has interfaces/abstraction

ZenML 2.6k Jan 08, 2023
Programming assignments and quizzes from all courses within the Machine Learning Engineering for Production (MLOps) specialization offered by deeplearning.ai

Machine Learning Engineering for Production (MLOps) Specialization on Coursera (offered by deeplearning.ai) Programming assignments from all courses i

Aman Chadha 173 Jan 05, 2023
Simple, fast, and parallelized symbolic regression in Python/Julia via regularized evolution and simulated annealing

Parallelized symbolic regression built on Julia, and interfaced by Python. Uses regularized evolution, simulated annealing, and gradient-free optimization.

Miles Cranmer 924 Jan 03, 2023
🎛 Distributed machine learning made simple.

🎛 lazycluster Distributed machine learning made simple. Use your preferred distributed ML framework like a lazy engineer. Getting Started • Highlight

Machine Learning Tooling 44 Nov 27, 2022
Open MLOps - A Production-focused Open-Source Machine Learning Framework

Open MLOps - A Production-focused Open-Source Machine Learning Framework Open MLOps is a set of open-source tools carefully chosen to ease user experi

Data Revenue 590 Dec 28, 2022
neurodsp is a collection of approaches for applying digital signal processing to neural time series

neurodsp is a collection of approaches for applying digital signal processing to neural time series, including algorithms that have been proposed for the analysis of neural time series. It also inclu

NeuroDSP 224 Dec 02, 2022
Probabilistic time series modeling in Python

GluonTS - Probabilistic Time Series Modeling in Python GluonTS is a Python toolkit for probabilistic time series modeling, built around Apache MXNet (

Amazon Web Services - Labs 3.3k Jan 03, 2023