Convert monolithic Jupyter notebooks into Ploomber pipelines.

Last update: Dec 16, 2022

Overview

Soorgeon

Convert monolithic Jupyter notebooks into Ploomber pipelines.

3-minute video tutorial.

Note: Soorgeon is in alpha; help us make it better.

Install

pip install soorgeon

Usage

# refactor notebook
soorgeon refactor nb.ipynb

# all variables with the df prefix are stored in csv files
soorgeon refactor nb.ipynb --df-format csv
# all variables with the df prefix are stored in parquet files
soorgeon refactor nb.ipynb --df-format parquet

# store task output in 'some-directory' (if missing, this defaults to 'output')
soorgeon refactor nb.ipynb --product-prefix some-directory

# generate tasks in .py format
soorgeon refactor nb.ipynb --file-format py
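
If you call Soorgeon from a script, the options above can be assembled programmatically. The following is a minimal sketch, not part of Soorgeon: `build_refactor_command` is a hypothetical helper, it only knows the flags listed in this section, and it assumes those flags can be combined in a single invocation.

```python
import shlex
from typing import List, Optional


def build_refactor_command(
    notebook: str,
    df_format: Optional[str] = None,
    product_prefix: Optional[str] = None,
    file_format: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `soorgeon refactor` (hypothetical helper).

    Only the flags shown in the Usage section are supported. Combining them
    in one call is an assumption, not something this README states.
    """
    cmd = ["soorgeon", "refactor", notebook]
    if df_format is not None:
        # e.g. "csv" or "parquet", as shown above
        cmd += ["--df-format", df_format]
    if product_prefix is not None:
        # directory where task output is stored
        cmd += ["--product-prefix", product_prefix]
    if file_format is not None:
        # e.g. "py" to generate tasks as .py files
        cmd += ["--file-format", file_format]
    return cmd


if __name__ == "__main__":
    cmd = build_refactor_command(
        "nb.ipynb",
        df_format="parquet",
        product_prefix="some-directory",
        file_format="py",
    )
    # Print the command in copy-pasteable form instead of running it
    print(shlex.join(cmd))
```

To actually run the command, pass the returned list to `subprocess.run(cmd, check=True)` from the directory that contains the notebook.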

To learn more, check out our guide.

Examples

git clone https://github.com/ploomber/soorgeon
cd soorgeon  # the example paths below are relative to the repository root

Exploratory data analysis notebook:

cd examples/exploratory
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build

Machine learning notebook:

cd examples/machine-learning
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build
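
Before refactoring your own notebook, it can help to see how it is sectioned. Our understanding is that Soorgeon uses Markdown H2 headings (`## ...`) to decide where one task ends and the next begins; treat that as an assumption and confirm it in the guide. The sketch below is not part of Soorgeon. `list_h2_headings` is a hypothetical helper that reads the notebook's JSON structure and lists those headings, and the in-memory notebook is made-up sample data.

```python
from typing import List


def list_h2_headings(notebook: dict) -> List[str]:
    """Return the text of every Markdown H2 heading in a notebook dict."""
    headings = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # skip code cells, even if they contain '## ' comments
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            # '## ' matches H2 only; '# ' (H1) and '### ' (H3) do not match
            if line.startswith("## "):
                headings.append(line[3:].strip())
    return headings


# Made-up notebook content, for illustration only
example_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Analysis\n"]},
        {"cell_type": "markdown", "source": ["## Load data\n", "Some notes."]},
        {"cell_type": "code", "source": ["df = None  # placeholder\n"]},
        {"cell_type": "markdown", "source": "## Clean\n### Details"},
        {"cell_type": "code", "source": ["## not a heading, just a comment\n"]},
        {"cell_type": "markdown", "source": ["## Plot"]},
    ]
}

headings = list_h2_headings(example_notebook)
print(headings)
```

To inspect a real notebook, load it with `json.load` and pass the resulting dict to the function. The check is line-based, so a `## ` line inside a fenced code sample in a Markdown cell would also be counted.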

To learn more, check out our guide.
