Pipetools enables function composition similar to using Unix pipes.

Overview

Pipetools


Complete documentation

pipetools enables function composition similar to using Unix pipes.

It allows forward-composition and piping of arbitrary functions - no need to decorate them or do anything extra.

It also packs a bunch of utils that make common operations more convenient and readable.

Source is on github.

Why?

Piping and function composition are some of the most natural operations there are for plenty of programming tasks. Yet Python doesn't have a built-in way of performing them. That forces you either to nest function calls deeply or to add extra glue code.

Example

Say you want to create a list of Python files in a given directory, ordered by filename length (longest first), as a string, with each file on its own line and with line numbers:

>>> print(pyfiles_by_length('../pipetools'))
1. ds_builder.py
2. __init__.py
3. compat.py
4. utils.py
5. main.py

All the ingredients are already there, you just have to glue them together. You might write it like this:

def pyfiles_by_length(directory):
    all_files = os.listdir(directory)
    py_files = [f for f in all_files if f.endswith('.py')]
    sorted_files = sorted(py_files, key=len, reverse=True)
    numbered = enumerate(py_files, 1)
    rows = ("{0}. {1}".format(i, f) for i, f in numbered)
    return '\n'.join(rows)

Or perhaps like this:

def pyfiles_by_length(directory):
    return '\n'.join('{0}. {1}'.format(*x) for x in enumerate(reversed(sorted(
        [f for f in os.listdir(directory) if f.endswith('.py')], key=len)), 1))

Or, if you're a mad scientist, you would probably do it like this:

# On Python 3 this needs `import os` and `from functools import reduce`
pyfiles_by_length = lambda d: (reduce('{0}\n{1}'.format,
    map(lambda x: '%d. %s' % x, enumerate(reversed(sorted(
        filter(lambda f: f.endswith('.py'), os.listdir(d)), key=len)), 1))))

But there should be one -- and preferably only one -- obvious way to do it.

So which one is it? Well, to redeem the situation, pipetools gives you yet another possibility!

pyfiles_by_length = (pipe
    | os.listdir
    | where(X.endswith('.py'))
    | sort_by(len).descending
    | (enumerate, X, 1)
    | foreach("{0}. {1}")
    | '\n'.join)

Why would I do that, you ask? Compared to the native Python code, it's

  • Easier to read -- minimal extra clutter
  • Easier to understand -- one-way data flow from one step to the next, nothing else to keep track of
  • Easier to change -- want more processing? just add a step to the pipeline
  • Removes some bug opportunities -- did you spot the bug in the first example?

Of course it won't solve all your problems, but a great deal of code can be expressed as a pipeline, giving you the above benefits. Read on to see how it works!

Installation

$ pip install pipetools

Uh, what's that?

Usage

The pipe

The pipe object can be used to pipe functions together to form new functions, and it works like this:

from pipetools import pipe

f = pipe | a | b | c

# is the same as:
def f(x):
    return c(b(a(x)))
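
To make the mechanics concrete, here is a minimal sketch of how such an object could be built on the | operator. MiniPipe and mini_pipe are hypothetical names used only for illustration; this is not pipetools' actual implementation, just one way the idea can work:

```python
class MiniPipe:
    """Toy stand-in for a pipe object (illustration only, not pipetools code)."""

    def __init__(self, func=lambda x: x):
        # An empty pipe starts out as the identity function.
        self.func = func

    def __or__(self, next_func):
        # `pipe | g` returns a new pipe that runs the current function, then g.
        first = self.func
        return MiniPipe(lambda x: next_func(first(x)))

    def __call__(self, x):
        return self.func(x)


mini_pipe = MiniPipe()

# Reads left to right: strip, then upper-case, then take the length.
f = mini_pipe | str.strip | str.upper | len

print(f("  abc "))  # 3
```

Each | creates a new object instead of mutating the existing one, so a partially built pipeline can be reused as a prefix for several others.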

A real example, sum of odd numbers from 0 to x:

from functools import partial
from pipetools import pipe

odd_sum = pipe | range | partial(filter, lambda x: x % 2) | sum

odd_sum(10)  # -> 25

Note that the chain up to the sum is lazy.

Automatic partial application in the pipe

As partial application is often useful when piping things together, it is done automatically when the pipe encounters a tuple, so this produces the same result as the previous example:

odd_sum = pipe | range | (filter, lambda x: x % 2) | sum

As of 0.1.9, this is even more powerful, see X-partial.
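
The tuple rule can be sketched in plain Python as follows. The helpers to_callable and compose are hypothetical and only illustrate the idea of turning a tuple into a functools.partial; they are not part of pipetools:

```python
from functools import partial


def to_callable(step):
    """Turn a pipeline step into a callable (hypothetical helper)."""
    if isinstance(step, tuple):
        # (func, arg1, arg2, ...) becomes partial(func, arg1, arg2, ...)
        func, *args = step
        return partial(func, *args)
    return step


def compose(*steps):
    """Forward-compose the given steps, left to right."""
    funcs = [to_callable(step) for step in steps]

    def composed(value):
        for func in funcs:
            value = func(value)
        return value

    return composed


odd_sum = compose(range, (filter, lambda x: x % 2), sum)

print(odd_sum(10))  # 25
```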

Built-in tools

Pipetools contains a set of pipe-utils that solve some common tasks. For example, there is a shortcut for the filter class from our example, called where():

from pipetools import pipe, where

odd_sum = pipe | range | where(lambda x: x % 2) | sum

Well, that might be a bit more readable, though it's not really a huge improvement. But wait!

If a pipe-util is used as first or second item in the pipe (which happens quite often) the pipe at the beginning can be omitted:

odd_sum = range | where(lambda x: x % 2) | sum

See pipe-utils' documentation.

OK, but what about the ugly lambda?

where(), foreach(), sort_by() and other pipe-utils can be quite useful, but they require a function as an argument. That can be a named function, which is OK if it does something complicated, but often it's something simple, so it's appropriate to use a lambda. Except Python's lambdas are quite verbose for simple tasks and the code gets cluttered...

X object to the rescue!

from pipetools import where, X

odd_sum = range | where(X % 2) | sum

How 'bout that.

Read more about the X object and its limitations.
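
For a rough idea of how a placeholder like this can work, here is a toy version built on operator overloading. The Placeholder class and the name P are hypothetical; the real X object supports far more than this sketch and is implemented differently:

```python
class Placeholder:
    """Toy placeholder that builds small functions (not pipetools' X)."""

    def __mod__(self, other):
        # P % 2 -> a function computing value % 2
        return lambda value: value % other

    def __getattr__(self, name):
        # P.endswith('.py') -> a function calling value.endswith('.py')
        def method_builder(*args, **kwargs):
            return lambda value: getattr(value, name)(*args, **kwargs)
        return method_builder


P = Placeholder()

is_odd = P % 2
is_py = P.endswith('.py')

print(sum(filter(is_odd, range(10))))                # 25
print(list(filter(is_py, ['main.py', 'notes.txt'])))  # ['main.py']
```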

Automatic string formatting

Since it doesn't make sense to compose functions with strings, when a pipe (or a pipe-util) encounters a string, it attempts to use it for (advanced) formatting:

>>> countdown = pipe | (range, 1) | reversed | foreach('{}...') | ' '.join | '{} boom'
>>> countdown(5)
'4... 3... 2... 1... boom'
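
For comparison, the same countdown written step by step in plain Python looks roughly like this. The name countdown_plain is made up for this sketch, and the comments map each line to the pipeline step it mirrors:

```python
def countdown_plain(n):
    numbers = reversed(range(1, n))               # (range, 1) | reversed
    parts = ('{}...'.format(i) for i in numbers)  # foreach('{}...')
    joined = ' '.join(parts)                      # ' '.join
    return '{} boom'.format(joined)               # '{} boom'


print(countdown_plain(5))  # 4... 3... 2... 1... boom
```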

Feeding the pipe

Sometimes it's useful to create a one-off pipe and immediately run some input through it. The standard way of doing this is somewhat awkward (and not very readable, especially when the pipe spans multiple lines):

result = (pipe | foo | bar | boo)(some_input)

It can also be done using the > operator:

result = some_input > pipe | foo | bar | boo

Note

Note that the above method of input won't work if the input object defines __gt__ for any object, including the pipe. This can be the case, for example, with some objects from math libraries such as NumPy. If you experience strange results, try falling back to the standard way of passing input into a pipe.
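
The reason is Python's comparison protocol: for a > b, Python first calls a.__gt__(b) and only falls back to the reflected b.__lt__(a) if the first call returns NotImplemented. The sketch below assumes the receiving object accepts its input through that reflected call; Sink and Greedy are hypothetical classes used only to show the effect:

```python
class Sink:
    """Toy receiver that takes its input via the reflected comparison."""

    def __init__(self, func):
        self.func = func

    def __lt__(self, value):
        # Reached for `value > sink` only when value.__gt__ gives up.
        return self.func(value)


class Greedy:
    """Toy input type that answers `>` for any right-hand operand."""

    def __gt__(self, other):
        return "handled by Greedy"


double = Sink(lambda x: x * 2)

# int.__gt__ returns NotImplemented for a Sink, so Sink.__lt__ runs.
plain_result = 21 > double

# Greedy.__gt__ answers first, so the Sink never sees the input.
greedy_result = Greedy() > double

print(plain_result)   # 42
print(greedy_result)  # handled by Greedy
```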

But wait, there is more

Check out the Maybe pipe, partial application on steroids or automatic data structure creation in the full documentation.
