Simple and fast histogramming in Python accelerated with OpenMP.

Overview

pygram11

Documentation Status Actions Status PyPI version Conda Forge Python Version

Simple and fast histogramming in Python accelerated with OpenMP with help from pybind11.

pygram11 provides functions for very fast histogram calculations (and the variance in each bin) in one and two dimensions. The API is very simple; documentation can be found here (you'll also find some benchmarks there).

Installing

From PyPI

Binary wheels are provided for Linux and macOS. They can be installed from PyPI via pip:

pip install pygram11

From conda-forge

For installation via the conda package manager pygram11 is part of conda-forge.

conda install pygram11 -c conda-forge

Please note that on macOS the OpenMP libraries from LLVM (libomp) and Intel (libiomp) may clash if your conda environment includes the Intel Math Kernel Library (MKL) package distributed by Anaconda. You may need to install the nomkl package to prevent the clash (Intel MKL accelerates many linear algebra operations, but does not impact pygram11):

conda install nomkl ## sometimes necessary fix (macOS only)

From Source

You need is a C++14 compiler and OpenMP. If you are using a relatively modern GCC release on Linux then you probably don't have to worry about the OpenMP dependency. If you are on macOS, you can install libomp from Homebrew (pygram11 does compile on Apple Silicon devices with Python version >= 3.9 and libomp installed from Homebrew). With those dependencies met, simply run:

git clone https://github.com/douglasdavis/pygram11.git --recurse-submodules
cd pygram11
pip install .

Or let pip handle the cloning procedure:

pip install git+https://github.com/douglasdavis/[email protected]

Tests are run on Python versions >= 3.7 and binary wheels are provided for those versions.

In Action

A histogram (with fixed bin width) of weighted data in one dimension:

>>> rng = np.random.default_rng(123)
>>> x = rng.standard_normal(10000)
>>> w = rng.uniform(0.8, 1.2, x.shape[0])
>>> h, err = pygram11.histogram(x, bins=40, range=(-4, 4), weights=w)

A histogram with fixed bin width which saves the under and overflow in the first and last bins:

>>> x = rng.standard_normal(1000000)
>>> h, __ = pygram11.histogram(x, bins=20, range=(-3, 3), flow=True)

where we've used __ to catch the None returned when weights are absent. A histogram in two dimensions with variable width bins:

>>> x = rng.standard_normal(1000)
>>> y = rng.standard_normal(1000)
>>> xbins = [-2.0, -1.0, -0.5, 1.5, 2.0, 3.1]
>>> ybins = [-3.0, -1.5, -0.1, 0.8, 2.0, 2.8]
>>> h, err = pygram11.histogram2d(x, y, bins=[xbins, ybins])

Manually controlling OpenMP acceleration with context managers:

>>> with pygram11.omp_disabled():  # disable all thresholds.
...     result, _ = pygram11.histogram(x, bins=10, range=(-3, 3))
...
>>> with pygram11.omp_forced(key="thresholds.var1d"):  # force a single threshold.
...     result, _ = pygram11.histogram(x, bins=[-3, -2, 0, 2, 3])
...

Histogramming multiple weight variations for the same data, then putting the result in a DataFrame (the input pandas DataFrame will be interpreted as a NumPy array):

>> data = rng.standard_normal(N) >>> count, err = pygram11.histogram(data, bins=20, range=(-3, 3), weights=weights, flow=True) >>> count_df = pd.DataFrame(count, columns=weights.columns) >>> err_df = pd.DataFrame(err, columns=weights.columns)">
>>> N = 10000
>>> weights = pd.DataFrame({"weight_a": np.abs(rng.standard_normal(N)),
...                         "weight_b": rng.uniform(0.5, 0.8, N),
...                         "weight_c": rng.uniform(0.0, 1.0, N)})
>>> data = rng.standard_normal(N)
>>> count, err = pygram11.histogram(data, bins=20, range=(-3, 3), weights=weights, flow=True)
>>> count_df = pd.DataFrame(count, columns=weights.columns)
>>> err_df = pd.DataFrame(err, columns=weights.columns)

I also wrote a blog post with some simple examples.

Other Libraries

  • boost-histogram provides Pythonic object oriented histograms.
  • Simple and fast histogramming in Python using the NumPy C API: fast-histogram (no variance or overflow support).
  • To calculate histograms in Python on a GPU, see cupy.histogram.

If there is something you'd like to see in pygram11, please open an issue or pull request.

Comments
  • Where is pygram11 in performance compared to fast-histogram?

    Where is pygram11 in performance compared to fast-histogram?

    Basically the title.

    I want to go for raw speed only and don't care for much more than being able to specify bin-edges and getting back a numpy-array from the function that calculates the histogram.

    To give a little more context: It could be, that i will have to calculate a histogram of 1'000'000 bins, where each bin is 1000 integers wide. I would have to fill 1000 values in those bins. Can you tell me if this is possible with pygram11?

    opened by tim-hilt 7
  • Config adjustments

    Config adjustments

    • Rewrite configuration; now controllable by a module, context managers, and decorator
    • modularize things a bit (separate numpy API from pygram11 core API)
    opened by douglasdavis 0
  • WIP: Large overhaul of backend, reorganize internal (non-public) API

    WIP: Large overhaul of backend, reorganize internal (non-public) API

    • [x] OpenMP required to build (not a difficult dependency to satisfy) - toggled via if statements on the input data size.
      • [x] omp function argument deprecated
    • [x] Backend C++ code for 1D histograms rewritten (use Python and NumPy C APIs directory, no pybind11)
    • [x] Rewrite multiweight histograms (still via pybind11) to support more types (avoid conversions)
    • [x] Python code moved to the top level module (__init__.py) (without changing public API)
    • [x] Documentation updates
    opened by douglasdavis 0
  • Improve OpenMP inspection; docs improvements; pybind11 update

    Improve OpenMP inspection; docs improvements; pybind11 update

    • Improve API for accessing information about OpenMP availability and properties
    • Docs improvements
    • Update to latest pybind11 release (2.4.3)
    • Clean up some old scripts
    opened by douglasdavis 0
  • Development for v0.5

    Development for v0.5

    • [x] implement C++ back end for multi weight histogramming
    • [x] redesign one-dimensional histogramming backend, simplify code by using more of the pybind11 array API (front end API uinchanged)
    opened by douglasdavis 0
  • Allow over/under flow for 1D histograms

    Allow over/under flow for 1D histograms

    • extend the number of bins in memory by 2
    • fill bin 0 for data less than xmin
    • fill bin nbins + 1 for data larger than than xmax
    • first real bin is now index 1
    • last real bin is now index nbins
    • add named arguments to python functions which dictate how to slice result so we return the desired bin counts.
    opened by douglasdavis 0
Releases(0.13.2)
  • 0.13.2(Dec 5, 2022)

  • 0.13.1(Aug 30, 2022)

  • 0.13.0(Aug 24, 2022)

  • 0.12.2(May 25, 2022)

  • 0.12.1(Oct 19, 2021)

  • 0.12.0(Apr 16, 2021)

    • OpenMP configuration redesigned.
      • constants have been removed in favor of new pygram11.config module.
      • Decorators and context managers added to control toggling OpenMP thresholds
        • pygram11.without_omp decorator
        • pygram11.with_omp decorator
        • pygram11.disable_omp context manager
        • pygram11.force_omp context manager
      • See the documentation for more
    • New keyword argument added to histogramming functions (cons_var) for returning variance instead of standard error.
    Source code(tar.gz)
    Source code(zip)
  • 0.11.2(Feb 27, 2021)

    Renamed internal Python files hist.py and misc.py to _hist.py and _misc.py, respectively.

    The contents of these modules are brought in to the main pygram11 module namespace by imports in __init__.py (the submodules themselves are not meant to be part of the public API). This avoids tab completions of pygram11.hi<tab> yielding pygram11.hist when we actually want pygram11.histogram.

    Source code(tar.gz)
    Source code(zip)
  • 0.11.1(Feb 25, 2021)

    Two convenience functions added to the pygram11 namespace:

    • bin_centers: returns an array representing the the center of histogram bins given a total number of bins and an axis range or given existing bin edges.
    • bin_edges: returns an array representing bin edges given a total number of bins and an axis range.
    Source code(tar.gz)
    Source code(zip)
  • 0.11.0(Feb 11, 2021)

    • API change: functions calls with weights=None now return None as the second return. Previously the uncertainty was returned (which is just the square-root of the bin heights); now users can take the square-root themselves, and the back-end does not waste cycles tracking the uncertainty since it's trivial for unweighted histograms.
    • More types are supported without conversions (previously np.float64 and np.float32 were the only supported array types, and we converted non-floating point input). Now signed and unsigned integer inputs (both 64 and 32 bit) are supported.
      • If unsupported array types are used TypeError is now raised. This library prioritizes performance; hidden performance hits are explicitly avoided.
    • Configurable thresholds have been introduced to configure when OpenMP acceleration is used (described in the documentation).
    • The back-end was refactored with some help from boost::mp11 to aid in adding more type support without introducing more boilerplate. We now vendor boost::mp11 as a submodule.
    • Bumped the vendored pybind11 submodule to v2.6.2.
    • C++14 now required to build from source.
    • Added Apple Silicon support for Python 3.9 with libomp from Homebrew installed at /opt/homebrew.
    • Documentation improvements
    • Renamed master branch to main.
    Source code(tar.gz)
    Source code(zip)
  • 0.11.0rc1(Feb 10, 2021)

  • 0.10.3(Oct 9, 2020)

  • 0.10.2(Oct 9, 2020)

    • Vendor pybind11 2.6 series.
    • Some changes to backend types (std::size_t -> pybind11::ssize_t for sizes) (https://github.com/pybind/pybind11/pull/2293)
    • Improvements to CI/CD: using cibuildwheel to build wheels.
    • Python 3.9 is now tested and compatible wheels are built.
    Source code(tar.gz)
    Source code(zip)
  • 0.10.2rc1(Oct 9, 2020)

  • 0.10.1(Sep 5, 2020)

  • 0.10.0(Jun 30, 2020)

    Renamed internal Python module from histogram to hist. This avoids a clash with the module function of the same name. Some IDE features were confused.

    Source code(tar.gz)
    Source code(zip)
  • 0.9.1(May 26, 2020)

  • 0.9.0(May 26, 2020)

  • 0.8.2(May 18, 2020)

  • 0.8.1(Apr 26, 2020)

  • 0.8.0(Feb 25, 2020)

    Public API changes:

    • Remove previously deprecated omp function argument

    Backend changed:

    • moved 1D backend back to pybind11 (template flexibility)
    • parallelization cutoffs consistently at array size of 5000 for all calculations
    • all non-floating point inputs are converted

    Other:

    • Improved tests
    Source code(tar.gz)
    Source code(zip)
  • 0.7.3(Feb 11, 2020)

  • 0.7.2(Feb 6, 2020)

  • 0.7.1(Feb 6, 2020)

  • 0.7.0(Feb 5, 2020)

    • OpenMP is now required to build from source (not a difficult dependency to satisfy)
      • omp function argument deprecated
      • omp_max_threads renamed to omp_get_max_threads to mirror OpenMP C API
      • omp_available function removed
    • Backend C++ code for 1D histograms rewritten (use Python and NumPy C APIs directly, no pybind11), more types supported (avoid conversions)
    • Rewrite multiweight histograms (still via pybind11) to support more types (avoid conversions)
    • Python code moved to the top level module (__init__.py) (without changing public API)
    Source code(tar.gz)
    Source code(zip)
  • 0.6.1(Oct 20, 2019)

    • OpenMP inspection improved; new functions replace pygram11.OPENMP:
      • pygram11.omp_available() -> bool checks for availability
      • pygram11.omp_max_threads() -> int checks for the maximum available threads
    • some documentation improvements
    • bump pybind11 version to 2.4.3
    • pygram11.OPENMP will be removed in a future release
    Source code(tar.gz)
    Source code(zip)
  • 0.6.0(Oct 17, 2019)

  • 0.5.2(Aug 20, 2019)

  • 0.5.1(Aug 10, 2019)

    • Fix setup.py such that setuptools doesn't try to install numpy via easy_install (remove unnecessary setup_requires argument).
    • Add _max_threads attribute to pygram11._core.
    • Fixed macOS wheels (use delocate to ensure OpenMP symbols are bundled).
    Source code(tar.gz)
    Source code(zip)
  • 0.5.0(Jul 29, 2019)

    New features

    • histogramming multiple weight variations in a single function call is now possible via fix1dmw and var1dmw
    • In the NumPy like API, passing a 2 dimensional array to the weights argument will use this feature as well.
    • Binary wheels for Linux and macOS are provided on PyPI (conda-forge binaries are of course still available as well)

    Other updates

    • All functions now return the sqrt(sum-of-weights-squared) instead of sum-of-weights; before v0.5.x the 2D functions returned the sum of weights squared.
    Source code(tar.gz)
    Source code(zip)
  • 0.5.0.a2(Jul 29, 2019)

Owner
Doug Davis
OSS Engineer @anaconda
Doug Davis
Bokeh Plotting Backend for Pandas and GeoPandas

Pandas-Bokeh provides a Bokeh plotting backend for Pandas, GeoPandas and Pyspark DataFrames, similar to the already existing Visualization feature of

Patrik Hlobil 822 Jan 07, 2023
A gui application to visualize various sorting algorithms using pure python.

Sorting Algorithm Visualizer A gui application to visualize various sorting algorithms using pure python. Language : Python 3 Libraries required Tkint

Rajarshi Banerjee 19 Nov 30, 2022
MPL Plotter is a Matplotlib based Python plotting library built with the goal of delivering publication-quality plots concisely.

MPL Plotter is a Matplotlib based Python plotting library built with the goal of delivering publication-quality plots concisely.

Antonio López Rivera 162 Nov 11, 2022
Tandem Mass Spectrum Prediction with Graph Transformers

MassFormer This is the original implementation of MassFormer, a graph transformer for small molecule MS/MS prediction. Check out the preprint on arxiv

Röst Lab 13 Oct 27, 2022
A small collection of tools made by me, that you can use to visualize atomic orbitals in both 2D and 3D in different aspects.

Orbitals in Python A small collection of tools made by me, that you can use to visualize atomic orbitals in both 2D and 3D in different aspects, and o

Prakrisht Dahiya 1 Nov 25, 2021
A simple, fast, extensible python library for data validation.

Validr A simple, fast, extensible python library for data validation. Simple and readable schema 10X faster than jsonschema, 40X faster than schematic

kk 209 Sep 19, 2022
An open-source tool for visual and modular block programing in python

PyFlow PyFlow is an open-source tool for modular visual programing in python ! Although for now the tool is in Beta and features are coming in bit by

1.1k Jan 06, 2023
Scientific Visualization: Python + Matplotlib

An open access book on scientific visualization using python and matplotlib

Nicolas P. Rougier 8.6k Dec 31, 2022
Dipto Chakrabarty 7 Sep 06, 2022
simple tool to paint axis x and y

simple tool to paint axis x and y

G705 1 Oct 21, 2021
A dashboard built using Plotly-Dash for interactive visualization of Dex-connected individuals across the country.

Dashboard For The DexConnect Platform of Dexterity Global Working prototype submission for internship at Dexterity Global Group. Dashboard for real ti

Yashasvi Misra 2 Jun 15, 2021
Python toolkit for defining+simulating+visualizing+analyzing attractors, dynamical systems, iterated function systems, roulette curves, and more

Attractors A small module that provides functions and classes for very efficient simulation and rendering of iterated function systems; dynamical syst

1 Aug 04, 2021
Multi-class confusion matrix library in Python

Table of contents Overview Installation Usage Document Try PyCM in Your Browser Issues & Bug Reports Todo Outputs Dependencies Contribution References

Sepand Haghighi 1.3k Dec 31, 2022
PyPassword is a simple follow up to PyPassphrase

PyPassword PyPassword is a simple follow up to PyPassphrase. After finishing that project it occured to me that while some may wish to use that option

Scotty 2 Jan 22, 2022
This Crash Course will cover all you need to know to start using Plotly in your projects.

Plotly Crash Course This course was designed to help you get started using Plotly. If you ever felt like your data visualization skills could use an u

Fábio Neves 2 Aug 21, 2022
Plot-configurations for scientific publications, purely based on matplotlib

TUEplots Plot-configurations for scientific publications, purely based on matplotlib. Usage Please have a look at the examples in the example/ directo

Nicholas Krämer 487 Jan 08, 2023
GitHub English Top Charts

Help you discover excellent English projects and get rid of the interference of other spoken language.

kon9chunkit 529 Jan 02, 2023
trade bot connected to binance API/ websocket.,, include dashboard in plotly dash to visualize trades and balances

Crypto trade bot 1. What it is Trading bot connected to Binance API. This project made for fun. So ... Do not use to trade live before you have backte

G 3 Oct 07, 2022
Some useful extensions for Matplotlib.

mplx Some useful extensions for Matplotlib. Contour plots for functions with discontinuities plt.contour mplx.contour(max_jump=1.0) Matplotlib has pro

Nico Schlömer 519 Dec 30, 2022
A blender import/export system for Defold

defold-blender-export A Blender export system for the Defold game engine. Setup Notes There are no exhaustive documents for this tool yet. Its just no

David Lannan 27 Dec 30, 2022