Python histogram library - histograms as updateable, fully semantic objects with visualization tools. [P]ython [HYST]ograms.

Overview

physt

P(i/y)thon h(i/y)stograms. Inspired by (and based on) numpy.histogram, but designed for humans(TM) on steroids(TM).

The goal is to unify the different concepts of histograms found in numpy, pandas, matplotlib, ROOT, etc. and to create one representation that is easy to manipulate from the data point of view while integrating nicely with IPython notebooks and offering various plotting options. In short, whatever you want to do with histograms, physt aims to be on your side.

Note: The bokeh plotting backend has been discontinued (because the external library was redesigned).

Join the chat at https://gitter.im/physt/Lobby

Versioning

  • Versions 0.3.x support Python 2.7 (no new releases in 2019)
  • Versions 0.4.x support Python 3.5+ while continuing the 0.3 API
  • Versions 0.4.9+ support only Python 3.6+ while continuing the 0.3 API
  • Versions 0.5.x slightly change the interpretation of *args in h1, h2, ...

Simple example

from physt import h1

# Create the sample
heights = [160, 155, 156, 198, 177, 168, 191, 183, 184, 179, 178, 172, 173, 175,
           172, 177, 176, 175, 174, 173, 174, 175, 177, 169, 168, 164, 175, 188,
           178, 174, 173, 181, 185, 166, 162, 163, 171, 165, 180, 189, 166, 163,
           172, 173, 174, 183, 184, 161, 162, 168, 169, 174, 176, 170, 169, 165]

hist = h1(heights, 10)           # <--- get the histogram data
hist << 190                      # <--- add a forgotten value
hist.plot()                      # <--- and plot it

Heights plot
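To make the idea of an updateable histogram more concrete, here is a minimal pure-Python sketch of what filling a histogram with fixed bin edges involves. `TinyHistogram` and its attributes are hypothetical names chosen for illustration; this is not physt's implementation, and the rule that the last bin includes its right edge is an assumption borrowed from numpy.histogram.

```python
import bisect


class TinyHistogram:
    """Hypothetical stand-in for an updateable 1D histogram (not physt's code)."""

    def __init__(self, edges):
        self.edges = list(edges)                       # sorted bin edges
        self.frequencies = [0] * (len(self.edges) - 1)
        self.underflow = 0                             # values below the first edge
        self.overflow = 0                              # values above the last edge

    def fill(self, value):
        if value < self.edges[0]:
            self.underflow += 1
        elif value > self.edges[-1]:
            self.overflow += 1
        else:
            # Bins are half-open [left, right); the last bin also includes its right edge.
            index = bisect.bisect_right(self.edges, value) - 1
            index = min(index, len(self.frequencies) - 1)
            self.frequencies[index] += 1

    def __lshift__(self, value):
        # Mimics the `hist << value` shorthand used in the example above.
        self.fill(value)
        return self


hist = TinyHistogram([150, 160, 170, 180, 190, 200])
for height in [160, 155, 198, 177]:
    hist.fill(height)
hist << 190   # add a forgotten value
hist << 210   # outside the edges, so it is counted as overflow
```

Because the edges stay fixed, adding a value only touches one counter, which is what makes incremental filling cheap.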

2D example

from physt import h2
import seaborn as sns

iris = sns.load_dataset('iris')
iris_hist = h2(iris["sepal_length"], iris["sepal_width"], "human", bin_count=[12, 7], name="Iris")
iris_hist.plot(show_zero=False, cmap="gray_r", show_values=True);

Iris 2D plot
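The "human" argument in the call above presumably selects the human-friendly binning listed under Features. As a rough illustration of the general idea only (an assumption, not physt's actual algorithm), a binning can be made easier to read by restricting bin widths to 1, 2 or 5 times a power of ten and aligning the edges to multiples of that width. The helper names `nice_bin_width` and `nice_edges` are hypothetical.

```python
import math


def nice_bin_width(data_min, data_max, target_bins=10):
    """Pick a width of the form 1, 2 or 5 times a power of ten.

    Assumes data_max > data_min.
    """
    raw = (data_max - data_min) / target_bins
    power = 10 ** math.floor(math.log10(raw))
    candidates = [m * power for m in (1, 2, 5, 10)]
    return min(candidates, key=lambda width: abs(width - raw))


def nice_edges(data_min, data_max, target_bins=10):
    """Return edges aligned to multiples of the chosen width that cover the data."""
    width = nice_bin_width(data_min, data_max, target_bins)
    first = math.floor(data_min / width) * width
    count = math.ceil((data_max - first) / width)
    return [first + i * width for i in range(count + 1)]


# Range of the heights sample from the simple example (155 to 198).
edges = nice_edges(155, 198, target_bins=10)
```

In this sketch the requested bin count is only a target: the heights range ends up with nine bins of width 5 rather than ten bins of width 4.3.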

3D directional example

import numpy as np
from physt import special_histograms

# Generate some sample data
data = np.empty((1000, 3))
data[:,0] = np.random.normal(0, 1, 1000)
data[:,1] = np.random.normal(0, 1.3, 1000)
data[:,2] = np.random.normal(1, .6, 1000)

# Get histogram data (in spherical coordinates)
h = special_histograms.spherical(data)                 

# And plot its projection on a globe
h.projection("theta", "phi").plot.globe_map(density=True, figsize=(7, 7), cmap="rainbow")   

Directional 3D plot

See more in the docstrings and notebooks.

Installation

Using pip:

pip install physt

Features

Implemented

  • 1D histograms
  • 2D histograms
  • ND histograms
  • Some special histograms
    • 2D polar coordinates (with plotting)
    • 3D spherical / cylindrical coordinates (beta)
  • Adaptive rebinning for on-line filling of unknown data (beta)
  • Non-consecutive bins
  • Memory-effective histogramming of dask arrays (beta)
  • Understands any numpy-array-like object
  • Keep underflow / overflow / missed bins
  • Basic numeric operations (* / + -)
  • Items / slice selection (including mask arrays)
  • Add new values (fill, fill_n)
  • Cumulative values, densities
  • Simple statistics for original data (mean, std, sem)
  • Plotting with several backends
    • matplotlib (static plots with many options)
    • vega (interactive plots, beta, help wanted!)
    • folium (experimental for geo-data)
    • plotly (very basic, help wanted!)
    • ascii (experimental)
  • Algorithms for optimized binning
    • human-friendly
    • mathematical
  • IO, conversions
    • JSON (input and output)
    • xarray.Dataset (input and output, experimental)
    • ROOT file (output only, experimental)
    • pandas.DataFrame (output only, basic)
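
Several of the features listed above (densities, cumulative values, arithmetic) reduce to simple per-bin operations on the frequencies and bin edges. The sketch below uses plain lists and the common textbook definition of a normalized density (count divided by the total and by the bin width); physt's own properties may be defined differently, so treat this as an illustration of the concepts rather than of the library's API.

```python
from itertools import accumulate

edges = [0, 1, 2, 4, 8]        # unequal bin widths on purpose
frequencies = [1, 3, 4, 2]     # one count per bin

widths = [right - left for left, right in zip(edges[:-1], edges[1:])]
total = sum(frequencies)

# Normalized density: share of all entries per unit of bin width.
densities = [f / (total * w) for f, w in zip(frequencies, widths)]

# Cumulative counts: running sum over the bins.
cumulative = list(accumulate(frequencies))

# Arithmetic acts bin by bin when two histograms share the same edges.
other = [0, 1, 1, 5]
summed = [a + b for a, b in zip(frequencies, other)]
```

With unequal bin widths the density, not the raw count, is the quantity whose area sums to one.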

Planned

  • Rebinning
    • using reference to original data?
    • merging bins
  • Statistics (based on original data)?
  • Stacked histograms (with names)
  • Potentially holoviews plotting backend (instead of the discontinued bokeh one)

Not planned

  • Kernel density estimates - use your favourite statistics package (like seaborn)
  • Rebinning using interpolation - it should be trivial to use rebin (https://github.com/jhykes/rebin) with physt

Rationale (for both): physt is dumb, but precise.

Dependencies

  • Python 3.6+ (for versions 0.4.9 and later, see Versioning)
  • numpy
  • (optional) matplotlib - simple output
  • (optional) xarray - I/O
  • (optional) protobuf - I/O
  • (optional) uproot - I/O
  • (optional) astropy - additional binning algorithms
  • (optional) folium - map plotting
  • (optional) vega3 - for vega in-line in IPython notebook (note that to generate vega JSON, this is not necessary)
  • (optional) asciiplotlib - for ASCII bar plots
  • (optional) xtermcolor - for ASCII color maps
  • (testing) py.test, pandas
  • (docs) sphinx, sphinx_rtd_theme, ipython

Publicity

Talk at PyData Berlin 2018:

Contribution

I am looking for anyone interested in using / developing physt. You can contribute by reporting errors, implementing missing features and suggesting new ones.

Thanks to:

Patches:

Alternatives and inspirations

Comments
  • python 2.7 plotting is not working

    When running the plot() function, I get the error below even though matplotlib is installed. Also, the algorithm is pretty slow when running on anything bigger than a toy example.

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/physt/plotting/__init__.py", line 137, in __call__
        return plot(self.histogram, kind=kind, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/physt/plotting/__init__.py", line 91, in plot
        backend_name, backend = _get_backend(backend)
      File "/usr/local/lib/python2.7/dist-packages/physt/plotting/__init__.py", line 70, in _get_backend
        raise RuntimeError("No plotting backend available. Please, install matplotlib (preferred) or bokeh (limited).")
    RuntimeError: No plotting backend available. Please, install matplotlib (preferred) or bokeh (limited).
    
    bug 
    opened by romange 13
  • Smooth polar histograms?

    Thanks for writing this awesome library!

    I have a question regarding smoothing of polar 2D histograms. I am constructing a histogram as described on this page https://physt.readthedocs.io/en/latest/special_histograms.html#Polar-histogram and now I want to smooth it with a Gaussian kernel (like scipy.ndimage.gaussian_filter). What is the most elegant / correct method to do that?

    question 
    opened by horsto 7
  • Rebinning histograms related project

    Hi, I found a project on rebinning histograms at https://github.com/jhykes/rebin and I opened an issue (jhykes/rebin#5) on that project page asking about integrating its code into this project. I hope you will appreciate it.

    enhancement idea? 
    opened by DancingQuanta 7
  • Option to center labels on bins

    If you have a large dataset with a small number of values (such as consisting only of integers 1-10) then it would be nice to have the bin x-axis labels at the center under the respective bin instead of at the bin edges.

    I recognise this case is more of a 'histogram as bar plot' kind of thing, but it is a use-case I have often.

    opened by nzjrs 5
  • Usage of spherical histogram

    Hi, I have tried the example of spherical histogram. After a small modification of the code (normalized the data as unit vectors),

    n = 100
    data = np.empty((n, 3))
    data[:,0] = np.random.normal(0, 1, n)
    data[:,1] = np.random.normal(0, 1, n)
    data[:,2] = np.random.normal(0, 1, n)
    for i in range(n):
        scale = np.sqrt(data[i,0]**2 + data[i,1]**2 + data[i,2]**2)
        data[i,0] = data[i,0]/scale
        data[i,1] = data[i,1]/scale
        data[i,2] = data[i,2]/scale

    h = special.spherical_histogram(data, theta_bins=20, phi_bins=20)
    ax.scatter(data[:,0], data[:,1], data[:,2])

    globe = h.projection("theta", "phi")
    globe.plot.globe_map(density=True, figsize=(7, 7), cmap="rainbow")

    plt.show()

    I got an error: “RuntimeError: Bins not in rising order.” What did I do wrong? Thank you for your support.

    question 
    opened by zhengpuchen 3
  • approximate histograms

    I'm following the paper (http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf) implemented by https://github.com/carsonfarmer/streamhist, and the notion of approximate histograms seems elegant and efficient.

    After seeing the internals of streamhist (trying to fix bugs) and reading the paper, I can imagine ways to make a better implementation: e.g. much more efficient discovery of bins to be joined, and avoiding temporary lists when possible. Also, the code seems overly complex, partially due to features like "bin freezing" which try to work around poor bin joining performance.

    Anyway since streamhist is defunct, I'm thinking about trying an implementation. I wonder if this kind of histogram would fit into physt (and if sortedcollections would be reasonable as a dependency).
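
For readers unfamiliar with the approach, the core update step of such an approximate (streaming) histogram can be sketched in a few lines: every new value becomes its own bin, and when the bin budget is exceeded, the two bins with the closest centroids are merged into their weighted mean. This is a deliberately naive sketch of that idea, written from the description in the paper; it is not the code of streamhist or physt, the class name is hypothetical, and the linear scan for the closest pair is exactly the kind of inefficiency mentioned above.

```python
import bisect


class StreamingHistogram:
    """Naive sketch of a fixed-size approximate histogram (illustrative only)."""

    def __init__(self, max_bins):
        self.max_bins = max_bins
        self.bins = []   # sorted list of [centroid, count]

    def update(self, value):
        centroids = [centroid for centroid, _ in self.bins]
        i = bisect.bisect_left(centroids, value)
        if i < len(self.bins) and self.bins[i][0] == value:
            self.bins[i][1] += 1             # exact hit: just count it
        else:
            self.bins.insert(i, [value, 1])  # otherwise start a new bin
        if len(self.bins) > self.max_bins:
            self._merge_closest()

    def _merge_closest(self):
        # Linear scan for the pair of neighbours with the smallest gap.
        gaps = [self.bins[j + 1][0] - self.bins[j][0]
                for j in range(len(self.bins) - 1)]
        j = gaps.index(min(gaps))
        (c1, n1), (c2, n2) = self.bins[j], self.bins[j + 1]
        merged = [(c1 * n1 + c2 * n2) / (n1 + n2), n1 + n2]
        self.bins[j:j + 2] = [merged]


hist = StreamingHistogram(max_bins=3)
for value in [1, 2, 10, 11, 20]:
    hist.update(value)
```

The total count is preserved by every merge, while the bin positions adapt to the data seen so far.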

    opened by belm0 3
  • please make this library discoverable

    name: physt (?)
    github tag line: P(i/y)thon h(i/y)stograms (???)

    google search for "python streaming histogram"

    • top result is https://github.com/carsonfarmer/streamhist (unused / unmaintained)
    • physt not in initial 10 pages of results...

    For over a year I've wanted to find a Python library which supports efficient histogram updates without a bunch of ugly dependencies. I've searched many times. Today I happened to get lucky by seeing physt mentioned at the bottom of a SO question (https://stackoverflow.com/questions/40627274/).

    To improve discoverability by search, please consider updating the github tag line to concisely and accurately describe the library (... rather than be cute).

    opened by belm0 2
  • Warning in current numpy

    If you try to merge bins:

    from physt import h2
    from scipy.stats import multivariate_normal
    hist = h2(*multivariate_normal.rvs((0,0), size=100_000).T, bins=100)
    hist.merge_bins(2)
    

    You get a warning from numpy:

    /home/schreihf/.local/lib/python3.7/site-packages/physt/histogram_base.py:572: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
      new_frequencies[new_index] += old_frequencies[old_index]
    /home/schreihf/.local/lib/python3.7/site-packages/physt/histogram_base.py:573: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
      new_errors2[new_index] += old_errors2[old_index]
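
The change the warning asks for is small. A minimal sketch, assuming the index is assembled as a list with one slice per axis (the variable names are illustrative and not taken from physt's source):

```python
import numpy as np

frequencies = np.arange(12).reshape(3, 4)

# An index assembled dynamically, one entry per axis.
index = [slice(0, 2), slice(1, 3)]

# Converting the list to a tuple makes NumPy treat each entry as a per-axis
# index, which is the form the warning recommends.
block = frequencies[tuple(index)]

# The same pattern works on the left-hand side of an in-place update.
merged = np.zeros((3, 4), dtype=int)
merged[tuple(index)] += block
```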
    
    opened by henryiii 2
  • Add 2D & ND histograms

    • [x] Analogous data model to Histogram1D
    • [x] refactor HistogramBase class -> common behaviour of 1D and 2D
    • [x] revisit binning schemas
    • [x] histogram2D facade function to be compatible with numpy one
    • [x] plotting
    • [x] arithmetic operations
    • [x] documentation
    • [ ] stats
    enhancement 
    opened by janpipek 2
  • ImportError with newer plotly

    [SOMEDIR}\physt\physt\plotting\plotly.py in <module>
         12 
         13 import plotly.offline as pyo
    ---> 14 import plotly.plotly as pyp
         15 import plotly.graph_objs as go
         16 
    
    ~\Miniconda3\lib\site-packages\plotly\plotly\__init__.py in <module>
          2 from _plotly_future_ import _chart_studio_error
          3 
    ----> 4 _chart_studio_error("plotly")
    
    ~\Miniconda3\lib\site-packages\_plotly_future_\__init__.py in _chart_studio_error(submodule)
         41 
         42 def _chart_studio_error(submodule):
    ---> 43     raise ImportError(
         44         """
         45 The plotly.{submodule} module is deprecated,
    
    ImportError: 
    The plotly.plotly module is deprecated,
    please install the chart-studio package and use the
    chart_studio.plotly module instead. 
    
    bug visualization 
    opened by janpipek 1
  • Wrong bars center in polar_map

    I have found that the bars in polar_map are centered on the left edge of the phi bins instead of their center. Because of this, the representation of the histogram does not coincide with the data, as shown in the attached figure (polarmap_wrong).

    I think this can be easily solved by replacing

    bars = ax.bar(phipos[i], dr[i], width=dphi[i], bottom=rpos[i], color=bin_color,

    with

    bars = ax.bar(phipos[i] + 0.5*dphi[i], dr[i], width=dphi[i], bottom=rpos[i], color=bin_color,

    in the definition of polar_map.
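
The suggested shift follows from how matplotlib places bars: `Axes.bar` centers each bar on its x position by default (`align="center"`), so passing left bin edges moves every bar half a bin width to the left. A small sketch of the arithmetic, with made-up phi edges:

```python
import math

# Four equal phi bins covering the full circle (illustrative values).
phi_edges = [i * math.pi / 2 for i in range(5)]

left_edges = phi_edges[:-1]
widths = [right - left for left, right in zip(phi_edges[:-1], phi_edges[1:])]

# Positions to pass to a center-aligned bar call.
centers = [left + 0.5 * width for left, width in zip(left_edges, widths)]
```

An alternative that keeps the left edges as positions is to pass `align="edge"` to `ax.bar`.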

    By the way, thank you for this amazing package!

    bug visualization 
    opened by ruhugu 1
  • Be more explicit about bins too narrow for float representation

    If the computed range for the binning divided by the number of bins is lower than the minimum float difference at the scale, we receive an error [ValueError: Bins not in rising order.] which is not very informative.

    To reproduce:

    import numpy as np
    import physt

    data = [1, np.nextafter(1, 2)]
    physt.h1(data)
    

    It also happens when the range is 0, like in:

    data = [1, 1]
    physt.h1(data)
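
Both cases come down to bin edges that cannot be told apart in floating point. The following sketch shows the effect with plain Python floats, together with a hypothetical pre-check that raises a more descriptive message. `equal_width_edges` and `check_edges` are illustrative helpers, not part of physt, and `math.nextafter` (Python 3.9+) stands in for `np.nextafter`.

```python
import math


def equal_width_edges(low, high, bin_count):
    """Naive equal-width edges, computed the straightforward way."""
    width = (high - low) / bin_count
    return [low + i * width for i in range(bin_count + 1)]


def check_edges(edges):
    """Hypothetical pre-check that explains why the edges are unusable."""
    for left, right in zip(edges[:-1], edges[1:]):
        if not right > left:
            raise ValueError(
                f"Bin [{left!r}, {right!r}] has zero width: the requested range "
                "is too narrow to be split into distinct floating-point edges."
            )


# The two closest floats at 1.0 leave no room for ten distinct edges.
narrow = equal_width_edges(1.0, math.nextafter(1.0, 2.0), 10)
try:
    check_edges(narrow)
    message = None
except ValueError as error:
    message = str(error)
```

A zero-width range such as `[1, 1]` trips the same check, since all computed edges coincide.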
    
    enhancement 
    opened by janpipek 1
Releases: v0.5.2
Owner: Jan Pipek (PyData Prague)