Supervised domain-agnostic prediction framework for probabilistic modelling

Overview

skpro

PyPI version Build Status License

A supervised domain-agnostic framework that allows for probabilistic modelling, namely the prediction of probability distributions for individual data points.

The package offers a variety of features and specifically allows for

  • the implementation of probabilistic prediction strategies in the supervised contexts
  • comparison of frequentist and Bayesian prediction methods
  • strategy optimization through hyperparamter tuning and ensemble methods (e.g. bagging)
  • workflow automation

List of developers and contributors

Documentation

The full documentation is available here.

Installation

Installation is easy using Python's package manager

$ pip install skpro

Contributing & Citation

We welcome contributions to the skpro project. Please read our contribution guide.

If you use skpro in a scientific publication, we would appreciate citations.

Comments
  • Distributions as return objects

    Distributions as return objects

    Re-opening the sub-issue opened in #3 and commented upon by @murphyk

    Question: should skpro's predict methods return a vector of distribution objects? For example, using the distributions from scipy.stats which implement methods pdf, cdf, mean, var, etc.

    Pro:

    • this would be using an existing, consolidated, and well-supported interface
    • it might be easier to use
    • it might be easier to understand

    Contra:

    • mixture types are not supported
    • l2 norm is not supported (as would be needed for squared/Gneiting loss)
    • mixed distributions on the reals, especially empirical distributions (weighted sum of deltas) which are returned by Bayesian packages are not supported
    • vectors of distributions are not supported, alternatively Cartesian products of distributions
    • this is not the status quo
    help wanted 
    opened by fkiraly 11
  • documentation: np.mean(y_pred) does not work

    documentation: np.mean(y_pred) does not work

    I'm following along with this intro example.. However this line fails

    (numpy.mean(y_pred) * 2).shape
    

    Error below (seems to be because Distribution objects don't support the mean() function but instead insist on obscurely calling it point!)

    np.mean(y_pred)
    Traceback (most recent call last):
    
      File "<ipython-input-38-19819be87ab5>", line 1, in <module>
        np.mean(y_pred)
    
      File "/home/kpmurphy/anaconda3/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2920, in mean
        out=out, **kwargs)
    
      File "/home/kpmurphy/anaconda3/lib/python3.7/site-packages/numpy/core/_methods.py", line 75, in _mean
        ret = umr_sum(arr, axis, dtype, out, keepdims)
    
    TypeError: unsupported operand type(s) for +: 'Distribution' and 'Distribution'
    
    opened by murphyk 3
  • First example: 'utils' not found

    First example: 'utils' not found

    The first example in your documentation (DensityBaseline) does not run right on my machine: it throws a 'module not found' exception at the call to 'utils'.

    This might be a python version problem (I am using 3.6), so perhaps it's not an error in the normal sense - though I don't see any specification that the package required a particular python version. Apologies if I missed it: in any case, I fixed it by importing matplotlib instead: i.e.

    import matplotlib.pyplot as plt plt.scatter(y_test, y_pred)

    instead of:

    import utils utils.plot_performance(y_test, y_pred)

    opened by Thomas-M-H-Hope 2
  • problem in loading the skpro

    problem in loading the skpro

    It has been 2 days that I am trying to import skpro. But I can not I keep getting this error:

    cannot import name 'six' from 'sklearn.externals' (C:\Users\My Book\anaconda3\lib\site-packages\sklearn\externals_init_.py)

    opened by honestee 1
  • (wish)list of probabilistic regressors to implement or to interface

    (wish)list of probabilistic regressors to implement or to interface

    A wishlist for probabilistic regression methods to implement or interface. This is partly copied from the R counterpart https://github.com/mlr-org/mlr3proba/issues/32 . Number of stars at the end is estimated difficulty or time investment.

    GLM

    • [ ] generalized linear model(s) with regression link, e.g., Gaussian *
    • [ ] generalized linear model(s) with count link, e.g., Poisson *
    • [ ] heteroscedastic linear regression ***
    • [ ] Bayesian GLM where conjugate priors are available, e.g., GLM with Gaussian link ***

    KRR aka Gaussian process regression

    • [ ] vanilla kernel ridge regression with fixed kernel parameters and variance *
    • [ ] kernel ridge regression with MLE for kernel parameters and regularization parameter **
    • [ ] heteroscedastic KRR or Gaussian processes ***

    CDE

    • [ ] variants of conditional density estimation (Nadaraya-Watson type) **
    • [ ] reduction to density estimation by binning of input variables, then apply unconditional density estimation **

    Tree-based

    • [ ] probabilistic regression trees **

    Neural networks

    • [ ] interface tensorflow probability - some hard-coded NN architectures **
    • [ ] generic tensorflow probability interface - some hard-coded NN architectures ***

    Bayesian toolboxes

    • [ ] generic pymc3 interface ***
    • [ ] generic pyro interface ****
    • [ ] generic Stan interface ****
    • [ ] generic JAGS interface ****
    • [ ] generic BUGS interface ****
    • [ ] generic Bayesian interface - prior-valued hyperparameters *****

    Pipeline elements for target transformation

    • [ ] distr fixed target transformation **
    • [ ] distr predictive target calibration **

    Composite techniques, reduction to deterministic regression

    • [ ] stick mean, sd, from a deterministic regressor which already has these as return types into some location/scale distr family (Gaussian, Laplace) *
    • [ ] use model 1 for the mean, model 2 fit to residuals (squared, absolute, or log), put this in some location/scale distr family (Gaussian, Laplace) **
    • [ ] upper/lower thresholder for a regression prediction, to use as a pipeline element for a forced lower variance bound **
    • [ ] generic parameter prediction by elicitation, output being plugged into parameters of a distr object not necessarily scale/location ****
    • [ ] reduction via bootstrapped sampling of a determinstic regressor **

    Ensembling type pipeline elements and compositors

    • [ ] simple bagging, averaging of pdf/cdf **
    • [ ] probabilistic boosting ***
    • [ ] probabilistic stacking ***

    baselines

    • [ ] always predict a Gaussian with mean = training mean, var = training var *
    • [ ] IMPORTANT as featureless baseline: reduction to distr/density estimation to produce an unconditional probabilistic regressor **
    • [ ] IMPORTANT as deterministic style baseline: reduction to deterministic regression, mean = prediction by det.regressor, var = training sample var, distr type = Gaussian (or Laplace) **

    Other reduction from/to probabilistic regression

    • [ ] reducing deterministic regression to probabilistic regression - take mean, median or mode **
    • [ ] reduction(s) to quantile regression, use predictive quantiles to make a distr ***
    • [ ] reducing deterministic (quantile) regression to probabilistic regression - take quantile(s) **
    • [ ] reducing interval regression to probabilistic regression - take mean/sd, or take quantile(s) **
    • [ ] reduction to survival, as the sub-case of no censoring **
    • [ ] reduction to classification, by binning ***
    good first issue 
    opened by fkiraly 0
  • skpro-refactoring (version-2)

    skpro-refactoring (version-2)

    See below some comments/description of the coming refactoring contents :

    • Distribution classes refactoring in a more OOD way (see. skpro->distribution)
    • Losse functions (see. metrics->distribution)
    • Estimators (see. metrics->distribution)

    Some descriptive notebooks (in docs->notebooks) and a full set of unit test (in tests) are also available.

    opened by jesellier 24
Releases(v1.0.1-beta)
Owner
The Alan Turing Institute
The UK's national institute for data science and artificial intelligence.
The Alan Turing Institute
Python library for analysis of time series data including dimensionality reduction, clustering, and Markov model estimation

deeptime Releases: Installation via conda recommended. conda install -c conda-forge deeptime pip install deeptime Documentation: deeptime-ml.github.io

495 Dec 28, 2022
PICARD - Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models

This is the official implementation of the following paper: Torsten Scholak, Nathan Schucher, Dzmitry Bahdanau. PICARD - Parsing Incrementally for Con

ElementAI 217 Jan 01, 2023
EsViT: Efficient self-supervised Vision Transformers

Efficient Self-Supervised Vision Transformers (EsViT) PyTorch implementation for EsViT, built with two techniques: A multi-stage Transformer architect

Microsoft 352 Dec 25, 2022
Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data

Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data This is the official PyTorch implementation of the SeCo paper: @articl

ElementAI 101 Dec 12, 2022
No-reference Image Quality Assessment(NIQA) Algorithms (BRISQUE, NIQE, PIQE, RankIQA, MetaIQA)

No-Reference Image Quality Assessment Algorithms No-reference Image Quality Assessment(NIQA) is a task of evaluating an image without a reference imag

Dae-Young Song 26 Jan 04, 2023
OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)

OCTIS : Optimizing and Comparing Topic Models is Simple! OCTIS (Optimizing and Comparing Topic models Is Simple) aims at training, analyzing and compa

MIND 478 Jan 01, 2023
Implements VQGAN+CLIP for image and video generation, and style transfers, based on text and image prompts. Emphasis on ease-of-use, documentation, and smooth video creation.

VQGAN-CLIP-GENERATOR Overview This is a package (with available notebook) for running VQGAN+CLIP locally, with a focus on ease of use, good documentat

Ryan Hamilton 98 Dec 30, 2022
Expressive Body Capture: 3D Hands, Face, and Body from a Single Image

Expressive Body Capture: 3D Hands, Face, and Body from a Single Image [Project Page] [Paper] [Supp. Mat.] Table of Contents License Description Fittin

Vassilis Choutas 1.3k Jan 07, 2023
Commonality in Natural Images Rescues GANs: Pretraining GANs with Generic and Privacy-free Synthetic Data - Official PyTorch Implementation (CVPR 2022)

Commonality in Natural Images Rescues GANs: Pretraining GANs with Generic and Privacy-free Synthetic Data (CVPR 2022) Potentials of primitive shapes f

31 Sep 27, 2022
Optimus: the first large-scale pre-trained VAE language model

Optimus: the first pre-trained Big VAE language model This repository contains source code necessary to reproduce the results presented in the EMNLP 2

314 Dec 19, 2022
The audio-video synchronization of MKV Container Format is exploited to achieve data hiding

The audio-video synchronization of MKV Container Format is exploited to achieve data hiding, where the hidden data can be utilized for various management purposes, including hyper-linking, annotation

Maxim Zaika 1 Nov 17, 2021
The challenge for Quantum Coalition Hackathon 2021

Qchack 2021 Google Challenge This is a challenge for the brave 2021 qchack.io participants. Instructions Hello, intrepid qchacker, welcome to the G|o

quantumlib 18 May 04, 2022
Fedlearn支持前沿算法研发的Python工具库 | Fedlearn algorithm toolkit for researchers

FedLearn-algo Installation Development Environment Checklist python3 (3.6 or 3.7) is required. To configure and check the development environment is c

89 Nov 14, 2022
Official Implementation of Neural Splines

Neural Splines: Fitting 3D Surfaces with Inifinitely-Wide Neural Networks This repository contains the official implementation of the CVPR 2021 (Oral)

Francis Williams 56 Nov 29, 2022
Read and write layered TIFF ImageSourceData and ImageResources tags

Read and write layered TIFF ImageSourceData and ImageResources tags Psdtags is a Python library to read and write the Adobe Photoshop(r) specific Imag

Christoph Gohlke 4 Feb 05, 2022
Continuous Time LiDAR odometry

CT-ICP: Elastic SLAM for LiDAR sensors This repository implements the SLAM CT-ICP (see our article), a lightweight, precise and versatile pure LiDAR o

385 Dec 29, 2022
Powerful unsupervised domain adaptation method for dense retrieval.

Powerful unsupervised domain adaptation method for dense retrieval

Ubiquitous Knowledge Processing Lab 191 Dec 28, 2022
A system for quickly generating training data with weak supervision

Programmatically Build and Manage Training Data Announcement The Snorkel team is now focusing their efforts on Snorkel Flow, an end-to-end AI applicat

Snorkel Team 5.4k Jan 02, 2023
PyTorch implementation of ''Background Activation Suppression for Weakly Supervised Object Localization''.

Background Activation Suppression for Weakly Supervised Object Localization PyTorch implementation of ''Background Activation Suppression for Weakly S

35 Jan 06, 2023
Conditional Gradients For The Approximately Vanishing Ideal

Conditional Gradients For The Approximately Vanishing Ideal Code for the paper: Wirth, E., and Pokutta, S. (2022). Conditional Gradients for the Appro

IOL Lab @ ZIB 0 May 25, 2022