A framework for Quantification written in Python

Related tags

Deep LearningQuaPy
Overview

QuaPy

QuaPy is an open source framework for quantification (a.k.a. supervised prevalence estimation, or learning to quantify) written in Python.

QuaPy is based on the concept of "data sample", and provides implementations of the most important aspects of the quantification workflow, such as (baseline and advanced) quantification methods, quantification-oriented model selection mechanisms, evaluation measures, and evaluations protocols used for evaluating quantification methods. QuaPy also makes available commonly used datasets, and offers visualization tools for facilitating the analysis and interpretation of the experimental results.

Installation

pip install quapy

A quick example:

The following script fetches a dataset of tweets, trains, applies, and evaluates a quantifier based on the Adjusted Classify & Count quantification method, using, as the evaluation measure, the Mean Absolute Error (MAE) between the predicted and the true class prevalence values of the test set.

import quapy as qp
from sklearn.linear_model import LogisticRegression

dataset = qp.datasets.fetch_twitter('semeval16')

# create an "Adjusted Classify & Count" quantifier
model = qp.method.aggregative.ACC(LogisticRegression())
model.fit(dataset.training)

estim_prevalence = model.quantify(dataset.test.instances)
true_prevalence  = dataset.test.prevalence()

error = qp.error.mae(true_prevalence, estim_prevalence)

print(f'Mean Absolute Error (MAE)={error:.3f}')

Quantification is useful in scenarios characterized by prior probability shift. In other words, we would be little interested in estimating the class prevalence values of the test set if we could assume the IID assumption to hold, as this prevalence would be roughly equivalent to the class prevalence of the training set. For this reason, any quantification model should be tested across many samples, even ones characterized by class prevalence values different or very different from those found in the training set. QuaPy implements sampling procedures and evaluation protocols that automate this workflow. See the Wiki for detailed examples.

Features

  • Implementation of many popular quantification methods (Classify-&-Count and its variants, Expectation Maximization, quantification methods based on structured output learning, HDy, QuaNet, and quantification ensembles).
  • Versatile functionality for performing evaluation based on artificial sampling protocols.
  • Implementation of most commonly used evaluation metrics (e.g., AE, RAE, SE, KLD, NKLD, etc.).
  • Datasets frequently used in quantification (textual and numeric), including:
    • 32 UCI Machine Learning datasets.
    • 11 Twitter quantification-by-sentiment datasets.
    • 3 product reviews quantification-by-sentiment datasets.
  • Native support for binary and single-label multiclass quantification scenarios.
  • Model selection functionality that minimizes quantification-oriented loss functions.
  • Visualization tools for analysing the experimental results.

Requirements

  • scikit-learn, numpy, scipy
  • pytorch (for QuaNet)
  • svmperf patched for quantification (see below)
  • joblib
  • tqdm
  • pandas, xlrd
  • matplotlib

SVM-perf with quantification-oriented losses

In order to run experiments involving SVM(Q), SVM(KLD), SVM(NKLD), SVM(AE), or SVM(RAE), you have to first download the svmperf package, apply the patch svm-perf-quantification-ext.patch, and compile the sources. The script prepare_svmperf.sh does all the job. Simply run:

./prepare_svmperf.sh

The resulting directory svm_perf_quantification contains the patched version of svmperf with quantification-oriented losses.

The svm-perf-quantification-ext.patch is an extension of the patch made available by Esuli et al. 2015 that allows SVMperf to optimize for the Q measure as proposed by Barranquero et al. 2015 and for the KLD and NKLD measures as proposed by Esuli et al. 2015. This patch extends the above one by also allowing SVMperf to optimize for AE and RAE.

Wiki

Check out our Wiki, in which many examples are provided:

Comments
  • Couldn't train QuaNet on multiclass data

    Couldn't train QuaNet on multiclass data

    Hi, I am having trouble in training a QuaNet quantifier for multiclass (20) data. Everything works fine with where my dataset only has 2 classes. It looks like the ACC quantifier is not able to aggregate from more than 2 classes?

    The classifier is built and trained as with the code below

    classifier = LSTMnet(dataset.vocabulary_size, dataset.n_classes)
    learner = NeuralClassifierTrainer(classifier)
    learner.fit(*dataset.training.Xy)
    

    where it has all the default configurations

    {'embedding_size': 100, 'hidden_size': 256, 'repr_size': 100, 'lstm_class_nlayers': 1, 'drop_p': 0.5}

    Then I tried to train QuaNet with following code

    model = QuaNetTrainer(learner, qp.environ['SAMPLE_SIZE'])
    model.fit(dataset.training, fit_learner=False)
    

    and it showed that QuaNet is built as

    QuaNetModule( (lstm): LSTM(120, 64, batch_first=True, dropout=0.5, bidirectional=True) (dropout): Dropout(p=0.5, inplace=False) (ff_layers): ModuleList( (0): Linear(in_features=208, out_features=1024, bias=True) (1): Linear(in_features=1024, out_features=512, bias=True) ) (output): Linear(in_features=512, out_features=20, bias=True) )

    And then the error occured in model.fit().

    Attached is the error I get.

    Traceback (most recent call last): File "quanet-test.py", line 181, in model.fit(dataset.training, fit_learner=False) File "/home/vickys/.local/lib/python3.6/site-packages/quapy/method/neural.py", line 126, in fit self.epoch(train_data_embed, train_posteriors, self.tr_iter, epoch_i, early_stop, train=True) File "/home/vickys/.local/lib/python3.6/site-packages/quapy/method/neural.py", line 182, in epoch quant_estims = self.get_aggregative_estims(sample_posteriors) File "/home/vickys/.local/lib/python3.6/site-packages/quapy/method/neural.py", line 145, in get_aggregative_estims prevs_estim.extend(quantifier.aggregate(predictions)) File "/home/vickys/.local/lib/python3.6/site-packages/quapy/method/aggregative.py", line 238, in aggregate return ACC.solve_adjustment(self.Pte_cond_estim_, prevs_estim) File "/home/vickys/.local/lib/python3.6/site-packages/quapy/method/aggregative.py", line 246, in solve_adjustment adjusted_prevs = np.linalg.solve(A, B) File "<array_function internals>", line 6, in solve File "/usr/local/lib64/python3.6/site-packages/numpy/linalg/linalg.py", line 394, in solve r = gufunc(a, b, signature=signature, extobj=extobj) ValueError: solve1: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (m,m),(m)->(m) (size 2 is different from 20)

    Thank you!

    opened by vickysvicky 4
  • Parameter fit_learner in QuaNetTrainer (fit method)

    Parameter fit_learner in QuaNetTrainer (fit method)

    The parameter fit_leaner is not used in the function:

    def fit(self, data: LabelledCollection, fit_learner=True):

    and the learner is fitted every time:

    self.learner.fit(*classifier_data.Xy)

    opened by pglez82 1
  • Wiki correction

    Wiki correction

    In the last part of the Methods wiki page, where it says:

    from classification.neural import NeuralClassifierTrainer, CNNnet

    I think it should say:

    from quapy.classification.neural import NeuralClassifierTrainer, LSTMnet

    opened by pglez82 1
  • Error in LSTMnet

    Error in LSTMnet

    I think there is the function init_hidden:

    def init_hidden(self, set_size):
            opt = self.hyperparams
            var_hidden = torch.zeros(opt['lstm_nlayers'], set_size, opt['lstm_hidden_size'])
            var_cell = torch.zeros(opt['lstm_nlayers'], set_size, opt['lstm_hidden_size'])
            if next(self.lstm.parameters()).is_cuda:
                var_hidden, var_cell = var_hidden.cuda(), var_cell.cuda()
            return var_hidden, var_cell
    

    Where it says opt['lstm_hidden_size'] should be opt['hidden_size']

    opened by pglez82 1
  • EMQ can be instantiated with a transformation function

    EMQ can be instantiated with a transformation function

    This transformation function is applied to each intermediate estimate.

    Why should someone want to transform the prior between two iterations? A transformation of the prior is a heuristic, yet effective way of promoting desired properties of the solution. For instance,

    • small values could be enhanced if the data is extremely imbalanced
    • small values could be reduced if the user is looking for a sparse solution
    • neighboring values could be averaged if the user is looking for a smooth solution
    • the function could also leave the prior unaltered and just be used as a callback for logging the progress of the method

    I hope this feature is useful. Let me know what you think!

    opened by mirkobunse 0
  • fixing two problems with parameters: hidden_size and lstm_nlayers

    fixing two problems with parameters: hidden_size and lstm_nlayers

    I found another problem with a parameter. When using LSTMnet with QuaNet two parameters overlap (lstm_nlayers). I have renamed the one in the LSTMnet to lstm_class_nlayers.

    opened by pglez82 0
  • Using a different gpu than cuda:0

    Using a different gpu than cuda:0

    The code seems to be tied up to using only 'cuda', which by default uses the first gpu in the system ('cuda:0'). It would be handy to be able to tell the library in which cuda gpu you want to train (cuda:0, cuda:1, etc).

    opened by pglez82 0
Releases(0.1.6)
Owner
The Human Language Technologies group of ISTI-CNR
Computational Methods Course at UdeA. Forked and size reduced from:

Computational Methods for Physics & Astronomy Book version at: https://restrepo.github.io/ComputationalMethods by: Sebastian Bustamante 2014/2015 Dieg

Diego Restrepo 11 Sep 10, 2022
Online-compatible Unsupervised Non-resonant Anomaly Detection Repository

Online-compatible Unsupervised Non-resonant Anomaly Detection Repository Repository containing all scripts used in the studies of Online-compatible Un

0 Nov 09, 2021
Official code for our EMNLP2021 Outstanding Paper MindCraft: Theory of Mind Modeling for Situated Dialogue in Collaborative Tasks

MindCraft Authors: Cristian-Paul Bara*, Sky CH-Wang*, Joyce Chai This is the official code repository for the paper (arXiv link): Cristian-Paul Bara,

Situated Language and Embodied Dialogue (SLED) Research Group 14 Dec 29, 2022
OpenDILab RL Kubernetes Custom Resource and Operator Lib

DI Orchestrator DI Orchestrator is designed to manage DI (Decision Intelligence) jobs using Kubernetes Custom Resource and Operator. Prerequisites A w

OpenDILab 205 Dec 29, 2022
NAS-HPO-Bench-II is the first benchmark dataset for joint optimization of CNN and training HPs.

NAS-HPO-Bench-II API Overview NAS-HPO-Bench-II is the first benchmark dataset for joint optimization of CNN and training HPs. It helps a fair and low-

yoichi hirose 8 Nov 21, 2022
Multivariate Time Series Transformer, public version

Multivariate Time Series Transformer Framework This code corresponds to the paper: George Zerveas et al. A Transformer-based Framework for Multivariat

363 Jan 03, 2023
Normal Learning in Videos with Attention Prototype Network

Codes_APN Official codes of CVPR21 paper: Normal Learning in Videos with Attention Prototype Network (https://arxiv.org/abs/2108.11055) Overview of ou

11 Dec 13, 2022
The all new way to turn your boring vector meshes into the new fad in town; Voxels!

Voxelator The all new way to turn your boring vector meshes into the new fad in town; Voxels! Notes: I have not tested this on a rotated mesh. With fu

6 Feb 03, 2022
Notification Triggers for Python

Notipyer Notification triggers for Python Send async email notifications via Python. Get updates/crashlogs from your scripts with ease. Installation p

Chirag Jain 17 May 16, 2022
git《Beta R-CNN: Looking into Pedestrian Detection from Another Perspective》(NeurIPS 2020) GitHub:[fig3]

Beta R-CNN: Looking into Pedestrian Detection from Another Perspective This is the pytorch implementation of our paper "[Beta R-CNN: Looking into Pede

35 Sep 08, 2021
Second-order Attention Network for Single Image Super-resolution (CVPR-2019)

Second-order Attention Network for Single Image Super-resolution (CVPR-2019) "Second-order Attention Network for Single Image Super-resolution" is pub

516 Dec 28, 2022
A PyTorch based deep learning library for drug pair scoring.

Documentation | External Resources | Datasets | Examples ChemicalX is a deep learning library for drug-drug interaction, polypharmacy side effect and

AstraZeneca 597 Dec 30, 2022
Molecular Sets (MOSES): A benchmarking platform for molecular generation models

Molecular Sets (MOSES): A benchmarking platform for molecular generation models Deep generative models are rapidly becoming popular for the discovery

Neelesh C A 3 Oct 14, 2022
Official PyTorch implementation of our AAAI22 paper: TransMEF: A Transformer-Based Multi-Exposure Image Fusion Framework via Self-Supervised Multi-Task Learning. Code will be available soon.

Official-PyTorch-Implementation-of-TransMEF Official PyTorch implementation of our AAAI22 paper: TransMEF: A Transformer-Based Multi-Exposure Image Fu

117 Dec 27, 2022
Controlling the MicriSpotAI robot from scratch

Project-MicroSpot-AI Controlling the MicriSpotAI robot from scratch Colaborators Alexander Dennis Components from MicroSpot The MicriSpotAI has the fo

Dennis Núñez-Fernández 5 Oct 20, 2022
RODD: A Self-Supervised Approach for Robust Out-of-Distribution Detection

RODD Official Implementation of 2022 CVPRW Paper RODD: A Self-Supervised Approach for Robust Out-of-Distribution Detection Introduction: Recent studie

Umar Khalid 17 Oct 11, 2022
RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching

RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching This repository contains the source code for our paper: RAFT-Stereo: Multilevel

Princeton Vision & Learning Lab 328 Jan 09, 2023
Simulations for Turring patterns on an apically expanding domain. T

Turing patterns on expanding domain Simulations for Turring patterns on an apically expanding domain. The details about the models and numerical imple

Yue Liu 0 Aug 03, 2021
Recurrent Conditional Query Learning

Recurrent Conditional Query Learning (RCQL) This repository contains the Pytorch implementation of One Model Packs Thousands of Items with Recurrent C

Dongda 4 Nov 28, 2022
Prototypical python implementation of the trust-region algorithm presented in Sequential Linearization Method for Bound-Constrained Mathematical Programs with Complementarity Constraints by Larson, Leyffer, Kirches, and Manns.

Prototypical python implementation of the trust-region algorithm presented in Sequential Linearization Method for Bound-Constrained Mathematical Programs with Complementarity Constraints by Larson, L

3 Dec 02, 2022