A framework for Quantification written in Python

Last update: Dec 14, 2022

Related tags

Overview

QuaPy

QuaPy is an open source framework for quantification (a.k.a. supervised prevalence estimation, or learning to quantify) written in Python.

QuaPy is based on the concept of "data sample", and provides implementations of the most important aspects of the quantification workflow, such as (baseline and advanced) quantification methods, quantification-oriented model selection mechanisms, evaluation measures, and evaluations protocols used for evaluating quantification methods. QuaPy also makes available commonly used datasets, and offers visualization tools for facilitating the analysis and interpretation of the experimental results.

Installation

pip install quapy

A quick example:

The following script fetches a dataset of tweets, trains, applies, and evaluates a quantifier based on the Adjusted Classify & Count quantification method, using, as the evaluation measure, the Mean Absolute Error (MAE) between the predicted and the true class prevalence values of the test set.

import quapy as qp
from sklearn.linear_model import LogisticRegression

dataset = qp.datasets.fetch_twitter('semeval16')

# create an "Adjusted Classify & Count" quantifier
model = qp.method.aggregative.ACC(LogisticRegression())
model.fit(dataset.training)

estim_prevalence = model.quantify(dataset.test.instances)
true_prevalence  = dataset.test.prevalence()

error = qp.error.mae(true_prevalence, estim_prevalence)

print(f'Mean Absolute Error (MAE)={error:.3f}')

Quantification is useful in scenarios characterized by prior probability shift. In other words, we would be little interested in estimating the class prevalence values of the test set if we could assume the IID assumption to hold, as this prevalence would be roughly equivalent to the class prevalence of the training set. For this reason, any quantification model should be tested across many samples, even ones characterized by class prevalence values different or very different from those found in the training set. QuaPy implements sampling procedures and evaluation protocols that automate this workflow. See the Wiki for detailed examples.

Features

Implementation of many popular quantification methods (Classify-&-Count and its variants, Expectation Maximization, quantification methods based on structured output learning, HDy, QuaNet, and quantification ensembles).
Versatile functionality for performing evaluation based on artificial sampling protocols.
Implementation of most commonly used evaluation metrics (e.g., AE, RAE, SE, KLD, NKLD, etc.).
Datasets frequently used in quantification (textual and numeric), including:
- 32 UCI Machine Learning datasets.
- 11 Twitter quantification-by-sentiment datasets.
- 3 product reviews quantification-by-sentiment datasets.
Native support for binary and single-label multiclass quantification scenarios.
Model selection functionality that minimizes quantification-oriented loss functions.
Visualization tools for analysing the experimental results.

Requirements

scikit-learn, numpy, scipy
pytorch (for QuaNet)
svmperf patched for quantification (see below)
joblib
tqdm
pandas, xlrd
matplotlib

SVM-perf with quantification-oriented losses

In order to run experiments involving SVM(Q), SVM(KLD), SVM(NKLD), SVM(AE), or SVM(RAE), you have to first download the svmperf package, apply the patch svm-perf-quantification-ext.patch, and compile the sources. The script prepare_svmperf.sh does all the job. Simply run:

./prepare_svmperf.sh

The resulting directory svm_perf_quantification contains the patched version of svmperf with quantification-oriented losses.

The svm-perf-quantification-ext.patch is an extension of the patch made available by Esuli et al. 2015 that allows SVMperf to optimize for the Q measure as proposed by Barranquero et al. 2015 and for the KLD and NKLD measures as proposed by Esuli et al. 2015. This patch extends the above one by also allowing SVMperf to optimize for AE and RAE.

Wiki

Check out our Wiki, in which many examples are provided:

Comments

Couldn't train QuaNet on multiclass data
Hi, I am having trouble in training a QuaNet quantifier for multiclass (20) data. Everything works fine with where my dataset only has 2 classes. It looks like the ACC quantifier is not able to aggregate from more than 2 classes?

The classifier is built and trained as with the code below

classifier = LSTMnet(dataset.vocabulary_size, dataset.n_classes) learner = NeuralClassifierTrainer(classifier) learner.fit(*dataset.training.Xy)

where it has all the default configurations

{'embedding_size': 100, 'hidden_size': 256, 'repr_size': 100, 'lstm_class_nlayers': 1, 'drop_p': 0.5}

Then I tried to train QuaNet with following code

model = QuaNetTrainer(learner, qp.environ['SAMPLE_SIZE']) model.fit(dataset.training, fit_learner=False)

and it showed that QuaNet is built as

QuaNetModule( (lstm): LSTM(120, 64, batch_first=True, dropout=0.5, bidirectional=True) (dropout): Dropout(p=0.5, inplace=False) (ff_layers): ModuleList( (0): Linear(in_features=208, out_features=1024, bias=True) (1): Linear(in_features=1024, out_features=512, bias=True) ) (output): Linear(in_features=512, out_features=20, bias=True) )

And then the error occured in model.fit().

Attached is the error I get.

Traceback (most recent call last): File "quanet-test.py", line 181, in model.fit(dataset.training, fit_learner=False) File "/home/vickys/.local/lib/python3.6/site-packages/quapy/method/neural.py", line 126, in fit self.epoch(train_data_embed, train_posteriors, self.tr_iter, epoch_i, early_stop, train=True) File "/home/vickys/.local/lib/python3.6/site-packages/quapy/method/neural.py", line 182, in epoch quant_estims = self.get_aggregative_estims(sample_posteriors) File "/home/vickys/.local/lib/python3.6/site-packages/quapy/method/neural.py", line 145, in get_aggregative_estims prevs_estim.extend(quantifier.aggregate(predictions)) File "/home/vickys/.local/lib/python3.6/site-packages/quapy/method/aggregative.py", line 238, in aggregate return ACC.solve_adjustment(self.Pte_cond_estim_, prevs_estim) File "/home/vickys/.local/lib/python3.6/site-packages/quapy/method/aggregative.py", line 246, in solve_adjustment adjusted_prevs = np.linalg.solve(A, B) File "<array_function internals>", line 6, in solve File "/usr/local/lib64/python3.6/site-packages/numpy/linalg/linalg.py", line 394, in solve r = gufunc(a, b, signature=signature, extobj=extobj) ValueError: solve1: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (m,m),(m)->(m) (size 2 is different from 20)

Thank you!
opened by vickysvicky 4
Parameter fit_learner in QuaNetTrainer (fit method)

The parameter fit_leaner is not used in the function:

def fit(self, data: LabelledCollection, fit_learner=True):

and the learner is fitted every time:

self.learner.fit(*classifier_data.Xy)

opened by pglez82 1
Wiki correction

In the last part of the Methods wiki page, where it says:

from classification.neural import NeuralClassifierTrainer, CNNnet

I think it should say:

from quapy.classification.neural import NeuralClassifierTrainer, LSTMnet

opened by pglez82 1

Error in LSTMnet

I think there is the function init_hidden:

def init_hidden(self, set_size):
        opt = self.hyperparams
        var_hidden = torch.zeros(opt['lstm_nlayers'], set_size, opt['lstm_hidden_size'])
        var_cell = torch.zeros(opt['lstm_nlayers'], set_size, opt['lstm_hidden_size'])
        if next(self.lstm.parameters()).is_cuda:
            var_hidden, var_cell = var_hidden.cuda(), var_cell.cuda()
        return var_hidden, var_cell

Where it says opt['lstm_hidden_size'] should be opt['hidden_size']

opened by pglez82 1

EMQ can be instantiated with a transformation function
This transformation function is applied to each intermediate estimate.

Why should someone want to transform the prior between two iterations? A transformation of the prior is a heuristic, yet effective way of promoting desired properties of the solution. For instance,

small values could be enhanced if the data is extremely imbalanced

small values could be reduced if the user is looking for a sparse solution

neighboring values could be averaged if the user is looking for a smooth solution

the function could also leave the prior unaltered and just be used as a callback for logging the progress of the method

I hope this feature is useful. Let me know what you think!
opened by mirkobunse 0
fixing two problems with parameters: hidden_size and lstm_nlayers

I found another problem with a parameter. When using LSTMnet with QuaNet two parameters overlap (lstm_nlayers). I have renamed the one in the LSTMnet to lstm_class_nlayers.

opened by pglez82 0
Using a different gpu than cuda:0

The code seems to be tied up to using only 'cuda', which by default uses the first gpu in the system ('cuda:0'). It would be handy to be able to tell the library in which cuda gpu you want to train (cuda:0, cuda:1, etc).

opened by pglez82 0

Releases(0.1.6)

0.1.6(Nov 2, 2021)

Source code(tar.gz)
Source code(zip)

Owner

The Human Language Technologies group of ISTI-CNR

GitHub Repository

On the Analysis of French Phonetic Idiosyncrasies for Accent Recognition

On the Analysis of French Phonetic Idiosyncrasies for Accent Recognition With the spirit of reproducible research, this repository contains codes requ

0 Feb 24, 2022

Source code and dataset of the paper "Contrastive Adaptive Propagation Graph Neural Networks forEfficient Graph Learning"

CAPGNN Source code and dataset of the paper "Contrastive Adaptive Propagation Graph Neural Networks forEfficient Graph Learning" Paper URL: https://ar

1 Mar 12, 2022

CUAD

Contract Understanding Atticus Dataset This repository contains code for the Contract Understanding Atticus Dataset (CUAD), a dataset for legal contra

273 Dec 17, 2022

Proof-Of-Concept Piano-Drums Music AI Model/Implementation

Rock Piano "When all is one and one is all, that's what it is to be a rock and not to roll." ---Led Zeppelin, "Stairway To Heaven" Proof-Of-Concept Pi

4 Nov 28, 2021

Official repository for "Restormer: Efficient Transformer for High-Resolution Image Restoration". SOTA results for single-image motion deblurring, image deraining, image denoising (synthetic and real data), and dual-pixel defocus deblurring.

Restormer: Efficient Transformer for High-Resolution Image Restoration Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan,

906 Dec 30, 2022

pytorch implementation of openpose including Hand and Body Pose Estimation.

pytorch-openpose pytorch implementation of openpose including Body and Hand Pose Estimation, and the pytorch model is directly converted from openpose

1.4k Jan 07, 2023

This repo tries to recognize faces in the dataset you created

YÜZ TANIMA SİSTEMİ Bu repo oluşturacağınız yüz verisetlerini tanımaya çalışan ma

2 Dec 30, 2021

Machine Learning University: Accelerated Computer Vision Class

Machine Learning University: Accelerated Computer Vision Class This repository contains slides, notebooks, and datasets for the Machine Learning Unive

1.3k Dec 28, 2022

VR-Caps: A Virtual Environment for Active Capsule Endoscopy

VR-Caps: A Virtual Environment for Capsule Endoscopy Overview We introduce a virtual active capsule endoscopy environment developed in Unity that prov

90 Dec 27, 2022

Self-Supervised Learning with Kernel Dependence Maximization

Self-Supervised Learning with Kernel Dependence Maximization This is the code for SSL-HSIC, a self-supervised learning loss proposed in the paper Self

29 Dec 29, 2022

Resources related to our paper "CLIN-X: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain"

CLIN-X (CLIN-X-ES) & (CLIN-X-EN) This repository holds the companion code for the system reported in the paper: "CLIN-X: pre-trained language models a

4 Dec 05, 2022

Contextual Attention Localization for Offline Handwritten Text Recognition

CALText This repository contains the source code for CALText model introduced in "CALText: Contextual Attention Localization for Offline Handwritten T

0 Feb 17, 2022

N-gram models- Unsmoothed, Laplace, Deleted Interpolation

1 Jan 04, 2022

A coin flip game in which you can put the amount of money below or equal to 1000 and then choose heads or tail

COIN_FLIPPY ##This is a simple example package. You can use Github-flavored Markdown to write your content. Coinflippy A coin flip game in which you c

2 Dec 26, 2021

Genshin-assets - 👧 Public documentation & static assets for Genshin Impact data.

genshin-assets This repo provides easy access to the Genshin Impact assets, primarily for use on static sites. Sources Genshin Optimizer - An Artifact

5 Nov 22, 2022

Construct a neural network frame by Numpy

本项目的CSDN博客链接：https://blog.csdn.net/weixin_41578567/article/details/111482022 1. 概览本项目主要用于神经网络的学习，通过基于numpy的实现，了解神经网络底层前向传播、反向传播以及各类优化器的原理。该项目目前已实现的功

24 Jan 22, 2022

ICCV2021 Expert-Goal Trajectory Prediction

ICCV 2021: Where are you heading? Dynamic Trajectory Prediction with Expert Goal Examples This repository contains the code for the paper Where are yo

21 Dec 12, 2022

An example project demonstrating how the Autonomous Learning Library can be used to build new reinforcement learning agents.

About This repository shows how Autonomous Learning Library can be used to build new reinforcement learning agents. In particular, it contains a model

5 Aug 30, 2022

code for our BMVC 2021 paper "HCV: Hierarchy-Consistency Verification for Incremental Implicitly-Refined Classification"

HCV_IIRC code for our BMVC 2021 paper HCV: Hierarchy-Consistency Verification for Incremental Implicitly-Refined Classification by Kai Wang, Xialei Li

13 Oct 03, 2022

An Active Automata Learning Library Written in Python

AALpy An Active Automata Learning Library AALpy is a light-weight active automata learning library written in pure Python. You can start learning auto

78 Dec 30, 2022