Crowd-Kit is a powerful Python library that implements commonly used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets

Last update: Dec 30, 2022

Overview

Crowd-Kit: Computational Quality Control for Crowdsourcing

Crowd-Kit is a powerful Python library that implements commonly used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.

Currently, Crowd-Kit contains:

implementations of commonly used aggregation methods for categorical, pairwise, textual, and segmentation responses
metrics of uncertainty, consistency, and agreement with the aggregate
loaders for popular crowdsourced datasets
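As a hedged illustration of what a per-task uncertainty score can look like, the sketch below computes the Shannon entropy of each task's empirical label distribution. The function name task_uncertainty and the (task, worker, label) tuple input are hypothetical and not Crowd-Kit's API; the library's own uncertainty metric may be defined differently.

```python
import math
from collections import Counter, defaultdict


def task_uncertainty(answers):
    """Entropy (in nats) of the empirical label distribution of each task.

    answers: iterable of (task, worker, label) tuples.
    Returns a dict mapping task -> entropy; 0.0 means all workers agreed.
    """
    labels_by_task = defaultdict(list)
    for task, _worker, label in answers:
        labels_by_task[task].append(label)

    uncertainty = {}
    for task, labels in labels_by_task.items():
        n = len(labels)
        probs = [count / n for count in Counter(labels).values()]
        # p * log(1 / p) is the entropy contribution of one label.
        uncertainty[task] = sum(p * math.log(1 / p) for p in probs)
    return uncertainty


answers = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "cat"),
    ("t2", "w1", "cat"), ("t2", "w2", "dog"),
]
scores = task_uncertainty(answers)
```

Here t1 (unanimous) gets zero uncertainty, while t2 (a 1:1 split) gets the maximum for two labels, ln 2.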

The library is under heavy development, and its interfaces are subject to change.

Installing

Installing Crowd-Kit is as easy as running:

pip install crowd-kit

Getting Started

This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.

First, let us do all the necessary imports.

from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset

import pandas as pd

Next, read your annotations into a pandas DataFrame with the columns task, worker, and label. Alternatively, you can download an example dataset.

df = pd.read_csv('results.csv')  # should contain columns: task, worker, label
# df, ground_truth = load_dataset('relevance-2')  # or download an example dataset
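Crowd-Kit 1.0.0 replaced "performer" with "worker" in parameter names and DataFrame columns, so annotations prepared for older versions need their columns renamed first. A minimal pandas sketch, assuming your data used the old task/performer/label layout (the toy frame below stands in for your own file):

```python
import pandas as pd

# Toy annotations in the pre-1.0 column layout (task, performer, label).
df = pd.DataFrame({
    "task": ["t1", "t1", "t2"],
    "performer": ["w1", "w2", "w1"],
    "label": ["a", "a", "b"],
})

# Crowd-Kit 1.0.0 and later expect a "worker" column instead of "performer".
if "performer" in df.columns:
    df = df.rename(columns={"performer": "worker"})

# Keep only the columns the aggregation methods consume, in a fixed order.
df = df[["task", "worker", "label"]]
```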

Then you can aggregate the worker responses as easily as in scikit-learn:

aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
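To make the fit_predict call less of a black box, here is a compact, self-contained sketch of the classical Dawid-Skene expectation-maximization loop in plain Python. It is an illustration under simplifying assumptions (vote-share initialization, a small additive smoothing constant, a tol-based early stop) and not Crowd-Kit's implementation; the function name dawid_skene and its tuple input are hypothetical.

```python
import math
from collections import Counter, defaultdict


def dawid_skene(answers, n_iter=100, tol=1e-5):
    """Minimal Dawid-Skene EM sketch (not Crowd-Kit's implementation).

    answers: iterable of (task, worker, label) tuples; labels must be sortable.
    Returns a dict mapping each task to its most probable label.
    """
    answers = list(answers)
    labels = sorted({label for _, _, label in answers})
    workers = sorted({worker for _, worker, _ in answers})
    by_task = defaultdict(list)
    for task, _worker, label in answers:
        by_task[task].append(label)

    # Initialise task posteriors with the share of votes each label received.
    post = {}
    for task, task_labels in by_task.items():
        counts = Counter(task_labels)
        post[task] = {l: counts[l] / len(task_labels) for l in labels}

    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per worker,
        # conf[worker][true_label][observed_label], with light smoothing.
        priors = {l: sum(p[l] for p in post.values()) / len(post) for l in labels}
        conf = {w: {tl: {ol: 1e-6 for ol in labels} for tl in labels} for w in workers}
        for task, worker, observed in answers:
            for true_label in labels:
                conf[worker][true_label][observed] += post[task][true_label]
        for rows in conf.values():
            for row in rows.values():
                total = sum(row.values())
                for ol in row:
                    row[ol] /= total

        # E-step: recompute posteriors in log space to avoid underflow.
        scores = {task: {l: math.log(priors[l] + 1e-12) for l in labels} for task in post}
        for task, worker, observed in answers:
            for true_label in labels:
                scores[task][true_label] += math.log(conf[worker][true_label][observed])
        new_post = {}
        for task, s in scores.items():
            top = max(s.values())
            exp = {l: math.exp(v - top) for l, v in s.items()}
            z = sum(exp.values())
            new_post[task] = {l: e / z for l, e in exp.items()}

        # Stop once no posterior probability moves by more than tol.
        delta = max(abs(new_post[t][l] - post[t][l]) for t in post for l in labels)
        post = new_post
        if delta < tol:
            break

    return {task: max(p, key=p.get) for task, p in post.items()}


answers = [
    ("t1", "w1", "a"), ("t1", "w2", "a"), ("t1", "w3", "b"),
    ("t2", "w1", "b"), ("t2", "w2", "b"), ("t2", "w3", "b"),
    ("t3", "w1", "a"), ("t3", "w2", "a"), ("t3", "w3", "a"),
]
result = dawid_skene(answers)
```

With a DataFrame in the task/worker/label layout, the tuples can be produced with df[["task", "worker", "label"]].itertuples(index=False, name=None). Unlike plain majority vote, the confusion matrices let reliable workers outweigh unreliable ones on contested tasks.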

More usage examples

Implemented Aggregation Methods

Below is the list of currently implemented methods, including those already available (✅) and those in progress (🟡).

Categorical Responses

Method	Status
Majority Vote	✅
Dawid-Skene	✅
Gold Majority Vote	✅
M-MSR	✅
Wawa	✅
Zero-Based Skill	✅
GLAD	✅
BCC	🟡

Textual Responses

Method	Status
RASA	✅
HRRASA	✅
ROVER	✅

Image Segmentation

Method	Status
Segmentation MV	✅
Segmentation RASA	✅
Segmentation EM	✅

Pairwise Comparisons

Method	Status
Bradley-Terry	✅
Noisy Bradley-Terry	✅

Citation

Ustalov D., Pavlichenko N., Losev V., Giliazev I., and Tulin E. A General-Purpose Crowdsourcing Computational Quality Control Toolkit for Python. The Ninth AAAI Conference on Human Computation and Crowdsourcing: Works-in-Progress and Demonstration Track. HCOMP 2021. 2021. arXiv: 2109.08584 [cs.HC].

@inproceedings{HCOMP2021/CrowdKit,
  author    = {Ustalov, Dmitry and Pavlichenko, Nikita and Losev, Vladimir and Giliazev, Iulian and Tulin, Evgeny},
  title     = {{A General-Purpose Crowdsourcing Computational Quality Control Toolkit for Python}},
  year      = {2021},
  booktitle = {The Ninth AAAI Conference on Human Computation and Crowdsourcing: Works-in-Progress and Demonstration Track},
  series    = {HCOMP~2021},
  eprint    = {2109.08584},
  eprinttype = {arxiv},
  eprintclass = {cs.HC},
  url       = {https://www.humancomputation.com/assets/wips_demos/HCOMP_2021_paper_85.pdf},
  language  = {english},
}

Questions and Bug Reports

To report bugs, please use the Toloka/bugreport page.
Join our English-speaking Slack community for both technical and abstract questions.

License

Comments

Crowd-Kit Learning

This is just an example of what this subpackage will contain.

We need to configure setup.cfg and add new tests. I suggest we discuss the concept here.

opened by pilot7747
Fix the documentation generation issues
Stick to YAML files hosted in https://github.com/Toloka/docs and use the proper includes.

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

[x] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[x] My change requires a change to the documentation.

[x] I have updated the documentation accordingly.

[ ] I have added tests to cover my changes.

[ ] All new and existing tests passed.

documentation enhancement
opened by dustalov
Add MACE

Would it be possible to add MACE? It is often used in my field, but there is only a Java implementation, which is hard to integrate into Python projects.
enhancement good first issue

opened by jcklie
Add MACE aggregation model
I have added the MACE aggregation model. https://www.cs.cmu.edu/~hovy/papers/13HLT-MACE.pdf

Description

I ported it to Python, following the original variational Bayes (VB) inference implementation.

Connected issues (if any)

https://github.com/Toloka/crowd-kit/issues/5

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)

[x] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

[ ] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[x] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[x] I have added tests to cover my changes.

[x] All new and existing tests passed.
opened by pilot7747
Documentation updates
Updated index.md and the Classification section:

added extra information to the model descriptions;

added descriptions for parameters;

fixed errors and typos in descriptions.
opened by Natalyl3
Binary Relevance aggregation
Description

I have added code for Binary Relevance aggregation, a simple method for multi-label classification. This approach treats each label as a separate binary classification task and aggregates it independently.
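A minimal sketch of the Binary Relevance idea described above, assuming each answer is a (task, worker, labels) tuple holding the collection of labels the worker selected. Every label is aggregated as its own binary task; plain majority vote serves as the per-label aggregator here, which is an assumption, since the actual pull request may plug in a different base aggregator. The function name binary_relevance_mv is hypothetical.

```python
from collections import defaultdict


def binary_relevance_mv(answers):
    """Multi-label aggregation by treating every label as its own binary task.

    answers: iterable of (task, worker, labels), where labels is the collection
    of labels the worker selected for the task.
    Returns a dict mapping each task to the set of labels kept after aggregation.
    """
    answers = list(answers)
    universe = sorted({label for _, _, labels in answers for label in labels})
    positive_votes = defaultdict(lambda: defaultdict(int))  # task -> label -> votes
    n_answers = defaultdict(int)  # task -> number of answers
    for task, _worker, labels in answers:
        n_answers[task] += 1
        for label in set(labels):  # count each label at most once per answer
            positive_votes[task][label] += 1

    result = {}
    for task, n in n_answers.items():
        # One binary decision per label: keep it if a strict majority selected it.
        result[task] = {
            label for label in universe if 2 * positive_votes[task][label] > n
        }
    return result


answers = [
    ("t1", "w1", {"cat", "dog"}), ("t1", "w2", {"cat"}), ("t1", "w3", {"cat", "bird"}),
    ("t2", "w1", set()), ("t2", "w2", {"dog"}), ("t2", "w3", {"dog"}),
]
result = binary_relevance_mv(answers)
```

Because each label is decided independently, correlations between labels are ignored; that is the main simplification of Binary Relevance.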

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)

[x] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

[ ] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[ ] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[x] I have added tests to cover my changes.

[x] All new and existing tests passed.
opened by denaxen
Use mypy --strict
Description

This pull request enforces a stricter set of mypy type checks by enabling strict mode. It also fixes several type inconsistencies. Since NumPy type annotations were introduced in version 1.20 (January 2021), some Crowd-Kit installations with older NumPy versions might break, but I believe it is a worthy contribution.

Connected issues (if any)

Types of changes

[x] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[x] Breaking change (fix or feature that would cause existing functionality to change)

[ ] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[ ] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[x] I have added tests to cover my changes.

[x] All new and existing tests passed.

enhancement
opened by dustalov
Run Jupyter notebooks with tests
Description

This pull request runs the Jupyter notebooks with examples on the current version of Crowd-Kit with the rest of the test suite on GitHub Actions.

Connected issues (if any)

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

[x] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[ ] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[x] I have added tests to cover my changes.

[x] All new and existing tests passed.

enhancement good first issue
opened by dustalov
Dramatically improve the code maintainability
This pull request is probably the best thing that could happen to Crowd-Kit code maintainability.

Description

In this pull request, we switch from unnecessarily verbose Python stub files to more convenient inline type annotations. In the process, many type annotations were fixed. We also removed the manage_docstring decorator and the corresponding utility functions.

I think this change might break the documentation generation process. We will release a new version of Crowd-Kit only after this is fixed.

Connected issues (if any)

Types of changes

[x] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[x] Breaking change (fix or feature that would cause existing functionality to change)

[x] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[x] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[x] I have added tests to cover my changes.

[x] All new and existing tests passed.

bug documentation enhancement
opened by dustalov
Add header and LM-based aggregation item
Description

This pull request adds a header to README.md and lists the missing language-model-based textual aggregation method.

Connected issues (if any)

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

[x] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[ ] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[ ] I have added tests to cover my changes.

[x] All new and existing tests passed.

documentation
opened by dustalov
Renamed columns?

Hi, the guide says

df = pd.read_csv('results.csv') # should contain columns: task, performer, label

but when I load this file, the second column is worker, not performer. I had been using Crowd-Kit with DataFrames that had the columns task, performer, label, but after an update, my code broke.

opened by jcklie
Ordinal Labels
Would it be possible to support aggregation of ordinal labels in this toolkit via the following reduction algorithm?

Labels are categorical but have a defined ordering 1 < ... < K.

The K-class ordinal labels are transformed into K−1 binary labels, where the c-th binary label of an answer indicates whether y_i > c.

Each of the binary tasks is then aggregated with Crowd-Kit to estimate Pr[y_i > c] for c = 1, ..., K−1.

The probability of each actual class value can then be obtained as Pr[y_i = c] = Pr[y_i > c−1 and y_i ≤ c] = Pr[y_i > c−1] − Pr[y_i > c], using the conventions Pr[y_i > 0] = 1 and Pr[y_i > K] = 0.

The class with the maximum probability is assigned to the instance.
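A minimal sketch of the proposed reduction, assuming integer labels 1..K in (task, worker, label) tuples. To stay self-contained, it estimates Pr[y_i > c] as the share of positive binary votes, which stands in for a Crowd-Kit aggregator; with this particular estimator the result coincides with the per-task mode, so the reduction only pays off once a probabilistic aggregator (for example, Dawid-Skene) produces the per-threshold estimates. The function name ordinal_aggregate is hypothetical.

```python
from collections import defaultdict


def ordinal_aggregate(answers, k):
    """Sketch of the ordinal-to-binary reduction proposed in the issue.

    answers: iterable of (task, worker, label) tuples with integer labels 1..k.
    Returns a dict mapping each task to the class with maximal Pr[y = c].
    """
    # Step 1: each ordinal answer y yields k-1 binary answers "y > c", c = 1..k-1.
    positives = defaultdict(lambda: [0] * (k - 1))
    totals = defaultdict(int)
    for task, _worker, label in answers:
        totals[task] += 1
        for c in range(1, k):
            if label > c:
                positives[task][c - 1] += 1

    result = {}
    for task, n in totals.items():
        # Step 2: estimate Pr[y > c] per threshold; here simply the share of
        # positive binary votes (a stand-in for a Crowd-Kit aggregator).
        # p_gt[c] = Pr[y > c] for c = 0..k, with Pr[y > 0] = 1 and Pr[y > k] = 0.
        p_gt = [1.0] + [positives[task][c - 1] / n for c in range(1, k)] + [0.0]
        # Step 3: Pr[y = c] = Pr[y > c-1] - Pr[y > c], clipped at 0 in case
        # independently aggregated thresholds are not monotone.
        probs = {c: max(p_gt[c - 1] - p_gt[c], 0.0) for c in range(1, k + 1)}
        # Step 4: assign the class with the maximum probability.
        result[task] = max(probs, key=probs.get)
    return result


answers = [
    ("t1", "w1", 3), ("t1", "w2", 3), ("t1", "w3", 2),
    ("t2", "w1", 1), ("t2", "w2", 1), ("t2", "w3", 2),
]
result = ordinal_aggregate(answers, k=3)
```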

enhancement
opened by vikasraykar

Releases(v1.2.0)

v1.2.0(Dec 14, 2022)
Introduced the Crowd-Kit Learning subpackage, which implements the deep learning from crowds methods CoNAL and CrowdLayer

Added Multi-Binary aggregation

v1.2.0.rc1(Dec 13, 2022)

v1.1.0(Sep 27, 2022)
New aggregation methods: One-Coin Dawid-Skene, MACE, and KOS

Fixed bugs in Dawid-Skene implementation

Improved maintainability by removing stub files

Switched to setup.cfg from setup.py

v1.1.0.rc4(Sep 26, 2022)

v1.1.0.rc3(Sep 23, 2022)

v1.1.0.rc2(Jul 28, 2022)

v1.1.0.rc1(Jul 28, 2022)

v1.0.0(Mar 22, 2022)
Not a backward-compatible change:

Replaced all mentions of "performer" with "worker". This change is not backward compatible because parameter names and DataFrame/Series columns are also affected.

Improvements:

GoldMajorityVote true_labels argument now supports multiple ground truth values for a single task.

Added the tol optimization parameter as a tolerance-based stopping criterion for iterative methods with a variable number of steps.

Python 3.10 support added.

Enhanced the descriptions of aggregation methods.

v0.0.9(Nov 30, 2021)
Added TextSummarization aggregation

Added new datasets

Added entropy_threshold method

Added names for pd.Series which are available after fit

Added on_missing_skill and default_skill params for models that use skills

v0.0.8(Oct 14, 2021)
Added GLAD aggregation

Fixed https://github.com/Toloka/crowd-kit/issues/6

Fixed https://github.com/Toloka/crowd-kit/issues/3

v0.0.7(Sep 2, 2021)
Added segmentation EM

Added ROVER

Fixed HRRASA and refactored TextRASA and TextHRRASA

v0.0.6(Aug 18, 2021)

crowd-kit==0.0.6 release
v0.0.5(Jul 18, 2021)

v0.0.4(May 19, 2021)

v0.0.3(Apr 12, 2021)

v0.0.2(Apr 7, 2021)

v0.0.1(Mar 2, 2021)


Owner

Toloka

Data labeling platform for ML
