Fine-tuning scripts for evaluating transformer-based models on KLEJ benchmark.

Last update: Oct 18, 2022

Related tags

Overview

The KLEJ Benchmark Baselines

The KLEJ benchmark (Kompleksowa Lista Ewaluacji Językowych) is a set of nine evaluation tasks for the Polish language understanding.

This repository contains example scripts to easily fine-tune models from the transformers library on the KLEJ benchmark.

Installation

Install the Python package using the following commands:

$ git clone https://github.com/allegro/klejbenchmark-baselines
$ pip install klejbenchmark-baselines/

Quick Start

To fine-tune your model on KLEJ tasks using the default settings, you can use the provided example scripts.

First, download the KLEJ benchmark datasets:

$ bash scripts/download_klej.sh

After downloading KLEJ, customize training parameters inside the scripts/run_training.sh script and train the models using:

$ bash scripts/run_training.sh

It will create:

Tensorboard logs with training and validation metrics,
checkpoints of the best models,
a zip file with predictions for the test sets, which is a valid submission for the KLEJ benchmark.

The zip file can be submitted at the klejbenchmark.com website for the evaluation on the test sets.

Custom Training

It's also possible to train each model separately and customize the training parameters using the klejbenchmark_baselines/main.py script.

License

Apache 2 License

Citation

If you use this code, please cite the following paper:

@inproceedings{rybak-etal-2020-klej,
    title = "{KLEJ}: Comprehensive Benchmark for Polish Language Understanding",
    author = "Rybak, Piotr and Mroczkowski, Robert and Tracz, Janusz and Gawlik, Ireneusz",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.111",
    pages = "1191--1201",
}

Authors

This code was created by the Allegro Machine Learning Research team.

You can contact us at: [email protected]

Fine-tuning scripts for evaluating transformer-based models on KLEJ benchmark.

Related tags

Overview

The KLEJ Benchmark Baselines

Installation

Quick Start

Custom Training

License

Citation

Authors

Owner

Allegro Tech

A Lightweight NLP Data Loader for All Deep Learning Frameworks in Python

A Streamlit web app that generates Rick and Morty stories using GPT2.

code for modular summarization work published in ACL2021 by Krishna et al

NLP, before and after spaCy

Longformer: The Long-Document Transformer

COVID-19 Chatbot with Rasa 2.0: open source conversational AI

Machine Psychology: Python Generated Art

Leon is an open-source personal assistant who can live on your server.

This Project is based on NLTK It generates a RANDOM WORD from a predefined list of words, From that random word it read out the word, its meaning with parts of speech , its antonyms, its synonyms

VampiresVsWerewolves - Our Implementation of a MiniMax algorithm with alpha beta pruning in the context of an in-class competition

Develop open-source Python Arabic NLP libraries that the Arab world will easily use in all Natural Language Processing applications

customer care chatbot made with Rasa Open Source.

NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT

Weaviate demo with the text2vec-openai module

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

Code for PED: DETR For (Crowd) Pedestrian Detection

LightSeq: A High-Performance Inference Library for Sequence Processing and Generation

Python3 to Crystal Translation using Python AST Walker

lightweight, fast and robust columnar dataframe for data analytics with online update

Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.