The evaluator covering all of the metrics required by tasks within the DUE Benchmark.

Related tags

Testingevaluator
Overview

DUE Evaluator

The repository contains the evaluator covering all of the metrics required by tasks within the DUE Benchmark, i.e., set-based F1 (for KIE), ANLS (used in document VQA), accuracy (including variant used in WTQ), as well as group-based ANLS we proposed for KIE problems with structured output.

Usage

The deval command will be available after the package installation. Every time, it is required to provide input and output files (both in the DU-Schema format) using -o and -r parameters.

Other settings are task-specific and limited to metric (-m) and optional case-insensitiveness (-i). Recommended values of these are:

Dataset Metric Case insensitive
DocVQA, InfographicsVQA ANLS Yes
Kleister Charity, DeepForm F1 Yes
PapersWithCode GROUP-ANLS Yes
WikiTableQuestions WTQ No (handled by metric itself)
TabFact F1 (obtained value will be equal to Accuracy) No
Owner
DUE Benchmark
The benchmark consisting of both available and reformulated datasets to measure the end-to-end capabilities of systems in real-world scenarios.
DUE Benchmark
Pytest support for asyncio.

pytest-asyncio: pytest support for asyncio pytest-asyncio is an Apache2 licensed library, written in Python, for testing asyncio code with pytest. asy

pytest-dev 1.1k Jan 02, 2023
自动化爬取并自动测试所有swagger-ui.html显示的接口

swagger-hack 在测试中偶尔会碰到swagger泄露 常见的泄露如图: 有的泄露接口特别多,每一个都手动去试根本试不过来 于是用python写了个脚本自动爬取所有接口,配置好传参发包访问 原理是首先抓取http://url/swagger-resources 获取到有哪些标准及对应的文档地

jayus 534 Dec 29, 2022
Subprocesses for Humans 2.0.

Delegator.py — Subprocesses for Humans 2.0 Delegator.py is a simple library for dealing with subprocesses, inspired by both envoy and pexpect (in fact

Amit Tripathi 1.6k Jan 04, 2023
Language-agnostic HTTP API Testing Tool

Dredd — HTTP API Testing Framework Dredd is a language-agnostic command-line tool for validating API description document against backend implementati

Apiary 4k Jan 05, 2023
Enabling easy statistical significance testing for deep neural networks.

deep-significance: Easy and Better Significance Testing for Deep Neural Networks Contents ⁉️ Why 📥 Installation 🔖 Examples Intermezzo: Almost Stocha

Dennis Ulmer 270 Dec 20, 2022
WIP SAT benchmarking tooling, written with only my personal use in mind.

SAT Benchmarking Some early work in progress tooling for running benchmarks and keeping track of the results when working on SAT solvers and related t

Jannis Harder 1 Dec 26, 2021
Sixpack is a language-agnostic a/b-testing framework

Sixpack Sixpack is a framework to enable A/B testing across multiple programming languages. It does this by exposing a simple API for client libraries

1.7k Dec 24, 2022
Sixpack is a language-agnostic a/b-testing framework

Sixpack Sixpack is a framework to enable A/B testing across multiple programming languages. It does this by exposing a simple API for client libraries

1.7k Dec 24, 2022
This repository contnains sample problems with test cases using Cormen-Lib

Cormen Lib Sample Problems Description This repository contnains sample problems with test cases using Cormen-Lib. These problems were made for the pu

Cormen Lib 3 Jun 30, 2022
WEB PENETRATION TESTING TOOL 💥

N-WEB ADVANCE WEB PENETRATION TESTING TOOL Features 🎭 Admin Panel Finder Admin Scanner Dork Generator Advance Dork Finder Extract Links No Redirect H

56 Dec 23, 2022
Tattoo - System for automating the Gentoo arch testing process

Naming origin Well, naming things is very hard. Thankfully we have an excellent

Arthur Zamarin 4 Nov 07, 2022
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva/ Datadadome / CloudFlare IUAM)

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva/ Datadadome / CloudFlare IUAM)

Leon 3.5k Dec 30, 2022
Mock smart contracts for writing Ethereum test suites

Mock smart contracts for writing Ethereum test suites This package contains comm

Trading Strategy 222 Jan 04, 2023
Airspeed Velocity: A simple Python benchmarking tool with web-based reporting

airspeed velocity airspeed velocity (asv) is a tool for benchmarking Python packages over their lifetime. It is primarily designed to benchmark a sing

745 Dec 28, 2022
Python Testing Crawler 🐍 🩺 🕷️ A crawler for automated functional testing of a web application

Python Testing Crawler 🐍 🩺 🕷️ A crawler for automated functional testing of a web application Crawling a server-side-rendered web application is a

70 Aug 07, 2022
Auto-hms-action - Automation of NU Health Management System

🦾 Automation of NU Health Management System 🤖 長崎大学 健康管理システムの自動化 🏯 Usage / 使い方

k5-mot 3 Mar 04, 2022
Pynguin, The PYthoN General UnIt Test geNerator is a test-generation tool for Python

Pynguin, the PYthoN General UnIt test geNerator, is a tool that allows developers to generate unit tests automatically.

Chair of Software Engineering II, Uni Passau 997 Jan 06, 2023
Declarative HTTP Testing for Python and anything else

Gabbi Release Notes Gabbi is a tool for running HTTP tests where requests and responses are represented in a declarative YAML-based form. The simplest

Chris Dent 139 Sep 21, 2022
API Rest testing FastAPI + SQLAchmey + Docker

Transactions API Rest Implement and design a simple REST API Description We need to a simple API that allow us to register users' transactions and hav

TxeMac 2 Jun 30, 2022
pytest splinter and selenium integration for anyone interested in browser interaction in tests

Splinter plugin for the pytest runner Install pytest-splinter pip install pytest-splinter Features The plugin provides a set of fixtures to use splin

pytest-dev 238 Nov 14, 2022