This repository contains a set of benchmarks of different implementations of Parquet (storage format) <-> Arrow (in-memory format).

Overview

Parquet benchmarks

This repository contains a set of benchmarks of different implementations of Parquet (storage format) <-> Arrow (in-memory format).

The results on Azure's Standard D4s v3 (4 vcpus, 16 GiB memory) are available here.

Read uncompressed

read uncompressed i64

read uncompressed bool

read uncompressed utf8

(Note: neither pyarrow nor arrow validate utf8, which can result in undefined behavior.)

read uncompressed dict utf8

(Note: neither pyarrow nor arrow validate utf8, which can result in undefined behavior.)

Read compressed (snappy)

read compressed i64

read compressed bool

read compressed utf8

(Note: neither pyarrow nor arrow validate utf8, which can result in undefined behavior.)

Write uncompressed

write uncompressed i64

write uncompressed bool

write uncompressed utf8

Write compressed (snappy)

write compressed i64

write compressed bool

write compressed utf8

(Note: neither pyarrow nor arrow validate utf8, which can result in undefined behavior.)

Run benchmarks

To reproduce, use

python3 -m venv venv
venv/bin/pip install -U pip
venv/bin/pip install pyarrow

# create files
venv/bin/python write_parquet.py

# run benchmarks
venv/bin/python run.py

# print results to stdout as csv
venv/bin/python summarize.py

Details

The benchmark reads a single column from a file pre-loaded into memory, decompresses and deserializes the column to an arrow array.

The benchmark includes different configurations:

  • dictionary-encoded vs plain encoding
  • single page vs multiple pages
  • compressed vs uncompressed
  • different types:
    • i64
    • bool
    • utf8
Declarative HTTP Testing for Python and anything else

Gabbi Release Notes Gabbi is a tool for running HTTP tests where requests and responses are represented in a declarative YAML-based form. The simplest

Chris Dent 139 Sep 21, 2022
Front End Test Automation with Pytest Framework

Front End Test Automation Framework with Pytest Installation and running instructions: 1. To install the framework on your local machine: clone the re

Sergey Kolokolov 2 Jun 17, 2022
A browser automation framework and ecosystem.

Selenium Selenium is an umbrella project encapsulating a variety of tools and libraries enabling web browser automation. Selenium specifically provide

Selenium 25.5k Jan 01, 2023
Tutorial for integrating Oxylabs' Residential Proxies with Selenium

Oxylabs’ Residential Proxies integration with Selenium Requirements For the integration to work, you'll need to install Selenium on your system. You c

Oxylabs.io 8 Dec 08, 2022
A folder automation made using Watch-dog, it only works in linux for now but I assume, it will be adaptable to mac and PC as well

folder-automation A folder automation made using Watch-dog, it only works in linux for now but I assume, it will be adaptable to mac and PC as well Th

Parag Jyoti Paul 31 May 28, 2021
This project is used to send a screenshot by email of your MyUMons schedule using Selenium python lib (headless mode)

MyUMonsSchedule Use MyUMonsSchedule python script to send a screenshot by email (Gmail) of your MyUMons schedule. If you use it on Windows, take care

Pierre-Louis D'Agostino 6 May 12, 2022
Pyramid debug toolbar

pyramid_debugtoolbar pyramid_debugtoolbar provides a debug toolbar useful while you're developing your Pyramid application. Note that pyramid_debugtoo

Pylons Project 95 Sep 17, 2022
Selenium-python but lighter: Helium is the best Python library for web automation.

Selenium-python but lighter: Helium Selenium-python is great for web automation. Helium makes it easier to use. For example: Under the hood, Helium fo

Michael Herrmann 3.2k Dec 31, 2022
Photostudio是一款能进行自动化检测网页存活并实时给网页拍照的工具,通过调用Fofa/Zoomeye/360qua/shodan等 Api快速准确查询资产并进行网页截图,从而实施进一步的信息筛查。

Photostudio-红队快速爬取网页快照工具 一、简介: 正如其名:这是一款能进行自动化检测,实时给网页拍照的工具 信息收集要求所收集到的信息要真实可靠。 当然,这个原则是信息收集工作的最基本的要求。为达到这样的要求,信息收集者就必须对收集到的信息反复核实,不断检验,力求把误差减少到最低限度。我

s7ck Team 41 Dec 11, 2022
A Python Selenium library inspired by the Testing Library

Selenium Testing Library Slenium Testing Library (STL) is a Python library for Selenium inspired by Testing-Library. Dependencies Python 3.6, 3.7, 3.8

Anže Pečar 12 Dec 26, 2022
Ab testing - The using AB test to test of difference of conversion rate

Facebook recently introduced a new type of offer that is an alternative to the current type of bidding called maximum bidding he introduced average bidding.

5 Nov 21, 2022
Generic automation framework for acceptance testing and RPA

Robot Framework Introduction Installation Example Usage Documentation Support and contact Contributing License Introduction Robot Framework is a gener

Robot Framework 7.7k Jan 07, 2023
This file will contain a series of Python functions that use the Selenium library to search for elements in a web page while logging everything into a file

element_search with Selenium (Now With docstrings 😎 ) Just to mention, I'm a beginner to all this, so it it's very possible to make some mistakes The

2 Aug 12, 2021
Automatic SQL injection and database takeover tool

sqlmap sqlmap is an open source penetration testing tool that automates the process of detecting and exploiting SQL injection flaws and taking over of

sqlmapproject 25.7k Jan 04, 2023
Python dilinin Selenium kütüphanesini kullanarak; Amazon, LinkedIn ve ÇiçekSepeti üzerinde test işlemleri yaptığımız bir case study reposudur.

Python dilinin Selenium kütüphanesini kullanarak; Amazon, LinkedIn ve ÇiçekSepeti üzerinde test işlemleri yaptığımız bir case study reposudur. LinkedI

Furkan Gulsen 8 Nov 01, 2022
A suite of benchmarks for CPU and GPU performance of the most popular high-performance libraries for Python :rocket:

A suite of benchmarks for CPU and GPU performance of the most popular high-performance libraries for Python :rocket:

Dion Häfner 255 Jan 04, 2023
FakeDataGen is a Full Valid Fake Data Generator.

FakeDataGen is a Full Valid Fake Data Generator. This tool helps you to create fake accounts (in Spanish format) with fully valid data. Within this in

Joel GM 64 Dec 12, 2022
Testinfra test your infrastructures

Testinfra test your infrastructure Latest documentation: https://testinfra.readthedocs.io/en/latest About With Testinfra you can write unit tests in P

pytest-dev 2.1k Jan 07, 2023
A simple script to login into twitter using Selenium in python.

Quick Talk A simple script to login into twitter using Selenium in python. I was looking for a way to login into twitter using Selenium in python. Sin

Lzy-slh 4 Nov 20, 2022
A single module to link Python ecosystem to the Web

A single module to link Python ecosystem to the Web. Have a quick look at the Gallery first to get convinced ! FAQ For any questions, please use Stack

66 Dec 21, 2022