BREP : Binary Search in plaintext and gzip files

Overview

BREP : Binary Search in plaintext and gzip files

Search large files in O(log n) time using binary search.
We support plaintext and Gzipped files.

Benchmark : 8x faster than grep on a 2GB dataset !

brep is usually faster than grep for >1GB datasets.

Check tests/benchmark.py to reproduce the results.

grep ^777 test.txt : 1.594 s (15 runs)
brep 777 test.txt : 206.8 ms (15 runs)

Installation

pip install brep or pip install . from this repo

Index your file

In order to conduct binary search, your file needs to be sorted.
We recommend GNU sort, as it's multithreaded and supports large files.
LC_ALL=C sort -u -o output_file input_file

BREP supports compressed file in the GZIP format.
We recommend pigz for quick multicore compression : pigz file

Usage

Provide 1 prefix search term and 1 filepath
brep 77777 test/large.gz

You can also search from our Python class

from brep import Search

for result in Search("77777", "test/large.gz"):
    print(result)

Contribute

PRs are welcome!

Install dev dependencies: pip install -e .[dev]
Test and lint before submitting: pytest && flake8

Todo

  • Reimplement in Rust
  • Faster gz size estimation
  • Search multiple strings at once
You might also like...
dotsend is a web application which helps you to upload your large files and share file via link

dotsend is a web application which helps you to upload your large files and share file via link

A wrapper for DVD file structure and ISO files.

vs-parsedvd DVDs were an error. A wrapper for DVD file structure and ISO files. You can find me in the IEW Discord server

Python Fstab Generator is a small Python script to write and generate /etc/fstab files based on yaml file on Unix-like systems.

PyFstab Generator PyFstab Generator is a small Python script to write and generate /etc/fstab files based on yaml file on Unix-like systems. NOTE : Th

A tool for batch processing large fasta files and accompanying metadata table to upload to repositories via API

Fasta Uploader A tool for batch processing large fasta files and accompanying metadata table to repositories via API The python fasta_uploader.py scri

Python interface for reading and appending tar files

Python interface for reading and appending tar files, while keeping a fast index for finding and reading files in the archive. This interface has been

A simple file module for creating, editing and saving files.

A simple file module for creating, editing and saving files.

This program can help you to move and rename many files at once
This program can help you to move and rename many files at once

This program can help you to rename and save many files in a folder in seconds, but don't give the same name to files, it can delete both files.

Fast Python reader and editor for ASAM MDF / MF4 (Measurement Data Format) files
Fast Python reader and editor for ASAM MDF / MF4 (Measurement Data Format) files

asammdf is a fast parser and editor for ASAM (Association for Standardization of Automation and Measuring Systems) MDF (Measurement Data Format) files

Here is some Python code that allows you to read in SVG files and approximate their paths using a Fourier series.
Here is some Python code that allows you to read in SVG files and approximate their paths using a Fourier series.

Here is some Python code that allows you to read in SVG files and approximate their paths using a Fourier series. The Fourier series can be animated and visualized, the function can be output as a two dimensional vector for Desmos and there is a method to output the coefficients as LaTeX code.

Releases(v1.0.1)
Owner
Arnaud de Saint Meloir
Software Engineer
Arnaud de Saint Meloir
Simple archive format designed for quickly reading some files without extracting the entire archive

Simple archive format designed for quickly reading some files without extracting the entire archive

Jarred Sumner 336 Dec 30, 2022
Annotate your Python requirements.txt file with summaries of each package.

Summarize Requirements 🐍 📜 Annotate your Python requirements.txt file with a short summary of each package. This tool: takes a Python requirements.t

Zeke Sikelianos 8 Apr 22, 2022
Provides a convenient way to append numpy arrays to a file.

Provides a convenient way to append numpy arrays to a file. The NpendWriter and NpendReader classes are used to write and read numpy arrays respective

3 May 14, 2022
Nmap XML output to CSV and HTTP/HTTPS URLS.

xml-to-csv-url Convert NMAP's XML output to CSV file and print URL addresses for HTTP/HTTPS ports. NOTE: OS Version Parsing is not working properly ye

1 Dec 21, 2021
Two scripts help you to convert csv file to md file by template

Two scripts help you to convert csv file to md file by template. One help you generate multiple md files with different filenames from the first colume of csv file. Another can generate one md file w

2 Oct 15, 2022
QSynthesis is a Python3 API to perform I/O based program synthesis of bitvector expressions.

QSynthesis is a Python3 API to perform I/O based program synthesis of bitvector expressions. It aims at facilitating code deobfuscation. The algorithm is greybox approach combining both a blackbox I/

Quarkslab 103 Dec 30, 2022
Remove [x]_ from StudIP zip Archives and archive_filelist.csv completely

This tool removes the "[x]_" at the beginning of StudIP zip Archives. It also deletes the "archive_filelist.csv" file

Kelke vl 1 Jan 19, 2022
CSV-Handler written in Python3

CSVHandler This code allows you to work intelligently with CSV files. A file in CSV syntax is converted into several lists, which are combined in a to

Max Tischberger 1 Jan 13, 2022
The best way to convert files on your computer, be it .pdf to .png, .pdf to .docx, .png to .ico, or anything you can imagine.

The best way to convert files on your computer, be it .pdf to .png, .pdf to .docx, .png to .ico, or anything you can imagine.

JareBear 2 Nov 20, 2021
Small-File-Explorer - I coded a small file explorer with several options

Petit explorateur de fichier / Small file explorer Pour la première option (création de répertoire) / For the first option (creation of a directory) e

Xerox 1 Jan 03, 2022
Measure file similarity in a many-to-many fashion

Mesi Mesi is a tool to measure the similarity in a many-to-many fashion of long-form documents like Python source code or technical writing. The outpu

GatorEducator 3 Feb 02, 2022
Python function to construct a ZIP archive with on the fly - without having to store the entire ZIP in memory or disk

Python function to construct a ZIP archive with on the fly - without having to store the entire ZIP in memory or disk

Department for International Trade 34 Jan 05, 2023
Pure Python tools for reading and writing all TIFF IFDs, sub-IFDs, and tags.

Tiff Tools Pure Python tools for reading and writing all TIFF IFDs, sub-IFDs, and tags. Developed by Kitware, Inc. with funding from The National Canc

Digital Slide Archive 32 Dec 14, 2022
ValveVMF - A python library to parse Valve's VMF files

ValveVMF ValveVMF is a Python library for parsing .vmf files for the Source Engi

pySourceSDK 2 Jan 02, 2022
Import Python modules from any file system path

pathimp Import Python modules from any file system path. Installation pip3 install pathimp Usage import pathimp

Danijar Hafner 2 Nov 29, 2021
Convert All TXT Files To One File.

AllToOne Convert All TXT Files To One File. Hi 👋 , I'm Alireza A Python Developer Boy 🔭 I’m currently working on my C# projects 🌱 I’m currently Lea

4 Jun 07, 2022
Here is some Python code that allows you to read in SVG files and approximate their paths using a Fourier series.

Here is some Python code that allows you to read in SVG files and approximate their paths using a Fourier series. The Fourier series can be animated and visualized, the function can be output as a tw

Alexander 12 Jan 01, 2023
A small Python module for determining appropriate platform-specific dirs, e.g. a "user data dir".

the problem What directory should your app use for storing user data? If running on macOS, you should use: ~/Library/Application Support/AppName If

ActiveState Software 948 Dec 31, 2022
Various technical documentation, in electronically parseable format

a-pile-of-documentation Various technical documentation, in electronically parseable format. You will need Python 3 to run the scripts and programs in

Jonathan Campbell 2 Nov 20, 2022
File-manager - A basic file manager, written in Python

File Manager A basic file manager, written in Python. Installation Install Pytho

Samuel Ko 1 Feb 05, 2022