Python Implementation of Scalable In-Memory Updatable Bitmap Indexing

Related tags

Data AnalysisPyUpBit
Overview

Contributors Forks Stargazers Issues LinkedIn


PyUpBit

CS490 Large Scale Data Analytics — Implementation of Updatable Compressed Bitmap Indexing
Paper

Table of Contents
  1. About The Project
  2. Usage
  3. Contact
  4. Acknowledgements

About The Project

Bitmaps are common data structures used in database implemen- tations due to having fast read performance. Often they are used in applications in need of common equality and selective range queries. Essentially, they store a bit-vector for each value in the domain of each attribute to keep track of large scale data files. How- ever, the main drawbacks associated with bitmap indexes are its encoding and decoding performances of bit-vectors. Currently the state of art update-optimized bitmap index, update conscious bitmaps, are able to support extremely efficient deletes and have improved update speeds by treating updates as delete then insert. Update conscious bitmaps make use of an additional bit-vector, called the existence bit-vector, to keep track of whether or not a value has been updated. By initializing all values of the existence bit-vector to 1, the data for each attribute associated with each row in the existence bit-vector is validated and presented. If a value needs to be deleted, the corresponding row in the existence bit-vector gets changed to 0, invalidating any data associated with that row. This new method in turn allows for very efficient deletes. To add on, updates are then performed as a delete operation, then an insert operation in to the end of the bit-vector. However, update conscious bitmaps do not scale well with more data. As more and more data gets updated and inserted, the run time increases significantly as well. Because update queries are out-of- place and increase size of vectors, read queries become increasingly expensive and time consuming. Furthermore, as the number of updates and deletes increases, the bit-vector becomes less and less compressible. This brings us to updateable Bitmaps (UpBit). According to the paper, UpBit: Scalable In-Memory Updatable Bitmap Indexing, re- searchers Manos Athanassoulis, Zheng Yan, and Stratos Idreos developed a new bitmap structure that improved the write per- formance of bitmaps without sacrificing read performance. The main differentiating point of UpBit is its use of an update bit vector for every value in the domain of an attribute that keeps track of updated values. This allows for faster write performance without sacrificing read performance. Based on this paper, we implemented UpBit and compared it to our implementation of update conscious bitmaps to compare and test the performances of both methods.

Usage

We used PyCharm to conduct our tests, /ucb, /upbit for algorithms, /tests for running testing scripts, and rest of the files for compression for memory usage improvement as well as creating and visualizing data.

Contact

Daniel Park - @h1yung - [email protected]

Acknowledgements

  • Original Paper
  • Winston Chen
  • Gregory Chininis
  • Daniel Hooks
  • Michael Lee
Owner
Hyeong Kyun (Daniel) Park
I like coding
Hyeong Kyun (Daniel) Park
Senator Trades Monitor

Senator Trades Monitor This monitor will grab the most recent trades by senators and send them as a webhook to discord. Installation To use the monito

Yousaf Cheema 5 Jun 11, 2022
A 2-dimensional physics engine written in Cairo

A 2-dimensional physics engine written in Cairo

Topology 38 Nov 16, 2022
sportsdataverse python package

sportsdataverse-py See CHANGELOG.md for details. The goal of sportsdataverse-py is to provide the community with a python package for working with spo

Saiem Gilani 37 Dec 27, 2022
Using Python to derive insights on particular Pokemon, Types, Generations, and Stats

Pokémon Analysis Andreas Nikolaidis February 2022 Introduction Exploratory Analysis Correlations & Descriptive Statistics Principal Component Analysis

Andreas 1 Feb 18, 2022
A columnar data container that can be compressed.

Unmaintained Package Notice Unfortunately, and due to lack of resources, the Blosc Development Team is unable to maintain this package anymore. During

944 Dec 09, 2022
A multi-platform GUI for bit-based analysis, processing, and visualization

A multi-platform GUI for bit-based analysis, processing, and visualization

Mahlet 529 Dec 19, 2022
Minimal working example of data acquisition with nidaqmx python API

Data Aquisition using NI-DAQmx python API Based on this project It is a minimal working example for data acquisition using the NI-DAQmx python API. It

Pablo 1 Nov 05, 2021
Tools for the analysis, simulation, and presentation of Lorentz TEM data.

ltempy ltempy is a set of tools for Lorentz TEM data analysis, simulation, and presentation. Features Single Image Transport of Intensity Equation (SI

McMorran Lab 1 Dec 26, 2022
University Challenge 2021 With Python

University Challenge 2021 This repository contains: The TeX file of the technical write-up describing the University / HYPER Challenge 2021 under late

2 Nov 27, 2021
CRISP: Critical Path Analysis of Microservice Traces

CRISP: Critical Path Analysis of Microservice Traces This repo contains code to compute and present critical path summary from Jaeger microservice tra

Uber Research 110 Jan 06, 2023
Convert monolithic Jupyter notebooks into Ploomber pipelines.

Soorgeon Join our community | Newsletter | Contact us | Blog | Website | YouTube Convert monolithic Jupyter notebooks into Ploomber pipelines. soorgeo

Ploomber 65 Dec 16, 2022
Snakemake workflow for converting FASTQ files to self-contained CRAM files with maximum lossless compression.

Snakemake workflow: name A Snakemake workflow for description Usage The usage of this workflow is described in the Snakemake Workflow Catalog. If

Algorithms for reproducible bioinformatics (Koesterlab) 1 Dec 16, 2021
The OHSDI OMOP Common Data Model allows for the systematic analysis of healthcare observational databases.

The OHSDI OMOP Common Data Model allows for the systematic analysis of healthcare observational databases.

Bell Eapen 14 Jan 02, 2023
Projects that implement various aspects of Data Engineering.

DATAWAREHOUSE ON AWS The purpose of this project is to build a datawarehouse to accomodate data of active user activity for music streaming applicatio

2 Oct 14, 2021
Transform-Invariant Non-Negative Matrix Factorization

Transform-Invariant Non-Negative Matrix Factorization A comprehensive Python package for Non-Negative Matrix Factorization (NMF) with a focus on learn

EMD Group 6 Jul 01, 2022
EOD Historical Data Python Library (Unofficial)

EOD Historical Data Python Library (Unofficial) https://eodhistoricaldata.com Installation python3 -m pip install eodhistoricaldata Note Demo API key

Michael Whittle 20 Dec 22, 2022
Binance Kline Data With Python

Binance Kline Data by seunghan(gingerthorp) reference https://github.com/binance/binance-public-data/ All intervals are supported: 1m, 3m, 5m, 15m, 30

shquant 5 Jul 13, 2022
Catalogue data - A Python Scripts to prepare catalogue data

catalogue_data Scripts to prepare catalogue data. Setup Clone this repo. Install

BigScience Workshop 3 Mar 03, 2022
Functional tensors for probabilistic programming

Funsor Funsor is a tensor-like library for functions and distributions. See Functional tensors for probabilistic programming for a system description.

208 Dec 29, 2022
A program that uses an API and a AI model to get info of sotcks

Stock-Market-AI-Analysis I dont mind anyone using this code but please give me credit A program that uses an API and a AI model to get info of stocks

1 Dec 17, 2021