My own Unicode compression algorithm

Overview

Zee Code

ZCode is a custom compression algorithm I originally developed for a competition held for the Spring 2019 Datastructures and Algorithms course of Dr. Mahdi Safarnejad-Boroujeni at Sharif University of Technology, at which I became first-place. The code is pretty slow and has a lot of room for optimization, but it is pretty readable. It can be an excellent educational resource for whoever is starting on compression algorithms.

The algorithm is a cocktail of classical compression algorithms mixed and served for Unicode documents. It hinges around the LZW algorithm to create a finite size symbol dictionary; the results are then byte-coded into variable-length custom symbols, which I call zee codes! Finally, the symbol table is truncated accordingly, and the compressed document is encoded into a byte stream.

Huffman trees highly inspire zee codes, but because in normal texts, symbols are usually much more uniformly distributed than the original geometrical (or exponential) distribution assumption for effective Huffman coding, the gains of using variable-sized byte-codes both from an implementation and performance perspective outweighed bit Huffman encodings. Results may vary, but my tests showed a steady ~4-5x compression ratio on Farsi texts, which is pretty nice!

Installation

ZCode is available on pip, and only requires a 3.6 or higher python installation beforehand.

pip install -U zcode

Usage

You can run the algorithm for any utf-8 encoded file using the zcode command. It will automatically decompress files ending with a .zee extensions and compress others into .zee files, but you can always override the default behavior by providing optional arguments like:

zcode INPUTFILE [--output OUTPUT_FILE --action compress/decompress --symbol-size SYMBOL_SIZE --code-size CODE_SIZE]

The symbol-size argument controls the algorithms' buffer size for processing symbols (in bytes). It is automatically set depending on your input file size but you can change it as you wish. code-size controls the maximum length for coded bytes while encoding symbols (this equals to 2 by default and needs to be provided to the algorithm upon decompression).

LICENSE

MIT LICENSE, see vahidzee/zcode/LICENSE

You might also like...
Search algorithm implementations meant for teaching

Search-py A collection of search algorithms for teaching and experimenting. Non-adversarial Search There’s a heavy separation of concerns which leads

8-puzzle-solver with UCS, ILS, IDA* algorithm
8-puzzle-solver with UCS, ILS, IDA* algorithm

Eight Puzzle 8-puzzle-solver with UCS, ILS, IDA* algorithm pre-usage requirements python3 python3-pip virtualenv prepare enviroment virtualenv -p pyth

A Python Package for Portfolio Optimization using the Critical Line Algorithm

A Python Package for Portfolio Optimization using the Critical Line Algorithm

🧬 Training the car to do self-parking using a genetic algorithm
🧬 Training the car to do self-parking using a genetic algorithm

🧬 Training the car to do self-parking using a genetic algorithm

A raw implementation of the nearest insertion algorithm to resolve TSP problems in a TXT format.

TSP-Nearest-Insertion A raw implementation of the nearest insertion algorithm to resolve TSP problems in a TXT format. Instructions Load a txt file wi

A GUI visualization of QuickSort algorithm
A GUI visualization of QuickSort algorithm

QQuickSort A simple GUI visualization of QuickSort algorithm. It only uses PySide6, it does not have any other external dependency. How to run Install

A python implementation of the Basic Photometric Stereo Algorithm
A python implementation of the Basic Photometric Stereo Algorithm

Photometric-Stereo A python implementation of the Basic Photometric Stereo Algorithm Result Usage run Photometric_Stereo.py Code Tree |data #原始数据,tga格

Python package to monitor the power consumption of any algorithm

CarbonAI This project aims at creating a python package that allows you to monitor the power consumption of any python function. Documentation The com

marching rectangles algorithm in python with clean code.

Marching Rectangles marching rectangles algorithm in python with clean code. Tools Python 3 EasyDraw Creators Mohammad Dori Run the Code Installation

Releases(v0.0.1)
Owner
Vahid Zehtab
Applied & Theoretical Deep Learning Researcher currently studying my last semester of Compute Engineering Bachelors program at Sharif University of Technology
Vahid Zehtab
implementation of the KNN algorithm on crab biometrics dataset for CS16

crab-knn implementation of the KNN algorithm in Python applied to biometrics data of purple rock crabs (leptograpsus variegatus) to classify the sex o

Andrew W. Chen 1 Nov 18, 2021
Exact algorithm for computing two-sided statistical tolerance intervals under a normal distribution assumption using Python.

norm-tol-int Exact algorithm for computing two-sided statistical tolerance intervals under a normal distribution assumption using Python. Methods The

Jed Ludlow 1 Jan 06, 2022
It is a platform that implements some path planning algorithms.

PathPlanningAlgorithms It is a platform that implements some path planning algorithms. Main dependence: python3.7, opencv4.1.1.26 (for image show) Tip

5 Feb 24, 2022
Data Model built using Logistic Regression Algorithm on Python.

Logistic-Regression Problem Statement: Your client is a retail banking institution. Term deposits are a major source of income for a bank. A term depo

Hemanth Babu Muthineni 0 Dec 25, 2021
A collection of Python Scripts made for fun, while exploring Python 🐍

JFF-Python-Scripts A collection of Python Scripts made for fun, while exploring Python 🐍 Inspiration 💡 Many of the programs in this repository are i

Pushkar Patel 16 Oct 07, 2022
The test data, code and detailed description of the AW t-SNE algorithm

AW-t-SNE The test data, code and result of the AW t-SNE algorithm Structure of the folder Datasets: This folder contains two datasets, the MNIST datas

1 Mar 09, 2022
A raw implementation of the nearest insertion algorithm to resolve TSP problems in a TXT format.

TSP-Nearest-Insertion A raw implementation of the nearest insertion algorithm to resolve TSP problems in a TXT format. Instructions Load a txt file wi

sjas_Phantom 1 Dec 02, 2021
Implementation of Apriori Algorithm for Association Analysis

Implementation of Apriori Algorithm for Association Analysis

3 Nov 14, 2021
Python based framework providing a simple and intuitive framework for algorithmic trading

Harvest is a Python based framework providing a simple and intuitive framework for algorithmic trading. Visit Harvest's website for details, tutorials

100 Jan 03, 2023
Machine Learning algorithms implementation.

Machine Learning Algorithms Machine Learning algorithms implementation. What can I find here? ML Algorithms KNN K-Means-Clustering SVM (MultiClass) Pe

David Levin 1 Dec 10, 2021
N Queen Problem using Genetic Algorithm

The N Queen is the problem of placing N chess queens on an N×N chessboard so that no two queens attack each other.

Mahdi Hassanzadeh 2 Nov 11, 2022
With this algorithm you can see all best positions for a Team.

Best Positions Imagine that you have a favorite team, and you want to know until wich position your team can reach With this algorithm you can see all

darlyn 4 Jan 28, 2022
Primedice like provably fair algorithm

Primedice like provably fair algorithm

Ryu juheon 3 Dec 02, 2022
A Python implementation of Jerome Friedman's Multivariate Adaptive Regression Splines

py-earth A Python implementation of Jerome Friedman's Multivariate Adaptive Regression Splines algorithm, in the style of scikit-learn. The py-earth p

431 Dec 15, 2022
Python implementation of Aho-Corasick algorithm for string searching

Python implementation of Aho-Corasick algorithm for string searching

Daniel O'Sullivan 1 Dec 31, 2021
Genius Square puzzle solver in Python

Genius Square puzzle solver in Python

James 3 Dec 15, 2022
So far implements A* will add more later

Pathfinding_Visualization Finds the shortest path between two nodes. The light blue path is the shortest path. The black nodes are barriers. Created i

Lukas DeLoach 1 Jan 18, 2022
Policy Gradient Algorithms (One Step Actor Critic & PPO) from scratch using Numpy

Policy Gradient Algorithms From Scratch (NumPy) This repository showcases two policy gradient algorithms (One Step Actor Critic and Proximal Policy Op

1 Jan 17, 2022
Solving a card game with three search algorithms: BFS, IDS, and A*

Search Algorithms Overview In this project, we want to solve a card game with three search algorithms. In this card game, we have to sort our cards by

Korosh 5 Aug 04, 2022
Implementation of core NuPIC algorithms in C++

NuPIC Core This repository contains the C++ source code for the Numenta Platform for Intelligent Computing (NuPIC)

Numenta 270 Nov 19, 2022