A lossless neural compression framework built on top of JAX.

Overview

Kompressor

Install

setup.py assumes that compatible versions of JAX and JAXLib are already installed. The automated build is tested in a cuda:11.1-cudnn8-runtime-ubuntu20.04 environment with jaxlib==0.1.76+cuda11.cudnn82.

git clone https://github.com/rosalindfranklininstitute/kompressor.git
cd kompressor
pip install -e .

# Run tests
python -m pytest --cov=src/kompressor tests/

Install & Run through Docker environment

A Docker image with the Kompressor dependencies is provided on Quay.io as quay.io/rosalindfranklininstitute/kompressor:main.

# Run the container for the Kompressor environment
docker run --rm quay.io/rosalindfranklininstitute/kompressor:main \
    python -m pytest --cov=/usr/local/kompressor/src/kompressor /usr/local/kompressor/tests

Install & Run through Singularity environment

A Singularity image with the Kompressor dependencies is provided on cloud.sylabs.io as rosalindfranklininstitute/kompressor/kompressor:main.

singularity pull library://rosalindfranklininstitute/kompressor/kompressor:main
singularity run kompressor_main.sif \
    python -m pytest --cov=/usr/local/kompressor/src/kompressor /usr/local/kompressor/tests
Comments
  • Refactor map tuples to dicts

    Closes #14. Functions which currently return an ordered tuple of maps (lrmap, udmap, cmap, ...) now return keyed dictionaries { 'lrmap': lrmap, 'udmap': udmap, 'cmap': cmap, ... } so that order/usage is explicitly enforced.

    List comprehensions over the tuples now use jax.tree_map and jax.tree_multimap to ensure key safety.

    @GMW99, this will break the current implementation of the Metrics Callback class which iterates over a zip of the hardcoded map names and the maps tuple. This iteration can be replaced by iterating over maps.items() since it is now a dict already.
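A minimal sketch of the dict-based pattern described above. The function `encode_maps_sketch` and its values are hypothetical stand-ins, not Kompressor's API. It uses `jax.tree_util.tree_map`, which accepts several trees at once; in newer JAX releases this replaces `jax.tree_multimap`.

```python
import jax
import jax.numpy as jnp


def encode_maps_sketch(x):
    """Hypothetical stand-in for a function that used to return a tuple of maps."""
    return {'lrmap': x + 1, 'udmap': x + 2, 'cmap': x + 3}


maps = encode_maps_sketch(jnp.zeros((2, 2), dtype=jnp.int32))

# Apply one function to every map; the keys are carried through unchanged.
doubled = jax.tree_util.tree_map(lambda m: m * 2, maps)

# Combine two dicts leaf by leaf; a key mismatch raises instead of silently
# pairing the wrong maps, which is the key safety the refactor is after.
summed = jax.tree_util.tree_map(lambda a, b: a + b, maps, doubled)

# A metrics callback can iterate over names and maps directly.
means = {name: float(m.mean()) for name, m in maps.items()}
```

Because the names travel with the data, a callback no longer needs a hardcoded list of map names kept in sync with the tuple order.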

    enhancement 
    opened by JossWhittle 1
  • Ensure jax.jit static_argnums is refactored to static_argnames

    Functions that currently mark static_argnums=(0, 1, 2) should be updated to use the safer static_argnames=('tom', 'dick', 'harry') that is now available.
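As an illustration of the change, here is a small sketch using `static_argnames`. The function `downsample_sketch` and its parameters `levels` and `padding` are hypothetical and chosen only because both must be known at trace time.

```python
from functools import partial

import jax
import jax.numpy as jnp


# Before: @partial(jax.jit, static_argnums=(1, 2)) breaks silently if the
# signature is reordered. Naming the static arguments avoids that.
@partial(jax.jit, static_argnames=('levels', 'padding'))
def downsample_sketch(x, levels, padding=0):
    # The pad width must be a concrete Python value, hence static.
    x = jnp.pad(x, padding)
    # A Python loop over `levels` is unrolled at trace time, hence static.
    for _ in range(levels):
        x = x[::2]
    return x


out = downsample_sketch(jnp.arange(8), levels=2, padding=1)
```

Each distinct combination of static values triggers a fresh compilation, so static arguments are best kept to a small set of configurations.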

    enhancement high priority 
    opened by JossWhittle 1
  • Update development examples

    • Split the Docker image into a JAX base image and a Kompressor dependency-and-install image
    • Build JAX from source in the JAX image to ensure correct CUDA / cuDNN versions
    • Adjust setup.py to install dependencies from requirement.txt
    • Refactor how submodules are imported within the kom.image submodule (need to check that volumes matches)
    • Add kom.image.data submodule for dealing with TensorFlow data pipelines
    • Fix pooling in the total variation losses (used as metrics in the example notebooks)
    • Move all the encoding/decoding functions for the maps into a kom.mapping submodule
    • Add within-k and run-length metrics to kom.image.metrics for the example notebooks
    • Add example notebooks for interacting with the maps and training a basic Haiku compression model
    feature 
    opened by JossWhittle 0
  • Add mapping encode/decode functions for float32 data

    Will need a bit of thinking to get right. We probably need to consider similar tricks that we used for applying Radix Sort on float32 data to make the compression numerically stable and portable between machines.
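One widely used radix-sort trick for float32, shown here as an assumption about what could apply rather than as the project's method, is to reinterpret the IEEE 754 bits as an unsigned integer and flip bits so that integer order matches float order. The mapping is exact and reversible, so it does not depend on machine-specific floating-point behaviour. The function names are hypothetical.

```python
import struct


def float32_to_ordered_uint32(value):
    """Map a float32 to a uint32 key whose integer order matches float order."""
    bits = struct.unpack('<I', struct.pack('<f', value))[0]
    if bits & 0x80000000:
        # Negative values: flip every bit so larger magnitudes sort first.
        return bits ^ 0xFFFFFFFF
    # Non-negative values: set the sign bit so they sort after all negatives.
    return bits ^ 0x80000000


def ordered_uint32_to_float32(key):
    """Invert float32_to_ordered_uint32 exactly."""
    if key & 0x80000000:
        bits = key ^ 0x80000000
    else:
        bits = key ^ 0xFFFFFFFF
    return struct.unpack('<f', struct.pack('<I', bits))[0]


values = [-2.5, -0.0, 0.0, 1.0, 3.5]
keys = [float32_to_ordered_uint32(v) for v in values]
restored = [ordered_uint32_to_float32(k) for k in keys]
```

Integer keys of this kind could then be fed to integer-based map encoders; NaN handling is left out of this sketch.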

    enhancement low priority 
    opened by JossWhittle 0
  • Add mapping encode/decode functions for uint32 data

    Some of our data is uint32 volumes.

    Will need to trace through the full compression implementation and make sure intermediate value dtypes are large enough to avoid uint32 overflow when needed.
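The overflow concern can be seen in a small NumPy sketch with made-up prediction and target values. Subtracting in uint32 wraps around, while widening to int64 first keeps the signed residual. Note that JAX defaults to 32-bit types, so an int64 intermediate there would require enabling the `jax_enable_x64` option.

```python
import numpy as np

# Illustrative values at the extremes of the uint32 range.
pred = np.array([4294967295, 0], dtype=np.uint32)
true = np.array([0, 4294967295], dtype=np.uint32)

# Same-dtype subtraction wraps modulo 2**32.
wrapped = true - pred

# Widening first preserves the full signed residual.
wide = true.astype(np.int64) - pred.astype(np.int64)

# Decoding from the widened residual recovers the original uint32 values.
decoded = (pred.astype(np.int64) + wide).astype(np.uint32)
```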

    enhancement low priority 
    opened by JossWhittle 0
  • Modify core encode decode functions to pass a dict to the prediction function

    Currently the lowres inputs are passed directly to the prediction_fn as the only input.

    • Modify to accept a dict that has at least one key for the lowres input.

    • Provide a boolean flag to also pass a positional encoding tensor alongside the lowres input, which the model can use if needed.

    • Chunked encode decode will need to generate the correct chunks of the positional encoding for the current chunk.

    • Model can choose how to use positional encodings.

      • Image case would receive (B, H, W, 2) tensor containing the Y and X coordinates of each pixel in the trailing axis.
      • Volume case would receive (B, D, H, W, 3) tensor containing the Z, Y, and X coordinates of each voxel in the trailing axis.
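A rough sketch of how such an input dict might be assembled. The key names `'lowres'` and `'positions'`, the flag `with_positions`, and both function names are assumptions made for illustration; the issue does not fix any of them.

```python
import jax.numpy as jnp


def positional_encoding_sketch(batch, spatial_shape):
    """Integer coordinates per element, shape (B, *spatial_shape, ndim)."""
    axes = [jnp.arange(n) for n in spatial_shape]
    # 'ij' indexing yields (Y, X) for images and (Z, Y, X) for volumes.
    grid = jnp.stack(jnp.meshgrid(*axes, indexing='ij'), axis=-1)
    return jnp.broadcast_to(grid, (batch, *grid.shape))


def build_inputs_sketch(lowres, with_positions=False):
    """Wrap the lowres tensor (B, ..., C) in a dict for the prediction function."""
    inputs = {'lowres': lowres}
    if with_positions:
        inputs['positions'] = positional_encoding_sketch(
            lowres.shape[0], lowres.shape[1:-1])
    return inputs


image_inputs = build_inputs_sketch(jnp.zeros((2, 4, 5, 1)), with_positions=True)
volume_positions = positional_encoding_sketch(1, (2, 3, 4))
```

For chunked encode and decode, one option would be to build the full grid once and slice it with the same spatial indices as the current chunk, so each chunk sees its global coordinates.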
    enhancement high priority 
    opened by JossWhittle 0
  • Look at decompressing sliced chunks

    Decompress sliced chunk of image or volume without needing to decompress the entire data element.

    • May require applying secondary compression in blocks to avoid needing to decompress the full level maps, only to apply the predictor to the target slice.

    • Instead unpack just the blocks needed for the slice then trim.

    • A kompressor (or a stack of them) trained to secondary compress the maps from the primary kompressor (or stack) would be able to naturally handle slice chunked decoding.

      • Could such a secondary compressor be shared between levels? Between multiple kompressors in the primary stack?
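The "unpack just the blocks needed, then trim" idea can be sketched along one axis as plain index arithmetic. The block size and function names are hypothetical; how the blocks themselves are stored and decoded is left open.

```python
def blocks_for_slice(start, stop, block_size):
    """Indices of the fixed-size blocks that overlap the half-open slice [start, stop)."""
    first = start // block_size
    last = (stop - 1) // block_size
    return range(first, last + 1)


def trim_offsets(start, stop, block_size):
    """Offsets of the requested slice within the concatenated unpacked blocks."""
    first = start // block_size
    offset = start - first * block_size
    return offset, offset + (stop - start)


needed = list(blocks_for_slice(130, 300, 128))
trim = trim_offsets(130, 300, 128)
```

For an image or volume the same calculation would be applied independently per axis, and only the resulting blocks would need secondary decompression.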
    experiment low priority 
    opened by JossWhittle 0
  • Look at compressing timeseries data

    • Experiment with implementing the 1D case for compressing signals.
    • Video as sequence of 2D frames using the 3D volume code directly.
    • Look at compressing within timestep using information from neighbouring timesteps without actually compressing (dropping frames) the temporal axis.
    experiment low priority 
    opened by JossWhittle 0
Releases(v0.0.0)
Owner
Rosalind Franklin Institute
The Rosalind Franklin Institute is dedicated to transforming life science through interdisciplinary research and technology development