ZipFly is a zip archive generator based on zipfile.py

Overview

Build Status GitHub release (latest by date) Downloads

Buzon - ZipFly

ZipFly is a zip archive generator based on zipfile.py. It was created by Buzon.io to generate very large ZIP archives for immediate sending out to clients, or for writing large ZIP archives without memory inflation.

Requirements

Python 3.6+

Install

pip3 install zipfly

Basic usage, compress on-the-fly during writes

Using this library will save you from having to write the Zip to disk. Some data will be buffered by the zipfile deflater, but memory inflation is going to be very constrained. Data will be written to destination by default at regular 32KB intervals.

ZipFly defaults attributes:

  • paths: [ ]
  • mode: (write) w
  • chunksize: (bytes) 32768
  • compression: Stored
  • allowZip64: True
  • compresslevel: None
  • storesize: (bytes) 0
  • encode: utf-8

paths list of dictionaries:

.
fs Should be the path to a file on the filesystem
n (Optional) Is the name which it will have within the archive
(by default, this will be the same as fs)

    import zipfly

    paths = [
        {
            'fs': '/path/to/large/file'
        },
    ]

    zfly = zipfly.ZipFly(paths = paths)

    generator = zfly.generator()
    print (generator)
    # 
   


    with open("large.zip", "wb") as f:
        for i in generator:
            f.write(i)

Examples

Streaming multiple files in a zip with Django or Flask Send forth large files to clients with the most popular frameworks

Create paths Easy way to create the array paths from a parent folder.

Predict the size of the zip file before creating it Use the BufferPredictionSize to compute the correct size of the resulting archive before creating it.

Streaming a large file Efficient way to read a single very large binary file in python

Set a comment Your own comment in the zip file

Maintainer

Santiago Debus (@santiagodebus.com)

License

This library was created by Buzon.io and is released under the MIT. Copyright 2021 Cardallot, Inc.

Comments
  • Cannot include empty folders

    Cannot include empty folders

    Python's zipfile module supports writing empty folders to zip archive.

    when trying with zipfly i get:

    File "/.../zipfly.py", line 211, in generator
        with open( path[self.filesystem], 'rb' ) as e:
    IsADirectoryError: [Errno 21] Is a directory: '/path/to/dir'
    

    Is there an undocumented way for writing empty folders?

    opened by rnixx 2
  • Is compression possible?

    Is compression possible?

    Hi,

    I see that only uncompressed ZIP files are can be created with zipfly. What is the reason for that? I don't see any compression level specific code in ZipFly.generator, except file size estimation (which doesn't work for ZIP64 achives).

    So, is compression just an unimplemented feature? Or are there any aspects which don't allow using compression for streaming made this way?

    I understand that compression is often useless in streaming scenarios but it's not true in our case. We're looking for a replacement for zipstream-new which seems to be dead. ZipFly looks nice for us...

    opened by dezhin 2
  • ASGI breaks functionability in HttpStreamingResponse

    ASGI breaks functionability in HttpStreamingResponse

    Running Django 4.1 in async ASGI mode (channels & websockets needed)

    Zipfly functionability breaks as it tries to compile the entire zip to memory before sending zip to client, filling up memory and causing a crash

    opened by T-101 1
  • Release master branch to pypi

    Release master branch to pypi

    It looks like the last version on PyPI has been published 2 years ago and there have been many changes in the code since.

    It would be helpful if you could release the latest changes.

    PS since the project is already using GH action for testing, it may be helpful to set up GH actions for release as well.

    https://packaging.python.org/en/latest/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/

    https://github.com/marketplace/actions/pypi-publish

    opened by 1oglop1 1
  • Stream into stream

    Stream into stream

    Hi, thank you for the inspirational code.

    I'm wondering if it's possible to adapt your code and make it compatible with S3 storage.

    I use django-storages and boto3 client returns a streaming body (already open fp) and all metadata.

    I need a zipfile (or any other archive) to be created from the set of files during the GET request (not great but it has to be).

    So to save the memory I have 2 options.

    1. Use zipfly as is and download files into a temporary location (and remove it after the operation)
    2. Or better solution that doesn't require intermediate storage so I could pipe the content of streaming body into zipfly and return that as a streaming response.

    streaming body (many of them) -> zipfly(zip file) -> streaming response

    Do you think that option two is possible? If so, could you please point out what pieces of code I should focus on to adapt zipfly?

    Thank you

    opened by 1oglop1 1
  • Add functionality to provide file streams and zip them on the fly

    Add functionality to provide file streams and zip them on the fly

    It is a common use case to store large files remotely and not next to the code. See e.g. for django https://github.com/jschneier/django-storages. If many (large) files must be shipped synchronuosly as a zip file, it saves memory and storage to pass them through the web worker as a stream without saving anything to disk. To archive that, file buffers as input may be supported.

    I would appreciate if we could add this functionality. This pull request contains a tested initial attempt. Feel free to improve!

    This pull request is based on https://github.com/BuzonIO/zipfly/pull/75 and https://github.com/BuzonIO/zipfly/pull/76

    opened by beckedorf 0
  • Using bytes instead of actual files

    Using bytes instead of actual files

    Hi, thanks for the great work. I have some bytes generators and I want to make a zip on the fly out of them. Is there an option to do this? Sorry for my poor python knowledge. Thanks :)

    opened by TaToTanWeb 2
Owner
Buzon
Buzon.io is a Cardallot Inc. service
Buzon
A Python script to backup your favorite Discord gifs

About the project Discord recently felt like it would be a good idea to limit the favorites to 250, which made me lose most of my gifs... Luckily for

4 Aug 03, 2022
Python Fstab Generator is a small Python script to write and generate /etc/fstab files based on yaml file on Unix-like systems.

PyFstab Generator PyFstab Generator is a small Python script to write and generate /etc/fstab files based on yaml file on Unix-like systems. NOTE : Th

Mahdi 2 Nov 09, 2021
Singer is an open source standard for moving data between databases, web APIs, files, queues, and just about anything else you can think of.

Singer is an open source standard for moving data between databases, web APIs, files, queues, and just about anything else you can think of. Th

Singer 1.1k Jan 05, 2023
Python library for reading and writing tabular data via streams.

tabulator-py A library for reading and writing tabular data (csv/xls/json/etc). [Important Notice] We have released Frictionless Framework. This frame

Frictionless Data 231 Dec 09, 2022
Python Sreamlit Duplicate Records Finder Remover

Python-Sreamlit-Duplicate-Records-Finder-Remover Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom w

RONALD KANYEPI 1 Jan 21, 2022
Read and write TIFF files

Read and write TIFF files Tifffile is a Python library to store numpy arrays in TIFF (Tagged Image File Format) files, and read image and metadata fro

Christoph Gohlke 346 Dec 18, 2022
Pure Python tools for reading and writing all TIFF IFDs, sub-IFDs, and tags.

Tiff Tools Pure Python tools for reading and writing all TIFF IFDs, sub-IFDs, and tags. Developed by Kitware, Inc. with funding from The National Canc

Digital Slide Archive 32 Dec 14, 2022
This is just a GUI that detects your file's real extension using the filetype module.

Real-file.extnsn This is just a GUI that detects your file's real extension using the filetype module. Requirements Python 3.4 and above filetype modu

1 Aug 08, 2021
File support for asyncio

aiofiles: file support for asyncio aiofiles is an Apache2 licensed library, written in Python, for handling local disk files in asyncio applications.

Tin Tvrtković 2.1k Jan 01, 2023
A simple tool to find and replace all the matches of a regular expression in file(s).

FindREp A simple tool to find and replace all the matches of a regular expression in file(s). You can either select the file(s) directly or select a f

Biraj 5 Oct 18, 2022
Search for files under the specified directory. Extract the file name and file path and import them as data.

Search for files under the specified directory. Extract the file name and file path and import them as data. Based on that, search for the file, select it and open it.

G-jon FujiYama 2 Jan 10, 2022
csv2ir is a script to convert ir .csv files to .ir files for the flipper.

csv2ir csv2ir is a script to convert ir .csv files to .ir files for the flipper. For a repo of .ir files, please see https://github.com/logickworkshop

Alex 38 Dec 31, 2022
Sheet Data Image/PDF-to-CSV Converter

Sheet Data Image/PDF-to-CSV Converter

Quy Truong 5 Nov 22, 2021
Fast Python reader and editor for ASAM MDF / MF4 (Measurement Data Format) files

asammdf is a fast parser and editor for ASAM (Association for Standardization of Automation and Measuring Systems) MDF (Measurement Data Format) files

Daniel Hrisca 440 Dec 31, 2022
Quick and dirty FAT12 filesystem to ZIP file converter

Quick and Dirty FAT12 Filesystem Converter This is a really crappy Python script I wrote to convert a semi-compatible FAT12 filesystem from my HP150's

Tube Time 2 Feb 12, 2022
This is a junk file creator tool which creates junk files in Internal Storage

This is a junk file creator tool which creates junk files in Internal Storage

KiLL3R_xRO 3 Jun 20, 2021
Dragon Age: Origins toolset to extract/build .erf files, patch language-specific .dlg files, and view the contents of files in the ERF or GFF format

DAOTools This is a set of tools for Dragon Age: Origins modding. It can patch the text lines of .dlg files, extract and build an .erf file, and view t

8 Dec 06, 2022
CleverCSV is a Python package for handling messy CSV files.

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line applicati

The Alan Turing Institute 1k Dec 19, 2022
Organize the files into the relevant sub-folders

This program can be used to organize files in a directory by their file extension. And move duplicate files to a duplicates folder.

Thushara Thiwanka 2 Dec 15, 2021
FileGenerator - File Generator for sites that accepts documents

File Generator for sites that accepts documents This code generates files as per

Shaunak 2 Mar 19, 2022