LightCSV - This CSV reader is implemented in just pure Python.

Python 3.8 Python 3.9 Code style: black

Simple light CSV reader

This CSV reader is implemented in pure Python. It lets you specify a separator, a quote character, and column titles (or take the first row as titles). Nothing more, nothing less.

Usage

Usage is pretty straightforward:

from lightcsv import LightCSV

for row in LightCSV().read_file("myfile.csv"):
    print(row)

This opens a file named myfile.csv and iterates over it, returning each row as a key-value dictionary. Line endings can be either \n or \r\n. The file is opened in text mode with UTF-8 encoding.

You can also supply your own stream (i.e. an open file instead of a filename). This is useful, for example, when the file needs a different encoding:

from lightcsv import LightCSV

# Open the file yourself to control details such as the encoding.
with open("myfile.csv", encoding="latin-1") as f:
    for row in LightCSV().read(f):
        print(row)

NOTE: Blank lines at any point in the file are ignored.

Parameters

LightCSV can be parametrized during initialization to fine-tune its behaviour.

The following example shows initialization with default parameters:

from lightcsv import LightCSV

csv_reader = LightCSV(
    separator=",",
    quote_char='"',
    field_names=None,
    strict=True,
    has_headers=False,
)

Available settings:

  • separator: character used as separator (defaults to ,).
  • quote_char: character used to quote strings (defaults to ").
    This character can be escaped by doubling it.
  • field_names: any iterable or sequence of str (e.g. a list of strings).
    If set, these are used as column titles (dictionary keys) and also determine the expected number of columns.
  • strict: sets whether the parser runs in strict mode or not.
    In strict mode the parser raises a ValueError exception if a cell cannot be decoded or the number of columns doesn't match. In non-strict mode unrecognized cells are returned as strings. If there are more columns than expected, the extra ones are ignored. If there are fewer, the dictionary will contain fewer values as well.
  • has_headers: whether the first row should be taken as column titles or not.
    If set, field_names cannot be specified. If not set, and no field names are specified, the dictionary keys are just the column positions of the cells.
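
The interplay between field_names, strict, and positional keys can be sketched as follows. This is a minimal illustration of the rules listed above, not LightCSV's actual code: row_to_dict is a hypothetical helper, and the 0-based column positions are an assumption.

```python
from typing import Any, Dict, Hashable, Optional, Sequence


def row_to_dict(
    cells: Sequence[Any],
    field_names: Optional[Sequence[str]] = None,
    strict: bool = True,
) -> Dict[Hashable, Any]:
    """Hypothetical helper: pair one row's cells with its column titles."""
    if field_names is None:
        # No titles available: keys are the column positions
        # (0-based here, which is an assumption).
        return dict(enumerate(cells))

    if strict and len(cells) != len(field_names):
        raise ValueError(
            f"expected {len(field_names)} columns, got {len(cells)}"
        )

    # zip() stops at the shorter input, so extra cells are dropped and
    # missing cells simply leave their keys out of the dictionary.
    return dict(zip(field_names, cells))
```

With strict=False, a row that is too long loses its extra cells and a row that is too short yields a smaller dictionary, which mirrors the non-strict behaviour described in the list.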

Data types recognized

The parser tries to match the following types, in this order:

  • None (empty values). Unlike the standard csv reader, it returns None (null) for empty values.
    Empty strings ("") are recognized correctly as strings.
  • str (strings): anything that is quoted with quote_char. The default quote_char is ".
    If the string contains a quote, it must be escaped by doubling it, e.g. "HELLO ""WORLD""" decodes to the string HELLO "WORLD".
  • int (integers): an integer with an optional leading sign.
  • float: any float recognized by Python
  • datetime: a datetime in ISO format (with 'T' or whitespace in the middle), like 2022-02-02 22:02:02
  • date: a date in ISO format, like 2022-02-02
  • time: a time in ISO format, like 22:02:02

If all these parsing attempts fail, a string is returned, unless strict is set to True. In that case, a ValueError exception is raised.
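
The recognition order can be pictured as a cascade of converters in which the first one that accepts the cell wins. The sketch below uses only the standard library and is not LightCSV's implementation: parse_cell and _iso_datetime are hypothetical names, and relying on the fromisoformat() methods for the ISO formats is an assumption.

```python
from datetime import date, datetime, time
from typing import Any


def _iso_datetime(chunk: str) -> datetime:
    # datetime.fromisoformat() also accepts a bare date such as 2022-02-02,
    # so require something longer than a plain date before trying it.
    if len(chunk) <= 10:
        raise ValueError(chunk)
    return datetime.fromisoformat(chunk)


def parse_cell(chunk: str, strict: bool = True, quote_char: str = '"') -> Any:
    """Hypothetical stand-in for the type cascade described above."""
    if chunk == "":
        return None  # empty cell

    # Quoted string: strip the outer quotes and undo doubled quotes.
    if len(chunk) >= 2 and chunk[0] == quote_char and chunk[-1] == quote_char:
        return chunk[1:-1].replace(quote_char * 2, quote_char)

    # Try each converter in the documented order.
    converters = (
        int,
        float,
        _iso_datetime,
        date.fromisoformat,
        time.fromisoformat,
    )
    for convert in converters:
        try:
            return convert(chunk)
        except ValueError:
            continue

    if strict:
        raise ValueError(f"cannot decode cell: {chunk!r}")
    return chunk  # non-strict mode: hand the raw text back
```

Trying int before float keeps a cell such as 42 from turning into 42.0, and testing for quotes first means a quoted number stays a string.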

Implementing your own type recognizer

You can implement your own deserialization by subclassing LightCSV and overriding the method parse_obj().

For example, suppose we want to recognize hexadecimal integers in the format 0xNNN.... We can implement it this way:

import re
from lightcsv import LightCSV

RE_HEXA = re.compile(r'0[xX][0-9A-Fa-f]+$')  # matches 0xNNNN (hexadecimal digits only)


class CSVHexRecognizer(LightCSV):
    def parse_obj(self, lineno: int, chunk: str):
        if RE_HEXA.match(chunk):
            return int(chunk[2:], 16)
        
        return super().parse_obj(lineno, chunk)

As you can see, you only have to override parse_obj(). If your match fails, call the parent implementation through super().parse_obj() and return its result.


Why

Python's built-in csv module is a bit over-engineered for simple tasks, and one normally doesn't need all the bells and whistles. With LightCSV you just open a file and iterate over its rows.

Decoding empty cells as None is needed very often, and doing it with the standard csv module can be really cumbersome, because that module tries hard to cover many corner cases (if you need those, this tool might not be suitable for you).
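
For comparison, here is a small sketch of the manual post-processing that the standard csv module calls for when empty cells should become None. The sample data and column names are made up for illustration.

```python
import csv
import io

# Made-up sample data: the second record has an empty "age" cell.
data = "name,age\nAda,36\nBob,\n"

rows = []
for raw in csv.DictReader(io.StringIO(data)):
    # csv returns every cell as a string; empty cells arrive as "".
    row = {key: (value if value != "" else None) for key, value in raw.items()}
    # Numeric columns have to be converted by hand as well.
    if row["age"] is not None:
        row["age"] = int(row["age"])
    rows.append(row)

print(rows)
# → [{'name': 'Ada', 'age': 36}, {'name': 'Bob', 'age': None}]
```

Every column that needs a type other than str requires its own conversion step like the one for age, which is the bookkeeping LightCSV aims to take off your hands.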

Owner
Jose Rodriguez