A simple XLSX/CSV reader - to dictionary converter

Overview

Test Python package codecov Upload Python Package to PyPI PythonVersions Sourcery Black Snyk Downloads Twitter Follow

sheet2dict

A simple XLSX/CSV reader - to dictionary converter

Installing

To install the package from pip, first run:

python3 -m pip install --no-cache-dir sheet2dict

Required pip packages for sheet2doc: csv, openpyxl

Usage

This library has 2 main features: reading a spreadsheet files and converting them to array of python dictionaries.

- XLSX

Use xlsx_to_dict() method when converting form spreadsheets.
Supported file formats for spreadsheets are: .xlsx,.xlsm,.xltx,.xltm
Spreadsheets with multiple worksheets are supported. If no sheet is specified, the active sheet is selected. If there is only one sheet, it is considered active.

# Import the library
from sheet2dict import Worksheet

# Create an object
ws = Worksheet()

# Convert active sheet (without specifying sheet name)
ws.xlsx_to_dict(path='inventory.xlsx')

# Convert the 'Main Warehouse' sheet of the 'inventory.xslx' spreadsheet file.
ws.xlsx_to_dict(path='inventory.xlsx', select_sheet='Main Warehouse')

# object.header returns first row with the data in a spreadsheet 
print(ws.header)

# object.sheet_items returns converted rows as dictionaries in the array 
print(ws.sheet_items)

You can parse data when worksheet is an object

# Import the library
from sheet2dict import Worksheet

# Example: read spreadsheet as object
path = 'inventory.xlsx'
xlsx_file = open(path, 'rb')
xlsx_file = BytesIO(xlsx_file.read())

# Parse spreadsheet from object
ws = Worksheet()
ws.xlsx_to_dict(path=xlsx_file)
print(ws.header)

- CSV

Use csv_to_dict() method when converting form csv.
CSV is a format with many variations, better handle encodings and delimiters on user side and not within module itself.

# Import the library
from sheet2dict import Worksheet

# Create an object
ws = Worksheet()

# Read CSV file
csv_file = open('inventory.csv', 'r', encoding='utf-8-sig')

# Convert 
ws.csv_to_dict(csv_file=csv_file, delimiter=';')

# object.header returns first row with the data in a spreadsheet 
print(ws.header)

# object.sheet_items returns converted rows as dictionaries in the array 
print(ws.sheet_items)

- Other functions

Worksheet object.header returns first row with the data in a spreadsheet

>> ws.header {'country': 'SK', 'city': 'Bratislava', 'citizens': '400000', 'random_field': 'cc'}">
Python 3.9.1
[Clang 12.0.0 (clang-1200.0.32.28)] on darwin
>>> from sheet2dict import Worksheet
>>> ws = Worksheet()
>>> ws.xlsx_to_dict(path="inventory.xlsx")

>>> ws.header
{'country': 'SK', 'city': 'Bratislava', 'citizens': '400000', 'random_field': 'cc'}

Worksheet object.sanitize_sheet_items removes None or empty dictionary keys from sheet_items

>> ws.sheet_items [ {'country': 'CZ', 'city': 'Prague', 'citizens': '600000', None: '22', 'random_field': 'cc'}, {'country': 'UK', 'city': 'London', 'citizens': '2000000', None: '33', 'random_field': 'cc'} ] >>> ws.sanitize_sheet_items [ {'country': 'CZ', 'city': 'Prague', 'citizens': '600000', 'random_field': 'cc'}, {'country': 'UK', 'city': 'London', 'citizens': '2000000', 'random_field': 'cc'} ]">
>>> from sheet2dict import Worksheet
>>> ws = Worksheet()
>>> ws.xlsx_to_dict(path="inventory.xlsx")

>>> ws.sheet_items
[
  {'country': 'CZ', 'city': 'Prague', 'citizens': '600000', None: '22', 'random_field': 'cc'},
  {'country': 'UK', 'city': 'London', 'citizens': '2000000', None: '33', 'random_field': 'cc'}
]

>>> ws.sanitize_sheet_items
[
  {'country': 'CZ', 'city': 'Prague', 'citizens': '600000', 'random_field': 'cc'},
  {'country': 'UK', 'city': 'London', 'citizens': '2000000', 'random_field': 'cc'}
]

Contributing and Code of Conduct

Contributing to sheet2dict

As an open source project, sheet2dict welcomes contributions of many forms.
Please read and follow our Contributing to sheet2dict

Contributors:

  • Thanks to 白一百 (bái-yī-bǎi) for making sheet2dict work with multi-sheet Excel files.

Code of Conduct

As a contributor, you can help us keep the sheet2dict project open and inclusive.
Please read and follow our Code of Conduct

You might also like...
SCTYMN is a GitHub repository that includes some simple scripts(currently only python scripts) that can be useful.

Simple Codes That You Might Need SCTYMN is a GitHub repository that includes some simple scripts(currently only python scripts) that can be useful. In

Żmija is a simple universal code generation tool.

Żmija Żmija is a simple universal code generation tool. It is intended to be used as a means to generate code that is both efficient and easily mainta

 Simple yet powerful CAD (Computer Aided Design) library, written with Python.
Simple yet powerful CAD (Computer Aided Design) library, written with Python.

Py-MADCAD it's time to throw parametric softwares out ! Simple yet powerful CAD (Computer Aided Design) library, written with Python. Installation

A simple USI Shogi Engine written in python using python-shogi.

Revengeshogi My attempt at creating a USI Shogi Engine in python using python-shogi. Current State of Engine Currently only generating random moves us

A simple document management REST based API for collaboratively interacting with documents

documan_api A simple document management REST based API for collaboratively interacting with documents.

Django-Text-to-HTML-converter - The simple Text to HTML Converter using Django framework

Django-Text-to-HTML-converter This is the simple Text to HTML Converter using Dj

Single API for reading, manipulating and writing data in csv, ods, xls, xlsx and xlsm files

pyexcel - Let you focus on data, instead of file formats Support the project If your company has embedded pyexcel and its components into a revenue ge

A flask extension using pyexcel to read, manipulate and write data in different excel formats: csv, ods, xls, xlsx and xlsm.

Flask-Excel - Let you focus on data, instead of file formats Support the project If your company has embedded pyexcel and its components into a revenu

Scrapes mcc-mnc.com and outputs 3 files with the data (JSON, CSV & XLSX)

mcc-mnc.com-webscraper Scrapes mcc-mnc.com and outputs 3 files with the data (JSON, CSV & XLSX) A Python script for web scraping mcc-mnc.com Link: mcc

LightCSV - This CSV reader is implemented in just pure Python.

LightCSV Simple light CSV reader This CSV reader is implemented in just pure Python. It allows to specify a separator, a quote char and column titles

JSON and CSV data for Swahili dictionary with over 16600+ words

kamusi JSON and CSV data for swahili dictionary with over 16600+ words. This repo consists of data from swahili dictionary with about 16683 words toge

Sheet Data Image/PDF-to-CSV Converter

Sheet Data Image/PDF-to-CSV Converter

DB-Drive-CSV - This is app is can be used to access CSV file as JSON from Google Drive.

DB Drive CSV This is app is can be used to access CSV file as JSON from Google Drive. How To Use Create file/ upload file to Google Drive There's 2 fi

Very simple NCHW and NHWC conversion tool for ONNX. Change to the specified input order for each and every input OP. Also, change the channel order of RGB and BGR. Simple Channel Converter for ONNX.
Very simple NCHW and NHWC conversion tool for ONNX. Change to the specified input order for each and every input OP. Also, change the channel order of RGB and BGR. Simple Channel Converter for ONNX.

scc4onnx Very simple NCHW and NHWC conversion tool for ONNX. Change to the specified input order for each and every input OP. Also, change the channel

A Python module for creating Excel XLSX files.
A Python module for creating Excel XLSX files.

XlsxWriter XlsxWriter is a Python module for writing files in the Excel 2007+ XLSX file format. XlsxWriter can be used to write text, numbers, formula

A Python module for creating Excel XLSX files.
A Python module for creating Excel XLSX files.

XlsxWriter XlsxWriter is a Python module for writing files in the Excel 2007+ XLSX file format. XlsxWriter can be used to write text, numbers, formula

É uma API feita em Python e Flask que pesquisa informações em uma tabela .xlsx e retorna o resultado.
É uma API feita em Python e Flask que pesquisa informações em uma tabela .xlsx e retorna o resultado.

API de rastreamento de pacotes É uma API feita em Python e Flask que pesquisa informações de rastreamento de pacotes em uma tabela .xlsx e retorna o r

Transfers a image file(.png) to an Excel file(.xlsx)
Transfers a image file(.png) to an Excel file(.xlsx)

Transfers a image file(.png) to an Excel file(.xlsx)

An XLSX spreadsheet renderer for Django REST Framework.

drf-renderer-xlsx provides an XLSX renderer for Django REST Framework. It uses OpenPyXL to create the spreadsheet and returns the data.

Comments
  • Empty cells in xlsx become the non-empty string

    Empty cells in xlsx become the non-empty string "None" in the dictionary

    Thank you for writing this. I have recently moved from using csv files to excel files and this has made the transition easier.

    One issue I have encountered is that empty cells are not being returned as a Falsey value like None, or the empty string ''.

    From what I have been able to understand, openpyxl reads the empty cell as None, and the following code in sheet2dict uses str to convert this into the string 'None'.

    https://github.com/Pytlicek/sheet2dict/blob/93bf731d327d620755ffd5585aedb0b23a17a6a8/sheet2dict/main.py#L30-L34

    I have filtered my own dictionaries to turn 'None' into '' but that will cause issues when any of my sheet really do contain the word None.

    I just did some quick testing, and it seems that the csv.DictReader() in the python standard library uses the empty string '' for missing values in the a csv file, so that might be a good thing to do for these cases.

    enhancement 
    opened by JoshuaCapel 5
  • Code Proposal: Allow User to Specify Sheet (Work with multi-sheet Excel Files)

    Code Proposal: Allow User to Specify Sheet (Work with multi-sheet Excel Files)

    This library is awesome! I'm using it a lot since it makes working with Excel files very simple.

    I have a proposal: Allow the user to specify the sheet they want to modify.

    I write a lot of functional programming and I can't remember if it's possible to overwrite a class function at import, so instead I've modified the sheet2dict to return a specified sheet when openpyxl opens the file.

    in main.py:

    def xlsx_to_dict(self, path, select_sheet=None):
            """
            Read a Worksheet and return it as array of dictionaries
            :param self: Worksheet Object
            :param path: Path to XLSX file
            :return: Array of rows as dictionaries
            """
            book = load_workbook(path)
            if select_sheet == None:
                  sheet = book.active
            else:
                  sheet = book[select_sheet]
    

    The user would then read an Excel file like this:

    ws =Worksheet()
    ws.xlsx_to_dict(path=<insert path>, select_sheet='<sheetname>'
    

    I don't know if this is the most elegant way to select a sheet, but it works for me. I suppose to import all worksheets by default - I think the maximum is 32 - into a larger dictionary, <edit I didn't complete my thought>, that there could be a clever way to put dictionaries in dictionaries, but this might be too convoluted since I assume most users will just use one sheet. Another option is just to continue creating Worksheet() objects with different names and sheets selected.

    enhancement 
    opened by bai-yi-bai 3
  • Add an option to parse all worksheets in one function call

    Add an option to parse all worksheets in one function call

    Added an option to parse all sheets from a workbook at once while keeping the way existing features worked untouched.

    Had to modify sheet_items to a dict with sheet names as keys and sheet's objects as values.

    enhancement 
    opened by 68Kamil68 2
  • Update README.md

    Update README.md

    Changed "the first sheet" to "active sheet" to avoid confusion if someone saves a spreadsheet and the active worksheet is, fe. second in line. Also, what is considered an active worksheet when there is only one worksheet in the spreadsheet. Add an example with and without the "select_sheet" option.

    documentation 
    opened by Pytlicek 1
Releases(0.1.5)
Owner
Tomas Pytel
PyConSK Organizer, Python Developer & DevOps
Tomas Pytel
A repository of links with advice related to grad school applications, research, phd etc

A repository of links with advice related to grad school applications, research, phd etc

Shaily Bhatt 946 Dec 30, 2022
Fully reproducible, Dockerized, step-by-step, tutorial on how to mock a "real-time" Kafka data stream from a timestamped csv file. Detailed blog post published on Towards Data Science.

time-series-kafka-demo Mock stream producer for time series data using Kafka. I walk through this tutorial and others here on GitHub and on my Medium

Maria Patterson 26 Nov 15, 2022
LotteryBuyPredictionWebApp - Lottery Purchase Prediction Model

Lottery Purchase Prediction Model Objective and Goal Predict the lottery type th

Wanxuan Zhang 2 Feb 14, 2022
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

applied-ml Curated papers, articles, and blogs on data science & machine learning in production. ⚙️ Figuring out how to implement your ML project? Lea

Eugene Yan 22.1k Jan 03, 2023
[Unofficial] Python PEP in EPUB format

PEPs in EPUB format This is a unofficial repository where I stock all valid PEPs in the EPUB format. Repository Cloning git clone --recursive Mickaël Schoentgen 9 Oct 12, 2022

This repository outlines deploying a local Kubeflow v1.3 instance on microk8s and deploying a simple MNIST classifier using KFServing.

Zero to Inference with Kubeflow Getting Started This repository houses all of the tools, utilities, and example pipeline implementations for exploring

Ed Henry 3 May 18, 2022
Python Eacc is a minimalist but flexible Lexer/Parser tool in Python.

Python Eacc is a parsing tool it implements a flexible lexer and a straightforward approach to analyze documents.

Iury de oliveira gomes figueiredo 60 Nov 16, 2022
🐱‍🏍 A curated list of awesome things related to Hugo themes.

awesome-hugo-themes Automated deployment @ 2021-10-12 06:24:07 Asia/Shanghai &sorted=updated Theme Author License GitHub Stars Updated Blonde wamo MIT

13 Dec 12, 2022
Leetcode Practice

LeetCode Practice Description This is my LeetCode Practice. Visit LeetCode Website for detailed question description. The code in this repository has

Leo Hsieh 75 Dec 27, 2022
Paper and Code for "Curriculum Learning by Optimizing Learning Dynamics" (AISTATS 2021)

Curriculum Learning by Optimizing Learning Dynamics (DoCL) AISTATS 2021 paper: Title: Curriculum Learning by Optimizing Learning Dynamics [pdf] [appen

Tianyi Zhou 15 Dec 06, 2022
The project that powers MDN.

Kuma Kuma is the platform that powers MDN (developer.mozilla.org) Development Code: https://github.com/mdn/kuma Issues: P1 Bugs (to be fixed ASAP) P2

MDN Web Docs 1.9k Dec 26, 2022
the project for the most brutal and effective language learning technique

- "The project for the most brutal and effective language learning technique" (c) Alex Kay The langflow project was created especially for language le

Alexander Kaigorodov 7 Dec 26, 2021
This is the repository that includes the code material for the ESweek 2021 for the Education Class Lecture A3 "Learn to Drive (and Race!) Autonomous Vehicles"

ESweek2021_educationclassA3 This is the repository that includes the code material for the ESweek 2021 for the Education Class Lecture A3 "Learn to Dr

F1TENTH Autonomous Racing Community 29 Dec 06, 2022
A Sublime Text plugin to select a default syntax dialect

Default Syntax Chooser This Sublime Text 4 plugin provides the set_default_syntax_dialect command. This command manipulates a syntax file (e.g.: SQL.s

3 Jan 14, 2022
Example Python code for running the mango-explorer marketmaker

🥭 Mango Explorer 📖 Introduction This guide will show you how to load and run a customisable marketmaker that runs on Mango Markets using the mango-e

Blockworks Foundation 2 Apr 11, 2022
Material for the ros2 crash course

Material for the ros2 crash course

Emmanuel Dean 1 Jan 22, 2022
Tips for Writing a Research Paper using LaTeX

Tips for Writing a Research Paper using LaTeX

Guanying Chen 727 Dec 26, 2022
A set of Python libraries that assist in calling the SoftLayer API.

SoftLayer API Python Client This library provides a simple Python client to interact with SoftLayer's XML-RPC API. A command-line interface is also in

SoftLayer 155 Sep 20, 2022
Project created to help beginner programmers to study, despite the lack of internet!

Project created to help beginner programmers to study, despite the lack of internet!

Dev4Dev 2 Oct 25, 2021
Quilt is a self-organizing data hub for S3

Quilt is a self-organizing data hub Python Quick start, tutorials If you have Python and an S3 bucket, you're ready to create versioned datasets with

Quilt Data 1.2k Dec 30, 2022