A common, beautiful interface to tabular data, no matter the format

Overview

rows

Join the chat at https://gitter.im/turicas/rows. License: LGPLv3.

No matter which format your tabular data is in, rows will import it, automatically detect types and give you high-level Python objects so you can start working with the data instead of trying to parse it. It is also locale- and unicode-aware. :)

Want to learn more? Read the documentation (or build and browse the docs locally by running make docs-serve after installing requirements-development.txt).

Installation

The easiest way to get your hands dirty is to install rows using pip.

PyPI

pip install rows

For other ways to install, refer to the Installation section of the documentation.

Contribution start guide

The preferred way to start contributing to the project is to create a virtualenv (you can do so using virtualenv, virtualenvwrapper, pyenv or whatever tool you'd like).

Create the virtualenv:

mkvirtualenv rows

Install all plugins' dependencies:

pip install --editable .[all]

Install development dependencies:

pip install -r requirements-development.txt
Comments
  • OverflowError

    After installing the dependencies required for the socios-brasil package, I get the error below when trying to extract the dump as instructed:

    Traceback (most recent call last):
      File "extract_dump.py", line 27, in <module>
        import rows
      File "C:\Users\milcent\AppData\Local\Continuum\Anaconda3\lib\site-packages\rows\__init__.py", line 22, in <module>
        import rows.plugins as plugins
      File "C:\Users\milcent\AppData\Local\Continuum\Anaconda3\lib\site-packages\rows\plugins\__init__.py", line 20, in <module>
        from . import plugin_csv as csv # NOQA
      File "C:\Users\milcent\AppData\Local\Continuum\Anaconda3\lib\site-packages\rows\plugins\plugin_csv.py", line 34, in <module>
        unicodecsv.field_size_limit(sys.maxsize)
    OverflowError: Python int too large to convert to C long
    

    Running on Windows 7, Anaconda 64-bit, Python 3.6. Thanks, Marcel Milcent

    opened by milcent 13
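
The traceback in this report comes from passing sys.maxsize to field_size_limit. On 64-bit Windows a C long is only 32 bits wide, so the value does not fit. A common workaround is to back off to a smaller limit until the call succeeds. The sketch below uses the standard library csv module rather than unicodecsv, and it is not necessarily the fix adopted by rows. The helper name set_max_field_size is hypothetical.

```python
import csv
import sys


def set_max_field_size(start=sys.maxsize):
    """Raise the csv field size limit as far as the platform allows.

    Hypothetical helper: halves the candidate limit until
    csv.field_size_limit() accepts it, then returns the value used.
    """
    limit = start
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # The value does not fit in a C long on this platform
            # (e.g. 64-bit Windows, where long is 32 bits): try smaller.
            limit //= 2


applied = set_max_field_size()
print("csv field size limit set to", applied)
```

On platforms where sys.maxsize already fits in a C long, the first attempt succeeds and the loop exits immediately.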
  • PDF Plugin

    Create an algorithm to automatically extract tables from PDFs (available in text format). Could use pdftables, but the code is not up-to-date, does not work with Python3 etc.

    enhancement plugin 
    opened by turicas 7
  • Convert PDF to TXT

    Good morning, I am trying to convert a scanned PDF file to text (the PDF contains tables). I managed to install the rows library and the rows[pdf] and rows[cli] dependencies. When I run this at the command prompt: rows pdf-to-text teste.pdf result.txt I get the following error: (image)

    Any idea what the problem might be?

    opened by Danielydsm 6
  • Autodetect delimiter in CSV files

    Currently the import_from_csv method has a 'delimiter' parameter that defaults to ',', but sometimes we don't know what the delimiter is and need it to be autodetected. This is especially useful for CSV files generated by MS Excel that use ';' as the delimiter.

    A quick and dirty way to make this work is to count how many times ',', ';' and tab are used in the file and assume the most frequent one is the delimiter.

    enhancement help wanted plugin 
    opened by jeanferri 6
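
The counting heuristic described in this issue could look roughly like the sketch below. detect_delimiter is a hypothetical helper and is not part of rows. The standard library also offers csv.Sniffer for the same purpose.

```python
from collections import Counter

# Candidate delimiters named in the issue: comma, semicolon and tab.
CANDIDATES = (",", ";", "\t")


def detect_delimiter(sample, candidates=CANDIDATES, default=","):
    """Return the candidate delimiter that occurs most often in `sample`.

    Quick and dirty: it ignores quoting, so a delimiter that appears
    inside quoted fields is counted too.
    """
    counts = Counter(char for char in sample if char in candidates)
    if not counts:
        return default
    return counts.most_common(1)[0][0]


# Made-up sample: semicolons separate fields, commas are decimal marks.
excel_like = "name;price;qty\nCoffee;9,50;2\nTea;3,20;1\n"
print(repr(detect_delimiter(excel_like)))
```

Because the heuristic only counts characters, a file with many decimal commas and few columns can still fool it. Reading only the first few kilobytes as the sample keeps it cheap on large files.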
  • OverflowError: Python int too large to convert to C long

    Good morning!

    I am learning Python, so this may be a very simple error to solve; even so, I have no idea what can be done:

    When I try to import rows, the message in the title appears.

    duplicate 
    opened by tbmpereira 5
  • Text plugin is not working on `rows convert`

    The file cha-de-bebe.txt is not being read correctly on the command line (try rows print cha-de-bebe.txt or rows convert cha-de-bebe.txt cha-de-bebe.csv) -- but it was generated correctly using rows print http://some-url/ > cha-de-bebe.txt.

    @jsbueno could you please help check it? I think this bug started after your PR #270.

    bug 
    opened by turicas 5
  • locale.Error: unsupported locale setting

    ======================================================================
    ERROR: test_DecimalField (tests.tests_fields.FieldsTestCase)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/brain/git/fedora/python-rows/rows-0.3.0/tests/tests_fields.py", line 203, in test_DecimalField
        with rows.locale_context(locale_name):
      File "/usr/lib64/python3.5/contextlib.py", line 59, in __enter__
        return next(self.gen)
      File "/home/brain/git/fedora/python-rows/rows-0.3.0/rows/localization.py", line 23, in locale_context
        locale.setlocale(category, name)
      File "/usr/lib64/python3.5/locale.py", line 594, in setlocale
        return _setlocale(category, locale)
    locale.Error: unsupported locale setting
    
    ======================================================================
    ERROR: test_FloatField (tests.tests_fields.FieldsTestCase)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/brain/git/fedora/python-rows/rows-0.3.0/tests/tests_fields.py", line 171, in test_FloatField
        with rows.locale_context(locale_name):
      File "/usr/lib64/python3.5/contextlib.py", line 59, in __enter__
        return next(self.gen)
      File "/home/brain/git/fedora/python-rows/rows-0.3.0/rows/localization.py", line 23, in locale_context
        locale.setlocale(category, name)
      File "/usr/lib64/python3.5/locale.py", line 594, in setlocale
        return _setlocale(category, locale)
    locale.Error: unsupported locale setting
    
    ======================================================================
    ERROR: test_IntegerField (tests.tests_fields.FieldsTestCase)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/brain/git/fedora/python-rows/rows-0.3.0/tests/tests_fields.py", line 144, in test_IntegerField
        with rows.locale_context(locale_name):
      File "/usr/lib64/python3.5/contextlib.py", line 59, in __enter__
        return next(self.gen)
      File "/home/brain/git/fedora/python-rows/rows-0.3.0/rows/localization.py", line 23, in locale_context
        locale.setlocale(category, name)
      File "/usr/lib64/python3.5/locale.py", line 594, in setlocale
        return _setlocale(category, locale)
    locale.Error: unsupported locale setting
    
    ======================================================================
    ERROR: test_PercentField (tests.tests_fields.FieldsTestCase)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/brain/git/fedora/python-rows/rows-0.3.0/tests/tests_fields.py", line 250, in test_PercentField
        with rows.locale_context(locale_name):
      File "/usr/lib64/python3.5/contextlib.py", line 59, in __enter__
        return next(self.gen)
      File "/home/brain/git/fedora/python-rows/rows-0.3.0/rows/localization.py", line 23, in locale_context
        locale.setlocale(category, name)
      File "/usr/lib64/python3.5/locale.py", line 594, in setlocale
        return _setlocale(category, locale)
    locale.Error: unsupported locale setting
    
    ======================================================================
    ERROR: test_locale_context (tests.tests_localization.LocalizationTestCase)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/brain/git/fedora/python-rows/rows-0.3.0/tests/tests_localization.py", line 41, in test_locale_context
        with locale_context(name):
      File "/usr/lib64/python3.5/contextlib.py", line 59, in __enter__
        return next(self.gen)
      File "/home/brain/git/fedora/python-rows/rows-0.3.0/rows/localization.py", line 23, in locale_context
        locale.setlocale(category, name)
      File "/usr/lib64/python3.5/locale.py", line 594, in setlocale
        return _setlocale(category, locale)
    locale.Error: unsupported locale setting
    
    opened by ignatenkobrain 5
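
Failures like these typically mean that the locale requested by the tests is not installed on the build machine. A minimal sketch of how a test could probe for a locale first, using only the standard library. locale_is_available is a hypothetical helper and is not part of rows.

```python
import locale


def locale_is_available(name, category=locale.LC_ALL):
    """Return True if `name` can be set on this system.

    The previous locale is restored before returning, so calling this
    helper has no lasting side effect.
    """
    previous = locale.setlocale(category)  # query the current setting
    try:
        locale.setlocale(category, name)
    except locale.Error:
        return False
    finally:
        locale.setlocale(category, previous)
    return True


print(locale_is_available("C"))
```

A test suite could combine this with unittest.skipUnless to skip locale-dependent cases instead of erroring. Which locale names the rows tests actually request is not shown in the traceback, so any name passed in is an assumption.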
  • Porting rows to Python3

    This is a work in progress.

    I got all tests passing on Python 3, but 3 are broken on Python 2 because of something I haven't yet found in the type identification system.

    This PR is just to share it with you. Maybe your familiarity with the code can help fix the tests.

    []'s!

    opened by henriquebastos 5
  • UserWarning: Call to deprecated function or class get_active_sheet

    Hi, when I build the package for Debian, the debhelper tools run pybuild, which shows these warnings [1]. I use the latest source: git20151115.837b41.

    Is something wrong on my side, or does anyone else have the same problem? Thanks.

    [1] pybuild --test --test-nose -i python{version} -p 2.7 --dir .
    I: pybuild base:184: cd /pkgs/pkg-rows/rows-0.1.1+git20151115.837b41/.pybuild/pythonX.Y_2.7/build; python2.7 -m nose tests
    ...................................................................................................../usr/lib/python2.7/dist-packages/openpyxl/workbook/workbook.py:102: UserWarning: Call to deprecated function or class get_active_sheet (Use the .active property).
      def get_active_sheet(self):
    /usr/lib/python2.7/dist-packages/openpyxl/workbook/workbook.py:102: UserWarning: Call to deprecated function or class get_active_sheet (Use the .active property).
      def get_active_sheet(self):
    ./usr/lib/python2.7/dist-packages/openpyxl/workbook/workbook.py:102: UserWarning: Call to deprecated function or class get_active_sheet (Use the .active property).
      def get_active_sheet(self):
    ./usr/lib/python2.7/dist-packages/openpyxl/workbook/workbook.py:102: UserWarning: Call to deprecated function or class get_active_sheet (Use the .active property).
      def get_active_sheet(self):

    ..........................

    Ran 129 tests in 1.936s

    OK

    opened by kretcheu 5
  • Add sphinx documentation

    Hello dear reviewer,

    I basically did three things:

    • Added Sphinx to requirements-development.txt
    • Created basic documentation, based on the README, with a few improvements I've made.
    • Moved some basic project information (intro and architecture) to the __init__.py of the rows module

    I think the Sphinx docs can also be used as a website, and maybe can be hosted on GitHub Pages.

    []'s I hope this will be useful! :)

    opened by raphapassini 5
  • Could not find import_from_pdf function

    I need to import data from pdf and found this example: https://gist.github.com/turicas/6b9ca83dcd531a6cd4fd87ced2a28c70

    But I was unable to run it, since the import_from_pdf is not available to me.

    I have already run the command: pip install rows[all]

    Is pdf format no longer supported?

    opened by marcellalves 4
  • New release on pypi

    I started using the "rows" lib today, and I lost several hours of work because of a bug with empty cells in ODS input. Here is my story.

    I was learning the "rows" lib with an ODS file and came across some strange behavior. Of course, I thought it was because I wasn't using the lib properly, so I tried all possible options, searched the Internet, etc. After several hours, I eventually tried the same code with an equivalent XLSX file and found that the behavior was different! So I realized that I had found a bug on my first day of using the rows lib!

    I decided that I should report the bug and took the time to write a script to illustrate my bug report. I was using rows 0.4.1 from PyPI but, before creating the bug report on GitHub, I thought I should check whether the bug was still present in the "develop" branch... and my script shows that the bug is fixed in the "develop" branch!

    Release 0.4.1 is dated Feb 14, 2019... almost 4 years old! There have been 210 commits since 0.4.1; among these 210 commits, I counted about 45 fixes. While going through the commit messages, I found the commit that fixes my bug: issue #320, fixed on March 27, 2019 in this commit: https://github.com/turicas/rows/commit/c569f9415f2c76b2f6e9afbe1d748946e759711f

    So, in December 2022, some users are wasting hours because of a bug that was found and fixed 3.5 years ago :-( No comment!

    So, please, push a new release to PyPI!

    opened by alexis-via 2
  • Replace unicodecsv by standard csv module

    unicodecsv has not been maintained for a while now [1]. It was preferred over the standard csv module because of its unicode support. Now that the Python 3 csv module [2] supports unicode, let's use it.

    For more context, we hit issues while rebuilding unicodecsv during the Fedora Python 3.11 mass rebuild [3][4].

    [1] https://github.com/jdunck/python-unicodecsv
    [2] https://docs.python.org/3/library/csv.html
    [3] https://copr.fedorainfracloud.org/coprs/g/python/python3.11/package/python-unicodecsv/
    [4] https://bugzilla.redhat.com/show_bug.cgi?id=2021938

    opened by jcapiitao 1
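
As a rough illustration of why the standard module suffices on Python 3, the sketch below round-trips non-ASCII text through csv using an in-memory text stream. The sample data is made up, and this is not the change proposed for rows itself.

```python
import csv
import io

rows_in = [["name", "city"], ["Álvaro", "São Paulo"], ["Zoë", "Zürich"]]

# On Python 3 the csv module reads and writes str, so encoding is handled
# by the text stream it is given (here StringIO; for real files use
# open(path, encoding="utf-8", newline="")).
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows_in)

# Rewind and parse what was just written.
buffer.seek(0)
rows_out = list(csv.reader(buffer))
print(rows_out == rows_in)
```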
  • NameError: name 'obj' is not defined

    This error occurred when I tried to use the closest_same_column method in rows.plugins.pdf (image)

    Apparently the code here is missing the part that fetches the object holding the value passed as a parameter so that we can work with it (and apparently the same happens with the other method, closest_same_line).

    opened by dehatanes 0
  • Python 3.10: cannot import name 'Iterator' from 'collections'

    File "/data/data/com.termux/files/usr/lib/python3.10/site-packages/rows/plugins/utils.py", line 20, in <module> 
    from collections import Iterator, OrderedDict            
    ImportError: cannot import name 'Iterator' from 'collections'
    

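    Maybe this will fix it:

    from collections import OrderedDict  # still importable from collections
    try:
        # Python 3.3+: the abstract base classes live in collections.abc
        from collections.abc import Iterator
    except ImportError:
        # Fallback for older Pythons
        from collections import Iterator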
    
    opened by fagci 0
  • [pgimport] Option to do not store values as NULL

    NULL values can be confusing when analyzing data and there will be some cases where we prefer to add empty values as empty strings instead of NULL. The function pgimport (and the CLI equivalent) should have an option to deal with this scenario.

    enhancement cli plugin utils 
    opened by turicas 0
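
One way to picture the requested behaviour, independent of how pgimport is actually implemented, is to convert missing values to empty strings before the rows reach the database. The function name, the `enabled` flag and the plain-dict row representation below are assumptions made for illustration only.

```python
def nulls_to_empty(row, enabled=True):
    """Return a copy of `row` with None values replaced by "".

    `row` is a plain dict mapping column name to value; `enabled`
    stands in for the proposed option (hypothetical).
    """
    if not enabled:
        return dict(row)
    return {key: "" if value is None else value for key, value in row.items()}


row = {"name": "Ana", "phone": None, "age": 30}
print(nulls_to_empty(row))
```

An empty string is only a sensible substitute in text columns, so a real option would probably need to leave numeric and date columns as NULL.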
Releases(v0.4.1)
Owner
Álvaro Justen
Free/libre software hacker, hypnotist, remote worker, teacher, coffee lover/roaster