A Python DB-API and SQLAlchemy dialect to Google Spreasheets

Overview

Build Status codecov

Note: shillelagh is a drop-in replacement for gsheets-db-api, with many additional features. You should use it instead. If you're using SQLAlchemy all you need to do:

$ pip uninstall gsheetsdb
$ pip install shillelagh

If you're using the DB API:

# from gsheetsdb import connect
from shillelagh.backends.apsw.db import connect

A Python DB API 2.0 for Google Spreadsheets

This module allows you to query Google Spreadsheets using SQL.

Using this spreadsheet as an example:

A B
1 country cnt
2 BR 1
3 BR 3
4 IN 5

Here's a simple query using the Python API:

from gsheetsdb import connect

conn = connect()
result = conn.execute("""
    SELECT
        country
      , SUM(cnt)
    FROM
        "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/"
    GROUP BY
        country
""", headers=1)
for row in result:
    print(row)

This will print:

Row(country='BR', sum_cnt=4.0)
Row(country='IN', sum_cnt=5.0)

How it works

Transpiling

Google spreadsheets can actually be queried with a very limited SQL API. This module will transpile the SQL query into a simpler query that the API understands. Eg, the query above would be translated to:

SELECT A, SUM(B) GROUP BY A

Processors

In addition to transpiling, this module also provides pre- and post-processors. The pre-processors add more columns to the query, and the post-processors build the actual result from those extra columns. Eg, COUNT(*) is not supported, so the following query:

SELECT COUNT(*) FROM "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/"

Gets translated to:

SELECT COUNT(A), COUNT(B)

And then the maximum count is returned. This assumes that at least one column has no NULLs.

SQLite

When a query can't be expressed, the module will issue a SELECT *, load the data into an in-memory SQLite table, and execute the query in SQLite. This is obviously inneficient, since all data has to be downloaded, but ensures that all queries succeed.

Installation

$ pip install gsheetsdb
$ pip install gsheetsdb[cli]         # if you want to use the CLI
$ pip install gsheetsdb[sqlalchemy]  # if you want to use it with SQLAlchemy

CLI

The module will install an executable called gsheetsdb:

$ gsheetsdb --headers=1
> SELECT * FROM "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/"
country      cnt
---------  -----
BR             1
BR             3
IN             5
> SELECT country, SUM(cnt) FROM "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1
pscv8ZXPtg8/" GROUP BY country
country      sum cnt
---------  ---------
BR                 4
IN                 5
>

SQLAlchemy support

This module provides a SQLAlchemy dialect. You don't need to specify a URL, since the spreadsheet is extracted from the FROM clause:

from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *

engine = create_engine('gsheets://')
inspector = inspect(engine)

table = Table(
    'https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=0',
    MetaData(bind=engine),
    autoload=True)
query = select([func.count(table.columns.country)], from_obj=table)
print(query.scalar())  # prints 3.0

Alternatively, you can initialize the engine with a "catalog". The catalog is a Google spreadsheet where each row points to another Google spreadsheet, with URL, number of headers and schema as the columns. You can see an example here:

A B C
1 https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=0 1 default
2 https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=1077884006 2 default

This will make the two spreadsheets above available as "tables" in the default schema.

Authentication

You can access spreadsheets that are shared only within an organization. In order to do this, first create a service account. Make sure you select "Enable G Suite Domain-wide Delegation". Download the key as a JSON file.

Next, you need to manage API client access at https://admin.google.com/${DOMAIN}/AdminHome?chromeless=1#OGX:ManageOauthClients. Add the "Unique ID" from the previous step as the "Client Name", and add https://spreadsheets.google.com/feeds as the scope.

Now, when creating the connection from the DB API or from SQLAlchemy you can point to the JSON file and the user you want to impersonate:

>>> auth = {'service_account_file': '/path/to/certificate.json', 'subject': '[email protected]'}
>>> conn = connect(auth)
Owner
Beto Dealmeida
Writing open source software since 2003.
Beto Dealmeida
Dlsite-doujin-renamer - Dlsite doujin renamer tool with python

dlsite-doujin-renamer Features 支持深度查找带有 RJ 号的文件夹 支持手动选择文件夹或拖拽文件夹到软件窗口 支持在 config

111 Jan 02, 2023
MongoDB data stream pipeline tools by YouGov (adopted from MongoDB)

mongo-connector The mongo-connector project originated as a MongoDB mongo-labs project and is now community-maintained under the custody of YouGov, Pl

YouGov 1.9k Jan 04, 2023
Simple Python demo app that connects to an Oracle DB.

Cloud Foundry Sample Python Application Connecting to Oracle Simple Python demo app that connects to an Oracle DB. The app is based on the example pro

Daniel Buchko 1 Jan 10, 2022
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.

dbd: database prototyping tool dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL d

Zdenek Svoboda 47 Dec 07, 2022
PyMongo - the Python driver for MongoDB

PyMongo Info: See the mongo site for more information. See GitHub for the latest source. Documentation: Available at pymongo.readthedocs.io Author: Mi

mongodb 3.7k Jan 08, 2023
aioodbc - is a library for accessing a ODBC databases from the asyncio

aioodbc aioodbc is a Python 3.5+ module that makes it possible to access ODBC databases with asyncio. It relies on the awesome pyodbc library and pres

aio-libs 253 Dec 31, 2022
Implementing basic MySQL CRUD (Create, Read, Update, Delete) queries, using Python.

MySQL with Python Implementing basic MySQL CRUD (Create, Read, Update, Delete) queries, using Python. We can connect to a MySQL database hosted locall

MousamSingh 5 Dec 01, 2021
A fast unobtrusive MongoDB ODM for Python.

MongoFrames MongoFrames is a fast unobtrusive MongoDB ODM for Python designed to fit into a workflow not dictate one. Documentation is available at Mo

getme 45 Jun 01, 2022
PubMed Mapper: A Python library that map PubMed XML to Python object

pubmed-mapper: A Python Library that map PubMed XML to Python object 中文文档 1. Philosophy view UML Programmatically access PubMed article is a common ta

灵魂工具人 33 Dec 08, 2022
Query multiple mongoDB database collections easily

leakscoop Perform queries across multiple MongoDB databases and collections, where the field names and the field content structure in each database ma

bagel 5 Jun 24, 2021
Anomaly detection on SQL data warehouses and databases

With CueObserve, you can run anomaly detection on data in your SQL data warehouses and databases. Getting Started Install via Docker docker run -p 300

Cuebook 171 Dec 18, 2022
Async ORM based on PyPika

PyPika-ORM - ORM for PyPika SQL Query Builder The package gives you ORM for PyPika with asycio support for a range of databases (SQLite, PostgreSQL, M

Kirill Klenov 7 Jun 04, 2022
edaSQL is a library to link SQL to Exploratory Data Analysis and further more in the Data Engineering.

edaSQL is a python library to bridge the SQL with Exploratory Data Analysis where you can connect to the Database and insert the queries. The query results can be passed to the EDA tool which can giv

Tamil Selvan 8 Dec 12, 2022
A selection of SQLite3 databases to practice querying from.

Dummy SQL Databases This is a collection of dummy SQLite3 databases, for learning and practicing SQL querying, generated with the VS Code extension Ge

1 Feb 26, 2022
Redis Python Client - The Python interface to the Redis key-value store.

redis-py The Python interface to the Redis key-value store. Installation | Contributing | Getting Started | Connecting To Redis Installation redis-py

Redis 11k Jan 08, 2023
Python cluster client for the official redis cluster. Redis 3.0+.

redis-py-cluster This client provides a client for redis cluster that was added in redis 3.0. This project is a port of redis-rb-cluster by antirez, w

Grokzen 1.1k Jan 05, 2023
Import entity definition document into SQLie3. Manage the entity. Also, create a "Create Table SQL file".

EntityDocumentMaker Version 1.00 After importing the entity definition (Excel file), store the data in sqlite3. エンティティ定義(Excelファイル)をインポートした後、データをsqlit

G-jon FujiYama 1 Jan 09, 2022
Google Cloud Client Library for Python

Google Cloud Python Client Python idiomatic clients for Google Cloud Platform services. Stability levels The development status classifier on PyPI ind

Google APIs 4.1k Jan 01, 2023
Simple DDL Parser to parse SQL (HQL, TSQL, AWS Redshift, Snowflake and other dialects) ddl files to json/python dict with full information about columns: types, defaults, primary keys, etc.

Simple DDL Parser Build with ply (lex & yacc in python). A lot of samples in 'tests/. Is it Stable? Yes, library already has about 5000+ usage per day

Iuliia Volkova 95 Jan 05, 2023
aiomysql is a library for accessing a MySQL database from the asyncio

aiomysql aiomysql is a "driver" for accessing a MySQL database from the asyncio (PEP-3156/tulip) framework. It depends on and reuses most parts of PyM

aio-libs 1.5k Jan 03, 2023