Geocode rows in a SQLite database table

Overview

geocode-sqlite

Geocode rows from a SQLite table

Installation

Install this tool using pip or pipx:

# install inside a virtualenv
pip install geocode-sqlite

# install globally
pipx install geocode-sqlite

Usage

Let's say you have a spreadsheet with addresses in it, and you'd like to map those locations. First, create a SQLite database and insert rows from that spreadsheet using sqlite-utils.

sqlite-utils insert data.db data data.csv --csv

Now, geocode it using OpenStreetMap's Nominatim geocoder.

geocode-sqlite nominatim data.db data \
 --location="{address}, {city}, {state} {zip}" \
 --delay=1 \
 --user-agent="this-is-me"

In the command above, you're using Nominatim, which is free and only asks for a unique user agent (--user-agent).

This will connect to a database (data.db) and read all rows from the table data (skipping any that already have both a latitude and longitude column filled).

You're also telling the geocoder how to extract a location query (--location) from a row of data, using Python's built-in string formatting, and setting a rate limit (--delay) of one request per second.
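As a rough illustration of how a --location template expands, here is a minimal sketch using plain str.format. The row values and the build_query helper are made up for this example, and the tool's own code may differ in detail.

```python
# Hypothetical row, shaped like one record from the `data` table.
row = {
    "address": "1 Main St",
    "city": "Boston",
    "state": "MA",
    "zip": "02108",
}


def build_query(template, row):
    """Fill a --location style template with the values from one row."""
    return template.format(**row)


query = build_query("{address}, {city}, {state} {zip}", row)
print(query)
```

Every placeholder in the template must match a column name in the table; a missing column raises a KeyError in this sketch.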

For each row where geocoding succeeds, latitude and longitude will be populated. If you hit an error or a rate limit, run the same command again to pick up where you left off: rows that already have coordinates are skipped.
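The resume behaviour can be pictured as a query for rows that still lack coordinates. The sketch below uses only the standard sqlite3 module with an in-memory table; the pending_rows helper and the sample data are assumptions for illustration, not the package's actual implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (address TEXT, latitude REAL, longitude REAL)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        ("1 Main St", 42.36, -71.06),  # already geocoded, will be skipped
        ("2 Elm St", None, None),      # not geocoded yet
        ("3 Oak Ave", 42.35, None),    # only partly filled, still pending
    ],
)


def pending_rows(conn, table="data", latitude="latitude", longitude="longitude"):
    """Return rows where either coordinate column is still empty."""
    sql = (
        f'SELECT rowid, * FROM "{table}" '
        f'WHERE "{latitude}" IS NULL OR "{longitude}" IS NULL '
        "ORDER BY rowid"
    )
    return conn.execute(sql).fetchall()


pending = pending_rows(conn)
print([row[1] for row in pending])
```

Because only rows with an empty latitude or longitude are selected, re-running after a failure does not repeat requests for rows that were already geocoded.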

Under the hood, this package uses the excellent geopy library, which is stable and thoroughly road-tested. If you need help understanding a particular geocoder's options, consult geopy's documentation.

Supported Geocoders

The CLI currently supports these geocoders:

  • bing
  • googlev3
  • mapquest (and open-mapquest)
  • nominatim

More will be added soon.

Common arguments and options

Each geocoder needs to know where to find the data it's working with. These are the first two arguments:

  • database: a path to a SQLite file, which must already exist
  • table: the name of a table, in that database, which exists and has data to geocode

From there, we have a set of options passed to every geocoder:

  • location: a string format that will be expanded with each row to build a full query, to be geocoded
  • delay: a delay between each call (some services require this)
  • latitude: latitude column name
  • longitude: longitude column name

Each geocoder takes additional, specific arguments beyond these, such as API keys. Again, geopy's documentation is an excellent resource.
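To make the delay option concrete, here is a stdlib-only sketch of pausing between calls. The geocode_with_delay function, the fake geocoder and the recorded pauses are all hypothetical stand-ins so the example runs without a network. geopy also provides its own RateLimiter helper, which is not used here.

```python
import time


def geocode_with_delay(queries, geocode, delay=1.0, sleep=time.sleep):
    """Call geocode(query) for each query, pausing `delay` seconds between calls."""
    results = []
    for index, query in enumerate(queries):
        if index > 0:
            sleep(delay)  # no need to wait before the first request
        results.append(geocode(query))
    return results


# Stand-ins: a fake geocoder, and a fake sleep that records each pause
# instead of actually waiting.
pauses = []


def fake_geocode(query):
    return (0.0, 0.0)


results = geocode_with_delay(["a", "b", "c"], fake_geocode, delay=1.0, sleep=pauses.append)
print(len(results), sum(pauses))
```

With a one-second delay, n rows need at least n - 1 seconds of waiting, on top of the time each request takes.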

Python API

The command line interface aims to support the most common options for each geocoder. For more fine-grained control, use the Python API.

As with the CLI, this assumes you already have a SQLite database and a table of location data.

from geocode_sqlite import geocode_table
from geopy.geocoders import Nominatim

# create a geocoder instance, with some extra options
nominatim = Nominatim(user_agent="this-is-me", domain="nominatim.local.dev", scheme="http")

# assuming our database is in the same directory; the geocoder is passed as the third argument
count = geocode_table("data.db", "data", nominatim, query_template="{address}, {city}, {state} {zip}")

# when it's done
print(f"Geocoded {count} rows")

Any geopy geocoder can be used with the Python API.

Development

To contribute to this tool, first check out the code. Then create a new virtual environment:

cd geocode-sqlite
python -m venv .venv
source .venv/bin/activate

Or if you are using pipenv:

pipenv shell

Now install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests:

pytest

Please remember that this library is mainly glue code between other well-tested projects, specifically: click, geopy and sqlite-utils. Tests should focus on making sure those parts fit together correctly. We can assume the parts themselves already work.

To that end, there is a test geocoder included: geocode_sqlite.testing.DummyGeocoder. That geocoder works with an included dataset of In-N-Out Burger locations provided by AllThePlaces. It works like a normal GeoPy geocoder, except it will only return results for In-N-Out locations using the included database.
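The idea of a test geocoder that only answers for known locations can be sketched as follows. This is not the actual DummyGeocoder: the LookupGeocoder class, the Location tuple and the sample address are hypothetical, and the only behaviour borrowed from geopy is that geocode() returns None when nothing matches and otherwise an object exposing latitude and longitude.

```python
from collections import namedtuple

# Minimal stand-in for a geocoding result.
Location = namedtuple("Location", ["address", "latitude", "longitude"])


class LookupGeocoder:
    """Hypothetical test geocoder backed by a fixed dictionary of locations."""

    def __init__(self, known):
        self.known = known  # maps query string -> (latitude, longitude)

    def geocode(self, query, **kwargs):
        coords = self.known.get(query)
        if coords is None:
            return None  # unknown location: no result
        return Location(query, *coords)


geocoder = LookupGeocoder({"123 Example St, Sample City, CA": (34.0, -118.0)})
hit = geocoder.geocode("123 Example St, Sample City, CA")
miss = geocoder.geocode("somewhere else")
print(hit, miss)
```

A stand-in like this keeps tests fast and offline while still exercising the code that reads rows, builds queries and writes coordinates back.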

Comments
  • Fixing "ImportError" on lines 6 and 7

    Hi Chris Amico. I'm using GNU/Linux, and I could not run the cli.py file there. A small change I made solved the problem. If you like, please merge it.

    opened by EmadDeve20 4
  • Add OpenCage option to CLI

    I'm a contractor for OpenCage geocoder. We'd like to add an opencage option to the CLI. OpenCage is already available in geopy so I have just copied the existing geocoder code with the relevant options.

    • Added OpenCage as an option to CLI
    • Added OpenCage to README
    • Added OpenCage test to Makefile
    opened by sbscully 2
  • Better argument ergonomics

    Looking at this with fresh eyes, this whole bit is dumb:

    Note the order of options: There are two sets of options we need to pass.

    The first concerns the data we're geocoding. We need to say where our database is and what table we're using, and optionally, how to extract a location query.

    Then, we need to say what geocoder we're using, and pass in any options needed to initialize it. This will be different for each geocoder we want to use.

    Ideally, it should be something like this:

    geocode-sqlite nominatim data.db data \
     --location="{address}, {city}, {state} {zip}" \
     --delay=1 \
     --user-agent="this-is-me"
    

    Maybe this would be ok:

    geocode-sqlite \
     --location="{address}, {city}, {state} {zip}" \
     --delay=1 \
     --user-agent="this-is-me" \
    nominatim data.db data
    

    But the first way is definitely better.

    opened by eyeseast 2
  • Spatialite helper?

    Would it be worth helping convert lat/lng fields to a spatial index? https://docs.datasette.io/en/stable/spatialite.html#spatial-indexing-latitude-longitude-columns

    Here's the whole code block:

    import sqlite3
    conn = sqlite3.connect('museums.db')
    # Load the spatialite extension:
    conn.enable_load_extension(True)
    conn.load_extension('/usr/local/lib/mod_spatialite.dylib')
    # Initialize spatial metadata for this database:
    conn.execute('select InitSpatialMetadata(1)')
    # Add a geometry column called point_geom to our museums table:
    conn.execute("SELECT AddGeometryColumn('museums', 'point_geom', 4326, 'POINT', 2);")
    # Now update that geometry column with the lat/lon points
    conn.execute('''
        UPDATE museums SET
        point_geom = GeomFromText('POINT('||"longitude"||' '||"latitude"||')',4326);
    ''')
    # Now add a spatial index to that column
    conn.execute('select CreateSpatialIndex("museums", "point_geom");')
    # If you don't commit your changes will not be persisted:
    conn.commit()
    conn.close()
    

    I don't know if it belongs in this package or not.

    question 
    opened by eyeseast 2
  • Use rate limiting by default?

    It's generally bad practice to throw a few hundred geocoding requests at a free service. Given that, is it worth setting a default delay of one second, and let people explicitly turn it off if needed?

    opened by eyeseast 2
  • Geocoder CLI support

    Here are the geocoders currently supported by the CLI. I want to add the most common ones (which I use) first:

    • [x] Google
    • [x] MapBox
    • [x] OpenStreetMap
    • [x] MapQuest
    • [x] Bing
    • [ ] Geocode.Earth
    • [ ] Geocodio
    • [ ] Pelias
    • [ ] MapTiler

    What else?

    help wanted good first issue 
    opened by eyeseast 2
  • fix failing tests that expect an envvar

    if MAPBOX_API_KEY is not set in the environment, test_pass_kwargs fails.

    noticed this when trying to run tests locally. the D prefix defers to values already set in the environment.

    you may not want to introduce a new dependency to your project (namely pytest-env), and I would understand that, but just thought I'd bring this to your attention with a suggested fix. take it or leave it! this is a very useful library. thanks!

    opened by noslouch 1
  • GeoJSON and SpatiaLite support

    Closes #22 Closes #24 Closes #26

    Passing a --geojson flag will store results as a GeoJSON geometry, instead of in latitude and longitude columns.

    Using --spatialite will add a geometry column and store results as a SpatiaLite binary.

    This should make it easier to work with datasette-geojson and datasette-geojson-map.

    opened by eyeseast 1
  • Save results as geojson?

    Right now, results are saved in two columns: latitude and longitude.

    Now that datasette-geojson and datasette-geojson-map exist, it would be nice to have this plugin feed into those.

    Maybe it's an option: geocode-sqlite data.db table ... --geojson

    That would be backwards compatible at least.

    question 
    opened by eyeseast 1
  • Capture full geocoding results?

    opened by eyeseast 1
  • Need a progress bar

    Especially when rate limiting, which is now the default, geocoding a table is slow. With a one-second delay, a 300 row table takes five minutes to finish, at best.

    opened by eyeseast 1
  • Consolidate code to geocode a list

    I do this in three places:

    • In geocode_table, which I wrote first
    • In the CLI
    • In geocode_list, which I needed for the progress bar

    This whole block is basically repeated: https://github.com/eyeseast/geocode-sqlite/blob/main/geocode_sqlite/utils.py#L50-L68. Part of the issue is that in the Python API, I'm using log.info and in the CLI I'm using click.echo. Maybe I can abstract that and pass in a print function.

    opened by eyeseast 0
  • Async support

    Geopy has a set of tools to help with async here: https://geopy.readthedocs.io/en/latest/#async-mode

    Should definitely use those, especially when thinking about Datasette integration.

    opened by eyeseast 2
Releases(v0.8.2)
  • v0.8.2(Nov 7, 2022)

    What's Changed

    • Require requests. More forgiving timeout. by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/43

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.8.1...v0.8.2

  • v0.8.1(Nov 6, 2022)

    What's Changed

    • Raw results in geocode_table, too by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/42

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.8.0...v0.8.1

  • v0.8.0(Nov 6, 2022)

    What's Changed

    • Just a typo! by @shaver in https://github.com/eyeseast/geocode-sqlite/pull/40
    • Capture raw geocoding results by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/41

    New Contributors

    • @shaver made their first contribution in https://github.com/eyeseast/geocode-sqlite/pull/40

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.7.0...v0.8.0

  • v0.7.0(Oct 31, 2022)

    The OpenCage geocoding service is now supported. Thanks @sbscully. This library is now tested on Python 3.11.

    What's Changed

    • Add OpenCage option to CLI by @sbscully in https://github.com/eyeseast/geocode-sqlite/pull/39

    New Contributors

    • @sbscully made their first contribution in https://github.com/eyeseast/geocode-sqlite/pull/39

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.6.1...v0.7.0

  • v0.6.1(Mar 16, 2022)

    What's Changed

    • if no bbox options are passed, value is None by @noslouch in https://github.com/eyeseast/geocode-sqlite/pull/30
    • Pass --api-key to mapbox test by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/33

    New Contributors

    • @noslouch made their first contribution in https://github.com/eyeseast/geocode-sqlite/pull/30

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.6.0...v0.6.1

  • v0.6.0(Mar 11, 2022)

    What's Changed

    • GeoJSON and SpatiaLite support by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/23

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.5.2...v0.6.0

  • v0.5.2(Mar 1, 2022)

    What's Changed

    • Require click >= 7.0 to hide commands by @1-Byte in https://github.com/eyeseast/geocode-sqlite/pull/20
    • Fix error with rowid tables by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/27

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.5.1...v0.5.2

  • v0.5.0(Apr 20, 2021)

  • v0.3.0(Apr 4, 2021)

  • v0.2.0(Apr 3, 2021)

    Arguments and options now follow a much more sensible order, following the example of other thing-to-sqlite tools. For example:

    geocode-sqlite nominatim geo.db innout_test \
    	--location "{full}, {city}, {state} {postcode}" \
    	--delay 1 \
    	--user-agent "geocode-sqlite"
    

    The order is now:

    1. geocode-sqlite
    2. geocoder, such as nominatim
    3. database path
    4. table name
    5. any additional options
  • 0.1.2(Sep 8, 2020)

Owner
Chris Amico
Journalist / Developer in Boston