Geocode rows in a SQLite database table

Overview

geocode-sqlite

PyPI Changelog Tests License

Geocode rows from a SQLite table

Installation

Install this tool using pip or pipx:

# install inside a virtualenv
pip install geocode-sqlite

# install globally
pipx install geocode-sqlite

Usage

Let's say you have a spreadsheet with addresses in it, and you'd like to map those locations. First, create a SQLite database and insert rows from that spreadsheet using sqlite-utils.

sqlite-utils insert data.db data data.csv --csv

Now, geocode it using OpenStreetMap's Nominatum geocoder.

geocode-sqlite nominatum data.db data \
 --location="{address}, {city}, {state} {zip}" \
 --delay=1 \
 --user-agent="this-is-me"

In the command above, you're using Nominatum, which is free and only asks for a unique user agent (--user-agent).

This will connect to a database (data.db) and read all rows from the table data (skipping any that already have both a latitude and longitude column filled).

You're also telling the geocoder how to extract a location query (--location) from a row of data, using Python's built-in string formatting, and setting a rate limit (--delay) of one request per second.

For each row where geocoding succeeds, latitude and longitude will be populated. If you hit an error, or a rate limit, run the same query and pick up where you left off.

Under the hood, this package uses the excellent geopy library, which is stable and thoroughly road-tested. If you need help understanding a particular geocoder's options, consult geopy's documentation.

Supported Geocoders

The CLI currently supports these geocoders:

  • bing
  • googlev3
  • mapquest (and open-mapquest)
  • nominatum

More will be added soon.

Common arguments and options

Each geocoder needs to know where to find the data it's working with. These are the first two arguments:

  • database: a path to a SQLite file, which must already exist
  • table: the name of a table, in that database, which exists and has data to geocode

From there, we have a set of options passed to every geocoder:

  • location: a string format that will be expanded with each row to build a full query, to be geocoded
  • delay: a delay between each call (some services require this)
  • latitude: latitude column name
  • longitude: longitude column name

Each geocoder takes additional, specific arguments beyond these, such as API keys. Again, geopy's documentation is an excellent resource.

Python API

The command line interface aims to support the most common options for each geocoder. For more find-grained control, use the Python API.

As with the CLI, this assumes you already have a SQLite database and a table of location data.

from geocode_sqlite import geocode_table
from geopy.geocoders import Nominatum

# create a geocoder instance, with some extra options
nominatum = Nominatum(user_agent="this-is-me", domain="nominatum.local.dev", scheme="http")

# assuming our database is in the same directory
count = geocode_table("data.db", "data", query_template="{address}, {city}, {state} {zip}")

# when it's done
print(f"Geocoded {count} rows")

Any geopy geocoder can be used with the Python API.

Development

To contribute to this tool, first checkout the code. Then create a new virtual environment:

cd geocode-sqlite
python -m venv .venv
source .venv/bin/activate

Or if you are using pipenv:

pipenv shell

Now install the dependencies and tests:

pip install -e '.[test]'

To run the tests:

pytest

Please remember that this library is mainly glue code between other well-tested projects, specifically: click, geopy and sqlite-utils. Tests should focus on making sure those parts fit together correctly. We can assume the parts themselves already work.

To that end, there is a test geocoder included: geocode_sqlite.testing.DummyGeocoder. That geocoder works with an included dataset of In-N-Out Burger locations provided by AllThePlaces. It works like a normal GeoPy geocoder, except it will only return results for In-N-Out locations using the included database.

Comments
  • fixing

    fixing "ImportError" in the 6 lines and 7

    hi Chris Amico. I Using GNU/Linux os And in GNU/Linux, I can not run the cli.py file. But with a small change I made, this problem was solved. If you like, please merging.

    opened by EmadDeve20 4
  • Add OpenCage option to CLI

    Add OpenCage option to CLI

    I'm a contractor for OpenCage geocoder. We'd like to add an opencage option to the CLI. OpenCage is already available in geopy so I have just copied the existing geocoder code with the relevant options.

    • Added OpenCage as an option to CLI
    • Added OpenCage to README
    • Added OpenCage test to Makefile
    opened by sbscully 2
  • Better argument ergonomics

    Better argument ergonomics

    Looking at this with fresh eyes, this whole bit is dumb:

    Note the order of options: There are two sets of options we need to pass.

    The first concerns the data we're geocoding. We need to say where our database is and what table we're using, and optionally, how to extract a location query.

    Then, we need to say what geocoder we're using, and pass in any options needed to initalize it. This will be different for each geocoder we want to use.

    Ideally, it should be something like this:

    geocode-sqlite nominatum  data.db data \
     --location="{address}, {city}, {state} {zip}" \
     --delay=1 \
     --user-agent="this-is-me"
    

    Maybe this would be ok:

    geocode-sqlite
     --location="{address}, {city}, {state} {zip}" \
     --delay=1 \
     --user-agent="this-is-me" \
    nominatum data.db data
    

    But the first way is definitely better.

    opened by eyeseast 2
  • Spatialite helper?

    Spatialite helper?

    Would it be worth helping convert lat/lng fields to a spatial index? https://docs.datasette.io/en/stable/spatialite.html#spatial-indexing-latitude-longitude-columns

    Here's the whole code block:

    import sqlite3
    conn = sqlite3.connect('museums.db')
    # Lead the spatialite extension:
    conn.enable_load_extension(True)
    conn.load_extension('/usr/local/lib/mod_spatialite.dylib')
    # Initialize spatial metadata for this database:
    conn.execute('select InitSpatialMetadata(1)')
    # Add a geometry column called point_geom to our museums table:
    conn.execute("SELECT AddGeometryColumn('museums', 'point_geom', 4326, 'POINT', 2);")
    # Now update that geometry column with the lat/lon points
    conn.execute('''
        UPDATE events SET
        point_geom = GeomFromText('POINT('||"longitude"||' '||"latitude"||')',4326);
    ''')
    # Now add a spatial index to that column
    conn.execute('select CreateSpatialIndex("museums", "point_geom");')
    # If you don't commit your changes will not be persisted:
    conn.commit()
    conn.close()
    

    I don't know if it belongs in this package or not.

    question 
    opened by eyeseast 2
  • Use rate limiting by default?

    Use rate limiting by default?

    It's generally bad practice to throw a few hundred geocoding requests at a free service. Given that, is it worth setting a default delay of one second, and let people explicitly turn it off if needed?

    opened by eyeseast 2
  • Geocoder CLI supprt

    Geocoder CLI supprt

    Here are the geocoders currently supported by the CLI. I want to add the most common ones (which I use) first:

    • [x] Google
    • [x] MapBox
    • [x] OpenStreetMap
    • [x] MapQuest
    • [x] Bing
    • [ ] Geocode.Earth
    • [ ] Geocodio
    • [ ] Pelias
    • [ ] MapTiler

    What else?

    help wanted good first issue 
    opened by eyeseast 2
  • fix failing tests that exepect an envvar

    fix failing tests that exepect an envvar

    if MAPBOX_API_KEY is not set in the environment, test_pass_kwargs fails.

    noticed this when trying to run tests locally. the D prefix defers to values already set in the environment.

    you may not want to introduce a new dependency to your project (namely pytest-env), and I would understand that, but just thought I'd bring this to your attention with a suggested fix. take it or leave it! this is a very useful library. thanks!

    opened by noslouch 1
  • GeoJSON and SpatiaLite support

    GeoJSON and SpatiaLite support

    Closes #22 Closes #24 Closes #26

    Passing a --geojson flag will store results as a GeoJSON geometry, instead of in latitude and longitude columns.

    Using --spatialite will add a geometry column and store results as a SpatiaLite binary.

    This should make it easier to work with datasette-geojson and datasette-geojson-map.

    opened by eyeseast 1
  • Save results as geojson?

    Save results as geojson?

    Right now, results are saved in two columns: latitude and longitude.

    Now that datasette-geojson and datasette-geojson-map exist, it would be nice to have this plugin feed into those.

    Maybe it's an option: sqlite-geocode data.db table ... --geojson

    That would be backwards compatible at least.

    question 
    opened by eyeseast 1
  • Capture full geocoding results?

    Capture full geocoding results?

    opened by eyeseast 1
  • Need a progress bar

    Need a progress bar

    Especially when rate limiting, which is now the default, geocoding a table is slow. With a one-second delay, a 300 row table takes five minutes to finish, at best.

    opened by eyeseast 1
  • Consolidate code to geocode a list

    Consolidate code to geocode a list

    I do this in three places:

    • In geocode_table, which I wrote first
    • In the CLI
    • In geocode_list, which I needed for the progress bar

    This whole block is basically repeated: https://github.com/eyeseast/geocode-sqlite/blob/main/geocode_sqlite/utils.py#L50-L68. Part of the issue is that in the Python API, I'm using log.info and in the CLI I'm using click.echo. Maybe I can abstract that and pass in a print function.

    opened by eyeseast 0
  • Async support

    Async support

    Geopy has a set of tools to help with async here: https://geopy.readthedocs.io/en/latest/#async-mode

    Should definitely use those, especially when thinking about Datasette integration.

    opened by eyeseast 2
Releases(v0.8.2)
  • v0.8.2(Nov 7, 2022)

    What's Changed

    • Require requests. More forgiving timeout. by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/43

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.8.1...v0.8.2

    Source code(tar.gz)
    Source code(zip)
  • v0.8.1(Nov 6, 2022)

    What's Changed

    • Raw results in geocode_table, too by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/42

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.8.0...v0.8.1

    Source code(tar.gz)
    Source code(zip)
  • v0.8.0(Nov 6, 2022)

    What's Changed

    • Just a typo! by @shaver in https://github.com/eyeseast/geocode-sqlite/pull/40
    • Capture raw geocoding results by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/41

    New Contributors

    • @shaver made their first contribution in https://github.com/eyeseast/geocode-sqlite/pull/40

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.7.0...v0.8.0

    Source code(tar.gz)
    Source code(zip)
  • v0.7.0(Oct 31, 2022)

    The OpenCage geocoding service is now supported. Thanks @sbscully. This library is now tested on Python 3.11.

    What's Changed

    • Add OpenCage option to CLI by @sbscully in https://github.com/eyeseast/geocode-sqlite/pull/39

    New Contributors

    • @sbscully made their first contribution in https://github.com/eyeseast/geocode-sqlite/pull/39

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.6.1...v0.7.0

    Source code(tar.gz)
    Source code(zip)
  • v0.6.1(Mar 16, 2022)

    What's Changed

    • if no bbox options are passed, value is None by @noslouch in https://github.com/eyeseast/geocode-sqlite/pull/30
    • Pass --api-key to mapbox test by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/33

    New Contributors

    • @noslouch made their first contribution in https://github.com/eyeseast/geocode-sqlite/pull/30

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.6.0...v0.6.1

    Source code(tar.gz)
    Source code(zip)
  • v0.6.0(Mar 11, 2022)

    What's Changed

    • GeoJSON and SpatiaLite support by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/23

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.5.2...v0.6.0

    Source code(tar.gz)
    Source code(zip)
  • v0.5.2(Mar 1, 2022)

    What's Changed

    • Require click >= 7.0 to hide commands by @1-Byte in https://github.com/eyeseast/geocode-sqlite/pull/20
    • Fix error with rowid tables by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/27

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.5.1...v0.5.2

    Source code(tar.gz)
    Source code(zip)
  • v0.5.0(Apr 20, 2021)

  • v0.3.0(Apr 4, 2021)

  • v0.2.0(Apr 3, 2021)

    Arguments and options now follow a much more sensible order, following the example of other thing-to-sqlite tools. For example:

    geocode-sqlite nominatum geo.db innout_test \
    	--location "{full}, {city}, {state} {postcode}" \
    	--delay 1 \
    	--user-agent "geocode-sqlite"
    

    The order is now:

    1. geocode-sqlite
    2. geocoder, such as nominatum
    3. database path
    4. table name
    5. any additional options
    Source code(tar.gz)
    Source code(zip)
  • 0.1.2(Sep 8, 2020)

Owner
Chris Amico
Journalist / Developer in Boston
Chris Amico
Helping data scientists better understand their datasets and models in text classification. With love from ServiceNow.

Azimuth, an open-source dataset and error analysis tool for text classification, with love from ServiceNow. Overview Azimuth is an open source applica

ServiceNow 145 Dec 23, 2022
Python library to visualize circular plasmid maps

Plasmidviewer Plasmidviewer is a Python library to visualize plasmid maps from GenBank. This library provides only the function to visualize circular

Mori Hideto 9 Dec 04, 2022
A bot that tweets info and location map for new bicycle parking added to OpenStreetMap within a GeoJSON boundary.

Bike parking tweepy bot app A twitter bot app that searches for bicycle parking added to OpenStreetMap. Relies on AWS Lambda/S3, Python3, Tweepy, Flas

Angelo Trivisonno 1 Dec 19, 2021
Software for Advanced Spatial Econometrics

GeoDaSpace Software for Advanced Spatial Econometrics GeoDaSpace current version 1.0 (32-bit) Development environment: Mac OSX 10.5.x (32-bit) wxPytho

GeoDa Center 38 Jan 03, 2023
Centroids as a Service

Centroids! This application reads a valid geojson FeatureCollection and returns a valid geojson FeatureColleciton of centroids. In the output: All pro

Lyzi Diamond 20 Aug 29, 2021
Create Siege configuration files from Cloud Optimized GeoTIFF.

cogeo-siege Documentation: Source Code: https://github.com/developmentseed/cogeo-siege Description Create siege configuration files from Cloud Optimiz

Development Seed 3 Dec 01, 2022
A trivia questions about Europe

EUROPE TRIVIA QUIZ IN PYTHON Project Outline Ask user if he / she knows more about Europe. If yes show the Trivia main screen, else show the end Trivi

David Danso 1 Nov 17, 2021
🌐 Local tile server for viewing geospatial raster files with ipyleaflet or folium

🌐 Local Tile Server for Geospatial Rasters Need to visualize a rather large (gigabytes) raster you have locally? This is for you. A Flask application

Bane Sullivan 192 Jan 04, 2023
Minimum Bounding Box of Geospatial data

BBOX Problem definition: The spatial data users often are required to obtain the coordinates of the minimum bounding box of vector and raster data in

Ali Khosravi Kazazi 1 Sep 08, 2022
Earthengine-py-notebooks - A collection of 360+ Jupyter Python notebook examples for using Google Earth Engine with interactive mapping

earthengine-py-notebooks A collection of 360+ Jupyter Python notebook examples for using Google Earth Engine with interactive mapping Contact: Qiushen

Qiusheng Wu 1.1k Dec 29, 2022
pure-Python (Numpy optional) 3D coordinate conversions for geospace ecef enu eci

Python 3-D coordinate conversions Pure Python (no prerequistes beyond Python itself) 3-D geographic coordinate conversions and geodesy. API similar to

Geospace code 292 Dec 29, 2022
ProjPicker (projection picker) is a Python module that allows the user to select all coordinate reference systems (CRSs)

ProjPicker ProjPicker (projection picker) is a Python module that allows the user to select all coordinate reference systems (CRSs) whose extent compl

Huidae Cho 4 Feb 06, 2022
Global topography (referenced to sea-level) in a 10 arcminute resolution grid

Earth - Topography grid at 10 arc-minute resolution Global 10 arc-minute resolution grids of topography (ETOPO1 ice-surface) referenced to mean sea-le

Fatiando a Terra Datasets 1 Jan 20, 2022
WIP: extracting Geometry utilities from datacube-core

odc.geo This is still work in progress. This repository contains geometry related code extracted from Open Datacube. For details and motivation see OD

Open Data Cube 34 Jan 09, 2023
A compilation of several single-beam bathymetry surveys of the Caribbean

Caribbean - Single-beam bathymetry This dataset is a compilation of several single-beam bathymetry surveys of the Caribbean ocean displaying a wide ra

Fatiando a Terra Datasets 0 Jan 20, 2022
Mmdb-server - An open source fast API server to lookup IP addresses for their geographic location

mmdb-server mmdb-server is an open source fast API server to lookup IP addresses

Alexandre Dulaunoy 67 Nov 25, 2022
Python interface to PROJ (cartographic projections and coordinate transformations library)

pyproj Python interface to PROJ (cartographic projections and coordinate transformations library). Documentation Stable: http://pyproj4.github.io/pypr

832 Dec 31, 2022
This is the antenna performance plotted from tinyGS reception data.

tinyGS-antenna-map This is the antenna performance plotted from tinyGS reception data. See their repository. The code produces a plot that provides Az

Martin J. Levy 14 Aug 21, 2022
Ingest and query genomic intervals from multiple BED files

Ingest and query genomic intervals from multiple BED files.

4 May 29, 2021
a Geolocator made in python

Geolocator A Geolocator made in python ✨ Features locates ur location using ur ip thats it! 💁‍♀️ How to use first download the locator.py file instal

Portgas D Ace 1 Oct 27, 2021