MapReader: A computer vision pipeline for the semantic exploration of maps at scale

Overview


MapReader is an end-to-end computer vision (CV) pipeline designed by the Living with Machines project. It has two main components, preprocessing/annotation and training/inference:

MapReader pipeline

MapReader provides a set of tools to:

  • load images/maps stored locally or retrieve maps via web servers (e.g., tileservers, which can be used to retrieve maps from OpenStreetMap (OSM), the National Library of Scotland (NLS), or elsewhere). ⚠️ Refer to the credits and re-use terms section if you are using digitized maps or metadata provided by NLS.
  • preprocess images/maps (e.g., divide them into patches, resample the images, remove borders outside the neatline, or reproject the map).
  • annotate images/maps or their patches (i.e., slices of an image/map) using an interactive annotation tool.
  • train, fine-tune, and evaluate various CV models.
  • predict labels (i.e., model inference) on large sets of images/maps.
  • use other functionalities, including:
    • plot results with various tools, e.g., matplotlib, cartopy, Google Earth, and kepler.gl.
    • compute the mean/standard-deviation pixel intensity of image patches.
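
To make the idea of patches and per-patch pixel statistics concrete, here is a minimal sketch in plain Python. It is not MapReader's API: the function names `patch_boxes` and `patch_stats`, the nested-list image, and the choice of population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def patch_boxes(width, height, patch_size):
    """Yield (min_x, min_y, max_x, max_y) boxes that tile an image.

    Boxes on the right and bottom edges are clipped to the image bounds.
    (Hypothetical helper, not part of MapReader.)
    """
    for min_y in range(0, height, patch_size):
        for min_x in range(0, width, patch_size):
            yield (
                min_x,
                min_y,
                min(min_x + patch_size, width),
                min(min_y + patch_size, height),
            )


def patch_stats(image, box):
    """Return (mean, population std) of the pixel intensities inside a box."""
    min_x, min_y, max_x, max_y = box
    pixels = [image[y][x] for y in range(min_y, max_y) for x in range(min_x, max_x)]
    return mean(pixels), pstdev(pixels)


# A tiny 4x6 grayscale "image" stored as rows of pixel intensities.
image = [
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [10, 20, 30, 40, 50, 60],
    [10, 20, 30, 40, 50, 60],
]

boxes = list(patch_boxes(width=6, height=4, patch_size=2))
stats = [patch_stats(image, box) for box in boxes]

for box, (avg, std) in zip(boxes, stats):
    print(box, round(avg, 2), round(std, 2))
```

A uniform patch has a standard deviation of zero, which is why the standard deviation is a convenient signal for spotting blank or featureless patches.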

Below is an example of MapReader CV model output (see the paper on MapReader for more details):

British 'railspace' and buildings as predicted by a MapReader computer vision model. ~30.5M patches from ~16K nineteenth-century Ordnance Survey map sheets were used (courtesy of the National Library of Scotland). (a) Predicted railspace; (b) predicted buildings; (c) and (d) predicted railspace (red) and buildings (black) in and around Middlesbrough and London, respectively. MapReader extracts information from large images or a set of images at a patch level, as depicted in the insets. For both railspace and buildings, we removed those patches that had no other neighboring patches with the same label within a distance of 250 meters.
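
The neighbor filter described in the caption can be sketched as follows. This is only an illustration under stated assumptions, not the code used to produce the figure: the function name `drop_isolated`, the tuple layout, the use of projected coordinates in meters, and treating exactly 250 meters as "within" are all hypothetical choices.

```python
from math import hypot


def drop_isolated(patches, max_distance_m=250.0):
    """Keep only patches with a same-label neighbor within max_distance_m.

    `patches` is a list of (x_m, y_m, label) tuples whose coordinates are
    in meters (e.g. patch centers in a projected coordinate system).
    This brute-force version compares every pair, so it is O(n^2).
    """
    kept = []
    for i, (x, y, label) in enumerate(patches):
        has_neighbor = any(
            j != i
            and other_label == label
            and hypot(x - other_x, y - other_y) <= max_distance_m
            for j, (other_x, other_y, other_label) in enumerate(patches)
        )
        if has_neighbor:
            kept.append((x, y, label))
    return kept


patches = [
    (0.0, 0.0, "railspace"),
    (100.0, 0.0, "railspace"),       # 100 m from the first railspace patch
    (100.0, 50.0, "building"),       # no other building nearby
    (5000.0, 5000.0, "railspace"),   # far from every other railspace patch
]

filtered = drop_isolated(patches)
print(filtered)
```

For tens of millions of patches the pairwise comparison would be far too slow; a spatial index would be the natural replacement, which this sketch deliberately leaves out.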

Installation

Set up a conda environment

We strongly recommend installation via Anaconda:

  • Create a new environment (here called mr_py38):
conda create -n mr_py38 python=3.8
  • Activate the environment:
conda activate mr_py38

Method 1

  • Install mapreader:
pip install git+https://github.com/Living-with-machines/MapReader.git
  • To use the environment in Jupyter notebooks, register it as a kernel:
python -m ipykernel install --user --name mr_py38 --display-name "Python (mr_py38)"

Method 2

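  • Clone mapreader source code:
git clone https://github.com/Living-with-machines/MapReader.git
  • Install it with poetry from inside the cloned directory, then open a shell in the poetry environment:
cd /path/to/MapReader
poetry install
poetry shell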

How to cite MapReader

Please consider acknowledging MapReader if it helps you to obtain results and figures for publications or presentations, by citing:

Link: https://arxiv.org/abs/2111.15592

Kasra Hosseini, Daniel C. S. Wilson, Kaspar Beelen and Katherine McDonough (2021), MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale, arXiv:2111.15592.

and in BibTeX:

@misc{hosseini2021mapreader,
      title={MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale}, 
      author={Kasra Hosseini and Daniel C. S. Wilson and Kaspar Beelen and Katherine McDonough},
      year={2021},
      eprint={2111.15592},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Credits and re-use terms

Digitized maps

MapReader can retrieve maps from NLS (National Library of Scotland) via webservers. For all the digitized maps (retrieved or locally stored), please note the re-use terms:

⚠️ Use of the digitised maps for commercial purposes is currently restricted by contract. Use of these digitised maps for non-commercial purposes is permitted under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC-BY-NC-SA) licence. Please refer to https://maps.nls.uk/copyright.html#exceptions-os for details on copyright and re-use license.

Metadata

We have provided some metadata files in mapreader/persistent_data. For all these files, please note the re-use terms:

⚠️ Use of the metadata for commercial purposes is currently restricted by contract. Use of this metadata for non-commercial purposes is permitted under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC-BY-NC-SA) licence. Please refer to https://maps.nls.uk/copyright.html#exceptions-os for details on copyright and re-use license.

Acknowledgements

This work was supported by Living with Machines (AHRC grant AH/S01179X/1) and The Alan Turing Institute (EPSRC grant EP/N510129/1). Living with Machines, funded by the UK Research and Innovation (UKRI) Strategic Priority Fund, is a multidisciplinary collaboration delivered by the Arts and Humanities Research Council (AHRC), with The Alan Turing Institute, the British Library and the Universities of Cambridge, East Anglia, Exeter, and Queen Mary University of London.

Comments
  • Update README.md

    • [x] TODOs: See https://github.com/Living-with-machines/MapReader/pull/38#issuecomment-1109569025
    • [x] Rename Maps / Non-maps to Geospatial / Non-geospatial.
    • [x] @kasra-hosseini Review the changes, check the links and merge.
    opened by kasra-hosseini 24
  • Testing `MapReader`

    Hi All 👋🏼

    I will be testing the MapReader installation and running the demo notebooks to execute the analysis. I will document my process here.

    • [x] Installation
      • [x] Install: git clone git@github.com:Living-with-machines/MapReader.git
      • [x] git branch -> * dev
      • [X] git pull origin dev
      • [X] poetry install
      • [X] poetry shell
        • this command was not included in the README.md (unlike conda activate ...)
    • [x] Notebooks code execution
      • [x] 001_retrieve_patchify_plot.ipynb
      • [x] 002_annotation.ipynb
      • [x] 003_train_classifier.ipynb
      • [x] 004_inference.ipynb
    opened by ChristinaLast 18
  • :bug: some errors in `binder` deployment.

    Tasks

    • [x] Fix 'great_circle' is not defined
    • [x] Fix simplekml needs to be installed to create KML outputs!

    Associated tracebacks

    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    /tmp/ipykernel_60/1620428857.py in <module>
          3 
          4 xmin, xmax, ymin, ymax, myimg_shape, size_in_m = \
    ----> 5         mymaps.calc_pixel_width_height(all_maps[0])
    
    /srv/conda/envs/notebook/lib/python3.7/site-packages/mapreader/loader/images.py in calc_pixel_width_height(self, parent_id, calc_size_in_m)
        349 
        350         elif calc_size_in_m in ['gc', 'great-circle']:
    --> 351             bottom = great_circle((ymin, xmin), (ymin, xmax)).meters
        352             right = great_circle((ymin, xmax), (ymax, xmax)).meters
        353             top = great_circle((ymax, xmax), (ymax, xmin)).meters
    
    NameError: name 'great_circle' is not defined
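
The traceback above points to a missing import rather than a logic error. The call shape, `great_circle((lat, lon), (lat, lon)).meters`, looks like geopy's `geopy.distance.great_circle`, although that is an assumption based on the snippet alone. As a dependency-free illustration of what such a call computes, here is a haversine sketch; the name `great_circle_m` and the spherical Earth radius are choices made for this example.

```python
from math import asin, cos, radians, sin, sqrt

# Mean Earth radius in meters for a spherical model (an assumed constant).
EARTH_RADIUS_M = 6_371_009


def great_circle_m(point_a, point_b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, point_a)
    lat2, lon2 = map(radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula.
    h = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


# Mirrors the shape of the failing lines: length of the bottom edge of a
# bounding box given as (xmin, xmax, ymin, ymax) in degrees.
xmin, xmax, ymin, ymax = -0.5, 0.5, 51.0, 52.0
bottom = great_circle_m((ymin, xmin), (ymin, xmax))
print(round(bottom))
```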
    
    ---------------------------------------------------------------------------
    ModuleNotFoundError                       Traceback (most recent call last)
    /srv/conda/envs/notebook/lib/python3.7/site-packages/mapreader/loader/images.py in _createKML(self, path2kml, value, coords, counter)
        817         try:
    --> 818             import simplekml
        819         except:
    
    ModuleNotFoundError: No module named 'simplekml'
    
    During handling of the above exception, another exception occurred:
    
    ImportError                               Traceback (most recent call last)
    /tmp/ipykernel_60/28836796.py in <module>
          4             save_kml_dir="./kml_tutorial",
          5             figsize=(20, 20),
    ----> 6             image_width_resolution=600)
    
    /srv/conda/envs/notebook/lib/python3.7/site-packages/mapreader/loader/images.py in show(self, image_ids, value, plot_parent, border, border_color, vmin, vmax, colorbar, alpha, discrete_colorbar, tree_level, grid_plot, plot_histogram, save_kml_dir, image_width_resolution, kml_dpi_image, **kwds)
        675                                     value=one_image_id,
        676                                     coords=self.images["parent"][one_image_id]["coord"],
    --> 677                                     counter=-1)
        678                 else:
        679                     plt.title(one_image_id)
    
    /srv/conda/envs/notebook/lib/python3.7/site-packages/mapreader/loader/images.py in _createKML(self, path2kml, value, coords, counter)
        818             import simplekml
        819         except:
    --> 820             raise ImportError("[ERROR] simplekml needs to be installed to create KML outputs!")
        821 
        822         (lon_min, lon_max, lat_min, lat_max) = coords
    
    ImportError: [ERROR] simplekml needs to be installed to create KML outputs!
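
The second traceback shows an optional dependency guarded by a bare `except:`, which would also hide unrelated errors raised during the import. A hedged sketch of a narrower guard follows; the helper name `require` and the message wording are hypothetical and are not taken from MapReader.

```python
import importlib


def require(module_name, purpose):
    """Import an optional dependency or fail with an actionable message.

    Only ImportError is caught, so other exceptions raised while the
    module is being imported are not masked.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name} needs to be installed to {purpose} "
            f"(try: pip install {module_name})"
        ) from err


# Usage with a module that is always available in the standard library.
json = require("json", "serialise results")
print(json.dumps({"ok": True}))
```

In the KML case, the call would look like `require("simplekml", "create KML outputs")`, made at the point where the output is actually requested.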
    
    opened by ChristinaLast 6
  • 🐛 `LoadAnnotations` not returning annotation interface

    When using a local notebook to run through the annotation section of the quick_start notebook, I am unable to see the LoadAnnotations object returned in order to generate new labels! See the screenshot below:

    Screenshot 2022-05-09 at 13 58 51

    opened by ChristinaLast 3
  • d actual edits to first para

    I've restored the order to 'maps' -> 'images' so we get a clearer narrative, as in the current existing repo, and shortened / combined a sentence, as it was repeating 'non-maps' and 'maps'; I used 'any images' instead to make it more intuitive to read.

    I was also going to add a few sentences giving the nice positive spin about interdisciplinary cross-pollination of image analysis, but not sure where this should go: I don't want to break the flow to the instructions, so perhaps it can go after the bullet points?

    opened by dcsw2 2
  • Deploying `MapReader` through `binder`

    • [x] @ChristinaLast and @andrewphilipsmith to walk through binder deployment
      • [x] adding requirements.txt with no hashed libraries for binderhub deployment
    opened by ChristinaLast 2
  • Model inference in one step

    Summary

    Currently, we first need to patchify an image and then do the model inference (in two separate steps). In this issue, we plan to have a method that does both steps, i.e.,

    # example interface
    my_classifier.inference(path2image, **kwds for the slice method, including patch size, ...)
    my_classifier.plot()
    

    TODO

    • Refer to https://github.com/alan-turing-institute/mapreader-plant-scivision. Here, we have a function/method called "predict" that does model inference on an image. Under the hood, it slices an image into patches, does model inference on the patches and then plot the results (and return the predicted labels).
    • It would be interesting to have a similar function/method in MapReader.
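
One way to picture the proposed one-step method is a function that slices an image and applies a classifier to each patch in the same call. The sketch below is purely illustrative: `predict_image`, `classify_patch`, and `bright_or_dark` are hypothetical names, the image is a nested list, and the toy classifier stands in for a trained CV model.

```python
def predict_image(image, classify_patch, patch_size):
    """Slice an image into patches and classify each one in a single call.

    Returns a dict mapping (min_x, min_y, max_x, max_y) to the predicted label.
    `classify_patch` is any callable that takes a patch (list of rows).
    """
    height, width = len(image), len(image[0])
    predictions = {}
    for min_y in range(0, height, patch_size):
        for min_x in range(0, width, patch_size):
            max_y = min(min_y + patch_size, height)
            max_x = min(min_x + patch_size, width)
            patch = [row[min_x:max_x] for row in image[min_y:max_y]]
            predictions[(min_x, min_y, max_x, max_y)] = classify_patch(patch)
    return predictions


def bright_or_dark(patch):
    """Toy stand-in for a model: label a patch by its mean intensity."""
    pixels = [pixel for row in patch for pixel in row]
    return "bright" if sum(pixels) / len(pixels) > 127 else "dark"


image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
labels = predict_image(image, bright_or_dark, patch_size=2)
print(labels)
```

Passing the classifier in as a callable keeps the slicing logic independent of any particular model, so plotting could be layered on top of the returned dictionary.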
    opened by kasra-hosseini 2
  • Dev

    Creating requirements.txt from pyproject.toml to generate package list needed for binderhub build

    Commands run:

    • to generate requirements.txt
    poetry export -f requirements.txt --output requirements.txt --without-hashes
    

    After doing this, I am required to add the github repo manually to the requirements.txt file to install MapReader such as:

    git+https://github.com/Living-with-machines/[email protected]#egg=mapreader
    
    opened by ChristinaLast 1
  • Bump ipython from 8.0.0 to 8.0.1

    Bumps ipython from 8.0.0 to 8.0.1.

    dependencies 
    opened by dependabot[bot] 1
  • Add plant phenotyping example notebooks and data

    Add directory with cleaned and updated notebooks demonstrating classification of plant patches in images. Also includes examples of open access data that can be used in running these notebooks, annotation files to facilitate annotating plant vs. non-plant patches.

    opened by evangeline-corcoran 1
  • Bump pillow from 8.4.0 to 9.0.0

    Bumps pillow from 8.4.0 to 9.0.0.

    Release notes

    Sourced from pillow's releases.

    9.0.0

    https://pillow.readthedocs.io/en/stable/releasenotes/9.0.0.html

    Changelog

    Sourced from pillow's changelog.

    9.0.0 (2022-01-02)

    • Restrict builtins for ImageMath.eval(). CVE-2022-22817 #5923 [radarhere]

    • Ensure JpegImagePlugin stops at the end of a truncated file #5921 [radarhere]

    • Fixed ImagePath.Path array handling. CVE-2022-22815, CVE-2022-22816 #5920 [radarhere]

    • Remove consecutive duplicate tiles that only differ by their offset #5919 [radarhere]

    • Improved I;16 operations on big endian #5901 [radarhere]

    • Limit quantized palette to number of colors #5879 [radarhere]

    • Fixed palette index for zeroed color in FASTOCTREE quantize #5869 [radarhere]

    • When saving RGBA to GIF, make use of first transparent palette entry #5859 [radarhere]

    • Pass SAMPLEFORMAT to libtiff #5848 [radarhere]

    • Added rounding when converting P and PA #5824 [radarhere]

    • Improved putdata() documentation and data handling #5910 [radarhere]

    • Exclude carriage return in PDF regex to help prevent ReDoS #5912 [hugovk]

    • Fixed freeing pointer in ImageDraw.Outline.transform #5909 [radarhere]

    • Added ImageShow support for xdg-open #5897 [m-shinder, radarhere]

    • Support 16-bit grayscale ImageQt conversion #5856 [cmbruns, radarhere]

    • Convert subsequent GIF frames to RGB or RGBA #5857 [radarhere]

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 1
  • Satellite images (some references)

    • I just had a talk with one of the REG members on https://github.com/urbangrammarai and they are using this tool to download satellite images: https://github.com/urbangrammarai/gee_pipeline/.
    • The other option is : https://planetarycomputer.microsoft.com/
    opened by kasra-hosseini 0
  • Add `min_std_pixel` and `max_std_pixel` to `prepare_annotation`

    This would let us filter out black patches more easily. We have trained some MapReader models using ~6K annotated patches (the plant phenotyping project), and now we need to extend the dataset, particularly for non-black patches.
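
The requested filter can be sketched as a pair of optional thresholds on each patch's pixel standard deviation. This is an assumption about how such parameters might behave, not the implementation in `prepare_annotation`; the function name `filter_by_std` and the input layout are hypothetical, and only the parameter names come from the issue title.

```python
from statistics import pstdev


def filter_by_std(patches, min_std_pixel=None, max_std_pixel=None):
    """Return the ids of patches whose pixel std lies within the thresholds.

    `patches` maps a patch id to a flat list of pixel intensities.
    A threshold left as None is not applied.
    """
    selected = []
    for patch_id, pixels in patches.items():
        std = pstdev(pixels)
        if min_std_pixel is not None and std < min_std_pixel:
            continue
        if max_std_pixel is not None and std > max_std_pixel:
            continue
        selected.append(patch_id)
    return selected


patches = {
    "black": [0, 0, 0, 0],           # std is 0, a featureless patch
    "textured": [10, 200, 40, 180],
    "noisy": [0, 255, 0, 255],
}

non_black = filter_by_std(patches, min_std_pixel=1.0)
mid_range = filter_by_std(patches, min_std_pixel=1.0, max_std_pixel=100.0)
print(non_black, mid_range)
```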

    enhancement 
    opened by kasra-hosseini 1
  • Choose a tool to simplify diffs on .ipynb files.

    Consider

    • https://www.reviewnb.com/
    • https://jupyter.org/enhancement-proposals/08-notebook-diff/notebook-diff.html
    • https://blog.ouseful.info/2017/01/27/displaying-differences-in-jupyter-notebooks-nbdime-nbdiff/

    and others

    Build into workflow using pre-commit/CI as appropriate.

    opened by andrewphilipsmith 1
  • Create CODE_OF_CONDUCT.md

    @DavidBeavan Could you please review this PR? I am using "Contributor Covenant" of GitHub with the following edit:

    Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at https://livingwithmachines.ac.uk/contact-us/. All complaints will be reviewed and investigated promptly and fairly.

    opened by kasra-hosseini 0
  • Adding a notebook containing start of implementation for maps

    This PR aims to implement the requirements of this issue https://github.com/Living-with-machines/MapReader/issues/36

    For details: See https://hackmd.io/bL3y2cWdT-y3qPGkyVzD5Q?both

    Tasks:

    • [ ] Get or create annotations for example map data
    • [ ] Complete text in HackMD above and transfer it into an appropriate place within the repo. (readme.md or quick_start.ipynb etc)
    • [ ] Resolve all of the questions in the HackMD (whether adding more detail or explicitly deciding to exclude from a quick start guide).
    • [ ] Give the quick_start.ipynb (maps) and quick_start.ipynb (plants) distinct names.
    • [ ] Complete the quick_start.ipynb (maps) to, at least, the same level of detail as the quick_start.ipynb (plants).
    opened by andrewphilipsmith 1
Releases(v0.3.3)
Owner
Living with Machines
A radical collaboration between computational linguists, curators, data scientists, software engineers, geographers and historians