MapReader: A computer vision pipeline for the semantic exploration of maps at scale

Overview


MapReader is an end-to-end computer vision (CV) pipeline designed by the Living with Machines project. It has two main components: preprocessing/annotation and training/inference:

MapReader pipeline

MapReader provides a set of tools to (see the usage sketch after this list):

  • load images/maps stored locally or retrieve maps via web servers (e.g., tileservers serving maps from OpenStreetMap (OSM), the National Library of Scotland (NLS), or elsewhere). ⚠️ Refer to the credits and re-use terms section if you are using digitized maps or metadata provided by NLS.
  • preprocess images/maps (e.g., divide them into patches, resample the images, remove borders outside the neatline, or reproject the map).
  • annotate images/maps or their patches (i.e. slices of an image/map) using an interactive annotation tool.
  • train, fine-tune, and evaluate various CV models.
  • predict labels (i.e., model inference) on large sets of images/maps.
  • Other functionalities include:
    • plotting results using, e.g., matplotlib, cartopy, Google Earth, and kepler.gl.
    • computing the mean and standard deviation of pixel intensities of image patches.
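
Below is a minimal sketch of how these tools fit together when maps are stored locally. It is illustrative only: the loader call and the glob pattern are assumptions based on the demo notebooks (001_retrieve_patchify_plot.ipynb onwards), while calc_pixel_width_height and show come from mapreader/loader/images.py (see the tracebacks further down this page).

# Illustrative usage sketch (assumed API; see the demo notebooks 001-004 for the real calls)
from mapreader import loader  # assumption: top-level loader factory

# 1. Load locally stored map sheets (glob pattern assumed).
mymaps = loader("./maps/*.png")

# 2. List parent images and compute pixel width/height and approximate size in metres
#    (method and return values taken from mapreader/loader/images.py).
all_maps = list(mymaps.images["parent"].keys())
xmin, xmax, ymin, ymax, img_shape, size_in_m = mymaps.calc_pixel_width_height(all_maps[0])

# 3. Plot the first parent map.
mymaps.show(all_maps[:1], figsize=(20, 20))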

Below is an example of MapReader CV model output (see the paper on MapReader for more details):

British railspace and buildings as predicted by a MapReader computer vision model

British 'railspace' and buildings as predicted by a MapReader computer vision model. ~30.5M patches from ~16K nineteenth-century Ordnance Survey map sheets were used (courtesy of the National Library of Scotland). (a) Predicted railspace; (b) predicted buildings; (c) and (d) predicted railspace (red) and buildings (black) in and around Middlesbrough and London, respectively. MapReader extracts information from large images or a set of images at a patch level, as depicted in the insets. For both railspace and buildings, we removed those patches that had no other neighboring patches with the same label within a distance of 250 meters.
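
The neighbour-based cleanup described above (dropping predictions with no same-label neighbour within 250 metres) is a simple spatial filter. The sketch below is a generic reimplementation using scipy, not MapReader's own code, and assumes patch centroids are already available in a metric coordinate system (e.g., British National Grid).

import numpy as np
from scipy.spatial import cKDTree

def drop_isolated_patches(centroids_m, labels, target_label, radius_m=250.0):
    """Return indices of patches with `target_label` that have at least one
    other patch of the same label within `radius_m` metres.
    Generic sketch; centroids must be in a metric CRS."""
    centroids_m = np.asarray(centroids_m, dtype=float)
    labels = np.asarray(labels)
    idx = np.flatnonzero(labels == target_label)
    if idx.size == 0:
        return idx
    tree = cKDTree(centroids_m[idx])
    # Each point finds itself, so "isolated" means exactly one neighbour in the ball.
    neighbour_lists = tree.query_ball_point(centroids_m[idx], r=radius_m)
    counts = np.array([len(n) for n in neighbour_lists])
    return idx[counts > 1]

# Example (hypothetical arrays):
# kept = drop_isolated_patches(patch_centroids_m, predicted_labels, "railspace")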


Installation

Set up a conda environment

We strongly recommend installation via Anaconda:

  • Create a new environment:
conda create -n mr_py38 python=3.8
  • Activate the environment:
conda activate mr_py38

Method 1

  • Install mapreader:
pip install git+https://github.com/Living-with-machines/MapReader.git
  • Register the environment as a Jupyter kernel:
python -m ipykernel install --user --name mr_py38 --display-name "Python (mr_py38)"

Method 2

  • Clone the mapreader source code:
git clone https://github.com/Living-with-machines/MapReader.git
  • Install the dependencies and activate the environment using poetry:
cd /path/to/MapReader
poetry install
poetry shell
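
Whichever installation method you choose, a quick import check (sketched below) confirms that mapreader is visible to the active environment; the version attribute is read defensively in case it is not defined.

# Sanity check: run inside the activated environment.
import mapreader

print("MapReader imported from:", mapreader.__file__)
print("Version:", getattr(mapreader, "__version__", "unknown"))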

How to cite MapReader

If MapReader helps you to obtain results or figures for publications or presentations, please consider acknowledging it by citing:

Link: https://arxiv.org/abs/2111.15592

Kasra Hosseini, Daniel C. S. Wilson, Kaspar Beelen and Katherine McDonough (2021), MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale, arXiv:2111.15592.

and in BibTeX:

@misc{hosseini2021mapreader,
      title={MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale}, 
      author={Kasra Hosseini and Daniel C. S. Wilson and Kaspar Beelen and Katherine McDonough},
      year={2021},
      eprint={2111.15592},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Credits and re-use terms

Digitized maps

MapReader can retrieve maps from the National Library of Scotland (NLS) via web servers. For all digitized maps (retrieved or locally stored), please note the following re-use terms:

⚠️ Use of the digitised maps for commercial purposes is currently restricted by contract. Use of these digitised maps for non-commercial purposes is permitted under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC-BY-NC-SA) licence. Please refer to https://maps.nls.uk/copyright.html#exceptions-os for details on copyright and re-use license.

Metadata

We have provided some metadata files in mapreader/persistent_data. For all these files, please note the following re-use terms:

⚠️ Use of the metadata for commercial purposes is currently restricted by contract. Use of this metadata for non-commercial purposes is permitted under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC-BY-NC-SA) licence. Please refer to https://maps.nls.uk/copyright.html#exceptions-os for details on copyright and re-use license.

Acknowledgements

This work was supported by Living with Machines (AHRC grant AH/S01179X/1) and The Alan Turing Institute (EPSRC grant EP/N510129/1). Living with Machines, funded by the UK Research and Innovation (UKRI) Strategic Priority Fund, is a multidisciplinary collaboration delivered by the Arts and Humanities Research Council (AHRC), with The Alan Turing Institute, the British Library and the Universities of Cambridge, East Anglia, Exeter, and Queen Mary University of London.

Comments
  • Update README.md

    Update README.md

    • [x] TODOs: See https://github.com/Living-with-machines/MapReader/pull/38#issuecomment-1109569025
    • [x] Rename Maps / Non-maps to Geospatial / Non-geospatial.
    • [x] @kasra-hosseini Review the changes, check the links and merge.
    opened by kasra-hosseini 24
  • Testing `MapReader`

    Testing `MapReader`

    Hi All 👋🏼

    I will be testing the MapReader installation and running the demo notebooks to execute the analysis. I will document my process here.

    • [x] Installation
      • [x] Install: git clone git@github.com:Living-with-machines/MapReader.git
      • [x] git branch -> * dev
      • [X] git pull origin dev
      • [X] poetry install
      • [X] poetry shell
        • this command was not included in the README.md (unlike conda activate ...)
    • [x] Notebooks code execution
      • [x] 001_retrieve_patchify_plot.ipynb
      • [x] 002_annotation.ipynb
      • [x] 003_train_classifier.ipynb
      • [x] 004_inference.ipynb
    opened by ChristinaLast 18
  • :bug: some errors in `binder` deployment.

    :bug: some errors in `binder` deployment.

    Tasks

    • [x] Fix 'great_circle' is not defined
    • [x] Fix simplekml needs to be installed to create KML outputs!

    Associated tracebacks

    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    /tmp/ipykernel_60/1620428857.py in <module>
          3 
          4 xmin, xmax, ymin, ymax, myimg_shape, size_in_m = \
    ----> 5         mymaps.calc_pixel_width_height(all_maps[0])
    
    /srv/conda/envs/notebook/lib/python3.7/site-packages/mapreader/loader/images.py in calc_pixel_width_height(self, parent_id, calc_size_in_m)
        349 
        350         elif calc_size_in_m in ['gc', 'great-circle']:
    --> 351             bottom = great_circle((ymin, xmin), (ymin, xmax)).meters
        352             right = great_circle((ymin, xmax), (ymax, xmax)).meters
        353             top = great_circle((ymax, xmax), (ymax, xmin)).meters
    
    NameError: name 'great_circle' is not defined
    
    ---------------------------------------------------------------------------
    ModuleNotFoundError                       Traceback (most recent call last)
    /srv/conda/envs/notebook/lib/python3.7/site-packages/mapreader/loader/images.py in _createKML(self, path2kml, value, coords, counter)
        817         try:
    --> 818             import simplekml
        819         except:
    
    ModuleNotFoundError: No module named 'simplekml'
    
    During handling of the above exception, another exception occurred:
    
    ImportError                               Traceback (most recent call last)
    /tmp/ipykernel_60/28836796.py in <module>
          4             save_kml_dir="./kml_tutorial",
          5             figsize=(20, 20),
    ----> 6             image_width_resolution=600)
    
    /srv/conda/envs/notebook/lib/python3.7/site-packages/mapreader/loader/images.py in show(self, image_ids, value, plot_parent, border, border_color, vmin, vmax, colorbar, alpha, discrete_colorbar, tree_level, grid_plot, plot_histogram, save_kml_dir, image_width_resolution, kml_dpi_image, **kwds)
        675                                     value=one_image_id,
        676                                     coords=self.images["parent"][one_image_id]["coord"],
    --> 677                                     counter=-1)
        678                 else:
        679                     plt.title(one_image_id)
    
    /srv/conda/envs/notebook/lib/python3.7/site-packages/mapreader/loader/images.py in _createKML(self, path2kml, value, coords, counter)
        818             import simplekml
        819         except:
    --> 820             raise ImportError("[ERROR] simplekml needs to be installed to create KML outputs!")
        821 
        822         (lon_min, lon_max, lat_min, lat_max) = coords
    
    ImportError: [ERROR] simplekml needs to be installed to create KML outputs!
    
    opened by ChristinaLast 6
  • 🐛 `LoadAnnotations` not returning annotation interface

    🐛 `LoadAnnotations` not returning annotation interface

    When using a local notebook to run through the annotation section of the quick_start notebook, I am unable to see the LoadAnnotations object returned in order to generate new labels! See the screenshot below:

    Screenshot 2022-05-09 at 13 58 51

    opened by ChristinaLast 3
  • d actual edits to first para

    d actual edits to first para

    I've restored the order to 'maps' -> 'images' so we get a clearer narrative, as in the currently existing repo, and shortened/combined a sentence: it was repeating 'non-maps' and 'maps', so I used 'any images' instead to make it more intuitive to read.

    I was also going to add a few sentences giving a nice positive spin about interdisciplinary cross-pollination of image analysis, but I'm not sure where this should go: I don't want to break the flow of the instructions, so perhaps it can go after the bullet points?

    opened by dcsw2 2
  • Deploying `MapReader` through `binder`

    Deploying `MapReader` through `binder`

    • [x] @ChristinaLast and @andrewphilipsmith to walk through binder deployment
      • [x] adding requirements.txt with no hashed libraries for binderhub deployment
    opened by ChristinaLast 2
  • Model inference in one step

    Model inference in one step

    Summary

    Currently, we first need to patchify an image and then run model inference, in two separate steps. In this issue, we plan to add a method that does both, i.e.:

    # example interface (pseudocode)
    my_classifier.inference(path2image, **slice_kwargs)  # slice kwargs: patch size, etc.
    my_classifier.plot()
    

    TODO

    • Refer to https://github.com/alan-turing-institute/mapreader-plant-scivision. There, a function/method called "predict" does model inference on an image: under the hood, it slices the image into patches, runs inference on the patches, and then plots the results (and returns the predicted labels).
    • It would be interesting to have a similar function/method in MapReader (a rough sketch of this pattern follows below).
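
    A rough, framework-level sketch of what such a combined method could do is shown below. It uses plain PIL and PyTorch rather than MapReader's classes; the patch size, model, and function name are placeholders, not the proposed API.

    import torch
    from PIL import Image
    from torchvision import transforms

    def predict_on_patches(path2image, model, patch_size=100, device="cpu"):
        """Slice an image into square patches and run a classifier on each patch.
        Returns a list of (x_min, y_min, predicted_class_index) tuples.
        Placeholder sketch, not the MapReader API."""
        to_tensor = transforms.ToTensor()
        image = Image.open(path2image).convert("RGB")
        width, height = image.size
        model = model.to(device).eval()

        results = []
        with torch.no_grad():
            for y in range(0, height - patch_size + 1, patch_size):
                for x in range(0, width - patch_size + 1, patch_size):
                    patch = image.crop((x, y, x + patch_size, y + patch_size))
                    batch = to_tensor(patch).unsqueeze(0).to(device)
                    pred = model(batch).argmax(dim=1).item()
                    results.append((x, y, pred))
        return results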
    opened by kasra-hosseini 2
  • Dev

    Dev

    Creating requirements.txt from pyproject.toml to generate the package list needed for the binderhub build.

    Commands run:

    • To generate requirements.txt:
    poetry export -f requirements.txt --output requirements.txt --without-hashes
    

    After doing this, I need to manually add the GitHub repo to the requirements.txt file so that MapReader itself is installed, e.g.:

    git+https://github.com/Living-with-machines/[email protected]#egg=mapreader
    
    opened by ChristinaLast 1
  • Bump ipython from 8.0.0 to 8.0.1

    Bump ipython from 8.0.0 to 8.0.1

    Bumps ipython from 8.0.0 to 8.0.1.

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • Add plant phenotyping example notebooks and data

    Add plant phenotyping example notebooks and data

    Add a directory with cleaned and updated notebooks demonstrating classification of plant patches in images. It also includes examples of open-access data that can be used to run these notebooks, and annotation files to facilitate annotating plant vs. non-plant patches.

    opened by evangeline-corcoran 1
  • Bump pillow from 8.4.0 to 9.0.0

    Bump pillow from 8.4.0 to 9.0.0

    Bumps pillow from 8.4.0 to 9.0.0.

    Release notes

    Sourced from pillow's releases.

    9.0.0

    https://pillow.readthedocs.io/en/stable/releasenotes/9.0.0.html

    Changes

    ... (truncated)

    Changelog

    Sourced from pillow's changelog.

    9.0.0 (2022-01-02)

    • Restrict builtins for ImageMath.eval(). CVE-2022-22817 #5923 [radarhere]

    • Ensure JpegImagePlugin stops at the end of a truncated file #5921 [radarhere]

    • Fixed ImagePath.Path array handling. CVE-2022-22815, CVE-2022-22816 #5920 [radarhere]

    • Remove consecutive duplicate tiles that only differ by their offset #5919 [radarhere]

    • Improved I;16 operations on big endian #5901 [radarhere]

    • Limit quantized palette to number of colors #5879 [radarhere]

    • Fixed palette index for zeroed color in FASTOCTREE quantize #5869 [radarhere]

    • When saving RGBA to GIF, make use of first transparent palette entry #5859 [radarhere]

    • Pass SAMPLEFORMAT to libtiff #5848 [radarhere]

    • Added rounding when converting P and PA #5824 [radarhere]

    • Improved putdata() documentation and data handling #5910 [radarhere]

    • Exclude carriage return in PDF regex to help prevent ReDoS #5912 [hugovk]

    • Fixed freeing pointer in ImageDraw.Outline.transform #5909 [radarhere]

    • Added ImageShow support for xdg-open #5897 [m-shinder, radarhere]

    • Support 16-bit grayscale ImageQt conversion #5856 [cmbruns, radarhere]

    • Convert subsequent GIF frames to RGB or RGBA #5857 [radarhere]

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • Satellite images (some references)

    Satellite images (some references)

    • I just had a talk with one of the REG members working on https://github.com/urbangrammarai; they are using this tool to download satellite images: https://github.com/urbangrammarai/gee_pipeline/.
    • Another option is: https://planetarycomputer.microsoft.com/
    opened by kasra-hosseini 0
  • Add `min_std_pixel` and `max_std_pixel` to `prepare_annotation`

    Add `min_std_pixel` and `max_std_pixel` to `prepare_annotation`

    So that we can filter out black patches more easily. We have trained some MapReader models using ~6K annotated patches (the plant phenotyping project), and now we need to extend the dataset, particularly with non-black patches.
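
    The filtering amounts to thresholding the per-patch pixel standard deviation; near-black patches have a standard deviation close to zero. The sketch below is a generic stand-alone version of that check, not MapReader's prepare_annotation; the parameter names simply mirror the ones proposed in this issue.

    import numpy as np
    from PIL import Image

    def filter_patches_by_std(patch_paths, min_std_pixel=5.0, max_std_pixel=None):
        """Keep patch images whose greyscale pixel standard deviation lies within
        [min_std_pixel, max_std_pixel]. Generic sketch, not MapReader's API."""
        kept = []
        for path in patch_paths:
            pixels = np.asarray(Image.open(path).convert("L"), dtype=float)
            std = pixels.std()
            if std < min_std_pixel:
                continue
            if max_std_pixel is not None and std > max_std_pixel:
                continue
            kept.append(path)
        return kept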

    enhancement 
    opened by kasra-hosseini 1
  • Choose a tool to simplify diffs on .ipynb files.

    Choose a tool to simplify diffs on .ipynb files.

    Consider

    • https://www.reviewnb.com/
    • https://jupyter.org/enhancement-proposals/08-notebook-diff/notebook-diff.html
    • https://blog.ouseful.info/2017/01/27/displaying-differences-in-jupyter-notebooks-nbdime-nbdiff/

    and others

    Build into workflow using pre-commit/CI as appropriate.

    opened by andrewphilipsmith 1
  • Create CODE_OF_CONDUCT.md

    Create CODE_OF_CONDUCT.md

    @DavidBeavan Could you please review this PR? I am using "Contributor Covenant" of GitHub with the following edit:

    Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at https://livingwithmachines.ac.uk/contact-us/. All complaints will be reviewed and investigated promptly and fairly.

    opened by kasra-hosseini 0
  • Adding a notebook containing start of implementation for maps

    Adding a notebook containing start of implementation for maps

    This PR aims to implement the requirements of this issue https://github.com/Living-with-machines/MapReader/issues/36

    For details: See https://hackmd.io/bL3y2cWdT-y3qPGkyVzD5Q?both

    Tasks:

    • [ ] Get or create annotations for example map data
    • [ ] Complete text in HackMD above and transfer it into an appropriate place within the repo. (readme.md or quick_start.ipynb etc)
    • [ ] Resolve all of the questions in the HackMD (whether adding more detail or explicitly deciding to exclude from a quick start guide).
    • [ ] Give the quick_start.ipynb (maps) and quick_start.ipynb (plants) distinct names.
    • [ ] Complete the quick_start.ipynb (maps) to, at least, the same level of detail as the quick_start.ipynb (plants).
    opened by andrewphilipsmith 1
Releases: v0.3.3

Owner: Living with Machines, a radical collaboration between computational linguists, curators, data scientists, software engineers, geographers and historians.