How to use COG's (Cloud optimized GeoTIFFs) with Rasterio

Related tags

Geolocationcog_how_to
Overview

How to use COG's (Cloud optimized GeoTIFFs) with Rasterio

According to Cogeo.org:

A Cloud Opdtimized GeoTIFF (COG) is a regular GeoTIFF file, aimed at being hosted on a HTTP file server, with an internal organization that enables more efficient workflows on the cloud. It does this by leveraging the ability of clients issuing ​HTTP GET range requests to ask for just the parts of a file they need.

Think about the following case: You want to analyze the NDVI of your local 1km² park by using Sentinel 2 geoTIFF imaginery. Sentinel 2 satellite images cover very big regions. In the past, you had to download the whole file (100mb +) for band 4 (red) and the whole file for band 8 (near infrared) even that in fact, you need only a small portion of the data. That's why COG's (cloud optimized geoTIFFs) have been invented. With them, we ask the server to only send specific bytes of the image.

Cloud optimized geoTIFFs offer:

  • efficient imaginery data access
  • reduced duplication of data
  • legacy compatibility

COG's can be read just like normal geoTIFFs. In our example, we will use an AOI (area of interest), that is described in a geoJSON. We will also use sat-search to query the latest available Sentinel-2 satellite imaginery for our specific location. Then we will use Rasterio to perform a range request to download only the parts of the files we need. We will also use Pyproj to perform neccessary coordinate transformations. The cloud optimized Sentinel 2 imaginery is hosted in a AWS S3 repository.

Install libraries (matplotlib optional)

pip install rasterio pyproj sat-search matplotlib

Import libraries

from satsearch import Search
from datetime import datetime, timedelta
from pyproj import Transformer
from json import load

import rasterio
from rasterio.features import bounds

First, we need to open our geoJSON file and extract the geometry. To create a geoJSON, you can go to geojson.io. Do not make a very large geoJSON (a good size is 1x1km²), otherwise you might get an error later.

file_path = "path/to/your/file.geojson"
with open(file_path,"r") as fp:
    file_content = load(fp)
geometry = file_content["features"][0]["geometry"]

We will query for images not older than 60 days that contain less than 20% clouds.

# search last 60 days
current_date = datetime.now()
date_60_days_ago = current_date - timedelta(days=60)
current_date = current_date.strftime("%Y-%m-%d")
date_60_days_ago = date_60_days_ago.strftime("%Y-%m-%d")

# only request images with cloudcover less than 20%
query = {
    "eo:cloud_cover": {
        "lt": 20
        }
    }
search = Search(
    url='https://earth-search.aws.element84.com/v0',
    intersects=geometry,
    datetime=date_60_days_ago + "/" + current_date,
    collections=['sentinel-s2-l2a-cogs'],
    query=query
    )        
# grep latest red && nir
items = search.items()
latest_data = items.dates()[-1]
red = items[0].asset('red')["href"]
nir = items[0].asset('nir')["href"]
print(f"Latest data found that intersects geometry: {latest_data}")
print(f"Url red band: {red}")
print(f"Url nir band: {nir}")

Now we got the URLs of the most recent Sentinel 2 imaginery for our region. In the next step, we need to calculate which pixels to query from our geoTIFF server. The satellite image comes with 10980 x 10980 pixels. Every pixel represents 10 meter ground resolution. In order to calculate which pixels fall into our area of interest, we need to reproject our geoJSON coordinates into pixel row/col. With the recent Rasterio versions, we can read COGs by passing a rasterio.windows.Window (that specifies which row/col to query) to the read function. Before we can query, we need to open a virtual file(urls of a hosted file):

for geotiff_file in [red, nir]:
    with rasterio.open(geotiff_file) as geo_fp:

Then, we calculate the bounding box around our geometry and use the pyproj.Transformer to transform our geoJSON coordinates (EPSG 4326) into Sentinel Sat's EPSG 32633 projection.

        bbox = bounds(geometry)
        coord_transformer = Transformer.from_crs("epsg:4326", geo_fp.crs) 
        # calculate pixels to be streamed in cog 
        coord_upper_left = coord_transformer.transform(bbox[3], bbox[0])
        coord_lower_right = coord_transformer.transform(bbox[1], bbox[2]) 

Now that we have the right coordinates, we can calculate from coordinates to pixels in our geoTIFF file using rasterio.

        pixel_upper_left = geo_fp.index(
            coord_upper_left[0], 
            coord_upper_left[1]
            )
        pixel_lower_right = geo_fp.index(
            coord_lower_right[0], 
            coord_lower_right[1]
            )
        
        for pixel in pixel_upper_left + pixel_lower_right:
            # If the pixel value is below 0, that means that
            # the bounds are not inside of our available dataset.
            if pixel < 0:
                print("Provided geometry extends available datafile.")
                print("Provide a smaller area of interest to get a result.")
                exit()

Now we are ready for the desired range request.

        # make http range request only for bytes in window
        window = rasterio.windows.Window.from_slices(
            (
            pixel_upper_left[0], 
            pixel_lower_right[0]
            ), 
            (
            pixel_upper_left[1], 
            pixel_lower_right[1]
            )
        )
        subset = geo_fp.read(1, window=window)

The subset object contains the desired data. We can access and vizualize it with:

        import matplotlib.pyplot as plt
        plt.imshow(subset, cmap="seismic")
        plt.colorbar()

red nir

I hope, I was able to show you how COG's work and that you are ready now to access your cloud optimized geoTIFF images in seconds compared to minutes in the past. Have a great day!

All together:

from satsearch import Search
from datetime import datetime, timedelta
from pyproj import Transformer
from json import load

import rasterio
from rasterio.features import bounds

file_path = "path/to/your/file.geojson"
with open(file_path,"r") as fp:
    file_content = load(fp)
geometry = file_content["features"][0]["geometry"]

# search last 60 days
current_date = datetime.now()
date_60_days_ago = current_date - timedelta(days=60)
current_date = current_date.strftime("%Y-%m-%d")
date_60_days_ago = date_60_days_ago.strftime("%Y-%m-%d")

# only request images with cloudcover less than 20%
query = {
    "eo:cloud_cover": {
        "lt": 20
        }
    }
search = Search(
    url='https://earth-search.aws.element84.com/v0',
    intersects=geometry,
    datetime=date_60_days_ago + "/" + current_date,
    collections=['sentinel-s2-l2a-cogs'],
    query=query
    )        
# grep latest red && nir
items = search.items()
latest_data = items.dates()[-1]
red = items[0].asset('red')["href"]
nir = items[0].asset('nir')["href"]
print(f"Latest data found that intersects geometry: {latest_data}")
print(f"Url red band: {red}")
print(f"Url nir band: {nir}")

for geotiff_file in [red, nir]:
    with rasterio.open(geotiff_file) as geo_fp:
        bbox = bounds(geometry)
        coord_transformer = Transformer.from_crs("epsg:4326", geo_fp.crs) 
        # calculate pixels to be streamed in cog 
        coord_upper_left = coord_transformer.transform(bbox[3], bbox[0])
        coord_lower_right = coord_transformer.transform(bbox[1], bbox[2]) 
        pixel_upper_left = geo_fp.index(
            coord_upper_left[0], 
            coord_upper_left[1]
            )
        pixel_lower_right = geo_fp.index(
            coord_lower_right[0], 
            coord_lower_right[1]
            )
        
        for pixel in pixel_upper_left + pixel_lower_right:
            # If the pixel value is below 0, that means that
            # the bounds are not inside of our available dataset.
            if pixel < 0:
                print("Provided geometry extends available datafile.")
                print("Provide a smaller area of interest to get a result.")
                exit()
        
        # make http range request only for bytes in window
        window = rasterio.windows.Window.from_slices(
            (
            pixel_upper_left[0], 
            pixel_lower_right[0]
            ), 
            (
            pixel_upper_left[1], 
            pixel_lower_right[1]
            )
        )
        subset = geo_fp.read(1, window=window)

        # vizualize
        import matplotlib.pyplot as plt
        plt.imshow(subset, cmap="seismic")
        plt.colorbar()
        plt.show()
Owner
Marvin Gabler
specialized in climate, data & risk | interested in nature, rockets and outer space | The earth's data for our world's future
Marvin Gabler
Simulation and Parameter Estimation in Geophysics

Simulation and Parameter Estimation in Geophysics - A python package for simulation and gradient based parameter estimation in the context of geophysical applications.

SimPEG 390 Dec 15, 2022
QLUSTER is a relative orbit design tool for formation flying satellite missions and space rendezvous scenarios

QLUSTER is a relative orbit design tool for formation flying satellite missions and space rendezvous scenarios, that I wrote in Python 3 for my own research and visualisation. It is currently unfinis

Samuel Low 9 Aug 23, 2022
An API built to format given addresses using Python and Flask.

An API built to format given addresses using Python and Flask. About The API returns properly formatted data, i.e. removing duplicate fields, distingu

1 Feb 27, 2022
Blender addons to make the bridge between Blender and geographic data

Blender GIS Blender minimal version : 2.8 Mac users warning : currently the addon does not work on Mac with Blender 2.80 to 2.82. Please do not report

5.9k Jan 02, 2023
Program that shows all the details of the given IP address. Build with Python and ipinfo.io API

ip-details This is a program that shows all the details of the given IP address. Build with Python and ipinfo.io API Usage To use this program, run th

4 Mar 01, 2022
This is a simple python code to get IP address and its location using python

IP address & Location finder @DEV/ED : Pavan Ananth Sharma Dependencies: ip2geotools Note: use pip install ip2geotools to install this in your termin

Pavan Ananth Sharma 2 Jul 05, 2022
Tools for the extraction of OpenStreetMap street network data

OSMnet Tools for the extraction of OpenStreetMap (OSM) street network data. Intended to be used in tandem with Pandana and UrbanAccess libraries to ex

Urban Data Science Toolkit 47 Sep 21, 2022
Simple, concise geographical visualization in Python

Geographic visualizations for HoloViews. Build Status Coverage Latest dev release Latest release Docs What is it? GeoViews is a Python library that ma

HoloViz 445 Jan 02, 2023
A simple python script that, given a location and a date, uses the Nasa Earth API to show a photo taken by the Landsat 8 satellite. The script must be executed on the command-line.

What does it do? Given a location and a date, it uses the Nasa Earth API to show a photo taken by the Landsat 8 satellite. The script must be executed

Caio 42 Nov 26, 2022
Focal Statistics

Focal-Statistics The Focal statistics tool in many GIS applications like ArcGIS, QGIS and GRASS GIS is a standard method to gain a local overview of r

Ifeanyi Nwasolu 1 Oct 21, 2021
Solving the Traveling Salesman Problem using Self-Organizing Maps

Solving the Traveling Salesman Problem using Self-Organizing Maps This repository contains an implementation of a Self Organizing Map that can be used

Diego Vicente 3.1k Dec 31, 2022
peartree: A library for converting transit data into a directed graph for sketch network analysis.

peartree 🍐 🌳 peartree is a library for converting GTFS feed schedules into a representative directed network graph. The tool uses Partridge to conve

Kuan Butts 183 Dec 29, 2022
Expose a GDAL file as a HTTP accessible on-the-fly COG

cogserver Expose any GDAL recognized raster file as a HTTP accessible on-the-fly COG (Cloud Optimized GeoTIFF) The on-the-fly COG file is not material

Even Rouault 73 Aug 04, 2022
A utility to search, download and process Landsat 8 satellite imagery

Landsat-util Landsat-util is a command line utility that makes it easy to search, download, and process Landsat imagery. Docs For full documentation v

Development Seed 681 Dec 07, 2022
A ninja python package that unifies the Google Earth Engine ecosystem.

A Python package that unifies the Google Earth Engine ecosystem. EarthEngine.jl | rgee | rgee+ | eemont GitHub: https://github.com/r-earthengine/ee_ex

47 Dec 27, 2022
This program analizes films database with adresses, and creates a folium map with closest films to the coordinates

Films-map-project UCU CS lab 1.2, 1st year This program analizes films database with adresses, and creates a folium map with closest films to the coor

Artem Moskovets 1 Feb 09, 2022
Geospatial web application developed uisng earthengine, geemap, and streamlit.

geospatial-streamlit Geospatial web applications developed uisng earthengine, geemap, and streamlit. App 1 - Land Surface Temperature A simple, code-f

13 Nov 27, 2022
GeoIP Legacy Python API

MaxMind GeoIP Legacy Python Extension API Requirements Python 2.5+ or 3.3+ GeoIP Legacy C Library 1.4.7 or greater Installation With pip: $ pip instal

MaxMind 230 Nov 10, 2022
ProjPicker (projection picker) is a Python module that allows the user to select all coordinate reference systems (CRSs)

ProjPicker ProjPicker (projection picker) is a Python module that allows the user to select all coordinate reference systems (CRSs) whose extent compl

Huidae Cho 4 Feb 06, 2022
Django model field that can hold a geoposition, and corresponding widget

django-geoposition A model field that can hold a geoposition (latitude/longitude), and corresponding admin/form widget. Prerequisites Starting with ve

Philipp Bosch 324 Oct 17, 2022