Currentness of Buildings#

In this notebook we demonstrate how to analyze and visualize the up-to-date-ness or currentness of the latest OSM data.

These are the steps you see further down:

  • Set the connection parameters.

  • Prepare your input parameters, e.g. define area of interest and OSM tag filter.

  • Download data using PyIceberg and DuckDB.

  • Plot currentness chart with Polars.

  • Create a Map, an interactive Slider to filter the map data.

Getting started#

Set connection params.

from dotenv import load_dotenv
load_dotenv()
True
import os

s3_user = os.environ["S3_ACCESS_KEY_ID"]  # add your user here
s3_password = os.environ["S3_SECRET_ACCESS_KEY"]  # add your password here

Configure DuckDB.

!pip install duckdb==1.0.0
import duckdb

con = duckdb.connect(
    config={
        'threads': 8,
        'max_memory': '8GB'
    }
)
con.install_extension("spatial")
con.load_extension("spatial")

Set the connection params to Iceberg Rest Catalog.

from pyiceberg.catalog.rest import RestCatalog

catalog = RestCatalog(
    name="default",
    **{
        "uri": "https://sotm2024.iceberg.ohsome.org",
        "s3.endpoint": "https://sotm2024.minio.heigit.org",
        "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",
        "s3.access-key-id": s3_user,
        "s3.secret-access-key": s3_password,
        "s3.region": "eu-central-1"
    }
)

Prepare the input parameters for your analysis#

# Set iceberg table
namespace = 'geo_sort'
tablename = 'contributions'
icebergtable = catalog.load_table((namespace, tablename))

# Define status filter
status = 'latest'

# Define location filter
bboxes = {
    'heidelberg': (8.629761, 49.379556, 8.742371, 49.437890),
    'nairobi': (36.650938, -1.444471, 37.103887, -1.163522),
    'mannheim': (8.41416, 49.410362, 8.58999, 49.590489), 
    'berlin': (13.088345, 52.338271, 13.761161, 52.675509)
}

selected_region = 'heidelberg'
xmin, ymin, xmax, ymax = bboxes[selected_region]

# Define geometry type filter
geometry_type = 'Polygon'

Get the Data#

First, we do an iceberg table scan with a pre-filter. This is a fast way to download all potential OSM elements that are needed for our analysis.

We have optmized the Iceberg table to allow filtering for:

  • status

  • geometry type

  • location (approximated by the bounding box of each contribution)

import time
start_time = time.time()

icebergtable.scan(
    row_filter=(
        f"status = '{status}' "
        f"and geometry_type = '{geometry_type}' "
        f"and (xmax >= {xmin} and xmin <= {xmax}) "
        f"and (ymax >= {ymin} and ymin <= {ymax}) "
    ),
    selected_fields=(
        "valid_from",
        "tags",
        "geometry",
        "bbox"
    )
).to_duckdb('raw_osm_data',connection=con)

download_time = round(time.time() - start_time, 3)
print(f"download took {download_time} sec.")
download took 18.195 sec.

Second, we use DuckDB to perform the more detailed filtering. In this step we can filter for:

  • tags

  • location (using the exact geometry of each OSM contribution)

import time
start_time = time.time()

query = f"""
DROP TABLE IF EXISTS osm_data;
CREATE TABLE osm_data AS
(
SELECT
    a.valid_from,
    a.tags,
    a.geometry   
FROM
    raw_osm_data as a,
WHERE 1=1
    and tags['building'][1] is not null
    and tags['building'][1] != 'no'
)
;
"""
con.sql(query)

processing_time = round(time.time() - start_time, 3)
print(f"processing took {processing_time} sec.")
processing took 0.143 sec.

Plot currentness chart with polars and hvplot#

import polars as pl

chart_query = """
SELECT
    date_trunc('month', valid_from) as month,
    count(*) as n_edits
    FROM osm_data
    GROUP BY month
    ORDER BY month
"""

df = con.sql(chart_query).pl()
df.plot.step(
    x="month",
    y="n_edits"
)

Display currentness of OSM features on map#

Get data from DucDKB into GeoPandas dataframe.

import geopandas as gpd

map_query = """
    SELECT
        epoch_ms(valid_from) as valid_from,
        geometry
    FROM osm_data;
"""

df = con.sql(map_query).df()

# convert the data to geodata
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.GeoSeries.from_wkt(df['geometry'])
).set_crs('epsg:4326')

Define map parameters and style.

import datetime
import lonboard
from palettable.matplotlib import Viridis_20


# compute lonboard color style for contious color map
min_valid_from = 1000 * datetime.datetime(2007,1,1).timestamp()
max_valid_from = 1000 * datetime.datetime(2024,6,1).timestamp()

# normalized color values from 0 to 1
valid_from_style = gdf["valid_from"].apply(
    lambda x: (x - min_valid_from) / (max_valid_from - min_valid_from))


# the lonboard map definition
layer = lonboard.SolidPolygonLayer.from_geopandas(
    gdf,
    extensions=[lonboard.layer_extension.DataFilterExtension(filter_size=1)],
    get_filter_value=gdf["valid_from"],  # replace with desired column
    filter_range=[min_valid_from, max_valid_from],  # replace with desired filter range
    get_fill_color=lonboard.colormap.apply_continuous_cmap(valid_from_style, Viridis_20, alpha=1)

)

currentness_map = lonboard.Map(
    basemap_style=lonboard.basemap.CartoBasemap.DarkMatter,
    layers=[layer]
)

Define Dates Slider Selection Widget and link to map filter range.

from datetime import date, timedelta
import ipywidgets
from traitlets import directional_link

start = datetime.datetime(2007,1,1)
end = datetime.datetime(2024,6,1)
delta = end - start   # returns timedelta
dates = [start + timedelta(days=i) for i in range(delta.days + 1)]
options = [(i.strftime('%d-%b-%Y'), int(1000* i.timestamp())) for i in dates]

date_slider = ipywidgets.SelectionRangeSlider(
    options=options,
    index=(0, len(dates)-1),
    description='Last Edit:',
    layout=ipywidgets.Layout(width='1000px'),
    disabled=False
)

directional_link(
    (date_slider, 'value'),
    (layer, "filter_range"),
)
<traitlets.traitlets.directional_link at 0x72751e3d7650>

Display the map. Have fun exploring and moving around the time slider!

display(currentness_map, date_slider)