What’s special about spatial

What is (Geo) Spatial Data?

Spatial data is information about a phenomena organized in a spatial frame - the geographic frame

Spatial analysis then is a collection of methods applied to spatial data that

  • add value
  • reveal, explain patterns and anomalies
  • support decisions

GeoData and spatial data? Often used synonymous. Geo explicitly means earth as the spatial reference

Geographic Information Systems (GIS) is another term often used to describe spatial data or spatial analysis. Often it refers to (desktop) software that is driven at its heart by spatial data but has many other tasks besides spatial analysis such as visualization, editing and management.

Layer Principle

Data for this workshop

We use already preprocessed datasets. If you are interested in running the data preparation yourself please use the script here: https://github.com/GIScience/global-health-academy/blob/main/00data_preparation.Rmd. The data origin is:

  1. RKI COVID 19 daily reportings: https://www.arcgis.com/home/item.html?id=f10774f1c63e40168479a1feb6c7ca74
  2. Kreis Grenzen: https://gdz.bkg.bund.de/index.php/default/verwaltungsgebiete-1-250-000-mit-einwohnerzahlen-ebenen-stand-31-12-vg250-ew-ebenen-31-12.html
  3. INKAR Socio-economic indicators: https://www.inkar.de/

Spatial data in R

The sf package extends R data.frame class to spatial dataframes:

https://r-spatial.github.io/sf/articles/sf1.html

Geometry types

https://autogis-site.readthedocs.io/en/latest/notebooks/L1/geometric-objects.html

How does this look like for our Kreise dataset.

kreiseSf <- st_read("data/vg250-ew_12-31.utm32s.shape.ebenen/vg250-ew_ebenen_1231/VG250_KRS.shp",
    quiet = T)

class(kreiseSf)
[1] "sf"         "data.frame"
kreiseSf %>%
    dplyr::select(GEN, BEZ, EWZ, geometry)
Simple feature collection with 431 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 280371.1 ymin: 5235856 xmax: 921292.4 ymax: 6101444
Projected CRS: ETRS89 / UTM zone 32N
First 10 features:
                   GEN              BEZ    EWZ                       geometry
1            Flensburg Kreisfreie Stadt  90164 MULTIPOLYGON (((526513.8 60...
2                 Kiel Kreisfreie Stadt 246794 MULTIPOLYGON (((575841.6 60...
3               Lübeck Kreisfreie Stadt 216530 MULTIPOLYGON (((623056.2 59...
4           Neumünster Kreisfreie Stadt  80196 MULTIPOLYGON (((565015.7 60...
5         Dithmarschen            Kreis 133193 MULTIPOLYGON (((505053.4 60...
6  Herzogtum Lauenburg            Kreis 198019 MULTIPOLYGON (((614093.1 59...
7        Nordfriesland            Kreis 165951 MULTIPOLYGON (((464810.7 61...
8          Ostholstein            Kreis 200539 MULTIPOLYGON (((637021.9 60...
9            Pinneberg            Kreis 316103 MULTIPOLYGON (((549382.6 59...
10                Plön            Kreis 128686 MULTIPOLYGON (((586149.4 60...
kreiseSf %>%
    dplyr::select(geometry) %>%
    plot()

kreiseSf %>%
    dplyr::filter(GEN == "Heidelberg") %>%
    dplyr::select(geometry) %>%
    plot()

kreiseSf %>%
    dplyr::filter(GEN == "Rhein-Neckar-Kreis" & BEZ == "Landkreis") %>%
    dplyr::select(geometry) %>%
    plot()

The major part of preparation was to take the COVID 19 daily reportings and join them onto the Kreis geometries. The COVID data contains an implicit spatial reference: The name and a ID of the Kreis. With the join of the actual geometries of the Kreis boundaries, the COVID data becomes spatially explicit.

Geometric operations

The geometry attribute has properties that can be used with geometric methods. Spatial relations and predicates can be used to filter, select and join datasets. Geometric processing can be used to change the geometric shape.

https://www.e-education.psu.edu/maps/l2_p5.html

https://subscription.packtpub.com/book/big+data+and+business+intelligence/9781783555079/6/ch06lvl1sec45/introduction

During preprocessing of the data we came across the cases of Eisenach and Berlin. Eisenach and its neighboring Wartburgkreis have been merged at first of July 2021. So in the middle of the pandemic with reportings before and reportings after. The administrative boundaries do not reflect this. We will dissolve the two units into one and sum up the COVID reprotings for both Kreise.

Dissolve Eisenach and Wartburgkreis

eisenach_wartburg <- kreiseSf %>%
    dplyr::filter(GEN %in% c("Eisenach", "Wartburgkreis"))
plot(kreiseSf$geometry, col = "lightgrey", xlim = st_bbox(eisenach_wartburg)[c(1,
    3)], ylim = st_bbox(eisenach_wartburg)[c(2, 4)])
plot(eisenach_wartburg["GEN"], add = T)

Add a dummy field that contains everywhere distinct values but not for the two districts if interest:

idx <- which(kreiseSf$GEN %in% c("Eisenach", "Wartburgkreis"))
kreiseSf$dummy <- 1:nrow(kreiseSf)
kreiseSf$dummy[idx] <- 0

Dissolve by dummy field and recalculate attribute fields. As Eisenach comes first we use last to get the different identifiers from Wartburgkreis. For the inhabitants (EWZ) we have to sum the two values.

# head(kreiseSf)
kreiseSfDissolved <- kreiseSf %>%
    group_by(dummy) %>%
    summarize(RS_0 = last(RS_0), EWZ = sum(EWZ), GEN = last(GEN), GF = last(GF),
        BEZ = last(BEZ), ADE = last(ADE), BSG = last(BSG), ARS = last(ARS), AGS = last(ARS),
        SDV_ARS = last(SDV_ARS), ARS_0 = last(ARS_0), AGS_0 = last(AGS_0), RS = last(RS))

Check the result

kreiseSfDissolved %>%
    dplyr::filter(GEN %in% c("Eisenach", "Wartburgkreis"))
Simple feature collection with 1 feature and 14 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 562009.3 ymin: 5608755 xmax: 613339.8 ymax: 5667664
Projected CRS: ETRS89 / UTM zone 32N
# A tibble: 1 × 15
  dummy RS_0      EWZ GEN        GF BEZ      ADE   BSG ARS   AGS   SDV_ARS ARS_0
* <dbl> <chr>   <dbl> <chr>   <int> <chr>  <int> <int> <chr> <chr> <chr>   <chr>
1     0 16063… 161224 Wartbu…     4 Landk…     4     1 16063 16063 160635… 1606…
# … with 3 more variables: AGS_0 <chr>, RS <chr>, geometry <MULTIPOLYGON [m]>
plot(kreiseSf$geometry, col = "lightgrey", xlim = st_bbox(eisenach_wartburg)[c(1,
    3)], ylim = st_bbox(eisenach_wartburg)[c(2, 4)])
# plot(kreiseSfDissolved['GEN'], add=T)
filter(kreiseSfDissolved, GEN %in% c("Eisenach", "Wartburgkreis")) %>%
    dplyr::select(GEN) %>%
    plot(add = T)

Now we see only one row/feature in our spatial dataframe.

Add Berliner Stadtbezirke

Berlin is reported at a finer scale by the RKI data than the administrative units and INKAR. We would like to use the available data and merge our Kreise with Berlins other boundaries.

berBzkSf <- st_read("data/berlin_bezirke.gpkg", quiet = T)
berBzkSf
Simple feature collection with 97 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 370000.8 ymin: 5799521 xmax: 415786.6 ymax: 5837259
Projected CRS: ETRS89 / UTM zone 33N
First 10 features:
               uuid          sch            nam      gdf             bezeich
1  DEBE04YY5000000B 110000040403  Schmargendorf  3588120 AX_KommunalesGebiet
2  DEBE04YY50000001 110000040405        Westend 13527378 AX_KommunalesGebiet
3  DEBE12YYJ000000H 110000121208         Lübars  5010988 AX_KommunalesGebiet
4  DEBE12YYJ000000B 110000121205        Frohnau  7839990 AX_KommunalesGebiet
5  DEBE01YYK0000001 110000010103   Hansaviertel   528303 AX_KommunalesGebiet
6  DEBE05YYQ000000F 110000050505          Gatow 10112629 AX_KommunalesGebiet
7  DEBE05YYQ000000B 110000050506         Kladow 14779080 AX_KommunalesGebiet
8  DEBE11YYH0000001 110000111104     Falkenberg  3053073 AX_KommunalesGebiet
9  DEBE04YY5000000D 110000040404      Grunewald 22354409 AX_KommunalesGebiet
10 DEBE09YYO000000J 110000090903 Baumschulenweg  4824099 AX_KommunalesGebiet
                             geom
1  MULTIPOLYGON (((384220.5 58...
2  MULTIPOLYGON (((383406.4 58...
3  MULTIPOLYGON (((389380.6 58...
4  MULTIPOLYGON (((384115.8 58...
5  MULTIPOLYGON (((387740.3 58...
6  MULTIPOLYGON (((376907.4 58...
7  MULTIPOLYGON (((373467.8 58...
8  MULTIPOLYGON (((402860.1 58...
9  MULTIPOLYGON (((381531.8 58...
10 MULTIPOLYGON (((397435.4 58...
plot(berBzkSf["nam"])

The Berlin data is even finer than Bezirk level. In the following code-chunk we check the consistency of our two spatial datasets.

# Aggregate to Bezirke
berSkSf <- berBzkSf %>%
    mutate(IdBerlinSK = paste0(substr(sch, 1, 2), substr(sch, 6, 8))) %>%
    group_by(IdBerlinSK) %>%
    summarise()  #union boundaries

# Filter non Berlin districts
kreise <- kreiseSf %>%
    filter(GEN != "Berlin") %>%
    dplyr::select(c(ARS, EWZ))
# Reproject to consistent coord system
kreise <- kreise %>%
    st_transform(st_crs(kreiseSf))
berSkSf <- berSkSf %>%
    st_transform(st_crs(kreiseSf))
# Filter Districts that are close to Berlin
berProximity <- kreise %>%
    filter(st_intersects(kreiseSf %>%
        filter(GEN == "Berlin") %>%
        summarise() %>%
        st_buffer(500), geometry, sparse = FALSE, prepared = TRUE))

plot(berSkSf$geom, col = sf.colors(categorical = TRUE, alpha = 0.5))
plot(berProximity, add = T)

plot(berSkSf %>%
    filter(IdBerlinSK == "11005") %>%
    dplyr::select(geom))
plot(kreise$geom, border = "red", add = T, lwd = 1)

We see that the data does not seamingless fit. We prefer to avoid topological errors that could arise from false gaps between the polygons which could later affect the neighboordhod estimation.

In the following we extent every polygon of Berlin and subtract it with all others including the surrounding polygons of Brandenburg.

We see the boundaries do not consistently fit. In order to be able to do neighborhood analysis that leverages from the actual boundary geometries we need to fix this. A possible processing chain would be to extent (buffer) every Bezirk in Berlin and then subtract it with all others, including the surrounding Kreise of Brandenburg. Due to shortage of time however, we just aggregated the COVID case data from Bezirke to the whole of Berlin.

Neighborhoods

nb <- poly2nb(kreiseSf)  # genearte contigous neighborhood
cards <- card(nb)  # tally amount of nb per region
maxconts <- which(cards == max(cards))  # get the unit with the most nb

mostconts <- kreiseSf[maxconts, ]  # extract unit with most nb
conts <- kreiseSf[nb[[maxconts]], ]  # extract nb units

# plot all sequentially
plot(kreiseSf$geometry, xlim = st_bbox(conts)[c(1, 3)], ylim = st_bbox(conts)[c(2,
    4)])
plot(conts$geometry, col = "green", add = T)
plot(mostconts$geometry, col = "red", add = T)
title(main = "Region with largest number of contiguities", cex.main = 0.6)

The Kreis with the most neighbors is Rhein-Pfalz-Kreis with 12 contigous neighbors. More about spatial neighborhoods and how to compute them in session 3.


Back to overview

All contents are licensed under GNU General Public License v3.0