What’s special about spatial
What is (Geo) Spatial Data?
Spatial data is information about a phenomena organized in a spatial frame - the geographic frame
Spatial analysis then is a collection of methods applied to spatial data that
- add value
- reveal, explain patterns and anomalies
- support decisions
GeoData and spatial data? Often used synonymous. Geo explicitly means earth as the spatial reference
Geographic Information Systems (GIS) is another term often used to describe spatial data or spatial analysis. Often it refers to (desktop) software that is driven at its heart by spatial data but has many other tasks besides spatial analysis such as visualization, editing and management.
Layer Principle
Data for this workshop
We use already preprocessed datasets. If you are interested in running the data preparation yourself please use the script here: https://github.com/GIScience/global-health-academy/blob/main/00data_preparation.Rmd. The data origin is:
- RKI COVID 19 daily reportings: https://www.arcgis.com/home/item.html?id=f10774f1c63e40168479a1feb6c7ca74
- Kreis Grenzen: https://gdz.bkg.bund.de/index.php/default/verwaltungsgebiete-1-250-000-mit-einwohnerzahlen-ebenen-stand-31-12-vg250-ew-ebenen-31-12.html
- INKAR Socio-economic indicators: https://www.inkar.de/
Spatial data in R
The sf
package extends R data.frame
class to spatial dataframes:
https://r-spatial.github.io/sf/articles/sf1.html
Geometry types
https://autogis-site.readthedocs.io/en/latest/notebooks/L1/geometric-objects.html
How does this look like for our Kreise dataset.
<- st_read("data/vg250-ew_12-31.utm32s.shape.ebenen/vg250-ew_ebenen_1231/VG250_KRS.shp",
kreiseSf quiet = T)
class(kreiseSf)
[1] "sf" "data.frame"
%>%
kreiseSf ::select(GEN, BEZ, EWZ, geometry) dplyr
Simple feature collection with 431 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 280371.1 ymin: 5235856 xmax: 921292.4 ymax: 6101444
Projected CRS: ETRS89 / UTM zone 32N
First 10 features:
GEN BEZ EWZ geometry
1 Flensburg Kreisfreie Stadt 90164 MULTIPOLYGON (((526513.8 60...
2 Kiel Kreisfreie Stadt 246794 MULTIPOLYGON (((575841.6 60...
3 Lübeck Kreisfreie Stadt 216530 MULTIPOLYGON (((623056.2 59...
4 Neumünster Kreisfreie Stadt 80196 MULTIPOLYGON (((565015.7 60...
5 Dithmarschen Kreis 133193 MULTIPOLYGON (((505053.4 60...
6 Herzogtum Lauenburg Kreis 198019 MULTIPOLYGON (((614093.1 59...
7 Nordfriesland Kreis 165951 MULTIPOLYGON (((464810.7 61...
8 Ostholstein Kreis 200539 MULTIPOLYGON (((637021.9 60...
9 Pinneberg Kreis 316103 MULTIPOLYGON (((549382.6 59...
10 Plön Kreis 128686 MULTIPOLYGON (((586149.4 60...
%>%
kreiseSf ::select(geometry) %>%
dplyrplot()
%>%
kreiseSf ::filter(GEN == "Heidelberg") %>%
dplyr::select(geometry) %>%
dplyrplot()
%>%
kreiseSf ::filter(GEN == "Rhein-Neckar-Kreis" & BEZ == "Landkreis") %>%
dplyr::select(geometry) %>%
dplyrplot()
The major part of preparation was to take the COVID 19 daily reportings and join them onto the Kreis geometries. The COVID data contains an implicit spatial reference: The name and a ID of the Kreis. With the join of the actual geometries of the Kreis boundaries, the COVID data becomes spatially explicit.
Geometric operations
The geometry attribute has properties that can be used with geometric methods. Spatial relations and predicates can be used to filter, select and join datasets. Geometric processing can be used to change the geometric shape.
https://www.e-education.psu.edu/maps/l2_p5.html
During preprocessing of the data we came across the cases of Eisenach and Berlin. Eisenach and its neighboring Wartburgkreis have been merged at first of July 2021. So in the middle of the pandemic with reportings before and reportings after. The administrative boundaries do not reflect this. We will dissolve the two units into one and sum up the COVID reprotings for both Kreise.
Dissolve Eisenach and Wartburgkreis
<- kreiseSf %>%
eisenach_wartburg ::filter(GEN %in% c("Eisenach", "Wartburgkreis"))
dplyrplot(kreiseSf$geometry, col = "lightgrey", xlim = st_bbox(eisenach_wartburg)[c(1,
3)], ylim = st_bbox(eisenach_wartburg)[c(2, 4)])
plot(eisenach_wartburg["GEN"], add = T)
Add a dummy field that contains everywhere distinct values but not for the two districts if interest:
<- which(kreiseSf$GEN %in% c("Eisenach", "Wartburgkreis"))
idx $dummy <- 1:nrow(kreiseSf)
kreiseSf$dummy[idx] <- 0 kreiseSf
Dissolve by dummy field and recalculate attribute fields. As Eisenach comes first we use last to get the different identifiers from Wartburgkreis. For the inhabitants (EWZ) we have to sum the two values.
# head(kreiseSf)
<- kreiseSf %>%
kreiseSfDissolved group_by(dummy) %>%
summarize(RS_0 = last(RS_0), EWZ = sum(EWZ), GEN = last(GEN), GF = last(GF),
BEZ = last(BEZ), ADE = last(ADE), BSG = last(BSG), ARS = last(ARS), AGS = last(ARS),
SDV_ARS = last(SDV_ARS), ARS_0 = last(ARS_0), AGS_0 = last(AGS_0), RS = last(RS))
Check the result
%>%
kreiseSfDissolved ::filter(GEN %in% c("Eisenach", "Wartburgkreis")) dplyr
Simple feature collection with 1 feature and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 562009.3 ymin: 5608755 xmax: 613339.8 ymax: 5667664
Projected CRS: ETRS89 / UTM zone 32N
# A tibble: 1 × 15
dummy RS_0 EWZ GEN GF BEZ ADE BSG ARS AGS SDV_ARS ARS_0
* <dbl> <chr> <dbl> <chr> <int> <chr> <int> <int> <chr> <chr> <chr> <chr>
1 0 16063… 161224 Wartbu… 4 Landk… 4 1 16063 16063 160635… 1606…
# … with 3 more variables: AGS_0 <chr>, RS <chr>, geometry <MULTIPOLYGON [m]>
plot(kreiseSf$geometry, col = "lightgrey", xlim = st_bbox(eisenach_wartburg)[c(1,
3)], ylim = st_bbox(eisenach_wartburg)[c(2, 4)])
# plot(kreiseSfDissolved['GEN'], add=T)
filter(kreiseSfDissolved, GEN %in% c("Eisenach", "Wartburgkreis")) %>%
::select(GEN) %>%
dplyrplot(add = T)
Now we see only one row/feature in our spatial dataframe.
Add Berliner Stadtbezirke
Berlin is reported at a finer scale by the RKI data than the administrative units and INKAR. We would like to use the available data and merge our Kreise with Berlins other boundaries.
<- st_read("data/berlin_bezirke.gpkg", quiet = T)
berBzkSf berBzkSf
Simple feature collection with 97 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 370000.8 ymin: 5799521 xmax: 415786.6 ymax: 5837259
Projected CRS: ETRS89 / UTM zone 33N
First 10 features:
uuid sch nam gdf bezeich
1 DEBE04YY5000000B 110000040403 Schmargendorf 3588120 AX_KommunalesGebiet
2 DEBE04YY50000001 110000040405 Westend 13527378 AX_KommunalesGebiet
3 DEBE12YYJ000000H 110000121208 Lübars 5010988 AX_KommunalesGebiet
4 DEBE12YYJ000000B 110000121205 Frohnau 7839990 AX_KommunalesGebiet
5 DEBE01YYK0000001 110000010103 Hansaviertel 528303 AX_KommunalesGebiet
6 DEBE05YYQ000000F 110000050505 Gatow 10112629 AX_KommunalesGebiet
7 DEBE05YYQ000000B 110000050506 Kladow 14779080 AX_KommunalesGebiet
8 DEBE11YYH0000001 110000111104 Falkenberg 3053073 AX_KommunalesGebiet
9 DEBE04YY5000000D 110000040404 Grunewald 22354409 AX_KommunalesGebiet
10 DEBE09YYO000000J 110000090903 Baumschulenweg 4824099 AX_KommunalesGebiet
geom
1 MULTIPOLYGON (((384220.5 58...
2 MULTIPOLYGON (((383406.4 58...
3 MULTIPOLYGON (((389380.6 58...
4 MULTIPOLYGON (((384115.8 58...
5 MULTIPOLYGON (((387740.3 58...
6 MULTIPOLYGON (((376907.4 58...
7 MULTIPOLYGON (((373467.8 58...
8 MULTIPOLYGON (((402860.1 58...
9 MULTIPOLYGON (((381531.8 58...
10 MULTIPOLYGON (((397435.4 58...
plot(berBzkSf["nam"])
The Berlin data is even finer than Bezirk level. In the following code-chunk we check the consistency of our two spatial datasets.
# Aggregate to Bezirke
<- berBzkSf %>%
berSkSf mutate(IdBerlinSK = paste0(substr(sch, 1, 2), substr(sch, 6, 8))) %>%
group_by(IdBerlinSK) %>%
summarise() #union boundaries
# Filter non Berlin districts
<- kreiseSf %>%
kreise filter(GEN != "Berlin") %>%
::select(c(ARS, EWZ))
dplyr# Reproject to consistent coord system
<- kreise %>%
kreise st_transform(st_crs(kreiseSf))
<- berSkSf %>%
berSkSf st_transform(st_crs(kreiseSf))
# Filter Districts that are close to Berlin
<- kreise %>%
berProximity filter(st_intersects(kreiseSf %>%
filter(GEN == "Berlin") %>%
summarise() %>%
st_buffer(500), geometry, sparse = FALSE, prepared = TRUE))
plot(berSkSf$geom, col = sf.colors(categorical = TRUE, alpha = 0.5))
plot(berProximity, add = T)
plot(berSkSf %>%
filter(IdBerlinSK == "11005") %>%
::select(geom))
dplyrplot(kreise$geom, border = "red", add = T, lwd = 1)
We see that the data does not seamingless fit. We prefer to avoid topological errors that could arise from false gaps between the polygons which could later affect the neighboordhod estimation.
In the following we extent every polygon of Berlin and subtract it with all others including the surrounding polygons of Brandenburg.
We see the boundaries do not consistently fit. In order to be able to do neighborhood analysis that leverages from the actual boundary geometries we need to fix this. A possible processing chain would be to extent (buffer) every Bezirk in Berlin and then subtract it with all others, including the surrounding Kreise of Brandenburg. Due to shortage of time however, we just aggregated the COVID case data from Bezirke to the whole of Berlin.
Neighborhoods
<- poly2nb(kreiseSf) # genearte contigous neighborhood
nb <- card(nb) # tally amount of nb per region
cards <- which(cards == max(cards)) # get the unit with the most nb
maxconts
<- kreiseSf[maxconts, ] # extract unit with most nb
mostconts <- kreiseSf[nb[[maxconts]], ] # extract nb units
conts
# plot all sequentially
plot(kreiseSf$geometry, xlim = st_bbox(conts)[c(1, 3)], ylim = st_bbox(conts)[c(2,
4)])
plot(conts$geometry, col = "green", add = T)
plot(mostconts$geometry, col = "red", add = T)
title(main = "Region with largest number of contiguities", cex.main = 0.6)
The Kreis with the most neighbors is Rhein-Pfalz-Kreis with 12 contigous neighbors. More about spatial neighborhoods and how to compute them in session 3.
All contents are licensed under GNU General Public License v3.0