Chapter 9 Spatial explorative data analysis - local patterns of spatial auto-correlation

So far we have looked at global spatial autocorrelation. A data set might show spatial clustering or show a regular pattern. However, it might be that different parts of the area of interest show different behavior. A part of the data might be clustered while the rest is not showing any spatial pattern. In sum this might lead to a low positive spatial autocorrelation which we might not consider to interesting. However, this spatial pattern in a part of the area of interest might be highly informative as it might indicate for example a missing covariate. It is therefore worth, exploring local patterns as well. In other words, we are now exploring instationarity in the data as well.

require(sf)
## Loading required package: sf
## Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
require(spdep)
## Loading required package: spdep
## Loading required package: spData
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
require(ncf)
## Loading required package: ncf
require(tmap)
## Loading required package: tmap
## Breaking News: tmap 3.x is retiring. Please test v4, e.g. with
## remotes::install_github('r-tmap/tmap')
require(ggplot2)
## Loading required package: ggplot2
require(ggpubr)
## Loading required package: ggpubr
require(gstat) # for simulation of autocorrelated data
## Loading required package: gstat
require(dplyr)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

9.1 Cheat cheet

Required packages:

  • spdep
  • tmap
  • sf

Important commands:

  • spdep::localmoran - calculates local Moran’s I
  • spdep::LOSH - calculate LOSH
  • spdep::localC - calculate local Geary’s C
  • spdep::localG - calculate local Getis G and Gstar
  • spdep::include.self - include observation \(i\) in the neighborhood (for Getis G)
  • spdep::p.adjustSP - adjust local association measures’ p-values
  • stats::p.adjust - “normal” approach to adjust p-values for multiple testing

9.2 Theory

9.2.1 Local Moran’s I

Local Moran’s I (also named LISA) calculates Moran’s I in a neighborhood, check’s significance and compares on the Moran’s I value with the values in the neighborhood (Anselin, 1995).

\[ I_i = \frac{(x_i-\bar{x})}{{∑_{k=1}^{n}(x_k-\bar{x})^2}/(n-1)}{∑_{j=1}^{n}w_{ij}(x_j-\bar{x})} \] With \(i\) the observation (district) index \(w_{ij}\) the element of the spatial weight matrix \(W\), \(x\) the value of interest (here the empirical bayes index of the incidence rate) and \(k\) the number of neighbors of district \(i\).\(\bar{x}\) is the global average.

Local Moran’s I values do not indicate the direction of change: - High values at location i, surrounded by high values in neighborhood -> positive value - Low values at location i, surrounded by low values in neighborhood -> positive value - High values at location i, surrounded by low values in neighborhood -> negative value - Low values at location i, surrounded by high values in neighborhood -> negative value

For the interpretation, local Moran’s I values are categorized (if significant) in comparison to the global mean as well. The following categories (quadrants) are used:

  • High – high – value higher than the global mean, surrounded by similar values (cluster of high values)
  • Low – low – value lower than the global mean, surrounded by similar values (cluster of low values)
  • High –low – high value (compared to the mean) surrounded by dissimilar values (negative spatial autocorrelation), i.e. high local outlier
  • Low - high – value lower than the global mean, surrounded by dissimilar values (negative spatial autocorrelation) , i.e. low local outlier

The first part of the quadrant description relates to the value of location \(i\) and the second one to the lagged value. It relates to the four quadrants of the Moran scatterplot.

Instead of using the mean to classify the spatial objects with respect to their context, one can as well use the median. Another option (named pysal as this is the default in the python package pysal by Luc Anselin and friends) uses the centered variable and the lagged centred variable splitted at zero as a criterium to classify into the 4 quadrants.

The local Moran’s I values can be checked for significance by means of a normal approximation or by a permutation test. As each district is tested separately we run into the problem of multiple testing. In simple words the problem can be described as follows: if we test a lot, some units will be significant just by chance (increased type I error rate, we might fail not reject \(H_0\)). To address this problem several adjustment factors exist which correct the p-vales at the cost of increased type II error rate (we might wrongly reject \(H_0\)). The most conservative and not recommended approach is the bonferoni correction. A suitable alternative that is frequently used is the Holm correction. See ?p.adjust for the other correction methods implemented in R.

For Bonferoni the p-values are multiplied by the number of comparisons. The Holm correction is similar to the Bonferoni approach, but the correction factor only considers hypothesis for observations that are not clearly non-significant (Holm, 1979).

It is calculated by spdep::localmoran and spdep::localmoran_perm.

The sum of all local Moran’s I values divided by the number of spatial units equals the global Moran’s I.

9.2.2 Local Geary’s C

Similar to local Moran’s I there exist a local version of Geary’s C. It is calculated as

\[ C_i = \sum_j^n w_{ij}(x_i-x_j)^2 \] It is calculated by spdep::localC and spdep::localC_perm.

9.2.3 Getis G and Gstar

Getis G and Gstar measure how the local mean value compares to the global mean value. The sign indicates the presence of local clusters of high (high value for G or Gstar) or low values (low values for G or Gstar). This makes it easier to interpret than local Moran’s I.

The difference between G and Gstar is the exclusion or the inclusion of the actual spatial unit i in the calcualtaion.

\[ G_i = \frac{\sum_{j, j\neq i}^n w_{ij}x_j}{\sum_{j, j\neq i}^n x_j}\]

\[ G_i^* = \frac{\sum_{j}^n w_{ij}x_j}{\sum_{j}^n x_j}\]

Both values are calculated in R by spdep::localG, the difference between G nd G* is expressed by using a different neighborhood. The command to include the observation \(i\) itself in the neighborhood is spdep::include.self.

9.2.4 LOSH - local spatial heteoscedasticity

LOSH measures spatial heterogeneity, how much values differ in the neighborhood. While Getis G looks at the mean value the focus lies here on variability. LOSH is calculated as follows (Ord and Getis, 2012):

\[H_i = \frac{\sum_j^n w_{ij}|e_j|^a}{h_1\sum_j^n w_{ij}} \] with (residual about the local spatially weighted mean)

\[ e_j = x_j - \bar{x}_j\] and (local spatially weighted mean)

\[ \bar{x}_j = \frac{\sum_k^n w_{jk}x_k}{\sum_k^n w_{jk}} \]

and

\[ h_1 = \sum_i^n |e_i|^a / n \] , the average of the absolute (for a=1) or squared (a=2) errors. The use of \(h_1\) in the denominator scales the statistic.

The exponent \(a\) specifies which type of mean dispersal is investigated. The (default) value of \(a=2\) for LOSH measures heterogeneity in spatial variance. For \(a=1\) we would measure heterogeneity in the absolute deviations.

The significance of the \(H_i\) values can be tested both under the assumption of a \(\chi^2\) distribution or based on a bootstrap approach. According to Xu et al. (2014) the later is preferable as shown by simulation experiments.

As for the other local statistics it is recommended to correct for the issue of multiple testing.

9.3 German COVID-19 incidence rate

9.3.1 Local Moran’s I

We will reuse the empirical bayes index calculated above.

covidKreise <- st_read(dsn = "data/covidKreise.gpkg", layer = "kreiseWithCovidMeldeWeeklyCleaned")
## Reading layer `kreiseWithCovidMeldeWeeklyCleaned' from data source 
##   `/home/slautenbach/Documents/lehre/HD/git_areas/spatial_data_science/spatial-data-science/data/covidKreise.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 400 features and 145 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 280371.1 ymin: 5235856 xmax: 921292.4 ymax: 6101444
## Projected CRS: ETRS89 / UTM zone 32N
polynb <- poly2nb(covidKreise)
poSurfKreise <- st_point_on_surface(covidKreise)
## Warning: st_point_on_surface assumes attributes are constant over geometries
distnb <- dnearneigh(poSurfKreise, d1=1, d2= 70*10^3)
unionedNb <- union.nb(distnb, polynb)

dlist <- nbdists(unionedNb, st_coordinates(poSurfKreise))
dlist <- lapply(dlist, function(x) 1/x)
unionedListw_d <- nb2listw(unionedNb, glist=dlist, style = "W")

ebiMc <- EBImoran.mc(n= st_drop_geometry(covidKreise)%>% pull("sumCasesTotal"),
              x= covidKreise$EWZ, listw = unionedListw_d, nsim =999)
## The default for subtract_mean_in_numerator set TRUE from February 2016

The following convenience function is used to calculate local Moran’s I, to adjust for mutiple testing and to return the 4 quadrants (low-low, low-high,..) if significant. The result is a character vector with each spatial unit assigned to one of the four quadrants or to the not-significant class.

Unfortunately it will only run on a recent version of spdep. Please check your version using the following code:

spdep()
## [1] "spdep, version 1.2-8, 2023-02-27"

If your version is older than 1.2-4, 2022-04-18 the code might not work properly.

getLocalMoranFactor <- function(x, listw, pval, quadr = "mean", p.adjust.method ="holm", method="normal", nb)
{
  if(! (quadr %in% c("mean", "median", "pysal")))
    stop("getLocalMoranFactor: quadr needs to be one of the following values: 'mean', 'median', 'pysal'")
  if(! (method %in% c("normal", "perm")))
    stop("getLocalMoranFactor: method needs to be one of the following values: 'normal', 'perm'")
  if(! (p.adjust.method %in% p.adjust.methods))
    stop(paste("getLocalMoranFactor: p.adjust.method needs to be one of the following values:", p.adjust.methods))
  
  # adjust for multiple testing
  if(method == "normal")
    lMc <- spdep::localmoran(x, listw= listw) 
  else
    lMc <- spdep::localmoran_perm(x, listw= listw)
  lMc[,5] <-  p.adjustSP(lMc[,5], method = p.adjust.method, nb = nb)
  lMcQuadr <- attr(lMc, "quadr")
  
  lMcFac <- as.character(lMcQuadr[, quadr])
  # which values are significant
  idx <- which(lMc[,5]> pval)
  lMcFac[idx] <- "Not sign."
  lMcFac <- factor(lMcFac, levels = c("Not sign.", "Low-Low", "Low-High", "High-Low",  "High-High"))
  return(lMcFac)
}

You can use the function as follows:

  • x - vector of values for which local Moran’s I should be calculated
  • listw - the weighted list
  • nb - the neighborhood definition (the same as used for the weighted list)
  • pval - the threshold used to decide if a local I is significantly different from the expected value
  • quadr - the method used to define the quadrant: based on the mean (“mean”) or the median (“median”); “pysal” uses the normal z-score (centered and scaled by one standard deviation) to define the 4 quadrants
  • p.adjust.method - approach to be used for adjusting for multiple test, default is holm
  • method: should we use the normal approximation (“normal”) or the permutation test (“perm”, generally recommended)

The function returns a character vector,which we attach to the sf object.

covidKreise$localM <- getLocalMoranFactor(ebiMc$z, listw = unionedListw_d, pval=0.05, nb= unionedNb)

We can used that to plot local Moran’s I.

localMcPalette <- c("white", "midnightblue", "lightblue", "lightpink", "red")

m1 <- tm_shape(covidKreise) + 
  tm_polygons(col="localM", palette= localMcPalette, 
              legend.hist = FALSE, legend.reverse = TRUE, title = "LISA") +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, 
            legend.outside.position = "right") + tm_scale_bar()

m2 <- tm_shape(covidKreise) + 
  tm_polygons(col="incidenceTotal", legend.hist = TRUE, palette= "-plasma",
                         legend.reverse = TRUE, title = "Incidence rate") +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, title = "Incidence",
            legend.outside.position = "left") + tm_scale_bar()


tmap_arrange(m1, m2)

Testing different ways of defining the labels:

covidKreise$localMmedian <- getLocalMoranFactor(ebiMc$z, listw = unionedListw_d, pval=0.05, quadr = "median", nb= unionedNb)
covidKreise$localMpysal <- getLocalMoranFactor(ebiMc$z, listw = unionedListw_d, pval=0.05, quadr = "pysal", nb= unionedNb)
m1 <- tm_shape(covidKreise) + 
  tm_polygons(col="localM", palette= localMcPalette, 
              legend.hist = FALSE, legend.reverse = TRUE, title = "LISA, mean") +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, 
            legend.outside.position = "right") + tm_scale_bar()

m2 <- tm_shape(covidKreise) + 
  tm_polygons(col="localMmedian", palette= localMcPalette, 
              legend.hist = FALSE, legend.reverse = TRUE, title = "LISA, median") +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, 
            legend.outside.position = "right") + tm_scale_bar()

m3 <- tm_shape(covidKreise) + 
  tm_polygons(col="localMpysal", palette= localMcPalette, 
              legend.hist = FALSE, legend.reverse = TRUE, title = "LISA, pysal") +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, 
            legend.outside.position = "right") + tm_scale_bar()

tmap_arrange(m1, m2, m3)

For this data set no differences present.

Testing differnt ways of adjusting for multiple testing

covidKreise$localMnone <- getLocalMoranFactor(ebiMc$z, listw = unionedListw_d, pval=0.05, p.adjust.method = "none", nb= unionedNb)
covidKreise$localMbonf <- getLocalMoranFactor(ebiMc$z, listw = unionedListw_d, pval=0.05, p.adjust.method = "bonferroni", nb= unionedNb)
m1 <- tm_shape(covidKreise) + 
  tm_polygons(col="localM", palette= localMcPalette, 
              legend.hist = FALSE, legend.reverse = TRUE, title = "LISA, holm") +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, 
            legend.outside.position = "right") + tm_scale_bar()

m2 <- tm_shape(covidKreise) + 
  tm_polygons(col="localMnone", palette= localMcPalette, 
              legend.hist = FALSE, legend.reverse = TRUE, title = "LISA, none") +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, 
            legend.outside.position = "right") + tm_scale_bar()

m3 <- tm_shape(covidKreise) + 
  tm_polygons(col="localMbonf", palette= localMcPalette, 
              legend.hist = FALSE, legend.reverse = TRUE, title = "LISA, bonferoni") +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, 
            legend.outside.position = "right") + tm_scale_bar()

tmap_arrange(m1, m2, m3)

As we see, there are remarkable differences. Not correction for multiple testing leads to a large number of districts with significant local spatial autocorrelation: We see a clear spatial pattern: the South-East (Bavaria, Saxony, parts of Baden-Württemberg) is characterized by districts with high values compared to the global mean which are surrounded by similar values (high positive autocorrelation). Large parts of Rhineland-Palatina, NRW, Schleswig-Holstein and parts of Lower Saxony have low incidence rates (compared to the global mean) and are surrounded by districts with dissimilar incidence rates. This region is characterized by strong differences between the different districts. In between we always see a few districts that are spatial outliers.

The correction by Holm renders many of those ditricts non significant and the conservative Bonferoni correction renders even more insignificant.

Comparing permutation test results with the normal approximation test:

covidKreise$localMperm <- getLocalMoranFactor(ebiMc$z, listw = unionedListw_d, pval=0.05, p.adjust.method = "holm", method = "perm", nb= unionedNb)
m1 <- tm_shape(covidKreise) + 
  tm_polygons(col="localM", palette= localMcPalette, 
              legend.hist = FALSE, legend.reverse = TRUE, title = "LISA, normal approx.") +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, 
            legend.outside.position = "right") + tm_scale_bar()

m2 <- tm_shape(covidKreise) + 
  tm_polygons(col="localMperm", palette= localMcPalette, 
              legend.hist = FALSE, legend.reverse = TRUE, title = "LISA, permutation") +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, 
            legend.outside.position = "right") + tm_scale_bar()

tmap_arrange(m1, m2)

9.3.2 Local Geary’s C

For comparison we calculate Geary’s C, that is based on the squared differences between neighboring observations instead of the product of their differences of the mean. We did not provide a convenience function but the code is relatively straight forward. We calculate local Geary’s C using the parametric test or the permutation test (the later as usual recommended), extract the p-values, which are stored as an attribute of the returned object and adjust those for multiple testing (using the approach by Holm). We use that to filter for non-significant districts which are used as a mask then we creat the map. The information about the quadrants (named differently compared to local Moran’s I) is as well stored in an attribute that we extract with attr(object, nameOfTheAttribute). attributes(object) would return us all attributes of an object.

covidKreise$Gc <- localC(ebiMc$z, listw = unionedListw_d)

locGc <- localC_perm(ebiMc$z, listw = unionedListw_d, nsim = 999)
covidKreise$GcPval <- attr(locGc, "pseudo-p")[,5]
covidKreise$GcPval <- p.adjustSP(covidKreise$GcPval, nb = unionedNb, method = "holm")
covidKreiseGcPval_nonsignif <- filter(covidKreise, GcPval > 0.05)
covidKreise$GcQuadr <- attr(locGc, "cluster")
#unique(covidKreise$GcQuadr)
localCPalette <- rev(c("lightblue", "lightpink", "midnightblue", "red"))

tm_shape(covidKreise, is.master = TRUE) + 
  tm_polygons(col="GcQuadr",  palette= localCPalette,
              legend.hist = FALSE, legend.reverse = TRUE, title = "Local Geary's C") +
  tm_shape(covidKreiseGcPval_nonsignif) +
  tm_fill(col="white", alpha=0.7, title = "non sign.") +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, 
            legend.outside.position = "right") + tm_scale_bar()

The pattern is similar to the map for local Moran’s I with some differences. Chemnitz for example does not show up as a high-high cluster by Geary’s C but Berlin and Brandenburg an der Havel were identified by Geary’s C but not by Moran’s I as part of a low-low cluster. Geary’s C high-high cluster reaches much far to south western Baden-Württemberg and the low-low cluster in NRW, norhtern Rhineland-Palatina differs between the two approaches. Hamburg is part of the low-low cluster in Moran’s I but not for Geary’s C. The differences might however look more pronounced than they are due to the cutoff value used for the significance testing.

The map shown above is based on the permutation test, but the similar approach could be used for the parametric test.

covidKreise$Gc <- localC(ebiMc$z, listw = unionedListw_d)

9.3.3 Getis G and Gstar

Both localG and localG_perm return a z-score that indicates the probability of observing such a value by chance (expressed in terms of the standard normal distribution so usual thresholds are valid).

covidKreise$Gi <- localG(ebiMc$z, listw = unionedListw_d)
covidKreise$Gi_perm <- localG_perm(ebiMc$z, listw = unionedListw_d, nsim = 999)
#covidKreiseGiPval_nonsignif <- filter(covidKreise, abs(Gi_perm) < 1.96)

Classify values with respect to the probability:

breaks <- c(min(covidKreise$Gi_perm ), -2.58, -1.96, -1.65, 1.65, 1.96, 2.58, max(covidKreise$Gi_perm ))

covidKreise <- covidKreise %>% mutate(gCluster = cut(Gi_perm, breaks=breaks, include.lowest = TRUE, labels=c("Cold spot: 99% confidence", "Cold spot: 95% confidence", "Cold spot: 90% confidence", "Not significant","Hot spot: 90% confidence", "Hot spot: 95% confidence", "Hot spot: 99% confidence"))) 
tm_shape(covidKreise) + 
  tm_polygons(col="gCluster", palette = "-RdBu", 
              legend.hist = FALSE, legend.reverse = TRUE, title = "Local Getis G") +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, 
            legend.outside.position = "right") + tm_scale_bar()

Getis G indicates a clear separation in a belt of high values in Bavaria, Saxony and parts ofThuringia and Baden-Württemberg and a cold spot belt in most of NRW, Rhineland-Palatino, Schleswig-Holstein and parts of Lower Saxony.

For Getis G* we need to create a new weighted list including the observation itself. This makes it impossible to use inverse distance weighting as it includes a distance of zero.

unionedListw_self <-  nb2listw(include.self(unionedNb), style = "W")

covidKreise$GiStar <- localG(ebiMc$z, listw = unionedListw_self)
covidKreise$GiStar_perm <- localG_perm(ebiMc$z, listw = unionedListw_self, nsim = 999)
breaks <- c(min(covidKreise$GiStar_perm ), -2.58, -1.96, -1.65, 1.65, 1.96, 2.58, max(covidKreise$GiStar_perm ))

covidKreise <- covidKreise %>% mutate(gStarCluster = cut(GiStar_perm, breaks=breaks, include.lowest = TRUE, labels=c("Cold spot: 99% confidence", "Cold spot: 95% confidence", "Cold spot: 90% confidence", "Not significant","Hot spot: 90% confidence", "Hot spot: 95% confidence", "Hot spot: 99% confidence"))) 
tm_shape(covidKreise) + 
  tm_polygons(col="gStarCluster", palette = "-RdBu", 
              legend.hist = FALSE, legend.reverse = TRUE, title = "Local Getis G*") +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, 
            legend.outside.position = "right") + tm_scale_bar()

While absolute values change slightly, the overall pattern for G* is similar with a few exceptions as for G.

The maps we have created so far are not adjusted for the multiple testing problem. The function localG_perm returns an attribute called internals with intermediate results that could be used to account for this.

Gi_perm_internals <- attr(covidKreise$Gi_perm, "internals") %>%
  as.data.frame()
names(Gi_perm_internals)
## [1] "Gi"                 "E.Gi"               "Var.Gi"            
## [4] "StdDev.Gi"          "Pr(z != E(Gi))"     "Pr(z != E(Gi)) Sim"
## [7] "Pr(folded) Sim"     "Skewness"           "Kurtosis"

Next we extract the field “Pr(folded) Sim” with contains the simulation based p-value and adjust that for the multiple testing problem using Holm’s approach. The use p.adjustSP that only takes the number of neighbors for each unit as the factor to be included (instead of taking the total number of spatial units which would be too conservative). Based on that we create a polygon mask that we will plot on top of Getis G values.

covidKreise$Gi_perm_padj <- p.adjustSP(p= Gi_perm_internals[, "Pr(folded) Sim"], nb = unionedNb, method = "holm")
covidKreise_Gi_permNonSignifSf <- filter(covidKreise, Gi_perm_padj > 0.05)
tm_shape(covidKreise) + 
  tm_polygons(col="Gi_perm", palette = "-RdBu", 
              legend.hist = FALSE, legend.reverse = TRUE, title = "Local Getis G, p-values, Holm") +
  tm_shape(covidKreise_Gi_permNonSignifSf) + tm_fill(col="grey", alpha=.7) +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, 
            legend.outside.position = "right") + tm_scale_bar()
## Variable(s) "Gi_perm" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

The map shows the Getis G values all areas, that are non significant after holm correction are masked.

We could as well plot the incidence rate and use the Getis G based mask.

tm_shape(covidKreise) + 
  tm_polygons(col="incidenceTotal", palette = "-RdBu", 
              legend.hist = FALSE, legend.reverse = TRUE, title = "Incidence total period") +
  tm_shape(covidKreise_Gi_permNonSignifSf) + tm_fill(col="grey", alpha=.8) +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, 
            legend.outside.position = "right",
            main.title = "Hotspot analysis - Getis G") + tm_scale_bar()

9.3.4 LOSH - local spatial heteroscedasticity

The LOSH considered local variances instead of local means. It can thereby be used to identify areas which contain both high and low values. These areas could represent transitional zones where different spatial processes overlap. This highlights of course areas different from the clusters identified by the other three local indicators of spatial autocorrelation as we are not interested in areas that are in a homogeneous neighborhood or are spatial outliers but in those there both high and low values are in a neighborhood.

The run LOSH on the empirical Bayes index derived before, using a permutation test. The method returns a matrix with four columns:

  • Hi - the LOSH value for spatial unit i, \(H_i\)
  • x_bar_i - the local spatial weighted mean for spatial unit i, \(\bar{x}_i\)
  • ei - the residual about \(\bar{x}_i\) for spatial unit i, \(e_i\)
  • Pr() - the p-value for \(H_i\)

For the formulas, see the theory section above.

We convert the matrix to a dataframe and bind it columnwise with the sf object - the order is the same, therefore we can simply cbind the two tables. The name of some colums in the matrix change as "P()" is not allowed for column names in a data.frame. Afterwards we adjust p-values for multiple testing using the spatial version p-adjustSP that considers only the number of neighbors of each object for the adjustment (same as for local Moran’s I, Geary’s C and Getis-Ord G or G*). The adjusted p-values are then used to filter the sf objects. The resulting sf objects is later used to mask non-significant areas then creating the map.

set.seed(3)
covidLosh <- LOSH.mc(ebiMc$z, listw = unionedListw_d, nsim = 499)
names(as.data.frame(covidLosh))
## [1] "Hi"      "x_bar_i" "ei"      "Pr()"
covidKreiseLosh <- cbind(covidKreise, as.data.frame(covidLosh))
covidKreiseLosh$ebiMcZ <- ebiMc$z
# holm adjustment
covidKreiseLosh$pvalHiHolm <- p.adjustSP(covidKreiseLosh$Pr.., nb = unionedNb, method="holm")
covidKreiseLoshNoSign <- filter(covidKreiseLosh, pvalHiHolm> 0.1)

Effect of the Holm adjustment for multiple testing:

summary(covidKreiseLosh$pvalHiHolm)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0280  1.0000  1.0000  0.9322  1.0000  1.0000
summary(covidKreiseLosh$Pr..)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0020  0.2470  0.5580  0.5336  0.8340  0.9980

As we see, for many districts \(H_i\) became not significantly differently from the expected value by the adjustment.

Next we plot the EBI, the LOSH (masking the non-significant districts by a semi-transparent layer). For a better understanding of the calculation of \(H_i\) we have added \(e_i\) and \(\bar{x}_i\) maps as well.

tmap_mode("plot")
## tmap mode set to plotting
m1 <- tm_shape(covidKreiseLosh) + 
  tm_polygons(col="ebiMcZ", 
              legend.hist = FALSE, legend.reverse = TRUE, 
              title = "Empirical Bayes Modification, incidence") +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, 
            legend.outside.position = "right") + tm_scale_bar()

m2 <- tm_shape(covidKreiseLosh) + 
  tm_polygons(col="Hi", 
              legend.hist = FALSE, legend.reverse = TRUE, 
              title = "LOSH") + 
  tm_shape(covidKreiseLoshNoSign) + 
  tm_fill(col="grey", alpha = 0.8) +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, 
            legend.outside.position = "right") + tm_scale_bar()

m3 <- tm_shape(covidKreiseLosh) + 
  tm_polygons(col="x_bar_i", 
              legend.hist = FALSE, legend.reverse = TRUE, 
              title = "Local spatially weighted mean") +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, 
            legend.outside.position = "right") + tm_scale_bar()

m4 <- tm_shape(covidKreiseLosh) + 
  tm_polygons(col="ei", 
              legend.hist = FALSE, legend.reverse = TRUE, 
              title = "residuals about local spatially weighted mean") +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, 
            legend.outside.position = "right") + tm_scale_bar()

tmap_arrange(m1,m2,m3,m4)
## Variable(s) "ebiMcZ" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
## Variable(s) "x_bar_i" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
## Variable(s) "ebiMcZ" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
## Variable(s) "x_bar_i" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

The see that were are two larger blocks of district with large heteroscedasticity:

  • one around Göttingen, Kassel and the neighboring districts in Thuringia. which seem to indicate differences between districts in Lower Saxony and Thuringia
  • the other covering the Rheine/Ibbenbüren and the Emsland which were characterized by higher incidence rates compared to the districts of the Ruhrgebiet, Osnabrück, Bremen and the districts further up north

In addition, the neighborhood of Stendal is flagged as an area of high spatial heterogeneity.

9.4 German administrative statistics

To show the diversity of local patterns we apply the local indicators on a few other attributes. All examples use the INKAR administrative statistics at the level of the german districts

9.4.1 Share of votes for right-winged party

covidKreise$localM_afd <- getLocalMoranFactor(covidKreise$Stimmenanteile.AfD, listw = unionedListw_d, pval=0.05, nb= unionedNb)
m1 <- tm_shape(covidKreise) + 
  tm_polygons(col="localM_afd", palette= localMcPalette, 
              legend.hist = FALSE, legend.reverse = TRUE, title = "LISA") +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, 
            legend.outside.position = "right") + tm_scale_bar()

m2 <- tm_shape(covidKreise) + 
  tm_polygons(col="Stimmenanteile.AfD", legend.hist = TRUE, palette= "-plasma",
                         legend.reverse = TRUE, title = "Share votes right winged") +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, title = "Votes",
            legend.outside.position = "left") + tm_scale_bar()


tmap_arrange(m1, m2)

Moran’s I highlights the big spatial cluster of high votes for the AFD in parts of the eastern part of Germany as well as two low outliers attached to the cluster (Berlin and Kronach). It further indicates three spatial clusters of districts with low votes for the AFD in the North and the West (around Cologne-Bonn and around Münster-Osnabrück).

idx <- which(covidKreise$localM_afd == "Low-High")
covidKreise[idx, "GEN"] %>% st_drop_geometry()
##         GEN
## 278 Kronach
## 326  Berlin
idx <- which(covidKreise$localM_afd == "Low-Low")
covidKreise$GEN[idx] %>% st_drop_geometry()
##  [1] "Neumünster"                 "Dithmarschen"              
##  [3] "Rendsburg-Eckernförde"      "Schleswig-Flensburg"       
##  [5] "Segeberg"                   "Osnabrück"                 
##  [7] "Ammerland"                  "Cloppenburg"               
##  [9] "Emsland"                    "Grafschaft Bentheim"       
## [11] "Leer"                       "Oldenburg"                 
## [13] "Osnabrück"                  "Vechta"                    
## [15] "Mönchengladbach"            "Rhein-Kreis Neuss"         
## [17] "Bonn"                       "Köln"                      
## [19] "Leverkusen"                 "Düren"                     
## [21] "Rhein-Erft-Kreis"           "Euskirchen"                
## [23] "Heinsberg"                  "Oberbergischer Kreis"      
## [25] "Rheinisch-Bergischer Kreis" "Rhein-Sieg-Kreis"          
## [27] "Münster"                    "Steinfurt"                 
## [29] "Warendorf"                  "Gütersloh"                 
## [31] "Ahrweiler"                  "Neuwied"                   
## [33] "Vulkaneifel"

9.5 Long term unemployment rate

covidKreise$localM_lunempl <- getLocalMoranFactor(covidKreise$Langzeitarbeitslosenquote, listw = unionedListw_d, pval=0.05, nb= unionedNb)
m1 <- tm_shape(covidKreise) + 
  tm_polygons(col="localM_lunempl", palette= localMcPalette, 
              legend.hist = FALSE, legend.reverse = TRUE, title = "LISA") +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, 
            legend.outside.position = "right") + tm_scale_bar()

m2 <- tm_shape(covidKreise) + 
  tm_polygons(col="Langzeitarbeitslosenquote", legend.hist = TRUE, palette= "-plasma",
                         legend.reverse = TRUE, title = "Unemployment rate") +
  tm_layout(legend.outside = TRUE, bg.color = "darkgrey", outer.bg.color = "lightgrey",
            legend.outside.size = 0.5, attr.outside = TRUE, title = "Long term unemployment",
            legend.outside.position = "left") + tm_scale_bar()


tmap_arrange(m1, m2)

Local Moran’s I indicates a cluster of low longterm unemployment rates in Bavaria and Baden-Württemberg as well as clusters of high values at the Ruhr-Gebiet and some eastern districts as well as Cuxhaven at the North. Schweinsfurt is a high outlier in the southern cluster of low long term unemployment rates.

idx <- which(covidKreise$localM_lunempl == "High-High")
covidKreise$GEN[idx] %>% st_drop_geometry()
##  [1] "Cuxhaven"                    "Düsseldorf"                 
##  [3] "Duisburg"                    "Essen"                      
##  [5] "Krefeld"                     "Mönchengladbach"            
##  [7] "Mülheim an der Ruhr"         "Oberhausen"                 
##  [9] "Remscheid"                   "Solingen"                   
## [11] "Wuppertal"                   "Kleve"                      
## [13] "Mettmann"                    "Rhein-Kreis Neuss"          
## [15] "Viersen"                     "Wesel"                      
## [17] "Köln"                        "Leverkusen"                 
## [19] "Düren"                       "Rhein-Erft-Kreis"           
## [21] "Heinsberg"                   "Oberbergischer Kreis"       
## [23] "Rheinisch-Bergischer Kreis"  "Bottrop"                    
## [25] "Gelsenkirchen"               "Münster"                    
## [27] "Recklinghausen"              "Warendorf"                  
## [29] "Bochum"                      "Dortmund"                   
## [31] "Hagen"                       "Hamm"                       
## [33] "Herne"                       "Ennepe-Ruhr-Kreis"          
## [35] "Märkischer Kreis"            "Soest"                      
## [37] "Unna"                        "Ostprignitz-Ruppin"         
## [39] "Mecklenburgische Seenplatte" "Vorpommern-Greifswald"      
## [41] "Bautzen"                     "Anhalt-Bitterfeld"          
## [43] "Jerichower Land"             "Salzlandkreis"              
## [45] "Stendal"
idx <- which(covidKreise$localM_lunempl == "High-Low")
covidKreise$GEN[idx] %>% st_drop_geometry()
## [1] "Schweinfurt"
idx <- which(covidKreise$localM_lunempl == "Low-High")
covidKreise$GEN[idx] %>% st_drop_geometry()
## [1] "Osterholz"       "Friesland"       "Borken"          "Coesfeld"       
## [5] "Dahme-Spreewald"

References

Anselin, L., 1995. Local indicators of spatial association-LISA. Geographical Analysis 27, 93–115. https://doi.org/10.1111/j.1538-4632.1995.tb00338.x
Holm, S., 1979. A simple sequentially rejective multiple test procedure. Scandinavian journal of statistics 65–70.
Ord, J.K., Getis, A., 2012. Local spatial heteroscedasticity (LOSH). Ann Reg Sci 48, 529–539. https://doi.org/10.1007/s00168-011-0492-y
Xu, M., Mei, C.-L., Yan, N., 2014. A note on the null distribution of the local spatial heteroscedasticity (LOSH) statistic. Ann Reg Sci 52, 697–710. https://doi.org/10.1007/s00168-014-0605-5