Chapter 15 Point pattern analysis

library(ohsome)
library(sf)
library(terra)
library(tmap)
library(rgeoboundaries)
library(spatstat)
library(tidyverse)

15.1 Cheat sheet

The functions for spatial point pattern analysis are provided by the package spatstat:

  • spatstat::quadratcount - local density of point patterns
  • spatstat::intensity - another local density measure
  • spatstat::density.ppp - kernel density measure
  • spatstat::rhohat - fit a nonparametric curve to point pattern and covariate data
  • spatstat::nndist - compute nearest neighbor distances
  • spatstat::Kest - K function: mean number of neighbors within cumulative distance bands
  • spatstat::Lest - transformation of the K function that is easier to interpret
  • spatstat::pcf - g (pair correlation) function: distance based method with non-cumulative distance bands
  • spatstat::ppm - fit a Poisson point process model on a point distribution and covariates

The spatstat authors describe their package as follows: "spatstat is the result of 30 years of software development and contains over 175,000 lines of code. It is still under development, motivated by the needs of researchers in many fields, and driven by innovations in statistical science. We welcome contributions of code, and suggestions for improvements."

github.com/spatstat/spatstat

For more on spatial point pattern analysis and its implementation in R, see Baddeley et al. (2015).

After you load the package in an R session, you can start an interactive tutorial by entering beginner in the console.

15.2 tl;dr - Spatial point pattern analysis in a nutshell

Spatial point pattern analysis is concerned with the processes that drive the occurrence of points or events. Two main effects in the occurrence of points are distinguished:

  • 1st order effect: event observations vary from place to place due to differences in an underlying property
  • 2nd order effect: event observations vary from place to place due to interactions between the events / points

Different methods exist to assess spatial point patterns. The two main categories are:

  • Density based methods
  • Distance based methods

Density based methods focus on the 1st order effect, the underlying property that may influence the occurrence of events / points. Distance based methods focus on the 2nd order effect.

However, it is important to note that it is often not possible to distinguish between the two effects. Furthermore, prior knowledge about the study site and the observed phenomenon is essential before drawing conclusions from spatial point pattern analysis results.

15.3 Theoretical background

A spatial point pattern is the result of a spatial point process. These are stochastic processes whose realizations each consist of a finite or countably infinite set of points on a planar surface. In spatial point pattern analysis both the location of a point itself and, depending on the analysis, attached values are treated as response variables. It is common to compare point patterns to artificially generated clustered, random or regularly distributed point patterns.

The mentioned attached values, or marks, are like attributes of standard vector data. They can be of categorical or continuous scale and may be used to weight events / points.
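To make this concrete, here is a minimal toy sketch of a marked point pattern (spatstat is already loaded above); the coordinates and the categorical mark values are made up for illustration:

# toy example: 20 random points in the unit square, each carrying
# a categorical mark (e.g. a brand label)
set.seed(42)
X <- ppp(x = runif(20), y = runif(20),
         window = owin(c(0, 1), c(0, 1)),
         marks = factor(sample(c("A", "B"), 20, replace = TRUE)))
plot(X, main = "Toy marked point pattern")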

Poisson point process

The Poisson point process is a type of random mathematical object that generates points in a defined space at random. The Poisson distribution describes the probability of a random variable \(N\) such that the probability that \(N\) equals \(n\) is given by:

\[\mathbb{P} \left \{ N = n \right \} = \frac{\lambda^n}{n!}e^{-\lambda}\] where \(n!\) denotes \(n\) factorial and the parameter \(\lambda\) determines the shape of the distribution. (Actually, \(\lambda\) equals the expected value of \(N\).)
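As a quick sanity check, the formula can be compared with R's built-in dpois(); the choice \(\lambda = 3\) is an arbitrary example:

lambda <- 3
n <- 0:10
p.manual <- lambda^n / factorial(n) * exp(-lambda)
all.equal(p.manual, dpois(n, lambda)) # TRUE
sum((0:100) * dpois(0:100, lambda))   # ~3, i.e. the expected value of N equals lambda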

The Poisson point process is sometimes called a purely or completely random process. It has the property that the events in a bounded region \(A \subset \mathbb{R}^d\) are independently and uniformly distributed over \(A\), while their number \(N(A)\) follows a Poisson distribution. This means that the location of one point does not affect the probabilities of other points appearing nearby and that there are no regions where events are more likely to appear.

It is common to compare an observed point pattern to patterns created by an independent random process (IRP), also called complete spatial randomness (CSR). IRP / CSR meets the following conditions:

  • The probability of a point occurring at any location is equal for all locations in the study area (1st order effect)
  • The location of a point is independent of the location of any other point in the dataset (2nd order effect)

A distinction is made between homogeneous and inhomogeneous Poisson point processes. Homogeneous Poisson point processes (HPP) have a constant parameter \(\lambda\). \(\lambda\), the intensity of the process, is the average or expected number of points per defined area.

Inhomogeneous Poisson point processes (IPP) are defined by a non-constant \(\lambda\) parameter. Both IPP and HPP assume that the occurrence of points is independent but distributed according to the intensity \(\lambda\). So the main difference is that HPP assumes a constant and IPP a non-constant intensity.
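The difference is easy to simulate with spatstat's rpoispp(); a minimal sketch with arbitrary intensity values:

# HPP: constant intensity of 100 points per unit area in the unit square
hpp <- rpoispp(lambda = 100, win = owin(c(0, 1), c(0, 1)))
# IPP: intensity increasing with the x coordinate
ipp <- rpoispp(lambda = function(x, y) { 200 * x }, win = owin(c(0, 1), c(0, 1)))

par(mfrow = c(1, 2))
plot(hpp, main = "HPP, constant lambda", pch = 20)
plot(ipp, main = "IPP, lambda = 200x", pch = 20)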

Other point process models include the Cox process (where occurrence is not independent of other points), the Neyman-Scott process and the Markov point process. More information on these can be found elsewhere.

Density based methods:

  • global densities \(\frac{\sum n}{A}\)
  • local densities via local regions
  • kernel density
  • nonparametric curve fitting based on a density covariate
  • Poisson point process model fitting

Distance based methods:

  • Average nearest neighbor: by varying the neighbor order (1st, 2nd, ...) statements about clustering are possible
  • K and L functions: neighbor counts within cumulative distance bands
  • g function: neighbor counts within distinct distance bands

15.4 Data acquisition

In the following code chunk we use the ohsome API to query some geodata from OpenStreetMap (OSM). OSM is the most successful free and open database for geodata. Like Wikipedia, anyone can contribute new data and access or download it for any purpose. Ohsome is a technology built on top of OpenStreetMap data. Its purpose is to make the history (the record of changes to the data) easily accessible. But you can also use ohsome to download the actual geometry, the bounding box or the centroid of one or multiple OSM objects. There are also other libraries in R to get OSM data. Here is a short overview:

  • ohsome - Access to aggregated OSM values (like count or density), the history of objects and snapshots of objects. There are some restrictions in terms of the number of objects and the spatial and temporal extent. For instance, downloading all buildings of a small to medium city should not pose a difficulty, but large cities or whole regions will.
  • osmdata - Access to the Overpass API. Comparable to the QGIS plugin QuickOSM or the website overpass-turbo (see the short sketch after this list). The same restrictions in terms of the number of objects and spatial extent apply.
  • osmextract - The package automatically downloads an osm.pbf file from Geofabrik that matches your area of interest. Therefore no restrictions apply on the number of objects or the spatial / temporal extent.
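For comparison, here is a minimal, hypothetical osmdata query for the same kind of objects; it is not run in this chapter, and the place name and tag filter are illustrative only:

library(osmdata)
# query fast food locations around a place name via the Overpass API
ff_overpass <- opq(bbox = "Heidelberg, Germany") |>
  add_osm_feature(key = "amenity", value = "fast_food") |>
  osmdata_sf()
ff_overpass$osm_points # point features returned by the query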

In the following example we use the ohsome package to download the locations of Italian restaurants and McDonald's restaurants.

# the following functions are part of the rgeoboundaries package. This package is not on CRAN. To install it use the following command:
# remotes::install_github("wmgeolab/rgeoboundaries")
bw <- gb_adm1(country = "Germany") |> 
  filter(shapeISO == "DE-BW")

q <- ohsome_query("elements/geometry", 
       bw |> st_bbox(),
       filter="amenity=fast_food and brand=\"McDonald's\"")

fast_food <- ohsome_post(q)
# 
# q <- ohsome_query("elements/geometry", 
#        bw |> st_bbox(),
#        filter="amenity=restaurant and cuisine=italian")
# 
# restaurants <- ohsome_post(q)

We get 252 McDonald’s for the extent of Baden-Württemberg. However, the extent (bounding box) of Baden-Württemberg includes large portions of neighboring states and countries, among them major cities like Strasbourg and Ludwigshafen. Therefore we need to filter by the actual boundary. Querying ohsome by extent rather than by the actual geometry has the advantage of easier and therefore faster filtering in the ohsome backend: an extent consists of 4 coordinate pairs, whereas the BW boundary consists of 40571.

The facilities of interest can be mapped in OSM as either polygons or points. This bears the risk that the same facility appears twice in the dataset, once as a point and once as a polygon. The point is a point-of-interest (POI) representation of a facility. For most use cases this representation is sufficient. The polygon representation is simply more accurate, as it includes the facility's building footprint. We therefore have to check whether there are spatial duplicates. We check this via st_intersects.

# simple plot
plot(fast_food$geometry, pch=20, col=rgb(0,0,0,0.2))
plot(bw$geometry, add=T)

fast_food <- fast_food |> 
  st_make_valid(geometry) |> 
  filter(st_intersects(geometry, bw$geometry, sparse=F))

# check geometry types
table(st_geometry_type(fast_food$geometry))
## 
##           GEOMETRY              POINT         LINESTRING            POLYGON 
##                  0                 69                  0                 99 
##         MULTIPOINT    MULTILINESTRING       MULTIPOLYGON GEOMETRYCOLLECTION 
##                  0                  0                  0                  0 
##     CIRCULARSTRING      COMPOUNDCURVE       CURVEPOLYGON         MULTICURVE 
##                  0                  0                  0                  0 
##       MULTISURFACE              CURVE            SURFACE  POLYHEDRALSURFACE 
##                  0                  0                  0                  0 
##                TIN           TRIANGLE 
##                  0                  0
# flatten 
fast_food <- fast_food |> 
  filter(lengths(st_intersects(geometry, fast_food$geometry))==1 | (lengths(st_intersects(geometry, fast_food$geometry))>1 & st_geometry_type(geometry) == "POINT")) |> 
  st_point_on_surface()

# verify by plot
plot(fast_food$geometry, pch=20, col=rgb(0,0,0,0.2))
plot(bw$geometry, add=T)

# restaurants <- restaurants |> 
#   st_make_valid(geometry) |> 
#   filter(st_intersects(geometry, bw$geometry, sparse=F))
# 
# restaurants <- restaurants |> 
#   filter(lengths(st_intersects(geometry, restaurants$geometry))==1 | (lengths(st_intersects(geometry, restaurants$geometry))>1 & st_geometry_type(geometry) == "POINT")) |> 
#   st_point_on_surface()


We indeed see different geometry types in the McDonald's dataset. We check which objects intersect with more than one geometry (i.e., more than just themselves) within the dataset. For those that intersect with more than one geometry we keep only the point geometries. This leaves us with a total of 167 McDonald's.

In the next chunk we will use the spatstat package to describe the distribution of McDonald's locations in Baden-Württemberg. spatstat introduces new spatial data structures to R:

  • Point pattern class ppp - point geometries that represent a phenomenon to be investigated with point pattern analysis methods. ppp objects can have marks attached. Marks are attribute-like values attached to the point pattern that can be used for weighting. Marks can be categorical or continuous.
  • Observation window owin - polygon geometries that represent the boundaries within which a point pattern is observed and analysed
  • Pixel image im - A raster that represents covariates that may be used to explain the occurrence of a point.

Only geodata with projected coordinates can be converted to spatstat objects.

# convert fast food locations to ppp
ff.pp <- fast_food |> st_transform(32632) |> as.ppp()
# set marks or weights NULL
marks(ff.pp) <- NULL

# convert BW boundary to observation window
bw.owin <- bw |> st_transform(32632)  |>  as.owin()

# set the BW window as the defining boundary window of the point pattern
Window(ff.pp) <- bw.owin

plot(ff.pp, main="Ppp class and window", cols=rgb(0,0,0,.2), pch=20)

ff.pp
## Planar point pattern: 167 points
## window: polygonal boundary
## enclosing rectangle: [388337.3, 610069.4] x [5265166, 5515628] units

Point pattern and window are now stored in one class.

We also load Zensus (German census) data for Baden-Württemberg to use as covariates later on. The Zensus data, although gridded, comes in a vector format. We select a variable of interest and convert it first to a raster and then to an image.

# load Zensus data for Baden-Württemberg
bw_zensus <- st_read("data/zensus_grid_bw.gpkg", quiet=T)
head(bw_zensus)
## Simple feature collection with 6 features and 17 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 537272.8 ymin: 5265135 xmax: 543292 ymax: 5267136
## Projected CRS: WGS 84 / UTM zone 32N
##              id f_staat f_land f_wasser p_staat  p_land p_wasser Einwohner
## 1 1kmN2714E4284     384      0      384  0.0384  0.0000   0.0384        NA
## 2 1kmN2714E4285  128374    -23   128397 12.8374 -0.0023  12.8397        NA
## 3 1kmN2714E4286  358537    -19   358556 35.8537 -0.0019  35.8556        NA
## 4 1kmN2714E4287  588698     -4   588702 58.8698 -0.0004  58.8702        NA
## 5 1kmN2714E4288  818814     21   818793 81.8814  0.0021  81.8793        NA
## 6 1kmN2715E4283  668015     -9   668024 66.8015 -0.0009  66.8024        NA
##   Frauen_A Alter_D unter18_A ab65_A Auslaender_A HHGroesse_D Leerstandsquote
## 1       NA      NA        NA     NA           NA          NA              NA
## 2       NA      NA        NA     NA           NA          NA              NA
## 3       NA      NA        NA     NA           NA          NA              NA
## 4       NA      NA        NA     NA           NA          NA              NA
## 5       NA      NA        NA     NA           NA          NA              NA
## 6       NA      NA        NA     NA           NA          NA              NA
##   Wohnfl_Bew_D Wohnfl_Whg_D                           geom
## 1           NA           NA POLYGON ((538297.7 5265135,...
## 2           NA           NA POLYGON ((539296.5 5265148,...
## 3           NA           NA POLYGON ((540295.4 5265161,...
## 4           NA           NA POLYGON ((541294.3 5265175,...
## 5           NA           NA POLYGON ((542293.1 5265188,...
## 6           NA           NA POLYGON ((537285.8 5266122,...
summary(bw_zensus)
##       id               f_staat            f_land           f_wasser      
##  Length:36717       Min.   :     32   Min.   :    -23   Min.   :      0  
##  Class :character   1st Qu.:1000000   1st Qu.: 999463   1st Qu.:      0  
##  Mode  :character   Median :1000000   Median :1000000   Median :      0  
##                     Mean   : 994630   Mean   : 980625   Mean   :  14005  
##                     3rd Qu.:1000000   3rd Qu.:1000000   3rd Qu.:    496  
##                     Max.   :1000000   Max.   :1000000   Max.   :1000000  
##                                                                          
##     p_staat             p_land            p_wasser          Einwohner      
##  Min.   :  0.0032   Min.   : -0.0023   Min.   :  0.0000   Min.   :   -1.0  
##  1st Qu.:100.0000   1st Qu.: 99.9463   1st Qu.:  0.0000   1st Qu.:   -1.0  
##  Median :100.0000   Median :100.0000   Median :  0.0000   Median :   11.0  
##  Mean   : 99.4630   Mean   : 98.0625   Mean   :  1.4005   Mean   :  289.2  
##  3rd Qu.:100.0000   3rd Qu.:100.0000   3rd Qu.:  0.0496   3rd Qu.:  160.0  
##  Max.   :100.0000   Max.   :100.0000   Max.   :100.0000   Max.   :17291.0  
##                                                           NA's   :206      
##     Frauen_A         Alter_D        unter18_A          ab65_A       
##  Min.   : -9.00   Min.   :-9.00   Min.   : -9.00   Min.   : -9.000  
##  1st Qu.: -1.00   1st Qu.:-1.00   1st Qu.: -1.00   1st Qu.: -1.000  
##  Median : 43.00   Median :37.00   Median :  0.00   Median :  0.000  
##  Mean   : 27.21   Mean   :23.89   Mean   :  8.83   Mean   :  8.795  
##  3rd Qu.: 51.00   3rd Qu.:43.00   3rd Qu.: 19.00   3rd Qu.: 19.000  
##  Max.   :100.00   Max.   :85.00   Max.   :100.00   Max.   :100.000  
##  NA's   :206      NA's   :206     NA's   :206      NA's   :206      
##   Auslaender_A      HHGroesse_D     Leerstandsquote     Wohnfl_Bew_D   
##  Min.   : -9.000   Min.   :-9.000   Min.   : -9.0000   Min.   : -9.00  
##  1st Qu.: -1.000   1st Qu.:-1.000   1st Qu.: -1.0000   1st Qu.: -1.00  
##  Median :  0.000   Median : 2.000   Median :  0.0000   Median : -1.00  
##  Mean   :  1.686   Mean   : 1.004   Mean   :  0.3942   Mean   : 13.11  
##  3rd Qu.:  3.000   3rd Qu.: 3.000   3rd Qu.:  2.0000   3rd Qu.: 42.00  
##  Max.   :100.000   Max.   : 9.000   Max.   :100.0000   Max.   :143.00  
##  NA's   :206       NA's   :206      NA's   :206        NA's   :206     
##   Wohnfl_Whg_D               geom      
##  Min.   : -9.00   POLYGON      :36717  
##  1st Qu.: -9.00   epsg:32632   :    0  
##  Median : -1.00   +proj=utm ...:    0  
##  Mean   : 18.56                        
##  3rd Qu.: -1.00                        
##  Max.   :252.00                        
##  NA's   :206

The column Einwohner contains the population per grid cell. The minimum value is -1, a placeholder rather than an actual count. We set negative values to 0 to avoid inconsistencies, both for Einwohner and for the under-18 share (unter18_A).

bw_zensus <- bw_zensus |> 
  mutate(Einwohner = case_when(
    Einwohner < 0 ~ 0, # if 
    TRUE ~ as.numeric(Einwohner) # else 
  ),
  unter18_A = case_when(
    unter18_A < 0 ~ 0, # if 
    TRUE ~ as.numeric(unter18_A) # else 
  ))

plot(bw_zensus["Einwohner"], lwd=0.1)

Population counts per grid cell are usually quite skewed. Therefore we apply a logarithmic transformation. This may help to reveal the relationship between the point pattern and the population distribution later on.

library(stars)
## Loading required package: abind
bw_zensus_stars <- st_rasterize(bw_zensus)
plot(bw_zensus_stars["Einwohner"])

hist(bw_zensus_stars["Einwohner"], main="BW pop.", xlab="Population")

bw_zensus_stars <- bw_zensus_stars |> mutate(
  Einwohner=log(Einwohner +1),
  unter18_A=log(unter18_A+1))
hist(bw_zensus_stars["Einwohner"], main="BW pop.", xlab="Population")

bw_zensus.im <- bw_zensus_stars["Einwohner"] |> as.im()

15.5 Density based

In this section we try to understand the distribution of McDonald's restaurants with density based measures.

15.5.1 Quadrat density

The spatstat package provides the functions quadratcount() and intensity() to describe the density of points locally. The study area is divided into a set of quadrats, and for each quadrat the number of points is counted. The number of quadrats is controlled via the parameters nx and ny for columns and rows. For intensity(), the number of points is divided by the area of the quadrat.

ff.q <- quadratcount(ff.pp, nx= 8, ny=10)

plot(ff.pp, pch=20, cols="grey70", main="Quadrat count McDonald's")  # Plot points
plot(ff.q, add=TRUE)  # Add quadrat grid

The McDonald's restaurants show a tendency to cluster, especially in the Stuttgart metropolitan area and other larger cities. How does a density based analysis look for random and regular point patterns?

The next chunk simulates these kinds of point patterns via the sf function st_sample. We use the same number of points but distribute them (a) randomly and (b) regularly.

rand.pp <- st_sample( x = bw, size=nrow(fast_food), type="random") |> 
  st_transform(32632) |> 
  as.ppp()

Window(rand.pp) <- bw.owin

reg.pp <- st_sample( x = bw, size=nrow(fast_food), type="regular") 
st_crs(reg.pp) <- st_crs(4326)

reg.pp <- reg.pp |> 
  st_transform(32632) |> 
  as.ppp()

Window(reg.pp) <- bw.owin

Next we compute the quadrat counts for these patterns.

par(mar = c(4, 4, .1, .1))
rand.q <- quadratcount(rand.pp, nx= 8, ny=10)

plot(rand.pp, pch=20, cols="grey70", main="quadrat count random")  # Plot points
plot(rand.q, add=TRUE)  # Add quadrat grid

reg.q <- quadratcount(reg.pp, nx= 8, ny=10)

plot(reg.pp, pch=20, cols="grey70", main="quadrat count regular")  # Plot points
plot(reg.q, add=TRUE)  # Add quadrat grid

The quadrat count distributions as histograms:

ff.q |> hist(main="McDonald's quadrat count BW (8x10)", xlab="Quadrat count")

rand.q |> hist(main="Random quadrat count BW (8x10)", xlab="Quadrat count")

reg.q |> hist(main="Regular quadrat count BW (8x10)", xlab="Quadrat count")

The quadrat count distribution for the McDonald's restaurants is much more skewed than those of the random and regular distributions.

We can further assess the intensity with a hypothesis test using a goodness-of-fit test statistic.

cat("McDonald's")
## McDonald's
quadrat.test(ff.q)
## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  
## X2 = 238.77, df = 69, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 
## Quadrats: 70 tiles (irregular windows)
cat("Random")
## Random
quadrat.test(rand.q)
## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  
## X2 = 53.961, df = 69, p-value = 0.1838
## alternative hypothesis: two.sided
## 
## Quadrats: 70 tiles (irregular windows)

Note the differences in the p-values. The low p-value for the McDonald's restaurants indicates an IPP whose intensity can be estimated non-parametrically (kernel density). The high p-value (p > 0.05) of the random points indicates an HPP where intensity is constant.
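The regular pattern can be tested the same way. Note that regularity is also a departure from CSR: the quadrat counts are too uniform, so the two-sided test should likewise yield a small p-value (output not shown here):

cat("Regular")
quadrat.test(reg.q)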

The density within each quadrat is calculated via intensity().

# Compute the density for each quadrat
ff.d <- intensity(ff.q)

# Plot the density
plot(intensity(ff.q, image=TRUE), main=NULL, las=1)  # Plot density raster
plot(ff.pp, pch=20, cex=0.6, col=rgb(0,0,0,.5), add=TRUE)  # Add points

Density is measured as the number of points per square meter. For Baden-Württemberg this yields very small, hard-to-read values. We can adjust for this by rescaling the spatstat objects to kilometers.

ff.pp.km <- rescale(ff.pp, 1000, "km")
bw.owin.km <- rescale(bw.owin, 1000, "km")
rand.pp.km <- rescale(rand.pp, 1000, "km")
reg.pp.km <- rescale(reg.pp, 1000, "km")
bw_zensus.im.km <- rescale(bw_zensus.im, 1000, "km")
# Compute the density for each quadrat (in counts per km2)
ff.q.km <- quadratcount(ff.pp.km, nx= 8, ny=10)
ff.d.km <- intensity(ff.q.km)

# Plot the density
plot(intensity(ff.q.km, image=TRUE), main="McDonald's quadrat density", las=1)  # Plot density raster
plot(ff.pp.km, pch=20, cex=0.6, col=rgb(0,0,0,.5), add=TRUE)  # Add points

15.5.2 Density on tessellated surface

The point pattern density can also be assessed with non-uniform quadrats / regions. We can use the population raster to define the quadrats. First we divide the population raster into 5 regions based on the logged population density. Then we apply quadrat count and intensity on the 5 regions.

hist(bw_zensus_stars["Einwohner"], main="BW pop.", xlab="Population")

brk  <- c( -Inf, 2, 4, 6, 8 , Inf)  # Define the breaks
#brk  <- c( -Inf, -2, 0, 2 , Inf)  # Define the breaks

Zcut <- cut(bw_zensus.im.km, breaks=brk, labels=1:5)  # Classify the raster

E <- tess(image=Zcut) 
plot(E, main="", las=1)

The tessellated surface consists of 5 regions.

Q   <- quadratcount(ff.pp.km, tess = E)  # Tally counts
Q.d <- intensity(Q)  # Compute density
Q.d
## tile
##           1           2           3           4           5 
## 0.001092069 0.001638125 0.004369945 0.011711581 0.065373781

The scale is in kilometers, so the values are to be interpreted as the density of points per km². The point density increases with the population class, a first indication of dependence between the point process and the covariate. Again, let's plot the density values across the tessellated regions.

plot(intensity(Q, image=TRUE), las=1, main=NULL)
plot(ff.pp.km, pch=20, cex=0.6, col=rgb(1,1,1,.5), add=TRUE)
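As a sanity check, the intensity per region is simply the count divided by the tile area; a small sketch, assuming the tile order of quadratcount() matches that of tile.areas():

# intensity = count / area, per tessellation tile
data.frame(count = as.vector(Q),
           area.km2 = tile.areas(E),
           intensity = as.vector(Q) / tile.areas(E))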

15.5.3 Kernel density

The kernel density computes an isotropic intensity estimate of the point pattern. Similar to the quadrat density approach, the kernel density approach computes localized density estimates. But instead of discrete sub-regions, it uses overlapping moving sub-regions or windows. The size and shape of a window are defined by a kernel. Different kernel functions exist. The bandwidth parameter sigma of the density function controls the kernel's window extent. Too large a bandwidth / window will result in an oversmoothed kernel density estimate, whereas too small a bandwidth will result in an undersmoothed one.
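Before experimenting manually with sigma, note that spatstat also ships data-driven bandwidth selectors such as bw.diggle() or bw.ppl(); a brief sketch:

# cross-validated bandwidth; can be passed directly to density()
sigma.cv <- bw.diggle(ff.pp.km)
sigma.cv
plot(density(ff.pp.km, sigma = sigma.cv),
     main = "Kernel density, bw.diggle bandwidth", las = 1)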

ff.pp.km.kd1 <- density(ff.pp.km) # Using the default bandwidth
plot(ff.pp.km.kd1, main="Kernel density McDonald's restaurants; default bandwidth", las=1)
contour(ff.pp.km.kd1, add=TRUE)

ff.pp.km.kd2 <- density(ff.pp.km, sigma=50) # Using a 50km bandwidth
plot(ff.pp.km.kd2, main="Kernel density McDonald's restaurants; 50km bandwidth", las=1)
contour(ff.pp.km.kd2, add=TRUE)

ff.pp.km.kd3 <- density(ff.pp.km, sigma=10) # Using a 10km bandwidth
plot(ff.pp.km.kd3, main="Kernel density McDonald's restaurants; 10km bandwidth", las=1)
contour(ff.pp.km.kd3, add=TRUE)

How does this compare to the random and regular point patterns?

par(mar = c(4, 4, .1, .1))
rand.pp.km.kd <- density(rand.pp.km, sigma=10) # Using a 10km bandwidth
plot(rand.pp.km.kd, main="Kernel density random; 10km bandwidth", las=1)
contour(rand.pp.km.kd, add=TRUE)

reg.pp.km.kd <- density(reg.pp.km, sigma=10) # Using a 10km bandwidth
plot(reg.pp.km.kd, main="Kernel density regular; 10km bandwidth", las=1)
contour(reg.pp.km.kd, add=TRUE)

15.5.4 Kernel density adjusted

Until now this chapter covered methods to describe the distribution of points. Next we will see how to model the relationship between point distribution and covariates.

We fit a nonparametric curve to the data to further explore the shape of the relationship between McDonald's restaurant locations and population density.

rho <- rhohat(ff.pp.km, bw_zensus.im.km, method="ratio")
plot(rho, las=1, main=NULL, legendargs=list(cex=0.8, xpd=TRUE))

We see an exponential-like increase of McDonald's intensity with increasing population density.

We can use the result to predict McDonald's densities under the assumption that population density is the only driver of the point process.

pred <- predict(rho)
col.rmp   <- interp.colours(c("green", "blue", "lightyellow"), 100) # Create color scheme
plot(pred, col=col.rmp, las=1, main=NULL, gamma = 0.2)

Predicted values range from 0 to 0.4 restaurants per km². We can check the observed against the predicted intensity via a matrix scatterplot for im classes.

ff.pp.km.kd3_pred <- pairs(ff.pp.km.kd3, pred, plot = FALSE)
plot(ff.pp.km.kd3_pred$pred ~ ff.pp.km.kd3_pred$ff.pp.km.kd3, pch=20,
     xlab = "Observed intensity", 
     ylab = "Predicted intensity", 
     col = rgb(0,0,0,0.1))

The predicted intensities appear skewed relative to the observed ones. Let's take a closer look by adjusting the axis limits.

plot(ff.pp.km.kd3_pred$pred ~ ff.pp.km.kd3_pred$ff.pp.km.kd3, pch=20,
     xlab = "Observed intensity", 
     ylab = "Predicted intensity", 
     col = rgb(0,0,0,0.1),
     xlim = c(0, 0.03), ylim = c(0, 0.1))
abline(a=0, b = 1, col = "red")

Overestimation is present across the whole range of intensities.

15.5.5 Intensity as a function of a covariate

In the following chunk we fit a Poisson point process model on the McDonald's locations.

ppm.1 <- ppm(ff.pp.km ~ bw_zensus.im.km)
## Warning: Values of the covariate 'bw_zensus.im.km' were NA or undefined at 0.11%
## (1 out of 894) of the quadrature points. Occurred while executing: ppm.ppp(Q =
## ff.pp.km, trend = ~bw_zensus.im.km, data = NULL,
plot(effectfun(ppm.1, "bw_zensus.im.km", se.fit=TRUE), main=NULL, 
     las=1, legendargs=list(cex=0.8, xpd=TRUE ))

Despite the similarity with the nonparametric model, it should be noted that this is a parametric model with an explicitly defined functional form.

ppm.1
## Nonstationary Poisson process
## 
## Log intensity:  ~bw_zensus.im.km
## 
## Fitted trend coefficients:
##     (Intercept) bw_zensus.im.km 
##      -7.5948755       0.4740355 
## 
##                   Estimate       S.E.    CI95.lo    CI95.hi Ztest      Zval
## (Intercept)     -7.5948755 0.23746770 -8.0603036 -7.1294473   *** -31.98277
## bw_zensus.im.km  0.4740355 0.03630491  0.4028792  0.5451918   ***  13.05706
## Problem:
##  Values of the covariate 'bw_zensus.im.km' were NA or undefined at 0.11% (1 out 
## of 894) of the quadrature points

\[\lambda(i) = e^{-7.595 + 0.474\,x(i)}\] where \(x(i)\) is the logged population density. The base intensity is \(e^{-7.595}\) restaurants per km² where the logged population density is 0. For every one unit increase in logged population density, the intensity of McDonald's restaurants is multiplied by \(e^{0.474}\).
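To make the interpretation concrete, the fitted intensity can be reproduced by hand from the coefficients; the covariate value 5 is an arbitrary example:

b <- coef(ppm.1)
# expected McDonald's per km² at a logged population density of 5
exp(b[1] + b[2] * 5)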

Next we compare our model against a null model.

ppm.null <- ppm(ff.pp.km ~ 1)
ppm.null
## Stationary Poisson process
## Intensity: 0.004675151
##              Estimate       S.E.  CI95.lo   CI95.hi Ztest      Zval
## log(lambda) -5.365494 0.07738232 -5.51716 -5.213827   *** -69.33746

The null model which assumes a homogeneous process:

\[\lambda(i) = e^{-5.365}\]

Actually this is nothing more than the global density of McDonald’s restaurants in our AOI:

ff.pp.km$n / area(bw.owin.km)
## [1] 0.004675151

Next we compare both models using the anova function:

anova(ppm.null, ppm.1, test="LRT")
## Analysis of Deviance Table
## 
## Model 1: ~1   Poisson
## Model 2: ~bw_zensus.im.km     Poisson
##   Npar Df Deviance  Pr(>Chi)    
## 1    2                          
## 2    3  1    234.1 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We see a significant improvement in describing the variance of the point process when using the population density based model. How about other covariates? We will now look at the share of the population under 18 and explore more functions to assess a Poisson point process model.

bw_zensus.u18.im.km <- bw_zensus_stars["unter18_A"] |> as.im() |> rescale(1000, "km")

ppm.2 <- ppm(ff.pp.km ~ bw_zensus.u18.im.km)
## Warning: Values of the covariate 'bw_zensus.u18.im.km' were NA or undefined
## at 0.11% (1 out of 894) of the quadrature points. Occurred while executing:
## ppm.ppp(Q = ff.pp.km, trend = ~bw_zensus.u18.im.km, data = NULL,
ppm.2
## Nonstationary Poisson process
## 
## Log intensity:  ~bw_zensus.u18.im.km
## 
## Fitted trend coefficients:
##         (Intercept) bw_zensus.u18.im.km 
##          -6.3167114           0.4727696 
## 
##                       Estimate       S.E.   CI95.lo    CI95.hi Ztest       Zval
## (Intercept)         -6.3167114 0.17242032 -6.654649 -5.9787738   *** -36.635539
## bw_zensus.u18.im.km  0.4727696 0.06261729  0.350042  0.5954973   ***   7.550145
## Problem:
##  Values of the covariate 'bw_zensus.u18.im.km' were NA or undefined at 0.11% (1 
## out of 894) of the quadrature points

\[\lambda(i) = e^{-6.317 + 0.473\,x(i)}\] where \(x(i)\) is the logged share of the population under 18.

diagnose.ppm(ppm.1, main="ppm - population")
## Warning: Some infinite, NA or NaN increments were removed

## Model diagnostics (raw residuals)
## Diagnostics available:
##  four-panel plot
##  mark plot 
##  smoothed residual field
##  x cumulative residuals
##  y cumulative residuals
##  sum of all residuals
## sum of raw residuals in entire window = -6.556e-12
## area of entire window = 35720
## quadrature area = 35680
## range of smoothed field =  [-0.001498, 0.002525]
diagnose.ppm(ppm.2, main="ppm - share of population under 18")
## Warning: Some infinite, NA or NaN increments were removed

## Model diagnostics (raw residuals)
## Diagnostics available:
##  four-panel plot
##  mark plot 
##  smoothed residual field
##  x cumulative residuals
##  y cumulative residuals
##  sum of all residuals
## sum of raw residuals in entire window = -1.883e-09
## area of entire window = 35720
## quadrature area = 35680
## range of smoothed field =  [-0.002285, 0.007125]
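The two covariate models are not nested in each other, so anova() with a likelihood ratio test does not apply to their comparison; a quick check via AIC (lower is better) is a reasonable alternative:

# compare the null model and the two covariate models
AIC(ppm.null)
AIC(ppm.1)
AIC(ppm.2)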

15.6 Distance based analysis

This section covers the second category of point pattern analysis methods: distance based approaches.

15.6.1 Average nearest neighbor

The average nearest neighbor (ANN) analysis measures the average distance from each point location to its nearest neighboring point.

ff.pp.km.ann <- mean(nndist(ff.pp.km, k=1))

The average nearest neighbor distance for all McDonald's restaurants in the study area is 6.8611607 km. By varying the parameter k one can compute n-th nearest neighbor distances.

ann.2nd <- mean(nndist(ff.pp.km, k=2))

The object ann.2nd holds the average second nearest neighbor distance. The following chunk plots the average nearest neighbor distances for the orders 1 to 100.

ann <- apply(nndist(ff.pp.km, k=1:100),2,FUN=mean)
plot(ann ~ eval(1:100), type="b", main=NULL, las=1)

There is a steep increase in average nearest neighbor distance for the first couple of neighbor orders. After about order 20, the average nearest neighbor distance grows steadily.

Compare this against the random and regular distributions:

ann.rand <- apply(nndist(rand.pp.km, k=1:100),2,FUN=mean)
plot(ann.rand ~ eval(1:100), type="b", main=NULL, las=1)

ann.reg <- apply(nndist(reg.pp.km, k=1:100),2,FUN=mean)
plot(ann.reg ~ eval(1:100), type="b", main=NULL, las=1)

15.6.2 Simulate random process and test for CSR

With the average nearest neighbor function we can test whether complete spatial randomness is present in our point distribution. The observed average nearest neighbor distance is 6.8611607 km. Now we run multiple iterations and randomly place the same number of point locations within our study area.

n <- 1499L # iterations
ann.r <- vector(length = n) # empty vector for sim results
for (i in 1:n){
  rand.p   <- rpoint(n=ff.pp.km$n, win=bw.owin.km)  
  ann.r[i] <- mean(nndist(rand.p, k=1))  
}

Point locations of the last simulation.

plot(rand.p, pch=16, main="Last simulation", cols=rgb(0,0,0,0.5))

Histogram of the average nearest neighbor from the simulations.

hist(ann.r, main=NULL, las=1, breaks=40, col="bisque", xlim=range(ff.pp.km.ann, ann.r))
abline(v=ff.pp.km.ann, col="blue")

The blue line is the observed ANN value. It is clearly smaller than the ANN values expected under the null hypothesis of a CSR / homogeneous point process. Now we run the same simulation again, but in the random placement we account for the population density. That way we control for the influence of the population density covariate. This means we are still exploring the underlying 2nd order process, but with the covariate we control for the 1st order process.
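First, though, the comparison from the unconditioned simulation can be summarised as a Monte Carlo pseudo p-value; a short sketch using the ann.r vector from above:

# share of simulated ANN values at least as small as the observed one
p.hat <- (sum(ann.r <= ff.pp.km.ann) + 1) / (n + 1)
p.hat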

n     <- 1499L
ann.r <- vector(length=n)
for (i in 1:n){
  rand.p   <- rpoint(n=ff.pp.km$n, f=bw_zensus.im.km) 
  ann.r[i] <- mean(nndist(rand.p, k=1))
}
Window(rand.p) <- bw.owin.km  
plot(rand.p, pch=16, main="Last sim run", cols=rgb(0,0,0,0.5))

Now the locations are more clustered in the Stuttgart and Mannheim / Heidelberg areas. How about the histogram?

hist(ann.r, main=NULL, las=1, breaks=40, col="bisque", xlim=range(ff.pp.km.ann, ann.r))
abline(v=ff.pp.km.ann, col="blue")

The simulated ANN values move closer to the observed ANN value. However, we still cannot state that the clustering of McDonald's restaurants is explained by a completely random process, even when controlling for population density.

15.6.3 K, L and g functions

Other methods to check for IRP / CSR are the K, L and g (pair correlation) functions. The K function cumulatively counts the average number of neighbors of each point location within increasing distance lags and divides it by the global point density.

K <- Kest(ff.pp.km)
plot(K, main="K function", las=1, legendargs=list(cex=0.8, xpd=TRUE ))

Kest produces several estimates based on different edge correction algorithms. Explanations of the differences can be found elsewhere; see ?Kest for a first start. The blue line K pois depicts the theoretical K function under the null hypothesis that the points are IRP / CSR. Lines below K pois indicate dispersion, whereas lines above indicate clustering.
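The visual comparison with K pois can be formalised with simulation envelopes; a sketch with 99 CSR simulations (this can take a moment to run):

# pointwise acceptance band from 99 simulations of CSR
env.K <- envelope(ff.pp.km, Kest, nsim = 99, verbose = FALSE)
plot(env.K, main = "K function with CSR envelope", las = 1)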

K function result shapes are not always easy to interpret when the differences are narrow. Here the L function may help. It is Besag's transformation of the K function.

L <- Lest(ff.pp.km)
plot(L, main=NULL, las=1, legendargs=list(cex=0.8, xpd=TRUE ))

The following chunk sets the L pois line horizontal, further improving interpretability.

plot(L, . -r ~ r, main="L function horizontal", las=1, legendargs=list(cex=0.8, xpd=TRUE ))

According to the function, no matter which edge correction method is used, the pattern clearly clusters beyond the ~1km distance band. The intensity of clustering, however, varies between the methods. The border correction, for instance, depicts strong intensity beyond the ~18km band.

The K and L functions work cumulatively: all neighbors from previous distance bands are included in the following band, like 0-10km, 0-20km, and so on. The g function counts neighbors in discrete bands, like 0-10km, 10-20km, and so on.

g  <- pcf(ff.pp.km)
plot(g, main="Pair correlation or g function", las=1, legendargs=list(cex=0.8, xpd=TRUE ))

Again, g pois represents the IRP / CSR process. Clustering is strongly prevalent in the first distance bands.

What other functions are there? How do they compare when applied to regular and random distributions?

plot(allstats(ff.pp.km), main="McDonald's")

plot(allstats(rand.pp.km), main="Random")

plot(allstats(reg.pp.km), main="Regular")

All of these functions can also be calculated individually via the R functions Fest(), Gest() and Jest() (see the sketch after the following list).

  • F function - empty space function: \(\hat{F}(r) > F_{pois}(r)\) indicates shorter distances from empty spaces to the closest event and therefore a regular pattern. \(\hat{F}(r) < F_{pois}(r)\) suggests clustering.
  • G function - nearest-neighbor distance function: the cumulative distribution of the distance from a typical point of X to the nearest other point of X. The interpretation of the G function is the reverse of the F function: \(\hat{G}(r) > G_{pois}(r)\) indicates that nearest neighbor distances are shorter than for a Poisson point process, suggesting clustering. \(\hat{G}(r) < G_{pois}(r)\) indicates a regular pattern.
  • J function - a combination of the F and G functions: \(\hat{J}(r) > 1\) indicates regularity, \(\hat{J}(r) < 1\) clustering.
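A brief sketch of the individual calls:

# compute the summary functions one by one
Fe <- Fest(ff.pp.km)
Ge <- Gest(ff.pp.km)
Je <- Jest(ff.pp.km)
plot(Je, main = "J function McDonald's", las = 1)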

References

Baddeley, A., Rubak, E., Turner, R., 2015. Spatial Point Patterns: Methodology and Applications with R. CRC Press.