Chapter 15 Point pattern analysis
library(ohsome)
library(sf)
library(terra)
library(tmap)
library(rgeoboundaries)
library(spatstat)
library(tidyverse)
15.1 Cheat sheet
The functions for spatial point pattern analyses are provided by the package spatstat
:
- spatstat::quadrantcount
- local density of point patterns
- spatstat::intensity
- another local density measure
- spatstat::density.ppp
- kernel density measure
- spatstat::rhohat
- fit a non parametric curve to point pattern and covariate data
- spatstat::nndist
- generate nearest neighbor distances
- spatstat::Kest
- K function: estimate mean distances cumulative for different distance bands
- spatstat::Lest
- Transformation of K function: easier to interpret
- spatstat::pcf
- G function: Distance based method with non cumulative distance bands
- spatstat::ppm
- fit a poisson point process model on a point distribution and covariates
spatstat
is the result of 30 years of software development and contains over 175,000 lines of code. It is still under development, motivated by the needs of researchers in many fields, and driven by innovations in statistical science. We welcome contributions of code, and suggestions for improvements.
More on spatial point pattern analysis and implementations in R see (Baddeley et al., 2015)
After you load the package in an R session, you can start an interactive tutorial by entering beginner
in the console.
15.2 tl;dr - Spatial point pattern analysis in a nutshell
Spatial point pattern analysis is concerned with processes that drive the occurrence of points or events. Two main effects in the occurence of points are defined: - 1st order effect: Event observations vary from place to place due to differences in the underlying property - 2nd order effect: Event observations vary from place to place due to interactions between the events / points
Different methods exist to assess spatial point patterns. The two main categories are: - Density based methods - Distance based methods
Density based methods focus on the 1st order effect, the underlying property that may influence the occurence of events / points. Distance based methods focus on the 2nd order property.
However it is important to notice that it is often not possible to distinguish both effects. Furthermore prior knowledge about the study site and the observed phenomenon is highly important before drawing conclusions from spatial point pattern analysis results.
15.3 Theoretical background
A spatial point pattern is the result of spatial point process. These are considered stochastic processes whose realizations each consists of a finite or countably infinite set of points on a planar surface. For spatial point pattern analysis both the location fo the point itself and depending on the analyses attached values are treated as response variables. It is common to compare point patterns to artificial generated clustered, random or regular distributed point patterns.
The mentioned attached values or marks are like attributes for standard vector data. They can be of categorical or continuous scale and may be used to weigh events / points.
Poisson Point process
The poisson point process is a type of random mathematical object which generates points in a defined space randomly. The poisson distribution is the probability of a random variable \(N\) such that the probability that \(N\) equals \(n\) is explained by:
\[\mathbb{P} \left \{ N = n \right \} = \frac{\lambda^n}{n!}e^{-\lambda}\] where \(n!\) denotes \(n\) factorial and the parameter \(\lambda\) determines the shape of the distribution. (Actually, \(\lambda\) equals the expected value of \(N\).
The Poisson point process is sometimes called a purely or completely random process. This process has the property that the number of events \(N(A)\) in a bounded region \(A∈\mathbb{R}^d\) are independently and uniformly distributed over \(A\). This means that the location of one point does not affect the probabilities of other points appearing nearby and that there are no regions where events are more likely to appear.
It is common to compare an observed point pattern to others created by an independent random process (IRP) or also called complete spatial randomness (CSR). IRP or CSR meet the following conditions:
- The probability of an point occuring in any location is equal for all points in the dataset 1st order effect
- The location of a point is independent of the location of any other point in the dataset 2nd order effect
A distinction is made between homogeneous and inhomogeneous poisson point processes. Homogeneous poisson point processes (HPP) have a constant parameter \(\lambda\). \(\lambda\) or intensity of the process, is the average or expected number of points present in a defined area.
Inhomogeneous poisson point processes (IPP) are defined by a non-constant \(\lambda\) parameter. Both IPP and HPP assume that occurence of points is independent but distributed according to the \(\lambda\) intensity. So the main difference is that HPP assumes a constant and IPP a non-constant intensity.
Other point process models are Cox process(occurence is not independent from other points). Neyman-Scott process, Markov point process. More information on these can be found elsewhere.
Density based methods - global densities \(\frac{\sum n}{A}\) - local densities via local regions - kernel density - non parametric curve fitting based on a density covariate - poisson point process model fitting
Distance based methods - Average nearest neighbor: By varying the neighbor order (1st, 2nd ..) statements about clustering are possible - K and L functions: average nearest neighbor distances via cumulative distance bands - g function: average nearest neighbor distances via distinct distance bands
15.4 Data acquisition
In the following code chunk we use the ohsome-api to query some geodata from OpenStreetMap(OSM). OSM is the most successfull free and open database for geodata. Like Wikipedia, anyone can contribute new data and access or download it for any purpose. Ohsome is a technology built on top of OpenStreetMap data. Its purpose is to make the history, the record of changes to the data easily accessible. But you can use ohsome also to download the actual geometry or the bounding box or centroid of a or multiple OSM objects. There are also other libraries in R to get OSM data. Here is a short overview:
ohsome
- Access to aggregated OSM values (like count or density), history of objects, snapshots of objects. There are some restrictions in terms of amount of objects, spatial extent and temporal extent. For instance, downloading all buildings of a small to medium city should not pose a difficulty, but large cities, or whole regions will.osmdata
- Access to the overpass API. Comparable to the QGIS pluginQuickOSM
or the website Overpass-turbo. The same restrictions in terms of amount of objects and spatial extent apply.osmextract
- The package will auto. download aosm.pbf
from Geofabrik that matches your area of interest. Therefore no restrictions apply on the amount of objects or spatial/temporal extent.
In the following example we use the ohsome
package to download the locations of italian restaurants and McDonald’s restaurants
# the following functions are part of the rgeoboundaries package. This package is not on CRAN. TO install it use the following command:
# remotes::install_github("wmgeolab/rgeoboundaries")
bw <- gb_adm1(country = "Germany") |>
filter(shapeISO == "DE-BW")
q <- ohsome_query("elements/geometry",
bw |> st_bbox(),
filter="amenity=fast_food and brand=\"McDonald's\"")
fast_food <- ohsome_post(q)
#
# q <- ohsome_query("elements/geometry",
# bw |> st_bbox(),
# filter="amenity=restaurant and cuisine=italian")
#
# restaurants <- ohsome_post(q)
We get 252 McDonald’s for the extent of Baden-Württemberg. However, the extent of Baden-Württemberg includes large portions of neighboring states and countries, among others major cities like Straßbourg and Ludwigshafen. Therefore we need to filter by the actual boundary. Querying ohsome by extent rather than actual geometry bears the advantage of easier and therefore faster filtering in the ohsome backend. An extent consists of 4 coordiante pairs, whereas the BW boundary consists of 40571.
The facilities of interest can be mapped in OSM as both polygons or points. This bears the risk that we may have the same facility twice in the dataset, as a point and as a polygon. The point is a Point-of-interest (POI) representation of a facility. For the most use cases this representation is sufficient. The polygon representation is simply more accurate as it includes the the facilities building footprint.
Therefore we have to check whether there are spatial duplicates. We check this via st_intersects
fast_food <- fast_food |>
st_make_valid(geometry) |>
filter(st_intersects(geometry, bw$geometry, sparse=F))
# check geometrytypes
table(st_geometry_type(fast_food$geometry))
##
## GEOMETRY POINT LINESTRING POLYGON
## 0 69 0 99
## MULTIPOINT MULTILINESTRING MULTIPOLYGON GEOMETRYCOLLECTION
## 0 0 0 0
## CIRCULARSTRING COMPOUNDCURVE CURVEPOLYGON MULTICURVE
## 0 0 0 0
## MULTISURFACE CURVE SURFACE POLYHEDRALSURFACE
## 0 0 0 0
## TIN TRIANGLE
## 0 0
# flatten
fast_food <- fast_food |>
filter(lengths(st_intersects(geometry, fast_food$geometry))==1 | (lengths(st_intersects(geometry, fast_food$geometry))>1 & st_geometry_type(geometry) == "POINT")) |>
st_point_on_surface()
# verify by plot
plot(fast_food$geometry, pch=20, col=rgb(0,0,0,0.2))
plot(bw$geometry, add=T)
# restaurants <- restaurants |>
# st_make_valid(geometry) |>
# filter(st_intersects(geometry, bw$geometry, sparse=F))
#
# restaurants <- restaurants |>
# filter(lengths(st_intersects(geometry, restaurants$geometry))==1 | (lengths(st_intersects(geometry, restaurants$geometry))>1 & st_geometry_type(geometry) == "POINT")) |>
# st_point_on_surface()
#
We see indeed different geometrytypes for McDonald’s and regular restaurants. We check which objects do intersect with more than one geometry (itself) within the dataset. For those that do intersect with more than one geomtry we filter for point geometries. This leaves us with a total of 167 McDonald’s.
In the next chunk we will use the spatstat
package to describe the distribution of McDonald’s locations in Baden-Württemberg
Spatstat introduces new spatial data structures to R:
- Point pattern class
ppp
- Point geometries that represent a phenomenon to be investigated with point pattern analysis methods. PPP objects can have marks attached. Marks are kind of attributes attached to the point pattern that can be used for weighting. Marks kan be categorial or continuous. - Observation window
owin
- Polygon geometries that represent the boundaries within a point pattern is observed and analysed - Pixel image
im
- A raster that represents covariates that may be used to explain the occurrence of a point.
Only geodata with projected coordinates can be converted to spatstat objects.
# convert fast food locations to ppp
ff.pp <- fast_food |> st_transform(32632) |> as.ppp()
# set marks or weights NULL
marks(ff.pp) <- NULL
# convert BW boundary to observation window
bw.owin <- bw |> st_transform(32632) |> as.owin()
# set the BW window as the defining boundary window of the point pattern
Window(ff.pp) <- bw.owin
plot(ff.pp, main="Ppp class and window", cols=rgb(0,0,0,.2), pch=20)
## Planar point pattern: 167 points
## window: polygonal boundary
## enclosing rectangle: [388337.3, 610069.4] x [5265166, 5515628] units
Point pattern and window are now stored in one class.
We also load zensus data for Baden-Württemberg to use as covariates later on. The zensus data although gridded, comes in a vector format. We select a variable of interest and convert it first to raster and then image.
# load zensus data for Baden_Württemberg
bw_zensus <- st_read("data/zensus_grid_bw.gpkg", quiet=T)
head(bw_zensus)
## Simple feature collection with 6 features and 17 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 537272.8 ymin: 5265135 xmax: 543292 ymax: 5267136
## Projected CRS: WGS 84 / UTM zone 32N
## id f_staat f_land f_wasser p_staat p_land p_wasser Einwohner
## 1 1kmN2714E4284 384 0 384 0.0384 0.0000 0.0384 NA
## 2 1kmN2714E4285 128374 -23 128397 12.8374 -0.0023 12.8397 NA
## 3 1kmN2714E4286 358537 -19 358556 35.8537 -0.0019 35.8556 NA
## 4 1kmN2714E4287 588698 -4 588702 58.8698 -0.0004 58.8702 NA
## 5 1kmN2714E4288 818814 21 818793 81.8814 0.0021 81.8793 NA
## 6 1kmN2715E4283 668015 -9 668024 66.8015 -0.0009 66.8024 NA
## Frauen_A Alter_D unter18_A ab65_A Auslaender_A HHGroesse_D Leerstandsquote
## 1 NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA
## 3 NA NA NA NA NA NA NA
## 4 NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA
## Wohnfl_Bew_D Wohnfl_Whg_D geom
## 1 NA NA POLYGON ((538297.7 5265135,...
## 2 NA NA POLYGON ((539296.5 5265148,...
## 3 NA NA POLYGON ((540295.4 5265161,...
## 4 NA NA POLYGON ((541294.3 5265175,...
## 5 NA NA POLYGON ((542293.1 5265188,...
## 6 NA NA POLYGON ((537285.8 5266122,...
## id f_staat f_land f_wasser
## Length:36717 Min. : 32 Min. : -23 Min. : 0
## Class :character 1st Qu.:1000000 1st Qu.: 999463 1st Qu.: 0
## Mode :character Median :1000000 Median :1000000 Median : 0
## Mean : 994630 Mean : 980625 Mean : 14005
## 3rd Qu.:1000000 3rd Qu.:1000000 3rd Qu.: 496
## Max. :1000000 Max. :1000000 Max. :1000000
##
## p_staat p_land p_wasser Einwohner
## Min. : 0.0032 Min. : -0.0023 Min. : 0.0000 Min. : -1.0
## 1st Qu.:100.0000 1st Qu.: 99.9463 1st Qu.: 0.0000 1st Qu.: -1.0
## Median :100.0000 Median :100.0000 Median : 0.0000 Median : 11.0
## Mean : 99.4630 Mean : 98.0625 Mean : 1.4005 Mean : 289.2
## 3rd Qu.:100.0000 3rd Qu.:100.0000 3rd Qu.: 0.0496 3rd Qu.: 160.0
## Max. :100.0000 Max. :100.0000 Max. :100.0000 Max. :17291.0
## NA's :206
## Frauen_A Alter_D unter18_A ab65_A
## Min. : -9.00 Min. :-9.00 Min. : -9.00 Min. : -9.000
## 1st Qu.: -1.00 1st Qu.:-1.00 1st Qu.: -1.00 1st Qu.: -1.000
## Median : 43.00 Median :37.00 Median : 0.00 Median : 0.000
## Mean : 27.21 Mean :23.89 Mean : 8.83 Mean : 8.795
## 3rd Qu.: 51.00 3rd Qu.:43.00 3rd Qu.: 19.00 3rd Qu.: 19.000
## Max. :100.00 Max. :85.00 Max. :100.00 Max. :100.000
## NA's :206 NA's :206 NA's :206 NA's :206
## Auslaender_A HHGroesse_D Leerstandsquote Wohnfl_Bew_D
## Min. : -9.000 Min. :-9.000 Min. : -9.0000 Min. : -9.00
## 1st Qu.: -1.000 1st Qu.:-1.000 1st Qu.: -1.0000 1st Qu.: -1.00
## Median : 0.000 Median : 2.000 Median : 0.0000 Median : -1.00
## Mean : 1.686 Mean : 1.004 Mean : 0.3942 Mean : 13.11
## 3rd Qu.: 3.000 3rd Qu.: 3.000 3rd Qu.: 2.0000 3rd Qu.: 42.00
## Max. :100.000 Max. : 9.000 Max. :100.0000 Max. :143.00
## NA's :206 NA's :206 NA's :206 NA's :206
## Wohnfl_Whg_D geom
## Min. : -9.00 POLYGON :36717
## 1st Qu.: -9.00 epsg:32632 : 0
## Median : -1.00 +proj=utm ...: 0
## Mean : 18.56
## 3rd Qu.: -1.00
## Max. :252.00
## NA's :206
The column Einwohner contains population per grid cell. Minimum value is -1
. We change it to 0 to avoid inconsistencies.
bw_zensus <- bw_zensus |>
mutate(Einwohner = case_when(
Einwohner < 0 ~ 0, # if
TRUE ~ as.numeric(Einwohner) # else
),
unter18_A = case_when(
unter18_A < 0 ~ 0, # if
TRUE ~ as.numeric(unter18_A) # else
))
plot(bw_zensus["Einwohner"], lwd=0.1)
Population density on admin level 1 is usually quite skewed. Therefore we apply a logarithmic transformation. This may help to reveal the relationship between the point pattern and population distribution later on.
## Loading required package: abind
15.5 Density based
In this chapter we try to understand the distribution of McDonald’s restaurants with density based measures.
15.5.1 Quadrat density
The spatstat package provides the functions quadratcount()
and intensity()
to describe the density of points locally. The study area is divided into a set of quadrats. For each quadrat the number of points is counted. The number of quadrats is controlled via the parameters nx
and ny
for rows and columns. For intensity, the number of points is related to the area of the square.
ff.q <- quadratcount(ff.pp, nx= 8, ny=10)
plot(ff.pp, pch=20, cols="grey70", main="Quadrat count McDonald's") # Plot points
plot(ff.q, add=TRUE) # Add quadrat grid
The McDonald’s restaurants show a tendenciy to cluster. Especially in the Stuttgart Metropolitan area and other larger cities. How does a density based analysis look for random and regular point pattern?
The next chunk simulates these kind of point patterns via the sf
function st_sample
. We use the same amount of points but distribute them a) randomly and b) regular.
rand.pp <- st_sample( x = bw, size=nrow(fast_food), type="random") |>
st_transform(32632) |>
as.ppp()
Window(rand.pp) <- bw.owin
reg.pp <- st_sample( x = bw, size=nrow(fast_food), type="regular")
st_crs(reg.pp) <- st_crs(4326)
reg.pp <- reg.pp |>
st_transform(32632) |>
as.ppp()
Window(reg.pp) <- bw.owin
Next we do the quadrat counts for these pattern.
par(mar = c(4, 4, .1, .1))
rand.q <- quadratcount(rand.pp, nx= 8, ny=10)
plot(rand.pp, pch=20, cols="grey70", main="quadrant count random") # Plot points
plot(rand.q, add=TRUE) # Add quadrat grid
reg.q <- quadratcount(reg.pp, nx= 8, ny=10)
plot(reg.pp, pch=20, cols="grey70", main="quadrant count regular") # Plot points
plot(reg.q, add=TRUE) # Add quadrat grid
Quadrat count distribution as histogram
The quadrat counts for the McDonald’s restaurants is much more skewed compared to random and regular distributions.
We can further assess the intensity by a hypothesis test using goodness-of-fit test statistic
## McDonald's
##
## Chi-squared test of CSR using quadrat counts
##
## data:
## X2 = 238.77, df = 69, p-value < 2.2e-16
## alternative hypothesis: two.sided
##
## Quadrats: 70 tiles (irregular windows)
## Random
##
## Chi-squared test of CSR using quadrat counts
##
## data:
## X2 = 53.961, df = 69, p-value = 0.1838
## alternative hypothesis: two.sided
##
## Quadrats: 70 tiles (irregular windows)
See the differences in the p-values. The low p-value for the McDonald restaurants indicate an IPP where intensity can be estimated non-parametrically (kernel density). The high p-value (p > 0.05) of the random points indicate a HPP where intensity is constant.
Density within each quadrat is calculated via intensity()
# Compute the density for each quadrat
ff.d <- intensity(ff.q)
# Plot the density
plot(intensity(ff.q, image=TRUE), main=NULL, las=1) # Plot density raster
plot(ff.pp, pch=20, cex=0.6, col=rgb(0,0,0,.5), add=TRUE) # Add points
Density is measured as number of points per square meters. For Baden-Württemberg the results are quite big. We can adjust for this by rescaling the spatstat classes to kilometer.
ff.pp.km <- rescale(ff.pp, 1000, "km")
bw.owin.km <- rescale(bw.owin, 1000, "km")
rand.pp.km <- rescale(rand.pp, 1000, "km")
reg.pp.km <- rescale(reg.pp, 1000, "km")
bw_zensus.im.km <- rescale(bw_zensus.im, 1000, "km")
# Compute the density for each quadrat (in counts per km2)
ff.q.km <- quadratcount(ff.pp.km, nx= 8, ny=10)
ff.d.km <- intensity(ff.q.km)
# Plot the density
plot(intensity(ff.q.km, image=TRUE), main="McDonald's quadrat density", las=1) # Plot density raster
plot(ff.pp.km, pch=20, cex=0.6, col=rgb(0,0,0,.5), add=TRUE) # Add points
15.5.2 Density on tessellated surface
The point pattern density can also be assessed with non-uniform quadrats / regions. We can use the population raster to define quadrats. First we divide the population raster in 5 regions based on the logged population density. Then we apply quadrat count and intensity on the 5 regions.
brk <- c( -Inf, 2, 4, 6, 8 , Inf) # Define the breaks
#brk <- c( -Inf, -2, 0, 2 , Inf) # Define the breaks
Zcut <- cut(bw_zensus.im.km, breaks=brk, labels=1:5) # Classify the raster
E <- tess(image=Zcut)
plot(E, main="", las=1)
The tesselated surface consisting of 5 regions.
## tile
## 1 2 3 4 5
## 0.001092069 0.001638125 0.004369945 0.011711581 0.065373781
The scale is in kilometer, so the value is to be interpreted as the density of points per km². The point density increases with the population class. A slight indication of dependence between the point process and the covariate. Again, lets plot the density values across the tesselated regions.
15.5.3 Kernel density
The kernel density computes an isotroptic intensity estimate of the point pattern. Similar to the quadrat density appraoch, the kernel density approach computes localized density estimates. But instead of discrete sub-regions, it uses overlapping moving sub-regions or windows. Size and shape of a window is defined by a kernel. Different kernel functions exist. The bandwith parameter sigma
of the density function controls the kernel’s window extent. A too large bandwith/window will result in an oversmoothed kernel density estiamte, whereas a too small set bandwith will be undersmoothed.
ff.pp.km.kd1 <- density(ff.pp.km)
plot(ff.pp.km.kd1, main="Kernel density McDonald's restaurants; default bandwith", las=1)
contour(ff.pp.km.kd1, add=TRUE)
ff.pp.km.kd2 <- density(ff.pp.km, sigma=50) # Using the default bandwidth
plot(ff.pp.km.kd2, main="Kernel density McDonald's restaurants; 50km bandwith", las=1)
contour(ff.pp.km.kd2, add=TRUE)
ff.pp.km.kd3 <- density(ff.pp.km, sigma=10) # Using a 50km bandwidth
plot(ff.pp.km.kd3, main=" density McDonald's restaurants; 10km bandwith", las=1)
contour(ff.pp.km.kd3, add=TRUE)
How does this compare to the random and regular point patterns.
par(mar = c(4, 4, .1, .1))
rand.pp.km.kd <- density(rand.pp.km, sigma=10) # Using a 50km bandwidth
plot(rand.pp.km.kd, main=" density random; 10km bandwith", las=1)
contour(rand.pp.km.kd, add=TRUE)
reg.pp.km.kd <- density(reg.pp.km, sigma=10) # Using a 50km bandwidth
plot(reg.pp.km.kd, main=" density regular; 10km bandwith", las=1)
contour(reg.pp.km.kd, add=TRUE)
15.5.4 Kernel density adjusted
Until now this chapter covered methods to describe the distribution of points. Next we will see how to model the relationship between point distribution and covariates.
We fit a nonparametric curve to the data to further explore the shape of the realtion of McDonald’s restaurant locations and population density.
rho <- rhohat(ff.pp.km, bw_zensus.im.km, method="ratio")
plot(rho, las=1, main=NULL, legendargs=list(cex=0.8, xpd=TRUE))
We see an exponential like increase of McDonald’s locations and increasing population density.
We can use the result to predict McDonald densities under the assumption that population density is the only driver of the point process.
pred <- predict(rho)
col.rmp <- interp.colours(c("green", "blue", "lightyellow"), 100) # Create color scheme
plot(pred, col=col.rmp, las=1, main=NULL, gamma = 0.2)
Predicted values range form 0. to 0.4 restaurants per km². We can check the observed against the predicted intensity via a matrix scatterplot for im classes.
ff.pp.km.kd3_pred <- pairs(ff.pp.km.kd3, pred, plot = FALSE)
plot(ff.pp.km.kd3_pred$pred ~ ff.pp.km.kd3_pred$ff.pp.km.kd3, pch=20,
xlab = "Observed intensity",
ylab = "Predicted intensity",
col = rgb(0,0,0,0.1))
We observe a skew towards predicted intensities. Let’s take a closer look by adjusting the margins.
plot(ff.pp.km.kd3_pred$pred ~ ff.pp.km.kd3_pred$ff.pp.km.kd3, pch=20,
xlab = "Observed intensity",
ylab = "Predicted intensity",
col = rgb(0,0,0,0.1),
xlim = c(0, 0.03), ylim = c(0, 0.1))
abline(a=0, b = 1, col = "red")
Overestimation is present across the whole range of intensities.
15.5.5 Intensity as a function of a covariate
In the following chunk we fit a poisson point process model on the McDonald locations.
## Warning: Values of the covariate 'bw_zensus.im.km' were NA or undefined at 0.11%
## (1 out of 894) of the quadrature points. Occurred while executing: ppm.ppp(Q =
## ff.pp.km, trend = ~bw_zensus.im.km, data = NULL,
plot(effectfun(ppm.1, "bw_zensus.im.km", se.fit=TRUE), main=NULL,
las=1, legendargs=list(cex=0.8, xpd=TRUE ))
Despite the similarity with the nonparametric model, it should be noted that this is a much more well defined model.
## Nonstationary Poisson process
##
## Log intensity: ~bw_zensus.im.km
##
## Fitted trend coefficients:
## (Intercept) bw_zensus.im.km
## -7.5948755 0.4740355
##
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) -7.5948755 0.23746770 -8.0603036 -7.1294473 *** -31.98277
## bw_zensus.im.km 0.4740355 0.03630491 0.4028792 0.5451918 *** 13.05706
## Problem:
## Values of the covariate 'bw_zensus.im.km' were NA or undefined at 0.11% (1 out
## of 894) of the quadrature points
\[\lambda(i)=e^-7.869+0.844 \] Base intensity \(e^-7.869\) when the logged population density is 0. For every logged population unit increase, the density of McDonald’s restaurants increases by \(e^0.844\).
Next we compare our model against a null model.
## Stationary Poisson process
## Intensity: 0.004675151
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## log(lambda) -5.365494 0.07738232 -5.51716 -5.213827 *** -69.33746
The null model which assumes a homogeneous process:
\[\lambda(i)=e^-5.365\]
Actually this is nothing more than the global density of McDonald’s restaurants in our AOI:
## [1] 0.004675151
Next we compare both models using the anova
function:
## Analysis of Deviance Table
##
## Model 1: ~1 Poisson
## Model 2: ~bw_zensus.im.km Poisson
## Npar Df Deviance Pr(>Chi)
## 1 2
## 2 3 1 234.1 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Okay we see a significant improvement in desribing the variance of the point process when using the population density based model. How about other covariates. We will now look at the share of population under 18 and explore more functions to assess a poisson point process model.
bw_zensus.u18.im.km <- bw_zensus_stars["unter18_A"] |> as.im() |> rescale(1000, "km")
ppm.2 <- ppm(ff.pp.km ~ bw_zensus.u18.im.km)
## Warning: Values of the covariate 'bw_zensus.u18.im.km' were NA or undefined
## at 0.11% (1 out of 894) of the quadrature points. Occurred while executing:
## ppm.ppp(Q = ff.pp.km, trend = ~bw_zensus.u18.im.km, data = NULL,
## Nonstationary Poisson process
##
## Log intensity: ~bw_zensus.u18.im.km
##
## Fitted trend coefficients:
## (Intercept) bw_zensus.u18.im.km
## -6.3167114 0.4727696
##
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) -6.3167114 0.17242032 -6.654649 -5.9787738 *** -36.635539
## bw_zensus.u18.im.km 0.4727696 0.06261729 0.350042 0.5954973 *** 7.550145
## Problem:
## Values of the covariate 'bw_zensus.u18.im.km' were NA or undefined at 0.11% (1
## out of 894) of the quadrature points
\[\lambda(i)=e^-6.317+0.473\]
## Warning: Some infinite, NA or NaN increments were removed
## Model diagnostics (raw residuals)
## Diagnostics available:
## four-panel plot
## mark plot
## smoothed residual field
## x cumulative residuals
## y cumulative residuals
## sum of all residuals
## sum of raw residuals in entire window = -6.556e-12
## area of entire window = 35720
## quadrature area = 35680
## range of smoothed field = [-0.001498, 0.002525]
## Warning: Some infinite, NA or NaN increments were removed
## Model diagnostics (raw residuals)
## Diagnostics available:
## four-panel plot
## mark plot
## smoothed residual field
## x cumulative residuals
## y cumulative residuals
## sum of all residuals
## sum of raw residuals in entire window = -1.883e-09
## area of entire window = 35720
## quadrature area = 35680
## range of smoothed field = [-0.002285, 0.007125]
15.6 Distance based analysis
This sub chapter covers the second part of point pattern analysis methods - distance based approaches
15.6.1 Average nearest neighbor
THe average nearest neighbor analysis measures the average distance from each point location to the next point location.
The average nearest neighbor for all McDonald’s restaurants in the study area is 6.8611607`. By varying the parameter k one can control for n’th nearest neighbor distances
The average second nearest neighbor is ann.2nd
units.
The following chunk plots the average nearest neighbors for order 1 to 100.
ann <- apply(nndist(ff.pp.km, k=1:100),2,FUN=mean)
plot(ann ~ eval(1:100), type="b", main=NULL, las=1)
Steep increase in average nearest neighbor distance for the first couple of neighbor orders. After order 20, steady growth in average nearest neighbor distance.
Compare against random and regular distributions
15.6.2 Simulate random process and test for CSR
With the average nearest neighbor functions we can test if complete spatial randomness is present in our point distribution. The observed average nearest neighbor is 6.8611607. Now we run multiple iterations and randomly place the same amount of point locations within our study area.
n <- 1499L # iterations
ann.r <- vector(length = n) # empty vector for sim results
for (i in 1:n){
rand.p <- rpoint(n=ff.pp.km$n, win=bw.owin.km)
ann.r[i] <- mean(nndist(rand.p, k=1))
}
Point locations of the last simulation.
Histogram of the average nearest neighbor from the simulations.
hist(ann.r, main=NULL, las=1, breaks=40, col="bisque", xlim=range(ff.pp.km.ann, ann.r))
abline(v=ff.pp.km.ann, col="blue")
The blue line is the observed ANN value. It is clearly smaller than expected ANN values under the null hypotheses of a CSR / homogeneous point process. Now we run same simulation again but in the random placement we will account for the density of population. That way we control for the influence of the population density covariate. This means we are still exploring the underlying 2nd order process but with the covariate we control for the first order process.
n <- 1499L
ann.r <- vector(length=n)
for (i in 1:n){
rand.p <- rpoint(n=ff.pp.km$n, f=bw_zensus.im.km)
ann.r[i] <- mean(nndist(rand.p, k=1))
}
Now the locations are more clustered in the Stuttgart and Mannheim / Heidelberg areas. How about the histogram?
hist(ann.r, main=NULL, las=1, breaks=40, col="bisque", xlim=range(ff.pp.km.ann, ann.r))
abline(v=ff.pp.km.ann, col="blue")
The ANN values are getting closer to the observed ANN value. However we cannot state that the clusters of McDonald’s restaurants can be explained by a completely random process when controlled for population density.
15.6.3 K, L and g functions
Other methods to check for IRP/CSR are K, L and g or pair correlation functions. The K function cumulatively sums up the amount of neighbors for each point location in different distance lags. Per distance lag the mean sum of neighbors is divided by the global point density.
K function produces different functions based on different edge correction algorithm. Explanations of the differences can be found elsewhere. See ?Kest
for a first start. The blue line K pois depicts the theoretical K function under the null hypothesis which states that all points are IRP/CSR. Lines below K pois indicate dispersion, whereas lines above indicate clustering.
K function result shapes are not always easy to interpret if the differences are narrow. Here the L function may help. It is the Besag’s transformation of the K function.
L <- Lest(ff.pp.km, main="L function")
plot(L, main=NULL, las=1, legendargs=list(cex=0.8, xpd=TRUE ))
The following chunk sets the L pois horizontal, further improving interpretation.
According to the function, no matter what border edge correction method is used, all are clearly clustering after ~1km distance bands. The intensity of clustering however varies between the methods. border corretion for instance depicts strong intensity after ~18km band.
K and L function work cumulatively. All neighbors from previous distance bands are included in the following band, like 0-10km, 0-20km … G functions sum up neighbors in discrete bands, like 0-10km, 10-20km…
g <- pcf(ff.pp.km)
plot(g, main="Pair correlation or g function", las=1, legendargs=list(cex=0.8, xpd=TRUE ))
Again g pois represents the IRP/CSR process. Clustering is strongly prevalent in the first distance bands.
What other functions are there? How do they compare when applied to regular and random distributions?
All of the functions can also be calculated individually via the R functions Fest()
, Gest()
, Jest
.
- F function - Empty space function: \(\hat{F}(r) > F_{pois}(r)\) indicate shorter distances from empty spaces to the closest event and therefore a regular pattern. \(\hat{F}(r) < F_{pois}(r)\) suggest clustering.
- G function - Nearest-neighbor distance function: Cumulative distribution of the distance from a typical random point of X to the nearest other point of X. Interpretation of the G function is the reverse of F function: \(\hat{G}(r) > G_{pois}(r)\) indicate nearest neighbor distances are shorter than for a poisson point process suggesting clustering. \(\hat{G}(r) < G_{pois}(r)\) indicates a regular pattern.
- J function - combination of F and G function: \(\hat{G}(r) > 1\) indicate regularity, \(\hat{G}(r) < 1\) clustering.