Chapter 5 Spatial exploratory data analysis

5.1 Theory

Spatial exploratory data analysis extends non-spatial exploratory data analysis. In non-spatial exploratory data analysis we are interested (following Tukey, 1975) in the smooth and the rough parts of the data. The smooth part can be captured by a smoother of some type, while the rough part is what the smoother leaves behind and can show up as outliers. Rough parts of the data are observations that are unexpected, but not necessarily wrong. They might originate from measurement errors, or they might indicate the effect of an unexpected covariate or a change in system behaviour. It is never a good idea to exclude outliers without thinking hard about it (and, of course, without acknowledging the omission of such data). The ozone hole, for example, went undetected for a remarkably long time because automatic outlier routines removed the unexpected observations, which seriously delayed the reaction to the threat.

DATA = SMOOTH + ROUGH (Tukey, 1975)

SPATIAL DATA = SPATIAL SMOOTH + SPATIAL ROUGH
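The non-spatial decomposition DATA = SMOOTH + ROUGH can be sketched in a few lines. The example below is only an illustration under assumptions: the series is made up, and a centred running median with a window of three is just one of many possible smoothers, not necessarily the one intended in the text.

```python
from statistics import median

def running_median(values, window=3):
    """Smooth a sequence with a centred running median (window should be odd).

    At both ends the window is truncated to the values that exist.
    """
    half = window // 2
    smooth = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smooth.append(median(values[lo:hi]))
    return smooth

# Hypothetical series: a gentle upward trend with one unexpected spike.
data = [1.0, 2.0, 3.0, 15.0, 5.0, 6.0, 7.0]

smooth = running_median(data, window=3)         # the SMOOTH part
rough = [d - s for d, s in zip(data, smooth)]   # the ROUGH part (residuals)

# By construction, smooth[i] + rough[i] reproduces data[i].
spike_position = rough.index(max(rough))
```

The largest rough value sits at the position of the spike, which marks that observation as a candidate for closer inspection, not for automatic removal.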

Spatial exploratory data analysis extends this analysis by exploring the spatial smooth and spatial rough signals in the data. In addition to distributional outliers, one looks out for spatial outliers: observations that are not necessarily outliers with respect to the distribution, but with respect to where they occur, for example a high value in a neighborhood of low values.
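The difference between a distributional and a spatial outlier can be illustrated with a small sketch. Everything here is assumed for illustration: the grid values are invented, neighbors are defined as cells sharing an edge (rook contiguity), and the median of the neighbors serves as a simple spatial smooth.

```python
from statistics import median

def rook_neighbours(grid, r, c):
    """Values of the cells that share an edge with cell (r, c)."""
    out = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]):
            out.append(grid[rr][cc])
    return out

def spatial_rough(grid):
    """Residual of every cell from the median of its rook neighbours."""
    return {
        (r, c): grid[r][c] - median(rook_neighbours(grid, r, c))
        for r in range(len(grid))
        for c in range(len(grid[0]))
    }

# Hypothetical areal data: low values on the left, high values on the right,
# and a single high value (9) placed inside the low region at row 1, column 1.
grid = [
    [1, 2, 1, 9, 10, 9],
    [2, 9, 2, 10, 9, 10],
    [1, 2, 1, 9, 10, 9],
]

rough = spatial_rough(grid)
most_unusual = max(rough, key=lambda cell: abs(rough[cell]))

# The value 9 is common in the data as a whole, so it is not a
# distributional outlier; it only stands out relative to its neighbours.
count_of_nines = sum(row.count(9) for row in grid)
```

Here `most_unusual` is the cell at row 1, column 1: its value is ordinary in the overall distribution but far from its surrounding low values.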

For the smooth part of the data, one could look at the following aspects:

  • presence of spatial trend
  • spatial heterogeneity – is the variation in data values as smooth as implied by the trend?
  • global spatial dependence – are high/low values close to other high/low values, anywhere on the map?
  • local spatial dependence – are localized patterns of dependence visible? Hot-/coldspots?
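Global spatial dependence is commonly summarized with Moran's I, which compares the values of neighboring units after centering them on the overall mean. The sketch below is a minimal pure-Python version under assumptions not taken from the text: a regular grid, binary rook weights (1 for cells sharing an edge, 0 otherwise) and two invented patterns.

```python
def rook_pairs(n_rows, n_cols):
    """Directed pairs (i, j) of cell indices that share an edge.

    Cells are numbered row by row; each pair carries a binary weight of 1.
    """
    pairs = []
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    pairs.append((r * n_cols + c, rr * n_cols + cc))
    return pairs

def morans_i(values, pairs):
    """Moran's I for binary weights given as a list of neighbour pairs."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = sum(dev[i] * dev[j] for i, j in pairs)   # sum of w_ij * z_i * z_j
    total = sum(d * d for d in dev)                  # sum of z_i ** 2
    return (n / len(pairs)) * cross / total          # len(pairs) = sum of weights

pairs = rook_pairs(4, 4)

# Hypothetical patterns on a 4 x 4 grid, stored row by row.
clustered = [0, 0, 1, 1] * 4                                   # low left, high right
checker = [(r + c) % 2 for r in range(4) for c in range(4)]    # alternating values

i_clustered = morans_i(clustered, pairs)   # positive: similar values are neighbours
i_checker = morans_i(checker, pairs)       # negative: neighbours always differ
```

A positive value indicates that high values tend to lie next to high values and low next to low; the checkerboard is the extreme opposite case. Being a single global number, Moran's I cannot say where on the map the dependence occurs, which is why local patterns are listed as a separate aspect.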

We focus here on areal data, i.e. the information is provided at an aggregated level. This of course has implications: the modifiable areal unit problem and the ecological fallacy should come to mind immediately. We therefore need to take both into account when interpreting the data or results.
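The sensitivity of aggregated data to the choice of areal units can be made concrete with a toy example. The values and the two zoning schemes below are invented for illustration; the point is only that the same underlying observations yield different areal summaries depending on where the zone boundaries are drawn.

```python
from statistics import mean, pvariance

# Hypothetical fine-scale observations along a transect of eight locations.
values = [1.0, 1.0, 9.0, 9.0, 1.0, 1.0, 9.0, 9.0]

# Two alternative zonings of the same locations (lists of location indices).
zoning_a = [[0, 1], [2, 3], [4, 5], [6, 7]]
zoning_b = [[0], [1, 2], [3, 4], [5, 6], [7]]   # boundaries shifted by one

def zone_means(values, zoning):
    """Aggregate the fine-scale values to one mean per zone."""
    return [mean(values[i] for i in zone) for zone in zoning]

means_a = zone_means(values, zoning_a)
means_b = zone_means(values, zoning_b)

# The variation between zones depends on the zoning, not only on the data.
spread_a = pvariance(means_a)
spread_b = pvariance(means_b)
```

Under the first zoning the areal means alternate between the two extremes; under the shifted zoning most zones fall on the overall average, so the apparent contrast between areas largely disappears.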