Dimensions of Spatial Data Quality#

Describing the elements of spatial data quality is essential to give stakeholders the information they need to evaluate a dataset's fitness for use in their particular application [Oort, 2006]. Better spatial data quality assessment would promote the adoption and appropriate use of new data sources such as OSM and of data products based on OSM. A large community of researchers has analyzed the quality of OSM data [Barron et al., 2014, Senaratne et al., 2016]. It has been acknowledged that OSM data in general is strongly biased, in part because the contributor base is much larger in countries of the Global North, a consequence of socio-economic inequalities and the digital divide [Neis et al., 2013, Sui et al., 2013].

[Oort, 2006] identifies five main reasons for current concerns about spatial data quality issues (and these are still valid today!):

  1. There is an increasing availability, exchange and use of spatial data.

  2. There is a growing group of users less aware of spatial data quality.

  3. GIS enable the use of spatial data in all sorts of applications, regardless of the appropriateness with regard to data quality.

  4. Current GIS offer hardly any tools for handling spatial quality.

  5. There is an increasing distance between those who use the spatial data (the end users) and those who are best informed about the quality of the spatial data (the producers).

“The most important motivation for describing spatial data quality is to provide the potential user of a data set with the necessary information to decide on the fitness for use of a data set for his particular application.” [Oort, 2006]

Definitions#

Positional accuracy#

This is probably the most obvious aspect of quality and evaluates how well the coordinate value of an object in the database relates to the reality on the ground.

“In humanitarian mapping spatial offset is one of the most leading positional accuracy aspects that originate from the mis-alignment of satellite images that are used during desktop digitization that results from placing features at positions that deviate from their original.” [ngumenawesamson’s OSM Diary]

Examples:

  • how well represented is the geometry of a building

  • how precisely located is a Point-of-Interest such as a restaurant

  • positional errors can lead to further problems such as buildings overlapping roads, public facilities in the middle of roads, or buildings in water bodies
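
One common way to quantify such offsets is the root-mean-square distance between mapped positions and matched reference positions. The following is a minimal sketch, assuming the points have already been paired with a trusted reference (for example a GPS survey); the function names `haversine_m` and `positional_rmse_m` are hypothetical and not part of any OSM tool.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def positional_rmse_m(mapped, reference):
    """Root-mean-square offset in metres between matched (lon, lat) pairs.

    Assumption: mapped[i] and reference[i] describe the same real-world object.
    """
    if not mapped or len(mapped) != len(reference):
        raise ValueError("need two non-empty lists of equal length")
    squared = [haversine_m(*m, *r) ** 2 for m, r in zip(mapped, reference)]
    return math.sqrt(sum(squared) / len(squared))
```

A systematic shift in one direction, as caused by misaligned imagery, would show up as a large RMSE even when the shapes themselves are drawn carefully.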

Completeness#

This is a measure of the lack of data; that is, an assessment of how many objects are expected to be found in the database but are missing, as well as an assessment of excess data that should not be included. In other words, how comprehensive is the coverage of real-world objects.

Estimating the completeness of specific features in OSM can help practitioners and researchers pick the best strategies to cope with OSM’s uneven spatial coverage. When unaccounted for, spatial bias can lead analysts and researchers to draw general conclusions that are only valid for well-represented (well-mapped) areas [Meyer and Pebesma, 2022]. By inadvertently neglecting less well-mapped areas in their analyses and datasets, they risk counteracting the overarching goal of the Sustainable Development Goals (SDGs) of ensuring that “nobody is left behind”.

“The absence of a global completeness assessment, meanwhile, hampers the use of OSM for research in economics, urban planning, environmental studies and related fields, such as analyses of worldwide patterns of travel behavior or urban development.” [Barrington-Leigh and Millard-Ball, 2017]

Examples:

  • assessment of the road network completeness in OSM for each country

  • completeness of building footprint data in cities
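
The two sides of completeness described above (missing objects and excess objects) can be expressed as simple ratios. The sketch below assumes a reference dataset is available and that features have already been matched so that they share identifiers; both function names are hypothetical.

```python
def completeness_ratio(mapped_total, reference_total):
    """Mapped quantity (e.g. road length in km) relative to a reference quantity."""
    if reference_total <= 0:
        raise ValueError("reference_total must be positive")
    return mapped_total / reference_total


def omission_commission(mapped_ids, reference_ids):
    """Return (omission rate, commission rate) for matched feature identifiers.

    Omission: share of reference features missing from the mapped data.
    Commission: share of mapped features that have no counterpart in the reference.
    """
    mapped, reference = set(mapped_ids), set(reference_ids)
    if not mapped or not reference:
        raise ValueError("both datasets must contain at least one feature")
    omission = len(reference - mapped) / len(reference)
    commission = len(mapped - reference) / len(mapped)
    return omission, commission
```

A ratio above 1 does not necessarily signal an error in OSM; it can also mean that the reference itself is incomplete, which is why the choice of reference data matters.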

Temporal quality#

This is a measure of the validity of changes in the database in relation to real-world changes and also the rate of updates.

As of now there are relatively few scientific papers on how up-to-date OSM data is. There is some evidence that an active contributor base in a region has a positive effect on the temporal quality of OSM objects [Girres and Touya, 2010]. In general, OSM has the advantage over many authoritative datasets that it can be updated at any time by any user; however, measures of temporal quality should investigate whether this potential is actually achieved.

[Zook et al., 2010] put a particular emphasis on the temporal quality aspect of OSM in relation to fitness for purpose in a disaster management context.

“In disaster situations, however, geographic information need only be good enough to assist recovery workers using the maps, meaning that crowdsourced information is likely to be just as helpful as that produced by more centralized means. Indeed, it can be even more useful if peer production allows for new information to be incorporated and distributed in near real time.” [Zook et al., 2010]
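
A simple proxy for temporal quality is the share of features that were edited within a given time window. The sketch below assumes a list of last-edit timestamps has already been extracted for the features of interest; the function name `share_recently_edited` is hypothetical. Note that a recent edit is only a proxy: an object last touched years ago may still be correct if nothing changed on the ground.

```python
from datetime import datetime, timedelta, timezone


def share_recently_edited(last_edit_times, reference_time, max_age_days):
    """Fraction of features whose last edit is at most max_age_days old."""
    if not last_edit_times:
        raise ValueError("need at least one timestamp")
    max_age = timedelta(days=max_age_days)
    recent = sum(1 for t in last_edit_times if reference_time - t <= max_age)
    return recent / len(last_edit_times)


# Illustrative timestamps, not real OSM data
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
edits = [
    datetime(2023, 12, 1, tzinfo=timezone.utc),
    datetime(2023, 6, 1, tzinfo=timezone.utc),
    datetime(2020, 1, 1, tzinfo=timezone.utc),
]
share = share_recently_edited(edits, now, max_age_days=365)  # 2 of 3 features
```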

Attribute accuracy#

As objects in a geographical database are represented not only by their geometrical shape but also by additional attributes, this measure evaluates how correct these values are. Depending on the measurement scale of the attribute (ratio, interval, ordinal, nominal) different approaches are used to describe attribute accuracy.

When working with nominal attributes, e.g. land cover classes, common quality measures are often based upon a so-called confusion matrix.

Examples:

  • how accurate are labels collected via the crowdsourcing app MapSwipe

  • investigate the performance of a land cover classification model

  • compare a land use land cover dataset with ground truth data
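
For nominal attributes, the confusion matrix mentioned above cross-tabulates reference labels against the labels under evaluation. The following is a minimal sketch in plain Python; the helper names are hypothetical, and the derived measures shown (overall, producer's and user's accuracy) are common choices rather than the only ones.

```python
from collections import Counter


def confusion_matrix(reference_labels, predicted_labels):
    """Count (reference, predicted) label pairs; keys are tuples."""
    if len(reference_labels) != len(predicted_labels):
        raise ValueError("label lists must have equal length")
    return Counter(zip(reference_labels, predicted_labels))


def overall_accuracy(counts):
    """Share of all samples where the predicted label equals the reference label."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty confusion matrix")
    return sum(n for (ref, pred), n in counts.items() if ref == pred) / total


def producers_accuracy(counts, label):
    """Share of reference samples of `label` that were labelled correctly."""
    total = sum(n for (ref, _), n in counts.items() if ref == label)
    if total == 0:
        raise ValueError(f"no reference samples for {label!r}")
    return counts[(label, label)] / total


def users_accuracy(counts, label):
    """Share of samples predicted as `label` that truly belong to it."""
    total = sum(n for (_, pred), n in counts.items() if pred == label)
    if total == 0:
        raise ValueError(f"no predictions for {label!r}")
    return counts[(label, label)] / total
```

Producer's accuracy reflects omission errors for a class, while user's accuracy reflects commission errors, so the two can differ considerably for the same class.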

Logical consistency#

This is an aspect of the internal consistency of the dataset, in terms of topological correctness and the relationships that are encoded in the database. Checking for logical consistency means identifying which objects do not conform to the rules for how objects should be mapped. Often there is a correlation between logical consistency and positional accuracy, e.g. low positional accuracy of coastline data can lead to low logical consistency as roads might cross a coastline several times.

“The logical consistency measures the consistency of different database objects with other objects of the same theme (intra-theme consistency), or objects of other themes (inter-theme consistency). For example, linear roads must be captured in a network and must share the same geometry as the administrative boundaries when this is the case in reality” [Girres and Touya, 2010]

Examples:

  • in OSM buildings should be represented as polygons, not as lines or points.

  • different land cover classes should not spatially overlap

  • in OSM roads should be connected to each other sharing one node to allow for proper routing
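
Two of the rules above can be checked with a few lines of code. The sketch below assumes a simplified, hypothetical in-memory representation in which each way is a dict with an `id`, a `tags` dict and an ordered list of `nodes` (node identifiers); it ignores buildings mapped as multipolygon relations.

```python
from collections import Counter


def unclosed_buildings(ways):
    """IDs of building ways that do not form a closed ring (first node == last node)."""
    return [
        w["id"]
        for w in ways
        if "building" in w["tags"]
        and not (len(w["nodes"]) >= 4 and w["nodes"][0] == w["nodes"][-1])
    ]


def disconnected_roads(ways):
    """IDs of highway ways that share no node with any other highway way."""
    roads = [w for w in ways if "highway" in w["tags"]]
    # Count in how many roads each node occurs (once per road at most)
    node_use = Counter(n for w in roads for n in set(w["nodes"]))
    return [w["id"] for w in roads if all(node_use[n] == 1 for n in w["nodes"])]


# Illustrative data, not real OSM objects
ways = [
    {"id": 1, "tags": {"building": "yes"}, "nodes": [1, 2, 3, 4, 1]},
    {"id": 2, "tags": {"building": "yes"}, "nodes": [5, 6, 7]},
    {"id": 3, "tags": {"highway": "residential"}, "nodes": [10, 11, 12]},
    {"id": 4, "tags": {"highway": "residential"}, "nodes": [12, 13]},
    {"id": 5, "tags": {"highway": "track"}, "nodes": [20, 21]},
]
```

A road flagged by `disconnected_roads` is not necessarily wrong (it may be genuinely isolated), so such checks are better read as candidates for review than as confirmed errors.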

Semantic accuracy#

Semantic accuracy links the way in which an object is captured and represented in the database to its meaning and the way in which it should be interpreted. An illustrative example is building blocks in OSM, which are sometimes mapped as individual buildings and sometimes as a single polygon per block. Especially in open projects such as OpenStreetMap, semantics might also vary with location and with specific community guidelines, and semantic accuracy might be low when global tagging approaches are not adapted to local conditions.

Examples:

  • features tagged as highway=primary might not represent the same kind of street when comparing different countries

Lineage#

This aspect of quality is about the history of the dataset, how it was collected, and how it evolved. It also concerns the source from which the data was derived and issues which might arise from changes in the production process.

For OpenStreetMap data, the history of the data can be assessed through the ohsome framework, which can reveal how mapping styles and guidelines have changed over time.

“The development of health related amenities captures interweaved phenomena: the tagging of real world phenomena, changes in tagging conventions, external events that trigger mapping activities (such as earthquakes or tsunamis) as well as mass imports. In addition, real world phenomena change over time: health related amenities might be created or be taken out of use (e.g. in Germany).” [Blogpost by Sven Lautenbach]

Examples:

  • changes in OSM mapping from GPS-based mapping of roads to satellite-based mapping of roads

  • introduction of AI-assisted workflows to map roads and buildings in OSM

  • changes due to increased corporate and humanitarian mapping activity