{ "cells": [ { "cell_type": "markdown", "id": "f5384e51", "metadata": {}, "source": [ "# Exercise 02: Analysing Bias and Inequalities in OSM\n", "\n", "## Motivation\n", "Administrative or \"official\" geodatasets usually try to reach the same level of coverage and quality for an entire country or federal state. In contrast, the coverage of OSM varies considerably across the world. There are places with a large active OSM community and others where nothing has been mapped at all. Without going deeper into the topic of data quality, the spatial (and temporal) distribution of mapping around the globe can already reveal many interesting properties of OSM. By analysing the spatio-temporal footprint of mapping activity in OSM we can identify regions with stronger local communities and can also find out how mapping is organized on larger scales.\n", "\n", "> Overall, mapping in OSM was strongly biased towards regions with very high Human Development Index. However, humanitarian mapping efforts had a different footprint, predominantly focused on regions with medium and low human development. Despite these efforts, regions with low and medium human development only accounted for 28% of the buildings and 16% of the roads mapped in OSM although they were home to 46% of the global population. {cite}`Herfort2021`\n", "\n", "This variability becomes visible not only in the overall spatial coverage of OSM, but also in how much individual users contribute to the map.\n", "\n", ">\"Not all users contribute equally to the map. OSM is no exception to the 90-9-1 rule found in online communities where only a small number of active contributors account for most of the contributions. 
By our calculations for OSM, the top 1.4% of editors are responsible for 90% of all the map changes.\" {cite}`Anderson2019`\n", "\n", ">\"The results show that only 38% (192,000) of the registered members carried out at least one edit in the OSM database and that only 5% (24,000) of all members actively contributed to the project in a more productive way.\" {cite}`Neis2012`\n", "\n", "```{figure} ../figs/anderson_et_al_corporate_editors.png\n", "---\n", "height: 350px\n", "name: jennings-map-corporate-mappers\n", "---\n", "Places where corporate editors are editing. (source: {cite}`Anderson2019`)\n", "```\n", "\n", "\n", "## How biased is the global distribution of OSM contributions?\n", "\n", "This exercise investigates the global spatial distribution of mapping in OSM for 5 days in February 2020. We will use the OSM changesets because they allow us to distinguish mapping by user and by changeset comment (often used to identify a certain mapping campaign or group), and because they contain the rough locations of the map changes. Analysing the changesets is often a quick and relatively simple way to get an overview. To get a first idea of OSM's global footprint, you don't need to look at every single element, which can quickly become too complicated. So let's start with something simpler.\n", "\n", "### 1. Download OSM Changesets\n", "On [HeiBOX](https://heibox.uni-heidelberg.de/f/87fe1933837e4e5491fd/) you can download a GeoPackage file. This file contains the OSM changesets for the period from 2020-02-01 to 2020-02-06. We'll work on this smaller extract to make sure that computations don't take too long.\n", "\n", "```{note}\n", "In principle, it would of course also be interesting to look at all 100,000,000+ changesets contributed to OSM so far. For this, you could set up your own database containing all OSM changesets. You will find an overview of how to do it [here](https://github.com/ToeBee/ChangesetMD). 
Since this process took several hours for me, it might be easier for you to get started with the GeoPackage provided for download.\n", "```\n", "\n", "Also make sure to get the two CSV files that we will use later:\n", "* facebook_users.csv\n", "* apple_users.csv\n", "\n", "### 2. Get an overview\n", "Connect to the GeoPackage in QGIS and load the Polygon layer to get an overview of the spatial coverage.\n", "\n", "````{dropdown} Show the result in QGIS.\n", "```{figure} ../figs/changesets_qgis.png\n", "---\n", "height: 350px\n", "name: changesets-qgis\n", "---\n", "Global distribution of OSM changeset bounding boxes for 2020-02-01 to 2020-02-06.\n", "```\n", "````\n", "\n", "Create a heatmap using the changeset centroid as an approximation of the mapping location. You could create the heatmap in Python as well, but the result will usually look more professional in QGIS, where you can easily add a background layer and other things such as grids. Python is still useful for a quick and dirty overview (e.g. a scatterplot with a reduced opacity / alpha value for the points).\n", "\n", "````{dropdown} Show the steps in QGIS.\n", "```{figure} ../figs/changesets_heatmap_qgis.png\n", "---\n", "height: 350px\n", "name: changesets-heatmap-qgis\n", "---\n", "To calculate the heatmap we use a radius of 5 degrees and set the cell size to 0.5 degrees. Note that these parameters are rather arbitrary and we use them here just to get a quick overview. For a more regional analysis you might want to use a metric (projected) coordinate system.\n", "```\n", "````\n", "\n", "### 3. Analyse contribution inequality\n", "We will run the following part of the analysis in Python. You might be able to run some parts in QGIS as well, but let's take this opportunity to practice our Python skills.\n", "\n", "#### Load the GeoPackage\n", "For this exercise we can directly use the centroids."
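Once the changesets are loaded, the core of the inequality analysis is to sum the changes per user and ask how much of the total comes from the most active users. The following is a minimal sketch of that idea, assuming only the Python standard library and a handful of made-up `(user_id, num_changes)` records rather than the exercise data. The helper names `changes_per_user` and `top_share` are hypothetical and introduced only for illustration; on the real data the same aggregation could be done with a pandas `groupby` on `user_id`.

```python
from collections import defaultdict

# Hypothetical toy records (user_id, num_changes); NOT taken from the exercise data.
changesets = [
    (1, 500), (1, 300), (2, 100), (3, 50), (4, 30), (5, 20),
]


def changes_per_user(records):
    """Sum num_changes for every user_id."""
    totals = defaultdict(int)
    for user_id, num_changes in records:
        totals[user_id] += num_changes
    return dict(totals)


def top_share(totals, fraction):
    """Share of all changes made by the most active `fraction` of users."""
    ranked = sorted(totals.values(), reverse=True)
    # Always include at least one user, even for very small fractions.
    n_top = max(1, round(len(ranked) * fraction))
    return sum(ranked[:n_top]) / sum(ranked)


totals = changes_per_user(changesets)
share = top_share(totals, 0.2)
print(f"Top 20% of users account for {share:.0%} of all changes")
```

Ranking users by their summed changes and reading off the share of the top fraction is the same kind of statistic as the "top 1.4% of editors" figure quoted in the motivation, although the exact method used there is not described in this exercise.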
] }, { "cell_type": "code", "execution_count": 6, "id": "ab74f969", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"| | id | user_id | created_at | closed_at | num_changes | user_name | comment | source | created_by | geometry |\n",
"|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | 80396743 | 10368583 | 2020-02-01 00:00:02+00:00 | 2020-02-01 00:00:04+00:00 | 196 | mapsandplants | Aligned buildings in Vareš | Mapbox Satellite | JOSM/1.5 (15492 en) | POINT (18.32850 44.16627) |\n",
"| 1 | 80396744 | 21289 | 2020-02-01 00:00:09+00:00 | 2020-02-01 00:00:10+00:00 | 8 | hogrod | Alton,IL update 28 | None | iD 2.17.1 | POINT (-90.08601 38.88162) |\n",
"| 2 | 80396745 | 1658735 | 2020-02-01 00:00:11+00:00 | 2020-02-01 00:00:13+00:00 | 40 | rustywagon | Added houses. | Bing | JOSM/1.5 (15628 en) | POINT (-113.96775 53.53355) |\n",
"| 3 | 80396746 | 307520 | 2020-02-01 00:00:13+00:00 | 2020-02-01 00:00:13+00:00 | 1 | wolfgang8741 | reclassify to lake and link to wikidata | None | iD 2.17.1 | POINT (-85.86732 44.74097) |\n",
"| 4 | 80396747 | 9458168 | 2020-02-01 00:00:18+00:00 | 2020-02-01 00:00:19+00:00 | 27 | Gorka115 | added buildings, roads and/or details | None | iD 2.17.1 | POINT (16.90061 -21.96159) |\n",
"
\n",
"| user_id (index) | user_id | num_changes |\n",
"|---|---|---|\n",
"| 9560588 | 1 | 0 |\n",
"| 6407181 | 6 | 0 |\n",
"| 10646969 | 10 | 0 |\n",
"| 6402052 | 60 | 0 |\n",
"| 7994908 | 1 | 0 |\n",
"| ... | ... | ... |\n",
"| 2913188 | 47 | 112353 |\n",
"| 7589857 | 18 | 117452 |\n",
"| 10253243 | 231 | 165520 |\n",
"| 8065782 | 46 | 311081 |\n",
"| 408769 | 96 | 918517 |\n",
"
15328 rows × 2 columns
\n", "