{ "cells": [ { "cell_type": "markdown", "id": "e49c42fb", "metadata": {}, "source": [ "# Python Basics \n", "\n", "The following cells are taken from the Python introduction notebook of\n", "Jeffrey Kantor. These and further examples can be found at https://github.com/jckantor/CBE30338\n", "\n", "## Variables and primitive data types" ] }, { "cell_type": "code", "execution_count": 1, "id": "f441c06d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "42\n", "3.1415\n" ] } ], "source": [ "#A variable stores a piece of data and gives it a name\n", "answer = 42\n", "\n", "#answer contains an integer because we gave it an integer!\n", "\n", "is_it_thursday = True\n", "is_it_wednesday = False\n", "\n", "#both of these are 'booleans', i.e. true/false values\n", "\n", "pi_approx = 3.1415\n", "\n", "#This is a floating point number, i.e. a number with digits after the decimal point\n", "\n", "my_name = \"Jacob\"\n", "#This is a string datatype, the name coming from a string of characters\n", "\n", "#Data doesn't have to be a singular unit\n", "\n", "#p.s., we can print all of these with a print command. For example:\n", "print(answer)\n", "print(pi_approx)" ] }, { "cell_type": "markdown", "id": "3f1994d1", "metadata": {}, "source": [ "# Lists and Tuples" ] }, { "cell_type": "code", "execution_count": 2, "id": "0d8ff40a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Green', 'Blue', 'Red']\n", "[10, 20, 30, 40, 50, 'Sixty']\n", "Green\n", "Red\n", "Rosa Parks\n" ] } ], "source": [ "#What if we want to store many integers? We need a list!\n", "prices = [10, 20, 30, 40, 50]\n", "\n", "#This is a way to define a list in place. 
We can also make an empty list and add to it.\n", "colors = []\n", "\n", "colors.append(\"Green\")\n", "colors.append(\"Blue\")\n", "colors.append(\"Red\")\n", "\n", "print(colors)\n", "\n", "#We can also add unlike data to a list\n", "prices.append(\"Sixty\")\n", "\n", "#As an exercise, look up lists in python and find out how to add in the middle of a list!\n", "\n", "print(prices)\n", "#We can access a specific element of a list too:\n", "\n", "print(colors[0])\n", "print(colors[2])\n", "\n", "#Notice here how the first element of the list is index 0, not 1! \n", "#Languages like MATLAB are 1 indexed, be careful!\n", "\n", "#In addition to lists, there are tuples\n", "#Tuples behave very similarly to lists except that you can't change them \n", "# after you make them\n", "\n", "#An empty Tuple isn't very useful:\n", "empty_tuple = ()\n", "\n", "#Nor is a tuple with just one value:\n", "one_tuple = (\"first\",)\n", "\n", "#But tuples with many values are useful:\n", "rosa_parks_info = (\"Rosa\", \"Parks\", 1913, \"February\", 4)\n", "\n", "#You can access tuples just like lists\n", "print(rosa_parks_info[0] + \" \" + rosa_parks_info[1])\n", "\n", "# You cannot modify existing tuples, but you can make new tuples that extend \n", "# the information.\n", "# I expect Tuples to come up less than lists. So we'll just leave it at that. \n" ] }, { "cell_type": "markdown", "id": "586e8794", "metadata": {}, "source": [ "# Conditions, Logical operators and If-Statements\n", "The word “if” is a keyword. When Python sees an if-statement, it will determine if the associated logical expression is true. If it is true, then the code in code block will be executed. If it is false, then the code in the if-statement will not be executed. 
The way to read this is “If logical expression is true then do code block.”\n", "\n", "When there are several conditions to consider you can include elif-statements; if you want a condition that covers any other case, then you can use an else statement." ] }, { "cell_type": "code", "execution_count": 3, "id": "8cf4c578", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Five is greater than two!\n", "One is equal to one\n", "5 does not equal 2 and 5 is greater than two!\n", "Five is NOT smaller than two!\n" ] } ], "source": [ "# To apply conditions in Python, the if statement is used.\n", "# It is based on logical expressions that evaluate to either True or False,\n", "# built from comparison operators such as:\n", "# == (equal), != (does not equal), < (smaller than), <= (smaller than or equal)\n", "# Individual expressions can be combined with logical operators such as\n", "# \"and\" / \"or\". A more detailed description can be found at https://pythonnumericalmethods.berkeley.edu/notebooks/chapter01.05-Logial-Expressions-and-Operators.html\n", "\n", "if 5 > 2:\n", "    print(\"Five is greater than two!\")  # note that the code block after a condition has to be indented (4 spaces or 1 tab)\n", "\n", "if 1 == 1:\n", "    print(\"One is equal to one\")\n", "\n", "if 5 != 2 and 5 > 2:\n", "    print(\"5 does not equal 2 and 5 is greater than two!\")\n", "\n", "if 5 < 2:\n", "    print(\"Five is smaller than two!\")\n", "else:\n", "    print(\"Five is NOT smaller than two!\")" ] }, { "cell_type": "markdown", "id": "c202d6f6", "metadata": {}, "source": [ "# Getting started with GeoData\n", "\n", "The following part is taken from the official GeoPandas introduction notebook.\n", "You can find the original and further examples at https://geopandas.org/en/stable/getting_started/introduction.html\n" ] }, { "cell_type": "markdown", "id": "8f49f924", "metadata": { "tags": [] }, "source": [ "# 
Introduction to GeoPandas\n", "\n", "This quick tutorial introduces the key concepts and basic features of GeoPandas to help you get started with your projects.\n", "\n", "## Concepts\n", "\n", "GeoPandas, as the name suggests, extends the popular data science library [pandas](https://pandas.pydata.org) by adding support for geospatial data. If you are not familiar with `pandas`, we recommend taking a quick look at its [Getting started documentation](https://pandas.pydata.org/docs/getting_started/index.html#getting-started) before proceeding.\n", "\n", "The core data structure in GeoPandas is the `geopandas.GeoDataFrame`, a subclass of `pandas.DataFrame`, that can store geometry columns and perform spatial operations. The `geopandas.GeoSeries`, a subclass of `pandas.Series`, handles the geometries. Therefore, your `GeoDataFrame` is a combination of `pandas.Series`, with traditional data (numerical, boolean, text etc.), and `geopandas.GeoSeries`, with geometries (points, polygons etc.). You can have as many columns with geometries as you wish; there's no limit typical for desktop GIS software.\n", "\n", "![geodataframe schema](../_static/dataframe.svg)\n", "\n", "Each `GeoSeries` can contain any geometry type (you can even mix them within a single array) and has a `GeoSeries.crs` attribute, which stores information about the projection (CRS stands for Coordinate Reference System). Therefore, each `GeoSeries` in a `GeoDataFrame` can be in a different projection, allowing you to have, for example, multiple versions (different projections) of the same geometry.\n", "\n", "Only one `GeoSeries` in a `GeoDataFrame` is considered the _active_ geometry, which means that all geometric operations applied to a `GeoDataFrame` operate on this _active_ column.\n", "\n", "\n", "
\n", "\n", "\n", "Let's see how some of these concepts work in practice.\n", "\n", "## Reading and writing files\n", "\n", "First, we need to read some data.\n", "\n", "### Reading files\n", "\n", "Assuming you have a file containing both data and geometry (e.g. GeoPackage, GeoJSON, Shapefile), you can read it using `geopandas.read_file()`, which automatically detects the filetype and creates a `GeoDataFrame`. This tutorial uses the `\"nybb\"` dataset, a map of New York boroughs, which is part of the GeoPandas installation. Therefore, we use `geopandas.datasets.get_path()` to retrieve the path to the dataset." ] }, { "cell_type": "code", "execution_count": null, "id": "808c0478", "metadata": {}, "outputs": [], "source": [ "import geopandas\n", "\n", "path_to_data = geopandas.datasets.get_path(\"nybb\")\n", "gdf = geopandas.read_file(path_to_data)\n", "\n", "gdf" ] }, { "cell_type": "markdown", "id": "73c0f996", "metadata": {}, "source": [ "### Reading GeoPackage\n" ] }, { "cell_type": "code", "execution_count": null, "id": "63b1c2d5", "metadata": {}, "outputs": [], "source": [ "path_to_data = \"please insert here\"  # replace with the path to your GeoPackage file\n", "gdf_gpkg = geopandas.read_file(path_to_data, driver=\"GPKG\")\n", "\n", "gdf_gpkg" ] }, { "cell_type": "markdown", "id": "02f57560", "metadata": {}, "source": [ "### Reading GeoJSON" ] }, { "cell_type": "code", "execution_count": null, "id": "06131617", "metadata": {}, "outputs": [], "source": [ "path_to_data = \"please insert here\"  # replace with the path to your GeoJSON file\n", "\n", "gdf_json = geopandas.read_file(path_to_data, driver=\"GeoJSON\")\n", "\n", "gdf_json" ] }, { "cell_type": "markdown", "id": "c565799a", "metadata": {}, "source": [ "### Reading Shapefile" ] }, { "cell_type": "code", "execution_count": null, "id": "4a7e0aca", "metadata": {}, "outputs": [], "source": [ "path_to_data = \"please insert here\"  # replace with the path to your Shapefile\n", "\n", "gdf_shp = geopandas.read_file(path_to_data, driver=\"ESRI Shapefile\")\n", "\n", "gdf_shp" ] }, { "cell_type": "markdown", "id": "c4074553", "metadata": { 
"tags": [] }, "source": [ "### Writing files\n", "\n", "To write a `GeoDataFrame` back to file use `GeoDataFrame.to_file()`. The default file format is Shapefile, but you can specify your own with the `driver` keyword." ] }, { "cell_type": "code", "execution_count": null, "id": "fd512231", "metadata": {}, "outputs": [], "source": [ "# forward slashes avoid backslash-escape problems in Python strings and work on all platforms\n", "gdf.to_file(\"data/my_file.geojson\", driver=\"GeoJSON\")" ] }, { "cell_type": "markdown", "id": "6ee5fce9", "metadata": {}, "source": [ "### Writing to GeoPackage" ] }, { "cell_type": "code", "execution_count": null, "id": "03489c27", "metadata": {}, "outputs": [], "source": [ "# for GeoPackages, the layer name has to be defined as well\n", "\n", "gdf.to_file(\"data/my_file.gpkg\", layer=\"newyork\", driver=\"GPKG\")" ] }, { "cell_type": "markdown", "id": "8d4a3a91", "metadata": { "tags": [] }, "source": [ "
\n", "\n", "\n", "\n", "## Simple accessors and methods\n", "\n", "Now we have our `GeoDataFrame` and can start working with its geometry. \n", "\n", "Since there was only one geometry column in the New York Boroughs dataset, this column automatically becomes the _active_ geometry and spatial methods used on the `GeoDataFrame` will be applied to the `\"geometry\"` column.\n", "\n", "### Measuring area\n", "\n", "To measure the area of each polygon (or MultiPolygon in this specific case), access the `GeoDataFrame.area` attribute, which returns a `pandas.Series`. Note that `GeoDataFrame.area` is just `GeoSeries.area` applied to the _active_ geometry column.\n", "\n", "But first, to make the results easier to read, set the names of the boroughs as the index:" ] }, { "cell_type": "code", "execution_count": null, "id": "9b87ea9b", "metadata": {}, "outputs": [], "source": [ "gdf = gdf.set_index(\"BoroName\")" ] }, { "cell_type": "code", "execution_count": null, "id": "fe6d317b", "metadata": {}, "outputs": [], "source": [ "gdf[\"area\"] = gdf.area\n", "gdf[\"area\"]" ] }, { "cell_type": "markdown", "id": "5036c646", "metadata": {}, "source": [ "### Getting polygon boundary and centroid\n", "\n", "To get the boundary of each polygon (LineString), access the `GeoDataFrame.boundary`:" ] }, { "cell_type": "code", "execution_count": null, "id": "e819cc75", "metadata": {}, "outputs": [], "source": [ "gdf['boundary'] = gdf.boundary\n", "gdf['boundary']" ] }, { "cell_type": "markdown", "id": "a636aca3", "metadata": {}, "source": [ "Since we have saved boundary as a new column, we now have two geometry columns in the same `GeoDataFrame`.\n", "\n", "We can also create new geometries, which could be, for example, a buffered version of the original one (e.g., `GeoDataFrame.buffer(10)`) or its centroid:" ] }, { "cell_type": "code", "execution_count": null, "id": "9157bdba", "metadata": {}, "outputs": [], "source": [ "gdf['centroid'] = gdf.centroid\n", "gdf['centroid']" ] }, { 
"cell_type": "markdown", "id": "b0e1f0e6", "metadata": {}, "source": [ "## Filtering geodataframes\n", "To get rid of unnecessary information or compare specific rows, we need to filter our dataframes further.\n", "To achieve this, the following cells demonstrate how to extract certain rows, columns, or cells using the `GeoDataFrame.loc` indexer. In addition, we will look at how to apply logical conditions and geometric relations to our dataframe through a mask." ] }, { "cell_type": "code", "execution_count": null, "id": "d9d302b1", "metadata": {}, "outputs": [], "source": [ "# selecting a specific row (by its index label)\n", "gdf.loc[\"Brooklyn\"]\n" ] }, { "cell_type": "code", "execution_count": null, "id": "b6f0cfd7", "metadata": {}, "outputs": [], "source": [ "# selecting a specific column (all rows)\n", "gdf.loc[:, \"geometry\"]" ] }, { "cell_type": "code", "execution_count": null, "id": "927a3981", "metadata": {}, "outputs": [], "source": [ "# selecting a single cell (specific row and column)\n", "gdf.loc[\"Brooklyn\", \"geometry\"]" ] }, { "cell_type": "code", "execution_count": null, "id": "e66b9ebb", "metadata": {}, "outputs": [], "source": [ "# selecting multiple rows\n", "gdf.loc[[\"Brooklyn\", \"Queens\"]]" ] }, { "cell_type": "code", "execution_count": null, "id": "aede43ce", "metadata": {}, "outputs": [], "source": [ "# filtering for a specific condition using masks (a boolean Series with one value per row)\n", "mask = gdf[\"BoroCode\"] >= 3\n", "mask" ] }, { "cell_type": "code", "execution_count": null, "id": "7bf134c2", "metadata": {}, "outputs": [], "source": [ "gdf[mask]" ] }, { "cell_type": "code", "execution_count": null, "id": "9457c9da", "metadata": {}, "outputs": [], "source": [ "queens = gdf.loc[\"Queens\"]\n", "gdf[~gdf.intersects(queens.centroid)]  # ~ negates a boolean mask in pandas, so this keeps the rows that do NOT contain the Queens centroid" ] }, { "cell_type": "markdown", "id": "992550a5", "metadata": {}, "source": [ "### Measuring distance\n", "\n", "We can also measure how far each centroid is from the first centroid location." 
] }, { "cell_type": "code", "execution_count": null, "id": "db4c732a", "metadata": {}, "outputs": [], "source": [ "first_point = gdf['centroid'].iloc[0]\n", "gdf['distance'] = gdf['centroid'].distance(first_point)\n", "gdf['distance']" ] }, { "cell_type": "markdown", "id": "da61f320", "metadata": {}, "source": [ "Note that `geopandas.GeoDataFrame` is a subclass of `pandas.DataFrame`, so we have all the pandas functionality available to use on the geospatial dataset — we can even perform data manipulations with the attributes and geometry information together.\n", "\n", "For example, to calculate the average of the distances measured above, access the 'distance' column and call the mean() method on it:" ] }, { "cell_type": "code", "execution_count": null, "id": "68e1bf77", "metadata": {}, "outputs": [], "source": [ "gdf['distance'].mean()" ] }, { "cell_type": "markdown", "id": "cb3f7b94", "metadata": {}, "source": [ "## Making maps\n", "\n", "GeoPandas can also plot maps, so we can check how the geometries appear in space. To plot the active geometry, call `GeoDataFrame.plot()`. To color code by another column, pass in that column as the first argument. In the example below, we plot the active geometry column and color code by the `\"area\"` column. We also want to show a legend (`legend=True`)." ] }, { "cell_type": "code", "execution_count": null, "id": "11004cfe", "metadata": {}, "outputs": [], "source": [ "gdf.plot(\"area\", legend=True)" ] }, { "cell_type": "markdown", "id": "1982da8e", "metadata": {}, "source": [ "You can also explore your data interactively using `GeoDataFrame.explore()`, which behaves in the same way `plot()` does but returns an interactive map instead." 
] }, { "cell_type": "code", "execution_count": null, "id": "4cc39203", "metadata": {}, "outputs": [], "source": [ "gdf.explore(\"area\", legend=False)" ] }, { "cell_type": "markdown", "id": "2600a3dd", "metadata": {}, "source": [ "Switching the active geometry (`GeoDataFrame.set_geometry`) to centroids, we can plot the same data using point geometry." ] }, { "cell_type": "code", "execution_count": null, "id": "2e918340", "metadata": {}, "outputs": [], "source": [ "gdf = gdf.set_geometry(\"centroid\")\n", "gdf.plot(\"area\", legend=True)" ] }, { "cell_type": "markdown", "id": "36a8f922", "metadata": {}, "source": [ "And we can also layer both `GeoSeries` on top of each other. We just need to use one plot as an axis for the other." ] }, { "cell_type": "code", "execution_count": null, "id": "ed297d80", "metadata": {}, "outputs": [], "source": [ "ax = gdf[\"geometry\"].plot()\n", "gdf[\"centroid\"].plot(ax=ax, color=\"black\")" ] }, { "cell_type": "markdown", "id": "b7ad5f3f", "metadata": {}, "source": [ "Now we set the active geometry back to the original `GeoSeries`." ] }, { "cell_type": "code", "execution_count": null, "id": "4a7cddef", "metadata": {}, "outputs": [], "source": [ "gdf = gdf.set_geometry(\"geometry\")" ] }, { "cell_type": "markdown", "id": "f5d1a0d1", "metadata": {}, "source": [ "
\n", "\n", "## Geometry creation\n", "\n", "We can further work with the geometry and create new shapes based on those we already have. \n", "\n", "### Convex hull\n", "\n", "If we are interested in the convex hull of our polygons, we can access `GeoDataFrame.convex_hull`." ] }, { "cell_type": "code", "execution_count": null, "id": "1df87fdf", "metadata": {}, "outputs": [], "source": [ "gdf[\"convex_hull\"] = gdf.convex_hull" ] }, { "cell_type": "code", "execution_count": null, "id": "0ae3f410", "metadata": {}, "outputs": [], "source": [ "ax = gdf[\"convex_hull\"].plot(alpha=.5) # saving the first plot as an axis and setting alpha (transparency) to 0.5\n", "gdf[\"boundary\"].plot(ax=ax, color=\"white\", linewidth=.5) # passing the first plot as an axis and setting linewidth to 0.5" ] }, { "cell_type": "markdown", "id": "9281f5c7", "metadata": {}, "source": [ "### Buffer\n", "\n", "In other cases, we may need to buffer the geometry using `GeoDataFrame.buffer()`. Geometry methods are automatically applied to the active geometry, but we can apply them directly to any `GeoSeries` as well. Let's buffer the boroughs and their centroids and plot both on top of each other." 
] }, { "cell_type": "code", "execution_count": null, "id": "43b731ed", "metadata": {}, "outputs": [], "source": [ "# buffering the active geometry by 10 000 feet (geometry is already in feet)\n", "gdf[\"buffered\"] = gdf.buffer(10000)\n", "\n", "# buffering the centroid geometry by 10 000 feet (geometry is already in feet)\n", "gdf[\"buffered_centroid\"] = gdf[\"centroid\"].buffer(10000)" ] }, { "cell_type": "code", "execution_count": null, "id": "df436f45", "metadata": {}, "outputs": [], "source": [ "ax = gdf[\"buffered\"].plot(alpha=.5) # saving the first plot as an axis and setting alpha (transparency) to 0.5\n", "gdf[\"buffered_centroid\"].plot(ax=ax, color=\"red\", alpha=.5) # passing the first plot as an axis to the second\n", "gdf[\"boundary\"].plot(ax=ax, color=\"white\", linewidth=.5) # passing the first plot as an axis and setting linewidth to 0.5" ] }, { "cell_type": "markdown", "id": "69086844", "metadata": {}, "source": [ "
## What next?

With GeoPandas we can do much more than what has been introduced so far, from [aggregations](../docs/user_guide/aggregation_with_dissolve.rst), to [spatial joins](../docs/user_guide/mergingdata.rst), to [geocoding](../docs/user_guide/geocoding.rst), and [much more](../gallery/index.rst).

Head over to the [User Guide](../docs/user_guide.rst) to learn more about the different features of GeoPandas, the [Examples](../gallery/index.rst) to see how they can be used, or to the [API reference](../docs/reference.rst) for the details.
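
As a small taste of the aggregation workflow mentioned above, here is a minimal sketch of `GeoDataFrame.dissolve()`. It does not use the boroughs dataset: the four unit squares, the `zone` labels, and the `population` values are made up purely for illustration, and it assumes `shapely` is importable (it is installed alongside GeoPandas).

```python
import geopandas
from shapely.geometry import box

# Four unit squares arranged in a 2 x 2 grid.
# The "zone" and "population" values are hypothetical illustration data.
squares = geopandas.GeoDataFrame(
    {
        "zone": ["west", "west", "east", "east"],
        "population": [10, 20, 30, 40],
    },
    geometry=[
        box(0, 0, 1, 1),  # lower-left square  (west)
        box(0, 1, 1, 2),  # upper-left square  (west)
        box(1, 0, 2, 1),  # lower-right square (east)
        box(1, 1, 2, 2),  # upper-right square (east)
    ],
)

# Merge all geometries that share a "zone" value into one geometry
# and sum the numeric attributes of the merged rows.
zones = squares.dissolve(by="zone", aggfunc="sum")

print(zones["population"])
print(zones.area)
```

Each zone ends up as a single geometry covering its two squares, with the `population` values of those squares added together. The same call pattern should carry over to real data such as the boroughs, as long as a suitable grouping column exists.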