Tutorial 1 Geospatial data types, operations and processing

In this tutorial, we introduce the fundamental geospatial data types and processing techniques using GeoPandas, a Python library for working with geospatial data. GeoPandas extends pandas to handle spatial data, so that users can perform geospatial operations and analyses easily.

All the data used in this tutorial can be downloaded from this link.

# Import required libraries
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

A GeoPandas dataframe (GeoDataFrame) consists of three parts:

  • index: The index labels each row in the GeoDataFrame (it is usually, but not necessarily, unique). It can be a simple integer index or a more complex index based on the data.

  • data: The data columns contain the attribute information associated with each geometric feature. These columns can include various data types, such as integers, floats, strings, and dates.

  • geometry: The geometry column contains the geometric representation of the features (e.g., points, lines, polygons). It is a special column that stores geometric objects from the Shapely library.
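As a minimal sketch of these three parts (assuming GeoPandas and Shapely are installed), the snippet below builds a small GeoDataFrame and inspects each part. The `population` column and its values are hypothetical and only serve as an example attribute column.

```python
import geopandas as gpd
from shapely.geometry import Point

# Data part: ordinary attribute columns (the values here are made up)
data = {"name": ["pt1", "pt2"], "population": [100, 250]}

# Geometry part: a list of Shapely objects, stored in a column named "geometry"
gdf = gpd.GeoDataFrame(data, geometry=[Point(1, 2), Point(3, 4)])

print(gdf.index.tolist())      # index part: default integer labels 0, 1
print(gdf.columns.tolist())    # attribute columns followed by the geometry column
print(gdf.geometry.name)       # name of the active geometry column
print(gdf.geom_type.tolist())  # geometry type of each row
```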


Geospatial data types:

  • Vector data: Represents geographic features using points, lines, and polygons (geometry types). Each feature has associated attributes stored in a table. Common file formats include Shapefile, GeoJSON, and KML.

| Feature | Shapefile | GeoJSON | KML |
|---|---|---|---|
| Format Type | Binary (requires multiple files) | Text (JSON-based, lightweight) | Text (XML-based, heavier) |
| Main Usage | Professional GIS software | Web mapping and APIs (e.g., Leaflet, Mapbox) | Google Earth, simple visualization |
| File Size & Efficiency | Efficient but needs .shp, .shx, .dbf files together | Light, easy to transfer over web | Larger files, less ideal for large datasets |

Big data formats:

| Feature | Parquet / GeoParquet | GeoPackage (.gpkg) | GeoJSONSeq / JSONL |
|---|---|---|---|
| Format Type | Binary (columnar, optimized for analytics) | Binary (SQLite-based single-file database) | Text-based (line-delimited GeoJSON) |
| Main Usage | Big data processing, cloud analytics, fast I/O | Desktop GIS, mobile apps, portable multi-layer data | Streaming spatial data, logging, web pipelines |
| File Size & Efficiency | Very compact, highly efficient for large/tabular data | Compact, self-contained but larger than Parquet | Lightweight per line, but not efficient at scale |
| Multi-layer Support | 1 table per file, but easily batched | Fully supports multiple vector and raster layers | No layer structure |
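To make the difference between GeoJSON and its line-delimited variant concrete, here is a sketch that builds the same two point features both ways using only the standard `json` module. The features (`pt1`, `pt2`) are illustrative; in practice such files would normally be written by GeoPandas (e.g. with `GeoDataFrame.to_file`) rather than by hand.

```python
import json

# Two point features; GeoJSON stores coordinates as [x, y] (longitude, latitude)
features = [
    {"type": "Feature",
     "properties": {"name": "pt1"},
     "geometry": {"type": "Point", "coordinates": [1.0, 2.0]}},
    {"type": "Feature",
     "properties": {"name": "pt2"},
     "geometry": {"type": "Point", "coordinates": [3.0, 4.0]}},
]

# GeoJSON: a single JSON document wrapping all features in a FeatureCollection
geojson_text = json.dumps({"type": "FeatureCollection", "features": features})

# GeoJSONSeq / JSONL (newline-separated variant): one self-contained feature
# per line, so a reader can stream or append features without parsing the
# whole file
geojsonseq_text = "\n".join(json.dumps(f) for f in features)

# Read the line-delimited version back, one feature at a time
for line in geojsonseq_text.splitlines():
    feature = json.loads(line)
    print(feature["properties"]["name"], feature["geometry"]["coordinates"])
```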

  • Raster data: Represents geographic information as a grid of pixels, where each pixel has a value representing a specific attribute (e.g., elevation, temperature). Common formats include GeoTIFF and NetCDF.
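A raster is essentially an array of values plus a rule that maps pixel positions to map coordinates. The sketch below assumes a north-up grid described by a hypothetical origin (the top-left corner) and a pixel size in metres; the elevation values are made up. Raster libraries such as rasterio store the same information as an affine transform.

```python
# A tiny 3 x 4 raster of hypothetical elevation values (metres)
elevation = [
    [12, 15, 18, 20],
    [10, 14, 17, 19],
    [ 8, 11, 13, 16],
]

# Hypothetical georeferencing for a north-up grid in a projected CRS
x_origin, y_origin = 530000.0, 181000.0  # top-left corner of the top-left pixel
pixel_size = 100.0                        # each pixel covers 100 m x 100 m


def pixel_center(row, col):
    """Return the (x, y) map coordinate of the centre of pixel (row, col)."""
    x = x_origin + (col + 0.5) * pixel_size
    y = y_origin - (row + 0.5) * pixel_size  # rows increase southwards
    return x, y


def pixel_at(x, y):
    """Return the (row, col) of the pixel that contains map coordinate (x, y)."""
    col = int((x - x_origin) // pixel_size)
    row = int((y_origin - y) // pixel_size)
    return row, col


row, col = pixel_at(530250.0, 180850.0)
print(row, col, elevation[row][col], pixel_center(row, col))
```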

1.1 Fundamental geometric objects

Geometry types / geometric objects:

  • Point: Represents a discrete location (e.g., a city, a landmark).

    • MultiPoint: A collection of multiple points, often used to represent clusters of discrete locations (e.g., a group of cities).

  • LineString: Represents a linear feature (e.g., a road, a river).

    • MultiLineString: A collection of multiple lines, often used to represent complex linear features (e.g., a river with multiple branches).

  • Polygon: Represents an area (e.g., a country, a lake).

    • MultiPolygon: A collection of multiple polygons, often used to represent complex areas (e.g., a country with multiple islands).

We use a Python library called Shapely to handle and process the geometric objects. We don't need to install it separately, because it is a dependency of GeoPandas and is installed together with it.

# Import the required libraries
from shapely.geometry import Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon

1.1.1 Point

# Create a point object
point1 = Point(1, 2)
point2 = Point(3, 4)
# The point object
point1
# Print the point object's type in Python
type(point1)
shapely.geometry.point.Point
# Or we can use the geom_type to check the type of the point object
point1.geom_type
'Point'

Point attributes

# We can also check the coordinate info (x, y) of the point object
print(list(point1.coords), point1.x, point1.y)
[(1.0, 2.0)] 1.0 2.0

Distance between two points

The distance between two points is calculated with the Euclidean distance formula, i.e. the straight-line distance between the two points in a Cartesian coordinate system (we introduce the coordinate reference system (CRS) later). The result is therefore expressed in the units of the coordinates, so it is important to check the measurement unit (e.g. metres, feet or degrees) of the CRS you are using.

# Calculate the distance between two points
point1.distance(point2)
2.8284271247461903
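The value above is Pythagoras' theorem applied to the coordinate differences. A minimal pure-Python sketch reproduces it; the helper name `euclidean_distance` is ours and is not part of Shapely.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two (x, y) tuples in a Cartesian plane."""
    dx = q[0] - p[0]
    dy = q[1] - p[1]
    return math.hypot(dx, dy)  # sqrt(dx**2 + dy**2)


# Same coordinates as point1 (1, 2) and point2 (3, 4) above
print(euclidean_distance((1, 2), (3, 4)))
```

The result carries the units of the coordinates: metres in a projected CRS such as EPSG:27700, but degrees in EPSG:4326, where a Euclidean distance is not a meaningful ground distance.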

Creating a GeoDataFrame from a DataFrame

# Create a GeoDataFrame with point1 and point2
# Create a DataFrame with point1 and point2
df = pd.DataFrame({'name': ['pt1', 'pt2'], 'geometry': [point1, point2]})
# Create a GeoDataFrame from the DataFrame
gdf1 = gpd.GeoDataFrame(df, geometry='geometry')
gdf1
name geometry
0 pt1 POINT (1 2)
1 pt2 POINT (3 4)

If the coordinates are stored in x and y columns (for example, a large set of coordinates read from a file), we can use gpd.points_from_xy() to create point geometries from them and build a GeoDataFrame.

# Create a DataFrame with x and y coordinates
df = pd.DataFrame({'x': [1, 2, 3, 6], 'y': [4, 5, 6, 8]})
# Create a GeoDataFrame from the DataFrame
gdf2 = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.x, df.y))
gdf2
x y geometry
0 1 4 POINT (1 4)
1 2 5 POINT (2 5)
2 3 6 POINT (3 6)
3 6 8 POINT (6 8)

1.1.2 LineString

# Create a LineString object
line1 = LineString([(0, 0), (1, 1)])  # A line made of two points
line2 = LineString([(0, 0), (1, 2), (2, 2)])  # A line made of three points
line3 = LineString([(0, 0), (0, 2), (2, 3), (3, 1)])  # A line made of four points
# The LineString object
line1
# The LineString object
line2
# The LineString object
line3
# Print the LineString object's type in Python
type(line1)
shapely.geometry.linestring.LineString
# Or we can use the geom_type to check the type of the LineString object
line1.geom_type
'LineString'

LineString attributes

# We can also check the coordinate info (x, y) of points within the LineString object
print('line1', list(line1.coords), line1.xy)
print('line2', list(line2.coords), line2.xy)
print('line3', list(line3.coords), line3.xy)
line1 [(0.0, 0.0), (1.0, 1.0)] (array('d', [0.0, 1.0]), array('d', [0.0, 1.0]))
line2 [(0.0, 0.0), (1.0, 2.0), (2.0, 2.0)] (array('d', [0.0, 1.0, 2.0]), array('d', [0.0, 2.0, 2.0]))
line3 [(0.0, 0.0), (0.0, 2.0), (2.0, 3.0), (3.0, 1.0)] (array('d', [0.0, 0.0, 2.0, 3.0]), array('d', [0.0, 2.0, 3.0, 1.0]))

Calculate the length of the LineString object

# Calculate the length of the LineString object
line1.length, line2.length, line3.length
(1.4142135623730951, 3.23606797749979, 6.47213595499958)
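A LineString's length is the sum of the Euclidean lengths of its segments. This pure-Python sketch reproduces the three values above; the `polyline_length` helper is illustrative rather than a Shapely function, and `math.dist` needs Python 3.8 or later.

```python
import math


def polyline_length(coords):
    """Sum of the straight segment lengths between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


# Same vertices as line1, line2 and line3 above
line1_coords = [(0, 0), (1, 1)]
line2_coords = [(0, 0), (1, 2), (2, 2)]
line3_coords = [(0, 0), (0, 2), (2, 3), (3, 1)]

for coords in (line1_coords, line2_coords, line3_coords):
    print(polyline_length(coords))
```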

The centroid of a LineString

The centroid of a LineString is the point that represents the geometric center of the line. It is not the simple average of the vertex coordinates: it is the length-weighted average of the midpoints of the line's segments, i.e. the center of mass of the line treated as a uniform wire. The centroid does not necessarily lie on the line itself.

# Calculate the centroid of the LineString object
line1.centroid, line2.centroid, line3.centroid
(<POINT (0.5 0.5)>, <POINT (0.809 1.309)>, <POINT (1.209 1.864)>)
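The centroids above can be reproduced by weighting each segment's midpoint by the segment's length. A pure-Python sketch, assuming a non-degenerate line (total length greater than zero); the `linestring_centroid` helper is hypothetical, not a Shapely function. The last lines show that the plain average of the vertices, which for line2 would be (1, 1.333), gives a different point.

```python
import math


def linestring_centroid(coords):
    """Length-weighted average of segment midpoints (centre of mass of the line)."""
    total = 0.0
    cx = cy = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        seg_len = math.dist((x1, y1), (x2, y2))
        # Each segment contributes its midpoint, weighted by its length
        cx += seg_len * (x1 + x2) / 2
        cy += seg_len * (y1 + y2) / 2
        total += seg_len
    return cx / total, cy / total


line2_coords = [(0, 0), (1, 2), (2, 2)]
print(linestring_centroid(line2_coords))

# The plain average of the vertices is NOT the centroid
xs, ys = zip(*line2_coords)
print(sum(xs) / len(xs), sum(ys) / len(ys))
```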
# we use matplotlib to plot the LineStrings and their centroids

import matplotlib.pyplot as plt
# Create a figure and axis
fig, ax = plt.subplots(figsize=(6, 6))
# Plot the LineString object
x, y = line1.xy
ax.plot(x, y, color='blue', linewidth=2, label='LineString 1')
x, y = line2.xy
ax.plot(x, y, color='red', linewidth=2, label='LineString 2')
x, y = line3.xy
ax.plot(x, y, color='green', linewidth=2, label='LineString 3')
# Plot the centroid of the LineString object
ax.plot(line1.centroid.x, line1.centroid.y, 'o', color='blue', markersize=10, label='Centroid 1')
ax.plot(line2.centroid.x, line2.centroid.y, 'o', color='red', markersize=10, label='Centroid 2')
ax.plot(line3.centroid.x, line3.centroid.y, 'o', color='green', markersize=10, label='Centroid 3')
ax.grid()
# Add a title and labels
ax.set_title('LineString and Centroid')
ax.set_xlabel('X')
ax.set_ylabel('Y')
# Add a legend
ax.legend()
# Show the plot
plt.show()

1.1.3 Polygon

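Building a polygon is not as simple as building a point or a line. A polygon is defined by a ring of points (vertices) connected in the order they are given, forming the edges of the polygon. The ring must be closed, i.e. its first and last points must be the same; if you omit the repeated closing point, Shapely adds it automatically. A polygon needs at least three distinct vertices.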
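A quick check of the automatic closing, assuming Shapely is installed: the same triangle is built with and without repeating the first vertex.

```python
from shapely.geometry import Polygon

# The same triangle, with and without repeating the first vertex at the end
closed = Polygon([(0, 0), (1, 1), (1, 0), (0, 0)])
open_ring = Polygon([(0, 0), (1, 1), (1, 0)])

print(list(open_ring.exterior.coords))  # Shapely appends the closing vertex
print(closed.equals(open_ring))          # same geometry either way
```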

# Create a Polygon object
polygon1 = Polygon([(0, 0), (1, 1), (1, 0), (0, 0)])  # A triangle: three distinct vertices
polygon2 = Polygon([(1, 1), (2, 4), (3, 4), (4, 2), (1, 1)])  # Four distinct vertices
polygon3 = Polygon([(1, 2), (2, 5), (3, 4), (5, 5), (3, 2), (1, 2)])  # Five distinct vertices
# The Polygon object
polygon1
# The Polygon object
polygon2
# The Polygon object
polygon3
# We can also create a polygon from a LineString: line3 has four points, and the polygon is closed automatically.
polygon4 = Polygon(line3)
# The Polygon object
polygon4
# We can also create a polygon with a hole by passing a shell and a list of holes:
# the shell is the outer boundary of the polygon,
# and each hole is an inner boundary.
outer_boundary = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
inner_boundary = [(1, 1), (1, 3), (3, 3), (3, 1), (1, 1)]
polygon5 = Polygon(shell=outer_boundary, holes=[inner_boundary])
# The Polygon object
polygon5
# Print the Polygon object's type in Python
type(polygon1)
shapely.geometry.polygon.Polygon
# Or we can use the geom_type to check the type of the Polygon object
polygon1.geom_type
'Polygon'

Polygon attributes

# We can also check the coordinate info (x, y) of points within the Polygon object
print('polygon2', list(polygon2.exterior.coords), polygon2.exterior.xy)
polygon2 [(1.0, 1.0), (2.0, 4.0), (3.0, 4.0), (4.0, 2.0), (1.0, 1.0)] (array('d', [1.0, 2.0, 3.0, 4.0, 1.0]), array('d', [1.0, 4.0, 4.0, 2.0, 1.0]))
# get the exterior and interior coordinates of the polygon5
print('polygon5', list(polygon5.exterior.coords), polygon5.exterior.xy)
print('polygon5', list(polygon5.interiors[0].coords), polygon5.interiors[0].xy)
polygon5 [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0), (0.0, 0.0)] (array('d', [0.0, 4.0, 4.0, 0.0, 0.0]), array('d', [0.0, 0.0, 4.0, 4.0, 0.0]))
polygon5 [(1.0, 1.0), (1.0, 3.0), (3.0, 3.0), (3.0, 1.0), (1.0, 1.0)] (array('d', [1.0, 1.0, 3.0, 3.0, 1.0]), array('d', [1.0, 3.0, 3.0, 1.0, 1.0]))
# The exterior length of the polygon
print('polygon2', polygon2.exterior.length)
polygon2 9.56062329783655
# The exterior and interior length of the polygon5
print('polygon5', polygon5.exterior.length, polygon5.interiors[0].length)
polygon5 16.0 8.0
# Calculate the area of the Polygon object
polygon1.area, polygon2.area, polygon3.area, polygon4.area, polygon5.area
(0.5, 5.0, 6.0, 5.5, 12.0)
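These areas follow the shoelace formula, which sums cross products of consecutive vertices. The pure-Python sketch below reproduces all five values above; the helper names are hypothetical, and it assumes simple (non-self-intersecting) rings with holes lying inside the shell. As with distances, the result is in squared coordinate units.

```python
def shoelace_area(ring):
    """Area of a simple polygon ring (shoelace formula).

    The ring may be open or closed; the closing edge is handled by wrapping
    around to the first vertex.
    """
    n = len(ring)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


def polygon_area(shell, holes=()):
    """Shell area minus the areas of its holes."""
    return shoelace_area(shell) - sum(shoelace_area(h) for h in holes)


print(polygon_area([(1, 1), (2, 4), (3, 4), (4, 2)]))  # polygon2
print(polygon_area([(0, 0), (4, 0), (4, 4), (0, 4)],
                   holes=[[(1, 1), (1, 3), (3, 3), (3, 1)]]))  # polygon5
```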
# we use matplotlib to plot the Polygons and their centroids
# Create a figure and axis
fig, ax = plt.subplots(figsize=(6, 6))
# Plot the Polygon object
x, y = polygon1.exterior.xy
ax.plot(x, y, color='blue', linewidth=2, label='Polygon 1')
x, y = polygon2.exterior.xy
ax.plot(x, y, color='red', linewidth=2, label='Polygon 2')
x, y = polygon3.exterior.xy
ax.plot(x, y, color='green', linewidth=2, label='Polygon 3')
# Plot the centroid of the Polygon object
ax.plot(polygon1.centroid.x, polygon1.centroid.y, 'o', color='blue', markersize=10, label='Centroid 1')
ax.plot(polygon2.centroid.x, polygon2.centroid.y, 'o', color='red', markersize=10, label='Centroid 2')
ax.plot(polygon3.centroid.x, polygon3.centroid.y, 'o', color='green', markersize=10, label='Centroid 3')
ax.grid()
# Add a title and labels
ax.set_title('Polygon and Centroid')
ax.set_xlabel('X')
ax.set_ylabel('Y')
# Add a legend
ax.legend()
# Show the plot
plt.show()

1.1.4 MultiPoint, MultiLineString, and MultiPolygon

# Create a MultiPoint object
multipoint = MultiPoint([point1, point2])
# The MultiPoint object
multipoint
# Print the MultiPoint object's type in Python
type(multipoint)
shapely.geometry.multipoint.MultiPoint
# Or we can use the geom_type to check the type of the MultiPoint object
multipoint.geom_type
'MultiPoint'
# Create a MultiLineString object
multiline = MultiLineString([line1, line2, line3])
# The MultiLineString object
multiline
# Print the MultiLineString object's type in Python
type(multiline)
shapely.geometry.multilinestring.MultiLineString
# Or we can use the geom_type to check the type of the MultiLineString object
multiline.geom_type
'MultiLineString'
# Create a MultiPolygon object
multipolygon = MultiPolygon([polygon1, polygon2, polygon3])
# The MultiPolygon object
multipolygon
# Print the MultiPolygon object's type in Python
type(multipolygon)
shapely.geometry.multipolygon.MultiPolygon
# Or we can use the geom_type to check the type of the MultiPolygon object
multipolygon.geom_type
'MultiPolygon'

Note: we can also compute the length, area, centroid, and distance of MultiPoint, MultiLineString, and MultiPolygon objects.

When you need to compute these measures for large numbers of points, lines, or polygons, use the vectorized GeoSeries methods instead, such as geopandas.GeoSeries.distance and the other geopandas.GeoSeries operations.
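A minimal sketch of the vectorized approach, assuming GeoPandas and Shapely are installed: one call to `GeoSeries.distance` measures every geometry in the series against a single reference point. The three geometries are illustrative. When the other argument is itself a GeoSeries, rows are matched by index instead.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point

# A GeoSeries of mixed geometries (here just three, for illustration)
geoms = gpd.GeoSeries([Point(1, 2), Point(3, 4), LineString([(0, 3), (4, 3)])])

# Element-wise distance from every geometry to one reference point;
# the results are in the units of the coordinates (no CRS is set here)
origin = Point(0, 0)
print(geoms.distance(origin).tolist())
```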

1.2 Map Projection

Coordinate reference system

A Coordinate Reference System (CRS) defines how the Earth’s curved surface is represented on a flat map using coordinates. It specifies both the shape of the Earth (through a datum) and the method of projecting that shape onto a two-dimensional plane. Without a CRS, spatial data cannot be accurately positioned, nor can it be reliably combined with other datasets.

| CRS Name | EPSG Code | Type | Notes |
|---|---|---|---|
| OSGB36 / British National Grid | EPSG:27700 | Projected | The main CRS for mapping in Great Britain. Uses a Transverse Mercator projection and the OSGB36 datum. |
| WGS 84 | EPSG:4326 | Geographic | Used globally (e.g. GPS systems, Google Maps). Coordinates in latitude and longitude. |
| Irish Grid (Ireland and Northern Ireland) | EPSG:29902 | Projected | A separate grid for the island of Ireland, based on similar Transverse Mercator principles. |
| Web Mercator | EPSG:3857 | Projected | Used by many web mapping applications (e.g. Google Maps, OpenStreetMap). Distorts areas and distances, but preserves angles. |
| NAD83 / UTM Zone 10N | EPSG:26910 | Projected | Used in North America. Based on the UTM system, which divides the world into a series of zones. |

EPSG (European Petroleum Survey Group) codes are a standardized set of identifiers for coordinate reference systems. They provide a unique identifier for each CRS, making it easier to reference and use them in GIS applications.

You can look up all reference systems in the EPSG registry.

Geographic vs. Projected Coordinate Systems

| Type | Geographic Coordinate System (GCS) | Projected Coordinate System (PCS) |
|---|---|---|
| Coordinates | Latitude (Y) and Longitude (X), in degrees | X and Y values in meters, feet, or other linear units |
| Surface | Curved (earth-like, 3D ellipsoid) | Flat (2D map surface) |
| Examples | WGS84 (EPSG:4326), NAD83 | UTM, State Plane, British National Grid |
| Good for | Global data, navigation, GPS | Local maps, accurate distances, areas, engineering |
| Issues | Hard to measure real distances (degrees aren't equal in size everywhere) | Distortions (shape, area, distance, direction); you have to choose which to minimize |
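To see why degrees are not equal in size everywhere, this sketch measures one degree of longitude at the equator and at about London's latitude with the haversine formula. It treats the Earth as a sphere with a mean radius of 6371 km, an approximation (real CRSs use an ellipsoid), and the helper name `haversine_km` is ours.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude at the equator versus at about 51.5°N (London)
at_equator = haversine_km(0, 0, 1, 0)
at_london = haversine_km(0, 51.5, 1, 51.5)
# One degree of latitude is roughly the same everywhere on a sphere
north_south = haversine_km(0, 51.5, 0, 52.5)

print(f"{at_equator:.1f} km, {at_london:.1f} km, {north_south:.1f} km")
```

A degree of longitude shrinks roughly with the cosine of the latitude, so a buffer or distance expressed "in degrees" means different ground distances in different places.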

Now we use the UK Countries Boundaries data, downloaded from the ONS Open Geography Portal, as an example with GeoPandas.

# Load the shapefile. Note that the .shp file is only one part of a shapefile: its companion files (e.g. .shx, .dbf, .prj) must be downloaded into the same folder.
gdf_uk_1 = gpd.read_file("data/Countries_December_2024_Boundaries_UK/CTRY_DEC_2024_UK_BFC.shp")
# We get the geo-dataframe gdf_uk_1, which contains the geometry column (which is a multipolygon) and other attribute columns (four rows refer to four countries).
gdf_uk_1
CTRY24CD CTRY24NM CTRY24NMW BNG_E BNG_N LONG LAT GlobalID geometry
0 E92000001 England Lloegr 394883 370883 -2.07812 53.2350 5cad1ec2-bbe1-4ec4-bcd9-ba0cb9c3fc1f MULTIPOLYGON (((83962.84 5401.15, 83970.68 540...
1 N92000002 Northern Ireland Gogledd Iwerddon 86544 535337 -6.85571 54.6150 8d8effb1-0159-4cd6-b856-21a8754b4693 MULTIPOLYGON (((131198.094 468427.673, 131196....
2 S92000003 Scotland Yr Alban 277744 700060 -3.97094 56.1774 a158e058-71b1-4272-b4bf-91c241d13159 MULTIPOLYGON (((265944.63 543512.72, 265945.83...
3 W92000004 Wales Cymru 263405 242881 -3.99418 52.0674 c78b0dcc-7d89-42b2-9667-57aa91a55e74 MULTIPOLYGON (((322081.699 165165.901, 322082....
# The geometry column of a GeoDataFrame is a GeoSeries, which stores geometric objects from the Shapely library.
type(gdf_uk_1.geometry)
geopandas.geoseries.GeoSeries
# the geometry column contains the geometric representation of the features (e.g., points, lines, polygons).
type(gdf_uk_1.geometry[0])
shapely.geometry.multipolygon.MultiPolygon
# Load the geojson file; you may observe that the geojson file is much smaller than the shapefile.
gdf_uk_2 = gpd.read_file("data/Countries_December_2024_Boundaries_UK.geojson")
# We get the GeoDataFrame gdf_uk_2, which contains the same information as gdf_uk_1 (loaded from the shapefile).
# The coordinate values in the geometry column differ because the two files use different CRSs (checked below), but both describe the same country boundaries as multipolygons.
gdf_uk_2
FID CTRY24CD CTRY24NM CTRY24NMW BNG_E BNG_N LONG LAT GlobalID geometry
0 1 E92000001 England Lloegr 394883 370883 -2.07812 53.23497 bd411920-e7ea-4f71-b6c8-5f1d24ec92d3 MULTIPOLYGON (((-6.34905 49.89822, -6.32842 49...
1 2 N92000002 Northern Ireland Gogledd Iwerddon 86544 535337 -6.85571 54.61502 652c0c4b-647b-4565-b9ed-e9c17ec5834c MULTIPOLYGON (((-5.52389 54.67041, -5.52451 54...
2 3 S92000003 Scotland Yr Alban 277744 700060 -3.97094 56.17744 97bb1057-3e8d-4ad8-83ef-4577d1bb4d9c MULTIPOLYGON (((-3.06033 54.98452, -3.06337 54...
3 4 W92000004 Wales Cymru 263405 242881 -3.99418 52.06742 f7c86b8c-b705-44b7-bb7b-46323f7bddfe MULTIPOLYGON (((-4.30971 51.56253, -4.31141 51...
# Check the CRS of the gdf_uk_1
gdf_uk_1.crs
<Projected CRS: EPSG:27700>
Name: OSGB36 / British National Grid
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: United Kingdom (UK) - offshore to boundary of UKCS within 49°45'N to 61°N and 9°W to 2°E; onshore Great Britain (England, Wales and Scotland). Isle of Man onshore.
- bounds: (-9.01, 49.75, 2.01, 61.01)
Coordinate Operation:
- name: British National Grid
- method: Transverse Mercator
Datum: Ordnance Survey of Great Britain 1936
- Ellipsoid: Airy 1830
- Prime Meridian: Greenwich
# Plot gdf_uk_1 to check the projection: the x and y axes are in metres.
gdf_uk_1.plot(color="skyblue", edgecolor="black", lw=0.5, alpha=0.8, figsize=(10, 10))
# Check the CRS of the gdf_uk_2
gdf_uk_2.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
# Plot gdf_uk_2 to check the projection: the x and y axes are in degrees.
gdf_uk_2.plot(color="skyblue", edgecolor="black", lw=0.5, alpha=0.8, figsize=(10, 10))

What is a map projection?

A projection is a mathematical transformation that converts 3D geographic coordinates (latitude and longitude) into 2D Cartesian coordinates (X, Y).
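As a concrete example of such a transformation, here is a sketch of the spherical formulas used by Web Mercator (EPSG:3857), which treat WGS 84 longitude/latitude as lying on a sphere with radius 6378137 m. It is valid only between about 85.05°S and 85.05°N (y grows without bound towards the poles). In practice you would call `GeoDataFrame.to_crs(epsg=3857)` rather than hand-coding it; the function name here is ours.

```python
import math

WEB_MERCATOR_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used as a sphere radius


def lonlat_to_web_mercator(lon, lat):
    """Project longitude/latitude in degrees to Web Mercator x/y in metres."""
    x = WEB_MERCATOR_RADIUS_M * math.radians(lon)
    y = WEB_MERCATOR_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# Approximately central London
print(lonlat_to_web_mercator(-0.1276, 51.5072))
```

The stretching of y towards the poles is exactly why Web Mercator distorts areas (Greenland looks as large as Africa) while keeping local angles.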

# Use GeoPandas to reproject gdf_uk_2, i.e., change its CRS to the same CRS as gdf_uk_1 (EPSG:27700).
gdf_uk_2 = gdf_uk_2.to_crs(epsg=27700)
# The coordinates in the geometry column of gdf_uk_2 have changed from degrees to metres, i.e., they now use the same units as gdf_uk_1.
gdf_uk_2
FID CTRY24CD CTRY24NM CTRY24NMW BNG_E BNG_N LONG LAT GlobalID geometry
0 1 E92000001 England Lloegr 394883 370883 -2.07812 53.23497 bd411920-e7ea-4f71-b6c8-5f1d24ec92d3 MULTIPOLYGON (((87796.624 8850.924, 89240.332 ...
1 2 N92000002 Northern Ireland Gogledd Iwerddon 86544 535337 -6.85571 54.61502 652c0c4b-647b-4565-b9ed-e9c17ec5834c MULTIPOLYGON (((172876.341 536297.016, 172828....
2 3 S92000003 Scotland Yr Alban 277744 700060 -3.97094 56.17744 97bb1057-3e8d-4ad8-83ef-4577d1bb4d9c MULTIPOLYGON (((332243.348 566061.14, 332047.3...
3 4 W92000004 Wales Cymru 263405 242881 -3.99418 52.06742 f7c86b8c-b705-44b7-bb7b-46323f7bddfe MULTIPOLYGON (((239998.199 187379.085, 239876....
# Plot gdf_uk_2 again to check the projection: the x and y axes are now in metres.
gdf_uk_2.plot(color="skyblue", edgecolor="black", lw=0.5, alpha=0.8, figsize=(10, 10))
plt.xlim(0.)
plt.ylim(0.)
# Show plain (non-scientific) numbers on the y-axis tick labels
plt.ticklabel_format(style='plain', axis='y')

1.3 Geometric operations – Overlay

In this section, we learn how to perform overlay operations on geospatial data. Overlay operations combine two or more layers of geospatial data into a new layer that contains information from all the input layers. The GeoPandas overlay function supports four main types of overlay: intersection, union, difference, and symmetric difference (it also offers an identity option).

(Figure: spatial overlay with two input vector layers (a_input = rectangle, b_input = circle). The resulting vector layer is displayed in green. Source: QGIS documentation.)
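The same set logic can be tried on single Shapely geometries before applying it to whole layers. In this sketch (assuming Shapely is installed) two overlapping squares stand in for the rectangle and circle of the figure; the shapes are hypothetical.

```python
from shapely.geometry import box

# Two overlapping squares standing in for two layers (hypothetical shapes)
a = box(0, 0, 2, 2)  # square from (0, 0) to (2, 2), area 4
b = box(1, 1, 3, 3)  # square from (1, 1) to (3, 3), area 4

results = {
    "intersection": a.intersection(b),                  # area covered by both
    "union": a.union(b),                                # area covered by either
    "difference": a.difference(b),                      # in a but not in b
    "symmetric_difference": a.symmetric_difference(b),  # in exactly one of them
}

for name, geom in results.items():
    print(name, geom.area)
```

gpd.overlay applies the same operations to every pair of overlapping rows in two GeoDataFrames (selected with its how parameter) and keeps the attribute columns of both inputs.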

Here, we use the London local authorities (LAs), selected from the UK local authority boundaries used in the previous week, together with the 2019 London Ultra Low Emission Zone (ULEZ) boundary, which covers the same area as the central London Congestion Charge Zone (CCZ). The CCZ is a designated area in central London where drivers are required to pay a fee to drive within the zone during certain hours. The ULEZ is an area in London where only vehicles that meet strict emissions standards can enter without paying a charge. (Please note that the ULEZ was expanded in 2021 and 2023, so its current boundary is larger; in this case we use the 2019 boundary, i.e. the CCZ.)

# Read the UK Local Authority boundaries in geojson
gdf_uk_la = gpd.read_file("data/Local_Authority_Districts_December_2024_Boundaries_UK_BSC.geojson")
# Select the London LA boundaries using the codes in the LAD24CD column.
# All London LA codes start with 'E09', so we can filter the rows with a string method.
gdf_uk_london = gdf_uk_la[gdf_uk_la['LAD24CD'].str.startswith('E09')]
# We can observe that London has 33 local authorities (33 rows in GeoDataFrame).
gdf_uk_london
FID LAD24CD LAD24NM LAD24NMW BNG_E BNG_N LONG LAT GlobalID geometry
263 264 E09000001 City of London 532382 181358 -0.093520 51.51564 741710fd-03e1-4b41-8645-7ebcfc5961ac POLYGON ((-0.07853 51.52151, -0.07687 51.51663...
264 265 E09000002 Barking and Dagenham 547757 185111 0.129479 51.54556 a2f59957-115c-478b-8c27-5162ee915dc7 POLYGON ((0.15436 51.56611, 0.16189 51.56162, ...
265 266 E09000003 Barnet 523473 191752 -0.218200 51.61107 9c7bda3b-2831-4799-857a-c83a49d16e4e POLYGON ((-0.19987 51.67017, -0.19107 51.6639,...
266 267 E09000004 Bexley 549202 175434 0.146212 51.45823 ce2684df-0b6a-45fb-a5f7-85b9ab35e0de POLYGON ((0.18654 51.48046, 0.20084 51.47866, ...
267 268 E09000005 Brent 519615 186465 -0.275690 51.56439 3b5f8a90-cb62-4570-984f-fbf322e910bc POLYGON ((-0.2495 51.58557, -0.25173 51.58338,...
268 269 E09000006 Bromley 542036 165707 0.039246 51.37266 13f562be-b8cb-427c-bf86-0f845358e1ac POLYGON ((0.03975 51.44098, 0.05821 51.42487, ...
269 270 E09000007 Camden 527491 184283 -0.162910 51.54305 88f3f650-53ac-434d-883e-6235b48d14c4 POLYGON ((-0.13842 51.55687, -0.13072 51.55067...
270 271 E09000008 Croydon 533922 164745 -0.077620 51.36599 c8fce222-bb0d-4a51-a695-68c991b41d31 POLYGON ((-0.11263 51.42324, -0.10596 51.42259...
271 272 E09000009 Ealing 517055 181959 -0.314100 51.52443 b2200ae0-425c-441f-88f8-b3ad3afc2152 POLYGON ((-0.33556 51.55656, -0.31253 51.54903...
272 273 E09000010 Enfield 532831 196198 -0.081440 51.64890 53c0b28d-45c0-462e-96ff-dd23c7f8cead POLYGON ((-0.08389 51.68991, -0.06209 51.68298...
273 274 E09000011 Greenwich 542507 175878 0.050093 51.46394 c559fa29-c300-4d10-8c1f-c0ea59815fa1 MULTIPOLYGON (((-0.01733 51.4802, -0.01876 51....
274 275 E09000012 Hackney 534560 185787 -0.060460 51.55493 a41cf0fc-95ed-4823-a9db-7b8bc60f421b POLYGON ((-0.01717 51.55158, -0.01655 51.54333...
275 276 E09000013 Hammersmith and Fulham 523867 177993 -0.217350 51.48733 fb263bb7-9972-416c-9e8c-fc6966c08221 POLYGON ((-0.21503 51.50219, -0.20795 51.49603...
276 277 E09000014 Haringey 531260 189349 -0.106700 51.58772 a4407a53-4227-42ea-be81-5933bf73fc98 POLYGON ((-0.11562 51.60842, -0.11445 51.6084,...
277 278 E09000015 Harrow 515359 189736 -0.335990 51.59467 72e2ba10-bf08-4504-8775-6fd08d725dd7 POLYGON ((-0.2842 51.5905, -0.28246 51.58505, ...
278 279 E09000016 Havering 555032 187514 0.235368 51.56520 ef6d5be3-a2f9-4ad8-8c35-5cdc37e0bf9e POLYGON ((0.22409 51.63174, 0.26326 51.60919, ...
279 280 E09000017 Hillingdon 508168 183121 -0.441790 51.53664 ac5e6b28-6cc5-413b-b5b0-2d86048fe3f1 POLYGON ((-0.45974 51.61316, -0.45713 51.61229...
280 281 E09000018 Hounslow 512737 174959 -0.378550 51.46239 a53588f2-bf1c-41b5-83f7-d0e7419b35df POLYGON ((-0.27418 51.49729, -0.26906 51.49403...
281 282 E09000019 Islington 531160 184645 -0.109900 51.54547 43a872fc-e401-45af-a2e6-3b2befc98837 POLYGON ((-0.07768 51.54948, -0.07669 51.54609...
282 283 E09000020 Kensington and Chelsea 525756 179054 -0.189780 51.49645 29dc7a0f-a487-4156-a842-975bea2a337d POLYGON ((-0.19991 51.51684, -0.19917 51.51454...
283 284 E09000021 Kingston upon Thames 519508 167389 -0.283680 51.39296 1e70f43d-4ddf-47dc-88bb-4df2efbfcafc POLYGON ((-0.25424 51.4293, -0.24952 51.41478,...
284 285 E09000022 Lambeth 531118 175629 -0.113850 51.46445 9143bbc4-6bda-4e17-8148-cae2813f694d POLYGON ((-0.09936 51.47264, -0.09598 51.46987...
285 286 E09000023 Lewisham 537888 173343 -0.017340 51.44230 99f48b46-bb26-47d7-8c90-bab1eeba4425 POLYGON ((-0.01876 51.47891, -0.02274 51.47535...
286 287 E09000024 Merton 526068 169508 -0.188690 51.41059 2e85b11a-12d0-45c2-862c-ffd0dce83b1a POLYGON ((-0.18985 51.44027, -0.18977 51.43135...
287 288 E09000025 Newham 540713 183346 0.027261 51.53150 70e80979-1178-4228-9ae5-2e8d09e17e69 POLYGON ((0.05034 51.56402, 0.06015 51.55641, ...
288 289 E09000026 Redbridge 543512 189477 0.070085 51.58588 947652cf-8d04-498f-aeac-8cb34f1052d8 POLYGON ((0.02182 51.62883, 0.04079 51.61573, ...
289 290 E09000027 Richmond upon Thames 519005 172650 -0.289140 51.44035 533a2a77-3779-4660-b9de-90d93540ee21 POLYGON ((-0.23296 51.47168, -0.23357 51.46535...
290 291 E09000028 Southwark 533945 175869 -0.073090 51.46595 7d09d625-212a-4819-9b12-36c8d8d2b697 POLYGON ((-0.07083 51.50252, -0.07351 51.50046...
291 292 E09000029 Sutton 527357 163639 -0.172270 51.35755 1b5d57bc-1cb5-43e1-85ba-3bb45da0b7cc POLYGON ((-0.16995 51.39173, -0.1653 51.388, -...
292 293 E09000030 Tower Hamlets 536340 181452 -0.036480 51.51555 54f82e61-dc15-42cb-b777-681b4326f855 POLYGON ((-0.03319 51.54469, -0.02899 51.54227...
293 294 E09000031 Waltham Forest 537328 190278 -0.018800 51.59462 97cd624f-2c02-4f12-a78e-7aedc777673f POLYGON ((0.02009 51.62643, 0.01454 51.61889, ...
294 295 E09000032 Wandsworth 525152 174138 -0.200220 51.45240 9c51587a-4315-449d-bcc7-e65625e402f7 POLYGON ((-0.14969 51.46129, -0.14814 51.4564,...
295 296 E09000033 Westminster 528268 180871 -0.152950 51.51222 a52ff0f8-4ac0-4417-8338-01be3cbb516a POLYGON ((-0.17348 51.53765, -0.1649 51.53578,...
# We check the coordinate reference system (CRS) of the London local authority boundaries, which is WGS 84.
gdf_uk_london.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
# We can plot the London local authority boundaries using the plot method.
gdf_uk_london.plot(figsize=(6, 6), edgecolor='grey', color='skyblue', alpha=0.7, linewidth=0.4)
# We read the ULEZ 2019 / CCZ boundary in shp format.
gdf_ccz = gpd.read_file("data/ULEZ_2019_Central_Congestion_Charging_Zone/UltraLowEmissionsZoneBoundary(ULEZ).shp")
# We can observe that the CCZ boundary has 1 row (CCZ is a single polygon).
gdf_ccz
OBJECTID BOUNDARY Shape_Area geometry
0 1 CSS Area 21.375571 POLYGON ((531562.664 183054.181, 531582.254 18...
# Check the CRS of the CCZ boundary, which is OSGB 1936 / British National Grid.
gdf_ccz.crs
<Projected CRS: EPSG:27700>
Name: OSGB36 / British National Grid
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: United Kingdom (UK) - offshore to boundary of UKCS within 49°45'N to 61°N and 9°W to 2°E; onshore Great Britain (England, Wales and Scotland). Isle of Man onshore.
- bounds: (-9.01, 49.75, 2.01, 61.01)
Coordinate Operation:
- name: British National Grid
- method: Transverse Mercator
Datum: Ordnance Survey of Great Britain 1936
- Ellipsoid: Airy 1830
- Prime Meridian: Greenwich
# Plot the CCZ boundary using the plot method.
gdf_ccz.plot(figsize=(6, 6), edgecolor='grey', color='red', alpha=0.3, linewidth=0.4)

Overlay operations require both layers to use the same CRS, so we reproject the London LA boundaries from WGS 84 to OSGB 1936 / British National Grid. (We recommend the projected CRS here, because it measures distances and areas in metres, which is more accurate for local analysis than working in degrees in a geographic CRS.)

# Change London LA boundaries CRS from WGS 84 to OSGB 1936 / British National Grid.
gdf_uk_london = gdf_uk_london.to_crs(epsg=27700)
# Check that both layers now use the same CRS.
gdf_uk_london.crs == gdf_ccz.crs
True

Intersection

In this case, we would like to create a new layer that contains only the parts of the London LAs that lie within the CCZ boundary. This is done by performing an intersection operation between the two GeoDataFrames (two layers).

# Perform intersection operation between London LA boundaries and CCZ boundary.
gdf_ccz_la = gpd.overlay(gdf_ccz, gdf_uk_london, how='intersection')
# 8 London LAs intersect the CCZ boundary (8 rows in the GeoDataFrame), so we obtain 8 new areas (zones).
gdf_ccz_la
OBJECTID BOUNDARY Shape_Area FID LAD24CD LAD24NM LAD24NMW BNG_E BNG_N LONG LAT GlobalID geometry
0 1 CSS Area 21.375571 264 E09000001 City of London 532382 181358 -0.09352 51.51564 741710fd-03e1-4b41-8645-7ebcfc5961ac POLYGON ((533741.949 181255.415, 533743.557 18...
1 1 CSS Area 21.375571 270 E09000007 Camden 527491 184283 -0.16291 51.54305 88f3f650-53ac-434d-883e-6235b48d14c4 POLYGON ((528914.128 182173.503, 528917.252 18...
2 1 CSS Area 21.375571 275 E09000012 Hackney 534560 185787 -0.06046 51.55493 a41cf0fc-95ed-4823-a9db-7b8bc60f421b POLYGON ((532960.103 182538.598, 532960.597 18...
3 1 CSS Area 21.375571 282 E09000019 Islington 531160 184645 -0.10990 51.54547 43a872fc-e401-45af-a2e6-3b2befc98837 POLYGON ((531582.254 183032.797, 531584.653 18...
4 1 CSS Area 21.375571 285 E09000022 Lambeth 531118 175629 -0.11385 51.46445 9143bbc4-6bda-4e17-8148-cae2813f694d POLYGON ((531750.948 178636.445, 531734.903 17...
5 1 CSS Area 21.375571 291 E09000028 Southwark 533945 175869 -0.07309 51.46595 7d09d625-212a-4819-9b12-36c8d8d2b697 POLYGON ((533584.147 180067.345, 533558.653 18...
6 1 CSS Area 21.375571 293 E09000030 Tower Hamlets 536340 181452 -0.03648 51.51555 54f82e61-dc15-42cb-b777-681b4326f855 MULTIPOLYGON (((533533.852 182103.403, 533568....
7 1 CSS Area 21.375571 296 E09000033 Westminster 528268 180871 -0.15295 51.51222 a52ff0f8-4ac0-4417-8338-01be3cbb516a POLYGON ((530031.102 178265.801, 530007.101 17...
# We can plot the 8 new zones using the plot method.
# We can observe that some LA boundaries have been cut by the CCZ boundary.
gdf_ccz_la.plot(figsize=(6, 6), edgecolor='grey', color='orange', alpha=0.7, linewidth=0.4)
[Figure: the parts of the 8 London LAs that lie inside the CCZ]

Union

In this case, we would like to create a new layer that is the union of Camden and the CCZ. This is done with a union overlay between the two GeoDataFrames (two layers).

# Select Camden LA boundary from the London LA boundaries.
gdf_camden = gdf_uk_london[gdf_uk_london['LAD24NM'] == 'Camden']
# plot the Camden LA boundary.
gdf_camden.plot(figsize=(6, 6), edgecolor='grey', color='skyblue', alpha=0.7, linewidth=0.4)
[Figure: the Camden LA boundary]
# Perform union operation between Camden LA boundary and CCZ boundary.
gdf_union = gpd.overlay(gdf_camden, gdf_ccz, how='union')
# We can observe that there are 3 rows in the new GeoDataFrame.
gdf_union
FID LAD24CD LAD24NM LAD24NMW BNG_E BNG_N LONG LAT GlobalID OBJECTID BOUNDARY Shape_Area geometry
0 270.0 E09000007 Camden 527491.0 184283.0 -0.16291 51.54305 88f3f650-53ac-434d-883e-6235b48d14c4 1.0 CSS Area 21.375571 POLYGON ((530918.69 182415.114, 531558.692 181...
1 270.0 E09000007 Camden 527491.0 184283.0 -0.16291 51.54305 88f3f650-53ac-434d-883e-6235b48d14c4 NaN NaN NaN POLYGON ((529703.128 185186.98, 529820.513 184...
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 CSS Area 21.375571 POLYGON ((531582.254 183032.797, 531584.653 18...

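We can observe that the new union layer consists of three parts (areas), one per row:

  1. Row 0: the intersected area that is in both the Camden LA boundary and the CCZ boundary (both sets of attributes are filled in).

  2. Row 1: the Camden area that is not in the CCZ (the CCZ attributes are NaN).

  3. Row 2: the CCZ area that is not in the Camden LA boundary (the Camden attributes are NaN).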
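The three-way split can be reproduced with two overlapping toy squares standing in for Camden (a) and the CCZ (b). This is a minimal sketch with made-up geometries; each piece is labelled by which attribute columns are missing, the same way the NaN pattern in the union table above identifies each row.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Hypothetical toy layers: two 2 x 2 squares that overlap in a 1 x 1 square
a = gpd.GeoDataFrame({"name_a": ["A"]}, geometry=[box(0, 0, 2, 2)], crs="EPSG:27700")
b = gpd.GeoDataFrame({"name_b": ["B"]}, geometry=[box(1, 1, 3, 3)], crs="EPSG:27700")

union = gpd.overlay(a, b, how="union")


def label(row):
    # A piece belongs to a layer when that layer's attribute is not missing
    in_a, in_b = pd.notna(row["name_a"]), pd.notna(row["name_b"])
    if in_a and in_b:
        return "A and B"
    return "A only" if in_a else "B only"


union["piece"] = union.apply(label, axis=1)
union["area"] = union.area
print(union[["piece", "area"]])
```

The areas add up as expected: 1 for the shared piece and 3 for each L-shaped remainder, i.e. 7 in total, which is area(A) + area(B) minus the overlap.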

# plot each area in the new union zone.
for i in range(len(gdf_union)):
    gdf_union.iloc[[i]].plot(figsize=(3, 3), edgecolor='grey', color='orange', alpha=0.7, linewidth=0.4)
[Figures: the three parts of the union, plotted one per row]

Symmetric difference

In this case, we would like to create a new layer that consists of the areas of Camden and the CCZ that they do not have in common. This is done with a symmetric difference overlay between the two GeoDataFrames (two layers).

# Perform a symmetric difference operation between Camden LA boundary and CCZ boundary.
gdf_sym_diff = gpd.overlay(gdf_camden, gdf_ccz, how='symmetric_difference')
# We can observe that there are 2 rows in the new GeoDataFrame.
gdf_sym_diff
FID LAD24CD LAD24NM LAD24NMW BNG_E BNG_N LONG LAT GlobalID OBJECTID BOUNDARY Shape_Area geometry
0 270.0 E09000007 Camden 527491.0 184283.0 -0.16291 51.54305 88f3f650-53ac-434d-883e-6235b48d14c4 NaN NaN NaN POLYGON ((529703.128 185186.98, 529820.513 184...
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 CSS Area 21.375571 POLYGON ((531582.254 183032.797, 531584.653 18...

We can observe that the new layer consists of two parts (areas):

  1. Camden area that is not in CCZ.

  2. CCZ area that is not in Camden LA boundary.

Compared with the union, the symmetric difference therefore returns only these two areas and drops the intersected area.
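The relationship between the two operations can be checked with the same kind of toy squares (hypothetical data, not the tutorial layers): the symmetric difference is the union without the shared piece, so its area equals the union area minus the intersection area.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical toy layers: two 2 x 2 squares overlapping in a 1 x 1 square
a = gpd.GeoDataFrame({"name_a": ["A"]}, geometry=[box(0, 0, 2, 2)], crs="EPSG:27700")
b = gpd.GeoDataFrame({"name_b": ["B"]}, geometry=[box(1, 1, 3, 3)], crs="EPSG:27700")

union = gpd.overlay(a, b, how="union")
inter = gpd.overlay(a, b, how="intersection")
sym = gpd.overlay(a, b, how="symmetric_difference")

print(len(union), len(sym))  # 3 2
# The symmetric difference drops exactly the shared piece
print(union.area.sum() - inter.area.sum(), sym.area.sum())
```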

#  plot each area in the new zone.
for i in range(len(gdf_sym_diff)):
    gdf_sym_diff.iloc[[i]].plot(figsize=(3, 3), edgecolor='grey', color='orange', alpha=0.7, linewidth=0.4)
[Figures: the two parts of the symmetric difference, plotted one per row]

Difference

In this case, we would like to create a new layer that consists of the areas of Camden that are not in the CCZ. This is done with a difference overlay between the two GeoDataFrames (two layers).

# Perform a difference operation between Camden LA boundary and CCZ boundary.
gdf_diff_1 = gpd.overlay(gdf_camden, gdf_ccz, how='difference')
# plot the new zone.
gdf_diff_1.plot(figsize=(3, 3), edgecolor='grey', color='orange', alpha=0.7, linewidth=0.4)
[Figure: the Camden area that is not in the CCZ]

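If we swap the order of the GeoDataFrames in the function call, we get a different layer that consists of the areas of the CCZ that are not in Camden. The difference operation is therefore not symmetric: it keeps the parts of the first (left) layer that lie outside the second.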
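The asymmetry is easy to confirm on toy data. In this sketch (hypothetical squares, not the tutorial layers), swapping the arguments returns a different piece, and each result keeps only the attribute columns of its left layer.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical toy layers: two 2 x 2 squares overlapping in a 1 x 1 square
a = gpd.GeoDataFrame({"name_a": ["A"]}, geometry=[box(0, 0, 2, 2)], crs="EPSG:27700")
b = gpd.GeoDataFrame({"name_b": ["B"]}, geometry=[box(1, 1, 3, 3)], crs="EPSG:27700")

a_minus_b = gpd.overlay(a, b, how="difference")  # parts of A outside B
b_minus_a = gpd.overlay(b, a, how="difference")  # parts of B outside A

# Only the left layer's attributes survive a difference overlay
print(a_minus_b.columns.tolist(), b_minus_a.columns.tolist())
# (0.5, 0.5) lies only in A and (2.5, 2.5) lies only in B
print(a_minus_b.geometry.contains(Point(0.5, 0.5)).iloc[0])
print(b_minus_a.geometry.contains(Point(2.5, 2.5)).iloc[0])
```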

# Perform a difference operation between CCZ boundary and Camden LA boundary.
gdf_diff_2 = gpd.overlay(gdf_ccz, gdf_camden, how='difference')
# plot the new zone.
gdf_diff_2.plot(figsize=(3, 3), edgecolor='grey', color='orange', alpha=0.7, linewidth=0.4)
[Figure: the CCZ area that is not in Camden]

1.4 Spatial Join

Like the join function in Pandas, a spatial join in GeoPandas combines two GeoDataFrames, but it matches rows based on their spatial relationship instead of a shared key. This is useful for analyzing the relationship between different layers and types of geospatial data. For example, we can use a spatial join to combine a large dataset of points with polygons to find out which points fall within which polygons. Spatial join is one of the data-merging techniques in GeoPandas; more information is available on the "Merging data" page of the GeoPandas documentation. Please note that a spatial join is different from an overlay operation, which computes new geometries from the two layers (for example, their intersection). A spatial join does not create new geometries: it only combines the attributes of the two layers based on their spatial relationship.
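The difference from an overlay can be seen on a toy example (hypothetical points and a square zone, not the tutorial data): the join keeps each matched station's original point geometry and simply adds the zone's attributes, plus an index_right column pointing at the matched zone row.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical toy layers in a projected CRS
zone = gpd.GeoDataFrame({"zone": ["Z"]}, geometry=[box(0, 0, 10, 10)], crs="EPSG:27700")
stations = gpd.GeoDataFrame(
    {"station": ["inside", "outside"]},
    geometry=[Point(5, 5), Point(20, 20)],
    crs="EPSG:27700",
)

joined = gpd.sjoin(stations, zone, how="inner", predicate="within")
print(joined[["station", "zone", "index_right"]])
# No new shapes are computed: the matched station keeps its own point geometry
print(joined.geometry.iloc[0].equals(stations.geometry.iloc[0]))  # True
```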

The default spatial index in GeoPandas supports, among others, the following values for predicate, which are defined in the Shapely documentation:

  • intersects: returns all geometries that intersect with the other geometry.

  • contains: returns all geometries that contain the other geometry.

  • within: returns all geometries that are within the other geometry. The within predicate is the inverse of contains.

  • crosses: returns all geometries that cross the other geometry. This means that their interiors share some points but one geometry is not contained in the other, e.g. a line that runs partly inside and partly outside a polygon.

  • touches: returns all geometries that touch the other geometry. This means that they share at least one boundary point but their interiors do not intersect, e.g. two adjacent polygons that share an edge.
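The predicates above can be tried directly on Shapely geometries. The square zone, the point, the line and the neighbouring square below are made-up toy shapes chosen so that the predicates give different answers.

```python
from shapely.geometry import LineString, Point, box

zone = box(0, 0, 10, 10)
station = Point(5, 5)                  # strictly inside the zone
route = LineString([(5, 5), (15, 5)])  # starts inside, ends outside
neighbour = box(10, 0, 20, 10)         # shares the edge x = 10 with the zone

print(station.within(zone), zone.contains(station))     # True True
print(route.intersects(zone), route.within(zone))       # True False
print(route.crosses(zone))                              # True
print(neighbour.touches(zone), neighbour.within(zone))  # True False
```

In a spatial join the same tests are applied pairwise between left and right geometries; for example, predicate='within' keeps a pair when the left geometry is within the right one.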

Here, we will use the London datasets to demonstrate the spatial join operation. First, we will perform a spatial join between the CCZ and the London Underground stations to find out which underground stations are within the CCZ.

# We read the London Underground station locations in geojson format.
gdf_underground = gpd.read_file("data/Underground_Stations.geojson")
# Reproject the underground stations from WGS 84 to OSGB 1936 / British National Grid to match the CCZ.
gdf_underground = gdf_underground.to_crs(epsg=27700)
# Perform a spatial join between London Underground stations and CCZ zone.
gdf_ccz_underground = gpd.sjoin(gdf_underground, gdf_ccz, how='inner', predicate='within')

Please note that the ‘how’ parameter is set to ‘inner’, which means that only the stations that are within the CCZ are returned.

The ‘predicate’ parameter is set to ‘within’, which means that a station (left GeoDataFrame) is matched only if it lies within the CCZ (right GeoDataFrame).

If we set the ‘how’ parameter to ‘left’, all the stations are returned, but only the stations within the CCZ have the CCZ attributes; the others get missing values.

If we set the ‘how’ parameter to ‘right’, all the CCZ polygons are returned with the CCZ geometry: a polygon appears once for each station it contains, and a polygon that contains no station appears once with missing station attributes.
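A toy example (hypothetical stations and zones, not the tutorial data) makes the three options concrete: with how='left' the station outside every zone keeps a row with a missing zone, and with how='right' the zone geometries are kept, with one row per matched station and a single row with missing station attributes for the empty zone.

```python
import geopandas as gpd
from shapely.geometry import Point, box

zones = gpd.GeoDataFrame(
    {"zone": ["Z1", "Z2"]},
    geometry=[box(0, 0, 10, 10), box(100, 100, 110, 110)],  # Z2 contains no station
    crs="EPSG:27700",
)
stations = gpd.GeoDataFrame(
    {"station": ["s1", "s2", "s3"]},
    geometry=[Point(2, 2), Point(8, 8), Point(50, 50)],  # s3 is outside both zones
    crs="EPSG:27700",
)

inner = gpd.sjoin(stations, zones, how="inner", predicate="within")
left = gpd.sjoin(stations, zones, how="left", predicate="within")
right = gpd.sjoin(stations, zones, how="right", predicate="within")

print(inner[["station", "zone"]])  # s1 and s2 only
print(left[["station", "zone"]])   # s3 kept, zone missing
print(right[["zone", "station"]])  # Z1 twice, Z2 once with station missing
print(right.geometry.geom_type.unique())  # the right (polygon) geometry is kept
```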

# We can observe that there are 37 underground stations that are within the CCZ zone.
gdf_ccz_underground
OBJECTID_left NAME LINES ATCOCODE MODES ACCESSIBILITY NIGHT_TUBE NETWORK DATASET_LAST_UPDATED FULL_NAME geometry index_right OBJECTID_right BOUNDARY Shape_Area
0 111 St. Paul's Central 940GZZLUSPU bus, tube Not Accessible Yes London Underground 2021-11-29 00:00:00+00:00 St. Paul's station POINT Z (532110.145 181274.419 0) 0 1 CSS Area 21.375571
36 147 Leicester Square Piccadilly, Northern 940GZZLULSQ tube, bus Not Accessible Yes London Underground 2021-11-29 00:00:00+00:00 Leicester Square station POINT Z (529982.188 180824.696 0) 0 1 CSS Area 21.375571
37 148 Covent Garden Piccadilly 940GZZLUCGN tube Not Accessible Yes London Underground 2021-11-29 00:00:00+00:00 Covent Garden station POINT Z (530252.371 181025.223 0) 0 1 CSS Area 21.375571
38 149 Russell Square Piccadilly 940GZZLURSQ tube, bus Not Accessible Yes London Underground 2021-11-29 00:00:00+00:00 Russell Square station POINT Z (530230.87 182127.97 0) 0 1 CSS Area 21.375571
46 157 Temple District, Circle 940GZZLUTMP tube, bus Not Accessible No London Underground 2021-11-29 00:00:00+00:00 Temple station POINT Z (530960.433 180803.046 0) 0 1 CSS Area 21.375571
47 158 Blackfriars District, Circle 940GZZLUBKF tube Fully Accessible No London Underground 2021-11-29 00:00:00+00:00 Blackfriars station POINT Z (531695.661 180893.308 0) 0 1 CSS Area 21.375571
48 159 Mansion House District, Circle 940GZZLUMSH tube, bus Not Accessible No London Underground 2021-11-29 00:00:00+00:00 Mansion House station POINT Z (532355.962 180931.785 0) 0 1 CSS Area 21.375571
49 160 Cannon Street District, Circle 940GZZLUCST tube Partially Accessible No London Underground 2021-11-29 00:00:00+00:00 Cannon Street station POINT Z (532613.273 180900.285 0) 0 1 CSS Area 21.375571
50 161 Tower Hill District, Circle 940GZZLUTWH tube Not Accessible No London Underground 2021-11-29 00:00:00+00:00 Tower Hill station POINT Z (533581.262 180755.588 0) 0 1 CSS Area 21.375571
51 162 Aldgate Metropolitan, Circle 940GZZLUALD bus, tube Not Accessible No London Underground 2021-11-29 00:00:00+00:00 Aldgate station POINT Z (533614.972 181262.505 0) 0 1 CSS Area 21.375571
52 163 Liverpool Street Metropolitan, Central, Circle, Hammersmith & City 940GZZLULVT tube Partially Accessible Yes London Underground 2021-11-29 00:00:00+00:00 Liverpool Street station POINT Z (533095.667 181567.145 0) 0 1 CSS Area 21.375571
53 164 Moorgate Metropolitan, Northern, Circle, Hammersmith & ... 940GZZLUMGT tube Not Accessible No London Underground 2021-11-29 00:00:00+00:00 Moorgate station POINT Z (532669.633 181668.478 0) 0 1 CSS Area 21.375571
54 165 Barbican Metropolitan, Circle, Hammersmith & City 940GZZLUBBN tube, bus Not Accessible No London Underground 2021-11-29 00:00:00+00:00 Barbican station POINT Z (532006.001 181856.574 0) 0 1 CSS Area 21.375571
55 166 Farringdon Metropolitan, Circle, Hammersmith & City 940GZZLUFCN tube Fully Accessible No London Underground 2021-11-29 00:00:00+00:00 Farringdon station POINT Z (531561.791 181874.173 0) 0 1 CSS Area 21.375571
78 189 Warren Street Northern, Victoria 940GZZLUWRR tube, bus Not Accessible Yes London Underground 2021-11-29 00:00:00+00:00 Warren Street station POINT Z (529250.51 182266.279 0) 0 1 CSS Area 21.375571
92 203 Marble Arch Central 940GZZLUMBA tube, bus Not Accessible Yes London Underground 2021-11-29 00:00:00+00:00 Marble Arch station POINT Z (527961.86 181017.689 0) 0 1 CSS Area 21.375571
93 204 Bond Street Central, Jubilee 940GZZLUBND tube, bus Fully Accessible Yes London Underground 2021-11-29 00:00:00+00:00 Bond Street station POINT Z (528492.413 181117.852 0) 0 1 CSS Area 21.375571
98 209 Bank Waterloo & City, Northern, Central 940GZZLUBNK tube Not Accessible Yes London Underground 2021-11-29 00:00:00+00:00 Bank station POINT Z (532711.957 181120.121 0) 0 1 CSS Area 21.375571
99 210 Oxford Circus Central, Bakerloo, Victoria 940GZZLUOXC tube, bus Partially Accessible - Interchange Only Yes London Underground 2021-11-29 00:00:00+00:00 Oxford Circus station POINT Z (529048.363 181236.558 0) 0 1 CSS Area 21.375571
100 211 Holborn Central, Piccadilly 940GZZLUHBN tube, bus Not Accessible Yes London Underground 2021-11-29 00:00:00+00:00 Holborn station POINT Z (530513.836 181525.303 0) 0 1 CSS Area 21.375571
101 212 Chancery Lane Central 940GZZLUCHL tube, bus Not Accessible Yes London Underground 2021-11-29 00:00:00+00:00 Chancery Lane station POINT Z (531125.73 181615.704 0) 0 1 CSS Area 21.375571
118 229 Goodge Street Northern 940GZZLUGDG tube, bus Not Accessible Yes London Underground 2021-11-29 00:00:00+00:00 Goodge Street station POINT Z (529539.037 181836.829 0) 0 1 CSS Area 21.375571
119 230 Tottenham Court Road Central, Northern 940GZZLUTCR bus, tube Fully Accessible Yes London Underground 2021-11-29 00:00:00+00:00 Tottenham Court Road station POINT Z (529817.144 181382.723 0) 0 1 CSS Area 21.375571
148 259 Embankment District, Bakerloo, Northern, Circle 940GZZLUEMB tube Not Accessible Yes London Underground 2021-11-29 00:00:00+00:00 Embankment station POINT Z (530421.085 180396.077 0) 0 1 CSS Area 21.375571
157 268 Piccadilly Circus Bakerloo, Piccadilly 940GZZLUPCC bus, tube Not Accessible Yes London Underground 2021-11-29 00:00:00+00:00 Piccadilly Circus station POINT Z (529614.395 180665.739 0) 0 1 CSS Area 21.375571
158 269 Charing Cross Bakerloo, Northern 940GZZLUCHX tube Fully Accessible Yes London Underground 2021-11-29 00:00:00+00:00 Charing Cross station POINT Z (530059.453 180378.586 0) 0 1 CSS Area 21.375571
159 270 Elephant & Castle Northern, Bakerloo 940GZZLUEAC tube Partially Accessible No London Underground 2021-11-29 00:00:00+00:00 Elephant & Castle station POINT Z (531910.732 179142.068 0) 0 1 CSS Area 21.375571
160 271 Lambeth North Bakerloo 940GZZLULBN bus, tube Not Accessible No London Underground 2021-11-29 00:00:00+00:00 Lambeth North station POINT Z (531135.34 179456.379 0) 0 1 CSS Area 21.375571
172 283 Westminster District, Circle, Jubilee 940GZZLUWSM tube Fully Accessible Yes London Underground 2021-11-29 00:00:00+00:00 Westminster station POINT Z (530197.419 179668.415 0) 0 1 CSS Area 21.375571
175 286 Waterloo Waterloo & City, Bakerloo, Northern, Jubilee 940GZZLUWLO tube Partially Accessible Yes London Underground 2021-11-29 00:00:00+00:00 Waterloo station POINT Z (530969.943 179962.112 0) 0 1 CSS Area 21.375571
182 293 Green Park Piccadilly, Victoria, Jubilee 940GZZLUGPK bus, tube Fully Accessible Yes London Underground 2021-11-29 00:00:00+00:00 Green Park station POINT Z (529008.967 180295.035 0) 0 1 CSS Area 21.375571
222 333 Southwark Jubilee 940GZZLUSWK bus, tube Fully Accessible Yes London Underground 2021-11-29 00:00:00+00:00 Southwark station POINT Z (531594.114 180074.192 0) 0 1 CSS Area 21.375571
223 334 London Bridge Northern, Jubilee 940GZZLULNB tube Fully Accessible Yes London Underground 2021-11-29 00:00:00+00:00 London Bridge station POINT Z (532684.838 180189.059 0) 0 1 CSS Area 21.375571
231 387 Monument District, Circle 940GZZLUMMT tube, bus Partially Accessible - Interchange Only No London Underground 2021-11-29 00:00:00+00:00 Monument station POINT Z (532913.72 180824.294 0) 0 1 CSS Area 21.375571
242 398 St. James's Park District, Circle 940GZZLUSJP tube Not Accessible No London Underground 2021-11-29 00:00:00+00:00 St. James's Park station POINT Z (529648.342 179498.258 0) 0 1 CSS Area 21.375571
264 479 Borough Northern 940GZZLUBOR bus, tube Partially Accessible No London Underground 2021-11-29 00:00:00+00:00 Borough station POINT Z (532441.745 179751.658 0) 0 1 CSS Area 21.375571
265 480 Old Street Northern 940GZZLUODS tube Not Accessible No London Underground 2021-11-29 00:00:00+00:00 Old Street station POINT Z (532766.27 182419.055 0) 0 1 CSS Area 21.375571
# We can plot the underground stations and the CCZ zone.
fig, ax = plt.subplots(figsize=(6, 6))
gdf_ccz.plot(ax=ax, edgecolor='black', color='#fff2f2', alpha=1, linewidth=0.6)
gdf_ccz_underground.plot(ax=ax, edgecolor='grey', color='royalblue', alpha=1, linewidth=0.4, markersize=15)
ax.set_axis_off()
[Figure: London Underground stations within the CCZ]

Second, we can also perform a spatial join between the London cycle routes and the CCZ to find out which cycle routes are within the CCZ zone.

# Read the London cycle routes in geojson format.
gdf_cycle_routes = gpd.read_file("data/Cycle_Routes.geojson")
# Reproject the cycle routes from WGS 84 to OSGB 1936 / British National Grid to match the CCZ.
gdf_cycle_routes = gdf_cycle_routes.to_crs(epsg=27700)
# Perform a spatial join between London cycle routes and CCZ zone.
gdf_ccz_cycle_routes = gpd.sjoin(gdf_cycle_routes, gdf_ccz, how='inner', predicate='within')
# We can observe that there are 13 cycle routes that are within the CCZ zone.
gdf_ccz_cycle_routes
LABEL PROGRAMME ROUTE_NAME ROUTE MILESTONE STATUS PUBLIC_ ROUTE_LENGTH_KM PROGRAMME_UPDATED OBJECTID_left Shape__Length geometry index_right OBJECTID_right BOUNDARY Shape_Area
4 C41 Cycleways Euston to Holborn Euston to Holborn Complete Complete Yes 1.181 2023-11-21 00:00:00+00:00 1995 1180.878708 LINESTRING Z (530257.989 182468.202 0, 530288.... 0 1 CSS Area 21.375571
11 C56 Cycleways C5 to Westmister Bridge C5 to Westmister Bridge Complete Complete Yes 1.196 2023-11-21 00:00:00+00:00 2002 1196.244254 MULTILINESTRING Z ((530954.753 179173.18 0, 53... 0 1 CSS Area 21.375571
26 C Cycle Superhighways Lancaster Gate to Barking Horse Guards Road Complete Complete Yes 1.336 2023-11-21 00:00:00+00:00 2017 1336.495364 LINESTRING Z (529903.742 179698.188 0, 529903.... 0 1 CSS Area 21.375571
40 C Cycleways Lambeth Roundabout to P.Square Lambeth Roundabout to P.Square (SG3) Concept Design Feasibility Yes 0.598 2023-11-21 00:00:00+00:00 2031 598.403380 LINESTRING Z (530155.358 179582.367 0, 530160.... 0 1 CSS Area 21.375571
42 C11 Cycleways Essex Road to Farringdon The City to Farringdon Complete Complete Yes 1.243 2023-11-21 00:00:00+00:00 2033 1243.120260 LINESTRING Z (532591.297 181938.706 0, 532574.... 0 1 CSS Area 21.375571
50 C Cycleways C1 to Liverpool Street C1 to Liverpool Street Complete Complete Yes 0.525 2023-11-21 00:00:00+00:00 2041 524.539812 LINESTRING Z (532944.816 181895.234 0, 533036.... 0 1 CSS Area 21.375571
73 C Cycleways Lambeth Roundabout - North & South Lambeth Roundabout - North & South (SG4) Detailed Design Feasibility Yes 0.519 2023-11-21 00:00:00+00:00 2064 519.098639 LINESTRING Z (530531.437 178939.038 0, 530531.... 0 1 CSS Area 21.375571
103 C Central London Grid Fitzrovia to Soho Fitzrovia to Soho (SG4) Detailed Design Feasibility Yes 0.942 2023-11-21 00:00:00+00:00 2094 942.260282 LINESTRING Z (529477.755 181246.222 25.5, 5295... 0 1 CSS Area 21.375571
110 C Central London Grid Fitzrovia to Soho Fitzrovia to Soho (SG4) Detailed Design Feasibility Yes 0.365 2023-11-21 00:00:00+00:00 2101 365.025555 LINESTRING Z (529360.759 181642.231 28.4, 5293... 0 1 CSS Area 21.375571
117 C Cycleways C4 to C14 and C10 C4 to C14 and C10 (SG4) Detailed Design Feasibility Yes 0.377 2023-11-21 00:00:00+00:00 2108 377.333957 LINESTRING Z (533001.786 179343.19 0, 532949.3... 0 1 CSS Area 21.375571
118 C Cycleways Waterloo Bridge Waterloo Bridge Complete Complete Yes 0.491 2023-11-21 00:00:00+00:00 2109 490.560628 LINESTRING Z (530690.767 180675.212 0, 530941.... 0 1 CSS Area 21.375571
138 C Cycleways Old Paradise Street Old Paradise Street Complete Complete Yes 0.317 2023-11-21 00:00:00+00:00 2129 316.683268 LINESTRING Z (530822.748 178867.173 0, 530814.... 0 1 CSS Area 21.375571
161 C10 Cycleways Blomsbury to Embankment Blomsbury to Embankment Complete Complete Yes 1.421 2023-11-21 00:00:00+00:00 2152 1420.613731 MULTILINESTRING Z ((530661.767 180658.212 0, 5... 0 1 CSS Area 21.375571
# We can plot the cycle routes and the CCZ zone.
fig, ax = plt.subplots(figsize=(6, 6))
gdf_ccz.plot(ax=ax, edgecolor='black', color='#fff2f2', alpha=1, linewidth=0.6)
gdf_ccz_cycle_routes.plot(ax=ax, edgecolor='grey', color='royalblue', alpha=1, linewidth=2)
ax.set_axis_off()
[Figure: London cycle routes within the CCZ]

Third, we can also perform a spatial join between the London cycle routes and the CCZ to find out which cycle routes intersect the CCZ.

# Perform a spatial join between London cycle routes and CCZ zone.
gdf_ccz_cycle_routes_intersect = gpd.sjoin(gdf_cycle_routes, gdf_ccz, how='inner', predicate='intersects')
# We can observe that there are 33 cycle routes that intersect the CCZ zone.
gdf_ccz_cycle_routes_intersect
LABEL PROGRAMME ROUTE_NAME ROUTE MILESTONE STATUS PUBLIC_ ROUTE_LENGTH_KM PROGRAMME_UPDATED OBJECTID_left Shape__Length geometry index_right OBJECTID_right BOUNDARY Shape_Area
0 C38 Cycleways Finsbury Park to Highbury Fields Islington to Finsbury (SG2) Option Selection Feasibility Yes 0.566 2023-11-21 00:00:00+00:00 1991 565.899062 LINESTRING Z (531184.797 182813.267 0, 531180.... 0 1 CSS Area 21.375571
4 C41 Cycleways Euston to Holborn Euston to Holborn Complete Complete Yes 1.181 2023-11-21 00:00:00+00:00 1995 1180.878708 LINESTRING Z (530257.989 182468.202 0, 530288.... 0 1 CSS Area 21.375571
11 C56 Cycleways C5 to Westmister Bridge C5 to Westmister Bridge Complete Complete Yes 1.196 2023-11-21 00:00:00+00:00 2002 1196.244254 MULTILINESTRING Z ((530954.753 179173.18 0, 53... 0 1 CSS Area 21.375571
15 C6 Cycleways Elephant and Castle to Hampstead Elephant and Castle to Kings Cross Complete Complete Yes 7.944 2023-11-21 00:00:00+00:00 2006 7944.208889 MULTILINESTRING Z ((531908.906 179052.174 0, 5... 0 1 CSS Area 21.375571
21 CS7 Cycleways Merton to The City Merton to The City Complete Complete Yes 12.525 2023-11-21 00:00:00+00:00 2012 12525.201532 MULTILINESTRING Z ((526717.872 170267.511 0, 5... 0 1 CSS Area 21.375571
22 C3 Cycleways Lancaster Gate to Barking Lancaster Gate to Barking Complete Complete Yes 22.425 2023-11-21 00:00:00+00:00 2013 22425.196420 LINESTRING Z (545215.501 183267.379 0, 545183.... 0 1 CSS Area 21.375571
26 C Cycle Superhighways Lancaster Gate to Barking Horse Guards Road Complete Complete Yes 1.336 2023-11-21 00:00:00+00:00 2017 1336.495364 LINESTRING Z (529903.742 179698.188 0, 529903.... 0 1 CSS Area 21.375571
27 C8 Cycleways Wandsworth to Lambeth Bridge Wandsworth to Battersea Park Complete Complete Yes 3.288 2023-11-21 00:00:00+00:00 2018 3284.253577 MULTILINESTRING Z ((530246.431 178963.826 0, 5... 0 1 CSS Area 21.375571
35 C5 Cycleways Waterloo to Clapham Waterloo to Clapham Complete Complete Yes 2.076 2023-11-21 00:00:00+00:00 2026 2076.093601 LINESTRING Z (530519.358 178020.846 0, 530521.... 0 1 CSS Area 21.375571
40 C Cycleways Lambeth Roundabout to P.Square Lambeth Roundabout to P.Square (SG3) Concept Design Feasibility Yes 0.598 2023-11-21 00:00:00+00:00 2031 598.403380 LINESTRING Z (530155.358 179582.367 0, 530160.... 0 1 CSS Area 21.375571
41 C Cycleways Marylbone Road to Oxford Street Marylbone Road to Oxford Street (SG3) Concept Design Feasibility Yes 0.905 2023-11-21 00:00:00+00:00 2032 905.142174 MULTILINESTRING Z ((528934.12 181516.815 0, 52... 0 1 CSS Area 21.375571
42 C11 Cycleways Essex Road to Farringdon The City to Farringdon Complete Complete Yes 1.243 2023-11-21 00:00:00+00:00 2033 1243.120260 LINESTRING Z (532591.297 181938.706 0, 532574.... 0 1 CSS Area 21.375571
49 C Cycleways Waterloo to Clapham C5 to C14 - The Cut_Union Street Complete Complete Yes 2.224 2023-11-21 00:00:00+00:00 2040 2224.308731 MULTILINESTRING Z ((531314.766 179849.195 0, 5... 0 1 CSS Area 21.375571
50 C Cycleways C1 to Liverpool Street C1 to Liverpool Street Complete Complete Yes 0.525 2023-11-21 00:00:00+00:00 2041 524.539812 LINESTRING Z (532944.816 181895.234 0, 533036.... 0 1 CSS Area 21.375571
54 C52 Quietways Covent Garden to Euston Covent Garden to Euston Complete Complete Yes 2.561 2023-11-21 00:00:00+00:00 2045 2560.879009 MULTILINESTRING Z ((530690.767 180675.212 0, 5... 0 1 CSS Area 21.375571
55 C4 Cycleways London Bridge to Rotherhithe Roundabout London Bridge to Rotherhithe Roundabout Complete Complete Yes 2.998 2023-11-21 00:00:00+00:00 2046 2997.526255 MULTILINESTRING Z ((533884.72 179664.948 0, 53... 0 1 CSS Area 21.375571
73 C Cycleways Lambeth Roundabout - North & South Lambeth Roundabout - North & South (SG4) Detailed Design Feasibility Yes 0.519 2023-11-21 00:00:00+00:00 2064 519.098639 LINESTRING Z (530531.437 178939.038 0, 530531.... 0 1 CSS Area 21.375571
77 C1 Cycleways Freezey Water to The City Freezey Water to The City Complete Complete Yes 20.165 2023-11-21 00:00:00+00:00 2068 20164.517337 MULTILINESTRING Z ((533373.843 185167.703 0, 5... 0 1 CSS Area 21.375571
82 C17 Cycleways Elephant and Castle to Dulwich Elephant and Castle to Camberwell Complete Complete Yes 4.002 2023-11-21 00:00:00+00:00 2073 4001.525575 MULTILINESTRING Z ((533054.791 176407.129 0, 5... 0 1 CSS Area 21.375571
83 C14 Cycleways Blackfriars to Rotherhithe Blackfriars to Rotherhithe Complete Complete Yes 6.995 2023-11-21 00:00:00+00:00 2074 6994.926446 LINESTRING Z (536853.163 178583.076 0, 536808.... 0 1 CSS Area 21.375571
103 C Central London Grid Fitzrovia to Soho Fitzrovia to Soho (SG4) Detailed Design Feasibility Yes 0.942 2023-11-21 00:00:00+00:00 2094 942.260282 LINESTRING Z (529477.755 181246.222 25.5, 5295... 0 1 CSS Area 21.375571
109 C51 Central London Grid Marylebone to Kilburn Marylebone to Kilburn (SG4) Detailed Design Feasibility Yes 3.296 2023-11-21 00:00:00+00:00 2100 3266.659774 MULTILINESTRING Z ((525974.729 183317.261 0, 5... 0 1 CSS Area 21.375571
110 C Central London Grid Fitzrovia to Soho Fitzrovia to Soho (SG4) Detailed Design Feasibility Yes 0.365 2023-11-21 00:00:00+00:00 2101 365.025555 LINESTRING Z (529360.759 181642.231 28.4, 5293... 0 1 CSS Area 21.375571
113 C27 Quietways East Acton to Walthamstow East Acton to Walthamstow Complete Complete Yes 8.679 2023-11-21 00:00:00+00:00 2104 8678.992253 MULTILINESTRING Z ((530917.187 182415.036 0, 5... 0 1 CSS Area 21.375571
114 C27 Cycleways East Acton to Walthamstow East Acton to Walthamstow Complete Complete Yes 16.363 2023-11-21 00:00:00+00:00 2105 16363.310394 MULTILINESTRING Z ((525557.685 180543.199 0, 5... 0 1 CSS Area 21.375571
117 C Cycleways C4 to C14 and C10 C4 to C14 and C10 (SG4) Detailed Design Feasibility Yes 0.377 2023-11-21 00:00:00+00:00 2108 377.333957 LINESTRING Z (533001.786 179343.19 0, 532949.3... 0 1 CSS Area 21.375571
118 C Cycleways Waterloo Bridge Waterloo Bridge Complete Complete Yes 0.491 2023-11-21 00:00:00+00:00 2109 490.560628 LINESTRING Z (530690.767 180675.212 0, 530941.... 0 1 CSS Area 21.375571
120 C Cycleways Oval to C5 Oval to C5 Complete Complete Yes 1.032 2023-11-21 00:00:00+00:00 2111 1032.298014 LINESTRING Z (530701.744 178648.168 0, 530746.... 0 1 CSS Area 21.375571
126 C10 Cycleways Waterloo to Greenwich Waterloo to Greenwich Complete Complete Yes 10.543 2023-11-21 00:00:00+00:00 2117 10542.684407 MULTILINESTRING Z ((534882.799 178422.148 0, 5... 0 1 CSS Area 21.375571
138 C Cycleways Old Paradise Street Old Paradise Street Complete Complete Yes 0.317 2023-11-21 00:00:00+00:00 2129 316.683268 LINESTRING Z (530822.748 178867.173 0, 530814.... 0 1 CSS Area 21.375571
139 C11 Cycleways Essex Road to Farringdon Essex Road to The City Complete Complete Yes 2.123 2023-11-21 00:00:00+00:00 2130 2123.492454 LINESTRING Z (531860.953 183765.955 0, 532108.... 0 1 CSS Area 21.375571
145 C13 Cycleways Old Street to London Fields Old Street to London Fields Complete Complete Yes 3.184 2023-11-21 00:00:00+00:00 2136 3184.007679 MULTILINESTRING Z ((532567.063 182352.813 0, 5... 0 1 CSS Area 21.375571
161 C10 Cycleways Blomsbury to Embankment Blomsbury to Embankment Complete Complete Yes 1.421 2023-11-21 00:00:00+00:00 2152 1420.613731 MULTILINESTRING Z ((530661.767 180658.212 0, 5... 0 1 CSS Area 21.375571

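We can observe that some cycle routes intersect the CCZ without being within it. The within predicate requires the whole geometry to lie inside the CCZ, whereas intersects only requires the two geometries to share at least one point. Routes that cross the CCZ boundary, such as long LineStrings like C3 (Lancaster Gate to Barking), and MultiLineStrings whose (possibly disconnected) parts lie partly outside the CCZ are therefore returned by intersects but not by within.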
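The gap between within and intersects can be reproduced with toy routes (hypothetical geometries, not the TfL data): a route fully inside the zone satisfies both predicates, while a single LineString that crosses the boundary and a MultiLineString with one part outside satisfy only intersects. The index difference between the two joins lists exactly these partly-inside routes.

```python
import geopandas as gpd
from shapely.geometry import LineString, MultiLineString, box

zone = gpd.GeoDataFrame({"zone": ["Z"]}, geometry=[box(0, 0, 10, 10)], crs="EPSG:27700")
routes = gpd.GeoDataFrame(
    {"route": ["inside", "crossing", "outside", "multi"]},
    geometry=[
        LineString([(1, 1), (9, 9)]),                               # fully inside
        LineString([(5, 5), (15, 5)]),                              # crosses the boundary
        LineString([(20, 20), (30, 30)]),                           # fully outside
        MultiLineString([[(1, 1), (2, 2)], [(20, 20), (25, 25)]]),  # one part in, one out
    ],
    crs="EPSG:27700",
)

within = gpd.sjoin(routes, zone, how="inner", predicate="within")
intersects = gpd.sjoin(routes, zone, how="inner", predicate="intersects")

# Routes that share points with the zone but are not entirely inside it
partly_inside = routes.loc[intersects.index.difference(within.index), "route"]
print(within["route"].tolist())  # ['inside']
print(partly_inside.tolist())    # ['crossing', 'multi']
```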

# We can plot the cycle routes and the CCZ zone.
fig, ax = plt.subplots(figsize=(6, 6))
gdf_ccz.plot(ax=ax, edgecolor='black', color='#fff2f2', alpha=1, linewidth=0.6)
gdf_ccz_cycle_routes_intersect.plot(ax=ax, edgecolor='grey', color='royalblue', alpha=1, linewidth=1)
ax.set_axis_off()
[Figure: London cycle routes that intersect the CCZ]

Now, we keep only the cycle routes stored as single LineString geometries (the MultiLineString features are dropped) and then run the same spatial join (intersects) with the CCZ.

# Keep only the features whose geometry is a single LineString (MultiLineString features are dropped).
gdf_cycle_routes_s = gdf_cycle_routes[gdf_cycle_routes.geometry.geom_type == 'LineString']
gdf_cycle_routes_s
LABEL PROGRAMME ROUTE_NAME ROUTE MILESTONE STATUS PUBLIC_ ROUTE_LENGTH_KM PROGRAMME_UPDATED OBJECTID Shape__Length geometry
0 C38 Cycleways Finsbury Park to Highbury Fields Islington to Finsbury (SG2) Option Selection Feasibility Yes 0.566 2023-11-21 00:00:00+00:00 1991 565.899062 LINESTRING Z (531184.797 182813.267 0, 531180....
1 C48 Cycleways Brixton to Clapham High Street Brixton to Clapham High Street Complete Complete Yes 1.315 2023-11-21 00:00:00+00:00 1992 1315.182747 LINESTRING Z (531061.018 175496.889 0, 531070....
4 C41 Cycleways Euston to Holborn Euston to Holborn Complete Complete Yes 1.181 2023-11-21 00:00:00+00:00 1995 1180.878708 LINESTRING Z (530257.989 182468.202 0, 530288....
5 C55 Cycleways Lancaster Gate to Hyde Park Corner Lancaster Gate to Hyde Park Corner Complete Complete Yes 0.723 2023-11-21 00:00:00+00:00 1996 723.280344 LINESTRING Z (526927.71 180815.208 0, 526937.7...
7 C49 Cycleways Acton to Chiswick Acton to Chiswick Complete Complete Yes 4.210 2023-11-21 00:00:00+00:00 1998 4209.703771 LINESTRING Z (521097.743 178513.647 0, 521108....
... ... ... ... ... ... ... ... ... ... ... ... ...
160 C50 Cycleways Camden Town to Finsbury Park Camden Town to York Way Complete Complete Yes 1.133 2023-11-21 00:00:00+00:00 2151 1132.913927 LINESTRING Z (529231.783 184130.294 28.3, 5292...
162 C Cycleways Elephant and Castle to Hampstead Castlehaven Grafton Road Complete Complete Yes 1.833 2023-11-21 00:00:00+00:00 2153 1832.819766 LINESTRING Z (527667.882 185680.986 0, 527719....
163 C40 Cycleways Brentford to Twickenham Brentford to Twickenham Complete Complete Yes 4.291 2023-11-21 00:00:00+00:00 2154 4291.196598 LINESTRING Z (516033.206 173764.74 0, 516048.3...
164 C4 Cycleways London Bridge to Rotherhithe Roundabout Rotherhithe Roundabout to Lewisham (SG5) Delivery In Progress Yes 1.335 2023-11-21 00:00:00+00:00 2155 1335.083915 LINESTRING Z (536022.822 178603.139 0, 535966....
165 C39 Cycleways Kensington High St to Shepherds Bush Kensington High St to Shepherds Bush Complete Complete Yes 1.151 2023-11-21 00:00:00+00:00 2156 1151.249368 LINESTRING Z (524569.239 179008.338 0, 524509....

123 rows × 12 columns
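Note that filtering on the geometry type removes every MultiLineString feature rather than extracting its line parts. If the individual parts are needed, GeoDataFrame.explode splits each multi-part feature into one row per part. The sketch below uses a made-up route R1 and a square zone (hypothetical data) to show the difference.

```python
import geopandas as gpd
from shapely.geometry import MultiLineString, box

zone = gpd.GeoDataFrame({"zone": ["Z"]}, geometry=[box(0, 0, 10, 10)], crs="EPSG:27700")
routes = gpd.GeoDataFrame(
    {"route": ["R1"]},
    # One feature made of two disconnected parts: one inside the zone, one outside
    geometry=[MultiLineString([[(1, 1), (4, 4)], [(20, 20), (25, 25)]])],
    crs="EPSG:27700",
)

# Filtering on the geometry type removes R1 entirely
only_lines = routes[routes.geometry.geom_type == "LineString"]
print(len(only_lines))  # 0

# explode turns each part into its own LineString row (attributes are repeated)
parts = routes.explode(ignore_index=True, index_parts=False)
print(parts.geometry.geom_type.tolist())  # ['LineString', 'LineString']

# Now the part inside the zone can be matched on its own
inside_parts = gpd.sjoin(parts, zone, how="inner", predicate="within")
print(inside_parts["route"].tolist())  # ['R1']
```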

# spatial join between the linestring and CCZ zone.
gdf_ccz_cycle_routes_intersect_linestring = gpd.sjoin(gdf_cycle_routes_s, gdf_ccz, how='inner', predicate='intersects')
# We can observe that there are 17 cycle routes (LineString features only) that intersect the CCZ zone.
gdf_ccz_cycle_routes_intersect_linestring
LABEL PROGRAMME ROUTE_NAME ROUTE MILESTONE STATUS PUBLIC_ ROUTE_LENGTH_KM PROGRAMME_UPDATED OBJECTID_left Shape__Length geometry index_right OBJECTID_right BOUNDARY Shape_Area
0 C38 Cycleways Finsbury Park to Highbury Fields Islington to Finsbury (SG2) Option Selection Feasibility Yes 0.566 2023-11-21 00:00:00+00:00 1991 565.899062 LINESTRING Z (531184.797 182813.267 0, 531180.... 0 1 CSS Area 21.375571
4 C41 Cycleways Euston to Holborn Euston to Holborn Complete Complete Yes 1.181 2023-11-21 00:00:00+00:00 1995 1180.878708 LINESTRING Z (530257.989 182468.202 0, 530288.... 0 1 CSS Area 21.375571
22 C3 Cycleways Lancaster Gate to Barking Lancaster Gate to Barking Complete Complete Yes 22.425 2023-11-21 00:00:00+00:00 2013 22425.196420 LINESTRING Z (545215.501 183267.379 0, 545183.... 0 1 CSS Area 21.375571
26 C Cycle Superhighways Lancaster Gate to Barking Horse Guards Road Complete Complete Yes 1.336 2023-11-21 00:00:00+00:00 2017 1336.495364 LINESTRING Z (529903.742 179698.188 0, 529903.... 0 1 CSS Area 21.375571
35 C5 Cycleways Waterloo to Clapham Waterloo to Clapham Complete Complete Yes 2.076 2023-11-21 00:00:00+00:00 2026 2076.093601 LINESTRING Z (530519.358 178020.846 0, 530521.... 0 1 CSS Area 21.375571
40 C Cycleways Lambeth Roundabout to P.Square Lambeth Roundabout to P.Square (SG3) Concept Design Feasibility Yes 0.598 2023-11-21 00:00:00+00:00 2031 598.403380 LINESTRING Z (530155.358 179582.367 0, 530160.... 0 1 CSS Area 21.375571
42 C11 Cycleways Essex Road to Farringdon The City to Farringdon Complete Complete Yes 1.243 2023-11-21 00:00:00+00:00 2033 1243.120260 LINESTRING Z (532591.297 181938.706 0, 532574.... 0 1 CSS Area 21.375571
50 C Cycleways C1 to Liverpool Street C1 to Liverpool Street Complete Complete Yes 0.525 2023-11-21 00:00:00+00:00 2041 524.539812 LINESTRING Z (532944.816 181895.234 0, 533036.... 0 1 CSS Area 21.375571
73 C Cycleways Lambeth Roundabout - North & South Lambeth Roundabout - North & South (SG4) Detailed Design Feasibility Yes 0.519 2023-11-21 00:00:00+00:00 2064 519.098639 LINESTRING Z (530531.437 178939.038 0, 530531.... 0 1 CSS Area 21.375571
83 C14 Cycleways Blackfriars to Rotherhithe Blackfriars to Rotherhithe Complete Complete Yes 6.995 2023-11-21 00:00:00+00:00 2074 6994.926446 LINESTRING Z (536853.163 178583.076 0, 536808.... 0 1 CSS Area 21.375571
103 C Central London Grid Fitzrovia to Soho Fitzrovia to Soho (SG4) Detailed Design Feasibility Yes 0.942 2023-11-21 00:00:00+00:00 2094 942.260282 LINESTRING Z (529477.755 181246.222 25.5, 5295... 0 1 CSS Area 21.375571
110 C Central London Grid Fitzrovia to Soho Fitzrovia to Soho (SG4) Detailed Design Feasibility Yes 0.365 2023-11-21 00:00:00+00:00 2101 365.025555 LINESTRING Z (529360.759 181642.231 28.4, 5293... 0 1 CSS Area 21.375571
117 C Cycleways C4 to C14 and C10 C4 to C14 and C10 (SG4) Detailed Design Feasibility Yes 0.377 2023-11-21 00:00:00+00:00 2108 377.333957 LINESTRING Z (533001.786 179343.19 0, 532949.3... 0 1 CSS Area 21.375571
118 C Cycleways Waterloo Bridge Waterloo Bridge Complete Complete Yes 0.491 2023-11-21 00:00:00+00:00 2109 490.560628 LINESTRING Z (530690.767 180675.212 0, 530941.... 0 1 CSS Area 21.375571
120 C Cycleways Oval to C5 Oval to C5 Complete Complete Yes 1.032 2023-11-21 00:00:00+00:00 2111 1032.298014 LINESTRING Z (530701.744 178648.168 0, 530746.... 0 1 CSS Area 21.375571
138 C Cycleways Old Paradise Street Old Paradise Street Complete Complete Yes 0.317 2023-11-21 00:00:00+00:00 2129 316.683268 LINESTRING Z (530822.748 178867.173 0, 530814.... 0 1 CSS Area 21.375571
139 C11 Cycleways Essex Road to Farringdon Essex Road to The City Complete Complete Yes 2.123 2023-11-21 00:00:00+00:00 2130 2123.492454 LINESTRING Z (531860.953 183765.955 0, 532108.... 0 1 CSS Area 21.375571
# Plot the intersecting cycle routes over the CCZ.
fig, ax = plt.subplots(figsize=(6, 6))
gdf_ccz.plot(ax=ax, edgecolor='black', color='#fff2f2', alpha=1, linewidth=0.6)
gdf_ccz_cycle_routes_intersect_linestring.plot(ax=ax, edgecolor='grey', color='royalblue', alpha=1, linewidth=2)
ax.set_axis_off()
[Figure: cycle routes that intersect the CCZ, plotted over the CCZ]

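Next, we perform a spatial join between the London cycle routes (single LineStrings) and the CCZ with the `crosses` predicate, to find which cycle routes cross the CCZ boundary, i.e. run partly inside and partly outside the zone. A route lying entirely inside the CCZ intersects it but does not cross it, so this result is a subset of the previous one.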
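The difference between the `intersects`, `crosses` and `within` predicates can be seen on toy geometries. This is a minimal sketch using Shapely directly; the square `zone` is a made-up stand-in for the CCZ polygon, and the three lines are hypothetical routes.

```python
from shapely.geometry import LineString, Polygon

# Hypothetical square zone standing in for the CCZ polygon
zone = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])

inside = LineString([(2, 2), (8, 8)])     # lies entirely inside the zone
through = LineString([(-5, 5), (15, 5)])  # enters and leaves the zone
outside = LineString([(20, 0), (30, 0)])  # never touches the zone

for name, line in [("inside", inside), ("through", through), ("outside", outside)]:
    print(name, "intersects:", line.intersects(zone),
          "crosses:", line.crosses(zone), "within:", line.within(zone))
```

Only the `through` line crosses the zone; the `inside` line intersects it and is within it, but does not cross it. `gpd.sjoin` with `predicate='crosses'` applies the same test row by row.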

# Perform a spatial join between London cycle routes and CCZ.
gdf_ccz_cycle_routes_cross = gpd.sjoin(gdf_cycle_routes_s, gdf_ccz, how='inner', predicate='crosses')
# 6 cycle routes cross the CCZ boundary.
gdf_ccz_cycle_routes_cross
LABEL PROGRAMME ROUTE_NAME ROUTE MILESTONE STATUS PUBLIC_ ROUTE_LENGTH_KM PROGRAMME_UPDATED OBJECTID_left Shape__Length geometry index_right OBJECTID_right BOUNDARY Shape_Area
0 C38 Cycleways Finsbury Park to Highbury Fields Islington to Finsbury (SG2) Option Selection Feasibility Yes 0.566 2023-11-21 00:00:00+00:00 1991 565.899062 LINESTRING Z (531184.797 182813.267 0, 531180.... 0 1 CSS Area 21.375571
22 C3 Cycleways Lancaster Gate to Barking Lancaster Gate to Barking Complete Complete Yes 22.425 2023-11-21 00:00:00+00:00 2013 22425.196420 LINESTRING Z (545215.501 183267.379 0, 545183.... 0 1 CSS Area 21.375571
35 C5 Cycleways Waterloo to Clapham Waterloo to Clapham Complete Complete Yes 2.076 2023-11-21 00:00:00+00:00 2026 2076.093601 LINESTRING Z (530519.358 178020.846 0, 530521.... 0 1 CSS Area 21.375571
83 C14 Cycleways Blackfriars to Rotherhithe Blackfriars to Rotherhithe Complete Complete Yes 6.995 2023-11-21 00:00:00+00:00 2074 6994.926446 LINESTRING Z (536853.163 178583.076 0, 536808.... 0 1 CSS Area 21.375571
120 C Cycleways Oval to C5 Oval to C5 Complete Complete Yes 1.032 2023-11-21 00:00:00+00:00 2111 1032.298014 LINESTRING Z (530701.744 178648.168 0, 530746.... 0 1 CSS Area 21.375571
139 C11 Cycleways Essex Road to Farringdon Essex Road to The City Complete Complete Yes 2.123 2023-11-21 00:00:00+00:00 2130 2123.492454 LINESTRING Z (531860.953 183765.955 0, 532108.... 0 1 CSS Area 21.375571
# Plot the crossing cycle routes over the CCZ.
fig, ax = plt.subplots(figsize=(6, 6))
gdf_ccz.plot(ax=ax, edgecolor='black', color='#fff2f2', alpha=1, linewidth=0.6)
gdf_ccz_cycle_routes_cross.plot(ax=ax, edgecolor='grey', color='royalblue', alpha=1, linewidth=2)
ax.set_axis_off()
[Figure: cycle routes that cross the CCZ boundary, plotted over the CCZ]

1.5 An example of spatial and temporal data processing and integration

Task description:

To build a model that predicts crime numbers from house price and population data (X). The case study area is Kingston upon Hull, City of; the spatial unit of analysis is the LSOA and the temporal unit of analysis is the month.

Data sources:

All data come from the Office for National Statistics (ONS) and the UK open police data.

  • England and Wales LSOA boundaries (December 2021): data/Lower_layer_Super_Output_Areas_(December_2021)_Boundaries_EW_BFC_(V10).geojson

  • Humberside Police: data/police_humberside_2024

  • UK LSOA House Price: data/Mean house prices by lower layer super output area- HPSSA dataset 47.csv

  • UK LSOA population: data/Lower layer Super Output Area population estimates 2019-2022.csv

Note: Make sure that the geographic unit identifiers (e.g. LSOA codes or names) in the boundary file and in the CSV file (or any other table) refer to the same boundary version. Geographic units change over time; for example, LSOAs exist in 2011 and 2021 versions, and some codes were added, merged or split between them. Always verify that the versions match before processing the data.
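One quick way to check that two sources use the same version is to compare their sets of codes. The helper below is a hypothetical sketch (`compare_codes` is not part of pandas or GeoPandas), and the example codes are illustrative; `E01099999` is made up.

```python
import pandas as pd


def compare_codes(geo_codes, table_codes):
    """Return identifiers found in only one source, plus the number matched."""
    geo, table = set(geo_codes), set(table_codes)
    return {
        "only_in_geo": sorted(geo - table),
        "only_in_table": sorted(table - geo),
        "n_matched": len(geo & table),
    }


# Illustrative codes only: E01099999 is made up
geo_codes = pd.Series(["E01012756", "E01012757", "E01035472"])
table_codes = pd.Series(["E01012756", "E01012757", "E01099999"])
print(compare_codes(geo_codes, table_codes))
```

With the tutorial's data, a call such as `compare_codes(gdf_lsoa_hull['LSOA21CD'], df_house_price_hull['LSOA code'])` (once both tables are loaded) would list the codes that cannot be joined directly.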

1 Read the LSOA boundaries (GeoJSON)

gdf_lsoa_uk = gpd.read_file('data/Lower_layer_Super_Output_Areas_(December_2021)_Boundaries_EW_BFC_(V10).geojson')
gdf_lsoa_uk.head()
FID LSOA21CD LSOA21NM LSOA21NMW BNG_E BNG_N LAT LONG Shape__Area Shape__Length GlobalID geometry
0 1 E01000001 City of London 001A 532123 181632 51.51817 -0.097150 129865.314476 2635.767993 c625aea8-6d73-4b2a-be76-4d5c44cad9f8 POLYGON ((-0.09665 51.52028, -0.09663 51.52025...
1 2 E01000002 City of London 001B 532480 181715 51.51883 -0.091970 228419.782242 2707.816821 52c878e9-ac68-4886-b4a8-fea9cd241a70 POLYGON ((-0.08967 51.52069, -0.08971 51.52058...
2 3 E01000003 City of London 001C 532239 182033 51.52174 -0.095330 59054.204697 1224.573160 b9d8faca-d489-478d-8ce6-acaf76186d7d POLYGON ((-0.0965 51.52295, -0.09644 51.52282,...
3 4 E01000005 City of London 001E 533581 181283 51.51469 -0.076280 189577.709503 2275.805344 15e1417d-537c-4845-9820-fc7596bd59b0 POLYGON ((-0.07568 51.51575, -0.07539 51.51555...
4 5 E01000006 Barking and Dagenham 016A 544994 184274 51.53875 0.089317 146536.995750 1966.092607 8a6c4ee0-c0ff-4736-9cfa-fb12a6d50da0 POLYGON ((0.09125 51.53905, 0.0915 51.5389, 0....

2 Select the LSOAs in Kingston upon Hull

gdf_lsoa_hull = gdf_lsoa_uk[gdf_lsoa_uk['LSOA21NM'].str.contains('Kingston upon Hull')]
# reproject from longitude/latitude (WGS84) to British National Grid (EPSG:27700), whose units are metres
gdf_lsoa_hull = gdf_lsoa_hull.to_crs(epsg=27700)
print(len(gdf_lsoa_hull))
gdf_lsoa_hull.plot()
168
[Figure: the 168 LSOAs of Kingston upon Hull]
gdf_lsoa_hull
FID LSOA21CD LSOA21NM LSOA21NMW BNG_E BNG_N LAT LONG Shape__Area Shape__Length GlobalID geometry
12115 12116 E01012756 Kingston upon Hull 025A 507367 430316 53.75817 -0.37294 198939.277603 3497.174098 b1f95abf-436c-4f5a-b6e1-93be9e0eb24d POLYGON ((507432.57 430449.315, 507438.172 430...
12116 12117 E01012757 Kingston upon Hull 025B 508017 429519 53.75088 -0.36336 318088.364120 3716.172876 5f00b0d5-45c1-4529-9e08-881c4c58834d POLYGON ((508038.876 430023.773, 508047.19 429...
12117 12118 E01012758 Kingston upon Hull 018A 507706 430320 53.75814 -0.36780 311920.300018 3772.447527 94cd85a6-b982-4038-9625-333de4b818df POLYGON ((507790.751 430455.62, 507795.274 430...
12118 12119 E01012759 Kingston upon Hull 025C 507184 429662 53.75233 -0.37594 398184.936035 3984.903843 1516f69d-ec4e-4e2b-8f3d-81d5f680f631 POLYGON ((507087.666 429995.28, 507087.776 429...
12119 12120 E01012760 Kingston upon Hull 025D 507714 429795 53.75342 -0.36786 126002.444229 2081.849577 2e8144b1-63d2-4100-b2a5-a1fe09759513 POLYGON ((507913.4 429953.245, 507917.247 4299...
... ... ... ... ... ... ... ... ... ... ... ... ...
33462 33463 E01035468 Kingston upon Hull 031I 507220 427868 53.73621 -0.37602 194811.873474 2342.011663 41c0a371-996d-4c62-8888-fb4830b71528 POLYGON ((507208.747 428145.953, 507211.212 42...
33463 33464 E01035469 Kingston upon Hull 034C 509268 435508 53.80442 -0.34228 602229.054047 4797.372241 ef8cf4ea-28e0-44ba-bdad-f74de2c628c2 POLYGON ((509524.189 435505.919, 509535.379 43...
33464 33465 E01035470 Kingston upon Hull 035A 508599 435590 53.80530 -0.35241 512661.485153 3739.146033 f97f0a63-17c4-461a-84e0-f69e41992515 POLYGON ((508961.99 435419.217, 508962.578 435...
33465 33466 E01035471 Kingston upon Hull 035B 508633 435072 53.80064 -0.35207 204771.100311 2848.973680 1dafa6ed-12b8-45cb-a67f-c559fdccfec6 POLYGON ((508775.302 435262.868, 508776.423 43...
33466 33467 E01035472 Kingston upon Hull 035C 508142 434887 53.79908 -0.35959 503061.344757 3672.250000 25c17302-b93b-4752-a042-f2ceadb41dd8 POLYGON ((507929.806 435254.935, 507986.426 43...

168 rows × 12 columns
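Reprojecting to EPSG:27700 puts coordinates in metres, so areas and distances computed later are meaningful. As a sanity check, here is a sketch that assumes the LAT/LONG and BNG_E/BNG_N columns describe the same reference point of each LSOA: transforming the LAT/LONG of E01012756 (from the table above) should land within a few metres of its BNG_E/BNG_N values.

```python
import geopandas as gpd
from shapely.geometry import Point

# LONG/LAT of LSOA E01012756 as listed in the table above (WGS84, EPSG:4326)
pt = gpd.GeoSeries([Point(-0.37294, 53.75817)], crs="EPSG:4326")

# Reproject to British National Grid; coordinates become eastings/northings in metres
pt_bng = pt.to_crs(epsg=27700)
x, y = pt_bng.iloc[0].x, pt_bng.iloc[0].y

# Expected to be close to BNG_E=507367, BNG_N=430316
print(round(x), round(y))
```

Small differences of a few metres are expected, because the result depends on which WGS84 to OSGB36 transformation PROJ uses.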

# reset the index to 0..n-1 (the row labels from the national file are no longer needed)
gdf_lsoa_hull = gdf_lsoa_hull.reset_index(drop=True)
gdf_lsoa_hull
FID LSOA21CD LSOA21NM LSOA21NMW BNG_E BNG_N LAT LONG Shape__Area Shape__Length GlobalID geometry
0 12116 E01012756 Kingston upon Hull 025A 507367 430316 53.75817 -0.37294 198939.277603 3497.174098 b1f95abf-436c-4f5a-b6e1-93be9e0eb24d POLYGON ((507432.57 430449.315, 507438.172 430...
1 12117 E01012757 Kingston upon Hull 025B 508017 429519 53.75088 -0.36336 318088.364120 3716.172876 5f00b0d5-45c1-4529-9e08-881c4c58834d POLYGON ((508038.876 430023.773, 508047.19 429...
2 12118 E01012758 Kingston upon Hull 018A 507706 430320 53.75814 -0.36780 311920.300018 3772.447527 94cd85a6-b982-4038-9625-333de4b818df POLYGON ((507790.751 430455.62, 507795.274 430...
3 12119 E01012759 Kingston upon Hull 025C 507184 429662 53.75233 -0.37594 398184.936035 3984.903843 1516f69d-ec4e-4e2b-8f3d-81d5f680f631 POLYGON ((507087.666 429995.28, 507087.776 429...
4 12120 E01012760 Kingston upon Hull 025D 507714 429795 53.75342 -0.36786 126002.444229 2081.849577 2e8144b1-63d2-4100-b2a5-a1fe09759513 POLYGON ((507913.4 429953.245, 507917.247 4299...
... ... ... ... ... ... ... ... ... ... ... ... ...
163 33463 E01035468 Kingston upon Hull 031I 507220 427868 53.73621 -0.37602 194811.873474 2342.011663 41c0a371-996d-4c62-8888-fb4830b71528 POLYGON ((507208.747 428145.953, 507211.212 42...
164 33464 E01035469 Kingston upon Hull 034C 509268 435508 53.80442 -0.34228 602229.054047 4797.372241 ef8cf4ea-28e0-44ba-bdad-f74de2c628c2 POLYGON ((509524.189 435505.919, 509535.379 43...
165 33465 E01035470 Kingston upon Hull 035A 508599 435590 53.80530 -0.35241 512661.485153 3739.146033 f97f0a63-17c4-461a-84e0-f69e41992515 POLYGON ((508961.99 435419.217, 508962.578 435...
166 33466 E01035471 Kingston upon Hull 035B 508633 435072 53.80064 -0.35207 204771.100311 2848.973680 1dafa6ed-12b8-45cb-a67f-c559fdccfec6 POLYGON ((508775.302 435262.868, 508776.423 43...
167 33467 E01035472 Kingston upon Hull 035C 508142 434887 53.79908 -0.35959 503061.344757 3672.250000 25c17302-b93b-4752-a042-f2ceadb41dd8 POLYGON ((507929.806 435254.935, 507986.426 43...

168 rows × 12 columns

3 Read Police data

import glob
# collect the monthly street-level CSV files from all sub-folders
csv_files = glob.glob('data/police_humberside_2024/**/*.csv', recursive=True)
print(csv_files)
['data/police_humberside_2024/2024-09/2024-09-humberside-street.csv', 'data/police_humberside_2024/2024-07/2024-07-humberside-street.csv', 'data/police_humberside_2024/2024-06/2024-06-humberside-street.csv', 'data/police_humberside_2024/2024-01/2024-01-humberside-street.csv', 'data/police_humberside_2024/2024-08/2024-08-humberside-street.csv', 'data/police_humberside_2024/2024-12/2024-12-humberside-street.csv', 'data/police_humberside_2024/2024-04/2024-04-humberside-street.csv', 'data/police_humberside_2024/2024-03/2024-03-humberside-street.csv', 'data/police_humberside_2024/2024-02/2024-02-humberside-street.csv', 'data/police_humberside_2024/2024-05/2024-05-humberside-street.csv', 'data/police_humberside_2024/2024-11/2024-11-humberside-street.csv', 'data/police_humberside_2024/2024-10/2024-10-humberside-street.csv']
# stack the monthly files into one DataFrame (the 'Month' column keeps the time information)
df_crime_hu_2024 = pd.concat([pd.read_csv(f) for f in csv_files])
df_crime_hu_2024
Crime ID Month Reported by Falls within Longitude Latitude Location LSOA code LSOA name Crime type Last outcome category Context
0 26945ea720972254fe4c2f0e2ccc59c49d7354ef2fbdf1... 2024-09 Humberside Police Humberside Police -0.949067 53.604417 On or near St Georges Road E01007641 Doncaster 003F Violence and sexual offences Status update unavailable NaN
1 9bf8204c7cad6f6b5e4290fe564470d06deb29edfcf954... 2024-09 Humberside Police Humberside Police -1.033626 53.642072 On or near Kirk Lane E01007625 Doncaster 004A Vehicle crime Investigation complete; no suspect identified NaN
2 0f2144b1195904783aa3159185c77a3d16550a13625c7f... 2024-09 Humberside Police Humberside Police -1.034449 53.600066 On or near West End E01007626 Doncaster 004B Burglary Awaiting court outcome NaN
3 41192c91566add280cbf766e49dc4ae3e1d31f89274005... 2024-09 Humberside Police Humberside Police -1.105560 53.511434 On or near Lake View E01034240 Doncaster 027E Violence and sexual offences Action to be taken by another organisation NaN
4 e505ec90b4dc6aa04bfe3d3fe3fd661f38efe986049f26... 2024-09 Humberside Police Humberside Police -1.222401 53.479055 On or near Sheldon Avenue E01007537 Doncaster 035A Violence and sexual offences Status update unavailable NaN
... ... ... ... ... ... ... ... ... ... ... ... ...
8118 1ac181333df8d1877ea00ffca5e89aabf4e0bb3ed2d416... 2024-10 Humberside Police Humberside Police -1.141953 53.699111 On or near High Eggborough Lane E01027890 Selby 010B Drugs Status update unavailable NaN
8119 c9ff9608cf5bdb6d2268b2b774ffaebe398ca0503e4933... 2024-10 Humberside Police Humberside Police -1.103616 53.684144 On or near Long Lane E01027924 Selby 010D Violence and sexual offences Status update unavailable NaN
8120 eeb33d2af2d411d82ea5247bd8c51892c6aaaeb92af6b2... 2024-10 Humberside Police Humberside Police -1.103616 53.684144 On or near Long Lane E01027924 Selby 010D Violence and sexual offences Status update unavailable NaN
8121 224054ca4eb68817bad2bdb0ac0405825468da1409fb33... 2024-10 Humberside Police Humberside Police -0.752364 53.390423 On or near Riby Close E01026375 West Lindsey 006B Violence and sexual offences Unable to prosecute suspect NaN
8122 1183a22df9f1921bfdf8564b974c4d3b198dd93426b306... 2024-10 Humberside Police Humberside Police -1.053831 53.977489 On or near New Lane E01013409 York 005C Violence and sexual offences Status update unavailable NaN

98603 rows × 12 columns
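Two details of the step above are visible in the output. `glob` returns the files in arbitrary order (2024-09 comes first), and `pd.concat` keeps each file's own index, so labels such as 0 or 8122 repeat. Here is a minimal sketch on made-up files, using hypothetical folder names that mimic `data/police_humberside_2024/<YYYY-MM>/`. It sorts the paths and passes `ignore_index=True` to get a unique 0..n-1 index.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as root:
    # Hypothetical monthly folders, each with one small street-level CSV
    for month in ["2024-02", "2024-01"]:
        os.makedirs(os.path.join(root, month))
        pd.DataFrame({"Month": [month, month], "Crime type": ["Drugs", "Burglary"]}).to_csv(
            os.path.join(root, month, f"{month}-humberside-street.csv"), index=False)

    # sorted() fixes the file order; ignore_index=True builds a fresh unique index
    files = sorted(glob.glob(os.path.join(root, "**", "*.csv"), recursive=True))
    df_all = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)

print(df_all)
```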

4 Use a spatial join to link each crime point to a 2021 LSOA (we do not use the 'LSOA code' column of the crime data because it follows the 2011 LSOA version)

# create the gdf from the df_crime_hu_2024 with x and y coordinates
gdf_crime_hu = gpd.GeoDataFrame(df_crime_hu_2024, geometry=gpd.points_from_xy(df_crime_hu_2024['Longitude'],
                                                                              df_crime_hu_2024['Latitude']), crs='EPSG:4326')
# transform the CRS to EPSG:27700 (British National Grid)
gdf_crime_hu = gdf_crime_hu.to_crs(epsg=27700)
gdf_crime_hu
Crime ID Month Reported by Falls within Longitude Latitude Location LSOA code LSOA name Crime type Last outcome category Context geometry
0 26945ea720972254fe4c2f0e2ccc59c49d7354ef2fbdf1... 2024-09 Humberside Police Humberside Police -0.949067 53.604417 On or near St Georges Road E01007641 Doncaster 003F Violence and sexual offences Status update unavailable NaN POINT (469638.021 412497.03)
1 9bf8204c7cad6f6b5e4290fe564470d06deb29edfcf954... 2024-09 Humberside Police Humberside Police -1.033626 53.642072 On or near Kirk Lane E01007625 Doncaster 004A Vehicle crime Investigation complete; no suspect identified NaN POINT (463986.014 416606.954)
2 0f2144b1195904783aa3159185c77a3d16550a13625c7f... 2024-09 Humberside Police Humberside Police -1.034449 53.600066 On or near West End E01007626 Doncaster 004B Burglary Awaiting court outcome NaN POINT (463994.999 411932.963)
3 41192c91566add280cbf766e49dc4ae3e1d31f89274005... 2024-09 Humberside Police Humberside Police -1.105560 53.511434 On or near Lake View E01034240 Doncaster 027E Violence and sexual offences Action to be taken by another organisation NaN POINT (459412.986 402011.051)
4 e505ec90b4dc6aa04bfe3d3fe3fd661f38efe986049f26... 2024-09 Humberside Police Humberside Police -1.222401 53.479055 On or near Sheldon Avenue E01007537 Doncaster 035A Violence and sexual offences Status update unavailable NaN POINT (451704.012 398317.971)
... ... ... ... ... ... ... ... ... ... ... ... ... ...
8118 1ac181333df8d1877ea00ffca5e89aabf4e0bb3ed2d416... 2024-10 Humberside Police Humberside Police -1.141953 53.699111 On or near High Eggborough Lane E01027890 Selby 010B Drugs Status update unavailable NaN POINT (456747.97 422860.963)
8119 c9ff9608cf5bdb6d2268b2b774ffaebe398ca0503e4933... 2024-10 Humberside Police Humberside Police -1.103616 53.684144 On or near Long Lane E01027924 Selby 010D Violence and sexual offences Status update unavailable NaN POINT (459300.004 421227.048)
8120 eeb33d2af2d411d82ea5247bd8c51892c6aaaeb92af6b2... 2024-10 Humberside Police Humberside Police -1.103616 53.684144 On or near Long Lane E01027924 Selby 010D Violence and sexual offences Status update unavailable NaN POINT (459300.004 421227.048)
8121 224054ca4eb68817bad2bdb0ac0405825468da1409fb33... 2024-10 Humberside Police Humberside Police -0.752364 53.390423 On or near Riby Close E01026375 West Lindsey 006B Violence and sexual offences Unable to prosecute suspect NaN POINT (483070.027 388901.029)
8122 1183a22df9f1921bfdf8564b974c4d3b198dd93426b306... 2024-10 Humberside Police Humberside Police -1.053831 53.977489 On or near New Lane E01013409 York 005C Violence and sexual offences Status update unavailable NaN POINT (462153.013 453906.043)

98603 rows × 13 columns

gdf_crime_hu.plot()
[Figure: locations of all crimes recorded by Humberside Police in 2024]

# spatial join gdf_crime_hu with gdf_lsoa_hull to assign a 2021 LSOA to each crime incident (the 'LSOA code' column is not used);
# the inner join also drops crimes outside Kingston upon Hull (98,603 -> 37,931 rows)
gdf_crime_hull_lsoa = gpd.sjoin(gdf_crime_hu, gdf_lsoa_hull, how='inner', predicate='within')
gdf_crime_hull_lsoa
Crime ID Month Reported by Falls within Longitude Latitude Location LSOA code LSOA name Crime type ... LSOA21CD LSOA21NM LSOA21NMW BNG_E BNG_N LAT LONG Shape__Area Shape__Length GlobalID
1733 NaN 2024-09 Humberside Police Humberside Police -0.325162 53.797160 On or near Abingdon Garth E01012782 Kingston upon Hull 002A Anti-social behaviour ... E01012782 Kingston upon Hull 002A 510564 434676 53.79667 -0.32291 295014.267838 2717.829437 4bac5250-9916-4da1-87ac-00c02f2d47b4
1734 e725eb4dd48c94eb57a79eb058ec04b3e08cf3568b7a13... 2024-09 Humberside Police Humberside Police -0.321930 53.797475 On or near Cosford Garth E01012782 Kingston upon Hull 002A Public order ... E01012782 Kingston upon Hull 002A 510564 434676 53.79667 -0.32291 295014.267838 2717.829437 4bac5250-9916-4da1-87ac-00c02f2d47b4
1735 677eb12804c1fcbb51009676b156185860784be90cac2f... 2024-09 Humberside Police Humberside Police -0.325162 53.797160 On or near Abingdon Garth E01012782 Kingston upon Hull 002A Vehicle crime ... E01012782 Kingston upon Hull 002A 510564 434676 53.79667 -0.32291 295014.267838 2717.829437 4bac5250-9916-4da1-87ac-00c02f2d47b4
1736 59a88431621edfdd7a00fc76483a5681048e544aeb8039... 2024-09 Humberside Police Humberside Police -0.325162 53.797160 On or near Abingdon Garth E01012782 Kingston upon Hull 002A Violence and sexual offences ... E01012782 Kingston upon Hull 002A 510564 434676 53.79667 -0.32291 295014.267838 2717.829437 4bac5250-9916-4da1-87ac-00c02f2d47b4
1737 f5495044a3393692da9d87a230fd1e2fc71fc0f957d0d3... 2024-09 Humberside Police Humberside Police -0.325162 53.797160 On or near Abingdon Garth E01012782 Kingston upon Hull 002A Violence and sexual offences ... E01012782 Kingston upon Hull 002A 510564 434676 53.79667 -0.32291 295014.267838 2717.829437 4bac5250-9916-4da1-87ac-00c02f2d47b4
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4865 3ef9fd9a74ab40e880e71733eaa2de6432b1139e97d465... 2024-10 Humberside Police Humberside Police -0.355095 53.793485 On or near Raich Carter Way E01033107 Kingston upon Hull 035E Shoplifting ... E01033107 Kingston upon Hull 035E 508618 434547 53.79592 -0.35249 392191.723434 3331.231133 f698240e-05d9-4a6a-8c12-ba6f537ddef1
4866 6dad4e91318a8e31e8f2a160afe6e86c67167fc75b20ce... 2024-10 Humberside Police Humberside Police -0.350433 53.795452 On or near Runnymede Way E01033107 Kingston upon Hull 035E Shoplifting ... E01033107 Kingston upon Hull 035E 508618 434547 53.79592 -0.35249 392191.723434 3331.231133 f698240e-05d9-4a6a-8c12-ba6f537ddef1
4867 a099208f43ab24f8795aef51aa0c7f1ea1c38103c01373... 2024-10 Humberside Police Humberside Police -0.350433 53.795452 On or near Runnymede Way E01033107 Kingston upon Hull 035E Shoplifting ... E01033107 Kingston upon Hull 035E 508618 434547 53.79592 -0.35249 392191.723434 3331.231133 f698240e-05d9-4a6a-8c12-ba6f537ddef1
4868 b19c91361ef09463e19f83ec253487ab7dcb78bcdd9e3c... 2024-10 Humberside Police Humberside Police -0.348154 53.798576 On or near Halecroft Park E01033107 Kingston upon Hull 035E Vehicle crime ... E01033107 Kingston upon Hull 035E 508618 434547 53.79592 -0.35249 392191.723434 3331.231133 f698240e-05d9-4a6a-8c12-ba6f537ddef1
4869 4c464b1fef6838cc3c986b0ad24efaa4a754a3bd0390c7... 2024-10 Humberside Police Humberside Police -0.353396 53.796940 On or near Knightley Way E01033107 Kingston upon Hull 035E Violence and sexual offences ... E01033107 Kingston upon Hull 035E 508618 434547 53.79592 -0.35249 392191.723434 3331.231133 f698240e-05d9-4a6a-8c12-ba6f537ddef1

37931 rows × 25 columns
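The behaviour of `how='inner'` with `predicate='within'` can be checked on a toy example. This is a sketch with two made-up square "LSOAs" and three made-up crime points. The point outside both squares is dropped, as the crimes outside Hull are dropped above.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two hypothetical square zones in a projected CRS (metres)
zones = gpd.GeoDataFrame({"LSOA21CD": ["A", "B"]},
                         geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
                         crs="EPSG:27700")

# Three hypothetical crime points; the last one lies outside both zones
points = gpd.GeoDataFrame({"Crime type": ["Drugs", "Burglary", "Robbery"]},
                          geometry=[Point(2, 3), Point(15, 5), Point(50, 50)],
                          crs="EPSG:27700")

# how='inner' keeps only points that fall within some zone
joined = gpd.sjoin(points, zones, how="inner", predicate="within")
print(joined[["Crime type", "LSOA21CD"]])
```

A point lying exactly on a shared boundary (e.g. x = 10 here) is not `within` either polygon and would be dropped, whereas `predicate='intersects'` would match it to both.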

gdf_crime_hull_lsoa.plot()
[Figure: crime locations within the Kingston upon Hull LSOAs]

5 Aggregate in space and time: count the crimes in each LSOA and month

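# count crimes per LSOA and month; 'count' skips missing values, so records without a Crime ID
# (e.g. the anti-social behaviour rows above) are excluded, and LSOA-months with no crime do not appear
df_crime_hull_agg = gdf_crime_hull_lsoa.groupby(['LSOA21CD', 'Month']).agg({'Crime ID': 'count'}).reset_index().rename(columns={'Crime ID': 'Numbers'})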
df_crime_hull_agg
LSOA21CD Month Numbers
0 E01012756 2024-01 3
1 E01012756 2024-02 11
2 E01012756 2024-03 15
3 E01012756 2024-04 10
4 E01012756 2024-05 12
... ... ... ...
1997 E01035472 2024-07 4
1998 E01035472 2024-09 4
1999 E01035472 2024-10 1
2000 E01035472 2024-11 3
2001 E01035472 2024-12 10

2002 rows × 3 columns
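Whether to count every record or only those with a Crime ID is a modelling choice. The sketch below uses made-up records to contrast `count`, which skips missing values, with `size`, which counts every row (including, in the police data, anti-social behaviour records).

```python
import numpy as np
import pandas as pd

# Made-up records: the second one has no Crime ID, like an anti-social behaviour record
df = pd.DataFrame({
    "LSOA21CD": ["E01", "E01", "E02"],
    "Month": ["2024-01", "2024-01", "2024-01"],
    "Crime ID": ["a1", np.nan, "b1"],
})

by_count = df.groupby(["LSOA21CD", "Month"])["Crime ID"].count()  # non-missing IDs only
by_size = df.groupby(["LSOA21CD", "Month"]).size()                # every row
print(by_count.tolist(), by_size.tolist())
```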

6 Merge the LSOA geometries with the crime counts of a selected month (needed for mapping with GeoPandas)

# select April 2024; the left join keeps every LSOA (those without crime that month get NaN)
gdf_crime_hull_agg_Apr = gdf_lsoa_hull.merge(df_crime_hull_agg[df_crime_hull_agg.Month == '2024-04'], on='LSOA21CD', how='left')
gdf_crime_hull_agg_Apr
FID LSOA21CD LSOA21NM LSOA21NMW BNG_E BNG_N LAT LONG Shape__Area Shape__Length GlobalID geometry Month Numbers
0 12116 E01012756 Kingston upon Hull 025A 507367 430316 53.75817 -0.37294 198939.277603 3497.174098 b1f95abf-436c-4f5a-b6e1-93be9e0eb24d POLYGON ((507432.57 430449.315, 507438.172 430... 2024-04 10.0
1 12117 E01012757 Kingston upon Hull 025B 508017 429519 53.75088 -0.36336 318088.364120 3716.172876 5f00b0d5-45c1-4529-9e08-881c4c58834d POLYGON ((508038.876 430023.773, 508047.19 429... 2024-04 24.0
2 12118 E01012758 Kingston upon Hull 018A 507706 430320 53.75814 -0.36780 311920.300018 3772.447527 94cd85a6-b982-4038-9625-333de4b818df POLYGON ((507790.751 430455.62, 507795.274 430... 2024-04 6.0
3 12119 E01012759 Kingston upon Hull 025C 507184 429662 53.75233 -0.37594 398184.936035 3984.903843 1516f69d-ec4e-4e2b-8f3d-81d5f680f631 POLYGON ((507087.666 429995.28, 507087.776 429... 2024-04 18.0
4 12120 E01012760 Kingston upon Hull 025D 507714 429795 53.75342 -0.36786 126002.444229 2081.849577 2e8144b1-63d2-4100-b2a5-a1fe09759513 POLYGON ((507913.4 429953.245, 507917.247 4299... 2024-04 9.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
163 33463 E01035468 Kingston upon Hull 031I 507220 427868 53.73621 -0.37602 194811.873474 2342.011663 41c0a371-996d-4c62-8888-fb4830b71528 POLYGON ((507208.747 428145.953, 507211.212 42... 2024-04 28.0
164 33464 E01035469 Kingston upon Hull 034C 509268 435508 53.80442 -0.34228 602229.054047 4797.372241 ef8cf4ea-28e0-44ba-bdad-f74de2c628c2 POLYGON ((509524.189 435505.919, 509535.379 43... 2024-04 1.0
165 33465 E01035470 Kingston upon Hull 035A 508599 435590 53.80530 -0.35241 512661.485153 3739.146033 f97f0a63-17c4-461a-84e0-f69e41992515 POLYGON ((508961.99 435419.217, 508962.578 435... 2024-04 6.0
166 33466 E01035471 Kingston upon Hull 035B 508633 435072 53.80064 -0.35207 204771.100311 2848.973680 1dafa6ed-12b8-45cb-a67f-c559fdccfec6 POLYGON ((508775.302 435262.868, 508776.423 43... NaN NaN
167 33467 E01035472 Kingston upon Hull 035C 508142 434887 53.79908 -0.35959 503061.344757 3672.250000 25c17302-b93b-4752-a042-f2ceadb41dd8 POLYGON ((507929.806 435254.935, 507986.426 43... 2024-04 9.0

168 rows × 14 columns
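LSOA-months without any recorded crime are missing from `df_crime_hull_agg` (2,002 rows rather than 168 × 12 = 2,016). That is why E01035471 has NaN for April in the merged table above. For a prediction model these are usually true zeros. Here is a minimal sketch with made-up counts that builds the full LSOA × month panel and fills the gaps with 0. In the tutorial, `lsoas` would presumably come from `gdf_lsoa_hull['LSOA21CD']` and `months` would be the 12 months of 2024.

```python
import pandas as pd

# Made-up aggregated counts; E02 has no row for 2024-02
df_agg = pd.DataFrame({
    "LSOA21CD": ["E01", "E01", "E02"],
    "Month": ["2024-01", "2024-02", "2024-01"],
    "Numbers": [3, 11, 5],
})
lsoas = ["E01", "E02"]
months = ["2024-01", "2024-02"]

# Every LSOA-month combination, with missing combinations filled with 0
full_index = pd.MultiIndex.from_product([lsoas, months], names=["LSOA21CD", "Month"])
df_panel = (df_agg.set_index(["LSOA21CD", "Month"])
            .reindex(full_index, fill_value=0)
            .reset_index())
print(df_panel)
```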

fig, ax = plt.subplots(figsize=(8, 8))
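# plot the crime data; LSOAs with no recorded crime (NaN) are drawn in light grey instead of being left out
gdf_crime_hull_agg_Apr.plot(ax=ax, column='Numbers', cmap='Reds', edgecolor='black', linewidth=0.05, legend=True, missing_kwds={'color': 'lightgrey'})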
ax.set_title('Crime Numbers in Kingston upon Hull LSOAs (Apr 2024)')
ax.axis('off')
plt.show()
[Figure: Crime Numbers in Kingston upon Hull LSOAs (Apr 2024)]

7 Read house price data

# read the csv file
df_house_price = pd.read_csv('data/Mean house prices by lower layer super output area- HPSSA dataset 47.csv')
df_house_price_hull = df_house_price[df_house_price['Local authority name'].str.contains('Hull')]
df_house_price_hull
Local authority code Local authority name LSOA code LSOA name Year ending Dec 1995 Year ending Mar 1996 Year ending Jun 1996 Year ending Sep 1996 Year ending Dec 1996 Year ending Mar 1997 ... Year ending Sep 2020 Year ending Dec 2020 Year ending Mar 2021 Year ending Jun 2021 Year ending Sep 2021 Year ending Dec 2021 Year ending Mar 2022 Year ending Jun 2022 Year ending Sep 2022 Year ending Dec 2022
808 E06000010 Kingston upon Hull, City of E01012756 Kingston upon Hull 025A 40,920 41,652 40,819 42,748 39,059 39,567 ... 135,647 151,126 159,225 152,954 146,913 148,260 143,033 141,745 148,008 132,769
809 E06000010 Kingston upon Hull, City of E01012757 Kingston upon Hull 025B 31,376 29,795 30,367 30,543 30,956 32,781 ... 132,178 123,194 127,827 135,008 141,380 148,043 135,320 139,061 130,330 137,602
810 E06000010 Kingston upon Hull, City of E01012758 Kingston upon Hull 018A 49,324 46,195 43,328 51,442 60,738 64,053 ... 224,715 238,168 230,783 239,213 249,940 238,605 246,214 240,353 252,263 275,177
811 E06000010 Kingston upon Hull, City of E01012759 Kingston upon Hull 025C 29,197 32,087 32,336 30,748 30,246 27,783 ... 110,625 114,500 106,262 104,782 106,537 115,615 104,722 106,965 108,325 104,977
812 E06000010 Kingston upon Hull, City of E01012760 Kingston upon Hull 025D 28,461 28,470 27,965 28,415 28,411 28,692 ... 94,397 93,843 92,871 95,963 98,633 102,472 108,855 114,807 121,153 124,452
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
969 E06000010 Kingston upon Hull, City of E01033106 Kingston upon Hull 001G : : : : : : ... 166,842 174,737 175,191 183,083 182,017 182,152 183,913 181,918 185,428 190,369
970 E06000010 Kingston upon Hull, City of E01033107 Kingston upon Hull 001H : : : : : : ... 173,306 190,222 176,182 173,658 187,520 192,960 200,920 207,833 213,442 210,155
971 E06000010 Kingston upon Hull, City of E01033108 Kingston upon Hull 001I 44,934 44,546 44,415 45,978 44,602 45,403 ... 213,374 210,026 207,826 206,988 194,688 196,092 195,953 192,474 193,118 191,822
972 E06000010 Kingston upon Hull, City of E01033109 Kingston upon Hull 029F 29,343 29,512 31,809 28,661 27,863 28,073 ... 89,854 102,372 115,406 112,238 112,042 109,554 88,679 92,357 92,115 97,469
973 E06000010 Kingston upon Hull, City of E01033110 Kingston upon Hull 031G 22,486 21,764 21,045 20,791 19,715 20,092 ... 141,950 139,057 135,223 139,014 136,958 136,026 136,205 136,937 131,466 138,136

166 rows × 113 columns

8 Merge the LSOA-level monthly crime counts with the house price data

df_crime_hull_agg_hp = pd.merge(df_crime_hull_agg, df_house_price_hull[['LSOA code', 'Year ending Dec 2022']], left_on='LSOA21CD', right_on='LSOA code', how='left')
df_crime_hull_agg_hp
LSOA21CD Month Numbers LSOA code Year ending Dec 2022
0 E01012756 2024-01 3 E01012756 132,769
1 E01012756 2024-02 11 E01012756 132,769
2 E01012756 2024-03 15 E01012756 132,769
3 E01012756 2024-04 10 E01012756 132,769
4 E01012756 2024-05 12 E01012756 132,769
... ... ... ... ... ...
1997 E01035472 2024-07 4 NaN NaN
1998 E01035472 2024-09 4 NaN NaN
1999 E01035472 2024-10 1 NaN NaN
2000 E01035472 2024-11 3 NaN NaN
2001 E01035472 2024-12 10 NaN NaN

2002 rows × 5 columns
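A left merge keeps every crime row. Wherever an LSOA code has no match in the house price table, it fills the house price columns with NaN, as in the last rows above. The sketch below uses small made-up frames (the values are illustrative, not taken from the real files) to show two `pd.merge` options that help diagnose this. `indicator=True` adds a `_merge` column that flags unmatched rows, and `validate='many_to_one'` raises an error if the price table holds more than one row per LSOA. If many rows come back unmatched, it may be worth checking whether both tables use the same set of LSOA codes.

```python
import pandas as pd

# Toy stand-ins for df_crime_hull_agg and df_house_price_hull (illustrative values only)
crime = pd.DataFrame({
    "LSOA21CD": ["E01012756", "E01012756", "E01035472"],
    "Month": ["2024-01", "2024-02", "2024-01"],
    "Numbers": [3, 11, 4],
})
prices = pd.DataFrame({
    "LSOA code": ["E01012756", "E01012757"],
    "Year ending Dec 2022": ["132,769", "137,602"],
})

# how='left' keeps every crime row; indicator=True adds a '_merge' column;
# validate='many_to_one' fails if an LSOA appears more than once in `prices`
merged = pd.merge(
    crime, prices,
    left_on="LSOA21CD", right_on="LSOA code",
    how="left", indicator=True, validate="many_to_one",
)

# Rows whose LSOA has no house price record are flagged 'left_only'
unmatched = merged.loc[merged["_merge"] == "left_only", "LSOA21CD"].unique()
print(merged)
print("LSOAs without a house price:", list(unmatched))
```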

df_crime_hull_agg_hp['Year ending Dec 2022'].values[0]
'132,769'
# 'Year ending Dec 2022' is stored as text (e.g. '132,769'); it may contain ':' for unavailable
# values, and it is NaN for LSOAs with no match in the house price table. Convert it to float.
df_crime_hull_agg_hp['Year ending Dec 2022'] = df_crime_hull_agg_hp['Year ending Dec 2022'].fillna('')
# remove the ':' markers and the thousands separators
df_crime_hull_agg_hp['Year ending Dec 2022'] = df_crime_hull_agg_hp['Year ending Dec 2022'].replace({':': '', ',': ''}, regex=True)
# empty strings (formerly NaN or ':') become 0, a placeholder for a missing price rather than a real price
df_crime_hull_agg_hp['Year ending Dec 2022'] = df_crime_hull_agg_hp['Year ending Dec 2022'].replace('', 0)
df_crime_hull_agg_hp['Year ending Dec 2022'] = df_crime_hull_agg_hp['Year ending Dec 2022'].astype(float)
df_crime_hull_agg_hp
LSOA21CD Month Numbers LSOA code Year ending Dec 2022
0 E01012756 2024-01 3 E01012756 132769.0
1 E01012756 2024-02 11 E01012756 132769.0
2 E01012756 2024-03 15 E01012756 132769.0
3 E01012756 2024-04 10 E01012756 132769.0
4 E01012756 2024-05 12 E01012756 132769.0
... ... ... ... ... ...
1997 E01035472 2024-07 4 NaN 0.0
1998 E01035472 2024-09 4 NaN 0.0
1999 E01035472 2024-10 1 NaN 0.0
2000 E01035472 2024-11 3 NaN 0.0
2001 E01035472 2024-12 10 NaN 0.0

2002 rows × 5 columns
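The cleaning above turns missing and ':' prices into 0. Because 0 is a placeholder rather than a real price, any average taken over this column will be pulled down. A minimal alternative, sketched here on a made-up four-value column, is `pd.to_numeric(..., errors='coerce')`, which turns anything unparsable into NaN. pandas statistics such as `mean()` then skip those rows by default.

```python
import numpy as np
import pandas as pd

# Made-up values in the same format as the 'Year ending Dec 2022' column
raw = pd.Series(["132,769", "137,602", ":", np.nan])

# Remove thousands separators; ':' and NaN both end up as NaN via errors='coerce'
prices_nan = pd.to_numeric(raw.str.replace(",", "", regex=False), errors="coerce")

# The approach used above: missing values replaced by 0
prices_zero = prices_nan.fillna(0)

print(prices_nan.mean())   # NaN rows are skipped
print(prices_zero.mean())  # zeros drag the average down
```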

df_crime_hull_agg_hp = df_crime_hull_agg_hp.rename(columns={'Year ending Dec 2022': 'House Price 2022'})
df_crime_hull_agg_hp
LSOA21CD Month Numbers LSOA code House Price 2022
0 E01012756 2024-01 3 E01012756 132769.0
1 E01012756 2024-02 11 E01012756 132769.0
2 E01012756 2024-03 15 E01012756 132769.0
3 E01012756 2024-04 10 E01012756 132769.0
4 E01012756 2024-05 12 E01012756 132769.0
... ... ... ... ... ...
1997 E01035472 2024-07 4 NaN 0.0
1998 E01035472 2024-09 4 NaN 0.0
1999 E01035472 2024-10 1 NaN 0.0
2000 E01035472 2024-11 3 NaN 0.0
2001 E01035472 2024-12 10 NaN 0.0

2002 rows × 5 columns

9 Read the population data and merge it with the crime data

# read the population data
df_population = pd.read_csv('data/Lower layer Super Output Area population estimates 2019-2022.csv')
df_population.head()
LAD 2021 Code LAD 2021 Name LSOA 2021 Code LSOA 2021 Name Total F0 F1 F2 F3 F4 ... M81 M82 M83 M84 M85 M86 M87 M88 M89 M90
0 E06000001 Hartlepool E01011949 Hartlepool 009A 1,870 15 3 11 13 6 ... 6 6 3 3 3 3 2 1 3 1
1 E06000001 Hartlepool E01011950 Hartlepool 008A 1,097 6 5 8 8 5 ... 1 1 1 2 0 0 0 0 0 0
2 E06000001 Hartlepool E01011951 Hartlepool 007A 1,241 8 7 5 8 3 ... 1 1 3 1 2 2 1 0 1 0
3 E06000001 Hartlepool E01011952 Hartlepool 002A 1,615 13 11 17 15 15 ... 0 3 1 2 3 3 4 3 0 13
4 E06000001 Hartlepool E01011953 Hartlepool 002B 1,982 9 12 18 11 13 ... 3 4 3 2 0 1 0 0 1 4

5 rows × 187 columns
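Counts such as `Total` are read in as strings like '1,870' because of the thousands separator, which is why they have to be cleaned after merging. `pd.read_csv` accepts a `thousands=','` argument that parses such values as numbers at load time. A minimal sketch on an in-memory CSV with a few made-up rows and columns (not the real file):

```python
import io

import pandas as pd

# A tiny CSV in the same layout as the population file (illustrative values only)
csv_text = '''LSOA 2021 Code,Total,F0
E01011949,"1,870",15
E01011950,"1,097",6
'''

# thousands=',' strips the separator, so 'Total' is parsed as an integer column
df = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(df.dtypes)
```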

df_population_hull = df_population[df_population['LAD 2021 Name'].str.contains('Kingston upon Hull')]
df_population_hull.head()
LAD 2021 Code LAD 2021 Name LSOA 2021 Code LSOA 2021 Name Total F0 F1 F2 F3 F4 ... M81 M82 M83 M84 M85 M86 M87 M88 M89 M90
822 E06000010 Kingston upon Hull E01012756 Kingston upon Hull 025A 1,477 7 2 4 9 6 ... 5 4 7 5 0 0 0 0 0 0
823 E06000010 Kingston upon Hull E01012757 Kingston upon Hull 025B 1,388 4 8 3 6 3 ... 2 1 2 1 2 2 2 4 1 1
824 E06000010 Kingston upon Hull E01012758 Kingston upon Hull 018A 1,538 3 5 5 5 4 ... 1 0 2 1 0 2 1 0 0 0
825 E06000010 Kingston upon Hull E01012759 Kingston upon Hull 025C 1,670 21 8 18 7 16 ... 1 0 1 3 0 1 1 1 0 2
826 E06000010 Kingston upon Hull E01012760 Kingston upon Hull 025D 1,435 10 3 3 10 4 ... 2 1 3 1 0 0 1 0 0 0

5 rows × 187 columns
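`str.contains` keeps any row whose name contains the given text, and it treats that text as a regular expression by default. Filtering on the district code is an exact alternative. The sketch below assumes `E06000010` is the code for Kingston upon Hull, as shown in the `LAD 2021 Code` column above, and uses a made-up frame.

```python
import pandas as pd

# Made-up rows in the layout of df_population
df_population = pd.DataFrame({
    "LAD 2021 Code": ["E06000001", "E06000010", "E06000010"],
    "LAD 2021 Name": ["Hartlepool", "Kingston upon Hull", "Kingston upon Hull"],
    "LSOA 2021 Code": ["E01011949", "E01012756", "E01012757"],
    "Total": ["1,870", "1,477", "1,388"],
})

# Exact match on the district code instead of a substring search on the name
df_population_hull = df_population[df_population["LAD 2021 Code"] == "E06000010"]
print(df_population_hull)
```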

df_population_hull = df_population_hull[['LSOA 2021 Code', 'Total']]
# merge the population data with df_crime_hull_agg_hp (left join on the LSOA code)
df_crime_hull_agg_hp_pop = pd.merge(df_crime_hull_agg_hp, df_population_hull, left_on='LSOA21CD', right_on='LSOA 2021 Code', how='left')
df_crime_hull_agg_hp_pop
LSOA21CD Month Numbers LSOA code House Price 2022 LSOA 2021 Code Total
0 E01012756 2024-01 3 E01012756 132769.0 E01012756 1,477
1 E01012756 2024-02 11 E01012756 132769.0 E01012756 1,477
2 E01012756 2024-03 15 E01012756 132769.0 E01012756 1,477
3 E01012756 2024-04 10 E01012756 132769.0 E01012756 1,477
4 E01012756 2024-05 12 E01012756 132769.0 E01012756 1,477
... ... ... ... ... ... ... ...
1997 E01035472 2024-07 4 NaN 0.0 E01035472 1,768
1998 E01035472 2024-09 4 NaN 0.0 E01035472 1,768
1999 E01035472 2024-10 1 NaN 0.0 E01035472 1,768
2000 E01035472 2024-11 3 NaN 0.0 E01035472 1,768
2001 E01035472 2024-12 10 NaN 0.0 E01035472 1,768

2002 rows × 7 columns
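Because `left_on` and `right_on` name different columns, `pd.merge` keeps both key columns. As a result, the table above carries the same code three times (`LSOA21CD`, `LSOA code` and `LSOA 2021 Code`). The minimal sketch below, on a made-up frame with the same column names, drops the redundant copies once the merges are done.

```python
import pandas as pd

# Made-up frame with the same columns as df_crime_hull_agg_hp_pop
merged = pd.DataFrame({
    "LSOA21CD": ["E01012756", "E01035472"],
    "Month": ["2024-01", "2024-01"],
    "Numbers": [3, 4],
    "LSOA code": ["E01012756", None],
    "House Price 2022": [132769.0, 0.0],
    "LSOA 2021 Code": ["E01012756", "E01035472"],
    "Total": ["1,477", "1,768"],
})

# Keep LSOA21CD as the single key column; the right-hand key copies add nothing
tidy = merged.drop(columns=["LSOA code", "LSOA 2021 Code"])
print(tidy.columns.tolist())
```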

df_crime_hull_agg_hp_pop.Total.values
array(['1,477', '1,477', '1,477', ..., '1,768', '1,768', '1,768'],
      shape=(2002,), dtype=object)
# 'Total' holds strings such as '1,477': remove the thousands separators, convert to float,
# then rename the column to 'Population'
df_crime_hull_agg_hp_pop['Total'] = df_crime_hull_agg_hp_pop['Total'].replace({',': ''}, regex=True).astype(float)
df_crime_hull_agg_hp_pop = df_crime_hull_agg_hp_pop.rename(columns={'Total': 'Population'})
df_crime_hull_agg_hp_pop
LSOA21CD Month Numbers LSOA code House Price 2022 LSOA 2021 Code Population
0 E01012756 2024-01 3 E01012756 132769.0 E01012756 1477.0
1 E01012756 2024-02 11 E01012756 132769.0 E01012756 1477.0
2 E01012756 2024-03 15 E01012756 132769.0 E01012756 1477.0
3 E01012756 2024-04 10 E01012756 132769.0 E01012756 1477.0
4 E01012756 2024-05 12 E01012756 132769.0 E01012756 1477.0
... ... ... ... ... ... ... ...
1997 E01035472 2024-07 4 NaN 0.0 E01035472 1768.0
1998 E01035472 2024-09 4 NaN 0.0 E01035472 1768.0
1999 E01035472 2024-10 1 NaN 0.0 E01035472 1768.0
2000 E01035472 2024-11 3 NaN 0.0 E01035472 1768.0
2001 E01035472 2024-12 10 NaN 0.0 E01035472 1768.0

2002 rows × 7 columns
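Before using the merged table, it can help to confirm three things: the two left merges did not duplicate any LSOA-month rows, the cleaned columns are numeric, and every row found a population value. The sketch below runs such checks on a made-up frame with the same column names.

```python
import pandas as pd

# Made-up frame with the same key and value columns as df_crime_hull_agg_hp_pop
df = pd.DataFrame({
    "LSOA21CD": ["E01012756", "E01012756", "E01035472"],
    "Month": ["2024-01", "2024-02", "2024-01"],
    "Numbers": [3, 11, 4],
    "House Price 2022": [132769.0, 132769.0, 0.0],
    "Population": [1477.0, 1477.0, 1768.0],
})

# Each LSOA-month pair should appear exactly once after the merges
dup_rows = df.duplicated(subset=["LSOA21CD", "Month"]).sum()

# The cleaned columns should now hold numbers rather than strings
numeric_ok = all(
    pd.api.types.is_numeric_dtype(df[c]) for c in ["House Price 2022", "Population"]
)

# Rows whose LSOA found no population record would show up as NaN
missing_pop = df["Population"].isna().sum()
print(dup_rows, numeric_ok, missing_pop)
```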