Week4 Lab Tutorial Timeline

Time

Activity

16:00–16:05

Introduction — Overview of the main tasks for the lab tutorials

16:05–16:45

Tutorial: Spatial patterns 1 — Follow Section 4.1-4.2 of the Jupyter Notebook to practice the spatial weights and global spatial autocorrelation

16:45–17:30

Tutorial: Spatial patterns 2 — Follow Section 4.3-4.4 of the Jupyter Notebook to practice the local spatial autocorrelation and KDE

17:30–17:55

Quiz — Complete quiz tasks

17:55–18:00

Wrap-up — Recap key points and address final questions

For this module’s lab tutorials, you can download all the required data using the provided link (click).

Please make sure that both the Jupyter Notebook file and the data and img folder are placed in the same directory (specifically within the STBDA_lab folder) to ensure the code runs correctly.

Week 4 Key Takeaways:

  • Understand the concept of spatial dependence and spatial autocorrelation.

  • Compute spatial weights using different methods.

  • Compute global spatial autocorrelation using Moran’s I.

  • Compute local spatial autocorrelation using Local Moran’s I and Getis-Ord Gi*.

  • Implement Kernel Density Estimation (KDE) to visualize spatial patterns.

4 Spatial patterns analysis#

Spatial dependence is the phenomenon where the value of a variable at one location is influenced by the values of the same variable at nearby or neighboring locations (Near things are more related than distant things.— based on Tobler’s First Law of Geography).

If spatial data are sampled at nearby locations, they tend to be correlated with one another. In other words, Spatial dependence is a fundamental concept in spatial analysis and is often quantified using measures of spatial autocorrelation.

The phenomena of spatial dependence can be observed in the real world, for example, the population density of a city is likely to be similar in nearby neighborhoods, but different in neighborhoods that are far apart; or the rainfall in a region is likely to be similar in nearby areas, but different in areas that are far apart.

Autocorrelation is the correlation of a variable (or series) with itself.

Spatial autocorrelation is a statistical measure of the degree to which spatial dependence exists in the data. It’s a quantified outcome that shows how strongly (and in what way) spatial dependence appears. It can be positive (similar values are clustered together), negative (dissimilar values are clustered together), or zero (no spatial pattern).

4.1 Spatial Weights#

A spatial weight matrix (spatial adjacency matrix) \(W\) is a square matrix that quantifies the spatial relationship between each pair of locations (\(i\) and \(j\) are the spatial unit index) across \(n\) spatial units:

\[\begin{split}W = \begin{bmatrix} w_{11} & w_{12} & \dots & w_{1n} \\ w_{21} & w_{22} & \dots & w_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \dots & w_{nn} \end{bmatrix}\end{split}\]
  • \(w_{ij}\) is the spatial weight between location \(i\) and \(j\);

  • Typically, \(w_{ii} = 0\) (no self-weight).

For example, if we have 4 areas/locations (A, B, C, D) with the index (1, 2, 3, 4) and the following spatial weight matrix can be \(W\): $$ W =

(1)#\[\begin{bmatrix} w_{11} & w_{12} & w_{13} & w_{14} \\ w_{21} & w_{22} & w_{23} & w_{24} \\ w_{31} & w_{32} & w_{33} & w_{34} \\ w_{41} & w_{42} & w_{43} & w_{44} \end{bmatrix}\]

$$

How to compute the spatial weigt \(w_{i,j}\): we can list a tabel to show the different types of computing the spatial weight \(w_{i,j}\):

Type of spatial weight

Description

Formula

Value type

Contiguity (Queen)

Based on the adjacency of spatial units

\(w_{i,j} = 1\) if \(i\) and \(j\) are neighbors (for polygons, neighbors share an edge or corner), else \(0\)

Binary

Contiguity (Rook)

Based on the adjacency of spatial units

\(w_{i,j} = 1\) if \(i\) and \(j\) are neighbors (for polygons, neighbors share only an edge), else \(0\)

Binary

Distance

Based on the distance between spatial units

\(w_{i,j} = \frac{1}{d_{i,j}}\) if \(d_{i,j} < d_{threshold}\), else \(0\)

Binary

K-nearest neighbors

Based on the distance to the \(k\) nearest neighbors

\(w_{i,j} = 1\) if \(j\) is one of the \(k\) nearest neighbors of \(i\), else \(0\)

Binary

Inverse distance

Based on the inverse of the distance between spatial units

\(w_{i,j} = \frac{1}{d_{i,j}^k}\) if \(d_{i,j} < d_{threshold}\), else \(0\)

Continuous

Row-standardized

Based on the row-standardization of the spatial weight matrix

\(w_{i,j} = \frac{w_{i,j}}{\sum_j w_{i,j}}\)

Continuous

Column-standardized

Based on the column-standardization of the spatial weight matrix

\(w_{i,j} = \frac{w_{i,j}}{\sum_i w_{i,j}}\)

Continuous

Symmetric

Based on the symmetric standardization of the spatial weight matrix

\(w_{i,j} = \frac{w_{i,j}}{\sum_i w_{i,j} + \sum_j w_{i,j}}\)

Continuous

Now, we use the libpysal library to compute the spatial weights for selected central London boroughs/LAs (polygons).

# Import the required libraries
import geopandas as gpd
import libpysal.weights as weights
import matplotlib.pyplot as plt
import contextily as cx
import pandas as pd
import warnings

from spreg import white

warnings.filterwarnings("ignore")
# read the shapefile
gdf_uk_la = gpd.read_file('data/Local_Authority_Districts_December_2024_Boundaries_UK_BSC.geojson')
list_las_c_london = ['Camden', 'Hackney', 'Islington', 'Tower Hamlets', 'City of London', 'Westminster']
# Filter the GeoDataFrame to include only the inner London boroughs
gdf_c_london = gdf_uk_la[gdf_uk_la['LAD24NM'].isin(list_las_c_london)]
# transform the CRS to EPSG:27700 (British National Grid)
gdf_c_london = gdf_c_london.to_crs(epsg=27700)
gdf_c_london
FID LAD24CD LAD24NM LAD24NMW BNG_E BNG_N LONG LAT GlobalID geometry
263 264 E09000001 City of London 532382 181358 -0.09352 51.51564 741710fd-03e1-4b41-8645-7ebcfc5961ac POLYGON ((533404.882 182037.774, 533534.125 18...
269 270 E09000007 Camden 527491 184283 -0.16291 51.54305 88f3f650-53ac-434d-883e-6235b48d14c4 POLYGON ((529150.075 185862.023, 529701.341 18...
274 275 E09000012 Hackney 534560 185787 -0.06046 51.55493 a41cf0fc-95ed-4823-a9db-7b8bc60f421b POLYGON ((537570.535 185494.637, 537638.712 18...
281 282 E09000019 Islington 531160 184645 -0.10990 51.54547 43a872fc-e401-45af-a2e6-3b2befc98837 POLYGON ((533382.221 185149.793, 533460.932 18...
292 293 E09000030 Tower Hamlets 536340 181452 -0.03648 51.51555 54f82e61-dc15-42cb-b777-681b4326f855 POLYGON ((536480.759 184698.334, 536779.206 18...
295 296 E09000033 Westminster 528268 180871 -0.15295 51.51222 a52ff0f8-4ac0-4417-8338-01be3cbb516a POLYGON ((526773.283 183663.262, 527373.359 18...
# plot the inner London boroughs
fig, ax = plt.subplots(figsize=(6, 6))
gdf_c_london.plot(ax=ax, color='lightgrey', edgecolor='black')
# set each name for the borough
for x, y, label in zip(gdf_c_london.geometry.centroid.x, gdf_c_london.geometry.centroid.y, gdf_c_london['LAD24NM']):
    ax.text(x, y, label, fontsize=8, ha='center', va='center')
ax.axis('off')
plt.show()
_images/b3c6f2f00865da4a2a1b51c4a5294d431e4cc4f098ea48dcc56df952c35df1f8.png
# set the index to the LAD24NM column
gdf_c_london.index = gdf_c_london['LAD24NM']
gdf_c_london.index
Index(['City of London', 'Camden', 'Hackney', 'Islington', 'Tower Hamlets',
       'Westminster'],
      dtype='object', name='LAD24NM')

Contiguity – Queen method

# Compute the spatial weights using the Queen contiguity method
w_q = weights.Queen.from_dataframe(gdf_c_london, use_index='True')
# the numbers of units
print('the numbers of units', w_q.n)
the numbers of units 6
# print the spatial weights df
df_wq = w_q.to_adjlist(drop_islands=True)
df_wq
focal neighbor weight
0 City of London Camden 1.0
1 City of London Hackney 1.0
2 City of London Islington 1.0
3 City of London Tower Hamlets 1.0
4 City of London Westminster 1.0
5 Camden City of London 1.0
6 Camden Islington 1.0
7 Camden Westminster 1.0
8 Hackney City of London 1.0
9 Hackney Islington 1.0
10 Hackney Tower Hamlets 1.0
11 Islington City of London 1.0
12 Islington Camden 1.0
13 Islington Hackney 1.0
14 Tower Hamlets City of London 1.0
15 Tower Hamlets Hackney 1.0
16 Westminster City of London 1.0
17 Westminster Camden 1.0
# we can transfer the spatial weights list (df_wq) to spatial weight/adjacency matrix in pandas
df_wq_matrix = df_wq.pivot(index='focal', columns='neighbor', values='weight').fillna(0)
df_wq_matrix
neighbor Camden City of London Hackney Islington Tower Hamlets Westminster
focal
Camden 0.0 1.0 0.0 1.0 0.0 1.0
City of London 1.0 0.0 1.0 1.0 1.0 1.0
Hackney 0.0 1.0 0.0 1.0 1.0 0.0
Islington 1.0 1.0 1.0 0.0 0.0 0.0
Tower Hamlets 0.0 1.0 1.0 0.0 0.0 0.0
Westminster 1.0 1.0 0.0 0.0 0.0 0.0
# we can use .neighbors to get the neighbors of each unit
w_q.neighbors['Camden']
['Westminster', 'Islington', 'City of London']

We can also plot the spatial weights using a network/graph representation, where the nodes represent the spatial units and the edges represent the spatial weights, here, if the two units are neighbors, the edge is drawn between them.

#plot the spatial weights using a network/graph representation
fig, ax = plt.subplots(figsize=(6, 6))
gdf_c_london.plot(ax=ax, color='lightgrey', edgecolor='black', alpha=0.7)
# set each name for the borough
for x, y, label in zip(gdf_c_london.geometry.centroid.x, gdf_c_london.geometry.centroid.y, gdf_c_london['LAD24NM']):
    ax.text(x, y, label, fontsize=8, ha='center', va='center')
ax.axis('off')
w_q.plot(gdf_c_london, ax=ax, color='orange')
plt.show()
_images/141a7f775534b565501147531d5189482f37b65752ed8c4bc80ef9c7b81364cc.png

Distance-based method

Distance-based spatial weights are based on the distance between spatial units, usually, the point locations. So, we transform the polygons to points (centroids) and then compute the distance-based spatial weights.

# transform the polygons to points (centroids)
gdf_c_london['geometry'] = gdf_c_london.geometry.centroid
# now six LAs are represented by their centroids/points
gdf_c_london
FID LAD24CD LAD24NM LAD24NMW BNG_E BNG_N LONG LAT GlobalID geometry
LAD24NM
City of London 264 E09000001 City of London 532382 181358 -0.09352 51.51564 741710fd-03e1-4b41-8645-7ebcfc5961ac POINT (532493.308 181259.574)
Camden 270 E09000007 Camden 527491 184283 -0.16291 51.54305 88f3f650-53ac-434d-883e-6235b48d14c4 POINT (527843.276 184641.457)
Hackney 275 E09000012 Hackney 534560 185787 -0.06046 51.55493 a41cf0fc-95ed-4823-a9db-7b8bc60f421b POINT (534363.476 185492.867)
Islington 282 E09000019 Islington 531160 184645 -0.10990 51.54547 43a872fc-e401-45af-a2e6-3b2befc98837 POINT (531135.378 184944.643)
Tower Hamlets 293 E09000030 Tower Hamlets 536340 181452 -0.03648 51.51555 54f82e61-dc15-42cb-b777-681b4326f855 POINT (536424.392 181625.502)
Westminster 296 E09000033 Westminster 528268 180871 -0.15295 51.51222 a52ff0f8-4ac0-4417-8338-01be3cbb516a POINT (527697.219 181041.502)
# plot the centroids of the inner London boroughs
fig, ax = plt.subplots(figsize=(6, 6))
gdf_c_london.plot(ax=ax, color='lightgrey', edgecolor='black')
# set each name for the borough
for x, y, label in zip(gdf_c_london.geometry.x, gdf_c_london.geometry.y, gdf_c_london['LAD24NM']):
    ax.text(x, y+200, label, fontsize=8, ha='center', va='center')
ax.axis('off')
plt.show()
_images/089d6ec6428501e8c11f13efb2322760d59fa776321776784a350255660135ba.png
# Compute the spatial weights using the distance-based method, the threshold with 5000 means 5000 meters as the CRS is EPSG:27700
w_d = weights.DistanceBand.from_dataframe(gdf_c_london, threshold=5000, binary=True, use_index='True')
df_wd = w_d.to_adjlist(drop_islands=True)
df_wd
focal neighbor weight
0 City of London Hackney 1.0
1 City of London Islington 1.0
2 City of London Tower Hamlets 1.0
3 City of London Westminster 1.0
4 Camden Islington 1.0
5 Camden Westminster 1.0
6 Hackney City of London 1.0
7 Hackney Islington 1.0
8 Hackney Tower Hamlets 1.0
9 Islington City of London 1.0
10 Islington Camden 1.0
11 Islington Hackney 1.0
12 Tower Hamlets City of London 1.0
13 Tower Hamlets Hackney 1.0
14 Westminster City of London 1.0
15 Westminster Camden 1.0
df_wd_matrix = df_wd.pivot(index='focal', columns='neighbor', values='weight').fillna(0)
df_wd_matrix
neighbor Camden City of London Hackney Islington Tower Hamlets Westminster
focal
Camden 0.0 0.0 0.0 1.0 0.0 1.0
City of London 0.0 0.0 1.0 1.0 1.0 1.0
Hackney 0.0 1.0 0.0 1.0 1.0 0.0
Islington 1.0 1.0 1.0 0.0 0.0 0.0
Tower Hamlets 0.0 1.0 1.0 0.0 0.0 0.0
Westminster 1.0 1.0 0.0 0.0 0.0 0.0

Inverse distance method

Compute the spatial weights using the inverse distance-based method, we need to set the binary parameter to False to get the continuous weights, then the weights are computed with a parameter alpha which is the power of the distance, if we set alpha = -1, then the weights are computed as \(w_{i,j} = \frac{1}{d_{i,j}}\).

alpha: distance decay parameter for weight (default -1.0) if alpha is positive the weights will not decline with distance.

# Compute the spatial weights using the inverse distance-based method, the threshold with 5000 means 5000 meters as the CRS is EPSG:27700
w_inverse_d = weights.DistanceBand.from_dataframe(gdf_c_london, threshold=5000, binary=False, alpha= -1, use_index='True')
df_w_in_d = w_inverse_d.to_adjlist(drop_islands=True)
df_w_in_d
focal neighbor weight
0 City of London Hackney 0.000216
1 City of London Islington 0.000255
2 City of London Tower Hamlets 0.000253
3 City of London Westminster 0.000208
4 Camden Islington 0.000302
5 Camden Westminster 0.000278
6 Hackney City of London 0.000216
7 Hackney Islington 0.000305
8 Hackney Tower Hamlets 0.000228
9 Islington City of London 0.000255
10 Islington Camden 0.000302
11 Islington Hackney 0.000305
12 Tower Hamlets City of London 0.000253
13 Tower Hamlets Hackney 0.000228
14 Westminster City of London 0.000208
15 Westminster Camden 0.000278
df_w_in_d_matrix = df_w_in_d.pivot(index='focal', columns='neighbor', values='weight').fillna(0)
df_w_in_d_matrix
neighbor Camden City of London Hackney Islington Tower Hamlets Westminster
focal
Camden 0.000000 0.000000 0.000000 0.000302 0.000000 0.000278
City of London 0.000000 0.000000 0.000216 0.000255 0.000253 0.000208
Hackney 0.000000 0.000216 0.000000 0.000305 0.000228 0.000000
Islington 0.000302 0.000255 0.000305 0.000000 0.000000 0.000000
Tower Hamlets 0.000000 0.000253 0.000228 0.000000 0.000000 0.000000
Westminster 0.000278 0.000208 0.000000 0.000000 0.000000 0.000000

This distance-based with band/ thereshold of 5000 meters means that if the distance between two units is less than 5000 meters, then they are neighbors. This brings us a different spatial weight matrix compared to the contiguity queen method. When we check the neighbors of Camden in the distance-based method, only Islinton and Westminster points are its neighbors as they are within 5000 meters to the point of Camden.

w_d.neighbors['Camden']
['Islington', 'Westminster']

Note that there are lots of different ways to compute the spatial weights based on different data formats (not only the GeoDataframe). you can find some other functions in the libpysal library in this Page (click) and a general tutorial.

4.2 Global Spatial Autocorrelation#

Global spatial autocorrelation answers the question: Is there overall clustering across the whole map?

Global Moran’s I :

Moran’s I (Moran, 1950) is a measure of global spatial autocorrelation. It is a single value that summarizes the degree of spatial autocorrelation in the entire dataset:

\[ I = \frac{n}{W} \cdot \frac{\sum_i \sum_j w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2} \]

Where:

  • \( n \): Number of spatial units

  • \( x_i \): Value at location \( i \)

  • \( \bar{x} \): Mean of the variable

  • \( w_{ij} \): Spatial weight between location \( i \) and \( j \)

  • \( W = \sum_i \sum_j w_{ij} \): Sum of all spatial weights

\(I > 0\): Positive spatial autocorrelation.

\(I < 0\): Negative spatial autocorrelation.

\(I = 0\): No spatial autocorrelation.

# Import the required libraries
import esda

Now, we will use the England LA with population data to compute the global Moran’s I for population density.

# the population data is in the `data` folder
df_pop = gpd.read_file('data/populaiton_2022_all_ages_uk.csv')
# merge the population data with the GeoDataFrame
gdf_uk_la_pop = pd.merge(gdf_uk_la, df_pop, left_on='LAD24CD', right_on='Code', how='left')
# select the england LAs
gdf_england_la_pop = gdf_uk_la_pop[gdf_uk_la_pop['LAD24CD'].str.startswith('E')]
# transform the CRS to EPSG:27700 (British National Grid)
gdf_england_la_pop = gdf_england_la_pop.to_crs(epsg=27700)
# rename the columns
gdf_england_la_pop = gdf_england_la_pop.rename(columns={'All ages': 'Population'})
gdf_england_la_pop['Population'] = gdf_england_la_pop['Population'].astype('int')
# calculate the population density: population per square kilometer
gdf_england_la_pop['Population_density'] = gdf_england_la_pop.apply(lambda x: x['Population']/ (x.geometry.area/1000000), axis=1)
gdf_england_la_pop
FID LAD24CD LAD24NM LAD24NMW BNG_E BNG_N LONG LAT GlobalID geometry Code Name Geography Population Population_density
0 1 E06000001 Hartlepool 447161 531473 -1.27017 54.67613 fcc85d99-da7a-440c-aa80-b6e2a5353efe POLYGON ((447851.213 537036.01, 448290.115 536... E06000001 Hartlepool Unitary Authority 93861 994.428583
1 2 E06000002 Middlesbrough 451141 516887 -1.21100 54.54468 0d2753c9-b44b-44e2-9b20-f70fd8a63e1a POLYGON ((450791.114 520932.509, 450530.512 52... E06000002 Middlesbrough Unitary Authority 148285 2735.466726
2 3 E06000003 Redcar and Cleveland 464330 519596 -1.00657 54.56752 91793ade-9ca5-46dc-8591-7bd7fb61b233 POLYGON ((456987.212 526324.904, 459206.816 52... E06000003 Redcar and Cleveland Unitary Authority 137175 561.248283
3 4 E06000004 Stockton-on-Tees 444940 518179 -1.30665 54.55688 ae16dbb0-8ebe-49c6-bff9-f3479669fd4f POLYGON ((445378.413 526449.708, 445536.511 52... E06000004 Stockton-on-Tees Unitary Authority 199966 976.155131
4 5 E06000005 Darlington 428029 515648 -1.56836 54.53534 f21d0ade-ae2f-4bff-9d85-1048cd649b5f POLYGON ((423240.211 524970.902, 423783.816 52... E06000005 Darlington Unitary Authority 109469 553.554052
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
291 292 E09000029 Sutton 527357 163639 -0.17227 51.35755 1b5d57bc-1cb5-43e1-85ba-3bb45da0b7cc POLYGON ((527423.64 167442.817, 527757.283 167... E09000029 Sutton London Borough 210053 4757.046484
292 293 E09000030 Tower Hamlets 536340 181452 -0.03648 51.51555 54f82e61-dc15-42cb-b777-681b4326f855 POLYGON ((536480.759 184698.334, 536779.206 18... E09000030 Tower Hamlets London Borough 325789 16494.761141
293 294 E09000031 Waltham Forest 537328 190278 -0.01880 51.59462 97cd624f-2c02-4f12-a78e-7aedc777673f POLYGON ((539923.888 193889.058, 539562.889 19... E09000031 Waltham Forest London Borough 275887 7119.047670
294 295 E09000032 Wandsworth 525152 174138 -0.20022 51.45240 9c51587a-4315-449d-bcc7-e65625e402f7 POLYGON ((528637.184 175213.943, 528758.922 17... E09000032 Wandsworth London Borough 329035 9499.579457
295 296 E09000033 Westminster 528268 180871 -0.15295 51.51222 a52ff0f8-4ac0-4417-8338-01be3cbb516a POLYGON ((526773.283 183663.262, 527373.359 18... E09000033 Westminster London Borough 211365 9843.368835

296 rows × 15 columns

# plot the population data
fig, ax = plt.subplots(figsize=(6, 6))
gdf_england_la_pop.plot(column='Population_density', ax=ax, legend=True, cmap='RdPu', edgecolor='black', linewidth=0.1)
ax.axis('off')
plt.title('Population density in England LAs (per $km^2$)')
plt.show()
_images/ae3d9e7ca9953842be0c76c9f3e26e40f88c54c2dfde50208dcb583502789cd4.png
# compute the spatial weights using the Queen contiguity method
w_england_la_pop = weights.Queen.from_dataframe(gdf_england_la_pop, use_index='True')
# print the islands
for i in w_england_la_pop.islands:
    print(gdf_england_la_pop.loc[i, 'LAD24NM'])
Isle of Wight
Isles of Scilly
# compute the global Moran's I
mi = esda.Moran(gdf_england_la_pop['Population_density'], w_england_la_pop, two_tailed=True)
('WARNING: ', 43, ' is an island (no neighbors)')
('WARNING: ', 49, ' is an island (no neighbors)')
print('Moran\'s I:', mi.I)
print('p-value:', mi.p_sim)
Moran's I: 0.6242014303262433
p-value: 0.001

Interpretation of the Moran’s I: the result of the Moran’s I is 0.62, which indicates a positive spatial autocorrelation, meaning that the population density in England LAs is Strongly clustered together. The p-value is under 0.05, which indicates that the result is statistically significant at the 0.05 level. We can use the mi.sim to get the distribution of the Moran’s I under the null hypothesis of no spatial autocorrelation. You can find more details about the Moran’s I in the libpysal Moran documentation.

4.3 Local Spatial Autocorrelation / Association#

Local spatial autocorrelation answers the questions: Is there clustering in a specific area? Which specific areas are clustered or are spatial outlier?

LISA (Local Indicators of Spatial Association) is a statistical technique that decomposes global spatial autocorrelation measures into location-specific contributions. It identifies significant spatial clusters or spatial outliers by measuring the correlation between a location and its neighbors.

Key aspects of LISA:

  • Helps identify local patterns of spatial association (clusters and outliers)

  • Indicates areas with significant high or low values

  • Shows locations with similar or dissimilar values among neighbors

  • Common LISA statistics include Local Moran’s I and Getis-Ord Gi*

  • Useful for finding hot spots, cold spots, and spatial outliers

  • Can be visualized through significance maps and cluster maps

4.3.1 Local Moran’s I#

Local Moran’s I is a measure of local spatial autocorrelation. It is a value for each spatial unit that summarizes the degree of spatial autocorrelation in the local neighborhood of that unit:

\[ I_i = \frac{(x_i - \bar{x})}{S^2} \cdot \sum_j w_{ij} (x_j - \bar{x}) \]

Where:

  • \( I_i \): Local Moran’s I for location \( i \) , (\( I_i \in (-\infty, +\infty) \))

  • \( x_i \): Value at location \( i \)

  • \( \bar{x} \): Mean of the variable

  • \( S\): Standard deviation of the variable

  • \( S^2 \): Variance of the variable

  • \( w_{ij} \): Spatial weight between location \( i \) and \( j \)

  • \( j \): Neighbors of location \( i \)

Interpretation:

\( I_i > 0 \): Positive spatial autocorrelation (high values surrounded by high values or low values surrounded by low values)

\(I_i < 0 \): Negative spatial autocorrelation (high values surrounded by low values or low values surrounded by high values)

\(I_i = 0 \): No spatial autocorrelation

We continue to use the England LA with population data to compute the local Moran’s I.

# compute the local Moran's I
lm = esda.Moran_Local(gdf_england_la_pop['Population_density'], w_england_la_pop)

Note that the local Moran’s I is a distribution of values, not a single value like the global Moran’s I. The local Moran’s I can be used to identify the spatial clusters and outliers in the data.

# plot the distribution of local Moran's I using histogram and kde by seaborn
import seaborn as sns
fig, ax = plt.subplots(figsize=(4, 4))
sns.histplot(lm.Is, kde=True, ax=ax)
ax.set_xlabel('Local Moran\'s I')
ax.set_ylabel('Density')
ax.set_title('Distribution of Local Moran\'s I')
plt.show()
_images/72764abbd5e15ca9d5878a121e2331c28c97d7a3d1841bd87b405bb912ab9845.png

For a better visualization, we should set the vmin and vmax in gdf.plot() to the min and max of the local Moran’s I value for building the colormap. This can make sure that the 0 is at the middle of the diverging colormap showing while color, and the color is symmetric to show the positive and negative values.

# mapping the local Moran's I
gdf_england_la_pop['local_moran'] = lm.Is
gdf_england_la_pop['p_value'] = lm.p_sim
gdf_england_la_pop['local_moran'] = gdf_england_la_pop['local_moran'].astype('float')
gdf_england_la_pop['local_moran'] = gdf_england_la_pop['local_moran'].fillna(0)
# plot the local Moran's I
fig, ax = plt.subplots(figsize=(8, 8))
# set the area with p values >=0.05 as the white
gdf_england_la_pop[~(gdf_england_la_pop['p_value'] < 0.05)].plot(ax=ax,color='white',edgecolor='black',linewidth=0.05)
# set the area with p values <0.05 with cmap
gdf_england_la_pop[gdf_england_la_pop['p_value'] < 0.05].plot(column='local_moran', ax=ax, legend=True,
                        cmap='seismic',
                        vmin=-15, vmax=15, # set the vmin and vmax to the min and max of the local Moran's I for cmap
                        edgecolor='black',
                        linewidth=0.05)
ax.axis('off')
# set the title
plt.title('Local Moran\'s I for Population Density in England LAs')
plt.show()
_images/39cea1e0ccb1438048d856b9e3c0ea96851568f8e633ad6203ff7005fd507f89.png

rendering: In the map, the red areas indicate high-high clusters (high population surrounded by high population), and the blue areas indicate low-low clusters (low population surrounded by low population). The white areas indicate outliers (high population surrounded by low population or low population surrounded by high population). We can found that the London boroughs are high-high clusters, and the areas in the north of England are low-low clusters.

You can find more details about the Local Moran’s I in the libpysal Local Moran I documentation.

4.3.2 Getis-Ord Gi*#

Getis-Ord Gi* is a measure of local spatial autocorrelation that identifies clusters of high or low values. It is similar to Local Moran’s I, but it is more sensitive to the presence of outliers. It is used to identify hot spots (high values) and cold spots (low values) in the data:

\[ G_i^* = \frac{\sum_{j=1}^{n} w_{ij} x_j - \bar{x} \sum_{j=1}^{n} w_{ij}} {S \sqrt{\frac{n \sum_{j=1}^{n} w_{ij}^2 - \left( \sum_{j=1}^{n} w_{ij} \right) ^2 } {n - 1}}} \]

Where:

  • \(G_i^*\): Getis-Ord Gi* index/value

  • \( x_j \): Value at location \( j \)

  • \( n \): the numbers of spatial units

  • \( \bar{x} \): Mean of the variable

  • \( S\): Standard deviation of the variable

  • \( w_{ij} \): Spatial weight between location \( i \) and \( j \)

  • \( j \): Neighbors of location \( i \)

Interpretation:

The \( G_i^* \) statistic is a z-score that indicates the presence of spatial clusters (hot or cold spots) around a location \( i \).

  • \( G_i^* > 0 \): Indicates clustering of high values near location –> Hot spot if p <0.05.

  • \( G_i^* < 0 \): Indicates clustering of low values near location \( i \) –> Cold spot if p <0.05.

  • \( G_i^* \approx 0 \): No significant clustering around location \( i \) –> Spatial randomness.

Note: The raw G statistic measures the spatial association of neighboring values excluding the value at the location itself.

\[ G_i = \frac{ \sum_{j \ne i} w_{ij} x_j }{ \sum_{j} x_j } \]

We can use the esda library as a part of pysal to compute the Getis-Ord Gi* for detecting crime hotspots in London. Here we use the new spatial unit of analysis from UK census, which called LSOA (Lower Layer Super Output Area) which is a small area of about 1500 people on average in the UK. Crime data is from UK police open data, and LSOA boundary data is from ONS.

# read the lsoa boundary data of England and Wales
gdf_lsoa_uk = gpd.read_file('data/Lower_layer_Super_Output_Areas_(December_2021)_Boundaries_EW_BFC_(V10).geojson')
gdf_lsoa_uk
FID LSOA21CD LSOA21NM LSOA21NMW BNG_E BNG_N LAT LONG Shape__Area Shape__Length GlobalID geometry
0 1 E01000001 City of London 001A 532123 181632 51.51817 -0.097150 1.298653e+05 2635.767993 c625aea8-6d73-4b2a-be76-4d5c44cad9f8 POLYGON ((-0.09665 51.52028, -0.09663 51.52025...
1 2 E01000002 City of London 001B 532480 181715 51.51883 -0.091970 2.284198e+05 2707.816821 52c878e9-ac68-4886-b4a8-fea9cd241a70 POLYGON ((-0.08967 51.52069, -0.08971 51.52058...
2 3 E01000003 City of London 001C 532239 182033 51.52174 -0.095330 5.905420e+04 1224.573160 b9d8faca-d489-478d-8ce6-acaf76186d7d POLYGON ((-0.0965 51.52295, -0.09644 51.52282,...
3 4 E01000005 City of London 001E 533581 181283 51.51469 -0.076280 1.895777e+05 2275.805344 15e1417d-537c-4845-9820-fc7596bd59b0 POLYGON ((-0.07568 51.51575, -0.07539 51.51555...
4 5 E01000006 Barking and Dagenham 016A 544994 184274 51.53875 0.089317 1.465370e+05 1966.092607 8a6c4ee0-c0ff-4736-9cfa-fb12a6d50da0 POLYGON ((0.09125 51.53905, 0.0915 51.5389, 0....
... ... ... ... ... ... ... ... ... ... ... ... ...
35667 35668 W01002036 Vale of Glamorgan 005G Bro Morgannwg 005G 317939 172435 51.44494 -3.182180 3.886975e+05 3584.168435 5e3f978b-66c7-4828-a107-321cb22354e0 POLYGON ((-3.18412 51.44728, -3.18383 51.44727...
35668 35669 W01002037 Vale of Glamorgan 005H Bro Morgannwg 005H 318527 172406 51.44476 -3.173710 2.843506e+05 3225.349982 b7aae689-c82d-4ddb-b899-4ba7918dac32 POLYGON ((-3.16647 51.44662, -3.16682 51.44628...
35669 35670 W01002038 Vale of Glamorgan 014G Bro Morgannwg 014G 306491 167360 51.39754 -3.345520 3.495213e+06 12630.457940 144e5aed-4099-4dff-a48e-9f8c270d916b POLYGON ((-3.34759 51.4098, -3.34771 51.40942,...
35670 35671 W01002039 Vale of Glamorgan 014H Bro Morgannwg 014H 306564 166023 51.38553 -3.344120 6.505253e+05 3712.034400 bcf06740-abe0-4052-9ec2-7cad09d42195 POLYGON ((-3.34039 51.38935, -3.34035 51.38921...
35671 35672 W01002040 Vale of Glamorgan 015F Bro Morgannwg 015F 311423 167179 51.39670 -3.274600 6.811834e+05 4888.403666 672a2fee-23b5-4a26-9a9d-901fc12f649e POLYGON ((-3.26521 51.40005, -3.2652 51.40005,...

35672 rows × 12 columns

There are lots of methods to select the LSAOs for LAs.

  • We can use the LA names to select the LSOAs ([‘LSOA21NM’]) from gdf_lsoa_uk;

  • We can use the LSOA codes to select the LSOAs ([‘LSOA21CD’]) from gdf_lsoa_uk, but you need to check the LSOA codes in the LAs through the look_up file.

  • We can use implement the sjoin between the LA geo boundary and LSOA boundary to select the LSOAs for LAs.

Here, We use the London LA names to select the LSOAs from gdf_lsoa_uk.

gdf_uk_london = gdf_uk_la[gdf_uk_la['LAD24CD'].str.startswith('E09')]
London_LA_names = gdf_uk_london.LAD24NM.unique()
print(London_LA_names)
['City of London' 'Barking and Dagenham' 'Barnet' 'Bexley' 'Brent'
 'Bromley' 'Camden' 'Croydon' 'Ealing' 'Enfield' 'Greenwich' 'Hackney'
 'Hammersmith and Fulham' 'Haringey' 'Harrow' 'Havering' 'Hillingdon'
 'Hounslow' 'Islington' 'Kensington and Chelsea' 'Kingston upon Thames'
 'Lambeth' 'Lewisham' 'Merton' 'Newham' 'Redbridge' 'Richmond upon Thames'
 'Southwark' 'Sutton' 'Tower Hamlets' 'Waltham Forest' 'Wandsworth'
 'Westminster']
'City of London 001E'
'City of London 001E'
# Extract the LA name from LSOA name (' '.join(x.split(' ')[:-1] as LSOA named as 'LA NAME 001A' ) and check if it's in London_LA_names
gdf_lsoa_uk['LA'] = gdf_lsoa_uk['LSOA21NM'].apply(lambda x: 'GL' if ' '.join(x.split(' ')[:-1]) in London_LA_names else 'NO')
# select the LSOAs in London
gdf_lsoa_london = gdf_lsoa_uk[gdf_lsoa_uk['LA'] == 'GL']
# transform the CRS to EPSG:27700 (British National Grid)
gdf_lsoa_london = gdf_lsoa_london.to_crs(epsg=27700)
# 4994 LSOAs in London
gdf_lsoa_london
FID LSOA21CD LSOA21NM LSOA21NMW BNG_E BNG_N LAT LONG Shape__Area Shape__Length GlobalID geometry LA
0 1 E01000001 City of London 001A 532123 181632 51.51817 -0.097150 1.298653e+05 2635.767993 c625aea8-6d73-4b2a-be76-4d5c44cad9f8 POLYGON ((532151.55 181867.437, 532152.512 181... GL
1 2 E01000002 City of London 001B 532480 181715 51.51883 -0.091970 2.284198e+05 2707.816821 52c878e9-ac68-4886-b4a8-fea9cd241a70 POLYGON ((532634.509 181926.019, 532632.06 181... GL
2 3 E01000003 City of London 001C 532239 182033 51.52174 -0.095330 5.905420e+04 1224.573160 b9d8faca-d489-478d-8ce6-acaf76186d7d POLYGON ((532153.715 182165.158, 532158.262 18... GL
3 4 E01000005 City of London 001E 533581 181283 51.51469 -0.076280 1.895777e+05 2275.805344 15e1417d-537c-4845-9820-fc7596bd59b0 POLYGON ((533619.075 181402.367, 533639.881 18... GL
4 5 E01000006 Barking and Dagenham 016A 544994 184274 51.53875 0.089317 1.465370e+05 1966.092607 8a6c4ee0-c0ff-4736-9cfa-fb12a6d50da0 POLYGON ((545126.865 184310.841, 545145.226 18... GL
... ... ... ... ... ... ... ... ... ... ... ... ... ...
33711 33712 E01035718 Westminster 019G 527211 180107 51.50559 -0.168450 2.671900e+06 10740.132994 738fac0f-b4ee-47dc-a87c-2c0f8f147977 POLYGON ((528044.961 180617.988, 528046.356 18... GL
33712 33713 E01035719 Westminster 021F 530127 178755 51.49277 -0.126960 9.294961e+04 2013.141792 7abdbf89-97f8-43f3-96b1-d162e667bed2 POLYGON ((530267.012 178811.303, 530265.762 17... GL
33713 33714 E01035720 Westminster 021G 530009 178440 51.48997 -0.128770 1.061637e+05 2229.186337 653fb71a-b240-4e54-875d-0ec1f2f9cc2a POLYGON ((529856.163 178761.75, 529856.871 178... GL
33714 33715 E01035721 Westminster 023H 528403 178364 51.48965 -0.151920 2.168432e+05 2738.810595 ed04243a-989b-4bde-aeae-82d40ddffd10 POLYGON ((528641.829 178630.889, 528635.702 17... GL
33715 33716 E01035722 Westminster 024G 529546 178076 51.48681 -0.135570 1.222497e+05 2211.190430 fd9bcb9b-a2e1-4381-a473-72bc1b5cf0c9 POLYGON ((529606.375 178302.448, 529606.654 17... GL

4994 rows × 13 columns

# read the csvs in the data/police_met_city_2024 folder
import os
import glob
import pandas as pd

# get the csv files in the 2024-01 to 2014-12 folders in the data/police_met_city_2024 folder
path = 'data/police_met_city_2024'
# to get all the csv files in the path, "**/*.csv" means all the csv files in the subfolders; glob means to get the files in the path
all_files = glob.glob(os.path.join(path, "**/*.csv"), recursive=True)
# read the csv files and concatenate them into a single dataframe
df_crime = pd.concat((pd.read_csv(f) for f in all_files), ignore_index=True)
# we can save it to one csv file
# df_crime.to_csv('data/crime_2024_call_police_met_city.csv', index=False)
# check the columns
df_crime.head()
print(len(df_crime))
1149321

Now, we need conbime to df_crime (points) to gdf_lsoa_london (polygons), we can use sjoin to do this (or use the LSOA codes to do this).

# get gdf from the df_crime with x and y coordinates
gdf_crime = gpd.GeoDataFrame(df_crime, geometry=gpd.points_from_xy(df_crime['Longitude'], df_crime['Latitude']), crs='EPSG:4326')
# transform the CRS to EPSG:27700 (British National Grid)
gdf_crime = gdf_crime.to_crs(epsg=27700)
# spatial join the gdf_crime and gdf_lsoa_london
gdf_crime = gpd.sjoin(gdf_crime, gdf_lsoa_london, how='inner', predicate='intersects')

We can find there are 1,149,321 - 1,144,178 = 4,143 dropped as the pts are not intersected with the polygons.

# check the number of rows
len(gdf_crime)
# check the columns
gdf_crime.head()
Crime ID Month Reported by Falls within Longitude Latitude Location LSOA code LSOA name Crime type ... LSOA21NM LSOA21NMW BNG_E BNG_N LAT LONG Shape__Area Shape__Length GlobalID LA
0 b74d06161e2425ef2abf28345aa962fa862753669659d3... 2024-09 City of London Police City of London Police -0.107682 51.517786 On or near B521 E01000917 Camden 027C Robbery ... Camden 027C 531169 181859 51.52043 -0.11080 75063.537949 2405.390222 7205eeed-3cf1-4502-9073-4e8afb75ea58 GL
1 ab9f963604d03486ffbefec80f9f9296c7f65ecd4cd047... 2024-09 City of London Police City of London Police -0.107682 51.517786 On or near B521 E01000917 Camden 027C Violence and sexual offences ... Camden 027C 531169 181859 51.52043 -0.11080 75063.537949 2405.390222 7205eeed-3cf1-4502-9073-4e8afb75ea58 GL
2 7eaad3255f757cf5073dfd3712193f4497e788b3f6519f... 2024-09 City of London Police City of London Police -0.112096 51.515942 On or near Nightclub E01000914 Camden 028B Violence and sexual offences ... Camden 028B 530733 181688 51.51899 -0.11715 448043.096497 4946.142062 9e7739b0-5ee1-4923-ad52-648147b8c13c GL
3 71c1e6edfb73383e82ca6acdc1c9c0ef015334b529f298... 2024-09 City of London Police City of London Police -0.111596 51.518281 On or near Chancery Lane E01000914 Camden 028B Violence and sexual offences ... Camden 028B 530733 181688 51.51899 -0.11715 448043.096497 4946.142062 9e7739b0-5ee1-4923-ad52-648147b8c13c GL
4 06e7c6b6541166a8abe71de2593b9576c5e26215e349e4... 2024-09 City of London Police City of London Police -0.098519 51.517332 On or near Little Britain E01000001 City of London 001A Other theft ... City of London 001A 532123 181632 51.51817 -0.09715 129865.314476 2635.767993 c625aea8-6d73-4b2a-be76-4d5c44cad9f8 GL

5 rows × 26 columns

Spatial and temporal aggregation of the data refers to the process of combining and summarizing data across different geographic areas (spatial) and time periods (temporal):

Spatial Aggregation:

  • Combining data from smaller geographic units into larger areas

  • Example: Aggregating crime incidents from LSOA level to London borough level

Temporal Aggregation:

  • Combining data across different time periods

  • Example: Converting daily crime data into monthly or yearly totals

# here we aggregate the all crime data to LSOA and month level using groupby
df_crime_agg = gdf_crime.groupby(['LSOA21CD', 'Month']).agg({'Crime ID': 'count'}).reset_index().rename(columns={'Crime ID': 'Count'})
# select May 2024 of all crimes for analysis
df_crime_agg_may = df_crime_agg[df_crime_agg['Month'] == '2024-05']
# check the columns
df_crime_agg_may
LSOA21CD Month Count
4 E01000001 2024-05 11
16 E01000002 2024-05 38
28 E01000003 2024-05 13
40 E01000005 2024-05 72
52 E01000006 2024-05 10
... ... ... ...
59503 E01035718 2024-05 156
59515 E01035719 2024-05 9
59527 E01035720 2024-05 3
59539 E01035721 2024-05 61
59551 E01035722 2024-05 7

4965 rows × 3 columns

print('The numbers of LSOAs without crime in May 2024:', len(gdf_lsoa_london) - len(df_crime_agg_may))
The numbers of LSOAs without crime in May 2024: 29
# We use esda to compute the Getis-Ord Gi* for detecting crime hotspots in London.
# first, we need to merge the gdf_crime_agg_may with gdf_lsoa_london to get the geometry
gdf_crime_agg_may = pd.merge(gdf_lsoa_london[['LSOA21CD', 'geometry']], df_crime_agg_may, on='LSOA21CD', how='left')
# fill na
gdf_crime_agg_may = gdf_crime_agg_may.fillna(0)
gdf_crime_agg_may
LSOA21CD geometry Month Count
0 E01000001 POLYGON ((532151.55 181867.437, 532152.512 181... 2024-05 11.0
1 E01000002 POLYGON ((532634.509 181926.019, 532632.06 181... 2024-05 38.0
2 E01000003 POLYGON ((532153.715 182165.158, 532158.262 18... 2024-05 13.0
3 E01000005 POLYGON ((533619.075 181402.367, 533639.881 18... 2024-05 72.0
4 E01000006 POLYGON ((545126.865 184310.841, 545145.226 18... 2024-05 10.0
... ... ... ... ...
4989 E01035718 POLYGON ((528044.961 180617.988, 528046.356 18... 2024-05 156.0
4990 E01035719 POLYGON ((530267.012 178811.303, 530265.762 17... 2024-05 9.0
4991 E01035720 POLYGON ((529856.163 178761.75, 529856.871 178... 2024-05 3.0
4992 E01035721 POLYGON ((528641.829 178630.889, 528635.702 17... 2024-05 61.0
4993 E01035722 POLYGON ((529606.375 178302.448, 529606.654 17... 2024-05 7.0

4994 rows × 4 columns

# compute the spatial weights using the Queen contiguity method
w_lsoa_london = weights.Queen.from_dataframe(gdf_crime_agg_may, use_index='True')
# compute the Getis-Ord Gi* for the crime data
gi = esda.G_Local(gdf_crime_agg_may['Count'].values, w_lsoa_london)
# print the Getis-Ord Gi* values
gi.Zs
array([ 4.8966406 ,  6.05568957,  1.26792881, ..., -0.20360301,
        0.42941597, -0.17028329])
len(gi.Zs)
4994
# plot the distribution of Getis-Ord Gi* using histogram and kde by seaborn
fig, ax = plt.subplots(figsize=(4, 4))
sns.histplot(gi.Zs, kde=True, ax=ax)
ax.set_xlabel('Getis-Ord Gi*')
ax.set_ylabel('Density')
Text(0, 0.5, 'Density')
_images/9223f7a0182d70e8a1cfbb8f78285769526b8acb1536898af93084724767da4c.png

Z-scores: standardized G statistics that follow a normal distribution under spatial randomness.

P values: P-values from randomization/permutation.

# we can generate a df to show all the atrributes of the Getis-Ord Gi*
df_gi = pd.DataFrame({
    'Zs': gi.Zs,  # Z-scores: standardized G statistics that follow a normal distribution under spatial randomness
    'Gs': gi.Gs,  # Raw G statistics: measure of spatial concentration of attribute values (crime counts)
    'p_sim': gi.p_sim  # P-values from randomization
})
df_gi
Zs Gs p_sim
0 4.896641 0.002149 0.001
1 6.055690 0.002611 0.001
2 1.267929 0.000705 0.012
3 3.965206 0.001779 0.001
4 -0.012130 0.000195 0.355
... ... ... ...
4989 1.016448 0.000605 0.006
4990 0.637040 0.000454 0.030
4991 -0.203603 0.000119 0.232
4992 0.429416 0.000371 0.050
4993 -0.170283 0.000133 0.308

4994 rows × 3 columns

You can use any strategies to plot the gi resluts, here we can plot the results to the map with three parts:

  • Hotspots: High positive Z-score (>0) with p-value < 0.05.

  • Coldspots: High negative Z-score (<0) with p-value < 0.05.

  • We usually use blank color to draw the ‘Not Significant cluster’ (p_value \(\geq\) 0.05).

# Add the Z-scores to the GeoDataFrame  
gdf_crime_agg_may['Z_score'] = gi.Zs
gdf_crime_agg_may['p_sim'] = gi.p_sim
# Create a figure and axis
fig, ax = plt.subplots(figsize=(10, 10))
# Plot not significant areas (p_sim >= 0.05)
gdf_crime_agg_may[gdf_crime_agg_may['p_sim'] >= 0.05].plot(
    ax=ax, color='white', edgecolor='black', linewidth=0.05)
# Plot cold spots (negative Z-scores and p_sim < 0.05)
gdf_crime_agg_may[(gdf_crime_agg_may['Z_score'] < 0) & (gdf_crime_agg_may['p_sim'] < 0.05)].plot(
    column='Z_score',
    ax=ax,
    cmap='Blues',
    legend=True,
    legend_kwds={'shrink': 0.35, 'orientation': 'vertical', 'label': 'Negative Z-scores',},
    linewidth=0.05)
# Plot hot spots (positive Z-scores and p_sim < 0.05)
gdf_crime_agg_may[(gdf_crime_agg_may['Z_score'] > 0) & (gdf_crime_agg_may['p_sim'] < 0.05)].plot(
    column='Z_score',
    ax=ax,
    cmap='Reds',
    legend=True,
    legend_kwds={'shrink': 0.35, 'orientation': 'vertical', 'label': 'Positive Z-scores'},
    linewidth=0.05)
ax.axis('off')
plt.title('Crime Hotspots and Coldspots in London LSOAs (Getis-Ord Gi*)\nMay 2024')
plt.show()
_images/832726fc94e96a5ed439200fd44704b0396b6c49c6d27d605278996cc869fef1.png

4.4 Kernel density estimation (KDE) analysis#

Kernel Density Analysis (KDE) is a spatial analysis technique used to estimate the density of features (usually points or lines) across a surface. It creates a smooth, continuous surface that highlights areas of high and low concentration, often referred to as hotspots.

KDE function:

\[ \hat{f}(x) = \frac{1}{n h^2} \sum_{i=1}^{n} K\left(\frac{\|x - x_i\|}{h}\right) \]
  • \( \hat{f}(x) \) is the estimated density at location \( x \), how “intense” the points are around location \(x\) (i.e., the ‘grid cell’ center or pixel).

  • \( n \) is the total number of observations.

  • \( h \) is the bandwidth (smoothing parameter). It controls the “spread” of the kernel. Larger \(h\) = smoother surface; smaller \(h\) = more detailed, but possibly noisy.

  • \( h^2\): Since we are in 2D space, the area scaling is by \( h^2 \). In 1D, it’s just \(h\).

  • \( x \): Target location where we estimate the density (i.e., the ‘grid cell’ center or pixel).

  • \( x_i \): observed data point location.

  • \( \|x - x_i\| \): Euclidean distance between location \(x\) and \(x_i\).

  • \( K \) is the kernel function (e.g., Gaussian), which defines the shape of the influence of each point.

Gaussian KDE:

\[ K(u) = \frac{1}{2\pi} e^{-\frac{1}{2} u^2} \]
  • \(u\): \(u = \|x - x_i\|\);

  • \(e\): Euler’s number (~2.718), base of the natural exponential;

  • \(e^{-\frac{1}{2} u^2}\): Gives the Gaussian bell curve shape;

  • \(\frac{1}{2\pi}\): Normalization constant for 2D Gaussian;

gdf_crime.Month.unique()
array(['2024-09', '2024-07', '2024-06', '2024-01', '2024-08', '2024-12',
       '2024-04', '2024-03', '2024-02', '2024-05', '2024-11', '2024-10'],
      dtype=object)
gdf_crime_london_may = gdf_crime[gdf_crime['Month'] == '2024-05']
# transfer to web mercator
gdf_crime_london_may = gdf_crime_london_may.to_crs(epsg=3857)
gdf_crime_london_may
Crime ID Month Reported by Falls within Longitude Latitude Location LSOA code LSOA name Crime type ... LSOA21NM LSOA21NMW BNG_E BNG_N LAT LONG Shape__Area Shape__Length GlobalID LA
850782 eebfcc999f8ab6654adc6bc681e9ea2cd91c5026f895eb... 2024-05 City of London Police City of London Police -0.106345 51.520463 On or near Further/Higher Educational Building E01000916 Camden 027B Theft from the person ... Camden 027B 531300 181907 51.52083 -0.10890 147250.335907 2272.325316 63388f71-c821-452f-bb92-dc8f74f031f3 GL
850783 67cbe410e90ad30207c925457c4d223e890f49dc7b394e... 2024-05 City of London Police City of London Police -0.106220 51.518275 On or near B500 E01000916 Camden 027B Theft from the person ... Camden 027B 531300 181907 51.52083 -0.10890 147250.335907 2272.325316 63388f71-c821-452f-bb92-dc8f74f031f3 GL
850784 a86f33f326d3d9a1f8c3972520e11717766029031efb90... 2024-05 City of London Police City of London Police -0.107682 51.517786 On or near B521 E01000917 Camden 027C Drugs ... Camden 027C 531169 181859 51.52043 -0.11080 75063.537949 2405.390222 7205eeed-3cf1-4502-9073-4e8afb75ea58 GL
850785 7a0554ddf2546c70556b59f8f7ca33099c3bd88fea79c7... 2024-05 City of London Police City of London Police -0.107682 51.517786 On or near B521 E01000917 Camden 027C Other theft ... Camden 027C 531169 181859 51.52043 -0.11080 75063.537949 2405.390222 7205eeed-3cf1-4502-9073-4e8afb75ea58 GL
850786 NaN 2024-05 City of London Police City of London Police -0.111596 51.518281 On or near Chancery Lane E01000914 Camden 028B Anti-social behaviour ... Camden 028B 530733 181688 51.51899 -0.11715 448043.096497 4946.142062 9e7739b0-5ee1-4923-ad52-648147b8c13c GL
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
948781 26f6034c0c51ca869a66a660a4cd3d9c548c6a48f5f7c6... 2024-05 Metropolitan Police Service Metropolitan Police Service -0.135320 51.487600 On or near St George'S Square E01035722 Westminster 024G Other theft ... Westminster 024G 529546 178076 51.48681 -0.13557 122249.696388 2211.190430 fd9bcb9b-a2e1-4381-a473-72bc1b5cf0c9 GL
948782 ac16503056d8b26a3c0b4f407b7a632d8a4a2bee435838... 2024-05 Metropolitan Police Service Metropolitan Police Service -0.136284 51.485152 On or near Petrol Station E01035722 Westminster 024G Other theft ... Westminster 024G 529546 178076 51.48681 -0.13557 122249.696388 2211.190430 fd9bcb9b-a2e1-4381-a473-72bc1b5cf0c9 GL
948783 88bf1a5d62b5201749e67773eb1108c13971d6f8067b86... 2024-05 Metropolitan Police Service Metropolitan Police Service -0.135320 51.487600 On or near St George'S Square E01035722 Westminster 024G Robbery ... Westminster 024G 529546 178076 51.48681 -0.13557 122249.696388 2211.190430 fd9bcb9b-a2e1-4381-a473-72bc1b5cf0c9 GL
948784 3598323197468c36768d7a72a8015c721ec1a65fbd1bdf... 2024-05 Metropolitan Police Service Metropolitan Police Service -0.136284 51.485152 On or near Petrol Station E01035722 Westminster 024G Theft from the person ... Westminster 024G 529546 178076 51.48681 -0.13557 122249.696388 2211.190430 fd9bcb9b-a2e1-4381-a473-72bc1b5cf0c9 GL
948785 19f7be6ed6c32479c4d138650ba5d19ee5c76144a390f1... 2024-05 Metropolitan Police Service Metropolitan Police Service -0.137230 51.486309 On or near Parking Area E01035722 Westminster 024G Violence and sexual offences ... Westminster 024G 529546 178076 51.48681 -0.13557 122249.696388 2211.190430 fd9bcb9b-a2e1-4381-a473-72bc1b5cf0c9 GL

97636 rows × 26 columns

fig, ax = plt.subplots(figsize=(8, 8))
gdf_crime_london_may.plot(ax=ax, markersize=0.03)
cx.add_basemap(ax, crs="EPSG:3857", source=cx.providers.CartoDB.Positron, zoom=10)
ax.axis('off')
plt.show()
_images/c8d9b2fc6430fe746d4a874d7713a347e37549a15dcb265c254381bea1758dbf.png

We use seaborn.kdeplot to plot the KDE analysis. in kedplot, it inherently use the gaussian_kde() in SciPy to caculate KDE, then it evaluates the KDE on the grid created.

fig, ax = plt.subplots(figsize=(12, 8))
kde = sns.kdeplot(ax=ax, x=gdf_crime_london_may.geometry.x,
            y=gdf_crime_london_may.geometry.y,
            n_levels=50, # how many contour lines or shaded bands are drawn in a 2D KDE plot
            shade=True,
            alpha=0.5,
            cmap="viridis_r",
            bw_adjust=0.6, # Adjust the bandwidth (lower values = narrower kernel)
            gridsize=300, # Number of points on each dimension of the evaluation grid.
             )
# Add colorbar manually
cbar = plt.colorbar(kde.collections[0], ax=ax, shrink=0.75)
cbar.set_label("KDE")
cx.add_basemap(ax, crs="EPSG:3857", source=cx.providers.CartoDB.Positron, zoom=10)
ax.axis('off')
plt.show()
_images/272e328a0aede18fe600881216b4f09d321b8c059aab77d90b198f5d8e89d05c.png

np.mgrid is a mesh-grid generator in NumPy. It creates coordinate matrices (grids) from coordinate vectors, typically used for evaluating functions over a 2D area.

from scipy.stats import gaussian_kde
import numpy as np
# Extract coordinates
x = gdf_crime_london_may.geometry.x
y = gdf_crime_london_may.geometry.y
point_xy = np.vstack([x, y]) # vertical stack in NumPy, It takes a sequence of arrays and stacks them vertically (row-wise) to make a new 2D array.

# Compute KDE
ke = gaussian_kde(point_xy)
# Create a grid over the area
xmin, xmax = x.min(), x.max()
ymin, ymax = y.min(), y.max()
#xmin:xmax:300j means: create 300 points evenly spaced from xmin to xmax.
#ymin:ymax:300j means: create 300 points evenly spaced from ymin to ymax.
#The j here is not the imaginary unit, but a special notation in NumPy that means number of points rather than step size.
xx, yy = np.mgrid[xmin:xmax:300j, ymin:ymax:300j]
# .ravel() is a NumPy method that flattens the array into a 1D array by stacking all the rows
grid_coords = np.vstack([xx.ravel(), yy.ravel()])
# Evaluate KDE on created grid
zz = ke(grid_coords).reshape(xx.shape)

# Plot
fig, ax = plt.subplots(figsize=(8, 8))
# zz.T transpose of the array
im = ax.imshow(zz.T, extent=[xmin, xmax, ymin, ymax], origin='lower', cmap='viridis_r', alpha=0.8)
plt.colorbar(im, label='Density', shrink=0.6)
ax.set_axis_off()
plt.show()
_images/da5083a7991e66def2db148f9877f8aabc2a4a0047c52318a89926067a32de3c.png

Additional source:

KDE can be calculated by SKlearn.

Folium also provides Heatmap Plugins to create the interactive heat maps.

Quiz#

  • Q1: Identify all Lower Layer Super Output Areas (LSOAs) within “Kingston upon Hull”. Output relevant information including a map visualization and the total number of LSOAs in Hull. Then, compute the spatial weights matrix (Queen method) for these LSOAs. (UK LSOAs data can be found in data/Lower_layer_Super_Output_Areas_(December_2021)_Boundaries_EW_BFC_(V10).geojson)

  • Q2: Obtain crime data from the 2024 Humberside Police open dataset folder (data/police_humberside_2024). Use the 2024-07 all crime data type for analysis. Perform a spatial join to associate all crime incidents with corresponding LSOA. Aggregating the crime counts to the LSOA level using the total amount of crime as the measurement variable.

  • Q3: Calculate and output Global Moran’s I (LSOA level, not the point-level).

  • Q4: Creating hostpot maps using Local Moran’s I and Getis-Ord Gi* statistics.

Q1 solution

gdf_lsoa_uk = gpd.read_file('data/Lower_layer_Super_Output_Areas_(December_2021)_Boundaries_EW_BFC_(V10).geojson')
gdf_lsoa_uk.head()
FID LSOA21CD LSOA21NM LSOA21NMW BNG_E BNG_N LAT LONG Shape__Area Shape__Length GlobalID geometry
0 1 E01000001 City of London 001A 532123 181632 51.51817 -0.097150 129865.314476 2635.767993 c625aea8-6d73-4b2a-be76-4d5c44cad9f8 POLYGON ((-0.09665 51.52028, -0.09663 51.52025...
1 2 E01000002 City of London 001B 532480 181715 51.51883 -0.091970 228419.782242 2707.816821 52c878e9-ac68-4886-b4a8-fea9cd241a70 POLYGON ((-0.08967 51.52069, -0.08971 51.52058...
2 3 E01000003 City of London 001C 532239 182033 51.52174 -0.095330 59054.204697 1224.573160 b9d8faca-d489-478d-8ce6-acaf76186d7d POLYGON ((-0.0965 51.52295, -0.09644 51.52282,...
3 4 E01000005 City of London 001E 533581 181283 51.51469 -0.076280 189577.709503 2275.805344 15e1417d-537c-4845-9820-fc7596bd59b0 POLYGON ((-0.07568 51.51575, -0.07539 51.51555...
4 5 E01000006 Barking and Dagenham 016A 544994 184274 51.53875 0.089317 146536.995750 1966.092607 8a6c4ee0-c0ff-4736-9cfa-fb12a6d50da0 POLYGON ((0.09125 51.53905, 0.0915 51.5389, 0....
# select the LSOAs in Kingston upon Hull
gdf_lsoa_hull = gdf_lsoa_uk[gdf_lsoa_uk['LSOA21NM'].str.contains('Kingston upon Hull')]
# transfer the crs
gdf_lsoa_hull = gdf_lsoa_hull.to_crs(epsg=27700)
print(len(gdf_lsoa_hull))
gdf_lsoa_hull.plot()
168
<Axes: >
_images/6926dfa28af29e150b9874aba78d37bd0b1390a80a2eb86df6952ee5d03ca575.png
gdf_lsoa_hull.head()
FID LSOA21CD LSOA21NM LSOA21NMW BNG_E BNG_N LAT LONG Shape__Area Shape__Length GlobalID geometry
12115 12116 E01012756 Kingston upon Hull 025A 507367 430316 53.75817 -0.37294 198939.277603 3497.174098 b1f95abf-436c-4f5a-b6e1-93be9e0eb24d POLYGON ((507433.012 430447.008, 507438.614 43...
12116 12117 E01012757 Kingston upon Hull 025B 508017 429519 53.75088 -0.36336 318088.364120 3716.172876 5f00b0d5-45c1-4529-9e08-881c4c58834d POLYGON ((508039.325 430021.467, 508047.639 42...
12117 12118 E01012758 Kingston upon Hull 018A 507706 430320 53.75814 -0.36780 311920.300018 3772.447527 94cd85a6-b982-4038-9625-333de4b818df POLYGON ((507791.196 430453.312, 507795.719 43...
12118 12119 E01012759 Kingston upon Hull 025C 507184 429662 53.75233 -0.37594 398184.936035 3984.903843 1516f69d-ec4e-4e2b-8f3d-81d5f680f631 POLYGON ((507088.105 429992.978, 507088.215 42...
12119 12120 E01012760 Kingston upon Hull 025D 507714 429795 53.75342 -0.36786 126002.444229 2081.849577 2e8144b1-63d2-4100-b2a5-a1fe09759513 POLYGON ((507913.848 429950.94, 507917.695 429...
# reindex
gdf_lsoa_hull.index = range(len(gdf_lsoa_hull))
# calculate the spatial weights using the Queen contiguity method
w_lsoa_hull = weights.Queen.from_dataframe(gdf_lsoa_hull, use_index='True')
# print the spatial weights as df
df_w_lsoa_hull = w_lsoa_hull.to_adjlist(drop_islands=True)
df_w_lsoa_hull
focal neighbor weight
0 0 2 1.0
1 0 3 1.0
2 0 4 1.0
3 0 35 1.0
4 0 37 1.0
... ... ... ...
887 166 167 1.0
888 167 71 1.0
889 167 156 1.0
890 167 165 1.0
891 167 166 1.0

892 rows × 3 columns

Q2 solution

df_crime_hu_2024 = pd.read_csv('data/police_humberside_2024/2024-07/2024-07-humberside-street.csv')
df_crime_hu_2024
Crime ID Month Reported by Falls within Longitude Latitude Location LSOA code LSOA name Crime type Last outcome category Context
0 db0f44d9523f2ec3713b89dc85de32390d7846d4857b18... 2024-07 Humberside Police Humberside Police -2.268891 53.548199 On or near Rothay Close E01004941 Bury 021A Public order Investigation complete; no suspect identified NaN
1 36be51a29a5b2e9603ef61c7f0915f7a877d94082f418f... 2024-07 Humberside Police Humberside Police -2.290840 53.520355 On or near Venwood Road E01005031 Bury 025A Public order Court result unavailable NaN
2 1b0c316c8c3a4c0f75592f16d4672e24fc4c977eab7ae3... 2024-07 Humberside Police Humberside Police -1.924643 54.544159 On or near Shopping Area E01020854 County Durham 067F Violence and sexual offences Unable to prosecute suspect NaN
3 22c74bdf8fce39c14cd76207bae4fef5719cfb4658bc6f... 2024-07 Humberside Police Humberside Police -0.183283 51.117402 On or near Exchange Road E01031585 Crawley 004C Violence and sexual offences Status update unavailable NaN
4 b6f7155c48ca46c57cd20afcc5c6bb07a1ce1d42618415... 2024-07 Humberside Police Humberside Police -0.915717 53.571778 On or near Low Levels Bank E01007559 Doncaster 008D Vehicle crime Investigation complete; no suspect identified NaN
... ... ... ... ... ... ... ... ... ... ... ... ...
9105 31b720576e28ed22c8ce7d88459354c16bd69203ecc8d5... 2024-07 Humberside Police Humberside Police -0.786901 53.417548 On or near Southlands Drive E01026408 West Lindsey 002F Violence and sexual offences Unable to prosecute suspect NaN
9106 e03bd02394c757448d696e29eef78c95aab5d37758b808... 2024-07 Humberside Police Humberside Police -0.783777 53.406801 On or near Mercer Road E01026378 West Lindsey 004A Violence and sexual offences Court result unavailable NaN
9107 5eb623d7120422789578385c41c2122e7a668dfca83f0b... 2024-07 Humberside Police Humberside Police -0.774953 53.391906 On or near Petrol Station E01026383 West Lindsey 004E Violence and sexual offences Unable to prosecute suspect NaN
9108 61567f9efb26eac00b89587559c1c1473824198b48957a... 2024-07 Humberside Police Humberside Police -0.600770 53.407881 On or near Dawnhill Lane E01026385 West Lindsey 005A Violence and sexual offences Unable to prosecute suspect NaN
9109 3b6096ad0ef42e9b924002113e208a910aae54f1dcb91d... 2024-07 Humberside Police Humberside Police -0.563082 53.393698 On or near Creampoke Crescent E01026386 West Lindsey 005B Vehicle crime Investigation complete; no suspect identified NaN

9110 rows × 12 columns

# create the gdf from the df_crime_hu_2024 with x and y coordinates
gdf_crime_hu = gpd.GeoDataFrame(df_crime_hu_2024, geometry=gpd.points_from_xy(df_crime_hu_2024['Longitude'],
                                                                              df_crime_hu_2024['Latitude']), crs='EPSG:4326')
# transform the CRS to EPSG:27700 (British National Grid)
gdf_crime_hu = gdf_crime_hu.to_crs(epsg=27700)
gdf_crime_hu
Crime ID Month Reported by Falls within Longitude Latitude Location LSOA code LSOA name Crime type Last outcome category Context geometry
0 db0f44d9523f2ec3713b89dc85de32390d7846d4857b18... 2024-07 Humberside Police Humberside Police -2.268891 53.548199 On or near Rothay Close E01004941 Bury 021A Public order Investigation complete; no suspect identified NaN POINT (382280.834 405761.213)
1 36be51a29a5b2e9603ef61c7f0915f7a877d94082f418f... 2024-07 Humberside Police Humberside Police -2.290840 53.520355 On or near Venwood Road E01005031 Bury 025A Public order Court result unavailable NaN POINT (380813.893 402669.101)
2 1b0c316c8c3a4c0f75592f16d4672e24fc4c977eab7ae3... 2024-07 Humberside Police Humberside Police -1.924643 54.544159 On or near Shopping Area E01020854 County Durham 067F Violence and sexual offences Unable to prosecute suspect NaN POINT (404973.562 516546.025)
3 22c74bdf8fce39c14cd76207bae4fef5719cfb4658bc6f... 2024-07 Humberside Police Humberside Police -0.183283 51.117402 On or near Exchange Road E01031585 Crawley 004C Violence and sexual offences Status update unavailable NaN POINT (527250.11 136915.397)
4 b6f7155c48ca46c57cd20afcc5c6bb07a1ce1d42618415... 2024-07 Humberside Police Humberside Police -0.915717 53.571778 On or near Low Levels Bank E01007559 Doncaster 008D Vehicle crime Investigation complete; no suspect identified NaN POINT (471900.336 408896.718)
... ... ... ... ... ... ... ... ... ... ... ... ... ...
9105 31b720576e28ed22c8ce7d88459354c16bd69203ecc8d5... 2024-07 Humberside Police Humberside Police -0.786901 53.417548 On or near Southlands Drive E01026408 West Lindsey 002F Violence and sexual offences Unable to prosecute suspect NaN POINT (480722.247 391876.638)
9106 e03bd02394c757448d696e29eef78c95aab5d37758b808... 2024-07 Humberside Police Humberside Police -0.783777 53.406801 On or near Mercer Road E01026378 West Lindsey 004A Violence and sexual offences Court result unavailable NaN POINT (480950.184 390684.609)
9107 5eb623d7120422789578385c41c2122e7a668dfca83f0b... 2024-07 Humberside Police Humberside Police -0.774953 53.391906 On or near Petrol Station E01026383 West Lindsey 004E Violence and sexual offences Unable to prosecute suspect NaN POINT (481565.151 389037.635)
9108 61567f9efb26eac00b89587559c1c1473824198b48957a... 2024-07 Humberside Police Humberside Police -0.600770 53.407881 On or near Dawnhill Lane E01026385 West Lindsey 005A Violence and sexual offences Unable to prosecute suspect NaN POINT (493113.206 391027.523)
9109 3b6096ad0ef42e9b924002113e208a910aae54f1dcb91d... 2024-07 Humberside Police Humberside Police -0.563082 53.393698 On or near Creampoke Crescent E01026386 West Lindsey 005B Vehicle crime Investigation complete; no suspect identified NaN POINT (495650.172 389499.55)

9110 rows × 13 columns

# spatial join the gdf_crime_hu and gdf_lsoa_hull to get the LSOA index for each crime incident (we don't use the 'LSOA codes' in the df_crime_hu_2024)
gdf_crime_hu_lsoa = gpd.sjoin(gdf_crime_hu, gdf_lsoa_hull, how='inner', predicate='within')
gdf_crime_hu_lsoa
Crime ID Month Reported by Falls within Longitude Latitude Location LSOA code LSOA name Crime type ... LSOA21CD LSOA21NM LSOA21NMW BNG_E BNG_N LAT LONG Shape__Area Shape__Length GlobalID
2194 NaN 2024-07 Humberside Police Humberside Police -0.319102 53.795305 On or near Finningley Garth E01012782 Kingston upon Hull 002A Anti-social behaviour ... E01012782 Kingston upon Hull 002A 510564 434676 53.79667 -0.32291 295014.267838 2717.829437 4bac5250-9916-4da1-87ac-00c02f2d47b4
2195 2914efb412b1e124dc12b379fe395d457d11b998690d24... 2024-07 Humberside Police Humberside Police -0.325162 53.797160 On or near Abingdon Garth E01012782 Kingston upon Hull 002A Bicycle theft ... E01012782 Kingston upon Hull 002A 510564 434676 53.79667 -0.32291 295014.267838 2717.829437 4bac5250-9916-4da1-87ac-00c02f2d47b4
2196 f8c053aed0021740e7e17f577414c52eff339a3fe85a80... 2024-07 Humberside Police Humberside Police -0.320579 53.796719 On or near Digby Garth E01012782 Kingston upon Hull 002A Burglary ... E01012782 Kingston upon Hull 002A 510564 434676 53.79667 -0.32291 295014.267838 2717.829437 4bac5250-9916-4da1-87ac-00c02f2d47b4
2197 70598a4effd9969de667c33037522cd7d3209c03b8e308... 2024-07 Humberside Police Humberside Police -0.325162 53.797160 On or near Abingdon Garth E01012782 Kingston upon Hull 002A Criminal damage and arson ... E01012782 Kingston upon Hull 002A 510564 434676 53.79667 -0.32291 295014.267838 2717.829437 4bac5250-9916-4da1-87ac-00c02f2d47b4
2198 f6632e2e3eed766b50dd277da0aa143e0fec44abbbf554... 2024-07 Humberside Police Humberside Police -0.325162 53.797160 On or near Abingdon Garth E01012782 Kingston upon Hull 002A Other theft ... E01012782 Kingston upon Hull 002A 510564 434676 53.79667 -0.32291 295014.267838 2717.829437 4bac5250-9916-4da1-87ac-00c02f2d47b4
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5650 a43df72a23c5e5fc6944c3769ed323317c1879693b6571... 2024-07 Humberside Police Humberside Police -0.347889 53.795045 On or near Shopping Area E01033105 Kingston upon Hull 035D Other crime ... E01033105 Kingston upon Hull 035D 509270 434654 53.79675 -0.34256 407379.168350 3953.907070 d4f9f0ce-a99e-47be-995b-5af0f302b061
5651 1f5ce2f185ded208a8e75f9a6a79de96007917d15fb824... 2024-07 Humberside Police Humberside Police -0.355095 53.793485 On or near Raich Carter Way E01033107 Kingston upon Hull 035E Bicycle theft ... E01033107 Kingston upon Hull 035E 508618 434547 53.79592 -0.35249 392191.723434 3331.231133 f698240e-05d9-4a6a-8c12-ba6f537ddef1
5652 37172c13e7e745ddd24c1e82f94205856c60dba94be169... 2024-07 Humberside Police Humberside Police -0.355095 53.793485 On or near Raich Carter Way E01033107 Kingston upon Hull 035E Bicycle theft ... E01033107 Kingston upon Hull 035E 508618 434547 53.79592 -0.35249 392191.723434 3331.231133 f698240e-05d9-4a6a-8c12-ba6f537ddef1
5653 556d2c014b63fedb8ce14666a9700113625ad882f95a90... 2024-07 Humberside Police Humberside Police -0.348154 53.798576 On or near Halecroft Park E01033107 Kingston upon Hull 035E Violence and sexual offences ... E01033107 Kingston upon Hull 035E 508618 434547 53.79592 -0.35249 392191.723434 3331.231133 f698240e-05d9-4a6a-8c12-ba6f537ddef1
5654 4a15b8fdf636482baa1bb829c9fcc08cf76b5cc9f2e323... 2024-07 Humberside Police Humberside Police -0.350433 53.795452 On or near Runnymede Way E01033107 Kingston upon Hull 035E Other crime ... E01033107 Kingston upon Hull 035E 508618 434547 53.79592 -0.35249 392191.723434 3331.231133 f698240e-05d9-4a6a-8c12-ba6f537ddef1

3461 rows × 25 columns

gdf_crime_hu_lsoa.plot()
<Axes: >
_images/dfada394c89cf425e321656ad8e31806f90541efa0137a0e31d82112b728dd1b.png
df_crime_hull_agg = gdf_crime_hu_lsoa.groupby(['LSOA21CD', 'Month']).agg({'Crime ID': 'count'}).reset_index().rename(columns={'Crime ID': 'Numbers'})
df_crime_hull_agg
LSOA21CD Month Numbers
0 E01012756 2024-07 13
1 E01012757 2024-07 20
2 E01012758 2024-07 6
3 E01012759 2024-07 31
4 E01012760 2024-07 7
... ... ... ...
163 E01035468 2024-07 10
164 E01035469 2024-07 6
165 E01035470 2024-07 12
166 E01035471 2024-07 9
167 E01035472 2024-07 4

168 rows × 3 columns

# set the geometry to the df_crime_hull_agg
gdf_crime_hull_agg = pd.merge(gdf_lsoa_hull, df_crime_hull_agg, on='LSOA21CD', how='left')
fig, ax = plt.subplots(figsize=(8, 8))
# plot the crime data
gdf_crime_hull_agg.plot(ax=ax, column='Numbers', cmap='Reds', edgecolor='black', linewidth=0.05, legend=True)
ax.set_title('Crime Numbers in Kingston upon Hull LSOAs (July 2024)')
ax.axis('off')
plt.show()
_images/0aed338197460526cceaf4448713f15b4c365d9bf7fb602b27a78e3a05146c5a.png

Q3 solution

# caculate the global moran's I
moran = esda.Moran(gdf_crime_hull_agg['Numbers'].values, w_lsoa_hull)
print(moran.I, moran.p_sim)
0.2315067482911263 0.001

Q4 solution

# compute the local moran's I for the crime data
local_moran = esda.Moran_Local(gdf_crime_hull_agg['Numbers'].values, w_lsoa_hull)
# plot the local moran's I
gdf_crime_hull_agg['local_moran'] = local_moran.Is
gdf_crime_hull_agg['local_moran_p'] = local_moran.p_sim
fig, ax = plt.subplots(figsize=(8, 8))
gdf_crime_hull_agg[gdf_crime_hull_agg['local_moran_p'] < 0.05].plot(ax=ax, column='local_moran',
                                                                    cmap='coolwarm', edgecolor='black',
                                                                    linewidth=0.3, legend=True,
                                                                    vmin=-max(local_moran.Is), vmax=max(local_moran.Is),
                                                                    legend_kwds={'shrink': 0.5, 'orientation': 'vertical', 'label': 'Local Moran\'s I'})
# plot the not significant areas
gdf_crime_hull_agg[gdf_crime_hull_agg['local_moran_p'] >= 0.05].plot(ax=ax, color='white', edgecolor='black', linewidth=0.3)
ax.set_title('Local Moran\'s I for Crime Numbers in Kingston upon Hull LSOAs (July 2024)')
ax.axis('off')
(503532.6564838799, 516643.2624241113, 425425.5469577427, 437061.9176401592)
_images/357d97a56646c5cc080bc95625e2934f5ebf78bdb3e2aaa74feb4b65caa27844.png
# compute the Getis-Ord Gi* for the crime data
gi = esda.G_Local(gdf_crime_hull_agg['Numbers'].values, w_lsoa_hull)
# add the Getis-Ord Gi* to the gdf_crime_hull_agg
gdf_crime_hull_agg['gi'] = gi.Zs
gdf_crime_hull_agg['gi_p'] = gi.p_sim
# Create a figure and axis
fig, ax = plt.subplots(figsize=(8, 8))

# Plot not significant areas (p_sim >= 0.05)  
gdf_crime_hull_agg[gdf_crime_hull_agg['gi_p'] >= 0.05].plot(
    ax=ax, color='white', edgecolor='black', linewidth=0.15)

# Plot cold spots (negative Z-scores and p_sim < 0.05)
gdf_crime_hull_agg[(gdf_crime_hull_agg['gi'] < 0) & (gdf_crime_hull_agg['gi_p'] < 0.05)].plot(
    column='gi',
    ax=ax,
    cmap='Blues',
    legend=True,
    legend_kwds={'shrink': 0.35, 'orientation': 'vertical', 'label': 'Cold Spots (Z<0)'},
    linewidth=0.15)

# Plot hot spots (positive Z-scores and p_sim < 0.05)  
gdf_crime_hull_agg[(gdf_crime_hull_agg['gi'] > 0) & (gdf_crime_hull_agg['gi_p'] < 0.05)].plot(
    column='gi',
    ax=ax,
    cmap='Reds',
    legend=True,
    legend_kwds={'shrink': 0.35, 'orientation': 'vertical', 'label': 'Hot Spots (Z>0)'},
    linewidth=0.15)

ax.axis('off')
plt.title('Crime Hotspots and Coldspots in Kingston upon Hull LSOAs (Getis-Ord Gi*)')
plt.show()
_images/da86cbb952e8cbb12c8886e9d4e222eafa67a917c31c219fa6f5adb3e82cbd0a.png