Week 10 Lab Tutorial Timeline

| Time | Activity |
|---|---|
| 16:00–16:05 | Introduction — Overview of the main tasks for the lab tutorials |
| 16:05–16:45 | Tutorial: Machine learning for spatio-temporal analysis — Follow Sections 10.1–10.2 of the Jupyter Notebook to practice ANNs and RNNs |
| 16:45–17:30 | Tutorial: Machine learning for spatio-temporal analysis — Follow Sections 10.3–10.4 of the Jupyter Notebook to practice LSTMs and CNNs |
| 17:30–17:55 | Quiz — Complete the quiz tasks |
| 17:55–18:00 | Wrap-up — Recap key points and address final questions |
For this module’s lab tutorials, you can download all the required data using the provided link (click).
Please make sure that the Jupyter Notebook file and the data and img folders are placed in the same directory (specifically within the STBDA_lab folder) to ensure the code runs correctly.
Week 10 Key Takeaways:
- Understand the basics of deep learning and its applications in spatio-temporal analysis.
- Learn how to build and train Artificial Neural Networks (ANNs) for regression tasks.
- Explore Recurrent Neural Networks (RNNs) for sequential data analysis.
- Implement Long Short-Term Memory (LSTM) networks for time series prediction.
- Gain insights into Convolutional Neural Networks (CNNs) and their applications in spatio-temporal data analysis.
- Practice building and evaluating deep learning models using TensorFlow and Keras.
10 Deep learning for spatio-temporal analysis#
What is deep learning? Deep learning is a subset of machine learning that uses neural networks with many layers (deep architectures) to learn complex patterns in large datasets. It is particularly effective for tasks such as image and speech recognition, natural language processing, and spatio-temporal analysis.
Deep learning models can automatically learn features from raw data, eliminating the need for manual feature engineering. They are capable of handling high-dimensional data and can generalize well to unseen data, making them powerful tools for various applications.
What are neural networks? Neural networks are computational models inspired by the structure and function of the human brain. They consist of interconnected nodes (neurons) organized in layers, where each layer transforms the input data through weighted connections and activation functions. Neural networks learn complex relationships in data by adjusting the weights based on the error between predicted and actual outputs during training.
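For intuition, here is a minimal sketch (an illustrative addition, not part of the lab code) of what a single artificial neuron computes: a weighted sum of its inputs plus a bias, passed through an activation function.

import numpy as np

# A single neuron: output = activation(w · x + b)
x = np.array([0.5, -1.2, 3.0])   # input features
w = np.array([0.8, 0.1, -0.4])   # weights (learned during training)
b = 0.2                          # bias (learned during training)

z = np.dot(w, x) + b             # weighted sum of the inputs
output = max(0.0, z)             # ReLU activation introduces non-linearity
print("weighted sum:", z, "neuron output:", output)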
The data structures in deep learning
The data structures used in deep learning are often referred to as tensors. Tensors are multi-dimensional arrays that can represent data in various forms, such as scalars, vectors, matrices, and higher-dimensional arrays.
| Term | Description | Corresponding Array Dimensionality | Example |
|---|---|---|---|
| Scalar | A single number (magnitude only) | 0D tensor (no dimensions) | 5 |
| Vector | A list of numbers (magnitude + direction) | 1D tensor | [1, 2, 3] |
| Matrix | A table of numbers (2D grid) | 2D tensor | [[1, 2], [3, 4]] |
| Tensor | Generalization to more than 2 dimensions | 3D and higher tensors | 3D array of shape (2, 3, 4) |
Reshaping is a common operation in deep learning that allows you to change the shape of a tensor without changing its data. It is often used to prepare data for input into neural networks or to manipulate the output of a model.
# 1D array to 2D array
import numpy as np
# Create a 1D array
array_1d = np.array([1, 2, 3, 4, 5])
print("1D Array:", array_1d)
print("Shape of 1D Array:", array_1d.shape)
1D Array: [1 2 3 4 5]
Shape of 1D Array: (5,)
# Reshape to 2D array with 5 rows and 1 column
array_2d = array_1d.reshape(5, 1)
print("Reshaped 2D Array:\n", array_2d)
print("Shape of Reshaped 2D Array:", array_2d.shape)
Reshaped 2D Array:
[[1]
[2]
[3]
[4]
[5]]
Shape of Reshaped 2D Array: (5, 1)
# 2D array to 3D array
# Create a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D Array:\n", array_2d)
print("Shape of 2D Array:", array_2d.shape)
2D Array:
[[1 2 3]
[4 5 6]]
Shape of 2D Array: (2, 3)
# Reshape to a 3D array with shape (2, 1, 3)
array_3d = array_2d.reshape(2, 1, 3)
print("Reshaped 3D Array:\n", array_3d)
print("Shape of Reshaped 3D Array:", array_3d.shape)
Reshaped 3D Array:
[[[1 2 3]]
[[4 5 6]]]
Shape of Reshaped 3D Array: (2, 1, 3)
# select the element at index [0, 0, 2] (first block, first row, third value)
selected_value = array_3d[0, 0, 2]
print("Selected Value from 3D Array:", selected_value)
Selected Value from 3D Array: 3
# select the first block's first row (all values along the last axis)
selected_row = array_3d[0, 0, :]
print("Selected Row from 3D Array:", selected_row)
Selected Row from 3D Array: [1 2 3]
# build a 3D array with shape (2, 3, 4)
array_3d = np.array([[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
[[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]])
print("3D Array:\n", array_3d)
print("Shape of 3D Array:", array_3d.shape)
3D Array:
[[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
[[13 14 15 16]
[17 18 19 20]
[21 22 23 24]]]
Shape of 3D Array: (2, 3, 4)
In TensorFlow, a tensor is the core data structure — it represents multi-dimensional arrays just like NumPy arrays, but with more capabilities like GPU support, gradient tracking, and symbolic computation.
We can use the tf.constant function to create tensors in TensorFlow. Tensors can have different dimensions, such as scalars (0D), vectors (1D), matrices (2D), and higher-dimensional arrays (3D and beyond). The shape of a tensor is defined by its dimensions, which can be accessed using the .shape attribute.
import tensorflow as tf
# Scalar
scalar = tf.constant(42)
print(scalar.shape) # ()
# Vector (1D)
vector = tf.constant([1.0, 2.0, 3.0])
print(vector.shape) # (3,)
# Matrix (2D)
matrix = tf.constant([[1, 2], [3, 4]])
print(matrix.shape) # (2, 2)
# Tensor (3D)
tensor_3d = tf.constant([
[[1, 2], [3, 4]],
[[5, 6], [7, 8]]
])
print(tensor_3d.shape) # (2, 2, 2)
()
(3,)
(2, 2)
(2, 2, 2)
# check if tf can use GPU
print(tf.config.list_physical_devices('GPU'))
[]
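Tensors also support automatic differentiation, which is what makes training neural networks possible. As a minimal sketch (illustrative only), tf.GradientTape records operations on a tensor and returns the gradient of the result with respect to it:

import tensorflow as tf

# Gradient tracking with tf.GradientTape
x = tf.constant(3.0)
with tf.GradientTape() as tape:
    tape.watch(x)            # track operations that involve x
    y = x ** 2 + 2.0 * x     # y = x^2 + 2x
# dy/dx = 2x + 2, so the gradient at x = 3 is 8
print(tape.gradient(y, x))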
Key components and concepts related to Neural Networks (NNs)
The table below provides an overview of the key components and concepts related to NNs:

| Component | Description |
|---|---|
| Input Layer | The first layer that receives input data. Each neuron represents a feature or variable. |
| Hidden Layers | Intermediate layers that perform transformations on the input data. Each layer can have multiple neurons. |
| Output Layer | The final layer that produces the output of the ANN. The number of neurons corresponds to the number of classes or regression outputs. |
| Activation Functions | Functions applied to the output of each neuron to introduce non-linearity. Common functions include ReLU, sigmoid, and tanh. |
| Weights and Biases | Parameters that are learned during training. Weights determine the strength of connections between neurons, while biases allow for flexibility in the model. |
| Loss Function | A function that measures the difference between predicted and actual outputs. Common loss functions include mean squared error (MSE) for regression and cross-entropy for classification. |
| Optimizer | An algorithm used to update the weights and biases during training to minimize the loss function. Common optimizers include stochastic gradient descent (SGD), Adam, and RMSprop. |
| Backpropagation | A training algorithm that computes gradients of the loss function with respect to the weights and biases, allowing for efficient weight updates. |
| Batch Size | The number of samples processed before the model's weights are updated. Smaller batch sizes can lead to more frequent updates, while larger batch sizes can provide more stable gradients. |
| Epoch | One complete pass through the entire training dataset. Multiple epochs are often required for convergence. |
Activation Functions
Activation functions are formulas that define a neuron's output in a neural network. They add non-linearity and enable the network to learn complex patterns. Without them, the model would act like a simple linear regression.
| Name | Formula | Characteristics | Use Case |
|---|---|---|---|
| Sigmoid | \(\sigma(x) = \frac{1}{1 + e^{-x}}\) | Smooth, outputs between 0 and 1 | Binary classification, older models |
| Tanh | \(\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\) | Output between -1 and 1; zero-centered | Sometimes used in hidden layers |
| ReLU | \(f(x) = \max(0, x)\) | Simple, fast, sparse activation | Most common in hidden layers |
| Leaky ReLU | \(f(x) = \max(0.01x, x)\) | Fixes the dying ReLU problem | Variants of ReLU where some gradient flows |
| ELU | \(f(x) = x \text{ if } x>0 \text{ else } \alpha(e^x - 1)\) | Better performance in some cases | Slight improvement over ReLU in some tasks |
| Softmax | \(\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}\) | Converts scores into probabilities | Used in the output layer for classification |
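To see these functions in action, the short sketch below (an illustrative addition) evaluates a few of them on the same inputs using TensorFlow's built-in implementations:

import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 0.5, 2.0])
print("sigmoid:", tf.nn.sigmoid(x).numpy())   # squashes values into (0, 1)
print("tanh:   ", tf.nn.tanh(x).numpy())      # squashes values into (-1, 1)
print("relu:   ", tf.nn.relu(x).numpy())      # zeroes out negative values
print("softmax:", tf.nn.softmax(x).numpy())   # converts the vector into probabilities that sum to 1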
10.1 Artificial Neural Networks (ANNs)#
ANN (Artificial Neural Network) is a computational model inspired by the way biological neural networks in the human brain process information. ANNs consist of interconnected groups of artificial neurons that work together to solve specific problems, such as classification, regression, or pattern recognition. They are widely used in various fields, including computer vision, natural language processing, and spatio-temporal analysis.
10.1.1 ANNs Regression Example#
To start, we build a simple ANN regression model on synthetic data generated from a linear relationship (y = 2x + 1). This shows how a Keras model is defined, trained, and evaluated; in the next subsection we apply the same approach to the NYC Taxi dataset to predict taxi pickup amounts in Manhattan.
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# x: inputs, y: outputs
# Generate 50 data points between -3 and 3
x = np.linspace(-3, 3, 50)
np.random.seed(42) # For reproducibility
# Create corresponding y values from a simple linear relationship
y = 2 * x + 1
print(x)
[-3. -2.87755102 -2.75510204 -2.63265306 -2.51020408 -2.3877551
-2.26530612 -2.14285714 -2.02040816 -1.89795918 -1.7755102 -1.65306122
-1.53061224 -1.40816327 -1.28571429 -1.16326531 -1.04081633 -0.91836735
-0.79591837 -0.67346939 -0.55102041 -0.42857143 -0.30612245 -0.18367347
-0.06122449 0.06122449 0.18367347 0.30612245 0.42857143 0.55102041
0.67346939 0.79591837 0.91836735 1.04081633 1.16326531 1.28571429
1.40816327 1.53061224 1.65306122 1.7755102 1.89795918 2.02040816
2.14285714 2.26530612 2.3877551 2.51020408 2.63265306 2.75510204
2.87755102 3. ]
print(x.shape)
(50,)
print(y)
[-5. -4.75510204 -4.51020408 -4.26530612 -4.02040816 -3.7755102
-3.53061224 -3.28571429 -3.04081633 -2.79591837 -2.55102041 -2.30612245
-2.06122449 -1.81632653 -1.57142857 -1.32653061 -1.08163265 -0.83673469
-0.59183673 -0.34693878 -0.10204082 0.14285714 0.3877551 0.63265306
0.87755102 1.12244898 1.36734694 1.6122449 1.85714286 2.10204082
2.34693878 2.59183673 2.83673469 3.08163265 3.32653061 3.57142857
3.81632653 4.06122449 4.30612245 4.55102041 4.79591837 5.04081633
5.28571429 5.53061224 5.7755102 6.02040816 6.26530612 6.51020408
6.75510204 7. ]
print(y.shape)
(50,)
# Reshape the data to be 2D (required by Keras: shape (batch_size, features))
x = x.reshape(-1, 1)
y = y.reshape(-1, 1)
print(x)
[[-3. ]
[-2.87755102]
[-2.75510204]
[-2.63265306]
[-2.51020408]
[-2.3877551 ]
[-2.26530612]
[-2.14285714]
[-2.02040816]
[-1.89795918]
[-1.7755102 ]
[-1.65306122]
[-1.53061224]
[-1.40816327]
[-1.28571429]
[-1.16326531]
[-1.04081633]
[-0.91836735]
[-0.79591837]
[-0.67346939]
[-0.55102041]
[-0.42857143]
[-0.30612245]
[-0.18367347]
[-0.06122449]
[ 0.06122449]
[ 0.18367347]
[ 0.30612245]
[ 0.42857143]
[ 0.55102041]
[ 0.67346939]
[ 0.79591837]
[ 0.91836735]
[ 1.04081633]
[ 1.16326531]
[ 1.28571429]
[ 1.40816327]
[ 1.53061224]
[ 1.65306122]
[ 1.7755102 ]
[ 1.89795918]
[ 2.02040816]
[ 2.14285714]
[ 2.26530612]
[ 2.3877551 ]
[ 2.51020408]
[ 2.63265306]
[ 2.75510204]
[ 2.87755102]
[ 3. ]]
print(x.shape)
(50, 1)
print(y)
[[-5. ]
[-4.75510204]
[-4.51020408]
[-4.26530612]
[-4.02040816]
[-3.7755102 ]
[-3.53061224]
[-3.28571429]
[-3.04081633]
[-2.79591837]
[-2.55102041]
[-2.30612245]
[-2.06122449]
[-1.81632653]
[-1.57142857]
[-1.32653061]
[-1.08163265]
[-0.83673469]
[-0.59183673]
[-0.34693878]
[-0.10204082]
[ 0.14285714]
[ 0.3877551 ]
[ 0.63265306]
[ 0.87755102]
[ 1.12244898]
[ 1.36734694]
[ 1.6122449 ]
[ 1.85714286]
[ 2.10204082]
[ 2.34693878]
[ 2.59183673]
[ 2.83673469]
[ 3.08163265]
[ 3.32653061]
[ 3.57142857]
[ 3.81632653]
[ 4.06122449]
[ 4.30612245]
[ 4.55102041]
[ 4.79591837]
[ 5.04081633]
[ 5.28571429]
[ 5.53061224]
[ 5.7755102 ]
[ 6.02040816]
[ 6.26530612]
[ 6.51020408]
[ 6.75510204]
[ 7. ]]
print(y.shape)
(50, 1)
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import layers
from keras import Input
# Build a simple ANN model
# Sequential: A linear stack of layers
ann = Sequential([
# Input layer shape for each sample
Input(shape=(1,)),
# Hidden layer with 30 neurons
Dense(units=30, activation='relu'),
# Output layer with 1 neuron (predicting one continuous value)
Dense(units=1)])
# loss = mean squared error (suitable for regression)
# optimizer = Adam (adaptive gradient descent)
ann.compile(optimizer='adam', loss='mse')
# Train the model
# epochs: how many times to go over the entire dataset
# batch_size: number of samples per gradient update
history = ann.fit(x, y, epochs=100, batch_size=10, verbose=0)
# Make predictions
y_pred = ann.predict(x)
1/2 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step
y_pred
array([[-4.5038476 ],
[-4.3176517 ],
[-4.131456 ],
[-3.9452596 ],
[-3.7590637 ],
[-3.572868 ],
[-3.386672 ],
[-3.2004762 ],
[-3.0142803 ],
[-2.8280845 ],
[-2.6418881 ],
[-2.4556925 ],
[-2.2694964 ],
[-2.0833006 ],
[-1.8971049 ],
[-1.710909 ],
[-1.5205791 ],
[-1.3220133 ],
[-1.1202302 ],
[-0.90876573],
[-0.675929 ],
[-0.37348443],
[ 0.02919072],
[ 0.45655853],
[ 0.8759759 ],
[ 1.2953931 ],
[ 1.5952731 ],
[ 1.8382 ],
[ 2.081127 ],
[ 2.3240538 ],
[ 2.5623913 ],
[ 2.7903113 ],
[ 3.0182316 ],
[ 3.246152 ],
[ 3.470642 ],
[ 3.694658 ],
[ 3.918674 ],
[ 4.14269 ],
[ 4.366706 ],
[ 4.590722 ],
[ 4.814436 ],
[ 5.0373707 ],
[ 5.2603054 ],
[ 5.48324 ],
[ 5.706175 ],
[ 5.9291096 ],
[ 6.1520443 ],
[ 6.3749795 ],
[ 6.597914 ],
[ 6.820849 ]], dtype=float32)
# Evaluate the model on the training data
loss = ann.evaluate(x, y, verbose=0)
print(f"Final MSE loss: {loss:.4f}")
# Plot the predicted values and the actual values using the scatter plot
plt.figure(figsize=(5, 5))
plt.scatter(y_pred, y, alpha=0.5)
plt.plot([y.min(), y.max()], [y.min(), y.max()], color='red', linestyle='--')
plt.xlabel('Predicted Values')
plt.ylabel('Actual Values')
plt.title('Predicted vs Actual Values')
plt.xlim(y.min() * 1.1, y.max() * 1.1)
plt.ylim(y.min() * 1.1, y.max() * 1.1)
plt.show()
10.1.2 ANNs Regression for NYC Taxi Pickups Prediction#
Now, we will use the NYC Taxi dataset to predict the number of pickups in Manhattan using an ANN model. The dataset contains daily pickup counts for different taxi zones in New York City. The X_train, y_train, X_test, and y_test arrays were prepared in the previous week's tutorial; they hold the features and target variables for training and testing the model.
We use a sliding-window size of 30 days: the previous 30 days' taxi pickups in a zone are used to predict the next day's taxi pickups in the same zone. The output is the next day's pickup amounts for all taxi zones. The training data covers 2023-01-01 to 2023-06-30 for 66 taxi zones, and the testing data is 2023-07-01 for the same 66 taxi zones.
- Spatial units: 66 taxi zones
- Temporal units: 181 days
- Spatial and temporal units: 66 * 181
- Features: the 30 previous days' pickup values plus the 30 corresponding spatial lags (one spatial-lag value per day for each taxi zone)
- Target: the next day's pickup values for the 66 taxi zones, shape (66,)
- X_train shape: 66 * (181 - 30) samples, each with (30 + 30) features; y_train shape: 66 * (181 - 30)
- X_test shape: 66 * (30 + 30); y_test shape: 66
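The X_train.npy, y_train.npy, X_test.npy, and y_test.npy files loaded below already contain these sliding-window features. For reference, here is a minimal sketch of how such features could be built for a single zone; the function and variable names (build_windows, pickups, spatial_lags) are illustrative placeholders, and the exact column ordering in the provided files may differ.

import numpy as np

def build_windows(pickups, spatial_lags, window=30):
    # pickups and spatial_lags: 1D arrays with one value per day for a single zone
    X, y = [], []
    for i in range(len(pickups) - window):
        # previous `window` days of pickups followed by the matching spatial lags
        X.append(np.concatenate([pickups[i:i + window], spatial_lags[i:i + window]]))
        y.append(pickups[i + window])  # the next day's pickup count
    return np.array(X), np.array(y)

# Example with synthetic data for one zone over 181 days
rng = np.random.default_rng(0)
pickups = rng.integers(0, 2000, size=181).astype(float)
spatial_lags = rng.random(181) * 2000
X_zone, y_zone = build_windows(pickups, spatial_lags)
print(X_zone.shape, y_zone.shape)  # (151, 60) (151,)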
# train set is from 2023-01-01 to 2023-06-30
X_train_nyc = np.load('data/X_train.npy')
y_train_nyc = np.load('data/y_train.npy')
X_test_nyc = np.load('data/X_test.npy')
y_test_nyc = np.load('data/y_test.npy')
print(X_train_nyc.shape, y_train_nyc.shape,
X_test_nyc.shape, y_test_nyc.shape)
(9966, 60) (9966,) (66, 60) (66,)
X_train_nyc
array([[ 174. , 1074.75, 32. , ..., 1499.25, 38. , 442.5 ],
[ 32. , 432.5 , 51. , ..., 442.5 , 60. , 579.75],
[ 51. , 438.75, 43. , ..., 579.75, 75. , 648.25],
...,
[1179. , 1047.2 , 1796. , ..., 2100.8 , 1893. , 2354.8 ],
[1796. , 2394.8 , 2196. , ..., 2354.8 , 1843. , 2198.6 ],
[2196. , 2667.6 , 2180. , ..., 2198.6 , 1876. , 2147.8 ]])
Build a simple ANN model
# Sequential: A linear stack of layers
ann_model = Sequential([
# Input layer with 60 features (30 previous days' pickups + 30 spatial lags)
Input(shape=(60,)),
# Hidden layer with 30 neurons
layers.Dense(units=30, activation='relu'),
# Output layer with 1 neuron (predicting one continuous value)
layers.Dense(units=1)])
# loss = mean squared error (suitable for regression)
# optimizer = Adam (adaptive gradient descent)
ann_model.compile(optimizer='adam', loss='mse')
%%time
# Train the model
history = ann_model.fit(X_train_nyc, y_train_nyc, epochs=100, batch_size=10, verbose=0)
CPU times: user 30.6 s, sys: 6.06 s, total: 36.6 s
Wall time: 23.2 s
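The fit call returns a History object, and history.history['loss'] records the training loss after each epoch. If you want to check convergence, you can plot it, for example:

# Plot the training loss per epoch (uses the `history` object returned by fit above)
import matplotlib.pyplot as plt
plt.figure(figsize=(6, 3))
plt.plot(history.history['loss'])
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.title('ANN training loss on the NYC taxi data')
plt.show()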
# Evaluate the model
loss = ann_model.evaluate(X_train_nyc, y_train_nyc, verbose=0)
print(f"Final RMSE loss (training set): {np.sqrt(loss):.4f}")
Final RMSE loss (training set): 159.3046
# predict taxi pickups the next day
y_pred_nyc = ann_model.predict(X_test_nyc)
1/3 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
y_pred_nyc
array([[9.39334167e+02],
[4.50067657e+02],
[6.35627319e+02],
[7.07341064e+02],
[3.89289825e+02],
[1.46727707e+02],
[1.97894495e+03],
[7.18767334e+02],
[2.01176208e+03],
[2.01610815e+03],
[2.08259497e+03],
[2.13194763e+02],
[1.24898083e+03],
[1.29628394e+03],
[3.39075317e+02],
[3.61877197e+02],
[2.08561255e+03],
[2.91301050e+03],
[1.71802954e+03],
[2.00166870e+03],
[1.17749438e+03],
[6.62513962e+01],
[3.52951660e+01],
[1.26692676e+03],
[6.01861906e+00],
[7.25930882e+00],
[1.53382251e+03],
[1.90954944e+03],
[2.44485718e+03],
[2.02276282e+03],
[2.15617236e+03],
[8.47588867e+02],
[7.10824646e+02],
[1.09190955e+03],
[1.81094299e+02],
[1.00701034e+00],
[1.49973059e+03],
[3.11276489e+03],
[3.09069580e+03],
[3.16033081e+03],
[2.66960425e+03],
[2.14964172e+02],
[2.56099316e+03],
[2.79663647e+03],
[2.14516907e+02],
[2.92781925e+00],
[5.62130798e+02],
[9.28009399e+02],
[1.43415881e+03],
[2.36032373e+03],
[2.64190845e+03],
[6.56314636e+02],
[9.76706421e+02],
[2.15849805e+03],
[2.26821851e+03],
[2.69006299e+03],
[3.18407373e+03],
[1.74114819e+03],
[2.00319946e+03],
[1.48818445e+01],
[3.64984093e+01],
[1.66785315e+03],
[1.51006335e+03],
[5.28614929e+02],
[1.57510657e+03],
[1.76603149e+03]], dtype=float32)
# Calculate the rmse for the testing data
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_test_nyc, y_pred_nyc))
print(f'RMSE in testing set: {rmse:.2f}')
RMSE in testing set: 642.98
# plot the predicted values and the actual values using the scatter plot
import matplotlib.pyplot as plt
plt.figure(figsize=(5, 5))
plt.scatter(y_test_nyc, y_pred_nyc, alpha=0.5)
plt.plot([y_test_nyc.min(), y_test_nyc.max()],
[y_test_nyc.min(), y_test_nyc.max()], color='red', linestyle='--')
plt.xlabel('Actual Taxi Pickups')
plt.ylabel('Predicted Taxi Pickups')
plt.title('Predicted vs Actual Taxi Pickups in NYC (2023-07-01)')
plt.xlim(0, y_test_nyc.max() * 1.1)
plt.ylim(0, y_test_nyc.max() * 1.1)
(0.0, 3558.5000000000005)
import geopandas as gpd
gdf_m = gpd.read_file('data/gdf_man.geojson')
def plot_predicted_pickups(gdf_m, y_pred, y_test):
    # plot the predicted values, the actual values, and their residuals as maps
    gdf_m['predicted_pickup'] = y_pred
    gdf_m['actual_pickup'] = y_test
    gdf_m['residual'] = gdf_m['predicted_pickup'] - gdf_m['actual_pickup']
    fig, ax = plt.subplots(1, 3, figsize=(15, 6))
    gdf_m.plot(column='predicted_pickup', ax=ax[0], legend=True, cmap='RdPu', edgecolor='black',
               vmin=0, vmax=np.max([gdf_m['predicted_pickup'].max(), gdf_m['actual_pickup'].max()]))
    ax[0].set_title('Predicted Taxi Pickups (2023-07-01)')
    gdf_m.plot(column='actual_pickup', ax=ax[1], legend=True, cmap='RdPu', edgecolor='black',
               vmin=0, vmax=np.max([gdf_m['predicted_pickup'].max(), gdf_m['actual_pickup'].max()]))
    ax[1].set_title('Actual Taxi Pickups (2023-07-01)')
    gdf_m.plot(column='residual', ax=ax[2], legend=True, cmap='coolwarm', edgecolor='black',
               vmin=-2000, vmax=2000)
    ax[2].set_title('Residual Taxi Pickups (2023-07-01)')
    ax[0].set_axis_off()
    ax[1].set_axis_off()
    ax[2].set_axis_off()
    plt.tight_layout()
    plt.show()
plot_predicted_pickups(gdf_m, y_pred_nyc, y_test_nyc)
10.2 Recurrent Neural Networks (RNNs)#
Recurrent Neural Networks (RNNs) are a class of artificial neural networks designed for processing sequential data — data where the current input depends on previous inputs. They are widely used in time series prediction, speech recognition, and other tasks involving sequences. You can find a tutorial on using RNNs on this TensorFlow page.
10.2.1 RNN Regression Example#
np.random.seed(42)
# Generate Time Series Data
time_steps = 30
x = np.linspace(0, 4 * np.pi, time_steps)
data = np.sin(x).reshape(-1, 1) # shape: (30, 1)
# Prepare sliding-window (time-lag) features
lag = 3 # how many previous time steps to use
X = []
y = []
for i in range(len(data) - lag):
    X.append(data[i:i+lag])
    y.append(data[i+lag])
X = np.array(X) # shape: (27, 3, 1)
y = np.array(y) # shape: (27, 1)
print("X shape:", X.shape)
print("y shape:", y.shape)
X shape: (27, 3, 1)
y shape: (27, 1)
X[0]
array([[0. ],
[0.4198891 ],
[0.76216206]])
from keras.layers import SimpleRNN, Dense
# Define RNNs
rnn = Sequential([
Input(shape=(3, 1)), # the input shape is the data shape per sample
SimpleRNN(units=10, activation='tanh'),
Dense(1) # output layer
])
rnn.compile(optimizer='adam', loss='mse')
# Train Model
rnn.fit(X, y, epochs=200, verbose=0)
<keras.src.callbacks.history.History at 0x36c05b520>
X[-1]
array([[-0.96354999],
[-0.76216206],
[-0.4198891 ]])
# Predict Next Value
pred = rnn.predict(X[-1].reshape(1, 3, 1))
print("Next predicted value:", pred[0][0])
WARNING:tensorflow:5 out of the last 6 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x36c0b7490> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 39ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 39ms/step
Next predicted value: -0.027176028
10.2.2 RNNs for NYC Taxi Pickups Prediction#
RNNs require a 3D input of shape (samples, time steps, features), so if our data is a 2D array we need to reshape it accordingly. Alternatively, we can build the data as a 3D array directly in the feature-engineering step, as in the example above.
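As a minimal sketch (assuming the flat features are stored time-major with the two per-day values adjacent, which may not match how X_train.npy was actually built), a 2D array can be reshaped into the (samples, time steps, features) layout that RNNs expect:

import numpy as np

# Placeholder 2D feature matrix: 9966 samples, 60 flat features per sample
X_2d = np.zeros((9966, 60))
# Reshape to (samples, time steps, features) = (9966, 30, 2)
X_3d = X_2d.reshape(X_2d.shape[0], 30, 2)
print(X_3d.shape)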
# read the df train from the previous week example
df_train = pd.read_csv('data/df_train.csv', index_col=0)
df_train.head()
| | PULocationID | tpep_pickup_date | spatial_lag | pickup_count |
|---|---|---|---|---|
| 0 | 4 | 2023-01-01 | 1074.75 | 174 |
| 1 | 4 | 2023-01-02 | 432.50 | 32 |
| 2 | 4 | 2023-01-03 | 438.75 | 51 |
| 3 | 4 | 2023-01-04 | 533.75 | 43 |
| 4 | 4 | 2023-01-05 | 624.75 | 42 |
# create a 3d X array with 30 previous days' taxi pickups and spatial lags for each taxi zone
X_train = []
y_train = []
for zone in df_train['PULocationID'].unique():
    # get the taxi zone data
    zone_data = df_train[df_train['PULocationID'] == zone][['spatial_lag','pickup_count']].values
    # create a 3D array with 30 previous days' taxi pickups and spatial lags for each taxi zone
    for i in range(len(zone_data) - 30):
        X_train.append(zone_data[i:i+30])
        y_train.append(zone_data[i+30][1]) # the next day pickup count
X_train = np.array(X_train)
y_train = np.array(y_train).reshape(-1, 1) # reshape y to be 2D
print("X_train shape:", X_train.shape)
print("y_train shape:", y_train.shape)
X_train shape: (9966, 30, 2)
y_train shape: (9966, 1)
X_test = np.array([df_train[df_train['PULocationID'] == zone][['spatial_lag','pickup_count']].values[-30:] for zone in df_train['PULocationID'].unique()]) # the last 30 days' data for each taxi zone
X_test.shape
(66, 30, 2)
y_test = y_test_nyc
# Define RNNs
rnn_model = Sequential([
Input(shape=(30, 2)), # the input shape is the data shape per sample
SimpleRNN(units=32, activation='tanh'),
Dense(1) # output layer
])
rnn_model.compile(optimizer='adam', loss='mse')
# Train Model
rnn_model.fit(X_train, y_train, epochs=200, verbose=0)
<keras.src.callbacks.history.History at 0x36f2ca410>
# Predict Next Value
y_pred = rnn_model.predict(X_test)
print("Next predicted value:", y_pred)
WARNING:tensorflow:6 out of the last 7 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x36c523910> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
1/3 ━━━━━━━━━━━━━━━━━━━━ 0s 39ms/step
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step
Next predicted value: [[ 9.55611572e+01]
[ 6.92451477e+01]
[ 6.60138611e+02]
[ 3.05185669e+02]
[ 3.05020081e+02]
[ 8.30399628e+01]
[ 1.62078467e+03]
[ 1.86968475e+02]
[ 1.84369922e+03]
[ 6.60539307e+02]
[ 1.84369922e+03]
[ 1.87114334e+02]
[ 6.60538696e+02]
[ 1.84369922e+03]
[ 5.43842712e+02]
[ 3.05739258e+02]
[ 1.49032983e+03]
[ 1.60041650e+03]
[ 1.84369922e+03]
[ 1.60659424e+03]
[ 1.60734595e+03]
[ 1.08283340e+02]
[ 1.19630661e+01]
[ 6.59502258e+02]
[-1.27464294e-01]
[-2.08326340e+01]
[ 1.60731213e+03]
[ 1.84164062e+03]
[ 1.84369922e+03]
[ 1.84369922e+03]
[ 9.45067505e+02]
[ 1.02289917e+03]
[ 1.15081250e+03]
[ 1.01696198e+03]
[ 7.36858139e+01]
[-8.55097198e+00]
[ 8.96919800e+02]
[ 1.84369922e+03]
[ 1.84369922e+03]
[ 1.84369922e+03]
[ 1.84369775e+03]
[ 5.42993408e+02]
[ 1.84369922e+03]
[ 1.84369922e+03]
[-4.95172424e+01]
[-2.72332764e+00]
[ 3.05178467e+02]
[ 1.00917725e+03]
[ 1.87447525e+02]
[ 1.72517090e+03]
[ 1.84369922e+03]
[ 1.60567896e+03]
[ 8.14861145e+01]
[ 1.48914258e+03]
[ 1.84369922e+03]
[ 1.84369922e+03]
[ 1.84369922e+03]
[ 1.75190942e+03]
[ 1.84369922e+03]
[ 1.99201584e+01]
[ 6.48842468e+01]
[ 1.60734595e+03]
[ 1.84369922e+03]
[ 5.44805420e+02]
[ 1.60734033e+03]
[ 1.75878052e+03]]
# calculate the rmse for the testing data
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f'RMSE in testing set: {rmse:.2f}')
RMSE in testing set: 428.92
y_test
array([ 169, 88, 434, 185, 231, 95, 1656, 172, 2805, 681, 2322,
220, 447, 2862, 527, 269, 1533, 1462, 1685, 1173, 1631, 77,
1, 450, 6, 1, 791, 984, 1841, 2644, 753, 1143, 1613,
623, 68, 1, 1025, 3235, 2198, 2143, 1878, 309, 1848, 2553,
2, 9, 187, 836, 104, 1385, 3081, 992, 156, 932, 2189,
2079, 2613, 1331, 1965, 15, 64, 1640, 2368, 597, 844, 1380])
# plot the predicted values and the actual values using the scatter plot
import matplotlib.pyplot as plt
plt.figure(figsize=(5, 5))
plt.scatter(y_test, y_pred, alpha=0.5)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color='red', linestyle='--')
plt.xlabel('Actual Taxi Pickups')
plt.ylabel('Predicted Taxi Pickups')
plt.title('Predicted vs Actual Taxi Pickups in NYC (2023-07-01)')
plt.xlim(0, y_test.max() * 1.1)
plt.ylim(0, y_test.max() * 1.1)
plt.show()
plot_predicted_pickups(gdf_m, y_pred, y_test)
10.3 Long Short-Term Memory (LSTM) networks#
LSTM stands for Long Short-Term Memory. It is a variant of the Recurrent Neural Network (RNN) architecture designed to better capture long-range dependencies in sequence data. Traditional RNNs have trouble remembering information from far back in the sequence because of the vanishing gradient problem during training. LSTMs address this by introducing memory cells and special gates (input gate, forget gate, output gate) that control the flow of information. This gating architecture allows LSTMs to remember important information for longer periods and selectively forget irrelevant parts. You can find a tutorial on using LSTMs on this TensorFlow page.
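For reference, a standard formulation of the LSTM gates is shown below, where \(\sigma\) is the sigmoid function, \(\odot\) is element-wise multiplication, \(x_t\) is the input at time step \(t\), and \(h_t\) and \(c_t\) are the hidden state and memory cell:

\[
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory cell update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
\]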
| Aspect | RNN | LSTM |
|---|---|---|
| Architecture | Simple recurrent units with loops | Complex cells with a memory cell and three gates |
| Memory | Limited to short-term memory | Can capture long-term dependencies |
| Vanishing gradient | Prone to vanishing or exploding gradients | Designed to mitigate vanishing gradient problems |
| Performance on long sequences | Struggles with long sequences | Performs much better on longer sequences |
| Gates | No gates | Input, forget, and output gates for better control |
| Use cases | Basic sequential tasks | Complex sequential tasks requiring context retention |
# Define a LSTM model
from keras.layers import LSTM
lstm_model = Sequential([
Input(shape=(30, 2)), # the input shape is the data shape per sample
# For stacked LSTM layers, all LSTM layers except the last one must have return_sequences=True
LSTM(units=32, activation='tanh', return_sequences=True), # 1st LSTM layer
LSTM(units=32, activation='tanh', return_sequences=True), # 2nd LSTM layer
LSTM(units=32, activation='tanh'), # 3rd LSTM layer
Dense(1) # output layer
])
lstm_model.compile(optimizer='adam', loss='mse')
# Train Model
lstm_model.fit(X_train, y_train, epochs=200, verbose=0)
<keras.src.callbacks.history.History at 0x357490eb0>
# Predict Next Value
y_pred = lstm_model.predict(X_test)
print("Next predicted value:", y_pred)
1/3 ━━━━━━━━━━━━━━━━━━━━ 0s 104ms/step
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 53ms/step
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 54ms/step
Next predicted value: [[ 92.34474 ]
[ 28.663183 ]
[ 553.3706 ]
[ 201.01932 ]
[ 260.48154 ]
[ 99.650894 ]
[1932.6918 ]
[ 123.89032 ]
[1955.9598 ]
[ 557.4681 ]
[1955.7119 ]
[ 234.72284 ]
[ 468.69632 ]
[1955.396 ]
[ 487.5384 ]
[ 265.44022 ]
[1925.011 ]
[ 942.0801 ]
[1954.8771 ]
[ 786.6172 ]
[1913.961 ]
[ 85.12416 ]
[ 4.5358276]
[ 289.11566 ]
[ 9.526817 ]
[ 3.0452003]
[ 946.47186 ]
[1028.5009 ]
[1955.5829 ]
[1956.1534 ]
[ 667.39746 ]
[ 661.18286 ]
[1849.1115 ]
[ 984.1623 ]
[ 72.51457 ]
[ 2.872387 ]
[ 754.88135 ]
[1956.7798 ]
[1955.7311 ]
[1955.7103 ]
[1954.684 ]
[ 465.01376 ]
[1955.7123 ]
[1956.3638 ]
[ 5.4966393]
[ 4.8534317]
[ 198.47726 ]
[ 654.40326 ]
[ 174.56087 ]
[1737.6897 ]
[1955.7128 ]
[1431.2847 ]
[ 78.51694 ]
[ 969.4642 ]
[1954.5591 ]
[1956.7814 ]
[1956.7844 ]
[1945.45 ]
[1955.7158 ]
[ 23.310139 ]
[ 77.2171 ]
[1911.1652 ]
[1954.6063 ]
[ 510.52734 ]
[ 955.4945 ]
[1947.2797 ]]
# calculate the rmse for the testing data
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f'RMSE in testing set: {rmse:.2f}')
RMSE in testing set: 364.74
# plot the predicted values and the actual values using the scatter plot
import matplotlib.pyplot as plt
plt.figure(figsize=(5, 5))
plt.scatter(y_test, y_pred, alpha=0.5)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color='red', linestyle='--')
plt.xlabel('Actual Taxi Pickups')
plt.ylabel('Predicted Taxi Pickups')
plt.title('Predicted vs Actual Taxi Pickups in NYC (2023-07-01)')
plt.xlim(0, y_test.max() * 1.1)
plt.ylim(0, y_test.max() * 1.1)
plt.show()
plot_predicted_pickups(gdf_m, y_pred, y_test)
10.4 Convolutional Neural Networks (CNNs)#
Convolutional Neural Networks (CNNs) are a type of deep learning model specially designed to process data that has a grid-like topology — most commonly images (2D grids of pixels) but also 1D sequences or 3D volumes. CNN layers extract spatial features from each snapshot (image/frame/grid at a particular time).
MaxPooling is a downsampling operation commonly used in CNNs to reduce the spatial dimensions (height and width) of the feature maps. MaxPooling provides spatial invariance and helps the model recognize features regardless of small shifts or distortions. It can help to prevent overfitting by making the representations more compact.
How does it work?
- Slide a small window (called the pool size, e.g. 2x2) over the input feature map.
- For each window position, MaxPooling outputs the maximum value within that window.
- This effectively keeps the most important feature (the strongest activation) in that region.
Suppose you have a 4x4 feature map:
matrix = [
[1, 3, 2, 4],
[5, 6, 1, 2],
[3, 2, 8, 7],
[4, 1, 2, 6]
]
Applying MaxPooling with a 2x2 window and stride 2 gives:
- Max of top-left 2x2 window: max(1, 3, 5, 6) = 6
- Max of top-right 2x2 window: max(2, 4, 1, 2) = 4
- Max of bottom-left 2x2 window: max(3, 2, 4, 1) = 4
- Max of bottom-right 2x2 window: max(8, 7, 2, 6) = 8

Output after pooling:
[[6, 4],
 [4, 8]]
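We can reproduce this result with Keras' MaxPooling2D layer; the short sketch below (an illustrative addition) applies a 2x2 pool with stride 2 to the same 4x4 feature map:

import numpy as np
import tensorflow as tf

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [3, 2, 8, 7],
                        [4, 1, 2, 6]], dtype=np.float32)

# Keras pooling layers expect shape (batch, height, width, channels)
x = feature_map.reshape(1, 4, 4, 1)
pooled = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2)(x)
print(pooled.numpy().reshape(2, 2))  # [[6. 4.] [4. 8.]]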
There are two common approaches to modelling temporal dependencies:
- 3D CNNs: Extend convolution into the time dimension with 3D kernels that slide over space and time simultaneously. This captures spatio-temporal features in one step.
- CNN + RNN/LSTM: Use CNN layers to extract spatial features at each time step, then feed the sequence of features into an RNN/LSTM/GRU to learn temporal dependencies (see the sketch after this list).
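The lab below follows the first (3D CNN) approach. As a rough sketch of the second approach (the layer sizes are arbitrary and the model name cnn_lstm_sketch is an illustrative placeholder), a CNN + LSTM model for samples of shape (30, 66, 1) could look like this:

from keras.models import Sequential
from keras.layers import Input, TimeDistributed, Conv1D, GlobalAveragePooling1D, LSTM, Dense

cnn_lstm_sketch = Sequential([
    # Each sample: 30 time steps, each a 66-zone "snapshot" with 1 feature
    Input(shape=(30, 66, 1)),
    # Conv1D is applied independently to each daily snapshot to extract spatial features
    TimeDistributed(Conv1D(filters=16, kernel_size=3, padding='same', activation='relu')),
    TimeDistributed(GlobalAveragePooling1D()),  # one feature vector per time step
    LSTM(32),                                   # learn temporal dependencies across the 30 days
    Dense(66)                                   # next-day pickups for all 66 zones
])
cnn_lstm_sketch.compile(optimizer='adam', loss='mse')
cnn_lstm_sketch.summary()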
For example, we can use a 3D CNN to predict the next day's taxi pickups in NYC using the previous 30 days' data for each taxi zone. Each sample is a 3D array with shape (30, 66, 1), where 30 is the number of previous days, 66 is the number of taxi zones, and 1 is the number of features (the pickup count). The output is the next day's pickup amounts for all taxi zones.
df_train
| | PULocationID | tpep_pickup_date | spatial_lag | pickup_count |
|---|---|---|---|---|
| 0 | 4 | 2023-01-01 | 1074.75 | 174 |
| 1 | 4 | 2023-01-02 | 432.50 | 32 |
| 2 | 4 | 2023-01-03 | 438.75 | 51 |
| 3 | 4 | 2023-01-04 | 533.75 | 43 |
| 4 | 4 | 2023-01-05 | 624.75 | 42 |
| ... | ... | ... | ... | ... |
| 11941 | 263 | 2023-06-26 | 2100.80 | 1639 |
| 11942 | 263 | 2023-06-27 | 2354.80 | 1893 |
| 11943 | 263 | 2023-06-28 | 2198.60 | 1843 |
| 11944 | 263 | 2023-06-29 | 2147.80 | 1876 |
| 11945 | 263 | 2023-06-30 | 1912.00 | 1849 |
11946 rows × 4 columns
# Prepare the data for 3D CNN
window_size = 30 # number of previous days
num_zones = len(df_train.PULocationID.unique()) # number of taxi zones
number_of_days = df_train.tpep_pickup_date.nunique() # total number of days in the training set
# Create a 4D array with shape (number_of_days - window_size, window_size, num_zones, 1) = (151, 30, 66, 1)
X_train_cnn = np.zeros((number_of_days - window_size, window_size, num_zones, 1))
y_train_cnn = np.zeros((number_of_days - window_size, num_zones))
print("X_train_cnn shape:", X_train_cnn.shape)
print("y_train_cnn shape:", y_train_cnn.shape)
X_train_cnn shape: (151, 30, 66, 1)
y_train_cnn shape: (151, 66)
pickup_matrix = df_train.pivot_table(index='tpep_pickup_date',
columns='PULocationID',
values='pickup_count',
fill_value=0)
# Ensure column order is consistent
pickup_matrix = pickup_matrix.sort_index(axis=1)
# Convert to numpy array
pickup_array = pickup_matrix.to_numpy() # shape: (181, 66)
# Prepare training samples
window_size = 30
number_of_days = pickup_array.shape[0]
num_zones = pickup_array.shape[1]
X_train_cnn = np.zeros((number_of_days - window_size, window_size, num_zones, 1))
y_train_cnn = np.zeros((number_of_days - window_size, num_zones))
for i in range(number_of_days - window_size):
    X_train_cnn[i] = pickup_array[i:i+window_size].reshape(window_size, num_zones, 1)
    y_train_cnn[i] = pickup_array[i + window_size] # the target day
print("X_train_cnn shape:", X_train_cnn.shape) # (151, 30, 66, 1)
print("y_train_cnn shape:", y_train_cnn.shape) # (151, 66)
X_train_cnn shape: (151, 30, 66, 1)
y_train_cnn shape: (151, 66)
from keras.layers import Conv3D, MaxPooling3D, GlobalAveragePooling3D
X_train_cnn_reshaped = X_train_cnn.reshape(X_train_cnn.shape[0], X_train_cnn.shape[1], num_zones, 1, 1)
# Build model
cnn_model = Sequential([
Input(shape=(window_size, num_zones, 1, 1)), # (30, 66, 1, 1)
Conv3D(filters=16, kernel_size=(3, 3, 1), activation='relu', padding='same'),
MaxPooling3D(pool_size=(2, 2, 1)),
Conv3D(filters=32, kernel_size=(3, 3, 1), activation='relu', padding='same'),
GlobalAveragePooling3D(),
Dense(128, activation='relu'),
Dense(num_zones) # Predict pickups for all 66 zones
])
cnn_model.summary()
Model: "sequential_5"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ conv3d (Conv3D) │ (None, 30, 66, 1, 16) │ 160 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling3d (MaxPooling3D) │ (None, 15, 33, 1, 16) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv3d_1 (Conv3D) │ (None, 15, 33, 1, 32) │ 4,640 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ global_average_pooling3d │ (None, 32) │ 0 │ │ (GlobalAveragePooling3D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_7 (Dense) │ (None, 128) │ 4,224 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_8 (Dense) │ (None, 66) │ 8,514 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 17,538 (68.51 KB)
Trainable params: 17,538 (68.51 KB)
Non-trainable params: 0 (0.00 B)
cnn_model.compile(optimizer='adam', loss='mse', metrics=['mae'])
# Train the model
history = cnn_model.fit(
X_train_cnn_reshaped,
y_train_cnn,
epochs=20,
batch_size=16,
validation_split=0.1,
verbose=2
)
Epoch 1/20
9/9 - 0s - 52ms/step - loss: 4005698.2500 - mae: 1419.2722 - val_loss: 3158416.7500 - val_mae: 1253.2993
Epoch 2/20
9/9 - 0s - 12ms/step - loss: 2941589.0000 - mae: 1171.5555 - val_loss: 1661857.6250 - val_mae: 853.8498
Epoch 3/20
9/9 - 0s - 12ms/step - loss: 1327518.3750 - mae: 742.8868 - val_loss: 818064.5000 - val_mae: 644.8470
Epoch 4/20
9/9 - 0s - 12ms/step - loss: 569852.9375 - mae: 504.6197 - val_loss: 238549.3750 - val_mae: 318.3143
Epoch 5/20
9/9 - 0s - 12ms/step - loss: 247675.0000 - mae: 320.0619 - val_loss: 209786.3750 - val_mae: 285.2130
Epoch 6/20
9/9 - 0s - 12ms/step - loss: 213923.9531 - mae: 284.7927 - val_loss: 193069.6406 - val_mae: 264.0422
Epoch 7/20
9/9 - 0s - 12ms/step - loss: 186818.7344 - mae: 260.3803 - val_loss: 175382.7656 - val_mae: 243.6079
Epoch 8/20
9/9 - 0s - 12ms/step - loss: 180980.9219 - mae: 247.4114 - val_loss: 173867.1875 - val_mae: 239.9466
Epoch 9/20
9/9 - 0s - 12ms/step - loss: 177181.1094 - mae: 241.3169 - val_loss: 164904.8125 - val_mae: 229.6416
Epoch 10/20
9/9 - 0s - 12ms/step - loss: 178125.9219 - mae: 239.5164 - val_loss: 181181.5000 - val_mae: 237.5917
Epoch 11/20
9/9 - 0s - 12ms/step - loss: 176187.4062 - mae: 236.1614 - val_loss: 166188.3750 - val_mae: 228.2373
Epoch 12/20
9/9 - 0s - 12ms/step - loss: 175732.1406 - mae: 239.2828 - val_loss: 172761.2188 - val_mae: 234.6099
Epoch 13/20
9/9 - 0s - 12ms/step - loss: 176700.3750 - mae: 237.9642 - val_loss: 162580.7500 - val_mae: 227.5871
Epoch 14/20
9/9 - 0s - 12ms/step - loss: 176748.8438 - mae: 238.7577 - val_loss: 175204.0625 - val_mae: 234.5365
Epoch 15/20
9/9 - 0s - 12ms/step - loss: 180928.7344 - mae: 241.7198 - val_loss: 170355.4688 - val_mae: 230.6339
Epoch 16/20
9/9 - 0s - 12ms/step - loss: 180147.2656 - mae: 240.4812 - val_loss: 166596.1250 - val_mae: 229.3816
Epoch 17/20
9/9 - 0s - 12ms/step - loss: 175776.5625 - mae: 237.9384 - val_loss: 171032.1875 - val_mae: 232.2406
Epoch 18/20
9/9 - 0s - 12ms/step - loss: 177030.3750 - mae: 238.5741 - val_loss: 175354.7500 - val_mae: 234.5650
Epoch 19/20
9/9 - 0s - 12ms/step - loss: 176386.8750 - mae: 237.7845 - val_loss: 168644.7500 - val_mae: 231.0665
Epoch 20/20
9/9 - 0s - 12ms/step - loss: 176184.0469 - mae: 236.9404 - val_loss: 167511.6250 - val_mae: 228.7982
# Predict the next day's taxi pickups from the last training window
y_pred_cnn = cnn_model.predict(X_train_cnn_reshaped[-1].reshape(1, window_size, num_zones, 1, 1))
print("Next predicted value:", y_pred_cnn)
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step
Next predicted value: [[ 1.2289052e+02 6.1680840e+01 6.3661102e+02 2.7826425e+02
2.7335095e+02 1.1956342e+02 1.7633020e+03 1.4629707e+02
2.9490791e+03 7.0366827e+02 2.7129609e+03 2.5989877e+02
6.5927063e+02 2.6256130e+03 5.7412750e+02 2.9784439e+02
1.7744995e+03 1.7953296e+03 2.3479021e+03 1.4918673e+03
1.4113519e+03 8.6778183e+01 1.3534446e+00 5.4715491e+02
5.2108417e+00 3.0071418e+00 1.2146526e+03 2.2280225e+03
2.6377021e+03 3.6848096e+03 1.2298606e+03 9.1107025e+02
1.1157504e+03 8.8898669e+02 5.5042496e+01 -1.1286393e+01
9.3174384e+02 5.1037202e+03 3.8835762e+03 3.2025955e+03
2.5167134e+03 5.3388544e+02 3.2498188e+03 3.6241040e+03
4.6168356e+00 -1.6419781e+00 1.8017174e+02 7.8961487e+02
1.6806894e+02 2.1216775e+03 3.6558472e+03 1.5101125e+03
1.2247576e+02 1.4124349e+03 2.9973530e+03 4.7274785e+03
5.2327539e+03 2.1775527e+03 3.0882129e+03 1.9887726e+01
7.0778816e+01 1.7538796e+03 2.3791829e+03 5.1788074e+02
1.5132891e+03 2.1286401e+03]]
# calculate the RMSE for the predicted day
# Note: this compares the prediction with y_train_cnn[-1] (the last day of the training
# period), i.e. an in-sample check rather than the held-out 2023-07-01 test set
from sklearn.metrics import mean_squared_error
y_test_cnn = y_test_nyc # actual 2023-07-01 pickups (not used in the RMSE below)
rmse = np.sqrt(mean_squared_error(y_train_cnn[-1], y_pred_cnn[0]))
print(f'RMSE in testing set: {rmse:.2f}')
RMSE in testing set: 374.77
We have not used any explicit spatial features in the previous CNN training, so we can add the coordinates of the taxi zone centroids as spatial features.
x_list = gdf_m.geometry.centroid.to_crs(epsg=4326).x.values
y_list = gdf_m.geometry.centroid.to_crs(epsg=4326).y.values
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
x_list = scaler.fit_transform(x_list.reshape(-1, 1))
y_list = scaler.fit_transform(y_list.reshape(-1, 1))
x_list = x_list.reshape(-1)
y_list = y_list.reshape(-1)
coords_array = np.array( [(x_list[i], y_list[i]) for i in range(0,66)] )
coords_array.shape
(66, 2)
X_train_cnn.shape[0]
151
# Repeat coords across samples and time steps
coords_expanded = np.tile(coords_array, (151, window_size, 1)) # shape: (151, 30*66, 2)
coords_expanded = coords_expanded.reshape(151, window_size, num_zones, 2) # (151, 30, 66, 2)
# X_train_cnn shape: (151, 30, 66, 1)
# coords_expanded shape: (151, 30, 66, 2)
X_train_with_coords = np.concatenate([X_train_cnn, coords_expanded], axis=-1) # (151, 30, 66, 3)
# Reshape for 3D CNN input
X_input = X_train_with_coords.reshape(151, window_size, num_zones, 1, 3) # (151, 30, 66, 1, 3)
print("X_input shape:", X_input.shape) # (151, 30, 66, 1, 3)
X_input shape: (151, 30, 66, 1, 3)
cnn_s_model = Sequential([
Input(shape=(window_size, num_zones, 1, 3)),
Conv3D(32, kernel_size=(3, 3, 1), padding='same', activation='relu'),
MaxPooling3D(pool_size=(2, 2, 1)),
Dropout(0.1),
Conv3D(64, kernel_size=(3, 3, 1), padding='same', activation='relu'),
MaxPooling3D(pool_size=(2, 2, 1)),
Dropout(0.1),
GlobalAveragePooling3D(),
Dense(128, activation='relu'),
Dropout(0.2),
Dense(num_zones)
])
cnn_s_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss='mse', metrics=['mae'])
history = cnn_s_model.fit(
X_input, y_train_cnn,
epochs=50,
batch_size=8,
validation_split=0.2,
verbose=2
)
Epoch 1/50
15/15 - 1s - 49ms/step - loss: 3100141.5000 - mae: 1227.2567 - val_loss: 1365253.3750 - val_mae: 788.6161
Epoch 2/50
15/15 - 0s - 17ms/step - loss: 1072512.3750 - mae: 769.4460 - val_loss: 343434.8125 - val_mae: 366.7768
Epoch 3/50
15/15 - 0s - 17ms/step - loss: 557386.1875 - mae: 569.6285 - val_loss: 286140.5625 - val_mae: 336.2144
Epoch 4/50
15/15 - 0s - 17ms/step - loss: 468416.3750 - mae: 513.3658 - val_loss: 211484.0469 - val_mae: 276.1114
Epoch 5/50
15/15 - 0s - 17ms/step - loss: 400665.7188 - mae: 472.5962 - val_loss: 213182.6562 - val_mae: 268.4372
Epoch 6/50
15/15 - 0s - 17ms/step - loss: 369956.8438 - mae: 444.0385 - val_loss: 202529.3906 - val_mae: 262.8458
Epoch 7/50
15/15 - 0s - 17ms/step - loss: 347293.1875 - mae: 430.7402 - val_loss: 202589.1250 - val_mae: 268.3718
Epoch 8/50
15/15 - 0s - 17ms/step - loss: 357184.4062 - mae: 435.6439 - val_loss: 166544.1719 - val_mae: 237.3317
Epoch 9/50
15/15 - 0s - 17ms/step - loss: 322464.3125 - mae: 413.0345 - val_loss: 175893.2031 - val_mae: 238.9258
Epoch 10/50
15/15 - 0s - 17ms/step - loss: 318500.1250 - mae: 401.8153 - val_loss: 168738.3594 - val_mae: 232.0199
Epoch 11/50
15/15 - 0s - 17ms/step - loss: 294316.6250 - mae: 387.4250 - val_loss: 167198.8125 - val_mae: 234.9290
Epoch 12/50
15/15 - 0s - 17ms/step - loss: 287068.2188 - mae: 380.9982 - val_loss: 170396.7812 - val_mae: 237.7505
Epoch 13/50
15/15 - 0s - 17ms/step - loss: 285297.1875 - mae: 378.2478 - val_loss: 209391.8125 - val_mae: 266.9217
Epoch 14/50
15/15 - 0s - 17ms/step - loss: 286028.1875 - mae: 371.5200 - val_loss: 162688.7031 - val_mae: 226.0919
Epoch 15/50
15/15 - 0s - 17ms/step - loss: 291980.2188 - mae: 375.9531 - val_loss: 178756.9219 - val_mae: 245.5820
Epoch 16/50
15/15 - 0s - 17ms/step - loss: 278687.1250 - mae: 365.1249 - val_loss: 167631.9219 - val_mae: 232.3482
Epoch 17/50
15/15 - 0s - 17ms/step - loss: 269566.3750 - mae: 357.4424 - val_loss: 162846.5000 - val_mae: 231.0384
Epoch 18/50
15/15 - 0s - 17ms/step - loss: 261317.0312 - mae: 351.6025 - val_loss: 181331.3750 - val_mae: 243.1291
Epoch 19/50
15/15 - 0s - 17ms/step - loss: 242886.9375 - mae: 337.8704 - val_loss: 186670.8750 - val_mae: 251.3622
Epoch 20/50
15/15 - 0s - 17ms/step - loss: 272439.3125 - mae: 356.2213 - val_loss: 216848.9531 - val_mae: 273.6014
Epoch 21/50
15/15 - 0s - 17ms/step - loss: 267217.5312 - mae: 347.7688 - val_loss: 169811.3750 - val_mae: 235.4288
Epoch 22/50
15/15 - 0s - 17ms/step - loss: 244526.6719 - mae: 334.4143 - val_loss: 179100.9219 - val_mae: 241.6548
Epoch 23/50
15/15 - 0s - 17ms/step - loss: 278718.7188 - mae: 349.2837 - val_loss: 185225.4844 - val_mae: 245.1945
Epoch 24/50
15/15 - 0s - 17ms/step - loss: 254793.1875 - mae: 336.1122 - val_loss: 158373.3281 - val_mae: 224.4458
Epoch 25/50
15/15 - 0s - 17ms/step - loss: 262526.3750 - mae: 344.1134 - val_loss: 208559.8594 - val_mae: 266.1631
Epoch 26/50
15/15 - 0s - 17ms/step - loss: 269179.0625 - mae: 341.8260 - val_loss: 210684.1094 - val_mae: 266.8428
Epoch 27/50
15/15 - 0s - 17ms/step - loss: 253808.4844 - mae: 327.8202 - val_loss: 205884.2656 - val_mae: 265.3175
Epoch 28/50
15/15 - 0s - 17ms/step - loss: 237416.7500 - mae: 322.7746 - val_loss: 182291.5312 - val_mae: 244.8578
Epoch 29/50
15/15 - 0s - 17ms/step - loss: 245527.2812 - mae: 329.7265 - val_loss: 172915.5938 - val_mae: 236.1347
Epoch 30/50
15/15 - 0s - 17ms/step - loss: 208559.3594 - mae: 302.7233 - val_loss: 160275.1875 - val_mae: 225.6327
Epoch 31/50
15/15 - 0s - 17ms/step - loss: 218899.4844 - mae: 309.0979 - val_loss: 151619.2812 - val_mae: 218.2278
Epoch 32/50
15/15 - 0s - 17ms/step - loss: 221982.0938 - mae: 307.1187 - val_loss: 149471.3125 - val_mae: 214.4608
Epoch 33/50
15/15 - 0s - 18ms/step - loss: 228214.5469 - mae: 312.2622 - val_loss: 168615.9844 - val_mae: 232.6189
Epoch 34/50
15/15 - 0s - 17ms/step - loss: 220582.0938 - mae: 304.2811 - val_loss: 159324.8750 - val_mae: 224.0596
Epoch 35/50
15/15 - 0s - 17ms/step - loss: 211354.8438 - mae: 297.2011 - val_loss: 158466.2188 - val_mae: 223.6774
Epoch 36/50
15/15 - 0s - 17ms/step - loss: 222183.5156 - mae: 303.8752 - val_loss: 142687.0625 - val_mae: 208.9231
Epoch 37/50
15/15 - 0s - 17ms/step - loss: 195074.7188 - mae: 285.8051 - val_loss: 141147.5781 - val_mae: 208.5949
Epoch 38/50
15/15 - 0s - 17ms/step - loss: 204858.1406 - mae: 288.2387 - val_loss: 140483.0000 - val_mae: 206.7271
Epoch 39/50
15/15 - 0s - 17ms/step - loss: 224880.2031 - mae: 302.5044 - val_loss: 140867.1719 - val_mae: 210.0112
Epoch 40/50
15/15 - 0s - 17ms/step - loss: 237459.2656 - mae: 304.7390 - val_loss: 144040.4844 - val_mae: 209.4215
Epoch 41/50
15/15 - 0s - 17ms/step - loss: 198229.6406 - mae: 283.6451 - val_loss: 138064.6094 - val_mae: 205.3014
Epoch 42/50
15/15 - 0s - 17ms/step - loss: 193824.2656 - mae: 280.8102 - val_loss: 135673.0000 - val_mae: 205.0084
Epoch 43/50
15/15 - 0s - 17ms/step - loss: 193151.5000 - mae: 283.6617 - val_loss: 135744.4531 - val_mae: 204.6235
Epoch 44/50
15/15 - 0s - 17ms/step - loss: 199524.4062 - mae: 281.3769 - val_loss: 132451.3594 - val_mae: 203.0702
Epoch 45/50
15/15 - 0s - 17ms/step - loss: 178199.7500 - mae: 267.2109 - val_loss: 154402.4688 - val_mae: 221.4493
Epoch 46/50
15/15 - 0s - 17ms/step - loss: 194038.9531 - mae: 283.3656 - val_loss: 133025.8750 - val_mae: 203.3244
Epoch 47/50
15/15 - 0s - 17ms/step - loss: 193496.9219 - mae: 277.1366 - val_loss: 128109.1016 - val_mae: 197.9702
Epoch 48/50
15/15 - 0s - 17ms/step - loss: 187112.0156 - mae: 271.6195 - val_loss: 136351.9688 - val_mae: 205.1196
Epoch 49/50
15/15 - 0s - 18ms/step - loss: 177534.7812 - mae: 264.9625 - val_loss: 124026.2891 - val_mae: 196.9332
Epoch 50/50
15/15 - 0s - 18ms/step - loss: 185698.6562 - mae: 272.3214 - val_loss: 142433.6875 - val_mae: 211.1327
# Predict the next day's taxi pickups from the last training window
y_pred_cnn = cnn_s_model.predict(X_input[-1].reshape(1, window_size, num_zones, 1, 3))
print("Next predicted value:", y_pred_cnn)
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step
Next predicted value: [[ 1.0612791e+02 3.8810486e+01 5.8801166e+02 2.9606802e+02
2.4752686e+02 9.2207550e+01 1.6928286e+03 1.2952293e+02
2.7664260e+03 6.7426154e+02 2.5758838e+03 2.1737177e+02
6.0807220e+02 2.4902241e+03 5.5077045e+02 2.3692393e+02
1.6783209e+03 1.7066562e+03 2.2098545e+03 1.3948647e+03
1.3424951e+03 5.2930607e+01 2.8707078e+00 4.9390933e+02
-1.0534749e+01 5.6179547e+00 1.1687524e+03 2.0666016e+03
2.5180186e+03 3.4603611e+03 1.1406893e+03 8.6974042e+02
1.0477395e+03 8.3887866e+02 7.9398323e+01 9.6016932e+00
9.0503265e+02 4.7581387e+03 3.6536624e+03 3.0194661e+03
2.4095620e+03 5.0304251e+02 3.0460337e+03 3.4344509e+03
1.2814647e+01 2.0924139e+01 1.5049051e+02 7.7649182e+02
1.4179123e+02 2.0023422e+03 3.4768835e+03 1.3836450e+03
1.4643947e+02 1.3136202e+03 2.8297920e+03 4.3705015e+03
4.8720879e+03 2.0146099e+03 2.9179875e+03 2.5009188e+01
7.2797356e+01 1.6737131e+03 2.2672693e+03 5.0580597e+02
1.3908831e+03 2.0206296e+03]]
# calculate the RMSE for the predicted day
# Note: as above, this compares the prediction with y_train_cnn[-1] (the last day of the
# training period), i.e. an in-sample check rather than the held-out 2023-07-01 test set
from sklearn.metrics import mean_squared_error
y_test_cnn = y_test_nyc # actual 2023-07-01 pickups (not used in the RMSE below)
rmse = np.sqrt(mean_squared_error(y_train_cnn[-1], y_pred_cnn[0]))
print(f'RMSE in testing set: {rmse:.2f}')
RMSE in testing set: 261.34
Quiz#
- Change the window size to build different sets of training features, train RNN models, and predict the taxi pickup amounts.
- Use the different window sizes to build the dataset, use LSTMs to make the predictions, and compare their performance with the RNNs.