Week 10 Lab Tutorial Timeline

| Time | Activity |
|---|---|
| 16:00–16:05 | Introduction — Overview of the main tasks for the lab tutorials |
| 16:05–16:45 | Tutorial: Machine learning for spatio-temporal analysis — Follow Sections 10.1–10.2 of the Jupyter Notebook to practice ANNs and RNNs |
| 16:45–17:30 | Tutorial: Machine learning for spatio-temporal analysis — Follow Sections 10.3–10.4 of the Jupyter Notebook to practice LSTMs and CNNs |
| 17:30–17:55 | Quiz — Complete the quiz tasks |
| 17:55–18:00 | Wrap-up — Recap key points and address final questions |
For this module’s lab tutorials, you can download all the required data using the provided link (click).
Please make sure that the Jupyter Notebook file and the data and img folders are placed in the same directory (specifically within the STBDA_lab folder) to ensure the code runs correctly.
Week 10 Key Takeaways:
- Understand the basics of deep learning and its applications in spatio-temporal analysis.
- Learn how to build and train Artificial Neural Networks (ANNs) for regression tasks.
- Explore Recurrent Neural Networks (RNNs) for sequential data analysis.
- Implement Long Short-Term Memory (LSTM) networks for time series prediction.
- Gain insights into Convolutional Neural Networks (CNNs) and their applications in spatio-temporal data analysis.
- Practice building and evaluating deep learning models using TensorFlow and Keras.
10 Deep learning for spatio-temporal analysis#
What is deep learning? Deep learning is a subset of machine learning that uses neural networks with many layers (deep architectures) to learn complex patterns in large datasets. It is particularly effective for tasks such as image and speech recognition, natural language processing, and spatio-temporal analysis.
Deep learning models can automatically learn features from raw data, eliminating the need for manual feature engineering. They are capable of handling high-dimensional data and can generalize well to unseen data, making them powerful tools for various applications.
What are neural networks? Neural networks are computational models inspired by the structure and function of the human brain. They consist of interconnected nodes (neurons) organized in layers, where each layer transforms the input data through weighted connections and activation functions. Neural networks learn complex relationships in data by adjusting the weights based on the error between predicted and actual outputs during training.
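For intuition, here is a minimal sketch (an illustrative addition, not part of the lab code) of what a single artificial neuron computes: a weighted sum of its inputs plus a bias, passed through an activation function.

import numpy as np

# A single neuron: output = activation(w · x + b)
x = np.array([0.5, -1.2, 3.0])   # input features
w = np.array([0.8, 0.1, -0.4])   # weights (learned during training)
b = 0.2                          # bias (learned during training)

z = np.dot(w, x) + b             # weighted sum of the inputs
output = max(0.0, z)             # ReLU activation introduces non-linearity
print("weighted sum:", z, "neuron output:", output)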
The data structures in deep learning
The data structures used in deep learning are often referred to as tensors. Tensors are multi-dimensional arrays that can represent data in various forms, such as scalars, vectors, matrices, and higher-dimensional arrays.
| Term | Description | Corresponding Array Dimensionality | Example |
|---|---|---|---|
| Scalar | A single number (magnitude only) | 0D tensor (no dimensions) | 5 |
| Vector | A list of numbers (magnitude + direction) | 1D tensor | [1, 2, 3] |
| Matrix | A table of numbers (2D grid) | 2D tensor | [[1, 2], [3, 4]] |
| Tensor | Generalization to more than 2 dimensions | 3D and higher tensors | 3D array of shape (2, 3, 4) |
Reshaping is a common operation in deep learning that allows you to change the shape of a tensor without changing its data. It is often used to prepare data for input into neural networks or to manipulate the output of a model.
# 1D array to 2D array
import numpy as np
# Create a 1D array
array_1d = np.array([1, 2, 3, 4, 5])
print("1D Array:", array_1d)
print("Shape of 1D Array:", array_1d.shape)
1D Array: [1 2 3 4 5]
Shape of 1D Array: (5,)
# Reshape to 2D array with 5 rows and 1 column
array_2d = array_1d.reshape(5, 1)
print("Reshaped 2D Array:\n", array_2d)
print("Shape of Reshaped 2D Array:", array_2d.shape)
Reshaped 2D Array:
[[1]
[2]
[3]
[4]
[5]]
Shape of Reshaped 2D Array: (5, 1)
# 2D array to 3D array
# Create a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D Array:\n", array_2d)
print("Shape of 2D Array:", array_2d.shape)
2D Array:
[[1 2 3]
[4 5 6]]
Shape of 2D Array: (2, 3)
# Reshape to a 3D array with shape (2, 1, 3)
array_3d = array_2d.reshape(2, 1, 3)
print("Reshaped 3D Array:\n", array_3d)
print("Shape of Reshaped 3D Array:", array_3d.shape)
Reshaped 3D Array:
[[[1 2 3]]
[[4 5 6]]]
Shape of Reshaped 3D Array: (2, 1, 3)
# select the element at index [0, 0, 2] (first block, first row, third value)
selected_value = array_3d[0, 0, 2]
print("Selected Value from 3D Array:", selected_value)
Selected Value from 3D Array: 3
# select the first block's first row (all values along the last axis)
selected_row = array_3d[0, 0, :]
print("Selected Row from 3D Array:", selected_row)
Selected Row from 3D Array: [1 2 3]
# build a 3D array with shape (2, 3, 4)
array_3d = np.array([[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
[[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]])
print("3D Array:\n", array_3d)
print("Shape of 3D Array:", array_3d.shape)
3D Array:
[[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
[[13 14 15 16]
[17 18 19 20]
[21 22 23 24]]]
Shape of 3D Array: (2, 3, 4)
In TensorFlow, a tensor is the core data structure — it represents multi-dimensional arrays just like NumPy arrays, but with more capabilities like GPU support, gradient tracking, and symbolic computation.
We can use the tf.constant function to create tensors in TensorFlow. Tensors can have different dimensions, such as scalars (0D), vectors (1D), matrices (2D), and higher-dimensional arrays (3D and beyond). The shape of a tensor is defined by its dimensions, which can be accessed using the .shape attribute.
import tensorflow as tf
# Scalar
scalar = tf.constant(42)
print(scalar.shape) # ()
# Vector (1D)
vector = tf.constant([1.0, 2.0, 3.0])
print(vector.shape) # (3,)
# Matrix (2D)
matrix = tf.constant([[1, 2], [3, 4]])
print(matrix.shape) # (2, 2)
# Tensor (3D)
tensor_3d = tf.constant([
[[1, 2], [3, 4]],
[[5, 6], [7, 8]]
])
print(tensor_3d.shape) # (2, 2, 2)
()
(3,)
(2, 2)
(2, 2, 2)
# check if tf can use GPU
print(tf.config.list_physical_devices('GPU'))
[]
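Tensors also support automatic differentiation, which is what makes training neural networks possible. As a minimal sketch (illustrative only), tf.GradientTape records operations on a tensor and returns the gradient of the result with respect to it:

import tensorflow as tf

# Gradient tracking with tf.GradientTape
x = tf.constant(3.0)
with tf.GradientTape() as tape:
    tape.watch(x)            # track operations that involve x
    y = x ** 2 + 2.0 * x     # y = x^2 + 2x
# dy/dx = 2x + 2, so the gradient at x = 3 is 8
print(tape.gradient(y, x))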
Key components and concepts related to Neural Networks (NNs)
The table below provides an overview of the key components and concepts related to NNs:

| Component | Description |
|---|---|
| Input Layer | The first layer that receives input data. Each neuron represents a feature or variable. |
| Hidden Layers | Intermediate layers that perform transformations on the input data. Each layer can have multiple neurons. |
| Output Layer | The final layer that produces the output of the ANN. The number of neurons corresponds to the number of classes or regression outputs. |
| Activation Functions | Functions applied to the output of each neuron to introduce non-linearity. Common functions include ReLU, sigmoid, and tanh. |
| Weights and Biases | Parameters that are learned during training. Weights determine the strength of connections between neurons, while biases allow for flexibility in the model. |
| Loss Function | A function that measures the difference between predicted and actual outputs. Common loss functions include mean squared error (MSE) for regression and cross-entropy for classification. |
| Optimizer | An algorithm used to update the weights and biases during training to minimize the loss function. Common optimizers include stochastic gradient descent (SGD), Adam, and RMSprop. |
| Backpropagation | A training algorithm that computes gradients of the loss function with respect to the weights and biases, allowing for efficient weight updates. |
| Batch Size | The number of samples processed before the model's weights are updated. Smaller batch sizes can lead to more frequent updates, while larger batch sizes can provide more stable gradients. |
| Epoch | One complete pass through the entire training dataset. Multiple epochs are often required for convergence. |
Activation Functions
Activation functions are formulas that define a neuron's output in a neural network. They add non-linearity and enable the network to learn complex patterns. Without them, the model would act like a simple linear regression.
| Name | Formula | Characteristics | Use Case |
|---|---|---|---|
| Sigmoid | \(\sigma(x) = \frac{1}{1 + e^{-x}}\) | Smooth, outputs between 0 and 1 | Binary classification, older models |
| Tanh | \(\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\) | Output between -1 and 1; zero-centered | Sometimes used in hidden layers |
| ReLU | \(f(x) = \max(0, x)\) | Simple, fast, sparse activation | Most common in hidden layers |
| Leaky ReLU | \(f(x) = \max(0.01x, x)\) | Fixes the dying ReLU problem | Variants of ReLU where some gradient flows |
| ELU | \(f(x) = x \text{ if } x>0 \text{ else } \alpha(e^x - 1)\) | Better performance in some cases | Slight improvement over ReLU in some tasks |
| Softmax | \(\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}\) | Converts scores into probabilities | Used in the output layer for classification |
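To see these functions in action, the short sketch below (an illustrative addition) evaluates a few of them on the same inputs using TensorFlow's built-in implementations:

import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 0.5, 2.0])
print("sigmoid:", tf.nn.sigmoid(x).numpy())   # squashes values into (0, 1)
print("tanh:   ", tf.nn.tanh(x).numpy())      # squashes values into (-1, 1)
print("relu:   ", tf.nn.relu(x).numpy())      # zeroes out negative values
print("softmax:", tf.nn.softmax(x).numpy())   # converts the vector into probabilities that sum to 1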
10.1 Artificial Neural Networks (ANNs)#
ANN (Artificial Neural Network) is a computational model inspired by the way biological neural networks in the human brain process information. ANNs consist of interconnected groups of artificial neurons that work together to solve specific problems, such as classification, regression, or pattern recognition. They are widely used in various fields, including computer vision, natural language processing, and spatio-temporal analysis.
10.1.1 ANNs Regression Example#
To start, we build a simple ANN regression model on synthetic data generated from a linear relationship (y = 2x + 1). This shows how a Keras model is defined, trained, and evaluated; in the next subsection we apply the same approach to the NYC Taxi dataset to predict taxi pickup amounts in Manhattan.
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# x: inputs, y: outputs
# Generate 50 data points between -3 and 3
x = np.linspace(-3, 3, 50)
np.random.seed(42) # For reproducibility
# Create corresponding y values from a simple linear relationship
y = 2 * x + 1
print(x)
[-3. -2.87755102 -2.75510204 -2.63265306 -2.51020408 -2.3877551
-2.26530612 -2.14285714 -2.02040816 -1.89795918 -1.7755102 -1.65306122
-1.53061224 -1.40816327 -1.28571429 -1.16326531 -1.04081633 -0.91836735
-0.79591837 -0.67346939 -0.55102041 -0.42857143 -0.30612245 -0.18367347
-0.06122449 0.06122449 0.18367347 0.30612245 0.42857143 0.55102041
0.67346939 0.79591837 0.91836735 1.04081633 1.16326531 1.28571429
1.40816327 1.53061224 1.65306122 1.7755102 1.89795918 2.02040816
2.14285714 2.26530612 2.3877551 2.51020408 2.63265306 2.75510204
2.87755102 3. ]
print(x.shape)
(50,)
print(y)
[-5. -4.75510204 -4.51020408 -4.26530612 -4.02040816 -3.7755102
-3.53061224 -3.28571429 -3.04081633 -2.79591837 -2.55102041 -2.30612245
-2.06122449 -1.81632653 -1.57142857 -1.32653061 -1.08163265 -0.83673469
-0.59183673 -0.34693878 -0.10204082 0.14285714 0.3877551 0.63265306
0.87755102 1.12244898 1.36734694 1.6122449 1.85714286 2.10204082
2.34693878 2.59183673 2.83673469 3.08163265 3.32653061 3.57142857
3.81632653 4.06122449 4.30612245 4.55102041 4.79591837 5.04081633
5.28571429 5.53061224 5.7755102 6.02040816 6.26530612 6.51020408
6.75510204 7. ]
print(y.shape)
(50,)
# Reshape the data to be 2D (required by Keras: shape (batch_size, features))
x = x.reshape(-1, 1)
y = y.reshape(-1, 1)
print(x)
[[-3. ]
[-2.87755102]
[-2.75510204]
[-2.63265306]
[-2.51020408]
[-2.3877551 ]
[-2.26530612]
[-2.14285714]
[-2.02040816]
[-1.89795918]
[-1.7755102 ]
[-1.65306122]
[-1.53061224]
[-1.40816327]
[-1.28571429]
[-1.16326531]
[-1.04081633]
[-0.91836735]
[-0.79591837]
[-0.67346939]
[-0.55102041]
[-0.42857143]
[-0.30612245]
[-0.18367347]
[-0.06122449]
[ 0.06122449]
[ 0.18367347]
[ 0.30612245]
[ 0.42857143]
[ 0.55102041]
[ 0.67346939]
[ 0.79591837]
[ 0.91836735]
[ 1.04081633]
[ 1.16326531]
[ 1.28571429]
[ 1.40816327]
[ 1.53061224]
[ 1.65306122]
[ 1.7755102 ]
[ 1.89795918]
[ 2.02040816]
[ 2.14285714]
[ 2.26530612]
[ 2.3877551 ]
[ 2.51020408]
[ 2.63265306]
[ 2.75510204]
[ 2.87755102]
[ 3. ]]
print(x.shape)
(50, 1)
print(y)
[[-5. ]
[-4.75510204]
[-4.51020408]
[-4.26530612]
[-4.02040816]
[-3.7755102 ]
[-3.53061224]
[-3.28571429]
[-3.04081633]
[-2.79591837]
[-2.55102041]
[-2.30612245]
[-2.06122449]
[-1.81632653]
[-1.57142857]
[-1.32653061]
[-1.08163265]
[-0.83673469]
[-0.59183673]
[-0.34693878]
[-0.10204082]
[ 0.14285714]
[ 0.3877551 ]
[ 0.63265306]
[ 0.87755102]
[ 1.12244898]
[ 1.36734694]
[ 1.6122449 ]
[ 1.85714286]
[ 2.10204082]
[ 2.34693878]
[ 2.59183673]
[ 2.83673469]
[ 3.08163265]
[ 3.32653061]
[ 3.57142857]
[ 3.81632653]
[ 4.06122449]
[ 4.30612245]
[ 4.55102041]
[ 4.79591837]
[ 5.04081633]
[ 5.28571429]
[ 5.53061224]
[ 5.7755102 ]
[ 6.02040816]
[ 6.26530612]
[ 6.51020408]
[ 6.75510204]
[ 7. ]]
print(y.shape)
(50, 1)
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import layers
from keras import Input
# Build a simple ANN model
# Sequential: A linear stack of layers
ann = Sequential([
# Input layer shape for each sample
Input(shape=(1,)),
# Hidden layer with 30 neurons
Dense(units=30, activation='relu'),
# Output layer with 1 neuron (predicting one continuous value)
Dense(units=1)])
# loss = mean squared error (suitable for regression)
# optimizer = Adam (adaptive gradient descent)
ann.compile(optimizer='adam', loss='mse')
# Train the model
# epochs: how many times to go over the entire dataset
# batch_size: number of samples per gradient update
history = ann.fit(x, y, epochs=100, batch_size=10, verbose=0)
# Make predictions
y_pred = ann.predict(x)
1/2 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step
y_pred
array([[-4.5038476 ],
[-4.3176517 ],
[-4.131456 ],
[-3.9452596 ],
[-3.7590637 ],
[-3.572868 ],
[-3.386672 ],
[-3.2004762 ],
[-3.0142803 ],
[-2.8280845 ],
[-2.6418881 ],
[-2.4556925 ],
[-2.2694964 ],
[-2.0833006 ],
[-1.8971049 ],
[-1.710909 ],
[-1.5205791 ],
[-1.3220133 ],
[-1.1202302 ],
[-0.90876573],
[-0.675929 ],
[-0.37348443],
[ 0.02919072],
[ 0.45655853],
[ 0.8759759 ],
[ 1.2953931 ],
[ 1.5952731 ],
[ 1.8382 ],
[ 2.081127 ],
[ 2.3240538 ],
[ 2.5623913 ],
[ 2.7903113 ],
[ 3.0182316 ],
[ 3.246152 ],
[ 3.470642 ],
[ 3.694658 ],
[ 3.918674 ],
[ 4.14269 ],
[ 4.366706 ],
[ 4.590722 ],
[ 4.814436 ],
[ 5.0373707 ],
[ 5.2603054 ],
[ 5.48324 ],
[ 5.706175 ],
[ 5.9291096 ],
[ 6.1520443 ],
[ 6.3749795 ],
[ 6.597914 ],
[ 6.820849 ]], dtype=float32)
# Evaluate the model on the training data
loss = ann.evaluate(x, y, verbose=0)
print(f"Final MSE loss: {loss:.4f}")
# Plot the predicted values and the actual values using the scatter plot
plt.figure(figsize=(5, 5))
plt.scatter(y_pred, y, alpha=0.5)
plt.plot([y.min(), y.max()], [y.min(), y.max()], color='red', linestyle='--')
plt.xlabel('Predicted Values')
plt.ylabel('Actual Values')
plt.title('Predicted vs Actual Values')
plt.xlim(y.min() * 1.1, y.max() * 1.1)
plt.ylim(y.min() * 1.1, y.max() * 1.1)
plt.show()
10.1.2 ANNs Regression for NYC Taxi Pickups Prediction#
Now, we will use the NYC Taxi dataset to predict the number of pickups in Manhattan using an ANN model. The dataset contains daily pickup counts for different taxi zones in New York City. The X_train, y_train, X_test, and y_test arrays were prepared in the previous week's tutorial; they hold the features and target variables for training and testing the model.
We use a sliding-window size of 30 days: the previous 30 days' taxi pickups in a zone are used to predict the next day's taxi pickups in the same zone. The output is the next day's pickup amounts for all taxi zones. The training data covers 2023-01-01 to 2023-06-30 for 66 taxi zones, and the testing data is 2023-07-01 for the same 66 taxi zones.
- Spatial units: 66 taxi zones
- Temporal units: 181 days
- Spatial and temporal units: 66 * 181
- Features: the 30 previous days' pickup values plus the 30 corresponding spatial lags (one spatial-lag value per day for each taxi zone)
- Target: the next day's pickup values for the 66 taxi zones, shape (66,)
- X_train shape: 66 * (181 - 30) samples, each with (30 + 30) features; y_train shape: 66 * (181 - 30)
- X_test shape: 66 * (30 + 30); y_test shape: 66
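The X_train.npy, y_train.npy, X_test.npy, and y_test.npy files loaded below already contain these sliding-window features. For reference, here is a minimal sketch of how such features could be built for a single zone; the function and variable names (build_windows, pickups, spatial_lags) are illustrative placeholders, and the exact column ordering in the provided files may differ.

import numpy as np

def build_windows(pickups, spatial_lags, window=30):
    # pickups and spatial_lags: 1D arrays with one value per day for a single zone
    X, y = [], []
    for i in range(len(pickups) - window):
        # previous `window` days of pickups followed by the matching spatial lags
        X.append(np.concatenate([pickups[i:i + window], spatial_lags[i:i + window]]))
        y.append(pickups[i + window])  # the next day's pickup count
    return np.array(X), np.array(y)

# Example with synthetic data for one zone over 181 days
rng = np.random.default_rng(0)
pickups = rng.integers(0, 2000, size=181).astype(float)
spatial_lags = rng.random(181) * 2000
X_zone, y_zone = build_windows(pickups, spatial_lags)
print(X_zone.shape, y_zone.shape)  # (151, 60) (151,)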
# train set is from 2023-01-01 to 2023-06-30
X_train_nyc = np.load('data/X_train.npy')
y_train_nyc = np.load('data/y_train.npy')
X_test_nyc = np.load('data/X_test.npy')
y_test_nyc = np.load('data/y_test.npy')
print(X_train_nyc.shape, y_train_nyc.shape,
X_test_nyc.shape, y_test_nyc.shape)
(9966, 60) (9966,) (66, 60) (66,)
X_train_nyc
array([[ 174. , 1074.75, 32. , ..., 1499.25, 38. , 442.5 ],
[ 32. , 432.5 , 51. , ..., 442.5 , 60. , 579.75],
[ 51. , 438.75, 43. , ..., 579.75, 75. , 648.25],
...,
[1179. , 1047.2 , 1796. , ..., 2100.8 , 1893. , 2354.8 ],
[1796. , 2394.8 , 2196. , ..., 2354.8 , 1843. , 2198.6 ],
[2196. , 2667.6 , 2180. , ..., 2198.6 , 1876. , 2147.8 ]])
Build a simple ANN model
# Sequential: A linear stack of layers
ann_model = Sequential([
# Input layer with 60 features (30 previous days' pickups + 30 spatial lags)
Input(shape=(60,)),
# Hidden layer with 30 neurons
layers.Dense(units=30, activation='relu'),
# Output layer with 1 neuron (predicting one continuous value)
layers.Dense(units=1)])
# loss = mean squared error (suitable for regression)
# optimizer = Adam (adaptive gradient descent)
ann_model.compile(optimizer='adam', loss='mse')
%%time
# Train the model
history = ann_model.fit(X_train_nyc, y_train_nyc, epochs=100, batch_size=10, verbose=0)
CPU times: user 30.6 s, sys: 6.06 s, total: 36.6 s
Wall time: 23.2 s
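The fit call returns a History object, and history.history['loss'] records the training loss after each epoch. If you want to check convergence, you can plot it, for example:

# Plot the training loss per epoch (uses the `history` object returned by fit above)
import matplotlib.pyplot as plt
plt.figure(figsize=(6, 3))
plt.plot(history.history['loss'])
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.title('ANN training loss on the NYC taxi data')
plt.show()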
# Evaluate the model
loss = ann_model.evaluate(X_train_nyc, y_train_nyc, verbose=0)
print(f"Final RMSE loss (training set): {np.sqrt(loss):.4f}")
Final RMSE loss (training set): 159.3046
# predict taxi pickups the next day
y_pred_nyc = ann_model.predict(X_test_nyc)
1/3 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
y_pred_nyc
array([[9.39334167e+02],
[4.50067657e+02],
[6.35627319e+02],
[7.07341064e+02],
[3.89289825e+02],
[1.46727707e+02],
[1.97894495e+03],
[7.18767334e+02],
[2.01176208e+03],
[2.01610815e+03],
[2.08259497e+03],
[2.13194763e+02],
[1.24898083e+03],
[1.29628394e+03],
[3.39075317e+02],
[3.61877197e+02],
[2.08561255e+03],
[2.91301050e+03],
[1.71802954e+03],
[2.00166870e+03],
[1.17749438e+03],
[6.62513962e+01],
[3.52951660e+01],
[1.26692676e+03],
[6.01861906e+00],
[7.25930882e+00],
[1.53382251e+03],
[1.90954944e+03],
[2.44485718e+03],
[2.02276282e+03],
[2.15617236e+03],
[8.47588867e+02],
[7.10824646e+02],
[1.09190955e+03],
[1.81094299e+02],
[1.00701034e+00],
[1.49973059e+03],
[3.11276489e+03],
[3.09069580e+03],
[3.16033081e+03],
[2.66960425e+03],
[2.14964172e+02],
[2.56099316e+03],
[2.79663647e+03],
[2.14516907e+02],
[2.92781925e+00],
[5.62130798e+02],
[9.28009399e+02],
[1.43415881e+03],
[2.36032373e+03],
[2.64190845e+03],
[6.56314636e+02],
[9.76706421e+02],
[2.15849805e+03],
[2.26821851e+03],
[2.69006299e+03],
[3.18407373e+03],
[1.74114819e+03],
[2.00319946e+03],
[1.48818445e+01],
[3.64984093e+01],
[1.66785315e+03],
[1.51006335e+03],
[5.28614929e+02],
[1.57510657e+03],
[1.76603149e+03]], dtype=float32)
# Calculate the rmse for the testing data
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_test_nyc, y_pred_nyc))
print(f'RMSE in testing set: {rmse:.2f}')
RMSE in testing set: 642.98
# plot the predicted values and the actual values using the scatter plot
import matplotlib.pyplot as plt
plt.figure(figsize=(5, 5))
plt.scatter(y_test_nyc, y_pred_nyc, alpha=0.5)
plt.plot([y_test_nyc.min(), y_test_nyc.max()],
[y_test_nyc.min(), y_test_nyc.max()], color='red', linestyle='--')
plt.xlabel('Actual Taxi Pickups')
plt.ylabel('Predicted Taxi Pickups')
plt.title('Predicted vs Actual Taxi Pickups in NYC (2023-07-01)')
plt.xlim(0, y_test_nyc.max() * 1.1)
plt.ylim(0, y_test_nyc.max() * 1.1)
(0.0, 3558.5000000000005)
import geopandas as gpd
gdf_m = gpd.read_file('data/gdf_man.geojson')
def plot_predicted_pickups(gdf_m, y_pred, y_test):
    # plot the predicted values, the actual values, and their residuals as maps
    gdf_m['predicted_pickup'] = y_pred
    gdf_m['actual_pickup'] = y_test
    gdf_m['residual'] = gdf_m['predicted_pickup'] - gdf_m['actual_pickup']
    fig, ax = plt.subplots(1, 3, figsize=(15, 6))
    gdf_m.plot(column='predicted_pickup', ax=ax[0], legend=True, cmap='RdPu', edgecolor='black',
               vmin=0, vmax=np.max([gdf_m['predicted_pickup'].max(), gdf_m['actual_pickup'].max()]))
    ax[0].set_title('Predicted Taxi Pickups (2023-07-01)')
    gdf_m.plot(column='actual_pickup', ax=ax[1], legend=True, cmap='RdPu', edgecolor='black',
               vmin=0, vmax=np.max([gdf_m['predicted_pickup'].max(), gdf_m['actual_pickup'].max()]))
    ax[1].set_title('Actual Taxi Pickups (2023-07-01)')
    gdf_m.plot(column='residual', ax=ax[2], legend=True, cmap='coolwarm', edgecolor='black',
               vmin=-2000, vmax=2000)
    ax[2].set_title('Residual Taxi Pickups (2023-07-01)')
    ax[0].set_axis_off()
    ax[1].set_axis_off()
    ax[2].set_axis_off()
    plt.tight_layout()
    plt.show()
plot_predicted_pickups(gdf_m, y_pred_nyc, y_test_nyc)
10.2 Recurrent Neural Networks (RNNs)#
Recurrent Neural Networks (RNNs) are a class of artificial neural networks designed for processing sequential data — data where the current input depends on previous inputs. They are widely used in time series prediction, speech recognition, and other tasks involving sequences. You can find a tutorial on using RNNs on this TensorFlow page.
10.2.1 RNN Regression Example#
np.random.seed(42)
# Generate Time Series Data
time_steps = 30
x = np.linspace(0, 4 * np.pi, time_steps)
data = np.sin(x).reshape(-1, 1) # shape: (30, 1)
# Prepare sliding-window (time-lag) features
lag = 3 # how many previous time steps to use
X = []
y = []
for i in range(len(data) - lag):
    X.append(data[i:i+lag])
    y.append(data[i+lag])
X = np.array(X) # shape: (27, 3, 1)
y = np.array(y) # shape: (27, 1)
print("X shape:", X.shape)
print("y shape:", y.shape)
X shape: (27, 3, 1)
y shape: (27, 1)
X[0]
array([[0. ],
[0.4198891 ],
[0.76216206]])
from keras.layers import SimpleRNN, Dense
# Define RNNs
rnn = Sequential([
Input(shape=(3, 1)), # the input shape is the data shape per sample
SimpleRNN(units=10, activation='tanh'),
Dense(1) # output layer
])
rnn.compile(optimizer='adam', loss='mse')
# Train Model
rnn.fit(X, y, epochs=200, verbose=0)
<keras.src.callbacks.history.History at 0x36c05b520>
X[-1]
array([[-0.96354999],
[-0.76216206],
[-0.4198891 ]])
# Predict Next Value
pred = rnn.predict(X[-1].reshape(1, 3, 1))
print("Next predicted value:", pred[0][0])
WARNING:tensorflow:5 out of the last 6 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x36c0b7490> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 39ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 39ms/step
Next predicted value: -0.027176028
10.2.2 RNNs for NYC Taxi Pickups Prediction#
RNNs require a 3D input of shape (samples, time steps, features), so if our data is a 2D array we need to reshape it accordingly. Alternatively, we can build the data as a 3D array directly in the feature-engineering step, as in the example above.
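As a minimal sketch (assuming the flat features are stored time-major with the two per-day values adjacent, which may not match how X_train.npy was actually built), a 2D array can be reshaped into the (samples, time steps, features) layout that RNNs expect:

import numpy as np

# Placeholder 2D feature matrix: 9966 samples, 60 flat features per sample
X_2d = np.zeros((9966, 60))
# Reshape to (samples, time steps, features) = (9966, 30, 2)
X_3d = X_2d.reshape(X_2d.shape[0], 30, 2)
print(X_3d.shape)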
# read the df train from the previous week example
df_train = pd.read_csv('data/df_train.csv', index_col=0)
df_train.head()
| | PULocationID | tpep_pickup_date | spatial_lag | pickup_count |
|---|---|---|---|---|
| 0 | 4 | 2023-01-01 | 1074.75 | 174 |
| 1 | 4 | 2023-01-02 | 432.50 | 32 |
| 2 | 4 | 2023-01-03 | 438.75 | 51 |
| 3 | 4 | 2023-01-04 | 533.75 | 43 |
| 4 | 4 | 2023-01-05 | 624.75 | 42 |
# create a 3d X array with 30 previous days' taxi pickups and spatial lags for each taxi zone
X_train = []
y_train = []
for zone in df_train['PULocationID'].unique():
    # get the taxi zone data
    zone_data = df_train[df_train['PULocationID'] == zone][['spatial_lag','pickup_count']].values
    # create a 3D array with 30 previous days' taxi pickups and spatial lags for each taxi zone
    for i in range(len(zone_data) - 30):
        X_train.append(zone_data[i:i+30])
        y_train.append(zone_data[i+30][1]) # the next day pickup count
X_train = np.array(X_train)
y_train = np.array(y_train).reshape(-1, 1) # reshape y to be 2D
print("X_train shape:", X_train.shape)
print("y_train shape:", y_train.shape)
X_train shape: (9966, 30, 2)
y_train shape: (9966, 1)
X_test = np.array([df_train[df_train['PULocationID'] == zone][['spatial_lag','pickup_count']].values[-30:] for zone in df_train['PULocationID'].unique()]) # the last 30 days' data for each taxi zone
X_test.shape
(66, 30, 2)
y_test = y_test_nyc
# Define RNNs
rnn_model = Sequential([
Input(shape=(30, 2)), # the input shape is the data shape per sample
SimpleRNN(units=32, activation='tanh'),
Dense(1) # output layer
])
rnn_model.compile(optimizer='adam', loss='mse')
# Train Model
rnn_model.fit(X_train, y_train, epochs=200, verbose=0)
<keras.src.callbacks.history.History at 0x36f2ca410>
# Predict Next Value
y_pred = rnn_model.predict(X_test)
print("Next predicted value:", y_pred)
WARNING:tensorflow:6 out of the last 7 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x36c523910> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
1/3 ━━━━━━━━━━━━━━━━━━━━ 0s 39ms/step
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step
Next predicted value: [[ 9.55611572e+01]
[ 6.92451477e+01]
[ 6.60138611e+02]
[ 3.05185669e+02]
[ 3.05020081e+02]
[ 8.30399628e+01]
[ 1.62078467e+03]
[ 1.86968475e+02]
[ 1.84369922e+03]
[ 6.60539307e+02]
[ 1.84369922e+03]
[ 1.87114334e+02]
[ 6.60538696e+02]
[ 1.84369922e+03]
[ 5.43842712e+02]
[ 3.05739258e+02]
[ 1.49032983e+03]
[ 1.60041650e+03]
[ 1.84369922e+03]
[ 1.60659424e+03]
[ 1.60734595e+03]
[ 1.08283340e+02]
[ 1.19630661e+01]
[ 6.59502258e+02]
[-1.27464294e-01]
[-2.08326340e+01]
[ 1.60731213e+03]
[ 1.84164062e+03]
[ 1.84369922e+03]
[ 1.84369922e+03]
[ 9.45067505e+02]
[ 1.02289917e+03]
[ 1.15081250e+03]
[ 1.01696198e+03]
[ 7.36858139e+01]
[-8.55097198e+00]
[ 8.96919800e+02]
[ 1.84369922e+03]
[ 1.84369922e+03]
[ 1.84369922e+03]
[ 1.84369775e+03]
[ 5.42993408e+02]
[ 1.84369922e+03]
[ 1.84369922e+03]
[-4.95172424e+01]
[-2.72332764e+00]
[ 3.05178467e+02]
[ 1.00917725e+03]
[ 1.87447525e+02]
[ 1.72517090e+03]
[ 1.84369922e+03]
[ 1.60567896e+03]
[ 8.14861145e+01]
[ 1.48914258e+03]
[ 1.84369922e+03]
[ 1.84369922e+03]
[ 1.84369922e+03]
[ 1.75190942e+03]
[ 1.84369922e+03]
[ 1.99201584e+01]
[ 6.48842468e+01]
[ 1.60734595e+03]
[ 1.84369922e+03]
[ 5.44805420e+02]
[ 1.60734033e+03]
[ 1.75878052e+03]]
# calculate the rmse for the testing data
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f'RMSE in testing set: {rmse:.2f}')
RMSE in testing set: 428.92
y_test
array([ 169, 88, 434, 185, 231, 95, 1656, 172, 2805, 681, 2322,
220, 447, 2862, 527, 269, 1533, 1462, 1685, 1173, 1631, 77,
1, 450, 6, 1, 791, 984, 1841, 2644, 753, 1143, 1613,
623, 68, 1, 1025, 3235, 2198, 2143, 1878, 309, 1848, 2553,
2, 9, 187, 836, 104, 1385, 3081, 992, 156, 932, 2189,
2079, 2613, 1331, 1965, 15, 64, 1640, 2368, 597, 844, 1380])
# plot the predicted values and the actual values using the scatter plot
import matplotlib.pyplot as plt
plt.figure(figsize=(5, 5))
plt.scatter(y_test, y_pred, alpha=0.5)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color='red', linestyle='--')
plt.xlabel('Actual Taxi Pickups')
plt.ylabel('Predicted Taxi Pickups')
plt.title('Predicted vs Actual Taxi Pickups in NYC (2023-07-01)')
plt.xlim(0, y_test.max() * 1.1)
plt.ylim(0, y_test.max() * 1.1)
plt.show()
plot_predicted_pickups(gdf_m, y_pred, y_test)
10.3 Long Short-Term Memory (LSTM) networks#
LSTM stands for Long Short-Term Memory. It is a variant of the Recurrent Neural Network (RNN) architecture designed to better capture long-range dependencies in sequence data. Traditional RNNs have trouble remembering information from far back in the sequence because of the vanishing gradient problem during training. LSTMs address this by introducing memory cells and special gates (input gate, forget gate, output gate) that control the flow of information. This gating architecture allows LSTMs to remember important information for longer periods and selectively forget irrelevant parts. You can find a tutorial on using LSTMs on this TensorFlow page.
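For reference, a standard formulation of the LSTM gates is shown below, where \(\sigma\) is the sigmoid function, \(\odot\) is element-wise multiplication, \(x_t\) is the input at time step \(t\), and \(h_t\) and \(c_t\) are the hidden state and memory cell:

\[
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory cell update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
\]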
| Aspect | RNN | LSTM |
|---|---|---|
| Architecture | Simple recurrent units with loops | Complex cells with a memory cell and three gates |
| Memory | Limited to short-term memory | Can capture long-term dependencies |
| Vanishing gradient | Prone to vanishing or exploding gradients | Designed to mitigate vanishing gradient problems |
| Performance on long sequences | Struggles with long sequences | Performs much better on longer sequences |
| Gates | No gates | Input, forget, and output gates for better control |
| Use cases | Basic sequential tasks | Complex sequential tasks requiring context retention |
# Define a LSTM model
from keras.layers import LSTM
lstm_model = Sequential([
Input(shape=(30, 2)), # the input shape is the data shape per sample
# For stacked LSTM layers, all LSTM layers except the last one must have return_sequences=True
LSTM(units=32, activation='tanh', return_sequences=True), # 1st LSTM layer
LSTM(units=32, activation='tanh', return_sequences=True), # 2nd LSTM layer
LSTM(units=32, activation='tanh'), # 3rd LSTM layer
Dense(1) # output layer
])
lstm_model.compile(optimizer='adam', loss='mse')
# Train Model
lstm_model.fit(X_train, y_train, epochs=200, verbose=0)
<keras.src.callbacks.history.History at 0x357490eb0>
# Predict Next Value
y_pred = lstm_model.predict(X_test)
print("Next predicted value:", y_pred)
1/3 ━━━━━━━━━━━━━━━━━━━━ 0s 104ms/step
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 53ms/step
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 54ms/step
Next predicted value: [[ 92.34474 ]
[ 28.663183 ]
[ 553.3706 ]
[ 201.01932 ]
[ 260.48154 ]
[ 99.650894 ]
[1932.6918 ]
[ 123.89032 ]
[1955.9598 ]
[ 557.4681 ]
[1955.7119 ]
[ 234.72284 ]
[ 468.69632 ]
[1955.396 ]
[ 487.5384 ]
[ 265.44022 ]
[1925.011 ]
[ 942.0801 ]
[1954.8771 ]
[ 786.6172 ]
[1913.961 ]
[ 85.12416 ]
[ 4.5358276]
[ 289.11566 ]
[ 9.526817 ]
[ 3.0452003]
[ 946.47186 ]
[1028.5009 ]
[1955.5829 ]
[1956.1534 ]
[ 667.39746 ]
[ 661.18286 ]
[1849.1115 ]
[ 984.1623 ]
[ 72.51457 ]
[ 2.872387 ]
[ 754.88135 ]
[1956.7798 ]
[1955.7311 ]
[1955.7103 ]
[1954.684 ]
[ 465.01376 ]
[1955.7123 ]
[1956.3638 ]
[ 5.4966393]
[ 4.8534317]
[ 198.47726 ]
[ 654.40326 ]
[ 174.56087 ]
[1737.6897 ]
[1955.7128 ]
[1431.2847 ]
[ 78.51694 ]
[ 969.4642 ]
[1954.5591 ]
[1956.7814 ]
[1956.7844 ]
[1945.45 ]
[1955.7158 ]
[ 23.310139 ]
[ 77.2171 ]
[1911.1652 ]
[1954.6063 ]
[ 510.52734 ]
[ 955.4945 ]
[1947.2797 ]]
# calculate the rmse for the testing data
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f'RMSE in testing set: {rmse:.2f}')
RMSE in testing set: 364.74
# plot the predicted values and the actual values using the scatter plot
import matplotlib.pyplot as plt
plt.figure(figsize=(5, 5))
plt.scatter(y_test, y_pred, alpha=0.5)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color='red', linestyle='--')
plt.xlabel('Actual Taxi Pickups')
plt.ylabel('Predicted Taxi Pickups')
plt.title('Predicted vs Actual Taxi Pickups in NYC (2023-07-01)')
plt.xlim(0, y_test.max() * 1.1)
plt.ylim(0, y_test.max() * 1.1)
plt.show()
plot_predicted_pickups(gdf_m, y_pred, y_test)
10.4 Convolutional Neural Networks (CNNs)#
Convolutional Neural Networks (CNNs) are a type of deep learning model specially designed to process data that has a grid-like topology — most commonly images (2D grids of pixels) but also 1D sequences or 3D volumes. CNN layers extract spatial features from each snapshot (image/frame/grid at a particular time).
MaxPooling is a downsampling operation commonly used in CNNs to reduce the spatial dimensions (height and width) of the feature maps. MaxPooling provides spatial invariance and helps the model recognize features regardless of small shifts or distortions. It can help to prevent overfitting by making the representations more compact.
How does it work?
- Slide a small window (called the pool size, e.g. 2x2) over the input feature map.
- For each window position, MaxPooling outputs the maximum value within that window.
- This effectively keeps the most important feature (the strongest activation) in that region.
Suppose you have a 4x4 feature map:
matrix = [
[1, 3, 2, 4],
[5, 6, 1, 2],
[3, 2, 8, 7],
[4, 1, 2, 6]
]
Applying MaxPooling with a 2x2 window and stride 2 gives:
- Max of top-left 2x2 window: max(1, 3, 5, 6) = 6
- Max of top-right 2x2 window: max(2, 4, 1, 2) = 4
- Max of bottom-left 2x2 window: max(3, 2, 4, 1) = 4
- Max of bottom-right 2x2 window: max(8, 7, 2, 6) = 8

Output after pooling:
[[6, 4],
 [4, 8]]
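We can reproduce this result with Keras' MaxPooling2D layer; the short sketch below (an illustrative addition) applies a 2x2 pool with stride 2 to the same 4x4 feature map:

import numpy as np
import tensorflow as tf

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [3, 2, 8, 7],
                        [4, 1, 2, 6]], dtype=np.float32)

# Keras pooling layers expect shape (batch, height, width, channels)
x = feature_map.reshape(1, 4, 4, 1)
pooled = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2)(x)
print(pooled.numpy().reshape(2, 2))  # [[6. 4.] [4. 8.]]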
There are two common approaches to modelling temporal dependencies:
- 3D CNNs: Extend convolution into the time dimension with 3D kernels that slide over space and time simultaneously. This captures spatio-temporal features in one step.
- CNN + RNN/LSTM: Use CNN layers to extract spatial features at each time step, then feed the sequence of features into an RNN/LSTM/GRU to learn temporal dependencies (see the sketch after this list).
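The lab below follows the first (3D CNN) approach. As a rough sketch of the second approach (the layer sizes are arbitrary and the model name cnn_lstm_sketch is an illustrative placeholder), a CNN + LSTM model for samples of shape (30, 66, 1) could look like this:

from keras.models import Sequential
from keras.layers import Input, TimeDistributed, Conv1D, GlobalAveragePooling1D, LSTM, Dense

cnn_lstm_sketch = Sequential([
    # Each sample: 30 time steps, each a 66-zone "snapshot" with 1 feature
    Input(shape=(30, 66, 1)),
    # Conv1D is applied independently to each daily snapshot to extract spatial features
    TimeDistributed(Conv1D(filters=16, kernel_size=3, padding='same', activation='relu')),
    TimeDistributed(GlobalAveragePooling1D()),  # one feature vector per time step
    LSTM(32),                                   # learn temporal dependencies across the 30 days
    Dense(66)                                   # next-day pickups for all 66 zones
])
cnn_lstm_sketch.compile(optimizer='adam', loss='mse')
cnn_lstm_sketch.summary()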
For example, we can use a 3D CNN to predict the next day's taxi pickups in NYC using the previous 30 days' data for each taxi zone. Each sample is a 3D array with shape (30, 66, 1), where 30 is the number of previous days, 66 is the number of taxi zones, and 1 is the number of features (the pickup count). The output is the next day's pickup amounts for all taxi zones.
df_train
| | PULocationID | tpep_pickup_date | spatial_lag | pickup_count |
|---|---|---|---|---|
| 0 | 4 | 2023-01-01 | 1074.75 | 174 |
| 1 | 4 | 2023-01-02 | 432.50 | 32 |
| 2 | 4 | 2023-01-03 | 438.75 | 51 |
| 3 | 4 | 2023-01-04 | 533.75 | 43 |
| 4 | 4 | 2023-01-05 | 624.75 | 42 |
| ... | ... | ... | ... | ... |
| 11941 | 263 | 2023-06-26 | 2100.80 | 1639 |
| 11942 | 263 | 2023-06-27 | 2354.80 | 1893 |
| 11943 | 263 | 2023-06-28 | 2198.60 | 1843 |
| 11944 | 263 | 2023-06-29 | 2147.80 | 1876 |
| 11945 | 263 | 2023-06-30 | 1912.00 | 1849 |
11946 rows × 4 columns
# Prepare the data for 3D CNN
window_size = 30 # number of previous days
num_zones = len(df_train.PULocationID.unique()) # number of taxi zones
number_of_days = df_train.tpep_pickup_date.nunique() # total number of days in the training set
# Create a 4D array with shape (number_of_days - window_size, window_size, num_zones, 1) = (151, 30, 66, 1)
X_train_cnn = np.zeros((number_of_days - window_size, window_size, num_zones, 1))
y_train_cnn = np.zeros((number_of_days - window_size, num_zones))
print("X_train_cnn shape:", X_train_cnn.shape)
print("y_train_cnn shape:", y_train_cnn.shape)
X_train_cnn shape: (151, 30, 66, 1)
y_train_cnn shape: (151, 66)
pickup_matrix = df_train.pivot_table(index='tpep_pickup_date',
columns='PULocationID',
values='pickup_count',
fill_value=0)
# Ensure column order is consistent
pickup_matrix = pickup_matrix.sort_index(axis=1)
# Convert to numpy array
pickup_array = pickup_matrix.to_numpy() # shape: (181, 66)
# Prepare training samples
window_size = 30
number_of_days = pickup_array.shape[0]
num_zones = pickup_array.shape[1]
X_train_cnn = np.zeros((number_of_days - window_size, window_size, num_zones, 1))
y_train_cnn = np.zeros((number_of_days - window_size, num_zones))
for i in range(number_of_days - window_size):
    X_train_cnn[i] = pickup_array[i:i+window_size].reshape(window_size, num_zones, 1)
    y_train_cnn[i] = pickup_array[i + window_size] # the target day
print("X_train_cnn shape:", X_train_cnn.shape) # (151, 30, 66, 1)
print("y_train_cnn shape:", y_train_cnn.shape) # (151, 66)
X_train_cnn shape: (151, 30, 66, 1)
y_train_cnn shape: (151, 66)
from keras.layers import Conv3D, MaxPooling3D, GlobalAveragePooling3D
X_train_cnn_reshaped = X_train_cnn.reshape(X_train_cnn.shape[0], X_train_cnn.shape[1], num_zones, 1, 1)
# Build model
cnn_model = Sequential([
Input(shape=(window_size, num_zones, 1, 1)), # (30, 66, 1, 1)
Conv3D(filters=16, kernel_size=(3, 3, 1), activation='relu', padding='same'),
MaxPooling3D(pool_size=(2, 2, 1)),
Conv3D(filters=32, kernel_size=(3, 3, 1), activation='relu', padding='same'),
GlobalAveragePooling3D(),
Dense(128, activation='relu'),
Dense(num_zones) # Predict pickups for all 66 zones
])
cnn_model.summary()
Model: "sequential_5"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ conv3d (Conv3D) │ (None, 30, 66, 1, 16) │ 160 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling3d (MaxPooling3D) │ (None, 15, 33, 1, 16) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv3d_1 (Conv3D) │ (None, 15, 33, 1, 32) │ 4,640 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ global_average_pooling3d │ (None, 32) │ 0 │ │ (GlobalAveragePooling3D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_7 (Dense) │ (None, 128) │ 4,224 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_8 (Dense) │ (None, 66) │ 8,514 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 17,538 (68.51 KB)
Trainable params: 17,538 (68.51 KB)
Non-trainable params: 0 (0.00 B)
cnn_model.compile(optimizer='adam', loss='mse', metrics=['mae'])
# Train the model
history = cnn_model.fit(
X_train_cnn_reshaped,
y_train_cnn,
epochs=20,
batch_size=16,
validation_split=0.1,
verbose=2
)
Epoch 1/20
9/9 - 0s - 52ms/step - loss: 4005698.2500 - mae: 1419.2722 - val_loss: 3158416.7500 - val_mae: 1253.2993
Epoch 2/20
9/9 - 0s - 12ms/step - loss: 2941589.0000 - mae: 1171.5555 - val_loss: 1661857.6250 - val_mae: 853.8498
Epoch 3/20
9/9 - 0s - 12ms/step - loss: 1327518.3750 - mae: 742.8868 - val_loss: 818064.5000 - val_mae: 644.8470
Epoch 4/20
9/9 - 0s - 12ms/step - loss: 569852.9375 - mae: 504.6197 - val_loss: 238549.3750 - val_mae: 318.3143
Epoch 5/20
9/9 - 0s - 12ms/step - loss: 247675.0000 - mae: 320.0619 - val_loss: 209786.3750 - val_mae: 285.2130
Epoch 6/20
9/9 - 0s - 12ms/step - loss: 213923.9531 - mae: 284.7927 - val_loss: 193069.6406 - val_mae: 264.0422
Epoch 7/20
9/9 - 0s - 12ms/step - loss: 186818.7344 - mae: 260.3803 - val_loss: 175382.7656 - val_mae: 243.6079
Epoch 8/20
9/9 - 0s - 12ms/step - loss: 180980.9219 - mae: 247.4114 - val_loss: 173867.1875 - val_mae: 239.9466
Epoch 9/20
9/9 - 0s - 12ms/step - loss: 177181.1094 - mae: 241.3169 - val_loss: 164904.8125 - val_mae: 229.6416
Epoch 10/20
9/9 - 0s - 12ms/step - loss: 178125.9219 - mae: 239.5164 - val_loss: 181181.5000 - val_mae: 237.5917
Epoch 11/20
9/9 - 0s - 12ms/step - loss: 176187.4062 - mae: 236.1614 - val_loss: 166188.3750 - val_mae: 228.2373
Epoch 12/20
9/9 - 0s - 12ms/step - loss: 175732.1406 - mae: 239.2828 - val_loss: 172761.2188 - val_mae: 234.6099
Epoch 13/20
9/9 - 0s - 12ms/step - loss: 176700.3750 - mae: 237.9642 - val_loss: 162580.7500 - val_mae: 227.5871
Epoch 14/20
9/9 - 0s - 12ms/step - loss: 176748.8438 - mae: 238.7577 - val_loss: 175204.0625 - val_mae: 234.5365
Epoch 15/20
9/9 - 0s - 12ms/step - loss: 180928.7344 - mae: 241.7198 - val_loss: 170355.4688 - val_mae: 230.6339
Epoch 16/20
9/9 - 0s - 12ms/step - loss: 180147.2656 - mae: 240.4812 - val_loss: 166596.1250 - val_mae: 229.3816
Epoch 17/20
9/9 - 0s - 12ms/step - loss: 175776.5625 - mae: 237.9384 - val_loss: 171032.1875 - val_mae: 232.2406
Epoch 18/20
9/9 - 0s - 12ms/step - loss: 177030.3750 - mae: 238.5741 - val_loss: 175354.7500 - val_mae: 234.5650
Epoch 19/20
9/9 - 0s - 12ms/step - loss: 176386.8750 - mae: 237.7845 - val_loss: 168644.7500 - val_mae: 231.0665
Epoch 20/20
9/9 - 0s - 12ms/step - loss: 176184.0469 - mae: 236.9404 - val_loss: 167511.6250 - val_mae: 228.7982
# Predict the next day's taxi pickups from the last training window
y_pred_cnn = cnn_model.predict(X_train_cnn_reshaped[-1].reshape(1, window_size, num_zones, 1, 1))
print("Next predicted value:", y_pred_cnn)
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step
Next predicted value: [[ 1.2289052e+02 6.1680840e+01 6.3661102e+02 2.7826425e+02
2.7335095e+02 1.1956342e+02 1.7633020e+03 1.4629707e+02
2.9490791e+03 7.0366827e+02 2.7129609e+03 2.5989877e+02
6.5927063e+02 2.6256130e+03 5.7412750e+02 2.9784439e+02
1.7744995e+03 1.7953296e+03 2.3479021e+03 1.4918673e+03
1.4113519e+03 8.6778183e+01 1.3534446e+00 5.4715491e+02
5.2108417e+00 3.0071418e+00 1.2146526e+03 2.2280225e+03
2.6377021e+03 3.6848096e+03 1.2298606e+03 9.1107025e+02
1.1157504e+03 8.8898669e+02 5.5042496e+01 -1.1286393e+01
9.3174384e+02 5.1037202e+03 3.8835762e+03 3.2025955e+03
2.5167134e+03 5.3388544e+02 3.2498188e+03 3.6241040e+03
4.6168356e+00 -1.6419781e+00 1.8017174e+02 7.8961487e+02
1.6806894e+02 2.1216775e+03 3.6558472e+03 1.5101125e+03
1.2247576e+02 1.4124349e+03 2.9973530e+03 4.7274785e+03
5.2327539e+03 2.1775527e+03 3.0882129e+03 1.9887726e+01
7.0778816e+01 1.7538796e+03 2.3791829e+03 5.1788074e+02
1.5132891e+03 2.1286401e+03]]
# calculate the RMSE for the predicted day
# Note: this compares the prediction with y_train_cnn[-1] (the last day of the training
# period), i.e. an in-sample check rather than the held-out 2023-07-01 test set
from sklearn.metrics import mean_squared_error
y_test_cnn = y_test_nyc # actual 2023-07-01 pickups (not used in the RMSE below)
rmse = np.sqrt(mean_squared_error(y_train_cnn[-1], y_pred_cnn[0]))
print(f'RMSE in testing set: {rmse:.2f}')
RMSE in testing set: 374.77
We have not used any explicit spatial features in the previous CNN training, so we can add the coordinates of the taxi zone centroids as spatial features.
x_list = gdf_m.geometry.centroid.to_crs(epsg=4326).x.values
y_list = gdf_m.geometry.centroid.to_crs(epsg=4326).y.values
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
x_list = scaler.fit_transform(x_list.reshape(-1, 1))
y_list = scaler.fit_transform(y_list.reshape(-1, 1))
x_list = x_list.reshape(-1)
y_list = y_list.reshape(-1)
coords_array = np.array( [(x_list[i], y_list[i]) for i in range(0,66)] )
coords_array.shape
(66, 2)
X_train_cnn.shape[0]
151
# Repeat coords across samples and time steps
coords_expanded = np.tile(coords_array, (151, window_size, 1)) # shape: (151, 30*66, 2)
coords_expanded = coords_expanded.reshape(151, window_size, num_zones, 2) # (151, 30, 66, 2)
# X_train_cnn shape: (151, 30, 66, 1)
# coords_expanded shape: (151, 30, 66, 2)
X_train_with_coords = np.concatenate([X_train_cnn, coords_expanded], axis=-1) # (151, 30, 66, 3)
# Reshape for 3D CNN input
X_input = X_train_with_coords.reshape(151, window_size, num_zones, 1, 3) # (151, 30, 66, 1, 3)
print("X_input shape:", X_input.shape) # (151, 30, 66, 1, 3)
X_input shape: (151, 30, 66, 1, 3)
cnn_s_model = Sequential([
Input(shape=(window_size, num_zones, 1, 3)),
Conv3D(32, kernel_size=(3, 3, 1), padding='same', activation='relu'),
MaxPooling3D(pool_size=(2, 2, 1)),
Dropout(0.1),
Conv3D(64, kernel_size=(3, 3, 1), padding='same', activation='relu'),
MaxPooling3D(pool_size=(2, 2, 1)),
Dropout(0.1),
GlobalAveragePooling3D(),
Dense(128, activation='relu'),
Dropout(0.2),
Dense(num_zones)
])
cnn_s_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss='mse', metrics=['mae'])
history = cnn_s_model.fit(
X_input, y_train_cnn,
epochs=50,
batch_size=8,
validation_split=0.2,
verbose=2
)
Epoch 1/50
15/15 - 1s - 49ms/step - loss: 3100141.5000 - mae: 1227.2567 - val_loss: 1365253.3750 - val_mae: 788.6161
Epoch 2/50
15/15 - 0s - 17ms/step - loss: 1072512.3750 - mae: 769.4460 - val_loss: 343434.8125 - val_mae: 366.7768
Epoch 3/50
15/15 - 0s - 17ms/step - loss: 557386.1875 - mae: 569.6285 - val_loss: 286140.5625 - val_mae: 336.2144
Epoch 4/50
15/15 - 0s - 17ms/step - loss: 468416.3750 - mae: 513.3658 - val_loss: 211484.0469 - val_mae: 276.1114
Epoch 5/50
15/15 - 0s - 17ms/step - loss: 400665.7188 - mae: 472.5962 - val_loss: 213182.6562 - val_mae: 268.4372
Epoch 6/50
15/15 - 0s - 17ms/step - loss: 369956.8438 - mae: 444.0385 - val_loss: 202529.3906 - val_mae: 262.8458
Epoch 7/50
15/15 - 0s - 17ms/step - loss: 347293.1875 - mae: 430.7402 - val_loss: 202589.1250 - val_mae: 268.3718
Epoch 8/50
15/15 - 0s - 17ms/step - loss: 357184.4062 - mae: 435.6439 - val_loss: 166544.1719 - val_mae: 237.3317
Epoch 9/50
15/15 - 0s - 17ms/step - loss: 322464.3125 - mae: 413.0345 - val_loss: 175893.2031 - val_mae: 238.9258
Epoch 10/50
15/15 - 0s - 17ms/step - loss: 318500.1250 - mae: 401.8153 - val_loss: 168738.3594 - val_mae: 232.0199
Epoch 11/50
15/15 - 0s - 17ms/step - loss: 294316.6250 - mae: 387.4250 - val_loss: 167198.8125 - val_mae: 234.9290
Epoch 12/50
15/15 - 0s - 17ms/step - loss: 287068.2188 - mae: 380.9982 - val_loss: 170396.7812 - val_mae: 237.7505
Epoch 13/50
15/15 - 0s - 17ms/step - loss: 285297.1875 - mae: 378.2478 - val_loss: 209391.8125 - val_mae: 266.9217
Epoch 14/50
15/15 - 0s - 17ms/step - loss: 286028.1875 - mae: 371.5200 - val_loss: 162688.7031 - val_mae: 226.0919
Epoch 15/50
15/15 - 0s - 17ms/step - loss: 291980.2188 - mae: 375.9531 - val_loss: 178756.9219 - val_mae: 245.5820
Epoch 16/50
15/15 - 0s - 17ms/step - loss: 278687.1250 - mae: 365.1249 - val_loss: 167631.9219 - val_mae: 232.3482
Epoch 17/50
15/15 - 0s - 17ms/step - loss: 269566.3750 - mae: 357.4424 - val_loss: 162846.5000 - val_mae: 231.0384
Epoch 18/50
15/15 - 0s - 17ms/step - loss: 261317.0312 - mae: 351.6025 - val_loss: 181331.3750 - val_mae: 243.1291
Epoch 19/50
15/15 - 0s - 17ms/step - loss: 242886.9375 - mae: 337.8704 - val_loss: 186670.8750 - val_mae: 251.3622
Epoch 20/50
15/15 - 0s - 17ms/step - loss: 272439.3125 - mae: 356.2213 - val_loss: 216848.9531 - val_mae: 273.6014
Epoch 21/50
15/15 - 0s - 17ms/step - loss: 267217.5312 - mae: 347.7688 - val_loss: 169811.3750 - val_mae: 235.4288
Epoch 22/50
15/15 - 0s - 17ms/step - loss: 244526.6719 - mae: 334.4143 - val_loss: 179100.9219 - val_mae: 241.6548
Epoch 23/50
15/15 - 0s - 17ms/step - loss: 278718.7188 - mae: 349.2837 - val_loss: 185225.4844 - val_mae: 245.1945
Epoch 24/50
15/15 - 0s - 17ms/step - loss: 254793.1875 - mae: 336.1122 - val_loss: 158373.3281 - val_mae: 224.4458
Epoch 25/50
15/15 - 0s - 17ms/step - loss: 262526.3750 - mae: 344.1134 - val_loss: 208559.8594 - val_mae: 266.1631
Epoch 26/50
15/15 - 0s - 17ms/step - loss: 269179.0625 - mae: 341.8260 - val_loss: 210684.1094 - val_mae: 266.8428
Epoch 27/50
15/15 - 0s - 17ms/step - loss: 253808.4844 - mae: 327.8202 - val_loss: 205884.2656 - val_mae: 265.3175
Epoch 28/50
15/15 - 0s - 17ms/step - loss: 237416.7500 - mae: 322.7746 - val_loss: 182291.5312 - val_mae: 244.8578
Epoch 29/50
15/15 - 0s - 17ms/step - loss: 245527.2812 - mae: 329.7265 - val_loss: 172915.5938 - val_mae: 236.1347
Epoch 30/50
15/15 - 0s - 17ms/step - loss: 208559.3594 - mae: 302.7233 - val_loss: 160275.1875 - val_mae: 225.6327
Epoch 31/50
15/15 - 0s - 17ms/step - loss: 218899.4844 - mae: 309.0979 - val_loss: 151619.2812 - val_mae: 218.2278
Epoch 32/50
15/15 - 0s - 17ms/step - loss: 221982.0938 - mae: 307.1187 - val_loss: 149471.3125 - val_mae: 214.4608
Epoch 33/50
15/15 - 0s - 18ms/step - loss: 228214.5469 - mae: 312.2622 - val_loss: 168615.9844 - val_mae: 232.6189
Epoch 34/50
15/15 - 0s - 17ms/step - loss: 220582.0938 - mae: 304.2811 - val_loss: 159324.8750 - val_mae: 224.0596
Epoch 35/50
15/15 - 0s - 17ms/step - loss: 211354.8438 - mae: 297.2011 - val_loss: 158466.2188 - val_mae: 223.6774
Epoch 36/50
15/15 - 0s - 17ms/step - loss: 222183.5156 - mae: 303.8752 - val_loss: 142687.0625 - val_mae: 208.9231
Epoch 37/50
15/15 - 0s - 17ms/step - loss: 195074.7188 - mae: 285.8051 - val_loss: 141147.5781 - val_mae: 208.5949
Epoch 38/50
15/15 - 0s - 17ms/step - loss: 204858.1406 - mae: 288.2387 - val_loss: 140483.0000 - val_mae: 206.7271
Epoch 39/50
15/15 - 0s - 17ms/step - loss: 224880.2031 - mae: 302.5044 - val_loss: 140867.1719 - val_mae: 210.0112
Epoch 40/50
15/15 - 0s - 17ms/step - loss: 237459.2656 - mae: 304.7390 - val_loss: 144040.4844 - val_mae: 209.4215
Epoch 41/50
15/15 - 0s - 17ms/step - loss: 198229.6406 - mae: 283.6451 - val_loss: 138064.6094 - val_mae: 205.3014
Epoch 42/50
15/15 - 0s - 17ms/step - loss: 193824.2656 - mae: 280.8102 - val_loss: 135673.0000 - val_mae: 205.0084
Epoch 43/50
15/15 - 0s - 17ms/step - loss: 193151.5000 - mae: 283.6617 - val_loss: 135744.4531 - val_mae: 204.6235
Epoch 44/50
15/15 - 0s - 17ms/step - loss: 199524.4062 - mae: 281.3769 - val_loss: 132451.3594 - val_mae: 203.0702
Epoch 45/50
15/15 - 0s - 17ms/step - loss: 178199.7500 - mae: 267.2109 - val_loss: 154402.4688 - val_mae: 221.4493
Epoch 46/50
15/15 - 0s - 17ms/step - loss: 194038.9531 - mae: 283.3656 - val_loss: 133025.8750 - val_mae: 203.3244
Epoch 47/50
15/15 - 0s - 17ms/step - loss: 193496.9219 - mae: 277.1366 - val_loss: 128109.1016 - val_mae: 197.9702
Epoch 48/50
15/15 - 0s - 17ms/step - loss: 187112.0156 - mae: 271.6195 - val_loss: 136351.9688 - val_mae: 205.1196
Epoch 49/50
15/15 - 0s - 18ms/step - loss: 177534.7812 - mae: 264.9625 - val_loss: 124026.2891 - val_mae: 196.9332
Epoch 50/50
15/15 - 0s - 18ms/step - loss: 185698.6562 - mae: 272.3214 - val_loss: 142433.6875 - val_mae: 211.1327
# Predict the next day's taxi pickups from the last training window
y_pred_cnn = cnn_s_model.predict(X_input[-1].reshape(1, window_size, num_zones, 1, 3))
print("Next predicted value:", y_pred_cnn)
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step
Next predicted value: [[ 1.0612791e+02 3.8810486e+01 5.8801166e+02 2.9606802e+02
2.4752686e+02 9.2207550e+01 1.6928286e+03 1.2952293e+02
2.7664260e+03 6.7426154e+02 2.5758838e+03 2.1737177e+02
6.0807220e+02 2.4902241e+03 5.5077045e+02 2.3692393e+02
1.6783209e+03 1.7066562e+03 2.2098545e+03 1.3948647e+03
1.3424951e+03 5.2930607e+01 2.8707078e+00 4.9390933e+02
-1.0534749e+01 5.6179547e+00 1.1687524e+03 2.0666016e+03
2.5180186e+03 3.4603611e+03 1.1406893e+03 8.6974042e+02
1.0477395e+03 8.3887866e+02 7.9398323e+01 9.6016932e+00
9.0503265e+02 4.7581387e+03 3.6536624e+03 3.0194661e+03
2.4095620e+03 5.0304251e+02 3.0460337e+03 3.4344509e+03
1.2814647e+01 2.0924139e+01 1.5049051e+02 7.7649182e+02
1.4179123e+02 2.0023422e+03 3.4768835e+03 1.3836450e+03
1.4643947e+02 1.3136202e+03 2.8297920e+03 4.3705015e+03
4.8720879e+03 2.0146099e+03 2.9179875e+03 2.5009188e+01
7.2797356e+01 1.6737131e+03 2.2672693e+03 5.0580597e+02
1.3908831e+03 2.0206296e+03]]
# calculate the RMSE for the predicted day
# Note: as above, this compares the prediction with y_train_cnn[-1] (the last day of the
# training period), i.e. an in-sample check rather than the held-out 2023-07-01 test set
from sklearn.metrics import mean_squared_error
y_test_cnn = y_test_nyc # actual 2023-07-01 pickups (not used in the RMSE below)
rmse = np.sqrt(mean_squared_error(y_train_cnn[-1], y_pred_cnn[0]))
print(f'RMSE in testing set: {rmse:.2f}')
RMSE in testing set: 261.34
Quiz#
- Change the window size to build different sets of training features, train RNN models, and predict the taxi pickup amounts.
- Use the different window sizes to build the dataset, use LSTMs to make the predictions, and compare their performance with the RNNs.