pos = nx.spring_layout(sub_G, k=0.2) # Calculate once for consistent visualization
# Create figure
plt.figure(figsize=(6, 4))
for i, iteration in enumerate(iterations[:10]):
plt.clf() # Clear previous frame
# Get current node states
statuses = iteration['status']
infected = [n for n, s in statuses.items() if s == 1]
removed = [n for n, s in statuses.items() if s == 2]
# Create cmap
node_colors = []
for node in sub_G.nodes():
if node in infected:
node_colors.append('orange') # Currently infected
elif node in removed:
node_colors.append('green') # Removed/recovered
else:
node_colors.append('skyblue') # Susceptible
# Plot graph
nx.draw(sub_G, pos, node_color=node_colors, with_labels=False,
node_size=50, edge_color='grey', width=0.1, alpha=0.5)
from matplotlib.lines import Line2D
legend_elements = [
Line2D([0], [0], marker='o', color='w', label=f'Susceptible: {len(susceptible)}',
markerfacecolor='skyblue', markersize=8, linestyle='None'),
Line2D([0], [0], marker='o', color='w', label=f'Infected: {len(infected)}',
markerfacecolor='orange', markersize=8, linestyle='None'),
Line2D([0], [0], marker='o', color='w', label=f'Removed: {len(removed)}',
markerfacecolor='green', markersize=8, linestyle='None')
]
plt.legend(handles=legend_elements, loc='upper right',
title=f"Iteration {i+1}", frameon=False)
plt.title(f"SIR Model - Timestep/iteration {i+1}")
plt.axis('off')
plt.pause(1) # 1 second pause between frames
plt.show()
5 Social network analysis#
5.1 Basic concepts of network#
A network is frequently used interchangeably with a graph, but it typically highlights real-world applications and is commonly associated with social relationships (social networks), and built environments (road networks).
A graph (\(G\)) is a mathematical structure used to model pairwise relations between objects. It consists of a set of vertices (nodes) and a set of edges (links) that connect pairs of vertices.
Graph Type
Graph Type
Description
NetworkX Class
Undirected Graph
A graph where edges have no direction.
Graph
Directed Graph
A graph where edges have a direction, indicated by an arrow.
DiGraph
Multi-(undirected) Graph
An undirected graph with parallel edges.
MultiGraph
Multi-directed Graph
A directed graph with parallel edges.
MultiDiGraph
Note: All types of graphs can have self-loops.
5.2 Measurements for nodes and edges#
5.2.1 Degree centrality of nodes#
The degree centrality values are normalized by dividing by the maximum possible degree in a simple graph n-1 where n is the number of nodes in \(G\).
5.2.2 Closeness centrality of nodes#
Closeness centrality of a node \(u\) is the reciprocal of the average shortest path distance to \(u\) over all \(n-1\) reachable nodes.
\(G =(U,V)\).
\(C(v) = \frac{n-1}{\sum_{v =1}^{n-1} d(v, u)}\)
where \(d(v, u)\) is the shortest-path distance between v and u, and n-1 is the number of nodes reachable from \(u\).
Shortest path and distance
1–>2 : 1,3,2 (5+22 = 27)
1–>3: 1,3 (5)
1–>4: 1,3,2,4 (5+22+2=29)
2–>3: 2,3 (22)
2–>4: 2,4 (2)
3–>4: 3,2,4 (22+2=24)
5.2.3 Betweenness centrality of nodes#
Betweenness centrality of a node \(v\) is the sum of the fraction of all-pairs shortest paths that pass through \(v\).
\(C_B(v) = \sum_{s,t \in V} \frac{\sigma({s,t |v })}{\sigma({s, t})}\)
where \(V\) is the set of nodes,
\(\sigma({s, t})\) is the number of shortest paths,
\(\sigma({s,t |v })\) is the number of those paths passing through some node \(v\) other than \({s, t}\).
\(\text{Normalization } C_B \text{ of nodes} = \frac{C_B \text{ of nodes}}{\text{Normalization Factor}}\)
\(\text{Normalization Factor (NF) for undirected G} = \frac{(n-1) \cdot (n-2)}{2}\) ,
\(n\) is the nodes numbers.
5.2.4 Betweenness centrality of edges#
Betweenness centrality of an edge \(e\) is the sum of the fraction of all-pairs shortest paths that pass through \(e\).
\(C_B(e) = \sum_{s,t \in V} \frac{\sigma({s,t |e})}{\sigma({s, t})}\)
where \(V\) is the set of nodes,
\(\sigma({s, t})\) is the number of shortest paths \({s, t}\),
\(\sigma({s,t |e })\) is the number of those paths passing through edge \(e\).
\(\text{Normalization } C_B \text{ of edges} = \frac{C_B \text{ of edges}}{\text{Normalization Factor}}\)
\(\text{Normalization Factor (NF) for undirected G} = \frac{n(n-1)}{2}\)
\(n\) is the nodes numbers.
Note: If we create a DiGraph, are measurements still the same?
Gd = nx.DiGraph()
Gd.add_edge(1, 2, weight=30)
Gd.add_edge(1, 3, weight=5)
Gd.add_edge(2, 3, weight=22)
Gd.add_edge(2, 4, weight=2)
Gd.add_edge(3, 4, weight=37)
5.3 Community detection#
Community detection is the process of identifying groups of nodes in a network that are more densely connected to each other than to the rest of the network. These groups are called communities or clusters.
In this section, we will use the real word dataset to illustrate the Link prediction in SNA. Download link
5.3.1 Visualization of graph#
The visualization of social network by the node degrees.
The visualization of social network by the edge betweenness centrality.
5.3.2 Girvan-Newman algorithm#
Girvan-Newman algorithm is a method for detecting communities in a graph by progressively removing edges with the highest betweenness centrality. The algorithm works as follows:
Compute the betweenness centrality of all edges in the graph.
Identify the edge with the highest betweenness centrality.
Remove that edge from the graph.
Check if the graph is disconnected (i.e., it has more than one connected component).
If the graph is disconnected, the connected components are considered communities.
Repeat steps 1-5 until the desired number of communities is detected or no edges remain.
Return the detected communities.
5.3.3 Fluid Communities algorithm#
Fluid Communities algorithm is a method for detecting communities in a connected graph based on the concept of fluid dynamics. It works by simulating the flow of “fluid” through the network, where nodes represent containers and edges represent pipes. The algorithm identifies communities by analyzing how the fluid flows and accumulates in different regions of the network. The algorithm is particularly effective for detecting overlapping communities, where nodes can belong to multiple communities simultaneously.
5.3.4 Label Propagation algorithm#
Label Propagation algorithm is a method for detecting communities in a graph based on the idea of propagating labels through the network. The algorithm works as follows:
Initialize each node with a unique label.
For each node, update its label to the most frequent label among its neighbors.
Repeat step 2 until convergence (i.e., no labels change).
Return the final labels as communities.
The algorithm is efficient and can handle large networks, making it suitable for real-world applications.
5.3.5 Clique Percolation algorithm#
Clique Percolation algorithm is a method for detecting communities in a graph based on the concept of cliques. A clique is a subset of nodes in which every node is connected to every other node. The algorithm works as follows:
Identify all cliques of a specified size (k) in the graph.
Create a graph where each clique is a node.
Connect cliques that share at least one node.
Identify connected components in the new graph as communities.
Return the detected communities.
Exercise: Define a new function to draw the graph visualisation
5.3.6 Kernighan-Lin algorithm#
Kernighan-Lin algorithm is a method for detecting communities in a graph based on the concept of graph partitioning. The algorithm works as follows:
Initialize two partitions of the graph.
Compute the gain for each node in the partitions.
Swap nodes between the partitions to maximize the gain.
Repeat steps 2-3 until no further improvement can be made.
Return the final partitions as communities.
5.4 Link Prediction#
Link prediction aims to predict missing or future connections between nodes in a graph.
The idea behind link prediction is that nodes with similar properties or shared connections are more likely to form a link in the future.
It is widely used in social networks, biological networks, recommendation systems, and fraud detection.
Generate all non-edges: A non-edge is a pair of nodes that are not connected.
5.4.1 Jaccard Coefficient#
The Jaccard Coefficient measures the similarity between two nodes based on their common neighbors. The Jaccard Coefficient \(J(u, v)\) between two nodes \(u\) and \(v\) is defined as:
\(J(u, v) = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|}\)
Where:
\( N(u) \) is the set of neighbors of node \( u \),
\( N(v) \) is the set of neighbors of node \( v \),
\( |N(u) \cap N(v)| \) is the size of the intersection of the neighbor sets of \( u \) and \( v \),
\( |N(u) \cup N(v)| \) is the size of the union of the neighbor sets of \( u \) and \( v \).
Example: J(A, D)
\(N(A) = \set{B}\)
\(N(D) = \set{B, E, F}\)
\(|N(A) \cap N(D)|\) = 1
\(|N(A) \cup N(D)|\) = 3
\(J(A, D) = 1/3\)
Calculating JC using nx
Link prediction on social network
Visualise the top 15 predicted links by Jaccard Coefficient, you can use other solution to plot all edges by Jaccard Coefficient.
5.4.2 Common neighbours#
In link prediction, common neighbors is a simple and effective method used to predict the likelihood of an edge (link) forming between two nodes in a network. It is based on the idea that two nodes are more likely to be connected if they share many common neighbors.
5.5 Influence propagation#
Influence Propagation is a model used to simulate how information, behaviors, or influence spread through a network over time. It focuses on diffusion processes in networks and is often used in the context of social influence, viral marketing, or epidemic spread.
5.5.1 Independent Cascade Model (IC Model)#
NDlib is a Python library designed to describe, simulate and study diffusion processes on complex networks.
# ! pip install ndlib
5.5.2 Susceptible-Infected-Removed (SIR) model#
The SIR model used in epidemiology to describe how infectious diseases spread through a population.
It divides the population into three parts:
Susceptible (S): Individuals who are not yet infected but can contract the disease.
Infected (I): Individuals who are currently infected and can spread the disease to susceptible individuals.
Removed (R): Individuals who have either recovered (and gained immunity) or died, meaning they no longer participate in disease transmission.
Key Parameters:
\(\beta\) (beta): Infection probability
\(\gamma\) (gamma): Recovery probability