Foundations of Graph Data Structures and Theory
Graphs are non-linear data structures consisting of vertices (nodes) and edges that connect them. Formally denoted as $G = (V, E)$, where $V$ represents the set of vertices and $E$ represents the set of edges, graphs are widely utilized to model relationships in computer networks, logistics, and social systems.
Core Concepts
- Vertex (Node): The fundamental unit of a graph.
- Edge: A connection between two vertices, represented as $(u, v)$. Edges can be undirected (traversable both ways) or directed (traversable only from $u$ to $v$).
- Undirected Graph: Edges lack direction; $(u, v)$ is identical to $(v, u)$.
- Directed Graph (Digraph): Edges possess direction; $(u, v)$ differs from $(v, u)$.
- Simple Graph: A graph containing no self-loops (edges connecting a vertex to itself) or multiple edges (more than one edge between the same pair of vertices).
- Multigraph: A graph that permits self-loops and multiple edges.
- Edge Weight: A numerical value assigned to an edge, typically representing cost, distance, or capacity.
To process undirected graphs in algorithms designed for directed ones, bidirectional traversal can be simulated by creating two opposing directed edges for every undirected edge.
Graph Representation
Computers represent graphs primarily through adjacency matrices or adjacency lists.
Adjacency Matrix
An adjacency matrix utilizes a 2D array of size $V \times V$. A non-infinite value at matrix[i][j] indicates an edge from $i$ to $j$ with the corresponding weight. For a graph with vertices ${0, 1, 2}$ and directed edges $E = {(0, 1), (1, 2), (2, 1)}$:
#include <vector>
const int INF = 1e9;
int num_vertices = 3;
std::vector<std::vector<int>> matrix(num_vertices, std::vector<int>(num_vertices, INF));
void add_edge(int u, int v, int weight) {
matrix[u][v] = weight;
}
// Build edges
add_edge(0, 1, 1);
add_edge(1, 2, 1);
add_edge(2, 1, 1);
Adjacency List
An adjacency list uses an array of dynamic arrays, where each index maps to a vertex and stores its adjacent neighbors. This approach is significantly more memory-efficient for sparse graphs, avoiding the $O(V^2)$ space complexity of matrices.
#include <vector>
int num_vertices = 4;
std::vector<std::vector<int>> adj(num_vertices);
void add_edge(int u, int v) {
adj[u].push_back(v);
}
// Build edges: E = {(0, 1), (1, 2), (2, 1), (2, 3)}
add_edge(0, 1);
add_edge(1, 2);
add_edge(2, 1);
add_edge(2, 3);
Graph Properties
- Degree: The number of edges incident to a vertex. In digraphs, this splits into In-degree (incoming edges) and Out-degree (outgoing edges). An isolated vertex has a degree of 0.
- Path: A sequence of vertices where each adjacent pair is connected by an edge, with no edges repeated.
- Cycle: A path that starts and ends at the same vertex.
- Connected Graph: An undirected graph where a path exists between every pair of vertices.
- Strong Connected Graph: A directed graph where a directed path exists between every pair of vertices.
Graph Traversal
Depth-First Search (DFS) and Breadth-First Search (BFS) are the standard mechanisms for exploring graphs.
DFS with Adjacency Matrix
#include <vector>
std::vector<bool> visited;
std::vector<std::vector<int>> mat;
int sz;
void traverseDFS(int curr) {
if (visited[curr]) return;
visited[curr] = true;
for (int nxt = 0; nxt < sz; ++nxt) {
if (mat[curr][nxt] != INF) {
traverseDFS(nxt);
}
}
}
DFS with Adjacency List
#include <vector>
std::vector<bool> visited;
std::vector<std::vector<int>> graph;
void exploreDFS(int curr) {
if (visited[curr]) return;
visited[curr] = true;
for (int nxt : graph[curr]) {
exploreDFS(nxt);
}
}
BFS with Adjacency List
#include <vector>
#include <queue>
std::vector<bool> seen;
std::vector<std::vector<int>> graph;
void exploreBFS(int start) {
std::queue<int> q;
q.push(start);
seen[start] = true;
while (!q.empty()) {
int curr = q.front();
q.pop();
for (int nxt : graph[curr]) {
if (!seen[nxt]) {
seen[nxt] = true;
q.push(nxt);
}
}
}
}
To determine if an undirected graph is connected, execute a DFS or BFS from an arbitrary vertex and verify if all vertices are marked as visited.
Practical Applications
Maximum Reachable Vertex
To find the highest-numbered vertex reachable from each node, reversing the graph edges and processing vertices in descending order optimizes the process. By traversing from the largest vertex down, the first time a node is reached dictates its maximum reachable component.
#include <iostream>
#include <vector>
using namespace std;
int main() {
int n, m;
if (!(cin >> n >> m)) return 0;
vector<vector<int>> rev_graph(n + 1);
for (int i = 0; i < m; ++i) {
int u, v;
cin >> u >> v;
rev_graph[v].push_back(u); // Reverse the edge
}
vector<int> max_reachable(n + 1, 0);
auto dfs = [&](auto& self, int u, int root) -> void {
if (max_reachable[u]) return;
max_reachable[u] = root;
for (int v : rev_graph[u]) {
self(self, v, root);
}
};
for (int i = n; i >= 1; --i) {
dfs(dfs, i, i);
}
for (int i = 1; i <= n; ++i) {
cout << max_reachable[i] << (i == n ? "" : " ");
}
return 0;
}
Ordered Graph Traversal
When a specific tarversal order is required (e.g., visiting neighbors in ascending order), sorting the adjacency lists prior to traversal guarantees deterministic output.
#include <iostream>
#include <vector>
#include <queue>
#include <algorithm>
using namespace std;
vector<vector<int>> adj;
vector<bool> dfs_visited, bfs_visited;
void runDFS(int u) {
dfs_visited[u] = true;
cout << u << " ";
for (int v : adj[u]) {
if (!dfs_visited[v]) runDFS(v);
}
}
void runBFS(int start) {
queue<int> q;
q.push(start);
bfs_visited[start] = true;
while (!q.empty()) {
int u = q.front(); q.pop();
cout << u << " ";
for (int v : adj[u]) {
if (!bfs_visited[v]) {
bfs_visited[v] = true;
q.push(v);
}
}
}
}
int main() {
int n, m;
if (!(cin >> n >> m)) return 0;
adj.assign(n + 1, {});
for (int i = 0; i < m; ++i) {
int u, v;
cin >> u >> v;
adj[u].push_back(v);
}
for (int i = 1; i <= n; ++i) {
sort(adj[i].begin(), adj[i].end());
}
dfs_visited.assign(n + 1, false);
bfs_visited.assign(n + 1, false);
runDFS(1);
cout << "\n";
runBFS(1);
return 0;
}
Common Graph Algorithms
- Depth-First Search (DFS): Graph traversal, cycle detection.
- Breadth-First Search (BFS): Shortest path in unweighted graphs.
- Dijkstra's Algorithm: Single-source shortest path for non-negative weights.
- Bellman-Ford Algorithm: Single-source shortest path with negative edge weights.
- Floyd-Warshall Algorithm: All-pairs shortest paths.
- Kruskal's Algorithm: Minimum Spanning Tree using disjoint sets.
- Prim's Algorithm: Minimum Spanning Tree using a priority queue.
- Topological Sort: Linear ordering of vertices in a Directed Acyclic Graph (DAG).
- Tarjan's Algorithm: Finding strong connected components, articulation points, and bridges.