Tarjan Algorithm for Finding Strongly Connected Components
Introduction to Strongly Connected Components
A Strongly Connected Component (SCC) of a directed graph is a maximal subgraph where every vertex is reachable from every other vertex. In simpler terms, for any two nodes \(u\) and \(v\) with in the same SCC, there exists a path from \(u\) to \(v\) and a path from \(v\) to \(u\). Tarjan's algorithm is an efficient linear-time approach based on Depth First Search (DFS) to identify all SCCs in a graph.
The Tarjan Algorithm Logic
The algorithm performs a DFS traversal. Since a single DFS might not visit all nodes if the graph is not connected, we iterate through all nodes and initiate a DFS for any node that hasn't been visited yet. Each node \(i\) is assigned two key values:
discovery_time[i]: The timestamp when node \(i\) was first visited.low_link[i]: The smallestdiscovery_timereachable from \(i\) (including itself) through a back-edge or a cross-edge to a node still in the current recursion stack.
Nodes are pushed onto a stack as they are visited. While backtracking from a node \(u\), we update its low_link based on its neighbors. If low_link[u] == discovery_time[u], then \(u\) is the "root" of an SCC. All nodes from the top of the stack down to \(u\) are popped and assigned to a new SCC.
int disc[MAXN], low[MAXN], timer;
int stack_nodes[MAXN], top_idx;
bool in_stack[MAXN];
int scc_id[MAXN], scc_total;
void find_scc(int u) {
disc[u] = low[u] = ++timer;
stack_nodes[++top_idx] = u;
in_stack[u] = true;
for (int v : adjacency_list[u]) {
if (!disc[v]) {
find_scc(v);
low[u] = std::min(low[u], low[v]);
} else if (in_stack[v]) {
low[u] = std::min(low[u], disc[v]);
}
}
if (low[u] == disc[u]) {
scc_total++;
while (true) {
int node = stack_nodes[top_idx--];
in_stack[node] = false;
scc_id[node] = scc_total;
if (node == u) break;
}
}
}
// Global execution
for (int i = 1; i <= node_count; ++i) {
if (!disc[i]) find_scc(i);
}
Correctness and Edge Classification
The algorithm correctly identifies SCCs by analyzing four types of edges in the DFS tree:
- Tree Edges: Edges to unvisited nodes.
- Back Edges: Edges to ancestors in the DFS tree.
- Forward Edges: Edges to descendants already visited.
- Cross Edges: Edges to nodes in different branches.
An SCC is formed when a back edge or certain cross edges connect a node to one of its ancestors. The low_link value effectively propagates the earliest ancestor reachable. When low_link[u] == discovery_time[u], it signifies that no node in the subtree rooted at \(u\) can reach a node visited earlier than \(u\) that is still in the current search path. Thus, \(u\) and its remaining descendants on the stack form a complete SCC.
Graph Condensation
One of the most powerful applications of Tarjan's algorithm is "condensation." By treating each SCC as a single super-node, we transform the original directed graph into a Directed Acyclic Graph (DAG). The SCCs are identified in the reverse of their topological order in this DAG.
Application Examples
1. Maximum Weight Path (Coin Collector)
Given a graph where nodes have weights, find the path that collects the maximum total weight. Since we can traverse an SCC entirely, we condense the graph into a DAG where each super-node's weight is the sum of weights of the nodes in that SCC. We then use dynamic programming on the DAG:
DP[u] = SCC_Weight[u] + max(DP[v]) for all successors \(v\) of SCC \(u\).
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
const int MAXN = 100005;
vector<int> adj[MAXN], dag_adj[MAXN];
long long node_val[MAXN], scc_sum[MAXN], dp[MAXN];
int dfn[MAXN], low[MAXN], scc[MAXN], st[MAXN];
bool active[MAXN];
int timer, top, scc_cnt;
void tarjan(int u) {
dfn[u] = low[u] = ++timer;
st[++top] = u;
active[u] = true;
for (int v : adj[u]) {
if (!dfn[v]) {
tarjan(v);
low[u] = min(low[u], low[v]);
} else if (active[v]) {
low[u] = min(low[u], dfn[v]);
}
}
if (low[u] == dfn[u]) {
scc_cnt++;
while (true) {
int curr = st[top--];
active[curr] = false;
scc[curr] = scc_cnt;
scc_sum[scc_cnt] += node_val[curr];
if (curr == u) break;
}
}
}
int main() {
int n, m;
scanf("%d %d", &n, &m);
for (int i = 1; i <= n; i++) scanf("%lld", &node_val[i]);
for (int i = 0; i < m; i++) {
int u, v;
scanf("%d %d", &u, &v);
adj[u].push_back(v);
}
for (int i = 1; i <= n; i++) if (!dfn[i]) tarjan(i);
for (int u = 1; u <= n; u++) {
for (int v : adj[u]) {
if (scc[u] != scc[v]) dag_adj[scc[u]].push_back(scc[v]);
}
}
long long max_score = 0;
// DP on DAG (processed in reverse order of discovery)
for (int i = 1; i <= scc_cnt; i++) {
long long best_next = 0;
for (int v : dag_adj[i]) best_next = max(best_next, dp[v]);
dp[i] = scc_sum[i] + best_next;
max_score = max(max_score, dp[i]);
}
printf("%lld\n", max_score);
return 0;
}
2. Semi-Connectivity in Graphs
A graph is semi-connected if for every pair of vertices \((u, v)\), there is a path from \(u\) to \(v\) or from \(v\) to \(u\). After condensing the graph into a DAG, the condition for semi-connectivity is that the DAG must contain a Hamiltonian path. This essentially means the DAG must look like a single chain of nodes.
3. Minimum Information Source
In scenarios like identifying a "killer" among people where relationships represent clues, we use SCCs to simplify the problem. Generally, you must investigate at least one person from every SCC that has an in-degree of 0 in the condensed DAG. However, using deduction (exclusion principal), if there is an SCC of size 1 with in-degree 0, and all its neighbors can be reached from other in-degree 0 SCCs, we might avoid investigating that specific person.