Mastering Depth-First Search Algorithms and Techniques
Depth-first search (DFS) is characterized by advancing as far as possible along a single branch before retreating. This retreat occurs only upon encountering a dead end, which arises either from hitting a boundary or from reaching a previously visited location. Upon backtracking, the algorithm returns to the most recent intersection with unexplored alternative paths and continues the search. Traversal represents the mechanism, while searching is the objective. Exhaustively enumerating all potential states makes computational search feasible for complex problems.
Traversal vs. Search
Traversal and searching are effectively interchangeable concepts in this context. Traversing all potential states fulfills the goal of searching. Consequently, depth-first traversal and depth-first search refer to the same foundational process.
DFS in Trees
In binary trees, DFS typically begins at the root node, recursively exploring the left and right subtrees. While traversal order can vary, the standard convention prioritizes the left branch over the right (Root, Left, Right). The base case for the recursion is encountering a null node, signifying that all leaves have been processed.
Pre-order, in-order, and post-order traversals are all implementations of DFS, differing solely in when the current node is processed relative to its children. Post-order traversal is uniquely valuable because it ensures that the results of the subtrees are completely known before the node itself is evaluated.
Graphs, unlike trees, contain cycles. Graph traversal thus requires explicitly recording visited nodes to prevent infinite loops. At its core, many algorithmic problems on trees or graphs reduce to performing a traversal while accumulating problem-specific data to compute the final result.
Pre-order Traversal
In pre-order traversal, a node is processed before its subtrees.
Recursive implementation:
c void dfs_pre(struct TreeNode *curr, int *result_arr, int *pos) { if (curr == NULL) { return; } result_arr[(*pos)++] = curr->val; dfs_pre(curr->left, result_arr, pos); dfs_pre(curr->right, result_arr, pos); }
int* preorderTraversal(struct TreeNode *root, int *returnSize) { int *result_arr = (int *)malloc(sizeof(int) * 100); int pos = 0; dfs_pre(root, result_arr, &pos); *returnSize = pos; return result_arr; }
Iterative implementation:
c int* preorderTraversal(struct TreeNode *root, int *returnSize) { int *result_arr = malloc(sizeof(int) * 2000); *returnSize = 0; if (root == NULL) { return result_arr; }
struct TreeNode *stack_arr[2000];
struct TreeNode *curr = root;
int stack_top = 0;
while (stack_top > 0 || curr != NULL) {
while (curr != NULL) {
result_arr[(*returnSize)++] = curr->val;
stack_arr[stack_top++] = curr;
curr = curr->left;
}
curr = stack_arr[--stack_top];
curr = curr->right;
}
return result_arr;
}
In-order Traversal
In in-order traversal, the left subtree is fully processed, followed by the node itself, and finally the right subtree.
Recursive implementation:
c void dfs_in(struct TreeNode *curr, int *result_arr, int *pos) { if (!curr) { return; } dfs_in(curr->left, result_arr, pos); result_arr[(*pos)++] = curr->val; dfs_in(curr->right, result_arr, pos); }
int* inorderTraversal(struct TreeNode *root, int *returnSize) { int *result_arr = malloc(sizeof(int) * 501); *returnSize = 0; dfs_in(root, result_arr, returnSize); return result_arr; }
Iterative implementation:
c int* inorderTraversal(struct TreeNode *root, int *returnSize) { *returnSize = 0; int *result_arr = malloc(sizeof(int) * 501); struct TreeNode *stack_nodes = malloc(sizeof(struct TreeNode) * 501); int stack_top = 0;
while (root != NULL || stack_top > 0) {
while (root != NULL) {
stack_nodes[stack_top++] = root;
root = root->left;
}
root = stack_nodes[--stack_top];
result_arr[(*returnSize)++] = root->val;
root = root->right;
}
return result_arr;
}
Post-order Traversal
In post-order traversal, both subtrees are processed before the node itself.
Recursive implementation:
c void dfs_post(struct TreeNode *curr, int *result_arr, int *pos) { if (curr == NULL) { return; } dfs_post(curr->left, result_arr, pos); dfs_post(curr->right, result_arr, pos); result_arr[(*pos)++] = curr->val; }
int* postorderTraversal(struct TreeNode *root, int *returnSize) { int *result_arr = malloc(sizeof(int) * 2001); *returnSize = 0; dfs_post(root, result_arr, returnSize); return result_arr; }
Iterative implementation:
c int* postorderTraversal(struct TreeNode *root, int *returnSize) { int *result_arr = malloc(sizeof(int) * 2001); *returnSize = 0; if (root == NULL) { return result_arr; } struct TreeNode **stack_nodes = malloc(sizeof(struct TreeNode *) * 2001); int stack_top = 0; struct TreeNode *prev_node = NULL;
while (root != NULL || stack_top > 0) {
while (root != NULL) {
stack_nodes[stack_top++] = root;
root = root->left;
}
root = stack_nodes[--stack_top];
if (root->right == NULL || root->right == prev_node) {
result_arr[(*returnSize)++] = root->val;
prev_node = root;
root = NULL;
} else {
stack_nodes[stack_top++] = root;
root = root->right;
}
}
return result_arr;
}
DFS Implementation Using Stacks
During DFS, adjacent nodes must be temporarily stored so they can be revisited upon backtracking. This sequence exhibits a Last-In-First-Out (LIFO) property, making the stack an ideal data structure. Recursion inherently utilizes the system's call stack. Therefore, DFS can be implemented either via an explicit iterative stack or through recursion. While iterative approaches avoid stack overflow limits, recursion is often preferred for its readability and intuitive state management.
Applications of DFS
Numerous tree and graph problems are solved by executing a DFS and collecting relevant information along the way.
Deriving Tree Properties
The depth (distance from the root) and height (distance to the deepest leaf) of a tree are easily found using DFS. Calculating height using post-order traversal is natural since the heights of the subtrees must be known before determining the parent's height.
Calculating maximum depth:
c int calc_tree_height(struct TreeNode *curr) { int left_h = 0, right_h = 0; if (curr == NULL) { return 0; } left_h = calc_tree_height(curr->left); right_h = calc_tree_height(curr->right); return 1 + fmax(left_h, right_h); }
Path sum validation checks if a route from the root to a leaf exists where the node values equal a target. This requires passing the accumulated sum downwards and evaluating it upon reaching a leaf.
c bool search_leaf(struct TreeNode *curr, int remaining_sum) { if (curr->left == NULL && curr->right == NULL && remaining_sum != 0) { return false; } if (curr->left == NULL && curr->right == NULL && remaining_sum == 0) { return true; } if (curr->left != NULL) { remaining_sum -= curr->left->val; if (search_leaf(curr->left, remaining_sum)) return true; remaining_sum += curr->left->val; } if (curr->right != NULL) { remaining_sum -= curr->right->val; if (search_leaf(curr->right, remaining_sum)) return true; remaining_sum += curr->right->val; } return false; }
bool hasPathSum(struct TreeNode *root, int targetSum) { if (root == NULL) { return false; } return search_leaf(root, targetSum - root->val); }
Topological Sorting
Problems like course scheduling require detecting cycles in directed graphs. DFS using three states (0: unvisited, 1: visiting, 2: visited) can identify a cycle. Encountering a node in state '1' indicates a back edge, confirming a cycle.
c int **adj_list; int *adj_sizes; int *node_states; bool is_possible;
void detect_cycle(int u) { node_states[u] = 1; for (int i = 0; i < adj_sizes[u]; ++i) { if (node_states[adj_list[u][i]] == 0) { detect_cycle(adj_list[u][i]); if (!is_possible) return; } else if (node_states[adj_list[u][i]] == 1) { is_possible = false; return; } } node_states[u] = 2; }
bool canFinish(int numCourses, int **prerequisites, int prerequisitesSize, int *prerequisitesColSize) { is_possible = true; adj_list = (int **)malloc(sizeof(int *) * numCourses); for (int i = 0; i < numCourses; i++) { adj_list[i] = (int *)malloc(0); } adj_sizes = (int *)malloc(sizeof(int) * numCourses); memset(adj_sizes, 0, sizeof(int) * numCourses); node_states = (int *)malloc(sizeof(int) * numCourses); memset(node_states, 0, sizeof(int) * numCourses);
for (int i = 0; i < prerequisitesSize; ++i) {
int dest = prerequisites[i][1], src = prerequisites[i][0];
adj_sizes[dest]++;
adj_list[dest] = (int *)realloc(adj_list[dest], sizeof(int) * adj_sizes[dest]);
adj_list[dest][adj_sizes[dest] - 1] = src;
}
for (int i = 0; i < numCourses && is_possible; ++i) {
if (!node_states[i]) {
detect_cycle(i);
}
}
for (int i = 0; i < numCourses; i++) {
free(adj_list[i]);
}
free(adj_list);
free(adj_sizes);
free(node_states);
return is_possible;
}
Backtracking Algorithms
Backtracking represents the application of DFS on an implicit state-space tree. Problems are abstracted into tree structures where each node represents a decision. Visualizing this tree structure is crucial for understanding the algorithm. When the algorithm retreats from a recursive call, it must restore any modified state variables to accurately explore alternative branches at the same level. This state reset equates to reversing the operations applied prior to the recursive step.
Nuances of State Reset
Primitive variables are passed by value, automatically creating copies during the recursive descent, thus requiring no explicit reset. Objects or references, conversely, are shared; hence, they require manual restoration during the recursive ascent. For example, modifying a string via concatenation generates new string instances, bypassing the need for manual resets, whereas mutable structures like a StringBuilder share a single instance and demand explicti reversion upon backtracking.
Breadth-First Search (BFS), in similar scenarios, must retain all intermediate states at each level, consuming significant memory if the state space is vast. DFS only requires maintaining a single chain of states representing the path from the root to the current node, allowing for the reuse of a single state variable and efficient modification across the entire search space.
Pruning
Pruning optimizes backtracking by abandoning branches guaranteed not to contain a valid solution. By analyzing specific test cases, general pruning rules can be derived to terminate recursive exploration early when certain conditions are met, significantly reducing computational overhead.
Two-Dimensional Plane Search (Flood Fill)
Applying DFS to 2D grids treats matrix cells as nodes in a graph, where adjacent cells share edges. Techniques such as island detection or word searches are classic examples of flood fill algorithms. While BFS or Union-Find are also applicable to these grid problems, DFS provides a natural and cohesive traversal mechanism.