Efficient Pattern Matching with the KMP Algorithm
Given two strings text and pattern, find all starting indices in text where pattern occurs as a contiguous substring. Additionally, for every prefix of pattern, compute the length of its longest proper border—a proper border is a non-empty substring that is both a prefix and a suffix of the given prefix.
A naive approach would slide the pattern over the text one position at a time, comparing characters individually. This leads to a time complexity of $O(n \cdot m)$, where $n$ and $m$ are the lengths of the text and pattern respectively. However, this method redundantly rechecks chraacters already matched after a mismatch.
The Knuth-Morris-Pratt (KMP) algorithm avoids this redundancy by preprocessing the pattern to build a failure function (also known as the prefix function or border array). The value fail[i] stores the length of longest proper border of the prefix pattern[0..i-1].
Using this table, during the matching phase, when a mismatch occurs at position j in the pattern, instead of restarting from the beginning, the algorithm uses fail[j] to determine the next viable alignment—skipping unnecessary comparisons.
Matching Phase:
int j = 0;
for (int i = 0; i < n; ++i) {
while (j > 0 && text[i] != pattern[j])
j = fail[j - 1];
if (text[i] == pattern[j])
++j;
if (j == m) {
cout << i - m + 2 << '\n'; // 1-based index
j = fail[j - 1];
}
}
Failure Function Construction:
fail[0] = 0;
int j = 0;
for (int i = 1; i < m; ++i) {
while (j > 0 && pattern[i] != pattern[j])
j = fail[j - 1];
if (pattern[i] == pattern[j])
++j;
fail[i] = j;
}
Combining both phases yields the complete KMP algorithm, which runs in $O(n + m)$ time.
Full implementation:
#include <iostream>
#include <cstring>
using namespace std;
const int MAXN = 1e6 + 5;
char text[MAXN], pattern[MAXN];
int fail[MAXN];
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
cin >> (text + 1) >> (pattern + 1);
int n = strlen(text + 1), m = strlen(pattern + 1);
// Build failure function for pattern (1-indexed)
fail[1] = 0;
int j = 0;
for (int i = 2; i <= m; ++i) {
while (j > 0 && pattern[j + 1] != pattern[i])
j = fail[j];
if (pattern[j + 1] == pattern[i])
++j;
fail[i] = j;
}
// Search pattern in text
j = 0;
for (int i = 1; i <= n; ++i) {
while (j > 0 && pattern[j + 1] != text[i])
j = fail[j];
if (pattern[j + 1] == text[i])
++j;
if (j == m) {
cout << i - m + 1 << '\n';
j = fail[j];
}
}
// Output failure function values
for (int i = 1; i <= m; ++i)
cout << fail[i] << " ";
return 0;
}