Efficient Pattern Matching with the Knuth-Morris-Pratt Algorithm
The Knuth-Morris-Pratt (KMP) algorithm reduces string matching complexity from quadratic O(N×M) to linear O(N+M) by eliminating redundant character comparisons. Instead of backtracking the text pointer upon a mismatch, the algorithm leverages precomputed structural information about the patern to shift it optimally. This shift relies on identifying the longest proper prefix of the pattern that also acts as a suffix for the currently matched segment.
Failure Table Construction
The core preprocessing step builds a lookup table (commonly called the LPS or border array) that stores the length of longest matching prefix-suffix for every position in the pattern. This table dictates how far the pattern pointer should retreat when a mismatch occurs.
void build_failure_table(const std::string& pattern, std::vector<int>& border) {
int m = pattern.length();
border.assign(m, 0);
int len = 0;
int idx = 1;
while (idx < m) {
if (pattern[idx] == pattern[len]) {
border[idx++] = ++len;
} else if (len > 0) {
len = border[len - 1];
} else {
border[idx++] = 0;
}
}
}
Pattern Search Routine
During the search phase, the algorithm scans the main text sequentially. When characters align, both pointers advance. On a mismatch, the pattern pointer jumps back using the precomputed table, while the text pointer remains stationary, ensuring linear traversal.
int locate_pattern(const std::string& text, const std::string& pattern) {
if (pattern.empty()) return 0;
std::vector<int> border;
build_failure_table(pattern, border);
int n = text.length();
int m = pattern.length();
int i = 0, j = 0;
while (i < n) {
if (text[i] == pattern[j]) {
++i; ++j;
}
if (j == m) {
return i - j;
} else if (i < n && text[i] != pattern[j]) {
j = (j > 0) ? border[j - 1] : 0;
if (j == 0) ++i;
}
}
return -1;
}
Application: Cyclic Substring Verification
Determining whether a target string exists within any cyclic rotation of a source string can be solved without explicit rotation. By concatenating the source string with itself, all possible cyclic shifts become contiguous substrings. If the target length does not exceed the source length, a single KMP search on the doubled source string resolves the query.
#include <iostream>
#include <string>
#include <vector>
bool is_cyclic_substring(const std::string& base, const std::string& target) {
if (target.length() > base.length()) return false;
std::string doubled = base + base;
return locate_pattern(doubled, target) != -1;
}
int main() {
std::string s1, s2;
while (std::cin >> s1 >> s2) {
std::cout << (is_cyclic_substring(s1, s2) ? "yes" : "no") << '\n';
}
return 0;
}
Application: Prefix Periodicity Analysis
For a given string, identifying repeating periods with in its prefixes relies on a fundamental property of the failure table. For a prefix of length i, the smallest period length is L = i - border[i-1]. If i is perfectly divisible by L and border[i-1] > 0, the prefix consists of i / L repetitions of a substring of length L.
#include <iostream>
#include <string>
#include <vector>
void analyze_periods(const std::string& seq) {
int n = seq.length();
std::vector<int> border;
build_failure_table(seq, border);
for (int i = 1; i < n; ++i) {
int prefix_len = i + 1;
int period_len = prefix_len - border[i];
if (border[i] > 0 && prefix_len % period_len == 0) {
std::cout << prefix_len << " " << (prefix_len / period_len) << '\n';
}
}
}
int main() {
int n, case_num = 0;
while (std::cin >> n && n != 0) {
std::string s;
std::cin >> s;
std::cout << "Test case #" << ++case_num << '\n';
analyze_periods(s);
std::cout << '\n';
}
return 0;
}
Application: Numeric Sequence Alignment
The KMP mechanism is data-type agnostic and operates on any sequence supporting equality comparison. When searching for an integer subarray within a larger integer array, the character-based logic translates directly to numeric vectors. The failure table construction and search traversal remain identical, processing integer elements instead of ASCII characters.
#include <iostream>
#include <vector>
void build_int_table(const std::vector<int>& pat, std::vector<int>& tbl) {
int m = pat.size();
tbl.assign(m, 0);
int len = 0, idx = 1;
while (idx < m) {
if (pat[idx] == pat[len]) tbl[idx++] = ++len;
else if (len > 0) len = tbl[len - 1];
else tbl[idx++] = 0;
}
}
int find_int_sequence(const std::vector<int>& arr, const std::vector<int>& sub) {
if (sub.empty()) return 0;
std::vector<int> tbl;
build_int_table(sub, tbl);
int n = arr.size(), m = sub.size();
int i = 0, j = 0;
while (i < n) {
if (arr[i] == sub[j]) { ++i; ++j; }
if (j == m) return i - j + 1;
else if (i < n && arr[i] != sub[j]) {
j = (j > 0) ? tbl[j - 1] : 0;
if (j == 0) ++i;
}
}
return -1;
}
int main() {
int t;
std::cin >> t;
while (t--) {
int n, m;
std::cin >> n >> m;
std::vector<int> a(n), b(m);
for (int& x : a) std::cin >> x;
for (int& x : b) std::cin >> x;
std::cout << find_int_sequence(a, b) << '\n';
}
return 0;
}