Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

Efficient Pattern Matching with the KMP Algorithm

Tech May 16 1

Given two strings text and pattern, find all starting indices in text where pattern occurs as a contiguous substring. Additionally, for every prefix of pattern, compute the length of its longest proper border—a proper border is a non-empty substring that is both a prefix and a suffix of the given prefix.

A naive approach would slide the pattern over the text one position at a time, comparing characters individually. This leads to a time complexity of $O(n \cdot m)$, where $n$ and $m$ are the lengths of the text and pattern respectively. However, this method redundantly rechecks chraacters already matched after a mismatch.

The Knuth-Morris-Pratt (KMP) algorithm avoids this redundancy by preprocessing the pattern to build a failure function (also known as the prefix function or border array). The value fail[i] stores the length of longest proper border of the prefix pattern[0..i-1].

Using this table, during the matching phase, when a mismatch occurs at position j in the pattern, instead of restarting from the beginning, the algorithm uses fail[j] to determine the next viable alignment—skipping unnecessary comparisons.

Matching Phase:

int j = 0;
for (int i = 0; i < n; ++i) {
    while (j > 0 && text[i] != pattern[j])
        j = fail[j - 1];
    if (text[i] == pattern[j])
        ++j;
    if (j == m) {
        cout << i - m + 2 << '\n'; // 1-based index
        j = fail[j - 1];
    }
}

Failure Function Construction:

fail[0] = 0;
int j = 0;
for (int i = 1; i < m; ++i) {
    while (j > 0 && pattern[i] != pattern[j])
        j = fail[j - 1];
    if (pattern[i] == pattern[j])
        ++j;
    fail[i] = j;
}

Combining both phases yields the complete KMP algorithm, which runs in $O(n + m)$ time.

Full implementation:

#include <iostream>
#include <cstring>
using namespace std;

const int MAXN = 1e6 + 5;
char text[MAXN], pattern[MAXN];
int fail[MAXN];

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    cin >> (text + 1) >> (pattern + 1);
    int n = strlen(text + 1), m = strlen(pattern + 1);

    // Build failure function for pattern (1-indexed)
    fail[1] = 0;
    int j = 0;
    for (int i = 2; i <= m; ++i) {
        while (j > 0 && pattern[j + 1] != pattern[i])
            j = fail[j];
        if (pattern[j + 1] == pattern[i])
            ++j;
        fail[i] = j;
    }

    // Search pattern in text
    j = 0;
    for (int i = 1; i <= n; ++i) {
        while (j > 0 && pattern[j + 1] != text[i])
            j = fail[j];
        if (pattern[j + 1] == text[i])
            ++j;
        if (j == m) {
            cout << i - m + 1 << '\n';
            j = fail[j];
        }
    }

    // Output failure function values
    for (int i = 1; i <= m; ++i)
        cout << fail[i] << " ";
    return 0;
}

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.