Home > Tech > Content

String Algorithm Fundamentals: Border Theory, Z-Function, and Palindrome Automata

Tech 1

Prefix Function (KMP)

The prefix function (also called failure function or pi function) measures the longest proper prefix that is also a suffix for each position. Here's an iterative computation:

int pi[1000005];

void computePrefix(char* s, int n) {
    for (int i = 2; i <= n; i++) {
        int j = pi[i-1];
        while (j > 0 && s[i] != s[j+1]) 
            j = pi[j];
        if (s[i] == s[j+1]) j++;
        pi[i] = j;
    }
}

int main() {
    char str[1000005];
    scanf("%s", str + 1);
    int n = strlen(str + 1);
    computePrefix(str, n);
    for (int i = 1; i <= n; i++)
        printf("%d ", pi[i]);
    return 0;
}

The array pi[i] represents the length of the longest border for prefix ending at position i.

Z-Function

The Z-function computes the longest common prefix (LCP) between the entire string and each of its suffixes. For a string S of length n, Z[i] equals the length of the longest subtsring starting at position i that matches a prefix of S.

Algorithm: Maintain a [L, R] interval representing the rightmost matching segment discovered so far. For each position i:

If i > R, start fresh with zero matching and extend manually
Otherwise, we can utilize information from the matching segment: initialize Z[i] = min(Z[i-L], R-i+1), then extend

This guarantees R never decreases, yielding O(n) time complexity.

int Z[1000005];

void zFunction(char* s, int n) {
    Z[1] = n;
    int left = 0, right = 0;
    for (int i = 2; i <= n; i++) {
        if (i <= right) 
            Z[i] = min(Z[i-left+1], right - i + 1);
        while (i + Z[i] <= n && s[i + Z[i]] == s[Z[i] + 1]) 
            Z[i]++;
        if (i + Z[i] - 1 > right) {
            left = i;
            right = i + Z[i] - 1;
        }
    }
}

Extended Z-Function: Given two strings S and T, compute Z[T] and then use it to find the LCP between S and every suffix of S. This is useful for pattern matching scenarios where we need to match a pattern against multiple text positions.

int extZ[1000005];

void extendedZ(char* pattern, int n, char* text, int m) {
    zFunction(text, m);
    int left = 0, right = 0;
    for (int i = 1; i <= n; i++) {
        if (i <= right) 
            extZ[i] = min(Z[i-left+1], right - i + 1);
        while (i + extZ[i] <= n && pattern[i + extZ[i]] == text[extZ[i] + 1]) 
            extZ[i]++;
        if (i + extZ[i] - 1 > right) {
            left = i;
            right = i + extZ[i] - 1;
        }
    }
}

Border Tree Applications

A border is a nonempty string that is both a proper prefix and a suffix. The failure links form a tree structure where each node's ancestors represent all borders of that prefix.

Number of Short Borders: For each position i, the count of borders with length ≤ i/2 equals the number of ancestors in the failure tree whose depth satisfies this constraint. Using DFS with stack tracking achieves O(n) time.

Longest Common Border: For two prefix positions p and q, their longest common border is simply the LCA in the failure tree. Special care must be taken to exclude the trivial case where one is an ancestor of the other.

Covering Problem: Find the shortest border whose occurrences cover the entire string. For each candidate border length i, check whether all occurrence endpoints (sorted) have gaps not exceeding i. Implementation uses bidirectional linked lists for O(n) updates.

Periodicity Theory

Weak Periodicity Lemma: If p and q are periods of a string S with p + q ≤ |S|, then gcd(p, q) is also a period.

Proof: Let d = p - q (assuming p > q). For positions greater than q, we have s[i] = s[i-q] = s[i-q+p] = s[i+d]. For positions less than p, we have s[i] = s[i+p] = s[i+p-q] = s[i+d]. Hence d is a period, and repeated subtraction yields the gcd.

Periodicity Lemma: If p and q are periods and p + q - gcd(p, q) ≤ |S|, then gcd(p, q) is a period.

Divisor Period Theorem: If string T has period a, and substring S = T[:i] has period b where b | a and |S| ≥ a, then b is also a period of T.

Border Progression: All borders with length ≥ |S|/2 form an arithmetic progression. This follows from the periodicity lemmas - the minimal period divides all other periods.

Border Count Bound: The number of distinct border lengths is at most ⌈log₂|S|⌉. This results from the halving argument when considering borders ≥ |S|/2.

Arithmetic Matching: If 2|S| ≥ |T|, then all occurrences of S in T form an arithmetic progression. This is proven by showing the gaps between consecutive matches must be periods of S.

Palindrome Automaton (PAM)

The palindrome automaton is a suffix-tree-like structure for palindromes. Each node represents a unique palindromic substring.

Structure:

Each node stores: length len, failure link fail, and transitions next[alphabet]
Two roots: odd root with len = -1 and even root with len = 0
Failure link points to the longest palindromic proper suffix
The chain of failure links from any node gives all palindromic suffixes

Construction: Process the string incrementally. For each new character, traverse failure links to find the longest new palindromic suffix by checking if the character matches to the left of each candidate.

struct PalindromeAutomaton {
    static const int ALPHA = 26;
    static const int MAXN = 500005;
    
    int next[MAXN][ALPHA];
    int fail[MAXN];
    int len[MAXN];
    int last;
    int sz;
    char s[MAXN];
    
    void init() {
        fail[0] = fail[1] = 1;
        len[1] = -1;
        sz = 1;
        last = 0;
    }
    
    int getFail(int v, int pos) {
        while (pos - len[v] - 1 < 0 || s[pos - len[v] - 1] != s[pos])
            v = fail[v];
        return v;
    }
    
    void insert(char c, int pos) {
        s[pos] = c;
        int cur = getFail(last, pos);
        int idx = c - 'a';
        if (!next[cur][idx]) {
            len[++sz] = len[cur] + 2;
            int temp = getFail(fail[cur], pos);
            fail[sz] = next[temp][idx];
            next[cur][idx] = sz;
        }
        last = next[cur][idx];
    }
};

Complexity: Adding one character creates at most one new node (the longest new palindromic suffix). Failure links strictly decrease in length, and each node is traversed at most twice during construction. Total complexity is O(n) time and O(n·|Σ|) space.

Applications:

Counting distinct palindromic substrings
Counting palindrmoes ending at each position
Finding the longest palindromic substring at each position

Higher-Order Palindromes

A palindrome has order k if removing outer characters k-1 times still yields a palindrome. Let f[i][j] represent the order of substring s[i:j].

Transition: If s[i] ≠ s[j] or the inner substring is not a palindrome, then f[i][j] = 0. Otherwise, f[i][j] = f[i + (j-i+1)/2 - 1][j] + 1.

Efficient Computation: Using the PAM, maintain for each node v its "half" - the longest palindromic suffix with length ≤ len[v]/2. When creating a new node via transition p → x → np, we find half(p) and traverse ancestors until we can form another palindrome suffix, yielding half(np). The order is:

f(v) = f(half(v)) + 1 if len(half(v)) = floor(len(v)/2), otherwise 1.

Palindrome Periodicity

If a palindrome S has period p, its cyclic period can be split into two palindromes of lengths p - (|S| mod p) and |S| mod p. This follows from the palindrome property.

Let v be the longest proper palindromic prefix (or suffix) of palindrome u. If 2|v| ≥ |u|, then v appears exactly twice - as prefix and suffix. This constrains where long proper palindromes can occur within a larger palindrome.

Minimum Suffix

Consider suffixes sorted by lexicographic order. The significant suffixes are those prefixes of some minimal suffix when extended by any character. Key properties:

For any two significant suffixes U and V where |U| < |V|, U must be a prefix of V
Doubling Theorem: If U and V are significant suffixes with |U| < |V|, then 2|U| < |V|

This guarantees only O(log n) significant suffixes exist for any string.

Tags: string algorithms KMP

Back to List

Prev: Component Communication Patterns in Vue.js

Next: Configuring Network Services and SSH Key Authentication on Linux

Fading Coder

String Algorithm Fundamentals: Border Theory, Z-Function, and Palindrome Automata

Prefix Function (KMP)

Z-Function

Border Tree Applications

Periodicity Theory

Palindrome Automaton (PAM)

Higher-Order Palindromes

Palindrome Periodicity

Minimum Suffix

Related Articles

Understanding Strong and Weak References in Java

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Leave a Comment

Copyright © fadingcoder.top

Fading Coder

String Algorithm Fundamentals: Border Theory, Z-Function, and Palindrome Automata

Prefix Function (KMP)

Z-Function

Border Tree Applications

Periodicity Theory

Palindrome Automaton (PAM)

Higher-Order Palindromes

Palindrome Periodicity

Minimum Suffix

Related Articles

Understanding Strong and Weak References in Java

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Leave a CommentCancel Reply

Copyright © fadingcoder.top

Leave a Comment