Fading Coder

One Final Commit for the Last Sprint

Home > Notes > Content

Implementing Timeout Control for Java Regular Expressions

Notes 1

Problem Statement

In production environments, regular expressions are widely used for text processing. However, certain pathological patterns can cause extreme CPU consumption, leading to service degradation. This issue becomes particularly critical when:

  1. The regex engine encounters catastrophic backtracking
  2. No timeout mechanism exists to halt execution
  3. Multiple services share the same infrastructure may be affected
  4. Incidents occurring outside business hours delay response time

Solution Approach

Two complementary strategies address this challenge:

Long-term approach: Refactor existing regex patterns to use possessive quantifiers or atomic groups where possible.

Short-term approach: Implement timeout control at the application layer to serve as a safety net.

The implementation wraps the input CharSequence with a timing-checked proxy that validates executoin time on each character access.

Implementation Details

Key Design Points

  1. Override the charAt() method to perform timeout verification on every access
  2. Use a dedicated time provider instead of System.currentTimeMillis() to avoid contention in high-concurrency scenarios
  3. Throw a runtime exception when timeout expires to unwind the stack

Core Implementation

package com.example.util.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTimeoutHandler {

    private static final int DEFAULT_TIMEOUT_MS = 1000;

    public static RegexMatchResult execute(String input, String pattern) {
        return execute(input, pattern, DEFAULT_TIMEOUT_MS);
    }

    public static RegexMatchResult execute(String input, String pattern, int timeoutMs) {
        Matcher matcher = createTimedMatcher(input, pattern, timeoutMs);
        try {
            boolean matched = matcher.find();
            return RegexMatchResult.builder()
                    .matched(matched)
                    .matcher(matched ? matcher : null)
                    .build();
        } catch (Exception e) {
            System.out.println("Regex timeout interrupted: " + e.getMessage());
            return RegexMatchResult.builder().matched(false).build();
        }
    }

    private static Matcher createTimedMatcher(String input, String pattern, int timeoutMs) {
        Pattern compiled = Pattern.compile(pattern);
        CharSequence timed = new TimedCharSequence(input, timeoutMs, input, pattern);
        return compiled.matcher(timed);
    }

    private static class TimedCharSequence implements CharSequence {

        private final CharSequence delegate;
        private final int timeoutMs;
        private final long deadline;
        private final String inputData;
        private final String pattern;

        TimedCharSequence(CharSequence delegate, int timeoutMs, String inputData, String pattern) {
            this.delegate = delegate;
            this.timeoutMs = timeoutMs;
            this.inputData = inputData;
            this.pattern = pattern;
            this.deadline = HighResolutionClock.currentTimeMillis() + timeoutMs;
        }

        @Override
        public char charAt(int index) {
            if (HighResolutionClock.currentTimeMillis() > deadline) {
                throw new RuntimeException(
                        "Regex execution exceeded " + timeoutMs + "ms. Pattern: " + pattern + ", Input: " + inputData);
            }
            return delegate.charAt(index);
        }

        @Override
        public int length() {
            return delegate.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new TimedCharSequence(delegate.subSequence(start, end), timeoutMs, inputData, pattern);
        }

        @Override
        public String toString() {
            return delegate.toString();
        }
    }
}

Result Container

package com.example.util.regex;

import java.util.regex.Matcher;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
public class RegexMatchResult {

    private Boolean matched;
    private Matcher matcher;
}

High-Resolution Clock

In high-throughput systems, calling System.currentTimeMillis() introduces contention due to system-wide locks. This implementation uses a dedicated scheduler to update a shared time value.

package com.example.util.regex;

import java.sql.Timestamp;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

public class HighResolutionClock {

    private final long updateInterval;
    private volatile long currentTime;

    private HighResolutionClock(long interval) {
        this.updateInterval = interval;
        this.currentTime = System.currentTimeMillis();
        startScheduler();
    }

    private static class InstanceLoader {

        private static final HighResolutionClock CLOCK = new HighResolutionClock(1);
    }

    public static long currentTimeMillis() {
        return InstanceLoader.CLOCK.currentTime;
    }

    public static String now() {
        return new Timestamp(InstanceLoader.CLOCK.currentTime).toString();
    }

    private void startScheduler() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(new ThreadFactory() {
            @Override
            public Thread newThread(Runnable task) {
                Thread thread = new Thread(task, "clock-updater");
                thread.setDaemon(true);
                return thread;
            }
        });
        executor.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                currentTime = System.currentTimeMillis();
            }
        }, updateInterval, updateInterval, TimeUnit.MILLISECONDS);
    }
}

Usage Example

package com.example.util.regex;

public class RegexTimeoutDemo {

    public static void main(String[] args) {
        // This pattern typically causes catastrophic backtracking
        RegexMatchResult result = RegexTimeoutHandler.execute(
                "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", 
                "(x+x+)+y",
                500);
        System.out.println(result.getMatched() ? "Match found" : "No match");

        // Normal pattern
        result = RegexTimeoutHandler.execute("xxxxxxx", "x*", 100);
        System.out.println(result.getMatched() ? "Match found" : "No match");
    }
}

Performance Considerations

  1. The timeout check occurs on every charAt() call, adding minimal overhead
  2. The HighResolutionClock singleton reduces system call frequency
  3. Stack unwinding via exception is fast compared to indefinite hanging
  4. Adjust timeout values based on typical pattern complexity and input length

Conclusion

This approach provides a safety net for regex operations without requiring modification of existing patterns. It effectively prevents runaway CPU consumption while maintaining acceptable performance characteristics.

Related Articles

Designing Alertmanager Templates for Prometheus Notifications

How to craft Alertmanager templates to format alert messages, improving clarity and presentation. Alertmanager uses Go’s text/template engine with additional helper functions. Alerting rules referenc...

Deploying a Maven Web Application to Tomcat 9 Using the Tomcat Manager

Tomcat 9 does not provide a dedicated Maven plugin. The Tomcat Manager interface, however, is backward-compatible, so the Tomcat 7 Maven Plugin can be used to deploy to Tomcat 9. This guide shows two...

Skipping Errors in MySQL Asynchronous Replication

When a replica halts because the SQL thread encounters an error, you can resume replication by skipping the problematic event(s). Two common approaches are available. Methods to Skip Errors 1) Skip a...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.