Implementing Timeout Control for Java Regular Expressions
Problem Statement
In production environments, regular expressions are widely used for text processing. However, certain pathological patterns can cause extreme CPU consumption, leading to service degradation. This issue becomes particularly critical when:
- The regex engine encounters catastrophic backtracking
- No timeout mechanism exists to halt execution
- Multiple services share the same infrastructure may be affected
- Incidents occurring outside business hours delay response time
Solution Approach
Two complementary strategies address this challenge:
Long-term approach: Refactor existing regex patterns to use possessive quantifiers or atomic groups where possible.
Short-term approach: Implement timeout control at the application layer to serve as a safety net.
The implementation wraps the input CharSequence with a timing-checked proxy that validates executoin time on each character access.
Implementation Details
Key Design Points
- Override the
charAt()method to perform timeout verification on every access - Use a dedicated time provider instead of
System.currentTimeMillis()to avoid contention in high-concurrency scenarios - Throw a runtime exception when timeout expires to unwind the stack
Core Implementation
package com.example.util.regex;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTimeoutHandler {
private static final int DEFAULT_TIMEOUT_MS = 1000;
public static RegexMatchResult execute(String input, String pattern) {
return execute(input, pattern, DEFAULT_TIMEOUT_MS);
}
public static RegexMatchResult execute(String input, String pattern, int timeoutMs) {
Matcher matcher = createTimedMatcher(input, pattern, timeoutMs);
try {
boolean matched = matcher.find();
return RegexMatchResult.builder()
.matched(matched)
.matcher(matched ? matcher : null)
.build();
} catch (Exception e) {
System.out.println("Regex timeout interrupted: " + e.getMessage());
return RegexMatchResult.builder().matched(false).build();
}
}
private static Matcher createTimedMatcher(String input, String pattern, int timeoutMs) {
Pattern compiled = Pattern.compile(pattern);
CharSequence timed = new TimedCharSequence(input, timeoutMs, input, pattern);
return compiled.matcher(timed);
}
private static class TimedCharSequence implements CharSequence {
private final CharSequence delegate;
private final int timeoutMs;
private final long deadline;
private final String inputData;
private final String pattern;
TimedCharSequence(CharSequence delegate, int timeoutMs, String inputData, String pattern) {
this.delegate = delegate;
this.timeoutMs = timeoutMs;
this.inputData = inputData;
this.pattern = pattern;
this.deadline = HighResolutionClock.currentTimeMillis() + timeoutMs;
}
@Override
public char charAt(int index) {
if (HighResolutionClock.currentTimeMillis() > deadline) {
throw new RuntimeException(
"Regex execution exceeded " + timeoutMs + "ms. Pattern: " + pattern + ", Input: " + inputData);
}
return delegate.charAt(index);
}
@Override
public int length() {
return delegate.length();
}
@Override
public CharSequence subSequence(int start, int end) {
return new TimedCharSequence(delegate.subSequence(start, end), timeoutMs, inputData, pattern);
}
@Override
public String toString() {
return delegate.toString();
}
}
}
Result Container
package com.example.util.regex;
import java.util.regex.Matcher;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;
@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
public class RegexMatchResult {
private Boolean matched;
private Matcher matcher;
}
High-Resolution Clock
In high-throughput systems, calling System.currentTimeMillis() introduces contention due to system-wide locks. This implementation uses a dedicated scheduler to update a shared time value.
package com.example.util.regex;
import java.sql.Timestamp;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
public class HighResolutionClock {
private final long updateInterval;
private volatile long currentTime;
private HighResolutionClock(long interval) {
this.updateInterval = interval;
this.currentTime = System.currentTimeMillis();
startScheduler();
}
private static class InstanceLoader {
private static final HighResolutionClock CLOCK = new HighResolutionClock(1);
}
public static long currentTimeMillis() {
return InstanceLoader.CLOCK.currentTime;
}
public static String now() {
return new Timestamp(InstanceLoader.CLOCK.currentTime).toString();
}
private void startScheduler() {
ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(new ThreadFactory() {
@Override
public Thread newThread(Runnable task) {
Thread thread = new Thread(task, "clock-updater");
thread.setDaemon(true);
return thread;
}
});
executor.scheduleAtFixedRate(new Runnable() {
@Override
public void run() {
currentTime = System.currentTimeMillis();
}
}, updateInterval, updateInterval, TimeUnit.MILLISECONDS);
}
}
Usage Example
package com.example.util.regex;
public class RegexTimeoutDemo {
public static void main(String[] args) {
// This pattern typically causes catastrophic backtracking
RegexMatchResult result = RegexTimeoutHandler.execute(
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"(x+x+)+y",
500);
System.out.println(result.getMatched() ? "Match found" : "No match");
// Normal pattern
result = RegexTimeoutHandler.execute("xxxxxxx", "x*", 100);
System.out.println(result.getMatched() ? "Match found" : "No match");
}
}
Performance Considerations
- The timeout check occurs on every
charAt()call, adding minimal overhead - The
HighResolutionClocksingleton reduces system call frequency - Stack unwinding via exception is fast compared to indefinite hanging
- Adjust timeout values based on typical pattern complexity and input length
Conclusion
This approach provides a safety net for regex operations without requiring modification of existing patterns. It effectively prevents runaway CPU consumption while maintaining acceptable performance characteristics.