Regular Expression Escaping and Non-Greedy Quantifiers

Tech May 15 1

String Escaping in RegExp Constructors

When constructing a regular expression in JavaScript by passing a string to the RegExp constructor, every backslash that belongs to the pattern must be doubled. The string literal is parsed first and consumes one level of escaping, so a single backslash never reaches the regex engine. The result is either a syntax error or a pattern that compiles but matches something other than what was intended.

For instance, a pattern meant to validate a number with an optional leading plus sign and an optional decimal part can fail with this error:

Uncaught SyntaxError: Invalid regular expression: /^(+?d+)(.d+)?$/: Nothing to repeat

The problematic source code:

const decimalRegex = new RegExp('^(\+?\d+)(\.\d+)?$');

At first glance, the regex appears correct. However, the pattern echoed in the error message shows that the backslashes are gone. In a string literal, \+ evaluates to +, \d to d, and \. to ., so the first group now opens with a + quantifier that has nothing to repeat. Doubling each backslash lets the string pass the intended escape sequences through to the regex engine:

const decimalRegex = new RegExp('^(\\+?\\d+)(\\.\\d+)?$');
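The effect can be reproduced outside JavaScript by handing a regex engine the pattern exactly as it appears in the error message. The sketch below uses Python purely for illustration (Python's own string-escape rules differ from JavaScript's, and the sample inputs are made up): the stripped pattern fails to compile, while the intended pattern, written as a raw string, validates the inputs as expected.

```python
import re

# The pattern as the regex engine receives it once the string literal
# has dropped the single backslashes (copied from the error message).
stripped_pattern = '^(+?d+)(.d+)?$'

try:
    re.compile(stripped_pattern)
    stripped_error = None
except re.error as exc:
    # The leading + inside the group has no token to repeat.
    stripped_error = str(exc)

# The intended pattern; a raw string keeps every backslash intact.
decimal_regex = re.compile(r'^(\+?\d+)(\.\d+)?$')

# Hypothetical sample inputs, chosen for illustration only.
samples = ['42', '+3.14', '3.', 'abc']
results = {s: bool(decimal_regex.match(s)) for s in samples}

print(stripped_error)
print(results)
# results: {'42': True, '+3.14': True, '3.': False, 'abc': False}
```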

Non-Greedy Quantifiers

By default, quantifiers such as + and * are greedy: they match as much text as possible. Appending ? makes them non-greedy (lazy), so the engine matches as few characters as it can while still letting the rest of the pattern succeed.

Consider the pattern Value: (.+?)[;.]. The .+? portion matches one or more characters lazily, stopping at the first semicolon or period. If the trailing character class [;.] is omitted, as in Value: (.+?), nothing after the group forces the lazy quantifier to expand, so in a search it captures only a single character. It extends to the end of the line only when something requires it, such as a trailing $ anchor or a whole-string match; in that case it behaves like its greedy counterpart.
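The sketch below runs the pattern with and without the trailing character class against a made-up sample string, so the captured text can be compared directly. The sample string and variable names are assumptions for illustration.

```python
import re

# Hypothetical input containing both a semicolon and a period.
text = 'Value: 42; Other: 7.'

# Lazy, followed by [;.]: stops at the first semicolon or period.
lazy_with_terminator = re.search(r'Value: (.+?)[;.]', text).group(1)

# Greedy, followed by [;.]: runs to the last semicolon or period.
greedy_with_terminator = re.search(r'Value: (.+)[;.]', text).group(1)

# Lazy with nothing after it: the minimum of one character is enough.
lazy_alone = re.search(r'Value: (.+?)', text).group(1)

# Lazy but anchored with $: forced to expand to the end of the string.
lazy_anchored = re.search(r'Value: (.+?)$', text).group(1)

print(lazy_with_terminator)    # 42
print(greedy_with_terminator)  # 42; Other: 7
print(lazy_alone)              # 4
print(lazy_anchored)           # 42; Other: 7.
```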

Extracting Text Inside Quotation Marks

A common use case for lazy matching is extracting strings enclosed within double quotes. The pattern "(.*?)" captures the minimal number of characters between a pair of quotes, preventing over-matching when multiple quoted segments exist on the same line.
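To make the over-matching concrete, the sketch below compares the greedy and lazy forms on a sentence with two quoted segments. It also shows a negated character class, a commonly used alternative that is not part of the approach described here and is included only for comparison.

```python
import re

line = 'The user typed "admin" and then "password"'

# Greedy: .* runs from the first quote to the last quote on the line.
greedy = re.findall(r'"(.*)"', line)

# Lazy: .*? stops at the nearest closing quote.
lazy = re.findall(r'"(.*?)"', line)

# Alternative (not the lazy technique): match anything except a quote.
negated = re.findall(r'"([^"]*)"', line)

print(greedy)   # ['admin" and then "password']
print(lazy)     # ['admin', 'password']
print(negated)  # ['admin', 'password']
```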

Java Implementation

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedStringExtractor {
    public static void main(String[] args) {
        String inputText = "The user typed \"admin\" and then \"password\"";
        Pattern quotePattern = Pattern.compile("\"(.*?)\"");
        Matcher matcher = quotePattern.matcher(inputText);

        while (matcher.find()) {
            System.out.println(matcher.group(1));
        }
    }
}

The java.util.regex API compiles the pattern and iterates through matches. The group(1) method retrieves the first captured group, which is the content inside the quotes. The output will be:

admin
password

Python Implementation

import re

source_string = 'The user typed "admin" and then "password"'
found_matches = re.findall(r'"(.*?)"', source_string)

print(found_matches)
# Output: ['admin', 'password']

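Raw strings (r'...') in Python pass backslashes through to the regex engine unchanged, which eliminates the need for double escaping. This particular pattern contains no backslashes, but writing every pattern as a raw string is a common habit. Because the pattern has exactly one capturing group, re.findall returns a list of that group's contents rather than the full matches.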
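As a small illustration of why raw strings matter once a pattern does contain backslashes, the sketch below uses the word-boundary escape \b, which a normal Python string would interpret as a backspace character. The sample text and variable names are made up for this example.

```python
import re

plain = '\bcat\b'      # normal string: each \b becomes a backspace character
raw = r'\bcat\b'       # raw string: backslash and b both reach the regex engine
doubled = '\\bcat\\b'  # doubled backslashes produce the same text as the raw form

text = 'cat concat'

plain_matches = re.findall(plain, text)  # looks for backspace + cat + backspace
raw_matches = re.findall(raw, text)      # looks for the whole word cat

print(len(plain), len(raw))  # 5 7
print(raw == doubled)        # True
print(plain_matches)         # []
print(raw_matches)           # ['cat']
```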
