Fading Coder

An Old Coder’s Final Dance

You are here: Home > Tech > Content

Oracle Regular Expressions: Comprehensive Guide and SQL Patterns

Tech 3

Regular expressions describe text patterns using literal characters and metacharacters. In Oracle Database (10g and later), SQL gains native regex support through a set of REGEXP_* functions that implement a POSIX-style dialect with several Perl-like features. This guide summarizes the syntax, options, and common patterns, and demonstrates them with Oracle SQL.

1. Character Matching

  • \d, \D

    • \d: digit 0–9 (Oracle synonym: [:digit:])
    • \D: not a digit
    • Example: \d\d matches 72; does not match ab
  • \w, \W

    • \w: word char [A-Za-z0-9_]
    • \W: non-word char
    • Example: \w+ matches Ab_2; does not match %*$
  • \s, \S

    • \s: whitespace (space, tab, newline, vertical tab, form feed)
    • \S: non-whitespace
    • Example: \s+ matches standard blanks in text
  • . (dot)

    • Any single character except newline, unless match parameter 'n' is used
  • [ ... ] character class

    • Example: [abc] matches one of a, b, or c
    • Ranges: [a-z], [A-Z], [0-9]
  • [^ ... ] negated class

    • Example: [^abc] matches any single char other then a, b, or c
  • POSIX character classes (preferred in Oracle in side []):

    • [:digit:], [:alpha:], [:alnum:], [:space:], [:lower:], [:upper:], [:punct:], [:xdigit:], [:blank:], [:cntrl:], [:print:], [:graph:]

2. Quantifiers

  • ? 0 or 1 time
    • 0 or more times
    • 1 or more times
  • {n} exactly n times
  • {n,} at least n times
  • {n,m} between n and m times (inclusive)

Examples:

  • x{2} matches "xx"
  • x{2,} matches "xx", "xxx", ...
  • x{2,4} matches "xx", "xxx", "xxxx"

3. Anchors and Boundaries

  • ^ start of string (or start of line when 'm' is set)
  • $ end of string (or end of line when 'm' is set)
  • \A start of string (ignores 'm')
  • \z end of string (ignores 'm')
  • \Z end of string or just before final newline
  • \b word boundary (between \w and \W)
  • \B not a word boundary

Anchors apply to the position in the input, not to characters themselves.

4. Grouping, Backreferences, and Lookarounds

  • (...) capturing group

    • Captures the submatch for later reference
    • Example: ([A-Z]{3}\d) over "ABC1DEF2XY" matches ABC1 and DEF2; \1 refers to the captured part in replacements
  • (?:...) non-capturing group

    • Groups without capturing; useful for alternation and precedence
  • (?...) named capturing group

    • Oracle supports named subexpressions in patterns and accessing them via subexpr index; use numeric backreferences (\1, \2, …) in SQL replacements
  • Lookahead

    • (?=...) positive lookahead (asserts that the following text matches ...)
    • (?!...) negative lookahead
    • Example: \S+(?=.NET) over "Java, C#.NET, VB.NET" matches C# and VB
  • Lookbehind

    • (?<=...) positive lookbehind (asserts that the preceding text matches ...)
    • (?<!...) negative lookbehind
    • Example: (?<=New\s)[A-Z][a-z]+ over "New Mexico; New England" matches Mexico and England

Notes:

  • Atomic groups (?>...) and conditional subpatterns (?(...)) are not supported by Oracle’s regex engine.

5. Alternation and Character Sets

  • Alternation |

    • A|B matches A or B, often combined with grouping: (ing|ed|s)
  • Character sets [ ... ] and ranges [a-z]

    • Inside sets, most metacharacters lose special meaning except ^ (first position), - (range), and POSIX classes

6. Escaping and Literals

Escape metacharacters with a backslash when used literallly:

  • ., *, +, ?, |, (, ), {, }, [, ], ^, $

Oracle focuses on POSIX classes for categories; use [:space:], [:digit:], etc., instead of control escapes like \t, \n for portability.

7. Match Parameters (REGEXP_* match_parameter)

  • 'c' case-sensitive (default)
  • 'i' case-insensitive
  • 'm' treat input as multi-line (activates ^ and $ per line)
  • 'n' dot (.) matches newline
  • 'x' ignore unescaped whitespace and allow comments with # inside pattern

Combine flags as a string, e.g., 'im'.

8. Oracle SQL Regex Functions

  • REGEXP_LIKE(source, pattern[, match_parameter])

    • Predicate version of LIKE
  • REGEXP_INSTR(source, pattern[, position[, occurrence[, return_option[, match_parameter[, subexpr]]]]])

    • Returns character position of a match
    • position: 1-based start (default 1)
    • occurrence: which match to locate (default 1)
    • return_option: 0 = start position, 1 = first char after match
    • subexpr: 0 = entire match, >0 = specific capturing group
  • REGEXP_SUBSTR(source, pattern[, position[, occurrence[, match_parameter[, subexpr]]]])

    • Returns the matching substring (or captured group when subexpr > 0)
  • REGEXP_REPLACE(source, pattern[, replace_string[, position[, occurrence[, match_parameter]]]])

    • Replaces occurrences matching pattern
    • Use \1, \2, ... in replace_string for backreferences
  • REGEXP_COUNT(source, pattern[, position[, match_parameter]])

    • Returns the number of matches

9. Operator Precedence (high to low)

  1. Parentheses, lookarounds, character classes [ ]
  2. Quantifiers *, +, ?, {n}, {n,}, {n,m}
  3. Anchors and assertions (^, $, \b, \B)
  4. Alternation |

10. Sample Objects and Data

CREATE TABLE rx_test (val VARCHAR2(60));

INSERT INTO rx_test VALUES ('112233445566778899');
INSERT INTO rx_test VALUES ('22113344 5566778899');
INSERT INTO rx_test VALUES ('33112244 5566778899');
INSERT INTO rx_test VALUES ('44112233 5566 778899');
INSERT INTO rx_test VALUES ('5511 2233 4466778899');
INSERT INTO rx_test VALUES ('661122334455778899');
INSERT INTO rx_test VALUES ('771122334455668899');
INSERT INTO rx_test VALUES ('881122334455667799');
INSERT INTO rx_test VALUES ('991122334455667788');
INSERT INTO rx_test VALUES ('aabbccddee');
INSERT INTO rx_test VALUES ('bbaaaccddee');
INSERT INTO rx_test VALUES ('ccabbddee');
INSERT INTO rx_test VALUES ('ddaabbccee');
INSERT INTO rx_test VALUES ('eeaabbccdd');
INSERT INTO rx_test VALUES ('ab123');
INSERT INTO rx_test VALUES ('123xy');
INSERT INTO rx_test VALUES ('007ab');
INSERT INTO rx_test VALUES ('abcxy');
INSERT INTO rx_test VALUES ('The final test is is is how to find duplicate words.');
COMMIT;

11. Query Examples

11.1 REGEXP_LIKE

-- strings starting with 1 to 3 'a'
SELECT val FROM rx_test WHERE REGEXP_LIKE(val, '^a{1,3}');

-- 'a' occurs 1 to 3 times anywhere
SELECT val FROM rx_test WHERE REGEXP_LIKE(val, 'a{1,3}');

-- starts with 'a' and ends with 'e'
SELECT val FROM rx_test WHERE REGEXP_LIKE(val, '^a.*e$');

-- starts with lowercase or contains a digit
SELECT val FROM rx_test WHERE REGEXP_LIKE(val, '^[[:lower:]]|[[:digit:]]');

-- starts with a lowercase letter
SELECT val FROM rx_test WHERE REGEXP_LIKE(val, '^[[:lower:]]');

-- contains a non-digit
SELECT val FROM rx_test WHERE REGEXP_LIKE(val, '[^[:digit:]]');

-- begins with a non-digit
SELECT val FROM rx_test WHERE REGEXP_LIKE(val, '^[^[:digit:]]');

11.2 REGEXP_INSTR

-- position of a digit at end of string
SELECT REGEXP_INSTR(val, '[[:digit:]]$') AS pos FROM rx_test;

-- position where trailing run of digits begins
SELECT REGEXP_INSTR(val, '[[:digit:]]+$') AS pos FROM rx_test;

-- where does "$" followed by digits occur
SELECT REGEXP_INSTR('The price is $400.', '\$[[:digit:]]+') AS pos FROM dual;

-- first non-lowercase
SELECT REGEXP_INSTR('onetwothree', '[^[:lower:]]') AS pos FROM dual;

-- first non-comma (none: returns 0)
SELECT REGEXP_INSTR(',,,,,', '[^,]') AS pos FROM dual;

-- first run of non-commas (same as above with a quantifier)
SELECT REGEXP_INSTR(',,,,,', '[^,]+') AS pos FROM dual;

Subexpression index usage:

-- return position of submatch for capturing group 1
SELECT REGEXP_INSTR('cab', 'c(a)b', 1, 1, 0, NULL, 1) AS grp1_pos FROM dual;

11.3 REGEXP_SUBSTR

-- first lowercase run
SELECT REGEXP_SUBSTR(val, '[a-z]+') AS fragment FROM rx_test;

-- first digit run
SELECT REGEXP_SUBSTR(val, '[0-9]+') AS digits FROM rx_test;

-- from start through first 'b'
SELECT REGEXP_SUBSTR('aababcde', '^a.*?b') AS slice FROM dual;

11.4 REGEXP_REPLACE

-- collapse 2+ spaces to a single space
SELECT REGEXP_REPLACE('Joe  Smith', '( ){2,}', ' ') AS normalized FROM dual;

-- reorder three whitespace-separated tokens: one two three -> three, two, one
SELECT REGEXP_REPLACE('aa bb cc', '([^ ]+) ([^ ]+) (.+)', '\3, \2, \1') AS reordered FROM dual;

Case-insensitive filtering:

CREATE TABLE rx_case (id CHAR(1), txt VARCHAR2(20));
INSERT INTO rx_case VALUES ('A', 'AAAAA');
INSERT INTO rx_case VALUES ('a', 'aaaaa');
INSERT INTO rx_case VALUES ('B', 'BBBBB');
INSERT INTO rx_case VALUES ('b', 'bbbbb');

SELECT * FROM rx_case WHERE REGEXP_LIKE(id, 'b', 'i');

12. Function Library Overview and Reference

12.1 REGEXP_COUNT

Syntax: REGEXP_COUNT(source, pattern[, position[, match_parameter]])

-- count occurrences of 'e' (case-insensitive)
SELECT REGEXP_COUNT(testcol, 'e', 1, 'i') AS result FROM (
  SELECT 'Stephen' AS testcol FROM dual UNION ALL
  SELECT 'Steven' FROM dual
);

-- count occurrences of "2a" ignoring case
SELECT REGEXP_COUNT(testcol, '2a', 1, 'i') AS result
FROM (
  SELECT '12a45' testcol FROM dual UNION ALL
  SELECT '222aa' FROM dual
);

12.2 REGEXP_INSTR examples

-- case-insensitive: find 'o' followed by 3 letters, first occurrence; return start
SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 1, 0, 'i') AS pos FROM dual;

-- same, return the position after the match
SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 1, 1, 'i') AS pos FROM dual;

-- second occurrence, start position
SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 2, 0, 'i') AS pos FROM dual;

-- second occurrence, end position
SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 2, 1, 'i') AS pos FROM dual;

-- locate try/trying/tried/tries
SELECT REGEXP_INSTR('We are trying to make the subject easier.', 'tr(y(ing)?|ied|ies)') AS pos FROM dual;

-- subexpression return: position of 'b' in 'ab'
SELECT REGEXP_INSTR('1ab2', 'a(b)', 1, 1, 0, NULL, 1) AS pos FROM dual;

12.3 REGEXP_LIKE examples

-- alphanumeric presence and lengths
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:alnum:]]');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:alnum:]]{3}');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:alnum:]]{5}');

-- alphabetic only
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '^[[:alpha:]]+$');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '^[[:alpha:]]{3}$');

-- control characters
INSERT INTO test VALUES ('xy' || CHR(10) || 'z');
COMMIT;
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:cntrl:]]');

-- digits
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:digit:]]');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '^[[:digit:]]{5}$');

-- lowercase, uppercase
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:lower:]]{2,}');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:upper:]]{2,}');

-- printable length
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '^[[:print:]]{6}$');

-- punctuation
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:punct:]]');

-- whitespace occurrences
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:space:]]{2,}');

-- starts with 'ab*' (a followed by one or more b's)
SELECT testcol FROM test WHERE REGEXP_LIKE(testcol, '^ab+');

-- 'a' is the third character
SELECT testcol FROM test WHERE REGEXP_LIKE(testcol, '^..a');

-- two consecutive 'a' or 'z' (case-insensitive)
SELECT testcol FROM test WHERE REGEXP_LIKE(testcol, '([az])\1', 'i');

-- begins with 'Ste', ends with 'en', contains 'v' or 'ph'
SELECT testcol FROM test WHERE REGEXP_LIKE(testcol, '^Ste(v|ph)en$');

-- use in a CHECK constraint
CREATE TABLE mytest (
  c1 VARCHAR2(20),
  CONSTRAINT chk_letters_only CHECK (REGEXP_LIKE(c1, '^[[:alpha:]]+$'))
);

-- SSN pattern
CREATE TABLE ssn_test (ssn_col VARCHAR2(20));
INSERT INTO ssn_test VALUES ('111-22-3333');
INSERT INTO ssn_test VALUES ('111=22-3333');
INSERT INTO ssn_test VALUES ('111-A2-3333');
INSERT INTO ssn_test VALUES ('111-22-33339');
INSERT INTO ssn_test VALUES ('111-2-23333');
INSERT INTO ssn_test VALUES ('987-65-4321');
COMMIT;
SELECT ssn_col FROM ssn_test
WHERE REGEXP_LIKE(ssn_col, '^[0-9]{3}-[0-9]{2}-[0-9]{4}$');

12.4 REGEXP_REPLACE examples

-- format XXX.XXX.XXXX to (XXX) XXX-XXXX
WITH t AS (
  SELECT '111.222.3333' AS testcol FROM dual UNION ALL
  SELECT '222.333.4444' FROM dual
)
SELECT testcol,
       REGEXP_REPLACE(testcol,
                      '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})',
                      '(\1) \2-\3') AS reformatted
FROM t;

-- space after each character
SELECT testcol,
       REGEXP_REPLACE(testcol, '(.)', '\1 ' ) AS spaced
FROM (
  SELECT 'Steven' AS testcol FROM dual UNION ALL
  SELECT 'Stephen' FROM dual
)
WHERE testcol LIKE 'S%';

-- collapse multiple spaces into one
SELECT REGEXP_REPLACE('500    Oracle    Parkway,  Redwood    Shores, CA', '( ){2,}', ' ') AS result
FROM dual;

-- insert a space between lower->upper transition
SELECT REGEXP_REPLACE('GeorgeMcGovern', '([[:lower:]])([[:upper:]])', '\1 \2') AS fixed
FROM dual;

-- replace literal period
SELECT REGEXP_REPLACE('We are trying to make the subject easier.', '\.', ' for you.') AS demo
FROM dual;

-- trim trailing non-word chars and normalize
CREATE TABLE t_clean (testcol VARCHAR2(10));
INSERT INTO t_clean VALUES ('1');
INSERT INTO t_clean VALUES ('2    ');
INSERT INTO t_clean VALUES ('3 new  ');

SELECT LENGTH(testcol) AS len,
       testcol AS origval,
       REGEXP_REPLACE(testcol, '\W+$', ' ') AS newval,
       LENGTH(REGEXP_REPLACE(testcol, '\W+$', ' ')) AS newlen
FROM t_clean;

12.5 REGEXP_SUBSTR examples

-- comma, then non-commas, then comma
SELECT REGEXP_SUBSTR('500 Oracle Parkway, Redwood Shores, CA', ',[^,]+,') AS result
FROM dual;

-- extract URL-like token
SELECT REGEXP_SUBSTR(
  'Go to http://www.oracle.com/products and click on database',
  'http://([[:alnum:]]+\.?){3,4}/?'
) AS result
FROM dual;

-- extract try/trying/tried/tries
SELECT REGEXP_SUBSTR('We are trying to make the subject easier.', 'tr(y(ing)?|ied|ies)') AS token
FROM dual;

-- 3rd field using ':' as delimiter
SELECT REGEXP_SUBSTR('system/pwd@orabase:1521:sidval', '[^:]+', 1, 3) AS third
FROM dual;

-- vertical bar-delimited field
CREATE TABLE rx_split (txt VARCHAR2(50));
INSERT INTO rx_split VALUES ('One|Two|Three|Four|Five');
SELECT REGEXP_SUBSTR(txt, '[^|]+', 1, 3) AS third FROM rx_split;

-- equivalence class: base letter and its accents (within [ ])
SELECT REGEXP_SUBSTR('iSelfSchooling NOT ISelfSchooling', '[[=i=]]SelfSchooling') AS eqv
FROM dual;

Parsing demo with REGEXP_SUBSTR:

SET SERVEROUTPUT ON;
DECLARE
  first_part VARCHAR2(10);
  second_range VARCHAR2(10);
  s VARCHAR2(40) := '1:3,4:6,8:10,3:4,7:6,11:12';
BEGIN
  first_part   := REGEXP_SUBSTR(s, '[^:]+', 1, 1);
  second_range := REGEXP_SUBSTR(s, '[^,]+', 3, 1);
  DBMS_OUTPUT.PUT_LINE(first_part || ' ' || second_range);
END;
/

13. Backreferences in Oracle

  • Use \1, \2, ... in the pattern or replacement to refer to capturing groups.
  • Example: reorder "alpha beta gamma" to "gamma, beta, alpha"
SELECT REGEXP_REPLACE('alpha beta gamma', '(\S+) (\S+) (\S+)', '\3, \2, \1') AS out
FROM dual;

14. Special Constructs

  • Start/end anchors: ^, $
  • Grouping: (...), non-capturing (?:...)
  • Alternation: |
  • Ranges and sets: [ ], [^ ]
  • Equivalence classes inside sets: [[=a=]] (matches base lettter "a" and its accented variants)

Oracle implements POSIX bracket expressions, named and numbered capturing, lookahead/lookbehind, and backreferences. Atomic groups and conditional subptaterns are not implemented.

Tags: sqloracle

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.