Logstash Grok and Regular Expressions with Practical Examples
Regular expression primitives relevant to Grok
Control and whitespace escapes
- \cX: Control character for letter X (A–Z). Example: \cM is carriage return.
- \f: Form feed (0x0C).
- \n: Newline (0x0A).
- \r: Carriage return (0x0D).
- \t: Horizontal tab (0x09).
- \v: Vertical tab (0x0B).
- \s: Any whitespace (space, tab, form feed, etc.).
- \S: Any non-whitespace.
Anchors and boundaries
- ^: Start of string (or start of line when multiline is enabled).
- $: End of string (or end of line when multiline is enabled).
- \b: Word boundary.
- \B: Non-word boundary.
Grouping, alternation, and character classes
- (...): Capturing group; use for subexpressions.
- [...]: Character class; e.g., [A-F0-9].
- |: Alternation (logical OR) between alternatives.
- \: Escape the next character to treat it literally or to introduce an escape sequence.
- .: Any character except newline (unless dotall is enabled).
Quantifiers
- *: 0 or more repetitions of the preceding token (same as {0,}).
- +: 1 or more repetitions (same as {1,}).
- ?: 0 or 1 repetition (same as {0,1}); appended to another quantifier it makes that quantifier non-greedy, as in *?, +?, or ??.
- {n}: Exactly n repetitions.
- {n,}: At least n repetitions.
- {n,m}: Between n and m repetitions (inclusive).
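The primitives above are easy to sanity-check in Ruby, whose regex engine (Oniguruma/Onigmo) belongs to the same family Grok patterns compile to. A minimal sketch exercising character classes, anchors, capturing groups, and greedy vs. non-greedy quantifiers:

```ruby
# Character class + "one or more" + anchors.
hex = /^[A-F0-9]+$/
raise unless "DEADBEEF" =~ hex
raise if "xyz" =~ hex

# Capturing group with a bounded {n,m} quantifier.
pid = /\[(\d{1,5})\]/
m = "sshd[2043]: Accepted".match(pid)
raise unless m[1] == "2043"

# Greedy vs. non-greedy: *? stops at the first closing bracket.
raise unless "[a][b]".match(/\[.*\]/)[0]  == "[a][b]"
raise unless "[a][b]".match(/\[.*?\]/)[0] == "[a]"
```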
Built-in Grok patterns and customization
Grok relies on a library of reusable patterns (e.g., IPV4, WORD, TIMESTAMP_ISO8601) distributed with logstash-patterns-core. You can extend it in either of two ways:
- Place custom pattern files in a directory referenced by patterns_dir.
- Define inline patterns with pattern_definitions.
Example of an inline pattern that accepts an IPv4 or a literal dash:
filter {
  grok {
    pattern_definitions => {
      "IPV4_OR_DASH" => "(?:%{IPV4}|-)"
    }
    match => { "message" => "^client=%{IPV4_OR_DASH:client} status=%{NUMBER:code:int}$" }
  }
}
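The alternation behind IPV4_OR_DASH can be checked outside Logstash with a simplified stand-in regex (the real %{IPV4} pattern in logstash-patterns-core is stricter than the \d{1,3} groups used here, which are for illustration only):

```ruby
# Simplified stand-in for (?:%{IPV4}|-): either four dotted number
# groups or a literal dash, anchored to the whole string.
ipv4_or_dash = /\A(?:(?:\d{1,3}\.){3}\d{1,3}|-)\z/

raise unless "172.16.11.239".match?(ipv4_or_dash)
raise unless "-".match?(ipv4_or_dash)
raise if "not-an-ip".match?(ipv4_or_dash)
```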
Example 1: Parse OpenSSH authentication events
Sample log line:
Jan 12 12:00:08 localhost sshd[2043]: Accepted password for root from 172.16.11.239 port 51763 ssh2
Filter configuration:
filter {
  grok {
    match => {
      "message" => [
        "^%{SYSLOGTIMESTAMP:sys_ts} %{HOSTNAME:sys_host} sshd\[%{NUMBER:ssh_pid}\]: %{WORD:auth_action} %{WORD:auth_method} for %{USERNAME:ssh_user} from %{IP:src_ip} port %{NUMBER:src_port:int} ssh2$"
      ]
    }
    tag_on_failure => ["_grok_failure_sshd"]
  }
  mutate {
    rename => { "auth_action" => "status" }
    remove_field => ["message"]
  }
}
Illustrative event fields after parsing:
{
  "sys_ts": "Jan 12 12:00:08",
  "sys_host": "localhost",
  "ssh_pid": "2043",
  "status": "Accepted",
  "auth_method": "password",
  "ssh_user": "root",
  "src_ip": "172.16.11.239",
  "src_port": 51763
}
Notes:
- Anchoring with ^ and $ reduces false positives.
- Prefer Grok patterns (IP, NUMBER, USERNAME) over ad‑hoc regex where possible for readability.
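Under the hood, the Grok expression expands to a single regex with named captures. A hand-written Ruby equivalent (with deliberately simplified sub-patterns, for illustration only) parses the same sample line:

```ruby
line = "Jan 12 12:00:08 localhost sshd[2043]: Accepted password " \
       "for root from 172.16.11.239 port 51763 ssh2"

# Named captures mirror the %{PATTERN:field} names; /x allows the
# pattern to be split across lines for readability.
re = /^(?<sys_ts>\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s
      (?<sys_host>\S+)\ssshd\[(?<ssh_pid>\d+)\]:\s
      (?<auth_action>\w+)\s(?<auth_method>\w+)\sfor\s
      (?<ssh_user>\S+)\sfrom\s(?<src_ip>[\d.]+)\sport\s
      (?<src_port>\d+)\sssh2$/x

m = re.match(line)
raise unless m[:ssh_user] == "root"
raise unless m[:src_ip]   == "172.16.11.239"
raise unless m[:src_port] == "51763"  # captures are strings; Grok's
                                      # :int suffix does the conversion
```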
Example 2: Extract fields from an IIS‑like HTTP access line
Sample line:
2016-11-30 06:33:33 192.168.5.116 GET /Hotel/HotelDisplay/cncqcqb230 - 80 - 192.168.9.2 Mozilla/5.0+(Macintosh;+U;+Intel+Mac+OS+X+10.9;+en-US;+rv:1.9pre)+Gecko - 200 0 0 45
Filter configuration:
filter {
  grok {
    match => {
      "message" => [
        "^%{TIMESTAMP_ISO8601:when}\s+%{IPORHOST:client_ip}\s+%{WORD:method}\s+%{URIPATHPARAM:uri_path}\s+-\s+%{NUMBER:dest_port:int}\s+-\s+%{IPORHOST:server_ip}\s+%{GREEDYDATA:user_agent}\s+-\s+%{NUMBER:status:int}\s+%{NUMBER:substatus:int}\s+%{NUMBER:win32_status:int}\s+%{NUMBER:time_taken:int}$"
      ]
    }
  }
  date {
    match => ["when", "yyyy-MM-dd HH:mm:ss", "ISO8601"]
    target => "@timestamp"
  }
  mutate {
    remove_field => ["message"]
  }
}
Illustrative event fields after parsing:
{
  "@timestamp": "2016-11-29T22:33:33.000Z",
  "when": "2016-11-30 06:33:33",
  "client_ip": "192.168.5.116",
  "method": "GET",
  "uri_path": "/Hotel/HotelDisplay/cncqcqb230",
  "dest_port": 80,
  "server_ip": "192.168.9.2",
  "user_agent": "Mozilla/5.0+(Macintosh;+U;+Intel+Mac+OS+X+10.9;+en-US;+rv:1.9pre)+Gecko",
  "status": 200,
  "substatus": 0,
  "win32_status": 0,
  "time_taken": 45
}
Notes:
- %{URIPATHPARAM} captures both the path and the query string when present.
- If a dash can appear in IP/port positions, consider a custom pattern like IPV4_OR_DASH.
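It may look risky to use %{GREEDYDATA} in the middle of the pattern for the user agent, but the anchored numeric fields after it force the engine to backtrack to the right split point. A reduced Ruby version of the tail of the IIS pattern demonstrates this:

```ruby
tail = "Mozilla/5.0+(Macintosh;+U)+Gecko - 200 0 0 45"

# .* plays the role of %{GREEDYDATA}: it grabs everything first, then
# backtracks until " - <status> <substatus> <win32> <time>" matches.
re = /^(?<user_agent>.*)\s-\s(?<status>\d+)\s(?<substatus>\d+)\s
      (?<win32_status>\d+)\s(?<time_taken>\d+)$/x

m = re.match(tail)
raise unless m[:user_agent] == "Mozilla/5.0+(Macintosh;+U)+Gecko"
raise unless m[:status]     == "200"
raise unless m[:time_taken] == "45"
```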
Example 3: Java application logs with millisecond timestamps
Sample line:
2017-08-25 08:52:58.123 INFO c.p.modules.push.service.impl.PushServiceImpl - 11 Producer:%7B%22applicationId%22%3A%2257d3b7e60c53a62bb60f2aa4%22%2C%22customFilds%22%3A%7B%22url%22%3A%22wemeeting%3A%2F%2Fwemeeting.im.bbdtek.com
Filter configuration:
filter {
  grok {
    match => {
      "message" => [
        "^%{TIMESTAMP_ISO8601:log_time}\s+%{LOGLEVEL:level}\s+%{JAVACLASS:logger}\s+-\s+%{GREEDYDATA:payload}$"
      ]
    }
  }
  date {
    match => ["log_time", "yyyy-MM-dd HH:mm:ss.SSS", "ISO8601"]
    target => "@timestamp"
  }
  mutate {
    remove_field => ["message"]
  }
}
Illustrative event fields after parsing:
{
  "@timestamp": "2017-08-25T08:52:58.123Z",
  "log_time": "2017-08-25 08:52:58.123",
  "level": "INFO",
  "logger": "c.p.modules.push.service.impl.PushServiceImpl",
  "payload": "11 Producer:%7B%22applicationId%22%3A%2257d3b7e60c53a62bb60f2aa4%22%2C%22customFilds%22%3A%7B%22url%22%3A%22wemeeting%3A%2F%2Fwemeeting.im.bbdtek.com"
}
Notes:
- %{JAVACLASS} matches fully qualified class names.
- Use %{GREEDYDATA} at the end of the pattern to capture the remaining message; constrain earlier tokens to avoid accidental over-capture.
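The same structure can be reproduced with a reduced Ruby regex (the sub-patterns for the timestamp, level, and class name are simplified approximations of the real Grok library patterns, and the shortened payload string is illustrative only):

```ruby
line = "2017-08-25 08:52:58.123 INFO " \
       "c.p.modules.push.service.impl.PushServiceImpl - 11 Producer:data"

# Millisecond ISO-like timestamp, a LOGLEVEL-style alternation, a
# dotted class name, then GREEDYDATA-style .* for the payload.
re = /^(?<log_time>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3})\s+
      (?<level>TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\s+
      (?<logger>[\w.]+)\s+-\s+(?<payload>.*)$/x

m = re.match(line)
raise unless m[:log_time] == "2017-08-25 08:52:58.123"
raise unless m[:level]    == "INFO"
raise unless m[:payload]  == "11 Producer:data"
```

Because %{GREEDYDATA} is the final token here, no backtracking is needed: everything after the literal " - " separator lands in payload.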