Logstash Grok and Regular Expressions with Practical Examples

Regular expression primitives relevant to Grok

  • Control and whitespace escapes

    • \cX: Control character for letter X (A–Z). Example: \cM is carriage return.
    • \f: Form feed (0x0C).
    • \n: Newline (0x0A).
    • \r: Carriage return (0x0D).
    • \t: Horizontal tab (0x09).
    • \v: Vertical tab (0x0B).
    • \s: Any whitespace (space, tab, form feed, etc.).
    • \S: Any non-whitespace.
  • Anchors and boundaries

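    • ^: Start of line. In the Oniguruma (Ruby-style) syntax that Grok uses, ^ matches at the start of every line; \A anchors the start of the whole string.
    • $: End of line in the same syntax; \z anchors the end of the whole string.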
    • \b: Word boundary.
    • \B: Non-word boundary.
  • Grouping, alternation, and character classes

    • (...): Capturing group for subexpressions; (?:...) groups without capturing.
    • [...]: Character class; e.g., [A-F0-9].
    • |: Alternation (logical OR) between alternatives.
    • \: Escape the next character to treat it literally or to introduce an escape sequence.
    • .: Any character except newline (unless dotall mode is enabled, written (?m) in Oniguruma's Ruby-style syntax).
  • Quantifiers

    • *: 0 or more repetitions of the preceding token (same as {0,}).
    • +: 1 or more repetitions (same as {1,}).
    • ?: 0 or 1 repetition (same as {0,1}); appended to another quantifier, as in *? or +?, it makes that quantifier non-greedy.
    • {n}: Exactly n repetitions.
    • {n,}: At least n repetitions.
    • {n,m}: Between n and m repetitions (inclusive).
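
As a sketch of how these primitives combine inside a Grok match, the filter below mixes raw regular expression syntax with the (?<name>...) named-capture form that Oniguruma supports. The key=value log layout and the field names queue_id and result are assumptions made up for illustration:

```
filter {
  grok {
    # ^ anchors the start of the line.
    # [0-9A-F]{10,11} is a character class with a bounded quantifier.
    # \s+ is one or more whitespace characters.
    # sent|deferred|bounced is an alternation inside a named group.
    # \b requires a word boundary after the result value.
    match => {
      "message" => "^queue=(?<queue_id>[0-9A-F]{10,11})\s+result=(?<result>sent|deferred|bounced)\b"
    }
  }
}
```

Under these assumptions, a line such as queue=BEF25A72965 result=sent would yield the fields queue_id and result.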

Built-in Grok patterns and customization

Grok relies on a library of reusable patterns (e.g., IPV4, WORD, TIMESTAMP_ISO8601) distributed with logstash-patterns-core. You can extend this library in either of two ways:

  • Placing custom pattern files in a directory referenced by patterns_dir.
  • Defining inline patterns with pattern_definitions.

Example of an inline pattern that accepts an IPv4 or a literal dash:

filter {
  grok {
    pattern_definitions => {
      "IPV4_OR_DASH" => "(?:%{IPV4}|-)"
    }
    match => { "message" => "^client=%{IPV4_OR_DASH:client} status=%{NUMBER:code:int}$" }
  }
}
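
The same pattern could instead live in a pattern file loaded through patterns_dir. The sketch below assumes a hypothetical directory ./patterns containing a file named custom; both names are placeholders, and the file contents are shown as comments:

```
# Contents of ./patterns/custom (hypothetical file), one "NAME regex" pair per line:
#   IPV4_OR_DASH (?:%{IPV4}|-)

filter {
  grok {
    # Every file in the listed directories is read as a pattern file.
    patterns_dir => ["./patterns"]
    match => { "message" => "^client=%{IPV4_OR_DASH:client} status=%{NUMBER:code:int}$" }
  }
}
```

A pattern file tends to suit patterns shared across several pipelines, while pattern_definitions keeps a one-off pattern next to the match that uses it.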

Example 1: Parse OpenSSH authentication events

Sample log line:

Jan 12 12:00:08 localhost sshd[2043]: Accepted password for root from 172.16.11.239 port 51763 ssh2

Filter configuration:

filter {
  grok {
    match => {
      "message" => [
        "^%{SYSLOGTIMESTAMP:sys_ts} %{HOSTNAME:sys_host} sshd\[%{NUMBER:ssh_pid}\]: %{WORD:auth_action} %{WORD:auth_method} for %{USERNAME:ssh_user} from %{IP:src_ip} port %{NUMBER:src_port:int} ssh2$"
      ]
    }
    tag_on_failure => ["_grok_failure_sshd"]
  }
  mutate {
    rename => { "auth_action" => "status" }
    remove_field => ["message"]
  }
}

Illustrative event fields after parsing:

{
  "sys_ts": "Jan 12 12:00:08",
  "sys_host": "localhost",
  "ssh_pid": "2043",
  "status": "Accepted",
  "auth_method": "password",
  "ssh_user": "root",
  "src_ip": "172.16.11.239",
  "src_port": 51763
}

Notes:

  • Anchoring with ^ and $ reduces false positives and lets the regex engine reject non-matching lines quickly.
  • Prefer Grok patterns (IP, NUMBER, USERNAME) over ad‑hoc regex where possible for readability.
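
One caveat with the configuration above: the mutate block runs even when grok fails, so the raw line is removed from events that were never parsed. A minimal sketch that keeps message on unparsed events by checking for the custom failure tag (the grok block itself is unchanged and only indicated by a comment):

```
filter {
  # ... grok block from the example above goes here ...

  # Only rename and drop the raw line when grok did not tag the event as failed.
  if "_grok_failure_sshd" not in [tags] {
    mutate {
      rename => { "auth_action" => "status" }
      remove_field => ["message"]
    }
  }
}
```

Events that carry the tag keep their original message, which makes it easier to see why the pattern did not match.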

Example 2: Extract fields from an IIS‑like HTTP access line

Sample line:

2016-11-30 06:33:33 192.168.5.116 GET /Hotel/HotelDisplay/cncqcqb230 - 80 - 192.168.9.2 Mozilla/5.0+(Macintosh;+U;+Intel+Mac+OS+X+10.9;+en-US;+rv:1.9pre)+Gecko - 200 0 0 45

Filter configuration:

filter {
  grok {
    match => {
      "message" => [
        "^%{TIMESTAMP_ISO8601:when}\s+%{IPORHOST:client_ip}\s+%{WORD:method}\s+%{URIPATHPARAM:uri_path}\s+-\s+%{NUMBER:dest_port:int}\s+-\s+%{IPORHOST:server_ip}\s+%{GREEDYDATA:user_agent}\s+-\s+%{NUMBER:status:int}\s+%{NUMBER:substatus:int}\s+%{NUMBER:win32_status:int}\s+%{NUMBER:time_taken:int}$"
      ]
    }
  }
  date {
    match => ["when", "yyyy-MM-dd HH:mm:ss", "ISO8601"]
    target => "@timestamp"
  }
  mutate {
    remove_field => ["message"]
  }
}

Illustrative event fields after parsing (the @timestamp shown assumes the Logstash host runs at UTC+8):

{
  "@timestamp": "2016-11-29T22:33:33.000Z",
  "when": "2016-11-30 06:33:33",
  "client_ip": "192.168.5.116",
  "method": "GET",
  "uri_path": "/Hotel/HotelDisplay/cncqcqb230",
  "dest_port": 80,
  "server_ip": "192.168.9.2",
  "user_agent": "Mozilla/5.0+(Macintosh;+U;+Intel+Mac+OS+X+10.9;+en-US;+rv:1.9pre)+Gecko",
  "status": 200,
  "substatus": 0,
  "win32_status": 0,
  "time_taken": 45
}
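
The eight-hour gap between when and @timestamp above arises because the matched text carries no zone offset, so the date filter interprets it in the Logstash host's local zone. If the log is actually written in UTC, which is an assumption to verify against the server's logging settings, the timezone option can pin the interpretation:

```
filter {
  date {
    match => ["when", "yyyy-MM-dd HH:mm:ss", "ISO8601"]
    # Assumption: the source writes timestamps in UTC.
    timezone => "UTC"
    target => "@timestamp"
  }
}
```

With this setting the sample line would produce an @timestamp of 2016-11-30T06:33:33.000Z regardless of where Logstash runs.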

Notes:

  • %{URIPATHPARAM} captures both path and query when present.
  • If a dash can appear in IP/port positions, consider a custom pattern like IPV4_OR_DASH.
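
As a sketch of the second note, the filter below swaps the server_ip token for the IPV4_OR_DASH pattern defined inline. It assumes that position only ever holds an IPv4 address or a dash (the original IPORHOST also accepts hostnames), and it drops the placeholder afterwards:

```
filter {
  grok {
    pattern_definitions => {
      "IPV4_OR_DASH" => "(?:%{IPV4}|-)"
    }
    match => {
      "message" => [
        "^%{TIMESTAMP_ISO8601:when}\s+%{IPORHOST:client_ip}\s+%{WORD:method}\s+%{URIPATHPARAM:uri_path}\s+-\s+%{NUMBER:dest_port:int}\s+-\s+%{IPV4_OR_DASH:server_ip}\s+%{GREEDYDATA:user_agent}\s+-\s+%{NUMBER:status:int}\s+%{NUMBER:substatus:int}\s+%{NUMBER:win32_status:int}\s+%{NUMBER:time_taken:int}$"
      ]
    }
  }
  # Remove the placeholder so downstream consumers do not treat "-" as an address.
  if [server_ip] == "-" {
    mutate { remove_field => ["server_ip"] }
  }
}
```

The same approach could be applied to other positions, but a dash in a numeric position should not be combined with the :int conversion.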

Example 3: Java application logs with millisecond timestamps

Sample line:

2017-08-25 08:52:58.123 INFO c.p.modules.push.service.impl.PushServiceImpl - 11 Producer:%7B%22applicationId%22%3A%2257d3b7e60c53a62bb60f2aa4%22%2C%22customFilds%22%3A%7B%22url%22%3A%22wemeeting%3A%2F%2Fwemeeting.im.bbdtek.com

Filter configuration:

filter {
  grok {
    match => {
      "message" => [
        "^%{TIMESTAMP_ISO8601:log_time}\s+%{LOGLEVEL:level}\s+%{JAVACLASS:logger}\s+-\s+%{GREEDYDATA:payload}$"
      ]
    }
  }
  date {
    match => ["log_time", "yyyy-MM-dd HH:mm:ss.SSS", "ISO8601"]
    target => "@timestamp"
  }
  mutate {
    remove_field => ["message"]
  }
}

Illustrative event fields after parsing (the @timestamp shown assumes the Logstash host runs in UTC):

{
  "@timestamp": "2017-08-25T08:52:58.123Z",
  "log_time": "2017-08-25 08:52:58.123",
  "level": "INFO",
  "logger": "c.p.modules.push.service.impl.PushServiceImpl",
  "payload": "11 Producer:%7B%22applicationId%22%3A%2257d3b7e60c53a62bb60f2aa4%22%2C%22customFilds%22%3A%7B%22url%22%3A%22wemeeting%3A%2F%2Fwemeeting.im.bbdtek.com"
}

Notes:

  • %{JAVACLASS} matches dot-separated Java class names, including loggers with abbreviated package prefixes such as c.p.modules in the sample.
  • Use %{GREEDYDATA} at the end of the pattern to capture the remaining message; constrain earlier tokens to avoid accidental over-capture.
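
The payload captured in this sample is percent-encoded. If a readable value is wanted, one option is the urldecode filter; whether that plugin is bundled with a given Logstash distribution is worth checking. A minimal sketch, placed after the grok block, that decodes the captured field in place:

```
filter {
  urldecode {
    # Decode only the field captured by GREEDYDATA above.
    field => "payload"
  }
}
```

For the sample line, the decoded payload would begin with 11 Producer:{"applicationId":"57d3b7e60c53a62bb60f2aa4", and continue with the rest of the decoded text.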
