Hands-On Guide to Filebeat Outputs, Logstash Pipelines, and Filter Plugins

Tech May 19 1

Multi-Line Aggregation with Filebeat

An alternative approach to merging log lines relies on a predefined line count. The configuration below directs Filebeat to combine every three consecutive lines into a single event.

# config/multiline-count-console.yaml
filebeat.inputs:
- type: log
  paths:
    - /tmp/oldboyedu-linux85/linux85.log
  multiline:
    type: count
    count_lines: 3

output.console:
  pretty: true
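
The count-based grouping can be pictured with a minimal Python sketch. It is only an illustration of the idea, not Filebeat's implementation; the function name aggregate_by_count is hypothetical, and emitting a trailing partial group immediately is a simplifying assumption.

```python
from typing import Iterable, Iterator, List


def aggregate_by_count(lines: Iterable[str], count_lines: int) -> Iterator[str]:
    """Join every `count_lines` consecutive lines into one event."""
    if count_lines < 1:
        raise ValueError("count_lines must be at least 1")
    buffer: List[str] = []
    for line in lines:
        buffer.append(line.rstrip("\n"))
        if len(buffer) == count_lines:
            # A full group is complete: emit it as a single event.
            yield "\n".join(buffer)
            buffer = []
    # Assumption: leftover lines are emitted as a shorter final event.
    if buffer:
        yield "\n".join(buffer)


events = list(aggregate_by_count(["a", "b", "c", "d", "e", "f", "g"], 3))
for event in events:
    print(repr(event))
```

With seven input lines and a count of three, the sketch yields two full events and one shorter trailing event.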

Collecting Container Logs with Filebeat

Deploying Docker

Start by obtaining and extracting the Docker packages, then install them locally.

wget http://192.168.15.253/ElasticStack/day05-/softwares/oldboyedu-docker-ce-23_0_1.tar.gz
tar xf oldboyedu-docker-ce-23_0_1.tar.gz
yum -y localinstall oldboyedu-docker-ce-23_0_1/*.rpm

Configuring Registry Mirrors

Set registry mirrors to improve pull speed by editing the Docker daemon configuration, typically /etc/docker/daemon.json.

{
  "data-root": "/var/lib/docker",
  "registry-mirrors": [
    "https://tuv7rqqq.mirror.aliyuncs.com",
    "https://hub-mirror.c.163.com/",
    "https://docker.mirrors.ustc.edu.cn",
    "https://reg-mirror.qiniu.com"
  ]
}

After saving the file, enable and start the Docker service:

systemctl enable --now docker

Launching Sample Containers

Two containers serve as log sources: an Nginx instance and a Tomcat instance.

docker run -dp 88:80 --name mynginx --restart always nginx:1.22.1-alpine
docker run -dp 89:8080 --name mytomcat --restart always tomcat:jre8-alpine

Input Types: docker vs. container

The docker input type collects logs directly from Docker containers, and a wildcard ID targets all of them. Newer Filebeat releases deprecate this input in favor of container.

# config/docker-input-console.yaml
filebeat.inputs:
- type: docker
  containers.ids:
    - '*'

output.console:
  pretty: true

Alternatively, tap the underlying container log files on disk with the container input type, sending the records to Elasticsearch instead of stdout.

# config/container-input-es.yaml
filebeat.inputs:
- type: container
  paths:
    - '/var/lib/docker/containers/*/*.log'

output.elasticsearch:
  hosts:
    - "http://10.0.0.101:9200"
    - "http://10.0.0.102:9200"
    - "http://10.0.0.103:9200"
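
To see what the container input has to unpack, here is a rough Python sketch. It assumes Docker's default json-file logging driver, where each line on disk is a JSON object with log, stream, and time keys. The function name parse_docker_log_line and the sample line are made up for illustration.

```python
import json
from typing import Dict


def parse_docker_log_line(raw: str) -> Dict[str, str]:
    """Unwrap one line written by Docker's json-file logging driver."""
    record = json.loads(raw)
    return {
        # The application output, minus the trailing newline Docker stores.
        "message": record["log"].rstrip("\n"),
        # Either "stdout" or "stderr".
        "stream": record["stream"],
        # Timestamp recorded by Docker when the line was written.
        "timestamp": record["time"],
    }


# Illustrative sample line, not taken from a real container.
sample = '{"log":"GET / HTTP/1.1\\n","stream":"stdout","time":"2023-04-06T08:17:43.000000000Z"}'
parsed = parse_docker_log_line(sample)
print(parsed)
```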

Exploring the filestream Input

From Filebeat 7.16 onwards, the log input type is deprecated in favor of filestream, which introduces integrated parsers for reading files and transforming their contents.

Basic and JSON Parsing

The ndjson parser decodes JSON streams line by line, optionally capturing decoding errors and nesting the decoded fields under a custom target.

# config/filestream-mixed-demo.yaml
filebeat.inputs:
- type: filestream
  enabled: false
  paths:
    - /tmp/oldboyedu-linux85/linux85.log

- type: filestream
  enabled: false
  paths:
    - /tmp/oldboyedu-linux85/docker.json
  parsers:
    - ndjson:
        add_error_key: true
        overwrite_keys: true
        target: oldboyedu-linux85

- type: filestream
  enabled: false
  paths:
    - /tmp/oldboyedu-linux85/linux85.log
  parsers:
    - multiline:
        type: count
        count_lines: 3

- type: filestream
  enabled: true
  paths:
    - /tmp/oldboyedu-linux85/demo.log
  parsers:
    - multiline:
        type: count
        count_lines: 4
    - ndjson:
        add_error_key: true
        overwrite_keys: true
        target: oldboyedu-linux85-demo

output.console:
  pretty: true
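
The effect of target and add_error_key can be approximated in a few lines of Python. This is a loose sketch of the behavior, not the parser's real code: the function decode_ndjson is hypothetical, and the shape of the error key (error.type and error.message) is stated here as an assumption.

```python
import json
from typing import Any, Dict


def decode_ndjson(line: str, target: str = "", add_error_key: bool = True) -> Dict[str, Any]:
    """Decode one JSON line, nesting the result under `target` if given."""
    event: Dict[str, Any] = {"message": line}
    try:
        decoded = json.loads(line)
        if not isinstance(decoded, dict):
            raise ValueError("top-level JSON value is not an object")
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        if add_error_key:
            # Assumed layout of the error key for this sketch.
            event["error"] = {"type": "json", "message": str(exc)}
        return event
    if target:
        # Keep decoded fields in their own namespace.
        event[target] = decoded
    else:
        # No target: merge at the root, overwriting clashing keys.
        event.update(decoded)
    return event


ok = decode_ndjson('{"status": "200"}', target="oldboyedu-linux85")
bad = decode_ndjson("not json")
print(ok)
print(bad)
```

Nesting under a target keeps decoded keys from colliding with fields that Filebeat itself adds to the event.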

Multi-Line JSON Practical Example

Combine a count-based multiline aggregator with the ndjson parser before sending the results directly to Elasticsearch.

# config/filestream-es-lab.yaml
filebeat.inputs:
- type: filestream
  enabled: true
  paths:
    - /tmp/oldboyedu-linux85/shopping.json
  parsers:
    - multiline:
        type: count
        count_lines: 7
    - ndjson:
        add_error_key: true
        overwrite_keys: true

output.elasticsearch:
  hosts:
    - "http://10.0.0.101:9200"
    - "http://10.0.0.102:9200"
    - "http://10.0.0.103:9200"
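
The parser order matters: lines are first grouped, then the joined text is decoded as one JSON document. The Python sketch below walks through that sequence. It assumes each record in the file is pretty-printed across exactly seven lines, and the record fields are invented for the example.

```python
import json
from typing import Any, Dict, List


def decode_fixed_height_records(lines: List[str], count_lines: int) -> List[Dict[str, Any]]:
    """Group lines into blocks of `count_lines`, then decode each block as JSON."""
    records = []
    # Only complete blocks are decoded; a trailing partial block is skipped.
    for start in range(0, len(lines) - count_lines + 1, count_lines):
        chunk = "\n".join(lines[start:start + count_lines])
        records.append(json.loads(chunk))
    return records


# One hypothetical record spread over seven lines.
one_record = [
    "{",
    '  "title": "keyboard",',
    '  "price": 199,',
    '  "brand": "example",',
    '  "group": "demo",',
    '  "province": "unknown"',
    "}",
]
records = decode_fixed_height_records(one_record * 2, 7)
print(records)
```

If the line count does not match the real record height, the joined chunk is no longer valid JSON, which is the situation add_error_key is meant to surface.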

Diverse Output Destinations

Local File Storage

Filebeat can persist events to the local filesystem instead of a remote service.

# config/stdin-to-file.yaml
filebeat.inputs:
- type: stdin

output.file:
  path: "/tmp/oldboyedu-linux85"
  filename: stdin.log

Indexing to Elasticsearch with Custom Settings

Output to Elasticsearch offers full control over index naming, ILM, shard counts, and replicas.

# config/filestream-es-custom.yaml
filebeat.inputs:
- type: filestream
  enabled: true
  paths:
    - /tmp/oldboyedu-linux85/shopping.json
  parsers:
    - multiline:
        type: count
        count_lines: 7
    - ndjson:
        add_error_key: true
        overwrite_keys: true

output.elasticsearch:
  hosts:
    - "http://10.0.0.101:9200"
    - "http://10.0.0.102:9200"
    - "http://10.0.0.103:9200"
  index: "oldboyedu-linux85-shopping-%{+yyyy.MM.dd}"

setup.ilm.enabled: false
setup.template.name: "oldboyedu-linux85-shopping"
setup.template.pattern: "oldboyedu-linux85-shopping-*"
setup.template.overwrite: true
setup.template.settings:
  index.number_of_shards: 8
  index.number_of_replicas: 0
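
As a rough illustration of how the date suffix in the index name expands, and why the template pattern has to match the result, consider this Python sketch. The helper render_daily_index is hypothetical, and formatting the event time in UTC is an assumption of the sketch.

```python
from datetime import datetime, timezone
from fnmatch import fnmatchcase


def render_daily_index(prefix: str, event_time: datetime) -> str:
    """Mimic an index name of the form '<prefix>-%{+yyyy.MM.dd}'."""
    # Assumption: the date part is derived from the event time in UTC.
    utc_time = event_time.astimezone(timezone.utc)
    return f"{prefix}-{utc_time:%Y.%m.%d}"


name = render_daily_index(
    "oldboyedu-linux85-shopping",
    datetime(2023, 4, 6, 16, 17, 43, tzinfo=timezone.utc),
)
print(name)
# The template only applies to indices whose name matches its pattern.
print(fnmatchcase(name, "oldboyedu-linux85-shopping-*"))
```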

Condition-Based Routing to Multiple Indices

Tag each input and use conditional indices to route data streams into separate Elasticsearch indices.

# config/filestream-multi-index.yaml
filebeat.inputs:
- type: filestream
  enabled: true
  tags: ["docker"]
  paths:
    - /tmp/oldboyedu-linux85/docker.json
  parsers:
    - ndjson:
        add_error_key: true

- type: filestream
  enabled: true
  tags: ["linux85"]
  paths:
    - /tmp/oldboyedu-linux85/linux85.log
  parsers:
    - multiline:
        type: count
        count_lines: 3

- type: filestream
  enabled: true
  tags: ["demo"]
  paths:
    - /tmp/oldboyedu-linux85/demo.log
  parsers:
    - multiline:
        type: count
        count_lines: 4
    - ndjson:
        add_error_key: true
        overwrite_keys: true
        target: oldboyedu-linux85-demo

output.elasticsearch:
  hosts:
    - "http://10.0.0.101:9200"
    - "http://10.0.0.102:9200"
    - "http://10.0.0.103:9200"
  indices:
    - index: "oldboyedu-jiaoshi07-docker-%{+yyyy.MM.dd}"
      when.contains:
        tags: "docker"
    - index: "oldboyedu-jiaoshi07-linux85-%{+yyyy.MM.dd}"
      when.contains:
        tags: "linux85"
    - index: "oldboyedu-jiaoshi07-demo-%{+yyyy.MM.dd}"
      when.contains:
        tags: "demo"

setup.ilm.enabled: false
setup.template.name: "oldboyedu-jiaoshi07"
setup.template.pattern: "oldboyedu-jiaoshi07-*"
setup.template.overwrite: true
setup.template.settings:
  index.number_of_shards: 3
  index.number_of_replicas: 0
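
The routing logic amounts to checking the rules in order and taking the first one that matches. A small Python sketch of that idea follows. The function pick_index and the fallback name filebeat-default are hypothetical, the date suffix is omitted, and exact tag equality is used as a simplification of the contains condition.

```python
from typing import List, Sequence, Tuple

# Rules are checked top to bottom, mirroring the order in the config.
INDEX_RULES: List[Tuple[str, str]] = [
    ("docker", "oldboyedu-jiaoshi07-docker"),
    ("linux85", "oldboyedu-jiaoshi07-linux85"),
    ("demo", "oldboyedu-jiaoshi07-demo"),
]


def pick_index(tags: Sequence[str], default: str = "filebeat-default") -> str:
    """Return the index prefix of the first rule whose tag is present."""
    for tag, index_prefix in INDEX_RULES:
        if tag in tags:
            return index_prefix
    # No rule matched: fall back to a default index (hypothetical name).
    return default


print(pick_index(["demo"]))
print(pick_index(["docker", "demo"]))
print(pick_index([]))
```

Because the first match wins, an event carrying several tags lands in the index of whichever rule is listed first.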

Logstash: Collection and Enrichment

Installing via RPM

Download and install the RPM package, then create a convenient symlink.

wget http://192.168.15.253/ElasticStack/day05-/softwares/logstash-7.17.5-x86_64.rpm
rpm -ivh logstash-7.17.5-x86_64.rpm
ln -svf /usr/share/logstash/bin/logstash /usr/local/sbin

Installing from Tarball

Alternatively, use a binary archive.

wget http://192.168.15.253/ElasticStack/day05-/softwares/logstash-7.17.5-linux-x86_64.tar.gz
tar xf logstash-7.17.5-linux-x86_64.tar.gz -C /oldboyedu/softwares/
ln -svf /oldboyedu/softwares/logstash-7.17.5/bin/logstash /usr/local/sbin/

Quick Command-Line Pipelines

Test a simple stdin-to-stdout pipeline directly from the shell.

logstash -e "input { stdin { } } output { stdout { codec => rubydebug } }"

First Configuration File

Write a basic pipeline definition and run it with the -f flag.

# config/stdin-stdout.conf
input {
  stdin { }
}

output {
  stdout { }
}

Run the pipeline:

logstash -f config/stdin-stdout.conf

Integrating Filebeat and Logstash

Configure Logstash to listen for Beats connections on a custom port, then instruct Filebeat to forward events there.

Logstash configuration:

# config/beats-in.conf
input {
  beats {
    port => 8888
  }
}

output {
  stdout { }
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "oldboyedu-linux85-logstash"
  }
}

Start Logstash with automatic configuration reloading (-r):

logstash -rf config/beats-in.conf

Filebeat configuration:

# config/nginx-to-logstash.yaml
filebeat.inputs:
- type: log
  paths:
    - /var/log/nginx/access.log*

output.logstash:
  hosts: ["10.0.0.101:8888"]

Start Filebeat with this configuration:

filebeat -e -c config/nginx-to-logstash.yaml

Enrichment with Filters

geoip IP Geolocation

Use a pre-parsed client IP field to append latitude, longitude, and country data while pruning noise fields.

# config/beats-geoip.conf
input {
  beats {
    port => 8888
  }
}

filter {
  geoip {
    source => "clientip"
    remove_field => [ "agent", "log", "input", "host", "ecs", "tags" ]
  }
}

output {
  stdout { }
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "oldboyedu-linux85-logstash"
  }
}

Adjust the Filebeat configuration to place decoded JSON keys at the root level so that clientip is available to geoip:

# config/nginx-json-to-logstash.yaml
filebeat.inputs:
- type: log
  paths:
    - /var/log/nginx/access.log
  json.keys_under_root: true
  json.add_error_key: true

output.logstash:
  hosts: ["10.0.0.101:8888"]

Sample log entry:

{"@timestamp":"2023-04-06T16:17:43+08:00","host":"10.0.0.103","clientip":"110.110.110.110","status":"200"}

Grok for Native Nginx Logs

When the logs use the standard combined format, apply the grok filter with the HTTPD_COMBINEDLOG pattern and then run geoip on the extracted clientip.

# config/beats-grok-geoip.conf
input {
  beats {
    port => 8888
  }
}

filter {
  grok {
    match => { "message" => "%{HTTPD_COMBINEDLOG}" }
    remove_field => [ "agent", "log", "input", "host", "ecs", "tags" ]
  }

  geoip {
    source => "clientip"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "oldboyedu-linux85-logstash-nginx"
  }
}

Filebeat configuration:

# plain log input; JSON decoding is not needed here
filebeat.inputs:
- type: log
  paths:
    - /tmp/oldboyedu-linux85/access.log

output.logstash:
  hosts: ["10.0.0.101:8888"]
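
To get a feel for what grok extracts, the Python sketch below uses a hand-written regular expression that roughly approximates the combined log format. It is not the HTTPD_COMBINEDLOG pattern itself, and the sample line is invented. The group names follow the legacy field names such as clientip and timestamp.

```python
import re
from typing import Dict, Optional

# Simplified approximation of the combined access log format.
COMBINED_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of a combined log line, or None if it does not match."""
    match = COMBINED_LOG.match(line)
    return match.groupdict() if match else None


# Illustrative log line.
sample = (
    '110.110.110.110 - - [22/Nov/2015:11:57:34 +0800] '
    '"GET /index.html HTTP/1.1" 200 612 "-" "curl/7.29.0"'
)
fields = parse_combined(sample)
print(fields)
```

The clientip field produced here is the one the geoip filter takes as its source.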

Fixing Timestamps with the Date Filter

When the log contains a human-readable timestamp such as 22/Nov/2015:11:57:34 +0800, use the date filter to parse it. By default the parsed value overwrites @timestamp; the target option below stores it in a custom field instead.

# config/beats-date-override.conf
input {
  beats {
    port => 8888
  }
}

filter {
  grok {
    match => { "message" => "%{HTTPD_COMBINEDLOG}" }
    remove_field => [ "agent", "log", "input", "host", "ecs", "tags" ]
  }

  geoip {
    source => "clientip"
  }

  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    timezone => "Asia/Shanghai"
    target => "oldboyedu-linux85-date"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "oldboyedu-linux85-logstash-nginx-date"
  }
}
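
The date pattern dd/MMM/yyyy:HH:mm:ss Z has a close equivalent in Python's strptime, which makes the conversion easy to check by hand. This sketch is only an analogy to the date filter; the helper name parse_access_log_time is hypothetical, and the month abbreviation is assumed to be parsed under the default English locale.

```python
from datetime import datetime, timezone


def parse_access_log_time(raw: str) -> datetime:
    """Parse an access log timestamp and normalize it to UTC."""
    # %d/%b/%Y:%H:%M:%S %z corresponds to dd/MMM/yyyy:HH:mm:ss Z.
    local = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")
    # Normalize to UTC, the form in which Logstash stores timestamps.
    return local.astimezone(timezone.utc)


parsed = parse_access_log_time("22/Nov/2015:11:57:34 +0800")
print(parsed.isoformat())
```

Since the sample timestamp already carries a +0800 offset, the offset in the string determines the conversion to UTC.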
