Deploying a Centralized ELFK Logging Pipeline
Core Component Architecture
Elasticsearch operates as the distributed indexing and search layer. Its cluster topology utilizes shard allocation and replica mechanisms to ensure data durability and query parallelism. The inverted index architecture enables sub-second retrieval across massive datasets, while built-in aggregation pipelines support real-time metrics extraction.
Logstash functions as the centralized event processing pipeline. It ingests heterogeneous streams through configurable input plugins, applies transformation filters to normalize or enrich payloads, and routes the processed records to designated storage backends. Its plugin-based design decouples ingestion, processing, and output stages, allowing granular pipeline optimization.
Filebeat serves as a lightweight, edge-deployed log forwarder. Unlike heavyweight collectors, it tails files through per-file harvesters, handles log rotation automatically, and batches network transmissions. Deploying it directly on host nodes significantly reduces central CPU and memory consumption, shifting the collection burden away from the processing tier.
Kibana provides the visualization and exploration interface. It translates Elasticsearch index data into interactive dashboards, time-series graphs, and structured log browsers, enabling ad-hoc querying and alert threshold configuration.
Architectural Shift from ELK to ELFK
Traditional ELK deployments often run Logstash agents on every source host, leading to resource contention during traffic spikes. The ELFK pattern introduces Filebeat as a dedicated shipping layer. Filebeat handles initial log harvesting, applies backpressure management, and forwards events over the lightweight Beats protocol. Logstash transitions exclusively to a parsing and routing role, frequently positioned behind a message queue or directly consuming Beats traffic. This decoupling improves fault isolation, scales ingestion horizontally, and optimizes network utilization.
Kubernetes-Based Log Shipper Deployment
Filebeat must run as a DaemonSet to capture container standard output across every cluster node. The container runtime stores logs under /var/log/pods/ and symlinks them into /var/log/containers/. Direct host path mounts are required for filesystem access.
Container Image Reference: docker.elastic.co/beats/filebeat:7.17.0
Privilege Configuration: Assign runAsUser: 0 within the pod security context to permit reading restricted host directories. Apply appropriate readOnlyRootFilesystem and capabilities restrictions to maintain compliance.
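As an illustrative sketch (field values here are assumptions, not mandated by this guide), the container security context combining root access with the recommended restrictions might look like:

```yaml
# Illustrative container securityContext for the Filebeat DaemonSet.
# runAsUser: 0 grants read access to restricted host log directories;
# the remaining fields lock down everything else.
securityContext:
  runAsUser: 0
  readOnlyRootFilesystem: true   # requires a writable volume for Filebeat's
                                 # registry under /usr/share/filebeat/data
  capabilities:
    drop: ["ALL"]
```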
Configuration Map:
apiVersion: v1
kind: ConfigMap
metadata:
  name: shipper-pipeline
  namespace: observability
data:
  shipper.yml: |
    filebeat.inputs:
    - type: container
      paths: ["/var/log/containers/*app_worker_*.log"]
      multiline.type: pattern
      multiline.pattern: '^\d{4}-\d{2}-\d{2}'
      multiline.negate: true
      multiline.match: after
      tags: ["k8s_app_stream"]
    processors:
    - add_kubernetes_metadata:
        host: ${NODE_NAME}
        matchers:
        - logs_path:
            logs_path: "/var/log/containers/"
    - drop_fields:
        fields: ["agent.hostname", "ecs.version", "input.type"]
    output.logstash:
      hosts: ["192.168.10.50:5044", "192.168.10.51:5044"]
      loadbalance: true
      bulk_max_size: 2048
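The multiline settings above group continuation lines (stack traces, wrapped output) under the preceding timestamped line: with negate: true and match: after, every line that does not start with a date is appended to the previous event. A quick Python check of that grouping logic (the sample log lines are invented):

```python
import re

# Same pattern as multiline.pattern in shipper.yml; this particular
# expression behaves identically in Python and Go regex engines.
pattern = re.compile(r'^\d{4}-\d{2}-\d{2}')

lines = [
    "2024-03-01 12:00:00 request handled",   # starts a new event
    "Traceback (most recent call last):",    # continuation -> appended
    '  File "app.py", line 10, in worker',   # continuation -> appended
    "2024-03-01 12:00:01 next request",      # starts a new event
]

# negate: true + match: after => a non-matching line is appended
# to the event opened by the last matching line.
events, current = [], []
for line in lines:
    if pattern.match(line) and current:
        events.append("\n".join(current))
        current = []
    current.append(line)
events.append("\n".join(current))

print(len(events))  # 2 events: each timestamped line plus its continuations
```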
Volume Attachments: Mount /var/log/pods and /var/log/containers from the host node into the container at identical paths. Project the shipper.yml key of the ConfigMap onto /usr/share/filebeat/filebeat.yml as a read-only subPath mount.
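A sketch of the corresponding volume wiring in the DaemonSet spec (volume names are illustrative assumptions; only the paths and the ConfigMap name come from this guide):

```yaml
# Illustrative volume section of the Filebeat DaemonSet spec.
containers:
- name: filebeat
  volumeMounts:
  - name: varlog-pods
    mountPath: /var/log/pods
    readOnly: true
  - name: varlog-containers
    mountPath: /var/log/containers
    readOnly: true
  - name: shipper-config
    mountPath: /usr/share/filebeat/filebeat.yml
    subPath: shipper.yml          # project the single key onto filebeat.yml
    readOnly: true
volumes:
- name: varlog-pods
  hostPath:
    path: /var/log/pods
- name: varlog-containers
  hostPath:
    path: /var/log/containers
- name: shipper-config
  configMap:
    name: shipper-pipeline
```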
Central Processing Node Setup
Logstash manages the parsing, filtering, and routing phase. Download the distribution archive, extract it to a persistent directory, and configure system-level parameters.
Daemon Configuration (logstash.yml):
node.name: "ls-primary-node"
path.data: "/opt/logstash/storage"
path.logs: "/var/log/logstash"
pipeline.batch.size: 2000
pipeline.workers: 4
http.host: "192.168.10.50"
log.level: warn
Network Rules: Allow inbound TCP traffic on port 5044 from the Kubernetes subnet using ufw allow 5044/tcp or equivalent firewall utilities.
Pipeline Definition (app_processor.conf):
input {
  beats {
    port => 5044
    codec => plain { charset => "UTF-8" }
  }
}
filter {
  if "k8s_app_stream" in [tags] {
    # Copy before splitting so the original message survives until cleanup.
    # Separate mutate blocks guarantee execution order between operations.
    mutate {
      copy => { "message" => "raw_segments" }
    }
    mutate {
      split => { "raw_segments" => " - " }
    }
    mutate {
      add_field => {
        "service_module"  => "%{[raw_segments][0]}"
        "deployed_app"    => "%{[raw_segments][1]}"
        "http_status"     => "%{[raw_segments][2]}"
        "event_timestamp" => "%{[raw_segments][3]}"
        "correlation_id"  => "%{[raw_segments][4]}"
        "worker_thread"   => "%{[raw_segments][5]}"
        "log_severity"    => "%{[raw_segments][6]}"
        "source_class"    => "%{[raw_segments][7]}"
      }
    }
    mutate {
      # Strip the surrounding square brackets from the bracketed segments.
      gsub => [
        "service_module", "[\\[\\]]", "",
        "deployed_app", "[\\[\\]]", "",
        "http_status", "[\\[\\]]", "",
        "correlation_id", "[\\[\\]]", "",
        "worker_thread", "[\\[\\]]", ""
      ]
      remove_field => ["message", "raw_segments"]
    }
  }
  if "beats_input_codec_plain_applied" in [tags] {
    mutate { remove_tag => ["beats_input_codec_plain_applied"] }
  }
}
output {
  elasticsearch {
    hosts => ["http://192.168.10.60:9200", "http://192.168.10.61:9200", "http://192.168.10.62:9200"]
    index => "archive-%{deployed_app}-%{+YYYY.MM.dd}"
    user => "${LS_ES_USER}"
    password => "${LS_ES_PASS}"
  }
}
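The filter stage splits each message on the literal " - " separator and strips the square brackets from the bracketed segments. The same transformation, sketched in Python against an invented sample line in the 8-segment layout the filter expects:

```python
import re

# Invented sample line: module - app - status - timestamp -
# correlation id - thread - severity - class
raw = ("[auth] - [billing-api] - [200] - 2024-03-01T12:00:00Z - "
       "[c-9f2] - [worker-3] - INFO - com.example.Handler")

fields = [
    "service_module", "deployed_app", "http_status", "event_timestamp",
    "correlation_id", "worker_thread", "log_severity", "source_class",
]
segments = raw.split(" - ")      # equivalent of the split on " - "
event = dict(zip(fields, segments))

# gsub equivalent: remove "[" and "]" from the bracketed fields
for f in ("service_module", "deployed_app", "http_status",
          "correlation_id", "worker_thread"):
    event[f] = re.sub(r"[\[\]]", "", event[f])

print(event["deployed_app"])  # billing-api
```

The cleaned deployed_app value is what the output stage interpolates into the daily index name, e.g. archive-billing-api-2024.03.01.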
Service Initialization:
nohup ./bin/logstash -f ./config/app_processor.conf --config.reload.automatic --path.logs /var/log/logstash &
Visualization and Index Lifecycle Management
Navigate to the Kibana management interface to establish data views. Define an Index Pattern matching the output schema archive-*, and designate @timestamp as the primary time filter field to enable chronological navigation.
Configure Index Lifecycle Management (ILM) to automate retention. Define a policy with a Hot phase for active writes, transitioning directly to a Delete phase after the specified duration. Recommended retention windows: 15 days for non-production clusters, and 12 months for production workloads. Attach the policy to the index template to trigger automatic rollover and cleanup, eliminating manual index administration.
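As a sketch, the ILM policy body for the 15-day non-production window could be built and inspected like this (the policy name and rollover thresholds are illustrative assumptions, not values prescribed above):

```python
import json

# Illustrative ILM policy: Hot phase with rollover, then straight to Delete.
# Rollover thresholds (max_age, max_primary_shard_size) are assumed values.
policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {
                        "max_age": "1d",
                        "max_primary_shard_size": "50gb",
                    }
                }
            },
            "delete": {
                "min_age": "15d",        # 15-day non-production retention
                "actions": {"delete": {}},
            },
        }
    }
}

# Applied via the ILM API, e.g. PUT _ilm/policy/archive-retention
print(json.dumps(policy, indent=2))
```

For production, the same structure applies with min_age raised to the 12-month window; attaching the policy to the archive-* index template makes rollover and cleanup automatic.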