
Deploying a Centralized ELFK Logging Pipeline


Core Component Architecture

Elasticsearch operates as the distributed indexing and search layer. Its cluster topology utilizes shard allocation and replica mechanisms to ensure data durability and query parallelism. The inverted index architecture enables sub-second retrieval across massive datasets, while built-in aggregation pipelines support real-time metrics extraction.
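To illustrate, shard and replica counts are declared per index (or via an index template) at creation time. A hypothetical Kibana Dev Tools request, with the index name and counts invented for illustration:

```json
PUT /archive-orders-2024.05.01
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
```

With three primaries and one replica, queries fan out across up to six shard copies, and the cluster survives the loss of any single node holding a copy.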

Logstash functions as the centralized event processing pipeline. It ingests heterogeneous streams through configurable input plugins, applies transformation filters to normalize or enrich payloads, and routes the processed records to designated storage backends. Its plugin-based design decouples ingestion, processing, and output stages, allowing granular pipeline optimization.

Filebeat serves as a lightweight, edge-deployed log forwarder. Unlike full-weight collectors, it tracks file read offsets in a local registry, handles log rotation automatically, and batches network transmissions. Deploying it directly on host nodes significantly reduces central CPU and memory consumption, shifting the collection burden away from the processing tier.

Kibana provides the visualization and exploration interface. It translates Elasticsearch index data into interactive dashboards, time-series graphs, and structured log browsers, enabling ad-hoc querying and alert threshold configuration.

Architectural Shift from ELK to ELFK

Traditional ELK deployments often run Logstash agents on every source host, leading to resource contention during traffic spikes. The ELFK pattern introduces Filebeat as a dedicated shipping layer. Filebeat handles initial log harvesting, applies backpressure management, and forwards events over the lightweight Beats protocol. Logstash transitions exclusively to a parsing and routing role, frequently positioned behind a message queue or directly consuming Beats traffic. This decoupling improves fault isolation, scales ingestion horizontally, and optimizes network utilization.

Kubernetes-Based Log Shipper Deployment

Filebeat must run as a DaemonSet to capture container standard output across every cluster node. The container runtime stores logs under /var/log/pods/ and symlinks them into /var/log/containers/. Direct host path mounts are required for filesystem access.

Container Image Reference: docker.elastic.co/beats/filebeat:7.17.0

Privilege Configuration: Assign runAsUser: 0 within the pod security context to permit reading restricted host directories. Apply appropriate readOnlyRootFilesystem and capabilities restrictions to maintain compliance.

Configuration Map:

apiVersion: v1
kind: ConfigMap
metadata:
  name: shipper-pipeline
  namespace: observability
data:
  shipper.yml: |
    filebeat.inputs:
    - type: container
      paths: ["/var/log/containers/*app_worker_*.log"]
      multiline.type: pattern
      multiline.pattern: '^\d{4}-\d{2}-\d{2}'
      multiline.negate: true
      multiline.match: after
      tags: ["k8s_app_stream"]

    processors:
      - add_kubernetes_metadata:
          host: ${NODE_NAME}
          matchers:
          - logs_path:
              logs_path: "/var/log/containers/"
      - drop_fields:
          fields: ["agent.hostname", "ecs.version", "input.type"]

    output.logstash:
      hosts: ["192.168.10.50:5044", "192.168.10.51:5044"]
      loadbalance: true
      bulk_max_size: 2048

Volume Attachments: Mount /var/log/pods and /var/log/containers from the host node into the container at identical paths. Bind the ConfigMap to /usr/share/filebeat/filebeat.yml with read-only permissions.
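Pulling the pieces above together, a minimal DaemonSet sketch could look like the following. Resource names follow the ConfigMap above; the NODE_NAME variable feeds the ${NODE_NAME} reference in shipper.yml. Tolerations, resource limits, and the RBAC ServiceAccount that add_kubernetes_metadata requires are omitted for brevity, so treat this as a starting point rather than a complete manifest:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: shipper
  namespace: observability
spec:
  selector:
    matchLabels: { app: shipper }
  template:
    metadata:
      labels: { app: shipper }
    spec:
      containers:
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:7.17.0
        args: ["-c", "/usr/share/filebeat/filebeat.yml", "-e"]
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef: { fieldPath: spec.nodeName }
        securityContext:
          runAsUser: 0           # required to read restricted host log paths
        volumeMounts:
        - name: config
          mountPath: /usr/share/filebeat/filebeat.yml
          subPath: shipper.yml
          readOnly: true
        - name: varlogpods
          mountPath: /var/log/pods
          readOnly: true
        - name: varlogcontainers
          mountPath: /var/log/containers
          readOnly: true
      volumes:
      - name: config
        configMap: { name: shipper-pipeline }
      - name: varlogpods
        hostPath: { path: /var/log/pods }
      - name: varlogcontainers
        hostPath: { path: /var/log/containers }
```

Mounting /var/log/containers alone is not enough: the files there are symlinks into /var/log/pods, so both paths must be visible inside the container at identical locations.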

Central Processing Node Setup

Logstash manages the parsing, filtering, and routing phase. Download the distribution archive, extract it to a persistent directory, and configure system-level parameters.

Daemon Configuration (logstash.yml):

node.name: "ls-primary-node"
path.data: "/opt/logstash/storage"
path.logs: "/var/log/logstash"
pipeline.batch.size: 2000
pipeline.workers: 4
http.host: "192.168.10.50"
log.level: warn

Network Rules: Allow inbound TCP traffic on port 5044 from the Kubernetes subnet using ufw allow 5044/tcp or equivalent firewall utilities.

Pipeline Definition (app_processor.conf):

input {
  beats {
    port => 5044
    codec => plain { charset => "UTF-8" }
  }
}

filter {
  if "k8s_app_stream" in [tags] {
    # Break the raw message into " - "-delimited segments. The mutate-based
    # split keeps everything in one event; the standalone split *filter*
    # would instead emit one event per segment.
    mutate {
      copy => { "message" => "raw_segments" }
    }
    mutate {
      split => { "raw_segments" => " - " }
    }

    mutate {
      add_field => {
        "service_module" => "%{[raw_segments][0]}"
        "deployed_app" => "%{[raw_segments][1]}"
        "http_status" => "%{[raw_segments][2]}"
        "event_timestamp" => "%{[raw_segments][3]}"
        "correlation_id" => "%{[raw_segments][4]}"
        "worker_thread" => "%{[raw_segments][5]}"
        "log_severity" => "%{[raw_segments][6]}"
        "source_class" => "%{[raw_segments][7]}"
      }
    }

    mutate {
      gsub => [
        "service_module", "[\[\]]", "",
        "deployed_app", "[\[\]]", "",
        "http_status", "[\[\]]", "",
        "correlation_id", "[\[\]]", "",
        "worker_thread", "[\[\]]", ""
      ]
      remove_field => ["message", "raw_segments"]
    }
  }

  if "beats_input_codec_plain_applied" in [tags] {
    mutate { remove_tag => ["beats_input_codec_plain_applied"] }
  }
}

output {
  elasticsearch {
    hosts => ["http://192.168.10.60:9200", "http://192.168.10.61:9200", "http://192.168.10.62:9200"]
    index => "archive-%{deployed_app}-%{+YYYY.MM.dd}"
    user => "${LS_ES_USER}"
    password => "${LS_ES_PASS}"
  }
}
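For reference, the filter above assumes each event is a single line of eight " - "-separated segments, with several wrapped in square brackets. A hypothetical line in that shape (every field value invented for illustration):

```text
[billing] - [order-api] - [200] - 2024-05-01 12:00:03,117 - [req-8f2c] - [worker-3] - INFO - com.example.OrderController
```

Segments 0 through 7 map to service_module, deployed_app, http_status, event_timestamp, correlation_id, worker_thread, log_severity, and source_class; the leading date also satisfies the Filebeat multiline pattern, so stack traces fold into the preceding event.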

Service Initialization:

nohup ./bin/logstash -f ./config/app_processor.conf --config.reload.automatic -l /var/log/logstash &
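For production hosts, a systemd unit survives reboots and restarts on failure, which nohup does not. A minimal sketch, assuming the archive was extracted to /opt/logstash and a dedicated logstash user exists (both assumptions, not values from this deployment):

```ini
[Unit]
Description=Logstash app_processor pipeline
After=network.target

[Service]
User=logstash
ExecStart=/opt/logstash/bin/logstash -f /opt/logstash/config/app_processor.conf --config.reload.automatic
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with systemctl enable --now, and journald captures stdout alongside the files under path.logs.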

Visualization and Index Lifecycle Management

Navigate to the Kibana management interface to establish data views. Define an Index Pattern matching the output schema archive-*, and designate the timestamp field as the primary time filter to enable chronological navigation.

Configure Index Lifecycle Management (ILM) to automate retention. Define a policy with a Hot phase for active writes, transitioning directly to a Delete phase after the specified duration. Recommended retention windows: 15 days for non-production clusters, and 12 months for production workloads. Attach the policy to the index template to trigger automatic rollover and cleanup, eliminating manual index administration.
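A policy matching the 15-day non-production window described above might be sketched as follows; the policy name and rollover thresholds are illustrative, and the policy must also be referenced from the archive-* index template to take effect:

```json
PUT _ilm/policy/archive-retention
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "1d", "max_primary_shard_size": "50gb" }
        }
      },
      "delete": {
        "min_age": "15d",
        "actions": { "delete": {} }
      }
    }
  }
}
```

For production, the same structure applies with min_age raised to the 12-month window.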
