Fading Coder



Implementing Graceful Shutdown for Eureka Clients in Kubernetes

Tech, May 13

Problem Statement

When deploying applications on Kubernetes with rolling updates, the platform starts a new Pod, waits for it to become healthy, and then terminates the old Pod. However, requests may still arrive at the old Pod while it is terminating. If the Pod is killed while handling active requests, clients receive 500 errors, preventing a smooth transition.

In our setup, we use Java with Eureka as the service registry. Applications register with Eureka on startup and communicate internally through service-to-service calls. Services send periodic heartbeats to Eureka, and each client Pod keeps a local copy of the registry, refreshed on a 30-second interval by default. When a service shuts down, it deregisters from Eureka. During the deregistration window, the following issues can occur:

  1. The K8s Pod has already terminated, but Eureka hasn't updated its registry yet, causing requests to route to non-existent instances and return 404 errors.
  2. A request has already reached the old Pod, but during peak traffic, the service hasn't finished processing before being terminated, resulting in 500 errors.
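The heartbeat and cache intervals described above are configurable. As a reference, here is a sketch of the relevant timing defaults, using Spring Cloud Netflix property names (the values shown are the library defaults; adjust them to your environment):

```yaml
# Sketch of the Eureka timing defaults referenced above (Spring Cloud
# Netflix property names; values shown are the library defaults).
eureka:
  instance:
    lease-renewal-interval-in-seconds: 30      # heartbeat period
    lease-expiration-duration-in-seconds: 90   # eviction after missed heartbeats
  client:
    registry-fetch-interval-seconds: 30        # local registry cache refresh
```

The 30-second registry fetch interval is what creates the window in which a caller can still route to an instance that has already deregistered.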

Solution Approach

The solution involves deregistering from Eureka immediately when the Pod begins termination (before being deleted), then waiting for in-flight requests to complete before actually removing the Pod.

Key considerations:

  • Understanding Pod termination lifecycle
  • Configuring K8s preStop hook for graceful shutdown operations
  • Determining the application's request processing duration (typically 30 seconds; extend to 50 seconds if uncertain, though this affects deployment time)
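On the application side, the in-flight request drain can often be delegated to the framework. A hedged sketch for a Spring Boot service (assumes Spring Boot 2.3 or later, where built-in graceful shutdown is available; the timeout value is an example):

```yaml
# application.yml sketch: on SIGTERM, stop accepting new requests and
# wait up to the configured timeout for in-flight requests to finish.
server:
  shutdown: graceful
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s   # set to your longest expected request
```

This complements the Kubernetes-side configuration below: Kubernetes decides when SIGTERM is sent, and the framework decides how the process drains after receiving it.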

Pod Termination Lifecycle

  1. New Pod starts and passes Readiness probes, joining the Service endpoint list.
  2. Old Pod enters Termination state, removed from the Service endpoint list—no new requests reach the terminating Pod.
  3. If a preStop hook is configured, it executes first, and SIGTERM is sent once the hook finishes. If the hook is still running when terminationGracePeriodSeconds (default 30s) expires, kubelet sends SIGTERM and grants a short 2-second extension; if the container still doesn't stop, SIGKILL forces termination.
  4. Without a preStop hook, SIGTERM initiates graceful shutdown. If termination doesn't complete within terminationGracePeriodSeconds, SIGKILL forces termination.

Important notes:

  1. SIGTERM is delivered only to the container's main process (PID 1); child processes with other PIDs do not receive it. For single-process containers, Kubernetes' default graceful termination is usually sufficient: extend terminationGracePeriodSeconds enough to allow a clean shutdown.
  2. For containers running multiple processes (a parent plus child processes), a preStop hook is required to gracefully shut down the child processes before SIGTERM reaches the main process. Again, ensure terminationGracePeriodSeconds is configured appropriately.
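For the multi-process case, the preStop hook can forward the termination signal to the children itself. A minimal sketch (the child process name "worker" and the 5-second wait are hypothetical; substitute your own process and drain time):

```yaml
lifecycle:
  preStop:
    exec:
      # Send TERM to the hypothetical "worker" child processes first,
      # then give them a few seconds to exit before kubelet signals PID 1.
      command: ["/bin/sh", "-c", "pkill -TERM worker || true; sleep 5"]
```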

Simulating Request Failures

To demonstrate the issue, perform a rolling update without any graceful shutdown configuration while continuously sending requests to the application using a load testing tool like JMeter.

Deploy the service:

kubectl apply -f service-deployment.yaml

Test the interface:

Configure 50 concurrent threads to continuously call the service endpoint. Observe request failures during the Pod replacement phase. After the old Pod is terminated, requests normalize.

This demonstrates the problem with uncontrolled rolling updates.

Implementing Eureka Graceful Shutdown

Recommended Configuration

  1. Extend terminationGracePeriodSeconds to provide adequate shutdown time.
  2. Configure a preStop hook that proactively deregisters the instance from Eureka before termination, then wait 30 seconds for Eureka's cache to refresh and remove the deregistered instance from the gateway.
  3. Allow the Pod's graceful termination to proceed naturally.
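The budget behind step 1 is simple arithmetic: the grace period must cover the preStop sleep plus the time the application needs to drain in-flight requests after SIGTERM. A small shell sketch (all values are illustrative assumptions, not measured figures):

```shell
#!/bin/sh
# Illustrative shutdown-budget check. terminationGracePeriodSeconds
# starts counting when termination begins, so it must cover both the
# preStop hook and the post-SIGTERM drain.
PRESTOP_SLEEP=30   # sleep in the preStop hook (Eureka cache refresh)
DRAIN_SECONDS=10   # assumed worst-case in-flight request time after SIGTERM
GRACE=45           # terminationGracePeriodSeconds
BUDGET=$((PRESTOP_SLEEP + DRAIN_SECONDS))
if [ "$BUDGET" -le "$GRACE" ]; then
  echo "ok: ${BUDGET}s fits within ${GRACE}s"
else
  echo "warning: raise terminationGracePeriodSeconds to at least ${BUDGET}s"
fi
```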

Deployment Configuration

terminationGracePeriodSeconds: 45   # Pod-spec-level field
lifecycle:                          # container-level field
  preStop:
    exec:
      command: ["/bin/sh", "-c", "curl -X PUT 'http://eureka-service:8761/eureka/apps/service-name/${POD_IP}:service-name:8080/status?value=DOWN' -H 'Content-Type: application/vnd.spring-boot.actuator.v2+json;charset=UTF-8' && sleep 30"]

The preStop hook performs the Eureka deregistration first, preventing new requests from reaching this instance. After waiting 30 seconds for cache propagation, the Pod proceeds with normal termination. Set terminationGracePeriodSeconds longer than the sleep duration so the hook can complete. Note that ${POD_IP} must be available as an environment variable inside the container.
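The ${POD_IP} variable used in the hook can be injected into the container with the standard Kubernetes Downward API; a sketch of the container-spec fragment:

```yaml
# Exposes the Pod's IP as POD_IP so the preStop curl can build the
# Eureka instance ID (status.podIP is a standard Downward API field).
env:
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
```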

Deploy the Service

kubectl apply -f service-deployment.yaml

With this configuration, when the old Pod begins termination, the preStop hook executes immediately. It deregisters from Eureka, blocking new traffic, while existing requests continue processing. Only after all active requests complete does the Pod terminate.

During continuous load testing with JMeter, all requests complete successfully throughout the deployment process.

