Implementing Graceful Shutdown for Eureka Clients in Kubernetes
Problem Statement
When deploying applications on Kubernetes using rolling updates, the platform starts a new Pod, waits for it to become healthy, and then terminates the old Pod. However, requests may still arrive at the old Pod during termination. If the Pod is killed while handling active requests, clients receive 500 errors, breaking the smooth transition.
In our setup, we use Java with Eureka as the service registry. Applications register with Eureka on startup and communicate internally through service-to-service calls. Services send periodic heartbeats to Eureka, and each client keeps a local cache of the registry that refreshes every 30 seconds. When a service shuts down, it deregisters from Eureka. During this deregistration window, the following issues can occur:
- The K8s Pod has already terminated, but Eureka hasn't updated its registry yet, causing requests to route to non-existent instances and return 404 errors.
- A request has already reached the old Pod, but during peak traffic, the service hasn't finished processing before being terminated, resulting in 500 errors.
Solution Approach
The solution involves deregistering from Eureka immediately when the Pod begins termination (before being deleted), then waiting for in-flight requests to complete before actually removing the Pod.
Key considerations:
- Understanding Pod termination lifecycle
- Configuring K8s preStop hook for graceful shutdown operations
- Determining the application's request processing duration (typically 30 seconds; extend to 50 seconds if uncertain, though this affects deployment time)
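These considerations reduce to a simple timing budget. The fragment below mirrors the numbers used later in this article; the 15-second request buffer is an assumption to adjust per workload:

```yaml
# preStop wait >= Eureka client cache refresh interval (30s here; use 50s if unsure)
# terminationGracePeriodSeconds >= preStop wait + worst-case request duration
terminationGracePeriodSeconds: 45   # e.g. 30s cache drain + 15s request buffer
```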
Pod Termination Lifecycle
- New Pod starts and passes Readiness probes, joining the Service endpoint list.
- Old Pod enters Termination state, removed from the Service endpoint list—no new requests reach the terminating Pod.
- If a preStop hook is configured, it executes before any signal is sent. If the hook is still running when terminationGracePeriodSeconds (default 30s) expires, kubelet grants a brief 2-second extension and sends SIGTERM; if the container still doesn't stop, SIGKILL forces termination.
- Without a preStop hook, SIGTERM initiates graceful shutdown immediately. If the container hasn't stopped within terminationGracePeriodSeconds, SIGKILL forces termination.
Important notes:
- SIGTERM is delivered only to the container's PID 1 process. Child processes with other PIDs don't receive the signal unless PID 1 forwards it.
- For single-process containers, Kubernetes' default graceful termination is enough: extend terminationGracePeriodSeconds far enough to allow a clean shutdown.
- For containers with multiple processes (a parent plus children), a preStop hook is required to gracefully stop the child processes before SIGTERM reaches the main process. Ensure terminationGracePeriodSeconds is configured appropriately.
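As a minimal illustration of the PID point (plain POSIX sh, not the article's Java service): a signal must be delivered to a process explicitly, which is exactly the role a preStop hook plays for child processes:

```shell
#!/bin/sh
# A child process does not see a SIGTERM addressed to another PID; it must
# be signaled directly. Start a child that shuts down cleanly on SIGTERM.
sh -c 'trap "exit 0" TERM; while :; do sleep 1; done' &
child=$!

sleep 1                 # give the child time to install its trap
kill -TERM "$child"     # signal the child explicitly, as a preStop hook would
wait "$child"
child_status=$?
echo "child exited with status $child_status"
```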
Simulating Request Failures
To demonstrate the issue, perform a rolling update without any graceful shutdown configuration while continuously sending requests to the application using a load testing tool like JMeter.
Deploy the service:
kubectl apply -f service-deployment.yaml
Test the interface:
Configure 50 concurrent threads to continuously call the service endpoint. Observe request failures during the Pod replacement phase. After the old Pod is terminated, requests normalize.
This demonstrates the problem with uncontrolled rolling updates.
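The same observation can be scripted without JMeter. The sketch below counts non-200 responses during a rollout; `count_failures` is a helper name chosen here, and the endpoint URL is a placeholder:

```shell
#!/bin/sh
# count_failures N CMD...: run CMD (which must print an HTTP status code)
# N times and print how many responses were not 200.
count_failures() {
  n=$1; shift
  failed=0; i=0
  while [ "$i" -lt "$n" ]; do
    code=$("$@") || code=000
    [ "$code" = "200" ] || failed=$((failed + 1))
    i=$((i + 1))
  done
  echo "$failed"
}

# Usage during a rolling update (hypothetical endpoint):
# count_failures 200 curl -s -o /dev/null -w '%{http_code}' --max-time 2 \
#   'http://service-name:8080/api/hello'
```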
Implementing Eureka Graceful Shutdown
Recommended Configuration
- Extend terminationGracePeriodSeconds to provide adequate shutdown time.
- Configure a preStop hook that proactively deregisters the instance from Eureka before termination, then waits 30 seconds for Eureka's cache to refresh and remove the deregistered instance from the gateway.
- Allow the Pod's graceful termination to proceed naturally.
Deployment Configuration
terminationGracePeriodSeconds: 45   # set on the Pod spec
lifecycle:                          # set on the container
  preStop:
    exec:
      command: ["/bin/sh", "-c", "curl -X PUT 'http://eureka-service:8761/eureka/apps/service-name/${POD_IP}:service-name:8080/status?value=DOWN' -H 'Content-Type: application/vnd.spring-boot.actuator.v2+json;charset=UTF-8' && sleep 30"]
The preStop hook performs Eureka deregistration first, preventing new requests from reaching this instance. After waiting 30 seconds for cache propagation, the Pod proceeds with normal termination. Set terminationGracePeriodSeconds longer than the sleep duration to allow completion.
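To confirm the deregistration took effect before the sleep window expires, the instance status can be read back from Eureka's REST API. The helper below (`extract_status` is a name chosen here, and the registry URL follows the article's eureka-service example) crudely pulls status fields out of the JSON response:

```shell
#!/bin/sh
# extract_status: read Eureka's JSON app listing on stdin and print each
# instance's status field (UP, DOWN, ...). Crude text extraction; assumes
# compact JSON like {"status":"DOWN",...} rather than full JSON parsing.
extract_status() {
  grep -o '"status":"[A-Z_]*"' | cut -d'"' -f4
}

# Usage against a live registry (hypothetical app name):
# curl -s -H 'Accept: application/json' \
#   'http://eureka-service:8761/eureka/apps/SERVICE-NAME' | extract_status
```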
Deploy the Service
kubectl apply -f service-deployment.yaml
With this configuration, when the old Pod begins termination, the preStop hook executes immediately: it deregisters the instance from Eureka so no new traffic is routed to it, then sleeps while existing requests continue processing. By the time the hook completes and SIGTERM is delivered, in-flight requests have had the full wait window to finish.
During continuous load testing with JMeter, all requests complete successfully throughout the deployment process.