Redis Sentinel High-Availability Monitoring and Failover Mechanism
Overview
Background
In a Redis master-slave replication setup, if the primary node fails, no automatic failover occurs—slave nodes remain idle until the master recovers. Redis Sentinel addresses this by continuously monitoring the master instance. When it detects a failure, Sentinel orchestrates an automatic promotion of one slave to become the new master, ensuring continuous service availability.
Core Functions
- Health Monitoring: Tracks the status of both master and slave instances.
- Automatic Failover: Promotes a slave to master upon detecting a master outage.
Key Benefits
- Real-time Monitoring: Ensures master and slave nodes are operational.
- Alerting: Notifies clients about topology changes during failover.
- Failover Automation: Automatically reconfigures slaves to follow the new master.
- Configuration Orchestration: Clients query Sentinels to discover the current master address dynamically.
Practical Demonstration
Architecture
A typical configuration includes one master and two slaves, with three Sentinel instances deployed on the same machine due to resource constraints. This setup simulates a distributed environment on a single host.
Configuration
Each Sentinel instance requires a unique configuration file to avoid port conflicts. Below are key parameters:
- bind: network interface to listen on (e.g., 0.0.0.0 for external access).
- daemonize: run as a background process (yes).
- protected-mode: disable for testing (no).
- port: unique port per Sentinel (26379, 26380, 26381).
- logfile: path to the log file.
- pidfile: PID file location.
- dir: working directory for temporary files.
- sentinel monitor <master-name> <ip> <port> <quorum>: defines the master to monitor; quorum is the minimum number of Sentinels that must agree the master has failed before failover begins.
- sentinel auth-pass <master-name> <password>: authentication password for the master.
- sentinel down-after-milliseconds <master-name> <milliseconds>: time after which a Sentinel marks the master as unreachable.
- sentinel parallel-syncs <master-name> <count>: number of slaves that may resynchronize with the new master simultaneously after failover.
- sentinel failover-timeout <master-name> <milliseconds>: timeout for completing a failover.
- sentinel notification-script <master-name> <path>: script executed on specific events.
- sentinel client-reconfig-script <master-name> <path>: script invoked when the master address changes.
Example configurations for three Sentinels:
# sentinel1.conf
bind 0.0.0.0
daemonize yes
protected-mode no
port 26379
logfile "/usr/local/sentinel/sentinel126379.log"
pidfile /var/run/redis-sentinel126379.pid
dir /usr/local/sentinel
sentinel monitor master 192.168.31.250 6379 2
sentinel auth-pass master 123456
# sentinel2.conf
bind 0.0.0.0
daemonize yes
protected-mode no
port 26380
logfile "/usr/local/sentinel/sentinel226380.log"
pidfile /var/run/redis-sentinel226380.pid
dir /usr/local/sentinel
sentinel monitor master 192.168.31.250 6379 2
sentinel auth-pass master 123456
# sentinel3.conf
bind 0.0.0.0
daemonize yes
protected-mode no
port 26381
logfile "/usr/local/sentinel/sentinel326381.log"
pidfile /var/run/redis-sentinel326381.pid
dir /usr/local/sentinel
sentinel monitor master 192.168.31.250 6379 2
sentinel auth-pass master 123456
Ensure all Redis instances (master and slaves) have authentication enabled with the same password, so that any slave can be promoted and still accept replication from the others.
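For reference, a minimal fragment of each instance's redis.conf that matches the sentinel auth-pass value above might look like this (the password 123456 is the one used in the Sentinel configs; adjust for your deployment):

```
# redis.conf (every master and slave)
requirepass 123456   # clients and Sentinels must authenticate with this password
masterauth 123456    # used when this node replicates from a master
```

Setting both directives on every node matters because any slave may become the master after a failover.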
Starting Sentinels
Two equivalent methods to start Sentinel processes:
redis-sentinel /path/to/sentinel1.conf
redis-server /path/to/sentinel2.conf --sentinel
Testing Failover
Manually stop the master Redis instance to simulate a failure. Observations:
- Slave data remains intact.
- One slave is automatically promoted to master.
- The previously failed master, once restarted, rejoins as a slave of the new master.
Failover Process and Election Logic (Interview Focus)
Introduction
When the master becomes unavailable, Sentinel initiates a failover to select a new master. Other slaves then reconfigure to replicate from the new master. It's recommended to deploy an odd number of Sentinels to prevent split-brain scenarios.
Step-by-step Failover Flow
Subjective Down (SDOWN)
A single Sentinel declares the master offline if it doesn't receive a valid response within down-after-milliseconds.
Objective Down (ODOWN)
Only triggered when at least quorum Sentinels agree the master is unreachable. This prevents false positives due to network delays or temporary disconnections.
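The SDOWN-to-ODOWN transition above boils down to a simple vote count. A minimal sketch (with hypothetical names; real Sentinels exchange these votes over the network) might look like:

```python
# SDOWN -> ODOWN decision: a master is objectively down only when at least
# `quorum` Sentinels independently report it as subjectively down.

def is_objectively_down(sdown_votes, quorum):
    """sdown_votes: set of Sentinel IDs currently seeing the master as SDOWN."""
    return len(sdown_votes) >= quorum

votes = {"sentinel-26379", "sentinel-26380"}  # two of three Sentinels agree
print(is_objectively_down(votes, quorum=2))               # ODOWN: failover may begin
print(is_objectively_down({"sentinel-26379"}, quorum=2))  # only SDOWN so far
```

With quorum set to 2 (as in the configs above), one Sentinel with a flaky network link cannot trigger a failover on its own.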
Leader Election via Raft Algorithm
Sentinels communicate to elect a leader responsible for orchestrating the failover. The Raft-based election follows a "first-come, first-accepted" principle: a Sentinel proposing leadership gets support unless already committed to another.
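The "first-come, first-accepted" rule can be sketched as follows (a simplified model with hypothetical class and method names, not the actual Sentinel wire protocol): within one election epoch, a Sentinel grants its vote to the first candidate that asks and refuses everyone else.

```python
# Simplified model of per-epoch vote granting in the leader election.

class SentinelVoter:
    def __init__(self):
        self.voted_for = {}  # epoch -> candidate ID we committed to

    def request_vote(self, epoch, candidate_id):
        # Grant the vote only if we have not yet voted in this epoch,
        # or we already voted for this same candidate.
        return self.voted_for.setdefault(epoch, candidate_id) == candidate_id

voter = SentinelVoter()
print(voter.request_vote(1, "sentinel-A"))  # True: first request wins the vote
print(voter.request_vote(1, "sentinel-B"))  # False: already committed to A
print(voter.request_vote(2, "sentinel-B"))  # True: new epoch, new vote
```

A candidate that collects a majority of such votes becomes the leader and carries out the failover alone, which is why an odd number of Sentinels avoids tied elections.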
Failover Execution Steps
New Master Selection
- Highest priority (slave-priority or replica-priority): lower values mean higher priority.
- Largest replication offset: indicates the most up-to-date data.
- Smallest run ID: a deterministic tiebreaker.
Promotion and Reconfiguration
- The elected leader executes slaveof no one on the chosen slave to promote it to master.
- It then sends slaveof <new-master-ip> <port> to the remaining slaves to reconfigure them.
Old Master Reintegration
- The former master is recorded as a slave of the new master.
- Upon recovery, it syncs from the new master and operates as a replica.
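The three selection criteria above amount to a lexicographic ordering over the candidate slaves. A sketch with hypothetical field names (real Sentinels read these values from INFO replication output):

```python
# Pick the new master: lowest replica-priority first, then largest
# replication offset, then smallest run ID as the final tiebreaker.

def select_new_master(replicas):
    return min(replicas,
               key=lambda r: (r["priority"], -r["repl_offset"], r["run_id"]))

replicas = [
    {"run_id": "ccc", "priority": 100, "repl_offset": 5000},
    {"run_id": "aaa", "priority": 100, "repl_offset": 7000},
    {"run_id": "bbb", "priority": 100, "repl_offset": 7000},
]
print(select_new_master(replicas)["run_id"])  # "aaa": best offset, then smallest ID
```

Negating the offset inside the sort key turns "largest offset wins" into a plain ascending comparison, so a single min call expresses all three rules.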
Best Practices
- Deploy multiple Sentinels in a cluster for redundancy.
- Use an odd number of Sentinel instances (e.g., 3, 5) to avoid tied votes during leader election.
- Maintain identical configurations across all Sentinels.
- Ensure correct port mapping when running in containers like Docker.
- Note: Sentinel + master-slave does not guarantee zero data loss during failover.