Managing Redis Cache Issues and Monitoring Key Metrics
Overview of Four Common Redis Cache Problems
| Problem | Symptom | Mitigation Approach | Remarks |
|---|---|---|---|
| Cache Warm-Up | Service crashes shortly after launch | Load hot entries first; accelerate loading; sync master-slave data | Requires routine hot-entry analysis |
| Cache Avalanche | Mass expiration of keys → DB overload | Multi-level cache; static page rendering; optimize queries; alerting + throttling + circuit breaker + isolation; vary TTLs; permanent keys; locking; delayed refresh; adjust eviction policy | Combine prevention & reaction strategies |
| Cache Breakdown | Sudden DB spike despite stable keys | Locking; pre-set TTL for likely hot keys; delayed refresh; secondary cache | Focus on specific high-risk keys |
| Cache Penetration | Gradual hit-rate drop + high CPU + DB pressure | Cache nulls; whitelist via bitmap/Bloom filter; encrypt keys; monitoring + blacklist | Use temporarily; remove when resolved |
Cache Warm-Up
Typical Scenario
A newly deployed application using Redis crashes quickly under load due to:
- High request volume
- Heavy master-slave sync traffic
- Frequent RDBMS reads
Warm-Up Process
Preparation: Identify hot entries continuously.
- Heuristic method: Log access frequency, extract frequently read items.
- Algorithmic method: Maintain retention queue using LRU (e.g., Storm + Kafka pipeline).
Steps:
- Classify entries by priority; preload high-priority items into Redis.
- Parallelize loading across distributed nodes to shorten duration.
- Preload both master and replica instances.
Execution:
- Trigger warm-up via scheduled scripts.
- Optionally integrate CDN for better delivery.
Summary: Preloading critical entries avoids initial DB queries, letting users hit ready-to-serve cache immediately.
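The warm-up flow above can be sketched in a few lines. This is a minimal illustration, not a production loader: `db_fetch` stands in for the RDBMS read and the `cache` dict stands in for Redis; in practice you would swap in real clients and distribute the work across nodes.

```python
from concurrent.futures import ThreadPoolExecutor

cache = {}  # stand-in for Redis

def db_fetch(key):
    """Stand-in for the RDBMS read that warm-up is meant to avoid at runtime."""
    return f"value-of-{key}"

def preload(keys, workers=4):
    """Load a batch of entries into the cache in parallel, before traffic arrives."""
    def load_one(key):
        cache[key] = db_fetch(key)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(load_one, keys))  # drain the iterator so all loads finish

# Classify by priority: hottest entries are loaded first.
hot_keys = ["item:1001", "item:1002"]
warm_keys = ["item:2001"]
preload(hot_keys)
preload(warm_keys)
```

A scheduled script would run this against each master and replica before cutover.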
Cache Avalanche
Scenario
During steady operation, DB connections surge causing:
- Client errors: 408 (timeout), 500 (server error)
- Server collapse: DB, app, Redis, and cluster failures even after restart
Root Cause: Many keys expire simultaneously, forcing mass DB fetches that overwhelm the DB and cascade into broader failure.
Two origins:
- Cache layer failure → all requests hit DB.
- Bulk expiration of popular keys → direct DB hits.
Preventive Measures
- Static rendering of high-traffic pages.
- Multi-tier caching: User → HTTP cache → CDN → proxy cache → local process cache → distributed cache → DB.
- Optimize slow DB operations (long queries, heavy transactions).
- Alerting system: track CPU usage, memory, avg response time, thread count; apply throttling or degradation to shed excess load temporarily.
Tiered Cache Characteristics:
- HTTP + CDN: serve static assets efficiently.
- Proxy cache: stable dynamic resources.
- Local process cache (Ehcache, Guava, Caffeine): fast but limited; sync via MQ or timer.
- Distributed cache (Redis cluster): large scale, robust.
Resilience Patterns:
- Circuit breaker: reroute traffic away from a faulty cache node.
- Throttling: limit incoming requests at edge/proxy.
- Isolation: queue requests when cache rebuilding/preheating.
Reactive Strategies
- Mix LRU/LFU eviction policies.
- Stagger TTLs: e.g., group A = 90min, B = 80min, C = 70min; add random offset to spread expirations.
- Permanent keys for super-hot entries.
- Scheduled maintenance: analyze near-expiry access patterns, extend TTL where needed.
- Locking (use cautiously): single-threaded refresh with snapshot rebuild; primary-replica failover.
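The TTL-staggering idea above is easy to express in code. A minimal sketch, using the example groups from the text (90/80/70 minutes) plus a random offset; the group names and jitter window are illustrative values, not fixed recommendations.

```python
import random

# Base TTLs in seconds, one per priority group (from the 90/80/70-minute example).
BASE_TTLS = {"A": 90 * 60, "B": 80 * 60, "C": 70 * 60}

def staggered_ttl(group, jitter=300):
    """Base TTL for the group plus a random offset, so keys in the same
    batch do not all expire at the same instant."""
    return BASE_TTLS[group] + random.randint(0, jitter)

ttl = staggered_ttl("A")  # somewhere in [5400, 5700] seconds
```

With a real client the result would be passed as the expiry argument when the key is written.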
Summary: Avalanche stems from concentrated expirations flooding DB. Spread TTLs and combine with layered defenses plus real-time metrics tuning.
Cache Breakdown
Scenario
System runs normally, no mass key expiry, yet DB load spikes and crashes—common with viral products.
Diagnosis:
- Specific hot key expires.
- Multiple requests miss Redis and hammer DB for same record.
Cause: Single high-traffic key expiry event.
Mitigation
- Predictive TTL: identify likely hot keys (e.g., flash-sale items) and set suitable expiry.
- Live adjustment: monitor access frequency, extend TTL or make permanent during surges.
- Background renewal: refresh TTL before peak periods.
- Secondary cache: use different expiry to avoid simultaneous invalidation.
- Distributed lock: prevent concurrent DB loads on miss (mind performance impact).
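The lock-on-miss pattern from the last bullet can be sketched with a local lock standing in for a distributed one (the single-process `threading.Lock` here would be a Redis-based lock such as SETNX in a real deployment). The double-check after acquiring the lock is what ensures only one caller reloads from the DB.

```python
import threading

cache = {}
rebuild_lock = threading.Lock()  # stand-in for a distributed lock
db_loads = 0

def db_load(key):
    """Stand-in for the expensive DB query behind the cache."""
    global db_loads
    db_loads += 1
    return f"db-value:{key}"

def get_with_lock(key):
    """On a miss, only one caller rebuilds the entry; the rest reuse it."""
    value = cache.get(key)
    if value is not None:
        return value
    with rebuild_lock:
        # Double-check: another thread may have rebuilt while we waited.
        value = cache.get(key)
        if value is None:
            value = db_load(key)
            cache[key] = value
        return value

# Ten concurrent readers of the same hot key trigger a single DB load.
threads = [threading.Thread(target=get_with_lock, args=("hot-item",))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The performance caveat in the text applies: every miss serializes on the lock, so reserve this for genuinely hot keys.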
Summary: Breakdown is a single-key expiry under high concurrency. Prevent via data analysis, live monitoring, and layered cache design.
Cache Penetration
Scenario
Hit-rate declines over time, CPU usage rises, DB overloaded despite stable Redis memory—often from malicious or bogus requests.
Diagnosis:
- Widespread cache misses.
- Requests for nonexistent keys or attack URLs.
Cause: Queries for absent data return null; nulls aren’t cached, so DB is repeatedly queried.
Solutions
- Cache nulls briefly (30–300s) as interim fix.
- Whitelist known valid IDs using bitmaps or Bloom filters (more efficient than plain bitmaps).
- Monitoring + blacklist: flag abnormal hit-rate drops or null-ratio surges; apply blacklists during attacks.
- Encrypt keys: validate at app edge to block malformed requests.
Summary: Penetration targets data that does not exist, bypassing the cache entirely. Apply temporary shields such as blacklists and whitelists, and remove them once the threat subsides.
Redis Performance Monitoring Metrics
Performance
| Metric | Meaning | Note |
|---|---|---|
| Latency | Response time per request | |
| instantaneous_ops_per_sec | Average QPS | |
| Hit rate | Cache efficiency | Low rate signals stress or a poor expiry strategy |
Memory
| Metric | Meaning |
|---|---|
| used_memory | Memory consumed |
| mem_fragmentation_ratio | Fragmentation level |
| evicted_keys | Keys removed due to maxmemory limit |
| blocked_clients | Clients stalled on blocking list commands (BRPOP, etc.) |
Activity
| Metric | Meaning |
|---|---|
| connected_clients | Active client connections |
| connected_slaves | Replica count |
| master_last_io_seconds_ago | Seconds since last master-slave interaction |
| keyspace | Total keys in DB; sudden drops may precede avalanche |
Persistence
| Metric | Meaning |
|---|---|
| rdb_last_save_time | Timestamp of last RDB save |
| rdb_changes_since_last_save | Count of writes since last save |
Errors
| Metric | Meaning |
|---|---|
| rejected_connections | Connections denied by the maxclients limit |
| keyspace_misses | Cache misses |
| master_link_down_since_seconds | Duration of master-slave disconnect |
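The hit rate in the performance table is not reported directly; it is derived from two INFO counters, `keyspace_hits` and `keyspace_misses`. A minimal sketch with hypothetical counter values:

```python
# Hypothetical values as they would appear in the stats section of INFO.
info = {"keyspace_hits": 9200, "keyspace_misses": 800}

def hit_rate(stats):
    """Hit rate = hits / (hits + misses); Redis only exposes the raw counters."""
    total = stats["keyspace_hits"] + stats["keyspace_misses"]
    return stats["keyspace_hits"] / total if total else 0.0

rate = hit_rate(info)  # 9200 / 10000 = 0.92
```

Tracking this ratio over time is what surfaces the gradual decline described under cache penetration.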
Monitoring Tools & Commands
Tools: Cloud Insight Redis, Prometheus, Redis-stat, Redis-faina, RedisLive, Zabbix.
Commands: redis-benchmark, redis-cli, monitor, slowlog.
Configure slow log:

```
slowlog-log-slower-than 1000   # microseconds
slowlog-max-len 100            # max entries
```

Retrieve slow log info:

```
slowlog get     # fetch entries
slowlog len     # entry count
slowlog reset   # clear log
```
Bloom Filter
Use case: Fast duplicate username check at registration.
Definition: Space-efficient probabilistic structure combining a bit array and multiple hash functions to test set membership.
Traits:
- A fixed-size filter can accept unlimited elements, but the false-positive rate rises as it fills; once every bit is 1, all queries appear present.
- Can yield false positives but not false negatives.
- Does not support deletion.
How It Works
Add element:
- Compute K hashes of the value.
- Map each hash to an index in the bit array; set those bits to 1.
Check existence:
- Recompute the K hashes.
- If all corresponding bits are 1 → possibly present; any bit 0 → definitely absent.
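The add/check steps above can be sketched directly. This is a minimal illustration, not a production filter: the K hash functions are derived from salted sha256 digests (an assumption for simplicity; real implementations typically use faster non-cryptographic hashes), and the size and K are arbitrary small values.

```python
import hashlib

class BloomFilter:
    """Minimal sketch: a bit array plus K hash functions."""

    def __init__(self, size=1024, k=3):
        self.size = size
        self.k = k
        self.bits = bytearray(size // 8 + 1)

    def _indexes(self, value):
        # Derive K indexes by salting the value with the hash number.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, value):
        # Set the bit at each of the K positions.
        for idx in self._indexes(value):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, value):
        # Any unset bit -> definitely absent; all set -> possibly present.
        return all(self.bits[i // 8] & (1 << (i % 8))
                   for i in self._indexes(value))
```

For the registration use case, every taken username is added at startup; a `might_contain` miss guarantees the name is free, while a hit requires a confirming DB check because of possible false positives.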