Rate Limiting: Principles, Algorithms, and Practical Implementations
Introduction
In high-concurrency scenarios, three key mechanisms are essential to ensure service stability: caching, circuit breaking/degradation, and rate limiting. Rate limiting serves as a core self-protection mechanism that prevents system collapse under extreme concurrency, especially when other measures are insufficient, thereby avoiding cascading failures.
Consider this scenario:
- QPS (Queries Per Second) is the number of requests processed per second.
- A single server can handle a maximum QPS of 100.
- A cluster of 5 such servers normally receives about 300 QPS in total.
Under normal conditions, each server handles 60 QPS (300/5), which is well within capacity.
Now, suppose a promotion increases QPS to 800. Without rate limiting:
- Server A receives 160 QPS, exceeding its limit and causing a crash.
- The remaining 4 servers now handle 200 QPS each (800/4), leading to sequential failures and a system-wide collapse.
With rate limiting:
- Server A is capped at 100 QPS, rejecting 60 excess requests per second but remaining operational.
- The cluster continues to function, providing time for developers to implement scaling or degradation strategies.
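The arithmetic behind this scenario can be sketched in a few lines of Java. The numbers and class name are the hypothetical ones from the example above, not a real implementation:

```java
public class LoadSketch {
    static final int SERVERS = 5;
    static final int CAPACITY_PER_SERVER = 100;

    // Load each server sees when total QPS is spread evenly across the cluster.
    static int perServerLoad(int totalQps) {
        return totalQps / SERVERS;
    }

    // With a per-server cap, each server serves at most its capacity...
    static int served(int totalQps) {
        return Math.min(perServerLoad(totalQps), CAPACITY_PER_SERVER);
    }

    // ...and rejects the excess instead of crashing.
    static int rejected(int totalQps) {
        return Math.max(0, perServerLoad(totalQps) - CAPACITY_PER_SERVER);
    }

    public static void main(String[] args) {
        System.out.println("Normal 300 QPS: " + perServerLoad(300) + " per server");   // 60
        System.out.println("Spike 800 QPS: served " + served(800)
                + ", rejected " + rejected(800) + " per server");                      // 100, 60
    }
}
```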
Thus, rate limiting is critical for system resilience, though many engineers may not fully understand its underlying principles. Key algorithms include: counter, sliding window, leaky bucket, and token bucket. Common implementations involve Guava's RateLimiter, distributed lock-based token buckets, and Alibaba Sentinel.
What Is Rate Limiting?
Rate limiting can be seen as a form of service degradation: it restricts input and output traffic to protect system integrity. By capping throughput at a predetermined threshold, it prevents overload through measures such as delaying requests, rejecting them outright, or denying part of the traffic.
Rate Limiting Algorithms
Counter
Implementation: Counts requests within a fixed time window and rejects any request once the count exceeds a threshold; the counter resets when a new window begins.
import java.util.concurrent.atomic.AtomicInteger;

public class FixedWindowRateLimiter {
    private final int maxRequests = 10;
    private final long windowMillis = 1000;
    private long windowStartTime = System.currentTimeMillis();
    private final AtomicInteger requestCount = new AtomicInteger(0);

    public synchronized boolean allowRequest() {
        long currentTime = System.currentTimeMillis();
        if (currentTime >= windowStartTime + windowMillis) {
            // Start a new window; the current request is counted against it below
            windowStartTime = currentTime;
            requestCount.set(0);
        }
        // Admit only up to maxRequests within the current window
        return requestCount.incrementAndGet() <= maxRequests;
    }
}
Limitation: Burst requests at window boundaries can exceed the intended limit. For example, with a limit of 60 requests per minute, 59 requests arriving at 00:59 and 60 more at 01:00 yield 119 requests within about 2 seconds, far exceeding the intended 1-request-per-second average.
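This boundary effect is easy to reproduce. The sketch below is a hypothetical variant of the counter that takes the clock as a method parameter so the scenario can be replayed deterministically; the class name and injected clock are assumptions for illustration, a production version would read System.currentTimeMillis():

```java
public class FixedWindowDemo {
    private final int maxRequests;
    private final long windowMillis;
    private long windowStart = Long.MIN_VALUE;
    private int count;

    public FixedWindowDemo(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean allowRequest(long nowMillis) {
        if (nowMillis >= windowStart + windowMillis) {
            windowStart = nowMillis - nowMillis % windowMillis; // align to window boundary
            count = 0;                                          // reset for the new window
        }
        return ++count <= maxRequests;
    }

    public static void main(String[] args) {
        FixedWindowDemo limiter = new FixedWindowDemo(60, 60_000); // 60 requests per minute
        int passed = 0;
        // 59 requests at 00:59, just before the window rolls over...
        for (int i = 0; i < 59; i++) if (limiter.allowRequest(59_000)) passed++;
        // ...and 60 more at 01:00, just after the counter resets.
        for (int i = 0; i < 60; i++) if (limiter.allowRequest(60_000)) passed++;
        System.out.println(passed + " requests passed within ~1 second"); // 119
    }
}
```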
Sliding Window
Implementation: Tracks the timestamp of each request and counts only those inside the trailing window (a sliding log), which gives finer control than a fixed window.
import java.util.concurrent.ConcurrentLinkedQueue;

public class SlidingWindowRateLimiter {
    private final ConcurrentLinkedQueue<Long> requestTimestamps = new ConcurrentLinkedQueue<>();
    private final int windowSeconds;
    private final int maxRequests;

    public SlidingWindowRateLimiter(int windowSeconds, int maxRequests) {
        this.windowSeconds = windowSeconds;
        this.maxRequests = maxRequests;
        startCleanupThread();
    }

    // Periodically evicts expired timestamps so the queue does not grow unbounded
    private void startCleanupThread() {
        Thread cleaner = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(windowSeconds * 1000L);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                evictExpiredEntries();
            }
        });
        cleaner.setDaemon(true); // do not keep the JVM alive just for cleanup
        cleaner.start();
    }

    private void evictExpiredEntries() {
        long cutoffTime = System.currentTimeMillis() - windowSeconds * 1000L;
        Long timestamp;
        while ((timestamp = requestTimestamps.peek()) != null && timestamp < cutoffTime) {
            requestTimestamps.poll();
        }
    }

    public boolean tryAcquire() {
        long currentTime = System.currentTimeMillis();
        synchronized (requestTimestamps) {
            evictExpiredEntries(); // drop stale entries before counting
            if (countValidRequests() >= maxRequests) {
                return false;
            }
            requestTimestamps.offer(currentTime);
            return true;
        }
    }

    private int countValidRequests() {
        long cutoffTime = System.currentTimeMillis() - windowSeconds * 1000L;
        int validCount = 0;
        for (Long timestamp : requestTimestamps) {
            if (timestamp > cutoffTime) {
                validCount++;
            }
        }
        return validCount;
    }
}
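Replaying the fixed-window boundary scenario against a sliding log shows the improvement: the trailing window still "sees" the earlier requests, so the burst is rejected. The sketch below is a hypothetical, simplified single-threaded variant with an injected clock (class name and clock parameter are assumptions for illustration):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SlidingWindowDemo {
    private final Deque<Long> timestamps = new ArrayDeque<>();
    private final long windowMillis;
    private final int maxRequests;

    public SlidingWindowDemo(long windowMillis, int maxRequests) {
        this.windowMillis = windowMillis;
        this.maxRequests = maxRequests;
    }

    public synchronized boolean tryAcquire(long nowMillis) {
        // Evict entries that have fallen out of the trailing window.
        long cutoff = nowMillis - windowMillis;
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= cutoff) {
            timestamps.pollFirst();
        }
        if (timestamps.size() >= maxRequests) {
            return false; // window is full
        }
        timestamps.addLast(nowMillis);
        return true;
    }

    public static void main(String[] args) {
        SlidingWindowDemo limiter = new SlidingWindowDemo(60_000, 60);
        int passed = 0;
        for (int i = 0; i < 59; i++) if (limiter.tryAcquire(59_000)) passed++;
        for (int i = 0; i < 60; i++) if (limiter.tryAcquire(60_000)) passed++;
        // Only 60 of 119 pass: at 01:00 the window still contains the 59 earlier requests.
        System.out.println(passed + " of 119 requests passed");
    }
}
```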
Leaky Bucket
Implementation: Uses a bucket with fixed capacity; requests enter at variable rates but exit at a controlled constant rate.
public class LeakyBucketRateLimiter {
    private long lastUpdateTime;
    private final double capacity;    // maximum volume the bucket can hold
    private final double outflowRate; // requests drained per second
    private double currentVolume;

    public LeakyBucketRateLimiter(double capacity, double outflowRate) {
        this.capacity = capacity;
        this.outflowRate = outflowRate;
        this.lastUpdateTime = System.currentTimeMillis();
        this.currentVolume = 0;
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        double elapsedSeconds = (now - lastUpdateTime) / 1000.0;
        // Drain the bucket at a constant rate for the elapsed time
        currentVolume = Math.max(0, currentVolume - elapsedSeconds * outflowRate);
        lastUpdateTime = now;
        if (currentVolume + 1 <= capacity) {
            currentVolume++; // the request fits in the bucket
            return true;
        }
        return false; // bucket full: the request overflows and is rejected
    }
}
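The defining property of the leaky bucket is its constant outflow: once the bucket is full, further requests are rejected until enough time has passed for the bucket to drain. The sketch below is a hypothetical variant with an injected clock so the behavior can be replayed deterministically (class name and clock parameter are assumptions; the version above reads System.currentTimeMillis()):

```java
public class LeakyBucketDemo {
    private final double capacity;         // maximum volume
    private final double outflowRatePerSec; // drain rate, requests per second
    private double volume;
    private long lastUpdate;

    public LeakyBucketDemo(double capacity, double outflowRatePerSec) {
        this.capacity = capacity;
        this.outflowRatePerSec = outflowRatePerSec;
    }

    public synchronized boolean allowRequest(long nowMillis) {
        // Drain the bucket for the time elapsed since the last call.
        volume = Math.max(0, volume - (nowMillis - lastUpdate) / 1000.0 * outflowRatePerSec);
        lastUpdate = nowMillis;
        if (volume + 1 <= capacity) {
            volume++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        LeakyBucketDemo bucket = new LeakyBucketDemo(2, 1); // holds 2, drains 1/s
        System.out.println(bucket.allowRequest(0));     // true  (volume 1)
        System.out.println(bucket.allowRequest(0));     // true  (volume 2, full)
        System.out.println(bucket.allowRequest(0));     // false (would overflow)
        System.out.println(bucket.allowRequest(1_000)); // true  (1 unit drained in 1s)
    }
}
```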
Token Bucket
Implementation: Tokens are added to a bucket at a fixed rate; each request consumes a token, and requests are denied if the bucket is empty.
public class TokenBucketRateLimiter {
    private long lastRefillTime;
    private final double maxTokens;  // bucket capacity, i.e. maximum burst size
    private final double refillRate; // tokens added per second
    private double availableTokens;

    public TokenBucketRateLimiter(double maxTokens, double refillRate) {
        this.maxTokens = maxTokens;
        this.refillRate = refillRate;
        this.lastRefillTime = System.currentTimeMillis();
        this.availableTokens = maxTokens; // start full, so an initial burst is allowed
    }

    public synchronized boolean allowRequest() {
        refillTokens();
        if (availableTokens >= 1) {
            availableTokens--; // consume one token for this request
            return true;
        }
        return false; // bucket empty: reject
    }

    private void refillTokens() {
        long now = System.currentTimeMillis();
        double elapsedSeconds = (now - lastRefillTime) / 1000.0;
        availableTokens = Math.min(maxTokens, availableTokens + elapsedSeconds * refillRate);
        lastRefillTime = now;
    }
}
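Unlike the leaky bucket, a token bucket permits bursts up to its capacity and then throttles to the refill rate. The sketch below is a hypothetical variant with an injected clock for deterministic replay (class name and clock parameter are assumptions; the version above reads System.currentTimeMillis()):

```java
public class TokenBucketDemo {
    private final double maxTokens;    // capacity = maximum burst size
    private final double refillPerSec; // tokens added per second
    private double tokens;
    private long lastRefill;

    public TokenBucketDemo(double maxTokens, double refillPerSec) {
        this.maxTokens = maxTokens;
        this.refillPerSec = refillPerSec;
        this.tokens = maxTokens; // starts full, so an initial burst is allowed
    }

    public synchronized boolean allowRequest(long nowMillis) {
        // Refill proportionally to elapsed time, capped at capacity.
        tokens = Math.min(maxTokens, tokens + (nowMillis - lastRefill) / 1000.0 * refillPerSec);
        lastRefill = nowMillis;
        if (tokens >= 1) {
            tokens--;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TokenBucketDemo bucket = new TokenBucketDemo(3, 1); // burst of 3, refill 1/s
        for (int i = 0; i < 3; i++) {
            System.out.println(bucket.allowRequest(0));     // true: burst consumes stored tokens
        }
        System.out.println(bucket.allowRequest(0));         // false: bucket empty
        System.out.println(bucket.allowRequest(1_000));     // true: one token refilled after 1s
    }
}
```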
Practical Implementations
Spring Cloud Gateway
Spring Cloud Gateway ships a RequestRateLimiter filter whose default RedisRateLimiter implements a token bucket backed by Redis, so limits are shared across gateway instances.
Dependencies:
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-gateway</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis-reactive</artifactId>
</dependency>
Configuration:
spring:
  cloud:
    gateway:
      routes:
        - id: rate_limited_route
          uri: lb://service-name
          predicates:
            - Path=/api/**
          filters:
            - name: RequestRateLimiter
              args:
                redis-rate-limiter.replenishRate: 1  # Tokens added per second
                redis-rate-limiter.burstCapacity: 3  # Maximum burst size
                key-resolver: "#{@clientAddressResolver}"
Key Resolver Bean:
@Bean
public KeyResolver clientAddressResolver() {
    // Rate-limit per client IP address
    return exchange -> Mono.just(
            exchange.getRequest().getRemoteAddress().getAddress().getHostAddress());
}
Alibaba Sentinel
Sentinel provides dynamic flow control rules.
Dependency:
<dependency>
    <groupId>com.alibaba.cloud</groupId>
    <artifactId>spring-cloud-starter-alibaba-sentinel</artifactId>
</dependency>
Configuration:
spring:
  cloud:
    sentinel:
      transport:
        dashboard: localhost:8080
        port: 8720
      datasource:
        flow:
          nacos:
            server-addr: localhost:8848
            dataId: sentinel-rules
            groupId: DEFAULT_GROUP
            rule-type: flow
Rule Definition (Nacos):
[
  {
    "resource": "/api/endpoint",
    "limitApp": "default",
    "grade": 1,
    "count": 10,
    "strategy": 0,
    "controlBehavior": 0,
    "clusterMode": false
  }
]
Parameters:
- resource: Target endpoint.
- limitApp: Call source (default applies to all callers).
- grade: Threshold type (0 for concurrent threads, 1 for QPS).
- count: Limit threshold.
- strategy: Rule basis (0 for the resource itself, 1 for a related resource, 2 for the call chain).
- controlBehavior: Action on reaching the limit (0 for direct rejection, 1 for queueing, 2 for warm-up).
- clusterMode: Whether the rule applies cluster-wide.