Rate Limiting: Principles, Algorithms, and Practical Implementations
Introduction
In high-concurrency scenarios, three key mechanisms are essential to ensure service stability: caching, circuit breaking/degradation, and rate limiting. Rate limiting serves as a core self-protection mechanism that prevents system collapse under extreme concurrency, especially when other measures are insufficient, thereby avoiding cascading failures.
Consider this scenario:
- QPS (Queries Per Second) is the number of requests processed per second.
- A single server can handle a maximum QPS of 100.
- A cluster of 5 such servers normally receives about 300 QPS in total.
Under normal conditions, each server handles 60 QPS (300/5), which is well within capacity.
Now, suppose a promotion increases QPS to 800. Without rate limiting:
- Server A receives 160 QPS, exceeding its limit and causing a crash.
- The remaining 4 servers now handle 200 QPS each (800/4), leading to sequential failures and a system-wide collapse.
With rate limiting:
- Server A is capped at 100 QPS, rejecting 60 excess requests per second but remaining operational.
- The cluster continues to function, providing time for developers to implement scaling or degradation strategies.
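The arithmetic behind this scenario can be sketched in a few lines of Java. The numbers and class name are the hypothetical ones from the example above, not a real implementation:

```java
public class LoadSketch {
    static final int SERVERS = 5;
    static final int CAPACITY_PER_SERVER = 100;

    // Load each server sees when total QPS is spread evenly across the cluster.
    static int perServerLoad(int totalQps) {
        return totalQps / SERVERS;
    }

    // With a per-server cap, each server serves at most its capacity...
    static int served(int totalQps) {
        return Math.min(perServerLoad(totalQps), CAPACITY_PER_SERVER);
    }

    // ...and rejects the excess instead of crashing.
    static int rejected(int totalQps) {
        return Math.max(0, perServerLoad(totalQps) - CAPACITY_PER_SERVER);
    }

    public static void main(String[] args) {
        System.out.println("Normal 300 QPS: " + perServerLoad(300) + " per server");   // 60
        System.out.println("Spike 800 QPS: served " + served(800)
                + ", rejected " + rejected(800) + " per server");                      // 100, 60
    }
}
```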
Thus, rate limiting is critical for system resilience, though many engineers may not fully understand its underlying principles. Key algorithms include: counter, sliding window, leaky bucket, and token bucket. Common implementations involve Guava's RateLimiter, distributed lock-based token buckets, and Alibaba Sentinel.
What Is Rate Limiting?
Rate limiting can be seen as a form of service degradation: it restricts input and output traffic to protect system integrity. By capping throughput at a predetermined threshold, it prevents overload through measures such as delaying requests, rejecting them outright, or denying part of the traffic.
Rate Limiting Algorithms
Counter
Implementation: Counts requests within a fixed time window and rejects any request once the count exceeds a threshold; the counter resets when a new window begins.
import java.util.concurrent.atomic.AtomicInteger;

public class FixedWindowRateLimiter {
    private final int maxRequests = 10;
    private final long windowMillis = 1000;
    private long windowStartTime = System.currentTimeMillis();
    private final AtomicInteger requestCount = new AtomicInteger(0);

    public synchronized boolean allowRequest() {
        long currentTime = System.currentTimeMillis();
        if (currentTime >= windowStartTime + windowMillis) {
            // Start a new window; the current request is counted against it below
            windowStartTime = currentTime;
            requestCount.set(0);
        }
        // Admit only up to maxRequests within the current window
        return requestCount.incrementAndGet() <= maxRequests;
    }
}
Limitation: Burst requests at window boundaries can exceed the intended limit. For example, with a limit of 60 requests per minute, 59 requests arriving at 00:59 and 60 more at 01:00 yield 119 requests within about 2 seconds, far exceeding the intended 1-request-per-second average.
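This boundary effect is easy to reproduce. The sketch below is a hypothetical variant of the counter that takes the clock as a method parameter so the scenario can be replayed deterministically; the class name and injected clock are assumptions for illustration, a production version would read System.currentTimeMillis():

```java
public class FixedWindowDemo {
    private final int maxRequests;
    private final long windowMillis;
    private long windowStart = Long.MIN_VALUE;
    private int count;

    public FixedWindowDemo(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean allowRequest(long nowMillis) {
        if (nowMillis >= windowStart + windowMillis) {
            windowStart = nowMillis - nowMillis % windowMillis; // align to window boundary
            count = 0;                                          // reset for the new window
        }
        return ++count <= maxRequests;
    }

    public static void main(String[] args) {
        FixedWindowDemo limiter = new FixedWindowDemo(60, 60_000); // 60 requests per minute
        int passed = 0;
        // 59 requests at 00:59, just before the window rolls over...
        for (int i = 0; i < 59; i++) if (limiter.allowRequest(59_000)) passed++;
        // ...and 60 more at 01:00, just after the counter resets.
        for (int i = 0; i < 60; i++) if (limiter.allowRequest(60_000)) passed++;
        System.out.println(passed + " requests passed within ~1 second"); // 119
    }
}
```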
Sliding Window
Implementation: Tracks the timestamp of each request and counts only those inside the trailing window (a sliding log), which gives finer control than a fixed window.
import java.util.concurrent.ConcurrentLinkedQueue;

public class SlidingWindowRateLimiter {
    private final ConcurrentLinkedQueue<Long> requestTimestamps = new ConcurrentLinkedQueue<>();
    private final int windowSeconds;
    private final int maxRequests;

    public SlidingWindowRateLimiter(int windowSeconds, int maxRequests) {
        this.windowSeconds = windowSeconds;
        this.maxRequests = maxRequests;
        startCleanupThread();
    }

    // Periodically evicts expired timestamps so the queue does not grow unbounded
    private void startCleanupThread() {
        Thread cleaner = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(windowSeconds * 1000L);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                evictExpiredEntries();
            }
        });
        cleaner.setDaemon(true); // do not keep the JVM alive just for cleanup
        cleaner.start();
    }

    private void evictExpiredEntries() {
        long cutoffTime = System.currentTimeMillis() - windowSeconds * 1000L;
        Long timestamp;
        while ((timestamp = requestTimestamps.peek()) != null && timestamp < cutoffTime) {
            requestTimestamps.poll();
        }
    }

    public boolean tryAcquire() {
        long currentTime = System.currentTimeMillis();
        synchronized (requestTimestamps) {
            evictExpiredEntries(); // drop stale entries before counting
            if (countValidRequests() >= maxRequests) {
                return false;
            }
            requestTimestamps.offer(currentTime);
            return true;
        }
    }

    private int countValidRequests() {
        long cutoffTime = System.currentTimeMillis() - windowSeconds * 1000L;
        int validCount = 0;
        for (Long timestamp : requestTimestamps) {
            if (timestamp > cutoffTime) {
                validCount++;
            }
        }
        return validCount;
    }
}
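Replaying the fixed-window boundary scenario against a sliding log shows the improvement: the trailing window still "sees" the earlier requests, so the burst is rejected. The sketch below is a hypothetical, simplified single-threaded variant with an injected clock (class name and clock parameter are assumptions for illustration):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SlidingWindowDemo {
    private final Deque<Long> timestamps = new ArrayDeque<>();
    private final long windowMillis;
    private final int maxRequests;

    public SlidingWindowDemo(long windowMillis, int maxRequests) {
        this.windowMillis = windowMillis;
        this.maxRequests = maxRequests;
    }

    public synchronized boolean tryAcquire(long nowMillis) {
        // Evict entries that have fallen out of the trailing window.
        long cutoff = nowMillis - windowMillis;
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= cutoff) {
            timestamps.pollFirst();
        }
        if (timestamps.size() >= maxRequests) {
            return false; // window is full
        }
        timestamps.addLast(nowMillis);
        return true;
    }

    public static void main(String[] args) {
        SlidingWindowDemo limiter = new SlidingWindowDemo(60_000, 60);
        int passed = 0;
        for (int i = 0; i < 59; i++) if (limiter.tryAcquire(59_000)) passed++;
        for (int i = 0; i < 60; i++) if (limiter.tryAcquire(60_000)) passed++;
        // Only 60 of 119 pass: at 01:00 the window still contains the 59 earlier requests.
        System.out.println(passed + " of 119 requests passed");
    }
}
```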
Leaky Bucket
Implementation: Uses a bucket with fixed capacity; requests enter at variable rates but exit at a controlled constant rate.
public class LeakyBucketRateLimiter {
    private long lastUpdateTime;
    private final double capacity;    // maximum volume the bucket can hold
    private final double outflowRate; // requests drained per second
    private double currentVolume;

    public LeakyBucketRateLimiter(double capacity, double outflowRate) {
        this.capacity = capacity;
        this.outflowRate = outflowRate;
        this.lastUpdateTime = System.currentTimeMillis();
        this.currentVolume = 0;
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        double elapsedSeconds = (now - lastUpdateTime) / 1000.0;
        // Drain the bucket at a constant rate for the elapsed time
        currentVolume = Math.max(0, currentVolume - elapsedSeconds * outflowRate);
        lastUpdateTime = now;
        if (currentVolume + 1 <= capacity) {
            currentVolume++; // the request fits in the bucket
            return true;
        }
        return false; // bucket full: the request overflows and is rejected
    }
}
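The defining property of the leaky bucket is its constant outflow: once the bucket is full, further requests are rejected until enough time has passed for the bucket to drain. The sketch below is a hypothetical variant with an injected clock so the behavior can be replayed deterministically (class name and clock parameter are assumptions; the version above reads System.currentTimeMillis()):

```java
public class LeakyBucketDemo {
    private final double capacity;         // maximum volume
    private final double outflowRatePerSec; // drain rate, requests per second
    private double volume;
    private long lastUpdate;

    public LeakyBucketDemo(double capacity, double outflowRatePerSec) {
        this.capacity = capacity;
        this.outflowRatePerSec = outflowRatePerSec;
    }

    public synchronized boolean allowRequest(long nowMillis) {
        // Drain the bucket for the time elapsed since the last call.
        volume = Math.max(0, volume - (nowMillis - lastUpdate) / 1000.0 * outflowRatePerSec);
        lastUpdate = nowMillis;
        if (volume + 1 <= capacity) {
            volume++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        LeakyBucketDemo bucket = new LeakyBucketDemo(2, 1); // holds 2, drains 1/s
        System.out.println(bucket.allowRequest(0));     // true  (volume 1)
        System.out.println(bucket.allowRequest(0));     // true  (volume 2, full)
        System.out.println(bucket.allowRequest(0));     // false (would overflow)
        System.out.println(bucket.allowRequest(1_000)); // true  (1 unit drained in 1s)
    }
}
```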
Token Bucket
Implementation: Tokens are added to a bucket at a fixed rate; each request consumes a token, and requests are denied if the bucket is empty.
public class TokenBucketRateLimiter {
    private long lastRefillTime;
    private final double maxTokens;  // bucket capacity, i.e. maximum burst size
    private final double refillRate; // tokens added per second
    private double availableTokens;

    public TokenBucketRateLimiter(double maxTokens, double refillRate) {
        this.maxTokens = maxTokens;
        this.refillRate = refillRate;
        this.lastRefillTime = System.currentTimeMillis();
        this.availableTokens = maxTokens; // start full, so an initial burst is allowed
    }

    public synchronized boolean allowRequest() {
        refillTokens();
        if (availableTokens >= 1) {
            availableTokens--; // consume one token for this request
            return true;
        }
        return false; // bucket empty: reject
    }

    private void refillTokens() {
        long now = System.currentTimeMillis();
        double elapsedSeconds = (now - lastRefillTime) / 1000.0;
        availableTokens = Math.min(maxTokens, availableTokens + elapsedSeconds * refillRate);
        lastRefillTime = now;
    }
}
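Unlike the leaky bucket, a token bucket permits bursts up to its capacity and then throttles to the refill rate. The sketch below is a hypothetical variant with an injected clock for deterministic replay (class name and clock parameter are assumptions; the version above reads System.currentTimeMillis()):

```java
public class TokenBucketDemo {
    private final double maxTokens;    // capacity = maximum burst size
    private final double refillPerSec; // tokens added per second
    private double tokens;
    private long lastRefill;

    public TokenBucketDemo(double maxTokens, double refillPerSec) {
        this.maxTokens = maxTokens;
        this.refillPerSec = refillPerSec;
        this.tokens = maxTokens; // starts full, so an initial burst is allowed
    }

    public synchronized boolean allowRequest(long nowMillis) {
        // Refill proportionally to elapsed time, capped at capacity.
        tokens = Math.min(maxTokens, tokens + (nowMillis - lastRefill) / 1000.0 * refillPerSec);
        lastRefill = nowMillis;
        if (tokens >= 1) {
            tokens--;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TokenBucketDemo bucket = new TokenBucketDemo(3, 1); // burst of 3, refill 1/s
        for (int i = 0; i < 3; i++) {
            System.out.println(bucket.allowRequest(0));     // true: burst consumes stored tokens
        }
        System.out.println(bucket.allowRequest(0));         // false: bucket empty
        System.out.println(bucket.allowRequest(1_000));     // true: one token refilled after 1s
    }
}
```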
Practical Implementations
Spring Cloud Gateway
Spring Cloud Gateway ships a RequestRateLimiter filter whose default RedisRateLimiter implements a token bucket backed by Redis, so limits are shared across gateway instances.
Dependencies:
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-gateway</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis-reactive</artifactId>
</dependency>
Configuration:
spring:
  cloud:
    gateway:
      routes:
        - id: rate_limited_route
          uri: lb://service-name
          predicates:
            - Path=/api/**
          filters:
            - name: RequestRateLimiter
              args:
                redis-rate-limiter.replenishRate: 1  # Tokens added per second
                redis-rate-limiter.burstCapacity: 3  # Maximum burst size
                key-resolver: "#{@clientAddressResolver}"
Key Resolver Bean:
@Bean
public KeyResolver clientAddressResolver() {
    // Rate-limit per client IP address
    return exchange -> Mono.just(
            exchange.getRequest().getRemoteAddress().getAddress().getHostAddress());
}
Alibaba Sentinel
Sentinel provides dynamic flow control rules.
Dependency:
<dependency>
    <groupId>com.alibaba.cloud</groupId>
    <artifactId>spring-cloud-starter-alibaba-sentinel</artifactId>
</dependency>
Configuration:
spring:
  cloud:
    sentinel:
      transport:
        dashboard: localhost:8080
        port: 8720
      datasource:
        flow:
          nacos:
            server-addr: localhost:8848
            dataId: sentinel-rules
            groupId: DEFAULT_GROUP
            rule-type: flow
Rule Definition (Nacos):
[
  {
    "resource": "/api/endpoint",
    "limitApp": "default",
    "grade": 1,
    "count": 10,
    "strategy": 0,
    "controlBehavior": 0,
    "clusterMode": false
  }
]
Parameters:
- resource: Target endpoint.
- limitApp: Call source (default applies to all callers).
- grade: Threshold type (0 for concurrent threads, 1 for QPS).
- count: Limit threshold.
- strategy: Rule basis (0 for the resource itself, 1 for a related resource, 2 for the call chain).
- controlBehavior: Action on reaching the limit (0 for direct rejection, 1 for queueing, 2 for warm-up).
- clusterMode: Whether the rule applies cluster-wide.