System Design · May 15, 2025 · 8 min read

Rate Limiter Design: Token Bucket vs Leaky Bucket vs Sliding Window

Designing a rate limiter from scratch — three algorithms compared, Redis-based distributed implementation, and how Spring Boot API Gateway handles it in practice.

System Design · Redis · Spring Boot · API Design · Distributed Systems

Rate limiting is mandatory for any public API. Without it, a single misbehaving client can take down your service. Here's how the three main algorithms work and how I implement them in Spring Boot with Redis.

Algorithm Comparison

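  • Token Bucket: the bucket refills at rate R up to a fixed capacity, and each request consumes one token. Allows bursts up to the bucket size. Best for user-facing APIs.
  • Leaky Bucket: requests enter a queue and are processed at a fixed rate; requests that arrive while the queue is full are rejected. No bursts. Best for smoothing traffic to downstream services.
  • Sliding Window Log: stores the exact timestamp of every request in the window. Most accurate but memory-intensive.
  • Sliding Window Counter: approximates a rolling window by adding the current fixed window's count to the previous window's count, weighted by how much of the previous window still overlaps. Low memory, good accuracy. Best for high-traffic systems.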
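
To make the token bucket concrete, here is a minimal single-process sketch in plain Java. It is not the Redis implementation from this article: the class name `TokenBucket`, its constructor parameters, and the injectable nanosecond clock are all assumptions made for illustration, and a distributed version would have to keep the token count and last-refill time in shared storage.

```java
import java.util.function.LongSupplier;

/** In-memory token bucket: refills continuously at a fixed rate, capped at capacity. */
final class TokenBucket {
    private final long capacity;          // maximum burst size
    private final double refillPerNano;   // tokens added per nanosecond
    private final LongSupplier clock;     // nanosecond time source (injectable for tests)
    private double tokens;                // tokens currently available
    private long lastRefill;              // clock reading at the last refill

    TokenBucket(long capacity, double refillPerSecond, LongSupplier clock) {
        this.capacity = capacity;
        this.refillPerNano = refillPerSecond / 1_000_000_000.0;
        this.clock = clock;
        this.tokens = capacity;           // start full so an initial burst is allowed
        this.lastRefill = clock.getAsLong();
    }

    TokenBucket(long capacity, double refillPerSecond) {
        this(capacity, refillPerSecond, System::nanoTime);
    }

    /** Consumes one token if available; returns false when the bucket is empty. */
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        // Add the tokens earned since the last call, never exceeding capacity
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

With `new TokenBucket(3, 1.0)`, a client can send three requests back to back, after which it is held to roughly one request per second until the bucket has had time to refill. That is the burst behavior the list describes.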

Redis-Based Fixed-Window Counter

RateLimiterService.java
java
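import java.time.Duration;

import lombok.RequiredArgsConstructor;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
public class RateLimiterService {

    private final StringRedisTemplate redis;
    private static final int LIMIT = 100;   // requests per window
    private static final int WINDOW_SEC = 60;

    // Fixed-window counter: one Redis key per user, reset when the key expires
    public boolean isAllowed(String userId) {
        String key = "rate:" + userId;
        // increment() can return null inside a pipeline or transaction, so guard before unboxing
        Long count = redis.opsForValue().increment(key);

        if (count != null && count == 1) {
            // First request in window: set expiry
            redis.expire(key, Duration.ofSeconds(WINDOW_SEC));
        }

        return count != null && count <= LIMIT;
    }
}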

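Use a Lua script for atomic rate limit checks in Redis. The increment, expire, and check sequence above is not atomic: if the application crashes or loses its connection between the increment and the expire, the key is left without a TTL and never expires, so the counter keeps growing and the user stays blocked once it passes the limit. A Lua script runs on the Redis server as a single atomic operation.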

Spring Boot Filter Integration

RateLimitFilter.java
java
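import java.io.IOException;

// Imports assume Spring Boot 3 (jakarta.servlet); on Spring Boot 2 use javax.servlet
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import lombok.RequiredArgsConstructor;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

@Component
@RequiredArgsConstructor
public class RateLimitFilter extends OncePerRequestFilter {

    private final RateLimiterService rateLimiter;

    @Override
    protected void doFilterInternal(HttpServletRequest req,
                                    HttpServletResponse res,
                                    FilterChain chain) throws ServletException, IOException {
        // Requests without an X-User-Id header pass through unthrottled
        String userId = req.getHeader("X-User-Id");
        if (userId != null && !rateLimiter.isAllowed(userId)) {
            res.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
            res.setContentType(MediaType.APPLICATION_JSON_VALUE);
            res.setHeader("Retry-After", "60");  // worst case: one full window
            res.getWriter().write("{\"error\": \"Rate limit exceeded\"}");
            return;
        }
        chain.doFilter(req, res);
    }
}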

Atomic Lua Script for Redis

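AtomicRateLimiterService.java
java
import java.util.List;

import lombok.RequiredArgsConstructor;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.script.DefaultRedisScript;
import org.springframework.data.redis.core.script.RedisScript;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
public class AtomicRateLimiterService {

    private static final int LIMIT = 100;   // requests per window
    private static final int WINDOW_SEC = 60;

    // INCR and EXPIRE run inside one script, so the key is never left without a TTL
    private static final RedisScript<Long> RATE_LIMIT_SCRIPT = new DefaultRedisScript<>("""
        local key     = KEYS[1]
        local limit   = tonumber(ARGV[1])
        local window  = tonumber(ARGV[2])
        local current = redis.call('INCR', key)
        if current == 1 then
            redis.call('EXPIRE', key, window)
        end
        return current
        """, Long.class);

    private final StringRedisTemplate redis;

    public boolean isAllowed(String userId) {
        Long count = redis.execute(
            RATE_LIMIT_SCRIPT,
            List.of("rate:" + userId),
            String.valueOf(LIMIT), String.valueOf(WINDOW_SEC)
        );
        // A null reply is treated as a rejection
        return count != null && count <= LIMIT;
    }
}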

Per-Endpoint Rate Limits with @RateLimit

A flat rate limit for all endpoints isn't always the right choice. Login should be stricter (5 req/min) than a search endpoint (100 req/min). You can achieve per-endpoint limits with a custom annotation and AOP.

RateLimit.java
java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface RateLimit {
    int limit() default 100;
    int windowSeconds() default 60;
    String keyPrefix() default "";
}

// Usage (inside a controller class)
@PostMapping("/login")
@RateLimit(limit = 5, windowSeconds = 60, keyPrefix = "login")
public ResponseEntity<AuthResponse> login(@RequestBody LoginRequest req) { ... }
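
The AOP half is not shown above, so here is a rough sketch of the step an aspect would perform: reading the annotation off the invoked method and turning it into a Redis key plus limits. It uses plain reflection instead of Spring AOP so that it runs standalone. `DemoHandlers`, `RateLimitPolicy`, `RateLimitResolver`, and the `rate:<prefix>:<userId>` key layout are hypothetical names chosen for illustration, and the annotation is redeclared only to keep the block self-contained.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
@interface RateLimit {
    int limit() default 100;
    int windowSeconds() default 60;
    String keyPrefix() default "";
}

/** Hypothetical handler class standing in for a controller. */
class DemoHandlers {
    @RateLimit(limit = 5, windowSeconds = 60, keyPrefix = "login")
    public void login() {}

    @RateLimit  // all defaults: 100 requests per 60 seconds
    public void search() {}

    public void health() {}  // not rate limited
}

/** The limits that apply to one call, plus the counter key to use. */
final class RateLimitPolicy {
    final String key;
    final int limit;
    final int windowSeconds;

    RateLimitPolicy(String key, int limit, int windowSeconds) {
        this.key = key;
        this.limit = limit;
        this.windowSeconds = windowSeconds;
    }
}

final class RateLimitResolver {
    /** Returns the policy for the named method, or null if it carries no @RateLimit. */
    static RateLimitPolicy resolve(Class<?> type, String methodName, String userId) {
        for (Method m : type.getDeclaredMethods()) {
            if (!m.getName().equals(methodName)) continue;
            RateLimit rl = m.getAnnotation(RateLimit.class);
            if (rl == null) return null;
            // Fall back to the method name when no prefix is given (an assumption)
            String prefix = rl.keyPrefix().isEmpty() ? m.getName() : rl.keyPrefix();
            return new RateLimitPolicy("rate:" + prefix + ":" + userId,
                                       rl.limit(), rl.windowSeconds());
        }
        return null;
    }
}
```

In a real aspect the `Method` would come from the join point rather than a name lookup, and the resolved key, limit, and window would be passed to the Redis script instead of the hard-coded 100 and 60. Including the prefix in the key keeps the login counter separate from the search counter for the same user.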

Responding with Rate Limit Headers

  • X-RateLimit-Limit: the limit for this endpoint (e.g. 100)
  • X-RateLimit-Remaining: requests left in the current window
  • X-RateLimit-Reset: Unix timestamp when the window resets
  • Retry-After: seconds until the client can retry (only on 429 response)
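
As a sketch of how these values could be derived from the counter, the helper below builds the header map from the limit, the current count, and the window's reset time. `RateLimitHeaders` and its parameters are hypothetical. With the Redis counter above, the reset time could be computed as the current time plus the key's remaining TTL, which is an assumption about how you would wire it up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RateLimitHeaders {
    /**
     * @param limit         requests allowed per window
     * @param count         requests seen in the current window, including this one
     * @param resetEpochSec Unix time in seconds at which the window resets
     * @param nowEpochSec   current Unix time in seconds
     */
    static Map<String, String> build(int limit, long count,
                                     long resetEpochSec, long nowEpochSec) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("X-RateLimit-Limit", String.valueOf(limit));
        // Never report a negative remainder once the client is over the limit
        headers.put("X-RateLimit-Remaining", String.valueOf(Math.max(0, limit - count)));
        headers.put("X-RateLimit-Reset", String.valueOf(resetEpochSec));
        if (count > limit) {
            // Only rejected (429) responses carry Retry-After; at least one second
            headers.put("Retry-After",
                        String.valueOf(Math.max(1, resetEpochSec - nowEpochSec)));
        }
        return headers;
    }
}
```

Deriving `Retry-After` from the reset time gives clients a tighter hint than a fixed value of 60, since a client rejected late in the window only needs to wait for the remainder of it.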
