Design a Global Rate Limiter

System Design
Medium
Stripe

Design a system to throttle user requests globally across distributed servers. Discuss common algorithms (Token Bucket, Leaky Bucket), deployment strategies, and using a centralized store like Redis.

Why Interviewers Ask This

Interviewers at Stripe ask this to evaluate your ability to design resilient distributed systems under strict consistency constraints. They specifically test your understanding of global state management, latency trade-offs in centralized stores like Redis, and your capacity to select appropriate algorithms like Token Bucket for fair throttling across geographically dispersed servers.

How to Answer This Question

1. Clarify Requirements: Define the scope immediately: global vs. regional limits, per-user vs. per-IP, and strict vs. eventual consistency. Mention Stripe's need for financial-grade reliability.
2. Propose a Core Algorithm: Select Token Bucket for its flexibility in handling burst traffic, explaining why it fits payment processing better than Leaky Bucket.
3. Design the Architecture: Sketch a client-side proxy or middleware that communicates with a centralized Redis cluster. Discuss sharding strategies to handle massive scale.
4. Address Consistency and Failover: Explain how to handle clock skew between regions and propose a fallback mechanism in case the central store becomes unavailable.
5. Optimize and Scale: Discuss connection pooling, Redis pipelining, and monitoring metrics to prevent false positives and bottlenecks.
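The Token Bucket choice in step 2 can be made concrete with a short sketch. The class below is a minimal, single-process version (the `TokenBucket` name and the injectable `clock` parameter are illustrative assumptions, not from the source); a global limiter would keep this state in a shared store rather than in process memory.

```python
import time


class TokenBucket:
    """Minimal single-process Token Bucket sketch (illustrative, not production code).

    Tokens refill continuously at `refill_rate` per second up to `capacity`,
    which is what allows controlled bursts up to the bucket size.
    """

    def __init__(self, capacity: float, refill_rate: float, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_rate = float(refill_rate)  # tokens added per second
        self.tokens = float(capacity)          # start with a full bucket
        self.clock = clock                     # injectable for testing
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then try to consume `cost` tokens."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

For example, a bucket with capacity 2 and a refill rate of 1 token/second admits two back-to-back requests, rejects a third, and admits another one a second later.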

Key Points to Cover

  • Explicitly choosing Token Bucket over Leaky Bucket for burst tolerance
  • Using Redis Lua scripts to guarantee atomicity during token consumption
  • Addressing the latency-consistency trade-off in multi-region deployments
  • Defining clear failure modes and fallback strategies for store outages
  • Connecting the technical solution to Stripe's specific need for payment reliability
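The atomicity point above can be illustrated with a sketch. The Lua script below is a hypothetical check-and-consume for a token bucket; the key layout (a hash with `tokens` and `ts` fields), the TTL policy, and the `consume` helper are all assumptions, not from the source. The helper assumes a redis-py-style client exposing `eval(script, numkeys, *keys_and_args)`.

```python
# Hypothetical Lua script: refill, check, and consume in one atomic step,
# so concurrent callers cannot race between the read and the write.
TOKEN_BUCKET_LUA = """
local capacity    = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])  -- tokens per second
local now         = tonumber(ARGV[3])  -- caller-supplied timestamp (seconds)
local cost        = tonumber(ARGV[4])
local state  = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local tokens = tonumber(state[1]) or capacity
local ts     = tonumber(state[2]) or now
tokens = math.min(capacity, tokens + (now - ts) * refill_rate)
local allowed = 0
if tokens >= cost then
    tokens = tokens - cost
    allowed = 1
end
redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)
-- Expire idle buckets so abandoned keys do not accumulate.
redis.call('EXPIRE', KEYS[1], math.ceil(capacity / refill_rate) * 2)
return allowed
"""


def consume(client, key, capacity, refill_rate, now, cost=1):
    """Return True if the request is allowed.

    `client` is assumed to expose a redis-py-style
    eval(script, numkeys, *keys_and_args) method.
    """
    return client.eval(TOKEN_BUCKET_LUA, 1, key, capacity, refill_rate, now, cost) == 1
```

Because Redis executes the whole script atomically, no lock is needed even when many application servers hit the same bucket key concurrently.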

Sample Answer

To design a global rate limiter for a platform like Stripe, I would first clarify the requirements. We need to limit requests globally, not just per server, likely focusing on API endpoints handling payments. The goal is preventing abuse while minimizing latency impact on legitimate users.

For the algorithm, I recommend the Token Bucket approach. Unlike Leaky Bucket, which smooths traffic rigidly, Token Bucket allows controlled bursts, which is critical for payment retries or initial checkout spikes. Each user gets a bucket of tokens replenished at a fixed rate.

Architecturally, we cannot rely on local memory because it doesn't scale globally. Instead, we deploy a centralized Redis cluster with multi-region replication. Client services check token availability via atomic Lua scripts in Redis, avoiding race conditions between concurrent reads and writes. To reduce latency, we can use a tiered approach: allow a small local cache for immediate decisions, then asynchronously sync with Redis.

Consistency is tricky here. If we require strict global consistency, cross-region network latency might slow down responses. For most cases, eventual consistency with a slightly higher threshold is acceptable. However, for high-value transactions, we might enforce stricter checks.

Finally, we must plan for Redis failure. If the store goes down, we should fail open with a conservative local limit to maintain service availability rather than blocking all traffic.
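The fail-open fallback described at the end of the answer can be sketched as a thin wrapper. The `FailOpenLimiter` name and the `remote_check` callable are illustrative assumptions; a real version would also bound the local counter by a time window rather than a raw count.

```python
class FailOpenLimiter:
    """Sketch of a fail-open strategy (illustrative, not production code).

    Tries the global (store-backed) check first; if the store is unreachable,
    falls back to a conservative local budget instead of blocking all traffic.
    """

    def __init__(self, remote_check, local_limit: int):
        self.remote_check = remote_check  # callable(key) -> bool; may raise on outage
        self.local_limit = local_limit    # conservative per-process budget during outage
        self.local_count = 0

    def allow(self, key: str) -> bool:
        try:
            allowed = self.remote_check(key)
            self.local_count = 0  # store reachable again: reset the fallback budget
            return allowed
        except Exception:
            # Fail open: serve a limited amount of traffic locally rather than none.
            self.local_count += 1
            return self.local_count <= self.local_limit
```

The trade-off is explicit: during an outage some excess traffic may slip through, but legitimate users are not hard-blocked by an infrastructure failure.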

Common Mistakes to Avoid

  • Ignoring the difference between local and global state, leading to race conditions
  • Suggesting database polling instead of Redis for real-time token updates
  • Failing to discuss how to handle clock skew between distributed servers and regions
  • Overlooking the performance cost of cross-region network calls in the architecture
