Design a Distributed Idempotency Service

System Design
Medium
Stripe

Design a microservice that ensures a single request (e.g., payment) is processed only once, even if the client retries the request multiple times. Discuss using a database or distributed cache for tracking.

Why Interviewers Ask This

Interviewers at companies like Stripe ask this to evaluate your ability to handle critical financial constraints in distributed systems. They specifically test if you understand the trade-offs between strong consistency and availability, your knowledge of idempotency keys, and your capacity to design robust solutions that prevent double-spending or duplicate transactions under network failures.

How to Answer This Question

1. Clarify requirements immediately: Define what constitutes a 'unique request' (e.g., payment ID) and the acceptable latency.
2. Propose a core mechanism using an Idempotency Key generated by the client.
3. Discuss storage options: Compare Redis, which is fast but replicates asynchronously and can lose recent writes on failover, with a relational database that enforces a unique constraint for strong consistency.
4. Address concurrency: Explain how to handle race conditions during simultaneous retries using database locks or atomic operations.
5. Outline error handling and edge cases, such as key expiration policies and how to return cached responses without re-executing logic.

This structured flow demonstrates systematic thinking aligned with Stripe's focus on reliability.
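The atomic operation in step 4 can be as small as a single insert against a uniquely constrained column. The sketch below is illustrative only: it uses an in-memory SQLite database as a stand-in for the primary database, and the table name `idempotency_keys` and the helper `claim_key` are hypothetical names chosen for this example.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    # In-memory SQLite stands in for the real primary database (assumption).
    conn = sqlite3.connect(":memory:")
    # PRIMARY KEY gives the unique constraint that makes the insert atomic.
    conn.execute("CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY)")
    return conn


def claim_key(conn: sqlite3.Connection, key: str) -> bool:
    """Return True only for the first caller to claim this key."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO idempotency_keys (key) VALUES (?)", (key,))
        return True
    except sqlite3.IntegrityError:
        # The unique constraint rejected the insert: the key already exists.
        return False


conn = open_store()
first = claim_key(conn, "pay-123")   # original request
retry = claim_key(conn, "pay-123")   # client retry with the same key
```

Because the database decides the winner, two simultaneous retries cannot both see "key absent" and proceed, which is the failure mode of a separate check followed by a write.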

Key Points to Cover

  • Explicitly mentioning the use of Idempotency Keys generated by the client
  • Distinguishing between read-through caching strategies and database unique constraints
  • Explaining how to handle race conditions using atomic database inserts or locks
  • Addressing the trade-off between latency and strict consistency for financial data
  • Proposing a cleanup mechanism to manage cache memory usage over time
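The cleanup point can be illustrated with a small TTL cache. This is a sketch, not a production design: the `TTLCache` class is a hypothetical in-process stand-in for a distributed cache (Redis, for instance, supports per-key expiry natively), and the injectable `clock` parameter exists only so the example can simulate time passing.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical stand-in for a distributed cache with per-key expiry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def put(self, key: str, response: str) -> None:
        # Store the response together with its absolute expiry time.
        self._entries[key] = (self._clock() + self._ttl, response)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # lazily drop an expired entry on read
            return None
        return response

    def purge_expired(self) -> int:
        """Remove all expired entries; return how many were removed."""
        now = self._clock()
        expired = [k for k, (exp, _) in self._entries.items() if now >= exp]
        for k in expired:
            del self._entries[k]
        return len(expired)


now = [0.0]  # fake clock so the example does not need to sleep
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.put("pay-123", '{"status": "succeeded"}')
hit = cache.get("pay-123")
now[0] = 61.0                    # advance past the TTL
removed = cache.purge_expired()
miss = cache.get("pay-123")
```

The TTL should be at least as long as the window in which clients may retry; a key that expires too early lets a late retry through as if it were a new request.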

Sample Answer

To design an idempotency service, I would start by requiring clients to generate a unique Idempotency Key for every potential transaction and to resend the same key on every retry. The service acts as a gateway that runs before the business logic executes.

For storage, I recommend a hybrid approach. A distributed cache like Redis gives high-speed lookups, storing the key mapped to the result payload with a TTL. The cache alone is not authoritative, though, so to ensure data integrity we must enforce a unique constraint on the Idempotency Key in the primary database.

When a request arrives, we first check the cache; if the key is found, we return the stored response immediately. If not, we attempt an atomic insert into the database. If the insert succeeds, we own the request: we execute the business logic, store the result in both the database and the cache, and return it. If the insert fails with a duplicate key violation, another request has already claimed the key. Either it has finished, in which case we return the stored result, or it is still in progress, in which case we wait or poll until it completes. This handles the retry storm scenario effectively. If the business logic itself fails, we must decide explicitly whether to release the key so the client can retry or to store the failure and replay it.

Finally, we need a cleanup strategy that expires old keys after a set retention period, so the cache does not grow without bound and the system stays performant while guaranteeing that no duplicate payments occur.
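The request flow described in this answer might be sketched as follows. Everything here is an assumption made for illustration: SQLite in memory stands in for the primary database, a plain dict stands in for the distributed cache, and the names `handle_request`, `idempotency_keys`, and the `in_progress`/`completed` statuses are hypothetical.

```python
import sqlite3
from typing import Callable, Dict, Tuple


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")  # stand-in for the primary database
    conn.execute(
        "CREATE TABLE idempotency_keys ("
        " key TEXT PRIMARY KEY,"        # unique constraint on the key
        " status TEXT NOT NULL,"
        " response TEXT)"
    )
    return conn


def handle_request(conn: sqlite3.Connection, cache: Dict[str, str], key: str,
                   business_logic: Callable[[], str]) -> Tuple[str, str]:
    # 1. Fast path: a cached response means the work was already done.
    if key in cache:
        return ("replayed", cache[key])
    # 2. Try to claim the key atomically.
    try:
        with conn:
            conn.execute(
                "INSERT INTO idempotency_keys (key, status) "
                "VALUES (?, 'in_progress')", (key,))
    except sqlite3.IntegrityError:
        # Someone else owns the key: replay if finished, otherwise report
        # that it is still running (a caller could wait or poll here).
        row = conn.execute(
            "SELECT status, response FROM idempotency_keys WHERE key = ?",
            (key,)).fetchone()
        if row is not None and row[0] == "completed":
            cache[key] = row[1]  # repopulate the cache after a miss
            return ("replayed", row[1])
        return ("in_progress", "")
    # 3. We own the key: run the logic once and persist the result.
    response = business_logic()
    with conn:
        conn.execute(
            "UPDATE idempotency_keys SET status = 'completed', response = ? "
            "WHERE key = ?", (response, key))
    cache[key] = response
    return ("executed", response)


conn = open_store()
cache: Dict[str, str] = {}
charges = []


def charge() -> str:
    charges.append("charge")  # side effect we must not repeat
    return '{"charge": "ok"}'


first = handle_request(conn, cache, "pay-123", charge)
second = handle_request(conn, cache, "pay-123", charge)  # served from cache
cache.clear()                                            # simulate eviction
third = handle_request(conn, cache, "pay-123", charge)   # served from database
```

The third call shows why the database record needs a status and a stored response: after a cache eviction, a duplicate key violation alone does not tell us whether the original request finished.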

Common Mistakes to Avoid

  • Focusing only on the code implementation without discussing the architectural flow and failure scenarios
  • Ignoring the race condition problem where two requests arrive simultaneously before the first completes
  • Suggesting simple in-memory storage which fails when the service restarts or scales horizontally
  • Forgetting to define what happens if the business logic itself fails but the key was already consumed
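The last mistake is easy to overlook, so here is one possible policy sketched in code: release the key when the business logic fails in a way that is known to have had no side effects, so that a retry can run. This is only one option under stated assumptions; `TransientError`, `run_once`, and the table layout are hypothetical, and SQLite in memory again stands in for the primary database. When a failure may have had side effects, storing the failure and replaying it is the safer choice.

```python
import sqlite3
from typing import Callable


class TransientError(Exception):
    """Hypothetical: a failure known to have happened before any side effect."""


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE idempotency_keys ("
        " key TEXT PRIMARY KEY, status TEXT NOT NULL, response TEXT)"
    )
    return conn


def run_once(conn: sqlite3.Connection, key: str,
             business_logic: Callable[[], str]) -> str:
    # Claim the key; a duplicate raises sqlite3.IntegrityError to the caller.
    with conn:
        conn.execute(
            "INSERT INTO idempotency_keys (key, status) "
            "VALUES (?, 'in_progress')", (key,))
    try:
        response = business_logic()
    except TransientError:
        # Nothing was charged, so free the key and let the client retry.
        with conn:
            conn.execute("DELETE FROM idempotency_keys WHERE key = ?", (key,))
        raise
    with conn:
        conn.execute(
            "UPDATE idempotency_keys SET status = 'completed', response = ? "
            "WHERE key = ?", (response, key))
    return response


conn = open_store()
attempts = []


def flaky_charge() -> str:
    attempts.append(1)
    if len(attempts) == 1:
        raise TransientError("gateway timeout before the charge was sent")
    return "charged"


try:
    run_once(conn, "pay-123", flaky_charge)
    first_outcome = "ok"
except TransientError:
    first_outcome = "failed"

second_outcome = run_once(conn, "pay-123", flaky_charge)  # retry succeeds
final_status = conn.execute(
    "SELECT status FROM idempotency_keys WHERE key = ?", ("pay-123",)
).fetchone()[0]
```

Without the release step, the first failure would leave the key stuck in `in_progress` and every retry would be rejected as a duplicate even though no payment was ever made.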
