Design a Subscription Management Service

System Design
Medium
Netflix

Design a backend to handle recurring billing, subscription states (active, paused, delinquent), and webhooks for payment events.

Why Interviewers Ask This

Interviewers at Netflix ask this to evaluate your ability to design resilient, stateful systems that handle financial data with high consistency. They specifically look for your understanding of idempotency in payment processing, handling edge cases like network failures during billing, and ensuring data integrity across distributed services without losing revenue or customer trust.

How to Answer This Question

1. Clarify requirements: Define scope such as monthly vs annual plans, proration logic, and specific states like 'delinquent' versus 'cancelled'. Ask about expected scale, typical latency, and whether Netflix requires eventual consistency or strong consistency for billing records.
2. High-level architecture: Propose a microservices approach separating the Subscription Service from the Payment Gateway integration. Mention using an event-driven model where webhooks trigger state changes asynchronously.
3. Data modeling: Detail tables for users, subscriptions, invoices, and audit logs. Emphasize using optimistic locking to prevent race conditions when updating subscription status.
4. Handling critical flows: Walk through the retry mechanism for failed payments and how to implement idempotency keys to avoid double-charging if a webhook is received twice due to network issues.
5. Scalability and reliability: Discuss sharding strategies for user data and how to ensure the system remains available during peak billing cycles, referencing Netflix's chaos engineering culture by suggesting automated failure testing.
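
The optimistic locking mentioned in step 3 can be illustrated with a version column. This is a minimal sketch, not a production schema: the `subscriptions` table, its `version` column, and the `update_status` helper are assumptions made for illustration, and SQLite stands in for whatever database the real service would use.

```python
import sqlite3

def update_status(conn, sub_id, expected_version, new_status):
    """Apply the update only if the row still has the version we read."""
    cur = conn.execute(
        "UPDATE subscriptions SET status = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_status, sub_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # False means another writer got there first

# Hypothetical schema, kept to the columns the example needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscriptions ("
    "id INTEGER PRIMARY KEY, status TEXT NOT NULL, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO subscriptions VALUES (1, 'active', 1)")
conn.commit()

# Two workers both read version 1, then race to update the same row.
first = update_status(conn, 1, 1, "delinquent")
second = update_status(conn, 1, 1, "paused")
final_row = conn.execute(
    "SELECT status, version FROM subscriptions WHERE id = 1"
).fetchone()
```

Here the second writer's update matches no rows because the version has already moved on, so it would need to re-read the subscription and decide whether its change still applies.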

Key Points to Cover

  • Explicitly mentioning idempotency keys to prevent duplicate charges
  • Defining a clear Finite State Machine for subscription lifecycle
  • Proposing an asynchronous event-driven architecture for webhooks
  • Addressing data consistency and race conditions in billing updates
  • Discussing retry mechanisms and circuit breakers for external dependencies
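
One way to make the finite state machine concrete in an interview is an explicit transition table. The sketch below is illustrative only: the four states come from the question, while the event names (`pause_requested`, `retries_exhausted`, and so on) and the set of allowed transitions are assumptions you would confirm with the interviewer.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    PAUSED = "paused"
    DELINQUENT = "delinquent"
    CANCELLED = "cancelled"

# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.ACTIVE, "pause_requested"): State.PAUSED,
    (State.ACTIVE, "retries_exhausted"): State.DELINQUENT,
    (State.ACTIVE, "cancel_requested"): State.CANCELLED,
    (State.PAUSED, "resume_requested"): State.ACTIVE,
    (State.PAUSED, "cancel_requested"): State.CANCELLED,
    (State.DELINQUENT, "charge_succeeded"): State.ACTIVE,
    (State.DELINQUENT, "cancel_requested"): State.CANCELLED,
}

class InvalidTransition(Exception):
    """Raised when an event is not allowed from the current state."""

def next_state(current, event):
    # Anything missing from the table is rejected rather than silently ignored.
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise InvalidTransition(
            f"{event!r} is not allowed from {current.value}"
        ) from None
```

Keeping the transitions in one table makes illegal moves (for example, reactivating a cancelled subscription) fail loudly, and gives the audit log a single place to hook into.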

Sample Answer

To design a robust subscription management service for a platform like Netflix, I would start by defining the core entities: User, Subscription Plan, and Invoice. Given the financial nature, we must prioritize data consistency and idempotency. I propose a microservices architecture where the Subscription Service acts as the source of truth and communicates asynchronously with the Payment Gateway through a dedicated adapter.

The critical flow involves a scheduled job triggering billing events. When a charge is initiated, we generate a unique idempotency key stored in Redis. If the gateway returns a success but our internal update fails, the retry logic uses this key to detect the duplicate rather than charging again.

For state management, we maintain a finite state machine for subscriptions (Active, Paused, Delinquent, Cancelled). Transitions are triggered by webhooks from the payment provider; for instance, after three failed attempts, a 'charge_failed' webhook moves the subscription to 'Delinquent', automatically pausing access to content streams.

Regarding scalability, since billing is write-heavy but read-light for most users, we can shard user data by region. We also need a dead-letter queue for webhook events that fail processing, so that no payment event is lost.

Finally, to align with Netflix's reliability standards, we should implement circuit breakers on the payment gateway calls and run chaos experiments that simulate gateway outages during peak billing times, ensuring the system degrades gracefully without corrupting financial records.
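
The idempotency flow in the answer above (the gateway succeeds, the internal update fails, the retry must not charge again) can be sketched as follows. Everything here is an assumption for illustration: `FakeGateway` is a stand-in for a provider that honours idempotency keys, the plain dict `invoices` stands in for Redis or a durable store, and the key format is made up.

```python
class FakeGateway:
    """Stand-in for a payment provider that honours idempotency keys."""

    def __init__(self):
        self.results = {}      # idempotency key -> charge id
        self.charges_made = 0  # number of real charges performed

    def charge(self, key, amount_cents):
        if key in self.results:  # replayed request: return the original result
            return self.results[key]
        self.charges_made += 1
        charge_id = f"ch_{self.charges_made}"
        self.results[key] = charge_id
        return charge_id

def idempotency_key(subscription_id, period):
    # Deterministic, so every retry for the same billing period reuses it.
    return f"sub:{subscription_id}:period:{period}"

def bill(gateway, invoices, subscription_id, period, amount_cents,
         fail_after_charge=False):
    key = idempotency_key(subscription_id, period)
    if key in invoices:  # already recorded locally: nothing to do
        return invoices[key]
    charge_id = gateway.charge(key, amount_cents)
    if fail_after_charge:
        raise RuntimeError("simulated crash before the invoice was saved")
    invoices[key] = charge_id
    return charge_id

gateway, invoices = FakeGateway(), {}
try:
    bill(gateway, invoices, 42, "2024-06", 1599, fail_after_charge=True)
except RuntimeError:
    pass  # the scheduler would retry this job later
charge_id = bill(gateway, invoices, 42, "2024-06", 1599)
```

The design choice worth calling out is that the key is derived from the subscription and billing period rather than generated randomly per attempt, so a retry after a crash naturally presents the same key to the gateway.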

Common Mistakes to Avoid

  • Ignoring idempotency, which leads to dangerous duplicate charges during network retries
  • Coupling billing, subscriptions, and payments in one monolithic service and schema, which gives up the fault isolation that decoupled services provide
  • Failing to define what happens when a webhook arrives late or out of order
  • Overlooking the need for audit logs to track every state change for compliance and debugging
