Design a System for A/B Testing

System Design
Medium
Google

Design a service that allows feature toggles to route users into different experimental groups (A/B testing). Focus on user bucketing, state persistence, and analysis integration.

Why Interviewers Ask This

Interviewers at Google ask this to evaluate your ability to design distributed systems with strict consistency and low-latency requirements. They specifically assess how you handle user bucketing logic, ensure feature flags remain consistent across sessions, and manage the trade-offs between availability and correctness in a high-scale environment.

How to Answer This Question

1. Clarify Requirements: Define scale (requests per second), latency constraints, and whether experiments are short-term or long-running. Ask what consistency guarantees bucketing needs.
2. High-Level Design: Propose a microservices architecture with an API Gateway, a Bucketing Service, and a Configuration Store.
3. Detail User Bucketing: Explain how hashing the user ID together with the experiment ID (for example with MurmurHash) deterministically assigns users to variants, so each user stays in the same group.
4. Address State Persistence: Discuss storing experiment configurations in a low-latency store such as Redis or a globally consistent database such as Spanner, and weigh eventual consistency against strong consistency.
5. Data Pipeline: Outline how clickstream data flows into a warehouse for analysis, including a schema that records experiment metadata.
6. Edge Cases: Cover new users, traffic shifting, and rollback mechanisms.
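Step 3 can be made concrete with a short sketch. It assumes Python, uses `hashlib.sha256` from the standard library as a stand-in for MurmurHash3 (which Python only offers through third-party packages), and assumes a hypothetical space of 10,000 buckets with integer weights. `bucket_for` and `assign_variant` are illustrative names, not part of any specific system.

```python
import hashlib

# Assumed bucket space: 10,000 buckets allows traffic splits in 0.01% steps.
NUM_BUCKETS = 10_000


def bucket_for(user_id: str, experiment_salt: str) -> int:
    """Map (experiment salt, user ID) to a stable bucket in [0, NUM_BUCKETS).

    SHA-256 stands in for MurmurHash3; any well-distributed hash works as long
    as every server uses the same function and the same input format.
    """
    key = f"{experiment_salt}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce to a bucket.
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def assign_variant(user_id: str, experiment_salt: str, weights: dict[str, int]) -> str:
    """Return the variant whose contiguous bucket range contains the user's bucket.

    `weights` maps variant name -> number of buckets, in range order, and must
    sum to NUM_BUCKETS (e.g. {"control": 5000, "treatment": 5000} for 50/50).
    """
    if sum(weights.values()) != NUM_BUCKETS:
        raise ValueError(f"weights must sum to {NUM_BUCKETS}")
    bucket = bucket_for(user_id, experiment_salt)
    upper = 0
    for variant, weight in weights.items():
        upper += weight
        if bucket < upper:
            return variant
    raise AssertionError("unreachable: weights cover every bucket")
```

Because each variant owns a contiguous bucket range, ramping from control 5,000 / treatment 5,000 to control 9,000 / treatment 1,000 only moves users in buckets 5,000 through 8,999 from treatment back to control; nobody already in control is switched into treatment. Changing the salt, by contrast, reshuffles every user, which suits a fresh experiment.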

Key Points to Cover

  • Use deterministic hashing algorithms like MurmurHash to guarantee user consistency across servers
  • Leverage distributed databases like Spanner or sharded Redis for low-latency state persistence
  • Implement asynchronous logging pipelines (e.g., Pub/Sub to BigQuery) for scalable data collection
  • Define clear fallback strategies to maintain service availability during infrastructure failures
  • Address traffic shifting and immediate rollback capabilities for real-time experiment management
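The fallback and rollback points above can be combined in one resolver. In this minimal sketch the configuration store is hidden behind a hypothetical `fetch_configs` callable, each server keeps the last good snapshot in memory so flag evaluation never waits on the network, and a `killed` field acts as an immediate rollback switch. `ExperimentConfig` and `FlagResolver` are illustrative names, and SHA-256 again stands in for MurmurHash3.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable

NUM_BUCKETS = 10_000  # assumed bucket space


@dataclass(frozen=True)
class ExperimentConfig:
    salt: str
    weights: dict[str, int]  # variant -> number of buckets, in range order
    control: str = "control"
    killed: bool = False     # kill switch: force every user to control


class FlagResolver:
    """Evaluate flags from an in-memory snapshot and fail safe to control."""

    def __init__(self, fetch_configs: Callable[[], dict[str, ExperimentConfig]]):
        self._fetch = fetch_configs  # hypothetical read from the config store
        self._snapshot: dict[str, ExperimentConfig] = {}

    def refresh(self) -> bool:
        """Replace the snapshot; keep the last good one if the store is unreachable."""
        try:
            self._snapshot = dict(self._fetch())
            return True
        except Exception:
            return False

    def variant(self, experiment: str, user_id: str, default: str = "control") -> str:
        cfg = self._snapshot.get(experiment)
        if cfg is None:
            return default        # unknown experiment, or nothing loaded yet
        if cfg.killed:
            return cfg.control    # immediate rollback without a deploy
        key = f"{cfg.salt}:{user_id}".encode("utf-8")
        bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % NUM_BUCKETS
        upper = 0
        for name, weight in cfg.weights.items():
            upper += weight
            if bucket < upper:
                return name
        return cfg.control        # weights that do not cover every bucket fail safe
```

In practice `refresh` would run on a timer or in response to a change notification from the store; the sketch leaves that trigger out.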

Sample Answer

To design an A/B testing system, I'd start by clarifying requirements: say, millions of daily active users and sub-10ms latency for flag resolution. The core component is the Bucketing Service. When a request arrives, it extracts the user ID and the experiment key, then applies a deterministic hash function such as MurmurHash3 to the user ID combined with the experiment salt. A given user therefore always lands in the same bucket, say Variant A or B, regardless of which server handles the request.

For state persistence, local memory cannot be the source of truth, because each server would drift out of sync. Instead, the current experiment configuration and traffic weights live in a highly available distributed store such as Google Spanner or a sharded Redis cluster, and servers pick up changes from it. If we change a weight from 50/50 to 90/10, the service must reflect the change immediately, without a restart.

For analysis, every exposure event is logged asynchronously to a message queue such as Pub/Sub, then aggregated in BigQuery, where data scientists can run statistical significance tests. Crucially, we need a fallback: if the bucketing service is down, we default to the control group so the application keeps working. Finally, product managers need a dashboard that visualizes conversion rates and lets them stop an experiment instantly if it shows a negative impact.
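The asynchronous logging step can be sketched with the standard library alone. Here `queue.Queue` and a background thread stand in for a Pub/Sub client, and `publish_batch` is a hypothetical callable; `ExposureEvent` and `ExposureLogger` are illustrative names. The point is that the request path performs only a non-blocking enqueue and drops events when the buffer is full rather than adding latency.

```python
import queue
import threading
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ExposureEvent:
    user_id: str
    experiment: str
    variant: str
    ts: float = field(default_factory=time.time)  # exposure time, epoch seconds


class ExposureLogger:
    """Buffer exposure events in memory and publish them in batches off the request path."""

    def __init__(self, publish_batch: Callable[[list[ExposureEvent]], None],
                 max_queue: int = 10_000, batch_size: int = 100):
        self._q: queue.Queue = queue.Queue(maxsize=max_queue)
        self._publish = publish_batch  # hypothetical stand-in for a Pub/Sub publisher
        self._batch_size = batch_size
        self.dropped = 0  # approximate: incremented without a lock
        threading.Thread(target=self._run, daemon=True).start()

    def log(self, event: ExposureEvent) -> None:
        """Called on the request path; never blocks."""
        try:
            self._q.put_nowait(event)
        except queue.Full:
            self.dropped += 1  # shed load instead of adding latency

    def _run(self) -> None:
        while True:
            batch = [self._q.get()]  # wait for at least one event
            while len(batch) < self._batch_size:
                try:
                    batch.append(self._q.get_nowait())
                except queue.Empty:
                    break
            try:
                self._publish(batch)
            except Exception:
                pass  # a real pipeline would retry or spill to durable storage
            finally:
                for _ in batch:
                    self._q.task_done()

    def flush(self) -> None:
        """Block until every queued event has been handed to publish_batch."""
        self._q.join()
```

A production pipeline would add retries, graceful shutdown, and durable buffering. Dropped events undercount exposures, so the `dropped` counter is worth monitoring.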

Common Mistakes to Avoid

  • Ignoring the requirement for deterministic bucketing, leading to users seeing different variants on subsequent visits
  • Focusing only on the UI implementation while neglecting the backend data pipeline for statistical analysis
  • Proposing synchronous writes for analytics events, which would create unacceptable latency bottlenecks
  • Failing to discuss how to handle edge cases like sudden traffic spikes or partial rollout failures
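To avoid the second mistake, it helps to show what the analysis end of the pipeline computes. One common choice for a conversion-rate metric is a two-proportion z-test. The sketch below assumes per-variant counts of exposed users and conversions have already been aggregated (for example by a warehouse query) and uses only the Python standard library; the function name is illustrative.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: conversion rate of A equals that of B."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("each variant needs at least one exposed user")
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variance: all or none of the users converted")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, 1,000 conversions out of 10,000 control users against 1,100 out of 10,000 treatment users gives z ≈ 2.31 and a two-sided p-value of about 0.02.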
