Design a System for Tracking Daily Active Users (DAU)

System Design
Medium
Meta
38.6K views

Design a high-volume system to accurately count unique Daily Active Users (DAU) across various product surfaces. Discuss using probabilistic data structures like HyperLogLog.

Why Interviewers Ask This

Interviewers at Meta ask this to evaluate your ability to balance accuracy with massive scale. They specifically want to see if you understand the trade-offs between exact counting and probabilistic algorithms like HyperLogLog when handling billions of events daily. It tests your grasp of distributed systems, data consistency, and resource optimization under strict latency constraints.

How to Answer This Question

1. Clarify Requirements: Define DAU precisely (unique user per day) and estimate scale (billions of requests). Ask about acceptable error rates since exact counts are impossible at this volume. 2. High-Level Architecture: Propose a flow where user events hit an ingestion layer like Kafka, then move to a processing cluster. 3. Core Algorithm Selection: Introduce HyperLogLog as the primary solution. Explain that it uses constant memory to estimate unique values with minimal error, unlike sets which would consume too much RAM. 4. Aggregation Strategy: Detail how to merge HLL sketches from multiple nodes using bitwise OR operations to get a global count without moving raw data. 5. Edge Cases & Validation: Discuss handling time zones for 'daily' definitions and validating results against sampled exact counts to ensure accuracy remains within bounds.

Key Points to Cover

  • Explicitly choosing HyperLogLog over exact sets due to memory constraints
  • Explaining the merge mechanism using bitwise OR operations for distributed aggregation
  • Defining clear requirements regarding error tolerance and time zone handling
  • Demonstrating knowledge of streaming architectures like Kafka and Flink
  • Proposing a validation strategy to compare probabilistic estimates against exact samples

Sample Answer

To design a system for tracking Daily Active Users at Meta's scale, we first need to define our constraints. We are likely dealing with tens of billions of daily events across Facebook, Instagram, and WhatsApp. Storing every user ID in a set for each day is impossible due to memory constraints. Therefore, we must use a probabilistic approach. I propose an architecture where user activity streams into Kafka topics. A distributed processing layer, perhaps using Flink or Spark Streaming, consumes these events. Instead of storing individual IDs, each node maintains a HyperLogLog sketch for the current day. When a user event arrives, we hash their ID and update the corresponding HLL register. The key advantage here is space efficiency; an HLL sketch might only take 10KB per shard regardless of whether we track a million or a billion users. To get the global DAU, we don't sum the estimates directly. Instead, we perform a bitwise OR operation on all the sketches from different shards to merge them into a single master sketch, then apply the cardinality estimation formula. We must also address the definition of a 'day'. We should partition data by UTC time or a specific region to handle timezone discrepancies accurately. Finally, while HLL introduces a small error margin, typically around 1-2%, we can validate this periodically by sampling a subset of traffic and comparing it against an exact counting method to ensure our estimates remain trustworthy for business decisions.

Common Mistakes to Avoid

  • Suggesting simple database counters or Redis sets which will fail under massive scale due to memory limits
  • Ignoring the distributed nature of the problem and proposing a single-node solution
  • Failing to explain how to aggregate results from multiple nodes to get a global count
  • Overlooking the definition of 'Daily' regarding time zones and daylight saving adjustments

Practice This Question with AI

Answer this question orally or via text and get instant AI-powered feedback on your response quality, structure, and delivery.

Start Practicing

Related Interview Questions

Browse all 150 System Design questionsBrowse all 71 Meta questions