Design a Distributed Counter Service

System Design
Medium
Google
23.5K views

Design a service to reliably increment/decrement millions of shared counters (e.g., likes, views) across distributed systems. Discuss eventual vs. strong consistency.

Why Interviewers Ask This

Interviewers at Google ask this to evaluate your ability to balance trade-offs between consistency, availability, and partition tolerance in high-scale environments. They specifically want to see if you can design a system that handles millions of concurrent writes without locking bottlenecks, while making informed decisions about eventual versus strong consistency based on business requirements.

How to Answer This Question

1. Clarify requirements immediately: Ask about read-to-write ratios, latency constraints, and whether the counter must be accurate in real time or whether eventual consistency is acceptable.
2. Define the scope: Determine whether counters are global or per-user, and estimate throughput (e.g., millions of QPS).
3. Propose a baseline architecture: Suggest a sharded key-value store where each shard manages a subset of counters to distribute load.
4. Address the core challenge: Discuss how to handle atomic increments, e.g., Redis's atomic INCR (or Lua scripts for multi-step operations) for speed, or a database with optimistic locking.
5. Resolve consistency conflicts: Explain strategies such as vector clocks or last-writer-wins for merging updates across regions.
6. Optimize for scale: Mention caching layers, batched writes to reduce I/O, and asynchronous replication to ensure high availability.
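The shard-routing idea in step 3 can be sketched in a few lines. This is a minimal illustration, not a production router: `NUM_SHARDS` and `shard_for` are hypothetical names, and a real deployment would typically use consistent hashing so that adding shards does not remap every key.

```python
import hashlib

NUM_SHARDS = 16  # assumed fixed shard count for this sketch


def shard_for(counter_id: str) -> int:
    """Map a counter ID to a shard index via a stable hash.

    A stable hash (not Python's built-in hash(), which is salted
    per process) ensures every API server routes the same counter
    to the same shard.
    """
    digest = hashlib.md5(counter_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS
```

Because the hash is deterministic, `shard_for("video:123:views")` returns the same index on every server, so all increments for one counter land on one shard and can be applied atomically there.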

Key Points to Cover

  • Explicitly choosing eventual consistency for high-volume metrics to maximize performance
  • Using sharding to eliminate single points of failure and distribute write load
  • Leveraging atomic operations in memory stores like Redis for sub-millisecond latency
  • Explaining the trade-off between data accuracy and system availability clearly
  • Demonstrating knowledge of CAP theorem implications in a real-world scenario
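One of the optimizations above, batching writes to reduce I/O, can be sketched as a client-side buffer that merges increments before flushing. `BatchedCounter` is a hypothetical helper; the `store` dict stands in for a remote shard (against Redis, the flush would be a pipeline of INCRBY commands), so the sketch stays self-contained.

```python
from collections import defaultdict


class BatchedCounter:
    """Buffer increments locally and flush them as one merged write.

    Trades a small window of staleness (acceptable under eventual
    consistency) for far fewer round trips to the backing store.
    """

    def __init__(self, store: dict, flush_threshold: int = 100):
        self.store = store              # stand-in for a remote shard
        self.flush_threshold = flush_threshold
        self.buffer = defaultdict(int)  # counter_id -> pending delta
        self.pending = 0                # buffered operations count

    def incr(self, counter_id: str, delta: int = 1) -> None:
        self.buffer[counter_id] += delta
        self.pending += 1
        if self.pending >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # One merged write per counter instead of one per increment.
        for cid, delta in self.buffer.items():
            self.store[cid] = self.store.get(cid, 0) + delta
        self.buffer.clear()
        self.pending = 0
```

For example, with `flush_threshold=3`, three calls to `incr` produce a single flush that writes two merged deltas rather than three individual writes.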

Sample Answer

To design a distributed counter service for millions of operations, I first clarify that for metrics like 'likes' or 'views', eventual consistency is usually sufficient, allowing us to prioritize availability and low latency over strict linearizability. If we required strong consistency, the system would suffer from significant write latency due to coordination overhead across nodes.

My proposed architecture shards counters based on their IDs across multiple stateless API servers backed by a distributed cache like Redis Cluster. Each shard owns a hash-based partition of the keyspace, which spreads load evenly and avoids hotspots. When an increment request arrives, the API server forwards it to the appropriate shard. To ensure atomicity without heavy locking, we use Redis's INCR command, which is atomic and extremely fast. For durability, shards asynchronously replicate data to persistent storage in the background. If a node fails, the cluster automatically reassigns keys to healthy nodes. We handle potential data loss during crashes by implementing a write-ahead log, or by accepting minor discrepancies given the eventual consistency model.

Finally, we aggregate these counters periodically for reporting dashboards. This approach mirrors Google's preference for scalable, loosely coupled systems that optimize for user experience rather than perfect immediate accuracy.
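For merging counter updates across regions, one widely used technique (a natural fit for the eventual-consistency stance above, though the answer itself names vector clocks and last-writer-wins) is the grow-only counter CRDT, or G-counter. This sketch is illustrative; the class and method names are my own.

```python
class GCounter:
    """Grow-only counter CRDT.

    Each region increments only its own slot; merging takes the
    per-region maximum. Replicas therefore converge to the same
    total regardless of message order, and merges are idempotent,
    so no increment is lost or double-counted.
    """

    def __init__(self, region: str):
        self.region = region
        self.counts: dict = {}  # region -> count from that region

    def incr(self, delta: int = 1) -> None:
        self.counts[self.region] = self.counts.get(self.region, 0) + delta

    def merge(self, other: "GCounter") -> None:
        # Element-wise max: commutative, associative, idempotent.
        for region, count in other.counts.items():
            self.counts[region] = max(self.counts.get(region, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())
```

After two regions exchange state in either order, both report the same total, and re-applying a merge changes nothing, which is exactly the convergence property a cross-region view counter needs.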

Common Mistakes to Avoid

  • Jumping straight to a SQL database solution without considering write throughput limitations
  • Ignoring the need for sharding, leading to a design that cannot scale horizontally
  • Failing to distinguish between the needs of a counter versus a financial transaction ledger
  • Overcomplicating the solution with complex consensus algorithms like Paxos when simpler methods suffice
