Design a Distributed Unique ID Generator (Snowflake)

System Design
Medium
Google

Design a service to generate globally unique, monotonically increasing IDs (like Twitter's Snowflake). Discuss structure, time component, and high availability.

Why Interviewers Ask This

Interviewers at Google ask this to evaluate your ability to design scalable systems under strict constraints. They specifically test your understanding of clock synchronization, collision avoidance in distributed environments, and how to handle node failures without compromising ID uniqueness or monotonicity.

How to Answer This Question

1. Clarify Requirements: Define the scale (e.g., billions of IDs per day), latency needs, and whether IDs must be strictly monotonically increasing or merely sortable.
2. Propose a Structure: Outline a bit-field layout similar to Snowflake's, allocating bits for the timestamp, machine ID, and sequence number.
3. Address Time Handling: Discuss using high-resolution clocks and handling clock skew by falling back to sequence numbers if time doesn't advance.
4. Ensure Uniqueness: Explain how to generate unique worker IDs and handle sequence-number rollover within a millisecond.
5. Plan for High Availability: Detail strategies like pre-allocating ID ranges or using a centralized service with replication for fault tolerance.
6. Weigh Trade-offs: Conclude by comparing your solution against alternatives like UUIDs or database auto-increment, highlighting why your approach fits Google's scale.
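The steps above can be sketched end-to-end in a single-process illustration. This is a minimal sketch assuming the classic 41/10/12 bit split and Twitter's original custom epoch; the class and constant names are hypothetical, not a standard API:

```python
import threading
import time

# Assumed layout: 41-bit timestamp / 10-bit machine ID / 12-bit sequence.
EPOCH_MS = 1288834974657          # Twitter's original custom epoch (an assumption here)
MACHINE_ID_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

class SnowflakeGenerator:
    def __init__(self, machine_id: int):
        assert 0 <= machine_id < (1 << MACHINE_ID_BITS)
        self.machine_id = machine_id
        self.last_ts = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def _now_ms(self) -> int:
        return int(time.time() * 1000)

    def next_id(self) -> int:
        with self.lock:
            ts = self._now_ms()
            if ts < self.last_ts:
                # Clock moved backwards: refuse to go back in time and
                # keep issuing IDs against the last known timestamp.
                ts = self.last_ts
            if ts == self.last_ts:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted within this millisecond:
                    # spin until the clock advances to avoid duplicates.
                    while ts <= self.last_ts:
                        ts = self._now_ms()
            else:
                self.sequence = 0
            self.last_ts = ts
            return ((ts - EPOCH_MS) << (MACHINE_ID_BITS + SEQUENCE_BITS)) \
                   | (self.machine_id << SEQUENCE_BITS) \
                   | self.sequence
```

Note how steps 3 and 4 surface directly in the two guard branches: a backwards clock is absorbed by reusing the last timestamp, and sequence rollover blocks until the next millisecond.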

Key Points to Cover

  • Explicitly defining the bit-field allocation strategy for timestamp, machine ID, and sequence
  • Demonstrating a concrete strategy for handling clock skew and negative time deltas
  • Explaining how to assign unique worker IDs without creating a single point of failure
  • Discussing the trade-off between strict monotonicity and system availability during clock issues
  • Calculating capacity based on the proposed bit distribution to prove scalability
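To back the capacity point in the last bullet, a quick back-of-the-envelope calculation for the commonly cited 41/10/12 split (the exact widths are design choices, not a fixed standard):

```python
# Assumed bit allocation: 41 timestamp / 10 machine / 12 sequence.
timestamp_bits, machine_bits, sequence_bits = 41, 10, 12

machines = 1 << machine_bits                  # 1,024 nodes
ids_per_ms_per_machine = 1 << sequence_bits   # 4,096 IDs per node per millisecond
fleet_ids_per_second = machines * ids_per_ms_per_machine * 1000

lifetime_years = (1 << timestamp_bits) / (1000 * 60 * 60 * 24 * 365)

print(f"{fleet_ids_per_second:,} IDs/second across the fleet")   # ~4.2 billion
print(f"{lifetime_years:.1f} years before the timestamp wraps")  # ~69.7
```

A target of 10 billion IDs per day is roughly 116K IDs per second on average, so this layout leaves several orders of magnitude of headroom for bursts.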

Sample Answer

To design a globally unique, monotonically increasing ID generator, I would adapt the Snowflake algorithm while addressing its specific challenges. First, I'd define the requirements: we need roughly 10 billion IDs daily with sub-millisecond generation latency. The structure would use a 64-bit integer split into four parts: a sign bit (left as zero), a 41-bit timestamp in milliseconds since a custom epoch, a 10-bit machine ID, and a 12-bit sequence number.

The timestamp ensures sortability; however, since clocks can drift, I must handle clock skew. If the current time is less than the last recorded time, the system waits or uses the previous timestamp with an incremented sequence. To ensure uniqueness across thousands of nodes, each node requires a unique machine ID, perhaps assigned via a lightweight registry service that supports high availability through sharding.

For high availability, the machine-ID assignment service should be replicated, and nodes could cache their IDs locally to avoid network calls during peak load. While this sacrifices strict monotonicity slightly during clock adjustments, it maintains global uniqueness and near-linear scalability, which aligns with Google's focus on massive throughput and reliability.
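As a companion to the answer above, a small decoder is useful for debugging and for demonstrating that the timestamp occupies the most significant bits, which is exactly what makes the IDs time-sortable. The constants mirror the assumed 41/10/12 split, and `decode` is a hypothetical helper, not part of any standard library:

```python
# Assumed layout constants matching the 41/10/12 split discussed above.
MACHINE_ID_BITS = 10
SEQUENCE_BITS = 12
EPOCH_MS = 1288834974657  # assumed custom epoch

def decode(snowflake_id: int) -> tuple[int, int, int]:
    """Unpack an ID into (timestamp_ms, machine_id, sequence)."""
    sequence = snowflake_id & ((1 << SEQUENCE_BITS) - 1)
    machine_id = (snowflake_id >> SEQUENCE_BITS) & ((1 << MACHINE_ID_BITS) - 1)
    timestamp_ms = (snowflake_id >> (SEQUENCE_BITS + MACHINE_ID_BITS)) + EPOCH_MS
    return timestamp_ms, machine_id, sequence

# Example: pack the fields by hand, then recover them.
packed = (1_000_000 << (MACHINE_ID_BITS + SEQUENCE_BITS)) | (7 << SEQUENCE_BITS) | 42
print(decode(packed))  # (1288835974657, 7, 42)
```

Because the timestamp sits in the high bits, sorting IDs numerically sorts them by creation time first, then by machine, then by sequence.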

Common Mistakes to Avoid

  • Ignoring clock synchronization issues entirely, assuming all servers have perfectly synchronized time
  • Failing to explain how to prevent duplicate IDs when the sequence number overflows within a single millisecond
  • Proposing a centralized database for every ID generation request, which creates a bottleneck and violates scalability goals
  • Overlooking the need for a unique identifier per node, leading to potential collisions if multiple services run on the same host
