Design a Distributed Unique ID Generator (Snowflake)
Design a service to generate globally unique, monotonically increasing IDs (like Twitter's Snowflake). Discuss structure, time component, and high availability.
Why Interviewers Ask This
Interviewers at Google ask this to evaluate your ability to design scalable systems under strict constraints. They specifically test your understanding of clock synchronization, collision avoidance in distributed environments, and how to handle node failures without compromising ID uniqueness or monotonicity.
How to Answer This Question
1. Clarify Requirements: Define the scale (e.g., billions per day), latency needs, and whether IDs must be strictly monotonically increasing or just sortable. 2. Propose Structure: Outline a bit-field layout similar to Snowflake, allocating bits for timestamp, machine ID, and sequence number. 3. Address Time Handling: Discuss using high-resolution clocks and handling clock skew by falling back to sequence numbers if time doesn't advance. 4. Ensure Uniqueness: Explain how to generate unique worker IDs and handle sequence number rollover within a millisecond. 5. High Availability: Detail strategies like pre-allocating IDs or using a centralized service with replication for fault tolerance. 6. Trade-offs: Conclude by comparing your solution against alternatives like UUIDs or database auto-increments, highlighting why your approach fits Google's scale.
Key Points to Cover
- Explicitly defining the bit-field allocation strategy for timestamp, machine ID, and sequence
- Demonstrating a concrete strategy for handling clock skew and negative time deltas
- Explaining how to assign unique worker IDs without creating a single point of failure
- Discussing the trade-off between strict monotonicity and system availability during clock issues
- Calculating capacity based on the proposed bit distribution to prove scalability
Sample Answer
To design a globally unique, monotonically increasing ID generator, I would adapt the Snowflake algorithm while addressing its specific challenges. First, I'd define the requirements: we need roughly 10 billion IDs daily with sub-millisecond generation latency. The structure would use a 64-bit integer split into four parts: a sign bit (ignored), a 41-bit timestamp in milliseconds since an epoch, a 10-bit machine ID, and a 12-bit sequence number. The timestamp ensures sortability; however, since clocks can drift, I must handle clock skew. If the current time is less than the last recorded time, the system waits or uses the previous timestamp with an incremented sequence. To ensure uniqueness across thousands of nodes, each node requires a unique machine ID, perhaps assigned via a lightweight registry service that supports high availability through sharding. For high availability, the machine ID assignment service should be replicated, and nodes could cache their IDs locally to avoid network calls during peak load. While this sacrifices strict monotonicity slightly during clock adjustments, it maintains global uniqueness and near-linear scalability, which aligns with Google's focus on massive throughput and reliability.
Common Mistakes to Avoid
- Ignoring clock synchronization issues entirely, assuming all servers have perfectly synchronized time
- Failing to explain how to prevent duplicate IDs when the sequence number overflows within a single millisecond
- Proposing a centralized database for every ID generation request, which creates a bottleneck and violates scalability goals
- Overlooking the need for a unique identifier per node, leading to potential collisions if multiple services run on the same host
Practice This Question with AI
Answer this question orally or via text and get instant AI-powered feedback on your response quality, structure, and delivery.
Related Interview Questions
Design a CDN Edge Caching Strategy
Medium
AmazonDesign a System for Monitoring Service Health
Medium
SalesforceDesign a Payment Processing System
Hard
UberDesign a System for Real-Time Fleet Management
Hard
UberDefining Your Own Success Metrics
Medium
GoogleProduct Strategy: Addressing Market Saturation
Medium
Google