Design a Collaborative Editing System (Google Docs)

System Design
Hard
Google

Design a real-time document collaboration service. Focus on Operational Transformation (OT) or Conflict-Free Replicated Data Types (CRDTs) for merging concurrent changes.

Why Interviewers Ask This

Interviewers at Google ask this to evaluate your ability to design distributed systems that handle high concurrency without data loss. They specifically test your understanding of Operational Transformation (OT) or CRDTs, assessing whether you can manage conflict resolution in real-time environments where network latency causes simultaneous edits.

How to Answer This Question

1. Clarify requirements: Define the scale (concurrent editors per document and total documents), the consistency model (eventual vs. strong), and the latency target for a remote edit to appear on other clients.
2. High-level architecture: Propose a client-server model with WebSocket connections for bidirectional communication and a central coordination service.
3. Core algorithm selection: Explicitly choose between OT and CRDTs. Explain why CRDTs might be better for offline-first scenarios or why OT suits centralized control.
4. Conflict resolution details: Describe how operations are ordered, transformed, or merged so that all replicas provably converge to the same state.
5. Edge cases: Discuss handling network partitions, user disconnects, and operation batching to maintain performance under load.
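
Step 4 is easier to explain with a concrete transform in hand. The following is a minimal sketch of OT for two concurrent inserts on a plain string, not a production algorithm. The `Insert` type, the `site` tie-breaker, and the function names are assumptions made for illustration; deletes and multi-operation histories are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # index in the document where the text goes
    text: str   # text to insert
    site: int   # hypothetical replica ID, used only to break position ties


def apply(doc: str, op: Insert) -> str:
    """Apply an insert to a document string."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after the concurrent `against`."""
    shifted = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site
    )
    if shifted:
        # `against` landed at or before our position, so move right.
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


doc = "abc"
a = Insert(1, "X", site=1)  # replica 1 inserts at index 1
b = Insert(1, "Y", site=2)  # replica 2 concurrently inserts at index 1

# Each replica applies its own op first, then the transformed remote op.
left = apply(apply(doc, a), transform(b, a))
right = apply(apply(doc, b), transform(a, b))
print(left, right)
```

Both application orders produce the same string, which is the convergence property an interviewer will expect you to state for the two-operation case.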

Key Points to Cover

  • Explicitly comparing OT versus CRDT trade-offs with a clear recommendation
  • Demonstrating knowledge of specific algorithms like RGA or LSEQ for text handling
  • Addressing the challenge of network latency and offline synchronization
  • Designing a scalable architecture that avoids central bottlenecks
  • Explaining how mathematical properties guarantee state convergence across replicas
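
The last point can be made concrete with the three properties a state-based CRDT merge needs: commutativity, associativity, and idempotence. The sketch below checks them on a grow-only set whose merge is set union. It is a deliberately simple stand-in rather than a text CRDT, and the element layout `(site, counter, character)` is an assumption for illustration.

```python
from typing import FrozenSet, Tuple

Element = Tuple[str, int, str]  # (site, counter, character), illustrative layout


def merge(a: FrozenSet[Element], b: FrozenSet[Element]) -> FrozenSet[Element]:
    """Merge two replica states of a grow-only set by taking their union."""
    return a | b


r1 = frozenset({("alice", 1, "H")})
r2 = frozenset({("bob", 1, "i")})
r3 = frozenset({("alice", 1, "H"), ("carol", 1, "!")})

# Order of merging does not matter.
commutative = merge(r1, r2) == merge(r2, r1)
# Grouping of merges does not matter.
associative = merge(merge(r1, r2), r3) == merge(r1, merge(r2, r3))
# Receiving the same state twice changes nothing.
idempotent = merge(r1, r1) == r1

print(commutative, associative, idempotent)
```

Because of these properties, replicas can exchange state in any order, any number of times, and still end up identical.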

Sample Answer

To design a Google Docs-like system, I would start by defining the core requirement: low-latency synchronization across thousands of concurrent users, with all replicas eventually converging to the same document. The architecture would feature clients connected via WebSockets to a gateway, which routes operations to a state management service.

For the critical merge logic, I recommend CRDTs (Conflict-Free Replicated Data Types) over traditional Operational Transformation. With CRDTs, each client applies its own updates locally first and merges remote updates later without a central lock. The server no longer has to serialize and transform every operation, and clients keep working through network partitions. The cost is extra per-character metadata. Specifically, I would implement an RGA (Replicated Growable Array) for text editing, where each character is an element with a unique ID. When two users insert text at the same position concurrently, a deterministic comparison of the IDs decides the final order on every replica, ensuring convergence without transformation logic.

The server acts as a broadcast hub, validating operation syntax but not enforcing strict ordering on every request. For heavy loads, we could shard documents by ID. If we chose OT instead, we would need a central sequencer to assign each operation a position in a total order and transform incoming changes against the operations already applied, which introduces a single point of contention per document. Given Google's focus on scalability and resilience, the CRDT approach offers better fault tolerance and lets clients remain responsive during brief connectivity issues.
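
The RGA idea in this answer can be sketched in a few dozen lines. This is a simplified illustration rather than a reference implementation: it assumes operations are delivered in causal order, uses `(Lamport counter, site name)` tuples as element IDs, and does a linear scan instead of an index. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ElemId = Tuple[int, str]  # (Lamport counter, site name); assumed ID scheme


@dataclass
class Elem:
    id: ElemId
    char: str
    deleted: bool = False  # tombstone flag; deleted elements stay in the list


class RGA:
    def __init__(self, site: str):
        self.site = site
        self.clock = 0
        self.elems: List[Elem] = []

    def _index_of(self, elem_id: ElemId) -> int:
        for i, e in enumerate(self.elems):
            if e.id == elem_id:
                return i
        raise KeyError(elem_id)

    def local_insert(self, after: Optional[ElemId], char: str):
        """Create an insert op, apply it locally, and return it for broadcast."""
        self.clock += 1
        op = ("ins", (self.clock, self.site), after, char)
        self.apply(op)
        return op

    def local_delete(self, target: ElemId):
        op = ("del", target)
        self.apply(op)
        return op

    def apply(self, op) -> None:
        """Integrate a local or remote op (assumes causal delivery)."""
        if op[0] == "del":
            self.elems[self._index_of(op[1])].deleted = True
            return
        _, new_id, after, char = op
        if any(e.id == new_id for e in self.elems):
            return  # duplicate delivery, already integrated
        self.clock = max(self.clock, new_id[0])
        i = 0 if after is None else self._index_of(after) + 1
        # Skip elements with larger IDs so every replica picks the same slot.
        while i < len(self.elems) and self.elems[i].id > new_id:
            i += 1
        self.elems.insert(i, Elem(new_id, char))

    def text(self) -> str:
        return "".join(e.char for e in self.elems if not e.deleted)


alice, bob = RGA("alice"), RGA("bob")
op_a = alice.local_insert(None, "a")
op_c = alice.local_insert(op_a[1], "c")
bob.apply(op_a)
bob.apply(op_c)

# Concurrent inserts after the same character "a".
op_x = alice.local_insert(op_a[1], "X")
op_y = bob.local_insert(op_a[1], "Y")
alice.apply(op_y)
bob.apply(op_x)
print(alice.text(), bob.text())
```

Both replicas end with the same text even though they integrated the concurrent inserts in opposite orders. Deleted elements are kept as tombstones, which is the per-character metadata cost worth mentioning when comparing against OT.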

Common Mistakes to Avoid

  • Ignoring the difference between synchronous locking and asynchronous merging strategies
  • Focusing only on database storage while neglecting the real-time sync protocol
  • Proposing a solution that requires a single master node for all writes, creating a bottleneck
  • Overlooking how to handle operations generated while a user is disconnected from the network
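
For the disconnect case in the last bullet, one common pattern is to queue operations on the client and replay them on reconnect, with the server ignoring operation IDs it has already seen so that retries are safe. The sketch below assumes a hypothetical `send` callable that raises `ConnectionError` while offline; `OfflineBuffer` and `FakeServer` are illustrative names, not part of any real library.

```python
from collections import deque


class OfflineBuffer:
    """Client-side queue that holds operations until they are delivered."""

    def __init__(self, send):
        self.send = send        # hypothetical transport; raises ConnectionError when offline
        self.pending = deque()

    def submit(self, op) -> None:
        self.pending.append(op)
        self.flush()

    def flush(self) -> bool:
        """Send queued ops in order; stop at the first failure."""
        while self.pending:
            try:
                self.send(self.pending[0])
            except ConnectionError:
                return False
            self.pending.popleft()  # drop only after a successful send
        return True


class FakeServer:
    """Stand-in server that deduplicates by operation ID."""

    def __init__(self):
        self.online = True
        self.seen = set()
        self.log = []

    def receive(self, op) -> None:
        if not self.online:
            raise ConnectionError("offline")
        if op["id"] in self.seen:
            return  # retry of an op that already arrived
        self.seen.add(op["id"])
        self.log.append(op)


server = FakeServer()
buf = OfflineBuffer(server.receive)

server.online = False
buf.submit({"id": "alice-1", "ins": "H"})
buf.submit({"id": "alice-2", "ins": "i"})

server.online = True
buf.flush()
print([op["id"] for op in server.log])
```

With a CRDT the replayed operations merge directly; with OT the server would additionally transform them against everything applied since the client's last acknowledged revision.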
