Design a Type-Ahead with Spell Check

System Design
Hard
Uber
72.4K views

Augment a type-ahead service to incorporate basic spell check suggestions. Discuss using data structures like Levenshtein Automata or Tries with edit distance tracking.

Why Interviewers Ask This

Interviewers at Uber ask this to evaluate your ability to balance strict latency requirements with complex algorithmic logic. They want to see if you can design a system that handles real-time user input while efficiently calculating edit distances without blocking the main thread, reflecting Uber's need for instant, reliable navigation and search experiences.

How to Answer This Question

1. Clarify constraints: Ask about expected QPS, maximum acceptable latency (e.g., under 50ms), and the scale of the dictionary. 2. Propose a Trie-based architecture: Explain how to store the dictionary in a Trie for fast prefix matching. 3. Integrate Edit Distance: Detail two strategies—calculating Levenshtein distance on the fly for short prefixes or using a Levenshtein Automaton for O(n) matching speed. 4. Discuss Optimization: Mention caching frequent misspellings and pruning candidates based on a distance threshold (k=1 or k=2). 5. Address Scalability: Describe sharding the Trie across services and using consistent hashing to handle high traffic loads typical of ride-hailing apps.

Key Points to Cover

  • Demonstrating knowledge of Levenshtein Automata for sub-linear time complexity
  • Explicitly addressing the latency constraints critical to real-time systems
  • Proposing a hybrid approach using Tries for exact matches and Automata for fuzzy search
  • Discussing caching strategies to reduce redundant calculations for popular typos
  • Explaining how to scale the solution across distributed nodes for high concurrency

Sample Answer

To design a type-ahead with spell check for a platform like Uber, I would start by defining the core constraint: suggestions must appear within 50 milliseconds. First, I'd implement a compressed Trie to store the location and driver database, allowing for rapid prefix traversal. For the spell-check component, instead of recalculating the full Levenshtein distance for every query, which is too slow, I would use a Levenshtein Automaton. This automaton accepts all strings within a specific edit distance 'k' from the input, effectively filtering valid suggestions in linear time relative to the input length. In practice, if a user types 'Uber', the system queries the Trie for exact matches first. If none exist or are sparse, it triggers the automaton to find near-matches like 'Ubar' or 'Uber X'. To optimize further, I would cache common typos in a Redis layer and limit the output to the top 5 results based on a weighted score combining proximity, popularity, and edit distance. This approach ensures low latency while maintaining high accuracy, crucial for a seamless rider experience during peak hours.

Common Mistakes to Avoid

  • Suggesting a brute-force comparison against the entire dictionary, which ignores latency requirements
  • Focusing only on the algorithm without considering the data structure (Trie) needed for prefix lookups
  • Ignoring the trade-off between accuracy and performance when setting the edit distance threshold
  • Forgetting to mention how to handle dynamic updates to the dictionary as new locations are added

Practice This Question with AI

Answer this question orally or via text and get instant AI-powered feedback on your response quality, structure, and delivery.

Start Practicing

Related Interview Questions

Browse all 150 System Design questionsBrowse all 57 Uber questions