Design the Twitter News Feed

System Design
Medium
Google

Design the system that generates the news feed for Twitter/X. Focus on the fan-out mechanism (push vs. pull), feed ranking, and handling celebrity users (hot spots).

Why Interviewers Ask This

Interviewers ask this to evaluate your ability to balance scalability with real-time performance in high-traffic systems. They specifically test your understanding of fan-out patterns, how to handle hot spots like celebrity users without crashing the system, and your capacity to prioritize trade-offs between consistency and availability.

How to Answer This Question

  1. Clarify Requirements: Define scope (read vs. write QPS), latency goals, and scale (e.g., 500M daily active users).
  2. High-Level Architecture: Sketch a flow from Tweet creation to Feed retrieval, identifying core components like APIs, databases, and caches.
  3. Fan-Out Strategy: Debate Push vs. Pull models; recommend Hybrid for Twitter's specific mix of casual and celebrity users.
  4. Hot Spot Handling: Detail how to isolate celebrity feeds using pre-computation or specialized queues to prevent cache stampedes.
  5. Ranking & Refinement: Briefly explain how to integrate ML-based ranking logic post-retrieval.
  6. Trade-offs: Conclude by discussing consistency, storage costs, and failure scenarios.
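For the requirements step, a quick back-of-envelope calculation helps show why fan-out dominates the design. The sketch below is only illustrative: the 500M daily active users figure comes from the example above, while the tweets per user, average follower count, and feed loads per day are assumed values chosen for the arithmetic, not real Twitter numbers.

```python
# Back-of-envelope load estimate. Every input except the DAU example
# is an assumption made for illustration.
DAILY_ACTIVE_USERS = 500_000_000      # scale example from step 1
TWEETS_PER_USER_PER_DAY = 0.5         # assumed
AVG_FOLLOWERS = 200                   # assumed
FEED_LOADS_PER_USER_PER_DAY = 10      # assumed
SECONDS_PER_DAY = 86_400


def per_second(daily_total: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return daily_total / SECONDS_PER_DAY


# Tweets created per second (the raw write rate).
write_qps = per_second(DAILY_ACTIVE_USERS * TWEETS_PER_USER_PER_DAY)

# Timeline writes per second if every tweet is pushed to every follower.
fanout_writes_qps = write_qps * AVG_FOLLOWERS

# Feed requests per second (the read rate).
read_qps = per_second(DAILY_ACTIVE_USERS * FEED_LOADS_PER_USER_PER_DAY)

if __name__ == "__main__":
    print(f"tweet writes/s:    {write_qps:,.0f}")
    print(f"fan-out writes/s:  {fanout_writes_qps:,.0f}")
    print(f"feed reads/s:      {read_qps:,.0f}")
```

Under these assumptions, pushing every tweet multiplies the write rate by the average follower count, which is the number to bring into the Push vs. Pull discussion. These are averages; peak traffic would be higher.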

Key Points to Cover

  • Propose a Hybrid Fan-Out strategy to balance load between push and pull
  • Explicitly address the 'Celebrity Hot Spot' problem with isolation techniques
  • Differentiate between write optimization and read optimization paths
  • Demonstrate awareness of caching layers (Redis/Memcached) for low latency
  • Articulate clear trade-offs regarding data consistency versus availability
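One isolation technique for the hot spot problem is to make sure that only one request reloads a missing hot key while the others wait for its result (often called "single flight" or request coalescing). The following is a minimal in-process sketch of that idea. The class name `SingleFlightCache` and the loader callback are hypothetical, a plain dictionary stands in for Redis/Memcached, and expiry (TTL) is deliberately left out.

```python
import threading
from typing import Any, Callable, Dict


class SingleFlightCache:
    """Cache where concurrent misses on one key trigger a single load.

    Hypothetical sketch: a dict stands in for a real cache tier,
    and entries never expire.
    """

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._key_locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects _key_locks

    def get(self, key: str, loader: Callable[[str], Any]) -> Any:
        # Fast path: the value is already cached.
        if key in self._values:
            return self._values[key]

        # Find or create the lock dedicated to this key.
        with self._guard:
            key_lock = self._key_locks.setdefault(key, threading.Lock())

        # Only one thread per key runs the loader; the rest block here
        # and then find the value on the re-check.
        with key_lock:
            if key not in self._values:
                self._values[key] = loader(key)
            return self._values[key]
```

If eight requests miss on the same celebrity key at once, the backing store sees one load instead of eight. In a distributed deployment the same effect needs a shared lock or lease, which this sketch does not cover.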

Sample Answer

To design Twitter's feed, I would first confirm the targets: sub-second latency for billions of feed reads, plus the ability to absorb massive write spikes. The core challenge is the fan-out mechanism. A pure pull model is slow at read time, because every feed request has to query and merge recent tweets from every account the user follows. A pure push model makes reads cheap but wastes writes on inactive accounts, and a single tweet from an account with millions of followers triggers millions of timeline writes. I propose a hybrid approach: for normal users, we push each new tweet into their followers' precomputed timelines at post time. For celebrities with millions of followers, pushing every tweet would overwhelm our infrastructure, so we instead store their tweets in a dedicated 'hot' cache or a separate read-optimized store and pull them only during feed generation. This avoids the write amplification a celebrity tweet would otherwise cause. Because that hot store is read by many users at once, it also needs protection against cache stampedes (the 'thundering herd' problem). At read time we merge the pushed and pulled streams, deduplicate, and pass the result through a lightweight ranking service that filters spam and boosts relevant content before serving the final list. Finally, for durability, we write each new tweet to a message queue before updating the cache, so that no tweets are lost even if the feed service temporarily fails.
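The hybrid approach in the answer can be sketched in a few dozen lines. This is an in-memory illustration under stated assumptions, not how Twitter implements it: `FeedService`, `CELEBRITY_THRESHOLD`, and `TIMELINE_LIMIT` are hypothetical names and values, dictionaries stand in for the caches and stores, and reverse-chronological order stands in for the ranking service.

```python
import heapq
from collections import defaultdict, deque
from dataclasses import dataclass

CELEBRITY_THRESHOLD = 3   # assumed cut-off, tiny so the example is easy to trace
TIMELINE_LIMIT = 800      # assumed cap on each precomputed timeline


@dataclass(frozen=True, order=True)
class Tweet:
    timestamp: int
    tweet_id: int
    author: str
    text: str = ""


class FeedService:
    def __init__(self):
        self.followers = defaultdict(set)       # author -> users following them
        self.following = defaultdict(set)       # user -> authors they follow
        self.author_tweets = defaultdict(list)  # pull path: tweets per author
        # Push path: bounded, precomputed timeline per user.
        self.timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, tweet):
        # Every tweet is stored once under its author.
        self.author_tweets[tweet.author].append(tweet)
        if self.is_celebrity(tweet.author):
            return  # no fan-out on write; followers pull at read time
        for follower in self.followers[tweet.author]:
            self.timelines[follower].append(tweet)

    def feed(self, user, limit=10):
        # Start from pushed tweets; the set removes duplicates.
        candidates = set(self.timelines[user])
        # Pull the latest tweets from each followed celebrity.
        for author in self.following[user]:
            if self.is_celebrity(author):
                candidates.update(self.author_tweets[author][-limit:])
        # Newest first, ordered by (timestamp, tweet_id).
        return heapq.nlargest(limit, candidates)
```

A short usage example: with `svc = FeedService()`, three users following `"celeb"` and one of them also following `"bob"`, a post by `"bob"` lands in that follower's timeline immediately, while a post by `"celeb"` is written once and only appears when `svc.feed(user)` merges it in. Celebrity status is evaluated at call time here; a real system would need a policy for accounts that cross the threshold.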

Common Mistakes to Avoid

  • Ignoring the scale difference between regular users and celebrities, leading to an inefficient all-push or all-pull solution
  • Focusing solely on database schema without explaining the real-time data propagation mechanism
  • Overlooking the impact of network latency when aggregating feeds from multiple sources
  • Failing to define clear metrics for success, such as tail latency requirements or throughput targets
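To make a tail latency requirement concrete, it can help to show how a percentile differs from an average. The sketch below uses the nearest-rank method on made-up latency samples; the `percentile` helper and the sample values are illustrative assumptions only.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up feed latencies in milliseconds: mostly fast, with two slow requests.
latencies_ms = [100] * 98 + [2000] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

if __name__ == "__main__":
    print(f"mean={mean_ms} ms, p50={p50_ms} ms, p99={p99_ms} ms")
```

In this sample the mean and median look healthy while the 99th percentile exposes the slow requests, which is why a target such as "p99 under some bound" is more useful than an average when discussing feed aggregation across multiple sources.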
