Design a Twitter Feed (Sorted Set/Time Series)

Data Structures
Medium
Meta

Design the data structures required to maintain a user's chronological Twitter/X feed, supporting billions of posts. Focus on time series databases or Redis sorted sets.

Why Interviewers Ask This

Interviewers ask this to evaluate your ability to model time-series data under extreme scale constraints. They specifically want to see if you understand the trade-offs between push and pull architectures for news feeds, how Redis Sorted Sets handle ranking by timestamp, and your skill in optimizing read-heavy workloads for billions of records.

How to Answer This Question

1. Clarify requirements: Define read/write ratios, latency goals (e.g., <200ms), and consistency needs typical of Meta's high-scale environment.
2. Propose a hybrid architecture: Explain that writes are optimized via a 'Push' fan-out to follower lists, while reads use pre-computed timelines stored in sorted sets.
3. Detail the data structure: Describe using Redis Sorted Sets where the score is the Unix timestamp and the member is the post ID, ensuring O(log N) insertion and efficient range retrieval.
4. Address edge cases: Discuss handling un-follows, spam filtering, and the memory cost of storing millions of feed entries per user.
5. Optimize for scale: Mention caching strategies and sharding approaches that distribute load across clusters, demonstrating awareness of distributed-system limits.
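The timestamp-as-score scheme in step 3 can be sketched with a minimal in-memory stand-in for a per-user Redis Sorted Set. The real commands would be ZADD (insert a scored member) and ZREVRANGE (fetch a newest-first page); the `Timeline` class and names here are illustrative only:

```python
import bisect

class Timeline:
    """In-memory stand-in for one user's Redis Sorted Set feed.
    Mirrors ZADD (insert scored member) and ZREVRANGE (newest-first page)."""

    def __init__(self):
        self._entries = []  # kept sorted ascending by (timestamp, post_id)

    def add(self, timestamp, post_id):
        # Roughly: ZADD timeline:<user> <timestamp> <post_id>
        # bisect finds the slot in O(log N); Redis's skip list also
        # inserts in O(log N), whereas a Python list insert is O(N).
        bisect.insort(self._entries, (timestamp, post_id))

    def latest(self, n):
        # Roughly: ZREVRANGE timeline:<user> 0 n-1  -- newest first
        return [post_id for _, post_id in reversed(self._entries[-n:])]

feed = Timeline()
feed.add(1700000300, "post:3")
feed.add(1700000100, "post:1")
feed.add(1700000200, "post:2")
print(feed.latest(2))  # ['post:3', 'post:2']
```

Because the score is a plain Unix timestamp, range queries like "everything since time T" map directly onto ZRANGEBYSCORE with no extra indexing.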

Key Points to Cover

  • Explicitly choosing between Push vs. Pull architectures based on follower count
  • Leveraging Redis Sorted Sets for O(log N) time-based sorting and range queries
  • Addressing the 'Fan-out' explosion problem for users with millions of followers
  • Defining clear trade-offs between read latency and write throughput
  • Incorporating soft-deletion strategies to handle content removal without performance penalties
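The push-vs-pull choice by follower count can be sketched as below. This is a simplification under stated assumptions: the `FANOUT_THRESHOLD` value is a hypothetical cutoff, all function and variable names are illustrative, and plain dicts stand in for the real timeline store:

```python
# Hypothetical cutoff: accounts above this follower count skip fan-out-on-write.
FANOUT_THRESHOLD = 10_000

def publish(post_id, author, followers, timelines, celebrity_posts, ts):
    """Hybrid fan-out: push the post into each follower's timeline for
    ordinary accounts, but park celebrity posts for merge-at-read."""
    if len(followers) <= FANOUT_THRESHOLD:
        for follower in followers:                      # push model
            timelines.setdefault(follower, []).append((ts, post_id))
    else:                                               # pull model
        celebrity_posts.setdefault(author, []).append((ts, post_id))

def read_feed(user, followed_celebs, timelines, celebrity_posts, n):
    """Merge the user's pre-computed timeline with recent celebrity posts."""
    merged = list(timelines.get(user, []))
    for celeb in followed_celebs:
        merged.extend(celebrity_posts.get(celeb, []))
    merged.sort(reverse=True)                           # newest (ts, id) first
    return [post_id for _, post_id in merged[:n]]

timelines, celebrity_posts = {}, {}
publish("p1", "alice", ["bob", "carol"], timelines, celebrity_posts, 100)
big_audience = [f"u{i}" for i in range(FANOUT_THRESHOLD + 1)]
publish("p2", "star", big_audience, timelines, celebrity_posts, 200)
print(read_feed("bob", ["star"], timelines, celebrity_posts, 10))  # ['p2', 'p1']
```

The design choice this illustrates: a celebrity post is stored once and merged at read time, so a single tweet never triggers millions of timeline writes.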

Sample Answer

To design a scalable Twitter feed for billions of posts, I would prioritize low-latency reads over strong write consistency, which aligns with Meta's focus on user experience at scale. The core challenge is balancing the cost of generating a personalized timeline against the sheer volume of data.

My solution uses a hybrid approach combining push and pull models. For active users, we employ a Push model where new tweets are immediately written to the sorted timelines of their followers. We use Redis Sorted Sets because they naturally order elements by score; here, the score is the tweet's timestamp, letting us fetch the most recent M items in O(log N + M) time. For inactive users, or when the author has a massive following, we switch to a Pull model and fetch recent tweets from followed users dynamically. This prevents the 'fan-out' explosion problem.

We also need to consider persistence and memory limits. Since storing a full timeline for every user is expensive, we might implement tiered storage: hot data stays in Redis while cold data moves to a wide-column store suited to time-series workloads, such as Cassandra or HBase. Finally, we handle edge cases like deleted tweets by marking them as soft-deleted within the set rather than removing them immediately, preserving read performance. This architecture keeps the feed responsive even as the user base grows into the billions.
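The soft-deletion approach mentioned above can be sketched as a read-time filter: deleted post IDs go into a tombstone set, and pagination skips them without mutating the sorted timeline. `fetch_page` and the sample IDs are illustrative, not a real API:

```python
def fetch_page(timeline_desc, tombstones, n):
    """Walk a newest-first timeline, skipping soft-deleted IDs until the
    page is full. Deletion never reshuffles the sorted set itself, so
    reads stay cheap; a background job can compact tombstones later."""
    page = []
    for post_id in timeline_desc:
        if post_id in tombstones:
            continue  # soft-deleted: invisible to readers, still stored
        page.append(post_id)
        if len(page) == n:
            break
    return page

timeline = ["p5", "p4", "p3", "p2", "p1"]    # newest first
tombstones = {"p4"}                          # marked deleted, not removed
print(fetch_page(timeline, tombstones, 3))   # ['p5', 'p3', 'p2']
```

One consequence worth mentioning in the interview: because tombstoned entries still occupy memory, the compaction job must eventually purge them, trading a little write amplification for consistently fast reads.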

Common Mistakes to Avoid

  • Suggesting a pure SQL relational database for real-time feed generation, ignoring performance bottlenecks
  • Ignoring the memory overhead of storing duplicate tweets for every follower in a naive push model
  • Failing to distinguish between active and inactive users when selecting the feed generation strategy
  • Overlooking the need for eventual consistency in a distributed environment
