Design a Video Recommendation Engine (Short Form)

System Design
Hard
Google

Design the system for personalized short-video recommendations (like TikTok). Focus on low latency, real-time feature extraction, and candidate generation from a massive corpus.

Why Interviewers Ask This

Interviewers ask this to evaluate your ability to architect high-throughput, low-latency systems at massive scale. They specifically test your understanding of the two-stage funnel (candidate generation vs. ranking), real-time feature processing for short-form content, and how to balance personalization with exploration, including for cold-start users and videos.

How to Answer This Question

1. Clarify requirements immediately: Define latency targets (for example, under 200 ms end to end), daily active users, and video ingestion rates typical of Google-scale products.
2. Propose a Two-Stage Architecture: Start with Candidate Generation, using Approximate Nearest Neighbor (ANN) search over an index of video embeddings, queried with the user's embedding, to narrow millions of videos down to hundreds.
3. Detail Real-Time Feature Extraction: Explain how you capture immediate user signals such as swipe speed and watch time, and how those signals update the user's vector dynamically.
4. Describe the Ranking Model: Discuss using a deep neural network that ingests both static features and real-time context to score the candidates.
5. Address Scale and Consistency: Mention sharding strategies, caching layers (like Redis), and A/B testing frameworks to validate model improvements before global rollout.
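
The two-stage funnel in steps 2 and 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the exact cosine-similarity scan stands in for a real ANN index, the linear scorer stands in for the deep ranking model, and every name here (generate_candidates, rank, recommend, the freshness feature and its weights) is hypothetical.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def generate_candidates(user_vec, video_vecs, k):
    """Stage 1: keep the k videos closest to the user embedding.

    Assumption: an exact scan replaces the ANN index (HNSW, ScaNN, ...)
    that a real system would query here.
    """
    return heapq.nlargest(
        k, video_vecs, key=lambda vid: cosine(user_vec, video_vecs[vid])
    )


def rank(candidates, user_vec, video_vecs, freshness, w_sim=0.8, w_fresh=0.2):
    """Stage 2: score only the surviving candidates with richer features.

    Assumption: a toy linear scorer replaces the deep neural network.
    """
    def score(vid):
        similarity = cosine(user_vec, video_vecs[vid])
        return w_sim * similarity + w_fresh * freshness.get(vid, 0.0)

    return sorted(candidates, key=score, reverse=True)


def recommend(user_vec, video_vecs, freshness, k_candidates=3, n_results=2):
    """Run both stages: cheap retrieval first, expensive ranking second."""
    candidates = generate_candidates(user_vec, video_vecs, k_candidates)
    return rank(candidates, user_vec, video_vecs, freshness)[:n_results]
```

The point worth stating in the interview is the cost split: stage 1 touches the whole corpus but does almost no work per video, while stage 2 does heavy work per video but only on the few hundred survivors.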

Key Points to Cover

  • Explicitly defining the two-stage funnel (Candidate Generation followed by Ranking) as the core architectural pattern
  • Demonstrating knowledge of Approximate Nearest Neighbor (ANN) algorithms for efficient retrieval from massive datasets
  • Addressing the specific challenge of real-time feature extraction for short-form content engagement signals
  • Proposing concrete solutions for latency reduction through caching and optimized data structures
  • Discussing mechanisms to handle cold-start problems and content diversity within the ranking logic
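
One way to make the cold-start and diversity point concrete is to reserve fixed slots in the feed for exploration content. The sketch below is only one possible policy, and it rests on assumptions: the function name inject_exploration, the every_n parameter, and the idea of a precomputed exploration pool (for example, videos from new creators) are all hypothetical.

```python
def inject_exploration(ranked, exploration_pool, every_n=5):
    """Return a feed where every `every_n`-th slot holds an exploration video.

    ranked:           video ids ordered by the ranking model, best first.
    exploration_pool: video ids from new creators or under-served topics.
    If the pool runs out, the remaining slots fall back to ranked videos.
    """
    if every_n < 2:
        raise ValueError("every_n must be at least 2")

    feed = []
    seen = set()
    explore = iter(exploration_pool)

    for vid in ranked:
        if vid in seen:
            continue  # already shown as an exploration pick
        feed.append(vid)
        seen.add(vid)

        # The next slot is an exploration slot: fill it from the pool.
        if len(feed) % every_n == every_n - 1:
            for candidate in explore:
                if candidate not in seen:
                    feed.append(candidate)
                    seen.add(candidate)
                    break

    return feed
```

For instance, inject_exploration(["r1", "r2", "r3", "r4"], ["n1", "n2"], every_n=3) places an exploration video in the third and sixth positions. A fixed slot ratio is easy to reason about and to A/B test; a bandit-style policy would be the natural next step if the interviewer pushes further.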

Sample Answer

To design a TikTok-style recommendation engine at Google scale, I would prioritize a two-stage funnel architecture to handle the massive corpus while maintaining sub-200ms latency.

First, in the Candidate Generation phase, we cannot scan billions of videos. Instead, I'd use an ANN index, likely HNSW or ScaNN, built over video embeddings and queried with a user embedding derived from historical interactions. This reduces the pool from billions to a few hundred relevant videos per request.

Second, for real-time relevance, we need immediate feedback loops. As a user swipes, we extract features like dwell time and interaction type, updating their session vector in a low-latency store like Redis.

Finally, the Ranking stage uses a Deep Neural Network to score these hundreds of candidates, incorporating rich features like video metadata, creator popularity, and the user's current context. To prevent echo chambers, I'd inject diversity by ensuring a percentage of recommendations come from new creators or different topics.

For scalability, the serving layer should be stateless where possible, relying on distributed computing for training and on sharded feature storage that is kept consistent across regions.
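
The real-time feedback loop described in this answer could look roughly like the sketch below. It is an illustration under assumptions, not the method any particular product uses: the exponential-moving-average update, the base_rate value, and the function names are hypothetical, and a plain dict stands in for the low-latency store (Redis in the answer).

```python
# Assumption: an in-process dict replaces the low-latency session store.
SESSION_STORE = {}


def update_session_vector(session_vec, video_vec, watch_fraction, base_rate=0.3):
    """Move the session vector toward the video the user just saw.

    watch_fraction is dwell time divided by video length, clamped to [0, 1].
    A fully watched video pulls the vector by base_rate; an instant swipe
    (watch_fraction == 0) leaves it unchanged.
    """
    watch_fraction = min(max(watch_fraction, 0.0), 1.0)
    alpha = base_rate * watch_fraction
    return [(1 - alpha) * s + alpha * v for s, v in zip(session_vec, video_vec)]


def record_view(user_id, video_vec, watch_fraction, base_rate=0.3):
    """Read, update, and write back the user's session vector."""
    current = SESSION_STORE.get(user_id, [0.0] * len(video_vec))
    updated = update_session_vector(current, video_vec, watch_fraction, base_rate)
    SESSION_STORE[user_id] = updated
    return updated
```

Because the updated vector is what candidate generation queries with on the next request, a single long watch shifts the following batch of recommendations without waiting for a model retrain.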

Common Mistakes to Avoid

  • Focusing solely on the machine learning model without explaining the data pipeline and infrastructure required for real-time inference
  • Ignoring the latency constraints inherent in short-form video apps, leading to suggestions that are computationally too heavy
  • Overlooking the candidate generation step and proposing to rank all available videos, which is impossible at scale
  • Failing to mention how to handle real-time user feedback, resulting in a system that feels static and unresponsive
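
A quick back-of-envelope calculation shows why skipping candidate generation is a mistake. All numbers below are illustrative assumptions rather than measurements: a corpus of one billion videos, 500 candidates per request, a 200 ms budget, and a hypothetical cost of 10 microseconds to score one video with the ranking model.

```python
def scoring_time_ms(num_videos, cost_per_video_us):
    """Time to score num_videos sequentially, in milliseconds."""
    return num_videos * cost_per_video_us / 1000


CORPUS_SIZE = 1_000_000_000  # assumed: "billions" rounded down to one billion
CANDIDATES = 500             # assumed: "a few hundred" after retrieval
COST_PER_VIDEO_US = 10       # assumed per-video ranking cost, in microseconds
LATENCY_BUDGET_MS = 200      # latency target from the requirements

full_scan_ms = scoring_time_ms(CORPUS_SIZE, COST_PER_VIDEO_US)
funnel_ms = scoring_time_ms(CANDIDATES, COST_PER_VIDEO_US)

print(f"rank everything: {full_scan_ms / 1000:,.0f} s")
print(f"rank candidates: {funnel_ms:,.0f} ms of a {LATENCY_BUDGET_MS} ms budget")
```

Under these assumptions, ranking the full corpus takes hours of sequential compute per request, while ranking only the retrieved candidates uses a few milliseconds and leaves most of the budget for retrieval, feature lookup, and network overhead.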
