Design an Image Moderation Service (NSFW Detection)
Design a system that uses machine learning models to automatically detect and flag inappropriate images/videos upon upload. Focus on asynchronous processing and human review queues.
Why Interviewers Ask This
Interviewers at Meta ask this to evaluate your ability to balance high-scale system reliability with critical safety requirements. They specifically assess how you handle asynchronous processing for media-heavy workloads, manage trade-offs between model latency and accuracy, and design robust human-in-the-loop review queues for edge cases that automated systems miss.
How to Answer This Question
1. Clarify Scope: Immediately define constraints such as image resolution, expected throughput (e.g., millions of uploads per day), and the precise definition of 'inappropriate' content.
2. High-Level Architecture: Propose a decoupled architecture in which an object storage layer (like S3) emits events to a message queue (Kafka) that absorbs spikes in upload traffic.
3. Asynchronous Processing Pipeline: Detail the flow where workers pull messages from the queue, run lightweight pre-filters followed by heavier ML models, and write results back to the database.
4. Human Review Integration: Design a fallback mechanism where low-confidence predictions, or specific sensitive categories, open a ticket in a human moderation dashboard with priority queuing.
5. Scaling & Optimization: Discuss horizontal scaling strategies for worker nodes, model versioning, and caching of repeated content to reduce latency while maintaining strict data privacy standards.
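The decoupling in steps 2 and 3 can be sketched in a few lines. This is an illustrative, hypothetical example: a stdlib `queue.Queue` stands in for Kafka so the pattern is runnable without external infrastructure, and the worker's "inference" is a placeholder.

```python
import queue
import threading

# Stand-in for the Kafka topic; in production this would be a durable broker.
upload_events = queue.Queue()
results = {}

def on_upload(image_id: str, storage_url: str) -> None:
    """Called by the upload path: enqueue an event and return immediately,
    so the user-facing request never waits on ML inference."""
    upload_events.put({"image_id": image_id, "url": storage_url})

def moderation_worker() -> None:
    """Stateless worker: drain events and run (placeholder) inference."""
    while True:
        event = upload_events.get()
        if event is None:  # sentinel used here for clean shutdown
            break
        # Real system: fetch from object storage, run pre-filter + ML model.
        results[event["image_id"]] = "pending_review"
        upload_events.task_done()

worker = threading.Thread(target=moderation_worker)
worker.start()
on_upload("img-1", "s3://bucket/img-1.jpg")  # returns instantly; no blocking
upload_events.put(None)
worker.join()
print(results)  # {'img-1': 'pending_review'}
```

The key property to call out in an interview is that `on_upload` does no compute: the queue is the buffer that lets upload traffic and inference capacity scale independently.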
Key Points to Cover
- Explicitly separating ingestion, processing, and review layers to prevent bottlenecks
- Using a message queue to decouple upload traffic from compute-intensive ML inference
- Implementing confidence-threshold logic to route uncertain cases to human reviewers
- Addressing the need for horizontal scaling of worker nodes to handle variable load
- Considering data privacy and model retraining pipelines as part of the lifecycle
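The confidence-threshold routing from the key points above reduces to a small pure function. This sketch uses a two-sided band (confidently safe, confidently unsafe, uncertain middle); the threshold values are illustrative, not production-tuned.

```python
def route_decision(nsfw_score: float,
                   approve_below: float = 0.2,
                   reject_above: float = 0.8) -> str:
    """Route a model prediction by confidence.

    nsfw_score is the model's probability that the image is NSFW.
    Scores in the uncertain middle band go to the human review queue;
    only confident predictions are auto-handled.
    """
    if nsfw_score >= reject_above:
        return "auto_reject"
    if nsfw_score <= approve_below:
        return "auto_approve"
    return "human_review"

print(route_decision(0.95))  # auto_reject
print(route_decision(0.05))  # auto_approve
print(route_decision(0.50))  # human_review
```

Keeping this as a pure function also makes the thresholds easy to version alongside the model, which matters when a retrained model shifts its score distribution.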
Sample Answer
To design a scalable Image Moderation Service, I would start by defining the core requirement: processing millions of uploads asynchronously without blocking the user experience.

The system begins when a user uploads an image to our Object Storage layer. This triggers an event sent to a high-throughput message broker like Kafka, ensuring we can absorb traffic spikes during viral events. A fleet of stateless worker services consumes these messages. We implement a tiered detection strategy: first, a fast heuristic filter handles obviously safe or obviously malicious files; then, a deep learning model analyzes the remaining images for NSFW content. If the model's confidence is below a threshold, say 80%, the image is routed to a human review queue rather than being auto-approved or rejected. This preserves safety without letting false positives erode user trust.

We need a dedicated database schema to track image status, the model version used, and the reviewer ID. To handle scale, workers auto-scale based on queue depth, and we use Redis to cache hashes of recently seen images to avoid redundant processing. Finally, we must ensure the human review interface prioritizes urgent content, allowing moderators to act quickly on flagged items while the system continues processing the backlog efficiently.
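The hash-based deduplication mentioned in the answer can be sketched as follows. This is a minimal illustration: a plain dict stands in for Redis, `run_model` is a hypothetical callback representing the expensive ML inference, and SHA-256 of the raw bytes is used as the cache key (a production system would more likely use a perceptual hash to also catch near-duplicates).

```python
import hashlib

class ModerationCache:
    """Illustrative stand-in for a Redis-backed verdict cache:
    identical image bytes skip redundant ML inference."""

    def __init__(self):
        self._verdicts = {}  # content hash -> moderation verdict

    def get_or_moderate(self, image_bytes: bytes, run_model) -> str:
        digest = hashlib.sha256(image_bytes).hexdigest()
        if digest not in self._verdicts:      # cache miss: pay inference cost
            self._verdicts[digest] = run_model(image_bytes)
        return self._verdicts[digest]         # cache hit: free

model_calls = []
def fake_model(data: bytes) -> str:
    model_calls.append(data)
    return "safe"

cache = ModerationCache()
cache.get_or_moderate(b"same-image", fake_model)
cache.get_or_moderate(b"same-image", fake_model)  # served from cache
print(len(model_calls))  # 1  (second call never reached the model)
```

During viral events the same image is often re-uploaded thousands of times, so this cache directly reduces the GPU inference load that the auto-scaling workers would otherwise absorb.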
Common Mistakes to Avoid
- Proposing synchronous processing, which would impose unacceptable latency on users uploading images
- Ignoring the human review component entirely, assuming AI can achieve 100% accuracy
- Failing to address how the system handles sudden traffic spikes or bursty upload patterns
- Overlooking the importance of storing metadata about why an image was flagged for audit trails
Related Interview Questions
- Design a Payment Processing System (Hard, Uber)
- Design a System for Real-Time Fleet Management (Hard, Uber)
- Design a CDN Edge Caching Strategy (Medium, Amazon)
- Design a System for Monitoring Service Health (Medium, Salesforce)
- Find K Closest Elements (Heaps) (Medium, Meta)
- Should Meta launch a paid, ad-free version of Instagram? (Hard, Meta)