Design a Notification Service (Push/SMS/Email)

Question

Accepted Answer

To design a notification service capable of handling millions of daily events, I would decouple ingestion from delivery using Apache Kafka. First, we define the scope: if critical alerts need strict delivery guarantees while marketing emails can tolerate slight delays, we route them to separate topics (or at least dedicated partitions) so that a marketing backlog cannot delay critical traffic. The API layer accepts requests and publishes them to Kafka, tagging each one with a unique correlation ID. That ID acts as an idempotency key: a request or message seen twice with the same ID is processed only once, which prevents duplicate notifications when a retry occurs due to transient network issues.
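The ingestion step described above can be sketched roughly as follows. This is a minimal in-memory illustration, not a real Kafka client: the class name `IngestionApi`, the topic names, and the `submit` signature are all hypothetical, the lists stand in for Kafka topics, and the set stands in for a shared deduplication store. Deduplicating at the API layer is one assumption; the same check could equally live in the consumers.

```python
from dataclasses import dataclass, field

# Hypothetical topic names: one topic per priority class keeps
# marketing backlog away from critical alerts.
TOPIC_BY_PRIORITY = {
    "critical": "notifications.critical",
    "marketing": "notifications.marketing",
}


@dataclass
class IngestionApi:
    # In-memory stand-ins: lists play the role of Kafka topics,
    # the set plays the role of a shared dedup store.
    topics: dict = field(
        default_factory=lambda: {t: [] for t in TOPIC_BY_PRIORITY.values()}
    )
    seen_ids: set = field(default_factory=set)

    def submit(self, correlation_id, user_id, channel, priority, body):
        """Enqueue a notification once per correlation ID.

        Returns True if the message was enqueued, False if the
        correlation ID was already seen (a client retry).
        """
        if correlation_id in self.seen_ids:
            return False
        self.seen_ids.add(correlation_id)
        topic = TOPIC_BY_PRIORITY[priority]
        self.topics[topic].append(
            {
                "correlation_id": correlation_id,
                "user_id": user_id,
                "channel": channel,
                "body": body,
            }
        )
        return True
```

Calling `submit` twice with the same correlation ID enqueues only one message, so a client that retries after a timeout does not trigger a second notification. In a real deployment the dedup store would need to be shared across API instances and expire old IDs.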

For the delivery layer, we deploy a cluster of stateless consumers that poll these topics. Each consumer is responsible for a specific channel type, such as Push, SMS, or Email. If a provider like Firebase returns a 503 error, the consumer applies an exponential backoff strategy before re-queuing the message, so a struggling provider is not hammered with immediate retries. Crucially, messages that still fail after the maximum number of retries go to a Dead Letter Queue (DLQ), which allows manual inspection without halting the pipeline. To handle peak loads, we use Kafka's partitioning to parallelize consumption across hundreds of workers; because each partition is read by at most one consumer in a group, the topic needs at least as many partitions as workers. Finally, we monitor key metrics such as consumer lag and DLQ depth to trigger auto-scaling policies, so the system remains responsive even during viral events typical in Meta's ecosystem.
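The retry and dead-letter behavior above can be sketched as follows. This is a simplified illustration under stated assumptions: `TransientProviderError`, `backoff_delay`, and `deliver_with_retries` are hypothetical names, `send` stands in for a provider call, a plain list stands in for the DLQ, and the retry happens in-process rather than by re-queuing to Kafka. The base delay, cap, and attempt count are illustrative values, not recommendations from the answer.

```python
import time


class TransientProviderError(Exception):
    """Stand-in for a retryable provider failure such as an HTTP 503."""


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Delay in seconds before the next try: base * 2^attempt, capped."""
    return min(cap, base * (2 ** attempt))


def deliver_with_retries(message, send, dead_letters,
                         max_attempts=4, sleep=time.sleep):
    """Try to send a message, backing off between transient failures.

    Returns True on success. After max_attempts failures the message
    is appended to dead_letters and False is returned, so the caller
    can move on to the next message instead of blocking the pipeline.
    """
    for attempt in range(max_attempts):
        try:
            send(message)
            return True
        except TransientProviderError:
            # No point sleeping after the final failed attempt.
            if attempt < max_attempts - 1:
                sleep(backoff_delay(attempt))
    dead_letters.append(message)
    return False
```

The `sleep` parameter is injectable so the backoff schedule can be checked without waiting. A production version would usually add random jitter to the delay so that many consumers do not retry in lockstep.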

Related Interview Questions

Design a CDN Edge Caching Strategy

Design a System for Monitoring Service Health

Design a Payment Processing System