Design a Dedicated Health Check Service

Question

Accepted Answer

To design a dedicated Health Check Service, I would first establish clear SLAs, aiming for sub-100ms response times even under heavy load, which is crucial for Uber's real-time dispatching needs. The core component would be an active polling engine that sends lightweight HTTP requests to internal microservices like Driver-Routing or Payment-Processing every few seconds. This differs from passive checks, which analyze actual user traffic; while passive checks reflect real-world usage, they only reveal issues after users are affected. Active checks allow us to detect failures before customers notice. Architecturally, this service should be stateless and horizontally scalable, using a message queue to distribute check tasks across multiple nodes to avoid bottlenecks. We must implement exponential backoff when services return errors to prevent thundering herd problems. Additionally, the system needs a fallback mechanism where, if the health service is down, critical services default to a safe state rather than blocking requests. Finally, we'd aggregate these metrics into a unified dashboard, triggering alerts via PagerDuty if latency spikes or error rates exceed thresholds, ensuring our platform remains robust during peak demand periods.

Design a Dedicated Health Check Service

Why Interviewers Ask This

How to Answer This Question

Key Points to Cover

Sample Answer

Common Mistakes to Avoid

Sound confident on this question in 5 minutes

Related Interview Questions

Discuss ACID vs. BASE properties

Design a CDN Edge Caching Strategy

Design a Payment Processing System