Design a System for Monitoring API Latency

Question

Accepted Answer

To monitor latency across thousands of API endpoints, I would start by defining our SLOs, specifically a p99 latency target under 200ms. Retaining every request in full is prohibitively expensive and adds load of its own, so I recommend an adaptive sampling strategy: a probabilistic sampler at the edge that drops perhaps 90% of normal traffic but keeps 100% of requests that return errors or exceed a latency threshold. Because that sampler deliberately over-represents slow requests, percentiles should be computed from aggregates updated on every request, not from the sampled records.

For aggregation, instead of storing raw per-request records, collectors would use a sliding time window, rolling metrics up into buckets before sending them to a central store like Prometheus or Datadog. To handle the volume, we can use quantile sketches such as t-digest or DDSketch to estimate percentiles efficiently without storing individual latencies.

The system must also include an alerting layer. If the error rate rises above 1% or p99 latency breaches our SLO for more than two consecutive minutes, the system should trigger an immediate PagerDuty alert.

Finally, we need a dashboard for real-time visualization. At Salesforce, where multi-tenant isolation is critical, tenant-specific latency spikes must stay visible rather than being masked by aggregate data, which means keeping a per-tenant breakdown alongside the global view. This approach balances granular visibility with the scalability required for enterprise workloads.
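The sampling, bucketing, and alerting steps in the answer above could be sketched roughly as follows. This is a minimal illustration, not a production design: it uses a plain fixed-bucket histogram as a stand-in for a quantile sketch, and every name here (`should_keep`, `LatencyHistogram`, `breaches_slo`, the bucket bounds, the 10% keep rate) is a hypothetical choice made for the example. It also assumes the histogram is updated on every request, before sampling, and it treats the alert rule as "p99 above the SLO in each of the last N one-minute windows" with N left as a parameter.

```python
import bisect
import random

# Hypothetical bucket upper bounds in milliseconds; values above the last
# bound fall into an open-ended overflow bucket.
BUCKET_BOUNDS_MS = [5, 10, 25, 50, 100, 200, 400, 800, 1600]


def should_keep(latency_ms, is_error, slow_threshold_ms=200.0,
                keep_rate=0.10, rng=random):
    """Keep every error and slow request; keep a random fraction of the rest."""
    if is_error or latency_ms > slow_threshold_ms:
        return True
    return rng.random() < keep_rate


class LatencyHistogram:
    """Fixed-bucket latency counts for one time window."""

    def __init__(self, bounds=BUCKET_BOUNDS_MS):
        self.bounds = list(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # +1 for the overflow bucket

    def observe(self, latency_ms):
        # bisect_left returns the first bucket whose upper bound >= latency_ms.
        self.counts[bisect.bisect_left(self.bounds, latency_ms)] += 1

    def merge(self, other):
        # Histograms with identical bounds merge by adding counts, which is
        # what lets collectors pre-aggregate before shipping to a central store.
        self.counts = [a + b for a, b in zip(self.counts, other.counts)]

    def quantile(self, q):
        """Upper bound of the bucket holding the q-th quantile (None if empty)."""
        total = sum(self.counts)
        if total == 0:
            return None
        rank = q * total
        seen = 0
        for i, count in enumerate(self.counts):
            seen += count
            if seen >= rank:
                return self.bounds[i] if i < len(self.bounds) else float("inf")
        return float("inf")


def breaches_slo(p99_per_minute, slo_ms=200.0, consecutive=2):
    """True if each of the last `consecutive` one-minute p99 values exceeds the SLO."""
    recent = p99_per_minute[-consecutive:]
    return (len(recent) == consecutive
            and all(p is not None and p > slo_ms for p in recent))
```

The estimate is only as precise as the bucket boundaries: a reported p99 is the upper bound of the bucket it falls in, so boundaries should be dense around the SLO threshold. A real quantile sketch trades this fixed layout for bounded relative error, at the cost of a library dependency.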

Related Interview Questions

Design a CDN Edge Caching Strategy

Design a System for Monitoring Service Health

Design a Payment Processing System