Design a Public Transit Monitoring System
Design a real-time system that tracks the location of buses/trains and provides accurate ETA predictions to users. Focus on sensor ingestion and prediction model deployment.
Why Interviewers Ask This
Interviewers at Uber ask this to evaluate your ability to architect real-time systems under latency constraints. They specifically assess how you handle high-velocity sensor ingestion, manage data consistency for location tracking, and design scalable prediction models that account for dynamic traffic variables while maintaining system reliability.
How to Answer This Question
1. Clarify requirements: define scale (vehicles per hour), latency targets (sub-second updates), and core features such as ETA accuracy versus historical routing.
2. High-level architecture: propose a microservices approach that separates ingestion from computation, using Kafka or Pulsar for event streaming.
3. Data ingestion strategy: detail how GPS sensors publish coordinates via MQTT or gRPC to an edge gateway before they enter the stream processor.
4. Prediction engine design: explain how a sliding window of historical traffic data, combined with real-time congestion feeds, is fed into a machine learning model deployed on Kubernetes for auto-scaling.
5. Storage and retrieval: describe storing raw telemetry in a horizontally scalable store such as Cassandra and caching live ETAs in Redis to keep user-facing responses low-latency.
6. Edge cases: address network failures with local buffering on devices and fallback logic for stale data.
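To make step 3 concrete, here is a minimal sketch of what the edge gateway's normalization step might look like. The field names (`vehicle_id`, `lat`, `lon`, `ts`) are an illustrative schema, not part of any real API; the point is validating and canonicalizing pings before they reach Kafka.

```python
import time

def normalize_ping(raw: dict) -> dict:
    """Validate and normalize a raw GPS ping before it enters the stream.

    Field names are illustrative, not a real transit schema.
    """
    lat, lon = float(raw["lat"]), float(raw["lon"])
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")
    return {
        "vehicle_id": str(raw["vehicle_id"]),
        "lat": round(lat, 6),  # ~10 cm precision, ample for transit vehicles
        "lon": round(lon, 6),
        "ts": int(raw.get("ts", time.time())),  # device clock, else gateway time
    }
```

Rejecting malformed pings at the gateway keeps garbage out of the downstream stream processor, which is cheaper than filtering inside every consumer.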
Key Points to Cover
- Demonstrating knowledge of streaming technologies like Kafka or Flink for real-time data processing
- Addressing the specific challenge of latency between sensor data and user-facing ETA display
- Proposing a hybrid storage solution combining time-series databases with in-memory caches
- Explaining how the prediction model integrates both historical trends and live traffic conditions
- Designing for fault tolerance through device-side buffering and graceful degradation strategies
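The caching and graceful-degradation points above can be sketched together. Below is a toy in-memory stand-in for the Redis ETA cache (the TTL value and key shape are assumptions): within the TTL a read is fresh; after it expires, the last known ETA is still served, flagged as stale, rather than returning nothing.

```python
import time

class EtaCache:
    """In-memory stand-in for the Redis ETA cache, with stale-read fallback.

    get() returns (eta_seconds, fresh): entries within the TTL are fresh;
    expired entries are still served, marked stale, for graceful degradation.
    """
    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self._store = {}  # (route_id, vehicle_id) -> (eta_seconds, written_at)

    def put(self, route_id, vehicle_id, eta_seconds, now=None):
        self._store[(route_id, vehicle_id)] = (eta_seconds, now or time.time())

    def get(self, route_id, vehicle_id, now=None):
        entry = self._store.get((route_id, vehicle_id))
        if entry is None:
            return None, False
        eta, written = entry
        fresh = (now or time.time()) - written <= self.ttl
        return eta, fresh
```

In production Redis would handle expiry itself (e.g. via key TTLs); the stale-read behavior shown here is a deliberate design choice, since a slightly outdated ETA is usually more useful to a rider than an error.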
Sample Answer
To design this system, I would start by defining non-functional requirements: sub-second latency for location updates and 95% accuracy for ETAs within a 10-minute window. The architecture relies on an event-driven pipeline. First, vehicles publish GPS pings via lightweight MQTT to an edge gateway, which normalizes the data and forwards it to Apache Kafka. This decouples ingestion from processing and absorbs spikes during rush hour.

Next, a Flink stream processor aggregates these events, calculating velocity and detecting anomalies like sudden stops. For predictions, we feed this real-time context alongside historical traffic patterns into a pre-trained XGBoost model hosted in a containerized environment. The model predicts travel time based on current speed, road closures, and weather.

We store raw trajectories in Cassandra for auditing but cache the final ETA in Redis, indexed by route ID and vehicle ID, so the user app retrieves data in milliseconds. To handle connectivity loss, devices buffer data locally and sync once reconnected. Finally, we close the loop by comparing predicted against actual arrival times and retraining the model weekly to improve future accuracy.
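The windowed velocity aggregation in the answer above can be sketched as a small pure-Python stand-in for the Flink job (the window size is an assumption). Note the naive distance-over-average-speed ETA here is only the baseline feature; as the "Common Mistakes" section warns, the deployed system feeds these features into the ML model rather than serving this average directly.

```python
from collections import deque

class SpeedWindow:
    """Sliding window over recent speed samples; a toy stand-in for the
    streaming aggregation, not the production Flink operator."""
    def __init__(self, size: int = 5):
        self.samples = deque(maxlen=size)  # oldest sample evicted automatically

    def add(self, speed_mps: float):
        self.samples.append(speed_mps)

    def eta_seconds(self, remaining_meters: float) -> float:
        """Baseline ETA: remaining distance over windowed average speed.
        The real system would pass this as one feature into the ML model."""
        if not self.samples:
            raise ValueError("no speed samples yet")
        avg = sum(self.samples) / len(self.samples)
        if avg <= 0:
            return float("inf")  # vehicle stopped: no finite estimate
        return remaining_meters / avg
```

The `inf` result for a stalled vehicle is exactly the kind of case where the fallback logic should serve the last cached ETA instead of a raw model output.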
Common Mistakes to Avoid
- Focusing solely on database schema without explaining the real-time data flow and ingestion pipeline
- Ignoring the need for a caching layer, leading to unrealistic latency expectations for end users
- Treating the prediction problem as a simple average calculation rather than a complex ML deployment scenario
- Overlooking edge cases such as poor network connectivity or GPS drift in urban canyons
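The connectivity mistake above is worth showing in code. A minimal sketch of device-side buffering, under the assumption that a bounded ring buffer is acceptable (oldest pings are dropped during prolonged outages so the device never exhausts memory; the capacity is illustrative):

```python
from collections import deque

class PingBuffer:
    """Device-side ring buffer: hold pings while offline, flush on reconnect."""
    def __init__(self, capacity: int = 1000):
        self.pending = deque(maxlen=capacity)  # oldest dropped when full

    def record(self, ping: dict):
        self.pending.append(ping)

    def flush(self, send) -> int:
        """Replay buffered pings through `send` (e.g. an MQTT publish call);
        returns how many were delivered. Stops at the first network failure
        so the remainder is retried on the next reconnect."""
        delivered = 0
        while self.pending:
            try:
                send(self.pending[0])
            except OSError:
                break
            self.pending.popleft()  # drop only after a successful send
            delivered += 1
        return delivered
```

Removing a ping only after `send` succeeds gives at-least-once delivery, so the server side should deduplicate by vehicle ID and timestamp.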