Design an IoT Sensor Data Ingestion Pipeline

System Design
Hard
Tesla

Design a system to ingest high-volume, low-latency sensor data from millions of devices. Focus on edge computing vs. cloud processing and handling data loss.

Why Interviewers Ask This

Interviewers at Tesla ask this to evaluate your ability to architect systems balancing extreme scale with strict latency requirements. They specifically test your judgment on edge versus cloud trade-offs, as Tesla vehicles must operate safely even when disconnected from the network. The question assesses your capacity to design for data loss resilience and your understanding of real-time constraints in safety-critical environments.

How to Answer This Question

1. Clarify Requirements: Immediately define scale (millions of devices), latency targets (sub-second for safety vs. minutes for analytics), and reliability needs.
2. Propose an Edge-First Architecture: Argue that raw high-frequency sensor data must be processed locally on the vehicle or gateway to reduce bandwidth and ensure immediate response, sending only aggregated insights or critical alerts to the cloud.
3. Select Streaming Technologies: Recommend Kafka or Pulsar for the ingestion layer to handle massive throughput and decouple producers from consumers.
4. Address Data Loss: Detail a strategy using idempotent writes, local buffering with retries, and exactly-once semantics to prevent data gaps during network outages.
5. Discuss Scalability: Explain how auto-scaling groups and partitioning strategies will manage traffic spikes without increasing latency.
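The edge-first step can be made concrete with a small sketch. This is a minimal, hypothetical Python example (the function name and thresholds are illustrative, not a real Tesla API): the device aggregates a window of high-frequency samples into one compact summary and surfaces only threshold breaches, so the cloud receives kilobytes instead of the raw stream.

```python
from statistics import mean

# Edge-side filter (sketch): collapse a window of raw readings into one
# summary record and flag only the readings that breach a safety limit.
def summarize_window(readings, alert_threshold):
    """readings: floats sampled at high frequency on the device."""
    summary = {
        "count": len(readings),
        "mean": mean(readings),
        "min": min(readings),
        "max": max(readings),
    }
    # Anomalies are forwarded immediately; normal samples stay on-device.
    alerts = [r for r in readings if r > alert_threshold]
    return summary, alerts

# 1000 raw voltage samples collapse into one summary; one anomaly goes upstream.
summary, alerts = summarize_window([3.7] * 999 + [4.9], alert_threshold=4.5)
```

In an interview, the point to emphasize is the ratio: one summary record per window replaces thousands of raw samples, while alerts bypass batching entirely.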

Key Points to Cover

  • Prioritizing edge computing for low-latency safety decisions
  • Using message brokers like Kafka for decoupling and scaling
  • Implementing local buffering and retry logic to prevent data loss
  • Defining clear metrics for acceptable latency and throughput
  • Designing for eventual consistency rather than strict real-time sync everywhere
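The buffering-and-retry point above can be sketched as a store-and-forward outbox. This is an illustrative Python sketch using SQLite as the persistent store (the class and method names are hypothetical): each reading gets a monotonic sequence number and is deleted only after the cloud acknowledges it, so a crash or network outage cannot silently lose data.

```python
import sqlite3

# Edge-side store-and-forward buffer (sketch): rows persist across restarts
# and are removed only after a successful, acknowledged send.
class LocalBuffer:
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox "
            "(seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
        )

    def enqueue(self, payload):
        self.db.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))
        self.db.commit()

    def flush(self, send):
        """Send buffered rows in order; stop on first failure and retry later.
        `send(seq, payload)` returns False while the network is down."""
        rows = self.db.execute(
            "SELECT seq, payload FROM outbox ORDER BY seq"
        ).fetchall()
        for seq, payload in rows:
            if not send(seq, payload):
                break  # connection dropped: remaining rows stay buffered
            self.db.execute("DELETE FROM outbox WHERE seq = ?", (seq,))
            self.db.commit()

    def pending(self):
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

The sequence numbers double as the deduplication key on the cloud side: if an acknowledgement is lost and a row is resent, the consumer can recognize and drop the duplicate.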

Sample Answer

To design an IoT pipeline for millions of sensors, I would prioritize an edge-first architecture. Given Tesla's focus on real-time vehicle safety, raw telemetry like battery voltage or motor speed must be ingested and analyzed locally on the vehicle's compute unit. This reduces upstream bandwidth by filtering noise and ensures immediate action if anomalies are detected, regardless of connectivity.

For data transmission, I would use MQTT over TLS to push only essential events or compressed aggregates to a cloud ingestion layer. On the cloud side, Apache Kafka is ideal here due to its high throughput and durability; we can partition topics by vehicle ID to maintain ordering.

To handle data loss, which is inevitable in mobile networks, the edge device should implement a persistent local queue. If the connection drops, data accumulates locally and transmits once reconnected, with sequence numbers ensuring no duplicates or gaps via idempotent processing. We would also implement a dead-letter queue for unprocessable messages to prevent pipeline blockage.

Finally, for long-term storage, time-series databases like InfluxDB or TimescaleDB would store the historical data for model training, while real-time dashboards consume from Kafka for live monitoring.
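The sequence-number and dead-letter-queue ideas in the answer can be sketched together. This is a simplified Python model of the cloud-side consumer (function and field names are hypothetical, not a real broker API): duplicates from edge retries are dropped by comparing against the highest sequence number applied per vehicle, and messages that fail processing are parked in a dead-letter list instead of blocking the stream.

```python
# Cloud-side idempotent consumer (sketch): per-vehicle sequence numbers make
# redelivery after an edge retry harmless, and unprocessable messages are
# routed to a dead-letter queue so one bad payload cannot stall ingestion.
def consume(messages, process, last_seq=None, dead_letter=None):
    last_seq = {} if last_seq is None else last_seq        # vehicle_id -> highest seq applied
    dead_letter = [] if dead_letter is None else dead_letter
    for msg in messages:
        vid, seq = msg["vehicle_id"], msg["seq"]
        if seq <= last_seq.get(vid, 0):
            continue                                       # duplicate: already applied
        try:
            process(msg)
            last_seq[vid] = seq                            # advance only after success
        except Exception:
            dead_letter.append(msg)                        # park for offline inspection
    return last_seq, dead_letter
```

A real deployment would persist `last_seq` transactionally with the processed data (the idempotent-write part), but the control flow is the same.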

Common Mistakes to Avoid

  • Suggesting direct database writes from millions of devices without a buffer layer
  • Ignoring the reality of intermittent connectivity in automotive scenarios
  • Focusing solely on cloud processing without leveraging edge capabilities
  • Overlooking the need for data deduplication and ordering guarantees
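The ordering-guarantee point above usually comes down to keyed partitioning. As an illustrative sketch (the function is hypothetical; real brokers like Kafka apply the same idea via the message key), hashing the vehicle ID to choose a partition keeps every message from one vehicle in a single ordered stream while spreading different vehicles across partitions for parallelism.

```python
import hashlib

# Keyed partitioning (sketch): a stable hash of the vehicle ID maps all of
# that vehicle's messages to one partition, preserving per-vehicle order.
def partition_for(vehicle_id: str, num_partitions: int) -> int:
    digest = hashlib.md5(vehicle_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

The trade-off to mention: ordering holds only within a partition, so total cross-vehicle ordering is sacrificed for throughput, which is acceptable because telemetry streams from different vehicles are independent.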
