Design a Time Series Database (TSDB)

System Design
Hard
Tesla

Design a database optimized for storing and querying time-series data (e.g., sensor readings, stock prices). Discuss compression and indexing strategies.

Why Interviewers Ask This

Tesla evaluates this question to assess your ability to architect systems for high-velocity IoT data from vehicles. They specifically test your understanding of write-heavy workloads, efficient compression algorithms such as Delta-of-Delta or Gorilla, and time-based indexing strategies that enable rapid aggregation while keeping storage costs under control.

How to Answer This Question

1. Clarify requirements: Define write throughput (millions of events per second across the fleet), retention policies, and query patterns such as range scans or aggregations over specific time windows.
2. Propose a schema: Suggest a columnar storage format optimized for time-series, separating metadata from metrics to maximize compression ratios.
3. Detail ingestion: Describe a write-ahead log followed by a memory buffer (memtable) that flushes to immutable disk segments to handle burst traffic.
4. Explain compression: Discuss encoding techniques such as run-length encoding for constant values and bit-packing for sensor IDs to reduce Tesla's massive fleet storage costs.
5. Address querying: Outline an inverted index on tags (e.g., VIN, sensor type) combined with sorted timestamp indexes to accelerate point-in-time lookups.
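The Delta-of-Delta idea from step 4 is easy to demonstrate: for a fixed sample rate, the difference between successive timestamp deltas is almost always zero, which bit-packs into very few bits. A minimal Python sketch (function names are illustrative, not from any particular library; real encoders emit variable-width bit streams rather than Python lists):

```python
def delta_of_delta_encode(timestamps):
    """Encode sorted timestamps as: first value, then deltas-of-deltas.
    Regular sampling makes most encoded values 0, which compresses well."""
    if not timestamps:
        return []
    out = [timestamps[0]]
    prev, prev_delta = timestamps[0], 0
    for ts in timestamps[1:]:
        delta = ts - prev
        out.append(delta - prev_delta)  # ~0 when the sample rate is steady
        prev, prev_delta = ts, delta
    return out

def delta_of_delta_decode(encoded):
    """Reverse the encoding by accumulating deltas."""
    if not encoded:
        return []
    ts, delta = [encoded[0]], 0
    for dod in encoded[1:]:
        delta += dod
        ts.append(ts[-1] + delta)
    return ts
```

For a sensor sampling every 10 ms with one jittered reading, `[1000, 1010, 1020, 1031]` encodes to `[1000, 10, 0, 1]`: mostly zeros and tiny integers, exactly what a bit packer exploits.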

Key Points to Cover

  • Explicitly mention compression algorithms like Delta-of-Delta or Gorilla relevant to sensor data
  • Propose a columnar storage architecture rather than row-based SQL tables
  • Address the write-heavy nature of IoT telemetry with memtables and immutable segments
  • Explain how to balance latency for real-time monitoring versus cost for long-term storage
  • Design a partitioning strategy based on unique identifiers like VINs for data isolation
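The memtable-plus-immutable-segments point above can be sketched in a few lines. This is a toy model only (no write-ahead log, no on-disk SSTable format, and a deliberately tiny flush threshold); the class and method names are invented for illustration:

```python
import bisect

class Memtable:
    """In-memory buffer of (timestamp, value) points, kept sorted,
    flushed as an immutable segment once it reaches a size threshold."""

    def __init__(self, max_points=4):
        self.points = []    # sorted in-memory (timestamp, value) pairs
        self.max_points = max_points
        self.segments = []  # flushed, immutable, time-sorted runs

    def write(self, ts, value):
        bisect.insort(self.points, (ts, value))
        if len(self.points) >= self.max_points:
            # Flush: freeze the sorted run; real systems write it to disk.
            self.segments.append(tuple(self.points))
            self.points = []

    def range_query(self, start, end):
        """Merge matching points from frozen segments and the live buffer."""
        hits = [p for seg in self.segments for p in seg if start <= p[0] <= end]
        hits += [p for p in self.points if start <= p[0] <= end]
        return sorted(hits)
```

Because segments are immutable and internally sorted, flushes never block readers, and background compaction can merge small segments without coordinating with the write path.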

Sample Answer

To design a TSDB for Tesla's fleet, I would prioritize write throughput and storage efficiency given the volume of telemetry from millions of vehicles. First, I'd define the schema using a column-oriented store where each column represents a metric like battery voltage or motor RPM. This allows us to apply highly effective compression algorithms independently per column. For ingestion, data would flow into a high-speed in-memory structure before being flushed to disk as immutable SSTables. This ensures low-latency writes even during peak data bursts.

Crucially, I would implement specialized compression: Delta-of-Delta encoding for timestamps, since they are sequential, and Gorilla XOR compression for floating-point sensor readings, which typically exhibit small changes between samples. This could reduce storage needs by up to 90% compared to raw text.

Regarding indexing, a global partition key based on Vehicle ID (VIN) is essential for isolation. Within partitions, we maintain a sorted index on timestamps. For queries requiring filtering across multiple cars, I'd layer a secondary inverted index on tags like 'model' or 'region'. This hybrid approach supports both fast single-vehicle diagnostics and broad fleet-level analytics required for OTA updates and safety monitoring.
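The intuition behind Gorilla-style XOR compression is that successive readings from the same sensor have nearly identical IEEE-754 bit patterns, so XORing them yields words dominated by zero bits that a bit packer can store compactly. A minimal sketch of just the XOR step (a full Gorilla encoder also tracks leading/trailing-zero windows and emits a variable-width bit stream, which is omitted here):

```python
import struct

def xor_deltas(values):
    """XOR each float's 64-bit IEEE-754 pattern with its predecessor's.
    Slowly changing sensor readings produce XOR results with long runs
    of zero bits, which a Gorilla-style packer encodes in a few bits."""
    bits = [struct.unpack(">Q", struct.pack(">d", v))[0] for v in values]
    return [b ^ prev for prev, b in zip(bits, bits[1:])]
```

A constant signal (a parked car's cabin temperature, say) XORs to all zeros, which Gorilla encodes as a single bit per sample; this is why repeated values are effectively free in such formats.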

Common Mistakes to Avoid

  • Focusing solely on relational database features like ACID transactions instead of write optimization
  • Ignoring the massive scale of data ingestion expected from a fleet of autonomous vehicles
  • Suggesting generic compression methods like ZIP instead of domain-specific time-series encodings
  • Overlooking the need for automatic data expiration or tiered storage for old telemetry data
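On the last point, the standard remedy is to partition data by time so that retention becomes dropping whole partitions rather than deleting individual rows. A hedged sketch of that idea (the function and its dict-of-partitions shape are invented for illustration; real systems drop on-disk segment files or move them to cold object storage):

```python
from datetime import datetime, timedelta, timezone

def expire_partitions(partitions, retention_days=90):
    """Drop time-based partitions older than the retention window.
    partitions maps a partition's start datetime (UTC) to its data;
    deleting a whole partition is O(1) per partition, versus costly
    row-by-row deletes scattered across the storage engine."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=retention_days)
    return {start: data for start, data in partitions.items() if start >= cutoff}
```

The same partition boundary also serves tiered storage: partitions past a first threshold move to cheap cold storage, and past a second threshold they are expired entirely.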
