Design a Compressed String Iterator

Data Structures
Medium
Stripe
47.7K views

Design an iterator that supports the `next` and `hasNext` operations for a run-length encoded string (e.g., 'L1e2t1c1o1d1e1').

Why Interviewers Ask This

Stripe asks this to evaluate a candidate's ability to handle stateful iteration over compressed data efficiently without loading the entire string into memory. It tests understanding of pointers, loop invariants, and edge cases like multi-digit counts or empty inputs. The problem specifically probes whether you can design an O(1) amortized solution that balances time complexity with space constraints, a critical skill for high-throughput payment systems.

How to Answer This Question

Start by clarifying requirements: confirm if counts are always single digits or if they can be multi-digit integers. Propose a two-pointer approach where one pointer tracks the current character and another tracks the remaining count. Step 1: Parse the input string once during initialization to build a list of (char, count) pairs or maintain indices. Step 2: Implement hasNext by checking if the current index is within bounds and the current count is greater than zero. Step 3: For next, decrement the current count; if it hits zero, advance the character pointer to the next pair. Step 4: Handle edge cases immediately, such as empty strings or invalid formats. Step 5: Discuss time and space complexity, emphasizing that while parsing takes O(N), each next/hasNext call is O(1). This structured breakdown shows systematic thinking valued at Stripe.

Key Points to Cover

  • Demonstrate O(1) time complexity for next and hasNext operations
  • Show clear handling of multi-digit numbers in the run-length encoding
  • Explain how you manage state without storing the full decompressed string
  • Address edge cases like empty strings or zero counts explicitly
  • Articulate the trade-off between initial parsing cost and runtime efficiency

Sample Answer

To solve the Compressed String Iterator, I first clarify that the input format uses run-length encoding where numbers can be multi-digit. My approach involves maintaining a pointer to the current character and a counter for its remaining occurrences. During initialization, I will parse the string to locate these segments. However, to optimize for memory, I won't pre-parse everything into a new array unless necessary; instead, I'll use two indices on the original string: one for the character and one for the number. For hasNext, I check if the character index is valid and if the current count is positive. If the count reaches zero, I advance the pointer to find the next character and reset the count. For next, I simply return the current character and decrement the count. If the count drops to zero, I trigger the logic to move to the next segment before returning. This ensures O(1) time per operation after an initial O(N) scan. I'd also handle edge cases like 'a0' or trailing characters gracefully. This design aligns well with Stripe's focus on efficient, scalable data processing where minimizing unnecessary allocations is key.

Common Mistakes to Avoid

  • Decompressing the entire string upfront, which violates space constraints
  • Failing to handle multi-digit numbers, assuming all counts are single digits
  • Not updating the character pointer correctly when the count reaches zero
  • Ignoring edge cases like an empty input string or malformed sequences

Practice This Question with AI

Answer this question orally or via text and get instant AI-powered feedback on your response quality, structure, and delivery.

Start Practicing

Related Interview Questions

Browse all 154 Data Structures questionsBrowse all 57 Stripe questions