Design a Data Stream Median Finder

Question

Accepted Answer

To solve the Data Stream Median problem, I propose using two priority queues: a max-heap that stores the smaller half of the numbers and a min-heap that stores the larger half. This structure gives constant-time access to the median while keeping insertions logarithmic.

First, I would initialize both heaps as empty. When a new number arrives, I compare it to the top of the max-heap. If the max-heap is empty or the number is less than or equal to its top, I push it onto the max-heap; otherwise, I push it onto the min-heap.

The critical step is rebalancing. After every insertion, I check the sizes of the heaps. If they differ by more than one, I move the root of the larger heap to the smaller one. For example, after adding 1, 3, and 2 in that order, the max-heap holds [1] and the min-heap holds [2, 3]. The sizes differ by only one, so no move is needed, and the median is the top of the larger heap, 2. If 4 arrives next, it goes to the min-heap, leaving [1] and [2, 3, 4]. The sizes now differ by two, so I move 2 to the max-heap, giving [1, 2] and [3, 4], and the median is the average of the two tops, (2 + 3) / 2 = 2.5. This invariant ensures that the median is always either the top of the larger heap or, when the heaps are the same size, the average of both tops.

In terms of complexity, each add operation involves at most a constant number of heap pushes and pops, so it runs in O(log N) time. Finding the median only requires reading the roots, which is O(1). This solution handles the streaming nature of the data without re-sorting the entire collection on every query.
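The two-heap approach described in this answer could be sketched in Python roughly as follows. This is a minimal illustration and not a reference implementation: the class name `MedianFinder` and the method names `add_num` and `find_median` are assumptions chosen for the example. Python's `heapq` module only provides a min-heap, so the max-heap is simulated here by storing negated values.

```python
import heapq


class MedianFinder:
    """Running median over a stream, kept in two heaps (illustrative sketch)."""

    def __init__(self):
        self._low = []   # max-heap of the smaller half (values stored negated)
        self._high = []  # min-heap of the larger half

    def add_num(self, num):
        # Route the value: the smaller half takes anything <= its current maximum.
        if not self._low or num <= -self._low[0]:
            heapq.heappush(self._low, -num)
        else:
            heapq.heappush(self._high, num)

        # Rebalance only when the sizes differ by more than one.
        if len(self._low) > len(self._high) + 1:
            heapq.heappush(self._high, -heapq.heappop(self._low))
        elif len(self._high) > len(self._low) + 1:
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def find_median(self):
        if not self._low and not self._high:
            raise ValueError("no numbers added yet")
        # Odd count: the median is the root of the larger heap.
        if len(self._low) > len(self._high):
            return float(-self._low[0])
        if len(self._high) > len(self._low):
            return float(self._high[0])
        # Even count: average the two roots.
        return (-self._low[0] + self._high[0]) / 2


finder = MedianFinder()
for value in (1, 3, 2):
    finder.add_num(value)
print(finder.find_median())  # 2.0

finder.add_num(4)
print(finder.find_median())  # 2.5
```

Raising `ValueError` on an empty stream is one possible choice; returning `None` would work equally well, depending on what the interviewer specifies.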

Related Interview Questions

Convert Binary Tree to Doubly Linked List in Place

How do you implement a queue using two stacks?

Design a Set with $O(1)$ `insert`, `remove`, and `check`