Design a Hash Set from Scratch

Data Structures
Easy
Uber

Implement a HashSet data structure using an array of buckets (lists or arrays) and a custom hash function. Implement `add`, `contains`, and `remove` methods.

Why Interviewers Ask This

Uber interviewers ask this to verify your foundational grasp of memory management and collision handling. They specifically evaluate whether you understand how to map arbitrary keys to fixed array indices, handle edge cases like hash collisions, and manage dynamic resizing without relying on built-in libraries.

How to Answer This Question

1. Clarify requirements: Ask about expected load factors, the key data types, and whether the structure must be thread-safe or support concurrent access.
2. Define core components: Propose a fixed-size array where each index holds a linked list or dynamic array to store colliding elements.
3. Design the hash function: Explain your choice of the modulo operator (key % capacity) and discuss potential issues with negative numbers or poor distribution.
4. Implement the logic: Walk through `add` by checking existence first, `contains` by traversing the bucket, and `remove` by finding and unlinking the node.
5. Discuss optimization: Mention rehashing when the load factor exceeds a threshold (e.g., 0.75) to maintain O(1) average performance, a critical detail for high-traffic systems like Uber's.
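The core components described above can be sketched in Python using separate chaining, with plain lists as buckets. This is a minimal illustration, not a production-ready structure; class and method names are illustrative, and resizing is deferred to the optimization step:

```python
class HashSet:
    """Minimal hash set using separate chaining (lists as buckets)."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.size = 0
        self.buckets = [[] for _ in range(capacity)]

    def _hash(self, key):
        # abs() guards against negative keys producing a negative index
        return abs(key) % self.capacity

    def add(self, key):
        bucket = self.buckets[self._hash(key)]
        if key not in bucket:  # check existence first to prevent duplicates
            bucket.append(key)
            self.size += 1

    def contains(self, key):
        # Traverse only the one bucket the key hashes to
        return key in self.buckets[self._hash(key)]

    def remove(self, key):
        bucket = self.buckets[self._hash(key)]
        if key in bucket:
            bucket.remove(key)  # unlink the element from its bucket
            self.size -= 1
```

With a reasonable capacity and an even key distribution, each bucket stays short, so all three operations average O(1).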

Key Points to Cover

  • Explicitly handling hash collisions using chaining or open addressing
  • Implementing a custom hash function that accounts for negative integers
  • Defining and managing a load factor to trigger dynamic resizing
  • Ensuring O(1) average time complexity for all operations
  • Discussing edge cases like empty buckets or duplicate insertions
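The negative-integer point deserves a concrete check, because the behavior of `%` is language-dependent. The snippet below contrasts Python's modulo (already non-negative for a positive modulus) with the Java/C convention, where a negative remainder would be an out-of-bounds index; the `abs` fix shown is the portable habit to demonstrate in an interview:

```python
capacity = 16
key = -7

# Python's % returns a non-negative result for a positive modulus:
print(-7 % 16)             # 9

# In Java or C, -7 % 16 evaluates to -7, an invalid array index,
# so a common portable fix is to take the absolute value first:
index = abs(key) % capacity
print(index)               # 7
```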

Sample Answer

To design a robust HashSet from scratch, I would start by defining a class that initializes an internal array of buckets. Each bucket is a dynamic list so collisions can be handled gracefully. For the hash function, I'd use a standard approach: compute the index as the key modulo the current array size. This spreads keys across the available slots, though distribution ultimately depends on the keys themselves. I must also handle negative hash codes by taking the absolute value before applying the modulo operator, so the index stays in bounds.

When implementing the add method, I first compute the index using the hash function. If the element already exists in that bucket, I return immediately to prevent duplicates; otherwise, I append it to the bucket list. The contains method follows a similar path: calculate the index, traverse the bucket, and check for equality. For removal, I locate the element in the correct bucket and delete it, ensuring I don't leave gaps that affect future lookups.

Crucially, I would implement a resize strategy. If the number of elements divided by the array size exceeds a load factor of 0.75, I double the array size and rehash all existing elements into the new buckets. This prevents performance degradation from excessive collisions, which is vital for maintaining low latency in real-time applications like ride-matching systems at Uber.
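The resize strategy in this answer can be sketched as follows. This is an illustrative standalone class (names like `ResizableHashSet` and `_resize` are hypothetical); the key detail is that every element must be rehashed after resizing, because bucket indices depend on the capacity:

```python
class ResizableHashSet:
    """Chaining hash set that doubles its capacity at a 0.75 load factor."""

    LOAD_FACTOR = 0.75

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.size = 0
        self.buckets = [[] for _ in range(capacity)]

    def _index(self, key):
        return abs(key) % self.capacity

    def _resize(self):
        # Indices depend on capacity, so every key must be rehashed
        # against the new, doubled table size.
        old_buckets = self.buckets
        self.capacity *= 2
        self.buckets = [[] for _ in range(self.capacity)]
        for bucket in old_buckets:
            for key in bucket:
                self.buckets[self._index(key)].append(key)

    def add(self, key):
        if self.size / self.capacity >= self.LOAD_FACTOR:
            self._resize()
        bucket = self.buckets[self._index(key)]
        if key not in bucket:
            bucket.append(key)
            self.size += 1
```

Checking the load factor before each insertion keeps the average bucket length bounded, which is what preserves O(1) average-case performance as the set grows.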

Common Mistakes to Avoid

  • Relying on built-in language libraries instead of implementing the underlying logic manually
  • Failing to handle negative hash codes, leading to out-of-bounds array errors
  • Ignoring the need for rehashing when the table becomes too full
  • Not explaining how collisions are resolved, resulting in O(n) worst-case scenarios

