Minimum Genetic Mutation

Algorithms
Medium
Google
94.3K views

Given a start gene, an end gene, and a bank of valid genes, find the minimum number of mutations needed to change the start gene to the end gene, where each mutation changes only one character and results in a valid gene. Use BFS.

Why Interviewers Ask This

Google asks this question to evaluate a candidate's ability to model real-world biological processes as graph traversal problems. They specifically test proficiency in Breadth-First Search for finding shortest paths in unweighted graphs, attention to detail regarding edge constraints like single-character mutations, and the capacity to translate abstract logic into efficient code under pressure.

How to Answer This Question

1. Clarify the problem by confirming that genes are strings of equal length and mutations must exist in the provided bank. 2. Visualize the scenario as an unweighted graph where each gene is a node and edges connect valid single-mutation pairs. 3. Propose a Breadth-First Search (BFS) strategy, explaining why it guarantees the minimum number of steps compared to DFS or recursion. 4. Outline your data structures: a queue for BFS traversal and a set for O(1) lookups of valid genes to prevent revisiting nodes. 5. Walk through a dry run with a small example, explicitly handling edge cases like unreachable end genes or empty banks before discussing time and space complexity.

Key Points to Cover

  • Explicitly identifying the problem as a shortest path search requiring BFS
  • Demonstrating understanding of graph modeling where nodes are genes and edges are valid mutations
  • Using a Set or Hash Map for O(1) lookup efficiency during neighbor generation
  • Implementing a visited mechanism to prevent infinite loops and redundant processing
  • Correctly handling edge cases such as unreachable targets or invalid inputs

Sample Answer

To solve the Minimum Genetic Mutation problem efficiently, I would treat this as a shortest path problem on an unweighted graph. Since we need the minimum number of mutations, Breadth-First Search is the optimal approach because it explores all neighbors at the current depth before moving deeper, ensuring the first time we reach the target gene, it is via the shortest path. My implementation would start by initializing a queue with the starting gene and a mutation count of zero. I would also convert the input bank into a hash set for constant-time validation. In the main loop, I dequeue the current gene, generate all possible next states by changing each character to 'A', 'C', 'G', or 'T', and check if these new strings exist in our bank set. If a generated string matches the end gene, I immediately return the current step count plus one. If it exists in the bank but isn't the target, I add it to the queue and remove it from the set to mark it as visited, preventing cycles. This ensures we explore the search space level by level. If the queue empties without finding the end gene, I return -1. This approach yields a time complexity of O(N * M), where N is the gene length and M is the bank size, which is highly performant for Google's scale requirements.

Common Mistakes to Avoid

  • Using Depth First Search (DFS) which finds a path but not necessarily the shortest one
  • Failing to remove visited genes from the bank set, causing the algorithm to revisit nodes infinitely
  • Generating invalid characters instead of strictly limiting changes to A, C, G, and T
  • Not checking if the start gene equals the end gene initially, leading to unnecessary computation

Practice This Question with AI

Answer this question orally or via text and get instant AI-powered feedback on your response quality, structure, and delivery.

Start Practicing

Related Interview Questions

Browse all 145 Algorithms questionsBrowse all 87 Google questions