DEV Community

Cover image for Solving "Minimum Genetic Mutation" leetcode
Maurício Antunes
Maurício Antunes

Posted on

Solving "Minimum Genetic Mutation" leetcode

Hello! Today, we are going to solve Minimum Genetic Mutation on leetcode.

This challenge uses some concepts we've already talked in other posts: BFS (Breadth-First Search), brute force, sets, and finding the minimum using BFS.

The Challenge

Given a gene string that is 8 characters long, containing only the letters A, C, G, and T, find the minimum mutations needed to change the initial gene into a target gene.

For example, changing "AACCGGTT" to "AACCGGTA" is one mutation.

One of the constraints of the exercise is the bank. The bank is a list of valid mutations. To go from the start gene to the end gene and count how many mutations are needed, we must ensure every mutation is valid.

Note that the starting point is assumed to be valid, so it might not be included in the bank.

Our first approach

Seems easy, right? We just need to check the difference in both genes. We don't need this bank.

Let's implement this approach:

def count_mutations(str1, str2):
    mutations = sum(1 for a, b in zip(str1, str2) if a != b)
    return mutations
Enter fullscreen mode Exit fullscreen mode

This code:

  • Iterates through both strings using zip;
  • Returns 1 if a char in startGene differs from endGene;
  • Sum all the ones.

Try this approach and run the tests on leetcode. It passes!

Now, submit the code. It didn't run, right? You will try to cover edge cases until you have no energy and understand that this challenge is not about differences; it is about paths.

Take a look at one of the examples where our simple code breaks:

startGene = "AACCTTGG"
endGene = "AATTCCGG"
bank = ["AATTCCGG","AACCTGGG","AACCCCGG","AACCTACC"]

Output: 4
Expected: -1
Enter fullscreen mode Exit fullscreen mode

It fails because there is no way to change one letter at a time from start gene to end gene while using only valid mutations.

Let's visualize the problem with a simple diagram:

This diagram illustrates the genetic mutation paths required to transform the gene sequence AACCGGTT into AAACGGTA. The paths include one path with a single mutation, two paths with two mutations, and one path with three mutations. Nodes represent gene sequences, and edges represent valid mutations

A better approach

Now we know the bank must be used, let's try a correct approach to solve the problem.

Read the challenge description again. Focus on the highlighted parts like minimum, valid, and starting point.

They all suggest that we need a code that:

  • Ends as soon as we reach the target;
  • Checks if our moves are valid;
  • Has a starting point to iterate from to reach the target.

Bread-First Search (BFS) is excelent for finding the shortest or minimum path needed to go from one position to another.

Implementation

Here is the code we are going to explain in detail:

from collections import deque

class Solution:
    def minMutation(self, startGene: str, endGene: str, bank: List[str]) -> int:
        bank = set(bank)

        queue = deque([(startGene, 0)]) # gene, changes

        if endGene not in bank:
            return -1

        while queue:
            gene, changes = queue.popleft()
            if gene == endGene:
                return changes

            for i in range(len(gene)):
                for c in "ACGT":
                    attempt_gene = gene[:i] + c + gene[i+1:]
                    if attempt_gene in bank:
                        queue.append((attempt_gene, changes + 1))
                        bank.remove(attempt_gene)

        return -1
Enter fullscreen mode Exit fullscreen mode

First, we convert our bank in a set to perform O(1) lookups:

bank = set(bank)
Enter fullscreen mode Exit fullscreen mode

Since we need to check if every mutation is valid, turning the bank into a set is the best approach.

Then, we create our queue. In Python, we can initialize the queue with some values. Our starting point is startGene with a total of mutations that starts from zero.

queue = deque([(startGene, 0)])
Enter fullscreen mode Exit fullscreen mode

Remember I said this challenge is about paths? With this in mind, we know we need to keep track of the paths we visited, or the genes we already checked, so we don't end up in an infinite loop.

And now the most complex part:

while queue:
    gene, changes = queue.popleft()
    if gene == endGene:
        return changes

    for i in range(len(gene)):
        for c in "ACGT":
            attempt_gene = gene[:i] + c + gene[i+1:]
            if attempt_gene in bank:
                queue.append((attempt_gene, changes + 1))
                bank.remove(attempt_gene)
Enter fullscreen mode Exit fullscreen mode

while queue ensures we iterate our queue until it has genes.

Everytime we pop one node from the queue, we check if it is equal to our target, endGene.

gene, changes = queue.popleft()
    if gene == endGene:
        return changes
Enter fullscreen mode Exit fullscreen mode

Then we have two nested for loops. They are necessary because we want to replace one letter at a time and check if the mutation is valid. The first loop iterates through the current string, and the second loop tries to replace the current index with A, C, G, or T.

for i in range(len(gene)):
    for c in "ACGT":
        attempt_gene = gene[:i] + c + gene[i+1:]
        if attempt_gene in bank:
            queue.append((attempt_gene, changes + 1))
            bank.remove(attempt_gene)
Enter fullscreen mode Exit fullscreen mode

If the gene is in the bank we add it to the queue as a valid mutation (or a path).

We also remove it from the bank, so we don't need to check it again to avoid adding it back to the queue, which would cause an infinite loop.

Complexity

Time: it is O(b) where b is the bank size.
It is a little tricky, but the reason is that a gene string has a limit of 8 characters. When measuring time complexity we need to drop the constant, no matter how big they are. In our case, the constants are 8 and 4: genes are 8 characters long and we might have ending up replacing each letter by A, C, G or T. 8 * 4 = 32

Space: it is also O(b) where b is the bank size.
We use a queue that can store, in the worst case, all genes from the bank.
There is also the set we use to keep track of visited genes that can contain up to b genes.

Top comments (2)

Collapse
 
axorax profile image
Axorax

nice!

Collapse
 
cyytrus profile image
Paulo Castro

This content is pure gold! Please, keep going! I recommend solving the "Sudoku solver" problem, I did it a few interviews ago and was very stuck on him. It would be a very nice article!