DEV Community

Banji


Python | Hamming Problem

██████╗░███╗░░██╗░█████╗░
██╔══██╗████╗░██║██╔══██╗
██║░░██║██╔██╗██║███████║
██║░░██║██║╚████║██╔══██║
██████╔╝██║░╚███║██║░░██║
╚═════╝░╚═╝░░╚══╝╚═╝░░╚═╝

Hey, everyone.
In this post I'm going to tell you about the Hamming problem (Simple) and my solution for it.
If you're not a beginner, you might want to skip this tutorial, since it could be boring and not very useful for you!
But if you're a newbie, bear with me, because it was such a cool problem for me.

Problem:
Calculate the Hamming Distance between two DNA strands.

Your body is made up of cells that contain DNA. Those cells regularly wear out and need replacing, which they achieve by dividing into daughter cells. In fact, the average human body experiences about 10 quadrillion cell divisions in a lifetime!

When cells divide, their DNA replicates too. Sometimes during this process mistakes happen and single pieces of DNA get encoded with the incorrect information. If we compare two strands of DNA and count the differences between them we can see how many mistakes occurred. This is known as the "Hamming Distance".

We read DNA using the letters C, A, G and T. Two strands might look like this:

    GAGCCTACTAACGGGAT
    CATCGTAATGACGGCCT
    ^ ^ ^  ^ ^    ^^

They have 7 differences, and therefore the Hamming Distance is 7.
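If you'd like to double-check that count, here's a tiny sketch (not part of my solution) that compares the two example strands index by index and lists the 1-based positions of the ^ marks:

```python
# The two example strands from the problem statement.
strand_a = "GAGCCTACTAACGGGAT"
strand_b = "CATCGTAATGACGGCCT"

# 1-based positions where the letters differ (the ^ marks above).
mismatches = [i + 1 for i in range(len(strand_a)) if strand_a[i] != strand_b[i]]

print(mismatches)       # [1, 3, 5, 8, 10, 15, 16]
print(len(mismatches))  # 7
```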

The Hamming Distance is useful for lots of things in science, not just biology, so it's a nice phrase to be familiar with ❤

So first of all, I defined a function and used an if statement to check whether the two strands have the same length:

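def distance(strand_a, strand_b):

    if len(strand_a) == len(strand_b):
        # Copy each strand into a list of single letters
        first_strand = [letter for letter in strand_a]
        second_strand = [letter for letter in strand_b]
    else:
        # Strands of different lengths can't be compared position by position
        raise ValueError("The lengths of the sequences are not equal")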
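Side note for beginners: the list comprehension above just copies a string into a list of its letters, which the built-in list() does as well. A quick check (the short strand here is made up for illustration):

```python
strand_a = "GAGC"  # a short example strand, made up for illustration

# Both expressions build a new list with one item per character.
via_comprehension = [letter for letter in strand_a]
via_list = list(strand_a)

print(via_comprehension)              # ['G', 'A', 'G', 'C']
print(via_comprehension == via_list)  # True
```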

But I could write this piece of code more simply. You may ask how?
Like this:

def distance(strand_a, strand_b):
    if len(strand_a) != len(strand_b):
        raise ValueError("Length of two sequences must be the same")
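Here's a small sketch of the shortened function on its own, called with two strands of different lengths ("GATTACA" and "GAT" are just made-up inputs). It shows that the ValueError can be caught like any other exception. At this stage the function doesn't count anything yet, so for equal lengths it simply returns None:

```python
def distance(strand_a, strand_b):
    # Guard clause: refuse to compare strands of different lengths.
    if len(strand_a) != len(strand_b):
        raise ValueError("Length of two sequences must be the same")


try:
    distance("GATTACA", "GAT")
except ValueError as error:
    print("Caught:", error)  # Caught: Length of two sequences must be the same

print(distance("GAT", "GAT"))  # None (no counting yet)
```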

As you can see, instead of writing 6-7 lines of code (the first solution), I wrote the second function in just 3 lines! We don't need to copy the strands into lists either, because zip() works on strings directly.

So let's see what we can do for the next part of the code...
We need to pair up the letters at the same position in both strands with the zip() function,
like this:


diff = zip(strand_a, strand_b)

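To see what zip() actually hands us, here's a short sketch on a few made-up strings. It also shows two details worth knowing: zip() silently stops at the end of the shorter input, and it returns a one-shot iterator that is empty after you've looped over it once:

```python
# zip() pairs up the letters at the same position in both strings.
pairs = list(zip("GAG", "CAT"))
print(pairs)  # [('G', 'C'), ('A', 'A'), ('G', 'T')]

# With different lengths, zip() silently stops at the shorter input,
# which is why the length check comes first.
truncated = list(zip("GATTACA", "GAT"))
print(truncated)  # [('G', 'G'), ('A', 'A'), ('T', 'T')]

# zip() returns an iterator: once consumed, it has nothing left.
diff = zip("GAG", "CAT")
first_pass = list(diff)
second_pass = list(diff)
print(len(first_pass), len(second_pass))  # 3 0
```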

After that, I created an empty list with two purposes:

  • to collect the differences in a list
  • to use the len() function to get the number of differences

count = []


With a for loop, we go through the pairs (tuples) one by one and check whether the two letters in each pair are the same. Whenever they differ, we append the letter from the first strand to count. At the end, len(count) is the number of differences, so that's what we return!

Like this:

for x, y in diff:
    if x != y:
        count.append(x)
return len(count)

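To see what actually ends up in count, here's a self-contained run of the same loop on the two example strands from the problem statement. The list holds the letter from the first strand at each mismatched position, and its length is the Hamming distance:

```python
strand_a = "GAGCCTACTAACGGGAT"
strand_b = "CATCGTAATGACGGCCT"

count = []
for x, y in zip(strand_a, strand_b):
    if x != y:
        # Store the letter from strand_a at each mismatched position.
        count.append(x)

print(count)       # ['G', 'G', 'C', 'C', 'A', 'G', 'A']
print(len(count))  # 7
```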

So the complete solution looks like this:

def distance(strand_a, strand_b):
    if len(strand_a) != len(strand_b):
        raise ValueError("Length of two sequences must be the same")

    count = []  # collects one letter per mismatch
    zip_a_b = zip(strand_a, strand_b)  # pairs of letters at the same position

    for x, y in zip_a_b:
        if x != y:
            count.append(x)
    return len(count)  # number of mismatches = Hamming distance
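Since we only need the number of differences, one alternative (my own tweak, not part of the original solution) is to keep an integer counter instead of a list. A minimal sketch, with a couple of sample calls:

```python
def distance(strand_a, strand_b):
    """Return the Hamming distance between two equal-length strands."""
    if len(strand_a) != len(strand_b):
        raise ValueError("Length of two sequences must be the same")

    count = 0  # a plain integer counter instead of a list
    for x, y in zip(strand_a, strand_b):
        if x != y:
            count += 1
    return count


print(distance("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"))  # 7
print(distance("GGACTGA", "GGACTGA"))  # 0
```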

EDIT:
My friend Jeremy Grifski suggested a more efficient way with less code.
As he put it, "it feels weird to create and throw away a list just for its length."
He shared his clever solution in the comments to improve our code, so here it is.
Instead of:

count = []
zip_a_b = zip(strand_a, strand_b)

for x, y in zip_a_b:
    if x != y:
        count.append(x)
return len(count)

we use a generator expression: it yields a 1 for every mismatched pair, and sum() adds those 1s up without building a list first:

count = sum(1 for x, y in zip(strand_a, strand_b) if x != y)
return count
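Putting Jeremy's one-liner back inside the function gives the following sketch. Returning the sum directly, instead of storing it in count first, is just a small simplification on my part:

```python
def distance(strand_a, strand_b):
    """Return the Hamming distance between two equal-length DNA strands."""
    if len(strand_a) != len(strand_b):
        raise ValueError("Length of two sequences must be the same")
    # The generator expression yields 1 for every mismatched pair,
    # and sum() adds those 1s up without building a list.
    return sum(1 for x, y in zip(strand_a, strand_b) if x != y)


print(distance("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"))  # 7
print(distance("", ""))  # 0
```

On Python 3.10 and newer, zip() also accepts strict=True, which raises a ValueError if the inputs turn out to have different lengths. The explicit len() check works on every Python 3 version and lets us choose our own error message.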

If you want to know more about list comprehensions or generator expressions, I found this link useful for understanding these two concepts.
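For a quick feel of the difference between the two, here's a sketch on the example strands: square brackets build the whole list up front, while parentheses create a generator that produces values one at a time and is used up after a single pass:

```python
strand_a = "GAGCCTACTAACGGGAT"
strand_b = "CATCGTAATGACGGCCT"

# List comprehension: square brackets build the whole list in memory.
as_list = [1 for x, y in zip(strand_a, strand_b) if x != y]
print(as_list)  # [1, 1, 1, 1, 1, 1, 1]

# Generator expression: parentheses produce the values lazily.
as_generator = (1 for x, y in zip(strand_a, strand_b) if x != y)
print(type(as_generator).__name__)  # generator

print(sum(as_list))       # 7
print(sum(as_generator))  # 7
print(sum(as_generator))  # 0, because the generator is already used up
```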

Finally, if you think I could write this code in a cleaner and more readable way, just let me know and leave a comment below.
Thanks for reading my post and spending your time with me.

Keep Moving Forward ツ

Code with 💛

🅑🅐🅝🅙🅘

Top comments (8)

Sean • Edited

I'm no noob, but here it is (fewer lines and more Pythonic):

def distance(strand_a, strand_b):
    if len(strand_a) != len(strand_b): raise ValueError("Length of two sequences must be the same")
    zip_a_b = zip(strand_a, strand_b)
    count = len([x for x, y in zip_a_b if x != y])
    return count

Yours was okay, though.

Jeremy Grifski

Both are great! But, it feels weird to create and throw away a list just for its length. I thought this generator expression solution was clever:

count = sum(1 for x, y in zip(strand_a, strand_b) if x != y)

Source: Stack Overflow

Sean

Way smarter, totally tosses the array. Absolutely, and perfectly, Pythonic.

Banji

Hey Jeremy, thank you for sharing a better and more clever solution in this post.
Thank you for reading this post and leaving a comment to improve this solution.
Happy Coding with LoVe

Banji

I'm gonna edit my post and add your great solution to it.
Thank you :)
Keep Moving Forward

Code with 💛

🅑🅐🅝🅙🅘

Banji

Hey Sean, thanks for sharing a better way to improve this solution.
Happy coding with LoVe

amir

made me feel good : ))))

Banji

You're making me feel amazing on this journey.
Keep moving forward bro
Tnx for encouraging me.
Happy coding with ❤️