Python | Hamming Problem

Banji (@banji220) ・3 min read

██████╗░███╗░░██╗░█████╗░
██╔══██╗████╗░██║██╔══██╗
██║░░██║██╔██╗██║███████║
██║░░██║██║╚████║██╔══██║
██████╔╝██║░╚███║██║░░██║
╚═════╝░╚═╝░░╚══╝╚═╝░░╚═╝

Hey, everyone.
In this post I'm going to walk you through the Hamming problem (a simple one) and my solution to it.
If you're not a beginner, you may want to skip this tutorial, because it could be boring and not very useful for you!
But if you're a newbie, bear with me, because this was such a cool problem for me.

Problem:
Calculate the Hamming Distance between two DNA strands.

Your body is made up of cells that contain DNA. Those cells regularly wear out and need replacing, which they achieve by dividing into daughter cells. In fact, the average human body experiences about 10 quadrillion cell divisions in a lifetime!

When cells divide, their DNA replicates too. Sometimes during this process mistakes happen and single pieces of DNA get encoded with the incorrect information. If we compare two strands of DNA and count the differences between them we can see how many mistakes occurred. This is known as the "Hamming Distance".

We read DNA using the letters C, A, G and T. Two strands might look like this:

    GAGCCTACTAACGGGAT
    CATCGTAATGACGGCCT
    ^ ^ ^  ^ ^    ^^

They have 7 differences, and therefore the Hamming Distance is 7.
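
To double-check that count, here is a minimal sketch that lists the positions where the two strands above disagree. The variable name mismatches is my own and not part of the exercise; positions are counted from 0.

```python
strand_a = "GAGCCTACTAACGGGAT"
strand_b = "CATCGTAATGACGGCCT"

# Walk over both strands by index and remember every position that differs.
mismatches = []
for i in range(len(strand_a)):
    if strand_a[i] != strand_b[i]:
        mismatches.append(i)

print(mismatches)       # [0, 2, 4, 7, 9, 14, 15]
print(len(mismatches))  # 7
```

The printed positions line up with the ^ markers under the two strands.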

The Hamming Distance is useful for lots of things in science, not just biology, so it's a nice phrase to be familiar with ❤

So first of all I defined a function and used an if statement to check whether the two strands have the same length:

def distance(strand_a, strand_b):

    if len(strand_a) == len(strand_b):
        first_strand = [letter for letter in strand_a]
        second_strand = [letter for letter in strand_b]
    else:
        raise ValueError("The length of Sequences are not equal")

But I could write this piece of code more simply. You may ask how?
Like this:

def distance(strand_a, strand_b):
    if len(strand_a) != len(strand_b):
        raise ValueError("Length of two sequences must be the same")

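As you can see, instead of the 6-7 lines of the first solution, the second function takes just 3 lines! It also skips building the two lists, which we don't need because a string can be looped over letter by letter on its own.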
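
To see the length check in action, here is a small sketch. The helper name check_lengths and the sample strands are made up for illustration; only the if/raise lines come from the function above.

```python
def check_lengths(strand_a, strand_b):
    # Hypothetical helper: just the length check, nothing else.
    if len(strand_a) != len(strand_b):
        raise ValueError("Length of two sequences must be the same")


# Equal lengths: the check passes silently.
check_lengths("GATTACA", "GATTACA")

# Different lengths: the check raises, and we catch the error to look at it.
try:
    check_lengths("GAT", "GATTACA")
except ValueError as error:
    message = str(error)
    print(message)
```

Raising early like this keeps the rest of the function free of an extra else branch.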

So let's see what we can do for the next part of the code...
We need to pair up the letters of the two strands with the zip() function!
Like this:


diff = zip(strand_a, strand_b)
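
If you have never used zip() before, this tiny sketch shows what it hands back. The three-letter strands are sample values I picked to keep the output short, and list() is only there so the pairs can be printed.

```python
strand_a = "GAG"
strand_b = "CAT"

# zip() pairs the first letters, then the second letters, and so on.
pairs = list(zip(strand_a, strand_b))

print(pairs)  # [('G', 'C'), ('A', 'A'), ('G', 'T')]
```

Each tuple holds one letter from each strand, so comparing the two items of a tuple tells us whether that position differs.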

After that I created an empty list for two purposes:

  • to collect the differences
  • to get the number of differences with the len() function

count = []

With a for loop we go through the tuples to see whether the paired letters are the same or not. Each difference is appended to the empty list count = [], and at the end we return len(count), which is the number of differences!

Like this:

for x, y in diff:
    if x != y:
        count.append(x)
return len(count)

So the complete solution would look like this:

def distance(strand_a, strand_b):
    if len(strand_a) != len(strand_b):
        raise ValueError("Length of two sequences must be the same")

    count = []
    zip_a_b = zip(strand_a, strand_b)

    for x, y in zip_a_b:
        if x != y:
            count.append(x)
    return len(count)

EDIT:
My friend Jeremy Grifski suggested a more efficient way with less code.

As Jeremy put it, it feels weird to create and throw away a list just for its length!
He commented with a clever solution to improve our code, so here it is.
Instead of:

    count = []
    zip_a_b = zip(strand_a, strand_b)

    for x, y in zip_a_b:
        if x != y:
            count.append(x)
    return len(count)

we can use a generator expression:

count = sum(1 for x, y in zip(strand_a, strand_b) if x != y)
return count
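
Dropped back into the function, the whole thing might look like the sketch below. The sample calls are my own additions to try it out, using the two strands from the problem statement plus a couple of easy cases.

```python
def distance(strand_a, strand_b):
    if len(strand_a) != len(strand_b):
        raise ValueError("Length of two sequences must be the same")
    # Add 1 for every position where the paired letters differ.
    return sum(1 for x, y in zip(strand_a, strand_b) if x != y)


print(distance("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"))  # 7
print(distance("GGACTGA", "GGACTGA"))                      # 0
print(distance("", ""))                                    # 0
```

No list is built here: sum() adds up the 1s as the generator produces them.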

If you want to know more about list comprehensions or generator expressions, I found this link useful for understanding these two concepts.

Finally, if you think I could write this in a cleaner and more readable way, just let me know and leave a comment below.
Thanks for reading my post
and spending your time with me.

Keep Moving Forward ツ

Code with 💛

🅑🅐🅝🅙🅘

Banji

@banji220

A Persian boy who is so lovely and wanna learn sth every Single-Day!

Discussion


I'm no noob, but here it is (fewer lines and more Pythonic):

def distance(strand_a, strand_b):
    if len(strand_a) != len(strand_b): raise ValueError("Length of two sequences must be the same")
    zip_a_b = zip(strand_a, strand_b)
    count = len([x for x, y in zip_a_b if x != y])
    return count

Yours was okay though.

 

Both are great! But, it feels weird to create and throw away a list just for its length. I thought this generator expression solution was clever:

count = sum(1 for x, y in zip(strand_a, strand_b) if x != y)

Source: Stack Overflow

 

Way smarter, totally tosses the array. Absolutely, and perfectly, Pythonic.

 

Hey Jeremy, thank you for sharing a better and clever solution in this post.
thank you for reading this post and leaving a comment to improve this solution.
Happy Coding with LoVe

 

I'm gonna edit my post and add your great solution in my post.
Thank you :)
Keep Moving Forward

Code with 💛

🅑🅐🅝🅙🅘

 

Hey Sean, tnx for sharing your better way to improve this solution.
Happy coding with LoVe

 

made me feel good : ))))

 

You're making me feel amazing on this journey.
Keep moving forward bro
Tnx for encouraging me.
Happy coding with ❤️