My experience with MercadoLibre's Mutant challenge

#meli #mercadolibre #backend #python

Introduction

The technical interview consists of developing an api so that Magneto can recruit mutants. The api receives an array of words that represent human DNA. If 4 equal letters (A, T, G, C) were found, then a mutation was detected.

The challenge had lot of goals but few time (no more than 30hs) to do it: OK!

Goals

Explicit requirements.

Effective: You have to detect if a human is a mutant or not
Efficient: make the algorithm that can support a million requests per second
Time: do it in no more than a week. Consider that I have to do it after work, so it must be done within 30hs.
Testing coverage > 80%

Generic requirements.

Human Readable Code: write a clean code with good practices, comment if necessary, etc.
Use a cloud framework: AWS, Google, Heroku
Load to github
Make README and extra documentation

Before coding

I've started reading the pdf and thinking about what the structure should be (early return, max lengths, regex?) and the framework to use, wether Spring or Python Flask. As loading the project to github was a requirement, I search for projects like that and scratching ideas. Always considering that I have few time to do it and need to priorize my "the program should be" list.

Choose the framework: JAVA or Python ?

JAVA and Python are the two frameworks I'm familiar with, more with Python than JAVA. JAVA Spring is a good choice as many web projects are made with it and has an indisputable matureness.
Searching in the projects I realized a project was done with python, so I made the choice. Why python? Since I've been working mainly with python i feel comfortable with it (o, I'm used to it) because is fast coding and has an acceptable performance. I know it is used for bigdata and science. Python also is more "flexible" than JAVA. Another reasonThen I choose heroku for the cloud server because it is fast to create an app. For now, no DB-persist framework is needed.

No more thoughts: start to CODING :D

Early Return. It was not an explicit requirement but I considered essential. In my opinion, you can make an efficient algorithm, avoiding O(n^n) notations, embebbed loops, applying machine learning (no time for researching them). However there ir a classic human behaving that it is always present: bad entry. So early return is very important. The function it must detect bad entries as like words are non-empty and have DNA letters (A, T, G, C), the matrix has a correct order (NxN). I should check if the matrix has more than x order? An 1000n order could make the system to collapse (finally I had no information about that). Another thing is that less than 1% of humans are mutant and > 99% are not mutant, so it is possible that the api could be accessed by mostly human. What a pity, I could not early detect if it is human until I detect before is not a mutant. Finally if I detect it is a mutant, break the loop.
Checking mutant word: regex, a list of characters in a word, binary-search or tree-structures, AI searching methods. I consider that regex is more flexible and good practice that hardcoding the list of possible words (AAAA, CCCC, TTTT, GGGG). I realised that it is not expected that they couldn't find another letter for mutant DNA, or another sequence (e.g. TATA): no flexible is needed, just efficient. No time for the last methods. So I choose for hardcoding the 4 sequences within an if.
Reading the matrix strategies: in the projects I found that there were 3 strategies to read the matrix. From left to right for every row. From bottom to top, for every column. And every diagonal. For that, I need to break the matrix in arrays of chars, so it was adding complex to the algorithm. So that, I choose to have faith and first check the matrix horizontally. If I were lucky, I could detect a mutant in the rows. If not, check in the columns. Finally if needed, break the matrix in arrays of char so that could check the diagonals. In other words, start with simple scenario and leave the complex ones to the end.
Persist the result for repeating entries. This is related to early return. Consider if a human insist on checking again (bad intentionally or not), that could waste resources. So I consider a good practice to save the result for the entry. I used Redis for that. It costs much less than reprocessing the DNA matrix again.
Notify the result to the user? As reaching a first version, I realised if were a good choice to notify a human if it is false result. The requeriment was return 200 if were a mutant and 403 if were a human. First, 200 means that the server could process the requests and 403 is a forbidden access; for api-rest 403 could be interpreted as an access error. I think it should answer 200 in both cases. Second, sometimes it is not good to give extra information to the user, the program were intended to recruit mutants. Because if was my interpretation of the situation, I ask for MercadoLibre recruiters about that. They answered that do what I feel to. So I keep the requerimient: 200 if mutant, 403 if human

After coding

I did the test for the program and fix minor things. I detected that I had some false-positive cases and some false-negative too.

Another thing I had to deal is using heroku, thank good-ness was not a big problem, but I took me 6hs (2 days).

To Do

I really wanted to do a better program but had little time for the interview.

Work with algebra's transformation on matrix: inverted matrix, transpose, etc.
Searching methods like binary search, tree, AI methods...
Use some load balancer like Nginx

What I learned

Testing code is more important than I knew.
Choose a good framework that you are confortable with is a good decission.
Write your experiences and share! I started the project seeing other projects and save me a lot of time.

my code is located https://github.com/alesisjoan/meli-mutant

Hope it helped you.