DEV Community

mattmatt

First Adventure in Malware Data Science

I haven’t been a software developer for very long, but I enjoy it. In my day job I do backend work in Flask, but I was looking for an exciting project to work on at home, something that would stretch my skills and teach me some new things. And maybe let me get a little dirty. Since I was a teen I’ve been fascinated by Malware. However, it’s a curiosity I haven’t explored as much as I’d like. So when No Starch Press had a sale I found and pre-ordered Malware Data Science by Joshua Saxe and Hillary Sanders.

I just started reading chapter 4, and I’m enjoying it so far. Maybe I’ll write a review when I’m done. In the meantime, I’d like to blog real quick about a small issue I had working through chapter 2, and how I fixed it (it was my problem, not the authors’).

All the authors’ code is in Python 2.7, and I’m more used to Python 3. No big deal though, right? I’m a professional, I can suss this out. No, this wasn’t my problem.

What I had missed was a small detail when I downloaded the sample code and malware files. The zip file is entitled malware_data_science_entrypoints_redacted.zip.

The task in chapter 2 I was attempting was to print out disassembled malware, starting at the entrypoint’s address. I completed my port of their Python script (it was dodgy putting parentheses into that print statement) and ran it. And nothing happened.

What did I do wrong, I wondered? I checked the file. I read through the readme on pefile’s GitHub. I checked capstone’s documentation. I was doing everything right. I went so far as to download the authors’ helpful Ubuntu VM, with all the code and data on it already. It worked there. But not on my Fedora VM.

This frustrated me for far longer than it should have. I don’t remember when ‘entrypoints_redacted’ caught my eye, but I felt a little silly then. I altered my script a little to print out the entrypoint’s address, and it is adorably named 0xcc00ffee. When you feed this address to the disassembler you get nothing, as that address (it’s an offset, right?) is very large (about 3.4 GB) while the file itself is just under 631 KB.
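For anyone who wants to peek at the entrypoint without installing anything, here is a minimal sketch that reads the field straight out of the PE header with Python's struct module. This is a stand-in for pefile, not what my script used (with pefile the same value is pe.OPTIONAL_HEADER.AddressOfEntryPoint). The function name read_entry_point is my own invention, and the sketch assumes a well-formed PE file.

```python
import struct


def read_entry_point(data):
    """Return the AddressOfEntryPoint field from the raw bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # The 4-byte little-endian value at offset 0x3C (e_lfanew) points to
    # the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")
    # The optional header starts after the 4-byte signature and the
    # 20-byte COFF file header; AddressOfEntryPoint sits 16 bytes into it
    # for both PE32 and PE32+.
    optional_header = pe_offset + 4 + 20
    (entry_point,) = struct.unpack_from("<I", data, optional_header + 16)
    return entry_point
```

To use it, read the sample in binary mode and print the result in hex, for example print(hex(read_entry_point(open(path, "rb").read()))), where path is wherever you unpacked the sample.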

I ran their VM and printed out the entrypoint address from there. A lowly 0x121ba (or 74170 in decimal, which is how Python prints it). I entered that into my script and voila, I got disassembled code. It’s not exactly what the book says it should be, but it is exactly what the working code on the authors’ VM says it is, so I guess I did something right.
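The numbers are easy to double-check. Here is a rough sketch of the sanity check I could have run from the start. The constant and function names are mine, and the file size is an approximation of "almost 631k" since I didn't note the exact byte count. Strictly speaking the entrypoint is an address relative to where the image is loaded in memory, so the header's SizeOfImage would be a tighter bound than the file size, but the file size is enough to show the problem.

```python
REDACTED_ENTRY = 0xCC00FFEE  # placeholder value found in the redacted sample
REAL_ENTRY = 0x121BA         # value printed on the authors' VM
APPROX_FILE_SIZE = 631_000   # assumption: "almost 631k", exact size not recorded


def fits_in_file(address, file_size):
    """Return True if the address falls inside the first file_size bytes."""
    return 0 <= address < file_size


for name, address in (("redacted", REDACTED_ENTRY), ("real", REAL_ENTRY)):
    verdict = fits_in_file(address, APPROX_FILE_SIZE)
    print(f"{name}: {address:#x} = {address} bytes, fits: {verdict}")
```

The redacted value lands far past the end of the file, so there are no bytes there to disassemble, while the real one sits comfortably inside it.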

I’m looking forward to digging deeper in this book.

Discussion (2)

Devinsmith • Edited

I have been using various tools to prevent malware and other such viruses. Usually I choose the combination of Avast and Malwarebytes. Together they make a great combination and provide all-around protection. If you are looking for basic protection for your Windows machine, you can choose between McAfee and Avast and you will get rid of all the basic security issues easily.

UwU-jpg

Dealing with malware can be really difficult. I myself don't know much about it, which is why I rely on antivirus software. Thankfully, there are many options. I was choosing between ESET and Avast for a while, but finally I got one of these products.