
Niraj Kamdar


GSoC 20: Beginning of a new journey

Last year, one of my friends was selected for GSoC, which is how I came to know about the program. In February 2020, I started looking for organizations that use Python, since I love the language, and what could be better than the Python Software Foundation (PSF) itself? So I applied to Intel's CVE Binary Tool, a sub-org of the PSF, because I really liked the idea of scanning binary files for vulnerabilities. I was also motivated by my own experience: I have developed several binary apps myself, and it would be good to know which vulnerabilities they contain through their dependencies, so that I can choose dependencies that reduce security risk.

What is the CVE Binary Tool?

The CVE Binary Tool scans for a number of common, vulnerable open source components like openssl, libpng, libxml2, expat etc. to let you know if a given directory or binary file includes common libraries with known vulnerabilities.

How does it work?

We have checkers for popular open source libraries. Each checker looks at the strings found in a binary file to see whether they match certain unique strings found in an open source library, and then tries to guess its version. A scanner module recursively walks every binary file in the given directory, parses the strings out of each file and forwards them to every checker. The checkers determine the vendor, product and version and pass them back to the scanner, which looks them up in a local copy of the NVD database, finds all the vulnerabilities associated with that product and displays them. We support several output formats, such as JSON, CSV and a nice console format.
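To make the string-matching idea concrete, here is a minimal sketch in Python. It is not the tool's actual code: the function names, the `libexample` library and its version string are all made up for illustration, and the real checkers are more thorough.

```python
import re


def extract_strings(data, min_len=4):
    """Return runs of printable ASCII characters at least min_len long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [raw.decode("ascii") for raw in re.findall(pattern, data)]


def guess_version(strings, signature, version_regex):
    """Hypothetical checker: find a signature string, then a version.

    Returns None if the signature never appears, the captured version if
    version_regex matches any string, and "UNKNOWN" otherwise.
    """
    if not any(signature in s for s in strings):
        return None
    for s in strings:
        match = re.search(version_regex, s)
        if match:
            return match.group(1)
    return "UNKNOWN"


# A fake "binary": readable text surrounded by non-printable bytes.
data = b"\x00\x01libexample version 1.6.37\x00\xff\xfeabc\x00"
strings = extract_strings(data)
version = guess_version(
    strings, "libexample", r"libexample version ([0-9]+\.[0-9]+\.[0-9]+)"
)
print(strings, version)
```

In this sketch the scanner's job would be to call `extract_strings` once per file and hand the result to every checker; looking the detected product and version up in the NVD data is a separate step that is not shown here.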

How did I start contributing to the CVE Binary Tool?

I first read the CVE Binary Tool's GSoC wiki. At that time, it listed only two project ideas for GSoC this year:

  • Add new checkers to the CVE Binary Tool (Difficulty: easy)

The CVE Binary Tool has only a small number of checkers, which means it can only detect CVEs in a small set of known pieces of software. The purpose of this project is to add some new ones.

  • Improve CVE Binary Tool Output (Difficulty: Intermediate)

The CVE Binary Tool currently only has human-readable console output (and some debug log levels) but it would be useful if it had machine readable output (such as JSON or CSV formats) and improved human output (improving existing console output or branching out to more extensive reports). This project is all about making the output better.

I then read the documentation and found that writing unit tests was one of the easiest tasks, so I wrote a simple unit test for the Python checker. I gained some confidence when my first PR got merged. Then I started looking into the whole code base to understand how it works internally. I explored different parts of the code, added some checkers, fixed many Windows-related bugs and added code coverage to the GitHub Actions pipeline. My mentors, Terri Oda and John Anderson, actively helped me improve my PRs.

After my organization was selected for GSoC, I started working on my proposal. At that point I decided to work on improving the CVE Binary Tool's input and output system. I went through Python's GSoC guidelines and wrote the proposal according to their template, which took me a week. My mentors pointed out some issues and I resolved all of them. Then it was time to wait for the project announcement. On 4th May at 11:30, I got an email from Google stating that I had been selected for GSoC 2020. It was the best moment of my life.


Unlike other projects, mine attracted a lot of ideas: adding support for various inputs and outputs, internationalization of the tool, concurrency improvements and removing the compiler dependency from the tests. So my mentors decided to divide the project into two parts: 1) Improving the Output of the CVE Binary Tool and 2) Improving Concurrency and Input Support. Since my proposal focused on improving input support and I have a good understanding of concurrency, I chose the second project. The first will be done by Harmandeep Singh.


What did I do in the Community Bonding Period?

I fixed several bugs (stale egg-info, extractor bugs on Windows, etc.), wrote a faster native Python solution to replace the c-strings extension module, and refactored the whole checkers module to use an object-oriented approach that reduces code repetition. Previously, we had to write several functions to create a checker. Now all we need to do is write a class with 5 attributes that inherits from the Checker class, which contains very generic methods for the subclass to use. If you want to learn more about how to write a checker, check out our checker contributing guidelines.
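As a rough illustration of the class-based approach, the sketch below shows a base class that holds the generic matching logic and a subclass that only declares data. The five attribute names, the `check` method and the `examplelib` product are assumptions made for this example and may not match the tool's real Checker class.

```python
import re


class Checker:
    """Illustrative base class holding the generic matching logic."""

    # Hypothetical attribute names; a subclass overrides all five.
    VENDOR = ""
    PRODUCT = ""
    CONTAINS_PATTERNS = []  # regexes showing the library is embedded
    FILENAME_PATTERNS = []  # regexes matching the library's own file name
    VERSION_PATTERNS = []   # regexes with one group capturing the version

    def check(self, filename, lines):
        """Return a dict describing the match, or None if nothing matched."""
        is_library = any(re.search(p, filename) for p in self.FILENAME_PATTERNS)
        contains = any(
            re.search(p, line) for p in self.CONTAINS_PATTERNS for line in lines
        )
        if not (is_library or contains):
            return None
        return {
            "vendor": self.VENDOR,
            "product": self.PRODUCT,
            "version": self._find_version(lines),
        }

    def _find_version(self, lines):
        """Return the first captured version, or "UNKNOWN" if none matches."""
        for pattern in self.VERSION_PATTERNS:
            for line in lines:
                match = re.search(pattern, line)
                if match:
                    return match.group(1)
        return "UNKNOWN"


class ExampleLibChecker(Checker):
    """A made-up checker: only data, no new functions."""

    VENDOR = "example_vendor"
    PRODUCT = "examplelib"
    CONTAINS_PATTERNS = [r"examplelib_init"]
    FILENAME_PATTERNS = [r"libexample\.so"]
    VERSION_PATTERNS = [r"examplelib version ([0-9]+\.[0-9]+\.[0-9]+)"]


result = ExampleLibChecker().check(
    "app.bin", ["foo", "examplelib_init", "examplelib version 2.3.1"]
)
print(result)
```

The benefit of this design is that adding a new checker means writing a handful of patterns, while bug fixes to the matching logic happen once in the base class.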

I also had video meetings with my mentors every Wednesday, where we discussed the design and implementation of the project. Since my project involves adding concurrency to the CVE Binary Tool, I studied the asyncio and concurrent.futures modules during this time. My mentor also helped by recommending a few articles.

After the designs were approved, I created GitHub issues and added them to my project board. Here's the list of tasks I will be doing during GSoC:

What's next?

This week, I will be working on removing the compiler dependency from test_scanner, which is part of my GSoC project. I started 3-4 days early and have already finished the first task of this week: splitting the cli.py module into cli.py and scanner.py.

I will be sharing my experiences every week, so stay tuned. See you next week.
