DEV Community

Mukund Raghav Sharma (Moko)

My Capstone Project: Deep Learning to Detect Bugs in Code using Graph Based Neural Networks

Over this long weekend, I decided to revisit the details of my capstone project for my Master's in Data Science (graduated May 2020), which used Deep Learning to detect bugs in code. The paper can be found here.

The paper presents a comparative study of the efficacy (measured by test accuracy) of Gated Graph Neural Networks (GGNNs) vs. Relational Graph Convolutional Networks (RGCNs) on the task of automatically detecting variable misuse (a variable being used where a different one was intended) in the top 25 trending C# repositories on GitHub.
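
To make the comparison concrete, here is a minimal pure-Python sketch of one propagation step for each architecture on a tiny graph with typed edges. It follows the textbook formulations (RGCN: per-edge-type weights, per-type degree normalization, a self-loop weight and ReLU; GGNN: summed per-type messages fed into a GRU update), omits biases, and is not the TensorFlow code used in the project. The edge types `CHILD` and `NEXT_TOKEN` and all helper names are illustrative assumptions.

```python
import math
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def vadd(*vs):
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def incoming_messages(h, edges, weights):
    """Sum W_r h_j over incoming edges (j, r, i), grouped by (i, r), with counts."""
    d = len(h[0])
    sums, counts = {}, {}
    for src, etype, dst in edges:
        key = (dst, etype)
        sums[key] = vadd(sums.get(key, [0.0] * d), matvec(weights[etype], h[src]))
        counts[key] = counts.get(key, 0) + 1
    return sums, counts

def rgcn_layer(h, edges, weights, w_self):
    """RGCN: h_i' = ReLU(W_0 h_i + sum_r (1/|N_i^r|) sum_{j in N_i^r} W_r h_j)."""
    sums, counts = incoming_messages(h, edges, weights)
    out = []
    for i, h_i in enumerate(h):
        acc = matvec(w_self, h_i)  # self-loop term W_0 h_i
        for (dst, etype), s in sums.items():
            if dst == i:
                acc = vadd(acc, [x / counts[(dst, etype)] for x in s])
        out.append([max(0.0, x) for x in acc])
    return out

def ggnn_step(h, edges, weights, gru):
    """GGNN: m_i = sum of W_r h_j over incoming edges, then h_i' = GRU(m_i, h_i)."""
    sums, _ = incoming_messages(h, edges, weights)
    d = len(h[0])
    out = []
    for i, h_i in enumerate(h):
        m = [0.0] * d
        for (dst, _), s in sums.items():
            if dst == i:
                m = vadd(m, s)
        z = [sigmoid(x) for x in vadd(matvec(gru["Wz"], m), matvec(gru["Uz"], h_i))]
        r = [sigmoid(x) for x in vadd(matvec(gru["Wr"], m), matvec(gru["Ur"], h_i))]
        r_h = [a * b for a, b in zip(r, h_i)]
        cand = [math.tanh(x) for x in vadd(matvec(gru["W"], m), matvec(gru["U"], r_h))]
        # Gated blend of the old state and the candidate state.
        out.append([(1 - zk) * hk + zk * ck for zk, hk, ck in zip(z, h_i, cand)])
    return out

def random_matrix(d, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(d)]

# Toy graph: 3 nodes and two hypothetical edge types.
rng = random.Random(0)
D = 4
CHILD, NEXT_TOKEN = 0, 1
h0 = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(3)]
edges = [(0, CHILD, 1), (0, CHILD, 2), (1, NEXT_TOKEN, 2)]
weights = {CHILD: random_matrix(D, rng), NEXT_TOKEN: random_matrix(D, rng)}
w_self = random_matrix(D, rng)
gru = {k: random_matrix(D, rng) for k in ("Wz", "Uz", "Wr", "Ur", "W", "U")}

h_rgcn = rgcn_layer(h0, edges, weights, w_self)
h_ggnn = ggnn_step(h0, edges, weights, gru)
```

The structural difference the sketch highlights: RGCN normalizes each edge type's messages and applies a non-linearity directly, while GGNN passes the aggregated message through a gated (GRU) update that can retain part of the previous node state.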

The results showed that RGCNs outperformed GGNNs in all cases, albeit by less than 5%. I did a considerable amount of randomized hyperparameter tuning, but it did not change the outcome.
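
Randomized hyperparameter tuning amounts to random search: sample configurations from a search space, evaluate each one, and keep the best. The sketch below assumes a hypothetical search space (the parameter names and ranges are illustrative, not those used in the project) and a caller-supplied `evaluate` function; in a real run it would train a model and return its validation accuracy.

```python
import math
import random

# Illustrative search space; names and ranges are assumptions, not the project's.
SEARCH_SPACE = {
    "learning_rate": (1e-4, 1e-2),  # sampled log-uniformly
    "hidden_size": [64, 128, 256],
    "num_layers": [2, 4, 6, 8],
    "dropout": (0.0, 0.5),          # sampled uniformly
}

def sample_config(rng):
    """Draw one random configuration from SEARCH_SPACE."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "learning_rate": 10 ** rng.uniform(math.log10(lo), math.log10(hi)),
        "hidden_size": rng.choice(SEARCH_SPACE["hidden_size"]),
        "num_layers": rng.choice(SEARCH_SPACE["num_layers"]),
        "dropout": rng.uniform(*SEARCH_SPACE["dropout"]),
    }

def random_search(evaluate, n_trials, seed=0):
    """Evaluate n_trials random configs; return the best config, its score and all trials."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        cfg = sample_config(rng)
        history.append((cfg, evaluate(cfg)))
    best_cfg, best_score = max(history, key=lambda trial: trial[1])
    return best_cfg, best_score, history

# Stand-in objective: a real one would train the model and return validation accuracy.
def toy_evaluate(cfg):
    return -abs(math.log10(cfg["learning_rate"]) + 3) - cfg["dropout"]

best_cfg, best_score, history = random_search(toy_evaluate, n_trials=20)
```

For a comparison like GGNN vs. RGCN, the same search (same budget and seed) would be run separately for each model so that neither gets an unfair tuning advantage.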

My work was based on multiple papers by Microsoft Research (particularly https://lnkd.in/gd5kTEEv) and used TensorFlow to conduct the analysis. In a nutshell, the training data was a modified version of the Abstract Syntax Trees (ASTs) generated by the Roslyn compiler.
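
Roslyn works on C#, so it cannot be shown running here; as a stand-in, the sketch below uses Python's built-in `ast` module to illustrate the general idea of turning a syntax tree into a graph with typed edges. It is an assumption-laden simplification, not the project's preprocessing: the edge labels `Child` and `LastUse` are hypothetical names, and the "last use" link here is purely lexical (previous occurrence of the same identifier in source order), ignoring control flow.

```python
import ast

CHILD, LAST_USE = "Child", "LastUse"  # hypothetical edge-type labels

def build_program_graph(source):
    """Return (nodes, edges): node labels and (src, edge_type, dst) triples."""
    tree = ast.parse(source)
    nodes = []      # node labels: identifier for variables, AST class name otherwise
    edges = []      # (src_index, edge_type, dst_index)
    name_uses = []  # (line, col, identifier, node_index) for every variable occurrence

    def visit(node, parent):
        idx = len(nodes)
        if isinstance(node, ast.Name):
            nodes.append(node.id)
            name_uses.append((node.lineno, node.col_offset, node.id, idx))
        else:
            nodes.append(type(node).__name__)
        if parent is not None:
            edges.append((parent, CHILD, idx))
        for child in ast.iter_child_nodes(node):
            if not isinstance(child, ast.expr_context):  # skip Load/Store markers
                visit(child, idx)

    visit(tree, None)

    # Simplified LastUse: link each occurrence to the previous occurrence of the
    # same identifier in source order.
    last_seen = {}
    for _line, _col, name, idx in sorted(name_uses):
        if name in last_seen:
            edges.append((idx, LAST_USE, last_seen[name]))
        last_seen[name] = idx
    return nodes, edges

nodes, edges = build_program_graph("x = 1\ny = x + 2\nx = y\n")
```

The resulting `(nodes, edges)` pair is exactly the shape of input a typed-edge GNN consumes: one initial state per node label and one weight matrix per edge type.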

Some lessons I picked up from this experience are:

  1. Sticking with a white paper even if you don't understand any of it in the beginning.
  2. Digesting material through different media is a good way to switch it up: I relied heavily on YouTube videos from Deep Learning conferences (https://lnkd.in/gzcTRGhG) for a more lecture-based approach that bolstered my learning.
  3. Treating data as a first-class citizen. Using version control or otherwise backing up data (where possible) is definitely a lesson I learnt through this: I clobbered my old weights more times than I'd like to admit.
  4. I am a big advocate of self-describing code; however, since this was such a new space for me with a steep learning curve, commenting as much as I could made a significant difference.
  5. Testing on small, bite-sized chunks saved me a considerable amount of time down the road: doing a small end-to-end (E2E) prototype run first caught countless runtime errors that I would otherwise have hit much later, saving me hours.
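
One way to avoid clobbering weights (lesson 3) is to make every save write a new, uniquely named file and refuse to overwrite anything. A minimal stdlib-only sketch, assuming the weights have already been serialized to bytes; `save_versioned` and its filename scheme are hypothetical, not the project's code.

```python
import hashlib
import json
import os
import tempfile
import time

def save_versioned(payload: bytes, directory: str, metadata: dict) -> str:
    """Write payload (e.g. serialized weights) to a new file and never overwrite.

    The filename combines a timestamp with a content hash, and mode "xb"
    raises FileExistsError rather than silently clobbering an existing file.
    """
    os.makedirs(directory, exist_ok=True)
    digest = hashlib.sha256(payload).hexdigest()
    stamp = time.strftime("%Y%m%d-%H%M%S")
    path = os.path.join(directory, f"weights-{stamp}-{digest[:12]}.bin")
    with open(path, "xb") as f:
        f.write(payload)
    # Sidecar metadata so each weights file can be traced back to its run.
    with open(path + ".json", "x") as f:
        json.dump({**metadata, "sha256": digest}, f)
    return path

demo_dir = tempfile.mkdtemp()
p1 = save_versioned(b"epoch-1 weights", demo_dir, {"model": "rgcn", "epoch": 1})
p2 = save_versioned(b"epoch-2 weights", demo_dir, {"model": "rgcn", "epoch": 2})
```

In a TensorFlow training loop, `tf.train.CheckpointManager` offers similar protection by writing numbered checkpoints and keeping the most recent `max_to_keep` of them.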

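Lesson 5 describes what is often called a smoke test: push a tiny slice of data through every stage of the pipeline before committing to a long run. A minimal sketch, assuming the pipeline can be expressed as a list of named stages; the stage functions here (`parse`, `featurize`, `train_one_step`) are hypothetical placeholders, not the project's code.

```python
import itertools

def smoke_test(stages, dataset, n_examples=8):
    """Run every pipeline stage end to end on a tiny slice of the data.

    stages: list of (name, fn) pairs; each fn receives the previous stage's output.
    Returns (ok, failing_stage_name, error_message).
    """
    value = list(itertools.islice(dataset, n_examples))
    for name, fn in stages:
        try:
            value = fn(value)
        except Exception as exc:  # report the first failing stage instead of crashing
            return False, name, f"{type(exc).__name__}: {exc}"
    return True, None, None

# Hypothetical placeholder stages standing in for a real training pipeline.
def parse(examples):
    return [ex.split() for ex in examples]

def featurize(token_lists):
    return [[len(tok) for tok in toks] for toks in token_lists]

def train_one_step(features):
    return sum(map(sum, features)) / len(features)

data = ["int x = y ;", "return x ;"] * 100
ok, stage, err = smoke_test(
    [("parse", parse), ("featurize", featurize), ("train_one_step", train_one_step)], data)

# A deliberately broken stage is caught after 8 examples, not after a full epoch.
ok2, stage2, err2 = smoke_test(
    [("parse", parse), ("bad", lambda v: v["missing"]), ("train_one_step", train_one_step)], data)
```
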
Any feedback would be greatly appreciated! Happy to answer any questions.