DEV Community

Ashwin Patil


Semantic Code Search

My Final Project

A tool that can search a large code corpus directly and return ranked snippets is an invaluable resource for programmers looking for similar code using natural language queries. Such a tool must understand the semantics of both source code and queries deeply enough to evaluate their intent correctly. Over the years, many tools that rely on textual similarity between source code and query have proven ineffective because they fail to capture the high-level semantics of either. Previous deep neural network models for code search do a good job, but most of them are evaluated on only a single programming language, mostly Java. In this project, we propose a novel deep neural network model called UnifiedCodeNet that can handle the intricacies of different programming languages. The model borrows several vital features from previous models and builds on those ideas to form a unified model that generates document vector embeddings from source code; a similarity search against the query vector embedding then returns the most similar code snippets in any language. To implement the model, we leveraged open source tools such as FAISS and fastText, and collected code snippets from public GitHub repositories using the tools, techniques, and benchmarks provided by CodeSearchNet.
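To make the retrieval idea concrete, here is a minimal sketch of embedding-based code search. It is not the UnifiedCodeNet implementation: the hashed bag-of-tokens `embed` function is a toy stand-in for learned embeddings (such as those built on fastText), the linear scan in `search` stands in for a FAISS index, and the names `tokenize`, `embed`, `search`, and `DIM` are hypothetical, chosen only for this illustration.

```python
import hashlib
import math
import re

DIM = 256  # toy embedding size, picked arbitrarily for this sketch


def tokenize(text):
    """Split code or a query into lowercase tokens, breaking snake_case and camelCase."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+|\d+", spaced)]


def embed(text):
    """Hash each token into a fixed-size count vector, then L2-normalize it.

    Assumption: this replaces a learned embedding model; it only captures
    token overlap, not real semantics.
    """
    vec = [0.0] * DIM
    for tok in tokenize(text):
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def search(query, snippets, k=2):
    """Rank snippets by cosine similarity to the query (brute force, no index)."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed(s))), s) for s in snippets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    corpus = [
        "def read_file(path): return open(path).read()",            # Python
        "function sortArray(items) { return items.sort(); }",       # JavaScript
        "func ReadFile(path string) ([]byte, error) { return nil, nil }",  # Go
    ]
    for score, snippet in search("read file from path", corpus):
        print(f"{score:.3f}  {snippet}")
```

Because query and code are mapped into the same vector space, the same ranking step works regardless of the snippet's language; a real system would swap the hashing step for a trained model and the linear scan for an approximate nearest-neighbor index.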

Project Page
