As part of the course OSD600 (Open Source Development), we were asked to create a tool from scratch that could provide context to LLMs based on a git repository. The tool should index a codebase and output its contents and file structure, initially in Markdown format. I found this pretty cool, given that we could choose whatever programming language we wanted.
This is initially a toy project. Modern IDEs already ship exceptional agentic tools that can index huge codebases very quickly and tailor their responses to the developer's needs.
I really wanted to push myself with this project; therefore, I chose Rust. I'd say I'm very proficient in C++, with a solid track record of open-source contributions to a large codebase like LLVM. The only risk was that this is a collaborative course: my code will need to be extended by other students, which implies several things. Most students nowadays are acquainted with the "pop" programming languages, such as Python, JavaScript, and Java. There's nothing wrong with that; however, moving from those programming paradigms to Rust could pose a significant challenge. Still, I decided to take the leap.
The Rust Programming Language
Rust learns a lot from the languages of the C family (C/C++). It not only provides explicit control over memory but also enforces safety over it. Think of it this way: one of the main improvements to memory management in modern C++ was the introduction of smart pointers. Smart pointers are based on RAII (Resource Acquisition Is Initialization): a resource (such as dynamic memory) is acquired in an object's constructor and released when its destructor runs. Rust applies this same principle to every value intrinsically. On top of that, the compiler tracks the lifetime of every reference, which is crucial for avoiding dangling pointers, invalid memory accesses, and other very common problems in C/C++.
For example:
struct MyResource;

impl Drop for MyResource {
    fn drop(&mut self) {
        println!("My resource is being cleaned up!");
    }
}

fn main() {
    let _r = MyResource;
    println!("My resource is in use.");
} // The _r variable goes out of scope here, and `drop` is automatically called.
I will not go into detail on every single Rust "goodie"; however, the example above illustrates how much the developer experience improves when moving from C++ to Rust. In fact, I skipped perhaps the most important feature, the borrow checker, but as I said, I do not want to make this complicated.
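For readers who want a quick taste of the borrow checker anyway, here is a minimal sketch. It is not taken from my tool; `first_word` and `consume` are hypothetical helpers made up for this illustration. It shows an immutable borrow ending before a mutation, and ownership moving into a function.

```rust
// Hypothetical helper: borrows the text immutably and returns a slice of it.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

// Hypothetical helper: takes ownership, so the String is dropped on return.
fn consume(text: String) -> usize {
    text.len()
}

fn main() {
    let mut message = String::from("hello borrow checker");

    // Immutable borrow: `word` points into `message`.
    let word = first_word(&message);
    println!("first word: {}", word);

    // `word` is not used past this point, so the borrow has ended
    // and mutating `message` is allowed.
    message.push('!');

    // Ownership moves into `consume`; `message` is unusable afterwards.
    let len = consume(message);
    println!("length: {}", len);

    // println!("{}", message); // would not compile: value used after move
}
```

Uncommenting the last line, or using `word` after the call to `push`, makes the compiler reject the program. That is the class of bug C++ would only reveal at runtime, if at all.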
My previous experience with the language
I had written some casual programs, and it was certainly a pleasant experience. Sometimes it took a bit longer to get a prototype working, but that's expected given the design of the language. In one of these small projects, I tried to replicate the famous game Pong using the SDL2 library. In another, I attempted to write a lexer generator. I dropped the latter once I started investing my time heavily in open-source projects that interested me.
To conclude this section, what exactly did I expect from the language? I expected a smooth development experience, without the weird stack traces and obscure compiler error messages I run into when writing C++. I'd like to say in advance that it definitely matched my expectations. It may be a shallow thing to say, but most of the time, once my code compiled, it ran without any memory-related problems.
Now, let's talk about the tool itself.
Every "serious" project requires careful planning. One thing I knew for sure was that I didn't want to write everything from scratch. Certain things, like command-line argument parsing, are amazing in the Rust ecosystem. Below are the crates I used, what each one does, and where it fits in my project.
Core CLI & Framework
• clap - Command-line argument parsing and help generation
Git Integration
• git2 - Git repository operations and metadata extraction
• chrono - Date/time formatting for git commit timestamps
File Processing
• globset - Pattern matching for file filtering (.rs, src/*)
• ptree - Directory tree visualization in terminal output (loved this one)
In the following diagram, I'll demonstrate how the main logic and these dependencies were glued together.
I'd like to mention that, out of all the dependencies I assembled, none was particularly difficult to use. My development experience was very smooth, and a large portion of the time was probably spent thinking about how to write tests for each isolated piece of functionality I added. I'm aware that tests and a CI/CD pipeline were not required; however, as I believe I mentioned earlier, they are the best way to keep track of the changes we make as we progress.
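To make the core idea a bit more concrete, here is a rough, standard-library-only sketch of turning a list of file paths into a Markdown file-structure section. This is not the tool's actual implementation and it deliberately avoids the crates listed above; `Node`, `build_tree`, `render`, and `to_markdown` are names I made up for the illustration, and the paths in `main` are sample data.

```rust
use std::collections::BTreeMap;

/// A tree node mapping child names to subtrees.
/// A file is simply a node without children.
#[derive(Default)]
struct Node {
    children: BTreeMap<String, Node>,
}

/// Builds a tree from '/'-separated relative paths.
fn build_tree(paths: &[&str]) -> Node {
    let mut root = Node::default();
    for path in paths {
        let mut current = &mut root;
        for part in path.split('/').filter(|p| !p.is_empty()) {
            // Descend into the child, creating it on first sight.
            current = current.children.entry(part.to_string()).or_default();
        }
    }
    root
}

/// Appends the tree to `out` as a nested Markdown list.
fn render(node: &Node, depth: usize, out: &mut String) {
    for (name, child) in &node.children {
        out.push_str(&"  ".repeat(depth));
        out.push_str("- ");
        out.push_str(name);
        if !child.children.is_empty() {
            out.push('/'); // mark directories with a trailing slash
        }
        out.push('\n');
        render(child, depth + 1, out);
    }
}

/// Produces a Markdown section describing the file structure.
fn to_markdown(paths: &[&str]) -> String {
    let mut out = String::from("## File structure\n\n");
    render(&build_tree(paths), 0, &mut out);
    out
}

fn main() {
    // Sample paths; a real tool would collect these from the repository.
    let paths = ["src/main.rs", "src/cli.rs", "Cargo.toml", "README.md"];
    print!("{}", to_markdown(&paths));
}
```

Keeping the rendering logic as a pure function over a list of paths is a choice made for the sketch: it makes the behavior easy to cover with unit tests without touching the filesystem.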
Lessons learned and the road ahead
As I highlighted at the very beginning of this blog, it's always good to push yourself out of your comfort zone. I realize I could have spent less time by writing this in a simpler way. Still, I believe great thinkers, engineers, and technical people expose themselves to many technologies to build a solid foundation of knowledge. Using Rust for this assignment will help me strengthen my skills in systems programming, and who knows, I may want to contribute to the Rust compiler at some point :).
The foundation of this tool is now set up, and I will always be happy to support anyone willing to contribute to my project. I cannot hide my excitement: the following weeks are where the fun will begin. IREE and other great projects that align with my skillset are the next stop in my open source journey. It's time to make compilers great.