The Goal of the CLI Tool
The purpose of RepositoryContextPackager is to package a repository's data according to your instructions and send properly formatted data to LLMs, so that coding issues are easier to solve. When working with ChatGPT on coding problems, I got tired of copying and pasting files one by one.
It also lets us give the LLM specific data about our dependencies and project structure, so that it understands the project and guides us more accurately. As a result, we can focus on the main parts, extract the data we need, and keep growing our projects with specific knowledge rather than a flood of information that overwhelms us.
My Solution
A C++ program that, with a single command, analyzes your project and generates a comprehensive context document including the local path, git history, directory tree, all source files with their extensions, and total statistics. You can also exclude or include specific files or folders as needed. No more manual file selection or losing project context when asking LLMs for help.
Benefits of the CLI Tool to My Knowledge
When I started developing this tool, I did not expect to learn so much from it. I got more exposure to file reading and writing and to C++17 libraries and features. In particular, when I wanted to represent the tree structure, I realized I needed recursion to traverse deeply nested files and folders. I learned std::filesystem, which helped me look for files and folders recursively. It was also the first time I set up a project with CMake, and I learned how to integrate another open source project into my code and use it. This tool improved my skills in handling edge cases, handling errors, and thinking about my design before writing spaghetti code.
My First Challenge
My first challenge was how to build the git info section. The first thing that came to my mind was to run git commands and parse their output, but what if git is not installed on the user's system or is not in the PATH? So I cloned libgit2 and packaged it via vcpkg, which I was using for the first time, and I was out of my comfort zone because I was using different tools and integrating another project into mine. While I was collecting the git info, I refreshed my knowledge of C-style code and pointers even though I was writing in C++. I also discovered how Git actually stores commit identifiers. Look at the code below:
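//40 hex characters plus the terminating '\0'
char sha[GIT_OID_HEXSZ + 1];
//write the commit id into the buffer as a NUL-terminated hex string
git_oid_tostr(sha, sizeof(sha), oid);
info.m_commit = sha;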
You know those long strings of letters and numbers you see when you make a commit? Like Commit: 'a sequence of hexadecimal characters'?
I always thought Git just made up a random ID for each commit, but I was completely wrong. Under the hood, when we make a commit, Git takes its content (our files, who we are, the date) and runs it through the SHA-1 hash function, which produces 20 bytes of binary data. That is why the result has to be converted to a hexadecimal string before it can be displayed.
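To make the binary-to-text step concrete, here is a small standalone sketch of how raw hash bytes can be rendered as hexadecimal characters, so that 20 bytes become a 40-character string. It does not use libgit2: `toHex` is a hypothetical helper written only for this example, and it is meant to show the idea behind the conversion rather than how `git_oid_tostr` is implemented.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of libgit2): renders raw hash bytes as
// lowercase hexadecimal, two characters per byte.
template <std::size_t N>
std::string toHex(const std::array<unsigned char, N>& bytes) {
    constexpr char digits[] = "0123456789abcdef";
    std::string hex;
    hex.reserve(N * 2);
    for (unsigned char byte : bytes) {
        hex.push_back(digits[byte >> 4]);    // high nibble
        hex.push_back(digits[byte & 0x0F]);  // low nibble
    }
    return hex;
}
```

Each byte holds two hex digits, which is why the buffer in the libgit2 snippet needs twice the raw size plus one extra character for the terminator.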
My Second Challenge
Most of the time, when we start building a project, we don't think about future needs, and one of my stupid mistakes was relying too much on std::cout. At some point I needed to write the data to a file. Rather than rewrite everything, I learned that I can redirect the output.
void writeFileStructure(std::ostream& o, const std::filesystem::path& path) {
    o << "Structure\n\n";
    //capture where cout goes
    std::streambuf* original_out = std::cout.rdbuf();
    std::ostringstream caughtOut;
    //redirect it to the string buffer
    std::cout.rdbuf(caughtOut.rdbuf());
    //aha, call my function and now output goes to string not to the terminal
    fsTravel::travelDirTree(path, 0);
    //restore cout and write the caught data to file
    std::cout.rdbuf(original_out);
    o << caughtOut.str() << "\n\n";
}
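One weakness of the manual save-and-restore pattern is that std::cout stays redirected if the called function throws before the restore line runs. A small RAII guard is one way to make the restore automatic. The class name `CoutRedirect` is an assumption made for this sketch and is not part of the original tool.

```cpp
#include <iostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical RAII guard: while an instance is alive, everything written
// to std::cout is captured in an internal string buffer instead.
class CoutRedirect {
public:
    // rdbuf(new_buffer) installs the new buffer and returns the previous one.
    CoutRedirect() : original_(std::cout.rdbuf(captured_.rdbuf())) {}

    // Restores the original buffer, even when leaving scope via an exception.
    ~CoutRedirect() { std::cout.rdbuf(original_); }

    CoutRedirect(const CoutRedirect&) = delete;
    CoutRedirect& operator=(const CoutRedirect&) = delete;

    // Returns everything captured so far.
    std::string str() const { return captured_.str(); }

private:
    std::ostringstream captured_;  // declared first so it exists before the swap
    std::streambuf* original_;
};
```

With a guard like this, the body of a function such as writeFileStructure could create a `CoutRedirect` in an inner scope, call the traversal, and read `str()` afterwards. In the long run, passing a `std::ostream&` into the traversal function itself would probably remove the need for any redirect.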
Language Specifics
At one point in my project I realized std::array was a better fit for some of my data because its contents are known at compile time, so I used it with constexpr. But a problem occurred that I had never seen before. Can you see the error?
constexpr std::array<std::string, 8>ignoredDirs{
".vs", "build", "out", ".git", ".github", ".gitignore", ".gitmodules", ".gitattributes"
};
Well, constexpr on a variable tells the compiler that the value must be known at compile time, which is a stronger guarantee than const. std::array is also more memory-efficient than std::vector here because its size is fixed, whereas a vector allocates on the heap and grows its capacity by some factor (often 2x, depending on the implementation) whenever the size reaches the capacity. The problem is that std::string involves dynamic memory allocation, which a constexpr variable cannot hold, so it is better to use std::string_view, which only views the string without modifying it and can be created at compile time.
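A version that compiles under C++17 could look like the sketch below. The `isIgnored` helper is an assumption added for illustration; only the array itself comes from the example above.

```cpp
#include <array>
#include <string_view>

// std::string_view is a literal type, so the whole array can be constexpr.
constexpr std::array<std::string_view, 8> ignoredDirs{
    ".vs", "build", "out", ".git",
    ".github", ".gitignore", ".gitmodules", ".gitattributes"
};

// Hypothetical helper: true if `name` matches one of the ignored entries.
constexpr bool isIgnored(std::string_view name) {
    for (std::string_view dir : ignoredDirs) {
        if (dir == name) {
            return true;
        }
    }
    return false;
}

// Both checks are evaluated entirely at compile time.
static_assert(isIgnored(".git"));
static_assert(!isIgnored("src"));
```

Because a string_view does not own its characters, this only works safely when the viewed strings outlive the view. String literals live for the whole program, so they are fine here.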