Alright, so for my OSD600 course, we had this assignment: build a command-line tool to solve a problem that I, and probably a lot of other developers, face all the time. You know when you're trying to get help from an LLM like ChatGPT and you end up copy-pasting a dozen different files? It's a mess. You lose the project structure, and the AI has no idea how main.js relates to utils/helper.js. My tool fixes that by packaging an entire repository's context into a single, clean text file.
I decided to build this with Python. As a Computer Programming and Analysis student, Python just felt like the right tool for the job, and since I'm also studying ML with Python, it's the language I'm most comfortable with. It's got awesome standard-library modules for file system operations (os) and parsing command-line arguments (argparse), which meant I could get started fast without wrestling with a bunch of dependencies.
The core features came together one by one:
CLI Arguments: argparse was a lifesaver. Setting up --version and --help flags, and getting it to accept file paths, was super straightforward. It handles the basic input validation for you (unknown flags, missing arguments), which is great.
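To make that concrete, here is a minimal sketch of that kind of argparse setup. It is not the tool's actual code: the program name `repo-context`, the version string, and the argument name `paths` are placeholders made up for illustration.

```python
import argparse


def build_parser():
    # "repo-context" and "0.1.0" are hypothetical placeholders.
    parser = argparse.ArgumentParser(
        prog="repo-context",
        description="Package a repository's context into a single text file.",
    )
    # One or more files or directories; argparse reports an error if none are given.
    parser.add_argument("paths", nargs="+", help="files or directories to include")
    # action="version" prints the version string and exits.
    parser.add_argument("--version", action="version", version="%(prog)s 0.1.0")
    # --help / -h is added automatically by argparse.
    return parser


# Passing a list instead of reading sys.argv keeps the example self-contained.
args = build_parser().parse_args(["src", "README.md"])
print(args.paths)
```

Running the parser with no paths, or with a flag it doesn't know, makes argparse print a usage message and exit, which is the "free" validation mentioned above.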
File Discovery: The main logic revolves around os.walk(). I had it recursively go through directories, grabbing all the files. I made sure to add a simple rule to ignore hidden files and common development folders like venv to keep the output focused and clean.
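The walk-and-filter idea can be sketched like this. It's a simplified illustration, not the tool's exact code: the function name `discover_files` is hypothetical, and the skip list beyond venv is an assumption.

```python
import os
import tempfile

# Only "venv" is named in this post; the other entries are assumptions.
SKIP_DIRS = {"venv", "__pycache__", "node_modules"}


def discover_files(root):
    """Return file paths under root, skipping hidden entries and SKIP_DIRS."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to descend into pruned folders.
        dirnames[:] = sorted(
            d for d in dirnames if not d.startswith(".") and d not in SKIP_DIRS
        )
        for name in sorted(filenames):
            if not name.startswith("."):
                found.append(os.path.join(dirpath, name))
    return found


# Small demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    for folder in ("src", os.path.join("venv", "lib"), ".git"):
        os.makedirs(os.path.join(root, folder))
    for rel in ("main.py", ".env", "src/utils.py", "venv/lib/site.py", ".git/config"):
        with open(os.path.join(root, *rel.split("/")), "w") as handle:
            handle.write("")
    relative = [
        os.path.relpath(path, root).replace(os.sep, "/")
        for path in discover_files(root)
    ]

print(relative)  # ['main.py', 'src/utils.py']
```

Pruning `dirnames` in place (rather than filtering the results afterwards) means the walk never even enters folders like venv, which matters when they contain thousands of files.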
Git Integration: This was a fun part. I used Python's subprocess module to run actual git commands like git rev-parse HEAD. This way, the tool can grab the latest commit hash, branch name, author, and date. I also wrapped it in a try...except block so it wouldn't crash if you ran it on a folder that wasn't a git repository. It just gracefully says "Not a git repository."
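Here's a rough sketch of that pattern. Only git rev-parse HEAD comes from the description above; the other git commands, the helper names `run_git`, `git_info`, and `describe_git`, and the output layout are assumptions chosen for illustration.

```python
import subprocess


def run_git(args, cwd):
    """Run a git command in cwd and return its trimmed stdout."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def git_info(repo_path):
    """Return a dict of commit details, or None if they can't be read."""
    try:
        return {
            "commit": run_git(["rev-parse", "HEAD"], repo_path),
            "branch": run_git(["rev-parse", "--abbrev-ref", "HEAD"], repo_path),
            "author": run_git(["log", "-1", "--format=%an <%ae>"], repo_path),
            "date": run_git(["log", "-1", "--format=%ad"], repo_path),
        }
    except (subprocess.CalledProcessError, OSError):
        # CalledProcessError: git exited non-zero (for example, not a repository).
        # OSError: git isn't installed, or the folder doesn't exist.
        return None


def describe_git(repo_path):
    info = git_info(repo_path)
    if info is None:
        return "Not a git repository"
    return "\n".join(f"{key}: {value}" for key, value in info.items())


print(describe_git("."))
```

Using `check=True` turns a failing git command into an exception, so a single try...except covers every lookup instead of checking return codes one by one.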
Output Formatting: I wanted the output to be as readable as possible for both humans and AIs. I spent a good bit of time creating a nice tree structure to visualize the directory layout. It's a small touch, but seeing that ├── and └── tree makes a huge difference. For the code itself, I used the pygments library to guess the language of each file, which let me add syntax highlighting hints to the markdown code blocks.
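For the tree part, one way to do it is to fold the file paths into a nested dict and render it recursively. This is a sketch under that assumption, not necessarily how the tool does it; `build_tree`, `render_tree`, and the file `utils/format.js` are made up for the example.

```python
def build_tree(paths):
    """Fold 'a/b/c' style paths into nested dicts (files become empty dicts)."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("/"):
            node = node.setdefault(part, {})
    return tree


def render_tree(tree, prefix=""):
    """Return the tree as a list of lines using box-drawing connectors."""
    lines = []
    names = sorted(tree)
    for index, name in enumerate(names):
        last = index == len(names) - 1
        lines.append(prefix + ("└── " if last else "├── ") + name)
        # Children of the last entry get blank padding; others keep the rail.
        child_prefix = prefix + ("    " if last else "│   ")
        lines.extend(render_tree(tree[name], child_prefix))
    return lines


lines = render_tree(build_tree(["main.js", "utils/helper.js", "utils/format.js"]))
print("\n".join(lines))
# ├── main.js
# └── utils
#     ├── format.js
#     └── helper.js
```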
For the optional features, I chose two that I knew I'd personally use all the time:
Output to File (-o or --output): While printing to the console is the default, sometimes you just want to save that context to a file to use later. This feature makes the tool way more flexible.
Token Counting (--tokens): As someone interested in ML, I'm always aware of the context window limits of LLMs. This flag gives a rough estimate of the token count. It's super handy to know if your context is going to fit before you even try to paste it.
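Both optional features are small enough to sketch in a few lines. This is an illustration, not the tool's code: the function names are hypothetical, and the "about 4 characters per token" ratio is an assumed rule of thumb for English text, so real tokenizers will give different numbers.

```python
import sys

CHARS_PER_TOKEN = 4  # assumption: rough average for English text


def estimate_tokens(text):
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN


def write_output(text, output_path=None):
    """Write to output_path if given (like -o/--output), else to stdout."""
    if output_path:
        with open(output_path, "w", encoding="utf-8") as handle:
            handle.write(text)
    else:
        sys.stdout.write(text)


context = "def add(a, b):\n    return a + b\n"
write_output(context)
print(f"Estimated tokens: {estimate_tokens(context)}", file=sys.stderr)
```

Printing the estimate to stderr keeps it out of the packaged context when the output is piped or redirected.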
This project was a great experience. It really emphasized how important the little details are. It's not just about making the code work; it's about making it work in a way that's predictable and helpful for the user. I'm pretty proud of what I built. It's a simple tool, but it solves a real, annoying problem I deal with every day.