Inderpreet Singh Parmar

My First Open Source Contribution: Adding Token Usage Feature to a CLI Project

Introduction

In this blog post, I will share my experience contributing to an open-source project for the DPS909 Topics in Open Source Development course at Seneca College. This was my first time contributing to a project I did not own, and it was both challenging and exciting. The goal was to enhance an existing project by adding a new feature: Token Usage Information.

The feature allows users to track the number of tokens used when interacting with Large Language Models (LLMs), a key metric for understanding and optimizing API usage.

Step 1: Selecting a Project

The first task was to pick a project to contribute to. I selected a fellow student's CLI-based project designed to interact with an LLM API. The program already had core functionality such as formatting and summarizing text, but it lacked a way to report how many tokens were being used in the process.

Once I reviewed the project’s code, I decided to file an issue to discuss the token usage feature I wanted to add. After discussing with the project owner, I proceeded to fork the repository and began working on my local machine.

Step 2: Implementing the Token Usage Feature

The task required me to add a new command-line flag --token-usage or -t. When the program is run with this flag, it should output the number of tokens used in the prompt and completion phases of the LLM interaction.

Modifying the CLI for Flag Parsing

The first step was to modify the cli.cpp file so it could recognize the --token-usage (or -t) flag. Here's how I did it:

else if (arg == "--token-usage" || arg == "-t") {
    showTokenUsage = true;  // Set the token usage flag to true if this option is passed
}

With this in place, the program recognizes when the flag is passed and sets showTokenUsage, which later triggers the token usage output.
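
To show the same idea in isolation, here is a small, self-contained sketch of an argument loop that sets the flag. This is my own minimal illustration of the parsing pattern, not the project's actual main function:

#include <iostream>
#include <string>

int main(int argc, char* argv[]) {
    bool showTokenUsage = false;

    // Walk the command-line arguments and look for the token usage flag
    for (int i = 1; i < argc; ++i) {
        std::string arg = argv[i];
        if (arg == "--token-usage" || arg == "-t") {
            showTokenUsage = true;
        }
        // ... other flags would be handled here ...
    }

    if (showTokenUsage) {
        std::cout << "Token usage reporting enabled\n";
    }
    return 0;
}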

Extracting Token Information

Next, I had to modify the code that interacts with the LLM API. This involved updating the eng_format.cpp file to parse the token usage information from the API response.

// Assumes the nlohmann::json library for JSON parsing
#include <nlohmann/json.hpp>
using json = nlohmann::json;

std::string eng_format::get_token_info(const std::string& response) {
    auto json_response = json::parse(response);

    // The API reports token counts inside a "usage" object
    if (json_response.contains("usage")) {
        int prompt_tokens = json_response["usage"]["prompt_tokens"];
        int completion_tokens = json_response["usage"]["completion_tokens"];
        int total_tokens = json_response["usage"]["total_tokens"];

        return "Token usage:\nPrompt tokens: " + std::to_string(prompt_tokens) +
               "\nCompletion tokens: " + std::to_string(completion_tokens) +
               "\nTotal tokens: " + std::to_string(total_tokens);
    } else {
        return "Token usage information not found in response.";
    }
}

This method extracts the number of tokens used in the prompt and completion phases from the JSON response returned by the API.
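
To make this concrete, here is a small standalone program showing the kind of usage object the API returns and the same parsing pattern. The response below is a trimmed-down sample I wrote for illustration; real responses also contain the model name, choices, and other fields:

#include <iostream>
#include <string>
#include <nlohmann/json.hpp>  // assumes the nlohmann::json library

using json = nlohmann::json;

int main() {
    // A trimmed-down sample of an LLM API response containing a "usage" object
    std::string response = R"({
        "usage": {
            "prompt_tokens": 57,
            "completion_tokens": 17,
            "total_tokens": 74
        }
    })";

    auto json_response = json::parse(response);
    if (json_response.contains("usage")) {
        std::cout << "Prompt tokens: " << json_response["usage"]["prompt_tokens"] << "\n"
                  << "Completion tokens: " << json_response["usage"]["completion_tokens"] << "\n"
                  << "Total tokens: " << json_response["usage"]["total_tokens"] << "\n";
    }
    return 0;
}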

Testing the Feature

After writing the code, I tested it thoroughly to ensure it worked as expected. Running the CLI with the -t flag successfully displayed the token usage information:

$ ./cli_app --token-usage
Token usage:
Prompt tokens: 57
Completion tokens: 17
Total tokens: 74

This was a big win for me, as it confirmed that the feature worked correctly.

Step 3: Filing a Pull Request

Once I was satisfied with the implementation, I submitted a pull request. I made sure to provide a detailed description of the changes I made, linking the pull request to the issue I had filed earlier.

Pull Request Description:

  • Title: Add support for --token-usage/-t flag
  • Description: This PR adds the ability to display token usage information when interacting with the LLM. The --token-usage or -t flag can be passed via the command line, and the program will output the number of prompt tokens, completion tokens, and total tokens.

Step 4: Receiving Feedback

After submitting the pull request, the project owner reviewed my code and provided feedback. They asked me to make some small changes to improve the clarity of my error handling and suggested adding comments to explain some of the logic I wrote.

I addressed their feedback by updating my code and pushing the changes. After a few iterations of back-and-forth review, my pull request was approved and merged into the project’s main branch!
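
For this feature, error handling mostly means coping with a response that is missing the usage object or is not valid JSON. The sketch below illustrates the kind of guard that addresses this; the function name get_token_info_safe and the exact wording are mine, not the code that was actually merged:

#include <string>
#include <nlohmann/json.hpp>  // assumes the nlohmann::json library

using json = nlohmann::json;

// Illustrative only: a defensive version of the extraction logic that
// reports parse failures instead of letting json::parse throw.
std::string get_token_info_safe(const std::string& response) {
    try {
        auto json_response = json::parse(response);
        if (!json_response.contains("usage")) {
            return "Token usage information not found in response.";
        }
        int total_tokens = json_response["usage"]["total_tokens"];
        return "Total tokens: " + std::to_string(total_tokens);
    } catch (const json::parse_error& e) {
        return std::string("Could not parse API response: ") + e.what();
    }
}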

Step 5: Reflection

This experience taught me a lot about open-source contributions. I learned how to read and modify someone else’s code without breaking the existing functionality. The process of filing issues, creating branches, and submitting pull requests gave me practical experience in real-world collaborative development.

Lessons Learned:

  • Reading Code: Understanding the existing codebase was the hardest part, but it was crucial for making minimal, non-breaking changes.
  • Communication: Discussing my approach with the project owner helped me align my work with their expectations.
  • Testing: Testing your changes thoroughly is essential. Ensuring that nothing breaks is just as important as adding new features.

Conclusion

Contributing to open source was both challenging and rewarding. I gained valuable experience collaborating on GitHub and working with APIs and LLMs. I look forward to contributing more to open-source projects in the future!
