DEV Community

Uday Rana
Refactoring codeshift

This week I spent some time refactoring my project codeshift. I've been meaning to do this for a while but I've been holding back from working on it on my own time so that I have stuff to do when I need to work on it for an assignment.

uday-rana / codeshift

A command-line tool that translates source code files into a chosen programming language.

codeshift

Codeshift is a command-line tool to translate and transform source code files between programming languages.

codeshift tool demo: translating an express.js server to rust

Features

  • Select output language to convert source code into
  • Support for multiple input files
  • Output results to a file or stream directly to stdout
  • Customize model and provider selection for optimal performance
  • Supports leading AI providers

Requirements

  • Node.js (Requires Node.js 20.17.0+)
  • An API key from any of the following providers:
    • OpenAI
    • OpenRouter
    • Groq
    • any other AI provider compatible with OpenAI's chat completions API endpoint

Installation

  • Clone the repository with Git:

    git clone https://github.com/uday-rana/codeshift.git
    • Alternatively, download the repository as a .zip from the GitHub page and extract it
  • In the repository's root directory (where package.json is located), run npm install:

    cd codeshift/
    npm install
  • To be able to run the program without prefixing node, run npm install -g . or npm link within the project directory:

    npm install -g .

The first thing I wanted to do was split my gigantic index.js file into smaller modules. I'd been wanting to do this for some time because it was getting hard to find the different sections of code in a single source file. It took a while because I had to test my program after splitting off each section, but once I was done I felt like a huge weight had been lifted off my shoulders: my program was so much easier to understand. I can't overstate how much of a difference this made. Isolating the program into separate logical components lets you easily identify and work on the part of the logic you need to, without having to worry about the rest of the program.

I also took the opportunity to clean up the logic for assigning a default model, and added a default provider for when the user doesn't provide a base URL. (I just realized that would only work if they have an API key for that specific provider, so I think I'm going to roll that change back.)
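To make the default-model idea concrete, here's a rough sketch of what that kind of lookup could look like. This is not codeshift's actual code: the function name `resolveModel`, the `DEFAULT_MODELS` table, and the model names in it are all hypothetical placeholders, and I'm assuming the provider can be recognized from the base URL's hostname. It also refuses to guess a provider when no base URL is given, which is one way to avoid the API key mismatch mentioned above.

```javascript
// Hypothetical sketch: none of these names or values come from codeshift itself.
// Maps a provider hostname to a placeholder default model name.
const DEFAULT_MODELS = new Map([
  ["api.openai.com", "placeholder-openai-model"],
  ["openrouter.ai", "placeholder-openrouter-model"],
  ["api.groq.com", "placeholder-groq-model"],
]);

/**
 * Picks the model to use: the user's explicit choice wins, otherwise the
 * default for the provider named in the base URL.
 */
function resolveModel(baseUrl, requestedModel) {
  if (requestedModel) return requestedModel;

  // Without a base URL we can't know which provider the API key belongs to,
  // so fail loudly instead of guessing a provider.
  if (!baseUrl) {
    throw new Error("a base URL is required when no model is specified");
  }

  const { hostname } = new URL(baseUrl);
  const model = DEFAULT_MODELS.get(hostname);
  if (!model) {
    throw new Error(`no default model known for ${hostname}; pass one explicitly`);
  }
  return model;
}

console.log(resolveModel("https://api.groq.com/openai/v1"));
// prints "placeholder-groq-model"
```

Keeping the defaults in one table means adding a provider is a one-line change instead of another branch in an if/else chain.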

Another big chunk of my effort went towards the completion output logic, which had a lot of duplicate code. I had separate loops for different conditions because I thought placing a condition inside the loop would be inefficient, but that made the code a mess to read and maintain. A big part of efficient code is human efficiency: how easy it is to work with. A few extra CPU cycles for some conditional checks in a loop won't slow a computer down much, but having to work with 4 separate for loops that all do nearly the same thing will definitely slow down a human. I decided I'd rather prioritize maintainability, so I coalesced them into a single loop and extracted it into a function.

Before:

      try {
        // Write to either output file or stdout
        if (outputFilePath) {
          let response = "";
          // Read response stream chunk by chunk
          for await (const chunk of completion) {
            // Concatenate chunk to response
            response += chunk.choices[0]?.delta?.content || "";
            if (chunk?.usage) {
              promptTokensUsed = chunk.usage.prompt_tokens;
              completionTokensUsed = chunk.usage.completion_tokens;
              totalTokensUsed = chunk.usage.total_tokens;
            }
            if (chunk?.x_groq?.usage) {
              promptTokensUsed = chunk.x_groq.usage.prompt_tokens;
              completionTokensUsed = chunk.x_groq.usage.completion_tokens;
              totalTokensUsed = chunk.x_groq.usage.total_tokens;
            }
          }
          fs.writeFile(outputFilePath, `${response}`);
        } else {
          // Read response stream chunk by chunk
          for await (const chunk of completion) {
            // Write chunk to stdout
            process.stdout.write(chunk.choices[0]?.delta?.content || "");
            if (chunk?.usage) {
              promptTokensUsed = chunk.usage.prompt_tokens;
              completionTokensUsed = chunk.usage.completion_tokens;
              totalTokensUsed = chunk.usage.total_tokens;
            }
            if (chunk?.x_groq?.usage) {
              promptTokensUsed = chunk.x_groq.usage.prompt_tokens;
              completionTokensUsed = chunk.x_groq.usage.completion_tokens;
              totalTokensUsed = chunk.x_groq.usage.total_tokens;
            }
          }

After:

/**
 * Processes the completion response stream and writes the content to the output destination.
 *
 * This function iterates over the completion stream, writing each chunk to the specified
 * output file or stdout. If token usage tracking is requested, it accumulates token usage
 * statistics from the response chunks.
 *
 * @async
 * @function processCompletionStream
 * @param {AsyncIterable<Object>} completion - The stream of completion response chunks from the API.
 * @param {string} [outputFilePath] - Path to the output file. If not provided, writes to stdout.
 * @param {boolean} tokenUsageRequested - Whether to track and accumulate token usage statistics.
 * @param {Object} tokenUsage - An object to accumulate token usage data.
 * @param {number} tokenUsage.prompt_tokens - Number of tokens used in the prompt.
 * @param {number} tokenUsage.completion_tokens - Number of tokens generated in the completion.
 * @param {number} tokenUsage.total_tokens - Total number of tokens used.
 * @throws Will terminate the process with exit code 23 if an error occurs while reading the response stream.
 */
async function processCompletionStream(
  completion,
  outputFilePath,
  tokenUsageRequested,
  tokenUsage,
) {
  const writeFunction = outputFilePath
    ? async (completionChunk) =>
        await fs.appendFile(
          outputFilePath,
          completionChunk.choices[0]?.delta?.content || "",
        )
    : (completionChunk) => {
        process.stdout.write(completionChunk.choices[0]?.delta?.content || "");
      };

  try {
    for await (const chunk of completion) {
      await writeFunction(chunk);

      if (tokenUsageRequested) {
        const usage = chunk?.x_groq?.usage ?? chunk?.usage;

        if (usage) {
          tokenUsage.prompt_tokens += usage.prompt_tokens || 0;
          tokenUsage.completion_tokens += usage.completion_tokens || 0;
          tokenUsage.total_tokens += usage.total_tokens || 0;
        }
      }
    }

    if (outputFilePath) {
      await fs.appendFile(outputFilePath, "\n");
    } else {
      process.stdout.write("\n");
    }
  } catch (error) {
    console.error(`error reading response stream: ${error}`);
    process.exit(23);
  }
}
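A nice side effect of pulling the loop into its own function is that it can be exercised without calling a real API. Below is a minimal sketch of that idea, under some assumptions: `collectStream`, `fakeCompletion`, and `demo` are hypothetical names that don't exist in codeshift, the write destination is passed in as a callback (a variation on the function above, not what it actually does), and the fake chunks only mimic the fields the loop reads.

```javascript
// Sketch only: `collectStream`, `fakeCompletion`, and `demo` are hypothetical
// names, and the chunk shapes only mimic the fields read by the loop.

// Simplified variant of the streaming loop: the write destination is injected
// as a callback so the loop can run without touching files or stdout.
async function collectStream(completion, write, tokenUsage) {
  for await (const chunk of completion) {
    await write(chunk.choices[0]?.delta?.content || "");

    // Same lookup as the refactored function: prefer x_groq.usage,
    // fall back to usage.
    const usage = chunk?.x_groq?.usage ?? chunk?.usage;
    if (usage) {
      tokenUsage.prompt_tokens += usage.prompt_tokens || 0;
      tokenUsage.completion_tokens += usage.completion_tokens || 0;
      tokenUsage.total_tokens += usage.total_tokens || 0;
    }
  }
}

// Stand-in for the API stream: two content chunks, then a usage-only chunk.
async function* fakeCompletion() {
  yield { choices: [{ delta: { content: "fn main() " } }] };
  yield { choices: [{ delta: { content: "{}" } }] };
  yield {
    choices: [],
    usage: { prompt_tokens: 12, completion_tokens: 5, total_tokens: 17 },
  };
}

// Runs the loop against the fake stream and collects the text in memory.
async function demo() {
  let output = "";
  const tokenUsage = { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 };
  await collectStream(fakeCompletion(), (text) => { output += text; }, tokenUsage);
  return { output, tokenUsage };
}

demo().then(({ output, tokenUsage }) => {
  console.log(output); // prints "fn main() {}"
  console.log(tokenUsage.total_tokens); // prints 17
});
```

Because the fake stream is just an async generator, the same `for await` loop works on it unchanged.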

At one point during refactoring I broke my program by moving the program variable definition into another file and importing it in my start file. I didn't look into it too much, but the error said program.action() (which is the method used to run the program) was undefined, so I assume I made a mistake when exporting. Either way, it wasn't a lot of logic, so I was fine leaving it in the start file.

After refactoring my code I was asked to squash all of my commits together. I've been squashing commits for a little while now, so I know what to expect, what to do, and especially what not to do. It went pretty smoothly: I squashed my commits, rebased my refactoring branch on main, and merged it into main, which led to a clean fast-forward merge (and a giant commit message).

the giant commit message

I think having this level of control over the git history is awesome. It lets you clean up your commits and makes the history so much easier to understand. And what's incredible about Git is that even if you royally screw up, it acts as this safety net so you never lose your work, so you can play around with rebasing and squashing and get used to how they work without having to worry.
