Roman Kiprin

Code review with private LLM? In pipeline? Simple!

When you push code to GitLab and plan to merge it into the main branch, having your changes reviewed is essential. This review process helps ensure that you don't introduce poorly written code or create new security vulnerabilities.

Imagine receiving an immediate code analysis that provides valuable insights into your code quality and potential issues, and looks like this:

### Safety Index: 6/10

The code changes introduce significant functionality improvements but also present several areas of concern that affect maintainability and potential security.

### Detailed Suggestions

**1: Memory leak risk in conversation history management in app/main.py lines 44-69. The in-memor

.....

Moreover, this solution is cost-effective and compatible with various technologies. Rather than utilizing third-party LLM services that may train their models on your data, you can deploy your own private Large Language Model.

While this may seem challenging, such solutions are not only feasible in today's AI landscape, but are also remarkably affordable. I'll demonstrate how this can be accomplished.

GitLab for code, Azure for LLM, opencode for managing code review

Allow me to introduce the infrastructure.

GitLab.com - hosts the project and manages the repository and pipelines.
opencode - one of the best (and most 'open') agentic command-line coding utilities.
Microsoft Foundry - a SaaS/PaaS solution for running LLMs privately (and a lot of other things).

A project to bring it all together. It is an LLM chat app, of course.

To demonstrate the setup in a near-production environment, I developed an AI-driven application. This application serves primarily as a sandbox for testing various technologies, with no intended practical use beyond that purpose.

chat.roki.cool app

You can find the application at: chat.roki.cool. Please note that hosting costs approximately $30 per month, so it may be unavailable at the time you're viewing it.

Don't worry! While the application is entertaining, it's not the main focus of this article.

The schema

Below is the diagram of the complete project. However, this article focuses specifically on the portion related to implementing LLMs in GitLab pipelines.

Therefore, we'll be examining only the relevant section of the diagram.

Pipeline diagram

Let me start with the model, OK?

The gpt-oss-120b large language model

Why gpt-oss-120b? To be honest, you can choose whatever LLM you like from the list that Microsoft provides. There is a huge variety: models by OpenAI, plus top models provided by partners like DeepSeek, Grok, Kimi, Ollama...

And more partners, like Anthropic with (arguably) the best models: Claude Sonnet, Claude Opus (including 4.5!) and more! If that is not enough, you can deploy models from Hugging Face! However, I did not try that.

For this implementation, I chose gpt-oss-120b primarily because of its cost-effectiveness. It is about 10-30 times cheaper than leading models from premium vendors.

While I wouldn't recommend gpt-oss-120b for code generation tasks, it excels at code analysis, making it an ideal choice for this specific use case. The decision was straightforward given these considerations.

By all means, if you are interested in investing more, go ahead - the technology stays the same.

Microsoft Foundry

To be honest, Microsoft Foundry is huge. Just look at its documentation. And Microsoft is trying to bring consistency to it with the "new" look and feel.

For our purposes, we'll be using it simply as a hosting environment to run our preferred LLMs. This is similar to how you might use your own local machine with substantial resources - though most of us don't have access to 196GB of memory and multiple GPUs at home, which is exactly why services like Microsoft Foundry are valuable.

To run workloads on Microsoft Foundry, you'll need a standard Azure subscription. Here's what you'll need to set up:

  • Create the Foundry service

Azure Marketplace select Foundry

  • Create a Foundry project within Foundry service

Foundry new project

  • Open the Foundry portal to get full control over the Foundry-provided services

Goto Foundry Portal

  • Get your LLM endpoint and secret via the Foundry home page

Welcome to foundry

or via Azure Portal interface:

Endpoints via Azure Portal
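If you prefer scripting over clicking through the portal, the underlying Azure resource can also be created from the Azure CLI. This is only a rough sketch, assuming the Foundry service is backed by a Cognitive Services account of kind AIServices; the resource group, service name, and region below are placeholders, and the Foundry project itself is still easiest to create in the portal as shown above.

```bash
# Hypothetical CLI equivalent of the portal steps above; all names are placeholders.
az group create --name rg-foundry-demo --location westeurope

az cognitiveservices account create \
  --name my-foundry-service \
  --resource-group rg-foundry-demo \
  --kind AIServices \
  --sku S0 \
  --location westeurope
```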

... and you are good to go. OK, almost good to go. Let me save you a couple of hours on your research.

The LLM endpoint and secret

That is, probably, the trickiest part of the story. Microsoft provides different endpoints for various LLMs, and you'll need to locate the specific URI for your chosen model:

LLM endpoint

However, I found that opencode does not work with this endpoint. (At least, I failed to make it work.) So I did my testing and discovered that:

  • most of the models are available via different endpoints simultaneously
  • opencode works with any "openai-compatible" endpoint

That means your LLM endpoint should look like this:

https://<name_of_your_ms_foundry_service>.services.ai.azure.com/models
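Before wiring the endpoint into the pipeline, it is worth a quick smoke test from your own terminal. The sketch below assumes the Azure AI model-inference chat-completions route and an api-version query parameter; the exact api-version (and whether authentication uses an api-key header or a bearer token) may differ for your deployment, so double-check it in the Foundry portal.

```bash
# Hypothetical smoke test of the "openai-compatible" endpoint; adjust api-version and model name.
curl --silent --show-error --request POST \
  --url "$LLM_ENDPOINT/chat/completions?api-version=2024-05-01-preview" \
  --header "api-key: $LLM_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "gpt-oss-120b",
    "messages": [{"role": "user", "content": "Reply with the single word: pong"}]
  }' | jq -r '.choices[0].message.content'
```

If this returns a sensible completion, the same LLM_ENDPOINT and LLM_KEY values will work for opencode.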

So, you have to set up LLM_ENDPOINT and LLM_KEY as CI/CD variables in your project:

GitLab CICD Variables

We will use them in the GitLab pipeline later on!
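If you prefer the command line to the web UI, recent versions of glab can set these variables too. A minimal sketch, assuming your local glab is authenticated against the project and provides the variable set subcommand; the values are placeholders:

```bash
# Hypothetical one-time setup; run locally, not in the pipeline.
glab variable set LLM_ENDPOINT "https://<name_of_your_ms_foundry_service>.services.ai.azure.com/models"
glab variable set LLM_KEY "<your-foundry-api-key>" --masked
```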

The GitLab pipeline

I won't walk through the whole thing. The entire pipeline definition is available in my GitLab repository.

Here I will show you only the job related to the opencode setup and execution.


validate_job:
  stage: validate
  image: node:22-slim
  script:
    - echo "Installing opencode"
    - npm install --global opencode-ai
    - echo "Installing glab"
    - export GITLAB_TOKEN=$GITLAB_TOKEN_OPENCODE
    - apt-get update --quiet && apt-get install --yes curl wget gpg git && rm --recursive --force /var/lib/apt/lists/*
    - curl --silent --show-error --location "https://raw.githubusercontent.com/upciti/wakemeops/main/assets/install_repository" | bash
    - apt-get install --yes glab jq
    - echo "Configuring glab"
    - git config --global user.email "opencode@gitlab.com"
    - git config --global user.name "Opencode"
    - GITLAB_HOST=https://gitlab.com
    - GITLAB_TOKEN=$CI_JOB_TOKEN
    - echo $GITLAB_HOST
    - echo "Creating opencode auth configuration"
    - mkdir --parents ~/.local/share/opencode
    - |
      cat > ~/.local/share/opencode/opencode.json << EOF
      {
        "$schema": "https://opencode.ai/config.json",
        "model": "back/gpt-oss-120b",
        "provider": {
          "back": {
            "npm": "@ai-sdk/openai-compatible",
            "options": {
              "baseURL": "$LLM_ENDPOINT",
              "apiKey": "$LLM_KEY"
            },
            "models": {
              "gpt-oss-120b": {
                "name": "gpt-oss-120b",
                "tool_call": true,
                "reasoning": true,
                "limit": {
                  "context": 128000,
                  "output": 128000
                },
                "options": {
                  "tools": true,
                  "reasoning": true
                }
              }
            }
          }
        }
      }
      EOF
    - echo "Running Opencode"
    - opencode -version
    - PROMPT1=$(cat ./gitlab-ci.prompt1.md)
    - echo -e "\n \n"
    - opencode run "$PROMPT1"
    - CURRENT_SAFETY_SCORE=$(cat ./smell_analysis.json | jq .safety_score)
    - echo "Safety score $CURRENT_SAFETY_SCORE. It is supposed to be better than $SAFETY_SCORE to continute."
    - |
      if [ $CURRENT_SAFETY_SCORE -le $SAFETY_SCORE ]; then
         echo -e "The safety score of the changes is lower than our standards allow. This is a blocker. \n \n \033[31mYou shall not pass!!!\033[0m  \n \n"
         exit 1
      fi


I'll outline and explain each component of this process.

First, we install opencode into a Docker image and set up the default Git configuration. This enables the LLM that opencode communicates with to access the Git repository, should the AI determine it's necessary.

    - echo "Installing opencode"
    - npm install --global opencode-ai
    - echo "Installing glab"
    - export GITLAB_TOKEN=$GITLAB_TOKEN_OPENCODE
    - apt-get update --quiet && apt-get install --yes curl wget gpg git && rm --recursive --force /var/lib/apt/lists/*
    - curl --silent --show-error --location "https://raw.githubusercontent.com/upciti/wakemeops/main/assets/install_repository" | bash
    - apt-get install --yes glab jq
    - echo "Configuring glab"
    - git config --global user.email "opencode@gitlab.com"
    - git config --global user.name "Opencode"
    - GITLAB_HOST=https://gitlab.com
    - GITLAB_TOKEN=$CI_JOB_TOKEN
    - echo $GITLAB_HOST
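One optional hardening idea for this install block: pin opencode to a release you have tested, so that a new version cannot silently change the review behaviour between pipeline runs. The version number below is purely a placeholder:

```bash
# Pin opencode to a tested release instead of always pulling the latest one.
# The version number is a placeholder, not a recommendation.
npm install --global opencode-ai@0.6.3
```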

I'll explain how we dynamically configure opencode. As I mentioned earlier, you need to provide the LLM_ENDPOINT and LLM_KEY variables for the pipeline to function properly. This job therefore requires both CI/CD variables to be properly initialized.

The "model" property specifies the default model for opencode. It's composed of the provider name ("back" in this case) and the specific LLM name ("gpt-oss-120b").

You can customize additional parameters, such as allowing reasoning or enabling tool use. Here is the link to the official opencode config documentation.

    - echo "Creating opencode auth configuration"
    - mkdir --parents ~/.local/share/opencode
    - |
      cat > ~/.local/share/opencode/opencode.json << EOF
      {
        "$schema": "https://opencode.ai/config.json",
        "model": "back/gpt-oss-120b",
        "provider": {
          "back": {
            "npm": "@ai-sdk/openai-compatible",
            "options": {
              "baseURL": "$LLM_ENDPOINT",
              "apiKey": "$LLM_KEY"
            },
            "models": {
              "gpt-oss-120b": {
                "name": "gpt-oss-120b",
                "tool_call": true,
                "reasoning": true,
                "limit": {
                  "context": 128000,
                  "output": 128000
                },
                "options": {
                  "tools": true,
                  "reasoning": true
                }
              }
            }
          }
        }
      }
      EOF
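Because the heredoc relies on shell expansion, an unset or misspelled CI/CD variable silently produces a config with an empty baseURL or apiKey, and the failure only shows up later when opencode tries to call the model. A tiny sanity check right after the heredoc makes that mistake obvious; this is just a sketch that reuses the jq already installed in the job:

```bash
# Fail fast if the generated config is not valid JSON or the endpoint did not expand.
jq empty ~/.local/share/opencode/opencode.json
test -n "$(jq -r '.provider.back.options.baseURL // empty' ~/.local/share/opencode/opencode.json)" \
  || { echo "LLM_ENDPOINT was not expanded into the opencode config"; exit 1; }
```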

In this step, we execute a prompt from the gitlab-ci.prompt1.md file using opencode. More about the prompt in the following section.

    - echo "Running Opencode"
    - opencode -version
    - PROMPT1=$(cat ./gitlab-ci.prompt1.md)
    - echo -e "\n \n"
    - opencode run "$PROMPT1"

Next, we analyze the ./smell_analysis.json file and determine whether we should proceed or if the 'safety_score' is too low to allow the code to be merged.

    - CURRENT_SAFETY_SCORE=$(cat ./smell_analysis.json | jq .safety_score)
    - echo "Safety score $CURRENT_SAFETY_SCORE. It is supposed to be better than $SAFETY_SCORE to continute."
    - |
      if [ $CURRENT_SAFETY_SCORE -le $SAFETY_SCORE ]; then
         echo -e "The safety score of the changes is lower than our standards allow. This is a blocker. \n \n \033[31mYou shall not pass!!!\033[0m  \n \n"
         exit 1
      fi
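The gate only reads safety_score, but the same file also carries the suggestions array (see the JSON format in the prompt below), so you can surface those in the job log as well. A small optional addition:

```bash
# Optional: echo the individual suggestions into the job log for quick reading.
jq -r '.suggestions[]' ./smell_analysis.json
```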

The final stage of this pipeline job requires the ./smell_analysis.json file to be present. This file should be generated by opencode, but the question is: how does that happen?

Here comes...

The prompt

The key lies in the prompt. This is what opencode executes to perform code analysis and generate recommendations, including a 'safety_score' that's formatted for easy parsing with jq.

# Code Review Analysis Prompt for opencode

## Objective
You are an AI assistant tasked with evaluating code changes in a GitLab environment to identify code quality issues and assess their impact on the codebase.

## Primary Tasks

### 1. Code Difference Analysis
Analyze the output of `git diff main` to identify:
- Code quality issues and "code smells"
- Potential problems that changes could introduce
- Risks to existing workflows or data flows
- Complications for future codebase maintenance

### 2. Safety Index Calculation
Compute a **safety index** on a scale of **0 to 10**:
- **10**: Exemplary code with no identifiable issues
- **7-9**: Good code with minor improvements needed
- **4-6**: Acceptable code with moderate concerns
- **1-3**: Poor code with significant issues
- **0**: Critical problems, violations, or data risks present

### 3. Actionable Recommendations
Generate a comprehensive list of specific, actionable suggestions to improve the safety score.

## Output Requirements

### Documentation Quality
- Provide **thoughtful, detailed explanations** for every suggestion
- Each suggestion must include:
  - **At least 3 sentences** of explanation
  - **Specific file references** whenever possible
  - **Line number citations** where applicable
  - Clear reasoning for why the change matters


### Screen Output

Print the analysis results to the standard output using Markdown text formatting. Make the text easily readable in a log file. Add two empty lines before and after the analysis output.

### File Output
Save the analysis results to a local file named `smell_analysis.json` in the following JSON format:

{
  "safety_score": 5,
  "suggestions": [
    "1: Remove all instances of hardcoded configuration values in src/config.py lines 15-23. These values should be moved to environment variables or a configuration file to improve security and deployment flexibility. Hardcoded values make it difficult to maintain different configurations across environments and pose security risks.",
    "2: Ensure that all new dependencies in requirements.txt are properly versioned and pinned to specific versions. Using unpinned or loosely versioned dependencies can lead to unexpected behavior when packages are updated. This is particularly important for the newly added 'requests' and 'pandas' libraries.",
    "3: Add comprehensive unit tests for all new code in the user_service.py module lines 45-120. The new authentication logic introduces critical security functionality that must be thoroughly tested. Include tests for edge cases, error handling, and security scenarios to prevent vulnerabilities."
  ]
}
## Important Notes
- Focus on substantive issues that affect code quality, security, maintainability, or performance
- Prioritize suggestions by impact (most critical first)
- Be specific and constructive in all feedback
- Reference concrete examples from the diff whenever possible
- Before executing any bash tool command show the command you are going to execute


The output

That is how it looks when the pipeline fails:

You shall not pass

Here is a real example of the output when no serious issues were found in the code:


### Safety Index: 6/10

The code changes introduce significant functionality improvements but also present several areas of concern that affect maintainability and potential security.

### Detailed Suggestions

**1: Memory leak risk in conversation history management in app/main.py lines 44-69. The in-memory conversation_history dictionary will grow indefinitely as new session IDs are created, potentially consuming all available memory over time. This should be replaced with a proper database or implement session expiration/cleanup mechanisms to prevent memory exhaustion in production environments.**

**2: Insufficient input validation and sanitization in app/main.py lines 118-119. The HTML tag removal using regex `re.sub(r"<[^>]+>", "", message)` is inadequate for preventing XSS attacks and could miss malicious content. This should be replaced with a proper HTML sanitization library like bleach or html-sanitizer that provides comprehensive XSS protection.**

**3: Hardcoded session management limitations in app/main.py lines 167-170. The history limit of 13 messages is arbitrary and may not suit all use cases or model contexts. This should be configurable through environment variables and include proper validation to ensure the limit doesn't exceed the model's context window capabilities.**

*** skipping for clarity *** 

**8: Missing rate limiting per session in app/main.py. The current rate limiting is only IP-based, which could be bypassed by using different IPs or proxy networks. This should be supplemented with session-based rate limiting to prevent abuse even when IP addresses change.**

The most interesting part is that you can copy-paste this analysis into your current coding agent and improve your posture.

The links

http://chat.roki.cool/
https://opencode.ai/docs/config/
https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-models/concepts/models-from-partners?view=foundry
https://gitlab.com/rokicool/test-opencode-agent

PS

Ideas, suggestions, comments? Don't hesitate!

And by no means is this code written in stone. Everything might be rewritten, improved, or removed.
