DEV Community: Manikanta Suru

I Added Claude to Our MR Pipelines. It Now Reviews Every Code Change Before Humans Do.

Manikanta Suru — Tue, 12 May 2026 17:32:42 +0000

Three incidents. Three things that passed human code review and shouldn't have.

A developer pushed an API key into a test repository. It sat committed in the git history for days before anyone noticed.

A Terraform merge request got approved with a security group ingress rule open to 0.0.0.0/0. It went to pre-production.

A GitLab API token ended up hardcoded in a README.md file. The README. The file everyone reads. Merged without a flag.

I'm the sole DevOps engineer at a pre-seed energy startup. I support a 10+ person engineering team across backend, frontend, and Terraform infrastructure. We process 100+ merge requests every month. I can't be in every MR. And even when senior engineers reviewed, things slipped through — especially at the end of the week, especially under deadline pressure, especially on Terraform where most developers aren't fluent.

So I added a Claude AI review stage to our Jenkins MR validation pipeline. Now every merge request gets an automated AI review before any human sees it. The review posts comments on the merge request in GitLab, flags security issues, checks standards, and catches the things tired humans miss.

Here's exactly how it works.

The Architecture: Jenkins + GitLab + Claude

Our setup uses GitLab for source control and Jenkins for CI/CD. When a developer opens or updates a merge request in GitLab, a webhook fires and triggers our Jenkins pipeline. That pipeline is defined in a file called Jenkinsfile-mr-validation, our dedicated MR validation pipeline, and one of its stages runs the Claude AI review.

The flow:

Developer opens/updates MR in GitLab
          │
          ▼
GitLab webhook fires → Jenkins triggered
          │
          ▼
Jenkinsfile-mr-validation runs
          │
    ┌─────┴──────────────────────┐
    │  Stage 1: Lint & Validate  │
    │  Stage 2: Unit Tests       │
    │  Stage 3: Claude AI Review │ ← this is what we're building
    │  Stage 4: Security Scan    │
    └────────────────────────────┘
          │
          ▼
Review script fetches MR diff from GitLab API
          │
          ▼
Claude reviews changed files
          │
          ▼
Comments posted back to the GitLab MR
          │
          ▼
Developer sees AI feedback before human reviewer arrives

The AI review stage never blocks a merge. It's informational: a first pass that runs automatically so human reviewers can focus on what actually needs human judgment.

The Jenkinsfile Stage

Inside Jenkinsfile-mr-validation, the Claude review stage looks like this:

pipeline {
    agent any

    environment {
        ANTHROPIC_API_KEY = credentials('anthropic-api-key')
        GITLAB_TOKEN = credentials('gitlab-api-token')
    }

    stages {

        stage('Lint & Validate') {
            // ... existing stages
        }

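        stage('Claude AI Review') {
            when {
                // The GitLab plugin sets gitlabActionType to 'MERGE' for merge request
                // events (open, update, and so on); the specific action is exposed in
                // gitlabMergeRequestAction, so there is no separate 'UPDATE' value.
                expression { env.gitlabActionType == 'MERGE' }
            }
            steps {
                // catchError keeps the build result SUCCESS and marks only this
                // stage UNSTABLE if the script or the Anthropic API fails.
                catchError(buildResult: 'SUCCESS', stageResult: 'UNSTABLE',
                           message: 'Claude review failed; continuing pipeline') {
                    sh '''
                        pip install anthropic python-gitlab --quiet
                        python3 scripts/claude_mr_review.py \
                            --project-id ${gitlabMergeRequestTargetProjectId} \
                            --mr-iid ${gitlabMergeRequestIid}
                    '''
                }
            }
        }

        stage('Security Scan') {
            // ... existing stages
        }
    }
}

Three important details:

The when condition ensures this stage only runs on merge request events, not on branch pushes or scheduled builds. No point reviewing code that isn't being merged.

The catchError wrapper means the pipeline continues even if the AI review script crashes or the Anthropic API times out: the stage is marked UNSTABLE, the build result stays SUCCESS, and the Security Scan stage still runs. A post { failure } block alone would not achieve this, because it only runs after the stage has already failed, and that failure still stops the pipeline. A failed AI review should never block a deployment.

Credentials are stored in the Jenkins credential store as anthropic-api-key and gitlab-api-token, never hardcoded. Ironically, hardcoded credentials were one of the problems this system was built to catch.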
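The `when` gate only checks that the event is a merge request event. Depending on how the trigger is configured, that can include actions such as closing or merging an MR, where a fresh review adds nothing. A minimal sketch of an extra guard for the start of scripts/claude_mr_review.py, assuming the Jenkins GitLab plugin exports `gitlabActionType` and `gitlabMergeRequestAction` as environment variables; which actions count as reviewable is an assumption, not something the pipeline above specifies:

```python
import os
from typing import Mapping, Optional

# Assumption: these GitLab merge request actions are worth reviewing.
# Actions such as "close", "merge", or "approved" are skipped.
REVIEWABLE_ACTIONS = {"open", "reopen", "update"}


def should_review(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only for MR events that open or change the code under review."""
    env = os.environ if env is None else env
    if env.get("gitlabActionType") != "MERGE":
        return False  # pushes, tag pushes, comments, and other event types
    return env.get("gitlabMergeRequestAction", "") in REVIEWABLE_ACTIONS
```

Calling `if not should_review(): sys.exit(0)` at the top of `main()` exits cleanly, in keeping with the script's rule of never failing the pipeline.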

The Python Review Script

import anthropic
import gitlab
import argparse
import os
import sys

def get_mr_changes(gl_client, project_id, mr_iid):
    """Fetch changed files from GitLab MR."""
    project = gl_client.projects.get(project_id)
    mr = project.mergerequests.get(mr_iid)
    changes = mr.changes()
    return changes['changes'], mr

def detect_file_type(filename):
    """Route to correct review prompt based on file type."""
    if filename.endswith(('.tf', '.tfvars')):
        return 'terraform'
    elif filename.endswith(('.js', '.ts', '.jsx', '.tsx', '.vue')):
        return 'frontend'
    elif filename.endswith(('.py', '.java', '.go', '.rb')):
        return 'backend'
    return 'general'

def build_system_prompt(file_type):
    """Build file-type-aware review prompt."""

    base = """You are a senior software engineer doing a code review.
Analyze the diff and identify real issues only.

Format every finding as:
SEVERITY: [CRITICAL/HIGH/MEDIUM/LOW]
FILE: [filename]
ISSUE: [specific description]
SUGGESTION: [concrete fix]

Rules:
- Maximum 10 findings per review
- Only report genuine issues, no hypotheticals
- Always flag hardcoded credentials, API keys, or tokens as CRITICAL, in any file type
- CRITICAL = security risk or data loss potential
- HIGH = bug or significant quality issue
- MEDIUM = code quality concern
- LOW = style or minor improvement
- If nothing significant is found, reply with exactly: No significant issues found."""

    terraform_checks = """

Terraform-specific checks:
- Security groups with ingress open to 0.0.0.0/0 on any port
- Unencrypted RDS instances, EBS volumes, or S3 buckets
- IAM policies with wildcard actions or resources
- Resources missing required tags (Name, Environment, Owner)
- Sensitive outputs without sensitive=true
- Hardcoded credentials or tokens in any value"""

    frontend_checks = """

Frontend-specific checks:
- User input rendered without sanitization (XSS risk)
- Sensitive data written to localStorage or console
- Hardcoded API endpoints, keys, or tokens
- Missing input validation on form fields"""

    backend_checks = """

Backend-specific checks:
- Hardcoded credentials, API keys, or tokens
- Missing authentication or authorization checks
- SQL queries built with string concatenation
- Sensitive data written to logs
- Missing error handling in async operations"""

    if file_type == 'terraform':
        return base + terraform_checks
    elif file_type == 'frontend':
        return base + frontend_checks
    elif file_type == 'backend':
        return base + backend_checks
    return base

def review_file_with_claude(filename, diff, file_type):
    """Send file diff to Claude for review."""

    # Using Haiku — faster and cheaper than Sonnet or Opus
    # At 100+ MRs/month, cost compounds quickly
    # Haiku handles code review quality well for this use case
    client = anthropic.Anthropic(
        api_key=os.environ['ANTHROPIC_API_KEY']
    )

    # Truncate large diffs — stay within token limits
    if len(diff) > 8000:
        diff = diff[:8000] + "\n\n[diff truncated — file too large]"

    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=1500,
        system=build_system_prompt(file_type),
        messages=[{
            "role": "user",
            "content": f"Review this diff for {filename}:\n\n{diff}"
        }]
    )
    return response.content[0].text

def post_review_comment(mr, review_text, filename):
    """Post review as MR comment in GitLab."""
    comment = f"### 🤖 AI Review — `{filename}`\n\n{review_text}"
    comment += "\n\n*Automated review. Human approval still required.*"
    mr.notes.create({'body': comment})

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--project-id', required=True)
    parser.add_argument('--mr-iid', required=True)
    args = parser.parse_args()

    gl = gitlab.Gitlab(
        url=os.environ.get('GITLAB_URL', 'https://gitlab.com'),
        private_token=os.environ['GITLAB_TOKEN']
    )

    try:
        changes, mr = get_mr_changes(
            gl, args.project_id, args.mr_iid
        )
    except Exception as e:
        print(f"Failed to fetch MR: {e}")
        sys.exit(0)  # Exit 0 — don't fail the pipeline

    reviewed = 0

    for change in changes:
        filename = change.get('new_path', '')
        diff = change.get('diff', '')

        # Skip deleted files and trivial changes
        if change.get('deleted_file') or len(diff) < 100:
            continue

        file_type = detect_file_type(filename)

        # 'general' files (README.md, YAML, config) are reviewed with the
        # base prompt instead of being skipped: tokens leak into docs too.

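        try:
            review = review_file_with_claude(filename, diff, file_type)
        except anthropic.APIError as e:
            # A timeout or API error skips this file instead of aborting the run
            print(f"Claude review failed for {filename}: {e}")
            continue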

        if review and 'no significant' not in review.lower():
            post_review_comment(mr, review, filename)
            reviewed += 1

    if reviewed == 0:
        mr.notes.create({
            'body': "## 🤖 AI Code Review\n\n"
                    "✅ No significant issues found in changed files.\n\n"
                    "*Human review still required before merging.*"
        })

    print(f"AI review complete. Posted findings for {reviewed} files.")

if __name__ == "__main__":
    main()
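The script posts one general note per file with `mr.notes.create`, so the feedback appears in the MR's activity rather than on specific diff lines. GitLab's merge request discussions API can anchor a comment to a line when given a `position` built from the MR's `diff_refs`. A sketch, assuming python-gitlab's `mr.discussions.create` and a file that was not renamed; `added_line_numbers` and `post_inline_comment` are hypothetical helpers:

```python
import re
from typing import Any, Dict, List

# GitLab's per-file "diff" text starts at the first hunk header (no ---/+++ lines).
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_line_numbers(diff: str) -> List[int]:
    """Return the new-file line number of every '+' line in a unified diff."""
    numbers: List[int] = []
    new_line = 0
    for line in diff.splitlines():
        hunk = HUNK_RE.match(line)
        if hunk:
            new_line = int(hunk.group(1))  # first new-file line of this hunk
        elif line.startswith("+"):
            numbers.append(new_line)
            new_line += 1
        elif line.startswith("-") or line.startswith("\\"):
            continue  # removed line, or "\ No newline at end of file"
        else:
            new_line += 1  # context line exists in both versions
    return numbers


def post_inline_comment(mr: Any, path: str, new_line: int, body: str) -> None:
    """Open a diff discussion on one added line (python-gitlab MR object assumed)."""
    refs: Dict[str, str] = mr.diff_refs
    mr.discussions.create({
        "body": body,
        "position": {
            "position_type": "text",
            "base_sha": refs["base_sha"],
            "start_sha": refs["start_sha"],
            "head_sha": refs["head_sha"],
            "old_path": path,  # same as new_path because renames are not handled
            "new_path": path,
            "new_line": new_line,
        },
    })
```

For example, `post_inline_comment(mr, filename, added_line_numbers(diff)[0], review)` would attach a file's review to its first added line, keeping `mr.notes.create` as the fallback when a change adds no lines.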

What Claude Actually Catches

After running this across 100+ MRs per month, the pattern is clear.

The three incident types that originally motivated this have not recurred since deployment. Hardcoded credentials get flagged at CRITICAL severity before any human reviewer sees the MR. Terraform security groups with permissive ingress rules get caught in the diff before they reach production. Tokens in documentation files get flagged immediately.

Beyond those:

Missing error handling in async backend functions
Unvalidated user input in frontend code
Terraform resources missing required tags
Debug logging that prints sensitive data
IAM policies broader than necessary

What it misses:

Claude sees the diff — not the system. It doesn't know why this code was written this way, what the business logic requires, what the rest of the codebase looks like, or whether this change conflicts with something in a file that wasn't modified. Business logic bugs, architectural issues, and context-dependent decisions still need human eyes.

The Prompt Engineering That Actually Mattered

The first version was nearly useless.

Claude posted 30+ comments on every MR. A mix of CRITICAL security findings and extremely minor style suggestions — all formatted identically. Developers started dismissing everything. Which is worse than having no AI review at all.

Three changes fixed it:

Maximum 10 findings. An explicit cap in the system prompt, which applies to each file's review since the script calls Claude once per file. Forces Claude to prioritize. A review with 5 real issues beats one with 25 where 20 are noise. After this change, developers started reading the reviews instead of closing them.

Severity classification. CRITICAL/HIGH/MEDIUM/LOW on every finding. Developers can triage. CRITICAL gets fixed before merge. LOW is optional. This one change turned AI review from background noise into a signal worth acting on.
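Because every finding follows the SEVERITY/FILE/ISSUE/SUGGESTION format from the system prompt, the script could also triage in code instead of trusting the model to respect the 10-finding cap. A minimal sketch, assuming each field sits on one line without markdown decoration (a bolded `**SEVERITY:**` would not match); `Finding` and `parse_findings` are hypothetical names:

```python
import re
from dataclasses import dataclass
from typing import List, Optional

SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}
FIELD_RE = re.compile(r"^(SEVERITY|FILE|ISSUE|SUGGESTION):\s*(.*)$")


@dataclass
class Finding:
    severity: str
    file: str = ""
    issue: str = ""
    suggestion: str = ""


def parse_findings(text: str, limit: int = 10) -> List[Finding]:
    """Parse SEVERITY/FILE/ISSUE/SUGGESTION blocks, sort by severity, cap at limit."""
    findings: List[Finding] = []
    current: Optional[Finding] = None
    for raw in text.splitlines():
        match = FIELD_RE.match(raw.strip())
        if not match:
            continue  # prose, blank lines, and continuation lines are ignored
        key, value = match.group(1), match.group(2).strip()
        if key == "SEVERITY":
            current = Finding(severity=value.strip("[]").upper())
            findings.append(current)
        elif current is not None:
            setattr(current, key.lower(), value)
    known = [f for f in findings if f.severity in SEVERITY_ORDER]
    # sort() is stable, so the model's order is kept within each severity level
    known.sort(key=lambda f: SEVERITY_ORDER[f.severity])
    return known[:limit]
```

Sorting puts the CRITICAL items, the ones that get fixed before merge, at the top of each comment, and the cap holds even if the model overshoots it.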

File-type-specific prompts. Terraform security checks are different from Python code quality checks. A generic prompt applied to both produces generic results for both. Separate prompts per file type — with specific checks relevant to that language and context — made findings significantly more relevant.

The system prompt is where 80% of the value lives. Spend time there.

What Still Requires a Human

Claude reviews the diff. It doesn't understand why the code was written this way. It doesn't know what the product decision was. It can't see the technical debt in the rest of the codebase that this change interacts with. It can't tell whether this feature does what the ticket actually asked for.

Human code review is still required before every merge. Always. Without exception.

What changed is what human review focuses on. Before this system, reviewers spent attention on mechanical checks — did someone remember error handling, are there obvious security gaps, is this formatted correctly. Now Claude handles that pass. Human reviewers focus on business logic, architecture, and the decisions that actually require knowing the system.

That's the right division of labor.

Results After Running This in Production

Security incidents from MR review dropped to zero. The specific things that used to slip through — committed credentials, open security groups, tokens in documentation — haven't made it through since this system went live.

Junior developers are progressing faster. Consistent, detailed, patient feedback on every MR they open. No waiting for a senior engineer's attention. No inconsistency based on who reviews on what day. The AI explains the reasoning behind every suggestion — something a tired senior engineer at the end of a long review queue often skips.

Human reviews are more focused. When the mechanical checks are handled automatically, the conversation in human review shifts to what actually needs human judgment. Less back-and-forth on obvious issues. More time on architecture and business logic.

Setting This Up in Your Environment

What you need:

GitLab with MR webhooks configured to trigger Jenkins
Jenkins with the GitLab plugin installed
Anthropic API key stored as Jenkins credential: anthropic-api-key
GitLab personal access token (api scope) stored as: gitlab-api-token
Python 3.9+ available on your Jenkins agent
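A small preflight sketch for the agent side of that list: it checks that the two credentials the Jenkinsfile exports are present and that the interpreter is at least Python 3.9. The helper name and messages are illustrative, not part of the original pipeline:

```python
import os
import sys
from typing import List, Mapping, Optional, Sequence

# Names exported by the Jenkinsfile environment block from Jenkins credentials.
REQUIRED_ENV = ("ANTHROPIC_API_KEY", "GITLAB_TOKEN")
MIN_PYTHON = (3, 9)


def preflight(env: Optional[Mapping[str, str]] = None,
              version: Optional[Sequence[int]] = None) -> List[str]:
    """Return human-readable problems; an empty list means the agent is ready."""
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2] if version is None else version[:2])
    problems = [f"missing environment variable: {name}"
                for name in REQUIRED_ENV if not env.get(name)]
    if version < MIN_PYTHON:
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
                        f"found {version[0]}.{version[1]}")
    return problems
```

Printing the `preflight()` results at the start of the run turns a bare `KeyError` from `os.environ['GITLAB_TOKEN']` into a readable message.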

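Build the safety net first: make sure a failing AI review can never fail the build. In the Jenkinsfile, wrap the review step in catchError (a post { failure } block on its own only reports the failure; it does not stop it from failing the pipeline). In the Python script, exit 0 when GitLab or the Anthropic API is unavailable. Tune the prompts second.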
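A sketch of one way to apply that on the Python side. The script above exits 0 explicitly only when fetching the MR fails; any other unexpected exception still ends the run with a non-zero exit code. `fail_open` is a hypothetical helper that turns every exception into a logged message and exit code 0; catching broad `Exception` is a deliberate trade-off for an informational step:

```python
import functools
import sys
import traceback
from typing import Callable


def fail_open(func: Callable[[], object]) -> Callable[[], int]:
    """Wrap an entry point so any exception is logged and converted to exit code 0."""
    @functools.wraps(func)
    def wrapper() -> int:
        try:
            func()
        except Exception:  # SystemExit and KeyboardInterrupt still pass through
            traceback.print_exc()
            print("AI review failed; not blocking the pipeline", file=sys.stderr)
        return 0
    return wrapper
```

The entry point would then become `sys.exit(fail_open(main)())`. The cost is that misconfiguration, such as a missing GITLAB_TOKEN, also passes silently, so the traceback in the Jenkins console log is the only signal.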

The system prompt is your highest-leverage investment. Write specific checks for your codebase, your language stack, and your team's standards. A generic prompt gives you generic results. The specificity is what makes it useful.

Credentials committed to repositories, security groups left wide open, tokens hardcoded in documentation — these are mechanical failures. They don't require judgment to catch. They require consistent attention that human reviewers, under pressure, sometimes don't have.

That's what this system does. The judgment calls still belong to the humans.

#devops #jenkins #gitlab #ai #claude #codereview #cicd #python #anthropic

I Built 20 AI-Powered DevOps Tools Because I Got Tired of Doing This Stuff Manually

Manikanta Suru — Mon, 11 May 2026 17:47:57 +0000

I've been a DevOps/SRE engineer for 10+ years. I've managed 50+ EKS clusters at Apple scale, built OTA firmware pipelines for 300+ EV chargers, migrated 80 applications to AWS, and been the sole infrastructure engineer at two energy startups where I supported teams of 30-40 engineers alone.

In all of that time, certain tasks never stopped being painful.

47 CloudWatch alarms firing at 11pm, and you have to figure out which 3 actually matter.

A pod in CrashLoopBackOff at 2am: logs open, describe output open, trying to diagnose while half asleep.

A Terraform plan before a production apply: tired, reviewing it manually, knowing you'll miss something.

A weekly AWS bill spike: someone asks why, and you dig through Cost Explorer for 40 minutes.

I got tired of doing all of this manually. So I built AI agents for all of it.

What I Built

devops-ai-toolkit: 20 open source AI-powered tools across 5 sections, built with Python and Groq LLaMA 3.3.

🔗 github.com/manekanttasuru/devops-ai-toolkit

Every tool came from a real problem. None of this is theoretical.

The Tools by Section

Kubernetes (4 tools)

Pod Failure Analyzer: diagnoses CrashLoopBackOff, OOMKilled, and Pending pods automatically from logs + describe output
Cluster Upgrade Advisor: reads your EKS version, scans for deprecated APIs, and produces a prioritized upgrade plan
RBAC Auditor: scans all roles, bindings, and service accounts, and flags dangerous permissions ranked CRITICAL/HIGH/MEDIUM/LOW
Network Policy Analyzer: maps pod coverage, finds unprotected namespaces, and generates suggested NetworkPolicy YAML
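To make the RBAC Auditor's CRITICAL/HIGH/MEDIUM/LOW ranking concrete, here is a rule-based sketch that scores rules from `kubectl get roles,clusterroles -A -o json` output. It is not the toolkit's code, which is LLM-assisted; the severity thresholds are assumptions chosen for illustration:

```python
from typing import Any, Dict, List

READ_VERBS = {"get", "list", "watch", "*"}


def score_rule(rule: Dict[str, Any]) -> str:
    """Illustrative severity for one RBAC rule (thresholds are assumptions)."""
    verbs = set(rule.get("verbs") or [])
    resources = set(rule.get("resources") or [])
    if "*" in verbs and "*" in resources:
        return "CRITICAL"  # full control over everything the role covers
    if ({"secrets", "*"} & resources) and (verbs & READ_VERBS):
        return "HIGH"  # can read Secret values (apiGroups ignored for brevity)
    if "*" in verbs or "*" in resources:
        return "MEDIUM"  # wildcard on one axis only
    return "LOW"


def audit_roles(role_list: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return every non-LOW rule in a `kubectl ... -o json` role list."""
    findings = []
    for item in role_list.get("items", []):
        meta = item.get("metadata", {})
        for rule in item.get("rules") or []:
            severity = score_rule(rule)
            if severity != "LOW":
                findings.append({
                    "severity": severity,
                    "kind": item.get("kind", ""),
                    "name": meta.get("name", ""),
                    "namespace": meta.get("namespace", "(cluster)"),
                })
    return findings
```

Loading the kubectl output with `json.load` produces the `role_list` argument; bindings and service accounts would need a similar pass over `rolebindings` and `clusterrolebindings`.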

AWS (4 tools)

IAM Analyzer: flags wildcards, missing MFA, old access keys, and over-permissioned roles with risk scoring
Security Group Auditor: finds ports open to 0.0.0.0/0 and orphaned groups, and adds remediation commands per finding
VPC Network Analyzer: maps the full topology, flags IP exhaustion and missing flow logs, and generates an ASCII topology diagram
Unused Resource Hunter: finds idle EC2 instances, unattached EBS volumes, and unused Elastic IPs, and estimates monthly waste in dollars
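The IAM Analyzer's wildcard check is the most mechanical part of that list. A sketch that flags Allow statements with wildcard actions or resources in an IAM policy document parsed into a Python dict; the severity labels are assumptions for the example, not the toolkit's risk scoring:

```python
from typing import Any, Dict, List, Union


def _as_list(value: Union[str, List[str], None]) -> List[str]:
    # IAM accepts either a single string or a list for Action and Resource.
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def wildcard_findings(policy: Dict[str, Any]) -> List[Dict[str, str]]:
    """Flag Allow statements with wildcard actions/resources (NotAction etc. ignored)."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may be an object, not a list
        statements = [statements]
    findings = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action"))
        any_action = "*" in actions
        service_wildcard = any(a.endswith(":*") for a in actions)
        any_resource = "*" in _as_list(stmt.get("Resource"))
        if any_action and any_resource:
            severity = "CRITICAL"
        elif any_action or (service_wildcard and any_resource):
            severity = "HIGH"
        elif service_wildcard or any_resource:
            severity = "MEDIUM"
        else:
            continue
        findings.append({
            "statement": stmt.get("Sid", f"#{index}"),
            "severity": severity,
            "actions": ", ".join(actions),
        })
    return findings
```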

Terraform (4 tools)

Security Plan Reviewer: reads terraform plan output, flags security issues, and rates them CRITICAL/HIGH/MEDIUM/LOW with HCL fixes
Drift Detector: runs terraform plan, classifies drift as INTENTIONAL/ACCIDENTAL/CONCERNING, and gives per-resource recommendations
State Analyzer: scans tfstate for orphans, sensitive values, missing tags, and resource age estimates
Compliance Checker: maps your Terraform against CIS/HIPAA/SOC2 with control numbers and a compliance score percentage
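For the Security Plan Reviewer, the JSON form of a plan from `terraform show -json <planfile>` is easier to scan reliably than the human-readable text. A sketch that finds planned `aws_security_group` ingress blocks open to 0.0.0.0/0, assuming the AWS provider's attribute names (`ingress`, `cidr_blocks`, `from_port`, `to_port`, `protocol`); standalone rule resources and IPv6 ranges are left out for brevity, and this is not the toolkit's implementation:

```python
from typing import Any, Dict, List


def open_ingress(plan: Dict[str, Any]) -> List[str]:
    """List planned security group ingress rules that allow 0.0.0.0/0."""
    findings = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_security_group":
            continue
        after = rc.get("change", {}).get("after") or {}  # None for deletes
        for rule in after.get("ingress") or []:
            if "0.0.0.0/0" in (rule.get("cidr_blocks") or []):
                findings.append(
                    f"{rc['address']}: {rule.get('protocol')} "
                    f"{rule.get('from_port')}-{rule.get('to_port')} open to 0.0.0.0/0"
                )
    return findings
```

Running `terraform plan -out=tfplan` and then `terraform show -json tfplan > plan.json` produces the input, which `json.load` turns into the `plan` dict.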

Monitoring (4 tools)

Dashboard Generator: takes a service name and metrics and generates complete Grafana dashboard JSON ready to import
Log Pattern Analyzer: reads CloudWatch or local logs and ranks error patterns by frequency and severity
Grafana Alert Router: classifies P1-P4 severity, routes to the right team, and posts directly to Slack via webhook
Anomaly Detector: queries Prometheus + CloudWatch and flags unusual patterns before they cross alert thresholds
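Ranking error patterns by frequency mostly comes down to collapsing lines that differ only in variable parts (IDs, numbers, timestamps) into one template before counting. A sketch of that step with regular expressions; the masking rules and the plain `ERROR` substring filter are assumptions, not the toolkit's logic:

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Order matters: UUIDs and hex values are masked before generic numbers.
_MASKS = [
    (re.compile(r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
                r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"), "<uuid>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<hex>"),
    (re.compile(r"\b\d+(?:\.\d+)*"), "<num>"),  # also catches "30" in "30s"
]


def to_pattern(line: str) -> str:
    """Replace variable parts of a log line with placeholders."""
    for regex, placeholder in _MASKS:
        line = regex.sub(placeholder, line)
    return line.strip()


def top_error_patterns(lines: Iterable[str], limit: int = 5) -> List[Tuple[str, int]]:
    """Count ERROR lines by pattern and return the most frequent ones."""
    counts = Counter(to_pattern(line) for line in lines if "ERROR" in line)
    return counts.most_common(limit)
```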

SRE (4 tools)

Incident Runbook Generator: takes a service + symptoms and produces a structured runbook with exact commands
On-Call Handoff Generator: takes the current system state and writes a clean handoff brief for the incoming engineer
Deployment Risk Scorer: rates LOW/MEDIUM/HIGH/CRITICAL with a go/no-go checklist per change type
Chaos Engineering Planner: generates a full experiment plan with hypothesis, steps, rollback, and safety constraints

Stack

LLM: Groq API with LLaMA 3.3 70B (model llama-3.3-70b-versatile; fast, free tier available)
Language: Python 3.9+
AWS: boto3
K8s: kubectl via subprocess
No heavy frameworks: each tool is a single Python file

Quick Start

git clone https://github.com/manekanttasuru/devops-ai-toolkit
cd devops-ai-toolkit
pip install -r shared/requirements.txt
export GROQ_API_KEY=your_key_here

Run any tool, for example:

cd kubernetes/pod-failure-analyzer
python main.py

Get a free Groq API key at console.groq.com

Why Groq + LLaMA

Fast enough for real-time infrastructure tooling. The free tier is generous for experimentation, and LLaMA 3.3 handles technical DevOps context well. I use Groq in production for my other AI projects too: MANI AI and BabyMind AI.

Every tool has a README with example output so you know what you're getting before you run it.

If you find it useful, a star helps others find it. If something is broken or you have ideas, open an issue.

🔗 github.com/manekanttasuru/devops-ai-toolkit