Static Lint Rules for Your LLM Prompts (Before They Hit Production)

#hermeschallenge #ai #python #agents

Code goes through linting before it ships. Prompts usually do not.

The result: production system prompts with contradictory instructions, vague directives, unclosed XML tags, placeholder text left over from templates, and thousand-character run-on sentences that confuse models.

prompt-lint brings static analysis to prompt engineering. Run it in CI. Catch bad prompts before they go live.

The Shape of the Fix

from prompt_lint import PromptLinter, LintResult

linter = PromptLinter(rules=[
    "no_placeholder",             # catch {FILL_THIS_IN} and TODO markers
    "no_contradictions",          # catch "always" paired with "never" for the same thing
    "max_sentence_length:200",    # flag sentences over 200 chars
    "no_unclosed_xml",            # catch <tool_use> without </tool_use>
    "no_duplicate_instructions",  # catch repeated instructions
    "min_specificity",            # flag vague words: "appropriate", "reasonable"
])

with open("system_prompt.txt") as f:
    prompt = f.read()

results: list[LintResult] = linter.lint(prompt)

for r in results:
    print(f"[{r.rule}] Line {r.line}: {r.message}")
    print(f"  > {r.excerpt}")

Run in CI. Fail the build on lint errors. Prompts go through the same quality gate as code.

What It Does NOT Do

prompt-lint does not evaluate prompt effectiveness. It catches structural and stylistic problems, not semantic quality. A perfectly formed prompt that gives the wrong instructions still passes lint.

It does not test prompts against a model. For that, use prompt-eval-rubric. Lint is pre-flight; eval is post-flight.

It does not catch all prompt injection risks. Injection detection requires runtime context. prompt-shield handles runtime injection detection.

Inside the Library

Rules are analyzers that return a list of LintResult:

from dataclasses import dataclass

@dataclass
class LintResult:
    rule: str
    severity: str  # "error" or "warning"
    line: int
    message: str
    excerpt: str

The no_placeholder rule looks for common placeholder patterns:

PLACEHOLDER_PATTERNS = [
    r"\{[A-Z_]{2,}\}",          # {FILL_THIS_IN}
    r"\[INSERT.*?\]",            # [INSERT_SOMETHING_HERE]
    r"TODO[:\s]",                # TODO: fill this in
    r"FIXME[:\s]",               # FIXME: this is wrong
    r"<placeholder>",            # <placeholder>
]
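To make the matching step concrete, here is a minimal sketch of how patterns like these could be applied to a prompt. It is not the library's implementation: the find_placeholders helper and its (line, excerpt) return shape are assumptions made for illustration.

```python
import re

# Same patterns as listed above.
PLACEHOLDER_PATTERNS = [
    r"\{[A-Z_]{2,}\}",
    r"\[INSERT.*?\]",
    r"TODO[:\s]",
    r"FIXME[:\s]",
    r"<placeholder>",
]

# One combined regex so the prompt is scanned once.
COMBINED = re.compile("|".join(f"(?:{p})" for p in PLACEHOLDER_PATTERNS))


def find_placeholders(prompt: str) -> list[tuple[int, str]]:
    """Hypothetical helper: return (line_number, matched_text) pairs."""
    findings = []
    for match in COMBINED.finditer(prompt):
        # Line number = newlines before the match, plus one.
        line = prompt.count("\n", 0, match.start()) + 1
        findings.append((line, match.group(0).strip()))
    return findings


if __name__ == "__main__":
    draft = "You are {BOT_NAME}.\nBe kind.\nTODO: add tone\n[INSERT POLICY]"
    for line, text in find_placeholders(draft):
        print(f"line {line}: {text}")
```

Because the first pattern only matches uppercase letters and underscores, a lowercase template variable such as {language} is not flagged by these patterns.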

The no_contradictions rule detects instruction pairs such as "always use formal language" and "you may use casual language", which give conflicting guidance on the same dimension.
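How the library detects these pairs is not shown here. As a rough illustration of the simplest case only (the same action under both "always" and "never"), a heuristic could look like the sketch below. The find_contradictions function and the DIRECTIVE regex are assumptions, and this approach would not catch the formal-versus-casual example, which needs some notion of meaning.

```python
import re
from collections import defaultdict

# Captures "always <action>" or "never <action>" up to the end of the sentence.
DIRECTIVE = re.compile(r"\b(always|never)\s+([^.!?\n]+)", re.IGNORECASE)


def find_contradictions(prompt: str) -> list[tuple[str, int, int]]:
    """Hypothetical helper: return (action, always_line, never_line) tuples."""
    seen: dict[str, dict[str, int]] = defaultdict(dict)
    for lineno, text in enumerate(prompt.splitlines(), start=1):
        for match in DIRECTIVE.finditer(text):
            polarity = match.group(1).lower()
            # Normalize case and whitespace so identical actions compare equal.
            action = " ".join(match.group(2).lower().split())
            seen[action].setdefault(polarity, lineno)
    return [
        (action, lines["always"], lines["never"])
        for action, lines in seen.items()
        if "always" in lines and "never" in lines
    ]
```

Exact string matching on the action is deliberately strict: it avoids false positives, at the cost of missing paraphrased conflicts.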

The max_sentence_length rule splits on sentence-ending punctuation and flags sentences over the configured character limit. Long sentences are harder for models to parse correctly.
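A minimal sketch of that split-and-measure step, assuming sentences end at ".", "!", or "?" followed by whitespace. The long_sentences name and its return shape are illustrative, not the library's API.

```python
import re

# Split at whitespace that follows sentence-ending punctuation.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def long_sentences(prompt: str, limit: int = 200) -> list[tuple[int, int]]:
    """Hypothetical helper: return (line_number, length) per overlong sentence."""
    findings = []
    offset = 0
    for sentence in SENTENCE_END.split(prompt):
        # Locate the sentence in the original text to recover its line number.
        start = prompt.index(sentence, offset)
        offset = start + len(sentence)
        if len(sentence) > limit:
            line = prompt.count("\n", 0, start) + 1
            findings.append((line, len(sentence)))
    return findings
```

A splitter this simple also breaks on abbreviations such as "e.g. something", which makes sentences look shorter than they are, so it can under-report but will not over-report.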

The no_unclosed_xml rule is a simple stack parser: push opening tags, pop closing tags, flag anything left on the stack at the end.
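The stack approach can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: the TAG regex, the unclosed_tags name, and the handling of mismatched closing tags (anything opened above the matching tag is treated as unclosed, and stray closers are ignored) are all choices made for this example.

```python
import re

# Groups: optional leading "/", tag name, optional trailing "/" (self-closing).
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>")


def unclosed_tags(prompt: str) -> list[tuple[str, int]]:
    """Hypothetical helper: return (tag_name, line_number) for unclosed tags."""
    stack: list[tuple[str, int]] = []
    unclosed: list[tuple[str, int]] = []
    for match in TAG.finditer(prompt):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue  # <tag/> opens and closes itself
        if not closing:
            line = prompt.count("\n", 0, match.start()) + 1
            stack.append((name, line))
            continue
        names = [n for n, _ in stack]
        if name in names:
            # Find the nearest matching opener; tags opened after it never closed.
            idx = len(names) - 1 - names[::-1].index(name)
            unclosed.extend(stack[idx + 1:])
            del stack[idx:]
    unclosed.extend(stack)  # whatever is left was never closed
    return sorted(unclosed, key=lambda item: item[1])
```

For a prompt that opens <rules>, then <tool_use>, then closes only </rules>, this reports tool_use with the line where it was opened.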

When to Use It

Use it in CI for any system prompt that is checked into source control. The CI integration is straightforward:

# In your CI pipeline
python -m prompt_lint --rules default --error-on-warnings system_prompt.txt

Use it during prompt development. Save a draft, run the linter, fix the issues, iterate. This is faster than discovering problems by testing against the model.

Use it for prompt templates with placeholder syntax. The no_placeholder rule catches templates that are deployed before being filled in, one of the most common prompt bugs.

Install

pip install git+https://github.com/MukundaKatta/prompt-lint

from prompt_lint import PromptLinter

# Minimal CI check
linter = PromptLinter(rules=["no_placeholder", "no_unclosed_xml"])

def check_prompt_in_ci(prompt_path: str) -> bool:
    with open(prompt_path) as f:
        prompt = f.read()

    results = linter.lint(prompt)
    errors = [r for r in results if r.severity == "error"]
    warnings = [r for r in results if r.severity == "warning"]

    for e in errors:
        print(f"ERROR [{e.rule}] line {e.line}: {e.message}")
    for w in warnings:
        print(f"WARNING [{w.rule}] line {w.line}: {w.message}")

    return len(errors) == 0

if __name__ == "__main__":
    import sys
    success = check_prompt_in_ci(sys.argv[1])
    sys.exit(0 if success else 1)

Sibling Libraries

| Library | What it solves |
| --- | --- |
| `prompt-eval-rubric` | Runtime 0.0-1.0 quality scoring for model responses |
| `prompt-template-version` | Version and fingerprint prompt templates |
| `prompt-shield` | Runtime prompt injection detection |
| `llm-output-validator` | Validate LLM output shape after the call |
| `agent-context-builder` | Build system prompts from named sections |

The prompt quality pipeline: prompt-lint in CI (pre-deployment), prompt-shield at runtime (injection detection), prompt-eval-rubric for response quality (post-call).

What's Next

Rule plugins: a plugin interface that lets teams add project-specific rules. A healthcare team might add a rule that flags any system prompt that mentions patient data handling without including HIPAA context.
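Since the plugin interface does not exist yet, the following is only a guess at what a project-specific rule might look like if a rule were a plain function from prompt text to a list of LintResult. The hipaa_context_rule function, the "hipaa_context" rule name, and the keyword checks are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LintResult:  # same fields as the dataclass shown earlier
    rule: str
    severity: str  # "error" or "warning"
    line: int
    message: str
    excerpt: str


def hipaa_context_rule(prompt: str) -> list[LintResult]:
    """Hypothetical custom rule: patient data mentioned without HIPAA context."""
    if "hipaa" in prompt.lower():
        return []  # HIPAA context is present somewhere in the prompt
    results = []
    for lineno, text in enumerate(prompt.splitlines(), start=1):
        if "patient data" in text.lower():
            results.append(LintResult(
                rule="hipaa_context",
                severity="warning",
                line=lineno,
                message="mentions patient data without HIPAA context",
                excerpt=text.strip(),
            ))
    return results
```

A keyword check like this is crude, but it shows the shape: a team-owned function that returns the same result type as the built-in rules.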

Diff mode: compare two prompt versions and report which lint issues were added or fixed. Useful for prompt change reviews in PRs.
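One way diff mode could work, sketched under the assumption that an issue is identified by its rule and excerpt (line numbers shift between versions, so they are left out of the comparison). The diff_results function is hypothetical and not part of the library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LintResult:  # same fields as the dataclass shown earlier
    rule: str
    severity: str
    line: int
    message: str
    excerpt: str


def diff_results(
    old: list[LintResult], new: list[LintResult]
) -> tuple[list[LintResult], list[LintResult]]:
    """Hypothetical helper: return (added, fixed) issues between two versions."""

    def key(result: LintResult) -> tuple[str, str]:
        # Identity ignores line numbers, which move when text is inserted.
        return (result.rule, result.excerpt)

    old_keys = {key(r) for r in old}
    new_keys = {key(r) for r in new}
    added = [r for r in new if key(r) not in old_keys]
    fixed = [r for r in old if key(r) not in new_keys]
    return added, fixed
```

In a PR review, the old list would come from linting the base version of the prompt and the new list from linting the changed version.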

Auto-fix for simple issues: linter.fix(prompt) that returns a fixed prompt for rules with deterministic corrections (remove placeholder text, close unclosed XML tags, normalize whitespace). More complex rules like contradictions require human judgment.

Built as part of the agent-stack family: composable Python primitives for production LLM agents.