If you've ever debugged a production LLM system by "just rephrasing the prompt," this post is for you.
The problem isn't the model. It's the instruction.
Most LLM instructions are written the way people write notes to themselves: informally, with shared context assumed, and maintained by whoever wrote them. This works for one-off experiments. It fails in systems where instructions are authored once, executed thousands of times, and maintained by teams who weren't there when the original decisions were made.
The failure modes are predictable:
**Context collapse:** permanent facts, session decisions, and per-task instructions are mixed into one blob. You can't cache anything, you re-send everything, and changing one thing breaks another.
**Implicit constraints:** "don't touch the API layer" lives in someone's head or a Slack thread, not in the instruction itself.
**No output contract:** instructions describe what to do, not what correct looks like. Evaluation becomes subjective.
**Retry as debugging:** when output is wrong, you rephrase. That produces a different output, not a correct one.
ICS: Instruction Contract Specification
ICS applies the same discipline already used for REST APIs, database schemas, and network protocols to the instruction layer. It defines five layers with distinct lifetimes and strict rules:
```
IMMUTABLE_CONTEXT
  [Long-lived domain facts. Cached. Never restated.]

CAPABILITY_DECLARATION
  ALLOW code generation WITHIN src/
  DENY modification WITHIN src/api/
  REQUIRE type annotations ON all new functions

SESSION_STATE
  [Temporary decisions for this session only. Cleared with CLEAR sentinel.]

TASK_PAYLOAD
  [The specific task for this invocation.]

OUTPUT_CONTRACT
  FORMAT: markdown
  SCHEMA: { summary: string, changes: Change[] }
  ON_VIOLATION: return error with field path
```
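The layer lifetimes are what make caching possible: permanent layers are serialized once, session state lives until cleared, and only the task payload changes per call. A minimal sketch of that assembly logic, assuming a hypothetical `ICSPrompt` helper (the spec defines no reference API; all names here are my own):

```python
from dataclasses import dataclass, field

@dataclass
class ICSPrompt:
    """Hypothetical illustration of ICS layer lifetimes, not a spec-defined API."""
    immutable_context: str        # long-lived facts: sent once, then cached
    capability_declaration: str   # long-lived ALLOW/DENY/REQUIRE rules
    session_state: str = ""       # temporary decisions, cleared between sessions
    _cached: bool = field(default=False, repr=False)

    def render(self, task_payload: str, output_contract: str) -> str:
        """Assemble one invocation. Permanent layers are serialized only on
        the first call; later calls rely on the provider's prompt cache,
        represented here simply by omitting them."""
        parts = []
        if not self._cached:
            parts += [self.immutable_context, self.capability_declaration]
            self._cached = True
        if self.session_state:
            parts.append(self.session_state)
        parts += [task_payload, output_contract]
        return "\n\n".join(parts)

    def clear_session(self) -> None:
        """Model the CLEAR sentinel: drop session-scoped decisions."""
        self.session_state = ""
```

Note how the second `render` call carries only session, task, and contract tokens; that asymmetry is exactly where the per-invocation savings come from.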
The separation isn't ceremony; it's where the token savings come from.
The math
```
Naive: cost(N) = total_tokens × N
ICS:   cost(N) = permanent_tokens × 1 + session_tokens × S + invocation_tokens × N
```

where `S` is the number of sessions (`S ≤ N`).
Since total_tokens = permanent_tokens + session_tokens + invocation_tokens, the difference works out to permanent_tokens × (N − 1) + session_tokens × (N − S). With permanent_tokens > 0 and S ≤ N, ICS is strictly cheaper for every N > 1. Always. That's arithmetic, not a benchmark.
Empirically, at N=10 invocations: ~55% token reduction. At N=50: ~63%.
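The formulas above can be checked with a few lines of Python. The token counts here are illustrative assumptions I chose for the example, not the post's benchmark data, so the percentages will differ from the empirical figures:

```python
# Hypothetical layer sizes in tokens (assumptions, not measured data)
permanent, session, invocation = 1000, 200, 600
total = permanent + session + invocation
S = 1  # one session in this example

def naive_cost(n: int) -> int:
    """Everything re-sent on every invocation."""
    return total * n

def ics_cost(n: int) -> int:
    """Permanent layer once, session layer S times, payload every time."""
    return permanent + session * S + invocation * n

for n in (10, 50):
    saving = 1 - ics_cost(n) / naive_cost(n)
    print(f"N={n}: {saving:.0%} saved")
# N=10: 60% saved
# N=50: 65% saved
```

The saving grows with N because the fixed layers are amortized over more invocations, which matches the trend the benchmarks report.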
The toolchain
The project ships a full open-source toolchain:
```bash
pip install .
ics-validate my_instruction.ics     # structural compliance
ics-lint my_instruction.ics         # 9 semantic anti-pattern rules
ics-scaffold --template api-review  # generate a skeleton
ics-diff v1.ics v2.ics              # layer-aware diff
ics-report prompts/*.ics            # CI aggregate report
```
A Java runtime is also included for JVM shops.
Status
ICS v0.1 is an initial public draft. The spec, toolchain, and 20 benchmark scenarios are open source (CC BY 4.0 + MIT). Feedback is invited before semantics are locked.