AI in Practice, No Fluff — Day 6/10
I spent weeks refining a system prompt. It had few-shot examples, chain-of-thought scaffolding, structured output formatting. In the ChatGPT window, it was reliable. Exactly the tone and format I wanted, every time.
Then I copied it into my application code, hit send through the API, and the response was wrong. The formatting was off, the tone reverted to generic, and the structured JSON I had been getting reliably came back wrapped in a conversational preamble.
I didn't change the prompt. So what happened?
This is the moment you realize that the chat interface was silently helping in the background...
The invisible work
When you use ChatGPT, Claude.ai, or Gemini through their web interface, you are not just sending a prompt to a model. You are using an application that sits between you and the model, and that application is doing more work than you would expect.
System prompts you did not write. Every chat interface injects its own system prompt before yours. These instructions shape the model's behavior in ways that feel like "how the AI works" but are actually "how this specific product is configured." The helpful formatting, the safety guardrails, the tendency to use markdown headers and bullet points: much of that comes from the platform's system prompt, not from the model itself.
Your conversation history, managed for you. In the first series, we talked about context windows and how conversations get silently truncated when they get too long. The chat interface handles that truncation. It decides what to keep and what to drop. When you move to the API, that is your job. If you send only the current message without the conversation history, the model has no memory of what came before. If you send the full history and it exceeds the context window, you need to decide what gets trimmed.
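Managing that truncation yourself can be as simple as keeping the system prompt and dropping the oldest turns until the rest fits a budget. A minimal sketch of that idea — the 4-characters-per-token estimate and the budget number are rough assumptions for illustration, not anything a provider specifies:

```python
# Rough sketch: trim conversation history to fit a token budget.
# Assumes ~4 characters per token -- a crude estimate, good enough for
# deciding when to drop old turns, not for billing or hard limits.

def estimate_tokens(message: dict) -> int:
    return len(message["content"]) // 4 + 4  # +4 for role/formatting overhead

def trim_history(system: dict, history: list[dict], budget: int) -> list[dict]:
    """Keep the system prompt, drop oldest turns until we fit the budget."""
    kept = []
    used = estimate_tokens(system)
    # Walk newest-to-oldest so the most recent context survives.
    for msg in reversed(history):
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))

system = {"role": "system", "content": "You are a terse assistant."}
history = [
    {"role": "user", "content": "First question, long since answered."},
    {"role": "assistant", "content": "First answer."},
    {"role": "user", "content": "Latest question."},
]
trimmed = trim_history(system, history, budget=30)
# The oldest turn is dropped; the system prompt and recent turns survive.
```

Real applications use the provider's tokenizer instead of a character count, but the responsibility is the same: you decide what gets cut.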
Sampling parameters set to defaults you never chose. Temperature, top-p, max tokens: these control how creative or deterministic the model's output is. The chat interface picks reasonable defaults. The API hands you the dials and assumes you know what they do. Most of the time the defaults are fine, but when your output feels weirdly random or weirdly flat, this is usually why.
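Those dials translate to a handful of fields on the request body. A sketch of what setting them explicitly looks like, using OpenAI-style field names — the specific values here are illustrative, not recommendations:

```python
# Sampling parameters you inherit silently in a chat window,
# made explicit in an API request body (OpenAI-style field names).
request = {
    "model": "gpt-4o",  # illustrative model name
    "messages": [{"role": "user", "content": "Summarize this in one line."}],
    "temperature": 0.2,  # lower = more deterministic; ~1.0 is a common default
    "top_p": 1.0,        # nucleus sampling cutoff; leave at 1.0 unless you have a reason
    "max_tokens": 200,   # hard cap on response length -- too low truncates mid-sentence
}
```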
Tool use happening behind the scenes. When ChatGPT searches the web, reads a file, or runs code, it is using tools that are wired up by the application. The model does not inherently know how to browse the internet. The application gives it that ability and handles the execution. In the API, tool use is available, but you define the tools, handle the execution, and return the results yourself.
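Defining a tool means describing it as a schema; the model only ever returns a request to call it, and your code does the actual work. A sketch in the OpenAI-style function-tool format — the weather function is a made-up example, and the execution side is stubbed:

```python
# The model never executes anything itself. You describe a tool as a JSON
# schema, the model may respond with a call request, and your code runs it
# and feeds the result back. (get_weather is a hypothetical example.)
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

def execute_tool_call(name: str, args: dict) -> str:
    """Your side of the contract: actually run the tool."""
    if name == "get_weather":
        return f"Sunny, 22C in {args['city']}"  # stubbed result
    raise ValueError(f"Unknown tool: {name}")

result = execute_tool_call("get_weather", {"city": "Lisbon"})
```

The chat interface hides this entire loop: schema definition, the model's call request, execution, and returning the result.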
What the API actually gives you
The API is the raw material the chat interface is built from. When you send a request to the API, you get exactly what you ask for. Nothing more.
That means you send everything: the system prompt, the conversation history, the sampling parameters. You define which tools are available and handle the response format.
It is harder in the way that cooking from scratch is harder than ordering from a menu. The ingredients are the same. The skill is knowing what the recipe was doing for you. (I ruin most meals trying to figure this out.)
The first time I made an API call, I sent my carefully crafted prompt as a single user message. No system prompt, conversation history, or sampling parameters. The response read like a completely different AI. Technically it was: same model, zero context. I had been leaning on infrastructure I did not know existed.
Here is what that infrastructure looks like. In the chat window, you type a message and get a response. Behind the scenes, the application constructs something like this:
A system prompt (the platform's instructions plus your custom instructions), followed by the full message history (every message you sent and every response the model generated), followed by your latest message. All of that gets sent to the model as a single request. The response comes back, the application formats it, and you see it in the chat window.
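Spelled out as a messages array, that construction looks something like this — the system prompt text and model name are placeholders:

```python
# What the chat interface assembles for you on every turn:
# platform instructions + your custom instructions + full history + new message.
def build_request(system_prompt: str, history: list[dict], new_message: str) -> dict:
    return {
        "model": "gpt-4o",  # placeholder model name
        "messages": [
            {"role": "system", "content": system_prompt},
            *history,                                  # every prior turn, both sides
            {"role": "user", "content": new_message},  # the message you just typed
        ],
    }

req = build_request(
    system_prompt="You are a helpful assistant. Use markdown.",
    history=[
        {"role": "user", "content": "What is an API?"},
        {"role": "assistant", "content": "An interface for programs to talk."},
    ],
    new_message="Give me an example.",
)
```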
When you build with the API, you construct that same request yourself. If you skip the system prompt, the model has no behavioral instructions. If you skip the message history, the model has no memory. If you send the history but do not manage its length, you will eventually exceed the context window and get an error.
Why the same prompt behaves differently
Strip all of that invisible work away, and the same prompt text produces different output. Not because the model is different, but because the context around the prompt is different.
Three specific things that catch people:
1. Tone and formatting shifts. The chat interface's system prompt typically includes instructions about being helpful, using markdown formatting, and maintaining a conversational tone. Without those instructions, the model's raw output is often less polished. If your application needs a specific tone, you need to specify it in your own system prompt.
2. Structured output breaks. In the chat window, the model has been shaped (through the platform's prompting and fine-tuning) to handle format requests gracefully. The API model responds to format instructions too, but without the additional shaping, it is more likely to include commentary around your JSON or deviate from your schema. This is where the structured output techniques from the last post become essential, and where API-level schema enforcement becomes the reliable solution.
3. Context loss. One of the most common API mistakes is sending a single message without history. In the chat window, every previous exchange is included automatically. In the API, if you do not send the history, the model treats every request as the start of a new conversation. Your carefully built context from three messages ago is gone.
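Schema enforcement means attaching the schema to the request itself rather than describing the format in prose. A sketch of that request shape in OpenAI's structured-output style — the field names follow their `json_schema` response format, but check current provider docs before relying on them:

```python
# Instead of asking nicely for JSON in the prompt, attach a schema the
# provider enforces at generation time (OpenAI-style response_format).
request = {
    "model": "gpt-4o",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Extract name and age from: 'Ana, 34, Lisbon.'"},
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,  # reject outputs that deviate from the schema
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
                "additionalProperties": False,
            },
        },
    },
}
```

This is the API-level replacement for the prompt-only format tricks: the conversational preamble problem disappears because non-conforming output is never produced.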
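The fix for context loss is mechanical but easy to forget: after every response, append both sides of the exchange to the history you send next time. A sketch with a stub standing in for the real API call so the loop runs here:

```python
# The API is stateless: memory exists only if you resend it.
# fake_send stands in for a real API call so this loop is runnable.
def fake_send(messages: list[dict]) -> str:
    return f"(reply to {messages[-1]['content']!r}, with {len(messages)} messages of context)"

history: list[dict] = [{"role": "system", "content": "You are concise."}]

for user_text in ["My name is Ana.", "What is my name?"]:
    history.append({"role": "user", "content": user_text})
    reply = fake_send(history)               # the full history goes out every time
    history.append({"role": "assistant", "content": reply})  # and the reply comes back in

# Without the two append calls, the second request would arrive with
# no record that the name was ever mentioned.
```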
The bridge
What changes is responsibility. Everything the chat window does for you is something you can do yourself, tune to your specific needs, and automate at scale. The prompting skills from this series still apply. You just stop getting the training wheels.
This is the gate in this series. Everything before this post works in a chat window. Everything after it gets progressively more developer-facing: tool use, embeddings, retrieval, architectural decisions about when to use AI at all. You don't need to be a developer to understand these topics. But understanding them changes what you think is possible.
Tomorrow: tool use. How AI gets the ability to do things in the real world, not just generate text about them.