Luke Fryer

Posted on • Originally published at aipromptarchitect.co.uk

OpenAI vs Anthropic: Structuring Prompts for Different LLM Context Windows

        <p>Not all Large Language Models process prompts the same way. OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet have fundamentally different architectures for handling system instructions, user context, and long documents. A prompt structure that works brilliantly for one model may underperform on the other.</p>

        <p>Understanding these differences is critical for developers who build multi-model AI systems or who switch between providers.</p>

        <h2>Context Window Fundamentals</h2>
        <p>A model's <strong>context window</strong> is the total number of tokens (roughly ¾ of a word) it can process in a single request — including both the input prompt and the generated output.</p>

        <ul>
            <li><strong>GPT-4o</strong>: 128K token context window</li>
            <li><strong>Claude 3.5 Sonnet</strong>: 200K token context window</li>
            <li><strong>GPT-4o mini</strong>: 128K token context window</li>
            <li><strong>Claude 3 Haiku</strong>: 200K token context window</li>
        </ul>

        <p>But raw context size is only half the story. What matters more is <strong>how each model attends to information</strong> at different positions within that window.</p>
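<p>The ¾-of-a-word rule above gives a quick way to budget a prompt against a context window. A minimal sketch (an approximation only — real tokenizers such as OpenAI's tiktoken give exact counts; <code>estimate_tokens</code> and <code>fits_context</code> are illustrative helpers, not from any SDK):</p>

```python
# Rough token estimate: a token is ~3/4 of an English word,
# so tokens ≈ words / 0.75. This is a budgeting heuristic, not a tokenizer.
def estimate_tokens(text: str) -> int:
    words = len(text.split())
    return round(words / 0.75)

def fits_context(text: str, window: int) -> bool:
    # Remember the window covers input AND output, so leave headroom
    # for the model's generated response as well.
    return estimate_tokens(text) < window

prompt = "Summarise the attached design document in five bullet points."
print(estimate_tokens(prompt))  # → 12 (9 words / 0.75)
print(fits_context(prompt, 128_000))  # → True
```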

        <h2>How GPT-4o Processes Prompts</h2>
        <p>OpenAI's GPT-4o uses a role-based message system with three distinct message types:</p>

        <h3>System Message</h3>
        <p>The system message is the highest-priority context. GPT-4o treats system messages as persistent instructions that take precedence over user messages. This is where you define the AI's role, constraints, and output format.</p>
        <pre><code>{
  "role": "system",
  "content": "You are a TypeScript expert. Return only code. No explanations. Use strict types."
}</code></pre>

        <h3>User / Assistant Messages</h3>
        <p>Conversation history is passed as alternating user/assistant messages. GPT-4o processes these sequentially, with a known tendency toward <strong>recency bias</strong> — information at the end of the conversation receives more attention than information in the middle.</p>

        <h3>Optimisation Tips for GPT-4o</h3>
        <ul>
            <li><strong>Front-load critical constraints</strong> in the system message — they persist across the entire conversation</li>
            <li><strong>Repeat key instructions</strong> at the end of long prompts — GPT-4o attends most strongly to the beginning and end</li>
            <li><strong>Use JSON mode</strong> (<code>response_format: { type: "json_object" }</code>) when you need structured outputs — it guarantees syntactically valid JSON and sharply reduces format drift</li>
            <li><strong>Keep conversations short</strong> — start new sessions frequently rather than relying on long message chains</li>
            <li><strong>Use delimiters</strong> like triple backticks or XML-style tags to separate code, context, and instructions</li>
        </ul>
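<p>Putting these tips together, here's a minimal sketch of a Chat Completions request payload — front-loaded system constraints, XML-style delimiters around context, JSON mode, and the key instruction repeated at the end. The <code>build_gpt4o_request</code> helper and its parameters are illustrative, not part of the OpenAI SDK; the resulting dict is the shape you'd pass to the Chat Completions endpoint:</p>

```python
# Sketch: assemble a GPT-4o request that applies the optimisation tips above.
def build_gpt4o_request(task: str, context: str) -> dict:
    user_content = (
        "<context>\n"                 # XML-style delimiter separating
        f"{context}\n"                # supporting context from the task
        "</context>\n\n"
        f"{task}\n\n"
        # Repeat the key constraint at the end, where GPT-4o attends strongly:
        "Reminder: respond with a single JSON object only."
    )
    return {
        "model": "gpt-4o",
        "response_format": {"type": "json_object"},  # JSON mode
        "messages": [
            # Front-loaded, persistent constraints go in the system message:
            {"role": "system",
             "content": "You are a TypeScript expert. Respond with a JSON "
                        "object only. Use strict types."},
            {"role": "user", "content": user_content},
        ],
    }

req = build_gpt4o_request("Review this function.", "const x: any = 1;")
```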

        <h2>How Claude 3.5 Sonnet Processes Prompts</h2>
        <p>Anthropic's Claude takes a fundamentally different approach. Claude excels at processing <strong>long, structured documents</strong> and uses XML tags as a native structuring mechanism.</p>

        <h3>XML-Tagged Documents</h3>
        <p>Claude was specifically trained to understand XML tags as structural delimiters. Wrapping your content in descriptive XML tags dramatically improves Claude's ability to parse and reference specific sections:</p>
        <pre><code>&lt;system&gt;
You are a senior React developer reviewing a pull request.
Evaluate code quality, type safety, and adherence to the
project's architectural conventions.
&lt;/system&gt;

&lt;project_context&gt;
&lt;tech_stack&gt;React 18, TypeScript, Vite, Firebase&lt;/tech_stack&gt;
&lt;conventions&gt;Named exports, no 'any' types, hooks in hooks/ dir&lt;/conventions&gt;
&lt;/project_context&gt;

&lt;code_to_review&gt;
// ... the actual code ...
&lt;/code_to_review&gt;

&lt;output_format&gt;
Return a JSON array of issues, each with: file, line, severity, message.
&lt;/output_format&gt;</code></pre>

        <h3>Long Document Handling</h3>
        <p>Claude's 200K context window, combined with training aimed at long-form document analysis, means it can process entire codebases, documentation sets, and specification documents in a single prompt. Unlike GPT-4o, Claude shows relatively <strong>consistent attention across the entire context window</strong> — the "lost in the middle" problem is less pronounced.</p>

        <h3>Optimisation Tips for Claude</h3>
        <ul>
            <li><strong>Use XML tags extensively</strong> — Claude understands them natively and uses them to maintain structure in long contexts</li>
            <li><strong>Provide complete documents</strong> rather than excerpts — Claude handles long contexts better than most models</li>
            <li><strong>Place instructions after the document</strong> — Claude processes documents holistically and performs well with instructions at the end</li>
            <li><strong>Use the <code>system</code> parameter</strong> in the API rather than embedding system instructions in the first message</li>
            <li><strong>Leverage prefilling</strong> — you can prefill Claude's response to guide its output format</li>
        </ul>
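<p>The same tips, sketched as an Anthropic Messages API payload — system prompt in the top-level <code>system</code> parameter, document wrapped in XML tags, instructions placed after the document, and a prefilled assistant turn to force a JSON-array opening. The <code>build_claude_request</code> helper, tag names, and model string are illustrative assumptions, not from the Anthropic SDK:</p>

```python
# Sketch: assemble a Claude request that applies the optimisation tips above.
def build_claude_request(document: str, instructions: str) -> dict:
    user_content = (
        "<document>\n"
        f"{document}\n"
        "</document>\n\n"
        f"{instructions}"            # instructions placed AFTER the document
    )
    return {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        # System instructions go in the dedicated parameter, not a message:
        "system": "You are a senior React developer reviewing code.",
        "messages": [
            {"role": "user", "content": user_content},
            # Prefill: Claude continues from here, so its reply must
            # start as a JSON array.
            {"role": "assistant", "content": "["},
        ],
    }

req = build_claude_request("// ... code ...", "List the issues as JSON.")
```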

        <h2>Structural Comparison</h2>
        <p>Here's how the same prompt should be structured differently for each model:</p>

        <h3>GPT-4o: System message + concise user message</h3>
        <pre><code>System: "You are a TypeScript expert. Respond with code only.
         Use strict types. Follow the App Router pattern."

User: "Create a user profile page with server-side data fetching."</code></pre>

        <h3>Claude: XML-structured document</h3>
        <pre><code>&lt;system&gt;You are a TypeScript expert.&lt;/system&gt;

&lt;project&gt;
&lt;framework&gt;Next.js 14 App Router&lt;/framework&gt;
&lt;language&gt;TypeScript strict mode&lt;/language&gt;
&lt;/project&gt;

&lt;task&gt;
Create a user profile page with server-side data fetching.
Return code only. Use strict types.
&lt;/task&gt;</code></pre>

        <h2>Building Model-Agnostic Prompts</h2>
        <p>If your system needs to work across multiple LLM providers, build your prompts with a <strong>universal structure</strong> that translates well to both architectures:</p>
        <ol>
            <li><strong>Separate system context from task instructions</strong> — this maps to GPT-4o's system message and Claude's system parameter</li>
            <li><strong>Use clear section delimiters</strong> — XML tags for Claude, Markdown headers for GPT-4o</li>
            <li><strong>Specify output format explicitly</strong> — both models benefit from explicit format constraints</li>
            <li><strong>Include constraints as a dedicated section</strong> — forbidden patterns, required patterns, and coding standards</li>
        </ol>
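<p>The four steps above can be sketched as a single spec rendered to either provider's shape — XML tags for Claude, Markdown headers for GPT-4o. The <code>render</code> function and the spec's field names are illustrative, not from any SDK:</p>

```python
# Sketch: one universal prompt spec, two provider-specific renderings.
def render(spec: dict, target: str) -> dict:
    if target == "claude":
        # Claude: XML-tagged sections, system text in the dedicated parameter.
        body = "\n\n".join(
            f"<{name}>\n{text}\n</{name}>"
            for name, text in spec["sections"].items()
        )
        return {"system": spec["system"],
                "messages": [{"role": "user", "content": body}]}
    # GPT-4o: Markdown-headed sections, system text as a system message.
    body = "\n\n".join(
        f"## {name}\n{text}" for name, text in spec["sections"].items()
    )
    return {"messages": [
        {"role": "system", "content": spec["system"]},
        {"role": "user", "content": body},
    ]}

spec = {
    "system": "You are a TypeScript expert. Use strict types.",
    "sections": {
        "constraints": "No 'any' types. Named exports only.",
        "task": "Create a user profile page.",
        "output_format": "Return code only.",
    },
}
openai_req = render(spec, "openai")
claude_req = render(spec, "claude")
```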

        <p><a href="/signup">AI Prompt Architect</a> generates model-optimised prompts that adapt their structure to your target LLM. Whether you're using GPT-4o, Claude, or both, the output is tailored for maximum effectiveness. <a href="/signup">Try it free</a>.</p>

This article was originally published with extended interactive STCO schemas on AI Prompt Architect.
