TildAlice

Posted on • Originally published at tildalice.io

GPT-4 vs Claude Prompt Latency: 2.1s Gap Explained

Same Prompt, Wildly Different Outputs

I fed the exact same system prompt to GPT-4 and Claude 3.5 Sonnet for a code generation task. GPT-4 returned 47 lines of Python. Claude returned 89 lines with three helper functions I didn't ask for.

This wasn't a fluke. Over 50 test runs with identical prompts across classification, summarization, and code tasks, the two models diverged in length by 40-60% on average, and the style of output was so different I had to rewrite downstream parsing logic entirely.
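The divergence measurement above can be sketched with a small helper. This is an illustrative harness, not the article's actual test code: the line counts are made-up placeholder numbers, and a real run would collect them from each provider's API.

```python
from statistics import mean

def length_divergence(lengths_a, lengths_b):
    """Mean relative length gap between paired responses,
    expressed as a fraction of the shorter response
    (e.g. 0.5 means the longer output averaged 50% longer)."""
    gaps = []
    for a, b in zip(lengths_a, lengths_b):
        shorter, longer = sorted((a, b))
        gaps.append((longer - shorter) / shorter)
    return mean(gaps)

# Line counts from paired runs of the same prompt on two models
# (illustrative numbers, not the article's raw data).
gpt4_lines = [47, 52, 40, 61]
claude_lines = [89, 70, 68, 85]

print(round(length_divergence(gpt4_lines, claude_lines), 2))  # ~0.58, i.e. 58%
```

Running this across all 50 prompt pairs per task category is what surfaces the 40-60% average gap; a single anecdote like the 47-vs-89-line run tells you nothing on its own.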

Most prompt engineering guides treat "good prompting" as model-agnostic. They're wrong. What works beautifully on GPT-4 can produce verbose, over-engineered responses on Claude, and vice versa. The models have fundamentally different priors about what "helpful" means.
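One practical consequence: prompt templates end up keyed by model. The sketch below shows the shape of that pattern; the model keys and per-model instructions are hypothetical examples, not benchmarked guidance.

```python
# Per-model prompt tweaks: same task, different framing per model's priors.
# Model keys and instruction strings are illustrative assumptions.
BASE_TASK = "Write a Python function that parses an ISO-8601 date string."

MODEL_TWEAKS = {
    "gpt-4": "Return only the function, no explanation.",
    "claude-3-5-sonnet": (
        "Return only the function. Do not add helper functions "
        "or extra error handling unless asked."
    ),
}

def build_prompt(model: str) -> str:
    """Compose the shared task with a model-specific instruction suffix."""
    tweak = MODEL_TWEAKS.get(model, "")
    return f"{BASE_TASK}\n{tweak}".strip()

print(build_prompt("claude-3-5-sonnet"))
```

The point is not these specific strings but the structure: once output style diverges per model, the "one prompt, many backends" abstraction leaks, and the prompt layer has to know which model it is talking to.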

*Photo: Scrabble tiles. By Markus Winkler on Pexels.*

Why This Matters for Production Systems


Continue reading the full article on TildAlice
